In Analytica chimica acta
Deep learning is now widely used in spectral data processing, but it requires a large amount of training data. The collection of biological serum spectra is limited by sample numbers and labor costs, so it is impractical to obtain a large amount of serum spectral data for disease detection. In this study, we propose a spectral classification model based on the deep structured semantic model (DSSM) and successfully apply it to Fourier Transform Infrared (FT-IR) spectroscopy for ductal carcinoma in situ (DCIS) detection. Unlike a traditional deep learning model, which is trained on individual spectra, our model is trained on pairs of spectra that are labeled positive or negative according to whether the two spectra come from the same category. The DSSM structure extracts features according to the spectral similarity of the spectra pairs. This construction increases the amount of data available for model training and reduces the dimension of the spectral data. Firstly, the FT-IR spectra are paired. A pair is labeled positive if its two spectra come from the same category and negative if they come from different categories. Secondly, the two spectra in each pair are fed separately into the two deep neural networks of the DSSM structure, and the spectral similarity between the output feature maps of the two networks is calculated. The DSSM structure is trained by maximizing the conditional likelihood of the spectra pairs from the same category. Thirdly, after the DSSM has been trained, the training set and the testing set are each passed through the deep neural networks, and the output feature maps of the training set are stored in a reference library. Lastly, a k-nearest neighbor (KNN) model classifies each unknown sample according to the Euclidean distances between its output feature map and those in the reference library. The category of the unknown sample is determined from the categories of its k nearest samples.
For comparison, we also use principal component analysis (PCA) for dimensionality reduction. The accuracies of the KNN model, the principal component analysis-k nearest neighbor (PCA-KNN) model, and the deep structured semantic model-k nearest neighbor (DSSM-KNN) model are 78.8%, 72.7%, and 97.0%, respectively, indicating that the proposed model achieves the highest accuracy of the three.
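The pipeline described in the abstract (pairing, similarity-based training objective, and KNN over a reference library) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the similarity measure or the exact loss, so cosine similarity and a softmax with smoothing factor `gamma` are assumptions borrowed from the original DSSM formulation. The two-dimensional embeddings, the "control" label, and all function names are hypothetical stand-ins for the outputs of trained networks.

```python
import math
from collections import Counter
from itertools import combinations


def make_pairs(labels):
    """Return (i, j, is_positive) for every unordered pair of sample indices."""
    return [(i, j, labels[i] == labels[j])
            for i, j in combinations(range(len(labels)), 2)]


def cosine(u, v):
    """Cosine similarity (assumed; the abstract only says 'spectral similarity')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pair_loss(anchor, positive, negatives, gamma=10.0):
    """Negative log-likelihood of the positive pair under a softmax over similarities.

    Minimizing this value maximizes the conditional likelihood of the
    same-category pair. gamma is an assumed smoothing factor.
    """
    scaled = [gamma * cosine(anchor, positive)]
    scaled += [gamma * cosine(anchor, n) for n in negatives]
    m = max(scaled)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[0]


def knn_predict(query, library, library_labels, k=3):
    """Majority vote among the k library embeddings nearest in Euclidean distance."""
    order = sorted(range(len(library)),
                   key=lambda i: math.dist(query, library[i]))
    votes = Counter(library_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: four embedded training spectra, two per category.
labels = ["DCIS", "DCIS", "control", "control"]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

# Pairing turns n samples into n * (n - 1) / 2 training examples.
pairs = make_pairs(labels)
n_positive = sum(1 for _, _, same in pairs if same)

# The loss is small when the same-category pair is the most similar one.
loss = pair_loss(embeddings[0], embeddings[1], [embeddings[2], embeddings[3]])

# The training-set embeddings act as the reference library for KNN.
prediction = knn_predict([0.95, 0.15], embeddings, labels, k=3)
```

With four samples the pairing step yields six pairs, which shows how pairing enlarges the training set relative to the number of collected spectra. In practice the embeddings would come from the trained deep neural networks rather than being written by hand.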
Du Yu, Xie Fei, Wu Guohua, Chen Peng, Yang Yang, Yang Liu, Yin Longfei, Wang Shu
2023-Apr-22
Deep structured semantic model, Detection, Ductal carcinoma in situ, Fourier Transform Infrared spectroscopy