In Earth science informatics
Spectroscopy is a methodology for studying particles, especially biomolecules, by quantifying the interactions between matter and light. By examining the amount of light absorbed, reflected, or emitted by a specimen, its constituents, properties, and volume can be determined. Spectroscopic measurements are quick, harmless, and contactless, and are therefore now preferred in chemometrics. Because spectra are high dimensional, it is challenging to build a robust classifier with good performance metrics. Many classification models based on linear and nonlinear dimensionality reduction have been implemented to overcome this issue. However, they either fail to capture the subtle details of the spectra in the low-dimensional space or cannot efficiently handle the nonlinearity present in spectral data. We propose a graph-based neural network embedding approach that extracts appropriate features into a latent space and circumvents the nonlinearity of the spectra. Our approach performs dimensionality reduction in two phases: constructing a nearest neighbor graph, and producing an almost linear embedding using a fully connected neural network. The low-dimensional embedding is then classified using the Random Forest algorithm. We implemented our technique and compared it with four nonlinear dimensionality reduction techniques widely used for spectral data analysis, using five spectral datasets from specific applications, and evaluated several classification performance metrics for all techniques. The proposed approach performs competitively on six different low-dimensional spaces for each dataset, with an accuracy score above 95% and a Matthews correlation coefficient close to 1. A trustworthiness score of almost 1 shows that the presented dimensionality reduction approach preserves the nearest neighbor structure of the high-dimensional spectral inputs in the latent space.
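The two-phase pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the neural network is trained on the graph, so the sketch swaps in a Laplacian eigenmap (scikit-learn's `SpectralEmbedding`) computed from the nearest neighbor graph as the regression target for a fully connected network (`MLPRegressor`). The synthetic dataset, layer sizes, neighbor counts, and the function name `run_sketch` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import SpectralEmbedding, trustworthiness
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def run_sketch(n_components=2, n_neighbors=10, seed=0):
    """Hypothetical end-to-end sketch: graph -> NN embedding -> Random Forest."""
    # Synthetic stand-in for a high-dimensional spectral dataset (assumption).
    X, y = make_classification(
        n_samples=300, n_features=100, n_informative=10,
        n_classes=2, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Phase 1: build a nearest neighbor graph and derive target coordinates
    # from it (swapped-in technique: Laplacian eigenmaps).
    targets = SpectralEmbedding(
        n_components=n_components, affinity="nearest_neighbors",
        n_neighbors=n_neighbors, random_state=seed,
    ).fit_transform(X_train)

    # Phase 2: a fully connected network learns the map from spectra to the
    # embedding, so unseen samples can be embedded as well.
    encoder = MLPRegressor(
        hidden_layer_sizes=(64, 32), max_iter=2000, random_state=seed
    ).fit(X_train, targets)
    Z_train, Z_test = encoder.predict(X_train), encoder.predict(X_test)

    # Classify in the low-dimensional latent space with Random Forest.
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    y_pred = clf.fit(Z_train, y_train).predict(Z_test)

    return {
        "embedding_shape": Z_test.shape,
        "accuracy": accuracy_score(y_test, y_pred),
        "mcc": matthews_corrcoef(y_test, y_pred),
        # Fraction of high-dimensional neighborhoods preserved in latent space.
        "trustworthiness": trustworthiness(X_test, Z_test, n_neighbors=5),
    }


if __name__ == "__main__":
    for name, value in run_sketch().items():
        print(name, value)
```

The scores printed by this sketch depend on the synthetic data and say nothing about the results reported in the paper. Varying `n_components` would mimic evaluating several low-dimensional spaces per dataset.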
Yousuff Mohamed, Babu Rajasekhara
COVID-19, Chemometrics, Dimensionality reduction, Machine learning, Random Forest, Spectroscopy