In Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
Quality assurance is a key issue in the tobacco industry, and much effort has been devoted to quality control. This paper introduces a new chemometric technique for estimating the "quality similarity rate", which is used for quality control. The quality similarity rate expresses the degree of similarity between a product and the standard reference samples; it is a global parameter that can be generated either by human assessors or by machine learning. Supervised similarity regression models are built to estimate the quality similarity rate automatically from near-infrared spectroscopy (NIRS) data of tobacco leaf and smoke. For similarity regression learning, the metric matrix is generated by a novel method that calculates the Mahalanobis distance from the segmented NIRS spectra. The results show that similarity regression learning predicts the quality similarity rate well and at high speed, and that it can be improved with lasso (least absolute shrinkage and selection operator) related feature selection algorithms such as sRDA (sparse redundancy analysis) and glmnet.
Huo Juan, Ma Yuping, Lu Changtong, Chenggang Li, Kun Duan, Huaiqi Li
Canonical correlation, Feature selection, Lasso, Mahalanobis distance, Near infrared spectroscopy, Similarity regression learning, glmnet
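
The abstract mentions a Mahalanobis distance computed from segmented NIRS spectra but does not give the construction. The following is only a minimal sketch of one plausible reading, not the authors' method: each spectrum is split into contiguous wavelength segments, each segment is reduced to its mean absorbance, and the Mahalanobis distance to the reference samples is computed on those segment features. The function names, the number of segments, the use of segment means, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np

def segment_means(spectra, n_segments):
    """Reduce each spectrum (one row per sample) to the mean of each contiguous segment.

    Assumption: segment means stand in for whatever per-segment summary the paper uses.
    """
    spectra = np.asarray(spectra, dtype=float)
    segments = np.array_split(spectra, n_segments, axis=1)
    return np.column_stack([seg.mean(axis=1) for seg in segments])

def mahalanobis_to_reference(samples, reference, n_segments=4, ridge=1e-6):
    """Mahalanobis distance of each sample to the reference set, on segment features.

    `ridge` is a hypothetical regularizer that keeps the covariance invertible.
    """
    ref = segment_means(reference, n_segments)
    x = segment_means(samples, n_segments)
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False) + ridge * np.eye(n_segments)
    diff = x - mu
    # Solve cov @ z = diff^T instead of forming the explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    d2 = np.einsum("ij,ij->i", diff, solved)
    return np.sqrt(np.maximum(d2, 0.0))

# Synthetic data only; these are not measurements from the paper.
rng = np.random.default_rng(0)
reference = rng.normal(1.0, 0.05, size=(40, 64))   # 40 reference spectra, 64 wavelengths
typical = reference.mean(axis=0, keepdims=True)     # the average reference spectrum
shifted = typical + 0.5                             # a spectrum offset from the references
distances = mahalanobis_to_reference(np.vstack([typical, shifted]), reference)
print(distances)
```

In this sketch a smaller distance means the product is closer to the reference samples; how such distances enter the metric matrix of the similarity regression is not stated in the abstract and is left open here.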