*In Scientific Reports; h5-index 158.0*

The linear relationship between optical absorbance and analyte concentration, as postulated by the Beer-Lambert law, is one of the fundamental assumptions on which much of the optical spectroscopy literature is explicitly or implicitly based. The widespread use of linear regression models such as principal component regression and partial least squares shows how the linearity assumption is upheld in practical applications. However, the literature also establishes that deviations from the Beer-Lambert law can be expected when (a) the light source is far from monochromatic, (b) analyte concentrations are very high, and (c) the medium is highly scattering. Two factors have led to the use of methods such as random forests, support vector regression, and neural networks in spectroscopic applications: the lack of a quantitative understanding of when such nonlinearities become predominant, and the mainstream use of nonlinear machine learning models in many fields. This raises a question. Many spectroscopic datasets have few samples and many variables, so are nonlinear effects significant enough to justify the additional model complexity? In the present study, we investigate this question empirically in relation to lactate, an important biomarker. In particular, to analyze the effects of scattering matrices, three datasets were generated by varying the concentration of lactate in phosphate buffer solution, human serum, and sheep blood. A fourth dataset comprised in vivo, transcutaneous spectra obtained from healthy volunteers in an exercise study. Linear and nonlinear models were fitted to each dataset, and measures of model performance were compared to test the assumption of linearity. To isolate the effects of high concentrations, the phosphate buffer solution dataset was augmented with six samples with very high lactate concentrations between 100 and 600 mmol/L.
Subsequently, three partly overlapping datasets were extracted, with lactate concentrations ranging over 0-11, 0-20 and 0-600 mmol/L. The performance of linear and nonlinear models was again compared on each dataset. This analysis did not provide any evidence of substantial nonlinearities due to high concentrations. However, the results suggest that nonlinearities may be present in scattering media, which would justify the use of complex, nonlinear models.

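
Under the Beer-Lambert law, absorbance at a single wavelength is $A = \varepsilon \ell c$, which is linear in concentration $c$. If a broadband detector sums transmitted light over bands with different absorptivities, the reported absorbance is $A = -\log_{10}\big(\sum_i I_i 10^{-\varepsilon_i \ell c} / \sum_i I_i\big)$. This is no longer linear in $c$, which corresponds to case (a) above. The toy simulation below is a minimal, standard-library-only sketch, not the authors' pipeline. It assumes a hypothetical two-band source with made-up coefficients. It then compares the leave-one-out RMSE of a linear and a quadratic inverse calibration on nested 0-11, 0-20 and 0-600 mmol/L subsets. The quadratic model is a simple stand-in for the nonlinear methods named above, and the data are univariate rather than full spectra. All function names and constants are illustrative.

```python
import math

# Hypothetical two-band source: equal relative intensities and per-band
# coefficients k_i = epsilon_i * path_length in L/mmol (illustrative only).
INTENSITIES = (0.5, 0.5)
COEFFS = (0.01, 0.002)


def apparent_absorbance(conc, intensities=INTENSITIES, coeffs=COEFFS):
    """Absorbance reported by a detector that sums transmitted intensity over bands.

    Each band follows Beer-Lambert (T_i = 10**(-k_i * c)); the summed signal
    gives an absorbance that is linear in c only if all k_i are equal.
    """
    transmitted = sum(i0 * 10 ** (-k * conc) for i0, k in zip(intensities, coeffs))
    return -math.log10(transmitted / sum(intensities))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][j] * sol[j] for j in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    m = degree + 1
    gram = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    moments = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve(gram, moments)


def loocv_rmse(xs, ys, degree):
    """Leave-one-out cross-validated RMSE of a polynomial calibration model."""
    sq_errors = []
    for i in range(len(xs)):
        coef = fit_poly(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        pred = sum(a * xs[i] ** p for p, a in enumerate(coef))
        sq_errors.append((pred - ys[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical calibration set: 0-20 mmol/L in 1 mmol/L steps plus six
# high-concentration samples, loosely mirroring the 100-600 mmol/L augmentation.
CONCENTRATIONS = [float(c) for c in range(21)] + [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]


def compare_ranges(upper_limits=(11, 20, 600), intensities=INTENSITIES, coeffs=COEFFS):
    """Return {upper_limit: (linear_rmse, quadratic_rmse)} for nested concentration subsets."""
    results = {}
    for upper in upper_limits:
        concs = [c for c in CONCENTRATIONS if c <= upper]
        absorb = [apparent_absorbance(c, intensities, coeffs) for c in concs]
        # Inverse calibration: predict concentration from measured absorbance.
        results[upper] = (loocv_rmse(absorb, concs, 1), loocv_rmse(absorb, concs, 2))
    return results


if __name__ == "__main__":
    for upper, (lin, quad) in compare_ranges().items():
        print(f"0-{upper} mmol/L: linear RMSE {lin:.3f}, quadratic RMSE {quad:.3f} mmol/L")
```

In this noise-free toy, the simulated curvature appears mainly at high absorbance. The linear model's cross-validated error is therefore near zero for a single-band (monochromatic) source and grows as the concentration range widens to 600 mmol/L. Measured spectra add noise, other absorbers and scattering. Conclusions about linearity should therefore rest on cross-validated comparisons on real data, as the study does, rather than on a simulation like this one.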
*Mamouei M, Budidha K, Baishya N, Qassem M, Kyriacou P A*

*2021-Jul-02*