In The Journal of the Acoustical Society of America
In statistically based speech enhancement algorithms, the a priori signal-to-noise ratio (SNR) must be estimated in order to calculate the required spectral gain function. This paper proposes a method to improve this estimation using features derived from the neural responses of the auditory-nerve (AN) system. The neural responses, represented as a neurogram (NG), are simulated for noisy speech using a computational model of the AN system over a range of characteristic frequencies (CFs). Two machine learning algorithms were explored for training the estimation model on NG features: support vector regression and a convolutional neural network. The proposed estimator was placed in a common speech enhancement system, and three conventional spectral gain functions were used to compute the enhanced signal. The proposed method was tested on the NOIZEUS database at different SNR levels, and various speech quality and intelligibility measures were used for performance evaluation. The a priori SNR estimated from NG features achieved better quality and intelligibility scores than those obtained with recent estimators, especially for highly distorted speech and low SNR values.
Jassim Wissam A, Harte Naomi
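
As an illustration of how an a priori SNR estimate feeds a spectral gain function, the following minimal sketch applies the classical Wiener gain, G = xi / (1 + xi), to a single time-frequency bin. The abstract does not name the three gain functions that were evaluated, so the Wiener gain appears here only as a well-known example. The function names (`db_to_linear`, `wiener_gain`, `enhance_bin`) are hypothetical and are not taken from the paper.

```python
def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(xi: float) -> float:
    """Wiener spectral gain for one time-frequency bin.

    xi is the a priori SNR on a linear scale (assumed non-negative).
    The gain approaches 0 for very noisy bins and 1 for clean bins.
    """
    if xi < 0.0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def enhance_bin(noisy_magnitude: float, xi: float) -> float:
    """Scale the noisy spectral magnitude of one bin by the Wiener gain."""
    return wiener_gain(xi) * noisy_magnitude


if __name__ == "__main__":
    # Illustrative a priori SNR values in dB (assumed, not from the paper).
    for snr_db in (-10.0, 0.0, 10.0):
        xi = db_to_linear(snr_db)
        print(f"{snr_db:+.0f} dB -> gain {wiener_gain(xi):.3f}")
```

Because the gain depends only on the a priori SNR in this form, any error in the SNR estimate translates directly into over- or under-suppression of the bin, which is why the quality of the estimator matters most at low SNR.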