In medRxiv : the preprint server for health sciences
Objectives : To detect unilateral vocal fold paralysis (UVFP) from voice recordings using explainable machine learning.
Study Design : Retrospective case series with a control group.
Methods : Patients with UVFP confirmed by endoscopic examination (N=77) and age- and sex-matched controls with normal voices (N=77) were included. Two tasks were used to elicit voice samples: reading the Rainbow Passage and sustained phonation of the vowel "a". The 88 features of the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) were extracted as inputs for four machine learning models of differing complexity. SHapley Additive exPlanations (SHAP) were used to identify important features.
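SHAP attributes a model's prediction to individual input features using Shapley values. As a rough illustration only (the abstract does not describe the authors' pipeline in this detail), the sketch below computes exact Shapley values for one sample by enumerating every feature coalition, with features outside a coalition set to a baseline value. The toy linear scoring function, its weights, the sample, and the baseline are all hypothetical; practical SHAP implementations approximate this computation, since exact enumeration is infeasible for 88 features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the features in `coalition` take their real value.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical three-feature linear model (weights are illustrative only).
WEIGHTS = [0.5, -1.0, 2.0]


def toy_score(v):
    return sum(w * f for w, f in zip(WEIGHTS, v))


x = [2.0, 1.0, 0.5]          # hypothetical sample to explain
baseline = [1.0, 1.0, 0.0]   # hypothetical reference values for "absent" features
phi = shapley_values(toy_score, x, baseline)
```

For a linear model, each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction for the sample and the prediction for the baseline.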
Results : The median bootstrapped Area Under the Receiver Operating Characteristic Curve (ROC AUC) score ranged from 0.79 to 0.87 depending on model and task. After removing redundant features for explainability, the highest median ROC AUC score was 0.84 using only 13 features for the vowel task and 0.87 using 39 features for the reading task. The most important features varied by model and task and included intensity measures, mean MFCC1, mean F1 amplitude and frequency, and shimmer variability.
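A median bootstrapped ROC AUC can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state what was resampled or how many bootstrap iterations were used, so resampling the evaluated samples with replacement, the default of 1000 iterations, the function names, and the toy labels and scores are all assumptions.

```python
import random
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(labels, scores, n_boot=1000, seed=0):
    """Median ROC AUC over bootstrap resamples (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    return median(aucs)


# Hypothetical toy data: 1 = UVFP, 0 = control; scores are model outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
point_auc = roc_auc(labels, scores)
median_auc = bootstrap_median_auc(labels, scores, n_boot=200)
```

Reporting the median over resamples, rather than a single point estimate, gives a score that is less sensitive to which particular recordings happen to fall in the evaluation set.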
Conclusion : Using the largest dataset studying UVFP to date, we achieve high performance from just a few seconds of voice recordings. Notably, we demonstrate that while similar categories of features related to vocal fold physiology were conserved across models, the models used different combinations of features and still achieved similar effect sizes. Machine learning thus provides a mechanism to detect UVFP and contextualize the accuracy relative to both model architecture and pathophysiology.
Low Daniel M, Randolph Gregory, Rao Vishwanatha, Ghosh Satrajit S, Song Phillip C