In Computational and mathematical methods in medicine
Aim : This study used machine learning methods to develop a prediction model for knee pain in middle-aged and elderly individuals.
Methods : Data on 5386 individuals older than 45 years were obtained from the National Health and Nutrition Examination Survey. Participants were randomly divided into a training set and a test set at a 7:3 ratio. The training set was used to build the prediction models, and the test set was used to validate them. We constructed predictive models using three machine learning methods: logistic regression, random forest, and Extreme Gradient Boosting. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. Additionally, we created a simplified nomogram based on logistic regression for easier clinical application.
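The split and the evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the fixed random seed, and the rank-based AUC computation are assumptions made here for clarity, and the abstract does not state which software was used.

```python
import random


def train_test_split_70_30(records, seed=0):
    """Randomly split records into a 70% training set and a 30% test set.

    The seed is a hypothetical choice; the abstract does not report one.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary labels (1 = knee pain).

    Assumes every cell combination needed for the denominators is non-empty.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Applied to 5386 records, this split yields 3770 training and 1616 test records. The predicted labels passed to `confusion_metrics` would come from thresholding a model's predicted probabilities; the threshold used in the study is not given in the abstract.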
Results : About 31.4% (1690) of individuals self-reported knee pain. Logistic regression showed that female gender (odds ratio [OR] = 1.28), pain elsewhere (OR = 4.64), and body mass index (OR = 1.05) were significantly associated with increased risk of knee pain. In the test set, logistic regression (AUC = 0.71) performed similarly to, but slightly better than, random forest (AUC = 0.70), while the Extreme Gradient Boosting model was less reliable (AUC = 0.59). Based on mean decrease in accuracy, the five most important predictors were pain elsewhere, waist circumference, body mass index, age, and gender. Based on mean decrease in Gini index, the five most important predictors were pain elsewhere, body mass index, waist circumference, triglycerides, and age. The nomogram model showed good discrimination with an AUC of 0.75 (0.73-0.77), a sensitivity of 0.72, a specificity of 0.71, a positive predictive value of 0.45, and a negative predictive value of 0.88.
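To help read the odds ratios reported above, the following sketch shows how an odds ratio compounds over several units of a predictor and how it shifts a probability. It assumes the body mass index OR is expressed per one-unit increase, which the abstract does not state, and the baseline probability in the usage note is purely hypothetical.

```python
def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio.

    In logistic regression, per-unit odds ratios multiply, so the combined
    odds ratio is the per-unit value raised to the power of the change.
    """
    return or_per_unit ** delta


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Convert a baseline probability to odds, scale by the odds ratio,
    and convert back to a probability."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)
```

Under these assumptions, `odds_ratio_for_change(1.05, 5)` gives about 1.28 for a five-unit difference in body mass index, and `apply_odds_ratio(0.2, 4.64)` shows a hypothetical 20% baseline risk rising to about 54% for a person with pain elsewhere.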
Conclusion : This study proposes a convenient nomogram for evaluating the risk of knee pain in the middle-aged and elderly US population in primary care. All input variables can be easily obtained in a clinical setting, and no additional radiologic assessments are required.
Liu Lu, Zhu Min-Min, Cai Lin-Lin, Zhang Xiao