In International Journal of Sports Medicine
This article presents how predictive machine learning methods can be used to detect sport injury risk factors in a data-driven manner. The approach can be used both to generate new hypotheses about risk factors and to confirm the predictive power of previously recognized ones. We used three-dimensional motion analysis and physical data from 314 young basketball and floorball players (48.4% males, 15.72±1.79 yr, 173.34±9.14 cm, 64.65±10.4 kg). Both a linear method (L1-regularized logistic regression) and a non-linear method (random forest) were used to predict moderate and severe knee and ankle injuries (N=57) during a three-year follow-up. Results were confirmed with permutation tests, and predictive risk factors were detected with the Wilcoxon signed-rank test (p<0.01). Random forest suggested twelve consistent injury predictors and logistic regression twenty. Ten of these were suggested by both models: sex, body mass index, hamstring flexibility, knee joint laxity, medial knee displacement, height, ankle plantar flexion at initial contact, leg press one-repetition maximum, and knee valgus at initial contact. Cross-validated areas under the receiver operating characteristic curve were 0.65 (logistic regression) and 0.63 (random forest). The results highlight the difficulty of predicting future injuries, but also show that even with models having relatively low predictive power, certain predictive injury risk factors can be consistently detected.
Jauhiainen Susanne, Kauppi Jukka-Pekka, Leppänen Mari, Pasanen Kati, Parkkari Jari, Vasankari Tommi, Kannus Pekka, Äyrämö Sami
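
The abstract reports areas under the ROC curve and states that results were confirmed with permutation tests, but it does not describe the exact procedure. The following is a minimal sketch of one common form of such a test, assuming the model's predicted risks are already available: the injury labels are shuffled many times and the observed AUC is compared against the resulting null distribution. The function names `roc_auc` and `permutation_p_value` and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Return (observed AUC, p-value) where the p-value is the share of
    label shufflings whose AUC is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between labels and scores
        if roc_auc(shuffled, scores) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy illustration only: made-up predicted risks, not study data.
    toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    auc, p = permutation_p_value(toy_labels, toy_scores)
    print(f"AUC = {auc:.2f}, permutation p = {p:.4f}")
```

In a setting like the one described, `scores` would presumably be cross-validated predictions from the logistic regression or random forest model, so that an AUC near 0.65 could be checked against what label shuffling alone produces.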