In International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases
OBJECTIVES : Administrative claims data are prone to underestimate the burden of nontuberculous mycobacterial pulmonary disease (NTM-PD).
METHODS : We developed machine learning-based algorithms using historical claims data from cases with NTM-PD to identify patients with a high probability of having previously undiagnosed NTM-PD and to assess actual prevalence and incidence. Adults with incident NTM-PD were identified by the International Classification of Diseases, 10th revision code A31.0 in a representative 5% sample of the German population covered by statutory health insurance during 2011-2016. Pre-diagnosis characteristics (patient demographics, comorbidities, diagnostic and therapeutic procedures, and medications) were extracted and compared with those of a control group without NTM-PD to identify risk factors.
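The case-identification step can be illustrated with a minimal Python sketch. The record layout, the toy data, the age calculation, and the rule that the first A31.0 claim within the study window marks an incident case are all assumptions made for illustration; the abstract does not specify how incidence or adulthood was operationalized.

```python
from datetime import date

# Hypothetical claims records (not study data):
# (patient_id, birth_year, diagnosis_date, icd10_code)
CLAIMS = [
    ("p1", 1950, date(2012, 3, 1), "A31.0"),
    ("p1", 1950, date(2014, 6, 9), "A31.0"),   # repeat claim, same patient
    ("p2", 2005, date(2013, 1, 15), "A31.0"),  # minor at diagnosis
    ("p3", 1970, date(2015, 8, 20), "J44.1"),  # different diagnosis code
    ("p4", 1980, date(2016, 11, 2), "A31.0"),
]

NTM_PD_CODE = "A31.0"  # ICD-10 code named in the abstract


def first_ntm_pd_diagnoses(claims, start_year=2011, end_year=2016, min_age=18):
    """Return {patient_id: date of first A31.0 claim} for adults in the window.

    Assumption: age is approximated as diagnosis year minus birth year.
    """
    first = {}
    for pid, birth_year, dx_date, code in claims:
        if code != NTM_PD_CODE:
            continue
        if not (start_year <= dx_date.year <= end_year):
            continue
        if dx_date.year - birth_year < min_age:
            continue
        # Keep only the earliest qualifying claim per patient.
        if pid not in first or dx_date < first[pid]:
            first[pid] = dx_date
    return first


cases = first_ntm_pd_diagnoses(CLAIMS)
```

In this toy data, only `p1` and `p4` qualify: `p2` is excluded by age and `p3` by diagnosis code.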
RESULTS : Applying a random forest model (area under the curve 0.847; total error 19.4%) and a risk threshold of >99%, prevalence and incidence rates in 2016 increased 5-fold and 9-fold to 19 and 15 cases/100,000 population, respectively, when coded and predicted non-coded cases were combined, compared with coded cases alone.
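The arithmetic behind such an adjusted rate can be sketched as follows. This is only an illustration under stated assumptions: the predicted probabilities, case count, and population size are hypothetical toy values, and the function names are invented for this example. It shows how a >99% risk threshold converts model probabilities into additional predicted cases and how the combined rate per 100,000 compares with the coded-only rate.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 population."""
    return cases * 100_000 / population


def count_predicted_cases(probabilities, threshold=0.99):
    """Count patients whose predicted risk strictly exceeds the threshold."""
    return sum(1 for p in probabilities if p > threshold)


# Hypothetical inputs, not taken from the study.
coded_cases = 4          # patients with a coded diagnosis
population = 100_000     # insured population at risk
# Model-predicted probabilities for patients without a coded diagnosis.
uncoded_probs = [0.999, 0.995, 0.99, 0.97, 0.42, 0.9999]

predicted_cases = count_predicted_cases(uncoded_probs)
coded_rate = rate_per_100k(coded_cases, population)
combined_rate = rate_per_100k(coded_cases + predicted_cases, population)
fold_increase = combined_rate / coded_rate
```

A strict threshold like this trades sensitivity for specificity: a patient at exactly 0.99 is not counted, so the combined rate is a conservative estimate of coded plus non-coded cases.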
CONCLUSIONS : A machine learning-based algorithm applied to German statutory health insurance claims data predicted a considerable number of previously unreported NTM-PD cases with high probability.
Ringshausen Felix C, Ewen Raphael, Multmeier Jan, Monga Bondo, Obradovic Marko, van der Laan Roald, Diel Roland
Epidemiology, Insurance claims analysis, Machine learning, Nontuberculous mycobacteria, Nontuberculous mycobacterium infections, Probability learning