In Rheumatology (Oxford, England)
OBJECTIVE : The objective of this study was to compare the performance of an RA algorithm developed and trained in 2010 utilizing natural language processing and machine learning, using updated data containing ICD10, new RA treatments, and a new electronic medical records (EMR) system.
METHODS : We extracted data from subjects with ≥1 RA International Classification of Diseases (ICD) codes from the EMR of two large academic centres to create a data mart. Gold standard RA cases were identified from reviewing a random 200 subjects from the data mart, and a random 100 subjects who only have RA ICD10 codes. We compared the performance of the following algorithms using the original 2010 data with updated data: (i) a published 2010 RA algorithm; (ii) updated algorithm, incorporating ICD10 RA codes and new DMARDs; and (iii) published algorithm using ICD codes only, ICD RA code ≥3.
RESULTS : The gold standard RA cases had mean age 65.5 years, 78.7% female, 74.1% RF or antibodies to cyclic citrullinated peptide (anti-CCP) positive. The positive predictive value (PPV) for ≥3 RA ICD was 54%, compared with 56% in 2010. At a specificity of 95%, the PPV of the 2010 algorithm and the updated version were both 91%, compared with 94% (95% CI: 91, 96%) in 2010. In subjects with ICD10 data only, the PPV for the updated 2010 RA algorithm was 93%.
CONCLUSION : The 2010 RA algorithm validated with the updated data with similar performance characteristics as the 2010 data. While the 2010 algorithm continued to perform better than the rule-based approach, the PPV of the latter also remained stable over time.
Huang Sicong, Huang Jie, Cai Tianrun, Dahal Kumar P, Cagan Andrew, He Zeling, Stratton Jacklyn, Gorelik Isaac, Hong Chuan, Cai Tianxi, Liao Katherine P
electronic medical record, machine learning, rheumatoid arthritis