ArXiv Preprint
Importance: Severe mental illnesses (SMIs) affect approximately 3% of the
United States population. The ability to conduct risk screening for SMIs at
large scale could inform early prevention and treatment.
Objective: A scalable machine learning-based tool was developed to conduct
population-level risk screening for SMIs, including schizophrenia,
schizoaffective disorders, psychosis, and bipolar disorders, using 1) healthcare
insurance claims and 2) electronic health records (EHRs).
Design, setting and participants: Data from beneficiaries of a nationwide
commercial healthcare insurer with 77.4 million members and from patients in
the EHRs of eight U.S. academic hospitals were used. First, the
predictive models were constructed and tested using data in case-control
cohorts from insurance claims or EHR data. Second, performance of the
predictive models across data sources was analyzed. Third, as an illustrative
application, the models were further trained to predict risks of SMIs among
18-year-old young adults and individuals with substance-associated conditions.
Main outcomes and measures: Machine learning-based predictive models for SMIs
in the general population were built based on insurance claims and EHR data.
Dianbo Liu, Karmel W. Choi, Paulo Lizano, William Yuan, Kun-Hsing Yu, Jordan Smoller, Isaac Kohane
2022-12-20