ArXiv Preprint
Understanding patterns of diagnoses, medications, procedures, and laboratory
tests from electronic health records (EHRs) and health insurer claims is
important for assessing disease risk and for efficient clinical
development, but extracting these patterns often requires rules-based
curation in collaboration with
clinicians. We extended mixEHR, an unsupervised phenotyping algorithm, to an
online version, allowing us to apply it to datasets an order of magnitude
larger, including a large US-based claims dataset and a rich regional EHR
dataset. In
addition to recapitulating previously observed disease groups, we discovered
clinically meaningful disease subtypes and comorbidities. This work scales up
an effective unsupervised learning method, reinforces existing clinical
knowledge, and offers a promising approach for efficient collaboration with
clinicians.
Ying Xu, Anna Decker, Jacob Oppenheim, Romane Gauriau
2022-11-14