In Radiology. Cardiothoracic imaging

Background : Coronavirus disease 2019 (COVID-19) has spread quickly throughout the United States (US) causing significant disruption in healthcare and society. Tools to identify hot spots are important for public health planning. The goal of our study was to determine if natural language processing (NLP) algorithm assessment of thoracic computed tomography (CT) imaging reports correlated with the incidence of official COVID-19 cases in the US.

Methods : Using de-identified, HIPAA-compliant patient data from our common imaging platform, which is interconnected with over 2,100 facilities covering all 50 states, we developed three NLP algorithms to track positive CT imaging features of respiratory illness typical of SARS-CoV-2 viral infection. We compared our findings against the number of official COVID-19 cases on a daily, weekly, and state-wide basis.
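As a rough illustration of what flagging a report as positive could look like, here is a minimal rule-based sketch. It is not the authors' method: the abstract describes machine learning-based NLP algorithms without giving their details, so the term list, the negation cues, and the function name `report_is_positive` are all hypothetical stand-ins.

```python
import re

# Hypothetical finding terms; the abstract does not list the features the
# authors' algorithms actually tracked.
FINDING_TERMS = ("ground-glass", "ground glass", "consolidation")

# Hypothetical negation cues, matched as whole words or phrases.
NEGATION_PATTERN = re.compile(r"\b(no|without|negative for)\b")


def report_is_positive(report: str) -> bool:
    """Return True if any sentence mentions a finding term without a negation cue."""
    for sentence in re.split(r"[.;\n]+", report.lower()):
        mentions_finding = any(term in sentence for term in FINDING_TERMS)
        if mentions_finding and not NEGATION_PATTERN.search(sentence):
            return True
    return False


def count_positive(reports: list[str]) -> int:
    """Count how many reports in a batch are flagged positive."""
    return sum(report_is_positive(r) for r in reports)
```

Counting flagged reports per day or per state would give the kind of series that is then compared against official case counts. A sentence-level negation check like this is crude and would miss many real-world phrasings.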

Results : The NLP algorithms were applied to 450,114 comprehensive patient chest CT reports gathered from January 1st to October 3rd, 2020. The best-performing NLP model exhibited strong correlation with daily official COVID-19 cases (r²=0.82, p<0.005). The NLP models demonstrated an early rise in cases that preceded the increase in official cases, suggesting the possibility of an early predictive marker, with strong correlation to official cases on a weekly basis (r²=0.91, p<0.005). There was also substantial correlation between the NLP-derived and official COVID-19 incidence by state (r²=0.92, p<0.005).
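The squared correlation values reported above can be understood through a short sketch of how such a figure might be computed from two count series. This assumes a plain Pearson correlation, which the abstract does not confirm, and the helper names `pearson_r`, `r_squared`, and `weekly_totals` are illustrative rather than taken from the study.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Square of the Pearson correlation coefficient."""
    return pearson_r(xs, ys) ** 2


def weekly_totals(daily: list[float]) -> list[float]:
    """Sum daily counts into complete 7-day weeks, dropping a trailing partial week."""
    full_days = len(daily) - len(daily) % 7
    return [sum(daily[i:i + 7]) for i in range(0, full_days, 7)]


if __name__ == "__main__":
    # Synthetic counts for illustration only; these are not data from the study.
    nlp_positive = [3, 4, 6, 9, 12, 15, 21, 26, 30, 41, 47, 52, 60, 71]
    official_cases = [10, 12, 19, 30, 35, 50, 66, 80, 95, 120, 150, 160, 190, 220]
    print("daily r^2:", r_squared(nlp_positive, official_cases))
    print("weekly totals:", weekly_totals(nlp_positive))
```

Aggregating to weekly totals before correlating smooths day-of-week reporting effects, which is one plausible reason a weekly comparison could show a higher value than a daily one.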

Conclusion : Using big data, we developed a novel machine learning-based NLP algorithm that tracks imaging findings of respiratory illness in chest CT reports and correlates strongly with the progression of the COVID-19 pandemic in the US.

Cury Ricardo C, Megyeri Istvan, Lindsey Tony, Macedo Robson, Batlle Juan, Kim Shwan, Baker Brian, Harris Robert, Clark Reese H


big data, chest CT, computed tomography, machine learning, natural language processing, public health, viral outbreak