
In BMC Medical Informatics and Decision Making; h5-index 38.0

BACKGROUND : Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data.

OBJECTIVE : This study aims to use natural language processing (NLP) to extract the key information (clinical factors, social determinants of health) from published cases in the literature.

METHODS : The proposed framework integrates three layers: a data layer that prepares a data cohort from clinical case reports; an NLP layer that finds clinical and demographic named entities and the relations between them in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports.
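
As a rough illustration of how three such layers could fit together, the following minimal Python sketch chains a data layer, an NLP layer, and an evaluation layer. It is not the authors' implementation: the function names, the `Entity` type, and the toy `LEXICON` are all hypothetical, and a simple dictionary lookup stands in for the trained named entity recognition model that the study actually relies on. Relation extraction is omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


# Hypothetical toy lexicon; an assumption for illustration only.
# The study uses learned models, not a dictionary.
LEXICON = {
    "fever": "SYMPTOM",
    "cough": "SYMPTOM",
    "covid-19": "DISEASE",
}


def data_layer(raw_reports):
    """Data layer: normalise whitespace and drop empty reports."""
    cleaned = (" ".join(report.split()) for report in raw_reports)
    return [report for report in cleaned if report]


def nlp_layer(report):
    """NLP layer: dictionary lookup standing in for a trained NER model."""
    found = []
    lowered = report.lower()
    for term, label in LEXICON.items():
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            start, end = match.start(), match.end()
            found.append(Entity(report[start:end], label, start, end))
    return sorted(found, key=lambda entity: entity.start)


def evaluation_layer(predicted, gold):
    """Evaluation layer: exact-match precision, recall, and F1."""
    pred = {(e.start, e.end, e.label) for e in predicted}
    ref = {(e.start, e.end, e.label) for e in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reports = data_layer(
        ["  A 45-year-old man with COVID-19 presented with fever and  cough. ", "   "]
    )
    entities = nlp_layer(reports[0])
    for entity in entities:
        print(entity.label, entity.text)
    # Pretend the gold annotation contains only the first two entities.
    print(evaluation_layer(entities, entities[:2]))
```

Keeping each layer as a separate function means the dictionary lookup could be replaced by a fine-tuned model without touching the data preparation or the scoring code.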

RESULTS : The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% over benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1-8%. A thorough examination reveals the disease's presence and the prevalence of symptoms in patients.

CONCLUSIONS : A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.

Raza Shaina, Schwartz Brian

2023-Jan-26

Artificial intelligence, COVID-19, Data cohort, Named entity, Natural language processing, Relation extraction, Transfer learning