Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

In Journal of biomedical informatics ; h5-index 55.0

Natural Language Processing (NLP) can offer important tools for unlocking relevant information from clinical narratives. Although Transformer-based models can achieve remarkable results in several different NLP tasks, these models have been less used in clinical NLP, and particularly in low resource languages, of which Portuguese is one example. It is still not entirely clear whether pre-trained Transformer models are useful for clinical tasks, without further architecture engineering or particular training strategies. In this work, we propose a BERT model to assign ICD-10 codes for causes of death, by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We used a novel pre-training procedure that incorporates in-domain knowledge, and also a fine-tuning method to address the class imbalance issue. Experimental results show that, in this particular clinical task that requires the processing of relatively short documents, Transformer-based models can achieve very strong results, significantly outperforming tailored approaches based on recurrent neural networks.

Coutinho Isabel, Martins Bruno

2022-Oct-25

Automated ICD coding, Clinical Natural Language Processing, Deep learning, Natural Language Processing for non-English languages, Transformer-based language models