In Journal of biomedical informatics ; h5-index 55.0
BACKGROUND : The increasing number of chest x-ray (CXR) examinations in radiodiagnosis departments burdens radiologists and makes the timely generation of accurate radiological reports highly challenging. An automatic radiological report generation (ARRG) system is envisaged to generate radiographic reports with minimal human intervention, ease radiologists' burden, and smooth the clinical workflow. The success of an ARRG system depends on two critical factors: i) the quality of the features extracted by the ARRG system from the CXR images, and ii) the quality of the linguistic expression generated by the ARRG system to describe the normalities and abnormalities indicated by the extracted features. Most existing ARRG systems fail on the latter factor and do not generate clinically acceptable reports because they ignore the contextual importance of medical terms.
OBJECTIVE : The advent of contextual word embeddings, such as ELMo and BERT, has revolutionized several natural language processing (NLP) tasks. A contextual embedding represents a word based on the text surrounding it, so the same word can receive different representations in different sentences. The main objective of this work is to develop an ARRG system that uses contextual word embeddings to generate clinically accurate reports from CXR images.
METHODS : We present an end-to-end deep neural network that uses contextual word representations to generate clinically useful radiological reports from CXR images. The proposed network, termed RadioBERT, uses DistilBERT for contextual word representation and leverages transfer learning. Additionally, because abnormal observations matter more than normal ones, the network reorders the generated sentences by applying sentiment analysis so that abnormal descriptions appear at the top of the generated report.
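The reordering step can be sketched as follows. The abstract does not describe the sentiment model, so this minimal Python sketch substitutes a hypothetical keyword-based scorer (`NORMAL_CUES` and `abnormality_score` are illustrative names, not part of RadioBERT) and only shows how abnormal sentences could be moved ahead of normal ones while preserving their relative order.

```python
import re

# Hypothetical cue words marking a "normal" finding. This is an assumption
# for illustration; the paper's actual sentiment analysis is not specified.
NORMAL_CUES = {"no", "normal", "clear", "unremarkable", "without"}


def abnormality_score(sentence: str) -> int:
    """Return 0 if the sentence contains a normal cue word, else 1."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return 0 if tokens & NORMAL_CUES else 1


def reorder_report(report: str) -> str:
    """Place sentences scored as abnormal before those scored as normal."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    # sorted() is stable, so sentences with equal scores keep their order.
    ordered = sorted(sentences, key=lambda s: -abnormality_score(s))
    return " ".join(ordered)


if __name__ == "__main__":
    draft = ("The heart size is normal. "
             "There is a right lower lobe opacity. "
             "No pleural effusion.")
    print(reorder_report(draft))
```

A trained sentiment classifier would replace `abnormality_score`; the stable sort keeps the remaining sentence order unchanged.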
RESULTS : Experiments on the OpenI dataset indicate that a CNN + hierarchical LSTM with DistilBERT embeddings improves on the benchmark performance. The model achieves the following scores: BLEU-1=0.772, BLEU-2=0.770, BLEU-3=0.768, BLEU-4=0.767, CIDEr=0.5563, and ROUGE=0.897.
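For readers unfamiliar with the BLEU-n metrics reported here, the following is a minimal sketch of sentence-level BLEU with one reference, uniform n-gram weights, and no smoothing. The abstract does not state which implementation, tokenization, or smoothing was used, so this illustrates the standard definition only and is not expected to reproduce the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU-max_n for one whitespace-tokenized candidate/reference pair."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped counts: each candidate n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(bleu("the lungs are clear", "the lungs are clear"))
    print(bleu("the lungs are clear", "the heart is normal", max_n=1))
```

BLEU-1 through BLEU-4 correspond to `max_n` values of 1 through 4.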
CONCLUSION : The proposed method improves the state-of-the-art performance scores by a substantial margin. We conclude that the word embeddings generated by DistilBERT enhance the performance of the hierarchical LSTM in producing clinical reports by a significant margin.
Kaur Navdeep, Mittal Ajay
BERT (Bidirectional Encoder Representations from Transformers), Medical report generation, Pretrained language model, Word embedding