In Journal of Pathology Informatics ; h5-index 23.0
Background : Free-text sections of pathology reports contain the most important information from a diagnostic standpoint. However, this information is largely underutilized for computer-based analytics. The vast majority of natural language processing (NLP) based methods lack the capacity to accurately extract complex diagnostic entities and the relationships among them, and to provide an adequate knowledge representation for downstream data-mining applications.
Methods : In this paper, we introduce a novel informatics pipeline that extends open information extraction (openIE) techniques with artificial intelligence (AI) based modeling to extract and transform complex diagnostic entities and relationships among them into Knowledge Graphs (KGs) of relational triples (RTs).
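To make the target representation concrete, the following is a minimal sketch of how relational triples can be assembled into a small knowledge graph. It is not the authors' pipeline: the `Triple` type, the `build_graph` helper, the relation names, and the example triples are all hypothetical, invented here for illustration, and the graph is assumed to be a plain adjacency map rather than any particular KG store.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A relational triple (hypothetical structure, not from the paper)."""
    subject: str
    relation: str
    obj: str


def build_graph(triples):
    """Build an adjacency map: subject -> list of (relation, object) edges.

    Duplicate edges are dropped so each fact appears only once.
    """
    graph = defaultdict(list)
    for t in triples:
        edge = (t.relation, t.obj)
        if edge not in graph[t.subject]:
            graph[t.subject].append(edge)
    return dict(graph)


# Invented example triples; real output would come from the extraction step.
triples = [
    Triple("tumor", "has_histologic_type", "adenocarcinoma"),
    Triple("tumor", "has_size", "2.5 cm"),
    Triple("adenocarcinoma", "has_grade", "moderately differentiated"),
    Triple("tumor", "has_size", "2.5 cm"),  # duplicate, dropped by build_graph
]

graph = build_graph(triples)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

Because an object such as "adenocarcinoma" can also appear as a subject, chained facts become reachable by following edges, which is what makes a graph of triples convenient for downstream queries.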
Results : Evaluation studies demonstrated that the pipeline's output differs significantly from a random process. The semantic similarity with the original reports is high (Mean Weighted Overlap of 0.83). The precision and recall of extracted RTs, based on experts' assessment, were 0.925 and 0.841 respectively (P < 0.0001). Inter-rater agreement was significant at 93.6% and inter-rater reliability was 81.8%.
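For readers unfamiliar with the metrics, the sketch below shows how precision and recall are conventionally computed for a set of extracted triples. It assumes a fixed reference set of correct triples, which is a simplification: the study relied on experts' assessment, and the exact protocol is not described here. The function name `precision_recall` and the sample data are hypothetical.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted items against a reference set.

    precision = correct extracted / all extracted
    recall    = correct extracted / all reference items
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Invented data: 4 extracted triples, 3 of them among 5 reference triples.
reference = {"t1", "t2", "t3", "t4", "t5"}
extracted = {"t1", "t2", "t3", "t9"}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, three of the four extracted triples are correct (precision 0.75) and three of the five reference triples are recovered (recall 0.60).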
Conclusion : The results demonstrated important properties of the pipeline, such as high accuracy, minimality, and adequate knowledge representation. We therefore conclude that the pipeline can be used in various downstream data-mining applications to assist diagnostic medicine.
Giannaris Pericles S, Al-Taie Zainab, Kovalenko Mikhail, Thanintorn Nattapon, Kholod Olha, Innokenteva Yulia, Coberly Emily, Frazier Shellaine, Laziuk Katsiarina, Popescu Mihail, Shyu Chi-Ren, Xu Dong, Hammer Richard D, Shin Dmitriy
Free-text pathology reports, information extraction, n-ary modeling, structurization