ArXiv Preprint
The healthcare domain generates large volumes of unstructured and
semi-structured text. Natural Language Processing (NLP) has been used
extensively to process this data. Deep-learning-based NLP, especially Large
Language Models (LLMs) such as BERT, has found broad acceptance and is used
extensively in many applications. A language model is a probability
distribution over a sequence of words.
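Concretely, for a word sequence w_1, ..., w_n, the joint probability
factorizes by the chain rule (the standard definition, not specific to this
paper):

    P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})

Masked language models such as BERT instead learn conditionals of a word
given its surrounding bidirectional context.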
Self-supervised learning on a large corpus of data automatically produces
such deep-learning-based language models. BioBERT and Med-BERT are language
models pre-trained for the healthcare domain. Healthcare applications use
typical NLP tasks such as question answering, information extraction, named
entity recognition, and search to simplify and improve processes.
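As a minimal sketch of one such task, extractive question answering over a
clinical note using the Hugging Face transformers library (the model
identifier below is a widely used SQuAD-distilled checkpoint and is an
assumption here, not this paper's setup):

    # Extractive QA over a clinical note: the model returns the answer
    # span found in the context, plus a confidence score.
    from transformers import pipeline

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
    note = ("The patient has a history of type 2 diabetes and was "
            "started on metformin 500 mg twice daily.")
    result = qa(question="Which medication was the patient started on?",
                context=note)
    print(result)  # expected answer span: "metformin"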
However, to ensure robust application of the results, NLP practitioners need
to normalize and standardize them. One of the main ways of achieving
normalization and standardization is the use of Knowledge Graphs. A Knowledge
Graph captures concepts and their relationships for a specific domain, but
its creation is time-consuming and requires manual intervention from domain
experts, which can prove expensive.
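At its core, such a graph is a set of subject-relation-object triples. A
minimal sketch (the concept names and relations below are illustrative, not
actual SNOMED CT content):

    # Knowledge graph as subject-relation-object triples; a query walks
    # the explicit edges to answer "what treats type 2 diabetes?".
    triples = {
        ("Type 2 diabetes", "is_a", "Diabetes mellitus"),
        ("Metformin", "treats", "Type 2 diabetes"),
        ("Diabetes mellitus", "finding_site", "Endocrine system"),
    }
    treatments = [s for (s, r, o) in triples
                  if r == "treats" and o == "Type 2 diabetes"]
    print(treatments)  # ['Metformin']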
SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms), the
Unified Medical Language System (UMLS), and the Gene Ontology (GO) are
popular ontologies from the healthcare domain. SNOMED CT and UMLS capture
concepts such as diseases, symptoms, and diagnoses, while GO is the world's
largest source of information on the functions of genes.
Healthcare has been dealing with an explosion of information about different
types of drugs, diseases, and procedures. This paper argues that Knowledge
Graphs are not the best solution for solving problems in this domain. We
present experiments using LLMs in the healthcare domain to demonstrate that
language models provide the same functionality as knowledge graphs, thereby
making knowledge graphs redundant.
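As a rough illustration of the kind of probe behind such a claim, one can
test whether a pretrained biomedical language model encodes relatedness that
a knowledge graph would express as an explicit edge. A minimal sketch (the
dmis-lab/biobert-v1.1 checkpoint name is assumed from the public Hugging Face
hub; this is not the paper's actual experimental setup):

    # Compare embedding similarity of related vs. unrelated medical
    # concepts: a pretrained LM should score the related pair higher,
    # mirroring an edge a knowledge graph would state explicitly.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "dmis-lab/biobert-v1.1"  # assumed checkpoint identifier
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    def embed(text: str) -> torch.Tensor:
        # Mean-pool the final hidden states into a single vector.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state
        return hidden.mean(dim=1).squeeze(0)

    cos = torch.nn.functional.cosine_similarity
    diabetes, metformin = embed("type 2 diabetes"), embed("metformin")
    fracture = embed("ankle fracture")
    print(cos(diabetes, metformin, dim=0))  # expected: higher (related)
    print(cos(diabetes, fracture, dim=0))   # expected: lower (unrelated)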
Kunal Suri, Atul Singh, Prakhar Mishra, Swapna Sourav Rout, Rajesh Sabapathy
2023-01-10