arXiv Preprint
Despite the success of multi-modal foundation models pre-trained on
large-scale data in natural language understanding and visual recognition, their
counterparts in medical and clinical domains remain preliminary, due to the
fine-grained recognition nature of the medical tasks with high demands on
domain knowledge. Here, we propose a knowledge-enhanced vision-language
pre-training approach for auto-diagnosis on chest X-ray images. The algorithm,
named Knowledge-enhanced Auto Diagnosis (KAD), first trains a knowledge encoder
based on an existing medical knowledge graph, i.e., learning neural embeddings
of the definitions and relationships between medical concepts, and then
leverages the pre-trained knowledge encoder to guide the visual representation
learning with paired chest X-rays and radiology reports. We experimentally
validate KAD's effectiveness on three external X-ray datasets. The zero-shot
performance of KAD is not only comparable to that of the fully-supervised
models but also, for the first time, superior to the average of three expert
radiologists for three (out of five) pathologies with statistical significance.
When few-shot annotations are available, KAD also surpasses all existing
approaches in fine-tuning settings, demonstrating its potential for application
in different clinical scenarios.
Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
2023-02-27