In Proceedings of SPIE--the International Society for Optical Engineering
Multi-modal learning (e.g., integrating pathological images with genomic features) tends to improve the accuracy of cancer diagnosis and prognosis compared to learning from a single modality. However, missing data is a common problem in clinical practice: not every patient has all modalities available. Most previous works directly discarded samples with missing modalities, which may lose the information in those samples and increase the likelihood of overfitting. In this work, we generalize multi-modal learning for cancer diagnosis to handle missing data, using histological images and genomic data. Our integrated model can utilize all available data from patients with both complete and partial modalities. Experiments on the public TCGA-GBM and TCGA-LGG datasets show that data with missing modalities can contribute to multi-modal learning, improving model performance in grade classification of glioma.
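The idea of utilizing patients with partial modalities can be illustrated with a minimal sketch (this is an illustrative assumption, not the authors' actual architecture): if each modality is projected to a shared embedding dimension, a fused representation can be formed by averaging only the modalities that are present, so a patient missing genomic data still yields a usable feature vector instead of being discarded.

```python
import numpy as np

def fuse_modalities(image_feat, genomic_feat):
    """Fuse available modality embeddings by masked averaging.

    A missing modality is passed as None and simply excluded from the
    average, so patients with partial data still produce a fused vector.
    (Hypothetical helper for illustration; assumes both modalities are
    already embedded in the same feature space.)
    """
    available = [f for f in (image_feat, genomic_feat) if f is not None]
    if not available:
        raise ValueError("at least one modality is required")
    return np.mean(available, axis=0)

# Patient with both modalities: the fused vector averages both embeddings.
full = fuse_modalities(np.array([1.0, 2.0]), np.array([3.0, 4.0]))

# Patient missing genomic data: the image embedding is used alone,
# rather than the sample being dropped from training.
partial = fuse_modalities(np.array([1.0, 2.0]), None)
```

The key design point is that the fusion step degrades gracefully: the downstream classifier always receives a vector of the same dimensionality, whether one or both modalities are available.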
Can Cui, Zuhayr Asad, William F. Dean, Isabelle T. Smith, Christopher Madden, Shunxing Bao, Bennett A. Landman, Joseph T. Roland, Lori A. Coburn, Keith T. Wilson, Jeffrey P. Zwerner, Shilin Zhao, Lee E. Wheless, Yuankai Huo
Multi-modal learning, deep learning, missing data