In Computers in Biology and Medicine
Multimodal deep learning models have been applied to disease prediction tasks, but they are difficult to train because of conflicts between the sub-models and the fusion modules. To alleviate this issue, we propose a framework for decoupling feature alignment and fusion (DeAF), which separates multimodal model training into two stages. In the first stage, unsupervised representation learning is conducted, and the modality adaptation (MA) module aligns the features from the different modalities. In the second stage, the self-attention fusion (SAF) module combines the medical image features and clinical data using supervised learning. We apply the DeAF framework to two tasks: predicting the postoperative efficacy of CRS for colorectal cancer, and predicting whether patients with mild cognitive impairment (MCI) convert to Alzheimer's disease. The DeAF framework achieves a significant improvement over previous methods. Furthermore, extensive ablation experiments demonstrate the rationality and effectiveness of the framework. In conclusion, our framework enhances the interaction between local medical image features and clinical data, and derives more discriminative multimodal features for disease prediction. The framework implementation is available at https://github.com/cchencan/DeAF.
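To illustrate the general idea behind the second stage, the following is a minimal sketch of self-attention over a set of image feature tokens plus one clinical-data token, followed by mean pooling. It is not the authors' SAF module: the function names (`self_attention`, `fuse`), the identity query/key/value projections, the mean pooling, and the toy inputs are all assumptions made for illustration. It also assumes the two modalities have already been mapped to a common feature dimension, which is the role the abstract assigns to the first stage.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections are used here;
    a trained module would learn these projections instead.
    """
    d = len(tokens[0])
    attended = []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in tokens])
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(d)]
        )
    return attended


def fuse(image_tokens, clinical_token):
    """Attend jointly over image tokens and one clinical token, then mean-pool.

    Assumption: all tokens share the same dimension (already aligned).
    """
    tokens = image_tokens + [clinical_token]
    attended = self_attention(tokens)
    d = len(clinical_token)
    return [sum(row[j] for row in attended) / len(attended) for j in range(d)]


# Toy inputs (hypothetical): two local image feature tokens and one clinical token.
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
clinical_token = [0.0, 0.0, 1.0, 0.0]
fused = fuse(image_tokens, clinical_token)
```

In a supervised setting, a fused vector such as `fused` would typically be passed to a classifier head; the actual architecture and training procedure are described in the paper and the linked repository.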
Li Kangshun, Chen Can, Cao Wuteng, Wang Hui, Han Shuai, Wang Renjie, Ye Zaisheng, Wu Zhijie, Wang Wenxiang, Cai Leng, Ding Deyu, Yuan Zixu
2023-Feb-28
Computer tomography, Disease prediction, Medical image, Multimodal deep learning, Unsupervised representation learning