In Physics in medicine and biology
OBJECTIVE : Diabetic retinopathy (DR) grading is primarily performed by assessing fundus images. Many types of lesions, such as microaneurysms, hemorrhages, and soft exudates, can be present simultaneously in a single image. However, these lesions may be small, making it difficult to differentiate adjacent DR grades even with deep convolutional neural networks (CNNs). Recently, the vision transformer (ViT) has shown performance comparable or even superior to that of CNNs, and it learns visual representations that differ from those learned by CNNs. Inspired by this finding, we propose a two-path Contextual Transformer with Xception Network (CoT-XNet) to improve the accuracy of DR grading.
APPROACH : The representations learned by CoT through one path and those learned by the Xception network through the other path are concatenated before the fully connected layer. In addition, dedicated pre-processing, data resampling, and test-time augmentation strategies are implemented. The performance of CoT-XNet is evaluated on the publicly available DDR, APTOS2019, and EyePACS datasets, which include over 50,000 images. Ablation experiments and comprehensive comparisons with various state-of-the-art (SOTA) models are also performed.
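The two-path fusion and test-time augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature extractors are passed in as stand-in callables, the names `fused_head` and `predict_with_tta` are hypothetical, and averaging the class probabilities over augmented views is an assumed aggregation rule that the abstract does not specify.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_head(cot_features, xception_features, weights, bias):
    """Concatenate the two feature vectors, then apply one fully connected layer.

    weights holds one row per class; each row must match the fused length.
    """
    fused = list(cot_features) + list(xception_features)  # concatenation step
    logits = []
    for row, b in zip(weights, bias):
        if len(row) != len(fused):
            raise ValueError("weight row length must equal fused feature length")
        logits.append(sum(w * x for w, x in zip(row, fused)) + b)
    return softmax(logits)


def predict_with_tta(views, extract_cot, extract_xception, weights, bias):
    """Average class probabilities over augmented views of one image.

    Assumption: mean-probability aggregation; the source does not say how
    test-time augmentation results are combined.
    """
    per_view = [
        fused_head(extract_cot(view), extract_xception(view), weights, bias)
        for view in views
    ]
    count = len(per_view)
    mean_probs = [
        sum(probs[c] for probs in per_view) / count for c in range(len(bias))
    ]
    label = max(range(len(mean_probs)), key=mean_probs.__getitem__)
    return label, mean_probs
```

In a real model the two extractors would be the CoT and Xception backbones and the weights would be learned; here they are placeholders so that the data flow (extract, concatenate, classify, average) is visible in isolation.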
MAIN RESULTS : CoT-XNet outperforms the available SOTA models, achieving an accuracy and Kappa of 83.10% and 0.8496 on DDR, 84.18% and 0.9000 on APTOS2019, and 84.10% and 0.7684 on EyePACS. Class activation maps of the CoT and Xception networks are different and complementary in most images.
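For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and Cohen's Kappa can be computed from integer grade labels. The abstract does not state which Kappa variant was used, so the `weighting` argument (unweighted, linear, or quadratic) is an assumption added for illustration, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred, num_classes, weighting=None):
    """Cohen's Kappa for integer labels in range(num_classes).

    weighting: None (unweighted), "linear", or "quadratic".
    Which variant the paper reports is not stated in the abstract.
    """
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(observed[i]) for i in range(num_classes)]
    col_totals = [
        sum(observed[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]

    def penalty(i, j):
        # Disagreement weight between true grade i and predicted grade j.
        if weighting is None:
            return 0.0 if i == j else 1.0
        if weighting == "linear":
            return float(abs(i - j))
        if weighting == "quadratic":
            return float((i - j) ** 2)
        raise ValueError("weighting must be None, 'linear', or 'quadratic'")

    classes = range(num_classes)
    observed_cost = sum(penalty(i, j) * observed[i][j] for i in classes for j in classes)
    expected_cost = sum(
        penalty(i, j) * row_totals[i] * col_totals[j] / n
        for i in classes
        for j in classes
    )
    if expected_cost == 0:
        # Every label and prediction falls in one class; Kappa is undefined.
        return float("nan")
    return 1.0 - observed_cost / expected_cost
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a dataset can show high accuracy alongside a noticeably lower Kappa when the grade distribution is imbalanced.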
SIGNIFICANCE : By concatenating the different visual representations learned by the CoT and Xception networks, CoT-XNet can accurately grade DR from fundus images and shows good generalizability. CoT-XNet will promote the application of artificial intelligence-based systems in DR screening of large-scale populations.
Zhao Shuiqing, Wu Yanan, Tong Mengmeng, Yao Yudong, Qian Wei, Qi Shouliang
2022-Nov-02
convolutional neural network, deep learning, diabetic retinopathy grading, vision transformer