In Computers in Biology and Medicine
Medical image segmentation results are an essential reference for disease diagnosis. Recently, with the development and application of convolutional neural networks, medical image processing has advanced significantly. However, automatic segmentation remains challenging because target structures vary widely in position, size, and shape, which leads to poor segmentation performance. In addition, most current methods use an encoder-decoder architecture for feature extraction, focusing on the acquisition of semantic information while ignoring target-specific and global context information. In this work, we propose a hybrid-scale contextual fusion network to capture richer spatial and semantic information. First, a hybrid-scale embedding layer (HEL) is employed before the transformer. By mixing each embedding with patches of multiple sizes, object information at different scales can be captured effectively. Further, we apply a standard transformer to model long-range dependencies in the first two skip connections. Meanwhile, a pooling transformer (PTrans) is employed to handle the long input sequences in the following two skip connections. By leveraging global average pooling together with the corresponding transformer block, the spatial structure information of the target is learned effectively. Finally, a dual-branch channel attention module (DCA) is proposed to focus on crucial channel features and to perform multi-level feature fusion simultaneously. With this fusion scheme, richer contextual and fine-grained features are captured and encoded efficiently. Extensive experiments on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods.
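The abstract does not give the HEL's exact architecture, so the following is a minimal PyTorch sketch of one plausible reading: each embedding is mixed with patch context at multiple scales by running parallel convolutions with different kernel sizes at a shared patch stride and summing the aligned outputs. The class name `HybridScaleEmbedding` and the parameters `dim`, `patch`, and `scales` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HybridScaleEmbedding(nn.Module):
    """Hypothetical sketch of a hybrid-scale embedding layer (HEL).

    Parallel convolutions with different kernel sizes share the same
    patch stride, so their outputs align spatially and can be summed,
    mixing each embedding with patch information at several scales.
    """

    def __init__(self, in_ch=3, dim=96, patch=4, scales=(3, 5, 7)):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(in_ch, dim, kernel_size=k, stride=patch, padding=k // 2)
            for k in scales
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (B, C, H, W)
        feat = sum(p(x) for p in self.projs)      # (B, dim, H/patch, W/patch)
        tokens = feat.flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.norm(tokens)

# Usage: a 224x224 image becomes 56*56 = 3136 tokens of width 96.
tokens = HybridScaleEmbedding()(torch.randn(1, 3, 224, 224))  # (1, 3136, 96)
```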
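Similarly hedged, here is one way a pooling transformer (PTrans) could handle long input sequences: keep full-length queries but compute keys and values from a pooled token grid, cutting attention cost from O(N^2) to O(N*M) with M much smaller than N. The abstract mentions global average pooling; this sketch uses a coarse adaptive average pooling so that more than one key/value token survives, which is a common variant, and `PooledAttention` with its parameters is hypothetical.

```python
import torch
import torch.nn as nn

class PooledAttention(nn.Module):
    """Hypothetical sketch of the attention inside a pooling transformer.

    Queries keep the full token sequence, while keys and values come
    from an average-pooled grid, shortening the sequence that attention
    must compare against.
    """

    def __init__(self, dim=96, heads=4, pooled=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = nn.AdaptiveAvgPool2d(pooled)  # K/V grid: pooled x pooled

    def forward(self, x, hw):                     # x: (B, N, C), hw: (H, W)
        B, N, C = x.shape
        H, W = hw
        grid = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.pool(grid).flatten(2).transpose(1, 2)  # (B, pooled**2, C)
        out, _ = self.attn(x, kv, kv)             # full-length queries
        return out
```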
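Finally, a sketch of what a dual-branch channel attention module (DCA) fusing two feature levels could look like, loosely following CBAM-style channel attention with average- and max-pooled descriptors as the two branches; the paper's actual branch design may differ, and `DualBranchChannelAttention` along with the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class DualBranchChannelAttention(nn.Module):
    """Hypothetical sketch of a dual-branch channel attention (DCA).

    Concatenates two feature levels, derives channel weights from two
    pooled descriptors (average and max) through a shared bottleneck
    MLP, and reduces the reweighted result back to `ch` channels.
    """

    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared across both branches
            nn.Conv2d(2 * ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 2 * ch, 1),
        )
        self.reduce = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, low, high):                 # both: (B, ch, H, W)
        x = torch.cat([low, high], dim=1)         # (B, 2*ch, H, W)
        avg = self.mlp(x.mean((2, 3), keepdim=True))  # average-pool branch
        mx = self.mlp(x.amax((2, 3), keepdim=True))   # max-pool branch
        w = torch.sigmoid(avg + mx)               # per-channel weights
        return self.reduce(x * w)                 # fused: (B, ch, H, W)
```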
Bao Hua, Zhu Yuqing, Li Qing
2022-Dec-22
Convolutional neural networks, Medical image segmentation, Transformer