In Analytical Biochemistry
Human leukocyte antigen (HLA) plays a vital role in immunomodulation. Studies have shown that immunotherapy based on non-classical HLA has important applications in cancer, COVID-19, and allergic diseases. However, few deep learning methods exist for predicting non-classical HLA alleles. In this work, an adaptive dual-attention network named DapNet-HLA is built on existing datasets. First, amino acid sequences are converted into numeric vectors by table lookup. To overcome the feature sparsity caused by one-hot encoding, a fused word embedding method maps each amino acid to a low-dimensional word vector that is optimized jointly with the training of the classifier. The classifier is then constructed from a GCB (group convolution block), SENet attention (squeeze-and-excitation networks), a BiLSTM (bidirectional long short-term memory network), and the Bahdanau attention mechanism. SENet assigns higher weights to the more informative feature maps, so that the model can be trained to achieve better results. The Bahdanau attention mechanism, originally introduced for encoder-decoder models, improves the effectiveness of recurrent networks such as RNN, LSTM, or GRU (gated recurrent unit). The ablation experiment shows that DapNet-HLA has the best adaptability across the five datasets. On the five test datasets, the ACC and MCC of DapNet-HLA are 4.89% and 0.0933 higher, respectively, than those of the comparison method. In the ROC and PR curves obtained by 5-fold cross-validation, the AUC value fluctuates only slightly from fold to fold, which supports the robustness of DapNet-HLA. The code and datasets are accessible at https://github.com/JYY625/DapNet-HLA.
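The encoding and channel-attention steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The alphabet ordering, the use of index 0 for padding and unknown residues, the embedding size, the random initialisation, and all function names (`encode_sequence`, `make_embedding_table`, `embed`, `se_reweight`) are assumptions made for illustration only.

```python
import math
import random

# Assumed alphabet: the 20 standard amino acids. Index 0 is reserved here
# (an assumption) for padding and for residues outside the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len):
    """Table lookup: map a peptide string to a fixed-length list of indices."""
    indices = [AA_TO_INDEX.get(aa, 0) for aa in seq.upper()[:max_len]]
    return indices + [0] * (max_len - len(indices))


def make_embedding_table(vocab_size, dim, seed=0):
    """Randomly initialised embedding table (hypothetical values).
    In a trained model these rows would be learned with the classifier."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(indices, table):
    """Replace each index with its dense row instead of a sparse one-hot vector."""
    return [table[i] for i in indices]


def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-excitation style gating.
    feature_maps: list of C channels, each a list of values over positions.
    w1: r x C weights, w2: C x r weights (biases omitted for brevity)."""
    # Squeeze: global average of each channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every channel by its gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Usage with a made-up peptide; "X" is not in the alphabet and maps to 0.
indices = encode_sequence("ACDXY", max_len=8)   # [1, 2, 3, 0, 20, 0, 0, 0]
table = make_embedding_table(len(AMINO_ACIDS) + 1, dim=4)
embedded = embed(indices, table)                # 8 positions x 4 dimensions
channels = [list(col) for col in zip(*embedded)]  # 4 channels x 8 positions
rng = random.Random(1)
w1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
w2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
scaled, gates = se_reweight(channels, w1, w2)
```

A dense embedding row of size 4 replaces what would otherwise be a 21-dimensional one-hot vector per residue, which is the sparsity reduction the abstract refers to. The gate values lie strictly between 0 and 1, so channels judged more informative are attenuated less than the others.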
Jing Yuanyuan, Zhang Shengli, Wang Houqiang
Bahdanau attention mechanism, Non-classical HLA binding sites, SENet attention mechanism, Word embedding