In Briefings in Bioinformatics
Neuropeptides are a diverse and complex class of signaling molecules that regulate a variety of biological processes. They offer many opportunities for discovering new drugs and targets for the treatment of a wide range of diseases, so computational tools for rapid, accurate, large-scale identification of neuropeptides are of great significance for peptide research and drug development. Although several machine learning-based prediction tools have been developed, there is room for improvement in both the performance and the interpretability of these methods. In this work, we developed an interpretable and robust neuropeptide prediction model, named NeuroPred-PLM. First, we employed a protein language model (ESM) to obtain semantic representations of neuropeptides, which reduces the complexity of feature engineering. Next, we adopted a multi-scale convolutional neural network to enhance the local feature representation of the neuropeptide embeddings. To make the model interpretable, we proposed a global multi-head attention network whose attention scores capture the position-wise contribution to neuropeptide prediction. In addition, NeuroPred-PLM was developed on our newly constructed NeuroPep 2.0 database. Benchmarks on the independent test set show that NeuroPred-PLM achieves superior predictive performance compared with other state-of-the-art predictors. For the convenience of researchers, we provide an easy-to-install PyPI package (https://pypi.org/project/NeuroPredPLM/) and a web server (https://huggingface.co/spaces/isyslab/NeuroPred-PLM).
Wang Lei, Huang Chen, Wang Mingxia, Xue Zhidong, Wang Yan
2023-Mar-09
deep learning, interpretable model, neuropeptide prediction, protein language model
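
The pipeline outlined in the abstract (per-residue embeddings, multi-scale convolution, then attention whose scores give position-wise contributions) can be illustrated with a minimal sketch in plain Python. This is not the NeuroPred-PLM implementation or the API of its PyPI package. Random vectors stand in for ESM embeddings, all weights are random rather than trained, and the kernel widths, channel counts, single-query-per-head attention, example sequence and every function name below are assumptions chosen for illustration only.

```python
import math
import random


def multi_scale_conv(x, kernels):
    """Apply 1D convolutions of several widths over a sequence of vectors.

    x: list of L embedding vectors, each of length D.
    kernels: dict mapping kernel width k -> weights[channel][offset][dim].
    Returns L feature vectors; channels from all widths are concatenated.
    """
    length, dim = len(x), len(x[0])
    out = [[] for _ in range(length)]
    for k, weights in sorted(kernels.items()):
        half = k // 2
        for pos in range(length):
            for w in weights:  # one output channel per weight tensor
                acc = 0.0
                for j in range(k):
                    src = pos + j - half
                    if 0 <= src < length:  # zero padding outside the peptide
                        acc += sum(w[j][d] * x[src][d] for d in range(dim))
                out[pos].append(max(acc, 0.0))  # ReLU
    return out


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(h, queries):
    """Pool over positions using one query vector per head (hypothetical design).

    Each head sees an equal-sized slice of the feature dimension.
    Returns (pooled_vector, per_head_attention_weights).
    """
    chunk = len(h[0]) // len(queries)
    pooled, all_weights = [], []
    for head, q in enumerate(queries):
        lo = head * chunk
        slices = [vec[lo:lo + chunk] for vec in h]
        scores = [sum(a * b for a, b in zip(q, s)) / math.sqrt(chunk)
                  for s in slices]
        weights = softmax(scores)  # sums to 1 over positions
        all_weights.append(weights)
        for d in range(chunk):
            pooled.append(sum(w * s[d] for w, s in zip(weights, slices)))
    return pooled, all_weights


def predict(x, kernels, queries, out_w, out_b):
    """Return (probability, position-wise contribution averaged over heads)."""
    h = multi_scale_conv(x, kernels)
    pooled, weights = attention_pool(h, queries)
    logit = sum(a * b for a, b in zip(out_w, pooled)) + out_b
    prob = 1.0 / (1.0 + math.exp(-logit))
    contribution = [sum(w[i] for w in weights) / len(weights)
                    for i in range(len(x))]
    return prob, contribution


# Toy setup: every number here is random, not a trained parameter.
rng = random.Random(0)
peptide = "YGGFMTSEKSQTPLVT"  # arbitrary example sequence
D, C, H = 8, 2, 2             # embedding dim, channels per width, heads
widths = (1, 3, 5)            # assumed kernel widths
embeddings = [[rng.gauss(0, 1) for _ in range(D)] for _ in peptide]
kernels = {k: [[[rng.gauss(0, 0.3) for _ in range(D)] for _ in range(k)]
               for _ in range(C)] for k in widths}
feat_dim = C * len(widths)
queries = [[rng.gauss(0, 1) for _ in range(feat_dim // H)] for _ in range(H)]
out_w = [rng.gauss(0, 1) for _ in range(feat_dim)]

prob, contribution = predict(embeddings, kernels, queries, out_w, 0.0)
for residue, score in zip(peptide, contribution):
    print(f"{residue}\t{score:.3f}")
print(f"predicted probability (untrained, meaningless): {prob:.3f}")
```

Because each head's attention weights sum to one over the sequence, the averaged contribution vector is itself a distribution over residues, which is what makes it usable as a per-position importance score in a sketch like this.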