In Frontiers in genetics ; h5-index 62.0
Anticancer peptides (ACPs) have provided a promising perspective for cancer treatment, and the prediction of ACPs is very important for the discovery of new cancer treatment drugs. It is time consuming and expensive to use experimental methods to identify ACPs, so computational methods for ACP identification are urgently needed. There have been many effective computational methods, especially machine learning-based methods, proposed for such predictions. Most of the current machine learning methods try to find suitable features or design effective feature learning techniques to accurately represent ACPs. However, the performance of these methods can be further improved for cases with insufficient numbers of samples. In this article, we propose an ACP prediction model called ACP-DA (Data Augmentation), which uses data augmentation for insufficient samples to improve the prediction performance. In our method, to better exploit the information of peptide sequences, peptide sequences are represented by integrating binary profile features and AAindex features, and then the samples in the training set are augmented in the feature space. After data augmentation, the samples are used to train the machine learning model, which is used to predict ACPs. The performance of ACP-DA exceeds that of existing methods, and ACP-DA achieves better performance in the prediction of ACPs compared with a method without data augmentation. The proposed method is available at http://github.com/chenxgscuec/ACPDA.
Chen Xian-Gan, Zhang Wen, Yang Xiaofei, Li Chenhong, Chen Hengling
anticancer peptide prediction, data augmentation, feature representation, machine learning, multilayer perception