In IEEE/ACM Transactions on Computational Biology and Bioinformatics
Protein fold recognition is critical for protein structure prediction and drug design. Several methods have been proposed to obtain discriminative features from protein sequences for fold recognition. However, building ensemble methods that combine these various features to improve predictive performance remains a challenging problem. In this study, we proposed two novel algorithms: AWMG and EMfold. AWMG is a novel predictor for fold recognition based on the multi-view learning framework. Each view was treated as the intermediate representation of the corresponding data source of proteins. AWMG calculated an automatic weight for each view and constructed a latent subspace that contains the common information shared by the different views. A marginalized constraint was employed to enlarge the margins between different folds, improving the predictive performance of AWMG. Furthermore, we proposed a novel ensemble method called EMfold, which combines two complementary methods, AWMG and DeepSS. The latter method is a template-based algorithm using the SPARKS-X and DeepFR programs. EMfold integrated the advantages of template-based assignment and a machine learning classifier. Experimental results on two widely used datasets (LE and YK) showed that the proposed methods outperformed some state-of-the-art methods, indicating that AWMG and EMfold are useful tools for protein fold recognition.
Yan Ke, Wen Jie, Xu Yong, Liu Bin
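
The abstract does not state how AWMG computes its per-view weights, so the following is only a minimal sketch of the general idea of auto-weighting views, not the authors' method. It assumes a rule common in auto-weighted multi-view learning, in which each view's weight is inversely related to the square root of that view's loss, and it fuses per-view fold scores by a weighted sum. The function names (`auto_weights`, `fuse_scores`, `predict_fold`) and the toy inputs are hypothetical.

```python
import math


def auto_weights(view_losses, eps=1e-12):
    """Return one weight per view, normalized to sum to 1.

    Assumption (not from the abstract): weight is proportional to
    1 / (2 * sqrt(loss)), so views with lower loss get larger weights.
    """
    raw = [1.0 / (2.0 * math.sqrt(loss + eps)) for loss in view_losses]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_scores(view_scores, weights):
    """Combine per-view fold score vectors by a weighted sum."""
    n_folds = len(view_scores[0])
    return [
        sum(w * scores[k] for w, scores in zip(weights, view_scores))
        for k in range(n_folds)
    ]


def predict_fold(view_scores, view_losses):
    """Return the index of the fold with the highest fused score."""
    weights = auto_weights(view_losses)
    fused = fuse_scores(view_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example: two views scoring two candidate folds.
# The first view has the lower (hypothetical) loss, so it dominates.
example_scores = [[0.9, 0.1], [0.2, 0.8]]
example_losses = [0.25, 1.0]
predicted = predict_fold(example_scores, example_losses)
```

In this toy setup the first view receives roughly twice the weight of the second, so its preferred fold wins. A full multi-view method would typically update such weights jointly with the latent subspace rather than fixing them in advance.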