In BMC bioinformatics
BACKGROUND : Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immunity system and have huge potential in gene editing and gene therapy as a genome-editing tool. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become a hot and significant research topic in bioinformatics. However, as most anti-CRISPR proteins fall short in sharing similarities to those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with powerful predictive capability and provide a new perspective for anti-CRISPR protein identification.
RESULTS : Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins from protein sequences directly. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins.
CONCLUSIONS : In summary, the PreAcrs predictor achieved a competitive performance for predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at: https://github.com/Lyn-666/anti_CRISPR.git .
Zhu Lin, Wang Xiaoyu, Li Fuyi, Song Jiangning
Anti-CRISPR protein, Feature engineering, Machine learning, Sequence analysis