In Bioinformatics (Oxford, England)
MOTIVATION : Carbohydrate-active enzymes (CAZymes) are extremely important to bioenergy, human gut microbiome, and plant pathogen research and industry. Here we developed a new amino acid k-mer based tool for CAZyme classification, motif identification, and genome annotation using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies, each with distinguishing k-mer peptides. These k-mers represent the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and were therefore further used to annotate new genomes for CAZymes. This idea was also generalized to extract characteristic k-mer peptides for all the Swiss-Prot enzymes classified by EC (Enzyme Commission) numbers and applied to enzyme EC prediction.
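The general idea of linking proteins to k-mer peptides and using group-specific k-mers for annotation can be illustrated with a minimal Python sketch. This is not the eCAMI implementation: the function names, the toy sequences, the `min_support` and `min_hits` thresholds, and the assumption that subfamily labels are already known are all hypothetical and chosen only for illustration. The tool itself derives subfamilies from the bipartite network, which this sketch does not attempt.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bipartite(proteins, k):
    """Build the k-mer side of a protein/k-mer bipartite graph.

    Maps each k-mer to the set of protein IDs that contain it.
    """
    kmer_to_proteins = defaultdict(set)
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            kmer_to_proteins[km].add(pid)
    return kmer_to_proteins


def characteristic_kmers(kmer_to_proteins, groups, min_support=2):
    """Keep k-mers shared by >= min_support proteins of exactly one group.

    `groups` maps protein ID -> group label (assumed to be given here).
    """
    result = defaultdict(set)
    for km, pids in kmer_to_proteins.items():
        labels = {groups[p] for p in pids}
        if len(pids) >= min_support and len(labels) == 1:
            result[labels.pop()].add(km)
    return result


def annotate(seq, group_kmers, k, min_hits=1):
    """Assign a sequence to the group whose characteristic k-mers it hits most.

    Returns (group, matched_kmers), or (None, set()) if no group reaches
    min_hits.
    """
    query = kmers(seq, k)
    hits = {g: query & kms for g, kms in group_kmers.items()}
    best = max(hits, key=lambda g: len(hits[g]), default=None)
    if best is None or len(hits[best]) < min_hits:
        return None, set()
    return best, hits[best]


# Toy data (hypothetical sequences and labels, for illustration only).
proteins = {
    "p1": "MKTAYIAKQR",
    "p2": "GGTAYIAKLL",
    "p3": "MSDEWHGNPQ",
    "p4": "TTDEWHGNAA",
}
groups = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}

k = 5
graph = build_bipartite(proteins, k)
group_kmers = characteristic_kmers(graph, groups)
label, matched = annotate("QQTAYIAKWW", group_kmers, k)
```

Because the matched k-mers are returned alongside the predicted label, a sketch of this kind also shows why k-mer based approaches can report which short peptides supported a prediction.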
RESULTS : This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against the state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer based tools (including PPR-Hotpep, CUPP, eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes.
AVAILABILITY : https://github.com/yinlabniu/eCAMI and https://github.com/zhanglabNKU/eCAMI.
SUPPLEMENTARY INFORMATION : Supplementary data are available at Bioinformatics online.
Xu Jing, Zhang Han, Zheng Jinfang, Dovoedo Philippe, Yin Yanbin