In Plant molecular biology
We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found in the following link: https://github.com/nmt315320/sgRNACNN.git .
Niu Mengting, Lin Yuan, Zou Quan
Convolutional neural networks, One-hot encoding, k-mer, sgRNA high on-target activity