In The Plant Journal: for Cell and Molecular Biology
Currently, the experimentally identified interactome of Arabidopsis (Arabidopsis thaliana) is still far from complete, suggesting that computational prediction methods can complement experimental techniques. Motivated by the success of deep learning algorithms and natural language processing techniques, we introduce DeepAraPPI, an integrative deep learning framework that predicts protein-protein interactions (PPIs) in Arabidopsis from sequence, domain and Gene Ontology (GO) information. The current version of DeepAraPPI comprises (i) a Siamese recurrent convolutional neural network (RCNN) model based on word2vec encoding, (ii) a multilayer perceptron (MLP) model based on Domain2vec encoding, and (iii) an MLP model based on GO2vec encoding. DeepAraPPI then combines the prediction results of these three individual predictors through a logistic regression model. Trained and tested on high-quality positive and negative samples compiled with strict filtering strategies, DeepAraPPI outperforms existing state-of-the-art Arabidopsis PPI prediction methods. DeepAraPPI also shows better cross-species predictive ability in rice (Oryza sativa) than traditional machine learning methods, although its overall cross-species performance remains to be improved. DeepAraPPI is freely accessible at http://zzdlab.com/deeparappi/, and its source code and datasets are available at https://github.com/zjy1125/DeepAraPPI.
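The abstract states only that a logistic regression model combines the outputs of the three individual predictors. The sketch below illustrates that kind of score-level combination under stated assumptions. It assumes that each base predictor (the RCNN, the Domain2vec MLP and the GO2vec MLP) emits one score in [0, 1] per protein pair, and it fits the combiner by plain batch gradient descent on the log-loss. The function names `fit_combiner` and `combine`, the toy scores, the learning rate and the epoch count are all hypothetical. This is not DeepAraPPI's released code, which is available in the GitHub repository above.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_combiner(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,      # hypothetical learning rate
    epochs: int = 2000,   # hypothetical number of full-batch updates
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights over per-predictor scores (batch gradient descent on log-loss)."""
    n, k = len(scores), len(scores[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(scores, labels):
            # d(log-loss)/d(logit) = predicted probability - true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def combine(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Combined interaction probability for one protein pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy scores per protein pair: (RCNN, Domain2vec-MLP, GO2vec-MLP)
TRAIN_SCORES = [
    (0.92, 0.80, 0.85), (0.85, 0.60, 0.90), (0.70, 0.75, 0.65), (0.88, 0.55, 0.80),
    (0.20, 0.30, 0.15), (0.35, 0.10, 0.25), (0.15, 0.35, 0.20), (0.40, 0.20, 0.30),
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting

w, b = fit_combiner(TRAIN_SCORES, TRAIN_LABELS)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("P(interaction):", round(combine(w, b, (0.90, 0.70, 0.85)), 3))
```

With only three weights and one bias, a linear combiner like this is cheap to fit and learns how much to trust each predictor. A common stacking practice is to fit the combiner on predictions for pairs that the base models were not trained on, so that it does not simply reward whichever predictor overfits most.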
Zheng Jingyan, Yang Xiaodi, Huang Yan, Yang Shiping, Wuchty Stefan, Zhang Ziding
2023-Mar-14
Arabidopsis thaliana, GO annotation, deep learning, domain, prediction, protein-protein interaction