In Analytical biochemistry
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a post-translational modification (PTM) of proteins; it is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast, but few studies have addressed Arabidopsis thaliana. To fill this gap, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. First, six sequence-based features are extracted using the amino acid index database (AAindex), dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), the blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Second, the synthetic minority over-sampling technique (SMOTE) is used to balance the imbalanced data set. Third, a new classifier named DG is presented, which comprises a Dense block, a Residual block and a gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is combined with the DG model, and an ensemble learning strategy is used to obtain the final prediction. Experimental results show that PrUb-EL has good predictive ability, with an accuracy (ACC) of 91.00% and an area under the ROC curve (auROC) of 97.70% under 5-fold cross-validation. On the independent test, ACC and auROC are 88.58% and 96.09%, respectively. Compared with previous studies, the model shows significantly improved performance; thus, it is an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
Wang Houqiang, Li Hong, Gao Weifeng, Xie Jin
Deep learning, Dense block, Ensemble learning, GRU block, Residual block, SMOTE, Ubiquitination sites
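As a rough illustration of two of the feature encodings named in the abstract (binary encoding and DPC) and of combining several models' outputs, here is a minimal Python sketch. It is not the authors' implementation: the residue ordering, the function names, the treatment of unknown symbols as all-zero vectors, and the use of simple probability averaging as the ensemble rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

# Assumed ordering of the 20 standard residues (not taken from the article).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def binary_encoding(window):
    """One-hot encode each residue; unknown symbols (e.g. padding) map to all zeros."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def dipeptide_composition(window):
    """Frequency of each dipeptide among the valid adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(window) - 1):
        pair = window[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def average_ensemble(probabilities, threshold=0.5):
    """Average per-model probabilities and threshold the mean (assumed rule)."""
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold
```

For a hypothetical five-residue window such as `"MKAKA"`, `binary_encoding` yields a 100-dimensional vector and `dipeptide_composition` a 400-dimensional one; `average_ensemble` would then take one predicted probability per feature-specific model.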