Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms including binding to RNA binding proteins. The majority of plant lncRNAs are functionally uncharacterized, thus, accurate prediction of plant lncRNA-protein interaction is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its uniqueness is that it predicts by multi-feature fusion. Structural and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana obtained 0.9820 and 0.9652 area under precision/recall curve (AUPRC) respectively. The proposed method shows significant enhancement in the prediction performance compared with existing state-of-the-art methods.
Wekesa Jael Sanyanda, Meng Jun, Luan Yushi
Deep learning, Plants, Prediction, Secondary structure features, lncRNA–protein interaction