In IEEE/ACM transactions on computational biology and bioinformatics
Deep learning has been successfully applied to surprisingly different domains. Researchers and practitioners are employing trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our learning model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, visualization analysis was used. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, from experimental data, SemanticCS can help to identify the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity for the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.
Quan Lijun, Sun Xiaoyu, Wu Jian, Mei Jie, Huang Liqun, He Ruji, Nie Liangpeng, Chen Yu, Lyu Qiang