In IEEE/ACM Transactions on Computational Biology and Bioinformatics
Learning representations from data is a fundamental step in machine learning. High-quality and robust drug representations can broaden the understanding of pharmacology and improve the modeling of multiple drug-related prediction tasks, which further facilitates drug development. Although a number of models have been developed for drug representation learning from various data sources, few studies extract drug representations from gene expression profiles. Since gene expression profiles of drug-treated cells are widely used in clinical diagnosis and therapy, it is believed that leveraging them while eliminating cell specificity can promote drug representation learning. In this paper, we propose a three-stage deep learning method for drug representation learning, named DRLM, which integrates gene expression profiles of drug-related cells and the therapeutic use information of drugs. First, we construct a stacked autoencoder to learn low-dimensional, compact drug representations. Second, we utilize an iterative clustering module to reduce the negative effects of cell specificity and noise in gene expression profiles on the low-dimensional drug representations. Third, a therapeutic use discriminator is designed to incorporate therapeutic use information into the drug representations. The visualization analysis of drug representations demonstrates that DRLM can reduce cell specificity and integrate therapeutic use information effectively. Extensive experiments on three types of prediction tasks are conducted based on different drug representations, and they show that the drug representations learned by DRLM outperform other representations on most metrics. The ablation analysis also demonstrates the effectiveness of DRLM in merging the gene expression profiles with the therapeutic use information.
Furthermore, we input the learned representations into machine learning models for case studies, which indicate the potential of these representations to discover new drug-related relationships in various tasks.
Fu Haitao, Zhao Cecheng, Niu Xiaohui, Zhang Wen