In Molecular Informatics
Machine learning approaches are widely used to evaluate the ligand activity of chemical compounds toward potential target proteins. In particular, exploring highly selective ligands is important for developing safer drugs. One difficulty in constructing a well-performing model to predict such ligand activity is the absence of data on true negative ligand-protein interactions. In other words, in many cases we have access to plenty of information on ligands that bind to a specific protein, but little or no information showing that compounds do not bind to the proteins of interest. In this paper, we propose an approach to comprehensively explore candidate ligands that specifically target proteins of interest without using information on true negative interactions. The approach consists of four steps: 1) constructing a model that distinguishes ligands for the target proteins of interest from those targeting proteins that cause off-target effects, using a graph convolutional neural network (GCNN); 2) extracting feature vectors after the convolution/pooling processes and mapping their principal components in two dimensions; 3) identifying higher-density regions for the two ligand groups through kernel density estimation; and 4) investigating the distribution of the compounds under exploration on the density map, using the same classifier and decomposer. If compounds under exploration are located in the higher-density regions of the ligand compounds, they can be regarded as having relatively high binding affinity to the major target or off-target proteins compared with other compounds. We applied the approach to the exploration of ligands for β-site amyloid precursor protein [APP]-cleaving enzyme 1 (BACE1), a major target for Alzheimer's disease (AD), with fewer off-target effects on cathepsin D.
We demonstrated that the high-density regions of BACE1 and cathepsin D ligands are well separated, and that a group of natural compounds, examined as a source of new drug candidates, also has a significantly different distribution on the density map.
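Steps 2 to 4 of the approach can be illustrated with a minimal sketch, under stated assumptions. The GCNN feature vectors are replaced here by synthetic random vectors, and the function names (`fit_pca_2d`, `project`, `gaussian_kde_2d`), the isotropic Gaussian kernel, and the bandwidth value are all hypothetical choices for illustration. The abstract does not specify the kernel or bandwidth actually used.

```python
import numpy as np


def fit_pca_2d(features):
    """Fit a 2-component PCA; return the mean and the two principal axes."""
    mean = features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:2]


def project(features, mean, components):
    """Map feature vectors onto the fitted 2D principal-component plane."""
    return (features - mean) @ components.T


def gaussian_kde_2d(train_points, bandwidth):
    """Return a function evaluating an isotropic 2D Gaussian KDE."""
    def density(query):
        diff = query[:, None, :] - train_points[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        kernel = np.exp(-sq_dist / (2 * bandwidth ** 2))
        kernel /= 2 * np.pi * bandwidth ** 2
        return kernel.mean(axis=1)
    return density


# Synthetic stand-ins for pooled GCNN features (assumption: 16 dimensions).
rng = np.random.default_rng(0)
target_feats = rng.normal(loc=2.0, size=(200, 16))      # e.g. target ligands
offtarget_feats = rng.normal(loc=-2.0, size=(200, 16))  # e.g. off-target ligands
candidate_feats = rng.normal(loc=2.0, size=(5, 16))     # compounds to explore

# Step 2: fit the decomposer on known ligands, then reuse it for candidates.
mean, components = fit_pca_2d(np.vstack([target_feats, offtarget_feats]))
target_2d = project(target_feats, mean, components)
offtarget_2d = project(offtarget_feats, mean, components)
candidate_2d = project(candidate_feats, mean, components)

# Step 3: one density map per ligand group (bandwidth is a hypothetical value).
target_density = gaussian_kde_2d(target_2d, bandwidth=1.0)
offtarget_density = gaussian_kde_2d(offtarget_2d, bandwidth=1.0)

# Step 4: locate each candidate on both density maps.
cand_target = target_density(candidate_2d)
cand_offtarget = offtarget_density(candidate_2d)
selective = cand_target > cand_offtarget
```

In this sketch a candidate is flagged as potentially selective when its density under the target-ligand map exceeds its density under the off-target map. In practice the comparison rule and any density threshold would need to be chosen and validated against the real data.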
Miyazaki Yu, Ono Naoaki, Huang Ming, Altaf-Ul-Amin Md, Kanaya Shigehiko
BACE1, GCNN, cathepsin D, ligand selectivity, mapping of principal components