In Molecular Informatics
Epoxidation is one of the reactions involved in drug metabolism. Because epoxide metabolites can bind covalently to proteins or DNA, drug candidates should avoid metabolic epoxidation in the body. Because epoxides are unstable, epoxidation is difficult to determine experimentally. In silico models built with machine learning methods on large data sets are therefore a valuable way to predict whether a compound will undergo epoxidation. In this study, we manually collected 884 epoxidation records from various sources, which yielded 829 unique sites of epoxidation. Three types of molecular fingerprints, each at three lengths (1024, 2048 or 4096 bits), were used to describe the reaction sites, and six machine learning methods were used to build the classification models. The data were randomly split into a training set and a test set at a ratio of 8:2, and 54 models (3 fingerprint types × 3 lengths × 6 methods) were constructed and evaluated. The four best models were then subjected to feature selection, and the selected features were verified on an external validation set. The resulting optimal model achieved an accuracy and AUC (area under the curve) of 0.873 and 0.944 on the test set, and 0.838 and 0.987 on the external validation set. The models built in this study can accurately predict whether a compound will undergo epoxidation and which site is most susceptible to it, which is of great significance for drug design.
Hu Jiajing, Cai Yingchun, Li Weihua, Liu Guixia, Tang Yun
Machine learning, classification model, epoxidation, site of metabolism
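
The model grid, the 8:2 split and the two reported metrics from the abstract can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' code: the abstract does not name the fingerprint types or the learning methods, so the labels below are hypothetical placeholders, the split is a plain seeded shuffle, and the AUC is computed with the textbook pairwise-ranking definition rather than any particular library.

```python
import itertools
import random

# Hypothetical placeholder labels: the abstract gives only the counts
# (3 fingerprint types, 3 bit lengths, 6 learning methods), not the names.
FINGERPRINT_TYPES = ["fp_type_1", "fp_type_2", "fp_type_3"]
BIT_LENGTHS = [1024, 2048, 4096]
METHODS = [f"method_{i}" for i in range(1, 7)]


def model_grid():
    """Enumerate every (fingerprint type, bit length, method) combination."""
    return list(itertools.product(FINGERPRINT_TYPES, BIT_LENGTHS, METHODS))


def split_8_2(items, seed=0):
    """Randomly split items into training and test parts at a ratio of 8:2."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 8) // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    grid = model_grid()
    print(len(grid))  # 3 * 3 * 6 = 54 configurations

    # Toy indices standing in for labelled candidate sites (not real data).
    train, test = split_8_2(range(100))
    print(len(train), len(test))

    # Toy labels and scores, only to exercise the two metrics.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice each grid entry would be trained on the training part and scored on the test part with these two metrics. The pairwise AUC is quadratic in the number of samples, which is acceptable for a data set of this size.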