In Journal of chemical information and modeling
Luciferase-based bioluminescence detection techniques are widely used in high-throughput screening (HTS), and firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with luciferase activity, which may result in false positive signals in HTS assays. To avoid the resulting waste of time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive dataset consisting of 20,888 FLuc inhibitors and 198,608 noninhibitors, and then developed a group of classification models by combining three machine learning (ML) algorithms with four types of molecular representations. The best prediction model, based on XGBoost with a combination of ECFP4 fingerprints and MOE2d descriptors, yielded a balanced accuracy (BA) of 0.878 and an AUC of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, namely Set 1 (3,231 FLuc inhibitors and 69,783 noninhibitors), Set 2 (695 FLuc inhibitors and 75,913 noninhibitors) and Set 3 (1,138 FLuc inhibitors and 8,155 noninhibitors), were used to verify the predictive ability of our models. The best model gave BA values of 0.864, 0.845 and 0.791 for the three external validation sets, respectively. In addition, the important features and structural fragments related to FLuc inhibition were identified with the Shapley additive explanations (SHAP) method, along with their influence on predictions, which may provide valuable clues for detecting undesirable luciferase inhibitors. Based on the important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which achieve a correct identification rate of 70% for FLuc inhibitors. Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study.
We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted to be inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public webserver called ChemFLuc was developed (http://admet.scbdd.com/chemfluc/index), which offers a freely available service for predicting potential FLuc inhibitors.
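The choice of balanced accuracy (BA) as the headline metric can be illustrated with a short sketch. It assumes the standard definition of BA as the mean of sensitivity and specificity, and applies it to the overall dataset counts quoted above (20,888 inhibitors and 198,608 noninhibitors). The all-negative baseline classifier and the function names are hypothetical, introduced only for illustration; they are not part of the reported models.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on inhibitors) and specificity
    (recall on noninhibitors); standard definition, assumed here."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class counts quoted in the abstract for the full dataset.
N_INHIBITORS = 20_888
N_NONINHIBITORS = 198_608

# Hypothetical baseline that labels every compound a noninhibitor:
# no true positives, every inhibitor missed, no false positives.
baseline = dict(tp=0, fn=N_INHIBITORS, tn=N_NONINHIBITORS, fp=0)

baseline_acc = accuracy(**baseline)
baseline_ba = balanced_accuracy(**baseline)

print(f"plain accuracy:    {baseline_acc:.3f}")
print(f"balanced accuracy: {baseline_ba:.3f}")
```

With roughly one inhibitor for every ten noninhibitors, this trivial baseline scores about 0.905 in plain accuracy while its BA is only 0.5, which is why BA values such as 0.878 and 0.886 are more informative for a dataset this imbalanced.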
Yang Ziyi, Dong Jie, Yang Zhijiang, Lu Aiping, Hou Tingjun, Cao Dongsheng