In IEEE/ACM Transactions on Computational Biology and Bioinformatics
Over the past decades, computational methods have been developed to predict various kinds of interactions in biological problems. These methods usually treat the prediction task as a semi-supervised or positive-unlabeled (PU) learning problem. Researchers have focused on predicting unlabeled samples in the hope of finding novel interactions in the datasets they collected. However, most computational methods can predict only a small proportion of the undiscovered interactions, and the total number of such interactions remains unknown. In this paper, we developed a deep-learning-based estimation method to calculate the number of undiscovered interactions among the unlabeled samples, derived its asymptotic interval estimate, and successfully applied it to a compound synergism dataset, a drug-target interaction (DTI) dataset, and a microRNA-disease interaction dataset. Moreover, the method can reveal which dataset contains more undiscovered interactions and can thus guide experimental validation. Furthermore, we compared our method with several mixture proportion estimators and demonstrated its efficacy. Finally, we proved that AUC and AUPR are related to the number of undiscovered interactions, which can therefore be regarded as another evaluation indicator for computational methods.
Zhou Lewei, Tang Yucong, Yan Guiying