In Computers in Biology and Medicine
Preterm labor is the leading cause of neonatal morbidity and mortality and has attracted significant research attention across many scientific fields. The relationship between uterine contractions and the underlying electrical activity makes the uterine electrohysterogram (EHG) a promising tool for detecting and predicting preterm birth. However, because EHG signals are scarce, especially those leading to preterm birth, synthetic sampling algorithms have been used to generate artificial preterm samples in order to eliminate the prediction bias toward normal delivery. This comes at the expense of reduced feature effectiveness in machine-learning-based automatic preterm detection. To address this problem, we quantify the effect of synthetic samples (through a balance coefficient) on the effectiveness of features and form a general performance metric that combines several feature scores, weighted by their contributions to class segregation. In combination with activation/inactivation functions that characterize how the abundance of training samples affects the accuracy of predicting preterm and normal delivery, we obtain an optimal sample balance coefficient that trades off the benefit of synthetic samples in removing bias toward the majority group (i.e., normal delivery) against their side effect of reducing the importance of features. A series of numerical tests on the publicly available TPEHG database yielded a more realistic predictive accuracy, thereby demonstrating the effectiveness of the proposed method.
Xu Jinshan, Chen Zhenqin, Zhang Jinpeng, Lu Yanpei, Yang Xi, Pumir Alain
Preterm prediction, Sample balance coefficient, Synthetic sampling, Uterine electrohysterogram
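
The two ingredients named in the abstract, oversampling the preterm class up to a chosen balance coefficient and scoring features by a weighted combination, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the balance coefficient is taken to be the minority-to-majority size ratio after oversampling, synthetic samples are made by simple interpolation between random minority pairs (a SMOTE-like shortcut without nearest-neighbour search), a Fisher-type ratio stands in for the unspecified feature scores, and the function names `oversample_minority`, `fisher_scores`, and `weighted_score` are hypothetical.

```python
import numpy as np

def oversample_minority(X_min, n_majority, balance, rng):
    """Grow the minority class to about balance * n_majority samples.

    Assumption: 'balance' is the minority/majority size ratio after
    oversampling. Synthetic points are convex combinations of random
    pairs of real minority samples (SMOTE-like, no neighbour search).
    """
    n_target = int(round(balance * n_majority))
    n_new = max(0, n_target - len(X_min))
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    t = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[i] + t * (X_min[j] - X_min[i])
    return np.vstack([X_min, synthetic])

def fisher_scores(X_a, X_b):
    """Per-feature Fisher-type score: squared mean gap over summed variance."""
    gap = (X_a.mean(axis=0) - X_b.mean(axis=0)) ** 2
    spread = X_a.var(axis=0) + X_b.var(axis=0) + 1e-12  # avoid divide-by-zero
    return gap / spread

def weighted_score(scores, weights):
    """Combine per-feature scores into one number using normalized weights."""
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(scores, weights / weights.sum()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_term = rng.normal(0.0, 1.0, size=(100, 3))    # stand-in for normal delivery
    X_preterm = rng.normal(1.0, 1.0, size=(10, 3))  # stand-in for preterm
    for balance in (0.1, 0.5, 1.0):
        X_bal = oversample_minority(X_preterm, len(X_term), balance, rng)
        score = weighted_score(fisher_scores(X_bal, X_term), [1, 1, 1])
        print(balance, len(X_bal), score)
```

Sweeping `balance` as in the demo loop shows how a combined feature score can be tracked against the amount of synthetic data; the data above are random placeholders, not TPEHG recordings, and the abstract's activation/inactivation functions are not modeled.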