ArXiv Preprint
Background: Medical decision-making impacts both individual and public
health. Among the wide variety of decision-making models, clinical scores are
commonly used to determine the degree of disease deterioration at the
bedside. AutoScore was proposed as a useful clinical score generator based
on machine learning and a generalized linear model. Its current framework,
however, still leaves room for improvement when addressing unbalanced data of
rare events. Methods: Using machine intelligence approaches, we developed
AutoScore-Imbalance, which comprises three components: training dataset
optimization, sample weight optimization, and adjusted AutoScore. All scoring
models were evaluated on the basis of their area under the curve (AUC) in the
receiver operating characteristic analysis and balanced accuracy (i.e., mean
value of sensitivity and specificity). By utilizing a publicly accessible
dataset from Beth Israel Deaconess Medical Center, we assessed the proposed
model and baseline approaches in the prediction of inpatient mortality.
Results: AutoScore-Imbalance outperformed baselines in terms of AUC and
balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the
highest AUC of 0.786 (0.732-0.839) while the eleven-variable original AutoScore
obtained an AUC of 0.723 (0.663-0.783), and the logistic regression with 21
variables obtained an AUC of 0.743 (0.685-0.800). The AutoScore-Imbalance
sub-model (using a down-sampling algorithm) yielded an AUC of 0.771
(0.718-0.823) with only five variables, demonstrating a good balance between
performance and variable sparsity. Conclusions: The AutoScore-Imbalance tool
has the potential to be applied to highly unbalanced datasets to gain further
insight into rare medical events and to facilitate real-world clinical
decision-making.
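
As an illustration of two ideas mentioned in the abstract, the sketch below computes balanced accuracy as the mean of sensitivity and specificity, and performs a simple random down-sampling of the majority class. It is a minimal sketch rather than the AutoScore-Imbalance implementation: the function names `balanced_accuracy` and `down_sample`, the 0/1 label encoding, and the choice to down-sample to a 1:1 class ratio are all assumptions made for this example.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary 0/1 labels.

    Assumes both classes are present in y_true; otherwise a
    ZeroDivisionError is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2


def down_sample(labels, seed=0):
    """Return sorted indices of a class-balanced subset (hypothetical helper).

    Keeps every minority-class sample and an equally sized random
    subset of the majority class (a 1:1 ratio is an assumption here).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Toy example: 2 events among 10 patients, and a model that never predicts an event.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))

balanced_idx = down_sample(y_true)
print([y_true[i] for i in balanced_idx])
```

In the toy example, a model that never predicts the rare event still reaches a high plain accuracy, while its balanced accuracy stays at chance level, which is why balanced accuracy is the more informative metric for rare events.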
Han Yuan, Feng Xie, Marcus Eng Hock Ong, Yilin Ning, Marcel Lucas Chee, Seyed Ehsan Saffari, Hairil Rizal Abdullah, Benjamin Alan Goldstein, Bibhas Chakraborty, Nan Liu
2021-07-13