ArXiv Preprint
In medical settings, Individual Variation (IV) refers to variation that is
due not to population differences or errors but to within-subject
variation, that is, the intrinsic and characteristic patterns of variation
pertaining to a given instance or to the measurement process. While accounting
for IV has been deemed critical for the proper analysis of medical data, this
source of uncertainty and its impact on robustness have so far been neglected
in Machine Learning (ML). To fill this gap, we look at how IV affects ML
performance and generalization and how its impact can be mitigated.
Specifically, we provide a methodological contribution to formalize the problem
of IV in the statistical learning framework and, through an experiment based on
one of the largest real-world laboratory medicine datasets for the problem of
COVID-19 diagnosis, we show that: 1) common state-of-the-art ML models are
severely impacted by the presence of IV in data; and 2) advanced learning
strategies, based on data augmentation and data imprecisiation, and proper
study designs can be effective at improving robustness to IV. Our findings
demonstrate the critical relevance of correctly accounting for IV to enable
safe deployment of ML in clinical settings.
Andrea Campagner, Lorenzo Famiglini, Anna Carobene, Federico Cabitza
2022-10-10