ArXiv Preprint
Machine learning models deployed in healthcare systems face data drawn from
continually evolving environments. However, researchers proposing such models
typically evaluate them in a time-agnostic manner, with train and test splits
sampling patients throughout the entire study period. We introduce the
Evaluation on Medical Datasets Over Time (EMDOT) framework and Python package,
which evaluates the performance of a model class over time. Across five medical
datasets and a variety of models, we compare two training strategies: (1) using
all historical data, and (2) using a window of the most recent data. We observe
abrupt changes in performance over time and identify possible explanations for
these shocks.
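
The two training strategies can be illustrated with a minimal sketch. This is not the EMDOT package's API: the function name `temporal_splits`, its parameters, and the choice to train on periods strictly before a cutoff and test on all later periods are assumptions made here for illustration only.

```python
from typing import List, Sequence, Tuple


def temporal_splits(
    periods: Sequence[int],
    regime: str = "all_historical",
    window: int = 2,
) -> List[Tuple[List[int], List[int]]]:
    """Build (train_periods, test_periods) pairs for time-aware evaluation.

    Hypothetical helper, not part of EMDOT. For each cutoff, the model is
    assumed to train on periods before the cutoff and to be tested on the
    cutoff period and everything after it.
    """
    if regime not in ("all_historical", "sliding_window"):
        raise ValueError(f"unknown regime: {regime!r}")
    if window < 1:
        raise ValueError("window must be at least 1")

    ordered = sorted(periods)
    splits = []
    for cutoff in range(1, len(ordered)):
        if regime == "all_historical":
            # Strategy (1): use every period seen so far.
            start = 0
        else:
            # Strategy (2): use only the `window` most recent periods.
            start = max(0, cutoff - window)
        splits.append((ordered[start:cutoff], ordered[cutoff:]))
    return splits


years = [2010, 2011, 2012, 2013]
all_hist = temporal_splits(years, regime="all_historical")
recent = temporal_splits(years, regime="sliding_window", window=2)
```

With these inputs, the last split under strategy (1) trains on 2010 through 2012, while under strategy (2) with a window of two it trains on 2011 and 2012 only; both are tested on 2013.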
Helen Zhou, Yuwen Chen, Zachary C. Lipton
2022-11-14