arXiv Preprint
Predictive models, such as those built with machine learning, can underpin
causal inference to estimate the effects of an intervention at the population
or individual level. This opens the door to a plethora of models, useful to
match the increasing complexity of health data, but also Pandora's box of
model selection: which of these models yields the most valid causal estimates?
Classic machine-learning cross-validation procedures are not directly
applicable. Indeed, an appropriate selection procedure for causal inference
should weight equally the outcome errors of each individual, whether treated
or untreated, whereas one of the two outcomes may be seldom observed in a
given sub-population. We study how more elaborate risks benefit causal model
selection. We show theoretically that simple risks are brittle to weak overlap
between treated and untreated individuals, as well as to heterogeneous errors
between populations. Rather, a more elaborate metric, the R-risk, appears as a
proxy for the oracle error on causal estimates, observable at the cost of an
overlap re-weighting. As the R-risk is defined not only from model predictions
but also from the conditional mean outcome and the treatment probability
(sketched below), using it for model selection requires adapting
cross-validation. Extensive experiments show that the resulting procedure
gives the best causal model selection.
Matthieu Doutreligne, Gaël Varoquaux
2023-02-01
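
As a reading aid, a hedged sketch of the R-risk as it is usually written in
the R-learner literature (the notation here is an assumption, not taken from
the abstract): for a candidate model f with treatment-effect prediction
tau_f, a conditional mean outcome m(x) = E[Y | X = x], and a treatment
probability e(x) = P(A = 1 | X = x),

$$\mathcal{R}_{\text{R-risk}}(f) \;=\; \mathbb{E}\Big[\big((Y - m(X)) - (A - e(X))\,\tau_f(X)\big)^2\Big].$$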
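
A minimal Python sketch of the adapted cross-validation, not the paper's
exact procedure: it assumes numpy arrays, scikit-learn nuisance models, and
a hypothetical dict `candidate_taus` of causal models fit on a disjoint
sample, each exposing a `predict(X)` method returning effect estimates. The
nuisances m and e are fit on training folds only, so the R-risk is always
evaluated on held-out data.

    # Hedged sketch: select among candidate causal models by cross-validated R-risk.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    def r_risk_select(X, a, y, candidate_taus, n_splits=5, seed=0):
        """Return the name of the candidate with the lowest cross-validated R-risk."""
        risks = {name: [] for name in candidate_taus}
        for train, val in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            # Nuisances, fit on the training fold only (cross-fitting):
            # m(x) ~ E[Y | X=x], the conditional mean outcome;
            # e(x) ~ P(A=1 | X=x), the treatment probability.
            m = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
            e = RandomForestClassifier(random_state=seed).fit(X[train], a[train])
            y_res = y[val] - m.predict(X[val])               # outcome residual
            a_res = a[val] - e.predict_proba(X[val])[:, 1]   # treatment residual
            for name, tau in candidate_taus.items():
                # R-risk on the held-out fold: residual-on-residual squared error.
                err = y_res - a_res * tau.predict(X[val])
                risks[name].append(np.mean(err ** 2))
        # Average over folds and pick the minimizer.
        return min(risks, key=lambda name: np.mean(risks[name]))

Refitting the candidate models on each training fold, rather than on a
disjoint sample, is an equally natural variant of this sketch.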