arXiv Preprint
Estimating causal effects from large experimental and observational data has
become increasingly prevalent in both industry and research. The bootstrap is
an intuitive and powerful technique used to construct standard errors and
confidence intervals of estimators. Its application, however, can be
prohibitively demanding in settings involving large data. In addition, modern
causal inference estimators based on machine learning and optimization
techniques exacerbate the computational burden of the bootstrap. The bag of
little bootstraps has been proposed in non-causal settings for large data but
has not yet been applied to evaluate the properties of estimators of causal
effects. In this paper, we introduce a new bootstrap algorithm, the causal
bag of little bootstraps for causal inference with large data. The new
algorithm significantly improves the computational efficiency of the
traditional bootstrap while providing consistent estimates and desirable
confidence interval coverage. We describe its properties, provide practical
considerations, and evaluate the performance of the proposed algorithm in terms
of bias, empirical coverage of 95% confidence intervals, and computational time
in a simulation study. We apply it in the evaluation of the effect of hormone
therapy on the average time to coronary heart disease using a large
observational data set from the Women's Health Initiative.
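
For orientation, the generic bag of little bootstraps idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the causal algorithm proposed in the paper: it assumes a randomized binary treatment and a simple weighted difference in means as the estimator. The function names, the subset-size exponent `gamma`, and all default values are hypothetical choices made for this example. Each small subset of size b = n^gamma is reweighted with multinomial counts that sum to n, so every resample mimics a full-size bootstrap sample while touching only b distinct observations.

```python
import numpy as np


def weighted_diff_in_means(y, t, w):
    """Weighted difference in mean outcome, treated minus control."""
    treated = w * t
    control = w * (1 - t)
    return (treated @ y) / treated.sum() - (control @ y) / control.sum()


def blb_standard_error(y, t, n_subsets=10, gamma=0.7, n_resamples=100, seed=0):
    """Generic bag-of-little-bootstraps standard error (illustrative sketch).

    Assumes every subset contains both treated and control units.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    b = int(n ** gamma)  # subset size, much smaller than n
    subset_ses = []
    for _ in range(n_subsets):
        # Draw a small subset of the data without replacement.
        idx = rng.choice(n, size=b, replace=False)
        y_s, t_s = y[idx], t[idx]
        estimates = np.empty(n_resamples)
        for r in range(n_resamples):
            # Multinomial counts summing to n emulate a size-n resample
            # while storing only b distinct observations.
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[r] = weighted_diff_in_means(y_s, t_s, w)
        subset_ses.append(estimates.std(ddof=1))
    # Average the per-subset standard errors.
    return float(np.mean(subset_ses))
```

A normal-approximation interval could then be formed as the full-data point estimate plus or minus 1.96 times the returned standard error. Observational data would additionally require a confounding adjustment (for example, estimated weights), which this sketch deliberately leaves out.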
Matthew Kosko, Lin Wang, Michele Santacatterina
2023-02-06