arXiv Preprint
Motivated by the emerging need for personalized preventive interventions in many healthcare applications, we consider a multi-stage, dynamic decision-making problem in an online setting with unknown model parameters. To
deal with the pervasive issue of small sample size in personalized planning, we
develop a novel data-pooling reinforcement learning (RL) algorithm based on a
general perturbed value iteration framework. Our algorithm adaptively pools
historical data, with three main innovations: (i) the pooling weight is tied directly to decision performance (measured by regret), rather than to estimation accuracy as in conventional methods; (ii) no parametric assumptions are needed on the relationship between historical and current data; and (iii) data sharing is required only at the level of aggregate statistics, never patient-level data. Our data-pooling framework applies to a variety of popular RL algorithms,
and we establish a theoretical performance guarantee showing that our pooling
version achieves a regret bound strictly smaller than that of the no-pooling
counterpart. We substantiate the theoretical development with a case study of post-discharge interventions for preventing unplanned readmissions, in which our algorithm demonstrates superior empirical performance and generates practical insights for healthcare management. In particular, our algorithm
alleviates privacy concerns about sharing health data, which (i) opens the door
for individual organizations to leverage public datasets or published studies
to better manage their own patients; and (ii) provides the basis for public
policy makers to encourage organizations to share aggregate data to improve
population health outcomes for the broader community.
Xinyun Chen, Pengyi Shi, Shanwen Pu
2022-11-16
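
As a loose illustration of the data-pooling idea (a hypothetical sketch, not the paper's perturbed value iteration algorithm), consider value iteration over a finite MDP whose transition model is a convex combination of the current cohort's empirical estimate and an estimate built only from historical aggregate counts. The simple count-based weight schedule below is an illustrative stand-in for the paper's regret-driven weighting, and all function and variable names are assumptions for this sketch.

import numpy as np

def pooled_value_iteration(curr_counts, hist_counts, rewards,
                           gamma=0.95, n_iters=200):
    """Illustrative data-pooling value iteration (not the paper's algorithm).

    curr_counts, hist_counts: (S, A, S) transition count arrays, where
        hist_counts plays the role of shared aggregate statistics.
    rewards: (S, A) expected rewards. Returns a value function of shape (S,).
    """
    S, A, _ = curr_counts.shape
    n_curr = curr_counts.sum(axis=2, keepdims=True)   # samples per (s, a)
    # Pooling weight: lean on historical aggregates when current data are
    # scarce (alpha -> 0), and on current data as samples accumulate
    # (alpha -> 1). This schedule is an assumption of the sketch.
    alpha = n_curr / (n_curr + 1.0)

    def normalize(counts):
        # Empirical transition probabilities; uniform where no data exist.
        tot = counts.sum(axis=2, keepdims=True)
        return np.divide(counts, tot,
                         out=np.full_like(counts, 1.0 / S),
                         where=tot > 0)

    P = (alpha * normalize(curr_counts.astype(float))
         + (1 - alpha) * normalize(hist_counts.astype(float)))

    V = np.zeros(S)
    for _ in range(n_iters):
        Q = rewards + gamma * (P @ V)   # (S, A) one-step lookahead
        V = Q.max(axis=1)
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 2
    curr = rng.integers(0, 3, size=(S, A, S))     # sparse current-cohort counts
    hist = rng.integers(50, 100, size=(S, A, S))  # rich aggregate historical counts
    R = rng.random((S, A))
    print(pooled_value_iteration(curr, hist, R))

Note that the sketch only ever consumes hist_counts as summed counts, echoing the abstract's point that pooling can operate on aggregate statistics rather than patient-level records.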