ArXiv Preprint
We introduce a challenging decision-making task that we call active
acquisition for multimodal temporal data (A2MT). In many real-world scenarios,
input features are not readily available at test time and must instead be
acquired at significant cost. With A2MT, we aim to learn agents that actively
select which modalities of an input to acquire, trading off acquisition cost
and predictive performance. A2MT extends a previous task called active feature
acquisition to temporal decision making about high-dimensional inputs. Further,
we propose a method based on the Perceiver IO architecture to address A2MT in
practice. Our agents are able to solve a novel synthetic scenario requiring
practically relevant cross-modal reasoning skills. On two large-scale,
real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn
cost-reactive acquisition behavior. However, an ablation reveals they are
unable to learn to learn adaptive acquisition strategies, emphasizing the
difficulty of the task even for state-of-the-art models. Applications of A2MT
may be impactful in domains like medicine, robotics, or finance, where
modalities differ in acquisition cost and informativeness.
Jannik Kossen, Cătălina Cangea, Eszter Vértes, Andrew Jaegle, Viorica Patraucean, Ira Ktena, Nenad Tomasev, Danielle Belgrave
2022-11-09