ArXiv Preprint
Modern surgeries are performed in complex and dynamic settings, including
ever-changing interactions between medical staff, patients, and equipment. The
holistic modeling of the operating room (OR) is, therefore, a challenging but
essential task, with the potential to optimize the performance of surgical
teams and aid in developing new surgical technologies to improve patient
outcomes. The holistic representation of surgical scenes as semantic scene
graphs (SGs), where entities are represented as nodes and relations between
them as edges, is a promising direction for fine-grained semantic OR
understanding. We propose, for the first time, the use of temporal information
for more accurate and consistent holistic OR modeling. Specifically, we
introduce memory scene graphs, where the scene graphs of previous time steps
act as the temporal representation guiding the current prediction. We design an
end-to-end architecture that intelligently fuses the temporal information of
our lightweight memory scene graphs with the visual information from point
clouds and images. We evaluate our method on the 4D-OR dataset and demonstrate
that integrating temporality leads to more accurate and consistent results,
achieving a +5% increase and a new state-of-the-art macro F1 of 0.88. This work
paves the way for representing the entire surgery history with memory scene
graphs and improves holistic understanding of the OR. Introducing scene graphs as
memory representations can offer a valuable tool for many temporal
understanding tasks.
Ege Özsoy, Tobias Czempiel, Felix Holm, Chantal Pellegrini, Nassir Navab
2023-03-23
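
As a rough illustration of the idea described in the abstract, the sketch below stores a scene graph as a set of (subject, predicate, object) triples and keeps a bounded history of previous graphs as a "memory". It is not the authors' implementation: the class names `SceneGraph` and `SceneGraphMemory`, the `max_len` parameter, and the entity and relation labels are all hypothetical, and the fusion with point cloud and image features is omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Set, Tuple

# One relation: (subject, predicate, object). Labels used below are illustrative only.
Triple = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """Scene graph for one time step: entities are nodes, relations are edges."""
    timestep: int
    triples: Set[Triple] = field(default_factory=set)

    def nodes(self) -> Set[str]:
        # Every subject or object that appears in at least one relation.
        return {s for s, _, _ in self.triples} | {o for _, _, o in self.triples}


class SceneGraphMemory:
    """Bounded history of past scene graphs (hypothetical helper)."""

    def __init__(self, max_len: int = 3) -> None:
        # deque with maxlen silently drops the oldest graph when full.
        self._history: Deque[SceneGraph] = deque(maxlen=max_len)

    def push(self, graph: SceneGraph) -> None:
        self._history.append(graph)

    def context(self) -> List[SceneGraph]:
        # Past graphs, oldest first, to condition the current prediction on.
        return list(self._history)


memory = SceneGraphMemory(max_len=2)
memory.push(SceneGraph(0, {("patient", "lying_on", "operating_table")}))
memory.push(SceneGraph(1, {("head_surgeon", "cutting", "patient")}))
memory.push(SceneGraph(2, {("head_surgeon", "suturing", "patient")}))
past_timesteps = [g.timestep for g in memory.context()]
```

In this sketch a model predicting the graph at the next time step would receive `memory.context()` alongside its visual input; how the two are combined is the part the paper addresses and is not shown here.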