arXiv Preprint
Purpose: Surgical scene understanding with tool-tissue interaction recognition
and automatic report generation can play an important role in intra-operative
guidance, decision-making, and postoperative analysis in robotic surgery.
However, domain shifts between different surgeries, caused by inter- and
intra-patient variation and the appearance of novel instruments, degrade the
performance of model prediction. Moreover, these tasks conventionally require
outputs from multiple models, which can be computationally expensive and hinder
real-time performance.
Methodology: A multi-task learning (MTL) model is proposed for surgical
report generation and tool-tissue interaction prediction that addresses the
domain shift problem. The model consists of a shared feature extractor, a
mesh-transformer branch for captioning, and a graph attention branch for
tool-tissue interaction prediction. The shared feature extractor employs class
incremental contrastive learning (CICL) to tackle intensity shift and the
appearance of novel classes in the target domain. We introduce Laplacian of
Gaussian (LoG)-based curriculum learning into both the shared and task-specific
branches to enhance model learning. We further incorporate a task-aware
asynchronous MTL optimization technique to fine-tune the shared weights and
converge both tasks optimally.
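For concreteness, a minimal structural sketch of this architecture is given
below, assuming PyTorch-style modules. The class name, dimensions, and stand-in
layers (a small CNN in place of the CICL-trained backbone, linear heads in
place of the mesh-transformer and graph attention branches) are hypothetical
illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MTLSurgicalModel(nn.Module):
    """Shared feature extractor feeding two task-specific branches."""

    def __init__(self, feat_dim=512, vocab_size=1000, num_interactions=13):
        super().__init__()
        # Shared extractor: a stand-in for the CICL-trained backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Stand-in for the mesh-transformer captioning branch.
        self.caption_head = nn.Linear(feat_dim, vocab_size)
        # Stand-in for the graph attention interaction branch.
        self.interaction_head = nn.Linear(feat_dim, num_interactions)

    def forward(self, images):
        feats = self.backbone(images)  # shared features for both tasks
        return self.caption_head(feats), self.interaction_head(feats)

# Usage: one forward pass yields outputs for both tasks at once,
# avoiding the cost of running two separate single-task models.
model = MTLSurgicalModel()
captions, interactions = model(torch.randn(2, 3, 224, 224))
```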
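The LoG-based curriculum component can be sketched similarly. One plausible
realization, assumed here rather than drawn from the paper, convolves feature
maps with a LoG kernel whose bandwidth is annealed over training so the model
attends to coarser structure first; the kernel size, schedule, and helper names
(log_kernel, apply_log) are hypothetical.

```python
import torch
import torch.nn.functional as F

def log_kernel(sigma, size=5):
    """Build a zero-sum 2-D Laplacian-of-Gaussian kernel."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    r2 = xx ** 2 + yy ** 2
    g = torch.exp(-r2 / (2 * sigma ** 2))
    log = (r2 - 2 * sigma ** 2) / sigma ** 4 * g
    return log - log.mean()  # zero mean: flat regions give no response

def apply_log(feats, sigma):
    """Depthwise-convolve a (N, C, H, W) feature map with the LoG kernel.

    Annealing sigma across epochs changes which spatial frequencies
    dominate early training (a curriculum schedule; an assumption here).
    """
    k = log_kernel(sigma).to(feats)
    c = feats.shape[1]
    weight = k.expand(c, 1, -1, -1)  # one shared kernel per channel
    return F.conv2d(feats, weight, padding=k.shape[-1] // 2, groups=c)

# Example: anneal sigma over epochs to progressively sharpen the curriculum.
feats = torch.randn(2, 8, 32, 32)
for epoch, sigma in enumerate([2.0, 1.5, 1.0]):
    out = apply_log(feats, sigma)
```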
Results: The proposed MTL model, trained with the task-aware optimization and
fine-tuning techniques, achieved balanced performance on both tasks in the
target domain (a BLEU score of 0.4049 for scene captioning and an accuracy of
0.3508 for interaction detection) and performed on par with single-task models
in domain adaptation.
Conclusion: The proposed multi-task model was able to adapt to domain shifts,
incorporate novel instruments in the target domain, and perform tool-tissue
interaction detection and report generation on par with single-task models.
Lalithkumar Seenivasan, Mobarakol Islam, Mengya Xu, Chwee Ming Lim, Hongliang Ren
2022-11-28