ArXiv Preprint
Artificial intelligence-based methods have generated substantial interest in
nuclear medicine. An area of significant interest has been using deep-learning
(DL)-based approaches for denoising images acquired with lower doses, shorter
acquisition times, or both. Objective evaluation of these approaches is
essential for clinical application. DL-based approaches for denoising
nuclear-medicine images have typically been evaluated using fidelity-based
figures of merit (FoMs) such as RMSE and SSIM. However, these images are
acquired for clinical tasks and thus should be evaluated based on their
performance in these tasks. Our objectives were to (1) investigate whether
evaluation with these FoMs is consistent with objective clinical-task-based
evaluation; (2) provide a theoretical analysis for determining the impact of
denoising on signal-detection tasks; and (3) demonstrate the utility of virtual
clinical trials (VCTs) to evaluate DL-based methods. A VCT to evaluate a
DL-based method for denoising myocardial perfusion SPECT (MPS) images was
conducted. The impact of DL-based denoising was evaluated using fidelity-based
FoMs and the area under the receiver operating characteristic (ROC) curve
(AUC), which quantified performance on detecting perfusion defects in MPS
images, as obtained using a model observer with anthropomorphic channels.
Based on fidelity-based FoMs, denoising using the considered DL-based method
led to significantly superior performance. However, based on ROC analysis,
denoising did not improve, and in fact, often degraded detection-task
performance. The results motivate the need for objective task-based evaluation
of DL-based denoising approaches. Further, this study shows how VCTs provide a
mechanism to conduct such evaluations. Finally, our theoretical
treatment reveals insights into the reasons for the limited performance of the
denoising approach.
Zitong Yu, Md Ashequr Rahman, Richard Laforest, Thomas H. Schindler, Robert J. Gropler, Richard L. Wahl, Barry A. Siegel, Abhinav K. Jha
2023-03-03
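
The two kinds of figure of merit contrasted in the abstract can be illustrated with a minimal sketch. It assumes images are available as NumPy arrays and that some observer has already produced one scalar test statistic per image. The function names rmse and auc_from_scores are hypothetical and not taken from the paper, and the AUC here is the nonparametric Wilcoxon-Mann-Whitney estimate, which stands in for, and is not, the paper's model-observer pipeline.

```python
import numpy as np


def rmse(reference, estimate):
    """Fidelity-based FoM: root-mean-square error between two images."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def auc_from_scores(defect_present, defect_absent):
    """Task-based FoM: Wilcoxon-Mann-Whitney estimate of the AUC.

    Inputs are observer test statistics for defect-present and
    defect-absent images (hypothetical inputs for illustration).
    The result is the fraction of (present, absent) pairs ranked
    correctly, with ties counted as one half.
    """
    pos = np.asarray(defect_present, dtype=float)[:, None]
    neg = np.asarray(defect_absent, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))
```

The RMSE depends only on voxel-wise agreement with a reference image, whereas the AUC depends only on how well the observer's test statistics separate the two classes, so a change in one does not by itself imply a change in the other.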