In Seminars in radiation oncology
In the last 5 years, deep learning applications for radiotherapy have developed rapidly. An advantage of radiotherapy over radiological applications is that data in radiotherapy are well structured, standardized, and annotated. Furthermore, there is much to be gained in automating the current laborious workflows in radiotherapy. After the initial peak in enthusiasm for deep learning, researchers have also identified its fundamental weaknesses. The basic assumption in deep learning is that the training and test data originate from the same data generating process. This is not always clear-cut in clinical practice, eg, data acquired with 2 scanners from different vendors might not originate from the same data generating process. Furthermore, it is important to realize that residual uncertainties remain even if test data arise from the same data generating process as the training data. As deep learning applications are being introduced in clinical radiotherapy workflows, a deep learning model must alert the user when the uncertainty of a prediction exceeds a certain threshold. The literature on uncertainty assessment for deep learning applications in radiotherapy is still in its infancy; however, a substantial body of literature exists on the validity and uncertainty of deep learning models for computer vision applications. This paper aims to explain these general concepts to the radiotherapy community. The concepts of epistemic and aleatoric uncertainty, and techniques to model them in deep learning, are described in detail. It is discussed how they can be applied to maximize confidence in automated deep learning-driven workflows. Their usage is demonstrated in 3 examples from the radiotherapy literature on deep learning applications, ie, dose prediction, synthetic CT generation, and contouring. In the final part, some of the key elements for ensuring confidence and automatic alerting that are still missing are discussed.
State-of-the-art automatic solutions for checking within-distribution vs out-of-distribution test samples are discussed. However, these methodologies are still immature, and strict QA protocols and close human supervision will still be needed. Nevertheless, deep learning models already offer much value for radiotherapy.
van den Berg Cornelis A T, Meliadò Ettore F