In JAMA Network Open
Importance : A chronic shortage of donor kidneys is compounded by a high discard rate, and this rate is directly associated with biopsy specimen evaluation, which shows poor reproducibility among pathologists. A deep learning algorithm for measuring percent global glomerulosclerosis (an important predictor of outcome) on images of kidney biopsy specimens could enable pathologists to more reproducibly and accurately quantify percent global glomerulosclerosis, potentially saving organs that would have been discarded.
Objective : To compare the performance of pathologists with that of a deep learning model on quantification of percent global glomerulosclerosis in whole-slide images of donor kidney biopsy specimens, and to determine the potential benefit of a deep learning model on organ discard rates.
Design, Setting, and Participants : This prognostic study used whole-slide images acquired from 98 hematoxylin-eosin-stained frozen and 51 permanent donor biopsy specimen sections retrieved from 83 kidneys. Serial annotation by 3 board-certified pathologists served as ground truth for model training and for evaluation. Images of kidney biopsy specimens were obtained from the Washington University database (retrieved between June 2015 and June 2017). Cases were selected randomly from a database of more than 1000 cases to include biopsy specimens representing an equitable distribution within 0% to 5%, 6% to 10%, 11% to 15%, 16% to 20%, and more than 20% global glomerulosclerosis.
Main Outcomes and Measures : Correlation coefficient (r) and root-mean-square error (RMSE) with respect to annotations were computed for cross-validated model predictions and on-call pathologists' estimates of percent global glomerulosclerosis when using individual and pooled slide results. Data were analyzed from March 2018 to August 2020.
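The two outcome measures named above can be illustrated with a minimal sketch. It assumes that percent global glomerulosclerosis is the ratio of sclerosed to total glomeruli and that pooling means summing counts across levels before taking that ratio. The function names and the example inputs are hypothetical and are not taken from the study's code or data.

```python
import math

def percent_sclerosis(sclerosed, total):
    """Percent global glomerulosclerosis from glomeruli counts (assumed definition)."""
    return 100.0 * sclerosed / total if total else 0.0

def pooled_percent(levels):
    """Pool counts over several levels; `levels` is a list of (sclerosed, total) pairs."""
    sclerosed = sum(s for s, _ in levels)
    total = sum(t for _, t in levels)
    return percent_sclerosis(sclerosed, total)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(estimates, annotations):
    """Root-mean-square error of estimates with respect to annotated values."""
    squared = [(e - a) ** 2 for e, a in zip(estimates, annotations)]
    return math.sqrt(sum(squared) / len(squared))
```

Under these assumptions, each slide (or pooled set of levels) contributes one estimated percentage and one annotated percentage, and both metrics are computed over those paired values.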
Results : The cross-validated model results for section images retrieved from 83 donor kidneys showed higher correlation with annotations (r = 0.916; 95% CI, 0.886-0.939) than on-call pathologists' estimates (r = 0.884; 95% CI, 0.825-0.923), and the correlation increased further when glomeruli counts were pooled from multiple levels (r = 0.933; 95% CI, 0.898-0.956). Model prediction error for single levels (RMSE, 5.631; 95% CI, 4.735-6.517) was 14% lower than that of on-call pathologists (RMSE, 6.523; 95% CI, 5.191-7.783), and the reduction increased to 22% with multiple levels (RMSE, 5.094; 95% CI, 3.972-6.301). The model decreased the likelihood of unnecessary organ discard by 37% compared with pathologists.
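The 14% and 22% figures are consistent with a relative reduction in RMSE measured against the on-call pathologists' RMSE. The short check below assumes that interpretation; the helper name `relative_reduction` is hypothetical, and only the RMSE point estimates reported above are used.

```python
def relative_reduction(baseline, value):
    """Percent reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

pathologist_rmse = 6.523    # on-call pathologists
model_single_rmse = 5.631   # model, single levels
model_pooled_rmse = 5.094   # model, multiple levels pooled

single = relative_reduction(pathologist_rmse, model_single_rmse)
pooled = relative_reduction(pathologist_rmse, model_pooled_rmse)

print(round(single))  # 14
print(round(pooled))  # 22
```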
Conclusions and Relevance : The findings of this prognostic study suggest that this deep learning model provided a scalable and robust method to quantify percent global glomerulosclerosis in whole-slide images of donor kidneys. The model performance improved by analyzing multiple levels of a section, surpassing the capacity of pathologists in the time-sensitive setting of examining donor biopsy specimens. The results indicate the potential of a deep learning model to prevent erroneous donor organ discard.
Marsh Jon N, Liu Ta-Chiang, Wilson Parker C, Swamidass S Joshua, Gaut Joseph P