In Intensive care medicine experimental
BACKGROUND : Assessing measurement error in alveolar recruitment on computed tomography (CT) is of paramount importance to select a reliable threshold identifying patients with high potential for alveolar recruitment and to rationalize positive end-expiratory pressure (PEEP) setting in acute respiratory distress syndrome (ARDS). The aim of this study was to assess both the intra- and inter-observer smallest real difference (SRD) exceeding measurement error of recruitment, using both human and machine-learning lung segmentation (i.e., delineation) on CT.
METHODS : This single-center observational study was performed on adult ARDS patients. CT scans were acquired at end-expiration and end-inspiration at the PEEP level selected by clinicians, and at end-expiration at PEEP 5 and 15 cmH2O. Two human observers and a machine learning algorithm performed lung segmentation. Recruitment was computed as the weight change of the non-aerated compartment on CT between PEEP 5 and 15 cmH2O.
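The recruitment computation described above can be illustrated with a minimal Python sketch. Several points are assumptions not stated in the abstract: the non-aerated compartment is taken as voxels between -100 and +100 Hounsfield units (a commonly used convention), voxel tissue mass is approximated as (1 + HU/1000) g per mL clipped to the range 0 to 1, and recruitment is normalized by total lung weight at PEEP 5 cmH2O. The function names are hypothetical, and the inputs stand for the HU values of the voxels inside a lung segmentation mask.

```python
# Assumed HU bounds of the non-aerated compartment (convention, not from the abstract).
NON_AERATED_HU = (-100.0, 100.0)


def voxel_tissue_mass_g(hu, voxel_volume_ml):
    """Approximate tissue mass of one voxel in grams.

    Assumes density ~ (1 + HU/1000) g/mL, clipped to [0, 1] so that
    voxels at or above 0 HU count as pure tissue.
    """
    density = min(max(1.0 + hu / 1000.0, 0.0), 1.0)
    return density * voxel_volume_ml


def compartment_masses(hu_values, voxel_volume_ml):
    """Return (total lung mass, non-aerated mass) in grams for segmented voxels."""
    total = 0.0
    non_aerated = 0.0
    low, high = NON_AERATED_HU
    for hu in hu_values:
        mass = voxel_tissue_mass_g(hu, voxel_volume_ml)
        total += mass
        if low < hu <= high:
            non_aerated += mass
    return total, non_aerated


def recruitment_percent(hu_peep5, hu_peep15, voxel_volume_ml):
    """Decrease in non-aerated mass from PEEP 5 to PEEP 15, in % of lung weight.

    Normalizing by total lung weight at PEEP 5 is an assumption of this sketch.
    """
    total5, non_aerated5 = compartment_masses(hu_peep5, voxel_volume_ml)
    _, non_aerated15 = compartment_masses(hu_peep15, voxel_volume_ml)
    return 100.0 * (non_aerated5 - non_aerated15) / total5


# Toy voxel values for illustration only (not study data).
toy_peep5 = [0.0, 0.0, -500.0, -500.0]
toy_peep15 = [0.0, -500.0, -500.0, -500.0]
toy_recruitment = recruitment_percent(toy_peep5, toy_peep15, voxel_volume_ml=1.0)
```

Because the result depends on which voxels the segmentation mask includes, small differences between observers in delineating the non-aerated lung propagate directly into the recruitment value.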
RESULTS : Thirteen patients were included, of whom 11 (85%) presented with severe ARDS. Intra- and inter-observer measurements of recruitment were virtually unbiased, with 95% confidence intervals (CI95%) encompassing zero. The intra-observer SRD of recruitment amounted to 3.5 [CI95% 2.4-5.2]% of lung weight. The human-human inter-observer SRD of recruitment was slightly higher, amounting to 5.7 [CI95% 4.0-8.0]% of lung weight, as was the human-machine SRD (5.9 [CI95% 4.3-7.8]% of lung weight). Regarding other CT measurements, both intra-observer and inter-observer SRD were close to zero for the CT measurements focusing on aerated lung (end-expiratory lung volume, hyperinflation), and higher for the CT measurements relying on accurate segmentation of the non-aerated lung (lung weight, tidal recruitment…). The average symmetric surface distance between lung segmentation masks was significantly lower in intra-observer comparisons (0.8 mm [interquartile range (IQR) 0.6-0.9]) than in human-human (1.0 mm [IQR 0.8-1.3]) and human-machine inter-observer comparisons (1.1 mm [IQR 0.9-1.3]).
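For readers unfamiliar with the SRD, the sketch below shows one common way to derive bias and SRD from paired measurements. The abstract does not state the exact formula used, so this assumes the usual definition SRD = 1.96 × √2 × SEM, with the standard error of measurement (SEM) estimated as the within-subject standard deviation from two measurements per patient. The function name and the example values are hypothetical.

```python
import math


def bias_and_srd(first, second):
    """Return (bias, SRD) for paired measurements of the same patients.

    bias: mean of the paired differences.
    SRD:  1.96 * sqrt(2) * SEM, where SEM is the within-subject standard
          deviation, estimated as sqrt(sum(d^2) / (2 * n)) for paired data.
    This formula is an assumption; the abstract does not specify it.
    """
    if len(first) != len(second) or not first:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    bias = sum(diffs) / n
    sem = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    srd = 1.96 * math.sqrt(2) * sem
    return bias, srd


# Made-up recruitment values (% of lung weight) for illustration only.
reading_a = [10.0, 12.0, 8.0, 11.0]
reading_b = [11.0, 11.0, 9.0, 10.0]
example_bias, example_srd = bias_and_srd(reading_a, reading_b)
```

Under this definition, a difference between two recruitment measurements smaller than the SRD cannot be distinguished from measurement error with 95% confidence.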
CONCLUSIONS : The SRD exceeding intra-observer experimental error in the measurement of alveolar recruitment may be conservatively set to 5% of lung weight (i.e., the upper value of the CI95%). Human-machine and human-human inter-observer measurement errors with CT are of similar magnitude, suggesting that machine learning segmentation algorithms are a credible alternative to humans for quantifying alveolar recruitment on CT.
Penarrubia Ludmilla, Verstraete Aude, Orkisz Maciej, Davila Eduardo, Boussel Loic, Yonis Hodane, Mezidi Mehdi, Dhelft Francois, Danjou William, Bazzani Alwin, Sigaud Florian, Bayat Sam, Terzi Nicolas, Girard Mehdi, Bitker Laurent, Roux Emmanuel, Richard Jean-Christophe
2023-Feb-17
Acute respiratory distress syndrome, Alveolar recruitment, Bias, Computed tomography, Machine learning, Measurement error, Repeatability, Reproducibility