ArXiv Preprint
Generalization is one of the main challenges of computational pathology.
Slide preparation heterogeneity and the diversity of scanners lead to poor
model performance when used on data from medical centers not seen during
training. In order to achieve stain invariance in breast invasive carcinoma
patch classification, we implement a stain translation strategy using cycleGANs
for unsupervised image-to-image translation. We compare three cycleGAN-based
approaches to a baseline classification model obtained without any stain
invariance strategy. Two of the proposed approaches use cycleGAN translations
at inference or training time to build stain-specific classification
models. The last method uses them for stain data augmentation during training.
This constrains the classification model to learn stain-invariant features.
Baseline metrics are set by training and testing the baseline classification
model on a reference stain. We assess performance on data from three medical
centers using H&E and H&E&S staining. Every approach tested in this study
improves baseline metrics without needing labels on target stains. The stain
augmentation-based approach produced the best results on every stain. Each
method's pros and cons are studied and discussed in this paper. However,
training high-performing cycleGAN models is itself a challenge.
In this work, we introduce a systematic method for optimizing cycleGAN
training by setting a novel stopping criterion. This method requires no
visual inspection of cycleGAN results and proves superior
to methods using a predefined number of training epochs. We also
study the minimal amount of data required for cycleGAN training.
Nicolas Nerrienet, Rémy Peyret, Marie Sockeel, Stéphane Sockeel
2023-01-30
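
The stain augmentation strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stain_augment`, the probability parameter `p`, and the toy "translators" standing in for trained cycleGAN generators are all assumptions made for illustration. The idea shown is only that, during training, a labeled reference-stain patch is sometimes replaced by its translation into another stain while keeping its original label, so no labels are needed on the target stains.

```python
import random

def stain_augment(patch, translators, p=0.5, rng=random):
    """With probability p, translate a patch into a randomly chosen target stain.

    `translators` maps a stain name to a callable (hypothetically, a trained
    cycleGAN generator from the reference stain to that stain). The class label
    of the patch is unchanged by the translation, so it is not handled here.
    Returns the (possibly translated) patch and the name of the stain it is in.
    """
    if translators and rng.random() < p:
        stain = rng.choice(sorted(translators))
        return translators[stain](patch), stain
    return patch, "reference"

# Toy stand-ins for trained generators (assumption: real ones would be
# neural networks operating on image tensors, not lists of floats).
translators = {
    "center_B": lambda x: [v * 0.9 for v in x],
    "center_C": lambda x: [v * 1.1 for v in x],
}

patch = [0.2, 0.5, 0.8]
augmented, stain = stain_augment(patch, translators, p=1.0, rng=random.Random(0))
unchanged, ref = stain_augment(patch, translators, p=0.0)
```

In a training loop, such a function would be applied to each reference-stain patch before it is passed to the classifier. The value of `p` and the way target stains are sampled are design choices that the abstract does not specify.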