In npj Digital Medicine
To drive health innovation that meets the needs of all and to democratize healthcare, the generalization performance of deep learning (DL) algorithms must be assessed across various distribution shifts to ensure that these algorithms are robust. This retrospective study is, to the best of our knowledge, an original attempt to develop and assess the generalization performance of a DL model for atrial fibrillation (AF) event detection from long-term beat-to-beat intervals across geography, ages and sexes. The new recurrent DL model, denoted ArNet2, is developed on a large retrospective dataset of 2,147 patients totaling 51,386 h of continuous electrocardiogram (ECG) recordings. The model's generalization is evaluated on manually annotated test sets from four centers (USA, Israel, Japan and China) totaling 402 patients. The model is further validated on a retrospective dataset of 1,825 consecutive Holter recordings from Israel. The model outperforms benchmark state-of-the-art models and generalizes well across geography, ages and sexes. For the task of event detection, ArNet2 performance was higher for females than for males, higher for young adults (less than 61 years old) than for other age groups, and varied across geography. Finally, ArNet2 shows better performance on the test sets from the USA and China. The main finding explaining these variations is impaired performance in groups with a higher prevalence of atrial flutter (AFL). Our findings on the relative performance of ArNet2 across groups may have clinical implications for the choice of the preferred AF examination method for the group of interest.
Biton Shany, Aldhafeeri Mohsin, Marcusohn Erez, Tsutsui Kenta, Szwagier Tom, Elias Adi, Oster Julien, Sellal Jean Marc, Suleiman Mahmoud, Behar Joachim A
2023-Mar-17