In Journal of clinical sleep medicine : JCSM : official publication of the American Academy of Sleep Medicine

STUDY OBJECTIVES : Polysomnography is the gold standard for identifying sleep stages; however, technicians differ in how they apply the scoring standards. Because organizing meetings to evaluate these discrepancies and/or reach a consensus among multiple sleep centers is time-consuming, we developed an artificial intelligence (AI) system to efficiently evaluate the reliability and consistency of sleep scoring, and hence sleep center quality.

METHODS : An interpretable machine learning algorithm was used to evaluate interrater reliability (IRR) of sleep stage annotation among sleep centers. The AI system was trained on the raters' annotations from one hospital and applied to subjects from the same or other hospitals. Its output was compared with the experts' annotations to estimate IRR. Intra-center and inter-center assessments were conducted on 679 subjects without sleep apnea from six sleep centers in Taiwan. Centers with potential quality issues were identified by the estimated IRR.
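
The comparison step described above can be illustrated with a minimal sketch. It assumes that agreement is measured as the fraction of epochs where the system's stage matches the expert's stage, and that per-subject accuracies are summarized by their median for each training-center and testing-center pair. The function names, record layout, and toy labels are hypothetical and are not taken from the study; the study's actual algorithm and IRR estimator are not specified here.

```python
from statistics import median


def epoch_accuracy(predicted, expert):
    """Fraction of epochs where the predicted stage equals the expert's stage."""
    if len(predicted) != len(expert) or not expert:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


def median_accuracy_by_pair(records):
    """Summarize per-subject accuracy for each (train_center, test_center) pair.

    records: iterable of (train_center, test_center, predicted, expert),
    one entry per subject (hypothetical layout).
    """
    per_pair = {}
    for train_center, test_center, predicted, expert in records:
        accuracy = epoch_accuracy(predicted, expert)
        per_pair.setdefault((train_center, test_center), []).append(accuracy)
    return {pair: median(accs) for pair, accs in per_pair.items()}


# Toy, made-up labels for illustration only (not data from the study).
records = [
    ("A", "A", ["W", "N1", "N2", "N2"], ["W", "N1", "N2", "N3"]),
    ("A", "A", ["W", "N2", "N2", "REM"], ["W", "N2", "N2", "REM"]),
    ("A", "B", ["W", "N2", "N2", "REM"], ["N1", "N2", "N3", "REM"]),
]
result = median_accuracy_by_pair(records)
```

Pairs where the training and testing center are the same correspond to the intra-center assessment; all other pairs correspond to the inter-center assessment.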

RESULTS : In the intra-center assessment, the median accuracy ranged from 80.3% to 83.3%, with the exception of one hospital (designated E) with an accuracy of 72.3%. In the inter-center assessment, the median accuracy ranged from 75.7% to 83.3% when hospital E was excluded from testing and training. The performance of the proposed method was higher for N2, awake, and REM than for N1 and N3. The significant IRR discrepancy of hospital E suggested a quality issue. This quality issue was confirmed by the physicians in charge of hospital E.

CONCLUSIONS : The proposed AI system proved effective in assessing IRR and hence the sleep center quality.

Liu Gi-Ren, Lin Ting-Yu, Wu Hau-Tieng, Sheu Yuan-Chung, Liu Ching-Lung, Liu Wen-Te, Yang Mei-Chen, Ni Yung-Lun, Chou Kun-Ta, Chen Chao-Hsien, Wu Dean, Lan Chou-Chin, Chiu Kuo-Liang, Chiu Hwa-Yen, Lo Yu-Lun


inter-center assessments, interrater reliability, intra-center assessments, machine learning, sleep stage scoring