In The American Journal of Gastroenterology
OBJECTIVES : There is currently no widely accepted approach to screening for pancreatic cancer (PC). We aimed to develop and validate a risk prediction model for pancreatic ductal adenocarcinoma (PDAC), the most common form of PC, across two health systems using electronic health records (EHR).
METHODS : This retrospective cohort study included patients aged 50-84 years with at least one clinic-based visit during a 10-year study period at Kaiser Permanente Southern California (KPSC; model training and internal validation) and the Veterans Affairs (VA; external testing). Random survival forest models were built to identify the most relevant predictors from more than 500 candidate variables and to predict the risk of PDAC within 18 months of cohort entry.
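Predicting risk "within 18 months of cohort entry" implies a survival-style target for each patient: the time from cohort entry to PDAC diagnosis, censored at the end of follow-up or at the 18-month horizon, whichever comes first. The Python sketch below illustrates that construction under stated assumptions. The `Patient` record, its field names, and the 548-day approximation of 18 months are hypothetical choices for illustration, not the study's actual data definitions; only the age window (50-84) and the clinic-visit requirement come from the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: 18 months approximated as 548 days (18 x 30.44 days, rounded).
HORIZON_DAYS = 548


@dataclass
class Patient:
    """Hypothetical per-patient record; field names are illustrative only."""
    age_at_entry: int
    n_clinic_visits: int
    entry: date                       # cohort entry date
    end_of_followup: date             # last date under observation
    pdac_diagnosis: Optional[date] = None


def is_eligible(p: Patient) -> bool:
    """Cohort criteria from the abstract: aged 50-84, at least one clinic-based visit."""
    return 50 <= p.age_at_entry <= 84 and p.n_clinic_visits >= 1


def horizon_outcome(p: Patient, horizon_days: int = HORIZON_DAYS) -> tuple[float, int]:
    """Return (time_in_days, event) truncated at the prediction horizon.

    event = 1 if PDAC is diagnosed within both the horizon and follow-up;
    otherwise event = 0 and the patient is censored at the earlier of the
    end of follow-up or the horizon.
    """
    if p.pdac_diagnosis is not None and p.pdac_diagnosis <= p.end_of_followup:
        t_event = (p.pdac_diagnosis - p.entry).days
        if t_event <= horizon_days:
            return float(t_event), 1
    t_censor = min((p.end_of_followup - p.entry).days, horizon_days)
    return float(t_censor), 0


if __name__ == "__main__":
    # Toy patients (hypothetical, not study data).
    early = Patient(62, 3, date(2010, 1, 1), date(2015, 1, 1), date(2011, 1, 1))
    late = Patient(70, 1, date(2010, 1, 1), date(2015, 1, 1), date(2013, 1, 1))
    short = Patient(55, 2, date(2010, 1, 1), date(2010, 7, 1))
    print(horizon_outcome(early))  # (365.0, 1)
    print(horizon_outcome(late))   # (548.0, 0): diagnosis falls after the horizon
    print(horizon_outcome(short))  # (181.0, 0): censored at end of follow-up
```

Pairs like these are the (time, event) targets that survival learners such as random survival forests are trained on; the model-fitting and variable-selection steps themselves are not shown.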
RESULTS : The KPSC cohort comprised 1.8 million patients (mean age 61.6 years) with 1,792 PDAC cases. The 18-month incidence rate of PDAC was 0.77 per 1,000 person-years (95% CI 0.73-0.80). The final main model included age, abdominal pain, weight change, HbA1c, and alanine aminotransferase (ALT) change (c-index: mean 0.77, SD 0.02; calibration test: p-value 0.4, SD 0.3). The final early detection model comprised the same features as the main model except abdominal pain (c-index: mean 0.77, SD 0.4; calibration test: p-value 0.3, SD 0.3). The VA testing cohort comprised 2.7 million patients (mean age 66.1 years), with an 18-month PDAC incidence rate of 1.27 per 1,000 person-years (95% CI 1.23-1.30). After recalibration on the VA testing datasets, the main and early detection models achieved mean c-indices of 0.71 (SD 0.002) and 0.68 (SD 0.003), respectively.
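The two headline metrics above, an incidence rate per 1,000 person-years and the concordance index (c-index), can be illustrated with a short Python sketch. The confidence interval uses a log-scale Poisson approximation, which is one common approach and not necessarily the method the authors used; the c-index shown is Harrell's version for right-censored data. Function names and toy inputs are hypothetical, and the study's exact c-index variant, tie handling, and calibration test are not specified in this abstract.

```python
import math
from itertools import combinations


def incidence_rate_per_1000(events: int, person_years: float) -> float:
    """Crude incidence rate expressed per 1,000 person-years."""
    return 1000.0 * events / person_years


def rate_ci_log_poisson(events: int, person_years: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI for a Poisson rate on the log scale.

    Assumption: one common approximation; the study's CI method is not stated.
    Requires events > 0.
    """
    rate = incidence_rate_per_1000(events, person_years)
    half_width = z / math.sqrt(events)  # SE of log(rate) is about 1/sqrt(events)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


def harrell_c_index(times, events, risk_scores) -> float:
    """Harrell's C for right-censored data.

    A pair is comparable when the subject with the strictly shorter observed
    time had the event; it is concordant when that subject also has the higher
    predicted risk. Tied risk scores count as 0.5. Pairs with tied times are
    skipped here (tie handling varies between implementations).
    """
    concordant = 0.0
    comparable = 0
    for a, b in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        if times[b] < times[a]:
            a, b = b, a
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy inputs (hypothetical, not study data).
    print(incidence_rate_per_1000(5, 2500.0))  # 2.0
    print(rate_ci_log_poisson(5, 2500.0))
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.7, 0.8, 0.1]
    print(harrell_c_index(times, events, risk))  # 0.8
```

With the toy inputs, five of the six pairs are comparable (the pair whose shorter time is censored is skipped) and four of those five are concordant, giving C = 0.8. A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of who is diagnosed first.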
CONCLUSIONS : Using parameters widely available in EHRs, we developed and externally validated parsimonious machine learning-based models for the detection of pancreatic cancer. These models may be suitable for real-time clinical application.
Chen Wansu, Zhou Yichen, Xie Fagen, Butler Rebecca K, Jeon Christie Y, Luong Tiffany Q, Zhou Botao, Lin Yu-Chen, Lustigova Eva, Pisegna Joseph R, Kim Sungjin, Wu Bechien U