In Academic Medicine: Journal of the Association of American Medical Colleges

PROBLEM: Reviewing the narrative components of residency applications is time intensive and is one reason nearly half of all applications do not receive holistic review. The authors developed a natural language processing (NLP)-based tool to automate review of applicants' narrative experience entries and predict interview invitation.

APPROACH: Experience entries (n = 188,500) were extracted from 6,403 residency applications across 3 application cycles (2017-2019) at 1 internal medicine program, combined at the applicant level, and paired with the interview invitation decision (n = 1,224 invitations). NLP identified important words and word pairs via term frequency-inverse document frequency (TF-IDF) weighting, which were used to predict interview invitation with L1-regularized logistic regression. Terms remaining in the model were analyzed thematically. Logistic regression models were also built using structured application data alone and a combination of NLP and structured data. Model performance was evaluated on held-out data using the areas under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC).
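
A minimal sketch of this kind of pipeline, assuming scikit-learn; the function name, train/test split, and hyperparameters (ngram_range, min_df, C) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

def fit_interview_model(texts, invited):
    """texts: one concatenated string of experience entries per applicant;
    invited: 1 if the applicant received an interview invitation, else 0."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, invited, test_size=0.2, stratify=invited, random_state=0
    )

    # Words and word pairs (unigrams and bigrams) weighted by TF-IDF
    vec = TfidfVectorizer(ngram_range=(1, 2), min_df=5)
    Xtr = vec.fit_transform(X_train)
    Xte = vec.transform(X_test)

    # The L1 penalty zeroes out most term coefficients, leaving an
    # interpretable subset of terms for thematic analysis
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    model.fit(Xtr, y_train)

    # Evaluate on held-out applicants with AUROC and AUPRC
    probs = model.predict_proba(Xte)[:, 1]
    print("AUROC:", roc_auc_score(y_test, probs))
    print("AUPRC:", average_precision_score(y_test, probs))

    # Terms surviving regularization, ranked by coefficient
    terms = np.array(vec.get_feature_names_out())
    keep = model.coef_[0] != 0
    return sorted(zip(model.coef_[0][keep], terms[keep]), reverse=True)
```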

OUTCOMES: The NLP model had an AUROC of 0.80 (vs. a chance baseline of 0.50) and an AUPRC of 0.49 (vs. a chance baseline of 0.19), showing moderate predictive strength. Phrases indicating active leadership, research, or work in social justice and health disparities were associated with interview invitation. The model's detection of these key selection factors demonstrated face validity. Adding structured data to the model significantly improved prediction (AUROC 0.92, AUPRC 0.73), as expected given the program's reliance on such metrics for interview invitation.
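
For context, the chance AUPRC baseline equals the positive-class prevalence in the data, 1,224 invitations / 6,403 applications ≈ 0.19, whereas a random classifier's AUROC is 0.50 regardless of prevalence.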

NEXT STEPS: This model represents a first step toward using NLP-based artificial intelligence tools to promote holistic residency application review. The authors are assessing the practical utility of using the model to identify applicants screened out by traditional metrics. Generalizability must be determined through model retraining and evaluation at other programs. Work is ongoing to thwart model "gaming," improve prediction, and remove unwanted biases introduced during model training.

Arun Umesh Mahtani, Ilan Reinstein, Marina Marin, Jesse Burk-Rafel

2023-Mar-16