
In: Journal of Clinical Epidemiology (h5-index 60.0)

OBJECTIVE: To assess the feasibility of a modified workflow that uses machine learning and crowdsourcing to identify studies for potential inclusion in a systematic review.

STUDY DESIGN AND SETTING: This was a sub-study of a larger randomised study; the main study sought to assess the performance of single screening of search results versus dual screening. This sub-study assessed the performance of a modified version of Cochrane's Screen4Me workflow, which uses crowdsourcing and machine learning, in identifying relevant RCTs for a published Cochrane review. We included participants who had signed up for the main study but were not eligible to be randomised to its two main arms. The records were put through the modified workflow, in which a machine learning classifier divided the dataset into "Not RCTs" and "Possible RCTs". The records deemed "Possible RCTs" were then loaded into a task created on the Cochrane Crowd platform, and participants classified those records as either "Potentially relevant" or "Not relevant" to the review. Using a pre-specified agreement algorithm, we calculated the performance of the crowd in correctly identifying the studies that were included in the review (sensitivity) and correctly rejecting those that were not included (specificity).
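The abstract does not specify the details of the pre-specified agreement algorithm. A common crowd-screening scheme (used here purely as an illustrative sketch, not as the actual Screen4Me rule) resolves a record once a fixed number of consecutive identical classifications is reached, and otherwise escalates it for expert resolution:

```python
def resolve_record(classifications, agreement_needed=3):
    """Resolve a record's label once `agreement_needed` consecutive
    identical crowd classifications are seen.

    Illustrative sketch only: the actual Cochrane Crowd agreement
    algorithm is not described in this abstract, and the threshold
    of 3 is an assumption.
    """
    streak_label, streak_len = None, 0
    for label in classifications:  # classifications arrive in order
        if label == streak_label:
            streak_len += 1
        else:
            streak_label, streak_len = label, 1
        if streak_len >= agreement_needed:
            return streak_label
    # No sufficient run of agreement: send to an expert screener.
    return "needs_expert"
```

For example, three agreeing votes in a row resolve a record immediately, while a split vote stream is escalated rather than guessed at, which is how such schemes trade throughput for accuracy.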

RESULTS: The RCT machine learning classifier did not reject any of the included studies. A total of 112 participants were included in this sub-study; of these, 81 completed the training module and went on to screen records in the live task. Applying the Cochrane Crowd agreement algorithm, the crowd achieved 100% sensitivity and 80.71% specificity.

CONCLUSIONS: Using a crowd to screen search results for systematic reviews can be an accurate method, provided the agreement algorithm in place is robust.

TRIAL REGISTRATION: Open Science Framework:

Anna Noel-Storr, Gordon Dooley, Lisa Affengruber, Gerald Gartlehner

KEYWORDS: accuracy, agreement algorithm, crowdsourcing, human computation, literature screening, machine learning, systematic reviews