Use of machine-learning tools to support efficient study identification in Cochrane reviews: A case study and cost-effectiveness analysis

ID: 18376

Session: Short oral session 6: Evidence synthesis methods

Date: Thursday 14 September 2017 - 11:00 to 12:30

Location: 

All authors in correct order:

Shemilt I1, Hollands G2, Carter P2, Thomas J1
1 EPPI-Centre, University College London, United Kingdom
2 Behaviour and Health Research Unit, University of Cambridge, United Kingdom
Presenting author and contact person

Presenting author: Shemilt Ian

Contact person:

Abstract text
Background: Study identification is a time-intensive phase of systematic-review production and a key driver of total review cost. Machine-learning (ML) tools have the potential to speed up study identification and reduce manual screening workload, making reviews that were previously intractable because of a ‘too many records’ problem more feasible. However, ML tools have not previously been deployed in Cochrane reviews.

Objectives: To explore and evaluate the use of ML tools to support efficient study identification in Cochrane reviews.

Methods: A novel, semi-automated screening workflow – incorporating both active learning and topic-modelling tools – was designed and implemented in a Cochrane Public Health review to help identify eligible studies among c. 157 000 unique citations retrieved by electronic searches of 11 databases. Electronic searches were supplemented by extensive searches of other resources. A cost-effectiveness analysis (CEA) was conducted to model and compare three options: (A) the novel, semi-automated workflow; (B) a conventional screening workflow; and (C) a semi-automated workflow incorporating active learning without topic modelling.
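To illustrate the active-learning component named in the Methods, the following is a minimal sketch of relevance-prioritised screening, in which the record ranked most likely to be eligible is always screened next and the ranking is updated after each decision. The abstract does not describe the classifier or query strategy actually used, so the token-based scoring rule, the function names (`score`, `screen_with_active_learning`), the `oracle` callback standing in for a human screener, and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokens(text):
    """Lower-case a title and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


def score(text, pos_counts, neg_counts):
    """Sum of add-one-smoothed log-odds of each token under include vs exclude."""
    return sum(
        math.log((pos_counts[t] + 1) / (neg_counts[t] + 1)) for t in tokens(text)
    )


def screen_with_active_learning(records, oracle, seed_ids, budget):
    """Screen up to `budget` records, always picking the highest-scoring one next.

    `oracle(i)` is a hypothetical stand-in for a human reviewer's
    include/exclude decision on record i.
    """
    labels = {i: oracle(i) for i in seed_ids}
    while len(labels) < min(budget, len(records)):
        # Re-fit the toy model on every decision made so far.
        pos_counts, neg_counts = Counter(), Counter()
        for i, included in labels.items():
            (pos_counts if included else neg_counts).update(tokens(records[i]))
        unscreened = [i for i in range(len(records)) if i not in labels]
        best = max(unscreened, key=lambda i: score(records[i], pos_counts, neg_counts))
        labels[best] = oracle(best)
    return labels


# Toy data (invented for this sketch; not taken from the review).
records = [
    "randomised trial of diet intervention in adults",
    "diet intervention randomised trial in children",
    "genome sequencing of yeast strains",
    "cluster randomised trial of school diet programme",
    "protein folding simulation methods",
    "yeast genome annotation pipeline",
]
eligible = {0, 1, 3}
labels = screen_with_active_learning(
    records, oracle=lambda i: i in eligible, seed_ids=[0, 2], budget=4
)
print(sorted(labels), sum(labels.values()))
```

In this toy run the loop is given one included and one excluded seed record and a budget of four screening decisions. Stopping once the budget is reached leaves the lowest-ranked records unscreened, which is how a prioritised workflow can reduce manual workload.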

Results: Use of the novel, semi-automated workflow (A) reduced manual title-abstract screening workload by 83% in this review, compared with conventional screening (B), without any loss of recall. Topic modelling did not identify any eligible studies. Searches of other resources identified 4 further eligible studies, but none was published before the date of last search, so these studies were not represented among the c. 157 000 electronic search results. A full set of CEA results will be presented. Even before the full CEA results are available, it is clear that the modelled semi-automated workflow incorporating active learning without topic modelling (C) ‘dominates’ the other options (A and B) in this case, i.e. it would cost less, with identical recall.
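The notion of dominance used in the Results can be made concrete with a small sketch: one option dominates another when it is no worse on both cost and recall and strictly better on at least one. The cost figures below are hypothetical placeholders in arbitrary units, since the abstract reports no costs. Only the ordering stated in the abstract (C costs less than A and B, with identical recall) is reflected, and the relative cost of A and B is an assumption. The names `Workflow`, `dominates` and `dominant_options` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    cost: float    # hypothetical cost in arbitrary units
    recall: float  # proportion of eligible studies identified


def dominates(a, b):
    """True if `a` is at least as good as `b` on cost and recall, and better on one."""
    no_worse = a.cost <= b.cost and a.recall >= b.recall
    strictly_better = a.cost < b.cost or a.recall > b.recall
    return no_worse and strictly_better


def dominant_options(options):
    """Return the options that dominate every other option in the list."""
    return [
        a for a in options
        if all(dominates(a, b) for b in options if b is not a)
    ]


# Placeholder inputs: costs are invented; recall is set equal across options
# because the abstract reports identical recall.
workflows = [
    Workflow("A", cost=40.0, recall=1.0),   # active learning + topic modelling
    Workflow("B", cost=100.0, recall=1.0),  # conventional screening
    Workflow("C", cost=30.0, recall=1.0),   # active learning only
]
print([w.name for w in dominant_options(workflows)])
```

With identical recall across options, the comparison reduces to cost alone, which is why a dominant option can be identified before the full CEA results are available.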

Conclusions: Use of ML tools can make study identification more efficient in Cochrane reviews that have a ‘too many records’ problem. Further evaluations of ML tools are needed to assess the generalisability of this finding and to help build an evidence base for efficient workflow design in reviews.