Automating RCT identification can identify more than 99% of RCTs included in Cochrane reviews and could reduce manual screening by 77%

ID:

3054

Session:

Poster session 3 Friday: Evidence Tools / Evidence synthesis - creation, publication and updating in the digital age

Date:

Friday 15 September 2017 - 12:30 to 14:00

Location:

Exhibition Halls 1 & 2

All authors in correct order:

Thomas J¹, McDonald S², Noel-Storr A³, Shemilt I¹, Wisniewski S³

¹ EPPI-Centre, UCL, London, United Kingdom
² Cochrane Australia, Australia
³ Radcliffe Department of Medicine, Oxford University, United Kingdom

Presenting author and contact person

Presenting author:

James Thomas

Contact person:

James Thomas

Abstract text

Background: The identification of RCTs for inclusion in Cochrane reviews is an extremely labour-intensive task. Machine learning may be able to reduce the manual burden of study identification. Previous evaluations have shown that recall of more than 99% is attainable whilst automatically excluding more than 75% of citations, leaving fewer than 25% for manual checking.

Objectives: To evaluate how well a machine-learning classifier can reduce manual screening workload, measured both by the screening burden saved and by the ‘cost’ of relevant studies being erroneously excluded.

Methods: A machine-learning classifier was built to distinguish between RCTs and non-RCTs using more than 280 000 records from the Cochrane Crowd.

In January 2017, the classifier was applied to all 94 305 citations to studies included in published Cochrane reviews whose inclusion criteria were restricted to RCTs. Records with no abstract were counted as ‘identified’ on the assumption that they would have been manually checked.
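The counting rule described above can be sketched as a small routing function. This is a minimal illustration only, not the authors' implementation: the function name `route_record`, the score scale and the `threshold` value are all hypothetical, since the abstract does not state how the classifier's output was cut off.

```python
from typing import Optional


def route_record(abstract: Optional[str], rct_score: float, threshold: float = 0.1) -> str:
    """Decide whether a citation goes to manual checking or is excluded.

    `rct_score` stands in for the classifier's estimated probability that
    the record is an RCT; `threshold` is a hypothetical cut-off.
    """
    # Records with no abstract are assumed to be checked by hand,
    # so they are never excluded automatically.
    if abstract is None or not abstract.strip():
        return "manual_check"
    # Records the classifier considers plausible RCTs are kept for screening.
    if rct_score >= threshold:
        return "manual_check"
    # Everything else is a low-probability record and is screened out.
    return "auto_excluded"


print(route_record(None, 0.01))                    # no abstract
print(route_record("A randomised trial of", 0.93)) # high score
print(route_record("A narrative review of", 0.02)) # low score
```

Under this rule, an included RCT counts as ‘missed’ only when it has an abstract and still falls below the cut-off.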

Results: The classifier, together with the assumed manual checking of records without abstracts, correctly identified 93 536 RCTs, leaving 769 records as potentially ‘missed’. The ‘missed’ RCTs were included in 510 reviews (11.41% of reviews), with most of these reviews ‘losing’ one or two RCTs (366 and 92 reviews respectively) and two reviews ‘losing’ 13 RCTs.
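The headline recall follows directly from the counts reported above. The short check below uses only figures given in the abstract; the variable names are illustrative.

```python
# Counts reported in the Methods and Results
total_citations = 94_305   # citations to included studies
identified = 93_536        # found by classifier or assumed manual check

missed = total_citations - identified
recall = identified / total_citations

# Reviews that 'lost' exactly one or exactly two RCTs
reviews_losing_one = 366
reviews_losing_two = 92
reviews_losing_one_or_two = reviews_losing_one + reviews_losing_two

print(f"missed records: {missed}")
print(f"recall: {recall:.2%}")
print(f"reviews losing one or two RCTs: {reviews_losing_one_or_two} of 510")
```

The difference reproduces the 769 potentially ‘missed’ records, and the recall comes to roughly 99.2%, consistent with the reported figure of more than 99%.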

Conclusions: The RCT classifier, combined with manual checking of records that lack abstracts, can achieve recall of more than 99% when applied across Cochrane reviews. If it were used to screen out records with a low probability of being RCTs, it might save between 1.75 and 3 million citations per annum from being manually screened.