Machine Learning for Identifying Randomized Controlled Trials: an evaluation and practitioner’s guide

Our new paper, just published in the journal Research Synthesis Methods, evaluates machine learning (ML) for identifying RCTs and shows that ML works better than traditional database search filters.

To help get this into practice, we have released open-source software (available here) that implements our algorithm. It takes a standard database search result (in RIS format) and picks out the RCTs with very high accuracy.
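To give a feel for the workflow described above (RIS records in, likely RCTs out), here is a minimal sketch in Python. It is not our released code: `parse_ris`, `toy_rct_score` and `filter_rcts` are hypothetical names made up for this illustration, and the keyword-based scorer is only a placeholder standing in for a trained classifier.

```python
import re
from typing import Callable, Dict, List

# A RIS tag line looks like "TY  - JOUR": a two-character tag,
# two spaces, a hyphen, then (optionally) a space and the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

Record = Dict[str, List[str]]


def parse_ris(text: str) -> List[Record]:
    """Split RIS text into records, each mapping a tag to its list of values."""
    records: List[Record] = []
    current: Record = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if match is None:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.groups()
        if tag == "ER":  # "ER" marks the end of a record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:  # tolerate a missing final "ER" line
        records.append(current)
    return records


def toy_rct_score(record: Record) -> float:
    """Hypothetical placeholder scorer; a real ML model would go here."""
    fields = []
    for tag in ("TI", "T1", "AB", "N2"):  # common title and abstract tags
        fields.extend(record.get(tag, []))
    text = " ".join(fields).lower()
    return 1.0 if ("randomized" in text or "randomised" in text) else 0.0


def filter_rcts(
    records: List[Record],
    score: Callable[[Record], float] = toy_rct_score,
    threshold: float = 0.5,
) -> List[Record]:
    """Keep only the records whose score reaches the threshold."""
    return [r for r in records if score(r) >= threshold]
```

In this sketch the scorer is passed in as a function, so the keyword placeholder could be swapped for any model that returns a probability per record, with the threshold chosen to trade sensitivity against precision.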

Not all our users will want to run Python code from scratch. We are therefore keen to get our RobotReviewer RCT classifier into as many databases as possible, so that you can use the algorithm at the click of a button. Our RCT classifications are already in the TRIP database, and hopefully more will follow.

We will keep this page updated with new ways of using our RCT classifier as they go live. We're also working on some new posts about why and how to use machine learning in practice; more soon!

We're grateful to the Cochrane Crowd volunteers and to the McMaster HERU team for sharing their data with us, which allowed us to build and validate the tool.