Evidence synthesis (ES) is the process of converting scientific outputs (such as articles and reports) into usable evidence to inform management or policy. ES is critical to evidence-based management in fields ranging from environmental conservation to medicine, but ES projects can be enormously time-consuming and expensive. The idea behind the Evidence Synthesis Hackathon was for academics and programmers to come together and build tools that speed up future evidence synthesis research. The participants were a diverse group, coming from five continents and a wide range of professions. For the R fans reading this post, the hackathon featured talks by the authors of two popular meta-analysis packages: Marc LaJeunesse (Associate Professor of ecology and evolutionary biology at the University of South Florida), creator of the metagear package, and Wolfgang Viechtbauer (Associate Professor of methodology and statistics at Maastricht University), creator of metafor.
Over the three-day hackathon, ten projects were built, ranging from code-driven tools for parsing common documents used in evidence synthesis to academic papers describing how to use particular tools when conducting a systematic review. At the hackathon, Data-Driven Yale was represented by Yale-NUS rising senior Geoffrey Martin and staff data scientist Andrew Feierman. In a way, the contrast between the two DDY attendees reflected the professional diversity of the larger hackathon group: Geoffrey has a traditional computer science background and was introduced to environmental policy largely through his work with Data-Driven Yale, while Andrew began coding as a means to improve upon his existing research on environmental policy. The attendees' diverse backgrounds created an environment ripe for collaboration and interdisciplinary teaching.
Using Natural Language Processing to Improve Search Queries
To help reduce bias in search queries during evidence synthesis, Geoffrey collaborated with fellow hackers Spencer Dixon from the UN Environment World Conservation Monitoring Centre and Panagiotis Bozelos from the University of Oxford on Paperweight, a text processing application driven by a combination of natural language processing (NLP) algorithms. The first steps of the evidence synthesis process typically require reviewers to manually build a database of the articles and journals they want to summarize, which entails an exhaustive search of Google Scholar using manually chosen keywords. This approach is vulnerable to bias: depending on the keywords selected, a reviewer may be more likely to surface certain articles or journals than others. To tackle this problem, Geoffrey and his teammates sought to remove the need for a reviewer to manually choose the keywords that form their search queries.
Here at day 2 of the @SEIclimate #ESHackathon, rising @yalenus senior Geoffrey Martin is hard at work using natural language processing to improve search queries for academic papers pic.twitter.com/4Ug96RWYXT
— Data-Driven EnviroLab (@datadrivenlab) April 24, 2018
In essence, Paperweight takes as input an RIS file of publications (which can be exported from Scopus or Web of Science) that the reviewer is confident should be included in the final evidence synthesis. Paperweight then outputs a list of summary keywords and phrases, extracted using the RAKE and TextRank NLP algorithms, that the reviewer can use in their search query. In this way, the reviewer need only identify the several publications they know will be included in their final review to retrieve a larger list of publications that should also be included. Paperweight does not claim to remove all bias, since the reviewer ultimately still needs to decide on an initial collection of publications, but the team believes it can meaningfully reduce early-stage bias in evidence synthesis. Paperweight is still under development and is open to pull requests at https://github.com/ESHackathon/paperweight-python.
Screenshots from Paperweight demonstrating how a user might input RIS files and retrieve their keyword summaries.
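The extraction step described above can be sketched in Python. The snippet below is a simplified illustration, not Paperweight's actual code: it pulls title and abstract fields out of RIS-formatted text and ranks candidate phrases with a minimal version of the RAKE scoring scheme (TextRank is omitted). The sample record, small stopword list, and function names are all invented for this example.

```python
import re
from collections import defaultdict

# Tiny stopword list for illustration; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "to", "with"}

def parse_ris_fields(ris_text):
    """Pull title (TI) and abstract (AB) lines out of RIS-formatted text."""
    fields = []
    for line in ris_text.splitlines():
        m = re.match(r"(TI|AB)\s+-\s+(.*)", line)
        if m:
            fields.append(m.group(2))
    return fields

def rake_keywords(texts, top_n=5):
    """Minimal RAKE: split text on stopwords into candidate phrases, then
    score each phrase as the sum of its words' degree/frequency ratios."""
    freq = defaultdict(int)   # how often each word appears in any phrase
    cooc = defaultdict(int)   # co-occurrence degree: (phrase length - 1), summed
    phrases = []
    for text in texts:
        words = re.findall(r"[a-z][a-z-]*", text.lower())
        phrase = []
        for w in words + [None]:            # None sentinel flushes the last phrase
            if w is None or w in STOPWORDS:
                if phrase:
                    phrases.append(tuple(phrase))
                    for word in phrase:
                        freq[word] += 1
                        cooc[word] += len(phrase) - 1
                    phrase = []
            else:
                phrase.append(w)
    # RAKE word score is deg(w)/freq(w), where deg(w) = cooc(w) + freq(w)
    def score(phrase):
        return sum((cooc[w] + freq[w]) / freq[w] for w in phrase)
    return [" ".join(p) for p in sorted(set(phrases), key=score, reverse=True)[:top_n]]

SAMPLE_RIS = """TY  - JOUR
TI  - Forest restoration and biodiversity outcomes
AB  - Forest restoration projects improve biodiversity outcomes in tropical forest landscapes.
ER  -
"""

print(rake_keywords(parse_ris_fields(SAMPLE_RIS)))
```

RAKE favors longer multi-word phrases, which is why a phrase spanning most of the abstract ranks first here; Paperweight's combination with TextRank would temper that tendency.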
Building Better Systematic Maps
Andrew worked primarily with a team building eviatlas, a tool designed to streamline the creation of systematic maps. Systematic maps are, according to the Environmental Evidence Journal, “overviews of the quantity and quality of evidence in relation to a broad (open) question of policy or management relevance.” In simple terms, documents on a particular topic are categorized according to the type, location, and publication information available for each work. Systematic maps are often used in environmental research, where it is particularly important to track the location of study sites. Because of this spatial dimension, academics often use some kind of geographic map to analyze and present their information. Knowing the academic community’s familiarity with the R programming language, the team decided to build a web app using R Shiny that could automate parts of creating a systematic map for environmental research.
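The core idea of a systematic map, tallying the available evidence by category, can be sketched in a few lines of Python. This is an illustration only, not eviatlas code (eviatlas itself is an R Shiny app), and the study records and field names below are hypothetical.

```python
from collections import Counter

# Hypothetical study records; the field names are illustrative,
# not eviatlas's actual schema.
studies = [
    {"intervention": "reforestation",  "country": "Brazil",    "year": 2015},
    {"intervention": "reforestation",  "country": "Brazil",    "year": 2017},
    {"intervention": "protected area", "country": "Indonesia", "year": 2016},
    {"intervention": "protected area", "country": "Brazil",    "year": 2018},
]

def systematic_map(records, rows, cols):
    """Cross-tabulate study records into an evidence matrix: how many
    studies fall in each (row category, column category) cell."""
    return Counter((r[rows], r[cols]) for r in records)

evidence = systematic_map(studies, rows="intervention", cols="country")
for (intervention, country), n in sorted(evidence.items()):
    print(f"{intervention:15s} {country:10s} {n}")
```

A tool like eviatlas would plot these counts on a geographic map and let reviewers filter by category interactively; the tabulation above is only the underlying data step.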
Overall, the hackathon was a tremendous experience, and we thank the great hosts at SEI and the Australian National University for having us. The non-competitive approach led to more efficient team-building and increased collaboration, which in turn helped participants build better tools to release to the open-source community. We look forward to applying the collaborative concepts learned at the Evidence Synthesis Hackathon as we organize future hackathons and events.