I’ve been working a lot on automated geocoding of text over the last 6 months, and I’ve found myself consistently describing the same set of tasks or ways to extract location information from text. Here are some quick thoughts on how to schematize these geolocation tasks, how they relate to each other, and where I think the future of the research is.
One of the most important trends in political science is the growth in subnational, geographically disaggregated quantitative research. The prerequisite for this research, of course, is having plentiful and high-quality georeferenced data. The software for generating georeferenced data is often scarce, difficult to build, or hard to use. As part of my work with the Open Event Data Alliance to generate high-quality, freely available political event data, I’ve taken what I think is perhaps the best open source news text geocoding system, MIT’s CLIFF, and packaged it into a virtual machine in the hope that anyone can set it up for their own use in a matter of minutes.
Two weeks ago, I went to the fall conference for the European Conflict Research Network (ENCoRe) in Uppsala, Sweden. The research projects that the (exceptionally welcoming) presenters detailed are really interesting and almost universally involve the production of new datasets. I presented a paper I wrote with John Beieler on some of the work we’re doing with PETRARCH, the Open Event Data Alliance (OEDA), and the production of new event data.
By John Beieler and Andy Halterman
The two of us, along with Phil Schrodt, Patrick Brandt, Erin Simpson, and Muhammed Idris, have been working on several interrelated projects that we believe will improve the availability and quality of event data. We’ve discussed these projects formally at ISA and informally at MPSA and elsewhere. But we think these issues are important enough that they bear repeating here. These four projects are PETRARCH, Phoenix, EL:DIABLO, and the Open Event Data Alliance (OEDA). (Fun game: which of these are acronyms, backronyms, and regular words?)
Jay Ulfelder has a very nice post about the problems that applied researchers face when working with data that changes rapidly in its availability and production. I agree with the suggestions he proposes (which were, very roughly, 1. modularity in applied uses, 2. transparency in generating data, and 3. awareness of the larger data ecosystem), and wanted to add my own. This is a lightly edited version of a comment that I left on his post.
It’s been a very forecast-y week. Between (finally) reading Phil Tetlock’s excellent Expert Political Judgment, going to a half dozen panels at ISA on forecasting and event data, and today’s NPR story about the Good Judgment Project, I’ve been thinking a lot about how to make political forecasts and how we know when they’re good. I wanted to share one of the tools I’ve been using for forecasting and calculating subjective probabilities.
The International Studies Association had its annual conference last week in Toronto. I met many people I knew only online or from the reference sections of papers, and had a great time overall.
Because of the ongoing legal controversy around GDELT, we have no immediate plans to submit the paper for publication, but we welcome any feedback on the methodology and plan to use it in the future.