I’ve been working a lot on automated geocoding of text over the last 6 months, and I’ve found myself consistently describing the same set of tasks, or ways of extracting location information from text. Here are some quick thoughts on how to schematize these geolocation tasks, how they relate to each other, and where I think the research is headed.
One of the most important trends in political science is the growth in subnational, geographically disaggregated quantitative research. The prerequisite for this research, of course, is plentiful, high-quality georeferenced data. Software for generating georeferenced data is often scarce, difficult to build, or hard to use. As part of my work with the Open Event Data Alliance to generate high-quality, freely available political event data, I’ve taken what I think may be the best open source news text geocoding system, MIT’s CLIFF, and packaged it into a virtual machine in the hope that anyone can set it up for their own use in a matter of minutes.