Posts

Bootstrapping your way to active learning

Active learning is great, but what if you don't already have a model? You can bootstrap your way to a machine learning model with majority-vote deterministic rules. Human-labeled data is often the primary bottleneck in building good machine learning models. One of the most important developments in machine learning recently is the development of …

Event Data in 30 Lines of Python

Much of my work involves improving large-scale systems to extract political events from text (see code from our NSF project on the subject here). These systems are designed for full production use over many hundreds of sources, both for daily updates and for historical coverage, in many dozens of event categories, including protests, armed conflict, statements, arrests, …

Three Tasks in Automated Text Geocoding

I've been working a lot on automated geocoding of text over the last six months, and I've found myself consistently describing the same set of tasks or ways to extract location information from text. Here are some quick thoughts on how to schematize these geolocation tasks, relate them to each other, and where I think …

ENCoRe Conference Paper: A New, Near-Real-Time Event Dataset and the Role of Versioning

Two weeks ago, I went to the fall conference for the European Conflict Research Network (ENCoRe) in Uppsala, Sweden. The research projects that the (exceptionally welcoming) presenters detailed are really interesting and almost universally involve the production of new datasets. I presented a paper I wrote with John Beieler on some of the work we're …