Active learning is great, but what if you don’t already have a model? You can bootstrap your way to a machine learning model with majority-vote deterministic rules. Human-labeled data is often the primary bottleneck in building good machine learning models.
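As a rough illustration of what majority-vote deterministic rules can look like, here is a minimal sketch in Python. The keyword rules, the toy "is this about a protest?" task, and the choice to abstain on ties are all assumptions made for the example, not a description of any particular library or of the exact approach in the post.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


# Hypothetical keyword rules for a toy task: 1 = protest story, 0 = not.
def rule_protest_word(text):
    return 1 if "protest" in text.lower() else ABSTAIN


def rule_march_word(text):
    return 1 if "marched" in text.lower() else ABSTAIN


def rule_sports_word(text):
    lowered = text.lower()
    return 0 if "match" in lowered or "league" in lowered else ABSTAIN


RULES = [rule_protest_word, rule_march_word, rule_sports_word]


def majority_vote(text, rules=RULES):
    """Return the label most rules agree on, or ABSTAIN if none or tied."""
    votes = [v for v in (rule(text) for rule in rules) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # If the top two labels have the same number of votes, do not guess.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


docs = [
    "Thousands marched in protest downtown",
    "The league match ended in a draw",
    "Rain expected tomorrow",
]
# Keep only the documents the rules could label; these become seed data.
seed_labels = [(d, majority_vote(d)) for d in docs if majority_vote(d) is not ABSTAIN]
```

The documents that receive a label can serve as noisy training data for a first model, and the ones where the rules abstain are left for a human or for a later active learning round.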
Reproducible research tools like knitr and version control with Git are on their way toward being standard for academic code, even in social science disciplines such as political science. knitr, R Markdown, and Jupyter notebooks make it easy to verify that your findings and figures come from the most recent version of your code and that it runs without errors.
This tutorial covers how to create event data from a new set of text using existing Open Event Data Alliance tools. After going through it, you should be able to use the OEDA event data pipeline for your own projects with your own text.
In a previous version of my blog, I had posts from 2013 and 2014 describing how to work with a dataset called GDELT. I have serious misgivings about the quality of the data contained in GDELT, and after other members of the GDELT team cut ties with the project, I stopped using it as well.
I’ve been working a lot on automated geocoding of text over the last six months, and I’ve found myself consistently describing the same set of tasks or ways to extract location information from text.