ICWSM 2022 Tutorial: Information Extraction for Social Science Research

June 6, 2022 at ICWSM 2022. This workshop provides an interactive introduction to information extraction for social science: techniques for identifying specific words, phrases, or pieces of information contained within documents. It focuses on two common techniques, named entity recognition and dependency parsing, and shows how they can provide useful descriptive data about the civil war in Syria.
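The tutorial itself works with full NLP tooling; as a dependency-free illustration of the core idea (pulling specific spans out of raw text), here is a toy gazetteer-based tagger. The entity list and labels are invented for the example; a trained NER model replaces this lookup in practice.

```python
# Toy gazetteer-based entity tagger: a hand-rolled stand-in for the kind
# of span extraction that trained NER models automate. Entities and
# labels below are invented for illustration.
GAZETTEER = {
    "Aleppo": "LOC",
    "Damascus": "LOC",
    "Free Syrian Army": "ORG",
}

def tag_entities(text):
    """Return (entity, label, start_offset) for each gazetteer hit."""
    hits = []
    for entity, label in GAZETTEER.items():
        start = text.find(entity)
        while start != -1:
            hits.append((entity, label, start))
            start = text.find(entity, start + 1)
    return sorted(hits, key=lambda h: h[2])

doc = "Clashes between the Free Syrian Army and government forces were reported near Aleppo."
print(tag_entities(doc))
```

The obvious weakness, and the motivation for statistical NER, is that a lookup table cannot recognize entities it has never seen.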

Bootstrapping your way to active learning

Human-labeled data is often the primary bottleneck in building good machine learning models. Active learning is great, but what if you don't already have a model to drive it? You can bootstrap your way to one with majority-vote deterministic rules.
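The bootstrapping step can be sketched in a few lines: write several cheap deterministic rules that either vote on a label or abstain, run them all over each document, and take the majority vote as a provisional training label. The rules and label names below are invented for illustration.

```python
from collections import Counter

# Each rule maps a document to a label ("protest" / "other") or abstains
# by returning None. Rules and labels are invented for illustration.
def rule_keywords(doc):
    return "protest" if any(w in doc.lower() for w in ("march", "rally", "demonstr")) else None

def rule_crowd(doc):
    return "protest" if "crowd" in doc.lower() else None

def rule_negative(doc):
    return "other" if "earnings" in doc.lower() else None

RULES = [rule_keywords, rule_crowd, rule_negative]

def majority_vote(doc):
    """Apply every rule; return the most common non-abstain label, or None."""
    votes = [label for rule in RULES if (label := rule(doc)) is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None

print(majority_vote("A crowd marched through the capital on Tuesday."))  # "protest"
```

Documents where every rule abstains simply go unlabeled, which is fine: the labels this produces only need to be good enough to train a first model that active learning can then improve.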

Event Data in 30 Lines of Python

Much of my work involves improving large-scale systems that extract political events from text (see code from our NSF project on the subject here). These systems are designed for full production use over many hundreds of sources, both daily and retrospectively, across many dozens of event categories, including protests, armed conflict, statements, arrests, and humanitarian aid.
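The production pipelines linked above are far more involved, but the basic pattern-based idea behind classic event coders can be sketched briefly. The patterns and category names here are invented for illustration, not taken from any real coding ontology.

```python
import re

# Minimal pattern-based event coder: map (actor, verb phrase, target)
# matches to event categories. Patterns and categories are invented
# for illustration.
PATTERNS = [
    (re.compile(r"(?P<actor>[A-Z][\w ]+?) protested against (?P<target>[A-Z][\w ]+)"), "PROTEST"),
    (re.compile(r"(?P<actor>[A-Z][\w ]+?) arrested (?P<target>[A-Z][\w ]+)"), "ARREST"),
]

def code_events(sentence):
    """Return (actor, category, target) records found in one sentence."""
    events = []
    for pattern, category in PATTERNS:
        for m in pattern.finditer(sentence):
            events.append((m.group("actor").strip(), category, m.group("target").strip()))
    return events

print(code_events("Protesters protested against The Government."))
```

Real systems replace the regexes with dictionaries of thousands of actor and verb patterns (or, increasingly, with trained models), but the output shape, a structured record per event, is the same.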

Managing Machine Learning Experiments

Reproducible methods like knitr and version control using git are on their way toward being standard for academic code, even in social science disciplines such as political science. knitr, Rmarkdown, and Jupyter notebooks make it easy to verify that your findings and figures come from the most recent version of your code and that it runs without errors.

Making Event Data From Scratch: A Step-By-Step Guide

This tutorial covers how to create event data from a new set of text using existing Open Event Data Alliance tools. After going through it, you should be able to use the OEDA event data pipeline for your own projects with your own text.

The future of forecasting: the good and the bad

Science has a special issue this month on forecasting political behavior, including an essay by Cederman and Weidmann that discusses the limitations of current conflict forecasting models, as well as the areas where those models are better than many people, including political science scholars, think they are.

Three Tasks in Automated Text Geocoding

I’ve been working a lot on automated geocoding of text over the last 6 months, and I’ve found myself consistently describing the same set of tasks, or ways to extract location information from text.
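One of those tasks, resolving an already-extracted place name to a single gazetteer entry, can be illustrated with a toy lookup. The gazetteer entries, coordinates, and populations below are invented for the example; real systems query GeoNames and use far richer disambiguation features than population alone.

```python
# Toy place-name resolution: resolve an extracted mention to one
# gazetteer entry, honoring a country hint when available and falling
# back to the most populous candidate. All values are invented for
# illustration.
GAZETTEER = {
    "Homs": [
        {"name": "Homs", "country": "SY", "lat": 34.73, "lon": 36.71, "population": 775000},
        {"name": "Homs", "country": "LY", "lat": 32.65, "lon": 14.26, "population": 202000},
    ],
}

def resolve(mention, country_hint=None):
    """Pick the best gazetteer entry for a place-name mention."""
    candidates = GAZETTEER.get(mention, [])
    if country_hint:
        candidates = [c for c in candidates if c["country"] == country_hint] or candidates
    return max(candidates, key=lambda c: c["population"], default=None)

print(resolve("Homs")["country"])                   # population fallback
print(resolve("Homs", country_hint="LY")["country"])  # hint wins
```

The interesting part of the problem is exactly what this toy skips: deciding which candidate is right when the document gives only indirect clues about the country or region.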

The Good Judgement Project and Bayes' Calculator

It’s been a very forecast-y week. Between (finally) reading Phil Tetlock’s excellent Expert Political Judgement, going to a half dozen panels at ISA on forecasting and event data, and today’s NPR story about the Good Judgement Project, I’ve been thinking a lot about how to make political forecasts and how we know when they’re good.