computational social science | Andy Halterman

computational social science

Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts

Evaluating LLMs' abilities to apply political science codebooks.

Using Synthetic Text Data to Train Better Classifiers

I’m excited to share my latest paper, now out in Political Analysis, which introduces a new approach to training supervised text classifiers. The core idea is simple: instead of relying solely on expensive hand-labeled data, we can use generative large language models (LLMs) to generate synthetic training examples, then fit a classifier on the synthetic text (and any real training data we have).

Codebook LLM Evaluation Framework: Systematic Approach to Large Language Models in Political Science Text Analysis

New framework for evaluating LLMs on political science codebooks with behavioral testing, zero-shot assessment, and instruction tuning guidance.

Simple political actor classification with "soft" dictionaries

As political scientists, we are often interested in using text to understand the actions of political actors. Thankfully, have a growing set of tools for identifying political actors in text, including named entity recognition and dependency parses, custom event models, or hand labeling events text.

Introducing Mordecai 3: A Neural Geoparser and Event Geocoder in Python

Researchers working with text data are often faced with the problem of identifying place names in text and linking them to their geographic coordinates. In social science, we might want to measure news coverage of specific locations, track discussions of specific places in government documents, or geolocate events such protests to the locations where they occur.

Tutorial: Information Extraction for Social Science Research

This workshop provides an interactive introduction to information extraction for social science–techniques for identifying specific words, phrases, or pieces of information contained within documents. It focuses on two common techniques, named entity recognition and dependency parses using the spaCy library, and shows how they can provide useful descriptive data about the civil war in Syria.