I’m excited to share my latest paper, now out in Political Analysis, which introduces a new approach to training supervised text classifiers. The core idea is simple: instead of relying solely on expensive hand-labeled data, we can use generative large language models (LLMs) to generate synthetic training examples, then fit a classifier on the synthetic text (and any real training data we have).
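To make the idea concrete, here is a minimal sketch of that workflow in Python. The `generate_synthetic_examples()` helper is a hypothetical placeholder for whatever LLM prompting setup you use (it is not the paper's implementation), and the classifier step uses scikit-learn's `TfidfVectorizer` and `LogisticRegression` as stand-ins for any supervised text model.

```python
# Minimal sketch: augment hand-labeled data with LLM-generated synthetic
# examples, then fit an ordinary supervised classifier on the combined set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def generate_synthetic_examples(label, n=5):
    """Hypothetical placeholder: replace with your own LLM prompting code
    that asks the model for `n` documents matching `label`."""
    return [f"Synthetic example {i} written to sound {label}." for i in range(n)]

# A tiny hand-labeled set (illustrative only).
real_texts = ["The senator introduced a bill on trade tariffs.",
              "The recipe calls for two cups of flour."]
real_labels = ["political", "not_political"]

# Synthetic examples for each class, generated by the LLM.
synthetic_texts, synthetic_labels = [], []
for label in ["political", "not_political"]:
    for text in generate_synthetic_examples(label, n=100):
        synthetic_texts.append(text)
        synthetic_labels.append(label)

# Fit a standard text classifier on the synthetic + real training data together.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(real_texts + synthetic_texts, real_labels + synthetic_labels)

# Classify new documents as usual.
print(clf.predict(["Parliament debated the new sanctions package."]))
```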
This is Part 1 of a two-part series. Stay tuned for Part 2, which will cover numpy, pandas, and scikit-learn.
R is an extremely powerful language for data analysis, and probably the best language for working with tabular data, running regressions, and making visualizations.
As political scientists, we are often interested in using text to understand the actions of political actors. Thankfully, we have a growing set of tools for identifying political actors in text, including named entity recognition and dependency parsing, custom event models, and hand labeling of events in text. A short example follows below.
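For instance, a few lines of spaCy give you both named entities and dependency relations to work with. This is a minimal sketch, assuming the `en_core_web_sm` model is installed; it is not tied to any particular event-coding pipeline.

```python
# Minimal sketch: extract named entities and dependency relations with spaCy.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("President Macron met Chancellor Scholz in Berlin to discuss sanctions.")

# Named entity recognition: who and what is mentioned.
for ent in doc.ents:
    print(ent.text, ent.label_)          # e.g. "Macron PERSON", "Berlin GPE"

# Dependency parse: who is doing what to whom.
for token in doc:
    if token.dep_ in ("nsubj", "dobj"):  # grammatical subject / direct object
        print(token.text, token.dep_, "of", token.head.text)
```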