I’m excited to share my latest paper, now out in Political Analysis, which introduces a new approach to training supervised text classifiers. The core idea is simple: instead of relying solely on expensive hand-labeled data, we can use generative large language models (LLMs) to produce synthetic training examples, then fit a classifier on the synthetic text (along with any real training data we have).
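To make the workflow concrete, here is a minimal sketch in Python, not the paper's exact method. The `generate_synthetic_example` function is a hypothetical stand-in for whatever LLM call you prefer (it returns canned strings here just to keep the sketch runnable), and the tiny inline "real" dataset and the TF-IDF + logistic regression classifier are illustrative choices, not prescribed by the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def generate_synthetic_example(label: str) -> str:
    # Hypothetical stand-in: in practice, you would prompt a generative
    # LLM, e.g. "Write a short product review expressing a {label} opinion."
    # Canned strings keep this sketch self-contained and runnable.
    canned = {
        "positive": "I loved this product, it exceeded every expectation.",
        "negative": "Terrible purchase, it broke within a week.",
    }
    return canned[label]

# Step 1: generate synthetic training examples for each class.
labels = ["positive", "negative"]
synthetic_texts, synthetic_labels = [], []
for label in labels:
    for _ in range(50):  # synthetic examples per class (illustrative count)
        synthetic_texts.append(generate_synthetic_example(label))
        synthetic_labels.append(label)

# Step 2: pool the synthetic text with any real hand-labeled data we have.
real_texts = ["Great value for the money.", "Would not recommend at all."]
real_labels = ["positive", "negative"]
train_texts = synthetic_texts + real_texts
train_labels = synthetic_labels + real_labels

# Step 3: fit an ordinary supervised classifier on the pooled data.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)
print(clf.predict(["This was a fantastic experience."]))
```

The point of the design is that the downstream classifier is entirely standard; only the provenance of (some of) the training text changes, so any supervised text-classification pipeline could be dropped in at Step 3.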