LLMs | Andy Halterman

LLMs

Using Synthetic Text Data to Train Better Classifiers

I’m excited to share my latest paper, now out in Political Analysis, which introduces a new approach to training supervised text classifiers. The core idea is simple: instead of relying solely on expensive hand-labeled data, we can prompt generative large language models (LLMs) for synthetic training examples, then fit a classifier on the synthetic text (and any real training data we have).
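To make the workflow concrete, here is a minimal sketch, not the paper's actual pipeline: the hard-coded "synthetic" sentences stand in for LLM generations (in practice you would prompt a model for them), and they are pooled with a small set of real labeled examples before fitting an ordinary scikit-learn classifier.

```python
# Sketch: combine synthetic (LLM-style) and real labeled text,
# then fit a standard TF-IDF + logistic regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-ins for LLM output, e.g. from a prompt like
# "Write a short news sentence describing a protest event."
synthetic = [
    ("Thousands marched downtown demanding electoral reform.", "protest"),
    ("Demonstrators gathered outside parliament chanting slogans.", "protest"),
    ("The central bank raised interest rates by half a point.", "not_protest"),
    ("Quarterly earnings beat analyst expectations on Tuesday.", "not_protest"),
]

# A (small) set of real hand-labeled examples.
real = [
    ("Protesters blocked the highway for three hours.", "protest"),
    ("The company announced a new chief executive.", "not_protest"),
]

# Pool synthetic and real data and fit on the union.
texts, labels = zip(*(synthetic + real))
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Demonstrators marched demanding reform."])[0])
```

The classifier itself is deliberately boring; the point of the approach is that the synthetic pool lets you train it when hand-labeled data alone would be too scarce.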

EM by Cardi B (and ChatGPT)

(Verse 1) \ Listen up, let me tell you ‘bout this algorithmic trick, \ It’s called EM, gonna make your models sick. \ When you got hidden variables, feeling kinda lost,\