Extracting Political Events from Text Using Syntax and Semantics


Many questions in empirical political science concern the relations between political actors. Researchers have long used text as a source of data on political actors' behaviors and relationships. Manually extracting this information from text is slow and expensive, while existing automated methods are inaccurate or limited to a small set of pre-defined actors and actions. This paper formalizes the process of event extraction and introduces a method for identifying the words in text that report who is doing what to whom, along with where, when, why, and as reported by whom. To do so, it draws on natural language processing tools that provide information about the syntactic structure of a sentence and on neural networks trained on a diverse set of hand-labeled text. The extracted actors and events can be analyzed with hand-constructed dictionaries or classifiers, or can be clustered to inductively find types of actors or behaviors. I compare the performance of the model with existing techniques on a corpus of text from the Times of India. I then apply it to State Department reporting on human rights, extracting 1 million events from the corpus and applying a clustering algorithm to learn categories of human rights abuses.

Andy Halterman
PhD Candidate

My research interests include natural language processing, text as data, and subnational armed conflict.