Working Papers

Learning Political Events From Text

Andrew Halterman

Measuring what political actors do in the world is at the core of empirical social science, and researchers expend enormous effort to compile data on events and behavior from text. I introduce a method for inductively and automatically learning political events from text. Qualitative coding of events is slow and expensive, and it limits researchers’ ability to explore large collections of text, while existing quantitative methods are brittle, expensive to create, and not suited to inductively learning event categories from text. I decompose the problem of learning events into two components. The first is to take in text and identify the key elements of an “event”: who is doing what to whom, where and when, as reported by whom? Each of these is a “slot” to be filled. I introduce a method for recognizing the span of text associated with each slot that combines information from the dependency parse with a machine learning classifier that distinguishes between words that occupy the same grammatical role in a sentence but fill different slots (e.g. “fired missiles” vs. “fired Tillerson”). The second task is to aggregate these extracted spans in a way that is useful for researchers. I introduce a new short text clustering algorithm that draws on prior information in the form of word embeddings to learn different event types from text. These techniques are not domain specific, so researchers can apply them to their own questions in the same way they would use topic models. I apply the method to an ongoing debate in international politics about the level of respect for human rights. I extract and cluster over one million events reported in State Department reports and find that although the number of reported violations is increasing, the types of actions and the specificity of each have changed over time, suggesting a changing standard of reporting.

Do the answers you get depend on the news you read? Protests and violence in Syria

Andrew Halterman, Jill Irvine, and Khaled Jabr

Machine-produced event data from news text is a cheap, accurate, and useful source of empirical data for researchers in political science. Many quantitative analyses rely on English language text rather than text in the local language. We investigate the relationship between protests and subsequent violence in Syria and demonstrate that the results differ substantially between data coded from English and from Arabic news sources. Using a gold-standard hand-coded dataset (Mazur 2018), we find a significant effect of protests on subsequent violence in the locality. The result holds when using data coded from Arabic sources, but becomes insignificant when using English language text. These results suggest that researchers should include text from the local language when using automated text analysis to study subnational outcomes. We also offer guidance on which statistical classifiers perform best at detecting protest events in short text. While a neural net classifier using state-of-the-art BERT embeddings slightly outperforms other models and feature representations in Arabic, a simple random forest on a bag-of-words performs best in English.

Where Do Perceptions Match Provision? A Method for Learning the Scope Conditions of Retrospective Evaluation Theories

Paige Bollen, Andrew Halterman, and Blair Read

Citizens draw on a range of heuristics when evaluating government performance, including assessments of their personal material well-being and salient identities. The literature in African politics has sometimes neglected the role of egotropic heuristics in favor of focusing on identity considerations. We empirically evaluate the extent to which each type of heuristic explains citizens’ assessments of government policy performance, focusing on citizen satisfaction with roads. We employ an empirical method that treats individual theory membership as a latent variable, estimating the extent to which each citizen’s evaluation of service delivery is driven by identity or egotropic considerations. We introduce a new technique that estimates the scope conditions that predict consistency with each theory, finding that democratic competition and growth make identity-based heuristics more relevant, whereas collective hardship makes egotropic heuristics more salient.

Violence against civilians in the Syrian civil war

Andrew Halterman

Why have armed groups in the Syrian civil war deliberately killed so many civilians? Existing theories of civilian targeting in war offer indeterminate predictions about violence against civilians in civil war: targeting civilians can “drain the sea” but lose “hearts and minds”, be rational or driven by emotions, be carefully targeted or indiscriminately applied, or be the inadvertent byproduct of conventional fighting. I compile the largest available micro-dataset on civilian death in civil war, comprising data on the dates, locations, and causes of over 100,000 civilian deaths in the Syrian war, along with fine-grained data on armed groups’ territorial control and locations of arrests during the protests in 2011. Using this data, I systematically evaluate existing theories’ abilities to explain violence in Syria. I find little support for prominent theories of violence against civilians that emphasize the importance of “hearts and minds,” intelligence, territorial control, principal-agent problems, or “desperation”. Instead, strategic logics of deliberate civilian violence, especially “political” repression in areas of anti-regime mobilization and “war winning” mass violence, explain the majority of casualties in Syria. The new micro-level dataset will contribute to other studies of violence in the context of civil war.


How Right Wing is Right Wing Populism? Evidence from the Manifesto Corpus

Jill Irvine, Andrew Halterman, and Nicholas Halterman


Right wing populist parties in Europe are clearly different from other right wing parties in their rhetoric and electoral appeal. Some observers see substantive differences between right wing populists and other right wing parties, with populists supporting the welfare state and gender equality more than other right wing parties, often as part of an anti-immigration and anti-Muslim agenda. We test this claim using novel data produced by a multilingual convolutional neural net on political party platforms for the years 1990 to 2015 from the Manifesto Corpus. We find no systematic differences between right wing populists and non-populists on support for welfare and gender equality, though there is some evidence that more successful populists are more centrist.