ENCoRe Conference Paper: A New, Near-Real-Time Event Dataset and the Role of Versioning

Two weeks ago, I went to the fall conference for the European Conflict Research Network (ENCoRe) in Uppsala, Sweden. The research projects that the (exceptionally welcoming) presenters detailed are really interesting and almost universally involve the production of new datasets. I presented a paper I wrote with John Beieler on some of the work we’re doing with PETRARCH, the Open Event Data Alliance (OEDA), and the production of new event data.

I also had several people ask me about how they could contribute to dictionary development for PETRARCH, which is one of the main objectives of OEDA. Expect some more information soon about how researchers can contribute modifications and improvements to the dictionaries.

Our paper addressed a topic I’ve discussed previously on this blog, which is the tradeoff between frequent updates to the dataset and the kind of data stability that’s needed for good forecasting and causal inference work. We show that a small number of changes to the dictionaries used to code our data can create large changes in the resulting data. As more people get involved with improving dictionaries, these changes will come more and more frequently (which on the whole is extremely good). We argue that there are two basic categories of event data users: those who are doing monitoring and those who are doing forecasting and inference work. These two groups have very different needs regarding dictionary updates. To accommodate both, we will version our data and commit to supporting major versions for a year, so people engaged in monitoring can always access the most up-to-date and improved form of the data, but people who require stability in the data generating process can be assured of data consistency.
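
To make the tradeoff concrete, here is a minimal Python sketch. It is not PETRARCH's coding logic or dictionary format; the dictionaries, sentences, event labels, version numbers, and function names below are all made up for illustration. The first half shows how a single added dictionary entry can shift event counts, and the second half shows one way a monitoring user could ask for the newest release while a forecasting user pins a major version.

```python
from collections import Counter

# Hypothetical toy dictionaries mapping a verb phrase to an event label.
# These are illustrative only and do not reflect PETRARCH's real format.
DICT_V1 = {"attacked": "FIGHT", "met with": "CONSULT"}
DICT_V2 = {**DICT_V1, "clashed with": "FIGHT"}  # one added entry

# Made-up example sentences, not real news text.
SENTENCES = [
    "Rebels attacked a convoy near the border.",
    "Protesters clashed with police in the capital.",
    "Militia clashed with government troops.",
    "The minister met with delegates.",
]


def code_events(sentences, dictionary):
    """Count events by label using naive substring matching (first match wins)."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        for phrase, label in dictionary.items():
            if phrase in text:
                counts[label] += 1
                break
    return counts


def parse_version(version):
    """Turn a 'major.minor.patch' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(available, major=None):
    """Return the newest available version, optionally within one major version."""
    candidates = [
        v for v in available if major is None or parse_version(v)[0] == major
    ]
    return max(candidates, key=parse_version)


if __name__ == "__main__":
    print(code_events(SENTENCES, DICT_V1))
    print(code_events(SENTENCES, DICT_V2))

    releases = ["1.0.0", "1.2.0", "1.10.0", "2.0.0"]  # hypothetical releases
    print(resolve_version(releases))           # monitoring: newest data
    print(resolve_version(releases, major=1))  # forecasting: pinned major version
```

In this toy setup, one new entry changes the number of coded FIGHT events while the underlying text stays the same, which is the kind of shift that matters for anyone fitting a model to the data. Pinning a major version keeps the data-generating process fixed for those users.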

The full paper is available here: A New, Near-Real-Time Event Dataset and the Role of Versioning.

All the replication code is available on the Caerus Associates GitHub. Any comments and feedback are welcome.
