Two weeks ago, I went to the fall conference of the European Network for Conflict Research (ENCoRe) in Uppsala, Sweden. The research projects that the (exceptionally welcoming) presenters detailed were really interesting and almost universally involve the production of new datasets. I presented a paper I wrote with John Beieler on some of the work we’re doing with PETRARCH, the Open Event Data Alliance (OEDA), and the production of new event data.
Several people also asked me how they could contribute to dictionary development for PETRARCH, which is one of the main objectives of OEDA. Expect more information soon about how researchers can contribute modifications and improvements to the dictionaries.
Our paper addressed a topic I’ve discussed previously on this blog: the tradeoff between frequent updates to the dataset and the kind of data stability that’s needed for good forecasting and causal inference work. We show that a small number of changes to the dictionaries used to code our data can create large changes in the resulting data. As more people get involved with improving the dictionaries, these changes will come more and more frequently (which on the whole is extremely good). We argue that there are two basic categories of event data users: those who are doing monitoring and those who are doing forecasting and inference work. These two groups have very different needs regarding dictionary updates. To accommodate both, we will version our data and commit to supporting major versions for a year. People engaged in monitoring can then always access the most up-to-date and improved form of the data, while people who require stability in the data-generating process can be assured of data consistency.
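To make the two use cases concrete, here is a minimal sketch in Python of how a user might pick a release under a versioning scheme like this. Everything in it is an assumption for illustration: the `major.minor` version strings, the release list, and the `select_version` helper are hypothetical and are not taken from the paper or from any OEDA tool. It also assumes that dictionary changes only happen between major versions, so staying within one major version keeps the coding rules fixed.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 2) for "1.2"


def parse_version(text: str) -> Version:
    """Turn a hypothetical 'major.minor' string into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def select_version(available: List[str], pinned_major: Optional[int] = None) -> str:
    """Pick a release for a given kind of user.

    Monitoring users leave pinned_major as None and get the newest release.
    Forecasting/inference users pin a major version and get the newest
    release within it (assumed here to share the same dictionaries).
    """
    parsed = sorted(parse_version(v) for v in available)
    if pinned_major is not None:
        parsed = [v for v in parsed if v[0] == pinned_major]
    if not parsed:
        raise ValueError("no release matches the requested major version")
    major, minor = parsed[-1]
    return f"{major}.{minor}"


# Hypothetical release list, for illustration only.
releases = ["1.0", "1.1", "2.0", "1.2", "2.1"]

monitoring_choice = select_version(releases)                    # newest overall
forecasting_choice = select_version(releases, pinned_major=1)   # newest 1.x
print(monitoring_choice, forecasting_choice)
```

With the made-up list above, a monitoring user ends up on the newest release overall, while a forecasting user who pinned major version 1 stays on the newest 1.x release even after 2.x appears.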
The full paper is available here: A New, Near-Real-Time Event Dataset and the Role of
All the replication code is available on the Caerus Associates GitHub. Any comments and feedback are welcome.