Subsetting GDELT for Domestic Events

Update: This post is only useful if you’re using the reduced dataset. The complete dataset includes an ActionGeo_CountryCode field that you can easily use to pull only events occurring in a country of interest. I recommend using the full dataset.
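
For readers taking the full-dataset route, here is a minimal Python sketch of that filter, not a tested pipeline. It assumes the full GDELT event files are tab-delimited with no header row, that ActionGeo_CountryCode holds two-letter FIPS country codes ("GG" for Georgia), and that you supply the column names yourself from the GDELT codebook. The function name, the four-column layout, and the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_action_country(lines, column_names, country_code):
    """Yield rows (as dicts) whose ActionGeo_CountryCode equals country_code.

    `lines` is any iterable of tab-delimited text lines (an open file works).
    `column_names` must list the file's columns in order; the real list
    comes from the GDELT codebook and is NOT reproduced here.
    """
    reader = csv.DictReader(
        lines,
        fieldnames=column_names,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    for row in reader:
        if row["ActionGeo_CountryCode"] == country_code:
            yield row


# Hypothetical, heavily truncated column layout and made-up rows,
# standing in for a real GDELT file.
COLUMNS = ["GLOBALEVENTID", "Actor1Code", "Actor2Code", "ActionGeo_CountryCode"]
SAMPLE = (
    "1\tGEOGOV\tGEOOPP\tGG\n"
    "2\tGEO\tRUS\tRS\n"
    "3\tUSA\tGEO\tGG\n"
)

in_georgia = list(filter_by_action_country(io.StringIO(SAMPLE), COLUMNS, "GG"))
print([row["GLOBALEVENTID"] for row in in_georgia])  # prints ['1', '3']
```

Note that the made-up event 3 is kept even though one actor is USA: this filter selects on where the event happened, not on who was involved.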

As part of a project examining the effects of U.S. democracy and governance assistance on civil society, I need to subset GDELT to include only events occurring inside one country. This is my walkthrough of how I pulled only the events occurring inside Georgia between 1979 and 2012 from the GDELT reduced dataset.

  1. If you use Phil Schrodt’s Python script (included in the reduced dataset download) to subset the dataset with “-src GEO -tar GEO”, you end up with only events that have “GEO” in both actor codes. This gives you 1,874 events from 1979 to 2012, obviously too few. It excludes events that have GEO in only one of the two actor codes, for example, Actor 1 as GEOMED and Actor 2 as GOV.

  2. Using “-src GEO -tar ALL” in the included Python script gives you all events with Georgia in either the source or the target. This gives you 37,309 events.

  3. However, this also returns a lot of GEO-USA, RUS-GEO, and similar events, which aren’t what we want if we only want domestic events. We have to get rid of all events that involve other states by filtering on their state codes. Here’s the R code I use to do that.

