Voter Turnout
We first load the full Voter Turnout dataset from IDEA VT. You can download it from http://www.idea.int/vt/viewdata.cfm
import pandas as pd
df = pd.read_csv('~/Downloads/Voter_Turnout_IDEA_VT.csv')
df.head()
The codebook gives us some additional information about the variables. The following variable explanations will help us to make sense of the data.
vt: Represents voter turnout, given in percentages. Total vote / Registration
vote: Total vote, total number of voters
reg: registration, the number of people who were registered for elections
vapvt: Voting age population (VAP) turnout statistics: total vote / estimated voting age population
vap: Voting age population, an estimate of the total number of potential voters of voting age. Note: in some countries, the number of registered voters is higher than the VAP.
pop: population
invot: invalid votes, includes blank votes
fhav: Freedom House indicator, an average of the scores assigned on the Political Rights and Civil Liberties dimensions.
fhpr: Freedom House -- Political rights - from 1 (free) to 7 (not free)
fhcl: Freedom House -- Civil liberties
comp: Provides data on whether a country mandates compulsory voting through legislation.
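To make the two turnout definitions concrete, here is a minimal sketch on made-up numbers. The toy frame and its values are illustrative only and are not taken from the IDEA data; the column names simply follow the codebook above. It recomputes vt and vapvt from vote, reg, and vap, and the second row shows what happens when more people are registered than the estimated voting age population.

```python
import pandas as pd

# Toy rows with made-up numbers; column names follow the codebook.
toy = pd.DataFrame({
    'vote': [600, 450],
    'reg': [1000, 900],
    'vap': [1200, 750],  # second row: more registered voters than VAP
})

# vt: total vote / registration, as a percentage
toy['vt'] = 100 * toy['vote'] / toy['reg']
# vapvt: total vote / estimated voting age population, as a percentage
toy['vapvt'] = 100 * toy['vote'] / toy['vap']

print(toy)
```

When registration exceeds the VAP, as in the second toy row, vapvt comes out higher than vt, so the two measures can rank the same election differently.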
I'm not a huge fan of the names of these variables, so I'm going to go ahead and rename some of them.
df.columns = ['Country', 'Election Type', 'Year', 'Voter Turnout', 'Total Vote', 'Registered',
              'Voting Age Population Turnout Percent', 'Voting Age Population',
              'Population', 'Invalid Votes', 'Freedom House Indicator', 'Freedom House Indicator: Political Rights',
              'Freedom House Indicator: Civil Liberties', 'Compulsory Voting Mandate']
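Assigning a list to df.columns depends on the columns appearing in the CSV in exactly that order. A more defensive alternative is DataFrame.rename with an explicit mapping, sketched below on a toy frame. The short names used as keys are assumed from the codebook and may not match the real CSV headers, so treat them as placeholders.

```python
import pandas as pd

# Hypothetical short column names taken from the codebook; adjust to the real headers.
toy = pd.DataFrame({'vt': [60.0], 'vote': [600], 'reg': [1000]})

# Map each old name to its readable replacement; unlisted columns are left alone.
name_map = {
    'vt': 'Voter Turnout',
    'vote': 'Total Vote',
    'reg': 'Registered',
}
renamed = toy.rename(columns=name_map)

print(list(renamed.columns))
```

With a mapping, a reordered or extended CSV leaves the other columns untouched instead of silently mislabeling them.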
Great! Now, let's start asking some questions.
df.Country.nunique() #number of countries
df.describe()
df['Election Type'].value_counts()
print(min(df.Year), max(df.Year)) #Year range
#Getting stuff ready for nice visualizations
import matplotlib.pyplot as plt
import brewer2mpl
# Set up some better defaults for matplotlib
from matplotlib import rcParams
#colorbrewer2 Dark2 qualitative color table
dark2_colors = brewer2mpl.get_map('Dark2', 'Qualitative', 7).mpl_colors
rcParams['figure.figsize'] = (14, 10)
rcParams['figure.dpi'] = 150
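#'axes.color_cycle' is no longer supported by current matplotlib; 'axes.prop_cycle' replaces it
from cycler import cycler
rcParams['axes.prop_cycle'] = cycler(color=dark2_colors)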
rcParams['lines.linewidth'] = 2
rcParams['axes.facecolor'] = 'white'
rcParams['font.size'] = 14
rcParams['patch.edgecolor'] = 'white'
rcParams['patch.facecolor'] = dark2_colors[0]
rcParams['font.family'] = 'StixGeneral'
def remove_border(axes=None, top=False, right=False, left=True, bottom=True):
    """
    Minimize chartjunk by stripping out unnecessary plot borders and axis ticks
    The top/right/left/bottom keywords toggle whether the corresponding plot border is drawn
    """
    ax = axes or plt.gca()
    ax.spines['top'].set_visible(top)
    ax.spines['right'].set_visible(right)
    ax.spines['left'].set_visible(left)
    ax.spines['bottom'].set_visible(bottom)
    #turn off all ticks
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')
    #now re-enable visibles
    if top:
        ax.xaxis.tick_top()
    if bottom:
        ax.xaxis.tick_bottom()
    if left:
        ax.yaxis.tick_left()
    if right:
        ax.yaxis.tick_right()
%matplotlib inline
df.hist()
plt.figure()
plt.ylim([0,100])
country_group = df.groupby('Country')
country_group.size()
#Just presidential elections for now
for country in df.Country.unique():
    #.ix and .sort were removed from pandas; use .loc and .sort_values
    country_df = df.loc[df.Country == country]
    country_df = country_df.loc[country_df['Election Type'] == 'Presidential']
    country_df = country_df.sort_values('Year')
    plt.plot(country_df.Year, country_df['Voter Turnout'])
#and now just the parliamentary elections
plt.figure()
plt.ylim([0,100])
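#assumes the label is 'Parliamentary'; check df['Election Type'].value_counts() for the exact spelling
for country in df.Country.unique():
    #.ix and .sort were removed from pandas; use .loc and .sort_values
    country_df = df.loc[df.Country == country]
    country_df = country_df.loc[country_df['Election Type'] == 'Parliamentary']
    country_df = country_df.sort_values('Year')
    plt.plot(country_df.Year, country_df['Voter Turnout'])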
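The per-country loops above can also be expressed as a single reshape: filter to one election type, then pivot so that each country becomes a column indexed by year. The sketch below uses a small made-up frame (countries 'A' and 'B' and their turnout figures are invented for illustration) and assumes the renamed columns from earlier.

```python
import pandas as pd

# Made-up rows; column names match the renamed DataFrame.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'A'],
    'Election Type': ['Presidential', 'Presidential', 'Presidential',
                      'Presidential', 'Parliamentary'],
    'Year': [2004, 2000, 2000, 2004, 2002],
    'Voter Turnout': [70.0, 65.0, 55.0, 58.0, 61.0],
})

# Keep one election type, then reshape: rows are years, columns are countries.
pres = toy[toy['Election Type'] == 'Presidential']
wide = pres.pivot_table(index='Year', columns='Country', values='Voter Turnout')

print(wide)
# wide.plot(ylim=(0, 100)) would then draw one line per country.
```

Note that pivot_table averages any duplicate (Year, Country) pairs by default, which may or may not be what you want if a country held two elections of the same type in one year.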
df.Country.unique()
df_pres = df.loc[df['Election Type'] == 'Presidential']
df_pres = df_pres[['Country', 'Year', 'Voter Turnout']]
#calculate world average (unweighted mean of turnout across countries, per year)
#select the numeric column explicitly: recent pandas raises on mean() over the string 'Country' column
turnout_average = df_pres.groupby('Year', as_index=False)['Voter Turnout'].mean()
turnout_average['Country'] = 'World Average'
turnout_average.head()
df_pres = pd.concat([df_pres, turnout_average])
df_pres.dropna(inplace=True)
%cd ~/Downloads/
df_pres.to_csv('Voter_Turnout_IDEA_VT_Presidential.csv')
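As a sanity check on the world-average step, here is the same sequence run on a tiny made-up frame where the expected averages are easy to verify by hand. The countries and turnout values are invented for illustration.

```python
import pandas as pd

# Made-up presidential turnout figures for two countries.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'Year': [2000, 2000, 2004, 2004],
    'Voter Turnout': [60.0, 80.0, 50.0, 70.0],
})

# Per-year mean turnout, kept as a regular column layout.
avg = toy.groupby('Year', as_index=False)['Voter Turnout'].mean()
avg['Country'] = 'World Average'

# Append the average rows to the per-country rows; concat aligns columns by name.
combined = pd.concat([toy, avg], ignore_index=True)

print(combined)
```

This is an unweighted mean over the elections held in each year, so a small country counts as much as a large one, and years with few elections produce noisy averages.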