Notes

This document illustrates the preprocessing of the dataset visualized in this article on srf.ch.

SRF Data attaches great importance to transparent and reproducible data preprocessing and -analysis. SRF Data believes in the principles of open data but also open and reproducible methods. Third parties should be empowered to build on the work of SRF Data and to generate new analyses and applications.

R-Script & processed data

The preprocessing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. Through executing main.Rmd, the herein described process can be reproduced and this document can be generated. In the course of this, data from the folder ìnput will be processed and results will be written to output.

Attention: Please set your working directory in the first code chunk!

GitHub

The code for the herein described process can also be freely downloaded from https://github.com/srfdata/2015-06-elections-partystrengths. Criticism in the form of GitHub issues and pull requests are very welcome!

License

Creative Commons License
2015-06-elections-partystrengths by SRF Data is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Exclusion of liability

The published information has been collated carefully, but no guarantee is offered of its completeness, correctness or up-to-date nature. No liability is accepted for damage or loss incurred from the use of this script or the information drawn from it. This exclusion of liability also applies to third-party content that is accessible via this offer.

Other projects

All code & data from SRF Data is available under http://srfdata.github.io.

Data description

Original data source

  • input/parteienstaerke.xlsx - Communal party strengths, received by the Federal Statistical Office (BFS) (not publicly available as far as we know)
  • input/parteienstaerke_mod_2.xlsx - Slightly and manually structurally adjusted parteienstaerke.xlsx in order to make reading into R easier - note: file seems to have errors when opened via Excel/OpenOffice/etc. but still contains valid, R-readable data
  • input/parteienstaerke_national.xlsx - National party strengths, without formatting, as publicly available from the BFS
  • input/gdenamen_translation.xlsx - Common translations for Swiss municipalities, without formatting, as publicly available from the BFS (sheet “Geb. Übers. - Trad. usuelles”)
  • input/geodata/g2g14.* - Official, unaltered, generalized municipality borders (2014) as available from the BFS

Description of output

The following sections describe the results of the data preprocessing as stored in the output folder.

output/municipalities.csv

Contains metadata for all 2’345 municipalities as of May 1st, 2014, derived from input/parteienstaerke_mod_2.xlsx and input/gdenamen_translation.xlsx.

Attribute Type Description
ID Integer Official BFS-number
Name_Official String Official name of the municipality
Name_D String Contains the “common” German translation if it exists
Name_F String Contains the “common” French translation if it exists
Name_I String Contains the “common” Italian translation if it exists

output/parties.csv

Contains party classifications made by SRF Data with the help of political scientists, used throughout all projects related to elections.

Attribute Type Description
ID Integer Unique identifier
Abbr_* String Abbreviation in German (D), French (F), English (E), Romansh (R), Italian (I), respectively
Legend_* String Abbreviation, but with slightly more information, used for frontend purposes
Name_* String Full name
Sortorder Integer Used for frontend purposes solely
OLD_ID String “Official” ID as given in input/parteienstaerke_mod_2.xlsx (sheet “Parteien”), used for combining party strengths for party groupings

output/partystrengths.csv

Party strengths 1971-2011 in all 2’345 municipalities, as derived from input/parteienstaerke_mod_2.xlsx. Note: The municipalities have already been harmonized to May 1st 2014 by the BFS.

Note: Only those municipality-year-party combinations are contained where the party actually had candidates.

Attribute Type Description
municipality Integer Unique identifier, referencing ID in output/municipalities.csv
party Integer Party or party grouping, referencing ID in output/parties.csv
year Integer Election year
support Double Party strength in fraction of 1 (e.g. 0.2 = 20%)

output/by_municipality/municipality_{id}.csv

Party strengths 1971-2011 in municipality with BFS-number id.

Note: Only those year-party combinations are contained where the party actually had candidates.

Attribute Type Description
party Integer Party or party grouping, referencing ID in output/parties.csv
year Integer Election year
support Double Party strength in fraction of 1 (e.g. 0.2 = 20%)

output/by_municipality/national.csv

Party strengths 1971-2011 on the national level.

Note: Only those year-party combinations are contained where the party actually had candidates.

Attribute Type Description
party Integer Party or party grouping, referencing ID in output/parties.csv
year Integer Election year
support Double Party strength in fraction of 1 (e.g. 0.2 = 20%)

output/by_party/party_{id}.csv

Party strengths 1971-2011 in all municipalities for party with id as defined in output/parties.csv.

Note: Only those year-municipality combinations are contained where the party actually had candidates.

Attribute Type Description
municipality Integer Unique identifier, referencing ID in output/municipalities.csv
year Integer Election year
support Double Party strength in fraction of 1 (e.g. 0.2 = 20%)

output/by_party/dominant_party.csv

Party strengths 1971-2011 with only the “dominant” party per year-municipality combination (i.e. the one with the most support)

Attribute Type Description
municipality Integer Unique identifier, referencing ID in output/municipalities.csv
party Integer Party or party grouping, referencing ID in output/parties.csv
year Integer Election year
support Double Party strength in fraction of 1 (e.g. 0.2 = 20%)

Preparations

Define packages

# von https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
cat("library(magrittr)
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)", 
file = "manifest.R")
package_date <- "2015-08-27"

Install packages

if(!require(checkpoint)) {
  if(!require(devtools)){
    install.packages("devtools", repos = "http://cran.us.r-project.org")
    require(devtools)
  }
  devtools::install_github("checkpoint", username = "RevolutionAnalytics", ref = "v0.3.2", repos = "http://cran.us.r-project.org")
  require(checkpoint)
}
if(!dir.exists("~/.checkpoint")){
  dir.create("~/.checkpoint")
}
checkpoint(snapshotDate = package_date, project = path_to_wd, verbose = T, scanForPackages = T, use.knitr = F)
rm(package_date)

Load packages

source("manifest.R")
unlink("manifest.R")

Load necessary scripts

knitr::read_chunk('scripts/preprocessing.R')
source('scripts//preprocessing.R')

Preprocessing

Municipality data

# load data
# read in input file (xlsx) - need to load a slightly modified input file as the original xlsx makes it really complicated
raw_data <- read_excel("input/parteienstaerke_mod_2.xlsx", sheet = 1, skip = 12, col_names = T)[1:2345,] # replace this subsetting with bugfix in readxl
parties <- read_excel("input/parteienstaerke_mod_2.xlsx", sheet = 4)[1:24,]
# all years 
years <- seq(1971, 2011, 4)
length(years) # we have 11 different time stamps
## [1] 11
# first problem: not all years have the same amount of parties, thus we need to fill up by year
# first split the columns into sets for each year.

# doublecheck that "1" is contained as many times as we have years
table(names(raw_data) == "1")[2] == length(years)
## TRUE 
## TRUE
# this is true, so we can split the columns by this column name
indices <- which(names(raw_data) == "1")
yearlyDataframes <- list()
for(i in 1:length(indices)){
  start <- indices[i]
  stop <- if(is.na(indices[i + 1])) ncol(raw_data) else indices[i + 1] - 1 
  yearlyDataframes[[i]] <- prepareYearlyDataForVis(raw_data[,start:stop], years[i])
}
# combine
filled_data <- cbind(raw_data[, 1:indices[1] - 1], yearlyDataframes)
# doublecheck: column numbers should be 2 + 11 * 24
dim(filled_data)[2] == 2 + 11 * 24
## [1] TRUE
# aggregate parties (cf. to column OLD_ID in parties.csv)
for(year in years){
  # FDP
  filled_data <- combineAndDelete(1, c(1, 5), year)
  # Kleine Rechtsparteien
  filled_data <- combineAndDelete(16, c(16, 15, 14, 17), year)
  # Kleine Linksparteien
  filled_data <- combineAndDelete(9, c(9, 27, 10, 11, 12), year)
  # Andere
  filled_data <- combineAndDelete(8, c(8, 6, 26, 35, 33), year)
}
# gather into long form
long_data <- filled_data %>%
  gather(ParteiJahr, Staerke,-c(BFSNr, GdeName))

# add new columns Jahr and Partei from splitted ParteiJahr
long_data %<>% separate(col = ParteiJahr, into = c("Partei","Jahr")) %>%
  mutate(Jahr = as.Date(Jahr, "%Y")) %>%
  tbl_df()


# divide strength values through 100
long_data %<>% mutate(Staerke = Staerke / 100)
# only retain BFSNr
long_data %<>% select(BFSNr, Partei, Jahr, Staerke)

National data

# country
# read in input file (xlsx) - need to load a slightly modified input file as the original xlsx makes it really complicated
raw_data_country <- read_excel("input/parteienstaerke_national.xlsx", sheet = 1, skip = 0, col_names = T)[1:24,] # replace this subsetting with bugfix in readxl

long_data_country <- raw_data_country %>%
  gather(Jahr, Staerke, -Partei) %>%
  mutate(Jahr = as.Date(Jahr, "%Y")) %>%
  select(Partei, Jahr, Staerke) %>%
  rename(party = Partei, year = Jahr, support = Staerke) %>%
  mutate(support = support / 100, year = as.numeric(format.Date(year, "%Y")))

# aggregate parties
for(year in years){
  # FDP
  long_data_country <- combineAndDeleteCountry(1, c(1, 5), year)
  # Kleine Rechtsparteien
  long_data_country <- combineAndDeleteCountry(16, c(16, 15, 14, 17), year)
  # Kleine Linksparteien
  long_data_country <- combineAndDeleteCountry(9, c(9, 27, 10, 11, 12), year)
  # Andere
  long_data_country <- combineAndDeleteCountry(8, c(8, 6, 26, 35, 33), year)
}

dim(long_data_country)[1] == 11 * 24
## [1] TRUE
# delete incomplete cases
# long_data_country[complete.cases(long_data_country),]

Output

Copy party data to output

parties <- read.csv("input/parties.csv") # same here
write.csv(parties, file = "output/parties.csv", row.names = F, na = "")

Change variable names and doublecheck

# change variable names
long_data %<>% rename(year = Jahr, municipality = BFSNr, party = Partei, support = Staerke) %>%
  mutate(year = as.numeric(format.Date(year, "%Y")), party = as.numeric(party))

totalSupport <- long_data %>%
  group_by(year, municipality) %>%
  summarize(total = sum(support, na.rm = T)) %>%
  filter(!(total > 0.999999999 & total < 1.000000001))

totalSupport %>%
  filter(total > 0.0)
## Source: local data frame [0 x 3]
## Groups: year
## 
## Variables not shown: year (dbl), municipality (dbl), total (dbl)

By party

mainDir <- getwd()
subDir <- "output/by_party"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
# For the map, we need 13 files (one for each of the 12 parties or groups of parties, plus one for 'all parties'). Each file represents support for a single party, and contains data for all municipalities in all years. In the case of 'all parties', it's the support for the strongest party in that municipality in that year. 

# split by party and save
for(i in parties$ID){ 
  i <- as.character(i)
  subset_data <- subset(long_data, party == i)
  subset_data <- subset_data[complete.cases(subset_data),]
  write.csv(subset_data, file = paste("output/by_party/party_", i, ".csv", sep = ""), row.names = F)
}

# save value for dominant party in each year/municipality
dominance <- long_data %>%
  group_by(municipality, year) %>%
  arrange(desc(support)) %>%
  # only take the most dominant party in each year
  slice(1) %>%
  ungroup()
# only retain those where support is not NA
subset_data <- dominance[complete.cases(dominance),]
write.csv(subset_data, file = "output/by_party/dominant_party.csv", row.names = F)

By municipality

# For the area chart, we need many more files, one for each municipality plus one for all of Switzerland (we estimate that's about 2600 files). Each of these files should contain data for all parties in all years.
mainDir <- getwd()
subDir <- "output/by_municipality"
dir.create(file.path(mainDir, subDir), showWarnings = FALSE)
# split by municipality and save
for(i in unique(long_data$municipality)){ 
  subset_data <- subset(long_data, municipality == i) %>%
    select(party, year, support)
  subset_data <- subset_data[complete.cases(subset_data),]
  # check if support adds up to 100
  totalSupport <- subset_data %>%
    group_by(year) %>%
    summarize(total = sum(support)) %>%
    filter(!(total > 0.999999999 & total < 1.000000001))
  # if it doesn't only retain those cases where it is not 0
  totalSupport %<>%
    filter(!(total < 0.000000001))
  if(dim(totalSupport)[1] != 0){
    warning("Party strengths do not sum up to 1 (or are not 0) in municipality ", i, " and year ", totalSupport$year)
  }
  write.csv(subset_data, file = paste("output/by_municipality/municipality_", i, ".csv", sep = ""), row.names = F)
}
# save a file for whole switzerland
subset_data <- long_data_country[complete.cases(long_data_country),]
# check if support adds up to 100
totalSupport <- subset_data %>%
  group_by(year) %>%
  summarize(total = sum(support)) %>%
  filter(!(total > 0.999999999 & total < 1.000000001))
if(dim(totalSupport)[1] != 0){
  warning("Party strengths do not sum up to 1 in Switzerland and year ", totalSupport$year)
}
write.csv(subset_data, file = "output/by_municipality/national.csv", row.names = F)

all together

alldata <- long_data %>% select(municipality, party, year, support)
# only keep complete cases (e.g. remove NA)
write.csv(alldata[complete.cases(alldata),], "output/partystrengths.csv", row.names = F)

Save municipality names & add translations into separate file

# save municipality names

# first, load municipality name translations (src http://www.bfs.admin.ch/bfs/portal/de/index/infothek/nomenklaturen/blank/blank/gem_liste/03.Document.90142.xls, slightly edited to make it easier to read in)
translated_names <- read_excel("input/gdenamen_translation.xlsx", sheet = 1)[1:75,]
municipalities <- filled_data %>%
  select(BFSNr, GdeName) %>%
  rename(ID = BFSNr, Name_Official = GdeName)
municipalities %<>%
  left_join(translated_names, by = "Name_Official")

write.csv(municipalities, file = "output/municipalities.csv", row.names = F, na = "")