Notes

This document illustrates the preprocessing of the dataset visualized in this article on srf.ch.

SRF Data attaches great importance to transparent and reproducible data preprocessing and -analysis. SRF Data believes in the principles of open data but also open and reproducible methods. Third parties should be empowered to build on the work of SRF Data and to generate new analyses and applications.

R-Script & processed data

The preprocessing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. Through executing main.Rmd, the herein described process can be reproduced and this document can be generated. In the course of this, data from the folder ìnput will be processed and results will be written to output.

Attention: Please set your working directory in the first code chunk!

GitHub

The code for the herein described process can also be freely downloaded from https://github.com/srfdata/2015-06-elections-list-apparentments. Criticism in the form of GitHub issues and pull requests are very welcome!

License

Creative Commons License
2015-06-elections-list-apparentments by SRF Data is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Exclusion of liability

The published information has been collated carefully, but no guarantee is offered of its completeness, correctness or up-to-date nature. No liability is accepted for damage or loss incurred from the use of this script or the information drawn from it. This exclusion of liability also applies to third-party content that is accessible via this offer.

Other projects

All code & data from SRF Data is available under http://srfdata.github.io.

Data description

Original data source

  • input/lv9511n.csv - Effects of list apparentments on the cantonal level, 1995-2011. Data source: Daniel Bochsler, NCCR Democracy.
  • input/nrwresultate_sim9511.xls - Effects of list apparentments on the national level, 1995-2011. Data source: Daniel Bochsler, NCCR Democracy.
  • input/stimmen_der_wahllisten_nach_parteien_und_kantonen_clean.xls - Party strengths and list definitions on the cantonal level, 2011. Data source: BFS. The formatting was removed manually in order to make reading the data into R easier.
  • input/listenverbindungen_bfs_others_unedited.csv - intermediary file that was used because data needed to be corrected manually, not relevant.
  • input/listenverbindungen_bfs_others_edited.csv - intermediary file that was used because data needed to be corrected manually, not relevant. If you need to know what manual corrections were made, look at the diff between *_unedited.csv and *_edited.csv.

Description of output

The following sections describe the results of the data preprocessing as stored in the output folder.

output/parties.csv

Contains party classifications made by SRF Data with the help of political scientists, used throughout all projects related to elections.

Attribute Type Description
ID Integer Unique identifier
Abbr_* String Abbreviation in German (D), French (F), English (E), Romansh (R), Italian (I), respectively
Legend_* String Abbreviation, but with slightly more information, used for frontend purposes
Name_* String Full name
Sortorder Integer Used for frontend purposes solely
OLD_ID String “Official” ID as given in https://github.com/srfdata/2015-06-elections-partystrengths/blob/master/analysis/input/parteienstaerke_mod_2.xlsx (sheet “Parteien”), used for combining party strengths for party groupings

output/lv_2011.csv

Contains effects of list apparentments on the cantonal level, 2011, as derived from input/lv9511n.csv, with the help of input/stimmen_der_wahllisten_nach_parteien_und_kantonen_clean.xls.

Attribute Type Description
year Integer Election year
canton String Official cantonal abbreviation
party String Contains the party name, but only if it belongs to a group in output/parties.csv (e.g. id == 8)
party_id Integer Party or party grouping, referencing ID in output/parties.csv
party_strength Double Party strength in percent
seats_with String Actual, resulting seats
seat_difference String seats_with - seats_without
seats_without String Seats that would have resulted without the possibility of list apparentments
list_id String The cantonal list the party was on in 2011

output/lv_2011_national.csv

Contains effects of list apparentments on the national level, 2011, as derived from input/nrwresultate_sim9511.xls.

Attribute Type Description
year Integer Election year
party String Contains the party name, but only if it belongs to a group in output/parties.csv (e.g. id == 8)
party_id Integer Party or party grouping, referencing ID in output/parties.csv
party_strength Double Party strength in percent
seats_with String Actual, resulting seats
seats_without String Seats that would have resulted without the possibility of list apparentments
seat_difference String seats_with - seats_without

output/lv_historical.csv

Contains effects of list apparentments on the cantonal level, 1995 - 2007, as derived from input/lv9511n.csv.

Attribute Type Description
year Integer Election year
canton String Official cantonal abbreviation
party String Contains the party name, but only if it belongs to a group in output/parties.csv (e.g. id == 8)
party_id Integer Party or party grouping, referencing ID in output/parties.csv
party_strength Double Party strength in percent
seats_with String Actual, resulting seats
seat_difference String seats_with - seats_without
seats_without String Seats that would have resulted without the possibility of list apparentments

output/lv_historical_national.csv

Contains effects of list apparentments on the national level, 1995 - 2007, as derived from input/nrwresultate_sim9511.xls.

Attribute Type Description
year Integer Election year
party String Contains the party name, but only if it belongs to a group in output/parties.csv (e.g. id == 8)
party_id Integer Party or party grouping, referencing ID in output/parties.csv
party_strength Double Party strength in percent
seats_with String Actual, resulting seats
seat_difference String seats_with - seats_without
seats_without String Seats that would have resulted without the possibility of list apparentments

Preparations

Define packages

# von https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
cat("library(magrittr)
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)", 
file = "manifest.R")
package_date <- "2015-08-27"

Install packages

if(!require(checkpoint)) {
  if(!require(devtools)){
    install.packages("devtools", repos = "http://cran.us.r-project.org")
    require(devtools)
  }
  devtools::install_github("checkpoint", username = "RevolutionAnalytics", ref = "v0.3.2", repos = "http://cran.us.r-project.org")
  require(checkpoint)
}
if(!dir.exists("~/.checkpoint")){
  dir.create("~/.checkpoint")
}
checkpoint(snapshotDate = package_date, project = path_to_wd, verbose = T, scanForPackages = T, use.knitr = F)
rm(package_date)

Load packages

source("manifest.R")
unlink("manifest.R")

Preprocessing

Basic data loading

The following comments are unfortunately in German

# 1 Listenverbindungen in den Nationalratswahlen in den Jahren 1995-2011, nach Kantonen und Parteien

listenverbindungen_kantonal <- read.csv2(file = "input/lv9511n.csv", sep = ",", stringsAsFactors = F)
# Es gibt zwei Fehler in den kantonalen Daten, gleich hier beheben
listenverbindungen_kantonal[listenverbindungen_kantonal$party == "glp" & listenverbindungen_kantonal$year == 2011 & listenverbindungen_kantonal$kt == "TG",]$ap_s <- 1
listenverbindungen_kantonal[listenverbindungen_kantonal$party == "fdp" & listenverbindungen_kantonal$year == 2011 & listenverbindungen_kantonal$kt == "TG",]$ap_s <- -1

listenverbindungen_kantonal %<>% 
  mutate(vote = as.double(vote))

listenverbindungen_national <- read_excel(path = "input/nrwresultate_sim9511.xls")[1:140,]

# Es gibt zwei Fehler in den nationalen Daten, gleich hier beheben
listenverbindungen_national[listenverbindungen_national$year == 2011 & listenverbindungen_national$party == "GLP",]$ap_s <- 5
listenverbindungen_national[listenverbindungen_national$year == 2011 & listenverbindungen_national$party == "GLP",]$s_noap <- 7
listenverbindungen_national[listenverbindungen_national$year == 2011 & listenverbindungen_national$party == "FDP",]$ap_s <- -1
listenverbindungen_national[listenverbindungen_national$year == 2011 & listenverbindungen_national$party == "FDP",]$s_noap <- 30

# doublecheck
listenverbindungen_kantonal %>% group_by(id_kt) %>% 
  summarize(total_seats = sum(abs(ap_s))) %>% arrange(desc(total_seats))
## Source: local data frame [130 x 2]
## 
##     id_kt total_seats
## 1  BE1995           4
## 2  BE2003           4
## 3  FR1999           4
## 4  SG1995           4
## 5  ZH2003           4
## 6  ZH2007           4
## 7  AG1995           2
## 8  AG1999           2
## 9  AG2003           2
## 10 AG2007           2
## ..    ...         ...
# check that sum is always 100
rundungsfehler <- listenverbindungen_kantonal %>% 
  group_by(year, kt) %>% 
  summarize(total_vote = sum(vote)) %>% 
  ungroup() %>% 
  arrange(desc(total_vote))
# mostly ok

Copy party data to output

parties <- read.csv("input/parties.csv") # same here
write.csv(parties, file = "output/parties.csv", row.names = F, na = "")

2011: Cantonal

# neuer Approach: Daten aus BFS-Listen-Datensatz
# Datensatz einlesen
listenverbindungen_bfs <- read_excel(path = "input/stimmen_der_wahllisten_nach_parteien_und_kantonen_clean.xls", col_types = c("numeric", "text", "numeric", "text", "text", "text", "text", "text", "numeric"))[1:384,]
## DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 07 3b 00 00 00 00 00 00 00 00 ff 00 
## DEFINEDNAME: 00 00 00 0d 0b 00 00 00 00 00 00 00 00 00 00 4c 49 53 54 45 4e 53 54 49 4d 4d 45 4e 3b 01 00 00 00 32 01 00 00 07 00 
## DEFINEDNAME: 00 00 00 1a 0b 00 00 00 01 00 00 00 00 00 00 45 78 63 65 6c 5f 42 75 69 6c 74 49 6e 5f 50 72 69 6e 74 5f 54 69 74 6c 65 73 3b 00 00 00 00 00 00 00 00 fc 00 
## DEFINEDNAME: 00 00 00 0d 0b 00 00 00 01 00 00 00 00 00 00 4c 49 53 54 45 4e 53 54 49 4d 4d 45 4e 3b 00 00 00 00 80 01 00 00 08 00 
## DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 07 3b 00 00 00 00 00 00 00 00 ff 00 
## DEFINEDNAME: 00 00 00 0d 0b 00 00 00 00 00 00 00 00 00 00 4c 49 53 54 45 4e 53 54 49 4d 4d 45 4e 3b 01 00 00 00 32 01 00 00 07 00 
## DEFINEDNAME: 00 00 00 1a 0b 00 00 00 01 00 00 00 00 00 00 45 78 63 65 6c 5f 42 75 69 6c 74 49 6e 5f 50 72 69 6e 74 5f 54 69 74 6c 65 73 3b 00 00 00 00 00 00 00 00 fc 00 
## DEFINEDNAME: 00 00 00 0d 0b 00 00 00 01 00 00 00 00 00 00 4c 49 53 54 45 4e 53 54 49 4d 4d 45 4e 3b 00 00 00 00 80 01 00 00 08 00
# plausbilitätschecks
rundungsfehler <- listenverbindungen_bfs %>% 
  group_by(year, canton) %>% 
  summarize(total_vote = sum(party_strength)) %>% 
  ungroup() %>% 
  arrange(desc(total_vote))
# always ok

# party -> kleinschreiben und als faktor
listenverbindungen_bfs %<>% 
  mutate(party = as.factor(tolower(party)))

# welche parties gibt es?
unique(listenverbindungen_bfs$party)
##  [1] svp    sp     fdp    gps    cvp    glp    evp    übrige pda    edu   
## [11] bdp    sd     csp    fdp/lp lega   sol    mcr   
## 17 Levels: bdp csp cvp edu evp fdp fdp/lp glp gps lega mcr pda sd ... übrige
length(unique(listenverbindungen_bfs$party))
## [1] 17
# Datensatz in party == "Übrige" und restliche aufspalten
listenverbindungen_bfs_parties <- as.data.frame(split(listenverbindungen_bfs, listenverbindungen_bfs$party == "übrige")[1]$`FALSE`)
listenverbindungen_bfs_others <- as.data.frame(split(listenverbindungen_bfs, listenverbindungen_bfs$party == "übrige")[2]$`TRUE`)

# Für parties: Summation der party_strength gruppiert nach partei
listenverbindungen_bfs_parties_summed <- listenverbindungen_bfs_parties %>%
  group_by(year, canton, party) %>% 
  summarize(lv, party_strength = sum(party_strength)) %>% 
  ungroup() 

# Plausbilitätschecks
listenverbindungen_bfs_parties_summed %>% filter(canton == "BE")
## Source: local data frame [11 x 5]
## 
##    year canton party lv party_strength
## 1  2011     BE   bdp NA     14.8802936
## 2  2011     BE   cvp  B      2.0665817
## 3  2011     BE   edu NA      3.1039152
## 4  2011     BE   evp  B      4.1735028
## 5  2011     BE   fdp  A      8.6555962
## 6  2011     BE   glp  B      5.2913954
## 7  2011     BE   gps  C      9.4147215
## 8  2011     BE   pda  C      0.3426402
## 9  2011     BE    sd NA      0.5609540
## 10 2011     BE    sp  C     19.3274994
## 11 2011     BE   svp  D     28.9880443
listenverbindungen_kantonal %>% filter(kt == "BE" & year == 2011 & party != "uebrige")
##    year kt party  id_kt vote seats others_name ap_s  ap_i kname s
## 1  2011 BE   bdp BE2011 14.9    NA                0        Bern 4
## 2  2011 BE   csp BE2011  0.0    NA                0        Bern 0
## 3  2011 BE   cvp BE2011  2.1    NA                0 BE11C  Bern 0
## 4  2011 BE   edu BE2011  3.1    NA                0        Bern 0
## 5  2011 BE   evp BE2011  4.2    NA                0 BE11C  Bern 1
## 6  2011 BE   fdp BE2011  8.7    NA                0        Bern 2
## 7  2011 BE   glp BE2011  5.3    NA                1 BE11C  Bern 2
## 8  2011 BE   gps BE2011  9.4    NA                0 BE11B  Bern 3
## 9  2011 BE  lega BE2011  0.0    NA                0        Bern 0
## 10 2011 BE   lps BE2011  0.0    NA                0        Bern 0
## 11 2011 BE   mcr BE2011  0.0    NA                0        Bern 0
## 12 2011 BE   pda BE2011  0.3    NA                0 BE11B  Bern 0
## 13 2011 BE    sd BE2011  0.6    NA                0        Bern 0
## 14 2011 BE   sol BE2011  0.0    NA                0        Bern 0
## 15 2011 BE    sp BE2011 19.3    NA                0 BE11B  Bern 6
## 16 2011 BE   svp BE2011 29.0    NA               -1 BE11A  Bern 8
# ok

# Zusätzliche Spalten hinzufügen
listenverbindungen_bfs_parties_summed %<>% mutate(party_abbr = party, party = NA, party_id = NA, seats_with = NA, seats_without = NA, seat_difference = NA, list_id = ifelse(!is.na(lv), paste(canton, year, lv, sep = ""), ""))

# Differenzen in den Partei-Bezeichnungen
listenverbindungen_kantonal_2011 <- listenverbindungen_kantonal %>% filter(year == 2011)
setdiff(unique(listenverbindungen_bfs_parties_summed$party_abbr), unique(listenverbindungen_kantonal_2011$party))
## [1] "fdp/lp"
setdiff(unique(listenverbindungen_kantonal_2011$party), unique(listenverbindungen_bfs_parties_summed$party_abbr))
## [1] "lps"     "uebrige"
# welche sind fdp/lp in den BFS-Daten?
listenverbindungen_bfs_parties_summed %>% 
  filter(party_abbr == "fdp/lp" & party_strength > 0)
## Source: local data frame [2 x 11]
## 
##   year canton party lv party_strength party_abbr party_id seats_with
## 1 2011     BS    NA  A       6.826345     fdp/lp       NA         NA
## 2 2011     VD    NA  A       5.727088     fdp/lp       NA         NA
## Variables not shown: seats_without (lgl), seat_difference (lgl), list_id
##   (chr)
# zu lps umbenennen, damit die Daten gematcht werden können
listenverbindungen_bfs_parties_summed$party_abbr <-  factor(listenverbindungen_bfs_parties_summed$party_abbr, levels = c(levels(listenverbindungen_bfs_parties_summed$party_abbr), "lps"))
listenverbindungen_bfs_parties_summed[listenverbindungen_bfs_parties_summed$party_abbr == "fdp/lp",]$party_abbr <- "lps"

setdiff(unique(listenverbindungen_bfs_parties_summed$party_abbr), unique(listenverbindungen_kantonal_2011$party))
## character(0)
setdiff(unique(listenverbindungen_kantonal_2011$party), unique(listenverbindungen_bfs_parties_summed$party_abbr))
## [1] "uebrige"
listenverbindungen_bfs_parties_summed %>% 
  filter(party_abbr == "lps" & party_strength > 0)
## Source: local data frame [2 x 11]
## 
##   year canton party lv party_strength party_abbr party_id seats_with
## 1 2011     BS    NA  A       6.826345        lps       NA         NA
## 2 2011     VD    NA  A       5.727088        lps       NA         NA
## Variables not shown: seats_without (lgl), seat_difference (lgl), list_id
##   (chr)
# welche sind LPS in den Bochsler-Daten? 
listenverbindungen_kantonal_2011 %>% 
  filter(party == "lps" & vote > 0)
##   year kt party  id_kt vote seats others_name ap_s  ap_i       kname s
## 1 2011 BS   lps BS2011  6.8    NA                0 BS11A Basel-Stadt 0
## 2 2011 VD   lps VD2011  5.7    NA                0 VD11B       Waadt 1
# Wikipedia: Die noch bestehenden kantonalen Parteien sind nun zusammen mit ihren freisinnigen Schwesterparteien Teil der neuen liberalen Bundespartei.
listenverbindungen_kantonal_2011 %<>% 
  select(year, kt, party, ap_s, s, ap_i, vote) %>% 
  rename(party_bochsler = party) %>% 
  mutate(party_bochsler = as.factor(party_bochsler))

# Bochsler-Daten: Welche "uebrige" haben Sitze gewonnen? 
listenverbindungen_kantonal_2011 %>% 
  filter(party_bochsler == "uebrige" & s > 0)
##   year kt party_bochsler ap_s s ap_i vote
## 1 2011 OW        uebrige    0 1      56.9
listenverbindungen_kantonal_2011 %>%
  filter(party_bochsler == "uebrige" & ap_s != 0)
## [1] year           kt             party_bochsler ap_s          
## [5] s              ap_i           vote          
## <0 rows> (or 0-length row.names)
# es handelt sich um die csp
  
# Bochsler-Daten: matchen (Gewonnene Sitze und LV-Auswirkungen)
listenverbindungen_parties_combined <- listenverbindungen_bfs_parties_summed %>% left_join(listenverbindungen_kantonal_2011, by = c("year", "canton" = "kt", "party_abbr" = "party_bochsler"))

## Plausibilitätschecks
# Gesamtsumme Sitze
sum(listenverbindungen_parties_combined$s, na.rm = T)
## [1] 199
# 199, der Sitz der CSP im Kanton OW fehlt noch

# überprüfen, ob party_strength überall ungefähr gleich ist, sonst korrigieren
which((listenverbindungen_parties_combined$party_strength >= listenverbindungen_parties_combined$vote - 0.1 & listenverbindungen_parties_combined$party_strength <= listenverbindungen_parties_combined$vote + 0.1) == F)
## [1] 129 130 131
listenverbindungen_parties_combined[c(129,130,131),]
## Source: local data frame [3 x 15]
## 
##   year canton party lv party_strength party_abbr party_id seats_with
## 1 2011     SZ    NA  C       3.831282        gps       NA         NA
## 2 2011     SZ    NA  C      15.657458         sp       NA         NA
## 3 2011     SZ    NA  D      37.984713        svp       NA         NA
## Variables not shown: seats_without (lgl), seat_difference (lgl), list_id
##   (chr), ap_s (dbl), s (int), ap_i (chr), vote (dbl)
# Erklärung: 
# 1. GPS + Atomfrei werden im BFS Datensatz als GPS gerechnet, im Bochsler-Datensatz als Übrige
# 2. SP + Frischer Wind + Gewerkschaftsbund werden im BFS Datensatz als SP gerechnet, im Bochsler-Datensatz als Übrige
# 3. SVP + Aktive Senioren werden im BFS Datensatz als als SVP gerechnet, im Bochsler-Datensatz Aktive Senioren als übrige

# überprüfen, wo nicht in beiden Datensätzen von einer LV gesprochen wird
a <- which(!is.na(listenverbindungen_parties_combined$lv) & listenverbindungen_parties_combined$ap_i == "")
# und umgekehrt
b <- which(is.na(listenverbindungen_parties_combined$lv) & listenverbindungen_parties_combined$ap_i != "") 

# Fazit: Viele LV kommen bei Bochsler nicht vor (sind auch keine zwischenparteilichen LV)
# Eine kommt bei Bochsler vor, aber im BFS-Datensatz nicht, überprüfen:
listenverbindungen_parties_combined[c(b),]
## Source: local data frame [1 x 15]
## 
##   year canton party lv party_strength party_abbr party_id seats_with
## 1 2011     SO    NA NA      0.4855171        edu       NA         NA
## Variables not shown: seats_without (lgl), seat_difference (lgl), list_id
##   (chr), ap_s (dbl), s (int), ap_i (chr), vote (dbl)
listenverbindungen_kantonal_2011 %>% filter(ap_i == "SO11A")
##   year kt party_bochsler ap_s s  ap_i vote
## 1 2011 SO            edu    0 0 SO11A  0.5
## 2 2011 SO            svp    0 2 SO11A 24.3
listenverbindungen_kantonal_2011 %>% filter(kt == "SO"  & ap_s != 0)
##   year kt party_bochsler ap_s s  ap_i vote
## 1 2011 SO            cvp    1 2 SO11C 17.9
## 2 2011 SO            fdp   -1 1 SO11B 18.4
# Laut Bochsler sind EDU und SVP im Jahr 2011 in Solothurn eine LV eingegangen, es hat aber nichts gebracht, und die LV haben nur CVP und FDP beeinflusst

# Beschluss: wir übernehmen die LV von Bochsler

# variablen transformieren
listenverbindungen_parties_combined %<>% 
  mutate(list_id = ap_i, seats_with = s, seat_difference = ap_s, seats_without = seats_with - seat_difference)
# unnötige variablen rausschmeissen
listenverbindungen_parties_combined %<>%
  select(-ap_s, -s, -ap_i, -vote)

# LV brauchen wir noch für das Matching mit den übrigen

# Übrige: Variablen hinzufügen + manuell bearbeiten
listenverbindungen_bfs_others %<>%
  select(-lnr, -lnroff, -ulv) %>%
  mutate(party = name, party_abbr = NA, seats_with = 0, seats_without = 0, seat_difference = 0)

# Der Sitz der CSP muss noch eingetragen werden 
listenverbindungen_bfs_others[listenverbindungen_bfs_others$canton == "OW",]$seats_with <- 1
listenverbindungen_bfs_others[listenverbindungen_bfs_others$canton == "OW",]$seats_without <- 1
listenverbindungen_bfs_others[listenverbindungen_bfs_others$canton == "OW",]$party_abbr <- "csp"
write.csv(listenverbindungen_bfs_others, file = "input/listenverbindungen_bfs_others_unedited.csv", na = "")

# EDITIEREN
# wieder einlesen
listenverbindungen_bfs_others <- read.csv2(file = "input/listenverbindungen_bfs_others_edited.csv", sep = ",", stringsAsFactors = F)

# transformieren
listenverbindungen_bfs_others %<>%
  mutate(party = name) %>%
  mutate(year = as.numeric(year), party_strength = as.numeric(party_strength)) %>% 
  mutate(list_id = NA, party_id = NA) %>%
  select(year, canton, party, lv, party_strength, party_abbr, party_id, seats_with, seats_without, seat_difference, list_id)

str(listenverbindungen_bfs_others)
## 'data.frame':    49 obs. of  11 variables:
##  $ year           : num  2011 2011 2011 2011 2011 ...
##  $ canton         : chr  "ZH" "ZH" "ZH" "ZH" ...
##  $ party          : chr  "Anti PowerPoint Partei" "Tierpartei" "Piratenpartei" "Subitas" ...
##  $ lv             : chr  "" "B" "E" "" ...
##  $ party_strength : num  0.0346 0.4002 0.8614 0.0143 0.0615 ...
##  $ party_abbr     : chr  "APPP" "TPS" "PP" "Sub." ...
##  $ party_id       : logi  NA NA NA NA NA NA ...
##  $ seats_with     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ seats_without  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ seat_difference: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ list_id        : logi  NA NA NA NA NA NA ...
str(listenverbindungen_parties_combined)
## Classes 'tbl_df', 'tbl' and 'data.frame':    188 obs. of  11 variables:
##  $ year           : num  2011 2011 2011 2011 2011 ...
##  $ canton         : chr  "AG" "AG" "AG" "AG" ...
##  $ party          : logi  NA NA NA NA NA NA ...
##  $ lv             : chr  "B" "B" "D" "E" ...
##  $ party_strength : num  6.14 10.59 1.17 3.22 11.48 ...
##  $ party_abbr     : chr  "bdp" "cvp" "edu" "evp" ...
##  $ party_id       : logi  NA NA NA NA NA NA ...
##  $ seats_with     : int  1 1 0 0 2 1 1 0 3 6 ...
##  $ seats_without  : num  1 1 0 0 2 1 1 0 3 6 ...
##  $ seat_difference: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ list_id        : chr  "AG11B" "AG11B" "AG11A" "AG11C" ...
# anderer Datensatz auch noch einmal transformieren
listenverbindungen_parties_combined %<>% mutate(party = as.character(party))
# Datensätze wieder zusammenführen
listenverbindungen_bfs_final <- rbind(listenverbindungen_parties_combined, listenverbindungen_bfs_others)

# Plausibilitätschecks 
# Summation zu 100%
as.data.frame(listenverbindungen_bfs_final %>% 
  group_by(year, canton) %>% 
  summarize(total_vote = sum(party_strength)) %>% 
  ungroup() %>% 
  arrange(desc(total_vote)))
##    year canton total_vote
## 1  2011     BE        100
## 2  2011     TI        100
## 3  2011     ZG        100
## 4  2011     GL        100
## 5  2011     FR        100
## 6  2011     UR        100
## 7  2011     AR        100
## 8  2011     SH        100
## 9  2011     SO        100
## 10 2011     AI        100
## 11 2011     VD        100
## 12 2011     GR        100
## 13 2011     JU        100
## 14 2011     NE        100
## 15 2011     NW        100
## 16 2011     TG        100
## 17 2011     BL        100
## 18 2011     SG        100
## 19 2011     BS        100
## 20 2011     GE        100
## 21 2011     OW        100
## 22 2011     VS        100
## 23 2011     AG        100
## 24 2011     ZH        100
## 25 2011     SZ        100
## 26 2011     LU        100
# 200 Sitze
sum(listenverbindungen_bfs_final$seats_with)
## [1] 200
sum(listenverbindungen_bfs_final$seats_without)
## [1] 200
# party_id ermitteln
listenverbindungen_bfs_final$party_id <- 8
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "bdp",]$party_id <- 32
# listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "ldu",]$party_id <- 8
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "svp",]$party_id <- 4
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "cvp",]$party_id <- 2
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "fdp",]$party_id <- 1
# listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "fga",]$party_id <- 9
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "gps",]$party_id <- 13
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "sp",]$party_id <- 3
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "CSP",]$party_id <- 8
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "pda",]$party_id <- 9
# listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "fps",]$party_id <- 16
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "evp",]$party_id <- 7
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "sd",]$party_id <- 16
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "sol",]$party_id <- 9
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "lps",]$party_id <- 1
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "lega",]$party_id <- 18
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "edu",]$party_id <- 16
listenverbindungen_bfs_final[listenverbindungen_bfs_final$party_abbr == "glp",]$party_id <- 31

# Listenverbindungs-Code neu berechnen 
list_ids_per_canton <- listenverbindungen_bfs_final %>% 
  filter(!is.na(lv) & lv != "") %>% 
  group_by(canton, lv) %>% 
  slice(1) %>% 
  select(canton, lv, list_id) %>%
  ungroup()

computeListId <- function(row){
  # party ist nur bei "übrigen" ausgefüllt
  if(is.na(row$party)){
    return(row$list_id)
  } else {
    # perform lookup
    return(as.character(list_ids_per_canton[list_ids_per_canton$canton == row$canton & list_ids_per_canton$lv == row$lv,]$list_id))
  }
}

list_ids <- as.character(by(listenverbindungen_bfs_final, 1:nrow(listenverbindungen_bfs_final), computeListId))
list_ids[list_ids == "character(0)"] <- NA
list_ids[list_ids == ""] <- NA

# ersetzen
listenverbindungen_bfs_final$list_id <- list_ids

# letzte, manuelle korrekturen
listenverbindungen_bfs_final[listenverbindungen_bfs_final$canton == "BE" & listenverbindungen_bfs_final$lv == "E" & !is.na(listenverbindungen_bfs_final$party),]$list_id <- "BE11D"
listenverbindungen_bfs_final[listenverbindungen_bfs_final$canton == "FR" & listenverbindungen_bfs_final$lv == "E",]$list_id <- "FR11E"

# Abschluss
listenverbindungen_bfs_final %<>% 
  select(year, canton, party, party_abbr, party_id, party_strength, seats_with, seats_without, seat_difference, list_id)

# LPS zu FDP rechnen
listenverbindungen_bfs_final %<>%
  mutate(lpsfdp = ifelse(party_abbr == "fdp" | party_abbr == "lps", "fdplps", party_abbr))


listenverbindungen_bfs_final %<>% 
  group_by(lpsfdp, year, canton) %>% 
  summarise(party = first(party),  party_abbr = first(party_abbr), party_id = first(party_id),  party_strength = sum(party_strength),seats_with = sum(seats_with), seat_difference = sum(seat_difference), seats_without = sum(seats_without), list_id = first(list_id)) %>% 
  ungroup() %>% 
  select(-lpsfdp, -party_abbr)


# Plausibilitätschecks
as.data.frame(listenverbindungen_bfs_final %>% 
  group_by(year, canton) %>% 
  summarize(total_vote = sum(party_strength)) %>% 
  ungroup() %>% 
  arrange(desc(total_vote)))
##    year canton total_vote
## 1  2011     BE        100
## 2  2011     TI        100
## 3  2011     ZG        100
## 4  2011     GL        100
## 5  2011     FR        100
## 6  2011     UR        100
## 7  2011     AR        100
## 8  2011     SH        100
## 9  2011     SO        100
## 10 2011     AI        100
## 11 2011     VD        100
## 12 2011     GR        100
## 13 2011     JU        100
## 14 2011     NE        100
## 15 2011     NW        100
## 16 2011     TG        100
## 17 2011     BL        100
## 18 2011     SG        100
## 19 2011     BS        100
## 20 2011     GE        100
## 21 2011     OW        100
## 22 2011     VS        100
## 23 2011     AG        100
## 24 2011     ZH        100
## 25 2011     SZ        100
## 26 2011     LU        100
# 200 Sitze
sum(listenverbindungen_bfs_final$seats_with)
## [1] 200
sum(listenverbindungen_bfs_final$seats_without)
## [1] 200
# party_id ermitteln

write.csv(listenverbindungen_bfs_final, file = "output/lv_2011.csv", na = "", row.names = F)

2011: National

# Wie viele Sitze gehen an die CSP? 
listenverbindungen_national %>% 
    filter(party == "CSP")
## Source: local data frame [5 x 8]
## 
##   year party s ap_s s_noap apdummy   voteap   v
## 1 1995   CSP 1    1      0       2 22.20000 0.3
## 2 1999   CSP 1    1      0       3 34.40000 0.4
## 3 2003   CSP 1    1      0       1 38.86719 0.4
## 4 2007   CSP 1    1      0       3 29.92079 0.4
## 5 2011   CSP 0    0      0       4 26.70000 0.3
# Problem: Karl Vogler wird 2011 nicht als CSP aufgeführt, sondern bei den Übrigen, auch in den BFS-Daten

# In den kantonalen Daten für 2011 wird er jedoch als CSP aufgeführt, jedoch ist OW sowieso ein Majorzkanton
listenverbindungen2011_national <- listenverbindungen_national %>% 
    filter(year == 2011) %>% 
    mutate(party = tolower(party), party_abbr = party, party = NA, party_strength = v, seats_with = s, seats_without = s_noap, seat_difference = ap_s, canton = "CH", party_id = NA, list_id = NA) %>% 
    select(-apdummy, -voteap)

# Problem: LPS hat 1 Sitz, aber national wurde die Parteistärke der FDP zugerechnet. Deshalb: LPS entfernen und der FDP einen Sitz gutschreiben
# der fdp einen zurechnen
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$seats_with <- listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$seats_with + 1

listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$seats_without <- listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$seats_without + 1

# Plausibilitätschecks
# Summiert zu 100%
sum(listenverbindungen2011_national$v)
## [1] 99.9
# Nicht genau 100%
# Originaldaten matchen
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "svp",]$party_strength <- 26.5562298833
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "cvp",]$party_strength <- 12.3040121429
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$party_strength <- 15.1045453764
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "gps",]$party_strength <- 8.4328061825
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sp",]$party_strength <- 18.7221778605
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "csp",]$party_strength <- 0.2557743755
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "pda",]$party_strength <- 0.5409966352
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "evp",]$party_strength <- 1.9973982108
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sd",]$party_strength <- 0.2003102175
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sol",]$party_strength <- 0.3384668779

listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "lega",]$party_strength <- 0.8047605286
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "edu",]$party_strength <- 1.2714234639
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "glp",]$party_strength <- 5.3808907415
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "uebrige",]$party_strength <- 2.2361843879
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "bdp",]$party_strength <- 5.4154010371
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "mcr",]$party_strength <- 0.4386220785
# nochmals überprüfen
sum(listenverbindungen2011_national$party_strength)
## [1] 100
# 200 Sitze
sum(listenverbindungen2011_national$seats_with)
## [1] 201
sum(listenverbindungen2011_national$seats_without)
## [1] 201
# solche mit party_strength == 0 entfernen
listenverbindungen2011_national %<>% filter(party_strength > 0)
sum(listenverbindungen2011_national$party_strength)
## [1] 100
sum(listenverbindungen2011_national$seats_with)
## [1] 200
sum(listenverbindungen2011_national$seats_without)
## [1] 200
sum(listenverbindungen2011_national$seat_difference)
## [1] 0
# 

listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "bdp",]$party_id <- 32
# listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "ldu",]$party_id <- 8
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "svp",]$party_id <- 4
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "cvp",]$party_id <- 2
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fdp",]$party_id <- 1
# listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fga",]$party_id <- 9
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "gps",]$party_id <- 13
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sp",]$party_id <- 3
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "csp",]$party_id <- 8
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "pda",]$party_id <- 9
# listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "fps",]$party_id <- 16
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "evp",]$party_id <- 7
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sd",]$party_id <- 16
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sol",]$party_id <- 9
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "lega",]$party_id <- 18
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "edu",]$party_id <- 16
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "glp",]$party_id <- 31
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "csp",]$party_id <- 8
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "mcr",]$party_id <- 8
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "uebrige",]$party_id <- 99

# Volle Namen hinzufügen
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sd",]$party <- "SD"
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "sol",]$party <- "Sol."
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "edu",]$party <- "EDU"
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "csp",]$party <- "CSP"
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "mcr",]$party <- "MCR"
listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "pda",]$party <- "PdA"
# listenverbindungen2011_national[listenverbindungen2011_national$party_abbr == "uebrige",]$party <- ""
# select variables
listenverbindungen2011_national %<>% select(year, party, party_id, party_strength, seats_with, seats_without, seat_difference)

write.csv(listenverbindungen2011_national,file = "output/lv_2011_national.csv", na = "", row.names = F)

Historical: National

# Plausibilitätscheck
listenverbindungen_national %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(s))
## Source: local data frame [5 x 2]
## 
##   year total_sitze
## 1 1995         202
## 2 1999         200
## 3 2003         200
## 4 2007         200
## 5 2011         200
# gleiches Problem wie bei kantonalen Daten: 1995 hat zwei Sitze zu viel
lv_historisch <- listenverbindungen_national %>% 
  filter(year < 2011)
# die FGA hat einen zu viel
lv_historisch[lv_historisch$year == 1995 & lv_historisch$party == "FGA",]$s <- 2
lv_historisch[lv_historisch$year == 1995 & lv_historisch$party == "FGA",]$ap_s <- 1
lv_historisch[lv_historisch$year == 1995 & lv_historisch$party == "FGA",]$s_noap <- 1
# die FDP hat einen zu viel
lv_historisch[lv_historisch$year == 1995 & lv_historisch$party == "FDP",]$s <- 45
lv_historisch[lv_historisch$year == 1995 & lv_historisch$party == "FDP",]$s_noap <- 48

lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(s))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995         200
## 2 1999         200
## 3 2003         200
## 4 2007         200
# Plausibilitätscheck
lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(ap_s))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995           0
## 2 1999           0
## 3 2003           0
## 4 2007           0
# gibt überall null, gut

# nur die nehmen, die mindestens einen Sitz gemacht haben
lv_historisch %<>%
    filter(s > 0)

lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(s))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995         200
## 2 1999         200
## 3 2003         200
## 4 2007         200
lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(s_noap))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995         200
## 2 1999         200
## 3 2003         200
## 4 2007         200
lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(ap_s))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995           0
## 2 1999           0
## 3 2003           0
## 4 2007           0
# gut

# variablen transformieren
lv_historisch %<>% 
    mutate(party = tolower(party), party_abbr = party, party = NA, party_strength = v, seats_with = s, seats_without = s_noap, seat_difference = ap_s, canton = "CH", party_id = NA, list_id = NA) %>% 
    select(-apdummy, -voteap)

lv_historisch[lv_historisch$party_abbr == "ldu",]$party_id <- 8
lv_historisch[lv_historisch$party_abbr == "svp",]$party_id <- 4
lv_historisch[lv_historisch$party_abbr == "cvp",]$party_id <- 2
lv_historisch[lv_historisch$party_abbr == "fps",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "fdp",]$party_id <- 1
lv_historisch[lv_historisch$party_abbr == "fga",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "gps",]$party_id <- 13
lv_historisch[lv_historisch$party_abbr == "sp",]$party_id <- 3
lv_historisch[lv_historisch$party_abbr == "csp",]$party_id <- 8
lv_historisch[lv_historisch$party_abbr == "pda",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "fps",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "evp",]$party_id <- 7
lv_historisch[lv_historisch$party_abbr == "sd",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "sol",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "lps",]$party_id <- 1
lv_historisch[lv_historisch$party_abbr == "lega",]$party_id <- 18
lv_historisch[lv_historisch$party_abbr == "edu",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "glp",]$party_id <- 31

# select variables
lv_historisch %<>% select(year, canton, party, seats_with, seat_difference, seats_without, party_abbr, party_id)

# LPS zu FDP rechnen
lv_historisch %<>%
  mutate(lpsfdp = ifelse(party_abbr == "fdp" | party_abbr == "lps", "fdplps", party_abbr))


lv_historisch %<>% 
  group_by(lpsfdp, year, canton) %>% 
  summarise(party = first(party),  party_abbr = first(party_abbr), party_id = first(party_id),seats_with = sum(seats_with), seat_difference = sum(seat_difference), seats_without = sum(seats_without)) %>% 
  ungroup() %>% 
  select(-lpsfdp, -canton, -party_abbr)

# Plausibilitätschecks

# 800 Sitze
sum(lv_historisch$seats_with)
## [1] 800
sum(lv_historisch$seats_without)
## [1] 800
sum(lv_historisch$seat_difference)
## [1] 0
write.csv(lv_historisch,file = "output/lv_historical_national.csv", na = "", row.names = F)

Historical: Cantonal

lv_historisch <- listenverbindungen_kantonal %>% 
  filter(year < 2011)
sum(lv_historisch$s)
## [1] 802
# Summe sollte 800 sein
lv_historisch %>% 
  group_by(year) %>% 
  summarize(total_sitze = sum(s))
## Source: local data frame [4 x 2]
## 
##   year total_sitze
## 1 1995         202
## 2 1999         200
## 3 2003         200
## 4 2007         200
# im Jahr 1995 ist die Summe nicht 200
as.data.frame(lv_historisch %>% 
  filter(year == 1995) %>% 
  group_by(kt) %>% 
  summarize(total_sitze = sum(s)) %>% 
  ungroup() %>% 
  arrange(desc(total_sitze)))
##    kt total_sitze
## 1  ZH          35
## 2  BE          27
## 3  VD          17
## 4  AG          15
## 5  SG          12
## 6  GE          11
## 7  LU          10
## 8  BL           8
## 9  TI           8
## 10 SO           7
## 11 VS           7
## 12 BS           6
## 13 FR           6
## 14 TG           6
## 15 GR           5
## 16 NE           5
## 17 SZ           3
## 18 ZG           3
## 19 AR           2
## 20 JU           2
## 21 SH           2
## 22 AI           1
## 23 GL           1
## 24 NW           1
## 25 OW           1
## 26 UR           1
# Zürich scheint einen zu viel zu haben
as.data.frame(lv_historisch %>% 
  filter(year == 1995 & kt == "ZH") %>% 
  group_by(party) %>% 
  summarize(total_sitze = sum(s)) %>% 
  ungroup() %>% 
  arrange(desc(total_sitze)))
##      party total_sitze
## 1       sp           9
## 2      svp           9
## 3      fdp           6
## 4      cvp           2
## 5      fga           2
## 6      gps           2
## 7      ldu           2
## 8      evp           1
## 9      fps           1
## 10      sd           1
## 11     csp           0
## 12     edu           0
## 13     glp           0
## 14    lega           0
## 15     lps           0
## 16     pda           0
## 17     sol           0
## 18 uebrige           0
# die FGA hat einen zu viel
lv_historisch[lv_historisch$year == 1995 & lv_historisch$kt == "ZH" & lv_historisch$party == "fga" & lv_historisch$vote < 1,]$s <- 0
# BL ebenfalls
as.data.frame(lv_historisch %>% 
  filter(year == 1995 & kt == "BL") %>% 
  group_by(party) %>% 
  summarize(total_sitze = sum(s)) %>% 
  ungroup() %>% 
  arrange(desc(total_sitze)))
##      party total_sitze
## 1      fdp           2
## 2       sp           2
## 3      cvp           1
## 4      gps           1
## 5       sd           1
## 6      svp           1
## 7      csp           0
## 8      edu           0
## 9      evp           0
## 10     fga           0
## 11     fps           0
## 12     glp           0
## 13     ldu           0
## 14    lega           0
## 15     lps           0
## 16     pda           0
## 17     sol           0
## 18 uebrige           0
# die FDP hat einen zu viel
lv_historisch[lv_historisch$year == 1995 & lv_historisch$kt == "BL" & lv_historisch$party == "fdp"  & lv_historisch$vote < 2,]$s <- 0

sum(lv_historisch$s)
## [1] 800
# nun ist es gut

lv_historisch %<>% 
  group_by(year, kt, party) %>% 
  summarize(seats_with = sum(s), seat_difference = sum(ap_s)) %>% 
  ungroup() %>% 
  mutate(seats_without = seats_with - seat_difference)

sum(lv_historisch$seats_without)
## [1] 800
# die entfernen, die weder mit oder ohne LV keine sitze gemacht hätten
lv_historisch %<>% 
  filter(seats_with > 0 | seats_without > 0)
# transformieren
lv_historisch %<>%
  rename(canton = kt) %>% 
  mutate(party_abbr = party, party = NA, party_id = NA)

unique(lv_historisch$party_abbr)
##  [1] "cvp"  "fdp"  "fps"  "gps"  "ldu"  "sp"   "svp"  "edu"  "evp"  "fga" 
## [11] "sd"   "lps"  "csp"  "pda"  "lega" "sol"  "glp"
# party_id reinholen 

# lv_historisch[lv_historisch$party_abbr == "bdp",]$party_id <- 32
lv_historisch[lv_historisch$party_abbr == "ldu",]$party_id <- 8
lv_historisch[lv_historisch$party_abbr == "svp",]$party_id <- 4
lv_historisch[lv_historisch$party_abbr == "cvp",]$party_id <- 2
lv_historisch[lv_historisch$party_abbr == "fps",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "fdp",]$party_id <- 1
lv_historisch[lv_historisch$party_abbr == "fga",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "gps",]$party_id <- 13
lv_historisch[lv_historisch$party_abbr == "sp",]$party_id <- 3
lv_historisch[lv_historisch$party_abbr == "csp",]$party_id <- 8
lv_historisch[lv_historisch$party_abbr == "pda",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "fps",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "evp",]$party_id <- 7
lv_historisch[lv_historisch$party_abbr == "sd",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "sol",]$party_id <- 9
lv_historisch[lv_historisch$party_abbr == "lps",]$party_id <- 1
lv_historisch[lv_historisch$party_abbr == "lega",]$party_id <- 18
lv_historisch[lv_historisch$party_abbr == "edu",]$party_id <- 16
lv_historisch[lv_historisch$party_abbr == "glp",]$party_id <- 31


# LPS zu FDP rechnen
lv_historisch %<>%
  mutate(lpsfdp = ifelse(party_abbr == "fdp" | party_abbr == "lps", "fdplps", party_abbr))


lv_historisch %<>% 
  group_by(lpsfdp, year, canton) %>% 
  summarise(party = first(party),  party_abbr = first(party_abbr), party_id = first(party_id), seats_with = sum(seats_with), seat_difference = sum(seat_difference), seats_without = sum(seats_without)) %>% 
  ungroup() %>% 
  select(-lpsfdp, -party_abbr)


# Plausibilitätschecks

# 200 Sitze
sum(lv_historisch$seats_with)
## [1] 800
sum(lv_historisch$seats_without)
## [1] 800
sum(lv_historisch$seat_difference)
## [1] 0
# party_id ermitteln

write.csv(lv_historisch, file = "output/lv_historical.csv", na = "", row.names = F)