This document illustrates the preprocessing of the dataset visualized in this article on srf.ch.
SRF Data attaches great importance to transparent and reproducible data preprocessing and analysis. SRF Data believes in the principles of open data as well as open and reproducible methods. Third parties should be empowered to build on the work of SRF Data and to generate new analyses and applications.
The preprocessing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. By executing main.Rmd, the process described here can be reproduced and this document can be regenerated. In the course of this, data from the folder input is processed and the results are written to output.
Attention: Please set your working directory in the first code chunk!
The code for the herein described process can also be freely downloaded from https://github.com/srfdata/2015-09-elections-cantonal-budgets. Criticism in the form of GitHub issues and pull requests is very welcome!
2015-09-elections-cantonal-budgets by SRF Data is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The published information has been collated carefully, but no guarantee is offered of its completeness, correctness or up-to-date nature. No liability is accepted for damage or loss incurred from the use of this script or the information drawn from it. This exclusion of liability also applies to third-party content that is accessible via this offer.
All code & data from SRF Data is available under http://srfdata.github.io.
The data shown here is the result of an email questionnaire conducted by SRF Data and Radio Télévision Suisse (RTS) in July and August 2015 with over 200 cantonal party sections. See the questions asked here. Where a question was misunderstood, clarification was requested; where no answer was received, follow-up questions were asked.
The data shown is based exclusively on the cantonal party’s own declaration and has not been verified by other sources. A statement on the accuracy of this data can therefore not be made, neither on the total budget nor on the sources of finance. A question mark for the financial sources can have the following meanings: 1. the cantonal party cannot give any declaration, 2. the cantonal party does not want to make any declaration, 3. the figures could not be calculated. In the canton of Jura, the budget cannot always be separated between the federal and cantonal elections, which both take place on October 18.
input/data.csv
- The original survey response data, already double-checked, preprocessed and cleaned by SRF Data. It is copied over 1:1 to the output folder.
input/parties.csv
- Contains party classifications made by SRF Data with the help of political scientists, used throughout all projects related to elections. It is copied over 1:1 to the output folder.

The following sections describe the results of the data preprocessing as stored in the output folder.
output/data.csv
Attribute | Type | Description |
---|---|---|
id | String | Unique identifier |
party_id | Integer | Party, references id in output/parties.csv |
party_name | String | Contains the party name, but only if the party belongs to a group in output/parties.csv (e.g. id == 8, id == 16 or id == 9) or if there is a special name such as “SP Oberwallis” |
canton | String | Official cantonal abbreviation |
transparency_level | Integer | Level of transparency (0 = no response, 1 = refused to answer, 2 = at least the total budget was specified) |
budget_total_lower | Integer | Lower boundary of total budget as declared by the cantonal section |
budget_total_upper | Integer | Upper boundary of total budget as declared by the cantonal section (if no range is given, budget_total_lower == budget_total_upper) |
budget_share_private_donors | String | Share of budget coming from private donors (see survey questions above) |
budget_share_corporate_donors | String | Share of budget coming from corporate donors (see survey questions above) |
budget_share_candidates_elected | String | Share of budget coming from candidates OR already elected representatives (e.g. in the form of fees, see survey questions above) |
budget_share_members | String | Share of budget coming from member fees (see survey questions above) |
budget_share_others | String | Share of budget coming from other sources (see survey questions above) |
budget_share_others_description | String | Description of other sources by the cantonal section |
comment_by_party | String | Additional comments made by the cantonal section |
../frontend/src/assets/gsheets/data.json
Basically the same content as data.csv, but in JSON format. Used directly by the frontend application.
output/parties.csv
Contains party classifications made by SRF Data with the help of political scientists, used throughout all projects related to elections.
Attribute | Type | Description |
---|---|---|
message.code | String | Used solely for frontend purposes |
id | Integer | Unique identifier, referenced from output/data.csv |
abbr_* | String | Abbreviation, but with slightly more information, used for frontend purposes |
legend_* | String | Abbreviation, but with slightly more information, used for frontend purposes |
name_* | String | Full name |
sortorder | Integer | Used solely for frontend purposes |
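
The party_id column in output/data.csv references id in output/parties.csv, so the two tables can be combined with a join. The following is a minimal sketch in base R. The rows are invented toy values, not survey data, only a subset of the documented columns is shown, and the has_range column is a hypothetical helper introduced here for illustration.

```r
# Toy stand-ins for output/data.csv and output/parties.csv (all values invented).
data <- data.frame(
  id = c("a", "b", "c"),
  party_id = c(1L, 2L, 1L),
  canton = c("ZH", "BE", "GE"),
  budget_total_lower = c(10000L, 50000L, NA),
  budget_total_upper = c(10000L, 80000L, NA),
  stringsAsFactors = FALSE
)
parties <- data.frame(
  id = c(1L, 2L),
  abbr_en = c("Party A", "Party B"),
  stringsAsFactors = FALSE
)

# Left join: keep every section and attach the party abbreviation.
joined <- merge(data, parties, by.x = "party_id", by.y = "id", all.x = TRUE)

# A section declared a range if both boundaries are present and differ.
joined$has_range <- !is.na(joined$budget_total_lower) &
  joined$budget_total_lower != joined$budget_total_upper
```

Sections without a declared budget keep NA in both boundary columns and are therefore not flagged as a range.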
# from https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
cat("library(magrittr)
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
library(jsonlite)",
file = "manifest.R")
package_date <- "2015-08-01"
if (!require(checkpoint)) {
  if (!require(devtools)) {
    install.packages("devtools", repos = "http://cran.us.r-project.org")
    require(devtools)
  }
  # install the pinned checkpoint release from GitHub
  devtools::install_github("RevolutionAnalytics/checkpoint", ref = "v0.3.2", repos = "http://cran.us.r-project.org")
  require(checkpoint)
}
## Loading required package: checkpoint
##
## checkpoint: Part of the Reproducible R Toolkit from Microsoft
## https://mran.microsoft.com/documents/rro/reproducibility/
if (!dir.exists("~/.checkpoint")) {
  dir.create("~/.checkpoint")
}
# path_to_wd is expected to have been set in the first code chunk
checkpoint(snapshotDate = package_date, project = path_to_wd, verbose = TRUE, scanForPackages = TRUE, use.knitr = FALSE)
## Scanning for packages used in this project
## rmarkdown files found and will not be parsed. Set use.knitr = TRUE
## - Discovered 7 packages
## All detected packages already installed
## checkpoint process complete
## ---
rm(package_date)
source("manifest.R")
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:utils':
##
## View
unlink("manifest.R")
# read the input files
destination_data <- read.csv("input/data.csv")
parties <- read.csv("input/parties.csv")
# write them to the output folder without changing their content
write.csv(destination_data, "output/data.csv", row.names = FALSE, quote = TRUE, na = "")
write.csv(parties, "output/parties.csv", row.names = FALSE, quote = TRUE, na = "")
# number and share of sections with transparency_level == 2, per party
perparty <- destination_data %>%
  group_by(party_id) %>%
  summarize(
    count_transparent = sum(transparency_level == 2),
    count_all = sum(transparency_level >= 0),
    count_percentage = count_transparent / count_all
  ) %>%
  left_join(parties, by = c("party_id" = "id")) %>%
  select(abbr_en, count_transparent, count_all, count_percentage) %>%
  arrange(desc(count_transparent), desc(count_percentage))
perparty
## Source: local data frame [12 x 4]
##
## abbr_en count_transparent count_all count_percentage
## 1 SP 23 25 0.9200000
## 2 GPS 17 20 0.8500000
## 3 GLP 14 17 0.8235294
## 4 BDP 13 16 0.8125000
## 5 CVP 13 24 0.5416667
## 6 SVP 13 26 0.5000000
## 7 EVP 12 12 1.0000000
## 8 FDP 9 25 0.3600000
## 9 Small left-wing parties 8 8 1.0000000
## 10 Others 6 7 0.8571429
## 11 Small right-wing parties 3 12 0.2500000
## 12 Lega 1 1 1.0000000
# perparty %>%
# write.csv("output/perparty.csv")
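
The summary above relies on the fact that summing a logical vector in R counts its TRUE values. The following minimal sketch retraces the same per-group calculation in base R. It uses invented toy values instead of the survey data, and tapply stands in for the dplyr pipeline purely for illustration.

```r
# Toy data: five sections belonging to two parties (values invented).
toy <- data.frame(
  party_id = c(1L, 1L, 1L, 2L, 2L),
  transparency_level = c(2L, 0L, 2L, 1L, 2L)
)

# sum() over a logical vector counts the TRUE entries within each group.
count_transparent <- tapply(toy$transparency_level == 2, toy$party_id, sum)
count_all <- tapply(toy$transparency_level >= 0, toy$party_id, sum)

# Collect the results in one table, one row per party.
result <- data.frame(
  party_id = names(count_all),
  count_transparent = as.vector(count_transparent),
  count_all = as.vector(count_all)
)
result$count_percentage <- result$count_transparent / result$count_all
result
```

Because every non-missing transparency_level is at least 0, count_all is simply the number of sections per group.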
# number and share of sections with transparency_level == 2, per canton
percanton <- destination_data %>%
  group_by(canton) %>%
  summarize(
    count_transparent = sum(transparency_level == 2),
    count_all = sum(transparency_level >= 0),
    count_percentage = count_transparent / count_all
  ) %>%
  select(canton, count_transparent, count_all, count_percentage) %>%
  arrange(desc(count_transparent), desc(count_percentage))
percanton %>%
  as.data.frame()
## canton count_transparent count_all count_percentage
## 1 GE 11 12 0.9166667
## 2 VD 10 10 1.0000000
## 3 BE 9 10 0.9000000
## 4 BS 8 10 0.8000000
## 5 FR 8 10 0.8000000
## 6 NE 7 8 0.8750000
## 7 VS 7 10 0.7000000
## 8 ZH 7 10 0.7000000
## 9 JU 6 7 0.8571429
## 10 LU 6 8 0.7500000
## 11 TI 6 8 0.7500000
## 12 BL 6 9 0.6666667
## 13 SO 6 9 0.6666667
## 14 TG 6 9 0.6666667
## 15 SZ 5 7 0.7142857
## 16 SG 5 9 0.5555556
## 17 GR 4 6 0.6666667
## 18 AG 4 9 0.4444444
## 19 AI 2 2 1.0000000
## 20 OW 2 4 0.5000000
## 21 UR 2 5 0.4000000
## 22 ZG 2 6 0.3333333
## 23 AR 1 3 0.3333333
## 24 NW 1 3 0.3333333
## 25 GL 1 4 0.2500000
## 26 SH 0 5 0.0000000
# percanton %>%
# write.csv("output/percanton.csv")
# total declared budget (lower and upper boundary) over all sections
overall_budget <- destination_data %>%
  summarize(sum_lower = sum(budget_total_lower, na.rm = TRUE), sum_upper = sum(budget_total_upper, na.rm = TRUE))
# share of sections that specified at least a total budget
transparency_rate <- destination_data %>% filter(transparency_level == 2) %>% summarise(count = n()) / nrow(destination_data)
# share of sections that responded at all
response_rate <- destination_data %>% filter(transparency_level >= 1) %>% summarise(count = n()) / nrow(destination_data)
# extrapolate the declared total to all sections
estimation <- overall_budget / transparency_rate[1, 1]
Based on the data at hand, the parties that declared a budget spend between 13490410 and 13616410 Swiss francs in total. Given a transparency rate of 68 %, we can estimate that all cantonal parties together spend between 19724614 and 19908841 Swiss francs. Note: This is a very conservative estimate, since the parties spending the most (SVP, FDP) are usually the ones that are not transparent. It is also an uncertain estimate, because it does not take into account how budgets and transparency are distributed across cantons.
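
The extrapolation can be retraced by hand from the figures in this document. The count_transparent column of the per-party table adds up to 132 sections and the count_all column to 193, which gives the transparency rate of about 68 %. The sketch below divides the declared totals by that rate. Truncating the result to whole francs is an assumption made here so that the output matches the figures quoted above.

```r
# Counts taken from the per-party table in this document.
count_transparent <- 132
count_all <- 193
transparency_rate <- count_transparent / count_all  # about 0.684

# Declared totals (lower and upper boundary) in Swiss francs, as quoted above.
declared <- c(lower = 13490410, upper = 13616410)

# Scale the declared totals up to all sections.
estimation <- declared / transparency_rate
trunc(estimation)
# lower: 19724614, upper: 19908841
```

This assumes that the non-transparent sections spend, on average, as much as the transparent ones, which is why the estimate is described as conservative.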