Notes

This document illustrates the preprocessing of the dataset visualized in this article on srf.ch.

SRF Data attaches great importance to transparent and reproducible data preprocessing and -analysis. SRF Data believes in the principles of open data but also open and reproducible methods. Third parties should be empowered to build on the work of SRF Data and to generate new analyses and applications.

R-Script & processed data

The preprocessing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. Through executing main.Rmd, the herein described process can be reproduced and this document can be generated. In the course of this, data from the folder ìnput will be processed and results will be written to output.

Attention: Please set your working directory in the first code chunk!

GitHub

The code for the herein described process can also be freely downloaded from https://github.com/srfdata/2015-07-elections-parliament-lobbying. Criticism in the form of GitHub issues and pull requests are very welcome!

License

Creative Commons License
2015-07-elections-parliament-lobbying by SRF Data is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Exclusion of liability

The published information has been collated carefully, but no guarantee is offered of its completeness, correctness or up-to-date nature. No liability is accepted for damage or loss incurred from the use of this script or the information drawn from it. This exclusion of liability also applies to third-party content that is accessible via this offer.

Other projects

All code & data from SRF Data is available under http://srfdata.github.io.

Data description

Original data source

In the following, the data source “Smartvote” refers to “Politools/smartvote.ch, Bern (Daniel Schwarz)”.

The following files were originally preprocessed by Pauline Turuban (RTS). For specific questions, please contact her under pauline.turuban@rts.ch

  • input/billcontext.submitters.ext.leg49.xls - Contains all the parliamentarian submissions of the 49th legislature, for both chambers, classified by topic. Not directly used in the preprocessing. Data source: Smartvote. Original comment by data source: “It’s an excerpt of all submitted affairs in the 49th legislature (winter session 2011 until spring session 2015) in the Swiss National Council. Source = Webservices of the Swiss Parliamentary Services, Bern (http://ws.parlament.ch/)”.

  • input/MP_mandates_v3.xlsx - This file has two sources. The first and main one is, again, the register of interests. The data was copied, cleaned, and links of interest were categorized. To do that, the organisations’ websites and/or the Monetas database was checked to know in what activities MPs were involved. The second source is more a complement of the first one. It was obtained from Stefano Puddu (economist at UniNE) a dataset he set up for his work “MP’s links of interest, a comprehensive dataset”. It can now be downloaded from his website. Description by that data source: “Monetas has a list of Swiss politicians. For each of them we opened his/her personal page in Monetas and we opened all the pages referring to the companies in which the politician has a mandate. All the process has been automatized by using perl and import.io, to extract all the data.”. The database by Stefano Puddu was mainly used to complement missing information on MPs’ professions. The column “Topic” has been created by Pauline Turuban. Its purpose was to match certain types of interest to certain motions/postulates (see below).

  • input/motions_postulates.xlsx - Manually derived from input/billcontext.submitters.ext.leg49.xls and actually used in the preprocessing. The classification by topic is established by the parliament itself. Manually, a subcategory “Health insurance” within the category “Health” was created. Also, only what was needed was kept: motions and postulates.

  • input/MP_ID_v2.xlsx - A basic list of MPs’ IDs as made by the parliament itself. Data source: Smartvote.

  • input/MP_profession.xlsx - The original source for the column “profession” is the register of interests one can find on the parliament’s website (as of April 2015). Based on it, categories of professions were established (for example: lawyers, jurists, attorneys / bankers, fiduciaries / ingenieurs / teachers, etc.). It explains why the denominations are not exactly the same as in the MP’s biographies.

  • input/commissions.csv - This file was originally generated in the below script, but fetching the commission data from the parliament API turned out to be error-prone and unstable. Thus, the respective part of the code is now commented out and the file was transferred to input.

  • input/mp_commissions - Contains the links between MP’s IDs and their commissions (as IDs, semicola-separated).

  • input/* - Other files are just lookup tables that will just be copied over to output, their structure is described below under “Lookup tables”

Description of output

The following sections describe the results of the data preprocessing as stored in the output folder.

Lookup tables

output/cantons.csv

Attribute Type Description
abbr String Official cantonal abbreviation
majority-election Integer Whether the majoral voting system is in place (1) or not (0)
name.* String Name in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/parties.csv

Contains party classifications made by SRF Data with the help of political scientists, used throughout all projects related to elections. This version here is a slightly altered form which contains all parties that have at least one MP in the national council.

Attribute Type Description
id Integer Unique identifier
abbr.* String Abbreviation in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/commissions.csv

Contains commission information as fetched from the parliamentary API.

Attribute Type Description
id Integer Unique identifier
title.* String Name in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/interest_categories.csv

Contains categories of links of interest as defined by Pauline Turuban (RTS).

Attribute Type Description
id Integer Unique identifier
title.* String Name of category of link of interest in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/political_action_categories.csv

Contains topics of political “actions” (motions and postulates) as defined by the parliament itself.

Attribute Type Description
id Integer Unique identifier
title.* String Name of the topic in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/professions.csv

Contains professions of MPs.

Attribute Type Description
id Integer Unique identifier
title.* String Name of the profession in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

output/profession_branches.csv

Contains categories of professions (“branches”) as defined by Pauline Turuban (RTS).

Attribute Type Description
id Integer Unique identifier
title.* String Name of the category in German (de), French (fr), English (en), Romansh (ro), Italian (it), respectively

Parliamentarian tables

output/parliamentarian.csv

Attribute Type Description
id Integer Unique identifier as defined by the parliament itself
first_name String First name of the MP
last_name String Last name of the MP
canton String Official cantonal abbreviation, references abbr in output/cantons.csv
party_id Integer Party, references id in output/parties.csv
profession_id Integer Profession, references id in output/professions.csv
branch_id Integer Branch of profession, references id in output/profession_branches.csv
commissions String Semicolon-separated list of parliamentary commissions, references id in output/commissions.csv

output/parliamentarian_mandates.csv

Contains information about the links of interest of each MP (e.g. presidencies, administrative councils, etc.).

Attribute Type Description
parliamentarian_id Integer References id in output/parliamentarian.csv
interest_categories_id Integer The category of the link of interest, references id in output/interest_categories.csv
title String Name of link of interest as shown on parlament.ch

output/parliamentarian_actions.csv

Contains all political “actions”, i.e., motions and postulates submitted by MP’s.

Attribute Type Description
parliamentarian_id Integer References id in output/parliamentarian.csv
bill_id Integer The official ID of the political action
political_action_categories_id Integer The topic of the political action, references id in output/political_action_categories.csv

Preparations

Define packages

# von https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
cat("library(magrittr)
library(tidyr)
library(dplyr)
library(readxl)
library(httr)
library(jsonlite)
library(ggplot2)", 
file = "manifest.R")
package_date <- "2015-08-01"

Install packages

if(!require(checkpoint)) {
  if(!require(devtools)){
    install.packages("devtools", repos = "http://cran.us.r-project.org")
    require(devtools)
  }
  devtools::install_github("checkpoint", username = "RevolutionAnalytics", ref = "v0.3.2", repos = "http://cran.us.r-project.org")
  require(checkpoint)
}
## Loading required package: checkpoint
## 
## checkpoint: Part of the Reproducible R Toolkit from Microsoft
## https://mran.microsoft.com/documents/rro/reproducibility/
if(!dir.exists("~/.checkpoint")){
  dir.create("~/.checkpoint")
}
checkpoint(snapshotDate = package_date, project = path_to_wd, verbose = T, scanForPackages = T, use.knitr = F)
## Scanning for packages used in this project
## rmarkdown files found and will not be parsed. Set use.knitr = TRUE
## - Discovered 8 packages
## All detected packages already installed
## checkpoint process complete
## ---
rm(package_date)

Load packages

source("manifest.R")
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:utils':
## 
##     View
unlink("manifest.R")

Preprocessing

Lookup Tables

parties <- read.csv("input/parties.csv") 
parties %<>% select(-message.code, -color.id, -sortorder, -use.label)
write.csv(parties, file = "output/parties.csv", row.names = F, na = "", quote = T)

cantons <- read.csv("input/cantons.csv")
write.csv(cantons, file = "output/cantons.csv", row.names = F, na = "", quote = T)

interest_categories <- read.csv("input/interest_categories.csv") 
interest_categories %<>% select(-sort)
write.csv(interest_categories, file = "output/interest_categories.csv", row.names = F, na = "", quote = T)

political_action_categories <- read.csv("input/political_action_categories.csv") 
political_action_categories %<>% select(-sort)
write.csv(political_action_categories, file = "output/political_action_categories.csv", row.names = F, na = "", quote = T)

profession_branches <- read.csv("input/profession_branches.csv") 
write.csv(profession_branches, file = "output/profession_branches.csv", row.names = F, na = "", quote = T)

professions_lt <- read.csv("input/professions.csv") 
write.csv(professions_lt, file = "output/professions.csv", row.names = F, na = "", quote = T)

Basic MP info

# Load list of MPs, professions
mps <- read_excel(path = "input/MP_ID_v2.xlsx")[1:200,1:5]
mps %<>% rename(first_name = `first name`, last_name = `last name`)
# load list of professions
professions <- read_excel(path = "input/MP_profession.xlsx")[1:200,1:5] %>% select(-name)
# join
mps_and_professions <- mps %>% full_join(professions)
## Joining by: "id"
mps_and_professions %<>% mutate(party_id = NA, party_abbr = tolower(party))
# replace party with party_id
unique(mps_and_professions$party_abbr)
##  [1] "svp"     "sp"      "cvp"     "glp"     "fdp-lib" "bdp"     "gps"    
##  [8] "mcr"     "evp"     "lega"    "csp-ow"
mps_and_professions$party_id <- 8
mps_and_professions[mps_and_professions$party_abbr == "svp",]$party_id <- 4
mps_and_professions[mps_and_professions$party_abbr == "cvp",]$party_id <- 2
mps_and_professions[mps_and_professions$party_abbr == "fdp-lib",]$party_id <- 1
mps_and_professions[mps_and_professions$party_abbr == "gps",]$party_id <- 13
mps_and_professions[mps_and_professions$party_abbr == "sp",]$party_id <- 3
mps_and_professions[mps_and_professions$party_abbr == "evp",]$party_id <- 7
# mps_and_professions[mps_and_professions$party_abbr == "sol",]$party_id <- 9
# mps_and_professions[mps_and_professions$party_abbr == "lps",]$party_id <- 1
mps_and_professions[mps_and_professions$party_abbr == "lega",]$party_id <- 18
mps_and_professions[mps_and_professions$party_abbr == "glp",]$party_id <- 31
mps_and_professions[mps_and_professions$party_abbr == "bdp",]$party_id <- 32
mps_and_professions[mps_and_professions$party_abbr == "mcr",]$party_id <- 90
mps_and_professions[mps_and_professions$party_abbr == "csp-ow",]$party_id <- 91

# some professions need to be cleaned manually
mps_and_professions[mps_and_professions$profession_french == "Viticulteur/trice, œnologue",]$profession_french = "Viticulteur/trice, oenologue"
mps_and_professions[mps_and_professions$profession_german == "Kaufmann / Kauffrau",]$profession_french = "Kaufmann/Kauffrau"

mps_and_professions %<>% 
  inner_join(professions_lt, by = c("profession_french" = "title.fr")) %>% 
  select(-profession_french, -profession_german, -title.de, -title.en, -title.it, -title.ro) %>% 
  rename(profession_id = id.y)
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
# combine with branch lookup
mps_and_professions %<>% inner_join(profession_branches, by = c("branch" = "title.fr")) %>% 
  rename(branch_id = id, id = id.x) %>% 
  select(-branch, -title.de, -title.en, -title.it, -title.ro) 
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
# select only relevant data
mps <- mps_and_professions %>% select(-party, -party_abbr) 

# DO NOT USE
# query commissions
# mps %<>% mutate(commissions = "1;2;3") # placeholder

# # loop over each MoP
# df_of_all_commissions <- data.frame()
# 
# for(id in mps$id){
#   commissions_all_lang <- list()
#   # query by official parliament id
#   for(lang in c("de", "fr", "it")){
#     query_url <- paste("http://ws.parlament.ch/councillors/", id, "?format=json&activeCommitteeMembershipsFilter=true&lang=", lang, sep = "")
#     rawdata <- GET(url=query_url, accept_json(), timeout(2))
#     jsondata <- fromJSON(content(rawdata, type="text", encoding = "UTF-8"))
#     commissions <- jsondata$committeeMemberships$committee
#     commissions_all_lang[[lang]] <- commissions[,c("id","name")]
#   }
#   # serialize id and attach to MoP
#   list_of_commission_ids <- paste(commissions_all_lang$de$id, collapse = ";")
#   mps[mps$id == id,]$commissions <- list_of_commission_ids
#   # add to list of all commissions
#   commissions <- data.frame("id" = commissions_all_lang$de$id, "de" = commissions_all_lang$de$name, "fr" = commissions_all_lang$fr$name, "it" = commissions_all_lang$it$name)
#   if(dim(commissions)[1] > 0){
#     df_of_all_commissions <- rbind(df_of_all_commissions, commissions[,c("id", "de", "fr", "it")])
#   }
#   print(paste("fetched MP id ", id))
# }
# 

# INSTEAD, use commissions.csv
commissions <- read.csv('input/mp_commissions.csv')
# join with mps
mps %<>% left_join(commissions, by = "id")
write.csv(mps, file = "output/parliamentarian.csv", na = "", row.names = F, quote = T)


# DO NOT USE
# # prepare commissions.csv
# unique_commissions <- df_of_all_commissions %>% unique()
# # transform
# unique_commissions %<>% 
#   rename(`title-de` = de,  `title-fr` = fr, `title-it` = it) %>% 
#   mutate(`title-en` = NA, `title-ro` = NA) %>% 
#   select(id, `title-de`, `title-en`, `title-fr`, `title-it`, `title-ro`)
# # save
# INSTEAD
unique_commissions <- read.csv('input/commissions.csv')
write.csv(unique_commissions, file = "output/commissions.csv", na = "", row.names = F, quote = T)

Motions and postulates of parliamentarians

# Load list of MPs, actions
mps_actions <- read_excel(path = "input/motions_postulates.xlsx", sheet = 1)[1:2104,1:4]

mps <- mps_actions %>% select(mp_id, bill_id, Topic) %>% rename(parliamentarian_id = mp_id)

parliamentarian_actions <- mps %>% inner_join(political_action_categories, by = c("Topic" = "title.en")) %>% 
  rename(political_action_categories_id = id) %>% 
  select(parliamentarian_id, bill_id, political_action_categories_id) %>% 
  arrange(parliamentarian_id, bill_id, political_action_categories_id)
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
# aggregate
# parliamentarian_actions %<>% group_by(parliamentarian_id, political_action_categories_id) %>% 
#   summarize(count = n()) %>% arrange(parliamentarian_id, desc(count))
write.csv(parliamentarian_actions, file = "output/parliamentarian_actions.csv", na = "", row.names = F, quote = T)

Analysis

Correlation between interest bindings and submitted motions and postulates?

The following comments are unfortunately in German Korrelation zwischen Anzahl Interessenbindungen und Anzahl eingereichten Vorstössen?

# Anzahl Bindungen
df1 <- parliamentarian_mandates %>% 
  group_by(parliamentarian_id) %>% 
  summarize(count_mandates = n()) %>% 
  arrange(parliamentarian_id)
# Anzahl Vorstösse
df2 <- parliamentarian_actions %>% 
  group_by(parliamentarian_id) %>% 
  summarize(count_actions = n_distinct(bill_id)) %>% 
  arrange(parliamentarian_id)
# Zusatzinfos
mps_parties <- mps_and_professions %>% select(id, party, branch_id) %>% rename(parliamentarian_id = id)
# Join
zusammen <- df1 %>% full_join(df2, by = "parliamentarian_id") %>% full_join(mps_parties)
## Joining by: "parliamentarian_id"
zusammen[is.na(zusammen$count_actions),]$count_actions <- 0

zusammen[is.na(zusammen$count_mandates),]$count_mandates <- 0


# ggplot eingefärbt nach Partei
ggplot(zusammen, aes(count_mandates, count_actions)) + 
  geom_point(size = 3) + 
  theme_bw() +
  ggtitle("Kein Zusammenhang zwischen Anzahl Mandaten und Vorstössen") +
  labs(x = "Anzahl Mandate", y = "Anzahl Vorstösse", title) + 
  stat_smooth(method = "lm", se = F)
## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

## Warning in structure(NULL, class = "waiver"): Calling 'structure(NULL, *)' is deprecated, as NULL cannot have attributes.
##   Consider 'structure(list(), *)' instead.

# durch. Mandate
mean(zusammen$count_actions)
## [1] 7.208955
mean(zusammen$count_mandates)
## [1] 8.298507