This document describes the pre-processing and exploratory analysis of the data set that is the basis of the article Die grössten Sorgen der Schweiz – und womit sie zusammenhängen published on srf.ch.
SRF Data attaches great importance to ensuring that its data pre-processing and analysis can be reproduced and verified. SRF Data believes in the principle of open data, but also in open and comprehensible methods. Furthermore, third parties should be able to build on this preparatory work and use it to generate further analyses or applications.
The pre-processing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. By executing `main.Rmd`, the process described herein can be reproduced and this document can be generated. In the course of this, data from the folder `input` is processed and results are written to `output`.
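For reference, rendering the script from an R session could look like this (a minimal sketch, assuming the working directory is the root of the downloaded folder and the rmarkdown package is installed):

```r
# renders main.Rmd: reads data from input/, writes results to output/
# and produces this report
rmarkdown::render("main.Rmd")
```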
SRF Data uses Timo Grossenbacher’s rddj-template as the basis for its R scripts. If you have problems executing this script, it may help to study the instructions from the rddj-template.
This report was generated on 2019-06-03 19:05:26. R version: 3.5.3 on x86_64-apple-darwin15.6.0. For this report, CRAN packages as of 2019-03-01 were used.
The code for the process described herein can also be freely downloaded from https://github.com/srfdata/2019-06-worries.
2019-06-worries by SRF Data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Code and data by SRF Data are available on https://srfdata.github.io.
The published information has been carefully compiled, but does not claim to be up-to-date, complete or correct. No liability is assumed for damages arising from the use of this script or the information drawn from it. This also applies to third-party content that is accessible via this offering.
worries.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of the survey |
topic | String | Topic (most important problems for the country) |
value | Numeric | Share in percent of people (Swiss, above 18) who are worried about that topic |
rank | Numeric | Rank of the topic within that year (e.g. whether it was the number one worry that year) |
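The exported file can, for example, be read back with explicit column types (a small sketch, assuming output/worries.csv has already been generated by this script):

```r
library(readr)

# read the exported worry barometer data; value and rank contain NAs
# for topics that were not surveyed in a given year
worries_check <- read_csv(
  "output/worries.csv",
  col_types = cols(
    year  = col_double(),
    topic = col_character(),
    value = col_double(),
    rank  = col_double()
  )
)
```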
requests_per_fraction_and_topic.csv
Caution: The topics in `worries` are not the same as those in the `requests…` tables. Also: in 2012, the Federal Chancellery introduced numerous new topics into its classification system.
Attribute | Type | Description |
---|---|---|
year | Numeric | Year (there are usually 4-5 sessions per year) |
a_author_fraction | String | Fraction (parliamentary group) that submitted the requests |
total | Numeric | Total number of requests the fraction submitted in that year |
topic | String | Topic the request can be attributed to (multiple topics per request are possible) |
share | Numeric | Share of that topic in the total of all requests the fraction submitted that year |
n | Numeric | Number of requests for that topic and fraction in that year |
requests_per_council_and_topic.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year (there are usually 4-5 sessions per year) |
a_council | String | Council that submitted the requests |
total | Numeric | Total number of requests the council submitted in that year |
topic | String | Topic the request can be attributed to (multiple topics per request are possible) |
share | Numeric | Share of that topic in the total of all requests the council submitted that year |
n | Numeric | Number of requests for that topic and council in that year |
party_strengths.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of the election |
party | String | Name of the party |
strength | Numeric | Share in percent of all votes that that party won in that election |
delta | Numeric | Change in that party's vote share compared to the previous election (in percentage points) |
unemployment_rate.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
rate | Numeric | Proportion of people in Switzerland that were unemployed (annual average) |
change | Numeric | Rate relative to 1995 (rate divided by the 1995 value) |
asylum_applications.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
applications | Numeric | Number of new asylum applications in that year (as of 31 December) |
change | Numeric | Number of applications relative to 1995 (divided by the 1995 value) |
immigration.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
immigrants | Numeric | Number of non-Swiss immigrants who took up residence in Switzerland |
change | Numeric | Number of immigrants relative to 1995 (divided by the 1995 value) |
health_costs.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
costs | Numeric | Costs of the Swiss health system in millions of CHF |
change | Numeric | Change of the costs relative to 1995 (taken directly from the source table, as a decimal) |
smd_climate.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year searched (1st of Jan until 31st of Dec) |
articles | Numeric | Number of hits according to smd.ch |
months | Numeric | Number of months searched (mainly relevant for 2019) |
per_month | Numeric | Average number of articles per month (articles divided by months) |
-> input/Original_Sorgenbarometer_1988-2018.xlsx
The research institute gfs.bern and the bank Credit Suisse publish a Worry Barometer each year. On their website they write about the survey: “What are the major concerns of people in Switzerland? How much confidence do they have in decision-makers in the fields of politics, business, and society? For the past 42 years, Credit Suisse has conducted an annual Worry Barometer survey to examine precisely these issues. With the Worry Barometer, Credit Suisse aims to contribute to the public debate on issues of socio-political relevance. [Every summer] the research institute gfs.bern asks [about 2500] voters across Switzerland about their concerns on behalf of Credit Suisse. The statistical sampling error is ±2.0 percentage points.”
The survey is conducted the following way: “On these cards you can see some topics that have been discussed and written about a lot recently. Please take a look at all the cards and then put out to me those five of them which you personally consider to be the five most important problems in Switzerland.” (orig: «Auf diesen Kärtchen sehen Sie einige Themen, über die in der letzten Zeit viel diskutiert und geschrieben worden ist. Sehen Sie sich bitte alle Kärtchen an und legen Sie mir dann von allen Kärtchen jene fünf heraus, die Sie persönlich als die fünf wichtigsten Probleme der Schweiz ansehen.»)
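As a rough plausibility check of the stated sampling error (a back-of-the-envelope sketch, not part of the original analysis): with about 2500 respondents, a simple random sample and a 95% confidence level, the worst-case margin of error is indeed about 2 percentage points.

```r
# worst-case margin of error for a share of 50% with n = 2500 respondents
n <- 2500
p <- 0.5
margin <- qnorm(0.975) * sqrt(p * (1 - p) / n)
round(margin * 100, 1) # ~2.0 percentage points
```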
As they changed their methodology in 1995 and introduced many new topics, we decided to exclude the years before 1995.
-> input/Analyse_SRG19_P1.RData
-> input/modifications_p1.R
With the help of the team at Smartvote we can also offer (for the first time, as far as we know) all parliamentary procedural requests (Parlamentarische Vorstösse) from the National Council and the Council of States. The original source of the items of parliamentary business is the Federal Chancellery.
The Federal Chancellery started classifying the items in the year 2000. For this reason, we cannot work with requests from the years before 2000.
-> je-d-17.02.02.02.01.01.xlsx
We download the number of seats in the National Council from the website of the Federal Statistical Office (FSO).
-> px-x-1702020000_104.csv
For the party strengths we download a CSV from Stat-Tab, where we choose `Switzerland` in the first tab and all the other entries in all the other tabs.
-> input/popular_votes_1994_2019.csv
We manually extracted the popular votes from the list on the Federal Chancellery's website.
-> input/je-d-03.03.02.02.xlsx
We download the unemployment rate from the website of the Federal Statistical Office (FSO).
-> input/su-d-01.05.04.02.01.xlsx
We also download the data about immigration from the FSO website.
-> input/7-20-Bew-Asylgesuche-J-d-…-12.xlsx
From the website of the State Secretariat for Migration (SEM) we download the number of new asylum applications per year.
-> input/je-d-14.05.01.01.xlsx
For another reference line concerning the costs of the Swiss health system, we work with this table from the FSO.
-> input/smd_climate.csv
We queried the Swiss Media Database SMD with the following search: erderwärmung OR "globale erwärmung" OR klimawandel OR treibhauseffekt OR klimaschutz OR klimaveränderung OR klimaerwärmung
for each year since 1995 (by manually entering a start date of 1 January and an end date of 31 December for each year). Then we manually saved the total number of results into a CSV file.
## [1] "package package:rmarkdown detached"
## Loading required package: knitr
## Loading required package: rstudioapi
# from https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
# if you don't need a package, remove it from here (commenting is probably not sufficient)
# tidyverse: see https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/
cat("
library(rstudioapi)
library(tidyverse) # ggplot2, dplyr, tidyr, readr, purrr, tibble
library(ggrepel) # repelling texts on ggplot
library(magrittr) # pipes
library(lintr) # code linting
library(sf) # spatial data handling
library(rmarkdown)",
file = "manifest.R")
# if checkpoint is not yet installed, install it (for people using this
# system for the first time)
if (!require(checkpoint)) {
if (!require(devtools)) {
install.packages("devtools", repos = "http://cran.us.r-project.org")
require(devtools)
}
devtools::install_github("RevolutionAnalytics/checkpoint",
ref = "v0.3.2", # could be adapted later,
# as of now (beginning of July 2017)
# this is the current release on CRAN
repos = "http://cran.us.r-project.org")
require(checkpoint)
}
# nolint start
if (!dir.exists("~/.checkpoint")) {
dir.create("~/.checkpoint")
}
# nolint end
# install packages for the specified CRAN snapshot date
checkpoint(snapshotDate = package_date,
project = path_to_wd,
verbose = T,
scanForPackages = T,
use.knitr = F,
R.version = R_version)
rm(package_date)
source("manifest.R")
unlink("manifest.R")
sessionInfo()
## R version 3.5.3 Patched (2019-03-11 r76221)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_1.11 sf_0.7-3 lintr_1.0.3 magrittr_1.5
## [5] ggrepel_0.8.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1
## [9] purrr_0.3.0 readr_1.3.1 tidyr_0.8.2 tibble_2.0.1
## [13] ggplot2_3.1.0 tidyverse_1.2.1 checkpoint_0.4.0 rstudioapi_0.9.0
## [17] knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.5 haven_2.1.0 lattice_0.20-38
## [5] colorspace_1.4-0 generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
## [9] rlang_0.3.1 e1071_1.7-0.1 pillar_1.3.1 glue_1.3.0
## [13] withr_2.1.2 DBI_1.0.0 modelr_0.1.4 readxl_1.3.0
## [17] plyr_1.8.4 munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
## [21] rvest_0.3.2 evaluate_0.13 rex_1.1.2 class_7.3-15
## [25] broom_0.5.1 Rcpp_1.0.0 classInt_0.3-1 scales_1.0.0
## [29] backports_1.1.3 jsonlite_1.6 hms_0.4.2 digest_0.6.18
## [33] stringi_1.3.1 grid_3.5.3 cli_1.0.1 tools_3.5.3
## [37] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2 xml2_1.2.0
## [41] lubridate_1.7.4 assertthat_0.2.0 httr_1.4.0 R6_2.4.0
## [45] units_0.6-2 nlme_3.1-137 compiler_3.5.3
# read table of survey conducted by GFS Bern
worries <- readxl::read_excel(
"input/Original_Sorgenbarometer_1988-2018.xlsx"
) %>%
rename(year = 1) %>%
gather(topic, value, -one_of("year")) %>%
# reduce to years after 1995 because before that,
# the question asked was different than afterwards
filter(year >= 1995) %>%
# add a rank by grouping / arranging per year
# what's not perfect though: when two topics have the same value, they're
# ordered in the way they occur (so pretty randomly)
group_by(year) %>%
arrange(desc(value)) %>%
mutate(rank = row_number()) %>%
ungroup() %>%
# if there is no value, we also don't want it to have a rank
mutate(rank = ifelse(is.na(value), NA, rank)) %>%
# convert to correct data type
mutate(topic = factor(topic)) %>%
# sort nicely
arrange(year, rank)
## New names:
## * `` -> `..1`
# export
write_csv(
worries,
"output/worries.csv"
)
# read data of parliamentary affairs collected by smartvote
load("input/Analyse_SRG19_P1.RData")
# give more meaningful names to variables
# a = affairs, c = councillor
affairs_data_raw <- a_details
affairs_and_councillors_data <- a_details1
# data from now on filtered to time after year 2000
affairs_and_councillors_with_topics <- a_details2
affairs_per_year_and_council_raw <- a_details3_1
requests_per_year_and_council <- a_details4_1
requests_per_council_topic_shares <- a_details5_1
# affairs now reduced to procedural requests only
affairs_per_fraction <- a_details3_2
requests_per_year_and_fraction <- a_details4_2
requests_per_fraction_topic_shares <- a_details5_2
issue_names <- c(
"^issue_08$" = "Internationale Politik",
"^issue_09$" = "Sicherheitspolitik",
"^issue_10$" = "Europapolitik",
"^issue_15$" = "Wirtschaft",
"^issue_44$" = "Beschäftigung Arbeit",
"^issue_55$" = "Landwirtschaft",
"^issue_24$" = "Finanzwesen",
"^issue_2446$" = "Steuer",
"^issue_28$" = "Soziale Fragen",
"^issue_2811$" = "Migration",
"^issue_2836$" = "Sozialer Schutz",
"^issue_2841$" = "Gesundheit",
"^issue_2846$" = "Raumplanung Wohnungswesen",
"^issue_2831$" = "Kultur",
"^issue_32$" = "Bildung",
"^issue_36$" = "Wissenschaft Forschung",
"^issue_34$" = "Medien Kommunikation",
"^issue_48$" = "Verkehr",
"^issue_52$" = "Umwelt",
"^issue_66$" = "Energie",
"^issue_04$" = "Staatspolitik",
"^issue_0421$" = "Parlament",
"^issue_12$" = "Recht allgemein",
"^issue_1211$" = "Zivilrecht",
"^issue_1216$" = "Strafrecht",
"^issue_1221$" = "Gerichtswesen",
"^issue_1231$" = "Internationales Recht",
"^issue_1236$" = "Menschenrechte"
)
# function that makes the data frame tidy and replaces issue ids with real names
gather_and_convert_to_numbers <- function (df) {
df %>%
# convert columns to rows
gather(topic, share, starts_with("issue_")) %>%
mutate(topic = str_replace_all(topic, issue_names)) %>%
rename(total = N) %>%
mutate(
share = share / 100,
n = round(total * share, digits = 0)
)
}
# apply to requests per council
requests_per_council_and_topic <-
requests_per_council_topic_shares %>%
gather_and_convert_to_numbers()
# and to requests per fraction
requests_per_fraction_and_topic <-
requests_per_fraction_topic_shares %>%
gather_and_convert_to_numbers()
# clean up
rm(
a_details,
a_details1,
a_details2,
a_details3_1,
a_details3_2,
a_details4_1,
a_details4_2,
a_details5_1,
a_details5_2,
gather_and_convert_to_numbers
)
# export
write_csv(
requests_per_council_and_topic,
"output/requests_per_council_and_topic.csv"
)
write_csv(
requests_per_fraction_and_topic,
"output/requests_per_fraction_and_topic.csv"
)
The raw data contain the following types of affairs:
knitr::kable(
affairs_data_raw %>%
group_by(a_type) %>%
tally() %>%
arrange(desc(n))
)
a_type | n |
---|---|
Interpellation | 10434 |
Motion | 8771 |
Fragestunde. Frage | 8722 |
Postulat | 4377 |
Parlamentarische Initiative | 2243 |
Geschäft des Bundesrates | 2113 |
Anfrage | 1915 |
Einfache Anfrage | 1254 |
Petition | 797 |
Geschäft des Parlaments | 588 |
Standesinitiative | 473 |
Dringliche Interpellation | 334 |
Dringliche Einfache Anfrage | 202 |
Empfehlung | 139 |
Dringliche Anfrage | 79 |
ggplot(
requests_per_fraction_topic_shares,
aes(
x = year,
y = N
)
) +
geom_bar(stat = "identity") +
theme_minimal() +
labs(
title = "Total number of parlamentary procedural requests",
y = NULL,
x = NULL
)
ggplot(
requests_per_fraction_and_topic,
aes(
x = topic,
y = n
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ year, ncol = 4) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 728 rows containing missing values (position_stack).
fraction_colors <- c(
"Christlichdemokratische Fraktion" = "#D6862B",
"Evangelische und Unabhängige Fraktion" = "#DEAA28",
"Fraktion der Schweizerischen Volkspartei" = "#4B8A3E",
"Fraktionslos" = "#9D9D9D",
"Freisinnig-demokratische Fraktion" = "#3872B5",
"Grüne Fraktion" = "#84B547",
"Liberale Fraktion" = "#618DEA",
"Sozialdemokratische Fraktion" = "#F0554D",
"EVP/EDU Fraktion" = "#DEAA28",
"Fraktion CVP/EVP/glp" = "#D6862B",
"FDP-Liberale Fraktion" = "#3872B5",
"Fraktion BD" = "#E6C820",
"Fraktion CVP-EVP" = "#D6862B",
"Grünliberale Fraktion" = "#C4C43D",
"CVP-Fraktion" = "#D6862B"
)
ggplot(
requests_per_fraction_and_topic,
aes(
x = year,
y = n,
fill = a_author_fraction
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ topic, ncol = 4) +
scale_fill_manual(values = fraction_colors, guide = FALSE) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
title = "Number of requests per topic (colors correspond to parties)",
x = NULL,
y = NULL
)
## Warning: Removed 728 rows containing missing values (position_stack).
ggplot(
requests_per_fraction_and_topic %>%
group_by(year, topic) %>%
summarise(n = sum(n, na.rm = TRUE)) %>%
filter(topic %in% c("Gesundheit")),
aes(
x = year,
y = n
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ topic, ncol = 4) +
theme_minimal()
# select a more or less random list of some topics to inspect a bit closer
relevant_topics <- c(
"Umwelt",
"Energie",
"Europapolitik",
"Sicherheitspolitik",
"Migration"
)
ggplot(
requests_per_fraction_and_topic %>%
filter(topic %in% relevant_topics),
aes(
x = year,
y = n,
group = topic,
color = topic
)
) +
geom_line() +
facet_wrap(~ a_author_fraction, ncol = 1, scales = "free_y") +
theme_minimal() +
labs(
title = "Parlamentary fractions: Five relevant topics over time",
x = NULL,
y = NULL
)
To find out how we will display the worries in the final publication, we try different approaches to selecting topics and years. As the amount of data can become overwhelming pretty quickly, we need to find a clear focus to concentrate on.
knitr::kable(
worries %>%
group_by(topic) %>%
summarise(value = median(value, na.rm = TRUE)) %>%
arrange(desc(value)) %>%
top_n(20, wt = value)
)
topic | value |
---|---|
Arbeitslosigkeit | 56.5 |
AHV / Altersvorsorge | 42.0 |
Gesundheit, Krankenkassen | 41.0 |
Flüchtlinge / Asyl | 30.5 |
AusländerInnen | 28.5 |
EU / Bilaterale / Integration | 21.0 |
Neue Armut | 18.0 |
Soziale Sicherheit | 18.0 |
Umweltschutz | 17.5 |
Bundesfinanzen | 17.0 |
Persönliche Sicherheit | 17.0 |
Euro-Krise | 16.0 |
Löhne | 13.0 |
Drogen / Alkohol | 12.5 |
Bankenkrise | 12.0 |
Inflation | 12.0 |
Globalisierung | 11.0 |
(Kern-) Energie | 10.0 |
Rassismus / Fremdenfeindlichkeit | 10.0 |
Internet-Sicherheit / Cyber-Spionage | 9.0 |
number_of_top_worries <- 10
manually_selected_top_5 <- c(
"Arbeitslosigkeit",
"Flüchtlinge / Asyl",
"AusländerInnen",
"AHV / Altersvorsorge",
"Umweltschutz"
)
# filter for most important groups
worries_sorted <- worries %>%
group_by(topic) %>%
mutate(
# for median calculation, replace NAs with zero
value = ifelse(is.na(value), 0, value),
median = median(value, na.rm = TRUE)
) %>%
arrange(desc(median)) %>%
distinct(topic) %>%
simplify()
worries %<>%
mutate(topic = factor(topic, levels = worries_sorted)) %>%
arrange(topic, year)
# use head instead of top_n to get an exact number
big_worries <- head(worries_sorted, number_of_top_worries)
# uncomment the following if you want to use the manual selection
# big_worries <- manually_selected_top_5
other_worries <- worries %>%
filter(!topic %in% big_worries) %>%
group_by(year) %>%
summarise(value = sum(value, na.rm = TRUE)) %>%
mutate(topic = "Andere")
ggplot(
worries %>%
filter(topic %in% big_worries),
aes(
x = year,
y = value
)
) +
geom_area() +
facet_wrap(~ topic) +
theme_minimal() +
labs(
title = "Biggest worries",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (position_stack).
ggplot(
worries %>%
filter(topic %in% big_worries) %>%
filter(year %% 4 == 3 | year == max(year)),
aes(
x = year,
y = value,
color = topic
)
) +
geom_line() +
scale_color_brewer(palette = "Set1") +
theme_minimal() +
labs(
title = "Top worries over time (only election years and latest)",
fill = NULL,
x = NULL,
y = NULL
)
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning: Removed 8 rows containing missing values (geom_path).
ranks_of_worries <- worries %>%
# add hex code for some topics to highlight them
mutate(color = case_when(
topic == "Arbeitslosigkeit" ~ "#e31f2b",
topic == "Flüchtlinge / Asyl" ~ "#61b13e",
topic == "AusländerInnen" ~ "#1cb373",
topic == "AHV / Altersvorsorge" ~ "#1e8ce3",
topic == "Umweltschutz" ~ "#ca51af",
TRUE ~ "#d9d9d9"
))
ggplot(
ranks_of_worries,
aes(
x = year,
y = rank,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point(size = 2) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
scale_color_identity() +
xlim(min(ranks_of_worries$year), max(ranks_of_worries$year) + 7) +
theme_minimal() +
theme(legend.position = "none") +
labs(
title = "Rank of worries (labelled are only top 20 in 2018)",
x = NULL,
y = NULL
)
## Warning: Removed 240 rows containing missing values (geom_path).
## Warning: Removed 361 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_text_repel).
# tmp filter list to display only 20 lines in a chart
top_20_2018 <- ranks_of_worries %>%
filter(year == max(year)) %>%
arrange(rank) %>%
head(20) %>%
select(topic) %>%
simplify()
ggplot(
ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries),
aes(
x = year,
y = rank,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point() +
scale_x_continuous(
breaks = seq(1995, 2015, 4),
limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
) +
scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
scale_color_identity(guide = FALSE) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries) %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
theme_minimal() +
labs(
title = "Top 10 worries ordered by rank",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(
ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries),
aes(
x = year,
y = value,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point() +
scale_x_continuous(
breaks = seq(1995, 2015, 4),
limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
) +
scale_color_identity(guide = FALSE) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries) %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
theme_minimal() +
labs(
title = "Real values: how many people were worried in this year?",
subtitle = "Displayed are only the top 9 topics",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
After looking at the different charts, we decided to focus on the six topics that are most relevant in the latest available year (2018). For the sorted area plot that you see in the final article, we excluded all even years (except for the latest). That was necessary so the plot does not look too cluttered.
The visualisation itself was not produced in R but with the online tool RawGraphs, using their d3-based tool for bump charts. From there we downloaded an SVG file, imported it into the vector graphics tool Sketch, and manually optimized the curving of the lines as well as their colors and positions.
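The subset that went into the bump chart corresponds roughly to the following (a sketch reconstructing the selection described above, assuming that “most relevant” means the six top-ranked topics of 2018; the file name bump_chart_data.csv is a placeholder and not part of the repository):

```r
# six top-ranked topics of the latest year (2018)
top_6_latest <- worries %>%
  filter(year == max(year)) %>%
  arrange(rank) %>%
  head(6) %>%
  pull(topic)

# keep only odd years plus the latest year and export for RawGraphs
worries %>%
  filter(topic %in% top_6_latest) %>%
  filter(year %% 2 == 1 | year == max(year)) %>%
  write_csv("output/bump_chart_data.csv") # hypothetical file name
```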
The result:
knitr::include_graphics("img/final_chart.png")
In order to better contextualise and classify the data on the concerns of the population, the following sections introduce some additional data.
Using a set of keywords, we search for all popular votes that belong to the topics of migration and pensions.
# load data from national votations from 1994 to 2019
popular_votes <- read_csv("input/popular_votes_1994_2019.csv")
## Parsed with column specification:
## cols(
## date = col_character(),
## vote = col_character(),
## result = col_character()
## )
# save those related to migration in separate data frame
migration <- popular_votes %>%
filter(str_detect(
vote, paste(c(
"einwanderung",
"Ausländer",
"Asyl",
"Migration",
"Flüchtling",
"Flüchtlinge",
"Minarett"
), collapse = "|")
))
# save those related to pensions in separate data frame
ahv <- popular_votes %>%
filter(str_detect(
vote, paste(c(
"AHV",
"Altersvorsorge",
"Rente"
), collapse = "|")
))
# read voter's share of parties
party_strengths <- read_csv(
"input/px-x-1702020000_104.csv",
col_types = cols() # hide info about types
) %>%
rename(
canton = 1,
year = 2,
party = 3,
votes = 4,
fictitious_votes = 5,
strength = 6
) %>%
# drop unnecessary columns (votes is empty, canton is Schweiz)
select(year, party, strength) %>%
# convert strength to number
mutate(strength = as.numeric(strength)) %>%
# replace NAs with zero for delta calculations so parties appear also
# when they have a result for the first time
mutate(strength = ifelse(is.na(strength), 0, strength)) %>%
group_by(party) %>%
arrange(year) %>%
# calculate delta by subtracting last value in group from current one
mutate(delta = strength - lag(strength))
## Warning: NAs durch Umwandlung erzeugt
# select only seven biggest parties
main_parties <- party_strengths %>%
ungroup() %>%
filter(year == max(year)) %>%
top_n(7, wt = strength) %>%
select(party) %>%
simplify()
party_colors <- c(
"SVP/UDC" = "#4B8A3E",
"CVP/PDC" = "#D6862B",
"FDP/PLR (PRD)" = "#3872B5",
"GPS/PES" = "#84B547",
"SP/PS" = "#F0554D",
"BDP/PBD" = "#E6C820",
"GLP/PVL" = "#C4C43D"
)
ggplot(
party_strengths %>%
filter(party %in% main_parties, year >= 1995),
aes(
x = party,
y = delta,
fill = party,
label = scales::percent(delta / 100)
)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_text(aes(vjust = ifelse(delta >= 0, -0.5, 1.5))) +
scale_fill_manual(values = party_colors, guide = FALSE) +
facet_wrap(~ year) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
title = "Did the main 7 parties win or lose?",
subtitle = "Loss/gain in percentage points",
x = NULL,
y = NULL
)
# export
write_csv(
party_strengths %>%
# round deltas as the subtraction introduced weird results
mutate(delta = round(delta, digits = 2)),
"output/party_strengths.csv"
)
Footnote 2 in the spreadsheet is not relevant for us: “Change of practice: In March 2018, a new partially automated registration system was introduced at the regional employment centres (RAV).” (orig: Praxisänderung: Auf den März 2018 wurde bei den regionalen Arbeitsvermittlungszentren (RAV) ein neues teilautomatisiertes Erfassungssystem eingeführt.)
unemployment_rate <- readxl::read_excel(
"input/je-d-03.03.02.02.xlsx",
sheet = 1,
range = "B5:AC8"
) %>%
# convert columns to rows
gather(year, rate) %>%
# filter out NAs introduced by empty lines 6 and 7
filter(!is.na(rate)) %>%
# remove the footnotes from the years and convert to numeric
mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = rate / first(rate))
ggplot(
unemployment_rate,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
unemployment_rate,
"output/unemployment_rate.csv"
)
Footnote 2 for the year 2011 states: “From 2011 onwards, a change in the production process and a new definition of the permanent resident population apply, which additionally includes persons in the asylum process with a total stay of at least 12 months.” (orig: “Ab 2011 Wechsel des Produktionsverfahrens und neue Definition der ständigen Wohnbevölkerung, die zusätzlich Personen im Asylprozess mit einer Gesamtaufenthaltsdauer von mindestens 12 Monaten umfasst.”)
As there is no other or better data source available, we’ll have to ignore this.
immigration <- readxl::read_excel(
"input/su-d-01.05.04.02.01.xlsx",
range = "A3:AB23"
) %>%
# rename first column
rename(group = 1) %>%
# convert all columns except group to rows
gather(year, immigrants, -one_of("group")) %>%
# remove footnotes and convert to numeric
mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
# look only at non-swiss immigrants, then remove group column
filter(group == "Ausland") %>%
select(-group) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = immigrants / first(immigrants))
## New names:
## * `` -> `..1`
ggplot(
immigration,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
immigration,
"output/immigration.csv"
)
The data on new applications for asylum come in yearly files starting in 1994. By iterating over all years from 1994 to 2018, we read in each file and extract the total number of new applications from cell `B7`.
asylum_applications <- 1994:2018 %>%
map_df(function(current_year) {
readxl::read_excel(
paste0("input/7-20-Bew-Asylgesuche-J-d-", current_year, "-12.xlsx"),
sheet = "CH-Kt",
range = "B7:B7",
col_names = c("applications")
) %>%
mutate(year = current_year)
}) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = applications / first(applications)) %>%
# move column year to first position
select(year, everything())
ggplot(
asylum_applications,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
asylum_applications,
"output/asylum_applications.csv"
)
Conveniently, this table already comes with the relative change, indexed to the year 1995, so we can read these values directly and do not need to calculate them ourselves.
health_costs <- readxl::read_excel(
"input/je-d-14.05.01.01.xlsx",
range = "A10:C66",
col_names = c("year", "costs", "change")
) %>%
# make decimals out of the change (now in percent)
mutate(change = change / 100) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year))
ggplot(
health_costs,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
health_costs,
"output/health_costs.csv"
)
# calculate mentions per month
smd_climate <- read_csv("input/smd_climate.csv") %>%
mutate(
months = ifelse(
year == lubridate::year(Sys.Date()),
4, # the data we downloaded was up until 2019-04-30
12
),
per_month = articles / months
)
## Parsed with column specification:
## cols(
## year = col_double(),
## articles = col_double()
## )
ggplot(
smd_climate,
aes(
x = year,
y = per_month
)
) +
geom_bar(stat = "identity") +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
smd_climate,
"output/smd_climate.csv"
)
The code in this RMarkdown is automatically checked against Hadley Wickham's tidyverse style guide using lintr.
lintr::lint("main.Rmd")
## main.Rmd:676:3: style: Commented code should be removed.
## # big_worries <- manually_selected_top_5
## ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~