Preliminary Remarks

This document describes the pre-processing and exploratory analysis of the data set that is the basis of the article Die grössten Sorgen der Schweiz – und womit sie zusammenhängen published on srf.ch.

SRF Data attaches importance to the fact that the data pre-processing and analysis can be reproduced and checked. SRF Data believes in the principle of open data, but also open and comprehensible methods. On the other hand, it should be possible for third parties to build on this preparatory work and thus generate further evaluations or applications.

R-Script & Daten

The preprocessing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. Through executing main.Rmd, the herein described process can be reproduced and this document can be generated. In the course of this, data from the folder input will be processed and results will be written to output.

SRF Data uses Timo Grossenbacher’s rddj-template as the basis for its R scripts. If you have problems executing this script, it may help to study the instructions from the rddj-template.

This report was generated on 2019-06-03 19:05:26. R version: 3.5.3 on x86_64-apple-darwin15.6.0. For this report, CRAN packages as of 2019-03-01 were used.

GitHub

The code for the herein described process can also be freely downloaded from https://github.com/srfdata/2019-06-worries.

Other projects

Code and data by SRF Data are available on https://srfdata.github.io.

Disclaimer

The published information has been carefully compiled, but does not claim to be up-to-date, complete or correct. No liability is assumed for damages arising from the use of this script or the information drawn from it. This also applies to contents of third parties which are accessible via this offer.

Data description of output files

worries.csv

Attribute Type Description
year Numeric Year of the survey
topic String Topic (most important problems for the country)
value Numeric Share in percent of poeple (Swiss, above 18) that are worried about that topic
rank Numeric Rank of the topic, was it e.g. the number one worry in that year

requests_per_fraction_and_topic.csv

Caution: The topics in worries are not the same as in the requests… tables. Also: In 2012 the Federal Chancellery introduced numerous new topics to their classification system.

Attribute Type Description
year Numeric Year (there usually are 4-5 sessions per year)
a_author_fraction String What fraction submitted the requests
total Numeric Total number of requests the fraction submitted in that year
topic String What topic can the request be attributed to (multiple topics per request possible)
share Numeric Share of that topic in the total of all requests the fraction submitted that year
n Numeric Number of requests for that topic and fraction in that year

requests_per_council_and_topic.csv

Attribute Type Description
year Numeric Year (there usually are 4-5 sessions per year)
a_council String What council submitted the requests
total Numeric Total number of requests the council submitted in that year
topic String What topic can the request be attributed to (multiple topics per request possible)
share Numeric Share of that topic in the total of all requests the council submitted that year
n Numeric Number of requests for that topic and council in that year

party_strengths.csv

Attribute Type Description
year Numeric Year of the election
party String Name of the party
strength Numeric Share in percent of all votes that that party won in that election
delta Numeric Change of the strength share of that party compared to the last election

unemployment_rate.csv

Attribute Type Description
year Numeric Year of measurement
rate Numeric Proportion of people in Switzerland that were unemployed (average per year)
change Numeric Change of the rate since 1995

asylum_applications.csv

Attribute Type Description
year Numeric Year of measurement
applications Numeric Number of new applications for asylum in that year (per 31st of December)
change Numeric Change of the number of applications since 1995

immigration.csv

Attribute Type Description
year Numeric Year of measurement
immigrants Numeric Number of non-swiss immigrants that took up residence in Switzerland
change Numeric Change of the number of immigrants since 1995

health_costs.csv

Attribute Type Description
year Numeric Year of measurement
costs Numeric Costs of the Swiss health system in millions (CHF)
change Numeric Change of the number of applications since 1995

smd_climate.csv

Attribute Type Description
year Numeric Year searched (1st of Jan until 31st of Dec)
articles Numeric Number of hits according to smd.ch
months Numeric Numbers of months searched (mainly relevant for 2019)
per_month Numeric Number of articles with one of the keywords found

Original Source

Worries

-> input/Original_Sorgenbarometer_1988-2018.xlsx

The research institute gfs.bern and the bank Credit Suisse publish a Worry Barometer each year. On their website they write about the survey: “What are the major concerns of people in Switzerland? How much confidence do they have in decision-makers in the fields of politics, business, and society? For the past 42 years, Credit Suisse has conducted an annual Worry Barometer survey to examine precisely these issues. With the Worry Barometer, Credit Suisse aims to contribute to the public debate on issues of socio-political relevance. [Every summer] the research institute gfs.bern askes [about 2500] voters across Switzerland about their concerns on behalf of Credit Suisse. The statistical sampling error is ±2.0 percentage points.”

The survey is conducted the following way: “On these cards you can see some topics that have been discussed and written about a lot recently. Please take a look at all the cards and then put out to me those five of them which you personally consider to be the five most important problems in Switzerland.” (orig: «Auf diesen Kärtchen sehen Sie einige Themen, über die in der letzten Zeit viel diskutiert und geschrieben worden ist. Sehen Sie sich bitte alle Kärtchen an und legen Sie mir dann von allen Kärtchen jene fünf heraus, die Sie persönlich als die fünf wichtigsten Probleme der Schweiz ansehen.»)

As they changed their methodology in 1995 and introduced a lot of new topics, we decided to leave away years before 1995.

Parliamentary procedural requests

-> input/Analyse_SRG19_P1.RData -> input/modifications_p1.R

With the help of the team at Smartvote we can also offer (for the first time, as far as we know) all the parliamentary procedural requests (Parlamentarische Vorstösse) from the National Council and the Council of States. The original source of the items of parlamentary business is the Federal Chancellery.

They started the classification of the items in the year 2000. For this reason, we cannot work with requests in the years before 2000.

National Council elections

-> je-d-17.02.02.02.01.01.xlsx

The number of seats in the National Council we download from the Federal Statistical Offices FSO website.

-> px-x-1702020000_104.csv

For the party strenghts we download a CSV from Stat-Tab where we choose Switzerland in the first tab and all the other entries in all the other tabs.

Unemployment Rate

-> input/je-d-03.03.02.02.xlsx

We download the unemployment rate at the website of the Federal Statistical Office FSO.

Immigration into Switzerland

-> input/su-d-01.05.04.02.01.xlsx

Also the data about immigration we download from the FSO website.

Applications for Asylum

-> input/7-20-Bew-Asylgesuche-J-d-…-12.xlsx

From the website of the State Secretariat for Migration SEM we download the number of new applications of asylum per year.

Costs of the health system

-> input/je-d-14.05.01.01.xlsx

For another reference line concerning the cost of the Swiss health system, we work with this table by the FSO.

Articles concerning climate change

-> input/smd_climate.csv

We queried the Swiss Media Database SMD with the following search: erderwärmung OR "globale erwärmung" OR klimawandel OR treibhauseffekt OR klimaschutz OR klimaveränderung OR klimaerwärmung for each year since 1995 (by manually entering a start and end date of 1st of january and 31st of December each year). Then we manually saved the total number of results into a csv file.

Preparations

## [1] "package package:rmarkdown detached"
## Loading required package: knitr
## Loading required package: rstudioapi

Define packages

# from https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
# if you don't need a package, remove it from here (commenting is probably not sufficient)
# tidyverse: see https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/
cat("
library(rstudioapi)
library(tidyverse) # ggplot2, dplyr, tidyr, readr, purrr, tibble
library(ggrepel) # repelling texts on ggplot
library(magrittr) # pipes
library(lintr) # code linting
library(sf) # spatial data handling
library(rmarkdown)",
file = "manifest.R")

Install packages

# if checkpoint is not yet installed, install it (for people using this
# system for the first time)
if (!require(checkpoint)) {
  if (!require(devtools)) {
    install.packages("devtools", repos = "http://cran.us.r-project.org")
    require(devtools)
  }
  devtools::install_github("RevolutionAnalytics/checkpoint",
                           ref = "v0.3.2", # could be adapted later,
                           # as of now (beginning of July 2017
                           # this is the current release on CRAN)
                           repos = "http://cran.us.r-project.org")
  require(checkpoint)
}
# nolint start
if (!dir.exists("~/.checkpoint")) {
  dir.create("~/.checkpoint")
}
# nolint end
# install packages for the specified CRAN snapshot date
checkpoint(snapshotDate = package_date,
           project = path_to_wd,
           verbose = T,
           scanForPackages = T,
           use.knitr = F,
           R.version = R_version)
rm(package_date)

Load packages

source("manifest.R")
unlink("manifest.R")
sessionInfo()
## R version 3.5.3 Patched (2019-03-11 r76221)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rmarkdown_1.11   sf_0.7-3         lintr_1.0.3      magrittr_1.5    
##  [5] ggrepel_0.8.0    forcats_0.4.0    stringr_1.4.0    dplyr_0.8.0.1   
##  [9] purrr_0.3.0      readr_1.3.1      tidyr_0.8.2      tibble_2.0.1    
## [13] ggplot2_3.1.0    tidyverse_1.2.1  checkpoint_0.4.0 rstudioapi_0.9.0
## [17] knitr_1.21      
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5 xfun_0.5         haven_2.1.0      lattice_0.20-38 
##  [5] colorspace_1.4-0 generics_0.0.2   htmltools_0.3.6  yaml_2.2.0      
##  [9] rlang_0.3.1      e1071_1.7-0.1    pillar_1.3.1     glue_1.3.0      
## [13] withr_2.1.2      DBI_1.0.0        modelr_0.1.4     readxl_1.3.0    
## [17] plyr_1.8.4       munsell_0.5.0    gtable_0.2.0     cellranger_1.1.0
## [21] rvest_0.3.2      evaluate_0.13    rex_1.1.2        class_7.3-15    
## [25] broom_0.5.1      Rcpp_1.0.0       classInt_0.3-1   scales_1.0.0    
## [29] backports_1.1.3  jsonlite_1.6     hms_0.4.2        digest_0.6.18   
## [33] stringi_1.3.1    grid_3.5.3       cli_1.0.1        tools_3.5.3     
## [37] lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.2  xml2_1.2.0      
## [41] lubridate_1.7.4  assertthat_0.2.0 httr_1.4.0       R6_2.4.0        
## [45] units_0.6-2      nlme_3.1-137     compiler_3.5.3

Read in Data

Worries (barometer)

# read table of survey conducted by GFS Bern
worries <- readxl::read_excel(
  "input/Original_Sorgenbarometer_1988-2018.xlsx"
) %>%
  rename(year = 1) %>%
  gather(topic, value, -one_of("year")) %>%
  # reduce to years after 1995 because before that,
  # the question asked was different than afterwards
  filter(year >= 1995) %>%
  # add a rank by grouping / arranging per year
  # what's not perfect though: when two topics have the same value, they're
  # ordered in the way they occur (so pretty randomly)
  group_by(year) %>%
  arrange(desc(value)) %>%
  mutate(rank = row_number()) %>%
  ungroup() %>%
  # if there is no value, we also don't want it to have a rank
  mutate(rank = ifelse(is.na(value), NA, rank)) %>%
  # convert to correct data type
  mutate(topic = factor(topic)) %>%
  # sort nicely
  arrange(year, rank)
## New names:
## * `` -> `..1`
# export
write_csv(
  worries,
  "output/worries.csv"
)

Parliamentary procedural requests

# read data of parlamentary affairs collected by smartvote
load("input/Analyse_SRG19_P1.RData")

# give more meaningful names to variables
# a = affairs, c = councillor
affairs_data_raw <- a_details
affairs_and_councillors_data <- a_details1
# data from now on filtered to time after year 2000
affairs_and_councillors_with_topics <- a_details2
affairs_per_year_and_council_raw <- a_details3_1
requests_per_year_and_council <- a_details4_1
requests_per_council_topic_shares <- a_details5_1
# affairs now reduced to procedural requests only
affairs_per_fraction <- a_details3_2
requests_per_year_and_fraction <- a_details4_2
requests_per_fraction_topic_shares <- a_details5_2

issue_names <- c(
  "^issue_08$" = "Internationale Politik",
  "^issue_09$" = "Sicherheitspolitik",
  "^issue_10$" = "Europapolitik",
  "^issue_15$" = "Wirtschaft",
  "^issue_44$" = "Beschäftigung Arbeit",
  "^issue_55$" = "Landwirtschaft",
  "^issue_24$" = "Finanzwesen",
  "^issue_2446$" = "Steuer",
  "^issue_28$" = "Soziale Fragen",
  "^issue_2811$" = "Migration",
  "^issue_2836$" = "Sozialer Schutz",
  "^issue_2841$" = "Gesundheit",
  "^issue_2846$" = "Raumplanung Wohnungswesen",
  "^issue_2831$" = "Kultur",
  "^issue_32$" = "Bildung",
  "^issue_36$" = "Wissenschaft Forschung",
  "^issue_34$" = "Medien Kommunikation",
  "^issue_48$" = "Verkehr",
  "^issue_52$" = "Umwelt",
  "^issue_66$" = "Energie",
  "^issue_04$" = "Staatspolitik",
  "^issue_0421$" = "Parlament",
  "^issue_12$" = "Recht allgemein",
  "^issue_1211$" = "Zivilrecht",
  "^issue_1216$" = "Strafrecht",
  "^issue_1221$" = "Gerichtswesen",
  "^issue_1231$" = "Internationales Recht",
  "^issue_1236$" = "Menschenrechte"
)

# function that makes the data frame tidy and replaces issue ids with real names
gather_and_convert_to_numbers <- function (df) {
  df %>%
    # convert columns to rows
    gather(topic, share, starts_with("issue_")) %>%
    mutate(topic = str_replace_all(topic, issue_names)) %>%
    rename(total = N) %>%
    mutate(
      share = share / 100,
      n = round(total * share, digits = 0)
    )
}

# apply to requests per council
requests_per_council_and_topic <-
  requests_per_council_topic_shares %>%
  gather_and_convert_to_numbers()

# and to requests per fraction
requests_per_fraction_and_topic <-
  requests_per_fraction_topic_shares %>%
  gather_and_convert_to_numbers()

# clean up
rm(
  a_details,
  a_details1,
  a_details2,
  a_details3_1,
  a_details3_2,
  a_details4_1,
  a_details4_2,
  a_details5_1,
  a_details5_2,
  gather_and_convert_to_numbers
)

# export
write_csv(
  requests_per_council_and_topic,
  "output/requests_per_council_and_topic.csv"
)

write_csv(
  requests_per_fraction_and_topic,
  "output/requests_per_fraction_and_topic.csv"
)

The raw data contain the following form of affairs:

knitr::kable(
  affairs_data_raw %>%
    group_by(a_type) %>%
    tally() %>%
    arrange(desc(n))
)
a_type n
Interpellation 10434
Motion 8771
Fragestunde. Frage 8722
Postulat 4377
Parlamentarische Initiative 2243
Geschäft des Bundesrates 2113
Anfrage 1915
Einfache Anfrage 1254
Petition 797
Geschäft des Parlaments 588
Standesinitiative 473
Dringliche Interpellation 334
Dringliche Einfache Anfrage 202
Empfehlung 139
Dringliche Anfrage 79

Visual Analysis

Parliamentary procedural requests

ggplot(
  requests_per_fraction_topic_shares,
  aes(
    x = year,
    y = N
  )
) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(
    title = "Total number of parlamentary procedural requests",
    y = NULL,
    x = NULL
  )

ggplot(
  requests_per_fraction_and_topic,
  aes(
    x = topic,
    y = n
  )
) +
  geom_bar(stat = "identity") +
  facet_wrap(~ year, ncol = 4) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 728 rows containing missing values (position_stack).

fraction_colors <- c(
  "Christlichdemokratische Fraktion" = "#D6862B",
  "Evangelische und Unabhängige Fraktion" = "#DEAA28",
  "Fraktion der Schweizerischen Volkspartei" = "#4B8A3E",
  "Fraktionslos" = "#9D9D9D",
  "Freisinnig-demokratische Fraktion" = "#3872B5",
  "Grüne Fraktion" = "#84B547",
  "Liberale Fraktion" = "#618DEA",
  "Sozialdemokratische Fraktion" = "#F0554D",
  "EVP/EDU Fraktion" = "#DEAA28",
  "Fraktion CVP/EVP/glp" = "#D6862B",
  "FDP-Liberale Fraktion" = "#3872B5",
  "Fraktion BD" = "#E6C820",
  "Fraktion CVP-EVP" = "#D6862B",
  "Grünliberale Fraktion" = "#C4C43D",
  "CVP-Fraktion" = "#D6862B"
)

ggplot(
  requests_per_fraction_and_topic,
  aes(
    x = year,
    y = n,
    fill = a_author_fraction
  )
) +
  geom_bar(stat = "identity") +
  facet_wrap(~ topic, ncol = 4) +
  scale_fill_manual(values = fraction_colors, guide = FALSE) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Number of requests per topic (colors correspond to parties)",
    x = NULL,
    y = NULL
  )
## Warning: Removed 728 rows containing missing values (position_stack).

ggplot(
  requests_per_fraction_and_topic %>%
    group_by(year, topic) %>%
    summarise(n = sum(n, na.rm = TRUE)) %>%
    filter(topic %in% c("Gesundheit")),
  aes(
    x = year,
    y = n
  )
) +
  geom_bar(stat = "identity") +
  facet_wrap(~ topic, ncol = 4) +
  theme_minimal()

# select a more or less random list of some topics to inspect a bit closer
relevant_topics <- c(
  "Umwelt",
  "Energie",
  "Europapolitik",
  "Sicherheitspolitik",
  "Migration"
)

ggplot(
  requests_per_fraction_and_topic %>%
    filter(topic %in% relevant_topics),
  aes(
    x = year,
    y = n,
    group = topic,
    color = topic
  )
) +
  geom_line() +
  facet_wrap(~ a_author_fraction, ncol = 1, scales = "free_y") +
  theme_minimal() +
  labs(
    title = "Parlamentary fractions: Five relevant topics over time",
    x = NULL,
    y = NULL
  )

Worries

To find out in what way we will display the worries in the final publication, we try different approaches to selecting topics and years. As the amount of data can get overwhelming pretty quickly, we need to find a clear focus to concentrate on.

Sorted by median (Top 20)
knitr::kable(
  worries %>%
    group_by(topic) %>%
    summarise(value = median(value, na.rm = TRUE)) %>%
    arrange(desc(value)) %>%
    top_n(20, wt = value)
)
topic value
Arbeitslosigkeit 56.5
AHV / Altersvorsorge 42.0
Gesundheit, Krankenkassen 41.0
Flüchtlinge / Asyl 30.5
AusländerInnen 28.5
EU / Bilaterale / Integration 21.0
Neue Armut 18.0
Soziale Sicherheit 18.0
Umweltschutz 17.5
Bundesfinanzen 17.0
Persönliche Sicherheit 17.0
Euro-Krise 16.0
Löhne 13.0
Drogen / Alkohol 12.5
Bankenkrise 12.0
Inflation 12.0
Globalisierung 11.0
(Kern-) Energie 10.0
Rassismus / Fremdenfeindlichkeit 10.0
Internet-Sicherheit / Cyber-Spionage 9.0
Biggest worries
As Areas
number_of_top_worries <- 10

manually_selected_top_5 <- c(
  "Arbeitslosigkeit",
  "Flüchtlinge / Asyl",
  "AusländerInnen",
  "AHV / Altersvorsorge",
  "Umweltschutz"
)

# filter for most important groups
worries_sorted <- worries %>%
  group_by(topic) %>%
  mutate(
    # for median calculation, replace NAs with zero
    value = ifelse(is.na(value), 0, value),
    median = median(value, na.rm = TRUE)
  ) %>%
  arrange(desc(median)) %>%
  distinct(topic) %>%
  simplify()

worries %<>%
  mutate(topic = factor(topic, levels = worries_sorted)) %>%
  arrange(topic, year)

# use head instead of top_n to get an exact number
big_worries <- head(worries_sorted, number_of_top_worries)
# uncomment the following if you want to use the manual selection
# big_worries <- manually_selected_top_5

other_worries <- worries %>%
  filter(!topic %in% big_worries) %>%
  group_by(year) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  mutate(topic = "Andere")

ggplot(
  worries %>%
    filter(topic %in% big_worries),
  aes(
    x = year,
    y = value
  )
) +
  geom_area() +
  facet_wrap(~ topic) +
  theme_minimal() +
  labs(
    title = "Biggest worries",
    x = NULL,
    y = NULL
  )
## Warning: Removed 1 rows containing missing values (position_stack).

In one Plot
ggplot(
  worries %>%
    filter(topic %in% big_worries) %>%
    filter(year %% 4 == 3 | year == max(year)),
  aes(
    x = year,
    y = value,
    color = topic
  )
) +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  theme_minimal() +
  labs(
    title = "Top worries over time (only election years and latest)",
    fill = NULL,
    x = NULL,
    y = NULL
  )
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning: Removed 8 rows containing missing values (geom_path).

Bump Chart
ranks_of_worries <- worries %>%
  # add hex code for some topics to highlight them
  mutate(color = case_when(
    topic == "Arbeitslosigkeit" ~ "#e31f2b",
    topic == "Flüchtlinge / Asyl" ~ "#61b13e",
    topic == "AusländerInnen" ~ "#1cb373",
    topic == "AHV / Altersvorsorge" ~ "#1e8ce3",
    topic == "Umweltschutz" ~ "#ca51af",
    TRUE ~ "#d9d9d9"
  ))

ggplot(
  ranks_of_worries,
  aes(
    x = year,
    y = rank,
    color = color,
    group = topic,
    label = topic
  )
) +
  geom_line() +
  geom_point(size = 2) +
  geom_text_repel(
    data = ranks_of_worries %>%
      filter(year == max(year)),
    segment.size = 0.25,
    segment.color = "black",
    nudge_x = 0.5,
    hjust = 0,
    direction = "y"
  ) +
  scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
  scale_color_identity() +
  xlim(min(ranks_of_worries$year), max(ranks_of_worries$year) + 7) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(
    title = "Rank of worries (labelled are only top 20 in 2018)",
    x = NULL,
    y = NULL
  )
## Warning: Removed 240 rows containing missing values (geom_path).
## Warning: Removed 361 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_text_repel).

# tmp filter list to display only 20 lines in a chart
top_20_2018 <- ranks_of_worries %>%
  filter(year == max(year)) %>%
  arrange(rank) %>%
  head(20) %>%
  select(topic) %>%
  simplify()
Only Most important ones
ggplot(
  ranks_of_worries %>%
    filter(year %% 2 == 1) %>%
    filter(topic %in% big_worries),
  aes(
    x = year,
    y = rank,
    color = color,
    group = topic,
    label = topic
  )
) +
  geom_line() +
  geom_point() +
  scale_x_continuous(
    breaks = seq(1995, 2015, 4),
    limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
  ) +
  scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
  scale_color_identity(guide = FALSE) +
  geom_text_repel(
    data = ranks_of_worries %>%
      filter(year %% 2 == 1) %>%
      filter(topic %in% big_worries) %>%
      filter(year == max(year)),
    segment.size = 0.25,
    segment.color = "black",
    nudge_x = 0.5,
    hjust = 0,
    direction = "y"
  ) +
  theme_minimal() +
  labs(
    title = "Top 10 worries ordered by rank",
    x = NULL,
    y = NULL
  )
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).

ggplot(
  ranks_of_worries %>%
    filter(year %% 2 == 1) %>%
    filter(topic %in% big_worries),
  aes(
    x = year,
    y = value,
    color = color,
    group = topic,
    label = topic
  )
) +
  geom_line() +
  geom_point() +
  scale_x_continuous(
    breaks = seq(1995, 2015, 4),
    limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
  ) +
  scale_color_identity(guide = FALSE) +
  geom_text_repel(
    data = ranks_of_worries %>%
      filter(year %% 2 == 1) %>%
      filter(topic %in% big_worries) %>%
      filter(year == max(year)),
    segment.size = 0.25,
    segment.color = "black",
    nudge_x = 0.5,
    hjust = 0,
    direction = "y"
  ) +
  theme_minimal() +
  labs(
    title = "Real values: how many people were worried in this year?",
    subtitle = "Displayed are only the top 9 topics",
    x = NULL,
    y = NULL
  )
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).

After looking at the different charts, we decided to focus on the 6 topics that are the most relevant in the latest available year (2018). For the sorted area plot that you see in the final article, we left away all the even years (except for the latest). That was necessary so the plot does not look too confusing.

The visualisation itself was not produced in R but with the online tool RawGraphs using their d3 based tool for bump charts. From there we downloaded a SVG file and imported it into the vector graphics tool Sketch and manually optimized the curving of the lines and their colors and positions.

The result:

knitr::include_graphics("img/final_chart.png")

Additional data

In order to better contextualise and classify the data on the concerns of the population, the following sections introduce some additional data.

Party strengths

# read voter's share of parties
party_strengths <- read_csv(
  "input/px-x-1702020000_104.csv",
  col_types = cols() # hide info about types
) %>%
  rename(
    canton = 1,
    year = 2,
    party = 3,
    votes = 4,
    fictitious_votes = 5,
    strength = 6
  ) %>%
  # drop unnecessary columns (votes is empty, canton is Schweiz)
  select(year, party, strength) %>%
  # convert strength to number
  mutate(strength = as.numeric(strength)) %>%
  # replace NAs with zero for delta calculations so parties appear also
  # when they have a result for the first time
  mutate(strength = ifelse(is.na(strength), 0, strength)) %>%
  group_by(party) %>%
  arrange(year) %>%
  # calculate delta by subtracting last value in group from current one
  mutate(delta = strength - lag(strength))
## Warning: NAs durch Umwandlung erzeugt
# select only seven biggest parties
main_parties <- party_strengths %>%
  ungroup() %>%
  filter(year == max(year)) %>%
  top_n(7, wt = strength) %>%
  select(party) %>%
  simplify()

party_colors <- c(
  "SVP/UDC" = "#4B8A3E",
  "CVP/PDC" = "#D6862B",
  "FDP/PLR (PRD)" = "#3872B5",
  "GPS/PES" = "#84B547",
  "SP/PS" = "#F0554D",
  "BDP/PBD" = "#E6C820",
  "GLP/PVL" = "#C4C43D"
)

ggplot(
  party_strengths %>%
    filter(party %in% main_parties, year >= 1995),
  aes(
    x = party,
    y = delta,
    fill = party,
    label = scales::percent(delta / 100)
  )
) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(aes(vjust = ifelse(delta >= 0, -0.5, 1.5))) +
  scale_fill_manual(values = party_colors, guide = FALSE) +
  facet_wrap(~ year) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Did the main 7 parties win or lose?",
    subtitle = "Loss/gain in percentage points",
    x = NULL,
    y = NULL
  )

# export
write_csv(
  party_strengths %>%
    # round deltas as the substraction introduced weird results
    mutate(delta = round(delta, digits = 2)),
  "output/party_strengths.csv"
)

Unemployment Rate

The footnote 2 in the spreadsheet is not relevant for us: “Change of practice: In March 2018, a new partially automated registration system was introduced at the regional employment centres (RAV).” (orig: Praxisänderung: Auf den März 2018 wurde bei den regionalen Arbeitsvermittlungszentren (RAV) ein neues teilautomatisiertes Erfassungssystem eingeführt.)

unemployment_rate <- readxl::read_excel(
  "input/je-d-03.03.02.02.xlsx",
  sheet = 1,
  range = "B5:AC8"
) %>%
  # convert columns to rows
  gather(year, rate) %>%
  # filter out NAs introduced by empty lines 6 and 7
  filter(!is.na(rate)) %>%
  # remove the footnotes from the years and convert to numeric
  mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
  # filter out years outside of our range of interest
  filter(year >= min(worries$year) & year <= max(worries$year)) %>%
  # add column for change relative to first year in range of interest
  mutate(change = rate / first(rate))

ggplot(
  unemployment_rate,
  aes(
    x = year,
    y = change
  )
) +
  geom_line() +
  expand_limits(y = 0) +
  theme_minimal() +
  labs(
    x = NULL,
    y = NULL
  )

# export
write_csv(
  unemployment_rate,
  "output/unemployment_rate.csv"
)

Immigration into Switzerland

The footnote 2 for year 2011 states: “From 2011, a change in the production process and a new definition of the permanent resident population, which additionally includes persons in the asylum process with a total stay of at least 12 months.” (orig: “Ab 2011 Wechsel des Produktionsverfahrens und neue Definition der ständigen Wohnbevölkerung, die zusätzlich Personen im Asylprozess mit einer Gesamtaufenthaltsdauer von mindestens 12 Monaten umfasst.”)

As there is no other or better data source available, we’ll have to ignore this.

immigration <- readxl::read_excel(
  "input/su-d-01.05.04.02.01.xlsx",
  range = "A3:AB23"
) %>%
  # rename first column
  rename(group = 1) %>%
  # convert all columns except group to rows
  gather(year, immigrants, -one_of("group")) %>%
  # remove footnotes and convert to numeric
  mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
  # look only at non-swiss immigrants, then remove group column
  filter(group == "Ausland") %>%
  select(-group) %>%
  # filter out years outside of our range of interest
  filter(year >= min(worries$year) & year <= max(worries$year)) %>%
  # add column for change relative to first year in range of interest
  mutate(change = immigrants / first(immigrants))
## New names:
## * `` -> `..1`
ggplot(
  immigration,
  aes(
    x = year,
    y = change
  )
) +
  geom_line() +
  expand_limits(y = 0) +
  theme_minimal() +
  labs(
    x = NULL,
    y = NULL
  )

# export
write_csv(
  immigration,
  "output/immigration.csv"
)

Applications for Asylum

The files for new applications for asylum come in yearly files starting in 1994. By iterating over all years from 1994 to 2018 we read in each file and get out the total number of new applications in cell B7.

asylum_applications <- 1994:2018 %>%
  map_df(function(current_year) {
    readxl::read_excel(
      paste0("input/7-20-Bew-Asylgesuche-J-d-", current_year, "-12.xlsx"),
      sheet = "CH-Kt",
      range = "B7:B7",
      col_names = c("applications")
    ) %>%
      mutate(year = current_year)
  }) %>%
  # filter out years outside of our range of interest
  filter(year >= min(worries$year) & year <= max(worries$year)) %>%
  # add column for change relative to first year in range of interest
  mutate(change = applications / first(applications)) %>%
  # move column year to first position
  select(year, everything())

ggplot(
  asylum_applications,
  aes(
    x = year,
    y = change
  )
) +
  geom_line() +
  expand_limits(y = 0) +
  theme_minimal() +
  labs(
    x = NULL,
    y = NULL
  )

# export
write_csv(
  asylum_applications,
  "output/asylum_applications.csv"
)

Costs of the health system

Conveniently this table already comes with the relative change and is already indexed by the year 1995, so we can read these values directly and don’t need to calculate it ourselves.

health_costs <- readxl::read_excel(
  "input/je-d-14.05.01.01.xlsx",
  range = "A10:C66",
  col_names = c("year", "costs", "change")
) %>%
  # make decimals out of the change (now in percent)
  mutate(change = change / 100) %>%
  # filter out years outside of our range of interest
  filter(year >= min(worries$year) & year <= max(worries$year))

ggplot(
  health_costs,
  aes(
    x = year,
    y = change
  )
) +
  geom_line() +
  expand_limits(y = 0) +
  theme_minimal() +
  labs(
    x = NULL,
    y = NULL
  )

# export
write_csv(
  health_costs,
  "output/health_costs.csv"
)

Articles in Swiss Media (German only)

# calculate mentions per month
smd_climate <- read_csv("input/smd_climate.csv") %>%
  mutate(
    months = ifelse(
      year == lubridate::year(Sys.Date()),
      4, # the data we downloaded was up until 2019-04-30
      12
    ),
    per_month = articles / months
  )
## Parsed with column specification:
## cols(
##   year = col_double(),
##   articles = col_double()
## )
ggplot(
  smd_climate,
  aes(
    x = year,
    y = per_month
  )
) +
  geom_bar(stat = "identity") +
  expand_limits(y = 0) +
  theme_minimal() +
  labs(
    x = NULL,
    y = NULL
  )

# export
write_csv(
  smd_climate,
  "output/smd_climate.csv"
)

Linting

Der Code in diesem RMarkdown wird mit lintr automatisch auf den Wickham’schen tidyverse style guide überprüft.

lintr::lint("main.Rmd")
## main.Rmd:676:3: style: Commented code should be removed.
## # big_worries <- manually_selected_top_5
##   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~