This document describes the pre-processing and exploratory analysis of the data set that is the basis of the article Die grössten Sorgen der Schweiz – und womit sie zusammenhängen published on srf.ch.
SRF Data attaches great importance to ensuring that its data pre-processing and analysis can be reproduced and verified. SRF Data believes in the principle of open data, but also in open and comprehensible methods. Furthermore, third parties should be able to build on this preparatory work and use it to generate further analyses or applications.
The pre-processing and analysis of the data was conducted in the R project for statistical computing. The RMarkdown script used to generate this document and all the resulting data can be downloaded under this link. By executing `main.Rmd`, the process described herein can be reproduced and this document can be generated. In the course of this, data from the folder `input` is processed and results are written to `output`.
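For reference, rendering the script from an R session could look like this (a minimal sketch, assuming the working directory is the root of the downloaded folder and the rmarkdown package is installed):

```r
# renders main.Rmd: reads data from input/, writes results to output/
# and produces this report
rmarkdown::render("main.Rmd")
```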
SRF Data uses Timo Grossenbacher’s rddj-template as the basis for its R scripts. If you have problems executing this script, it may help to study the instructions from the rddj-template.
This report was generated on 2019-06-03 19:05:26. R version: 3.5.3 on x86_64-apple-darwin15.6.0. For this report, CRAN packages as of 2019-03-01 were used.
The code for the process described herein can also be freely downloaded from https://github.com/srfdata/2019-06-worries.
2019-06-worries by SRF Data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Code and data by SRF Data are available on https://srfdata.github.io.
The published information has been carefully compiled, but does not claim to be up-to-date, complete or correct. No liability is assumed for damages arising from the use of this script or the information drawn from it. This also applies to third-party content that is accessible via this offering.
worries.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of the survey |
topic | String | Topic (most important problems for the country) |
value | Numeric | Share in percent of people (Swiss, above 18) who are worried about that topic |
rank | Numeric | Rank of the topic within that year (e.g. whether it was the number one worry that year) |
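The exported file can, for example, be read back with explicit column types (a small sketch, assuming output/worries.csv has already been generated by this script):

```r
library(readr)

# read the exported worry barometer data; value and rank contain NAs
# for topics that were not surveyed in a given year
worries_check <- read_csv(
  "output/worries.csv",
  col_types = cols(
    year  = col_double(),
    topic = col_character(),
    value = col_double(),
    rank  = col_double()
  )
)
```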
requests_per_fraction_and_topic.csv
Caution: The topics in `worries` are not the same as those in the `requests…` tables. Also: in 2012, the Federal Chancellery introduced numerous new topics into its classification system.
Attribute | Type | Description |
---|---|---|
year | Numeric | Year (there are usually 4-5 sessions per year) |
a_author_fraction | String | Fraction (parliamentary group) that submitted the requests |
total | Numeric | Total number of requests the fraction submitted in that year |
topic | String | Topic the request can be attributed to (multiple topics per request are possible) |
share | Numeric | Share of that topic in the total of all requests the fraction submitted that year |
n | Numeric | Number of requests for that topic and fraction in that year |
requests_per_council_and_topic.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year (there are usually 4-5 sessions per year) |
a_council | String | Council that submitted the requests |
total | Numeric | Total number of requests the council submitted in that year |
topic | String | Topic the request can be attributed to (multiple topics per request are possible) |
share | Numeric | Share of that topic in the total of all requests the council submitted that year |
n | Numeric | Number of requests for that topic and council in that year |
party_strengths.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of the election |
party | String | Name of the party |
strength | Numeric | Share in percent of all votes that that party won in that election |
delta | Numeric | Change in that party's vote share compared to the previous election (in percentage points) |
unemployment_rate.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
rate | Numeric | Proportion of people in Switzerland that were unemployed (annual average) |
change | Numeric | Rate relative to 1995 (rate divided by the 1995 value) |
asylum_applications.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
applications | Numeric | Number of new asylum applications in that year (as of 31 December) |
change | Numeric | Number of applications relative to 1995 (divided by the 1995 value) |
immigration.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
immigrants | Numeric | Number of non-Swiss immigrants who took up residence in Switzerland |
change | Numeric | Number of immigrants relative to 1995 (divided by the 1995 value) |
health_costs.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year of measurement |
costs | Numeric | Costs of the Swiss health system in millions of CHF |
change | Numeric | Change of the costs relative to 1995 (taken directly from the source table, as a decimal) |
smd_climate.csv
Attribute | Type | Description |
---|---|---|
year | Numeric | Year searched (1st of Jan until 31st of Dec) |
articles | Numeric | Number of hits according to smd.ch |
months | Numeric | Number of months searched (mainly relevant for 2019) |
per_month | Numeric | Average number of articles per month (articles divided by months) |
-> input/Original_Sorgenbarometer_1988-2018.xlsx
The research institute gfs.bern and the bank Credit Suisse publish a Worry Barometer each year. On their website they write about the survey: “What are the major concerns of people in Switzerland? How much confidence do they have in decision-makers in the fields of politics, business, and society? For the past 42 years, Credit Suisse has conducted an annual Worry Barometer survey to examine precisely these issues. With the Worry Barometer, Credit Suisse aims to contribute to the public debate on issues of socio-political relevance. [Every summer] the research institute gfs.bern asks [about 2500] voters across Switzerland about their concerns on behalf of Credit Suisse. The statistical sampling error is ±2.0 percentage points.”
The survey is conducted the following way: “On these cards you can see some topics that have been discussed and written about a lot recently. Please take a look at all the cards and then put out to me those five of them which you personally consider to be the five most important problems in Switzerland.” (orig: «Auf diesen Kärtchen sehen Sie einige Themen, über die in der letzten Zeit viel diskutiert und geschrieben worden ist. Sehen Sie sich bitte alle Kärtchen an und legen Sie mir dann von allen Kärtchen jene fünf heraus, die Sie persönlich als die fünf wichtigsten Probleme der Schweiz ansehen.»)
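As a rough plausibility check of the stated sampling error (a back-of-the-envelope sketch, not part of the original analysis): with about 2500 respondents, a simple random sample and a 95% confidence level, the worst-case margin of error is indeed about 2 percentage points.

```r
# worst-case margin of error for a share of 50% with n = 2500 respondents
n <- 2500
p <- 0.5
margin <- qnorm(0.975) * sqrt(p * (1 - p) / n)
round(margin * 100, 1) # ~2.0 percentage points
```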
As they changed their methodology in 1995 and introduced many new topics, we decided to exclude the years before 1995.
-> input/Analyse_SRG19_P1.RData
-> input/modifications_p1.R
With the help of the team at Smartvote we can also offer (for the first time, as far as we know) all parliamentary procedural requests (Parlamentarische Vorstösse) from the National Council and the Council of States. The original source of the items of parliamentary business is the Federal Chancellery.
The Federal Chancellery started classifying the items in the year 2000. For this reason, we cannot work with requests from the years before 2000.
-> je-d-17.02.02.02.01.01.xlsx
We download the number of seats in the National Council from the website of the Federal Statistical Office (FSO).
-> px-x-1702020000_104.csv
For the party strengths we download a CSV from Stat-Tab, where we choose `Switzerland` in the first tab and all the other entries in all the other tabs.
-> input/popular_votes_1994_2019.csv
We manually extracted the popular votes from the list on the Federal Chancellery's website.
-> input/je-d-03.03.02.02.xlsx
We download the unemployment rate from the website of the Federal Statistical Office (FSO).
-> input/su-d-01.05.04.02.01.xlsx
We also download the data about immigration from the FSO website.
-> input/7-20-Bew-Asylgesuche-J-d-…-12.xlsx
From the website of the State Secretariat for Migration (SEM) we download the number of new asylum applications per year.
-> input/je-d-14.05.01.01.xlsx
For another reference line concerning the costs of the Swiss health system, we work with this table from the FSO.
-> input/smd_climate.csv
We queried the Swiss Media Database SMD with the following search: erderwärmung OR "globale erwärmung" OR klimawandel OR treibhauseffekt OR klimaschutz OR klimaveränderung OR klimaerwärmung
for each year since 1995 (by manually entering a start date of 1 January and an end date of 31 December for each year). Then we manually saved the total number of results into a CSV file.
## [1] "package package:rmarkdown detached"
## Loading required package: knitr
## Loading required package: rstudioapi
# from https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
# if you don't need a package, remove it from here (commenting is probably not sufficient)
# tidyverse: see https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/
cat("
library(rstudioapi)
library(tidyverse) # ggplot2, dplyr, tidyr, readr, purrr, tibble
library(ggrepel) # repelling texts on ggplot
library(magrittr) # pipes
library(lintr) # code linting
library(sf) # spatial data handling
library(rmarkdown)",
file = "manifest.R")
# if checkpoint is not yet installed, install it (for people using this
# system for the first time)
if (!require(checkpoint)) {
if (!require(devtools)) {
install.packages("devtools", repos = "http://cran.us.r-project.org")
require(devtools)
}
devtools::install_github("RevolutionAnalytics/checkpoint",
ref = "v0.3.2", # could be adapted later,
# as of now (beginning of July 2017)
# this is the current release on CRAN
repos = "http://cran.us.r-project.org")
require(checkpoint)
}
# nolint start
if (!dir.exists("~/.checkpoint")) {
dir.create("~/.checkpoint")
}
# nolint end
# install packages for the specified CRAN snapshot date
checkpoint(snapshotDate = package_date,
project = path_to_wd,
verbose = T,
scanForPackages = T,
use.knitr = F,
R.version = R_version)
rm(package_date)
source("manifest.R")
unlink("manifest.R")
sessionInfo()
## R version 3.5.3 Patched (2019-03-11 r76221)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_1.11 sf_0.7-3 lintr_1.0.3 magrittr_1.5
## [5] ggrepel_0.8.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1
## [9] purrr_0.3.0 readr_1.3.1 tidyr_0.8.2 tibble_2.0.1
## [13] ggplot2_3.1.0 tidyverse_1.2.1 checkpoint_0.4.0 rstudioapi_0.9.0
## [17] knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.5 haven_2.1.0 lattice_0.20-38
## [5] colorspace_1.4-0 generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
## [9] rlang_0.3.1 e1071_1.7-0.1 pillar_1.3.1 glue_1.3.0
## [13] withr_2.1.2 DBI_1.0.0 modelr_0.1.4 readxl_1.3.0
## [17] plyr_1.8.4 munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0
## [21] rvest_0.3.2 evaluate_0.13 rex_1.1.2 class_7.3-15
## [25] broom_0.5.1 Rcpp_1.0.0 classInt_0.3-1 scales_1.0.0
## [29] backports_1.1.3 jsonlite_1.6 hms_0.4.2 digest_0.6.18
## [33] stringi_1.3.1 grid_3.5.3 cli_1.0.1 tools_3.5.3
## [37] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2 xml2_1.2.0
## [41] lubridate_1.7.4 assertthat_0.2.0 httr_1.4.0 R6_2.4.0
## [45] units_0.6-2 nlme_3.1-137 compiler_3.5.3
# read table of survey conducted by GFS Bern
worries <- readxl::read_excel(
"input/Original_Sorgenbarometer_1988-2018.xlsx"
) %>%
rename(year = 1) %>%
gather(topic, value, -one_of("year")) %>%
# reduce to years after 1995 because before that,
# the question asked was different than afterwards
filter(year >= 1995) %>%
# add a rank by grouping / arranging per year
# what's not perfect though: when two topics have the same value, they're
# ordered in the way they occur (so pretty randomly)
group_by(year) %>%
arrange(desc(value)) %>%
mutate(rank = row_number()) %>%
ungroup() %>%
# if there is no value, we also don't want it to have a rank
mutate(rank = ifelse(is.na(value), NA, rank)) %>%
# convert to correct data type
mutate(topic = factor(topic)) %>%
# sort nicely
arrange(year, rank)
## New names:
## * `` -> `..1`
# export
write_csv(
worries,
"output/worries.csv"
)
# read data of parliamentary affairs collected by smartvote
load("input/Analyse_SRG19_P1.RData")
# give more meaningful names to variables
# a = affairs, c = councillor
affairs_data_raw <- a_details
affairs_and_councillors_data <- a_details1
# data from now on filtered to time after year 2000
affairs_and_councillors_with_topics <- a_details2
affairs_per_year_and_council_raw <- a_details3_1
requests_per_year_and_council <- a_details4_1
requests_per_council_topic_shares <- a_details5_1
# affairs now reduced to procedural requests only
affairs_per_fraction <- a_details3_2
requests_per_year_and_fraction <- a_details4_2
requests_per_fraction_topic_shares <- a_details5_2
issue_names <- c(
"^issue_08$" = "Internationale Politik",
"^issue_09$" = "Sicherheitspolitik",
"^issue_10$" = "Europapolitik",
"^issue_15$" = "Wirtschaft",
"^issue_44$" = "Beschäftigung Arbeit",
"^issue_55$" = "Landwirtschaft",
"^issue_24$" = "Finanzwesen",
"^issue_2446$" = "Steuer",
"^issue_28$" = "Soziale Fragen",
"^issue_2811$" = "Migration",
"^issue_2836$" = "Sozialer Schutz",
"^issue_2841$" = "Gesundheit",
"^issue_2846$" = "Raumplanung Wohnungswesen",
"^issue_2831$" = "Kultur",
"^issue_32$" = "Bildung",
"^issue_36$" = "Wissenschaft Forschung",
"^issue_34$" = "Medien Kommunikation",
"^issue_48$" = "Verkehr",
"^issue_52$" = "Umwelt",
"^issue_66$" = "Energie",
"^issue_04$" = "Staatspolitik",
"^issue_0421$" = "Parlament",
"^issue_12$" = "Recht allgemein",
"^issue_1211$" = "Zivilrecht",
"^issue_1216$" = "Strafrecht",
"^issue_1221$" = "Gerichtswesen",
"^issue_1231$" = "Internationales Recht",
"^issue_1236$" = "Menschenrechte"
)
# function that makes the data frame tidy and replaces issue ids with real names
gather_and_convert_to_numbers <- function (df) {
df %>%
# convert columns to rows
gather(topic, share, starts_with("issue_")) %>%
mutate(topic = str_replace_all(topic, issue_names)) %>%
rename(total = N) %>%
mutate(
share = share / 100,
n = round(total * share, digits = 0)
)
}
# apply to requests per council
requests_per_council_and_topic <-
requests_per_council_topic_shares %>%
gather_and_convert_to_numbers()
# and to requests per fraction
requests_per_fraction_and_topic <-
requests_per_fraction_topic_shares %>%
gather_and_convert_to_numbers()
# clean up
rm(
a_details,
a_details1,
a_details2,
a_details3_1,
a_details3_2,
a_details4_1,
a_details4_2,
a_details5_1,
a_details5_2,
gather_and_convert_to_numbers
)
# export
write_csv(
requests_per_council_and_topic,
"output/requests_per_council_and_topic.csv"
)
write_csv(
requests_per_fraction_and_topic,
"output/requests_per_fraction_and_topic.csv"
)
The raw data contain the following types of affairs:
knitr::kable(
affairs_data_raw %>%
group_by(a_type) %>%
tally() %>%
arrange(desc(n))
)
a_type | n |
---|---|
Interpellation | 10434 |
Motion | 8771 |
Fragestunde. Frage | 8722 |
Postulat | 4377 |
Parlamentarische Initiative | 2243 |
Geschäft des Bundesrates | 2113 |
Anfrage | 1915 |
Einfache Anfrage | 1254 |
Petition | 797 |
Geschäft des Parlaments | 588 |
Standesinitiative | 473 |
Dringliche Interpellation | 334 |
Dringliche Einfache Anfrage | 202 |
Empfehlung | 139 |
Dringliche Anfrage | 79 |
ggplot(
requests_per_fraction_topic_shares,
aes(
x = year,
y = N
)
) +
geom_bar(stat = "identity") +
theme_minimal() +
labs(
title = "Total number of parlamentary procedural requests",
y = NULL,
x = NULL
)
ggplot(
requests_per_fraction_and_topic,
aes(
x = topic,
y = n
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ year, ncol = 4) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 728 rows containing missing values (position_stack).
fraction_colors <- c(
"Christlichdemokratische Fraktion" = "#D6862B",
"Evangelische und Unabhängige Fraktion" = "#DEAA28",
"Fraktion der Schweizerischen Volkspartei" = "#4B8A3E",
"Fraktionslos" = "#9D9D9D",
"Freisinnig-demokratische Fraktion" = "#3872B5",
"Grüne Fraktion" = "#84B547",
"Liberale Fraktion" = "#618DEA",
"Sozialdemokratische Fraktion" = "#F0554D",
"EVP/EDU Fraktion" = "#DEAA28",
"Fraktion CVP/EVP/glp" = "#D6862B",
"FDP-Liberale Fraktion" = "#3872B5",
"Fraktion BD" = "#E6C820",
"Fraktion CVP-EVP" = "#D6862B",
"Grünliberale Fraktion" = "#C4C43D",
"CVP-Fraktion" = "#D6862B"
)
ggplot(
requests_per_fraction_and_topic,
aes(
x = year,
y = n,
fill = a_author_fraction
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ topic, ncol = 4) +
scale_fill_manual(values = fraction_colors, guide = FALSE) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
title = "Number of requests per topic (colors correspond to parties)",
x = NULL,
y = NULL
)
## Warning: Removed 728 rows containing missing values (position_stack).
ggplot(
requests_per_fraction_and_topic %>%
group_by(year, topic) %>%
summarise(n = sum(n, na.rm = TRUE)) %>%
filter(topic %in% c("Gesundheit")),
aes(
x = year,
y = n
)
) +
geom_bar(stat = "identity") +
facet_wrap(~ topic, ncol = 4) +
theme_minimal()
# select a more or less random list of some topics to inspect a bit closer
relevant_topics <- c(
"Umwelt",
"Energie",
"Europapolitik",
"Sicherheitspolitik",
"Migration"
)
ggplot(
requests_per_fraction_and_topic %>%
filter(topic %in% relevant_topics),
aes(
x = year,
y = n,
group = topic,
color = topic
)
) +
geom_line() +
facet_wrap(~ a_author_fraction, ncol = 1, scales = "free_y") +
theme_minimal() +
labs(
title = "Parlamentary fractions: Five relevant topics over time",
x = NULL,
y = NULL
)
To find out how we will display the worries in the final publication, we try different approaches to selecting topics and years. As the amount of data can become overwhelming pretty quickly, we need to find a clear focus to concentrate on.
knitr::kable(
worries %>%
group_by(topic) %>%
summarise(value = median(value, na.rm = TRUE)) %>%
arrange(desc(value)) %>%
top_n(20, wt = value)
)
topic | value |
---|---|
Arbeitslosigkeit | 56.5 |
AHV / Altersvorsorge | 42.0 |
Gesundheit, Krankenkassen | 41.0 |
Flüchtlinge / Asyl | 30.5 |
AusländerInnen | 28.5 |
EU / Bilaterale / Integration | 21.0 |
Neue Armut | 18.0 |
Soziale Sicherheit | 18.0 |
Umweltschutz | 17.5 |
Bundesfinanzen | 17.0 |
Persönliche Sicherheit | 17.0 |
Euro-Krise | 16.0 |
Löhne | 13.0 |
Drogen / Alkohol | 12.5 |
Bankenkrise | 12.0 |
Inflation | 12.0 |
Globalisierung | 11.0 |
(Kern-) Energie | 10.0 |
Rassismus / Fremdenfeindlichkeit | 10.0 |
Internet-Sicherheit / Cyber-Spionage | 9.0 |
number_of_top_worries <- 10
manually_selected_top_5 <- c(
"Arbeitslosigkeit",
"Flüchtlinge / Asyl",
"AusländerInnen",
"AHV / Altersvorsorge",
"Umweltschutz"
)
# filter for most important groups
worries_sorted <- worries %>%
group_by(topic) %>%
mutate(
# for median calculation, replace NAs with zero
value = ifelse(is.na(value), 0, value),
median = median(value, na.rm = TRUE)
) %>%
arrange(desc(median)) %>%
distinct(topic) %>%
simplify()
worries %<>%
mutate(topic = factor(topic, levels = worries_sorted)) %>%
arrange(topic, year)
# use head instead of top_n to get an exact number
big_worries <- head(worries_sorted, number_of_top_worries)
# uncomment the following if you want to use the manual selection
# big_worries <- manually_selected_top_5
other_worries <- worries %>%
filter(!topic %in% big_worries) %>%
group_by(year) %>%
summarise(value = sum(value, na.rm = TRUE)) %>%
mutate(topic = "Andere")
ggplot(
worries %>%
filter(topic %in% big_worries),
aes(
x = year,
y = value
)
) +
geom_area() +
facet_wrap(~ topic) +
theme_minimal() +
labs(
title = "Biggest worries",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (position_stack).
ggplot(
worries %>%
filter(topic %in% big_worries) %>%
filter(year %% 4 == 3 | year == max(year)),
aes(
x = year,
y = value,
color = topic
)
) +
geom_line() +
scale_color_brewer(palette = "Set1") +
theme_minimal() +
labs(
title = "Top worries over time (only election years and latest)",
fill = NULL,
x = NULL,
y = NULL
)
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning: Removed 8 rows containing missing values (geom_path).
ranks_of_worries <- worries %>%
# add hex code for some topics to highlight them
mutate(color = case_when(
topic == "Arbeitslosigkeit" ~ "#e31f2b",
topic == "Flüchtlinge / Asyl" ~ "#61b13e",
topic == "AusländerInnen" ~ "#1cb373",
topic == "AHV / Altersvorsorge" ~ "#1e8ce3",
topic == "Umweltschutz" ~ "#ca51af",
TRUE ~ "#d9d9d9"
))
ggplot(
ranks_of_worries,
aes(
x = year,
y = rank,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point(size = 2) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
scale_color_identity() +
xlim(min(ranks_of_worries$year), max(ranks_of_worries$year) + 7) +
theme_minimal() +
theme(legend.position = "none") +
labs(
title = "Rank of worries (labelled are only top 20 in 2018)",
x = NULL,
y = NULL
)
## Warning: Removed 240 rows containing missing values (geom_path).
## Warning: Removed 361 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_text_repel).
# tmp filter list to display only 20 lines in a chart
top_20_2018 <- ranks_of_worries %>%
filter(year == max(year)) %>%
arrange(rank) %>%
head(20) %>%
select(topic) %>%
simplify()
ggplot(
ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries),
aes(
x = year,
y = rank,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point() +
scale_x_continuous(
breaks = seq(1995, 2015, 4),
limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
) +
scale_y_continuous(trans = "reverse", limits = c(20, 1)) +
scale_color_identity(guide = FALSE) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries) %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
theme_minimal() +
labs(
title = "Top 10 worries ordered by rank",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(
ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries),
aes(
x = year,
y = value,
color = color,
group = topic,
label = topic
)
) +
geom_line() +
geom_point() +
scale_x_continuous(
breaks = seq(1995, 2015, 4),
limits = c(min(ranks_of_worries$year), max(ranks_of_worries$year) + 8)
) +
scale_color_identity(guide = FALSE) +
geom_text_repel(
data = ranks_of_worries %>%
filter(year %% 2 == 1) %>%
filter(topic %in% big_worries) %>%
filter(year == max(year)),
segment.size = 0.25,
segment.color = "black",
nudge_x = 0.5,
hjust = 0,
direction = "y"
) +
theme_minimal() +
labs(
title = "Real values: how many people were worried in this year?",
subtitle = "Displayed are only the top 9 topics",
x = NULL,
y = NULL
)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
After looking at the different charts, we decided to focus on the six topics that are most relevant in the latest available year (2018). For the sorted area plot that you see in the final article, we excluded all even years (except for the latest). That was necessary so the plot does not look too cluttered.
The visualisation itself was not produced in R but with the online tool RawGraphs, using their d3-based tool for bump charts. From there we downloaded an SVG file, imported it into the vector graphics tool Sketch, and manually optimized the curving of the lines as well as their colors and positions.
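The subset that went into the bump chart corresponds roughly to the following (a sketch reconstructing the selection described above, assuming that “most relevant” means the six top-ranked topics of 2018; the file name bump_chart_data.csv is a placeholder and not part of the repository):

```r
# six top-ranked topics of the latest year (2018)
top_6_latest <- worries %>%
  filter(year == max(year)) %>%
  arrange(rank) %>%
  head(6) %>%
  pull(topic)

# keep only odd years plus the latest year and export for RawGraphs
worries %>%
  filter(topic %in% top_6_latest) %>%
  filter(year %% 2 == 1 | year == max(year)) %>%
  write_csv("output/bump_chart_data.csv") # hypothetical file name
```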
The result:
knitr::include_graphics("img/final_chart.png")
In order to better contextualise and classify the data on the concerns of the population, the following sections introduce some additional data.
Using a set of keywords, we search for all popular votes that belong to the topics of migration and pensions.
# load data from national votations from 1994 to 2019
popular_votes <- read_csv("input/popular_votes_1994_2019.csv")
## Parsed with column specification:
## cols(
## date = col_character(),
## vote = col_character(),
## result = col_character()
## )
# save those related to migration in separate data frame
migration <- popular_votes %>%
filter(str_detect(
vote, paste(c(
"einwanderung",
"Ausländer",
"Asyl",
"Migration",
"Flüchtling",
"Flüchtlinge",
"Minarett"
), collapse = "|")
))
# save those related to pensions in separate data frame
ahv <- popular_votes %>%
filter(str_detect(
vote, paste(c(
"AHV",
"Altersvorsorge",
"Rente"
), collapse = "|")
))
# read voter's share of parties
party_strengths <- read_csv(
"input/px-x-1702020000_104.csv",
col_types = cols() # hide info about types
) %>%
rename(
canton = 1,
year = 2,
party = 3,
votes = 4,
fictitious_votes = 5,
strength = 6
) %>%
# drop unnecessary columns (votes is empty, canton is Schweiz)
select(year, party, strength) %>%
# convert strength to number
mutate(strength = as.numeric(strength)) %>%
# replace NAs with zero for delta calculations so parties appear also
# when they have a result for the first time
mutate(strength = ifelse(is.na(strength), 0, strength)) %>%
group_by(party) %>%
arrange(year) %>%
# calculate delta by subtracting last value in group from current one
mutate(delta = strength - lag(strength))
## Warning: NAs durch Umwandlung erzeugt
# select only seven biggest parties
main_parties <- party_strengths %>%
ungroup() %>%
filter(year == max(year)) %>%
top_n(7, wt = strength) %>%
select(party) %>%
simplify()
party_colors <- c(
"SVP/UDC" = "#4B8A3E",
"CVP/PDC" = "#D6862B",
"FDP/PLR (PRD)" = "#3872B5",
"GPS/PES" = "#84B547",
"SP/PS" = "#F0554D",
"BDP/PBD" = "#E6C820",
"GLP/PVL" = "#C4C43D"
)
ggplot(
party_strengths %>%
filter(party %in% main_parties, year >= 1995),
aes(
x = party,
y = delta,
fill = party,
label = scales::percent(delta / 100)
)
) +
geom_bar(position = "dodge", stat = "identity") +
geom_text(aes(vjust = ifelse(delta >= 0, -0.5, 1.5))) +
scale_fill_manual(values = party_colors, guide = FALSE) +
facet_wrap(~ year) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
title = "Did the main 7 parties win or lose?",
subtitle = "Loss/gain in percentage points",
x = NULL,
y = NULL
)
# export
write_csv(
party_strengths %>%
# round deltas as the subtraction introduced weird results
mutate(delta = round(delta, digits = 2)),
"output/party_strengths.csv"
)
Footnote 2 in the spreadsheet is not relevant for us: “Change of practice: In March 2018, a new partially automated registration system was introduced at the regional employment centres (RAV).” (orig: Praxisänderung: Auf den März 2018 wurde bei den regionalen Arbeitsvermittlungszentren (RAV) ein neues teilautomatisiertes Erfassungssystem eingeführt.)
unemployment_rate <- readxl::read_excel(
"input/je-d-03.03.02.02.xlsx",
sheet = 1,
range = "B5:AC8"
) %>%
# convert columns to rows
gather(year, rate) %>%
# filter out NAs introduced by empty lines 6 and 7
filter(!is.na(rate)) %>%
# remove the footnotes from the years and convert to numeric
mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = rate / first(rate))
ggplot(
unemployment_rate,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
unemployment_rate,
"output/unemployment_rate.csv"
)
Footnote 2 for the year 2011 states: “From 2011 onwards, a change in the production process and a new definition of the permanent resident population apply, which additionally includes persons in the asylum process with a total stay of at least 12 months.” (orig: “Ab 2011 Wechsel des Produktionsverfahrens und neue Definition der ständigen Wohnbevölkerung, die zusätzlich Personen im Asylprozess mit einer Gesamtaufenthaltsdauer von mindestens 12 Monaten umfasst.”)
As there is no other or better data source available, we’ll have to ignore this.
immigration <- readxl::read_excel(
"input/su-d-01.05.04.02.01.xlsx",
range = "A3:AB23"
) %>%
# rename first column
rename(group = 1) %>%
# convert all columns except group to rows
gather(year, immigrants, -one_of("group")) %>%
# remove footnotes and convert to numeric
mutate(year = as.numeric(str_sub(year, 0, 4))) %>%
# look only at non-swiss immigrants, then remove group column
filter(group == "Ausland") %>%
select(-group) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = immigrants / first(immigrants))
## New names:
## * `` -> `..1`
ggplot(
immigration,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
immigration,
"output/immigration.csv"
)
The data on new applications for asylum come in yearly files starting in 1994. By iterating over all years from 1994 to 2018, we read in each file and extract the total number of new applications from cell `B7`.
asylum_applications <- 1994:2018 %>%
map_df(function(current_year) {
readxl::read_excel(
paste0("input/7-20-Bew-Asylgesuche-J-d-", current_year, "-12.xlsx"),
sheet = "CH-Kt",
range = "B7:B7",
col_names = c("applications")
) %>%
mutate(year = current_year)
}) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year)) %>%
# add column for change relative to first year in range of interest
mutate(change = applications / first(applications)) %>%
# move column year to first position
select(year, everything())
ggplot(
asylum_applications,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
asylum_applications,
"output/asylum_applications.csv"
)
Conveniently, this table already comes with the relative change, indexed to the year 1995, so we can read these values directly and do not need to calculate them ourselves.
health_costs <- readxl::read_excel(
"input/je-d-14.05.01.01.xlsx",
range = "A10:C66",
col_names = c("year", "costs", "change")
) %>%
# make decimals out of the change (now in percent)
mutate(change = change / 100) %>%
# filter out years outside of our range of interest
filter(year >= min(worries$year) & year <= max(worries$year))
ggplot(
health_costs,
aes(
x = year,
y = change
)
) +
geom_line() +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
health_costs,
"output/health_costs.csv"
)
# calculate mentions per month
smd_climate <- read_csv("input/smd_climate.csv") %>%
mutate(
months = ifelse(
year == lubridate::year(Sys.Date()),
4, # the data we downloaded was up until 2019-04-30
12
),
per_month = articles / months
)
## Parsed with column specification:
## cols(
## year = col_double(),
## articles = col_double()
## )
ggplot(
smd_climate,
aes(
x = year,
y = per_month
)
) +
geom_bar(stat = "identity") +
expand_limits(y = 0) +
theme_minimal() +
labs(
x = NULL,
y = NULL
)
# export
write_csv(
smd_climate,
"output/smd_climate.csv"
)
The code in this RMarkdown is automatically checked against Hadley Wickham's tidyverse style guide using lintr.
lintr::lint("main.Rmd")
## main.Rmd:676:3: style: Commented code should be removed.
## # big_worries <- manually_selected_top_5
## ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~