Dieses Dokument beschreibt die Vorprozessierung und explorative Analyse des Datensatzes, der Grundlage des auf srf.ch veröffentlichten Artikel Diese Lebensfragen sorgen nicht mehr für rote Köpfe ist.
SRF Data legt Wert darauf, dass die Datenvorprozessierung und -Analyse nachvollzogen und überprüft werden kann. SRF Data glaubt an das Prinzip offener Daten, aber auch offener und nachvollziehbarer Methoden. Zum anderen soll es Dritten ermöglicht werden, auf dieser Vorarbeit aufzubauen und damit weitere Auswertungen oder Applikationen zu generieren.
Die Endprodukte des vorliegenden Scripts, neben der vorliegenden explorativen Analyse, sind (Datenbeschreibung siehe unten):
approvement_*.csv
: Für die Visualisierung aufbereitete Daten bezgl. der jeweiligen Fragestellung (nach dem _
, korrespondiert mit dem Codebuch des ESS).approvement_*_icsbfm.csv
: Idem, jedoch jeweils betreffend Frauen oder Männer.Die Vorprozessierung und Analyse wurde im Statistikprogramm R vorgenommen. Das zugrunde liegende Script sowie die prozessierten Daten können unter diesem Link heruntergeladen werden. Durch Ausführen von main.Rmd
kann der hier beschriebene Prozess nachvollzogen und der für den Artikel verwendete Datensatz generiert werden. Dabei werden Daten aus dem Ordner input
eingelesen und Ergebnisse in den Ordner output
geschrieben.
SRF Data verwendet das rddj-template von Timo Grossenbacher als Grundlage für seine R-Scripts. Entstehen bei der Ausführung dieses Scripts Probleme, kann es helfen, die Anleitung von rddj-template zu studieren.
Debug-Informationen: This report was generated on 2019-12-06 08:26:12. R version: 3.5.3 on x86_64-pc-linux-gnu. For this report, CRAN packages as of 2019-03-01 were used.
Der Code für die vorliegende Datenprozessierung ist auf https://github.com/srfdata/2019-12-ess-timing-of-life zur freien Verwendung verfügbar.
2019-12-ess-timing-of-life von SRF Data ist lizenziert unter einer Creative Commons Namensnennung - Weitergabe unter gleichen Bedingungen 4.0 International Lizenz.
Code & Daten von SRF Data sind unter https://srfdata.github.io verfügbar.
Die veröffentlichten Informationen sind sorgfältig zusammengestellt, erheben aber keinen Anspruch auf Aktualität, Vollständigkeit oder Richtigkeit. Es wird keine Haftung übernommen für Schäden, die durch die Verwendung dieses Scripts oder der daraus gezogenen Informationen entstehen. Dies gilt ebenfalls für Inhalte Dritter, die über dieses Angebot zugänglich sind.
approvement_*.csv
(Beispiel)Attribut | Typ | Beschreibung |
---|---|---|
Answer | String | Art der Antwort. |
2006 | Numeric | Anteil der Befragten mit dieser Antwort im Jahr 2006. |
2018 | Numeric | Anteil der Befragten mit dieser Antwort im Jahr 2018. |
approvement_*_icsbfm.csv
Attribut | Typ | Beschreibung |
---|---|---|
icsbfm | String | Ob die Frage betr. Frauen oder Männer gestellt wurde. |
Answer | String | Art der Antwort. |
2006 | Numeric | Anteil der Befragten mit dieser Antwort im Jahr 2006. |
2018 | Numeric | Anteil der Befragten mit dieser Antwort im Jahr 2018. |
Die Daten stammen vom European Social Survey (ESS), welcher alle zwei Jahre seit 2002 durchgeführt wird und Werte, Einstellungen sowie das soziale Verhalten der Bevölkerung erforscht. Der ESS ist eine wissenschaftliche, länderübergreifende Erhebung und wird in mehr als 30 europäischen Ländern durchgeführt. In der Schweiz wird die Befragung vom Schweizerischen Kompetenzzentrum Sozialwissenschaften (Fors) an der Universität Lausanne durchgeführt. Die neunte Erhebung fand im Jahr 2018 statt und wurde am 31.10.2019 publiziert. Für das vorliegende Projekt wurden die neusten Daten aus der neunten Runde (2018), sowie die Daten aus der dritten Runde (2006) verwendet. Beide erheben Daten zu dem hier analysierten Themenblock “Timing of Life”.
Die Daten können auf der Webseite des European Social Survey heruntergeladen werden.
## [1] "package package:rmarkdown detached"
## Loading required package: knitr
## Loading required package: rstudioapi
# from https://mran.revolutionanalytics.com/web/packages/checkpoint/vignettes/using-checkpoint-with-knitr.html
# if you don't need a package, remove it from here (commenting is probably not sufficient)
# tidyverse: see https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/
cat("
library(rstudioapi)
library(tidyverse) # ggplot2, dplyr, tidyr, readr, purrr, tibble
library(magrittr) # pipes
library(readxl) # excel
library(scales) # scales for ggplot2
library(jsonlite) # json
library(lintr) # code linting
library(sf) # spatial data handling
library(rmarkdown)",
file = "manifest.R")
# if checkpoint is not yet installed, install it (for people using this
# system for the first time)
if (!require(checkpoint)) {
if (!require(devtools)) {
install.packages("devtools", repos = "http://cran.us.r-project.org")
require(devtools)
}
devtools::install_github("RevolutionAnalytics/checkpoint",
ref = "v0.3.2", # could be adapted later,
# as of now (beginning of July 2017
# this is the current release on CRAN)
repos = "http://cran.us.r-project.org")
require(checkpoint)
}
# nolint start
if (!dir.exists("~/.checkpoint")) {
dir.create("~/.checkpoint")
}
# nolint end
# install packages for the specified CRAN snapshot date
checkpoint(snapshotDate = package_date,
project = path_to_wd,
verbose = T,
scanForPackages = T,
use.knitr = F,
R.version = R_version)
rm(package_date)
source("manifest.R")
unlink("manifest.R")
sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /opt/R/R-3.5.3/lib/R/lib/libRblas.so
## LAPACK: /opt/R/R-3.5.3/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_CH.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_1.11 sf_0.7-3 lintr_1.0.3 jsonlite_1.6
## [5] scales_1.0.0 readxl_1.3.0 magrittr_1.5 forcats_0.4.0
## [9] stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.0 readr_1.3.1
## [13] tidyr_0.8.2 tibble_2.0.1 ggplot2_3.1.0 tidyverse_1.2.1
## [17] checkpoint_0.4.0 rstudioapi_0.9.0 knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.5 haven_2.1.0 lattice_0.20-38
## [5] colorspace_1.4-0 generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
## [9] rlang_0.3.1 e1071_1.7-0.1 pillar_1.3.1 glue_1.3.0
## [13] withr_2.1.2 DBI_1.0.0 modelr_0.1.4 plyr_1.8.4
## [17] munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2
## [21] evaluate_0.13 rex_1.1.2 class_7.3-15 broom_0.5.1
## [25] Rcpp_1.0.0 backports_1.1.3 classInt_0.3-1 hms_0.4.2
## [29] digest_0.6.18 stringi_1.3.1 grid_3.5.3 cli_1.0.1
## [33] tools_3.5.3 lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2
## [37] xml2_1.2.0 lubridate_1.7.4 assertthat_0.2.0 httr_1.4.0
## [41] R6_2.4.0 units_0.6-2 nlme_3.1-137 compiler_3.5.3
# if you want to outsource logic to other script files, see README for
# further information
knitr::read_chunk("scripts/my_script.R")
source("scripts/my_script.R")
my_function(5)
## [1] 5
# some constants
default_font_color <- "#4e4d47"
default_background_color <- "#f5f5f2"
default_font_family <- "SRG SSR Type Text"
theme_srf <- function(...) {
theme_minimal() +
theme(
text = element_text(family = default_font_family,
color = default_font_color),
# add a subtle grid
panel.grid.major = element_line(color = "#dbdbd9", size = 0.2),
panel.grid.minor = element_blank(),
# background colors
plot.background = element_rect(fill = default_background_color,
color = NA),
panel.background = element_rect(fill = default_background_color,
color = NA),
legend.background = element_rect(fill = default_background_color,
color = NA),
# borders and margins
plot.margin = unit(c(.5, .5, .2, .5), "cm"),
panel.border = element_blank(),
# panel.spacing = unit(c(-.1, 0.2, .2, 0.2), "cm"),
# titles
legend.title = element_text(size = 11),
legend.text = element_text(size = 9, hjust = 0,
color = default_font_color),
plot.title = element_text(size = 15, hjust = 0.5,
color = default_font_color),
plot.subtitle = element_text(size = 10, hjust = 0.5,
color = default_font_color,
margin = margin(b = -0.1,
t = -0.1,
l = 2,
unit = "cm"),
debug = F),
# captions
plot.caption = element_text(size = 7,
hjust = .5,
margin = margin(t = 0.2,
b = 0,
unit = "cm"),
color = "#939184"),
...
)
}
Round_3 <- haven::read_dta("input/ESS3e03_7.dta", encoding = "latin1")
Round_9.1 <- haven::read_dta("input/ESS9e01.dta", encoding = "latin1") %>%
# rename some variables so data frames are consistent
rename(icsbfm = admge,
iaglvmr = iagmr,
age = agea)
# merge the two rounds with bind_rows()
suppressWarnings({
Timing_of_life <- Round_3 %>%
bind_rows(Round_9.1) %>%
rename(year = essround) %>%
mutate(year = case_when(
year == 1 ~ 2002,
year == 2 ~ 2004,
year == 3 ~ 2006,
year == 4 ~ 2008,
year == 5 ~ 2010,
year == 6 ~ 2012,
year == 7 ~ 2014,
year == 8 ~ 2016,
year == 9 ~ 2018,
TRUE ~ NA_real_
)) %>%
mutate(year = as.character(year)) %>%
# select only necessary variables
select(year, cntry, gndr, age, icsbfm,
dweight,
starts_with("iag"), # ideal age questions
advcyc,
acldnmr,
aftjbyc,
alvgptn,
anvcld# approval questions
) %>%
filter(cntry == "CH") %>%
# column mutations
mutate(gndr = case_when(
gndr == 1 ~ "Mann",
gndr == 2 ~ "Frau"
),
icsbfm = case_when(
icsbfm == 1 ~ "Betr. Frauen",
icsbfm == 2 ~ "Betr. Männer"
)) %>%
mutate(age_grouping = cut(
age,
c(14, 34, 49, 64, 500),
c("15-34", "35-49", "50-64", "65+")
)) %>%
# group_by(year, cntry) %>%
gather(key = Question,
value = Answer, -year, -cntry,
-gndr, -age, -age_grouping, -icsbfm, -dweight)
})
rm(Round_3, Round_9.1)
iaglptn : Start living with partner not married to, ideal age. iaglvmr : Get married and live with husband/wife, ideal age. iagpnt : Become mother/father, ideal age. iagrtr : Retire permanently, ideal age.
# How did the ideal age change? (aggregate with mean)
Timing_of_life %>%
filter(!is.na(year) &
Question %in% c("iaglptn", "iaglvmr", "iagpnt", "iagrtr")) %>%
mutate(
Answer = replace(Answer, Answer == 0, NA),
Answer = replace(Answer, Answer == 111, NA),
Answer = replace(Answer, Answer == 222, NA),
Answer = as.numeric(Answer),
Question = case_when(
Question == "iaglptn" ~ "mit Partner zusammenziehen",
Question == "iaglvmr" ~ "zu heiraten",
Question == "iagpnt" ~ "Eltern zu werden",
Question == "iagrtr" ~ "sich pensionieren zu lassen"
),
Question = factor(Question, levels = c("mit Partner zusammenziehen",
"zu heiraten",
"Eltern zu werden",
"sich pensionieren zu lassen"))
) %>%
filter(!is.na(Answer)) %>%
group_by(Question, year) %>%
# use weighted mean here
summarise(mean = weighted.mean(Answer, dweight)) %>%
# compute fractions
ggplot(aes(x = year,
y = mean,
group = Question)) +
geom_line() +
geom_point() +
facet_wrap(~ Question, nrow = 1) +
theme_srf() +
labs(title = "Welches ist das ideale Alter, um...",
x = "",
y = "Alter",
color = "")
# How did the ideal age change? Function
IdealAge <- function(data, grouping_var) {
data %>%
filter(cntry == "CH" &
Question %in% c("iaglptn", "iaglvmr", "iagpnt", "iagrtr")) %>%
mutate(
Answer = replace(Answer, Answer == 0, NA),
Answer = replace(Answer, Answer == 111, NA),
Answer = replace(Answer, Answer == 222, NA),
Answer = as.numeric(Answer),
Question = case_when(
Question == "iaglptn" ~ "mit Partner zusammenziehen",
Question == "iaglvmr" ~ "zu heiraten",
Question == "iagpnt" ~ "Eltern zu werden",
Question == "iagrtr" ~ "sich pensionieren zu lassen"
),
Question = factor(Question, levels = c("mit Partner zusammenziehen",
"zu heiraten",
"Eltern zu werden",
"sich pensionieren zu lassen"))
) %>%
filter(!is.na(Answer)) %>%
filter(!is.na(!!sym(grouping_var))) %>%
group_by(Question, year, !!sym(grouping_var)) %>%
# use weighted mean here
summarise(mean = weighted.mean(Answer, dweight)) %>%
ggplot(aes(x = year,
y = mean,
group = !!sym(grouping_var),
color = !!sym(grouping_var))
) +
geom_line() +
geom_point() +
scale_color_brewer(palette = "Dark2") +
facet_wrap(~ Question, nrow = 1) +
theme_srf() +
labs(title = "Welches ist das ideale Alter für eine Person, um...",
x = "",
y = "Alter",
color = "")
}
# ideal age for men/women
IdealAge(Timing_of_life, "icsbfm")
# ideal age by gender
IdealAge(Timing_of_life, "gndr")
# ideal age by age groups
IdealAge(Timing_of_life, "age_grouping")
# Prepare the titles for the function
translate_question_ids_short <- function(question_id) {
case_when(
question_id == "anvcld" ~
"Befürworten Sie es, wenn sich eine Person entscheidet,
niemals Kinder zu haben?",
question_id == "alvgptn" ~
"Befürworten Sie es, wenn eine Person unverheiratet
mit ihrem Partner zusammen lebt?",
question_id == "acldnmr" ~
"Befürworten Sie es, wenn eine Person unverheiratet
mit ihrem Partner Kinder bekommt?",
question_id == "aftjbyc" ~
"Befürworten Sie es, wenn eine Person in Vollzeit
arbeitet, trotz Kindern unter 3 Jahren?",
question_id == "advcyc" ~
"Befürworten Sie es, wenn sich eine Person scheiden
lässt, trotz Kindern unter 12 Jahren?")
}
# Function: How did the attitudes change?
PeopleReaction <- function(data, question_id, save_csv = FALSE) {
share_calculation <- data %>%
filter(Question == question_id &
!is.na(Answer)) %>%
mutate(Answer = case_when(
Answer == 1 ~ "Stark ablehnen",
Answer == 2 ~ "Ablehnen",
Answer == 3 ~ "Indifferent",
Answer == 4 ~ "Befürworten",
Answer == 5 ~ "Stark befürworten"
),
Answer = factor(Answer, levels = c(
"Stark ablehnen",
"Ablehnen",
"Indifferent",
"Befürworten",
"Stark befürworten"))
) %>%
group_by(year, Answer) %>%
summarise(is = n(),
# weights
avg_dweight = mean(dweight)) %>%
# weight data with average weights over group
mutate(is = avg_dweight * is) %>%
# compute fractions
mutate(share = is / sum(is)) %>%
ungroup()
max_share <- share_calculation %>%
arrange(desc(share)) %>%
slice(1) %>%
pull()
if(save_csv == TRUE) {
share_calculation %>%
select(-avg_dweight, -is) %>%
# spread for datawrapper
spread(key = year, value = share) %>%
write_csv(path = glue::glue("output/approvement_{question_id}.csv"))
}
share_calculation %>%
ggplot(aes(x = year,
y = share,
fill = Answer,
alpha = year)) +
geom_bar(stat = "identity",
position = position_dodge2(reverse = TRUE)) +
geom_text(
aes(
y = share,
label = scales::percent(round(share, 3))
),
position = position_dodge(width = 0.9),
vjust = -.3,
color = "black",
size = 3
) +
facet_wrap(~ Answer, nrow = 1) +
scale_fill_manual(values = c(
"Stark ablehnen" = "#c91024",
"Ablehnen" = "#ff9193",
"Indifferent" = "#cac8bf",
"Befürworten" = "#91ceff",
"Stark befürworten" = "#1a7ac5"
), guide = FALSE
) +
scale_alpha_discrete(range = c(0.7, 1), guide = FALSE) +
scale_y_continuous(labels = scales::percent) +
theme_srf() +
labs(title = translate_question_ids_short(question_id),
x = "", y = "")
}
PeopleReactionGrouped <- function(data,
question_id,
grouping_var,
gndr_filter = FALSE,
save_csv = FALSE) {
share_calculation <- data %>%
filter(Question == question_id &
!is.na(Answer) &
!is.na(!!sym(grouping_var)))
if (gndr_filter != FALSE){
share_calculation %<>%
filter(gndr == gndr_filter)
}
print(nrow(share_calculation))
share_calculation %<>%
mutate(Answer = case_when(
Answer == 1 ~ "Stark ablehnen",
Answer == 2 ~ "Ablehnen",
Answer == 3 ~ "Indifferent",
Answer == 4 ~ "Befürworten",
Answer == 5 ~ "Stark befürworten"
),
Answer = factor(Answer, levels = c(
"Stark ablehnen",
"Ablehnen",
"Indifferent",
"Befürworten",
"Stark befürworten"))
) %>%
group_by(year, !!sym(grouping_var), Answer) %>%
summarise(is = n(),
# weights
avg_dweight = mean(dweight)) %>%
# weight data with average weights over group
mutate(is = avg_dweight * is) %>%
# compute fractions
mutate(share = is / sum(is)) %>%
ungroup()
if(save_csv == TRUE) {
share_calculation %>%
select(-avg_dweight, -is) %>%
# spread for datawrapper
spread(key = year, value = share) %>%
write_csv(path =
glue::glue(
"output/approvement_{question_id}_{grouping_var}.csv"))
}
max_share <- share_calculation %>%
arrange(desc(share)) %>%
slice(1) %>%
pull()
addendum <- ifelse(gndr_filter != FALSE, "(Antwort nur von", "")
addendum <- ifelse(gndr_filter == "Mann",
glue::glue("{addendum} Männern)"), addendum)
addendum <- ifelse(gndr_filter == "Frau",
glue::glue("{addendum} Frauen)"), addendum)
share_calculation %>%
ggplot(aes(x = year,
y = share,
fill = Answer,
alpha = year)) +
geom_bar(stat = "identity",
position = position_dodge2(reverse = TRUE)) +
geom_text(
aes(
y = share,
label = scales::percent(round(share, 3))
),
position = position_dodge(width = 0.9),
vjust = -.3,
color = "black",
size = 3
) +
scale_fill_manual(values = c(
"Stark ablehnen" = "#c91024",
"Ablehnen" = "#ff9193",
"Indifferent" = "#cac8bf",
"Befürworten" = "#91ceff",
"Stark befürworten" = "#1a7ac5"
), guide = FALSE
) +
facet_grid(
cols = vars(Answer),
rows = vars(!!sym(grouping_var)),
) +
scale_alpha_discrete(range = c(0.7, 1), guide = FALSE) +
scale_y_continuous(labels = scales::percent,
limits = c(0, max_share + .1)) +
theme_srf() +
labs(title =
translate_question_ids_short(question_id),
subtitle =
ifelse(grouping_var == "icsbfm",
glue::glue(
"Frage jeweils betreffend Frauen und Männern separat gestellt {addendum}"),
""),
x = "", y = "")
}
# never has children
PeopleReaction(Timing_of_life, "anvcld", save_csv = TRUE)
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "anvcld", "icsbfm")
## [1] 3313
## Warning: Using alpha for a discrete variable is not advised.
# live unmarried with partner
PeopleReaction(Timing_of_life, "alvgptn", save_csv = TRUE)
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "alvgptn", "icsbfm")
## [1] 3323
## Warning: Using alpha for a discrete variable is not advised.
# get divorced with kids under 12
PeopleReaction(Timing_of_life, "advcyc", save_csv = TRUE)
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "advcyc", "icsbfm")
## [1] 3269
## Warning: Using alpha for a discrete variable is not advised.
# unmarried and has kids
PeopleReaction(Timing_of_life, "acldnmr", save_csv = TRUE)
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "acldnmr", "icsbfm")
## [1] 3322
## Warning: Using alpha for a discrete variable is not advised.
# working full-time with kids under 3
PeopleReaction(Timing_of_life, "aftjbyc")
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "aftjbyc", "icsbfm", save_csv = TRUE)
## [1] 3252
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "aftjbyc", "icsbfm", "Frau")
## [1] 1702
## Warning: Using alpha for a discrete variable is not advised.
PeopleReactionGrouped(Timing_of_life, "aftjbyc", "icsbfm", "Mann")
## [1] 1550
## Warning: Using alpha for a discrete variable is not advised.
Der Code in diesem RMarkdown wird mit lintr automatisch auf den Wickham’schen tidyverse style guide überprüft.
lintr::lint("main.Rmd", linters =
lintr::with_defaults(
commented_code_linter = NULL,
trailing_whitespace_linter = NULL
)
)
## main.Rmd:490:5: style: Place a space before left parenthesis, except in a function call.
## if(save_csv == TRUE) {
## ^
## main.Rmd:570:5: style: Place a space before left parenthesis, except in a function call.
## if(save_csv == TRUE) {
## ^
# if you have additional scripts and want them to be linted too, add them here
# lintr::lint("scripts/my_script.R")