Data Dojo Würzburg 9 - DataDojo@Lunch
February 2022
- When: Thursday, February 10th, 2022 at 12:00pm
- Where: Zoom
- Zoom:
  - Link
  - Meeting ID: 987 2271 0806
  - Password: 952914
- Info: DataDojo Website, Repo
DataDojo@Lunch
This time we're trying something new: we'll meet at 12pm for just one hour, so you can have your lunch while attending the Dojo (we will still take turns coding as usual). This is an experiment to see whether this format better fits participants' tight schedules.
Participants
Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and please remove it if you cannot make it. Feel free to add your preferred tool or programming language.
- Markus (R/tidyverse)
- Laura (Python, R)
- Andreas (Julia/DataFrames.jl)
- Simon (Python)
- Anne (R/tidyverse)
- Florens (Python, R)
- »add your name here«
Dataset
Local results of the German Federal Election from Würzburg (Stadt and Landkreis) together with demographic information (e.g. age structure): Stadt, Landkreis
Specific task for today
- Select either the Stadt or the Landkreis voting data
- Bring it into the tidy format:
  | Place | Party | Erststimmen | Zweitstimmen |
  |---|---|---|---|
  | bla | blub | 0.23 | 0.12 |
- Add a column "Mean Age" to that table
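The reshaping step can be sketched in plain Python (in R or Julia, tidyr's `pivot_longer` or DataFrames.jl's `stack` would be the idiomatic tools). This is a minimal sketch under several assumptions: the real file layout is not described here, so the wide column naming scheme `"<vote type>_<party>"`, the age-bracket table, and all numbers are hypothetical placeholders, and "Mean Age" is only approximated from bracket midpoints.

```python
from typing import Dict, List, Tuple

# Toy placeholder rows, NOT real election results. The wide column naming
# scheme "<vote type>_<party>" is a hypothetical assumption.
wide_rows = [
    {"Place": "Place A",
     "Erststimmen_Party X": 0.40, "Zweitstimmen_Party X": 0.35,
     "Erststimmen_Party Y": 0.60, "Zweitstimmen_Party Y": 0.65},
    {"Place": "Place B",
     "Erststimmen_Party X": 0.55, "Zweitstimmen_Party X": 0.50,
     "Erststimmen_Party Y": 0.45, "Zweitstimmen_Party Y": 0.50},
]

# Toy age structure per place (hypothetical): (lower, upper) age -> residents.
age_structure = {
    "Place A": {(0, 20): 100, (20, 60): 200, (60, 100): 100},
    "Place B": {(0, 20): 50, (20, 60): 150, (60, 100): 200},
}


def to_tidy(rows: List[dict]) -> List[dict]:
    """Reshape wide rows into one row per (Place, Party)."""
    tidy: List[dict] = []
    for row in rows:
        by_party: Dict[str, dict] = {}
        for column, value in row.items():
            if column == "Place":
                continue
            # Split only at the first "_" so party names may contain "_".
            vote_type, party = column.split("_", 1)
            entry = by_party.setdefault(party, {"Place": row["Place"], "Party": party})
            entry[vote_type] = value
        tidy.extend(by_party.values())
    return tidy


def mean_age(brackets: Dict[Tuple[int, int], int]) -> float:
    """Approximate mean age, treating everyone in a bracket as its midpoint."""
    total = sum(brackets.values())
    return sum((lo + hi) / 2 * n for (lo, hi), n in brackets.items()) / total


def add_mean_age(tidy: List[dict], ages: Dict[str, Dict[Tuple[int, int], int]]) -> List[dict]:
    """Attach a 'Mean Age' column by looking up each row's place."""
    return [{**row, "Mean Age": mean_age(ages[row["Place"]])} for row in tidy]


table = add_mean_age(to_tidy(wide_rows), age_structure)
for row in table:
    print(row)
```

Keeping the reshape and the join as separate functions makes it easy to swap in a real mean-age column if the demographic data already provides one.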
When you are done, feel free to repeat the task for the other dataset (Stadt or Landkreis, whichever you did not pick first), or start exploring some of the questions below for inspiration.
Question Pool:
- Generic
  - What kind of information is stored in the table(s)?
  - How much data is missing?
  - Is the dataset clean or are there any clear outliers?
  - How can the different datasets be combined?
  - How to visualize the results in a suitable way?
- Specific
  - Overview of voting behavior: how does voting behavior vary by location? (General trends, total variability, …)
  - Overview of demographic info: how does age/gender distribution vary by location? (General trends, total variability, …)
  - Which party has the strongest (positive/negative) correlation with age?
  - Which party has the strongest (positive/negative) correlation with gender?
  - Can we predict voting behavior from age/gender distribution? (or vice versa)
  - Add your own questions
- Further Ideas
  - Show results with district resolution on an interactive map (e.g., using these shapes)
  - Add your own ideas
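For the correlation questions, one possible approach is to compute, for each party, the Pearson correlation between a place's mean age and that party's vote share across places. A minimal sketch, assuming tidy rows shaped like the task table plus a "Mean Age" column; using Zweitstimmen as the measure is an assumption, and the toy numbers are placeholders, not real results.

```python
from math import sqrt
from typing import Dict, List

# Toy tidy rows (placeholder numbers, NOT real results) in the
# Place | Party | Zweitstimmen | Mean Age shape from today's task.
rows = [
    {"Place": "A", "Party": "X", "Zweitstimmen": 0.20, "Mean Age": 40.0},
    {"Place": "B", "Party": "X", "Zweitstimmen": 0.25, "Mean Age": 45.0},
    {"Place": "C", "Party": "X", "Zweitstimmen": 0.30, "Mean Age": 50.0},
    {"Place": "A", "Party": "Y", "Zweitstimmen": 0.50, "Mean Age": 40.0},
    {"Place": "B", "Party": "Y", "Zweitstimmen": 0.45, "Mean Age": 45.0},
    {"Place": "C", "Party": "Y", "Zweitstimmen": 0.35, "Mean Age": 50.0},
    {"Place": "A", "Party": "Z", "Zweitstimmen": 0.30, "Mean Age": 40.0},
    {"Place": "B", "Party": "Z", "Zweitstimmen": 0.30, "Mean Age": 45.0},
    {"Place": "C", "Party": "Z", "Zweitstimmen": 0.35, "Mean Age": 50.0},
]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_by_party(rows: List[dict], measure: str = "Zweitstimmen") -> Dict[str, float]:
    """Correlate 'Mean Age' with the chosen vote share, per party, across places."""
    ages: Dict[str, List[float]] = {}
    shares: Dict[str, List[float]] = {}
    for row in rows:
        ages.setdefault(row["Party"], []).append(row["Mean Age"])
        shares.setdefault(row["Party"], []).append(row[measure])
    return {party: pearson(ages[party], shares[party]) for party in ages}


corr = correlation_by_party(rows)
strongest_positive = max(corr, key=corr.get)
strongest_negative = min(corr, key=corr.get)
print(corr)
print("strongest positive:", strongest_positive, "strongest negative:", strongest_negative)
```

This correlates place-level aggregates, so it does not directly say how individuals of a given age vote; with only a handful of places, the coefficients will also be noisy.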
Collaborative Tools and Workflow
For notebooks (R, Python, Julia, JS, …) with real-time collaboration, CoCalc seems to be the best option right now. It worked well the last couple of times, so we'll stick with it for now. You need to register a (free) account there.
Future Suggestions
Add your suggestions to the list, and add :+1: to the end of any line you are interested in.
Data Sets
- Results of the Bundestagswahl 2021
- Weather data throughout Germany over time (incl. temperature, precipitation, …): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
- German Mikrozensus
- Kaggle Titanic or Tabular Playground or Meta Kaggle
- World Trade Data (Open Trade Statistics)
- Open Citation Data
- Top 100 charts + Audio Features
- Emoji Usage :hugging_face::heart::laughing:
Tools/Languages
Skills
- interactive maps
- dashboards
- animations
Data Sources
All data types are welcome, including tables, images, videos, sounds, DNA, …
- TidyTuesday
- Our World in Data (R package: owidR), Sustainable Development Goals
- Open Data Initiatives (Würzburg, Germany, Statistisches Bundesamt, Europe, APIs)
- Awesome Public Datasets
- Kaggle Datasets or Competitions, e.g. SLICED
- tsibbledata: Time Series Datasets
- R-text-data: Text Datasets, ready to use in R
- data.world
- Statista - the University of Würzburg has a campus license
- Open Legal Data
- Bundestag Data (e.g. poll results, deputies, wahl-o-mat, inspirational blog post)
- Deutsche Digitale Bibliothek (API, old newspapers from Germany)
- Earth Observation: Satellite Image Time Series
- Machine Learning Datasets
- International (Student) Assessment Data (TIMSS, PIRLS, PISA, …)
- (Medical) Imaging Datasets, MedMNIST
- Inspirational Notebooks on Observable