Data Dojo Würzburg 26
DataDojo@Lunch - hybrid (?)
February 2024
- When: Wednesday, January 31st, 2024, 11:00am to 12:30pm (90 minutes) :warning: moved forward by one week
- Where: CCTB or online (CCTB Seminar Zoom Link)
- Info: DataDojo Website, Repo
Dataset
Data about German politicians and their voting behavior (Abgeordnetenwatch).
We will use pre-downloaded data for all available polls of the 2017 Bundestag legislative period.
For each poll, the file polls.json contains general information about that poll, and votes/<pollid>.json contains the voting behavior of each politician.
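A minimal sketch of loading these files with the Python standard library. It assumes a hypothetical local folder (passed as `data_dir`) that contains `polls.json` as a JSON list of poll objects, each with an `id` key, plus one `votes/<pollid>.json` per poll. The `id` key and the list layout are assumptions, so check them against the pre-downloaded files.

```python
import json
from pathlib import Path


def load_raw(data_dir):
    """Return (polls, votes_by_poll) from a folder with polls.json and votes/<pollid>.json.

    Assumption (hypothetical layout): polls.json is a JSON list of objects that
    each carry an "id" key; adjust if the real files are structured differently.
    """
    data_dir = Path(data_dir)
    polls = json.loads((data_dir / "polls.json").read_text(encoding="utf-8"))
    votes_by_poll = {}
    for poll in polls:
        poll_id = poll["id"]
        vote_file = data_dir / "votes" / f"{poll_id}.json"
        if vote_file.exists():  # silently skip polls whose vote file is missing
            votes_by_poll[poll_id] = json.loads(vote_file.read_text(encoding="utf-8"))
    return polls, votes_by_poll
```

Keeping the raw JSON untouched at this stage makes it easy to inspect the actual field names before deciding how to flatten them.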
Potential questions
- Which individual politician voted yes/no/abstain most often?
- Which individual politician missed the most votes?
- Which party voted yes/no/abstain most often?
- Which party voted against the majority most often?
- How heterogeneous were the votes of each party?
- How often did the coalition of CDU and SPD lose the vote?
- Which individual politicians had the most (dis)similar voting patterns?
- How similar was the voting pattern of each party?
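Once the votes are in tidy form (one record per politician and poll), several of these questions reduce to counting. The sketch below assumes hypothetical records with the keys `poll_id`, `politician`, `party` and `vote`, where `vote` is one of `"yes"`, `"no"`, `"abstain"` or `"no_show"` (the last standing in for a missed vote); these names are assumptions, not the confirmed schema of the Abgeordnetenwatch files. "Majority" is simplified to yes vs. no among cast votes, ignoring abstentions, which may not match the Bundestag's actual rules for every poll.

```python
from collections import Counter, defaultdict


def most_frequent(rows, key, vote):
    """Return (name, count) with the most `vote` entries, grouped by `key`, or None."""
    counts = Counter(r[key] for r in rows if r["vote"] == vote)
    return counts.most_common(1)[0] if counts else None


def poll_majorities(rows):
    """Simplified majority per poll: compare yes vs. no votes (None on a tie)."""
    tallies = defaultdict(Counter)
    for r in rows:
        if r["vote"] in ("yes", "no"):
            tallies[r["poll_id"]][r["vote"]] += 1
    result = {}
    for poll_id, c in tallies.items():
        if c["yes"] == c["no"]:
            result[poll_id] = None
        else:
            result[poll_id] = "yes" if c["yes"] > c["no"] else "no"
    return result


def party_against_majority(rows):
    """Count, per party, polls where more members voted against the majority than with it."""
    majority = poll_majorities(rows)
    party_tallies = defaultdict(Counter)
    for r in rows:
        if r["vote"] in ("yes", "no"):
            party_tallies[(r["party"], r["poll_id"])][r["vote"]] += 1
    against = Counter()
    for (party, poll_id), c in party_tallies.items():
        overall = majority.get(poll_id)
        if overall is None:
            continue  # tied poll: no majority to deviate from
        other = "no" if overall == "yes" else "yes"
        if c[other] > c[overall]:
            against[party] += 1
    return against


def agreement(rows, a, b):
    """Share of polls where politicians a and b both voted and cast the same vote."""
    votes = defaultdict(dict)
    for r in rows:
        if r["vote"] != "no_show":  # missed votes are excluded from the comparison
            votes[r["politician"]][r["poll_id"]] = r["vote"]
    shared = votes[a].keys() & votes[b].keys()
    if not shared:
        return None
    return sum(votes[a][p] == votes[b][p] for p in shared) / len(shared)
```

For example, `most_frequent(rows, "politician", "no_show")` addresses the "missed most votes" question, and computing `agreement` for every pair of politicians gives a similarity matrix for the (dis)similarity questions.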
Necessary steps to answer these questions
- Load the json data
- Create a suitable data structure (one large tidy table?)
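One possible tidy table has one row per poll and politician. The sketch below flattens a `{poll_id: [vote records]}` mapping into such rows and writes them as CSV. The nested keys it reads (`mandate`/`label`, `fraction`/`label`, `vote`) are assumptions modeled loosely on what an Abgeordnetenwatch vote record might contain; treat them as hypothetical and adjust them after inspecting the actual files.

```python
import csv
import io

FIELDS = ["poll_id", "politician", "party", "vote"]


def to_tidy_rows(votes_by_poll):
    """Flatten {poll_id: [vote records]} into one dict per poll and politician.

    Assumption (hypothetical schema): each record looks like
    {"mandate": {"label": ...}, "fraction": {"label": ...}, "vote": ...}.
    """
    rows = []
    for poll_id, records in votes_by_poll.items():
        for rec in records:
            rows.append({
                "poll_id": poll_id,
                "politician": rec["mandate"]["label"],
                # "fraction" may be absent or null, e.g. for members without a party group
                "party": (rec.get("fraction") or {}).get("label"),
                "vote": rec["vote"],
            })
    return rows


def write_csv(rows, handle):
    """Write tidy rows as CSV with a header line; None values become empty cells."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = {7: [{"mandate": {"label": "Jane Doe"}, "fraction": {"label": "X"}, "vote": "yes"}]}
    buffer = io.StringIO()
    write_csv(to_tidy_rows(sample), buffer)
    print(buffer.getvalue())
```

A flat CSV like this can be read directly by R, pandas or Julia, which suits a mixed-language notebook session.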
Collaborative Tools and Workflow
For notebooks (R, Python, Julia, JS, …) with real-time collaboration, CoCalc seems to be the best option right now. It worked well the last few times, so we'll stick with it for now. You need to register an account there (it is free).
Future Suggestions
Add your suggestions to the list, and add :+1: to the end of any line you are interested in.
Concrete suggestions
- Build a citation and/or collaboration network for the CCTB using the OpenAlex API
Data Sets
- Würzburg Baumkataster, Würzburger Klimabäume
- National Registry of Exonerations
- Bee Varroa Image Classification :bee:
- Mattermost Chat History - e.g. analyze the messages and reactions from the lunch channel
- Wordbank - data of children learning to talk
- All Birds :bird:
- Results of the Bundestagswahl 2021
- Weather data throughout Germany over time (incl. temperature, precipitation, …): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
- German Mikrozensus
- Kaggle Titanic or Tabular Playground or Meta Kaggle
- World Trade Data (Open Trade Statistics)
- Open Citation Data
- Top 100 charts + Audio Features
- Emoji Usage :hugging_face::heart::laughing:
- Observable Curated Datasets
- Abgeordnetenwatch - Data on German elected officials in the EU Parliament, Bundestag and Landtage (election history, committee memberships, side jobs, etc.)
- Button Men Game Results
- Extract structured data from PDFs (example: population statistics for Leinach from the Gemeindeblatt)
- DSA Transparency Database - daily archives of statements of reasons for content takedowns on social media
- Dürremonitor Deutschland
Data Sources
All data types are welcome, including tables, images, videos, sounds, DNA, …
- OpenData Bayern
- TidyTuesday
- Our World in Data (R package: owidR), Sustainable Development Goals
- Open Data Initiatives (Würzburg, Germany, Statistisches Bundesamt, Europe, APIs)
- Data is plural
- Awesome Public Datasets
- Kaggle Datasets or Competitions, e.g. SLICED
- tsibbledata: Time Series Datasets
- R-text-data: Text Datasets, ready to use in R
- data.world
- Statista - the University of Würzburg has a campus license
- Open Legal Data
- Bundestag Data (e.g. poll results, deputies, Wahl-O-Mat, inspirational blog post)
- Deutsche Digitale Bibliothek (API, old newspapers from Germany)
- Earth Observation: Satellite Image Time Series
- Machine Learning Datasets
- International (Student) Assessment Data (TIMSS, PIRLS, PISA, …)
- (Medical) Imaging Datasets, MedMNIST
- Inspirational Notebooks on Observable
- Ski resort statistics :skier: