Data Dojo Würzburg 13

DataDojo@Lunch - live

June 2022

When: Thursday, June 9th, 2022, from 11:30am until 12:45pm (75 minutes)
Where: CCTB
Info: DataDojo Website, Repo

Participants

Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and please remove it if you cannot make it. Feel free to add your preferred tool or programming language.

Markus (R, julia)

Jörg

Dataset

Spotify listening history (request yours here)

Question Pool:

Generic
- What kind of information is stored in the table(s)?
- How much data is missing?
- Is the dataset clean or are there any clear outliers?
- How can the different datasets be combined?
- How to visualize the results in a suitable way?
Specific
- Which artists/songs were most popular during the day vs in the evening?
- What was the listening frequency for a specific band per year? e.g. Spice Girls
- What is the top song of each year?
- Which song was most frequently played in a single day/week/month?
- What was the longest time a song was not played before being played again?
- How many different songs and artists did we listen to (by year)?
- Which artists were most popular in summer/winter?
- Add your own questions
Further Ideas
- What is the most skipped song of all time?
- Is there a temporal correlation between songs/artists? (probably yes, because of playlists…)
- Can we predict the year based on a selection of five random songs?
- Add your own ideas
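
As a starting point for some of the specific questions above (top song per year, distinct songs and artists per year, day vs. evening listening), here is a minimal Python sketch using only the standard library. It assumes each listening record carries the fields `endTime` (formatted as `YYYY-MM-DD HH:MM`), `artistName`, `trackName` and `msPlayed`; check these names against your own export before relying on them. The sample records, the function names and the 18:00 cut-off for "evening" are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample records. The field names are an assumption about the
# export format and may need adjusting for your own data.
SAMPLE = [
    {"endTime": "2020-03-01 09:15", "artistName": "Spice Girls", "trackName": "Wannabe", "msPlayed": 173000},
    {"endTime": "2020-03-01 20:30", "artistName": "Spice Girls", "trackName": "Wannabe", "msPlayed": 173000},
    {"endTime": "2020-07-12 14:05", "artistName": "Spice Girls", "trackName": "Stop", "msPlayed": 204000},
    {"endTime": "2021-01-05 21:10", "artistName": "Daft Punk", "trackName": "One More Time", "msPlayed": 320000},
    {"endTime": "2021-06-20 22:45", "artistName": "Daft Punk", "trackName": "One More Time", "msPlayed": 320000},
    {"endTime": "2021-12-24 19:00", "artistName": "Spice Girls", "trackName": "Wannabe", "msPlayed": 173000},
]


def parse_time(record):
    """Parse the (assumed) endTime string into a datetime object."""
    return datetime.strptime(record["endTime"], "%Y-%m-%d %H:%M")


def top_song_per_year(records):
    """Return {year: (artist, track)} for the most frequently played song."""
    counts = defaultdict(Counter)
    for r in records:
        counts[parse_time(r).year][(r["artistName"], r["trackName"])] += 1
    return {year: c.most_common(1)[0][0] for year, c in sorted(counts.items())}


def distinct_per_year(records):
    """Return {year: (number of distinct songs, number of distinct artists)}."""
    songs, artists = defaultdict(set), defaultdict(set)
    for r in records:
        year = parse_time(r).year
        songs[year].add((r["artistName"], r["trackName"]))
        artists[year].add(r["artistName"])
    return {year: (len(songs[year]), len(artists[year])) for year in sorted(songs)}


def plays_by_daypart(records, artist, evening_start=18):
    """Count plays of one artist before and after an assumed evening cut-off hour."""
    parts = Counter()
    for r in records:
        if r["artistName"] == artist:
            part = "evening" if parse_time(r).hour >= evening_start else "day"
            parts[part] += 1
    return dict(parts)
```

To try it on real data, replace `SAMPLE` with the records loaded from your own export (for example via `json.load`). Counting every record as one play is a simplification; filtering on `msPlayed` first would be one way to exclude skipped tracks.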

Collaborative Tools and Workflow

For notebooks (R, python, julia, js, …) with real-time collaboration, CoCalc seems to be the best option right now. It worked great the last couple of times, so we'll stick with it for now. You need to register an account there (it is free).

Future Suggestions

Add your suggestions to the list, and append :+1: to the end of any line you are interested in.

Data Sets

All Birds :bird:
Results of the Bundestagswahl 2021
Weather data throughout Germany over time (incl. temperature, precipitation, …): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
German Mikrozensus
Kaggle Titanic or Tabular Playground or Meta Kaggle
World Trade Data (Open Trade Statistics)
Open Citation Data
Top 100 charts + Audio Features
Emoji Usage :hugging_face::heart::laughing:
Observable Curated Datasets

Tools/Languages

R/tidyverse
python
Power BI
Tableau
KNIME
javascript
julia
visidata

Skills

interactive maps
dashboards
animations

Data Sources

all data types are welcome, including tables, images, videos, sounds, DNA, …

TidyTuesday
Our World in Data (R package: owidR), Sustainable Development Goals
Open Data Initiatives (Würzburg, Germany, Statistisches Bundesamt, Europe, APIs)
Data is plural
Awesome Public Datasets
Kaggle Datasets or Competitions, e.g. SLICED
tsibbledata: Time Series Datasets
R-text-data: Text Datasets, ready to use in R
data.world
Statista - the University of Würzburg has a campus license
Open Legal Data
Bundestag Data (e.g. poll results, deputies, wahl-o-mat, inspirational blog post)
Deutsche Digitale Bibliothek (API, old newspapers from Germany)
Earth Observation: Satellite Image Time Series
Machine Learning Datasets
International (Student) Assessment Data (TIMSS, PIRLS, PISA, …)
(Medical) Imaging Datasets, MedMNIST
Inspirational Notebooks on Observable
Ski resort statistics :skier:
