Data Dojo Würzburg 7
December 2021
- When: Thursday, December 9th, 2021 at 4:00pm
- Where: Zoom
- Zoom: the event has ended
- Info: DataDojo Website, Repo
Participants
Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and remove it if you cannot make it. Feel free to add your preferred tool or programming language.
- Markus (julia)
- Andreas (julia)
- Florens
Dataset
Question Pool:
- Generic
- What kind of information is stored in the table(s)?
- How much data is missing?
- Is the dataset clean or are there any clear outliers?
- Specific
- How many Christmas songs have been in the top 100 (top 10) each year? (plot)
- Which song has been in the top 100 most often?
- Which song reached the highest rank each year?
- Have there been Christmas songs in the top 100 in June (far away from Christmas)?
- What is the longest consecutive run of a Christmas song in the top 100?
- Add your own questions
- Further Ideas
- Look at the song lyrics and analyze word usage or sentiment
- Add your own ideas
- Inspiration
- https://www.kaggle.com/michau96/what-carols-are-popular-merry-christmas
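A few of the questions in the pool above can be sketched in plain Python before moving to a notebook. This is only a minimal illustration under stated assumptions: the rows are made-up toy data, and the column names (`song`, `year`, `month`, `week_position`) are hypothetical and may not match the actual dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: one row per song per chart week.
# Column names and values are assumptions, not the real dataset schema.
rows = [
    {"song": "Song A", "year": 1960, "month": 12, "week_position": 8},
    {"song": "Song A", "year": 1960, "month": 12, "week_position": 5},
    {"song": "Song B", "year": 1960, "month": 12, "week_position": 40},
    {"song": "Song A", "year": 1961, "month": 1, "week_position": 30},
    {"song": "Song C", "year": 1961, "month": 6, "week_position": 99},
    {"song": "Song B", "year": 1961, "month": 12, "week_position": None},
]


def missing_share(rows, column):
    """Fraction of rows whose value in `column` is missing (None)."""
    return sum(r[column] is None for r in rows) / len(rows)


def songs_per_year(rows, max_position=100):
    """Count distinct songs per year ranked at `max_position` or better."""
    songs = defaultdict(set)
    for r in rows:
        pos = r["week_position"]
        if pos is not None and pos <= max_position:
            songs[r["year"]].add(r["song"])
    return {year: len(names) for year, names in sorted(songs.items())}


def most_frequent_song(rows):
    """Return (song, count) for the song with the most chart-week rows."""
    return Counter(r["song"] for r in rows).most_common(1)[0]


def best_song_per_year(rows):
    """Return {year: (song, position)} for the best (lowest) rank per year."""
    best = {}
    for r in rows:
        pos = r["week_position"]
        if pos is None:
            continue  # skip rows with a missing rank
        if r["year"] not in best or pos < best[r["year"]][1]:
            best[r["year"]] = (r["song"], pos)
    return dict(sorted(best.items()))


def songs_in_month(rows, month):
    """Sorted list of distinct songs that charted in the given month."""
    return sorted({r["song"] for r in rows if r["month"] == month})
```

For example, `songs_per_year(rows, max_position=10)` restricts the count to the top 10, and `songs_in_month(rows, 6)` lists the songs that charted in June. The per-year counts returned by `songs_per_year` are the values one would pass to a plotting library of choice.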
Collaborative Tools and Workflow
For notebooks (R, Python, Julia, JS, …) with real-time collaboration, CoCalc seems to be the best option right now. It worked well the last few times, so we’ll stick with it for now. You need to register an account there (it is free).
Other real-time collaboration tools
Feel free to add suggestions to this list
- VS Code with Live Share Extension (very promising, but notebook support is not yet stable), languages: Python, R, Julia, …
- Jupyter Lab real-time collaboration (alpha feature), languages: Python, R, Julia, …
- Observable multiplayer (experimental feature), languages: JavaScript
- Jupyter Lite: in-browser version of Jupyter Lab, languages: JavaScript, (a subset of) Python
Future Suggestions
Add your suggestions to the list, and append :+1: to the end of any line you are interested in
Data Sets
- Results of the Bundestagswahl 2021
- Weather data throughout Germany over time (incl. temperature, precipitation, …): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
- German Mikrozensus
- Kaggle Titanic or Tabular Playground or Meta Kaggle
- World Trade Data (Open Trade Statistics)
- Open Citation Data
- Top 100 charts + Audio Features
Tools/Languages
Skills
- interactive maps
- dashboards
- animations
Data Sources
All data types are welcome, including tables, images, videos, sounds, DNA, …
- TidyTuesday
- Our World in Data (R package: owidR), Sustainable Development Goals
- Open Data Initiatives (Würzburg, Germany, Statistisches Bundesamt, Europe, APIs)
- Awesome Public Datasets
- Kaggle Datasets or Competitions, e.g. SLICED
- tsibbledata: Time Series Datasets
- R-text-data: Text Datasets, ready to use in R
- data.world
- Statista - the University of Würzburg has a campus license
- Open Legal Data
- Bundestag Data (e.g. poll results, deputies, wahl-o-mat, inspirational blog post)
- Deutsche Digitale Bibliothek (API, old newspapers from Germany)
- Earth Observation: Satellite Image Time Series
- Machine Learning Datasets