Data Dojo Würzburg 22

DataDojo@Lunch - live

May 2023

Dataset

German Electricity Data :electric_plug:

First steps:

Retro on the machine learning series

In the first dojo of the series, we filtered the full data set down to three species with reasonable overlap (Fagus sylvatica, Pinus pinaster, Quercus ilex). The aim of the series was to try different machine learning methods for classifying tree species from their traits.

In the second dojo we created our first models: a very simple “Majority Vote” model and some k-nearest-neighbor (KNN) models with scikit-learn.

In the third dojo we explored the effect of scaling on the performance of the KNN models.

In the fourth dojo we explored decision trees as models for classification.

In the fifth dojo we used support vector machines as models for classification.

In the sixth dojo we used ensemble models, including AdaBoost and random forests.

In the seventh dojo we used imputation methods to also make predictions for cases with missing data.

In the eighth dojo we trained some neural networks.
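To make the ideas from the second and third dojo concrete, here is a minimal sketch of a majority-vote baseline, a KNN classifier, and the effect of feature scaling on KNN. It is deliberately written in plain Python rather than scikit-learn (which is what we used in the dojo) so that it runs without any dependencies. The trait values are synthetic and made up for illustration, and all function names (`make_toy_traits`, `majority_vote`, `standardize`, `knn_predict`, `accuracy`) are hypothetical helpers, not part of any library or of the original notebooks.

```python
import math
import random
from collections import Counter

SPECIES = ["Fagus sylvatica", "Pinus pinaster", "Quercus ilex"]


def make_toy_traits(n_per_species=100, seed=0):
    """Build a synthetic stand-in for the trait table (NOT the dojo's real data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i, species in enumerate(SPECIES):
        for _ in range(n_per_species):
            informative = rng.gauss(5.0 * i, 0.5)  # small-scale trait that separates species
            noise = rng.uniform(0.0, 10_000.0)     # large-scale trait unrelated to species
            rows.append([informative, noise])
            labels.append(species)
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [rows[j] for j in order], [labels[j] for j in order]


def majority_vote(train_labels):
    """Return the most frequent training label; the baseline predicts it for every sample."""
    return Counter(train_labels).most_common(1)[0][0]


def standardize(train_rows, rows):
    """Scale each feature to zero mean and unit variance, using training statistics only."""
    columns = list(zip(*train_rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_rows, train_labels, row, k=5):
    """Classify one row by majority vote among its k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# 70/30 train/test split on the shuffled toy data
rows, labels = make_toy_traits()
split = len(rows) * 7 // 10
X_train, X_test = rows[:split], rows[split:]
y_train, y_test = labels[:split], labels[split:]

# Baseline: always predict the most common species in the training set
baseline_label = majority_vote(y_train)
baseline_acc = accuracy([baseline_label] * len(y_test), y_test)

# KNN on raw features: distances are dominated by the large-scale noise trait
raw_acc = accuracy([knn_predict(X_train, y_train, r) for r in X_test], y_test)

# KNN on standardized features: both traits contribute on a comparable scale
X_train_s = standardize(X_train, X_train)
X_test_s = standardize(X_train, X_test)
scaled_acc = accuracy([knn_predict(X_train_s, y_train, r) for r in X_test_s], y_test)

print(f"majority vote: {baseline_acc:.2f}")
print(f"KNN (raw):     {raw_acc:.2f}")
print(f"KNN (scaled):  {scaled_acc:.2f}")
```

The toy data is constructed so that the uninformative trait has a much larger numeric range than the informative one, which is the situation where scaling matters most for distance-based models like KNN. In scikit-learn the corresponding building blocks would be `DummyClassifier`, `KNeighborsClassifier` and `StandardScaler`.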

Collaborative Tools and Workflow

For notebooks (R, Python, Julia, JS, …) with real-time collaboration, CoCalc seems to be the best option right now. It worked great the last couple of times, so we’ll stick with it for now. You need to register an account there (it is free).

Future Suggestions

Add your suggestions to the list, and add a :+1: to the end of any line you are interested in.

Data Sets

Tools/Languages

Skills

Data Sources

All data types are welcome, including tables, images, videos, sounds, DNA, …