Data Dojo Würzburg 14

DataDojo@Lunch - live

July 2022

Participants

Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and please remove it if you cannot make it. Feel free to add your preferred tool or programming language.

Dataset

We want to start a series of Data Dojos on machine learning. The task will be to classify tree species by their traits (e.g. height, stem diameter, geographic location). :deciduous_tree::evergreen_tree::palm_tree: We’ll use the recently published Tallo database.

It contains measurements for almost 500k individual trees from more than 5k species.

The task for the first episode is: Explore and visualize the data. Filter the full set to ~5 species/genera with enough individuals and reasonable overlap to have an interesting classification problem.
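As a starting point for the filtering step, here is a minimal sketch in plain Python (no extra packages) that counts individuals per species and keeps the most frequent ones. The column name `species`, the file name `Tallo.csv`, and the threshold `min_count` are assumptions for illustration, so check them against the actual download. The sketch only covers the "enough individuals" part; whether the selected species overlap in a reasonable way still has to be checked by plotting their traits.

```python
import csv
from collections import Counter


def select_top_species(rows, key="species", n=5, min_count=1000):
    """Return up to n (name, count) pairs for the most frequent values of `key`.

    `rows` is a list of dicts (one per tree). Rows with a missing or empty
    `key` are ignored. Only entries with at least `min_count` rows are kept.
    """
    counts = Counter(row[key] for row in rows if row.get(key))
    return [(name, c) for name, c in counts.most_common(n) if c >= min_count]


def filter_rows(rows, selected, key="species"):
    """Keep only the rows whose `key` value is in `selected`."""
    keep = set(selected)
    return [row for row in rows if row.get(key) in keep]


def load_rows(path):
    """Read a CSV file into a list of dicts (one dict per line)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage (file name and column name are assumptions):
# rows = load_rows("Tallo.csv")
# top = select_top_species(rows, key="species", n=5, min_count=1000)
# subset = filter_rows(rows, [name for name, _ in top], key="species")
```

Passing `key="genus"` instead would do the same selection at genus level, assuming the table has such a column.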

Question Pool:

Collaborative Tools and Workflow

For notebooks (R, Python, Julia, JS, …) with real-time collaboration, CoCalc currently seems to be the best option. It worked great the last couple of times, so we’ll stick with it for now. You need to register an account there (it is free).

Future Suggestions

Add your suggestions to the list, and add :+1: to the end of any line you are interested in.

Data Sets

Tools/Languages

Skills

Data Sources

All data types are welcome, including tables, images, videos, sounds, DNA, …