Data Crunch Dojo

Challenges

Data: 100 Kymographs (as images)
Goal: Determine the number of growth phases for each image (MAE flag)
Our own approach:
- determine the length for every timepoint (using classical image analysis)
- plot this with the images for QC
- extract the number of growth phases by determining breakpoints in the graph (possibly using a sliding window)
- remove images that are too noisy
Possible problem: too easy to solve manually. Using 1k images might make solving by hand less appealing; alternatively, ask for something that is harder for a human to do (e.g. growth rate and duration of the longest growth period).
Possible alterations/extensions:
- use simulated data for warm-up (here we have ground truth)
- let people extract the kymographs from 2D+t images with ROIs
- let people determine the ROIs from the 2D+t images
- extract other values from the images (e.g. mean growth rate or total growth)
- images are taken under different conditions and for different frog species, so we could ask for e.g. mean growth rate by concentration
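
A minimal sketch of the warm-up idea with simulated data and the sliding-window step from "Our own approach". Everything here is an assumption for illustration: the function names, the (rate, duration) phase format, and the thresholds are hypothetical, and a "growth phase" is taken to mean a stretch where the length increases faster than `min_rate`. Two consecutive phases that differ only in their positive growth rate would be counted as one, so a real breakpoint detector would be needed for that case.

```python
import random

def simulate_lengths(phases, noise=0.0, seed=0):
    """Build a length-vs-time trace from a list of (rate, duration) phases."""
    rng = random.Random(seed)
    length, trace = 0.0, []
    for rate, duration in phases:
        for _ in range(duration):
            length += rate
            trace.append(length + rng.gauss(0.0, noise))
    return trace

def window_slopes(trace, window):
    """Mean slope over each sliding window: (end - start) / window."""
    return [(trace[i + window] - trace[i]) / window
            for i in range(len(trace) - window)]

def count_growth_phases(trace, window=5, min_rate=0.5, min_len=3):
    """Count runs of at least min_len consecutive windows with slope > min_rate."""
    count, run = 0, 0
    for slope in window_slopes(trace, window):
        if slope > min_rate:
            run += 1
            if run == min_len:  # count each run once, when it becomes long enough
                count += 1
        else:
            run = 0
    return count

# Ground truth for this simulated trace: three growth phases (rates 1.0, 2.0, 1.0).
phases = [(1.0, 20), (-0.5, 10), (2.0, 15), (0.0, 10), (1.0, 20)]
trace = simulate_lengths(phases)
n_phases = count_growth_phases(trace)
```

Because the simulated phases are known, the same trace can be used to score a submission, e.g. by the absolute error between the predicted and true number of phases.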

Sascha and Felix brainstorm ideas with others over lunch, possibly using data or even scripts from movies.

Image stitching (split a large image into overlapping tiles and shuffle, possible challenge series: exact/inexact match, fixed/dynamic/no overlap, +flipping/mirroring/rotation)
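
A rough sketch of how the puzzle for the simplest stitching variant (exact match, fixed overlap, no flipping or rotation) could be generated. The image is a plain 2D list, the function names are hypothetical, and it is assumed that the image size fits the tile grid (height and width minus the tile size are multiples of the step).

```python
import random

def split_into_tiles(image, tile, overlap, seed=0):
    """Cut a 2D list into square tiles that overlap by `overlap` pixels, then shuffle."""
    step = tile - overlap
    rows, cols = len(image), len(image[0])
    tiles = []
    for top in range(0, rows - tile + 1, step):
        for left in range(0, cols - tile + 1, step):
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    random.Random(seed).shuffle(tiles)
    return tiles

def is_right_neighbour(a, b, overlap):
    """Exact-match check: the right edge of tile a equals the left edge of tile b."""
    return all(ra[-overlap:] == rb[:overlap] for ra, rb in zip(a, b))

# Toy 10x10 "image" with a unique value per pixel.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
tiles = split_into_tiles(image, tile=4, overlap=1)
```

Participants would receive only `tiles` and have to recover the layout, for example by testing all pairs with a check like `is_right_neighbour` and an analogous one for vertical neighbours.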
Get the most common sequence for a gene from a large VCF file
Sequence barcode assignment (FASTQ file with many sequences and a barcode sequence at a specific position; count barcodes but allow for sequencing errors; there is a list of valid barcodes, assign the closest one as long as it is unique; extensible: using sequence quality or adding UMIs)
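
One possible baseline for the barcode challenge, sketched under several assumptions: errors are substitutions only (so Hamming distance is used), all valid barcodes have the same length, and the FASTQ input has the usual four lines per record. The function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, valid_barcodes, max_mismatches=1):
    """Return the closest valid barcode, or None if it is too far away or not unique."""
    distances = sorted((hamming(observed, bc), bc) for bc in valid_barcodes)
    best_dist, best = distances[0]
    if best_dist > max_mismatches:
        return None
    if len(distances) > 1 and distances[1][0] == best_dist:
        return None  # tie: the closest barcode is ambiguous
    return best

def count_barcodes(fastq_lines, valid_barcodes, start, length, max_mismatches=1):
    """Count assigned barcodes; the sequence is the 2nd line of each 4-line record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:
            continue
        observed = line.strip()[start:start + length]
        if len(observed) < length:
            continue  # read too short to contain the barcode
        barcode = assign_barcode(observed, valid_barcodes, max_mismatches)
        if barcode is not None:
            counts[barcode] += 1
    return counts
```

The quality line of each record (every 4th line) is ignored here; using it to weight mismatches would be one of the extensions mentioned above.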
Parrondo’s paradox (combine two losing games into a winning game) - https://en.wikipedia.org/wiki/Parrondo%27s_paradox#Coin-tossing_example
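
A small simulation sketch of the paradox, assuming the commonly used coin-tossing parameters (game A wins with probability 0.5 - eps; game B wins with probability 0.1 - eps if the capital is a multiple of 3 and 0.75 - eps otherwise; eps = 0.005). The function names and the random-mixing strategy are choices made for this illustration.

```python
import random

EPS = 0.005  # small bias that makes each game losing on its own

def play_a(capital, rng):
    """Game A: a single slightly unfair coin."""
    return capital + (1 if rng.random() < 0.5 - EPS else -1)

def play_b(capital, rng):
    """Game B: a bad coin if the capital is a multiple of 3, a good coin otherwise."""
    p_win = 0.1 - EPS if capital % 3 == 0 else 0.75 - EPS
    return capital + (1 if rng.random() < p_win else -1)

def simulate(strategy, rounds, seed=0):
    """Play `rounds` rounds of 'A', 'B', or a random mix of both; return the final capital."""
    rng = random.Random(seed)
    capital = 0
    for _ in range(rounds):
        if strategy == "A":
            game = play_a
        elif strategy == "B":
            game = play_b
        else:
            game = rng.choice((play_a, play_b))
        capital = game(capital, rng)
    return capital

results = {s: simulate(s, rounds=400_000) for s in ("A", "B", "mix")}
```

Over many rounds, the final capital should drift downwards for "A" and "B" alone and upwards for the random mix. A challenge could ask participants to find the switching pattern with the highest gain per round.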

Names in German words (e.g. longest path in a graph of names connected by co-occurrence in German words)
Survivor (each week, pick one team that you think will win; if it does, you survive). Possible tasks:
- find retrospective selection strategies under constraints (e.g. in the NFL using only NFC teams + 1)
- find a strategy that works the same in multiple seasons
- find the winning selection with the fewest total victories (or a set of teams with the most victories for which no winning strategy exists)
- use win probabilities rather than retrospective wins to find the most likely winning strategy
Sascha: there is a cool public dataset with stats from NFL games
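
A minimal backtracking sketch for the retrospective Survivor task. It assumes the usual pool rule that each team may be picked at most once per season, which is not stated above, and it uses made-up team labels rather than real NFL data. The function name and input format are hypothetical.

```python
def find_survivor_picks(winners_by_week):
    """Return one winning team per week, each team used at most once, or None.

    winners_by_week: list of sets, the teams that won in each week.
    """
    picks, used = [], set()

    def search(week):
        if week == len(winners_by_week):
            return True  # survived every week
        for team in sorted(winners_by_week[week]):
            if team in used:
                continue
            used.add(team)
            picks.append(team)
            if search(week + 1):
                return True
            used.remove(team)  # backtrack and try the next team
            picks.pop()
        return False

    return picks if search(0) else None

# Toy season with three weeks and hypothetical teams A, B, C.
season = [{"A", "B"}, {"A"}, {"B", "C"}]
picks = find_survivor_picks(season)
```

Constraints such as "only NFC teams + 1" could be added as an extra filter inside the loop, and the same search could be run on several seasons to look for strategies that carry over.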

Collect ideas from participants and present some of my own.

Split up into teams of 2-3 to work on a challenge together.

In teams of 2-3 (possibly new ones)