Zombies Apocalypse

  • Data Source: Kaggle
  • Tasks: compare humans and zombies to identify differences in supplies
  • Language: Julia

Context

News reports suggest that the impossible has become possible…zombies have appeared on the streets of the US! What should we do? The Centers for Disease Control and Prevention (CDC) zombie preparedness website recommends storing water, food, medication, tools, sanitation items, clothing, essential documents, and first aid supplies. Thankfully, we are CDC analysts and are prepared, but it may be too late for others!

Content

Our team decides to identify supplies that protect people and coordinate supply distribution. A few brave data collectors volunteer to check on 200 randomly selected adults who were alive before the zombies. We have recent data for the 200 on age and sex, how many are in their household, and their rural, suburban, or urban location. Our heroic volunteers visit each home and record zombie status and preparedness. Now it's our job to figure out which supplies are associated with safety!

File

Because every moment counts when dealing with life and (un)death, we want to get this right! The first task is to compare humans and zombies to identify differences in supplies. We review the data and find the following:

  • zombieid: unique identifier
  • zombie: human or zombie
  • age: age in years
  • sex: male or female
  • rurality: rural, suburban, or urban
  • household: number of people living in household
  • water: gallons of clean water available
  • food: food or no food
  • medication: medication or no medication
  • tools: tools or no tools
  • firstaid: first aid or no first aid
  • sanitation: sanitation or no sanitation
  • clothing: clothing or no clothing
  • documents: documents or no documents

Acknowledgements

DataCamp

In [1]:
ENV["COLUMNS"] = 1000; # print more columns of tables
In [2]:
using Random
In [3]:
Random.seed!(42)
"Andi Kerstin Chris Caro Jana" |> split |> shuffle |> x -> join(x," → ")
Out[3]:
"Kerstin → Chris → Andi → Jana → Caro"

1. Data loading

In [4]:
using Dates
using CSV
using DataFrames
In [5]:
data = CSV.read("zombies.csv", DataFrame)
first(data, 5)
Out[5]:
5×14 DataFrame
 Row │ zombieid  zombie   age    sex      rurality  household  water  food     medication     tools     firstaid               sanitation     clothing  documents
     │ Int64     String7  Int64  String7  String15  Int64      Int64  String7  String15       String15  String31               String15       String15  String15
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1         Human    18     Female   Rural     1          0      Food     Medication     No tools  First aid supplies     Sanitation     Clothing  NA
   2 │ 2         Human    18     Male     Rural     3          24     Food     Medication     tools     First aid supplies     Sanitation     Clothing  NA
   3 │ 3         Human    18     Male     Rural     4          16     Food     Medication     No tools  First aid supplies     Sanitation     Clothing  NA
   4 │ 4         Human    19     Male     Rural     1          0      Food     Medication     tools     No first aid supplies  Sanitation     Clothing  NA
   5 │ 5         Human    19     Male     Urban     1          0      Food     Medication     No tools  First aid supplies     Sanitation     NA        NA
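The preview above shows "NA" in the documents column, and the column's element type is a string type, so the loader kept "NA" as ordinary text. The following is a minimal sketch of how CSV.jl's `missingstring` keyword would turn those entries into `missing` instead. The two data rows are copied from the notebook output (rows 1 and 9); the comma-delimited layout of the file and the choice to treat "NA" as missing are assumptions, not something the dataset description confirms.

```julia
using CSV, DataFrames

# Rows 1 and 9 as shown in the notebook output; the delimiter is an assumption.
csv_text = """
zombieid,zombie,age,sex,rurality,household,water,food,medication,tools,firstaid,sanitation,clothing,documents
1,Human,18,Female,Rural,1,0,Food,Medication,No tools,First aid supplies,Sanitation,Clothing,NA
9,Human,21,Female,Urban,1,8,No food,No medication,tools,First aid supplies,Sanitation,Clothing,Documents
"""

# `missingstring` names the text that CSV.jl should parse as `missing`.
sample = CSV.read(IOBuffer(csv_text), DataFrame; missingstring = "NA")

println(eltype(sample.documents))
println(count(ismissing, sample.documents), " missing value(s) in documents")
```

Whether "NA" means "none" or "not recorded" changes how the clothing and documents counts should be read, so the rest of the notebook keeps the string as loaded.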
In [6]:
unique(data.food)
Out[6]:
2-element Vector{String7}:
 "Food"
 "No food"
In [7]:
unique(data.medication)
Out[7]:
2-element Vector{String15}:
 "Medication"
 "No medication"
In [8]:
unique(data.tools)
Out[8]:
2-element Vector{String15}:
 "No tools"
 "tools"
In [9]:
unique(data.firstaid)
Out[9]:
2-element Vector{String31}:
 "First aid supplies"
 "No first aid supplies"
In [10]:
unique(data.sanitation)
Out[10]:
2-element Vector{String15}:
 "Sanitation"
 "No sanitation"
In [11]:
unique(data.clothing)
Out[11]:
2-element Vector{String15}:
 "Clothing"
 "NA"
In [12]:
unique(data.documents)
Out[12]:
2-element Vector{String15}:
 "NA"
 "Documents"
In [13]:
using GLMakie
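The per-column `unique` calls above can also be written as a single pass over the supply columns. This is a minimal sketch on a small stand-in table: `sample` and its three columns are illustrative values built from the levels printed above, not the real data; in the notebook the same comprehension would run on `data`.

```julia
using DataFrames

# Stand-in table using the levels printed above (illustrative, not the real rows).
sample = DataFrame(
    food      = ["Food", "No food", "Food"],
    clothing  = ["Clothing", "NA", "Clothing"],
    documents = ["NA", "Documents", "NA"],
)

# Collect the distinct levels of every column in one dictionary.
levels_by_column = Dict(col => sort(unique(sample[!, col])) for col in names(sample))

for (col, lv) in levels_by_column
    println(rpad(col, 10), " => ", lv)
end
```

Listing the levels side by side makes it easier to spot that clothing and documents use "NA" for their second level while the other supply columns use an explicit "No ..." label.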
In [14]:
using DataFramesMeta
using Chain
using StatsBase
In [15]:
unique(data.age)
dict_age = sort(countmap(data.age))
Out[15]:
OrderedCollections.OrderedDict{Int64, Int64} with 62 entries:
  18 => 4
  19 => 4
  20 => 3
  21 => 5
  22 => 1
  23 => 4
  24 => 5
  25 => 7
  26 => 4
  27 => 2
  28 => 6
  29 => 6
  30 => 4
  31 => 2
  32 => 8
  33 => 3
  34 => 2
  35 => 2
  36 => 5
  ⋮  => ⋮
In [16]:
collect(values(dict_age))
Out[16]:
62-element Vector{Int64}:
 4
 4
 3
 5
 1
 4
 5
 7
 4
 2
 ⋮
 1
 2
 3
 1
 2
 1
 2
 1
 1
In [17]:
data_grouped = groupby(data, :zombie)
Out[17]:

GroupedDataFrame with 2 groups based on key: zombie

First Group (121 rows): zombie = "Human"
96 rows omitted
 Row │ zombieid  zombie   age    sex      rurality  household  water  food     medication     tools     firstaid               sanitation     clothing  documents
     │ Int64     String7  Int64  String7  String15  Int64      Int64  String7  String15       String15  String31               String15       String15  String15
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1         Human    18     Female   Rural     1          0      Food     Medication     No tools  First aid supplies     Sanitation     Clothing  NA
   2 │ 2         Human    18     Male     Rural     3          24     Food     Medication     tools     First aid supplies     Sanitation     Clothing  NA
   3 │ 3         Human    18     Male     Rural     4          16     Food     Medication     No tools  First aid supplies     Sanitation     Clothing  NA
   4 │ 4         Human    19     Male     Rural     1          0      Food     Medication     tools     No first aid supplies  Sanitation     Clothing  NA
   5 │ 5         Human    19     Male     Urban     1          0      Food     Medication     No tools  First aid supplies     Sanitation     NA        NA
   6 │ 6         Human    19     Female   Urban     1          0      Food     Medication     tools     First aid supplies     Sanitation     Clothing  NA
   7 │ 7         Human    20     Female   Suburban  2          0      No food  Medication     No tools  First aid supplies     Sanitation     Clothing  NA
   8 │ 8         Human    20     Female   Rural     2          0      Food     No medication  No tools  No first aid supplies  Sanitation     Clothing  NA
   9 │ 9         Human    21     Female   Urban     1          8      No food  No medication  tools     First aid supplies     Sanitation     Clothing  Documents
  10 │ 10        Human    21     Female   Rural     2          8      No food  No medication  tools     First aid supplies     Sanitation     Clothing  Documents
  11 │ 11        Human    21     Male     Rural     1          8      Food     No medication  No tools  First aid supplies     No sanitation  NA        NA
  12 │ 12        Human    21     Male     Rural     2          16     No food  Medication     No tools  No first aid supplies  Sanitation     Clothing  Documents
  13 │ 13        Human    22     Male     Suburban  2          16     Food     Medication     No tools  First aid supplies     Sanitation     Clothing  Documents
   ⋮ │ ⋮
 110 │ 110       Human    63     Female   Rural     1          0      Food     Medication     No tools  First aid supplies     Sanitation     Clothing  Documents
 111 │ 111       Human    65     Female   Rural     2          16     Food     No medication  tools     No first aid supplies  No sanitation  NA        NA
 112 │ 112       Human    67     Male     Rural     2          16     No food  No medication  tools     No first aid supplies  No sanitation  NA        NA
 113 │ 113       Human    68     Male     Rural     2          8      No food  Medication     No tools  First aid supplies     Sanitation     Clothing  Documents
 114 │ 114       Human    69     Female   Rural     2          8      Food     Medication     No tools  First aid supplies     No sanitation  NA        NA
 115 │ 115       Human    71     Female   Urban     2          8      Food     Medication     No tools  No first aid supplies  Sanitation     Clothing  Documents
 116 │ 116       Human    72     Male     Suburban  2          0      Food     Medication     tools     No first aid supplies  No sanitation  Clothing  NA
 117 │ 117       Human    74     Male     Suburban  1          0      Food     Medication     tools     First aid supplies     No sanitation  Clothing  NA
 118 │ 118       Human    75     Female   Rural     1          8      Food     No medication  tools     First aid supplies     No sanitation  NA        NA
 119 │ 119       Human    77     Female   Rural     1          8      Food     Medication     No tools  No first aid supplies  No sanitation  NA        NA
 120 │ 120       Human    81     Male     Rural     1          8      Food     Medication     tools     No first aid supplies  No sanitation  NA        NA
 121 │ 121       Human    32     Male     Rural     2          8      Food     No medication  No tools  First aid supplies     Sanitation     Clothing  Documents

⋮

Last Group (79 rows): zombie = "Zombie"
54 rows omitted
 Row │ zombieid  zombie   age    sex      rurality  household  water  food     medication     tools     firstaid               sanitation     clothing  documents
     │ Int64     String7  Int64  String7  String15  Int64      Int64  String7  String15       String15  String31               String15       String15  String15
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 122       Zombie   20     Female   Urban     2          0      Food     No medication  tools     First aid supplies     No sanitation  Clothing  NA
   2 │ 123       Zombie   23     Male     Suburban  3          0      No food  No medication  No tools  No first aid supplies  No sanitation  Clothing  NA
   3 │ 124       Zombie   25     Female   Rural     5          0      No food  No medication  No tools  No first aid supplies  No sanitation  Clothing  NA
   4 │ 125       Zombie   28     Female   Suburban  3          0      No food  Medication     tools     First aid supplies     No sanitation  Clothing  NA
   5 │ 126       Zombie   31     Female   Rural     4          0      No food  No medication  tools     First aid supplies     No sanitation  Clothing  NA
   6 │ 127       Zombie   32     Male     Suburban  4          0      No food  No medication  No tools  No first aid supplies  Sanitation     NA        Documents
   7 │ 128       Zombie   42     Male     Rural     4          8      No food  No medication  No tools  No first aid supplies  Sanitation     Clothing  Documents
   8 │ 129       Zombie   43     Male     Urban     5          8      No food  No medication  tools     First aid supplies     No sanitation  Clothing  NA
   9 │ 130       Zombie   44     Male     Rural     5          8      Food     No medication  tools     First aid supplies     No sanitation  Clothing  NA
  10 │ 131       Zombie   45     Male     Urban     4          0      Food     Medication     No tools  No first aid supplies  No sanitation  Clothing  NA
  11 │ 132       Zombie   47     Female   Urban     2          0      No food  Medication     No tools  No first aid supplies  No sanitation  Clothing  NA
  12 │ 133       Zombie   48     Female   Suburban  3          0      No food  No medication  No tools  First aid supplies     No sanitation  Clothing  NA
  13 │ 134       Zombie   48     Female   Urban     2          0      No food  No medication  tools     First aid supplies     Sanitation     Clothing  Documents
   ⋮ │ ⋮
  68 │ 189       Zombie   32     Male     Urban     3          0      No food  No medication  No tools  First aid supplies     No sanitation  Clothing  NA
  69 │ 190       Zombie   41     Female   Rural     5          0      No food  No medication  tools     First aid supplies     No sanitation  Clothing  NA
  70 │ 191       Zombie   43     Female   Rural     5          0      No food  No medication  No tools  No first aid supplies  Sanitation     Clothing  Documents
  71 │ 192       Zombie   48     Female   Suburban  4          8      No food  No medication  No tools  No first aid supplies  No sanitation  NA        NA
  72 │ 193       Zombie   58     Male     Urban     1          0      Food     No medication  tools     First aid supplies     No sanitation  NA        NA
  73 │ 194       Zombie   65     Male     Urban     1          0      No food  No medication  tools     First aid supplies     No sanitation  NA        NA
  74 │ 195       Zombie   67     Female   Suburban  2          0      No food  No medication  No tools  No first aid supplies  No sanitation  NA        NA
  75 │ 196       Zombie   68     Male     Suburban  1          0      Food     No medication  No tools  No first aid supplies  Sanitation     Clothing  Documents
  76 │ 197       Zombie   71     Male     Suburban  1          8      No food  No medication  tools     First aid supplies     No sanitation  Clothing  NA
  77 │ 198       Zombie   76     Female   Urban     1          0      No food  No medication  tools     First aid supplies     Sanitation     Clothing  Documents
  78 │ 199       Zombie   82     Male     Urban     1          0      No food  No medication  No tools  No first aid supplies  No sanitation  NA        NA
  79 │ 200       Zombie   85     Male     Urban     1          0      No food  Medication     No tools  No first aid supplies  Sanitation     Clothing  NA
In [18]:
# Overlaid age histograms; group 1 is "Human" and group 2 is "Zombie" (see Out[17]).
f, ax, plt = hist(data_grouped[1].age, color = (:blue, 0.5), label = "Human")
hist!(ax, data_grouped[2].age, color = (:red, 0.5), label = "Zombie")
axislegend(ax)
display(f)
Out[18]:
GLMakie.Screen(...)
In [19]:
countmap(data_grouped[1].sex)
Out[19]:
Dict{String7, Int64} with 2 entries:
  "Female" => 62
  "Male"   => 59
In [20]:
first(data)
Out[20]:
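DataFrameRow (14 columns)
 Row │ zombieid  zombie   age    sex      rurality  household  water  food     medication     tools     firstaid               sanitation     clothing  documents
     │ Int64     String7  Int64  String7  String15  Int64      Int64  String7  String15       String15  String31               String15       String15  String15
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1         Human    18     Female   Rural     1          0      Food     Medication     No tools  First aid supplies     Sanitation     Clothing  NA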
In [21]:
data_count = @chain data begin
    groupby(:zombie)
    @combine(:sex = countmap(:sex), 
    :rurality = countmap(:rurality), 
    :food = countmap(:food), 
    :medication = countmap(:medication), 
    :tools = countmap(:tools), 
    :firstaid = countmap(:firstaid), 
    :sanitation = countmap(:sanitation), 
    :clothing = countmap(:clothing), 
    :documents = countmap(:documents))
end
Out[21]:
2×10 DataFrame
 Row │ zombie   sex    rurality  food   medication  tools  firstaid  sanitation  clothing  documents
     │ String7  Dict…  Dict…     Dict…  Dict…       Dict…  Dict…     Dict…       Dict…     Dict…

Row 1, zombie = "Human":
  sex        = Dict{String7, Int64}("Female"=>62, "Male"=>59)
  rurality   = Dict{String15, Int64}("Urban"=>16, "Rural"=>80, "Suburban"=>25)
  food       = Dict{String7, Int64}("Food"=>91, "No food"=>30)
  medication = Dict{String15, Int64}("No medication"=>43, "Medication"=>78)
  tools      = Dict{String15, Int64}("tools"=>60, "No tools"=>61)
  firstaid   = Dict{String31, Int64}("First aid supplies"=>67, "No first aid supplies"=>54)
  sanitation = Dict{String15, Int64}("Sanitation"=>73, "No sanitation"=>48)
  clothing   = Dict{String15, Int64}("NA"=>47, "Clothing"=>74)
  documents  = Dict{String15, Int64}("NA"=>77, "Documents"=>44)

Row 2, zombie = "Zombie":
  sex        = Dict{String7, Int64}("Female"=>37, "Male"=>42)
  rurality   = Dict{String15, Int64}("Urban"=>38, "Rural"=>18, "Suburban"=>23)
  food       = Dict{String7, Int64}("Food"=>19, "No food"=>60)
  medication = Dict{String15, Int64}("No medication"=>63, "Medication"=>16)
  tools      = Dict{String15, Int64}("tools"=>39, "No tools"=>40)
  firstaid   = Dict{String31, Int64}("First aid supplies"=>39, "No first aid supplies"=>40)
  sanitation = Dict{String15, Int64}("Sanitation"=>25, "No sanitation"=>54)
  clothing   = Dict{String15, Int64}("NA"=>27, "Clothing"=>52)
  documents  = Dict{String15, Int64}("NA"=>57, "Documents"=>22)
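Because the two groups differ in size (121 humans, 79 zombies), raw counts are hard to compare directly. A minimal sketch of turning the counts from Out[21] into within-group shares follows; the numbers are copied from that output, while the names `have`, `shares`, and `gaps` are introduced here for illustration only.

```julia
# Group sizes and "has the supply" counts copied from Out[21]: (humans, zombies).
n_human, n_zombie = 121, 79
have = Dict(
    "food"       => (91, 19),
    "medication" => (78, 16),
    "tools"      => (60, 39),
    "firstaid"   => (67, 39),
    "sanitation" => (73, 25),
    "clothing"   => (74, 52),
    "documents"  => (44, 22),
)

# Share of each group holding the supply.
shares = Dict(k => (h / n_human, z / n_zombie) for (k, (h, z)) in have)

# Human share minus zombie share, largest gap first.
gaps = sort([(k, round(s[1] - s[2], digits = 3)) for (k, s) in shares],
            by = last, rev = true)

for (k, gap) in gaps
    h, z = shares[k]
    println(rpad(k, 11), " human ", round(h, digits = 3),
            "  zombie ", round(z, digits = 3), "  gap ", gap)
end
```

On these counts, food and medication show the largest gaps (roughly 51 and 44 percentage points in favour of humans), tools are almost identical across groups, and clothing is slightly more common among zombies.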
In [22]:
colors = [:red, :blue]
elem_1 = [PolyElement(color = :red, strokecolor = :blue, strokewidth = 1)]
elem_2 = [PolyElement(color = :blue, strokecolor = :blue, strokewidth = 1)]
Out[22]:
1-element Vector{PolyElement}:
 PolyElement(Attributes with 3 entries:
  polycolor => blue
  polystrokecolor => blue
  polystrokewidth => 1)
In [23]:
# Donut charts of sex: humans (left) and zombies (right).
# Look the counts up by label so the slice order always matches the legend,
# instead of relying on Dict iteration order.
sex_labels = ["Female", "Male"]
f, ax, plt = pie([data_count.sex[1][l] for l in sex_labels],
                 color = colors,
                 radius = 4,
                 inner_radius = 2,
                 strokecolor = :white,
                 strokewidth = 5,
                 axis = (autolimitaspect = 1,))
ax2 = Axis(f[1, 2], autolimitaspect = 1)
pie!(ax2, [data_count.sex[2][l] for l in sex_labels],
     color = colors,
     radius = 4,
     inner_radius = 2,
     strokecolor = :white,
     strokewidth = 5)
Legend(f[1, 3],
    [elem_1, elem_2],
    sex_labels,
    patchsize = (35, 35), rowgap = 10)
display(f)
Out[23]:
GLMakie.Screen(...)
In [24]:
# Donut charts of food: humans (left) and zombies (right).
# Counts are looked up by label so slices and legend entries stay in step.
food_labels = ["Food", "No food"]
f, ax, plt = pie([data_count.food[1][l] for l in food_labels],
                 color = colors,
                 radius = 4,
                 inner_radius = 2,
                 strokecolor = :white,
                 strokewidth = 5,
                 axis = (autolimitaspect = 1,))

ax2 = Axis(f[1, 2], autolimitaspect = 1)
pie!(ax2, [data_count.food[2][l] for l in food_labels],
     color = colors,
     radius = 4,
     inner_radius = 2,
     strokecolor = :white,
     strokewidth = 5)
Legend(f[1, 3],
    [elem_1, elem_2],
    food_labels,
    patchsize = (35, 35), rowgap = 10)
display(f)
Out[24]:
GLMakie.Screen(...)
In [25]:
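# Donut charts of medication: humans (left) and zombies (right).
# The medication Dict iterates as "No medication", "Medication" (see Out[21]),
# so values(...) would swap the slices relative to the legend; index by label instead.
medication_labels = ["Medication", "No medication"]
f, ax, plt = pie([data_count.medication[1][l] for l in medication_labels],
                 color = colors,
                 radius = 4,
                 inner_radius = 2,
                 strokecolor = :white,
                 strokewidth = 5,
                 axis = (autolimitaspect = 1,))

ax2 = Axis(f[1, 2], autolimitaspect = 1)
pie!(ax2, [data_count.medication[2][l] for l in medication_labels],
     color = colors,
     radius = 4,
     inner_radius = 2,
     strokecolor = :white,
     strokewidth = 5)
Legend(f[1, 3],
    [elem_1, elem_2],
    medication_labels,
    patchsize = (35, 35), rowgap = 10)
display(f)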
Out[25]:
GLMakie.Screen(...)
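The donut-chart cells above repeat the same plotting code for each variable. One way to factor that out is sketched below. `donut_pair` is a hypothetical helper, not part of the original notebook; it takes the slice order from an explicit label list rather than from Dict iteration. The sanitation counts in the usage lines are copied from Out[21].

```julia
using GLMakie

# Hypothetical helper: one donut per group, slices ordered by `labels`.
function donut_pair(counts_left, counts_right, labels; colors = [:red, :blue])
    f = Figure()
    for (col, counts) in enumerate((counts_left, counts_right))
        ax = Axis(f[1, col], autolimitaspect = 1)
        pie!(ax, [counts[l] for l in labels],
             color = colors, radius = 4, inner_radius = 2,
             strokecolor = :white, strokewidth = 5)
    end
    # One legend patch per label, in the same order as the slices.
    elems = [[PolyElement(color = c)] for c in colors]
    Legend(f[1, 3], elems, labels, patchsize = (35, 35), rowgap = 10)
    return f
end

# Sanitation counts from Out[21]: humans first, then zombies.
f = donut_pair(Dict("Sanitation" => 73, "No sanitation" => 48),
               Dict("Sanitation" => 25, "No sanitation" => 54),
               ["Sanitation", "No sanitation"])
display(f)
```

In the notebook the first two arguments could be the Dicts already stored in `data_count`, for example `data_count.sanitation[1]` and `data_count.sanitation[2]`.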