TidyTuesday Netflix Titles dataset. Originally published on Kaggle by Shivam Bansal.
#> sample(c("Franzi", "Selina", "Anne", "Helmut", "Markus"))
# [1] "Franzi" "Selina" "Markus" "Helmut" "Anne"
library(tidyverse)
# Get the Data
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest
# Either ISO-8601 date or year/week works!
tuesdata <- tidytuesdayR::tt_load('2021-04-20')
## --- Compiling #TidyTuesday Information for 2021-04-20 ----
## --- There is 1 file available ---
## --- Starting Download ---
##
## Downloading file 1 of 1: `netflix_titles.csv`
## --- Download complete ---
netflix <- tuesdata$netflix
head(netflix) %>% knitr::kable()
show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
---|---|---|---|---|---|---|---|---|---|---|---|
s1 | TV Show | 3% | NA | João Miguel, Bianca Comparato, Michel Gomes, Rodolfo Valente, Vaneza Oliveira, Rafael Lozano, Viviane Porto, Mel Fronckowiak, Sergio Mamberti, Zezé Motta, Celso Frateschi | Brazil | August 14, 2020 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi & Fantasy | In a future where the elite inhabit an island paradise far from the crowded slums, you get one chance to join the 3% saved from squalor. |
s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato | Mexico | December 23, 2016 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued while trying desperately to stay alive. |
s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim | Singapore | December 20, 2018 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that’s haunting their jungle island training camp. |
s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasciore, Alan Oppenheimer, Tom Kane | United States | November 16, 2017 | 2009 | PG-13 | 80 min | Action & Adventure, Independent Movies, Sci-Fi & Fantasy | In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until a brave newcomer joins the group. |
s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aaron Yoo, Liza Lapira, Jacob Pitts, Laurence Fishburne, Jack McGee, Josh Gad, Sam Golzari, Helen Carey, Jack Gilpin | United States | January 1, 2020 | 2008 | PG-13 | 123 min | Dramas | A brilliant group of students become card-counting experts with the intent of swindling millions out of Las Vegas casinos by playing blackjack. |
s6 | TV Show | 46 | Serdar Akar | Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan, Saygın Soysal, Berkan Şal, Metin Belgin, Ayça Eren, Selin Uludoğan, Özay Fecht, Suna Yıldızoğlu | Turkey | July 1, 2017 | 2016 | TV-MA | 1 Season | International TV Shows, TV Dramas, TV Mysteries | A genetics professor experiments with a treatment for his comatose sister that blends medical and shamanic cures, but unlocks a shocking side effect. |
dim(netflix)
## [1] 7787 12
count(netflix, country, sort = TRUE) %>% head()
## # A tibble: 6 x 2
## country n
## <chr> <int>
## 1 United States 2555
## 2 India 923
## 3 <NA> 507
## 4 United Kingdom 397
## 5 Japan 226
## 6 South Korea 183
count(netflix, country, sort = TRUE) %>% tail()
## # A tibble: 6 x 2
## country n
## <chr> <int>
## 1 Uruguay, Guatemala 1
## 2 Uruguay, Spain, Mexico 1
## 3 Venezuela 1
## 4 Venezuela, Colombia 1
## 5 West Germany 1
## 6 Zimbabwe 1
Some titles list several countries in the country column. For now we restrict ourselves to the titles that can be attributed to exactly one country. In addition, the duration column is not yet numeric, because it consists of a number and a unit (min).
singleCountryEntry <- netflix %>% filter(!str_detect(country, ","))
singleCountryEntry %>% count(country, sort=TRUE)
## # A tibble: 69 x 2
## country n
## <chr> <int>
## 1 United States 2555
## 2 India 923
## 3 United Kingdom 397
## 4 Japan 226
## 5 South Korea 183
## 6 Canada 177
## 7 Spain 134
## 8 France 115
## 9 Egypt 101
## 10 Mexico 100
## # … with 59 more rows
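Note that `filter()` only keeps rows where the condition is `TRUE`, so titles with a missing `country` are dropped together with the multi-country ones; this is why `<NA>` no longer appears in the count above. A minimal sketch of that behaviour, using a made-up tibble `toy` that is purely illustrative and not taken from the dataset:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: one single-country title, one multi-country title, one NA
toy <- tibble(
  title   = c("A", "B", "C"),
  country = c("Germany", "Uruguay, Spain", NA)
)

# str_detect() returns NA for a missing country, and filter() drops NA conditions
single <- toy %>% filter(!str_detect(country, ","))
single

# To keep titles with an unknown country, the NA case has to be allowed explicitly
single_or_unknown <- toy %>% filter(is.na(country) | !str_detect(country, ","))
single_or_unknown
```

Whether titles without a country should be kept depends on the question; for a per-country comparison, dropping them as above seems reasonable.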
singleCountryEntry %>%
group_by(country) %>%
filter(type=="Movie") %>%
separate(duration,c("duration","time_unit"),sep=" ") %>%
mutate(duration = as.integer(duration)) %>%
summarise(longestFilm = max(duration), n = n()) %>%
arrange(-n)
## # A tibble: 62 x 3
## country longestFilm n
## <chr> <int> <int>
## 1 United States 312 1850
## 2 India 228 852
## 3 United Kingdom 146 193
## 4 Canada 140 118
## 5 Egypt 253 89
## 6 Spain 163 89
## 7 Turkey 153 73
## 8 Philippines 150 70
## 9 France 133 69
## 10 Japan 151 69
## # … with 52 more rows
# Longest entry overall; note that for TV shows the number counts seasons, not minutes
netflix %>%
  separate(duration, c("duration", "time_unit"), sep = " ") %>%
  mutate(duration = as.integer(duration)) %>%
  filter(duration == max(duration)) %>%
  select(title)
## # A tibble: 1 x 1
## title
## <chr>
## 1 Black Mirror: Bandersnatch
# All durations are in min
singleCountryEntry %>% group_by(country) %>% filter(type=="Movie") %>% filter(!str_detect(duration,"min"))
## # A tibble: 0 x 12
## # Groups: country [0]
## # … with 12 variables: show_id <chr>, type <chr>, title <chr>, director <chr>, cast <chr>, country <chr>, date_added <chr>, release_year <dbl>, rating <chr>,
## # duration <chr>, listed_in <chr>, description <chr>
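The same check can be sketched more generally by tabulating which units occur for which type after splitting the column. The snippet below uses a made-up tibble `toy` in place of `netflix`; `readr::parse_number()` is shown only as an alternative way to extract the numeric part and is not what the analysis above uses.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical toy data mimicking the two duration formats in the dataset
toy <- tibble(
  type     = c("Movie", "Movie", "TV Show", "TV Show"),
  duration = c("93 min", "78 min", "4 Seasons", "1 Season")
)

# Split into number and unit, then tabulate which units occur for which type
units_by_type <- toy %>%
  separate(duration, c("duration", "time_unit"), sep = " ") %>%
  mutate(duration = as.integer(duration)) %>%
  count(type, time_unit)
units_by_type

# Alternative: parse_number() extracts the number and ignores the trailing unit
duration_num <- parse_number(toy$duration)
duration_num
```

If a movie ever showed up with a unit other than `min`, it would appear as its own row in `units_by_type`.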
singleCountryEntry %>%
  separate(duration, c("duration", "time_unit"), sep = " ") %>%
  mutate(duration = as.integer(duration)) %>%
  filter(type == "Movie") %>%
  group_by(country) %>%
  mutate(n = n()) %>%
  filter(duration == max(duration)) %>%
  select(country, title, duration, n) %>%
  arrange(-duration) %>%
  #filter(country %in% c("Germany", "West Germany", "East Germany")) %>%
  identity
## # A tibble: 62 x 4
## # Groups: country [62]
## country title duration n
## <chr> <chr> <int> <int>
## 1 United States Black Mirror: Bandersnatch 312 1850
## 2 Egypt The School of Mischief 253 89
## 3 India Sangam 228 852
## 4 Netherlands Elephants Dream 4 Hour 196 12
## 5 Nigeria King of Boys 182 62
## 6 Indonesia This Earth of Mankind 180 68
## 7 Kuwait Bye Bye London 177 4
## 8 Pakistan Ho Mann Jahaan 170 14
## 9 Denmark A Fortunate Man 168 5
## 10 Spain Palm Trees in the Snow 163 89
## # … with 52 more rows
Apparently the longest film, “Black Mirror: Bandersnatch” with a runtime of 312 minutes, comes from the United States. IMDb, however, lists its length as only 90 minutes, so this may be an error in the dataset.
The genres are stored as a comma-separated list per title and first have to be split into separate rows. Because there are too many genres, all but the 8 most frequent ones are lumped into “Other”.
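One way to judge whether such a value is an outlier might be to look at the few longest titles rather than only the single maximum. A small sketch using `dplyr::slice_max()`; the titles and durations in `toy` are invented for illustration and are not values from the dataset:

```r
library(dplyr)

# Hypothetical toy durations in minutes; the values are made up
toy <- tibble(
  title    = c("Film A", "Film B", "Film C", "Film D"),
  duration = c(95L, 120L, 312L, 101L)
)

# Show the three longest titles instead of only the single maximum
longest <- toy %>% slice_max(duration, n = 3)
longest
```

A large gap between the first and second row would hint that the top entry deserves a closer look.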
#netflix$listed_in
netflix %>% separate_rows(listed_in, sep = ", ") %>% count(release_year, listed_in, sort = TRUE) %>%
mutate(listed_in=fct_lump(listed_in, 8, w=n)) %>%
ggplot(aes(x = release_year, y = n, fill = listed_in)) +
geom_col() + facet_wrap(~ listed_in, scales = "free_y")
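The splitting and lumping steps used for the plot can be illustrated on a small example. The tibble `toy` below is made up, and only the 2 most frequent genres are kept instead of 8 so that the effect is visible on so few rows:

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical toy data: three titles with comma-separated genre lists
toy <- tibble(
  title     = c("A", "B", "C"),
  listed_in = c("Dramas, Comedies", "Dramas, Thrillers", "Dramas, Comedies, Horror Movies")
)

# One row per title/genre pair, then count how often each genre occurs
genre_counts <- toy %>%
  separate_rows(listed_in, sep = ", ") %>%
  count(listed_in, sort = TRUE)

# Keep the 2 most frequent genres (weighted by n) and lump the rest into "Other"
lumped <- genre_counts %>%
  mutate(listed_in = fct_lump(listed_in, 2, w = n))
lumped
```

The `w = n` argument matters here: after `count()` each genre occurs in only one row, so without weights `fct_lump()` would see every genre as equally frequent.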
netflix %>% separate_rows(listed_in, sep = ", ") %>% count(release_year, listed_in, sort = TRUE) %>%
filter(listed_in=="Classic Movies") %>%
ggplot(aes(x = release_year, y = n)) +
geom_col()
netflix %>% filter(str_detect(listed_in, "Classic Movie")) %>% arrange(-release_year) %>% select(title, release_year, listed_in, country)
## # A tibble: 103 x 4
## title release_year listed_in country
## <chr> <dbl> <chr> <chr>
## 1 The Other Side of the Wind 2018 Classic Movies, Dramas, Independent Movies France, Iran, United States
## 2 Four Weddings and a Funeral 1994 Classic Movies, Comedies, International Movies United Kingdom
## 3 Hum Aapke Hain Koun 1994 Classic Movies, Dramas, International Movies India
## 4 Pulp Fiction 1994 Classic Movies, Cult Movies, Dramas United States
## 5 Philadelphia 1993 Classic Movies, Dramas, LGBTQ Movies United States
## 6 Schindler's List 1993 Classic Movies, Dramas United States
## 7 Searching for Bobby Fischer 1993 Children & Family Movies, Classic Movies, Dramas United States
## 8 Tango Feroz 1993 Classic Movies, Dramas, International Movies Argentina, Spain
## 9 What's Eating Gilbert Grape 1993 Classic Movies, Dramas, Independent Movies United States
## 10 Basic Instinct 1992 Classic Movies, Thrillers United States, France
## # … with 93 more rows
The most recent title listed under “Classic Movies” is “The Other Side of the Wind” (2018); the next most recent ones date from 1994.