Here, we are going to use the Lemurs data set provided by the TidyTuesday project to illustrate:
In this analysis, we will rely heavily on packages from the Tidyverse, a collection of packages designed specifically for data science. These packages share data structures and function syntax, and can be easily combined in a slick, yet comprehensible workflow.
library(tidyverse)
library(lubridate) # easy manipulation of date objects
More specifically, we will work with the following packages:
readr: functions to easily, yet reliably, read in rectangular data (e.g. csv, tsv) containing multiple data types (e.g. numeric, logical). By reliably, I mean that it can recognize errors in the table formatting that require checking by the user (e.g. the occurrence of numeric values in a seemingly logical column).
tidyr: functions to create and manipulate “tidy data”, i.e., data where each column is a variable, each row is an observation, and each cell holds a single value. The other functions in the tidyverse are optimized to work with this type of data.
dplyr: functions for data manipulation (e.g. filtering, summarizing). One of the features that make this package particularly good is that functions are named as verbs, indicating the type of data transformation they perform. This makes reading the code considerably easier.
stringr: functions for string manipulation, considering that this is not one of base R’s strengths.
ggplot2: functions to code graphs following the [“Grammar of Graphics”](https://cfss.uchicago.edu/notes/grammar-of-graphics/). Simply put, the grammar of graphics is a system of rules for mapping data onto visual elements. Reading the article above and other precise definitions is highly recommended, though.
The files in this project are organized as such:
data_crunch_wue
|--README.md
|--lemurs.Rmd
|--figures
|--results
| |--data
| | |--raw
| | |--processed
| |--figures
| |--tables
| |--scripts
This file structure adapts the minimal setup I propose for scientific computational projects. The idea is to organize the project around the .Rmd file (the .html version of which you are reading right now). By combining descriptive text, code, and results of the analysis, this “computational notebook” facilitates communication and reproducibility of the work it reports. As part of this setup, inputs and outputs can be accessed with relative paths:
raw_dir <- file.path("results", "data", "raw")
processed_dir <- file.path("results", "data", "processed")
scripts_dir <- file.path("results", "scripts")
figures_dir <- file.path("results", "figures")
tables_dir <- file.path("results", "tables")
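The relative paths above only work if the folders actually exist. Here is a minimal sketch for creating missing output folders before writing to them. The helper name `ensure_dirs` and the temporary demo folder are assumptions made for illustration and are not part of the original project.

```r
# Hypothetical helper: create each folder (and its parents) if it is missing
ensure_dirs <- function(dirs) {
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Report which folders exist afterwards
  dir.exists(dirs)
}

# Demonstrated in a temporary folder so that nothing is written to the project;
# in the notebook itself, the vectors figures_dir and tables_dir would be passed.
demo_root <- file.path(tempdir(), "data_crunch_demo")
demo_dirs <- file.path(demo_root, "results", c("figures", "tables"))
dirs_ok <- ensure_dirs(demo_dirs)
```

Calling such a helper once at the top of the notebook would avoid write errors later on, for example when saving tables.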
As shown in the dataset page, the original data can be downloaded from the Git repository:
lemurs_df <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_double(),
## taxon = col_character(),
## dlc_id = col_character(),
## hybrid = col_character(),
## sex = col_character(),
## name = col_character(),
## current_resident = col_character(),
## stud_book = col_character(),
## dob = col_date(format = ""),
## estimated_dob = col_character(),
## birth_type = col_character(),
## birth_institution = col_character(),
## estimated_concep = col_date(format = ""),
## dam_id = col_character(),
## dam_name = col_character(),
## dam_taxon = col_character(),
## dam_dob = col_date(format = ""),
## sire_id = col_character(),
## sire_name = col_character(),
## sire_taxon = col_character(),
## sire_dob = col_date(format = "")
## # ... with 8 more columns
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 29130 parsing failures.
## row col expected actual file
## 1324 age_of_living_y 1/0/T/F/TRUE/FALSE 23.77 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv'
## 1325 age_of_living_y 1/0/T/F/TRUE/FALSE 23.77 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv'
## 1326 age_of_living_y 1/0/T/F/TRUE/FALSE 23.77 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv'
## 1327 age_of_living_y 1/0/T/F/TRUE/FALSE 23.77 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv'
## 1328 age_of_living_y 1/0/T/F/TRUE/FALSE 23.77 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv'
## .... ............... .................. ...... ..........................................................................................................
## See problems(...) for more details.
Already at loading, we come across parsing failures: read_csv identified the column age_of_living_y as being of type logical (it lists the expected values as 1/0/T/F/TRUE/FALSE), but at row 1324 the value is a double. The simplest explanation is that, with default settings, the read_csv function guesses the type of each column (character, logical, etc.) from the first 1000 rows only. We can check the contents of the first 1000 rows of that column:
unique(lemurs_df[1:1000, "age_of_living_y"])
## # A tibble: 1 x 1
## age_of_living_y
## <lgl>
## 1 NA
Thus, we see that the missing data in these rows led to the wrong guess: a column that contains only NA is read as logical. We can fix it by explicitly stating the types of the columns of the data frame. Before we do this, however, let’s verify that age_of_living_y was the only column that raised an issue, using the problems() function on the object read with the read_* functions from the readr package. It returns the parsing problems as a data frame containing the row and col where the expected and actual values differ.
unique(problems(lemurs_df)$col)
## [1] "age_of_living_y"
Strictly speaking, cols() lets us override a single column and leave the rest to be guessed, but spelling out the types makes the import explicit and reproducible. With 54 columns, this would be a lot of typing, but this is where the magic starts: the spec() function lists all column types, and we just need to fix the ones that were read in wrong.
spec(lemurs_df)
## cols(
## taxon = col_character(),
## dlc_id = col_character(),
## hybrid = col_character(),
## sex = col_character(),
## name = col_character(),
## current_resident = col_character(),
## stud_book = col_character(),
## dob = col_date(format = ""),
## birth_month = col_double(),
## estimated_dob = col_character(),
## birth_type = col_character(),
## birth_institution = col_character(),
## litter_size = col_double(),
## expected_gestation = col_double(),
## estimated_concep = col_date(format = ""),
## concep_month = col_double(),
## dam_id = col_character(),
## dam_name = col_character(),
## dam_taxon = col_character(),
## dam_dob = col_date(format = ""),
## dam_age_at_concep_y = col_double(),
## sire_id = col_character(),
## sire_name = col_character(),
## sire_taxon = col_character(),
## sire_dob = col_date(format = ""),
## sire_age_at_concep_y = col_double(),
## dod = col_date(format = ""),
## age_at_death_y = col_double(),
## age_of_living_y = col_logical(),
## age_last_verified_y = col_double(),
## age_max_live_or_dead_y = col_double(),
## n_known_offspring = col_double(),
## dob_estimated = col_character(),
## weight_g = col_double(),
## weight_date = col_date(format = ""),
## month_of_weight = col_double(),
## age_at_wt_d = col_double(),
## age_at_wt_wk = col_double(),
## age_at_wt_mo = col_double(),
## age_at_wt_mo_no_dec = col_double(),
## age_at_wt_y = col_double(),
## change_since_prev_wt_g = col_double(),
## days_since_prev_wt = col_double(),
## avg_daily_wt_change_g = col_double(),
## days_before_death = col_double(),
## r_min_dam_age_at_concep_y = col_double(),
## age_category = col_character(),
## preg_status = col_character(),
## expected_gestation_d = col_double(),
## concep_date_if_preg = col_date(format = ""),
## infant_dob_if_preg = col_date(format = ""),
## days_before_inf_birth_if_preg = col_double(),
## pct_preg_remain_if_preg = col_double(),
## infant_lit_sz_if_preg = col_double()
## )
lemurs_rawdf <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv',
col_types = cols(
.default = col_double(),
taxon = col_character(),
dlc_id = col_character(),
hybrid = col_character(),
sex = col_character(),
name = col_character(),
current_resident = col_character(),
stud_book = col_character(),
dob = col_date(format = ""),
birth_month = col_double(),
estimated_dob = col_character(),
birth_type = col_character(),
birth_institution = col_character(),
estimated_concep = col_date(format = ""),
dam_id = col_character(),
dam_name = col_character(),
dam_taxon = col_character(),
dam_dob = col_date(format = ""),
dam_age_at_concep_y = col_double(),
sire_id = col_character(),
sire_name = col_character(),
sire_taxon = col_character(),
sire_dob = col_date(format = ""),
dod = col_date(format = ""),
age_of_living_y = col_double(), ## the column that was typed wrong by default
dob_estimated = col_character(),
weight_date = col_date(format = ""),
age_category = col_character(),
preg_status = col_character(),
concep_date_if_preg = col_date(format = ""),
infant_dob_if_preg = col_date(format = "")
)
)
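As a shorter alternative, `cols()` also accepts a partial specification: columns that are not named are still guessed. The sketch below shows this on a toy table rather than on the lemur file. The object names `toy_csv` and `toy_df` and the column `age` are made up for illustration.

```r
library(readr)

# Hypothetical toy table: the numeric column `age` is empty in the first rows,
# mimicking the situation of age_of_living_y in the lemur data
toy_csv <- "id,age\na,\nb,\nc,23.77\n"

# Override only the problematic column; `id` is still guessed by readr
toy_df <- read_csv(toy_csv, col_types = cols(age = col_double()))

toy_df
```

For the lemur data, the equivalent call would pass `col_types = cols(age_of_living_y = col_double())`. The full specification used above remains the more explicit choice.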
Let’s also load a data frame with the species full names and abbreviations, to use later for more understandable graphs and tables:
lemurs_sppnames_df <- readr::read_csv("https://raw.githubusercontent.com/ludmillafigueiredo/computational_notebooks/master/examples/datastudy_r/results/data/raw/lemurs_sppnames.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## taxon = col_character(),
## species = col_character(),
## common_name = col_character()
## )
We have a couple of things to work on in the original table, to make it more digestible:
lemurs_smallts <- dplyr::mutate_at(lemurs_rawdf, vars(name, dam_name, sire_name), stringr::str_to_title)
The weight_date and month_of_weight variables report the full date and the month when the weight was measured, respectively. It would be good to have the year and the month easily accessible.
lemurs_smallts <- dplyr::mutate(lemurs_smallts, year = lubridate::year(weight_date))
lemurs_smallts <- dplyr::rename(lemurs_smallts, month = month_of_weight)
lemurs_smallts <- dplyr::select(lemurs_smallts,
c(year, month, ## time variables
taxon, dlc_id, ## id variables
hybrid, sex, name, birth_month, litter_size, concep_month, ## birth variables
dam_id, dam_name, dam_taxon, ## name of mother
sire_id, sire_name, sire_taxon, ## name of father
age_at_death_y, age_of_living_y, age_last_verified_y, ## age variables
age_max_live_or_dead_y, age_at_wt_y, age_category, ## age variables
weight_g, avg_daily_wt_change_g, ## weight variables
preg_status,
n_known_offspring, infant_lit_sz_if_preg))
lemurs_smallts <- dplyr::rename(lemurs_smallts,
weight = weight_g,
avg_d_wt_chg = avg_daily_wt_change_g,
n_offspring = n_known_offspring)
However, redefining the same object that many times (lemurs_smallts, in this case) is not good practice: if you forget one of the steps for some reason, it can lead to errors down the line (e.g. you apply transformations to one of the “intermediate” stages). Having multiple objects is also not great, because one would have to name them, and it would be a waste of creativity on temporary objects. With that in mind, let’s try some magic: we will put all the transformations together in a pipeline, where the transformations are chained in a readable form, and only one data frame is created at the end:
lemurs_smallts <- lemurs_rawdf %>%
# 1. capitalizing the first letter, only
dplyr::mutate_at(vars(name, dam_name, sire_name), stringr::str_to_title) %>%
# 2. extract the year of the measurement, and give a simpler name to the column containing the month
dplyr::mutate(year = lubridate::year(weight_date)) %>%
dplyr::rename(month = month_of_weight) %>%
# 3. select the most relevant variables
dplyr::select(c(year, month, ## time variables
taxon, dlc_id, ## id variables
hybrid, sex, name, birth_month, litter_size, concep_month, ## birth variables
dam_id, dam_name, dam_taxon, ## name of mother
sire_id, sire_name, sire_taxon, ## name of father
age_at_death_y, age_of_living_y, age_last_verified_y, ## age variables
age_max_live_or_dead_y, age_at_wt_y, age_category, ## age variables
weight_g, avg_daily_wt_change_g, ## weight variables
preg_status,
n_known_offspring, infant_lit_sz_if_preg)) %>%
## simplify the names
dplyr::rename(weight = weight_g,
avg_d_wt_chg = avg_daily_wt_change_g,
n_offspring = n_known_offspring)
If we are trying to protect a species, reproduction is one of the most important aspects to understand. With the DLC lemur data, we can estimate fertility rates, reproductive seasons, and the relationship between age, sizes and offspring production.
Let’s have a look at the fertility rates of the species we are trying to save:
fertiilty_df <- lemurs_smallts %>%
dplyr::right_join(lemurs_sppnames_df,., by = "taxon") %>% ## add species names, so we have a complete table
dplyr::filter(!is.na(infant_lit_sz_if_preg)) %>% ## filter the animals for which this information was available
dplyr::group_by(dlc_id, species) %>%
dplyr::summarize(inflt_mean_ind = mean(infant_lit_sz_if_preg)) %>%
ungroup() %>%
dplyr::group_by(species) %>%
dplyr::summarize(inflt_mean = mean(inflt_mean_ind),
inflt_sd = sd(inflt_mean_ind),
n = n()) %>%
dplyr::rename(Species = species,
"Infant litter size (mean)" = inflt_mean,
"Infant litter size (sd)" = inflt_sd)
We can save this table:
readr::write_csv(fertiilty_df, file = file.path(tables_dir, "fertility_rates.csv"))
Or have it nicely displayed in our html file:
fertiilty_df %>%
kableExtra::kbl(caption = "Fertility rates (mean +- sd) of the species housed at the Duke Lemur Center, in North Carolina, USA.")%>%
kableExtra::kable_styling(c("striped", "hover")) %>%
kableExtra::scroll_box(width = "100%", height = "300px")
Species | Infant litter size (mean) | Infant litter size (sd) | n |
---|---|---|---|
Cheirogaleus medius | 1.912775 | 0.8543182 | 13 |
Daubentonia madagascariensis | 1.000000 | 0.0000000 | 7 |
Eulemur albifrons | 1.000000 | NA | 1 |
Eulemur collaris | 1.000000 | 0.0000000 | 7 |
Eulemur coronatus | 1.108333 | 0.2486072 | 10 |
Eulemur Eulemur | 1.000000 | 0.0000000 | 5 |
Eulemur flavifrons | 1.051079 | 0.1270108 | 13 |
Eulemur fulvus | 1.000000 | 0.0000000 | 4 |
Eulemur macaco | 1.125000 | 0.3535534 | 8 |
Eulemur mongoz | 1.010417 | 0.0416667 | 16 |
Eulemur rubriventer | 1.080000 | 0.1788854 | 5 |
Eulemur rufus | 1.145833 | 0.2736801 | 8 |
Eulemur sanfordi | 1.000000 | NA | 1 |
Galago moholi | 1.357143 | 0.4738035 | 4 |
Hapalemur griseus griseus | 1.123457 | 0.2055889 | 9 |
Lemur catta | 1.330122 | 0.3794544 | 32 |
Loris tardigradus | 1.000000 | 0.0000000 | 10 |
Mircocebus murinus | 2.326840 | 0.8419061 | 22 |
Mirza coquereli | 1.733333 | 0.4346135 | 5 |
Nycticebus coucang | 1.000000 | 0.0000000 | 14 |
Nycticebus pygmaeus | 1.728566 | 0.2791717 | 7 |
Otolemur garnettii garnettii | 1.136842 | 0.3336840 | 19 |
Perodicticus potto | 1.000000 | 0.0000000 | 3 |
Propithecus coquereli | 1.000000 | 0.0000000 | 22 |
Varecia rubra | 1.998291 | 0.7132938 | 13 |
Varecia Varecia | 2.000000 | 0.0000000 | 2 |
Varecia variegata variegata | 1.959375 | 0.7996520 | 16 |
Question: one could argue that this summary is hiding some valuable information. Any guesses?
Many species of lemurs are seasonal breeders, meaning that there are specific times of the year when animals look for partners and reproduce. Let’s see if we can detect this in the data.
First, I know that the data contains the months in numeric form, but it would be nice to have the name of each month in the data, for later plotting. Remember, we are obeying the Grammar of Graphics, so we cannot simply paste tags with the names of months later on.
So, we start with a simple data frame with the relevant information: individual ID, taxon, and month of birth.
births_df <- lemurs_smallts %>%
dplyr::select(dlc_id, taxon, birth_month) %>%
dplyr::filter(!is.na(birth_month)) %>%
dplyr::mutate_at(vars(birth_month),
lubridate::month, label = TRUE,
locale = Sys.getlocale(category = "LC_CTYPE")) %>% ## id months
dplyr::right_join(lemurs_sppnames_df,., by = "taxon") %>% ## id species
dplyr::arrange(taxon, birth_month)
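Because lubridate::month(label = TRUE) builds its labels from a locale, the month names can differ between machines. A locale-independent alternative, sketched here with base R only, is to index the built-in constant `month.abb` and fix the factor levels. The vector `birth_month_num` holds toy values, not the lemur data.

```r
# Numeric months as they would appear in the data (toy values)
birth_month_num <- c(3, 11, 3, 6)

# month.abb is a base R constant with English abbreviations, "Jan" to "Dec".
# Fixing the factor levels keeps the months in calendar order on a plot axis.
birth_month_lab <- factor(month.abb[birth_month_num],
                          levels = month.abb, ordered = TRUE)

# Count births per month, including months without births
table(birth_month_lab)
```

Keeping all twelve levels, even the empty ones, means every facet of a plot shares the same month axis.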
Now, let’s count the number of births that happened per species, per month:
birth_season_countdf <- births_df %>%
unique() %>%
dplyr::group_by(species, common_name, taxon, birth_month) %>%
dplyr::summarize(n_births = n()) %>%
ungroup() %>%
dplyr::arrange(species, common_name, taxon, birth_month)
## `summarise()` regrouping output by 'species', 'common_name', 'taxon' (override with `.groups` argument)
Let’s say we would like to plot this:
birth_season_countdf %>%
ggplot(aes(x = birth_month, y = n_births, fill = species)) +
geom_bar(alpha=0.6, stat = "identity") +
facet_wrap(~species, ncol = 3) +
labs(x = "Month", y = "Number of births (mean)") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 45))
I can define specific aesthetic values to be applied to my plot:
source("results/scripts/custom_aesthetics.R")
birth_season_countdf %>%
ggplot(aes(x = birth_month, y = n_births, fill = species)) +
geom_bar(alpha=0.6, stat = "identity") +
theme_lemurs() +
facet_wrap(~species, ncol = 3) +
labs(x = "Month", y = "Number of births (mean)") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 45))
Challenge: there is a mistake in this summary. What is it?
Question: If we were talking to a larger audience, we could include the common names of the species in this graph. How would we go about it?
offspring_df <- lemurs_smallts %>%
dplyr::right_join(lemurs_sppnames_df,., by = "taxon") %>% ## id species
dplyr::select(year, month, species, taxon, dlc_id, sex,
litter_size, ## size of litter it was born into
age_at_wt_y, weight, ## age and weight
preg_status, ## pregnancy status
n_offspring, ## total number of offspring produced until that day
infant_lit_sz_if_preg) ## size of litter, if pregnant
Let’s see if we can find some relationship between pregnant female weight and the size of the litter it is carrying.
First, let’s look into species separately:
offspring_df %>%
## filter only the pregnant females
dplyr::filter(preg_status == "P") %>%
## get their last measurement while pregnant
dplyr::group_by(dlc_id) %>%
dplyr::filter(age_at_wt_y == max(age_at_wt_y)) %>%
ungroup() %>%
ggplot(aes(x = weight, y = litter_size))+
geom_point(alpha = 0.2) +
facet_wrap(~species, ncol = 3, scales = "free") +
labs(x = "Weight (g)", y = "Litter size") +
theme_lemurs()
## Warning: Removed 95 rows containing missing values (geom_point).
This was not very informative, but let’s see if we can at least get a summarized graph:
offspring_df %>%
## filter only the pregnant females
dplyr::filter(preg_status == "P") %>%
## get their last measurement while pregnant
dplyr::group_by(dlc_id) %>%
dplyr::filter(age_at_wt_y == max(age_at_wt_y)) %>%
ungroup() %>%
ggplot(aes(x = log(weight), y = species))+
geom_point(aes(size = infant_lit_sz_if_preg), alpha = 0.2) +
labs(x = "Weight (log(g))", y = "Species", size = "Infant litter size") +
theme_lemurs()
## Warning: Removed 7 rows containing missing values (geom_point).
### Individual weight and litter size
Is an individual’s weight affected by the size of the litter it was born into? Let’s get each individual’s weight at its youngest recorded age and plot it against the size of the litter it came from, showing males and females separately.
litterweight_df <- offspring_df %>%
dplyr::group_by(dlc_id) %>%
dplyr::filter(age_at_wt_y == min(age_at_wt_y)) %>% ## keep the youngest recorded age of each individual
ungroup()
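The filter above keeps every row tied for the minimum age, so an individual weighed twice at the same age appears twice. A possible alternative, assuming dplyr 1.0.0 or later is installed, is `slice_min()` with `with_ties = FALSE`, which keeps exactly one row per individual. The toy data frame below is made up for illustration.

```r
library(dplyr)

# Toy measurements: individual "A" was weighed twice at the same age
toy_weights <- tibble(
  dlc_id      = c("A", "A", "A", "B", "B"),
  age_at_wt_y = c(0.1, 0.1, 2.0, 0.5, 1.5),
  weight      = c(80, 82, 900, 150, 700)
)

# Keep a single row per individual: the one with the lowest recorded age
first_weights <- toy_weights %>%
  group_by(dlc_id) %>%
  slice_min(age_at_wt_y, n = 1, with_ties = FALSE) %>%
  ungroup()

first_weights
```

Whether ties should be dropped or averaged depends on the question. This sketch only shows how to avoid counting an individual more than once.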
litterweight_df %>%
dplyr::group_by(species) %>%
dplyr::summarize(weight_mean = mean(weight),
weight_sd = sd(weight)) %>%
dplyr::arrange(weight_mean) %>%
kableExtra::kbl(caption = "Infant size (mean +- sd) of the species housed at the Duke Lemur Center, in North Carolina, USA.")%>%
kableExtra::kable_styling(c("striped", "hover")) %>%
kableExtra::scroll_box(width = "100%", height = "300px")
## `summarise()` ungrouping output (override with `.groups` argument)
species | weight_mean | weight_sd |
---|---|---|
Mircocebus murinus | 46.55263 | 40.60969 |
Galago moholi | 56.44510 | 65.37406 |
Loris tardigradus | 88.04091 | 71.92052 |
Nycticebus pygmaeus | 140.01418 | 207.20038 |
Cheirogaleus medius | 140.57398 | 112.29466 |
Mirza coquereli | 224.13714 | 108.32224 |
Hapalemur griseus griseus | 300.79592 | 345.94202 |
Daubentonia madagascariensis | 409.59273 | 779.91112 |
Perodicticus potto | 544.68182 | 476.33900 |
Eulemur flavifrons | 551.80286 | 815.14909 |
Nycticebus coucang | 585.27612 | 511.38793 |
Propithecus coquereli | 611.64595 | 1307.95988 |
Otolemur garnettii garnettii | 624.86850 | 451.39680 |
Eulemur rubriventer | 651.52000 | 855.25426 |
Eulemur mongoz | 756.73619 | 696.07625 |
Lemur catta | 849.66735 | 990.33820 |
Eulemur coronatus | 896.58644 | 693.56773 |
Varecia Varecia | 943.37879 | 1297.88274 |
Eulemur Eulemur | 1014.67419 | 970.02145 |
Eulemur rufus | 1227.22245 | 1014.64928 |
Eulemur macaco | 1238.90137 | 1091.10120 |
Varecia rubra | 1248.20649 | 1327.63775 |
Eulemur collaris | 1424.04828 | 1006.32052 |
Eulemur sanfordi | 1596.78421 | 696.43490 |
Eulemur fulvus | 1617.04054 | 1008.32868 |
Varecia variegata variegata | 1621.11523 | 1567.93244 |
Eulemur albifrons | 1756.97059 | 969.45562 |
litterweight_df %>%
dplyr::group_by(species) %>%
dplyr::summarize(litter_mean = mean(litter_size),
litter_sd = sd(litter_size)) %>%
dplyr::arrange(litter_mean) %>%
kableExtra::kbl(caption = "Litter size (mean +- sd) of the species housed at the Duke Lemur Center, in North Carolina, USA.")%>%
kableExtra::kable_styling(c("striped", "hover")) %>%
kableExtra::scroll_box(width = "100%", height = "300px")
## `summarise()` ungrouping output (override with `.groups` argument)
species | litter_mean | litter_sd |
---|---|---|
Varecia Varecia | 2.333333 | 0.595119 |
Cheirogaleus medius | NA | NA |
Daubentonia madagascariensis | NA | NA |
Eulemur albifrons | NA | NA |
Eulemur collaris | NA | NA |
Eulemur coronatus | NA | NA |
Eulemur Eulemur | NA | NA |
Eulemur flavifrons | NA | NA |
Eulemur fulvus | NA | NA |
Eulemur macaco | NA | NA |
Eulemur mongoz | NA | NA |
Eulemur rubriventer | NA | NA |
Eulemur rufus | NA | NA |
Eulemur sanfordi | NA | NA |
Galago moholi | NA | NA |
Hapalemur griseus griseus | NA | NA |
Lemur catta | NA | NA |
Loris tardigradus | NA | NA |
Mircocebus murinus | NA | NA |
Mirza coquereli | NA | NA |
Nycticebus coucang | NA | NA |
Nycticebus pygmaeus | NA | NA |
Otolemur garnettii garnettii | NA | NA |
Perodicticus potto | NA | NA |
Propithecus coquereli | NA | NA |
Varecia rubra | NA | NA |
Varecia variegata variegata | NA | NA |
litterweight_df %>%
dplyr::group_by(species) %>%
dplyr::summarize(litter_mean = mean(litter_size, na.rm = TRUE),
litter_sd = sd(litter_size, na.rm = TRUE)) %>%
dplyr::arrange(litter_mean) %>%
kableExtra::kbl(caption = "Litter size (mean +- sd) of the species housed at the Duke Lemur Center, in North Carolina, USA.")%>%
kableExtra::kable_styling(c("striped", "hover")) %>%
kableExtra::scroll_box(width = "100%", height = "300px")
## `summarise()` ungrouping output (override with `.groups` argument)
species | litter_mean | litter_sd |
---|---|---|
Daubentonia madagascariensis | 1.000000 | 0.0000000 |
Nycticebus coucang | 1.000000 | 0.0000000 |
Perodicticus potto | 1.000000 | 0.0000000 |
Propithecus coquereli | 1.000000 | 0.0000000 |
Eulemur mongoz | 1.051282 | 0.2220001 |
Loris tardigradus | 1.057143 | 0.2355041 |
Eulemur rubriventer | 1.100000 | 0.3077935 |
Hapalemur griseus griseus | 1.100000 | 0.3038218 |
Eulemur flavifrons | 1.131868 | 0.3402219 |
Eulemur sanfordi | 1.133333 | 0.3518658 |
Eulemur rufus | 1.151163 | 0.3603084 |
Eulemur fulvus | 1.160000 | 0.3741657 |
Otolemur garnettii garnettii | 1.196078 | 0.3989892 |
Eulemur coronatus | 1.282609 | 0.4552432 |
Eulemur Eulemur | 1.288591 | 0.4972261 |
Eulemur collaris | 1.380000 | 0.4903144 |
Galago moholi | 1.543478 | 0.5036102 |
Lemur catta | 1.548673 | 0.5331858 |
Eulemur macaco | 1.573770 | 0.4986320 |
Eulemur albifrons | 1.642857 | 0.4972452 |
Mirza coquereli | 1.818182 | 0.3892495 |
Nycticebus pygmaeus | 1.918367 | 0.4931504 |
Varecia variegata variegata | 2.314516 | 0.8101685 |
Varecia Varecia | 2.333333 | 0.5951190 |
Cheirogaleus medius | 2.567251 | 0.9074094 |
Varecia rubra | 2.602837 | 0.7547888 |
Mircocebus murinus | 2.625000 | 0.8087458 |
litterweight_df %>%
dplyr::filter(sex != "ND") %>%
ggplot(aes(x = log(weight), y = species))+
geom_point(aes(size = litter_size), alpha = 0.2) +
facet_wrap(~sex, ncol = 2) +
labs(x = "Weight (log(g))", y = "Species", size = "Litter size") +
theme_lemurs()
## Warning: Removed 374 rows containing missing values (geom_point).
Let’s try a more summarized version of it, this time differentiating males and females.
litterweight_df %>%
dplyr::filter(sex != "ND") %>%
ggplot(aes(x = log(weight), y = species))+
geom_point(aes(size = litter_size), alpha = 0.2) +
facet_wrap(~sex, ncol = 2) +
labs(x = "Weight (log(g))", y = "Species", size = "Litter size") +
theme_lemurs()
## Warning: Removed 374 rows containing missing values (geom_point).
Try exploring the flights data set, which is included in the nycflights13 package (it is not part of the base R installation).
install.packages("nycflights13")
library(nycflights13)
R version, the OS and attached or loaded packages:
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] nycflights13_1.0.2 lubridate_1.7.9 forcats_0.5.0 stringr_1.4.0
## [5] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
## [9] tibble_3.1.1 ggplot2_3.3.3 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 assertthat_0.2.1 digest_0.6.27 utf8_1.2.1
## [5] R6_2.5.0 cellranger_1.1.0 backports_1.1.10 reprex_0.3.0
## [9] evaluate_0.14 httr_1.4.2 highr_0.9 pillar_1.6.0
## [13] rlang_0.4.11 curl_4.3 readxl_1.3.1 rstudioapi_0.13
## [17] jquerylib_0.1.4 blob_1.2.1 rmarkdown_2.11 labeling_0.4.2
## [21] webshot_0.5.2 munsell_0.5.0 broom_0.7.2 compiler_4.0.3
## [25] modelr_0.1.8 xfun_0.22 pkgconfig_2.0.3 htmltools_0.5.1.1
## [29] tidyselect_1.1.0 fansi_0.4.2 viridisLite_0.4.0 crayon_1.4.1
## [33] dbplyr_1.4.4 withr_2.4.2 grid_4.0.3 jsonlite_1.7.2
## [37] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1
## [41] scales_1.1.1 pals_1.7 cli_2.5.0 stringi_1.5.3
## [45] farver_2.1.0 mapproj_1.2.7 fs_1.5.0 xml2_1.3.2
## [49] bslib_0.2.4 ellipsis_0.3.2 generics_0.0.2 vctrs_0.3.8
## [53] kableExtra_1.3.1 tools_4.0.3 dichromat_2.0-0 glue_1.4.2
## [57] maps_3.4.0 hms_0.5.3 yaml_2.2.1 colorspace_2.0-2
## [61] rvest_0.3.6 knitr_1.33 haven_2.3.1 sass_0.3.1