Global Power Plant Database¶

https://github.com/wri/global-power-plant-database/

Initial Exploration with Power BI¶

Thanks to Timo for the nice introduction and demonstration of Power BI. Power BI is freely available for personal use: https://www.microsoft.com/en-US/download/details.aspx?id=58494

Questions:¶

Generic
- What kind of information is stored in the table?
- How much data is missing?
- Is the dataset clean or are there any clear outliers?
Specific
- Which country has the highest number of powerplants (how many, top 10)?
- Which is the country with the highest power preduction (MW) per continent (how much, top 5)
- What is the overall proportion of each energy source (by plant number, by MW produced)
- Which is the greenest country (highest fraction of power from renewable sources)?
- How does the distribution of power plants look on a worldmap (by type, interactive)?

Rotation¶

Markus
Stefan
Timo

library(tidyverse)

Warning message in system("timedatectl", intern = TRUE):
“running command 'timedatectl' had status 1”
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.2     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

power_plants <- readr::read_csv('https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv')

── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_character(),
  capacity_mw = col_double(),
  latitude = col_double(),
  longitude = col_double(),
  other_fuel3 = col_logical(),
  commissioning_year = col_double(),
  wepp_id = col_double(),
  year_of_capacity_data = col_double(),
  generation_gwh_2013 = col_double(),
  generation_gwh_2014 = col_double(),
  generation_gwh_2015 = col_double(),
  generation_gwh_2016 = col_double(),
  generation_gwh_2017 = col_double(),
  estimated_generation_gwh = col_double()
)
ℹ Use `spec()` for the full column specifications.


Warning message:
“194 parsing failures.
  row     col               expected                    actual                                                                                                                       file
 6041 wepp_id no trailing characters 1071984|1082331           'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
13964 wepp_id no trailing characters 1013223|1085384           'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
13992 wepp_id no trailing characters 1057928|1064452           'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
14567 wepp_id no trailing characters 1024905|1060668           'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
14733 wepp_id no trailing characters 1074142| 1074143| 1030932 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
..... ....... ...................... ......................... ..........................................................................................................................
See problems(...) for more details.
”

What kind of information is in the table?¶

str(power_plants)

spec_tbl_df [33,643 × 25] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ country                 : chr [1:33643] "AFG" "AFG" "AFG" "AFG" ...
 $ country_long            : chr [1:33643] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ name                    : chr [1:33643] "Kajaki Hydroelectric Power Plant Afghanistan" "Kandahar DOG" "Kandahar JOL" "Mahipar Hydroelectric Power Plant Afghanistan" ...
 $ gppd_idnr               : chr [1:33643] "GEODB0040538" "WKS0070144" "WKS0071196" "GEODB0040541" ...
 $ capacity_mw             : num [1:33643] 33 10 10 66 100 ...
 $ latitude                : num [1:33643] 32.3 31.7 31.6 34.6 34.6 ...
 $ longitude               : num [1:33643] 65.1 65.8 65.8 69.5 69.7 ...
 $ primary_fuel            : chr [1:33643] "Hydro" "Solar" "Solar" "Hydro" ...
 $ other_fuel1             : chr [1:33643] NA NA NA NA ...
 $ other_fuel2             : chr [1:33643] NA NA NA NA ...
 $ other_fuel3             : logi [1:33643] NA NA NA NA NA NA ...
 $ commissioning_year      : num [1:33643] NA NA NA NA NA ...
 $ owner                   : chr [1:33643] NA NA NA NA ...
 $ source                  : chr [1:33643] "GEODB" "Wiki-Solar" "Wiki-Solar" "GEODB" ...
 $ url                     : chr [1:33643] "http://globalenergyobservatory.org" "https://www.wiki-solar.org" "https://www.wiki-solar.org" "http://globalenergyobservatory.org" ...
 $ geolocation_source      : chr [1:33643] "GEODB" "Wiki-Solar" "Wiki-Solar" "GEODB" ...
 $ wepp_id                 : num [1:33643] 1009793 NA NA 1009795 1009797 ...
 $ year_of_capacity_data   : num [1:33643] 2017 NA NA 2017 2017 ...
 $ generation_gwh_2013     : num [1:33643] NA NA NA NA NA NA NA NA NA NA ...
 $ generation_gwh_2014     : num [1:33643] NA NA NA NA NA NA NA NA NA NA ...
 $ generation_gwh_2015     : num [1:33643] NA NA NA NA NA NA NA NA NA NA ...
 $ generation_gwh_2016     : num [1:33643] NA NA NA NA NA NA NA NA NA NA ...
 $ generation_gwh_2017     : num [1:33643] NA NA NA NA NA NA NA NA NA NA ...
 $ generation_data_source  : chr [1:33643] NA NA NA NA ...
 $ estimated_generation_gwh: num [1:33643] NA NA NA NA NA ...
 - attr(*, "problems")= tibble [194 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ row     : int [1:194] 6041 13964 13992 14567 14733 14896 14925 16074 16137 16327 ...
  ..$ col     : chr [1:194] "wepp_id" "wepp_id" "wepp_id" "wepp_id" ...
  ..$ expected: chr [1:194] "no trailing characters" "no trailing characters" "no trailing characters" "no trailing characters" ...
  ..$ actual  : chr [1:194] "1071984|1082331" "1013223|1085384" "1057928|1064452" "1024905|1060668" ...
  ..$ file    : chr [1:194] "'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'" "'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'" "'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'" "'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'" ...
 - attr(*, "spec")=
  .. cols(
  ..   country = col_character(),
  ..   country_long = col_character(),
  ..   name = col_character(),
  ..   gppd_idnr = col_character(),
  ..   capacity_mw = col_double(),
  ..   latitude = col_double(),
  ..   longitude = col_double(),
  ..   primary_fuel = col_character(),
  ..   other_fuel1 = col_character(),
  ..   other_fuel2 = col_character(),
  ..   other_fuel3 = col_logical(),
  ..   commissioning_year = col_double(),
  ..   owner = col_character(),
  ..   source = col_character(),
  ..   url = col_character(),
  ..   geolocation_source = col_character(),
  ..   wepp_id = col_double(),
  ..   year_of_capacity_data = col_double(),
  ..   generation_gwh_2013 = col_double(),
  ..   generation_gwh_2014 = col_double(),
  ..   generation_gwh_2015 = col_double(),
  ..   generation_gwh_2016 = col_double(),
  ..   generation_gwh_2017 = col_double(),
  ..   generation_data_source = col_character(),
  ..   estimated_generation_gwh = col_double()
  .. )

Missing Values¶

power_plants %>% select(everything()) %>%
  summarise_all(list(~sum(is.na(.)))) %>%
  pivot_longer(country:estimated_generation_gwh)%>%filter(value<1000)

Outliers? Distribution¶

ggplot(data=power_plants) + geom_density(aes(x=capacity_mw)) + scale_x_log10()

Top power plant and producers (country)¶

power_plants %>% filter(capacity_mw==max(capacity_mw))

power_plants %>% count(country_long, sort=T) %>% head(10)

power_plants %>% group_by(country_long,)  %>% summarize(capacity = sum(capacity_mw))  %>% arrange(-capacity)  %>% head(20)

Include primary fuel type¶

unique(power_plants$primary_fuel)

power_plants %>% 
    group_by(country_long,primary_fuel)  %>% 
    summarize(capacity = sum(capacity_mw), number_powerplants=n()) %>% arrange(-capacity) %>% head

`summarise()` has grouped output by 'country_long'. You can override using the `.groups` argument.

options(repr.plot.res = 250, repr.plot.width = 14)

power_plants %>% 
    mutate(
        country_long = fct_lump(country_long, 9),
        primary_fuel = fct_lump(primary_fuel, 8, w=capacity_mw)
    ) %>%
    group_by(country_long,primary_fuel)  %>% 
    summarize(capacity = sum(capacity_mw), number_powerplants=n()) %>% arrange(-capacity) %>%
    ungroup() %>%
    mutate(country_long = fct_reorder(country_long, capacity, .fun=sum)) %>%
    ggplot(aes(y=country_long, x=capacity, fill=primary_fuel)) + 
       geom_col() +
       scale_fill_viridis_d(option = "F") +
       theme_minimal() +
       labs(y="Country", x = "Capacity") #+ facet_wrap(~primary_fuel, scale="free_x")

`summarise()` has grouped output by 'country_long'. You can override using the `.groups` argument.

power_plants %>% 
    mutate(
        country_long = fct_lump(country_long, 9),
        #primary_fuel = fct_lump(primary_fuel, 8, w=capacity_mw)
    ) %>%
    group_by(country_long,primary_fuel)  %>% 
    summarize(capacity = sum(capacity_mw), number_powerplants=n()) %>% arrange(-capacity) %>%
    ungroup() %>%
    mutate(primary_fuel = fct_reorder(primary_fuel, capacity, .fun=sum)) %>%
    ggplot(aes(y=primary_fuel, x=capacity, fill=primary_fuel)) + 
       geom_col() +
       scale_fill_viridis_d(option = "F", direction = -1) +
       theme_minimal() +
       labs(y="Country", x = "Capacity")

`summarise()` has grouped output by 'country_long'. You can override using the `.groups` argument.

power_plants %>% 
    group_by(country_long,primary_fuel)  %>% 
    summarize(number_powerplants=n()) %>%
    pivot_wider(names_from=primary_fuel, values_from=number_powerplants, values_fill=0) %>%
    identity %>% head

`summarise()` has grouped output by 'country_long'. You can override using the `.groups` argument.

country	country_long	name	gppd_idnr	capacity_mw	latitude	longitude	primary_fuel	other_fuel1	other_fuel2	⋯	geolocation_source	wepp_id	year_of_capacity_data	generation_gwh_2013	generation_gwh_2014	generation_gwh_2015	generation_gwh_2016	generation_gwh_2017	generation_data_source	estimated_generation_gwh
<chr>	<chr>	<chr>	<chr>	<dbl>	<dbl>	<dbl>	<chr>	<chr>	<chr>	⋯	<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<chr>	<dbl>
CHN	China	Three Gorges Dam	WRI1000452	22500	30.8235	111.0032	Hydro	NA	NA	⋯	NA	1012216	NA	NA	NA	NA	NA	NA	NA	92452.57

country_long	n
<chr>	<int>
United States of America	8688
China	4235
United Kingdom	2603
Brazil	2360
France	2155
India	1590
Germany	1309
Canada	1159
Spain	829
Russia	545

country_long	capacity
<chr>	<dbl>
China	1415067.38
United States of America	1193670.75
India	311713.71
Russia	228220.05
Japan	215365.85
Brazil	147589.27
Canada	143578.70
Germany	112040.37
France	110615.93
South Korea	99472.68
United Kingdom	87946.78
Saudi Arabia	84341.55
Spain	79035.07
Italy	73095.06
Australia	68841.97
Mexico	62365.90
Iran	61362.10
Turkey	51444.70
South Africa	50422.70
Indonesia	48751.42

country_long	primary_fuel	capacity	number_powerplants
<chr>	<chr>	<dbl>	<int>
China	Coal	955718.0	946
United States of America	Gas	525687.9	1741
United States of America	Coal	282990.1	338
China	Hydro	259025.6	947
India	Coal	200503.2	254
Russia	Gas	106722.6	258

country_long	Gas	Hydro	Solar	Other	Oil	Wind	Coal	Nuclear	Biomass	Waste	Wave and Tidal	Petcoke	Geothermal	Cogeneration	Storage
<chr>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>	<int>
Afghanistan	1	6	2	0	0	0	0	0	0	0	0	0	0	0	0
Albania	0	7	0	1	0	0	0	0	0	0	0	0	0	0	0
Algeria	31	1	27	0	0	0	0	0	0	0	0	0	0	0	0
Angola	3	5	0	0	6	0	0	0	0	0	0	0	0	0	0
Antarctica	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0
Argentina	57	50	7	2	96	12	9	3	0	0	0	0	0	0	0