German Electricity Data :electric_plug:¶

  • Data Source: SMARD Web Portal
  • Task: forecasting (using the past to predict the future)
  • Language: R

First steps:¶

  • load the data (with appropriate column types) :heavy_check_mark:
  • split the data into train/val/test sets :heavy_check_mark:
  • plot the time series (at different temporal resolution) :heavy_check_mark:
  • calculate the mean production per month :heavy_check_mark:
  • is there a seasonal pattern (what kinds of seasonalities)
  • make a prediction for the validation set
  • evaluate the quality of the prediction
In [1]:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
In [2]:
theme_set(theme_light())
In [3]:
set.seed(42)
"Oliver Stefan Markus" %>%  str_split(" ") %>% unlist %>% sample %>% str_c(collapse=" → ")
Out[3]:
'Oliver → Markus → Stefan'

1. Data loading¶

  • load the hourly data from the file Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv as a table into R :heavy_check_mark:
  • make sure, that data is interpreted and typed correctly :heavy_check_mark:
  • clean header names :heavy_check_mark:
  • create column Time containing the combined info from Datum and Anfang
In [4]:
data <- read_csv2(
    "../data_dojo_22/Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv",
    col_types=cols(`Datum` = col_date("%d.%m.%Y")),
    na="-"
)
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

In [5]:
colnames(data) <- colnames(data) %>% str_remove(" \\[.*") %>% str_replace(" ","_")

2nd session¶

  • Create Time column
  • Split into training/validation/test set
  • Plot the training set with hourly resolution
  • Aggregate by day/month
  • Plot aggregated data
Bonus task
  • did you observe something peculiar with dates and times?
  • look at the hourly data of 2017-10-29 for an example
  • what happened there?
  • are there more days with a similar "problem"?
  • is this a problem?
  • how can we solve this?
In [6]:
data <- data %>%
    mutate(Time=Datum+hours(hour(Anfang)))
In [7]:
data
Out[7]:
A tibble: 73716 × 16
DatumAnfangEndeBiomasseWasserkraftWind_OffshoreWind_OnshorePhotovoltaikSonstige_ErneuerbareKernenergieBraunkohleSteinkohleErdgasPumpspeicherSonstige_KonventionelleTime
<date><time><time><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dttm>
2015-01-0100:00:0001:00:004024.251158.25516.50 8128.00 0.00133.0010710.5015687.253219.751226.251525.754909.252015-01-01 00:00:00
2015-01-0101:00:0002:00:003982.751188.00516.25 8297.50 0.00122.5011086.2515321.752351.25 870.751079.254932.752015-01-01 01:00:00
2015-01-0102:00:0003:00:004019.501139.25514.00 8540.00 0.00 93.0011026.2514817.502227.00 809.50 787.005041.752015-01-01 02:00:00
2015-01-0103:00:0004:00:004040.751122.50517.75 8552.00 0.00 86.5011027.7514075.002339.75 821.00 287.755084.002015-01-01 03:00:00
2015-01-0104:00:0005:00:004037.751112.00519.75 8643.50 0.00 86.5010962.2514115.002461.50 831.25 346.755070.752015-01-01 04:00:00
2015-01-0105:00:0006:00:004028.251107.75520.00 8711.75 0.00 86.7510696.0013474.252217.75 851.00 765.505096.752015-01-01 05:00:00
2015-01-0106:00:0007:00:004013.251111.75521.50 9167.25 0.00 87.0010299.5012403.752373.25 868.25 414.505153.002015-01-01 06:00:00
2015-01-0107:00:0008:00:004012.751113.75520.25 9811.00 0.00 87.0010035.2512062.502491.00 876.00 582.505161.002015-01-01 07:00:00
2015-01-0108:00:0009:00:003999.751107.50525.25 9683.00 53.00 87.0010245.7512405.002530.25 888.25 750.505393.502015-01-01 08:00:00
2015-01-0109:00:0010:00:004016.251121.00527.00 9501.75 773.25 85.7510060.2512798.752386.25 891.50 387.005884.002015-01-01 09:00:00
2015-01-0110:00:0011:00:004007.751122.00525.0010025.002116.75 82.0010082.2512728.502557.00 888.25 176.256064.002015-01-01 10:00:00
2015-01-0111:00:0012:00:004011.751146.25527.2510862.503364.25 82.00 9949.0012451.002657.75 876.50 884.505914.752015-01-01 11:00:00
2015-01-0112:00:0013:00:004014.001139.75527.7511575.254198.25 82.0010119.0012464.502937.75 874.25 705.005778.252015-01-01 12:00:00
2015-01-0113:00:0014:00:004005.501128.00527.7511977.253500.50 84.5010453.5012417.752810.25 883.50 967.255808.752015-01-01 13:00:00
2015-01-0114:00:0015:00:004024.251154.75524.7511396.002278.75 87.0010828.2512621.502664.00 890.251176.255842.502015-01-01 14:00:00
2015-01-0115:00:0016:00:004028.751160.00524.5011326.50 745.50 86.0011069.7513387.252485.25 969.251663.006075.252015-01-01 15:00:00
2015-01-0116:00:0017:00:004030.751145.75524.5013058.00 50.00 82.0011006.5014679.002668.751085.251227.005946.002015-01-01 16:00:00
2015-01-0117:00:0018:00:004029.251171.50524.5014911.50 0.00 82.2511057.5014964.252808.751193.751597.755464.502015-01-01 17:00:00
2015-01-0118:00:0019:00:004027.501168.50524.0016344.00 0.00 86.0010930.2514286.002690.251135.501320.005604.002015-01-01 18:00:00
2015-01-0119:00:0020:00:004002.751157.75523.5017484.50 0.00 87.7510861.7513498.002490.751123.751535.505866.252015-01-01 19:00:00
2015-01-0120:00:0021:00:003991.501168.25522.2518659.00 0.00 88.0010343.0012366.002440.001209.251162.755861.252015-01-01 20:00:00
2015-01-0121:00:0022:00:003990.251186.50521.0019831.50 0.00 88.00 9840.0011284.752409.001205.75 838.756090.752015-01-01 21:00:00
2015-01-0122:00:0023:00:004000.001170.25519.5020716.00 0.00 88.00 9749.0010428.502298.751324.252066.756084.752015-01-01 22:00:00
2015-01-0123:00:0000:00:004002.001162.00494.0021589.00 0.00 88.00 9045.75 9861.252209.251212.751436.255989.252015-01-01 23:00:00
2015-01-0200:00:0001:00:003999.751122.00305.0022166.75 0.00 88.00 8164.75 9035.502225.501172.001301.005978.002015-01-02 00:00:00
2015-01-0201:00:0002:00:004003.251129.25276.5023158.00 0.00 87.75 7147.75 9013.502188.751127.75 893.506370.752015-01-02 01:00:00
2015-01-0202:00:0003:00:003997.251124.25282.0023966.00 0.00 87.75 6675.75 9018.501974.501123.25 608.505873.752015-01-02 02:00:00
2015-01-0203:00:0004:00:004000.001133.25324.7524708.75 0.00 87.75 6466.75 8765.001907.001120.25 203.505858.002015-01-02 03:00:00
2015-01-0204:00:0005:00:003992.001122.25335.0025386.00 0.00 88.00 6304.00 8398.252098.751129.75 712.506012.752015-01-02 04:00:00
2015-01-0205:00:0006:00:003981.501100.75388.2525783.75 0.00 88.00 6760.00 8593.002406.001157.50 729.256221.252015-01-02 05:00:00
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
2023-05-2907:00:0008:00:004489.002074.502402.00 6005.75 9430.50117.0003455.751583.252201.251292.751444.752023-05-29 07:00:00
2023-05-2908:00:0009:00:004523.252068.752560.00 4323.7517827.00117.0003462.501812.002127.00 935.001448.002023-05-29 08:00:00
2023-05-2909:00:0010:00:004571.752071.502365.50 3468.7525662.25117.0003450.251726.002052.25 476.001449.752023-05-29 09:00:00
2023-05-2910:00:0011:00:004579.002072.001576.25 3647.0031928.50117.0003450.751691.752025.00 215.751448.002023-05-29 10:00:00
2023-05-2911:00:0012:00:004594.752032.251447.50 3519.5035927.00117.0003424.251684.252041.00 225.001445.752023-05-29 11:00:00
2023-05-2912:00:0013:00:004584.251643.001128.25 3267.2537681.50117.0003176.251464.752029.00 152.001433.752023-05-29 12:00:00
2023-05-2913:00:0014:00:004545.501598.75 855.00 3471.7537628.50117.0003178.001290.502043.75 161.251439.252023-05-29 13:00:00
2023-05-2914:00:0015:00:004509.751574.50 697.00 3401.0035503.25117.0003166.251273.002062.25 19.751419.002023-05-29 14:00:00
2023-05-2915:00:0016:00:004482.501518.251243.25 4111.7531892.50117.0003169.501266.502077.50 2.501386.752023-05-29 15:00:00
2023-05-2916:00:0017:00:004458.001648.502406.50 7099.5027421.75117.2503174.501282.502055.00 11.251408.252023-05-29 16:00:00
2023-05-2917:00:0018:00:004449.502067.252649.50 9514.0020541.50125.7503390.751326.002118.50 109.751421.502023-05-29 17:00:00
2023-05-2918:00:0019:00:004465.002099.004183.5012215.5012323.25126.0003441.501375.002348.75 360.001405.752023-05-29 18:00:00
2023-05-2919:00:0020:00:004508.252099.504259.7513248.00 5363.50134.2503643.251350.253261.25 815.001384.252023-05-29 19:00:00
2023-05-2920:00:0021:00:004562.002144.004221.0013197.00 1635.00133.2503861.001323.003647.751620.501174.002023-05-29 20:00:00
2023-05-2921:00:0022:00:004600.502119.504300.2514307.75 79.00132.7504030.001422.503970.751577.001170.502023-05-29 21:00:00
2023-05-2922:00:0023:00:004634.252117.504509.2514627.50 0.00123.0003946.751466.003681.751807.751154.502023-05-29 22:00:00
2023-05-2923:00:0000:00:004651.252066.504647.2514237.50 0.00122.0003713.001368.253729.75 900.251154.252023-05-29 23:00:00
2023-05-3000:00:0001:00:004629.751968.004278.7513122.75 0.00120.0003735.251313.753710.00 625.251147.502023-05-30 00:00:00
2023-05-3001:00:0002:00:004581.502033.753891.7511488.50 0.00119.5003523.501288.503984.75 769.251146.002023-05-30 01:00:00
2023-05-3002:00:0003:00:004551.001963.503725.5010141.00 0.00119.0003484.251263.753592.00 604.251141.752023-05-30 02:00:00
2023-05-3003:00:0004:00:004503.252090.503635.25 9170.25 0.00119.0003490.001272.504030.50 578.751132.502023-05-30 03:00:00
2023-05-3004:00:0005:00:004454.252105.253635.75 8735.75 2.25120.0003678.251275.004895.50 433.251136.002023-05-30 04:00:00
2023-05-3005:00:0006:00:004406.752134.752940.00 8419.75 407.75120.2504087.501289.255723.75 404.251142.252023-05-30 05:00:00
2023-05-3006:00:0007:00:004413.252227.502272.50 7531.00 2888.75128.2505010.751362.006525.252803.751144.502023-05-30 06:00:00
2023-05-3007:00:0008:00:004468.002276.501916.25 5507.25 8343.25137.0005275.501173.756905.755712.751156.752023-05-30 07:00:00
2023-05-3008:00:0009:00:004503.502152.501643.25 3647.7515893.25138.0005181.751196.256946.505079.501132.002023-05-30 08:00:00
2023-05-3009:00:0010:00:004544.502156.251285.00 2900.5023612.50135.0004707.501114.756384.502122.751115.002023-05-30 09:00:00
2023-05-3010:00:0011:00:004568.252085.251241.25 2729.5030341.75121.2503978.501095.504802.501279.001096.252023-05-30 10:00:00
2023-05-3011:00:0012:00:004618.752068.251053.25 3115.0035196.25119.0003759.751151.503762.25 774.001096.502023-05-30 11:00:00
2023-05-3012:00:0013:00:004612.251967.25 781.25 3496.2537030.00119.0003765.001296.753240.00 196.751099.752023-05-30 12:00:00
In [0]:

In [8]:
data %>% count(year(Time))
Out[8]:
A tibble: 9 × 2
year(Time)n
<dbl><int>
20158760
20168784
20178760
20188760
20198760
20208784
20218760
20228760
20233588
In [9]:
train <- data %>% filter(year(Time)<2021)
In [10]:
val <- data %>% filter(year(Time)==2021)
In [11]:
test <- data %>% filter(year(Time)>2021)
In [12]:
dim(train)
Out[12]:
  1. 52608
  2. 16
In [13]:
options(repr.plot.res = 200)
In [14]:
tidydf <- train %>%
    pivot_longer(names_to='Type', values_to='MWh', Biomasse:Sonstige_Konventionelle) %>%
    select(-c(Datum, Anfang, Ende))
In [15]:
ggplot(tidydf)+
    geom_line(aes(Time, MWh, color = Type)) + facet_wrap(Type~.)
Out[15]:
In [16]:
tidydf %>% 
            group_by(date = floor_date(Time, unit = "day"),Type) %>% 
            summarize(MWh = mean(MWh)) %>%
            ggplot() + geom_line(aes(date,MWh,color = Type)) + facet_wrap(~Type)
`summarise()` has grouped output by 'date'. You can override using the
`.groups` argument.
Out[16]:
In [17]:
tidydf %>% 
            group_by(date = floor_date(Time, unit = "month"),Type) %>%
            summarize(MWh = mean(MWh)) %>%
            ggplot() + geom_line(aes(date,MWh,color = Type)) + facet_wrap(~Type)
            #identity
`summarise()` has grouped output by 'date'. You can override using the
`.groups` argument.
Out[17]:

Todos 3rd Session¶

  • Select one of the types (e.g. Photovoltaik, Erdgas or Wind_Onshore)
  • What kinds of seasonality are detectable?
  • Is there a general trend on top of the seasonal pattern(s)?
  • Can you use this information to manually craft a prediction model?

Bonus:

  • How accurately does your manually created model predict the values in the validation set?
In [20]:
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    ggplot(aes(x=Time, y=MWh)) +
        geom_line()
Out[20]:

Inspect yearly pattern¶

In [26]:
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    mutate(year = year(Time), date_only = format(Time, format="%m-%d")) %>%
    ggplot(aes(x=date_only, y=MWh, color=year)) +
        geom_line(alpha=.4)
Out[26]:

Inspect daily pattern¶

In [33]:
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    mutate(year = year(Time), month = as.factor(month(Time)), day = day(Time), hour = hour(Time)) %>%
    filter(year == 2015, day == 1) %>%
    ggplot(aes(x=hour, y=MWh, color=month)) +
        geom_line()
Out[33]:
In [36]:
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    mutate(year = year(Time), month = as.factor(month(Time)), day = day(Time), hour = hour(Time)) %>%
    mutate(summertime = month(Time) > 3 & month(Time) < 11) %>%
    filter(year == 2015, day == 1) %>%
    ggplot(aes(x=hour, y=MWh, color=month)) +
        geom_line(aes(lty = summertime))
Out[36]:
In [71]:
timeseries <-
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    group_by(year(Time), month(Time)) %>%
    summarize(mean_year = mean(MWh)) %>%
    select(mean_year)
#timeseries
    timeseries <- ts(timeseries$mean_year, start = 2015, frequency = 12)
    timeseries_df <- decompose(timeseries)
#plot(timeseries)

plot(timeseries_df)
`summarise()` has grouped output by 'year(Time)'. You can override using the
`.groups` argument.
Adding missing grouping variables: `year(Time)`
Out[71]:
In [73]:
timeseries <-
tidydf %>%
    filter(Type == "Photovoltaik") %>%
    group_by(year(Time), month(Time), day(Time)) %>%
    summarize(mean_day = mean(MWh)) %>%
    select(mean_day)
#timeseries
    timeseries <- ts(timeseries$mean_day, start = 2015, frequency = 365)
    timeseries_df <- decompose(timeseries)
#plot(timeseries)

plot(timeseries_df)
`summarise()` has grouped output by 'year(Time)', 'month(Time)'. You can
override using the `.groups` argument.
Adding missing grouping variables: `year(Time)`, `month(Time)`
Out[73]: