German Electricity Data :electric_plug:¶

  • Data Source: SMARD Web Portal
  • Task: forecasting (using the past to predict the future)
  • Language: julia

First steps:¶

  • load the data (with appropriate column types) :heavy_check_mark:
  • split the data into train/val/test sets
  • plot the time series (at different temporal resolution)
  • calculate the mean production per month
  • is there a seasonal pattern (what kinds of seasonalities)
  • make a prediction for the validation set
  • evaluate the quality of the prediction
In [3]:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
In [4]:
set.seed(42)
"A B C D E F" %>%  str_split(" ") %>% unlist %>% sample %>% str_c(collapse=" → ")
Out[4]:
'A → E → F → D → B → C'
In [0]:

1. Data loading¶

  • load the hourly data from the file Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv as a table into R :heavy_check_mark:
  • make sure, that data is interpreted and typed correctly :heavy_check_mark:
  • clean header names :heavy_check_mark:
  • create column Time containing the combined info from Datum and Anfang
In [11]:
data <- read_csv2(
    "../data_dojo_22/Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv",
    col_types=cols(`Datum` = col_date("%d.%m.%Y")),
    na="-"
)
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

In [12]:
colnames(data) <- colnames(data) %>% str_remove(" \\[.*") %>% str_replace(" ","_")
In [13]:
data
Out[13]:
A spec_tbl_df: 73716 × 15
DatumAnfangEndeBiomasseWasserkraftWind_OffshoreWind_OnshorePhotovoltaikSonstige_ErneuerbareKernenergieBraunkohleSteinkohleErdgasPumpspeicherSonstige_Konventionelle
<date><time><time><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
2015-01-0100:00:0001:00:004024.251158.25516.50 8128.00 0.00133.0010710.5015687.253219.751226.251525.754909.25
2015-01-0101:00:0002:00:003982.751188.00516.25 8297.50 0.00122.5011086.2515321.752351.25 870.751079.254932.75
2015-01-0102:00:0003:00:004019.501139.25514.00 8540.00 0.00 93.0011026.2514817.502227.00 809.50 787.005041.75
2015-01-0103:00:0004:00:004040.751122.50517.75 8552.00 0.00 86.5011027.7514075.002339.75 821.00 287.755084.00
2015-01-0104:00:0005:00:004037.751112.00519.75 8643.50 0.00 86.5010962.2514115.002461.50 831.25 346.755070.75
2015-01-0105:00:0006:00:004028.251107.75520.00 8711.75 0.00 86.7510696.0013474.252217.75 851.00 765.505096.75
2015-01-0106:00:0007:00:004013.251111.75521.50 9167.25 0.00 87.0010299.5012403.752373.25 868.25 414.505153.00
2015-01-0107:00:0008:00:004012.751113.75520.25 9811.00 0.00 87.0010035.2512062.502491.00 876.00 582.505161.00
2015-01-0108:00:0009:00:003999.751107.50525.25 9683.00 53.00 87.0010245.7512405.002530.25 888.25 750.505393.50
2015-01-0109:00:0010:00:004016.251121.00527.00 9501.75 773.25 85.7510060.2512798.752386.25 891.50 387.005884.00
2015-01-0110:00:0011:00:004007.751122.00525.0010025.002116.75 82.0010082.2512728.502557.00 888.25 176.256064.00
2015-01-0111:00:0012:00:004011.751146.25527.2510862.503364.25 82.00 9949.0012451.002657.75 876.50 884.505914.75
2015-01-0112:00:0013:00:004014.001139.75527.7511575.254198.25 82.0010119.0012464.502937.75 874.25 705.005778.25
2015-01-0113:00:0014:00:004005.501128.00527.7511977.253500.50 84.5010453.5012417.752810.25 883.50 967.255808.75
2015-01-0114:00:0015:00:004024.251154.75524.7511396.002278.75 87.0010828.2512621.502664.00 890.251176.255842.50
2015-01-0115:00:0016:00:004028.751160.00524.5011326.50 745.50 86.0011069.7513387.252485.25 969.251663.006075.25
2015-01-0116:00:0017:00:004030.751145.75524.5013058.00 50.00 82.0011006.5014679.002668.751085.251227.005946.00
2015-01-0117:00:0018:00:004029.251171.50524.5014911.50 0.00 82.2511057.5014964.252808.751193.751597.755464.50
2015-01-0118:00:0019:00:004027.501168.50524.0016344.00 0.00 86.0010930.2514286.002690.251135.501320.005604.00
2015-01-0119:00:0020:00:004002.751157.75523.5017484.50 0.00 87.7510861.7513498.002490.751123.751535.505866.25
2015-01-0120:00:0021:00:003991.501168.25522.2518659.00 0.00 88.0010343.0012366.002440.001209.251162.755861.25
2015-01-0121:00:0022:00:003990.251186.50521.0019831.50 0.00 88.00 9840.0011284.752409.001205.75 838.756090.75
2015-01-0122:00:0023:00:004000.001170.25519.5020716.00 0.00 88.00 9749.0010428.502298.751324.252066.756084.75
2015-01-0123:00:0000:00:004002.001162.00494.0021589.00 0.00 88.00 9045.75 9861.252209.251212.751436.255989.25
2015-01-0200:00:0001:00:003999.751122.00305.0022166.75 0.00 88.00 8164.75 9035.502225.501172.001301.005978.00
2015-01-0201:00:0002:00:004003.251129.25276.5023158.00 0.00 87.75 7147.75 9013.502188.751127.75 893.506370.75
2015-01-0202:00:0003:00:003997.251124.25282.0023966.00 0.00 87.75 6675.75 9018.501974.501123.25 608.505873.75
2015-01-0203:00:0004:00:004000.001133.25324.7524708.75 0.00 87.75 6466.75 8765.001907.001120.25 203.505858.00
2015-01-0204:00:0005:00:003992.001122.25335.0025386.00 0.00 88.00 6304.00 8398.252098.751129.75 712.506012.75
2015-01-0205:00:0006:00:003981.501100.75388.2525783.75 0.00 88.00 6760.00 8593.002406.001157.50 729.256221.25
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
2023-05-2907:00:0008:00:004489.002074.502402.00 6005.75 9430.50117.0003455.751583.252201.251292.751444.75
2023-05-2908:00:0009:00:004523.252068.752560.00 4323.7517827.00117.0003462.501812.002127.00 935.001448.00
2023-05-2909:00:0010:00:004571.752071.502365.50 3468.7525662.25117.0003450.251726.002052.25 476.001449.75
2023-05-2910:00:0011:00:004579.002072.001576.25 3647.0031928.50117.0003450.751691.752025.00 215.751448.00
2023-05-2911:00:0012:00:004594.752032.251447.50 3519.5035927.00117.0003424.251684.252041.00 225.001445.75
2023-05-2912:00:0013:00:004584.251643.001128.25 3267.2537681.50117.0003176.251464.752029.00 152.001433.75
2023-05-2913:00:0014:00:004545.501598.75 855.00 3471.7537628.50117.0003178.001290.502043.75 161.251439.25
2023-05-2914:00:0015:00:004509.751574.50 697.00 3401.0035503.25117.0003166.251273.002062.25 19.751419.00
2023-05-2915:00:0016:00:004482.501518.251243.25 4111.7531892.50117.0003169.501266.502077.50 2.501386.75
2023-05-2916:00:0017:00:004458.001648.502406.50 7099.5027421.75117.2503174.501282.502055.00 11.251408.25
2023-05-2917:00:0018:00:004449.502067.252649.50 9514.0020541.50125.7503390.751326.002118.50 109.751421.50
2023-05-2918:00:0019:00:004465.002099.004183.5012215.5012323.25126.0003441.501375.002348.75 360.001405.75
2023-05-2919:00:0020:00:004508.252099.504259.7513248.00 5363.50134.2503643.251350.253261.25 815.001384.25
2023-05-2920:00:0021:00:004562.002144.004221.0013197.00 1635.00133.2503861.001323.003647.751620.501174.00
2023-05-2921:00:0022:00:004600.502119.504300.2514307.75 79.00132.7504030.001422.503970.751577.001170.50
2023-05-2922:00:0023:00:004634.252117.504509.2514627.50 0.00123.0003946.751466.003681.751807.751154.50
2023-05-2923:00:0000:00:004651.252066.504647.2514237.50 0.00122.0003713.001368.253729.75 900.251154.25
2023-05-3000:00:0001:00:004629.751968.004278.7513122.75 0.00120.0003735.251313.753710.00 625.251147.50
2023-05-3001:00:0002:00:004581.502033.753891.7511488.50 0.00119.5003523.501288.503984.75 769.251146.00
2023-05-3002:00:0003:00:004551.001963.503725.5010141.00 0.00119.0003484.251263.753592.00 604.251141.75
2023-05-3003:00:0004:00:004503.252090.503635.25 9170.25 0.00119.0003490.001272.504030.50 578.751132.50
2023-05-3004:00:0005:00:004454.252105.253635.75 8735.75 2.25120.0003678.251275.004895.50 433.251136.00
2023-05-3005:00:0006:00:004406.752134.752940.00 8419.75 407.75120.2504087.501289.255723.75 404.251142.25
2023-05-3006:00:0007:00:004413.252227.502272.50 7531.00 2888.75128.2505010.751362.006525.252803.751144.50
2023-05-3007:00:0008:00:004468.002276.501916.25 5507.25 8343.25137.0005275.501173.756905.755712.751156.75
2023-05-3008:00:0009:00:004503.502152.501643.25 3647.7515893.25138.0005181.751196.256946.505079.501132.00
2023-05-3009:00:0010:00:004544.502156.251285.00 2900.5023612.50135.0004707.501114.756384.502122.751115.00
2023-05-3010:00:0011:00:004568.252085.251241.25 2729.5030341.75121.2503978.501095.504802.501279.001096.25
2023-05-3011:00:0012:00:004618.752068.251053.25 3115.0035196.25119.0003759.751151.503762.25 774.001096.50
2023-05-3012:00:0013:00:004612.251967.25 781.25 3496.2537030.00119.0003765.001296.753240.00 196.751099.75

Todo 2nd session¶

  • Create Time column
  • Split into training/validation/test set
  • Plot the training set with hourly resolution
  • Aggregate by day/month
  • Plot aggregated data
Bonus task
  • did you observe something peculiar with dates and times?
  • look at the hourly data of 2017-10-29 for an example
  • what happened there?
  • are there more days with a similar "problem"?
  • is this a problem?
  • how can we solve this?
In [14]:
data <- data %>% 
        mutate(year = Datum) %>%
        unite("Time", Datum:Anfang)
In [15]:
head(data)
Out[15]:
A tibble: 6 × 15
TimeEndeBiomasseWasserkraftWind_OffshoreWind_OnshorePhotovoltaikSonstige_ErneuerbareKernenergieBraunkohleSteinkohleErdgasPumpspeicherSonstige_Konventionelleyear
<chr><time><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><date>
2015-01-01_00:00:0001:00:004024.251158.25516.508128.000133.0010710.5015687.253219.751226.251525.754909.252015-01-01
2015-01-01_01:00:0002:00:003982.751188.00516.258297.500122.5011086.2515321.752351.25 870.751079.254932.752015-01-01
2015-01-01_02:00:0003:00:004019.501139.25514.008540.000 93.0011026.2514817.502227.00 809.50 787.005041.752015-01-01
2015-01-01_03:00:0004:00:004040.751122.50517.758552.000 86.5011027.7514075.002339.75 821.00 287.755084.002015-01-01
2015-01-01_04:00:0005:00:004037.751112.00519.758643.500 86.5010962.2514115.002461.50 831.25 346.755070.752015-01-01
2015-01-01_05:00:0006:00:004028.251107.75520.008711.750 86.7510696.0013474.252217.75 851.00 765.505096.752015-01-01
In [10]:
nrow(data)
Out[10]:
73716
In [17]:
training_data <- data %>%
    filter(between(year, as.Date("2015-01-01"), as.Date("2021-05-30")))

validation_data <- data %>%
    filter(between(year, as.Date("2021-05-31"), as.Date("2022-05-30")))

test_data <- data %>%
    filter(between(year, as.Date("2022-05-31"), as.Date("2023-05-30")))
In [19]:
tail(training_data)
Out[19]:
A tibble: 6 × 15
TimeEndeBiomasseWasserkraftWind_OffshoreWind_OnshorePhotovoltaikSonstige_ErneuerbareKernenergieBraunkohleSteinkohleErdgasPumpspeicherSonstige_Konventionelleyear
<chr><time><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><date>
2021-05-30_18:00:0019:00:004599.251955.50203.505483.7511433.50148.007842.757016.252075.253636.00 431.01369.502021-05-30
2021-05-30_19:00:0020:00:004644.252012.50228.005009.25 4841.75148.007901.258593.752258.754068.001388.51350.502021-05-30
2021-05-30_20:00:0021:00:004677.501986.75286.504443.50 1270.25148.757903.009298.502327.504063.251868.51339.002021-05-30
2021-05-30_21:00:0022:00:004640.502015.50240.505177.00 58.50150.257904.259402.502355.253998.752516.01356.502021-05-30
2021-05-30_22:00:0023:00:004572.501965.50224.756339.75 0.00151.007911.509450.002353.253947.251863.51355.752021-05-30
2021-05-30_23:00:0000:00:004512.751933.00209.507019.25 0.00151.757898.009474.252257.753840.25 824.51355.752021-05-30
In [0]:

In [22]:

Warning message in xy.coords(x, y, xlabel, ylabel, log):
“NAs introduced by coercion”
Warning message in min(x):
“no non-missing arguments to min; returning Inf”
Warning message in max(x):
“no non-missing arguments to max; returning -Inf”
WARNING: Some output was deleted.
In [30]:
plot(training_data$Photovoltaik,type="l",col="yellow")
lines(training_data$Wind_Offshore,type="l",col="gray")
lines(training_data$Wasserkraft,type="l",col="blue")
Out[30]:
In [26]:
plot(training_data$Wasserkraft,type="l",col="blue")
Out[26]:
In [50]:
aggregate(training_data$Wasserkraft, list(training_data$year), FUN=mean)
Out[50]:
A data.frame: 2342 × 2
Group.1x
<date><dbl>
2015-01-011144.292
2015-01-021128.656
2015-01-031297.323
2015-01-041758.844
2015-01-051587.188
2015-01-061506.375
2015-01-071477.604
2015-01-081493.302
2015-01-091520.146
2015-01-101834.219
2015-01-111797.198
2015-01-121749.708
2015-01-131666.719
2015-01-141630.625
2015-01-151651.906
2015-01-161632.135
2015-01-171647.198
2015-01-181607.688
2015-01-191627.823
2015-01-201716.552
2015-01-211752.917
2015-01-221698.396
2015-01-231642.250
2015-01-241536.698
2015-01-251388.698
2015-01-261375.635
2015-01-271453.750
2015-01-281459.240
2015-01-291483.333
2015-01-301581.062
⋮⋮
2021-05-011746.969
2021-05-021842.750
2021-05-031841.500
2021-05-041733.823
2021-05-051767.896
2021-05-061917.021
2021-05-071931.146
2021-05-081826.156
2021-05-091583.917
2021-05-101817.938
2021-05-112202.385
2021-05-122322.167
2021-05-132352.062
2021-05-142277.323
2021-05-152200.688
2021-05-162014.333
2021-05-172169.760
2021-05-182234.156
2021-05-192207.667
2021-05-202212.958
2021-05-211997.812
2021-05-221814.167
2021-05-232015.958
2021-05-242045.812
2021-05-252265.625
2021-05-262267.792
2021-05-272249.958
2021-05-282305.885
2021-05-292152.240
2021-05-301950.156
In [40]:
tData_long <- training_data %>%
   gather(key = "variable", value = "value", -Time, -year,- Ende)
In [41]:
head(tData_long)
Out[41]:
A tibble: 6 × 5
TimeEndeyearvariablevalue
<chr><time><date><chr><dbl>
2015-01-01_00:00:0001:00:002015-01-01Biomasse4024.25
2015-01-01_01:00:0002:00:002015-01-01Biomasse3982.75
2015-01-01_02:00:0003:00:002015-01-01Biomasse4019.50
2015-01-01_03:00:0004:00:002015-01-01Biomasse4040.75
2015-01-01_04:00:0005:00:002015-01-01Biomasse4037.75
2015-01-01_05:00:0006:00:002015-01-01Biomasse4028.25
In [42]:
tail(tData_long)
Out[42]:
A tibble: 6 × 5
TimeEndeyearvariablevalue
<chr><time><date><chr><dbl>
2021-05-30_18:00:0019:00:002021-05-30Sonstige_Konventionelle1369.50
2021-05-30_19:00:0020:00:002021-05-30Sonstige_Konventionelle1350.50
2021-05-30_20:00:0021:00:002021-05-30Sonstige_Konventionelle1339.00
2021-05-30_21:00:0022:00:002021-05-30Sonstige_Konventionelle1356.50
2021-05-30_22:00:0023:00:002021-05-30Sonstige_Konventionelle1355.75
2021-05-30_23:00:0000:00:002021-05-30Sonstige_Konventionelle1355.75
In [51]:
summary_df <- tData_long %>%
  group_by(year,variable) %>%
  summarize(Total_Value = sum(value))
head(summary_df)
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Out[51]:
A grouped_df: 6 × 3
yearvariableTotal_Value
<date><chr><dbl>
2015-01-01Biomasse 96341.25
2015-01-01Braunkohle 314599.00
2015-01-01Erdgas 23806.00
2015-01-01Kernenergie 251485.25
2015-01-01Photovoltaik 17080.25
2015-01-01Pumpspeicher 23683.50
In [53]:
summary_df %>%
  ggplot( aes(x=year, y=Total_Value, group=variable, color=variable)) +
    geom_point()
Out[53]:
In [44]:
tData_long %>%
  ggplot( aes(x=year, y=value, group=variable, color=variable)) +
    geom_point()
Out[44]:
In [45]:
tData_longSum=summarise(tData_long, variable_name=year)
Warning message:
“Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.”
In [48]:
tData_long_sum = aggregate(tData_long$value, by=list(tData_long$year), sum)
In [49]:
tData_long_sum
Out[49]:
A data.frame: 2342 × 2
Group.1x
<date><dbl>
2015-01-011262749
2015-01-021499551
2015-01-031405097
2015-01-041320111
2015-01-051509716
2015-01-061512748
2015-01-071647952
2015-01-081730101
2015-01-091730532
2015-01-101511796
2015-01-111440382
2015-01-121568389
2015-01-131578018
2015-01-141539344
2015-01-151737290
2015-01-161619432
2015-01-171446805
2015-01-181408074
2015-01-191550405
2015-01-201588490
2015-01-211614505
2015-01-221623537
2015-01-231574030
2015-01-241497368
2015-01-251404334
2015-01-261678994
2015-01-271708761
2015-01-281758790
2015-01-291734302
2015-01-301571701
⋮⋮
2021-05-011005633.2
2021-05-021145334.5
2021-05-031366224.2
2021-05-041682637.2
2021-05-051642942.8
2021-05-061417251.0
2021-05-071420219.2
2021-05-081207325.2
2021-05-091284934.5
2021-05-101324527.2
2021-05-111222800.0
2021-05-121176296.5
2021-05-13 945800.8
2021-05-141039090.8
2021-05-151022135.2
2021-05-161019434.8
2021-05-171174479.0
2021-05-181212419.8
2021-05-191207679.0
2021-05-201233783.8
2021-05-211436713.2
2021-05-221465029.8
2021-05-231209283.8
2021-05-241239453.0
2021-05-251429122.8
2021-05-261484295.8
2021-05-271502677.5
2021-05-281319048.5
2021-05-291142207.5
2021-05-301077126.8