German Electricity Data :electric_plug:¶

  • Data Source: SMARD Web Portal
  • Task: forecasting (using the past to predict the future)
  • Language: julia

First steps:¶

  • load the data (with appropriate column types)
  • split the data into train/val/test sets
  • plot the time series (at different temporal resolution)
  • calculate the mean production per month
  • is there a seasonal pattern (what kinds of seasonalities)
  • make a prediction for the validation set
  • evaluate the quality of the prediction
In [1]:
using Random
In [2]:
Random.seed!(530)
"Max Joel Felix Caro Kerstin Mike Markus Chris" |> split |> shuffle |> x -> join(x," -> ")
Out[2]:
"Max -> Joel -> Markus -> Chris -> Kerstin -> Felix -> Caro -> Mike"

1. Data loading¶

  • load the hourly data from the file Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv as a table into julia
  • make sure, that data is interpreted and typed correctly
  • clean header names
In [3]:
using DataFrames
using CSV
In [73]:
data = DataFrame(CSV.File("Realisierte_Erzeugung_201501010000_202305301300_Stunde.csv", delim=";", dateformat="dd.mm.yyyy"));
In [74]:
head(data)
Out[74]:

6 rows × 15 columns (omitted printing of 10 columns)

DatumAnfangEndeBiomasse [MWh] Berechnete AuflösungenWasserkraft [MWh] Berechnete Auflösungen
Date…StringStringStringString
12015-01-0100:0001:004.024,251.158,25
22015-01-0101:0002:003.982,751.188
32015-01-0102:0003:004.019,51.139,25
42015-01-0103:0004:004.040,751.122,5
52015-01-0104:0005:004.037,751.112
62015-01-0105:0006:004.028,251.107,75
In [11]:
data["Anfang"]
Out[11]:
73716-element Vector{Dates.Time}:
 00:00:00
 01:00:00
 02:00:00
 03:00:00
 04:00:00
 05:00:00
 06:00:00
 07:00:00
 08:00:00
 09:00:00
 10:00:00
 11:00:00
 12:00:00
 ⋮
 01:00:00
 02:00:00
 03:00:00
 04:00:00
 05:00:00
 06:00:00
 07:00:00
 08:00:00
 09:00:00
 10:00:00
 11:00:00
 12:00:00
In [13]:
data["Biomasse [MWh] Berechnete Auflösungen"]
Out[13]:
73716-element Vector{String}:
 "4.024,25"
 "3.982,75"
 "4.019,5"
 "4.040,75"
 "4.037,75"
 "4.028,25"
 "4.013,25"
 "4.012,75"
 "3.999,75"
 "4.016,25"
 "4.007,75"
 "4.011,75"
 "4.014"
 ⋮
 "4.581,5"
 "4.551"
 "4.503,25"
 "4.454,25"
 "4.406,75"
 "4.413,25"
 "4.468"
 "4.503,5"
 "4.544,5"
 "4.568,25"
 "4.618,75"
 "4.612,25"
In [16]:
parse.(Float64, data["Biomasse [MWh] Berechnete Auflösungen"])
ArgumentError: cannot parse "4.024,25" as Float64

Stacktrace:
  [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
    @ Base ./parse.jl:373
  [2] #tryparse_internal#478
    @ ./parse.jl:369 [inlined]
  [3] tryparse_internal
    @ ./parse.jl:366 [inlined]
  [4] #parse#479
    @ ./parse.jl:379 [inlined]
  [5] parse
    @ ./parse.jl:379 [inlined]
  [6] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
  [7] _broadcast_getindex
    @ ./broadcast.jl:653 [inlined]
  [8] getindex
    @ ./broadcast.jl:597 [inlined]
  [9] macro expansion
    @ ./broadcast.jl:961 [inlined]
 [10] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [11] copyto!
    @ ./broadcast.jl:960 [inlined]
 [12] copyto!
    @ ./broadcast.jl:913 [inlined]
 [13] copy
    @ ./broadcast.jl:885 [inlined]
 [14] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Float64}}, Vector{String}}})
    @ Base.Broadcast ./broadcast.jl:860
 [15] top-level scope
    @ In[16]:1
In [23]:
# replace . by "" and , by "."
parse.(Float64, replace.(replace.(data["Biomasse [MWh] Berechnete Auflösungen"], Pair(".","")), Pair(",",".")))
ArgumentError: cannot parse "-" as Float64

Stacktrace:
  [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
    @ Base ./parse.jl:373
  [2] #tryparse_internal#478
    @ ./parse.jl:369 [inlined]
  [3] tryparse_internal
    @ ./parse.jl:366 [inlined]
  [4] #parse#479
    @ ./parse.jl:379 [inlined]
  [5] parse
    @ ./parse.jl:379 [inlined]
  [6] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
  [7] _broadcast_getindex
    @ ./broadcast.jl:653 [inlined]
  [8] getindex
    @ ./broadcast.jl:597 [inlined]
  [9] macro expansion
    @ ./broadcast.jl:961 [inlined]
 [10] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [11] copyto!
    @ ./broadcast.jl:960 [inlined]
 [12] copyto!
    @ ./broadcast.jl:913 [inlined]
 [13] copy
    @ ./broadcast.jl:885 [inlined]
 [14] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Float64}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(replace), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(replace), Tuple{Vector{String}, Base.RefValue{Pair{String, String}}}}, Base.RefValue{Pair{String, String}}}}}})
    @ Base.Broadcast ./broadcast.jl:860
 [15] top-level scope
    @ In[23]:1
In [30]:
#parse.(Float64, data["Biomasse [MWh] Berechnete Auflösungen"], Pair("-",) )
#data["Biomasse [MWh] Berechnete Auflösungen"]
#data["Biomasse [MWh] Berechnete Auflösungen"][occursin.("-", data["Biomasse [MWh] Berechnete Auflösungen"])] = None

replace.(data["Biomasse [MWh] Berechnete Auflösungen"] , r"^-$" => s"")
Out[30]:
73716-element Vector{String}:
 "4.024,25"
 "3.982,75"
 "4.019,5"
 "4.040,75"
 "4.037,75"
 "4.028,25"
 "4.013,25"
 "4.012,75"
 "3.999,75"
 "4.016,25"
 "4.007,75"
 "4.011,75"
 "4.014"
 ⋮
 "4.581,5"
 "4.551"
 "4.503,25"
 "4.454,25"
 "4.406,75"
 "4.413,25"
 "4.468"
 "4.503,5"
 "4.544,5"
 "4.568,25"
 "4.618,75"
 "4.612,25"
In [31]:
names(data)
Out[31]:
15-element Vector{String}:
 "Datum"
 "Anfang"
 "Ende"
 "Biomasse [MWh] Berechnete Auflösungen"
 "Wasserkraft [MWh] Berechnete Auflösungen"
 "Wind Offshore [MWh] Berechnete Auflösungen"
 "Wind Onshore [MWh] Berechnete Auflösungen"
 "Photovoltaik [MWh] Berechnete Auflösungen"
 "Sonstige Erneuerbare [MWh] Berechnete Auflösungen"
 "Kernenergie [MWh] Berechnete Auflösungen"
 "Braunkohle [MWh] Berechnete Auflösungen"
 "Steinkohle [MWh] Berechnete Auflösungen"
 "Erdgas [MWh] Berechnete Auflösungen"
 "Pumpspeicher [MWh] Berechnete Auflösungen"
 "Sonstige Konventionelle [MWh] Berechnete Auflösungen"
In [50]:
for column in names(data)[4:end]
    data[column] = replace.(data[column] , r"^-$" => -999)
    data[column] = parse.(Float64, replace.(replace.(data[column], Pair(".","")), Pair(",",".")))
    data[column] = replace.(data[column] , -999 => NaN)
end
MethodError: no method matching similar(::Float64, ::Type{Float64})
Closest candidates are:
  similar(::Union{LinearAlgebra.Adjoint{T, var"#s886"}, LinearAlgebra.Transpose{T, var"#s886"}} where {T, var"#s886"<:(AbstractVector)}, ::Type{T}) where T at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:207
  similar(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}, ::Type{T}) where T at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:211
  similar(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}, ::Type{T}, ::Tuple{Vararg{Int64, N}}) where {T, N} at /ext/julia/julia-1.8.4/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:212
  ...

Stacktrace:
 [1] _similar_or_copy(x::Float64, #unused#::Type{Float64})
   @ Base ./set.jl:497
 [2] replace(A::Float64, old_new::Pair{Regex, Int64}; count::Nothing)
   @ Base ./set.jl:615
 [3] replace(A::Float64, old_new::Pair{Regex, Int64})
   @ Base ./set.jl:611
 [4] _broadcast_getindex_evalf
   @ ./broadcast.jl:670 [inlined]
 [5] _broadcast_getindex
   @ ./broadcast.jl:643 [inlined]
 [6] getindex
   @ ./broadcast.jl:597 [inlined]
 [7] copy
   @ ./broadcast.jl:899 [inlined]
 [8] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(replace), Tuple{Vector{Float64}, Base.RefValue{Pair{Regex, Int64}}}})
   @ Base.Broadcast ./broadcast.jl:860
 [9] top-level scope
   @ ./In[50]:2
In [55]:
for print(data["Biomasse [MWh] Berechnete Auflösungen"])
MethodError: no method matching occursin(::Float64, ::Float64)
Closest candidates are:
  occursin(::Any) at strings/search.jl:642

Stacktrace:
 [1] _broadcast_getindex_evalf
   @ ./broadcast.jl:670 [inlined]
 [2] _broadcast_getindex
   @ ./broadcast.jl:643 [inlined]
 [3] getindex
   @ ./broadcast.jl:597 [inlined]
 [4] copy
   @ ./broadcast.jl:899 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(occursin), Tuple{Float64, Vector{Float64}}})
   @ Base.Broadcast ./broadcast.jl:860
 [6] top-level scope
   @ In[55]:1
In [0]:

In [0]:

In [64]:
for i in data["Biomasse [MWh] Berechnete Auflösungen"]
    if isnan(i)
        println(i)
    end
end
NaN
NaN
NaN
NaN
In [65]:
NaN < missing
Out[65]:
missing
In [66]:
NaN < NaN
Out[66]:
false
In [69]:
NaN === NaN
Out[69]:
true
In [72]:
0 == 0.0
Out[72]:
true
In [67]:
missing < missing
Out[67]:
missing