library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5 ✔ purrr 0.3.4
✔ tibble 3.1.4 ✔ dplyr 1.0.7
✔ tidyr 1.1.3 ✔ stringr 1.4.0
✔ readr 2.0.1 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
data <- read_csv("data/data_OBS_DEU_PT1H_T2M.csv")
Rows: 790288 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Produkt_Code
dbl  (3): SDO_ID, Wert, Qualitaet_Byte
dttm (1): Zeitstempel
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
Warning message: “One or more parsing issues, see `problems()` for details”
| Produkt_Code | SDO_ID | Zeitstempel | Wert | Qualitaet_Byte | Qualitaet_Niveau |
|---|---|---|---|---|---|
| <chr> | <dbl> | <dttm> | <dbl> | <dbl> | <dbl> |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 00:00:00 | -12.4 | 4 | 3 |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 01:00:00 | -12.6 | 6 | 10 |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 02:00:00 | -13.1 | 1 | 10 |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 03:00:00 | -13.7 | 1 | 10 |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 04:00:00 | -14.4 | 1 | 10 |
| OBS_DEU_PT1H_T2M | 2600 | 2005-03-01 05:00:00 | -14.9 | 1 | 10 |
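The parsing warning can be inspected with `problems()`, as the message suggests. Declaring the column types up front also removes the column-specification message and makes readr's type guessing explicit. The sketch below is minimal. It assumes the CSV stores `Zeitstempel` as ISO 8601 text (for example `2005-03-01T00:00:00`), which has not been checked against the DWD file. The two sample rows reuse values from the `head(data)` output above and are written to a temporary file only for illustration.

```r
library(readr)

# Two rows in the layout of the DWD file; the ISO 8601 timestamp format is an assumption
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Produkt_Code,SDO_ID,Zeitstempel,Wert,Qualitaet_Byte,Qualitaet_Niveau",
  "OBS_DEU_PT1H_T2M,2600,2005-03-01T00:00:00,-12.4,4,3",
  "OBS_DEU_PT1H_T2M,2600,2005-03-01T01:00:00,-12.6,6,10"
), tmp)

# Declare every column type explicitly instead of letting readr guess
obs_types <- cols(
  Produkt_Code     = col_character(),
  SDO_ID           = col_double(),
  Zeitstempel      = col_datetime(),
  Wert             = col_double(),
  Qualitaet_Byte   = col_double(),
  Qualitaet_Niveau = col_double()
)

obs <- read_csv(tmp, col_types = obs_types)

# One row per value that could not be parsed with the declared type (none here)
problems(obs)
```

For the real file, the same specification would be passed as `read_csv("data/data_OBS_DEU_PT1H_T2M.csv", col_types = obs_types)`. Any rows that `problems()` still reports then indicate genuine format deviations and are no longer a side effect of type guessing.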
station <- read_csv("data/sdo_OBS_DEU_PT1H_T2M.csv")
Rows: 2 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): SDO_Name, Metadata_Link
dbl (2): SDO_ID, Hoehe_ueber_NN
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
station
| SDO_ID | SDO_Name | Geogr_Laenge | Geogr_Breite | Hoehe_ueber_NN | Metadata_Link |
|---|---|---|---|---|---|
| <dbl> | <chr> | <dbl> | <dbl> | <dbl> | <chr> |
| 5705 | Würzburg | 99576 | 497704 | 268 | https://cdc.dwd.de/rest/metadata/station/html/812300016295 |
| 2600 | Kitzingen | 101781 | 497363 | 193 | https://cdc.dwd.de/rest/metadata/station/html/812300321959 |
data <- data %>% mutate(SDO_ID = if_else(SDO_ID == 2600, "Kitzingen", "Würzburg"))  # replace station IDs by names: 2600 = Kitzingen, 5705 = Würzburg
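`if_else()` labels every ID other than 2600 as "Würzburg". That is correct for these two stations, but a third station would be mislabelled without any warning. One alternative is to take the names from the `station` table with a join. The sketch below uses small synthetic tibbles. The IDs and names come from the `station` table above, and the `Wert` values are placeholders.

```r
library(dplyr)

# Synthetic stand-ins for the `station` and `data` tibbles (Wert values are placeholders)
station <- tibble(
  SDO_ID   = c(5705, 2600),
  SDO_Name = c("Würzburg", "Kitzingen")
)
data <- tibble(
  SDO_ID = c(2600, 2600, 5705),
  Wert   = c(1.0, 2.0, 3.0)
)

# Look up each station's name by its ID instead of hard-coding the mapping
data_named <- data %>%
  left_join(select(station, SDO_ID, SDO_Name), by = "SDO_ID")
data_named
```

Because `left_join()` keeps every row of `data`, an ID missing from `station` shows up as `NA` in `SDO_Name` instead of being given a wrong name. With the real data, the later steps would then group by `SDO_Name` rather than `SDO_ID`.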
data %>%
  group_by(SDO_ID) %>%
  summarise(min_Wert = min(Wert), max = max(Wert), mean = mean(Wert)) %>%
  mutate(range = max - min_Wert)
| SDO_ID | min_Wert | max | mean | range |
|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <dbl> |
| Kitzingen | -20.5 | 39.4 | 10.687821 | 59.9 |
| Würzburg | -23.4 | 39.3 | 9.500432 | 62.7 |
library(lubridate)
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
data$Zeitstempel[1:5] %>% year
data <- data %>% mutate(year = year(Zeitstempel))
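In Kitzingen the data begin in March 2005, so the 2005 mean for Kitzingen is distorted. January and February, usually among the coldest months of the year, are missing, and this pushes the annual mean upward.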
data %>% filter(SDO_ID == "Kitzingen", year==2005) %>% count(month(Zeitstempel))
| month(Zeitstempel) | n |
|---|---|
| <dbl> | <int> |
| 3 | 744 |
| 4 | 720 |
| 5 | 744 |
| 6 | 720 |
| 7 | 744 |
| 8 | 744 |
| 9 | 720 |
| 10 | 744 |
| 11 | 720 |
| 12 | 744 |
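Each count above is either 744 (31 × 24) or 720 (30 × 24) hours, so every month from March onward is complete. March and October show no daylight-saving shift, which suggests that the timestamps are in UTC. Instead of hard-coding which station-years to drop, one could compare the number of hourly values per station and year with the expected 8760 (8784 in leap years). The sketch below runs on a synthetic series. It assumes one value per hour and treats any shortfall as incomplete, which may be stricter than a real analysis needs.

```r
library(dplyr)
library(lubridate)

# Synthetic hourly series (not DWD data): station "A" covers all of 2006,
# station "B" starts on 1 March 2006, similar to Kitzingen in 2005
hours <- seq(ymd_hms("2006-01-01 00:00:00"), ymd_hms("2006-12-31 23:00:00"), by = "hour")
obs <- bind_rows(
  tibble(SDO_ID = "A", Zeitstempel = hours),
  tibble(SDO_ID = "B", Zeitstempel = hours[hours >= ymd_hms("2006-03-01 00:00:00")])
)

# Count hourly values per station and year and compare with the expected number
coverage <- obs %>%
  mutate(year = year(Zeitstempel)) %>%
  count(SDO_ID, year, name = "n_hours") %>%
  mutate(
    expected = if_else(leap_year(year), 8784, 8760),  # 366 * 24 or 365 * 24
    complete = n_hours == expected
  )
coverage

# Keep only the station-years that have a full set of hourly values
obs_complete <- obs %>%
  mutate(year = year(Zeitstempel)) %>%
  semi_join(filter(coverage, complete), by = c("SDO_ID", "year"))
```

If occasional missing hours should be tolerated, the condition could be relaxed to a threshold such as `n_hours >= 0.95 * expected`. The 95 % value is an arbitrary example.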
data %>%
  filter(SDO_ID == "Würzburg" | year > 2005) %>%  # skip Kitzingen 2005: incomplete (January and February missing)
  group_by(SDO_ID, year) %>%
  summarise(mean = mean(Wert), .groups = "drop") %>%
  ggplot(aes(x = year, y = mean, col = SDO_ID)) + geom_line()
Many further questions could be explored. For example, one could try to model the visible trend in the annual mean temperature, or, more generally, build and test forecasting models.
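A simple starting point for modelling the trend is an ordinary least-squares line through the annual means of each station. The sketch below uses synthetic annual means for two hypothetical stations, "A" and "B". With the real data, the table from the annual-mean pipeline above (without the `ggplot()` call) would take the place of `annual`. A straight line is an assumption: it does not capture year-to-year variability or any curvature in the trend.

```r
library(tibble)

# Synthetic annual means for hypothetical stations "A" and "B" (not DWD data):
# a linear trend of 0.05 °C per year plus random noise
set.seed(1)
annual <- tibble(
  SDO_ID = rep(c("A", "B"), each = 15),
  year   = rep(2006:2020, times = 2),
  mean   = 9 + 0.05 * (year - 2006) + rnorm(30, sd = 0.3)
)

# Fit one least-squares line (mean ~ year) per station
fits <- lapply(split(annual, annual$SDO_ID), function(d) lm(mean ~ year, data = d))

# Slope of each fit, converted from °C per year to °C per decade
trend <- tibble(
  SDO_ID           = names(fits),
  slope_per_decade = 10 * vapply(fits, function(f) coef(f)[["year"]], numeric(1))
)
trend
```

`summary(fits[["A"]])` reports the standard error of the slope. With only about 15 annual values, that uncertainty is worth checking before reading much into the fitted trend.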