Batch data per year #15

Merged
merged 12 commits into from
Apr 24, 2024
16 changes: 12 additions & 4 deletions source/targets/data_preparation/R/path_to_files.R
@@ -1,7 +1,15 @@
-# Path to raw data from SOVON
-path_to_counts_sovon <- function(proj_path, file) {
-  file_path <- file.path(proj_path, "data", "mas", file)
-  return(file_path)
+# Paths to raw data from SOVON
+paths_to_counts_sovon <- function(
+    proj_path,
+    pattern = "qgis_export_sovon_wfs") {
+  # List paths to all files
+  file_paths <- list.files(
+    file.path(proj_path, "data"),
+    pattern = pattern,
+    full.names = TRUE,
+    recursive = TRUE)
+
+  return(file_paths)
 }

 # Path to counting locations
99 changes: 99 additions & 0 deletions source/targets/data_preparation/README.Rmd
@@ -0,0 +1,99 @@
---
title: "Procedure for downloading data from the SOVON WFS service and processing it via the targets pipeline"
author: "Hans Van Calster & Ward Langeraert"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Downloading data via the SOVON WFS service
## QGIS v3.24

1. Create an account via <https://www.vogelatlas.be/user/newuser>

1. Request access rights from Sovon

1. Open QGIS

1. Set the project CRS to EPSG:28992
1. Choose `Kaartlagen` --\> `Databronnen beheren` --\> `WFS / OGC API objecten`

1. Click `nieuw` in the dialog to create a new WFS connection

1. Use `sovon` as the name and `https://portal.sovon.nl/views/wfs/453/` as the URL
1. Click OK

1. Connect to the newly added WFS service

1. Click Connect ('Verbinden')
1. Select the layer `ms:viewpoints`
- Build a query ('Query maken') for the data of the new year, e.g.: `SELECT * FROM viewpoints WHERE jaar = 2024`
1. Tick the box 'Alleen objecten bevragen die het huidige zichtbare bereik overlappen' (only request features that overlap the current view extent)
1. Click Add ('Toevoegen')
1. Enter your username and password
1. If QGIS asks about a transformation from EPSG:28992 to another CRS, it is best to cancel it

If all goes well, all data you have access to are now downloaded (this can take a while).
These are the so-called visit points ("bezoekstippen").

## QGIS v3.26-3.28

1. Create an account via <https://www.vogelatlas.be/user/newuser>

1. Request access rights from Sovon

1. Open QGIS

1. Set the project CRS to EPSG:28992
1. Choose `Kaartlagen` --\> `Databronnen beheren` --\> `WFS / OGC API objecten`

1. Click `nieuw` in the dialog to create a new WFS connection

1. Use `sovon` as the name and `https://portal.sovon.nl/views/wfs/453/` as the URL
1. Click OK

1. Connect to the newly added WFS service

1. Click Connect ('Verbinden')
1. Enter your username and password and click OK
1. Select the layer `ms:viewpoints`
- Build a query ('Query maken') for the data of the new year, e.g.: `SELECT * FROM viewpoints WHERE jaar = 2024`
1. Tick the box 'Alleen objecten bevragen die het huidige zichtbare bereik overlappen' (only request features that overlap the current view extent)
1. Change the coordinate reference system to EPSG:28992
1. Click Add ('Toevoegen')
1. If QGIS asks about a transformation from EPSG:28992 to another CRS, it is best to cancel it

If all goes well, all data you have access to are now downloaded (this can take a while).
These are the so-called visit points ("bezoekstippen").

# Exporting the data and where to store it

Once all data have been downloaded, you can export this layer:

1. Make sure all data are visible in the view

1. Select `Kaartlagen` --\> `Opslaan als ...` and save as `.geojson` with file name `<YYYYMMDD_qgis_export_sovon_wfs_JAAR>` and CRS `EPSG:28992`.
`YYYYMMDD` is the export date, `JAAR` is the year in which the data were collected (see the SQL query).
Click OK (this can take a while).
Save the geojson file in the folder `mbag-mas/data/mas`.

1. Save the final export that you want to use for data preparation and later analyses in a folder named `JAAR` under `mbag-mas/source/targets/data_preparation/data`.
Each folder may contain only one file with the data of that year (see below).
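
The folder layout above is what `paths_to_counts_sovon()` searches with `list.files()`. The following is a minimal sketch of that search on a temporary directory, with a check that each year folder holds exactly one export. The temporary folder name and the 2024 file name are made up for illustration; only the shared file name part `qgis_export_sovon_wfs` comes from the pipeline.

```r
# Hypothetical temporary layout that mimics `data/<JAAR>/<export>.geojson`
data_dir <- file.path(tempdir(), "data_preparation_demo", "data")
unlink(dirname(data_dir), recursive = TRUE)
dir.create(file.path(data_dir, "2023"), recursive = TRUE)
dir.create(file.path(data_dir, "2024"), recursive = TRUE)
file.create(
  file.path(data_dir, "2023", "20230810_qgis_export_sovon_wfs_2023.geojson"),
  file.path(data_dir, "2024", "20240424_qgis_export_sovon_wfs_2024.geojson")
)

# Same kind of search as in `paths_to_counts_sovon()`:
# match on the shared part of the file name, in all subfolders
file_paths <- list.files(
  data_dir,
  pattern = "qgis_export_sovon_wfs",
  full.names = TRUE,
  recursive = TRUE
)

# Count the exports per year folder; every folder should hold exactly one
files_per_year <- table(basename(dirname(file_paths)))
stopifnot(all(files_per_year == 1))
```

A check like the final `stopifnot()` is not part of the pipeline; it is only one possible way to catch a year folder that accidentally contains two exports.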

# Processing the data

We process the data via a pipeline built with the [targets package](https://books.ropensci.org/targets/).
This covers data selection, preparation and the calculation of variables.

We use ["dynamic branching"](https://books.ropensci.org/targets/dynamic.html) in the targets pipeline.
This is a way to define new targets while the pipeline is running.
A new target is created for each file.
When a dataset for a new year is added, the pipeline therefore only has to run the calculations for the data of the new year and does not repeat the calculations for the previous years.
The full pipeline looks as follows:

```{r}
targets::tar_glimpse()
```
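
The "only the new year is recomputed" behaviour can be mimicked in plain R. This is a rough analogy only, not how targets works internally: targets decides what is outdated from hashes of files, code and upstream data, whereas the sketch below simply keys a cache on the file path. The function names `process_file()` and `update_branches()` and the file paths are hypothetical.

```r
# Hypothetical per-file step standing in for the real processing functions;
# the counter records how often it is actually executed
n_computed <- 0
process_file <- function(path) {
  n_computed <<- n_computed + 1
  toupper(basename(path))
}

# Compute results only for paths that are not in the cache yet,
# and return one result per path (comparable to one branch per file)
update_branches <- function(paths, cache = list()) {
  new_paths <- setdiff(paths, names(cache))
  cache[new_paths] <- lapply(new_paths, process_file)
  cache[paths]
}

# First run: a single export is processed
branches <- update_branches("data/2023/export_2023.geojson")

# Second run: a new year is added, so only that file is processed
branches <- update_branches(
  c("data/2023/export_2023.geojson", "data/2024/export_2024.geojson"),
  cache = branches
)
```

After both runs `process_file()` has been executed twice in total, once per file, even though the second run was given two paths.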
55 changes: 40 additions & 15 deletions source/targets/data_preparation/_targets.R
@@ -38,24 +38,28 @@ source(file.path(mbag_dir, "source", "R", "predatoren_f.R"))

 # Target list
 list(
-  tarchetypes::tar_file(
-    name = mas_counts_sovon_file,
-    command = path_to_counts_sovon(
-      proj_path = mbag_dir,
-      file = "20230810_qgis_export_sovon_wfs_2023.geojson"
+  tarchetypes::tar_files(
+    name = mas_counts_sovon_files,
+    command = paths_to_counts_sovon(
+      proj_path = target_dir
     )
   ),
   tar_target(
     name = mas_counts_sovon,
     command = sf::st_read(
-      mas_counts_sovon_file
-    )
+      dsn = mas_counts_sovon_files,
+      quiet = TRUE
+    ),
+    pattern = map(mas_counts_sovon_files),
+    iteration = "list"
   ),
   tar_target(
     name = crs_pipeline,
     command = amersfoort_to_lambert72(
       mas_counts_sovon
-    )
+    ),
+    pattern = map(mas_counts_sovon),
+    iteration = "list"
   ),
   tarchetypes::tar_file(
     name = sample_file,
@@ -77,44 +81,65 @@ list(
       x = crs_pipeline,
       y = sample,
       by = dplyr::join_by(plotnaam == pointid)
-    )
+    ),
+    pattern = map(crs_pipeline),
+    iteration = "list"
   ),
   tar_target(
     name = select_time_periods,
     command = select_within_time_periods(
       counts_df = select_sampled_points
-    )
+    ),
+    pattern = map(select_sampled_points),
+    iteration = "list"
   ),
   tar_target(
     name = select_within_radius,
     command = select_within_circle_radius(
       counts_df = select_time_periods,
       radius = 300
-    )
+    ),
+    pattern = map(select_time_periods),
+    iteration = "list"
   ),
   tar_target(
     name = select_species_groups,
     command = dplyr::filter(
       select_within_radius,
       soortgrp %in% 1:2
-    )
+    ),
+    pattern = map(select_within_radius),
+    iteration = "list"
   ),
   tar_target(
     name = remove_double_counts,
     command = process_double_counted_data(
       counts_df = select_species_groups
-    )
+    ),
+    pattern = map(select_species_groups),
+    iteration = "list"
   ),
   tar_target(
     name = remove_subspecies_names,
     command = adjust_subspecies_names_nl(
       counts_df = remove_double_counts
-    )
+    ),
+    pattern = map(remove_double_counts),
+    iteration = "list"
   ),
   tar_target(
-    name = mas_data_clean,
+    name = add_predator_variable,
     command = add_predator_variables(
       counts_df = remove_subspecies_names
+    ),
+    pattern = map(remove_subspecies_names),
+    iteration = "list"
+  ),
+  tar_target(
+    name = mas_data_clean,
+    command = do.call(
+      what = rbind.data.frame,
+      args = c(add_predator_variable, make.row.names = FALSE)
     )
   )
 )
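
The final `mas_data_clean` target row-binds the per-file branches of `add_predator_variable` into one table. Below is a minimal sketch of that `do.call()` pattern on two made-up data frames; the column names and values are hypothetical stand-ins, and the real branches are whatever `add_predator_variables()` returns for each file.

```r
# Two hypothetical branches, one per input file
branches <- list(
  data.frame(plotnaam = c("a", "b"), jaar = 2023),
  data.frame(plotnaam = "c", jaar = 2024)
)

# Append the extra argument to the list of data frames so that the call
# becomes rbind.data.frame(df1, df2, make.row.names = FALSE)
combined <- do.call(
  what = rbind.data.frame,
  args = c(branches, make.row.names = FALSE)
)
```

Passing `make.row.names = FALSE` keeps plain sequential row names instead of names derived from the list elements, which tends to be easier to work with once data from several years are stacked.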
61 changes: 46 additions & 15 deletions source/targets/data_preparation/_targets/meta/meta
@@ -1,28 +1,59 @@
name|type|data|command|depend|seed|path|time|size|bytes|format|repository|iteration|parent|children|seconds|warnings|error
.Random.seed|object|93b4d65f2506ee1c|||||||||||||||
add_predator_variable|stem|b1d6101a48661ce9|e1c93a06d9f6d4cb|d65332b68dd4f90d|-1919973047||t19836.3580925959s|448d33d82c53cd6c|1211653|qs|local|vector|||0.11||
.Random.seed|object|9f7a7034c0e37700|||||||||||||||
add_predator_variable|pattern|01c728dfa7f8b9cc|e1c93a06d9f6d4cb||-1919973047||||2253323|qs|local|list||add_predator_variable_cb66b9e8*add_predator_variable_85b65ad8|0.2||
add_predator_variable_85b65ad8|branch|b1d6101a48661ce9|e1c93a06d9f6d4cb|801966e5819e4cf7|-1006700069||t19837.3755483621s|448d33d82c53cd6c|1211653|qs|local|list|add_predator_variable||0.09||
add_predator_variable_cb66b9e8|branch|031187f0a3d3d06d|e1c93a06d9f6d4cb|ff7cec7a334367e8|-758145585||t19837.375545188s|6338640b9116fd0c|1041670|qs|local|list|add_predator_variable||0.11||
add_predator_variable_fab8aebd|branch|b1d6101a48661ce9|e1c93a06d9f6d4cb|6d94daf6feaabdba|-1267021414||t19836.6334798554s|448d33d82c53cd6c|1211653|qs|local|list|add_predator_variable||0.16||
add_predator_variables|function|7467ab35b1f1bd3d|||||||||||||||
adjust_subspecies_names_nl|function|50a4c7e8d7a82397|||||||||||||||
amersfoort_to_lambert72|function|7a05a501641027b3|||||||||||||||
crs_pipeline|stem|24316d2285354371|fc01a4ccb0c4ce92|980a00c69b7059d5|-1580479739||t19836.3456511859s|cd33a8123b9ea861|1405399|qs|local|vector|||0.65||
crs_pipeline|pattern|235a92d58c68e1f8|fc01a4ccb0c4ce92||-1580479739||||3378016|qs|local|list||crs_pipeline_64cad22b*crs_pipeline_1af84150|2.98||
crs_pipeline_1af84150|branch|25cc7b9c3738e95d|fc01a4ccb0c4ce92|e944c1068696cbd8|-1771903272||t19837.3754989254s|2fbb451c02f1c505|1445019|qs|local|list|crs_pipeline||0.5||
crs_pipeline_64cad22b|branch|9ab625d2f93529ae|fc01a4ccb0c4ce92|864fa89216d77cc1|-908645192||t19837.3754907866s|9899be058e361a24|1932997|qs|local|list|crs_pipeline||2.48||
crs_pipeline_916fc15e|branch|24316d2285354371|fc01a4ccb0c4ce92|a8efcc518d143ecd|1475784767||t19836.6334441313s|cd33a8123b9ea861|1405399|qs|local|list|crs_pipeline||0.91||
kraaiachtigen_f|function|32a6f93504fb3f7a|||||||||||||||
mas_counts_sovon|stem|3079d5194a1a13d4|404705b488b2fa7b|88ddfa3cdb0ecee3|222117260||t19831.5074770628s|e5aa51f74688237f|918916|qs|local|vector|||1.44||
mas_counts_sovon_file|stem|8b786fe9ff0cbd2e|6ce5a8aaba143dfe|9c71b69a8c465d4b|-1372158522|C:/R/git_repositories/mbag-mas/data/mas/20230810_qgis_export_sovon_wfs_2023.geojson|t19579.6291970393s|ce6386d2aec493f8|24420334|file|local|vector|||0.22||
mas_data_clean|stem|b1d6101a48661ce9|e1c93a06d9f6d4cb|7114343701e0abce|-1942276229||t19836.3791747591s|448d33d82c53cd6c|1211653|qs|local|vector|||0.11||
mas_counts_sovon|pattern|5fbe6dfeb0d38926|168c3b81e4150fa3||222117260||||2225605|qs|local|list||mas_counts_sovon_91991620*mas_counts_sovon_614b0811|3.13||
mas_counts_sovon_1da3ce1e|branch|3079d5194a1a13d4|168c3b81e4150fa3|6eec518936a1b4f1|881722848||t19836.6334312342s|e5aa51f74688237f|918916|qs|local|list|mas_counts_sovon||1.95||
mas_counts_sovon_614b0811|branch|10642f9094fdec9b|168c3b81e4150fa3|ca9e87ddd0d9c32c|-548042463||t19837.3754597336s|5b3a0e257f2b0824|950517|qs|local|list|mas_counts_sovon||1.31||
mas_counts_sovon_91991620|branch|effc8a3fabb0ef20|168c3b81e4150fa3|c8ebcef82b558ede|337220797||t19837.375443185s|1d54da80af1b45c9|1275088|qs|local|list|mas_counts_sovon||1.82||
mas_counts_sovon_files|pattern|0e998fcd11de47cb|c52338d7124b0a05||1920956259||||57764640|file|local|vector||mas_counts_sovon_files_c804bee4*mas_counts_sovon_files_b858e527|0||
mas_counts_sovon_files_4776d7dc|branch|8b786fe9ff0cbd2e|c52338d7124b0a05|ef46db3751d8e999|-1411769136|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2023/20230810_qgis_export_sovon_wfs_2023.geojson|t19579.6291970393s|ce6386d2aec493f8|24420334|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_b858e527|branch|ab8d895f67e61e2f|c52338d7124b0a05|ef46db3751d8e999|1632737556|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2023/20240424_qgis_export_sovon_wfs_2023.geojson|t19837.3619874362s|71c635049da56b6f|24964075|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_c804bee4|branch|0a5fba55a0968fbb|c52338d7124b0a05|ef46db3751d8e999|90413319|C:/R/git_repositories/mbag-mas/source/targets/data_preparation/data/2018_2022/20240424_qgis_export_sovon_wfs_2018_2022.geojson|t19837.3739489539s|0fabe1ebcd5e16ff|32800565|file|local|vector|mas_counts_sovon_files||0||
mas_counts_sovon_files_files|stem|2351b7f404e60792|9afdce93aca4c6a2|fad413a8af0d6d43|1663214924||t19837.3804801301s|ef5bb583e5bfa4cd|174|rds|local|vector||mas_counts_sovon_files_files_b080d5d2*mas_counts_sovon_files_files_ab3c34ba|0.22||
mas_data_clean|stem|44e15449852bb5ad|f3f492e13a4a7743|8a3d0f01a50a17eb|-1942276229||t19837.3804850748s|03f66b67abe35641|2222011|qs|local|vector|||0.09||
mbag_dir|object|73add2da6d24990f|||||||||||||||
path_to_counts_sovon|function|b8a0d214f65ee27f|||||||||||||||
path_to_samples|function|5ecc4024b743a014|||||||||||||||
paths_to_counts_sovon|function|3dfadc517ce46981|||||||||||||||
predatoren_f|function|febc8f3bf40ecb7d|||||||||||||||
process_double_counted_data|function|705fe098ec314f30|||||||||||||||
remove_double_counts|stem|33d4032f50d2b5ed|56d403ffe2f51374|bc3a8ebbfcaba53e|-48525810||t19836.3456652496s|d83934c08084e74c|1195980|qs|local|vector|||0.17||
remove_subspecies_names|stem|35f7dbbf56f35d05|6192e8e36bfb46d7|a54bc7e6764336c6|1657860091||t19836.3580890818s|ca0af1b203c1cf95|1195963|qs|local|vector|||0.09||
remove_double_counts|pattern|ffe95c1207105760|56d403ffe2f51374||-48525810||||2224185|qs|local|list||remove_double_counts_f63cd31f*remove_double_counts_c457cfbe|0.39||
remove_double_counts_2183673c|branch|33d4032f50d2b5ed|56d403ffe2f51374|120e9624994e724f|-1583060927||t19836.6334707344s|d83934c08084e74c|1195980|qs|local|list|remove_double_counts||0.25||
remove_double_counts_c457cfbe|branch|33d4032f50d2b5ed|56d403ffe2f51374|ab0b8bcc64a7515b|-1868331755||t19837.3755358011s|d83934c08084e74c|1195980|qs|local|list|remove_double_counts||0.14||
remove_double_counts_f63cd31f|branch|37b50a4fb2d8fcf2|56d403ffe2f51374|01d2ddb2c97905f7|-466127144||t19837.3755321213s|67d14343058855aa|1028205|qs|local|list|remove_double_counts||0.25||
remove_subspecies_names|pattern|c2ee75b4dd7cea9f|6192e8e36bfb46d7||1657860091||||2224142|qs|local|list||remove_subspecies_names_d6384eb7*remove_subspecies_names_b2cf8fa2|0.2||
remove_subspecies_names_8033c49b|branch|35f7dbbf56f35d05|6192e8e36bfb46d7|d6984430fdb56526|425039760||t19836.6334751919s|ca0af1b203c1cf95|1195963|qs|local|list|remove_subspecies_names||0.14||
remove_subspecies_names_b2cf8fa2|branch|35f7dbbf56f35d05|6192e8e36bfb46d7|3f914117ad872cd8|1016185236||t19837.3755419358s|ca0af1b203c1cf95|1195963|qs|local|list|remove_subspecies_names||0.11||
remove_subspecies_names_d6384eb7|branch|1d4a71f600a3d00d|6192e8e36bfb46d7|be2db777c07b1ce5|518042368||t19837.3755388335s|a03bcf07b5ae740d|1028179|qs|local|list|remove_subspecies_names||0.09||
roofvogels_f|function|ff0ef1f62c67a283|||||||||||||||
sample|stem|e4966dd76c186021|7c7940c3ee902e8e|351c5860311905d4|887991846||t19831.545437312s|e3ea31e6dc0d59a4|15016|qs|local|vector|||0.07||
sample_file|stem|12ec54a1802f0ccc|9373cad8a757c886|fce4235ad2580001|1141903364|C:/R/git_repositories/mbag-mas/data/steekproefkaders/steekproef_avimap_mbag_piloot.csv|t19803.5701886462s|60ef8d6db1574934|62070|file|local|vector|||0||
select_sampled_points|stem|f751de663336b945|b659c8f8ce392cde|ecc1e7b45b6a8d0c|-107447084||t19831.5248409952s|d8de74379de70ddb|1187211|qs|local|vector|||0.03||
select_species_groups|stem|f9ee84b7905152e5|3f1e242af19e665d|50097636e83c673b|-1185638444||t19831.5759555089s|9d27dbe475bb2b9d|1194195|qs|local|vector|||0.04||
select_time_periods|stem|e194aeeaa582d6a6|206ce93b3790e30c|4f8cbc5f7ebd8ca9|291324708||t19836.3456566927s|2783a0b288bff409|1201608|qs|local|vector|||0.28||
sample|stem|e4966dd76c186021|7c7940c3ee902e8e|351c5860311905d4|887991846||t19836.6334056857s|e3ea31e6dc0d59a4|15016|qs|local|vector|||0.14||
sample_file|stem|12ec54a1802f0ccc|9373cad8a757c886|fce4235ad2580001|1141903364|C:/R/git_repositories/mbag-mas/data/steekproefkaders/steekproef_avimap_mbag_piloot.csv|t19803.5701886462s|60ef8d6db1574934|62070|file|local|vector|||0.35||
select_sampled_points|pattern|46f373e6f1e29ecd|b659c8f8ce392cde||-107447084||||2222005|qs|local|list||select_sampled_points_798b2ce7*select_sampled_points_88f19b3d|0.08||
select_sampled_points_4f539302|branch|f751de663336b945|b659c8f8ce392cde|a65d89958ab07d0a|-1764201090||t19836.6334476012s|d8de74379de70ddb|1187211|qs|local|list|select_sampled_points||0.05||
select_sampled_points_798b2ce7|branch|80b137acdb5e8b01|b659c8f8ce392cde|bb98d2b8fd3cc0de|-1314208889||t19837.3755014323s|78ad3df6a3489d61|1034794|qs|local|list|select_sampled_points||0.05||
select_sampled_points_88f19b3d|branch|f751de663336b945|b659c8f8ce392cde|fbf7d5e89f0b86db|-648933279||t19837.3755038851s|d8de74379de70ddb|1187211|qs|local|list|select_sampled_points||0.03||
select_species_groups|pattern|daa88c8ce05e71c3|3f1e242af19e665d||-1185638444||||2221551|qs|local|list||select_species_groups_49cc114d*select_species_groups_f7e0a6a0|0.05||
select_species_groups_49cc114d|branch|482c066fc0b0f032|3f1e242af19e665d|40712c34058e5d42|-433748385||t19837.3755250427s|d12ae4d089c90245|1027356|qs|local|list|select_species_groups||0.03||
select_species_groups_9005462e|branch|f9ee84b7905152e5|3f1e242af19e665d|653a67f1416430f0|1024928494||t19836.6334651819s|9d27dbe475bb2b9d|1194195|qs|local|list|select_species_groups||0.03||
select_species_groups_f7e0a6a0|branch|f9ee84b7905152e5|3f1e242af19e665d|18248565ac075282|824669762||t19837.3755272638s|9d27dbe475bb2b9d|1194195|qs|local|list|select_species_groups||0.02||
select_time_periods|pattern|7826a632a7f2449b|206ce93b3790e30c||291324708||||2247642|qs|local|list||select_time_periods_106885d3*select_time_periods_29eeb0e5|0.59||
select_time_periods_00ee06b7|branch|e194aeeaa582d6a6|206ce93b3790e30c|2e587d1043c214a5|1334375700||t19836.6334562646s|2783a0b288bff409|1201608|qs|local|list|select_time_periods||0.52||
select_time_periods_106885d3|branch|06c7e3bf807396d0|206ce93b3790e30c|20ba66030adc44e5|195757004||t19837.3755096047s|4920a3c8cbb29c0c|1046034|qs|local|list|select_time_periods||0.33||
select_time_periods_29eeb0e5|branch|e194aeeaa582d6a6|206ce93b3790e30c|83e2769cddc8e49b|-1029364792||t19837.3755147218s|2783a0b288bff409|1201608|qs|local|list|select_time_periods||0.26||
select_within_circle_radius|function|880681aea06f4662|||||||||||||||
select_within_radius|stem|f9ee84b7905152e5|b43ee3d813fbfa23|f5c01b2557c9822c|-523127267||t19836.3456611804s|9d27dbe475bb2b9d|1194195|qs|local|vector|||0.19||
select_within_radius|pattern|87eb19e22f8fb771|b43ee3d813fbfa23||-523127267||||2221551|qs|local|list||select_within_radius_dcff1891*select_within_radius_2565ae04|0.38||
select_within_radius_2565ae04|branch|f9ee84b7905152e5|b43ee3d813fbfa23|010a05c2966bd8d0|-237587632||t19837.3755230418s|9d27dbe475bb2b9d|1194195|qs|local|list|select_within_radius||0.19||
select_within_radius_730aa92e|branch|f9ee84b7905152e5|b43ee3d813fbfa23|d58b5b6016a3cc70|1596416445||t19836.6334621581s|9d27dbe475bb2b9d|1194195|qs|local|list|select_within_radius||0.29||
select_within_radius_dcff1891|branch|482c066fc0b0f032|b43ee3d813fbfa23|0d8a75fc941e0264|1251980147||t19837.3755189017s|d12ae4d089c90245|1027356|qs|local|list|select_within_radius||0.19||
select_within_time_periods|function|2798499bd63b690b|||||||||||||||
target_dir|object|5289a857edb685f1|||||||||||||||