GitHub - Durin-project/Durin_data

This is the git repository for the DURIN project. The goal of this repo is to organize and streamline the data management in the project and beyond.

DATA MANAGEMENT

Location of data, metadata and code

The overview of all the datasets is here.

The raw and clean datasets are stored on OSF and will be made available there after the end of the project. For now, the data are only available to the project partners.

All R code for cleaning the raw data is available on the Durin GitHub. Code pushed to GitHub should be clean, tested to confirm that it does what it is supposed to do, and able to run on any computer (e.g. no absolute paths).
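As a small illustration of avoiding absolute paths, the sketch below builds a path relative to the project directory with base R's file.path(). The folder name raw_data, the file name and the helper is_absolute() are hypothetical and only used for this example.

```r
# Hypothetical folder and file name, relative to the project directory
raw_dir <- "raw_data"
raw_file <- file.path(raw_dir, "DURIN_raw_4Corners_field_biomass_2023.csv")

# Hypothetical helper: flags paths that start at a drive letter, "/" or "~"
is_absolute <- function(path) {
  grepl("^(/|~|[A-Za-z]:)", path)
}

# A relative path works on any computer that has the project folder
is_absolute(raw_file)
```

Opening the project through Durin_data.Rproj sets the working directory to the project folder, so relative paths like this one resolve the same way for everyone.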

The data documentation, including a draft of the data paper, is available here (only available for authors), and the data dictionaries are below in this readme file.

Meta data

To help you with the correct naming convention use the function from the dataDocumentation package to create the meta data. Install the dataDocumentation R package like this:

# if needed install the remotes package
install.packages("remotes")

# then install the dataDocumentation package
remotes::install_github("audhalbritter/dataDocumentation")

# and load it
library(dataDocumentation)

The function create_durin_meta_data() creates meta data. The meta data can be made for the 4Corners, DroughtNet or Nutrient study. Use the argument study to define which meta data you want:

4Corners: study on the 4 main Durin sites
DroughtNet: droughtNet experiment at Lygra and Tjotta
Nutrient: nutrient experiment at Lygra

my_meta <- create_durin_meta_data(study = "4Corners")
my_meta

If you want to save the metadata as a csv file, set csv_output = TRUE and you can choose a file name (not mandatory). The data will be stored in the project directory. The file name always contains Durin and the study, e.g. Durin_4Corners_filename.csv.

create_durin_meta_data(study = "4Corners", csv_output = TRUE, filename = "biomass")

Naming conventions for the clean datasets and files

We use snake_case for names and coding. Snake_case means that we use lower case letters separated by underscores, for example biomass_g or age_class. The exceptions are siteID, blockID and plotID (legacy from previous projects).
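A quick way to check variable names against this convention is sketched below. The helper is_snake_case() and the vector legacy_names are hypothetical; the exception list holds only the three names mentioned above, so other ID variables used in the tables (e.g. speciesID) would have to be added to it.

```r
# Legacy exceptions to snake_case named in the text (extend if needed)
legacy_names <- c("siteID", "blockID", "plotID")

# Hypothetical helper: TRUE for snake_case names and for the legacy exceptions
is_snake_case <- function(x) {
  grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", x) | x %in% legacy_names
}

# Check some candidate column names
candidate_names <- c("biomass_g", "age_class", "siteID", "BiomassG", "var 1")
candidate_names[!is_snake_case(candidate_names)]
```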

File and variable names should be meaningful. Do not use my_data.csv or var1. Follow the naming convention for the main variables (see the full description of the main variables in the table below). Note that these naming conventions apply to clean datasets. For field datasheets and raw data, the naming convention does not have to be followed as strictly if other variables facilitate data collection (e.g. splitting date into year, month and day to avoid Excel issues).
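If the raw data has the date split into year, month and day, the clean dataset can get a single date column in yyyy-mm-dd format. A minimal sketch in base R, with made-up values:

```r
# Toy raw data with the date split into three columns (made-up values)
raw <- data.frame(
  year = c(2023, 2023),
  month = c(6, 7),
  day = c(15, 2)
)

# Combine into one Date column; Dates print as yyyy-mm-dd
raw$date <- as.Date(ISOdate(raw$year, raw$month, raw$day))
format(raw$date)
```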

We use redundancy in variable names to avoid mistakes. PlotID should contain the higher hierarchical levels such as siteID, blockID, treatment, habitat, species etc. For example plotID contains siteID, habitat, species and plot number: LY_O_VV_1, LY_F_VM_3
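A plotID of this form can be built from the separate columns, for example with paste(). The sketch below assumes that the habitat code is the first letter of habitat (O for Open, F for Forested), which is inferred from the examples above.

```r
# Toy plot table using the main variable names
plots <- data.frame(
  siteID = c("LY", "LY"),
  habitat = c("Open", "Forested"),
  speciesID = c("VV", "VM"),
  plot_nr = c(1, 3)
)

# Assumption: habitat code = first letter of habitat (O or F)
habitat_code <- substr(plots$habitat, 1, 1)

# plotID = siteID, habitat code, speciesID and plot number, joined by "_"
plots$plotID <- paste(plots$siteID, habitat_code, plots$speciesID, plots$plot_nr, sep = "_")
plots$plotID
```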

Files or variable	Naming convention	Example
File name	Project_Status_Study_Approach_Response_Year(s).Extension	DURIN_clean_gradient_field_cflux_2023-2025.csv
	Project	DURIN
	Status	raw or clean
	Study	4Corners: study on the 4 main Durin sites; DroughtNet: droughtNet experiment at Lygra and Tjotta (previously part of Landpress); Gradient: gradient study between 4 corners; Nutrient: nutrient experiment at Lygra; ClimateChamber: climate chamber experiment in Oslo?
	Approach	field, lab, molecular, other
	Response	trait, biomass, flux, etc.

4Corners
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
year	Year of data collection	yyyy; sometimes there is no specific date, then year can be used
site_name	Full site name	Lygra, Sogndal, Senja, Kautokeino and Tjotta
siteID	Unique siteID, first 2 letters of site_name.	LY, SO, SE, KA and TJ
biogeography	Biogeography of the site	Boreal, Sub-arctic
oceanity	Oceanity of the site	Coast, Inland
habitat	Open versus forested habitat	Open, Forested
plot_nr	Plot number, numeric value from 1-5.	1-5
plotID	Unique plot ID as a combination of siteID, habitat, speciesID and plot number	LY_O_VV_1, KA_F_VM_5
species	Vascular plant taxon names follow Elven et al. (2022). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name	Vaccinium myrtillus
speciesID	2 letter abbreviation of species	VM, VV, CV, EN, BN
plant_nr	Plant number, numeric value	1-n
plantID	plantID is not defined but can be constructed by concatenating siteID, etc.	…
segment	S24 = growth in 2024, S23 = growth in 2023, S22 = growth in 2022	S24, S23, S22
variable	Response variable(s)	e.g. cover, biomass, Reco
value	Value of response variable(s)	numeric value
unit	Unit for response variable(s)	percent, µmol m−2 s−1
other variables	Other important variables in the dataset	remark, data collector, weather, flag

DroughtNet experiment
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
year	Year of data collection	yyyy; sometimes there is no specific date, then year can be used
site_name	Site name	Lygra, Tjotta
siteID	Unique siteID, first 2 letters of site_name.	LY, TJ
geography	Location according to latitude	North, South
habitat	Open habitat	Open
age_class	Age class of the vegetation representing post-fire successional stages.	Pioneer, Building, Mature
age_classID	Age class ID of the vegetation representing post-fire successional stages.	PIO, BUI, MAT
drought_treatment	Drought treatment using rain-out shelters with a roof cover of 0% (ambient), 60% (moderate) or 90% (extreme)	Ambient, Moderate, Extreme
drought_treatmentID	Drought treatment ID using first three letters of drought_treatment	AMB, MOD, EXT
plot_nr	Plot number from the DroughtNet frames in the field. Corresponds with Landpress naming.	1.1,1.2,1.3 - 9.1,9.2,9.3
plotID	Unique plotID as a combination of siteID, age_classID, drought_treatmentID and plot_nr	LY_PIO_AMB_1.3, LY_MAT_EXT_9.1 (note these IDs might not exist!)
species	Vascular plant taxon names follow Elven et al. (2022). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name	Vaccinium myrtillus
speciesID	2 letter abbreviation of species	VM, VV, CV, EN
segment	S24 = growth in 2024, S23 = growth in 2023, S22 = growth in 2022	S24, S23, S22
variable	Response variable(s)	e.g. cover, biomass, Reco
value	Value of response variable(s)	numeric value
unit	Unit for response variable(s)	percent, µmol m−2 s−1
other variables	Other important variables in the dataset	remark, data collector, weather, flag

Nutrient experiment
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
year	Year of data collection	yyyy; sometimes there is no specific date, then year can be used
site_name	Site name	Lygra
siteID	Unique siteID, first 2 letters of site_name.	LY
habitat	Open habitat	Open
age_class	Vegetation representing post-fire successional stages.	Building
nitrogen_addition	Added level of nitrogen in kg ha-1 y-1	0, 1, 5, 10, 25
block_nr	Block number as N plus numeric value	N1, N2, N3, N4, N5
plotID	Unique plotID as a combination of block number and nitrogen addition level, joined by a dash	e.g. N1-10, N5-1
segment	S24 = growth in 2024, S23 = growth in 2023, S22 = growth in 2022	S24, S23, S22
variable	Response variable(s)	e.g. cover, biomass, Reco
value	Value of response variable(s)	numeric value
unit	Unit for response variable(s)	percent, µmol m−2 s−1
other variables	Other important variables in the dataset	remark, data collector, weather, flag

Gradient study
date	Date of data collection	yyyy-mm-dd; do not split year, month and day into several columns
year	Year of data collection	yyyy; sometimes there is no specific date, then year can be used
siteID	ANO_flatID, Durin siteID_habitatID, VCG siteID	e.g. 239, SE_O, Vikesland
ANO_pointID	Only relevant for ANO flate. Numeric value from 11-66	e.g. 11, 12
NiN_type	NiN type	e.g. T31-C-1
latitude_N	Decimal degree latitude	69.54 °N
longitude_E	Decimal degree longitude	4.54 °E
habitat	Open versus forested habitat, only relevant for Durin and VCG	Open, Forested
species	Vascular plant taxon names follow Elven et al. (2022). We use full species names. For field sheets the names can be abbreviated (see speciesID), but the clean data should contain the full species name	Vaccinium myrtillus
speciesID	2 letter abbreviation of species	VM, VV
individual_nr	Individual number is a numeric value	1-n
segment	S24 = growth in 2024, S23 = growth in 2023, S22 = growth in 2022	S24, S23, S22
collector	Full name separated by underscore of the person that collected the data.	I, me and myself
other variables	Other variables	…
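The file naming convention at the top of the table (Project_Status_Study_Approach_Response_Year(s).Extension) could be applied with a small helper like the one sketched below. The function make_durin_file_name() is hypothetical and not part of the dataDocumentation package.

```r
# Hypothetical helper that assembles a file name following the convention
make_durin_file_name <- function(status, study, approach, response, years,
                                 extension = "csv", project = "DURIN") {
  # Status must be one of the two values allowed by the convention
  stopifnot(status %in% c("raw", "clean"))
  base_name <- paste(project, status, study, approach, response, years, sep = "_")
  paste0(base_name, ".", extension)
}

# Reproduces the example file name from the table
make_durin_file_name("clean", "gradient", "field", "cflux", "2023-2025")
```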

Data collections and experiments associated with plant individuals and/or leaves should have unique IDs for each individual and leaf. PlantID and leafID should contain redundancy of the higher levels, such as siteID, plotID, treatments etc. but also make sure the IDs do not get ridiculously long.

Data collection might occur outside of the plots from the studies described above. For those samples include a new variable called sampling_plot. The options for this variable are defined (sampling inside a plot) and undefined (sampling outside the plots). The unique ID for these samples will look something like: LY_O_1, SE_F_4
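One possible way to add sampling_plot and an ID for samples taken outside the plots is sketched below. The columns sample_nr and sampleID are hypothetical names, and the assumption that the ID is built from siteID, the first letter of habitat and a running sample number is inferred from the examples LY_O_1 and SE_F_4.

```r
# Toy samples: two taken outside the plots (no plot_nr), one inside a plot
samples <- data.frame(
  siteID = c("LY", "SE", "LY"),
  habitat = c("Open", "Forested", "Open"),
  plot_nr = c(NA, NA, 2),
  sample_nr = c(1, 4, NA)  # hypothetical running number for outside samples
)

# sampling_plot: defined = inside a plot, undefined = outside the plots
samples$sampling_plot <- ifelse(is.na(samples$plot_nr), "undefined", "defined")

# Assumed ID format for samples outside the plots: siteID_habitat_number
samples$sampleID <- ifelse(
  samples$sampling_plot == "undefined",
  paste(samples$siteID, substr(samples$habitat, 1, 1), samples$sample_nr, sep = "_"),
  NA_character_
)
samples
```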

Organize data sets

Each dataset should contain only one response variable, or several if they are closely related. E.g. biomass and carbon flux should be two separate datasets. But the functional trait dataset can contain several traits, and the carbon flux dataset can contain GPP, NEE and Reco.

Datasets should be in a long format. If you have several response variables (e.g. traits, flux measurements), then use pivot_longer(cols = ..., names_to = "variable", values_to = "value") from the tidyr package to convert the data to a long format. Naming the columns variable and value is the standard. If you have very different response variables, it can be useful to have a column called unit.
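A minimal sketch of this reshaping with tidyr::pivot_longer(). The trait columns height_cm and leaf_area_cm2 and their values are made up for illustration.

```r
library(tidyr)

# Toy wide dataset with one column per trait (made-up values)
traits_wide <- data.frame(
  plotID = c("LY_O_VV_1", "LY_F_VM_3"),
  height_cm = c(12.5, 20.1),
  leaf_area_cm2 = c(1.8, 2.4)
)

# Convert to long format: one row per plot and response variable
traits_long <- pivot_longer(
  traits_wide,
  cols = c(height_cm, leaf_area_cm2),
  names_to = "variable",
  values_to = "value"
)
traits_long
```

The result has the columns plotID, variable and value, with one row for each combination of plot and trait.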

When dealing with many different datasets it can be useful to structure them in a similar way. Arrange the dataset so that the important variables come first, preferably in this order:

Year and/or date
siteID
habitat
treatment
plotID
species (focal)
plantID
leafID
response variable, value, unit (diversity, biomass, flux)
predictor variables (temperature level, oceanity)
other_variables (remark, data collector, weather)
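The column order above could be applied with a helper like the following sketch in base R. The function order_columns() and the vector preferred_order are hypothetical; columns that are not in the list keep their original order and are placed last.

```r
# Preferred order of the main variables, following the list above
preferred_order <- c("year", "date", "siteID", "habitat", "treatment", "plotID",
                     "species", "plantID", "leafID", "variable", "value", "unit")

# Hypothetical helper: main variables first, all other columns after them
order_columns <- function(data, first = preferred_order) {
  lead <- intersect(first, names(data))
  rest <- setdiff(names(data), lead)
  data[, c(lead, rest), drop = FALSE]
}

# Toy dataset with the columns in an arbitrary order (made-up values)
flux <- data.frame(
  value = 2.1, remark = "windy", plotID = "LY_O_VV_1",
  siteID = "LY", date = as.Date("2023-06-15"), variable = "Reco"
)
names(order_columns(flux))
```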

Data dictionary

How to make a data dictionary?

The R package dataDocumentation will help you make the data dictionary. You can install and load the package as follows:

# if needed install the remotes package
install.packages("remotes")

# then install the dataDocumentation package
remotes::install_github("audhalbritter/dataDocumentation")

# and load it
library(dataDocumentation)

Make data description table

Find the file R/data_dic/data_description.xlsx. Enter all the variables into that table, including variable name, description, unit/treatment level and how measured. If the variables are global for all of Funder, leave TableID blank (e.g. siteID). If the variable is unique for a specific dataset, create a TableID and use it consistently for one specific dataset. Make sure you have described all variables.
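To illustrate the role of TableID, here is a toy description table built directly in R. The column names other than TableID and all entries are assumptions for illustration; the actual column names are those in data_description.xlsx.

```r
# Toy description table; column names and entries are illustrative only
description_table <- data.frame(
  TableID = c(NA, "biomass"),
  variable_name = c("siteID", "biomass_g"),
  description = c("Unique site ID", "Biomass of the sample"),
  unit = c("LY, SO, SE, KA, TJ", "g"),
  how_measured = c("defined", "measured"),
  stringsAsFactors = FALSE
)

# Rows that apply to the biomass dataset:
# global variables (blank TableID) plus those with TableID "biomass"
applies <- is.na(description_table$TableID) | description_table$TableID == "biomass"
description_table[applies, ]
```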

Make data dictionary

Then run the function make_data_dictionary().

data_dic <- make_data_dictionary(data = biomass,
                                 description_table = description_table,
                                 table_ID = "biomass",
                                 keep_table_ID = FALSE)

Check that the function produces the correct data dictionary.

Add data dictionary to readme file

Finally, add the data dictionary below to be displayed in this readme file. Add a title, and a code chunk using kable() to display the data dictionary.
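The content of such a code chunk could look like the sketch below. The toy data_dic stands in for the output of the data dictionary function, and its column names are assumptions.

```r
library(knitr)

# Toy data dictionary standing in for the real one (hypothetical columns)
data_dic <- data.frame(
  variable_name = c("siteID", "biomass_g"),
  description = c("Unique site ID", "Biomass of the sample"),
  unit = c("LY, SO, SE, KA, TJ", "g")
)

# Render the data dictionary as a table in the readme
dic_table <- kable(data_dic)
dic_table
```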

For more details go to the dataDocumentation readme file.
