Skip to content

Latest commit

 

History

History
243 lines (186 loc) · 9.99 KB

README.md

File metadata and controls

243 lines (186 loc) · 9.99 KB

DOI Last-changedate

hklmirs

This repository contains the data and code for our two manuscripts:

Henning Teickner, and Klaus-Holger Knorr (2022): Improving Models to Predict Holocellulose and Klason Lignin Contents for Peat Soil Organic Matter with Mid-Infrared Spectra. SOIL 8 (2): 699–715. DOI: 10.5194/soil-8-699-2022.

Henning Teickner and Klaus-Holger Knorr (2023): Comment on Hodgkins et al. (2018). Not peer-reviewed preprint.

How to cite

Please cite this compendium as:

Henning Teickner and Klaus-Holger Knorr, (2023). Compendium of R code and data for “Improving Models to Predict Holocellulose and Klason Lignin Contents for Peat Soil Organic Matter with Mid Infrared Spectra” and “Comment on Hodgkins et al. (2018)”. Accessed 06 Dec 2023. Online at https://doi.org/10.5281/zenodo.6325760

Contents

The analysis directory contains:

  • 📁 paper: R Markdown source documents needed to reproduce the manuscript, including figures and tables. The main script is 001-paper-main.Rmd. This script produces both manuscripts and the corresponding supplementary information. Additional scripts are:

    • 002-paper-m-original-models.Rmd: Computes the original models used in Hodgkins et al. (2018) and models with the same model structure, but as Bayesian models.
    • 003-paper-m-gaussian-beta.Rmd: Computes models assuming a Beta distribution for holocellulose and Klason lignin contents and compares them to the original models.
    • 004-paper-m-reduce-underfitting.Rmd: Extents the Beta regression models by including additional variables (additional peaks) or using a different approach (using measured spectral intensities of binned spectra instead of extracted peaks), and validates these models using LOO-CV.
    • 005-paper-m-minerals.Rmd: Uses the models from 003-paper-m-gaussian-beta.Rmd to test how accurate a model for holocellulose content is which is also calibrated on training samples with higher mineral contents.
    • 006-paper-m-prediction-domain.Rmd: Analyzes the prediction domain (Wadoux et al. 2021) of the original models and the modified models and identifies under which conditions models extrapolate for peat and vegetation samples from Hodgkins et al. (2018).
    • 007-paper-m-prediction-differences.Rmd: Compares predictions for the training data and the peat and vegetation data from Hodgkins et al. (2018) for the original models from Hodgkins et al. (2018) and the modified models from 004-paper-m-reduce-underfitting.Rmd.
    • 008-paper-supplementary.Rmd: Computes supplementary analyses and figures for the first manuscript.
    • 001-reply-main.Rmd: This is the main script for manuscript 2. It is compiled from within 001-paper-main.Rmd and produces the supplementary information S1 for manuscript 2.
    • 002-reply-main.Rmd: This script produces the document for manuscript 2. It is compiled from within 001-reply-main.Rmd.
  • 📁 data: Data used in the analysis. Note that raw data is not stored in 📁 raw_data (empty folder), but in 📁 /inst/extdata. 📁 derived_data contains derived data computed from the scripts. The raw data are derived from Hodgkins et al. (2018).

  • 📁 stan_models: The Stan model used in 001-reply-main.Rmd.

The other folders in this directory follow the standard naming scheme and function of folders in R packages. There are the following directories and files:

  • README.md/README.Rmd: Readme for the compendium.
  • DESCRIPTION: The R package DESCRIPTION file for the compendium.
  • NAMESPACE: The R package NAMESPACE file for the compendium.
  • LICENSE.md: Details on the license for the code in the compendium.
  • CONTRIBUTING.md and CONDUCT.md: Files with information on how to contribute to the compendium.
  • Dockerfile: Dockerfile to build a Docker image for the compendium.
  • .Rbuildignore, .gitignore, .dockerignore: Files to ignore during R package building, to ignore by Git, and to ignore while building a Docker image, respectively.
  • renv.lock: renv lock file (Lists all R package dependencies and versions and can be used to restore the R package library using renv). renv.lock was created by running renv::snapshot() in the R package directory and it uses the information included in the DESCRIPTION file.
  • .Rprofile: Code to run upon opening the R-project.
  • R, man, inst, data-raw, data, src: Default folders for making the R package run.
  • Folder inst/extdata: Folder with the raw data used for the analyses. All files in this folder are derived from Hodgkins et al. (2018).

How to run in your broswer or download and run locally

You can download the compendium as a zip from from these URLs: https://github.com/henningte/hklmirs/ or https://doi.org/10.5281/zenodo.6325760

Or you can install this compendium as an R package, hklmirs, from GitHub with:

remotes::install_github("henningte/hklmirs")

How to use

Reproduce the analyses

To reproduce the analyses for the paper, open the RStudio project included in this research compendium and run the Rmarkdown script in analysis/paper/001-paper-main.rmd.

Running the whole script takes about 12 hours and occupies additional disk space of ~2 Gb.

Alternatively, the Dockerfile can be used to build a Docker image from which all analyses can be reproduced. The Dockerfile ensures that all required dependencies are installed (e.g. specific R packages; this is managed using the R package renv).

The Dockerfile provides instructions how to build a Docker image from the Dockerfile and how to run the image in a Docker container. It occupies disk space of ~7 Gb.

When the Docker image runs in a container, go to localhost:8787 in your Browser. You will find an RStudio interface where you can log in with username rstudio and password hkl. Here you can find the Rmarkdown scripts (hklmirs/analysis/paper/001-paper-main.rmd) as described above.

Licenses

Text and figures : CC-BY-4.0

Code : See the DESCRIPTION file

Data : CC-0 attribution requested in reuse. See the sources section for licenses for data derived from external sources and how to give credit to the original author(s) and the source.

Sources

All files in inst/extdata are derived from Hodgkins et al. (2018). These data are licensed under the CC-BY 4.0 license (see https://www.nature.com/articles/s41467-018-06050-2#rightslink).

The format of this research compendium is inspired by Marwick, Boettiger, and Mullen (2018) and was created with rrtools (Marwick 2019). The Rmarkdown template for the main article is from the rticles package (Allaire et al. 2020).

Contributions

We welcome contributions from everyone. Before you get started, please see our contributor guidelines. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Funding

This study was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) grant no. KN 929/23-1 to Klaus-Holger Knorr and grant no. PE 1632/18-1 to Edzer Pebesma. We acknowledge support from the Open Access Publication Fund of the University of Münster.

References

Allaire, JJ, Yihui Xie, R Foundation, Hadley Wickham, Journal of Statistical Software, Ramnath Vaidyanathan, Association for Computing Machinery, et al. 2020. Rticles: Article Formats for R Markdown. Manual.

Hodgkins, Suzanne B., Curtis J. Richardson, René Dommain, Hongjun Wang, Paul H. Glaser, Brittany Verbeke, B. Rose Winkler, et al. 2018. “Tropical Peatland Carbon Storage Linked to Global Latitudinal Trends in Peat Recalcitrance.” Nature Communications 9 (1): 3640. https://doi.org/10.1038/s41467-018-06050-2.

Marwick, Ben. 2019. “Rrtools: Creates a Reproducible Research Compendium.”

Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.

Wadoux, Alexandre M. J.-C., Brendan Malone, Budiman Minasny, Mario Fajardo, and Alex B. McBratney. 2021. Soil Spectral Inference with R: Analysing Digital Soil Spectra Using the R Programming Environment. Progress in Soil Science. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-64896-1.