This README.md file was generated on 20170718 by Sheila Saia.
This GitHub repository was created to provide access to collected data, analysis code, and other information associated with the paper by Saia et al. titled 'Evidence for polyphosphate accumulating organism (PAO)-mediated phosphorus cycling in stream biofilms under alternating aerobic/anaerobic conditions' in Freshwater Science (https://doi.org/10.1086/691439).
Title of Dataset
"paper-p-cycling-in-stream-biofilms"
Contact Information
Name: Sheila Saia
Institution: Cornell University
Address: B62 Riley-Robb Hall, Ithaca, NY 14853
Email: sms493 at cornell dot edu
Date of data collection
These data were collected during a laboratory experiment from 20151003 to 20151005.
Geographic location of data collection
These data were collected in Cascadilla Creek, Ithaca, NY, USA as well as in the Soil & Water Lab at Cornell University in Ithaca, NY, USA.
Information about funding sources that supported the collection of the data
Sheila Saia was supported by a Cornell University CALS Land Grant Fellowship and USEPA STAR Fellowship. This project was supported by a USDA NIFA grant #2014-67019-21636.
Licenses/restrictions placed on the data
Please use and distribute according to CC-BY v4.0. For a human-readable version of this license visit https://creativecommons.org/licenses/by/4.0/ .
Links to publications that cite or use the data
As of 20170112 there are no other publications that cite or use these data.
Links to other publicly accessible locations of the data
This dataset and associated R code are available at https://github.com/sheilasaia/paper-p-cycling-in-stream-biofilms and via Zenodo (https://doi.org/10.5281/zenodo.242599). The associated publication is available via Freshwater Science (https://doi.org/10.1086/691439).
This dataset is also available on the National Agricultural Library's Ag Data Commons at http://bit.ly/2nJYhEb.
Links/relationships to ancillary data sets
There are no links to or relationships with other ancillary data sets.
Data derived from another source
These data were not derived from another source.
Recommended citation for the data
Saia, S. M., P. J. Sullivan, J. M. Regan, H. J. Carrick, A. R. Buda, N. A. Locke, M. T. Walter. 2017. Evidence for polyphosphate accumulating organism (PAO)-mediated phosphorus cycling in stream biofilms under alternating aerobic/anaerobic conditions. Freshwater Science. 36(2):284-296.
Paper Availability
Open-access paper available at https://doi.org/10.1086/691439.
File List
Filename: allTUBdata_oct2014_forPaper.txt
Short description: This text file includes water quality related data for this experiment including: phosphate concentrations, FeII concentrations, and cation concentrations. It also includes environmental variables measured during the experiment including pH, dissolved oxygen, and temperature.
Filename: cellCounts_oct2014_forPaper.txt
Short description: This text file includes the cell counts (DAPI-DNA and DAPI-polyphosphate) taken from both treatments at the end of the experiment.
Filename: PPextAll_oct2014_forPaper.txt
Short description: This text file includes the results of the polyphosphate biofilm extractions at the start and end of the experiment.
Filename: TPextAll_oct2014_forPaper.txt
Short description: This text file includes the results of the total phosphate biofilm extractions at the start and end of the experiment.
Filename: oct2014experiment_script_forPaper_final.R
Short description: This R script includes code for all data analysis (apart from calibration of phosphate, FeII, and ICP-MS results) including the statistical analysis and data visualization used in the Freshwater Science journal article associated with these data.
Filename: oct2014experiment_script_forPaper_final.Rmd
Short description: This file is the same as the oct2014experiment_script_forPaper_final.R file but in the R Markdown format.
Relationship Between Files
The text files listed above (allTUBdata_oct2014_forPaper.txt, cellCounts_oct2014_forPaper.txt, PPextAll_oct2014_forPaper.txt, TPextAll_oct2014_forPaper.txt) are all required for running the R script called oct2014experiment_script_forPaper_final.R. The oct2014experiment_script_forPaper_final.Rmd file is identical to oct2014experiment_script_forPaper_final.R, but in the R Markdown format.
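As a minimal sketch of how the four input files could be loaded before running the analysis, the R snippet below reads them into a named list. The function name `read_paper_data`, the `data_dir` argument, and the tab delimiter with a header row are assumptions made for illustration; they are not taken from oct2014experiment_script_forPaper_final.R, so adjust `sep` if the files use a different delimiter.

```r
# File names are taken from the 'File List' section of this README.
paper_files <- c(
  tub   = "allTUBdata_oct2014_forPaper.txt",
  cells = "cellCounts_oct2014_forPaper.txt",
  pp    = "PPextAll_oct2014_forPaper.txt",
  tp    = "TPextAll_oct2014_forPaper.txt"
)

# Read all four input files from data_dir into a named list of data frames.
# ASSUMPTION: files are tab-delimited with a header row; change sep if not.
read_paper_data <- function(data_dir = ".", sep = "\t") {
  paths <- file.path(data_dir, paper_files)

  # Fail early with a clear message if any required file is absent.
  missing_paths <- paths[!file.exists(paths)]
  if (length(missing_paths) > 0) {
    stop("Missing input file(s): ", paste(missing_paths, collapse = ", "))
  }

  # "NA" is the missing data code documented in this README.
  data <- lapply(paths, read.table, header = TRUE, sep = sep,
                 na.strings = "NA", stringsAsFactors = FALSE)
  names(data) <- names(paper_files)
  data
}
```

For example, `dat <- read_paper_data("path/to/repository")` followed by `str(dat$tub)` would show the columns of the water quality file, assuming the delimiter guess is right.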
Raw Data
This repository also contains the raw data that were compiled into the files listed above. Raw data can be found in the directory called raw_data. Sub-directories within this main directory include:
Directory name: cell_counts
Short description: This directory contains microscope images (DAPI-DNA and DAPI-polyphosphate (polyP)) for all treatment replicate views as well as the grid drawing used for manual cell counts. Methods are described in the associated Freshwater Science journal article.
Directory name: feII_analysis
Short description: This directory contains the raw and processed FeII data obtained using the Ferrozine assay and Tecan plate reader described in the methods section of the associated Freshwater Science journal article.
Directory name: icp_analysis
Short description: This directory includes the raw ICP-MS data from this experiment, as described in the methods section of the associated Freshwater Science journal article.
Directory name: srp_analysis
Short description: This folder contains the raw phosphate (aka srp) data from this experiment (rawPdata_20141013.xlsx), the R script used to calibrate these data (P_calibration_code_20141013.R), text files that are used as inputs for the R script (pCalib_, pCheck_, pData_, pDI_), text file outputs of the R script (pNewData_), and a file that combines these inputs and outputs (Pdata_20141013.xlsx).
Filename: oct2014experiment_rawData_forPaper.xlsx
Short description: This file has all the processed data in one place and is the source of the text files mentioned in the 'File List' section of this README file (see tabs with the same names). Briefly, the 'schedule' tab defines the sampling timing and protocol for this experiment as well as the treatment labels. These labels are also explained in the associated Freshwater Science journal article.
Additional related data collected that was not included in the current data package:
All data are included in this package.
Are there multiple versions of the dataset?
Yes, this dataset is also available on the National Agricultural Library's Ag Data Commons at http://bit.ly/2nJYhEb.
Description of methods used for collection/generation of data:
See the associated Freshwater Science journal article for a full description of the methods used to collect and analyze these data.
Methods for processing the data:
See the R scripts in this repository as well as the associated Freshwater Science journal article for a full description of the methods used to collect and analyze these data.
Instrument- or software-specific information needed to interpret the data:
The latest version of the R programming language is required to run the R scripts in this repository. R can be downloaded for free here: https://www.r-project.org/. Microsoft Excel is required to open .xlsx files.
Standards and calibration information, if appropriate:
Information on calibrations is included in the 'Raw Data' section of this README file.
Environmental/experimental conditions:
See the associated Freshwater Science journal article for a full description of the environmental and experimental conditions used while collecting these data.
Describe any quality-assurance procedures performed on the data:
We double-checked manually entered data and plotted data in R (scripts not included) to ensure that errors were not made when manually entering data.
People involved with sample collection, processing, analysis and/or submission:
See the associated Freshwater Science journal article for a full description of author contributions and acknowledgments.
Variable list for allTUBdata_oct2014_forPaper.txt
'HoursFromStart' - Hours from start of experiment.
'SampleNum' - Sample number.
'TubID' - Tub treatment identifier where T1 represents alternating anaerobic/aerobic treatment and T2 represents control treatment that was always aerobic.
'Condition' - Identifier to explain specific condition of treatment for a given sample number where 'air' means the treatment was being bubbled with air and where 'n2' means the treatment was being bubbled with a mixed anaerobic gas (80% N2:20% CO2 gas).
'pH' - pH of overlying water for a given sample.
'DOmgL' - Dissolved oxygen concentration (mg/L) of overlying water for a given sample.
'AvgTempC' - Average temperature (degrees C) of the overlying water for a given sample.
'AvgSRPppm' - Average soluble reactive phosphorus (i.e. phosphate) concentration (ppm) of the overlying water for a given sample as analyzed by the molybdenum blue method.
'StdSRPppm' - Standard deviation of the average soluble reactive phosphorus (i.e. phosphate as P) concentration (ppm) of the overlying water for a given sample as analyzed by the molybdenum blue method.
'TotalPmgL' - Total phosphorus concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'Fe2ppm' - Iron II concentration (ppm) of the overlying water for a given sample as analyzed by ferrozine assay.
'TotalFemgL' - Total iron concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'TotalCamgL' - Total calcium concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'TotalSmgL' - Total sulfur concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'TotalKmgL' - Total potassium concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'TotalMgmgL' - Total magnesium concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
'TotalMnmgL' - Total manganese concentration (mg/L) of the overlying water for a given sample as analyzed by ICP-MS.
Missing data codes
NA - Value was below the detection limit of the instrument.
Variable list for cellCounts_oct2014_forPaper.txt
'SampleID' - Identifier to explain the specific treatment at the end of the experiment for a given field of view where T1 represents anaerobic treatment and T2 represents control treatment that was aerobic.
'SampleRep' - Number of replicate for a particular treatment.
'ViewNumber' - Number of field of view for a particular sample.
'DNACounts' - Number of cells fluorescing under DAPI-DNA filter set for a particular field of view.
'PolyPCounts' - Number of cells fluorescing under DAPI-polyphosphate filter set for a particular field of view.
'Dillution' - Dilution strength of a given sample.
'DNAcellsmL' - Total number of cells per mL of sample.
'PolyPcellsmL' - Total number of cells with stored polyphosphate granules per mL of sample.
'PerPolyP' - Percentage of cells with stored polyphosphate granules for a given sample.
Missing data codes
NA - Cells in view had no polyP granules.
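The 'PerPolyP' description above suggests a simple ratio of the two per-mL cell counts. The R sketch below illustrates that reading; the function name `percent_polyp` is hypothetical, and the formula is an assumption based only on the variable descriptions in this README, not a calculation confirmed from the analysis script.

```r
# Percentage of cells with stored polyP granules.
# ASSUMPTION: PerPolyP = 100 * PolyPcellsmL / DNAcellsmL (inferred from
# the variable descriptions, not confirmed against the analysis script).
percent_polyp <- function(polyp_cells_ml, dna_cells_ml) {
  ratio <- polyp_cells_ml / dna_cells_ml
  # A zero total cell count gives no meaningful percentage; return NA.
  ratio[which(dna_cells_ml == 0)] <- NA_real_
  100 * ratio
}
```

With a data frame `cells` holding the cell count file, `percent_polyp(cells$PolyPcellsmL, cells$DNAcellsmL)` could be compared against the reported 'PerPolyP' column as a quick consistency check.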
Variable list for PPextAll_oct2014_forPaper.txt
'Num' - Row number.
'SampleID' - Identifier to explain the specific treatment at the end of the experiment for a given field of view where T1 represents anaerobic treatment and T2 represents control treatment that was aerobic.
'Replicate' - Replicate number.
'Extraction' - Type of extraction. PP for polyphosphate or TP for total phosphate.
'Period' - Indication whether biofilm sample was taken for analysis at the start or end of the experiment.
'myPppmFix' - Calibrated polyphosphate concentration (ppm) in the digested biofilm (as soluble reactive phosphorus or phosphate as P). Analyzed by the molybdenum blue method.
'wetBFg' - Mass of wet biofilm added to the digestion in grams.
'VolAddedmL' - Volume of solution added to the digestion in mL.
'AvgSAm2' - Average surface area of the cobbles for that particular treatment and period in m^2.
Missing data codes
No missing data codes.
Variable list for TPextAll_oct2014_forPaper.txt
'Num' - Row number.
'SampleID' - Identifier to explain the specific treatment at the end of the experiment for a given field of view where T1 represents anaerobic treatment and T2 represents control treatment that was aerobic.
'Replicate' - Replicate number.
'Extraction' - Type of extraction. PP for polyphosphate or TP for total phosphate.
'Period' - Indication whether biofilm sample was taken for analysis at the start or end of the experiment.
'myPppmFix' - Calibrated total phosphorus concentration (ppm) in the digested biofilm (as soluble reactive phosphorus or phosphate as P). Analyzed by the molybdenum blue method.
'wetBFg' - Mass of wet biofilm added to the digestion in grams.
'VolAddedmL' - Volume of solution added to the digestion in mL.
'AvgSAm2' - Average surface area of the cobbles for that particular treatment and period in m^2.
Missing data codes
No missing data codes.
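The extraction variables above ('myPppmFix', 'VolAddedmL', 'wetBFg') can be combined into a mass of phosphorus per gram of wet biofilm. The R sketch below shows only the unit arithmetic, assuming ppm is equivalent to mg/L for these dilute aqueous digests. The function name `p_per_g_wet_biofilm` is hypothetical, and this is not necessarily the normalization used in the associated article, which may instead use 'AvgSAm2'; see the R scripts in this repository for the actual calculations.

```r
# Phosphorus mass per gram of wet biofilm (mg P / g wet biofilm).
# ASSUMPTION: ppm is treated as mg/L, so mg P = ppm * mL / 1000.
p_per_g_wet_biofilm <- function(p_ppm, vol_added_ml, wet_biofilm_g) {
  p_mass_mg <- p_ppm * vol_added_ml / 1000  # mg/L * L = mg of P in digest
  p_mass_mg / wet_biofilm_g                 # normalize by wet biofilm mass
}
```

For a data frame `pp` holding either extraction file, this could be applied as `p_per_g_wet_biofilm(pp$myPppmFix, pp$VolAddedmL, pp$wetBFg)`.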