diff --git a/.github/workflows/compile-current-nmdc-documentation.yml b/.github/workflows/compile-current-nmdc-documentation.yml
new file mode 100644
index 0000000..f225d91
--- /dev/null
+++ b/.github/workflows/compile-current-nmdc-documentation.yml
@@ -0,0 +1,44 @@
+# This GitHub Actions workflow compiles the Sphinx source documents into web-based (HTML) documentation.
+# Reference: https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions
+name: Compile current NMDC documentation into HTML
+
+on:
+ push: { branches: [ main ] }
+ workflow_dispatch: { }
+ # Allow this workflow to be called by other workflows.
+ # Reference: https://docs.github.com/en/actions/using-workflows/reusing-workflows
+ workflow_call: { }
+
+jobs:
+ compile:
+ name: Compile
+ runs-on: ubuntu-latest
+ defaults:
+ run:
+ # Set a default working directory for all `run` steps in this job.
+ # Reference: https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_iddefaultsrun
+ working-directory: content/nmdc
+ permissions:
+ contents: read
+ steps:
+ - name: Check out commit # Docs: https://github.com/actions/checkout
+ uses: actions/checkout@v4
+ - name: Set up Python # Docs: https://github.com/actions/setup-python
+ uses: actions/setup-python@v5
+ with: { python-version: '3.12' }
+ - name: Install Sphinx # Docs: https://pypi.org/project/Sphinx/
+ run: pip install Sphinx
+ - name: Install other dependencies
+ run: pip install -r requirements.txt
+ - name: Compile source documents into HTML
+ run: sphinx-build -b html src ${{ github.workspace }}/content/nmdc/_build/html
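+ # Note: `src` resolves relative to the job's working directory (content/nmdc),
+ # whereas the output path above is given as an absolute path.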
+ # Upload the result as an "artifact" so it can then be downloaded and used by another job.
+ - name: Save the HTML for publishing later # Docs: https://github.com/actions/upload-artifact
+ uses: actions/upload-artifact@v4
+ with:
+ name: current-nmdc-documentation-as-html
+ # Note: Relative `path` values here are relative to the _workspace_, not to the current working directory.
+ # Reference: https://github.com/actions/upload-artifact/pull/477#issue-2044900649
+ path: content/nmdc/_build/html
+ if-no-files-found: error
+ retention-days: 1 # Note: 1 day is the shortest period possible
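+
+# To reproduce this build locally (a sketch; assumes Python 3.12 and a clone of this repository):
+#   pip install Sphinx -r content/nmdc/requirements.txt
+#   sphinx-build -b html content/nmdc/src content/nmdc/_build/html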
diff --git a/.github/workflows/deploy-to-gh-pages.yml b/.github/workflows/deploy-to-gh-pages.yml
index 8a957f5..596c009 100644
--- a/.github/workflows/deploy-to-gh-pages.yml
+++ b/.github/workflows/deploy-to-gh-pages.yml
@@ -17,6 +17,9 @@ jobs:
compile-legacy-nmdc-documentation:
name: Compile legacy NMDC documentation
uses: ./.github/workflows/compile-legacy-nmdc-documentation.yml
+ compile-current-nmdc-documentation:
+ name: Compile current NMDC documentation
+ uses: ./.github/workflows/compile-current-nmdc-documentation.yml
fetch-and-compile-nmdc-runtime-documentation:
name: Fetch and compile NMDC Runtime documentation
uses: ./.github/workflows/fetch-and-compile-nmdc-runtime-documentation.yml
@@ -30,6 +33,7 @@ jobs:
# Reference: https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idneeds
needs:
- compile-legacy-nmdc-documentation
+ - compile-current-nmdc-documentation
- fetch-and-compile-nmdc-runtime-documentation
- fetch-and-compile-mag-workflow-documentation
runs-on: ubuntu-latest
@@ -47,6 +51,7 @@ jobs:
ls -R artifacts
mkdir -p _build/html _build/html/legacy _build/html/workflows
cp -R artifacts/legacy-nmdc-documentation-as-html _build/html/legacy/nmdc-documentation
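+ # The current NMDC docs get served from the site's /nmdc path (legacy docs live under /legacy/).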
+ cp -R artifacts/current-nmdc-documentation-as-html _build/html/nmdc
cp -R artifacts/nmdc-runtime-documentation-as-html _build/html/nmdc-runtime-documentation
cp -R artifacts/mag-workflow-documentation-as-html _build/html/workflows/mag-workflow-documentation
cp -R content/assets _build/html/assets
diff --git a/README.md b/README.md
index e489465..57e9b35 100644
--- a/README.md
+++ b/README.md
@@ -17,13 +17,12 @@ This repository contains the content that we compile into our
* [Table of contents](#table-of-contents)
* [Repository structure](#repository-structure)
* [Content](#content)
+ * [Maintenance](#maintenance)
+ * [Procedure: Basic (to edit 1 file)](#procedure-basic-to-edit-1-file)
+ * [Procedure: Intermediate (to edit 1+ files)](#procedure-intermediate-to-edit-1-files)
* [Legacy content](#legacy-content)
* [NMDC documentation](#nmdc-documentation)
* [Omissions](#omissions)
- * [Maintenance](#maintenance)
- * [Prerequisites](#prerequisites)
- * [Procedure: Basic (to edit 1 file)](#procedure-basic-to-edit-1-file)
- * [Procedure: Intermediate (to edit 1+ files)](#procedure-intermediate-to-edit-1-files)
* [Code](#code)
* [Repository-level configuration files and documentation](#repository-level-configuration-files-and-documentation)
* [GitHub Actions](#github-actions)
@@ -37,14 +36,65 @@ This repository contains the content that we compile into our
This repository has the following sections:
1. `./content`: Current, high-level content about NMDC
+ 1. `nmdc`: Current content (under construction), initialized as a copy of the legacy content
2. `./legacy`: Legacy content we include in the website to support legacy references/publications
3. `./src`: Code we use to compile local and remote content into a website
4. `./`: Repository-level configuration files and documentation
### Content
-> [!NOTE]
-> TODO
+The `./content/nmdc` directory contains our current documentation that is not pulled from an external repository.
+This directory began as a 1-to-1 copy of the `./legacy/nmdc-documentation` directory. The latter is, itself, mostly a
+copy of the `NMDC_documentation` repository (more details about this are in the "Legacy content" section below).
+
+Unlike the contents of the `./legacy/nmdc-documentation` directory, the contents of the `./content/nmdc` directory will
+continue to change over time; i.e., NMDC team members will update the documentation in this directory and add to it.
+
+#### Maintenance
+
+This documentation is implemented within the [Sphinx](https://www.sphinx-doc.org) framework.
+The content is organized according to the
+[Diátaxis](https://diataxis.fr/how-to-use-diataxis/#use-diataxis-as-a-guide-not-a-plan) guidelines.
+
+Here's how you can make (technically, "propose") changes to this documentation:
+
+> **Note:** The high-level process may be familiar to you: (1) create a GitHub Issue, (2) create a branch associated
+> with that Issue, (3) make changes on that branch, and (4) create a Pull Request to merge that branch into `main`.
+> You can use whatever workflow you like to follow that process. Here are some example workflows:
+
+##### Procedure: Basic (to edit 1 file)
+
+1. Create a GitHub Issue describing what you want to change (e.g. "Fix Foo in Bar")
+2. On GitHub, go to the file within `./content/nmdc/src` that you want to edit
+3. Click the "Edit this file" button (i.e. the pencil icon button) at the upper right
+4. Edit the file
+5. Click the "Commit changes..." button at the upper right
+6. Customize the commit message to tell others what you did (e.g. "`Fix typo in link`")
+7. Select the option that says "**Create a new branch** for this commit and start a pull request"
+8. (Recommended) Customize the branch name so it starts with the GitHub Issue number (e.g. `123-fix-foo-in-bar`)
+9. Click "Propose changes"
+10. Fill in the Pull Request form and click "Create pull request"
+
+You will end up with a Pull Request (PR) containing the changes. Once the PR gets merged into `main`,
+the documentation website will automatically be updated to reflect the changes.
+
+##### Procedure: Intermediate (to edit 1+ files)
+
+1. Create a GitHub Issue describing what you want to change (e.g. "Fix Foo in Bar")
+2. Visit https://github.dev/microbiomedata/docs/
+3. Click the branch name (e.g. `main`) at the lower left
+4. Click "Create a new branch..." at the top
+5. Enter a name for the branch, beginning with an issue number (e.g. `123-fix-foo-in-bar`)
+6. (If prompted) Click "Switch to Branch"
+7. Make changes in `./content/nmdc/src`
+8. Click the "Source Control" icon in the left sidebar (3rd from the top)
+9. Hover over the "Changes" heading and click the `+` icon that appears
+10. Enter a commit message to tell others what you did (e.g. "`Fix typo in link`")
+11. Click the "Commit & Push" button
+12. Visit https://github.com/microbiomedata/docs/ and create a Pull Request for that branch
+
+You will end up with a Pull Request (PR) containing the changes. Once the PR gets merged into `main`,
+the documentation website will automatically be updated to reflect the changes.
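+
+If you prefer working locally, here is a command-line sketch of the same process (assuming you have `git`
+installed and this repository cloned; the issue number and messages below are placeholders):
+
+```sh
+# 1. Create a branch whose name starts with the GitHub Issue number.
+git checkout -b 123-fix-foo-in-bar
+
+# 2. Edit files in ./content/nmdc/src, then commit and push the changes.
+git add content/nmdc/src
+git commit -m "Fix typo in link"
+git push --set-upstream origin 123-fix-foo-in-bar
+```
+
+Then, visit https://github.com/microbiomedata/docs/ and create a Pull Request for that branch.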
### Legacy content
@@ -58,7 +108,6 @@ That was the latest commit on the `main` branch as of August 28, 2024.
In addition to the files we copied, the directory also contains some files that are _exclusive_ to this repository;
e.g., `Dockerfile` and `.gitignore`.
-
##### Omissions
When copying the aforementioned files from the `NMDC_documentation` repository, we _omitted_ the following files:
@@ -85,53 +134,6 @@ git clone https://github.com/microbiomedata/NMDC_documentation.git /tmp/NMDC_doc
git diff --stat /tmp/NMDC_documentation/docs ./legacy/nmdc-documentation/src
```
-##### Maintenance
-
-This documentation is implemented within the [Sphinx](https://www.sphinx-doc.org) framework.
-The content is organized according to the [Diátaxis](https://diataxis.fr/how-to-use-diataxis/#use-diataxis-as-a-guide-not-a-plan) guidelines.
-
-Here's how you can propose changes to this documentation:
-
-> Note: The general flow is: (1) create a GitHub Issue, (2) create a branch associated with that Issue,
-> (3) make changes on that branch, and (4) create a Pull Request to merge that branch into the `main` branch.
-> The following are a couple of the many ways someone can do those things (other ways are also OK).
-
-###### Prerequisites
-
-1. Create a GitHub Issue describing what you want to change (e.g. "Fix Foo in Bar")
-
-###### Procedure: Basic (to edit 1 file)
-
-1. On GitHub, go to the file within `legacy/nmdc-documentation/src/` that you want to edit
-2. Click the "Edit this file" button (i.e. the pencil icon button) at the upper right
-3. Edit the file
-4. Click the "Commit changes..." button at the upper right
-5. Customize the commit message to tell others what you did (e.g. "`Fix typo in link`")
-6. Mark the bubble that says "**Create a new branch** for this commit and start a pull request"
-7. (Recommended) Customize the branch name so it starts with the GitHub Issue number (e.g. `123-fix-foo-in-bar`)
-8. Click "Propose changes"
-9. Fill in the Pull Request form and click "Create pull request"
-
-You will end up with a Pull Request (PR) containing the changes. Once the PR gets merged into `main`,
-the documentation website will automatically be updated to reflect the changes.
-
-###### Procedure: Intermediate (to edit 1+ files)
-
-1. Visit https://github.dev/microbiomedata/docs/
-2. Click the branch name (e.g. `main`) at the lower left
-3. Click "Create a new branch..." at the top
-4. Enter a name for the branch, beginning with an issue number (e.g. `123-fix-foo-in-bar`)
-5. (If prompted) Click "Switch to Branch"
-6. Make changes in `legacy/nmdc-documentation/src`
-7. Click the "Source Control" icon in the left sidebar (3rd from the top)
-8. Hover over the "Changes" heading and click the `+` icon that appears
-9. Enter a commit message to tell others what you did (e.g. "`Fix typo in link`")
-10. Click the "Commit & Push" button
-11. Visit https://github.com/microbiomedata/docs/ and create a Pull Request for that branch
-
-You will end up with a Pull Request (PR) containing the changes. Once the PR gets merged into `main`,
-the documentation website will automatically be updated to reflect the changes.
-
### Code
> [!NOTE]
@@ -159,13 +161,15 @@ Assuming you have Docker installed, you can spin up the development environment
docker compose up
```
-That will run a web server, serving the legacy section of the website at the following URL:
+That will start up several Docker containers, whose web servers you can access via the URLs below:
-- http://localhost:50000
+- http://localhost:5000 - the home page of the centralized documentation website
+- http://localhost:5001 - the legacy documentation website
+- http://localhost:5002 - the current documentation website
In addition, whenever you make changes to content,
-the associated sections of the website will automatically be rebuilt
-(at which point, you can refresh your web browser to see the newly-rebuilt sections).
+the associated section of the corresponding website will automatically be rebuilt
+(at which point, you can refresh your web browser to see the newly-rebuilt section).
# TODO
diff --git a/content/index.html b/content/index.html
index db55e2f..0de89dd 100644
--- a/content/index.html
+++ b/content/index.html
@@ -98,12 +98,17 @@
Running bioinformatics workflows
Other stuff
- Explore documentation produced during earlier phases of the project.
+ Read tutorials, how-to guides, references, and explanations of various aspects of the NMDC.
diff --git a/content/nmdc/.gitignore b/content/nmdc/.gitignore
new file mode 100644
index 0000000..63b55de
--- /dev/null
+++ b/content/nmdc/.gitignore
@@ -0,0 +1,2 @@
+# Ignore the web-based documentation generated by Sphinx.
+/_build
diff --git a/content/nmdc/Dockerfile b/content/nmdc/Dockerfile
new file mode 100644
index 0000000..5822900
--- /dev/null
+++ b/content/nmdc/Dockerfile
@@ -0,0 +1,21 @@
+# Base this container image upon the official Python container image.
+# Reference: https://hub.docker.com/_/python
+FROM python:3.12
+
+WORKDIR /app
+
+# Install a package that can be used to effectively run Sphinx in "watch" mode,
+# wherein the web-based documentation will automatically be rebuilt whenever
+# a source file changes.
+# Reference: https://github.com/sphinx-doc/sphinx-autobuild
+RUN pip3 install sphinx-autobuild
+
+COPY requirements.txt .
+RUN pip3 install -r requirements.txt
+
+EXPOSE 8000
+
+# Run `sphinx-autobuild` in a way that, whenever something changes
+# in the `src` directory, it rebuilds the web-based documentation
+# and stores the result in `_build/html`.
+CMD ["sphinx-autobuild", "--host", "0.0.0.0", "src", "_build/html"]
diff --git a/content/nmdc/requirements.txt b/content/nmdc/requirements.txt
new file mode 100644
index 0000000..83dac51
--- /dev/null
+++ b/content/nmdc/requirements.txt
@@ -0,0 +1,7 @@
+myst-parser
+sphinx_markdown_tables
+sphinx_rtd_theme
+
+# Sphinx plugin that handles redirects.
+# Reference: https://pypi.org/project/sphinx-reredirects/
+sphinx-reredirects
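+
+# Note: Sphinx itself is not listed here; it gets installed separately
+# (see the Dockerfile and the "compile" GitHub Actions workflow).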
diff --git a/content/nmdc/src/_static/css/custom.css b/content/nmdc/src/_static/css/custom.css
new file mode 100644
index 0000000..a17d6bf
--- /dev/null
+++ b/content/nmdc/src/_static/css/custom.css
@@ -0,0 +1,13 @@
+/* Hide "On Read the Docs" section from versions menu */
+div.rst-versions > div.rst-other-versions > div.injected > dl:nth-child(3) {
+ display: none;
+}
+/* Hide "On GitHub" section from versions menu */
+div.rst-versions > div.rst-other-versions > div.injected > dl:nth-child(4) {
+ display: none;
+}
+
+/* Customize text wrapping in table cells */
+.wy-table-responsive table td, .wy-table-responsive table th {
+ white-space: inherit;
+}
\ No newline at end of file
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_cursor.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_cursor.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_cursor.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_data_objects_data_object_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_data_objects_data_object_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_data_objects_data_object_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_note.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_note.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_note.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step1.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step2.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step3and4.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step3and4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step3and4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step5.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_example_step5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities_activity_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities_activity_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_activities_activity_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples_sample_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples_sample_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_biosamples_sample_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects_study.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects_study.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_data_objects_study.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies_study_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies_study_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/find_get_studies_study_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step1.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step2.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step3.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step4and5.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step4and5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step4and5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step6.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_example_step6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name_doc_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name_doc_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_name_doc_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_stats.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_stats.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_collection_stats.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_doc_id.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_doc_id.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_doc_id.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_nmdcschema_version.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_nmdcschema_version.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_get_nmdcschema_version.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_page_token_param.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_page_token_param.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_page_token_param.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_changesheets_validate.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_changesheets_validate.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_changesheets_validate.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_json_validate.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_json_validate.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_json_validate.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_validate_urls_file.png b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_validate_urls_file.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/api_gui/metadata_post_validate_urls_file.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/data_mgt/data_mgt_list.png b/content/nmdc/src/_static/images/howto_guides/data_mgt/data_mgt_list.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/data_mgt/data_mgt_list.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/Create_submission.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/Create_submission.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/Create_submission.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_results.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_results.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_results.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_term_search.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_term_search.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/KO_term_search.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/ORCiD.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/ORCiD.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/ORCiD.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/PI_search.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/PI_search.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/PI_search.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/bar_plot.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/bar_plot.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/bar_plot.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/bulk_download.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/bulk_download.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/bulk_download.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/color_legend.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/color_legend.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/color_legend.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/column_help.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_help.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_help.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/column_search.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_search.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_search.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/column_visibility.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_visibility.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/column_visibility.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/date.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/date.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/date.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/depth.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/depth.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/depth.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/download_individual_file.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/download_individual_file.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/download_individual_file.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/enviro_package.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/enviro_package.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/enviro_package.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/envo.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/envo.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/envo.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_map.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_map.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_map.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_name.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_name.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/geographic_name.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/gold_classification.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/gold_classification.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/gold_classification.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/instrument_name.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/instrument_name.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/instrument_name.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/jump_to_column.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/jump_to_column.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/jump_to_column.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/latitude.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/latitude.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/latitude.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/longitude.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/longitude.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/longitude.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/multiomics.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/multiomics.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/multiomics.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/omics_type.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/omics_type.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/omics_type.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/portal_functionality.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/portal_functionality.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/portal_functionality.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/processing_institution.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/processing_institution.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/processing_institution.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/sankey_diagram.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/sankey_diagram.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/sankey_diagram.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/shipping_info.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/shipping_info.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/shipping_info.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/study_info.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/study_info.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/study_info.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_enviro_package.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_enviro_package.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_enviro_package.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_input.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_input.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/sub_portal_input.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/submission_context.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/submission_context.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/submission_context.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/temporal_slider.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/temporal_slider.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/temporal_slider.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/upset_plot.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/upset_plot.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/upset_plot.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/portal_guide/validate.png b/content/nmdc/src/_static/images/howto_guides/portal_guide/validate.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/portal_guide/validate.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/MAGs/image7.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/NOM/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metaT/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAnnotation/image7.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image7.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image8.png b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image8.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/metagenomeAssembly/image8.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image10.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image10.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image10.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image11.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image11.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image11.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image12.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image12.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image12.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image13.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image13.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image13.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image14.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image14.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image14.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image15.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image15.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image15.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image2.png
new file mode 100644
index 0000000..dc2c201
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image7.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image8.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image8.png
new file mode 100644
index 0000000..5cbbd35
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image8.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image9.png b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image9.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/quickStart/image9.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image10.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image10.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image10.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image7.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image8.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image8.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image8.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image9.png b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image9.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readBasedTaxonomy/image9.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image1.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image1.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image2.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image2.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image3.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image3.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image4.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image4.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image5.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image5.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image6.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image6.png differ
diff --git a/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image7.png b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image7.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/howto_guides/workflows/readsQC/image7.png differ
diff --git a/content/nmdc/src/_static/images/nmdc-logo-bg-trans.png b/content/nmdc/src/_static/images/nmdc-logo-bg-trans.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/nmdc-logo-bg-trans.png differ
diff --git a/content/nmdc/src/_static/images/nmdc-logo-bg-white.png b/content/nmdc/src/_static/images/nmdc-logo-bg-white.png
new file mode 100644
index 0000000..a8714b4
Binary files /dev/null and b/content/nmdc/src/_static/images/nmdc-logo-bg-white.png differ
diff --git a/content/nmdc/src/_static/images/other/construction_1_2887593.jpg b/content/nmdc/src/_static/images/other/construction_1_2887593.jpg
new file mode 100644
index 0000000..d3b6b1e
Binary files /dev/null and b/content/nmdc/src/_static/images/other/construction_1_2887593.jpg differ
diff --git a/content/nmdc/src/_static/images/other/construction_2_3434723.jpg b/content/nmdc/src/_static/images/other/construction_2_3434723.jpg
new file mode 100644
index 0000000..c1c88d0
Binary files /dev/null and b/content/nmdc/src/_static/images/other/construction_2_3434723.jpg differ
diff --git a/content/nmdc/src/_static/images/other/construction_3_4232388.jpg b/content/nmdc/src/_static/images/other/construction_3_4232388.jpg
new file mode 100644
index 0000000..9d780e0
Binary files /dev/null and b/content/nmdc/src/_static/images/other/construction_3_4232388.jpg differ
diff --git a/content/nmdc/src/_static/images/other/construction_4_kindpng_1353982.png b/content/nmdc/src/_static/images/other/construction_4_kindpng_1353982.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/other/construction_4_kindpng_1353982.png differ
diff --git a/content/nmdc/src/_static/images/other/construction_5_kindpng_2708235.png b/content/nmdc/src/_static/images/other/construction_5_kindpng_2708235.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/other/construction_5_kindpng_2708235.png differ
diff --git a/content/nmdc/src/_static/images/overview/diataxis_documentation_graphic.png b/content/nmdc/src/_static/images/overview/diataxis_documentation_graphic.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/overview/diataxis_documentation_graphic.png differ
diff --git a/content/nmdc/src/_static/images/reference/data_portal/nmdc-diagram.svg b/content/nmdc/src/_static/images/reference/data_portal/nmdc-diagram.svg
new file mode 100644
index 0000000..8aea0bc
--- /dev/null
+++ b/content/nmdc/src/_static/images/reference/data_portal/nmdc-diagram.svg
@@ -0,0 +1,4 @@
+
+
+
+
\ No newline at end of file
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img1.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img1.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img1.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img2.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img2.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img2.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img3.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img3.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img3.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img4.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img4.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img4.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img5.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img5.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img5.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img6.png b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img6.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/NMDC_metadata_img6.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/ReadAnalysis_workflow.png b/content/nmdc/src/_static/images/reference/metadata/ReadAnalysis_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/ReadAnalysis_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-trans.png b/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-trans.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-trans.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-white.png b/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-white.png
new file mode 100644
index 0000000..a8714b4
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/nmdc-logo-bg-white.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.png b/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.png differ
diff --git a/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.svg b/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.svg
new file mode 100644
index 0000000..84437f1
--- /dev/null
+++ b/content/nmdc/src/_static/images/reference/metadata/test_schema_uml.svg
@@ -0,0 +1,263 @@
+
+
+
+
diff --git a/content/nmdc/src/_static/images/reference/workflows/1_RQC_rqc_workflow.png b/content/nmdc/src/_static/images/reference/workflows/1_RQC_rqc_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/1_RQC_rqc_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/2_MAG_MAG_workflow.png b/content/nmdc/src/_static/images/reference/workflows/2_MAG_MAG_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/2_MAG_MAG_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/2_ReadAnalysis_readbased_analysis_workflow.png b/content/nmdc/src/_static/images/reference/workflows/2_ReadAnalysis_readbased_analysis_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/2_ReadAnalysis_readbased_analysis_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/3_MetaGAssemly_workflow_assembly.png b/content/nmdc/src/_static/images/reference/workflows/3_MetaGAssemly_workflow_assembly.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/3_MetaGAssemly_workflow_assembly.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/4_MetaGAnnotation_annotation.png b/content/nmdc/src/_static/images/reference/workflows/4_MetaGAnnotation_annotation.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/4_MetaGAnnotation_annotation.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/5_MAG_MAG_workflow.png b/content/nmdc/src/_static/images/reference/workflows/5_MAG_MAG_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/5_MAG_MAG_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/5_Metaproteomics_workflow_diagram.png b/content/nmdc/src/_static/images/reference/workflows/5_Metaproteomics_workflow_diagram.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/5_Metaproteomics_workflow_diagram.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/5_ReadAnalysis_readbased_analysis_workflow.png b/content/nmdc/src/_static/images/reference/workflows/5_ReadAnalysis_readbased_analysis_workflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/5_ReadAnalysis_readbased_analysis_workflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/6_MetaT_metaT_figure.png b/content/nmdc/src/_static/images/reference/workflows/6_MetaT_metaT_figure.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/6_MetaT_metaT_figure.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/6_MetaT_workflow_metatranscriptomics.png b/content/nmdc/src/_static/images/reference/workflows/6_MetaT_workflow_metatranscriptomics.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/6_MetaT_workflow_metatranscriptomics.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/6_Metabolomics_metamsworkflow.png b/content/nmdc/src/_static/images/reference/workflows/6_Metabolomics_metamsworkflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/6_Metabolomics_metamsworkflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/6_Metaproteomics_workflow_diagram.png b/content/nmdc/src/_static/images/reference/workflows/6_Metaproteomics_workflow_diagram.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/6_Metaproteomics_workflow_diagram.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/7_Metabolomics_metamsworkflow.png b/content/nmdc/src/_static/images/reference/workflows/7_Metabolomics_metamsworkflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/7_Metabolomics_metamsworkflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_detailed_workflow_diagram.png b/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_detailed_workflow_diagram.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_detailed_workflow_diagram.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_workflow_diagram.png b/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_workflow_diagram.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/7_Metaproteomics_workflow_diagram.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/8_Metabolomics_metamsworkflow.png b/content/nmdc/src/_static/images/reference/workflows/8_Metabolomics_metamsworkflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/8_Metabolomics_metamsworkflow.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/9_MetaT_Workflow_metatranscriptomics.png b/content/nmdc/src/_static/images/reference/workflows/9_MetaT_Workflow_metatranscriptomics.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/9_MetaT_Workflow_metatranscriptomics.png differ
diff --git a/content/nmdc/src/_static/images/reference/workflows/9_NOM_enviromsworkflow.png b/content/nmdc/src/_static/images/reference/workflows/9_NOM_enviromsworkflow.png
new file mode 100644
index 0000000..ba5213f
Binary files /dev/null and b/content/nmdc/src/_static/images/reference/workflows/9_NOM_enviromsworkflow.png differ
diff --git a/content/nmdc/src/_templates/breadcrumbs.html.disabled b/content/nmdc/src/_templates/breadcrumbs.html.disabled
new file mode 100644
index 0000000..339f008
--- /dev/null
+++ b/content/nmdc/src/_templates/breadcrumbs.html.disabled
@@ -0,0 +1,4 @@
+{%- extends "sphinx_rtd_theme/breadcrumbs.html" %}
+
+{% block breadcrumbs_aside %}
+{% endblock %}
\ No newline at end of file
diff --git a/content/nmdc/src/conf.py b/content/nmdc/src/conf.py
new file mode 100644
index 0000000..ef63cfc
--- /dev/null
+++ b/content/nmdc/src/conf.py
@@ -0,0 +1,79 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+
+# sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'NMDC Documentation'
+copyright = '2024, The NMDC Team'
+author = 'The NMDC Team'
+
+# The full version, including alpha/beta/rc tags
+release = '0.1'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+ 'myst_parser',
+ 'sphinx_markdown_tables',
+ 'sphinx_reredirects'
+]
+
+# source_suffix = '.rst'
+source_suffix = ['.rst', '.md']
+# Add any paths that contain templates here, relative to this directory.
+# templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+html_title = 'NMDC'
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+# These paths are either relative to html_static_path
+# or fully qualified paths (eg. https://...)
+html_css_files = [
+ 'css/custom.css',
+]
+html_logo = "_static/images/nmdc-logo-bg-white.png"
+
+# -- Redirects ------------------------------------------
+
+# Redirect old schema documentation URLs to the schema documentation
+# that is automatically kept in sync with the schema.
+# Reference: https://pypi.org/project/sphinx-reredirects/
+redirects = {
+ "reference/metadata/xylene": "https://w3id.org/nmdc/xylene", # the latter redirects to: https://microbiomedata.github.io/nmdc-schema/xylene/
+ "reference/metadata/*": "https://w3id.org/nmdc/nmdc",
+}
\ No newline at end of file
diff --git a/content/nmdc/src/explanation/community_conversations.md b/content/nmdc/src/explanation/community_conversations.md
new file mode 100644
index 0000000..faca757
--- /dev/null
+++ b/content/nmdc/src/explanation/community_conversations.md
@@ -0,0 +1,51 @@
+# Community Conversations
+Through this series, the NMDC team invites the broader research community to join us to learn about the breadth of NMDC program activities and partnerships, engage in lively conversations with the NMDC team, and contribute feedback and ideas to inform the future directions of the NMDC.
+
+### 7. [How High are Your Standards? The Importance of Metadata Standards for FAIR Data](https://www.youtube.com/watch?v=SkKm_bGV1CE)
+In this Community Conversation, you will learn what it looks like to contribute to community standards and the importance of using these standards in your own work. (Apr. 6, 2022)
+>
+>[Recording](https://www.youtube.com/watch?v=SkKm_bGV1CE)
+>
+>[Shared Notes](https://docs.google.com/document/d/1kGwNwk7pfev63BaAdlGJxO0JQgsYdww6g9yCgfpSlCA/edit#heading=h.5b9q553zw0w0)
+
+### 6. [The True Potential of Citing Data](https://www.youtube.com/watch?v=bmxhi1C3QR4)
+This Community Conversation will help our community understand the data citation landscape and increase awareness of best practices for getting and giving credit to data. (Mar. 9, 2022)
+>
+>[Recording](https://www.youtube.com/watch?v=bmxhi1C3QR4)
+>
+>[Shared Notes](https://docs.google.com/document/d/1kh3-9zLTL_S8burzmBgpq1CQdhrhcB89n9aZnD9JQQA/edit)
+
+### 5. [Enabling Shared Ownership of Science Gateways with User-Centered Design](https://www.youtube.com/watch?v=VrUCRmMVfiE)
+This Community Conversation will focus on how a user centered design approach enables successful development of science gateways. (Feb. 2, 2022)
+>
+>[Recording](https://www.youtube.com/watch?v=VrUCRmMVfiE)
+>
+>[Shared Notes](https://docs.google.com/document/d/1ES6x15SUqz82wVEaRlKBaSjw26mu-NfX-sfdNVc1_kk/edit#heading=h.kvova04igq3m)
+
+### 4. [Flowing into Shared Data: Advantages and Disadvantages of Easily Distributable Containerized Open Source Bioinformatics Workflows](https://www.youtube.com/watch?v=cAW5sa0kxb0)
+A panel of experts will discuss the advantages of containerization and open source tools for computing. (Jan. 12, 2022)
+>
+>[Recording](https://www.youtube.com/watch?v=cAW5sa0kxb0)
+>
+>[Shared Notes](https://docs.google.com/document/d/1djLDcOe9zVt5IFznBm33jqzD_wHhaEkT-B4M1rYqA_E/edit#heading=h.oefl3ce58jxh)
+
+### 3. [Developing Data Science resources in Partnership with Scientific Communities](https://www.youtube.com/watch?v=T3qYcPEn4x4)
+In this webinar, the Center for Scientific Collaboration and Community Engagement illustrates the importance of supporting multiple modes of community engagement, and highlights the role of the most involved community members, in maintaining, growing, and evolving scientific communities and their core activities. (Dec. 1, 2021)
+>
+>[Recording](https://www.youtube.com/watch?v=T3qYcPEn4x4)
+>
+>[Shared Notes](https://docs.google.com/document/d/1LcNnLO8ZSFRzZAOvoy8QSDhmV4UcgmKX_zRyNB-lNU4/edit#heading=h.oefl3ce58jxh)
+
+### 2. [Making Ontologies Work for Microbiome Research](https://www.youtube.com/watch?v=S0fEKwDH2MU)
+A panel of experts will discuss the power of ontologies and how to build quality bidirectional relationships between biologists and ontologists to maximize the usability of ontologies. (Nov. 3, 2021)
+>
+>[Recording](https://www.youtube.com/watch?v=S0fEKwDH2MU)
+>
+>[Shared Notes](https://docs.google.com/document/d/1p6Xend5_wAb8055Dyg8-HVtGXof1tPsoKUQ-deOccTA/edit)
+
+### 1. [Setting Up for Success: The Importance of a Data Management Plan](https://www.youtube.com/watch?v=5vQ6FFyyZoA)
+Jointly sponsored by the California Digital Library’s [DMPTool](https://dmptool.org/), this conversation will showcase how creating robust data management plans is a vital first step in developing FAIR data. (Oct. 6, 2021)
+>
+>[Recording](https://www.youtube.com/watch?v=5vQ6FFyyZoA)
+>
+>[Shared Notes](https://docs.google.com/document/d/1AUZFkvYHQtnHnov6jFzorV9XDL1_TsQtSJJAhyjTvT4/edit)
diff --git a/content/nmdc/src/explanation/fair_data.md b/content/nmdc/src/explanation/fair_data.md
new file mode 100644
index 0000000..59440f6
--- /dev/null
+++ b/content/nmdc/src/explanation/fair_data.md
@@ -0,0 +1,16 @@
+# FAIR data
+
+### Making microbiome data Findable, Accessible, Interoperable, and Reusable (FAIR).
+
+The NMDC is committed to adhering to [FAIR Guiding Principles](https://www.nature.com/articles/sdata201618) with regard to data management best practices. As part of this commitment, the NMDC has established the [FAIR Microbiome Implementation Network](https://www.go-fair.org/implementation-networks/overview/fair-microbiome/) (IN), a consortium interested in microbiome research that has agreed upon a set of objectives and common vision for implementing FAIR principles in the context of microbiome data science.
+
+The FAIR Microbiome IN seeks to align and synergize the efforts of the NMDC with those of the [GO FAIR](https://www.go-fair.org/) community, leveraging GO FAIR's open and inclusive [ecosystem of Implementation Networks](https://www.go-fair.org/implementation-networks/overview/) for the microbiome research community (e.g., [Chemistry IN](https://www.go-fair.org/implementation-networks/overview/chemistryin/), [Metabolomics IN](https://www.go-fair.org/implementation-networks/overview/metabolomics/), Biodiversity INs). The purpose of the FAIR Microbiome IN is to work with microbiome research communities to:
+
+* Formalize core and domain-specific microbiome ontologies that promote discovery and reuse, and
+* Establish training on the NMDC data models that allow for broader dissemination of knowledge and compliance for both humans and machines.
+
+The NMDC is working across research teams, funders, publishers, and societies/consortiums to provide training and community engagement on data standards, a key component to making data FAIR. For example, the NMDC is a registered database with [FAIRsharing](https://fairsharing.org/biodbcore-001563/), which provides journals with a curated source of recommended repositories. The NMDC is also a member of ORCID, and registered with the DOE Office of Scientific and Technical Information (OSTI), which enables the NMDC to assign Digital Object Identifiers (DOIs) to any data set submitted and processed through the NMDC workflows.
+
+This documentation will be updated with additional FAIR-related efforts as the NMDC pilot continues to be developed.
+
+Reference: Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18
diff --git a/content/nmdc/src/explanation/idea.md b/content/nmdc/src/explanation/idea.md
new file mode 100644
index 0000000..b484276
--- /dev/null
+++ b/content/nmdc/src/explanation/idea.md
@@ -0,0 +1,16 @@
+# IDEA
+
+The National Microbiome Data Collaborative’s (NMDC) vision is to empower the research community to harness microbiome data exploration and discovery through a collaborative integrative science ecosystem. The NMDC team is constructing this discovery portal in the midst of a technological and data revolution. With the advent of new technologies and massive amounts of data being produced by these technologies, the scientific community’s capacity for innovation, experimentation, and adaptation has increased dramatically. Data, and particularly open data, play a crucial role in advancing innovation, governance, and self-determination among vulnerable communities [(Carroll 2020)](https://datascience.codata.org/articles/10.5334/dsj-2020-043/). Yet, “vulnerable populations remain both under-studied and under-consulted on the use of data…restraining the utility of big data applications” to contribute to inclusive innovation [(Jackson 2019)](https://www.frontiersin.org/articles/10.3389/fdata.2019.00019/full).
+
+Diversity within microbiome research, in all its forms (racial, gender, sexual identity, class, and more), strengthens research teams and practice, and helps advance science. Significant parallel, non-technical efforts are required to ensure microbiome data science, new technologies, and infrastructure developments work in the best interests of the research community and society at large. We are committed to supporting the diversity of experiences, expertise, backgrounds, needs, and perspectives of the microbiome research community, and to actively work towards an inclusive culture at a programmatic and individual level.
+
+Understanding that engagement of vulnerable populations and working to correct these systemic exclusions of data lead to higher quality data generation and more diverse outcomes, the NMDC has constructed an Inclusion, Diversity, Equity, and Accountability (IDEA) strategic plan. This strategy is a living document that will grow and evolve as the NMDC Team continues to engage with the community for feedback and work with partners to make an inclusive, diverse, and equitable environment. If you would like to engage with the NMDC, please reach out to support@microbiomedata.org.
+Our Goals:
+
+>Goal 1: Promote transparency and accountability within NMDC’s Team and Operations.
+>
+>Goal 2: Promote transparency and accountability within NMDC’s Governance Structure.
+>
+>Goal 3: Engage and support diverse stakeholders and users.
+
+You can find more information about the [NMDC IDEA Strategic Plan](https://microbiomedata.org/idea-strategic-plan/) on the NMDC website.
diff --git a/content/nmdc/src/explanation/publications.md b/content/nmdc/src/explanation/publications.md
new file mode 100644
index 0000000..1bb0747
--- /dev/null
+++ b/content/nmdc/src/explanation/publications.md
@@ -0,0 +1,12 @@
+# Publications
+
+### NMDC Team Publications
+* Hu B, Canon S, Eloe-Fadrosh EA, Anubhav, Babinski M, Corilo Y, Davenport K, Duncan WD, Fagnan K, Flynn M, Foster B, Hays D, Huntemann M, Jackson EKP, Kelliher J, Li PE, Lo CC, Mans D, McCue LA, Mouncey N, Mungall CJ, Piehowski PD, Purvine SO, Smith M, Varghese NJ, Winston D, Xu Y, Chain PSG. [Challenges in Bioinformatics Workflows for Processing Microbiome Omics Data at Scale.](https://www.frontiersin.org/articles/10.3389/fbinf.2021.826370/full) Front Bioinform. 2022 Jan 17;1:826370. doi: 10.3389/fbinf.2021.826370.
+* Eloe-Fadrosh EA, Ahmed F, Anubhav, Babinski M, Baumes J, Borkum M, Bramer L, Canon S, Christianson DS, Corilo YE, Davenport KW, Davis B, Drake M, Duncan WD, Flynn MC, Hays D, Hu B, Huntemann M, Kelliher J, Lebedeva S, Li PE, Lipton M, Lo CC, Martin S, Millard D, Miller K, Miller MA, Piehowski P, Jackson EP, Purvine S, Reddy TBK, Richardson R, Rudolph M, Sarrafan S, Shakya M, Smith M, Stratton K, Sundaramurthi JC, Vangay P, Winston D, Wood-Charlson EM, Xu Y, Chain PSG, McCue LA, Mans D, Mungall CJ, Mouncey NJ, Fagnan K. [The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource.](https://academic.oup.com/nar/article/50/D1/D828/6414581?login=true) Nucleic Acids Res. 2022 Jan 7;50(D1):D828–D836. doi: 10.1093/nar/gkab990.
+* Vangay P, Burgin J, Johnston A, Beck KL, Berrios DC, Blumberg K, Canon S, Chain P, Chandonia JM, Christianson D, Costes SV, Damerow J, Duncan WD, Dundore-Arias JP, Fagnan K, Galazka JM, Gibbons SM, Hays D, Hervey J, Hu B, Hurwitz BL, Jaiswal P, Joachimiak MP, Kinkel L, Ladau J, Martin SL, McCue LA, Miller K, Mouncey N, Mungall C, Pafilis E, Reddy TBK, Richardson L, Roux S, Schriml LM, Shaffer JP, Sundaramurthi JC, Thompson LR, Timme RE, Zheng J, Wood-Charlson EM, Eloe-Fadrosh EA. [Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative’s Workshop and Follow-On Activities.](https://journals.asm.org/doi/10.1128/mSystems.01194-20) mSystems. 2021 Feb 23;6(1). doi: 10.1128/mSystems.01194-20. PubMed PMID: 33622857.
+* Wood-Charlson EM, Anubhav, Auberry D, Blanco H, Borkum MI, Corilo YE, Davenport KW, Deshpande S, Devarakonda R, Drake M, Duncan WD, Flynn MC, Hays D, Hu B, Huntemann M, Li PE, Lipton M, Lo CC, Millard D, Miller K, Piehowski PD, Purvine S, Reddy TBK, Shakya M, Sundaramurthi JC, Vangay P, Wei Y, Wilson BE, Canon S, Chain PSG, Fagnan K, Martin S, McCue LA, Mungall CJ, Mouncey NJ, Maxon ME, Eloe-Fadrosh EA. [The National Microbiome Data Collaborative: enabling microbiome science.](https://www.nature.com/articles/s41579-020-0377-0) Nat Rev Microbiol. 2020 Jun;18(6):313-314. doi: 10.1038/s41579-020-0377-0. PubMed PMID: 32350400.
+
+### Related Publications
+* Kyrpides NC, Eloe-Fadrosh EA, Ivanova NN. [Microbiome Data Science: Understanding Our Microbial Planet.](https://www.sciencedirect.com/science/article/pii/S0966842X16000482?via%3Dihub) Trends Microbiol. 2016 Jun;24(6):425-427. doi: 10.1016/j.tim.2016.02.011. PubMed PMID: 27197692.
+* Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi JC, Lee J, Kandimalla M, Chen IA, Kyrpides NC, Reddy TBK. [Genomes OnLine Database (GOLD) v.8: overview and updates.](https://academic.oup.com/nar/article/49/D1/D723/5957166?login=true) Nucleic Acids Res. 2021 Jan 8;49(D1):D723-D733. doi: 10.1093/nar/gkaa983. PMID: 33152092; PMCID: PMC7778979.
+* Chen IA, Chu K, Palaniappan K, Ratner A, Huang J, Huntemann M, Hajek P, Ritter S, Varghese N, Seshadri R, Roux S, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides NC. [The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities.](https://academic.oup.com/nar/article/49/D1/D751/5943189?login=true) Nucleic Acids Res. 2021 Jan 8;49(D1):D751-D763. doi: 10.1093/nar/gkaa939. PMID: 33119741; PMCID: PMC7778900.
diff --git a/content/nmdc/src/howto_guides/api_gui.md b/content/nmdc/src/howto_guides/api_gui.md
new file mode 100644
index 0000000..958b877
--- /dev/null
+++ b/content/nmdc/src/howto_guides/api_gui.md
@@ -0,0 +1,151 @@
+# Using the NMDC API Graphical User Interface (GUI)
+
+Dependency versions:
+nmdc-runtime=1.2.0
+
+## Retrieving Metadata using the ___Find___ and ___Metadata___ API Endpoints
+
+Metadata describing NMDC data (e.g. studies, biosamples, data objects, etc.) may be retrieved with GET requests, using the **[NMDC API Graphical User Interface (GUI)](https://api.microbiomedata.org/docs#/)**. The API GUI provides a guided user interface for direct access to the NMDC data portal. It allows for:
+1. performing highly granular and targeted queries directly. This is especially helpful if a user has a query that may not be supported by the [NMDC Data Portal](https://data.microbiomedata.org/) yet.
+2. interactive exploration of querying capabilities. It provides code snippets that can be used in scripts for programmatic access, i.e. the request `curl` commands and URLs provided in the responses (please see the examples below).
+
+Please note that the endpoints discussed in this documentation are targeted for users, such as NMDC data consumers. For documentation describing other endpoints, primarily used by developers, please see [nmdc-runtime-docs](https://microbiomedata.github.io/nmdc-runtime/).
+
+Requests can include various parameters to filter, sort, and organize the requested information. Attribute names in the parameters will vary depending on the collection. The required syntax of the parameters will also vary, depending on whether it is a ___find___ or a ___metadata___ endpoint. ___Find___ endpoints are designed to use more [compact syntax](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists) (for example, filtering biosamples for an "Ecosystem Category" of "Plants" would look like `ecosystem_category:Plants` using the `GET /biosamples` endpoint), while ___metadata___ endpoints use a [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/) (e.g. the same filter would look like `{"ecosystem_category": "Plants"}` using the `GET /nmdcschema/{collection_name}` endpoint with `collection_name` set to `biosample_set`).
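+
+As a concrete illustration of the two syntaxes, here is a minimal sketch using Python's `requests` package (an assumption; any HTTP client works). The filter values come straight from the example above; the final lines print only the top-level response keys, so you can inspect whatever shape your own responses have.
+
+```python
+import json
+
+import requests
+
+BASE = "https://api.microbiomedata.org"
+
+# Find endpoint: compact `attribute:value` filter syntax.
+find_resp = requests.get(
+    f"{BASE}/biosamples",
+    params={"filter": "ecosystem_category:Plants"},
+)
+find_resp.raise_for_status()
+
+# Metadata endpoint: MongoDB-like JSON filter against a named collection.
+meta_resp = requests.get(
+    f"{BASE}/nmdcschema/biosample_set",
+    params={"filter": json.dumps({"ecosystem_category": "Plants"})},
+)
+meta_resp.raise_for_status()
+
+# Inspect the top-level shape of each response.
+print(sorted(find_resp.json().keys()))
+print(sorted(meta_resp.json().keys()))
+```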
+
+#### ___Find___ Endpoints
+
+The [find endpoints](https://api.microbiomedata.org/docs#/find:~:text=Find%20NMDC-,metadata,-entities.) come with the NMDC metadata entities already specified, so metadata about [studies](https://w3id.org/nmdc/Study), [biosamples](https://w3id.org/nmdc/Biosample), [data objects](https://w3id.org/nmdc/DataObject/), and [activities](https://w3id.org/nmdc/Activity/) can be retrieved using GET requests.
+
+The applicable parameters of the ___find___ endpoints, with acceptable syntax and examples, are in the table below.
+
+| Parameter | Description | Syntax | Example |
+| :---: | :-----------: | :-------: | :---: |
+| filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | Comma-separated string of attribute:value pairs. Can include comparison operators like >=, <=, <, and >. May use a `.search` after the attribute name to conduct a full text search of fields that are of type string. e.g. `attribute:value,attribute.search:value` | `ecosystem_category:Plants, lat_lon.latitude:>35.0` |
+| search | Not yet implemented | Coming Soon | Not yet implemented |
+| sort | Specifies the order in which the query returns the matching documents | Comma separated string of attribute:value pairs, where the value can be empty, `asc`, or `desc` (for ascending or descending order) e.g. `attribute` or `attribute:asc` or `attribute:desc`| `depth.has_numeric_value:desc, ecosystem_type` |
+| page | Specifies the desired page number among the paginated results | Integer | `3` |
+| per_page | Specifies the number of results returned per page. Maximum allowed is 2,000 | Integer | `50` |
+| cursor | A bookmark for where a query can pick up where it has left off. To use cursor paging, set the `cursor` parameter to `*`. The results will include a `next_cursor` value in the response's `meta` object that can be used in the `cursor` parameter to retrieve the subsequent results (see the paging sketch after this table) ![next_cursor](../_static/images/howto_guides/api_gui/find_cursor.png) | String | `*` or `nmdc:sys0zr0fbt71` |
+| group_by | Not yet implemented | Coming Soon | Not yet implemented |
+| fields | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ess_dive_datasets` |
+| study_id | The unique identifier of a study | Curie e.g. `prefix:identifier` | `nmdc:sty-11-34xj1150` |
+| sample_id | The unique identifier of a biosample | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-w43vsm21` |
+| data_object_id | The unique identifier of a data object | Curie e.g. `prefix:identifier` | `nmdc:dobj-11-7c6np651` |
+| activity_id | The unique identifier for an NMDC workflow execution activity | Curie e.g. `prefix:identifier` | `nmdc:wfmgan-11-hvcnga50.1`|
+
+
+
+
+Each endpoint is unique and requires the applicable attribute names to be known in order to structure a query in a meaningful way. Please note that parameters without a red `* required` label next to them are optional.
+
+
+![find get studies](../_static/images/howto_guides/api_gui/find_get_studies.png)
+The `GET /studies` endpoint is a general purpose way to retrieve NMDC studies based on parameters provided by the user. Studies can be filtered and sorted based on the applicable [Study attributes](https://microbiomedata.github.io/nmdc-schema/Study/).
+
+
+![find get studies by study_id](../_static/images/howto_guides/api_gui/find_get_studies_study_id.png)
+If the study identifier is known, a study can be retrieved directly using the `GET /studies/{study_id}` endpoint. Note that only one study can be retrieved at a time using this method.
+
+
+![find get biosamples](../_static/images/howto_guides/api_gui/find_get_biosamples.png)
+The `GET /biosamples` endpoint is a general purpose way to retrieve biosample metadata using user-provided filter and sort criteria. Please see the applicable [Biosample attributes](https://microbiomedata.github.io/nmdc-schema/Biosample/).
+
+
+![find get biosamples by sample_id](../_static/images/howto_guides/api_gui/find_get_biosamples_sample_id.png)
+If the biosample identifier is known, a biosample can be retrieved directly using the `GET /biosamples/{sample_id}` endpoint. Note that only one biosample metadata record can be retrieved at a time using this method.
+
+
+![find get data objects](../_static/images/howto_guides/api_gui/find_get_data_objects.png)
+To retrieve metadata about NMDC data objects (such as files, records, or omics data) the `GET /data_objects` endpoint may be used along with various parameters. Please see the applicable [Data Object attributes](https://microbiomedata.github.io/nmdc-schema/DataObject).
+
+
+![find get data objects by data object_id](../_static/images/howto_guides/api_gui/find_data_objects_data_object_id.png)
+If the data object identifier is known, the metadata can be retrieved using the `GET /data_objects/{data_object_id}` endpoint. Note that only one data object metadata record may be retrieved at a time using this method.
+
+
+![find get activities](../_static/images/howto_guides/api_gui/find_get_activities.png)
+The `GET /activities` endpoint is a general way to fetch metadata about various activities (e.g. metagenome assembly, natural organic matter analysis, library preparation, etc.). Any "slot" (a.k.a. attribute) for [WorkflowExecutionActivity](https://microbiomedata.github.io/nmdc-schema/WorkflowExecutionActivity/) or [PlannedProcess](https://microbiomedata.github.io/nmdc-schema/PlannedProcess/) classes may be used in the filter and sort parameters, including attributes of subclasses of `WorkflowExecutionActivity` and `PlannedProcess`. For example, attributes used in subclasses such as [MetabolomicsAnalysisActivity](https://microbiomedata.github.io/nmdc-schema/MetabolomicsAnalysisActivity/) (subclass of `WorkflowExecutionActivity`) or [Extraction](https://microbiomedata.github.io/nmdc-schema/Extraction/) (subclass of `PlannedProcess`) can be used as input criteria for the filter and sort parameters of this endpoint.
+
+
+![find get activities by activity id](../_static/images/howto_guides/api_gui/find_get_activities_activity_id.png)
+If the activity identifier is known, the activity metadata can be retrieved using the `GET /activities/{activity_id}` endpoint. Note that only one metadata record for an activity may be returned at a time using this method.
+
+
+For more information and to see more examples of __find__ endpoints outside of the [autogenerated user interface](https://api.microbiomedata.org/docs#/find), please visit: https://api.microbiomedata.org/search
+
+
+#### Find Endpoint Example: get all studies that have EMSL (Environmental Molecular Sciences Laboratory) related funding
+
+1. Click on the drop down arrow to the right side of the **`GET /studies`** endpoint
+![find example step 1](../_static/images/howto_guides/api_gui/find_example_step1.png)
+2. Click **Try it out** in the upper right of the expanded endpoint box
+![find example step 2](../_static/images/howto_guides/api_gui/find_example_step2.png)
+3. Enter the parameters. In this case, we will input `funding_sources.search:EMSL` into the **filter** parameter. The `.search` suffix performs a full text search to find studies whose `funding_sources` values contain the word "EMSL".
+4. Click **Execute**
+![find example step 3 and step 4](../_static/images/howto_guides/api_gui/find_example_step3and4.png)
+5. View the results in JSON format, available to download by clicking **Download**, or copy the results by clicking the clipboard icon in the bottom right corner of the response. In this case, two studies were retrieved.
+![find example step 5](../_static/images/howto_guides/api_gui/find_example_step5.png)
+
+
+- Note that a curl request and request URL are provided as well for command line usage:
+![find example note](../_static/images/howto_guides/api_gui/find_example_note.png)
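+
+The equivalent request can also be scripted outside the GUI. A minimal sketch with Python's `requests`, using the same filter as the example above (the `results` response key is an assumption about the response shape; inspect your own response to confirm):
+
+```python
+import requests
+
+resp = requests.get(
+    "https://api.microbiomedata.org/studies",
+    params={"filter": "funding_sources.search:EMSL"},
+)
+resp.raise_for_status()
+for study in resp.json().get("results", []):
+    print(study.get("id"), study.get("name"))
+```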
+
+#### ___Metadata___ Endpoints
+
+The [metadata endpoints](https://api.microbiomedata.org/docs#/metadata) can be used to get and filter metadata from collection set types (including studies, biosamples, activities, and data objects as discussed in the __find__ section).
+
+Unlike the compact syntax used in the __find__ endpoints, the syntax for the filter parameter of the metadata endpoints uses a [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/). The applicable parameters of the __metadata__ endpoints, with acceptable syntax and examples, are in the table below.
+
+| Parameter | Description | Syntax | Example |
+| :---: | :-----------: | :-------: | :---: |
+| collection_name | The name of the collection to be queried. For a list of collection names please see the [Database class](https://microbiomedata.github.io/nmdc-schema/Database/) of the NMDC Schema | String | `biosample_set` |
+| filter | Allows conditions to be set as part of the query, returning only results that satisfy the conditions | [MongoDB-like query language](https://www.mongodb.com/docs/manual/tutorial/query-documents/). All strings should be in double quotation marks. | `{"lat_lon.latitude": {"$gt": 45.0}, "ecosystem_category": "Plants"}` |
+| max_page_size | Specifies the maximum number of documents returned at a time | Integer | `25`
+| page_token | Specifies the token of the page to return. If unspecified, the first page is returned. To retrieve a subsequent page, the value received as the `next_page_token` from the bottom of the previous results can be provided as a `page_token` (see the paging sketch after this table). ![next_page_token](../_static/images/howto_guides/api_gui/metadata_page_token_param.png) | String | `nmdc:sys0ae1sh583` |
+| projection | Indicates the desired attributes to be included in the response. Helpful for trimming down the returned results | Comma-separated list of attributes that belong to the documents in the collection being queried | `name, ecosystem_type` |
+| doc_id | The unique identifier of the item being requested. For example, the identifier of a biosample or an extraction | Curie e.g. `prefix:identifier` | `nmdc:bsm-11-ha3vfb58` |
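+
+As a sketch of token-based paging with these parameters (the top-level `resources` and `next_page_token` response keys are assumptions based on the description above; confirm them against your own responses):
+
+```python
+import json
+
+import requests
+
+BASE = "https://api.microbiomedata.org"
+filter_json = json.dumps({"ecosystem_category": "Plants"})
+
+page_token = None
+while True:
+    params = {"filter": filter_json, "max_page_size": 100}
+    if page_token:
+        params["page_token"] = page_token
+    resp = requests.get(f"{BASE}/nmdcschema/biosample_set", params=params)
+    resp.raise_for_status()
+    body = resp.json()
+    # ... process body.get("resources", []) here ...
+    page_token = body.get("next_page_token")
+    if not page_token:
+        break
+```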
+
+
+
+
+The __metadata__ endpoints allow users to retrieve metadata from the data portal using various `GET` endpoints that are slightly different from the __find__ endpoints, though some can be used similarly. As with the __find__ endpoints, parameters for the __metadata__ endpoints that do not have a red `* required` next to them are optional.
+
+
+![metadata get nmdcshema version](../_static/images/howto_guides/api_gui/metadata_get_nmdcschema_version.png)
+To view the [NMDC Schema](https://microbiomedata.github.io/nmdc-schema/) version the database is currently using, try executing the `GET /nmdcschema/version` endpoint.
+
+
+![metadata get collection stats](../_static/images/howto_guides/api_gui/metadata_get_collection_stats.png)
+To get the NMDC Database collection statistics, like the total count of records in a collection or the size of the collection, try executing the `GET /nmdcschema/collection_stats` endpoint.
+
+
+![metadata get collection name](../_static/images/howto_guides/api_gui/metadata_get_collection_name.png)
+The `GET /nmdcschema/{collection_name}` endpoint is a general purpose way to retrieve metadata about a specified collection given user-provided filter and projection criteria. Please see the [Collection Names](https://microbiomedata.github.io/nmdc-schema/Database/) that may be queried. Please note that metadata may only be retrieved about one collection at a time.
+
+
+![metadata get doc_id](../_static/images/howto_guides/api_gui/metadata_get_doc_id.png)
+If the identifier of the record is known, the `GET /nmdcschema/ids/{doc_id}` endpoint can be used to retrieve the specified record. Note that only one identifier may be used at a time, and therefore, only one record may be retrieved at a time using this method.
+
+
+![metadata get collection_name doc_id](../_static/images/howto_guides/api_gui/metadata_get_collection_name_doc_id.png)
+If both the identifier and the collection name of the desired record are known, the `GET /nmdcschema/{collection_name}/{doc_id}` endpoint can be used to retrieve the record. The optional projection parameter retrieves only the desired attributes from a record. Please note that only one record can be retrieved at a time using this method.
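+
+For example, a one-request sketch that retrieves a single record by its identifier, using the example curie from the parameter table above:
+
+```python
+import requests
+
+resp = requests.get(
+    "https://api.microbiomedata.org/nmdcschema/ids/nmdc:bsm-11-ha3vfb58"
+)
+resp.raise_for_status()
+print(resp.json())
+```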
+
+
+#### Metadata Endpoints Example 1: Get all of the biosamples that are part of the 1000 Soils Research Campaign Study sampled from Colorado
+
+1. Click on the drop down arrow to the right side of the **`GET /nmdcschema/{collection_name}`** endpoint
+![metadata example step1](../_static/images/howto_guides/api_gui/metadata_example_step1.png)
+2. Click **Try it out** in the upper right of the expanded endpoint box
+![metadata example step2](../_static/images/howto_guides/api_gui/metadata_example_step2.png)
+3. To enter the parameters, first get the identifier for this study by navigating to the [1000 Soils Research Campaign study page](https://data.microbiomedata.org/details/study/nmdc:sty-11-28tm5d36) in the data portal and copying the `ID`
+![metadata example step3](../_static/images/howto_guides/api_gui/metadata_example_step3.png)
+4. Enter the parameters in the **`GET /nmdcschema/{collection_name}`** endpoint. For this example, we will input `biosample_set` into the **collection_name** parameter and `{"part_of": "nmdc:sty-11-28tm5d36", "geo_loc_name.has_raw_value": {"$regex": "Colorado"}}` into the **filter** parameter. See the [Biosample Class](https://microbiomedata.github.io/nmdc-schema/Biosample/) in the NMDC Schema to view the applicable biosample attributes (slots); for this example, they are `part_of` and `geo_loc_name.has_raw_value`. Note that `$regex` conducts a full text search for the word "Colorado" in the `geo_loc_name.has_raw_value` attribute.
+5. Click **Execute**
+![metadata example step4](../_static/images/howto_guides/api_gui/metadata_example_step4and5.png)
+6. View the results in JSON format, available to download by clicking **Download**, or copy the results by clicking the clipboard icon in the bottom right corner of the response. In this case, two biosamples were retrieved. Note that the curl and request URL are provided as well.
+![metadata example step6](../_static/images/howto_guides/api_gui/metadata_example_step6.png)
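+
+The same query can be reproduced programmatically. A minimal sketch with Python's `requests`, using the exact filter from step 4 (the `resources` response key is an assumption; inspect your own response to confirm):
+
+```python
+import json
+
+import requests
+
+filter_json = json.dumps({
+    "part_of": "nmdc:sty-11-28tm5d36",
+    "geo_loc_name.has_raw_value": {"$regex": "Colorado"},
+})
+resp = requests.get(
+    "https://api.microbiomedata.org/nmdcschema/biosample_set",
+    params={"filter": filter_json},
+)
+resp.raise_for_status()
+for biosample in resp.json().get("resources", []):
+    print(biosample.get("id"), biosample.get("name"))
+```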
+
+
+
+
diff --git a/content/nmdc/src/howto_guides/data_plan.md b/content/nmdc/src/howto_guides/data_plan.md
new file mode 100644
index 0000000..75afcb7
--- /dev/null
+++ b/content/nmdc/src/howto_guides/data_plan.md
@@ -0,0 +1,29 @@
+# Creating a Data Management Plan
+
+### What is a Data Management Plan?
+
+A data management plan (DMP) is an integral part of grant applications. DMPs are required by every federal funder, but the guidelines vary depending on the agency. Here you can find information for federal funding agencies that work with microbiome data.
+
+Your DMP communicates how you and your team will collect, categorize, store, and share any data produced during the grant, and how that data will be preserved and made accessible after the project is complete. While the DMP is important for your grant proposal, it also lays the groundwork for producing high quality, accessible, and reusable data. A DMP should be a living document that sets expectations for your project team before and during the project. To maximize the impact of your DMP, make it public, machine readable, and openly licensed, and assign it a persistent identifier.
+
+While machine readability is a new concept, making your DMP machine readable increases the likelihood that your data will gain recognition and credit, because your data can be located, reused, and cited easily. To learn more about the merits of machine-actionable DMPs, read [Ten principles for machine-actionable data management plans](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006750).
+
+### What to include in your DMP?
+
+While your DMP should include information about all data collected through the duration of your research, these guidelines and best practices focus on large omic data generated from microbiome samples. If you are unfamiliar with microbiome data management and metadata standards, we recommend you begin with an [Introduction to Metadata and Ontologies: Everything You Always Wanted to Know About Metadata and Ontologies (But Were Afraid to Ask)](https://microbiomedata.org/introduction-to-metadata-and-ontologies/) and the [NMDC Metadata Standards Documentation](https://w3id.org/nmdc/nmdc). These resources will introduce you to multi-omics metadata standards that leverage existing community-driven standards.
+
+***The NMDC DMPTool Template***
+
+In partnership with the University of California Curation Center of the California Digital Library, the NMDC team has created a microbiome-specific DMPTool Template. DMPTool is an open-source application that assists researchers in the creation of data management plans compliant with federal funding requirements. The NMDC DMPTool template is funding-organization agnostic and was developed to support microbiome data management best practices with specifications unique to microbiome standards and data processing. Once you [create an account at DMPTool](https://dmptool.org/), this link will take you to the [NMDC Microbiome Omics Research DMP Template](https://dmptool.org/plans?plan%5Bfunder%5D%5Bid%5D=%7B+%22id%22%3A+4265%2C+%22name[…]Microbiome+Data+Collaborative%22+%7D&plan%5Btemplate_id%5D=1321) ***which provides step-by-step prompts for your DMP.***
+
+All of the sections below are laid out in the NMDC DMPTool template. This is a living document and the NMDC team welcomes community feedback on this resource.
+
+![](../_static/images/howto_guides/data_mgt/data_mgt_list.png)
+
+### For more information
+You can find more information about [Data Management Best Practices](https://microbiomedata.org/data-management/) on the NMDC website.
+
+### Contact us
+
+Please [email us](https://microbiomedata.org/contact/) with comments, suggestions, or questions. The NMDC team provides Data Management Plan Consultancies where we can help you draft an effective DMP and provide you with tools and resources for completing your DMP in accordance with funder requirements and community best practices.
+
diff --git a/content/nmdc/src/howto_guides/globus.md b/content/nmdc/src/howto_guides/globus.md
new file mode 100644
index 0000000..0fd313f
--- /dev/null
+++ b/content/nmdc/src/howto_guides/globus.md
@@ -0,0 +1,19 @@
+# Downloading NMDC Data via Globus
+
+[Globus](https://globus.org) provides a mechanism to download NMDC data using high-bandwidth managed transfers. Globus has an automated point-and-click interface that lets you schedule a bulk transfer to your own machine or another compute center. You can learn more about using Globus to transfer data by reading the [Globus documentation](https://www.globus.org/data-transfer).
+
+## Globus Collections
+
+NMDC collections are publicly visible to everyone who has a [Globus ID](https://www.globusid.org/create).
+
+### NERSC
+
+To access NMDC data housed at NERSC, you can use the ["NMDC" collection](https://app.globus.org/file-manager?origin_id=72dd396a-2242-11ec-a0a4-6b21ca6daf73&origin_path=%2F).
+
+In that collection, NMDC data are organized by _NMDC identifiers_. This collection serves up the same contents as `https://data.microbiomedata.org/data/` does, so any file path underneath that base URL can be mapped to the equivalent file in Globus.
+
+### EMSL
+
+To access NMDC data housed at EMSL, you can use the ["NMDC Bulk Data Cache" collection](https://app.globus.org/file-manager?origin_id=07de22e4-6c17-4bd2-86bc-49fe5ddb2070&origin_path=%2F).
+
+In that collection, NMDC data are organized by _omics types_. This collection serves up the same contents as `https://nmdcdemo.emsl.pnnl.gov/` does, so any file path underneath that base URL can be mapped to the equivalent file in Globus.
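+
+For either collection, translating a download URL into a path within the matching Globus collection is purely mechanical, because each collection mirrors a public HTTPS base URL. A minimal sketch (the example URL at the bottom is hypothetical, for illustration only):
+
+```python
+# Base URLs mirrored by the two Globus collections described above.
+MIRRORED_BASES = (
+    "https://data.microbiomedata.org/data/",  # "NMDC" collection (NERSC)
+    "https://nmdcdemo.emsl.pnnl.gov/",        # "NMDC Bulk Data Cache" (EMSL)
+)
+
+def globus_path(url: str) -> str:
+    """Strip the mirrored base URL, leaving the path to open in Globus."""
+    for base in MIRRORED_BASES:
+        if url.startswith(base):
+            return "/" + url[len(base):]
+    raise ValueError(f"URL is not under a known NMDC base URL: {url}")
+
+# Hypothetical example path, for illustration only:
+print(globus_path("https://data.microbiomedata.org/data/example/file.fastq.gz"))
+```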
diff --git a/content/nmdc/src/howto_guides/portal_guide.md b/content/nmdc/src/howto_guides/portal_guide.md
new file mode 100644
index 0000000..8e9fdbb
--- /dev/null
+++ b/content/nmdc/src/howto_guides/portal_guide.md
@@ -0,0 +1,421 @@
+# The NMDC Data Portal User Guide
+
+## Introduction
+
+The pilot NMDC Data Portal (<https://data.microbiomedata.org/>) provides
+a resource for consistently processed multi-omics data that is
+integrated to enable search, access, analysis, and download. Open-source
+bioinformatics workflows are used to process raw multi-omics data and
+produce interoperable and reusable annotated data from metagenome,
+metatranscriptome, metaproteome, metabolome, and natural organic matter
+characterizations. The NMDC Data Portal offers several search and
+navigation components, and data can be downloaded through the graphical
+user interface using ORCID authentication, with associated download
+metrics, or retrieved through available RESTful APIs. All multi-omics
+data are available under a Creative Commons 4.0 license, which enables
+public use with attribution, as outlined in the NMDC Data Use Policy
+(). This first
+iteration of the NMDC Data Portal was released in March 2021, and will
+continue to expand its data hostings and functionality on an ongoing basis.
+
+There is a short video tutorial showing how to navigate the portal on
+YouTube ().
+
+## User-Centered Design Process
+
+The NMDC is a resource designed together with and for the scientific
+community. We have engaged in extensive user research through interviews
+and direct collaboration with the scientific community that have
+informed the design, development, and display of data through the NMDC
+Data Portal. This methodology (1) enables the scientific community to
+provide feedback, supports iterative and continuous improvement of our systems,
+and ensures that our systems enable a high level of scientific
+productivity. Feedback collected from the scientific community during
+early iterations of the Data Portal can be linked to the features and
+design directions found in the current release. Our community-centered
+design approach ensures that the NMDC can evolve with the needs of the
+microbiome research community, but will also be important for uncovering
+creative design solutions, clarifying expectations, reducing redesign,
+and perhaps most importantly, enabling shared ownership (2) of the NMDC.
+We hope that this inclusive approach will enable us to expand our
+engagements with the microbiome research community and the utility of
+the NMDC Data Portal.
+
+## Available Studies & Data
+
+Data hostings include studies, biosamples, and 5 data types from a breadth of
+environmental microbiomes, spanning river sediments, subsurface shale
+carbon reservoirs, plant-microbe associations, and temperate and
+tropical soils. Specifics are as follows:
+
+## Studies
+
+As the NMDC Data Portal is a pilot infrastructure, incoming projects for
+which study information and curated environmental metadata become
+available are first validated and loaded with a flag ("Omics data coming
+soon") before processed instrumentation data are integrated into the
+portal.
+
+## Standards
+
+The NMDC team works closely with several standards groups and
+organizations. We have adopted the Genomic Standards Consortium (GSC)
+Minimum Information about any (x) Sequence (MIxS) templates (3). These
+provide a standard data dictionary of sample descriptors (e.g.,
+location, biome, altitude, depth) organized into seventeen environmental
+packages () for sequence data. The NMDC team has
+mapped fields used to describe samples in the GOLD database to MIxS
+version 6.1 elements. In addition, we are adopting the MIxS standards
+for sequence data types (e.g., sequencing method, PCR primers and
+conditions, etc.), and are leveraging standards and controlled
+vocabularies developed by the Proteomics Standards Initiative (4), the
+National Cancer Institute's Proteomic Data Commons
+(), and the
+Metabolomics Standards Initiative (5) for mass spectrometry data types
+(e.g., ionization mode, mass resolution, scan rate, etc.).
+
+### *MIxS environmental packages*
+
+The GSC has developed standards for describing genomic and metagenomic
+sequences, and the environment from which a biological sample
+originates. These "[Minimum Information about any (x)
+Sequence](https://www.gensc.org/pages/standards-intro.html)" (MIxS) packages provide
+standardized sample descriptors (e.g., location, environment, elevation,
+altitude, depth, etc.) for 17 different sample environments.
+
+### *Environment Ontology (EnvO)*
+
+EnvO is a community-led ontology that represents environmental entities
+such as biomes, environmental features, and environmental materials.
+These EnvO entities are the recommended values for several of the
+mandatory terms in the MIxS packages, often referred to as the "MIxS
+triad".
+
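+For example, a biosample's triad might be recorded as in the minimal
+sketch below. The field names follow MIxS conventions
+(`env_broad_scale`, `env_local_scale`, `env_medium`), but the specific
+EnvO labels and IDs shown are illustrative values, not a real NMDC
+record.
+
+```python
+# Hypothetical MIxS triad for an illustrative soil biosample. Verify the
+# EnvO labels/IDs against the ontology before using them in real metadata.
+biosample_triad = {
+    "env_broad_scale": "terrestrial biome [ENVO:00000446]",
+    "env_local_scale": "agricultural field [ENVO:00000114]",
+    "env_medium": "soil [ENVO:00001998]",
+}
+
+for field, value in biosample_triad.items():
+    print(f"{field}: {value}")
+```
+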
+### *Genomes OnLine Database (GOLD)*
+
+GOLD is an open-access repository of genome, metagenome, and
+metatranscriptome sequencing projects with their associated metadata.
+Biosamples (defined as the physical material collected from an
+environment) are described using a five-level ecosystem classification
+path that goes from ecosystem down to the type of environmental material
+that describes the sample.
+
+## Omics Data
+
+A suite of omics processing data can be generated from available
+biosamples, and associating these data through a common sample source
+enables researchers to probe function across omics types. The NMDC data
+schema offers an approach to link omics processing runs to their source
+biosample (for example, multiple organic matter characterizations can be
+generated from a single sample through extraction with various solvents,
+e.g., chloroform, methanol, and water fractionation). The sections below
+outline the various omics data currently available through the portal.
+
+### *Metagenomes.*
+
+Illumina-sequenced shotgun metagenome data undergo pre-processing, error
+correction, assembly, structural and functional annotation, and binning
+leveraging the JGI's production pipelines (6), along with an additional
+read-based taxonomic analysis component. Standardized outputs from the
+read QC, read-based analysis, assembly, annotation, and binning are
+available for search and download on the NMDC Data
+Portal.
+
+### *Metatranscriptomes.*
+
+Illumina-sequenced shotgun reads from cDNA libraries undergo
+pre-processing and error correction in the same way as described above
+for the metagenome workflow, with additional steps to filter ribosomal
+reads. High-quality reads are then assembled into transcripts using
+MEGAHIT (7) and annotated using the annotation module described in the
+metagenome workflow. The high-quality reads are mapped back to the
+annotated transcripts using HISAT2 (8) and then processed to calculate
+the number of reads mapped per feature using featureCounts (9) and RPKM
+values per feature using edgeR (10). Results from read QC,
+assembly, and annotation are available for search and download for
+metatranscriptomes on the NMDC Data Portal.
+
+### *Metaproteomes.*
+
+Data-dependent mass spectrometry raw data files are first converted to
+mzML, using MSConvert (11). Peptide identification is achieved using
+MSGF+ (12) and the associated metagenomic information in the FASTA file.
+Peptide identification false discovery rate is controlled using a decoy
+database approach. Intensity information is extracted using MASIC (13)
+and combined with protein information. Protein annotation information is
+obtained from the associated metagenome annotation output. Standardized
+outputs for quality control, and peptide and protein-level quantitative
+data are available for search and download for metaproteomes on the
+NMDC Data Portal.
+
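+As a rough illustration of the decoy-database approach, the sketch
+below estimates the FDR at a score cutoff as the ratio of decoy to
+target matches above that cutoff. This is one generic convention, not
+the exact procedure used in the NMDC metaproteomics workflow, and the
+scores are made-up values.
+
+```python
+# Minimal sketch of target-decoy FDR estimation for peptide-spectrum matches.
+# Each match is (score, is_decoy); FDR is estimated as decoys / targets
+# among matches at or above the cutoff (one common convention; variants exist).
+
+def estimate_fdr(matches, score_cutoff):
+    targets = sum(1 for s, is_decoy in matches if s >= score_cutoff and not is_decoy)
+    decoys = sum(1 for s, is_decoy in matches if s >= score_cutoff and is_decoy)
+    return decoys / targets if targets else 0.0
+
+matches = [(31.2, False), (28.9, False), (27.5, True), (25.0, False), (24.1, True)]
+print(f"FDR at cutoff 24.0: {estimate_fdr(matches, 24.0):.2%}")
+```
+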
+### *Metabolomes.*
+
+The gas chromatography-mass spectrometry (GC-MS) based metabolomics
+workflow (metaMS), developed by leveraging EMSL's CoreMS mass
+spectrometry software framework, allows targeted and semi-targeted
+analysis of metabolomics data (14). The raw data is parsed into the
+CoreMS data structure and undergoes all the steps of signal processing
+(signal noise reduction, m/z-based chromatogram peak deconvolution,
+abundance threshold calculation, peak picking) and molecular
+identification, including a molecular search against a standard
+compound library of metabolites, spectral similarity calculation, and
+similarity score calculation (15), all in a single step. The putative
+metabolite annotation data is available to download for metabolomes on
+the NMDC Data Portal. Data-dependent LC-MS based workflows are
+currently under development. Additionally, it should be noted that all
+available data derives from exploratory, untargeted analysis and is
+semi-quantitative.
+
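+To make the spectral similarity step concrete, the sketch below scores
+two centroided spectra (binned to unit m/z) with cosine similarity, a
+common choice for GC-MS library matching. The exact scoring used by
+metaMS/CoreMS may differ, and the spectra shown are made-up values.
+
+```python
+import math
+
+# Minimal sketch of a spectral similarity score: cosine similarity between
+# two spectra represented as dicts mapping binned m/z -> intensity.
+
+def cosine_similarity(spec_a, spec_b):
+    shared = set(spec_a) & set(spec_b)
+    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
+    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
+    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
+    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
+
+measured = {73: 100.0, 147: 45.0, 204: 20.0}   # made-up measured spectrum
+library = {73: 95.0, 147: 50.0, 205: 10.0}     # made-up library spectrum
+print(f"similarity: {cosine_similarity(measured, library):.3f}")
+```
+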
+### *Natural Organic Matter Characterization (NOM).*
+
+Direct Infusion Fourier Transform mass spectrometry (DI FT-MS) data
+undergoes signal processing and molecular formula assignment leveraging
+EMSL's CoreMS framework (14). Raw time-domain data is transformed into
+the *m/z* domain using a Fourier transform and the Ledford equation (16).
+Data is denoised, followed by peak picking, recalibration using an
+external reference list of known compounds, and a search against a
+dynamically generated molecular formula library with a defined molecular
+search space. Confidence scores for all the molecular formula candidates
+are calculated based on mass accuracy and fine isotopic structure, and
+the candidate with the highest score is assigned as the best match. The molecular
+formula characterization table is available to download for natural
+organic matter characterizations on the NMDC Data Portal.
+
+## Portal Functionality
+
+## Faceted search and access
+
+### *Search by investigator name*
+
+[![](../_static/images/howto_guides/portal_guide/PI_search.png)](../_static/images/howto_guides/portal_guide/PI_search.png)
+
+NMDC-linked data can be filtered by the associated principal
+investigator by selecting 'PI Name' from the left query term bar. This
+selection will display studies and samples associated with that PI, and
+selecting the arrow on the right side of the study name will open up
+more information about that study and that principal investigator.
+
+### *Search by omics processing information*
+
+[![](../_static/images/howto_guides/portal_guide/instrument_name.png)](../_static/images/howto_guides/portal_guide/instrument_name.png)
+
+[![](../_static/images/howto_guides/portal_guide/omics_type.png)](../_static/images/howto_guides/portal_guide/omics_type.png)
+
+[![](../_static/images/howto_guides/portal_guide/processing_institution.png)](../_static/images/howto_guides/portal_guide/processing_institution.png)
+
+Samples can be queried by various omics processing information terms
+including instrument name, omics type (processing runs sorted by omics
+type can also be queried using the bar plot on the main portal page),
+and processing institution.
+
+### *Search by KEGG Orthology (KO)*
+
+[![](../_static/images/howto_guides/portal_guide/KO_term_search.png)](../_static/images/howto_guides/portal_guide/KO_term_search.png)
+
+[![](../_static/images/howto_guides/portal_guide/KO_results.png)](../_static/images/howto_guides/portal_guide/KO_results.png)
+
+Under 'Function' on the query term bar, users are able to search by KEGG
+Orthology (KO) terms to limit the query to samples with datasets that
+include at least one of the listed KO terms. Users may list multiple KO
+terms, but it is important to note that multiple terms are combined
+with a logical OR: the search returns datasets that include at least
+one of the listed KO terms, not necessarily all of them.
+
+### *Search by environmental descriptors*
+
+[![](../_static/images/howto_guides/portal_guide/depth.png)](../_static/images/howto_guides/portal_guide/depth.png)
+
+[![](../_static/images/howto_guides/portal_guide/date.png)](../_static/images/howto_guides/portal_guide/date.png)
+
+[![](../_static/images/howto_guides/portal_guide/latitude.png)](../_static/images/howto_guides/portal_guide/latitude.png)
+
+[![](../_static/images/howto_guides/portal_guide/longitude.png)](../_static/images/howto_guides/portal_guide/longitude.png)
+
+[![](../_static/images/howto_guides/portal_guide/geographic_name.png)](../_static/images/howto_guides/portal_guide/geographic_name.png)
+
+The query term bar also includes several environmental descriptor
+fields for filtering by where the samples were isolated. Users can
+filter by sample isolation depth, collection date, latitude, and
+longitude (latitude and longitude can also be filtered using the
+interactive map on the omics main page), as well as by geographic
+location name.
+
+### *Search by ecosystem classifications*
+
+[![](../_static/images/howto_guides/portal_guide/gold_classification.png)](../_static/images/howto_guides/portal_guide/gold_classification.png)
+
+[![](../_static/images/howto_guides/portal_guide/envo.png)](../_static/images/howto_guides/portal_guide/envo.png)
+
+Samples can also be queried by ecosystem classifications using GOLD
+and/or EnvO terms. Selecting GOLD classification in the query term bar
+opens a hierarchy that can be navigated to select ecosystem
+classification(s) of interest. Users can select everything under a
+certain classification at any point, or can continue navigating to more
+specific classifications. The Sankey diagram on the 'Environment' page
+provides an interactive visualization of the GOLD classification system.
+
+Similarly, EnvO terms can be used to query the portal; these are
+broken down into environmental biome, feature, and material categories.
+EnvO is another effective classification system for describing the
+environments from which samples were collected.
+
+## Interactive visualizations
+
+### *Omics Page*
+
+#### Barplot
+
+[![](../_static/images/howto_guides/portal_guide/bar_plot.png)](../_static/images/howto_guides/portal_guide/bar_plot.png)
+
+The barplot on the omics page displays the number of omics processing
+runs (not number of samples) for each data type available: organic
+matter, metagenomic, metatranscriptomic, proteomic, and metabolomic.
+Selecting the bar of a data type will limit the search to just that data
+type.
+
+#### Geographic map
+
+[![](../_static/images/howto_guides/portal_guide/geographic_map.png)](../_static/images/howto_guides/portal_guide/geographic_map.png)
+
+The geographic map on the omics page allows for samples to be queried by
+the geographic location from which they were isolated. The map displays
+the geographical location (latitude, longitude) of the sample collection
+sites as clusters with colors corresponding to the number of samples
+from that area. The map can be zoomed in and out, and clusters can be
+selected to focus on a specific area. After zooming and moving around
+the map to a region of interest, selecting the 'Search this region'
+button will limit the search to the current map bounds.
+
+#### Temporal slider
+
+[![](../_static/images/howto_guides/portal_guide/temporal_slider.png)](../_static/images/howto_guides/portal_guide/temporal_slider.png)
+
+Samples can also be queried by a sample collection date range by
+dragging the dots below the temporal slider on the omics page. Sample
+collection dates are grouped by month.
+
+#### Upset plot
+
+[![](../_static/images/howto_guides/portal_guide/upset_plot.png)](../_static/images/howto_guides/portal_guide/upset_plot.png)
+
+The upset plot on the omics page displays the number of samples that
+have various combinations of associated omics data. The axis at the top
+of the plot refers to the different omics types (MG: metagenomic, MT:
+metatranscriptomic, MP: metaproteomic, MB: metabolomic, NOM: natural
+organic matter) and the dots and lines in the graph below represent the
+combinations of the omics data types. The numbers and bars on the right
+side represent the number of samples searchable in the NMDC data portal
+with each corresponding combination of omics data types. This plot will
+update as query terms are added.
+
+### *Environment Page*
+
+#### Sankey diagram
+
+[![](../_static/images/howto_guides/portal_guide/sankey_diagram.png)](../_static/images/howto_guides/portal_guide/sankey_diagram.png)
+
+On the environment page, the Sankey diagram displays the environments
+that NMDC-linked samples were isolated from. This visualization is based
+on the GOLD ecosystem classification path, and the diagram is fully
+interactive, so environments of interest can be chosen at descending
+levels of specificity. This will then limit the search to samples that
+came from the selected environment.
+
+## Download
+
+### *Individual file*
+
+[![](../_static/images/howto_guides/portal_guide/download_individual_file.png)](../_static/images/howto_guides/portal_guide/download_individual_file.png)
+
+Various output data files are available from samples findable through
+the NMDC that have been run through the NMDC standardized workflows.
+Output files from each omic type are sorted by the specific workflow
+(e.g. Metagenome Assembly, Annotation) that was run and are each
+available for download when the sample of interest is selected. Users
+must log in with an ORCID account before downloading data.
+
+### *Bulk download*
+
+[![](../_static/images/howto_guides/portal_guide/bulk_download.png)](../_static/images/howto_guides/portal_guide/bulk_download.png)
+
+In addition to the ability to download single output files from samples
+run through the NMDC standardized workflows, the NMDC portal allows
+users to perform bulk downloads on workflow output files. Once samples
+of interest are down-selected through query terms, output files from
+each NMDC standardized workflow run on those samples are available as
+bulk downloads. Users must be logged in with an ORCID account before
+downloading data.
+
+## References
+
+> 1. Abras C, Maloney-Krichmar D, Preece J. 2004. User-Centered
+>     Design. *In* Bainbridge W (ed), Encyclopedia of Human-Computer
+>     Interaction. Sage Publications, Thousand Oaks.
+> 2. Preece J, Rogers Y, Sharp H. 2002. Interaction design: Beyond
+>     human-computer interaction. John Wiley & Sons, New York, NY.
+> 3. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler
+>     L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, Vaughan
+>     R, Hunter C, Park J, Morrison N, Rocca-Serra P, Sterk P, Arumugam
+>     M, Bailey M, Baumgartner L, Birren BW, Blaser MJ, Bonazzi V, Booth
+>     T, Bork P, Bushman FD, Buttigieg PL, Chain PSG, Charlson E,
+>     Costello EK, Huot-Creasy H, Dawyndt P, DeSantis T, Fierer N,
+>     Fuhrman JA, Gallery RE, Gevers D, Gibbs RA, Gil IS, Gonzalez A,
+>     Gordon JI, Guralnick R, Hankeln W, Highlander S, Hugenholtz P,
+>     Jansson J, Kau AL, Kelley ST, Kennedy J, Knights D, Koren O, et
+>     al. 2011. Minimum information about a marker gene sequence
+>     (MIMARKS) and minimum information about any (x) sequence (MIxS)
+>     specifications. *Nature Biotechnol.* 29:415-420.
+> 4. Taylor CF, Paton NW, Lilley KS, Binz P-A, Julian RK, Jones AR, Zhu
+>     W, Apweiler R, Aebersold R, Deutsch EW, Dunn MJ, Heck AJR, Leitner
+>     A, Macht M, Mann M, Martens L, Neubert TA, Patterson SD, Ping P,
+>     Seymour SL, Souda P, Tsugita A, Vandekerckhove J, Vondriska TM,
+>     Whitelegge JP, Wilkins MR, Xenarios I, Yates JR,
+>     Hermjakob H. 2007. The minimum information about a proteomics
+>     experiment (MIAPE). *Nature Biotechnol.* 25:887-893.
+> 5. Sansone S-A, Fan T, Goodacre R, Griffin JL, Hardy NW,
+>     Kaddurah-Daouk R, Kristal BS, Lindon J, Mendes P, Morrison N,
+>     Nikolau B, Robertson D, Sumner LW, Taylor C, van der Werf M, van
+>     Ommen B, Fiehn O, Members MSIB. 2007. The Metabolomics Standards
+>     Initiative. *Nature Biotechnol.* 25:846-848.
+> 6. Clum A, Huntemann M, Bushnell B, Foster B, Foster B, Roux S, Hajek
+>     PP, Varghese N, Mukherjee S, Reddy TBK, Daum C, Yoshinaga Y,
+>     O'Malley R, Seshadri R, Kyrpides NC, Eloe-Fadrosh EA, Chen I-MA,
+>     Copeland A, Ivanova NN, Segata N. 2021. DOE JGI Metagenome
+>     Workflow. *mSystems* 6:e00804-20.
+> 7. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. 2015. MEGAHIT: an
+>     ultra-fast single-node solution for large and complex metagenomics
+>     assembly via succinct de Bruijn graph. *Bioinformatics*
+>     31:1674-1676.
+> 8. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based
+>     genome alignment and genotyping with HISAT2 and HISAT-genotype.
+>     *Nature Biotechnol.* 37:907-915.
+> 9. Liao Y, Smyth GK, Shi W. 2014. featureCounts: an efficient general
+>     purpose program for assigning sequence reads to genomic features.
+>     *Bioinformatics* 30:923-30.
+> 10. Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor
+>     package for differential expression analysis of digital gene
+>     expression data. *Bioinformatics* 26:139-140.
+> 11. Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S,
+>     Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman
+>     N, Shulman N, Frewen B, Baker TA, Brusniak MY, Paulse C, Creasy D,
+>     Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre
+>     B, Kuhlmann F, Roark J, Rainer P, Detlev S, Hemenway T, Huhmer A,
+>     Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW,
+>     Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P. 2012.
+>     A cross-platform toolkit for mass spectrometry and proteomics.
+>     *Nature Biotechnol.* 30:918-20.
+> 12. Kim S, Gupta N, Pevzner PA. 2008. Spectral Probabilities and
+>     Generating Functions of Tandem Mass Spectra: A Strike against
+>     Decoy Databases. *J Proteome Res.* 7:3354-3363.
+> 13. Monroe ME, Shaw JL, Daly DS, Adkins JN, Smith RD. 2008. MASIC: A
+>     software program for fast quantitation and flexible visualization
+>     of chromatographic profiles from detected LC-MS(/MS) features.
+>     *Comp. Biol. Chemistry* 32:215-217.
+> 14. Corilo YE, Kew WR, McCue LA. 2021. EMSL-Computing/CoreMS: CoreMS
+>     1.0.0 (v1.0.0). Zenodo. 10.5281/zenodo.4641552.
+> 15. Hiller K, Hangebrauk J, Jäger C, Spura J, Schreiber K,
+>     Schomburg D. 2009. MetaboliteDetector: comprehensive analysis tool
+>     for targeted and nontargeted GC/MS based metabolome analysis.
+>     *Anal Chem* 81:3429-39.
+> 16. Marshall AG, Hendrickson CL, Jackson GS. 1998. Fourier transform
+>     ion cyclotron resonance mass spectrometry: a primer. *Mass
+>     Spectrom Rev* 17:1-35.
diff --git a/content/nmdc/src/howto_guides/run_workflows.md b/content/nmdc/src/howto_guides/run_workflows.md
new file mode 100644
index 0000000..aba7a09
--- /dev/null
+++ b/content/nmdc/src/howto_guides/run_workflows.md
@@ -0,0 +1,1112 @@
+# Running the Workflows
+
+## NMDC EDGE Quick Start User Guide
+![](../_static/images/howto_guides/workflows/quickStart/image1.png)
+
+### Register for an account
+
+Users must register for an account within the NMDC EDGE platform or login using the user's ORCiD account.
+
+![](../_static/images/howto_guides/workflows/quickStart/image2.png)
+
+![](../_static/images/howto_guides/workflows/quickStart/image3.png)
+
+### User Profile
+
+Once logged in, the green button with the user's initials on the right provides a drop-down menu which allows the user to manage their projects and uploads; there is also a button which allows users to edit their profile. On this profile page, there are two options: 1) the option to receive email notification of a project's status (OFF by default) and 2) the option to change the user's password (also OFF by default).
+
+![](../_static/images/howto_guides/workflows/quickStart/image4.png)
+
+### Upload data
+
+Two options are available for users to upload their own data to process through the workflows. The first is using the button in the left menu bar. The second is through the drop-down menu shown when clicking the green button with the user's initials on the right. Either button will open a window which allows the user to drag and drop files or browse for the user's data files. (There are also some datasets in the Public Data folder for users to test the platform.)
+
+![](../_static/images/howto_guides/workflows/quickStart/image5.png)
+
+### Running a single workflow
+
+To run a workflow, the user must provide:
+
+1. A unique Project/Run Name with no spaces (underscores are fine).
+
+2. A description is optional, but recommended.
+
+3. The user then selects the workflow desired from the drop-down menu.
+
+4. For metagenomic/metatranscriptomic data, the user must also select whether the input data is interleaved or in separate files for the paired reads.
+
+5. The user then selects the input file(s) from the available list of files.
+
+6. The user should click "Submit."
+
+> ![](../_static/images/howto_guides/workflows/quickStart/image6.png)
+
+Note: Clicking on the buttons to the right of the data input blanks
+opens a box called "Select a file" to allow the user to find the desired files (shown in purple) from previously run
+projects, the public data folder, or user uploaded files.
+
+![](../_static/images/howto_guides/workflows/quickStart/image7.png)
+
+### Running multiple workflows
+
+1. Another option is to select "Run Multiple Workflows" if the user
+ desires to run more than one of the metagenomic workflows or the
+ entire metagenomic pipeline.
+
+2. Enter a **unique** Project/Run Name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but recommended.
+
+4. The user must also select whether the input data is interleaved or
+   in separate files for the paired reads.
+
+> ![](../_static/images/howto_guides/workflows/quickStart/image8.png)
+
+All five of the metagenomic workflows are "ON" by default, but the user
+can select to turn off any workflows not desired. The pipeline uses the
+output of each workflow as the input for subsequent workflows. (Note:
+Some workflows require input data from prior workflows, so turning one
+workflow off may result in other workflows also automatically turning
+off.) Then the user can click "Submit."
+![](../_static/images/howto_guides/workflows/quickStart/image9.png)
+
+### Output
+
+1. The link for 'My Projects' opens the list of projects for that user
+
+2. Links (in the purple circles) are provided to share projects, make projects public, or delete projects
+
+3. The "Status" column shows whether the job is in the queue (gray), submitted (purple), running (yellow), has failed (red) or completed (green). If a project fails, a log will give the error messages for troubleshooting.
+
+4. Clicking on the icon to the left of a project name opens up the results page for that project.
+
+> ![](../_static/images/howto_guides/workflows/quickStart/image10.png)
+
+### Project Summary (Results)
+
+The project summary page will show three categories. Clicking on the bar or tab opens up the information.
+
+1. General contains the project run information.
+
+2. "Workflow" Result contains the tabular/visual output.
+
+3. Browser/Download Outputs contains all the output files available for downloading. There may be several folders.
+
+> ![](../_static/images/howto_guides/workflows/quickStart/image11.png)
+
+This example shows the results of a ReadsQC workflow run: the run time under the General tab, the quality trimming and filtering results under the ReadsQC Results tab, and the files available for download (shown in purple) under the Browser/Download Outputs tab.
+
+![](../_static/images/howto_guides/workflows/quickStart/image12.png)
+
+
+The full Metagenome pipeline or "Multiple Workflow" run results show
+the results of each workflow under a separate tab, and the associated
+files available for download are in separate workflow folders under the
+Browser/Download Outputs tab.
+
+![](../_static/images/howto_guides/workflows/quickStart/image13.png)
+
+
+As a second example, the next two figures show the results from the Read-based Taxonomy Classification workflow. The summary includes classified reads and the number of species identified for all of the selected taxonomy classifiers. The top ten organisms identified by each tool at three taxonomic levels are also provided. Tabs for each of the classification tools providing more in-depth results are in the Detail section. Krona plots are generated for the results at each of the three taxonomic levels for each of the tools and can also be found in the Detail section. Full results files (beyond the Top 10) and the graphics are available for download.
+
+![](../_static/images/howto_guides/workflows/quickStart/image14.png)
+
+![](../_static/images/howto_guides/workflows/quickStart/image15.png)
+
+
+## Metagenomics Workflows
+### ReadsQC
+
+![](../_static/images/howto_guides/workflows/readsQC/image2.png)
+
+#### Overview
+
+This workflow performs quality control on raw Illumina reads to
+trim/filter low quality data and to remove artifacts, linkers, adapters,
+spike-in reads and reads mapping to several hosts and common microbial
+contaminants.
+
+#### Running the Workflow
+
+Currently, this workflow is available in
+[GitHub](https://github.com/microbiomedata/ReadsQC) and can be run from
+the command line. (CLI instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html).)
+Alternatively, this workflow can be run in [NMDC
+EDGE](https://nmdc-edge.org/).
+
+#### Input
+
+Metagenome ReadsQC requires paired-end Illumina data as an interleaved
+file or as separate pairs of FASTQ files.
+
+- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
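+To clarify what "interleaved" means here (R1 and R2 records alternating
+in a single file), the sketch below converts a separate pair of plain
+FASTQ files into one interleaved file. It is illustrative only, not
+part of the workflow; it assumes standard 4-line FASTQ records,
+uncompressed input, and hypothetical file names.
+
+```python
+# Minimal sketch: interleave two paired FASTQ files (R1/R2) into one file.
+# Assumes standard 4-line records and equal read counts in both files.
+
+def interleave_fastq(r1_path, r2_path, out_path):
+    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
+        while True:
+            rec1 = [r1.readline() for _ in range(4)]
+            rec2 = [r2.readline() for _ in range(4)]
+            if not rec1[0] or not rec2[0]:  # stop at end of either file
+                break
+            out.writelines(rec1)
+            out.writelines(rec2)
+
+interleave_fastq("sample_R1.fastq", "sample_R2.fastq", "sample_interleaved.fastq")
+```
+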
+#### Details
+
+This workflow performs quality control on raw Illumina reads using
+rqcfilter2. The workflow performs quality trimming, artifact removal,
+linker trimming, adapter trimming, and spike-in removal using bbduk, and
+performs human/cat/dog/mouse/microbe removal using bbmap. Full
+documentation can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html).
+
+#### Software Versions
+
+- rqcfilter2 (BBTools v38.94)
+
+- bbduk (BBTools v38.94)
+
+- bbmap (BBTools v38.94)
+
+#### Output
+
+Multiple output files are provided by the workflow; the primary files
+are shown below. The full list of output files can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html).
+
+| Primary Output Files      | Description                                                |
+| ------------------------- | ---------------------------------------------------------- |
+| Filtered Sequencing Reads | Cleaned paired-end data in interleaved format (.fastq.gz)  |
+| QC statistics (2 files)   | Reads QC summary statistics (.txt)                         |
+
+#### Running the Reads QC Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metagenomics category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'ReadsQC' from the dropdown menu under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/readsQC/image3.png)
+
+Input
+
+ReadsQC requires paired-end Illumina data in FASTQ format as the input;
+the file can be interleaved and can be compressed. **Acceptable file
+formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+5. The default setting is for the raw data to be in an interleaved
+ format (paired reads interleaved into one file). If the raw data is
+ paired reads in separate files (forward and reverse), click 'No'.
+
+6. Additional data files (of the same type: interleaved or separate)
+ can be added with the button below.
+
+7. Click the button to the right of the input blank for data to select
+ the data file for the analysis. (If there are separate files, there
+ will be two input blanks.) A box called 'Select a File' will open to
+ allow the user to find the desired file(s) from previously run
+ projects, the public data folder, or files uploaded by the user.
+
+8. Then click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/readsQC/image4.png)
+
+Output
+
+The General section of the output shows which workflow was run and the
+run time information.
+
+![](../_static/images/howto_guides/workflows/readsQC/image5.png)
+
+The ReadsQC Result section shows the data input and provides a variety
+of metrics including the number of reads and bases before and after
+trimming and filtering.
+
+![](../_static/images/howto_guides/workflows/readsQC/image6.png)
+
+The Browser/Download Output section provides output files available to
+download. The clean data will be in an interleaved .fq.gz file. General
+QC statistics are in the filterStats.txt file.
+
+![](../_static/images/howto_guides/workflows/readsQC/image7.png)
+
+
+### Read-based Taxonomy Classification
+
+![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image2.png)
+
+#### Overview
+
+This workflow takes in Illumina sequencing files (single-end or
+paired-end) and profiles the reads using multiple taxonomic
+classification tools.
+
+#### Running the Workflow
+
+Currently, this workflow is available in
+[GitHub](https://github.com/microbiomedata/ReadbasedAnalysis) and can be
+run from the command line. (CLI instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html).)
+Alternatively, this workflow can be run in [NMDC
+EDGE](https://nmdc-edge.org/).
+
+#### Input
+
+The Metagenome Read-based Taxonomy Classification workflow requires
+Illumina data and can accept data as an interleaved file or as separate
+pairs of FASTQ files. Interleaved data will be treated as single-end
+reads. (It is highly recommended to input clean data from the ReadsQC
+workflow.)
+
+- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+#### Details
+
+To create a community profile, this workflow utilizes three taxonomy
+classification tools: GOTTCHA2, Kraken2, and Centrifuge. These tools
+vary in levels of specificity and sensitivity. Each tool has a separate
+reference database. These databases (152 GB) are built into NMDC EDGE.
+Users can select one, two, or all three of the classification tools to
+run in the workflow. Full documentation can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html).
+
+#### Software Versions
+
+- GOTTCHA2 v2.1.6
+
+- Kraken2 v2.0.8
+
+- Centrifuge v1.0.4
+
+
+#### Output
+
+Multiple output files are provided by the workflow; the primary files
+are shown below. The full list of output files can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html).
+
+| Primary Output Files            | Description                                          |
+| ------------------------------- | ---------------------------------------------------- |
+| Profiling results for each tool | Tabular results of the profile for each tool (.tsv)  |
+| Krona plots for each tool       | Interactive graphic file (.html)                     |
+
+#### Running the Read-based Taxonomy Classification Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metagenomics category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'Read-based Taxonomy Classification' from the dropdown menu
+ under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image3.png)
+
+> ![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image4.png)
+
+Input
+
+This workflow accepts Illumina data in FASTQ format as the input; the
+file can be interleaved and can be compressed. The recommended input is
+the output from the ReadsQC workflow. **Acceptable
+file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+5. The default setting is for the raw data to be in an interleaved
+ format (paired reads interleaved into one file). If the raw data is
+ paired reads in separate files (forward and reverse), click 'No'.
+
+6. Additional data files (of the same type: interleaved or separate)
+ can be added with the button below.
+
+7. Click the button to the right of the input blank for data to select
+ the data file for the analysis. (If there are separate files, there
+ will be two input blanks.) A box called 'Select a File' will open to
+ allow the user to find the desired file(s) from previously run
+ projects, the public data folder, or files uploaded by the user.
+
+8. Then click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image6.png)
+
+Output
+
+The General section of the output shows which workflow and which tools
+were run and the run time information.
+
+![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image7.png)
+
+The Read-based Taxonomy Classification Result section has a summary
+section at the top and results for each tool at three levels of taxonomy
+in the Taxonomy Top 10 section. The Detail section has classified reads
+results and relative abundance results for each tool at three levels of
+taxonomy.
+
+![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image8.png)
+
+The Detail section also provides an interactive Krona plot for each
+tool.
+
+![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image9.png)
+
+The Browser/Download Output section provides output files available to
+download. Each tool has a separate folder for the results from that
+tool. Full tabular results are in the largest .tsv file and the
+interactive Krona plots (.html files) open in a separate browser window.
+
+![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image10.png)
+
+### Assembly
+
+![](../_static/images/howto_guides/workflows/metagenomeAssembly/image2.png)
+
+#### Overview
+
+This workflow takes in paired-end Illumina data, runs error correction,
+assembly, and assembly validation.
+
+#### Running the Workflow
+
+Currently, this workflow is available in
+[GitHub](https://github.com/microbiomedata/metaAssembly) and can be run
+from the command line. (CLI instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html).)
+Alternatively, this workflow can be run in [NMDC
+EDGE](https://nmdc-edge.org/).
+
+#### Input
+
+Metagenome Assembly requires paired-end Illumina data as an interleaved
+file or as separate pairs of FASTQ files. The recommended input is the
+output from the ReadsQC workflow.
+
+- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+#### Details
+
+This workflow takes in paired-end Illumina reads and performs error
+correction using bbcms. Then the corrected reads are assembled using
+metaSPAdes. After assembly, the reads are mapped back to the contigs
+using bbmap for coverage information. Full documentation can be found in
+[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html)
+
+#### Software Versions and Parameters
+
+- bbcms (BBTools v38.94)
+
+- metaSpades v3.15.0
+
+- bbmap (BBTools v38.94)
+
+#### Output
+
+Multiple output files are provided by the workflow; the primary files
+are shown below. The full list of output files can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html).
+
+| Primary Output Files    | Description                                                 |
+| ----------------------- | ----------------------------------------------------------- |
+| Assembly Contigs        | Final assembly contigs (assembly_contigs.fna)               |
+| Assembly Scaffolds      | Final assembly scaffolds (assembly_scaffolds.fna)           |
+| Assembly AGP            | An AGP format file which describes the assembly             |
+| Assembly Coverage BAM   | Sorted bam file of reads mapping back to the final assembly |
+| Assembly Coverage Stats | Assembled contigs coverage information                      |
+
+#### Running the Metagenome Assembly Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metagenomics category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'Metagenome Assembly' from the dropdown menu under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/metagenomeAssembly/image4.png)
+
+Input
+
+This workflow accepts Illumina data in FASTQ format as the input; the
+file can be interleaved and can be compressed. (It is highly recommended
+to input clean data from the ReadsQC workflow.)
+
+**Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+5. The default setting is for the raw data to be in an interleaved
+ format (paired reads interleaved into one file). If the raw data is
+ paired reads in separate files (forward and reverse), click 'No'.
+
+6. Additional data files (of the same type: interleaved or separate)
+ can be added with the button below.
+
+7. Click the button to the right of the input blank for data to select
+ the data file for the analysis. (If there are separate files, there
+ will be two input blanks.) A box called 'Select a File' will open to
+ allow the user to find the desired file(s) from previously run
+ projects, the public data folder, or files uploaded by the user.
+
+8. Then click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/metagenomeAssembly/image5.png)
+
+Output
+
+The General section of the output shows which workflow was run and the
+run time information.
+
+![](../_static/images/howto_guides/workflows/metagenomeAssembly/image6.png)
+
+The Metagenome Assembly Result section has all of the statistics from
+the assembly.
+
+![](../_static/images/howto_guides/workflows/metagenomeAssembly/image7.png)
+
+The Browser/Download Output section provides output files available to
+download. The primary result is the assembly_contigs.fna file which can
+also be the input for the Metagenome Annotation workflow. The
+pairedMapped_sorted.bam file along with the assembled contigs file can
+be the input for the MAGs Generation workflow.
+
+![](../_static/images/howto_guides/workflows/metagenomeAssembly/image8.png)
+
+
+### Annotation
+
+![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image2.png)
+
+#### Overview
+
+This workflow takes assembled metagenomes and generates structural and
+functional annotations.
+
+#### Running the Workflow
+
+Currently, this workflow is available in
+[GitHub](https://github.com/microbiomedata/mg_annotation/) and can be
+run from the command line. (CLI instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html).)
+Alternatively, this workflow can be run in [NMDC
+EDGE.](https://nmdc-edge.org/)
+
+#### Input
+
+Metagenome Annotation requires assembled contigs in a FASTA file. The
+recommended input is the output from the Metagenome Assembly workflow.
+
+- **Acceptable file formats:** .fasta, .fa, .fna, .fasta.gz, .fa.gz,
+ .fna.gz
+
+#### Details
+
+The workflow uses a number of open-source tools and databases to
+generate the structural and functional annotations. The input assembly
+is first split into 10 MB chunks; depending on the workflow engine
+configuration, the splits can be processed in parallel. Each split is
+first structurally annotated, then those results
+are used for the functional annotation. The structural annotation uses
+tRNAscan_se, RFAM, CRT, Prodigal and GeneMarkS. These results are merged
+to create a consensus structural annotation. The resulting GFF is the
+input for functional annotation which uses multiple protein family
+databases (SMART, COG, TIGRFAM, SUPERFAMILY, Pfam and Cath-FunFam) along
+with custom HMM models. The functional predictions are created using
+Last and HMM. These annotations are also merged into a consensus GFF
+file. Finally, the respective split annotations are merged together to
+generate a single structural annotation file and single functional
+annotation file. In addition, several summary files are generated in TSV
+format. Full documentation can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html).
+
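+As an illustration of the splitting step described above, the sketch
+below breaks a FASTA file into roughly 10 MB chunks at record
+boundaries. The workflow uses its own splitter, so treat this as a
+conceptual sketch; the input file name is hypothetical.
+
+```python
+# Minimal sketch: split a FASTA file into ~10 MB chunks, starting a new
+# chunk only at a record header (">") so records stay intact.
+
+CHUNK_BYTES = 10 * 1024 * 1024  # ~10 MB per chunk
+
+def split_fasta(path, prefix):
+    chunk_index, written = 0, 0
+    out = open(f"{prefix}.{chunk_index}.fasta", "w")
+    with open(path) as fasta:
+        for line in fasta:
+            if line.startswith(">") and written >= CHUNK_BYTES:
+                out.close()
+                chunk_index, written = chunk_index + 1, 0
+                out = open(f"{prefix}.{chunk_index}.fasta", "w")
+            out.write(line)
+            written += len(line)
+    out.close()
+
+split_fasta("assembly_contigs.fna", "split")
+```
+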
+#### Software Versions
+
+- Conda
+
+- tRNAscan-SE \>= 2.0
+
+- Infernal 1.1.2
+
+- CRT-CLI 1.8
+
+- Prodigal 2.6.3
+
+- GeneMarkS-2 \>= 1.07
+
+- Last \>= 983
+
+- HMMER 3.1b2
+
+- TMHMM 2.0
+
+#### Output
+
+Multiple output files are provided by the workflow; the primary files
+are shown below. The full list of output files can be found in
+[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html)
+
+| Primary Output Files   | Description                                                     |
+| ---------------------- | --------------------------------------------------------------- |
+| Structural Annotation  | Consensus structural annotation file from multiple tools (.gff) |
+| Functional Annotation  | Consensus functional annotation file from multiple tools (.gff) |
+| KEGG summary           | KEGG gene function tabular summary (.tsv)                       |
+| EC summary             | Enzyme Commission tabular summary (.tsv)                        |
+| Gene phylogeny summary | Gene phylogeny tabular summary (.tsv)                           |
+
+#### Running the Metagenome Annotation Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metagenomics category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'Metagenome Annotation' from the dropdown menu under
+ Workflow.
+
+>![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image3.png)
+
+Input
+
+This workflow accepts assembled Illumina data in FASTA format as the
+input; the file can be compressed. (It is highly recommended to input
+the assembled contigs from the Metagenome Assembly workflow.)
+**Acceptable file formats:** .fasta, .fa, .fna, .fasta.gz, .fa.gz,
+.fna.gz.
+
+5. Click the button to the right of the input blank for data to select
+ the data file for the analysis. (If there are separate files, there
+ will be two input blanks.) A box called 'Select a File' will open to
+ allow the user to find the desired file(s) from previously run
+ projects, the public data folder, or files uploaded by the user.
+
+6. Then click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image4.png)
+
+Output
+
+The General section of the output shows which workflow was run and the
+run time information.
+
+![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image5.png)
+
+The Metagenome Annotation Result section has statistics for Processed
+Sequences, Predicted Genes, and General Quality Information from the
+workflow.
+
+![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image6.png)
+
+The Browser/Download Output section provides output files available to
+download. The primary results are the functional annotation and the
+structural annotation files (.gff). The functional annotation file is
+required input for the MAGs Generation workflow along with the assembled
+contigs.
+
+![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image7.png)
+
+### MAGs Generation
+
+![](../_static/images/howto_guides/workflows/MAGs/image2.png)
+
+#### Overview
+
+This workflow classifies contigs into bins and the resulting bins are
+refined using the functional annotation file. The bins are evaluated for
+completeness and contamination. The quality of the bins is determined
+and a lineage is assigned to each bin of high or medium quality.
+
+#### Running the Workflow
+
+Currently, this workflow is available in
+[GitHub](https://github.com/microbiomedata/metaMAGs) and can be run from
+the command line. (CLI instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html).)
+Alternatively, this workflow can be run in [NMDC
+EDGE.](https://nmdc-edge.org/)
+
+#### Input
+
+This workflow requires assembled contigs in a FASTA file, the read
+mapping file from the assembly (SAM or BAM), and a functional
+annotation of the assembly in a GFF file.
+
+- **Acceptable file formats:** assembled contigs (.fasta, .fa, or
+ .fna); read mapping to assembly (.sam.gz or .bam); Functional
+ annotation (.gff)
+
+#### Details
+
+The workflow is based on the IMG metagenome binning pipeline and has been
+modified specifically for the NMDC project. For all processed
+metagenomes, it classifies contigs into bins using MetaBat2. Next, the
+bins are refined using the functional Annotation file (GFF) from the
+Metagenome Annotation workflow and optional contig lineage information.
+Bin completeness and contamination are evaluated by CheckM, and bins
+are assigned a quality level (High Quality (HQ), Medium Quality (MQ),
+or Low Quality (LQ)) based on MiMAG standards.
+In the end, GTDB-Tk is used to assign lineage for HQ and MQ bins. The
+required GTDB-Tk database is incorporated into NMDC EDGE. Full
+documentation can be found in
+[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html).
+
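+The quality assignment can be sketched as follows, using the commonly
+cited MiMAG thresholds (HQ: >90% completeness and <5% contamination;
+MQ: >=50% completeness and <10% contamination). Note that full MiMAG
+high-quality status also requires rRNA/tRNA criteria not modeled here,
+and the bin values below are made up.
+
+```python
+# Minimal sketch: assign HQ/MQ/LQ from CheckM-style completeness and
+# contamination percentages using MiMAG-style thresholds (rRNA/tRNA
+# requirements for HQ are intentionally omitted for brevity).
+
+def bin_quality(completeness, contamination):
+    if completeness > 90 and contamination < 5:
+        return "HQ"
+    if completeness >= 50 and contamination < 10:
+        return "MQ"
+    return "LQ"
+
+bins = {"bin.1": (96.3, 1.2), "bin.2": (71.0, 4.8), "bin.3": (34.5, 2.1)}
+for name, (comp, cont) in bins.items():
+    print(name, bin_quality(comp, cont))
+```
+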
+#### Software Versions
+
+- Biopython v1.74
+
+- Sqlite
+
+- Pymysql
+
+- requests
+
+- samtools \> v1.9 (License: MIT License)
+
+- Metabat2 v2.15
+
+- CheckM v1.1.2
+
+- GTDB-TK v1.2.0
+
+- FastANI v1.3
+
+- FastTree v2.1.10
+
+#### Output
+
+Multiple output files are provided by the workflow; the primary files
+are shown below. The full list of output files can be found in
+[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html)
+
+| Primary Output Files  | Description                                                            |
+| --------------------- | ---------------------------------------------------------------------- |
+| hqmq-metabat-bins.zip | Bins of contigs rated high or medium quality with an assigned lineage  |
+
+#### Running the Metagenome Assembled Genomes (MAGs) Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metagenomics category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'Metagenome MAGs' from the dropdown menu under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/MAGs/image3.png)
+
+Input
+
+Metagenome MAGs requires assembled contigs, the read mapping file of
+reads to assembled contigs, and a functional annotation file. The
+recommended input would be from the NMDC assembly and annotation
+workflows. **Acceptable file formats:** assembled contigs (.fasta, .fa,
+or .fna); read mapping to assembly (.sam.gz or .bam); functional
+annotation (.gff)
+
+5. Click the button to the right of the blank for Input Contig File. A
+ box called 'Select a File' will open to allow the user to find the
+ desired file from a previously run assembly project, the public data
+ folder, or a file uploaded by the user.
+
+6. Click the button to the right of the blank for Input Sam/Bam File. A
+ box called 'Select a File' will open to allow the user to find the
+ read mapping file from a previously run assembly project, the public
+ data folder, or a file uploaded by the user.
+
+7. Click the button to the right of the blank for Input GFF File. A box
+ called 'Select a File' will open to allow the user to find the
+ desired file(s) from a previously run annotation project, the public
+ data folder, or a file uploaded by the user.
+
+8. Then click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/MAGs/image4.png)
+
+Output
+
+The General section of the output shows which workflow was run and the
+run time information.
+
+![](../_static/images/howto_guides/workflows/MAGs/image5.png)
+
+The Metagenome MAGs Result section provides a Summary section with
+information on binned and unbinned contigs. The MAGs section provides
+information such as the completeness of the genome, amount of
+contamination, and number of genes present on all bins determined to be
+high quality or medium quality.
+
+![](../_static/images/howto_guides/workflows/MAGs/image6.png)
+
+The Browser/Download Output section provides output files available to
+download. The primary output file is the zipped file with all bins
+determined to be high quality or medium quality (hqmq-metabat-bins.zip).
+
+![](../_static/images/howto_guides/workflows/MAGs/image7.png)
+
+### Running multiple workflows or the full metagenomic pipeline with a single input
+
+The steps for running multiple metagenomic workflows, or the entire
+pipeline, from a single input are described in the "Running multiple
+workflows" section of the NMDC EDGE Quick Start User Guide above.
+
+## Metatranscriptomics Workflow
+![](../_static/images/howto_guides/workflows/metaT/image1.png)
+
+### Overview
+
+The metatranscriptome (metaT) workflow takes in raw metatranscriptome
+data, filters the data for quality, removes rRNA reads, then assembles
+and annotates the transcripts. The data is mapped back to the genomic
+features in the transcripts, and RPKMs (Reads Per Kilobase of transcript
+per Million mapped reads) are calculated for each feature in the
+functional annotation file.
+
+### Running the Workflow
+
+Currently, this workflow can be run in [NMDC
+EDGE](https://nmdc-edge.org/home) or from the command line. (CLI
+instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/6_MetaT_index.html).)
+
+### Input
+
+Metatranscriptomics requires paired-end Illumina data as an interleaved
+file or as separate pairs of FASTQ files.
+
+- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+### Details
+
+MetaT is a workflow designed to analyze metatranscriptomes, and this
+workflow builds upon other NMDC workflows for processing input
+sequencing data. The metatranscriptomics workflow takes in raw RNA
+sequencing data and quality filters the reads using the ReadsQC
+workflow. Then the MetaT workflow filters out ribosomal RNA reads (using
+the SILVA rRNA database) and separates interleaved files into separate
+pairs of files using bbduk (BBTools). After the filtering steps, the
+reads are assembled into transcripts using MEGAHIT and transcripts are
+annotated using the [Metagenome Annotation NMDC
+Workflow](https://github.com/microbiomedata/mg_annotation) which
+produces GFF functional annotation files. Features are counted with
+[Subread's featureCounts](http://subread.sourceforge.net/) which assigns
+mapped reads to genomic features and generates RPKMs for each feature in
+a GFF file for sense and antisense reads.
+
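+The RPKM calculation itself is straightforward; the sketch below
+applies the standard formula, RPKM = reads * 10^9 / (total mapped reads
+* feature length in bp), to made-up per-feature counts. Details may
+differ from edgeR's implementation.
+
+```python
+# Minimal sketch of the standard RPKM formula applied to per-feature counts.
+# Counts, lengths, and IDs below are made-up values for illustration.
+
+def rpkm(read_count, feature_length_bp, total_mapped_reads):
+    return read_count * 1e9 / (total_mapped_reads * feature_length_bp)
+
+total_mapped = 12_000_000
+features = {"gene_0001": (450, 1200), "gene_0002": (90, 300)}  # (count, length)
+for feature_id, (count, length) in features.items():
+    print(f"{feature_id}\tRPKM={rpkm(count, length, total_mapped):.2f}")
+```
+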
+### Software Versions
+
+- BBTools v38.44
+
+- hisat2 v2.1
+
+- Python v3.7.6
+
+- featureCounts v2.0.1
+
+- R v3.6.0
+
+- edgeR v3.28.1
+
+- pandas v1.0.5
+
+- gffutils v0.10.1
+
+### Output
+
+The table below lists the primary output files. The main outputs are the
+assembled transcripts and annotated features file. Several annotation
+files are also available to download.
+
+| Primary Output Files     | Description                  |
+| ------------------------ | ---------------------------- |
+| INPUT_NAME.contigs.fa    | Assembled transcripts        |
+| rpkm_sorted_features.tsv | Feature table sorted by RPKM |
+
+### Running the Metatranscriptomics Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Metatranscriptomics category in the left menu bar, select
+ 'Run a Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'Metatranscriptome' from the dropdown menu under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/metaT/image2.png)
+
+Input
+
+The metatranscriptome workflow requires paired-end Illumina data in
+FASTQ format as the input; the file can be interleaved and can be
+compressed. **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz
+
+5. The default setting is for the raw data to be in an interleaved
+ format (paired reads interleaved into one file). If the raw data is
+ paired reads in separate files (forward and reverse), click 'No'.
+
+6. Additional data files (of the same type: interleaved or separate)
+ can be added with the button below.
+
+7. Click the button to the right of the input blank to select the data
+ file for the analysis. (If there are separate files, there will be
+ two input blanks.) A 'Select a File' box will open to allow the user
+ to find the desired file(s) from previously run projects, the public
+ data folder, or files uploaded by the user.
+
+8. Click 'Submit' when ready to run the workflow.
+
+> ![](../_static/images/howto_guides/workflows/metaT/image3.png)
+
+Output
+
+The General section of the output shows which workflow was run, the run
+time information, and the Project Configuration.
+
+![](../_static/images/howto_guides/workflows/metaT/image4.png)
+
+The Metatranscriptome Result section includes a table of the top 100
+RPKM results from the overall metatranscriptome data file sorted by
+RPKM. Selecting the header of each column will sort this data by that
+column. This section also includes a button to quickly download a tsv
+file of all detected features in the input dataset for further analysis.
+
+![](../_static/images/howto_guides/workflows/metaT/image5.png)
+
+The Browser/Download Output section provides all output files available
+to download. The output contigs can be found in the assembly folder and
+the tsv file of all detected features sorted by RPKM is available under
+the metat_output folder.
+
+![](../_static/images/howto_guides/workflows/metaT/image6.png)
+
+## Natural Organic Matter Workflow
+
+![](../_static/images/howto_guides/workflows/NOM/image1.png)
+
+### Overview
+
+This workflow takes FTICR mass spectrometry data collected from organic
+extracts to determine the molecular formulas of natural organic
+biomolecules in the input sample.
+
+### Running the Workflow
+
+Currently, this workflow can be run in [NMDC
+EDGE](https://nmdc-edge.org/home) or from the command line. (CLI
+instructions and requirements are found
+[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/9_NOM_index.html).)
+
+### Input
+
+The input for this workflow is the output from a massSpec experiment (a
+massSpec list) which includes a minimum of two columns of data: a
+mass-to-charge ratio (m/z) and a signal intensity (Intensity) column for
+every feature in the analysis. A calibration file of molecular formula
+references is also required when running the workflow via command line.
+(This calibration file is built into NMDC EDGE.)
+
+**Acceptable file formats:** .raw, .tsv, .csv, .xlsx
+
+### Details
+
+Direct Infusion Fourier Transform Ion Cyclotron Resonance mass
+spectrometry (DI FTICR-MS) data undergoes signal processing and
+molecular formula assignment leveraging EMSL's CoreMS framework. Raw
+time domain data is transformed into the m/z domain using Fourier
+Transform and Ledford equation. Data is denoised followed by peak
+picking, recalibration using an external reference list of known
+compounds, and searched against a dynamically generated molecular
+formula library with a defined molecular search space. The confidence
+scores for all the molecular formula candidates are calculated based on
+the mass accuracy and fine isotopic structure, and the best candidate
+assigned as the highest score. This workflow will not work as reliably
+with Orbitrap mass spectrometry data.
+
+### Software Versions
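+To make the mass-accuracy component of candidate scoring concrete, the
+sketch below ranks hypothetical formula candidates by ppm error between
+observed and theoretical m/z. The real CoreMS confidence score also
+weighs fine isotopic structure, and all m/z values shown are
+illustrative, not reference values.
+
+```python
+# Minimal sketch: rank candidate molecular formulas by absolute ppm error.
+# Observed peak and candidate theoretical m/z values are made up.
+
+def ppm_error(observed_mz, theoretical_mz):
+    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6
+
+observed = 301.1412
+candidates = {"candidate_A": 301.1081, "candidate_B": 301.1405}
+
+for formula, mz in sorted(candidates.items(), key=lambda kv: abs(ppm_error(observed, kv[1]))):
+    print(f"{formula}: {ppm_error(observed, mz):+.2f} ppm")
+```
+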
+
+- CoreMS (2-clause BSD)
+
+- Click (BSD 3-Clause "New" or "Revised" License)
+
+### Output
+
+The primary output file is the Molecular Formula Data Table (in a .csv
+file).
+
+| Primary Output Files | Description                                                                |
+| -------------------- | -------------------------------------------------------------------------- |
+| INPUT_NAME.csv       | m/z, Peak height, Peak Area, Molecular Formula IDs, Confidence Score, etc.  |
+
+### Running the Natural Organic Matter Workflow in NMDC EDGE
+
+Select a workflow
+
+1. From the Organic Matter category in the left menu bar, select 'Run a
+ Single Workflow'.
+
+2. Enter a **unique** project name with no spaces
+ (underscores are fine).
+
+3. A description is optional, but helpful.
+
+4. Select 'EnviroMS' from the dropdown menu under Workflow.
+
+> ![](../_static/images/howto_guides/workflows/NOM/image2.png)
+
+Input
+
+The Natural Organic Matter workflow input is the output from a massSpec
+experiment (a massSpec list) with a minimum of two columns of data: a
+mass-to-charge ratio (m/z) and a signal intensity (Intensity) column for
+every feature in the analysis. **Acceptable file formats:** .tsv, .csv,
+.raw, .xlsx
+
+5. Click the button to the right of the input blank for data to select
+ the data file for the analysis. (If there are separate files, there
+ will be two input blanks.) A box called 'Select a File' will open to
+ allow the user to find the desired file(s) from the public data
+ folder or files uploaded by the user.
+
+6. Additional input files can be added by clicking the 'Add file'
+ button to create additional input blanks.
+
+7. Once all the input files have been selected, click 'Submit'.
+
+> ![](../_static/images/howto_guides/workflows/NOM/image3.png)
+
+Output
+
+The General section of the output shows which workflow was run and the
+run time information. The Project Configuration can be seen by clicking
+the three dots in the bracket.
+
+![](../_static/images/howto_guides/workflows/NOM/image4.png)
+
+The Browser/Download Output section provides output files available to
+download. The primary output file is the Molecular Formula Data Table
+(.csv) containing m/z measurements, peak height, peak area, molecular
+formula identification, ion type, and confidence score.
+
+![](../_static/images/howto_guides/workflows/NOM/image5.png)
diff --git a/content/nmdc/src/howto_guides/submit2nmdc.md b/content/nmdc/src/howto_guides/submit2nmdc.md
new file mode 100644
index 0000000..c4920a7
--- /dev/null
+++ b/content/nmdc/src/howto_guides/submit2nmdc.md
@@ -0,0 +1,120 @@
+# Submitting to NMDC
+
+
+## Introduction
+The NMDC Submission Portal (https://data.microbiomedata.org/submission/home) was released in April 2022. It is designed to lower the barriers to capturing metadata and adhering to community standards, addressing the critical gap of collecting the metadata necessary to describe a study and its biosamples. The Submission Portal is built on a flexible framework leveraging a new modeling approach called the Linked Data Modeling Language (LinkML) and the template-driven spreadsheet tool DataHarmonizer. It supports several different community standards, such as the Minimal Information about any (x) Sequence (MIxS) standard from the Genomic Standards Consortium (GSC), the PROV standard for provenance metadata, the Proteomics Standards Initiative (PSI) standards for metaproteomics, and the Metabolomics Standards Initiative (MSI) standards for metabolomics. The Submission Portal is an intuitive interface that allows researchers to provide information about their study, the metadata about the study’s biosamples, the multi-omics data associated with the study, and whether Department of Energy (DOE) user facility proposals are associated with the data. Updates and new features are continually being implemented as user research provides new insights to improve usability, and as standards are updated and improved.
+
+### Collaboration to support Community Standards
+The Genomic Standards Consortium (GSC) is an open-membership working body formed nearly twenty years ago with the aim of supporting community-driven standards for sequence data. The GSC has defined a set of core descriptors for genomes, metagenomes and the samples thereof, with the intention to capture relevant environmental and other contextual data (e.g., metadata) to be made available in the International Nucleotide Sequence Database Collaboration (INSDC) primary repositories. The Minimal Information about any (x) Sequence (MIxS) was developed in 2011, and forms the basis for environmental packages that include terms describing specific environments from which a sample was collected (e.g., soil or water). Together with the GSC, the NMDC team has rendered the MIxS standards in LinkML as part of the latest version release. The NMDC has added computability to portions of the MIxS standard and validation can be applied in the NMDC Submission Portal. Through the GSC’s Compliance and Interoperability Group (CIG), the NMDC supports improvements to metadata elements that were unclear or missing, and makes updates to terminology and curation through the insights from numerous workshops hosted through the NMDC Ambassador Program. Specific issues are tracked using the NMDC tag in the GSC GitHub repository (https://github.com/GenomicsStandardsConsortium/mixs/labels/NMDC).
+
+As the NMDC continues to develop and gain user feedback, future iterations of the NMDC Submission Portal will provide templates for describing the ways in which samples are processed in preparation for analysis and improve ecosystem description. This will be accomplished by leveraging and collaborating with many existing standards and ontologies. Beyond the GSC’s standards, the NMDC leverages standards and controlled vocabularies developed by the Proteomics Standards Initiative (PSI), the National Cancer Institute’s Proteomic Data Commons (https://pdc.cancer.gov/data-dictionary/dictionary.html), the IUPAC Gold Book, and the Metabolomics Standards Initiative (MSI) for mass spectrometry data types (e.g., ionization mode, mass resolution, scan rate, and so on). The NMDC team also collaborates heavily with the Environment Ontology (EnvO), which is a community-led ontology that represents environmental entities such as biomes, environmental features, and environmental materials.
+
+In addition to working across community standards groups, the NMDC also works closely with the Genomes OnLine Database (GOLD) hosted by the Department of Energy’s Joint Genome Institute (JGI). GOLD is an open-access repository of genome, metagenome, and metatranscriptome sequencing projects with their associated metadata. Samples are described using a five-level ecosystem classification path that goes from ecosystem down to the type of environmental material. The NMDC team supports this hierarchical classification system in the Submission Portal, along with enabling search capabilities in the Data Portal. Further, the NMDC and GOLD teams collaborate to curate, update, and make improvements to shared study and metadata information to support interoperability.
+
+### DataHarmonizer: A flexible template-driven tool
+Developed by the Centre for Infectious Disease Genomics and One Health (CIDGOH) at Simon Fraser University, DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating, and transforming sequence contextual data into submission-ready formats for public or private repositories. The tool’s web browser-based JavaScript environment provides real-time validation of terms, enabling rapid quality checks directly within the interface. The NMDC has an ongoing open-source collaboration to leverage DataHarmonizer to support the NMDC Submission Portal.
+
+### User-Centered Design Process
+The NMDC is a resource designed together with and for the scientific community. In 2022, the NMDC team conducted three rounds of user interviews (17 user interviews total) from three target groups: general microbiome researchers, potential data submitters, and metadata generators. The Submission Portal has been the subject of several rounds of user research and usability testing, and additional interviews and beta-testing rounds are scheduled to occur throughout the year. New features are continuously added and tested by the microbiome research community, and the NMDC team implements fixes, changes, and enhancements based on this community feedback. The NMDC team and the NMDC Ambassadors run several workshops throughout the year, and feedback from workshop participants is also incorporated into new Submission Portal improvements. The Submission Portal will continue to be shaped by our user-centered design approach.
+
+### Support for DOE User Facilities
+The DOE User Facilities JGI and EMSL are key partners for the NMDC because they support the environmental research community. The NMDC team collaborates closely with the JGI and EMSL to support integration of multi-omics data generated across these Facilities, and particularly as part of the Facilities Integrating Collaborations for User Science (FICUS) Program. The Submission Portal has been designed upfront to be compliant with both JGI and EMSL sample submission requirements, ensuring study and biosample information is consistently collected to support interoperability and data reuse. To demonstrate feasibility, several FICUS user projects have been submitted using the Submission Portal with feedback that has informed improvements and new features. Through the Submission Portal, study and biosample metadata is validated against the NMDC schema, with ‘realtime’ checks on data integrity (accuracy, completeness, and consistency). Further developments will also support data embargo information in accordance with the JGI and EMSL Data Policies.
+
+## Portal Functionality
+
+[![](../_static/images/howto_guides/portal_guide/portal_functionality.png)](../_static/images/howto_guides/portal_guide/portal_functionality.png)
+
+The NMDC Submission Portal is accessible from the ‘Products’ tab on the NMDC website, the NMDC Data Portal, and NMDC EDGE.
+
+[![](../_static/images/howto_guides/portal_guide/ORCiD.png)](../_static/images/howto_guides/portal_guide/ORCiD.png)
+
+The Submission Portal requires ORCiD authentication for access. If you have already signed in via ORCiD, you will not see this screen in the NMDC Submission Portal.
+
+[![](../_static/images/howto_guides/portal_guide/Create_submission.png)](../_static/images/howto_guides/portal_guide/Create_submission.png)
+
+Once signed in with an ORCiD, you will see an option to ‘Create New Submission’ with subsequent guidance to provide information required for submission to NMDC. Details about each section are outlined below. Users can also return to existing submissions saved under the ORCiD account to resume their work at any point.
+
+### Submission Context
+
+[![](../_static/images/howto_guides/portal_guide/submission_context.png)](../_static/images/howto_guides/portal_guide/submission_context.png)
+
+On the Submission Context screen, users are asked whether data has already been generated for their study. If a user selects ‘Yes’, a DOI associated with the data can be entered. For data generated at either JGI or EMSL (DOE user facilities), the specific Award DOI (e.g., 10.46936/10.25585/60001289) should be entered, along with selecting the checkbox for this option. If the data was not generated at a DOE User Facility, a valid data DOI can be entered. A data DOI is not the same as a publication DOI; it is issued through a separate resource as a unique persistent identifier (e.g., 10.48443/e4zf-b917).
+
+[![](../_static/images/howto_guides/portal_guide/shipping_info.png)](../_static/images/howto_guides/portal_guide/shipping_info.png)
+
+If a user selects ‘No’, they will be asked whether samples will be submitted to JGI, EMSL, or both User Facilities. Selecting EMSL will further prompt the user to provide shipping information for EMSL, along with indicating what project type is associated with an active User proposal (e.g., CSP, BERSS, BRC, FICUS, MONet, or other).
+
+### Study
+
+[![](../_static/images/howto_guides/portal_guide/study_info.png)](../_static/images/howto_guides/portal_guide/study_info.png)
+
+The Study Information page requires a valid ‘Study Name’ along with a valid email address. We highly recommend the use of standardized, informative study names as described by the GOLD team (Mukherjee et al., 2023). Further information can be provided to include relevant links to webpages and a description of the study. Last, a user can include ‘Contributors’ to acknowledge members of a research team associated with the study. This includes listing Contributor names, ORCiDs, and associated role(s) based on CRediT (the Contributor Roles Taxonomy). A Contributor can have a single role or multiple roles.
+
+### Multi-omics Data
+
+[![](../_static/images/howto_guides/portal_guide/multiomics.png)](../_static/images/howto_guides/portal_guide/multiomics.png)
+
+The Multi-omics Data page will prompt users to specify what data types have either already been generated or are anticipated to be generated, depending on the previously entered data. There are no limits to the number of data combinations that can be selected. If any data type attributed to JGI or EMSL (or both) are selected, a user will be required to input the respective proposal number for tracking purposes. Importantly, these selections will be used to support coordinated submission of biosample information with multiple data types. For example, if a user plans to generate paired metagenome and metaproteome data from aliquots of the same physical experimental sample, this information will be captured on the subsequent customized metadata template.
+
+
+### Environmental Package
+
+[![](../_static/images/howto_guides/portal_guide/enviro_package.png)](../_static/images/howto_guides/portal_guide/enviro_package.png)
+
+The Biosample metadata portion of the Submission Portal uses the GSC’s environmental packages to define data entry screens that are suitable for samples from a particular environment. The available environmental packages include: air, built environment, host-associated, hydrocarbon resources-core, hydrocarbon resources-fluids swab, microbial material biofilm, plant-associated, sediment, soil, wastewater sludge, water, and miscellaneous natural or artificial environment. Incorporation of additional GSC packages will be completed upon further user research. A user will need to select the single package that best fits the sample environment. If multiple environment types are associated with a single Study (e.g., soil and plant-associated), a separate submission for each environmental package will be needed. The selected package determines which metadata fields are required, recommended, or optional for each interface. Additionally, curated EnvO and GOLD ecosystem classification terms and other enumerations that can be selected by dropdown menus are available for some packages. Additional packages will be curated as user research continues.
+
+### Sample Metadata
+
+The Biosample Metadata interface consists of a grid in which each row represents one sample and each column represents one attribute of a Biosample. Users are provided with numerous convenience and organizational features (described below) to assist in metadata completion.
+
+NMDC uses sections for clarity and to identify when MIxS specifications have been used as published by the GSC, and when NMDC has modified the description, examples, or validation rules for a MIxS attribute, as captured in the respective columns. These modifications are based on user research and feedback provided to the NMDC.
+
+Biosample metadata can be entered manually (by typing each row), or the data can be entered in bulk by importing a Microsoft Excel XLSX file. The metadata is updated in an NMDC database each time the user navigates across the submission template. Upon completion, the study submission and metadata will be reviewed by a member of the NMDC team and, once approved, the submitting user will indicate when the data are ready to be published to the Data Portal.
+
+#### Metadata Sections
+Detailed biosample metadata input is captured using a curated metadata template. This page allows users to input biosample metadata into standardized fields based on the selected environmental template. The biosample metadata fields are split into sections: Sample ID, which consists of sample and environmental identification information; MIxS, fields that are identical to those provided in MIxS templates; and MIxS (modified) & Inspired, fields that are similar to the MIxS fields but have been altered in some way or were added based on user feedback. These updated and additional fields are meant to provide clearer context and expectations for the submitter to better capture information about their samples.
+
+#### Download and Import
+
+[![](../_static/images/howto_guides/portal_guide/sub_portal_input.png)](../_static/images/howto_guides/portal_guide/sub_portal_input.png)
+
+The NMDC Submission Portal allows users to enter sample metadata directly into the web interface. However, if a submitter prefers to work in other applications and programs, such as Microsoft Excel, the NMDC sample metadata template can be downloaded as a .xlsx file, opened via a separate application where users can add metadata, and imported back into the Submission Portal for completion and validation.
+
+#### Tools and Features
+
+##### Column Information
+
+[![](../_static/images/howto_guides/portal_guide/sub_portal_enviro_package.png)](../_static/images/howto_guides/portal_guide/sub_portal_enviro_package.png)
+
+[![](../_static/images/howto_guides/portal_guide/column_help.png)](../_static/images/howto_guides/portal_guide/column_help.png)
+
+When column headers are double-clicked, or when metadata validation is performed, a column help box will appear. This provides a description of the field, additional guidance, and examples of valid completion.
+
+##### Show/Hide
+
+[![](../_static/images/howto_guides/portal_guide/column_visibility.png)](../_static/images/howto_guides/portal_guide/column_visibility.png)
+
+Users are encouraged to populate as many of the columns as possible, but not all are required or relevant to all sample types or research. To accommodate such needs, the Biosample Metadata interface distinguishes between required, recommended, and optional columns. Which columns appear on the screen can be controlled with the show/hide menu. This tool can be used to hide optional or optional + recommended columns, and the show sub-menu can be used to center a particular section on the user’s screen.
+
+
+##### Jump to Column Search
+
+[![](../_static/images/howto_guides/portal_guide/jump_to_column.png)](../_static/images/howto_guides/portal_guide/jump_to_column.png)
+[![](../_static/images/howto_guides/portal_guide/column_search.png)](../_static/images/howto_guides/portal_guide/column_search.png)
+
+A ‘Jump to column’ feature is available for searching for specific metadata fields. The columns in the ‘Jump to column’ menu are listed in the order they appear on the interface when no visibility constraint has been applied. Users can either scroll through the listed columns or type in any portion of a column's name. For example, as shown above, users can search for the term ‘carbon’ in order to find the ‘carbon/nitrogen ratio’ column. Many slots are available for sample metadata completion, but not all are required or relevant depending on your research questions. The ‘Jump to column’ feature allows the submitter to find the attributes they need and those relevant to their samples.
+
+##### Real Time Validation
+
+[![](../_static/images/howto_guides/portal_guide/validate.png)](../_static/images/howto_guides/portal_guide/validate.png)
+
+The real-time validation tool allows submitters to check their filled-in metadata and overall progress as they submit values to ensure the submission will be valid and adhere to the NMDC schema.
+
+##### Color Legend
+
+[![](../_static/images/howto_guides/portal_guide/color_legend.png)](../_static/images/howto_guides/portal_guide/color_legend.png)
+
+All fields in the metadata template innately fall into one of three color categories: Grey, Yellow, or Purple. Grey, or no highlighted color, indicates optional fields. Required columns are denoted with yellow. These yellow columns must be correctly completed or the submission will not validate. Recommended fields, highlighted purple, are required where applicable. If any of the purple columns provide information relevant to the study, they should be completed. For example, if samples are from a moisture manipulation study, the column ‘watering regimen’ should be filled in to provide context and information about the samples.
+
+After selecting the validate button, cells will become color coded to indicate invalid and incomplete metadata. Dark pink cells indicate a required cell has been left empty. Light pink indicates that there is an error in the formatting of the information entered into that cell. The ‘Column Information’ tool, described above, provides expected structure patterns and examples of valid metadata in order to help remedy errors and invalid inputs.
diff --git a/content/nmdc/src/index.rst b/content/nmdc/src/index.rst
new file mode 100644
index 0000000..a61372b
--- /dev/null
+++ b/content/nmdc/src/index.rst
@@ -0,0 +1,45 @@
+
+NMDC Documentation
+==================
+
+.. toctree::
+ :maxdepth: 2
+ :caption: NMDC Overview:
+
+ overview/nmdc_overview.rst
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Tutorials:
+
+ Submitting to NMDC <./tutorials/submission_portal.md>
+ Running the Workflows <./tutorials/run_workflows>
+ Navigating the Data Portal <./tutorials/nav_data_portal>
+
+.. toctree::
+ :maxdepth: 2
+ :caption: How-To Guides:
+
+ Creating a Data Mgt. Plan <./howto_guides/data_plan.md>
+ Submitting to NMDC <./howto_guides/submit2nmdc.md>
+ Running the Workflows <./howto_guides/run_workflows.md>
+ Navigating the Data Portal <./howto_guides/portal_guide.md>
+ Using the NMDC API GUI <./howto_guides/api_gui.md>
+ Downloading Data via Globus <./howto_guides/globus.md>
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Reference:
+
+ NMDC Schema
+ NMDC Workflows
+ NMDC Data Portal <./reference/data_portal.md>
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Explanation:
+
+ FAIR Data <./explanation/fair_data.md>
+ IDEA <./explanation/idea.md>
+ Community Conversations <./explanation/community_conversations.md>
+ Publications <./explanation/publications.md>
diff --git a/content/nmdc/src/overview/nmdc_overview.rst b/content/nmdc/src/overview/nmdc_overview.rst
new file mode 100644
index 0000000..9b0e8f6
--- /dev/null
+++ b/content/nmdc/src/overview/nmdc_overview.rst
@@ -0,0 +1,21 @@
+Advancing microbiome science together: Welcome to the NMDC
+==========================================================
+
+We have built the National Microbiome Data Collaborative (NMDC) to advance how scientists create, use, and reuse data to redefine the way we understand and harness the power of microbes. The three core infrastructure elements of the NMDC framework are: (1) the `Submission Portal `_ to support collection of standardized study and biosample information; (2) `NMDC EDGE `_, an intuitive user interface to access standardized bioinformatics workflows; and (3) the `Data Portal `_, a resource for consistently processed and integrated multi-omics data enabling search, access, and download [2]. Our engagement strategy includes partnerships with complementary data resources like DOE’s Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (`ESS-DIVE `_) and DOE’s Systems Biology Knowledgebase (`KBase `_); partnerships with DOE User Facilities, the Joint Genome Institute (`JGI `_) and the Environmental Molecular Sciences Laboratory (`EMSL `_); coordinating with interagency programs outside of the DOE ecosystem such as NSF’s National Ecological Observatory Network (`NEON `_); and development of our flagship engagement programs, the NMDC `Ambassadors `_ and `Champions `_. This community-centric framework leverages unique capabilities, expertise, and resources available at the Department of Energy National Laboratories to create an enabling environment for findable, accessible, interoperable, and reusable (FAIR) multi-omics microbiome data.
+
+
+
+About our Documentation
+=======================
+
+To support a systematic approach to our technical documentation, we have adopted the `Diátaxis framework `_, which identifies four distinct forms of documentation: tutorials, how-to guides, technical reference, and explanation. We aim for our documentation to be accessible to a broad audience, so whether you are new to microbiome research or a leading scientist in the field, we welcome everyone to learn more from our:
+
+1. Tutorials (Learning-oriented): get started through short, hands-on activities
+2. How-To Guides (Task-oriented): step-by-step instructions for using NMDC's products and resources
+3. Explanation (Understanding-oriented): background information covering a wide range of topics
+4. Reference (Information-oriented): in-depth learning materials
+
+Let us know if you have suggestions to further support your research by `reaching out to us `_!
+
+.. image:: ../_static/images/overview/diataxis_documentation_graphic.png
+ :width: 1000
diff --git a/content/nmdc/src/reference/combined_workflow_docs.rst b/content/nmdc/src/reference/combined_workflow_docs.rst
new file mode 100644
index 0000000..04a8122
--- /dev/null
+++ b/content/nmdc/src/reference/combined_workflow_docs.rst
@@ -0,0 +1,1504 @@
+****************************
+NMDC Workflow Documentation
+****************************
+
+
+
+Overview
+==================
+
+NMDC
+----
+The National Microbiome Data Collaborative (NMDC) is a new initiative, launched in July 2019 and funded by the Department of Energy’s (DOE) Office of Science, Biological and Environmental Research program, that aims to empower the research community to more effectively harness microbiome data. The NMDC is building an open-source, integrated data science ecosystem aimed at leveraging existing efforts in the microbiome research space, including data standards, quality, integration, and access, to create a linked data discovery portal. Read the `Nature Reviews Microbiology Comment `_ on the NMDC or visit the `NMDC website `_.
+
+Four national laboratories are working together to produce the NMDC:
+
+ - Lawrence Berkeley National Laboratory
+ - Los Alamos National Laboratory
+ - Oak Ridge National Laboratory
+ - Pacific Northwest National Laboratory
+
+
+NMDC Workflows
+--------------
+
+General Guidelines
+--------------------
+
+NMDC aims to integrate existing open-source bioinformatics tools into standardized workflows for processing raw multi-omics data to produce interoperable and reusable annotated data products. Any commercial software is an optional alternative and is not required.
+
+Execution Environment
+---------------------
+
+Two common ways to install and run the NMDC workflows:
+
+ - Native installation
+ - Containers
+
+The NMDC workflows have been written in WDL and require a WDL-capable Workflow Execution Tool (e.g., Cromwell). To ease native installation, Docker images have been created for the third-party tools of all the workflows. The workflows use the corresponding Docker images to run the required third-party tools. For most of the workflows, databases must also be downloaded and installed.
+
+
+The NMDC workflows are also available as a web application called `NMDC EDGE `_. The application integrates the NMDC workflows into an updated framework for `EDGE Bioinformatics `_; this provides the workflows, third-party software, and requisite databases within a platform with a user-friendly interface. NMDC EDGE is provided as a web application especially for users who are not comfortable running command-line tools or who lack the computational resources to run the command-line/Docker versions.
+
+
+
+Reads QC Workflow (v1.0.2)
+=============================
+
+.. image:: ../_static/images/reference/workflows/1_RQC_rqc_workflow.png
+ :align: center
+ :scale: 50%
+
+
+Workflow Overview
+-----------------
+
+This workflow utilizes the program “rqcfilter2” from BBTools to perform quality control on raw Illumina reads. The workflow performs quality trimming, artifact removal, linker trimming, adapter trimming, and spike-in removal (using BBDuk), and performs human/cat/dog/mouse/microbe removal (using BBMap).
+
+The following parameters are used for "rqcfilter2" in this workflow::
+
+ - qtrim=r : Quality-trim from right ends before mapping.
+ - trimq=0 : Trim quality threshold.
+ - maxns=3 : Reads with more Ns than this will be discarded.
+ - maq=3 : Reads with average quality (before trimming) below this will be discarded.
+ - minlen=51 : Reads shorter than this after trimming will be discarded. Pairs will be discarded only if both are shorter.
+ - mlf=0.33 : Reads shorter than this fraction of original length after trimming will be discarded.
+ - phix=true : Remove reads containing phiX kmers.
+ - khist=true : Generate a kmer-frequency histogram of the output data.
+ - kapa=true : Remove and quantify kapa tag
+ - trimpolyg=5 : Trim reads that start or end with a G polymer at least this long
+ - clumpify=true : Run clumpify; all deduplication flags require this.
+ - removehuman=true : Remove human reads via mapping.
+ - removedog=true : Remove dog reads via mapping.
+ - removecat=true : Remove cat reads via mapping.
+ - removemouse=true : Remove mouse reads via mapping.
+ - barcodefilter=false : Disable improper barcodes filter
+ - chastityfilter=false: Remove illumina reads failing chastity filter.
+ - trimfragadapter=true: Trim all known Illumina adapter sequences, including TruSeq and Nextera.
+ - removemicrobes=true : Remove common contaminant microbial reads via mapping, and place them in a separate file.
+
+
+Workflow Availability
+---------------------
+
+The workflow from GitHub uses the Docker image listed here to run all
+third-party tools. The workflow is available in GitHub:
+https://github.com/microbiomedata/ReadsQC; the corresponding
+Docker image is available in DockerHub: https://hub.docker.com/r/microbiomedata/bbtools.
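+
+If desired, the image can be pulled ahead of time; the command below uses
+the default (latest) tag, since published tags may vary (check DockerHub
+for the available tags)::
+
+    docker pull microbiomedata/bbtools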
+
+Requirements for Execution
+--------------------------
+
+(recommendations are in **bold**)
+
+- WDL-capable Workflow Execution Tool (**Cromwell**)
+- Container Runtime that can load Docker images (**Docker v2.1.0.3 or higher**)
+
+Hardware Requirements
+---------------------
+
+- Disk space: 106 GB for the RQCFilterData database
+- Memory: >40 GB RAM
+
+
+Workflow Dependencies
+---------------------
+
+Third party software (This is included in the Docker image.)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- `BBTools v38.96 `_ (License: `BSD-3-Clause-LBNL `_)
+
+Requisite database
+~~~~~~~~~~~~~~~~~~
+
+The RQCFilterData Database must be downloaded and installed. This is a 106 GB tar file which includes reference datasets of artifacts, adapters, contaminants, the phiX genome, and some host genomes.
+
+The following commands will download the database::
+
+ mkdir refdata
+ wget http://portal.nersc.gov/dna/microbial/assembly/bushnell/RQCFilterData.tar
+ tar -xvf RQCFilterData.tar -C refdata
+ rm RQCFilterData.tar
+
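+As an optional sanity check, the extracted database should occupy roughly 106 GB::
+
+    du -sh refdata
+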
+Sample dataset(s)
+-----------------
+
+- small dataset: `Ecoli 10x `_. You can find input/output in the downloaded tar gz file.
+
+- large dataset: Zymobiomics mock-community DNA control (`SRR7877884 `_); the `original gzipped dataset `_ is ~5.7 GB. You can find input/output in the downloaded tar gz file.
+
+
+.. note::
+
+ If the input data is paired-end data, it must be in interleaved format. The following command will interleave the files, using the above dataset as an example:
+
+.. code-block:: bash
+
+ paste <(zcat SRR7877884_1.fastq.gz | paste - - - -) <(zcat SRR7877884_2.fastq.gz | paste - - - -) | tr '\t' '\n' | gzip -c > SRR7877884-int.fastq.gz
+
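+Optionally, proper interleaving can be checked with BBTools reformat.sh before running the workflow (the flag below is taken from the reformat.sh help text; verify it against your BBTools version)::
+
+    reformat.sh in=SRR7877884-int.fastq.gz verifyinterleaved=t
+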
+For testing purposes and for the following examples, we used a 10% sub-sampling of the above dataset: `SRR7877884-int-0.1.fastq.gz `_. This dataset is already interleaved.
+
+Inputs
+------
+
+A JSON file containing the following information:
+
+1. the path to the database
+2. the path to the interleaved fastq file (input data)
+3. the path to the output directory
+4. input_interleaved (boolean)
+5. forward reads fastq file (when input_interleaved is false)
+6. reverse reads fastq file (when input_interleaved is false)
+7. (optional) parameters for memory
+8. (optional) number of threads requested
+
+
+An example input JSON file is shown below:
+
+.. code-block:: JSON
+
+ {
+ "jgi_rqcfilter.database": "/path/to/refdata",
+ "jgi_rqcfilter.input_files": [
+ "/path/to/SRR7877884-int-0.1.fastq.gz "
+ ],
+ "jgi_rqcfilter.input_interleaved": true,
+ "jgi_rqcfilter.input_fq1":[],
+ "jgi_rqcfilter.input_fq2":[],
+ "jgi_rqcfilter.outdir": "/path/to/rqcfiltered",
+ "jgi_rqcfilter.memory": "35G",
+ "jgi_rqcfilter.threads": "16"
+ }
+
+.. note::
+
+   In an HPC environment, multiple samples can be processed in parallel. The "jgi_rqcfilter.input_files" parameter is an array, so several input files can be listed, separated by commas.
+   Ex: "jgi_rqcfilter.input_files": ["first-int.fastq", "second-int.fastq"]
+
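+The workflow can then be launched with Cromwell along the following lines (the WDL file name is an assumption; see the GitHub repository for the actual entry point)::
+
+    java -jar cromwell.jar run rqcfilter.wdl -i input.json
+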
+
+Output
+------
+
+A directory named with the prefix of the FASTQ input file will be created, and multiple output files are generated; the main QC FASTQ output is named prefix.anqdpht.fastq.gz. Using the dataset above as an example, the main output would be named SRR7877884-int-0.1.anqdpht.fastq.gz. Other files include statistics on the quality of the data and on what was trimmed, detected, and filtered; a status log; and a shell script documenting the steps implemented so the workflow can be reproduced.
+
+An example output JSON file (filterStats.json) is shown below:
+
+.. code-block:: JSON
+
+ {
+ "inputReads": 331126,
+ "kfilteredBases": 138732,
+ "qfilteredReads": 0,
+ "ktrimmedReads": 478,
+ "outputBases": 1680724,
+ "ktrimmedBases": 25248,
+ "kfilteredReads": 926,
+ "qtrimmedBases": 0,
+ "outputReads": 11212,
+ "gcPolymerRatio": 0.182857,
+ "inputBases": 50000026,
+ "qtrimmedReads": 0,
+ "qfilteredBases": 0
+ }
+
+
+Below is an example of all the output directory files with descriptions to the right.
+
+==================================== ============================================================================
+FileName                             Description
+==================================== ============================================================================
+SRR7877884-int-0.1.anqdpht.fastq.gz  main output (clean data)
+adaptersDetected.fa                  adapters detected and removed
+bhist.txt                            base composition histogram by position
+cardinality.txt                      estimation of the number of unique kmers
+commonMicrobes.txt                   detected common microbes
+file-list.txt                        output file list for rqcfilter2.sh
+filterStats.txt                      summary statistics
+filterStats.json                     summary statistics in JSON format
+filterStats2.txt                     more detailed summary statistics
+gchist.txt                           GC content histogram
+human.fq.gz                          detected human sequence reads
+ihist_merge.txt                      insert size histogram
+khist.txt                            kmer-frequency histogram
+kmerStats1.txt                       synthetic molecule (phix, linker, lambda, pJET) filter run log
+kmerStats2.txt                       synthetic molecule (short contamination) filter run log
+ktrim_kmerStats1.txt                 detected adapters filter run log
+ktrim_scaffoldStats1.txt             detected adapters filter statistics
+microbes.fq.gz                       detected common microbes sequence reads
+microbesUsed.txt                     common microbes list for detection
+peaks.txt                            number of unique kmers in each peak on the histogram
+phist.txt                            polymer length histogram
+refStats.txt                         human reads filter statistics
+reproduce.sh                         the shell script to reproduce the run
+scaffoldStats1.txt                   detected synthetic molecule (phix, linker, lambda, pJET) statistics
+scaffoldStats2.txt                   detected synthetic molecule (short contamination) statistics
+scaffoldStatsSpikein.txt             detected spike-in kapa tag statistics
+sketch.txt                           mash-type sketch scan results against nt, RefSeq, and Silva database sketches
+spikein.fq.gz                        detected spike-in kapa tag sequence reads
+status.log                           rqcfilter2.sh running log
+synth1.fq.gz                         detected synthetic molecule (phix, linker, lambda, pJET) sequence reads
+synth2.fq.gz                         detected synthetic molecule (short contamination) sequence reads
+==================================== ============================================================================
+
+
+Version History
+---------------
+
+- 1.0.2 (release date **04/09/2021**; previous versions: 1.0.1)
+
+
+Point of contact
+----------------
+
+- Original author: Brian Bushnell
+
+- Package maintainer: Chienchi Lo
+
+
+
+
+The Read-based Taxonomy Classification (v1.0.1)
+================================================
+
+.. image:: ../_static/images/reference/workflows/2_ReadAnalysis_readbased_analysis_workflow.png
+ :align: center
+ :scale: 50%
+
+Workflow Overview
+-----------------
+The pipeline takes in sequencing files (single- or paired-end) and profiles them using multiple taxonomic classification tools, with Cromwell as the workflow manager.
+
+Workflow Availability
+---------------------
+The workflow is available in GitHub: https://github.com/microbiomedata/ReadbasedAnalysis; the corresponding Docker image is available in DockerHub: https://hub.docker.com/r/microbiomedata/nmdc_taxa_profilers
+
+Requirements for Execution:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+(recommendations are in **bold**)
+
+- WDL-capable Workflow Execution Tool (**Cromwell**)
+- Container Runtime that can load Docker images (**Docker v2.1.0.3 or higher**)
+
+Hardware Requirements:
+~~~~~~~~~~~~~~~~~~~~~~
+- Disk space: 152 GB for databases (55 GB, 89 GB, and 8 GB for GOTTCHA2, Kraken2 and Centrifuge databases, respectively)
+- 60 GB RAM
+
+Workflow Dependencies
+---------------------
+
+Third party software:
+~~~~~~~~~~~~~~~~~~~~~
+
+(These are included in the Docker image.)
+
+- `GOTTCHA2 v2.1.6 `_ (License: `BSD-3-Clause-LANL `_)
+- `Kraken2 v2.0.8 `_ (License: `MIT `_)
+- `Centrifuge v1.0.4 `_ (License: `GPL-3 `_)
+
+Requisite databases:
+~~~~~~~~~~~~~~~~~~~~
+
+The database for each tool must be downloaded and installed. These databases total 152 GB.
+
+- GOTTCHA2 database (gottcha2/):
+
+The database RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna contains complete genomes of bacteria, archaea, and viruses from RefSeq Release 90. The following commands will download the database:
+
+::
+
+ wget https://edge-dl.lanl.gov/GOTTCHA2/RefSeq-r90.cg.BacteriaArchaeaViruses.species.tar
+ tar -xvf RefSeq-r90.cg.BacteriaArchaeaViruses.species.tar
+ rm RefSeq-r90.cg.BacteriaArchaeaViruses.species.tar
+
+- Kraken2 database (kraken2/):
+
+This is a standard Kraken 2 database, built from NCBI RefSeq genomes. The following commands will download the database:
+
+::
+
+ mkdir kraken2
+ wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20201202.tar.gz
+ tar -xzvf k2_standard_20201202.tar.gz -C kraken2
+ rm k2_standard_20201202.tar.gz
+
+- Centrifuge database (centrifuge/):
+
+This is a compressed database built from RefSeq genomes of Bacteria and Archaea. The following commands will download the database:
+
+::
+
+ mkdir centrifuge
+ wget https://genome-idx.s3.amazonaws.com/centrifuge/p_compressed_2018_4_15.tar.gz
+ tar -xzvf p_compressed_2018_4_15.tar.gz -C centrifuge
+ rm p_compressed_2018_4_15.tar.gz
+
+
+Sample dataset(s):
+~~~~~~~~~~~~~~~~~~
+
+Zymobiomics mock-community DNA control (SRR7877884); this dataset is ~7 GB.
+
+Input:
+~~~~~~
+
+A JSON file containing the following information:
+
+1. selection of profiling tools (set as true if selected)
+2. the paths to the required database(s) for the tools selected
+3. the paths to the input fastq file(s) (paired-end data is shown; this can be the output of the Reads QC workflow in interleaved format, which will be treated as single-end data)
+4. the prefix for the output file names
+5. the path of the output directory
+6. the number of CPUs requested for the run
+
+.. code-block:: JSON
+
+ {
+ "ReadbasedAnalysis.enabled_tools": {
+ "gottcha2": true,
+ "kraken2": true,
+ "centrifuge": true
+ },
+ "ReadbasedAnalysis.db": {
+ "gottcha2": "/path/to/database/RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna",
+ "kraken2": " /path/to/kraken2",
+ "centrifuge": "/path/to/centrifuge/p_compressed"
+ },
+ "ReadbasedAnalysis.reads": [
+ "/path/to/SRR7877884.1.fastq.gz",
+ "/path/to/SRR7877884.2.fastq.gz"
+ ],
+ "ReadbasedAnalysis.paired": true,
+ "ReadbasedAnalysis.prefix": "SRR7877884",
+ "ReadbasedAnalysis.outdir": "/path/to/ReadbasedAnalysis",
+ "ReadbasedAnalysis.cpu": 4
+ }
+
+Output:
+~~~~~~~
+
+The workflow creates an output JSON file and individual output sub-directories for each tool, which include tabular classification results, a tabular report, and a Krona plot (html)::
+
+ ReadbasedAnalysis/
+ |-- SRR7877884.json
+ |-- centrifuge
+ | |-- SRR7877884.classification.tsv
+ | |-- SRR7877884.report.tsv
+ | `-- SRR7877884.krona.html
+ |
+ |-- gottcha2
+ | |-- SRR7877884.full.tsv
+ | |-- SRR7877884.krona.html
+ | `-- SRR7877884.tsv
+ |
+ `-- kraken2
+ |-- SRR7877884.classification.tsv
+ |-- SRR7877884.krona.html
+ `-- SRR7877884.report.tsv
+
+
+Below is an example of the output directory files with descriptions to the right.
+
+======================================== ==============================================
+FileName                                 Description
+---------------------------------------- ----------------------------------------------
+SRR7877884.json                          ReadbasedAnalysis result JSON file
+centrifuge/SRR7877884.classification.tsv Centrifuge output read classification TSV file
+centrifuge/SRR7877884.report.tsv         Centrifuge output report TSV file
+centrifuge/SRR7877884.krona.html         Centrifuge krona plot HTML file
+gottcha2/SRR7877884.full.tsv             GOTTCHA2 detail output TSV file
+gottcha2/SRR7877884.tsv                  GOTTCHA2 output report TSV file
+gottcha2/SRR7877884.krona.html           GOTTCHA2 krona plot HTML file
+kraken2/SRR7877884.classification.tsv    Kraken2 output read classification TSV file
+kraken2/SRR7877884.report.tsv            Kraken2 output report TSV file
+kraken2/SRR7877884.krona.html            Kraken2 krona plot HTML file
+======================================== ==============================================
+
+
+Version History
+---------------
+
+- 1.0.1 (release date **01/14/2021**; previous versions: 1.0.0)
+
+Point of contact
+----------------
+
+- Package maintainer: Po-E Li
+
+
+
+Metagenome Assembly Workflow (v1.0.2)
+========================================
+
+.. image:: ../_static/images/reference/workflows/3_MetaGAssemly_workflow_assembly.png
+ :scale: 60%
+ :alt: Metagenome assembly workflow dependencies
+
+Workflow Overview
+-----------------
+
+This workflow takes in paired-end Illumina reads in interleaved format and performs error correction, then reformats the interleaved file into two FASTQ files for downstream tasks using bbcms (BBTools). The corrected reads are assembled using metaSPAdes. After assembly, the reads are mapped back to the contigs by bbmap (BBTools) for coverage information. The .wdl (Workflow Description Language) file includes five tasks: *bbcms*, *assy*, *create_agp*, *read_mapping_pairs*, and *make_output*.
+
+1. The *bbcms* task takes in interleaved FASTQ inputs, performs error correction, and reformats the interleaved fastq into two output FASTQ files of paired-end reads for the next tasks.
+2. The *assy* task performs the metaSPAdes assembly.
+3. Contigs and scaffolds (output of metaSPAdes) are consumed by the *create_agp* task to rename the FASTA headers and generate an `AGP format `_ file, which describes the assembly.
+4. The *read_mapping_pairs* task maps reads back to the final assembly to generate coverage information.
+5. The final *make_output* task adds all output files into the specified directory.
+
+Workflow Availability
+---------------------
+
+The workflow from GitHub uses all the listed Docker images to run all third-party tools.
+The workflow is available in GitHub: https://github.com/microbiomedata/metaAssembly; the corresponding Docker images are available in DockerHub: https://hub.docker.com/r/microbiomedata/spades and https://hub.docker.com/r/microbiomedata/bbtools
+
+Requirements for Execution
+--------------------------
+
+(recommendations are in **bold**)
+
+- WDL-capable Workflow Execution Tool (**Cromwell**)
+- Container Runtime that can load Docker images (**Docker v2.1.0.3 or higher**)
+
+Hardware Requirements
+---------------------
+
+- Memory: >40 GB RAM
+
+The memory requirement depends on the input complexity. Here is a simple estimation equation for the memory required based on kmers in the input file::
+
+ predicted_mem = (kmers * 2.962e-08 + 1.630e+01) * 1.1 (in GB)
+
+.. note::
+
+   The kmers variable for the equation above can be obtained using the kmercountmulti.sh script from BBTools::
+
+      kmercountmulti.sh -k=31 in=your.read.fq.gz
+
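+As a worked example (the kmer count below is made up), the estimate can be computed directly in the shell::
+
+    kmers=1200000000
+    awk -v k="$kmers" 'BEGIN { printf "predicted_mem = %.1f GB\n", (k * 2.962e-08 + 1.630e+01) * 1.1 }'
+    # -> predicted_mem = 57.0 GB
+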
+
+Workflow Dependencies
+---------------------
+
+Third party software: (This is included in the Docker image.)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- `metaSPAdes v3.15.0 `_ (License: `GPLv2 `_)
+- `BBTools:38.94 `_ (License: `BSD-3-Clause-LBNL `_)
+
+Sample dataset(s)
+-----------------
+
+- small dataset: `Ecoli 10x (287M) `_. You can find input/output in the downloaded tar gz file.
+
+- large dataset: `Zymobiomics mock-community DNA control (22G) `_. You can find input/output in the downloaded tar gz file.
+
+Zymobiomics mock-community DNA control (`SRR7877884 `_); this original dataset is ~4 GB.
+
+For testing purposes and for the following examples, we used a 10% sub-sampling of the above dataset: (`SRR7877884-int-0.1.fastq.gz `_). This dataset is already interleaved.
+
+
+Input
+-----
+
+A JSON file containing the following information:
+
+1. the path to the input FASTQ file (Illumina paired-end interleaved FASTQ; the output of the Reads QC workflow is recommended)
+2. the contig prefix for the FASTA header
+3. the output path
+4. input_interleaved (boolean)
+5. forward reads fastq file (required value when input_interleaved is false, otherwise use [] )
+6. reverse reads fastq file (required value when input_interleaved is false, otherwise use [] )
+7. memory (optional) ex: "jgi_metaASM.memory": "105G"
+8. threads (optional) ex: "jgi_metaASM.threads": "16"
+
+An example input JSON file is shown below::
+
+ {
+ "jgi_metaASM.input_file":["/path/to/SRR7877884-int-0.1.fastq.gz "],
+ "jgi_metaASM.rename_contig_prefix":"projectID",
+ "jgi_metaASM.outdir":"/path/to/ SRR7877884-int-0.1_assembly",
+ "jgi_metaASM.input_interleaved":true,
+ "jgi_metaASM.input_fq1":[],
+ "jgi_metaASM.input_fq2":[],
+ "jgi_metaASM.memory": "105G",
+ "jgi_metaASM.threads": "16"
+ }
+
+Output
+------
+
+The output directory will contain the following files::
+
+
+ output/
+ ├── assembly.agp
+ ├── assembly_contigs.fna
+ ├── assembly_scaffolds.fna
+ ├── covstats.txt
+ ├── pairedMapped.sam.gz
+ ├── pairedMapped_sorted.bam
+ └── stats.json
+
+Part of an example output stats JSON file is shown below:
+
+.. code-block:: JSON
+
+    {
+      "scaffolds": 58,
+      "contigs": 58,
+      "scaf_bp": 28406,
+      "contig_bp": 28406,
+      "gap_pct": 0.00000,
+      "scaf_N50": 21,
+      "scaf_L50": 536,
+      "ctg_N50": 21,
+      "ctg_L50": 536,
+      "scaf_N90": 49,
+      "scaf_L90": 317,
+      "ctg_N90": 49,
+      "ctg_L90": 317,
+      "scaf_logsum": 22.158,
+      "scaf_powsum": 2.245,
+      "ctg_logsum": 22.158,
+      "ctg_powsum": 2.245,
+      "asm_score": 0.000,
+      "scaf_max": 1117,
+      "ctg_max": 1117,
+      "scaf_n_gt50K": 0,
+      "scaf_l_gt50K": 0,
+      "scaf_pct_gt50K": 0.0,
+      "gc_avg": 0.39129,
+      "gc_std": 0.03033,
+      "filename": "/global/cfs/cdirs/m3408/aim2/metagenome/assembly/cromwell-executions/jgi_metaASM/3342a6e8-7f78-40e6-a831-364dd2a47baa/call-create_agp/execution/assembly_scaffolds.fna"
+    }
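+
+If jq is available, headline metrics can be pulled out of this file directly (the path below assumes the output layout shown above)::
+
+    jq '{contigs, ctg_N50, ctg_L50, gc_avg}' output/stats.json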
+
+
+The table provides all of the output directories, files, and their descriptions.
+
+=================================================== ================================= ===============================================================
+Directory                                           File Name                         Description
+=================================================== ================================= ===============================================================
+**bbcms**                                                                             Error correction result directory
+bbcms/berkeleylab-jgi-meta-60ade422cd4e                                               directory containing checking resource script
+bbcms/                                              counts.metadata.json              bbcms commands and summary statistics in JSON format
+bbcms/                                              input.corr.fastq.gz               error corrected reads in interleaved format
+bbcms/                                              input.corr.left.fastq.gz          error corrected forward reads
+bbcms/                                              input.corr.right.fastq.gz         error corrected reverse reads
+bbcms/                                              rc                                cromwell script submit return code
+bbcms/                                              readlen.txt                       error corrected reads statistics
+bbcms/                                              resources.log                     resource checking log
+bbcms/                                              script                            Task run commands
+bbcms/                                              script.background                 Bash script to run script.submit
+bbcms/                                              script.submit                     cromwell submit commands
+bbcms/                                              stderr                            standard error where task writes error message to
+bbcms/                                              stderr.background                 standard error where bash script writes error message to
+bbcms/                                              stderr.log                        standard error from bbcms command
+bbcms/                                              stdout                            standard output where task writes error message to
+bbcms/                                              stdout.background                 standard output where bash script writes error message(s)
+bbcms/                                              unique31mer.txt                   the count of unique kmers, K=31
+**spades3**                                                                           metaSPAdes assembly result directory
+spades3/K33                                                                           directory containing intermediate files from the run with K=33
+spades3/K55                                                                           directory containing intermediate files from the run with K=55
+spades3/K77                                                                           directory containing intermediate files from the run with K=77
+spades3/K99                                                                           directory containing intermediate files from the run with K=99
+spades3/K127                                                                          directory containing intermediate files from the run with K=127
+spades3/misc                                                                          directory containing miscellaneous files
+spades3/tmp                                                                           directory for temp files
+spades3/                                            assembly_graph.fastg              metaSPAdes assembly graph in FASTG format
+spades3/                                            assembly_graph_with_scaffolds.gfa metaSPAdes assembly graph and scaffolds paths in GFA 1.0 format
+spades3/                                            before_rr.fasta                   contigs before repeat resolution
+spades3/                                            contigs.fasta                     metaSPAdes resulting contigs
+spades3/                                            contigs.paths                     paths in the assembly graph corresponding to contigs.fasta
+spades3/                                            dataset.info                      internal configuration file
+spades3/                                            first_pe_contigs.fasta            preliminary contigs of iterative kmers assembly
+spades3/                                            input_dataset.yaml                internal YAML data set file
+spades3/                                            params.txt                        information about SPAdes parameters in this run
+spades3/                                            scaffolds.fasta                   metaSPAdes resulting scaffolds
+spades3/                                            scaffolds.paths                   paths in the assembly graph corresponding to scaffolds.fasta
+spades3/                                            spades.log                        metaSPAdes log
+**final_assembly**                                                                    create_agp task result directory
+final_assembly/berkeleylab-jgi-meta-60ade422cd4e                                      directory containing checking resource script
+final_assembly/                                     assembly.agp                      an AGP format file describing the assembly
+final_assembly/                                     assembly_contigs.fna              Final assembly contig fasta
+final_assembly/                                     assembly_scaffolds.fna            Final assembly scaffolds fasta
+final_assembly/                                     assembly_scaffolds.legend         name mapping file from spades node name to new name
+final_assembly/                                     rc                                cromwell script submit return code
+final_assembly/                                     resources.log                     resource checking log
+final_assembly/                                     script                            Task run commands
+final_assembly/                                     script.background                 Bash script to run script.submit
+final_assembly/                                     script.submit                     cromwell submit commands
+final_assembly/                                     stats.json                        assembly statistics in json format
+final_assembly/                                     stderr                            standard error where task writes error message to
+final_assembly/                                     stderr.background                 standard error where bash script writes error message to
+final_assembly/                                     stdout                            standard output where task writes error message to
+final_assembly/                                     stdout.background                 standard output where bash script writes error message to
+**mapping**                                                                           maps reads back to the final assembly result directory
+mapping/                                            covstats.txt                      contigs coverage information
+mapping/                                            mapping_stats.txt                 contigs coverage information (same as covstats.txt)
+mapping/                                            pairedMapped.bam                  reads mapping back to the final assembly bam file
+mapping/                                            pairedMapped.sam.gz               reads mapping back to the final assembly sam.gz file
+mapping/                                            pairedMapped_sorted.bam           reads mapping back to the final assembly sorted bam file
+mapping/                                            pairedMapped_sorted.bam.bai       reads mapping back to the final assembly sorted bam index file
+mapping/                                            rc                                cromwell script submit return code
+mapping/                                            resources.log                     resource checking log
+mapping/                                            script                            Task run commands
+mapping/                                            script.background                 Bash script to run script.submit
+mapping/                                            script.submit                     cromwell submit commands
+mapping/                                            stderr                            standard error where task writes error message to
+mapping/                                            stderr.background                 standard error where bash script writes error message to
+mapping/                                            stdout                            standard output where task writes error message to
+mapping/                                            stdout.background                 standard output where bash script writes error message to
+=================================================== ================================= ===============================================================
+
+Version History
+---------------
+
+- 1.0.2 (release date **03/12/2021**; previous versions: 1.0.1)
+
+Point of contact
+----------------
+
+- Original author: Brian Foster
+
+- Package maintainer: Chienchi Lo
+
+
+
+Metagenome Annotation Workflow (v1.0.0)
+=======================================
+
+.. image:: ../_static/images/reference/workflows/4_MetaGAnnotation_annotation.png
+
+Workflow Overview
+-----------------
+This workflow takes assembled metagenomes and generates structural and functional annotations, using a number of open-source tools and databases.
+
+The input assembly is first divided into 10 MB splits; depending on the workflow engine configuration, the splits can be processed in parallel. Each split is first structurally annotated, then those results are used for the functional annotation. The structural annotation uses tRNAscan-SE, RFAM, CRT, Prodigal, and GeneMarkS. These results are merged to create a consensus structural annotation. The resulting GFF is the input for functional annotation, which uses multiple protein family databases (SMART, COG, TIGRFAM, SUPERFAMILY, Pfam, and Cath-FunFam) along with custom HMM models. The functional predictions are created using Last and HMMER. These annotations are also merged into a consensus GFF file. Finally, the respective split annotations are merged together to generate a single structural annotation file and a single functional annotation file. In addition, several summary files are generated in TSV format.
+
+
+Workflow Availability
+---------------------
+The workflow is available in GitHub: https://github.com/microbiomedata/mg_annotation/ and the corresponding Docker image is available in DockerHub: https://hub.docker.com/r/microbiomedata/mg-annotation.
+
+Requirements for Execution (recommendations are in bold):
+---------------------------------------------------------
+
+- WDL-capable Workflow Execution Tool **(Cromwell)**
+- Container Runtime that can load Docker images **(Docker v2.1.0.3 or higher)**
+
+Hardware Requirements:
+----------------------
+- Disk space: 106 GB for the reference databases
+- Memory: >100 GB RAM
+
+
+Workflow Dependencies
+---------------------
+
+- Third party software (This is included in the Docker image.)
+ - Conda (3-clause BSD)
+ - tRNAscan-SE >= 2.0 (GNU GPL v3)
+ - Infernal 1.1.2 (BSD)
+ - CRT-CLI 1.8 (Public domain software, last official version is 1.2)
+ - Prodigal 2.6.3 (GNU GPL v3)
+ - GeneMarkS-2 >= 1.07 (Academic license for GeneMark family software)
+ - Last >= 983 (GNU GPL v3)
+ - HMMER 3.1b2 (3-clause BSD)
+ - TMHMM 2.0 (Academic)
+- Requisite databases: The databases are available by request. Please contact NMDC (support@microbiomedata.org) for access.
+
+
+Sample datasets
+---------------
+https://raw.githubusercontent.com/microbiomedata/mg_annotation/master/example.fasta
+
+
+**Input:** A JSON file containing the following:
+
+1. The path to the assembled contigs fasta file
+2. The ID to associate with the result products (e.g. sample ID)
+
+An example JSON file is shown below:
+
+.. code-block:: JSON
+
+    {
+      "annotation.imgap_input_fasta": "/path/to/fasta.fna",
+      "annotation.imgap_project_id": "samp_xyz123"
+    }
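+
+The workflow can then be run with Cromwell; writing out the run metadata makes it easy to locate outputs afterwards (the WDL file name is an assumption; see the GitHub repository for the actual entry point)::
+
+    java -jar cromwell.jar run annotation.wdl -i inputs.json -m metadata.json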
+
+
+
+**Output:** The final structural and functional annotation files are in GFF format and the summary files are in TSV format. The key outputs are listed below but additional files are available.
+
+- GFF: Structural annotation
+- GFF: Functional annotation
+- TSV: KO Summary
+- TSV: EC Summary
+- TSV: Gene Phylogeny Summary
+
+The output paths can be obtained from the output metadata file from the Cromwell execution. Here is a snippet from the outputs section
+of the full metadata JSON file.
+
+.. code-block:: JSON
+
+ {
+ "annotation.cath_funfam_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_cath_funfam.gff",
+ "annotation.cog_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_cog.gff",
+ "annotation.ko_ec_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_ko_ec.gff",
+ "annotation.product_names_tsv": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_product_names.tsv",
+ "annotation.gene_phylogeny_tsv": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_gene_phylogeny.tsv",
+ "annotation.pfam_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_pfam.gff",
+ "annotation.proteins_tigrfam_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.tigrfam.domtblout",
+ "annotation.structural_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_structural_annotation.gff",
+ "annotation.ec_tsv": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_ec.tsv",
+ "annotation.supfam_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_supfam.gff",
+ "annotation.proteins_supfam_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.supfam.domtblout",
+ "annotation.tigrfam_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_tigrfam.gff",
+ "annotation.stats_tsv": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-final_stats/execution/samp_xyz123_structural_annotation_stats.tsv",
+ "annotation.proteins_cog_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.cog.domtblout",
+ "annotation.ko_tsv": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_ko.tsv",
+ "annotation.proteins_pfam_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.pfam.domtblout",
+ "annotation.proteins_smart_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.smart.domtblout",
+ "annotation.crt_crisprs": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_crt.crisprs",
+ "annotation.functional_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_functional_annotation.gff",
+ "annotation.proteins_faa": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123.faa",
+ "annotation.smart_gff": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_smart.gff",
+ "annotation.proteins_cath_funfam_domtblout": "/output/cromwell-executions/annotation/a67a5a0f-1ad7-4469-bb0c-780f4ef20307/call-merge_outputs/execution/samp_xyz123_proteins.cath_funfam.domtblout"
+ }
+
+
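+The full Cromwell metadata JSON nests these paths under its top-level ``outputs`` key, so they can be extracted programmatically. Below is a small sketch using ``jq``; the file name ``metadata.json`` is an assumption:
+
+.. code-block:: bash
+
+    # List every output name and its path, one per line.
+    jq -r '.outputs | to_entries[] | "\(.key)\t\(.value)"' metadata.json
+
+    # Or grab a single file, e.g. the functional annotation GFF.
+    jq -r '.outputs["annotation.functional_gff"]' metadata.json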
+
+**Version History:** 1.0.0 (release date)
+
+Point of contact
+----------------
+
+* Package maintainer: Shane Canon
+
+
+
+
+
+Metagenome Assembled Genomes Workflow (v1.0.4)
+==============================================
+
+.. image:: ../_static/images/reference/workflows/5_MAG_MAG_workflow.png
+ :scale: 40%
+ :alt: Metagenome assembled genomes generation
+
+
+Workflow Overview
+-----------------
+
+
+The workflow is based on the `IMG metagenome binning pipeline `_ and has been modified specifically for the `NMDC project `_. For each processed metagenome, it classifies contigs into bins using MetaBat2. Next, the bins are refined using the functional annotation file (GFF) from the Metagenome Annotation workflow and optional contig lineage information. The completeness of, and the contamination present in, the bins are evaluated by CheckM, and each bin is assigned a quality level (High Quality (HQ), Medium Quality (MQ), or Low Quality (LQ)) based on `MiMAG standards `_. Finally, GTDB-Tk is used to assign a lineage to each HQ and MQ bin.
+
+Workflow Availability
+---------------------
+
+The workflow uses the listed Docker images to run all third-party tools.
+The workflow is available in GitHub: https://github.com/microbiomedata/metaMAGs
+The corresponding Docker image is available in DockerHub: https://hub.docker.com/r/microbiomedata/nmdc_mbin
+
+Requirements for Execution
+--------------------------
+
+(recommendations are in **bold**):
+
+- WDL-capable Workflow Execution Tool (**Cromwell**)
+- Container Runtime that can load Docker images (**Docker v2.1.0.3 or higher**)
+
+Hardware Requirements
+---------------------
+
+- Disk space: > 33 GB for the CheckM and GTDB-Tk databases
+- Memory: ~150 GB for GTDB-Tk
+
+Workflow Dependencies
+---------------------
+
+Third party software (These are included in the Docker image.)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- `Biopython v1.74 `_ (License: `BSD-3-Clause `_)
+- `Sqlite `_ (License: `Public Domain `_)
+- `Pymysql `_ (License: `MIT License `_)
+- `requests `_ (License: `Apache 2.0 `_)
+- `samtools > v1.9 `_ (License: `MIT License `_)
+- `Metabat2 v2.15 `_ (License: `BSD-3-Clause `_)
+- `CheckM v1.1.2 `_ (License: `GPLv3 `_)
+- `GTDB-TK v1.3.0 `_ (License: `GPLv3 `_)
+- `FastANI v1.3 `_ (License: `Apache 2.0 `_)
+- `FastTree v2.1.10 `_ (License: `GPLv2 `_)
+
+
+Requisite databases
+~~~~~~~~~~~~~~~~~~~~~
+
+The GTDB-Tk database must be downloaded and installed. The CheckM database included in the Docker image is a 275 MB file containing the data used for the binned-contig quality assessment. The GTDB-Tk database (27 GB) is used to assign lineages to the binned contigs.
+
+- The following commands will download and unarchive the GTDB-Tk database::
+
+ wget https://data.gtdb.ecogenomic.org/releases/release95/95.0/auxillary_files/gtdbtk_r95_data.tar.gz
+ tar -xvzf gtdbtk_r95_data.tar.gz
+ mv release95 GTDBTK_DB
+ rm gtdbtk_r95_data.tar.gz
+
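+The workflow expects the CheckM and GTDB-Tk databases to live under a single database directory (see item 9 of the Input list below). A minimal sketch of one way to lay that out; the ``/path/to/database`` location is an assumption::
+
+    mkdir -p /path/to/database
+    mv GTDBTK_DB /path/to/database/GTDBTK_DB
+    # The CheckM data is bundled in the Docker image; add a checkM_DB
+    # subdirectory here only if your site supplies it externally.
+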
+Sample dataset(s)
+-----------------
+
+
+The following test datasets include an assembled contigs file, a BAM file, and a functional annotation file:
+
+- small dataset: `LQ only (3.1G) `_. Input and output files are included in the downloaded tar.gz file.
+
+- large dataset: `with HQ and MQ bins (12G) `_. Input and output files are included in the downloaded tar.gz file.
+
+
+
+Input
+-----
+
+A JSON file containing the following:
+
+1. the number of CPUs requested
+2. the number of threads used by pplacer (use a lower number to reduce memory usage)
+3. the path to the output directory
+4. the project name
+5. the path to the Metagenome Assembled Contig fasta file (FNA)
+6. the path to the SAM/BAM file from mapping the reads back to the contigs (SAM.gz or BAM)
+7. the path to the contigs functional annotation result (GFF)
+8. the path to a two-column, tab-delimited text file that maps contig headers between the SAM/BAM file and the GFF file (ID in SAM/FNA to ID in GFF). This map file is needed when the assembly and annotation used different contig identifiers; it links the GFF content and the read-mapping BAM content to the assembled contig IDs.
+9. the path to the database directory, which includes the *checkM_DB* and *GTDBTK_DB* subdirectories
+10. (optional) scratch_dir: use --scratch_dir for GTDB-Tk disk swap, which reduces memory usage at the cost of a longer runtime
+
+
+An example JSON file is shown below::
+
+    {
+        "nmdc_mags.cpu": 32,
+        "nmdc_mags.pplacer_cpu": 1,
+        "nmdc_mags.outdir": "/path/to/output",
+        "nmdc_mags.proj_name": "Ga0482263",
+        "nmdc_mags.contig_file": "/path/to/Ga0482263_contigs.fna",
+        "nmdc_mags.sam_file": "/path/to/pairedMapped_sorted.bam",
+        "nmdc_mags.gff_file": "/path/to/Ga0482263_functional_annotation.gff",
+        "nmdc_mags.map_file": "/path/to/Ga0482263_contig_names_mapping.tsv",
+        "nmdc_mags.gtdbtk_database": "/path/to/GTDBTK_DB",
+        "nmdc_mags.scratch_dir": "/path/to/scratch_dir"
+    }
+
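+A stray space or a missing comma in this file will cause the run to fail, so it can be worth validating the JSON before submission. A minimal sketch using Python's built-in ``json.tool``; the file name ``inputs.json`` is an assumption:
+
+.. code-block:: bash
+
+    # Prints a parse error and exits non-zero if the JSON is malformed.
+    python -m json.tool inputs.json > /dev/null && echo "inputs.json is valid JSON"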
+
+
+Output
+------
+
+The workflow creates several output directories with many files. The main output files, the binned contig files from HQ and MQ bins, are in the *hqmq-metabat-bins* directory; the corresponding lineage results for the HQ and MQ bins are in the *gtdbtk_output* directory.
+
+
+A partial listing of the output directory is shown below::
+
+ |-- MAGs_stats.json
+ |-- 3300037552.bam.sorted
+ |-- 3300037552.depth
+ |-- 3300037552.depth.mapped
+ |-- bins.lowDepth.fa
+ |-- bins.tooShort.fa
+ |-- bins.unbinned.fa
+ |-- checkm-out
+ | |-- bins/
+ | |-- checkm.log
+ | |-- lineage.ms
+ | `-- storage
+ |-- checkm_qa.out
+ |-- gtdbtk_output
+ | |-- align/
+ | |-- classify/
+ | |-- identify/
+ | |-- gtdbtk.ar122.classify.tree -> classify/gtdbtk.ar122.classify.tree
+ | |-- gtdbtk.ar122.markers_summary.tsv -> identify/gtdbtk.ar122.markers_summary.tsv
+ | |-- gtdbtk.ar122.summary.tsv -> classify/gtdbtk.ar122.summary.tsv
+ | |-- gtdbtk.bac120.classify.tree -> classify/gtdbtk.bac120.classify.tree
+ | |-- gtdbtk.bac120.markers_summary.tsv -> identify/gtdbtk.bac120.markers_summary.tsv
+ | |-- gtdbtk.bac120.summary.tsv -> classify/gtdbtk.bac120.summary.tsv
+ | `-- ..etc
+ |-- hqmq-metabat-bins
+ | |-- bins.11.fa
+ | |-- bins.13.fa
+ | `-- ... etc
+ |-- mbin-2020-05-24.sqlite
+ |-- mbin-nmdc.20200524.log
+ |-- metabat-bins
+ | |-- bins.1.fa
+ | |-- bins.10.fa
+ | `-- ... etc
+
+Below is an example of all the output directory files with descriptions to the right.
+
+=================================================== ====================================================================================
+FileName/DirectoryName Description
+=================================================== ====================================================================================
+1781_86104.bam.sorted sorted input bam file
+1781_86104.depth the contig depth coverage
+1781_86104.depth.mapped the name mapped contig depth coverage
+MAGs_stats.json MAGs statistics in json format
+bins.lowDepth.fa                                    contigs filtered out by MetaBat2 for low depth (mean coverage < 1)
+bins.tooShort.fa                                    contigs filtered out by MetaBat2 for short length (< 3 kb)
+bins.unbinned.fa unbinned fasta file
+metabat-bins/ initial metabat2 binning result fasta output directory
+checkm-out/bins/ hmm and marker genes analysis result directory for each bin
+checkm-out/checkm.log checkm run log file
+checkm-out/lineage.ms                               lists the markers used to assign taxonomy and the taxonomic level assigned to the bin
+checkm-out/storage/ intermediate file directory
+checkm_qa.out checkm statistics report
+hqmq-metabat-bins/ HQ and MQ bins contigs fasta files directory
+gtdbtk_output/identify/ gtdbtk marker genes identify result directory
+gtdbtk_output/align/ gtdbtk genomes alignment result directory
+gtdbtk_output/classify/ gtdbtk genomes classification result directory
+gtdbtk_output/gtdbtk.ar122.classify.tree archaeal reference tree in Newick format containing analyzed genomes (bins)
+gtdbtk_output/gtdbtk.ar122.markers_summary.tsv summary tsv file for gtdbtk marker genes identify from the archaeal 122 marker set
+gtdbtk_output/gtdbtk.ar122.summary.tsv summary tsv file for gtdbtk archaeal genomes (bins) classification
+gtdbtk_output/gtdbtk.bac120.classify.tree bacterial reference tree in Newick format containing analyzed genomes (bins)
+gtdbtk_output/gtdbtk.bac120.markers_summary.tsv summary tsv file for gtdbtk marker genes identify from the bacterial 120 marker set
+gtdbtk_output/gtdbtk.bac120.summary.tsv summary tsv file for gtdbtk bacterial genomes (bins) classification
+gtdbtk_output/gtdbtk.bac120.filtered.tsv a list of genomes with an insufficient number of amino acids in MSA
+gtdbtk_output/gtdbtk.bac120.msa.fasta the MSA of the user genomes (bins) and the GTDB genomes
+gtdbtk_output/gtdbtk.bac120.user_msa.fasta the MSA of the user genomes (bins) only
+gtdbtk_output/gtdbtk.translation_table_summary.tsv the translation table determined for each genome (bin)
+gtdbtk_output/gtdbtk.warnings.log gtdbtk warning message log
+mbin-2021-01-31.sqlite sqlite db file stores MAGs metadata and statistics
+mbin-nmdc.20210131.log the mbin-nmdc pipeline run log file
+rc                                                  return code of the cromwell submit script
+script Task run commands
+script.background Bash script to run script.submit
+script.submit cromwell submit commands
+stderr standard error where task writes error message to
+stderr.background standard error where bash script writes error message to
+stdout                                              standard output where task writes output to
+stdout.background                                   standard output where bash script writes output to
+complete.mbin the dummy file to indicate the finish of the pipeline
+=================================================== ====================================================================================
+
+
+
+Version History
+---------------
+
+- 1.0.4 (release date **01/12/2022**; previous versions: 1.0.3)
+
+Point of contact
+----------------
+
+- Original author: Neha Varghese
+
+- Package maintainer: Chienchi Lo
+
+
+
+Metatranscriptome Workflow (v0.0.2)
+=====================================
+
+Summary
+-------
+
+MetaT is a workflow designed to analyze metatranscriptomes, building on top of already existing NMDC workflows for processing input. The metatranscriptomics workflow takes in raw data and starts by quality filtering the reads using the `RQC workflow `__. With filtered reads, the workflow filters out rRNA reads (and separates the interleaved file into separate files for the pairs) using bbduk (BBTools). After the filtering steps, reads are assembled into transcripts using MEGAHIT and annotated using the `Metagenome Annotation Workflow `_, producing GFF functional annotation files. Features are counted with `Subread's featureCounts `_, which assigns mapped reads to genomic features and generates RPKMs for each feature in a GFF file for sense and antisense reads.
+
+
+
+
+Workflow Diagram
+------------------
+
+.. image:: ../_static/images/reference/workflows/6_MetaT_metaT_figure.png
+ :scale: 25%
+ :alt: Metatranscriptome workflow
+
+Workflow Availability
+---------------------
+The workflow uses the listed Docker images to run all third-party tools.
+The workflow is available in GitHub:
+https://github.com/microbiomedata/metaT; the corresponding Docker images, which have all the required dependencies, are available at the following DockerHub locations: https://hub.docker.com/r/microbiomedata/bbtools, https://hub.docker.com/r/microbiomedata/meta_t, and https://hub.docker.com/r/intelliseqngs/hisat2
+
+
+Requirements for Execution (recommendations are in bold):
+----------------------------------------------------------
+1. WDL-capable Workflow Execution Tool (**Cromwell**)
+2. Container Runtime that can load Docker images (**Docker v2.1.0.3 or higher**)
+
+Workflow Dependencies
+---------------------
+Third-party software (These are included in the Docker images.)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+1. `BBTools v38.94 `_. (License: `BSD-3-Clause-LBNL `_.)
+2. `BBMap v38.94 `_. (License: `BSD-3-Clause-LBNL `_.)
+3. `Python v3.7.6 `_. (License: Python Software Foundation License)
+4. `featureCounts v2.0.2 `_. (License: GNU-GPL)
+5. `R v3.6.0 `_. (License: GPL-2/GPL-3)
+6. `edgeR v3.28.1 `_. (R package) (License: GPL (>=2))
+7. `pandas v1.0.5 `_. (python package) (License: BSD-3-Clause)
+8. `gffutils v0.10.1 `_. (python package) (License: MIT)
+
+
+Requisite database
+~~~~~~~~~~~~~~~~~~
+The RQCFilterData Database must be downloaded and installed. This is a 106 GB tar file which includes reference datasets of artifacts, adapters, contaminants, the phiX genome, rRNA kmers, and some host genomes. The following commands will download the database:
+
+.. code-block:: bash
+
+ wget http://portal.nersc.gov/dna/microbial/assembly/bushnell/RQCFilterData.tar
+ tar -xvf RQCFilterData.tar
+ rm RQCFilterData.tar
+
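+The tar file is 106 GB and unpacks to a similar size, so it is worth confirming available disk space before downloading. A trivial sketch:
+
+.. code-block:: bash
+
+    # Check free space on the target filesystem before fetching the database.
+    df -h .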
+
+Sample dataset(s)
+------------------
+The following files are provided with the GitHub download in the test_data folder:
+
+1. Raw reads: test_data/test_interleave.fastq.gz (output from ReadsQC workflow)
+
+2. Annotation file: test_functional_annotation.gff (output from mg_annotation workflow)
+
+Input: A JSON file containing the following
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+1. a name for the analysis
+2. the number of cpus requested
+3. the path to the clean input interleaved fastq file (recommended: the output from the Reads QC workflow)
+4. the path to the rRNA_kmer database provided as part of RQCFilterData
+5. the path to the assembled transcripts (output of part 1)
+6. the paths to the reads with rRNA removed (paired-end files) (output of part 1)
+7. the path to the annotation file (from the Metagenome Annotation workflow)
+
+An example JSON file is shown below:
+
+.. code-block:: JSON
+
+ {
+ "metat_omics.project_name": "test",
+ "metat_omics.no_of_cpus": 1,
+ "metat_omics.rqc_clean_reads": "test_data/test_interleave.fastq",
+ "metat_omics.ribo_kmer_file": "/path/to/riboKmers20fused.fa.gz",
+ "metat_omics.metat_contig_fn": "/path/to/megahit_assem.contigs.fa",
+ "metat_omics.non_ribo_reads": [
+ "/path/to/filtered_R1.fastq",
+ "/path/to/filtered_R2.fastq"
+ ],
+ "metat_omics.ann_gff_fn": "test_data/test_functional_annotation.gff"
+ }
+
+Output
+~~~~~~
+Output is split between the steps of the workflow. The first half of the workflow outputs the rRNA-filtered reads and the assembled transcripts. The annotation and featureCounts steps then produce a JSON file that contains the RPKMs for both sense and antisense reads, together with annotation information for each feature. An example of the JSON output:
+
+.. code-block:: JSON
+
+ {
+ "featuretype": "transcript",
+ "seqid": "k123_15",
+ "id": "STRG.2.1",
+ "source": "StringTie",
+ "start": 1,
+ "end": 491,
+ "length": 491,
+ "strand": ".",
+ "frame": ".",
+ "extra": [],
+ "cov": "5.928717",
+ "FPKM": "76638.023438",
+ "TPM": "146003.046875"
+ }
+
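+If ``metat_output/sense_out.json`` holds an array of records shaped like the example above (an assumption; check your own output), the most highly expressed features can be listed with ``jq``:
+
+.. code-block:: bash
+
+    # Print the five features with the highest FPKM values.
+    jq -r 'sort_by(.FPKM | tonumber) | reverse | .[0:5][] | "\(.id)\t\(.FPKM)"' sense_out.json
+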
+Below is an example of the output directory files with descriptions to the right.
+
+.. list-table:: Output files
+   :widths: 25 50
+   :header-rows: 1
+
+ * - Directory/File Name
+ - Description
+ * - metat_output/sense_out.json
+ - RPKM for each feature on + strand
+ * - metat_output/antisense_out.json
+ - RPKM for each feature on - strand
+ * - assembly/megahit_assem.contigs.fa
+ - assembled transcripts
+ * - mapback/mapped_sorted.bam
+ - alignment of reads and transcripts
+ * - qa/_interleaved.fastq
+ - non-ribosomal reads
+ * - qa/filterStats.txt
+ - summary statistics in JSON format
+ * - qa/filterStats2.txt
+ - more detailed summary statistics
+ * - annotation/annotations.json
+ - annotation information
+ * - annotation/features.json
+ - feature information
+ * - annotation/_cath_funfam.gff
+ - features from cath database
+   * - annotation/_cog.gff
+     - features from cog database
+ * - annotation/_ko_ec.gff
+ - features from ko database
+ * - annotation/_pfam.gff
+ - features from pfam database
+ * - annotation/_smart.gff
+ - features from smart database
+ * - annotation/_structural_annotation.gff
+ - structural features
+   * - annotation/_supfam.gff
+     - features from supfam database
+   * - annotation/_tigrfam.gff
+     - features from tigrfam database
+ * - annotation/_functional_annotation.gff
+ - functional features
+ * - annotation/_ec.tsv
+ - ec terms tsv
+ * - annotation/_ko.tsv
+ - ko terms tsv
+   * - annotation/proteins.faa
+     - fasta containing proteins
+
+
+
+Version History
+---------------
+- 0.0.3 (release date 07/28/2021; previous versions: 0.0.2)
+- 0.0.2 (release date 01/14/2021; previous versions: 0.0.1)
+
+Points of contact
+-----------------
+- Author: Migun Shakya
+
+
+Metaproteomic Workflow (v1.0.0)
+===============================
+
+Summary
+-------
+The metaproteomics workflow/pipeline is an end-to-end data processing workflow for protein identification and characterization using MS/MS data. Briefly, mass spectrometry instrument-generated data files (.RAW) are converted to mzML, an open data format, using MSConvert. Peptide identification is achieved using MSGF+ and the associated metagenomic information in the FASTA (protein sequences) file format. Intensity information for identified species is extracted using MASIC and combined with protein information.
+
+Workflow Diagram
+------------------
+
+.. image:: ../_static/images/reference/workflows/7_Metaproteomics_workflow_diagram.png
+
+Workflow Dependencies
+---------------------
+
+Third party software
+~~~~~~~~~~~~~~~~~~~~
+
+- MSGFPlus v20190628
+- Mzid-To-Tsv-Converter v1.3.3
+- PeptideHitResultsProcessor v1.5.7130
+- pwiz-bin-windows x86_64-vc141-release-3_0_20149_b73158966
+- MASIC v3.0.7235
+- sqlite-netFx-full-source v1.0.111.0
+- Conda (3-clause BSD)
+
+
+Workflow Availability
+---------------------
+
+The workflow is available in GitHub:
+https://github.com/microbiomedata/metaPro
+
+The container is available at Docker Hub (microbiomedata/mepro):
+https://hub.docker.com/r/microbiomedata/mepro
+
+Inputs
+~~~~~~~~
+
+- instrument data files (.raw)
+- metagenome (FASTA protein sequences)
+- parameter files for MSGF+ and MASIC
+- contaminant file
+
+Outputs
+~~~~~~~~
+
+1. Processing multiple datasets.
+
+.. code-block:: bash
+
+ .
+ ├── Data/
+ ├── FDR_table.csv
+ ├── Plots/
+ ├── dataset_job_map.csv
+ ├── peak_area_crosstab_by_dataset_id.csv
+ ├── protein_peptide_map.csv
+ ├── specID_table.csv
+ └── spectra_count_crosstab_by_dataset_id.csv
+
+2. Processing a single FICUS dataset.
+
+- metadata file, `Example <https://jsonblob.com/400362ef-c70c-11ea-bf3d-05dfba40675b>`_
+
+.. code-block:: bash
+
+
+ | Keys | Values |
+ |--------------------|--------------------------------------------------------------------------|
+ | id | str: "md5 hash of $github_url+$started_at_time+$ended_at_time" |
+ | name | str: "Metagenome:$proposal_extid_$sample_extid:$sequencing_project_extid |
+ | was_informed_by | str: "GOLD_Project_ID" |
+ | started_at_time | str: "metaPro start-time" |
+ | ended_at_time | str: "metaPro end-time" |
+ | type | str: tag: "nmdc:metaPro" |
+ | execution_resource | str: infrastructure name to run metaPro |
+ | git_url | str: "url to a release" |
+ | dataset_id | str: "dataset's unique-id at EMSL" |
+ | dataset_name | str: "dataset's name at EMSL" |
+ | has_inputs | json_obj |
+ | has_outputs | json_obj |
+ | stats | json_obj |
+
+ has_inputs :
+ | MSMS_out | str: file_name \|file_size \|checksum |
+ | metagenome_file | str: file_name \|file_size \|checksum \|
+ int: entry_count(#of gene sequences) \|
+ int: duplicate_count(#of duplicate gene sequences) |
+ | parameter_files | str: for_masic/for_msgfplus : file_name \|file_size \|checksum
+ parameter file used for peptide identification search
+ | Contaminant_file | str: file_name \|file_size \|checksum
+ (FASTA containing common contaminants in proteomics)
+
+ has_outputs:
+ | collapsed_fasta_file | str: file_name \|file_size \|checksum |
+ | resultant_file | str: file_name \|file_size \|checksum |
+ | data_out_table | str: file_name \|file_size \|checksum |
+
+ stats:
+ | from_collapsed_fasta | int: entry_count(#of unique gene sequences) |
+ | from_resultant_file | int: total_protein_count |
+ | from_data_out_table | int: PSM(# of MS/MS spectra matched to a peptide sequence at 5% false discovery rate (FDR)
+ float: PSM_identification_rate(# of peptide matching MS/MS spectra divided by total spectra searched (5% FDR)
+ int: unique_peptide_seq_count(# of unique peptide sequences observed in pipeline analysis 5% FDR)
+ int: first_hit_protein_count(# of proteins observed assuming single peptide-to-protein relationships)
+ int: mean_peptide_count(Unique peptide sequences matching to each identified protein.)
+
+- data_out_table
+
+.. code-block:: bash
+
+ | DatasetName | PeptideSequence | FirstHitProtein | SpectralCount | sum(MasicAbundance) | GeneCount | FullGeneList | FirstHitDescription | DescriptionList | min(Qvalue) |
+
+- collapsed_fasta_file
+- resultant_file
+
+Requirements for Execution
+--------------------------
+
+- Docker or other Container Runtime
+
+Version History
+---------------
+
+- 1.0.0
+
+Point of contact
+----------------
+
+Package maintainer: Anubhav
+
+
+Metabolomics Workflow
+==============================
+
+Summary
+-------
+
+The gas chromatography-mass spectrometry (GC-MS) based metabolomics workflow (metaMS) has been developed by leveraging PNNL's CoreMS software framework.
+The current software design allows for the orchestration of the metabolite characterization pipeline, i.e., signal noise reduction, m/z-based chromatogram peak deconvolution,
+abundance threshold calculation, peak picking, spectral similarity calculation and molecular search, similarity score calculation, and confidence filtering, all in a single step.
+
+
+Workflow Diagram
+------------------
+
+.. image:: ../_static/images/reference/workflows/8_Metabolomics_metamsworkflow.png
+
+
+Workflow Dependencies
+---------------------
+
+Third party software
+~~~~~~~~~~~~~~~~~~~~
+
+- CoreMS (2-clause BSD)
+- Click (BSD 3-Clause "New" or "Revised" License)
+
+Database
+~~~~~~~~~~~~~~~~
+- PNNL Metabolomics GC-MS Spectral Database
+
+Workflow Availability
+---------------------
+
+The workflow is available in GitHub:
+https://github.com/microbiomedata/metaMS
+
+The container is available at Docker Hub (microbiomedata/metaMS):
+https://hub.docker.com/r/microbiomedata/metams
+
+The python package is available on PyPi:
+https://pypi.org/project/metaMS/
+
+The databases are available by request.
+Please contact NMDC (support@microbiomedata.org) for access.
+
+Test datasets
+-------------
+https://github.com/microbiomedata/metaMS/blob/master/data/GCMS_FAMES_01_GCMS-01_20191023.cdf
+
+
+Execution Details
+---------------------
+
+Please refer to:
+
+https://github.com/microbiomedata/metaMS#metams-installation
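+
+For orientation, both distribution channels listed above can be fetched directly. This is a minimal sketch; exact run commands and flags are covered by the installation guide linked above:
+
+.. code-block:: bash
+
+    # Container route (image named in the Workflow Availability section)
+    docker pull microbiomedata/metams
+
+    # PyPI route (package named in the Workflow Availability section)
+    pip install metaMS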
+
+Inputs
+~~~~~~~~
+
+- Supported format for low resolution GC-MS data:
+ - ANDI NetCDF for GC-MS (.cdf)
+- Fatty Acid Methyl Esters Calibration File:
+ - ANDI NetCDF for GC-MS (.cdf) - C8 to C30
+- Parameters:
+ - CoreMS Parameter File (.json)
+ - MetaMS Parameter File (.json)
+
+Outputs
+~~~~~~~~
+
+- Metabolites data-table
+ - CSV, TAB-SEPARATED TXT
+ - HDF: CoreMS HDF5 format
+ - XLSX : Microsoft Excel
+- Workflow Metadata:
+ - JSON
+
+Requirements for Execution
+--------------------------
+
+- Docker Container Runtime
+
+ or
+- Python Environment >= 3.6
+- Python dependencies listed in requirements.txt
+
+
+Version History
+---------------
+
+- 2.1.3
+
+Point of contact
+----------------
+
+Package maintainer: Yuri E. Corilo
+
+
+
+Natural Organic Matter Workflow
+================================
+
+Summary
+-------
+
+Direct Infusion Fourier Transform mass spectrometry (DI FT-MS) data undergoes signal processing and molecular formula assignment leveraging EMSL’s CoreMS framework. Raw time-domain data is transformed into the m/z domain using the Fourier transform and the Ledford equation. Data is denoised, followed by peak picking, recalibration using an external reference list of known compounds, and a search against a dynamically generated molecular formula library with a defined molecular search space. Confidence scores for all molecular formula candidates are calculated based on mass accuracy and fine isotopic structure, and the candidate with the highest score is assigned.
+
+Workflow Diagram
+------------------
+
+.. image:: ../_static/images/reference/workflows/9_NOM_enviromsworkflow.png
+
+
+Workflow Dependencies
+---------------------
+
+Third party software
+~~~~~~~~~~~~~~~~~~~~
+
+- CoreMS (2-clause BSD)
+- Click (BSD 3-Clause "New" or "Revised" License)
+
+Database
+~~~~~~~~~~~~~~~~
+- CoreMS dynamic molecular database search and generator
+
+Workflow Availability
+---------------------
+
+The workflow is available in GitHub:
+https://github.com/microbiomedata/enviroMS
+
+The container is available at Docker Hub (microbiomedata/enviroms):
+https://hub.docker.com/r/microbiomedata/enviroms
+
+The python package is available on PyPi:
+https://pypi.org/project/enviroMS/
+
+The databases are available by request.
+Please contact NMDC (support@microbiomedata.org) for access.
+
+Test datasets
+-------------
+https://github.com/microbiomedata/enviroMS/tree/master/data
+
+
+Execution Details
+---------------------
+
+Please refer to:
+
+https://github.com/microbiomedata/enviroMS#enviroms-installation
+
+Inputs
+~~~~~~~~
+
+- Supported format for Direct Infusion FT-MS data:
+ - Thermo raw file (.raw)
+ - Bruker raw file (.d)
+ - Generic mass list in profile and/or centroid mode (inclusive of all delimiters types and Excel formats)
+- Calibration File:
+ - Molecular Formula Reference (.ref)
+- Parameters:
+ - CoreMS Parameter File (.json)
+ - EnviroMS Parameter File (.json)
+
+Outputs
+~~~~~~~~
+
+- Molecular Formula Data-Table, containing m/z measurements, peak height, peak area, molecular formula identification, ion type, confidence score, etc.
+ - CSV, TAB-SEPARATED TXT
+ - HDF: CoreMS HDF5 format
+ - XLSX : Microsoft Excel
+- Workflow Metadata:
+ - JSON
+
+Requirements for Execution
+--------------------------
+
+- Docker Container Runtime
+
+  or
+
+- Python Environment >= 3.8
+- Python dependencies listed in requirements.txt
+
+
+Version History
+---------------
+
+- 4.1.5
+
+Point of contact
+----------------
+
+Package maintainer: Yuri E. Corilo
+
+
+
diff --git a/content/nmdc/src/reference/data_portal.md b/content/nmdc/src/reference/data_portal.md
new file mode 100644
index 0000000..27110af
--- /dev/null
+++ b/content/nmdc/src/reference/data_portal.md
@@ -0,0 +1,34 @@
+# NMDC Data Portal
+
+The NMDC Data Portal is a web application researchers can use to discover and access standardized multi-omics microbiome data.
+
+The main technologies upon which it is built are:
+
+* [Python](https://www.python.org/) and [FastAPI](https://fastapi.tiangolo.com/)
+* [PostgreSQL](https://www.postgresql.org/) and [SQLAlchemy](https://www.sqlalchemy.org/)
+* [Celery](https://docs.celeryq.dev/) and [Redis](https://redis.io/)
+* [Vue.js](https://vuejs.org/) and [Vuetify](https://vuetifyjs.com/)
+
+## Dependencies
+
+The NMDC Data Portal depends upon various Python and JavaScript libraries, which are listed in either of the following documents:
+
+* [Python dependencies](https://github.com/microbiomedata/nmdc-server/blob/main/setup.py)
+* [JavaScript dependencies](https://github.com/microbiomedata/nmdc-server/blob/main/web/package.json)
+
+## Architecture
+
+![nmdc-diagram](../_static/images/reference/data_portal/nmdc-diagram.svg)
+
+## API documentation
+
+In addition to providing a web-based GUI (graphical user interface), the NMDC Data Portal also exposes an HTTP API. Researchers can use the latter to _programmatically_ discover and access standardized multi-omics microbiome data.
+
+Information about the HTTP API is in this [wiki](https://github.com/microbiomedata/nmdc-server/wiki/Search-API-Docs).
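+
+As a quick orientation, a search request can be issued from the command line. The endpoint path and request body below are assumptions based on the shape described in the Search API wiki; confirm the details there before relying on them:
+
+```bash
+# Hypothetical example: request the first page of biosample search results.
+curl -s -X POST "https://data.microbiomedata.org/api/biosample/search" \
+  -H "Content-Type: application/json" \
+  -d '{"conditions": []}'
+```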
+
+## Development documentation
+
+Here are some resources people can use to learn about the development of the NMDC Data Portal.
+
+* [Server and client development documentation](https://github.com/microbiomedata/nmdc-server)
+* [Client architecture notes](https://github.com/microbiomedata/nmdc-server/blob/main/web/README.md)
diff --git a/content/nmdc/src/tutorials/nav_data_portal.md b/content/nmdc/src/tutorials/nav_data_portal.md
new file mode 100644
index 0000000..533c920
--- /dev/null
+++ b/content/nmdc/src/tutorials/nav_data_portal.md
@@ -0,0 +1,48 @@
+# Navigating the Data Portal
+
+
+
+
+
+
+>NMDC Data Portal Tutorial Practice
+>
+>There are many ways to search the microbiome studies available in the Data Portal. This Tutorial will guide you through some of the most common methods.
+>
+>Task 1: Go to the [NMDC Data Portal](https://data.microbiomedata.org/) and log in with your ORCID account. (You can browse the data without logging in, but you will not be able to download data/processed data. You can create an account from the login page from the ORCID login link in the top corner of the Data Portal.)
+>
+>Task 2: Using the map features (zoom capability and "Search this region" to filter by latitude/longitude), find all the metagenomes collected closest to Corvallis, Oregon.
+>
+> Question 1: How many metagenomes have been collected near Corvallis?
+>
+>Task 3: Using the Study box (which shows the number of microbiome studies related to the metagenomes identified in Task 2), click the arrow on the right side of this box to go to the Study Page for this study.
+>
+> Question 1: What is the DOI for this study? (Note: This is also the DOI for the Dataset Citation.)
+>
+>Task 4: Go back to the main Data Portal page and clear the active query terms in the upper left corner. Use the Collection date option in the left menu bar or the timeline slide feature (below the Omics type and map) to filter to samples collected in 2015. Use the "search" feature (upper left corner) to find metagenomes collected from freshwater river biomes.
+>
+> Question 1: How many samples collected from freshwater biomes in 2015 have metagenomic data?
+>
+> Question 2: What other types of omics data are available for these samples?
+>
+>Task 5: In the Omics box, click the additional omics types available for these samples. This will allow you to be able to download from any of the processed data. (You must be logged in to be able to download data.) Download some **small** files from the first sample in the list:
+>
+> 1. Click the Metagenome button under the first sample. You can see all of the processed data available from this metagenome; download the QC Statistics.
+>
+> 2. Click on the Proteomics button. You can see all of the processed data available from this metaproteomic sample; download the Protein Report.
+>
+> 3. Click on the Metabolomics button; download the GC-MS Metabolomics Results.
+>
+> 4. In the purple box above the samples, test the **Bulk Download** feature. In the *Select file type* dropdown menu, select nmdc:ReadQC Analysis Activity. Click to **remove** the Filtered Sequencing Reads. This will provide a zipped (compressed) file with just the QC Statistics for all of the samples in the study.
+
+
+
+
+## Answers to Tutorial Questions
+>Task 2, Question 1: 108 metagenomes have been collected near Corvallis, Oregon.
+>
+>Task 3, Question 1: The DOI for this study and dataset is https://doi.org/10.46936/10.25585/60000017
+>
+>Task 4, Question 1: There are 32 samples collected from freshwater river biomes in 2015 which have metagenomic data.
+>
+>Task 4, Question 2: There are also proteomics and metabolomics data for these 32 samples.
diff --git a/content/nmdc/src/tutorials/prepare_metadata.md b/content/nmdc/src/tutorials/prepare_metadata.md
new file mode 100644
index 0000000..9508e39
--- /dev/null
+++ b/content/nmdc/src/tutorials/prepare_metadata.md
@@ -0,0 +1,3 @@
+# Preparing your metadata
+
+![](../_static/images/other/construction_4_kindpng_1353982.png)
diff --git a/content/nmdc/src/tutorials/run_workflows.md b/content/nmdc/src/tutorials/run_workflows.md
new file mode 100644
index 0000000..de51cf6
--- /dev/null
+++ b/content/nmdc/src/tutorials/run_workflows.md
@@ -0,0 +1,188 @@
+# Running the Workflows
+
+## NMDC EDGE QuickStart
+
+
+
+
+
+
+> ### NMDC EDGE QuickStart Tutorial Practice
+>
+>Task 1: Create an NMDC EDGE account with either your email address or your ORCiD account.
+>
+>Task 2: Download the small interleaved [data file](https://nmdc-edge.org/publicdata/test_data/SRR7877884-int-0.1.fastq.gz) listed here. (Note: This is paired-end data with the pairs interleaved together into a single file.) Upload the file to NMDC EDGE.
+>
+>Task 3: Click the user icon (in the top right corner with your initials) and under “Files”, click on “Manage Uploads”. Verify that the file you uploaded is there. (Note: Later you can delete uploads that are no longer needed.)
+>
+>Task 4 (optional): Click the user icon and under “Account”, click on “Profile”. Edit your account to receive email notification of project status by clicking “ON”.
+
+## Metagenomics
+
+### ReadsQC
+
+
+
+
+
+
+>NMDC EDGE Metagenome ReadsQC Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the Metagenome ReadsQC workflow using the dataset uploaded in the QuickStart tutorial.
+>
+> Question 1: How many reads were in the input file? How many bases were in the input file?
+>
+> Question 2: How many reads were in the output file? How many bases were in the output file?
+>
+> Question 3: What file in the output would be used in the next workflow?
+
+
+### Read-based Taxonomy Classification
+
+
+
+
+
+
+>NMDC EDGE Metagenome Read-based Taxonomy Classification Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the Metagenome Read-based Taxonomy Classification workflow with all three taxonomy classification tools. (Note: All three tools are selected by default. While a user can opt to turn off one or two tools, it is recommended to run all three.) Use the clean data output file from the project run in the ReadsQC Tutorial (the file ending in .anqdpht.fq.gz). In this case, the file will be treated as single-end reads.
+>
+> Question 1: How many of the Top 10 species are called by more than one tool?
+>
+> Question 2: List the **genera** that are called by all three tools in the Top 10.
+>
+> Question 3: From the Krona plot shown from the taxonomy classification tool Centrifuge results at **species level**, what percentage of the sample is estimated to be _Pseudomonas aeruginosa_?
+
+### Assembly
+
+
+
+
+
+
+>NMDC EDGE Metagenome Assembly Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the Metagenome Assembly workflow. Use the clean data output file from the project run in the ReadsQC Tutorial (the file ending in .anqdpht.fq.gz). In this case, the file is interleaved paired data and only one file is required for input.
+>
+> Question 1: How many contigs were generated from the assembly?
+>
+> Question 2: How many scaffolds were generated from the assembly?
+>
+> Question 3: Download the covstats.txt file. From the top of the file, what percentage of the reads map back to the assembled contigs?
+
+### Annotation
+
+
+
+
+
+
+>NMDC EDGE Metagenome Annotation Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the Metagenome Annotation workflow. Use the assembled contigs output from the project run in the Assembly Tutorial (assembled_contigs.fna).
+>
+> Question 1: How many contigs had genes called (sequences_with_genes)?
+>
+> Question 2: How many coding sequences (genes) were called by Prodigal? How many were called by GeneMark?
+>
+> Question 3: What is the coding density of this metagenome?
+
+### MAGs Generation
+
+
+
+
+
+
+>NMDC EDGE MAGs Generation Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the Metagenome MAGs workflow. Use the assembled contigs and the read mapping file output from the project run in the Assembly Tutorial (assembled_contigs.fna and pairedMapped_sorted.bam) and the combined functional annotation file from the Annotation Tutorial (the file ending in functional_annotation.gff).
+>
+> Question 1: Calculate the percentage of the contigs that were binned.
+>
+> Question 2: How many bins were determined to be high quality (HQ)? How many bins were determined to be medium quality (MQ)?
+>
+> Question 3: What is the organism identified from genome in the bin which is most complete and has the least contamination (the highest quality bin)? (Note: Scroll to the far right of the summary table in the results to get the species assignment.)
+
+### Running multiple workflows or the full metagenomic pipeline with a single input
+
+
+
+
+
+
+>NMDC EDGE Full Metagenome Pipeline Tutorial Practice
+>
+>Task: Log into NMDC EDGE and run the full Metagenome pipeline (multiple workflows; all workflows are selected by default). Use the same dataset uploaded in the QuickStart tutorial. Check your results from the full pipeline against your results from the previous tutorials. The results should be identical, or nearly identical, to those from each individual workflow, with the added benefit of submitting all the workflows with a single input of raw sequencing data.
+>
+>If you want to test just the full pipeline and not the individual workflows, [download this data set](https://nmdc-edge.org/publicdata/test_data/SRR7877884-int-0.1.fastq.gz). Upload the file to NMDC EDGE and run the full pipeline.
+
+
+## Metatranscriptomics
+
+
+
+
+
+
+
+
+
+
+>NMDC EDGE Metatranscriptomics Tutorial Practice
+>
+>Task 1: Download the small interleaved [data file](https://nmdc-edge.org/publicdata/metaT/test_smaller_interleave.fastq.gz) listed here. (Note: This is paired-end data with the pairs interleaved together into a single file.)
+>
+>Task 2: Log into NMDC EDGE and upload the file.
+>
+>Task 3: Click the user icon (in the top right corner with your initials) and under “Files”, click on “Manage Uploads”. Verify that the file you uploaded is there. (Note: Later you can delete uploads that are no longer needed.)
+>
+>Task 4: Run the MetaT single workflow with this dataset in your upload folder. When the analysis is complete, the Top_features summary table under the Metatranscriptome Result tab shows the proteins assigned to the transcripts with the highest RPKM values. The full results (rpkm_sorted_features.tsv) can be downloaded from the metat_output folder under the Browser/Download output tab. The assembled transcripts and the annotation files can also be downloaded from the respective folders under the Browser/Download output tab.
+>
+> Question 1: What product (protein) is assigned to the transcript with the highest rpkm value? (Note: Scroll to the far right to see these results.)
+>
+> Question 2: Download the contigs.fa (transcripts) file. How many transcripts were assembled?
+>
+> Question 3: Download the rpkm_sorted_features.tsv file. How many transcripts were assigned a product (protein) that is **not hypothetical**?
+
+
+
+## Answers to Tutorial Questions
+
+### Metagenomics ReadsQC
+>Question 1: Input contained 4,496,774 reads and 674,516,100 bases.
+>
+>Question 2: Output contained 3,353,438 reads and 487,250,239 bases.
+>
+>Question 3: For this project, the clean, filtered data is in the output file called SRR7877884-int-0.1.anqdpht.fastq.gz.
+
+### Metagenomics Read-based Taxonomy Classification
+>Question 1: There are seven species called by more than one taxonomy tool: *Pseudomonas aeruginosa, Salmonella enterica, Listeria monocytogenes, Enterococcus faecalis, Lactobacillus fermentum, Bacillus subtilis, and Escherichia coli.*
+>
+>Question 2: There are four genera called by all three taxonomy classification tools: *Pseudomonas, Bacillus, Enterococcus, and Lactobacillus.*
+>
+>Question 3: The Krona plot shows that Centrifuge estimates that 12% of the sample is *Pseudomonas aeruginosa*.
+
+### Metagenomics Assembly
+>Question 1: 3,196 contigs were assembled.
+>
+>Question 2: 3,141 scaffolds were created.
+>
+>Question 3:
+
+### Metagenomics Annotation
+>Question 1: 3,031 contigs had genes called.
+>
+>Question 2: 2,495 CDS (coding sequences or genes) were called by GeneMark and 936 CDS were called by Prodigal.
+>
+>Question 3: The coding density of the metagenome is 89.15%.
+
+### Metagenome MAGs
+>Question 1: 24% of the contigs were binned.
+>
+>Question 2: One bin was determined to be high quality and five bins were determined to be medium quality.
+>
+>Question 3: The organism called by GTDB-Tk for the highest quality bin is *Bacillus marinus*.
diff --git a/content/nmdc/src/tutorials/submission_portal.md b/content/nmdc/src/tutorials/submission_portal.md
new file mode 100644
index 0000000..6de0d9f
--- /dev/null
+++ b/content/nmdc/src/tutorials/submission_portal.md
@@ -0,0 +1,6 @@
+# Using the Submission Portal
+
+