Merge branch 'v1.2.0' of https://github.com/CDCgov/seqsender

CDCgov · Apr 11, 2024 · ea50744
2 parents 67d14c9 + 9f6eac5
Showing 517 changed files with 399,345 additions and 303 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -34,7 +34,7 @@ github_pages_url <- description$GITHUB_PAGES
 
 ## Overview
 
-``r program`` is a Python program that is developed to automate the process of generating necessary submission files and batch uploading them to <ins>NCBI archives</ins> (such as **BioSample**, **SRA**, and **Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu** and **EpiCoV**). Presently, the pipeline is capable of uploading **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) data. However, the dynamic nature of this pipeline can allow for additional uploads of other organisms in future updates or requests.
+``r program`` is a Python program that is developed to automate the process of generating necessary submission files and batch uploading them to <ins>NCBI archives</ins> (such as **BioSample**, **SRA**, and **Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu**, **EpiCoV**, **EpiPox**, **EpiArbo**). Presently, the pipeline is capable of uploading **Influenza A Virus** (FLU), **SARS-COV-2** (COV), **Monkeypox** (POX), **Arbovirus** (ARBO), and a wide variety of other organisms. If you'd like to have ``r program`` support your virus create a issue.
 
 ## Contacts
 
@@ -62,11 +62,11 @@ github_pages_url <- description$GITHUB_PAGES
 
 - **GISAID Submissions**
 
-``r program`` makes use of GISAID's Command Line Interface tools to bulk uploading meta- and sequence-data to GISAID databases. Presently, the pipeline only allows upload to EpiFlu (**Influenza A Virus**) and EpiCoV (**SARS-COV-2**) databases. Before uploading, submitter needs to 
+``r program`` makes use of GISAID's Command Line Interface tools to bulk uploading meta- and sequence-data to GISAID databases. Presently, the pipeline supports upload to EpiFlu (**Influenza A Virus**), EpiCoV (**SARS-COV-2**), EpiPox (**Monkeypox**), and EpiArbo (**Arbovirus**). Before uploading, submitter needs to 
 
 1. Have a GISAID account. To sign up, visit [GISAID Platform](https://gisaid.org/). 
 
-2. Request a client-ID for EpiFlu or EpiCoV database in order to use its CLI tool. The CLI utilizes the client-ID along with the username and password to authenticate the database prior to make a submission. To obtain a client-ID, please email <a href="mailto:[email protected]" >[email protected]</a> to request. _**Important note**: If submitter would like to upload a "test" submission first to familiarize themselves with the submission process prior to make a real submission, one should additionally request a test client-id to perform such submissions._
+2. Request a client-ID for your specified Epi(Flu/CoV/Pox/Arbo) database in order to use its CLI tool. The CLI utilizes the client-ID along with the username and password to authenticate the database prior to make a submission. To obtain a client-ID, please email <a href="mailto:[email protected]" >[email protected]</a> to request. _**Important note**: If submitter would like to upload a "test" submission first to familiarize themselves with the submission process prior to make a real submission, one should additionally request a test client-id to perform such submissions._
 
 3. Download the <a href="`r github_pages_url`/articles/images/fluCLI_download.png" target="_blank">EpiFlu</a> or <a href="`r github_pages_url`/articles/images/covCLI_download.png" target="_blank">EpiCoV</a> CLI from the **GISAID platform** and stored them in the destination of choice prior to perform a batch upload.
 
@@ -80,21 +80,21 @@ Here is a quick look of where to store the downloaded **GISAID CLI** package.
 
 Before submitters can perform a batch submission using ``r program``, they must make sure the requirement files (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already prepared and stored in a submission directory of choice.
 
-(a) To prep for FLU submissions, select one of the databases below to get started:
+To prep for submissions, select one of the databases below to get started:
+*to submit to multiple databases just combine the required metadata for each database into one file.
+
+**NCBI:**
 
 > <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
 > <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
 > <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_flu_submission.html" target="_blank">GISAID</a> <br>
-<!-- > <a href="`r github_pages_url`/articles/multiple_databases_flu_submission.html" target="_blank">Multiple databases</a> -->
 
-(b) To prep for COV submissions, select one of the databases below to get started:
+**GISAID:**
 
-> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
-> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
-> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">GISAID</a> <br>
-<!-- > <a href="`r github_pages_url`/articles/multiple_databases_cov_submission.html" target="_blank">Multiple databases</a> -->
+> <a href="`r github_pages_url`/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a> <br>
+> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a> <br>
+> <a href="`r github_pages_url`/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a> <br>
+> <a href="`r github_pages_url`/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a> <br>
 
 ## Quick Start
 

diff --git a/README.md b/README.md
@@ -30,11 +30,12 @@ service, product, or enterprise.
 `seqsender` is a Python program that is developed to automate the
 process of generating necessary submission files and batch uploading
 them to <ins>NCBI archives</ins> (such as **BioSample**, **SRA**, and
-**Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu** and
-**EpiCoV**). Presently, the pipeline is capable of uploading **Influenza
-A Virus** (FLU) and **SARS-COV-2** (COV) data. However, the dynamic
-nature of this pipeline can allow for additional uploads of other
-organisms in future updates or requests.
+**Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu**,
+**EpiCoV**, **EpiPox**, **EpiArbo**). Presently, the pipeline is capable
+of uploading **Influenza A Virus** (FLU), **SARS-COV-2** (COV),
+**Monkeypox** (POX), **Arbovirus** (ARBO), and a wide variety of other
+organisms. If you’d like to have `seqsender` support your virus create a
+issue.
 
 ## Contacts
 
@@ -99,16 +100,18 @@ FTP on the command line. Before attempting to submit a submission using
 
 `seqsender` makes use of GISAID’s Command Line Interface tools to bulk
 uploading meta- and sequence-data to GISAID databases. Presently, the
-pipeline only allows upload to EpiFlu (**Influenza A Virus**) and EpiCoV
-(**SARS-COV-2**) databases. Before uploading, submitter needs to
+pipeline supports upload to EpiFlu (**Influenza A Virus**), EpiCoV
+(**SARS-COV-2**), EpiPox (**Monkeypox**), and EpiArbo (**Arbovirus**).
+Before uploading, submitter needs to
 
 1.  Have a GISAID account. To sign up, visit [GISAID
     Platform](https://gisaid.org/).
 
-2.  Request a client-ID for EpiFlu or EpiCoV database in order to use
-    its CLI tool. The CLI utilizes the client-ID along with the username
-    and password to authenticate the database prior to make a
-    submission. To obtain a client-ID, please email
+2.  Request a client-ID for your specified Epi(Flu/CoV/Pox/Arbo)
+    database in order to use its CLI tool. The CLI utilizes the
+    client-ID along with the username and password to authenticate the
+    database prior to make a submission. To obtain a client-ID, please
+    email
     <a href="mailto:[email protected]" >[email protected]</a> to
     request. ***Important note**: If submitter would like to upload a
     “test” submission first to familiarize themselves with the
@@ -134,31 +137,29 @@ must make sure the requirement files (such as *config.yaml*,
 *metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already
 prepared and stored in a submission directory of choice.
 
-1)  To prep for FLU submissions, select one of the databases below to
-    get started:
+To prep for submissions, select one of the databases below to get
+started: \*to submit to multiple databases just combine the required
+metadata for each database into one file.
+
+**NCBI:**
 
 > <a href="https://cdcgov.github.io/seqsender/articles/biosample_submission.html" target="_blank">BioSample</a>
 > <br>
 > <a href="https://cdcgov.github.io/seqsender/articles/sra_submission.html" target="_blank">SRA</a>
 > <br>
 > <a href="https://cdcgov.github.io/seqsender/articles/genbank_submission.html" target="_blank">Genbank</a>
 > <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_flu_submission.html" target="_blank">GISAID</a>
-> <br>
-> <!-- > <a href="https://cdcgov.github.io/seqsender/articles/multiple_databases_flu_submission.html" target="_blank">Multiple databases</a> -->
 
-2)  To prep for COV submissions, select one of the databases below to
-    get started:
+**GISAID:**
 
-> <a href="https://cdcgov.github.io/seqsender/articles/biosample_submission.html" target="_blank">BioSample</a>
+> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a>
 > <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/sra_submission.html" target="_blank">SRA</a>
+> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a>
 > <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/genbank_submission.html" target="_blank">Genbank</a>
+> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a>
 > <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_cov_submission.html" target="_blank">GISAID</a>
+> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a>
 > <br>
-> <!-- > <a href="https://cdcgov.github.io/seqsender/articles/multiple_databases_cov_submission.html" target="_blank">Multiple databases</a> -->
 
 ## Quick Start
 

diff --git a/config/biosample/Beta-lactamase.1.0.xml b/config/biosample/Beta-lactamase.1.0.xml
@@ -0,0 +1,171 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<BioSamplePackages>
+  <Package>
+    <Name>Beta-lactamase.1.0</Name>
+    <DisplayName>Beta-lactamase; version 1.0</DisplayName>
+    <ShortName>Beta-lactamase</ShortName>
+    <EnvPackage/>
+    <EnvPackageDisplay/>
+    <NotAppropriateFor>wgs_single;wgs_batch;wgs_diploid</NotAppropriateFor>
+    <Description>Use for beta-lactamase gene transformants that have sequence and antibiotic resistance data. Please use the 'Supplementary Files' wizard to submit corresponding Sequin and Antibiogram files.</Description>
+    <Example>SAMN04099646</Example>
+    <TemplateHeader># This is a submission template for batch deposit of 'Beta-lactamase; version 1.0' samples to the NCBI BioSample database (https://www.ncbi.nlm.nih.gov/biosample/).&#13;
+# See SAMN04099646 for an example record of this type of BioSample (https://www.ncbi.nlm.nih.gov/biosample/SAMN04099646).&#13;
+# Fields with an asterisk (*) are mandatory. Your submission will fail if any mandatory fields are not completed. If information is unavailable for any mandatory field, please enter 'not collected', 'not applicable' or 'missing' as appropriate.&#13;
+# All other fields are optional. Leave optional fields empty if no information is available.&#13;
+# You can add any number of custom fields to fully describe your BioSamples, simply include them in the table.&#13;
+# CAUTION: Be aware that Excel may automatically apply formatting to your data. In particular, take care with dates, incrementing autofills and special characters like / or -. Doublecheck that your text file is accurate before uploading to BioSample.&#13;
+# TO MAKE A SUBMISSION:&#13;
+#     1. Complete the template table (typically in Excel, or another spreadsheet application)&#13;
+#     2. Save the worksheet as a Text (Tab-delimited) file - use 'File, Save as, Save as type: Text (Tab-delimited)'&#13;
+#     3. Upload the file on the 'Attributes' tab of the BioSample Submission Portal at https://submit.ncbi.nlm.nih.gov/subs/biosample/.&#13;
+#     4. If you have any questions, please contact us at [email protected].</TemplateHeader>
+    <Attribute use="either_one_mandatory" group_name="Organism">
+      <Name>strain</Name>
+      <HarmonizedName>strain</HarmonizedName>
+      <Description>microbial or eukaryotic strain name</Description>
+      <Format>
+        <Description/>
+      </Format>
+    </Attribute>
+    <Attribute use="either_one_mandatory" group_name="Organism">
+      <Name>isolate</Name>
+      <HarmonizedName>isolate</HarmonizedName>
+      <Description>identification or description of the specific individual from which this sample was obtained</Description>
+      <Format>
+        <Description/>
+      </Format>
+    </Attribute>
+    <Attribute use="mandatory">
+      <Name>beta-lactamase family</Name>
+      <HarmonizedName>beta_lactamase_family</HarmonizedName>
+      <Description>Specify the beta-lactamase family for this gene.</Description>
+      <Format type="select">
+        <Description>ACC | ACT | ADC | BEL | CARB | CBP | CFE | CMY | CTX-M | DHA | FOX | GES | GIM | KPC | IMI | IMP | IND | LAT | MIR | MOX | NDM | OXA | PER | PDC | SHV | SME | TEM | VEB | VIM | missing | not applicable | not collected | not provided | restricted access</Description>
+        <Choice/>
+        <Choice>ACC</Choice>
+        <Choice>ACT</Choice>
+        <Choice>ADC</Choice>
+        <Choice>BEL</Choice>
+        <Choice>CARB</Choice>
+        <Choice>CBP</Choice>
+        <Choice>CFE</Choice>
+        <Choice>CMY</Choice>
+        <Choice>CTX-M</Choice>
+        <Choice>DHA</Choice>
+        <Choice>FOX</Choice>
+        <Choice>GES</Choice>
+        <Choice>GIM</Choice>
+        <Choice>KPC</Choice>
+        <Choice>IMI</Choice>
+        <Choice>IMP</Choice>
+        <Choice>IND</Choice>
+        <Choice>LAT</Choice>
+        <Choice>MIR</Choice>
+        <Choice>MOX</Choice>
+        <Choice>NDM</Choice>
+        <Choice>OXA</Choice>
+        <Choice>PER</Choice>
+        <Choice>PDC</Choice>
+        <Choice>SHV</Choice>
+        <Choice>SME</Choice>
+        <Choice>TEM</Choice>
+        <Choice>VEB</Choice>
+        <Choice>VIM</Choice>
+        <Choice>missing</Choice>
+        <Choice>not applicable</Choice>
+        <Choice>not collected</Choice>
+        <Choice>not provided</Choice>
+        <Choice>restricted access</Choice>
+      </Format>
+    </Attribute>
+    <Attribute use="mandatory">
+      <Name>carbapenemase</Name>
+      <HarmonizedName>carbapenemase</HarmonizedName>
+      <Description>Does the enzyme exhibit carbapenemase activity? If the enzyme does exhibit carbapenemase activity, the response should be "yes", otherwise "no."</Description>
+      <Format type="select">
+        <Description>yes | no | missing | not applicable | not collected | not provided | restricted access</Description>
+        <Choice/>
+        <Choice>yes</Choice>
+        <Choice>no</Choice>
+        <Choice>missing</Choice>
+        <Choice>not applicable</Choice>
+        <Choice>not collected</Choice>
+        <Choice>not provided</Choice>
+        <Choice>restricted access</Choice>
+      </Format>
+    </Attribute>
+    <Attribute use="mandatory">
+      <Name>collection date</Name>
+      <HarmonizedName>collection_date</HarmonizedName>
+      <Description>the date on which the sample was collected; date/time ranges are supported by providing two dates from among the supported value formats, delimited by a forward-slash character; collection times are supported by adding "T", then the hour and minute after the date, and must be in Coordinated Universal Time (UTC), otherwise known as "Zulu Time" (Z); supported formats include "DD-Mmm-YYYY", "Mmm-YYYY", "YYYY" or ISO 8601 standard "YYYY-mm-dd", "YYYY-mm", "YYYY-mm-ddThh:mm:ss"; e.g., 30-Oct-1990, Oct-1990, 1990, 1990-10-30, 1990-10,  21-Oct-1952/15-Feb-1953, 2015-10-11T17:53:03Z; valid non-ISO dates will be automatically transformed to ISO format</Description>
+      <Format>
+        <Description>{timestamp}</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="mandatory">
+      <Name>EDTA inhibitor tested</Name>
+      <HarmonizedName>edta_inhibitor_tested</HarmonizedName>
+      <Description>Was carbapenemase activity tested in the presence of EDTA? If carbapenemase activity was tested in the presence of EDTA, the response should be "yes", otherwise "no".</Description>
+      <Format type="select">
+        <Description>yes | no | missing | not applicable | not collected | not provided | restricted access</Description>
+        <Choice/>
+        <Choice>yes</Choice>
+        <Choice>no</Choice>
+        <Choice>missing</Choice>
+        <Choice>not applicable</Choice>
+        <Choice>not collected</Choice>
+        <Choice>not provided</Choice>
+        <Choice>restricted access</Choice>
+      </Format>
+    </Attribute>
+    <Attribute use="mandatory">
+      <Name>geographic location</Name>
+      <HarmonizedName>geo_loc_name</HarmonizedName>
+      <Description>Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg "Canada: Vancouver" or "Germany: halfway down Zugspitze, Alps"</Description>
+      <Format>
+        <Description>{term}:{term}:{text}</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="optional">
+      <Name>collected by</Name>
+      <HarmonizedName>collected_by</HarmonizedName>
+      <Description>Name of persons or institute who collected the sample</Description>
+      <Format>
+        <Description>None</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="optional">
+      <Name>host</Name>
+      <HarmonizedName>host</HarmonizedName>
+      <Description>The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, "Homo sapiens".</Description>
+      <Format>
+        <Description>None</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="optional">
+      <Name>isolation source</Name>
+      <HarmonizedName>isolation_source</HarmonizedName>
+      <Description>Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived.</Description>
+      <Format>
+        <Description>None</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="optional">
+      <Name>lab host</Name>
+      <HarmonizedName>lab_host</HarmonizedName>
+      <Description>Scientific name and description of the laboratory host used to propagate the source organism or material from which the sample was obtained, e.g., Escherichia coli DH5a, or Homo sapiens HeLa cells</Description>
+      <Format>
+        <Description>None</Description>
+      </Format>
+    </Attribute>
+    <Attribute use="optional">
+      <Name>latitude and longitude</Name>
+      <HarmonizedName>lat_lon</HarmonizedName>
+      <Description>The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", eg, 38.98 N 77.11 W</Description>
+      <Format>
+        <Description>{float} {float}</Description>
+      </Format>
+    </Attribute>
+  </Package>
+</BioSamplePackages>
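
The package definition above marks each `<Attribute>` with a `use` value (`mandatory`, `either_one_mandatory`, or `optional`) and a `<HarmonizedName>`. As a rough illustration of how such a file could be read when preparing metadata columns, here is a minimal sketch using only Python's standard library. The function name `attributes_by_use` and the trimmed `PACKAGE_XML` excerpt are hypothetical and chosen for this example; this is not how `seqsender` itself necessarily consumes these files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, illustrative excerpt of the Beta-lactamase.1.0 package definition.
# Only a few attributes are kept so the example stays short.
PACKAGE_XML = """
<BioSamplePackages>
  <Package>
    <Name>Beta-lactamase.1.0</Name>
    <Attribute use="either_one_mandatory" group_name="Organism">
      <Name>strain</Name>
      <HarmonizedName>strain</HarmonizedName>
    </Attribute>
    <Attribute use="either_one_mandatory" group_name="Organism">
      <Name>isolate</Name>
      <HarmonizedName>isolate</HarmonizedName>
    </Attribute>
    <Attribute use="mandatory">
      <Name>collection date</Name>
      <HarmonizedName>collection_date</HarmonizedName>
    </Attribute>
    <Attribute use="optional">
      <Name>host</Name>
      <HarmonizedName>host</HarmonizedName>
    </Attribute>
  </Package>
</BioSamplePackages>
"""


def attributes_by_use(xml_text):
    """Group harmonized attribute names by their 'use' value.

    Hypothetical helper for illustration; assumes every <Attribute>
    carries a 'use' attribute and a <HarmonizedName> child, as in the
    package file shown in the diff.
    """
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for attribute in root.iter("Attribute"):
        use = attribute.get("use")
        harmonized = attribute.findtext("HarmonizedName")
        grouped[use].append(harmonized)
    return dict(grouped)


if __name__ == "__main__":
    for use, names in attributes_by_use(PACKAGE_XML).items():
        print(f"{use}: {', '.join(names)}")
```

Grouping by `use` keeps the `either_one_mandatory` attributes separate, since for those only one member of the group (here `strain` or `isolate`) needs a value, whereas every `mandatory` attribute must be filled in.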