diff --git a/v2.2.1/algorithm/technical/index.html b/v2.2.1/algorithm/technical/index.html
index 08ebae5..4331f3d 100644
--- a/v2.2.1/algorithm/technical/index.html
+++ b/v2.2.1/algorithm/technical/index.html
@@ -89,7 +89,7 @@
     <div data-md-component="skip">
       
         
-        <a href="#calculating-percent-gene-coverage" class="md-skip">
+        <a href="#technical-code-breakdown" class="md-skip">
           Skip to content
         </a>
       
@@ -879,6 +879,8 @@
       <input class="md-nav__toggle md-toggle" type="checkbox" id="__toc">
       
       
+        
+      
       
         <label class="md-nav__link md-nav__link--active" for="__toc">
           
@@ -907,6 +909,8 @@
   
   
   
+    
+  
   
     <label class="md-nav__title" for="__toc">
       <span class="md-nav__icon md-icon"></span>
@@ -1014,10 +1018,13 @@
   
 
 
-  <h1>Technical Code Breakdown</h1>
-
+<div class="admonition tip inline end">
+<p class="admonition-title">Examples from TBProfiler v4.4.2</p>
+<p>The examples in this document are based on the output of TBProfiler v4.4.2. However, the general principles apply to all versions of TBProfiler and tbp-parser.</p>
+</div>
+<h1 id="technical-code-breakdown">Technical Code Breakdown<a class="headerlink" href="#technical-code-breakdown" title="Permanent link">&para;</a></h1>
 <p><code>tbp-parser</code> is object-oriented, with each class representing either <em>an output file</em>, <em>a part of an output file</em>, or <em>a part of the input JSON file</em> produced by TBProfiler.</p>
-<p>The first class that is invoked by the <code>tbp-parser.py</code> script is <code>Parser</code> which is a control class that orchestrates the creation of the different output reports. </p>
+<p>The first class that is invoked by the <code>tbp-parser.py</code> script is <code>Parser</code> which is a control class that orchestrates the creation of the different output reports.</p>
 <h2 id="calculating-percent-gene-coverage">Calculating percent gene coverage<a class="headerlink" href="#calculating-percent-gene-coverage" title="Permanent link">&para;</a></h2>
 <p>Before creating any reports, <code>Parser</code> calls the <code>Coverage</code> class to calculate the percent gene coverage over a specified minimum depth (default: 10) for the coding regions of all genes included in the TBDB (the database used in TBProfiler to generate the drug resistance annotations). This requires as input the BAM and BAI files produced by TBProfiler during alignment to the H37Rv reference genome. The percent gene coverage results are then stored in a global dictionary that is accessed multiple times for QC purposes during the creation of the final reports.</p>
 <h2 id="creating-the-laboratorian-report">Creating the Laboratorian report<a class="headerlink" href="#creating-the-laboratorian-report" title="Permanent link">&para;</a></h2>
@@ -1228,7 +1235,7 @@ <h2 id="creating-the-coverage-report">Creating the coverage report<a class="head
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/assets/tbp-parser_versioning.png b/v2.2.1/assets/tbp-parser_versioning.png
index 2aa6c9f..107ffdf 100644
Binary files a/v2.2.1/assets/tbp-parser_versioning.png and b/v2.2.1/assets/tbp-parser_versioning.png differ
diff --git a/v2.2.1/index.html b/v2.2.1/index.html
index 2508810..2ab4fc3 100644
--- a/v2.2.1/index.html
+++ b/v2.2.1/index.html
@@ -980,6 +980,16 @@ <h1 id="tbp-parser">tbp-parser<a class="headerlink" href="#tbp-parser" title="Pe
 <p class="admonition-title">Not for Diagnostic Use</p>
 <p><strong>CAUTION</strong>: The information produced by this program should <strong>not</strong> be used for clinical reporting unless and until extensive validation has occured in your laboratory on a stable version. Otherwise, the outputs of tbp-parser are for research use only.</p>
 </div>
+<div class="admonition dna">
+<p class="admonition-title">FUTURE DEPRECATION NOTICE</p>
+<p><mark><strong>At the time of the PHB v2.3.0 release:</strong></mark></p>
+<ul>
+<li><strong>all</strong> branches on Terra that have been mentioned in this documentation will be deleted. Please use the v2.3.0 version of TheiaProk moving forward.</li>
+<li>the <code>main</code> branch of tbp-parser will host v2.1.0 and above; earlier versions of tbp-parser will no longer be supported.</li>
+<li>future releases of tbp-parser will only support outputs generated by TBProfiler v6.0.0 and above.</li>
+</ul>
+<p><strong>Versions of TBProfiler prior to v6.0.0 are not compatible with v2+ of tbp-parser.</strong> Please ensure that you are using the correct version of tbp-parser for your version of TBProfiler.</p>
+</div>
 <h2 id="overview">Overview<a class="headerlink" href="#overview" title="Permanent link">&para;</a></h2>
 <p><code>tbp-parser</code> is a tool developed in partnership with the California Department of Health (CDPH) to parse the output of <a href="https://github.com/jodyphelan/TBProfiler">Jody Phelan’s TBProfiler tool</a> into four additional files:</p>
 <ol>
@@ -1011,7 +1021,7 @@ <h2 id="overview">Overview<a class="headerlink" href="#overview" title="Permanen
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/inputs/inputs/index.html b/v2.2.1/inputs/inputs/index.html
index 4cf70a0..49ae6d1 100644
--- a/v2.2.1/inputs/inputs/index.html
+++ b/v2.2.1/inputs/inputs/index.html
@@ -1042,7 +1042,7 @@ <h1>Command-line Arguments</h1>
 
 <p>The inputs on this page reflect the parameters that are applicable for the command-line tool. To see the inputs required for <code>tbp-parser</code> when run as part of the TheiaProk workflow series, please refer to the <a href="../theiaprok/">TheiaProk Inputs</a> page.</p>
 <h2 id="required-inputs">Required Inputs<a class="headerlink" href="#required-inputs" title="Permanent link">&para;</a></h2>
-<p><code>tbp-parser</code> is designed to run immediately after <a href="https://github.com/jodyphelan/TBProfiler">Jody Phelan’s TB-Profiler tool</a>. Only two inputs are required: the JSON file produced by <code>TB-Profiler</code> and the BAM file produced by <code>TB-Profiler</code>.</p>
+<p><code>tbp-parser</code> is designed to run immediately after <a href="https://github.com/jodyphelan/TBProfiler">Jody Phelan’s TBProfiler tool</a>. Only two inputs are required: the JSON file produced by <code>TBProfiler</code> and the BAM file produced by <code>TBProfiler</code>.</p>
 <p>The JSON file contains information about the mutations detected in the sample: the quality, the type, and if that mutation confers resistance to an antimicrobial drug. The BAM file contains the alignment information for the sample and is needed for determining sequencing quality. </p>
 <table>
 <thead>
@@ -1054,16 +1054,16 @@ <h2 id="required-inputs">Required Inputs<a class="headerlink" href="#required-in
 <tbody>
 <tr>
 <td style="text-align: left;">input_json</td>
-<td style="text-align: left;">The path to the JSON file that was produced by <code>TB-Profiler</code></td>
+<td style="text-align: left;">The path to the JSON file that was produced by <code>TBProfiler</code></td>
 </tr>
 <tr>
 <td style="text-align: left;">input_bam</td>
-<td style="text-align: left;">The path to the BAM file that was produced by <code>TB-Profiler</code></td>
+<td style="text-align: left;">The path to the BAM file that was produced by <code>TBProfiler</code></td>
 </tr>
 </tbody>
 </table>
 <div class="admonition info">
-<p class="admonition-title">Info</p>
+<p class="admonition-title">BAM index file required</p>
 <p>The BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a <code>.bai</code> suffix.</p>
 </div>
 <h2 id="optional-inputs">Optional Inputs<a class="headerlink" href="#optional-inputs" title="Permanent link">&para;</a></h2>
@@ -1109,7 +1109,7 @@ <h3 id="quality-control-arguments">Quality Control Arguments<a class="headerlink
 <td style="text-align: left;">-r</td>
 <td style="text-align: left;">--coverage_regions</td>
 <td style="text-align: left;">A BED file containing the regions to calculate percent coverage for</td>
-<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-modified-regions.bed">/data/tbdb-modified-regions.md</a></td>
+<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed">/data/tbdb-modified-regions.bed</a></td>
 </tr>
 </tbody>
 </table>
@@ -1164,7 +1164,7 @@ <h3 id="lims-arguments">LIMS Arguments<a class="headerlink" href="#lims-argument
 </tbody>
 </table>
 <h3 id="tngs-specific-arguments">tNGS-specific Arguments<a class="headerlink" href="#tngs-specific-arguments" title="Permanent link">&para;</a></h3>
-<p>These options are primarily used for tNGS data, although all frequency arguments are compatible with WGS data.</p>
+<p>These options are primarily used for tNGS data, although all frequency and read support arguments are compatible with WGS data.</p>
 <table>
 <thead>
 <tr>
@@ -1182,7 +1182,7 @@ <h3 id="tngs-specific-arguments">tNGS-specific Arguments<a class="headerlink" hr
 <tr>
 <td style="text-align: left;">--tngs_expert_regions</td>
 <td style="text-align: left;">A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes</td>
-<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-expert-regions.bed">/data/tbdb-expert-regions.bed</a></td>
+<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-expert-regions.bed">/data/tbdb-expert-regions.bed</a></td>
 </tr>
 <tr>
 <td style="text-align: left;">--rrs_frequency</td>
@@ -1217,11 +1217,28 @@ <h3 id="tngs-specific-arguments">tNGS-specific Arguments<a class="headerlink" hr
 </tbody>
 </table>
 <h3 id="logging-arguments">Logging Arguments<a class="headerlink" href="#logging-arguments" title="Permanent link">&para;</a></h3>
-<p>These options change the verbosity of the <code>stdout</code> log
-| Name | Description | Default Value |
-| :--- | :---------- | :------------ |
-| --verbose | Increases the output verbosity to describe which stage of the analysis is currently running | false |
-| --debug | The highest level of output verbosity detailing every step of the analysis and logic implemented; overwrites --verbose | false |</p>
+<p>These options change the verbosity of the <code>stdout</code> log.</p>
+<table>
+<thead>
+<tr>
+<th style="text-align: left;">Name</th>
+<th style="text-align: left;">Description</th>
+<th style="text-align: left;">Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td style="text-align: left;">--verbose</td>
+<td style="text-align: left;">Increases the output verbosity to describe which stage of the analysis is currently running</td>
+<td style="text-align: left;">false</td>
+</tr>
+<tr>
+<td style="text-align: left;">--debug</td>
+<td style="text-align: left;">The highest level of output verbosity detailing every step of the analysis and logic implemented; overwrites --verbose</td>
+<td style="text-align: left;">false</td>
+</tr>
+</tbody>
+</table>
 
 
 
@@ -1244,7 +1261,7 @@ <h3 id="logging-arguments">Logging Arguments<a class="headerlink" href="#logging
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/inputs/theiaprok/index.html b/v2.2.1/inputs/theiaprok/index.html
index 447a4b7..cda7819 100644
--- a/v2.2.1/inputs/theiaprok/index.html
+++ b/v2.2.1/inputs/theiaprok/index.html
@@ -989,11 +989,11 @@
 
   <h1>TheiaProk Inputs on Terra</h1>
 
-<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (<a href="https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566?pvs=4">find documentation for TheiaProk here</a>) on <a href="https://terra.bio">Terra.bio</a>, an optional input must be activated to instruct TheiaProk to run <code>tbp-parser</code>.</p>
+<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (<a href="https://theiagen.github.io/public_health_bioinformatics/latest/workflows/genomic_characterization/theiaprok/">find documentation for TheiaProk here</a>) on <a href="https://terra.bio">Terra.bio</a>, an optional input must be activated to instruct TheiaProk to run <code>tbp-parser</code>.</p>
 <p><code>tbp-parser</code> is not on by default due to the nature of this tool and its outputs.</p>
 <div class="admonition info annotate">
 <p class="admonition-title">TheiaProk Version</p>
-<p>This information only corresponds to <abbr title="Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.">PHB</abbr> v2.2.0. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>
+<p>This information only corresponds to the upcoming <abbr title="Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.">PHB</abbr> v2.3.0 release. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>
 </div>
 <h2 id="required-inputs">Required Inputs<a class="headerlink" href="#required-inputs" title="Permanent link">&para;</a></h2>
 <p>To activate <code>tbp-parser</code> you must set the following variable to true:</p>
@@ -1003,17 +1003,17 @@ <h2 id="required-inputs">Required Inputs<a class="headerlink" href="#required-in
 <th style="text-align: left;">Terra Task name</th>
 <th style="text-align: left;">Variable</th>
 <th style="text-align: left;">Type</th>
-<th style="text-align: left;">Default value</th>
 <th style="text-align: left;">Description</th>
+<th style="text-align: left;">Default Value</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbprofiler_additional_outputs</code></td>
+<td style="text-align: left;"><strong>call_tbp_parser</strong></td>
 <td style="text-align: left;">Boolean</td>
-<td style="text-align: left;"><code>false</code></td>
 <td style="text-align: left;">Set to <code>true</code> to activate <code>tbp-parser</code></td>
+<td style="text-align: left;"><code>false</code></td>
 </tr>
 </tbody>
 </table>
@@ -1025,80 +1025,136 @@ <h2 id="optional-inputs">Optional Inputs<a class="headerlink" href="#optional-in
 <th style="text-align: left;">Terra Task name</th>
 <th style="text-align: left;">Variable</th>
 <th style="text-align: left;">Type</th>
-<th style="text-align: left;">Default value</th>
 <th style="text-align: left;">Description</th>
+<th style="text-align: left;">Default Value</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_output_seq_method_type</code></td>
-<td style="text-align: left;">String</td>
-<td style="text-align: left;">"WGS"</td>
-<td style="text-align: left;">Fills out the “seq_method” field in the tbp_parser output files</td>
+<td style="text-align: left;"><strong>tbp_parser_add_cs_lims</strong></td>
+<td style="text-align: left;">Boolean</td>
+<td style="text-align: left;">Set to <code>true</code> to add Cycloserine (CS) fields to the LIMS report</td>
+<td style="text-align: left;"><code>false</code></td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_coverage_regions_bed</strong></td>
+<td style="text-align: left;">File</td>
+<td style="text-align: left;">A BED file containing the regions to calculate percent coverage for</td>
+<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed">tbdb-modified-regions.bed</a></td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_coverage_threshold</strong></td>
+<td style="text-align: left;">Int</td>
+<td style="text-align: left;">The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC)</td>
+<td style="text-align: left;">100</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_operator</code></td>
+<td style="text-align: left;"><strong>tbp_parser_debug</strong></td>
+<td style="text-align: left;">Boolean</td>
+<td style="text-align: left;">Set to <code>false</code> to turn off debug mode for <code>tbp-parser</code></td>
+<td style="text-align: left;"><code>true</code></td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_docker_image</strong></td>
 <td style="text-align: left;">String</td>
-<td style="text-align: left;">"Operator not provided"</td>
-<td style="text-align: left;">The operator who ran the analysis; used in the LIMS &amp; Looker reports</td>
+<td style="text-align: left;">The Docker image to use when running <code>tbp-parser</code></td>
+<td style="text-align: left;">"us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0"</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_min_depth</code></td>
+<td style="text-align: left;"><strong>tbp_parser_etha237_frequency</strong></td>
+<td style="text-align: left;">Float</td>
+<td style="text-align: left;">Minimum frequency for a mutation in ethA at protein position 237 to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">0.1</td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_expert_rule_regions_bed</strong></td>
+<td style="text-align: left;">File</td>
+<td style="text-align: left;">A file that contains the regions where R mutations and expert rules are applied</td>
+<td style="text-align: left;"></td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_min_depth</strong></td>
 <td style="text-align: left;">Int</td>
+<td style="text-align: left;">Minimum depth for a variant to pass QC in <code>tbp-parser</code></td>
 <td style="text-align: left;">10</td>
-<td style="text-align: left;">The minimum depth of coverage required for a site to pass QC</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_min_frequency</code></td>
+<td style="text-align: left;"><strong>tbp_parser_min_frequency</strong></td>
 <td style="text-align: left;">Int</td>
+<td style="text-align: left;">The minimum frequency for a mutation to pass QC</td>
 <td style="text-align: left;">0.1</td>
-<td style="text-align: left;">The minimum frequency for a mutation to pass QC (0.1 -&gt; 10%)</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_min_read_support</code></td>
+<td style="text-align: left;"><strong>tbp_parser_min_read_support</strong></td>
 <td style="text-align: left;">Int</td>
-<td style="text-align: left;">10</td>
 <td style="text-align: left;">The minimum read support for a mutation to pass QC</td>
+<td style="text-align: left;">10</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_coverage_threshold</code></td>
-<td style="text-align: left;">Int</td>
-<td style="text-align: left;">100</td>
-<td style="text-align: left;">The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC)</td>
+<td style="text-align: left;"><strong>tbp_parser_operator</strong></td>
+<td style="text-align: left;">String</td>
+<td style="text-align: left;">Fills the "operator" field in the <code>tbp-parser</code> output files</td>
+<td style="text-align: left;">"Operator not provided"</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_coverage_regions_bed</code></td>
-<td style="text-align: left;">File</td>
-<td style="text-align: left;"><a href="https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-modified-regions.bed">tbdb-modified-regions.md</a></td>
-<td style="text-align: left;">A BED file containing the regions to calculate percent coverage for</td>
+<td style="text-align: left;"><strong>tbp_parser_output_seq_method_type</strong></td>
+<td style="text-align: left;">String</td>
+<td style="text-align: left;">Fills out the "seq_method" field in the <code>tbp-parser</code> output files</td>
+<td style="text-align: left;">"Sequencing method not provided"</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_debug</code></td>
-<td style="text-align: left;">Boolean</td>
-<td style="text-align: left;">false</td>
-<td style="text-align: left;">Turn on debug mode for tbp-parser</td>
+<td style="text-align: left;"><strong>tbp_parser_rpob449_frequency</strong></td>
+<td style="text-align: left;">Float</td>
+<td style="text-align: left;">Minimum frequency for a mutation in rpoB at protein position 449 to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">0.1</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_add_cs_lims</code></td>
-<td style="text-align: left;">Boolean</td>
-<td style="text-align: left;">false</td>
-<td style="text-align: left;">Adds Cycloserine (CS) fields to the LIMS report</td>
+<td style="text-align: left;"><strong>tbp_parser_rrl_frequency</strong></td>
+<td style="text-align: left;">Float</td>
+<td style="text-align: left;">Minimum frequency for a mutation in rrl to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">0.1</td>
 </tr>
 <tr>
 <td style="text-align: left;"><code>merlin_magic</code></td>
-<td style="text-align: left;"><code>tbp_parser_docker_image</code></td>
-<td style="text-align: left;">String</td>
-<td style="text-align: left;">"us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0"</td>
-<td style="text-align: left;">The Docker image to use when running tbp-parser</td>
+<td style="text-align: left;"><strong>tbp_parser_rrl_read_support</strong></td>
+<td style="text-align: left;">Int</td>
+<td style="text-align: left;">Minimum read support for a mutation in rrl to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">10</td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_rrs_frequency</strong></td>
+<td style="text-align: left;">Float</td>
+<td style="text-align: left;">Minimum frequency for a mutation in rrs to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">0.1</td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_rrs_read_support</strong></td>
+<td style="text-align: left;">Int</td>
+<td style="text-align: left;">Minimum read support for a mutation in rrs to pass QC in <code>tbp-parser</code></td>
+<td style="text-align: left;">10</td>
+</tr>
+<tr>
+<td style="text-align: left;"><code>merlin_magic</code></td>
+<td style="text-align: left;"><strong>tbp_parser_tngs_data</strong></td>
+<td style="text-align: left;">Boolean</td>
+<td style="text-align: left;">Set to <code>true</code> to enable tNGS-specific parameters and runs in <code>tbp-parser</code></td>
+<td style="text-align: left;"><code>false</code></td>
 </tr>
 </tbody>
 </table>
@@ -1125,7 +1181,7 @@ <h2 id="optional-inputs">Optional Inputs<a class="headerlink" href="#optional-in
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/outputs/coverage/index.html b/v2.2.1/outputs/coverage/index.html
index 9911b57..69c4e1b 100644
--- a/v2.2.1/outputs/coverage/index.html
+++ b/v2.2.1/outputs/coverage/index.html
@@ -1043,8 +1043,8 @@ <h2 id="tngs-specific-information">tNGS-specific information<a class="headerlink
 </tr>
 </tbody>
 </table>
-<p>Coverage regions are determined with either the default "../data/tbdb-modified-regions.bed" (collected on Sep 1, 2023 from the TBProfiler repository, or if <code>--tngs</code>, "../data/tngs-reportable-regions.bed".</p>
-<p>The R-expert rule region is determined only if <code>--tngs</code> is indicated and uses the ranges in "../data/tngs-expert-rule-regions.bed".</p>
+<p>Coverage regions are determined with either the default <a href="https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed">/data/tbdb-modified-regions.bed</a> (collected on Sep 1, 2023 from the TBProfiler repository) or, if <code>--tngs</code> is set, <a href="https://github.com/theiagen/tbp-parser/blob/main/data/tngs-reportable-regions.bed">/data/tngs-reportable-regions.bed</a>.</p>
+<p>The R-expert rule region is determined only if <code>--tngs</code> is indicated and uses the ranges in <a href="https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-expert-regions.bed">/data/tbdb-expert-regions.bed</a>.</p>
 
 
 
@@ -1067,7 +1067,7 @@ <h2 id="tngs-specific-information">tNGS-specific information<a class="headerlink
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/outputs/looker/index.html b/v2.2.1/outputs/looker/index.html
index f2f2fdc..947dbc9 100644
--- a/v2.2.1/outputs/looker/index.html
+++ b/v2.2.1/outputs/looker/index.html
@@ -1056,7 +1056,7 @@ <h3 id="explanation-of-column-headers">Explanation of column headers<a class="he
 </tr>
 <tr>
 <td>lineage</td>
-<td>The lineage of the sample (the <code>main_lin</code> field as reported by TB-Profiler); for example, lineage1.2.1.2.1</td>
+<td>The lineage of the sample (the <code>main_lin</code> field as reported by TBProfiler); for example, lineage1.2.1.2.1</td>
 </tr>
 <tr>
 <td>ID</td>
@@ -1095,7 +1095,7 @@ <h3 id="explanation-of-column-headers">Explanation of column headers<a class="he
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/outputs/theiaprok/index.html b/v2.2.1/outputs/theiaprok/index.html
index 92f8358..bc5d5dd 100644
--- a/v2.2.1/outputs/theiaprok/index.html
+++ b/v2.2.1/outputs/theiaprok/index.html
@@ -938,10 +938,10 @@
 
   <h1>TheiaProk Outputs on Terra</h1>
 
-<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (<a href="https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566?pvs=4">find documentation for TheiaProk here</a>) on <a href="https://terra.bio">Terra.bio</a>, you will find the following outputs in your data table.</p>
+<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (<a href="https://theiagen.github.io/public_health_bioinformatics/latest/workflows/genomic_characterization/theiaprok/">find documentation for TheiaProk here</a>) on <a href="https://terra.bio">Terra.bio</a>, you will find the following outputs in your data table.</p>
 <div class="admonition info annotate">
 <p class="admonition-title">TheiaProk Version</p>
-<p>This information only corresponds to <abbr title="Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.">PHB</abbr> v2.2.0. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>
+<p>This information only corresponds to the upcoming <abbr title="Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.">PHB</abbr> v2.3.0 release. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>
 </div>
 <table>
 <thead>
@@ -1017,7 +1017,7 @@ <h1>TheiaProk Outputs on Terra</h1>
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/search/search_index.json b/v2.2.1/search/search_index.json
index f68fa97..a39c710 100644
--- a/v2.2.1/search/search_index.json
+++ b/v2.2.1/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"tbp-parser","text":"<p>Not for Diagnostic Use</p> <p>CAUTION: The information produced by this program should not be used for clinical reporting unless and until extensive validation has occured in your laboratory on a stable version. Otherwise, the outputs of tbp-parser are for research use only.</p>"},{"location":"#overview","title":"Overview","text":"<p><code>tbp-parser</code> is a tool developed in partnership with the California Department of Health (CDPH) to parse the output of Jody Phelan\u2019s TBProfiler tool into four additional files:</p> <ol> <li>A Laboratorian report, which contains information about each mutation detected and its associated drug resistance profile in a CSV file.</li> <li>A LIMS report, formatted specifically for CDPH\u2019s STAR LIMS, which summarizes the highest severity mutations for each antimicrobial drug and the relevant mutations.</li> <li>A Looker report, which condenses the information contained in the Laboratorian report into a format suitable for generating a dashboard in Google\u2019s Looker Studio.</li> <li>A coverage report, which contains the percent coverage of each gene relative to the H37Rv reference genome in addition to any warnings, such as any deletions identified in the gene that might have contributed to a reduced percent coverage</li> </ol> <p>Please reach out to us at support@theiagen.com if you would like any custom file formats and/or changes to these output files that suit your individual needs.</p>"},{"location":"usage/","title":"Getting Started","text":""},{"location":"usage/#installation","title":"Installation","text":""},{"location":"usage/#docker","title":"Docker","text":"<p>We highly recommend using the following Docker iamge to run tbp-parser:</p> <pre><code>docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0 #(1)!\n</code></pre> <ol> <li>We host our Docker images on 
the Google Artifact Registry so that they are always availble for usage.</li> </ol> <p>The entrypoint for this Docker iamge is the <code>tbp-parser</code> help message. To run this container interactively, use the following command:</p> <pre><code>docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0\n# Once inside the container interactively, you can run the tbp-parser tool\npython3 /tbp-parser/tbp_parser/tbp_parser.py -v\n# v1.6.0\n</code></pre>"},{"location":"usage/#locally-with-python","title":"Locally with Python","text":"<p><code>tbp-parser</code> is not yet available with <code>pip</code> or <code>conda</code>. To run <code>tbp-parser</code> in your local command-line environment, install the following dependencies:</p> <ul> <li>python3</li> <li>pandas &gt;= 1.4.2</li> <li>importlib_resources</li> <li>samtools</li> </ul> <p>After installation of these dependencies, download and extract the latest release of <code>tbp-parser</code> and run the script with <code>Python</code>.</p>"},{"location":"usage/#usage","title":"Usage","text":""},{"location":"usage/#example-usage","title":"Example Usage","text":"<p>This shows how the script can be run if used inside the Docker container provided above.</p> <pre><code>python3 /tbp-parser/tbp_parser/tbp_parser.py \\\n    /path/to/data/tbprofiler_output.json \\\n    /path/to/data/tbprofiler_output.bam \\\n    -o \"example\" \\\n    --min_depth 12 \\\n    --min_frequency 0.9 \\\n    --sequencing_method \"Illumina NextSeq\" \\\n    --operator \"John Doe\" \n</code></pre> <p>Please note that the BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a .bai suffix.</p>"},{"location":"usage/#help-message","title":"Help Message","text":"<p>The help message printed by <code>tbp-parser</code> is quite extensive, but has a lot of useful information regarding the input parameters. 
Here is the entire message in full. You can find more information regarding these inputs in the Inputs section.</p> <pre><code>usage: python3 /tbp-parser/tbp_parser/tbp_parser.py [-h|-v] &lt;input_json&gt; &lt;input_bam&gt; [&lt;args&gt;]\n\nParses Jody Phelon's TB-Profiler JSON output into four files:\n- a Laboratorian report,\n- a LIMS report\n- a Looker report, and\n- a coverage report\n\npositional arguments:\n  input_json\n          the JSON file produced by TBProfiler\n  input_bam\n          the BAM file produced by TBProfiler\n\noptional arguments:\n  -h, --help\n          show this help message and exit\n  -v, --version\n          show program's version number and exit\n\nquality control arguments:\n  options that determine what passes QC\n\n  -d, --min_depth\n          the minimum depth of coverage for a site to pass QC\n          default=10\n  -c, --min_percent_coverage\n          the minimum percentage of a region that has depth above the threshold set by min_depth\n            (used for a gene/locus to pass QC)\n          default=100\n  -s, --min_read_support\n          the minimum read support for a mutation to pass QC\n          default=10\n  -f, --min_frequency\n          the minimum frequency for a mutation to pass QC (0.1 -&gt; 10%)\n          default=0.1\n  -r, --coverage_regions\n          the BED file containing the regions to calculate percent coverage for\n          default=data/tbdb-modified-regions.bed\n\ntext arguments:\n  arguments that are used verbatim in the reports or to name the output files\n\n  -m, --sequencing_method\n          the sequencing method used to generate the data; used in the LIMS &amp; Looker reports\n          ** Enclose in quotes if includes a space\n          default=\"Sequencing method not provided\"\n  -p, --operator\n          the operator who ran the sequencing; used in the LIMS &amp; Looker reports\n          ** Enclose in quotes if includes a space\n          default=\"Operator not provided\"\n  -o, 
--output_prefix\n          the output file name prefix\n          ** Do not include any spaces\n\ntNGS-specific arguments:\n  options that are primarily used for tNGS data\n  (all frequency arguments are compatible with WGS data)\n\n  --tngs\n          indicates that the input data was generated using Deeplex + CDPH modified protocol\n          Turns on tNGS-specific global parameters\n  --tngs_expert_regions\n          the BED file containing the regions to calculate coverage for expert rule regions\n            (used to determine coverage quality in the regions where resistance-conferring\n            mutations are found, or where a CDC expert rule is applied; not for QC)\n          default=data/tngs-expert-rule-regions.bed\n  --rrs_frequency\n          the minimum frequency for an rrs mutation to pass QC\n            (rrs has several problematic sites in the Deeplex tNGS assay)\n          default=0.1\n  --rrl_frequency\n          the minimum frequency for an rrl mutation to pass QC\n            (rrl has several problematic sites in the Deeplex tNGS assay)\n          default=0.1\n  --rpob449_frequency\n          the minimum frequency for an rpoB mutation at protein position 449 to pass QC\n            (this is a problematic site in the Deeplex tNGS assay)\n          default=0.1\n  --etha237_frequency\n          the minimum frequency for an ethA mutation at protein position 237 to pass QC\n            (this is a problematic site in the Deeplex tNGS assay)\n          default=0.1\n\nlogging arguments:\n  options that change the verbosity of the stdout log\n\n  --verbose\n          increase output verbosity\n  --debug\n          increase output verbosity to debug; overwrites --verbose\n\nPlease contact support@theiagen.com with any questions\n</code></pre>"},{"location":"algorithm/","title":"Algorithm Overview","text":"<p>The algorithm of <code>tbp-parser</code> was developed with extensive guidance from the California Department of Health (CDPH). 
</p> <p>Find the following information in this section:</p> <ul> <li>Detailed information on how the algorithm works in the Technical Code Breakdown page.</li> <li>Information on how interpretations are determined in the Interpretation Document page.</li> </ul>"},{"location":"algorithm/interpretation/","title":"The Interpretation Document","text":"<p>Resistance calls are made in either one of two ways. The first is using the WHO annotation, which is output directly from the TBProfiler. The WHO has a catalogue of mutations and how they may confer antimicrobial resistance. If this annotation is present, it will always be used.</p> <p>In the case where the WHO annotation is missing, either due to novel mutations or mutations with unclear significance in the literature, <code>tbp-parser</code> will apply expert rules. These expert rules are additional conditions used to decide if a mutation is considered to confer resistance or not. These expert rules come from the CDC and can be found documented in the <code>tbp-parser</code> GitHub repository inside the interpretation logic PDFs.</p> <p>When an expert rule is applied, the <code>rationale</code> field of the laboratorian report will indicate which expert rule was used (the number prefacing the rule directly correlates to the appropriate section in the interpretation logic PDF) and indicate that there was no WHO annotation.</p> <p>The interpretation documents for v1.2.2 and v1.4.4.8 are available in the root directory of the <code>tbp-parser</code> repository. 
Versions that correspond to different releases are available in the interpretation_docs directory on GitHub.</p>"},{"location":"algorithm/technical/","title":"Technical Code Breakdown","text":"<p><code>tbp-parser</code> is object-oriented, with each class representing either an output file, a part of an output file, or a part of the input JSON file produced by TBProfiler.</p> <p>The first class that is invoked by the <code>tbp-parser.py</code> script is <code>Parser</code> which is a control class that orchestrates the creation of the different output reports. </p>"},{"location":"algorithm/technical/#calculating-percent-gene-coverage","title":"Calculating percent gene coverage","text":"<p>Before creating any reports, <code>Parser</code> calls the <code>Coverage</code> class to calculate the percent gene coverage over a specified minimum depth (default: 10) for the coding regions of all genes included in the TBDB (the database used in TBProfiler to generate the drug resistance annotations). This requires as input the BAM and BAI files produced by TBProfiler during alignment to the H37Rv reference genome. The percent gene coverage results are then stored in a global dictionary that is accessed multiple times for QC purposes during the creation of the final reports.</p>"},{"location":"algorithm/technical/#creating-the-laboratorian-report","title":"Creating the Laboratorian report","text":"<p>Then, <code>Parser</code> creates the Laboratorian report using the <code>Laboratorian</code> class and its associated <code>.create_laboratorian_report()</code> method.</p> <p>The <code>Laboratorian</code> class uses the input JSON file to collect the necessary information. The structure of the input JSON file is a good place to start the breakdown:</p> example_input.json<pre><code>{\n  \"id\": \"sample01\",\n  ...\n  \"main_lin\": \"lineage1\",\n  \"sublin\": \"lineage1.2.1.2.1\",\n  \"dr_variants\": [ ... 
],\n  \"other_variants\": [ ...],\n  ...\n}\n</code></pre> <p>In this example, we can see only the relevant top-level JSON fields that are used in <code>tbp-parser</code>.</p> <p>Of interest, the <code>\"id\"</code> column is used to set the global <code>SAMPLE_NAME</code> variable.</p> <p>The lineage information, found in <code>\"main_lin\"</code> and <code>\"sublin\"</code> are used in the LIMS and Looker reports, so we won\u2019t go into detail about them here.</p> <p>The variant information is what makes up the bulk of the Laboratorian report and can be found in the <code>\"dr_variants\"</code> and <code>\"other_variants\"</code> fields. We\u2019ll talk more about these fields later.</p> <p>There are many other fields that are omitted from this example since they are not used in <code>tbp-parser</code>, such as version information and overall sample drug resistance type (like RR-TB, etc.). These fields are found in the other TBProfiler output files in more human-readable formats.</p> <p>Within the input JSON file, there are two fields that are examined the most: <code>\"dr_variants\"</code> and <code>\"other_variants\"</code>. These fields are treated the same, and have the same format, although different mutations are found in both regions. The difference between the two fields is unclear to me at this time. 
In the example below, only the fields used in <code>tbp-parser</code> are shown.</p> example_input.json<pre><code>{\n  ...\n  \"dr_variants\": [\n    {\n      \"chrom\": \"Chromosome\",\n      \"genome_pos\": 761109,\n      ...\n      \"depth\": 130,\n      \"freq\": 1,\n      \"type\": \"missense_variant\",\n      \"nucleotide_change\": \"c.1303G&gt;T\",\n      \"protein_change\": \"p.Asp435Tyr\",\n      \"annotation\": [\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"rifampicin\",\n          \"who_confidence\": \"Assoc w R\"\n        }\n      ],\n      \"alternate_consequences\": [],\n      ...\n      \"locus_tag\": \"Rv0667\",\n      \"gene\": \"rpoB\",\n      \"gene_associated_drugs\": [\n        \"rifampicin\"\n      ]\n    }\n  ],\n  \"other_variants\": [\n    {\n      \"chrom\": \"Chromosome\",\n      \"genome_pos\": 6112,\n      ...\n      \"depth\": 105,\n      \"freq\": 1,\n      \"type\": \"missense_variant\",\n      \"nucleotide_change\": \"c.873G&gt;C\",\n      \"protein_change\": \"p.Met291Ile\",\n      \"annotation\": [\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"moxifloxacin\",\n          \"who_confidence\": \"Not assoc w R\"\n        },\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"levofloxacin\",\n          \"who_confidence\": \"Not assoc w R\"\n        }\n      ],\n      \"alternate_consequences\": [],\n      ...\n      \"locus_tag\": \"Rv0005\",\n      \"gene\": \"gyrB\",\n      \"gene_associated_drugs\": [\n        \"levofloxacin\",\n        \"ofloxacin\",\n        \"moxifloxacin\",\n        \"fluoroquinolones\",\n        \"ciprofloxacin\"\n      ]\n    },\n...\n}\n</code></pre> <p>After the global <code>SAMPLENAME</code> variable is set, the <code>Laboratorian</code> class calls the <code>.iterate_section()</code> method, starting with the <code>\"dr_variants\"</code> field.</p> <p>Since the contents of each variant section in the JSON dictionary are 
considered a list, we start to iterate through each list item, which consists of each section within curly brackets <code>{...}</code>. In the example to the left, I\u2019ve only included 1 item in each list. </p> <p>Immediately, each item in the list is converted into a <code>Variant</code> class object, and every item in each list item (the <code>\"chrom\"</code>, <code>\"genome_pos\"</code>, <code>\"locus_tag\"</code>, etc.) is converted to a class attribute. This is because each item in the list represents a single mutation or a single variant. I\u2019ll now refer to each variant section item as a <code>Variant</code>.</p> <p>Each new <code>Variant</code> object has the <code>.extract_annotations()</code> method called. This method starts by iterating through the <code>\"annotation\"</code> field in the input JSON. The annotation field can contain multiple different annotations, so we look at each one individually.</p> <p>Each annotation is turned into a <code>Row</code> object, which represents a row in the Laboratorian report. During the initiation of the <code>Row</code> object, each column in the Laboratorian report is created based on both the annotation field and the originating <code>Variant</code> object. Additionally, a warning field is created based on both the global dictionary created with the <code>Coverage</code> class and the mutation\u2019s <code>\"depth\"</code> and <code>\"freq\"</code> fields.</p> <p>Sometimes multiple annotations for the same drug can appear for a single <code>Variant</code>. If this is the case, only the most severe annotation is saved (that is, an annotation that indicates resistance is kept instead of one that indicates susceptibility).</p> <p>After the annotation field has been iterated through, we then check the <code>\"gene_associated_drugs\"</code> field to make sure that we create a <code>Row</code> for each antimicrobial drug that is associated with the gene. 
As you can see in the <code>\"other_variants\"</code> section, the annotation field for the variant only lists annotations for moxifloxacin and levofloxacin, but the gene is associated with three other antimicrobial drugs. This iteration creates additional <code>Row</code> objects for those antimicrobial drugs.</p> <p>This means that each mutation will potentially appear several times in the final report, once for every antimicrobial associated with the drug. This is because sometimes a mutation confers a different resistance level to one drug, but not another.</p> <p>After <code>Row</code> objects are created for each <code>Variant</code> in the variant section, every <code>Row</code> has the <code>.complete_row()</code> method called, which adds the interpretation columns to the object. Two interpretation columns are created, <code>mdl_interpretation</code> and <code>looker_interpretation</code>.</p> <p>Please note that these interpretation columns are typically identical, but in several cases, the <code>mdl_interpretation</code> column will call a variant-drug combination as \u201csusceptible\u201d (S), while the <code>looker_interpretation</code> column will call the same combination \u201cuncertain\u201d (U).</p> <p>In the case where a WHO annotation was not identified, the <code>Variant</code> class\u2019 <code>.apply_expert_rules()</code> method is called. This function applies expert rules that are listed in detail on the <code>tbp-parser</code> GitHub repository, available here.</p> <p>The expert rules assign a drug resistance call to the variant-drug combination only when there is no WHO annotation and will fill the <code>mdl_interpretation</code> and <code>looker_interpretation</code> fields.</p> <p>If the mutation is in either mmpS5, mmpL5, or mmpR5/Rv0678, then the <code>\"alternate_consequences\"</code> field is iterated through. 
This field typically lists the same mutation but in reference to a different gene; for instance, if a mutation is in the upstream non-coding region of one gene, it may be in the coding region of a different gene.</p> <p>Then, any genes that do not have any variants are added to the laboratorian report with various \u201cNA\u201d or \u201cWT\u201d values filling the appropriate fields.</p> <p>This means that every gene in the TBDB appears in the Laboratorian report regardless if any mutations were identified in that gene.</p> <p>Finally, a few more quality control measures are taken and then all of the individual <code>Row</code> objects are written to a CSV file, which concludes the creation of the laboratorian report.</p>"},{"location":"algorithm/technical/#creating-the-looker-report","title":"Creating the Looker report","text":"<p>The <code>Parser</code> class then creates a <code>Looker</code> object which uses the <code>.create_looker_report()</code> method. The Looker report uses the Laboratorian report to generate most of the included information.</p> <p>It starts by iterating through a list of antimicrobial drugs and extracting all of the <code>looker_interpretation</code> values for each row in the report with that antimicrobial drug. It then identifies the highest resistance rating (R &gt; R-Interim &gt; U &gt; S-Interim &gt; S) for all resistance annotations for a drug.</p> <p>Then, a quality check is performed and if a particular gene fails coverage that contributed to the highest resistance rating, an insufficient coverage warning is given.</p> <p>The <code>\"main_lin\"</code> and <code>\"sublin\"</code> fields from the input JSON file are used to fill the <code>ID</code> field in the report. 
These fields are converted into shortened English without any technical lineage information.</p> <p>Finally, the information is written to a CSV file which concludes the creation of the Looker report.</p>"},{"location":"algorithm/technical/#creating-the-lims-report","title":"Creating the LIMS report","text":"<p>The <code>Parser</code> class then creates <code>LIMS</code> object which uses the <code>.create_lims_report()</code> method. The LIMS report also uses the Laboratorian report to generate the bulk of the information included.</p> <p>The <code>.create_lims_report()</code> method begins by iterating through each LIMS antimicrobial and gene code (corresponding to the LIMS codes in the CDPH STAR LIMS system). Then, the highest <code>mdl_interpretation</code> value is extracted for each row in the report that is associated with that antimicrobial drug, like in the Looker report. Then, the annotation is converted into a human-readable format (R \u2192 Mutations(s) associated with resistance to {antimicrobial} detected\u201d, etc.).</p> <p>Then, the <code>.apply_lims_rules()</code> function is activated which determines which mutations should be output for the corresponding drug-gene combination. The mutations are then formatted so that they appear in the following format: <code>{nucleotide mutation} ({amino acid mutation, if available})</code> repeated, separated by semicolons.</p> <p>Some specific parsing rules apply to mutations within the rpoB gene, which changes the output language on the LIMS report. 
These rules depend on the position of the mutation in the gene.</p> <p>After the rules are applied and the mutations are collected, the information is written to a CSV file which concludes the creation of the LIMS report.</p>"},{"location":"algorithm/technical/#creating-the-coverage-report","title":"Creating the coverage report","text":"<p>The <code>Parser</code> class then reuses the <code>Coverage</code> object created first and calls the <code>.reformat_coverage()</code> method which adds any warnings, such as any deletion mutations detected for a gene. If a deletion is detected, a warning is useful because it indicates that although the reported coverage is less than 100%, it may be due to that deletion. If the coverage is still 100% and a deletion was identified, the warning will say that the deletion may be upstream.</p> <p>The coverage dictionary and the associated warnings are then written to a CSV file which concludes the creation of the coverage report, and the <code>tbp-parser</code> script.</p>"},{"location":"inputs/inputs/","title":"Command-line Arguments","text":"<p>The inputs on this page reflect the parameters that are applicable for the command-line tool. To see the inputs required for <code>tbp-parser</code> when run as part of the TheiaProk workflow series, please refer to the TheiaProk Inputs page.</p>"},{"location":"inputs/inputs/#required-inputs","title":"Required Inputs","text":"<p><code>tbp-parser</code> is designed to run immediately after Jody Phelan\u2019s TB-Profiler tool. Only two inputs are required: the JSON file produced by <code>TB-Profiler</code> and the BAM file produced by <code>TB-Profiler</code>.</p> <p>The JSON file contains information about the mutations detected in the sample: the quality, the type, and if that mutation confers resistance to an antimicrobial drug. The BAM file contains the alignment information for the sample and is needed for determining sequencing quality. 
</p> Parameter Description input_json The path to the JSON file that was produced by <code>TB-Profiler</code> input_bam The path to the BAM file that was produced by <code>TB-Profiler</code> <p>Info</p> <p>The BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a <code>.bai</code> suffix.</p>"},{"location":"inputs/inputs/#optional-inputs","title":"Optional Inputs","text":"<p><code>tbp-parser</code> can be customized with a number of optional input parameters. These parameters can be used to control the quality control thresholds, the text that appears in the reports, and the names of the output files. The following is a list of all the input parameters that can be used with <code>tbp-parser</code>.</p> <p>In addition to these arguments, <code>tbp-parser</code> also has a <code>-h, --help</code> argument that will out the list of possible arguments and their descriptions and a <code>-v, --version</code> argument that will print out the version of <code>tbp-parser</code> that is installed. 
Both of these commands exit the program after printing their output.</p>"},{"location":"inputs/inputs/#quality-control-arguments","title":"Quality Control Arguments","text":"<p>These options determine the thresholds for quality control.</p> Short Version Long Version Description Default Value -d --min_depth The minimum depth of coverage required for a site to pass QC 10 -c --min_percent_coverage The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC) 100 -s --min_read_support The minimum read support for a mutation to pass QC 10 -f --min_frequency The minimum frequency for a mutation to pass QC (0.1 -&gt; 10%) 0.1 -r --coverage_regions A BED file containing the regions to calculate percent coverage for /data/tbdb-modified-regions.md"},{"location":"inputs/inputs/#text-arguments","title":"Text Arguments","text":"<p>These options are used verbatim in the reports, or are used to name the output files.</p> Short Version Long Version Description Default Value -m --sequencing_method The sequencing method used to gerneate the data; used in the LIMS &amp; Looker reports. Enclose in quotes if including a space \"Sequencing method not provided\" -p --operator The operator who ran the analysis; used in the LIMS &amp; Looker reports. Enclose in quotes if including a space \"Operator not provided\" -o --output_prefix The prefix to use for the output files. 
Do not include any spaces \"tbp-parser\""},{"location":"inputs/inputs/#lims-arguments","title":"LIMS Arguments","text":"<p>These options are used to customize the LIMS report</p> Name Description Default Value --add_cs_lims Adds Cycloserine (CS) fields to the LIMS report false"},{"location":"inputs/inputs/#tngs-specific-arguments","title":"tNGS-specific Arguments","text":"<p>These options are primarily used for tNGS data, although all frequency arguments are compatible with WGS data.</p> Name Description Default Value --tngs Indicates that the input data was generated using the Deeplex + CDPH modified protocol. Turns on tNGS-specific global parameters false --tngs_expert_regions A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes /data/tbdb-expert-regions.bed --rrs_frequency The minimum frequency for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay 0.1 --rrl_frequency The minimum frequency for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay 0.1 --rrs_read_support The minimum read support for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay 10 --rrl_read_support The minimum read support for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay 10 --rpob449_frequency The minimum frequency for an rpoB mutation at protein position 449 to pass QC, as this site is problematic in the Deeplex tNGS assay 0.1 --etha237_frequency The minimum frequency for an ethA mutation at protein position 237 to pass QC, as this site is problematic in the Deeplex tNGS assay 0.1"},{"location":"inputs/inputs/#logging-arguments","title":"Logging Arguments","text":"<p>These options change the verbosity of the <code>stdout</code> 
log | Name | Description | Default Value | | :--- | :---------- | :------------ | | --verbose | Increases the output verbosity to describe which stage of the analysis is currently running | false | | --debug | The highest level of output verbosity detailing every step of the analysis and logic implemented; overwrites --verbose | false |</p>"},{"location":"inputs/theiaprok/","title":"TheiaProk Inputs on Terra","text":"<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (find documentation for TheiaProk here) on Terra.bio, an optional input must be activated to instruct TheiaProk to run <code>tbp-parser</code>.</p> <p><code>tbp-parser</code> is not on by default due to the nature of this tool and its outputs.</p> <p>TheiaProk Version</p> <p>This information only corresponds to PHB v2.2.0. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>"},{"location":"inputs/theiaprok/#required-inputs","title":"Required Inputs","text":"<p>To activate <code>tbp-parser</code> you must set the following variable to true:</p> Terra Task name Variable Type Default value Description <code>merlin_magic</code> <code>tbprofiler_additional_outputs</code> Boolean <code>false</code> Set to <code>true</code> to activate <code>tbp-parser</code>"},{"location":"inputs/theiaprok/#optional-inputs","title":"Optional Inputs","text":"<p>The following optional inputs are also available for user modification on Terra:</p> Terra Task name Variable Type Default value Description <code>merlin_magic</code> <code>tbp_parser_output_seq_method_type</code> String \"WGS\" Fills out the \u201cseq_method\u201d field in the tbp_parser output files <code>merlin_magic</code> <code>tbp_parser_operator</code> String \"Operator not provided\" The operator who ran the analysis; used in the LIMS &amp; Looker reports <code>merlin_magic</code> <code>tbp_parser_min_depth</code> Int 10 The minimum depth of coverage required for a site to pass QC 
<code>merlin_magic</code> <code>tbp_parser_min_frequency</code> Int 0.1 The minimum frequency for a mutation to pass QC (0.1 -&gt; 10%) <code>merlin_magic</code> <code>tbp_parser_min_read_support</code> Int 10 The minimum read support for a mutation to pass QC <code>merlin_magic</code> <code>tbp_parser_coverage_threshold</code> Int 100 The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC) <code>merlin_magic</code> <code>tbp_parser_coverage_regions_bed</code> File tbdb-modified-regions.md A BED file containing the regions to calculate percent coverage for <code>merlin_magic</code> <code>tbp_parser_debug</code> Boolean false Turn on debug mode for tbp-parser <code>merlin_magic</code> <code>tbp_parser_add_cs_lims</code> Boolean false Adds Cycloserine (CS) fields to the LIMS report <code>merlin_magic</code> <code>tbp_parser_docker_image</code> String \"us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0\" The Docker image to use when running tbp-parser <p>Find the outputs for <code>tbp-parser</code> in TheiaProk on Terra here.</p>"},{"location":"outputs/","title":"Output Overview","text":"<p><code>tbp-parser</code> produces four files as outputs. See each individual page for more details on how they are constructed and what they contain:</p> <ul> <li>Laboratorian report</li> <li>LIMS report</li> <li>Looker report</li> <li>Coverage report</li> </ul> <p>The four reports contain a wealth of information. The reports can be ordered from increasing to decreasing verbosity as follows: the laboratorian report, the LIMS report, the Looker report, and the coverage report. The same information is used in all four reports but at differing levels of verbosity.</p> <p>Running <code>tbp-parser</code> as part of TheiaProk on Terra produces additional outputs. 
You can find that information in the TheiaProk Outputs on Terra page.</p>"},{"location":"outputs/coverage/","title":"Coverage Report","text":"<p>The coverage report lists every gene and its percent gene coverage over a minimum depth (default: 10) relative to the H37Rv genome.</p> <p>Please note that user-provided coverage regions always take precedence over default values.</p>"},{"location":"outputs/coverage/#wgs-coverage-report","title":"WGS Coverage Report","text":"Column name Explanation Gene The name of the gene or locus Percent_Coverage The percent of the gene\u2019s coding region that has a read depth over the minimum value (default: 10; user-customizable by altering <code>--min_depth</code>) Warning Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage <p>If run using the TheiaProk workflow series, there will be an additional column that contains only the name of the sample, which is useful when concatenating many reports as it helps differentiate which gene belongs to which sample.</p>"},{"location":"outputs/coverage/#tngs-specific-information","title":"tNGS-specific information","text":"<p>If the <code>--tngs</code> flag is used, the report contains the following fields:</p> Column name Explanation Gene The name of the gene or locus Coverage_Breadth_reportableQC_region The percent of the gene (positions determined by the regions covered by the tNGS Deeplex + CDPH assay primers that are considered reportable by CDPH) that is covered at a depth greater than the <code>--min_depth</code> value QC_Warning Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage Coverage_Breadth_R_expert-rule_region The percent of the regions (positions that could contain any resistance-conferring mutations or require expert-rule application) that is covered at a depth greater than the <code>--min_depth</code> value <p>Coverage regions are determined with either the default 
\"../data/tbdb-modified-regions.bed\" (collected on Sep 1, 2023 from the TBProfiler repository, or if <code>--tngs</code>, \"../data/tngs-reportable-regions.bed\".</p> <p>The R-expert rule region is determined only if <code>--tngs</code> is indicated and uses the ranges in \"../data/tngs-expert-rule-regions.bed\".</p>"},{"location":"outputs/laboratorian/","title":"Laboratorian Report","text":"<p>The laboratorian report is the main report produced by <code>tbp-parser</code> and is used to generate all of the other reports. What follows is an explanation of all the columns in the report.</p>"},{"location":"outputs/laboratorian/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation sample_id The name of the sample tbprofiler_gene_name The name of the gene where the mutation has been identified tbprofiler_locus_tag The locus tag for the mutation that has been identified tbprofiler_variant_substitution_type The type of mutation identified, whether or not it was a frameshift, missense, or synonymous mutation tbprofiler_variant_substitution_nt The mutation in nucleotide format tbprofiler_variant_substitution_aa The mutation in amino acid format, if possible confidence Contains either:- the WHO annotation- an indication that there was no WHO annotation- NA for when there is no mutation antimicrobial The antimicrobial drug that may be affected by this mutation looker_interpretation The drug resistance interpretation intended for the Looker report mdl_interpretation The drug resistance interpretation intended for the LIMS report depth The depth of coverage at the mutation frequency The frequency of the mutation in the reads read_support How many reads support the mutation (depth * frequency) rationale Contains an indication of what was used (the WHO annotation, the specific expert rule used, or neither) to create the two interpretations warning Any potential quality warnings that may indicate lower reliability gene_tier The 
gene tier of the mutation\u2019s gene (Tier 1, Tier 2, or NA) <p>Because of how a particular mutation may contribute resistance to different drugs at the same time, each mutation is listed multiple times, once for each antimicrobial drug that could be affected. In addition, any genes that do not have any mutations are also included in the laboratorian report with NA or WT in the appropriate field. This results in a report with many rows and often, rows with very similar values. However, the laboratorian report contains the \u201ccomplete picture\u201d of the sample and is incredibly useful for understanding the sample\u2019s drug resistance profile.</p>"},{"location":"outputs/lims/","title":"LIMS Report","text":"<p>The LIMS report is intended for direct import into a STAR LIMS system. The columns are in the specific LIMS code format for CDPH, and may not apply to your LIMS system. Please contact us if you need different column headers and we can work with you towards a solution.</p>"},{"location":"outputs/lims/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation MDL sample accession numbers The name of the sample M_DST_A01_ID The lineage of the sample in human-readable language M_DST_B01_INH The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (isoniazid) M_DST_B02_katG Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamideresponsible for the predicted resistance for isoniazid M_DST_B03_fabG1 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for isoniazid M_DST_B04_inhA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for isoniazid M_DST_C01_ETO The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (ethionamide) M_DST_C02_ethA Any non-S mutations found in 
this gene with good quality responsible for the predicted resistance for ethionamide M_DST_C03_fabG1 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamide M_DST_C04_inhA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamide M_DST_D01_RIF The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (rifampin) M_DST_D02_rpoB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for rifampin M_DST_E01_PZA The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (pyrazinamide) M_DST_E02_pncA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for pyrazinamide M_DST_F01_EMB The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (ethambutol) M_DST_F02_embA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethambutol M_DST_F03_embB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethambutol M_DST_G01_AMK The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (amikacin) M_DST_G02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for amikacin M_DST_G03_eis Any non-S mutations found in this gene with good quality responsible for the predicted resistance for amikacin M_DST_H01_KAN The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (kanamycin) M_DST_H02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for kanamycin M_DST_H03_eis Any non-S mutations found in this gene with good quality responsible for the predicted 
resistance for kanamycin M_DST_I01_CAP The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (capreomycin) M_DST_I02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for capreomycin M_DST_I03_tlyA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for capreomycin M_DST_J01_MFX The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (moxifloxacin) M_DST_J02_gyrA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for moxifloxacin M_DST_J03_gyrB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for moxifloxacin M_DST_K01_LFX The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (levofloxacin) M_DST_K02_gyrA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for levofloxacin M_DST_K03_gyrB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for levofloxacin M_DST_L01_BDQ The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (bedaquiline) M_DST_L02_Rv0678 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L03_atpE Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L04_pepQ Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L05_mmpL5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L06_mmpS5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline 
M_DST_M01_CFZ The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (clofazimine) M_DST_M02_Rv0678 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_M03_pepQ Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_M04_mmpL5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_M05_mmpS5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_N01_LZD The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (linezolid) M_DST_N02_rrl Any non-S mutations found in this gene with good quality responsible for the predicted resistance for linezolid M_DST_N03_rplC Any non-S mutations found in this gene with good quality responsible for the predicted resistance for linezolid Analysis date The date <code>tbp-parser</code> was run in YYYY-MM-DD HH:SS format Operator The name of the person who ran <code>tbp-parser</code>; can be provided with the <code>--operator</code> input parameter. If left blank, \u201cOperator not provided\u201d is the default value. 
M_DST_O01_lineage The lineage of the sample (the <code>main_lin</code> of the sample as reported by TBProfiler) M_DST_P01_CS The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (cycloserine); only included when <code>--add_cs_lims</code> is set to true M_DST_P02_ald Any non-S mutations found in this gene with good quality responsible for the predicted resistance for cycloserine; only included when <code>--add_cs_lims</code> is set to true M_DST_PO3_alr Any non-S mutations found in this gene with good quality responsible for the predicted resistance for cycloserine; only included when <code>--add_cs_lims</code> is set to true <p>The LIMS report offers a condensed version of the laboratorian report with more details than the Looker report. By containing only the most important information about a drug and its related mutations, the LIMS report provides an invaluable summary.</p>"},{"location":"outputs/looker/","title":"Looker Report","text":"<p>The Looker report is intended for use in Google's Looker Studio Data Studio for dashboarding purposes. It offers a highly condensed version of the resistance calls (using the <code>looker_interpretation</code> field from the laboratorian report) for a quick summary of the sample\u2019s drug resistance profile.</p>"},{"location":"outputs/looker/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation sample_id The name of the sample output_seq_method_type The sequencing method used to generate the data; can be set with the <code>--sequencing_method</code> input parameter. 
If left blank, \u201cSequencing method not provided\u201d is the default value amikacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug bedaquiline The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug capreomycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug clofazimine The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug ethambutol The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug ethionamide The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug isoniazid The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug kanamycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug levofloxacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug linezolid The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug moxifloxacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug pyrazinamide The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug rifampin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug streptomycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug lineage The lineage of the sample (the <code>main_lin</code> field as reported by TB-Profiler); for example, lineage1.2.1.2.1 ID The lineage of the sample in human-readable language (the same as <code>M_DST_A01_ID</code> in the LIMS 
report) analysis_date The date <code>tbp-parser</code> was run in YYYY-MM-DD HH:SS format operator The name of the person who ran <code>tbp-parser</code>; can be provided with the <code>--operator</code> input parameter. If left blank, \u201cOperator not provided\u201d is the default value. <p>Please note that occasionally, the <code>looker_interpretation</code> field can differ from the <code>mdl_interpretation</code> field. Typically, they are identical, but occasionally, the <code>mdl_interpretation</code> column will call a variant-drug combination \u201csusceptible\u201d (S), while the <code>looker_interpretation</code> column will call the same combination \u201cuncertain\u201d (U). Be aware of this difference when choosing an interpretation to report.</p>"},{"location":"outputs/theiaprok/","title":"TheiaProk Outputs on Terra","text":"<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (find documentation for TheiaProk here) on Terra.bio, you will find the following outputs in your data table.</p> <p>TheiaProk Version</p> <p>This information only corresponds to PHB v2.2.0. 
These inputs and outputs may not be applicable to other versions of TheiaProk.</p> Variable Type Description tbp_parser_average_genome_depth Float The average depth of coverage across the H37Rv reference genome tbp_parser_coverage_report File The coverage report generated by <code>tbp-parser</code> tbp_parser_docker String The Docker image used to run <code>tbp-parser</code> tbp_parser_genome_percent_coverage Float The percentage of the H37Rv reference genome that has depth above the threshold set by <code>tbp_parser_min_depth</code> tbp_parser_laboratorian_report_csv File The laboratorian report generated by <code>tbp-parser</code> tbp_parser_lims_report_csv File The LIMS report generated by <code>tbp-parser</code> tbp_parser_looker_report_csv File The Looker report generated by <code>tbp-parser</code> tbp_parser_version String The version of tbp-parser used in the analysis as determined by <code>tbp-parser --version</code> <p>Find the inputs for <code>tbp-parser</code> in TheiaProk on Terra here.</p>"},{"location":"versioning/","title":"Versioning and Releases","text":"<p>The California Department of Public Health has clinically validated the following versions:</p> <ul> <li>v1.2.2 for WGS, and</li> <li>v1.4.4.8 for tNGS</li> </ul> <p>Interpretation documents for v1.2.2 and v1.4.4.8 are available in the root directory of the <code>tbp-parser</code> repository; others are available in the interpretation_docs directory on GitHub.</p> <p>If you are running tbp-parser as part of the TheiaProk pipeline(s) with Terra, the following branches are recommended:</p> <ul> <li>To run v1.2.2 on Terra, please use the smw-tb-2024-01-16-dev branch.</li> <li>To run v1.4.4.8+ and v1.6.x+, please use the smw-tb-2024-05-03-dev branch.</li> <li>To run v1.5.x+ and v2.x+, please use the smw-tb-2024-05-03-who2-dev branch.</li> </ul> <p>For more information on the differences between versions, you can see the Brief Description of Versions or the Exhaustive List of 
Versions.</p>"},{"location":"versioning/brief/","title":"Brief Description of Versions","text":"<p>You may notice there are many releases; <code>tbp-parser</code> is in active development and each release is \"use at your own risk.\" We highly recommend upgrading to the latest release as they include important bug fixes. In order to help track the different changes, we have included a brief description of each release:</p> <ul> <li>v1.2.x &amp; below - the initial developmental stages of tbp-parser for WGS data</li> <li>v1.3.x - the addition of tNGS data parsing and includes some updates applicable to WGS parsing</li> <li>v1.4.x - reworks how QC is performed (changes in order of operations)<ul> <li>v1.4.3+ - changes how tNGS lineage determination is performed</li> <li>v1.4.4+ - changes how nonsynonymous mutations are interpretted; major interpretation differences between earlier versions</li> </ul> </li> <li>v1.6.x - only considers the genes included in the LIMS report to determine the drug output in the LIMS report</li> <li>v1.5.x+ and v2.0.0 - major changes to code in due to using results from TB-Profiler v6.2.0+<ul> <li>code changes for v2.x are available on the <code>who-v2</code> branch of <code>tbp-parser</code></li> </ul> </li> </ul> <p>For a more exhaustive list, please visit the Exhaustive List of Versions.</p>"},{"location":"versioning/exhaustive/","title":"Exhaustive version descriptions","text":"<p>The following is a list of every version of <code>tbp-parser</code> and a short summary of the changes made in each version.</p> <p>Blue indicates that CDPH performed a clinical validation on that version</p> <ul> <li>v1.0.0 - initial version</li> <li>v1.1.0 - adjusts the highest interpretation for a drug to only consider genes in LIMS report, adds the rule to the confidence column, adds QRDR expert rules for gyrA and gyrB</li> <li>v1.1.1 - fixes a bug in R/QRDR region calculations</li> <li>v1.1.2 - adjusts LIMS lineage designation by checking for BCG and if 
lineage from TB Profiler is empty</li> <li>v1.1.3 - now includes the TB Profiler sublineage output when determining BCG M bovis</li> <li>v1.1.4 - now checks if multiple lineages/sublineages were detected</li> <li>v1.1.5 - checks all mmpS/mmpL/mmpR alternate consequences; also checks to make sure all drugs are reported</li> <li>v1.1.5.1 - renames rifampicin to rifampin</li> <li>v1.1.6 - removes a locus warning with deletion caveat</li> <li>v1.1.7 - ensures all deletion caveat locus warnings are gone, overwrites all fields with locus warning with \u201cNA\u201d or \u201cInsufficient Coverage\u201d as appropriate and moves them to the bottom of the Laboratorian report</li> <li>v1.1.8 - changes overwrite to only overwrite interpretation values, not mutation information</li> <li>v1.1.9 - renames rifampicin to rifampin</li> <li>v1.2.0 - enables ability to provide alternate coverage bed file; introduced the modified regions (just coding region + 30bp upstream or promoter region)</li> <li>v1.2.1 - fixes a bug when renaming rifampicin to rifampin</li> <li>v1.2.2 (WGS) - improve how maximum MDL interpretation is calculated for the LIMS report. 
Use the smw-tb-2024-01-16-dev branch on Terra.</li> <li>v1.2.3 - check only the LIMS genes\u2019 coverage for LIMS lineage determination and use a threshold for all lineage designation</li> <li>v1.3.0 - adds tNGS regions, checks to make sure that only variants for genes in the coverage report are included in the laboratorian (tNGS), error-proof locus tag designation, add check to prevent failures when gene not in coverage dictionary (tNGS), adds \u201cNA\u201d to the mutation rank list (score = 0, same as Insufficient Coverage)</li> <li>v1.3.1 - adds <code>--tngs</code> flag to turn on tNGS-specific global parameters, establishes different threshold calculation for lineage designation for tNGS, checks the segment of a gene a variant was detected in, removes check that did not prevent failures when gene not in coverage dictionary from v1.3.0, error-proof all coverage checks, adds \u201cThis mutation is outside the expected region\u201d warning</li> <li>v1.3.2 - error-proofs coverage warning and adds additional section for tNGS gene segments, error-proofs gene tier for tNGS gene segments</li> <li>v1.3.3 - condenses most gene segments into one, for WT mutations, set the mutation to \u201cWT\u201d not \u201cNA\u201d</li> <li>v1.3.4 - error-proofs maximum mdl interpretation determination and maximum looker interpretation determination</li> <li>v1.3.5 - adds rrs &amp; rrl frequency input parameters to customize mutation frequency for those genes , overwrites gene MDL interpretation when \u201cInsufficient Coverage\u201d to act as if \u201cWT\u201d if greater than S</li> <li>v1.3.6 - adds the TB-Profiler lineage to the end of the LIMS report and the Looker report, adds LIMS lineage to Looker report, introduces check if max MDL interpretation is also Insufficient Coverage to change output to Pending Retest</li> <li>v1.3.7 - add to the coverage report the \u201cexpert rule regions\u201d column for tNGS, overwrites gene MDL interpretation when \u201cInsufficient 
Coverage\u201d to act as if \u201cWT\u201d if gr **eater than or equal to S</li> <li>v1.3.8 - add frequency input parameters for rpoB 449 and ethA 237, renames coverage threshold to minimum percent coverage</li> <li>v1.3.9 - check if gene name is rpoB because that means it\u2019s outside the expected region (tNGS - rpoB is in two segments), add rrs and rrl read support input parameters</li> <li>v1.4.0 - rework how QC is performed (order of operations)</li> <li>v1.4.1 - remove rpoB expected region check, implements deletion position quality check in QC (keep only valid deletions), if outside expected region warning, set MDL interpretations to NA</li> <li>v1.4.2 - remove \u201coutside expected region\u201d mutations from LIMS report, error-proofs determining responsible MDL interpretations</li> <li>v1.4.2.1 (same change in v1.5.4) - prevent overwriting \u201cR\u201d mutations with No Sequence, and overwrite \u201cU\u201d mutations with \u201cPending Retest\u201d if bad quality</li> <li>v1.4.3 - implement different thresholds for LIMS lineage identification for tNGS,</li> <li>v1.4.4 - update expert rule interpretations (mainly S \u2192 U in several spots)</li> <li>v1.4.4.1 (v1.5.0 branched off of this one)- update LIMS threshold to 90, not the coverage threshold</li> <li>v1.4.4.2 (same change in v1.5.1) - fix an issue where \u201cNo sequence\u201d was not triggering Pending Retest</li> <li>v1.4.4.3 (same change in v1.5.5) - fix an issue where \u201cPending Retest\u201d was not properly appearing</li> <li>v1.4.4.4 (same change in v1.5.6) - prevent \u201cPending Retest\u201d if Insufficient Coverage is in a gene that also has a valid deletion</li> <li>v1.4.4.5 - consider deletions invalid if coverage is between 0 and minimum coverage (10 default) (this consideration is unique to old TB Profiler and not mimicked in v1.5)</li> <li>v1.4.4.6 - a mistake; updates the version (this release is a mystery to me as there is nothing in there except version update)</li> 
<li>v1.4.4.7 (same change in v1.5.8) - change tNGS LIMS lineage designation to items in the coverage dictionary (to represent both rpoB segments)</li> <li>v1.4.4.8 (tNGS) (same change in v1.5.9)- reduce tNGS LIMS threshold to 70% from 90. Use the smw-tb-2024-05-03-dev branch on Terra for this and all subsequent v1.4.4.x+ versions.</li> <li>v1.4.4.9 (same change in v1.5.7) - add optional input to add cycloserine to LIMS report</li> <li>v1.4.4.10 - fix issue when MDL resistance was being overwritten to Pending Retest but without considering other genes when calculating the highest MDL resistance (as the other genes may have had higher resistances that were not captured at first)</li> <li>v1.4.4.11 - fix issue introduced by last fix where we ran into indexing errors due to no more MDL interpretations available in the list</li> <li>v1.5.0 (branched off of v1.4.4.1)- make all language changes necessary to be compatible with TBProfiler v6.2.1. Use the smw-tb-2024-05-03-who2-dev branch on Terra for this and all subsequent v1.5.x+ versions.</li> <li>v1.5.1  (same change in v1.4.4.2)- fix an issue where \u201cNo sequence\u201d was not triggering Pending Retest</li> <li>v1.5.2 - a mistake; somehow exactly the same as 1.4.4.2?? 
(this release is also a mystery)</li> <li>v1.5.3 - make additional language changes and fix an unusual edge case where the same mutation was identified; rename mmpR5 to Rv0678 again</li> <li>v1.5.4 (same change in v1.4.2.1) - prevent overwriting \u201cR\u201d mutations with No Sequence</li> <li>v1.5.5 (same change in v1.4.4.3 - fix an issue where \u201cPending Retest\u201d was not properly appearing; consider only LIMS genes for LIMS reort</li> <li>v1.5.6 (same change in v1.4.4.4) - prevent \u201cPending Retest\u201d if Insufficient Coverage is in a gene that also has a valid deletion</li> <li>v1.5.7 (same change in v1.4.4.9) - add optional input to add cycloserine to LIMS report</li> <li>v1.5.8 (same change in v1.4.4.7) - change tNGS LIMS lineage designation to check items in the coverage dictionary (to represent both rpoB segments; percentage calculation erroneously combined them)</li> <li>v1.5.9 (same change in v1.4.4.8) - reduce tNGS LIMS threshold to 70% from 90</li> <li>v1.5.10 - correct spelling of two genes in the LIMS report for cycloserine</li> <li>v1.6.0 (branched off of v1.4.4.11) - ensures that only LIMS genes are being considered for the LIMS report. Use the smw-tb-2024-05-03-dev branch on Terra for this and all subsequent v1.6.x+ versions.</li> <li>v2.0.0 (branched off of v1.5.10; same change in v1.4.4.10 and v1.4.4.11) - fix issue when MDL resistance was being overwritten to Pending Retest but without considering other genes when calculating the highest MDL resistance (as the other genes may have had higher resistances that were not captured at first) and fixes the resulting issue where indexing errors occurred\u00a0due to no more MDL interpretations. Use the smw-tb-2024-05-03-who2-dev branch on Terra for this and all subsequent v2.x+ versions.</li> </ul> <p>The following diagram shows how each version is related to the others without technical details: </p>"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"tbp-parser","text":"<p>Not for Diagnostic Use</p> <p>CAUTION: The information produced by this program should not be used for clinical reporting unless and until extensive validation has occurred in your laboratory on a stable version. Otherwise, the outputs of tbp-parser are for research use only.</p> <p>FUTURE DEPRECATION NOTICE</p> <p>At the time of the PHB v2.3.0 release:</p> <ul> <li>all branches on Terra that have been mentioned in this documentation will be deleted. Please use the v2.3.0 version of TheiaProk moving forward.</li> <li>the <code>main</code> branch of tbp-parser will host v2.1.0 and above; earlier versions of tbp-parser will no longer be supported</li> <li>future releases of tbp-parser will only support outputs generated by TBProfiler v6.0.0 and above.</li> </ul> <p>Versions of TBProfiler prior to v6.0.0 are not compatible with v2+ of tbp-parser. Please ensure that you are using the correct version of tbp-parser for your version of TBProfiler.</p>"},{"location":"#overview","title":"Overview","text":"<p><code>tbp-parser</code> is a tool developed in partnership with the California Department of Public Health (CDPH) to parse the output of Jody Phelan\u2019s TBProfiler tool into four additional files:</p> <ol> <li>A Laboratorian report, which contains information about each mutation detected and its associated drug resistance profile in a CSV file.</li> <li>A LIMS report, formatted specifically for CDPH\u2019s STAR LIMS, which summarizes the highest severity mutations for each antimicrobial drug and the relevant mutations.</li> <li>A Looker report, which condenses the information contained in the Laboratorian report into a format suitable for generating a dashboard in Google\u2019s Looker Studio.</li> <li>A coverage report, which contains the percent coverage of each gene relative to the H37Rv reference genome in addition to any warnings, such 
as any deletions identified in the gene that might have contributed to a reduced percent coverage</li> </ol> <p>Please reach out to us at support@theiagen.com if you would like any custom file formats and/or changes to these output files that suit your individual needs.</p>"},{"location":"usage/","title":"Getting Started","text":""},{"location":"usage/#installation","title":"Installation","text":""},{"location":"usage/#docker","title":"Docker","text":"<p>We highly recommend using the following Docker image to run tbp-parser:</p> <pre><code>docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0 #(1)!\n</code></pre> <ol> <li>We host our Docker images on the Google Artifact Registry so that they are always available for use.</li> </ol> <p>The entrypoint for this Docker image is the <code>tbp-parser</code> help message. To run this container interactively, you can use the following command:</p> <pre><code>docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0\n\n# Once inside the container interactively, you can run the tbp-parser tool\npython3 /tbp-parser/tbp_parser/tbp_parser.py -v\n# v2.1.0\n</code></pre>"},{"location":"usage/#locally-with-python","title":"Locally with Python","text":"<p><code>tbp-parser</code> is not yet available with <code>pip</code> or <code>conda</code>. 
To run <code>tbp-parser</code> in your local command-line environment, install the following dependencies:</p> <ul> <li>python3</li> <li>pandas &gt;= 1.4.2</li> <li>importlib_resources</li> <li>samtools</li> </ul> <p>After installation of these dependencies, download and extract the latest release of <code>tbp-parser</code> and run the script with <code>python3</code>.</p>"},{"location":"usage/#usage","title":"Usage","text":""},{"location":"usage/#example-usage","title":"Example Usage","text":"<p>This shows how the script can be run if used inside the Docker container provided above.</p> <pre><code>python3 /tbp-parser/tbp_parser/tbp_parser.py \\\n    /path/to/data/tbprofiler_output.json \\\n    /path/to/data/tbprofiler_output.bam \\\n    -o \"example\" \\\n    --min_depth 12 \\\n    --min_frequency 0.9 \\\n    --sequencing_method \"Illumina NextSeq\" \\\n    --operator \"John Doe\" \n</code></pre> <p>Please note that the BAM file must have the accompanying BAI file in the same directory.</p>"},{"location":"usage/#help-message","title":"Help Message","text":"<p>The help message printed by <code>tbp-parser</code> is quite extensive, but has a lot of useful information regarding the input parameters. Here is the entire message in full. 
You can find more information regarding these inputs in the Inputs section.</p> <pre><code>usage: python3 /tbp-parser/tbp_parser/tbp_parser.py [-h|-v] &lt;input_json&gt; &lt;input_bam&gt; [&lt;args&gt;]\n\nParses Jody Phelon's TB-Profiler JSON output into four files:\n- a Laboratorian report,\n- a LIMS report\n- a Looker report, and\n- a coverage report\n\npositional arguments:\n  input_json\n          the JSON file produced by TBProfiler\n  input_bam\n          the BAM file produced by TBProfiler\n\noptional arguments:\n  -h, --help\n          show this help message and exit\n  -v, --version\n          show program's version number and exit\n\nquality control arguments:\n  options that determine what passes QC\n\n  -d, --min_depth\n          the minimum depth of coverage for a site to pass QC\n          default=10\n  -c, --min_percent_coverage\n          the minimum percentage of a region that has depth above the threshold set by min_depth\n            (used for a gene/locus to pass QC)\n          default=100\n  -s, --min_read_support\n          the minimum read support for a mutation to pass QC\n          default=10\n  -f, --min_frequency\n          the minimum frequency for a mutation to pass QC (0.1 -&gt; 10%)\n          default=0.1\n  -r, --coverage_regions\n          the BED file containing the regions to calculate percent coverage for\n          default=data/tbdb-modified-regions.bed\n\ntext arguments:\n  arguments that are used verbatim in the reports or to name the output files\n\n  -m, --sequencing_method\n          the sequencing method used to generate the data; used in the LIMS &amp; Looker reports\n          ** Enclose in quotes if includes a space\n          default=\"Sequencing method not provided\"\n  -p, --operator\n          the operator who ran the sequencing; used in the LIMS &amp; Looker reports\n          ** Enclose in quotes if includes a space\n          default=\"Operator not provided\"\n  -o, --output_prefix\n          the output file 
name prefix\n          ** Do not include any spaces\n\ntNGS-specific arguments:\n  options that are primarily used for tNGS data\n  (all frequency arguments are compatible with WGS data)\n\n  --tngs\n          indicates that the input data was generated using Deeplex + CDPH modified protocol\n          Turns on tNGS-specific global parameters\n  --tngs_expert_regions\n          the BED file containing the regions to calculate coverage for expert rule regions\n            (used to determine coverage quality in the regions where resistance-conferring\n            mutations are found, or where a CDC expert rule is applied; not for QC)\n          default=data/tngs-expert-rule-regions.bed\n  --rrs_frequency\n          the minimum frequency for an rrs mutation to pass QC\n            (rrs has several problematic sites in the Deeplex tNGS assay)\n          default=0.1\n  --rrl_frequency\n          the minimum frequency for an rrl mutation to pass QC\n            (rrl has several problematic sites in the Deeplex tNGS assay)\n          default=0.1\n  --rpob449_frequency\n          the minimum frequency for an rpoB mutation at protein position 449 to pass QC\n            (this is a problematic site in the Deeplex tNGS assay)\n          default=0.1\n  --etha237_frequency\n          the minimum frequency for an ethA mutation at protein position 237 to pass QC\n            (this is a problematic site in the Deeplex tNGS assay)\n          default=0.1\n\nlogging arguments:\n  options that change the verbosity of the stdout log\n\n  --verbose\n          increase output verbosity\n  --debug\n          increase output verbosity to debug; overwrites --verbose\n\nPlease contact support@theiagen.com with any questions\n</code></pre>"},{"location":"algorithm/","title":"Algorithm Overview","text":"<p>The algorithm of <code>tbp-parser</code> was developed with extensive guidance from the California Department of Public Health (CDPH). 
</p> <p>Find the following information in this section:</p> <ul> <li>Detailed information on how the algorithm works in the Technical Code Breakdown page.</li> <li>Information on how interpretations are determined in the Interpretation Document page.</li> </ul>"},{"location":"algorithm/interpretation/","title":"The Interpretation Document","text":"<p>Resistance calls are made in one of two ways. The first uses the WHO annotation, which is output directly by TBProfiler. The WHO maintains a catalogue of mutations and how they may confer antimicrobial resistance. If this annotation is present, it will always be used.</p> <p>In the case where the WHO annotation is missing, either due to novel mutations or mutations with unclear significance in the literature, <code>tbp-parser</code> will apply expert rules. These expert rules are additional conditions used to decide whether a mutation is considered to confer resistance. These expert rules come from the CDC and are documented in the <code>tbp-parser</code> GitHub repository inside the interpretation logic PDFs.</p> <p>When an expert rule is applied, the <code>rationale</code> field of the laboratorian report will indicate which expert rule was used (the number prefacing the rule directly correlates to the appropriate section in the interpretation logic PDF) and indicate that there was no WHO annotation.</p> <p>The interpretation documents for v1.2.2 and v1.4.4.8 are available in the root directory of the <code>tbp-parser</code> repository. Versions that correspond to different releases are available in the interpretation_docs directory on GitHub.</p>"},{"location":"algorithm/technical/","title":"Technical Code Breakdown","text":"<p>Examples from TBProfiler v4.4.2</p> <p>The examples in this document are based on the output of TBProfiler v4.4.2. 
However, the general principles apply to all versions of TBProfiler and tbp-parser.</p>"},{"location":"algorithm/technical/#technical-code-breakdown","title":"Technical Code Breakdown","text":"<p><code>tbp-parser</code> is object-oriented, with each class representing either an output file, a part of an output file, or a part of the input JSON file produced by TBProfiler.</p> <p>The first class that is invoked by the <code>tbp-parser.py</code> script is <code>Parser</code> which is a control class that orchestrates the creation of the different output reports.</p>"},{"location":"algorithm/technical/#calculating-percent-gene-coverage","title":"Calculating percent gene coverage","text":"<p>Before creating any reports, <code>Parser</code> calls the <code>Coverage</code> class to calculate the percent gene coverage over a specified minimum depth (default: 10) for the coding regions of all genes included in the TBDB (the database used in TBProfiler to generate the drug resistance annotations). This requires as input the BAM and BAI files produced by TBProfiler during alignment to the H37Rv reference genome. The percent gene coverage results are then stored in a global dictionary that is accessed multiple times for QC purposes during the creation of the final reports.</p>"},{"location":"algorithm/technical/#creating-the-laboratorian-report","title":"Creating the Laboratorian report","text":"<p>Then, <code>Parser</code> creates the Laboratorian report using the <code>Laboratorian</code> class and its associated <code>.create_laboratorian_report()</code> method.</p> <p>The <code>Laboratorian</code> class uses the input JSON file to collect the necessary information. The structure of the input JSON file is a good place to start the breakdown:</p> example_input.json<pre><code>{\n  \"id\": \"sample01\",\n  ...\n  \"main_lin\": \"lineage1\",\n  \"sublin\": \"lineage1.2.1.2.1\",\n  \"dr_variants\": [ ... 
],\n  \"other_variants\": [ ...],\n  ...\n}\n</code></pre> <p>This example shows only the top-level JSON fields that are used in <code>tbp-parser</code>.</p> <p>Of interest, the <code>\"id\"</code> field is used to set the global <code>SAMPLE_NAME</code> variable.</p> <p>The lineage information, found in <code>\"main_lin\"</code> and <code>\"sublin\"</code>, is used in the LIMS and Looker reports, so we won\u2019t go into detail about it here.</p> <p>The variant information makes up the bulk of the Laboratorian report and can be found in the <code>\"dr_variants\"</code> and <code>\"other_variants\"</code> fields. We\u2019ll talk more about these fields later.</p> <p>There are many other fields that are omitted from this example since they are not used in <code>tbp-parser</code>, such as version information and overall sample drug resistance type (like RR-TB, etc.). These fields are found in the other TBProfiler output files in more human-readable formats.</p> <p>Within the input JSON file, there are two fields that are examined the most: <code>\"dr_variants\"</code> and <code>\"other_variants\"</code>. These fields are treated the same and have the same format, although different mutations are found in each. The difference between the two fields is unclear to me at this time. 
In the example below, only the fields used in <code>tbp-parser</code> are shown.</p> example_input.json<pre><code>{\n  ...\n  \"dr_variants\": [\n    {\n      \"chrom\": \"Chromosome\",\n      \"genome_pos\": 761109,\n      ...\n      \"depth\": 130,\n      \"freq\": 1,\n      \"type\": \"missense_variant\",\n      \"nucleotide_change\": \"c.1303G&gt;T\",\n      \"protein_change\": \"p.Asp435Tyr\",\n      \"annotation\": [\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"rifampicin\",\n          \"who_confidence\": \"Assoc w R\"\n        }\n      ],\n      \"alternate_consequences\": [],\n      ...\n      \"locus_tag\": \"Rv0667\",\n      \"gene\": \"rpoB\",\n      \"gene_associated_drugs\": [\n        \"rifampicin\"\n      ]\n    }\n  ],\n  \"other_variants\": [\n    {\n      \"chrom\": \"Chromosome\",\n      \"genome_pos\": 6112,\n      ...\n      \"depth\": 105,\n      \"freq\": 1,\n      \"type\": \"missense_variant\",\n      \"nucleotide_change\": \"c.873G&gt;C\",\n      \"protein_change\": \"p.Met291Ile\",\n      \"annotation\": [\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"moxifloxacin\",\n          \"who_confidence\": \"Not assoc w R\"\n        },\n        {\n          \"type\": \"who_confidence\",\n          \"drug\": \"levofloxacin\",\n          \"who_confidence\": \"Not assoc w R\"\n        }\n      ],\n      \"alternate_consequences\": [],\n      ...\n      \"locus_tag\": \"Rv0005\",\n      \"gene\": \"gyrB\",\n      \"gene_associated_drugs\": [\n        \"levofloxacin\",\n        \"ofloxacin\",\n        \"moxifloxacin\",\n        \"fluoroquinolones\",\n        \"ciprofloxacin\"\n      ]\n    },\n...\n}\n</code></pre> <p>After the global <code>SAMPLE_NAME</code> variable is set, the <code>Laboratorian</code> class calls the <code>.iterate_section()</code> method, starting with the <code>\"dr_variants\"</code> field.</p> <p>Since the contents of each variant section in the JSON dictionary are 
considered a list, we iterate through each list item, that is, each section enclosed in curly brackets <code>{...}</code>. In the example above, I\u2019ve only included 1 item in each list.</p> <p>Immediately, each item in the list is converted into a <code>Variant</code> class object, and every field in each list item (the <code>\"chrom\"</code>, <code>\"genome_pos\"</code>, <code>\"locus_tag\"</code>, etc.) is converted to a class attribute. This is because each item in the list represents a single mutation, or variant. I\u2019ll now refer to each variant section item as a <code>Variant</code>.</p> <p>Each new <code>Variant</code> object has its <code>.extract_annotations()</code> method called. This method starts by iterating through the <code>\"annotation\"</code> field in the input JSON. The annotation field can contain multiple different annotations, so we look at each one individually.</p> <p>Each annotation is turned into a <code>Row</code> object, which represents a row in the Laboratorian report. During initialization of the <code>Row</code> object, each column in the Laboratorian report is created based on both the annotation field and the originating <code>Variant</code> object. Additionally, a warning field is created based on both the global dictionary created with the <code>Coverage</code> class and the mutation\u2019s <code>\"depth\"</code> and <code>\"freq\"</code> fields.</p> <p>Sometimes multiple annotations for the same drug can appear for a single <code>Variant</code>. If this is the case, only the most severe annotation is saved (that is, an annotation that indicates resistance is kept instead of one that indicates susceptibility).</p> <p>After the annotation field has been iterated through, we then check the <code>\"gene_associated_drugs\"</code> field to make sure that we create a <code>Row</code> for each antimicrobial drug that is associated with the gene. 
As you can see in the <code>\"other_variants\"</code> section, the annotation field for the variant only lists annotations for moxifloxacin and levofloxacin, but the gene is associated with three other antimicrobial drugs. This iteration creates additional <code>Row</code> objects for those antimicrobial drugs.</p> <p>This means that each mutation will potentially appear several times in the final report, once for every antimicrobial associated with the gene. This is because a mutation may confer resistance to one drug but not to another.</p> <p>After <code>Row</code> objects are created for each <code>Variant</code> in the variant section, every <code>Row</code> has its <code>.complete_row()</code> method called, which adds the interpretation columns to the object. Two interpretation columns are created: <code>mdl_interpretation</code> and <code>looker_interpretation</code>.</p> <p>Please note that these interpretation columns are typically identical, but in several cases, the <code>mdl_interpretation</code> column will call a variant-drug combination \u201csusceptible\u201d (S), while the <code>looker_interpretation</code> column will call the same combination \u201cuncertain\u201d (U).</p> <p>In the case where a WHO annotation was not identified, the <code>Variant</code> class\u2019 <code>.apply_expert_rules()</code> method is called. This method applies expert rules that are listed in detail on the <code>tbp-parser</code> GitHub repository, available here.</p> <p>The expert rules assign a drug resistance call to the variant-drug combination only when there is no WHO annotation and will fill the <code>mdl_interpretation</code> and <code>looker_interpretation</code> fields.</p> <p>If the mutation is in mmpS5, mmpL5, or mmpR5/Rv0678, then the <code>\"alternate_consequences\"</code> field is iterated through. 
This field typically lists the same mutation but in reference to a different gene; for instance, if a mutation is in the upstream non-coding region of one gene, it may be in the coding region of a different gene.</p> <p>Then, any genes that do not have any variants are added to the laboratorian report with various \u201cNA\u201d or \u201cWT\u201d values filling the appropriate fields.</p> <p>This means that every gene in the TBDB appears in the Laboratorian report regardless of whether any mutations were identified in that gene.</p> <p>Finally, a few more quality control measures are applied, and then all of the individual <code>Row</code> objects are written to a CSV file, which concludes the creation of the laboratorian report.</p>"},{"location":"algorithm/technical/#creating-the-looker-report","title":"Creating the Looker report","text":"<p>The <code>Parser</code> class then creates a <code>Looker</code> object and calls its <code>.create_looker_report()</code> method. The Looker report uses the Laboratorian report to generate most of its information.</p> <p>It starts by iterating through a list of antimicrobial drugs and extracting the <code>looker_interpretation</code> value from every row in the report for that antimicrobial drug. It then identifies the highest resistance rating (R &gt; R-Interim &gt; U &gt; S-Interim &gt; S) among all resistance annotations for that drug.</p> <p>Then, a quality check is performed: if a gene that contributed to the highest resistance rating fails the coverage check, an insufficient coverage warning is given.</p> <p>The <code>\"main_lin\"</code> and <code>\"sublin\"</code> fields from the input JSON file are used to fill the <code>ID</code> field in the report. 
These fields are converted into short, plain-English text without any technical lineage nomenclature.</p> <p>Finally, the information is written to a CSV file, which concludes the creation of the Looker report.</p>"},{"location":"algorithm/technical/#creating-the-lims-report","title":"Creating the LIMS report","text":"<p>The <code>Parser</code> class then creates a <code>LIMS</code> object and calls its <code>.create_lims_report()</code> method. The LIMS report also uses the Laboratorian report to generate the bulk of its information.</p> <p>The <code>.create_lims_report()</code> method begins by iterating through each LIMS antimicrobial and gene code (corresponding to the LIMS codes in the CDPH STAR LIMS system). Then, as in the Looker report, the highest <code>mdl_interpretation</code> value is extracted from the rows in the report that are associated with that antimicrobial drug. Then, the annotation is converted into a human-readable format (R \u2192 \u201cMutation(s) associated with resistance to {antimicrobial} detected\u201d, etc.).</p> <p>Then, the <code>.apply_lims_rules()</code> method is called, which determines which mutations should be output for the corresponding drug-gene combination. The mutations are then formatted as <code>{nucleotide mutation} ({amino acid mutation, if available})</code>, with multiple mutations separated by semicolons.</p> <p>Some specific parsing rules apply to mutations within the rpoB gene, which change the output language on the LIMS report. 
These rules depend on the position of the mutation in the gene.</p> <p>After the rules are applied and the mutations are collected, the information is written to a CSV file, which concludes the creation of the LIMS report.</p>"},{"location":"algorithm/technical/#creating-the-coverage-report","title":"Creating the coverage report","text":"<p>The <code>Parser</code> class then reuses the <code>Coverage</code> object created at the start and calls its <code>.reformat_coverage()</code> method, which adds any warnings, such as deletion mutations detected in a gene. A deletion warning is useful because it indicates that a reported coverage below 100% may be due to that deletion. If the coverage is still 100% and a deletion was identified, the warning will say that the deletion may be upstream.</p> <p>The coverage dictionary and the associated warnings are then written to a CSV file, which concludes the creation of the coverage report and the <code>tbp-parser</code> script.</p>"},{"location":"inputs/inputs/","title":"Command-line Arguments","text":"<p>The inputs on this page reflect the parameters that are applicable for the command-line tool. To see the inputs required for <code>tbp-parser</code> when run as part of the TheiaProk workflow series, please refer to the TheiaProk Inputs page.</p>"},{"location":"inputs/inputs/#required-inputs","title":"Required Inputs","text":"<p><code>tbp-parser</code> is designed to run immediately after Jody Phelan\u2019s TBProfiler tool. Only two inputs are required: the JSON file produced by <code>TBProfiler</code> and the BAM file produced by <code>TBProfiler</code>.</p> <p>The JSON file contains information about the mutations detected in the sample: the quality, the type, and whether that mutation confers resistance to an antimicrobial drug. The BAM file contains the alignment information for the sample and is needed for determining sequencing quality. 
</p> Parameter Description input_json The path to the JSON file that was produced by <code>TBProfiler</code> input_bam The path to the BAM file that was produced by <code>TBProfiler</code> <p>BAM index file required</p> <p>The BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a <code>.bai</code> suffix.</p>"},{"location":"inputs/inputs/#optional-inputs","title":"Optional Inputs","text":"<p><code>tbp-parser</code> can be customized with a number of optional input parameters. These parameters can be used to control the quality control thresholds, the text that appears in the reports, and the names of the output files. The following is a list of all the input parameters that can be used with <code>tbp-parser</code>.</p> <p>In addition to these arguments, <code>tbp-parser</code> also has a <code>-h, --help</code> argument that prints the list of possible arguments and their descriptions, and a <code>-v, --version</code> argument that prints the version of <code>tbp-parser</code> that is installed. 
Both of these commands exit the program after printing their output.</p>"},{"location":"inputs/inputs/#quality-control-arguments","title":"Quality Control Arguments","text":"<p>These options determine the thresholds for quality control.</p> Short Version Long Version Description Default Value -d --min_depth The minimum depth of coverage required for a site to pass QC 10 -c --min_percent_coverage The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC) 100 -s --min_read_support The minimum read support for a mutation to pass QC 10 -f --min_frequency The minimum frequency for a mutation to pass QC (0.1 -&gt; 10%) 0.1 -r --coverage_regions A BED file containing the regions to calculate percent coverage for /data/tbdb-modified-regions.md"},{"location":"inputs/inputs/#text-arguments","title":"Text Arguments","text":"<p>These options are used verbatim in the reports, or are used to name the output files.</p> Short Version Long Version Description Default Value -m --sequencing_method The sequencing method used to generate the data; used in the LIMS &amp; Looker reports. Enclose in quotes if including a space \"Sequencing method not provided\" -p --operator The operator who ran the analysis; used in the LIMS &amp; Looker reports. Enclose in quotes if including a space \"Operator not provided\" -o --output_prefix The prefix to use for the output files. 
Do not include any spaces \"tbp-parser\""},{"location":"inputs/inputs/#lims-arguments","title":"LIMS Arguments","text":"<p>These options are used to customize the LIMS report</p> Name Description Default Value --add_cs_lims Adds Cycloserine (CS) fields to the LIMS report false"},{"location":"inputs/inputs/#tngs-specific-arguments","title":"tNGS-specific Arguments","text":"<p>These options are primarily used for tNGS data, although all frequency and read support arguments are compatible with WGS data.</p> Name Description Default Value --tngs Indicates that the input data was generated using the Deeplex + CDPH modified protocol. Turns on tNGS-specific global parameters false --tngs_expert_regions A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes /data/tbdb-expert-regions.bed --rrs_frequency The minimum frequency for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay 0.1 --rrl_frequency The minimum frequency for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay 0.1 --rrs_read_support The minimum read support for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay 10 --rrl_read_support The minimum read support for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay 10 --rpob449_frequency The minimum frequency for an rpoB mutation at protein position 449 to pass QC, as this site is problematic in the Deeplex tNGS assay 0.1 --etha237_frequency The minimum frequency for an ethA mutation at protein position 237 to pass QC, as this site is problematic in the Deeplex tNGS assay 0.1"},{"location":"inputs/inputs/#logging-arguments","title":"Logging Arguments","text":"<p>These options change the verbosity of the 
<code>stdout</code> log</p> Name Description Default Value --verbose Increases the output verbosity to describe which stage of the analysis is currently running false --debug The highest level of output verbosity detailing every step of the analysis and logic implemented; overwrites --verbose false"},{"location":"inputs/theiaprok/","title":"TheiaProk Inputs on Terra","text":"<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (find documentation for TheiaProk here) on Terra.bio, an optional input must be activated to instruct TheiaProk to run <code>tbp-parser</code>.</p> <p><code>tbp-parser</code> is not on by default due to the nature of this tool and its outputs.</p> <p>TheiaProk Version</p> <p>This information only corresponds to the upcoming PHB v2.3.0 release. These inputs and outputs may not be applicable to other versions of TheiaProk.</p>"},{"location":"inputs/theiaprok/#required-inputs","title":"Required Inputs","text":"<p>To activate <code>tbp-parser</code> you must set the following variable to true:</p> Terra Task name Variable Type Description Default Value <code>merlin_magic</code> call_tbp_parser Boolean Set to <code>true</code> to activate <code>tbp-parser</code> <code>false</code>"},{"location":"inputs/theiaprok/#optional-inputs","title":"Optional Inputs","text":"<p>The following optional inputs are also available for user modification on Terra:</p> Terra Task name Variable Type Description Default Value <code>merlin_magic</code> tbp_parser_add_cs_lims Boolean Set to <code>true</code> to add Cycloserine (CS) fields to the LIMS report <code>false</code> <code>merlin_magic</code> tbp_parser_coverage_regions_bed File A BED file containing the regions to calculate percent coverage for tbdb-modified-regions.md <code>merlin_magic</code> tbp_parser_coverage_threshold Int The minimum percentage of a region that has depth above the threshold set by <code>min_depth</code> (used for a gene/locus to pass QC) 100 
<code>merlin_magic</code> tbp_parser_debug Boolean Set to <code>false</code> to turn off debug mode for <code>tbp-parser</code> <code>true</code> <code>merlin_magic</code> tbp_parser_docker_image String The Docker image to use when running <code>tbp-parser</code> \"us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0\" <code>merlin_magic</code> tbp_parser_etha237_frequency Float Minimum frequency for a mutation in ethA at protein position 237 to pass QC in <code>tbp-parser</code> 0.1 <code>merlin_magic</code> tbp_parser_expert_rule_regions_bed File A file that contains the regions where R mutations and expert rules are applied <code>merlin_magic</code> tbp_parser_min_depth Int Minimum depth for a variant to pass QC in tbp_parser 10 <code>merlin_magic</code> tbp_parser_min_frequency Float The minimum frequency for a mutation to pass QC 0.1 <code>merlin_magic</code> tbp_parser_min_read_support Int The minimum read support for a mutation to pass QC 10 <code>merlin_magic</code> tbp_parser_operator String Fills the \"operator\" field in the tbp_parser output files \"Operator not provided\" <code>merlin_magic</code> tbp_parser_output_seq_method_type String Fills the \"seq_method\" field in the tbp_parser output files \"Sequencing method not provided\" <code>merlin_magic</code> tbp_parser_rpob449_frequency Float Minimum frequency for a mutation in rpoB at protein position 449 to pass QC in <code>tbp-parser</code> 0.1 <code>merlin_magic</code> tbp_parser_rrl_frequency Float Minimum frequency for a mutation in rrl to pass QC in <code>tbp-parser</code> 0.1 <code>merlin_magic</code> tbp_parser_rrl_read_support Int Minimum read support for a mutation in rrl to pass QC in <code>tbp-parser</code> 10 <code>merlin_magic</code> tbp_parser_rrs_frequency Float Minimum frequency for a mutation in rrs to pass QC in <code>tbp-parser</code> 0.1 <code>merlin_magic</code> tbp_parser_rrs_read_support Int Minimum read support for a mutation in rrs to pass QC in <code>tbp-parser</code> 10 
<code>merlin_magic</code> tbp_parser_tngs_data Boolean Set to <code>true</code> to enable tNGS-specific parameters and runs in <code>tbp-parser</code> <code>false</code> <p>Find the outputs for <code>tbp-parser</code> in TheiaProk on Terra here.</p>"},{"location":"outputs/","title":"Output Overview","text":"<p><code>tbp-parser</code> produces four files as outputs. See each individual page for more details on how they are constructed and what they contain:</p> <ul> <li>Laboratorian report</li> <li>LIMS report</li> <li>Looker report</li> <li>Coverage report</li> </ul> <p>The four reports contain a wealth of information. The reports can be ordered from most to least verbose as follows: the laboratorian report, the LIMS report, the Looker report, and the coverage report. The same information is used in all four reports but at differing levels of verbosity.</p> <p>Running <code>tbp-parser</code> as part of TheiaProk on Terra produces additional outputs. You can find that information in the TheiaProk Outputs on Terra page.</p>"},{"location":"outputs/coverage/","title":"Coverage Report","text":"<p>The coverage report lists every gene and its percent gene coverage over a minimum depth (default: 10) relative to the H37Rv genome.</p> <p>Please note that user-provided coverage regions always take precedence over default values.</p>"},{"location":"outputs/coverage/#wgs-coverage-report","title":"WGS Coverage Report","text":"Column name Explanation Gene The name of the gene or locus Percent_Coverage The percent of the gene\u2019s coding region that has a read depth over the minimum value (default: 10; user-customizable by altering <code>--min_depth</code>) Warning Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage <p>If run using the TheiaProk workflow series, there will be an additional column that contains only the name of the sample, which is useful when concatenating many reports as it helps differentiate 
which gene belongs to which sample.</p>"},{"location":"outputs/coverage/#tngs-specific-information","title":"tNGS-specific information","text":"<p>If the <code>--tngs</code> flag is used, the report contains the following fields:</p> Column name Explanation Gene The name of the gene or locus Coverage_Breadth_reportableQC_region The percent of the gene (positions determined by the regions covered by the tNGS Deeplex + CDPH assay primers that are considered reportable by CDPH) that is covered at a depth greater than the <code>--min_depth</code> value QC_Warning Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage Coverage_Breadth_R_expert-rule_region The percent of the regions (positions that could contain any resistance-conferring mutations or require expert-rule application) that is covered at a depth greater than the <code>--min_depth</code> value <p>Coverage regions are determined using either the default /data/tbdb-modified-regions.bed (collected on Sep 1, 2023 from the TBProfiler repository) or, if <code>--tngs</code> is used, /data/tngs-reportable-regions.bed.</p> <p>The R-expert rule region is calculated only if <code>--tngs</code> is used, with the ranges in /data/tbdb-expert-regions.bed.</p>"},{"location":"outputs/laboratorian/","title":"Laboratorian Report","text":"<p>The laboratorian report is the main report produced by <code>tbp-parser</code> and is used to generate all of the other reports. 
What follows is an explanation of all the columns in the report.</p>"},{"location":"outputs/laboratorian/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation sample_id The name of the sample tbprofiler_gene_name The name of the gene where the mutation has been identified tbprofiler_locus_tag The locus tag for the mutation that has been identified tbprofiler_variant_substitution_type The type of mutation identified (for example, frameshift, missense, or synonymous) tbprofiler_variant_substitution_nt The mutation in nucleotide format tbprofiler_variant_substitution_aa The mutation in amino acid format, if possible confidence Contains one of: the WHO annotation; an indication that there was no WHO annotation; or NA when there is no mutation antimicrobial The antimicrobial drug that may be affected by this mutation looker_interpretation The drug resistance interpretation intended for the Looker report mdl_interpretation The drug resistance interpretation intended for the LIMS report depth The depth of coverage at the mutation frequency The frequency of the mutation in the reads read_support How many reads support the mutation (depth * frequency) rationale Contains an indication of what was used (the WHO annotation, the specific expert rule used, or neither) to create the two interpretations warning Any potential quality warnings that may indicate lower reliability gene_tier The gene tier of the mutation\u2019s gene (Tier 1, Tier 2, or NA) <p>Because a single mutation may confer resistance to several drugs at once, each mutation is listed once for each antimicrobial drug that could be affected. In addition, genes without any mutations are also included in the laboratorian report with NA or WT in the appropriate field. This results in a report with many rows, often with very similar values. 
However, the laboratorian report contains the \u201ccomplete picture\u201d of the sample and is incredibly useful for understanding the sample\u2019s drug resistance profile.</p>"},{"location":"outputs/lims/","title":"LIMS Report","text":"<p>The LIMS report is intended for direct import into a STAR LIMS system. The columns are in the specific LIMS code format for CDPH, and may not apply to your LIMS system. Please contact us if you need different column headers and we can work with you towards a solution.</p>"},{"location":"outputs/lims/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation MDL sample accession numbers The name of the sample M_DST_A01_ID The lineage of the sample in human-readable language M_DST_B01_INH The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (isoniazid) M_DST_B02_katG Any non-S mutations found in this gene with good quality responsible for the predicted resistance for isoniazid M_DST_B03_fabG1 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for isoniazid M_DST_B04_inhA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for isoniazid M_DST_C01_ETO The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (ethionamide) M_DST_C02_ethA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamide M_DST_C03_fabG1 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamide M_DST_C04_inhA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethionamide M_DST_D01_RIF The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (rifampin) 
M_DST_D02_rpoB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for rifampin M_DST_E01_PZA The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (pyrazinamide) M_DST_E02_pncA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for pyrazinamide M_DST_F01_EMB The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (ethambutol) M_DST_F02_embA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethambutol M_DST_F03_embB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for ethambutol M_DST_G01_AMK The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (amikacin) M_DST_G02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for amikacin M_DST_G03_eis Any non-S mutations found in this gene with good quality responsible for the predicted resistance for amikacin M_DST_H01_KAN The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (kanamycin) M_DST_H02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for kanamycin M_DST_H03_eis Any non-S mutations found in this gene with good quality responsible for the predicted resistance for kanamycin M_DST_I01_CAP The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (capreomycin) M_DST_I02_rrs Any non-S mutations found in this gene with good quality responsible for the predicted resistance for capreomycin M_DST_I03_tlyA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for capreomycin M_DST_J01_MFX The highest <code>mdl_interpretation</code> 
resistance identified for mutations associated with this drug (moxifloxacin) M_DST_J02_gyrA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for moxifloxacin M_DST_J03_gyrB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for moxifloxacin M_DST_K01_LFX The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (levofloxacin) M_DST_K02_gyrA Any non-S mutations found in this gene with good quality responsible for the predicted resistance for levofloxacin M_DST_K03_gyrB Any non-S mutations found in this gene with good quality responsible for the predicted resistance for levofloxacin M_DST_L01_BDQ The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (bedaquiline) M_DST_L02_Rv0678 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L03_atpE Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L04_pepQ Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L05_mmpL5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_L06_mmpS5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for bedaquiline M_DST_M01_CFZ The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (clofazimine) M_DST_M02_Rv0678 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_M03_pepQ Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_M04_mmpL5 Any non-S mutations found in this gene with good quality responsible for 
the predicted resistance for clofazimine M_DST_M05_mmpS5 Any non-S mutations found in this gene with good quality responsible for the predicted resistance for clofazimine M_DST_N01_LZD The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (linezolid) M_DST_N02_rrl Any non-S mutations found in this gene with good quality responsible for the predicted resistance for linezolid M_DST_N03_rplC Any non-S mutations found in this gene with good quality responsible for the predicted resistance for linezolid Analysis date The date <code>tbp-parser</code> was run in YYYY-MM-DD HH:SS format Operator The name of the person who ran <code>tbp-parser</code>; can be provided with the <code>--operator</code> input parameter. If left blank, \u201cOperator not provided\u201d is the default value. M_DST_O01_lineage The lineage of the sample (the <code>main_lin</code> of the sample as reported by TBProfiler) M_DST_P01_CS The highest <code>mdl_interpretation</code> resistance identified for mutations associated with this drug (cycloserine); only included when <code>--add_cs_lims</code> is set to true M_DST_P02_ald Any non-S mutations found in this gene with good quality responsible for the predicted resistance for cycloserine; only included when <code>--add_cs_lims</code> is set to true M_DST_PO3_alr Any non-S mutations found in this gene with good quality responsible for the predicted resistance for cycloserine; only included when <code>--add_cs_lims</code> is set to true <p>The LIMS report offers a condensed version of the laboratorian report with more details than the Looker report. By containing only the most important information about a drug and its related mutations, the LIMS report provides an invaluable summary.</p>"},{"location":"outputs/looker/","title":"Looker Report","text":"<p>The Looker report is intended for use in Google's Looker Studio (formerly Data Studio) for dashboarding purposes. 
It offers a highly condensed version of the resistance calls (using the <code>looker_interpretation</code> field from the laboratorian report) for a quick summary of the sample\u2019s drug resistance profile.</p>"},{"location":"outputs/looker/#explanation-of-column-headers","title":"Explanation of column headers","text":"Column name Explanation sample_id The name of the sample output_seq_method_type The sequencing method used to generate the data; can be set with the <code>--sequencing_method</code> input parameter. If left blank, \u201cSequencing method not provided\u201d is the default value amikacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug bedaquiline The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug capreomycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug clofazimine The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug ethambutol The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug ethionamide The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug isoniazid The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug kanamycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug levofloxacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug linezolid The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug moxifloxacin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug pyrazinamide The highest <code>looker_interpretation</code> resistance 
identified for mutations associated with this drug rifampin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug streptomycin The highest <code>looker_interpretation</code> resistance identified for mutations associated with this drug lineage The lineage of the sample (the <code>main_lin</code> field as reported by TBProfiler); for example, lineage1.2.1.2.1 ID The lineage of the sample in human-readable language (the same as <code>M_DST_A01_ID</code> in the LIMS report) analysis_date The date <code>tbp-parser</code> was run in YYYY-MM-DD HH:SS format operator The name of the person who ran <code>tbp-parser</code>; can be provided with the <code>--operator</code> input parameter. If left blank, \u201cOperator not provided\u201d is the default value. <p>Please note that the <code>looker_interpretation</code> field can differ from the <code>mdl_interpretation</code> field. The two are typically identical, but the <code>mdl_interpretation</code> column may occasionally call a variant-drug combination \u201csusceptible\u201d (S) while the <code>looker_interpretation</code> column calls the same combination \u201cuncertain\u201d (U). Be aware of this difference when choosing an interpretation to report.</p>"},{"location":"outputs/theiaprok/","title":"TheiaProk Outputs on Terra","text":"<p>When running <code>tbp-parser</code> as part of the TheiaProk workflow series (find documentation for TheiaProk here) on Terra.bio, you will find the following outputs in your data table.</p> <p>TheiaProk Version</p> <p>This information applies only to the upcoming PHB v2.3.0 release. 
These inputs and outputs may not be applicable to other versions of TheiaProk.</p> Variable Type Description tbp_parser_average_genome_depth Float The average depth of coverage across the H37Rv reference genome tbp_parser_coverage_report File The coverage report generated by <code>tbp-parser</code> tbp_parser_docker String The Docker image used to run <code>tbp-parser</code> tbp_parser_genome_percent_coverage Float The percentage of the H37Rv reference genome that has depth above the threshold set by <code>tbp_parser_min_depth</code> tbp_parser_laboratorian_report_csv File The laboratorian report generated by <code>tbp-parser</code> tbp_parser_lims_report_csv File The LIMS report generated by <code>tbp-parser</code> tbp_parser_looker_report_csv File The Looker report generated by <code>tbp-parser</code> tbp_parser_version String The version of tbp-parser used in the analysis as determined by <code>tbp-parser --version</code> <p>Find the inputs for <code>tbp-parser</code> in TheiaProk on Terra here.</p>"},{"location":"versioning/","title":"Versioning and Releases","text":""},{"location":"versioning/#validated-versions","title":"Validated Versions","text":"<p>The California Department of Public Health has clinically validated the following versions:</p> <ul> <li>v1.2.2 for WGS, and</li> <li>v1.4.4.8 for tNGS</li> </ul> <p>Validate Before Use</p> <p>CAUTION: The information produced by this program should not be used for clinical reporting unless and until extensive validation has occurred in your laboratory on a stable version. 
Otherwise, the outputs of tbp-parser are for research use only.</p>"},{"location":"versioning/#interpretation-documents","title":"Interpretation Documents","text":"<p>Interpretation documents for v1.2.2 and v1.4.4.8 are available in the root directory of the <code>tbp-parser</code> repository.</p> <p>Interpretation documents for other versions are available in the interpretation_docs directory on GitHub.</p>"},{"location":"versioning/#when-using-theiaprok","title":"When Using TheiaProk","text":"<p>FUTURE DEPRECATION NOTICE</p> <p>At the time of the PHB v2.3.0 release:</p> <ul> <li>all branches on Terra that have been mentioned in this documentation will be deleted. Please use the v2.3.0 version of TheiaProk moving forward.</li> <li>the <code>main</code> branch of tbp-parser will host v2.1.0 and above; earlier versions of tbp-parser will no longer be supported</li> </ul> <p>If you are running tbp-parser as part of the TheiaProk pipeline(s) with Terra, the following branches are recommended:</p> <ul> <li>To run v1.2.2 on Terra, please use the smw-tb-2024-01-16-dev branch.</li> <li>To run v1.4.4.8+ and v1.6.x+, please use the smw-tb-2024-05-03-dev branch.</li> <li>To run v1.5.x+ and v2.x+, please use the smw-tb-2024-05-03-who2-dev branch.</li> <li>To run v2.1.0+, please use the smw-tbprofiler-updates-dev branch until the time of the v2.3.0 PHB release.</li> </ul>"},{"location":"versioning/#version-differences","title":"Version Differences","text":"<p>For more information on the differences between versions, you can see the Brief Description of Versions or the Exhaustive List of Versions.</p>"},{"location":"versioning/brief/","title":"Brief Description of Versions","text":"<p>You may notice there are many releases; <code>tbp-parser</code> is in active development and each release is \"use at your own risk.\" We highly recommend upgrading to the latest release because new releases include important bug fixes. 
In order to help track the different changes, we have included a brief description of each release:</p> <ul> <li>v1.2.x &amp; below - the initial developmental stages of tbp-parser for WGS data</li> <li>v1.3.x - adds tNGS data parsing and includes some updates applicable to WGS parsing</li> <li>v1.4.x - reworks how QC is performed (changes in order of operations)<ul> <li>v1.4.3+ - changes how tNGS lineage determination is performed</li> <li>v1.4.4+ - changes how nonsynonymous mutations are interpreted; major interpretation differences from earlier versions</li> </ul> </li> <li>v1.6.x - only considers the genes included in the LIMS report to determine the drug output in the LIMS report</li> <li>v1.5.x+ and v2.0.0 - major code changes due to using results from TBProfiler v6.2.0+</li> <li>v2.1.0 - v1.6.0 and earlier versions are no longer supported; v2.1+ changes are included on the <code>main</code> branch moving forward.</li> </ul> <p>For a more exhaustive list, please visit the Exhaustive List of Versions.</p>"},{"location":"versioning/exhaustive/","title":"Exhaustive version descriptions","text":"<p>The following is a list of every version of <code>tbp-parser</code> and a short summary of the changes made in each version.</p> <p>Blue indicates that CDPH performed a clinical validation on that version.</p> <ul> <li>v1.0.0 - initial version</li> <li>v1.1.0 - adjusts the highest interpretation for a drug to only consider genes in LIMS report, adds the rule to the confidence column, adds QRDR expert rules for gyrA and gyrB</li> <li>v1.1.1 - fixes a bug in R/QRDR region calculations</li> <li>v1.1.2 - adjusts LIMS lineage designation by checking for BCG and if lineage from TB Profiler is empty</li> <li>v1.1.3 - now includes the TB Profiler sublineage output when determining BCG M bovis</li> <li>v1.1.4 - now checks if multiple lineages/sublineages were detected</li> <li>v1.1.5 - checks all mmpS/mmpL/mmpR alternate consequences; also checks to make sure all 
drugs are reported</li> <li>v1.1.5.1 - renames rifampicin to rifampin</li> <li>v1.1.6 - removes a locus warning with deletion caveat</li> <li>v1.1.7 - ensures all deletion caveat locus warnings are gone, overwrites all fields with locus warning with \u201cNA\u201d or \u201cInsufficient Coverage\u201d as appropriate and moves them to the bottom of the Laboratorian report</li> <li>v1.1.8 - changes overwrite to only overwrite interpretation values, not mutation information</li> <li>v1.1.9 - renames rifampicin to rifampin</li> <li>v1.2.0 - enables ability to provide alternate coverage bed file; introduced the modified regions (just coding region + 30bp upstream or promoter region)</li> <li>v1.2.1 - fixes a bug when renaming rifampicin to rifampin</li> <li>v1.2.2 (WGS) - improve how maximum MDL interpretation is calculated for the LIMS report. Use the smw-tb-2024-01-16-dev branch on Terra.</li> <li>v1.2.3 - check only the LIMS genes\u2019 coverage for LIMS lineage determination and use a threshold for all lineage designation</li> <li>v1.3.0 - adds tNGS regions, checks to make sure that only variants for genes in the coverage report are included in the laboratorian (tNGS), error-proof locus tag designation, add check to prevent failures when gene not in coverage dictionary (tNGS), adds \u201cNA\u201d to the mutation rank list (score = 0, same as Insufficient Coverage)</li> <li>v1.3.1 - adds <code>--tngs</code> flag to turn on tNGS-specific global parameters, establishes different threshold calculation for lineage designation for tNGS, checks the segment of a gene a variant was detected in, removes check that did not prevent failures when gene not in coverage dictionary from v1.3.0, error-proof all coverage checks, adds \u201cThis mutation is outside the expected region\u201d warning</li> <li>v1.3.2 - error-proofs coverage warning and adds additional section for tNGS gene segments, error-proofs gene tier for tNGS gene segments</li> <li>v1.3.3 - condenses most gene 
segments into one; for WT mutations, set the mutation to \u201cWT\u201d not \u201cNA\u201d</li> <li>v1.3.4 - error-proofs maximum mdl interpretation determination and maximum looker interpretation determination</li> <li>v1.3.5 - adds rrs &amp; rrl frequency input parameters to customize mutation frequency for those genes, overwrites gene MDL interpretation when \u201cInsufficient Coverage\u201d to act as if \u201cWT\u201d if greater than S</li> <li>v1.3.6 - adds the TBProfiler lineage to the end of the LIMS report and the Looker report, adds LIMS lineage to Looker report, introduces check if max MDL interpretation is also Insufficient Coverage to change output to Pending Retest</li> <li>v1.3.7 - add to the coverage report the \u201cexpert rule regions\u201d column for tNGS, overwrites gene MDL interpretation when \u201cInsufficient Coverage\u201d to act as if \u201cWT\u201d if greater than or equal to S</li> <li>v1.3.8 - add frequency input parameters for rpoB 449 and ethA 237, renames coverage threshold to minimum percent coverage</li> <li>v1.3.9 - check if gene name is rpoB because that means it\u2019s outside the expected region (tNGS - rpoB is in two segments), add rrs and rrl read support input parameters</li> <li>v1.4.0 - rework how QC is performed (order of operations)</li> <li>v1.4.1 - remove rpoB expected region check, implements deletion position quality check in QC (keep only valid deletions), if outside expected region warning, set MDL interpretations to NA</li> <li>v1.4.2 - remove \u201coutside expected region\u201d mutations from LIMS report, error-proofs determining responsible MDL interpretations</li> <li>v1.4.2.1 (same change in v1.5.4) - prevent overwriting \u201cR\u201d mutations with No Sequence, and overwrite \u201cU\u201d mutations with \u201cPending Retest\u201d if bad quality</li> <li>v1.4.3 - implement different thresholds for LIMS lineage identification for tNGS</li> <li>v1.4.4 - update expert rule interpretations (mainly S \u2192 U 
in several spots)</li> <li>v1.4.4.1 (v1.5.0 branched off of this one) - update LIMS threshold to 90, not the coverage threshold</li> <li>v1.4.4.2 (same change in v1.5.1) - fix an issue where \u201cNo sequence\u201d was not triggering Pending Retest</li> <li>v1.4.4.3 (same change in v1.5.5) - fix an issue where \u201cPending Retest\u201d was not properly appearing</li> <li>v1.4.4.4 (same change in v1.5.6) - prevent \u201cPending Retest\u201d if Insufficient Coverage is in a gene that also has a valid deletion</li> <li>v1.4.4.5 - consider deletions invalid if coverage is between 0 and minimum coverage (10 default) (this consideration is unique to old TB Profiler and not mimicked in v1.5)</li> <li>v1.4.4.6 - a mistake; updates the version (this release is a mystery to me as there is nothing in there except version update)</li> <li>v1.4.4.7 (same change in v1.5.8) - change tNGS LIMS lineage designation to check items in the coverage dictionary (to represent both rpoB segments)</li> <li>v1.4.4.8 (tNGS) (same change in v1.5.9) - reduce tNGS LIMS threshold from 90% to 70%. Use the smw-tb-2024-05-03-dev branch on Terra for this and all subsequent v1.4.4.x+ versions.</li> <li>v1.4.4.9 (same change in v1.5.7) - add optional input to add cycloserine to LIMS report</li> <li>v1.4.4.10 - fix issue when MDL resistance was being overwritten to Pending Retest but without considering other genes when calculating the highest MDL resistance (as the other genes may have had higher resistances that were not captured at first)</li> <li>v1.4.4.11 - fix issue introduced by last fix where we ran into indexing errors due to no more MDL interpretations available in the list</li> <li>v1.5.0 (branched off of v1.4.4.1) - make all language changes necessary to be compatible with TBProfiler v6.2.1. 
Use the smw-tb-2024-05-03-who2-dev branch on Terra for this and all subsequent v1.5.x+ versions.</li> <li>v1.5.1 (same change in v1.4.4.2) - fix an issue where \u201cNo sequence\u201d was not triggering Pending Retest</li> <li>v1.5.2 - a mistake; somehow exactly the same as 1.4.4.2?? (this release is also a mystery)</li> <li>v1.5.3 - make additional language changes and fix an unusual edge case where the same mutation was identified; rename mmpR5 to Rv0678 again</li> <li>v1.5.4 (same change in v1.4.2.1) - prevent overwriting \u201cR\u201d mutations with No Sequence</li> <li>v1.5.5 (same change in v1.4.4.3) - fix an issue where \u201cPending Retest\u201d was not properly appearing; consider only LIMS genes for LIMS report</li> <li>v1.5.6 (same change in v1.4.4.4) - prevent \u201cPending Retest\u201d if Insufficient Coverage is in a gene that also has a valid deletion</li> <li>v1.5.7 (same change in v1.4.4.9) - add optional input to add cycloserine to LIMS report</li> <li>v1.5.8 (same change in v1.4.4.7) - change tNGS LIMS lineage designation to check items in the coverage dictionary (to represent both rpoB segments; percentage calculation erroneously combined them)</li> <li>v1.5.9 (same change in v1.4.4.8) - reduce tNGS LIMS threshold from 90% to 70%</li> <li>v1.5.10 - correct spelling of two genes in the LIMS report for cycloserine</li> <li>v1.6.0 (branched off of v1.4.4.11) - ensures that only LIMS genes are being considered for the LIMS report. Use the smw-tb-2024-05-03-dev branch on Terra for this and all subsequent v1.6.x+ versions.</li> <li>v2.0.0 (branched off of v1.5.10; same change in v1.4.4.10 and v1.4.4.11) - fix issue when MDL resistance was being overwritten to Pending Retest but without considering other genes when calculating the highest MDL resistance (as the other genes may have had higher resistances that were not captured at first) and fixes the resulting issue where indexing errors occurred due to no more MDL interpretations. 
Use the smw-tb-2024-05-03-who2-dev branch on Terra for this and all subsequent v2.x+ versions.</li> <li>v2.1.0 - any mutations in the 60 proximal promoter regions included in the WHO v2 database (Table 22, pages 89-90). Use the smw-tbprofiler-updates-dev branch on Terra for this and all subsequent v2.1.x+ versions until the v2.3.0 release of TheiaProk.<ul> <li>Earlier versions are now deprecated and will no longer be supported.</li> </ul> </li> </ul> <p>The following diagram shows how each version is related to the others without technical details: </p>"}]}
\ No newline at end of file
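The coverage report's Percent_Coverage column is described as the percent of a gene's region with a read depth greater than `--min_depth` (default: 10). Below is a minimal Python sketch of that calculation. It assumes per-position depths like those printed by `samtools depth -a`, which outputs tab-separated chromosome, 1-based position, and depth.

The following details are illustrative assumptions, not part of tbp-parser's API:

- The helper names `parse_samtools_depth` and `percent_coverage` are hypothetical.
- The comparison is strict, following the documentation's "greater than" wording. Whether tbp-parser itself uses a strict or inclusive comparison is not confirmed here.

```python
from typing import Dict, Iterable


def parse_samtools_depth(lines: Iterable[str]) -> Dict[int, int]:
    """Parse `samtools depth` lines (chrom, 1-based pos, depth) into {pos: depth}.

    Hypothetical helper; assumes a single chromosome such as H37Rv.
    """
    depths: Dict[int, int] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        depths[int(pos)] = int(depth)
    return depths


def percent_coverage(depths: Dict[int, int], start: int, end: int, min_depth: int = 10) -> float:
    """Percent of positions in [start, end] (1-based, inclusive) with depth > min_depth.

    Positions missing from `depths` count as depth 0. For a BED interval
    (0-based, half-open), pass start=bed_start + 1 and end=bed_end.
    """
    if end < start:
        raise ValueError("end must be >= start")
    length = end - start + 1
    covered = sum(1 for pos in range(start, end + 1) if depths.get(pos, 0) > min_depth)
    return 100.0 * covered / length


if __name__ == "__main__":
    sample = [
        "NC_000962.3\t1\t12",
        "NC_000962.3\t2\t15",
        "NC_000962.3\t3\t4",
        "NC_000962.3\t4\t10",
    ]
    depths = parse_samtools_depth(sample)
    # Positions 1 and 2 exceed 10; position 4 (depth 10) does not.
    print(percent_coverage(depths, 1, 4))
```

Treating positions absent from the depth table as zero mirrors what `samtools depth -a` would report for uncovered bases, so the result does not depend on whether `-a` was used.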
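The laboratorian report defines `read_support` as depth * frequency. The LIMS and Looker drug columns report the highest interpretation found for the mutations associated with each drug. The Python sketch below illustrates both ideas on rows shaped like laboratorian-report rows.

The ranking in `RANK` is an assumption made for illustration. Only one detail comes from the changelog (v1.3.0): "NA" scores 0, the same as "Insufficient Coverage".

The sketch does not reproduce the following tbp-parser behaviour:

- its actual rank list;
- its "Pending Retest" handling;
- its restriction to LIMS genes.

```python
from typing import Iterable, Mapping

# Hypothetical ranking for illustration. Only "NA" = "Insufficient Coverage" = 0
# is taken from the tbp-parser changelog; the order of the other values is assumed.
RANK = {"R": 4, "U": 3, "S": 2, "WT": 1, "NA": 0, "Insufficient Coverage": 0}


def read_support(depth: int, frequency: float) -> float:
    """Number of reads supporting a mutation, computed as depth * frequency."""
    return depth * frequency


def highest_interpretation(
    rows: Iterable[Mapping[str, str]],
    drug: str,
    field: str = "mdl_interpretation",
) -> str:
    """Return the highest-ranked value of `field` among rows for one antimicrobial.

    Values missing from RANK are ranked 0; returns "NA" when no row matches the drug.
    """
    values = [row[field] for row in rows if row["antimicrobial"] == drug]
    if not values:
        return "NA"
    return max(values, key=lambda value: RANK.get(value, 0))


if __name__ == "__main__":
    report = [
        {"antimicrobial": "isoniazid", "mdl_interpretation": "S"},
        {"antimicrobial": "isoniazid", "mdl_interpretation": "U"},
        {"antimicrobial": "rifampin", "mdl_interpretation": "R"},
    ]
    print(highest_interpretation(report, "isoniazid"))
    print(read_support(120, 0.5))
```

Passing `field="looker_interpretation"` would apply the same reduction to the Looker interpretation. As the Looker page notes, that value can occasionally differ from the MDL one.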
diff --git a/v2.2.1/usage/index.html b/v2.2.1/usage/index.html
index 801ce9a..6fcc486 100644
--- a/v2.2.1/usage/index.html
+++ b/v2.2.1/usage/index.html
@@ -1035,16 +1035,17 @@ <h1>Getting Started</h1>
 <h2 id="installation">Installation<a class="headerlink" href="#installation" title="Permanent link">&para;</a></h2>
 <h3 id="docker">Docker<a class="headerlink" href="#docker" title="Permanent link">&para;</a></h3>
 <p>We highly recommend using the following Docker iamge to run tbp-parser:</p>
-<div class="language-bash highlight"><pre><span></span><code><span id="__span-0-1"><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>docker<span class="w"> </span>pull<span class="w"> </span>us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0<span class="w"> </span><span class="c1">#(1)!</span>
+<div class="language-bash highlight"><pre><span></span><code><span id="__span-0-1"><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>docker<span class="w"> </span>pull<span class="w"> </span>us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0<span class="w"> </span><span class="c1">#(1)!</span>
 </span></code></pre></div>
 <ol>
 <li>We host our Docker images on the Google Artifact Registry so that they are always availble for usage.</li>
 </ol>
-<p>The entrypoint for this Docker iamge is the <code>tbp-parser</code> help message. To run this container <em>interactively</em>, use the following command:</p>
-<div class="language-bash highlight"><pre><span></span><code><span id="__span-1-1"><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a>docker<span class="w"> </span>run<span class="w"> </span>-it<span class="w"> </span>--entrypoint<span class="o">=</span>/bin/bash<span class="w"> </span>us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0
-</span><span id="__span-1-2"><a id="__codelineno-1-2" name="__codelineno-1-2" href="#__codelineno-1-2"></a><span class="c1"># Once inside the container interactively, you can run the tbp-parser tool</span>
-</span><span id="__span-1-3"><a id="__codelineno-1-3" name="__codelineno-1-3" href="#__codelineno-1-3"></a>python3<span class="w"> </span>/tbp-parser/tbp_parser/tbp_parser.py<span class="w"> </span>-v
-</span><span id="__span-1-4"><a id="__codelineno-1-4" name="__codelineno-1-4" href="#__codelineno-1-4"></a><span class="c1"># v1.6.0</span>
+<p>The entrypoint for this Docker image is the <code>tbp-parser</code> help message. To run this container <em>interactively</em>, you can use the following command:</p>
+<div class="language-bash highlight"><pre><span></span><code><span id="__span-1-1"><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a>docker<span class="w"> </span>run<span class="w"> </span>-it<span class="w"> </span>--entrypoint<span class="o">=</span>/bin/bash<span class="w"> </span>us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0
+</span><span id="__span-1-2"><a id="__codelineno-1-2" name="__codelineno-1-2" href="#__codelineno-1-2"></a>
+</span><span id="__span-1-3"><a id="__codelineno-1-3" name="__codelineno-1-3" href="#__codelineno-1-3"></a><span class="c1"># Once inside the container interactively, you can run the tbp-parser tool</span>
+</span><span id="__span-1-4"><a id="__codelineno-1-4" name="__codelineno-1-4" href="#__codelineno-1-4"></a>python3<span class="w"> </span>/tbp-parser/tbp_parser/tbp_parser.py<span class="w"> </span>-v
+</span><span id="__span-1-5"><a id="__codelineno-1-5" name="__codelineno-1-5" href="#__codelineno-1-5"></a><span class="c1"># v2.1.0</span>
 </span></code></pre></div>
 <h3 id="locally-with-python">Locally with Python<a class="headerlink" href="#locally-with-python" title="Permanent link">&para;</a></h3>
 <p><code>tbp-parser</code> is not yet available with <code>pip</code> or <code>conda</code>. To run <code>tbp-parser</code> in your local command-line environment, install the following dependencies:</p>
@@ -1054,7 +1055,7 @@ <h3 id="locally-with-python">Locally with Python<a class="headerlink" href="#loc
 <li>importlib_resources</li>
 <li>samtools</li>
 </ul>
-<p>After installation of these dependencies, download and extract the latest release of <code>tbp-parser</code> and run the script with <code>Python</code>.</p>
+<p>After installation of these dependencies, download and extract the latest release of <code>tbp-parser</code> and run the script with <code>python3</code>.</p>
 <h2 id="usage">Usage<a class="headerlink" href="#usage" title="Permanent link">&para;</a></h2>
 <h3 id="example-usage">Example Usage<a class="headerlink" href="#example-usage" title="Permanent link">&para;</a></h3>
 <p>This shows how the script can be run if used inside the Docker container provided above.</p>
@@ -1067,7 +1068,7 @@ <h3 id="example-usage">Example Usage<a class="headerlink" href="#example-usage"
 </span><span id="__span-2-7"><a id="__codelineno-2-7" name="__codelineno-2-7" href="#__codelineno-2-7"></a>    --sequencing_method &quot;Illumina NextSeq&quot; \
 </span><span id="__span-2-8"><a id="__codelineno-2-8" name="__codelineno-2-8" href="#__codelineno-2-8"></a>    --operator &quot;John Doe&quot; 
 </span></code></pre></div>
-<p>Please note that the BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a .bai suffix.</p>
+<p>Please note that the BAM file must have the accompanying BAI file in the same directory.</p>
 <h3 id="help-message">Help Message<a class="headerlink" href="#help-message" title="Permanent link">&para;</a></h3>
 <p>The help message printed by <code>tbp-parser</code> is quite extensive, but has a lot of useful information regarding the input parameters. Here is the entire message in full. You can find more information regarding these inputs in the <a href="../inputs/inputs/">Inputs</a> section.</p>
 <div class="language-text highlight"><pre><span></span><code><span id="__span-3-1"><a id="__codelineno-3-1" name="__codelineno-3-1" href="#__codelineno-3-1"></a>usage: python3 /tbp-parser/tbp_parser/tbp_parser.py [-h|-v] &lt;input_json&gt; &lt;input_bam&gt; [&lt;args&gt;]
@@ -1186,7 +1187,7 @@ <h3 id="help-message">Help Message<a class="headerlink" href="#help-message" tit
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/versioning/brief/index.html b/v2.2.1/versioning/brief/index.html
index a841af7..b303e0b 100644
--- a/v2.2.1/versioning/brief/index.html
+++ b/v2.2.1/versioning/brief/index.html
@@ -948,10 +948,8 @@ <h1>Brief Description of Versions</h1>
 </ul>
 </li>
 <li><strong>v1.6.x</strong> - only considers the genes included in the LIMS report to determine the drug output in the LIMS report</li>
-<li><strong>v1.5.x+ and v2.0.0</strong> - major changes to code in due to using results from TB-Profiler v6.2.0+<ul>
-<li>code changes for v2.x are available on the <code>who-v2</code> branch of <code>tbp-parser</code></li>
-</ul>
-</li>
+<li><strong>v1.5.x+ and v2.0.0</strong> - major code changes due to using results from TBProfiler v6.2.0+</li>
+<li><strong>v2.1.0</strong> - <mark><em>v1.6.0 and earlier versions are no longer supported</em></mark>; v2.1+ changes are included on the <code>main</code> branch moving forward.</li>
 </ul>
 <p>For a more exhaustive list, please visit <a href="../exhaustive/">the Exhaustive List of Versions</a>.</p>
 
@@ -976,7 +974,7 @@ <h1>Brief Description of Versions</h1>
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/versioning/exhaustive/index.html b/v2.2.1/versioning/exhaustive/index.html
index aaac0f2..b0bf003 100644
--- a/v2.2.1/versioning/exhaustive/index.html
+++ b/v2.2.1/versioning/exhaustive/index.html
@@ -963,7 +963,7 @@ <h1>Exhaustive List of Versions</h1>
 <li>v1.3.3 - condenses most gene segments into one, for WT mutations, set the mutation to “WT” not “NA”</li>
 <li>v1.3.4 - error-proofs maximum mdl interpretation determination and maximum looker interpretation determination</li>
 <li>v1.3.5 - adds rrs &amp; rrl frequency input parameters to customize mutation frequency for those genes , overwrites gene MDL interpretation when “Insufficient Coverage” to act as if “WT” if greater than S</li>
-<li>v1.3.6 - adds the TB-Profiler lineage to the end of the LIMS report and the Looker report, adds LIMS lineage to Looker report, introduces check if max MDL interpretation is also Insufficient Coverage to change output to Pending Retest</li>
+<li>v1.3.6 - adds the TBProfiler lineage to the end of the LIMS report and the Looker report, adds LIMS lineage to Looker report, introduces check if max MDL interpretation is also Insufficient Coverage to change output to Pending Retest</li>
 <li>v1.3.7 - add to the coverage report the “expert rule regions” column for tNGS, overwrites gene MDL interpretation when “Insufficient Coverage” to act as if “WT” if greater than <em>or equal to</em> S</li>
 <li>v1.3.8 - add frequency input parameters for rpoB 449 and ethA 237, renames coverage threshold to minimum percent coverage</li>
 <li>v1.3.9 - check if gene name is rpoB because that means it’s outside the expected region (tNGS - rpoB is in two segments), add rrs and rrl read support input parameters</li>
@@ -997,6 +997,10 @@ <h1>Exhaustive List of Versions</h1>
 <li>v1.5.10 - correct spelling of two genes in the LIMS report for cycloserine</li>
 <li>v1.6.0 (branched off of v1.4.4.11) - ensures that only LIMS genes are being considered for the LIMS report. <em>Use the smw-tb-2024-05-03-dev branch on Terra for this and all subsequent v1.6.x+ versions.</em></li>
 <li>v2.0.0 (branched off of v1.5.10; same change in v1.4.4.10 and v1.4.4.11) - fix issue when MDL resistance was being overwritten to Pending Retest but without considering other genes when calculating the highest MDL resistance (as the other genes may have had higher resistances that were not captured at first) and fixes the resulting issue where indexing errors occurred due to no more MDL interpretations. <em>Use the smw-tb-2024-05-03-who2-dev branch on Terra for this and all subsequent v2.x+ versions.</em></li>
+<li>v2.1.0 - any mutations in the 60 proximal promoter regions included in the WHO v2 database (Table 22, pages 89-90). <em>Use the smw-tbprofiler-updates-dev branch on Terra for this and all subsequent v2.1.x+ versions until the v2.3.0 release of TheiaProk</em><ul>
+<li>Earlier versions are now deprecated and will no longer be supported.</li>
+</ul>
+</li>
 </ul>
 <hr />
 <p>The following diagram shows how each version is related to the others without technical details:
@@ -1023,7 +1027,7 @@ <h1>Exhaustive List of Versions</h1>
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     
diff --git a/v2.2.1/versioning/index.html b/v2.2.1/versioning/index.html
index 6ac5926..7851586 100644
--- a/v2.2.1/versioning/index.html
+++ b/v2.2.1/versioning/index.html
@@ -753,6 +753,17 @@
         
       
       
+        <label class="md-nav__link md-nav__link--active" for="__toc">
+          
+  
+  <span class="md-ellipsis">
+    Versioning and Releases
+  </span>
+  
+
+          <span class="md-nav__icon md-icon"></span>
+        </label>
+      
       <a href="./" class="md-nav__link md-nav__link--active">
         
   
@@ -763,6 +774,61 @@
 
       </a>
       
+        
+
+<nav class="md-nav md-nav--secondary" aria-label="Table of contents">
+  
+  
+  
+    
+  
+  
+    <label class="md-nav__title" for="__toc">
+      <span class="md-nav__icon md-icon"></span>
+      Table of contents
+    </label>
+    <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
+      
+        <li class="md-nav__item">
+  <a href="#validated-versions" class="md-nav__link">
+    <span class="md-ellipsis">
+      Validated Versions
+    </span>
+  </a>
+  
+</li>
+      
+        <li class="md-nav__item">
+  <a href="#interpretation-documents" class="md-nav__link">
+    <span class="md-ellipsis">
+      Interpretation Documents
+    </span>
+  </a>
+  
+</li>
+      
+        <li class="md-nav__item">
+  <a href="#when-using-theiaprok" class="md-nav__link">
+    <span class="md-ellipsis">
+      When Using TheiaProk
+    </span>
+  </a>
+  
+</li>
+      
+        <li class="md-nav__item">
+  <a href="#version-differences" class="md-nav__link">
+    <span class="md-ellipsis">
+      Version Differences
+    </span>
+  </a>
+  
+</li>
+      
+    </ul>
+  
+</nav>
+      
     </li>
   
 
@@ -944,18 +1010,36 @@
 
 
 <h1 id="versioning-and-releases">Versioning and Releases<a class="headerlink" href="#versioning-and-releases" title="Permanent link">&para;</a></h1>
+<h2 id="validated-versions">Validated Versions<a class="headerlink" href="#validated-versions" title="Permanent link">&para;</a></h2>
 <p>The California Department of Public Health has clinically validated the following versions:</p>
 <ul>
 <li><strong>v1.2.2 for WGS</strong>, and</li>
 <li><strong>v1.4.4.8 for tNGS</strong></li>
 </ul>
-<p>Interpretation documents for v1.2.2 and v1.4.4.8 are available in the <a href="https://www.github.com/theiagen/tbp-parser">root directory</a> of the <code>tbp-parser</code> repository; others are available in the <a href="https://github.com/theiagen/tbp-parser/tree/main/interpretation_docs">interpretation_docs</a> directory on GitHub.</p>
+<div class="admonition warning">
+<p class="admonition-title">Validate Before Use</p>
+<p><strong>CAUTION</strong>: The information produced by this program should <strong>not</strong> be used for clinical reporting unless and until extensive validation has occurred in <mark><em><strong>your</strong></em></mark> laboratory on a stable version. Otherwise, the outputs of tbp-parser are for research use only.</p>
+</div>
+<h2 id="interpretation-documents">Interpretation Documents<a class="headerlink" href="#interpretation-documents" title="Permanent link">&para;</a></h2>
+<p>Interpretation documents for v1.2.2 and v1.4.4.8 are available in the <a href="https://www.github.com/theiagen/tbp-parser">root directory</a> of the <code>tbp-parser</code> repository.</p>
+<p>Interpretation documents for other versions are available in the <a href="https://github.com/theiagen/tbp-parser/tree/main/interpretation_docs">interpretation_docs</a> directory on GitHub.</p>
+<h2 id="when-using-theiaprok">When Using TheiaProk<a class="headerlink" href="#when-using-theiaprok" title="Permanent link">&para;</a></h2>
+<div class="admonition dna">
+<p class="admonition-title">FUTURE DEPRECATION NOTICE</p>
+<p><mark><strong>At the time of the PHB v2.3.0 release:</strong></mark></p>
+<ul>
+<li><strong>all</strong> branches on Terra that have been mentioned in this documentation will be deleted. Please use the v2.3.0 version of TheiaProk moving forward.</li>
+<li>the <code>main</code> branch of tbp-parser will host v2.1.0 and above; earlier versions of tbp-parser will no longer be supported.</li>
+</ul>
+</div>
 <p>If you are running tbp-parser as part of the TheiaProk pipeline(s) with Terra, the following branches are recommended:</p>
 <ul>
 <li>To run v1.2.2 on Terra, please use the <strong>smw-tb-2024-01-16-dev</strong> branch.</li>
 <li>To run v1.4.4.8+ and v1.6.x+, please use the <strong>smw-tb-2024-05-03-dev</strong> branch.</li>
 <li>To run v1.5.x+ and v2.x+, please use the <strong>smw-tb-2024-05-03-<em>who2</em>-dev</strong> branch.</li>
+<li>To run v2.1.0+, please use the <strong>smw-tbprofiler-updates-dev</strong> branch <em>until the time of the v2.3.0 PHB release</em>.</li>
 </ul>
+<h2 id="version-differences">Version Differences<a class="headerlink" href="#version-differences" title="Permanent link">&para;</a></h2>
 <p>For more information on the differences between versions, you can see the <a href="brief/">Brief Description of Versions</a> or the <a href="exhaustive/">Exhaustive List of Versions</a>.</p>
 
 
@@ -979,7 +1063,7 @@ <h1 id="versioning-and-releases">Versioning and Releases<a class="headerlink" hr
     <span class="md-icon" title="Last update">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21 13.1c-.1 0-.3.1-.4.2l-1 1 2.1 2.1 1-1c.2-.2.2-.6 0-.8l-1.3-1.3c-.1-.1-.2-.2-.4-.2m-1.9 1.8-6.1 6V23h2.1l6.1-6.1zM12.5 7v5.2l4 2.4-1 1L11 13V7zM11 21.9c-5.1-.5-9-4.8-9-9.9C2 6.5 6.5 2 12 2c5.3 0 9.6 4.1 10 9.3-.3-.1-.6-.2-1-.2s-.7.1-1 .2C19.6 7.2 16.2 4 12 4c-4.4 0-8 3.6-8 8 0 4.1 3.1 7.5 7.1 7.9l-.1.2z"/></svg>
     </span>
-    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-08-20</span>
+    <span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-iso_date">2024-11-21</span>
   </span>
 
     

The path to the JSON file that was produced by `TB-Profiler`	The path to the JSON file that was produced by `TBProfiler`
input_bam	The path to the BAM file that was produced by `TB-Profiler`	The path to the BAM file that was produced by `TBProfiler`
--tngs_expert_regions	A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes	/data/tbdb-expert-regions.bed	/data/tbdb-expert-regions.bed
--rrs_frequency
Name	Description	Default Value
--verbose	Increases the output verbosity to describe which stage of the analysis is currently running	false
--debug	The highest level of output verbosity detailing every step of the analysis and logic implemented; overwrites --verbose	false
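The Example Usage section above notes that the BAM file passed to `tbp-parser` must have its accompanying BAI file in the same directory. A quick pre-flight check can catch a missing index before `tbp_parser.py` is run. The sketch below is a minimal example, assuming the index is named either `<sample>.bam.bai` (the default written by `samtools index`) or `<sample>.bai`. Which of these names `tbp-parser` itself looks for is an assumption here, and `check_bam_index` is a hypothetical helper that is not part of `tbp-parser`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (NOT part of tbp-parser): confirm that a BAM
# file has a BAI index in the same directory before calling tbp_parser.py.
# Assumption: the index is named <name>.bam.bai or <name>.bai.
check_bam_index() {
  local bam="$1"
  if [[ ! -f "$bam" ]]; then
    echo "BAM file not found: $bam" >&2
    return 1
  fi
  # samtools index writes <name>.bam.bai by default; <name>.bai is also common
  local candidates=("${bam}.bai" "${bam%.bam}.bai")
  local bai
  for bai in "${candidates[@]}"; do
    if [[ -f "$bai" ]]; then
      echo "found index: $bai"
      return 0
    fi
  done
  echo "no BAI index next to $bam; try: samtools index $bam" >&2
  return 1
}
```

In practice the check would be chained in front of the real command, for example `check_bam_index sample.bam && python3 /tbp-parser/tbp_parser/tbp_parser.py ...`, so that a missing index fails fast with a readable message instead of partway through the coverage calculation.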