Commit

deprecate v1 version of tbp-parser and only support v2 moving forward (#16)

* update for tbprofiler v6.2.0

* update version

* add an additional case where R interim has a capital R

* genome_pos -> pos

* fix pending retest/no sequence issue & update version

* fix an edge case bug and rename consequences

* make sure mmpR5 is renamed appropriately

* update version tag

* prevent R mutations from being overwritten

* version update

* add comment field

* combine two conditionals

* fix issue with responsible genes not being restricted to the gene dictionary list

* update version

* add cycloserine

* apply deletion coverage fix

* fix percentage calculations

* update percentage limit for tngs

* lowercase ald/alr

* update version

* immediately exclude mutations with failed quality in the position warnings

* comma to parentheses

* add additional promoters; create standard function

* add comment

* bump version

* Update README.md

* update documentation and bump version to appropriate level

* update docs
sage-wright authored Nov 21, 2024
1 parent 4aee67c commit a5c4373
Showing 37 changed files with 396 additions and 224 deletions.
4 changes: 2 additions & 2 deletions Dockerfile
@@ -3,7 +3,7 @@
# Shelby Bennett, Erin Young, Curtis Kapsak, & Kutluhan Incekara

ARG SAMTOOLS_VER="1.18"
ARG TBP_PARSER_VER="1.6.0"
ARG TBP_PARSER_VER="2.1.0"

FROM ubuntu:jammy as builder

@@ -42,7 +42,7 @@ ARG TBP_PARSER_VER
LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="tbp-parser"
LABEL software.version="1.6.0"
LABEL software.version="2.1.0"
LABEL description="tbp-parser and samtools"
LABEL website="https://github.com/theiagen/tbp-parser"
LABEL license="https://github.com/theiagen/tbp-parser/blob/main/LICENSE"
4 changes: 2 additions & 2 deletions README.md
@@ -6,8 +6,8 @@

## Overview

This repository contains the tbp-parser tool which parses the JSON output of [Jody Phelan's TB-Profiler tool](https://github.com/jodyphelan/TBProfiler). Available as a downloadable Python package and as a Docker image, tbp-parser converts the output of TB-Profiler into four files.
This repository contains the tbp-parser tool which parses the JSON output of [Jody Phelan's TBProfiler tool](https://github.com/jodyphelan/TBProfiler). Available as a downloadable Python package and as a Docker image, tbp-parser converts the output of TBProfiler into four files.

Please reach out to us at [[email protected]](mailto:[email protected]) if you would like any custom file formats and/or changes to these output files that suit your individual needs.

[Please see our full documentation here](https://theiagen.github.io/tbp-parser/).
[See our full documentation here](https://theiagen.github.io/tbp-parser/).
9 changes: 5 additions & 4 deletions docs/algorithm/technical.md
@@ -1,10 +1,11 @@
---
title: "Technical Code Breakdown"
---
!!! tip inline end "Examples from TBProfiler v4.4.2"
The examples in this document are based on the output of TBProfiler v4.4.2. However, the general principles apply to all versions of TBProfiler and tbp-parser.

# Technical Code Breakdown

`tbp-parser` is object-oriented, with each class representing either *an output file*, *a part of an output file*, or *a part of the input JSON file* produced by TBProfiler.

The first class that is invoked by the `tbp-parser.py` script is `Parser` which is a control class that orchestrates the creation of the different output reports.
The first class that is invoked by the `tbp-parser.py` script is `Parser` which is a control class that orchestrates the creation of the different output reports.

## Calculating percent gene coverage

Binary file modified docs/assets/tbp-parser_versioning.png
9 changes: 9 additions & 0 deletions docs/index.md
@@ -3,6 +3,15 @@
!!! warning "Not for Diagnostic Use"
**CAUTION**: The information produced by this program should **not** be used for clinical reporting unless and until extensive validation has occurred in your laboratory on a stable version. Otherwise, the outputs of tbp-parser are for research use only.

!!! dna "FUTURE DEPRECATION NOTICE"
==**At the time of the PHB v2.3.0 release:**==

- **all** branches on Terra that have been mentioned in this documentation will be deleted. Please use the v2.3.0 version of TheiaProk moving forward.
- the `main` branch of tbp-parser will host v2.1.0 and above; earlier versions of tbp-parser will no longer be supported
- future releases of tbp-parser will only support outputs generated by TBProfiler v6.0.0 and above.

**Versions of TBProfiler prior to v6.0.0 are not compatible with v2+ of tbp-parser.** Please ensure that you are using the correct version of tbp-parser for your version of TBProfiler.

## Overview

`tbp-parser` is a tool developed in partnership with the California Department of Health (CDPH) to parse the output of [Jody Phelan’s TBProfiler tool](https://github.com/jodyphelan/TBProfiler) into four additional files:
15 changes: 8 additions & 7 deletions docs/inputs/inputs.md
@@ -6,16 +6,16 @@ The inputs on this page reflect the parameters that are applicable for the comma

## Required Inputs

`tbp-parser` is designed to run immediately after [Jody Phelan’s TB-Profiler tool](https://github.com/jodyphelan/TBProfiler). Only two inputs are required: the JSON file produced by `TB-Profiler` and the BAM file produced by `TB-Profiler`.
`tbp-parser` is designed to run immediately after [Jody Phelan’s TBProfiler tool](https://github.com/jodyphelan/TBProfiler). Only two inputs are required: the JSON file produced by `TBProfiler` and the BAM file produced by `TBProfiler`.

The JSON file contains information about the mutations detected in the sample: the quality, the type, and if that mutation confers resistance to an antimicrobial drug. The BAM file contains the alignment information for the sample and is needed for determining sequencing quality.

| Parameter | Description |
| :--------- | :---------- |
| input_json | The path to the JSON file that was produced by `TB-Profiler` |
| input_bam | The path to the BAM file that was produced by `TB-Profiler` |
| input_json | The path to the JSON file that was produced by `TBProfiler` |
| input_bam | The path to the BAM file that was produced by `TBProfiler` |

!!! info
!!! info "BAM index file required"
The BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a `.bai` suffix.
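
As an illustration of this requirement (a minimal sketch that is not part of `tbp-parser` or of this commit), the following Python helper looks for an index file next to a BAM file before the parser is run. The function name `find_bam_index` is hypothetical. Because the text above does not say whether the expected name is `sample.bam.bai` or `sample.bai`, the sketch assumes either could be present and checks both.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of a BAI file sitting next to bam_path, or None.

    Hypothetical helper: checks two common naming conventions, since the
    documentation does not state which one tbp-parser expects.
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # e.g. sample.bam.bai
        bam.with_suffix(".bai"),           # e.g. sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A calling script could stop early with a clear message when `None` is returned, rather than passing an unindexed BAM file on to the parser.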

## Optional Inputs
@@ -34,7 +34,7 @@ These options determine the thresholds for quality control.
| -c | --min_percent_coverage | The minimum percentage of a region that has depth above the threshold set by `min_depth` (used for a gene/locus to pass QC) | 100 |
| -s | --min_read_support | The minimum read support for a mutation to pass QC | 10 |
| -f | --min_frequency | The minimum frequency for a mutation to pass QC (0.1 -> 10%)| 0.1 |
| -r | --coverage_regions | A BED file containing the regions to calculate percent coverage for | [/data/tbdb-modified-regions.md](https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-modified-regions.bed) |
| -r | --coverage_regions | A BED file containing the regions to calculate percent coverage for | [/data/tbdb-modified-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed) |
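
To make the `--min_depth` and `--min_percent_coverage` thresholds concrete, here is a minimal Python sketch (not taken from the `tbp-parser` source) that computes the percentage of positions in a region meeting a depth threshold and compares it to the coverage cutoff. The function names are hypothetical, and the per-position depths are assumed to be available already as a list of integers. Treating a depth equal to `min_depth` as passing is also an assumption, since the table only says "above the threshold".

```python
def percent_coverage(depths, min_depth=10):
    """Percentage of positions whose depth meets min_depth.

    Assumption: a depth equal to min_depth counts as covered (>=).
    """
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


def region_passes_qc(depths, min_depth=10, min_percent_coverage=100):
    """True when the region's breadth of coverage reaches the cutoff."""
    return percent_coverage(depths, min_depth) >= min_percent_coverage


depths = [10, 12, 5, 20]          # one position falls below min_depth
print(percent_coverage(depths))   # 75.0
print(region_passes_qc(depths))   # False
```

With the default cutoff of 100, a single under-covered position is enough for the region to fail QC in this sketch.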

### Text Arguments

@@ -56,12 +56,12 @@ These options are used to customize the LIMS report

### tNGS-specific Arguments

These options are primarily used for tNGS data, although all frequency arguments are compatible with WGS data.
These options are primarily used for tNGS data, although all frequency and read support arguments are compatible with WGS data.

| Name | Description | Default Value |
| :--- | :---------- | :------------ |
| --tngs | Indicates that the input data was generated using the Deeplex + CDPH modified protocol. Turns on tNGS-specific global parameters | false |
| --tngs_expert_regions | A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes | [/data/tbdb-expert-regions.bed](https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-expert-regions.bed) |
| --tngs_expert_regions | A BED file containing the regions to calculate coverage for expert rule regions. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied. This is not used for QC purposes | [/data/tbdb-expert-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-expert-regions.bed) |
| --rrs_frequency | The minimum frequency for an _rrs_ mutation to pass QC, as _rrs_ has several problematic sites in the Deeplex tNGS assay | 0.1 |
| --rrl_frequency | The minimum frequency for an _rrl_ mutation to pass QC, as _rrl_ has several problematic sites in the Deeplex tNGS assay | 0.1 |
| --rrs_read_support | The minimum read support for an _rrs_ mutation to pass QC, as _rrs_ has several problematic sites in the Deeplex tNGS assay | 10 |
@@ -72,6 +72,7 @@ These options are primarily used for tNGS data, although all frequency arguments
### Logging Arguments

These options change the verbosity of the `stdout` log

| Name | Description | Default Value |
| :--- | :---------- | :------------ |
| --verbose | Increases the output verbosity to describe which stage of the analysis is currently running | false |
42 changes: 25 additions & 17 deletions docs/inputs/theiaprok.md
@@ -2,38 +2,46 @@
title: TheiaProk Inputs on Terra
---

When running `tbp-parser` as part of the TheiaProk workflow series ([find documentation for TheiaProk here](https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566?pvs=4)) on [Terra.bio](https://terra.bio), an optional input must be activated to instruct TheiaProk to run `tbp-parser`.
When running `tbp-parser` as part of the TheiaProk workflow series ([find documentation for TheiaProk here](https://theiagen.github.io/public_health_bioinformatics/latest/workflows/genomic_characterization/theiaprok/)) on [Terra.bio](https://terra.bio), an optional input must be activated to instruct TheiaProk to run `tbp-parser`.

`tbp-parser` is not on by default due to the nature of this tool and its outputs.

!!! info annotate "TheiaProk Version"
This information only corresponds to PHB v2.2.0. These inputs and outputs may not be applicable to other versions of TheiaProk.
This information only corresponds to the upcoming PHB v2.3.0 release. These inputs and outputs may not be applicable to other versions of TheiaProk.

*[PHB]: Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.

## Required Inputs

To activate `tbp-parser` you must set the following variable to true:

| Terra Task name | Variable | Type | Default value | Description |
| Terra Task name | Variable | Type | Description | Default Value |
| :-------------- | :------- | :--- | :------------ | :---------- |
| `merlin_magic` | `tbprofiler_additional_outputs` | Boolean | `false` | Set to `true` to activate `tbp-parser` |
| `merlin_magic` | **call_tbp_parser** | Boolean | Set to `true` to activate `tbp-parser` | `false` |

## Optional Inputs

The following optional inputs are also available for user modification on Terra:

| Terra Task name | Variable | Type | Default value | Description |
| Terra Task name | Variable | Type | Description | Default Value |
| :-------------- | :------- | :--- | :------------ | :---------- |
| `merlin_magic` | `tbp_parser_output_seq_method_type` | String | "WGS" | Fills out the “seq_method” field in the tbp_parser output files |
| `merlin_magic` | `tbp_parser_operator` | String | "Operator not provided" | The operator who ran the analysis; used in the LIMS & Looker reports |
| `merlin_magic` | `tbp_parser_min_depth` | Int | 10 | The minimum depth of coverage required for a site to pass QC |
| `merlin_magic` | `tbp_parser_min_frequency` | Int | 0.1 | The minimum frequency for a mutation to pass QC (0.1 -> 10%) |
| `merlin_magic` | `tbp_parser_min_read_support` | Int | 10 | The minimum read support for a mutation to pass QC |
| `merlin_magic` | `tbp_parser_coverage_threshold` | Int | 100 | The minimum percentage of a region that has depth above the threshold set by `min_depth` (used for a gene/locus to pass QC) |
| `merlin_magic` | `tbp_parser_coverage_regions_bed` | File | [tbdb-modified-regions.md](https://github.com/theiagen/tbp-parser/blob/v1.6.0/data/tbdb-modified-regions.bed) | A BED file containing the regions to calculate percent coverage for |
| `merlin_magic` | `tbp_parser_debug` | Boolean | false | Turn on debug mode for tbp-parser |
| `merlin_magic` | `tbp_parser_add_cs_lims` | Boolean | false | Adds Cycloserine (CS) fields to the LIMS report |
| `merlin_magic` | `tbp_parser_docker_image` | String | "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0" | The Docker image to use when running tbp-parser |

[Find the outputs for `tbp-parser` in TheiaProk on Terra here](../outputs/theiaprok.md).
| `merlin_magic` | **tbp_parser_add_cs_lims** | Boolean | Set to `true` to add Cycloserine (CS) fields to the LIMS report | `false` |
| `merlin_magic` | **tbp_parser_coverage_regions_bed** | File | A BED file containing the regions to calculate percent coverage for | [tbdb-modified-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed) |
| `merlin_magic` | **tbp_parser_coverage_threshold** | Int | The minimum percentage of a region that has depth above the threshold set by `min_depth` (used for a gene/locus to pass QC) | 100 |
| `merlin_magic` | **tbp_parser_debug** | Boolean | Set to `false` to turn off debug mode for `tbp-parser` | `true` |
| `merlin_magic` | **tbp_parser_docker_image** | String | The Docker image to use when running `tbp-parser` | "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0" |
| `merlin_magic` | **tbp_parser_etha237_frequency** | Float | Minimum frequency for a mutation in ethA at protein position 237 to pass QC in `tbp-parser` | 0.1 |
| `merlin_magic` | **tbp_parser_expert_rule_regions_bed** | File | A file that contains the regions where R mutations and expert rules are applied | |
| `merlin_magic` | **tbp_parser_min_depth** | Int | Minimum depth for a variant to pass QC in tbp_parser | 10 |
| `merlin_magic` | **tbp_parser_min_frequency** | Float | The minimum frequency for a mutation to pass QC | 0.1 |
| `merlin_magic` | **tbp_parser_min_read_support** | Int | The minimum read support for a mutation to pass QC | 10 |
| `merlin_magic` | **tbp_parser_operator** | String | Fills the "operator" field in the tbp_parser output files | "Operator not provided" |
| `merlin_magic` | **tbp_parser_output_seq_method_type** | String | Fills out the "seq_method" field in the tbp_parser output files | "Sequencing method not provided" |
| `merlin_magic` | **tbp_parser_rpob449_frequency** | Float | Minimum frequency for a mutation at protein position 449 to pass QC in `tbp-parser` | 0.1 |
| `merlin_magic` | **tbp_parser_rrl_frequency** | Float | Minimum frequency for a mutation in rrl to pass QC in `tbp-parser` | 0.1 |
| `merlin_magic` | **tbp_parser_rrl_read_support** | Int | Minimum read support for a mutation in rrl to pass QC in `tbp-parser` | 10 |
| `merlin_magic` | **tbp_parser_rrs_frequency** | Float | Minimum frequency for a mutation in rrs to pass QC in `tbp-parser` | 0.1 |
| `merlin_magic` | **tbp_parser_rrs_read_support** | Int | Minimum read support for a mutation in rrs to pass QC in `tbp-parser` | 10 |
| `merlin_magic` | **tbp_parser_tngs_data** | Boolean | Set to `true` to enable tNGS-specific parameters and runs in `tbp-parser` | `false` |

[Find the outputs for `tbp-parser` in TheiaProk on Terra here](../outputs/theiaprok.md).
4 changes: 2 additions & 2 deletions docs/outputs/coverage.md
@@ -27,6 +27,6 @@ If the `--tngs` flag is used, the report contains the following fields:
| QC_Warning | Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage |
| Coverage_Breadth_R_expert-rule_region | The percent of the regions (positions that could contain any resistance-conferring mutations or require expert-rule application) that is covered at a depth greater than the `--min_depth` value |

Coverage regions are determined with either the default "../data/tbdb-modified-regions.bed" (collected on Sep 1, 2023 from the TBProfiler repository, or if `--tngs`, "../data/tngs-reportable-regions.bed".
Coverage regions are determined with either the default [/data/tbdb-modified-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed) (collected on Sep 1, 2023 from the TBProfiler repository) or, if `--tngs` is used, [/data/tngs-reportable-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tngs-reportable-regions.bed).

The R-expert rule region is determined only if `--tngs` is indicated and uses the ranges in "../data/tngs-expert-rule-regions.bed".
The R-expert rule region is determined only if `--tngs` is indicated and uses the ranges in [/data/tbdb-expert-regions.bed](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-expert-regions.bed).
2 changes: 1 addition & 1 deletion docs/outputs/looker.md
@@ -24,7 +24,7 @@ The Looker report is intended for use in Google's Looker Studio Data Studio for
| pyrazinamide | The highest `looker_interpretation` resistance identified for mutations associated with this drug |
| rifampin | The highest `looker_interpretation` resistance identified for mutations associated with this drug |
| streptomycin | The highest `looker_interpretation` resistance identified for mutations associated with this drug |
| lineage | The lineage of the sample (the `main_lin` field as reported by TB-Profiler); for example, lineage1.2.1.2.1 |
| lineage | The lineage of the sample (the `main_lin` field as reported by TBProfiler); for example, lineage1.2.1.2.1 |
| ID | The lineage of the sample in human-readable language (the same as `M_DST_A01_ID` in the LIMS report) |
| analysis_date | The date `tbp-parser` was run in YYYY-MM-DD HH:SS format |
| operator | The name of the person who ran `tbp-parser`; can be provided with the `--operator` input parameter. If left blank, “Operator not provided” is the default value. |
4 changes: 2 additions & 2 deletions docs/outputs/theiaprok.md
@@ -3,10 +3,10 @@ title: TheiaProk Outputs on Terra
---


When running `tbp-parser` as part of the TheiaProk workflow series ([find documentation for TheiaProk here](https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566?pvs=4)) on [Terra.bio](https://terra.bio), you will find the following outputs in your data table.
When running `tbp-parser` as part of the TheiaProk workflow series ([find documentation for TheiaProk here](https://theiagen.github.io/public_health_bioinformatics/latest/workflows/genomic_characterization/theiaprok/)) on [Terra.bio](https://terra.bio), you will find the following outputs in your data table.

!!! info annotate "TheiaProk Version"
This information only corresponds to PHB v2.2.0. These inputs and outputs may not be applicable to other versions of TheiaProk.
This information only corresponds to the upcoming PHB v2.3.0 release. These inputs and outputs may not be applicable to other versions of TheiaProk.

*[PHB]: Public Health Bioinformatics is the GitHub repository that contains the TheiaProk workflows.

13 changes: 7 additions & 6 deletions docs/usage.md
@@ -9,18 +9,19 @@ title: Getting Started
We highly recommend using the following Docker image to run tbp-parser:

``` bash
docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0 #(1)!
docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0 #(1)!
```

1. We host our Docker images on the Google Artifact Registry so that they are always available for usage.

The entrypoint for this Docker iamge is the `tbp-parser` help message. To run this container *interactively*, use the following command:
The entrypoint for this Docker image is the `tbp-parser` help message. To run this container *interactively*, you can use the following command:

``` bash
docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0
docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.0

# Once inside the container interactively, you can run the tbp-parser tool
python3 /tbp-parser/tbp_parser/tbp_parser.py -v
# v1.6.0
# v2.1.0
```

### Locally with Python
@@ -32,7 +33,7 @@ python3 /tbp-parser/tbp_parser/tbp_parser.py -v
- importlib_resources
- samtools

After installation of these dependencies, download and extract the latest release of `tbp-parser` and run the script with `Python`.
After installation of these dependencies, download and extract the latest release of `tbp-parser` and run the script with `python3`.

## Usage

@@ -51,7 +52,7 @@ python3 /tbp-parser/tbp_parser/tbp_parser.py \
--operator "John Doe"
```

Please note that the BAM file must have the accompanying BAI file in the same directory. It must also be named exactly the same as the BAM file but ending with a .bai suffix.
Please note that the BAM file must have the accompanying BAI file in the same directory.

### Help Message
