updates lims ID
sage-wright committed Dec 23, 2024
1 parent e6d1afb commit 232e8e6
Showing 6 changed files with 41 additions and 37 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -3,7 +3,7 @@
# Shelby Bennett, Erin Young, Curtis Kapsak, & Kutluhan Incekara

ARG SAMTOOLS_VER="1.18"
ARG TBP_PARSER_VER="2.2.2"
ARG TBP_PARSER_VER="2.3.0"

FROM ubuntu:jammy AS builder

2 changes: 1 addition & 1 deletion docs/inputs/theiaprok.md
@@ -29,7 +29,7 @@ The following optional inputs are also available for user modification on Terra:
| `merlin_magic` | **tbp_parser_coverage_regions_bed** | File | A BED file containing the regions to calculate percent coverage for | [tbdb-modified-regions.md](https://github.com/theiagen/tbp-parser/blob/main/data/tbdb-modified-regions.bed) |
| `merlin_magic` | **tbp_parser_coverage_threshold** | Int | The minimum percentage of a region that has depth above the threshold set by `min_depth` (used for a gene/locus to pass QC) | 100 |
| `merlin_magic` | **tbp_parser_debug** | Boolean | Set to `false` to turn off debug mode for `tbp-parser` | `true` |
| `merlin_magic` | **tbp_parser_docker_image** | String | The Docker image to use when running `tbp-parser` | "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.2.2" |
| `merlin_magic` | **tbp_parser_docker_image** | String | The Docker image to use when running `tbp-parser` | "us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.3.0" |
| `merlin_magic` | **tbp_parser_etha237_frequency** | Float | Minimum frequency for a mutation in ethA at protein position 237 to pass QC in `tbp-parser` | 0.1 |
| `merlin_magic` | **tbp_parser_expert_rule_regions_bed** | File | A file that contains the regions where R mutations and expert rules are applied | |
| `merlin_magic` | **tbp_parser_min_depth** | Int | Minimum depth for a variant to pass QC in tbp_parser | 10 |
4 changes: 2 additions & 2 deletions docs/usage.md
@@ -9,7 +9,7 @@ title: Getting Started
We highly recommend using the following Docker image to run tbp-parser:

``` bash
docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.2.2 #(1)!
docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.3.0 #(1)!
```

1. We host our Docker images on the Google Artifact Registry so that they are always available for usage.
@@ -21,7 +21,7 @@ docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiage

# Once inside the container interactively, you can run the tbp-parser tool
python3 /tbp-parser/tbp_parser/tbp_parser.py -v
# v2.2.2
# 2.3.0
```

### Locally with Python
3 changes: 2 additions & 1 deletion docs/versioning/exhaustive.md
@@ -68,7 +68,8 @@ The following is a list of every version of `tbp-parser` and a short summary of
- v2.2.0 - removes ciprofloxacin, fluoroquinolones, and ofloxacin from gyrA and gyrB and aminoglycosides from rrs in the `globals.GENE_TO_ANTIMICROBIAL_DRUG_NAME` dictionary; if a drug is missing in the TBProfiler JSON's gene_associated_drug field that is present in that global dictionary, it will be added for the mutation.
- v2.2.1 - fixes a bug where rifampicin was not renamed to rifampin, which caused duplicate lines to appear in the Laboratorian report.
- v2.2.2 - removes the high-level and low-level resistance comments from the LIMS report

- v2.3.0 - reworks the lineage detection: if TBProfiler detects a lineage, it is reported; if TBProfiler reports no lineage, M.tb is called as detected only when the fraction of LIMS genes above the coverage threshold meets the (now lower) default limit of 0.7

---

The following diagram shows how each version is related to the others without technical details:
65 changes: 34 additions & 31 deletions tbp_parser/LIMS.py
@@ -28,18 +28,18 @@ def get_id(self):
"""
self.logger.info("LIMS:Within LIMS class get_id function")

# if the percentage of genes above the coverage threshold is greater than 70%, then we can call the lineage if TBProfiler did not designate it
percentage_limit = 0.7

# calculate percentage of genes in the LIMS report above the coverage threshold
self.logger.debug("LIMS:Calculating the percentage of LIMS genes above the coverage threshold")
if self.tngs:
number_of_lims_genes_above_coverage_threshold = sum(int(globals.COVERAGE_DICTIONARY[gene]) >= 90 for gene in globals.COVERAGE_DICTIONARY.keys())
percentage_lims_genes_above = number_of_lims_genes_above_coverage_threshold / len(globals.COVERAGE_DICTIONARY.keys())
# if the percentage of genes above the coverage threshold is greater than 70%, then we can call the lineage
percentage_limit = 0.7

else:
number_of_lims_genes_above_coverage_threshold = sum(int(globals.COVERAGE_DICTIONARY[gene]) >= globals.COVERAGE_THRESHOLD for gene in globals.GENES_FOR_LIMS)
percentage_lims_genes_above = number_of_lims_genes_above_coverage_threshold / len(globals.GENES_FOR_LIMS)
# if the percentage of genes above the coverage threshold is greater than 90%, then we can call the lineage
percentage_limit = 0.9

self.logger.debug("LIMS:The percentage of LIMS genes above the coverage threshold is {}".format(percentage_lims_genes_above))
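
The coverage calculation in this hunk can be sketched outside the class roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: `fraction_above_threshold` is a hypothetical helper, plain arguments stand in for `globals.COVERAGE_DICTIONARY`, `globals.GENES_FOR_LIMS`, and `globals.COVERAGE_THRESHOLD`, and the empty-input guard is an addition that does not appear in the diff.

```python
def fraction_above_threshold(coverage, genes=None, threshold=90):
    """Return the fraction of genes whose coverage value meets the threshold.

    coverage  -- dict mapping gene name to percent coverage
                 (stand-in for globals.COVERAGE_DICTIONARY)
    genes     -- genes to consider; None means every key in `coverage`,
                 which mirrors the tNGS branch in the diff
    threshold -- minimum coverage; the tNGS branch hard-codes 90, the WGS
                 branch uses globals.COVERAGE_THRESHOLD
    """
    names = list(coverage) if genes is None else list(genes)
    if not names:
        # Assumption: the diff has no such guard; added to avoid dividing by zero.
        return 0.0
    # int() truncates the coverage value, as in the diff.
    passing = sum(int(coverage[name]) >= threshold for name in names)
    return passing / len(names)


# Hypothetical coverage values, for illustration only.
coverage = {"katG": 100.0, "rpoB": 95.2, "pncA": 40.0, "embB": 99.9}

tngs_fraction = fraction_above_threshold(coverage)  # all keys, threshold 90
wgs_fraction = fraction_above_threshold(
    coverage, genes=["katG", "rpoB", "pncA"], threshold=100
)
```

After this commit, both branches compare the resulting fraction against the same `percentage_limit` of 0.7, where the WGS branch previously used 0.9.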

@@ -54,37 +54,40 @@ def get_id(self):
self.logger.debug("LIMS:The detected lineage is: '{}', and the detected sublineage is: '{}'".format(detected_lineage, detected_sublineage))

sublineages = detected_sublineage.split(";")
if percentage_lims_genes_above >= percentage_limit:
self.logger.debug("LIMS:Percentage of LIMS genes above the coverage threshold is GREATER than 90%")

if self.tngs:
self.logger.debug("LIMS:The sequencing method is tNGS; now checking for a His57Asp mutation in pncA")
pncA_mutations = globals.DF_LABORATORIAN[(globals.DF_LABORATORIAN["tbprofiler_gene_name"] == "pncA")]
if "p.His57Asp" in pncA_mutations["tbprofiler_variant_substitution_aa"].tolist():
self.logger.debug("LIMS:p.His57Asp detected in pncA, lineage is likely M. bovis")
lineage.add("DNA of Mycobacterium bovis detected")
else:
self.logger.debug("LIMS:p.His57Asp not detected in pncA, lineage is likely M. tuberculosis")
lineage.add("DNA of Mycobacterium tuberculosis complex detected (not M. bovis)")
if self.tngs:
self.logger.debug("LIMS:The sequencing method is tNGS; now checking for a His57Asp mutation in pncA")
pncA_mutations = globals.DF_LABORATORIAN[(globals.DF_LABORATORIAN["tbprofiler_gene_name"] == "pncA")]
if "p.His57Asp" in pncA_mutations["tbprofiler_variant_substitution_aa"].tolist():
self.logger.debug("LIMS:p.His57Asp detected in pncA, lineage is likely M. bovis")
lineage.add("DNA of Mycobacterium bovis detected")
else:
self.logger.debug("LIMS:p.His57Asp not detected in pncA, lineage is likely M. tuberculosis")
lineage.add("DNA of Mycobacterium tuberculosis complex detected (not M. bovis)")

else:
self.logger.debug("LIMS:The sequencing method is WGS; now checking the TBProfiler lineage calls")
if "lineage" in detected_lineage:
lineage.add("DNA of Mycobacterium tuberculosis species detected")

for sublineage in sublineages:
if "BCG" in detected_lineage or "BCG" in sublineage:
lineage.add("DNA of Mycobacterium bovis BCG detected")

elif ("La1" in detected_lineage or "La1" in sublineage) or ("bovis" in detected_lineage or "bovis" in sublineage):
lineage.add("DNA of Mycobacterium bovis (not BCG) detected")

if len(lineage) == 0:
if (percentage_lims_genes_above >= percentage_limit):
self.logger.debug("LIMS:Percentage of LIMS genes above the coverage threshold is GREATER than 90% AND no lineage has been detected")
self.logger.debug("LIMS:TBProfiler was likely unable to determine the lineage, but since percentage_lims_genes_above >= percentage_limit, we will assume M.tb")

else:
self.logger.debug("LIMS:The sequencing method is WGS; now checking the TBProfiler lineage calls")
if "lineage" in detected_lineage:
lineage.add("DNA of Mycobacterium tuberculosis species detected")

for sublineage in sublineages:
if "BCG" in detected_lineage or "BCG" in sublineage:
lineage.add("DNA of Mycobacterium bovis BCG detected")

elif ("La1" in detected_lineage or "La1" in sublineage) or ("bovis" in detected_lineage or "bovis" in sublineage):
lineage.add("DNA of Mycobacterium bovis (not BCG) detected")

if detected_lineage == "" or detected_lineage == "NA" or len(lineage) == 0:
lineage.add("DNA of Mycobacterium tuberculosis complex detected")

else:
self.logger.debug("LIMS:Percentage of LIMS genes above the coverage threshold is LESS than 90%")
lineage.add("DNA of Mycobacterium tuberculosis complex NOT detected")
else:
self.logger.debug("LIMS:Percentage of LIMS genes above the coverage threshold is LESS than 90% AND no lineage has been detected")
lineage.add("DNA of Mycobacterium tuberculosis complex NOT detected")

lineage = "; ".join(sorted(lineage))
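
Because this view drops indentation and the added/removed markers, the post-commit control flow is hard to read from the hunk alone. The sketch below is one reading of the new logic, checked against the v2.3.0 changelog entry. `lims_id` is a hypothetical standalone function, not part of tbp-parser; `pnca_aa_changes` stands in for the `tbprofiler_variant_substitution_aa` column of the pncA rows in `globals.DF_LABORATORIAN`, and the nesting is inferred rather than confirmed.

```python
def lims_id(detected_lineage, detected_sublineage, fraction_above,
            tngs=False, pnca_aa_changes=(), percentage_limit=0.7):
    """Return the LIMS ID string, following one reading of the v2.3.0 logic."""
    lineage = set()
    sublineages = detected_sublineage.split(";")

    if tngs:
        # tNGS: a His57Asp change in pncA is treated as a marker of M. bovis.
        if "p.His57Asp" in pnca_aa_changes:
            lineage.add("DNA of Mycobacterium bovis detected")
        else:
            lineage.add("DNA of Mycobacterium tuberculosis complex detected (not M. bovis)")
    else:
        # WGS: report whatever TBProfiler called, regardless of coverage.
        if "lineage" in detected_lineage:
            lineage.add("DNA of Mycobacterium tuberculosis species detected")
        for sublineage in sublineages:
            if "BCG" in detected_lineage or "BCG" in sublineage:
                lineage.add("DNA of Mycobacterium bovis BCG detected")
            elif ("La1" in detected_lineage or "La1" in sublineage) or \
                 ("bovis" in detected_lineage or "bovis" in sublineage):
                lineage.add("DNA of Mycobacterium bovis (not BCG) detected")

    # Coverage is consulted only when no lineage was assigned above.
    if len(lineage) == 0:
        if fraction_above >= percentage_limit:
            lineage.add("DNA of Mycobacterium tuberculosis complex detected")
        else:
            lineage.add("DNA of Mycobacterium tuberculosis complex NOT detected")

    return "; ".join(sorted(lineage))


# A TBProfiler lineage call is reported even when coverage is low.
low_coverage_call = lims_id("lineage4", "lineage4.1", fraction_above=0.2)
# With no lineage call, the 0.7 limit decides the outcome.
no_call_high = lims_id("", "", fraction_above=0.75)
no_call_low = lims_id("", "", fraction_above=0.5)
```

In this reading the tNGS branch always adds an entry, so the coverage fallback only ever fires for WGS samples. Whether the `len(lineage) == 0` check sits inside or after the WGS branch therefore makes no behavioral difference.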

2 changes: 1 addition & 1 deletion tbp_parser/__init__.py
@@ -1 +1 @@
__VERSION__ = "v2.2.0"
__VERSION__ = "v2.3.0"
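
This commit bumps the version string in several places (the Dockerfile `ARG`, two docs pages, and `__init__.py`). The pre-commit `__init__.py` value shown above was `v2.2.0` while the Dockerfile was at `2.2.2`, so the two had drifted. A small check along these lines could flag that kind of mismatch. It is a hypothetical sketch: `normalize` and `versions_agree` are not part of tbp-parser, and the strings are passed in directly rather than read from the repository files.

```python
def normalize(version):
    """Strip whitespace and a leading 'v' so 'v2.3.0' and '2.3.0' compare equal."""
    return version.strip().lstrip("vV")


def versions_agree(*versions):
    """Return True when every version string is the same after normalization."""
    return len({normalize(v) for v in versions}) <= 1


# Values taken from this diff: Dockerfile ARG vs. __init__.py, after and before.
after_commit = versions_agree("2.3.0", "v2.3.0")
before_commit = versions_agree("2.2.2", "v2.2.0")
```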
