add clumpify-based dedup #970

Open
wants to merge 56 commits into
base: master

Commits on Jul 2, 2019

  1. add bbmap.BBMapTool().dedup_clumpify()

    add bbmap.BBMapTool().dedup_clumpify(), along with unit tests
    tomkinsc committed Jul 2, 2019
    a5d58b5
  2. pass JVMmemory; add read_utils.rmdup_clumpify_bam; dedup_bam WDL task

    pass JVMmemory to bbmap and clumpify; add rmdup_clumpify_bam to read_utils.py; change TestRmdupUnaligned unit tests for bbmap to use read_utils.py::rmdup_clumpify_bam; add dedup_bam WDL task to tasks_read_utils.wdl
    tomkinsc committed Jul 2, 2019
    595764e
  3. switch from mvicuna to clumpify-based dedup in taxon_filter.py deplete

    replace mvicuna-based read deduplication in taxon_filter.py::deplete() with clumpify-based deduplication that occurs farther upstream in advance of BWA-based depletion; add dedup_bam WDL workflow; in dedup_bam WDL task, create and emit FastQC report of only de-duplicated reads; update unit test input to include dup reads, and update expected output for the test_taxon_filter::TestDepleteHuman integration tests to reflect difference in output from clumpify vs previous mvicuna output
    tomkinsc committed Jul 2, 2019
    09901d3
  4. replace unicode apostrophe

    tomkinsc committed Jul 2, 2019
    df208ea
  5. 98ac4fc
  6. bump dx-toolkit version and update URL to reflect new source

    DNAnexus seems to have replaced their wiki with a new documentation page ( https://documentation.dnanexus.com/downloads ) and the old download URLs along with it
    tomkinsc committed Jul 2, 2019
    784877a
  7. 232f9cd
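
The second commit above passes JVMmemory through to clumpify. A minimal sketch of how such a command line might be assembled is shown below. The helper name `build_clumpify_dedup_cmd`, its parameters, and the file names are hypothetical and are not taken from this PR; the `in=`, `out=`, `dedupe=t`, and `-Xmx` arguments follow the BBTools `clumpify.sh` usage text.

```python
def build_clumpify_dedup_cmd(in_fastq, out_fastq, jvm_memory="2g", extra_opts=None):
    """Return an argv list for a clumpify.sh dedup run (hypothetical helper)."""
    cmd = [
        "clumpify.sh",
        "-Xmx" + jvm_memory,   # JVM heap size, forwarded by the BBTools wrapper script
        "in=" + in_fastq,
        "out=" + out_fastq,
        "dedupe=t",            # ask clumpify to remove duplicate reads
    ]
    # Append any additional key=value options in a stable order.
    for key, value in sorted((extra_opts or {}).items()):
        cmd.append("{}={}".format(key, value))
    return cmd


# Illustrative file names only; nothing is executed here.
cmd = build_clumpify_dedup_cmd("reads.fastq", "reads.dedup.fastq", jvm_memory="4g")
print(" ".join(cmd))
# clumpify.sh -Xmx4g in=reads.fastq out=reads.dedup.fastq dedupe=t
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.check_call` without shell quoting.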

Commits on Jul 3, 2019

  1. add missing import

    tomkinsc committed Jul 3, 2019
    c01bb5b
  2. e25ef52
  3. 6ba96d4
  4. e8a4081
  5. 8280063
  6. b86b1c9

Commits on Jul 4, 2019

  1. avoid collision

    tomkinsc committed Jul 4, 2019
    8afe18f

Commits on Jul 10, 2019

  1. 7d2f45a
  2. f48038e

Commits on Jul 11, 2019

  1. c78f246
  2. 72fb4cd

Commits on Jul 13, 2019

  1. d97f773
  2. remove sambamba

    tomkinsc authored Jul 13, 2019
    a685a8a

Commits on Aug 1, 2019

  1. specify containment=t for bbmap clumpify

    Allow containments (where one sequence is shorter) when using bbmap clumpify to deduplicate
    tomkinsc committed Aug 1, 2019
    a2ce0f1
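
To illustrate what allowing containments means for deduplication, here is a toy sketch. It is not clumpify's algorithm: it assumes, purely for illustration, that a shorter read counts as contained when it is a prefix of a longer kept read. The function name `dedup_with_containment` is hypothetical.

```python
def dedup_with_containment(seqs):
    """Drop exact duplicates and any sequence that is a prefix of a longer kept one.

    Simplifying assumption: "containment" is modeled as a prefix match only.
    """
    kept = []
    # Visit longest sequences first so shorter reads are compared against longer ones.
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        if not any(longer.startswith(seq) for longer in kept):
            kept.append(seq)
    return kept


reads = ["ACGTACGT", "ACGTAC", "ACGTACGT", "TTGA"]
print(dedup_with_containment(reads))
# ['ACGTACGT', 'TTGA']
```

Without containment, "ACGTAC" would survive as a distinct read; with it, the shorter read is treated as a duplicate of "ACGTACGT".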

Commits on Aug 26, 2019

  1. 6f26717

Commits on Sep 11, 2019

  1. b73950e
  2. a3010ea
  3. 8199b12

Commits on Sep 12, 2019

  1. update miniconda ssl certs

    tomkinsc committed Sep 12, 2019
    6bb3f6b

Commits on Sep 20, 2019

  1. 97eff11
  2. f6f9b85

Commits on Oct 3, 2019

  1. 1674c44

Commits on Nov 7, 2019

  1. 4ca4693
  2. 8f8aaae
  3. Merge branch 'ct-add-clumpify' of ssh://github.com/broadinstitute/viral-ngs into ct-add-clumpify

    tomkinsc committed Nov 7, 2019
    218a12b
  4. update stage number

    tomkinsc committed Nov 7, 2019
    fa5e01e

Commits on Nov 8, 2019

  1. a0735c7

Commits on Nov 20, 2019

  1. 663deba
  2. demux_plus/demux_metag: merge linear parts of scatters, run spike-in on raw rather than de-duped reads

    tomkinsc committed Nov 20, 2019
    5a7ed3b
  3. 2be4a85
  4. 1d691b2
  5. 1ba7415
  6. fix bug in conda command quiet calling

    fix bug in conda command quiet calling ('-q -y' must be after 'conda <command>')
    tomkinsc committed Nov 20, 2019
    bb589a1
  7. maintain RG info in clumpify dedup; move processing to bbmap.py

    for bbmap clumpify de-dup, merge like-library RGs and perform deduplication on each, then gather the IDs of kept reads, and filter the input sam based on the list of IDs to keep so as to maintain header and RG information. move most of the processing to bbmap.py::dedup_clumpify so it has a simpler interface that accepts one bam and emits one bam. ToDo: parallelize across LBs
    tomkinsc committed Nov 20, 2019
    472703b
  8. ca726d0
  9. 3f9f188
  10. 995cf0d
  11. respecify kaiju deps

    tomkinsc committed Nov 20, 2019
    d54eff3
  12. 21a6ac4
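
The "maintain RG info in clumpify dedup" commit above describes grouping read groups by library, deduplicating each group, collecting the IDs of kept reads, and then filtering the original input by those IDs. A minimal in-memory sketch of that flow follows. It assumes reads are simple `(read_id, rg_id, sequence)` tuples with unique read IDs rather than SAM records, and it uses a stand-in exact-match dedup in place of clumpify. All function names here are hypothetical and are not the PR's code.

```python
from collections import defaultdict


def keep_first_of_each_sequence(lib_reads):
    """Stand-in for the clumpify step: keep the first read ID seen per exact sequence."""
    first_seen = {}
    for read_id, seq in lib_reads:
        first_seen.setdefault(seq, read_id)
    return set(first_seen.values())


def dedup_preserving_read_groups(reads, read_groups, dedup_fn=keep_first_of_each_sequence):
    """Deduplicate per library, then filter the original records by kept ID.

    reads:       list of (read_id, rg_id, sequence) tuples
    read_groups: list of dicts with "ID" and "LB" keys, as in SAM @RG lines
    """
    rg_to_lib = {rg["ID"]: rg["LB"] for rg in read_groups}

    # Merge reads from read groups that share a library (LB).
    per_lib = defaultdict(list)
    for read_id, rg_id, seq in reads:
        per_lib[rg_to_lib[rg_id]].append((read_id, seq))

    # Deduplicate each library separately and gather the IDs to keep.
    keep_ids = set()
    for lib_reads in per_lib.values():
        keep_ids |= dedup_fn(lib_reads)

    # Filter the untouched input records, so RG assignments are preserved.
    return [record for record in reads if record[0] in keep_ids]


read_groups = [
    {"ID": "rg1", "LB": "libA"},
    {"ID": "rg2", "LB": "libA"},
    {"ID": "rg3", "LB": "libB"},
]
reads = [
    ("r1", "rg1", "ACGT"),
    ("r2", "rg2", "ACGT"),  # duplicate of r1 within libA
    ("r3", "rg3", "ACGT"),  # same sequence, different library: kept
    ("r4", "rg1", "TTTT"),
]
print(dedup_preserving_read_groups(reads, read_groups))
# [('r1', 'rg1', 'ACGT'), ('r3', 'rg3', 'ACGT'), ('r4', 'rg1', 'TTTT')]
```

Filtering the original records by ID, rather than emitting the deduplicator's own output, is what lets header and read-group information pass through unchanged.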

Commits on Nov 21, 2019

  1. 13f5172
  2. change to clumpify for pre-depletion dedup

    change to clumpify for pre-depletion dedup; deduplication can likely be removed from depletion entirely in the future once all calls in the codebase have been updated to have one fewer arg
    tomkinsc committed Nov 21, 2019
    c1d18be
  3. 7c45da6
  4. remove rmdup from depletion call

    remove rmdup from depletion call, remove *.rmdup.bam from positional arguments for depletion CLI parser, remove *.rmdup.bam from inputs where depletion is called (test cases, WDL), remove *.rmdup.bam from expected depletion outputs. Change Snakemake merge_one_per_sample rule to call rmdup_clumpify_bam rather than rmdup_mvicuna_bam
    tomkinsc committed Nov 21, 2019
    d91eca5
  5. 12f73cb
  6. 16a2b50

Commits on Dec 2, 2019

  1. f1f9a40

Commits on Dec 3, 2019

  1. 49bffcb

Commits on Mar 27, 2020

  1. pass through single-end IDs for bbmap dedup

    single-end reads do not have /1 /2 mate suffix, so pass through IDs missing the suffix
    tomkinsc committed Mar 27, 2020
    362d0f3
  2. f816f6b
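
The single-end pass-through described in the first commit above can be sketched as follows. This is an illustrative helper, not the PR's code, and the name `strip_mate_suffix` is hypothetical. It removes a trailing /1 or /2 mate suffix when present and returns IDs without the suffix unchanged.

```python
def strip_mate_suffix(read_id):
    """Return read_id without a trailing /1 or /2 mate suffix.

    Single-end read IDs carry no such suffix and pass through unchanged.
    """
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


ids = ["readA/1", "readA/2", "readB"]
print(sorted({strip_mate_suffix(i) for i in ids}))
# ['readA', 'readB']
```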