Kamil S. Jaron edited this page Jan 8, 2025 · 1 revision

This will ultimately be an end-to-end genome profiling task, but for now it takes a k-mer pair histogram (the output of smudgeplot hetkmers), infers the 1n coverage, and generates smudgeplots. One important aspect to consider is the expected genome coverage: if you have a high-coverage dataset (>50x per haplotype), you might want to set the -cov_max parameter to something higher than its default of 50. If the plot seems to be mislabeled, you can specify the coverage you think is right with -cov (which disables -cov_min and -cov_max) and check how the resulting smudgeplot looks.

Usage

usage: smudgeplot.py [-h] [-o O] [-cov_min COV_MIN] [-cov_max COV_MAX] [-cov COV] [-c C] [-t TITLE] [-ylim YLIM] [-col_ramp COL_RAMP] [--invert_cols] [infile]

positional arguments:
  infile                name of the input tsv file with coverages and frequencies.

options:
  -h, --help            show this help message and exit
  -o O                  The pattern used to name the output (smudgeplot).
  -cov_min COV_MIN      Minimal coverage to explore (default 6)
  -cov_max COV_MAX      Maximal coverage to explore (default 50)
  -cov COV              Define coverage instead of inferring it. Disables cov_min and cov_max.
  -c C, -cov_filter C   Filter pairs with one of them having coverage below the specified threshold (default 0; disables parameter L)
  -t TITLE, --title TITLE
                        name printed at the top of the smudgeplot (default none).
  -ylim YLIM            The upper limit for the coverage sum (the y axis)
  -col_ramp COL_RAMP    An R palette used for the plot (default "viridis", other sensible options are "magma", "mako" or "grey.colors" - recommended in combination with --invert_cols).
  --invert_cols         Revert the colour palette (default False).
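
As an illustration of how the coverage options interact, here is a minimal Python sketch that assembles an argument list for the call described above. The helper name `build_smudgeplot_args` and the input file name `sample_kmerpairs.tsv` are hypothetical and not part of smudgeplot; the flags and defaults are taken from the help text. The list follows the usage line exactly as printed, so if your installed version expects a task name after `smudgeplot.py`, you would need to add it yourself.

```python
from typing import List, Optional


def build_smudgeplot_args(
    infile: str,
    out_prefix: str = "smudgeplot",
    cov: Optional[int] = None,
    cov_min: int = 6,
    cov_max: int = 50,
    title: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list using only flags listed in the help text.

    Hypothetical helper for illustration; it does not run smudgeplot.
    """
    args = ["smudgeplot.py", "-o", out_prefix]
    if cov is not None:
        # An explicit coverage disables the search range,
        # so -cov_min and -cov_max are left out.
        args += ["-cov", str(cov)]
    else:
        if cov_min > cov_max:
            raise ValueError("cov_min must not exceed cov_max")
        args += ["-cov_min", str(cov_min), "-cov_max", str(cov_max)]
    if title is not None:
        args += ["-t", title]
    args.append(infile)
    return args


# High-coverage dataset: widen the explored coverage range.
print(build_smudgeplot_args("sample_kmerpairs.tsv", out_prefix="sample", cov_max=100))

# Plot looks mislabeled: force the 1n coverage you believe is right.
print(build_smudgeplot_args("sample_kmerpairs.tsv", out_prefix="sample", cov=30))
```

The returned list can be passed to `subprocess.run` once you have confirmed the exact command for your installation. Keeping `-cov` and the `-cov_min`/`-cov_max` pair mutually exclusive in the helper mirrors the documented behaviour that `-cov` disables the other two.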