roi: update to docs #148

Merged
merged 1 commit into from
Sep 24, 2023
21 changes: 17 additions & 4 deletions docs/content/roi.md
@@ -2,7 +2,7 @@

## Synopsis
```shell
-$ dnmtools roi [OPTIONS] <intervals.bed> <input.meth>
+$ dnmtools roi [OPTIONS] <intervals.bed> <input.counts>
```

## Description
@@ -17,15 +17,18 @@ found in the documentation for the `levels` command.

The `roi` command requires two input files. The first is a
sorted [counts output file](../counts),
-i.e. `input.meth` in the example above. This file provides data for
+i.e. `input.counts` in the example above. This file provides data for
every site, either a cytosine or CpG, that is of interest. The second
input file (`intervals.bed`) specifies the genomic intervals in which
methylation statistics should be summarized. If either file is not
sorted by (chrom,end,start,strand) it can be sorted using the
following command:
```shell
-$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.meth input.meth
+$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.counts input.counts
```
+Note: As of v1.4.0, the sorted order of chromosomes/targets within these
+files is not important, but the sites within each chromosome must
+still be sorted.

The intervals must be specified as a BED format file, and these can be
sorted using [bedtools
@@ -35,9 +38,19 @@ formats: (1) 6-column BED format, which may have more than 6 columns,
but requires the first 6 columns to match the specification, or (2)
3-column BED format.

+*An important note about the input files:* several aspects of the
+output for `roi` depend on the number of sites within each region of
+interest. If the `.counts` file provided as input does not have all
+the sites you might expect, for example if it is missing sites that
+have been excluded from some earlier step in your pipeline, then the
+results will be affected. We hope to make `roi` more robust to this
+issue in the future, for example by accepting some information about
+the reference genome to ensure that the numbers of sites are as
+expected by the user.
+
From there, the `roi` command can be run as follows:
```shell
-$ dnmtools roi -o output.bed regions.bed input-sorted.meth
+$ dnmtools roi -o output.bed regions.bed input-sorted.counts
```

The default output format is a 6-column BED format file, with the
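
The sort command in the changed docs rewrites a file unconditionally. As a rough sketch, the same keys can be passed to the standard `sort -c` option to check whether a file is already in (chrom,end,start,strand) order before rewriting it. The file names, the sample intervals, and the `is_sorted` helper below are hypothetical and not part of dnmtools. This check also enforces chromosome order, which the note above says `roi` no longer requires as of v1.4.0, so a file that fails it may still be acceptable to `roi`.

```shell
# Hypothetical 6-column BED file, deliberately written out of order.
workdir=$(mktemp -d)
printf 'chr1\t300\t400\tb\t0\t+\nchr1\t100\t200\ta\t0\t+\n' > "$workdir/intervals.bed"

# Same keys as the sort command in the docs; -c only checks the order
# and exits non-zero on the first out-of-order line, without rewriting.
is_sorted() {
  LC_ALL=C sort -c -k 1,1 -k 3,3n -k 2,2n -k 6,6 "$1" 2>/dev/null
}

if is_sorted "$workdir/intervals.bed"; then
  status_before=sorted
else
  status_before=unsorted
  # Sort into a new file so the original input is left untouched.
  LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 \
    -o "$workdir/intervals-sorted.bed" "$workdir/intervals.bed"
fi
```

Running the check first avoids re-sorting large files that are already in order.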