Merge pull request #443 from deeptools/develop
Develop
joachimwolff authored Oct 8, 2019
2 parents 6627801 + 4c8e796 commit cee139f
Showing 45 changed files with 6,524 additions and 449 deletions.
4 changes: 2 additions & 2 deletions README.rst
@@ -10,8 +10,8 @@
HiCExplorer
===========

Set of programs to process, analyze and visualize Hi-C data
-----------------------------------------------------------
Set of programs to process, analyze and visualize Hi-C and cHi-C data
---------------------------------------------------------------------

Sequencing techniques that probe the 3D organization of the genome generate large amounts of data whose processing,
analysis and visualization is challenging. Here, we present HiCExplorer, a set of tools for the analysis and
7 changes: 7 additions & 0 deletions bin/hicPlotSVL
@@ -0,0 +1,7 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from hicexplorer.hicPlotSVL import main

if __name__ == "__main__":
    main()
29 changes: 29 additions & 0 deletions docs/content/News.rst
@@ -1,6 +1,35 @@
News and Developments
=====================

Release 3.3
-----------
**8 October 2019**

- Fixing many bugs:

  - A bug in hicDetectLoops if a sub-matrix was very small
  - A bug in hicPlotMatrix if the region defined by --region was only a chromosome and loops should be plotted too
  - A bug in hicPlotMatrix if a loop region should be plotted and the chromosomeOrder argument was used too
  - A bug in hicAggregateContacts (issue #405) if chromosomes were present in the matrix but not in the bed file
  - A bug in hicAdjustMatrix concerning a bed file and consecutive regions, see issue #343
  - A bug in hicAdjustMatrix if a chromosome is present in the matrix but not in the bed file, see issue #397
  - A bug in hicCompartmentsPolarization: the arguments 'quantile' and 'outliers' were interpreted as strings but should be integers
  - A bug in hicAdjustMatrix concerning the 'keep' option and how matrices are reordered internally. Thanks @LeilyR

- Added features as requested:

  - hicPCA now ignores masked bins, see issue #342
  - chicPlotViewpoint:

    - Better legend handling on x-axis
    - Peaks are now displayed with their full width
    - Add option `--pValueSignificantLevels` to categorize the p-values in x levels (e.g. 0.001 0.05 0.1)

  - chicViewpoint:

    - Adding an option to sort by viewpoints instead of by samples (--allViewpointsList)

  - Adding an option to hicNormalize to normalize via multiplication with a user-defined value (see issues #385, #424)

- Rearrange hicAdjustMatrix to have a better accessibility to its functions from outside of main
- Improving the documentation and fixing grammar / spelling mistakes. Thanks @simonbray
- New script: hicPlotSVL to investigate short range vs long range ratios.


Release 3.2
-----------
**22 August 2019**
87 changes: 43 additions & 44 deletions docs/content/capture-Hi-C.rst

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/content/list-of-tools.rst
@@ -66,6 +66,8 @@ HiCExplorer tools
| | | one PCA bedgraph file | polarization plot | |
+--------------------------------------+------------------+-----------------------------------+---------------------------------------------+-----------------------------------------------------------------------------------+
|:ref:`hicPlotAverageRegions` | visualization | one npz file | one image | Visualization of hicAverageRegions. |
+--------------------------------------+------------------+-----------------------------------+---------------------------------------------+-----------------------------------------------------------------------------------+
|:ref:`hicPlotSVL` | analysis | one / multiple Hi-C matrices | one image, p-values file, raw data file | Computes short/long range contacts; a box plot, a p-value and raw data file |
+--------------------------------------+------------------+-----------------------------------+---------------------------------------------+-----------------------------------------------------------------------------------+
|:ref:`hicMergeTADbins` | preprocessing | one Hi-C matrix, one BED file | one Hi-C matrix | Uses a BED file of domains or TAD boundaries to merge the |
| | | | | bin counts of a Hi-C matrix. |
@@ -167,6 +169,8 @@ Tools for Hi-C data analysis
""""""""""""""""""""
:ref:`hicCompartmentsPolarization`
""""""""""""""""""""""""""""""""""
:ref:`hicPlotSVL`
""""""""""""""""""""""""""""""""""

Tools for TADs processing
^^^^^^^^^^^^^^^^^^^^^^^^^
77 changes: 77 additions & 0 deletions docs/content/tools/hicPlotSVL.rst
@@ -0,0 +1,77 @@
.. _hicPlotSVL:

hicPlotSVL
==========

.. contents::
    :local:

Description
^^^^^^^^^^^

hicPlotSVL computes the ratio between short range and long range contacts independently for each chromosome. One box plot is created per sample and, if more than one sample is given,
the ratios computed for each sample are treated as one distribution and a Wilcoxon rank-sum test under H0 'distributions are equal' is computed between the samples. All data used is written to a third file containing the raw data.

The distance that separates short range from long range contacts can be set via the `--distance` parameter: for short range, all contacts with a distance smaller than or equal to this value are summed; for long range, all contacts with a distance greater than this value are summed.
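
The per-chromosome computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of hicPlotSVL itself: the function name `short_long_ratio` and the flat `(chromosome, start1, start2, count)` input format are hypothetical, and only intra-chromosomal contacts are considered.

```python
from collections import defaultdict


def short_long_ratio(contacts, distance):
    """Return {chromosome: (short_sum, long_sum, ratio)}.

    `contacts` is an iterable of (chromosome, start1, start2, count)
    tuples, a hypothetical flat view of intra-chromosomal matrix entries.
    """
    sums = defaultdict(lambda: [0, 0])
    for chrom, start1, start2, count in contacts:
        if abs(start2 - start1) <= distance:
            sums[chrom][0] += count  # short range: distance <= threshold
        else:
            sums[chrom][1] += count  # long range: distance > threshold
    result = {}
    for chrom, (short_sum, long_sum) in sums.items():
        # Guard against chromosomes without any long range contact.
        ratio = short_sum / long_sum if long_sum else float("nan")
        result[chrom] = (short_sum, long_sum, ratio)
    return result


# Made-up toy contacts; the threshold mirrors `--distance 2000000`.
contacts = [
    ("1", 0, 10_000, 30),
    ("1", 0, 2_000_000, 10),   # exactly at the threshold: short range
    ("1", 0, 2_010_000, 20),
    ("2", 50_000, 60_000, 8),
    ("2", 0, 5_000_000, 4),
]
ratios = short_long_ratio(contacts, distance=2_000_000)
```

The same relationship is visible in the raw data file of the usage example, where the ratio column is the short range sum divided by the long range sum (for chromosome 1 of hmec_10kb.cool, 33476834 / 11012200 is about 3.04).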


Usage example
^^^^^^^^^^^^^^

.. code-block:: bash

    $ hicPlotSVL -m hmec_10kb.cool nhek_10kb.cool hmec_10kb.cool --distance 2000000 --threads 4 --plotFileName plot.png --outFileName pvalues.txt --outFileNameData rawData.txt

This results in three files:

The raw data containing the sums for short range, long range and ratio per sample and chromosome.

.. code-block:: INI

    # Created with HiCExplorer's hicPlotSVL 3.3
    # Short range vs long range contacts per chromosome: raw data
    # Short range contacts: <= 2000000
    # hmec_10kb.cool nhek_10kb.cool hmec_10kb.cool
    # Chromosome Ratio Sum <= 2000000 Sum > 2000000 Ratio Sum <= 2000000 Sum > 2000000 Ratio Sum <= 2000000 Sum > 2000000
    1 3.0399769346724543 33476834 11012200 2.79740105237572 44262902 15822866 3.0399769346724543 33476834 11012200
    2 2.7532203542810625 31723954 11522490 2.5007877714355438 47468438 18981394 2.7532203542810625 31723954 11522490
    3 2.922650759458664 26251027 8981924 2.6235211241878442 39640848 15109788 2.922650759458664 26251027 8981924
    4 2.7235598858451637 22474680 8251950 2.5572455199457864 37486882 14659086 2.7235598858451637 22474680 8251950
    5 2.9585962905193712 22716268 7678056 2.752922527526723 35445722 12875670 2.9585962905193712 22716268 7678056
    6 3.168274165465025 22872690 7219290 2.8602111006131703 33990211 11883812 3.168274165465025 22872690 7219290
    7 3.1093346580597516 19603416 6304698 2.8021236966788887 29712823 10603680 3.1093346580597516 19603416 6304698
    8 3.135391026076832 18355087 5854162 2.7964394470859024 28660624 10248970 3.135391026076832 18355087 5854162
    9 4.1147978383348125 15395763 3741560 3.819940066283481 21994046 5757694 4.1147978383348125 15395763 3741560
    10 3.448063050802953 17964043 5209894 3.1116673856502253 26270171 8442474 3.448063050802953 17964043 5209894
    11 3.5924666993070407 18651850 5191934 3.1364875011923035 26240350 8366158 3.5924666993070407 18651850 5191934
    12 3.6817551043464416 18640866 5063038 3.306662109403207 26101554 7893626 3.6817551043464416 18640866 5063038
    13 3.476204237522881 11018462 3169682 3.0976674036654805 18922281 6108558 3.476204237522881 11018462 3169682
    14 3.70550850832778 11164875 3013048 3.6226817463785164 17245704 4760480 3.70550850832778 11164875 3013048
    15 4.607631079612186 11165313 2423222 4.567998349104569 15273742 3343640 4.607631079612186 11165313 2423222
    16 4.397874357146307 10745775 2443402 3.890983210350018 14666462 3769346 4.397874357146307 10745775 2443402
    17 5.809374740402161 12168235 2094586 5.3360710927739285 14154110 2652534 5.809374740402161 12168235 2094586
    18 3.7647349280938895 9339833 2480874 3.485487446356812 15019063 4309028 3.7647349280938895 9339833 2480874
    19 6.492239632778196 8466283 1304062 5.774337450385819 9368978 1622520 6.492239632778196 8466283 1304062
    20 5.542933774973686 8962935 1617002 4.977679877778358 12009479 2412666 5.542933774973686 8962935 1617002
    21 6.665622315255486 3910374 586648 6.1843701763589225 6554715 1059884 6.665622315255486 3910374 586648
    22 8.063663557923096 4992327 619114 7.433759425439728 5932928 798106 8.063663557923096 4992327 619114
    X 2.208752982178897 14424173 6530460 2.3130534357407995 27628734 11944702 2.208752982178897 14424173 6530460
    Y 4.165021803993573 36294 8714 3.8063291139240505 45105 11850 4.165021803993573 36294 8714
    MT

The p-values between the samples:

.. code-block:: INI

    # Created with HiCExplorer's hicPlotSVL 3.3
    # Short range vs long range contacts per chromosome, p-values of each distribution against each other distribution with Wilcoxon rank-sum
    # Short range contacts: <= 2000000
    hmec_10kb.cool nhek_10kb.cool 0.28362036331636575
    hmec_10kb.cool hmec_10kb.cool 1.0
    nhek_10kb.cool hmec_10kb.cool 0.28362036331636575

The box plot:

.. image:: ../../images/plot_svl.png
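
The pairwise p-values listed above come from a Wilcoxon rank-sum test on the per-chromosome ratios of two samples. The following is a rough sketch of such a test using only the standard library, assuming the normal approximation without tie correction; whether hicPlotSVL computes it this way or delegates to a statistics library is not shown in this commit, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided rank-sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Ratios of the first five chromosomes from the raw data, rounded for brevity.
hmec = [3.04, 2.75, 2.92, 2.72, 2.96]
nhek = [2.80, 2.50, 2.62, 2.56, 2.75]
p_same = rank_sum_p_value(hmec, hmec)
p_diff = rank_sum_p_value(hmec, nhek)
```

Comparing a sample with itself puts the rank sum exactly at its expected value, which is why the `hmec_10kb.cool hmec_10kb.cool` row reports 1.0, and the test is symmetric, which is why the two hmec/nhek rows carry the same p-value.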
Binary file added docs/images/plot_svl.png
5 changes: 3 additions & 2 deletions docs/index.rst
@@ -1,8 +1,8 @@
HiCExplorer
===========

Set of programs to process, normalize, analyze and visualize Hi-C data
----------------------------------------------------------------------
Set of programs to process, normalize, analyze and visualize Hi-C and cHi-C data
--------------------------------------------------------------------------------

HiCExplorer addresses the common tasks of Hi-C data analysis from processing to visualization.

@@ -61,6 +61,7 @@ tool description
:ref:`chicAggregateStatistic` Compiling of target regions for two samples as input for differential analysis
:ref:`chicDifferentialTest` Differential analysis of interactions of two samples
:ref:`chicPlotViewpoint` Plotting of viewpoint with background model and highlighting of significant and differential regions
:ref:`hicPlotSVL` Computing short vs long range contacts and plotting the results
=================================== ==========================================================================================================================================================


7 changes: 4 additions & 3 deletions hicexplorer/__init__.py
@@ -1,9 +1,10 @@
import logging
# logging.basicConfig(level=logging.DEBUG)
logging.basicConfig(level=logging.INFO)
logging.getLogger('matplotlib').setLevel(logging.WARNING)
logging.getLogger('cooler').setLevel(logging.WARNING)
logging.getLogger('hicmatrix').setLevel(logging.WARNING)
logging.getLogger('matplotlib').setLevel(logging.ERROR)
logging.getLogger('cooler').setLevel(logging.ERROR)
logging.getLogger('hicmatrix').setLevel(logging.ERROR)
logging.getLogger('numexpr').setLevel(logging.ERROR)


import warnings
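
The change in this hunk raises the threshold of several third-party loggers from WARNING to ERROR, so their warnings no longer reach the console while the package's own INFO messages still do. A small sketch of that mechanism follows; the logger names `noisy_dependency` and `my_application` are placeholders rather than names from HiCExplorer, and `force=True` (Python 3.8+) is used only to make the example independent of any earlier logging setup.

```python
import logging

# Root logger at INFO, as in hicexplorer/__init__.py.
logging.basicConfig(level=logging.INFO, force=True)

# Silence everything below ERROR for one (hypothetical) dependency.
logging.getLogger("noisy_dependency").setLevel(logging.ERROR)

dep_logger = logging.getLogger("noisy_dependency")
sub_logger = logging.getLogger("noisy_dependency.io")  # inherits ERROR
app_logger = logging.getLogger("my_application")       # inherits INFO from root

dep_logger.warning("dropped: below the ERROR threshold")
dep_logger.error("still shown")
app_logger.info("still shown")
```

Child loggers such as `noisy_dependency.io` inherit the effective level of their parent, so one `setLevel` call per library is enough.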
2 changes: 1 addition & 1 deletion hicexplorer/_version.py
@@ -2,4 +2,4 @@
# This file is originally generated from Git information by running 'setup.py
# version'. Distribution tarballs contain a pre-generated copy of this file.

__version__ = '3.2'
__version__ = '3.3'
18 changes: 9 additions & 9 deletions hicexplorer/chicAggregateStatistic.py
@@ -35,14 +35,14 @@ def parse_arguments(args=None):
`$ chicAggregateStatistic --interactionFile viewpoint1.bed viewpoint2.bed --targetFile targets.bed --outFileNameSuffix aggregated.bed`
and this will create one file: `viewpoint1_viewpoint2_aggregated.bed`
which will create a single output file: `viewpoint1_viewpoint2_aggregated.bed`
A second mode is the batch processing mode, for this you need a file containing the names of the viewpoint files (generated by chicViewpoint via --writeFileNamesToFile),
a folder containing them, a target list file containing the name of all target files and a folder containing them (created by chicSignificantInteractions):
A second mode is the batch processing mode. For this you need a file containing the names of the viewpoint files (generated by chicViewpoint via --writeFileNamesToFile),
a folder which contains the files, a target list file containing the name of all target files and a folder which contains the target files (created by chicSignificantInteractions):
`$ chicAggregateStatistic --interactionFile viewpoint_names.txt --targetFile target_names.txt --interactionFileFolder viewpointFilesFolder --targetFileFolder targetFolder --batchMode --threads 20 --outFileNameSuffix aggregated.bed`
If no `--targetFileFolder` in batch mode is given, it is assumed the `--targetFile` should be used for all viewpoints.
If the `--targetFileFolder` flag is not set in batch mode, it is assumed the `--targetFile` should be used for all viewpoints.
"""
)
parserRequired = parser.add_argument_group('Required arguments')
@@ -64,26 +64,26 @@ def parse_arguments(args=None):
default='_aggregate_target.bed')

parserOpt.add_argument('--interactionFileFolder', '-iff',
help='Folder where the interaction files are stored in. Applies only for batch mode.',
help='Folder where the interaction files are stored. Applies only for batch mode.',
required=False,
default='.')
parserOpt.add_argument('--targetFileFolder', '-tff',
help='Folder where the interaction files are stored in. Applies only for batch mode.',
help='Folder where the target files are stored. Applies only for batch mode.',
required=False)
parserOpt.add_argument('--outputFolder', '-o',
help='Output folder of the files.',
help='Output folder containing the files.',
required=False,
default='aggregatedFiles')
parserOpt.add_argument('--writeFileNamesToFile', '-w',
help='',
default='aggregatedFilesBatch.txt')
parserOpt.add_argument('--batchMode', '-bm',
help='The given file for --interactionFile and or --targetFile contains a list of the to be processed files.',
help='turns on batch mode. The files provided by --interactionFile and/or --targetFile contain a list of the files to be processed.',
required=False,
action='store_true')

parserOpt.add_argument('--threads', '-t',
help='Number of threads. Using the python multiprocessing module. ',
help='Number of threads (uses the python multiprocessing module). ',
required=False,
default=4,
type=int
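
To make the two modes described in the help text concrete, here is a reduced sketch of how such options behave once parsed. Only a subset of the options from the hunk is reproduced, and `resolve_interaction_files` is a hypothetical helper illustrating one plausible reading of batch mode (names from a list file joined with `--interactionFileFolder`); it is not taken from chicAggregateStatistic.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(add_help=False)
    required = parser.add_argument_group('Required arguments')
    required.add_argument('--interactionFile', '-if', nargs='+', required=True)
    optional = parser.add_argument_group('Optional arguments')
    optional.add_argument('--interactionFileFolder', '-iff', default='.')
    optional.add_argument('--batchMode', '-bm', action='store_true')
    optional.add_argument('--threads', '-t', type=int, default=4)
    return parser


def resolve_interaction_files(args, listed_names=None):
    """Hypothetical helper: return the paths that would be processed.

    Outside batch mode the command-line values are the files themselves.
    In batch mode `listed_names` stands for the lines of the list file
    named by --interactionFile; each name is joined with the folder.
    """
    if not args.batchMode:
        return list(args.interactionFile)
    return [os.path.join(args.interactionFileFolder, name.strip())
            for name in listed_names if name.strip()]


parser = build_parser()
single = parser.parse_args(['--interactionFile', 'viewpoint1.bed', 'viewpoint2.bed'])
batch = parser.parse_args(['--interactionFile', 'viewpoint_names.txt',
                           '--interactionFileFolder', 'viewpointFilesFolder',
                           '--batchMode', '--threads', '20'])
batch_files = resolve_interaction_files(batch, ['viewpoint1.bed\n', 'viewpoint2.bed\n'])
```

Because `--batchMode` uses `action='store_true'`, it defaults to `False` and takes no value, while `--threads` falls back to 4 when omitted.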
20 changes: 10 additions & 10 deletions hicexplorer/chicDifferentialTest.py
@@ -21,8 +21,8 @@ def parse_arguments(args=None):
parser = argparse.ArgumentParser(add_help=False,
formatter_class=argparse.RawDescriptionHelpFormatter,
description="""
chicDifferentialTest tests if two locations under consideration of the reference point have a different interaction count. For this either Fisher's test or chi2 contingency test can be used.
The files that are accepted for this test can be created with `chicAggregateStatistic`. H0 is assuming the interactions are not different. Therefore the differential interaction counts are all where H0 was rejected.
chicDifferentialTest tests if two locations under consideration of the reference point have a different interaction count. For this either Fisher's test or the chi2 contingency test can be used.
The files that are accepted for this test can be created with `chicAggregateStatistic`. H0 assumes the interactions are not different. Therefore the differential interaction counts are all where H0 was rejected.
An example usage is:
@@ -35,47 +35,47 @@ def parse_arguments(args=None):
A second mode is the batch processing mode, for this you need a file containing the names of the aggregated files (generated by chicAggregateStatistic via --writeFileNamesToFile and the batch mode):
A second mode is the batch processing mode. For this you need a file containing the names of the aggregated files (generated by chicAggregateStatistic via --writeFileNamesToFile and the batch mode):
`$ chicDifferentialTest --statisticTest fisher --alpha 0.05 --interactionFile aggregatedFilesBatch.txt --interactionFileFolder aggregatedFilesFolder --batchMode --threads 20 --outputFolder differentialResults`
This will create, as in the non-batch mode, three files per aggregated file and writes the file name to the file given by `--rejectedFileNamesToFile`. This last file can be used to plot the differential interactions per viewpoint in batch mode by chicPlotViewpoint.
This will create, as in the non-batch mode, three files per aggregated file and writes the file name to the file given by `--rejectedFileNamesToFile`. This last file can be used to plot the differential interactions per viewpoint in batch mode, using chicPlotViewpoint.
"""
)

parserRequired = parser.add_argument_group('Required arguments')

parserRequired.add_argument('--interactionFile', '-if',
help='path to the interaction files which should be used for differential test.',
help='path to the interaction files which should be used for the differential test.',
required=True,
nargs='+')

parserRequired.add_argument('--alpha', '-a',
help='Accept all samples to significance level alpha',
help='define a significance level (alpha) for accepting samples',
type=float,
default=0.05,
required=True)

parserOpt = parser.add_argument_group('Optional arguments')

parserOpt.add_argument('--interactionFileFolder', '-iff',
help='Folder where the interaction files are stored in. Applies only for batch mode.',
help='Folder where the interaction files are stored. Applies only for batch mode.',
required=False,
default='.')
parserOpt.add_argument('--outputFolder', '-o',
help='Output folder of the files.',
required=False,
default='differentialResults')
parserOpt.add_argument('--statisticTest',
help='Type of test used for testing: fisher\'s exact test or chi2 contingency',
help='Type of test used: fisher\'s exact test or chi2 contingency',
choices=['fisher', 'chi2'],
default='fisher')
parserOpt.add_argument('--batchMode', '-bm',
help='The given file for --interactionFile and or --targetFile contain a list of the to be processed files.',
help='turn on batch mode. The given file for --interactionFile and or --targetFile contain a list of the to be processed files.',
required=False,
action='store_true')
parserOpt.add_argument('--threads', '-t',
help='Number of threads. Using the python multiprocessing module. ',
help='Number of threads (uses the python multiprocessing module)',
required=False,
default=4,
type=int
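
The description above says that H0 (the interactions are not different) is tested with either Fisher's exact test or a chi2 contingency test, and that results are reported where H0 is rejected at `--alpha`. As an illustration of the first option, the sketch below implements a two-sided Fisher's exact test for a 2x2 table with the standard library. How chicDifferentialTest actually builds its contingency table is not visible in this diff, so the table layout and the `p <= alpha` decision rule here are assumptions.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def weight(k):
        # Number of tables with k in the top-left cell and the same margins.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / denominator


alpha = 0.05
# Hypothetical layout: rows are the two samples, columns are
# "interactions at the target" and "all other interactions".
p_value = fisher_exact_two_sided([[8, 2], [1, 5]])
h0_rejected = p_value <= alpha
```

Comparing integer weights instead of floating-point probabilities avoids rounding problems when deciding which tables count as at least as extreme as the observed one.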