- Installing the latest development version:
- install prerequisites:
conda install -c bioconda numpy pandas pybigwig idr
- clone the `develop` branch:
git clone --branch develop https://github.com/jurgjn/yapc.git
- install in editable mode using pip:
pip install -e yapc
- Broad overview of the IDR workflow -- replicates, pseudoreplicates, self-pseudoreplicates, etc -- in (Landt et al. 2012) around Fig 7.
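As a rough illustration of the terminology (not code from this pipeline): self-pseudoreplicates are obtained by randomly splitting the reads of a single replicate into two halves, while pooled pseudoreplicates are obtained by pooling the reads of all replicates and splitting the pool. The helper name `split_into_pseudoreplicates` and the read identifiers below are hypothetical.

```python
import random


def split_into_pseudoreplicates(reads, seed=0):
    """Randomly split a list of reads into two halves of (near-)equal size."""
    shuffled = list(reads)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical read identifiers for two biological replicates
rep1 = [f"rep1_read{i}" for i in range(6)]
rep2 = [f"rep2_read{i}" for i in range(4)]

# Self-pseudoreplicates: split each replicate on its own
rep1_pr1, rep1_pr2 = split_into_pseudoreplicates(rep1, seed=1)
rep2_pr1, rep2_pr2 = split_into_pseudoreplicates(rep2, seed=2)

# Pooled pseudoreplicates: pool all reads first, then split
pooled_pr1, pooled_pr2 = split_into_pseudoreplicates(rep1 + rep2, seed=3)
```

In practice the split would be done on aligned reads, and peaks would then be called separately on each half before running IDR on the pair.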
- There are two IDR implementations: an initial R version and a v2 Python rewrite (available in bioconda). The latter has several improvements and is used by the current version of this pipeline.
- The `idr` script in IDR v2 is the equivalent of `batch-consistency-analysis.r` in the initial R version; it should be used in a similar fashion (replicates, pseudoreplicates, self-pseudoreplicates).
- An explanation of globalIDR vs localIDR
Use the global IDR for thresholding. Essentially the local IDR is akin to the posterior prob. of a peak belonging to the irreproducible noise component. The global IDR is analogous to a multiple hypothesis correction on a p-value to compute an FDR.
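The relationship between the two quantities can be sketched as follows, assuming the global IDR of a peak is the mean local IDR over all peaks ranked at least as reproducible. This is a simplification for illustration, not the code of the `idr` package; the function name and the example values are made up.

```python
def global_idr_from_local(local_idr):
    """Running mean of local IDR over peaks ranked from most to least reproducible."""
    # Indices of peaks sorted by increasing local IDR (most reproducible first)
    order = sorted(range(len(local_idr)), key=lambda i: local_idr[i])
    global_idr = [0.0] * len(local_idr)
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += local_idr[i]
        # Expected fraction of irreproducible peaks among the top `rank` peaks
        global_idr[i] = running_sum / rank
    return global_idr


# Made-up local IDR values for four peaks
local = [0.01, 0.50, 0.02, 0.90]
glob = global_idr_from_local(local)

# Threshold on the global IDR, as recommended above
kept = [i for i, g in enumerate(glob) if g <= 0.05]
```

Thresholding the global IDR at 0.05 therefore selects a set of peaks in which about 5% are expected to be irreproducible, much as an FDR cutoff would.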
- Replicate-specific peaks/peak scores can be replaced with a metric calculated from a replicate-specific coverage track at regions defined by the oracle peaks
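A minimal sketch of this idea, assuming the coverage track has already been loaded into memory as per-base values (for example from a bigWig file) and that the mean signal is used as the metric. The function name, data layout, and numbers are hypothetical.

```python
def mean_coverage_at_peaks(coverage, peaks):
    """Return the mean per-base coverage for each peak.

    coverage: dict mapping chromosome name -> list of per-base signal values
    peaks: list of (chrom, start, end) tuples, 0-based half-open, non-empty
    """
    scores = []
    for chrom, start, end in peaks:
        window = coverage[chrom][start:end]
        scores.append(sum(window) / len(window))
    return scores


# Oracle peaks are shared; each replicate contributes only its own coverage
oracle_peaks = [("chrI", 2, 5), ("chrI", 7, 9)]
rep1_cov = {"chrI": [0, 0, 4, 6, 5, 0, 0, 3, 1, 0]}
rep2_cov = {"chrI": [0, 1, 2, 2, 2, 0, 0, 8, 6, 0]}

rep1_scores = mean_coverage_at_peaks(rep1_cov, oracle_peaks)
rep2_scores = mean_coverage_at_peaks(rep2_cov, oracle_peaks)
```

Because both replicates are scored at identical regions, the two score lists are directly paired and can be ranked against each other.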
- Interpreting IDR v2 diagnostic plots
- Weak but highly correlated peaks can be problematic with IDR
truncate the number of peaks to the top 100k-125k. Using more than this simply increases the running time of the IDR analysis with no advantage. In fact using more peaks with MACS2 can cause problems with the IDR model because MACS2 seems to produce strange highly correlated peak scores for very weak and noisy detections. This can confuse the IDR model.
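The truncation step could look roughly like this, assuming peaks are held as `(chrom, start, end, score)` tuples and ranked by score. The function name and the tuple layout are assumptions for illustration.

```python
import heapq


def top_peaks(peaks, n=100_000):
    """Keep the n highest-scoring peaks; each peak is (chrom, start, end, score)."""
    return heapq.nlargest(n, peaks, key=lambda peak: peak[3])


# Made-up peaks; n is tiny here only to keep the example readable
peaks = [
    ("chrI", 10, 50, 3.2),
    ("chrI", 90, 140, 41.0),
    ("chrII", 5, 60, 0.4),
    ("chrII", 300, 360, 12.5),
]
strongest = top_peaks(peaks, n=2)
```

The result is ordered from strongest to weakest, so it would need re-sorting by coordinate if a downstream tool expects position-sorted input.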
- The method is sensitive to the choice of the smoothing window. It could, in theory, be further improved by utilising multiple windows (and wavelets), as has been previously done for mass spectrometry data in (Du et al. 2006).
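To see why the window matters, here is a toy illustration using a plain centred moving average and a count of local maxima. It does not reproduce yapc's actual peak-calling procedure; the functions and the signal are invented for the example.

```python
def moving_average(signal, window):
    """Centred moving average; the window is clipped at the ends of the signal."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def count_local_maxima(values):
    """Count interior positions strictly higher than both neighbours."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )


# Two nearby bumps separated by a single low position
signal = [0, 0, 4, 0, 4, 0, 0, 0, 0]

# Number of local maxima found after smoothing with each window size
maxima_by_window = {
    w: count_local_maxima(moving_average(signal, w)) for w in (1, 3)
}
```

With no smoothing (window 1) the two bumps are counted separately, while a window of 3 merges them into a single maximum, so the number of candidate peaks depends directly on the window chosen.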