-
Notifications
You must be signed in to change notification settings - Fork 1
/
Changelog
865 lines (698 loc) · 32.7 KB
/
Changelog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
######## MultiRepMacsChIPSeq Revision History #############
v20.0
multirep_macs2_pipeline.pl
- Pre-sorts profile data files based on number of peaks in each group
- Improve genomic coverage interpretation, including warnings if coverage
is insufficient, which may affect background and q-value scores.
- Change --atac option to --cutsite as it is more appropriate
- Change --blacklist option to --exclude as it is more appropriate
- Requires latest Bio::ToolBox version 2
sort_data_by_key.pl
- New script to sort a data file by groups using an external key list.
In this context, sorting by the matrix key list based on which samples
from which the peaks were called.
render.pl
- New script to generate the HTML report from the markdown report. This
allows the markdown report to be edited after the pipeline completes
and a new HTML file generated.
- Fixes warnings about pandoc options
plot_peak_figures.R
- Updated to handle and generate plots for pre-sorted profile heat maps
and group-specific mean profile line plots.
v19.2
- Update multiple library and script code for new Bio::ToolBox compatibility
- Set pandoc title
v19.1
multirep_macs2_pipeline.pl
- Improvements to the Markdown and HTML reporting, including which plots
to use, setting palette colors based on sample numbers, and style and
text changes.
- Change --atac option parameters to tighter settings
v19.0
multirep_macs2_pipeline.pl
- New Markdown report generation, which summarizes options, alignment
statistics, and peak call numbers, intersections, correlations, and results.
Includes select (but not all) plots to illustrate the report. If Pandoc is
available, the Markdown report will automatically be converted to a
self-contained HTML report.
- Properly set pseudo counts when generating replicate-mean q-value tracks for
samples with no reference control. Previous versions had a bug that
artificially lowered q-value scores, resulting in fewer peaks called.
Primarily only affected ATAC-Seq and Cut-and-Run samples without reference
(Input) control.
- Dynamically adjust sequencing depth when generating mean-replicate q-value
tracks for each sample. Previous versions used the same scaling depth for all
(calculated as the minimum depth observed of all replicates), which
significantly reduced overall peak calling effectiveness when samples had a
large disparity in sequencing depth, for example in Cut-and-Run experiments.
This method was already employed with the `--independent` mode peak calling.
Use the option `--samedepth` option to revert to the old behavior.
- By default, the count bigWig tracks are now scaled to the median observed
sequencing depth, rather than minimum observed depth as in previous
versions. This improves differential analysis, particularly when provided
samples that have a large disparity in sequencing depth, for example in
Cut-and-Run experiments. It also mimics the scaling methodology of
differential analysis programs. Set the option `--targetdepth` to `min` for
previous behavior; `mean` is also available.
- Automatically and empirically set the fragment, log2FE, and q-value limit
values when generating heat maps and plots. This makes the plots a little
more accurate and pertinent.
- Automatically adjust color palette if sample number exceeds default.
recall_peaks.pl
- Added report generation as described above
generate_mean_bedGraph.pl
- Always print empirically derived numbers even if genome size was provided.
plot_peak_figures.R
- Skip heat map plot generation when the number of samples exceed the number
supported by the indicated color palette.
v18.2
multirep_macs2_pipeline.pl
- bug fixes
combine_std_chipstats.pl
- Add support for Bowtie2 alignment statistics
run_DESeq2.R
- use fitType=local
v18.1
multirep_macs2_pipeline.pl
- New --atac option to simplify setting multiple parameters at once for
ATAC-Seq cut site analysis
- Increase default profile bin size to 100 bp, effectively making profile
plots +/- 2.5 Kb instead of 1 Kb
- Generate replicate fragment coverage profile plots in addition to mean
coverage plots when independent peak calls are generated
- Set default minimum replicate peak overlap to at least 2
- Set default output directory to 'MultiRepPeakCall'.
intersect_peaks.pl
- automatically remove commas from sample names, if present
plot_peak_figures.R
- Spatial intersection UpSet plots is now optional and off by default,
since these are compute intensive with minimal benefit
- Handle replicate fragment coverage plots
v18.0
multirep_macs2_pipeline.pl
- Running independent replicate peak calls with --independent flag now always
runs replicate-mean peak calls. Output peak files are named as 'rep_merge'
for merged peaks from independent calls, and 'rep_mean' for standard
replicate-mean peak calls. Plot files from each analysis are put into
separate Plot subfolders for convenience and organization.
- Peak call files are now left in narrowPeak format rather than simplifying
into Bed format. Fold-enrichment and Q-value scores are updated in the
narrowPeak file, and the standard score is normalized to 0-1000.
- A summary file for the combined output peaks is now written as a simple
TSV file with peak name, coordinate, sample source, and log2
fold-enrichment and maximum Q-value for each sample.
- Collected data files for log2 fold enrichment, Q-value, and counts are
now gzip compressed.
- Added new option to specify the minimum number of replicate peak overlaps
to require before accepting in the merged peak. Default is one less than
the number of replicates. This should reduce inclusion of spurious peak
calls from only replicate, particularly if replicates have low concordance.
- Removed outdated options.
intersect_peaks.pl
- New option for specifying the minimum number of contributing source peaks
that must overlap before accepting the merged peak in the output. Useful
for filtering out peaks identified from only one input.
- Writes a boolean matrix (1 or 0) for merged peaks indicating which of
the input source files contributed to the merged peak. Useful for
quickly identifying and filtering peaks from specific sources.
- No longer writes a simple text file of peak information.
update_peak_file.pl
- New script to update various peak file formats, including narrowPeak,
gappedPeak, and broadPeak, and update the extra score columns using
one or more provided, relevant, bigWig files.
- Additional options include normalizing the Score column, updating the
peak name, and writing a summit bed file (for narrowPeak only).
peak2bed.pl
- Add options to explicitly define input and output files, and specify
functions for renaming peak names, normalizing the Score column, and
writing a summit bed file. Default options should allow similar
functionality as before without options.
plot_peak_figures.R
- Update to handle compressed input files. This will break any backwards
compatibility. Solution would be to simply gzip compress relevant input
files.
- Rotate histogram plots horizontally to allow long sample names.
bam_partial_dedup.pl
- Change reporting statistics
- Change duplicate-distance threshold report file as a fraction of duplicates
considered, not total observed.
v17.9
multirep_macs2_pipeline.pl
- Filter unwanted alignments, including duplicates, secondary, supplemental,
singletons, and those in exclusion intervals, from bam files when running
macs2 callpeak in independent mode. This makes it more consistent with
calls made in the standard mode.
- Explicitly allow turning off exclusion list generation from the reference
control bam files by specifying "none".
v17.8
multirep_macs2_pipeline.pl
- Use provided genome size when calculating mean genome coverage for lambda.
This should improve q-value estimations for sparse coverage datasets.
- Fix bugs with generating plots for broad peaks and independent call peaks.
generate_mean_bedGraph.pl
- Use provided genome size instead of empirically calculated one. Print result.
intersect_peaks.pl
- Minor fixes
plot_peak_figures.R
- Generate peak count score distributions.
- Adjust ranked peak enrichment y-axis.
bam_partial_dedup.pl
- include mapq parameter in header command line string
combine_std_chipstats.pl
- Handle optical duplicates from UMIScripts bam_umi_dedup.pl
v17.7
multirep_macs2_pipeline.pl
- Fix bug in running independent peak calls
- Include broad peaks when generating plots
intersect_peaks.pl
- Major update to improve efficiency
- Write spatial and count intersection statistics to new files
- Skip unwanted chromosomes based on provided genome file
plot_peak_figures.R
- Generate a ranked peak enrichment plot line from count data
- Generate new UpSet plots for spatial and count intersections
- Fix transparent background in pie charts
v17.6
multirep_macs2_pipeline.pl
- Improve processing when given ChIP fragment coverage bigWig files instead of
Bam files
- Always run plot peak script
- Safely handle no exclusion peaks found in control
plot_peak_figures.R
- Add new fragment duplication bar chart summary
- Only generate correlation plots with three or more samples
- Only generate k-means heat maps with 500 or more peaks. Adjustable with command
line option.
- Tweak colors to improve visualization
generate_mean_bedGraph.pl
- Require explicit input and output file names. Only one input now allowed.
report_mappable_space.pl
- Check for non-Bam input files
v17.5
multirep_macs2_pipeline.pl
- When run in independent replicate mode, also generate a separate Fragment
bigWig file for each replicate as a convenience for later evaluation purposes.
- Fix bug of ignoring user-specified job value
v17.4
multirep_macs2_pipeline.pl
- Improve handling of multiple ChIP sample jobs when using a single universal
reference control for all samples.
- Improve handling of independent replicate peak call jobs, including disabling
when only 1 replicate is provided per sample.
- Sample names are no longer automatically changed for compatibility.
recall_peaks.pl
- Check for the explicit existence of every file that will be needed before
starting to avoid subsequent crashes.
- Keep the user-provided sample name order rather than resorting alphabetically.
intersect_peaks.pl
- Temporary sorted files are named uniquely to avoid race conditions when
multiple instances are run simultaneously, such as during independent
pipeline mode.
plot_peak_figures.R
- Handle nulls better in Jaccard files.
v17.3
recall_peaks.pl
- Skip requirement of finding example bam files. Use fragment bigWig instead.
v17.2
multirep_macs2_pipeline.pl
- Now checks all provided bam files for chromosome name and order consistency.
Inconsistent order frequently results in failures, despite best efforts. This
should fail early with helpful error message.
- Add "alt" as default chromosome name skip
print_chromosome_lengths.pl
- Accepts multiple source files to check chromosome name and order
consistency. Prints orders of discordant files.
plot_peak_figures.R
- Fix bug with Log2FE profile summary file
v17.1
multirep_macs2_pipeline.pl
- Bug fixes to handle errors in handling efficiency output files, duplicate
sample names, and periods in sample names.
v17
multirep_macs2_pipeline.pl
- Allow peaks to be called independently on each ChIP replicate if
desired. Some replicates are not well correlated, and calling peaks
independently, then merging, can improve overall peak calls, particularly
when one replicate has poor enrichment.
- Bug fixes.
v16.2
multirep_macs2_pipeline.pl
- Print configuration, job commands, and job logs in order of execution
to the combined log output when complete.
bam_partial_dedup.pl
- Add sanity check to ensure alignments were counted.
- Add verbose option.
v16.1
multirep_macs2_pipeline.pl
- Bug fixes to ensure command line options are passed properly to the
Runner library object. Fixes problems with running in single-end mode.
intersect_peaks.pl
- Enforce peak sort order by resorting everything, because bedtools is
a stickler for sort order, and there is no guarantee provided files
are in the same order, even with a provided genome file!
v16
multirep_macs2_pipeline.pl
- Completely restructure internal code (!!!), breaking apart the script
into separate code libraries for future work and sustainability.
No functional changes.
recall_peaks.pl
- New script for calling new peaks with different parameters from a
completed pipeline run. This avoids re-running the entire pipeline or
performing it manually. Utilizes the new library code to avoid redundancy.
subset_bigwig.pl
- New script to extract one chromosome from one or more bigWig files,
basically to make a smaller version for quick evaluation on a personal
computer.
print_chromosome_lengths.pl
- Improve chromosome order output from bigWig files, since the adapter
does not maintain order.
v15.2
multirep_macs2_pipeline.pl
- Check for duplicate input bam files, which can cause havoc later on.
- Only run optical duplicate checking when necessary
- Minor fixes
print_chromosome_lengths.pl
- bug fix
v15.1
multirep_macs2_pipeline.pl
- Change maxdup option to maxdepth. The option was misnamed, as setting
this to 1 would technically still leave 1 duplicate behind. Now
setting to 1 actually ensures all duplicates are removed, leaving 1
alignment per position.
- Improve error checking and reporting, especially of inputs. Default
to generating custom exclusion list when one isn't provided or available.
- Change default profile data collection to +/- 1 Kb (25 x 40 bins each side).
bam_partial_dedup.pl
- Major update to version 5. Max option now sets maximum depth, whereas
before it implied depth but actually did duplicate number.
- Removed faulty method of calculating maximum depth to achieve target
duplication rate, i.e. --norandom option. This wasn't recommended,
never worked well, and wasn't used anyway. Won't be missed.
- Change options when running samtools.
print_chromosome_lengths.pl
- Changed internal method of generating files by avoiding temp files
plot_peak_figures.R
- Switch to plotting k-means of 4, 6, 8 clusters (dropping 10)
run_DESeq2.R
- Include p-value column in output
v15.0
multirep_macs2_pipeline.pl
- Calculate exclusion lists empirically from provided Input Bam files.
This reduces sources of false positives and duplicate alignments.
- Calculate mappable space empirically rather than use pre-defined values.
This can still be overridden by explicitly setting the genome size. The
species option is deprecated, ignored, and a warning given if used.
- Retain the Control fragment bigWig in addition to the lambda_control.
- Require fragment size for single-end runs and peak-size for paired-end
runs. Default values are not reliable or appropriate.
- Add support for multiple-hit alignment fractional recording to reduce
signal at repetitive sequence genomic loci.
report_mappable_space.pl
- New script to report the fraction of the reported genome with coverage
based on either all (map quality 0) or unique (map quality 10)
alignments. Allows for empirically determining the mappable genome size
by using the given Bam files.
generate_mean_bedGraph.pl
- Rewrote the script as version 2 to generate a true global mean value
from either bigWig or bedGraph text files. Ignores zero coverage
regions. Should be more accurate than previous version.
peak2bed.pl
- Handle empty files more gracefully.
v14.3
multirep_macs2_pipeline.pl
- put summit bed files in separate directory when organizing
- improve checking finished commands
bam_partial_dedup.pl
- Handle and report alignments missing tile and pixel coordinates when
checking for optical duplicates. Disables optical deduplication if
errors are too high.
- Change pixel distance reporting (again) as fraction of total duplicates
at maximum observed distance.
- Add minimum mapping quality filter
intersect_peaks.pl
- New version that now collects basic peak length statistics of all the
input peak files for comparison and contrast.
- Properly handle differing input file formats.
plot_peak_figures.R
- Generate three new plots for peak length distribution, peak number, and
total peak area
v14.2
multirep_macs2_pipeline.pl
- Add support for genome files in intersect_peaks.pl
- Fix bug with detecting single- and paired-end counts from bam2wig
bam_partial_dedup.pl
- Change pixel distance reporting to report the fraction of observed
duplicates at a given pixel distance, rather than maximum distance,
which should be more useful in determining cutoff distance.
- Use input bam file base name as default report file name.
intersect_peaks.pl
- Add genome file command line option to use with bedtools jaccard, which
can fail when one file is not represented on chromosomes relative to the
second file, or files aren't sorted the same. Improve error reporting
and checking files before starting.
install_R_packages.R
- Change shebang.
v14.1
multirep_macs2_pipeline.pl
- Change the relative data collection from peak midpoint to smaller bin sizes
(peak_size/10) and more bins (25).
peak2bed.pl
- Added new helper script to convert raw Macs2 peak files to simple Bed
interval and summit files. It also renames and properly sorts the intervals.
This replaces complicated code in main pipeline.
bam_partial_dedup.pl
- Add new option to report the pixel distances observed with duplicates to get
a better idea of optical duplicates. Report is for informative purposes only
and de-duplication is turned off when the report is enabled.
- Add command line to bam header, but only when using the external samtools
command. The Perl HTS Bam adapter still has unreliable behavior with different
versions.
plot_peak_figures.R
- Tweak the plots for Jaccard and Number of Intersections. Reset the scale from 0
to max, and write the value in the boxes.
run_DESeq2.R
- Turn off log2fc shrink. Takes a long time and not clear if absolutely necessary
with ChIPSeq
v14.0
multirep_macs2_pipeline.pl
- Calculate target sequence depth for scaling by empirically collecting the
information from the fragment RPM bedgraph generation. This separates
fragment and count bam2wig commands separately. This is much more reliable than
explicit user input values, since partial deduplication can really throw off
the original estimates. Manual control is still allowed but warned.
- Remove documentation for external scaling ChIP and control depths. The code is
still present, but issues stern warnings if used. This effectively
removes support for external genome calibration. Basically, scaling depths
independently breaks all assumptions in the statistical test for calling peaks.
In other words, it doesn't work.
- Change file names for lambda control and collected scores. Collected scores are
now "mean" to reflect what they truly are, and not imply something else.
- Bug fixes
generate_differential.pl
- Major change to generate only one differential file with Macs2, and split based
on sign. Also allow to set minimum value of input files to discard, allowing
fold enrichment files to be used as well as statistical files.
v13.5
multirep_macs2_pipeline.pl
- Fix bug where command-line specified cutoff value wasn't being used properly
run_DESeq2.R
- Better handling of input files
generate_differential.pl
- Bug fix in directory creation
v13.4
multirep_macs2_pipeline.pl
- More safety checks to handle exceptions when no peaks are identified, rather
than simply crashing.
plot_peak_figures.R
- Update plot titles
v13.3
multirep_macs2_pipeline.pl
- Fix bug with organizing Bam files, handle situations where no peak files are
found, and make organizing files and generating plots default to true.
bam_partial_dedup.pl
- Include Read Group ID in calculation of identifying duplicate reads.
This ensures that alignments derived from different flow
cells don't get mis-classified as an optical duplicate. Empirical tests on
sample data suggest there could be 0.1% mis-classification.
generate_differential.pl
- Added new script to calculate differential tracks and optionally identify
regions of differential enrichment. Intended to be used with log Fold
Enrichment bedGraph or bigWig files.
plot_peak_figures.R
- Use the same color palette for all the plots, and make it user definable.
combine_std_chipstats.pl.
- Better handling of multiple input files with the same GNomEx Identifier in
v13.2
multirep_macs2_pipeline.pl
- Bug and efficiency fixes
plot_peak_figures.R
- Invert colors for Pearson and Spearman correlation plots
- Filter spatial Venn categories to only plot top categories
run_DESeq2.R
- Add option to report all results
v13.1
multirep_macs2_pipeline.pl
- Bug fixes for running de-duplication
v13.0
multirep_macs2_pipeline.pl
- Change the "dup" option to "dedup". This is confusing too many people.
- Add new dry run option, which should print out every job command that would
be run in during normal execution, making assumptions when actual decisions
or file results are not calculable or present.
- Collect de-duplication results into a single summary file.
- Clean narrowPeak and gappedPeak files into simple bed files, since the Macs2
bdgpeakcall and bdgbroadcall functions do not calculate the statistical
p-value and q-value scores. Also rename the peaks - I hate it's naming scheme.
- Run BioToolBox get_relative_data.pl script to collect relative data around the
midpoint of called merged called peaks for plotting. Collects 10 bins on either
side, with bin size determined as a fraction of peak size. Collects both log2FE
and fragment coverage.
- Add option to organize all the result files into subdirectories rather than
leave as a jumbled mess for manual cleanup.
- Combine all individual output log files into one
- Make Parallel::ForkManager required
- Bug fixes
get_chip_efficiency.pl
- Add new script to collect ChIP efficiencies by calculating the fraction of
alignment counts for each sample replicate over its respective called peak.
plot_peak_figures.R
- Massive overhaul to plot 13 sets of quality control and analytical plots.
New additional plots for deduplication results, ChIP efficiency, spatial
Venn, relative occupancy.
run_normR_enrichment.R
- add option to report all results
install_R_packages.R
- Add simple script to install R dependencies
v12.4
multirep_macs2_pipeline.pl
- Fix bug with handling multiple control files and lamba control not being
properly generated.
- Add proper help command line option
- Print entire command, not just app name, for clearer debugging purposes
v12.3
multirep_macs2_pipeline.pl
- Fix bug with handling single control Bam files, which got broken in 12.0
- Improve error messages
v12.2
multirep_macs2_pipeline.pl
- Fix critical bug where lambda control was no longer enabled by default
- Bug fix in log redirection
v12.1
multirep_macs2_pipeline.pl
- Safety check of left over command line options - there shouldn't be any
- Fix bug with processing progress file
- Bug fixes
v12.0
multirep_macs2_pipeline.pl
- Add progress tracking file. This should help with restarting the pipeline
so that jobs are not needlessly re-executed.
- Improve checking whether job commands executed successfully by checking for
output files.
- Add full crash reporter for each ChIPJob data object so that errors can be
tracked more completely.
- Improve handling when provided a single control reference bigWig file
- Avoid expensive forking for simple remove commands
v11.6
multirep_macs2_pipeline.pl
- Fix bug with default de-duplication argument not being set properly
- Add log file to wigToBigWig commands to catch errors
v11.5
multirep_macs2_pipeline.pl
- When executing with no-lambda enabled, the reference control coverage should
be shifted just as with the ChIP samples
- The reference control d fragment generation should be treated the same as the
ChIP fragments and extended and shifted in the same manner as the ChIP samples.
This, I think, deviates slightly from how the standard Macs2 callpeak function
operates, but should be more appropriate. However, it shouldn't really change
results by very much.
- Add explicit Rscript path command line option
v11.4
multirep_macs2_pipeline.pl
- Add option to de-duplicate a Bam file as paired-end, but otherwise treat as a
single-end file for fragment and count file generation. This enables users to
analyze ATACSeq data as cut-site only, but avoid issues with over de-duplication
by still including pair information.
v11.3
multirep_macs2_pipeline.pl
- Fix bedtools command line bug
v11.2
multirep_macs2_pipeline.pl
- Fix bugs with handling the no-lambda option and no control files
- Fix bug with getting genomic window counts
combine_std_chipstats.pl
- Fix bug so that sample names are always present
v11.1
multirep_macs2_pipeline.pl
- Fix bugs with running bedtools and manipulate_wig commands
combine_std_chipstats.pl
- Now allows processing given files, rather than only allowing parsing Pysano
stderr and stdout job files
intersect_peaks.pl
- Clean up sample names to make plots simpler
v11.0
multirep_macs2_pipeline.pl
- Use bedtools and BiotoolBox data2wig.pl for generating control lambda bedGraph
rather than running four sequential Macs2 commands. This should be substantially
faster.
- Add new options for collecting data over genomic windows for use in subsequent
analysis programs (like DESeq2 or normR), as well as combining replicate data.
- Always use an external chromosome file
run_normR_difference.R
run_normR_enrichment.R
run_DESeq2.R
- Add new scripts for running DESeq2 or normR for enrichment or differential analysis
combine_replicate_data.pl
- Add new script for easily combining replicate count data for use in programs that
don't natively handle replicates (normR).
intersect_peaks.pl
- Run bedtools multi-intersection tool, great for calculating spatial Venn diagrams.
plot_peak_figures.R
- Update correlation plots
v10.2
multirep_macs2_pipeline.pl
- Check that all indicated input files are present
- Add option to run the plot_peak_figures.R script automatically, but not enabled
by default, in case R libraries aren't available
- Change sample file format
bam_partial_dedup.pl
- Make sure deduplication still occurs when optical duplicates are present but
regular duplicates are below action threshold
print_chromosome_lengths.pl
- Add support for skipping chromosomes
v10.1
multirep_macs2_pipeline.pl
- Added ability to use the global ChIP fragment mean when no reference Bam file is
available
generate_mean_bedGraph.pl
- change output filename
bam_partial_dedup.pl
- Fix bug to mark alignments
plot_peak_figures.R
- Cluster sample columns
v10.0
multirep_macs2_pipeline.pl
- Add support for handling optical duplicates
bam_partial_dedup.pl
- Add critical support for detecting and processing optical duplicates. These are
sequencing artifacts, notably from Illumina sequencers with patterned flow cells
such as Nextseq and Novaseq. They are can severely inflate the duplicate count
artificially, which could be detrimental when trying to normalize duplicate levels.
combine_std_chipstats.pl
- Include optical duplicates in reporting
v9
multirep_macs2_pipeline.pl
- Fix bugs pertaining to deduplication when sharing reference control files
between ChIP samples
- Add basic broad (gapped) peak calling. No data collection or QC analysis is
done with these files, just the narrow peak files.
intersect_peaks.pl
- Fix bug where script dies before printing help
v8
multirep_macs2_pipeline.pl
- Count bigWigs now include fractional counts, rather than integers. This had the
side effect of undercounting reads in peaks due to rounding.
- Add option to use either raw counts or scaled counts
- Add option to save q-value bedGraph files
v7
multirep_macs2_pipeline.pl
- Inprove handling of duplicate, but not universal, reference control files. For
example, when the same Input is used for 2 out of 3 ChIP samples.
- Improve handling of control file naming scheme to avoid errors
- Improve handling of situations where we have an output file but no log file
v6.2
multirep_macs2_pipeline.pl
- Add a pseudo count of 1 when generating fold enrichment when no lambda is enabled.
- Improve handling and fix bugs when combining log output files
- Numerous bug fixes with naming count bigWig files, merging peak calls, printing
commands
v6.1
multirep_macs2_pipeline.pl
- Fix bugs with scaling
combine_std_chipstats.pl
- Handle UMI de-duplication reporting changes
plot_peak_figures.R
- Ignore Name column from collected data
- Convert null values to zeroes
v6.0
multirep_macs2_pipeline.pl
- Add option to explicitly turn off lambda control, or just use small or large
lambda only.
- Generate properly scaled count bigWig files for data collection.
- Add checks to check with individual command jobs have completed or not. This
will help with re-starting a failed pipeline by not having to repeat successfully
completed commands.
- Add option to save deduplicated bam files, which are automatically deleted upon
completion
intersect_peaks.pl
- New external helper script to handle intersecting peak files and generating
intersection reports.
combine_std_chipstats.pl
- Add support for reporting actual de-duplication numbers
plot_peak_figures.R
- New script for plotting peak comparison figures, including heat maps of collected
mean log2FE and q-values, number of peak intersections, and Jaccard intersection.
plot_shift_models.R
- New script for plotting the calculated, empirical, single-end alignment shift
models from the BioToolBox bam2wig.pl application.
combine_std_chipstats.pl
- New script for collecting alignment, bam2wig emprical
Add a proper Module::Build script
v5.2
multirep_macs2_pipeline.pl
- Add support for chromosome-specific normalization, for example, multi-copy
plasmids, viral episomes, or sex chromosomes
bam_partial_dedup.pl
- Use external samtools application for concatenating child chromosome bam files
for a significant speed-boost over doing it in Perl
v5.1
multirep_macs2_pipeline.pl
- Add a shift option for fragment processing, specifically to support centering
ATACSeq coverage over the endpoint or cut site.
v5.0
multirep_macs2_pipeline.pl
- Look up helper application paths using module File::Which
- Bug fixes
generate_mean_bedGraph.pl
- Add new helper script to generate chromosomal mean bedGraph file that supports
using the libBigWig adapter, and not the old UCSC adapter, for bigWig files.
v4
multirep_macs2_pipeline.pl
- Add support for no Input reference control file, and use a calculated global
mean value instead.
v3
multirep_macs2_pipeline.pl
- Add support for parallel processing of bam deduplication
bam_partial_dedup.pl
- Major update to support parallel processing to dramatically increase processing
time
- Add blacklist and chromosome skipping
v2
multirep_macs2_pipeline.pl
- Add support for external scaling factors, particularly when using an external
reference genome for calibration of ChIPSeq, such as the ChIP-Rx technique.
Scales may be provided independently for both ChIP and control.
v1
multirep_macs2_pipeline.pl
- New script to process the pipeline using Perl, my de facto language. This gives
much better command line processing, as well as multiple job and parallel
processing control. Yippee!!!! Entirely moving away from the old cumbersome Bash
shell script. Yuck!
v2.1
multirep_macs2_pipeline.sh
- Handle paired-end ChIPs
- Switch to using wigToBigWig instead of bedGraphToBigWig
v2
multirep_macs2_pipeline.sh
- Restructure bash script to put user variables closer to top instead of embedded
throughout
v1
multirep_macs2_pipeline.sh
- First version of wrapper script to handle custom execution of Macs2
bam_partial_dedup.pl
- There were multiple versions around before this was moved into this project as its
final home. Lots of improvements, like using end-position as well as start to
identify duplicates, random sub sampling, paired-end alignment support, agnostic
Bam file adapter support (to support the newer Bio::DB::HTS and old Bio::DB::Sam),
etc.
If you've made it this far, congratulations! You are dedicated!!!! Anyway, this is
basically the beginning. Thanks for tripping down memory lane with me!!!