Fix CRISPRessoAggregate bug and other improvements (#95)
* D3-Enhancements (#78)

* Sam/try plots (#71)

* Fix batch mode pandas warning. (#70)

* refactor to call method on DataFrame, rather than Series.
Removes warning.

* Fix pandas future warning in CRISPRessoWGS

---------

Co-authored-by: Cole Lyman <[email protected]>

* Functional

* Cole/fix status file name (#69)

* Update config file logging messages

This removes printing the exception (which is essentially a duplicate),
and adds a condition if no config file was provided. Also changes `json`
to `config` so that it is more clear.

* Fix divide by zero when no amplicons are present in Batch mode

* Don't append file_prefix to status file name

* Place status files in output directories

* Update tests branch for file_prefix addition

* Load D3 and plotly figures with pro with multiple amplicons

* Update batch

* Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix

Before this fix, when using a file_prefix the second run that was compared
would not be displayed as a data in the first figure of the report.

* Import CRISPRessoPro instead of importing the version

When installed via conda, the version is not available

* Remove `get_amplicon_output` unused function from CRISPRessoCompare

Also remove unused argparse import

* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests

* Allow for matching of multiple guides in the same amplicon

* Fix pandas FutureWarning

* Change test branch back to master

---------

Co-authored-by: Sam <[email protected]>

* Try catch all futures

* Fix test fail plots

* Point test to try-plots

* Fix d3 not showing and plotly mixing with matplotlib

* Use logger for warnings and debug statements

* Point tests back at master

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Sam/fix plots (#72)

* Fix batch mode pandas warning. (#70)

* refactor to call method on DataFrame, rather than Series.
Removes warning.

* Fix pandas future warning in CRISPRessoWGS

---------

Co-authored-by: Cole Lyman <[email protected]>

* Functional

* Cole/fix status file name (#69)

* Update config file logging messages

This removes printing the exception (which is essentially a duplicate),
and adds a condition if no config file was provided. Also changes `json`
to `config` so that it is more clear.

* Fix divide by zero when no amplicons are present in Batch mode

* Don't append file_prefix to status file name

* Place status files in output directories

* Update tests branch for file_prefix addition

* Load D3 and plotly figures with pro with multiple amplicons

* Update batch

* Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix

Before this fix, when using a file_prefix the second run that was compared
would not be displayed as a data in the first figure of the report.

* Import CRISPRessoPro instead of importing the version

When installed via conda, the version is not available

* Remove `get_amplicon_output` unused function from CRISPRessoCompare

Also remove unused argparse import

* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests

* Allow for matching of multiple guides in the same amplicon

* Fix pandas FutureWarning

* Change test branch back to master

---------

Co-authored-by: Sam <[email protected]>

* Try catch all futures

* Fix test fail plots

* Fix d3 not showing and plotly mixing with matplotlib

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Remove token from integration tests file

* Provide sgRNA_sequences to plot_nucleotide_quilt plots

* Passing sgRNA_sequences to plot

* Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots

* Add max-height to Batch report samples

* Change testing branch

* Fix wrong check for large Batch plots

* Fix typo and move flexiguide to debug (#77)

* Change flexiguide output to debug level

* Fix typo in fastp merged output file name

* Adding id tags for d3 script enhancements

* pointing to test branch

* Add amplicon_name parameter to allele heatmap and line plots

* Add function to extract quantification window regions from include_idxs

* Scale the quantification window according to the coordinates of the sgRNA plot

* added c2pro check, added space in args.json

* Correct the quantification window indexes for multiple guides

* Fix name of nucleotide conversion plot when guides are not the same

* Fix jinja variables that aren't found

* Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot

* Remove unneeded variable and extra whitespace

* Switch test branch to master

---------

Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* Add amplicon_name to plot functions

* Add sgRNA sequences to nucleotide quilt parameters in Aggregate

* Add custom_colors to Aggregate plot functions

* Update Aggregate and make_aggregate_report to have logger and tool

* Write command_used to Aggregate .json info file

* Point to new test branch and add Aggregate run

* Make the order of Aggregate runs explicit

* Sort all instances of crispresso2_folder_info in Aggregate

* Sort df_summary_quantification df in Aggregate

* Try sorting with a list of single column

* Update to correct test branch

* Move to master test branch

---------

Co-authored-by: Trevor Martin <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
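The sorting items in the message above ("Make the order of Aggregate runs explicit", "Sort df_summary_quantification df in Aggregate") come down to one idea: do not rely on the order in which runs happened to be discovered. A minimal, standard-library-only sketch of that idea follows. The folder names and row values are invented for illustration, and the plain list sort only mimics what `sort_values(by=['Name'])` does on the real DataFrame.

```python
# Hypothetical stand-in for the mapping built while importing runs;
# folder names and values are invented for illustration.
crispresso2_folder_infos = {}
for folder in ["CRISPResso_on_sampleB", "CRISPResso_on_sampleC", "CRISPResso_on_sampleA"]:
    crispresso2_folder_infos[folder] = {"reads_total": 0}

# dict.keys() follows insertion order, i.e. the order the runs were found in.
discovery_order = list(crispresso2_folder_infos.keys())

# sorted() returns a new list, so the result is stable across filesystems
# and invocations regardless of discovery order.
crispresso2_folders = sorted(crispresso2_folder_infos.keys())

# The same idea applied to summary rows, keyed on the first field (the name).
quantification_summary = [
    ["sampleB", 90.0, 10.0],
    ["sampleC", 75.0, 25.0],
    ["sampleA", 60.0, 40.0],
]
rows_by_name = sorted(quantification_summary, key=lambda row: row[0])

print(discovery_order)
print(crispresso2_folders)
print([row[0] for row in rows_by_name])
```

Since `sorted()` already returns a list, the extra `list(...)` wrapper in the diff is harmless but not required.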
4 people authored Aug 9, 2024
1 parent fa05bd5 commit 876abc2
Showing 3 changed files with 31 additions and 7 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/integration_tests.yml
@@ -114,3 +114,8 @@ jobs:
if: success() || failure()
run: |
make compare test print
+ - name: Run Aggregate
+ if: success() || failure()
+ run: |
+ make aggregate test print
29 changes: 22 additions & 7 deletions CRISPResso2/CRISPRessoAggregateCORE.py
@@ -109,6 +109,7 @@ def main():
crispresso2_info = {'running_info': {}, 'results': {'alignment_stats': {}, 'general_plots': {}}} #keep track of all information for this run to be pickled and saved at the end of the run
crispresso2_info['running_info']['version'] = CRISPRessoShared.__version__
crispresso2_info['running_info']['args'] = deepcopy(args)
+ crispresso2_info['running_info']['command_used'] = ' '.join(sys.argv)

crispresso2_info['running_info']['log_filename'] = os.path.basename(log_filename)
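
A note on the `command_used` line in the hunk above: joining `sys.argv` with spaces records the command in readable form, but it does not preserve shell quoting, so an argument containing a space cannot be recovered from the stored string. The sketch below compares it with `shlex.join` (Python 3.8+) as one possible alternative. The argument vector is invented for illustration and has not been checked against the real CLI, and nothing here implies the project uses or needs `shlex`.

```python
import shlex

# Hypothetical argument vector; the values are invented for illustration.
argv = ["CRISPRessoAggregate", "--name", "my run", "--prefix", "results/CRISPResso_on_"]

# What the diff records: readable, but "my run" becomes two words.
plain = " ".join(argv)

# Alternative that quotes arguments so the string can be split back losslessly.
quoted = shlex.join(argv)

print(plain)
print(quoted)
print(shlex.split(quoted) == argv)
print(shlex.split(plain) == argv)
```

For a log or info file read by people, the plain join is usually good enough. The quoted form matters only if the stored command is meant to be re-run verbatim.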

@@ -227,7 +228,7 @@ def main():

if successfully_imported_count > 0:

- crispresso2_folders = crispresso2_folder_infos.keys()
+ crispresso2_folders = list(sorted(crispresso2_folder_infos.keys()))
crispresso2_folder_names = {}
crispresso2_folder_htmls = {}#file_loc->html folder loc
quilt_plots_to_show = {} # name->{'href':path to report, 'img': png}
@@ -515,8 +516,10 @@ def main():
'fig_filename_root': this_window_nuc_pct_quilt_plot_name,
'save_also_png': save_png,
'sgRNA_intervals': sub_sgRNA_intervals,
+ 'sgRNA_sequences': consensus_guides,
'quantification_window_idxs': include_idxs,
'group_column': 'Folder',
+ 'custom_colors': None,
}
plot(
CRISPRessoPlot.plot_nucleotide_quilt,
@@ -550,8 +553,10 @@ def main():
'fig_filename_root': this_nuc_pct_quilt_plot_name,
'save_also_png': save_png,
'sgRNA_intervals': consensus_sgRNA_intervals,
+ 'sgRNA_sequences': consensus_guides,
'quantification_window_idxs': include_idxs,
'group_column': 'Folder',
+ 'custom_colors': None,
}
plot(
CRISPRessoPlot.plot_nucleotide_quilt,
@@ -589,8 +594,10 @@ def main():
'fig_filename_root': this_nuc_pct_quilt_plot_name,
'save_also_png': save_png,
'sgRNA_intervals': consensus_sgRNA_intervals,
+ 'sgRNA_sequences': consensus_guides,
'quantification_window_idxs': consensus_include_idxs,
'group_column': 'Folder',
+ 'custom_colors': None,
}
plot(
CRISPRessoPlot.plot_nucleotide_quilt,
@@ -654,6 +661,7 @@ def main():
'plot_path': plot_path,
'title': modification_type,
'div_id': heatmap_div_id,
+ 'amplicon_name': amplicon_name,
}
plot(
CRISPRessoPlot.plot_allele_modification_heatmap,
@@ -687,6 +695,7 @@ def main():
'plot_path': plot_path,
'title': modification_type,
'div_id': line_div_id,
+ 'amplicon_name': amplicon_name,
}
plot(
CRISPRessoPlot.plot_allele_modification_line,
@@ -779,7 +788,7 @@ def main():

header = 'Name\tUnmodified%\tModified%\tReads_total\tReads_aligned\tUnmodified\tModified\tDiscarded\tInsertions\tDeletions\tSubstitutions\tOnly Insertions\tOnly Deletions\tOnly Substitutions\tInsertions and Deletions\tInsertions and Substitutions\tDeletions and Substitutions\tInsertions Deletions and Substitutions'
header_els = header.split("\t")
- df_summary_quantification=pd.DataFrame(quantification_summary, columns=header_els)
+ df_summary_quantification=pd.DataFrame(quantification_summary, columns=header_els).sort_values(by=['Name'])
samples_quantification_summary_filename = _jp('CRISPRessoAggregate_quantification_of_editing_frequency.txt') #this file has one line for each run (sum of all amplicons)
df_summary_quantification.fillna('NA').to_csv(samples_quantification_summary_filename, sep='\t', index=None)
crispresso2_info['results']['alignment_stats']['samples_quantification_summary_filename'] = os.path.basename(samples_quantification_summary_filename)
@@ -841,11 +850,17 @@ def main():
report_filename = OUTPUT_DIRECTORY+'.html'
if (args.place_report_in_output_folder):
report_filename = _jp("CRISPResso2Aggregate_report.html")
- CRISPRessoReport.make_aggregate_report(crispresso2_info, args.name,
- report_filename, OUTPUT_DIRECTORY,
- _ROOT, crispresso2_folders,
- crispresso2_folder_htmls,
- quilt_plots_to_show)
+ CRISPRessoReport.make_aggregate_report(
+ crispresso2_info,
+ args.name,
+ report_filename,
+ OUTPUT_DIRECTORY,
+ _ROOT,
+ crispresso2_folders,
+ crispresso2_folder_htmls,
+ logger,
+ compact_plots_to_show=quilt_plots_to_show,
+ )
crispresso2_info['running_info']['report_location'] = report_filename
crispresso2_info['running_info']['report_filename'] = os.path.basename(report_filename)
else: #no files successfully imported
4 changes: 4 additions & 0 deletions CRISPResso2/CRISPRessoReports/CRISPRessoReport.py
@@ -646,6 +646,7 @@ def make_aggregate_report(
_ROOT,
folder_arr,
crispresso_html_reports,
+ logger,
compact_plots_to_show=None,
display_names=None,
):
@@ -660,6 +661,7 @@ def make_aggregate_report(
_ROOT (string): location of crispresso assets (images, templates, etc)
folder_arr (arr of strings): paths to the aggregated crispresso folders
crispresso_html_reports (dict): folder->html_path; Paths to the aggregated crispresso run html reports
+ logger (logging.Logger): logger to log messages
compact_plots_to_show (dict): name=>{'href': path to target(report) when user clicks on image, 'img': path to png image to show}
display_names (dict): folder->display_name; Titles to be shown for crispresso runs
(if different from names_arr, e.g. if display_names have spaces or bad chars, they won't be the same as names_arr)
@@ -778,6 +780,8 @@ def make_aggregate_report(
crispresso_report_folder,
_ROOT,
report_name,
+ 'aggregate',
+ logger,
window_nuc_pct_quilts=window_nuc_pct_quilts,
nuc_pct_quilts=nuc_pct_quilts,
summary_plots=summary_plots,
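
The signature and call-site changes in this commit go together: `logger` is inserted as a required positional parameter ahead of the optional `compact_plots_to_show`, and the caller now passes the plots by keyword. The sketch below shows why that pairing matters. The function `make_report_sketch` and its reduced parameter list are hypothetical stand-ins, not the real `make_aggregate_report`.

```python
import logging

# Hypothetical, simplified stand-in: a required `logger` parameter sits
# directly before an optional `compact_plots_to_show` parameter.
def make_report_sketch(info, name, folder_arr, html_reports, logger,
                       compact_plots_to_show=None):
    return logger, compact_plots_to_show

quilt_plots = {"sampleA": {"href": "a.html", "img": "a.png"}}
logger = logging.getLogger("aggregate_sketch")

# A caller that still passes the plots positionally (the pre-change style)
# silently binds the dict to `logger` and leaves the plots unset.
stale_logger, stale_plots = make_report_sketch({}, "run", [], {}, quilt_plots)

# The updated style: logger in its positional slot, plots by keyword.
ok_logger, ok_plots = make_report_sketch(
    {}, "run", [], {}, logger,
    compact_plots_to_show=quilt_plots,
)

print(stale_logger is quilt_plots, stale_plots)
print(ok_logger is logger, ok_plots is quilt_plots)
```

Passing optional arguments by keyword keeps call sites correct when further positional parameters are added later.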
