Fix CRISPRessoAggregate bug and other improvements (#95)

* D3-Enhancements (#78)
* Sam/try plots (#71)
* Fix batch mode pandas warning. (#70)
* refactor to call method on DataFrame, rather than Series. Removes warning.
* Fix pandas future warning in CRISPRessoWGS
---------
Co-authored-by: Cole Lyman <[email protected]>
* Functional
* Cole/fix status file name (#69)
* Update config file logging messages
  This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear.
* Fix divide by zero when no amplicons are present in Batch mode
* Don't append file_prefix to status file name
* Place status files in output directories
* Update tests branch for file_prefix addition
* Load D3 and plotly figures with pro with multiple amplicons
* Update batch
* Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix
  Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report.
* Import CRISPRessoPro instead of importing the version
  When installed via conda, the version is not available
* Remove `get_amplicon_output` unused function from CRISPRessoCompare
  Also remove unused argparse import
* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests
* Allow for matching of multiple guides in the same amplicon
* Fix pandas FutureWarning
* Change test branch back to master
---------
Co-authored-by: Sam <[email protected]>
* Try catch all futures
* Fix test fail plots
* Point test to try-plots
* Fix d3 not showing and plotly mixing with matplotlib
* Use logger for warnings and debug statements
* Point tests back at master
---------
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>
* Sam/fix plots (#72)
* Fix batch mode pandas warning. (#70)
* refactor to call method on DataFrame, rather than Series. Removes warning.
* Fix pandas future warning in CRISPRessoWGS
---------
Co-authored-by: Cole Lyman <[email protected]>
* Functional
* Cole/fix status file name (#69)
* Update config file logging messages
  This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear.
* Fix divide by zero when no amplicons are present in Batch mode
* Don't append file_prefix to status file name
* Place status files in output directories
* Update tests branch for file_prefix addition
* Load D3 and plotly figures with pro with multiple amplicons
* Update batch
* Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix
  Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report.
* Import CRISPRessoPro instead of importing the version
  When installed via conda, the version is not available
* Remove `get_amplicon_output` unused function from CRISPRessoCompare
  Also remove unused argparse import
* Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests
* Allow for matching of multiple guides in the same amplicon
* Fix pandas FutureWarning
* Change test branch back to master
---------
Co-authored-by: Sam <[email protected]>
* Try catch all futures
* Fix test fail plots
* Fix d3 not showing and plotly mixing with matplotlib
---------
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>
* Remove token from integration tests file
* Provide sgRNA_sequences to plot_nucleotide_quilt plots
* Passing sgRNA_sequences to plot
* Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots
* Add max-height to Batch report samples
* Change testing branch
* Fix wrong check for large Batch plots
* Fix typo and move flexiguide to debug (#77)
* Change flexiguide output to debug level
* Fix typo in fastp merged output file name
* Adding id tags for d3 script enhancements
* pointing to test branch
* Add amplicon_name parameter to allele heatmap and line plots
* Add function to extract quantification window regions from include_idxs
* Scale the quantification window according to the coordinates of the sgRNA plot
* added c2pro check, added space in args.json
* Correct the quantification window indexes for multiple guides
* Fix name of nucleotide conversion plot when guides are not the same
* Fix jinja variables that aren't found
* Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot
* Remove unneeded variable and extra whitespace
* Switch test branch to master
---------
Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>
* Add amplicon_name to plot functions
* Add sgRNA sequences to nucleotide quilt parameters in Aggregate
* Add custom_colors to Aggregate plot functions
* Update Aggregate and make_aggregate_report to have logger and tool
* Write command_used to Aggregate .json info file
* Point to new test branch and add Aggregate run
* Make the order of Aggregate runs explicit
* Sort all instances of crispresso2_folder_info in Aggregate
* Sort df_summary_quantification df in Aggregate
* Try sorting with a list of single column
* Update to correct test branch
* Move to master test branch
---------
Co-authored-by: Trevor Martin <[email protected]>
Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
pinellolab · Aug 9, 2024 · 876abc2
1 parent fa05bd5

Showing 3 changed files with 31 additions and 7 deletions.
diff --git a/.github/workflows/integration_tests.yml b/.github/workflows/integration_tests.yml
@@ -114,3 +114,8 @@ jobs:
         if: success() || failure()
         run: |
           make compare test print
+
+      - name: Run Aggregate
+        if: success() || failure()
+        run: |
+          make aggregate test print
diff --git a/CRISPResso2/CRISPRessoAggregateCORE.py b/CRISPResso2/CRISPRessoAggregateCORE.py
@@ -109,6 +109,7 @@ def main():
         crispresso2_info = {'running_info': {}, 'results': {'alignment_stats': {}, 'general_plots': {}}} #keep track of all information for this run to be pickled and saved at the end of the run
         crispresso2_info['running_info']['version'] = CRISPRessoShared.__version__
         crispresso2_info['running_info']['args'] = deepcopy(args)
+        crispresso2_info['running_info']['command_used'] = ' '.join(sys.argv)
 
         crispresso2_info['running_info']['log_filename'] = os.path.basename(log_filename)
 
@@ -227,7 +228,7 @@ def main():
 
         if successfully_imported_count > 0:
 
-            crispresso2_folders = crispresso2_folder_infos.keys()
+            crispresso2_folders = list(sorted(crispresso2_folder_infos.keys()))
             crispresso2_folder_names = {}
             crispresso2_folder_htmls = {}#file_loc->html folder loc
             quilt_plots_to_show = {}  # name->{'href':path to report, 'img': png}
@@ -515,8 +516,10 @@ def main():
                                     'fig_filename_root': this_window_nuc_pct_quilt_plot_name,
                                     'save_also_png': save_png,
                                     'sgRNA_intervals': sub_sgRNA_intervals,
+                                    'sgRNA_sequences': consensus_guides,
                                     'quantification_window_idxs': include_idxs,
                                     'group_column': 'Folder',
+                                    'custom_colors': None,
                                 }
                                 plot(
                                     CRISPRessoPlot.plot_nucleotide_quilt,
@@ -550,8 +553,10 @@ def main():
                                     'fig_filename_root': this_nuc_pct_quilt_plot_name,
                                     'save_also_png': save_png,
                                     'sgRNA_intervals': consensus_sgRNA_intervals,
+                                    'sgRNA_sequences': consensus_guides,
                                     'quantification_window_idxs': include_idxs,
                                     'group_column': 'Folder',
+                                    'custom_colors': None,
                                 }
                                 plot(
                                     CRISPRessoPlot.plot_nucleotide_quilt,
@@ -589,8 +594,10 @@ def main():
                                     'fig_filename_root': this_nuc_pct_quilt_plot_name,
                                     'save_also_png': save_png,
                                     'sgRNA_intervals': consensus_sgRNA_intervals,
+                                    'sgRNA_sequences': consensus_guides,
                                     'quantification_window_idxs': consensus_include_idxs,
                                     'group_column': 'Folder',
+                                    'custom_colors': None,
                                 }
                                 plot(
                                     CRISPRessoPlot.plot_nucleotide_quilt,
@@ -654,6 +661,7 @@ def main():
                                 'plot_path': plot_path,
                                 'title': modification_type,
                                 'div_id': heatmap_div_id,
+                                'amplicon_name': amplicon_name,
                             }
                             plot(
                                 CRISPRessoPlot.plot_allele_modification_heatmap,
@@ -687,6 +695,7 @@ def main():
                                 'plot_path': plot_path,
                                 'title': modification_type,
                                 'div_id': line_div_id,
+                                'amplicon_name': amplicon_name,
                             }
                             plot(
                                 CRISPRessoPlot.plot_allele_modification_line,
@@ -779,7 +788,7 @@ def main():
 
             header = 'Name\tUnmodified%\tModified%\tReads_total\tReads_aligned\tUnmodified\tModified\tDiscarded\tInsertions\tDeletions\tSubstitutions\tOnly Insertions\tOnly Deletions\tOnly Substitutions\tInsertions and Deletions\tInsertions and Substitutions\tDeletions and Substitutions\tInsertions Deletions and Substitutions'
             header_els = header.split("\t")
-            df_summary_quantification=pd.DataFrame(quantification_summary, columns=header_els)
+            df_summary_quantification=pd.DataFrame(quantification_summary, columns=header_els).sort_values(by=['Name'])
             samples_quantification_summary_filename = _jp('CRISPRessoAggregate_quantification_of_editing_frequency.txt') #this file has one line for each run (sum of all amplicons)
             df_summary_quantification.fillna('NA').to_csv(samples_quantification_summary_filename, sep='\t', index=None)
             crispresso2_info['results']['alignment_stats']['samples_quantification_summary_filename'] = os.path.basename(samples_quantification_summary_filename)
@@ -841,11 +850,17 @@ def main():
                 report_filename = OUTPUT_DIRECTORY+'.html'
                 if (args.place_report_in_output_folder):
                     report_filename = _jp("CRISPResso2Aggregate_report.html")
-                CRISPRessoReport.make_aggregate_report(crispresso2_info, args.name,
-                                                       report_filename, OUTPUT_DIRECTORY,
-                                                       _ROOT, crispresso2_folders,
-                                                       crispresso2_folder_htmls,
-                                                       quilt_plots_to_show)
+                CRISPRessoReport.make_aggregate_report(
+                    crispresso2_info,
+                    args.name,
+                    report_filename,
+                    OUTPUT_DIRECTORY,
+                    _ROOT,
+                    crispresso2_folders,
+                    crispresso2_folder_htmls,
+                    logger,
+                    compact_plots_to_show=quilt_plots_to_show,
+                )
                 crispresso2_info['running_info']['report_location'] = report_filename
                 crispresso2_info['running_info']['report_filename'] = os.path.basename(report_filename)
         else: #no files successfully imported
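
Note on the ordering changes in this file: the hunks above wrap the folder keys in `sorted(...)` and sort the summary DataFrame by `Name`, which the commit message describes as making the order of Aggregate runs explicit. The sketch below illustrates the general idea with plain dictionaries only. The folder names, the `reads` field, and both helper functions are hypothetical and are not part of CRISPResso2.

```python
# Hypothetical stand-ins for crispresso2_folder_infos: dicts keyed by run folder.
# A dict iterates in insertion order, so the order depends on how the folders
# happened to be discovered on a given machine or run.
run_a = {"CRISPResso_on_s2": {"reads": 20}, "CRISPResso_on_s1": {"reads": 10}}
run_b = {"CRISPResso_on_s1": {"reads": 10}, "CRISPResso_on_s2": {"reads": 20}}


def ordered_folders(folder_infos):
    """Return the folder names in a fixed, insertion-independent order."""
    return sorted(folder_infos)  # sorted() already returns a new list


def summary_rows(folder_infos):
    """Build (name, reads) rows following the sorted folder order."""
    return [(name, folder_infos[name]["reads"]) for name in ordered_folders(folder_infos)]


# Same content, different raw iteration order...
assert list(run_a) != list(run_b)
# ...but identical order once the keys are sorted.
assert ordered_folders(run_a) == ordered_folders(run_b)
print(summary_rows(run_a))  # [('CRISPResso_on_s1', 10), ('CRISPResso_on_s2', 20)]
```

A fixed order like this is presumably what lets the new `make aggregate test print` step compare output against stored expectations without spurious differences, although the commit itself does not state that motivation.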

diff --git a/CRISPResso2/CRISPRessoReports/CRISPRessoReport.py b/CRISPResso2/CRISPRessoReports/CRISPRessoReport.py
@@ -646,6 +646,7 @@ def make_aggregate_report(
     _ROOT,
     folder_arr,
     crispresso_html_reports,
+    logger,
     compact_plots_to_show=None,
     display_names=None,
 ):
@@ -660,6 +661,7 @@ def make_aggregate_report(
     _ROOT (string): location of crispresso assets (images, templates, etc)
     folder_arr (arr of strings): paths to the aggregated crispresso folders
     crispresso_html_reports (dict): folder->html_path; Paths to the aggregated crispresso run html reports
+    logger (logging.Logger): logger to log messages
     compact_plots_to_show (dict): name=>{'href': path to target(report) when user clicks on image, 'img': path to png image to show}
     display_names (dict): folder->display_name; Titles to be shown for crispresso runs
         (if different from names_arr, e.g. if display_names have spaces or bad chars, they won't be the same as names_arr)
@@ -778,6 +780,8 @@ def make_aggregate_report(
         crispresso_report_folder,
         _ROOT,
         report_name,
+        'aggregate',
+        logger,
         window_nuc_pct_quilts=window_nuc_pct_quilts,
         nuc_pct_quilts=nuc_pct_quilts,
         summary_plots=summary_plots,
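
Note on the signature change: `make_aggregate_report` now takes `logger` as a positional parameter ahead of `compact_plots_to_show`, and the call site in CRISPRessoAggregateCORE.py switches to passing the plots by keyword. The sketch below shows why an un-updated positional call would break. `make_report` is a hypothetical, simplified stand-in, not the real function, and its body is an assumption made only for illustration.

```python
import logging


def make_report(folder_arr, html_reports, logger, compact_plots_to_show=None):
    """Hypothetical stand-in with the same parameter ordering as the new signature."""
    if compact_plots_to_show is None:
        compact_plots_to_show = {}
    logger.debug("Building report for %d folders", len(folder_arr))
    return {"folders": list(folder_arr), "plots": compact_plots_to_show}


logger = logging.getLogger("aggregate_sketch")
plots = {"quilt": {"href": "report.html", "img": "quilt.png"}}

# New call shape: logger in its positional slot, plots passed by keyword.
report = make_report(["run1"], {"run1": "run1.html"}, logger, compact_plots_to_show=plots)

# Old call shape: the plots dict lands in the logger slot and fails on first use.
old_call_failed = False
try:
    make_report(["run1"], {"run1": "run1.html"}, plots)
except AttributeError:
    old_call_failed = True  # a dict has no .debug() method
```

Passing the optional argument by keyword, as the updated call does, keeps the call site stable if further positional parameters are added later.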