The scripts are located in ./src. Run them in the order listed below: barcode processing must finish before the main data processing, and extracting data under an ROI requires the final data file generated by the main pipeline.
A testing script can be found here.
hifi_barcode_processing.sh
: Deduplicate the barcode reads and generate a BWA index of their sequences.
-b : Directory of the scripts.
-f : Flowcell ID.
-l : Flowcell lane.
-s : Flowcell surface.
-d : Directory of the barcode fastq files.
-N : Suffix of the fastq file (example: R1.fastq.gz, or *R1.fastq.gz if there are multiple input fastq files).
-t : Max CPU threads for parallelized processing, at least 4 (default 8).
-o : Parent output directory.
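As an illustration of how these options fit together, here is a minimal sketch of a barcode-processing call, assuming the script is invoked with bash from the repository root. Every value (flowcell XXX, lane 1, surface 1, the directory names) is a hypothetical placeholder. The `run` helper is not part of the pipeline: it only prints the command unless DRY_RUN=0 is set.

```shell
# All values are hypothetical placeholders; replace them with your own.
SCRIPT_DIR="./src"
FLOWCELL="XXX"
LANE="1"
SURFACE="1"
BARCODE_FASTQ_DIR="./barcode_fastq"
FASTQ_SUFFIX="R1.fastq.gz"
THREADS=8
OUT_DIR="./output"
DRY_RUN="${DRY_RUN:-1}"

# The -t option is documented as requiring at least 4 threads.
if [ "$THREADS" -lt 4 ]; then
  echo "THREADS must be at least 4" >&2
  exit 1
fi

# Illustrative helper (not part of the pipeline): print the command when
# DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run bash "$SCRIPT_DIR/hifi_barcode_processing.sh" \
  -b "$SCRIPT_DIR" \
  -f "$FLOWCELL" \
  -l "$LANE" \
  -s "$SURFACE" \
  -d "$BARCODE_FASTQ_DIR" \
  -N "$FASTQ_SUFFIX" \
  -t "$THREADS" \
  -o "$OUT_DIR"
```

Quoting the -N value matters when it contains a wildcard such as *R1.fastq.gz, so that the shell passes the pattern to the script instead of expanding it.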
hifi_wrapper.sh
: Main processing pipeline.
-b : Directory of the scripts.
-i : Directory of the STAR index.
-g : GTF annotation file of the reference genome.
-N : Sample name, used to label the final output files.
-S : Directory of the processed spatial barcodes.
-f : Flowcell ID, including lane and surface (example: XXX_1_1).
-1 : R1 fastq.gz file of the HiFi library.
-2 : R2 fastq.gz file of the HiFi library.
-t : Max CPU threads for parallelized processing, at least 4 (default 8).
-o : Parent output directory.
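The following sketch shows one way to assemble a call to the main pipeline. It assumes the combined -f value follows the pattern flowcell_lane_surface, which is inferred from the XXX_1_1 example and not confirmed elsewhere. All paths and the sample name are hypothetical placeholders, and the leading `echo` makes this a dry run.

```shell
# All values are hypothetical placeholders; replace them with your own.
SCRIPT_DIR="./src"
FLOWCELL="XXX"
LANE="1"
SURFACE="1"

# Assumption: the combined ID is <flowcell>_<lane>_<surface>, inferred
# from the XXX_1_1 example.
FLOWCELL_TAG="${FLOWCELL}_${LANE}_${SURFACE}"

# Dry run: the leading `echo` prints the command. Remove it to execute.
echo bash "$SCRIPT_DIR/hifi_wrapper.sh" \
  -b "$SCRIPT_DIR" \
  -i "./star_index" \
  -g "./annotation.gtf" \
  -N "sample1" \
  -S "./processed_barcodes" \
  -f "$FLOWCELL_TAG" \
  -1 "./sample1_R1.fastq.gz" \
  -2 "./sample1_R2.fastq.gz" \
  -t 8 \
  -o "./output"
```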
hifi_extract_roi.sh
: Subset the final output data to extract only the tiles under the ROI.
-R : Path to the txt file listing the tiles under the ROI (one tile per line).
-r : ROI name to label the output file.
-N : Sample name, used to label the final output file.
-o : Parent output directory.
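To make the ROI file format concrete, the sketch below writes a tile list with one tile per line and then prints the matching extraction command. The tile identifiers, the ROI name, the sample name, and the paths are all made up for illustration. The real tile naming should be taken from your own data.

```shell
# All values are hypothetical placeholders; replace them with your own.
SCRIPT_DIR="./src"
ROI_FILE="./roi_tiles.txt"

# Write one tile per line. The identifiers below are made up; use the
# tile naming that appears in your own output data.
printf '%s\n' "tile_A" "tile_B" "tile_C" > "$ROI_FILE"

# Sanity check: count the tiles listed in the file.
N_TILES=$(wc -l < "$ROI_FILE" | tr -d ' ')
echo "ROI file lists $N_TILES tiles"

# Dry run: the leading `echo` prints the command. Remove it to execute.
echo bash "$SCRIPT_DIR/hifi_extract_roi.sh" \
  -R "$ROI_FILE" \
  -r "roi1" \
  -N "sample1" \
  -o "./output"
```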
hifi_stats.sh
: Calculate statistics and QC metrics.
-N : Sample name.
-o : Parent output directory.
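A statistics run takes only the two options above. This sketch assumes the sample name and output directory match those passed to the main pipeline, which seems likely but is not stated here. Both values are hypothetical placeholders, and the leading `echo` makes this a dry run.

```shell
# Hypothetical placeholders; presumably the same values used for the
# main pipeline run (an assumption).
SCRIPT_DIR="./src"
SAMPLE="sample1"
OUT_DIR="./output"

# Dry run: the leading `echo` prints the command. Remove it to execute.
echo bash "$SCRIPT_DIR/hifi_stats.sh" -N "$SAMPLE" -o "$OUT_DIR"
```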