-
Notifications
You must be signed in to change notification settings - Fork 22
Reference Alignment
- Overview
-
Guides
- Inputs
- Data Products
- Build Processes
- Processing Profiles
-
Tutorials
- How do I import external data for reference alignment?
- How do I align non-human reads and call variants?
-
How do I align reads with BWA-SW instead of regular BWA (aka BWA-short)?
- Q: Do you have an existing Reference Alignment model?
- How do I define a region of interest for evaluating coverage?
- How do I list the BAM location for a model?
- Known Issues
The Reference Alignment pipeline performs basic alignment and variant detection on sequence data for a single sample.
The Instrument Data records corresponding to the sequence data to align.
The Reference Sequence Build to which to align the Instrument Data.
The dbSNP Imported Variation List Build to use
The Genotype Microarray Model for the Genotype Data for the Subject of this model
The Annotation Build to use when annotating the variants.
The name of the Target Region Set Feature List used to perform capture to generate the Instrument Data. This should match the target_region_set_name property on the Instrument Data.
The name of the Feature List used to perform the Coverage analysis.
For a multi-tracked BED file, this controls which track is used to perform the coverage analysis. Valid values are "tiled_region" and "target_region".
This boolean flag controls whether or not the "name" column is replaced in Coverage analysis with "short" names (currently "r" followed by a sequential number).
This boolean flag controls whether or not overlapping and adjacent regions in the BED file are combined before coverage analysis is performed.
The top of a Reference Alignment build directory looks like this:
Build directory listing
alignments@ build.xml logs/ reference_coverage@ reports/ variants/
The "logs" and "reports" directories as well as the "build.xml" file are standard products of running a Build in the GMS.
4bf260ad9a244e20b4fd4bbc830b7ac1.bam 4bf260ad9a244e20b4fd4bbc830b7ac1.bam.flagstat H_patient1-sample1-lib1-2893653597.log
4bf260ad9a244e20b4fd4bbc830b7ac1.bam.bai 4bf260ad9a244e20b4fd4bbc830b7ac1.bam.md5 H_patient1-sample1-lib1lib1-2893653597.metrics
The primary data product from running alignment is a merged BAM File (in this case, "4bf260ad9a244e20b4fd4bbc830b7ac1.bam".) It will also have an index file along side it that downstream tools use for faster lookups inside the BAM file.
An MD5 of the BAM file is generated:
4bf260ad9a244e20b4fd4bbc830b7ac1.bam.md5
355a84d163e30d36ca915677f3c97728 /path/to/4bf260ad9a244e20b4fd4bbc830b7ac1.bam
The path listed in this file will reflect a location in which the BAM file was being worked on in the process of running and may not reflect its current location on disk. In general it is superior to keep track of the ID of the merged Alignment Result rather than the path where the BAM file currently happens to reside.
The otuput of running samtools flagstat
on the BAM file is also saved for future reference:
4bf260ad9a244e20b4fd4bbc830b7ac1.bam.flagstat
Two files will be produced for each library contained in the merged BAM if duplication handling was performed. The "H_patient1-sample1-lib1-2893653597.log" will typically be empty and can be disregarded. The corresponding .metrics file will contain the result of running Picard's DuplicationMetrics for the library:
H_patient1-sample1-lib1.metrics
The output of the Variant Detection API is a directory of symlinks and subdirectories with a few bookkeeping files:
Example output directory
./ dispatcher.cmd indels.hq.bed@ indels.lq.v1.bed@ snvs.detailed.vcf.gz@ snvs.hq.v2.bed@ snvs.vcf.gz sv/
../ indel/ indels.hq.v1.bed@ indels.lq.v2.bed@ snvs.detailed.vcf.gz.tbi@ snvs.lq.bed@ snvs.vcf.gz.bed@ svs.hq@
cnv/ indels.detailed.vcf.gz@ indels.hq.v2.bed@ indels.vcf.gz snvs.hq.bed@ snvs.lq.v1.bed@ snvs.vcf.gz.v1.bed@ workflow.xml
cnvs.hq@ indels.detailed.vcf.gz.tbi@ indels.lq.bed@ snv/ snvs.hq.v1.bed@ snvs.lq.v2.bed@ snvs.vcf.gz.v2.bed
The symlinks point to the "final" files inside the subdirectories. If only a detector was run this is the detector output; if filters were run, this is the filter output; and if combiners were run this is the combination output.
The subdirectories themselves contain symlinks to the Software Result output directory for each detector and combiner that was run:
Example snv subdirectory
./ sniper-0.7.3-a5de64567b157457613af1127b6563ab@
../ unionunique-intersect-snv_samtools_r963__8-snv_sniper_0.7.3__4-snv_varscan-somatic-validation_2.2.6__3@
intersect-snv_samtools_r963__8-snv_sniper_0.7.3__4@ varscan-somatic-validation-2.2.6-a9094f13b415c7c967433c8742871425@
samtools-r963-67b654281932619f0b6c5f28fae43235@
The detector links are named for the detector name, version, and a hash of the parameters as specified in the strategy. The combiner links are named for the workflow steps of the detectors that they combine.
Inside an individual detector output directory are whatever output files it generates as well as symlinks to the output directory of a filter applied to that detector. All detectors produce at a minimum a variant_type.hq file representing the native output of running the detector.
Example detector directory
./ snvs.hq snvs.hq.v1.bed@ snvs.vcf.gz.tbi@
../ snvs.hq.bed@ snvs.hq.v2.bed snvs.vcf.gz@ false-positive-v1-d41d8cd98f00b204e9800998ecf8427e@
If a second filter is applied, its symlink will appear in the output directory of the first filter. Each filter should produce a minimum of two files, a variant_type.hq and a variant_type.lq file. These are similarly converted to BED files and into a single VCF with a filter column. Filters may require that their detectors produce a file in a particular format in order to run. Most operate on the BED or VCF directly, but a few are specific to particular detectors.
Most detectors and filters should generate both BED files and VCFs.
The same BED file goes by three names, variant_type.hq.bed, variant_type.hq.v1.bed, and variant_type.hq.v2.bed. The first is intended to be a symlink to the highest version of BED file available (currently V2). Since V2 is backwards compatible with V1, the V1 file also symlinks to the V2 file. For the filters, there are a similar set of three files with "lq" instead of "hq" for those variants that failed to pass the filter.
No BED files or VCFs are generated for SVs. They all use a variation on the output format of Breakdancer.
No BED files or VCFs are generated for CNVs. Some CNV detectors produce an empty cnvs.hq file and are only used for the other reports they generate.
The "dispatcher.cmd" file contains the PERL5LIB that was used when running variant detection and the command-line that would be used to generate the directory again.
dispatcher.cmd
/etc/perl /usr/local/lib/site_perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10.1 /usr/share/perl/5.10.1 /usr/local/lib/site_perl
===============================================
gmt detect-variants2 dispatcher --output-directory /path/to/variants --reference-build 106942997 --aligned-reads-input /path/to/75609fb1cd7b4dbabbe3d279e89ff136.bam --control-aligned-reads-input /path/to/84d056b908ec4acabec79dad50c1c78d.bam --aligned-reads-sample TEST-patient1-somval_tumor1 --control-aligned-reads-sample TEST-patient1-somval_normal1 --snv-detection-strategy '(samtools r963 [-A -B] filtered by snp-filter v1 intersect sniper 0.7.3 [-q 1 -Q 15] filtered by false-positive v1 [--bam-readcount-version 0.4 --bam-readcount-min-base-quality 15] then somatic-score-mapping-quality v1 [--min-mapping-quality 40 --min-somatic-score 40]) unique union varscan-somatic-validation 2.2.6 [--min-var-freq 0.08 --p-value 0.10 --somatic-p-value 0.01 --validation] filtered by varscan-high-confidence v1 then false-positive v1 [--bam-readcount-version 0.4 --bam-readcount-min-base-quality 15]' --sv-detection-strategy '(breakdancer 1.2 [-g -h:-a -t -q 10 -d] filtered by novo-realign v1 then tigra-validation v1 union breakdancer 1.2 [-g -h:-a -q 10 -o] filtered by tigra-validation v1) union squaredancer 0.1 filtered by tigra-validation v1' --indel-detection-strategy '(gatk-somatic-indel 5336 union pindel 0.5 filtered by pindel-somatic-calls v1 then pindel-read-support v1) union varscan-somatic-validation 2.2.6 filtered by varscan-high-confidence-indel v1 then false-indel v1 [--bam-readcount-version 0.4 --bam-readcount-min-base-quality 15]'
The "workflow.xml" file is used internally for Workflow to run the requested detectors, filters, and combiners.
There are two additional files later added to the variants directory for annotation:
The annotation process results in two files. The names are customizable, for example Reference Alignment uses:
Annotation output files
filtered.variants.pre_annotation filtered.variants.post_annotation
The annotator runs on a file format that is slightly different from the standard BED files produced by the Variant Detection API. Given this input:
indels.hq.bed
1 109745943 109745945 AA/* 41 1
the corresponding adapted output is:
filtered.variants.pre_annotation
1 109745944 109745945 AA -
The annotator format uses a one-based start coordinate and splits the reference and variant positions to separate columns. The additional columns of the BED file are ignored.
Running the annotator on this adapted file adds the annotation data available from the chosen Annotation Build:
filtered.variants.post_annotation
chromosome_name start stop reference variant type gene_name transcript_name transcript_species transcript_source transcript_version strand transcript_status trv_type c_position amino_acid_change ucsc_cons domain all_domains deletion_substructures transcript_error default_gene_name gene_name_source ensembl_gene_id
1 109745944 109745945 AA - DEL PSMA5 NM_002790 human genbank 54_36p -1 reviewed 3_prime_flanking_region c.*214 NULL 0.022:0.021 - - - no_errors Unknown Unknown Unknown
1 109745944 109745945 AA - DEL SORT1 NM_002959 human genbank 54_36p -1 reviewed 5_prime_flanking_region c.-3907 NULL 0.022:0.021 - - - no_errors Unknown Unknown Unknown
Running the Reference Coverage analysis produces a directory of files:
Coverage result directory
4bf260ad9a244e20b4fd4bbc830b7ac1-ea2eb5fde49b41adaa1fbb9145cc2a57-wingspan_500-alignment_summary.tsv ea2eb5fde49b41adaa1fbb9145cc2a57.bed
4bf260ad9a244e20b4fd4bbc830b7ac1-ea2eb5fde49b41adaa1fbb9145cc2a57-wingspan_0-alignment_summary-v2.tsv wingspan_0/
4bf260ad9a244e20b4fd4bbc830b7ac1-ea2eb5fde49b41adaa1fbb9145cc2a57-wingspan_0-alignment_summary.tsv wingspan_500/
4bf260ad9a244e20b4fd4bbc830b7ac1-ea2eb5fde49b41adaa1fbb9145cc2a57-wingspan_500-alignment_summary-v2.tsv
The files are named for the Merged Alignment Result's ID and the Feature List's ID.
The processed BED File (here ea2eb5fde49b41adaa1fbb9145cc2a57.bed) is produced according to the selected Coverage Options. This file is used as the list of regions to generate the remaining files.
There will be Alignment Summary TSV files produced for each requested wingspan. Each is formatted like the following:
4bf260ad9a244e20b4fd4bbc830b7ac1-ea2eb5fde49b41adaa1fbb9145cc2a57-wingspan_0-alignment_summary-v2.tsv
total_bp total_aligned_bp total_unaligned_bp total_duplicate_bp paired_end_bp read_1_bp read_2_bp mapped_paired_end_bp proper_paired_end_bp singleton_bp total_target_aligned_bp unique_target_aligned_bp duplicate_target_aligned_bp total_off_target_aligned_bp unique_off_target_aligned_bp duplicate_off_target_aligned_bp
11035644200 10644323947 391320253 953153277 11035644200 5517822100 5517822100 10906482400 10764388400 142094600 6427456805 5823315543 604141262 4216867142
3867855127 349012015
Each wingspan directory will contain files with more detailed stats:
Example wingspan directory
4bf260ad9a244e20b4fd4bbc830b7ac1_ea2eb5fde49b41adaa1fbb9145cc2a57_STATS.tsv 4bf260ad9a244e20b4fd4bbc830b7ac1_ea2eb5fde49b41adaa1fbb9145cc2a57_STATS.txt
The TXT file contains a summary of the coverage report:
4bf260ad9a244e20b4fd4bbc830b7ac1_ea2eb5fde49b41adaa1fbb9145cc2a57_STATS.txt
The TSV file has the statistics collected for each region of the processed BED file:
4bf260ad9a244e20b4fd4bbc830b7ac1_ea2eb5fde49b41adaa1fbb9145cc2a57_STATS.tsv
Processing Profile: 2635769 "Nov 2011 Default Reference Alignment" (Current Default)
Read Alignment:
Each lane or sub-lane of data is aligned to GRCh37-lite with bwa v0.5.9. Defaults are used in both bwa aln and bwa samse/sampe with the exception that for bwa aln four threads are utilized (-t 4) and bwa's built in quality-based read trimming (-q 5). If the median insert size of the data is available then the median + 5 standard deviations is passed as the maximum insert size to bwa sampe (-a parameter) otherwise this value is specified as 600. ReadGroup entries are added to resulting SAM files using gmt sam add-read-group-tag. This SAM file is then converted to a BAM file using Samtools vr963, name sorted (samtools sort -n), mate pairings assigned (samtools fixmate), resorted by position (samtools sort), and indexed using gmt sam index-bam.
Reads from multiple lanes, but the same sequencing library are merged, if necessary, using Picard v1.46 MergeSamFiles and duplicates are then marked per library using Picard MarkDuplicates v1.4.6. Lastly, each per-library BAM with duplicates marked is merged together to generate a single BAM file for the sample. For MergeSamFiles we run with SORT_ORDER=coordinate and MERGE_SEQUENCE_DICTIONARIES=true. For both tools, ASSUME_SORTED=true and VALIDATION_STRINGENCY=SILENT are specified. All other parameters are set to defaults. Samtools flagstat is run on each BAM file generated (per-lane, per-library, and final merged).
Coverage Reporting:
Genotype Concordance:
Variant Calling:
Variant Annotation:
A. Yes. Should we ask users to validate before import....
A. No, convert from FASTQ to BAM, see below.
A. My data was sequenced on the PacBio.
Here is an example PacBio lister to show the raw primary FASTQ and the CCS FASTQ files per movie:
$ pac-bio input list --filter sample_name~NF1% --show +primary_fastq_files,ccs_fastq_files
$ gmt picard fastq-to-sam --use-version=1.82 --fastq=/gscmnt/gc2122/production/paresults/TD13-0003_659/A01_1/Analysis_Results/m130130_003332_00116_c100448422550000001523062202151375_s1_p0.fastq
--platform=pacbio --platform-unit=10044842255000000152306220215137.5 --read-group-name=TD13-0003_A01_s1 --run-date='2013-01-31 14:17:13'
--description='PacBio TD13-0003 A01 NF1_1.5Kb_0.9nM Movie 1' --comment=NF1_1.5Kb_0.9nM --predicted-insert-size=1500 --sequencing-center=TGI \
--sample-name=H_IJ-NA12878 --library-name=H_IJ-NA12878-extlibs --quality-format=Standard --output=$BAM_PATH
Here is an example sample name that already exists. More information about the sample can be found through the command line lister:
$ genome sample list --filter name=H_IJ-NA12878
ID NAME SPECIES_NAME PATIENT_COMMON_NAME COMMON_NAME TISSUE_LABEL TISSUE_DESC EXTRACTION_TYPE EXTRACTION_LABEL EXTRACTION_DESC
-- ---- ------------ ------------------- ----------- ------------ ----------- --------------- ---------------- ---------------
2376483307 H_IJ-NA12878 human <NULL> <NULL> <NULL> <NULL> genomic dna NA12878 <NULL>
TO DO
$ genome instrument-data import basic --source-files=$BAM_PATH --sample=H_IJ-NA12878 --import-source-name=TGI_PacBio_RS --description='PacBio TD13-0003 A01 NF1_1.5Kb_0.9nM Movie 1'
'sample' may require verification...
Resolving parameter 'sample' from command argument 'H_IJ-NA12878'... found 1
Import instrument data...
Resolve library...
Sample name: H_IJ-NA12878
Sample id: 2376483307
Library name: H_IJ-NA12878-extlibs
Library id: 2891323975
Resolve library...done
Validate source files...
Source file(s): /tmp/test.bam
Source file count: 1
Format: bam
Kilobytes requested: 318632
Validate source files...done
Checking if source files were previously imported...
Source files were NOT previously imported!
Create instrument data...
Instrument data id: 2891323976
Allocation (eea8c19297dd4623ad3752f10954ce82) created at /gscmnt/gc1400/info/instrument_data/imported/2891323976
Allocation id: eea8c19297dd4623ad3752f10954ce82
Allocation path: /gscmnt/gc1400/info/instrument_data/imported/2891323976
Allocation tmp path: /gscmnt/gc1400/info/instrument_data/imported/2891323976/tmp
Create instrument data...done
Transfer bam file and run flagstat...
Source file: /tmp/test.bam
Temp bam file: /gscmnt/gc1400/info/instrument_data/imported/2891323976/tmp/all_sequences.bam
Copy...
Copy...done
Run flagstat...
Flagstat file: /gscmnt/gc1400/info/instrument_data/imported/2891323976/all_sequences.bam.flagstat
RUN: samtools flagstat /gscmnt/gc1400/info/instrument_data/imported/2891323976/tmp/all_sequences.bam > /gscmnt/gc1400/info/instrument_data/imported/2891323976/all_sequences.bam.flagstat
Run flagstat...done
Move tmp bam file to permenant file...
Bam file: /gscmnt/gc1400/info/instrument_data/imported/2891323976/all_sequences.bam
Move tmp bam file to permenant file...done
Transfer bam file and run flagstat...done
Update properties on instrument data...
Bam path: /gscmnt/gc1400/info/instrument_data/imported/2891323976/all_sequences.bam
Read count: 75153
Is paired end: 0
Update properties on instrument data...done
Remove tmp dir...
rmdir /gscmnt/gc1400/info/instrument_data/imported/2891323976/tmp
Remove tmp dir...done
Reallocate...
Setting kilobytes_requested to 267648 based on `du` for allocation eea8c19297dd4623ad3752f10954ce82
Reallocate...done
Import instrument data...done
TO DO (see GMS-Reference Sequence)
Step 2: Create a reference alignment and variant-calling processing profile
Example to align genome reads from a cat breed to cat reference and call variants using VarScan and samtools.
Either locate a processing profile that you want to use, or create a new one. The processing profile is a collection of the parameters that you want the pipeline to use. It can be re-used for many different projects. If you find a processing profile that is close to what you want to use but needs a few tweaks, use the --based-on option (see `genome processing-profile create reference-alignment -h` for more info).
Example create command:
genome processing-profile create reference-alignment --based-on 2774195 --snv-detection-strategy " varscan 2.2.9 [--min-coverage 3 --min-var-freq 0.20 --p-value 0.10 --strand-filter 1 --map-quality 10] filtered by false-positive v1 [--min-var-count 2 --max-mm-qualsum-diff 100 --bam-readcount-version 0.4 --bam-readcount-min-base-quality 15] unique union samtools r963 filtered by snp-filter v1 then false-positive v1 [--max-mm-qualsum-diff 100 --bam-readcount-version 0.4 --bam-readcount-min-base-quality 15]" --indel-detection-strategy "varscan 2.2.9 [--min-coverage 3 --min-var-freq 0.20 --p-value 0.10 --strand-filter 1 --map-quality 10] filtered by false-indel v1 [--min-var-count 2 --max-mm-qualsum-diff 100 --bam-readcount-version 0.4 --bam-readcount-min-base-quality 15]" --merger-name picard --merger-version 1.46 --name "mmontagu align and detect variants" --samtools-version r599
2. Locate instrument data ID
3. Define a genome model with appropriate processing profile ID and reference sequence
genome model define reference-alignment --processing-profile 2779197 --subject-name 'FCBK-081219_ARND4_pool_tube1' --model-name 'FCBK Asian to Felis catus 6.2' --instrument-data 2857675652,2857676651,2857676887,2857676888,2857676889,2857679396,2857679850 --reference-sequence-build 131815546
4. Retrieve Model ID after running model define and start build
genome model build start 2890120880
Step 3: Match variants to coordinates
Variant Effect Predictor (VEP) matches variants to coordinates (genome db ensembl vep)
genome db ensembl vep --input-file <VARIANT.CALLING.OUTPUT.bed> --output-file <OUTPUT.NAME> --version 2_7 --gtf-file <LOCATION.FILENAME.gtf> --gtf-cache --species <GENUS_SPECIES> --reference-build-id <IMPORTED.REFERENCE.SEQUENCE.ID> --ensembl-annotation-build-id ENSEMBL.ANNOTATION.BUILD.ID>
Example to run a final filtering step and annotate variants.
Run a clustered variant filtering step for the cat variant calls before annotating (using parameters recommended by Dan Kolboldt; can we add this into processing profile?):
--max-variants 5
--window-size 500
using the following command:
gmt somatic filter-variant-clusters --variant-file <snvs.hq.bed> --output-file snv.cluster.filter --max-variants 5 --window-size 500 > clusters.summary
2. Run VEP
genome db ensembl vep --input-file /gscmnt/ams1156/info/model_data/2889900372/build131861968/variants/snvs.hq.bed --output-file snvs.hq.annotated --version 2_7 --gtf-file /gscmnt/gc6136/assembly/cat/cat_ensembl70_all_genes.gtf --gtf-cache --species felis_catus --reference-build-id 131815546 --ensembl-annotation-build-id 124434505
CONSIDER: VEP v2.7 can output a vcf file, but this version needs an ensembl api of 69 or higher and this means --ensembl-annotation-build-id 131184146
Note about gtf format to be compatible with VEP: The GTF file have its features sorted in chromosomal order, and must contain the following entry types and data:
* "exon" feature lines with have "transcript_id", "gene_id" and "exon_number" in the description field
* "CDS" feature lines must correspond to coding exons, with "transcript_id" and "exon_number" populated in the description field, and the "frame" column populated with the read frame of the exon
* the "source" column of the GTF should indicate the biotype of the exon/CDS - this should be "protein_coding" for protein coding transcripts
A : Yes.
genome model copy MODEL processing_profile=2787013
A : No, see instructions for creating new models .
BED format files are used to define regions of interest. The GMS will use these files to evaluate coverage of the defined regions, please see Targeted Selection Coverage for additional details.
There is a genome command that will create a 'feature-list'. The command takes a BED format file. The valid types of BED files are listed in the command usage.
genome feature-list create --help
The preference for BED format is in the following order:
1. 'multi-tracked BED' files are a convention used by several Exome providers that support multiple UCSC browser tracks contained within a single file. Typically the tracks are 'target_regions' and 'tiled_regions'. The target_regions represent the targeted exons in an Exome product. The 'tiled_regions' represent the actual placement of hybridization probes. This is the ideal format for BED files.
2. 'true-BED' format follows all of the defined specifications made by the UCSC Genome Browser BED format. This includes using zero based start positions and one based end positions.
3. 1-based format means the start position is still in one based genomic coordinates rather than using zero based start positions.
4. 'multi-tracked 1-based' files mean there are multiple tracks for 'target_region' and 'tiled_region' in addition to having one based start positions.
5. 'unknown' means the format is unknown but still needs to be determined in order for analysis to proceed.
A model input exists called region_of_interest_set_name. When defining a model the name or id of a feature-list is provided as the value for this input:
genome model define reference-alignment --region-of-interest-set-name=$FEATURE_LIST ...
You may want to update the region_of_interest_set_name or a reference alignment model in order to evaluate a different region. To do this you need to execute the following commands:
genome model input remove --name=region_of_interest_set_name $MODEL --values=$CURRENT_FEATURE_LIST
genome model input add --name=region_of_interest_set_name $MODEL --values=$NEW_FEATURE_LIST
The 'target_region_set_name' is the name of the target region for which instrument data was generated. Each instrument data that has had targeted selection will have a barcoded target_region_set_name. This model input is used to filter the instrument data assigned to a model by this property. Only instrument data that has the same target_region_set_name will be assigned to a model with the defined model input.
The 'region_of_interest_set_name' is the name of a set of regions for which coverage and variant detection will be evaluated. This is often the same value as 'target_region_set_name' since coverage of the regions targeted by hybridization is often desired. However, it is possible to evaluate coverage and variants within any arbitrary set of genomic coordinates using a 'region_of_interest_set_name' that corresponds to a valid feature-list.
Models do not have results of their own--they are merely a container for builds. Thus we need to select a build from the model. "last_succeeded_build" (and "last_complete_build", which is synonymous) give us the most recent build that completed successfully.
"whole_rmdup_bam_file" is something particular to reference alignment builds. Both reference alignment and rna-seq have a merged_alignment_result (which is itself a Genome::InstrumentData::AlignmentResult::Merged). Looking in that module we see that there is a bam_path accessor.
$ genome model list --filter id=abcd1234 --show id,name,last_complete_build.merged_alignment_result.bam_path
Under development