Home
Let's try to explore the MACARON_output.txt:
CHROM POS ID REF ALT Gene_Name QUAL sample01.GT sample02.GT sample03.GT sample01.GT sample02.GT sample03.GT Protein_coding_Gene_Name AA-Change REF-codon ALT-codon ALT-codon_merge-2VAR AA-Change-2VAR ALT-codon_merge-3VAR AA-Change-3VAR
chr22 21349676 rs412470 T A LZTR1 423.0 T/T T/A T/T 0/0 0/1 0/0 MISSENSE S92T Tct Act ATt I . .
chr22 21349677 rs376419 C T LZTR1 423.0 C/C C/T C/C 0/0 0/1 0/0 MISSENSE S92F tCt tTt . I . .
chr22 23247169 rs527511481 T G IGLJ3 719.0 T/T T/G T/T 0/0 0/1 0/0 MISSENSE W39G Tgg Ggg GTg V . .
chr22 23247170 rs540954398 G T IGLJ3 716.0 G/G G/T G/G 0/0 0/1 0/0 MISSENSE W39L tGg tTg . V . .
Overall, two pairs of pcSNVs are observed: lines 1 and 2 form the first pair (chr22:21349676-21349677), and lines 3 and 4 form the second pair (chr22:23247169-23247170). Following what conventional SNV annotation tools report, we first interpret each SNV of the first pair individually. chr22:21349676 is the first base of the codon Tct, which codes for serine at amino acid position 92 (S92) of the protein product of the gene LZTR1. With the alternate allele A, Tct changes to Act, which codes for threonine (T92). chr22:21349677 is the second base of the same codon (tCt), so it affects the same amino acid position S92 of the LZTR1 protein. With the alternate allele T, tCt changes to tTt, which codes for phenylalanine (F92).
Now, according to MACARON:
Rather than treating these two SNVs as two separate variation events, they should be regarded as a single combined variation event, chr22:21349676-21349677, because both SNVs affect the same codon. The re-annotation therefore merges the two ALT codons, Act and tTt, into a new codon ATt, which codes for I (isoleucine), as reported in the ALT-codon_merge-2VAR and AA-Change-2VAR columns.
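The merging step can be sketched in a few lines of Python. This is only an illustration of the idea, not MACARON's actual code: the helper names `translate` and `merge_alt_codons` are hypothetical, and the codon table is the standard genetic code built from the conventional TCAG ordering.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codon):
    """Return the one-letter amino acid for a codon (case-insensitive)."""
    return CODON_TABLE[codon.upper()]


def merge_alt_codons(ref_codon, alt_codons):
    """Hypothetical helper: apply every single-SNV change to the REF codon.

    Each ALT codon differs from REF at one position; the merged codon
    carries all of those changes at once.
    """
    ref = ref_codon.upper()
    merged = list(ref)
    for alt in alt_codons:
        for i, (r, a) in enumerate(zip(ref, alt.upper())):
            if r != a:
                merged[i] = a
    return "".join(merged)


# First pcSNV pair from MACARON_output.txt: REF codon Tct, ALT codons Act and tTt.
merged = merge_alt_codons("Tct", ["Act", "tTt"])
print(merged, translate(merged))  # ATT I
```

Applied to the second pair (REF codon Tgg, ALT codons Ggg and tTg), the same sketch yields GTG, which codes for V, matching the AA-Change-2VAR column.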
Continuing with the first pcSNV pair, we now assess:
- how many samples carry this pcSNV (chr22:21349676-21349677), and
- what the allelic (genotype) status of this pcSNV is in each sample.
sample01 and sample03 are both homozygous for the reference allele at this pcSNV, whereas sample02 is heterozygous. The user should therefore focus on sample02 when validating the pcSNV.
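As a rough illustration of this check, the numeric genotype columns of the first pair can be classified per sample. The functions `zygosity` and `carriers` are hypothetical names used for this sketch only and are not part of MACARON; the genotype values are copied from the output shown above.

```python
def zygosity(gt):
    """Classify a VCF-style numeric genotype such as '0/0', '0/1' or '1/1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_alt"
    return "het"


# Numeric genotypes of the first pcSNV pair, copied from MACARON_output.txt.
pcsnv = {
    "chr22:21349676": {"sample01": "0/0", "sample02": "0/1", "sample03": "0/0"},
    "chr22:21349677": {"sample01": "0/0", "sample02": "0/1", "sample03": "0/0"},
}


def carriers(genotypes_by_snv):
    """Return the samples that carry an ALT allele at every SNV of the pcSNV."""
    samples = list(next(iter(genotypes_by_snv.values())))
    return [
        s for s in samples
        if all(zygosity(gts[s]) in ("het", "hom_alt")
               for gts in genotypes_by_snv.values())
    ]


print(carriers(pcsnv))  # ['sample02']
```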
Let's try to understand the output of MACARON_validate.sh:
sub1 chr22:21349676-21349677 sample02
1 AA
1 T
11 AT
14 TC
The first line, sub1 chr22:21349676-21349677 sample02, gives the pcSNV number (sub1), the chromosome coordinates, and the sample name. Do not be confused by sub: it simply stands for the substitution of the REF codon by the ALT codon, here for the first pcSNV pair observed in MACARON_output.txt.
As we saw in Interpretation of MACARON Output, the ALT bases expected at chr22:21349676-21349677 are AT. In total there are 27 reads covering this pcSNV: 1 with AA, 1 with T, 11 with AT and 14 with TC (the reference bases). Since sample02 is heterozygous for this pcSNV, it is consistent that 11 of the 27 reads carry both ALT bases (AT) on the same read.
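The read tally can also be done programmatically. The sketch below parses one block of MACARON_validate.sh output, assuming the layout shown above (a header line followed by "count bases" lines); the function name `parse_validate_block` is hypothetical.

```python
validate_output = """\
sub1 chr22:21349676-21349677 sample02
1 AA
1 T
11 AT
14 TC
"""


def parse_validate_block(text):
    """Parse one block: header line, then one 'count bases' line per haplotype."""
    lines = text.strip().splitlines()
    sub_id, region, sample = lines[0].split()
    counts = {}
    for line in lines[1:]:
        n, bases = line.split()
        counts[bases] = int(n)
    return sub_id, region, sample, counts


sub_id, region, sample, counts = parse_validate_block(validate_output)
total = sum(counts.values())
alt_reads = counts.get("AT", 0)  # AT = both ALT bases on the same read
print(f"{sample} {region}: {alt_reads}/{total} reads carry AT "
      f"({alt_reads / total:.0%})")
```

For a heterozygous sample, a share of ALT-ALT reads in this range supports both variants lying on the same haplotype.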
- Python 2 and 3 compatibility (Done),
- Deprecated GATK3 option --allowMissingData is removed,
- Handles GATK versions >= 4.0 via new option: --gatk4,
- Renamed MACARON to respect UNIX standards (Done).
- Changed shell output aesthetics ;)
- Check that SNPEFF_HG is not empty,
- Check that files set in GATK, HG_REF and SNPEFF exist,
- Added verbosity option: -v , enables output from GATK and SNPEff.
- All temporary files are now stored in "macaron_tmp" directory, which is removed at the end of the process,
- Option -d has been removed,
- Added an eco-friendly mode with option -c or --eco-friendly, which disables the animation but saves a thread,
- Added the possibility to set the GATK path, SnpEff path, SnpEff human genome annotation database version, and human reference genome path as optional arguments; the user can still set default values directly in the script.
Update to MACARON, now version 1.0
Some news:
- Now compatible with the latest version of GATK4 (4.1.7).
- GATK3 is no longer supported (the previous version of MACARON will still work great with GATK3).
- GATK4 is now used as default, and MACARON relies on the gatk wrapper instead of the .jar file.
- For versions of GATK4 before 4.1.4.1, the option of IndexFeatureFile is different. The option --gatk4_previous must be added when using these older versions of GATK4.
- MACARON can also handle the snpEff wrapper (available via bioconda). The extension of the file (either '.jar' or no extension) allows MACARON to determine whether the wrapper or the .jar file is used.
- If gatk and snpEff are accessible via $PATH, there is no need for the user to provide the paths of these programs to MACARON.
- Animation is now an option: use the -c option to visualize the wheel turning (great to pass the time!).
- A bug with grep has been fixed to allow the use of MACARON on macOS!
- Global refactoring (classes dropped, and some more).
- Option --keep_temp was added to not delete temporary files (useful for debugging).
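To make the wrapper versus .jar distinction concrete, here is a minimal sketch of how such a decision could be made from the file extension and $PATH. It is an assumption about the general approach, not MACARON's actual implementation; `resolve_tool` and `build_command` are hypothetical helpers.

```python
import shutil


def resolve_tool(name, user_path=None):
    """Use the user-supplied path if given, otherwise look the tool up in $PATH."""
    return user_path if user_path else shutil.which(name)


def build_command(tool_path):
    """Choose the invocation style from the file extension.

    A path ending in '.jar' is run through java; anything else is treated
    as an executable wrapper script.
    """
    if tool_path.lower().endswith(".jar"):
        return ["java", "-jar", tool_path]
    return [tool_path]


print(build_command("/opt/snpEff/snpEff.jar"))  # ['java', '-jar', '/opt/snpEff/snpEff.jar']
print(build_command("snpEff"))                  # ['snpEff']
```

The paths used here are placeholders chosen for the example.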
To do:
GATK4 is not completely silent despite setting QUIET and verbosity parameters. Need to find a way to hide it in the next version...