This repository contains data files documenting the methodology and data analysis for a rare cis-AB blood group identified in the Indian subcontinent.
The sample used in the study was subjected to whole exome sequencing. Computational analysis of the sequencing data involved aligning the raw sequencing output files to the GRCh37/hg19 human reference genome, followed by systematic compilation of the genetic variants in a Variant Call Format (VCF) file. The variants were annotated for their functional consequences using ANNOVAR.
For ease of analysis, the alignment and variant call files were subset to chromosome 9 only. The subset files were then used to analyse the phasing of genetic variants with the PLINK and SHAPEIT tools.
1. FastQC
2. Illumina DRAGEN Bio-IT platform
3. ANNOVAR
4. Samtools
5. AWK scripting
6. PLINK
7. SHAPEIT
Raw sequencing output files (paired-end .fastq files). Samples underwent Whole Exome Sequencing (WES). Read files were checked for quality using FastQC, then aligned to the reference genome (GRCh37/hg19) and used for variant calling on the Illumina DRAGEN v3.4 Bio-IT platform.
$ fastqc Sample_R1.fastq.gz
$ fastqc Sample_R2.fastq.gz
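FastQC records a PASS/WARN/FAIL status per module in a tab-delimited `summary.txt` inside each `*_fastqc.zip` archive. The following is a minimal sketch of flagging modules that did not pass. The toy `summary.txt` contents and the temporary directory are invented for illustration and are not results from this study.

```shell
# Toy stand-in for the summary.txt that FastQC places inside Sample_R1_fastqc.zip
# (status <TAB> module <TAB> file name). The statuses below are illustrative only.
tmpdir=$(mktemp -d)
printf 'PASS\tBasic Statistics\tSample_R1.fastq.gz\n' > "$tmpdir/summary.txt"
printf 'WARN\tPer base sequence content\tSample_R1.fastq.gz\n' >> "$tmpdir/summary.txt"
printf 'FAIL\tAdapter Content\tSample_R1.fastq.gz\n' >> "$tmpdir/summary.txt"

# List every module that did not pass, and count outright failures.
flagged=$(awk -F'\t' '$1 != "PASS" {print $1 ": " $2}' "$tmpdir/summary.txt")
n_fail=$(awk -F'\t' '$1 == "FAIL" {n++} END {print n+0}' "$tmpdir/summary.txt")
printf '%s\n' "$flagged"
echo "failed modules: $n_fail"
```

With real output, `unzip -p Sample_R1_fastqc.zip Sample_R1_fastqc/summary.txt` should stream the same table to the `awk` filters without unpacking the archive.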
$ dragen -r {hg19 reference genome} -1 {Sample_R1.fastq.gz} -2 {Sample_R2.fastq.gz} --enable-variant-caller true --output-file-prefix {outfilename} --output-directory {outfilefolder}
The compiled list of genetic variants was systematically annotated for functional consequences, drawing on a range of computational tools, using ANNOVAR.
$ table_annovar.pl {Sample.avinput} Annovar/humandb --buildver hg19 --outfile {outfilename-prefix} --protocol refGene,cytoBand,genomicSuperDups,dbnsfp33a,avsnp147,exac03,1000g2015aug_all --operation g,r,r,f,f,f,f --nastring NA --otherinfo
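`table_annovar.pl` writes a tab-delimited table named `{outfilename-prefix}.hg19_multianno.txt` with a header row. The following is a minimal sketch of pulling out variants of interest from that table, assuming the gene of interest is ABO (the chromosome 9 gene that determines the ABO blood group). The rows, positions, and reduced column set are invented for illustration.

```shell
# Toy excerpt of a multianno table; real output has many more columns.
anno=$(mktemp)
printf 'Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\tExonicFunc.refGene\n' > "$anno"
printf 'chr9\t1000\t1000\tC\tT\texonic\tABO\tnonsynonymous SNV\n' >> "$anno"
printf 'chr9\t2000\t2000\tG\tA\tintronic\tABO\tNA\n' >> "$anno"
printf 'chr1\t3000\t3000\tA\tG\texonic\tGENE1\tsynonymous SNV\n' >> "$anno"

# Look columns up by header name rather than by position, then keep exonic ABO rows.
abo_exonic=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  $(col["Gene.refGene"]) == "ABO" && $(col["Func.refGene"]) == "exonic" {
    line = $(col["Chr"]) ":" $(col["Start"]) " " $(col["Ref"]) ">" $(col["Alt"]) " " $(col["ExonicFunc.refGene"])
    print line
  }' "$anno")
printf '%s\n' "$abo_exonic"
```

Looking columns up by name keeps the filter valid if the `--protocol` list, and therefore the column order, changes.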
A smaller subset of the alignment (.bam) file comprising chromosome 9 was created using Samtools; the region query requires an indexed BAM file. Similarly, variants on chromosome 9 were subset from the output VCF using bespoke AWK commands, keeping the VCF header lines (those starting with "#") so that the subset remains a valid VCF for downstream tools.
$ samtools view -b {Sample.bam} chr9 > Sample_chr9.bam
$ awk -F'\t' '/^#/ || $1 == "chr9"' {Sample.vcf} > Sample_chr9.vcf
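A filter that matches only on the first column drops the `##` metadata and `#CHROM` header lines, which PLINK needs in order to read the sample IDs. The following is a minimal sketch of a header-preserving chromosome 9 subset, followed by a sanity check of the result. The toy VCF records are invented for illustration.

```shell
# Toy VCF with two header lines and records on two chromosomes (values invented).
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n' >> "$vcf"
printf 'chr8\t500\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n' >> "$vcf"
printf 'chr9\t1000\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\n' >> "$vcf"
printf 'chr9\t2000\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\n' >> "$vcf"

# Keep header lines (starting with "#") plus chr9 records.
sub=$(mktemp)
awk -F'\t' '/^#/ || $1 == "chr9"' "$vcf" > "$sub"

# Sanity check: the header survived and no other chromosome leaked through.
n_header=$(grep -c '^#' "$sub")
n_records=$(grep -vc '^#' "$sub")
n_other=$(awk -F'\t' '!/^#/ && $1 != "chr9" {n++} END {print n+0}' "$sub")
echo "header lines: $n_header, chr9 records: $n_records, other records: $n_other"
```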
The subset VCF file was converted to .ped and .map file formats using PLINK. The resulting PLINK files were then used for variant phasing with SHAPEIT, using the publicly available reference panel of haplotypes provided by the 1000 Genomes Project. The files can be downloaded at https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference
$ plink1.9 --vcf Sample_chr9.vcf --make-bed --out Sample_chr9
$ plink1.9 --vcf Sample_chr9.vcf --recode --out Sample_chr9
$ shapeit --input-ped Sample_chr9.ped Sample_chr9.map --input-map {1000GP_Phase3/genetic_map_chr9_combined_b37.txt} --input-ref {1000GP_Phase3/1000GP_Phase3_chr9.hap.gz} {1000GP_Phase3/1000GP_Phase3_chr9.legend.gz} {1000GP_Phase3/1000GP_Phase3.sample} --output-max Sample_chr9_phased.haps Sample_chr9_phased.sample
$ shapeit --input-ped Sample_chr9.ped Sample_chr9.map --input-map {1000GP_Phase3/genetic_map_chr9_combined_b37.txt} --input-ref {1000GP_Phase3/1000GP_Phase3_chr9.hap.gz} {1000GP_Phase3/1000GP_Phase3_chr9.legend.gz} {1000GP_Phase3/1000GP_Phase3.sample} --output-max Sample_chr9_phased.haps Sample_chr9_phased.sample --exclude-snp {output.snp.strand.exclude}
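The phased `.haps` file is space-delimited: five leading columns (chromosome, variant ID, position, first allele, second allele) followed by two 0/1 columns per individual, one per haplotype, where 0 denotes the first allele and 1 the second. The following is a minimal sketch of spelling out the two haplotypes of a single sample to see which alleles sit in cis. The variant IDs, positions, and alleles are invented for illustration.

```shell
# Toy single-sample .haps file (values invented, not from this study).
haps=$(mktemp)
printf '9 rs_a 1000 C T 0 1\n' > "$haps"
printf '9 rs_b 2000 G A 0 1\n' >> "$haps"
printf '9 rs_c 3000 A G 1 0\n' >> "$haps"

# Translate the 0/1 codes in columns 6 and 7 back to alleles, one string per haplotype.
phased=$(awk '{
  a[0] = $4; a[1] = $5          # 0 = first allele, 1 = second allele
  h1 = h1 a[$6]; h2 = h2 a[$7]  # append this site to each haplotype
} END { print "hap1=" h1; print "hap2=" h2 }' "$haps")
printf '%s\n' "$phased"
```

In this toy case the second alleles at `rs_a` and `rs_b` both fall on `hap2`, so they are in cis, whereas the second allele at `rs_c` lies on the other haplotype.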