
Data files documenting the data analysis of a rare cis-AB blood group from the Indian subcontinent


mercywilliams160896/Indian_CisAB_Casereport


Indian_CisAB_Casereport

This repository contains data files documenting the methodology and data analysis of a rare cis-AB blood group identified in the Indian subcontinent.

The sample used in the study underwent whole exome sequencing. The raw sequencing reads were aligned to the GRCh37/hg19 human reference genome, and the resulting genetic variants were called and compiled into a Variant Call Format (VCF) file. The variants were then annotated for their functional consequences using ANNOVAR.

For ease of analysis, the alignment (BAM) and variant call (VCF) files were subset to chromosome 9, which carries the ABO gene. The subset files were then prepared with PLINK and used to phase the genetic variants with SHAPEIT.

Tools/Packages used for analysis


1. FastQC
2. Illumina DRAGEN Bio-IT platform
3. ANNOVAR
4. Samtools
5. Awk scripting
6. PLINK
7. SHAPEIT

Steps followed in the analysis of data

Input files

Raw paired-end sequencing output files (.fastq). The sample underwent whole exome sequencing (WES).

Quality Control, Alignment and Variant Calling

Read quality was checked with FastQC. The reads were then aligned to the reference genome (GRCh37/hg19), and variants were called, using the Illumina DRAGEN v3.4 Bio-IT platform.
Commands used

$ fastqc Sample_R1.fastq.gz
$ fastqc Sample_R2.fastq.gz
$ dragen -r {hg19 reference hash table directory} -1 {Sample_R1.fastq.gz} -2 {Sample_R2.fastq.gz} --enable-variant-caller true --output-file-prefix {outfilename} --output-directory {outfilefolder}
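Paired-end aligners expect the R1 and R2 files to hold the same reads in the same order, so a quick read-count comparison before alignment can catch a truncated or mismatched file pair. The sketch below is not part of the original workflow; the function names are hypothetical, and it only assumes gzip-compressed FASTQ files with four lines per record.

```bash
# Count reads in a gzip-compressed FASTQ file (4 lines per record).
# Returns non-zero if the line count is not a multiple of 4 (e.g. a truncated file).
count_fastq_reads() {
  gzip -dc "$1" | awk '
    END {
      if (NR % 4 != 0) {
        print "incomplete FASTQ record: " NR " lines" > "/dev/stderr"
        exit 1
      }
      printf "%d\n", NR / 4
    }'
}

# Check that R1 and R2 of a paired-end run hold the same number of reads;
# prints the shared count on success, fails with a message otherwise.
check_fastq_pair() {
  local n1 n2
  n1=$(count_fastq_reads "$1") || return 1
  n2=$(count_fastq_reads "$2") || return 1
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
  echo "$n1"
}
```

For example, check_fastq_pair Sample_R1.fastq.gz Sample_R2.fastq.gz prints the number of read pairs, or exits non-zero if the two files disagree.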

Variant annotation and filtering

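The called variants were annotated for their functional consequences with ANNOVAR, drawing on RefSeq gene models (refGene), cytogenetic bands (cytoBand), segmental duplications (genomicSuperDups), dbNSFP v3.3a functional predictions (dbnsfp33a), dbSNP 147 identifiers (avsnp147), ExAC allele frequencies (exac03) and 1000 Genomes August 2015 allele frequencies (1000g2015aug_all). table_annovar.pl takes ANNOVAR's own input format (.avinput), which can be generated from a VCF with ANNOVAR's convert2annovar.pl.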
Commands used

$ table_annovar.pl {Sample.avinput} Annovar/humandb --buildver hg19 --outfile {outfilename-prefix} --protocol refGene,cytoBand,genomicSuperDups,dbnsfp33a,avsnp147,exac03,1000g2015aug_all --operation g,r,r,f,f,f,f --nastring NA --otherinfo
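Since the cis-AB phenotype is determined at the ABO gene on chromosome 9, a natural follow-up is to pull the ABO rows out of the annotation table. With --buildver hg19, table_annovar.pl writes its results to {outfilename-prefix}.hg19_multianno.txt. The sketch below assumes that tab-delimited file has a Gene.refGene header column and looks the column up by name; the function name is hypothetical and not part of ANNOVAR.

```bash
# Print the header and every row of an ANNOVAR *_multianno.txt table whose
# Gene.refGene cell names the requested gene. The column is found by its
# header name; a cell listing several genes is split on ',' or ';'.
extract_gene_rows() {
  local table=$1 gene=$2
  awk -F'\t' -v gene="$gene" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "Gene.refGene") col = i
      if (!col) { print "no Gene.refGene column in header" > "/dev/stderr"; exit 1 }
      print
      next
    }
    {
      n = split($col, genes, "[,;]")
      for (j = 1; j <= n; j++) if (genes[j] == gene) { print; break }
    }
  ' "$table"
}
```

For example, extract_gene_rows {outfilename-prefix}.hg19_multianno.txt ABO keeps the header and the rows annotated to ABO, matching gene symbols exactly so that similarly named genes are not picked up.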

BAM and VCF subsetting

A chromosome 9 subset of the alignment (.bam) file was created with Samtools; region queries with samtools view require a coordinate-sorted, indexed BAM. Similarly, chromosome 9 variants were extracted from the output VCF with a custom AWK command. The VCF header lines (starting with #) are kept so that PLINK can still read the subset file.
Commands used

$ samtools index {Sample.bam}
$ samtools view -b {Sample.bam} chr9 > Sample_chr9.bam
$ awk -F'\t' '/^#/ || $1 == "chr9"' {Sample.vcf} > Sample_chr9.vcf
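Only heterozygous sites carry phase information that SHAPEIT has to resolve, so it can help to count genotype classes in the chromosome 9 subset before phasing. A minimal sketch, assuming a single-sample, diploid VCF with a GT entry in the FORMAT column; the function name is hypothetical.

```bash
# Count genotype classes for the first sample of a VCF (column 10).
# Header lines are skipped; phased (|) and unphased (/) calls are treated alike.
# Assumes diploid calls; anything that does not split into two alleles is "other".
summarise_genotypes() {
  awk -F'\t' '
    /^#/ { next }
    {
      gt = ""
      nf = split($9, fmt, ":")
      nv = split($10, val, ":")
      for (i = 1; i <= nf && i <= nv; i++) if (fmt[i] == "GT") gt = val[i]
      gsub(/[|]/, "/", gt)
      m = split(gt, a, "/")
      if (gt == "" || a[1] == ".")   cls = "missing"
      else if (m != 2)               cls = "other"
      else if (a[2] == ".")          cls = "missing"
      else if (a[1] == a[2])         cls = (a[1] == "0") ? "hom_ref" : "hom_alt"
      else                           cls = "het"
      count[cls]++
    }
    END { for (c in count) print c "\t" count[c] }
  ' "$1" | sort
}
```

Running summarise_genotypes Sample_chr9.vcf prints one tab-separated line per class (het, hom_alt, hom_ref, missing, other) with its count; the het count is the number of sites whose phase SHAPEIT must infer.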

File preprocessing and variant phasing

The chromosome 9 VCF was converted to PLINK binary (.bed/.bim/.fam) and text (.ped/.map) formats with PLINK. The .ped/.map files were then phased with SHAPEIT, using the publicly available 1000 Genomes Project Phase 3 haplotype reference panel, which can be downloaded at https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference. The second SHAPEIT command repeats the phasing while excluding the SNPs listed in the .snp.strand.exclude file that SHAPEIT writes when it finds strand or allele mismatches between the input data and the reference panel.
Commands used

$ plink1.9 --vcf Sample_chr9.vcf --make-bed --out Sample_chr9
$ plink1.9 --vcf Sample_chr9.vcf --recode --out Sample_chr9
$ shapeit --input-ped Sample_chr9.ped Sample_chr9.map --input-map {1000GP_Phase3/genetic_map_chr9_combined_b37.txt} --input-ref {1000GP_Phase3/1000GP_Phase3_chr9.hap.gz} {1000GP_Phase3/1000GP_Phase3_chr9.legend.gz} {1000GP_Phase3/1000GP_Phase3.sample} --output-max Sample_chr9_phased.haps Sample_chr9_phased.sample
$ shapeit --input-ped Sample_chr9.ped Sample_chr9.map --input-map {1000GP_Phase3/genetic_map_chr9_combined_b37.txt} --input-ref {1000GP_Phase3/1000GP_Phase3_chr9.hap.gz} {1000GP_Phase3/1000GP_Phase3_chr9.legend.gz} {1000GP_Phase3/1000GP_Phase3.sample} --output-max Sample_chr9_phased.haps Sample_chr9_phased.sample --exclude-snp {output.snp.strand.exclude}
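After phasing, the alleles carried on each haplotype can be read from Sample_chr9_phased.haps to see whether variants of interest lie on the same haplotype (in cis) or on opposite haplotypes (in trans). The sketch assumes the usual SHAPEIT2 .haps layout: five descriptive columns (chromosome, SNP ID, position, allele coded 0, allele coded 1) followed by two haplotype columns per sample, with a single sample in the file. The function name is hypothetical, and the region bounds are left to the user.

```bash
# Print, for each heterozygous site between START and END (bp, inclusive),
# the SNP ID, position, and the allele on haplotype 1 and haplotype 2 of the
# first sample. Homozygous sites are skipped because their phase is trivial.
haplotypes_in_region() {
  local haps=$1 start=$2 end=$3
  awk -v s="$start" -v e="$end" '
    $3 + 0 >= s + 0 && $3 + 0 <= e + 0 && $6 != $7 {
      h1 = ($6 == "0") ? $4 : $5   # code 0 -> allele in column 4, code 1 -> column 5
      h2 = ($7 == "0") ? $4 : $5
      print $2 "\t" $3 "\t" h1 "\t" h2
    }
  ' "$haps"
}
```

For example, haplotypes_in_region Sample_chr9_phased.haps START END, with START and END set to the GRCh37 coordinates of the ABO locus taken from your own annotation, lists each heterozygous site with its haplotype 1 and haplotype 2 alleles, so variants sharing a haplotype can be read off directly.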
