This is the data preparation process for the whole workflow. It is computationally intensive, so we subset the data for demonstration purposes. The following commands are for those who want to practice the whole, real-life WGS workflow.
TL;DR: run the following command
curl -o download.sh https://raw.githubusercontent.com/JiehoonKwak/MSE801_JHLEE/main/download.sh && chmod +x download.sh && ./download.sh
Raw sequence reads & reference genome download
Download sequence reads using SRA explorer
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/000/SRR7138440/SRR7138440_1.fastq.gz -o SRR7138440_WES_of_homo_sapiens_blood_of_brain_tumor_patient_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/000/SRR7138440/SRR7138440_2.fastq.gz -o SRR7138440_WES_of_homo_sapiens_blood_of_brain_tumor_patient_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/008/SRR7138438/SRR7138438_1.fastq.gz -o SRR7138438_WES_of_homo_sapiens_subventricular_zone_of_brain_tumor_patient_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/008/SRR7138438/SRR7138438_2.fastq.gz -o SRR7138438_WES_of_homo_sapiens_subventricular_zone_of_brain_tumor_patient_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/005/SRR7138435/SRR7138435_1.fastq.gz -o SRR7138435_WES_of_homo_sapiens_tumor_of_brain_tumor_patient_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR713/005/SRR7138435/SRR7138435_2.fastq.gz -o SRR7138435_WES_of_homo_sapiens_tumor_of_brain_tumor_patient_2.fastq.gz
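Since the full read files are large, one way to subset them for a quick demonstration is to keep only the first N reads of each file. The sketch below is not part of the original download script: `subset_fastq` is a hypothetical helper, the output file names and the read count are arbitrary, and it assumes standard 4-line FASTQ records with both mates stored in the same read order, so that taking the same N from `_1` and `_2` keeps the pairs in sync.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ file.
# Assumes the standard layout of 4 lines per read.
subset_fastq() {
    in_fq="$1"
    out_fq="$2"
    n_reads="$3"
    n_lines=$((n_reads * 4))
    gzip -dc "$in_fq" | head -n "$n_lines" | gzip -c > "$out_fq"
}

# Example usage (output names and the 100000-read cutoff are arbitrary choices):
# subset_fastq SRR7138435_WES_of_homo_sapiens_tumor_of_brain_tumor_patient_1.fastq.gz tumor_subset_1.fastq.gz 100000
# subset_fastq SRR7138435_WES_of_homo_sapiens_tumor_of_brain_tumor_patient_2.fastq.gz tumor_subset_2.fastq.gz 100000
```

Taking the head of the file is fast but not a random sample; it is only meant to produce a small input for trying out the later steps.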
Download reference genome
curl -L https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz -o hg38.fa.gz
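Large downloads can be interrupted and leave truncated files behind. As a minimal sketch (not part of the original script), the gzipped reads and reference can be checked with `gzip -t`, which tests the integrity of a compressed file without writing the decompressed data. `check_gz` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether each gzipped file is present and intact.
# Prints one line per file and returns non-zero if any file is missing or corrupt.
check_gz() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "MISSING $f"
            rc=1
        elif gzip -t "$f" 2>/dev/null; then
            echo "OK $f"
        else
            echo "CORRUPT $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage after the downloads above have finished:
# check_gz SRR71384*.fastq.gz hg38.fa.gz
```

Any file reported as MISSING or CORRUPT can simply be downloaded again with the corresponding curl command.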
Download known-sites resources for Base Quality Score Recalibration (BQSR)
curl -L https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf -o Homo_sapiens_assembly38.dbsnp138.vcf
curl -L https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx -o Homo_sapiens_assembly38.dbsnp138.vcf.idx
Download Mutect2 resources
curl -L https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/somatic-hg38/af-only-gnomad.hg38.vcf.gz -o af-only-gnomad.hg38.vcf.gz
curl -L https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/somatic-hg38/af-only-gnomad.hg38.vcf.gz.tbi -o af-only-gnomad.hg38.vcf.gz.tbi
curl -L https://storage.googleapis.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz -o 1000g_pon.hg38.vcf.gz
curl -L https://storage.googleapis.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz.tbi -o 1000g_pon.hg38.vcf.gz.tbi
curl -L https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/exome_calling_regions.v1.1.interval_list -o exome_calling_regions.v1.1.interval_list
Download Funcotator data sources
gatk FuncotatorDataSourceDownloader --somatic --validate-integrity --extract-after-download --hg38
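The VCF resources above are each downloaded together with a companion index (`.idx` for the plain `.vcf`, `.tbi` for the bgzipped `.vcf.gz` files). A quick way to confirm that no index was missed is sketched below; `missing_indexes` is a hypothetical helper, not something provided by GATK or by the original script.

```shell
# Hypothetical helper: print the expected index path for every VCF whose
# companion index is missing or empty. Non-VCF arguments are ignored.
missing_indexes() {
    for f in "$@"; do
        case "$f" in
            *.vcf.gz) idx="$f.tbi" ;;   # bgzipped VCF: tabix index
            *.vcf)    idx="$f.idx" ;;   # plain VCF: .idx index
            *)        continue ;;
        esac
        [ -s "$idx" ] || echo "$idx"
    done
}

# Example usage with the files downloaded above (no output means all indexes are present):
# missing_indexes Homo_sapiens_assembly38.dbsnp138.vcf af-only-gnomad.hg38.vcf.gz 1000g_pon.hg38.vcf.gz
```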