This is for demonstration purposes only, for 2024 MSE801 at KAIST
- Reference : GATK4 Best Practices, Biostar Handbook
For demonstration purposes, we will use a subset of the data from this paper : Human glioblastoma arises from subventricular zone cells with low-level driver mutations
If you want to follow the whole workflow with a real-world dataset, refer to this data and the GATK4 Best Practices
- On your laptop, install IGV and download the processed data
- IGV : for visualization of BAM/VCF files
- Processed data : I have already run the code and uploaded the processed data, which you can download from the link. We will work with a subset of the data, because processing the original dataset is computationally expensive. (cf. the links will expire after class)
- If the above link does not work, please use this Google Drive link
- Install the packages listed in requirements.txt into a new conda environment of your own
curl -L -O https://raw.githubusercontent.com/JiehoonKwak/MSE801_JHLEE/main/requirements.txt
conda create -n YOUR_ENV -y
conda activate YOUR_ENV
conda install --file requirements.txt -c conda-forge -c bioconda -y
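Once the install finishes, it can help to confirm that the command-line tools are visible in the activated environment before starting Part1. The following is a minimal sketch, not part of the course material: `check_tools` is a hypothetical helper, and the tool names in the usage note are assumptions, since the contents of requirements.txt are not listed here.

```bash
# check_tools: hypothetical helper that reports, for each command name given,
# whether it can be found on PATH in the current (activated) environment.
# Returns 0 if every tool was found, 1 if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools samtools bwa gatk` (assumed names; replace them with the tools actually listed in requirements.txt). If anything is reported as MISSING, check that `conda activate YOUR_ENV` was run in the current shell.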
- Download the data from the link provided here
bash <(curl -s https://raw.githubusercontent.com/JiehoonKwak/MSE801_JHLEE/main/download_demo.sh)
- Then, create symbolic links to the shared files on the server
# first go to your working directory
mkdir -p ref && cd ref && ln -s /home/users/SHARE/jhlee/ref/* .
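Because `ln -s` creates a link even when its target does not exist, a wrong or unreachable shared path leaves dangling links behind without reporting an error (in bash, an unmatched `*` is passed through literally, so you can end up with a single broken link named `*`). A minimal sketch for checking this, assuming you are back in your working directory; `find_broken_links` is a hypothetical helper, not part of the course scripts.

```bash
# find_broken_links: hypothetical helper that prints every symbolic link
# directly inside the given directory (default: current directory)
# whose target does not exist.
find_broken_links() {
    local dir="${1:-.}"
    local entry
    for entry in "$dir"/*; do
        # -L is true for a symlink; -e follows the link and fails if the target is missing
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            echo "$entry"
        fi
    done
}
```

Running `find_broken_links ref` should print nothing if all links resolve. Any path it prints is a link whose target is missing, which usually means the shared directory is not accessible from the machine you are on.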
Ready? Let's start!
- Part1 : Produce an analysis-ready BAM file
- Part2 : Variant Calling & Downstream Analysis
- Part3 : Amplicon sequencing using dada2 or CRISPResso