Skip to content

Latest commit

 

History

History
72 lines (61 loc) · 4.03 KB

README.md

File metadata and controls

72 lines (61 loc) · 4.03 KB

Downloading and mapping reads

The raw reads from Barson et al 2015 were downloaded from the ENA https://www.ebi.ac.uk/ena/browser/home. Download links can be found in PRJEB10744.txt.

mkdir /scratch/project_2002047/barson_mapping_v2
mkdir /scratch/project_2002047/barson_mapping_v2/reads

python download_reads.py

Reads were cleaned/trimmed using trim galore with cutadapt and then mapped to the salmon reference genome with BWA. Duplicates were marked and read group info added with picard. The above was performed per fastq pair, the resulting bam files were merged to produce individual bams.

mkdir /scratch/project_2002047/barson_mapping_v2/clean_reads
python clean_reads.py -fastq_dir /scratch/project_2002047/barson_mapping_v2/reads/ -out_dir /scratch/project_2002047/barson_mapping_v2/clean_reads/

ls /scratch/project_2002047/barson_mapping_v2/clean_reads/ERR1013*1.fq.gz | python get_read_group_info.py
mkdir /scratch/project_2002047/barson_mapping_v2/bams
python map_reads.py -ref /scratch/project_2002047/sal_reseq/bams/Reference_genome_with_SDY/GCF_000233375.1_ICSASG_v2_genomic_with_SDY.fna -bam_dir /scratch/project_2002047/barson_mapping_v2/bams/ -read_info_file fastq_and_rg.txt

# some failed
ls /scratch/project_2002047/barson_mapping_v2/clean_reads/*1.fq.gz | cut -d '/' -f 6 | cut -d '_' -f 1 > fastqs_bams.txt
ls /scratch/project_2002047/barson_mapping_v2/bams/*.dedup.bam | cut -d '/' -f 6 | cut -d '.' -f 1 >> fastqs_bams.txt
sort fastqs_bams.txt | uniq -c | grep -w '1' | tr -s ' ' | cut -d ' ' -f 3 | while read i; do sbatch --account=project_2002047 /scratch/project_2002047/barson_mapping_v2/bams/${i}_mapping_job.sh ; done

mkdir /scratch/project_2002047/barson_mapping_v2/merged_bams/
python merge_bams.py -bam_in /scratch/project_2002047/barson_mapping_v2/bams/ -bam_out /scratch/project_2002047/barson_mapping_v2/merged_bams/ -info_file fastq_and_rg.txt -ref /scratch/project_2002047/sal_reseq/bams/Reference_genome_with_SDY/GCF_000233375.1_ICSASG_v2_genomic_with_SDY.fna

# rerunning failed merges
ls /scratch/project_2002047/barson_mapping_v2/merged_bams/*bam | cut -d '/' -f 6 | cut -d '.' -f 1 > bams_done.txt
ls /scratch/project_2002047/barson_mapping_v2/merged_bams/*sh | cut -d '/' -f 6 | cut -d '_' -f 1,2,3 >> bams_done.txt
sort bams_done.txt | uniq -c | grep -w '1' | tr -s ' ' | cut -d ' ' -f 3 | while read i; do sbatch --account=project_2002047 /scratch/project_2002047/barson_mapping_v2/merged_bams/${i}_merge_job.sh ; done

ls /scratch/project_2002047/barson_mapping_v2/merged_bams/*.txt | python coverage_summary.py

Read coverage in resulting mapped data:

run accession sample mean coverage median coverage
ERR1013407 Alta_12_0001 7.520952 8
ERR1013495 Alta_12_0124 3.349194 3
ERR1013583 Alta_12_0228 7.705269 8
ERR1013463 Arga_12_0075 7.728542 8
ERR1013471 Arga_12_0082 9.305405 10
ERR1013479 Arga_12_0089 13.245793 14
ERR1013647 Jols_13_0001 10.631584 11
ERR1013639 Jols_13_0009 8.723167 9
ERR1013655 Jols_13_0027 7.64258 8
ERR1013447 Nams_12_0024 10.828244 11
ERR1013455 Nams_12_0071 10.654392 11
ERR1013439 Nams_12_0201 6.65685 7
ERR1013423 Naus_12_0014 9.373985 10
ERR1013415 Naus_12_0037 7.585018 8
ERR1013431 Naus_12_0059 12.050208 13
ERR1013615 Repp_12_0007 8.360628 9
ERR1013623 Repp_12_0023 6.412691 6
ERR1013631 Repp_12_0034 12.499478 13
ERR1013487 Uts_11_15 7.155256 7
ERR1013511 Uts_11_17 19.169955 21
ERR1013519 Uts_11_24 9.167601 10
ERR1013527 Uts_11_26 5.419665 5
ERR1013535 Uts_11_27 6.556109 7
ERR1013543 Uts_11_28 6.834338 7
ERR1013551 Uts_11_29 4.531277 4
ERR1013559 Uts_11_30 6.543761 7
ERR1013567 Uts_11_31 6.588743 7
ERR1013575 Uts_11_39 2.994131 3
ERR1013591 Uts_11_46 7.094451 7
ERR1013599 Uts_11_52 3.360987 3
ERR1013607 Uts_11_53 8.927331 9