Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export variants to VCF format #17

Open
gkarthik opened this issue Mar 23, 2020 · 8 comments
Open

Export variants to VCF format #17

gkarthik opened this issue Mar 23, 2020 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@gkarthik
Copy link
Member

ivar variants outputs results only in custom .tsv format so unusable with other tools.

Allow option to export variants in VCF v4.3 format.

@gkarthik gkarthik self-assigned this Mar 23, 2020
@gkarthik gkarthik added the enhancement New feature or request label Mar 23, 2020
@gkarthik gkarthik added this to the v1.1 milestone Mar 23, 2020
@gkarthik gkarthik modified the milestones: v1.2, v1.3 Apr 3, 2020
@drpatelh
Copy link

@svarona wrote a simple Python script to do this for the nf-core/viralrecon pipeline if its useful @gkarthik.

@paulstretenowich
Copy link

Hi, for deletions ivar is outputting, for a reference having ACGT and a read having a deletion of CGT:

REF ALT
A -CGT

And general VCF format is:

REF ALT
ACGT A

So in your python script you should do something like:

ref = line[2]
alt = line[3]
if alt[0] == "-":
    ref += alt[1:]
    alt = line[2]

Paul

@drpatelh
Copy link

drpatelh commented May 1, 2020

Thanks @paulstretenowich Ill fix that 👍

@drpatelh
Copy link

drpatelh commented May 1, 2020

Having looked at this properly now @paulstretenowich it appears that there was a bug in the way insertions were being reported too as fixed here

@paulstretenowich
Copy link

You are right @drpatelh I did just a naive fix on a specific case I was facing but I suspected other cases. Thanks 👍

@fanninpm
Copy link

Summary

I ran a few of iVar's franken-VCF files (from the Cecret workflow) through the vcfR package, but not all of the expected graphs showed up when I generated a chromR object and plotted it. On the iVar end, it would be nice to have the MQ and QUAL fields. (The DP field seems to already be present, however it isn't used for some reason.)


Reproducing

I have included the following instructions for retracing my steps (some of this is aimed more towards the people coming from the vcfR repo):

Steps to reproduce:

System Requirements:

  • Downloading the initial FASTQ files:
  • Generating the VCF file:
    • the master branch of https://github.com/UPHL-BioNGS/Cecret (you actually have to download this, as the repo itself has no main.nf file)
    • nextflow (available on Bioconda)
    • docker or singularity (the latter is available on conda-forge)
    • gigabytes of space for the Docker/Singularity images (you can prune this down by turning things off in a nextflow {filename}.config file)
  • Using vcfR to generate the plots:
    • a recent enough version of R (I was using version 3.5.2)
    • vcfR (available on CRAN, but you can also get it from conda-forge if you prefer downloading R and vcfR via conda)
    • (OPTIONAL) RStudio (so you can easily interact with R and save the plots)

Step 1 (downloading the initial FASTQ files):

  • Create a directory in which you will hold your paired-end FASTQ gz archives. Navigate to that directory.
  • Run the following commands (the -p flag is optional, and displays a progress bar):
prefetch -p SRR11953924
fasterq-dump -p --split-files SRR11953924/SRR11953924.sra

Here, I chose SRR11953924 as an example, as it seemed to be the most interesting.

Step 2 (generating the VCF file):

  • (OPTIONAL) Write a config file that disables what you don't need (this prevents docker/singularity from downloading excessive containers). Here's an example (note that process.ivar_variants is the process that runs ivar variants, so it should remain set to true):
params.fastqc = false
params.ivar_variants = true
params.samtools_stats = false
params.samtools_coverage = false
params.samtools_flagstat = false
params.samtools_ampliconstats = false
params.bedtools_multicov = false
params.nextclade = false
params.snpdists = false
params.iqtree = false
  • Run the Cecret workflow, generating the VCF file, with a command similar to:
nextflow -C {your_config_file}.config run Cecret.nf -c configs/singularity.config {directory_from_step_1}

Step 3 (using vcfR to generate the plots):

  • NOTE: Because the SARS-CoV-2 genome is only ~30 kbp in size, I found a window size of 500 (the default is 1000) worked best for display purposes.
  • Run the following R commands (folks coming from the iVar repository can feel free to ignore the spurious warning about the names not matching up):
library(vcfR)

dna <- ape::read_dna("{cecret_directory}/configs/MN908947.3.fasta", format = "fasta")
gff <- ape::read_gff("{cecret_directory}/configs/MN908947.3.gff")
vcf <- read.vcfR("{path_to_your_cecret_output}/ivar_variants/SRR11953924.ivar_variants.vcf", verbose = TRUE)

chrom <- create.chromR(name = "SRR11953924", vcf = vcf, seq = dna, ann = gff)
chrom <- masker(chrom)
chrom <- proc.chromR(chrom, win.size = 500, verbose = TRUE)

plot(chrom)

chromoqc(chrom)

Here's the chromR plot (window size 500, notice that 3 out of the 4 plots are missing, compared to the first plot found on this page):
image

Here's the chromoqc plot (window size 500, the low number of variants per site is normal for this sample; however, the top three plots are missing, compared to the last plot found on this page):
image

@Moslemecoh
Copy link

Filter variants across replicates with iVar
The following is the command for filter variants:
ivar filtervariants -t 0.5 -p test.filtered P1_S1.tsv P2_S2.tsv P3_S3.tsv

But always encountering the error message below.
Header format of P1_S1.tsv did not match! Please use files generated using "ivar variants" command

I have double checked the header format of those file which do not have any change. Even though, I have changed tsv to txt file. What could be the problem and solution to resolve this issue.

@cmaceves
Copy link
Collaborator

Hi @Moslemecoh! Thanks for bringing this up - I think the better place for this conversation is #178. If you could redirect there, and possibly share the files from the command I can take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

6 participants