Improvement in Variant calling #28

Rohit-Satyam · 2022-10-26T09:33:25Z

I am using COVIGATOR as a reference to build my own Nextflow pipeline and thanks for doing a wonderful job. Your code is clean and comprehensible. I have few queries and suggestions that I need your insight on

Suggestion

I stumbled upon this post, that recommends use of -x in bcftools mpileup flag to disable read-pair overlap detection, following an issue on bcftools and use of -h500 flag for not missing deletions of high coverage, see.

Also, maybe -m 10

EDIT 1: There is also some issue with IVAR related to spanning deletions that can be temporarily taken care of by using --ignore-overlaps. See issue1 and issue2

EDIT2: I also observe a weird trend of more variants being reported by ivar when we don't use --reference option in samtools pileup. Attaching the results obtained on same sample with the only difference of --reference Sars_cov_2.ASM985889v3.dna.toplevel.fa. Can you tell me what's happening?

ivar-results.xlsx

The text was updated successfully, but these errors were encountered:

Rohit-Satyam · 2022-10-31T09:39:52Z

Hi @priesgo. Do you have any thoughts about the points I mentioned above. Besides, I have one more doubt. It appears that the GATK HaplotypeCaller default behaviour is to decompose MNPs to SNPs (See Documentation where they use `--max-mnp-distance 0).

This can be achieved using bcftools norm --atomize option (see issue). However, in the normalization step of COVIGATOR, I see that this option is missing. I was wondering if we were to compare the VCF calls generated by covigator (intersect VCF calls from different tools), the presence of MNVs in some VCF and absence of MNVs in GATK might give false results

priesgo · 2022-10-31T10:22:24Z

Thanks for the kind words. Many points in a single issue, I'll try to cover them all and I will spin off particular tasks from this thread.

Starting from the end...

Normalization and MNVs

We actually "phase" the mutations from all variant callers and join them into MNVs if two conditions are met: 1) they are both clonal variants (VAF >= 80 %) and 2) overlap the same amino acid. We use some custom code for this under bin/phasing.py. I am sure there may be corner cases here though, please, let us know if you find any pathological case and we can revisit our approach.

iVar and reference

According to the samtools documentation when reference is provided then the Base Alignment Quality is performed.

BAQ (Base Alignment Quality)

BAQ is the Phred-scaled probability of a read base being misaligned. It greatly helps to reduce false SNPs caused by misalignments. BAQ is calculated using the probabilistic realignment method described in the paper “Improving SNP discovery by base alignment quality”, Heng Li, Bioinformatics, Volume 27, Issue 8 <https://doi.org/10.1093/bioinformatics/btr076>

BAQ is turned on when a reference file is supplied using the -f option. To disable it, use the -B option.

So, you may be filtering FPs when using the reference... will add this option here #29

iVar and spanning deletions

This looks reasonable although we have been focusing on LoFreq results so cannot comment. Will add though an option to the pipeline here #30

Improvements to BCFtools

Again we have not been looking so much into BCFtools results, but the best for the pipeline would to add optional BCFtools arguments for end-users to tweak variant calling. Will do this here #31

@Rohit-Satyam we accept contributions if you would want to extend the CoVigator pipeline to your needs instead of developing your own pipeline. That may help us both advance faster, but we are happy to help whatever your decision is.

Rohit-Satyam · 2022-10-31T12:24:23Z

Sorry, I have few more doubt

How do you handle "Consecutive variants called by ivar belonging to the same codon ". I can see that Viralrecon in their release 2.3 inculcated this feature.
Is Phasing necessary to be performed before using VCFs to produce assemblies using bcftools consensus. This is because I can see VCF2FASTA running before normalization and Phasing step.

It would be wonderful if that can happen. The only problem is I have just started learning nextflow. I will try to edit your code and if that doesn't work will try to share the additional modules (maybe via Email/ gists) that can be included in pipeline.

priesgo · 2022-11-01T17:08:25Z

How do you handle "Consecutive variants called by ivar belonging to the same codon ". I can see that Viralrecon in their nf-core/viralrecon#271 inculcated this feature.

We do the same. In order to get the annotation right you need to merge all variants that overlap the same codon. We do this for all variant callers, not only for iVar. This is the "phasing".

Is Phasing necessary to be performed before using VCFs to produce assemblies using bcftools consensus. This is because I can see VCF2FASTA running before normalization and Phasing step.

This is a good question! The phasing does not matter as bcftools consensus just builds the DNA sequence by applying all mutations, phasing is only important when predicting the amino acid change. But I wonder about normalization not sure if it will change things, but it definitely is more correct to apply it before, will handle this in #33

priesgo · 2022-11-01T17:09:59Z

I will try to edit your code and if that doesn't work will try to share the additional modules (maybe via Email/ gists) that can be included in pipeline.

Any contribution will be appreciated. You can also create a github branch and do your work there, I can help you to make it work if you face troubles.

This was referenced Oct 31, 2022

Use the reference in mpileup to reduce FPs #29

Closed

Add optional --ignore-overlaps to ivar #30

Closed

Add optional arguments to BCFtools #31

Closed

priesgo mentioned this issue Nov 1, 2022

Apply normalization of VCFs before bcftools consensus in preparation for Pangolin #33

Closed

Rohit-Satyam closed this as completed Nov 5, 2022

priesgo added a commit that referenced this issue Nov 19, 2022

use reference for mpileup for ivar #28

7d7c6e3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvement in Variant calling #28

Improvement in Variant calling #28

Rohit-Satyam commented Oct 26, 2022 •

edited

Loading

Rohit-Satyam commented Oct 31, 2022

priesgo commented Oct 31, 2022

Rohit-Satyam commented Oct 31, 2022 •

edited

Loading

priesgo commented Nov 1, 2022

priesgo commented Nov 1, 2022

Improvement in Variant calling #28

Improvement in Variant calling #28

Comments

Rohit-Satyam commented Oct 26, 2022 • edited Loading

Suggestion

Rohit-Satyam commented Oct 31, 2022

priesgo commented Oct 31, 2022

Normalization and MNVs

iVar and reference

iVar and spanning deletions

Improvements to BCFtools

Rohit-Satyam commented Oct 31, 2022 • edited Loading

priesgo commented Nov 1, 2022

priesgo commented Nov 1, 2022

Rohit-Satyam commented Oct 26, 2022 •

edited

Loading

Rohit-Satyam commented Oct 31, 2022 •

edited

Loading