Test on the SRA for a known reference genome #8

Open
hgscott opened this issue Nov 19, 2024 · 2 comments

@hgscott
Member

hgscott commented Nov 19, 2024

Download reads from the Sequence Read Archive (SRA) for an organism with a high-quality reference genome. If the pipeline still calls a high number of SNPs against that reference, the problem is in the pipeline rather than in the data.
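The check described above boils down to mapping control reads to their own reference and counting the SNPs that come out. Below is a minimal sketch of that step, assuming a generic BWA-MEM / SAMtools / BCFtools workflow with paired-end FASTQ files; it is not this project's pipeline, and the function name, prefix and file names are hypothetical. It only builds the shell commands so they can be inspected before being run (e.g. with `subprocess.run(cmd, shell=True, check=True)`).

```python
import shlex


def build_snp_check_commands(ref_fasta: str, reads_1: str, reads_2: str,
                             prefix: str = "control") -> list[str]:
    """Return shell commands that map reads to a reference and call SNPs.

    Hypothetical helper: assumes bwa, samtools and bcftools are on PATH.
    This is a generic workflow, not the project's own pipeline.
    """
    ref = shlex.quote(ref_fasta)
    r1, r2 = shlex.quote(reads_1), shlex.quote(reads_2)
    bam = shlex.quote(f"{prefix}.sorted.bam")
    vcf = shlex.quote(f"{prefix}.snps.vcf.gz")
    return [
        # Index the reference for BWA, and for samtools/bcftools (.fai).
        f"bwa index {ref}",
        f"samtools faidx {ref}",
        # Align paired-end reads, then coordinate-sort the alignments.
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Pile up, call variant sites as haploid, keep SNPs only, bgzip output.
        f"bcftools mpileup -f {ref} {bam} | bcftools call --ploidy 1 -mv"
        f" | bcftools view -v snps -Oz -o {vcf}",
    ]
```

For a true control, the resulting VCF should contain few or no SNP records; many calls would point at the pipeline rather than the sample.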

@hgscott hgscott self-assigned this Nov 19, 2024
@hgscott
Member Author

hgscott commented Nov 25, 2024

I decided to look for sequencing data for Mycoplasma mycoides JCVI-syn3.0, a synthetic bacterial genome with minimal genome content (Reference genome: CP014940.1). I'm hoping that, because it is minimal and synthetic, the expected SNP count really should be 0.

Alternatively, I could use the PhiX 174 bacteriophage, which is apparently commonly used as a control in Illumina sequencing because it has a very small genome (~5.4 kb) (Reference genome: NC_001422.1).

On SRA, I found a lot of sequencing data just by searching for "Mycoplasma mycoides JCVI-syn3.0". Entries were titled one of:

  • Mutation accumulation evolved population
  • Adapted evolution evolved population
  • Mutation accumulation ancestor
  • Adaptive evolution ancestor

I assume I want an ancestor rather than an evolved population, since the whole point is to have 0 SNPs.

This must all come from a paper. Which one is it?
On one of the individual sequence entries I found links to the:

One potential issue: the paper mentions that the minimal cell has the highest mutation rate recorded for any cellular organism.

Do I need the adaptive evolution ancestor or the mutation accumulation ancestor?

  • From the paper: "Mutation accumulation (MA) experiments are designed to reduce the influence of natural selection through repeated bottlenecks of evolving populations" and "In contrast to the mutation accumulation experiments, we conducted experiments that allowed bacteria to achieve large population sizes to increase the efficacy of natural selection."
  • Does that mean the MA ancestor is just a single random colony rather than a large culture? If so, I think I want the adaptive evolution (AE) ancestor.

I downloaded the AE ancestor reads (from https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR15032631&display=download) and the reference genome (from https://www.ebi.ac.uk/ena/browser/view/CP014940).
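Both downloads can also be scripted so the control test is reproducible. This is a minimal sketch, assuming the SRA Toolkit's `fasterq-dump` is installed and that ENA's browser API serves FASTA at `https://www.ebi.ac.uk/ena/browser/api/fasta/<accession>` (an assumption worth checking against ENA's documentation). The helper names are hypothetical, and `--split-files` only matters if the run is paired-end.

```python
import urllib.request
from pathlib import Path

# Assumed ENA endpoint for FASTA downloads; verify before relying on it.
ENA_FASTA_API = "https://www.ebi.ac.uk/ena/browser/api/fasta/"


def ena_fasta_url(accession: str) -> str:
    """Build the (assumed) ENA URL for a sequence in FASTA format."""
    return ENA_FASTA_API + accession


def fasterq_dump_command(run_accession: str, outdir: str = "reads") -> list[str]:
    """Argument list for SRA Toolkit's fasterq-dump.

    --split-files writes paired-end mates to <run>_1.fastq and <run>_2.fastq.
    """
    return ["fasterq-dump", "--split-files", "--outdir", outdir, run_accession]


def download_reference(accession: str, dest: Path) -> Path:
    """Fetch the reference FASTA over the network and save it to dest."""
    with urllib.request.urlopen(ena_fasta_url(accession)) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Usage would be `subprocess.run(fasterq_dump_command("SRR15032631"), check=True)` for the reads and `download_reference("CP014940.1", Path("CP014940.1.fasta"))` for the reference.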

@hgscott
Member Author

hgscott commented Nov 25, 2024

I made a copy of my pipeline and hardcoded it to run on just this one reference file. Something about the summary table isn't working, but from looking at the filtered VCF file, 28 SNPs are being called.
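While the summary table is broken, the SNP count can be cross-checked directly from the filtered VCF. A minimal sketch with no third-party dependencies, assuming a standard tab-separated VCF (optionally gzip-compressed). It treats a record as a SNP when REF and every ALT allele are single bases; whether to require `FILTER` to be `PASS` depends on how the pipeline marks filtered sites, so that is a parameter. The function name is hypothetical.

```python
import gzip


def count_snps(vcf_path: str, pass_only: bool = True) -> int:
    """Count SNP records (single-base REF and ALT) in a .vcf or .vcf.gz file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    n = 0
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information (##) and header (#CHROM) lines
            fields = line.rstrip("\n").split("\t")
            ref, alts, filt = fields[3], fields[4].split(","), fields[6]
            # "." means no filters were applied, so it is not treated as failing.
            if pass_only and filt not in ("PASS", "."):
                continue
            # Single-base REF and single-base ALTs; excludes indels and "*".
            if len(ref) == 1 and all(len(a) == 1 and a.upper() in "ACGTN"
                                     for a in alts):
                n += 1
    return n
```

Comparing `count_snps("filtered.vcf")` with and without `pass_only` shows whether the 28 calls are all passing SNPs or include sites the filter flagged.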
