Building Custom Genome/Working with published genome #275

Open
riyabelani opened this issue Apr 9, 2024 · 1 comment

Comments

@riyabelani

Hello, I am trying to analyze RRBS data from M. californianus. There is a published scaffold-level genome that I used in Bismark to get BAM and BED files, and I want to visualize these against the genome so I can see where methylation is occurring. I am following the YouTube tutorial for creating a custom genome; however, when the tutorial says to create pseudo-chromosomes, the number 25 is used. Most Mytilus species have 14 chromosomes, so I thought I should enter 14, but when I do, it creates 19 anyway. What should I do? Also, if the genome does not have chromosomal annotations but does have a GFF file, will this work?

I am also doing this with a chromosome-level annotated genome for M. trossulus, but this genome does not show up in SeqMonk. Would I have to create a custom genome for that analysis as well? My goal is to be able to visualize on which chromosomes DNA methylation is occurring.

Thank you!

@s-andrews
Owner

Pseudo chromosomes aren't meant to represent real chromosomes, so the number you use doesn't have to relate to the species you're using. The problem is that if you use a scaffold-based assembly then each scaffold will become a chromosome, so much of the interface in SeqMonk will be unusable (genome view, any filter based on chromosomes etc). Also, the data caching is done at the level of a chromosome, so if you have thousands of chromosomes then you'll have tens of thousands of cache files and everything will be super slow.

A pseudo chromosome just groups scaffolds together into sensibly sized chunks. The scaffolds are still independent, it's just a display thing. We say around 25 as that's a good ratio for reducing your data into cached chunks.
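To make the idea of grouping scaffolds into sensibly sized chunks concrete, here is a minimal sketch in Python. It is not SeqMonk's actual implementation, which may group scaffolds differently; the function name `group_scaffolds`, the `pseudoN` naming, and the greedy balancing strategy are all assumptions made for illustration.

```python
import heapq


def group_scaffolds(scaffold_lengths, n_groups=25):
    """Assign scaffolds to n_groups bins of roughly equal total length.

    Illustrative only: a greedy "largest scaffold into the currently
    smallest bin" strategy. SeqMonk's real grouping may differ.
    """
    # Each heap entry is (total_length, bin_index, member_names).
    # bin_index is unique, so the member lists are never compared.
    bins = [(0, i, []) for i in range(n_groups)]
    heapq.heapify(bins)

    # Place the longest scaffolds first so the bins stay balanced.
    by_length = sorted(scaffold_lengths.items(), key=lambda kv: -kv[1])
    for name, length in by_length:
        total, idx, members = heapq.heappop(bins)
        members.append(name)
        heapq.heappush(bins, (total + length, idx, members))

    # Hypothetical naming scheme: pseudo1, pseudo2, ...
    ordered = sorted(bins, key=lambda b: b[1])
    return {f"pseudo{idx + 1}": members for _, idx, members in ordered}


# Example with made-up scaffold names and lengths.
lengths = {f"scaffold_{i}": 1000 + i for i in range(1000)}
groups = group_scaffolds(lengths, n_groups=25)
```

Each scaffold keeps its own name and ends up in exactly one group, which mirrors the point that the grouping only affects display and caching, not the scaffolds themselves.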

When you do your analysis you're not using the pseudo chromosomes. You'll use genes or scaffolds in your reporting so this is just a technical implementation detail.
