Most cells are unassigned and a few of them do not match the filters #99
Comments
Hi, thanks for sending the detailed info (& trust). The first thing I noticed is the unusual allele rate: [0.117 0.281 0.468], instead of around [0, 0.5, 1]. After reading your commands, one thought came to mind:
To fix it, you can try randomly selecting half of the variants and flipping the ALT and REF alleles, for example by
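The original snippet for this step did not survive on this page. As a minimal sketch of the idea only (not the maintainer's code), flipping a variant can be expressed as replacing its ALT count with depth minus ALT count. The function name `flip_half_variants` and the list-of-rows layout are hypothetical, and the sketch assumes that the depth counts only REF plus ALT reads:

```python
import random


def flip_half_variants(ad, dp, seed=0):
    """Swap REF and ALT counts for a random half of the variants.

    ad, dp: lists of rows (one row per variant, one column per cell)
    holding ALT-allele counts and total REF+ALT depth.
    Assumption: dp contains only REF and ALT reads, so the REF count
    is dp - ad and flipping a variant means ad_new = dp - ad.
    """
    n_variants = len(ad)
    rng = random.Random(seed)
    # Pick half of the variant indices without replacement.
    flipped = set(rng.sample(range(n_variants), n_variants // 2))
    new_ad = [
        [d - a for a, d in zip(ad_row, dp_row)] if i in flipped else list(ad_row)
        for i, (ad_row, dp_row) in enumerate(zip(ad, dp))
    ]
    return new_ad, sorted(flipped)
```

The depth matrix is left unchanged; only the ALT counts of the selected variants change, so the same set of flipped indices would also need to be applied to any genotype information used alongside the counts.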
Hope this may help you. Yuanhua |
Thanks for the suggestions. I tried a crude version of what you suggested, flipping the ALT/REF alleles for half of the variants, and got similarly bad results with a flipped allele rate:
I have delved a bit into this, and it seems that these samples are first-generation hybrids between the C57BL6 and FVB mouse strains. That means that vireo's underlying allele rate assumption will never hold for these samples. I tried downloading the strain-specific VCFs from the Mouse Genomes Project and limiting the analysis to either the merged SNV set or the intersection SNV set between both strains. Merged:
A decent number of SNVs, bad allelic rates, poor classification. Intersect:
An extremely low number of SNVs, better allelic ratios, no donor classification power. Can you think of any way to make vireo work with this kind of sample? Otherwise, do you know of any other similar tool that could work for them? Thank you very much. |
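To illustrate why F1 hybrids are hard to separate by genotype, here is a small sketch under simplifying assumptions: both parental strains are fully inbred (homozygous at every site), only autosomal sites are considered, and new mutations are ignored. The function `offspring_genotypes` is a hypothetical helper, not part of vireo:

```python
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """Possible offspring genotypes, as ALT allele counts (0, 1 or 2).

    Each parent is a tuple of two alleles, 0 for REF and 1 for ALT.
    The offspring inherits one allele from each parent.
    """
    return {a + b for a, b in product(parent_a, parent_b)}


# Site where only one strain carries the variant (in the merged set only):
# every F1 animal is heterozygous, so the site cannot tell donors apart.
strain_specific = offspring_genotypes((0, 0), (1, 1))

# Site where both strains carry the variant (in the intersection set):
# every F1 animal is homozygous ALT, again identical across donors.
shared = offspring_genotypes((1, 1), (1, 1))

print(strain_specific, shared)
```

Under these assumptions every littermate has the same genotype at every strain-defined site, which would be consistent with the poor classification reported for both SNV sets.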
For your last strategy, maybe you can double-check with a similar one used in this paper:
|
I've just tried that:
Similarly to using only the "intersected" SNVs, these variants give good allelic ratios, but there are simply not enough of them to classify donors.
|
Hello, I have also encountered this problem. Have you solved it? |
No. We ended up considering these samples a lost cause and we threw them away. We focused our analysis on a different set of samples that had a "better genetic background" and were only a mixture of 2 animals. We were able to use vireo + souporcell + sex gene information to demultiplex those samples. |
Sorry to hear this. If you want to share this data (email me the link [email protected]), I may give it a try when I have time and see if there is anything we can help with. Yuanhua |
Hi,
first of all, thank you for this tool. It has really saved our ass in a different experiment where HTO-based demultiplexing failed. Thank you a lot.
I am now running vireo on two 10x sequencing runs, each containing 4 samples (mouse, littermates with WT/mutant genotypes, 3F/1M or 1F/3M in each run).
I created a VCF file from the single cell sequencing on each of the samples with:
This resulted in 60,797 SNVs for the first sequencing run (25,647 cells) and 106,960 for the second one (19,207 cells).
Then I ran vireo using the same command I had success with in a previous analysis:
However, the results I got have an overwhelming number of "unassigned" cells:
Looking at other posts here, I also tried running it with --callAmbientRNAs, with --callAmbientRNAs -M 200, and with --callAmbientRNAs -M 1000, all with identical results.
I also explored the results of the first run in R and I see the following distribution:
With most unassigned cells having moderate values of prob_max and prob_doublet < 0.25
Also, the values for the unassigned cells overlap those of donor-assigned cells:
26 unassigned cells have prob_max = 0.9. After looking into the code (io_utils.py), I don't understand how they were deemed "unassigned", because they all have n_vars > 10.
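For reference, the rule described here can be restated as a small sketch. This is an illustrative paraphrase, not vireo's actual code: the thresholds (0.9 for the probabilities, 10 for n_vars) are taken from the description in this post, and the function name `classify` and the order of the checks are assumptions. One thing worth checking, offered only as a possibility, is whether the 26 cells have a prob_max slightly below 0.9 that is displayed as 0.9 after rounding:

```python
def classify(prob_max, prob_doublet, n_vars, donor, min_prob=0.9, min_vars=10):
    """Assign a label from per-cell summary values.

    Assumed rule (not copied from vireo): too few variants means
    "unassigned"; a confident doublet call means "doublet"; a confident
    singlet call returns the donor; anything else is "unassigned".
    """
    if n_vars < min_vars:
        return "unassigned"
    if prob_doublet >= min_prob:
        return "doublet"
    if prob_max >= min_prob:
        return donor
    return "unassigned"


# A value just under the threshold prints as 0.90 with two decimals,
# yet it still fails a ">= 0.9" comparison.
value = 0.8996
print(f"{value:.2f}", classify(value, 0.05, 50, "donor0"))
```

Calling `classify` with a lower `min_prob` on the exported table is one way to see how many cells a relaxed threshold would rescue before deciding whether to trust them.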
It is possible that these cells "suffered" a lot during the processing and there is a lot of ambient RNA floating around. How should I deal with all this? Should I just use a lower prob_max threshold and maybe use a different doublet finder software down the line?
PS: running vireo with --noDoublet leaves 4,726 unassigned cells.
Oh, and in the previous runs, 6,371 unassigned cells have their best_singlet not in their own best_doublet list.
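A count like this can be reproduced with a few lines of Python. The sketch below assumes a tab-separated table with columns named cell, donor_id, best_singlet and best_doublet, and assumes that best_doublet holds two donor names separated by a comma; both the column names and the separator should be checked against the actual output file:

```python
import csv
import io


def singlet_missing_from_doublet(tsv_text):
    """Return unassigned cells whose best_singlet is not in best_doublet.

    Assumptions: tab-separated input with a header row containing
    cell, donor_id, best_singlet and best_doublet; best_doublet is a
    comma-separated pair of donor names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["cell"]
        for row in reader
        if row["donor_id"] == "unassigned"
        and row["best_singlet"] not in row["best_doublet"].split(",")
    ]
```

To run it on a file, read the file contents into a string first, for example with `open(path).read()`, and pass that string to the function.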