Difference between is_doublet and preds #62

Closed
ymahmoud opened this issue Apr 23, 2021 · 2 comments

@ymahmoud
ymahmoud commented Apr 23, 2021

Hi!
I'm trying your tool to identify doublets in my scRNA-seq data, but I'm not sure why I get slightly different results in the is_doublet.csv and preds.csv files. I'm not using the -e parameter (expected_number_of_doublets), so shouldn't they be the same? What's the difference between these files?

Thanks!!
Yamil

@njbernstein
Contributor

Great question, and something I need to clean up in the README. They are supposed to be different now, because we recalibrate the predictions that come directly out of solo to account for the fact that during training the model sees twice as many doublets as singlets. Because this doesn't reflect what a real scRNA-seq experiment looks like, the scores get recalibrated, and cells with a softmax score greater than 0.5 are then called doublets. This is what is written to is_doublet.csv, and it is the file you want. preds.csv contains the raw predictions out of solo.

Long story short: is_doublet.csv contains the doublet calls you want.
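
To make the effect of recalibration concrete, here is a rough sketch of a generic prior-shift correction. This is not solo's actual code, and solo's recalibration may work differently. The function name `recalibrate`, the 10% expected doublet rate, and the example scores are all hypothetical; only the 2:1 doublet-to-singlet training ratio and the 0.5 cutoff come from the explanation above.

```python
def recalibrate(p_raw, train_prior=2 / 3, expected_prior=0.1):
    """Re-weight a raw doublet probability for a different class prior.

    train_prior: fraction of doublets seen in training (2:1 ratio -> 2/3).
    expected_prior: ASSUMED doublet fraction in a real experiment
                    (hypothetical value, not taken from solo).
    """
    w_doublet = expected_prior / train_prior
    w_singlet = (1.0 - expected_prior) / (1.0 - train_prior)
    numerator = p_raw * w_doublet
    return numerator / (numerator + (1.0 - p_raw) * w_singlet)


# Hypothetical raw doublet probabilities for four cells.
raw_scores = [0.30, 0.60, 0.90, 0.99]
calibrated_scores = [recalibrate(p) for p in raw_scores]

# Apply the same 0.5 cutoff to both sets of scores.
raw_calls = [p > 0.5 for p in raw_scores]
calibrated_calls = [p > 0.5 for p in calibrated_scores]

print("raw calls:       ", raw_calls)
print("calibrated calls:", calibrated_calls)
```

With these made-up numbers, cells that score moderately high on the raw output drop below 0.5 after the correction, which is the kind of disagreement you would see between thresholded raw predictions and recalibrated calls.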

@ymahmoud
Author

Great, I got it. Thank you for your quick response!
