Newbie questions: warning message, model parameters, and outputs #73
Hi there, sorry for the very late response.
No worries! Thanks for your answers.
Actually, I have another question: what kind of subsetting do you suggest before using Solo? The clustering-aware tools seem to work best when low-UMI cells are removed but all of the upper outliers are left in place, which makes sense. However, given the way Solo was built to handle the data, this might not matter too much?
If there is a threshold above which you are certain that a cell is a doublet, you should go ahead and pre-remove those before running Solo. Otherwise, the training phase will incorrectly treat those as singlets.
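The pre-removal step described above could look roughly like the sketch below. It assumes a dense cells-by-genes matrix of raw counts and a user-chosen cutoff; the function name `drop_certain_doublets` and the `max_umi` value are hypothetical, not part of Solo.

```python
import numpy as np

def drop_certain_doublets(counts, max_umi):
    """Remove cells whose total UMI count exceeds max_umi.

    counts:  2-D array, cells x genes, raw counts.
    max_umi: hypothetical, dataset-specific cutoff above which a
             cell is assumed to be a doublet with certainty.
    Returns the filtered matrix and the boolean keep-mask.
    """
    counts = np.asarray(counts)
    total_umi = counts.sum(axis=1)   # one total per cell
    keep = total_umi <= max_umi
    return counts[keep], keep

# Toy data: the second cell has 1050 UMIs and is dropped at max_umi=1000.
toy = np.array([[5, 3, 2], [400, 350, 300], [10, 0, 1]])
filtered, keep = drop_certain_doublets(toy, max_umi=1000)
```

Cells below the cutoff are left untouched, so Solo still sees the ambiguous upper outliers and can score them itself.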
Thanks Dave, that makes a lot of sense, given how Solo works compared to other methods.
Hi, I've used solo a few times now and am really appreciating how user-friendly it is. It seems to work really well on my dataset.
Every time I run solo, I get the following warning:
"UserWarning: Make sure the registered X field in anndata contains unnormalized count data."
I want to confirm that this is a normal warning that shows up with scvi-tools and that I'm not screwing something up. Before running Solo, I've been removing ambient RNA and empty droplets from my dataset using CellBender, then doing some subsetting in Seurat to remove droplets with aggressively low or high counts, but that's all.
I found the warning in this vignette on the scvi-tools website: https://docs.scvi-tools.org/en/0.13.0/user_guide/notebooks/scarches_scvi_tools.html so my gut says it's ok but I figured I'd check since I'm new at all this.
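One way to sanity-check the input independently of the warning is to test whether the matrix holds only non-negative whole numbers. The sketch below is a heuristic of that kind, assuming a dense array (for example, the contents of `adata.X` converted to dense); the helper name `looks_like_raw_counts` is hypothetical, and this is not presented as the check scvi-tools itself performs.

```python
import numpy as np

def looks_like_raw_counts(x):
    """Heuristic: True when every entry is a non-negative whole number."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.all(x == np.round(x)))

# Toy raw counts and a log-normalized version of the same matrix.
raw = np.array([[0, 3, 1], [7, 0, 2]])
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(raw))         # True
print(looks_like_raw_counts(normalized))  # False
```

For a sparse matrix, the same test could be applied to its stored values (the `.data` attribute of a SciPy sparse matrix) rather than densifying the whole thing.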
This relates to the model parameters:
On the README, the example parameters are:
{
  "n_hidden": 384,
  "n_latent": 64,
  "n_layers": 1,
  "cl_hidden": 128,
  "cl_layers": 1,
  "dropout_rate": 0.2,
  "learning_rate": 0.001,
  "valid_pct": 0.10
}
But the model.json file included with Solo has:
{
  "n_hidden": 128,
  "n_latent": 16,
  "cl_hidden": 64,
  "cl_layers": 1,
  "dropout_rate": 0.1,
  "learning_rate": 0.001,
  "valid_pct": 0.10
}
Which of these should I use for regular snRNA-seq data? Is one of these examples intended for use with demultiplexing/hashsolo?
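To make the comparison concrete, the two parameter sets quoted above can be diffed key by key. This is only a small sketch using the values exactly as posted; the helper `diff_params` is hypothetical and says nothing about which set is the recommended one.

```python
import json

# Values copied from the README example quoted above.
readme_params = json.loads("""{
  "n_hidden": 384, "n_latent": 64, "n_layers": 1,
  "cl_hidden": 128, "cl_layers": 1, "dropout_rate": 0.2,
  "learning_rate": 0.001, "valid_pct": 0.10
}""")

# Values copied from the bundled model.json quoted above.
model_json_params = json.loads("""{
  "n_hidden": 128, "n_latent": 16,
  "cl_hidden": 64, "cl_layers": 1, "dropout_rate": 0.1,
  "learning_rate": 0.001, "valid_pct": 0.10
}""")

def diff_params(a, b):
    """Map each key that differs (or is missing on one side) to (a_value, b_value)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

differences = diff_params(readme_params, model_json_params)
for key, (readme_value, file_value) in differences.items():
    print(f"{key}: README={readme_value}, model.json={file_value}")
```

The two sets differ in `n_hidden`, `n_latent`, `cl_hidden`, and `dropout_rate`, and `n_layers` appears only in the README example (shown as `None` for model.json).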
I wanted to make sure that I should go by the is_doublet.csv binary predictions, and not worry about the preds.npy files etc.
My data are from muscle nuclei and when I create a FeaturePlot for a particular muscle marker, many of the non-muscle cells that express this marker are categorized as doublets in is_doublet.csv, so the results seem reasonable.
I got some example code from a colleague for adding solo calls to Seurat metadata, and they use Rcpp to import the preds.npy output into R, which doesn't work well for me. It's something to do with my newer versions of either Rcpp or solo: he gets consistent number strings for the binary "T" and 0 for "F" upon importing the preds.npy file into R, whereas I get more than two different number strings when I do this. When I open my preds.npy output in Python it IS binary, and it matches what I see in is_doublet.csv. I'm happy to cut out the Rcpp middleman and just use is_doublet.csv, but wanted to make sure that this is correct.
I just started using solo in Dec 2021/Jan 2022, so I suspect the other person wrote the code for data generated with an older version of solo, given what you mentioned in Difference between is_doublet and preds #62.
However, it's been several months since that issue was resolved so I wanted to make sure that I'm using the correct output files.
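If the goal is simply to avoid the Rcpp import, one option is to read the .npy file in Python, confirm it really is binary, and write a plain one-column CSV that R can read directly. The sketch below assumes the array holds one 0/1 (or True/False) call per cell, as described above; the function name, column name, and file paths are hypothetical, and it does not settle which output file is the authoritative one.

```python
import csv
import numpy as np

def npy_calls_to_csv(npy_path, csv_path, column="is_doublet"):
    """Write a binary .npy call vector to a one-column CSV.

    Raises ValueError if the array contains anything other than 0/1
    (for example, continuous scores), so a mismatch is caught early.
    Returns the number of cells called as doublets.
    """
    calls = np.asarray(np.load(npy_path)).ravel()
    if not np.isin(np.unique(calls), [0, 1]).all():
        raise ValueError("array is not binary; it may hold scores, not calls")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        writer.writerows([[int(v)] for v in calls])
    return int(calls.sum())

# Hypothetical usage, with paths that depend on your own output directory:
# n_doublets = npy_calls_to_csv("preds.npy", "solo_calls.csv")
```

On the R side, the resulting file can then be read with `read.csv` and attached to the Seurat metadata, provided the row order matches the cell order used when running solo.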
Thank you so much for your time!!