
Newbie questions: warning message, model parameters, and outputs #73

Closed
lauren-fish opened this issue Mar 7, 2022 · 5 comments

@lauren-fish
Hi, I've used solo a few times now and am really appreciating how user-friendly it is. It seems to work really well on my dataset.

  1. Every time I run solo, I get the following warning:
    "UserWarning: Make sure the registered X field in anndata contains unnormalized count data."
    I want to confirm that this is a normal warning with scvi-tools and that I'm not screwing something up. Before running Solo, I remove ambient RNA and empty droplets from my dataset using CellBender, then do some subsetting in Seurat to remove droplets with aggressively low or high counts, but that's all.
    I found the warning in this vignette on the scvi-tools website: https://docs.scvi-tools.org/en/0.13.0/user_guide/notebooks/scarches_scvi_tools.html so my gut says it's fine, but I figured I'd check since I'm new at all this.

  2. This relates to the model parameters:
    On the README, the example parameters are:
    ```json
    {
      "n_hidden": 384,
      "n_latent": 64,
      "n_layers": 1,
      "cl_hidden": 128,
      "cl_layers": 1,
      "dropout_rate": 0.2,
      "learning_rate": 0.001,
      "valid_pct": 0.10
    }
    ```
    But the model.json file included with Solo has:
    ```json
    {
      "n_hidden": 128,
      "n_latent": 16,
      "cl_hidden": 64,
      "cl_layers": 1,
      "dropout_rate": 0.1,
      "learning_rate": 0.001,
      "valid_pct": 0.10
    }
    ```
    Which of these should I use for regular snRNA-seq data? Is one of these examples intended for use with demultiplexing/hashsolo?

  3. I wanted to make sure that I should go by the is_doublet.csv binary predictions, and not worry about the preds.npy files etc.
    My data are from muscle nuclei and when I create a FeaturePlot for a particular muscle marker, many of the non-muscle cells that express this marker are categorized as doublets in is_doublet.csv, so the results seem reasonable.
    I got some example code from a colleague for adding Solo calls to Seurat metadata; it uses Rcpp to import the preds.npy output into R, which doesn't work well for me. It seems to be something to do with my newer versions of either Rcpp or Solo: he gets consistent number strings for the binary values ("T" as one number, 0 for "F") when importing the preds.npy file into R, whereas I get more than two different number strings. When I open my preds.npy output in Python it IS binary, and matches what I see in is_doublet.csv. I'm happy to cut out the Rcpp middleman and just use is_doublet.csv, but wanted to make sure that this is correct.
    I just started using solo in Dec 2021/Jan 2022, so I suspect my colleague wrote his code for data generated with an older version of solo, given what you mentioned in Difference between is_doublet and preds #62.
    However, it's been several months since that issue was resolved so I wanted to make sure that I'm using the correct output files.
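For anyone else cross-checking the two outputs, here is a minimal Python sketch. The file names come from this thread, but the CSV layout is an assumption (a single header-less column of doublet calls, taken as the last column) — adjust the column index to whatever your version of Solo writes.

```python
import numpy as np
import pandas as pd


def outputs_agree(npy_path: str, csv_path: str) -> bool:
    """Return True if preds.npy and is_doublet.csv give the same doublet calls."""
    preds = np.load(npy_path).astype(bool).ravel()
    # Assumption: the doublet call is the last column of is_doublet.csv.
    calls = pd.read_csv(csv_path, header=None).iloc[:, -1].astype(bool).to_numpy()
    return np.array_equal(preds, calls)
```

If this returns True on your files, the Rcpp import step really is the only thing disagreeing, and using is_doublet.csv directly loses nothing.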

Thank you so much for your time!!

@lauren-fish lauren-fish changed the title Warning message, model parameters, and another general question Newbie questions: warning message, model parameters, and outputs Mar 7, 2022
@njbernstein
Contributor

Hi there, sorry for the very late response.

  1. You should be fine. I'd double-check that they are counts, but I get that warning even when I run it on counts.
  2. Either one of those will work. I'll update them to be the same, but default to the one in the model.json file.
  3. Yes, use the is_doublet.csv output.

@lauren-fish
Author

No worries! Thanks for your answers.

@lauren-fish
Author

Actually, I have another question: what kind of subsetting do you suggest before using solo?
I think I'm ok with removing the bottom 1% of cells by nFeature, or setting a threshold of >150 features, but is it ok/recommended to remove the top 1% of cells (by nFeature)?

The clustering-aware tools seem to work best when low-UMI cells are removed but all of the upper outliers are left in place, and this makes sense.

However, given the way solo was built to handle the data, it seems like this might not matter too much?
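For concreteness, the percentile-style upper cutoff being asked about can be computed like this. This is a sketch in Python rather than Seurat, and `n_features` (the per-cell feature counts) is assumed to be already computed:

```python
import numpy as np


def upper_cutoff_mask(n_features: np.ndarray, pct: float = 99.0) -> np.ndarray:
    """Boolean mask keeping cells at or below the given nFeature percentile."""
    cutoff = np.percentile(n_features, pct)
    return n_features <= cutoff
```

The mask can then be used to subset the object before running solo (e.g. `adata = adata[upper_cutoff_mask(n_features)]` in an AnnData workflow).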

@lauren-fish lauren-fish reopened this Apr 17, 2022
@davek44
Collaborator

davek44 commented Apr 24, 2022

If there is a threshold above which you are certain that a cell is a doublet, you should go ahead and pre-remove those before running Solo. Otherwise, the training phase will incorrectly treat those as singlets.
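As a sketch of that pre-filtering step (pure NumPy; the `40000` ceiling in the usage comment is a made-up placeholder — pick yours from your own counts distribution):

```python
import numpy as np


def drop_certain_doublets(counts_per_cell: np.ndarray, ceiling: float) -> np.ndarray:
    """Mask out cells whose total counts exceed a hard ceiling, so Solo's
    training phase never sees near-certain doublets labeled as singlets."""
    return counts_per_cell <= ceiling


# e.g. keep = drop_certain_doublets(counts_per_cell, 40000)  # ceiling is dataset-specific
```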

@lauren-fish
Author

Thanks Dave, that makes a lot of sense, given how Solo works compared to other methods.

@davek44 davek44 closed this as completed Apr 24, 2022