Solo encountering NaN values with 10x data #71
Comments
Hi, I also encountered the same error. For me, the UMI filter did not resolve this issue for all samples either. I work on an HPC cluster, and depending on the GPU node the error sometimes shows up and sometimes does not. Thanks for any further support on this.
Hi all, @symbiologist @Elhl93, I was on leave for a couple of weeks, so sorry about the slow response. Removing cells and genes with very high or very low counts is sometimes necessary for the model to converge. I will often use filters along the following lines to prevent this issue:
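(The exact filter snippet did not survive in this copy of the thread; the following is a minimal sketch of count-based filtering with Scanpy, with illustrative thresholds rather than the maintainer's exact values.)

```python
import scanpy as sc

# Illustrative thresholds only: the idea is to drop cells and genes whose
# counts are orders of magnitude away from the rest of the dataset.
adata = sc.read_10x_h5("filtered_feature_bc_matrix.h5")  # hypothetical input file

sc.pp.filter_cells(adata, min_counts=1000)    # drop very low-count cells
sc.pp.filter_cells(adata, max_counts=50000)   # drop very high-count cells
sc.pp.filter_genes(adata, min_cells=3)        # drop genes detected in almost no cells

adata.write_h5ad("filtered_for_solo.h5ad")    # hypothetical output passed to solo
```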
I will add this suggestion to the README.
Thanks for the tip, and I appreciate the response! I'll echo @Elhl93's sentiments: I did some more testing and found that filtering on UMIs and genes didn't always determine whether solo would throw this error. I found that changing n_hidden (to 320 instead of 384) or dropout_rate (to 0.1 from 0.2) in the model parameters would resolve the issue on the same unfiltered dataset. Sometimes changing the seed value would resolve it too, though that was not as reliable as reducing n_hidden. I've also noticed that reducing n_hidden typically leads to more doublets being called. My knowledge of VAEs is limited: which parameters would be most appropriate to change (or not change) in this case? Should we try to keep the maximum n_hidden value and find a dropout_rate and seed value that let solo run, or is it more robust to simply reduce n_hidden? Thank you!
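(For reference, a small sketch of the kind of parameter change described above, editing the model json from the original post; the file names here are assumptions.)

```python
import json

# Lower the network width and dropout rate in the solo model json,
# as described in the comment above. "model_json.json" is an assumed name.
with open("model_json.json") as f:
    params = json.load(f)

params["n_hidden"] = 320      # reduced from 384
params["dropout_rate"] = 0.1  # reduced from 0.2

with open("model_json_reduced.json", "w") as f:
    json.dump(params, f, indent=2)
```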
Interesting. I'm always hesitant to change the defaults: deep learning models are tricky to get to converge on every dataset, and I've found the defaults to work in the majority of cases once I remove observations (cells) or features (genes) whose counts are orders of magnitude different from the rest. If you can share your data, I'm happy to take a look and see if I can understand why it's happening.
Hi, when I change the learning rate it runs stably. I have 15 samples; previously 6 of them errored, now none do. However, it takes more time. Just for your information, I also tested an older version of solo (0.3), and that version runs stably with the default settings. Since the learning rate should be flexible, I would suggest making it a parameter of the function: if I see it correctly, you currently can't change the learning rate for solo.train in the json file. I forked the repo and exchanged the hard-coded learning rate for a parameter with a different default (a rough sketch of the idea is below); happy to make a pull request if this is of interest.
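(A minimal sketch of the kind of change described, assuming an scvi-tools-style training call; the wrapper name and argument names are illustrative, not solo's actual code.)

```python
# Hypothetical sketch: expose a previously hard-coded learning rate as an
# argument with a default so the json parameters can override it.
def train_solo_model(model, max_epochs=400, learning_rate=1e-4):
    # The learning rate used to be fixed inside this function; it is now
    # passed through to the training plan.
    model.train(
        max_epochs=max_epochs,
        plan_kwargs={"lr": learning_rate},
    )
```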
Always happy to take a pull request! It's a good idea.
I made a pull request. I tested the branch on my machine and it worked fine.
I'm getting this error while running solo on one (but not all) of my samples:
```
ValueError: Expected parameter loc (Tensor of shape (128, 64)) of distribution Normal(loc: torch.Size([128, 64]), scale: torch.Size([128, 64])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0', grad_fn=<AddmmBackward0>)
```
Using this as my model json file, if helpful:
{ "n_hidden": 384, "n_latent": 64, "n_layers": 1, "cl_hidden": 128, "cl_layers": 1, "dropout_rate": 0.2, "learning_rate": 0.001, "valid_pct": 0.10 }
Interestingly, if I filter out cells with < 1000 UMI, this error goes away. Weirdly, I have several other samples where this does not appear to be a problem at all (regardless of filtering).
Any thoughts on how to resolve this without applying the UMI cutoff? Thanks!