
scvi is deprecated #50

Closed
adamgayoso opened this issue Sep 18, 2020 · 8 comments

@adamgayoso

Hi all,

I wanted to let you all know that we are deprecating the scvi software package. We recently pushed a final update (0.6.8) with a deprecation warning upon import. scvi is now implemented in the scvi-tools package (same repo). Based on viewing your code, it seems like the easiest thing to do at the moment is pin the scvi requirement to 0.6.7. We will be changing some of the classes you use (like Classifier etc.) in scvi-tools, so it will require non-trivial updating to achieve the same solo algorithm. To fix the issue with the invalid parameter, you might consider detecting outliers in the dataset in terms of library size, possibly truncating the library size prior, or reducing the number of latent dimensions.
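One way to read the library-size suggestion above is a simple outlier screen before training: flag cells whose total counts fall far outside the bulk of the distribution. A minimal sketch with a MAD-based cutoff on synthetic counts (illustrative only, not scvi's own preprocessing):

```python
import numpy as np

# Stand-in cells x genes count matrix; in practice this would come from adata.X.
rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(500, 200))

lib_size = counts.sum(axis=1)                 # library size per cell
log_lib = np.log1p(lib_size)
med = np.median(log_lib)
mad = np.median(np.abs(log_lib - med))        # robust spread estimate
keep = np.abs(log_lib - med) < 5 * mad        # drop cells with extreme library size
filtered = counts[keep]
```

The 5-MAD threshold is an arbitrary illustrative choice; tighten or loosen it based on the dataset.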

Please let me, @galenxing, or @romain-lopez know if you have any questions.

@njbernstein
Contributor

Hi @adamgayoso

I really appreciate the heads up. I'll pin it now to 0.6.7.

Not opposed to rewriting to conform to scvi-tools, but would be nice to do it only once. Do you know when the Classifier class will be stable in scvi-tools?

@adamgayoso
Author

Probably in a month or two. Now that I look at your code again, if I'm understanding correctly, it's probably easiest to

  1. Use scvi-tools for the scVI part, extract the mean of the latent space, mean of the latent log library size
  2. Train your own classifier (hand-built using pytorch, can probably copy paste some of our code).

It might even be faster this way: currently each cell is re-encoded on every pass just to get the same latent z mean and latent log l mean, and you'd be able to cut that step out.

For the scVI part, scvi-tools is ready enough to replace the vae training -> latent variable extraction. We also have a new save/load mechanism that should simplify some of your code. Though I'd wait until we have the official 0.7 release of scvi-tools.
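Under that plan, step 2 could look something like the minimal torch sketch below. The latent features here are random stand-ins for what step 1 would extract from the vae; all names are illustrative, none of this is scvi-tools API:

```python
import torch
import torch.nn as nn

# Stand-ins for step 1's output: latent z mean plus latent log library size mean.
n_cells, n_latent = 256, 10
z_mean = torch.randn(n_cells, n_latent)
log_l_mean = torch.randn(n_cells, 1)
features = torch.cat([z_mean, log_l_mean], dim=-1)
labels = torch.randint(0, 2, (n_cells,))      # doublet / singlet labels

# Hand-built classifier on the concatenated (z, l) features.
clf = nn.Sequential(nn.Linear(n_latent + 1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(clf(features), labels)
    loss.backward()
    opt.step()
```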

@njbernstein
Contributor

njbernstein commented Sep 18, 2020

I actually sample the latent space during training and don't take the mean

strainer = ClassifierTrainer(classifier, classifier_data,
                                 train_size=(1. - valid_pct),
                                 frequency=2, metrics_to_monitor=['accuracy'],
                                 use_cuda=args.gpu,
                                 sampling_model=vae, sampling_zl=True,
                                 early_stopping_kwargs=stopping_params,
                                 batch_size=batch_size)

When we don't sample the latent space, the performance of solo suffers quite a bit; otherwise I would be happy to do what you suggested and just take the mean of the latent space.

I'll probably wait to see what 0.7 looks like and decide then

@adamgayoso
Author

adamgayoso commented Sep 18, 2020

It looks like that option just concatenates the mean of the latent log library size, as the [0] grabs the mean of the variational distribution.

https://github.com/YosefLab/scvi-tools/blob/d9cb4806780502da0ed412f395f8fd5b98f4e917/scvi/core/trainers/annotation.py#L106-L108

                if self.sampling_zl:
                    x_z = self.sampling_model.z_encoder(x)[0]
                    x_l = self.sampling_model.l_encoder(x)[0]
                    x = torch.cat((x_z, x_l), dim=-1)

https://github.com/YosefLab/scvi-tools/blob/d9cb4806780502da0ed412f395f8fd5b98f4e917/scvi/core/modules/vae.py#L298-L299

        # Sampling
        qz_m, qz_v, z = self.z_encoder(x_, y)
        ql_m, ql_v, library = self.l_encoder(x_)

Note that qz_m and ql_m are deterministic quantities and the z and library are the samples.

In any case, in our new API it's pretty easy to sample or get the mean of the latent variables.
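To make the [0] indexing concrete, here is a toy encoder following the same (mean, variance, sample) return convention as the snippets quoted above; it is a hypothetical stand-in, not the actual scvi Encoder class:

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Returns (q_m, q_v, z): variational mean, variance, and a sample."""
    def __init__(self, n_in, n_latent):
        super().__init__()
        self.mean = nn.Linear(n_in, n_latent)
        self.log_var = nn.Linear(n_in, n_latent)

    def forward(self, x):
        q_m = self.mean(x)
        q_v = torch.exp(self.log_var(x))
        z = q_m + q_v.sqrt() * torch.randn_like(q_v)  # reparameterized sample
        return q_m, q_v, z

enc = ToyEncoder(10, 2)
x = torch.randn(4, 10)
out = enc(x)
# out[0] is the deterministic mean q_m, not a draw from the posterior;
# only out[2] is stochastic.
```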

@njbernstein
Contributor

Huh, you are right. Not sure why I thought we were sampling from the posterior. My mistake.

Thanks!

@adamgayoso
Author

Well, the variables do have pretty poor names, which we are going to fix! That said, my opinion is that building your own classifier on z and l and using scvi-tools for the scVI part would be the simplest transition.

Though, we could also consider implementing solo's core functionality (train vae, make doublets, classifier on z+l of augmented data) in scvi-tools? I think we could be open to the idea... and it's certainly a conversation we could continue offline if you all are interested.

adata = sc.read...
scvi.data.setup_anndata(adata)
vae = scvi.model.SCVI(adata)
vae.train()
solo = scvi.model.SOLO(vae)
solo.train() # simulates doublets, trains classifier
solo.predict(adata)
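The "make doublets" step in that pipeline could be sketched as combining counts of random cell pairs, a common solo-style recipe; the exact augmentation in solo or scvi-tools may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(100, 50))      # cells x genes count matrix

# Simulated doublets: sum the profiles of randomly paired cells.
n_doublets = 20
i = rng.integers(0, 100, size=n_doublets)
j = rng.integers(0, 100, size=n_doublets)
doublets = counts[i] + counts[j]

augmented = np.vstack([counts, doublets])
labels = np.concatenate([np.zeros(100), np.ones(n_doublets)])  # 1 = simulated doublet
```

The classifier would then be trained on the latent (z, l) representation of this augmented matrix.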

@njbernstein
Contributor

@adamgayoso I'm interested in doing that. Discussing with my collaborators tomorrow. I'll let you know where things are after that.

Nick

@njbernstein
Contributor

@adamgayoso We are interested in implementing Solo's core functionality into scVI, but an offline discussion seems best to move forward. I'll email you shortly.
