
scvi is deprecated #50

Closed
adamgayoso opened this issue Sep 18, 2020 · 8 comments

@adamgayoso

Hi all,

I wanted to let you all know that we are deprecating the scvi software package. We recently pushed a final update (0.6.8) with a deprecation warning upon import. scvi is now implemented in the scvi-tools package (same repo). Based on viewing your code, it seems like the easiest thing to do at the moment is pin the scvi requirement to 0.6.7. We will be changing some of the classes you use (like Classifier etc.) in scvi-tools, so it will require non-trivial updating to achieve the same solo algorithm. To fix the issue with the invalid parameter, you might consider detecting outliers in the dataset in terms of library size, possibly truncating the library size prior, or reducing the number of latent dimensions.
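One way to read the library-size suggestion above is a simple outlier screen before training: flag cells whose total counts fall far outside the bulk of the distribution. A minimal sketch with a MAD-based cutoff on synthetic counts (illustrative only, not scvi's own preprocessing):

```python
import numpy as np

# Stand-in cells x genes count matrix; in practice this would come from adata.X.
rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(500, 200))

lib_size = counts.sum(axis=1)                 # library size per cell
log_lib = np.log1p(lib_size)
med = np.median(log_lib)
mad = np.median(np.abs(log_lib - med))        # robust spread estimate
keep = np.abs(log_lib - med) < 5 * mad        # drop cells with extreme library size
filtered = counts[keep]
```

The 5-MAD threshold is an arbitrary illustrative choice; tighten or loosen it based on the dataset.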

Please let me, @galenxing, or @romain-lopez know if you have any questions.

@njbernstein
Contributor

Hi @adamgayoso

I really appreciate the heads up. I'll pin it now to 0.6.7.

Not opposed to rewriting to conform to scvi-tools, but would be nice to do it only once. Do you know when the Classifier class will be stable in scvi-tools?

@adamgayoso
Author

Probably in a month or two. Now that I look at your code again, if I'm understanding correctly, it's probably easiest to

  1. Use scvi-tools for the scVI part, extract the mean of the latent space, mean of the latent log library size
  2. Train your own classifier (hand-built using pytorch, can probably copy paste some of our code).

It might even be faster this way: currently each cell is re-encoded on every pass just to get the same latent z mean and latent log l mean, and you'd be able to cut that step out.

For the scVI part, scvi-tools is ready enough to replace the vae training -> latent variable extraction. We also have a new save/load mechanism that should simplify some of your code. Though I'd wait until we have the official 0.7 release of scvi-tools.
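Under that plan, step 2 could look something like the minimal torch sketch below. The latent features here are random stand-ins for what step 1 would extract from the vae; all names are illustrative, none of this is scvi-tools API:

```python
import torch
import torch.nn as nn

# Stand-ins for step 1's output: latent z mean plus latent log library size mean.
n_cells, n_latent = 256, 10
z_mean = torch.randn(n_cells, n_latent)
log_l_mean = torch.randn(n_cells, 1)
features = torch.cat([z_mean, log_l_mean], dim=-1)
labels = torch.randint(0, 2, (n_cells,))      # doublet / singlet labels

# Hand-built classifier on the concatenated (z, l) features.
clf = nn.Sequential(nn.Linear(n_latent + 1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(clf(features), labels)
    loss.backward()
    opt.step()
```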

@njbernstein
Contributor

njbernstein commented Sep 18, 2020

I actually sample the latent space during training and don't take the mean

strainer = ClassifierTrainer(classifier, classifier_data,
                                 train_size=(1. - valid_pct),
                                 frequency=2, metrics_to_monitor=['accuracy'],
                                 use_cuda=args.gpu,
                                 sampling_model=vae, sampling_zl=True,
                                 early_stopping_kwargs=stopping_params,
                                 batch_size=batch_size)

When we don't sample the latent space, the performance of solo suffers quite a bit; otherwise I would be happy to do what you suggested and just take the mean of the latent space.

I'll probably wait to see what 0.7 looks like and decide then

@adamgayoso
Author

adamgayoso commented Sep 18, 2020

It looks like that option just concatenates the mean of the latent log library size, as the [0] grabs the mean of the variational distribution.

https://github.com/YosefLab/scvi-tools/blob/d9cb4806780502da0ed412f395f8fd5b98f4e917/scvi/core/trainers/annotation.py#L106-L108

                if self.sampling_zl:
                    x_z = self.sampling_model.z_encoder(x)[0]
                    x_l = self.sampling_model.l_encoder(x)[0]
                    x = torch.cat((x_z, x_l), dim=-1)

https://github.com/YosefLab/scvi-tools/blob/d9cb4806780502da0ed412f395f8fd5b98f4e917/scvi/core/modules/vae.py#L298-L299

        # Sampling
        qz_m, qz_v, z = self.z_encoder(x_, y)
        ql_m, ql_v, library = self.l_encoder(x_)

Note that qz_m and ql_m are deterministic quantities and the z and library are the samples.

In any case, in our new API it's pretty easy to sample or get the mean of the latent variables.
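To make the [0] indexing concrete, here is a toy encoder following the same (mean, variance, sample) return convention as the snippets quoted above; it is a hypothetical stand-in, not the actual scvi Encoder class:

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Returns (q_m, q_v, z): variational mean, variance, and a sample."""
    def __init__(self, n_in, n_latent):
        super().__init__()
        self.mean = nn.Linear(n_in, n_latent)
        self.log_var = nn.Linear(n_in, n_latent)

    def forward(self, x):
        q_m = self.mean(x)
        q_v = torch.exp(self.log_var(x))
        z = q_m + q_v.sqrt() * torch.randn_like(q_v)  # reparameterized sample
        return q_m, q_v, z

enc = ToyEncoder(10, 2)
x = torch.randn(4, 10)
out = enc(x)
# out[0] is the deterministic mean q_m, not a draw from the posterior;
# only out[2] is stochastic.
```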

@njbernstein
Contributor

Huh, you are right. Not sure why I thought we were sampling from the posterior. My mistake.

Thanks!

@adamgayoso
Author

Well, the variables do have pretty poor names, which we are going to fix! That said, my opinion is that building your own classifier on z and l and using scvi-tools for the scVI part would be the simplest transition.

Though, we could also consider implementing solo's core functionality (train vae, make doublets, classifier on z+l of augmented data) in scvi-tools? I think we could be open to the idea... and it's certainly a conversation we could continue offline if you all are interested.

adata = sc.read...
scvi.data.setup_anndata(adata)
vae = scvi.model.SCVI(adata)
vae.train()
solo = scvi.model.SOLO(vae)
solo.train() # simulates doublets, trains classifier
solo.predict(adata)
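The "make doublets" step in that pipeline could be sketched as combining counts of random cell pairs, a common solo-style recipe; the exact augmentation in solo or scvi-tools may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(100, 50))      # cells x genes count matrix

# Simulated doublets: sum the profiles of randomly paired cells.
n_doublets = 20
i = rng.integers(0, 100, size=n_doublets)
j = rng.integers(0, 100, size=n_doublets)
doublets = counts[i] + counts[j]

augmented = np.vstack([counts, doublets])
labels = np.concatenate([np.zeros(100), np.ones(n_doublets)])  # 1 = simulated doublet
```

The classifier would then be trained on the latent (z, l) representation of this augmented matrix.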

@njbernstein
Contributor

@adamgayoso I'm interested in doing that. Discussing with my collaborators tomorrow. I'll let you know where things are after that.

Nick

@njbernstein
Contributor

@adamgayoso We are interested in implementing Solo's core functionality into scVI, but an offline discussion seems best to move forward. I'll email you shortly.
