Questions about code and some SD start knowledge #17

Open
stillbetter opened this issue Dec 9, 2024 · 4 comments

Comments

@stillbetter

Hi, thanks for your patience.

I would like to understand the difference between the two functions get_velocity and get_approximated_x0 in scheduler/ddpm_scheduler.py, and how they relate to the v_prediction branch of the step function (line 356). They all seem to predict the denoised latents, so why are they computed in different ways?

@stillbetter
Author

Another question: since we can get pred_original_sample at every step in the pipeline, why don't we take it directly instead of continuing to denoise step by step?

This may be irrelevant to the paper, but I am truly confused and can't find a proper answer. It would be great if you could explain it. Thanks.

@claudiom4sir
Owner

Hi,

> I want to know the difference between the two function get_velocity and get_approximated_x0 in scheduler/ddpm_scheduler.py with the v_prediction type in step function in line 356. Seems they all predict the denoised latents, but why they are called in different way.

get_approximated_x0 performs exactly the same computation as the v_prediction branch in line 410. It was implemented as a separate function so that the approximation can be obtained without calling the scheduler's step function.

The implementation of get_velocity looks very similar to get_approximated_x0, but it has a different purpose. You can see that the equation `velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample` is different from `pred_original_sample = (alpha_prod_t**0.5) * sample - (beta_prod_t**0.5) * model_output` (i.e., the coefficients of sample and noise are inverted).
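As a rough illustration of how the two formulas fit together, here is a minimal NumPy sketch. It assumes the scheduler follows the standard v-prediction parameterization, where `sample` in the velocity formula is the clean latent and `sample` in the `pred_original_sample` formula is the noisy latent. The value of `alpha_prod_t` and the array shapes are made up for the example, and the "model output" is simply the exact velocity rather than a UNet prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical value for illustration: cumulative alpha product at some timestep t.
alpha_prod_t = 0.3
beta_prod_t = 1.0 - alpha_prod_t

x0 = rng.standard_normal((4, 8))     # clean latent (assumed shape)
noise = rng.standard_normal((4, 8))  # Gaussian noise added during training

# Forward diffusion: the noisy latent that the network sees at timestep t.
x_t = alpha_prod_t**0.5 * x0 + beta_prod_t**0.5 * noise

# Training target (get_velocity form): here "sample" is the CLEAN latent x0.
velocity = alpha_prod_t**0.5 * noise - beta_prod_t**0.5 * x0

# Inference (step / get_approximated_x0 form): here "sample" is the NOISY latent x_t
# and model_output is the velocity (a perfect prediction in this sketch).
pred_original_sample = alpha_prod_t**0.5 * x_t - beta_prod_t**0.5 * velocity
pred_epsilon = alpha_prod_t**0.5 * velocity + beta_prod_t**0.5 * x_t

# With an exact velocity, both the clean latent and the noise are recovered.
x0_recovered = np.allclose(pred_original_sample, x0)
noise_recovered = np.allclose(pred_epsilon, noise)
print(x0_recovered, noise_recovered)
```

Under these assumptions the two formulas are not contradictory: one builds the training target from the clean latent and the noise, and the other inverts that relation at inference time to get an estimate of the clean latent from the noisy latent and the predicted velocity.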

> Another question is, since we can get the pred_original_sample in every step in pipeline, why we not take it directly, instead keep the denoising step by step.

Because pred_original_sample is just an approximation of the final latent. It is computed by combining the noise predicted by the UNet and the noisy latent that is progressively refined. When t is large (e.g., 900), the latent is still very noisy and the noise predicted by the UNet may contain errors. As a consequence, the approximation of x0 is not good enough to be the final output (refer to Figure 3 of the paper). By adopting a step-by-step denoising, the current latent becomes progressively better (i.e., with less noise) and the noise predicted by the UNet is more accurate. In addition, this step-by-step denoising allows us to exploit the bidirectional strategy proposed to ensure temporal consistency.
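One way to get a feel for why the early approximation is unreliable is the sketch below. It is only an illustration under stated assumptions: it uses the epsilon parameterization, where x0 is estimated as `(x_t - sqrt(1 - alpha_prod_t) * eps_hat) / sqrt(alpha_prod_t)`, and a linear beta schedule with 1000 steps from 1e-4 to 0.02, which may not match the schedule used in this repository. The helper name `x0_error_gain` is hypothetical. Under v-prediction the corresponding factor is `sqrt(1 - alpha_prod_t)`, which stays below 1, so the amplification shown here covers only the epsilon case.

```python
import numpy as np

# Assumed schedule for illustration only (linear betas, 1000 training steps).
num_train_timesteps = 1000
betas = np.linspace(1e-4, 0.02, num_train_timesteps)
alphas_cumprod = np.cumprod(1.0 - betas)


def x0_error_gain(t: int) -> float:
    """Factor by which an error in the predicted noise is scaled in the x0 estimate.

    For epsilon prediction, x0_hat = (x_t - sqrt(1 - a) * eps_hat) / sqrt(a),
    so an error d in eps_hat becomes an error of sqrt(1 - a) / sqrt(a) * d in x0_hat.
    """
    a = alphas_cumprod[t]
    return float(np.sqrt(1.0 - a) / np.sqrt(a))


gain_early = x0_error_gain(900)  # start of sampling, very noisy latent
gain_late = x0_error_gain(100)   # near the end of sampling
print(f"gain at t=900: {gain_early:.2f}, gain at t=100: {gain_late:.2f}")
```

With this schedule the gain at t=900 is much larger than at t=100, so the same prediction error hurts the x0 estimate far more early in sampling than late.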

@stillbetter
Author

stillbetter commented Dec 10, 2024

Thank you so much for your reply! I'm so excited I could cry~

  1. So can I simply treat v_prediction as meaning that the prediction target is the denoised latents, and epsilon as meaning that the target is the noise?
  2. I compared get_velocity and get_approximated_x0. get_approximated_x0 is the same as the v_prediction branch, and in train.py line 1004, v_prediction corresponds to the get_velocity function. But in the step function of the scheduler (line 410), v_prediction corresponds to the inverted formula. Why are the two return values inverted? For get_approximated_x0, I understand that the return value is a noisy latent minus an estimated noise, which gives a 'clean' latent $\widetilde{x_0}$. But for get_velocity, I can't see intuitively what a noise minus a latent means.

@stillbetter
Author

Let me state question 2 more clearly: why does the same v_prediction correspond to two inverted formulas, one in training and one in the step function?
