Notes from How Diffusion Models Work by DeepLearning.ai
- With Extra Noise
Taught by Sharon Zhou
Noted by Atul
- Example used throughout the course: generate 16×16 sprites for video games
- Goal: given a lot of sprite images, generate even more sprite images
What does the network learn?
- Fine details
- General outline
- Everything in between
Noising Process (Bob the sprite dissolving like a drop of ink in water)
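A minimal sketch of the forward noising step, assuming a DDPM-style linear noise schedule; the values of T and the beta range, and names like alpha_bar, are illustrative choices, not values from the course:

```python
import torch

# Assumed DDPM-style linear schedule (illustrative values, not from the course).
T = 500
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # cumulative signal kept at each timestep

def noise_image(x0: torch.Tensor, t: int):
    """Noise a clean sprite x0 straight to level t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)  # pixel noise drawn from a Normal distribution
    x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return x_t, eps
```

As t grows, the image dissolves into pure noise, like the ink drop spreading out.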
Denoising Process (what should the NN think?)
- If it's Bob the sprite, keep it as it is
- If it's likely to be Bob, suggest more details to be filled in
- If it's just an outline of a sprite, suggest general details for a likely sprite (Bob/Fred/...)
- If it's nothing, suggest the outline of a sprite
Give the NN input noise, whose pixels are drawn from a Normal distribution, and get a completely new sprite!
- Assume you have a trained NN
- At each denoising step, it predicts the noise and subtracts it to get a better image
- NOTE: At each denoising step, some random noise is added back in to prevent "mode collapse" (see the sketch below)
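A hedged sketch of one denoising step, reusing the schedule names from the noising sketch above; the network call is hypothetical (the notes only say it predicts noise):

```python
@torch.no_grad()
def denoise_step(x_t: torch.Tensor, t: int, eps_pred: torch.Tensor):
    """One DDPM-style denoising step: subtract the predicted noise,
    then add a little fresh noise back in (except at t == 0)."""
    mean = (x_t - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_pred) / alphas[t].sqrt()
    if t == 0:
        return mean
    z = torch.randn_like(x_t)          # the extra noise mentioned in the note above
    return mean + betas[t].sqrt() * z  # sigma_t = sqrt(beta_t) is one common choice

# Usage sketch: start from pure Normal noise and walk t = T-1 ... 0.
# x = torch.randn(1, 3, 16, 16)        # 16x16 RGB noise
# for t in reversed(range(T)):
#     eps_pred = nn_model(x, t)        # hypothetical trained network
#     x = denoise_step(x, t, eps_pred)
```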
- UNet Architecture
- Input and output of same size
- First used for image segmentation
Takes a noisy image, embeds it into a smaller space by downsampling, then upsamples to predict the noise
Can take more info in the form of embeddings (see the sketch below)
- Time: related to the timestep, and hence the noise level added
- Context: guides the generation process
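A toy UNet to make the idea concrete: input and output have the same size, the image is downsampled then upsampled, and time/context embeddings are added at the bottleneck. All layer sizes and names are illustrative assumptions, not the course's actual model:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Illustrative UNet-style noise predictor (not the course's architecture)."""
    def __init__(self, channels: int = 3, hidden: int = 64, ctx_dim: int = 5):
        super().__init__()
        # Downsample 16x16 -> 8x8 (the "embed into a small space" step).
        self.down = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, stride=2, padding=1), nn.GELU())
        self.t_embed = nn.Linear(1, hidden)        # timestep -> embedding
        self.c_embed = nn.Linear(ctx_dim, hidden)  # one-hot context -> embedding
        # Upsample 8x8 -> 16x16: same size as the input, predicts the noise.
        self.up = nn.ConvTranspose2d(hidden, channels, 4, stride=2, padding=1)

    def forward(self, x, t, c):
        h = self.down(x)
        # Add the embeddings in, broadcast over the spatial dimensions.
        emb = (self.t_embed(t) + self.c_embed(c))[:, :, None, None]
        return self.up(h + emb)

# noise_pred = TinyUNet()(torch.randn(1, 3, 16, 16),  # noisy image
#                         torch.tensor([[0.5]]),      # normalized timestep
#                         torch.eye(5)[[0]])          # one-hot context
```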
Check out forward() in the sampling notebook
Learns the distribution of what is "not noise"
- Sample a training image, a timestep t, and noise, all at random
- Timestep helps control the level of noise
- Randomisation ensures a stable model
- Add noise to the image
- Input this into the NN, which predicts the noise
- Compute the loss between the actual and predicted noise
- Backprop and learn (see the training-step sketch below)
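Putting the bullets above together, a hedged sketch of one training step, reusing noise_image, T, and TinyUNet from the earlier sketches:

```python
import torch
import torch.nn.functional as F

model = TinyUNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x0: torch.Tensor, c: torch.Tensor) -> float:
    """One training step on a batch of clean sprites x0 with context c."""
    t = torch.randint(0, T, (1,)).item()        # random timestep = random noise level
    x_t, eps = noise_image(x0, t)               # add that much noise to the image
    t_in = torch.full((x0.shape[0], 1), t / T)  # normalized timestep for the net
    eps_pred = model(x_t, t_in, c)              # the NN predicts the noise
    loss = F.mse_loss(eps_pred, eps)            # actual vs. predicted noise
    opt.zero_grad()
    loss.backward()                             # backprop and learn
    opt.step()
    return loss.item()
```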
- Embeddings are vectors; for instance, strings represented as vectors of numbers
- Given as input to the NN along with the training image
- They get associated with a training example and its properties
- Uses: Generate funky mixtures by combining embeddings
- Context formats
- Text
- Categories, one-hot encoded (e.g. hero, non-hero, spells, ...); see the sketch below
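A tiny example of one-hot context vectors and the "funky mixtures" idea; the full category list is an assumption for illustration (the notes only name hero, non-hero, and spells):

```python
import torch

# Assumed category list for illustration.
categories = ["hero", "non-hero", "food", "spells", "side-facing"]
hero   = torch.eye(len(categories))[0:1]  # [[1., 0., 0., 0., 0.]]
spells = torch.eye(len(categories))[3:4]  # [[0., 0., 0., 1., 0.]]

# A funky mixture: half hero, half spell, fed to the net as context.
mixed = 0.5 * hero + 0.5 * spells
```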
- DDPM is slow!
- Multiple timesteps, and its Markovian nature (each step depends on the previous one)
- DDIM, a faster sampler, skips steps, making the process deterministic (see the sketch below)
- Lower quality than DDPM
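A hedged sketch of the deterministic DDIM step (estimate the clean image from the predicted noise, then jump directly to an earlier timestep, skipping the ones in between); it reuses alpha_bar from the schedule sketch above:

```python
@torch.no_grad()
def ddim_step(x_t: torch.Tensor, t: int, t_prev: int, eps_pred: torch.Tensor):
    """One DDIM step from timestep t to t_prev (t_prev can be many steps earlier).
    No fresh noise is added, so the process is deterministic."""
    # Estimate the clean image implied by the predicted noise.
    x0_pred = (x_t - (1 - alpha_bar[t]).sqrt() * eps_pred) / alpha_bar[t].sqrt()
    # Re-noise it straight to the (earlier) timestep t_prev.
    return alpha_bar[t_prev].sqrt() * x0_pred + (1 - alpha_bar[t_prev]).sqrt() * eps_pred
```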
Other applications: Music, Inpainting, Textual Inversion