3Blue1Brown: But how do AI videos actually work? | Guest video by @WelchLabsVideo

Михаил
Special Thanks to:
Jonathan Ho - Jonathan is the author of the DDPM paper and the Classifier-Free Guidance paper.
Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:
Chenyang also has a terrific tutorial and MIT course on diffusion models

Other References

Written by: Stephen Welch, with very helpful feedback from Grant Sanderson
Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu

Technical Notes

The noise videos in the opening have been passed through a VAE (the diffusion process actually happens in a compressed “latent” space), which acts very much like a video compressor - this is why the noise videos don’t look like pure salt and pepper.

Per Chenyang Yuan: at 10:15, the blurry image that results when removing random noise in DDPM is probably due to a mismatch in noise levels when calling the denoiser. When the denoiser is called on x_{t-1} during DDPM sampling, it expects its input to have a certain noise level (let's call it sigma_{t-1}). If you generate x_{t-1} from x_t without adding noise, then the noise present in x_{t-1} is always smaller than sigma_{t-1}. This causes the denoiser to remove too much noise, so its output points towards the mean of the dataset.

For the vectors at 31:40 - Some implementations use f(x, t, cat) + alpha(f(x, t, cat) - f(x, t)), and others use f(x, t) + alpha(f(x, t, cat) - f(x, t)). In the second form, an alpha value of 1 corresponds to no guidance. I chose the second format here to keep things simpler.

At 30:30, the...
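
For readers comparing the two guidance conventions in the note for 31:40, here is a minimal sketch in Python. The function names and the numeric values are hypothetical and chosen only for illustration; `cond` stands for f(x, t, cat) and `uncond` for f(x, t). It is not taken from the video's code. It only shows that the two forms differ by an offset of 1 in alpha.

```python
def guide_from_conditional(cond, uncond, alpha):
    """First form: f(x, t, cat) + alpha * (f(x, t, cat) - f(x, t)).

    Here alpha = 0 returns the plain conditional output (no guidance).
    """
    return cond + alpha * (cond - uncond)


def guide_from_unconditional(cond, uncond, alpha):
    """Second form: f(x, t) + alpha * (f(x, t, cat) - f(x, t)).

    Here alpha = 1 returns the plain conditional output (no guidance).
    """
    return uncond + alpha * (cond - uncond)


# Hypothetical scalar model outputs, purely for illustration.
cond = 2.0    # stands in for f(x, t, cat)
uncond = 0.5  # stands in for f(x, t)

# The "no guidance" setting of each form gives back the conditional output.
no_guidance_first = guide_from_conditional(cond, uncond, 0.0)
no_guidance_second = guide_from_unconditional(cond, uncond, 1.0)

# The same guided result: alpha in the second form equals alpha + 1 in the first.
guided_first = guide_from_conditional(cond, uncond, 2.0)
guided_second = guide_from_unconditional(cond, uncond, 3.0)
```

Because both functions use only addition and multiplication, they should work unchanged on NumPy arrays or tensors standing in for real model outputs.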
