The goal of generative models is to construct a
sampler for an unknown distribution $p^*$. Once the sampler is
constructed, we can generate new samples from the
distribution.
Since the distribution is unknown, we must rely on a set of representative
samples drawn from it in order to estimate it.
We could try to directly learn a transformation between the unknown
distribution and a simple-to-sample distribution such as a Gaussian.
However, this may be intractable, as the target
distribution can be high-dimensional and complicated. Diffusion models provide
a general framework for learning such transformations, reducing the
problem of sampling from $p^*$ to a sequence of easier sampling
problems.
Gaussian Diffusions
Let the random variable $x_0 \sim p^*$, then construct a sequence of random variables
$x_1, x_2, \ldots, x_T$ by successively adding independent Gaussian noise with
some small scale $\sigma$:
$$x_{t+1} := x_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma^2),$$
which is called the forward process. Let $p_t$ be the
marginal distribution of each $x_t$.
We have $p_0 = p^*$. In practice, $T$ is sufficiently large and we can assume
that $p_T \approx \mathcal{N}(0, \sigma_q^2)$, where $\sigma_q = \sigma\sqrt{T}$ is the standard deviation of the accumulated noise. What
we want to do is to learn a transformation such that given the
marginal distribution $p_t$, we can
produce $p_{t-1}$. Then, if the final
marginal distribution $p_T$ is given,
we can produce $p_{T-1}, p_{T-2}, \ldots, p_0$
iteratively. This process is called the reverse process, and the
method that implements this process is called a reverse
sampler.
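As a quick sanity check that the forward process "forgets" $p^*$ and approaches a Gaussian, here is a minimal NumPy sketch. The bimodal 1-D target and all numerical values are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D bimodal target p* (an assumption for this demo).
def sample_p_star(n):
    modes = rng.choice([-2.0, 2.0], size=n)
    return modes + 0.1 * rng.standard_normal(n)

T = 1000                        # number of forward steps
sigma_q = 10.0                  # terminal standard deviation, large vs. the data scale
sigma = sigma_q / np.sqrt(T)    # per-step noise scale, so the accumulated variance is sigma_q**2

x = sample_p_star(100_000)      # x_0 ~ p*
for _ in range(T):              # forward process: x_{t+1} = x_t + N(0, sigma^2)
    x = x + sigma * rng.standard_normal(x.shape)

# x_T = x_0 + N(0, sigma_q^2); since sigma_q dominates the spread of p*,
# the marginal p_T is close to N(0, sigma_q^2).
print("std of x_T:", x.std())
```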
The reverse process can be represented by conditional probabilities. At
time step $t$, given an input $z$ sampled from $p_t$, the reverse sampler should
generate a sample from the conditional distribution
$$p(x_{t-1} \mid x_t = z).$$
This seems to imply that a generative model must
learn the conditional distribution for every possible $z$, which could be complicated. But we
have the following insight, which will be proved in the DDPM sampler
section:
Fact 1 (Diffusion Reverse Process). For small $\sigma$ and the Gaussian diffusion process
$x_{t+1} = x_t + \eta_t$, where $\eta_t \sim \mathcal{N}(0, \sigma^2)$,
the conditional distribution $p(x_{t-1} \mid x_t)$ is itself close to Gaussian. That is, for all time steps $t$ and conditionings $z$, there exists some
mean parameter $\mu$ such that $p(x_{t-1} \mid x_t = z) \approx \mathcal{N}(\mu, \sigma^2)$.
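Before the proof, here is a minimal numerical illustration of Fact 1 on a 1-D grid, assuming a bimodal $p_0$ and a small step $\sigma$ (all values are illustrative): even though $p_0$ is far from Gaussian, the posterior $p(x_0 \mid x_1 = z)$ is nearly Gaussian with standard deviation close to $\sigma$.

```python
import numpy as np

def gauss_pdf(x, mu, std):
    return np.exp(-0.5 * ((x - mu) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Illustrative bimodal p_0; all numbers here are assumptions for the demo.
grid = np.linspace(-4, 4, 4001)
dx = grid[1] - grid[0]
p0 = 0.5 * gauss_pdf(grid, -1.0, 0.3) + 0.5 * gauss_pdf(grid, 1.0, 0.3)

sigma = 0.05   # small per-step noise scale
z = 0.9        # conditioning value x_1 = z

# Posterior p(x_0 | x_1 = z) ∝ p_0(x_0) * N(z - x_0; 0, sigma^2)
post = p0 * gauss_pdf(z - grid, 0.0, sigma)
post /= post.sum() * dx

mu = (grid * post).sum() * dx                  # posterior mean
var = ((grid - mu) ** 2 * post).sum() * dx     # posterior variance
approx = gauss_pdf(grid, mu, np.sqrt(var))     # Gaussian with matched moments

tv = 0.5 * np.abs(post - approx).sum() * dx    # total variation distance
print(f"posterior std {np.sqrt(var):.4f} vs sigma {sigma}, TV distance {tv:.4f}")
```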
Given this fact, we find that if the $\sigma$s are provided, then the only
thing the model needs to learn is the mean of the distribution $p(x_{t-1} \mid x_t = z)$, which we denote $\mu_{t-1}(z)$. Fortunately, learning
the mean is much simpler than learning the full conditional distribution, as it can be cast as a
regression problem. Given the joint distribution of $(x_{t-1}, x_t)$, from which we can easily
sample, and the definition of the mean of the conditional distribution,
$$\mu_{t-1}(z) := \mathbb{E}[x_{t-1} \mid x_t = z],$$
we have
$$\mu_{t-1} = \arg\min_{f} \; \mathbb{E}\,\lVert f(x_t) - x_{t-1} \rVert^2,$$
where the expectation is taken over pairs $(x_{t-1}, x_t)$ generated by running the forward process on samples from the target distribution $p^*$.
Now, the problem of learning to sample from an arbitrary distribution
is converted into optimizing the regression problem.
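To make the regression concrete, here is a minimal PyTorch-style sketch of the training objective. The network architecture, passing $t$ as a plain input feature, and all hyperparameters are illustrative assumptions, not the method prescribed by the text:

```python
import torch
import torch.nn as nn

# Illustrative mean-prediction network: f_theta(x_t, t) ≈ E[x_{t-1} | x_t].
class MeanPredictor(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x_t, t):
        # t is appended as an extra scalar feature (a simplifying assumption).
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def training_step(model, optimizer, x0, T, sigma):
    """One regression step: predict x_{t-1} from x_t for a random t."""
    batch = x0.shape[0]
    t = torch.randint(1, T + 1, (batch,))
    # Simulate the forward process in closed form:
    # x_{t-1} = x_0 + N(0, (t-1) * sigma^2),  x_t = x_{t-1} + N(0, sigma^2).
    x_tm1 = x0 + sigma * torch.sqrt((t - 1).float())[:, None] * torch.randn_like(x0)
    x_t = x_tm1 + sigma * torch.randn_like(x0)
    loss = ((model(x_t, t.float() / T) - x_tm1) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```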
Abstraction of Diffusions
We now abstract away the Gaussian setting of diffusions. Given a set of
samples drawn from a target distribution $p^*$, and an easy-to-sample base
distribution $q$ (e.g. Gaussian or
multinomial), we try to construct a sequence of distributions $p_0, p_1, \ldots, p_T$ which interpolate
between $p^*$ and $q$, such that $p_0 = p^*$, $p_T = q$, and the adjacent
distributions $(p_{t-1}, p_t)$ are
marginally close. The aim is to learn a reverse sampler that transforms the base
distribution $q$ back into the target $p^*$.
The reverse sampler at step $t$ is a potentially stochastic function
$F_t$ such that if $x_t \sim p_t$, then the marginal
distribution of $F_t(x_t)$ is exactly
$p_{t-1}$, which means the reverse
sampler transforms the distribution $p_t$ into $p_{t-1}$:
$$x_t \sim p_t \;\Longrightarrow\; F_t(x_t) \sim p_{t-1}.$$
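Once per-step reverse samplers are available, generation is just iterating them from $t = T$ down to $t = 1$. A minimal sketch, where the mapping `reverse_samplers[t]` standing in for $F_t$ and the Gaussian base distribution are illustrative assumptions:

```python
import numpy as np

def generate(reverse_samplers, T, dim, sigma_q, rng=np.random.default_rng()):
    """Draw one sample by starting from the base distribution and applying
    the (possibly stochastic) reverse samplers F_T, ..., F_1 in turn."""
    x = sigma_q * rng.standard_normal(dim)      # x_T ~ q = N(0, sigma_q^2 I)
    for t in range(T, 0, -1):
        x = reverse_samplers[t](x, rng)         # x_{t-1} = F_t(x_t), marginally ~ p_{t-1}
    return x                                    # approximately a sample from p_0 = p*
```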
Discretization
What exactly does it mean for $p_{t-1}$ and $p_t$ to be close?
Assume we have a time-evolving distribution $p(x, t)$ for $t \in [0, 1]$; then the constructed sequence $\{p_{t_k}\}$ is a
discretization of this time-evolving distribution such that $p(\cdot, 0) = p^*$ and $p(\cdot, 1) = q$. If the sequence is sampled
uniformly in time, then $t_k = k\,\Delta t$ with $\Delta t = 1/T$, where $\Delta t$ controls the
fineness of the discretization.
We want the terminal variance of the final distribution to be independent of $T$, so the incremental noise scale is
defined as $\sigma = \sigma_q\sqrt{\Delta t}$, so that the accumulated variance $T\sigma^2 = \sigma_q^2$ stays fixed as $T \to \infty$ and $\Delta t \to 0$.
The following descriptions of DDPM, DDIM, and Flow-matching will use this continuous-time notation, so the diffusion process can
be written as
$$x_{t+\Delta t} = x_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma_q^2\,\Delta t),$$
or equivalently
$$x_t = x_{t-\Delta t} + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma_q^2\,\Delta t).$$
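A tiny numerical check of this scaling, showing that the accumulated noise variance is the same no matter how finely we discretize (the particular value of $\sigma_q$ and the step counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_q = 2.0

for T in (10, 100, 1000):
    dt = 1.0 / T
    # Run the forward process x_{t+dt} = x_t + N(0, sigma_q^2 * dt) from x_0 = 0.
    x = np.zeros(200_000)
    for _ in range(T):
        x += sigma_q * np.sqrt(dt) * rng.standard_normal(x.shape)
    print(f"T = {T:5d}: empirical std of x_1 ≈ {x.std():.3f} (target sigma_q = {sigma_q})")
```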
Now, let’s start the journey of DDPM, DDIM, and Flow-matching 😃.