The goal of generative models is to construct a
sampler for an unknown distribution $p^*$. Once the sampler is
constructed, we can generate new samples from the
distribution.
Since the distribution is unknown, we must rely on a set of representative
samples drawn from it in order to estimate it.
We could try to directly learn a transformation between the unknown
distribution and a simple-to-sample distribution such as a Gaussian.
However, this may be intractable, as the target
distribution can be high-dimensional and complicated. Diffusion models provide
a general framework for learning such transformations, reducing the
problem of sampling from $p^*$ to a sequence of easier sampling
problems.
Gaussian Diffusions
Let the random variable $x_0 \sim p^*$, then construct a sequence of random variables
$x_1, x_2, \ldots, x_T$ by successively adding independent Gaussian noise with
some small scale $\sigma$:
$$x_{t+1} := x_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma^2),$$
which is called the forward process. Let $p_t$ be the
marginal distribution of each $x_t$.
We have $p_0 = p^*$. In practice, $T$ is sufficiently large and we can assume
that $p_T \approx \mathcal{N}(0, \sigma_q^2)$, where $\sigma_q = \sigma\sqrt{T}$ is the standard deviation of the accumulated noise. What
we want to do is to learn a transformation such that given the
marginal distribution $p_t$, we can
produce $p_{t-1}$. Then, if the final
marginal distribution $p_T$ is given,
we can produce $p_{T-1}, p_{T-2}, \ldots, p_0$
iteratively. This process is called the reverse process, and the
method that implements this process is called a reverse
sampler.
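As a quick sanity check that the forward process "forgets" $p^*$ and approaches a Gaussian, here is a minimal NumPy sketch. The bimodal 1-D target and all numerical values are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D bimodal target p* (an assumption for this demo).
def sample_p_star(n):
    modes = rng.choice([-2.0, 2.0], size=n)
    return modes + 0.1 * rng.standard_normal(n)

T = 1000                        # number of forward steps
sigma_q = 10.0                  # terminal standard deviation, large vs. the data scale
sigma = sigma_q / np.sqrt(T)    # per-step noise scale, so the accumulated variance is sigma_q**2

x = sample_p_star(100_000)      # x_0 ~ p*
for _ in range(T):              # forward process: x_{t+1} = x_t + N(0, sigma^2)
    x = x + sigma * rng.standard_normal(x.shape)

# x_T = x_0 + N(0, sigma_q^2); since sigma_q dominates the spread of p*,
# the marginal p_T is close to N(0, sigma_q^2).
print("std of x_T:", x.std())
```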
The reverse process can be represented by conditional probabilities. At
time step $t$, given an input $z$ sampled from $p_t$, the reverse sampler should
generate a sample from the conditional distribution
$$p(x_{t-1} \mid x_t = z).$$
This seems to imply that a generative model must
learn the conditional distribution for every possible $z$, which could be complicated. But we
have the following insight, which will be proved in the DDPM sampler
section:
Fact 1 (Diffusion Reverse Process). For small $\sigma$ and the Gaussian diffusion process
$x_{t+1} = x_t + \eta_t$, where $\eta_t \sim \mathcal{N}(0, \sigma^2)$,
the conditional distribution $p(x_{t-1} \mid x_t)$ is itself close to Gaussian. That is, for all time steps $t$ and conditionings $z$, there exists some
mean parameter $\mu$ such that $p(x_{t-1} \mid x_t = z) \approx \mathcal{N}(\mu, \sigma^2)$.
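Before the proof, here is a minimal numerical illustration of Fact 1 on a 1-D grid, assuming a bimodal $p_0$ and a small step $\sigma$ (all values are illustrative): even though $p_0$ is far from Gaussian, the posterior $p(x_0 \mid x_1 = z)$ is nearly Gaussian with standard deviation close to $\sigma$.

```python
import numpy as np

def gauss_pdf(x, mu, std):
    return np.exp(-0.5 * ((x - mu) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Illustrative bimodal p_0; all numbers here are assumptions for the demo.
grid = np.linspace(-4, 4, 4001)
dx = grid[1] - grid[0]
p0 = 0.5 * gauss_pdf(grid, -1.0, 0.3) + 0.5 * gauss_pdf(grid, 1.0, 0.3)

sigma = 0.05   # small per-step noise scale
z = 0.9        # conditioning value x_1 = z

# Posterior p(x_0 | x_1 = z) ∝ p_0(x_0) * N(z - x_0; 0, sigma^2)
post = p0 * gauss_pdf(z - grid, 0.0, sigma)
post /= post.sum() * dx

mu = (grid * post).sum() * dx                  # posterior mean
var = ((grid - mu) ** 2 * post).sum() * dx     # posterior variance
approx = gauss_pdf(grid, mu, np.sqrt(var))     # Gaussian with matched moments

tv = 0.5 * np.abs(post - approx).sum() * dx    # total variation distance
print(f"posterior std {np.sqrt(var):.4f} vs sigma {sigma}, TV distance {tv:.4f}")
```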
Given this fact, we find that if the $\sigma$s are provided, then the only
thing the model needs to learn is the mean of the distribution $p(x_{t-1} \mid x_t = z)$, which we denote $\mu_{t-1}(z)$. Fortunately, learning
the mean is much simpler than learning the full conditional distribution, as it can be cast as a
regression problem. Given the joint distribution of $(x_{t-1}, x_t)$, from which we can easily
sample, and the definition of the mean of the conditional distribution,
$$\mu_{t-1}(z) := \mathbb{E}[x_{t-1} \mid x_t = z],$$
we have
$$\mu_{t-1} = \arg\min_{f} \; \mathbb{E}\,\lVert f(x_t) - x_{t-1} \rVert^2,$$
where the expectation is taken over pairs $(x_{t-1}, x_t)$ generated by running the forward process on samples from the target distribution $p^*$.
Now, the problem of learning to sample from an arbitrary distribution
is converted into optimizing the regression problem.
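To make the regression concrete, here is a minimal PyTorch-style sketch of the training objective. The network architecture, passing $t$ as a plain input feature, and all hyperparameters are illustrative assumptions, not the method prescribed by the text:

```python
import torch
import torch.nn as nn

# Illustrative mean-prediction network: f_theta(x_t, t) ≈ E[x_{t-1} | x_t].
class MeanPredictor(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x_t, t):
        # t is appended as an extra scalar feature (a simplifying assumption).
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def training_step(model, optimizer, x0, T, sigma):
    """One regression step: predict x_{t-1} from x_t for a random t."""
    batch = x0.shape[0]
    t = torch.randint(1, T + 1, (batch,))
    # Simulate the forward process in closed form:
    # x_{t-1} = x_0 + N(0, (t-1) * sigma^2),  x_t = x_{t-1} + N(0, sigma^2).
    x_tm1 = x0 + sigma * torch.sqrt((t - 1).float())[:, None] * torch.randn_like(x0)
    x_t = x_tm1 + sigma * torch.randn_like(x0)
    loss = ((model(x_t, t.float() / T) - x_tm1) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```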
Abstraction of Diffusions
We now abstract away the Gaussian setting of diffusions. Given a set of
samples drawn from a target distribution $p^*$, and an easy-to-sample base
distribution $q$ (e.g. Gaussian or
multinomial), we try to construct a sequence of distributions $p_0, p_1, \ldots, p_T$ which interpolate
between $p^*$ and $q$, such that $p_0 = p^*$, $p_T = q$, and the adjacent
distributions $(p_{t-1}, p_t)$ are
marginally close. The aim is to learn a reverse sampler that transforms the base
distribution $q$ back into the target $p^*$.
The reverse sampler at step $t$ is a potentially stochastic function
$F_t$ such that if $x_t \sim p_t$, then the marginal
distribution of $F_t(x_t)$ is exactly
$p_{t-1}$, which means the reverse
sampler transforms the distribution $p_t$ into $p_{t-1}$:
$$x_t \sim p_t \;\Longrightarrow\; F_t(x_t) \sim p_{t-1}.$$
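Once per-step reverse samplers are available, generation is just iterating them from $t = T$ down to $t = 1$. A minimal sketch, where the mapping `reverse_samplers[t]` standing in for $F_t$ and the Gaussian base distribution are illustrative assumptions:

```python
import numpy as np

def generate(reverse_samplers, T, dim, sigma_q, rng=np.random.default_rng()):
    """Draw one sample by starting from the base distribution and applying
    the (possibly stochastic) reverse samplers F_T, ..., F_1 in turn."""
    x = sigma_q * rng.standard_normal(dim)      # x_T ~ q = N(0, sigma_q^2 I)
    for t in range(T, 0, -1):
        x = reverse_samplers[t](x, rng)         # x_{t-1} = F_t(x_t), marginally ~ p_{t-1}
    return x                                    # approximately a sample from p_0 = p*
```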
Discretization
What exactly does it mean for $p_{t-1}$ and $p_t$ to be close?
Assume we have a time-evolving distribution $p(x, t)$ for $t \in [0, 1]$; then the constructed sequence $\{p_{t_k}\}$ is a
discretization of this time-evolving distribution such that $p(\cdot, 0) = p^*$ and $p(\cdot, 1) = q$. If the sequence is sampled
uniformly in time, then $t_k = k\,\Delta t$ with $\Delta t = 1/T$, where $\Delta t$ controls the
fineness of the discretization.
We want the terminal variance of the final distribution to be independent of $T$, so the incremental noise scale is
defined as $\sigma = \sigma_q\sqrt{\Delta t}$, so that the accumulated variance $T\sigma^2 = \sigma_q^2$ stays fixed as $T \to \infty$ and $\Delta t \to 0$.
The following descriptions of DDPM, DDIM, and Flow-matching will use this continuous-time notation, so the diffusion process can
be written as
$$x_{t+\Delta t} = x_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma_q^2\,\Delta t),$$
or equivalently
$$x_t = x_{t-\Delta t} + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma_q^2\,\Delta t).$$
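A tiny numerical check of this scaling, showing that the accumulated noise variance is the same no matter how finely we discretize (the particular value of $\sigma_q$ and the step counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_q = 2.0

for T in (10, 100, 1000):
    dt = 1.0 / T
    # Run the forward process x_{t+dt} = x_t + N(0, sigma_q^2 * dt) from x_0 = 0.
    x = np.zeros(200_000)
    for _ in range(T):
        x += sigma_q * np.sqrt(dt) * rng.standard_normal(x.shape)
    print(f"T = {T:5d}: empirical std of x_1 ≈ {x.std():.3f} (target sigma_q = {sigma_q})")
```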
Now, let’s start the journey of DDPM, DDIM, and Flow-matching 😃.