Generative Modeling

Classifier-free Guidance In Practice

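  • Train a single network to handle both conditional and unconditional generation: $f(x_t, t, c)$ and $f(x_t, t, \varnothing)$, where $\varnothing$ denotes a null (empty) condition.
  • During training, drop the condition (replace $c$ with $\varnothing$) with some probability. This is known as condition dropout.
  • At each denoising step, generate two outputs, one conditioned and one unconditioned, and combine them: $\epsilon_{\text{guided}} = (1 - w)\epsilon_{\text{unconditioned}} + w\epsilon_{\text{conditioned}}$. Setting $w = 1$ recovers the conditional prediction; $w > 1$ extrapolates past it, away from the unconditional prediction.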
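The steps in this list can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: `model`, `null_cond`, `p_drop`, and the function names are hypothetical, and the network is assumed to be any callable `model(x_t, t, cond)` that returns a noise prediction with the same shape as `x_t`.

```python
import numpy as np

def drop_condition(cond, null_cond, p_drop, rng):
    """Condition dropout (training): swap each sample's condition for the
    null embedding with probability p_drop. Names here are illustrative."""
    mask = rng.random(cond.shape[0]) < p_drop      # True -> train unconditionally
    return np.where(mask[:, None], null_cond, cond), mask

def cfg_combine(eps_uncond, eps_cond, w):
    """Guided noise estimate: (1 - w) * eps_uncond + w * eps_cond."""
    return (1.0 - w) * eps_uncond + w * eps_cond

def guided_eps(model, x_t, t, cond, null_cond, w):
    """One denoising step's guided prediction: two forward passes, one with
    the real condition and one with the null condition, then combine."""
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, t, np.broadcast_to(null_cond, cond.shape))
    return cfg_combine(eps_uncond, eps_cond, w)
```

In this sketch the two forward passes are made separately for clarity; in practice they are often batched together, which is an implementation choice and does not change the combination rule.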

Guiding a Diffusion Model with a Bad Version of Itself [1]

  • Classifier-free guidance (CFG): train both a conditional and an unconditional diffusion model, then extrapolate between the two denoiser networks with some factor $w$.
  • Maximum-likelihood training attempts to cover all training samples, including outliers, so the learned density is spread over regions where data is sparse.
  • CFG pulls samples toward higher-probability regions, which results in more natural-looking images.
  • The paper proposes autoguidance: instead of an unconditional model, extrapolate away from a poorer version of the model itself. The poor model $D_0$ is trained on the same task and data distribution, but has lower capacity, is under-trained, or both.
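As a rough sketch of the autoguidance idea, the combination below assumes the same weighted-sum form as the CFG rule, with the weak model $D_0$ taking the place of the unconditional network and both models receiving the same condition. The names `autoguide`, `d_main`, and `d_weak` are hypothetical, and the denoisers are assumed to be callables `D(x, sigma, cond)` returning an array shaped like `x`.

```python
import numpy as np

def autoguide(d_main, d_weak, x, sigma, cond, w):
    """Extrapolate from the weak model's output toward (and past) the main
    model's output: D_w = w * D_1 + (1 - w) * D_0, assumed form."""
    out_main = d_main(x, sigma, cond)   # D_1: the full-quality model
    out_weak = d_weak(x, sigma, cond)   # D_0: smaller or under-trained model
    return out_weak + w * (out_main - out_weak)
```

With $w = 1$ this returns the main model's output unchanged; with $w > 1$ it moves further in the direction in which the main model differs from the weak one.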

References