A Visual Guide to How Diffusion Models Work

The post A Visual Guide to How Diffusion Models Work appeared first on Towards Data Science.

Feb 6, 2025 - 22:08

This article is aimed at those who want to understand exactly how diffusion models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to build visual intuition for each part of these models. I’ve kept mathematical notation and equations to a minimum, and where they are necessary I’ve tried to define and explain them as they occur.

Intro

I’ve framed this article around three main questions:

  • What exactly is it that diffusion models learn?
  • How and why do diffusion models work?
  • Once you’ve trained a model, how do you get useful stuff out of it?

The examples will be based on the glyffuser, a minimal text-to-image diffusion model that I previously implemented and wrote about. Its architecture is that of a standard text-to-image denoising diffusion model, without any bells or whistles. It was trained to generate pictures of new “Chinese” glyphs from English definitions. Have a look at the picture below: even if you’re not familiar with Chinese writing, I hope you’ll agree that the generated glyphs look pretty similar to the real ones!

Random examples of glyffuser training data (left) and generated data (right).

What exactly is it that diffusion models learn?

Generative AI models are often said to take a big pile of data and “learn” it. For text-to-image diffusion models, the data takes the form of pairs of images and descriptive text. But what exactly is it that we want the model to learn? First, let’s forget about the text for a moment and concentrate on what we are trying to generate: the images.

Probability distributions

Broadly, we can say that we want a generative AI model to learn the underlying probability distribution of the data. What does this mean? Consider the one-dimensional normal (Gaussian) distribution below, commonly written