Consistency models (CMs) are a cutting-edge class of diffusion-based generative models designed for fast, efficient sampling. However, most existing CMs rely on discretized timesteps, which introduces complex hyperparameter tuning and leaves the models susceptible to discretization errors. While continuous-time approaches offer a promising alternative, their adoption has been hindered by significant training instabilities.
To overcome these challenges, OpenAI researchers introduce TrigFlow, a simplified theoretical framework that unifies and refines the parameterizations of diffusion models and CMs. Using TrigFlow, they identify the key causes of training instability and address them with targeted improvements in diffusion process parameterization, network architecture, and training objectives.

Existing consistency models build upon the parameterization and diffusion process formulations outlined in EDM (Karras et al., 2022). TrigFlow bridges EDM with Flow Matching, streamlining the formulation of diffusion models, their probability flow ODEs, and CMs. This unified framework significantly reduces complexity while maintaining the expressive power of prior approaches.
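Concretely, in the paper's notation (with σ_d denoting the data standard deviation), TrigFlow defines the noising process as x_t = cos(t)·x_0 + sin(t)·z, where z ∼ N(0, σ_d²·I) and t ∈ [0, π/2]. The probability flow ODE then takes the simple form dx_t/dt = σ_d·F_θ(x_t/σ_d, t), and the consistency model is parameterized as f_θ(x_t, t) = cos(t)·x_t − sin(t)·σ_d·F_θ(x_t/σ_d, t).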
Building on this theoretical foundation, the researchers pinpointed the root causes of instability in CM training. They then proposed a comprehensive suite of strategies to address these challenges, including:
- Enhanced Time-Conditioning: Improving the temporal representation within the model.
- Adaptive Group Normalization: Refining network architecture to better handle variations in the data.
- Revised Training Objectives: Reformulating the loss function for continuous-time CMs, incorporating adaptive weighting, normalization, and progressive annealing to stabilize training dynamics (a simplified training-step sketch follows this list).
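
To make the last point concrete, here is a heavily simplified PyTorch sketch of one continuous-time consistency-training step under the TrigFlow parameterization. This is an illustrative reconstruction, not the authors' code: the toy network, the constant c, and the uniform time proposal are assumptions, and the paper's adaptive weighting and annealing schedule are omitted.

```python
import torch

sigma_d = 0.5  # data standard deviation (assumed value)

# Toy stand-in for the raw network F_theta; real sCMs use large U-Nets or transformers.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))

def F_theta(x, t):
    return net(torch.cat([x, t], dim=-1))  # crude time conditioning, for the sketch only

def f_theta(x_t, t):
    # TrigFlow consistency parameterization:
    #   f(x_t, t) = cos(t) * x_t - sin(t) * sigma_d * F_theta(x_t / sigma_d, t)
    return torch.cos(t) * x_t - torch.sin(t) * sigma_d * F_theta(x_t / sigma_d, t)

# --- one simplified consistency-training step ---
x0 = torch.randn(16, 2) * sigma_d             # toy "clean data" batch
z = torch.randn_like(x0) * sigma_d            # noise z ~ N(0, sigma_d^2 I)
t = torch.rand(16, 1) * (torch.pi / 2)        # illustrative uniform time proposal
x_t = torch.cos(t) * x0 + torch.sin(t) * z    # TrigFlow forward (noising) process

# Unbiased per-sample estimate of the PF-ODE velocity dx_t/dt (consistency training;
# distillation would instead use a pretrained teacher here).
dxdt = torch.cos(t) * z - torch.sin(t) * x0

# Total derivative df/dt along the ODE trajectory via forward-mode autodiff (JVP).
_, g = torch.func.jvp(f_theta, (x_t, t), (dxdt, torch.ones_like(t)))

# Tangent normalization: bounding the tangent's magnitude is one of the paper's
# stabilization tricks; c = 0.1 is an assumed constant.
g = g.detach()
g = g / (g.norm(dim=-1, keepdim=True) + 0.1)

# Stop-gradient target: pushing f_theta toward (f_theta - g) reproduces the
# continuous-time consistency gradient up to the (omitted) weighting.
f_val = f_theta(x_t, t)
target = (f_val - g).detach()
loss = ((f_val - target) ** 2).mean()
loss.backward()
```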

The result of these innovations is a new class of consistency models, referred to as sCMs. These models demonstrate superior performance in both consistency training and distillation, achieving state-of-the-art results across diverse datasets and model sizes. The researchers trained sCMs on datasets like CIFAR-10, ImageNet 64×64, and ImageNet 512×512, scaling up to an unprecedented 1.5 billion parameters—the largest consistency models trained to date.

sCMs deliver high-quality samples with impressive scalability and efficiency:
- Predictable Scaling: sCMs achieve better sample quality as computational resources increase, ensuring consistent improvements.
- Efficient Sampling: Using only two sampling steps, sCMs produce results competitive with state-of-the-art diffusion models that require significantly more compute (see the sampling sketch after this list).
- FID Scores: sCMs achieve FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64×64, and 1.88 on ImageNet 512×512, reducing the performance gap with leading diffusion models to within 10%.
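
The two-step sampling loop itself is simple: generate once from pure noise, re-noise the result to an intermediate time, and apply the model once more. Below is a minimal, hypothetical sketch; f_theta is a placeholder for a trained sCM, and t_mid is an assumed intermediate timestep.

```python
import math
import torch

sigma_d = 0.5  # data standard deviation (assumed value)

def f_theta(x_t, t):
    # Placeholder for a trained sCM; in practice this wraps the network as
    # f(x_t, t) = cos(t) * x_t - sin(t) * sigma_d * F_theta(x_t / sigma_d, t).
    return math.cos(t) * x_t

# Step 1: one-step generation from pure noise at t = pi / 2.
x_T = torch.randn(4, 3, 64, 64) * sigma_d
x0 = f_theta(x_T, math.pi / 2)

# Step 2: re-noise to an intermediate time, then apply the model once more.
t_mid = 1.1  # hypothetical intermediate timestep in (0, pi / 2)
z = torch.randn_like(x0) * sigma_d
x_mid = math.cos(t_mid) * x0 + math.sin(t_mid) * z
sample = f_theta(x_mid, t_mid)
```
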
sCMs also exhibit notable advantages over variational score distillation (VSD). While VSD struggles at high guidance levels, sCMs produce more diverse samples and are more compatible with guided sampling techniques, further establishing them as a robust alternative for a range of generative modeling tasks.
Overall, OpenAI’s TrigFlow framework and the resulting sCMs mark a significant milestone in the evolution of generative modeling. By addressing training instability and scaling challenges, the researchers have unlocked the potential of continuous-time consistency models, setting new benchmarks in sample quality, scalability, and efficiency. With their streamlined two-step generation process and groundbreaking performance metrics, sCMs pave the way for future innovations in diffusion-based generative modeling.
The paper Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models is on arXiv.
Author: Hecate He | Editor: Chain Zhang
The post “Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step Sampling first appeared on Synced.