Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

Pengze Zhang*1, Hubery Yin*2, Chen Li2, Xiaohua Xie†1
1Sun Yat-sen University 2Wechat, Tencent Inc.
CVPR 2024
(Highlight)
*Equal Contribution, †Corresponding Author

Existing diffusion models encounter singularities at t=0 and t=1. In particular, because sampling at t=1 is not properly handled, they suffer from the average brightness issue. To tackle this, we propose SingDiffusion, a plug-and-play method (highlighted in red), to bridge this gap.

Abstract

Most diffusion models assume that the reverse process adheres to a Gaussian distribution. However, this approximation has not been rigorously validated, especially at the singularities t=0 and t=1. Handling these singularities improperly leads to an average brightness issue in applications and limits the generation of images with extreme brightness or darkness. We tackle the singularities from both theoretical and practical perspectives. First, we establish error bounds for the reverse process approximation and show its Gaussian characteristics at the singular time steps. Based on this theoretical insight, we confirm that the singularity at t=1 is conditionally removable, while the one at t=0 is an inherent property. Building on these conclusions, we propose SingDiffusion, a novel plug-and-play method that addresses sampling at the initial singular time step. It not only resolves the average brightness issue for a wide range of diffusion models without extra training effort, but also improves their generation capability, achieving notably lower FID scores. Code and models are released at https://github.com/PangzeCheung/SingDiffusion
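To make the endpoint t=1 more concrete, the following minimal sketch looks at a discrete noise schedule in the common DDPM parameterization x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. It is not the SingDiffusion method itself. The schedule values (1000 steps, betas from 0.00085 to 0.012, linear in square-root space) are an assumption based on commonly used Stable Diffusion defaults, and the function names are hypothetical. The sketch shows two things: such a schedule leaves a small amount of signal at its last step, and recovering x_0 from a noise prediction divides by sqrt(alpha_bar_t), which diverges as alpha_bar_t approaches 0.

```python
import numpy as np


def scaled_linear_alpha_bar(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative signal coefficient alpha_bar_t for an assumed beta schedule.

    Assumption: betas are linear in sqrt-space, as in commonly used
    Stable Diffusion configurations. These values are not from the paper.
    """
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def eps_error_gain(alpha_bar_t):
    """Factor by which an error in the predicted noise is amplified in x_0.

    From x0_hat = (x_t - sqrt(1 - a) * eps_hat) / sqrt(a), an error in
    eps_hat is scaled by sqrt(1 - a) / sqrt(a), which diverges as a -> 0.
    """
    return np.sqrt(1.0 - alpha_bar_t) / np.sqrt(alpha_bar_t)


alpha_bar = scaled_linear_alpha_bar()

# Weight of the clean image x_0 that remains at the last discrete step.
terminal_signal = float(np.sqrt(alpha_bar[-1]))
print("sqrt(alpha_bar) at the last step:", terminal_signal)

# Amplification of noise-prediction error as alpha_bar_t approaches 0.
for a in (1e-2, 1e-6, 1e-12):
    print("alpha_bar =", a, "gain =", float(eps_error_gain(a)))
```

Under these assumed values the last training step is close to, but not exactly, pure noise, whereas sampling usually starts from pure Gaussian noise. Pushing alpha_bar_t all the way to 0 removes that mismatch but makes the division above singular, which is the regime at t=1 that the abstract refers to.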

Seamlessly Adapt to Different Models

Definition of Singularities


Important Theoretical Contributions


Tackling Average Brightness Issue


Comparison of Stable Diffusion models and SingDiffusion on the average brightness issue.

Improve FID and CLIP Score on 30K COCO Prompts


Comparison of Pareto curves between SingDiffusion, SD-1.5, and SD-2.0-base on 30k COCO images, across various guidance scales in [1.5, 2, 3, 4, 5, 6, 7, 8].

Seamlessly Adapt to ControlNet


SingDiffusion integrates seamlessly with ControlNet.

BibTeX

@inproceedings{zhang2024tackling,
  title={Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models},
  author={Pengze Zhang and Hubery Yin and Chen Li and Xiaohua Xie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}