Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis


ByteDance
* Project Lead

Abstract

Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead of the multi-step inference process of Diffusion Models (DMs). Current distillation techniques generally fall into two categories: i) ODE Trajectory Preservation and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that combines the advantages of ODE Trajectory Preservation and Reformulation while maintaining near-lossless performance during step compression. First, we introduce Trajectory Segmented Consistency Distillation, which progressively performs consistency distillation within pre-defined time-step segments and thereby preserves the original ODE trajectory from a higher-order perspective. Second, we incorporate human feedback learning to boost the model's performance in the low-step regime and mitigate the performance loss incurred by distillation. Third, we integrate score distillation to further improve the model's low-step generation capability, and we offer the first attempt to use a single unified LoRA to support inference at all step counts. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, in 1-step inference Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aesthetic Score.
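The segmentation idea behind Trajectory Segmented Consistency Distillation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a discrete schedule of T = 1000 training timesteps split into equal-width segments and an assumed progressive schedule of 8, 4, 2 and finally 1 segment; the helper names segment_boundaries and segment_target are hypothetical. Within a segment, the consistency target for a sample at timestep t is taken to be the segment's lower edge; with a single segment this reduces to ordinary consistency distillation, where every t maps to 0.

```python
# Hypothetical sketch of time-step segmentation for consistency distillation.
# Assumptions (not confirmed by the page): T = 1000 discrete timesteps,
# equal-width segments, and a progressive schedule of 8 -> 4 -> 2 -> 1 segments.

def segment_boundaries(num_timesteps: int, num_segments: int) -> list[int]:
    """Return the start timestep of each segment plus the endpoint num_timesteps."""
    if num_timesteps % num_segments != 0:
        raise ValueError("num_timesteps must be divisible by num_segments")
    width = num_timesteps // num_segments
    return [i * width for i in range(num_segments + 1)]


def segment_target(t: int, num_timesteps: int, num_segments: int) -> int:
    """Timestep the consistency function at t is assumed to map to:
    the lower edge of the segment containing t. With num_segments == 1
    every t maps to 0, i.e. plain consistency distillation."""
    if not 0 <= t < num_timesteps:
        raise ValueError("t out of range")
    width = num_timesteps // num_segments
    return (t // width) * width


STAGES = [8, 4, 2, 1]  # assumed progressive reduction of the segment count

if __name__ == "__main__":
    for k in STAGES:
        # Print the segment count, its boundaries, and the target for t = 620.
        print(k, segment_boundaries(1000, k), segment_target(620, 1000, k))
```

Reducing the segment count from stage to stage means each stage asks the model to stay consistent over longer spans of the trajectory, ending with a single segment that covers the whole trajectory.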

Experiment

Qualitative comparisons between Hyper-SD and other LoRA-based acceleration approaches on SDXL architecture.

Qualitative comparisons between Hyper-SD and other LoRA-based acceleration approaches on SD1.5 architecture.

Hyper-SD shows clear superiority over existing acceleration-focused methods and obtains higher user preference on both SD1.5 and SDXL architectures.

Hyper-SD LoRAs trained for different step counts can be applied to different base models and consistently generate high-quality images.
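Applying one LoRA to several base models is mechanically possible because a LoRA stores only a low-rank weight update that is added to the frozen base weights. The NumPy sketch below assumes the common LoRA formulation W' = W + (alpha / r) B A; the helper apply_lora and all shapes are hypothetical, and whether the released Hyper-SD LoRAs use exactly this scaling is an assumption. It shows that the same update is added to two different base weight matrices of matching shape, such as a base checkpoint and a fine-tuned variant of it.

```python
import numpy as np

# Hypothetical illustration of the standard LoRA update W' = W + (alpha / r) * B @ A.
# The update depends only on A and B, so it can be added to any base weight
# of the same shape (e.g. a base checkpoint or a fine-tuned derivative).
def apply_lora(base_weight: np.ndarray, lora_a: np.ndarray, lora_b: np.ndarray,
               alpha: float, rank: int) -> np.ndarray:
    delta = (alpha / rank) * (lora_b @ lora_a)  # low-rank update, shape (out, in)
    if delta.shape != base_weight.shape:
        raise ValueError("LoRA shape does not match base weight")
    return base_weight + delta

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 6, 4, 2
lora_a = rng.standard_normal((rank, in_dim))   # down-projection A: (r, in)
lora_b = rng.standard_normal((out_dim, rank))  # up-projection B: (out, r)

base_1 = rng.standard_normal((out_dim, in_dim))  # stand-in for one base model layer
base_2 = rng.standard_normal((out_dim, in_dim))  # stand-in for another base model layer

# alpha == rank gives a scale of 1.0.
merged_1 = apply_lora(base_1, lora_a, lora_b, alpha=rank, rank=rank)
merged_2 = apply_lora(base_2, lora_a, lora_b, alpha=rank, rank=rank)

# Both models receive exactly the same weight update.
print(np.allclose(merged_1 - base_1, merged_2 - base_2))
```

Whether the merged model still produces good images depends on how close the new base model is to the one the LoRA was trained on; the arithmetic alone only guarantees that the shapes line up.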

The unified LoRAs of Hyper-SD are compatible with ControlNet. The examples are conditioned on either scribble or Canny edge images.

BibTeX

@misc{ren2024hypersd,
  title={Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis},
  author={Yuxi Ren and Xin Xia and Yanzuo Lu and Jiacheng Zhang and Jie Wu and Pan Xie and Xing Wang and Xuefeng Xiao},
  year={2024},
  eprint={2404.13686},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Real-Time Generation Demo of Hyper-SD.