Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

ByteDance

Visual Comparison between Hyper-SD and Other Methods. From the first to the fourth column, the prompts are: (1) A dog wearing a white t-shirt, with the word “hyper” written on it ... (2) Abstract beauty, approaching perfection, pure form, golden ratio, minimalistic, unfinished, ... (3) A crystal heart laying on moss in a serene zen garden ... (4) Anthropomorphic art of a scientist stag, victorian inspired clothing by krenz cushart ..., respectively.

Real-Time Generation Demo of Hyper-SD.

Abstract


Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.

Pipeline


Hyper-SD adopts a two-stage progressive consistency distillation. In the first stage, consistency distillation is performed within two separate time segments, [0, T/2] and [T/2, T], to obtain a two-segment consistency ODE. This ODE trajectory is then used to train a global consistency model in the second stage.
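To make the segmentation concrete, here is a minimal conceptual sketch (illustrative only, not the official training code; the names num_segments and num_train_timesteps are assumed) that maps a sampled timestep to the boundary of its segment, so the student is only required to be consistent within that segment rather than along the entire trajectory.

import torch

def segment_boundary(t: torch.Tensor, num_segments: int, num_train_timesteps: int = 1000) -> torch.Tensor:
    """Map each timestep t to the lower boundary of its segment.

    With num_segments = 2 the segments are [0, T/2) and [T/2, T), so a timestep in the
    upper half is distilled towards T/2 instead of all the way to 0; with num_segments = 1
    this reduces to standard (global) consistency distillation.
    """
    seg_len = num_train_timesteps // num_segments
    return (t // seg_len) * seg_len

# Example: two-segment stage followed by the global stage.
t = torch.tensor([100, 400, 600, 900])
print(segment_boundary(t, num_segments=2))  # tensor([  0,   0, 500, 500])
print(segment_boundary(t, num_segments=1))  # tensor([0, 0, 0, 0])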

Experiment

Qualitative comparisons between Hyper-SD and other LoRA-based acceleration approaches on the SDXL architecture.

Qualitative comparisons between Hyper-SD and other LoRA-based acceleration approaches on the SD1.5 architecture.

Hyper-SD exhibits remarkable superiority over existing acceleration-focused methods and obtains higher user preference on both the SD1.5 and SDXL architectures.

Hyper-SD LoRAs trained for different step settings can be applied to different base models and consistently generate high-quality images.
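As a hedged usage sketch (not the official inference code), the snippet below loads a Hyper-SD SDXL LoRA into a standard diffusers SDXL pipeline. The Hugging Face repository id, the weight file name, and the scheduler settings are assumptions that should be checked against the released model card; any SDXL-based checkpoint can serve as the base model.

import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Assumed repo/file names for the 2-step LoRA; verify against the model card.
pipe.load_lora_weights("ByteDance/Hyper-SD", weight_name="Hyper-SDXL-2steps-lora.safetensors")
pipe.fuse_lora()

# Few-step sampling typically uses guidance_scale = 0 and "trailing" timestep spacing.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")

image = pipe(
    prompt="a crystal heart laying on moss in a serene zen garden",
    num_inference_steps=2,
    guidance_scale=0.0,
).images[0]
image.save("hyper_sdxl_2step.png")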

The unified LoRAs of Hyper-SD are compatible with ControlNet. The examples are conditioned on either scribble or canny images.
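Along the same lines, here is a hedged sketch of pairing the unified LoRA with a canny-conditioned ControlNet in diffusers. The ControlNet checkpoint, the LoRA file name, and the TCDScheduler/eta settings are assumptions to verify against the official model card; the input path is a placeholder.

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Assumed file name for the unified (any-step) LoRA.
pipe.load_lora_weights("ByteDance/Hyper-SD", weight_name="Hyper-SDXL-1step-lora.safetensors")
pipe.fuse_lora()
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

# Build a canny-edge condition image from a local file (placeholder path).
source = cv2.imread("input.png")
edges = cv2.Canny(cv2.cvtColor(source, cv2.COLOR_BGR2GRAY), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    prompt="a crystal heart laying on moss in a serene zen garden",
    image=control_image,
    num_inference_steps=4,
    guidance_scale=0.0,
    eta=1.0,  # stochasticity parameter used with TCDScheduler (assumed setting)
).images[0]
image.save("hyper_sd_controlnet.png")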

BibTeX

@misc{ren2024hypersd,
      title={Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis}, 
      author={Yuxi Ren and Xin Xia and Yanzuo Lu and Jiacheng Zhang and Jie Wu and Pan Xie and Xing Wang and Xuefeng Xiao},
      year={2024},
      eprint={2404.13686},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}