A curated list of papers on Efficient Diffusion Models
Title | Authors | Introduction | Paper | Github |
---|---|---|---|---|
Cache Me if You Can: Accelerating Diffusion Models through Block Caching | Meta GenAI | Block caching reuses outputs from layer blocks of previous steps to speed up inference. The paper also proposes a technique to automatically determine caching schedules based on each block's changes over timesteps. | Paper | |
DeepCache: Accelerating Diffusion Models for Free | Xinyin Ma, Gongfan Fang, Xinchao Wang (National University of Singapore) | DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models: it caches and retrieves features across adjacent denoising stages, thereby curtailing redundant computations. Exploiting the structure of the U-Net, high-level features are reused while low-level features are updated cheaply. | Paper | Github |
FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models | Junhyuk So, Jungwon Lee, and Eunhyeok Park | An acceleration technique that leverages the temporal redundancy inherent in diffusion models: reusing feature maps with high temporal similarity saves computation without compromising output quality. | Paper | |
Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models | Shubham Agarwal and Subrata Mitra (Adobe Research) | Introduces an approximate-caching technique that reduces the number of iterative denoising steps by reusing intermediate noise states created during a prior image generation, and presents an end-to-end text-to-image generation system built on this idea. | Paper | |
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration | Pratheba Selvaraju, Microsoft | Fast-FORward CAching (FORA) accelerates DiT by exploiting the repetitive nature of the diffusion process: a caching mechanism stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, reducing computational overhead. It requires no retraining and integrates seamlessly with existing transformer-based diffusion models. | Paper | Github |
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching | Xinyin Ma, Xinchao Wang (National University of Singapore) | Learning-to-Cache (L2C) learns to conduct caching in a dynamic manner for diffusion transformers: by leveraging the identical structure of layers in transformers and the sequential nature of diffusion, it explores redundant computations between timesteps, treating each layer as the fundamental unit for caching. | Paper | |
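
The entries above share one idea: intermediate activations change little between adjacent denoising steps, so they can be cached and reused instead of recomputed. The snippet below is a minimal, generic sketch of that idea (hypothetical `blocks` and loop structure, not the implementation of any specific paper; real systems combine this with a noise scheduler, per-block or learned caching schedules, and partial recomputation of low-level features).

```python
import torch
import torch.nn as nn

def denoise_with_block_caching(blocks, x, num_steps=50, refresh_interval=5):
    """Sketch of step-level feature caching: fully recompute `blocks` only every
    `refresh_interval` denoising steps and reuse cached block outputs in between."""
    cache = [None] * len(blocks)
    for step in range(num_steps):
        refresh = step % refresh_interval == 0   # full forward pass on these steps
        h = x
        for i, block in enumerate(blocks):
            if refresh or cache[i] is None:
                cache[i] = block(h)              # recompute and store this block's output
            h = cache[i]                         # otherwise reuse the cached output
        x = h                                    # stand-in for the scheduler/noise update
    return x

# Toy usage with stand-in blocks (real methods cache DiT attention/MLP outputs
# or U-Net high-level features inside a proper diffusion sampling loop).
blocks = [nn.Linear(16, 16) for _ in range(4)]
x = torch.randn(1, 16)
out = denoise_with_block_caching(blocks, x, num_steps=50, refresh_interval=5)
```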