- Diffusion Inversion
- Text-Guided Editing
- Continual Learning
- Remove Concept
- New Concept Learning
- T2I Augmentation
- Spatial Control
- Image Translation
- Seg & Detect & Track
- Adding Conditions
- Few-Shot
- Inpainting
- Doc Layout
- Text Generation
- Super Resolution
- Drag Edit
- Video Generation
- Video Editing
- Virtual Try On
⭐⭐⭐Null-text Inversion for Editing Real Images using Guided Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Code]
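Most entries in this section build on deterministic DDIM inversion. As a quick reference (standard DDIM notation; $\bar\alpha_t$ is the cumulative noise schedule and $\epsilon_\theta$ the noise predictor), the deterministic sampling step and its reverse-time inversion are:

```latex
% Deterministic DDIM sampling step (t -> t-1), via the predicted clean image:
\hat{x}_0(x_t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}},
\qquad
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0(x_t) + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t, t)

% Inversion runs the same update in reverse time, under the approximation
% \epsilon_\theta(x_{t+1}, t+1) \approx \epsilon_\theta(x_t, t):
x_{t+1} = \sqrt{\bar\alpha_{t+1}}\,\hat{x}_0(x_t) + \sqrt{1-\bar\alpha_{t+1}}\,\epsilon_\theta(x_t, t)
```

The methods in this section largely differ in how they correct the error this approximation accumulates, especially under classifier-free guidance.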
⭐⭐Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
[Website]
[Code]
⭐Inversion-Based Creativity Transfer with Diffusion Models
[CVPR 2023]
[Website]
[Code]
⭐EDICT: Exact Diffusion Inversion via Coupled Transformations
[CVPR 2023]
[Website]
[Code]
⭐Improving Negative-Prompt Inversion via Proximal Guidance
[Website]
[Code]
IterInv: Iterative Inversion for Pixel-Level T2I Models
[NeurIPS 2023 workshop]
[Openreview]
[NeurIPS Workshop on Diffusion Models]
[Website]
[Code]
Object-aware Inversion and Reassembly for Image Editing
[Website]
[Project]
[Code]
Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
[ICLR 2024]
[Code]
Generating Non-Stationary Textures using Self-Rectification
[Website]
[Code]
Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
[Website]
[Code]
Score-Based Diffusion Models as Principled Priors for Inverse Imaging
[ICCV 2023]
[Website]
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models
[Website]
Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models
[Website]
Fixed-point Inversion for Text-to-image diffusion models
[Website]
⭐⭐⭐Prompt-to-Prompt Image Editing with Cross Attention Control
[ICLR 2023]
[Website]
[Project]
[Code]
[Replicate Demo]
⭐⭐⭐Zero-shot Image-to-Image Translation
[SIGGRAPH 2023]
[Project]
[Code]
[Replicate Demo]
[Diffusers Doc]
[Diffusers Code]
⭐⭐⭐Null-text Inversion for Editing Real Images using Guided Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Code]
⭐⭐InstructPix2Pix: Learning to Follow Image Editing Instructions
[CVPR 2023 (Highlight)]
[Website]
[Project]
[Diffusers Doc]
[Diffusers Code]
[Official Code]
[Dataset]
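InstructPix2Pix is integrated in Diffusers (see the links above); a minimal usage sketch, with a placeholder input URL:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://example.com/input.png")  # placeholder input image
edited = pipe(
    "make it look like a watercolor painting",  # an edit instruction, not a caption
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values stay closer to the input image
).images[0]
edited.save("edited.png")
```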
⭐⭐Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
[CVPR 2023]
[Website]
[Project]
[Code]
[Dataset]
[Replicate Demo]
[Demo]
⭐DiffEdit: Diffusion-based semantic image editing with mask guidance
[ICLR 2023]
[Website]
[Unofficial Code]
[Diffusers Doc]
[Diffusers Code]
⭐Imagic: Text-Based Real Image Editing with Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Diffusers]
⭐Inpaint Anything: Segment Anything Meets Image Inpainting
[Website]
[Code 1]
[Code 2]
⭐Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
[Website]
[Code]
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
[ICCV 2023]
[Website]
[Project]
[Code]
[Demo]
Collaborative Score Distillation for Consistent Visual Synthesis
[NeurIPS 2023]
[Website]
[Project]
[Code]
Visual Instruction Inversion: Image Editing via Visual Prompting
[NeurIPS 2023]
[Website]
[Project]
[Code]
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
[NeurIPS 2023]
[Website]
[Code]
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
[NeurIPS 2023]
[Website]
[Code]
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
[Website]
[Code1]
[Code2]
[Diffusers Code]
PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models
[Website]
[Project]
[Code]
[Demo]
An Edit Friendly DDPM Noise Space: Inversion and Manipulations
[Website]
[Project]
[Code]
[Demo]
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
[Website]
[Project]
[Code]
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
[Website]
[Project]
[Code]
Text-Driven Image Editing via Learnable Regions
[Website]
[Project]
[Code]
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
[Website]
[Project]
[Code]
MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
[Website]
[Project]
[Code]
HIVE: Harnessing Human Feedback for Instructional Visual Editing
[Website]
[Project]
[Code]
FaceStudio: Put Your Face Everywhere in Seconds
[Website]
[Project]
[Code]
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
[Website]
[Project]
[Code]
Inversion-Free Image Editing with Natural Language
[Website]
[Project]
[Code]
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
[Website]
[Project]
[Code]
MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond
[Website]
[Project]
[Code]
Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
[Website]
[Project]
[Code]
UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
[SIGGRAPH 2023]
[Code]
Learning to Follow Object-Centric Image Editing Instructions Faithfully
[EMNLP 2023]
[Code]
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
[Website]
[Code]
Differential Diffusion: Giving Each Pixel Its Strength
[Website]
[Code]
Region-Aware Diffusion for Zero-shot Text-driven Image Editing
[Website]
[Code]
Forgedit: Text Guided Image Editing via Learning and Forgetting
[Website]
[Code]
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
[Website]
[Code]
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
[Website]
[Code]
Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance
[Website]
[Code]
SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing
[Website]
[Code]
Conditional Score Guidance for Text-Driven Image-to-Image Translation
[NeurIPS 2023]
[Website]
LIME: Localized Image Editing via Attention Regularization in Diffusion Models
[Website]
[Project]
Watch Your Steps: Local Image and Scene Editing by Text Instructions
[Website]
[Project]
Delta Denoising Score
[Website]
[Project]
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
[Website]
[Project]
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
[Website]
[Project]
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
[Website]
[Project]
Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
[ICCV 2023 Oral]
[Website]
Iterative Multi-granular Image Editing using Diffusion Models
[WACV 2024]
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
[AAAI 2024]
BARET: Balanced Attention based Real image Editing driven by Target-text Inversion
[WACV 2024]
Text-to-image Editing by Image Information Removal
[WACV 2024]
Face Aging via Diffusion-based Editing
[BMVC 2023]
Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
[CVPR 2023 AI4CC Workshop]
Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
[ICASSP 2024]
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
[Website]
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
[Website]
iEdit: Localised Text-guided Image Editing with Weak Supervision
[Website]
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
[Website]
KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing
[Website]
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques
[Website]
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
[Website]
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
[Website]
LEDITS++: Limitless Image Editing using Text-to-Image Models
[Website]
PRedItOR: Text Guided Image Editing with Diffusion Prior
[Website]
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
[Website]
FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
[Website]
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
[Website]
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
[Website]
ZONE: Zero-Shot Instruction-Guided Local Editing
[Website]
Image Translation as Diffusion Visual Programmers
[Website]
Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing
[Website]
RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Code]
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website]
[Code]
Continual Learning of Diffusion Models with Generative Distillation
[Website]
[Code]
Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning
[Website]
[Code]
Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
[Website]
[Project]
Class-Incremental Learning using Diffusion Model for Distillation and Replay
[ICCV 2023 VCL workshop best paper]
Create Your World: Lifelong Text-to-Image Diffusion
[Website]
Exploring Continual Learning of Diffusion Models
[Website]
DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
[Website]
DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation
[Website]
Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
[Website]
Ablating Concepts in Text-to-Image Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
Erasing Concepts from Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
Inst-Inpaint: Instructing to Remove Objects with Diffusion Models
[Website]
[Project]
[Code]
[Demo]
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
[Website]
[Project]
[Code]
Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
[ICML 2023 workshop]
[Code]
Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
[Website]
[Code]
Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website]
[Code]
Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models
[Website]
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
[Website]
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
[Website]
EraseDiff: Erasing Data Influence in Diffusion Models
[Website]
⭐⭐⭐DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
[CVPR 2023 Honorable Mention]
[Website]
[Project]
[Official Dataset]
[Unofficial Code]
[Diffusers Doc]
[Diffusers Code]
⭐⭐⭐An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
[ICLR 2023 top-25%]
[Website]
[Diffusers Doc]
[Diffusers Code]
[Code]
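Embeddings learned with Textual Inversion can be loaded into a Diffusers pipeline in one line; a minimal sketch using the public `sd-concepts-library/cat-toy` example concept, whose placeholder token is `<cat-toy>`:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a learned concept embedding and register its placeholder token.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("a <cat-toy> sitting on a beach", num_inference_steps=30).images[0]
image.save("concept.png")
```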
⭐⭐Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion
[CVPR 2023]
[Website]
[Project]
[Diffusers Doc]
[Diffusers Code]
[Code]
⭐⭐ReVersion: Diffusion-Based Relation Inversion from Images
[Website]
[Project]
[Code]
⭐SINE: SINgle Image Editing with Text-to-Image Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Code]
⭐Break-A-Scene: Extracting Multiple Concepts from a Single Image
[SIGGRAPH Asia 2023]
[Project]
[Code]
⭐Concept Decomposition for Visual Exploration and Inspiration
[SIGGRAPH Asia 2023]
[Project]
[Code]
Cones: Concept Neurons in Diffusion Models for Customized Generation
[ICML 2023 Oral]
[Website]
[Code]
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
[NeurIPS 2023]
[Website]
[Project]
[Code]
Inserting Anybody in Diffusion Models via Celeb Basis
[NeurIPS 2023]
[Website]
[Project]
[Code]
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
[NeurIPS 2023]
[Website]
[Project]
[Code]
Photoswap: Personalized Subject Swapping in Images
[NeurIPS 2023]
[Website]
[Project]
[Code]
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
[NeurIPS 2023]
[Website]
[Project]
[Code]
ITI-GEN: Inclusive Text-to-Image Generation
[ICCV 2023 Oral]
[Website]
[Project]
[Code]
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
[ICCV 2023]
[Website]
[Project]
[Code]
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
[ICCV 2023 Oral]
[Website]
[Code]
A Neural Space-Time Representation for Text-to-Image Personalization
[SIGGRAPH Asia 2023]
[Project]
[Code]
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
[SIGGRAPH 2023]
[Project]
[Code]
Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation
[NeurIPS 2023]
[Website]
[Code]
Material Palette: Extraction of Materials from a Single Image
[Website]
[Project]
[Code]
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
[AAAI 2024]
[Project]
[Code]
StyleDrop: Text-to-Image Generation in Any Style
[Website]
[Project]
[Code]
Style Aligned Image Generation via Shared Attention
[Website]
[Project]
[Code]
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
[Website]
[Project]
[Code]
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
[Website]
[Project]
[Code]
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[Website]
[Project]
[Code]
Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion
[Website]
[Project]
[Code]
DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning
[Website]
[Project]
[Code]
The Hidden Language of Diffusion Models
[Website]
[Project]
[Code]
SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing
[Website]
[Project]
[Code]
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
[Website]
[Project]
[Code]
InstantID: Zero-shot Identity-Preserving Generation in Seconds
[Website]
[Project]
[Code]
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
[Website]
[Project]
[Code]
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
[Website]
[Project]
[Code]
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
CapHuman: Capture Your Moments in Parallel Universes
[Website]
[Project]
[Code]
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
[Website]
[Project]
[Code]
Learning Continuous 3D Words for Text-to-Image Generation
[Website]
[Project]
[Code]
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
[Website]
[Project]
[Code]
ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation
[SIGGRAPH Asia 2023]
[Code]
Multiresolution Textual Inversion
[NeurIPS 2022 workshop]
[Code]
Compositional Inversion for Stable Diffusion Models
[AAAI 2024]
[Code]
Cross Initialization for Personalized Text-to-Image Generation
[Website]
[Code]
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
[Website]
[Code]
SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
[Website]
[Code]
ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation
[Website]
[Code]
AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image
[Website]
[Code]
A Closer Look at Parameter-Efficient Tuning in Diffusion Models
[Website]
[Code]
Controllable Textual Inversion for Personalized Text-to-Image Generation
[Website]
[Code]
Cross-domain Compositing with Pretrained Diffusion Models
[Website]
[Code]
Concept-centric Personalization with Large-scale Diffusion Priors
[Website]
[Code]
Customization Assistant for Text-to-image Generation
[Website]
[Code]
High-fidelity Person-centric Subject-to-Image Synthesis
[Website]
[Code]
Key-Locked Rank One Editing for Text-to-Image Personalization
[SIGGRAPH 2023]
[Project]
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
[Website]
[Project]
Subject-driven Text-to-Image Generation via Apprenticeship Learning
[Website]
[Project]
Orthogonal Adaptation for Modular Customization of Diffusion Models
[Website]
[Project]
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
[Website]
[Project]
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
[Website]
[Project]
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
[Website]
[Project]
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
[Website]
[Project]
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
[Website]
[Project]
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
[Website]
[Project]
Total Selfie: Generating Full-Body Selfies
[Website]
[Project]
DreamTuner: Single Image is Enough for Subject-Driven Generation
[Website]
[Project]
PALP: Prompt Aligned Personalization of Text-to-Image Models
[Website]
[Project]
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
[Website]
[Project]
Language-Informed Visual Concept Learning
[ICLR 2024]
[Project]
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
[AAAI 2024]
Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework
[Website]
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
[Website]
DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation
[Website]
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
[Website]
Gradient-Free Textual Inversion
[Website]
Identity Encoder for Personalized Diffusion
[Website]
Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation
[Website]
ELODIN: Naming Concepts in Embedding Spaces
[Website]
Cones 2: Customizable Image Synthesis with Multiple Subjects
[Website]
Generate Anything Anywhere in Any Scene
[Website]
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
[Website]
Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
[Website]
MagiCapture: High-Resolution Multi-Concept Portrait Customization
[Website]
A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
[Website]
DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
[Website]
An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis
[Website]
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
[Website]
Memory-Efficient Personalization using Quantized Diffusion Model
[Website]
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
[Website]
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization
[Website]
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
[Website]
StableIdentity: Inserting Anybody into Anywhere at First Sight
[Website]
SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation
[Website]
⭐⭐⭐Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
[SIGGRAPH 2023]
[Project]
[Official Code]
[Diffusers Code]
[Diffusers Doc]
[Replicate Demo]
SEGA: Instructing Diffusion using Semantic Dimensions
[NeurIPS 2023]
[Website]
[Code]
[Diffusers Code]
[Diffusers Doc]
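SEGA ships in Diffusers as `SemanticStableDiffusionPipeline` (see the links above); a minimal sketch of steering generation along a semantic direction, with illustrative argument values:

```python
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

out = pipe(
    prompt="a photo of a castle",
    editing_prompt=["snowy winter"],    # semantic direction to add
    reverse_editing_direction=[False],  # True would instead remove the concept
    edit_guidance_scale=[5.0],          # strength of the semantic edit
)
image = out.images[0]
```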
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
[ICCV 2023]
[Website]
[Project]
[Official Code]
[Diffusers Doc]
[Diffusers Code]
Expressive Text-to-Image Generation with Rich Text
[ICCV 2023]
[Website]
[Project]
[Code]
[Demo]
Editing Implicit Assumptions in Text-to-Image Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
[Demo]
MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
Discriminative Class Tokens for Text-to-Image Diffusion Models
[ICCV 2023]
[Website]
[Project]
[Code]
Compositional Visual Generation with Composable Diffusion Models
[ECCV 2022]
[Website]
[Project]
[Code]
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
[NeurIPS 2023]
[Website]
[Code]
Diffusion Self-Guidance for Controllable Image Generation
[NeurIPS 2023]
[Website]
[Project]
[Code]
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
[NeurIPS 2023]
[Website]
[Code]
Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
[NeurIPS 2023]
[Website]
[Code]
Divide & Bind Your Attention for Improved Generative Semantic Nursing
[BMVC 2023 Oral]
[Project]
[Code]
Real-World Image Variation by Aligning Diffusion Inversion Chain
[Website]
[Project]
[Code]
FreeU: Free Lunch in Diffusion U-Net
[Website]
[Project]
[Code]
ConceptLab: Creative Generation using Diffusion Prior Constraints
[Website]
[Project]
[Code]
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
[Website]
[Project]
[Code]
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
[Website]
[Project]
[Code]
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
[Website]
[Project]
[Code]
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
[Website]
[Project]
[Code]
TokenCompose: Grounding Diffusion with Token-level Supervision
[Website]
[Project]
[Code]
DiffusionGPT: LLM-Driven Text-to-Image Generation System
[Website]
[Project]
[Code]
Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
[Website]
[Project]
[Code]
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
[Website]
[Project]
[Code]
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
[ACM MM 2023 Oral]
[Code]
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
[ICLR 2024]
[Code]
ORES: Open-vocabulary Responsible Visual Synthesis
[Website]
[Code]
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
[Website]
[Code]
Detector Guidance for Multi-Object Text-to-Image Generation
[Website]
[Code]
Designing a Better Asymmetric VQGAN for StableDiffusion
[Website]
[Code]
FABRIC: Personalizing Diffusion Models with Iterative Feedback
[Website]
[Code]
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
[Website]
[Code]
Progressive Text-to-Image Diffusion with Soft Latent Direction
[Website]
[Code]
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
[Website]
[Code]
If at First You Don’t Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
[Website]
[Code]
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
[Website]
[Code]
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
[Website]
[Code]
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
[Website]
[Code]
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
[Website]
[Code]
It is all about where you start: Text-to-image generation with seed selection
[Website]
[Code]
End-to-End Diffusion Latent Optimization Improves Classifier Guidance
[Website]
[Code]
Correcting Diffusion Generation through Resampling
[Website]
[Code]
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
[Website]
[Code]
Semantic Guidance Tuning for Text-To-Image Diffusion Models
[Website]
[Project]
Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation
[Website]
[Project]
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
[Website]
[Project]
Norm-guided latent space exploration for text-to-image generation
[NeurIPS 2023]
[Website]
Improving Diffusion-Based Image Synthesis with Context Prediction
[NeurIPS 2023]
[Website]
Instruct-Imagen: Image Generation with Multi-modal Instruction
[Website]
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
[Website]
MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
[Website]
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
[Website]
Text2Layer: Layered Image Generation using Latent Diffusion Model
[Website]
Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
[Website]
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
[Website]
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
[Website]
Improving Compositional Text-to-image Generation with Large Vision-Language Models
[Website]
Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else
[Website]
Unseen Image Synthesis with Diffusion Models
[Website]
AnyLens: A Generative Diffusion Model with Any Rendering Lens
[Website]
Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering
[Website]
Text2Street: Controllable Text-to-image Generation for Street Views
[Website]
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
[ICML 2023]
[Website]
[Project]
[Code]
[Diffusers Code]
[Diffusers Doc]
[Replicate Demo]
SceneComposer: Any-Level Semantic Image Synthesis
[CVPR 2023 Highlight]
[Website]
[Project]
[Code]
GLIGEN: Open-Set Grounded Text-to-Image Generation
[CVPR 2023]
[Website]
[Code]
[Demo]
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
[ICLR 2023]
[Website]
[Project]
[Code]
Visual Programming for Text-to-Image Generation and Evaluation
[NeurIPS 2023]
[Website]
[Project]
[Code]
ReCo: Region-Controlled Text-to-Image Generation
[CVPR 2023]
[Website]
[Code]
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
[ICCV 2023]
[Website]
[Code]
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
[ICCV 2023]
[Website]
[Code]
Dense Text-to-Image Generation with Attention Modulation
[ICCV 2023]
[Website]
[Code]
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
[Website]
[Project]
[Code]
[Demo]
[Blog]
Training-Free Layout Control with Cross-Attention Guidance
[Website]
[Project]
[Code]
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
[Website]
[Project]
[Code]
Grounded Text-to-Image Synthesis with Attention Refocusing
[Website]
[Project]
[Code]
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
[Website]
[Project]
[Code]
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
[Website]
[Project]
[Code]
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
[Website]
[Project]
[Code]
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
[Website]
[Project]
[Code]
InstanceDiffusion: Instance-level Control for Image Generation
[Website]
[Project]
[Code]
Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation
[Website]
[Code]
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
[Website]
[Code]
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
[Website]
[Project]
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
[Website]
[Project]
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
[Website]
[Project]
Guided Image Synthesis via Initial Image Editing in Diffusion Model
[ACM MM 2023]
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
[Website]
Controllable Text-to-Image Generation with GPT-4
[Website]
Localized Text-to-Image Generation for Free via Cross Attention Control
[Website]
Training-Free Location-Aware Text-to-Image Synthesis
[Website]
Composite Diffusion | whole >= Σ parts
[Website]
Continuous Layout Editing of Single Images with Diffusion Models
[Website]
Zero-shot spatial layout conditioning for text-to-image diffusion models
[Website]
Enhancing Object Coherence in Layout-to-Image Synthesis
[Website]
LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis
[Website]
Self-correcting LLM-controlled Diffusion Models
[Website]
Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis
[Website]
Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
[Website]
Spatial-Aware Latent Initialization for Controllable Image Generation
[Website]
⭐⭐⭐SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
[ICLR 2022]
[Website]
[Project]
[Code]
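Diffusers' image-to-image pipeline follows the SDEdit recipe (noise the input partway along the diffusion, then denoise toward the prompt); a minimal sketch, with a placeholder input URL:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://example.com/sketch.png")  # placeholder guide image
# strength sets how far the input is noised before denoising:
# higher = more faithful to the prompt, lower = more faithful to the input.
image = pipe("a fantasy landscape, oil painting",
             image=init_image, strength=0.6).images[0]
```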
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
[NeurIPS 2023]
[Website]
[Project]
[Code]
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
[CVPR 2022]
[Website]
[Code]
Diffusion-based Image Translation using Disentangled Style and Content Representation
[ICLR 2023]
[Website]
[Code]
FlexIT: Towards Flexible Semantic Image Translation
[CVPR 2022]
[Website]
[Code]
Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
[ICCV 2023]
[Website]
[Code]
Cross-Image Attention for Zero-Shot Appearance Transfer
[Website]
[Project]
[Code]
Diffusion Guided Domain Adaptation of Image Generators
[Website]
[Project]
[Code]
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
[Website]
[Project]
[Code]
Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance
[Website]
[Code]
GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis
[Website]
[Code]
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
[Website]
[Code]
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
[Website]
[Project]
StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
[ICCV 2023]
[Website]
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors
[ACM MM 2023]
High-Fidelity Diffusion-based Image Editing
[AAAI 2024]
E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
[Website]
UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators
[Website]
ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
[CVPR 2023 Highlight]
[Project]
[Code]
[Demo]
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
[ICCV 2023]
[Website]
[Project]
[Code]
Stochastic Segmentation with Conditional Categorical Diffusion Models
[ICCV 2023]
[Website]
[Code]
DDP: Diffusion Model for Dense Visual Prediction
[ICCV 2023]
[Website]
[Code]
DiffusionDet: Diffusion Model for Object Detection
[ICCV 2023]
[Website]
[Code]
OVTrack: Open-Vocabulary Multiple Object Tracking
[CVPR 2023]
[Website]
[Project]
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
[NeurIPS 2023]
[Website]
[Code]
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
[Website]
[Project]
[Code]
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
[Website]
[Project]
[Code]
Personalize Segment Anything Model with One Shot
[Website]
[Code]
DiffusionTrack: Diffusion Model For Multi-Object Tracking
[Website]
[Code]
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
[Website]
[Code]
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
[Website]
[Code]
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
[Website]
[Code]
UniGS: Unified Representation for Image Generation and Segmentation
[Website]
[Code]
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
[ICLR 2024]
[Website]
[Project]
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
[Website]
[Project]
Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
[ICCV 2023]
[Website]
Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation
[WACV 2024]
SLiMe: Segment Like Me
[Website]
MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation
[Website]
DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
[Website]
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
[Website]
Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter
[Website]
Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion
[Website]
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
[Website]
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation
[Website]
Patch-based Selection and Refinement for Early Object Detection
[Website]
TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
[Website]
Towards Granularity-adjusted Pixel-level Semantic Annotation
[Website]
Gen2Det: Generate to Detect
[Website]
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
[Website]
⭐⭐⭐Adding Conditional Control to Text-to-Image Diffusion Models
[ICCV 2023 best paper]
[Website]
[Official Code]
[Diffusers Doc]
[Diffusers Code]
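ControlNet is integrated in Diffusers (see the links above); a minimal sketch with a Canny-edge condition, using a placeholder edge-map URL (other conditions such as pose or depth just swap the ControlNet checkpoint):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

canny_image = load_image("https://example.com/edges.png")  # placeholder edge map
image = pipe("a futuristic city at night", image=canny_image).images[0]
```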
SketchKnitter: Vectorized Sketch Generation with Diffusion Models
[ICLR 2023 Spotlight]
[Website]
[Code]
Freestyle Layout-to-Image Synthesis
[CVPR 2023 highlight]
[Website]
[Project]
[Code]
Collaborative Diffusion for Multi-Modal Face Generation and Editing
[CVPR 2023]
[Website]
[Project]
[Code]
HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
[ICCV 2023]
[Website]
[Project]
[Code]
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
[ICCV 2023]
[Website]
[Code]
Sketch-Guided Text-to-Image Diffusion Models
[SIGGRAPH 2023]
[Project]
[Code]
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
[ICLR 2024]
[Project]
[Code]
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
[Website]
[Project]
[Code]
Late-Constraint Diffusion Guidance for Controllable Image Synthesis
[Website]
[Project]
[Code]
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
Composer: Creative and controllable image synthesis with composable conditions
[Website]
[Project]
[Code]
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
[Website]
[Project]
[Code]
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
[Website]
[Project]
[Code]
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
LooseControl: Lifting ControlNet for Generalized Depth Conditioning
[Website]
[Project]
[Code]
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
[Website]
[Project]
[Code]
ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models
[Website]
[Project]
[Code]
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet
[Website]
[Project]
[Code]
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
[ICLR 2024]
[Code]
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
[Website]
[Code]
Universal Guidance for Diffusion Models
[Website]
[Code]
Meta ControlNet: Enhancing Task Adaptation via Meta Learning
[Website]
[Code]
Local Conditional Controlling for Text-to-Image Diffusion Models
[Website]
[Code]
Modulating Pretrained Diffusion Models for Multimodal Image Synthesis
[SIGGRAPH 2023]
[Project]
SpaText: Spatio-Textual Representation for Controllable Image Generation
[CVPR 2023]
[Project]
CCM: Adding Conditional Controls to Text-to-Image Consistency Models
[Website]
[Project]
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
[Website]
[Project]
Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
[Website]
[Project]
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
[Website]
[Project]
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation
[Website]
Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
[Website]
Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt
[Website]
Adding 3D Geometry Control to Diffusion Models
[Website]
LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation
[Website]
JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling
[Website]
Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons
[Website]
Discriminative Diffusion Models as Few-shot Vision and Language Learners
[Website]
[Code]
Few-Shot Diffusion Models
[Website]
[Code]
Few-shot Semantic Image Synthesis with Class Affinity Transfer
[CVPR 2023]
[Website]
DiffAlign: Few-shot learning using diffusion based synthesis and alignment
[Website]
Few-shot Image Generation with Diffusion Models
[Website]
Lafite2: Few-shot Text-to-Image Generation
[Website]
Paint by Example: Exemplar-based Image Editing with Diffusion Models
[CVPR 2023]
[Website]
[Code]
[Diffusers Doc]
[Diffusers Code]
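Paint by Example is available in Diffusers as `PaintByExamplePipeline`; a minimal sketch (all three input URLs are placeholders):

```python
import torch
from diffusers import PaintByExamplePipeline
from diffusers.utils import load_image

pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://example.com/scene.png")        # image to edit
mask_image = load_image("https://example.com/mask.png")         # white = region to fill
example_image = load_image("https://example.com/exemplar.png")  # reference object

result = pipe(image=init_image, mask_image=mask_image,
              example_image=example_image).images[0]
```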
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
[ICML 2022 Spotlight]
[Website]
[Code]
Blended Diffusion for Text-driven Editing of Natural Images
[CVPR 2022]
[Website]
[Project]
[Code]
Blended Latent Diffusion
[SIGGRAPH 2023]
[Project]
[Code]
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
[ICCV 2023]
[Website]
[Project]
[Code]
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
[CVPR 2023]
[Website]
[Code]
Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
[ICML 2023]
[Website]
[Code]
AnyDoor: Zero-shot Object-level Image Customization
[Website]
[Project]
[Code]
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
[Website]
[Project]
[Code]
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
[Website]
[Project]
[Code]
360-Degree Panorama Generation from Few Unregistered NFoV Images
[ACM MM 2023]
[Code]
Delving Globally into Texture and Structure for Image Inpainting
[ACM MM 2022]
[Code]
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
[Website]
[Code]
Image Inpainting via Iteratively Decoupled Probabilistic Modeling
[Website]
[Code]
ControlCom: Controllable Image Composition using Diffusion Model
[Website]
[Code]
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model
[Website]
[Code]
MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models
[Website]
[Code]
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
[Website]
[Code]
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
[Website]
[Project]
Towards Stable and Faithful Inpainting
[Website]
[Project]
Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention
[Website]
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
[Website]
Gradpaint: Gradient-Guided Inpainting with Diffusion Models
[Website]
Infusion: Internal Diffusion for Video Inpainting
[Website]
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
[CVPR 2023]
[Website]
[Project]
[Code]
DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
[ICCV 2023]
[Website]
[Code]
LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
[ICCV 2023]
[Website]
[Code]
LayoutDM: Transformer-based Diffusion Model for Layout Generation
[CVPR 2023]
[Website]
Unifying Layout Generation with a Decoupled Diffusion Model
[CVPR 2023]
[Website]
PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
[ICML 2023]
[Website]
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
[ICLR 2024]
Diffusion-based Document Layout Generation
[Website]
Dolfin: Diffusion Layout Transformers without Autoencoder
[Website]
TextDiffuser: Diffusion Models as Text Painters
[NeurIPS 2023]
[Website]
[Project]
[Code]
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
[Website]
[Code]
GlyphControl: Glyph Conditional Control for Visual Text Generation
[NeurIPS 2023]
[Website]
[Code]
DiffUTE: Universal Text Editing Diffusion Model
[NeurIPS 2023]
[Website]
[Code]
Word-As-Image for Semantic Typography
[SIGGRAPH 2023]
[Project]
[Code]
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
[Website]
[Project]
[Code]
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
[AAAI 2024]
[Code]
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
[AAAI 2024]
[Code]
Text Image Inpainting via Global Structure-Guided Diffusion Models
[AAAI 2024]
[Code]
Ambigram generation by a diffusion model
[ICDAR 2023]
[Code]
AnyText: Multilingual Visual Text Generation And Editing
[Website]
[Code]
AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model
[Website]
[Project]
UniVG: Towards UNIfied-modal Video Generation
[Website]
[Project]
Scene Text Image Super-resolution based on Text-conditional Diffusion Models
[WACV 2024]
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
[WACV 2024]
VecFusion: Vector Font Generation with Diffusion
[Website]
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
[NeurIPS 2023 spotlight]
[Website]
[Project]
[Code]
Image Super-Resolution via Iterative Refinement
[TPAMI]
[Website]
[Project]
[Code]
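The same idea of conditioning the denoiser on the low-resolution input is exposed in Diffusers' Stable Diffusion x4 upscaler; a minimal sketch (not the SR3 implementation; the input URL is a placeholder):

```python
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = load_image("https://example.com/low_res.png")  # placeholder low-res input
upscaled = pipe(prompt="a white cat", image=low_res).images[0]  # 4x super-resolution
```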
DiffIR: Efficient Diffusion Model for Image Restoration
[ICCV 2023]
[Website]
[Code]
Exploiting Diffusion Prior for Real-World Image Super-Resolution
[Website]
[Project]
[Code]
Iterative Token Evaluation and Refinement for Real-World Super-Resolution
[AAAI 2024]
[Code]
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
[Website]
[Code]
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
[Website]
[Code]
DSR-Diff: Depth Map Super-Resolution with Diffusion Model
[Website]
[Code]
HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
[ICCV 2023]
[Website]
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
[Website]
Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
[Website]
Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models
[Website]
YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution
[Website]
Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model
[Website]
Image Super-Resolution with Text Prompt Diffusion
[Website]
DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution
[Website]
DREAM: Diffusion Rectification and Estimation-Adaptive Models
[Website]
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
[Website]
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
[ICLR 2024]
[Website]
[Project]
[Code]
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
[SIGGRAPH 2023]
[Project]
[Code]
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
[Website]
[Project]
[Code]
Repositioning the Subject within Image
[Website]
[Project]
[Code]
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
[Website]
[Code]
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
[Website]
[Code]
DragVideo: Interactive Drag-style Video Editing
[Website]
[Code]
RotationDrag: Point-based Image Editing with Rotated Diffusion Features
[Website]
[Code]
Readout Guidance: Learning Control from Diffusion Features
[Website]
[Project]
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
[Website]
[Project]
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
[ICCV 2023 Oral]
[Website]
[Project]
[Code]
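Text2Video-Zero is available in Diffusers as `TextToVideoZeroPipeline`; a minimal sketch following the Diffusers example (frames come from a frozen text-to-image checkpoint with cross-frame attention; no video training is involved):

```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frames = pipe(prompt="a panda is surfing a wave").images  # list of float arrays
imageio.mimsave("video.mp4", [(f * 255).astype("uint8") for f in frames], fps=4)
```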
SinFusion: Training Diffusion Models on a Single Image or Video
[ICML 2023]
[Website]
[Project]
[Code]
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
[CVPR 2023]
[Website]
[Project]
[Code]
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
[NeurIPS 2022]
[Website]
[Project]
[Code]
GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
[NeurIPS 2023]
[Website]
[Code]
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
[NeurIPS 2023]
[Website]
[Code]
Conditional Image-to-Video Generation with Latent Flow Diffusion Models
[CVPR 2023]
[Website]
[Code]
Video Diffusion Models
[ICLR 2022 workshop]
[Website]
[Code]
[Project]
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
[Website]
[Diffusers Doc]
[Project]
[Code]
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
[Website]
[Project]
[Code]
MagicAvatar: Multimodal Avatar Generation and Animation
[Website]
[Project]
[Code]
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
[Website]
[Project]
[Code]
Breathing Life Into Sketches Using Text-to-Video Priors
[Website]
[Project]
[Code]
Latent Video Diffusion Models for High-Fidelity Long Video Generation
[Website]
[Project]
[Code]
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
[Website]
[Project]
[Code]
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
[Website]
[Project]
[Code]
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
[Website]
[Project]
[Code]
VideoComposer: Compositional Video Synthesis with Motion Controllability
[Website]
[Project]
[Code]
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
[Website]
[Project]
[Code]
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
[Website]
[Project]
[Code]
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
[Website]
[Project]
[Code]
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
[Website]
[Project]
[Code]
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
[Website]
[Project]
[Code]
LLM-grounded Video Diffusion Models
[Website]
[Project]
[Code]
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
[Website]
[Project]
[Code]
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
[Website]
[Project]
[Code]
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[Website]
[Project]
[Code]
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning
[Website]
[Project]
[Code]
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
[Website]
[Project]
[Code]
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
[Website]
[Project]
[Code]
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
[Website]
[Project]
[Code]
ART⋅V: Auto-Regressive Text-to-Video Generation with Diffusion Models
[Website]
[Project]
[Code]
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
[Website]
[Project]
[Code]
VideoBooth: Diffusion-based Video Generation with Image Prompts
[Website]
[Project]
[Code]
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
[Website]
[Project]
[Code]
LivePhoto: Real Image Animation with Text-guided Motion Control
[Website]
[Project]
[Code]
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
[Website]
[Project]
[Code]
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
[Website]
[Project]
[Code]
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
[Website]
[Project]
[Code]
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models
[Website]
[Project]
[Code]
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
[Website]
[Project]
[Code]
FreeInit: Bridging Initialization Gap in Video Diffusion Models
[Website]
[Project]
[Code]
Text2AC-Zero: Consistent Synthesis of Animated Characters using 2D Diffusion
[Website]
[Project]
[Code]
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
[Website]
[Project]
[Code]
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
[Website]
[Project]
[Code]
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
[Website]
[Project]
[Code]
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
[Website]
[Project]
[Code]
Latte: Latent Diffusion Transformer for Video Generation
[Website]
[Project]
[Code]
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
[Website]
[Project]
[Code]
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
[Website]
[Project]
[Code]
Towards A Better Metric for Text-to-Video Generation
[Website]
[Project]
[Code]
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
[Website]
[Project]
[Code]
Diffusion Probabilistic Modeling for Video Generation
[Website]
[Code]
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[Website]
[Code]
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
[Website]
[Code]
STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction
[Website]
[Code]
Vlogger: Make Your Dream A Vlog
[Website]
[Code]
VideoPoet: A Large Language Model for Zero-Shot Video Generation
[Website]
[Project]
PEEKABOO: Interactive Video Generation via Masked-Diffusion
[Website]
[Project]
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
[Website]
[Project]
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
[Website]
[Project]
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
[Website]
[Project]
Imagen Video: High Definition Video Generation with Diffusion Models
[Website]
[Project]
MoVideo: Motion-Aware Video Generation with Diffusion Models
[Website]
[Project]
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
[Website]
[Project]
Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
[Website]
[Project]
VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model
[Website]
[Project]
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
[Website]
[Project]
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
[Website]
[Project]
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
[Website]
[Project]
Customizing Motion in Text-to-Video Diffusion Models
[Website]
[Project]
Photorealistic Video Generation with Diffusion Models
[Website]
[Project]
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
[Website]
[Project]
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
[Website]
[Project]
ActAnywhere: Subject-Aware Video Background Generation
[Website]
[Project]
Lumiere: A Space-Time Diffusion Model for Video Generation
[Website]
[Project]
InstructVideo: Instructing Video Diffusion Models with Human Feedback
[Website]
[Project]
Boximator: Generating Rich and Controllable Motions for Video Synthesis
[Website]
[Project]
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
[Website]
[Project]
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
[Website]
[Project]
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
[Website]
Dual-Stream Diffusion Net for Text-to-Video Generation
[Website]
SimDA: Simple Diffusion Adapter for Efficient Video Generation
[Website]
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
[Website]
Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models
[Website]
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
[Website]
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
[Website]
Optimal Noise pursuit for Augmenting Text-to-Video Generation
[Website]
Make Pixels Dance: High-Dynamic Video Generation
[Website]
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
[Website]
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion
[Website]
Decouple Content and Motion for Conditional Image-to-Video Generation
[Website]
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
[Website]
MTVG: Multi-text Video Generation with Text-to-Video Models
[Website]
VideoLCM: Video Latent Consistency Model
[Website]
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
[Website]
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
[Website]
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
[Website]
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
[Website]
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
[Website]
Training-Free Semantic Video Composition via Pre-trained Diffusion Model
[Website]
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
[Website]
Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
[Website]
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
[ICCV 2023 Oral]
[Website]
[Project]
[Code]
Text2LIVE: Text-Driven Layered Image and Video Editing
[ECCV 2022 Oral]
[Project]
[Code]
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
[CVPR 2023]
[Project]
[Code]
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
[ICCV 2023]
[Project]
[Code]
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
[ICCV 2023]
[Website]
[Code]
Video-P2P: Video Editing with Cross-attention Control
[Website]
[Project]
[Code]
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[Website]
[Project]
[Code]
MagicEdit: High-Fidelity and Temporally Coherent Video Editing
[Website]
[Project]
[Code]
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
[Website]
[Project]
[Code]
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
[Website]
[Project]
[Code]
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
[Website]
[Project]
[Code]
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
[Website]
[Project]
[Code]
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
[Website]
[Project]
[Code]
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
[Website]
[Project]
[Code]
MotionEditor: Editing Video Motion via Content-Aware Diffusion
[Website]
[Project]
[Code]
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
[Website]
[Project]
[Code]
MagicStick: Controllable Video Editing via Control Handle Transformations
[Website]
[Project]
[Code]
VidToMe: Video Token Merging for Zero-Shot Video Editing
[Website]
[Project]
[Code]
VASE: Object-Centric Appearance and Shape Manipulation of Real Videos
[Website]
[Project]
[Code]
Neural Video Fields Editing
[Website]
[Project]
[Code]
Vid2Vid-zero: Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models
[Website]
[Code]
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
[Website]
[Code]
LOVECon: Text-driven Training-Free Long Video Editing with ControlNet
[Website]
[Code]
Pix2Video: Video Editing Using Image Diffusion
[Website]
[Code]
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
[Website]
[Code]
Flow-Guided Diffusion for Video Inpainting
[Website]
[Code]
Shape-Aware Text-Driven Layered Video Editing
[CVPR 2023]
[Website]
[Project]
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
[Website]
[Project]
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
[Website]
[Project]
VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
[Website]
[Project]
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
[Website]
[Project]
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
[Website]
[Project]
MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance
[Website]
[Project]
Edit Temporal-Consistent Videos with Image Diffusion Model
[Website]
Cut-and-Paste: Subject-Driven Video Editing with Attention Control
[Website]
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
[Website]
Dreamix: Video Diffusion Models Are General Video Editors
[Website]
Towards Consistent Video Editing with Text-to-Image Diffusion Models
[Website]
EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
[Website]
CCEdit: Creative and Controllable Video Editing via Diffusion Models
[Website]
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
[Website]
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier
[Website]
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
[Website]
RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing
[Website]
Object-Centric Diffusion for Efficient Video Editing
[Website]
TryOnDiffusion: A Tale of Two UNets
[CVPR 2023]
[Website]
[Project]
[Code]
Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images
[Website]
[Project]
[Code]
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
[Website]
[Project]
[Code]
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
[Website]
[Project]
[Code]
Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow
[ACM MM 2023]
[Code]
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On
[ACM MM 2023]
[Code]
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
[Website]
[Code]
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
[Website]
[Code]
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
[Website]
[Project]
WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on
[Website]
Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles
[Website]
Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
[Website]