
# ECCV-2024-Oral

A curated list of ECCV 2024 oral papers, grouped by topic.

## 2D Scene Understanding

- Diffusion Models for Zero-Shot Open-Vocabulary Segmentation
- Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
- Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
- Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
- OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
- Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
- ESA: Annotation-Efficient Active Learning for Semantic Segmentation
- Towards Scene Graph Anticipation
- Dataset Enhancement with Instance-Level Augmentations
- An Adaptive Correspondence Scoring Framework for Unsupervised Image Registration of Medical Images
- HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
- Towards Open-ended Visual Quality Comparison
- CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
- A Fair Ranking and New Model for Panoptic Scene Graph Generation
- Parrot Captions Teach CLIP to Spot Text
- On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
- SINDER: Repairing the Singular Defects of DINOv2
- Emergent Visual-Semantic Hierarchies in Image-Text Representations
- AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation

## 3D Scene Understanding

- OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- Bi-directional Contextual Attention for 3D Dense Captioning
- Watch Your Steps: Local Image and Scene Editing by Text Instructions
- Scene Coordinate Reconstruction
- HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
- RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
- RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
- Grounding Image Matching in 3D with MASt3R
- Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
- SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

## NeRF / Gaussian

- Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
- MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
- Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
- RaFE: Generative Radiance Fields Restoration
- Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
- FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information

## 2D Generation

- Adversarial Diffusion Distillation
- Adversarial Robustification via Text-to-Image Diffusion Models
- TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
- Accelerating Image Generation with Sub-path Linear Approximation Model
- LLMGA: Multimodal Large Language Model based Generation Assistant
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
- ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
- R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
- SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

## 3D Generation

- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
- Pyramid Diffusion for Fine 3D Large Scene Generation
- COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
- A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

## Human

- TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
- Controllable Human-Object Interaction Synthesis
- Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
- A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
- ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
- Sapiens: Foundation for Human Vision Models
- Arc2Face: A Foundation Model for ID-Consistent Human Faces

## Video

- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
- Audio-Synchronized Visual Animation
- LongVLM: Efficient Long Video Understanding via Large Language Models
- ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
- E3M: Zero-Shot Spatio-Temporal Video Grounding
- Classification Matters: Improving Video Action Detection with Class-Specific Attention
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
- ActionVOS: Actions as Prompts for Video Object Segmentation
- DEVIAS: Learning Disentangled Video Representations of Action and Scene
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
- SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
- Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
- Video Editing via Factorized Diffusion Distillation
- Towards Neuro-Symbolic Video Understanding

## LLM / MLLM / VLM

- MMBench: Is Your Multi-modal Model an All-around Player?
- BRAVE: Broadening the visual encoding of vision-language models
- Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
- Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
- Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
- Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models
- Towards Goal-oriented Large Language Model Prompting: A Survey

## Transformer

- Denoising Vision Transformers

## Diffusion

- Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
