Monad-Cube/CVPR-2024-Highlight-Oral


CVPR-2024-Oral

Geometry

Rethinking Inductive Biases for Surface Normal Estimation

Scene Understanding

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Reconstruction

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Embodied AI

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Multi-Modal

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Video

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

CVPR-2024-Highlight

3D Representation (GS, NeRF)

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Diffusion

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Representation

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

LLM

VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

3D Generation

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

2D Generation

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Digital Human

A 4D Dataset of Real-World Human Clothing With Semantic Annotations

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

3D Human Pose Perception from Egocentric Stereo Videos

Relightable and Animatable Neural Avatar from Sparse-View Video

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Human-Scene

Scaling Up Dynamic Human-Scene Interaction Modeling

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Multi-Modal

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Video

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Putting the Object Back into Video Object Segmentation

Boosting Neural Representations for Videos with a Conditional Decoder

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

VTimeLLM: Empower LLM to Grasp Video Moments

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Image

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Novel View Synthesis

CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Reconstruction

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Segmentation

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

GraCo: Granularity-Controllable Interactive Segmentation

OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies

Frequency-Adaptive Dilated Convolution for Semantic Segmentation

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Embodied AI

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Pose Estimation

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Autonomous Driving

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Generalized Predictive Model for Autonomous Driving

Dynamic LiDAR Re-simulation using Compositional Neural Fields

SLAM

Gaussian Splatting SLAM

SfM

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Camera Pose

Map-Relative Pose Regression for Visual Re-Localization

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Medical Image Analysis

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Diversified and Personalized Multi-rater Medical Image Segmentation

Robotic Manipulation

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Feature Matching

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Stereo Matching

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Brain Decoding

MindBridge: A Cross-Subject Brain Decoding Framework

About

Collection of Highlight papers
