CVPR-2024-Oral

Geometry

Rethinking Inductive Biases for Surface Normal Estimation

Homepage : https://baegwangbin.github.io/DSINE/
Paper : https://arxiv.org/abs/2403.00712

Scene Understanding

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Homepage : https://astra-vision.github.io/PaSCo/
Paper : https://arxiv.org/abs/2312.02158

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Homepage : https://scenefun3d.github.io/
Paper : https://alexdelitzas.github.io/assets/pdf/SceneFun3D.pdf

Reconstruction

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Homepage : https://davidcharatan.com/pixelsplat/
Paper : https://arxiv.org/abs/2312.12337

Embodied AI

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Homepage : https://ego-exo4d-data.org/
Paper : https://arxiv.org/abs/2311.18259
Description : Dataset, Benchmark

Multi-Modal

Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

Paper : https://arxiv.org/abs/2403.02090
Description : Baseline

Video

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

Homepage : https://kaist-viclab.github.io/fmanet-site/
Paper : https://arxiv.org/abs/2401.03707
Description : Method

CVPR-2024-Highlight

3D Presentation(GS, NeRF)
Diffusion
Presentation
LLM
3D Generation
2D Generation
Digital Human
Human-Scene
Multi-Modal
Video
Image
Novel View Synthesis
Reconstruction
Segmentation
Embodied AI
Pose Estimation
Autonomous Driving
SLAM
SFM
Camera Pose
Medical Image Analysis
Robotic Manipulation
Knowledge Distillation
Feature Matching
Stereo Matching
Brain Decoding

3D Presentation(GS, NeRF)

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

Homepage : https://haithemturki.com/hybrid-nerf/
Paper : https://haithemturki.com/hybrid-nerf/resources/paper.pdf
Description : Presentation

HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

Homepage : https://jiahao-ma.github.io/hashpoint/
Paper : https://arxiv.org/abs/2404.14044v1
Description : Method, Theory

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Homepage : https://city-super.github.io/scaffold-gs/
Paper : https://arxiv.org/abs/2312.00109

Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Homepage : https://feature-3dgs.github.io/
Paper : https://arxiv.org/abs/2312.03203

Diffusion

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Homepage : https://github.com/mit-han-lab/distrifuser
Paper : https://arxiv.org/abs/2402.19481

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Homepage : https://jiwoogit.github.io/StyleID_site/
Paper : https://arxiv.org/abs/2312.09008

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Paper : https://arxiv.org/abs/2311.16503

Presentation

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper : https://arxiv.org/abs/2311.08046

LLM

VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Paper : https://arxiv.org/abs/2311.16922

3D Generation

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Homepage : https://aigc3d.github.io/richdreamer/
Paper : https://arxiv.org/abs/2311.16918
Description : Method

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Homepage : https://nihalsid.github.io/mesh-gpt/
Paper : https://arxiv.org/abs/2311.15475
Description : Method, Mesh Generation

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Homepage : https://xpandora.github.io/PhysGaussian/
Paper : https://arxiv.org/abs/2311.12198

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Homepage : https://daveredrum.github.io/SceneTex/
Paper : https://arxiv.org/abs/2311.17261

2D Generation

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Homepage : https://migcproject.github.io/
Paper : https://arxiv.org/abs/2402.05408

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Paper : https://arxiv.org/abs/2404.07990

Digital Human

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Homepage : https://eth-ait.github.io/4d-dress/
Paper : https://arxiv.org/abs/2404.18630
Description : Dataset

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Homepage : https://river-zhang.github.io/SIFU-projectpage/
Paper : https://arxiv.org/abs/2312.06704

3D Human Pose Perception from Egocentric Stereo Videos

Homepage : https://4dqv.mpi-inf.mpg.de/UnrealEgo2/
Paper : https://arxiv.org/abs/2401.00889
Description : Benchmark, Dataset

Relightable and Animatable Neural Avatar from Sparse-View Video

Homepage : https://zju3dv.github.io/relightable_avatar/
Paper : https://arxiv.org/abs/2308.07903

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Homepage : https://alvinliu0.github.io/projects/HumanGaussian
Paper : https://arxiv.org/abs/2311.17061

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

Homepage : https://github.com/ShenhanQian/GaussianAvatars

Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Homepage : https://shunyuanzheng.github.io/GPS-Gaussian
Paper : https://arxiv.org/abs/2312.02155

Human-Scene

Scaling Up Dynamic Human-Scene Interaction Modeling

Homepage : https://jnnan.github.io/trumans/
Paper : https://arxiv.org/abs/2403.08629
Description : Dataset

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Homepage : https://afford-motion.github.io/
Paper : https://arxiv.org/abs/2403.18036

Multi-Modal

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Homepage : https://github.com/shikiw/OPERA
Paper : https://arxiv.org/abs/2311.17911
Description : Method

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Homepage : https://julianjuaner.github.io/projects/PromptHighlighter/
Paper : https://arxiv.org/abs/2312.04302

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Paper : https://arxiv.org/abs/2312.06462

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Paper : https://arxiv.org/abs/2311.06607

Video

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

Homepage : https://sjojok.github.io/3dgstream/
Paper : https://arxiv.org/abs/2403.01444

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Homepage : https://henry123-boy.github.io/SpaTracker/
Paper : https://arxiv.org/abs/2404.04319

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Homepage : https://shangchenzhou.com/projects/upscale-a-video/
Paper : https://arxiv.org/abs/2312.06640

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

Homepage : https://rave-video.github.io/
Paper : https://arxiv.org/abs/2312.04524

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Homepage : https://qiuyu96.github.io/CoDeF/
Paper : https://arxiv.org/abs/2308.07926

Putting the Object Back into Video Object Segmentation

Homepage : https://hkchengrex.com/Cutie/
Paper : https://arxiv.org/abs/2310.12982

Boosting Neural Representations for Videos with a Conditional Decoder

Paper : https://arxiv.org/abs/2402.18152

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

Paper : https://arxiv.org/abs/2305.00163

VTimeLLM: Empower LLM to Grasp Video Moments

Paper : https://arxiv.org/abs/2311.18445

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Paper : https://arxiv.org/abs/2311.16464

Image

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Homepage : https://chenshuang-zhang.github.io/imagenet_d/
Paper : https://arxiv.org/abs/2403.18775
Description : Benchmark, Dataset

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Homepage : https://yuzhou914.github.io/SmartEdit/
Paper : https://arxiv.org/abs/2312.06739

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Paper : https://arxiv.org/abs/2402.18078

Novel View Synthesis

CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs

Homepage : https://ku-cvlab.github.io/CoPoNeRF/
Paper : https://arxiv.org/abs/2312.07246

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Paper : https://arxiv.org/abs/2403.20018

Reconstruction

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Homepage : https://yushuang-wu.github.io/IPoD/
Paper : https://arxiv.org/abs/2404.00269

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Homepage : https://nju-3dv.github.io/projects/Gaussian-Flow/
Paper : https://arxiv.org/abs/2312.03431

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

Homepage : https://www.zhuliyuan.net/livingscenes
Paper : https://arxiv.org/abs/2312.09138

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Homepage : https://zc-alexfan.github.io/hold
Paper : https://arxiv.org/abs/2311.18448

Segmentation

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

Homepage : https://micv-yonsei.github.io/eagle2024/
Paper : https://arxiv.org/abs/2403.01482

GraCo: Granularity-Controllable Interactive Segmentation

Homepage : https://zhao-yian.github.io/GraCo/
Paper : https://arxiv.org/abs/2405.00587

OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies

Paper : https://arxiv.org/abs/2405.05259

Frequency-Adaptive Dilated Convolution for Semantic Segmentation

Paper : https://arxiv.org/abs/2403.05369

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Paper : https://arxiv.org/abs/2312.00648

Embodied AI

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Homepage : https://physcene.github.io/
Paper : https://arxiv.org/abs/2404.09465
Description : Method

Pose Estimation

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Homepage : https://nvlabs.github.io/FoundationPose/
Paper : https://arxiv.org/abs/2312.08344

Autonomous Driving

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Paper : https://arxiv.org/abs/2312.17655

Generalized Predictive Model for Autonomous Driving

Paper : https://arxiv.org/abs/2403.09630

Dynamic LiDAR Re-simulation using Compositional Neural Fields

Paper : https://arxiv.org/abs/2312.05247

SLAM

Gaussian Splatting SLAM

Homepage : https://rmurai.co.uk/projects/GaussianSplattingSLAM/
Paper : https://arxiv.org/abs/2312.06741
Description : Method

SFM

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Homepage : https://vggsfm.github.io/
Paper : https://arxiv.org/abs/2312.04563

Camera Pose

Map-Relative Pose Regression for Visual Re-Localization

Homepage : https://nianticlabs.github.io/marepo/
Paper : https://arxiv.org/abs/2404.09884

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Paper : https://arxiv.org/abs/2403.03221

Medical Image Analysis

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Paper : https://arxiv.org/abs/2311.17597

Diversified and Personalized Multi-rater Medical Image Segmentation

Paper : https://arxiv.org/abs/2403.13417

Robotic Manipulation

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Homepage : https://sites.google.com/view/diffusion-edfs
Paper : https://arxiv.org/abs/2309.02685

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Homepage : https://sunsean21.github.io/logit-stand-KD.html
Paper : https://arxiv.org/abs/2403.01427

Feature Matching

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Homepage : https://zju3dv.github.io/efficientloftr/
Paper : https://zju3dv.github.io/efficientloftr/files/EfficientLoFTR.pdf

Stereo Matching

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Paper : https://arxiv.org/abs/2403.00486

Brain Decoding

MindBridge: A Cross-Subject Brain Decoding Framework

Homepage : https://littlepure2333.github.io/MindBridge/
Paper : https://arxiv.org/abs/2404.07850

Monad-Cube/CVPR-2024-Highlight-Oral
