Rethinking Inductive Biases for Surface Normal Estimation
- Homepage : https://baegwangbin.github.io/DSINE/
- Paper : https://arxiv.org/abs/2403.00712
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
- Homepage : https://astra-vision.github.io/PaSCo/
- Paper : https://arxiv.org/abs/2312.02158
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
- Homepage : https://scenefun3d.github.io/
- Paper : https://alexdelitzas.github.io/assets/pdf/SceneFun3D.pdf
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
- Homepage : https://davidcharatan.com/pixelsplat/
- Paper : https://arxiv.org/abs/2312.12337
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
- Homepage : https://ego-exo4d-data.org/
- Paper : https://arxiv.org/abs/2311.18259
- Description : Dataset, Benchmark
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
- Paper : https://arxiv.org/abs/2403.02090
- Description : Baseline
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
- Homepage : https://kaist-viclab.github.io/fmanet-site/
- Paper : https://arxiv.org/abs/2401.03707
- Description : Method
- 3DGS
- Diffusion
- Presentation
- LLM
- 3D Generation
- 2D Generation
- Digital Human
- Human-Scene
- Multi-Modal
- Video
- Image
- Novel View Synthesis
- Reconstruction
- Segmentation
- Embodied AI
- Pose Estimation
- Autonomous Driving
- SLAM
- Medical Image Analysis
- Machine Learning
- Robotic Manipulation
- Others
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
- Homepage : https://haithemturki.com/hybrid-nerf/
- Paper : https://haithemturki.com/hybrid-nerf/resources/paper.pdf
- Description : Presentation
HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
- Homepage : https://jiahao-ma.github.io/hashpoint/
- Paper : https://arxiv.org/abs/2404.14044v1
- Description : Method, Theory
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
- Homepage : https://city-super.github.io/scaffold-gs/
- Paper : https://arxiv.org/abs/2312.00109
Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- Homepage : https://feature-3dgs.github.io/
- Paper : https://arxiv.org/abs/2312.03203
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- Homepage : https://github.com/mit-han-lab/distrifuser
- Paper : https://arxiv.org/abs/2402.19481
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
- Homepage : https://jiwoogit.github.io/StyleID_site/
- Paper : https://arxiv.org/abs/2312.09008
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
- Paper : https://arxiv.org/abs/2311.16503
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
- Paper : https://arxiv.org/abs/2311.08046
VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
- Paper : https://arxiv.org/abs/2311.16922
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
- Homepage : https://aigc3d.github.io/richdreamer/
- Paper : https://arxiv.org/abs/2311.16918
- Description : Method
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
- Homepage : https://nihalsid.github.io/mesh-gpt/
- Paper : https://arxiv.org/abs/2311.15475
- Description : Method, Mesh Generation
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
- Homepage : https://xpandora.github.io/PhysGaussian/
- Paper : https://arxiv.org/abs/2311.12198
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
- Homepage : https://daveredrum.github.io/SceneTex/
- Paper : https://arxiv.org/abs/2311.17261
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
- Homepage : https://migcproject.github.io/
- Paper : https://arxiv.org/abs/2402.05408
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
- Paper : https://arxiv.org/abs/2404.07990
A 4D Dataset of Real-World Human Clothing With Semantic Annotations
- Homepage : https://eth-ait.github.io/4d-dress/
- Paper : https://arxiv.org/abs/2404.18630
- Description : Dataset
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
- Homepage : https://river-zhang.github.io/SIFU-projectpage/
- Paper : https://arxiv.org/abs/2312.06704
3D Human Pose Perception from Egocentric Stereo Videos
- Homepage : https://4dqv.mpi-inf.mpg.de/UnrealEgo2/
- Paper : https://arxiv.org/abs/2401.00889
- Description : Benchmark Dataset
Relightable and Animatable Neural Avatar from Sparse-View Video
- Homepage : https://zju3dv.github.io/relightable_avatar/
- Paper : https://arxiv.org/abs/2308.07903
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
- Homepage : https://alvinliu0.github.io/projects/HumanGaussian
- Paper : https://arxiv.org/abs/2311.17061
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
- Homepage : https://shunyuanzheng.github.io/GPS-Gaussian
- Paper : https://arxiv.org/abs/2312.02155
Scaling Up Dynamic Human-Scene Interaction Modeling
- Homepage : https://jnnan.github.io/trumans/
- Paper : https://arxiv.org/abs/2403.08629
- Description : Dataset
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
- Homepage : https://afford-motion.github.io/
- Paper : https://arxiv.org/abs/2403.18036
- Description :
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Homepage : https://github.com/shikiw/OPERA
- Paper : https://arxiv.org/abs/2311.17911
- Description : Method
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
- Homepage : https://julianjuaner.github.io/projects/PromptHighlighter/
- Paper : https://arxiv.org/abs/2312.04302
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
- Paper : https://arxiv.org/abs/2311.06607
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
- Homepage : https://sjojok.github.io/3dgstream/
- Paper : https://arxiv.org/abs/2403.01444
SpatialTracker: Tracking Any 2D Pixels in 3D Space
- Homepage : https://henry123-boy.github.io/SpaTracker/
- Paper : https://arxiv.org/abs/2404.04319
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
- Homepage : https://shangchenzhou.com/projects/upscale-a-video/
- Paper : https://arxiv.org/abs/2312.06640
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
- Homepage : https://rave-video.github.io/
- Paper : https://arxiv.org/abs/2312.04524
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
- Homepage : https://qiuyu96.github.io/CoDeF/
- Paper : https://arxiv.org/abs/2308.07926
Putting the Object Back into Video Object Segmentation
- Homepage : https://hkchengrex.com/Cutie/
- Paper : https://arxiv.org/abs/2310.12982
Boosting Neural Representations for Videos with a Conditional Decoder
- Paper : https://arxiv.org/abs/2402.18152
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
- Paper : https://arxiv.org/abs/2305.00163
VTimeLLM: Empower LLM to Grasp Video Moments
- Paper : https://arxiv.org/abs/2311.18445
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
- Paper : https://arxiv.org/abs/2311.16464
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
- Homepage : https://chenshuang-zhang.github.io/imagenet_d/
- Paper : https://arxiv.org/abs/2403.18775
- Description : Benchmark Dataset
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
- Homepage : https://yuzhou914.github.io/SmartEdit/
- Paper : https://arxiv.org/abs/2312.06739
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
- Paper : https://arxiv.org/abs/2402.18078
CoPoNeRF: Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
- Homepage : https://ku-cvlab.github.io/CoPoNeRF/
- Paper : https://arxiv.org/abs/2312.07246
SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
- Paper : https://arxiv.org/abs/2403.20018
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
- Homepage : https://yushuang-wu.github.io/IPoD/
- Paper : https://arxiv.org/abs/2404.00269
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
- Homepage : https://nju-3dv.github.io/projects/Gaussian-Flow/
- Paper : https://arxiv.org/abs/2312.03431
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
- Homepage : https://www.zhuliyuan.net/livingscenes
- Paper : https://arxiv.org/abs/2312.09138
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
- Homepage : https://zc-alexfan.github.io/hold
- Paper : https://arxiv.org/abs/2311.18448
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
- Homepage : https://micv-yonsei.github.io/eagle2024/
- Paper : https://arxiv.org/abs/2403.01482
GraCo: Granularity-Controllable Interactive Segmentation
- Homepage : https://zhao-yian.github.io/GraCo/
- Paper : https://arxiv.org/abs/2405.00587
OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
- Paper : https://arxiv.org/abs/2405.05259
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
- Paper : https://arxiv.org/abs/2403.05369
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
- Paper : https://arxiv.org/abs/2312.00648
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
- Homepage : https://physcene.github.io/
- Paper : https://arxiv.org/abs/2404.09465
- Description : Method
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
- Homepage : https://nvlabs.github.io/FoundationPose/
- Paper : https://arxiv.org/abs/2312.08344
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
- Paper : https://arxiv.org/abs/2312.17655
Generalized Predictive Model for Autonomous Driving
- Paper : https://arxiv.org/abs/2403.09630
Dynamic LiDAR Re-simulation using Compositional Neural Fields
- Paper : https://arxiv.org/abs/2312.05247
Gaussian Splatting SLAM
- Homepage : https://rmurai.co.uk/projects/GaussianSplattingSLAM/
- Paper : https://arxiv.org/abs/2312.06741
- Description : Method
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
- Homepage : https://vggsfm.github.io/
- Paper : https://arxiv.org/abs/2312.04563
Map-Relative Pose Regression for Visual Re-Localization
- Homepage : https://nianticlabs.github.io/marepo/
- Paper : https://arxiv.org/abs/2404.09884
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
- Paper : https://arxiv.org/abs/2403.03221
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
- Paper : https://arxiv.org/abs/2311.17597
Diversified and Personalized Multi-rater Medical Image Segmentation
- Paper : https://arxiv.org/abs/2403.13417
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
- Homepage : https://sites.google.com/view/diffusion-edfs
- Paper : https://arxiv.org/abs/2309.02685
Logit Standardization in Knowledge Distillation
- Homepage : https://sunsean21.github.io/logit-stand-KD.html
- Paper : https://arxiv.org/abs/2403.01427
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- Homepage : https://zju3dv.github.io/efficientloftr/
- Paper : https://zju3dv.github.io/efficientloftr/files/EfficientLoFTR.pdf
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
- Paper : https://arxiv.org/abs/2403.00486
MindBridge: A Cross-Subject Brain Decoding Framework
- Homepage : https://littlepure2333.github.io/MindBridge/
- Paper : https://arxiv.org/abs/2404.07850