Official website: http://iccv2021.thecvf.com/home
Conference dates: October 11 to 17, 2021
- Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation
⭐code - High-Resolution Optical Flow from 1D Attention and Correlation
😮oral⭐code - GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning
⭐code - Sensor-Guided Optical Flow
⭐code
- Surface anomaly detection
- Anomaly detection
- DivAug: Plug-In Automated Data Augmentation With Explicit Diversity Maximization
⭐code - TrivialAugment: Tuning-Free Yet State-of-the-Art Data Augmentation
😮oral⭐code - Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images With Artificial Neural Networks
- A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation
- OpenGAN: Open-Set Recognition via Open Data Generation
🏆Best Paper Honorable Mention - Conditional Variational Capsule Network for Open Set Recognition
⭐code
- Do Different Deep Metric Learning Losses Lead to Similar Learned Features?
⭐code - Learning With Memory-Based Virtual Classes for Deep Metric Learning
⭐code
- Federated Learning for Non-IID Data via Unified Feature Learning and Optimization Objective Alignment
- Ensemble Attention Distillation for Privacy-Preserving Federated Learning
- Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
- PoGO-Net: Pose Graph Optimization With Graph Neural Networks
⭐code - Dynamic Dual Gating Neural Networks
⭐code
- An Asynchronous Kalman Filter for Hybrid Event Cameras
⭐code - 4D Cloud Scattering Tomography
- Snapshot compressive imaging
- Light field
- Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- Fast Light-Field Disparity Estimation With Multi-Disparity-Scale Cost Aggregation
⭐code - SeLFVi: Self-supervised Light-Field Video Reconstruction from Stereo Video
- SIGNET: Efficient Neural Representation for Light Fields
- Light field reconstruction
- Compressive imaging
- Homography Estimation
- Computational imaging
- Optical aberration correction
- Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm Under Mixed Illumination
🌻dataset - FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters
🌻dataset
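- FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting
🏠project - Biomedical images
- 3D reconstruction
- Aerial imagery dataset
- Action recognition
- Object recognition
- Lane detection
- Autonomous driving
- Vision-language dataset
- DeepFake detection
- High-quality video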
- SketchLattice: Latticed Representation for Sketch Manipulation
- SketchAA: Abstract Representation for Abstract Sketches
- Continual Learning for Image-Based Camera Localization
⭐code - CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
🌻dataset - Pose Correction for Highly Accurate Visual Localization in Large-Scale Indoor Spaces
⭐code - Cross-Descriptor Visual Localization and Mapping
- YouRefIt: Embodied Reference Understanding with Language and Gesture
😮oral🏠project - VLGrammar: Grounded Grammar Induction of Vision and Language
⭐code - COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-Training for Vision-Language Representation
⭐code - Panoptic Narrative Grounding
😮oral⭐code - AESOP: Abstract Encoding of Stories, Objects, and Pictures
⭐code📺video - Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
- Visual reasoning
- Semantic navigation
- Vision-language navigation
- Airbert: In-domain Pretraining for Vision-and-Language Navigation
🏠project - Waypoint Models for Instruction-guided Navigation in Continuous Environments
😮oral⭐code🏠project📺video - The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
⭐code - Vision-Language Navigation With Random Environmental Mixup
- Vision-dialog navigation
- Visual navigation
- Visual grounding
- Visual dialog
- Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
⭐code - Deep 3D Mask Volume for View Synthesis of Dynamic Scenes
🏠project - Embedding Novel Views in a Single JPEG Image
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
😮oral⭐code🏠project📺video - Geometry-Free View Synthesis: Transformers and No 3D Priors
⭐code - Dynamic View Synthesis From Dynamic Monocular Video
🏠project📺video - Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
🏠project📺video - Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
😮oral⭐code🏠project📺video - Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image
😮oral⭐code🏠project📺video
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
⭐code - Continual Learning on Noisy Data Streams via Self-Purified Replay
⭐code - Rehearsal Revealed: The Limits and Merits of Revisiting Samples in Continual Learning
⭐code - Co2L: Contrastive Continual Learning
⭐code
- Exploiting Scene Graphs for Human-Object Interaction Detection
⭐code - Spatially Conditioned Graphs for Detecting Human-Object Interactions
⭐code📺video - Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
- Detecting Human-Object Relationships in Videos
- Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
⭐code🏠project🌻dataset - Discovering Human Interactions With Large-Vocabulary Objects via Query and Multi-Scale Detection
⭐code - Visual Relationship Detection Using Part-and-Sum Transformers With Composite QueriesVRD和HOI
- Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
⭐code - H2O
- Human Interaction Understanding
- Hand-object interaction
- HOI (behavior understanding)
- SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
⭐code - StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
- SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation
- RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering
⭐code - DualPoseNet: Category-Level 6D Object Pose and Size Estimation Using Dual Pose Network With Refined Learning of Pose Consistency
⭐code - PR-GCN: A Deep Graph Convolutional Network With Point Refinement for 6D Pose Estimation
- Object pose estimation
- BN-NAS: Neural Architecture Search with Batch Normalization
⭐code - RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
- Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
⭐code - Evolving Search Space for Neural Architecture Search
⭐code📺video - FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
⭐code - GLiT: Neural Architecture Search for Global and Local Image Transformer
⭐code - Neural Architecture Search for Joint Human Parsing and Pose Estimation
⭐code - Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
- Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization
- Not All Operations Contribute Equally: Hierarchical Operation-Adaptive Predictor for Neural Architecture Search
- Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition
⭐code - BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search
⭐code - NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization
- AutoSpace: Neural Architecture Search With Less Human Interference
⭐code - IDARTS: Interactive Differentiable Architecture Search
- Who's Waldo? Linking People Across Text and Images
😮oral🏠project
📰Commentary: ICCV2021 Oral: New task, new dataset! Cornell University proposes the PVG task, which resembles VG but is not VG - Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning
- Topic Scene Graph Generation by Attention Distillation From Caption
⭐code - Understanding and Evaluating Racial Biases in Image Captioning
⭐code🏠project - In Defense of Scene Graphs for Image Captioning
⭐code - Art description generation
- Change Captioning
- MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
⭐code - Stochastic Scene-Aware Motion Prediction
⭐code🏠project - Generating Smooth Pose Sequences for Diverse Human Motion Prediction
😮oral⭐code - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
🏠project - Motion Prediction using Trajectory Cues
- 3D human motion prediction
- Learning From Noisy Data With Robust Representation Learning
⭐code - Self-Supervised Representation Learning From Flow Equivariance
- Exploring Visual Engagement Signals for Representation Learning
⭐code - Switchable K-class Hyperplanes for Noise-Robust Representation Learning
⭐code - Region Similarity Representation Learning
⭐code - Curious Representation Learning for Embodied Intelligence
⭐code🏠project - Visual representation learning
- Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
📰Commentary: ICCV2021: A contrastive learning paradigm more general than MoCo; USTC and MSRA propose MaskCo, a new contrastive learning method - Temporal Knowledge Consistency for Unsupervised Visual Representation Learning
- Contrasting Contrastive Self-Supervised Representation Learning Pipelines
⭐code - Concept Generalization in Visual Representation Learning
🏠project - Collaborative Unsupervised Visual Representation Learning from Decentralized Data
- Episodic Transformer for Vision-and-Language Navigation
⭐code - Multi-VAE: Learning Disentangled View-Common and View-Peculiar Visual Representations for Multi-View Clustering
- Video representation learning
- ASCNet: Self-Supervised Video Representation Learning With Appearance-Speed Consistency
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
🏠project - Time-Equivariant Contrastive Video Representation Learning
- Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning
⭐code
- CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
- Semantically Coherent Out-of-Distribution Detection
⭐code🏠project - The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
⭐code
- Towards Interpretable Deep Metric Learning with Structural Matching
⭐code - Deep Relational Metric Learning
⭐code - LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
⭐code - Manifold Matching via Deep Metric Learning for Generative Modeling
⭐code
- Class-incremental learning
- Semi-supervised learning
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
- Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples
⭐code - Semi-Supervised Active Learning for Semi-Supervised Models: Exploit Adversarial Examples With Graph-Based Virtual Labels
- CoMatch: Semi-Supervised Learning With Contrastive Graph Regularization
⭐code - Multiview Pseudo-Labeling for Semi-supervised Learning from Video
- Self-supervised learning
- Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
⭐code - Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging
⭐code - ISD: Self-Supervised Learning by Iterative Similarity Distillation
⭐code - Contrast and Order Representations for Video Self-Supervised Learning
- On Feature Decorrelation in Self-Supervised Learning
😮oral - Geography-Aware Self-Supervised Learning
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
- Efficient Visual Pretraining with Contrastive Detection
- Broaden Your Views for Self-Supervised Video Learning
- CDS: Cross-Domain Self-supervised Pre-training
- On Compositions of Transformations in Contrastive Self-Supervised Learning
⭐code - Solving Inefficiency of Self-Supervised Representation Learning
⭐code - Divide and Contrast: Self-supervised Learning from Uncurated Data
- Emerging Properties in Self-Supervised Vision Transformers
⭐code - Mean Shift for Self-Supervised Learning
⭐code
- Weakly supervised learning
- MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach
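📰Commentary: ICCV2021 "MultiTask CenterNet": New progress in multi-task computer vision! One model outperforms three - Multi-Task Self-Training for Learning General Representations
📰Commentary: ICCV2021 MuST: Still struggling to squeeze out extra points on a single task? Google researchers have already moved on to multi-task training - UniT: Multimodal Multitask Learning With a Unified Transformer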
⭐code - Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing
⭐code - Learning With Privileged Tasks
- Task Switching Network for Multi-Task Learning
- Robotics
- VR/AR
- The Power of Points for Modeling Humans in Clothing
⭐code🏠project📺video - Virtual try-on
- M3D-VTON: A Monocular-to-3D Virtual Try-On Network
⭐code - ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors
- Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-On and Outfit Editing
- FashionMirror: Co-Attention Feature-Remapping Virtual Try-On With Sequential Template Poses
- Structure-transformed Texture-enhanced Network for Person Image Synthesis
- SLAM
- On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
⭐code - Transfusion: A Novel SLAM Method Focused on Transparent Objects
- iMAP: Implicit Mapping and Positioning in Real-Time
- Learning To Bundle-Adjust: A Graph Network Approach to Faster Optimization of Bundle Adjustment for Vehicular SLAM
- R-SLAM: Optimizing Eye Tracking From Rolling Shutter Video of the Retina
- Place Recognition
- Knowledge distillation
- Distilling Holistic Knowledge with Graph Neural Networks
⭐code - Lipschitz Continuity Guided Knowledge Distillation
⭐code - Densely Guided Knowledge Distillation Using Multiple Teacher Assistants
⭐code - Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
⭐code - Compressing Visual-linguistic Model via Knowledge Distillation
- Self-Knowledge Distillation With Progressive Refinement of Targets
⭐code📺video - Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher
- Channel-Wise Knowledge Distillation for Dense Prediction
⭐code - Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation
⭐code
- Quantization
- Distance-aware Quantization
⭐code🏠project - Dynamic Network Quantization for Efficient Video Inference
⭐code🏠project - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss
- Improving Low-Precision Network Quantization via Bin Regularization
- Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
- Integer-arithmetic-only Certified Robustness for Quantized Neural Networks
- RMSMP: A Novel Deep Neural Network Quantization Framework With Row-Wise Mixed Schemes and Multiple Precisions
- Improving Neural Network Efficiency via Post-Training Quantization With Adaptive Floating-Point
⭐code - Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
⭐code
- Model compression
- Pruning
- ISR
- Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
⭐code - Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling
⭐code - Deep Reparametrization of Multi-Frame Super-Resolution and Denoising
😮oral - Dual-Camera Super-Resolution with Aligned Attention Modules
⭐code🏠project📺video - Attention-Based Multi-Reference Learning for Image Super-Resolution
⭐code🏠project - Learning a Single Network for Scale-Arbitrary Super-Resolution
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution
⭐code - Achieving On-Mobile Real-Time Super-Resolution With Neural Architecture and Pruning Search
- Designing a Practical Degradation Model for Deep Blind Image Super-Resolution
⭐code - Event Stream Super-Resolution via Spatiotemporal Constraint Learning
- Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution
- Super-Resolving Cross-Domain Face Miniatures by Peeking at One-Shot Exemplar
- Context Reasoning Attention Network for Image Super-Resolution
- EvIntSR-Net: Event Guided Multiple Latent Frames Reconstruction and Super-Resolution
- Super Resolve Dynamic Scene from Continuous Spike Streams
- Deep Blind Video Super-Resolution
- Benchmarking Ultra-High-Definition Image Super-Resolution
- Lucas-Kanade Reloaded: End-to-End Super-Resolution From Raw Image Bursts
- Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
- Real-World Video Super-Resolution: A Benchmark Dataset and a Decomposition Based Learning Scheme
⭐code
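📰Commentary: ICCV2021: Hong Kong Polytechnic University and Alibaba DAMO Academy propose RealVSR, a new dataset and loss scheme for video super-resolution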
- VSR
- SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
- Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
⭐code
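📰Commentary: ICCV2021 | The RSIDEA team at Wuhan University proposes STAR, a novel weakly supervised change detection algorithm for remote sensing - Panoramic video synthesis from satellite imagery
- Traffic accident detection from satellite imagery
- Remote sensing data
- Segmentation
- 3D reconstruction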
- The Right to Talk: An Audio-Visual Transformer Approach
⭐code - Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
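⭐code🏠project - Audio separation
- Audio-gesture
- Active Speaker Detection (ASD)
- Recovering audio from face videos
- Audio-visual source localization
- Audio-visual source separation
- Audio-visual floorplan reconstruction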
- AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
⭐code - Domain-Aware Universal Style Transfer
⭐code - Diverse Image Style Transfer via Invertible Cross-Space Mapping
- StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition
- Manifold Alignment for Semantically Aligned Style Transfer
⭐code
- ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models
😮oral - Image Synthesis via Semantic Composition
⭐code🏠project - Image Synthesis From Layout With Locality-Aware Mask Adaption
- Image fusion
- DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
⭐code - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
⭐code🏠project - Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
⭐code - Instance-Level Image Retrieval Using Reranking Transformers
⭐code - Learning Attribute-Driven Disentangled Representations for Interactive Fashion Retrieval
⭐code - Telling the What While Pointing to the Where: Multimodal Queries for Image Retrieval
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
- Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval
⭐code - Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval
- Cross-domain retrieval
- Visual Geolocalization
- Cross-modal retrieval
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval With Partial Query
⭐code - Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
⭐code - Wasserstein Coupled Graph Learning for Cross-Modal Retrieval
- Adversarial Attack on Deep Cross-Modal Hamming Retrieval
- Text-video retrieval
- Video-text retrieval
- image-based 3D shape retrieval
- Nearest neighbor search
- Improving Contrastive Learning by Visualizing Feature Transformation
😮oral⭐code - TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
📰Commentary: ICCV2021 TACo: Microsoft and CMU propose a token-aware cascade contrastive learning method that beats other SOTA methods on video-text alignment - A Broad Study on the Transferability of Visual Representations With Contrastive Learning
⭐code - Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
- LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
⭐code - CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations
- Social NCE: Contrastive Learning of Socially-Aware Motion Representations
⭐code📺video - With a Little Help From My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
- Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency
🏠project - Weakly Supervised Contrastive Learning
- Residual Attention: A Simple but Effective Method for Multi-Label Recognition
⭐code - Transformer-based Dual Relation Graph for Multi-label Image Recognition
- Aligning Latent and Image Spaces to Connect the Unconnectable
⭐code🏠project - Image shape manipulation
- Edge detection
- Image recognition
- Image deblurring
- Rethinking Coarse-to-Fine Approach in Single Image Deblurring
⭐code - Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
- Defocus Map Estimation and Deblurring From a Single Dual-Pixel Image
- Motion Deblurring with Real Events
- Pyramid Architecture Search for Real-Time Image Deblurring
- Motion deblurring
- Video deblurring
- Image quality assessment (IQA)
- Image Harmonization
- Shadow removal
- Denoising
- Rethinking Deep Image Prior for Denoising
⭐code - Rethinking Noise Synthesis and Modeling in Raw Denoising
⭐code - C2N: Practical Generative Noise Modeling for Real-World Denoising
- The Benefit of Distraction: Denoising Camera-Based Physiological Measurements Using Inverse Attention
⭐code - Hyperspectral Image Denoising with Realistic Data
⭐code - End-to-End Unsupervised Document Image Blind Denoising
- Cross-Patch Graph Convolutional Network for Image Denoising
- Video denoising
- Image colorization
- Image enhancement
- Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
- Adaptive Unfolding Total Variation Network for Low-Light Image Enhancement
⭐code - Representative Color Transform for Image Enhancement
- STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement
- Deep Symmetric Network for Underexposed Image Enhancement With Recurrent Attentional Learning
⭐code🏠project - StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
- Image restoration
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks
⭐code - Dynamic Attentive Graph Learning for Image Restoration
⭐code - Self-Supervised Cryo-Electron Tomography Volumetric Image Restoration From Single Noisy Volume With Sparsity Constraint
⭐code - Searching for Controllable Image Restoration Networks
⭐code
- Image compression
- Image inpainting
- Image Inpainting via Conditional Texture and Structure Dual Generation
⭐code - CR-Fill: Generative Image Inpainting With Auxiliary Contextual Reconstruction
⭐code - Parallel Multi-Resolution Fusion Network for Image Inpainting
- Painting from Part
⭐code - WaveFill: A Wavelet-Based Generation Network for Image Inpainting
- Distillation-Guided Image Inpainting
- Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes
⭐code🏠project
- Image extrapolation
- Reversible Image Conversion
- Artifact removal
- De-rendering
- Halo removal
- Panorama stitching
- Flare Removal
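- Image cropping
- Reflection removal
- Deraining
- Image distortion removal
- Removing refractive distortion from underwater images
- Image completion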
- Image Decomposition
- Distortion correction
- HDR
- Image desnowing
- Image Harmonization
- Image editing
- Image hiding
- Equivariant Imaging: Learning Beyond the Range Space
😮oral⭐code - Deep Survival Analysis With Longitudinal X-Rays for COVID-19
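- Medical image segmentation
- Pathology image representation
- Medical image analysis
- Medical image denoising
- Video translation
- Nuclei detection and segmentation in pathology images
- Medical report generation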
- CT
- Medical image recognition
- Medical image classification
- VariTex: Variational Neural Face Textures
⭐code🏠project📺video - Face forgery detection
- Face synthesis
- Face recognition
- PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
- SynFace: Face Recognition with Synthetic Data
- Adaptive Label Noise Cleaning With Meta-Supervision for Deep Face Recognition
- Disentangled Representation for Age-Invariant Face Recognition: A Mutual Information Minimization Perspective
- Teacher-Student Adversarial Depth Hallucination To Improve Face Recognition
⭐code - DAM: Discrepancy Alignment Metric for Face Recognition
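- De-identification
- Face perception
- Talking face generation
- Talking head synthesis
- Facial expression recognition
- Face presentation attack detection
- Face editing
- Face alignment
- Face image reconstruction
- 3D face reconstruction
- 3D facial animation
- Remote Photoplethysmography (rPPG)
- Face encryption
- Deepfake detection
- Face texture completion
- Facial action unit detection
- Face analysis
- 3D head reconstruction
- Facial landmark detection
- Face image retrieval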
- Sketch Your Own GAN
⭐code🏠project - Online Multi-Granularity Distillation for GAN Compression
⭐code - Dual Projection Generative Adversarial Networks for Conditional Image Generation
⭐code - InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
- ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
⭐code🏠project📺video - WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
⭐code - Toward a Visual Concept Vocabulary for GAN Latent Space
- Collaging Class-specific GANs for Semantic Image Synthesis
🏠project - Latent Transformations via NeuralODEs for GAN-Based Image Editing
- Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization
- GAN-Control: Explicitly Controllable GANs (https://alonshoshan10.github.io/gan_control/)
🏠project - Omni-GAN: On the Secrets of cGANs and Beyond
⭐code - Unsupervised Image Generation with Infinite Generative Adversarial Networks
⭐code - DAE-GAN: Dynamic Aspect-Aware GAN for Text-to-Image Synthesis
- Detail Me More: Improving GAN’s photo-realism of complex scenes
- Unsupervised Segmentation Incorporating Shape Prior via Generative Adversarial Networks
- DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer
⭐code - Dual Contrastive Loss and Attention for GANs
- Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation
- Gradient Normalization for Generative Adversarial Networks
⭐code - EigenGAN: Layer-Wise Eigen-Learning for GANs
⭐code - Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval
⭐code - HeadGAN: One-shot Neural Head Synthesis and Editing
🏠project📺video - Explaining in Style: Training a GAN To Explain a Classifier in StyleSpace
⭐code🏠project📺video - StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
😮oral⭐code📺video - Towards Discovery and Attribution of Open-World GAN Generated Images
- Diagonal Attention and Style-Based GAN for Content-Style Disentanglement in Image Generation and Translation
- Re-Aging GAN: Toward Personalized Face Age Transformation
- When do GANs replicate? On the choice of dataset size
⭐code - LoFGAN: Fusing Local Representations for Few-shot Image Generation
- Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
⭐code - Generative Adversarial Registration for Improved Conditional Deformable Templates
⭐code - F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain
- GAN inversion
- Image-to-image translation
- Unaligned Image-to-Image Translation by Learning to Reweight
⭐code - Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation
⭐code - Instance-Wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation
- TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets
⭐code - Rethinking the Truly Unsupervised Image-to-Image Translation
- SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
- Image translation
- Scaling-up Disentanglement for Image Translation
⭐code🏠project - Harnessing the Conditioning Sensorium for Improved Image Translation
- Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving
⭐code - Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
⭐code - Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics
- Semi-Supervised Active Learning with Temporal Output Discrepancy
⭐code - Influence Selection for Active Learning
- Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings
⭐code - Contrastive Coding for Active Learning under Class Distribution Mismatch
⭐code
- Low Curvature Activations Reduce Overfitting in Adversarial Training
- Removing Adversarial Noise in Class Activation Feature Space
- Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
- Invisible Backdoor Attack With Sample-Specific Triggers
⭐code - Defending Against Universal Adversarial Patches by Clipping Feature Norms
- Adversarial attack
- Feature Importance-aware Transferable Adversarial Attacks
⭐code - TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning
⭐code - Meta Gradient Adversarial Attack
- AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
⭐code - Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
- AdvDrop: Adversarial Attack to DNNs by Dropping Information
⭐code - Adversarial Attacks Are Reversible With Natural Supervision
- Attack As the Best Defense: Nullifying Image-to-Image Translation GANs via Limit-Aware Adversarial Attack
- Learnable Boundary Guided Adversarial Training
⭐code - Augmented Lagrangian Adversarial Attacks
⭐code - Meta-Attack: Class-Agnostic and Model-Agnostic Physical Adversarial Attack
- On Generating Transferable Targeted Perturbations
⭐code - Admix: Enhancing the Transferability of Adversarial Attacks
⭐code - Consistency-Sensitivity Guided Ensemble Black-Box Adversarial Attacks in Low-Dimensional Spaces
- Adversarial Attacks On Multi-Agent Communication
- Interpreting Attributions and Interactions of Adversarial Attacks
- RDA: Robust Domain Adaptation via Fourier Adversarial Attacking
- Adversarial examples
- Black-box
- End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
⭐code - MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
⭐code - NEAT: Neural Attention Fields for End-to-End Autonomous Driving
⭐code - Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
⭐code - Social-NCE: Contrastive Learning of Socially-aware Motion Representations
⭐code📺video - Learning To Drive From a World on Rails
😮oral⭐code🏠project - DRIVE: Deep Reinforced Accident Anticipation With Visual Explanation
⭐code🏠project📺video - LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
- Prediction by Anticipation: An Action-Conditional Prediction Method Based on Interaction Learning
⭐code📺video - TMCOSS: Thresholded Multi-Criteria Online Subset Selection for Data-Efficient Autonomous Driving
- FIERY: Future Instance Prediction in Bird's-Eye View From Surround Monocular Cameras
⭐code - On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
⭐code - MGNet: Monocular Geometric Scene Understanding for Autonomous Driving
⭐code📺video - Human trajectory prediction
- Trajectory prediction
- Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
- LOKI: Long Term and Key Intentions for Trajectory Prediction
- MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
⭐code - DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
- Where Are You Heading? Dynamic Trajectory Prediction With Expert Goal Examples
⭐code - Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis
- Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting
- Likelihood-Based Diverse Sampling for Trajectory Forecasting
⭐code
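- Motion prediction
- Autonomous navigation
- Traffic scene understanding
- Vehicle license plate recognition
- Autonomous racing
- Predicting driver visual attention
- Pose prediction
- Vehicle tracking
- Vehicle detection and analysis from arbitrary camera viewpoints
- Lane detection
- Vehicle speed estimation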
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
😮oral⭐code
📰Commentary: ICCV2021 Oral: TAU and Facebook propose generic explainability for attention models - Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction
⭐code - PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
⭐code - Rethinking and Improving Relative Position Encoding for Vision Transformer
⭐code - Vision Transformer with Progressive Sampling
⭐code - Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
😮oral⭐code - Rethinking Spatial Dimensions of Vision Transformers
⭐code
📰Commentary: ICCV2021 PiT: Pooling is not exclusive to CNNs, and ViT says "I can do it too"; 南大 proposes the Pooling-based Vision Transformer (PiT) - PnP-DETR: Towards Efficient Visual Analysis with Transformers
⭐code - Describing and Localizing Multiple Changes With Transformers
⭐code🏠project - LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference
⭐code - VidTr: Video Transformer Without Convolutions
- Visformer: The Vision-Friendly Transformer
⭐code - Going Deeper With Image Transformers
⭐code - Multiscale Vision Transformers
⭐code - Learning Multi-Scene Absolute Pose Regression With Transformers
⭐code - Visual Saliency Transformer
⭐code - Event-Based Video Reconstruction Using Transformer
⭐code - Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
⭐code - An Empirical Study of Training Self-Supervised Vision Transformers
😮oral⭐code - Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet
⭐code - CvT: Introducing Convolutions to Vision Transformers
⭐code - COTR: Correspondence Transformer for Matching Across Images
- ViViT: A Video Vision Transformer
⭐code - AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting
⭐code🏠project - Incorporating Convolution Designs into Visual Transformers
⭐code - LayoutTransformer: Layout Generation and Completion with Self-attention
⭐code🏠project - AutoFormer: Searching Transformers for Visual Recognition
⭐code - Scalable Vision Transformers With Hierarchical Pooling
⭐code - Visual Transformers: Where Do Transformers Really Belong in Vision Models?
- Anticipative Video Transformer
⭐code🏠project - Dense prediction
- 3D human texture estimation
- Image editing
- OCR
- Music-driven dance generation
- Discovering 3D Parts from Image Collections
😮oral⭐code🏠project📺video - PixelSynth: Generating a 3D-Consistent Experience from a Single Image
⭐code🏠project - Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
⭐code🏠project - Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
⭐code - Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
😮oral⭐code - LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
- Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
⭐code - Where2Act: From Pixels to Actions for Articulated 3D Objects
📺video - BuildingNet: Learning to Label 3D Buildings
😮oral⭐code🏠project - SurfGen: Adversarial 3D Shape Synthesis With Explicit Surface Discriminators
- Deep Virtual Markers for Articulated 3D Shapes
⭐code📺video - Learning Efficient Photometric Feature Transform for Multi-view Stereo
🏠project - Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing
- Just a Few Points Are All You Need for Multi-View Stereo: A Novel Semi-Supervised Learning Method for Multi-View Stereo
- 3D-FRONT: 3D Furnished Rooms With layOuts and semaNTics
- Learning Generative Models of Textured 3D Meshes from Real-World Images
⭐code - Self-Supervised Pretraining of 3D Features on any Point-Cloud
- High Quality Disparity Remapping with Two-Stage Warping
- Structure-From-Sherds: Incremental 3D Reassembly of Axially Symmetric Pots From Unordered and Mixed Fragment Collections
⭐code - Interpolation-Aware Padding for 3D Sparse Convolutional Neural Networks
- Depth estimation
- StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
⭐code - Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
⭐code🏠project - Augmenting Depth Estimation with Geospatial Context
- Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?
- Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective With Transformers
😮oral⭐code - Adaptive Surface Normal Constraint for Depth Estimation
⭐code - Event-Intensity Stereo: Estimating Depth by the Best of Both Worlds
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
- DepthInSpace: Exploitation and Fusion of Multiple Video Frames for Structured-Light Depth Estimation
🏠project - Boosting Monocular Depth Estimation With Lightweight 3D Point Fusion
- Monocular Depth Estimation
- Revealing the Reciprocal Relations Between Self-Supervised Stereo and Monocular Depth Estimation
- MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
⭐code - Towards Interpretable Deep Networks for Monocular Depth Estimation
⭐code📺video - Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
⭐code - Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation
😮oral⭐code - Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation
⭐code - R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating
⭐code - Adaptive Confidence Thresholding for Monocular Depth Estimation (https://github.com/megvii-research/OMNet)
- SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing
- Depth completion
- Omnidirectional Localization
- 3D reconstruction
- Learning Signed Distance Field for Multi-view Surface Reconstruction
😮oral - 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations
😮oral🏠project - DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
- In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces
😮oral⭐code📺video - Gaussian Fusion: Accurate 3D Reconstruction via Geometry-Guided Displacement Interpolation
- RetrievalFuse: Neural 3D Scene Reconstruction With a Database
⭐code🏠project📺video - Multi-View 3D Reconstruction With Transformers
- Polarimetric Helmholtz Stereopsis
- MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
⭐code📺video - Toward Realistic Single-View 3D Object Reconstruction With Unsupervised Learning From Multiple Images
⭐code - CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images
- 3D scene reconstruction
- 3D shape reconstruction
- 3D mesh reconstruction
- 3D scenes
- Camera calibration
- Surface reconstruction
- 3D scene synthesis
- 3D shape recognition
- Image reconstruction
- Multi-view Stereo (MVS)
- Digging into Uncertainty in Self-supervised Multi-view Stereo
⭐code - PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
⭐code - A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo
⭐code - EPP-MVSNet: Epipolar-Assembling Based Depth Prediction for Multi-View Stereo
- Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
- Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
⭐code - Towards Discriminative Representation Learning for Unsupervised Person Re-identification
- Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
⭐code🏠project - Video-based Person Re-identification with Spatial and Temporal Memory Networks
🏠project - Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
- Clothing Status Awareness for Long-Term Person Re-Identification
- Dense Interaction Learning for Video-Based Person Re-Identification
😮oral - Explainable Person Re-Identification With Attribute-Guided Metric Distillation
🏠project - Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification
- Pyramid Spatial-Temporal Aggregation for Video-Based Person Re-Identification
⭐code - ICE: Inter-Instance Contrastive Encoding for Unsupervised Person Re-Identification
⭐code📺video - Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification
- Attack-Guided Perceptual Data Generation for Real-world Re-Identification
- BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification
🌻dataset - CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
⭐code - Meta Pairwise Relationship Distillation for Unsupervised Person Re-Identification
- Syncretic Modality Collaborative Learning for Visible Infrared Person Re-Identification
- Weakly Supervised Text-Based Person Re-Identification
⭐code - Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID
- Occluded Person Re-Identification with Single-scale Global Representations
⭐code - Domain-adaptive person re-identification
- Crowd Counting
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
😮oral⭐code - Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
⭐code - Spatial Uncertainty-Aware Semi-Supervised Crowd Counting
⭐code - Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting
⭐code - Exploiting Sample Correlation for Crowd Counting With Multi-Expert Network
⭐code - Crowd Counting With Partial Annotations in an Image
⭐code - Towards A Universal Model for Cross-Dataset Crowd Counting
- Cross-modal person re-identification
- Pedestrian detection
- Pedestrian attribute recognition
- Person Search
- Weakly Supervised Person Search with Region Siamese Networks
- End-to-End Trainable Trident Person Search Network Using Adaptive Gradient Propagation
- ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
🏠project
- Pedestrian behavior prediction
- Gait recognition
- Saliency-Associated Object Tracking
- Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
⭐code - Learning to Track Objects from Unlabeled Videos
⭐code - DepthTrack: Unveiling the Power of RGBD Tracking
⭐code - Learning Target Candidate Association To Keep Track of What Not To Track
⭐code - Transparent Object Tracking Benchmark
🏠project - Object Tracking by Jointly Exploiting Frame and Event Domain
- High-Performance Discriminative Tracking with Transformers
- Visio-Temporal Attention for Multi-Camera Multi-Target Association
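- Visual object tracking
- Satellite image tracking
- 3D multi-object tracking
- Multi-object tracking and segmentation
- Multi-object tracking
- Video object tracking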
- Rank & Sort Loss for Object Detection and Instance Segmentation
😮oral⭐code - MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
😮oral⭐code - SimROD: A Simple Adaptation Method for Robust Object Detection
😮oral🏠project
📰Explainer: ICCV2021 Oral SimROD: simple and efficient data augmentation! Huawei proposes a simple adaptation method for robust object detection - GraphFPN: Graph Feature Pyramid Network for Object Detection
- Fast Convergence of DETR with Spatially Modulated Co-Attention
⭐code - Oriented R-CNN for Object Detection
⭐code - Conditional DETR for Fast Training Convergence
📰Explainer: Speeding up DETR convergence by explicitly locating object extremity regions: Conditional DETR - Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
⭐code - G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video
😮oral - Reconcile Prediction Consistency for Balanced Object Detection
- Deep Structured Instance Graph for Distilling Object Detectors
⭐code - Towards Rotation Invariance in Object Detection
⭐code - Morphable Detector for Object Detection on Demand
⭐code - DetCo: Unsupervised Contrastive Learning for Object Detection
⭐code - Domain-Invariant Disentangled Network for Generalizable Object Detection
- Detecting Persuasive Atypicality by Modeling Contextual Compatibility
⭐code - Wanderlust: Online Continual Object Detection in the Real World
🏠project - PreDet: Large-Scale Weakly Supervised Pre-Training for Detection
- FMODetect: Robust Detection of Fast Moving Objects
- Multi-Source Domain Adaptation for Object Detection
- Self-Supervised Object Detection via Generative Image Synthesis
⭐code - Naturalistic Physical Adversarial Patch for Object Detectors
⭐code - Rethinking Transformer-Based Set Prediction for Object Detection
⭐code - Detecting Invisible People
🏠project📺video - Dynamic DETR: End-to-End Object Detection With Dynamic Attention
- CrossDet: Crossline Representation for Object Detection
⭐code - Robust Object Detection via Instance-Level Temporal Cycle Confusion
⭐code - End-to-End Semi-Supervised Object Detection With Soft Teacher
⭐code - Parallel Rectangle Flip Attack: A Query-Based Black-Box Attack Against Object Detection
- Fooling LiDAR Perception via Adversarial Trajectory Perturbation
⭐code🏠project - TOOD: Task-Aligned One-Stage Object Detection
😮oral⭐code - Active Learning for Deep Object Detection via Probabilistic Modeling
⭐code
📰Explainer: ICCV2021: still brute-force training models on massive data? Active learning shows how to pick the most valuable samples in a dataset - Dual Bipartite Graph Learning: A General Approach for Domain Adaptive Object Detection
- WB-DETR: Transformer-Based Detector without Backbone
- 3D object detection
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection
⭐code - Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
⭐code🏠project - Is Pseudo-Lidar needed for Monocular 3D Object detection?
⭐code - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
- LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
⭐code🏠project - Improving 3D Object Detection with Channel-wise Transformer
- 4D-Net for Learned Multi-Modal Alignment
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
- Voxel Transformer for 3D Object Detection
- An End-to-End Transformer Model for 3D Object Detection
😮oral⭐code🏠project - Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
- Group-Free 3D Object Detection via Transformers
⭐code - VENet: Voting Enhancement Network for 3D Object Detection
- Multi-Echo LiDAR for 3D Object Detection
- RangeDet: In Defense of Range View for LiDAR-Based 3D Object Detection
⭐code - The Devil Is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection
- Gated3D: Monocular 3D Object Detection From Temporal Illumination Cues
🏠project - Are We Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection?
- SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation
- You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking
⭐code🏠project📺video - Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
⭐code - Geometry-Based Distance Decomposition for Monocular 3D Object Detection
⭐code
- Object localization
- Image Anomaly Detection
- Weakly supervised object detection
- OOD detection
- Salient object detection
- Disentangled High Quality Salient Object Detection
- Specificity-preserving RGB-D Saliency Detection
⭐code - Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- MFNet: Multi-Filter Directive Network for Weakly Supervised Salient Object Detection
⭐code - Scene Context-Aware Salient Object Detection
⭐code - Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
⭐code - iNAS: Integral NAS for Device-Aware Salient Object Detection
🏠project - RGB-D salient object detection
- co-saliency detection
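- Prohibited item detection
- Few-shot object detection
- Visual relationship co-localization
- Dense object detection
- Domain-adaptive object detection
- Image forgery detection
- Visual Relationship Detection (VRD)
- Long-tailed object detection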
- Salient Object Ranking
- Small object detection
- Object detection in the dark
- 3D object prediction
- Multi-object detection
- 3D object grounding
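- Fine-grained crack detection
- Line segment detection
- Cell detection and classification
- Shadow detection
- Social distancing detection
- Camouflaged object detection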
- Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
😮oral⭐code📺video - TransForensics: Image Forgery Localization with Dense Self-Attention
- From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation
⭐code - Labels4Free: Unsupervised Segmentation using StyleGAN
🏠project📺video - Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency
- Scaling up instance annotation via label propagation
⭐code🏠project - Robust Trust Region for Weakly Supervised Segmentation
⭐code📺video - HPNet: Deep Primitive Segmentation Using Hybrid Representations
⭐code - Weakly Supervised Segmentation of Small Buildings With Point Labels
- BAPA-Net: Boundary Adaptation and Prototype Alignment for Cross-Domain Semantic Segmentation
⭐code - Conditional Diffusion for Interactive Segmentation
- Human Detection and Segmentation via Multi-view Consensus
⭐code - Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
- Enhanced Boundary Learning for Glass-Like Object Segmentation
⭐code - PARTS: Unsupervised segmentation with slots, attention and independence maximization
- Predictive Feature Learning for Future Segmentation Prediction
- Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
⭐code - Segmenter: Transformer for Semantic Segmentation
⭐code - C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
- Panoptic segmentation
- Semantic segmentation
- Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
😮oral⭐code - Personalized Image Semantic Segmentation
⭐code - RECALL: Replay-based Continual Learning in Semantic Segmentation
⭐code - Deep Metric Learning for Open World Semantic Segmentation
- LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation
😮oral - Dual Path Learning for Domain Adaptation of Semantic Segmentation
⭐code - Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
- Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
⭐code🏠project - Multi-Anchor Active Domain Adaptation for Semantic Segmentation
😮oral⭐code - Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
- Self-Regulation for Semantic Segmentation
⭐code - ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation
⭐code - Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
🏠project - Mining Contextual Information Beyond Image for Semantic Segmentation
⭐code - ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
⭐code - Pseudo-mask Matters in Weakly-supervised Semantic Segmentation
⭐code - SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
- Region-Aware Contrastive Learning for Semantic Segmentation
- GP-S3Net: Graph-Based Panoptic Sparse Semantic Segmentation Network
- Domain Adaptive Semantic Segmentation With Self-Supervised Depth Estimation
⭐code - Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation
😮oral⭐code - Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
⭐code - Uncertainty-Aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation
- Contrastive Learning for Label Efficient Semantic Segmentation
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
⭐code - Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
- Geometric Unsupervised Domain Adaptation for Semantic Segmentation
- Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
⭐code - Multi-View Radar Semantic Segmentation
⭐code - Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation
😮oral⭐code - Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
⭐code - Scribble-Supervised Semantic Segmentation Inference
- Semi-Supervised Semantic Segmentation With Pixel-Level Contrastive Learning From a Class-Wise Memory Bank
⭐code - Few-shot semantic segmentation
- 3D semantic segmentation
- VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
😮oral⭐code - Sparse-to-Dense Feature Matching: Intra and Inter Domain Cross-Modal Learning in Domain Adaptation for 3D Semantic Segmentation
⭐code - Weakly Supervised 3D Semantic Segmentation Using Cross-Image Consensus and Inter-Voxel Affinity Relations
- Video semantic segmentation
- Weakly supervised semantic segmentation
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
⭐code - Complementary Patch for Weakly Supervised Semantic Segmentation
- ECS-Net: Improving Weakly Supervised Semantic Segmentation by Using Connections Between Class Activation Maps
- Unlocking the Potential of Ordinary Classifier: Class-Specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation
⭐code - Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation
⭐code - Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation
- Point cloud semantic segmentation
- ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
📺video - Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation
- TempNet: Online Semantic Segmentation on Large-Scale Point Cloud Series
- Guided Point Contrastive Learning for Semi-Supervised Point Cloud Semantic Segmentation
- Learning With Noisy Labels for Robust Point Cloud Segmentation
⭐code🏠project
- OOD
- Instance segmentation
- Rank & Sort Loss for Object Detection and Instance Segmentation
😮oral⭐code - SOTR: Segmenting Objects with Transformers
⭐code - A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
- Instances as Queries
⭐code📺video - CrossVIS: Crossover Learning for Fast Online Video Instance Segmentation
⭐code📺video - CDNet: Centripetal Direction Network for Nuclear Instance Segmentation
⭐code - PrimitiveNet: Primitive Instance Segmentation With Local Primitive Embedding Under Adversarial Metric
⭐code - FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
⭐code🏠project - Prior to Segment: Foreground Cues for Weakly Annotated Classes in Partially Supervised Instance Segmentation
⭐code - DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence From Box Supervision
- End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks
- The Surprising Impact of Mask-Head Architecture on Novel Class Segmentation
🏠project - How Shift Equivariance Impacts Metric Learning for Instance Segmentation
⭐code - Parallel Detection-and-Segmentation Learning for Weakly Supervised Instance Segmentation
- Real-Time Instance Segmentation With Discriminative Orientation Maps
⭐code - Video instance segmentation
- 3D instance segmentation
- Few-shot segmentation
- Human Motion Segmentation
- Point cloud segmentation
- Video object segmentation (VOS)
- Full-Duplex Strategy for Video Object Segmentation
🏠project - Joint Inductive and Transductive Learning for Video Object Segmentation
⭐code - Hierarchical Memory Matching Network for Video Object Segmentation
⭐code - Self-supervised Video Object Segmentation by Motion Grouping
⭐code🏠project📺video - Deep Transport Network for Unsupervised Video Object Segmentation
- Generating Masks From Boxes by Mining Spatio-Temporal Consistencies in Videos
⭐code - Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation
⭐code - Video Object Segmentation With Dynamic Memory Networks and Adaptive Object Alignment
⭐code
- Semantic scene segmentation
- Referring Segmentation (text-based segmentation)
- Scene understanding
- DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
😮oral⭐code🏠project📺video - ACDC: The Adverse Conditions Dataset With Correspondences for Semantic Driving Scene Understanding
🏠project - Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
⭐code
- CMA
- Multi-object segmentation
- Action segmentation
- Scene parsing
- Image matting
- Motion segmentation
- DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
- Online Continual Learning For Visual Food Classification
- A Unified Objective for Novel Class Discovery
😮oral⭐code🏠project
📰Explainer: ICCV2021 Oral | UNO: a unified objective for "novel class discovery" that simplifies the training pipeline! Open-sourced! - Improving Generalization of Batch Whitening by Convolutional Unit Optimization
⭐code - Towards Learning Spatially Discriminative Feature Representations
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
⭐code
📰Explainer: ICCV2021: MIT-IBM Watson open-sources CrossViT: Transformers go multi-branch and multi-scale - SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
⭐code - Influence-Balanced Loss for Imbalanced Visual Classification
⭐code - Explanations for Occluded Images
⭐code🏠project📺video - Understanding Robustness of Transformers for Image Classification
- Learning Rare Category Classifiers on a Tight Labeling Budget
- Discover the Unknown Biased Attribute of an Image Classifier
⭐code - Co-Scale Conv-Attentional Image Transformers
😮oral⭐code - Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
⭐code - Do Image Classifiers Generalize Across Time?
🏠project - Interpretable Image Recognition by Constructing Transparent Embedding Space
⭐code - The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory
- Long-tailed recognition
- Parametric Contrastive Learning
⭐code - ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot
😮oral⭐code - Self Supervision to Distillation for Long-Tailed Visual Recognition
⭐code - Distilling Virtual Examples for Long-Tailed Recognition
- Distributional Robustness Loss for Long-Tail Learning
- GistNet: A Geometric Structure Transfer Network for Long-Tailed Recognition
- Long-tailed visual relationship recognition
- Fine-grained recognition
- Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
⭐code - Learning Canonical 3D Object Representation for Fine-Grained Recognition
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
⭐code - N-ImageNet: Towards Robust, Fine-Grained Object Recognition With Event Cameras
- Grafit: Learning fine-grained image representations with coarse labels
- Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition
⭐code
- Few-shot classification
- Transductive Few-Shot Classification on the Oblique Manifold
- Relational Embedding for Few-Shot Classification
⭐code🏠project - Binocular Mutual Learning for Improving Few-shot Classification
⭐code - Partner-Assisted Learning for Few-Shot Image Classification
- On the Importance of Distractors for Few-Shot Classification
⭐code - Few-Shot Image Classification: Just Use a Library of Pre-Trained Feature Extractors and a Simple Classifier
- Universal Representation Learning From Multiple Domains for Few-Shot Classification
⭐code - A Multi-Mode Modulator for Multi-Domain Few-Shot Classification
- Variational Feature Disentangling for Fine-Grained Few-Shot Classification
⭐code - Mixture-Based Feature Space Learning for Few-Shot Image Classification
⭐code🏠project📺video
- Multi-label classification
- Greedy Gradient Ensemble for Robust Visual Question Answering
⭐code - Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
⭐code - Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
⭐code - Unshuffling Data for Improved Generalization in Visual Question Answering
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering (https://github.com/rentainhe/TRAR-VQA/)
- Contrast and Classify: Training Robust VQA Models
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
⭐code - Auto-Parsing Network for Image Captioning and Visual Question Answering
- video question answering
- Just Ask: Learning to Answer Questions from Millions of Narrated Videos
😮oral⭐code🏠project - Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
🏠project - Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments
🌻dataset - On the Hidden Treasure of Dialog in Video Question Answering
⭐code🏠project - HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
- Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature
⭐code
- A-VQA
- Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
📺video - Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
📺video - Towards the Unseen: Iterative Text Recognition by Distilling from Errors
📺video - Arbitrary-shaped text detection
- Scene text recognition
- Scene text replacement
- Document image extraction
- Handwritten text generation
- Table Structure Recognition
- Action Detection and Recognition (human actions)
- Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
⭐code - MGSampler: An Explainable Sampling Strategy for Video Action Recognition
⭐code - Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
- Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
- Class Semantics-based Attention for Action Detection
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
⭐code - AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
⭐code - OadTR: Online Action Detection With Transformers
⭐code - Self-Supervised 3D Skeleton Action Representation Learning With Motion Consistency and Continuity
- Interactive Prototype Learning for Egocentric Action Recognition
- Efficient Action Recognition via Dynamic Knowledge Propagation
- Else-Net: Elastic Semantic Network for Continual Action Recognition From Skeleton Data
- Learning Self-Similarity in Space and Time As Generalized Motion for Video Action Recognition
⭐code🏠project - Temporal Action Detection With Multi-Level Supervision
⭐code - Watch Only Once: An End-to-End Video Action Detection Framework
⭐code - Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
😮oral - Geometric Deep Neural Network Using Rigid and Non-Rigid Transformations for Human Action Recognition
- Just One Moment: Structural Vulnerability of Deep Action Recognition Against One Frame Attack
- Evidential Deep Learning for Open Set Action Recognition
⭐code🏠project📺video - Learning an Augmented RGB Representation With Cross-Modal Knowledge Distillation for Action Detection
- Class-Incremental Learning for Action Recognition in Videos
- D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
⭐code - Zero-shot action recognition
- Temporal Action Localization
- Enriching Local and Global Contexts for Temporal Action Localization
⭐code - Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
😮oral⭐code - Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
⭐code - Video Self-Stitching Graph Network for Temporal Action Localization
- Divide and Conquer for Single-Frame Temporal Action Localization
- CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization
- Temporal Action Proposal Generation
- Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
- Action Quality Assessment
- Video Rescaling
- Video activity localisation
- Video Inpainting
- Internal Video Inpainting by Implicit Long-range Propagation
⭐code🏠project - Occlusion-Aware Video Object Inpainting
🏠project - FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
⭐code - Flow-Guided Video Inpainting with Scene Templates
⭐code - Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
- Video Analysis
- Video Editing
- Video Captioning
- Video Coding
- Video Generation
- Video Relation Detection
- Video Grounding
- Video Highlight Detection
- Cross-category Video Highlight Detection via Set-based Learning
⭐code - PR-Net: Preference Reasoning for Personalized Video Highlight Detection
- HighlightMe: Detecting Highlights from Human-Centric Videos
- Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
- Joint Visual and Audio Learning for Video Highlight Detection
- Video Recognition
- Searching for Two-Stream Models in Multivariate Space for Video Recognition
- Adaptive Focus for Efficient Video Recognition
😮oral⭐code - AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
⭐code🏠project - TAM: Temporal Adaptive Module for Video Recognition
⭐code - Condensing a Sequence to One Informative Frame for Video Recognition
- VideoLT: Large-Scale Long-Tailed Video Recognition
⭐code - Motion-Augmented Self-Training for Video Recognition at Smaller Scale
- Multi-Modal Multi-Action Video Recognition
⭐code
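- Motion Retargeting
- Video Prediction
- Video Synthesis
- Video Frame Interpolation
- Deepfake Video Detection
- Video Stabilization
- Video Frame-level Similarity Learning
- Video Compression
- Video Moment Retrieval
- Video Summarization
- Video Quality Assessment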
- Video Grounding
- Video Localization
- Video Reasoning
- Video-Related
- Video Anomaly Detection
- Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos
- A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction
⭐code
📰Explainer: ICCV 2021 oral: reconstruction plus prediction, a two-pronged approach to better video anomaly detection - Weakly-Supervised Video Anomaly Detection With Robust Temporal Feature Magnitude Learning
⭐code
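- Video Denoising
- Video Portrait Relighting
- Video Temporal Localization
- Video Correspondence
- Video Matting
- Video Coding
- Recognizing Interactions in Video
- Video Deblurring
- Video Understanding
- Video Reconstruction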
- Human Pose Regression with Residual Log-likelihood Estimation
😮oral⭐code - Online Knowledge Distillation for Efficient Pose Estimation
- DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders
😮oral⭐code - Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation
😮oral⭐code - Dynamical Pose Estimation
⭐code📺video - Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation
⭐code🏠project - Egocentric Pose Estimation From Human Vision Span
- Learning Privacy-Preserving Optics for Human Pose Estimation
😮oral⭐code🏠project📺video - TokenPose: Learning Keypoint Tokens for Human Pose Estimation
⭐code - Motion Adaptive Pose Estimation from Compressed Videos
- 3D Human Pose Estimation
- PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
😮oral⭐code🏠project - HuMoR: 3D Human Motion Model for Robust Pose Estimation
😮oral🏠project📺video - Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
⭐code📺video - Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
⭐code - EventHPE: Event-based 3D Human Pose and Shape Estimation
⭐code - imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
- Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- Learning to Regress Bodies from Images using Differentiable Semantic Rendering
🏠project - Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild
⭐code - 3D Human Pose Estimation With Spatial and Temporal Transformers
⭐code📺video - PARE: Part Attention Regressor for 3D Human Body Estimation
⭐code🏠project📺video - Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions
- UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model
⭐code - Modulated Graph Convolutional Network for 3D Human Pose Estimation
⭐code - Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation
⭐code🏠project📺video - Estimating Egocentric 3D Human Pose in Global Space
🏠project📺video - Camera Distortion-Aware 3D Human Pose Estimation in Video With Optimization-Based Meta-Learning
⭐code - EM-POSE: 3D Human Pose Estimation From Sparse Electromagnetic Trackers
⭐code🏠project📺video - Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation
🏠project
- SPEC: Seeing People in the Wild with an Estimated Camera
⭐code🏠project📺video - Encoder-Decoder With Multi-Level Attention for 3D Human Shape and Pose Estimation
⭐code - 3D Pose Transfer
- Hand Pose
- Gesture Synthesis
- Gesture Recognition
- 3D Hand Pose
- HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
⭐code - EventHands: Real-Time Neural 3D Hand Pose Estimation From an Event Stream
⭐code🏠project📺video - Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning
⭐code🏠project📺video
- Interacting Hand Pose Estimation
- 3D Hand Mesh Modeling
- Towards Accurate Alignment in Real-Time 3D Hand-Mesh Reconstruction
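- Hand Mesh Recovery
- Gesture Learning
- Gesture Reconstruction
- 3D Mesh Synthesis
- Human Body Reconstruction
- 4D Human Capture
- Human Pose Estimation and Synthesis
- Multi-Person Pose Estimation
- Human/Object Pose Keypoint Detection
- Human Motion Capture
- 2D Human Pose Estimation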
- Human Action Video Alignment
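- 3D Pose Transfer
- Human Mesh Recovery
- Distance Estimation from Human Pose
- 3D Human Body
- Motion Synthesis
- 3D Animation
- Category-Level Garment Pose Estimation
- Clothed Human Modeling
- Keypoint Localization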
- Spatial-Temporal Transformer for Dynamic Scene Graph Generation
⭐code📺video - Unconditional Scene Graph Generation
🏠project - Target Adaptive Context Aggregation for Video Scene Graph Generation
⭐code - Learning to Generate Scene Graph from Natural Language Supervision
⭐code - Segmentation-Grounded Scene Graph Generation
⭐code - Context-aware Scene Graph Generation with Seq2Seq Transformer
⭐code - A Simple Baseline for Weakly-Supervised Scene Graph Generation
⭐code - Generative Compositional Augmentations for Scene Graph Prediction
⭐code - From General to Specific: Informative Scene Graph Generation via Balance Adjustment
⭐code - Scene Synthesis
- AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
⭐code🏠project - Adaptive Graph Convolution for Point Cloud Analysis
⭐code - Learning Inner-Group Relations on Point Clouds
- CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds
⭐code - Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks
⭐code📺video - PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds
⭐code - 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
- Differentiable Convolution Search for Point Cloud Processing
- Superpoint Network for Point Cloud Oversegmentation
⭐code - PU-EVA: An Edge-Vector Based Approximation Solution for Flexible-Scale Point Cloud Upsampling
- SGMNet: Learning Rotation-Invariant Point Cloud Representations via Sorted Gram Matrix
- DWKS: A Local Descriptor of Deformations Between Meshes and Point Clouds
⭐code - Robustness Certification for Point Cloud Models
⭐code - Vector Neurons: A General Framework for SO(3)-Equivariant Networks
⭐code - Unsupervised Point Cloud Pre-Training via Occlusion Completion
⭐code - Towards Efficient Graph Convolutional Networks for Point Cloud Handling
⭐code - Progressive Seed Generation Auto-Encoder for Unsupervised Point Cloud Learning
- Point Cloud Denoising
- Point Cloud Registration
- HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
⭐code🏠project - (Just) A Spoonful of Refinements Helps the Registration Error Go Down
😮oral⭐code - A Robust Loss for Point Cloud Registration
- Deep Hough Voting for Robust Global Registration
- Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
⭐code - Feature Interactive Representation for Point Cloud Registration
- LSG-CPD: Coherent Point Drift With Local Surface Geometry for Point Cloud Registration
⭐code📺video - OMNet: Learning Overlapping Mask for Partial-to-Partial Point Cloud Registration
⭐code - DeepPRO: Deep Partial Point Cloud Registration of Objects
- Provably Approximated Point Cloud Registration
- Bootstrap Your Own Correspondences (point cloud registration)
- Distinctiveness Oriented Positional Equilibrium for Point Cloud Registration
- 3D Point Clouds
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
⭐code - Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
⭐code🏠project - Point Transformer
- Point-Set Distances for Learning Representations of 3D Point Clouds
- PointBA: Towards Backdoor Attacks in 3D Point Cloud
- Minimal Adversarial Examples for Deep Learning on 3D Point Clouds
- 3D Point Cloud Reconstruction
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
- Point Cloud Completion
- SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
😮oral⭐code - ME-PCN: Point Completion Conditioned on Mask Emptiness
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
😮oral⭐code - Voxel-based Network for Shape Completion by Leveraging Edge Generation
⭐code - RFNet: Recurrent Forward Network for Dense Point Cloud Completion
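- Point Cloud Augmentation
- Point Cloud Shape Analysis
- Point Cloud Analysis
- 3D Point Cloud Classification
- 3D Point Cloud Generation and Completion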
- point cloud object co-segmentation
- Point Cloud Understanding
- Domain Adaptation
- Transporting Causal Mechanisms for Unsupervised Domain Adaptation
😮oral
⭐code - Generalized Source-free Domain Adaptation
⭐code - Semantic Concentration for Domain Adaptation
⭐code - PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
⭐code - Learning Cross-modal Contrastive Features for Video Domain Adaptation
- Zero-Shot Day-Night Domain Adaptation With a Physics Prior
😮oral⭐code - Active Universal Domain Adaptation
- Re-energizing Domain Discriminator with Sample Relabeling for Adversarial Domain Adaptation
- OVANet: One-vs-All Network for Universal Domain Adaptation
⭐code - Collaborative Optimization and Aggregation for Decentralized Domain Generalization and Adaptation
- Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network
⭐code - Information-Theoretic Regularization for Multi-Source Domain Adaptation
- Gradient Distribution Alignment Certificates Better Adversarial Domain Adaptation
- Adaptive Adversarial Network for Source-Free Domain Adaptation
⭐code - T-SVDNet: Exploring High-Order Prototypical Correlations for Multi-Source Domain Adaptation
⭐code - Self-Supervised Domain Adaptation for Forgery Localization of JPEG Compressed Images
- ECACL: A Holistic Framework for Semi-Supervised Domain Adaptation
⭐code - STEM: An approach to Multi-source Domain Adaptation with Guarantees
- Towards Novel Target Discovery Through Open-Set Domain Adaptation
⭐code - Deep Co-Training With Task Decomposition for Semi-Supervised Domain Adaptation
⭐code - mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets
- Geometry-Aware Self-Training for Unsupervised Domain Adaptation on Object Point Clouds
⭐code - Unsupervised Domain Adaptation
- Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
- Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
😮oral - Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
⭐code - Adversarial Robustness for Unsupervised Domain Adaptation
🏠project - SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation
⭐code
- Zero-Shot Domain Adaptation
- Transporting Causal Mechanisms for Unsupervised Domain Adaptation
- Domain Generalization
- Domain Generalization via Gradient Surgery
⭐code - Learning to Diversify for Single Domain Generalization
⭐code - Shape-Biased Domain Generalization via Shock Graph Embeddings
- SelfReg: Self-Supervised Contrastive Regularization for Domain Generalization
- A Style and Semantic Memory Mechanism for Domain Generalization
- Confidence Calibration for Domain Generalization Under Covariate Shift
- A Simple Feature Augmentation for Domain Generalization
- Few-Shot Learning
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
- Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
⭐code - Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning
- Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
😮oral⭐code - Z-Score Normalization, Hubness, and Few-Shot Learning
- Pseudo-Loss Confidence Metric for Semi-Supervised Few-Shot Learning
- Curvature Generation in Curved Spaces for Few-Shot Learning
⭐code - Task-Aware Part Mining Network for Few-Shot Learning
- Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
⭐code - UVStyle-Net: Unsupervised Few-Shot Learning of 3D Style Similarity Measure for B-Reps
⭐code - Shallow Bayesian Meta Learning for Real-World Few-Shot Recognition
⭐code - Iterative Label Cleaning for Transductive and Semi-Supervised Few-Shot Learning
⭐code - Coarsely-labeled Data for Better Few-shot Transfer
⭐code - Few-Shot Anomaly Detection
- Zero-Shot Learning
- In-Place Scene Labelling and Understanding with Implicit Scene Representation
😮oral🏠project📺video - Differentiable Surface Rendering via Non-Differentiable Sampling
- Self-Calibrating Neural Radiance Fields
⭐code - NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
😮oral⭐code🏠project - Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering
⭐code🏠project - CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
⭐code - MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo
⭐code🏠project📺video - PlenOctrees for Real-Time Rendering of Neural Radiance Fields
😮oral⭐Conversion Code⭐Viewer Code🏠project📺video - Neural Radiance Flow for 4D View Synthesis and Video Processing
⭐code🏠project - Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies
⭐code🏠project📺video
📰Explainer: Zhejiang University's 3D vision group proposes Animatable NeRF, reconstructing animatable human models from RGB video (ICCV'21) - GNeRF: GAN-Based Neural Radiance Field Without Posed Camera
😮oral - BARF: Bundle-Adjusting Neural Radiance Fields
😮oral⭐code🏠project - FastNeRF: High-Fidelity Neural Rendering at 200FPS
🏠project📺video - PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
⭐code📺video - NeRD: Neural Reflectance Decomposition from Image Collections
⭐code🏠project📺video - Editing Conditional Radiance Fields
⭐code🏠project📺video - GRF: Learning a General Radiance Field for 3D Representation and Rendering
⭐code - 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface
⭐code📺video - KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
⭐code - Neural Articulated Radiance Field
⭐code - Baking Neural Radiance Fields for Real-Time View Synthesis
🏠project📺video - Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
⭐code🏠project - Nerfies: Deformable Neural Radiance Fields
⭐code🏠project📺video - Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
😮oral⭐code🏠project📺video - 3D Rendering
- 3D Photography
- Rendering
- Clustering by Maximizing Mutual Information Across Views
- Learning Hierarchical Graph Neural Networks for Image Clustering
⭐code - One-Pass Multi-View Clustering for Large-Scale Data
- End-to-End Robust Joint Unsupervised Image Alignment and Clustering
- Graph Contrastive Clustering
⭐code - Face Clustering
- Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
- SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
- Self-Mutual Distillation Learning for Continuous Sign Language Recognition
- Visual Alignment Constraint for Continuous Sign Language Recognition
⭐code - Sign Language Translation
- Bias Loss for Mobile Neural Networks
⭐code - Improve Unsupervised Pretraining for Few-label Transfer
- Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
- Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
- Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
- Robustness via Cross-Domain Ensembles
😮oral⭐code🏠project📺video - Warp Consistency for Unsupervised Learning of Dense Correspondences
😮oral⭐code - Few-Shot and Continual Learning with Attentive Independent Mechanisms
⭐code - Out-of-Core Surface Reconstruction via Global TGV Minimization
- ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Multi-scale Matching Networks for Semantic Correspondence
⭐code - Learning with Noisy Labels via Sparse Regularization
⭐code - CanvasVAE: Learning to Generate Vector Graphic Documents
- Toward Spatially Unbiased Generative Models
⭐code - Learning Compatible Embeddings
⭐code - Instance Similarity Learning for Unsupervised Feature Representation
⭐code - Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
⭐code - Unifying Nonlocal Blocks for Neural Networks
⭐code - Impact of Aliasing on Generalization in Deep Convolutional Networks
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
⭐code - ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
- m-RevNet: Deep Reversible Neural Networks with Momentum (suspected academic misconduct; a withdrawal has been requested)
- Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
- MT-ORL: Multi-Task Occlusion Relationship Learning
⭐code - Finding Representative Interpretations on Convolutional Neural Networks
- Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
⭐code - PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
- Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
⭐code - Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
⭐code - Structured Outdoor Architecture Reconstruction by Exploration and Classification
- Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
⭐code - A New Journey from SDRTV to HDRTV
⭐code - A Simple Framework for 3D Lensless Imaging with Programmable Masks
⭐code - Causal Attention for Unbiased Visual Recognition
⭐code - Learning to Match Features with Seeded Graph Matching Network
⭐code - Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
😮oral - Towards Understanding the Generative Capability of Adversarially Robust Classifiers
😮oral - Ranking Models in Unlabeled New Environments
⭐code - Learning of Visual Relations: The Devil is in the Tails
🏠project - BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
⭐code - Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
- Debiasing
- Full-Velocity Radar Returns by Radar-Camera Fusion
- CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
⭐code🏠project - NGC: A Unified Framework for Learning with Open-World Noisy Data
- LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
🏠project - Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
- Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
⭐code - Digging into Uncertainty in Self-supervised Multi-view Stereo
- Learning to Discover Reflection Symmetry via Polar Matching Convolution
⭐code🏠project - A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
- The Functional Correspondence Problem
- The Animation Transformer: Visual Correspondence via Segment Matching
- Parsing Table Structures in the Wild
⭐code - Square Root Marginalization for Sliding-Window Bundle Adjustment
⭐code🏠project📺video - Hierarchical Object-to-Zone Graph for Object Navigation
⭐code📺video - Robustness and Generalization via Generative Adversarial Training
- Learning Fast Sample Re-weighting Without Reward Data
⭐code - ReconfigISP: Reconfigurable Camera Image Processing Pipeline
🏠project - Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting
😮oral - Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
- DisUnknown: Distilling Unknown Factors for Disentanglement Learning
⭐code🏠project - S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
🏠project - ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity
📺video - Photon-Starved Scene Inference using Single Photon Cameras
📺video - OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution
⭐code🏠project - Learning to Estimate Hidden Motions with Global Motion Aggregation
⭐code📺video - Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
- Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness
⭐code - Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
😮oral - Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice
😮oral⭐code - Neural Strokes: Stylized Line Drawing of 3D Shapes
⭐code - Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
🏠project - Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation
⭐code - Exploiting Explanations for Model Inversion Attacks
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
- RDI-Net: Relational Dynamic Inference Networks
⭐code - ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators
⭐code - T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning
⭐code - Learning To Stylize Novel Views
⭐code🏠project - A Lazy Approach to Long-Horizon Gradient-Based Meta-Learning
- Viewing Graph Solvability via Cycle Consistency
😮oral⭐code
🏆Best paper honorable mention - SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-Powered Intelligent PhlatCam
⭐code - Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning
- Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
⭐code - Batch Normalization Increases Adversarial Vulnerability and Decreases Adversarial Transferability: A Non-Robust Feature Perspective
- DeepCAD: A Deep Generative Network for Computer-Aided Design Models
🏠project - Better Aggregation in Test-Time Augmentation
- Self-Born Wiring for Neural Trees
- Detector-Free Weakly Supervised Grounding by Separation
- Motion-Aware Dynamic Architecture for Efficient Frame Interpolation
- Relating Adversarially Robust Generalization to Flat Minima
- Bit-Mixer: Mixed-Precision Networks With Runtime Bit-Width Selection
- AINet: Association Implantation for Superpixel Segmentation
⭐code - Orthogonal Projection Loss
⭐code - Knowledge-Enriched Distributional Model Inversion Attacks
⭐code - Architecture Disentanglement for Deep Neural Networks
⭐code - On Equivariant and Invariant Learning of Object Landmark Representations
⭐code🏠project - Predicting with Confidence on Unseen Distributions
- Embed Me If You Can: A Geometric Perceptron
⭐code - Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation
- HIRE-SNN: Harnessing the Inherent Robustness of Energy-Efficient Deep Spiking Neural Networks by Training With Crafted Input Noise
⭐code - Towards Memory-Efficient Neural Networks via Multi-Level In Situ Generation
- From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images
🏠project - MBA-VO: Motion Blur Aware Visual Odometry
⭐code - STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing
- Explaining Local, Global, And Higher-Order Interactions In Deep Learning
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
⭐code - Homogeneous Architecture Augmentation for Neural Predictor
⭐code - SS-IL: Separated Softmax for Incremental Learning
- VSAC: Efficient and Accurate Estimator for H and F
- Fusion Moves for Graph Matching
⭐code🏠project - Geometric Granularity Aware Pixel-To-Mesh
- Modulated Periodic Activations for Generalizable Local Functional Representations
🏠project - Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents
⭐code🏠project📺video - A Dark Flash Normal Camera
🏠project📺video - Pri3D: Can 3D Priors Help 2D Representation Learning?
⭐code📺video - Membership Inference Attacks Are Easier on Difficult Problems
- Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
⭐code🏠project - MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
- Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
🏠project - DCT-SNN: Using DCT To Distribute Spatial Information Over Time for Low-Latency Spiking Neural Networks
⭐code - Learning To Resize Images for Computer Vision Tasks
- Field of Junctions: Extracting Boundary Structure at Low SNR
- DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
- Learning To Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data
⭐code - Graph-based Asynchronous Event Processing for Rapid Object Recognition
- Ranking Models in Unlabeled New Environments
⭐code - A Hybrid Frequency-Spatial Domain Model for Sparse Image Reconstruction in Scanning Transmission Electron Microscopy
⭐code - MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing
- Efficient Large Scale Inlier Voting for Geometric Vision Problems
⭐code - Aggregation With Feature Detection
- ReCU: Reviving the Dead Weights in Binary Neural Networks
⭐code - Deep Halftoning With Reversible Binary Pattern
- FFT-OT: A Fast Algorithm for Optimal Transportation
- Progressive Correspondence Pruning by Consensus Learning
⭐code🏠project
📰Explainer: Progressive correspondence pruning based on consensus learning (ICCV 2021) - Multispectral Illumination Estimation Using Deep Unrolling Network
- Distilling Global and Local Logits With Densely Connected Relations
- Learning specialized activation functions with the Piecewise Linear Unit
- Adaptive Convolutions With Per-Pixel Dynamic Filter Atom
- Deep Matching Prior: Test-Time Optimization for Dense Correspondence
⭐code - Calibrated and Partially Calibrated Semi-Generalized Homographies
⭐code - The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data
⭐code - EC-DARTS: Inducing Equalized and Consistent Optimization Into DARTS
- Refining activation downsampling with SoftPool
- FATNN: Fast and Accurate Ternary Neural Networks
⭐code - GTT-Net: Learned Generalized Trajectory Triangulation
- Deep Permutation Equivariant Structure from Motion
⭐code - Extending Neural P-frame Codecs for B-frame Coding
- Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning
- SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks
⭐code - AA-RMVSNet: Adaptive Aggregation Recurrent Multi-View Stereo Network
⭐code - Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
⭐code - Orthographic-Perspective Epipolar Geometry
- Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
⭐code - PixelPyramids: Exact Inference Models From Lossless Image Pyramids
⭐code - SurfaceNet: Adversarial SVBRDF Estimation from a Single Image
⭐code - Adaptive Curriculum Learning
- Sparse-Shot Learning With Exclusive Cross-Entropy for Extremely Many Localisations
- Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
- RobustNav: Towards Benchmarking Robustness in Embodied Navigation
⭐code - Generating Attribution Maps With Disentangled Masked Backpropagation
- Spectral Leakage and Rethinking the Kernel Size in CNNs
⭐code - What You Can Learn by Staring at a Blank Wall
- Neural TMDlayer: Modeling Instantaneous Flow of Features via SDE Generators
- CLEAR: Clean-up Sample-Targeted Backdoor in Neural Networks
- Learning To Hallucinate Examples From Extrinsic and Intrinsic Supervision
- Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics
- GridToPix: Training Embodied Agents With Minimal Supervision
🏠project📺video - Differentiable Dynamic Wirings for Neural Networks
- JEM++: Improved Techniques for Training JEM
⭐code - X-World: Accessibility, Vision, and Autonomy Meet
- Memory-augmented Dynamic Neural Relational Inference
- Physics-based Differentiable Depth Sensor Simulation
- Hypergraph Neural Networks for Hypergraph Matching
⭐code - Visual Grounding
- Cortical Surface Shape Analysis Based on Alexandrov Polyhedra
- FcaNet: Frequency Channel Attention Networks
⭐code - Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning
- Structured Outdoor Architecture Reconstruction by Exploration and Classification
⭐code - ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
- Testing Using Privileged Information by Adapting Features With Statistical Dependence
- Virtual Light Transport Matrices for Non-Line-of-Sight Imaging
😮oral - DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training
- Contrastive Multimodal Fusion with TupleInfoNCE
- Learning Better Visual Data Similarities via New Grouplet Non-Euclidean Embedding
- An Elastica Geodesic Approach With Convexity Shape Prior
- Inverting a Rolling Shutter Camera: Bring Rolling Shutter Images to High Framerate Global Shutter Video
- Multimodal Knowledge Expansion
⭐code - Direct Differentiable Augmentation Search
⭐code - The Functional Correspondence Problem
🏠project - Joint Topology-Preserving and Feature-Refinement Network for Curvilinear Structure Segmentation
⭐code - Generative Layout Modeling Using Constraint Graphs
- Self-Supervised Image Prior Learning with GMM from a Single Noisy Image
⭐code - Deep Implicit Surface Point Prediction Networks
⭐code🏠project📺video - Poly-NL: Linear Complexity Non-local Layers With 3rd Order Polynomials
- Factorizing Perception and Policy for Interactive Instruction Following
⭐code - Group-Wise Inhibition Based Feature Regularization for Robust Classification
⭐code - Searching for Robustness: Loss Learning for Noisy Classification Tasks
- Statistically Consistent Saliency Estimation
- Practical Relative Order Attack in Deep Ranking
⭐code - Q-Match: Iterative Shape Matching via Quantum Annealing
⭐code🏠project - Learning To Better Segment Objects From Unseen Classes With Unlabeled Videos
🏠project📺video - Globally Optimal and Efficient Manhattan Frame Estimation by Delimiting Rotation Search Space
- Cross-Encoder for Unsupervised Gaze Representation Learning
- Hierarchical Disentangled Representation Learning for Outdoor Illumination Estimation and Editing
- NeuSpike-Net: High Speed Video Reconstruction via Bio-Inspired Neuromorphic Cameras
- Local Temperature Scaling for Probability Calibration
- LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
- Conformer: Local Features Coupling Global Representations for Visual Recognition
⭐code - Reliably Fast Adversarial Training via Latent Adversarial Perturbation
- PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks
- A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
⭐code🏠project📺video - ICON: Learning Regular Maps Through Inverse Consistency
- Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing
⭐code - Kernel Methods in Hyperbolic Spaces
- Cross-Camera Convolutional Color Constancy
- BlockPlanner: City Block Generation with Vectorized Graph Representation
- A Machine Teaching Framework for Scalable Recognition
- Clothed Human Bodies
- Transfer Learning
- Active Recognition (AR)
- 3D Photography
- Sub-Bit Neural Networks: Learning To Compress and Accelerate Binary Neural Networks
⭐code - When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes
⭐code - Physics-Enhanced Machine Learning for Virtual Fluorescence Microscopy
⭐code - Ground-truth or DAER: Selective Re-query of Secondary Information
⭐code - Can Shape Structure Features Improve Model Robustness Under Diverse Adversarial Settings?
- Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data
- Sparse Needlets for Lighting Estimation with Spherical Transport Loss
- Semantic Perturbations with Normalizing Flows for Improved Generalization
- Differentiable Surface Rendering via Non-Differentiable Sampling
- Towards Robustness of Deep Neural Networks via Regularization
- Objects as Cameras: Estimating High-Frequency Illumination from Shadows
- Inference of Black Hole Fluid-Dynamics From Sparse Interferometric Measurements
- Removing the Bias of Integral Pose Regression
- A Light Stage on Every Desk
🏠project - Multi-Level Curriculum for Training a Distortion-Aware Barrel Distortion Rectification Model
- Generic Event Boundary Detection: A Benchmark for Event Segmentation
- Extreme Structure from Motion for Indoor Panoramas without Visual Overlaps
⭐code - Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
⭐code - VaPiD: A Rapid Vanishing Point Detector via Learned Optimizers
- Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images
⭐code - Efficient and Differentiable Shadow Computation for Inverse Problems
- Minimal Cases for Computing the Generalized Relative Pose using Affine Correspondences
- Radial Distortion Invariant Factorization for Structure from Motion
- LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments
- Transforms Based Tensor Robust PCA: Corrupted Low-Rank Tensors Recovery via Convex Optimization
- Synchronization of Group-labelled Multi-graphs
- Robust Watermarking for Deep Neural Networks via Bi-Level Optimization
- CrossNorm and SelfNorm for Generalization under Distribution Shifts
⭐code - Learning Temporal Dynamics from Cycles in Narrated Video
🏠project - von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning
- Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts
⭐code - Me-Momentum: Extracting Hard Confident Examples From Noisily Labeled Data
⭐code - ProFlip: Targeted Trojan Attack with Progressive Bit Flips
- Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion
- AdvRush: Searching for Adversarially Robust Neural Architectures
- Improving Robustness Against Common Corruptions With Frequency Biased Models
- UASNet: Uncertainty Adaptive Sampling Network for Deep Stereo Matching
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration
⭐code - Field Convolutions for Surface CNNs
😮oral⭐code - SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
- Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation
- Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks
⭐code - Real-Time Vanishing Point Detector Integrating Under-Parameterized RANSAC and Hough Transform
- Low-Rank Tensor Completion by Approximating the Tensor Average Rank
- Rotation Averaging in a Split Second: A Primal-Dual Method and a Closed-Form for Cycle Graphs
⭐code - Effectively Leveraging Attributes for Visual Similarity
⭐code - Localized Simple Multiple Kernel K-means
⭐code - SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
- PT-CapsNet: A Novel Prediction-Tuning Capsule Network Suitable for Deeper Architectures
⭐code - Generalized Shuffled Linear Regression
⭐code - The Animation Transformer: Visual Correspondence via Segment Matching
- Weak Adaptation Learning: Addressing Cross-Domain Data Insufficiency With Weak Annotator
- Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation
- Procrustean Training for Imbalanced Deep Learning