ICCV 2021: Latest Information and Accepted Papers/Code (Continuously Updated)

Official website: http://iccv2021.thecvf.com/home
Conference dates: October 11-17, 2021

❗❗❗🌟🌟🌟📗📗📗All papers accepted to ICCV 2021 have been released, 1612 in total. To download them, reply "paper" to the 【我爱计算机视觉】 account.

❗❗❗🌟🌟🌟All papers have been roughly categorized; see the index below.

🐶	🐭	🐹	🐯
65.Optical Flow Estimation
61.Metric Learning	62.Open-Set Recognition	63.Data Augmentation	64.Anomaly Detection
57.Image Matching	58.Computational Photography(optics, geometry, light-field imaging)	59.Graph Neural Networks	60.Federated Learning
53.Vision Localization	54.Sketch recognition	55.Activity Recognition	56.Dataset
49.Human-Object Interaction	50.Continual Learning	51.View Synthesis	52.Vision-and-Language
45.Image Caption	46.Defect Detection	47.NAS	48.6DoF
41.Out-of-Distribution Detection(OOD)	42.Visual Representation Learning	43.Dense Prediction	44.Human motion prediction
37.Multitask Learning	38.Weakly/Semi-Supervised/Self-supervised/Unsupervised Learning	39.Incremental Learning	40.Metric Learning
33.Remote Sensing Images	34.Image Super-Resolution	35.Quantization/Pruning/Knowledge Distillation/Model Compression(including model expansion and optimization)	36.SLAM/AR/VR/Robotics
29.Image Retrieval	30.Image Generation/synthesis	31.Style Transfer	32.Audio/Speech
25.Medical Image	26.Image Processing	27.Multi-label image recognition	28.Contrastive Learning
21.Active Learning	22.GAN	23.Gaze Estimation	24.Face
17.3D Vision	18.Transformers	19.Self-Driving Vehicles	20.Adversarial Learning
13.Image Segmentation	14.Object Detection	15.Object Tracking	16.Re-Identification
9.Video	10.OCR	11.Visual Question Answering	12.Image/Fine-Grained Classification
5.Few-Shot/Zero-Shot Learning;Domain Generalization/Adaptation	6.Point Cloud	7.Scene Graph Generation	8.Human Pose Estimation
1.Other	2.Sign Language Recognition	3.Image Clustering	4.Neural rendering

65.Optical Flow Estimation

Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation
⭐code
High-Resolution Optical Flow from 1D Attention and Correlation
😮oral⭐code
GyroFlow: Gyroscope-Guided Unsupervised Optical Flow Learning
⭐code
Sensor-Guided Optical Flow
⭐code

64.Anomaly Detection

Surface anomaly detection
- DRÆM – A discriminatively trained reconstruction embedding for surface anomaly detection
  ⭐code
Anomaly detection
- Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
- Learning Unsupervised Metaformer for Anomaly Detection
  Addresses classification or localization of image anomalies

63.Data Augmentation

DivAug: Plug-In Automated Data Augmentation With Explicit Diversity Maximization
⭐code
TrivialAugment: Tuning-Free Yet State-of-the-Art Data Augmentation
😮oral⭐code
Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images With Artificial Neural Networks
A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation

62.Open-Set Recognition

OpenGAN: Open-Set Recognition via Open Data Generation
🏆Best Paper Honorable Mention
Conditional Variational Capsule Network for Open Set Recognition
⭐code

61.Metric Learning

Do Different Deep Metric Learning Losses Lead to Similar Learned Features?
⭐code
Learning With Memory-Based Virtual Classes for Deep Metric Learning
⭐code

60.Federated Learning

Federated Learning for Non-IID Data via Unified Feature Learning and Optimization Objective Alignment
Ensemble Attention Distillation for Privacy-Preserving Federated Learning

59.Graph Neural Networks

Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks
PoGO-Net: Pose Graph Optimization With Graph Neural Networks
⭐code
Dynamic Dual Gating Neural Networks
⭐code

58.Computational Photography(optics, geometry, light-field imaging)

An Asynchronous Kalman Filter for Hybrid Event Cameras
⭐code
4D Cloud Scattering Tomography
Snapshot compressive imaging
- Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging
  ⭐code
Light field
Compressive imaging
- Time-Multiplexed Coded Aperture Imaging: Learned Coded Aperture and Pixel Exposures for Compressive Imaging Systems
Homography Estimation
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation
Computational imaging
- Extreme-Quality Computational Imaging via Degradation Framework
Optical aberration correction
- Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution
  ⭐code

57.Image Matching

Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes
Feature point matching
- P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching
  ⭐code

56.Dataset

Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm Under Mixed Illumination
🌻dataset
FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters
🌻dataset
A dataset and benchmark for detecting floating waste in inland waters
FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting
🏠project
Biomedical images
- BioFors: A Large Biomedical Image Forensics Dataset
3D reconstruction
- Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
  🌻dataset
Aerial image dataset
- Beyond Road Extraction: A Dataset for Map Update using Aerial Images
  ⭐code🏠project
  A dataset for updating maps from aerial images
Action recognition
- HAA500: Human-Centric Atomic Action Dataset with Curated Videos
Object recognition
- ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition
  ⭐code🌻dataset
Lane detection
- VIL-100: A New Dataset and a Baseline Model for Video Instance Lane Detection
  🌻dataset
Autonomous driving
- Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Vision-language dataset
- E-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
  ⭐codeVL
DeepFake detection
- KoDF: A Large-Scale Korean DeepFake Detection Dataset
  🌻dataset
High-quality video
- Seeing Dynamic Scene in the Dark: A High-Quality Video Dataset With Mechatronic Alignment
  🌻dataset📺video

55.Activity Recognition

Selective Feature Compression for Efficient Activity Recognition Inference
Group activity recognition
- Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
  ⭐code
- GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
  ⭐code

54.Sketch recognition

SketchLattice: Latticed Representation for Sketch Manipulation
SketchAA: Abstract Representation for Abstract Sketches

53.Vision Localization

Continual Learning for Image-Based Camera Localization
⭐code
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
🌻dataset
Pose Correction for Highly Accurate Visual Localization in Large-Scale Indoor Spaces
⭐code
Cross-Descriptor Visual Localization and Mapping

52.Vision-and-Language

YouRefIt: Embodied Reference Understanding with Language and Gesture
😮oral🏠project
VLGrammar: Grounded Grammar Induction of Vision and Language
⭐code
COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-Training for Vision-Language Representation
⭐code
Panoptic Narrative Grounding
😮oral⭐code
AESOP: Abstract Encoding of Stories, Objects, and Pictures
⭐code📺video
Adaptive Hierarchical Graph Reasoning With Semantic Coherence for Video-and-Language Inference
Visual reasoning
- Interpretable Visual Reasoning via Induced Symbolic Space
Semantic navigation
- THDA: Treasure Hunt Data Augmentation for Semantic Navigation
Vision-and-language navigation
Vision-dialog navigation
- Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
Visual navigation
- Pose Invariant Topological Memory for Visual Navigation
  ⭐code
- Visual Graph Memory With Unsupervised Representation for Visual Navigation
  ⭐code🏠project📺video
visual grounding
- InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
  ⭐code
- TransVG: End-to-End Visual Grounding With Transformers
  ⭐code
Visual dialog
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue

51.View Synthesis

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization
⭐code
Deep 3D Mask Volume for View Synthesis of Dynamic Scenes
🏠project
Embedding Novel Views in a Single JPEG Image
Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
😮oral⭐code🏠project📺video
Geometry-Free View Synthesis: Transformers and No 3D Priors
⭐code
Dynamic View Synthesis From Dynamic Monocular Video
🏠project📺video
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
🏠project📺video
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
😮oral⭐code🏠project📺video
Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image
😮oral⭐code🏠project📺video

50.Continual Learning

Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
⭐code
Continual Learning on Noisy Data Streams via Self-Purified Replay
⭐code
Rehearsal Revealed: The Limits and Merits of Revisiting Samples in Continual Learning
⭐code
Co2L: Contrastive Continual Learning
⭐code

49.Human-Object Interaction

Exploiting Scene Graphs for Human-Object Interaction Detection
⭐code
Spatially Conditioned Graphs for Detecting Human-Object Interactions
⭐code📺video
Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
Detecting Human-Object Relationships in Videos
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
⭐code🏠project🌻dataset
Discovering Human Interactions With Large-Vocabulary Objects via Query and Multi-Scale Detection
⭐code
Visual Relationship Detection Using Part-and-Sum Transformers With Composite Queries
VRD and HOI
Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
⭐code
H2O
- H2O: A Benchmark for Visual Human-Human Object Handover Analysis
Human Interaction Understanding
- Consistency-Aware Graph Network for Human Interaction Understanding
  ⭐code
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
  🏠project
Hand-object interaction
HOI(behavior understanding)
- GeomNet: A Neural Network Based on Riemannian Geometries of SPD Matrix Space and Cholesky Space for 3D Skeleton-Based Interaction Recognition

48.6DoF

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
⭐code
StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation
RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering
⭐code
DualPoseNet: Category-Level 6D Object Pose and Size Estimation Using Dual Pose Network With Refined Learning of Pose Consistency
⭐code
PR-GCN: A Deep Graph Convolutional Network With Point Refinement for 6D Pose Estimation
Object pose estimation
- CAPTRA: CAtegory-Level Pose Tracking for Rigid and Articulated Objects From Point Clouds
  😮oral⭐code🏠project📺video

47.NAS

BN-NAS: Neural Architecture Search with Batch Normalization
⭐code
RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
⭐code
Evolving Search Space for Neural Architecture Search
⭐code📺video
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
⭐code
GLiT: Neural Architecture Search for Global and Local Image Transformer
⭐code
Neural Architecture Search for Joint Human Parsing and Pose Estimation
⭐code
Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization
Not All Operations Contribute Equally: Hierarchical Operation-Adaptive Predictor for Neural Architecture Search
Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition
⭐code
BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search
⭐code
NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization
AutoSpace: Neural Architecture Search With Less Human Interference
⭐code
IDARTS: Interactive Differentiable Architecture Search

46.Defect Detection

DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection

45.Image Caption

Who's Waldo? Linking People Across Text and Images
😮oral🏠project
📰explainer: ICCV2021 Oral - New task! New dataset! Cornell University proposes the PVG task, which resembles VG but is not VG
Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning
Topic Scene Graph Generation by Attention Distillation From Caption
⭐code
Understanding and Evaluating Racial Biases in Image Captioning
⭐code🏠project
In Defense of Scene Graphs for Image Captioning
⭐code
Art description generation
- Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
  ⭐code
Change Captioning
- Viewpoint-Agnostic Change Captioning With Cycle Consistency

44.Human motion prediction

MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction
⭐code
Stochastic Scene-Aware Motion Prediction
⭐code🏠project
Generating Smooth Pose Sequences for Diverse Human Motion Prediction
😮oral⭐code
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
🏠project
Motion Prediction using Trajectory Cues
3D human motion prediction
- Contextually Plausible and Diverse 3D Human Motion Prediction

43.Dense Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
⭐code
Multi-task dense prediction
- Exploring Relational Context for Multi-Task Dense Prediction

42.Representation Learning

Learning From Noisy Data With Robust Representation Learning
⭐code
Self-Supervised Representation Learning From Flow Equivariance
Exploring Visual Engagement Signals for Representation Learning
⭐code
Switchable K-class Hyperplanes for Noise-Robust Representation Learning
⭐code
Region Similarity Representation Learning
⭐code
Curious Representation Learning for Embodied Intelligence
⭐code🏠project
Visual representation learning
Video representation learning
- Composable Augmentation Encoding for Video Representation Learning
- Motion-Focused Contrastive Learning of Video Representations
ASCNet: Self-Supervised Video Representation Learning With Appearance-Speed Consistency
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
🏠project
Time-Equivariant Contrastive Video Representation Learning
Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning
⭐code

41.Out-of-Distribution Detection(OOD)

CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
Semantically Coherent Out-of-Distribution Detection
⭐code🏠project
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
⭐code

40.Metric Learning

Towards Interpretable Deep Metric Learning with Structural Matching
⭐code
Deep Relational Metric Learning
⭐code
LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
⭐code
Manifold Matching via Deep Metric Learning for Generative Modeling
⭐code

39.Incremental Learning

Class-incremental learning

38.Weakly/Semi-Supervised/Self-supervised/Unsupervised Learning

Semi-supervised
Self-supervised
Weakly supervised
- Weakly Supervised Representation Learning With Coarse Labels
  ⭐code

37.Multitask Learning

MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach
📰explainer: ICCV2021 "MultiTask CenterNet": new progress in multitask CV! One model does the work of three
Multi-Task Self-Training for Learning General Representations
📰explainer: ICCV2021 MuST: still struggling to squeeze out points on a single task? Google's researchers have already moved on to multi-task training
UniT: Multimodal Multitask Learning With a Unified Transformer
⭐code
Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing
⭐code
Learning With Privileged Tasks
Task Switching Network for Multi-Task Learning

36.SLAM/AR/VR/Robotics

Robotics
- Indoor navigation
  - The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
    ⭐code🏠project
  - Pathdreamer: A World Model for Indoor Navigation
    📺video
- Robotic hand grasping
  - Hand-Object Contact Consistency Reasoning for Human Grasps Generation
    😮oral⭐code🏠project📺video
VR/AR
- The Power of Points for Modeling Humans in Clothing
  ⭐code🏠project📺video
- Virtual try-on
SLAM

35.Quantization/Pruning/Knowledge Distillation/Model Compression(including model expansion and optimization)

Knowledge distillation
Quantization
Model compression
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization
- Exploration and Estimation for Model Compression
Pruning
- ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
  ⭐code
- Auto Graph Encoder-Decoder for Neural Network Pruning

34.Super-Resolution

ISR
VSR

33.Remote Sensing Images

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery
⭐code
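📰explainer: ICCV2021 | The RSIDEA team at Wuhan University proposes STAR, a novel weakly supervised change detection algorithm for remote sensing
Panoramic video synthesis from satellite images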
- Sat2Vid: Street-view Panoramic Video Synthesis from a Single Satellite Image
Satellite-imagery-based traffic accident detection
- Inferring High-Resolution Traffic Accident Risk Maps Based on Satellite Imagery and GPS Trajectories
Remote sensing data
- Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data
  ⭐code
- Dynamic Cross Feature Fusion for Remote Sensing Pansharpening
Segmentation
- Self-Mutating Network for Domain Adaptive Segmentation in Aerial Images
- Panoptic segmentation of satellite images
  - Panoptic Segmentation of Satellite Image Time Series With Convolutional Temporal Attention Networks
    ⭐code🌻PASTIS dataset
3D reconstruction
- 3D Building Reconstruction from Monocular Remote Sensing Images
  🏠project

32.Audio/Speech

The Right to Talk: An Audio-Visual Transformer Approach
⭐code
Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
⭐code🏠project
Audio separation
- Visual Scene Graphs for Audio Source Separation
Audio-to-gesture
- Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders
  🏠project
Active Speaker Detection(ASD)
- How To Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
  ⭐code
- MAAS: Multi-Modal Assignation for Active Speaker Detection
Recollecting audio from face video
- Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video
Audio-visual source localization
- Localize to Binauralize: Audio Spatialization From Visual Sound Source Localization
  ⭐code📺video
Audio-visual source separation
- Move2Hear: Active Audio-Visual Source Separation
  ⭐code🏠project
Audio-visual floorplan reconstruction
- Audio-Visual Floorplan Reconstruction
  ⭐code🏠project📺video

31.Style Transfer

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
⭐code
Domain-Aware Universal Style Transfer
⭐code
Diverse Image Style Transfer via Invertible Cross-Space Mapping
StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition
Manifold Alignment for Semantically Aligned Style Transfer
⭐code

30.Image Generation/synthesis

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models
😮oral
Image Synthesis via Semantic Composition
⭐code🏠project
Image Synthesis From Layout With Locality-Aware Mask Adaption
Image fusion
- DTMNet: A Discrete Tchebichef Moments-Based Deep Neural Network for Multi-Focus Image Fusion

29.Image Retrieval

DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
⭐code
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
⭐code🏠project
Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
⭐code
Instance-Level Image Retrieval Using Reranking Transformers
⭐code
Learning Attribute-Driven Disentangled Representations for Interactive Fashion Retrieval
⭐code
Telling the What While Pointing to the Where: Multimodal Queries for Image Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval
⭐code
Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval
Cross-domain retrieval
- Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
Visual Geolocalization
- Viewpoint Invariant Dense Matching for Visual Geolocalization
  ⭐code
Cross-modal retrieval
Text-video retrieval
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
  🏠project
Video-text retrieval
- HiT: Hierarchical Transformer With Momentum Contrast for Video-Text Retrieval
image-based 3D shape retrieval
- Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning
Nearest neighbor search
- Product Quantizer Aware Inverted Index for Scalable Nearest Neighbor Search

28.Contrastive Learning

Improving Contrastive Learning by Visualizing Feature Transformation
😮oral⭐code
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
📰explainer: ICCV2021-TACo-Microsoft & CMU propose a token-aware cascade contrastive learning method that far outperforms other SOTA methods on video-text alignment
A Broad Study on the Transferability of Visual Representations With Contrastive Learning
⭐code
Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
⭐code
CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations
Social NCE: Contrastive Learning of Socially-Aware Motion Representations
⭐code📺video
With a Little Help From My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency
🏠project
Weakly Supervised Contrastive Learning

27.Multi-label image recognition

Residual Attention: A Simple but Effective Method for Multi-Label Recognition
⭐code
Transformer-based Dual Relation Graph for Multi-label Image Recognition

26.Image Processing

Aligning Latent and Image Spaces to Connect the Unconnectable
⭐code🏠project
Image shape manipulation
- Image Shape Manipulation from a Single Augmented Training Sample
  😮oral⭐code🏠project
Edge detection
- RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
  😮oral⭐code
- Pixel Difference Networks for Efficient Edge Detection
  ⭐code
Image recognition
- MicroNet: Improving Image Recognition with Extremely Low FLOPs
  ⭐code
Image deblurring
- Rethinking Coarse-to-Fine Approach in Single Image Deblurring
  ⭐code
- Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions
- Defocus Map Estimation and Deblurring From a Single Dual-Pixel Image
- Motion Deblurring with Real Events
- Pyramid Architecture Search for Real-Time Image Deblurring
- Motion deblurring
  - Perceptual Variousness Motion Deblurring With Light Global Context Refinement
Video deblurring
- Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames
  ⭐code
Image quality assessment(IQA)
- MUSIQ: Multi-scale Image Quality Transformer
  ⭐code
- Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
  ⭐code
Image Harmonization
- SSH: A Self-Supervised Framework for Image Harmonization
  ⭐code
Shadow removal
- CANet: A Context-Aware Network for Shadow Removal
  ⭐code
- DC-ShadowNet: Single-Image Hard and Soft Shadow Removal Using Unsupervised Domain-Classifier Guided Network
Denoising
- Rethinking Deep Image Prior for Denoising
  ⭐code
- Rethinking Noise Synthesis and Modeling in Raw Denoising
  ⭐code
- C2N: Practical Generative Noise Modeling for Real-World Denoising
- The Benefit of Distraction: Denoising Camera-Based Physiological Measurements Using Inverse Attention
  ⭐code
- Hyperspectral Image Denoising with Realistic Data
  ⭐code
- End-to-End Unsupervised Document Image Blind Denoising
- Cross-Patch Graph Convolutional Network for Image Denoising
- Video denoising
  - Patch Craft: Video Denoising by Deep Modeling and Patch Matching
Image colorization
- Towards Vivid and Diverse Image Colorization with Generative Color Prior
- Deep Edge-Aware Interactive Colorization Against Color-Bleeding Effects
  😮oral🏠project
Image enhancement
- Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
- Adaptive Unfolding Total Variation Network for Low-Light Image Enhancement
  ⭐code
- Representative Color Transform for Image Enhancement
- STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement
- Deep Symmetric Network for Underexposed Image Enhancement With Recurrent Attentional Learning
  ⭐code🏠project
- StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
Image restoration
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks
  ⭐code
- Dynamic Attentive Graph Learning for Image Restoration
  ⭐code
- Self-Supervised Cryo-Electron Tomography Volumetric Image Restoration From Single Noisy Volume With Sparsity Constraint
  ⭐code
- Searching for Controllable Image Restoration Networks
  ⭐code
Image compression
- Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
  ⭐code
- Neural Image Compression via Attentional Multi-Scale Back Projection and Frequency Decomposition
Image inpainting
- Image Inpainting via Conditional Texture and Structure Dual Generation
  ⭐code
- CR-Fill: Generative Image Inpainting With Auxiliary Contextual Reconstruction
  ⭐code
- Parallel Multi-Resolution Fusion Network for Image Inpainting
- Painting from Part
  ⭐code
- WaveFill: A Wavelet-Based Generation Network for Image Inpainting
- Distillation-Guided Image Inpainting
- Learning a Sketch Tensor Space for Image Inpainting of Man-made Scenes
  ⭐code🏠project
Image extrapolation
- SemIE: Semantically-aware Image Extrapolation
  🏠project
Reversible Image Conversion
- IICNet: A Generic Framework for Reversible Image Conversion
  ⭐code
Artifact removal
- Towards Flexible Blind JPEG Artifacts Removal
  ⭐code
- Learning Dual Priors for JPEG Compression Artifacts Removal
- Let's See Clearly: Contaminant Artifact Removal for Moving Cameras
De-rendering
- De-rendering Stylized Texts
  ⭐code🏠project
Flare removal
- Light Source Guided Single-Image Flare Removal From Unpaired Data
Panoramic stitching
- Minimal Solutions for Panoramic Stitching Given Gravity Prior
Flare Removal
- How to Train Neural Networks for Flare Removal
  🏠project📺video
Image cropping
- TransView: Inside, Outside, and Across the Cropping View Boundaries
- Dissecting Image Crops
  ⭐code
Reflection removal
- Location-Aware Single Image Reflection Removal
  ⭐code
- V-DESIRR: Very Fast Deep Embedded Single Image Reflection Removal
  ⭐code
Deraining
- Improving De-Raining Generalization via Neural Reorganization
- Unpaired Learning for Deep Image Deraining with Rain Direction Regularize
  🏠project
- Structure-Preserving Deraining With Residue Channel Prior Guidance
  ⭐code
Image distortion removal
- Unsupervised Non-Rigid Image Distortion Removal via Grid Deformation
  ⭐code📺video
Removing refractive distortions from underwater images
- Learning To Remove Refractive Distortions From Underwater Images
Image completion
- High-Fidelity Pluralistic Image Completion With Transformers
  ⭐code🏠project
Image Decomposition
- Unsupervised Layered Image Decomposition into Object Prototypes
Distortion rectification
- Towards Complete Scene and Regular Shape for Distortion Rectification by Curve-Aware Extrapolation
HDR
- Unpaired Learning for High Dynamic Range Image Tone Mapping
- Ultra-high-definition image HDR reconstruction
  - Ultra-High-Definition Image HDR Reconstruction via Collaborative Bilateral Learning
Image desnowing
- ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-Tree Complex Wavelet Representation and Contradict Channel Loss
  ⭐code
Image Harmonization
- Image Harmonization With Transformer
  ⭐code
Image editing
- Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism
Image hiding
- HiNet: Deep Image Hiding by Invertible Network
  ⭐code

25.Medical Image

Equivariant Imaging: Learning Beyond the Range Space
😮oral⭐code
Deep Survival Analysis With Longitudinal X-Rays for COVID-19
Medical image segmentation
Pathology image representation
- A QuadTree Image Representation for Computational Pathology
Medical image analysis
- Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts
  ⭐code
  📰explainer: ICCV2021: works for both 2D and 3D! A new self-supervised SOTA for medical imaging (code released)
Medical image denoising
- Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Video translation
- Long-Term Temporally Consistent Unpaired Video Translation From Simulated Surgical 3D Data
  ⭐code🏠project
Nuclei detection and segmentation in pathology images
- Mutual-Complementing Framework for Nuclei Detection and Segmentation in Pathology Image
Medical report generation
- Visual-Textual Attentive Semantic Consistency for Medical Report Generation
CT
Medical image recognition
- GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition
  ⭐code
Medical image classification
- Big Self-Supervised Models Advance Medical Image Classification
- Large-Scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification
  ⭐code

24.Face

VariTex: Variational Neural Face Textures
⭐code🏠project📺video
Face forgery detection
- OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
  🏠project
- Exploring Temporal Coherence for More General Video Face Forgery Detection
Face synthesis
- Disentangled Lifespan Face Synthesis
  ⭐code🏠project📺video
Face recognition
- PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
- SynFace: Face Recognition with Synthetic Data
- Adaptive Label Noise Cleaning With Meta-Supervision for Deep Face Recognition
- Disentangled Representation for Age-Invariant Face Recognition: A Mutual Information Minimization Perspective
- Teacher-Student Adversarial Depth Hallucination To Improve Face Recognition
  ⭐code
- DAM: Discrepancy Alignment Metric for Face Recognition
- De-identification
  - Personalized and Invertible Face De-Identification by Disentangled Identity Information Manipulation
Face perception
- Learning Facial Representations from the Cycle-consistency of Face
  ⭐code
Talking face generation
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
Talking head synthesis
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
  ⭐code
- Learned Spatial Representations for Few-Shot Talking-Head Synthesis
  ⭐code🏠project
Facial expression recognition
- Understanding and Mitigating Annotation Bias in Facial Expression Recognition
- TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Face presentation attack detection
- Detection and Continual Learning of Novel Face Presentation Attacks
  ⭐code
Face editing
- Talk-to-Edit: Fine-Grained Facial Editing via Dialog
  ⭐code🏠project
  📰explainer: ICCV2021 | Nanyang Technological University and CUHK propose Talk-to-Edit, fine-grained face editing through dialog
- A Latent Transformer for Disentangled Face Editing in Images and Videos
  ⭐code
Face alignment
- ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
Face image reconstruction
- Focal Frequency Loss for Image Reconstruction and Synthesis
  ⭐code🏠project📺video
- Towards High Fidelity Monocular Face Reconstruction With Rich Reflectance Using Self-Supervised Learning and Ray Tracing
- Neural Photofit: Gaze-Based Mental Image Reconstruction
  🏠project
3D face reconstruction
- Topologically Consistent Multi-View Face Inference Using Volumetric Sampling
  ⭐code
- Self-Supervised 3D Face Reconstruction via Conditional Estimation
3D face animation
- MeshTalk: 3D Face Animation From Speech Using Cross-Modality Disentanglement
  ⭐code📺video
Remote Photoplethysmography (rPPG)
- The Way to My Heart Is Through Contrastive Learning: Remote Photoplethysmography From Unlabelled Video
  ⭐code
Face encryption
- Towards Face Encryption by Generating Adversarial Identity Masks
  ⭐code
Deepfake detection
- Learning Self-Consistency for Deepfake Detection
  😮oral
- Joint Audio-Visual Deepfake Detection
- Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
  😮oral
Face texture completion
- Learning High-Fidelity Face Texture Completion Without Complete Face Texture
Facial action unit detection
- PIAP-DF: Pixel-Interested and Anti Person-Specific Facial Action Unit Detection Net With Discrete Feedback Learning
Face analysis
- Fake It Till You Make It: Face analysis in the wild using synthetic data alone
  🏠project📺video
3D head reconstruction
- H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
Facial landmark detection
- Improving Robustness of Facial Landmark Detection by Defending Against Adversarial Attacks
  ⭐code
Face image retrieval
- Face Image Retrieval with Attribute Manipulation

23.Gaze Estimation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
⭐code
Gaze following
- Looking Here or There? Gaze Following in 360-Degree Images
Viewpoint estimation
- ViewNet: Unsupervised Viewpoint Estimation From Conditional Generation

22.GAN

Sketch Your Own GAN
⭐code🏠project
Online Multi-Granularity Distillation for GAN Compression
⭐code
Dual Projection Generative Adversarial Networks for Conditional Image Generation
⭐code
InSeGAN: A Generative Approach to Segmenting Identical Instances in Depth Images
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
⭐code🏠project📺video
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space
⭐code
Toward a Visual Concept Vocabulary for GAN Latent Space
Collaging Class-specific GANs for Semantic Image Synthesis
🏠project
Latent Transformations via NeuralODEs for GAN-Based Image Editing
Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization
GAN-Control: Explicitly Controllable GANs
🏠project: https://alonshoshan10.github.io/gan_control/
Omni-GAN: On the Secrets of cGANs and Beyond
⭐code
Unsupervised Image Generation with Infinite Generative Adversarial Networks
⭐code
DAE-GAN: Dynamic Aspect-Aware GAN for Text-to-Image Synthesis
Detail Me More: Improving GAN’s photo-realism of complex scenes
Unsupervised Segmentation Incorporating Shape Prior via Generative Adversarial Networks
DRB-GAN: A Dynamic ResBlock Generative Adversarial Network for Artistic Style Transfer
⭐code
Dual Contrastive Loss and Attention for GANs
Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation
Gradient Normalization for Generative Adversarial Networks
⭐code
EigenGAN: Layer-Wise Eigen-Learning for GANs
⭐code
Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval
⭐code
HeadGAN: One-shot Neural Head Synthesis and Editing
🏠project📺video
Explaining in Style: Training a GAN To Explain a Classifier in StyleSpace
⭐code🏠project📺video
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
😮oral⭐code📺video
Towards Discovery and Attribution of Open-World GAN Generated Images
Diagonal Attention and Style-Based GAN for Content-Style Disentanglement in Image Generation and Translation
Re-Aging GAN: Toward Personalized Face Age Transformation
When do GANs replicate? On the choice of dataset size
⭐code
LoFGAN: Fusing Local Representations for Few-shot Image Generation
Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
⭐code
Generative Adversarial Registration for Improved Conditional Deformable Templates
⭐code
F-Drop&Match: GANs with a Dead Zone in the High-Frequency Domain
GAN Inversion
- From Continuity to Editability: Inverting GANs with Consecutive Images
  ⭐code
- GAN Inversion for Out-of-Range Images with Geometric Transformations
  🏠project
Image-to-Image Translation
- Unaligned Image-to-Image Translation by Learning to Reweight
  ⭐code
- Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation
  ⭐code
- Instance-Wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation
- TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets
  ⭐code
- Rethinking the Truly Unsupervised Image-to-Image Translation
- SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
Image translation
- Scaling-up Disentanglement for Image Translation
  ⭐code🏠project
- Harnessing the Conditioning Sensorium for Improved Image Translation
- Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving
  ⭐code
- Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation
  ⭐code
- Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics

21.Active Learning

Semi-Supervised Active Learning with Temporal Output Discrepancy
⭐code
Influence Selection for Active Learning
Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings
⭐code
Contrastive Coding for Active Learning under Class Distribution Mismatch
⭐code

20.Adversarial Learning

Low Curvature Activations Reduce Overfitting in Adversarial Training
Removing Adversarial Noise in Class Activation Feature Space
Sample Efficient Detection and Classification of Adversarial Attacks via Self-Supervised Embeddings
Invisible Backdoor Attack With Sample-Specific Triggers
⭐code
Defending Against Universal Adversarial Patches by Clipping Feature Norms
Adversarial Attacks
Adversarial Examples
- Adversarial Example Detection Using Latent Neighborhood Graph
- On the Robustness of Vision Transformers to Adversarial Examples
Black-Box

19.Self-Driving Vehicles

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
⭐code
MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving
⭐code
NEAT: Neural Attention Fields for End-to-End Autonomous Driving
⭐code
Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving
⭐code
Social-NCE: Contrastive Learning of Socially-aware Motion Representations
⭐code📺video
Learning To Drive From a World on Rails
😮oral⭐code🏠project
DRIVE: Deep Reinforced Accident Anticipation With Visual Explanation
⭐code🏠project📺video
LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
Prediction by Anticipation: An Action-Conditional Prediction Method Based on Interaction Learning
⭐code📺video
TMCOSS: Thresholded Multi-Criteria Online Subset Selection for Data-Efficient Autonomous Driving
FIERY: Future Instance Prediction in Bird's-Eye View From Surround Monocular Cameras
⭐code
On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
⭐code
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving
⭐code📺video
Human Trajectory Prediction
- Personalized Trajectory Prediction via Distribution Discrimination
  ⭐code
- Human Trajectory Prediction via Counterfactual Analysis
  ⭐code
- From Goals, Waypoints & Paths to Long Term Human Trajectory Forecasting
  ⭐code🏠project📺video
Trajectory Prediction
- Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction
- LOKI: Long Term and Key Intentions for Trajectory Prediction
- MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction
  ⭐code
- DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
- Where Are You Heading? Dynamic Trajectory Prediction With Expert Goal Examples
  ⭐code
- Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis
- Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting
- Likelihood-Based Diverse Sampling for Trajectory Forecasting
  ⭐code
Motion Prediction
- RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
  🏠project
- SLAMP: Stochastic Latent Appearance and Motion Prediction
  🏠project
- SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos
Autonomous Navigation
- FOVEA: Foveated Image Magnification for Autonomous Navigation
  🏠project
Traffic Scene Understanding
- Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images
  ⭐code
Vehicle and License Plate Recognition
- Vehicle Re-Identification
  - Heterogeneous Relational Complement for Vehicle Re-identification
  - Self-Supervised Geometric Features Discovery via Interpretable Attention for Vehicle Re-Identification and Beyond
Autonomous Racing
- Learn-to-Race: A Multimodal Control Environment for Autonomous Racing
  ⭐code
Driver Visual Attention Prediction
- MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
  ⭐code
Pose Forecasting
- Space-Time-Separable Graph Convolutional Network for Pose Forecasting
  ⭐code📺video
Vehicle Tracking
- Track Without Appearance: Learn Box and Tracklet Embedding With Local and Global Motion Patterns for Vehicle Tracking
Vehicle Detection and Parsing in Arbitrary Camera Views
- Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS
Lane Detection
- CondLaneNet: A Top-To-Down Lane Detection Framework Based on Conditional Convolution
  ⭐code
- Active Learning for Lane Detection: A Knowledge Distillation Approach
Vehicle Speed Estimation
- Robust Automatic Monocular Vehicle Speed Estimation for Traffic Surveillance
  ⭐code

18.Transformers

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
😮oral⭐code
📰Explainer: ICCV2021 Oral | TAU and Facebook propose generic explainability for attention models
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction
⭐code
PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
⭐code
Rethinking and Improving Relative Position Encoding for Vision Transformer
⭐code
Vision Transformer with Progressive Sampling
⭐code
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
😮oral⭐code
Rethinking Spatial Dimensions of Vision Transformers
⭐code
📰Explainer: ICCV2021 PiT | Pooling is not exclusive to CNNs; ViT says "I can do it too": introducing the Pooling-based Vision Transformer (PiT)
PnP-DETR: Towards Efficient Visual Analysis with Transformers
⭐code
Describing and Localizing Multiple Changes With Transformers
⭐code🏠project
LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference
⭐code
VidTr: Video Transformer Without Convolutions
Visformer: The Vision-Friendly Transformer
⭐code
Going Deeper With Image Transformers
⭐code
Multiscale Vision Transformers
⭐code
Learning Multi-Scene Absolute Pose Regression With Transformers
⭐code
Visual Saliency Transformer
⭐code
Event-Based Video Reconstruction Using Transformer
⭐code
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
⭐code
An Empirical Study of Training Self-Supervised Vision Transformers
😮oral⭐code
Tokens-to-Token ViT: Training Vision Transformers From Scratch on ImageNet
⭐code
CvT: Introducing Convolutions to Vision Transformers
⭐code
COTR: Correspondence Transformer for Matching Across Images
ViViT: A Video Vision Transformer
⭐code
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting
⭐code🏠project
Incorporating Convolution Designs into Visual Transformers
⭐code
LayoutTransformer: Layout Generation and Completion with Self-attention
⭐code🏠project
AutoFormer: Searching Transformers for Visual Recognition
⭐code
Scalable Vision Transformers With Hierarchical Pooling
⭐code
Visual Transformers: Where Do Transformers Really Belong in Vision Models?
Anticipative Video Transformer
⭐code🏠project
Dense Prediction
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
  😮oral⭐code
  📰Explainer: Pyramid Vision Transformer in plain language
- Vision Transformers for Dense Prediction
  ⭐code
3D Human Texture Estimation
- 3D Human Texture Estimation from a Single Image with Transformers
  😮oral🏠project
Image Encoding
- Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
  ⭐code
OCR
- DocFormer: End-to-End Transformer for Document Understanding
Music-Conditioned Dance Generation
- AI Choreographer: Music Conditioned 3D Dance Generation With AIST++
  🏠project
  📰Overview: Transformers strike again, generating smooth 3D dance set to music, with the release of AIST++, the largest dataset of its kind

17.3D(3D Vision)

Discovering 3D Parts from Image Collections
😮oral⭐code🏠project📺video
PixelSynth: Generating a 3D-Consistent Experience from a Single Image
⭐code🏠project
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
⭐code🏠project
Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
⭐code
Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
😮oral⭐code
LSD-StructureNet: Modeling Levels of Structural Detail in 3D Part Hierarchies
Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
⭐code
Where2Act: From Pixels to Actions for Articulated 3D Objects
📺video
BuildingNet: Learning to Label 3D Buildings
😮oral⭐code🏠project
SurfGen: Adversarial 3D Shape Synthesis With Explicit Surface Discriminators
Deep Virtual Markers for Articulated 3D Shapes
⭐code📺video
Learning Efficient Photometric Feature Transform for Multi-view Stereo
🏠project
Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing
Just a Few Points Are All You Need for Multi-View Stereo: A Novel Semi-Supervised Learning Method for Multi-View Stereo
3D-FRONT: 3D Furnished Rooms With layOuts and semaNTics
Learning Generative Models of Textured 3D Meshes from Real-World Images
⭐code
Self-Supervised Pretraining of 3D Features on any Point-Cloud
High Quality Disparity Remapping with Two-Stage Warping
Structure-From-Sherds: Incremental 3D Reassembly of Axially Symmetric Pots From Unordered and Mixed Fragment Collections
⭐code
Interpolation-Aware Padding for 3D Sparse Convolutional Neural Networks
Depth Estimation
- StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
  ⭐code
- Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
  ⭐code🏠project
- Augmenting Depth Estimation with Geospatial Context
- Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?
- Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective With Transformers
  😮oral⭐code
- Adaptive Surface Normal Constraint for Depth Estimation
  ⭐code
- Event-Intensity Stereo: Estimating Depth by the Best of Both Worlds
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
- DepthInSpace: Exploitation and Fusion of Multiple Video Frames for Structured-Light Depth Estimation
  🏠project
- Boosting Monocular Depth Estimation With Lightweight 3D Point Fusion
- Monocular Depth Estimation
Depth Completion
- Bayesian Deep Basis Fitting for Depth Completion With Uncertainty
- Unsupervised Depth Completion With Calibrated Backprojection Layers
  ⭐code
Omnidirectional Localization
- PICCOLO: Point Cloud-Centric Omnidirectional Localization
3D Reconstruction
- Learning Signed Distance Field for Multi-view Surface Reconstruction
  😮oral
- 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations
  😮oral🏠project
- DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension
- In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces
  😮oral⭐code📺video
- Gaussian Fusion: Accurate 3D Reconstruction via Geometry-Guided Displacement Interpolation
- RetrievalFuse: Neural 3D Scene Reconstruction With a Database
  ⭐code🏠project📺video
- Multi-View 3D Reconstruction With Transformers
- Polarimetric Helmholtz Stereopsis
- MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
  ⭐code📺video
- Toward Realistic Single-View 3D Object Reconstruction With Unsupervised Learning From Multiple Images
  ⭐code
- CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images
- 3D Scene Reconstruction
  - Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
  - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
- 3D Shape Reconstruction
- 3D Mesh Reconstruction
  - Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
    ⭐code
3D Scenes
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization
  ⭐code
- Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
  🏠project
Camera Calibration
- CTRL-C: Camera calibration TRansformer with Line-Classification
  ⭐code
- BabelCalib: A Universal Approach to Calibrating Central Cameras
  ⭐code
Surface Reconstruction
- Temporally-Coherent Surface Reconstruction via Metric-Consistent Atlases
- Planar Surface Reconstruction from Sparse Views
  😮oral⭐code🏠project📺video
- Adaptive Surface Reconstruction With Multiscale Convolutional Kernels
3D Scene Synthesis
- Indoor Scene Generation From a Collection of Semantic-Segmented Depth Images
  ⭐code
3D Shape Recognition
- Learning Canonical View Representation for 3D Shape Recognition With Arbitrary Views
  ⭐code
Image Reconstruction
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild
  ⭐code
Multi-view Stereo(MVS)
- Digging into Uncertainty in Self-supervised Multi-view Stereo
  ⭐code
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
  ⭐code
- A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo
  ⭐code
- EPP-MVSNet: Epipolar-Assembling Based Depth Prediction for Multi-View Stereo

16.Re-Identification

Object Re-Identification

TransReID: Transformer-Based Object Re-Identification
⭐code

Person Re-Identification

Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
⭐code
Towards Discriminative Representation Learning for Unsupervised Person Re-identification
Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
⭐code🏠project
Video-based Person Re-identification with Spatial and Temporal Memory Networks
🏠project
Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
Clothing Status Awareness for Long-Term Person Re-Identification
Dense Interaction Learning for Video-Based Person Re-Identification
😮oral
Explainable Person Re-Identification With Attribute-Guided Metric Distillation
🏠project
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-Identification
Pyramid Spatial-Temporal Aggregation for Video-Based Person Re-Identification
⭐code
ICE: Inter-Instance Contrastive Encoding for Unsupervised Person Re-Identification
⭐code📺video
Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification
Attack-Guided Perceptual Data Generation for Real-world Re-Identification
BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification
🌻dataset
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
⭐code
Meta Pairwise Relationship Distillation for Unsupervised Person Re-Identification
Syncretic Modality Collaborative Learning for Visible Infrared Person Re-Identification
Weakly Supervised Text-Based Person Re-Identification
⭐code
Occlude Them All: Occlusion-Aware Attention Network for Occluded Person Re-ID
Occluded Person Re-Identification with Single-scale Global Representations
⭐code
Domain Adaptive Person Re-Identification
- IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID
  😮oral⭐code
Crowd Counting
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
  😮oral⭐code
- Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
  ⭐code
- Spatial Uncertainty-Aware Semi-Supervised Crowd Counting
  ⭐code
- Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting
  ⭐code
- Exploiting Sample Correlation for Crowd Counting With Multi-Expert Network
  ⭐code
- Crowd Counting With Partial Annotations in an Image
  ⭐code
- Towards A Universal Model for Cross-Dataset Crowd Counting
Cross-Modality Person Re-Identification
- Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation
Pedestrian Detection
- MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
- Stacked Homography Transformations for Multi-View Pedestrian Detection
- Robust Small-scale Pedestrian Detection with Cued Recall via Memory Learning
- Body-Face Joint Detection via Embedding and Head Hook
  ⭐code
Pedestrian Attribute Recognition
- Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition
- LapsCore: Language-Guided Person Search via Color Reasoning
Person Search
- Weakly Supervised Person Search with Region Siamese Networks
- End-to-End Trainable Trident Person Search Network Using Adaptive Gradient Propagation
- ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
  🏠project
Pedestrian Behavior Prediction
- Bifold and Semantic Reasoning for Pedestrian Behavior Prediction
Gait Recognition
- Gait Recognition via Effective Global-Local Feature Representation and Local Temporal Aggregation
- Context-Sensitive Temporal Feature Learning for Gait Recognition
  ⭐code
- 3D Local Convolutional Neural Networks for Gait Recognition
  ⭐code
- Gait Recognition in the Wild: A Benchmark
  ⭐code🏠project

15.Object Tracking

Saliency-Associated Object Tracking
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
⭐code
Learning to Track Objects from Unlabeled Videos
⭐code
DepthTrack: Unveiling the Power of RGBD Tracking
⭐code
Learning Target Candidate Association To Keep Track of What Not To Track
⭐code
Transparent Object Tracking Benchmark
🏠project
Object Tracking by Jointly Exploiting Frame and Event Domain
High-Performance Discriminative Tracking with Transformers
Visio-Temporal Attention for Multi-Camera Multi-Target Association
Visual Object Tracking
Aerial Tracking
- HiFT: Hierarchical Feature Transformer for Aerial Tracking
  ⭐code
3D Multi-Object Tracking
- Exploring Simple 3D Multi-Object Tracking for Autonomous Driving
  ⭐code
  📰Explainer: ICCV 2021 | QCraft proposes SimTrack: unified 3D multi-object detection and tracking, simple and accurate
Multi-Object Tracking and Segmentation
- Assignment-Space-Based Multi-Object Tracking and Segmentation
- Continuous Copy-Paste for One-Stage Multi-Object Tracking and Segmentation
  ⭐code
Multi-Object Tracking
Video Object Tracking
- TF-Blender: Temporal Feature Blender for Video Object Detection
  ⭐code

14.Object Detection

Rank & Sort Loss for Object Detection and Instance Segmentation
😮oral⭐code
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
😮oral⭐code
SimROD: A Simple Adaptation Method for Robust Object Detection
😮oral🏠project
📰Explainer: ICCV2021 Oral SimROD: simple and efficient data augmentation! Huawei proposes a simple adaptation method for robust object detection
GraphFPN: Graph Feature Pyramid Network for Object Detection
Fast Convergence of DETR with Spatially Modulated Co-Attention
⭐code
Oriented R-CNN for Object Detection
⭐code
Conditional DETR for Fast Training Convergence
📰Explainer: Speeding up DETR convergence by explicitly locating object extremity regions: Conditional DETR
Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
⭐code
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
ODAM: Object Detection, Association, and Mapping using Posed RGB Video
😮oral
Reconcile Prediction Consistency for Balanced Object Detection
Deep Structured Instance Graph for Distilling Object Detectors
⭐code
Towards Rotation Invariance in Object Detection
⭐code
Morphable Detector for Object Detection on Demand
⭐code
DetCo: Unsupervised Contrastive Learning for Object Detection
⭐code
Domain-Invariant Disentangled Network for Generalizable Object Detection
Detecting Persuasive Atypicality by Modeling Contextual Compatibility
⭐code
Wanderlust: Online Continual Object Detection in the Real World
🏠project
PreDet: Large-Scale Weakly Supervised Pre-Training for Detection
FMODetect: Robust Detection of Fast Moving Objects
Multi-Source Domain Adaptation for Object Detection
Self-Supervised Object Detection via Generative Image Synthesis
⭐code
Naturalistic Physical Adversarial Patch for Object Detectors
⭐code
Rethinking Transformer-Based Set Prediction for Object Detection
⭐code
Detecting Invisible People
🏠project📺video
Dynamic DETR: End-to-End Object Detection With Dynamic Attention
CrossDet: Crossline Representation for Object Detection
⭐code
Robust Object Detection via Instance-Level Temporal Cycle Confusion
⭐code
End-to-End Semi-Supervised Object Detection With Soft Teacher
⭐code
Parallel Rectangle Flip Attack: A Query-Based Black-Box Attack Against Object Detection
Fooling LiDAR Perception via Adversarial Trajectory Perturbation
⭐code🏠project
TOOD: Task-Aligned One-Stage Object Detection
😮oral⭐code
Active Learning for Deep Object Detection via Probabilistic Modeling
⭐code
📰Explainer: ICCV2021 | Still brute-force training models on massive data? Active learning shows how to pick the most valuable samples in a dataset
Dual Bipartite Graph Learning: A General Approach for Domain Adaptive Object Detection
WB-DETR: Transformer-Based Detector without Backbone
3D Object Detection
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection
  ⭐code
- Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
  ⭐code🏠project
- Is Pseudo-Lidar needed for Monocular 3D Object detection?
  ⭐code
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
- LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
  ⭐code🏠project
- Improving 3D Object Detection with Channel-wise Transformer
- 4D-Net for Learned Multi-Modal Alignment
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
- Voxel Transformer for 3D Object Detection
- An End-to-End Transformer Model for 3D Object Detection
  😮oral⭐code🏠project
- Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
- Group-Free 3D Object Detection via Transformers
  ⭐code
- VENet: Voting Enhancement Network for 3D Object Detection
- Multi-Echo LiDAR for 3D Object Detection
- RangeDet: In Defense of Range View for LiDAR-Based 3D Object Detection
  ⭐code
- The Devil Is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection
- Gated3D: Monocular 3D Object Detection From Temporal Illumination Cues
  🏠project
- Are We Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D Object Detection?
- SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation
- You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking
  ⭐code🏠project📺video
- Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
  ⭐code
- Geometry-Based Distance Decomposition for Monocular 3D Object Detection
  ⭐code
Object Localization
- Contrastive Attention Maps for Self-supervised Co-localization
- TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
  ⭐code
- Weakly Supervised Object Localization
Anomaly Detection
- Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
Weakly Supervised Object Detection
- Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
  ⭐code
- CaT: Weakly Supervised Object Detection With Category Transfer
  ⭐code
OOD Detection
- Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation
  ⭐code
Salient Object Detection
- Disentangled High Quality Salient Object Detection
- Specificity-preserving RGB-D Saliency Detection
  ⭐code
- Light Field Saliency Detection with Dual Local Graph Learning and Reciprocative Guidance
- MFNet: Multi-Filter Directive Network for Weakly Supervised Salient Object Detection
  ⭐code
- Scene Context-Aware Salient Object Detection
  ⭐code
- Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
  ⭐code
- iNAS: Integral NAS for Device-Aware Salient Object Detection
  🏠project
- RGB-D Salient Object Detection
  - RGB-D Saliency Detection via Cascaded Mutual Information Minimization
    ⭐code
- co-saliency detection
  - Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection
    ⭐code
Prohibited Item Detection
- Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark
  🌻dataset
- Towards Real-World X-Ray Security Inspection: A High-Quality Benchmark and Lateral Inhibition Module for Prohibited Items Detection
  ⭐code
Few-Shot Object Detection
- DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
  ⭐code
- Query Adaptive Few-Shot Object Detection With Heterogeneous Graph Convolutional Networks
- Universal-Prototype Enhancing for Few-Shot Object Detection
  ⭐code
Visual Relationship Co-localization
- Few-shot Visual Relationship Co-localization
  ⭐code🏠project
Dense Object Detection
- Mutual Supervision for Dense Object Detection
  ⭐code
Domain Adaptive Object Detection
- Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection
- Knowledge Mining and Transferring for Domain Adaptive Object Detection
Image Manipulation Detection
- Image Manipulation Detection by Multi-View Multi-Scale Supervision
  ⭐code
Visual Relationship Detection(VRD)
- Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection
Long-Tailed Object Detection
- Exploring Classification Equilibrium in Long-Tailed Object Detection
  ⭐code
- MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection
  ⭐code
Salient Object Ranking
- Salient Object Ranking with Position-Preserved Attention
  ⭐code
Small Object Detection
- Robust Small Object Detection on the Water Surface Through Fusion of Camera and Millimeter Wave Radar
Dark Object Detection
- Multitask AET With Orthogonal Tangent Regularity for Dark Object Detection
  ⭐code
3D object prediction
- Holistic Pose Graph: Modeling Geometric Structure among Objects in a Scene using Graph Inference for 3D Object Prediction
  ⭐code
Multi-Object Detection
- Training Multi-Object Detector by Estimating Bounding Box Distribution for Input Image
  ⭐code
3D object grounding
- Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud
  ⭐code
Fine-Grained Crack Detection
- CrackFormer: Transformer Network for Fine-Grained Crack Detection
Line Segment Detection
- ELSD: Efficient Line Segment Detector and Descriptor
Cell Detection and Classification
- Multi-Class Cell Detection Using Spatial Context Representation
  ⭐code
Shadow Detection
- Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting
Social Distancing Detection
- BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning
  ⭐code
Camouflaged Object Detection
- Uncertainty-Guided Transformer Reasoning for Camouflaged Object Detection
  ⭐code

13.Image Segmentation

Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
😮oral⭐code📺video
TransForensics: Image Forgery Localization with Dense Self-Attention
From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation
⭐code
Labels4Free: Unsupervised Segmentation using StyleGAN
🏠project📺video
Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency
Scaling up instance annotation via label propagation
⭐code🏠project
Robust Trust Region for Weakly Supervised Segmentation
⭐code📺video
HPNet: Deep Primitive Segmentation Using Hybrid Representations
⭐code
Weakly Supervised Segmentation of Small Buildings With Point Labels
BAPA-Net: Boundary Adaptation and Prototype Alignment for Cross-Domain Semantic Segmentation
⭐code
Conditional Diffusion for Interactive Segmentation
Human Detection and Segmentation via Multi-view Consensus
⭐code
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Enhanced Boundary Learning for Glass-Like Object Segmentation
⭐code
PARTS: Unsupervised segmentation with slots, attention and independence maximization
Predictive Feature Learning for Future Segmentation Prediction
Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
⭐code
Segmenter: Transformer for Semantic Segmentation
⭐code
C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
Panoptic Segmentation
- Panoptic Narrative Grounding
Semantic Segmentation
- Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
  😮oral⭐code
- Personalized Image Semantic Segmentation
  ⭐code
- RECALL: Replay-based Continual Learning in Semantic Segmentation
  ⭐code
- Deep Metric Learning for Open World Semantic Segmentation
- LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation
  😮oral
- Dual Path Learning for Domain Adaptation of Semantic Segmentation
  ⭐code
- Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
- Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
  ⭐code🏠project
- Multi-Anchor Active Domain Adaptation for Semantic Segmentation
  😮oral⭐code
- Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation
- Self-Regulation for Semantic Segmentation
  ⭐code
- ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation
  ⭐code
- Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
  🏠project
- Mining Contextual Information Beyond Image for Semantic Segmentation
  ⭐code
- ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation
  ⭐code
- Pseudo-mask Matters in Weakly-supervised Semantic Segmentation
  ⭐code
- SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
- Region-Aware Contrastive Learning for Semantic Segmentation
- GP-S3Net: Graph-Based Panoptic Sparse Semantic Segmentation Network
- Domain Adaptive Semantic Segmentation With Self-Supervised Depth Estimation
  ⭐code
- Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation
  😮oral⭐code
- Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
  ⭐code
- Uncertainty-Aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation
- Contrastive Learning for Label Efficient Semantic Segmentation
- Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
  ⭐code
- Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
- Geometric Unsupervised Domain Adaptation for Semantic Segmentation
- Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
  ⭐code
- Multi-View Radar Semantic Segmentation
  ⭐code
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation
  😮oral⭐code
- Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
  ⭐code
- Scribble-Supervised Semantic Segmentation Inference
- Semi-Supervised Semantic Segmentation With Pixel-Level Contrastive Learning From a Class-Wise Memory Bank
  ⭐code
- Few-Shot Semantic Segmentation
  - Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
    ⭐code
  - Learning Meta-class Memory for Few-Shot Semantic Segmentation
  - Few-Shot Semantic Segmentation With Cyclic Memory Network
- 3D Semantic Segmentation
  - VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
    😮oral⭐code
  - Sparse-to-Dense Feature Matching: Intra and Inter Domain Cross-Modal Learning in Domain Adaptation for 3D Semantic Segmentation
    ⭐code
  - Weakly Supervised 3D Semantic Segmentation Using Cross-Image Consensus and Inter-Voxel Affinity Relations
- Video Semantic Segmentation
  - Domain Adaptive Video Segmentation via Temporal Consistency Regularization
    ⭐code
- Weakly Supervised Semantic Segmentation
  - Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
    ⭐code
  - Complementary Patch for Weakly Supervised Semantic Segmentation
  - ECS-Net: Improving Weakly Supervised Semantic Segmentation by Using Connections Between Class Activation Maps
  - Unlocking the Potential of Ordinary Classifier: Class-Specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation
    ⭐code
  - Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation
    ⭐code
  - Seminar Learning for Click-Level Weakly Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
  - ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
    📺video
  - Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation
  - TempNet: Online Semantic Segmentation on Large-Scale Point Cloud Series
  - Guided Point Contrastive Learning for Semi-Supervised Point Cloud Semantic Segmentation
  - Learning With Noisy Labels for Robust Point Cloud Segmentation
    ⭐code🏠project
- OOD
  - Entropy Maximization and Meta Classification for Out-of-Distribution Detection in Semantic Segmentation
    ⭐code
Instance Segmentation
- Rank & Sort Loss for Object Detection and Instance Segmentation
  😮oral⭐code
- SOTR: Segmenting Objects with Transformers
  ⭐code
- A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
- Instances as Queries
  ⭐code📺video
- CrossVIS: Crossover Learning for Fast Online Video Instance Segmentation
  ⭐code📺video
- CDNet: Centripetal Direction Network for Nuclear Instance Segmentation
  ⭐code
- PrimitiveNet: Primitive Instance Segmentation With Local Primitive Embedding Under Adversarial Metric
  ⭐code
- FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
  ⭐code🏠project
- Prior to Segment: Foreground Cues for Weakly Annotated Classes in Partially Supervised Instance Segmentation
  ⭐code
- DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence From Box Supervision
- End-to-End Video Instance Segmentation via Spatial-Temporal Graph Neural Networks
- The Surprising Impact of Mask-Head Architecture on Novel Class Segmentation
  🏠project
- How Shift Equivariance Impacts Metric Learning for Instance Segmentation
  ⭐code
- Parallel Detection-and-Segmentation Learning for Weakly Supervised Instance Segmentation
- Real-Time Instance Segmentation With Discriminative Orientation Maps
  ⭐code
- Video Instance Segmentation
  - Video Instance Segmentation with a Propose-Reduce Paradigm
    ⭐code
- 3D Instance Segmentation
  - Hierarchical Aggregation for 3D Instance Segmentation
    ⭐code
Few-Shot Segmentation
- Mining Latent Classes for Few-shot Segmentation
  😮oral⭐code
- Hypercorrelation Squeeze for Few-Shot Segmentation
  ⭐code🏠project
Human Motion Segmentation(人体运动分割)
- Graph Constrained Data Representation Learning for Human Motion Segmentation
  ⭐code
Point Cloud Segmentation
- Learning with Noisy Labels for Robust Point Cloud Segmentation
  😮oral⭐code🏠project
- DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation
Video Object Segmentation(VOS)
- Full-Duplex Strategy for Video Object Segmentation
  🏠project
- Joint Inductive and Transductive Learning for Video Object Segmentation
  ⭐code
- Hierarchical Memory Matching Network for Video Object Segmentation
  ⭐code
- Self-supervised Video Object Segmentation by Motion Grouping
  ⭐code🏠project📺video
- Deep Transport Network for Unsupervised Video Object Segmentation
- Generating Masks From Boxes by Mining Spatio-Temporal Consistencies in Videos
  ⭐code
- Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation
  ⭐code
- Video Object Segmentation With Dynamic Memory Networks and Adaptive Object Alignment
  ⭐code
Semantic Scene Segmentation
- BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
  ⭐code
Referring Segmentation(基于文本的分割)
- Vision-Language Transformer and Query Generation for Referring Segmentation
  ⭐code
Scene Understanding
- DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
  😮oral⭐code🏠project📺video
- ACDC: The Adverse Conditions Dataset With Correspondences for Semantic Driving Scene Understanding
  🏠project
- Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
  ⭐code
CAM(Class Activation Mapping)
- Towards Better Explanations of Class Activation Mapping
- Keep CALM and Improve Visual Feature Attribution
  ⭐code
- LFI-CAM: Learning Feature Importance for Better Visual Explanation
  ⭐code
Multi-Object Segmentation
- Faster Multi-Object Segmentation Using Parallel Quadratic Pseudo-Boolean Optimization
Action Segmentation
- Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning
  ⭐code
- Refining Action Segmentation with Hierarchical Video Representations
  ⭐code
Scene Parsing
- Interaction via Bi-Directional Graph of Semantic Region Affinity for Scene Parsing
Image Matting
- Cascade Image Matting With Deformable Graph Refinement
- Tripartite Information Mining and Integration for Image Matting
Motion Segmentation
- SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation

12.Image/Fine-Grained Classification(图像/细粒度分类)

DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
Online Continual Learning For Visual Food Classification
A Unified Objective for Novel Class Discovery
😮oral⭐code🏠project
📰explainer: ICCV2021 Oral | UNO: a unified objective for novel class discovery that simplifies the training pipeline; code released
Improving Generalization of Batch Whitening by Convolutional Unit Optimization
⭐code
Towards Learning Spatially Discriminative Feature Representations
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
⭐code
📰explainer: ICCV2021 | MIT-IBM Watson open-sources CrossViT: Transformers go multi-branch and multi-scale
SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
⭐code
Influence-Balanced Loss for Imbalanced Visual Classification
⭐code
Explanations for Occluded Images
⭐code🏠project📺video
Understanding Robustness of Transformers for Image Classification
Learning Rare Category Classifiers on a Tight Labeling Budget
Discover the Unknown Biased Attribute of an Image Classifier
⭐code
Co-Scale Conv-Attentional Image Transformers
😮oral⭐code
Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
⭐code
Do Image Classifiers Generalize Across Time?
🏠project
Interpretable Image Recognition by Constructing Transparent Embedding Space
⭐code
The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory
Long-Tailed Recognition
- Parametric Contrastive Learning
  ⭐code
- ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot
  😮oral⭐code
- Self Supervision to Distillation for Long-Tailed Visual Recognition
  ⭐code
- Distilling Virtual Examples for Long-Tailed Recognition
- Distributional Robustness Loss for Long-Tail Learning
- GistNet: A Geometric Structure Transfer Network for Long-Tailed Recognition
- Long-Tailed Visual Relationship Recognition
  - Exploring Long Tail Visual Relationship Recognition With Large Vocabulary
    ⭐code
Fine-Grained Recognition
- Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
  ⭐code
- Learning Canonical 3D Object Representation for Fine-Grained Recognition
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
  ⭐code
- N-ImageNet: Towards Robust, Fine-Grained Object Recognition With Event Cameras
- Grafit: Learning fine-grained image representations with coarse labels
- Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition
  ⭐code
Few-Shot Classification
- Transductive Few-Shot Classification on the Oblique Manifold
- Relational Embedding for Few-Shot Classification
  ⭐code🏠project
- Binocular Mutual Learning for Improving Few-shot Classification
  ⭐code
- Partner-Assisted Learning for Few-Shot Image Classification
- On the Importance of Distractors for Few-Shot Classification
  ⭐code
- Few-Shot Image Classification: Just Use a Library of Pre-Trained Feature Extractors and a Simple Classifier
- Universal Representation Learning From Multiple Domains for Few-Shot Classification
  ⭐code
- A Multi-Mode Modulator for Multi-Domain Few-Shot Classification
- Variational Feature Disentangling for Fine-Grained Few-Shot Classification
  ⭐code
- Mixture-Based Feature Space Learning for Few-Shot Image Classification
  ⭐code🏠project📺video
Multi-Label Classification
- Asymmetric Loss For Multi-Label Classification
  ⭐code
- Semantic Diversity Learning for Zero-Shot Multi-label Classification
  ⭐code

11.Visual Question Answering(视觉问答)

Greedy Gradient Ensemble for Robust Visual Question Answering
⭐code
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
⭐code
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
⭐code
Unshuffling Data for Improved Generalization in Visual Question Answering
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
⭐code(https://github.com/rentainhe/TRAR-VQA/)
Contrast and Classify: Training Robust VQA Models
Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
⭐code
Auto-Parsing Network for Image Captioning and Visual Question Answering
Video Question Answering
A-VQA
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

10.OCR

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
📺video
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
📺video
Towards the Unseen: Iterative Text Recognition by Distilling from Errors
📺video
Arbitrary-Shape Text Detection
- Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
  ⭐code
Scene Text Recognition
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
  ⭐code
Scene Text Replacement
- STRIVE: Scene Text Replacement In Videos
  🏠project
Document Image Unwarping
- End-to-End Piece-Wise Unwarping of Document Images
  🏠project
Handwritten Text Generation
- Handwriting Transformers
  ⭐code
Table Structure Recognition(表格结构识别)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition
  ⭐code

9.Video

Action Detection and Recognition(人体动作检测与识别)
- Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
  ⭐code
- MGSampler: An Explainable Sampling Strategy for Video Action Recognition
  ⭐code
- Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
- Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
- Class Semantics-based Attention for Action Detection
- MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
  ⭐code
- AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
  ⭐code
- OadTR: Online Action Detection With Transformers
  ⭐code
- Self-Supervised 3D Skeleton Action Representation Learning With Motion Consistency and Continuity
- Interactive Prototype Learning for Egocentric Action Recognition
- Efficient Action Recognition via Dynamic Knowledge Propagation
- Else-Net: Elastic Semantic Network for Continual Action Recognition From Skeleton Data
- Learning Self-Similarity in Space and Time As Generalized Motion for Video Action Recognition
  ⭐code🏠project
- Temporal Action Detection With Multi-Level Supervision
  ⭐code
- Watch Only Once: An End-to-End Video Action Detection Framework
  ⭐code
- Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
  😮oral
- Geometric Deep Neural Network Using Rigid and Non-Rigid Transformations for Human Action Recognition
- Just One Moment: Structural Vulnerability of Deep Action Recognition Against One Frame Attack
- Evidential Deep Learning for Open Set Action Recognition
  ⭐code🏠project📺video
- Learning an Augmented RGB Representation With Cross-Modal Knowledge Distillation for Action Detection
- Class-Incremental Learning for Action Recognition in Videos
- D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
  ⭐code
- Zero-Shot Action Recognition
  - Elaborative Rehearsal for Zero-shot Action Recognition
    ⭐code
- Temporal Action Localization(时序动作定位)
  - Enriching Local and Global Contexts for Temporal Action Localization
    ⭐code
  - Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
    😮oral⭐code
  - Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
    ⭐code
  - Video Self-Stitching Graph Network for Temporal Action Localization
  - Divide and Conquer for Single-Frame Temporal Action Localization
  - CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization
- Temporal Action Proposal Generation(时序动作提案生成)
  - Relaxed Transformer Decoders for Direct Action Proposal Generation
    ⭐code
Action Quality Assessment(行动质量评估)
- Group-aware Contrastive Regression for Action Quality Assessment
Video Rescaling
- Self-Conditioned Probabilistic Learning of Video Rescaling
  📰explainer: ICCV2021 | SJTU, BIT, and Baidu jointly study self-conditioned probabilistic learning for video rescaling
Video Activity Localisation
- Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
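  📰explainer: ICCV2021 | How to localise video activities efficiently? QMUL, Peking University, and Adobe jointly propose the weakly supervised CRM, with SOTA performance
Video Inpainting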
- Internal Video Inpainting by Implicit Long-range Propagation
  ⭐code🏠project
- Occlusion-Aware Video Object Inpainting
  🏠project
- FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
  ⭐code
- Flow-Guided Video Inpainting with Scene Templates
  ⭐code
- Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection
Video Analysis
- Video Representation Learning
Video Editing
- Learning to Cut by Watching Movies
  ⭐code🏠project
Video Captioning
- Motion Guided Region Message Passing for Video Captioning
- Aligning Subtitles in Sign Language Videos
  🏠project📺video
- Dense Video Captioning
  - End-to-End Dense Video Captioning with Parallel Decoding
    ⭐code
    📰explainer: HKU and SUSTech propose end-to-end PDVC, a DETR-style approach to Dense Video Captioning that simplifies the training pipeline
Video Coding
- Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
  ⭐code
  📰explainer: ICCV2021 | A neural video delivery super-resolution algorithm for industry
Video Generation
- Click to Move: Controlling Video Generation with Sparse Motion
  ⭐code
Video Relation Detection(视频关系检测)
- Social Fabric: Tubelet Compositions for Video Relation Detection
  ⭐code
Video Grounding
- Support-Set Based Cross-Supervision for Video Grounding
Video Highlight Detection
- Cross-category Video Highlight Detection via Set-based Learning
  ⭐code
- PR-Net: Preference Reasoning for Personalized Video Highlight Detection
- HighlightMe: Detecting Highlights from Human-Centric Videos
- Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
- Joint Visual and Audio Learning for Video Highlight Detection
Video Recognition
- Searching for Two-Stream Models in Multivariate Space for Video Recognition
- Adaptive Focus for Efficient Video Recognition
  😮oral⭐code
- AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
  ⭐code🏠project
- TAM: Temporal Adaptive Module for Video Recognition
  ⭐code
- Condensing a Sequence to One Informative Frame for Video Recognition
- VideoLT: Large-Scale Long-Tailed Video Recognition
  ⭐code
- Motion-Augmented Self-Training for Video Recognition at Smaller Scale
- Multi-Modal Multi-Action Video Recognition
  ⭐code
Motion Retargeting(运动重定位)
- Contact-Aware Retargeting of Skinned Motion
  📺video
Video Prediction
- A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction
  😮oral
Video Synthesis
- iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
  ⭐code🏠project
Video Frame Interpolation
- Training Weakly Supervised Video Frame Interpolation With Events
  ⭐code
- Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
  ⭐code
- XVFI: eXtreme Video Frame Interpolation
  😮oral⭐code📺video
Deepfake Video Detection
- ID-Reveal: Identity-aware DeepFake Video Detection
  ⭐code
Video Stabilization
- Hybrid Neural Fusion for Full-Frame Video Stabilization
  ⭐code🏠project📺video
Video Frame-level Similarity(视频帧级相似度学习)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
  😮oral⭐code🏠project📺video
Video Compression
- Online-Trained Upsampler for Deep Low Complexity Video Compression
Video Moment Retrieval
- Fast Video Moment Retrieval
Video Summarization
- Multiple Pairwise Ranking Networks for Personalized Video Summarization
Video Quality Assessment
- Unsupervised Curriculum Domain Adaptation for No-Reference Video Quality Assessment
  ⭐code
Video Grounding
- STVGBert: A Visual-Linguistic Transformer Based Framework for Spatio-Temporal Video Grounding
Video Localization
- Zero-Shot Natural Language Video Localization
  😮oral⭐code
Video Inference
- Real-Time Video Inference on Edge Devices via Adaptive Model Streaming
  ⭐code
Video-Related
- Anonymizing Egocentric Videos
Video Anomaly Detection
- Dance With Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos
- A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction
  ⭐code
  📰explainer: ICCV 2021 Oral | Reconstruction plus prediction: a two-pronged approach to improving video anomaly detection
- Weakly-Supervised Video Anomaly Detection With Robust Temporal Feature Magnitude Learning
  ⭐code
Video Denoising
- Unsupervised Deep Video Denoising
  ⭐code🏠project
Video Portrait Relighting(人像视频重照明)
- Neural Video Portrait Relighting in Real-time via Consistency Modeling
Temporal Localization in Videos
- Boundary-Sensitive Pre-Training for Temporal Localization in Videos
  ⭐code🏠project
Video Entailment
- Explainable Video Entailment With Grounded Visual Evidence
Video Matting
- Video Matting via Consistency-Regularized Graph Neural Networks
  ⭐code
Video Coding
- ELF-VC: Efficient Learned Flexible-Rate Video Coding
Interaction Recognition in Videos
- Motion Guided Attention Fusion To Recognize Interactions From Videos
Video Deblurring
- Multi-Scale Separable Network for Ultra-High-Definition Video Deblurring
Video Understanding
- Unified Graph Structured Models for Video Understanding
Video Reconstruction
- HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset
  ⭐code🏠project📺video🌻dataset

8.Human Pose Estimation(人体姿态估计)

Human Pose Regression with Residual Log-likelihood Estimation
😮oral⭐code
Online Knowledge Distillation for Efficient Pose Estimation
DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders
😮oral⭐code
Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation
😮oral⭐code
Dynamical Pose Estimation
⭐code📺video
Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation
⭐code🏠project
Egocentric Pose Estimation From Human Vision Span
Learning Privacy-Preserving Optics for Human Pose Estimation
😮oral⭐code🏠project📺video
TokenPose: Learning Keypoint Tokens for Human Pose Estimation
⭐code
Motion Adaptive Pose Estimation from Compressed Videos
3D Human Pose Estimation
- PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
  😮oral⭐code🏠project
- HuMoR: 3D Human Motion Model for Robust Pose Estimation
  😮oral🏠project📺video
- Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
  ⭐code📺video
- Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
  ⭐code
- EventHPE: Event-based 3D Human Pose and Shape Estimation
  ⭐code
- imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
- Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- Learning to Regress Bodies from Images using Differentiable Semantic Rendering
  🏠project
- Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild
  ⭐code
- 3D Human Pose Estimation With Spatial and Temporal Transformers
  ⭐code📺video
- PARE: Part Attention Regressor for 3D Human Body Estimation
  ⭐code🏠project📺video
- Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions
- UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model
  ⭐code
- Modulated Graph Convolutional Network for 3D Human Pose Estimation
  ⭐code
- Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation
  ⭐code🏠project📺video
- Estimating Egocentric 3D Human Pose in Global Space
  🏠project📺video
- Camera Distortion-Aware 3D Human Pose Estimation in Video With Optimization-Based Meta-Learning
  ⭐code
- EM-POSE: 3D Human Pose Estimation From Sparse Electromagnetic Trackers
  ⭐code🏠project📺video
- Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation
  🏠project
- SPEC: Seeing People in the Wild with an Estimated Camera
  ⭐code🏠project📺video
- Encoder-Decoder With Multi-Level Attention for 3D Human Shape and Pose Estimation
  ⭐code
3D Pose Transfer
- Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer
  ⭐code
Hand Pose
- Gesture Synthesis
  - Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates
    ⭐code
- Gesture Recognition
  - Hand Image Understanding via Deep Multi-Task Learning
    ⭐code
  - SemiHand: Semi-Supervised Hand Pose Estimation With Consistency
- 3D Hand Pose
- Interacting Hand Pose Estimation
  - End-to-End Detection and Pose Estimation of Two Interacting Hands
- 3D Hand Mesh Modeling
  - I2UV-HandNet: Image-to-UV Prediction Network for Accurate and High-Fidelity 3D Hand Mesh Modeling
  - Towards Accurate Alignment in Real-Time 3D Hand-Mesh Reconstruction
- Hand Mesh Recovery
  - Self-Supervised Transfer Learning for Hand Mesh Recovery From Binocular Images
- Hand Motion Learning
  - TravelNet: Self-Supervised Physically Plausible Hand Motion Learning From Monocular Color Images
    🏠project
- Hand Reconstruction
  - Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image
    ⭐code
3D Mesh Generation
- Deep Hybrid Self-Prior for Full 3D Mesh Generation
  🏠project
- Mesh Graphormer
  ⭐code
Human Reconstruction
- ARCH++: Animation-Ready Clothed Human Reconstruction Revisited
  📺video
- 3D Human Reconstruction
4D Human Body Capture
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes
  ⭐code🏠project📺video
Human Pose Estimation and Synthesis
- Physics-based Human Motion Estimation and Synthesis from Videos
Multi-Person Pose Estimation
- Shape-aware Multi-Person Pose Estimation from Multi-View Images
  ⭐code🏠project📺video
  paper publicly available
- The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation
  ⭐code
Human/Object Pose Keypoint Detection
- Keypoint Communities
  ⭐code
Human Motion Capture
- SOMA: Solving Optical Marker-Based MoCap Automatically
  ⭐code🏠project📺video
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras
  🏠project
- Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras
2D Human Pose Estimation
- An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation
  ⭐code
Human Action Video Alignment
- Normalized Human Pose Features for Human Action Video Alignment
3D Pose Transfer
- Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer
  ⭐code
Human Mesh Recovery
- Skeleton2Mesh: Kinematics Prior Injected Unsupervised Human Mesh Recovery
  🏠project
- Uncertainty-Aware Human Mesh Recovery From Video by Learning Part-Based 3D Dynamics
Distance Estimation from Human Pose
- Single View Physical Distance Estimation using Human Pose
  🏠project
3D Human Body
- SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes
  ⭐code🏠project📺video
- Monocular, One-stage, Regression of Multiple 3D People
  ⭐code
Motion Synthesis
- Synthesis of Compositional Animations from Textual Descriptions
  ⭐code
- 3D Human Motion Synthesis
  - Action-Conditioned 3D Human Motion Synthesis With Transformer VAE
  - A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder
3D Animation
- DeePSD: Automatic Deep Skinning and Pose Space Deformation for 3D Garment Animation
Category-Level Garment Pose Estimation
- GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion
  ⭐code🏠project
Clothed Human Modeling
- Point-Based Modeling of Human Clothing
  ⭐code🏠project📺video
Keypoint Localization
- TransPose: Keypoint Localization via Transformer
  ⭐code

7.Scene Graph Generation(场景图生成)

Spatial-Temporal Transformer for Dynamic Scene Graph Generation
⭐code📺video
Unconditional Scene Graph Generation
🏠project
Target Adaptive Context Aggregation for Video Scene Graph Generation
⭐code
Learning to Generate Scene Graph from Natural Language Supervision
⭐code
Segmentation-Grounded Scene Graph Generation
⭐code
Context-aware Scene Graph Generation with Seq2Seq Transformer
⭐code
A Simple Baseline for Weakly-Supervised Scene Graph Generation
⭐code
Generative Compositional Augmentations for Scene Graph Prediction
⭐code
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
⭐code
Scene Synthesis

6.Point Cloud(点云)

AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
⭐code🏠project
Adaptive Graph Convolution for Point Cloud Analysis
⭐code
Learning Inner-Group Relations on Point Clouds
CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds
⭐code
Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks
⭐code📺video
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds
⭐code
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
Differentiable Convolution Search for Point Cloud Processing
Superpoint Network for Point Cloud Oversegmentation
⭐code
PU-EVA: An Edge-Vector Based Approximation Solution for Flexible-Scale Point Cloud Upsampling
SGMNet: Learning Rotation-Invariant Point Cloud Representations via Sorted Gram Matrix
DWKS: A Local Descriptor of Deformations Between Meshes and Point Clouds
⭐code
Robustness Certification for Point Cloud Models
⭐code
Vector Neurons: A General Framework for SO(3)-Equivariant Networks
⭐code
Unsupervised Point Cloud Pre-Training via Occlusion Completion
⭐code
Towards Efficient Graph Convolutional Networks for Point Cloud Handling
⭐code
Progressive Seed Generation Auto-Encoder for Unsupervised Point Cloud Learning
Point Cloud Denoising
- Score-Based Point Cloud Denoising
  ⭐code
Point Cloud Registration
- HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
  ⭐code🏠project
- (Just) A Spoonful of Refinements Helps the Registration Error Go Down
  😮oral⭐code
- A Robust Loss for Point Cloud Registration
- Deep Hough Voting for Robust Global Registration
- Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
  ⭐code
- Feature Interactive Representation for Point Cloud Registration
- LSG-CPD: Coherent Point Drift With Local Surface Geometry for Point Cloud Registration
  ⭐code📺video
- OMNet: Learning Overlapping Mask for Partial-to-Partial Point Cloud Registration
  ⭐code
- DeepPRO: Deep Partial Point Cloud Registration of Objects
- Provably Approximated Point Cloud Registration
- Bootstrap Your Own Correspondences
- Distinctiveness Oriented Positional Equilibrium for Point Cloud Registration
3D Point Clouds
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
  ⭐code
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
  ⭐code🏠project
- Point Transformer
- Point-Set Distances for Learning Representations of 3D Point Clouds
- PointBA: Towards Backdoor Attacks in 3D Point Cloud
- Minimal Adversarial Examples for Deep Learning on 3D Point Clouds
- 3D Point Cloud Reconstruction
  - MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans
    😮oral🏠project📺video
Point Cloud Completion
- SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
  😮oral⭐code
- ME-PCN: Point Completion Conditioned on Mask Emptiness
- PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
  😮oral⭐code
- Voxel-based Network for Shape Completion by Leveraging Edge Generation
  ⭐code
- RFNet: Recurrent Forward Network for Dense Point Cloud Completion
Point Cloud Augmentation
- Point Cloud Augmentation with Weighted Local Transformations
  ⭐code
Point Cloud Shape Analysis
- Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis
  ⭐code🏠project
Point Cloud Analysis
- A Closer Look at Rotation-Invariant Deep Point Cloud Analysis
  ⭐code
3D Point Cloud Classification
- A Backdoor Attack Against 3D Point Cloud Classifiers
  ⭐code
3D Point Cloud Generation and Completion
- 3D Shape Generation and Completion through Point-Voxel Diffusion
  🏠project📺video
Point Cloud Object Co-Segmentation
- Unsupervised Point Cloud Object Co-Segmentation by Co-Contrastive Learning and Mutual Attention Sampling
  ⭐code
Point Cloud Understanding
- Shape Self-Correction for Unsupervised Point Cloud Understanding

5.Few-Shot/Zero-Shot Learning;Domain Generalization/Adaptation(小/零样本学习;域适应/泛化)

Domain Adaptation
Domain Generalization
Few-Shot Learning
Zero-Shot Learning(零样本学习)
- Discriminative Region-based Multi-Label Zero-Shot Learning
  ⭐code
- Field-Guide-Inspired Zero-Shot Learning
- Generalized Zero-Shot Learning(广义零样本学习)
  - FREE: Feature Refinement for Generalized Zero-Shot Learning
    ⭐code
  - Semantics Disentangling for Generalized Zero-Shot Learning
    ⭐code

4.Neural rendering(神经渲染)

In-Place Scene Labelling and Understanding with Implicit Scene Representation
😮oral🏠project📺video
Differentiable Surface Rendering via Non-Differentiable Sampling
Self-Calibrating Neural Radiance Fields
⭐code
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo
😮oral⭐code🏠project
Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering
⭐code🏠project
CodeNeRF: Disentangled Neural Radiance Fields for Object Categories
⭐code
MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo
⭐code🏠project📺video
PlenOctrees for Real-Time Rendering of Neural Radiance Fields
😮oral⭐Conversion Code⭐Viewer Code🏠project📺video
Neural Radiance Flow for 4D View Synthesis and Video Processing
⭐code🏠project
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies
⭐code🏠project📺video
📰Explainer: Zhejiang University's 3D vision team proposes Animatable NeRF, reconstructing animatable human models from RGB video (ICCV'21)
GNeRF: GAN-Based Neural Radiance Field Without Posed Camera
😮oral
BARF: Bundle-Adjusting Neural Radiance Fields
😮oral⭐code🏠project
FastNeRF: High-Fidelity Neural Rendering at 200FPS
🏠project📺video
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
⭐code📺video
NeRD: Neural Reflectance Decomposition from Image Collections
⭐code🏠project📺video
Editing Conditional Radiance Fields
⭐code🏠project📺video
GRF: Learning a General Radiance Field for 3D Representation and Rendering
⭐code
4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface
⭐code📺video
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
⭐code
Neural Articulated Radiance Field
⭐code
Baking Neural Radiance Fields for Real-Time View Synthesis
🏠project📺video
Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
⭐code🏠project
Nerfies: Deformable Neural Radiance Fields
⭐code🏠project📺video
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
😮oral⭐code🏠project📺video
3D Rendering
- GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
  😮oral⭐code🏠project📺video
3D Photography
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
  😮oral🏠project📺video
Rendering
- EgoRenderer: Rendering Human Avatars from Egocentric Camera Images

3.Image Clustering

Clustering by Maximizing Mutual Information Across Views
Learning Hierarchical Graph Neural Networks for Image Clustering
⭐code
One-Pass Multi-View Clustering for Large-Scale Data
End-to-End Robust Joint Unsupervised Image Alignment and Clustering
Graph Contrastive Clustering
⭐code
Face Clustering
- Learn To Cluster Faces via Pairwise Classification

2.Sign Language

Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
Self-Mutual Distillation Learning for Continuous Sign Language Recognition
Visual Alignment Constraint for Continuous Sign Language Recognition
⭐code
Sign Language Translation
- Stochastic Transformer Networks With Linear Competing Units: Application To End-to-End SL Translation

1.Other

Bias Loss for Mobile Neural Networks
⭐code
Improve Unsupervised Pretraining for Few-label Transfer
Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
Robustness via Cross-Domain Ensembles
😮oral⭐code🏠project📺video
Warp Consistency for Unsupervised Learning of Dense Correspondences
😮oral⭐code
Few-Shot and Continual Learning with Attentive Independent Mechanisms
⭐code
Out-of-Core Surface Reconstruction via Global TGV Minimization
ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
Multi-scale Matching Networks for Semantic Correspondence
⭐code
Learning with Noisy Labels via Sparse Regularization
⭐code
CanvasVAE: Learning to Generate Vector Graphic Documents
Toward Spatially Unbiased Generative Models
⭐code
Learning Compatible Embeddings
⭐code
Instance Similarity Learning for Unsupervised Feature Representation
⭐code
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
⭐code
Unifying Nonlocal Blocks for Neural Networks
⭐code
Impact of Aliasing on Generalization in Deep Convolutional Networks
NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
⭐code
ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
m-RevNet: Deep Reversible Neural Networks with Momentum (suspected academic misconduct; a withdrawal has been requested)
Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
MT-ORL: Multi-Task Occlusion Relationship Learning
⭐code
Finding Representative Interpretations on Convolutional Neural Networks
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
⭐code
PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
⭐code
Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
⭐code
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs
⭐code
A New Journey from SDRTV to HDRTV
⭐code
A Simple Framework for 3D Lensless Imaging with Programmable Masks
⭐code
Causal Attention for Unbiased Visual Recognition
⭐code
Learning to Match Features with Seeded Graph Matching Network
⭐code
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility
😮oral
Towards Understanding the Generative Capability of Adversarially Robust Classifiers
😮oral
Ranking Models in Unlabeled New Environments
⭐code
Learning of Visual Relations: The Devil is in the Tails
🏠project
BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
⭐code
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
Debiasing
- BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
Full-Velocity Radar Returns by Radar-Camera Fusion
CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
⭐code🏠project
NGC: A Unified Framework for Learning with Open-World Noisy Data
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
🏠project
Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
⭐code
Digging into Uncertainty in Self-supervised Multi-view Stereo
Learning to Discover Reflection Symmetry via Polar Matching Convolution
⭐code🏠project
A Dual Adversarial Calibration Framework for Automatic Fetal Brain Biometry
The Animation Transformer: Visual Correspondence via Segment Matching
Parsing Table Structures in the Wild
⭐code
Square Root Marginalization for Sliding-Window Bundle Adjustment
⭐code🏠project📺video
Hierarchical Object-to-Zone Graph for Object Navigation
⭐code📺video
Robustness and Generalization via Generative Adversarial Training
Learning Fast Sample Re-weighting Without Reward Data
⭐code
ReconfigISP: Reconfigurable Camera Image Processing Pipeline
🏠project
Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting
😮oral
Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
DisUnknown: Distilling Unknown Factors for Disentanglement Learning
⭐code🏠project
S3VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation
🏠project
ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity
📺video
Photon-Starved Scene Inference using Single Photon Cameras
📺video
OSCAR-Net: Object-centric Scene Graph Attention for Image Attribution
⭐code🏠project
Learning to Estimate Hidden Motions with Global Motion Aggregation
⭐code📺video
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Meta Learning on a Sequence of Imbalanced Domains with Difficulty Awareness
⭐code
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
😮oral
Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice
😮oral⭐code
Neural Strokes: Stylized Line Drawing of 3D Shapes
⭐code
Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
🏠project
Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation
⭐code
Exploiting Explanations for Model Inversion Attacks
Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization
RDI-Net: Relational Dynamic Inference Networks
⭐code
ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators
⭐code
T-Net: Effective Permutation-Equivariant Network for Two-View Correspondence Learning
⭐code
Learning To Stylize Novel Views
⭐code🏠project
A Lazy Approach to Long-Horizon Gradient-Based Meta-Learning
Viewing Graph Solvability via Cycle Consistency
😮oral⭐code
🏆Best paper honorable mention
SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-Powered Intelligent PhlatCam
⭐code
Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning
Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection
⭐code
Batch Normalization Increases Adversarial Vulnerability and Decreases Adversarial Transferability: A Non-Robust Feature Perspective
DeepCAD: A Deep Generative Network for Computer-Aided Design Models
🏠project
Better Aggregation in Test-Time Augmentation
Self-Born Wiring for Neural Trees
Detector-Free Weakly Supervised Grounding by Separation
Motion-Aware Dynamic Architecture for Efficient Frame Interpolation
Relating Adversarially Robust Generalization to Flat Minima
Bit-Mixer: Mixed-Precision Networks With Runtime Bit-Width Selection
AINet: Association Implantation for Superpixel Segmentation
⭐code
Orthogonal Projection Loss
⭐code
Knowledge-Enriched Distributional Model Inversion Attacks
⭐code
Architecture Disentanglement for Deep Neural Networks
⭐code
On Equivariant and Invariant Learning of Object Landmark Representations
⭐code🏠project
Predicting with Confidence on Unseen Distributions
Embed Me If You Can: A Geometric Perceptron
⭐code
Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation
HIRE-SNN: Harnessing the Inherent Robustness of Energy-Efficient Deep Spiking Neural Networks by Training With Crafted Input Noise
⭐code
Towards Memory-Efficient Neural Networks via Multi-Level In Situ Generation
From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images
🏠project
MBA-VO: Motion Blur Aware Visual Odometry
⭐code
STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing
Explaining Local, Global, And Higher-Order Interactions In Deep Learning
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
⭐code
Homogeneous Architecture Augmentation for Neural Predictor
⭐code
SS-IL: Separated Softmax for Incremental Learning
VSAC: Efficient and Accurate Estimator for H and F
Fusion Moves for Graph Matching
⭐code🏠project
Geometric Granularity Aware Pixel-To-Mesh
Modulated Periodic Activations for Generalizable Local Functional Representations
🏠project
Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents
⭐code🏠project📺video
A Dark Flash Normal Camera
🏠project📺video
Pri3D: Can 3D Priors Help 2D Representation Learning?
⭐code📺video
Membership Inference Attacks Are Easier on Difficult Problems
Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
⭐code🏠project
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery
🏠project
DCT-SNN: Using DCT To Distribute Spatial Information Over Time for Low-Latency Spiking Neural Networks
⭐code
Learning To Resize Images for Computer Vision Tasks
Field of Junctions: Extracting Boundary Structure at Low SNR
DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
Learning To Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data
⭐code
Graph-based Asynchronous Event Processing for Rapid Object Recognition
A Hybrid Frequency-Spatial Domain Model for Sparse Image Reconstruction in Scanning Transmission Electron Microscopy
⭐code
MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing
Efficient Large Scale Inlier Voting for Geometric Vision Problems
⭐code
Aggregation With Feature Detection
ReCU: Reviving the Dead Weights in Binary Neural Networks
⭐code
Deep Halftoning With Reversible Binary Pattern
FFT-OT: A Fast Algorithm for Optimal Transportation
Progressive Correspondence Pruning by Consensus Learning
⭐code🏠project
📰Explainer: Progressive correspondence pruning based on consensus learning (ICCV 2021)
Multispectral Illumination Estimation Using Deep Unrolling Network
Distilling Global and Local Logits With Densely Connected Relations
Learning specialized activation functions with the Piecewise Linear Unit
Adaptive Convolutions With Per-Pixel Dynamic Filter Atom
Deep Matching Prior: Test-Time Optimization for Dense Correspondence
⭐code
Calibrated and Partially Calibrated Semi-Generalized Homographies
⭐code
The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data
⭐code
EC-DARTS: Inducing Equalized and Consistent Optimization Into DARTS
Refining activation downsampling with SoftPool
FATNN: Fast and Accurate Ternary Neural Networks
⭐code
GTT-Net: Learned Generalized Trajectory Triangulation
Deep Permutation Equivariant Structure from Motion
⭐code
Extending Neural P-frame Codecs for B-frame Coding
Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning
SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks
⭐code
AA-RMVSNet: Adaptive Aggregation Recurrent Multi-View Stereo Network
⭐code
Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
⭐code
Orthographic-Perspective Epipolar Geometry
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
⭐code
PixelPyramids: Exact Inference Models From Lossless Image Pyramids
⭐code
SurfaceNet: Adversarial SVBRDF Estimation from a Single Image
⭐code
Adaptive Curriculum Learning
Sparse-Shot Learning With Exclusive Cross-Entropy for Extremely Many Localisations
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
⭐code
Generating Attribution Maps With Disentangled Masked Backpropagation
Spectral Leakage and Rethinking the Kernel Size in CNNs
⭐code
What You Can Learn by Staring at a Blank Wall
Neural TMDlayer: Modeling Instantaneous Flow of Features via SDE Generators
CLEAR: Clean-up Sample-Targeted Backdoor in Neural Networks
Learning To Hallucinate Examples From Extrinsic and Intrinsic Supervision
Single-shot Hyperspectral-Depth Imaging with Learned Diffractive Optics
GridToPix: Training Embodied Agents With Minimal Supervision
🏠project📺video
Differentiable Dynamic Wirings for Neural Networks
JEM++: Improved Techniques for Training JEM
⭐code
X-World: Accessibility, Vision, and Autonomy Meet
Memory-augmented Dynamic Neural Relational Inference
Physics-based Differentiable Depth Sensor Simulation
Hypergraph Neural Networks for Hypergraph Matching
⭐code
Visual Grounding
- SAT: 2D Semantics Assisted Training for 3D Visual Grounding
Cortical Surface Shape Analysis Based on Alexandrov Polyhedra
FcaNet: Frequency Channel Attention Networks
⭐code
Structured Outdoor Architecture Reconstruction by Exploration and Classification
⭐code
Testing Using Privileged Information by Adapting Features With Statistical Dependence
Virtual Light Transport Matrices for Non-Line-of-Sight Imaging
😮oral
DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training
Contrastive Multimodal Fusion with TupleInfoNCE
Learning Better Visual Data Similarities via New Grouplet Non-Euclidean Embedding
An Elastica Geodesic Approach With Convexity Shape Prior
Inverting a Rolling Shutter Camera: Bring Rolling Shutter Images to High Framerate Global Shutter Video
Multimodal Knowledge Expansion
⭐code
Direct Differentiable Augmentation Search
⭐code
The Functional Correspondence Problem
🏠project
Joint Topology-Preserving and Feature-Refinement Network for Curvilinear Structure Segmentation
⭐code
Generative Layout Modeling Using Constraint Graphs
Self-Supervised Image Prior Learning with GMM from a Single Noisy Image
⭐code
Deep Implicit Surface Point Prediction Networks
⭐code🏠project📺video
Poly-NL: Linear Complexity Non-local Layers With 3rd Order Polynomials
Factorizing Perception and Policy for Interactive Instruction Following
⭐code
Group-Wise Inhibition Based Feature Regularization for Robust Classification
⭐code
Searching for Robustness: Loss Learning for Noisy Classification Tasks
Statistically Consistent Saliency Estimation
Practical Relative Order Attack in Deep Ranking
⭐code
Q-Match: Iterative Shape Matching via Quantum Annealing
⭐code🏠project
Learning To Better Segment Objects From Unseen Classes With Unlabeled Videos
🏠project📺video
Globally Optimal and Efficient Manhattan Frame Estimation by Delimiting Rotation Search Space
Cross-Encoder for Unsupervised Gaze Representation Learning
Hierarchical Disentangled Representation Learning for Outdoor Illumination Estimation and Editing
NeuSpike-Net: High Speed Video Reconstruction via Bio-Inspired Neuromorphic Cameras
Local Temperature Scaling for Probability Calibration
LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
Conformer: Local Features Coupling Global Representations for Visual Recognition
⭐code
Reliably fast adversarial training via latent adversarial perturbation
PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
⭐code🏠project📺video
ICON: Learning Regular Maps Through Inverse Consistency
Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing
⭐code
Kernel Methods in Hyperbolic Spaces
Cross-Camera Convolutional Color Constancy
BlockPlanner: City Block Generation with Vectorized Graph Representation
A Machine Teaching Framework for Scalable Recognition
Clothed Human Bodies
- Dynamic Surface Function Networks for Clothed Human Bodies
  ⭐code🏠project📺video
Transfer Learning
- Fast and Efficient DNN Deployment via Deep Gaussian Transfer Learning
Active Recognition(AR)
- FLAR: A Unified Prototype Framework for Few-Sample Lifelong Active Recognition
3D Photography
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
  😮oral🏠project📺video
Sub-Bit Neural Networks: Learning To Compress and Accelerate Binary Neural Networks
⭐code
When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes
⭐code
Physics-Enhanced Machine Learning for Virtual Fluorescence Microscopy
⭐code
Ground-truth or DAER: Selective Re-query of Secondary Information
⭐code
Can Shape Structure Features Improve Model Robustness Under Diverse Adversarial Settings?
Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data
Sparse Needlets for Lighting Estimation with Spherical Transport Loss
Semantic Perturbations with Normalizing Flows for Improved Generalization
Differentiable Surface Rendering via Non-Differentiable Sampling
Towards Robustness of Deep Neural Networks via Regularization
Objects as Cameras: Estimating High-Frequency Illumination from Shadows
Inference of Black Hole Fluid-Dynamics From Sparse Interferometric Measurements
Removing the Bias of Integral Pose Regression
A Light Stage on Every Desk
🏠project
Multi-Level Curriculum for Training a Distortion-Aware Barrel Distortion Rectification Model
Generic Event Boundary Detection: A Benchmark for Event Segmentation
Extreme Structure from Motion for Indoor Panoramas without Visual Overlaps
⭐code
Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
⭐code
VaPiD: A Rapid Vanishing Point Detector via Learned Optimizers
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images
⭐code
Efficient and Differentiable Shadow Computation for Inverse Problems
Minimal Cases for Computing the Generalized Relative Pose using Affine Correspondences
Radial Distortion Invariant Factorization for Structure from Motion
LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments
Transforms Based Tensor Robust PCA: Corrupted Low-Rank Tensors Recovery via Convex Optimization
Synchronization of Group-labelled Multi-graphs
Robust Watermarking for Deep Neural Networks via Bi-Level Optimization
CrossNorm and SelfNorm for Generalization under Distribution Shifts
⭐code
Learning Temporal Dynamics from Cycles in Narrated Video
🏠project
von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning
Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts
⭐code
Me-Momentum: Extracting Hard Confident Examples From Noisily Labeled Data
⭐code
ProFlip: Targeted Trojan Attack with Progressive Bit Flips
Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion
AdvRush: Searching for Adversarially Robust Neural Architectures
Improving robustness against common corruptions with frequency biased models
UASNet: Uncertainty Adaptive Sampling Network for Deep Stereo Matching
Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration
⭐code
Field Convolutions for Surface CNNs
😮oral⭐code
SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation
Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks
⭐code
Real-Time Vanishing Point Detector Integrating Under-Parameterized RANSAC and Hough Transform
Low-Rank Tensor Completion by Approximating the Tensor Average Rank
Rotation Averaging in a Split Second: A Primal-Dual Method and a Closed-Form for Cycle Graphs
⭐code
Effectively Leveraging Attributes for Visual Similarity
⭐code
Localized Simple Multiple Kernel K-means
⭐code
SmartShadow: Artistic Shadow Drawing Tool for Line Drawings
PT-CapsNet: A Novel Prediction-Tuning Capsule Network Suitable for Deeper Architectures
⭐code
Generalized Shuffled Linear Regression
⭐code
Weak Adaptation Learning: Addressing Cross-Domain Data Insufficiency With Weak Annotator
Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation
Procrustean Training for Imbalanced Deep Learning
