TOOD: Task-aligned One-stage Object Detection

Abstract

One-stage object detection is commonly implemented by optimizing two sub-tasks, object classification and localization, using heads with two parallel branches, which can lead to a certain level of spatial misalignment between the predictions of the two tasks. In this work, we propose Task-aligned One-stage Object Detection (TOOD), which explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head), which offers a better balance between learning task-interactive and task-specific features, as well as greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves 51.1 AP with single-model, single-scale testing. This surpasses recent one-stage detectors such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP) by a large margin, with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD in better aligning the tasks of object classification and localization.
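The abstract describes TAL only at a high level. The plain-Python sketch below illustrates the idea for a single ground-truth box, assuming the formulation given in the TOOD paper: an alignment metric t = s^α · u^β that combines the classification score s with the IoU u of the predicted box (α = 1, β = 6), the top m = 13 candidates whose anchor centers lie inside the ground-truth box taken as positives, and the metric normalized so that its maximum equals the largest IoU among positives before it is used as a soft classification target. The function names `iou`, `task_aligned_assign`, and `task_aligned_cls_loss` are hypothetical and are not the mmdetection API; handling of anchors claimed by several objects is omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # anchor center (cx, cy)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def task_aligned_assign(
    scores: Sequence[float],         # predicted score for the GT's class at each anchor
    pred_boxes: Sequence[Box],       # box predicted at each anchor
    anchor_points: Sequence[Point],  # anchor centers
    gt_box: Box,
    alpha: float = 1.0,
    beta: float = 6.0,
    topk: int = 13,
) -> Tuple[List[int], List[float]]:
    """Hypothetical single-GT sketch of task-aligned sample assignment.

    Returns the positive anchor indices and a soft classification target
    per anchor (0.0 for negatives).
    """
    ious = [iou(b, gt_box) for b in pred_boxes]
    metrics = []
    for s, u, (cx, cy) in zip(scores, ious, anchor_points):
        inside = gt_box[0] < cx < gt_box[2] and gt_box[1] < cy < gt_box[3]
        # Alignment metric t = s^alpha * u^beta; anchors outside the GT get 0.
        metrics.append(s ** alpha * u ** beta if inside else 0.0)

    # Keep the top-k anchors by alignment metric, dropping zero-metric ones.
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    positives = [i for i in order[:topk] if metrics[i] > 0]

    targets = [0.0] * len(scores)
    if positives:
        # Normalize so the largest target equals the largest IoU among positives.
        max_t = max(metrics[i] for i in positives)
        max_u = max(ious[i] for i in positives)
        for i in positives:
            targets[i] = metrics[i] / max_t * max_u
    return positives, targets


def task_aligned_cls_loss(score: float, target: float, gamma: float = 2.0) -> float:
    """BCE against the soft target, down-weighted by |target - score|^gamma."""
    s = min(max(score, 1e-12), 1.0 - 1e-12)
    bce = -(target * math.log(s) + (1.0 - target) * math.log(1.0 - s))
    return abs(target - s) ** gamma * bce


# Toy example: one GT box and three anchors (the last lies outside the GT).
gt = (0.0, 0.0, 10.0, 10.0)
pos, tgt = task_aligned_assign(
    scores=[0.8, 0.9, 0.7],
    pred_boxes=[(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 5.0, 10.0), (15.0, 15.0, 25.0, 25.0)],
    anchor_points=[(5.0, 5.0), (2.0, 2.0), (20.0, 20.0)],
    gt_box=gt,
)
print(pos, [round(t, 4) for t in tgt])  # -> [0, 1] [1.0, 0.0176, 0.0]
```

In the toy example the second anchor has the highest classification score (0.9), but its box covers only half of the object (IoU 0.5), so β = 6 suppresses its metric to about 0.014 and it receives a much smaller target than the well-localized first anchor. This is the sense in which the assignment favors anchors that are good at both tasks at once. The paper also reweights the box-regression loss with the normalized metric; that part is not shown here.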

Citation

@inproceedings{feng2021tood,
    title={TOOD: Task-aligned One-stage Object Detection},
    author={Feng, Chengjian and Zhong, Yujie and Gao, Yu and Scott, Matthew R and Huang, Weilin},
    booktitle={ICCV},
    year={2021}
}

Results and Models

Backbone	Style	Anchor Type	Lr schd	Multi-scale Training	Mem (GB)	box AP	Config	Download
R-50	pytorch	Anchor-free	1x	N	4.1	42.4	config	model | log
R-50	pytorch	Anchor-based	1x	N	4.1	42.4	config	model | log
R-50	pytorch	Anchor-free	2x	Y	4.1	44.5	config	model | log
R-101	pytorch	Anchor-free	2x	Y	6.0	46.1	config	model | log
R-101-dcnv2	pytorch	Anchor-free	2x	Y	6.2	49.3	config	model | log
X-101-64x4d	pytorch	Anchor-free	2x	Y	10.2	47.6	config	model | log
X-101-64x4d-dcnv2	pytorch	Anchor-free	2x	Y			config	model | log

[1] 1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
[2] All results are obtained with a single model and without any test-time augmentation such as multi-scale testing or flipping.
[3] dcnv2 denotes Deformable Convolutional Networks v2.
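Footnote [1] gives the schedules in iterations. To relate them to epochs, the sketch below assumes a total batch size of 16 images and the 118,287 images of COCO train2017; neither the batch size nor the exact number of training images actually used is stated in this README, so treat the result as an approximation. The helper `iters_to_epochs` is hypothetical.

```python
COCO_TRAIN2017_IMAGES = 118_287  # images in COCO train2017 (before any filtering)
ASSUMED_BATCH_SIZE = 16          # assumption: total images per iteration across all GPUs


def iters_to_epochs(iters: int,
                    batch_size: int = ASSUMED_BATCH_SIZE,
                    dataset_size: int = COCO_TRAIN2017_IMAGES) -> float:
    """Approximate number of passes over the dataset for a given iteration count."""
    return iters * batch_size / dataset_size


for name, iters in [("1x", 90_000), ("2x", 180_000)]:
    print(f"{name}: {iters} iterations ~ {iters_to_epochs(iters):.1f} epochs")
# -> 1x: 90000 iterations ~ 12.2 epochs
# -> 2x: 180000 iterations ~ 24.3 epochs
```

Under these assumptions, 1x corresponds to roughly 12 epochs and 2x to roughly 24 epochs.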
