PointTAD [NeurIPS 2022]

This repo holds the code for the paper "PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points", accepted at NeurIPS 2022.


[Paper Link] [Zhihu]

News

[Jan. 10, 2023] Fixed some bugs and typos; updated best checkpoints for both multi-label benchmarks.

[Dec. 13, 2022] We release the codes and checkpoints on MultiTHUMOS and Charades.

Overview

This paper presents PointTAD, a query-based framework for multi-label temporal action detection that leverages a set of learnable query points to capture both boundary frames and action semantic keyframes for a finer action representation. The model takes RGB input only and forms a streamlined, end-to-end trainable framework for easy deployment. PointTAD surpasses previous multi-label TAD works by a large margin under detection-mAP and achieves comparable results under segmentation-mAP.
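The core idea of per-query learnable points can be illustrated with a minimal PyTorch sketch. This is a conceptual illustration only, not the released implementation: the module name, hyperparameters (num_queries, num_points, dim) and the simple interpolation/pooling scheme are all assumptions made for the example.

# Illustrative sketch of "learnable query points": each action query owns a set
# of learnable temporal points in [0, 1]; per-frame features are sampled at those
# points by interpolation and pooled into one feature vector per query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryPointSampler(nn.Module):
    def __init__(self, num_queries=48, num_points=32, dim=256):
        super().__init__()
        # Normalized temporal locations per query, refined during training.
        self.points = nn.Parameter(torch.rand(num_queries, num_points))
        self.proj = nn.Linear(dim, dim)

    def forward(self, frame_feats):
        # frame_feats: (B, T, C) per-frame features from the video backbone
        B, T, C = frame_feats.shape
        # Map normalized points [0, 1] to grid_sample coordinates [-1, 1].
        grid_t = self.points.clamp(0, 1) * 2 - 1                     # (Q, Ns)
        grid = torch.stack([grid_t, torch.zeros_like(grid_t)], -1)   # (Q, Ns, 2)
        grid = grid.unsqueeze(0).expand(B, -1, -1, -1)               # (B, Q, Ns, 2)
        feats = frame_feats.permute(0, 2, 1).unsqueeze(2)            # (B, C, 1, T)
        sampled = F.grid_sample(feats, grid, align_corners=True)     # (B, C, Q, Ns)
        query_feats = sampled.mean(-1).permute(0, 2, 1)              # (B, Q, C)
        return self.proj(query_feats)

sampler = QueryPointSampler()
video_features = torch.randn(2, 96, 256)    # dummy (batch, frames, channels)
print(sampler(video_features).shape)        # torch.Size([2, 48, 256])

In the full model these per-query features would go on to drive action classification and boundary decoding; the sketch stops at producing one feature vector per query.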

Dependencies

PyTorch 1.8.1 or higher, opencv-python, scipy, terminaltables, ruamel-yaml, ffmpeg

Run pip install -r requirements.txt to install the Python dependencies.
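Note that ffmpeg is a system binary rather than a pip package, so it has to be installed separately; for example, on a conda or Debian-based setup:

conda install -c conda-forge ffmpeg
# or, on Debian/Ubuntu:
sudo apt-get install ffmpeg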

Data Preparation

To prepare the RGB frames and corresponding annotations,

  • Clone the repository and cd PointTAD; mkdir data

  • For MultiTHUMOS:

    • Download the raw videos of THUMOS14 from here and put them into /data/thumos14_videos;
    • Extract the RGB frames from raw videos using util/extract_frames.py. The frames will be placed in /data/multithumos_frames;
    • You also need to generate multithumos_frames.json for the extracted frames with /util/generate_frame_dict.py and put the JSON file into the /datasets folder (a sketch of what this frame dictionary might contain is shown after this list).
  • For Charades:

    • Download the RGB frames of Charades from here, and place the frames at /data/charades_v1_rgb.
  • Replace the frame folder path (or image tensor path) in /datasets/dataset_cfg.yml with your local paths.
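As a rough illustration of the frame dictionary mentioned above (the repo's /util/generate_frame_dict.py is the authoritative version; the JSON schema, folder layout and output path below are assumptions for the example), a script along these lines would map each video to its number of extracted frames:

# Illustrative only: builds a {video_name: num_frames} mapping from the
# extracted-frame folders and writes it as JSON. The actual schema expected by
# PointTAD is defined by /util/generate_frame_dict.py in the repo.
import json
import os

frame_root = "data/multithumos_frames"      # assumed layout: <split>/<video>/<frames>
frame_dict = {}
for split in ("training", "testing"):
    split_dir = os.path.join(frame_root, split)
    for video in sorted(os.listdir(split_dir)):
        video_dir = os.path.join(split_dir, video)
        if os.path.isdir(video_dir):
            frame_dict[video] = len(os.listdir(video_dir))

with open("datasets/multithumos_frames.json", "w") as f:
    json.dump(frame_dict, f, indent=2)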

The structure of data/ is displayed as follows:

|-- data
|   |-- thumos14_videos
|   |   |-- training
|   |   |-- testing
|   |-- multithumos_frames
|   |   |-- training
|   |   |-- testing
|   |-- charades_v1_rgb

[Optional] Once you have the raw frames, you can convert them into tensors with /util/frames2tensor.py to speed up IO. By enabling --img_tensor in train.sh and test.sh, the model reads image tensors instead of raw frames.
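A minimal sketch of such a conversion is given below. It is illustrative only: /util/frames2tensor.py in the repo is the authoritative version, and the tensor layout, dtype and output paths it expects may differ from this example.

# Illustrative only: stacks the frames of one video into a single uint8 tensor
# and saves it with torch.save, so later epochs avoid per-image decoding.
import os
import cv2
import torch

def frames_to_tensor(frame_dir, out_path):
    frames = []
    for name in sorted(os.listdir(frame_dir)):
        img = cv2.imread(os.path.join(frame_dir, name))   # BGR, H x W x 3
        frames.append(torch.from_numpy(img))
    video = torch.stack(frames)                            # (T, H, W, 3), uint8
    torch.save(video, out_path)

# Example paths are assumptions; point them at your own frame folders.
frames_to_tensor("data/multithumos_frames/testing/video_test_0000004",
                 "data/multithumos_tensors/video_test_0000004.pt")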

Checkpoints

The best checkpoints are provided in the links below. Error bars for each benchmark are reported in the supplementary material of our paper.

Dataset       mAP@0.2   mAP@0.5   mAP@0.7   Avg-mAP   Checkpoint
MultiTHUMOS   39.70%    24.90%    12.04%    23.46%    Link
Charades      17.45%    13.46%     9.14%    12.13%    Link


Testing

Use test.sh to evaluate,

  • MultiTHUMOS:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos --eval --load multithumos_best.pth
  • Charades:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades --eval --load charades_best.pth
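The commands above assume 8 GPUs. With fewer GPUs, the same entry point should still work by lowering --nproc_per_node; a hedged single-GPU example (per-GPU memory requirements are not verified here):

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=11302 --use_env main.py --dataset multithumos --eval --load multithumos_best.pth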

Training

Use train.sh to train PointTAD,

  • MultiTHUMOS:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos
  • Charades:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades

Acknowledgements

The codebase is built on top of RTD-Net, DETR, Sparse R-CNN, AFSD and E2ETAD; we thank the authors for providing useful code.

Citations

If you find our work useful, please feel free to cite our paper:

@inproceedings{
	tan2022pointtad,
	title={Point{TAD}: Multi-Label Temporal Action Detection with Learnable Query Points},
	author={Jing Tan and Xiaotong Zhao and Xintian Shi and Bin Kang and Limin Wang},
	booktitle={Advances in Neural Information Processing Systems},
	editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
	year={2022},
	url={https://openreview.net/forum?id=_r8pCrHwq39}
}

Contacts

Jing Tan: [email protected]
