SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking
Jiaxin Li, Yan Ding, Hua-Liang Wei, Yutong Zhang and Wenxiang Lin
A simple baseline for multi-object tracking:
[SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking](https://arxiv.org/abs/2203.03985)
Joint detection and embedding (JDE) methods usually estimate the bounding boxes and embedding features of objects with a single network in multi-object tracking (MOT). Most JDE methods improve tracking accuracy by designing more efficient network structures. However, in the tracking stage they fuse target motion information and appearance information under the same rule, which can fail when a target is briefly lost or occluded. To mitigate this problem, we propose a new association matrix, called the EG matrix, which combines the embedding cosine distance and the GIoU distance of objects. We apply the EG matrix to 5 different state-of-the-art JDE trackers and achieve significant improvements in the IDF1, HOTA, and IDsw metrics, while increasing the tracking speed of these methods by about 20%. To further exploit the EG matrix, we introduce a simple, effective tracker named SimpleTrack, which decouples object detection from re-identification and fuses the BYTE association strategy with the EG matrix for tracking.
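For illustration, below is a minimal NumPy sketch of how an EG-style association matrix can be computed. The helper names and the fusion weight `alpha` are assumptions made for this sketch, not the exact implementation in this repo; see the tracker code and the paper for the precise fusion rule.

```python
import numpy as np

def cosine_distance(emb_a, emb_b):
    """Pairwise cosine distance between two sets of embedding row vectors."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def giou_distance(boxes_a, boxes_b):
    """Pairwise GIoU distance (1 - GIoU); boxes are (x1, y1, x2, y2)."""
    dist = np.zeros((len(boxes_a), len(boxes_b)))
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
            inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
            inter = inter_w * inter_h
            union = ((a[2] - a[0]) * (a[3] - a[1])
                     + (b[2] - b[0]) * (b[3] - b[1]) - inter)
            # area of the smallest box enclosing both a and b
            c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
                      * (max(a[3], b[3]) - min(a[1], b[1])))
            giou = inter / union - (c_area - union) / c_area
            dist[i, j] = 1.0 - giou  # lies in [0, 2]
    return dist

def eg_matrix(track_embs, track_boxes, det_embs, det_boxes, alpha=0.5):
    """Fuse the two distances into one association cost matrix."""
    return (alpha * cosine_distance(track_embs, det_embs)
            + (1.0 - alpha) * giou_distance(track_boxes, det_boxes))
```

The resulting cost matrix is then fed to a Hungarian-style assignment step, as is standard in JDE trackers.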
- (2022.08.03) Our paper is accepted by Sensors!
- (2022.03.08) Our method achieves state-of-the-art results among JDE-based methods, and the arXiv preprint of SimpleTrack is released.
- TODO
- Clone this repo; we'll refer to the cloned directory as ${SimpleTrack_ROOT}
- Install dependencies. We use Python 3.7 and PyTorch >= 1.7.0
conda create -n SimpleTrack python=3.7
conda activate SimpleTrack
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
cd ${SimpleTrack_ROOT}
pip install -r requirements.txt
- We use DCNv2 in our backbone network and more details can be found in their repo.
git clone https://github.com/CharlesShang/DCNv2
cd DCNv2
./make.sh
Data preparation follows FairMOT.
- CrowdHuman The CrowdHuman dataset can be downloaded from their official webpage. After downloading, you should prepare the data in the following structure:
crowdhuman
|——————images
| └——————train
| └——————val
└——————labels_with_ids
| └——————train(empty)
| └——————val(empty)
└——————annotation_train.odgt
└——————annotation_val.odgt
Then, you can change the paths in src/gen_labels_crowd.py and run:
cd src
python gen_labels_crowd.py
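For reference, a CrowdHuman .odgt file contains one JSON record per image, whose 'gtboxes' entries carry full-body boxes ('fbox', as [x, y, w, h]). The snippet below is a minimal sketch of the kind of conversion src/gen_labels_crowd.py performs; the paths and exact filtering rules here are placeholder assumptions.

```python
import json
import os
from PIL import Image

def convert_odgt(odgt_path, img_dir, label_dir):
    """Write one labels_with_ids .txt per image from a CrowdHuman .odgt file."""
    os.makedirs(label_dir, exist_ok=True)
    tid = 0  # running identity counter over the whole split
    with open(odgt_path) as f:
        for line in f:
            rec = json.loads(line)
            img_w, img_h = Image.open(
                os.path.join(img_dir, rec['ID'] + '.jpg')).size
            rows = []
            for gt in rec['gtboxes']:
                if gt['tag'] != 'person':  # skip masked/ignored regions
                    continue
                tid += 1
                x, y, w, h = gt['fbox']
                # FairMOT label line: class id x_center y_center width height
                rows.append('0 %d %.6f %.6f %.6f %.6f' % (
                    tid, (x + w / 2) / img_w, (y + h / 2) / img_h,
                    w / img_w, h / img_h))
            with open(os.path.join(label_dir, rec['ID'] + '.txt'), 'w') as out:
                out.write('\n'.join(rows) + '\n')
```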
- MIX We use the same training data as JDE in this part and we call it "MIX". Please refer to their DATA ZOO to download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
- 2DMOT15 and MOT20 2DMOT15 and MOT20 can be downloaded from the official webpage of MOT challenge. After downloading, you should prepare the data in the following structure:
MOT15
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train(empty)
MOT20
|——————images
| └——————train
| └——————test
└——————labels_with_ids
└——————train(empty)
Then, you can change the seq_root and label_root in src/gen_labels_15.py and src/gen_labels_20.py and run:
cd src
python gen_labels_15.py
python gen_labels_20.py
to generate the labels of 2DMOT15 and MOT20. The seqinfo.ini files of 2DMOT15 can be downloaded here: [Google], [Baidu, code: 8o0w].
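All of the label generators above write FairMOT-style labels_with_ids files: one .txt per image, one line per box, in the form `class id x_center y_center width height`, with coordinates normalized by the image size. For example:

```
0 497 0.482422 0.595486 0.036719 0.216667
```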
- Pretrained models
DLA-34 COCO pretrained model: DLA-34 official. HRNetV2 ImageNet pretrained model: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, you should put the pretrained models in the following structure:
${SimpleTrack_ROOT}
└——————models
└——————ctdet_coco_dla_2x.pth
└——————hrnetv2_w32_imagenet_pretrained.pth
└——————hrnetv2_w18_imagenet_pretrained.pth
- Baseline model
Our baseline SimpleTrack model (DLA-34 backbone) is pretrained on CrowdHuman for 60 epochs with the self-supervised learning approach and then trained on the MIX dataset for 30 epochs. The models can be downloaded here: crowdhuman_simple.pth [Google] [Baidu, code: simp], SimpleTrack.pth [Google] [Baidu, code: simp]. (This is the model with which we obtain 61.0 HOTA on the MOT17 test set.) After downloading, you should put the baseline model in the following structure:
${SimpleTrack_ROOT}
└——————models
└——————SimpleTrack.pth
└——————...
- Download the training data
- Change the dataset root directory 'root' in src/lib/cfg/data.json and 'data_dir' in src/lib/opts.py
- Pretrain on CrowdHuman and train on MIX:
sh experiments/crowdhuman_dla34.sh
sh experiments/mix_ft_ch_simpletrack.sh
- For the ablation study, we evaluate on the other half of the MOT17 training set; you can run:
cd src
python track_half.py mot --load_model ../models/crowdhuman_simple.pth --conf_thres 0.3 --val_mot17 True
If you use our pretrained model 'crowdhuman_simple.pth', you can get 72.5 MOTA and 78.5 IDF1.
- To get the txt results of the test set of MOT17, you can run:
cd src
python track.py mot --test_mot17 True --load_model ../models/SimpleTrack.pth --conf_thres 0.3
and send the txt files to the MOT challenge evaluation server to get the results. (You can get state-of-the-art results of 74+ MOTA on the MOT17 test set using the baseline model 'SimpleTrack.pth'.)
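The generated txt files follow the standard MOTChallenge submission format, one line per detection; the last three fields are unused in 2D tracking and set to -1:

```
<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, -1, -1, -1
1, 1, 794.27, 247.59, 71.25, 174.77, 0.9, -1, -1, -1
```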
Please see the tutorials folder.
Run TrackEval-master/scripts/run_mot_challenge.py. (The code has been adapted to evaluate on the half training set.)
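A typical invocation looks like the following; the GT and tracker folder paths are placeholders to adapt to your layout:

```
python TrackEval-master/scripts/run_mot_challenge.py \
    --BENCHMARK MOT17 \
    --SPLIT_TO_EVAL train \
    --GT_FOLDER <path/to/gt> \
    --TRACKERS_FOLDER <path/to/trackers> \
    --TRACKERS_TO_EVAL SimpleTrack \
    --METRICS HOTA CLEAR Identity
```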
A large part of the code is borrowed from ifzhang/FairMOT and ifzhang/ByteTrack. Thanks for their wonderful work.
@misc{li2022simpletrack,
doi = {10.48550/ARXIV.2203.03985},
url = {https://arxiv.org/abs/2203.03985},
author = {Li, Jiaxin and Ding, Yan and Wei, Hualiang},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
title = {SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
@Article{s22155863,
AUTHOR = {Li, Jiaxin and Ding, Yan and Wei, Hua-Liang and Zhang, Yutong and Lin, Wenxiang},
TITLE = {SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking},
JOURNAL = {Sensors},
VOLUME = {22},
YEAR = {2022},
NUMBER = {15},
ARTICLE-NUMBER = {5863},
URL = {https://www.mdpi.com/1424-8220/22/15/5863},
PubMedID = {35957422},
ISSN = {1424-8220},
ABSTRACT = {Joint detection and embedding (JDE) methods usually fuse the target motion information and appearance information as the data association matrix, which could fail when the target is briefly lost or blocked in multi-object tracking (MOT). In this paper, we aim to solve this problem by proposing a novel association matrix, the Embedding and GioU (EG) matrix, which combines the embedding cosine distance and GioU distance of objects. To improve the performance of data association, we develop a simple, effective, bottom-up fusion tracker for re-identity features, named SimpleTrack, and propose a new tracking strategy which can mitigate the loss of detection targets. To show the effectiveness of the proposed method, experiments are carried out using five different state-of-the-art JDE-based methods. The results show that by simply replacing the original association matrix with our EG matrix, we can achieve significant improvements in IDF1, HOTA and IDsw metrics, and increase the tracking speed of these methods by around 20%. In addition, our SimpleTrack has the best data association capability among the JDE-based methods, e.g., 61.6 HOTA and 76.3 IDF1, on the test set of MOT17 with 23 FPS running speed on a single GTX2080Ti GPU.},
DOI = {10.3390/s22155863}
}