Waymo Challenge: Object Detection / Tracking in RGB images*
Frank Gabel¹, Jens Settelmeier²
¹ Heidelberg University
² KTH Royal Institute of Technology, Stockholm
* Work done during Corona lockdown
Summary
The Waymo Open Dataset is one of the largest and most diverse autonomous driving datasets ever released. It consists of HD images from five cameras (front, front-left, front-right, left, right), LiDAR scans, and the associated 2D/3D bounding boxes. The frames are temporally ordered, similar to video. We tackle 2D object detection by deploying DeepSORT with an EfficientDet detection backbone.
This repo describes our approach to the Waymo Challenge. The challenge required building a model that detects vehicles (basically anything with wheels), cyclists, and pedestrians.
Because the data have a strong temporal component, our general approach was to employ a tracking algorithm with an aggressive association metric, so that object identities carry over through occlusions, shape changes, etc.
We used DeepSORT to keep track of objects across frames. Our detection backbone was an EfficientDet, which offers state-of-the-art (SOTA) performance in real time (original paper: https://arxiv.org/abs/1911.09070).
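To make the idea of an association metric concrete, here is a minimal sketch of matching existing tracks to new detections. It is not the code in this repo: DeepSORT itself combines Kalman-filter motion gating with an appearance (cosine) distance and solves the assignment with the Hungarian algorithm, whereas this sketch swaps in a simpler blended IoU/cosine cost with greedy matching. The function names, the dict keys ("box", "feature"), and the `weight` and `max_cost` parameters are assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity of two appearance embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def associate(tracks, detections, weight=0.5, max_cost=0.7):
    """Greedily match tracks to detections by a blended cost (hypothetical helper).

    tracks / detections: lists of dicts with "box" and "feature" keys.
    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    candidates = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            cost = (weight * (1.0 - iou(track["box"], det["box"]))
                    + (1.0 - weight) * cosine_distance(track["feature"], det["feature"]))
            if cost <= max_cost:  # a larger max_cost means a more permissive association
                candidates.append((cost, ti, di))
    candidates.sort()  # cheapest pairs first
    matches, used_t, used_d = [], set(), set()
    for _, ti, di in candidates:
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d
```

Unmatched detections would typically start new tracks, and unmatched tracks would be kept alive for a few frames so that an occluded object can be picked up again; how long to keep them is a tuning choice not specified here.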
pip install -r requirements.txt
* pytorch==1.1.0 or 1.2.0
* tqdm
* opencv-python
* scipy
* scikit-learn (imported as sklearn)
* matplotlib
* pillow
* tensorboardX
Dataset and Weights
- Download: Waymo Open Dataset
- Pretrained weights: Google Drive
Project
|--- EfficientDet-DeepSORT-Tracker
|    |--- main.py
|    |--- train
|    |--- train_unsupervised.py
|    |--- ...
|
|--- data
     |--- training
     |    |--- xxxxxxxxxxxx_0000.tfrecord
     |    |--- xxxxxxxxxxxx_0001.tfrecord
     |--- test
     |    |--- yyyyyyyyyyyy_0000.tfrecord
     |    |--- yyyyyyyyyyyy_0001.tfrecord
     |--- ...
Then pass --data_path='../data' when running the scripts.
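As a quick sanity check of the layout above, the following sketch lists the .tfrecord files it finds under a given split directory. The helper name `list_tfrecords` is hypothetical and not part of this repo; it only assumes the `training` and `test` subdirectories shown in the tree.

```python
from pathlib import Path


def list_tfrecords(data_path, split):
    """Return the sorted .tfrecord files under <data_path>/<split> (hypothetical helper)."""
    split_dir = Path(data_path) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"expected a directory at {split_dir}")
    return sorted(split_dir.glob("*.tfrecord"))


if __name__ == "__main__":
    for split in ("training", "test"):
        try:
            files = list_tfrecords("../data", split)
            print(f"{split}: {len(files)} tfrecord file(s)")
        except FileNotFoundError as err:
            print(f"{split}: {err}")
```

Run from inside the code directory, this should report a non-zero file count for both splits if the data were unpacked as described.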
Hardware
This source code was mainly tested on an NVIDIA RTX 2070 and a GTX 1080 Ti.
More examples
Run
Using EfficientDet backbone
python run_waymo_deepsort_efficientdet.py --gpu $GPU_TO_USE --p_semi 1.0 --data_path='../data'
Semi-supervised
python run_waymo_deepsort_yolov4.py --gpu $GPU_TO_USE --p_semi 0.5 --data_path='../data'
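For orientation, the three flags used in the commands above could be parsed roughly as follows. This is a sketch, not the parser in the actual scripts: the defaults, the help strings, and the restriction of --p_semi to the range [0, 1] are assumptions inferred only from the example values 1.0 and 0.5.

```python
import argparse


def unit_interval(text):
    """Parse a float and require it to lie in [0, 1] (assumed range for --p_semi)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"{text} is not in [0, 1]")
    return value


def build_parser():
    """Build a parser mirroring the flags shown in the run commands (hypothetical)."""
    parser = argparse.ArgumentParser(description="Run detection and tracking on Waymo data")
    # Kept as a string so that values such as "0" or "0,1" both pass through unchanged.
    parser.add_argument("--gpu", type=str, default="0", help="GPU to use")
    parser.add_argument("--p_semi", type=unit_interval, default=1.0,
                        help="semi-supervision parameter (assumed to lie in [0, 1])")
    parser.add_argument("--data_path", type=str, default="../data",
                        help="directory containing the training/ and test/ folders")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.gpu, args.p_semi, args.data_path)
```

Validating --p_semi at parse time means a mistyped value fails immediately rather than after the data have been loaded.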
Train
Training this model entails first training your backbone detector and then training DeepSORT.
Training EfficientDet backbone
python train.py --gpu $GPU_TO_USE --data_path='../data'
Training DeepSORT
python deep_sort/deep/train.py --gpu $GPU_TO_USE --data_path='../data'
If you do not want to train the models yourself, you can use our pretrained weights: place them in the base directory of the repository and you are good to go.
Model | Weights file | GPU Mem (MB) | FPS | Extreme FPS (batch size 32) | mAP | mAP 0.1:0.9 |
---|---|---|---|---|---|---|
EfficientDet-D7 | efficientdet-weights.pth | ~10000 | 5 | - | 52.2 | |
YOLOv4 | yolov4.weights | ~9000 | 7 | - | 54.4 | |
Appreciate the great work from the following repositories: