SSD: Single Shot MultiBox Detector | a PyTorch Model for Object Detection | VOC , COCO | Custom Object Detection
This repo contains code for Single Shot Multibox Detector (SSD) with custom backbone networks. The authors' original implementation can be found here.
- Pascal Visual Object Classes (VOC) data from the years 2007 and 2012.
- COCO.
- Custom Dataset.
VOC dataset contains images with twenty different types of objects.
{'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'}
Each image can contain one or more ground truth objects.
Each object is represented by –
-
a bounding box in absolute boundary coordinates
-
a label (one of the object types mentioned above)
-
a perceived detection difficulty (either
0
, meaning not difficult, or1
, meaning difficult)
Specfically, you will need to download the following VOC datasets –
-
2007 trainval (460MB)
-
2012 trainval (2GB)
-
2007 test (451MB)
We will need three inputs.
- For SSD300 variant, the images would need to be sized at
300, 300
pixels and in the RGB format. - PyTorch follows the NCHW convention, which means the channels dimension (C) must precede the size dimensions(1, 3, 300, 300).
Therefore, images fed to the model must be a Float
tensor of dimensions N, 3, 300, 300
, and must be normalized by the aforesaid mean and standard deviation. N
is the batch size.
For each image, the bounding boxes of the ground truth objects follows (x_min, y_min, x_max, y_max) format`.
- In config.json change the paths.
- "backbone_network" : "MobileNetV2" or "MobileNetV1"
- For training run
python train.py config.json
python inference.py image_path checkpoint