ImageAI : Custom Detection Model Training

ImageAI provides a simple and powerful approach to training custom object detection models using the YOLOv3 architecture, which you can then load into the imageai.Detection.Custom.CustomObjectDetection class. This allows you to train your own model on any set of images containing the objects you are interested in. The training process generates a JSON file that records the object names in your image dataset and the detection anchors, and it saves a number of model files along the way. To help you choose the best model for your custom object detection task, an evaluateModel() function is provided that computes the mAP of your saved models using the IoU and Non-maximum Suppression values you specify. You can then perform custom object detection using the chosen model and the generated JSON file.

Preparing your custom dataset

To train a custom detection model, you need to prepare the images you want to use to train the model. You will prepare the images as follows:

Decide the type of object(s) you want to detect and collect about 200 (minimum recommendation) or more pictures of each object.
Once you have collected the images, you need to annotate the object(s) in the images. ImageAI uses the Pascal VOC format for image annotation. You can generate these annotations for your images using the easy-to-use LabelImg image annotation tool, available for Windows, Linux and macOS. To install the annotation tool, see: https://github.com/tzutalin/labelImg
When you are done annotating your images, an annotation XML file will have been generated for each image in your dataset. Each annotation XML file describes all of the annotated objects in its image. For example, if your image names are image(1).jpg, image(2).jpg, image(3).jpg through image(z).jpg, the corresponding annotations will be image(1).xml, image(2).xml, image(3).xml through image(z).xml.
Once you have the annotations for all your images, create a folder for your dataset (e.g. headsets) and, in this parent folder, create child folders named train and validation.
In the train folder, create images and annotations sub-folders. Put about 70-80% of your dataset of each object's images in the images folder and put the corresponding annotations for these images in the annotations folder.
In the validation folder, create images and annotations sub-folders. Put the rest of your dataset images in the images folder and put the corresponding annotations for these images in the annotations folder.
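The folder steps above can be scripted. The following is a minimal sketch, not part of ImageAI: the function name `split_dataset`, its parameters and the 80% default are assumptions made for illustration. It splits the whole dataset at random rather than per object, so with several object types you may want to run it once per object to keep the 70-80% ratio for each.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_images, source_annotations, dataset_dir,
                  train_fraction=0.8, seed=0):
    """Copy image/annotation pairs into train and validation sub-folders.

    Returns (number of training pairs, number of validation pairs).
    """
    source_images = Path(source_images)
    source_annotations = Path(source_annotations)
    dataset_dir = Path(dataset_dir)

    # Keep only images that have a matching Pascal VOC XML file.
    pairs = []
    for image_path in sorted(source_images.iterdir()):
        if image_path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        xml_path = source_annotations / (image_path.stem + ".xml")
        if xml_path.is_file():
            pairs.append((image_path, xml_path))

    # Shuffle reproducibly, then cut the list at the requested fraction.
    random.Random(seed).shuffle(pairs)
    split_at = int(len(pairs) * train_fraction)
    subsets = {"train": pairs[:split_at], "validation": pairs[split_at:]}

    for subset_name, subset_pairs in subsets.items():
        images_dir = dataset_dir / subset_name / "images"
        annotations_dir = dataset_dir / subset_name / "annotations"
        images_dir.mkdir(parents=True, exist_ok=True)
        annotations_dir.mkdir(parents=True, exist_ok=True)
        for image_path, xml_path in subset_pairs:
            shutil.copy2(image_path, images_dir / image_path.name)
            shutil.copy2(xml_path, annotations_dir / xml_path.name)

    return len(subsets["train"]), len(subsets["validation"])


# Example call (the paths are placeholders):
# split_dataset("raw/images", "raw/annotations", "headsets")
```

Files are copied rather than moved, so the original images and annotations stay untouched if the split needs to be redone.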

Once you have done this, the structure of your image dataset folder should look like below:

>> train    >> images       >> img_1.jpg  (shows Object_1)
            >> images       >> img_2.jpg  (shows Object_2)
            >> images       >> img_3.jpg  (shows Object_1, Object_3 and Object_n)
            >> annotations  >> img_1.xml  (describes Object_1)
            >> annotations  >> img_2.xml  (describes Object_2)
            >> annotations  >> img_3.xml  (describes Object_1, Object_3 and Object_n)

>> validation   >> images       >> img_151.jpg (shows Object_1, Object_3 and Object_n)
                >> images       >> img_152.jpg (shows Object_2)
                >> images       >> img_153.jpg (shows Object_1)
                >> annotations  >> img_151.xml (describes Object_1, Object_3 and Object_n)
                >> annotations  >> img_152.xml (describes Object_2)
                >> annotations  >> img_153.xml (describes Object_1)
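Before training, it can help to confirm that every image in this layout has an annotation with the same base name, and vice versa. The sketch below is a hypothetical helper (`find_unpaired_files` is not an ImageAI function) that assumes exactly the train/validation, images/annotations layout shown above.

```python
from pathlib import Path


def find_unpaired_files(dataset_dir):
    """Return file stems that lack either an image or an annotation.

    The result maps "train" and "validation" to a sorted list of stems.
    """
    problems = {}
    for subset_name in ("train", "validation"):
        images_dir = Path(dataset_dir) / subset_name / "images"
        annotations_dir = Path(dataset_dir) / subset_name / "annotations"
        image_stems = {p.stem for p in images_dir.glob("*") if p.is_file()}
        xml_stems = {p.stem for p in annotations_dir.glob("*.xml")}
        # Symmetric difference: stems present on one side only.
        problems[subset_name] = sorted(image_stems ^ xml_stems)
    return problems


# Example call (the folder name is a placeholder):
# print(find_unpaired_files("hololens"))
```

An empty list for both subsets means every image has a matching annotation file.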

You can train your custom detection model completely from scratch or use transfer learning (recommended for better accuracy) from a pre-trained YOLOv3 model. Also, we have provided a sample annotated Hololens and Headsets (Hololens and Oculus) dataset for you to train with. Download the pre-trained YOLOv3 model and the sample datasets in the link below.

https://github.com/OlafenwaMoses/ImageAI/releases/tag/essential-v4

Training on your custom dataset

Before you start training your custom detection model, kindly take note of the following:

The default batch_size is 4. If you are training with Google Colab, this will be fine. However, I advise using a more powerful GPU than the K80 offered by Colab, as a higher batch_size (8, 16) tends to improve the accuracy of your detection model.
If you experience the '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string' error in Google Colab, it is due to a bug in TensorFlow. You can solve this by installing TensorFlow GPU 1.13.1:
```
 pip3 install tensorflow-gpu==1.13.1
```

Then your training code goes as follows:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
trainer.setTrainConfig(object_names_array=["hololens"], batch_size=4, num_experiments=200, train_from_pretrained_model="pretrained-yolov3.h5")
# In the line above, when training to detect multiple objects,
# set object_names_array=["object1", "object2", "object3", ..., "objectz"]
trainer.trainModel()

Yes! Just 6 lines of code and you can train object detection models on your custom dataset. Now let's take a look at how the code above works.

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")

In the first line, we import the ImageAI detection model training class. We then create the model trainer in the second line, set the network type in the third line, and set the path to the image dataset we want to train the network on in the fourth line.

trainer.setTrainConfig(object_names_array=["hololens"], batch_size=4, num_experiments=200, train_from_pretrained_model="pretrained-yolov3.h5")

In the line above, we configured our detection model trainer. The parameters we passed to the function are as follows:

object_names_array : this is an array containing the names of the objects in our dataset
batch_size : this is to state the batch size for the training
num_experiments : this is to state the number of times the network will train over all the training images, which is also called epochs
train_from_pretrained_model(optional) : this is to train using transfer learning from a pre-trained YOLOv3 model
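If you are unsure which names to put in object_names_array, you can read them from your annotations. The sketch below assumes the standard Pascal VOC layout, where each `<object>` element contains a `<name>` element. The helper `collect_object_names` is hypothetical and not part of ImageAI, and the sorted order it returns is this sketch's own choice.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_object_names(annotations_dir):
    """Return the sorted, de-duplicated object names found in VOC XML files."""
    names = set()
    for xml_path in Path(annotations_dir).glob("*.xml"):
        root = ET.parse(xml_path).getroot()
        # Each annotated object is an <object> element with a <name> child.
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                names.add(name.strip())
    return sorted(names)


# Example call (the path is a placeholder):
# object_names = collect_object_names("hololens/train/annotations")
# trainer.setTrainConfig(object_names_array=object_names, batch_size=4,
#                        num_experiments=200,
#                        train_from_pretrained_model="pretrained-yolov3.h5")
```

This also surfaces labelling typos, since a misspelled label shows up as an unexpected extra name.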

trainer.trainModel()

When you start the training, you should see something like this in the console:

Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in  hololens/json/detection_config.json
Training on: 	['hololens']
Training with Batch Size:  4
Number of Experiments:  200

Epoch 1/200
480/480 [==============================] - 395s 823ms/step - loss: 36.9000 - yolo_layer_1_loss: 3.2970 - yolo_layer_2_loss: 9.4923 - yolo_layer_3_loss: 24.1107 - val_loss: 15.6321 - val_yolo_layer_1_loss: 2.0275 - val_yolo_layer_2_loss: 6.4191 - val_yolo_layer_3_loss: 7.1856
Epoch 2/200
480/480 [==============================] - 293s 610ms/step - loss: 11.9330 - yolo_layer_1_loss: 1.3968 - yolo_layer_2_loss: 4.2894 - yolo_layer_3_loss: 6.2468 - val_loss: 7.9868 - val_yolo_layer_1_loss: 1.7054 - val_yolo_layer_2_loss: 2.9156 - val_yolo_layer_3_loss: 3.3657
Epoch 3/200
480/480 [==============================] - 293s 610ms/step - loss: 7.1228 - yolo_layer_1_loss: 1.0583 - yolo_layer_2_loss: 2.2863 - yolo_layer_3_loss: 3.7782 - val_loss: 6.4964 - val_yolo_layer_1_loss: 1.1391 - val_yolo_layer_2_loss: 2.2058 - val_yolo_layer_3_loss: 3.1514
Epoch 4/200
480/480 [==============================] - 297s 618ms/step - loss: 5.5802 - yolo_layer_1_loss: 0.9742 - yolo_layer_2_loss: 1.8916 - yolo_layer_3_loss: 2.7144 - val_loss: 6.4275 - val_yolo_layer_1_loss: 1.6153 - val_yolo_layer_2_loss: 2.1203 - val_yolo_layer_3_loss: 2.6919
Epoch 5/200
480/480 [==============================] - 295s 615ms/step - loss: 4.8717 - yolo_layer_1_loss: 0.7568 - yolo_layer_2_loss: 1.6641 - yolo_layer_3_loss: 2.4508 - val_loss: 6.3723 - val_yolo_layer_1_loss: 1.6434 - val_yolo_layer_2_loss: 2.1188 - val_yolo_layer_3_loss: 2.6101
Epoch 6/200
480/480 [==============================] - 300s 624ms/step - loss: 4.7989 - yolo_layer_1_loss: 0.8708 - yolo_layer_2_loss: 1.6683 - yolo_layer_3_loss: 2.2598 - val_loss: 5.8672 - val_yolo_layer_1_loss: 1.2349 - val_yolo_layer_2_loss: 2.0504 - val_yolo_layer_3_loss: 2.5820
Epoch 7/200

Let us explain the details shown above:

Using TensorFlow backend.
Generating anchor boxes for training images and annotation...
Average IOU for 9 anchors: 0.78
Anchor Boxes generated.
Detection configuration saved in  hololens/json/detection_config.json
Training on: 	['hololens']
Training with Batch Size:  4
Number of Experiments:  200

The above details signify the following:

ImageAI autogenerates the best match detection anchor boxes for your image dataset.
The anchor boxes and the object names mapping are saved in the json/detection_config.json file inside the image dataset folder. Please note that for every new training you start, a new detection_config.json file is generated and is only compatible with the models saved during that training.

Epoch 1/200
480/480 [==============================] - 395s 823ms/step - loss: 36.9000 - yolo_layer_1_loss: 3.2970 - yolo_layer_2_loss: 9.4923 - yolo_layer_3_loss: 24.1107 - val_loss: 15.6321 - val_yolo_layer_1_loss: 2.0275 - val_yolo_layer_2_loss: 6.4191 - val_yolo_layer_3_loss: 7.1856
Epoch 2/200
480/480 [==============================] - 293s 610ms/step - loss: 11.9330 - yolo_layer_1_loss: 1.3968 - yolo_layer_2_loss: 4.2894 - yolo_layer_3_loss: 6.2468 - val_loss: 7.9868 - val_yolo_layer_1_loss: 1.7054 - val_yolo_layer_2_loss: 2.9156 - val_yolo_layer_3_loss: 3.3657
Epoch 3/200
480/480 [==============================] - 293s 610ms/step - loss: 7.1228 - yolo_layer_1_loss: 1.0583 - yolo_layer_2_loss: 2.2863 - yolo_layer_3_loss: 3.7782 - val_loss: 6.4964 - val_yolo_layer_1_loss: 1.1391 - val_yolo_layer_2_loss: 2.2058 - val_yolo_layer_3_loss: 3.1514
Epoch 4/200
480/480 [==============================] - 297s 618ms/step - loss: 5.5802 - yolo_layer_1_loss: 0.9742 - yolo_layer_2_loss: 1.8916 - yolo_layer_3_loss: 2.7144 - val_loss: 6.4275 - val_yolo_layer_1_loss: 1.6153 - val_yolo_layer_2_loss: 2.1203 - val_yolo_layer_3_loss: 2.6919
Epoch 5/200
480/480 [==============================] - 295s 615ms/step - loss: 4.8717 - yolo_layer_1_loss: 0.7568 - yolo_layer_2_loss: 1.6641 - yolo_layer_3_loss: 2.4508 - val_loss: 6.3723 - val_yolo_layer_1_loss: 1.6434 - val_yolo_layer_2_loss: 2.1188 - val_yolo_layer_3_loss: 2.6101
Epoch 6/200
480/480 [==============================] - 300s 624ms/step - loss: 4.7989 - yolo_layer_1_loss: 0.8708 - yolo_layer_2_loss: 1.6683 - yolo_layer_3_loss: 2.2598 - val_loss: 5.8672 - val_yolo_layer_1_loss: 1.2349 - val_yolo_layer_2_loss: 2.0504 - val_yolo_layer_3_loss: 2.5820
Epoch 7/200

The above shows the progress of the training.
For each experiment (epoch), the total training loss (e.g. loss: 4.7989) and the total validation loss (e.g. val_loss: 5.8672) are reported.
For each drop in the loss after an experiment, a model is saved in the hololens/models folder. The lower the loss, the better the model.
Tensorboard report file for the training will be saved in the hololens/logs folder.
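If you have saved the console output to a text file, the per-epoch losses can be pulled out with a small parser. This is a minimal sketch that assumes the log looks exactly like the sample above. The function `parse_training_log` is a hypothetical helper, not part of ImageAI.

```python
import re

EPOCH_RE = re.compile(r"^Epoch (\d+)/(\d+)")
# The leading " - " keeps these from matching yolo_layer_N_loss fields.
LOSS_RE = re.compile(r" - loss: ([\d.]+)")
VAL_LOSS_RE = re.compile(r" - val_loss: ([\d.]+)")


def parse_training_log(text):
    """Return a list of (epoch, loss, val_loss) tuples from console output."""
    results = []
    current_epoch = None
    for line in text.splitlines():
        epoch_match = EPOCH_RE.match(line.strip())
        if epoch_match:
            current_epoch = int(epoch_match.group(1))
            continue
        loss = LOSS_RE.search(line)
        val_loss = VAL_LOSS_RE.search(line)
        # Only finished epochs have both values on their progress line.
        if current_epoch is not None and loss and val_loss:
            results.append(
                (current_epoch, float(loss.group(1)), float(val_loss.group(1)))
            )
            current_epoch = None
    return results
```

An epoch that is still running, such as the trailing "Epoch 7/200" line in the sample, has no progress line yet and is skipped.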

Once you are done training, you can visit the link below for performing object detection with your custom detection model and detection_config.json file.

Detection/Custom/CUSTOMDETECTION.md
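To decide which saved model to use, one option is to pick the file with the lowest loss in its name. The sketch below assumes the naming pattern seen in this document's examples (for instance detection_model-ex-60--loss-2.76.h5). The helper `lowest_loss_model` is hypothetical and not part of ImageAI.

```python
import re
from pathlib import Path

# Assumed pattern: detection_model-ex-<experiment>--loss-<loss>.h5
MODEL_NAME_RE = re.compile(r"detection_model-ex-(\d+)--loss-(\d+(?:\.\d+)?)\.h5$")


def lowest_loss_model(models_dir):
    """Return the Path of the model file with the lowest loss, or None."""
    best = None
    for path in Path(models_dir).glob("*.h5"):
        match = MODEL_NAME_RE.search(path.name)
        if not match:
            continue  # Skip files that do not follow the assumed pattern.
        loss = float(match.group(2))
        if best is None or loss < best[0]:
            best = (loss, path)
    return best[1] if best else None


# Example call (the path is a placeholder):
# print(lowest_loss_model("hololens/models"))
```

The lowest loss does not always give the highest mAP, so it is worth cross-checking the choice against the evaluation described in the next section.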

Evaluating your saved detection models' mAP

After training on your custom dataset, you can evaluate the mAP of your saved models by specifying your desired IoU and Non-maximum suppression values. See the details below:

Single Model Evaluation: To evaluate a single model, simply use the example code below with the path to your dataset directory, the model file and the detection_config.json file saved during the training. In the example, we used an object_threshold of 0.3 (percentage_score >= 30%), an IoU of 0.5 and a Non-maximum suppression value of 0.5.

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
metrics = trainer.evaluateModel(model_path="detection_model-ex-60--loss-2.76.h5", json_path="detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)

Note that the trainer.evaluateModel method prints the metrics to standard output as shown below, and also returns a list of dicts containing all the information that is displayed.

Sample Result:

Model File:  hololens_detection_model-ex-09--loss-4.01.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9613
mAP: 0.9613
===============================

The returned metrics look like this:

[{
    'average_precision': {'hololens': 0.9613334437735249},
    'map': 0.9613334437735249,
    'model_file': 'hololens_detection_model-ex-09--loss-4.01.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
}]
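For readers new to the metric, the iou_threshold used above refers to intersection over union, the overlap between a predicted box and a ground-truth box. The sketch below shows the usual definition for boxes given as (xmin, ymin, xmax, ymax), the corner convention used by Pascal VOC annotations. It is an illustration only and not ImageAI's internal implementation.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the overlapping region (zero if the boxes are apart).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width: 50 / 150 = one third.
print(intersection_over_union((0, 0, 10, 10), (5, 0, 15, 10)))
```

In mAP evaluation, a detection is conventionally counted as correct only when its IoU with a ground-truth box reaches the chosen threshold, which is 0.5 in this example.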

Multi Model Evaluation: To evaluate all your saved models, simply pass in the path to the folder containing the models as the model_path, as seen in the example below:

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
metrics = trainer.evaluateModel(model_path="hololens/models", json_path="hololens/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)

Sample Result:

Model File:  hololens/models/detection_model-ex-07--loss-4.42.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9231
mAP: 0.9231
===============================
Model File:  hololens/models/detection_model-ex-10--loss-3.95.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9725
mAP: 0.9725
===============================
Model File:  hololens/models/detection_model-ex-05--loss-5.26.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9204
mAP: 0.9204
===============================
Model File:  hololens/models/detection_model-ex-03--loss-6.44.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.8120
mAP: 0.8120
===============================
Model File:  hololens/models/detection_model-ex-18--loss-2.96.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9431
mAP: 0.9431
===============================
Model File:  hololens/models/detection_model-ex-17--loss-3.10.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9404
mAP: 0.9404
===============================
Model File:  hololens/models/detection_model-ex-08--loss-4.16.h5 
Using IoU :  0.5
Using Object Threshold :  0.3
Using Non-Maximum Suppression :  0.5
hololens: 0.9725
mAP: 0.9725
===============================

The returned metrics look like this:

[{
    'average_precision': {'hololens': 0.9231334437735249},
    'map': 0.9231334437735249,
    'model_file': 'hololens/models/detection_model-ex-07--loss-4.42.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.9725334437735249},
    'map': 0.97251334437735249,
    'model_file': 'hololens/models/detection_model-ex-10--loss-3.95.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.92041334437735249},
    'map': 0.92041334437735249,
    'model_file': 'hololens/models/detection_model-ex-05--loss-5.26.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.81201334437735249},
    'map': 0.81201334437735249,
    'model_file': 'hololens/models/detection_model-ex-03--loss-6.44.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.94311334437735249},
    'map': 0.94311334437735249,
    'model_file': 'hololens/models/detection_model-ex-18--loss-2.96.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.94041334437735249},
    'map': 0.94041334437735249,
    'model_file': 'hololens/models/detection_model-ex-17--loss-3.10.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
},
{
    'average_precision': {'hololens': 0.97251334437735249},
    'map': 0.97251334437735249,
    'model_file': 'hololens/models/detection_model-ex-08--loss-4.16.h5',
    'using_iou': 0.5,
    'using_non_maximum_suppression': 0.5,
    'using_object_threshold': 0.3
}
]
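Because evaluateModel returns a list of dicts with 'map' and 'model_file' keys, as shown above, choosing the best model can be done in plain Python. The sketch below uses a hypothetical helper `rank_by_map` and an abridged, rounded copy of the sample metrics for illustration.

```python
def rank_by_map(metrics):
    """Sort evaluateModel-style result dicts from highest to lowest mAP."""
    return sorted(metrics, key=lambda entry: entry["map"], reverse=True)


# Abridged, rounded values taken from the sample result above.
sample_metrics = [
    {"model_file": "hololens/models/detection_model-ex-07--loss-4.42.h5", "map": 0.9231},
    {"model_file": "hololens/models/detection_model-ex-10--loss-3.95.h5", "map": 0.9725},
    {"model_file": "hololens/models/detection_model-ex-03--loss-6.44.h5", "map": 0.8120},
]

best = rank_by_map(sample_metrics)[0]
print(best["model_file"], best["map"])
```

When two models tie on mAP, the sort is stable, so the one listed first in the evaluation output is kept first.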

>> Documentation

We have provided full documentation for all ImageAI classes and functions in 3 major languages. Find links below:

Documentation - English Version https://imageai.readthedocs.io
Documentation - Chinese Version https://imageai-cn.readthedocs.io
Documentation - French Version https://imageai-fr.readthedocs.io
