Detecting lost objects with Deep Learning

This repository contains my experiments using Deep Learning on the VDAO database.

VDAO Database

VDAO is a video database containing annotated videos in a cluttered industrial environment. The videos were captured using a camera on a moving platform.

The complete database comprises a total of 6 multi-object, 56 single-object and 4 no-object (for reference purposes) footages, acquired with two different cameras under two different lighting conditions, yielding approximately 8.2 hours of video. A total of 77 videos form the VDAO base. These videos are grouped into tables according to configurations such as the number of lost objects and the lighting conditions.

The paper presenting the database can be found here. You can download the database videos and related annotation files from the official VDAO database webpage.

Or, if you prefer, you can download the videos and annotations directly from here; the links simply point to the official distribution.

A set of useful tools for working with the VDAO database is available in the VDAO_Access Project.

The images below show examples of reference frames (no object) and target frames (with objects to be detected).


Examples of the VDAO dataset reference frames (no objects)

Examples of the VDAO dataset target frames (objects manually annotated with bounding boxes)

VDAO Alignment

As mentioned before, the VDAO database has 77 videos divided into 10 tables. Each table (except table 01) has one reference video and multiple videos containing lost objects. For some applications it is necessary to temporally align the target videos (with objects) with the reference videos.

Part of this project focused on temporally aligning the target videos to their corresponding reference ones. The frame correspondences can be found here.
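Once you have a frame-correspondence table, using it amounts to mapping each target-frame index to its matching reference-frame index. The sketch below illustrates this; the CSV layout assumed here ("target_frame,reference_frame" per row) is hypothetical, so check the format of the actual correspondence files before relying on it.

```python
# Minimal sketch of applying a frame-correspondence table to temporally
# align a target video with its reference video. The CSV layout assumed
# here ("target_frame,reference_frame" per row) is an assumption for
# illustration only.
import csv


def build_mapping(rows):
    """Map target frame index -> reference frame index from (t, r) pairs."""
    return {int(t): int(r) for t, r in rows}


def load_correspondences(path):
    """Read the (assumed) CSV correspondence file into a mapping."""
    with open(path, newline="") as f:
        return build_mapping(csv.reader(f))


def aligned_pairs(mapping, frame_indices):
    """Yield (target_frame, reference_frame) for frames with a known match."""
    for t in frame_indices:
        if t in mapping:
            yield t, mapping[t]
```

With such a mapping in hand, each target frame can be compared directly against its temporally aligned reference frame.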

YOLO

Yolo (You Only Look Once) is a real-time object detection and classification system that obtained excellent results on the Pascal VOC dataset. So far, Yolo has two versions: Yolo V1 and Yolo V2, also referred to as Yolo 9000. Click on the image below to watch Yolo 9000's promo video.
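As the references below explain, Yolo V1 divides the input image into an S×S grid, and each cell predicts box centers relative to the cell and box sizes relative to the whole image. This is an illustrative sketch (not the authors' code) of mapping one such cell prediction back to pixel coordinates, using the S=7 grid and 448×448 input size from the Yolo V1 paper:

```python
# Illustrative sketch of decoding one YOLO v1-style cell prediction:
# (x, y) is the box center relative to its grid cell, (w, h) is the box
# size relative to the whole image. S=7 and the 448x448 input size follow
# the YOLO v1 paper; everything else here is for illustration only.
def cell_to_image_box(row, col, x, y, w, h, S=7, img_w=448, img_h=448):
    """Return (left, top, width, height) in pixels for one cell prediction."""
    center_x = (col + x) / S * img_w
    center_y = (row + y) / S * img_h
    box_w, box_h = w * img_w, h * img_h
    return (center_x - box_w / 2, center_y - box_h / 2, box_w, box_h)
```

For example, a box centered in cell (3, 3) with x = y = 0.5 and w = h = 0.25 decodes to a 112×112-pixel box centered at (224, 224).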

The authors have created a website explaining how it works, how to use it, and how to train yolo with your own images. Check the references below:

YOLO: You Only Look Once: Unified, Real-Time Object Detection (2016)
(Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi)
[site] [pdf] [slides] [talk] [ted talk]

YOLO9000: Better, Faster, Stronger (2017)
(Joseph Redmon, Ali Farhadi)
[site] [pdf] [talk] [slides]

YOLO: People talking about it
[Andrew NG] [Siraj Raval]

YOLO: People writing about it (Explanations and codes)
[Towards data science]: A brief summary of yolo and how it works.
[Machine Think blog]: A brief summary of yolo and how it works.
[Timebutt's github]: A tutorial explaining how to train yolo 9000 to detect a single object class.
[Timebutt's github]: Read this if you want to understand yolo's training output.
[Cvjena's github]: Comments on some of the tags used in the cfg files.
[Guanghan Ning's blog]: A tutorial explaining how to train yolo v1 with your own data. The author used two classes (yield and stop signs).
[AlexeyAB's github]: A very good project forked from yolo 9000, supporting Windows and Linux.
[Google's Group]: An excellent source of information. People ask and answer questions about darknet and yolo.
[Guanghan Ning's blog]: Studies and analysis on reducing the running time of Yolo on CPU.
[Guanghan Ning's blog]: Recurrent YOLO. An interesting work mixing recurrent networks and yolo for object tracking.

Yolo's pretrained weights and cfg files

Find below pretrained weights to be used with their respective networks:

To do