The goal of this project is to predict if a hand gesture is "open hand", "fist", "left hand" or "right hand". It was used for my student ROS project to move a Khepera mobile robot according to hand gestures.
This project uses Fourier descriptors (see below) to compute the important information of a gesture from an image, and KNN to classify the gestures.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
This project works on a Linux system.
In order to compile the project, you will need:
- C++17
- Boost library
- OpenCV
You will also need a dataset of hand gestures. You can find my dataset on Zenodo.
First, get the source code:
git clone https://github.com/alexandremgo/hand-gesture-recognition
Then build the project as follows:
mkdir -p hand-gesture-recognition/build
cd hand-gesture-recognition/build
cmake ..
make
Finally, put the dataset (you can find mine on Zenodo; download the 5 zip files) in the dataset folder:
cd where_you_downloaded_zip_files
mv raw_*.zip path_this_project/dataset/
cd path_this_project/dataset/
unzip "raw_*.zip"
You will see 5 other folders in the dataset folder: open_hand, fist, left_hand, right_hand and negative. Keep those folders: that's where the preprocessed images will go after running the preprocess_dataset example described below.
Go to:
cd build/examples
You can test your camera and display the largest outline found in its images using the camera_detection example.
You can also use this example to find a suitable value for outline_threshold.
outline_threshold is a value used when searching for the gesture outline in each image. You can use 25 for this dataset.
To run it:
./camera_detection camera_id
On Linux, you can list your cameras with the following command (the ID is probably 0):
ls -ltrh /dev/video*
In order to use the KNN model, you need to pre-process the images in the dataset. To do so:
./preprocess_dataset outline_threshold
outline_threshold is the same value as the one described above.
This will save the outline found in each image as a binary image.
You can see that for some images in each category, some outlines were not found during this pre-process.
To use the KNN model on 1 particular image, run:
./knn_image_prediction path_to_image_to_predict
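To make the classification step more concrete, here is a minimal sketch of a KNN majority vote over feature vectors such as Fourier descriptors. It is not the code used in this project: the Sample struct, the knn_predict function, the squared Euclidean distance and the tie-breaking rule are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical training sample: a feature vector (for example Fourier
// descriptors) and the gesture label it belongs to.
struct Sample {
    std::vector<double> features;
    std::string label;
};

// Squared Euclidean distance (an assumed metric; the project may use another).
double squared_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the most frequent label among the k training samples closest to query.
// Ties are resolved in favor of the alphabetically first label.
std::string knn_predict(const std::vector<Sample>& training,
                        const std::vector<double>& query, std::size_t k) {
    assert(!training.empty() && k >= 1);
    k = std::min(k, training.size());

    std::vector<std::pair<double, std::string>> distances;
    for (const Sample& s : training) {
        distances.emplace_back(squared_distance(s.features, query), s.label);
    }
    // Only the k smallest distances need to be ordered.
    std::partial_sort(distances.begin(),
                      distances.begin() + static_cast<std::ptrdiff_t>(k),
                      distances.end());

    std::map<std::string, std::size_t> votes;
    for (std::size_t i = 0; i < k; ++i) {
        ++votes[distances[i].second];
    }

    std::string best;
    std::size_t best_votes = 0;
    for (const auto& entry : votes) {
        if (entry.second > best_votes) {
            best = entry.first;
            best_votes = entry.second;
        }
    }
    return best;
}
```

With a training set built from the preprocessed images, a call such as knn_predict(training, descriptors_of_new_image, 3) would return one of the folder names (open_hand, fist, left_hand, right_hand or negative), assuming those names are used as labels.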
You can also predict hand gestures coming directly from your webcam.
To use your webcam and the KNN model, run:
./knn_camera_prediction camera_id
Here camera_id is the same as in the camera_detection example above.
You can get the image representation of the Fourier descriptors computed from an image by running:
./image_representation_fourier_descriptors path_to_image
An example with a left hand gesture:
Fourier descriptors make it possible to represent a closed shape independently of its rotation, scale and location.
The Fourier descriptors have been computed following this article by Jean-Luc Collette.
In a nutshell: let's take a discrete representation of a closed shape: a list of N points (x_m, y_m). From this list we can create a complex sequence: z_m = x_m + i*y_m. Then we can compute a Fourier decomposition of this sequence by assuming it is periodic with period N. Finally, we only keep the most significant coefficients of this Fourier series, and they become the Fourier descriptors after several normalization steps.
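The steps above can be sketched in a few lines of C++. This is only an illustration, not the code of this project: the fourier_descriptors function and the Point alias are hypothetical, the transform is a naive O(N^2) DFT, and the normalization is the common magnitude-based one (drop the coefficient of frequency 0, keep magnitudes only, divide by the largest first-order magnitude), which may differ from the normalization steps described in the article.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;  // (x_m, y_m)

// Hypothetical helper: returns 2 * keep values, the normalized magnitudes of
// the coefficients of frequency 1, -1, 2, -2, ..., keep, -keep.
std::vector<double> fourier_descriptors(const std::vector<Point>& contour, std::size_t keep) {
    const std::size_t n = contour.size();
    assert(keep >= 1 && n >= 2 * keep + 1);
    const double two_pi = 2.0 * std::acos(-1.0);

    // Naive DFT coefficient of frequency k of the sequence z_m = x_m + i*y_m,
    // treated as periodic with period n.
    auto coefficient = [&](long k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t m = 0; m < n; ++m) {
            const double angle = -two_pi * static_cast<double>(k) *
                                 static_cast<double>(m) / static_cast<double>(n);
            const std::complex<double> z(contour[m].first, contour[m].second);
            sum += z * std::polar(1.0, angle);
        }
        return sum / static_cast<double>(n);
    };

    // Frequency 0 (the centroid) is never used: this removes the location.
    // Dividing by a first-order magnitude removes the scale.
    const double scale = std::max(std::abs(coefficient(1)), std::abs(coefficient(-1)));
    assert(scale > 0.0);

    std::vector<double> descriptors;
    const long limit = static_cast<long>(keep);
    for (long k = 1; k <= limit; ++k) {
        // Keeping magnitudes only removes the rotation and the starting point.
        descriptors.push_back(std::abs(coefficient(k)) / scale);
        descriptors.push_back(std::abs(coefficient(-k)) / scale);
    }
    return descriptors;
}
```

In this sketch, two outlines that differ only by a translation, a rotation or a uniform scaling give the same descriptors up to rounding errors, which is the property that makes them usable as input features for the KNN model.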