# ISL_hand_gesture_recognition_in_real-time

## Table of Contents

- [Result](#result)
- [Overview](#overview)
- [Abstract](#abstract)
- [Installation](#installation)
- [Run](#run)
- [Training](#training)
- [Pretrained models](#pretrained-models)
- [Technologies used](#technologies-used)
- [License](#license)
- [Credits](#credits)

## Result

*(Animated demo GIF: input video and processed video side by side.)*

In this demo, recognition is run on a video of the ISL hand gestures 5-9. The original video plays in the left window, and the same video plays in the right window after processing. The processed video plays slowly because I do not have a good GPU; on a Google Colab GPU, it runs at around 8 FPS.

## Overview

This is a vision-based system in which a deep 3D CNN architecture is used, via transfer learning, to recognize ISL hand gestures in real time and in video. It recognizes 10 ISL hand gestures for the numeric digits (0-9), all of which are static gestures except the gesture for 6, which is dynamic. The approach can be extended to a larger number of gesture classes without requiring a huge amount of data. It achieves around 85% accuracy on a video stream.

## Abstract

Real-time recognition of ISL hand gestures with a vision-based system is a challenging task because there is no indication of when a dynamic gesture starts and ends in a video stream, and, unlike for ASL, there is no publicly available ISL data to work with. In this work, I handle these challenges by using transfer learning and running a deep 3D CNN architecture with a sliding-window approach. The sliding-window approach suffers from the multiple-time-activations problem, which I remove with some post-processing. Finding the region of interest (RoI) is also difficult; I solve this with a face detection algorithm. The proposed architecture consists of two models: (1) a detector, a lightweight CNN architecture that detects gestures, and (2) a classifier, a deep CNN that classifies the detected gestures. To measure misclassifications, multiple detections, and missing detections at the same time, I use the Levenshtein distance, from which I compute a Levenshtein accuracy on the video stream. I created my own dataset of 10 ISL hand gestures for the numeric digits (0-9), with just 70 samples per gesture class. I fine-tuned a ResNeXt-101 model on this dataset to use as the classifier; it achieves classification accuracies of 95.79% and 94.39% on the training and validation sets respectively, and a considerable accuracy of around 85% on the video stream.
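The post-processing and the Levenshtein-based metric described above can be sketched as follows. This is an illustrative sketch only; the function names and example sequences are not taken from this repository:

```python
def collapse_activations(window_predictions):
    """Merge consecutive duplicate predictions -- the 'multiple time
    activations' a sliding window produces for a single gesture."""
    collapsed = []
    for p in window_predictions:
        if not collapsed or collapsed[-1] != p:
            collapsed.append(p)
    return collapsed

def levenshtein(a, b):
    """Edit distance between two gesture sequences: counts
    misclassifications (substitutions), multiple detections
    (insertions), and missed detections (deletions) together."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_accuracy(predicted, truth):
    """1 - (edit distance / ground-truth length), as a percentage."""
    return 100.0 * (1 - levenshtein(predicted, truth) / len(truth))

# Example: raw window outputs for gestures 5..9, with repeats and one miss
raw = [5, 5, 5, 6, 7, 7, 9, 9]
pred = collapse_activations(raw)                    # [5, 6, 7, 9]
print(levenshtein_accuracy(pred, [5, 6, 7, 8, 9]))  # 80.0
```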

## Installation

Just install the necessary libraries listed in requirements.txt.
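Assuming a standard pip environment, that typically means:

```shell
pip install -r requirements.txt
```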

## Run

To run the app, run the following command after cloning the repository, installing the necessary libraries, and downloading the models:

```shell
python app.py
```

Note: I tested it only on Windows, not on other OS platforms such as Linux or macOS.

## Training

I used a Google Colab GPU to train and fine-tune the classifier.

Use training.ipynb to train or fine-tune the classifier on a Google Colab GPU or your own GPU.
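The fine-tuning step amounts to loading the pretrained classifier and replacing its final layer with a 10-way head for the digit gestures. Below is a minimal PyTorch sketch of that head swap using a stand-in module; the real 3D ResNeXt-101 comes from the Köpüklü et al. codebase, and all names here are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained classifier (the real 3D ResNeXt-101 lives
# in the Köpüklü et al. codebase); only the head swap is illustrated.
class PretrainedBackbone(nn.Module):
    def __init__(self, feature_dim=2048, num_classes=27):  # Jester has 27 classes
        super().__init__()
        self.features = nn.Linear(16, feature_dim)  # placeholder feature extractor
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))

model = PretrainedBackbone()

# Fine-tuning for the 10 ISL digit gestures: freeze the backbone,
# then replace the final layer with a fresh, trainable 10-way head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)

out = model(torch.randn(4, 16))  # dummy batch of 4 clips
print(out.shape)                 # torch.Size([4, 10])
```

In practice one would later unfreeze deeper layers with a lower learning rate once the new head has converged.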

## Pretrained models

Download the pretrained ResNeXt_101 classifier model from here; it is trained on Jester, the largest dynamic hand gesture dataset.

Download the pretrained ResNetL_10 detector model from here; it is trained on the EgoGesture hand gesture dataset.

Download the fine-tuned ResNeXt_101 classifier model from here; it is fine-tuned on our ISL hand gesture dataset.

Note: To run the app you only need the detector and the classifier. After downloading them, place them in the same directory as all the other files.

## Technologies used


## License

Licensed under the MIT License.

## Credits

I thank Okan Köpüklü, Ahmet Gündüz, et al. for providing the codebase; I built this project on top of it.

I also thank my friends Kunal Singh Bhandari, Mohd. Bilal, and Digant Bhanwariya, who helped me with the web app design and data creation.

I also want to thank Google for providing the free Colab GPU service to everyone, thanks to which I was able to train the model.