This project was started as the final project for a Data Mining course. The requirements were to create two classification algorithms and two evaluation techniques. According to the National Institute on Deafness and Other Communication Disorders (NIDCD), about 2 to 3 out of every 1,000 children in the United States are born with a detectable level of hearing loss in one or both ears,[1] and about 28.8 million U.S. adults could benefit from using hearing aids.[2]
The dataset used in this implementation is ASL Alphabet from Kaggle.
The dataset is a collection of images of letters from the American Sign Language alphabet, separated into 29 folders that represent the classes. The training set contains 87,000 images of 200x200 pixels. Of the 29 classes, 26 are for the letters A-Z and 3 are for SPACE, DELETE and NOTHING.
Examples of the dataset (letters A to D):
Two types of networks were used:
- transfer learning from Inception V3
- a CNN built from scratch
In this implementation, a regular CNN was created in Keras using a network similar to VGG-16.
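A minimal sketch of such a VGG-style network in Keras is shown below. The exact layer sizes here are illustrative assumptions, not the precise architecture used; see the ScratchCNN notebook for the actual model.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 29  # 26 letters + SPACE, DELETE, NOTHING


def build_scratch_cnn(input_shape=(200, 200, 3)):
    """VGG-style stack: repeated Conv-Conv-Pool blocks, then a dense head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```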
The data was preprocessed with some Keras data augmentation and fed into the network.
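For example, augmentation with Keras' `ImageDataGenerator` could look like the sketch below; the specific transform parameters are illustrative assumptions, not the exact values used in the notebooks.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values and apply mild random transforms on the fly.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,   # horizontal jitter
    height_shift_range=0.1,  # vertical jitter
    zoom_range=0.1,
)

# Dummy stand-in for a batch of 200x200 RGB training images.
images = np.random.randint(0, 256, size=(8, 200, 200, 3)).astype("float32")
labels = np.eye(29)[np.random.randint(0, 29, size=8)]  # one-hot, 29 classes

batch_x, batch_y = next(datagen.flow(images, labels, batch_size=8))
```

In the actual project the generator would read the 29 class folders from disk (e.g. via `flow_from_directory`) rather than an in-memory array.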
Transfer learning or inductive transfer is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[3] For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited.
The weights of Inception V3 were utilized.
Inception-v3 is trained for the ImageNet Large Scale Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1,000 classes, like "Zebra", "Dalmatian", and "Dishwasher".
Inception is well suited for this task because it was trained on natural images, so the low-level features common to all images are already extracted by its earlier layers.
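A minimal sketch of this transfer-learning setup in Keras: the pretrained base is frozen and only a small classification head is trained. The head (one 256-unit dense layer) is an assumption for illustration; the project itself loads `weights="imagenet"`, while `weights=None` is used here only to keep the sketch offline.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

# Load Inception V3 without its ImageNet classification head.
# The project uses weights="imagenet"; weights=None keeps this sketch offline.
base = InceptionV3(weights=None, include_top=False, input_shape=(200, 200, 3))
base.trainable = False  # freeze the pretrained feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),   # assumed head size
    layers.Dense(29, activation="softmax"), # 29 ASL classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```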
The results of both networks are shown below:
Implementation | training loss | training accuracy | validation loss | validation accuracy |
---|---|---|---|---|
Convolutional Neural Network (scratch) | 0.0584 | 0.9851 | 0.3794 | 0.9225 |
Transfer Learning | 1.2887 | 0.6502 | 3.7230 | 0.2230 |
Dependencies are listed in the req.txt file.
```shell
git clone https://github.com/Denisolt/DeepSign2Text.git
cd DeepSign2Text
virtualenv -p python3 venv
source venv/bin/activate
pip install -r req.txt
jupyter notebook
```
Then open either the ScratchCNN or the TransferLearning notebook and run it.
All additional resources, including other datasets and published papers, are located in the resources folder.