This is a speech recognition example using spiking and artificial recurrent neural networks.
Check out this awesome demonstration built by Tim Shea at Accenture using Intel Kapoho Bay:
https://vimeo.com/intelpr/review/486105332/2b3ac2e263
This model implements the recurrent Long short-term Spiking Neural Network (LSNN) and reproduces the Google Speech Commands results from the paper:
Salaj, D., Subramoney, A., Kraisnikovic, C., Bellec, G., Legenstein, R. and Maass, W., 2020.
Spike-frequency adaptation provides a long short-term memory to networks of spiking neurons. bioRxiv.
To reproduce the result from the paper (91.2% test accuracy), run the following command:
python3 train.py --model_architecture=lsnn --window_stride_ms=1
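To get a feel for what `--window_stride_ms=1` does to the input, the sketch below estimates the number of spectrogram frames per clip for two stride values. It assumes 1-second clips sampled at 16 kHz and a 30 ms analysis window, which are the defaults of the TensorFlow speech commands example; this repository may use different values. The function name `num_frames` is hypothetical and is not part of `train.py`.

```python
def num_frames(clip_ms=1000, sample_rate=16000,
               window_size_ms=30.0, window_stride_ms=10.0):
    """Number of analysis windows that fit into one audio clip.

    The default values are assumptions taken from the TensorFlow
    speech commands example, not read from this repository.
    """
    clip_samples = int(sample_rate * clip_ms / 1000)
    window_samples = int(sample_rate * window_size_ms / 1000)
    stride_samples = int(sample_rate * window_stride_ms / 1000)
    if clip_samples < window_samples:
        return 0
    # One window at offset 0, plus one for every full stride that still fits.
    return 1 + (clip_samples - window_samples) // stride_samples


print(num_frames(window_stride_ms=10))  # 98
print(num_frames(window_stride_ms=1))   # 971
```

Under these assumptions, a 1 ms stride gives the network roughly ten times more time steps per clip than a 10 ms stride.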
The details that allow the spiking network to achieve high accuracy are:
- The spiking network can exploit the higher temporal resolution of the input, so we use `--window_stride_ms=1`.
- For classification, we consider the output of the spiking network throughout the sequence: `--avg_spikes=True`.
- We use a larger number of neurons: `--n_hidden=2048`.
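The second point in the list above can be illustrated with a small sketch that compares a last-step readout with a readout averaged over the whole sequence. This is one plausible reading of what `--avg_spikes=True` does, not the code from `train.py`. The function names and the toy `outputs` values are made up for illustration.

```python
def classify_last_step(outputs):
    """Pick the class with the highest score at the final time step.

    outputs: list over time steps, each a list of per-class scores.
    """
    last = outputs[-1]
    return max(range(len(last)), key=last.__getitem__)


def classify_time_average(outputs):
    """Pick the class with the highest score averaged over all time steps."""
    n_steps = len(outputs)
    n_classes = len(outputs[0])
    mean = [sum(step[c] for step in outputs) / n_steps
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Toy output activity for 4 time steps and 3 classes (made-up numbers).
outputs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]

print(classify_last_step(outputs))     # 0
print(classify_time_average(outputs))  # 1
```

In this toy case the last time step alone favours class 0, while the average over the sequence favours class 1, so the averaged readout is less sensitive to activity at any single time step.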
Resulting accuracy:
Iteration | Validation | Test |
---|---|---|
400 | 68.6% | |
1200 | 79.4% | |
2400 | 85.3% | |
4800 | 87.5% | |
18000 | 91.5% | 91.2% |
The LSTM model is run with:
python3 train.py --model_architecture=lstm --n_hidden=512 --n_layer=1 --dropout_prob=0.4 --optimizer=adam
Resulting accuracy:
Iteration | Validation | Test |
---|---|---|
400 | 81.4% | |
1200 | 90.9% | |
18000 | 94.6% | 94.4% |
The CNN model is the default one used in the TensorFlow GSC example, which is based on `cnn-trad-fpool3` in the paper 'Convolutional Neural Networks for Small-footprint Keyword Spotting'.
Run with:
python3 train.py --model_architecture=conv
Resulting accuracy:
Iteration | Validation | Test |
---|---|---|
18000 | 88.4% | 87.6% |
Tested with TensorFlow 2.0 and 2.1.
To get started, create the conda environment from the file and activate it:
conda env create -f environment.yml
conda activate venv2.1
python3 train.py --model_architecture=lsnn --window_stride_ms=1