Speech Recognition ANN Implementation

An implementation of Speech Recognition using Artificial Neural Networks.

Language Used: Python

Dependencies: numpy and scipy (both must be installed for this to work).

Words Recognized: "Apple", "Banana", "Kiwi", "Lime", "Orange"

#How to add new words

1. Record your new word in Audacity or any other audio editor. Set the sampling rate to 44100 Hz, then export the recording as a .wav file. Recording many samples from different speakers improves accuracy.
2. Put the wav files into the training_sets directory. Rename each file to the word you want to add, followed by a hyphen and a sample index (e.g. hello-1.wav, hello-2.wav), so that the feature extractor can iterate over the files easily.
3. In featureExtractor.py, append your new word to the words array.
4. Run featureExtractor.py. Numpy files containing the Mel-frequency cepstral coefficients (MFCCs) will be generated in the mfccData folder.
5. In anntrainer.py, go to the main function and open another file instance, e.g. f6 = open("mfccData/hello_mfcc.npy"). On Python 3, pass "rb" as the mode so that np.load() can read the file.
6. Load the npy file with np.load(), then concatenate it into inputArray.
7. Edit the neural network's target outputs. For example, to add the word hello, give every target vector one more element and add a row for the new word, as follows:

t1 = np.array([[1,0,0,0,0,0] for _ in range(len(inputArray1))]) #Apple
t2 = np.array([[0,1,0,0,0,0] for _ in range(len(inputArray2))]) #Banana
t3 = np.array([[0,0,1,0,0,0] for _ in range(len(inputArray3))]) #Kiwi
t4 = np.array([[0,0,0,1,0,0] for _ in range(len(inputArray4))]) #Lime
t5 = np.array([[0,0,0,0,1,0] for _ in range(len(inputArray5))]) #Orange
t6 = np.array([[0,0,0,0,0,1] for _ in range(len(inputArray6))]) #Hello

target = np.concatenate([t1,t2,t3,t4,t5,t6])
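
The rows above are one-hot vectors: each word gets a 1 in its own position and 0 everywhere else. As a rough sketch (not the code in anntrainer.py), the same target matrix could be generated for any number of words with np.eye and np.repeat. The function name build_targets and the sample counts below are illustrative assumptions, not names from this repository.

```python
import numpy as np


def build_targets(sample_counts):
    """Return one one-hot row per training sample.

    sample_counts[i] is the number of feature rows loaded for word i,
    i.e. what len(inputArray1), len(inputArray2), ... give in the
    hand-written version.
    """
    # Row i of the identity matrix is the one-hot vector for word i.
    identity = np.eye(len(sample_counts), dtype=int)
    # Repeat row i sample_counts[i] times, keeping the word order.
    return np.repeat(identity, sample_counts, axis=0)


# Made-up counts for Apple, Banana, Kiwi, Lime, Orange, Hello.
counts = [3, 2, 4, 1, 2, 5]
target = build_targets(counts)
print(target.shape)
```

With this approach, adding another word only means appending one more count, instead of editing every existing row by hand.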

Then run anntrainer.py. Training can take a long time, so grab a coffee while you wait =)
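
Before running the feature extractor, it may help to confirm that the recordings follow the naming and sampling-rate conventions described above. The following is a minimal sketch using only the Python standard library. The helper names check_training_file and check_training_dir are hypothetical and not part of this repository, and the sketch assumes the files are PCM-encoded wav, which is what the wave module can read.

```python
import re
import wave
from pathlib import Path

EXPECTED_RATE = 44100  # sampling rate named in the steps above
# Assumed naming convention: <word>-<sample_index>.wav, e.g. hello-1.wav
NAME_PATTERN = re.compile(r"^[A-Za-z]+-\d+\.wav$")


def check_training_file(path):
    """Return a list of problems found for one wav file (empty if none)."""
    path = Path(path)
    problems = []
    if NAME_PATTERN.match(path.name) is None:
        problems.append("name does not match <word>-<index>.wav")
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
    except wave.Error as exc:
        # The wave module only understands PCM-encoded wav files.
        problems.append(f"could not read as PCM wav: {exc}")
        return problems
    if rate != EXPECTED_RATE:
        problems.append(f"sampling rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    return problems


def check_training_dir(directory="training_sets"):
    """Map each problematic file name to its list of problems."""
    report = {}
    for path in sorted(Path(directory).glob("*.wav")):
        problems = check_training_file(path)
        if problems:
            report[path.name] = problems
    return report


if __name__ == "__main__":
    for name, problems in check_training_dir().items():
        print(name, "->", "; ".join(problems))
```

An empty report means every wav file in the directory matched both conventions.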

#Running the speech recognizer

Just run main.py! =) You can view demo.mp4 for sample usage.

#Developers

A CS 180 Artificial Intelligence Project, University of the Philippines Diliman

Developers: Romelio Tavas Jr., Dion Melosantos
