Hotword Detection #100
Closing as a duplicate of #18 |
I don't mean a command prefix. I mean a string that, when detected, triggers a callback function so that the user can be visually informed that speech input has been activated. |
Hello again @alanjames1987, sorry for the late response on this issue. Here's an idea of how to do this, using the new regular expression support available in v2.0.0:

```javascript
// run this after the hotword was detected to register the "real" commands
var hotWordDetected = function() {
  annyang.removeCommands();
  annyang.addCommands({
    'hello': function() { alert('Hello world!'); }
  });
};

// initial command to listen for the hotword
var hotwordCommand = {
  'hotword': {'regexp': /shenanigans/i, 'callback': hotWordDetected}
};

annyang.addCommands(hotwordCommand);
annyang.start();
```

If what you had in mind was that the user would always have to say the hotword before each command, you can do something like:

```javascript
var hello = function() {
  alert('Hello world!');
};

var goodbye = function() {
  alert('Goodbye world!');
};

// the hotword may start the phrase, so don't require anything before it
annyang.addCommands({
  'hello': {'regexp': /\bshenanigans hello\b/i, 'callback': hello},
  'goodbye': {'regexp': /\bshenanigans goodbye\b/i, 'callback': goodbye}
});

annyang.start();
```
I will try that out; it might be a good solution. It still seems slightly like a hack, though. I was hoping this could be built into annyang. The code to interact with it might look something like this:

```javascript
// Proposed API: none of the hotword* methods below exist in annyang.
function hotwordDetectionHandler() {
  // code to trigger a sound
  // or update the interface to show it's listening
}

function hotwordTimeoutHandler() {
  // code to trigger a sound
  // or update the interface to show it's not listening
}

var hotwords = {
  '(hey) computer': hotwordDetectionHandler,
  '(hey) hal': hotwordDetectionHandler,
  '(hey) jarvis': hotwordDetectionHandler
};

// showFlickr, calculateStats and greeting are assumed to be defined elsewhere
var commands = {
  'show me *term': showFlickr,
  'calculate :month stats': calculateStats,
  'say hello (to my little) friend': greeting
};

annyang.hotwords(true);
annyang.addHotwords(hotwords);
annyang.hotwordTimeout(1000); // if a sentence isn't started within this time (ms), a deactivation function is called
annyang.hotwordTimeoutHandler(hotwordTimeoutHandler); // function to run after the timeout
annyang.addCommands(commands);
annyang.start();
```
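None of the `hotword*` methods in that proposal exist in annyang. As a rough sketch of the same behaviour, the hypothetical `createHotwordGate` below opens when a hotword regexp matches a transcript, passes exactly one following transcript through as the command, and calls `onTimeout` if nothing follows within `timeoutMs`. It assumes transcripts arrive as plain strings; the timer functions are injectable so the logic can be exercised outside a browser.

```javascript
// Hypothetical sketch, not an annyang API: a gate that opens on a hotword,
// lets one command through, and closes after `timeoutMs` without a command.
// `hotwords` should be non-global RegExps (a `g` flag makes test() stateful).
function createHotwordGate({
  hotwords,
  timeoutMs,
  onActivate,
  onTimeout,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let timer = null; // non-null while the gate is open

  function close() {
    if (timer !== null) clearTimer(timer);
    timer = null;
  }

  // Feed one transcript; returns the command text to run, or null.
  function handle(transcript) {
    const text = transcript.trim();
    if (timer !== null) {
      close(); // one command per activation
      return text;
    }
    if (hotwords.some((re) => re.test(text))) {
      if (onActivate) onActivate(text);
      timer = setTimer(() => {
        timer = null;
        if (onTimeout) onTimeout();
      }, timeoutMs);
    }
    return null;
  }

  return { handle, isOpen: () => timer !== null };
}
```

In a browser, `gate.handle(phrases[0])` could be called from annyang's `result` callback (registered with `annyang.addCallback('result', ...)`, which receives the candidate phrases). The returned text would then have to be matched against your own command table rather than commands registered with annyang, since annyang would otherwise run them whether or not the gate is open. In this sketch, a hotword and command spoken in one breath only opens the gate.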
That's an interesting idea... and a very well-thought-out API! How important do you think it is to allow separate hotwordDetectionHandlers? Why allow just one hotwordTimeoutHandler but multiple hotwordDetectionHandlers? Is there a specific common use case that requires multiple ones? Without them, we could simplify the API to something like:

```javascript
var hotwords = [
  '(hey) computer',
  /(hey|hello) hal/
];
```
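If the simplified list were adopted, string entries would still need converting to patterns. The hypothetical `hotwordToRegExp` below sketches that: RegExp entries pass through unchanged, and strings support only a simplified take on annyang's `(optional words)` syntax, not `:named` or `*splat` parameters. Unlike annyang's string commands, which must match the whole phrase, the result is unanchored so a hotword can appear anywhere in a transcript; `\b` word boundaries keep `hal` from matching inside `shall`.

```javascript
// Hypothetical helper: turn a hotword entry into a RegExp.
// RegExp entries pass through unchanged; strings support only annyang-style
// "(optional words)", e.g. "(hey) computer" matches "hey computer" and "computer".
function hotwordToRegExp(hotword) {
  if (hotword instanceof RegExp) return hotword;
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const body = hotword
    .split(/(\([^)]*\))/) // keep "(...)" groups as separate parts
    .map((part) => part.trim())
    .filter((part) => part !== '')
    .map((part) => part.startsWith('(')
      ? '(?:' + escape(part.slice(1, -1).trim()).replace(/\s+/g, '\\s+') + ')?'
      : escape(part).replace(/\s+/g, '\\s+'))
    .join('\\s*');
  // Word boundaries on both ends; no ^/$ anchors, so it matches mid-sentence.
  return new RegExp('\\b' + body + '\\b', 'i');
}

const hotwordPatterns = ['(hey) computer', /(hey|hello) hal/].map(hotwordToRegExp);
```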
I don't think multiple hotwordDetectionHandlers are very important. I added them because that is in line with how commands are currently added to annyang, and I was trying to keep the API similar. I think the idea of sending the spoken hotword to the hotword handler is great. |
Sounds good. Would you like to give this a shot and send me a pull request? |
I will look into this as soon as I can, hopefully this weekend. I know I will have to use interim results, so I will be enabling that. |
Enabling interim results seems like a very drastic change to how annyang works, and doesn't really seem required for hotword detection. Is there a reason this feature can't be enabled without enabling interim results? |
I can only see real-time hotword detection being added if we get real-time results through interim results. There might be a better way; if you think there is, I would love to hear it. |
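For reference, scanning interim results with the browser's Web Speech API might look roughly like the sketch below. With `interimResults` enabled, each `result` event exposes `event.results` (each entry has an `isFinal` flag and alternatives whose `transcript` holds the text) and `event.resultIndex`, the index of the first result that changed in that event. `findHotword` is a hypothetical helper written against plain array-like input so it runs outside a browser; the commented-out wiring assumes Chrome's prefixed `webkitSpeechRecognition`.

```javascript
// Hypothetical helper: scan the results that changed in this event for a
// hotword, including interim (non-final) ones.
// `results` is array-like: results[i].isFinal, results[i][0].transcript.
function findHotword(results, resultIndex, hotwordRe) {
  for (let i = resultIndex; i < results.length; i++) {
    const transcript = results[i][0].transcript;
    if (hotwordRe.test(transcript)) {
      return { index: i, transcript: transcript.trim(), isFinal: results[i].isFinal };
    }
  }
  return null;
}

// Browser wiring (sketch, Chrome only):
// const recognition = new webkitSpeechRecognition();
// recognition.continuous = true;
// recognition.interimResults = true;
// recognition.onresult = (event) => {
//   const hit = findHotword(event.results, event.resultIndex, /\bcomputer\b/i);
//   if (hit) { /* show the "listening" UI */ }
// };
// recognition.start();
```

Because interim results for the same utterance are reported repeatedly as they are refined, a real implementation would also need to avoid firing the "listening" feedback more than once per utterance.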
Hi there, I am wondering if this feature has been added. |
It hasn't been added. I have had no time to work on this yet. |
Any plans on a timeframe for getting this feature implemented? |
+1 |
It looks like the Snowboy hotword detection toolkit is built for exactly this purpose: https://github.com/kitt-ai/snowboy It works offline, so no data is streamed to Google until you explicitly activate it. There are currently discussions about a Node.js module (Kitt-AI/snowboy#4). Does anyone want to give it a try? |
Now that we've finished the Snowboy Node module, I can continue with my master plan! Because annyang is such an awesome library, loads of people (myself included) have used it for "non-web" (Electron or otherwise) projects. Just to make my point, there are over 700 forks of @TalAter's annyang-electron-demo. That's why I've started building It's probably worth pointing out that it's not ready for prime time just yet, but I am looking for collaborators, so if you're interested, hit me up! 🚀 |
I'm currently building a "Jarvis"-like system based on Chromium and a Raspberry Pi with a 7" screen. At first, when I saw that annyang doesn't use hotwords, it was perfect. But after adding a few commands... well, you can imagine the chaos in the house :) I'm looking forward to a hotword plugin or update for annyang. |
After some deliberation, I decided to take the "core" of annyang and include it in the project: it wasn't built to run outside of the web browser, and there's a lot of logic that Sonus already offers that would take a lot of work to plumb into annyang. I've included the annyang command registration system out of the box as part of Sonus. Here's an example:

```javascript
'use strict'

const Sonus = require('sonus')
const speech = require('@google-cloud/speech')({
  projectId: 'streaming-speech-sample',
  keyFilename: './keyfile.json'
})

const hotwords = [{ file: './resources/sonus.pmdl', hotword: 'sonus' }]
const language = "en-US"
const sonus = Sonus.init({ hotwords, language }, speech)

// annyang-style commands, registered through the copy of annyang bundled in Sonus
const commands = {
  'hello': () => {
    console.log('You will obey')
  },
  '(give me) :flavor ice cream': flavor => {
    console.log('Fetching some ' + flavor + ' ice cream for you, yo')
  },
  'turn (the)(lights) :state (the)(lights)': state => {
    console.log('Turning the lights', (state === 'on') ? state : 'off')
  },
  'stop': () => {
    console.log('Stopping...')
  }
}

Sonus.annyang.addCommands(commands)
Sonus.start(sonus)
console.log('Say "' + hotwords[0].hotword + '"...')

sonus.on('hotword', (index, keyword) => console.log("!" + keyword))
sonus.on('partial-result', result => console.log("Partial", result))
sonus.on('final-result', result => {
  console.log("Final", result)
  if (result.includes("stop")) {
    Sonus.stop()
  }
})
```

As of tonight I've published Feedback is welcome and appreciated. |
Here is how I ended up creating a global command prefix and suffix:

```javascript
// assumes jQuery ($) and annyang are already loaded on the page
/* SET GLOBAL COMMAND PREFIX */
var globalCommandPrefix = "Computer (please)" + " ";
/* SET GLOBAL COMMAND SUFFIX */
var globalCommandSuffix = " " + "(please)";
/* SET UNIQUE COMMAND TEXT */
var command1 = "say my name is :name";
var command2 = "I am :name";

(function () {
  var commands, log, sayName;
  log = $('.log');
  sayName = function (name) {
    log.append('<li>Your name is ' + name + '!</li>');
    return console.log(name);
  };
  /* CONCATENATE COMMANDS IN VARIABLES */
  var command1Con = globalCommandPrefix + command1 + globalCommandSuffix;
  var command2Con = globalCommandPrefix + command2 + globalCommandSuffix;
  /* USE VARIABLES IN BRACKETS AS OBJECT KEYS (computed property names) */
  commands = {
    [command1Con]: sayName,
    [command2Con]: sayName
  };
  annyang.addCommands(commands);
  annyang.start();
  annyang.debug();
}.call(this));

$('.globalCommandPrefix').text(globalCommandPrefix);
$('.globalCommandSuffix').text(globalCommandSuffix);
$('.command1').text(command1);
$('.command2').text(command2);
```

I'm sure it could be a bit cleaner. It works well for my use case: I am developing a plugin for another piece of software whose API lets me use a GUI to toggle parts of the code on and off for each command/function every time I drag a new stack into the IDE. I set the global commands once per page, or use PHP to set them once per site. |
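A slightly tidier variant of the same idea wraps an existing command map in one pass instead of concatenating each key by hand. `addPrefixAndSuffix` is a hypothetical helper name; it only builds the object, which would then be passed to `annyang.addCommands`.

```javascript
// Hypothetical helper: return a new command map whose keys are wrapped with
// a global prefix and suffix. Keys stay in annyang's string syntax, so
// "(please)" remains an optional word. Empty prefix/suffix are skipped.
function addPrefixAndSuffix(commands, prefix, suffix) {
  const wrapped = {};
  for (const [phrase, callback] of Object.entries(commands)) {
    wrapped[[prefix, phrase, suffix].filter(Boolean).join(' ')] = callback;
  }
  return wrapped;
}
```

Usage would look like `annyang.addCommands(addPrefixAndSuffix({ 'say my name is :name': sayName, 'I am :name': sayName }, 'Computer (please)', '(please)'));`, keeping the unprefixed phrases in one place.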
+1 to this issue. It would definitely be awesome to have some front-end, JavaScript-based hotword detection. If I'm correct, Snowboy and Sonus both require Node.js on the server side? I'm writing my own home assistant bot as well, using Python for command processing, and I only use the browser as a UI that recognizes speech and sends text commands to the Python Flask server. I chose this approach because this way I can just put a few cheap Android or Windows tablets around the house, instead of dealing with and mixing a lot of microphones routed to one PC. It also allows me to use my AI when I'm not at home, so it's more like Cortana/OK Google/Alexa. So I'm really curious about how to detect hotwords with browser-side JS. |
Sonus uses Node.js, but it's a bit atypical because it's primarily a "client" library intended for low-powered hardware devices. I'm also looking to create a Python interface: evancohen/sonus#13. To address your main question: you can run browser-based detection with pocketsphinx. An alternative that I really like is JsSpeechRecognizer. You need a reasonably high-powered device to actually get real-time recognition with either of these. Accuracy is also a big problem: if you have any background noise, you are unlikely to get any kind of reasonable detection (and you'll get lots of false positives). I went down the "offline hotword recognition in the browser" path for my smart mirror. After a lot of pain and dead ends, I found Snowboy, wrote their Node library, and created Sonus. As an aside (and for inspiration): my current home automation solution uses a bunch of $9 CHIPs + $5 PlayStation Eyes + Sonus. Each device is location-aware ("turn on the lights" will do something different depending on what room you are in, but "turn on the living room lights" will always turn on the living room lights). Also cool: Next Thing Co. recently released the $16 CHIP Pro, which has an on-board microphone (I've yet to receive mine, but it looks promising). |
I don't really need offline recognition; I only need offline hotword detection in the browser, to activate Google's online speech recognition after that. This way I'll be able both to NOT spam Google with non-stop speech recognition requests (which would otherwise happen whenever people are talking) and to ease privacy paranoia: Google will only get commands for recognition, no private conversations. Running offline speech recognition is even harder, because I need it to work with Russian, and Sphinx only supports English out of the box. As for the power, I could record audio in the browser and send it to my home server (a powerful DIY NAS), which could recognize whether there is a hotword or not, but that would probably take too long. |
@Nixellion @evancohen |
@andreimavenhut Right now I use annyang with basically just ONE command: , *tag. It just grabs everything after the bot name and sends it to my personal Python server, which then does all the natural language processing, user-specific context, user-specific conversations, finding the right command and/or using a chatbot. I limit JS to a very simple web UI. This way I can make very simple native apps for other platforms IF needed, and I won't have to rewrite a lot of code for that. It's also more secure: I can give client access to any number of friends, and they can have fun with my bot while security clearance restricts them from accessing sensitive commands :D I actually already have the groundwork for speaker recognition. My client can send audio to the server for processing, but I'm stuck on the actual speaker recognition for now. So I don't think it's a good solution to detect the hotword with annyang and then process another command. With annyang it's easier to just use commands with a global prefix. I don't see any reasons other than bandwidth and privacy for needing a separate hotword; it only turns every command into two steps instead of one. Instead of just saying "SuperBot, kill the lights!" without a pause, you will have to go through a dialog:
With the prefix approach it's just:
Now, I could use Python's speech recognition for hotword detection, sending audio to the server to process it with some custom matching algorithm, but I don't want to push so much data through my local network all the time. I mean, always sending audio every time SOME sound is detected... Oh, and about your second question: while you're waiting for evancohen's answer, my opinion is that with an RPi or CHIPs you should probably go with Python, using its SpeechRecognition module, which supports online recognition using Google's services, as well as Bing and a number of other online services (for which you have to get an API key, though). It does not support Russian Yandex recognition service yet, but in fact it's not that hard to write your own online recognition module: it's all about recording audio, sending it as a POST request to their server, and receiving the JSON response. SpeechRecognition (or SpeechRecognizer? Not sure what it's called in pip) also supports offline recognition using Sphinx. If you're an English speaker, you're in huge luck: it does a nice job of recognizing English out of the box. It's worse than Google's or any other online service (they're constantly improving; from what I understand, they use neural networks to improve recognition over time), but it's still pretty good. |
I thought about using Sphinx or another offline recognizer, but after a few benchmarks I went with the SpeechRecognition in Chrome. I know you can use Google's APIs, but hey... they do cost :) Inside Chrome, speechRecognition has an API key that (from what I know) is unlimited, which for me translates to zero costs. |
@andreimavenhut Well, from what I know, Chrome's speech recognition actually uses Google's servers as well, so there's still bandwidth usage and all. And Python's speech recognition also has an unlimited Google API key, so you get it for free in Python too. And Sphinx is of course free as well, but a pain in the ass :D |
@andreimavenhut For speech recognition I use Sonus. In terms of audio encoding, it uses 16-bit signed-integer linear pulse-code modulation (LPCM) WAV (no MP3 support today). It's entirely stream-based, so you could theoretically stream from your web browser to a server instance of Sonus (although I've never actually tried this). One big problem with trying to do keyword spotting (aka hotword detection) off-device is latency/lag in detection. That's also a problem in the browser: JS simply isn't very optimized for audio processing... That's not to say it can't be done; it will probably just be a bit slower. I saw @Nixellion's comment on Kitt-AI/snowboy#98 and would love to see browser compatibility (I would create a browser-based version of Sonus in a heartbeat). Based on what you described, it's exactly what you are looking for. Since this conversation is no longer directly related to annyang (and so we don't spam others), I've created a new issue on the Sonus repo to continue this discussion: evancohen/sonus#28 |
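Streaming from a browser would mean converting Web Audio API samples, which arrive as `Float32Array` values nominally in the range [-1, 1], into the 16-bit signed-integer linear PCM described above. A minimal sketch of that conversion follows; `floatTo16BitPCM` is a hypothetical name, and capturing the audio, resampling to the rate the server expects, and the transport itself are not shown.

```javascript
// Hypothetical helper: convert Web Audio float samples in [-1, 1] to 16-bit
// signed little-endian PCM bytes. Values outside the range are clamped.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

Writing through a `DataView` with the little-endian flag fixes the byte order regardless of the platform, which matters once the bytes leave the browser.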
Is there an update on this issue, i.e. using annyang along with a hotword? Is the solution to always join the hotword in front of the command? |
@gaitat You could try running continuous recognition and checking whether there's a hotword on each update. Once there is, restart and go for the phrase recognition. I have not tested this approach, but I'm thinking about doing it some time. That is, if you're not worried about a constant stream of your audio going to Google's servers. Alternatively, instead of prepending the hotword to each command, I would split the string at the hotword, because you may already be saying something when recognition starts, and then say your hotword and command in the middle of your talk. The hotword will be in the middle of the string, not in front of it. I have not tried this approach either, though :D |
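The split-at-the-hotword idea can be sketched as follows: find the last occurrence of the hotword in the transcript and treat whatever follows as the command, discarding the speech before it. `commandAfterHotword` and the bot name in the usage are hypothetical; the match is case-insensitive because recognizer output may capitalise words differently.

```javascript
// Hypothetical helper: return the text after the LAST occurrence of the
// hotword (skipping a following comma or spaces), or null if it wasn't said.
function commandAfterHotword(transcript, hotword) {
  const escaped = hotword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const re = new RegExp('\\b' + escaped + '\\b[\\s,]*', 'gi');
  let lastEnd = -1;
  for (const match of transcript.matchAll(re)) {
    lastEnd = match.index + match[0].length;
  }
  return lastEnd === -1 ? null : transcript.slice(lastEnd).trim();
}
```

For example, `commandAfterHotword('so I was saying superbot, kill the lights', 'superbot')` returns `'kill the lights'`, and a transcript without the hotword returns `null`, so it can be ignored.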
@andreimavenhut Hi Andrei, I was reading with interest about your conversation class. Currently I'm adding items to physical boxes via speech with annyang, saying for example "add gloves to box number 4" (the gloves are then saved to box 4 in MySQL), but I was thinking about the possibility of removing things, with annyang asking me, e.g., "are you sure you want to remove the candle from box number 4?" and then waiting for me to say either "yes" or "no". |
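One way to sketch that yes/no step with annyang's existing `addCommands` and `removeCommands` methods is to register temporary `yes`/`no` commands only while a question is pending and remove both as soon as either is heard. `askYesNo` is a hypothetical helper; speaking the question aloud is left out.

```javascript
// Hypothetical helper: temporarily register "yes"/"no" commands on an
// annyang-like recognizer, then remove both after the first answer.
function askYesNo(recognizer, onYes, onNo) {
  const answer = (callback) => () => {
    recognizer.removeCommands(['yes', 'no']);
    callback();
  };
  recognizer.addCommands({ yes: answer(onYes), no: answer(onNo) });
}
```

For the box example this might be called as `askYesNo(annyang, () => removeItem('candle', 4), () => {})`, where `removeItem` is a hypothetical stand-in for whatever updates MySQL.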
@LukeMcLachlan Hi Luke, it's not that great of a module/class; I just hacked it together quickly. Here it is: Short Explanation: |
Thank you @lynxaegon I'll have a look at it this evening and see what I can do with it, very kind of you to share it with me! |
I would like to have hotword detection available in annyang, basically allowing me to tell annyang not to "activate" until a certain hotword is spoken. Essentially, speech recognition would be working the entire time but would only start caching results returned from `webkitSpeechRecognition` from the first instance of the spoken hotword, similar to how OK Google works, but without the plugin.