Voice to Text

This has been fixed; please build again.

Scott, Thank you for fixing the microphone sounds.

I assume that getting the phonemes back is not an option (though still one I would prefer). What about sending in an extension to the lexicon with typical child utterances, such as these sets: duh, bub, fah, dah, puh, el, pah, dow – partial words like that. Is there any way to get those to match? Isn’t submitting lexicon extensions a typical thing for VTT to do?

Unfortunately, there is no way to extend the native speech-to-text libraries on iOS or, surprisingly, Android. In fact, the APIs of these libraries are among the most limited I have seen.

Is there any possibility of getting Kindle Fire to work? I don’t know what is wrong, but it is as if init is never called. I get no initialization of the microphone (BTW, on my Android phone, I still get the microphone start/stop sounds). I get no error in the adb debug either.

Any thoughts?

BTW, we are getting pretty good results now with recognition of the children. I send back everything that the voice recognition thought it heard, and I can add words to my matching set. We are using mostly iPad with the children for now, but I want to get a cheap tablet for other families to use.

It probably won’t work with Kindle because on Android the plugin uses Google’s speech-to-text service. I believe that Amazon has its own service, but the plugin does not currently support it. I don’t have a Kindle to verify, but I suspect that is the problem. As for whether I plan to support Amazon: as mentioned, I don’t have a Kindle, so I would have to go out and buy one, and I have not had enough interest from others to justify supporting Amazon.

Kindle runs Android, though, and the plugin loads fine, and the microphone privileges show up correctly. If you are just using Android calls, and not hardware specifics, I thought it would work. But maybe the Kindle microphone needs different hardware support. Recording does work fine for Alexa on the Kindle. I have asked Amazon to be part of their trial of Transcribe, but I think I am not a big enough customer; I haven’t heard back. I like Kindle Fire for the children’s voicing app because it is only $50, so it is a low point of entry for families.

Thanks. Let me know if anything changes.

^ I recognize the audio cue from Google Now, which leads me to believe that it uses Google Now specifically for speech to text. I don’t know if that is supported on Kindle. I did a little googling and was not able to find anything on this.

That must be it. I am sure Amazon uses its own audio cue, the one for Alexa. Thanks, I can see that would be a big addition.

Is there any way for me in the app to see/monitor the microphone amplitude? The issue is this: there will always be a gap between when the child finishes saying a word and when the app returns with ‘correct’ or not, because the service has to recognize the silence before processing. I wanted to catch the ‘end of the word’ as soon as the child stops talking (given that we have one-word speech, this is easy in principle) and give them feedback that the app is ‘working on it’. Any way my app can ‘listen’ to the microphone sound stream also?

This is currently not supported by the plugin. You should, however, be able to record audio yourself while the plugin is converting voice to text.
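As a rough sketch of what the app-side part could look like: if the app can get periodic volume readings from its own recording, a small detector can report "end of word" once speech is followed by a short run of quiet samples. Everything here is an assumption rather than part of the plugin: the function name, the threshold, and the source of the samples (in Corona they might come from a `media.newRecording()` object via `startTuner()` and `getTunerVolume()`, but whether that works while the plugin holds the microphone is untested).

```lua
-- Hypothetical end-of-word detector fed with volume samples.
-- threshold:   volume at or above this counts as speech (assumed scale 0..1)
-- quietNeeded: number of consecutive quiet samples that ends a word
local function newEndOfWordDetector(threshold, quietNeeded)
  local speaking, quiet = false, 0
  return function(volume)
    if volume >= threshold then
      speaking, quiet = true, 0
      return false
    end
    if not speaking then return false end  -- still in the leading silence
    quiet = quiet + 1
    if quiet >= quietNeeded then
      speaking, quiet = false, 0
      return true                          -- fires once per utterance
    end
    return false
  end
end

-- Simulated samples: silence, one word, silence.
local feed = newEndOfWordDetector(0.1, 3)
local samples = { 0.0, 0.02, 0.4, 0.6, 0.3, 0.05, 0.01, 0.0, 0.0 }
for i, v in ipairs(samples) do
  if feed(v) then print("end of word at sample " .. i) end
end
--> end of word at sample 8
```

In an app, the moment the detector returns true would be the place to show the "working on it" feedback, while the recognition result still arrives later through the plugin's listener.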

Thank you, I will look into doing that.

But I have a new real problem that I cannot explain. On iOS (iPhone and iPad, different users) I have been getting text strings back that make no sense at all to me. And, consecutive text strings seem to be extensions of previous ones even though I do a stop() before doing each start().

Here are my init and start calls:

voiceToText.init(function(e)
  if e then
    local hit
    if e.speech then
      -- here log e.speech to analytics
      for w = 1, #candidates do
        -- candidates is a list of substrings that I match against the text received
        if string.find(string.lower(e.speech), string.lower(candidates[w])) then
          -- keep the longest candidate that matches
          if not hit then
            hit = candidates[w]
          elseif string.len(candidates[w]) > string.len(hit) then
            hit = candidates[w]
          end
        end
      end
    elseif e.response then
      if e.response == "stopped" then
        hit = "stopped"
      end
    end
    if hit then
      callBack(hit)
    end
  end
end)

voiceToText.startRecording(nil,true,nil,nil)
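For anyone who wants to try the matching step outside the app, here is a minimal standalone sketch of the same longest-candidate idea. The function name `longestMatch` and the sample data are made up for illustration. One deliberate difference from the listener above: it passes `true` as the fourth argument of `string.find`, so a candidate is treated as a plain substring rather than a Lua pattern.

```lua
-- Hypothetical helper: return the longest candidate found inside the
-- recognized text, or nil when nothing matches.
local function longestMatch(speech, candidates)
  local heard = string.lower(speech)
  local hit
  for _, candidate in ipairs(candidates) do
    -- plain substring search: "-" or "." in a candidate is not a pattern
    if string.find(heard, string.lower(candidate), 1, true) then
      if not hit or #candidate > #hit then
        hit = candidate
      end
    end
  end
  return hit
end

-- Made-up matching set, including a mis-hearing added as a candidate.
local candidates = { "fish", "fis", "duck", "ok doc" }
print(longestMatch("Jesus fish", candidates))    --> fish
print(longestMatch("OK doc", candidates))        --> ok doc
print(longestMatch("what you are", candidates))  --> nil
```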

I have been getting strings back that I cannot account for, and today I sat with my grandson to be certain. He did it perfectly, single word answers, no one else around, no TV, no background noise, etc.

Here are some strings I received:

Glass America crack

I don’t

I don’t know (consecutive)

Good luck

Good luck last

Good luck last try (these 3 were consecutive)

Good luck last two adapter

I can swear to you that there was nothing said even remotely similar to those. He was saying “fish” for the last 4, for example.

Is there any chance I am getting someone else’s buffer? Is nil invalid for language in the Apple version? I am very confused.

It appears that the ‘interference’ is more common during typical busy times and when I leave the microphone open longer (5 secs). I changed the code to close the microphone as soon as I get any word back, and that improved the response, and I had less interference over the weekend. Nevertheless, I had a few cases of interference this afternoon:

jesus fish

ok doc

what you are

what

when you’re

when

I was careful today to either say nothing or to say cheese or to say the correct word (fox, fish, duck, or cow). The only one close above was the first one, when I might have said ‘fish’, and the ‘ok doc’ when I said ‘duck’. Still, extra words are strange, and the last 4 are bizarre. Is it possible that this is an Apple problem?

It is possible that these interference strings only happen when I leave the whole 5 secs silent. I will test that.
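For reference, the "close the microphone as soon as any word comes back" change could be sketched roughly like this. It is only an illustration: `newFirstWordListener`, `reset`, and `handle` are made-up names, the plugin table is passed in as `vtt`, and it assumes `stopRecording()` ends the current session. A stub stands in for the plugin so the snippet runs anywhere.

```lua
-- Build a listener that stops recording on the first recognized text and
-- ignores anything that arrives afterwards, until reset() is called.
local function newFirstWordListener(vtt, onWord)
  local listener = { done = false }

  function listener.reset()
    listener.done = false
  end

  function listener.handle(e)
    if listener.done or not e or not e.speech then return end
    listener.done = true   -- drop later, longer strings from the same session
    vtt.stopRecording()    -- assumption: closes the microphone
    onWord(e.speech)
  end

  return listener
end

-- Simulation with a stub in place of the real plugin.
local stops = 0
local stubPlugin = { stopRecording = function() stops = stops + 1 end }
local words = {}
local l = newFirstWordListener(stubPlugin, function(w) words[#words + 1] = w end)
l.handle({ speech = "Good luck" })
l.handle({ speech = "Good luck last" })  -- ignored: already done
print(stops, #words, words[1])
```

In the app this would be wired up as `voiceToText.init(l.handle)`, with `l.reset()` called just before each `voiceToText.startRecording(nil, true, nil, nil)`.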

It looks to be an Apple problem; I have not run into these problems myself (I am actually using this plugin in a personal project and it works for me 100% of the time). My plugin just uses the native voice-to-text service of the device. I don’t know what the cause is: weak internet, old devices, an Apple bug, holding the device too close or too far away, an older iOS version, Apple server problems, etc.

Certainly not older devices, we are using an iPhone 8, iPhone 6, and an iPad mini less than 2 years old, and all are up to date. From my testing, it seems to be at least more common if not always when the microphone is open to silence for more than a few seconds, and I think even sometimes if the microphone is left open to silence even after a single word. The words might be coming in during the stop operation. The effect might also arise from the single-word paradigm I am using, which the service is not intended for. I do not think those bizarre text strings can be accounted for by the microphone distance or other user errors.

I agree they seem like Apple server errors, combined perhaps with timing loopholes. Is there a way to force using the Google service within your plugin?

^ This would require a rewrite of the plugin. Android and iOS have different code bases. My iOS plugin uses the native iOS speech recognition API, and the Android one uses the native Android speech recognition API. Have you tried using dictation on your devices? The recognition should be similar to that.

Hi, thank you for your great work.

I’m currently having fun developing a game of which this plugin is a crucial part.

I am having a bit of a difficult time understanding what the correct workflow with it is.

When exactly do I process the input and check what the user said?

I mean the following:

  1. start recording

  2. user says something

  3. recording stops on its own

  4. Now I want to check if the user said ‘hello’ or ‘goodbye’ (Example)

Do I do point 4 in the e.response == "stopped" part of the init function? Or should I detect it in the "if e.speech then" part and manually call stopRecording?

Also how many times/how often does the init function trigger during speech? Does it fire after every new word?

On Android, it will stop recording after the user stops speaking. On iOS, it will record until you hit stop. I wish there were a way to make Android behave like iOS, but it is a limitation on Android.

Edit:

On iOS e.speech will return after every new word.
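Given that behavior, one possible workflow is to check every e.speech update as it arrives, report the first match, and treat the "stopped" response as the end of the attempt. The sketch below simulates the events with plain tables so it runs without the plugin. `newWordChecker`, `targets`, and `onResult` are hypothetical names, and it assumes (as the updates reported earlier in this thread suggest) that each iOS update carries the whole text heard so far.

```lua
-- Returns a listener suitable for passing to the plugin's init call.
-- onResult(word) is called once per recording: with the matched word,
-- or with nil if recording stopped without a match.
local function newWordChecker(targets, onResult)
  local reported = false
  return function(e)
    if not e then return end
    if e.speech then
      if reported then return end          -- already answered this attempt
      local heard = string.lower(e.speech)
      for _, word in ipairs(targets) do
        if string.find(heard, word, 1, true) then
          reported = true
          onResult(word)
          return
        end
      end
    elseif e.response == "stopped" then
      if not reported then onResult(nil) end
      reported = false                     -- ready for the next recording
    end
  end
end

-- Simulated event sequence: growing partial results, then a stop.
local results = {}
local listener = newWordChecker({ "hello", "goodbye" }, function(w)
  results[#results + 1] = w or "no match"
end)
listener({ speech = "Well" })
listener({ speech = "Well hello" })
listener({ speech = "Well hello there" })
listener({ response = "stopped" })
listener({ speech = "fish" })
listener({ response = "stopped" })
print(table.concat(results, ", "))  --> hello, no match
```

On iOS the app would still need to stop the recording itself once `onResult` fires, since recording continues until stop is called.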

Hi Scott, does the plugin work on an Android TV box?

I don’t have an Android TV to verify this 100%, but I googled it and it looks like the speech library is supported on Android TV, and on Stack Overflow people have had lots of success with the speech library on Android TV.

Hi, bought this a couple of days ago wanting to incorporate it into a project.

However, when I download and attempt to test the voiceToText-Demo-master example, upon clicking Stop I get the following error:

main.lua:40: attempt to call field 'stopRecording' (a nil value)
stack traceback:
main.lua:40: in function '?'
?: in function <?:190>

Can someone please advise?
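For what it is worth, that message means the table being indexed at main.lua:40 has no stopRecording field at that point. A quick diagnostic sketch (not a fix) is to print which functions the loaded plugin table actually exposes on the platform you are testing on. `listFunctions` is a made-up helper, and the stub table below only stands in for whatever your `require` call returned.

```lua
-- List the names of all function-valued fields of a table, sorted.
local function listFunctions(lib)
  local names = {}
  for name, value in pairs(lib) do
    if type(value) == "function" then
      names[#names + 1] = name
    end
  end
  table.sort(names)
  return names
end

-- Stub standing in for the plugin table; in the demo, pass the real one.
local stub = {
  init = function() end,
  startRecording = function() end,
  version = "stub",
}
print(table.concat(listFunctions(stub), ", "))  --> init, startRecording
print(type(stub.stopRecording))                 --> nil
```

If stopRecording is missing from the real list, the call can at least be guarded with `if voiceToText.stopRecording then ... end` while the cause is tracked down.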