Looking for Speech Recognition feature for Android

Yeah, it's always some new whiz-bang feature. My client knows there is decent Speech Recognition on their Android phones and wants apps that use that feature. Since I don't see it in any Audio API or Tools/Code, I have to think it doesn't exist yet.

What does now exist is this:
Speech-Enable Your Android®: iSpeech Text to Speech (TTS) and Speech Recognition (ASR) SDK for Android lets you speech-enable any Android app.
https://www.ispeech.org/developers/android

Now is there a way for us Corona devs to integrate this Java module and control it in some manner from Corona code?
Thanks

[import]uid: 6114 topic_id: 30721 reply_id: 330721[/import]

You can do this with the Enterprise version of Corona. With this version, Corona is just a library (i.e., a JAR file) that you include in your own Android project. This means that you'll be developing your app with the Android SDK and Eclipse instead of the Corona Simulator, the same way other Android developers work. It also gives you the freedom to add whatever libraries you want to your app and to exclude Corona's default libraries (such as OpenFeint), and our Corona Java classes give you the ability to extend the Lua APIs. [import]uid: 32256 topic_id: 30721 reply_id: 123063[/import]

Not a great solution but assuming you have a decent connection and assuming Google doesn’t change things, you could use their text to speech service.

http://translate.google.com/translate_tts?tl=en&q=This+is+a+test+using+Google+text+to+Speech.

Send no more than 100 characters in the q parameter. If your text is longer, split it into chunks of at most 100 characters, make a separate call to the service for each chunk, save the downloads as sequentially numbered wav files, and play them in order.

Ugly but it works.
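
The splitting and encoding steps described above can be sketched on their own in plain Lua, with no Corona calls. This is only a minimal sketch: the helper names splitText and urlencode, and the short demo limit of 9 characters, are made up for illustration (the real limit mentioned above is 100), and it assumes the service URL quoted above.

```lua
-- Split text into chunks of at most maxLength characters, breaking only at
-- whitespace so that words stay whole. A single word longer than maxLength
-- is kept intact and ends up in an over-long chunk of its own.
local function splitText( text, maxLength )
    local chunks = {}
    local current = ""
    for word in string.gmatch( text, "%S+" ) do
        if current == "" then
            current = word
        elseif #current + 1 + #word > maxLength then
            chunks[#chunks + 1] = current
            current = word
        else
            current = current .. " " .. word
        end
    end
    if current ~= "" then
        chunks[#chunks + 1] = current
    end
    return chunks
end

-- Percent-encode everything except letters, digits and spaces,
-- then turn the remaining spaces into "+".
local function urlencode( str )
    str = string.gsub( str, "([^%w ])", function( c )
        return string.format( "%%%02X", string.byte( c ) )
    end )
    str = string.gsub( str, " ", "+" )
    return str
end

-- Build one request URL per chunk (hypothetical demo text and limit).
local baseUrl = "http://translate.google.com/translate_tts?tl=en&q="
for i, chunk in ipairs( splitText( "One two three four five", 9 ) ) do
    print( i, baseUrl .. urlencode( chunk ) )
end
```

Each URL would then be downloaded to its own sequentially numbered file, which is what the class below does with network.download.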


-- sample code below

local speechinstance = require("classspeech").new({
    speechtext = "This is a test of the google text to speech. Adding some text to go over 100 characters. Monday, Tuesday, Wednesday. One Two three. Unfortunately there is a slight pause if the audio files are split.",
    filename = "myuniquefilename",
    filedir = system.TemporaryDirectory,
    audioformat = "wav"
})
speechinstance:PlayAudio()



-- classspeech.lua class


local speech = {}
local speech_mt = { __index = speech } -- metatable


-- PUBLIC FUNCTIONS

-- Constructor

function speech.new( parms )

    local newspeech = {
        speechtext = parms.speechtext,   -- The actual text to convert to speech
        filename = parms.filename,       -- Base filename (a seq id will automatically be appended for you).
                                         -- Make sure it uniquely identifies this "object": the audio
                                         -- files are not deleted and are reused if requested again.
        filedir = parms.filedir,         -- Directory, typically system.TemporaryDirectory
        audioformat = parms.audioformat,
    }
    newspeech.play = true

    return setmetatable( newspeech, speech_mt )
end

function speech:PlayAudio()
    --===============================
    --== Split the text if needed,
    --== keeping words together
    --===============================

    local t = {}
    local sep = " "
    local maxlength = 100
    local i = 1

    t[i] = ""
    local strremovestuff = string.gsub( self.speechtext, "\n", " " )
    strremovestuff = string.gsub( strremovestuff, "\r", " " )
    for str in string.gmatch( strremovestuff, "([^" .. sep .. "]+)" ) do
        -- Start a new chunk only if the current one already holds text
        if t[i] ~= "" and string.len(t[i]) + string.len(str) + string.len(sep) > maxlength then
            i = i + 1
            t[i] = str
        else
            if t[i] ~= "" then t[i] = t[i] .. sep end
            t[i] = t[i] .. str
        end
    end

    --======================================
    --== Cleanup
    --======================================
    self.Cleanup = function()
        self.play = false
        if self.audiostreamchannel then
            audio.stop( self.audiostreamchannel )
            self.audiostreamchannel = nil
        end
        if self.audiostream then
            audio.dispose( self.audiostream )
            self.audiostream = nil
        end
    end

    --======================================
    --== Function to see if file exists
    --======================================
    self.fileExists = function( srcName, srcPath )
        local results = false -- assume the file does not exist
        local path = system.pathForFile( srcName, srcPath )
        if path then -- a path could be built
            local file = io.open( path, "r" )
            if file then -- nil if no file found
                io.close( file )
                results = true
            end
        end
        return results
    end

    --======================================
    --== urlencode
    --======================================
    self.urlencode = function( str )
        if str then
            str = string.gsub( str, "\n", "\r\n" )
            -- Leave spaces alone here so the next line can turn them into "+"
            str = string.gsub( str, "([^%w ])", function( c ) return string.format( "%%%02X", string.byte( c ) ) end )
            str = string.gsub( str, " ", "+" )
        end
        return str
    end

    --======================================
    --== Function to play the audio
    --== and when finished launch the next one if it exists
    --======================================
    self.PlaySpeechFile = function( event )
        if self.play == true then
            if event.isError then
                print( "Network error - download failed" )
                self.Cleanup()
            else
                if event.pos < #t then
                    event.nextfile = self.filename .. "seq" .. ( event.pos + 1 ) .. "." .. self.audioformat
                else
                    event.nextfile = nil
                end
                --======================
                --== if on a slow connection and the next file is unavailable we basically stop
                --======================
                if event.fn and self.fileExists( event.fn, event.fp ) == true then
                    self.audiostream = audio.loadStream( event.fn, event.fp )
                    self.audiostreamchannel = audio.play( self.audiostream, {
                        onComplete = function()
                            self.PlaySpeechFile({ isError = event.isError, fn = event.nextfile, fp = event.fp, pos = event.pos + 1 })
                        end
                    })
                else
                    self.Cleanup()
                end
            end
        end
    end

    --======================
    --== Loop through the split text
    --======================
    for i = 1, #t do
        local finalfn = self.filename .. "seq" .. i .. "." .. self.audioformat
        if self.fileExists( finalfn, self.filedir ) == false then
            network.download( "http://translate.google.com/translate_tts?tl=en&q=" .. self.urlencode( t[i] ),
                "GET",
                -- Only the first download starts playback; later files are picked up in sequence
                function( event ) if i == 1 then self.PlaySpeechFile({ isError = event.isError, fn = finalfn, fp = self.filedir, pos = i }) end end,
                finalfn,
                self.filedir
            )
        else
            --=======================
            --== if the audio file already exists, just play it
            --=======================
            if i == 1 then self.PlaySpeechFile({ isError = false, fn = finalfn, fp = self.filedir, pos = i }) end
        end
    end

end

function speech:removeSelf()
    -- Cleanup is only defined once PlayAudio has been called
    if self.Cleanup then self.Cleanup() end
end

return speech
[import]uid: 79594 topic_id: 30721 reply_id: 123115[/import]

Oh, I did not know that. Well, that solves the stated problem, doesn't it? There are likely some other trick modules on the Java side to take advantage of, though I really don't want to have to learn and use Java.

Thanks for the advice, Joshua.
[import]uid: 6114 topic_id: 30721 reply_id: 123067[/import]

I don't need text to speech; I need the reverse, as the subject stated. My client wants to speak data into the phone. Wouldn't that be great?

I need to find a way to hook into the existing Speech Recognition that Google provides on my Android phone.
[import]uid: 6114 topic_id: 30721 reply_id: 123131[/import]

Automatic speech recognition software normally reduces the need for manual transcription. However, the technology is not yet mature enough to produce accurate transcripts for non-American accents, for people who speak quickly, or for audio files with multiple speakers. If there is more than one voice, it is almost impossible to get a good transcript. Add any background noise or a weak recording and you can pretty much forget it.

So, after automatic transcription, you have to massage the transcript into its final form. Some of the steps are:

Correct incorrectly transcribed words/phrases.
Correct punctuation/sentence breaks.
Define paragraph breaks.

For automatic transcription, you can also refer to

http://audacity.sourceforge.net

In this scenario, it is better to go for manual transcription. For accurate and cost-effective manual transcription, please refer to:
Speech Transcription [import]uid: 177922 topic_id: 30721 reply_id: 123644[/import]
