Can Text Candy support non-english unicode languages like chinese, japanese, korean, thai?

Can Text Candy support non-english unicode languages like chinese, japanese, korean, thai? Anyone successfully done this? thanks [import]uid: 18718 topic_id: 22531 reply_id: 322531[/import]

I’ve checked with the author of TC… Multibyte characters are not supported. [import]uid: 5942 topic_id: 22531 reply_id: 89851[/import]

Thanks for checking with them [import]uid: 18718 topic_id: 22531 reply_id: 89855[/import]

It does… if you build a bitmap charset containing all the characters of that language. [import]uid: 12704 topic_id: 22531 reply_id: 89878[/import]

Both are correct. While TC does not provide direct support for multibyte characters with vector fonts yet, you can use them with bitmap fonts by placing any char you like onto your charset texture and mapping it to a “normal” char (see the charOrder string passed when loading a new charset).
[import]uid: 10504 topic_id: 22531 reply_id: 91705[/import]
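
To illustrate what such a mapping can look like in practice, here is a minimal sketch in plain Lua. It is not Text Candy API code; the placeholder characters and the helper name are made up for illustration. The idea is that each non-ASCII glyph painted onto the charset texture is assigned a free “normal” character, and strings are remapped before being handed to the bitmap-font renderer.

[lua]
-- Hypothetical mapping: "ä" was painted into the texture slot of "{",
-- "ö" into the slot of "}", "ü" into the slot of "|" (which slots you
-- actually use depends on the char order string of your charset).
local glyphMap = { ["ä"] = "{", ["ö"] = "}", ["ü"] = "|" }

-- Remap a UTF-8 string before passing it to the bitmap-font renderer.
local function remapForCharset( text )
	for utf8Char, asciiChar in pairs( glyphMap ) do
		-- multibyte UTF-8 sequences contain no Lua pattern magic
		-- characters, so a plain gsub works as a literal replacement
		text = text:gsub( utf8Char, asciiChar )
	end
	return text
end

print( remapForCharset( "grün" ) )  --> "gr|n" (assuming the file is saved as UTF-8)
[/lua]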

Is it inherently difficult to support multi-byte characters, or have not enough people asked for it? I’d love to use it directly with a TTF that I download, without having to tediously convert it into a bitmap first. [import]uid: 124979 topic_id: 22531 reply_id: 93388[/import]

I’d like for this to be possible too. I spent a while setting up Text Candy with a bitmap font texture in my game, only to find out that it doesn’t work with my localisations. Languages such as Japanese and Chinese need the document encoding to be set to Unicode (UTF-8) to handle the characters, but this causes errors with Text Candy.
I can’t use the suggested solution of mapping non-“normal” characters to “normal” characters because I need both, and it would get very messy. Has no one successfully used Text Candy across multiple languages?
x-pressive.com, please advise. [import]uid: 62617 topic_id: 22531 reply_id: 145053[/import]

The basic problem is that the string library shipped with Corona treats multibyte characters as two different values when passed as function parameters, which makes them very difficult to handle. This doesn’t only affect the string library; we also observed strange behaviour with the SQL library shipped with Corona when using umlauts, for example - we had to do massive workarounds when we tried to work with it. Our own impression is that multibyte support is still not fully included with Corona, even if the document encoding is set to UTF-8. This is why we explicitly mention on the Text Candy web site that Text Candy does not support multibyte characters (yet). We’re still looking for a solution that would not affect performance too much. [import]uid: 10504 topic_id: 22531 reply_id: 145061[/import]
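
To illustrate the problem described here (this is standard Lua 5.1 behaviour, which Corona uses, rather than anything Text Candy specific): the built-in string functions count bytes, not characters, so a two-byte UTF-8 character such as “ä” is reported as length 2 and can be split in half by string.sub.

[lua]
-- Plain Lua 5.1 / Corona: string functions operate on bytes.
local s = "ä"                      -- two bytes in UTF-8 (0xC3 0xA4)
print( string.len( s ) )           --> 2, not 1
print( string.byte( s, 1, -1 ) )   --> 195  164 (two separate values)
print( string.sub( s, 1, 1 ) )     --> only the first byte: an invalid, garbled character
[/lua]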

Hi x-pressive.com.
I really want to use your Text Candy library, but I need translations in my app more than I’m willing to give up, so I have abandoned it for now.
I began building my own simple image text function and have been using alternatives to string.len and string.sub to avoid the problem of double-counted characters. This seems to be working well. Could this be something worth you guys looking into? I’d love to be able to go back to Text Candy.

--------------------------------------------------------------
-- Returns the number of characters in a string, treating any
-- byte above 190 as the start of a two-byte character.
function string.lenUnicode( s )
	local len, k = 0, 1
	while k <= #s do
		len = len + 1
		if string.byte( s, k ) <= 190 then k = k + 1 else k = k + 2 end
	end
	return len
end

--------------------------------------------------------------
-- Like string.sub, but i and j are character indices instead of
-- byte indices (using the same two-byte assumption as above).
function string.subUnicode( s, i, j )
	local chars = {}
	local k = 1
	while k <= #s do
		local byte1 = string.byte( s, k )
		if byte1 <= 190 then
			chars[#chars + 1] = string.char( byte1 )
			k = k + 1
		else
			local byte2 = string.byte( s, k + 1 )
			chars[#chars + 1] = string.char( byte1, byte2 )
			k = k + 2
		end
	end
	local sub = ""
	for m = i, j do
		sub = sub .. chars[m]
	end
	return sub
end

Usage:

local word = "a?r??jämnñ"  
print( string.lenUnicode( word ) )  
   
for i = 1, string.lenUnicode( word ) do  
 print( string.subUnicode( word, i, i ) )  
end  

I found these functions here:
https://developer.coronalabs.com/forum/2010/08/08/string-greek-characters [import]uid: 62617 topic_id: 22531 reply_id: 145129[/import]
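
One caveat about this two-byte approach: it assumes every non-ASCII character occupies exactly two bytes, which holds for most Latin accented characters but not for Chinese, Japanese, Korean or Thai text, where each character takes three bytes in UTF-8. A character count that handles any UTF-8 input can be obtained by skipping continuation bytes (0x80–0xBF) instead; a minimal sketch in plain Lua:

[lua]
-- Count UTF-8 characters by counting only lead bytes, i.e. every byte
-- that is NOT a continuation byte (0x80-0xBF / 128-191).
local function utf8Len( s )
	local count = 0
	for _ in s:gmatch( "[^\128-\191]" ) do
		count = count + 1
	end
	return count
end

print( utf8Len( "日本語ÖÄÜ" ) )  --> 6 (assuming the source file is saved as UTF-8)
[/lua]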

Did a quick test with German umlauts (tested on Windows here) and although lenUnicode returns the correct string length, subUnicode returns rubbish, regardless of whether the document encoding is set to UTF-8 or not.

[lua]
local word = "öäüÖÄÜ"
print( string.lenUnicode( word ) )

for i = 1, string.lenUnicode( word ) do
print( string.subUnicode( word, i, i ) )
end
[/lua]
[import]uid: 10504 topic_id: 22531 reply_id: 145155[/import]

This is how I fixed it for me:

Add the code below after “if string == nil then require “string” end”, then, in the library code, change all sub and len calls to utf8.sub and utf8.len.

It works with everything I have tried: Russian, Korean, Chinese…

[lua]
utf8 = {

	-- determine how many bytes the character starting at byte i needs,
	-- based on RFC 3629
	charbytes = function( s, i )
		i = i or 1
		local c = string.byte( s, i )
		if c > 0 and c <= 127 then        -- UTF8-1
			return 1
		elseif c >= 194 and c <= 223 then -- UTF8-2
			return 2
		elseif c >= 224 and c <= 239 then -- UTF8-3
			return 3
		elseif c >= 240 and c <= 244 then -- UTF8-4
			return 4
		end
	end,

	-- returns the number of characters in a UTF-8 string; if 'chars' is
	-- given, returns the byte position after that many characters instead
	len = function( s, chars )
		if s ~= nil then
			local pos = 1
			local bytes = string.len( s )
			local lenX = 0
			while pos <= bytes and lenX ~= chars do
				lenX = lenX + 1
				pos = pos + utf8.charbytes( s, pos )
			end
			if chars ~= nil then
				return pos - 1
			end
			return lenX
		end
		return 0
	end,

	-- functions identically to string.sub except that i and j are UTF-8
	-- characters instead of bytes
	sub = function( s, i, j )
		j = j or -1
		if i == nil then
			return ""
		end
		local pos = 1
		local bytes = string.len( s )
		local len = 0
		-- only compute the full length if i or j is negative
		local l = (i >= 0 and j >= 0) or utf8.len( s )
		local startChar = (i >= 0) and i or l + i + 1
		local endChar = (j >= 0) and j or l + j + 1
		-- can't have start before end!
		if startChar > endChar then
			return ""
		end
		-- byte offsets to pass to string.sub
		local startByte, endByte = 1, bytes
		while pos <= bytes do
			len = len + 1
			if len == startChar then
				startByte = pos
			end
			pos = pos + utf8.charbytes( s, pos )
			if len == endChar then
				endByte = pos - 1
				break
			end
		end
		return string.sub( s, startByte, endByte )
	end,

	-- replace UTF-8 characters based on a mapping table
	replace = function( s, mapping )
		local pos = 1
		local bytes = string.len( s )
		local charbytes
		local newstr = ""
		while pos <= bytes do
			charbytes = utf8.charbytes( s, pos )
			local c = string.sub( s, pos, pos + charbytes - 1 )
			newstr = newstr .. (mapping[c] or c)
			pos = pos + charbytes
		end
		return newstr
	end
}
[/lua]
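
A quick usage sketch of the table above (the test string is arbitrary; any UTF-8 encoded source file should behave the same way):

[lua]
local word = "日本語とöäü"           -- mixed 3-byte and 2-byte characters

print( utf8.len( word ) )            --> 7
print( utf8.sub( word, 1, 3 ) )      --> 日本語
print( utf8.sub( word, -3 ) )        --> öäü
print( utf8.replace( word, { ["ö"] = "o", ["ä"] = "a", ["ü"] = "u" } ) )  --> 日本語とoau
[/lua]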
