The basic problem is that the string library shipped with Corona works on bytes, so a multibyte character is treated as two separate values when passed as a function parameter, which makes such strings very difficult to handle. This does not only affect the string library: we also observed strange behaviour with the SQL library shipped with Corona when using umlauts, for example, and had to build massive workarounds when we tried to work with it. Our impression is that multibyte support is still not fully included in Corona, even if the document encoding is set to UTF-8. This is why we explicitly mention on the Text Candy web site that Text Candy does not support multibyte characters (yet). We're still looking for a solution that would not affect performance too much. [import]uid: 10504 topic_id: 22531 reply_id: 145061[/import]
Hi x-pressive.com.
I really want to use your Text Candy library, but I need translations in my app and am not willing to sacrifice them, so I have abandoned it for now.
I began building my own simple image text function and have been using alternatives to string.len and string.sub to avoid the problem with double-byte characters. This seems to be working well. Could this be something worth you guys looking into? I'd love to be able to go back to Text Candy.
--------------------------------------------------------------
function string.lenUnicode(s)
    local len, k = 0, 1
    while k <= #s do
        len = len + 1
        if string.byte(s, k) <= 190 then k = k + 1 else k = k + 2 end
    end
    return len
end
--------------------------------------------------------------
function string.subUnicode( s, i, j )
    local chars = {}
    local k = 1
    while k <= #s do
        local byte1 = string.byte( s, k )
        if byte1 <= 190 then
            chars[#chars+1] = string.char( byte1 )
            k = k + 1
        else
            local byte2 = string.byte( s, k + 1 )
            chars[#chars+1] = string.char( byte1, byte2 )
            k = k + 2
        end
    end
    local sub = ""
    for m = i, j do
        sub = sub..chars[m]
    end
    return sub
end
Usage:
local word = "a?r??jämnñ"
print( string.lenUnicode( word ) )
for i = 1, string.lenUnicode( word ) do
    print( string.subUnicode( word, i, i ) )
end
I found these functions here:
https://developer.coronalabs.com/forum/2010/08/08/string-greek-characters [import]uid: 62617 topic_id: 22531 reply_id: 145129[/import]
I did a quick test with German umlauts (tested on Windows here), and although lenUnicode returns the correct string length, subUnicode returns rubbish, regardless of whether the document encoding is set to UTF-8 or not.
[lua]
local word = "öäüÖÄÜ"
print( string.lenUnicode( word ) )
for i = 1, string.lenUnicode( word ) do
    print( string.subUnicode( word, i, i ) )
end
[/lua]
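One way to tell whether the rubbish comes from the functions themselves or only from how the Windows console displays the output might be to dump the raw bytes instead of printing the characters. A minimal sketch, assuming the source file really is saved as UTF-8; the helper name dumpBytes is made up for this example, and the test string is written with decimal escapes so the snippet does not depend on the file encoding:

```lua
-- Hypothetical helper: returns the bytes of s as space-separated decimal numbers.
local function dumpBytes( s )
    local out = {}
    for k = 1, #s do
        out[#out + 1] = tostring( string.byte( s, k ) )
    end
    return table.concat( out, " " )
end

-- "öä" written as explicit UTF-8 bytes: ö = 195 182, ä = 195 164
local word = "\195\182\195\164"
print( #word )              -- byte count, not character count: 4
print( dumpBytes( word ) )  -- 195 182 195 164
```

If each umlaut shows up as a pair such as 195 182, the string is valid UTF-8 and the garbage is most likely a display issue in the console. A single byte above 127 per character would instead suggest that the file was saved in a single-byte encoding.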
[import]uid: 10504 topic_id: 22531 reply_id: 145155[/import]
This is how I fixed it for myself.
Add the code below after the line if string == nil then require "string" end,
then change all sub and len calls in the library code
to utf8.sub and utf8.len.
It works with everything I have tried:
Russian, Korean, Chinese …
utf8 = {
    charbytes = function (s, i)
        -- argument defaults
        i = i or 1
        local c = string.byte(s, i)

        -- determine bytes needed for character, based on RFC 3629
        if c > 0 and c <= 127 then
            -- UTF8-1
            return 1
        elseif c >= 194 and c <= 223 then
            -- UTF8-2
            local c2 = string.byte(s, i + 1)
            return 2
        elseif c >= 224 and c <= 239 then
            -- UTF8-3
            local c2 = s:byte(i + 1)
            local c3 = s:byte(i + 2)
            return 3
        elseif c >= 240 and c <= 244 then
            -- UTF8-4
            local c2 = s:byte(i + 1)
            local c3 = s:byte(i + 2)
            local c4 = s:byte(i + 3)
            return 4
        end
    end,

    -- returns the number of characters in a UTF-8 string
    len = function (s)
        if(s~=nil) then
            local pos = 1
            local bytes = string.len(s)
            local lenX = 0

            while pos <= bytes and lenX ~= chars do
                local c = string.byte(s,pos)
                lenX = lenX + 1
                pos = pos + utf8.charbytes(s, pos)
            end

            if chars ~= nil then
                return pos - 1
            end

            return lenX
        end
        return 0
    end,

    -- functions identically to string.sub except that i and j are UTF-8 characters
    -- instead of bytes
    sub = function(s, i, j)
        j = j or -1

        if i == nil then
            return ""
        end

        local pos = 1
        local bytes = string.len(s)
        local len = 0

        -- only set l if i or j is negative
        local l = (i >= 0 and j >= 0) or utf8.len(s)
        local startChar = (i >= 0) and i or l + i + 1
        local endChar = (j >= 0) and j or l + j + 1

        -- can't have start before end!
        if startChar > endChar then
            return ""
        end

        -- byte offsets to pass to string.sub
        local startByte, endByte = 1, bytes

        while pos <= bytes do
            len = len + 1

            if len == startChar then
                startByte = pos
            end

            pos = pos + utf8.charbytes(s, pos)

            if len == endChar then
                endByte = pos - 1
                break
            end
        end

        return string.sub(s, startByte, endByte)
    end,

    -- replace UTF-8 characters based on a mapping table
    replace = function(s, mapping)
        local pos = 1
        local bytes = string.len(s)
        local charbytes
        local newstr = ""

        while pos <= bytes do
            charbytes = utf8.charbytes(s, pos)
            local c = string.sub(s, pos, pos + charbytes - 1)
            newstr = newstr .. (mapping[c] or c)
            pos = pos + charbytes
        end

        return newstr
    end
}
How do I use this code?
I have a string in Russian. What do I need to do so that it is posted correctly with network.request?
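For the network.request question, one common approach is to percent-encode the UTF-8 bytes of the string before putting it into a URL or a form-encoded request body. A minimal sketch in plain Lua, not a confirmed fix for this case: the function name urlEncode and the field name text are made up for this example, it assumes the string is already UTF-8, and the network.request call is only hinted at in a comment because the exact parameters depend on your server.

```lua
-- Hypothetical helper: percent-encodes every byte that is not an
-- unreserved URL character (letters, digits, "-", "_", ".", "~").
local function urlEncode( s )
    return ( string.gsub( s, "[^%w%-_%.~]", function( c )
        return string.format( "%%%02X", string.byte( c ) )
    end ) )
end

-- "Привет" written as explicit UTF-8 bytes, so the snippet does not
-- depend on the encoding of the source file
local text = "\208\159\209\128\208\184\208\178\208\181\209\130"
local body = "text=" .. urlEncode( text )
print( body )  -- text=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82

-- Assumption: in Corona, body could then be sent as params.body of a
-- POST via network.request( url, "POST", listener, params ), with a
-- Content-Type header of application/x-www-form-urlencoded.
```

Because the encoding works byte by byte, it does not need utf8.len or utf8.sub at all. Those only matter when the string has to be split or measured in characters.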