Hello.
As I think you already know, Lua's string functions break down when they are given text containing multibyte characters such as Japanese, Chinese, Greek, etc.:
string.byte()
string.char()
string.find()
string.format()
string.gmatch()
string.gsub()
string.len()
string.lower()
string.match()
string.rep()
string.reverse()
string.sub()
string.upper()
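For example, string.len counts bytes rather than characters, so a three-character Japanese string reports a length of nine (a quick sketch, assuming the source file is saved as UTF-8):
[code]
print( string.len( "日本語" ) )  -- outputs 9 (3 characters x 3 bytes each), not 3
print( string.len( "abc" ) )     -- outputs 3, as expected for ASCII
[/code]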
Ansca team, please add UTF-8-aware versions of these string functions.
Here is my example of a UTF-8 version of string.len.
How to use:
print( utf8Len( "???abc" ) ) -- outputs 8
[code]
-- returns the number of bytes used by the UTF-8 character at byte i in s
-- also doubles as a UTF-8 character validator
function utf8CharBytes(s, i)
    -- argument defaults
    i = i or 1

    local c = string.byte(s, i)

    -- determine bytes needed for character, based on RFC 3629
    if c > 0 and c <= 127 then
        -- UTF8-1
        return 1

    elseif c >= 194 and c <= 223 then
        -- UTF8-2
        local c2 = s:byte(i + 1)
        -- the continuation byte must be in the range 0x80-0xBF
        if not c2 or c2 < 128 or c2 > 191 then
            error("Invalid UTF-8 character")
        end
        return 2

    elseif c >= 224 and c <= 239 then
        -- UTF8-3
        local c2 = s:byte(i + 1)
        local c3 = s:byte(i + 2)
        if not c2 or not c3 or c2 < 128 or c2 > 191 or c3 < 128 or c3 > 191 then
            error("Invalid UTF-8 character")
        end
        return 3

    elseif c >= 240 and c <= 244 then
        -- UTF8-4
        local c2 = s:byte(i + 1)
        local c3 = s:byte(i + 2)
        local c4 = s:byte(i + 3)
        if not c2 or not c3 or not c4
           or c2 < 128 or c2 > 191 or c3 < 128 or c3 > 191 or c4 < 128 or c4 > 191 then
            error("Invalid UTF-8 character")
        end
        return 4

    else
        error("Invalid UTF-8 character")
    end
end
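
-- Quick sanity check for utf8CharBytes (my own sketch; assumes this source file is saved as UTF-8):
-- utf8CharBytes("a")   --> 1  (ASCII)
-- utf8CharBytes("é")   --> 2  (two-byte sequence 0xC3 0xA9)
-- utf8CharBytes("日")  --> 3  (three-byte sequence 0xE6 0x97 0xA5)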
-- returns the number of characters in a UTF-8 string
function utf8Len(s)
    local pos = 1
    local bytes = string.len(s)
    local len = 0

    -- walk the string one UTF-8 character at a time
    while pos <= bytes do
        len = len + 1
        pos = pos + utf8CharBytes(s, pos)
    end

    return len
end
[/code]
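The same approach extends to the other functions on the list. As a rough sketch (not tested against every edge case), here is what a UTF-8 version of string.sub could look like on top of utf8CharBytes; the name utf8Sub is my own, and negative character indexes are not handled:
[code]
-- returns the substring of s from character i through character j
-- (i and j are character indexes, not byte indexes; only positive indexes are handled here)
function utf8Sub(s, i, j)
    local pos = 1
    local bytes = string.len(s)
    local charIndex = 0
    local startByte, endByte

    while pos <= bytes do
        charIndex = charIndex + 1
        if charIndex == i then
            startByte = pos          -- first byte of character i
        end
        pos = pos + utf8CharBytes(s, pos)
        if j ~= nil and charIndex == j then
            endByte = pos - 1        -- last byte of character j
            break
        end
    end

    if startByte == nil then
        return ""                    -- i is past the end of the string
    end

    return string.sub(s, startByte, endByte or -1)
end

print( utf8Sub( "日本語abc", 2, 4 ) )  -- outputs 本語a
[/code]
Counting characters while remembering their byte offsets means the actual slicing can still be done with the plain byte-oriented string.sub at the end.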