Hello.
As I think you already know, Lua's string functions break down when they are given text containing multibyte characters such as Japanese, Chinese, Greek, etc.:
string.byte()
string.char()
string.find()
string.format()
string.gmatch()
string.gsub()
string.len()
string.lower()
string.match()
string.rep()
string.reverse()
string.sub()
string.upper()
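For example, string.len counts bytes rather than characters, so a three-character Japanese string reports a length of nine (a quick sketch, assuming the source file is saved as UTF-8):
[code]
print( string.len( "日本語" ) )  -- outputs 9 (3 characters x 3 bytes each), not 3
print( string.len( "abc" ) )     -- outputs 3, as expected for ASCII
[/code]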
Ansca team, please add UTF-8-aware versions of these string functions.
Here is my example of a UTF-8 version of string.len.
How to use:
print( utf8Len( "???abc" ) ) -- outputs 8
[code]
-- returns the number of bytes used by the UTF-8 character at byte i in s
-- also doubles as a UTF-8 character validator
function utf8CharBytes(s, i)
    -- argument defaults
    i = i or 1

    local c = string.byte(s, i)

    -- determine bytes needed for character, based on RFC 3629
    if c > 0 and c <= 127 then
        -- UTF8-1
        return 1

    elseif c >= 194 and c <= 223 then
        -- UTF8-2
        local c2 = s:byte(i + 1)
        -- the continuation byte must be in the range 0x80-0xBF
        if not c2 or c2 < 128 or c2 > 191 then
            error("Invalid UTF-8 character")
        end
        return 2

    elseif c >= 224 and c <= 239 then
        -- UTF8-3
        local c2 = s:byte(i + 1)
        local c3 = s:byte(i + 2)
        if not c2 or not c3 or c2 < 128 or c2 > 191 or c3 < 128 or c3 > 191 then
            error("Invalid UTF-8 character")
        end
        return 3

    elseif c >= 240 and c <= 244 then
        -- UTF8-4
        local c2 = s:byte(i + 1)
        local c3 = s:byte(i + 2)
        local c4 = s:byte(i + 3)
        if not c2 or not c3 or not c4
           or c2 < 128 or c2 > 191 or c3 < 128 or c3 > 191 or c4 < 128 or c4 > 191 then
            error("Invalid UTF-8 character")
        end
        return 4

    else
        error("Invalid UTF-8 character")
    end
end
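
-- Quick sanity check for utf8CharBytes (my own sketch; assumes this source file is saved as UTF-8):
-- utf8CharBytes("a")   --> 1  (ASCII)
-- utf8CharBytes("é")   --> 2  (two-byte sequence 0xC3 0xA9)
-- utf8CharBytes("日")  --> 3  (three-byte sequence 0xE6 0x97 0xA5)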
-- returns the number of characters in a UTF-8 string
function utf8Len(s)
    local pos = 1
    local bytes = string.len(s)
    local len = 0

    -- walk the string one UTF-8 character at a time
    while pos <= bytes do
        len = len + 1
        pos = pos + utf8CharBytes(s, pos)
    end

    return len
end
[/code]
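The same approach extends to the other functions on the list. As a rough sketch (not tested against every edge case), here is what a UTF-8 version of string.sub could look like on top of utf8CharBytes; the name utf8Sub is my own, and negative character indexes are not handled:
[code]
-- returns the substring of s from character i through character j
-- (i and j are character indexes, not byte indexes; only positive indexes are handled here)
function utf8Sub(s, i, j)
    local pos = 1
    local bytes = string.len(s)
    local charIndex = 0
    local startByte, endByte

    while pos <= bytes do
        charIndex = charIndex + 1
        if charIndex == i then
            startByte = pos          -- first byte of character i
        end
        pos = pos + utf8CharBytes(s, pos)
        if j ~= nil and charIndex == j then
            endByte = pos - 1        -- last byte of character j
            break
        end
    end

    if startByte == nil then
        return ""                    -- i is past the end of the string
    end

    return string.sub(s, startByte, endByte or -1)
end

print( utf8Sub( "日本語abc", 2, 4 ) )  -- outputs 本語a
[/code]
Counting characters while remembering their byte offsets means the actual slicing can still be done with the plain byte-oriented string.sub at the end.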