That was fun.
I’ve thought about how to handle this before but never had a project that needed it. This topic sparked my interest, so I wrote a small function that should work.
    local UTF8ToCharArray = function(str)
        local charArray = {};
        local iStart = 0;
        local strLen = str:len();

        -- value of bit b, counting from 1 (bit(8) == 128)
        local function bit(b)
            return 2 ^ (b - 1);
        end

        -- true if the bit with value b is set in w
        local function hasbit(w, b)
            return w % (b + b) >= b;
        end

        -- flush a pending multi-byte character that ends at i - 1
        local checkMultiByte = function(i)
            if (iStart ~= 0) then
                charArray[#charArray + 1] = str:sub(iStart, i - 1);
                iStart = 0;
            end
        end

        for i = 1, strLen do
            local b = str:byte(i);
            local multiStart = hasbit(b, bit(7)) and hasbit(b, bit(8));
            local multiTrail = not hasbit(b, bit(7)) and hasbit(b, bit(8));

            if (multiStart) then
                checkMultiByte(i);
                iStart = i;
            elseif (not multiTrail) then
                checkMultiByte(i);
                charArray[#charArray + 1] = str:sub(i, i);
            end
        end

        -- process if last character is multi-byte
        checkMultiByte(strLen + 1);
        return charArray;
    end

    local arr = UTF8ToCharArray("Äpplet är i trädet ÅÄÖåäö");
    -- ipairs walks the array part in index order
    for k, v in ipairs(arr) do print(k, v); end
Counting bits from 1, a multi-byte character starts with a byte that has bits 7 and 8 both set (11xxxxxx). Each trailing byte has bit 7 clear and bit 8 set (10xxxxxx).
My function checks for these bits and acts accordingly.
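To make the bit rule concrete, here is a minimal sketch that sorts a byte into one of three kinds by comparing its value instead of testing bits. The helper name `byteKind` and its labels are made up for this illustration and are not part of the function above. The test string is written with decimal escapes so it does not depend on how the script file is saved.

```lua
-- Classify a byte value (0-255) by its two highest bits.
-- byteKind is a hypothetical helper used only for this example.
local function byteKind(b)
    if b < 128 then         -- 0xxxxxxx: bit 8 clear, plain single-byte character
        return "single"
    elseif b < 192 then     -- 10xxxxxx: bit 8 set, bit 7 clear
        return "trail"
    else                    -- 11xxxxxx: bits 7 and 8 both set
        return "lead"
    end
end

local s = "\195\132p"       -- the raw UTF-8 bytes of "Äp"
local kinds = {}
for i = 1, #s do
    kinds[i] = byteKind(s:byte(i))
end
print(table.concat(kinds, " "))  -- lead trail single
```

The comparisons work because every value from 128 to 191 has the bit pattern 10xxxxxx and every value from 192 up has 11xxxxxx, so they should give the same answer as the two `hasbit` checks.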
Give UTF8ToCharArray a whirl and see if it works for you.
I’ve done some basic testing, and it works well even for Chinese, Japanese and Korean text :wub: