Replacing non-English characters (CP-1250) with their simple equivalents

joedavinci · September 14, 2014, 7:06pm

Hi all,

Today I ran into a problem: strings containing non-English characters are displayed fine on a device’s screen, but when they are used in, for example, network.request, they get translated incorrectly in the URL.

This doesn’t happen in the Corona Simulator, probably because it uses the PC’s code pages and can translate the characters correctly.

As a result, my network.request calls worked fine in the simulator, but on my Android phone they failed whenever a non-English character (from code page CP-1250) was present.

I managed to fix it by replacing these characters in the request string with their simpler English equivalents. Luckily, the server API I’m using works well even if I send only English characters, and it can guess the differences.

Anyway, here’s a simple function in case it helps anyone. I know it is very easy to write, but maybe it will save someone some time.

local oldString = "Pěkný žlüťóúňký kůň se zářezěm a ďůľkem."
local newString

local function replaceStrings()
    local nonEnglishChars = {"×","µ","Ä","ä","Á","á","Â","â","Ă","ă","Ą","ą","Ć","ć","Ç","ç","Č","č","Ď","ď","Đ","đ","Ë","ë","É","é","Ě","ě","Ę","ę","Í","í","Î","î","Ĺ","ĺ","Ľ","ľ","ł","Ń","ń","Ň","ň","Ö","ö","Ó","ó","Ô","ô","Ő","ő","Ŕ","ŕ","Ř","ř","Ś","ś","Ş","ş","ß","Š","š","Ť","ť","Ţ","ţ","Ü","ü","Ú","ú","Ů","ů","Ű","ű","Ý","ý","Ż","ż","Ź","ź","Ž","ž"}
    local englishChars = {"x","u","A","a","A","a","A","a","A","a","A","a","C","c","C","c","C","c","D","d","D","d","E","e","E","e","E","e","E","e","I","i","I","i","L","l","L","l","l","N","n","N","n","O","o","O","o","O","o","O","o","R","r","R","r","S","s","S","s","ss","S","s","T","t","T","t","U","u","U","u","U","u","U","u","Y","y","Z","z","Z","z","Z","z"}

    -- Start from the original string and apply each replacement to the
    -- running result; calling gsub on oldString every time would keep
    -- only the last replacement.
    newString = oldString
    for i = 1, #nonEnglishChars do
        newString = string.gsub(newString, nonEnglishChars[i], englishChars[i])
    end
end

replaceStrings()
print(oldString) -- prints "Pěkný žlüťóúňký kůň se zářezěm a ďůľkem."
print(newString) -- prints "Pekny zlutounky kun se zarezem a dulkem."
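For anyone who wants this as a reusable function, here is a minimal sketch of a single-pass variant. It assumes the source file and the input string are both UTF-8 encoded; the names `translit` and `transliterate` are made up for illustration, and the table lists only a few of the characters from the function above.

```lua
-- Hypothetical lookup table: extend it with whatever characters you need.
local translit = {
    ["á"] = "a", ["č"] = "c", ["ď"] = "d", ["é"] = "e", ["ě"] = "e",
    ["í"] = "i", ["ň"] = "n", ["ó"] = "o", ["ř"] = "r", ["š"] = "s",
    ["ť"] = "t", ["ú"] = "u", ["ů"] = "u", ["ý"] = "y", ["ž"] = "z",
    ["ß"] = "ss",
}

-- Hypothetical helper: replaces every mapped character in one gsub pass.
local function transliterate(str)
    -- The pattern matches one multi-byte UTF-8 sequence (a lead byte
    -- followed by its continuation bytes). gsub looks each match up in
    -- the table and leaves it unchanged when there is no entry.
    return (str:gsub("[\194-\244][\128-\191]*", translit))
end

print(transliterate("kůň"))  --> kun
```

Because the string is passed in and the result returned, this can be called directly on whatever is about to go into the request.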

Good luck!

rob · September 14, 2014, 9:32pm

URLs are not UTF safe. In fact, you are supposed to use only ASCII letters, numbers and a few symbols. If the URL contains anything else, you are supposed to “URL encode” the string.

I use this function:

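function M.urlencode(str)
    if (str) then
        str = string.gsub(str, "\n", "\r\n")
        -- Percent-encode everything except alphanumerics; spaces are left
        -- alone here so the next line can turn them into "+"
        str = string.gsub(str, "([^%w ])",
            function (c) return string.format("%%%02X", string.byte(c)) end)
        str = string.gsub(str, " ", "+")
    end
    return str
end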

but I don’t know how well this handles UTF-8 characters. You can give it a try. If it doesn’t work, you may have to search for a UTF-8 friendly version of the function.
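For what it’s worth, percent-encoding works byte by byte, so a UTF-8 character simply becomes one %XX group per byte. Below is a minimal sketch of a variant that does not rely on %w (which can depend on the C locale) and instead spells out the characters to leave alone. The name `urlencodeUtf8` is made up for illustration, and the sketch assumes the input string is already UTF-8; note that it encodes a space as %20 rather than “+”.

```lua
-- Hypothetical helper, not part of Corona's API.
local function urlencodeUtf8(str)
    if not str then return str end
    -- Encode every byte except ASCII letters, digits and - _ . ~
    return (str:gsub("[^A-Za-z0-9%-_%.~]", function (c)
        return string.format("%%%02X", string.byte(c))
    end))
end

-- "kůň" written with byte escapes so the result does not depend on
-- how this file is saved: ů is C5 AF and ň is C5 88 in UTF-8.
print(urlencodeUtf8("k\197\175\197\136"))  --> k%C5%AF%C5%88
```

Whether the server decodes those bytes as UTF-8 is up to the server, so it is worth testing against the actual API.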

joedavinci · September 15, 2014, 9:53am

OK, thanks for the idea. For now my code works well enough for my purpose, but I’ll look into it as a possible future improvement.
