Comparing Unicode strings

bamazy · October 26, 2019, 10:38am

Hi all. I have a problem comparing Unicode strings, and I can’t come up with a good way to solve it. The game is in Turkish right now, and I could come up with a weird workaround for that. However, I want to make the game multilingual, so I don’t want to run into the same trouble every time I add another language.

I have a screen where I get all the distinct first characters of the sentences in an SQLite table and show them as buttons. Since the first characters are uppercase, the buttons show uppercase letters. Then I have a word that I ask the player to guess. When the player presses a button, I compare the button’s letter with the corresponding letter in the word.

For example, I have the word "akıldan". When the user presses the button ‘A’, I lowercase it and compare it with the first letter of the word. If they match, I move on to the next letter, which in this example is ‘k’. Again, if the player presses the button ‘K’, I lowercase it and compare it with ‘k’, and they match. However, when it comes to the letter ‘ı’, if the player presses the button ‘I’ (meant here not as the English capital of ‘i’ but as the uppercase form of ‘ı’), it becomes ‘i’ when lowercased and does not match ‘ı’.
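A minimal Python sketch of the matching step described above; the helper name `matches_letter` and its 0-based `position` argument are hypothetical, for illustration only. Python’s `str.lower()` applies the default, language-independent Unicode mapping, so it reproduces the same failure: the ‘I’ button lowercases to ‘i’, which never equals the dotless ‘ı’ in "akıldan".

```python
def matches_letter(word: str, position: int, pressed: str) -> bool:
    """Return True if the pressed (uppercase) button label equals the
    letter at `position` in `word` after lowercasing the label.

    str.lower() uses the default Unicode case mapping with no
    Turkish-specific tailoring, so "I" always becomes "i".
    """
    return pressed.lower() == word[position]


word = "akıldan"
print(matches_letter(word, 0, "A"))  # True:  "A".lower() == "a"
print(matches_letter(word, 1, "K"))  # True:  "K".lower() == "k"
print(matches_letter(word, 2, "I"))  # False: "I".lower() is "i", not "ı"
```

Swapping `str.lower()` for a language-aware lowercase function is the only change this loop needs; the comparison itself stays a plain string equality.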

I could create a table in SQLite that keeps the lowercase and uppercase versions of the letters for every language I add to the game, send both versions of each letter to the touch listener function, and then check the pressed letter against both its uppercase and lowercase forms.
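The table-based idea can be sketched with Python’s built-in `sqlite3` module. The schema (`letters` with `lang`, `upper_form`, `lower_form`) and the helper `letter_matches` are hypothetical; the point is that each case pair is stored explicitly, so no case conversion happens at runtime and ‘I’/‘ı’ and ‘İ’/‘i’ pair up correctly for Turkish.

```python
import sqlite3

# Turkish alphabet, uppercase and lowercase, in the same order so that
# zip() pairs I with ı and İ with i.
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
TR_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
assert len(TR_UPPER) == len(TR_LOWER) == 29

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per letter per language.
conn.execute(
    "CREATE TABLE letters ("
    " lang TEXT, upper_form TEXT, lower_form TEXT,"
    " PRIMARY KEY (lang, upper_form))"
)
conn.executemany(
    "INSERT INTO letters VALUES ('tr', ?, ?)",
    zip(TR_UPPER, TR_LOWER),
)


def letter_matches(conn, lang: str, pressed_upper: str, target: str) -> bool:
    """Look up both stored forms of the pressed button and accept the
    target letter if it equals either of them."""
    row = conn.execute(
        "SELECT upper_form, lower_form FROM letters"
        " WHERE lang = ? AND upper_form = ?",
        (lang, pressed_upper),
    ).fetchone()
    return row is not None and target in row


print(letter_matches(conn, "tr", "I", "ı"))  # True
print(letter_matches(conn, "tr", "I", "i"))  # False
print(letter_matches(conn, "tr", "İ", "i"))  # True
```

Adding a language then means inserting its rows; the matching code does not change.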

Has anybody faced such a problem and come up with a solution they could recommend, so that I don’t have to do the extra work of keeping lowercase and uppercase versions of the letters in SQLite every time I add a new language?

P.S. I hope I have managed to explain it in simple words. Sorry if not.

rob · October 26, 2019, 4:35pm

Are you using the utf8 plugin? It’s a version of string.* that works with multi-byte characters.
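Lua’s built-in `string.*` functions work on bytes, and letters such as ‘ı’ take more than one byte in UTF-8, which is why a multi-byte-aware library matters here. A small Python illustration of the difference between characters and bytes (Python is used only to show the encoding, not the plugin itself):

```python
word = "akıldan"
encoded = word.encode("utf-8")

print(len(word))     # 7 characters (code points)
print(len(encoded))  # 8 bytes: "ı" (U+0131) is two bytes in UTF-8
print(word[2])       # ı
print(encoded[2:3])  # b'\xc4': only the first byte of "ı"
```

Any byte-based indexing or slicing at position 2 would return half of ‘ı’, so comparisons against it can never succeed.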

Rob

bamazy · October 27, 2019, 9:18am

Rob, I am using the utf8 plugin, but utf8.lower() turns ‘I’ into ‘i’, which is wrong for Turkish. utf8.upper() turns ‘ı’ into ‘I’, which is correct. But utf8.upper() also turns ‘i’ into ‘I’, which is wrong; it should give ‘İ’.
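The results described here match the default, language-independent Unicode case mapping. Turkish and Azerbaijani need tailored rules for the two I pairs (Unicode’s SpecialCasing.txt lists them as language-sensitive mappings for `tr` and `az`). A minimal Python sketch, assuming a hypothetical `lang` code is passed alongside the text: apply the four overrides first, then fall back to the default `str.upper()`/`str.lower()` for every other character.

```python
# Overrides for the dotted/dotless I pairs; Turkish ("tr") and
# Azerbaijani ("az") share these rules. The language codes are an
# assumption about how the game would identify its languages.
_UPPER_OVERRIDES = {"i": "İ", "ı": "I"}
_LOWER_OVERRIDES = {"I": "ı", "İ": "i"}
_TAILORED_LANGS = {"tr", "az"}


def to_upper(text: str, lang: str) -> str:
    """Uppercase `text`, applying Turkish/Azerbaijani I rules if needed."""
    if lang in _TAILORED_LANGS:
        text = "".join(_UPPER_OVERRIDES.get(ch, ch) for ch in text)
    return text.upper()


def to_lower(text: str, lang: str) -> str:
    """Lowercase `text`, applying Turkish/Azerbaijani I rules if needed."""
    if lang in _TAILORED_LANGS:
        text = "".join(_LOWER_OVERRIDES.get(ch, ch) for ch in text)
    return text.lower()


print(to_lower("I", "tr"))        # ı
print(to_upper("i", "tr"))        # İ
print(to_upper("i", "en"))        # I  (default mapping)
print(to_lower("IHLAMUR", "tr"))  # ıhlamur
```

Without the override, Python’s default `"İ".lower()` yields ‘i’ followed by a combining dot above (U+0307), another reason to handle these letters explicitly. Libraries such as ICU offer locale-aware casing if a dependency is acceptable.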

I also noticed that if I build the list of letters from the first letters of the entries in the database, some letters might be missing when no word starts with them. Therefore, I think keeping a list of both the uppercase and lowercase versions of the alphabet in the database and using plain string comparison is a better idea. I don’t know if I’ll run into gotchas with that solution in other languages in the future, though.
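A short sketch of the gap described here, using an in-memory SQLite table with made-up sample words (the `words` table and its contents are hypothetical). SQLite’s `substr()` counts characters, not bytes, for TEXT values, so multi-byte first letters survive, but any letter that no word starts with never becomes a button.

```python
import sqlite3

# Full Turkish alphabet (29 letters), kept explicitly per language.
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("Akıldan",), ("Ihlamur",), ("İlkel",)],
)

# Buttons derived from the data: distinct first characters only.
first_letters = {
    row[0]
    for row in conn.execute("SELECT DISTINCT substr(word, 1, 1) FROM words")
}

# Letters the player could never press if buttons came from the data.
missing = [ch for ch in TR_UPPER if ch not in first_letters]

print(sorted(first_letters))  # ['A', 'I', 'İ']
print(len(missing))           # 26
```

Building the buttons from the stored alphabet (`list(TR_UPPER)` here) instead guarantees all 29 letters are available regardless of which words are in the table.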

rob · October 27, 2019, 2:44pm

I’m not sure I understand. What language are these glyphs from? The lowercase of I is i. I am unfamiliar with a letter i that keeps a dot on it when uppercased, or an I that has no dot over it when lowercased.

Rob

bamazy · October 27, 2019, 3:58pm

The game is in Turkish right now. Look at these words: ‘Ihlamur’ - ‘ıhlamur’ and ‘İlkel’ - ‘ilkel’. Azerbaijani also uses the same letters. I checked all the accented letters in both languages; the only problems are with the pairs ‘I ı’ and ‘İ i’.

royaragon5946 · December 16, 2019, 4:38pm

Thanks for the clarification