Wrong string length when using Spanish-language characters

fferraro67 · October 5, 2014, 6:39pm

Example:

A = "Ñandú"
Numberofchars = string.len(A)

The answer is 7, but it should be 5: the letter Ñ is one character, not two. The same happens with accents; each accented letter is counted as two characters.

How do I fix this?

RedBeach · October 5, 2014, 7:01pm

You can write a custom length function that subtracts 1 from the string.len() result for each accented letter.
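A minimal sketch of that idea, assuming the string is UTF-8 and that the only multi-byte characters in it are the Spanish letters in the table below, each of which takes two bytes. The names ACCENTED and spanishLength are hypothetical, chosen for illustration:

```lua
-- Hypothetical list of the two-byte Spanish letters to correct for.
-- The source file must be saved as UTF-8.
local ACCENTED = {
    "á", "é", "í", "ó", "ú", "ü", "ñ",
    "Á", "É", "Í", "Ó", "Ú", "Ü", "Ñ",
}

-- Byte length minus one for every accented letter found in the string.
local function spanishLength(s)
    local n = string.len(s)
    for _, letter in ipairs(ACCENTED) do
        -- gsub returns the number of matches as its second result
        local _, hits = string.gsub(s, letter, "")
        n = n - hits
    end
    return n
end

print(string.len("Ñandú"))     -- 7 (bytes)
print(spanishLength("Ñandú"))  -- 5 (characters)
```

Any multi-byte character that is missing from the table is still counted by its byte length, so this only works for a known, fixed alphabet.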

rob · October 5, 2014, 9:56pm

Adding to this, string.len() returns the number of bytes in the string, not the number of characters. UTF-8 characters can be several bytes long. You might want to look here:

http://stackoverflow.com/questions/10097941/print-number-of-characters-in-utf-8-string

and here:

http://www.math.ntnu.no/~stacey/documents/Codea/Library/utf8.lua

For some assistance with this.
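As a rough sketch of the byte-counting idea (not the code from the linked utf8.lua; the name utf8Length is hypothetical): in UTF-8, every byte after the first byte of a character falls in the range 128-191, so counting the bytes outside that range gives the number of characters. This assumes the string is valid UTF-8.

```lua
-- Count UTF-8 characters by counting every byte that is NOT a
-- continuation byte (continuation bytes are in the range 128-191).
-- Assumes the input is valid UTF-8.
local function utf8Length(s)
    -- gsub returns the number of matches as its second result
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

local A = "Ñandú"          -- source file must be saved as UTF-8
print(string.len(A))       -- 7 (bytes)
print(utf8Length(A))       -- 5 (characters)
```

Note that this counts code points, so a letter stored as a base letter plus a separate combining accent would still count as two.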

Rob

fferraro67 · October 5, 2014, 11:46pm

The letters á, é, í, ó, ú, Á, É, Í, Ó, Ú, Ü, ñ and Ñ have this problem, and they are not the only ones. Thanks Rob, excellent solution. Where should the module be installed? In the same place as main.lua, or somewhere else?

rob · October 6, 2014, 2:18am

If it’s just a .lua file, drop it in with your main.lua. You could put it in a folder if you wish, but when you require the module, you would need to adjust the require statement to reflect it’s in a folder.

Rob

fferraro67 · October 7, 2014, 10:25pm

I tried to find another solution to the problem and came up with this:

There are many symbols in the ASCII table that are rarely used when writing words. They are mostly auxiliary symbols, and they do not appear in proper names, animal names, geographic locations, historical terms, etc.

No name uses @ (arroba, the at sign). It is used in email addresses and on Twitter, and it was once a unit of weight, but little else. Nobody is called “Canci@n”.

So I decided to pick some of these symbols to stand in for the special characters that throw off the string length.

The idea is to swap them in before the string goes through the API, and to restore the original character at the end.

I have one advantage here: the names are stored in a text file beforehand, so they can be written with the allowed symbols from the start. For example, the @ symbol stands for “ó” and “Ó”. The replacement table is:

“~” stands for “ñ” and “Ñ”

“#” stands for “á” and “Á”

“!” stands for “é” and “É”

“?” stands for “í” and “Í”

“@” stands for “ó” and “Ó”

“&” stands for “ú” and “Ú”

“_” stands for “ü” and “Ü”

Example: Ñandú

It has two special characters, “Ñ” and “ú”, so its length is measured as 7 when in fact it is 5.

The trick: replace “Ñandú” with “~and&”. I do this replacement in a word processor, which is very simple. How do I tell capitals from lowercase letters? In the game, the first letter of a word is always a capital, so the position in the word determines whether “~” becomes Ñ or ñ. The same goes for the accented vowels.

It looks very sloppy, but it works very well. I have not added any library, and it lets me use the characters of the Spanish language.
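A possible sketch of the restore step, assuming (as described above) that only the first letter of a word is ever a capital. The names PLACEHOLDERS and decode are hypothetical, chosen for illustration:

```lua
-- Placeholder symbol -> { lowercase letter, uppercase letter }.
-- The source file must be saved as UTF-8.
local PLACEHOLDERS = {
    ["~"] = { "ñ", "Ñ" },
    ["#"] = { "á", "Á" },
    ["!"] = { "é", "É" },
    ["?"] = { "í", "Í" },
    ["@"] = { "ó", "Ó" },
    ["&"] = { "ú", "Ú" },
    ["_"] = { "ü", "Ü" },
}

-- Turn a placeholder word back into its real spelling.
-- Assumption: a placeholder in position 1 is a capital, any other is lowercase.
local function decode(encoded)
    local out = {}
    for i = 1, #encoded do
        local ch = encoded:sub(i, i)
        local pair = PLACEHOLDERS[ch]
        if pair then
            out[#out + 1] = (i == 1) and pair[2] or pair[1]
        else
            out[#out + 1] = ch
        end
    end
    return table.concat(out)
end

local encoded = "~and&"
print(#encoded)          -- 5, one byte per letter while encoded
print(decode(encoded))   -- Ñandú
```

While a word is in its encoded form, string.len() and string.sub() work per letter; the real spelling is only needed when the word is displayed.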

HyperBeard_Games · October 17, 2014, 2:12am

This is what we did in our game Palabraz:

for c in theWord:gmatch('.[\128-\191]*') do table.insert(wordArray, c) end

Then #wordArray tells you exactly how many characters the word has. Hope it helps you.
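For reference, a self-contained version of that loop, assuming theWord holds valid UTF-8 text. The sample word is taken from the first post:

```lua
local theWord = "Ñandú"    -- source file must be saved as UTF-8
local wordArray = {}

-- '.' matches the first byte of a character;
-- '[\128-\191]*' then swallows any continuation bytes that follow it.
for c in theWord:gmatch(".[\128-\191]*") do
    table.insert(wordArray, c)
end

print(#wordArray)                      -- 5
print(table.concat(wordArray, " "))    -- Ñ a n d ú
```

Because each table entry is a whole character, wordArray[i] can also be used in place of string.sub() to pick out individual letters.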
