Problem displaying international (Norwegian) characters

In Norway we have three special characters: Æ, Ø & Å

But getting them printed on the screen with Corona turns out to be problematic. I have the following super-simple main.lua file:

display.newText("Æ Ø Å", display.contentWidth/2, display.contentHeight/2, native.systemFontBold, 24); 

The result is three diamonds, each with a question mark in the center.

I’m using Notepad++ as the editor and the settings->Preferences->New document->Encoding is “UTF-8 without BOM” (and Apply to opened ANSI files checked).

The font is the native.systemFont(Bold) because I guess that is the only font to be guaranteed to exist on any device?

What am I doing wrong?

From the documentation, it seems that international characters (i.e. pretty much anything outside standard ASCII) need to be encoded in UTF-8. When you type characters into your text editor, whatever it is, the file may or may not be saved as UTF-8 (probably not!); it may instead be saved in an 8-bit extended-ASCII encoding.

The problem goes back to 8-bit data, teletypes and so on. Characters were originally defined using a 7-bit code, which allowed basic English characters and not much else: no French accents, no Norwegian characters.

So it was extended, but not consistently: there are different ways of representing Norwegian characters depending on the character encoding used. Initially the values 128-255 were used to store the ‘foreign’ characters, but this of course limits them to a maximum of 128, which is no use for Chinese or Japanese writing systems. So there are different ways of encoding those characters, and the reason you get diamonds is that Corona and your text editor are using different encodings.
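To make the mismatch concrete, here is a small plain-Lua sketch that dumps the bytes of Æ Ø Å in two encodings. It assumes the editor's 8-bit encoding is ISO-8859-1 (Latin-1), which is only a guess at what an "ANSI" file contains, and `dumpBytes` is a made-up helper name, not part of Corona.

```lua
-- Latin-1 (ISO-8859-1) stores each Norwegian letter in one byte;
-- UTF-8 stores the same letters in two bytes each.
local latin1 = "\198\216\197"              -- Æ Ø Å as Latin-1 bytes (assumed encoding)
local utf8   = "\195\134\195\152\195\133"  -- Æ Ø Å as UTF-8 bytes

-- Hypothetical helper: render a string as space-separated hex bytes.
local function dumpBytes(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

print(dumpBytes(latin1))  -- C6 D8 C5
print(dumpBytes(utf8))    -- C3 86 C3 98 C3 85
```

A lone byte such as C6 is not a complete UTF-8 sequence, which would explain each letter being drawn as a replacement diamond.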

Try these functions:

http://developer.coronalabs.com/code/utf-8-encode-and-decode

What they do is convert extended ASCII (8-bit) strings to UTF-8 strings, which is what Corona wants. They may not work, in which case you will need to look up the byte sequences UTF-8 uses for your language’s characters and put those into the strings.
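Both ideas can be sketched in a few lines of Lua. This is not the code from the link above, just a rough illustration: `latin1ToUtf8` is a hypothetical helper that assumes the input really is ISO-8859-1 (Windows-1252 differs for bytes 128-159), and the escaped string is the hand-written UTF-8 byte sequence for Æ Ø Å.

```lua
-- Hypothetical helper: convert an ISO-8859-1 string to UTF-8.
-- Bytes below 128 are plain ASCII and pass through unchanged;
-- bytes 128-255 become a two-byte UTF-8 sequence.
local function latin1ToUtf8(s)
    local out = {}
    for i = 1, #s do
        local b = s:byte(i)
        if b < 128 then
            out[#out + 1] = string.char(b)
        else
            out[#out + 1] = string.char(192 + math.floor(b / 64), 128 + b % 64)
        end
    end
    return table.concat(out)
end

-- Alternative: write the UTF-8 bytes directly with decimal escapes,
-- so the result no longer depends on how the editor saves the file.
local label = "\195\134 \195\152 \195\133"  -- "Æ Ø Å" in UTF-8

-- Only runs inside Corona, where the display and native libraries exist.
if display and native then
    display.newText(label, display.contentWidth/2, display.contentHeight/2, native.systemFontBold, 24)
end
```

With the escapes in place, the source file itself contains only ASCII, so the editor's encoding setting stops mattering for that string.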

If that doesn’t work, you’ll have to ask the Corona folks.

Thanks for the reply!

After fiddling a bit with the encoding settings in Notepad++, suddenly this special character thing worked. Actually there seemed to be a few different encoding schemes that worked, but only one where the Norwegian characters showed correctly in the editor (Notepad++), on the simulator and on the device. That was to set “Encoding->Encode in UTF-8 without BOM” (see attachment).

In other words, just as I claimed to have done, but admittedly in another (obviously the wrong) part of Notepad++.

Now it seems to work!
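For anyone wanting to check the "without BOM" part from code rather than from the editor menu, here is a minimal sketch. The UTF-8 byte order mark is the three bytes EF BB BF; `stripBom` is a made-up helper name, and the sample string simply stands in for text that has already been read from a file.

```lua
-- The UTF-8 byte order mark: bytes 0xEF 0xBB 0xBF.
local UTF8_BOM = "\239\187\191"

-- Hypothetical helper: return the text without a leading BOM,
-- plus a flag saying whether one was found.
local function stripBom(s)
    if s:sub(1, 3) == UTF8_BOM then
        return s:sub(4), true
    end
    return s, false
end

-- Sample input: a BOM followed by the UTF-8 bytes for Æ.
local text, hadBom = stripBom("\239\187\191\195\134")
print(#text, hadBom)  -- prints 2 and true
```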

I deal with special characters at times with the Pāli language, and the Unicode Tahoma font seems to be quite reliable.

Actually, here is a list of Unicode fonts that may work for you:

http://en.wikipedia.org/wiki/Unicode_font
