I would do some research on how RSS is supposed to handle UTF-8 characters. It should only be a problem if the feed isn’t wrapping text in CDATA tags, which protect against it. Unfortunately you can’t always control the RSS feed and make sure it’s valid.
I don’t understand how CDATA would help - it just makes the reader ignore the content right?
What CDATA tags do is hide non-XML data from an XML parser. It’s still in the feed. Now if an RSS reader ignores it, that’s on them. It’s still valid data for that field and they should be able to take it and show it. Maybe this article will help:
http://themetaq.com/articles/feeding-frenzy-part-iii
Rob
How about porting old codes into the new codebase?
Hi Satheesh. We ported a few of the popular, graphics 2.0 compatible projects to get things started. They had to be on GitHub already, etc. We encourage the authors to port over anything they wish, and we are keeping the old code exchange up, so you can still get to the old files.
This is a way for us to spring-clean so-to-speak.
Oh ok. Makes sense.
I’m still working on this issue and experimenting with the CDATA tag, but as I dig deeper I’m not convinced it’s a parser issue - I think it’s more about how I’m handling the data after I’ve consumed the RSS feed.
It appears to be more about how Corona handles special characters in strings. For example, I have myString = “dönüyor”. When I ask Corona how long the string is, #myString = 9, no doubt because each of the two special characters takes two bytes. That’s fine, and I can output the string just fine. The problem arises when I want to shorten the string for display purposes: if I get a substring that ends in the “middle” of one of the special characters, all I get back is an empty string…
e.g. myString:sub(1,8) output to the screen is “dönüyo”, which is fine, but myString:sub(1,2) outputs “” when I’m expecting “dö”; that’s only achievable with myString:sub(1,3). Is there any way to test a character to see if it’s “special” so I can extend my string length in this special case?
Thanks,
Nathan.
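One way to test for a “special” character is to look at its first byte: in UTF-8 the lead byte tells you how many bytes the character occupies. Below is a minimal sketch in plain Lua 5.1, assuming the string really is UTF-8. The names utf8SeqLength and utf8Sub are made up for this example and are not part of Corona.

```lua
-- Hypothetical helper: number of bytes in the UTF-8 sequence
-- that starts with lead byte b.
local function utf8SeqLength(b)
    if b < 0x80 then return 1        -- plain ASCII
    elseif b >= 0xF0 then return 4
    elseif b >= 0xE0 then return 3
    elseif b >= 0xC0 then return 2   -- e.g. "ö", "ü"
    end
    return 1 -- stray continuation byte; step over it one byte at a time
end

-- Hypothetical helper: return the first maxChars characters of s,
-- never cutting a multi-byte character in half.
local function utf8Sub(s, maxChars)
    local i, count = 1, 0
    while i <= #s and count < maxChars do
        i = i + utf8SeqLength(s:byte(i))
        count = count + 1
    end
    return s:sub(1, i - 1)
end

-- "dönüyor" written with byte escapes so the source file encoding does not matter
local myString = "d\195\182n\195\188yor"
print(#myString)             -- byte length, not character count
print(utf8Sub(myString, 2))  -- first two characters
```

With this approach you count characters rather than bytes, so there is no need to special-case the substring length by hand.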
@beernathan Check out this function Ingemar made, maybe it helps: http://forums.coronalabs.com/topic/42019-split-utf-8-string-word-with-foreign-characters-to-letters/
Awesome - thanks. Couldn’t get the whole thing to work for me, but used parts to be able to analyse my strings for special characters.
Not working for my Polish feed that I referenced earlier, but it is working for other languages - will get to the Polish soon!
Using the double byte code I’ve managed to detect special characters in Turkish & Japanese, but I’m still struggling with my Polish feed - I can’t get the contents of the feed to be displayed. I’m starting to wonder if this is a Corona issue.
Here’s an example of a title in the RSS feed: “Zespoły zaakceptowały propozycje zmian w formacie kwalifikacji?” (one of the examples from this feed http://www.f1wm.pl/rss_f1.php)
When the RSS code reads it in and I output it to the console (in Glider) I get “Zespo�y zaakceptowa�y propozycje zmian w formacie kwalifikacji” (note the weird diamond characters)
I tried wrapping the content in CDATA as per Rob’s suggestion
<![CDATA[Zespoły zaakceptowały propozycje zmian w formacie kwalifikacji?]]>
Console still shows exactly the same output : “Zespo�y zaakceptowa�y propozycje zmian w formacie kwalifikacji”
This makes me think that the CDATA tags aren’t making a difference - in both cases the XML parser is successfully delivering the content. The issue is that when I try to set the contents of a text object to this string I get nothing on screen.
The characters that are delivered by the parser aren’t “double byte” either. I tried pumping out the string through native.showAlert and I got the following error - the plot thickens:
Corona Simulator Generic error
NSInternalInconsistencyException: Invalid parameter not satisfying: aString != nil
(
0 CoreFoundation 0x00007fff96c64b06 __exceptionPreprocess + 198
1 libobjc.A.dylib 0x00007fff94c9b3f0 objc_exception_throw + 43
2 CoreFoundation 0x00007fff96c64948 +[NSException raise:format:arguments:] + 104
3 Foundation 0x00007fff951eb4c2 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 189
4 AppKit 0x00007fff957ac125 -[NSCell _objectValue:forString:errorDescription:] + 159
5 AppKit 0x00007fff957ac07f -[NSCell _objectValue:forString:] + 20
6 AppKit 0x00007fff957abffb -[NSCell setStringValue:] + 39
7 AppKit 0x00007fff9584032c -[NSControl setStringValue:] + 138
8 AppKit 0x00007fff95a0ab71 -[NSAlert setInformativeText:] + 33
9 Corona SimuNSInternalInconsistencyException
Any ideas about what is actually going on here?
Thanks,
Nathan.
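Diamond characters in the console usually mean the string contains bytes that are not valid UTF-8, so it may be worth checking that before handing the string to a text object or native.showAlert. This is a rough sketch of such a check in plain Lua 5.1. The name isValidUtf8 is made up for this example, and the check is simplified: it only verifies lead and continuation bytes, and does not reject every overlong or surrogate encoding.

```lua
-- Hypothetical helper: simplified structural check that s is UTF-8.
local function isValidUtf8(s)
    local i, n = 1, #s
    while i <= n do
        local b = s:byte(i)
        local extra -- number of continuation bytes expected after b
        if b < 0x80 then extra = 0
        elseif b >= 0xC2 and b <= 0xDF then extra = 1
        elseif b >= 0xE0 and b <= 0xEF then extra = 2
        elseif b >= 0xF0 and b <= 0xF4 then extra = 3
        else return false end -- not a legal lead byte
        for j = i + 1, i + extra do
            local c = s:byte(j)
            -- every continuation byte must be in the range 0x80..0xBF
            if not c or c < 0x80 or c > 0xBF then return false end
        end
        i = i + extra + 1
    end
    return true
end

-- "\179" is a lone high byte, which cannot appear on its own in UTF-8
print(isValidUtf8("Zespo\179y"))
print(isValidUtf8("Zespo\197\130y"))
```

If the check fails on the parser output, the feed is probably in some other encoding, and the encoding attribute in the XML declaration is the first place to look.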
OK, problem solved - it turned out that it was an encoding problem. Rather than UTF-8, the feed declared encoding=“iso-8859-2”, so the characters weren’t double byte! The other thing to note is that Polish is “special” and has a set of 18 characters that you need to handle specially (the ones that were encoded): http://en.wikipedia.org/wiki/Polish_code_pages
The XML parser was passing them through just fine - I just had to write a function to parse the string looking for these special characters and converting them to a double-byte equivalent.
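For anyone hitting the same thing, here is a minimal sketch of that kind of conversion in plain Lua 5.1. It is not the exact function from the post above. The names POLISH_LATIN2, codepointToUtf8 and latin2PolishToUtf8 are made up for this example, the table covers only the 18 Polish letters in ISO-8859-2, and replacing any other high byte with “?” is just an assumption about what you might want.

```lua
-- ISO-8859-2 byte -> Unicode code point for the 18 Polish letters
local POLISH_LATIN2 = {
    [0xA1] = 0x0104, [0xB1] = 0x0105, -- Ą ą
    [0xC6] = 0x0106, [0xE6] = 0x0107, -- Ć ć
    [0xCA] = 0x0118, [0xEA] = 0x0119, -- Ę ę
    [0xA3] = 0x0141, [0xB3] = 0x0142, -- Ł ł
    [0xD1] = 0x0143, [0xF1] = 0x0144, -- Ń ń
    [0xD3] = 0x00D3, [0xF3] = 0x00F3, -- Ó ó
    [0xA6] = 0x015A, [0xB6] = 0x015B, -- Ś ś
    [0xAC] = 0x0179, [0xBC] = 0x017A, -- Ź ź
    [0xAF] = 0x017B, [0xBF] = 0x017C, -- Ż ż
}

-- Encode a code point below 0x800 as one or two UTF-8 bytes
local function codepointToUtf8(cp)
    if cp < 0x80 then return string.char(cp) end
    return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Convert an ISO-8859-2 string containing Polish letters to UTF-8.
-- ASCII bytes pass through untouched; unmapped high bytes become "?".
local function latin2PolishToUtf8(s)
    return (s:gsub("[\128-\255]", function(ch)
        local cp = POLISH_LATIN2[ch:byte()]
        if cp then return codepointToUtf8(cp) end
        return "?"
    end))
end

-- "Zespoły" as it arrives from the iso-8859-2 feed (ł is the single byte 0xB3)
print(latin2PolishToUtf8("Zespo\179y"))
```

Only run this on strings you know are ISO-8859-2. Feeding it a string that is already UTF-8 would mangle the multi-byte characters.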