Easy way to extract urls out of a podcast rss feed?

Omnigeek_Media · August 16, 2011, 3:34pm

Regarding the errors above with ad.init(): it seems the latest nightly build DID add a third parameter, a callback function, though it is supposed to be optional. If you’re running build 605, that could be the cause. I’m building with 600.

As for it working with some feeds and not others, all I can really do is suggest dropping in some print statements and making sure the console is open in case it’s crashing; the error might help narrow down where to look.

For instance, I had to hack xml.lua to deal with some special characters and I may not have trapped them all. Or there could be other things in the RSS feed that keep it from validating properly and could break the XML parser.
Check your feed in an XML or Feed validator such as:

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.w3schools.com%2Fxml%2Fxml_validator.asp

and see if it sees problems with the feed.


Omnigeek_Media · August 16, 2011, 3:54pm

As for the delay, I’m using the built-in test to see if the network is available. This is an async event handler, so I have to trigger the test and wait for it to come back saying “Yes, you have a network”; if there is no network, it picks up a cached version. Then it calls network.download, which is again async, so I have to wait on that callback too.

You can speed this up by getting rid of the network test. Also, if it’s your WordPress blog, make sure that you have some caching software so it doesn’t have to build the feed on the fly. I would hope most of the caching plugins cache feeds too.
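The chain of callbacks described above can be sketched roughly as follows. This is only an illustration in plain Lua: checkNetwork, download, and loadCached are hypothetical stand-ins (they call back immediately here) for the real asynchronous network test, network.download, and the cache lookup, and the URL is a placeholder.

```lua
-- Hypothetical stand-ins for the asynchronous calls (assumptions, not the
-- project's actual code). Here they invoke their callbacks right away.
local function checkNetwork(onResult)
    onResult(true) -- pretend the reachability test came back positive
end

local function download(url, onComplete)
    -- pretend the download finished and produced this body
    onComplete({ isError = false, body = "<rss>fresh copy of " .. url .. "</rss>" })
end

local function loadCached()
    return "<rss>cached copy</rss>"
end

-- Each step only runs once the previous callback has fired,
-- which is where the startup delay comes from.
local function fetchFeed(url, onFeed)
    checkNetwork(function(isReachable)
        if not isReachable then
            onFeed(loadCached(), "cache")
            return
        end
        download(url, function(event)
            if event.isError then
                onFeed(loadCached(), "cache")
            else
                onFeed(event.body, "network")
            end
        end)
    end)
end

fetchFeed("http://example.com/feed", function(xmlText, source)
    print(source, xmlText)
end)
```

Dropping the network test would amount to calling download directly and falling back to the cache only when the download reports an error.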

lano78 · August 17, 2011, 4:22am

I ran the feed in the validator and it came out perfect, no problems at all. So I thought at first I must have made some changes by accident, now here’s the odd part…I put in the Ansca blog again just to check and it came out as this;

Runtime error
    ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:94: XmlParser: trying to close div with p
stack traceback:
    [C]: ?
    [C]: in function 'error'
    ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:94: in function 'ParseXmlText'
    ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:119: in function 'loadFile'
    ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/rss.lua:23: in function 'feed'
    ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:59: in function 'processRSSFeed'
    ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:98: in function <...8ncjymmw0btj00000gn>

Strange, so I thought what if I delete the sandbox and rebuild?
It came out with the same error as above.

Omnigeek_Media · August 17, 2011, 7:01am

This is one of those “This isn’t my problem” problems.

The error says that the XML parser is trying to close a <div> with a </p> tag.

It appears that the xml.lua file is actually trying to parse the HTML tags that are embedded in the feed content (the stuff enclosed in the CDATA tags). In that embedded HTML, there is apparently a tag being opened without a closing tag… or at least xml.lua believes that to be the case (I cut the HTML out and put it into Firebug and didn’t find problems with the HTML).

I have no clue where all that HTML is coming from. I would think RSS feeds would be more about the content than tons of markup.

I think we need to get either Jonathan BeeBee’s or Alexander Makeev’s take on the problem. XML should ignore anything in CDATA tags and shouldn’t try to parse it. Since I didn’t write the xml.lua file and only hacked it to deal with HTML entities, it’s going to take me a while to decipher it.

EDIT:
Digging a bit further, the HTML embedded in the CDATA tags fails to validate. On line 47:

 Chickens Quest has a total approximately 8000 lines of code! 

That closing </p> tag doesn’t have a corresponding opening <p> tag, and it’s breaking the XML parser. I don’t understand the code in xml.lua well enough to work around it.
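To show how a mismatch like this ends up as “trying to close div with p”, here is a minimal stack-based tag check in plain Lua. It is a sketch of the general technique, not the code in xml.lua; checkTags is a made-up name, and the pattern deliberately ignores comments, CDATA, and other markup that a real parser has to handle.

```lua
-- Walk the tags in a string and keep a stack of open element names.
-- Returns true, or false plus a message when a closing tag does not
-- match the most recently opened tag.
local function checkTags(text)
    local stack = {}
    for closing, name, selfClosing in text:gmatch("<(/?)([%w:]+)[^>]-(/?)>") do
        if closing == "/" then
            local open = table.remove(stack) -- pop the most recent open tag
            if open ~= name then
                return false, "trying to close " .. tostring(open) .. " with " .. name
            end
        elseif selfClosing ~= "/" then
            stack[#stack + 1] = name -- remember the newly opened tag
        end
    end
    return true
end

print(checkTags("<div><p>ok</p></div>"))
-- A stray </p> while <div> is still the innermost open tag
-- reports: trying to close div with p
print(checkTags("<div>Chickens Quest has ... lines of code!</p></div>"))
```

A browser tolerates a stray closing tag, which would explain why the HTML looked fine in Firebug while a strict stack-based parser gave up.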

lano78 · August 17, 2011, 9:32am

Great find, I would never have gone looking in the RSS feed since they all validated fine. But I just ran the Ansca blog and it was fine except for this;

“interoperability with the widest range of feed readers”

This doesn’t sound good, to me that sounds like;

“yes you can walk but you need legs first”

When I burned my feed I checked for compatibility in the options in feedburner so mine should be fine but maybe I have that error too in my rss feed even though it validates.

After further investigations, I seem to have the same issue with my feed.
I think it’s odd: I burn the feed, I validate it, and it comes out fine, but the XML is still bad. Shouldn’t FeedBurner or the validator pick up such a fault?

I’m not an XML genie myself so I can’t go much further in this. There was xml built in to the old coronaUI and it seemed a lot easier to use.

I hope Ansca puts their magic hands on XML and builds it into Corona, making it a one line wonder thingy like with the rest of Corona.

Omnigeek_Media · August 17, 2011, 9:59am

FeedBurner and other validators should ignore the content inside of CDATA tags. It’s character data that is supposed to be passed verbatim to the reader.

The problem is that the xml.lua file, which Jonathan got from a LuaXML site, is trying to parse the HTML inside the CDATA tags when it shouldn’t be.

So while the XML in the RSS is perfectly valid in this case, the HTML inside the CDATA tags is not.

I tried to look at xml.lua to see if I could easily skip parsing the data, but I haven’t figured out what all is going on in that loop yet. I’m not very comfortable with Lua’s string formatting codes and the original author didn’t use the best variable names (Yea, we are all guilty of that too). So until I can find time to tear that apart (and to be honest, writing an XML parser for the community isn’t high on my priority list) it would be helpful for some other eye-balls to take a peek and see if they can make sense of what’s going on in that loop and find a way to just capture the CDATA blocks.
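One possible way to keep a parser away from CDATA content is to pull the blocks out before parsing and put them back afterwards. The sketch below is plain Lua and only an illustration of that idea, not a patch to xml.lua; stashCData, restoreCData, and the @@CDATAn@@ placeholder format are all made up here, and it assumes the feed never contains text that looks like a placeholder.

```lua
-- Replace each CDATA section with a placeholder so a tag-matching parser
-- never sees the HTML inside it. Returns the cleaned text and the stash.
local function stashCData(xmlText)
    local stash = {}
    local cleaned = xmlText:gsub("<!%[CDATA%[(.-)%]%]>", function(body)
        stash[#stash + 1] = body
        return "@@CDATA" .. #stash .. "@@"
    end)
    return cleaned, stash
end

-- Put the original text back wherever a placeholder appears in a value.
local function restoreCData(value, stash)
    return (value:gsub("@@CDATA(%d+)@@", function(index)
        return stash[tonumber(index)]
    end))
end

local feed = "<item><title>Post</title><description>"
    .. "<![CDATA[<div>Hi</p>]]></description></item>"
local cleaned, stash = stashCData(feed)
print(cleaned)                             -- the HTML is gone from the XML
print(restoreCData("@@CDATA1@@", stash))   -- the original HTML, untouched
```

The cleaned text can go through the existing parser as usual, and restoreCData would then be applied to each text value in the resulting table.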

lano78 · August 17, 2011, 10:03am

I removed some of my articles in my blog and now it worked, I narrowed it down to some code I had on the blog so I think it had some characters not supported in the xml parser.

…it was a snippet of Obj-C, allergic reaction from lua perhaps???

Now it works.

Omnigeek_Media · August 17, 2011, 10:34am

If there are specific characters, I can add them in (you can too). Look around line 40 in xml.lua for all the:

if h == "8217" then return "'" end

type lines. If h has a string you want to get rid of, just put in some additional tests there.
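If the list of tests gets long, a lookup table is one alternative to a chain of if statements. This is a sketch in plain Lua, not the actual code in xml.lua: the entities table and decodeEntities are hypothetical names, and only the 8217 entry comes from the line above; the other codes are examples.

```lua
-- Numeric character references mapped to plain replacements.
-- Only 8217 is taken from the thread; the rest are illustrative.
local entities = {
    ["8216"] = "'",   -- left single quotation mark
    ["8217"] = "'",   -- right single quotation mark
    ["8220"] = '"',   -- left double quotation mark
    ["8221"] = '"',   -- right double quotation mark
    ["8230"] = "...", -- horizontal ellipsis
}

-- gsub looks each captured code up in the table; codes that are
-- not listed are left in the text unchanged.
local function decodeEntities(text)
    return (text:gsub("&#(%d+);", entities))
end

print(decodeEntities("It&#8217;s done&#8230; &#9731;"))
```

Adding support for another character is then a one-line change to the table.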

But if you’re getting closing tags not matching up with opening ones, that’s going to take much more thought.

Omnigeek_Media · August 18, 2011, 10:31am

I’ve made some progress on the CDATA issue. I’m ripping it out at the moment, so it’s no longer trying to parse the content. Now I have to figure out how to get it back in the table in the right place.

When this happens, apps like mine probably won’t be able to use native.newTextBox to show the content, since it’s likely to be filled with HTML; they will have to write the block out to a temp file and then load it in with native.showWebPopup.


Omnigeek_Media · August 19, 2011, 7:31pm

Jonathan BeeBee fixed xml.lua to handle CDATA and we owe him mucho thanks!

I’ve updated my project with a few new things that might interest you.

xml.lua – the new CDATA friendly version.
rss.lua – needed a minor tweak to the handling of the content:encoded tag, since I now get the whole entry instead of having to try and parse the paragraphs out of it.
webpage.lua – a version of page.lua that uses native.showWebPopup() to try and render the HTML content your content:encoded is likely to now contain. Still a bit buggy.

and . . . if you’re an Atom fan instead of RSS, a new… wait for it… wait for it…

atom.lua – processes atom based feeds.

and to show you how to use it, screen1.lua now does atom instead of RSS.

Have a little fun with it!