Pulling Information from a Table within a .html file

bayanimills · January 31, 2011, 12:54am

Hi,

I’m quite new to Corona and trying to put together what I thought would be a fairly simple app.

Essentially, there is data I want to pull from a .html file on the internet. There’s some junk data at the top of the page and at the bottom, so I want to bypass that.

I want to format it as I see fit, and also apply functions to some of the data (such as following the links they lead to). There are also a few different table layouts, but I’m quite happy to work through that at this stage.

So far I haven’t been able to find (via a search) how this could be done. Maybe it’s my unfamiliarity with the terms in Lua.

All suggestions and comments welcome.

jhocking · January 31, 2011, 1:58pm

What you’re talking about is web scraping. This was one of the top results when I searched “lua web scraping”:

http://software.artiztix.com/harvester/index.htm

bayanimills · January 31, 2011, 2:51pm

Excellent!

Thanks for that; looks like it was just me and my lack of programming vocabulary.

bayanimills · January 31, 2011, 4:46pm

I followed the instructions from the site and looked at the reference manual. All seems to be fine, but the terminal keeps spitting back the following:

Runtime error: /Users/bayani/Desktop/Community Board/main.lua:1: attempt to call global 'GetURL' (a nil value)
stack traceback:
[C]: in function 'GetURL'
/Users/bayani/Desktop/Community Board/main.lua:1: in main chunk

Below is my code thus far.

[code]
local page = GetURL("http://boards.nexustk.com/Community/index.html")

local harvester = newHarvester( [[
{group commboard}
<center><b><u>{value board}</u></b></center>
{repeat list}
|
<a href="{value posturl}">{value postnum}</a>
|
<a href="{value posturl}">{value postdate}</a>
|
<nobr><br> <a href="{value posturl}">{value postauthor}</a><br> </nobr>
|
<nobr><br>   <br> <a href="{value posturl}">{value posttitle}</a><br> </nobr>
|
{/repeat}
{/group}
]] )

local data = harvester.harvest(page)
[/code]

I’ve also noted that Harvester is downloadable as a .llf file, but I don’t know what to do with it!

jhocking · January 31, 2011, 8:58pm

It keeps telling you there’s an error on the command GetURL because there’s no such command in Corona. Look at the docs for network commands:
http://developer.anscamobile.com/reference/index/asynchronous-http

bayanimills · February 1, 2011, 4:06am

Oooh!

I see; I was able to successfully retrieve the html source code and display it in the terminal, as per the example.

Thanks!

bayanimills · February 1, 2011, 5:39am

Mm.

I don’t seem to understand how I am to implement a harvester if it’s not supported by Corona. The Harvester script comes as a .llf file, but I can’t seem to find any documentation on its use with Corona.

I’ve been able to write the data to a new file; but suspect I need to use something like string.match to work through the data; or would SQL be a better way to go?

[code]
-- Retrieve Community Board Titles

-- Handler for the "Data Saved" alert
local function onComplete( event )
    if "clicked" == event.action then
        local i = event.index
        if 1 == i then
            -- OK was tapped; nothing further to do
        end
    end
end

local function networkListener( event )
    if ( event.isError ) then
        print( "Network error!" )
    else
        local tmp = io.output() -- save current file handle
        local path = system.pathForFile( "source.txt", system.DocumentsDirectory )
        io.output( path ) -- open new file in text mode

        io.write( event.response )

        io.output():close() -- close the file
        io.output( tmp ) -- restore the previous file handle

        print( event.response )

        -- Display alert once the data has actually been saved
        native.showAlert( "Success!", "Data Saved", { "OK" }, onComplete )
    end
end

-- Access Community Board via GET request
network.request( "http://boards.nexustk.com/Community/index.htm", "GET", networkListener )
[/code]
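
For the string.match idea mentioned above, here is a minimal sketch in plain Lua, not a definitive implementation. It assumes the page uses one lowercase table with properly closed tr/td tags and at least four cells per row (number, date, author, title), which the real board markup may not do. The sample html string, the helper names sliceTable, stripTags and parseRows, and the field names are all hypothetical; in Corona the input would be event.response instead.

```lua
-- Hypothetical sample standing in for event.response; the real page markup will differ.
local html = [[
<html><body>
<p>junk at the top</p>
<table>
<tr><td><a href="post1.html">101</a></td><td>Jan 30</td><td>alice</td><td>First title</td></tr>
<tr><td><a href="post2.html">102</a></td><td>Jan 31</td><td>bob</td><td>Second title</td></tr>
</table>
<p>junk at the bottom</p>
</body></html>
]]

-- Return only the text between the first <table ...> and the next </table>,
-- which skips the junk above and below the table.
local function sliceTable( source )
    local _, openEnd = source:find( "<table[^>]*>" )
    if not openEnd then return nil end
    local closeStart = source:find( "</table>", openEnd + 1, true )
    if not closeStart then return nil end
    return source:sub( openEnd + 1, closeStart - 1 )
end

-- Remove any tags from a cell and trim surrounding whitespace.
local function stripTags( cell )
    return ( cell:gsub( "<[^>]+>", "" ):gsub( "^%s+", "" ):gsub( "%s+$", "" ) )
end

-- Build one record per table row. The column order is an assumption.
local function parseRows( source )
    local posts = {}
    local tableBody = sliceTable( source )
    if not tableBody then return posts end

    for row in tableBody:gmatch( "<tr[^>]*>(.-)</tr>" ) do
        local cells = {}
        for cell in row:gmatch( "<td[^>]*>(.-)</td>" ) do
            cells[#cells + 1] = cell
        end
        if #cells >= 4 then
            posts[#posts + 1] = {
                url    = cells[1]:match( 'href="([^"]*)"' ), -- link to follow later
                number = stripTags( cells[1] ),
                date   = stripTags( cells[2] ),
                author = stripTags( cells[3] ),
                title  = stripTags( cells[4] ),
            }
        end
    end
    return posts
end

local posts = parseRows( html )
for _, post in ipairs( posts ) do
    print( post.number, post.date, post.author, post.title, post.url )
end
```

Lua patterns are case-sensitive and are not a real HTML parser, so uppercase tags, nested tables or unclosed cells would need extra handling. For a page with a few fixed layouts this kind of pattern matching is probably enough without bringing in SQL.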

jhocking · February 1, 2011, 10:27am

Harvester was just the first result I found; I’ve never used it. Digging a little deeper, I tried searching “parse html in lua” and got this result:
http://luaforge.net/projects/html/

Too bad Beautiful Soup is Python, not Lua.

bayanimills · February 1, 2011, 5:49pm

Oo. This looks promising. I will look at implementing it soon.

I’m assuming Beautiful Soup is a web scraper, like the one I’m trying to find?

jhocking · February 1, 2011, 5:59pm

The best one around:
http://www.crummy.com/software/BeautifulSoup/