Hi,
I’m quite new to Corona and am trying to put together what I thought would be a fairly simple app.
Essentially, there is data I want to pull from an .html file on the internet. There’s some junk data at the top and bottom of the page, so I want to bypass that.
I want to format it as I see fit, and also apply functions to some of the data (such as following the links it contains). There are also a few different table layouts, but I’m quite happy to work through that at this stage.
So far I haven’t been able to find (via search) how this could be done; maybe it’s my unfamiliarity with the terminology in Lua.
All suggestions and comments welcome.
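A minimal sketch of bypassing the junk at the top and bottom of a page, assuming the useful part sits between two marker strings that each appear once. The markers and the sample page below are hypothetical placeholders; the real ones have to be read off the page source. It uses only standard Lua string functions, so it can be tried outside Corona.

```lua
-- Return the text between startMarker and endMarker, or nil if either is missing.
-- Passing true as the 4th argument to find makes it a plain-text search,
-- so characters in the markers are not treated as Lua pattern syntax.
local function between( html, startMarker, endMarker )
    local _, startEnd = html:find( startMarker, 1, true )
    if not startEnd then
        return nil
    end
    local endStart = html:find( endMarker, startEnd + 1, true )
    if not endStart then
        return nil
    end
    return html:sub( startEnd + 1, endStart - 1 )
end

-- Hypothetical page: junk header, the data we want, junk footer
local page = "<div>ads</div><!-- begin -->wanted data<!-- end --><div>footer</div>"
print( between( page, "<!-- begin -->", "<!-- end -->" ) )  --> wanted data
```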
What you’re talking about is web scraping. This was one of the top results when I searched “lua web scraping”:
http://software.artiztix.com/harvester/index.htm
Excellent!
Thanks for that; looks like it was just me and my lack of programming vocabulary.
I followed the instructions from the site and looked at the reference manual; all seems to be fine, but the terminal keeps spitting back the following:
Runtime error: /Users/bayani/Desktop/Community Board/main.lua:1: attempt to call global ‘GetURL’ (a nil value)
stack traceback:
[C]: in function ‘GetURL’
/Users/bayani/Desktop/Community Board/main.lua:1: in main chunk
Below is my code so far:
[code]
local page = GetURL("http://boards.nexustk.com/Community/index.html")
local harvester = newHarvester( [[
{group commboard}
<center><b><u>{value board}</u></b></center>
{repeat list}
|
<a href="{value posturl}">{value postnum}</a>
|
<a href="{value posturl}">{value postdate}</a>
|
<nobr><br> <a href="{value posturl}">{value postauthor}</a><br> </nobr>
|
<nobr><br> <br> <a href="{value posturl}">{value posttitle}</a><br> </nobr>
|
{/repeat}
{/group}
]] )
local data = harvester.harvest(page)
[/code]
I’ve also noticed that Harvester is downloadable as an .llf, but I don’t know what to do with it!
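For comparison, the `{value board}` line of the template above can be approximated with a plain Lua pattern, no Harvester needed. This is a sketch under the assumption that the page contains that exact tag sequence; the match is case-sensitive and will miss tags with extra attributes or whitespace. The sample string is made up for illustration.

```lua
-- Approximate the template line <center><b><u>{value board}</u></b></center>.
-- "(.-)" is a lazy capture: it stops at the first "</u></b></center>" after the opening tags.
-- Returns nil when the page has no such sequence.
local function boardTitle( html )
    return html:match( "<center><b><u>(.-)</u></b></center>" )
end

local sample = "<p>junk</p><center><b><u>Community</u></b></center><table></table>"
print( boardTitle( sample ) )  --> Community
```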
It keeps reporting an error on GetURL because there’s no such function in Corona. Look at the docs for the network commands:
http://developer.anscamobile.com/reference/index/asynchronous-http
Oooh!
I see; I was able to retrieve the HTML source and display it in the terminal, following the example.
Thanks!
Hmm.
I don’t understand how I’m supposed to use Harvester if it isn’t supported by Corona. The Harvester script comes as an .llf, but I can’t find any documentation on using it with Corona.
I’ve been able to write the data to a new file, but I suspect I need something like string.match to work through the data. Or would SQL be a better way to go?
[code]
-- Retrieve Community Board titles

-- Called when the alert's button is pressed; nothing to do yet
local function onComplete( event )
end

local function networkListener( event )
    if ( event.isError ) then
        print( "Network error!" )
    else
        -- Save the raw HTML to source.txt in the app's Documents directory
        local path = system.pathForFile( "source.txt", system.DocumentsDirectory )
        local file = io.open( path, "w" )
        if file then
            file:write( event.response )
            file:close()
        end
        print( event.response )

        -- network.request is asynchronous, so only report success
        -- here, once the response has actually arrived and been saved
        native.showAlert( "Success!", "Data Saved", { "OK" }, onComplete )
    end
end

-- Access the Community Board via a GET request
network.request( "http://boards.nexustk.com/Community/index.htm", "GET", networkListener )
[/code]
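On the string.match question: for a page with a fixed layout, Lua patterns are usually enough to pull fields out, while SQL is for storing and querying data once it has been extracted, not for parsing HTML. A minimal sketch using string.gmatch to collect every link and its text, which covers the `{value posturl}` parts of the template. It assumes double-quoted, lowercase `<a href="...">` tags whose link text contains no nested tags; real-world HTML can break those assumptions, and relative URLs would need the board’s base URL prefixed before being passed to network.request to follow them. The sample HTML is made up.

```lua
-- Collect every <a href="...">text</a> pair from an HTML string.
-- Assumes double-quoted href values and link text without nested tags.
local function extractLinks( html )
    local links = {}
    for url, text in html:gmatch( '<a%s+href="([^"]*)"[^>]*>(.-)</a>' ) do
        links[ #links + 1 ] = { url = url, text = text }
    end
    return links
end

local sample = [[
<td><a href="post1.html">First post</a></td>
<td><a href="post2.html">Second post</a></td>
]]

for i, link in ipairs( extractLinks( sample ) ) do
    print( i, link.url, link.text )
end
```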
Harvester was just the first result I found; I’ve never used it. Digging a little deeper, I searched “parse html in lua” and got this result:
http://luaforge.net/projects/html/
Too bad Beautiful Soup is Python, not Lua.
Ooh, this looks promising. I’ll look at implementing it soon.
I’m assuming Beautiful Soup is a web scraper like the one I’m trying to find?
The best one around:
http://www.crummy.com/software/BeautifulSoup/