Pulling Content from a Website

I’m very new to Corona and Lua, so I’m hoping to get some help here. I want to pull content directly from my website. I know how to pull the web page itself into the app, but I want to get specific text and images. I’ve tried using network.request() to get the source code for the web page, but I’m not sure where to go after that, or whether that’s even the correct approach.

So I guess my first question should be “is this even possible?”. And if so, “How?”

First you have to know how your website works.

Is it a basic HTML-file-based site where every file is hand-coded (unlikely)?

Is it a WordPress site?

etc.

Knowing that, then you can approach the question of how to make specific requests of your site for page data and/or how to scrape pages.

I’m not much of a web-head and I’m sure others here are much better at this kind of thing, but I can’t imagine this is all that easy to do. Unless you’re pulling a very specific subset of data from your site, this could be a huge endeavor.

Can you tell us what kinds of stuff you want to pull from your site?

Also, share your site link so we can see what you mean.

Followup

If you know the exact URLs of files and/or images, they are pretty easy to download.

If you can’t work this out from the code in the linked docs (above), please supply a few example URLs from YOUR website (to real files and images) and I (or someone else) will write a small example.

This is the URL for one of the pages I want to pull from. https://oregonflowers.com/availability/#availabilitycalendar
The table at the bottom is something that needs to be included in the app. In this case, the table is neither a file nor an image.

Here is another URL.


I’d like to scrape the images and contact information from the page.

If I were to do something similar I would probably create some kind of simple web service that does the actual fetching/parsing from the web pages and presents a simple REST API that your Solar2D app can communicate with. That way your app becomes much simpler, and not so tightly coupled to the content of the web pages.

Something like this:

  1. The Solar2D app sends an HTTP request to your web service
  2. The web service fetches the web page that should be parsed
  3. The web service parses the page, for example using BeautifulSoup/jSoup/jsSoup depending on your language of choice
  4. The web service returns a well-structured JSON response with the data your app is interested in
  5. Your app handles the JSON response

A great benefit of such a solution is that if your website’s content changes, you won’t have to update the code of your app. Just update the web service that parses the web pages instead, and keep returning the same JSON response to your app.
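
Steps 2 to 4 above could be sketched roughly like this on the web service side. This is only a minimal sketch, not a finished service: it uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without installing anything, and it assumes the table is a plain table whose first row is the header. The names TableParser, table_to_json and fetch_page, as well as the SAMPLE markup, are made up for illustration; the real page's markup may differ and would need its own handling.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of every <tr> in the HTML it is fed."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch_page(url):
    """Step 2: download the page source (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def table_to_json(html):
    """Steps 3 and 4: parse the rows and return them as a JSON array.

    Assumes the first row holds the column names; each later row
    becomes one JSON object keyed by those names.
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body])


# Made-up sample markup for illustration; not taken from the real page.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Available</th></tr>
  <tr><td>Item A</td><td>Spring</td></tr>
  <tr><td>Item B</td><td>Summer - Fall</td></tr>
</table>
"""
```

Calling table_to_json(SAMPLE) gives a JSON array with one object per table row. In a real service you would pass it the result of fetch_page(...) for your page and return the string from whatever HTTP endpoint the app calls. On the app side, the response can then be decoded with Solar2D's json library inside the network.request() listener.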
