Storing .bin and .data remotely

We are building a site that will serve some of our Corona apps via HTML5. However, we have a storage/transfer issue and want to put the .bin and .data files on a service like Amazon S3 or Backblaze.

The index.html file will still reside on our webserver, but those .bin and .data files will transfer so much data that they need to be on a cloud storage system.  The issue is, how can we update the html file to point to those remote .bin and .data files?  

We were able to simply update the xml.open line to point to the bin, but it then tries to find the .data file in the same directory as our index.html file and we see no way to change that.

Any ideas on this? I can’t imagine anyone running a commercial site whose web hosting lets them transfer that much data the way HTML5 builds are currently structured. It just isn’t a very scalable solution unless the .data and .bin files can be served from elsewhere.

This isn’t something I’ve tried myself yet, oddly, but if you leave your hosting server as the origin instead of setting up a storage bucket, I expect a CDN URL should work just fine. Relative paths within the bin would simply load over the CDN URL, which would translate back to your server’s copy, and the CDN would still build up its cloud of caches for serving subsequent requests exactly as if you’d used a bucket. Cross-origin (CORS) errors might crop up if you don’t set the right origin headers, but that shouldn’t be difficult to resolve.

I’m unfamiliar with Amazon, but we do this kind of thing all the time with Rackspace CDN. I actually prefer it over storage buckets because it means you can have admin panels that upload CMS content to your web server as normal, while the front end loads all of that content via a cloud URL. No need for any middleman system to synchronise the CMS uploads with external buckets. Really, the only advantage to file storage buckets is that you don’t also have to pay for a web server, but if you’re building a site you’ll still need a server to run it anyway, so you might as well use it as the origin.

In fact, what I’d probably do if I were building the site is have it load those index.html files into iframes via a CDN URL. That way the index file would load from the cloud and all of the dependencies it references would come in through relative paths, so also via the cloud. You wouldn’t need to edit anything at all within those files; just have Corona compile a new build, upload it to your server, and leave the iframe to do all of the magic.
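Something like this, purely to illustrate (the CDN hostname and game path here are made up):

<!-- Page on your own domain; the game loads straight from the CDN copy. -->
<iframe src="https://abc123.example-cdn.com/mygame/index.html"
        width="960" height="640" allowfullscreen>
</iframe>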

I run a web agency and we’re Rackspace partners, which puts us in a pretty good position to offer complex hosting set-ups. If you’re open to the idea of migrating your hosting to us, I’d be happy to take a proper look and maybe do you a quote. https://www.qweb.co.uk . Absolutely no obligation, of course - my thinking out loud above might have been enough to get you on your feet anyway. :smirk:

Thanks for the insights! I’ll be the first to admit my web hosting knowledge is about 10 years old, so all this is pretty new to me. I think a CDN is a bit overkill for our needs, as we don’t need quite that speed of service. But that idea of a URL redirect is interesting, and I was able to accomplish it using .htaccess and some mod_rewrite rules. However it did kick up that cross-origin issue you mentioned, and I wasn’t able to solve that. I’m sure someone smarter than me could do it though.

I like that iframe idea a lot, but the issue is that the site we’re building is going to be subscription-based. A user must authenticate and have an active subscription, and that lets them play the games. If we just have static index.html files they could bookmark those and skip the subscription. So our idea is to generate the index.html on the fly depending on the game they load and whether they have a subscription.

After some more playing around, I was able to use .htaccess to redirect the requests to S3, and then on S3 we had to update our CORS configuration. After that it works great with no cross-origin errors: index.html is local on our server and the .bin and .data files are hosted remotely.
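For anyone else hitting this, the bucket’s CORS rules essentially just need to allow GETs from the domain serving index.html. In the current S3 console that’s a JSON blob roughly like the below (older consoles use an equivalent XML form); www.example.com is a stand-in for your own domain:

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedOrigins": ["https://www.example.com"],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]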

Your S3 bucket is effectively acting as the CDN here; you’re just uploading the files to a storage service that serves them, rather than using your existing server as the origin. The technology on the delivery side is much the same, but ref my post above, remaining the origin just makes things simpler.

Glad you got it working anyway. The CORS config is the key part to this.

For the record, your subscription concept would work fine with the iframe approach. You can restrict content to only be available if requested through an iframe from a page hosted on your own domain. That way the content wouldn’t be served if requested directly, meaning the user would have to bookmark the page that the iframe is on rather than the iframe itself.
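If the loader page is coming straight off your own Apache box, a rough sketch of that kind of gate in .htaccess terms would be something like the below, with example.com standing in for your domain. Referer headers can be stripped or spoofed, so treat it as a deterrent rather than real security; most CDNs offer their own referrer restrictions that do the same job at the edge.

# Refuse to serve the loader page unless the request was referred from a
# page on your own domain, i.e. it is sitting inside your iframe.
RewriteEngine On
RewriteCond %{REQUEST_URI} index\.html$ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule .* - [F]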

Curious though, as to whether there’s actually a market for subscription-based in-browser games? With the number of free games available via the Google and iOS stores, and the number of websites offering free browser games (all of which are modelled around ad or IAP monetisation), I’m not convinced there are many people these days who would consider subscribing to a service like this unless it had the backing of a AAA publisher. If I were you, I’d strongly consider just running adverts.

Ah, thanks for clarifying the CDN concept.

As for the viability, we’re in a pretty niche market (kids education) where the audience does pay for this content.  Ads are a backup option for us, but there are several successful subscription services in our market already (ABCMouse, SplashMath, etc).

@kbradford Let us know when you are up and running.

How did you configure .htaccess? I’m trying to do the same thing as you.

RewriteEngine On
# If the requested path ends in data, bin or stash ([NC] = case-insensitive),
# redirect it to the same path on the S3 bucket. [R] sends the browser a
# redirect (302 by default), so it fetches the file straight from S3.
RewriteCond %{REQUEST_URI} .*data$|.*bin$|.*stash$ [NC]
RewriteRule (.*) https://myexamplebucket.s3-us-west-2.amazonaws.com/$1 [R]

Ah, I see now that this isn’t quite the approach I was getting at. It looks like what you’re doing is hosting the standard HTML page as-is, but then using .htaccess to reroute the browser’s requests for those assets, telling it to grab them from the CDN instead. This would work and would effectively offload a lot of bandwidth, but it’s worth noting that your server still has to respond to all of those requests, and since .htaccess is an Apache feature and Apache itself is resource hungry, this isn’t a brilliant solution during periods of high traffic. In a nutshell, you’re using the CDN for content delivery but your server is becoming a bottleneck: you’re not benefitting from the fact that a CDN can handle far more simultaneous requests than your single server can.

Additionally, modern servers tend to be HTTP/2 enabled. Over HTTP/1.x, the page is delivered first, then the browser scans it for additional content and puts in separate requests for those files. HTTP/2 adds server push: a server that’s been configured for it can anticipate those follow-up requests and start sending the assets along with the page instead of waiting to be asked. Because of this I’m not actually sure your .htaccess trick will save you the bandwidth cost if your server is pushing those files over HTTP/2…

The way I was meaning in my original response was to load the actual HTML page over the CDN. I.e. add the page and its assets to your server as normal, but then instead of pointing a domain directly at that page, use the CDN URL to get to it. You can mask this either by using an iframe or by pointing a domain’s CNAME record.

That way, the index.html itself will be delivered via the CDN and the relative paths it includes for assets will naturally also be requested via that same CDN URL. Far fewer requests coming in to your server, even during peak traffic.
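To give the CNAME option some shape, it’s just a zone record on a subdomain pointing at whatever hostname your CDN provider assigns you. The names below are made up:

; play.example.com is served via the CDN; the target hostname is a placeholder.
play.example.com.    3600    IN    CNAME    abc123.ssl.cf2.rackcdn.com.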

Actually I wanted to follow that with an even simpler solution. With Rackspace Cloud you’ve the option of file storage buckets, which are basically folders that you create directly on their cloud for hosting static files without any server at all. Other CDNs offer the same, but with Rackspace you can actually host entire websites this way, so long as you don’t need any dynamic scripts - so no PHP or ASP sites, basically.

With this approach, you could just dump your index.html and the assets into a storage bucket and point a domain directly there.

Ref my original response, my company is a Rackspace partner. I mentioned the option of migrating to us for hosting but that’s totally optional. If you’re just interested in using their CDN or cloud storage buckets drop me a PM and I’ll get a rep to give you a call.

The .htaccess rewrite was mostly a proof of concept. Our production implementation is actually NodeJS with Express doing the routing for the .bin and .data files, which we now host on Backblaze. The index.html that loads it all is dynamic, so we can’t host that on a CDN; it has to be generated on our server. You can check out our production implementation now at www.rosimosi.com. It’s been live only a couple of days, so let me know if you see anything broken!
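Not our exact code, but roughly the shape of it (the bucket URL, subscription check and HTML generation below are simplified placeholders):

// Express routes: the loader page is generated per request behind a
// subscription check, and the heavy build files get 302'd to the bucket.
const express = require('express');
const app = express();

// Placeholder public bucket URL.
const BUCKET_URL = 'https://f000.backblazeb2.com/file/my-example-bucket';

// Placeholder: in reality this looks the user up in our subscription system.
function hasActiveSubscription(req) {
  return Boolean(req.headers['x-demo-subscriber']);
}

// Placeholder: in reality this builds the Corona loader page for the game.
function buildIndexHtml(game) {
  return `<!DOCTYPE html><html><body>Loader for ${game}</body></html>`;
}

// Dynamic index.html, only for active subscribers.
app.get('/play/:game', (req, res) => {
  if (!hasActiveSubscription(req)) return res.redirect('/subscribe');
  res.send(buildIndexHtml(req.params.game));
});

// Redirect the large .bin/.data/.stash requests to remote storage.
app.get(/\.(bin|data|stash)$/, (req, res) => {
  res.redirect(302, BUCKET_URL + req.path);
});

app.listen(3000);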

Sorry, I’ve re-read this thread and realise now that you’d not misunderstood anything, just needed a more dynamic solution. My memory is apparently even more horrific than I thought!

For what you’re wanting, this does seem like a reasonable solution after all since you don’t seem worried about huge numbers of requests coming in.

One final idea though - rather than creating the requested index file on the fly every time that request comes in, why not create it every time the subscription renews and have a cron job set to delete it when the subscription expires? I.e. a daily cron that checks for expired accounts and deletes their associated index file. That way you could go back to the original iframe idea and if somebody does try to bookmark the direct URL and avoid renewals, it’ll just delete itself at the appropriate time anyway.

If you go for that idea, you’ll want to set a relatively low cache time within the CDN so that the CDN URL expires pretty soon after the page is deleted, but the subsequent request count would still be far lower than with your hosted routing solution.
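Just to show the shape of the cron side of it - the schedule, the paths and the expired-accounts lookup below are all placeholders:

// remove-expired-loaders.js - placeholder cleanup script. Run it daily, e.g.
// via crontab: 5 0 * * * /usr/bin/node /var/www/scripts/remove-expired-loaders.js
const fs = require('fs');

// Placeholder: in reality, query the billing database for lapsed accounts.
function getExpiredAccounts() {
  return [];
}

// Delete the pre-generated loader page for every lapsed account.
for (const account of getExpiredAccounts()) {
  const loader = `/var/www/loaders/${account.id}/index.html`;
  if (fs.existsSync(loader)) fs.unlinkSync(loader);
}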

thanks…