Making a Tilemap Shader

In another topic, Corona user @techdojo mentioned using a shader to render tilemaps on the GPU instead of drawing each tile separately, linking to http://blog.tojicode.com/2012/07/sprite-tile-maps-on-gpu.html and http://connorhollis.com/fast-tilemap-shader/ as references. Without going into the advantages or disadvantages of it, what would I, as a semi-beginner graphics programmer, need to know to get started with this? I’ve done some basic graphics programming (writing a skeleton of a graphics engine in C++ with OpenGL), but I’m not very well versed in shader programming techniques other than “Find the texture coordinate, shade with RGB, done”, along with some simple sine-coloring (read: demo sort of stuff) and fun with vertex shading (read: demo sort of stuff).

If anyone here is experienced with shaders, can I have some pointers? How would you go about porting the code from the two links above to Corona? Or do you have some general shader programming suggestions?

  • Caleb

Hi.

I don’t think I’m lucid enough to give any great help at the moment, but I’ll fish around for some info.

Did you by any chance watch any of the Corona Geek episodes on shaders? Not strictly necessary, but I could reference some stuff from those, if so.

If you haven’t, take a look at this and this (in particular the second half, there).

I don’t quite know Dusk’s requirements, but several of Toji’s caveats also came to my mind, so it would be good to know where it stands with respect to some of those. Connor’s code probably won’t be too far off from what you’d end up with, once it was switched over to GLSL.

The “entry sampler” is a lookup table, not pixels, so you will get very weird results with bilinear filtering. “nearest” mode to the rescue: texture keys.

You can make this texture, say, by arranging a bunch of 1-pixel rects, with the (0-based) tile index / 256 as the red component (or chopped up between red and green, 8 bits each of the index, then recover the index in-shader), into a line or grid, and display.save()'ing it.
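For illustration, a rough (untested) sketch of that setup, assuming a map table of 0-based indices plus mapWidth / mapHeight variables, and a made-up filename; exact pixel alignment and content scaling would still need care:

    -- One 1-pixel rect per tile, with the tile index encoded in the red channel.
    display.setDefault( "magTextureFilter", "nearest" )
    display.setDefault( "minTextureFilter", "nearest" )

    local group = display.newGroup()

    for row = 1, mapHeight do
      for col = 1, mapWidth do
        local index = map[row][col]                         -- 0-based tile index
        local px = display.newRect( group, col - 0.5, row - 0.5, 1, 1 )
        px:setFillColor( index / 256, 0, 0 )                -- red = index / 256
      end
    end

    -- Write the lookup texture out so a composite fill can load it later.
    display.save( group, { filename = "map_data.png", baseDir = system.DocumentsDirectory } )
    group:removeSelf()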

Connor lists six inputs apart from the samplers, while you have four vertex userdata. I described in a couple of those shows how one can bake constants (say, the number of tiles and dimensions) into a shader and just make one such shader per size. Otherwise, you have a challenge ahead.  :slight_smile:
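The mechanics of the baking are simple enough, since the kernel source is just a Lua string you build before registering the effect. A sketch (untested; the group, name, and constant names are all made up):

    local kernel = {
      category = "composite",
      group    = "dusk",
      name     = "tilemap_" .. mapWidth .. "x" .. mapHeight,  -- one effect per map size
    }

    kernel.fragment = ([[
      const P_UV float MAP_W     = %d.0;
      const P_UV float MAP_H     = %d.0;
      const P_UV float TILE_SIZE = %d.0;

      P_COLOR vec4 FragmentKernel( P_UV vec2 uv )
      {
        // ... tile lookup goes here (sketched further down the thread) ...
        return CoronaColorScale( texture2D( CoronaSampler0, uv ) );
      }
    ]]):format( mapWidth, mapHeight, tileSize )

    graphics.defineEffect( kernel )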

Let me know if that gets you anywhere. I can probably offer better insight when it’s not 1 AM.  :smiley:

Thanks for your ideas. I haven’t seen any shader Corona Geeks, do you have any in particular you’re thinking of? Otherwise, I can just go from the most recent and find 'em.

Let me walk through the way the shader will work theoretically.

In a “loading” phase (which Dusk runs once in your game), Dusk creates the data image for the map. Is Corona’s display.save accurate enough for this? I assume yes, and I’m not really well versed in this, but needing pixel-perfect accuracy in display saving using 1x1 rectangles feels a little like it’ll break. Is there another way to write images?

Another question: Could the data image be used to store information about the map? For example, use the first pixels in the data image to define the map’s width and height, the next to define information about the tileset (dimensions, tile size, maybe eventually margin and spacing), then have all the data for the tiles? I think if I could bake information about the map into the data image, that would fix the lack of inputs. Is this just wishful thinking?

Moving on, after the loading phase, the shader renders the map by…

  1. Finding the current tile in the map and its offset from the tile’s corner (i.e., which tile from the map is this pixel from? And how far along the tileset tile should this pixel be?)

  2. Finding the index of the tile in the data image (i.e., which tile from the tileset is this tile meant to be?)

  3. Sampling from the tileset image at the tile found in step 2, offset by the within-tile position from step 1 (see the sketch after this list)
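Here is roughly how I picture those three steps as a fragment kernel, assuming paint1 = tileset (CoronaSampler0), paint2 = data image (CoronaSampler1), the dimensions baked in as constants, and the index stored in the red channel as index / 256. All the names below are placeholders and none of it is tested:

    kernel.fragment = [[
      const P_UV float MAP_W      = 64.0;   // map size in tiles
      const P_UV float MAP_H      = 64.0;
      const P_UV float SHEET_COLS = 16.0;   // tileset size in tiles
      const P_UV float SHEET_ROWS = 16.0;

      P_COLOR vec4 FragmentKernel( P_UV vec2 uv )
      {
        // 1. Which map tile does this pixel fall in, and where inside that tile is it?
        P_UV vec2 mapPos     = uv * vec2( MAP_W, MAP_H );
        P_UV vec2 tileOffset = fract( mapPos );             // 0..1 within the tile

        // 2. Fetch the tile index from the data image (red = index / 256; the exact
        //    255-vs-256 scaling and rounding needs checking against what gets stored).
        P_UV vec2 dataUV = ( floor( mapPos ) + 0.5 ) / vec2( MAP_W, MAP_H );
        P_UV float index = floor( texture2D( CoronaSampler1, dataUV ).r * 256.0 + 0.5 );

        // 3. Convert the index to a tileset cell and sample it at the offset from step 1.
        P_UV vec2 cell    = vec2( mod( index, SHEET_COLS ), floor( index / SHEET_COLS ) );
        P_UV vec2 sheetUV = ( cell + tileOffset ) / vec2( SHEET_COLS, SHEET_ROWS );

        return CoronaColorScale( texture2D( CoronaSampler0, sheetUV ) );
      }
    ]]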

I like the TojiCode method of setting pixels in the data image to coordinates in the tileset (i.e. a R/G value of 1 and 2 means the tile here is tile 1,2). I’m also concerned that if I use Connor’s float-based approach, it won’t be accurate enough for larger tilesets. Alternatively, I could blend the two and use the tile’s R/G values as kind of a sixteen-bit index value for the tile’s GID with some simple bit manipulation. What are your thoughts on all this?

Also, could you explain what you mean by this:

The “entry sampler” is a lookup table, not pixels, so you will get very weird results with bilinear filtering. “nearest” mode to the rescue: texture keys.

I understand the difference between bilinear filtering and nearest filtering, but what about the part about the entry sampler?
 

Now, for the “nuts and bolts” of how the shader’ll work.

  • To use the shader, I’d set a screen-sized rectangle to have a fill of type “composite”, with paint1 = the tileset image and paint2 = the data image. Right? From the docs page, what does this mean:

Because of the way multi-texturing works, both paint1 and paint2 will be rendered using the same texture coordinates. Because GradientPaints and ImageSheetPaints use different texture coordinates from plain BitmapPaints, you will get unexpected results unless you use plain BitmapPaints for paint1 and paint2.

  • How can I send the content-X- and Y-position of the rectangle to the shader so that I can offset the drawing as needed?

Finally, as for the caveats -

Only supports square tiles aligned to a grid

That’s fine, that’s what Dusk currently does.

Layer rendering is back-to-front to ensure proper transparency, which means there can be quite a bit of unnecessary overdraw.

Not really a problem; layers in Dusk are rendered separately. When getting overdraw out of Dusk becomes my priority, I’ll be amazed.
 

Viewport offsets are currently snapped to the nearest pixel. Floating point offsets introduce artifacts at tile edges. (Might be able to fix this in the shader)

Artifacts are already a problem of Dusk - extruding a tileset usually fixes them.

This only really makes sense for static tiles. Anything that moves (player sprites, enemies, etc) would be drawn separately.

Tiles rendered with this shader will be way less flexible than tiles created separately. The idea would be that you could set a flag to let Dusk render a layer with the shader, thereby making that layer ultra-fast but less flexible. For layers like the background or scenery or whatever, this would be great. For layers like the ground or walls, you’d need it to be a “real” layer.
 

  • Caleb

Hi Caleb,

I thought this page was enlightening. 2D texture lookup seems to be what you need for copying images from a tileMap into another image.

http://www.shaderific.com/glsl-functions/

Hi.

The Corona Geek episodes were over a few weeks, around #150 or so. Skipped a week somewhere and did some half episodes in the end. Off-hand, the one with the glass bakes in some constants, basically taking advantage of shader source being conventional Lua strings, and touches on the uber-shader concept. The program covering shader precision might come in handy as well.

1-pixel rects are merely the minimum, just so long as you land on one of the pixels in question. If you consult the reference card, the way to get pixels back onto the C / Lua side is glReadPixels (), which does take the dimensions and such. I’m pretty sure there’s a thorough process for getting them back; if you want the nitty-gritty details, see the spec. Where this becomes troublesome, I think, is bringing content scaling into the picture.

If it’s useful, I do have PNG-saving code (the “operators” module is compatible with bit, so you could use that instead), with docs here. This will not have any of PNG’s compression, so if you want that too, open it up in some program and re-save it. :) It’s been a while since I’ve touched that, but I’ll try to answer any questions.

You certainly could put some of that information in the data texture. I guess the major point to make there is that, assuming the code flow looks like Connor’s, you would be doing a texture fetch, then immediately needing the results to do a second texture fetch. You also immediately need the results of the second (getting the index), of course, but there’s less getting around that one. So it can bottleneck on the read, and it’s a double whammy to boot. Something to consider if it ends up being slow.

That said, those values being constants, they’re perfectly suited to fetching in the vertex shader (which will be executed for each of your four vertices, versus hundreds of thousands of pixels) and feeding into the fragment shader through a varying. (“varying” is rather general terminology; you’d sample identical data in each vertex, so the per-pixel interpolation is a formality.) I haven’t tried this yet, so I don’t know if it “just works”, but ideally you’d just do

    if (gl_MaxVertexTextureImageUnits > 0) {
        // In vertex shader, fetch data stuff, pass into varying
        // In fragment shader, compute index from varying
    } else {
        // In vertex shader, nothing?
        // In fragment shader, do two fetches, compute index
    }

clauses and sample / not sample as appropriate in each kernel. Vertex texture fetch (VTF) isn’t guaranteed to be present on all hardware, thus the checks, but it’s not an exotic feature by any means. gl_MaxVertexTextureImageUnits is a built-in constant, so any reasonable driver should do static branching, much like an #if / #else / #endif combo, given the above type of code. Also, I believe VTF only uses “nearest”-style filtering.

Ufff, floating point…  :stuck_out_tongue:

I gave something of a primer on it during one of the shows, and assembled some links here. (WARNING: rabbit hole!) If you refer to the “qualifiers” section of the reference card you’ll see something about (minimum guaranteed) relative precisions, 2^-10 or 2^-16. Basically, the way floating point works, between each power of 2 you can exactly represent each such step along the way. So between, say, 32 and 64, we can represent 32, 32 + 32 * 1 / 1024, 32 + 32 * 2 / 1024, … This applies equally well to the negative powers of 2, i.e. 2^-14, 2^-13, … 2^-1, so we have, say, as per the reference, 14 or 62 intervals, of 1024 or 65536 values, respectively. So we can be quite accurate for numbers between -1 and 1.  :D I bring this up since your texture coordinates will have been normalized to a [0, 1] range on the shader side. Non-power-of-2 textures could be slightly off, but with filtering going on anyhow it might not really amount to anything.

On the 16-bit index, sounds good, although of course you’ll send them across as multiples of 1 / 256 and then recover them shader-side.
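Concretely, the split might look something like this (untested; someTileGID is a stand-in, and whether 255 or 256 is the right factor depends on how the data image gets quantized when it is written out):

    -- Lua side: write the 16-bit GID into the red (high byte) and green (low byte) channels.
    local gid = someTileGID
    local lo  = gid % 256
    local hi  = math.floor( gid / 256 )
    px:setFillColor( hi / 255, lo / 255, 0 )

    -- Shader side, inside the fragment kernel:
    --   P_UV vec4 entry = texture2D( CoronaSampler1, dataUV );
    --   P_UV float gid  = entry.r * 255.0 * 256.0 + entry.g * 255.0;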

Now that I’m looking for it I can’t find it, but “entry sampler” was just what one of the guys called the data texture’s sampler. And you don’t want bilinear filtering because you’d be looking up two or four tiles at once!  :slight_smile:

The only comment I’d make about the composite paint is that, if you do end up trying vertex texture fetch, you probably want the data texture in paint1 just in case only one vertex sampler is available.

The single set of texture coordinates won’t be an issue for you, since you’re coming up with your tile’s coordinates on the fly. This limitation rears its head instead, say, when trying to use an image sheet frame and a full texture together in the same effect, or two sheets.

The content coordinates are what you see in the vertex shader. Really, rather than making it screen-sized, the rect could just be of the normal size and in its proper position. The fragment shader will only receive those pixels that survive clipping anyhow.
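For what it’s worth, the Corona-side hookup might look something like this (the filenames and the effect name, reusing the made-up names from the earlier baking sketch for a 64 x 64 map, are placeholders; per the note above, you might swap the data image into paint1 if you end up trying vertex texture fetch):

    local mapRect = display.newRect( mapX, mapY, mapPixelWidth, mapPixelHeight )

    mapRect.fill = {
      type   = "composite",
      paint1 = { type = "image", filename = "tileset.png" },
      paint2 = { type = "image", filename = "map_data.png", baseDir = system.DocumentsDirectory },
    }

    mapRect.fill.effect = "composite.dusk.tilemap_64x64"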

@thomas6:

Thanks for the link. Looks useful.

@StarCrunch:

I think I’ll go with the PNG direct writing rather than screen capture - it also occurred to me that I might run into difficult-to-get-around issues with display saving if the screen is too small. Obviously, this generating-data-image step could take place in development rather than as a phase of running the app itself, but I think for the least amount of hair-pulling, I’ll just write the PNG data. I’ll look at your code for that.

As for doing two reads per pixel, hm. I’m a little concerned by the lack of universal VTF that you mentioned, and this sentence (from the link @thomas6 posted) exacerbates it:

Side note: On iOS devices texture lookup functionality is only available in the fragment shader.

Does this mean that I couldn’t do the data lookup in the vertex shader? I might just go with a “construct shader code” function in the engine which concatenates the config values and such into the shader source as constants, like you mentioned in your first paragraph.

As for floating-point math, if I understand you correctly, in short, this isn’t a problem?

  • Caleb

Hi Caleb,

Did some research myself, and the general consensus is that for filling pixels, you use the fragment shader - so I wouldn’t give vertex / fragment any further thought.

I think that the big goal of the experiment here is to fill one image with rectangular areas from another image. As soon as that works the rest is tedious but ultimately simple trigonometry and math.

@ thomas6

The name suggests it’s all about positions, but the vertex stage also allows for setting up other properties to be interpolated as we traverse the pixels between those positions. (In this context the values will be the same at each vertex, so they just get “interpolated” to themselves.)

Vertices are processed before the fragment shader is run, of course, and this stage is invoked FAR less frequently for all but the most trivial geometry (4 vertices for a rectangle, versus rect.width * rect.height pixels), so when possible it can be a good place to move some work. (Even when this doesn’t amount to any improvement speed-wise, every operation necessarily consumes a little bit of power…)

The basic motivation behind bringing it up here is that texture fetches already have some latency, and when the results are needed immediately, that could be exacerbated, something like stop-and-go traffic.
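In Corona terms, the plumbing is just a varying declared in both kernels; I haven’t verified this exact snippet, but the shape would be something like:

    kernel.vertex = [[
      varying P_UV vec2 v_TileInfo;

      P_POSITION vec2 VertexKernel( P_POSITION vec2 position )
      {
        // Same value at all four vertices, so the interpolation is a formality.
        v_TileInfo = vec2( 64.0, 64.0 );  // e.g. constants fetched or derived here
        return position;
      }
    ]]

    kernel.fragment = [[
      varying P_UV vec2 v_TileInfo;

      P_COLOR vec4 FragmentKernel( P_UV vec2 uv )
      {
        // ... use v_TileInfo instead of re-deriving it for every pixel ...
        return CoronaColorScale( texture2D( CoronaSampler0, uv ) );
      }
    ]]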

@ Caleb P

After a little reading, it sounds like VTF came and went on iOS, for whatever reason. But it seems to be back in more recent versions: see here (“OpenGL ES”) and here (“Use Textures for Larger Memory Buffers in Vertex Shaders”).

That said, it’s probably a bit of a distraction for now. I’d recommend just getting the shader up and running in the most straightforward way, keeping VTF in mind as a possible refinement down the road. It might even turn out that you’ll have spare inputs to feed in the information in question, say if some of the others can be derived in-shader from Corona’s environment variables (listed in the custom effects guide).

Likewise for the accuracy. As you said about overdraw, once you’re at the point where it’s your biggest problem, you’re in pretty good standing.  :slight_smile:

Ok. I’ll see what I can put together! Wish me luck!

  • Caleb

@StarCrunch:

I’m running into an issue with your PNG library. It’s working nicely when I use your bitwise operators library, but when I try to use plugin.bit, I get the following error:

    png_encoder.lua:152: bad argument #1 to 'char' (invalid value)
    stack traceback:
      [C]: in function 'char'
      /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:152: in function 'U32'
      /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:246: in function 'ToChunk'
      /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:285: in function </Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:253>
      (tail call): ?

I’m fairly sure I’m using the library correctly, because it works with your operators.lua library (which creates its own functions when the bit plugin isn’t found). Any time I change the require path to plugin.bit, it stops working.

[EDIT:] I’ve tracked it down to the bitwise bnot on line #242. Is something different between the two?

  • Caleb

Hi. Try doing a % 2^32 on the result of the bnot (). I think values larger than 2^31 will look like they’re negative when using plugin.bit, and bnot can arrive at those fairly easily.
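In code, that just means wrapping the call (assuming the usual require):

    local bit = require( "plugin.bit" )

    -- plugin.bit returns signed 32-bit results, so fold bnot () back into unsigned range.
    local function bnot32( x )
      return bit.bnot( x ) % 2^32
    end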

Excellent. Works perfectly. Onward!

  • Caleb

Hm. After I save the PNG data, Preview gives me a corrupted file error. Opening it with ImageOptim and compressing it fixes it. Is there some way to fix this?

[EDIT] Maybe it’s just Preview. Affinity is opening it fine. So will OpenGL mind?

[EDIT AGAIN] Trying to load the image with a composite paint fill gives me <Error>: ImageIO: PNG IHDR: CRC error. What to do?
 

[EDIT YET AGAIN] Using your Lua bit library gives the same error, only it won’t even open in ImageOptim.

  • Caleb

Uff, an emotional rollercoaster reading those edits.  :mellow:

Sanity check: all four bitwise operators belong to the respective library, in either case?

I thought I had run this through bit, but I must be mistaken. I do remember being able to reopen the files I did make, though. Hmm. In the situations where they “worked”, did they look right, once converted?

It sounds like some of those programs might simply be more forgiving of a bad CRC. I suspect the

    c = .5 * (c - bit)

line will also flake out when using bit, in cases where c looks negative (which will be pretty frequent, I’d imagine, given all the bxor ()'ing). See if

    local bit = band(c, 0x1)
    c = rshift(c, 1)
    if bit ~= 0 then

works any better.

In my original application I had some fairly normal-sized images and so this whole process could take some time, to the point where I actually eked out performance gains by forgoing function calls where possible. Unfortunately, trouble like this comes along for the ride.  :frowning:

OpenGL won’t ever actually play a part in this. libpng is probably somewhere in the middle, or something like it. I have some image processing stuff I’d like to revisit, so I’m actually considering wrapping both that and some flavor of libjpeg into a plugin, for saving and loading purposes. I don’t know how soon that might be, though. Others are more than welcome to beat me to the punch!

Apologies for this roadblock, which I hope isn’t dampening your enthusiasm for the end goal!

!!!

Boy do I feel dumb. I figured out the problem - I had Tiled exporting in a compressed format. It’s all working fine now. It did give a corrupted error even with the correct export format, but changing the c = line fixed it.

Carrying on again!

  • Caleb

Sorry for the delay, I’ve been pretty busy lately.

The actual writing of the shader was much easier than I expected. On the first day (the 21st) I managed to complete an initial tilemap shader. I’ve named it Nightshade (see what I did there?), and, though it’s currently kind of hacked into Dusk’s structure, it works! As soon as I clean things up, I’ll push a new branch to the repo and you shader wizards can give any optimization suggestions.

  • Caleb

Heh, it’s always the “dumb” things that get you. All too common has been the case where I find myself running into some bizarre roadblock just as I seem to be tying the last loose ends on a module, when suddenly everything goes wrong. After much futile searching, I worry either that I’ve made some tragic design decision along the way, or that I’m in over my head with the material. And then (I can think of at least three such occasions in the fairly recent past) it turns out, nope, I just had two objects reusing the same table and stomping over one another’s data, with hilarious results. *sigh*

Sounds good on the shader. I know whenever I finish the above-mentioned image processing stuff, something like this would be great. In that case, the “entry sampler” is a hash function based on the position, rather than a texture, but the rest of the process should be the same. (This is where the uber-shader concept rears its head!)

The easiest way to do this is to create a new layer containing the color areas, then apply a modified version of the effect to function as a “mask” shader (it uses the layer texture to modify the background).

Hm. I thought up a really great idea to expand Corona’s vertex data capacity by storing multiple variables in one using bit ops (oh how I hate jumping through hoops!), but it seems that Corona’s version of GLSL ES doesn’t support them. And it seems that I can’t use a #version directive because Corona prepends some code of its own to the front of the shader. Is there a way to support bitwise ops in a shader at all?

[EDIT] #extension GL_EXT_gpu_shader4 : enable makes the shader compile, but is this a good idea? Like will Corona keep this supported?

  • Caleb

Hmm, were you actually able to feed anything in successfully?  :) Looks like just a desktop define, though?

Extensions are sort of hit-and-miss, depending as they do upon the hardware capabilities, driver, etc. Bitwise operators come standard in GLES 3.0, so newer devices ought to have all the machinery available, but it will probably be some time still before Corona migrates to that.

What I do in-shader is use this logic to pry apart an encoded float into two 10-bit numbers. I encode those in Lua using this routine (built atop this stuff), but I’m still not certain the num = … line is exact.[1] (A few things in that file need wider testing.) I think with some rewriting it might also be possible to eke out 11-bit numbers (2048 x 2048) given the GLES-guaranteed minima, but I haven’t tried.
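The gist, greatly simplified, looks something like the following; this is only an illustration (not the linked routines) and it glosses over the precision questions:

    -- Pack two small non-negative integers into one float (assumes 0 <= b < 1024).
    local function pack( a, b )
      return a * 1024 + b
    end

    -- Shader-side unpack, using floor () / mod () since GLSL ES 1.00 has no bitwise ops:
    --   P_UV float a = floor( packed / 1024.0 );
    --   P_UV float b = mod( packed, 1024.0 );
    -- Whether this survives intact depends on the available float precision,
    -- which is exactly the concern above.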

After that you only have a couple more guaranteed bits, so I highly doubt 12-bit numbers will “Just Work”. That said, if such large ranges will be used for texture coordinates, it would be a weird thing to find a device that supports >= 1024-pixel dimensions but doesn’t provide more bits. Unfortunately the available ranges aren’t easily accessible just yet.

If you know the numbers will be 8-bit or less, there’s this older alternative. I’m not sure where the encoding function is, honestly. :) I could explain the idea behind it, though.

That said, for an effect like this, uniform userdata sound like they’ll be a good fit, whenever the interface is finally documented. I’m not sure if that stuff is still unstable or somebody just needs a little nudge to write it up.

[1] - I do have a test in mind.

Basically, a texture with 1024 x 1024 unique values, compared against as many 1-pixel rects, each sampled at the appropriate coordinate. The test would fail if any mismatch was found. Not quite as insane as it sounds, since everything’s static and so a few pixels at a time could be checked. Still, it sounds like a pain to write up.  :stuck_out_tongue: