Making a Tilemap Shader

Oh, bother. It’s just a desktop solution. On iPad, it says the extension is unavailable :(. The desktop version works perfectly, though, with bit magic and everything. It’s maddening - the shader works, I just need a way to pass more variables to it!

Since GLSL floats are at minimum 32 bits, I was able to set the bit shift distance to 16, and it seems to be working fine. The distance can be tweaked in settings.lua, though, if you think that’s a bad idea. In that case, why would using 16 bits be a problem, if we know that a float is at minimum 32 bits?

The current version is published as another branch in the GitHub repository, so check it out and see what you think.

  • Caleb

Hi. Sorry for the late reply.

It’s actually great to know that’s a viable option on desktop! I don’t particularly like my crazy solution, but it is what it is.  :D I feel your pain on the data limit, especially with such low-frequency effects. The great advantage of vertex userdata is that, as the name suggests, the inputs are passed along as part of the vertex, in a spare attribute. I assume the stock Corona “master effect” employs one attribute each for position, color, and vertex userdata. (I guess that does bring to mind that you could smuggle in data through setFillColor (), if it’s not otherwise being used…) This allows subsequent uses of the same shader to be batched, which is far cheaper than doing tons of draw calls, but in this case each layer amounts to its own draw anyway, so going through uniforms would be a wash.

For what it’s worth, there could very well be GLES extensions that offer the same. Here’s the list, if you care to do some digging: the Khronos OpenGL ES Registry. Typically these provide some symbols that can be #ifdef’d / #if defined(X) in your shader. They’ll also usually provide some friendly name so that on the Lua side you can do something like

local extensions_list = {}

for name in string.gmatch(system.getInfo("GL_EXTENSIONS"), "[_%w]+") do
  extensions_list[name] = true
end

local is_thing_supported = extensions_list["THING"] -- available?

Desktop GLSL might provide 32 bits (although I have my doubts about some integrated laptop GPUs), in which case it’s well and good, but mobile is not often so generous. 24-bit seems to be the minimum one can assume (thus some of my numbers, above). While GLES 3.0-focused, I found this to be a good analysis. Then again, if bit ops aren’t an option anyhow, not a big deal.

I’ll take a look at your code and see if I can incorporate the technique into Nightshade. I do like the idea of using setFillColor to get extra inputs. The problem is, the CoronaColorScale function (macro?) is all we get. I don’t even know where the color variables are stored.

What about multiplying and dividing by 2^n? Would that be efficient enough to be a viable alternative to real bitwise operators?

  • Caleb

Wow… I have to say, this whole thread is going way over my head :slight_smile:

Caleb, can you sort of summarize this in layman’s terms? It sounds to me like you have succeeded in building a tilemap by copying RGBA as a shader, right? I’m not following what else is needed, actually, because that sounds like it allows everything you’d want. Like I said, I’m not following well because it’s becoming too technical for my brain!

  1. Tiled stores the tiles a layer displays in the form of a long list of numbers, corresponding to each tile’s GID in a tileset.

  2. Dusk reads the tile numbers and generates an image based on them. Each tile GID is encoded in the image as a single pixel, with the pixel’s red value denoting the GID. A GID of 0 corresponds to a completely black pixel, and GIDs from then up make the pixels redder and redder. In other words, Dusk stores the layer’s tile data in an image. Dusk stores it in an image because GLSL can’t use a list of Lua numbers, but it can use pixels from a texture.

  3. To build Nightshade layers, Dusk creates a rectangle the size of the layer and tells it to shade itself using the Nightshade shader. The Nightshade shader takes two texture inputs: the data image created in step 2 and the tileset image the layer uses.

  4. Each time the shader colors a pixel in the rectangle, it finds the tile coordinate the pixel would fall into (e.g. for a 32x32 tileset, pixels 1 to 32 in X- and Y-axes would all be in tile coordinate [1, 1]), reads the red value at that location from the data image, and finds the tile in the tileset corresponding to the red-value-encoded GID (the image storage mechanism makes red values from 0-255, so a red value of 1 means tile 1 in the tileset, 2 means tile 2, etc.). Then it gets the correct pixel from that tile and tells the graphics engine to color the pixel in the layer rectangle that color.

As for all the bit-shifting trouble, that’s because Corona only allows developers to send 4 custom inputs to a shader. And how many do we need to render a tile layer? Width of the layer in pixels, height of the layer in pixels (because GLSL sends coordinates in the range of [0,1]), tileset width in tiles, tileset height in tiles, width of each tile in pixels, height of each tile in pixels, and, preferably, margin and spacing of the tileset. That makes at least 6 inputs required, and ideally 8. We can store multiple values in a single number by moving the bits over and using a 32-bit number (for desktop GLSL) as two 16-bit numbers, thereby giving us 8 inputs, but bit manipulation isn’t available in GLSL ES 2.0, which is what Corona uses. Thus, the bit-shifting alternative discussions.
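To make that a bit more concrete, the packing itself is easy on the Lua side; it’s splitting the values back apart inside the shader that wants bit ops (or a floor/divide imitation of them). Roughly, and just as a sketch:

-- Lua side: pack two 16-bit values into one number, i.e. (high << 16) | low
local function pack16(high, low)
	return high * 65536 + low
end

-- What the shader would have to do to split them apart without real bit
-- operators; written in Lua here, but floor() and mod() exist in GLSL too
local function unpack16(packed)
	local high = math.floor(packed / 65536) -- packed >> 16
	local low = packed % 65536              -- packed & 0xFFFF

	return high, low
end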

  • Caleb

Ah yes… Okay, I see. Thanks for the superclear update!

To add to Caleb’s excellent summary, the difference in my technique for packing two values is that it encodes the integers[1] using exactly representable floating-point values (refer to some of those links I mention above), then decodes them on the shader side accordingly.

Unfortunately, I’m stuck with what GLES 2.0 gives me, so I’m still unsure whether I have an exact decoding or just a good approximation;  thus the mention of a test. I’m at least heartened that it no longer wildly freaks out on some values.  :D  (To give an analogy, think about trying different identities for, say, 1 - cos(x). Mathematically they’re all equal, but in real-world computing, some might give more accurate / stable results than others. In this case I’m aiming for an equation that is 100% exact.)

[1] - Up to 20 bits, so  for example two integers from 0 to 1023, but able to just sneak in 1024 as well on a lowest-common-denominator device. Often these will only temporarily be integers, e.g. having undergone a math.floor(x * 1024) transformation, which will be undone once in the shader.
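As a sketch of the round trip (ignoring the x == 1 edge case, and with made-up names):

-- Two normalized values in [0, 1) become temporary integers, 0-1023 each...
local xi, yi = math.floor(x * 1024), math.floor(y * 1024)

-- ...and get packed into one number that is still exactly representable (20 bits' worth)
local packed = xi * 1024 + yi

-- Shader-side (Lua stand-in): split them apart again and undo the scaling
local xOut = math.floor(packed / 1024) / 1024
local yOut = (packed % 1024) / 1024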

Wow, it’s been a while. I’ve decided that, if uniform userdata is coming fairly soon, I’ll just wait and go with that for clarity and “normal Corona-ness.” I posted a topic about it here: https://forums.coronalabs.com/topic/59706-uniform-userdata/

  • Caleb

If you were willing to trade off flexibility, might you require that the tileset scheme be one of a few supported configurations, then just pass a single number to describe it? (That would be a 4-for-1 deal; otherwise you’d fall back to conventional tile rendering for “atypical” tileset schemes.)

for example,

tileset scheme “1” might decode to:  256x256 sheet, 16x16 tiles

tileset scheme “2” might decode to:  512x512 sheet, 32x32 tiles

tileset scheme “3” might decode to:  512x512 sheet, 30x30 tiles, border 1 for aa edge extrude

etc
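On the Lua side that would just be a lookup table, something like (numbers made up to match the above):

-- Hypothetical presets; the scheme number is the single value passed to the shader
local tilesetSchemes = {
	[1] = { sheetSize = 256, tileSize = 16 },
	[2] = { sheetSize = 512, tileSize = 32 },
	[3] = { sheetSize = 512, tileSize = 30, border = 1 }, -- extruded for AA edges
}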

Hm. Interesting idea. I’ll think through it. Maybe a better way would be to encode only certain values as presets. That is, the layer width and height could be normal numbers, but the tile size could be presets, 'cause they’re usually 16x16/32x32/64x64. Or, alternatively, I could assume people use square tiles and store tile size in one variable. Good stuff to think about; thanks for bringing this approach up.

  • Caleb

Hi.

I don’t think I’m lucid enough to give any great help at the moment, but I’ll fish around for some info.

Did you by any chance watch any of the Corona Geek episodes on shaders? Not strictly necessary, but I could reference some stuff from those, if so.

If you haven’t, take a look at this and this (in particular the second half, there).

I don’t quite know Dusk’s requirements, but several of Toji’s caveats also came to my mind, so it would be good to know where it stands with respect to some of those. Connor’s code probably won’t be too far off from what you’d end up with, once it was switched over to GLSL.

The “entry sampler” is a lookup table, not  pixels, so you will get very weird results with bilinear filtering. “nearest” mode to the rescue: texture keys 

You can make this texture, say, by arranging a bunch of 1-pixel rects, with the (0-based) tile index / 256 as the red component (or chopped up between red and green, 8 bits each of the index, then recover the index in-shader), into a line or grid, and display.save()'ing it.
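A rough, untested sketch of that (the map dimensions and the tiles list are assumed to exist, and content scaling is ignored):

local group = display.newGroup()

for i = 1, mapWidth * mapHeight do
	local col = (i - 1) % mapWidth
	local row = math.floor((i - 1) / mapWidth)
	local pixel = display.newRect(group, col + .5, row + .5, 1, 1)

	-- 0-based tile index in the red channel; / 255 here so the saved byte equals the index
	pixel:setFillColor(tiles[i] / 255, 0, 0)
end

display.save(group, { filename = "layer_data.png", baseDir = system.DocumentsDirectory })
group:removeSelf()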

Connor lists six inputs apart from the samplers, while you have four vertex userdata. I described in a couple of those shows how one can bake constants (say, the number of tiles and dimensions) into a shader and just make one such shader per size. Otherwise, you have a challenge ahead.  :slight_smile:

Let me know if that gets you anywhere. I can probably offer better insight when it’s not 1 AM.  :smiley:

Thanks for your ideas. I haven’t seen any of the shader Corona Geek episodes; do you have any in particular in mind? Otherwise, I can just go from the most recent and find ’em.

Let me walk through the way the shader will work theoretically.

In a “loading” phase (which Dusk runs once in your game), Dusk creates the data image for the map. Is Corona’s display.save accurate enough for this? I assume so, though I’m not well versed in this; needing pixel-perfect accuracy when saving a display of 1x1 rectangles feels like something that could break. Is there another way to write images?

Another question: Could the data image be used to store information about the map? For example, use the first pixels in the data image to define the map’s width and height, the next to define information about the tileset (dimensions, tile size, maybe eventually margin and spacing), then have all the data for the tiles? I think if I could bake information about the map into the data image, that would fix the lack of inputs. Is this just wishful thinking?

Moving on, after the loading phase, the shader renders the map by…

  1. Finding the current tile in the map and its offset from the tile’s corner (i.e., which tile from the map is this pixel from? And how far along the tileset tile should this pixel be?)

  2. Finding the index of the tile in the data image (i.e., which tile from the tileset is this tile meant to be?)

  3. Sampling from the tileset image based on the offset we got from step 1

I like the TojiCode method of setting pixels in the data image to coordinates in the tileset (i.e. a R/G value of 1 and 2 means the tile here is tile 1,2). I’m also concerned that if I use Connor’s float-based approach, it won’t be accurate enough for larger tilesets. Alternatively, I could blend the two and use the tile’s R/G values as kind of a sixteen-bit index value for the tile’s GID with some simple bit manipulation. What are your thoughts on all this?
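In shader terms, I’m picturing the fragment kernel looking roughly like this (pseudo-GLSL sketch; the two constants at the top stand in for exactly the inputs we’ve been struggling to pass, and here paint1 is the tileset, paint2 the data image):

// Made-up numbers standing in for the hard-to-pass inputs:
const P_UV vec2 MAP_TILES = vec2(100., 100.);  // layer size, in tiles
const P_UV vec2 SHEET_TILES = vec2(8., 8.);    // tileset size, in tiles

// CoronaSampler0 = paint1 (tileset), CoronaSampler1 = paint2 (data image)
P_COLOR vec4 FragmentKernel(P_UV vec2 uv)
{
	P_UV vec2 mapPos = uv * MAP_TILES;      // position, measured in tiles
	P_UV vec2 tileCoord = floor(mapPos);    // 1. which map tile is this pixel in...
	P_UV vec2 offset = fract(mapPos);       //    ...and how far into that tile?

	// 2. read the tile's index out of the data image (red channel, 0-255)
	P_COLOR vec4 entry = texture2D(CoronaSampler1, (tileCoord + .5) / MAP_TILES);
	P_UV float index = floor(entry.r * 255. + .5);

	// 3. turn the index into a cell in the tileset and sample that cell
	P_UV vec2 cell = vec2(mod(index, SHEET_TILES.x), floor(index / SHEET_TILES.x));

	return CoronaColorScale(texture2D(CoronaSampler0, (cell + offset) / SHEET_TILES));
}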

Also, could you explain what you mean by this:

The “entry sampler” is a lookup table, not  pixels, so you will get very weird results with bilinear filtering. “nearest” mode to the rescue: texture keys

I understand the difference between bilinear filtering and nearest filtering, but what about the part about the entry sampler?
 

Now, for the “nuts and bolts” of how the shader’ll work.

  • To use the shader, I’d set a screen-sized rectangle to have a fill of type “composite”, with paint1 = the tileset image and paint2 = the data image. Right? From the docs page, what does this mean:

Because of the way multi-texturing works, both paint1 and paint2 will be rendered using the same texture coordinates. Because GradientPaints and ImageSheetPaints use different texture coordinates from plain BitmapPaints, you will get unexpected results unless you use plain BitmapPaints for paint1 and paint2.

  • How can I send the content-X- and Y-position of the rectangle to the shader so that I can offset the drawing as needed?

Finally, as for the caveats -

Only supports square tiles aligned to a grid

That’s fine, that’s what Dusk currently does.

Layer rendering is back-to-front to ensure proper transparency, which means there can be quite a bit of unnecessary overdraw.

Not really a problem; layers in Dusk are rendered separately. When getting overdraw out of Dusk becomes my priority, I’ll be amazed.
 

Viewport offsets are currently snapped to the nearest pixel. Floating point offsets introduce artifacts at tile edges. (Might be able to fix this in the shader)

Artifacts are already a problem of Dusk - extruding a tileset usually fixes them.

This only really makes sense for static tiles. Anything that moves (player sprites, enemies, etc) would be drawn separately.

Tiles rendered with this shader will be way less flexible than tiles created separately. The idea would be that you could set a flag to let Dusk render a layer with the shader, thereby making that layer ultra-fast but less flexible. For layers like the background or scenery or whatever, this would be great. For layers like the ground or walls, you’d need it to be a “real” layer.
 

  • Caleb

Hi Caleb,

I thought this page was enlightening. 2D texture lookup seems to be what you need for copying images from a tileMap into another image.

http://www.shaderific.com/glsl-functions/

Hi.

The Corona Geek episodes were over a few weeks, around #150 or so. Skipped a week somewhere and did some half episodes in the end. Off-hand, the one with the glass bakes in some constants, basically taking advantage of shader source being conventional Lua strings, and touches on the uber-shader concept. The program covering shader precision might come in handy as well.
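Since the kernels are just Lua strings, the baking boils down to something like this (sketch only; the effect name, group, and the kernel body are placeholders):

local function defineTilemapEffect (name, mapW, mapH, sheetW, sheetH)
	graphics.defineEffect{
		category = "composite", group = "dusk", name = name,

		fragment = ([[
			const P_UV vec2 MAP_TILES = vec2(%d., %d.);
			const P_UV vec2 SHEET_TILES = vec2(%d., %d.);

			P_COLOR vec4 FragmentKernel (P_UV vec2 uv)
			{
				// ...the lookup logic, using the baked-in constants...
				return CoronaColorScale(texture2D(CoronaSampler0, uv));
			}
		]]):format(mapW, mapH, sheetW, sheetH)
	}
end

defineTilemapEffect("tilemap_100x100_8x8", 100, 100, 8, 8)
-- and later: rect.fill.effect = "composite.dusk.tilemap_100x100_8x8"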

1-pixel rects are merely the minimum, just so long as you land on one of the pixels in question. If you consult the reference card, the way to get pixels back onto the C / Lua side is glReadPixels (), which does take the dimensions and such. I’m pretty sure there’s a thorough process getting them back; if you want the nitty gritty details, see the spec. Where this becomes troublesome, I think, is bringing content scaling into the picture.

If it’s useful, I do have PNG-saving code (the “operators” module is compatible with bit , so you could use that instead), with docs here. This will not have any of PNG’s compression, so if you want that too, open it up in some program and re-save it. :) It’s been a while since I’ve touched that, but I’ll try to answer any questions.

You certainly could put some of that information in the data texture. I guess the major point to make there is that, assuming the code flow looks like Connor’s, you would be doing a texture fetch, then immediately needing the results to do a second texture fetch. You also immediately need the results of the second (getting the index), of course, but there’s less getting around that one. So it can bottleneck on the read, and it’s a double whammy to boot. Something to consider if it ends up being slow.

That said, those values being constants, they’re perfectly suited to fetching in the vertex shader (which will be executed for each of your four vertices, versus hundreds of thousands of pixels) and feeding into the fragment shader through a varying. (“varying” is rather general terminology; you’d sample identical data in each vertex, so the per-pixel interpolation is a formality.) I haven’t tried this yet, so I don’t know if it “just works”, but ideally you’d just do

if (gl_MaxVertexTextureImageUnits > 0) {
  // In vertex shader, fetch data stuff, pass into varying
  // In fragment shader, compute index from varying
} else {
  // In vertex shader, nothing?
  // In fragment shader, do two fetches, compute index
}

clauses and sample / not sample as appropriate in each kernel. Vertex texture fetch (VTF) isn’t guaranteed to be present on all hardware, thus the checks, but it’s not an exotic feature by any means. gl_MaxVertexTextureImageUnits is a built-in constant, so any reasonable driver should do static branching, much like an #if / #else / #endif combo, given the above type of code. Also, I believe VTF only uses “nearest”-style filtering.

Ufff, floating point…  :stuck_out_tongue:

I gave something of a primer on it during one of the shows, and assembled some links here. (WARNING: rabbit hole!) If you refer to the “qualifiers” section of the reference card you’ll see something about (minimum guaranteed) relative precisions, 2^-10 or 2^-16. Basically, the way floating point works, between each power of 2, you can exactly represent each such step along the way. So between, say, 32 and 64, we can represent 32, 32 + 32 * 1 / 1024, 32 + 32 * 2 / 1024, … This applies equally well to the negative powers of 2, i.e. 2^-14, 2^-13, … 2^-1, so we have, say, as per the reference, 14 or 62 intervals, of 1024 or 65536 values, respectively. So we can get quite accurate for numbers between -1 and 1.  :D I bring this up since your texture coordinates will have been normalized to a [0, 1] range on the shader side. Non-power-of-2 textures could be slightly off, but with filtering going on anyhow it might not really amount to anything.

On the 16-bit index, sounds good, although of course you’ll send them across as multiples of 1 / 256 and then recover them shader-side.

Now that I’m looking for it I can’t find it, but “entry sampler” was just what one of the guys called the data texture’s sampler. And you don’t want bilinear filtering because you’d be looking up two or four tiles at once!  :slight_smile:

The only comment I’d make about the composite paint is that, if you do end up trying vertex texture fetch, you probably want the data texture in paint1 just in case only one vertex sampler is available.
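So the setup on the Lua side would look roughly like this (filenames, sizes, and the effect name are all made up):

local rect = display.newRect(display.contentCenterX, display.contentCenterY, 1024, 768)

rect.fill = {
	type = "composite",
	paint1 = { type = "image", filename = "layer_data.png", baseDir = system.DocumentsDirectory },
	paint2 = { type = "image", filename = "tileset.png" },
}

-- with the data image in paint1, it shows up as CoronaSampler0 in the kernel
rect.fill.effect = "composite.custom.tilemap"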

The single set of texture coordinates won’t be an issue for you, since you’re coming up with your tile’s coordinates on the fly. This limitation rears its head instead, say, when trying to use an image sheet frame and a full texture together in the same effect, or two sheets.

The content coordinates are what you see in the vertex shader. Really, rather than making it screen-sized, the rect could just be of the normal size and in its proper position. The fragment shader will only receive those pixels that survive clipping anyhow.

@thomas6:

Thanks for the link. Looks useful.

@StarCrunch:

I think I’ll go with the PNG direct writing rather than screen capture - it also occurred to me that I might run into difficult-to-get-around issues with display saving if the screen is too small. Obviously, this generating-data-image step could take place in development rather than as a phase of running the app itself, but I think for the least amount of hair-pulling, I’ll just write the PNG data. I’ll look at your code for that.

As for doing two reads per pixel, hm. I’m a little concerned by the lack of universal VTF that you mentioned, and this sentence (from the link @thomas6 posted) exacerbates it:

Side note: On iOS devices texture lookup functionality is only available in the fragment shader.

Does this mean that I couldn’t do the data lookup in the vertex shader? I might just go with a “construct shader code” function in the engine which concatenates the config values and such into the shader source as constants, like you mentioned in your first paragraph.

As for floating-point math, if I understand you correctly, in short, this isn’t a problem?

  • Caleb

Hi Caleb,

I did some research myself, and the general consensus is that for filling pixels, you use the fragment shader - so I wouldn’t give vertex / fragment any further thought.

I think that the big goal of the experiment here is to fill one image with rectangular areas from another image. As soon as that works the rest is tedious but ultimately simple trigonometry and math.

@ thomas6

The name suggests it’s all about positions, but the vertex stage also allows for setting up other properties to be interpolated as we traverse the pixels between those positions. (In this context the values will be the same at each vertex, so they just get “interpolated” to themselves.)

Vertices are processed before the fragment shader is run, of course, and this stage is invoked FAR less frequently for all but the most trivial geometry (4 vertices for a rectangle, versus rect.width * rect.height pixels), so when possible it can be a good place to move some work. (Even when this doesn’t amount to any improvement speed-wise, every operation necessarily consumes a little bit of power…)

The basic motivation behind bringing it up here is that texture fetches already have some latency, and when the results are needed immediately, that could be exacerbated, something like stop-and-go traffic.

@ Caleb P

After a little reading, it sounds like VTF came and went on iOS, for whatever reason. But it seems to be back in more recent versions: see here (“OpenGL ES”) and here (“Use Textures for Larger Memory Buffers in Vertex Shaders”).

That said, it’s probably a bit of a distraction for now. I’d recommend to just get the shader up and running in the most straightforward way, keeping VTF in mind as a possible refinement down the road. It might even turn out that you’ll have spare inputs to feed in the information in question, say if some of the others can be derived in-shader from Corona’s environment variables (listed in the custom effects guide).

Likewise for the accuracy. As you said about overdraw, once you’re at the point where it’s your biggest problem, you’re in pretty good standing.  :slight_smile:

Ok. I’ll see what I can put together! Wish me luck!

  • Caleb

@StarCrunch:

I’m running into an issue with your PNG library. It’s working nicely when I use your bitwise operators library, but when I try to use plugin.bit, I get the following error:

png_encoder.lua:152: bad argument #1 to 'char' (invalid value)
stack traceback:
  [C]: in function 'char'
  /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:152: in function 'U32'
  /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:246: in function 'ToChunk'
  /Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:285: in function </Users/calebplace/Desktop/github/dusk/Dusk/dusk_core/shader/png_encoder.lua:253>
  (tail call): ?

I’m using the library correctly, because it works with your operators.lua library when the bit plugin isn’t found and it creates its own functions. Any time I change the require path to plugin.bit, it stops working.

[EDIT:] I’ve tracked it down to the bitwise bnot on line #242. Is something different between the two?

  • Caleb

Hi. Try doing a % 2^32 on the result of the bnot (). I think values larger than 2^31 will look like they’re negative when using plugin.bit, and bnot can arrive at those fairly easily.
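In code terms, roughly (the variable name is just a placeholder):

local bit = require("plugin.bit")

-- plugin.bit hands back signed 32-bit results, so anything with the high bit
-- set "looks" negative; the modulo maps it back into the 0 .. 2^32 - 1 range
local value = bit.bnot(x) % 2^32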