Corona code optimisation

Hey Rob,I do have a couple of technical questions that I hope you (or the Corona team can answer)

 

  1. Is there anyway of getting a break down of lua memory used (particularly arrays and metatables)?  Texture memory is fairly easy to calculate (based on graphic size in 32bit).  For example, My game uses 100mb lua memory as a baseline for an empty city of 130x130 tiles.   Each tile is represented in memory as a metatable that contains multiple displayGroups, variables, arrays and functions.  This essentially makes a 4D array of data that defines my game world.

 

With a playable city the texture memory can easily reach another 50mb (with non 2x graphics, upwards of 200MB with HD graphics) and this is a problem for 512mb devices and often means the app is terminated by the OS.

 

I did try a paid corona profiler but it just kept falling over (out of memory exceptions) and crashing my laptop. Something native like getMemoryUsed(object) would be really great!

 

 

  1. My game uses multiple 2D arrays of boolean values because in my experience (in multiple languages) this is faster in random access than larger 3D arrays when accessed continually (similar to unrolling the loop).  This also makes the code more readable as I can create an array called  isTileARoad  and then access it if isTileARoad[x][y] then … end to quickly ascertain if the tile at x,y is a road, etc. 

 

But in lua is this more efficient; scan multiple simple 2D boolean arrays or a more complex 3D array aggregating these values? Also, which is more memory efficient in Corona, 5 x 2D arrays or a 3D array where the 3rd index contains the 5 boolean values?

 

i.e.

if array1[x][y] and array2[x][y] then DoSomething end

or

if array[x][y].value1 and array[x][y].value2 then DoSomething end

  1. I’ve noticed that if I leave my game running without any activity other than my enterFrame() listener then memory creeps up.  In the time it has taken to write this email my lua memory has gone from 103mb to 132mb.  I do declare a few local variables in my enterFrame() so I assume the memory creep is because garbage collection is not clearing them down even though they have gone out of scope?  

 

There is no load as I am running at 30fps (my laptop could easily sustain 120fps) so when does garbage collection kick in?   I thought it was on spare CPU cycles or is it only when you call collectgarbage()?

 

I appreciate these are probing questions but I am trying to understand Corona at a low level so I can optimise my code as much as possible!

 

Thanks

Adrian

Hi.

  1. My game uses multiple 2D arrays of boolean values because in my experience (in multiple languages) this is faster in random access than larger 3D arrays when accessed continually (similar to unrolling the loop). 

The deeper your data structure, the more you’re likely to be hopping around in memory. To get a deeper sense of the issues this can cause, I’d recommend reading this and watching this. You can even flatten down to a 1D array, from 2D as index = y * width + x , and from 3D as index = (z * height + y) * width + x. Naturally this has its own trade-offs, and you’ll have to decide if / when they’re worth it.

Array access will generally win out over general keys (strings, tables, etc.) if you’re accessing elements in order, though a little less so in Lua than C / C++ since the type must also be stored and thus things they won’t pack quite so tightly. (You did ask for low-level.  :))

If you are doing a lot of ifs in a loop, you might fall afoul of branch misprediction, where you’re asking the same question and getting yes or no randomly in response, frustrating the processor’s attempts to take advantage of patterns. See this for a good overview. Off-hand I don’t know how closely a Lua branch maps to a C one.

Anyhow, these will not always matter (profiling!), but if they hit you they can really hit hard.

Lua memory would tend to climb as you create tables, functions (do you do any function X () --[[stuff]] end in your enterFrame?), concatenate strings, etc. If it never stops growing, you have an issue, maybe a leak. But it might very well level off.

I never heard that device can take 100MB of Lua (system) memory. I beleive you are talking about Texture memory and not Lua memory.
My iPad mini 2 would crash after 4MBs.

(prefaced with all the usual cautions about don’t optimize too early, etc)

assuming all your variables are local…

any form of 2D access (t[x][y], t[x][“a”], t[x].a) takes 2 vm instructions to derefernce.

any form of 3D access (t[x][y][z], t[x][y][“a”], t[x][y].a) takes 3 vm instructions to derefernce.

and, aside, assuming your arrays aren’t sparse, numeric indices perform better than hashkeys

however, it’s not quite as simple as “flatter tables perform better”, because the 3D form t[x][y].a.would allow you to alias the first two dereferences if you’d be using the value repeatedly, fe:

local cell = t[x][y] if (cell.a or cell.b or cell.c) then... -- ==5 vm instructions for the table dereferencing

so would be “better” than a flat method with three 2D tables:

if (ta[x][y] or tb[x][y] or tc[x][y]) -- ==6 vm instructions for the table dereferencing

in general, the more you’ll be using that “cell”, the better off you’d be making the cell “fat” and aliasing to it just once.

if it’s truly just twice, as in your example, then it’s a wash, do as you please.  (4 vm instructions either way)

Thanks for your input guys…  I’ve tried a lot of things to reduce memory footprint.  Things like localising functions so using doSomething(self) rather than 17k instances of doSomething() for my main 4D array but there was no noticeable change in memory footprint or execution speed.  I think the Corona compiler does a good job realising where this happens and optimising accordingly.

My biggest gain to FPS is to not rely on Corona culling off screen graphics and handling this manually - splitting my 130x130 tiles into smaller quads and doing bounds checking on each quad.

My app is big (think Sim CIty 3000) and I guess 200-350MB is not actually that bad.  Shot showing my game and the memory usage (the selected building is 384x1448 in retina).

Image1.png

How are you computing the Lua memory usage? (copy/paste the code would be helpful)

Rob

Sure, I am using this…

local memoryUsed = "Tex: "..mfloor(system.getInfo("textureMemoryUsed")/1048576).."mb Lua:"..mfloor(collectgarbage("count")/1024).."mb"

Lua usually (depends a bit on how the VM was compiled) needs at least 16bytes per value in a table used as an array. This means, if you use an array to store a single boolean you’re wasting 99% of the memory, i.e. you use 1 bit and waste another 127. That’s just a massive loss and in addition is also close to the worst cash trashing you can do to a modern CPU.

F.i. a simple 128x128x128 true/false array consumes 32MB while the actual data is only 256kb.

Sadly I haven’t had the time to use Corona so I’m not sure which version of Lua is included. I guess it’s one without real integer support and no bitwise operators available so what you probably should use is the BitOp plugin https://marketplace.coronalabs.com/plugin/bit and store multiple booleans of a tile in a single value of your arrays. Seems you can use up to 32bits in a value using this which, depending on how many of these boolean arrays you have, might shrink you memory requirements down by 96%.

We are using a modified version of Lua 5.1.

I’ve learnt a few tricks while optimising my game.

In terms of memory usage the best one was this - say you have a large table of objects, in my case football players. I might have 6,000 loaded at once, each with a large number of properties such as .name, .age, .club, .wage etc.

Storing it in memory like this:

[lua]

player[1] = {name = “Tony Fork”, age = 30, club = “Crystal Palace”, wage=  2500}

player[2] = {name = “Winston Risk”, age = 20, club = “Fulham”, wage = 250}

[/lua]

Is nearly 4x as expensive in terms of memory than this:

[lua]

player.name = {“Tony Fork”, “Winston Risk”}

player.age = {30, 20}

player.club = {“Crystal Palace”, “Fulham”}

player.wage = {2500, 250}

[/lua]

Because disk access on Windows is so slow compared to mac, especially on IDE drives, I’ve had to move from accessing data from an SQL table on the fly, to holding most of it in memory and loading/saving in bulk when necessary.

I’ve got a number of other tips which are focussed more on optimising for speed rather than memory usage.

  1. Populating large arrays

[lua]

local arr = {}

local arrCnt = 1

for a = 1, 50000, 1 do

  arr[arrCnt] = {5, 1, 0, 0, “stuff”}

  arrCnt = arrCnt + 1

end

[/lua]

is over 3.5 faster than:

[lua]

local arr = {}

for a = 1, 50000, 1 do

 arr[#arr+1] = {5, 1, 0, 0, “stuff”}

end

[/lua]

  1. Populating small arrays

If you know how exactly big a table is going to be in advance, but don’t yet have the data to fill it yet.

[lua]

local arr = {true, true, true, true}

for a = 1, 4, 1 do

 arr[a] = someCalculation(a)

end

[/lua]

Is quicker than:

[lua]

local arr = {}

for a = 1, 4, 1 do

 arr[#arr+1] = someCalculation(a)

end

[/lua]

  1. Building large queries from strings

You might be building a dynamic query to interrogate an online DB or a local SQLlite DB, with a number of parameters, using the “…” concatenate command to build up the query string.

Putting all the components of the query into a table, and then creating the final query string at the end using table.concat is noticeably faster. This is vital in places where I’m doing up to 80,000 dynamic inserts or updates on an SQLlite table.

  1. Avoid while loops

If you need to loop until a certain criteria is fulfilled, a while loop is handy but expensive.

[lua]

for a = 1, 100000, 1 do

  local b = math.random(1,1000)

  if b == 50 then

    break

  end

end

[/lua]

is much faster than (assuming the 50 is found on the same iteration):

[lua]

local b = 0

while b ~= 50 do

  b = math.random(1,1000)

end

[/lua]

  1. Avoid inPairs, if possible

Say I am iterating over all the attributes a footballer has, to calculate his influence on the next passage of play:

att.drb = 15

att.pas = 7

att.tac = 12

It is much more efficient to either store these as a numbered array rather than keys (which uses less memory but is less easy to debug later), or store the key names in a numbered array and iterate that way.

SLOW:

[lua]

local inf = 0

local fac = {drb = 0.4, pas = 1, tac = 0.2}

for k, v in pairs(att) do

  inf = v * fac[k]

end

[/lua]

FASTER:

[lua]

local inf = 0

local fac = {drb = 0.4, pas = 1, tac = 0.2}

local nms = {“drb”, “pas”, tac"}

for a = 1, #nms, 1 do

  local k = nms[a]

  local v = att[k]

   inf = v * fac[k]

end

[/lua]

I’m now really confused about memory allocations…  Take this simple test code

local mfloor = math.floor print("boolean array 1000x1000") print("start:"..mfloor(collectgarbage("count")/1024).."mb") local a, b, c = {}, {}, {} for i = 1, 1000 do a[i] = {} for j = 1, 1000 do a[i][j] = true end end print("end:"..mfloor(collectgarbage("count")/1024).."mb") print("int array 1000x1000") print("start:"..mfloor(collectgarbage("count")/1024).."mb") for i = 1, 1000 do b[i] = {} for j = 1, 1000 do b[i][j] = 100 end end print("end:"..mfloor(collectgarbage("count")/1024).."mb") print("string array 1000x1000") print("start:"..mfloor(collectgarbage("count")/1024).."mb") for i = 1, 1000 do c[i] = {} for j = 1, 1000 do c[i][j] = "Lorem ipsum dolor sit amet, pri ad impetus eleifend, ut sanctus appellantur sea. Ferri vivendum eam an, pri nonumy omnium persequeris no, sed in virtute salutatus. An discere definitiones pri. Accusamus splendide qui ei, mei in error corpora vituperatoribus. Te quot indoctum disputando vix, vix at vivendum moderatius." end end print("end:"..mfloor(collectgarbage("count")/1024).."mb")

and the corresponding output

17:01:53.828 boolean array 1000x1000 17:01:53.828 start:0mb 17:01:53.875 end:15mb 17:01:53.875 int array 1000x1000 17:01:53.875 start:15mb 17:01:53.922 end:31mb 17:01:53.922 string array 1000x1000 17:01:53.922 start:31mb 17:01:53.969 end:47mb

It seems that simple arrays of boolean and integers do indeed allocate the same amount of memory (15Mb each) so what Michael said seems to be proved by test and that arrays of boolean are massively wasteful!

I don’t understand why the array of long strings is only just larger than the array of booleans?

@nick_sherman - good stuff :slight_smile:

the trouble with posting optimization tips is that you’ll tend to attract pesky nitpickers :smiley:

(truthfully though, any appearance of just being a pesky-nitpicker is apologized for in advance)

  1.  arrCnt isn’t needed

  2.  {nil,nil,nil,nil} is faster than {true,true,true,true}

  3.  spot on

  4.  while loops can perform better than “for i” in certain cases

  5.  clever indirection :slight_smile:

your string is only stored once, it’s static.  the table is full of identical pointers to that single static.

  1. Good point, I generally use this technique when I’m importing thousands of rows from an SQLlite table, in that case I wouldn’t have an iterator variable so would need arrCnt.

  2. I’ll remember that!

  3. I’ll keep that mind, certainly I’ve found it to be a lot faster in all the places I’ve used it.

To easily verify what davebollinger said just replace your long string with something short but unique for each entry - like

c[i][j] = “teststring”…(i*1000+j)

and watch your memory requirements become even more impressive :slight_smile:

There are a few more rules/tips to optimize memory usage. F.i. the overhead for tables is bigger than for values which means, if you use a 1d array instead of 2d or even 3d, there’s some more memory to be saved at slightly more complex, but maybe even faster code especially while iterating/updating your tiles.

I.e. you’d write

local mapWidth = 130 local mapHeight = 130 local mapArray = { filled with your content } for iy = 1, mapHeight do     local tileIdx = iy\*mapWidth     for ix = 1, mapWidth do         local tileValue = mapArray[tileIdx]         tileIdx = tileIdx + 1     end end

The reason this can be better is because Lua does not support multidimensional arrays in a single memory block as f.i. C/C++, you just create lots of small arrays within other arrays and if you do 2 or even 3 dimensions you actually do just a second or third array access which for sure is slower than a simple addition.

Downside is, single element access is a bit less convenient.

Also, there is a chance that, depending on how you build your arrays, Lua might use it’s hash table implementation instead of the direct integer based indexing but I’d start with big low hanging fruit first (i.e. the boolean arrays).

I did try and ran out of memory fast!

I’ve merged all my boolean lookup arrays into a single array called  tileStats.  This now stores 10 values per index (ints and booleans) and saved around 5Mb.  I have a much larger and more complicated array called  buildings that stores 100+ values, displayGroups, etc. per index.  I find looking up all my main run-time values in tileStats is much faster than accessing the main array if I just want to know if there is say a road asset at x,y rather than the road piece, index, rotation, how it connects to its neighbours, etc.

Just getting your valued input here really… I often do things like this (which may not be the most optimised but it is easy to code and more importantly easy to read

function rebuild\_isTileARoad() local function rebuild() --add reference to whether this tile has a road or path asset local x, y = 0, 0 for x = 1, \_maxTilesX do for y = 1, \_maxTilesY do tileStats[x][y].isRoad = buildings[x][y].isRoad or buildings[x][y].isPath end end end if \_debug then rebuild() else pcall(rebuild) end end 

and then repeatedly query it throughout the lifetime of my app like this

if not tileStats[x][y].isRoad then --allow action else --block action end

Would you guys do anything different?

@Michael Flad In this sort of situation where one is only dealing with a bit at a time, you can even just emulate the bitwise ops and that gets you up to 53 bits (after which the spacing between consecutive double-precision numbers goes from 1 to 2 and you can no longer hit integers reliably). I wrote a bit vector along these lines some time back. Of course with userdata you can go to 64 and beyond.

As for an “ints and booleans” case such as  tileStats , I’ve also had my eye on this project for quite some time now, though I don’t know how difficult it would be to port to ARM.

@adrianmm If a lot of your information is “cold” (only used occasionally), especially if a lot of it happens to be small integers, the MemoryBlob plugin I recently released might help. (You can find sample code linked in the docs.) This would let you keep some data together without the slight per-object Lua bookkeeping overhead, paying in retrieval time instead. At the moment this probably requires decomposing integers into bytes and some slightly cumbersome string.byte and string.char calls, but if all goes well I should have another plugin out shortly that wraps up some data structuring APIs. (Both of these are / will be free.)

Slightly related, I’ve been considering binding bundle to allow for in-memory compression, which I think might be interesting in scenarios like these.

Basically my issue is not knowing how much RAM is available on the device.  I’ve searched the docs but can find no way of actually retrieving this valuable piece of info using system.getInfo() or similar.  I could approximate this based on DPI but that doesn’t work as a tablet my have a low DPI but way more RAM than the equivalent phone based on DPI alone.

Whilst knowing how much RAM my app is using is useful, it is essentially meaningless without knowing how much is available on the device.  200MB is nothing if you have 2GB available… it is a massive issue if the device only has 512MB.

According to stackoverflow this is relatively trivial.  If someone could knock up a simple plugin to answer what is surely a clear hole in Corona that would be awesome!  It needs nothing more than a single function to return the total RAM and then developers could adapt their games accordingly.  Would be preferable that the SDK exposes this as it is really important for larger, more resource intensive apps. 

I would be interested in this too. Retrieving SQLlite data on Windows builds is so slow that I have to load as much into memory as possible and lookup against that. I may just have to stipulate a minimum of 2GB ram.