Corona code optimisation

2GB minimum - wow!  Why do you need to store so much data?

For me personally, if I knew the player was launching my game (for the first time) on a device with under 1GB RAM I could size their city to 100x100 and those over 1GB get the full 130x130.  I can only do this on the very first launch so the process is transparent to the end user.

It’s an historical management sim, so there’s around 11,000 players depending on which year you choose, each with a career up to 20 years long. It’s really the new-game creation phase that is the issue: data on 100,000 player-seasons, 15,000 clubs and 279 nations really needs to be in memory for this to perform at an acceptable speed on Windows machines with IDE drives. Unlike a normal management sim, where players’ ability and attributes are already set, my game has to analyse each player’s career path to determine how good they were.

Do you calculate 20 years worth of data on first load?  As this is historical data, could that not be precalculated and stored?

I don’t know your game so this may be totally irrelevant: but let’s say you choose a country (UK) and then choose a team (Spurs), could you not then calculate their dataset and store it? For each club looked at you would calculate their stats in real time. Eventually you will have the full dataset without the huge upfront calculation.

Happy to help on the SQL side if you need a second pair of eyes… I create transactional SQL DBs for my day job.  Currently I have a few clusters in different locations all syncing and crunching data in real time.

Another option might be to scale out the calculations on a platform like AWS as that will be considerably faster than a PC but I appreciate this might involve large data transfers.

It could possibly be pre-calculated and I have considered it, but I wanted to avoid every game being exactly the same - there’s an element of randomness to how and when players develop.

Another issue is that the base data is constantly being improved, having come from numerous sources. For example, I may discover that the Johnny Smith that played on the wing in Scotland from 1968-72 is the same John Smith that played at FB in England from 1974-1976. And he’s also the same Jon Smith that is in the international database with 4 caps in midfield for New Zealand. At the moment I just have to give them the same playerID and the data is combined automatically. By pre-calculating I’d have to re-calculate every time something changes.

Speed is not really an issue on iOS/OSX, as SQLite is so much faster there. So all this optimisation is aimed at getting Windows/Android up to the same speed.

For instance, there is a routine which picks the lineups for the CPU teams. Currently it selects the players from the DB for the given club, as well as grabbing their attributes ready for the match engine. I wrap the transactions up with ‘BEGIN TRANSACTION’ and ‘COMMIT’ but there’s still an overhead, compared with having all the players and their attributes in memory from the start.

So far, I have improved the process from taking 70 seconds for one league and no international database (on average this adds an extra 40% of players to create), to 25 seconds for four leagues including the international database, with the downside of memory usage going from a max of 70MB to around 380MB.

There are so many ways of optimising SQL that can massively improve performance; some are query-based, others index-based, but the big gains come from re-structuring the physical data and normalisation.  Sometimes de-normalising RI data can really speed up reads.

If you are running lots of inline aggregation then localising that into aggregated tables can bring massive performance gains.

A simplistic example… if you find your sim is constantly looping every game a team played in a season to find the goals scored you could have an aggregated table indexed per team per year and store the SUM().  This changes your query from a table scan (really slow) to an index seek (very fast).
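To make that concrete, here’s a minimal sketch of the aggregated-table idea, assuming Corona’s bundled sqlite3 module (lsqlite3-style API); the match / season_goals tables and their columns are invented for illustration:

```lua
-- Sketch only: table/column names are made up, and this assumes the
-- sqlite3 module that Corona bundles (lsqlite3-style API).
local sqlite3 = require("sqlite3")
local db = sqlite3.open_memory()

db:exec[[
  CREATE TABLE match (team TEXT, year INTEGER, goals INTEGER);
  INSERT INTO match VALUES ('Spurs', 1971, 2), ('Spurs', 1971, 3), ('Spurs', 1972, 1);

  -- Aggregate once; the unique index makes later reads an index seek,
  -- not a table scan.
  CREATE TABLE season_goals (team TEXT, year INTEGER, total INTEGER);
  CREATE UNIQUE INDEX ix_season ON season_goals (team, year);
  INSERT INTO season_goals
    SELECT team, year, SUM(goals) FROM match GROUP BY team, year;
]]

local total
for row in db:nrows("SELECT total FROM season_goals WHERE team = 'Spurs' AND year = 1971") do
  total = row.total
end
print(total)
db:close()
```

The trade-off is keeping season_goals in step with match, so it suits data that changes rarely but is read constantly.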

Yeah, perhaps I need to look into this more. I’d love to be able to go back to just grabbing data when I need it, rather than having it in memory just in case. Just accessing the DB on Windows is so expensive; when I need a task such as picking a CPU side to be done in 10ms, this is pretty much eaten up as soon as I go near SQLite.

Hi.

  1. My game uses multiple 2D arrays of boolean values because in my experience (in multiple languages) this is faster in random access than larger 3D arrays when accessed continually (similar to unrolling the loop). 

The deeper your data structure, the more you’re likely to be hopping around in memory. To get a deeper sense of the issues this can cause, I’d recommend reading this and watching this. You can even flatten down to a 1D array, from 2D as index = y * width + x , and from 3D as index = (z * height + y) * width + x. Naturally this has its own trade-offs, and you’ll have to decide if / when they’re worth it.
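For Lua’s 1-based tables, that flattening looks like this (a small sketch; the grid size and the set/get helper names are just illustrative):

```lua
-- Flattening a 2D grid into one contiguous 1D table; with 1-based
-- indices, index = y * width + x becomes (y - 1) * width + x.
local width = 130  -- e.g. a 130x130 grid
local grid = {}

local function set(x, y, value)
  grid[(y - 1) * width + x] = value
end

local function get(x, y)
  return grid[(y - 1) * width + x]
end

set(7, 3, true)
print(get(7, 3))  -- true
```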

Array access will generally win out over general keys (strings, tables, etc.) if you’re accessing elements in order, though a little less so in Lua than C / C++, since the type must also be stored and thus things won’t pack quite so tightly. (You did ask for low-level.  :))

If you are doing a lot of ifs in a loop, you might fall afoul of branch misprediction, where you’re asking the same question and getting yes or no randomly in response, frustrating the processor’s attempts to take advantage of patterns. See this for a good overview. Off-hand I don’t know how closely a Lua branch maps to a C one.

Anyhow, these will not always matter (profiling!), but if they hit you they can really hit hard.

Lua memory would tend to climb as you create tables, functions (do you do any function X () --[[stuff]] end in your enterFrame?), concatenate strings, etc. If it never stops growing, you have an issue, maybe a leak. But it might very well level off.
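As a sketch of the kind of allocation this is about (the frame loop is simulated below; in Corona the handler would be registered with Runtime:addEventListener):

```lua
-- An anonymous function plus a fresh table created inside an enterFrame
-- listener allocates on every frame; hoisting both out means memory
-- levels off instead of climbing between garbage collections.
local scratch = { x = 0, y = 0 }   -- reused every frame, no per-frame table

local function onFrame(event)
  scratch.x = scratch.x + 1        -- mutate in place instead of { x = ... }
end

local event = { name = "enterFrame" }  -- shared stand-in event
for frame = 1, 60 do                   -- stand-in for 60 enterFrame events
  onFrame(event)
end
print(scratch.x)  -- 60
```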

I’ve never heard of a device that can take 100MB of Lua (system) memory. I believe you are talking about texture memory and not Lua memory.
My iPad mini 2 would crash after 4MB.

(prefaced with all the usual cautions about don’t optimize too early, etc)

assuming all your variables are local…

any form of 2D access (t[x][y], t[x]["a"], t[x].a) takes 2 vm instructions to dereference.

any form of 3D access (t[x][y][z], t[x][y]["a"], t[x][y].a) takes 3 vm instructions to dereference.

and, aside, assuming your arrays aren’t sparse, numeric indices perform better than hashkeys

however, it’s not quite as simple as “flatter tables perform better”, because the 3D form t[x][y].a would allow you to alias the first two dereferences if you’d be using the value repeatedly, e.g.:

[lua]
local cell = t[x][y]
if (cell.a or cell.b or cell.c) then ... -- ==5 vm instructions for the table dereferencing
[/lua]

so would be “better” than a flat method with three 2D tables:

[lua]
if (ta[x][y] or tb[x][y] or tc[x][y]) then ... -- ==6 vm instructions for the table dereferencing
[/lua]

in general, the more you’ll be using that “cell”, the better off you’d be making the cell “fat” and aliasing to it just once.

if it’s truly just twice, as in your example, then it’s a wash, do as you please.  (4 vm instructions either way)

Thanks for your input guys…  I’ve tried a lot of things to reduce the memory footprint, like sharing functions (calling doSomething(self) rather than storing 17k instances of doSomething() in my main 4D array), but there was no noticeable change in memory footprint or execution speed.  I think the Corona compiler does a good job of spotting where this happens and optimising accordingly.

My biggest gain to FPS is to not rely on Corona culling off screen graphics and handling this manually - splitting my 130x130 tiles into smaller quads and doing bounds checking on each quad.

My app is big (think Sim City 3000) and I guess 200-350MB is not actually that bad.  Here’s a shot showing my game and the memory usage (the selected building is 384x1448 in retina).

Image1.png

How are you computing the Lua memory usage? (copy/paste the code would be helpful)

Rob

Sure, I am using this…

local memoryUsed = "Tex: "..mfloor(system.getInfo("textureMemoryUsed")/1048576).."mb Lua:"..mfloor(collectgarbage("count")/1024).."mb"

Lua usually (depends a bit on how the VM was compiled) needs at least 16 bytes per value in a table used as an array. This means, if you use an array slot to store a single boolean, you’re wasting over 99% of the memory, i.e. you use 1 bit and waste another 127. That’s just a massive loss, and in addition is close to the worst cache thrashing you can do to a modern CPU.

For instance, a simple 128x128x128 true/false array consumes 32MB while the actual data is only 256KB.

Sadly I haven’t had the time to use Corona so I’m not sure which version of Lua is included. I guess it’s one without real integer support and no bitwise operators available, so what you probably should use is the BitOp plugin https://marketplace.coronalabs.com/plugin/bit and store multiple booleans of a tile in a single value of your arrays. It seems you can use up to 32 bits in a value using this, which, depending on how many of these boolean arrays you have, might shrink your memory requirements down by 96%.
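The packing itself can be sketched in plain Lua arithmetic (runnable in any Lua build); with the BitOp plugin you’d swap the floor/modulo maths for bit.band, bit.bor and bit.lshift. The setFlag/getFlag names are just illustrative:

```lua
-- 32 booleans packed into each numeric slot instead of one per slot.
local flags = {}

local function setFlag(i, value)
  local slot = math.floor((i - 1) / 32) + 1
  local mask = 2 ^ ((i - 1) % 32)
  local cur = flags[slot] or 0
  local isSet = math.floor(cur / mask) % 2 == 1
  if value and not isSet then
    flags[slot] = cur + mask           -- turn the bit on
  elseif not value and isSet then
    flags[slot] = cur - mask           -- turn the bit off
  end
end

local function getFlag(i)
  local slot = math.floor((i - 1) / 32) + 1
  local mask = 2 ^ ((i - 1) % 32)
  return math.floor((flags[slot] or 0) / mask) % 2 == 1
end

setFlag(40, true)
print(getFlag(40), getFlag(41))  -- true  false
```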

We are using a modified version of Lua 5.1.

I’ve learnt a few tricks while optimising my game.

In terms of memory usage the best one was this - say you have a large table of objects, in my case football players. I might have 6,000 loaded at once, each with a large number of properties such as .name, .age, .club, .wage etc.

Storing it in memory like this:

[lua]

player[1] = {name = "Tony Fork", age = 30, club = "Crystal Palace", wage = 2500}

player[2] = {name = "Winston Risk", age = 20, club = "Fulham", wage = 250}

[/lua]

Is nearly 4x as expensive in terms of memory than this:

[lua]

player.name = {"Tony Fork", "Winston Risk"}

player.age = {30, 20}

player.club = {"Crystal Palace", "Fulham"}

player.wage = {2500, 250}

[/lua]
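You can check the two layouts yourself with collectgarbage("count"); absolute numbers vary by Lua build, but the column layout should come out well ahead (a rough sketch, with measure as an illustrative helper):

```lua
-- Rough comparison of row-of-tables vs column (struct-of-arrays) layout.
local function measure(build)
  collectgarbage("collect")
  local before = collectgarbage("count")   -- KB currently in use
  local data = build()
  collectgarbage("collect")                -- data is still referenced here
  return collectgarbage("count") - before
end

local rowsKB = measure(function()
  local t = {}
  for i = 1, 6000 do t[i] = { age = 30, wage = 2500 } end
  return t
end)

local colsKB = measure(function()
  local t = { age = {}, wage = {} }
  for i = 1, 6000 do t.age[i] = 30; t.wage[i] = 2500 end
  return t
end)

print(rowsKB, colsKB)  -- the row layout uses several times more KB
```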

Because disk access on Windows is so slow compared to mac, especially on IDE drives, I’ve had to move from accessing data from an SQL table on the fly, to holding most of it in memory and loading/saving in bulk when necessary.

I’ve got a number of other tips which are focussed more on optimising for speed rather than memory usage.

  1. Populating large arrays

[lua]

local arr = {}

local arrCnt = 1

for a = 1, 50000, 1 do

  arr[arrCnt] = {5, 1, 0, 0, "stuff"}

  arrCnt = arrCnt + 1

end

[/lua]

is over 3.5x faster than:

[lua]

local arr = {}

for a = 1, 50000, 1 do

 arr[#arr+1] = {5, 1, 0, 0, "stuff"}

end

[/lua]

  2. Populating small arrays

If you know exactly how big a table is going to be in advance, but don’t yet have the data to fill it:

[lua]

local arr = {true, true, true, true}

for a = 1, 4, 1 do

 arr[a] = someCalculation(a)

end

[/lua]

Is quicker than:

[lua]

local arr = {}

for a = 1, 4, 1 do

 arr[#arr+1] = someCalculation(a)

end

[/lua]

  3. Building large queries from strings

You might be building a dynamic query to interrogate an online DB or a local SQLite DB, with a number of parameters, using the ".." concatenation operator to build up the query string.

Putting all the components of the query into a table, and then creating the final query string at the end using table.concat, is noticeably faster. This is vital in places where I’m doing up to 80,000 dynamic inserts or updates on an SQLite table.
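A minimal sketch of the table.concat approach (the player table and its columns are invented for illustration):

```lua
-- Build the pieces in a table and concatenate once at the end; this
-- avoids creating a new intermediate string for every ".." in a loop.
local parts = { "INSERT INTO player (id, name) VALUES " }
for id = 1, 3 do
  if id > 1 then parts[#parts + 1] = ", " end
  parts[#parts + 1] = string.format("(%d, 'Player %d')", id, id)
end
local query = table.concat(parts)
print(query)
-- INSERT INTO player (id, name) VALUES (1, 'Player 1'), (2, 'Player 2'), (3, 'Player 3')
```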

  4. Avoid while loops

If you need to loop until a certain criteria is fulfilled, a while loop is handy but expensive.

[lua]

for a = 1, 100000, 1 do

  local b = math.random(1,1000)

  if b == 50 then

    break

  end

end

[/lua]

is much faster than (assuming the 50 is found on the same iteration):

[lua]

local b = 0

while b ~= 50 do

  b = math.random(1,1000)

end

[/lua]

  5. Avoid pairs, if possible

Say I am iterating over all the attributes a footballer has, to calculate his influence on the next passage of play:

att.drb = 15

att.pas = 7

att.tac = 12

It is much more efficient to either store these as a numbered array rather than keys (which uses less memory but is less easy to debug later), or store the key names in a numbered array and iterate that way.

SLOW:

[lua]

local inf = 0

local fac = {drb = 0.4, pas = 1, tac = 0.2}

for k, v in pairs(att) do

  inf = inf + v * fac[k]

end

[/lua]

FASTER:

[lua]

local inf = 0

local fac = {drb = 0.4, pas = 1, tac = 0.2}

local nms = {"drb", "pas", "tac"}

for a = 1, #nms, 1 do

  local k = nms[a]

  local v = att[k]

  inf = inf + v * fac[k]

end

[/lua]

I’m now really confused about memory allocations…  Take this simple test code

[lua]
local mfloor = math.floor
local a, b, c = {}, {}, {}

print("boolean array 1000x1000")
print("start:"..mfloor(collectgarbage("count")/1024).."mb")
for i = 1, 1000 do
  a[i] = {}
  for j = 1, 1000 do a[i][j] = true end
end
print("end:"..mfloor(collectgarbage("count")/1024).."mb")

print("int array 1000x1000")
print("start:"..mfloor(collectgarbage("count")/1024).."mb")
for i = 1, 1000 do
  b[i] = {}
  for j = 1, 1000 do b[i][j] = 100 end
end
print("end:"..mfloor(collectgarbage("count")/1024).."mb")

print("string array 1000x1000")
print("start:"..mfloor(collectgarbage("count")/1024).."mb")
for i = 1, 1000 do
  c[i] = {}
  for j = 1, 1000 do
    c[i][j] = "Lorem ipsum dolor sit amet, pri ad impetus eleifend, ut sanctus appellantur sea. Ferri vivendum eam an, pri nonumy omnium persequeris no, sed in virtute salutatus. An discere definitiones pri. Accusamus splendide qui ei, mei in error corpora vituperatoribus. Te quot indoctum disputando vix, vix at vivendum moderatius."
  end
end
print("end:"..mfloor(collectgarbage("count")/1024).."mb")
[/lua]

and the corresponding output

17:01:53.828 boolean array 1000x1000
17:01:53.828 start:0mb
17:01:53.875 end:15mb
17:01:53.875 int array 1000x1000
17:01:53.875 start:15mb
17:01:53.922 end:31mb
17:01:53.922 string array 1000x1000
17:01:53.922 start:31mb
17:01:53.969 end:47mb

It seems that simple arrays of booleans and integers do indeed allocate the same amount of memory (15MB each), so what Michael said seems to be proved by test: arrays of booleans are massively wasteful!

I don’t understand why the array of long strings is only just larger than the array of booleans?

@nick_sherman - good stuff :slight_smile:

the trouble with posting optimization tips is that you’ll tend to attract pesky nitpickers :smiley:

(truthfully though, any appearance of just being a pesky-nitpicker is apologized for in advance)

  1.  arrCnt isn’t needed

  2.  {nil,nil,nil,nil} is faster than {true,true,true,true}

  3.  spot on

  4.  while loops can perform better than "for i" in certain cases

  5.  clever indirection :slight_smile:

your string is only stored once, it’s static.  the table is full of identical pointers to that single static.
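You can see the interning effect by comparing repeated identical strings against distinct ones (a rough sketch; the deltas vary by build, but the second should dwarf the first):

```lua
-- Identical string contents are stored once (interned); only distinct
-- strings cost extra memory beyond the table slots themselves.
collectgarbage("collect")
local before = collectgarbage("count")

local shared = {}
for i = 1, 100000 do shared[i] = "one long static string, stored a single time" end
collectgarbage("collect")
local mid = collectgarbage("count")

local unique = {}
for i = 1, 100000 do unique[i] = "a distinct string #" .. i end
collectgarbage("collect")
local after = collectgarbage("count")

print(mid - before, after - mid)  -- the second delta is far larger
```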

  1. Good point, I generally use this technique when I’m importing thousands of rows from an SQLite table; in that case I wouldn’t have an iterator variable, so I would need arrCnt.

  2. I’ll remember that!

  3. I’ll keep that in mind; certainly I’ve found it to be a lot faster in all the places I’ve used it.