[Tips] Optimization 101


Hey guys, just want to share some optimization techniques here with you. I will update this list when I can or add any techniques you guys may suggest.

1) Localization : Localizing Lua functions gives a decent to massive speed difference, because a local variable is looked up faster than a field in a global table. How do you do this? Like so:

The slower way (Bad)

local myRand = math.random(1, 10)  

The faster way (Good)

local mRand = math.random  
local myRand = mRand(1, 10)  

Other examples of this

local mRandSeed = math.randomseed  
--Call it  
mRandSeed(os.time())  
local sFormat = string.format  
--Call it  
sFormat("%02d", myVar)  
local tRemove = table.remove  
--Call it  
tRemove(myTable, 1)  
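If you want to check the gain from localizing on your own device, a rough timing sketch like the one below can help. It is plain Lua with no Corona calls. The iteration count is an assumed value chosen for illustration, and the timings will vary by device and Lua version.

```lua
-- Rough micro-benchmark sketch; ITERATIONS is an assumed value, not a recommendation.
local clock = os.clock
local ITERATIONS = 1000000

-- Looks up the global `math` table and its `random` field on every call.
local start = clock()
for i = 1, ITERATIONS do
    local r = math.random(1, 10)
end
local globalTime = clock() - start

-- Looks up the function once, then calls it through a local.
local mRand = math.random
start = clock()
for i = 1, ITERATIONS do
    local r = mRand(1, 10)
end
local localTime = clock() - start

print(string.format("global lookup: %.3fs, localized: %.3fs", globalTime, localTime))
```

Run it a few times before drawing conclusions, since a single run can be noisy.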

2) Avoid moving objects manually. Moving an object by incrementing its x and y values each frame is slower than using translate.

Example :

The bad way

obj.x = obj.x + 1  
obj.y = obj.y + 1  

The good way

obj:translate(1, 1)  

You can also use translate when you only need to change x or y, not both, as shown below:

--Only increment the x value  
obj:translate(1, 0)  
--Only increment the y value  
obj:translate(0, 1)  

3) Avoid heavy string comparison. If you are doing a lot of string comparison checks like below :

local myString = "none"  
if myString == "hi" or myString == "do" or myString == "lol" then  
 --Do something  
end  

It is faster to use a table of enums. That way you can still know the values but you are comparing numbers which is much faster than comparing strings.

local myString = 1  
local enumTable = {  
 ["hi"] = 1,  
 ["do"] = 2,  
 ["lol"] = 3,  
}  
if myString == enumTable["hi"] or myString == enumTable["do"] or myString == enumTable["lol"] then  
 --Do something  
end  

4) For loops. Certain types of for loop execute faster than others.

Fastest :

for i = 1, #myTable do  
 --Use myTable[i] here  
end  

Second fastest :

local fpairs = ipairs  
for i, v in fpairs(myTable) do  
 --Use v here  
end  

The difference between the two is roughly 120ms.
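To measure the gap between the two loop styles yourself, a sketch along these lines should work in plain Lua. The table size is an assumed value for illustration, and the result depends heavily on the device and Lua version.

```lua
-- Timing sketch comparing a numeric for loop with ipairs (table size is an assumption).
local clock = os.clock

local myTable = {}
for i = 1, 1000000 do
    myTable[i] = i
end

-- Numeric for loop over the array part.
local start = clock()
local sumNumeric = 0
for i = 1, #myTable do
    sumNumeric = sumNumeric + myTable[i]
end
local numericTime = clock() - start

-- Same work using ipairs.
start = clock()
local sumIpairs = 0
for i, v in ipairs(myTable) do
    sumIpairs = sumIpairs + v
end
local ipairsTime = clock() - start

print(string.format("numeric for: %.3fs, ipairs: %.3fs", numericTime, ipairsTime))
```

Both loops compute the same sum, so any difference in the printed times comes from the iteration style alone.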

5) Make use of tables to cut down on local variable use and organise your code.

Bad practice for a group of data.

local mySheet1 = sprite.newSpriteSheetFromData("mySheet1.png", require("mySheet1").getSpriteSheetData())  
local mySheet2 = sprite.newSpriteSheetFromData("mySheet2.png", require("mySheet2").getSpriteSheetData())  
local mySheet3 = sprite.newSpriteSheetFromData("mySheet3.png", require("mySheet3").getSpriteSheetData())  

Better practice for group of data.

local mySheets = {  
 ["Sheet1"] = sprite.newSpriteSheetFromData("mySheet1.png", require("mySheet1").getSpriteSheetData()),  
 ["Sheet2"] = sprite.newSpriteSheetFromData("mySheet2.png", require("mySheet2").getSpriteSheetData()),  
 ["Sheet3"] = sprite.newSpriteSheetFromData("mySheet3.png", require("mySheet3").getSpriteSheetData()),  
}  

To illustrate the benefits, beyond the obvious ones of memory savings and organized blocks of code, take this scenario.

Attempting to dispose of the 3 sprite sheets and nil the variables using method 1 (the bad practice):

mySheet1 = nil  
mySheet2 = nil  
mySheet3 = nil  

Now disposing of the sprite sheets and nil-ing the table (the better practice):

for i, v in pairs(mySheets) do  
 mySheets[i] = nil  
end  
mySheets = nil  

See how much easier that is to manage and remove? That’s one of the countless other benefits of using tables.

Note: Overuse of tables will consume more Lua memory, so use them appropriately.

6) Functions : Use less of them.

Say for instance you have 4 buttons on screen that each have a tap event. You don’t need four functions (one for each), just the one will suffice.

--Table to store the buttons  
local myButtons = {}  
myButtons["Pause"] = display.newImage("pause.png")  
myButtons["Pause"].myId = "Pause" --Set the buttons id  
myButtons["Shoot"] = display.newImage("shoot.png")  
myButtons["Shoot"].myId = "Shoot" --Set the buttons id  
myButtons["Move"] = display.newImage("move.png")  
myButtons["Move"].myId = "Move" --Set the buttons id  
myButtons["Retry"] = display.newImage("retry.png")  
myButtons["Retry"].myId = "Retry" --Set the buttons id  
--Function to handle our buttons  
local function handleButtons(event)  
 local target = event.target  
 --Handle action for each different button  
 if target.myId == "Pause" then  
 elseif target.myId == "Shoot" then  
 elseif target.myId == "Move" then  
 elseif target.myId == "Retry" then  
 end  
 return true  
end  
--Add event listeners for all the buttons  
--(the table uses string keys, so iterate with pairs; #myButtons would be 0)  
for id, button in pairs(myButtons) do  
 button:addEventListener("tap", handleButtons)  
end  

See, one function handles all your buttons. This saves using 4 separate functions for something that can be achieved in one :slight_smile:
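One detail worth checking when the buttons live in a table keyed by name: the length operator only counts the array part, so the listeners need to be attached with pairs. The sketch below shows this in plain Lua. `newStubButton` is a hypothetical stand-in for a display object so the code runs outside Corona; in a real project the buttons would come from display.newImage.

```lua
-- Hypothetical stub standing in for a display object, so this runs in plain Lua.
local function newStubButton(id)
    local button = { myId = id, listeners = {} }
    function button:addEventListener(eventName, listener)
        self.listeners[eventName] = listener
    end
    return button
end

local myButtons = {
    Pause = newStubButton("Pause"),
    Shoot = newStubButton("Shoot"),
    Move  = newStubButton("Move"),
    Retry = newStubButton("Retry"),
}

-- Single listener shared by every button.
local lastTapped
local function handleButtons(event)
    lastTapped = event.target.myId
    return true
end

-- The table has only string keys, so its length is 0.
print(#myButtons)

-- pairs visits every key, so each button gets the shared listener.
local registered = 0
for id, button in pairs(myButtons) do
    button:addEventListener("tap", handleButtons)
    registered = registered + 1
end

-- Simulate a tap on one button.
myButtons.Shoot.listeners.tap({ target = myButtons.Shoot })
print(registered, lastTapped)
```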

7) Updating Objects : Only do it when you have to.

Say for instance you are creating a space shooter and have a text object on screen to display your current score. You obviously want to update it to reflect the current score, but you don’t need to do this every frame.

The bad way

--scoreText is the text object that displays the score  
local function updateScore(event)  
 scoreText.text = "score " .. score  
end  
Runtime:addEventListener("enterFrame", updateScore)  

The good way:

--In your collision handler (when an enemy gets hit by a bullet you fired)  
local function onCollision(event)  
 --Where the bullet collides with the enemy  
 score = score + 1  
 scoreText.text = "score " .. score  
end  

That way you’re only updating an object when you need to, rather than wasting CPU cycles updating something that doesn’t need updating.
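The same idea can be wrapped in a small setter that touches the text object only when the value actually changes. This is a plain Lua sketch: `scoreText` here is a stub table standing in for a text display object, and `setScore` and `textUpdates` are hypothetical names added for illustration.

```lua
-- Stub standing in for a text display object (assumption for this sketch).
local scoreText = { text = "score 0" }
local score = 0
local textUpdates = 0 -- counts writes to the text object so the saving is visible

-- Changes the score and updates the text only if the value really changed.
local function setScore(newScore)
    if newScore == score then
        return
    end
    score = newScore
    scoreText.text = "score " .. score
    textUpdates = textUpdates + 1
end

setScore(1)
setScore(1) -- same value, so the text object is left alone
setScore(2)
print(scoreText.text, textUpdates)
```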

8) Creating objects : Things to always remember.

This is covered in greater depth here: http://blog.anscamobile.com/2011/09/how-to-spawn-objects-—-the-right-way/

When you have a function to create an object (or series of objects) you must return the object at the end of the function otherwise you have no way to clear the object from memory. This is also good programming practice.

Bad example:

local function spawnAGuy()  
 local guy = display.newImage("guy.png")  
 guy.x = 100  
 guy.y = 100  
end  
--Create the guy (note that the function returns nothing, so spawnedGuy is nil and you have no reference to the created "guy" object)  
local spawnedGuy = spawnAGuy()  
--Remove it  
--You can't  

Better example:

local function spawnAGuy(group)  
 local guy = display.newImage("guy.png")  
 guy.x = 100  
 guy.y = 100  
 --Insert the guy into the specified group if it exists (then you can easily clear it from display)  
 if group then  
  group:insert(guy)  
 end  
 return guy  
end  
--Create the guy  
local spawnedGuy = spawnAGuy(localGroup)  
--Remove it  
spawnedGuy:removeSelf()  
spawnedGuy = nil  

Best example (use parameters)

local function spawnAGuy(params)  
 local guy = display.newImage("guy.png")  
 guy.x = params.x or 100  
 guy.y = params.y or 100  
 --Insert the guy into the specified group if it exists (then you can easily clear it from display)  
 if params.group then  
  params.group:insert(guy)  
 end  
 return guy  
end  
--Create the guy  
local spawnedGuy = spawnAGuy({x = 20, y = 130, group = localGroup})  
--Remove it  
spawnedGuy:removeSelf()  
spawnedGuy = nil  

Using parameter passing is a hugely powerful feature. You can even create a function inside the spawning function to destroy/remove the guy.

local function spawnAGuy(params)  
 local guy = display.newImage("guy.png")  
 guy.x = params.x or 100  
 guy.y = params.y or 100  
 --Insert the guy into the specified group if it exists (then you can easily clear it from display)  
 if params.group then  
  params.group:insert(guy)  
 end  
 --Create a function to remove the guy  
 function guy:destroy()  
  self:removeSelf()  
 end  
 return guy  
end  
--Create the guy  
local spawnedGuy = spawnAGuy({x = 20, y = 130, group = localGroup})  
--Remove it  
spawnedGuy:destroy()  
spawnedGuy = nil  
More coming soon, enjoy!

Nice post, thank you!

Especially #2, I wasn’t aware of that till now…

Very cool examples, thanks.

Just added #6, functions. Enjoy :slight_smile:

For #2, would it still be better to use translate if your core game loop is inside an enterFrame listener? (I currently increment x and y values within enterFrame.)

Would it actually be a noticeable performance difference?

Yeah, it would be better to use translate, and it would be faster.

I’ve done tests. #2 works even better by caching the access of x and y. It’s the accessing that’s expensive, not the modification.

My test was to use the uma horse sprite: display 225 of them and have them loop the animation.
Looping over all the sprites every frame with this code produced 40-45fps*

d.x = 0 -- just this access will slow things down significantly  
d.y = 0  

This however produced 56-60fps*

d:translate(0, 0)  

In other words, when you construct your display objects:

local d = display.newGroup()  
d.cx = d.x  
d.cy = d.y  
-- .. somewhere modify cx, cy  
local dx = d.cx + 1  
local dy = d.cy + 1  
d:translate(dx - d.cx, dy - d.cy)  
d.cx = dx  
d.cy = dy  

This is much faster for large numbers of display objects than accessing x, y, rotation, xScale, and yScale directly. The alpha and isVisible properties, however, are not affected.

This has been confirmed by yanuar and me. But don’t take my word for it, test it yourself.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil” -Donald Knuth

* Numbers reported from testing on a 4th gen iPod touch. Yours will be different if you use a different sprite or device, due to fill rate or other device-dependent characteristics.
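The caching pattern above can be wrapped in a helper so the bookkeeping lives in one place. This is a plain Lua sketch under stated assumptions: `newStubObject` is a hypothetical stand-in for a display object with a translate method, and `moveTo` is a name made up for illustration, not a Corona API.

```lua
-- Hypothetical stub standing in for a display object, so this runs in plain Lua.
local function newStubObject()
    local obj = { x = 0, y = 0 }
    function obj:translate(dx, dy)
        self.x = self.x + dx
        self.y = self.y + dy
    end
    return obj
end

local d = newStubObject()
-- Cache the position once at construction time.
d.cx = d.x
d.cy = d.y

-- Moves obj to (newX, newY) using only the cached position and translate,
-- so the real x and y properties are never read in the update loop.
local function moveTo(obj, newX, newY)
    obj:translate(newX - obj.cx, newY - obj.cy)
    obj.cx = newX
    obj.cy = newY
end

moveTo(d, 10, 5)
moveTo(d, 12, 4)
print(d.cx, d.cy)
```

The cache only stays valid if every move goes through the helper; anything else that changes the position (physics, transitions) would need to refresh cx and cy.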

I do agree with Don. I avoid doing stuff I know is horrible, but beyond that I will only start optimizing if my iPhone 3GS has significant performance issues. (iPhone 3GS is what I consider the “bare minimum” right now).

@Danny I’d be interested to see the performance comparison between using the transition functions versus other possible solutions (for example, trying to transition with timers instead). Also, the information you provided above should be added to the existing optimization document in the docs section so that it remains easily accessible.

This post needs to be sticky!!

EDIT: OK, stickied!! XD

+1 @blasterv, I have used transition.to in enterFrame several times and I need to get better performance.

@jose2: yeah, I made it a sticky for convenience :wink:

@blasterv, sure. Plenty of other things will be covered too. Once I feel the list is extensive enough, I will make a blog post about it and add it to the official API optimization listing.

@jose2 I try to keep enterFrame fairly empty. Typically I use timers and the transition API for artificial intelligence (when enemies must take or not take action based on the actions of the player).

@danny, about #4: which is faster, the second or the first? It is not clearly explained.

I read somewhere (a blog about Lua optimization) about for…do performance vs ipairs, and it said that for…do is faster if you know the table length. Is that true?

This is awesome! Thank you, Danny!

Please keep more tips coming!

Naomi

@jose2: I stated it above the code snippets. The first one is the fastest :slight_smile:

@blasterv yes, but I need to use an enterFrame loop to know some positions of my sprites to add new graphics.

For example, for a sprite that needs a “glasses” sprite I do this:

for i = eyeeffectsGroup.numChildren,1,-1 do  
 if (eyeeffectsGroup[i]~= nil) then  
  local xo = eyeeffectsGroup[i].padre.x  
  local yo = eyeeffectsGroup[i].padre.y  
  local degrees = eyeeffectsGroup[i].padre.rotation  
  local radians = ((degrees-90)*math.pi)/180  
  eyeeffectsGroup[i].x = xo+15*math.cos(radians)  
  eyeeffectsGroup[i].y = yo+15*math.sin(radians)  
  eyeeffectsGroup[i].rotation = eyeeffectsGroup[i].padre.rotation  
 end  
end  

I put the glasses sprite in the eyeeffects display group and use the “padre” (parent) graphic to get the exact x, y of the parent and apply the rotation. Is that OK?

Thanks @danny!!

So, for example, what’s the best way to optimize the code in #13?

This code executes on enterFrame.


What do you think about creating a series of tests so we can measure this stuff and have everyone report back? I created 65 tests just to figure out what’s slowing stuff down. It would be cool if there was a place like GitHub to share them with everyone and collate results so that we’re not all coding in the dark.
Here are my findings based on those tests. Stuff that kills sprite performance:

  1. Fill rate! Large graphics that need to be updated are expensive; use smaller sprites (scaling up affects fill rate, scaling down improves performance).
  2. Re-sorting of sprites (create/destroy, toBack, toFront). Do this a lot and it will slow things down a lot.
  3. Accessing x, y, rotation, xScale, yScale is very slow (use translate, rotate and scale, and cache the properties for updates; see my earlier post for an example).
  4. Avoid destroying your sprites; instead make them invisible (isVisible = false or alpha = 0). Toggling sprite visibility has some overhead, but may really improve performance.
  5. If your sprites are hidden, pause their playback if you can (you can have 10k hidden sprites not playing and get 60fps, but only 400 hidden sprites playing at 60fps).
  6. If the 300 uma sprites are visible when in the viewport, what is performance like when they are outside the viewport? Are they culled? Performance improves by about 10 fps when they are offscreen (would manual culling help? Needs to be tested).

All my sprite tests were done with the uma sprite sheet, tested on a 4th gen iPod touch.

I assume this will extend to any display object, but I haven’t confirmed this since my entire game is using only sprites.

Further useful tests should be done with physics, timers, and variable scoping (globals are slow, but how local is local? What if you access a variable local to a file from inside a function?)
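Finding 4 (hide sprites instead of destroying them) is often implemented as a small object pool. The following is only a sketch of that idea in plain Lua: `newStubSprite`, `acquire` and `release` are hypothetical names, and the stub table stands in for a real sprite created through the sprite API.

```lua
-- Hypothetical stub standing in for a sprite, so this runs in plain Lua.
local created = 0
local function newStubSprite()
    created = created + 1
    return { isVisible = true }
end

local pool = {} -- hidden sprites waiting to be reused

-- Returns a visible sprite, reusing a hidden one when the pool has any.
local function acquire()
    local s = table.remove(pool)
    if not s then
        s = newStubSprite()
    end
    s.isVisible = true
    return s
end

-- Hides the sprite and keeps it for later instead of destroying it.
-- With a real sprite this is also where playback would be paused (finding 5).
local function release(s)
    s.isVisible = false
    pool[#pool + 1] = s
end

local a = acquire()
release(a)
local b = acquire() -- reuses the object released above
print(created, a == b)
```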

I made a mistake. I don’t use transition.to in enterFrame, but I have several loops using it.

Normally my game animates around 40 sprites using transition.to loops:

local a1  
local a2  
function a1()  
 transition.to (ball, {xScale =0.9, yScale=1.1, onComplete=a2, time=200})  
end  
function a2()  
 transition.to (ball, {xScale =1.1, yScale=0.9, onComplete=a1, time=200})  
end  

I do this to achieve a bounce effect on the balls, and every ball loops it.

Is there a better way to optimize this?