"Is This Faster?" Issue #1

Hi.  First, this is not a question.  Instead I’m posting the results of some (perhaps silly) questions that were floating around in the back of my mind.  

(Get the code here if you want to test my results yourself.)

I’m a performance geek and sometimes I get these weird ideas about faster and slower ways of doing things.  So, without further ado, let me expose my silliness.

Question #1 - for k,v in pairs() vs for _,v in pairs()

Is this:

```lua
-- Test 1
for _, v in pairs( tbl ) do
    -- Uses '_' to say, don't store the key
end
```

faster/slower than this:

```lua
-- Test 2
for k, v in pairs( tbl ) do
end
```

(The key difference is the use of ‘_’ as a placeholder instead of a named variable.  The real question here should be, “Is ‘_’ special to Lua, or just another temporary variable name?”)

After 10 runs x 1,000,000 iterations:   No appreciable difference.  ‘_’ must be just another variable and not special.
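For the curious, here’s a minimal sketch of how a micro-benchmark like this could be timed.  This is not the actual test code from the linked project; `tbl`, `timeIt`, and the iteration counts are all made up for illustration, and `os.clock` resolution varies by platform:

```lua
-- Minimal micro-benchmark sketch (hypothetical harness, not the actual test code)
local tbl = {}
for i = 1, 100 do tbl[i] = i end

local function timeIt( fn, iterations )
    local t0 = os.clock()
    for i = 1, iterations do fn() end
    return os.clock() - t0  -- elapsed CPU seconds
end

local tUnderscore = timeIt( function()
    for _, v in pairs( tbl ) do end
end, 10000 )

local tNamed = timeIt( function()
    for k, v in pairs( tbl ) do end
end, 10000 )

print( tUnderscore, tNamed )  -- expect roughly equal times
```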

Question #2 - obj.fieldName vs obj[“fieldName”] lookups

Is this:

```lua
-- Test 1
for i = 1, iterations do
    if ( obj.fieldName ) then
        -- No work
    end
end
```

faster/slower than this: 

```lua
-- Test 2
for i = 1, iterations do
    if ( obj["fieldName"] ) then
        -- No work
    end
end
```

(This question is about how Lua interprets the lookup request for a field named ‘fieldName’.)

After 10 runs x 1,000,000 iterations:   No appreciable difference.  I guess both bits of code generate the same or similar interpreted results.
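This matches how Lua defines dot access: `obj.fieldName` is just syntactic sugar for `obj["fieldName"]`, so the two forms are interchangeable.  A tiny demo:

```lua
-- Dot access is sugar for bracket access with a string key
local obj = { fieldName = 42 }

print( obj.fieldName, obj["fieldName"] )  -- both read the same table slot

obj["other"] = 1
print( obj.other )  -- writes through either form are visible to the other
```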

Question #3 - Do we get a speedup for not re-creating temporary strings?

Is this:

```lua
-- Test 1
for i = 1, iterations do
    if ( obj["fieldName"] ) then
        -- No work
    end
end
```

faster/slower than this: 

```lua
-- Test 2
local index = "fieldName"
for i = 1, iterations do
    if ( obj[index] ) then
        -- No work
    end
end
```

(In this question, I’m trying to see how costly the creation of a temporary string is for doing lookups.  In test 1, I believe, a temporary string is created every iteration.  Whereas, in test 2, the string is only created once per run, right before the iteration code.)

After 10 runs x 1,000,000 iterations:  Test #2 is slightly (2…3%) faster.  This tells me that it’s not worth doing this unless it is part of a long chain of lookups, or part of my code needs a programmatically defined lookup value.
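A sketch of the hoisting pattern, for the rare case where it might pay off (e.g. a programmatically chosen key).  The `obj` table and loop count here are placeholders, not the original test values:

```lua
-- Hoist the lookup key into a local once, instead of repeating the literal
local obj = { fieldName = true }
local index = "fieldName"  -- bound once, before the loop

local hits = 0
for i = 1, 1000 do
    if obj[index] then  -- reads the local, then indexes the table
        hits = hits + 1
    end
end
print( hits )
```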

Question #4 - Same as #3 with subtle locality difference

Is this: 

```lua
-- Test 1
local index = "fieldName"
for i = 1, iterations do
    if ( obj[index] ) then
        -- No work
    end
end
```

faster/slower than this:  

```lua
-- Test 2
local index = "fieldName"
local test = function()
    for i = 1, iterations do
        if ( obj[index] ) then
            -- No work
        end
    end
end
test()
```

(This question is similar to #3, but in the second test of this case, the local variable containing the string is created outside the scope of a function that does the work.  Will it be faster or slower?)

After 10 runs x 1,000,000 iterations:   Whoa!  Locality makes a big difference.  Simply moving the variable outside the scope of the function has made test #2 20% slower than test #1.  So, let that be a lesson.  Locality is crucial to speed.
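One way to visualize the locality difference: in `lookupLocal` below the key lives in a local slot of the function itself, while `lookupUpvalue` has to reach outside its own scope for it.  (This is a sketch with made-up names; the 20% figure above came from my timed runs, not from this code.)

```lua
local obj = { fieldName = true }
local index = "fieldName"

-- Reaches outside its own scope: 'index' is an upvalue here
local function lookupUpvalue()
    return obj[index]
end

-- Re-localizes the key first, so the loop body only touches locals
local function lookupLocal()
    local idx = index
    return obj[idx]
end

print( lookupUpvalue(), lookupLocal() )  -- same answer, different access paths
```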

i fully support your efforts, and actual testing sometimes is needed to “prove” something merely assumed true, but… much of this “lua internals” performance is (and has long been) fully documented.

  1.  _ is just another legal variable name, it’s only a convention to mean ‘not used’, nothing ‘special’ about it.

  2. this is one of lua’s ‘syntactic sugars’ (a shortcut way of writing an equivalent statement), purely for the benefit of human writers; both compile to identical vm code

  3. the reason test 2 is faster is NOT because of string manipulations, it’s because of an extra read of your variable “index”.  in all cases string literals are stored as statics.  (test 1:  read static string #N;  test 2:  read local variable (containing pointer to static string #N), get static string #N)

  4. the local environment can be accessed by index, which is quite a bit faster than accessing a more remote environment by name
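point 4 is the reasoning behind the common “localize your globals” idiom: a local is addressed by register index in the vm, while a global is a name lookup in a more remote table.  a sketch (the function and numbers are arbitrary):

```lua
-- Localize a frequently-used global/field once, before the hot loop
local floor = math.floor  -- one global lookup + one field lookup, paid once

local sum = 0
for i = 1, 1000 do
    sum = sum + floor( i / 2 )  -- local access by index inside the loop
end
print( sum )
```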

if it sounds like i’m just being negative, apologies, as that’s not my intent… rather, it’s because what i think really NEEDS performance testing is Corona’s *internals*.  there are many sites with general lua performance tips and preferred patterns/idioms.  but there’s far less in the way of testing Corona.

for example, what performance hit (if any) to individually do displayObject.alpha=0.5 versus changing alpha on its parent group?

or what performance hit (if any) to draw a rectangle with/without a stroke?  or of varying weights?

or a “plain” rect (just geometry) versus a newImage() (which is a textured rectangle)?

…and many other things like that… perhaps?  fwiw

@Dave,

So, what you’re saying is that I should revive my old ‘Corona Bench’ project and open source it so folks can create and add tests to it on their own?

https://www.youtube.com/watch?v=ZsQ04F5wtTs

Note: This tool measured Lua and Corona speed and memory usages for various scenarios.

Note2: I wrote a modified version of this tool recently to test Corona -> HTML5  (again this starts with Lua tests, but moves on to Corona tests)

-Ed

anything, as long as it finishes with an “actionable” lesson to be learned  :)

(or it could be that i’m just too dense to decipher your bench results :D)

what are all those circles doing?  what’s the take-away?  500 circles are slower than 100?  (but isn’t that a no-brainer?)

in short, while i don’t think the world needs any more for-versus-while analysis, if there’s a “corona internals performance secret” revealed by your circles (or any of the tests), then that sort of stuff would be tremendously useful.

the problem i keep bumping my head into is there doesn’t seem to be any “good” way to get actual rendering time.  all you can do is keep adding complexity until you affect frame rate, stating results like “500 of this before the framerate suffers, but 600 of that”.

but that’s far from rigorous as is, and prone to any number of other extraneous factors/events that might affect framerate, thus hard to “learn” from.  and most devices won’t do arbitrary framerates, so a lot of ‘important’ info is lost between 60 and 30

(that is, all you can typically notice is when the hit is finally SO big that you’ve halved your framerate – you may have ‘doubled’ your rendering time much earlier, for example, but have so much leeway that you don’t feel it in the framerate until triple (or more) your rendering time.  but that’s critical to know if trying to figure how much time you might have for doing OTHER stuff during that loop)

it’d be nice if we had access to some “internal” times (specifically end-of-render minus begin-of-render).  if you’ve got an approach to address that it’d be awesome

again, just my 2c, and only fwiw

This is the start of something amazing :slight_smile: Good read, and I’d gladly read any future info on this topic. Granted, most of it is common info just from programming in corona for a few months, but you never know if some of this misses some people. Would love to see more misc stuff like this to help general performance.

Would like to see more on localization, like getting deeper and deeper into function scopes through a few functions, which always confuses me and I think is where I lose some of my performance. Ex. Create a monster in a composer “did” function, check for collision against another sprite in a collision function where you check the HP, and when HP reaches zero call another function which starts removing stuff, properly keeping scope throughout, versus not keeping scope throughout, like the above post.

Or another: let’s say I press a button and it’s to create something on the spot (like a weapon for a few seconds *slashing animation*) or blocks or something. Now I usually keep my imageSheet setup and sequence animation code at the top of my programs, but should they properly be scoped right inside where you press the button? Or have the setup code in the button? But that’s extra lines of code in the button that just bog it down every few seconds.  Maybe show differences of more advanced localization of certain variables when creating stuff (variable names, physics shapes, random variables) and whether that has an effect on performance? As usually more advanced stuff is called on the spot, like collision or certain timers, and not just at the start of the game.

Anyways, just a few questions, maybe for issue #2 or something that can keep this performance thing going :stuck_out_tongue: none of it needs to be answered, as the performance in my game is pretty good now; just stuff that I’ve wondered about the past few months and maybe other people have too.

I ran some tests and I can confirm that the difference on Question 4 above is around 20-something %.

However does it really matter at all in the real world? 

Maybe I’m completely missing something, but the way I’m thinking is that percentages can be misleading. Actual timings are more relevant.

I ran tests on an iPod Touch 4th gen, and the difference between the two runs is around 25%. However, the actual time difference is 117 ms…over 1,000,000 iterations! That translates to 0.000117 ms of extra time per iteration.
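Spelling out that arithmetic (just restating the numbers above):

```lua
-- 117 ms of total difference spread over 1,000,000 iterations
local totalDiffMs    = 117
local iterations     = 1000000
local perIterationMs = totalDiffMs / iterations
print( perIterationMs )  -- 0.000117 ms per iteration
```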

Hardly anything that could make a noticeable difference in gaming performance?

@ingemar: it’s not my topic, but i’m subscribed, so, want my two cents?  for many (if not most) apps it’s probably not enough to make a difference.  for instance, anything in the “casual puzzle” genre isn’t likely to tax a device hard enough to matter.

but it isn’t hard to imagine a more “ambitious” scenario where it might matter.  imagine an arcade-style shooter, with 1000 entities that need individual updates, and let’s say within their update routine there are 10 opportunities to localize something, so there’s an easy 10K multiplier, and now you’ve accounted for a whole millisecond, with only 16.7 available per frame.  might matter?  it’s not so much about how much time this took, but how much time might be left for everything else - things that might be harder to optimize because they’re handled more “internally”:  rendering, audio, physics, whatever else.  really just trying to find some extra “headroom”
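the back-of-envelope math for that scenario, with every number hypothetical:

```lua
-- Hypothetical arcade-shooter headroom estimate (all numbers assumed)
local entities         = 1000
local lookupsPerUpdate = 10        -- localization opportunities per entity
local savedPerLookupMs = 0.0001    -- assumed per-lookup saving
local frameBudgetMs    = 1000 / 60 -- ~16.7 ms available per frame at 60 fps

local savedPerFrameMs = entities * lookupsPerUpdate * savedPerLookupMs
print( savedPerFrameMs, frameBudgetMs )  -- ~1 ms reclaimed of ~16.7 ms
```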

but yes, in short, if you don’t have a performance problem then don’t bother  :)

Don’t get me wrong. I do see a point in optimization. However I’ve been caught spending a lot of time optimizing code in the past where I’ve been chasing ms for no apparent reason.

My approach for quite some time now is to code without thinking about optimization and then profile/optimize my tight loops if necessary. Some might not think it’s a good approach, but I’m one of those who follow these “rules”  :stuck_out_tongue: :

• The First Rule of Code Optimization: Don’t. 

• The Second Rule of Code Optimization: Don’t yet.

• The Third Rule of Code Optimization: Profile first. Never optimize without profiling first.

This is a very important point.

I don’t know how much time I’ve spent optimizing things for no apparent reason, other than to “do it”.

“Premature optimization is the root of all evil”. This saying is very very true.

i knew i shouldn’t have replied  ;)  i’m not here to defend pointless optimization

all true, and i’ve read them all 1000 times before, and i’ve already agreed above.

but a “tight loop” scenario as you say is exactly the example i was attempting to describe

ok, devil’s advocate (since the can of worms is already opened, why not?)

on the flip side, there is no evil in following best-practices in advance simply because you’re worried that you might be optimizing too soon; it’s not like typing “local” in front of a variable is a radical idea.  it takes 1 second to implement and is a well-known language feature that has the neat benefit of being faster than not doing so.  (not to mention that it’s just better practice in general to declare local things local)

@All,

We’re not quite there yet, but lest this turn into a holy war…let me reiterate my initial statement:

Hi.  First, this is not a question.  Instead I’m posting the results of some ( perhaps silly ) questions that were floating around in the back of my mind.  

This was just me making a post about some ideas I found interesting.  

I am firmly of the mind:

  1. Always try to use performant coding practices, BUT…
  2. Don’t do so at the expense of code legibility and portability, AND…
  3. Don’t optimize until you need to, AND THEN…
  4. Only optimize what you find to be the worst performer via profiling.

Cheers,

Ed

My response wasn’t aimed at you, just a general comment :slight_smile:

I agree with what you just said 100%.

My take on this is that it’s good to have best practices which will help keep you out of some situations where optimization is required.  Avoiding those pitfalls is a good thing.

On the other hand, things like multiplying by 0.5 vs. dividing by 2 is frequently an optimization that does more harm than good.  While no one would argue that, CPU-wise, a multiply operation is faster than a divide, it’s more typing and frankly harder to read.  And if you’re doing something like:

```lua
local cx = display.contentWidth * 0.5
```

at the top of your code and then using cx everywhere else, then really, you saved 2 CPU cycles on something that happens only once in your code.  Sure, in a tight loop you should use multiplies over divides (I’m using this as an example), but it’s pointless when the optimization is hit once and never again.
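The hoisted version makes the point: compute the invariant once, and the multiply-vs-divide question becomes irrelevant inside the loop.  (This sketch uses a stand-in table, since Corona’s real `display` object isn’t available outside the SDK.)

```lua
-- Stand-in for Corona's display table, just for illustration
local display = { contentWidth = 320 }

-- Computed once; whether this is *0.5 or /2 costs essentially nothing overall
local cx = display.contentWidth * 0.5

-- Inside a tight loop, reuse the precomputed value instead of re-deriving it
local positions = {}
for i = 1, 10 do
    positions[i] = cx  -- no per-iteration arithmetic at all
end
print( cx, #positions )
```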

To me, where you need to be spending time is optimizing your images and audio.  Keep the images the right size for the device (i.e. don’t load in huge images and force Corona SDK to scale them down for you on smaller screens).  Think about texture memory boundaries: don’t have a 1025 pixel wide image.  Find a way to make it 1024.

Follow best programming practices and things should do a great job of taking care of themselves.

But that said, when you do start having issues, profile, and look for optimizations.

Rob
