Lua runtime performance improvements

Hi, we’re developing a new RPG/shooter side-scroller and we’re using Spine for all the character animations (including mobs).

Our skeletons are quite simple, with each leg and arm being a single bone (no knees or elbows), and the animations have just a few keyframes each.

We aimed to have 50 animated characters on screen, but we ran into performance issues. After some simple time profiling, we found Spine to be responsible for most of the load.

In our case, on an iPad 3, each character consumed more than 1.5% CPU, and with Corona being single-threaded we ended up with lag once we put more than 30 mobs on screen.

We’ve been able to improve Spine’s performance by more than a factor of two! I’ll post the changes we made here, starting with changes to the generic Lua runtime, and I’ll follow up with Corona-specific changes.

The first thing we noticed is that bone:updateWorldTransform accounts for most of the processing time, so we concentrated on optimizing this function as much as we could by following various Lua performance tips we found on the internet:

  1. localize - the Lua VM gives each function up to 250 registers. Local variables live in these registers, and operations on them, especially math operations, are much faster. So at the start of the function we copy anything we need into local variables, and at the end we copy whatever we want to keep back into the actual table structure.
  2. reduce property lookups - what we did for localization also reduces the number of Lua table lookups. For example, if you have a bone, accessing bone.x requires Lua to look up the ‘x’ key in the bone table, which is much slower than reading a local variable, so working on locals reduces lookups as well. Aside from that, any math.xxx function should also be localized to avoid these lookups.
  3. eliminate ipairs - ipairs returns an iterator function that Lua calls on every iteration. A simple numeric ‘for’ loop is much faster (by about 30% in our tests).
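The three tips above can be sketched on a small example. This is only an illustration, not code from the Spine runtime: the function names normalizeAllSlow and normalizeAllFast and the points table are hypothetical.

```lua
-- Hypothetical example, not part of the Spine runtime.

-- Slow style: ipairs, repeated table field access, global math lookups.
local function normalizeAllSlow(points)
  for _, p in ipairs(points) do
    local len = math.sqrt(p.x * p.x + p.y * p.y)
    if len > 0 then
      p.x = p.x / len
      p.y = p.y / len
    end
  end
end

-- Faster style: math.sqrt localized once, numeric for loop,
-- fields copied into locals and copied back once at the end.
local sqrt = math.sqrt

local function normalizeAllFast(points)
  for i = 1, #points do
    local p = points[i]
    local x, y = p.x, p.y            -- copy in
    local len = sqrt(x * x + y * y)
    if len > 0 then
      x, y = x / len, y / len
    end
    p.x, p.y = x, y                  -- copy back
  end
end

-- Both versions produce the same result.
local a = { { x = 3, y = 4 }, { x = 0, y = 0 } }
local b = { { x = 3, y = 4 }, { x = 0, y = 0 } }
normalizeAllSlow(a)
normalizeAllFast(b)
```

How much the second version gains depends on the Lua implementation and the device, so it is worth measuring on your own target.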

In addition to these, we decided to remove all runtime calls to math.sin and math.cos because they are slow. Instead, we precomputed the sin/cos values for every whole-degree angle from -359 to 359 and work only in whole degrees. We didn’t notice any difference in our animations, but you could store more angles, for example in steps of 0.1 degrees; the only cost is memory. We also used a Lua trick to round the angle down, (n - n%1) instead of math.floor, which is 28% faster.
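If whole-degree steps turn out to be too coarse for a particular animation, the finer table mentioned above could look roughly like this. It is a sketch under assumptions: the 0.1-degree resolution, the rounding to the nearest step, and the names STEPS_PER_DEGREE, TABLE_SIZE and fastCosSin are all hypothetical and not part of the code we profiled.

```lua
-- Hypothetical sketch: sin/cos lookup table with 0.1-degree resolution.
local STEPS_PER_DEGREE = 10                 -- assumption: 10 entries per degree
local TABLE_SIZE = 360 * STEPS_PER_DEGREE   -- 3600 entries per table

local sin, cos, pi = math.sin, math.cos, math.pi
local SIN, COS = {}, {}
for i = 0, TABLE_SIZE - 1 do
  local radians = (i / STEPS_PER_DEGREE) * pi / 180
  SIN[i], COS[i] = sin(radians), cos(radians)
end

-- Returns approximate cos and sin for an angle given in degrees.
local function fastCosSin(degrees)
  -- Scale to table steps and add 0.5 so the floor below rounds to nearest.
  local scaled = degrees * STEPS_PER_DEGREE + 0.5
  -- (n - n % 1) floors n; Lua's % with a positive divisor never returns
  -- a negative value, so negative angles wrap into 0 .. TABLE_SIZE - 1.
  local index = (scaled - scaled % 1) % TABLE_SIZE
  return COS[index], SIN[index]
end

local c, s = fastCosSin(-90)   -- looks up the entry for 270 degrees
```

With this layout the worst-case angle error is half a step (0.05 degrees), at the cost of 7200 stored numbers instead of 1438.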

So here are the results from profiling the execution time of bone:updateWorldTransform:
original runtime = 100%
original runtime + localized math functions = 93%
optimized + localized math functions = 85%
optimized + const sin/cosine = 74%
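
For anyone who wants to reproduce this kind of comparison, a minimal timing harness might look like the following. It is not the profiler we used: timeIt is a hypothetical helper built on the standard os.clock, and the two measured snippets are just examples.

```lua
-- Hypothetical micro-benchmark helper, not the profiler used for the numbers above.
local clock = os.clock

-- Runs fn `iterations` times and returns the elapsed CPU time in seconds.
local function timeIt(fn, iterations)
  local start = clock()
  for _ = 1, iterations do
    fn()
  end
  return clock() - start
end

local floor = math.floor
local value = 123.456

local tFloor = timeIt(function() return floor(value) end, 1000000)
local tMod = timeIt(function() return value - value % 1 end, 1000000)

print(string.format("math.floor: %.3fs, n - n %% 1: %.3fs", tFloor, tMod))
```

The function call per iteration adds overhead to both measurements, so the printed times are only useful relative to each other, and they will differ between devices and Lua versions.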

And here is the code for the optimized version:

local Bone = {}

-- Precomputed sine/cosine values for whole-degree angles.
local SIN, COS = {}, {}
for i = -359, 359 do
    SIN[i] = math.sin(math.pi / 180 * i)
    COS[i] = math.cos(math.pi / 180 * i)
end

function Bone.new (data, parent)
    if not data then error("data cannot be nil", 2) end

    local self = {
        data = data,
        parent = parent,
        x = 0, y = 0,
        rotation = 0,
        scaleX = 1, scaleY = 1,
        m00 = 0, m01 = 0, worldX = 0, -- a b x
        m10 = 0, m11 = 0, worldY = 0, -- c d y
        worldRotation = 0,
        worldScaleX = 1, worldScaleY = 1,
    }

    --local rad, cos, sin = math.rad, math.cos, math.sin
    local inheritScale, inheritRotation = self.data.inheritScale, self.data.inheritRotation

    function self:updateWorldTransform (flipX, flipY)
        -- Copy everything we need into locals first.
        local parent, data = self.parent, self.data
        local x, y = self.x, self.y
        local rotation, scaleX, scaleY = self.rotation, self.scaleX, self.scaleY
        local m00, m01, worldX, m10, m11, worldY
        local worldRotation, worldScaleX, worldScaleY = rotation, scaleX, scaleY

        if parent then
            worldX = x * parent.m00 + y * parent.m01 + parent.worldX
            worldY = x * parent.m10 + y * parent.m11 + parent.worldY
            if inheritScale then
                worldScaleX = parent.worldScaleX * worldScaleX
                worldScaleY = parent.worldScaleY * worldScaleY
            end
            if inheritRotation then
                worldRotation = worldRotation + parent.worldRotation
            end
        else
            worldX = (flipX and -x) or x
            worldY = (flipY and -y) or y
        end

        --local radians = rad(worldRotation)
        --local cosV = cos(radians)
        --local sinV = sin(radians)
        -- Round down to a whole degree and wrap into 0..359 for the table lookup.
        local angle = (worldRotation - worldRotation % 1) % 360
        local cosV = COS[angle]
        local sinV = SIN[angle]

        m00 = cosV * worldScaleX
        m10 = sinV * worldScaleX
        m01 = -sinV * worldScaleY
        m11 = cosV * worldScaleY
        if flipX then
            m00 = -m00
            m01 = -m01
        end
        if flipY then
            m10 = -m10
            m11 = -m11
        end

        -- Copy the results back to the table.
        self.m00, self.m01, self.worldX = m00, m01, worldX
        self.m10, self.m11, self.worldY = m10, m11, worldY
        self.worldRotation, self.worldScaleX, self.worldScaleY = worldRotation, worldScaleX, worldScaleY
    end

    function self:setToSetupPose ()
        local data = self.data
        self.x = data.x
        self.y = data.y
        self.rotation = data.rotation
        self.scaleX = data.scaleX
        self.scaleY = data.scaleY
    end

    return self
end

return Bone

Note: I’ve also posted an identical post on the official Spine forum.

I just tested your code with one of my animations, but the animation now lags when the main character body is rotated. The limbs seem to work normally… only the root rotation seems affected.

Wow! Assuming d.mach’s concerns get fixed this is a great improvement.

Only problem now is getting Graphics 2.0 support…
