debugging

Hi Corona staff

I log every error in my live app to a database and I am getting a large number of errors that are almost useless for debugging purposes…  This is by far the most common error (below).

This gets logged at least 5,000 times a day but only from around 1% of active devices.  How am I supposed to localise the error on players’ devices when this is all I get as a stack trace?

To give you an idea of the (needle in a haystack) issue, reCalculateStatistics() is some 500 lines long and is time-sliced over 10 frames (via a Runtime enterFrame event). It recalculates 20+ runtime parameters for each tile over 100x100 tiles.

Without a line number to localise the error, how do I know which instruction is raising it?

bad argument #1 to ? (number expected, got nil)

stack traceback:
 [C]: in function ?
 ?: in function reCalculateStatistics
 ?: in function func
 /Users/jenkins/slaveroot/workspace/Templates/label/android/platform/resources/init.lua:222: in function </Users/jenkins/slaveroot/workspace/Templates/label/android/platform/resources/init.lua:205>
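While line numbers are stripped from the trace, one possible way to narrow this down is to keep a "breadcrumb" that records which step and which tile were being processed, and to append it to whatever gets logged. The following is only a sketch in plain Lua: `breadcrumb`, `setCrumb`, `describeCrumb` and `processTile` are hypothetical names for illustration, not part of Corona or of the game's code.

```lua
-- Hypothetical breadcrumb record: updated before each risky step.
local breadcrumb = { step = "idle", x = 0, y = 0 }

local function setCrumb( step, x, y )
    breadcrumb.step, breadcrumb.x, breadcrumb.y = step, x, y
end

local function describeCrumb()
    return string.format( "step=%s tile=(%d,%d)", breadcrumb.step, breadcrumb.x, breadcrumb.y )
end

-- Stand-in for one slice of the per-tile work (illustrative only).
local function processTile( tiles, x, y )
    local tile = tiles[x][y]
    setCrumb( "pollution", x, y )
    local pollution = math.floor( tile.pollution )   -- raises "number expected, got nil" if the field is missing
    setCrumb( "happiness", x, y )
    local happiness = math.floor( tile.happiness )
    return pollution + happiness
end

-- Demo data: tile (1,2) has no happiness field.
local tiles = { { { pollution = 1.5, happiness = 2.5 }, { pollution = 3.2 } } }

-- pcall stands in here for the point where the error would be caught and logged.
local ok, err = pcall( processTile, tiles, 1, 2 )
local report = tostring( err ) .. " | " .. describeCrumb()
print( ok, report )
```

In a real app, the same `describeCrumb()` string could be appended to the error string inside the unhandledError listener, so each logged error says which step and tile were active even when the traceback shows only `?`.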

Also, players are saying that the game randomly crashes and I don’t understand why this happens when I have the following

Runtime:addEventListener( "unhandledError", unhandledErrorListener )

that is supposed to catch and handle all errors?  The function is as follows and all that does is log the error to Google Analytics, log the error to my server and return true

function unhandledErrorListener( event )
  ga.error(event.errorMessage, fatal)
  local errorString = event.errorMessage .. "\n" .. event.stackTrace
  gameEngine:logError(errorString)
  return true
end

Is there something I am missing?  Shouldn’t this function mean that no unhandled error will ever crash the app?  Players are reporting that the app just immediately stops and exits back to their home screen.

FYI, one device logged over 800 errors, I replicated their game on simulator (using their device id) and was unable to raise a single error so this is definitely something that is device dependent.

If there is anything else I can do then please advise what this might be.  Upsetting paying players is not high on my list of things to do. 

I have over 10,000 DAU so I would really appreciate some insight into how I can localise these errors.

Kind regards

Adrian

If you can reproduce this on a device of your own (or a friend’s), do so with a build that has this in build.settings:

build = { neverStripDebugInfo = true }

https://docs.coronalabs.com/daily/guide/distribution/buildSettings/index.html#build-control

That should allow you to get the exact line number.

However, that specific error tells me you may be accessing an object that has been removed but not garbage collected.

One more thing.  enterFrame is notorious for this kind of issue.  

Objects that have a Runtime listener will keep receiving events after they have been destroyed (Corona object removed).

You have to verify the object is still valid before operating on it and do your best to clean up enterFrame listeners.

Here is a trick I use sometimes:

local bob = display.newCircle( 10, 10, 10 )
bob.age = 10
function bob.enterFrame( self )
  -- removeSelf is nil once the display object has been removed
  if( self.removeSelf == nil ) then
    Runtime:removeEventListener( "enterFrame", self )
    return
  end
  print( self.age )
end
Runtime:addEventListener( "enterFrame", bob )
-- wait then delete bob
timer.performWithDelay( 500, function() display.remove(bob) end )

I do appreciate you replying, roaminggamer. I was kinda hoping for some input from Corona people, as this is about error handling and this is an area where Corona is weak compared to Unity.  In Unity I would simply have a try/catch block and that would allow program execution to continue on the next line, but this is sadly missing.  It is crazy that a function simply stops running when an error is raised in it because we can’t actively trap and handle the error.

If I have a catch-all error handler then surely my app should never crash?  Yet I am seeing crash logs in Google Play that are way out of scope of my code.  I am sure you experience this too?

Do you have any published apps?  I’d be interested in checking them out.
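On the try/catch point above: stock Lua does provide `pcall`, which runs a function in protected mode and returns a status flag instead of propagating the error. It is not identical to try/catch (execution resumes after the `pcall` call, not on the line after the failing statement), but it lets a loop carry on past a bad element. A minimal sketch in plain Lua, where `riskyStep`, `logError` and `recalc` are hypothetical names:

```lua
-- Hypothetical step that fails when a field is missing.
local function riskyStep( tile )
    return tile.power * 2        -- arithmetic on nil raises an error
end

local errors = {}
local function logError( message )   -- stand-in for a real logger
    errors[#errors + 1] = message
end

local function recalc( tiles )
    local total = 0
    for i = 1, #tiles do
        -- pcall returns true plus the result, or false plus the error message
        local ok, result = pcall( riskyStep, tiles[i] )
        if ok then
            total = total + result
        else
            logError( "tile " .. i .. ": " .. tostring( result ) )
        end
    end
    return total
end

-- The second tile has no power field, so it is logged and skipped.
local total = recalc( { { power = 1 }, {}, { power = 3 } } )
print( total, #errors )
```

Note that `pcall` only traps Lua errors. It would not help with a hard crash at the native level.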

@roaminggamer is right about the neverStripDebugInfo block. That should get you some more information to work with.

As far as not catching errors: if your app is hard crashing, that is, it’s generating a SIGSEGV or a SIGBUS error, the OS kernel is going to dump the app and our error handling won’t catch it. Unity’s try-catch isn’t likely to catch these either. We work hard to try and avoid these, but there are things you can do that will trash pointers, and those are hard to protect against.

A common cause would be to leave a timer running, or an onComplete call that tries to call a function that no longer exists because you removed the scene.

Your users will just see the app lock up and quit or just quit back to the launcher screen.

Rob

Hi Rob,

Just to set the scene (in case it helps and no pun intended).  I only have a single scene in my game.  The function that raises 95% of all logged errors is reCalculateStatistics().  This gets started on any change in game state and runs from an enterFrame handler, as the entire function takes around 2 seconds to complete.  Instead of locking up the game by recalculating game state per tile for 100 x 100 tiles, it splits the processing over 15 frames.  I appreciate this adds a bit of overhead from looping the entire array again and again, but the frame rate drops to 20fps for a bit rather than 0fps!  I looked at coroutines but they didn’t seem to offer any performance advantage over enterFrame.  The ability to spawn a background thread would really help, especially with multicore devices.
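For readers unfamiliar with the coroutine variant of this kind of time slicing, here is a rough sketch in plain Lua. It is not the code from the game: `recalcWorker`, `onFrame` and `ROWS_PER_FRAME` are hypothetical, and the per-tile work is reduced to a sum. The worker visits each tile once and yields between batches of rows, so the array is not re-looped on every frame.

```lua
local SIZE = 100              -- grid is SIZE x SIZE, as in the post
local ROWS_PER_FRAME = 7      -- assumed budget; 100 rows in batches of 7 takes 15 frames

-- Hypothetical worker: visits every tile once, yielding between row batches.
local function recalcWorker( grid )
    local sum = 0
    for x = 1, SIZE do
        for y = 1, SIZE do
            sum = sum + grid[x][y]
        end
        if x % ROWS_PER_FRAME == 0 then
            coroutine.yield()     -- hand control back until the next frame
        end
    end
    return sum
end

-- Demo grid of ones.
local grid = {}
for x = 1, SIZE do
    grid[x] = {}
    for y = 1, SIZE do grid[x][y] = 1 end
end

local co = coroutine.create( recalcWorker )
local frames, result = 0, nil

-- Stand-in for an enterFrame listener: returns true once the work is finished.
local function onFrame()
    frames = frames + 1
    local ok, value = coroutine.resume( co, grid )
    if not ok then error( value ) end     -- resume reports errors instead of raising them
    if coroutine.status( co ) == "dead" then
        result = value
        return true
    end
    return false
end

-- Simulate frames until the worker completes.
while not onFrame() do end
print( frames, result )
```

In Corona, `onFrame` would presumably be registered with `Runtime:addEventListener( "enterFrame", ... )` and removed once it reports completion. Because `coroutine.resume` returns `false` plus the message on failure, the `not ok` branch is also a convenient place to log extra context.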

This function is performing some heavy maths and only references the main array (where each element is a metatable) so it is not referencing any display objects (that could have been destroyed) nor is there any timer events or transitions involved.

What is so frustrating is I cannot get it to break on simulator nor on devices (S4, S5, G3, tab4) I have for testing. 

They seem to mostly come from low end Android devices so I wonder if it is an issue with low memory or something similar?  The game can use 90+MB and 160+MB of texture memory.

Thanks

Adrian

Hey Adrian,

Unfortunately I can’t offer you a solution to your initial question, but maybe you are able to bypass the problem?

Spreading the access and calculation of such a large amount of data over several frames, rather than doing it in a single frame, is a great idea.

But may I ask why you want to process this amount of data at regular intervals in the first place?

Is it for saving the game state, or for showing the player certain statistics? Depending on that there may be better solutions.

You might just register any object that changes, and then loop through all changed objects to get the new values. Don’t get me wrong, but looping through so much data every few frames seems strange and over the top.
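The "register what changed" idea could look roughly like the sketch below, in plain Lua. The names `markDirty` and `processDirty` are hypothetical, and whether this is faster than a full recalculation depends on how far each change ripples through the grid.

```lua
-- Hypothetical dirty set keyed by "x,y" so a tile is queued only once.
local dirty, dirtyCount = {}, 0

local function markDirty( x, y )
    local key = x .. "," .. y
    if not dirty[key] then
        dirty[key] = { x = x, y = y }
        dirtyCount = dirtyCount + 1
    end
end

-- Visit only the changed tiles, then clear the set.
local function processDirty( visit )
    local processed = 0
    for _, pos in pairs( dirty ) do
        visit( pos.x, pos.y )
        processed = processed + 1
    end
    dirty, dirtyCount = {}, 0
    return processed
end

markDirty( 3, 4 )
markDirty( 3, 4 )   -- duplicate, ignored
markDirty( 10, 2 )

local seen = {}
local n = processDirty( function( x, y ) seen[#seen + 1] = x * 1000 + y end )
print( n, dirtyCount )
```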

Are you trapping low memory warnings? If you run out of memory the OS will dump you and it will look like a crash. We have a low memory event. I’m travelling and can’t look it up for you.

If you handle that event you will be able to free stuff up and/or ping your analytics to let you know that’s the cause.

As for the other error have you told it to not strip the debug info yet? That will help clue you in to the errors you are catching.

@torbenratzlaff, The reason for the heavy calculation is because of the game mechanics. 

My game is a city-builder. Each building has an area of effect on its neighbours, to a greater or smaller degree depending on what building it is and what property is being radiated.  For example, a park will provide a positive bonus over a radius of between 3 and 8 tiles depending on its upgrade level.

Other things matter too: if a user removes a road from a power station then that will stop it functioning, and that could have an effect on hundreds of buildings.  As the affected tiles no longer have power they won’t generate pollution or happiness, and that then has to ripple out to their neighbours, etc.  Also I need to aggregate lots of values over the entire city, and doing that in real time requires accessing every element in the array.

It actually is quicker to recalculate everything than to try and build a binary tree of all the “tile 1 has affected tile 2 has affected tile 3”, etc.

At the end of every recalculation it then saves a compressed and encrypted game state to disk (and every so often to the cloud) - this is done over many frames too.

@Rob - that’s an idea, I will look into the low memory event.  I can’t use the strip debug setting on release builds unfortunately, only with a debug keystore.

The whole idea behind that key is to keep the debug symbols in a release app. If you’re using a debug keystore we don’t strip the symbols anyway. Try it.

Update: that setting (build = { neverStripDebugInfo = true }) did nothing on a “live build” on local device installed via ADB.  I did however manage to get proper line numbers from building with the debug keystore on local device.

I thought I had problem customers until I managed to emulate it on simulator.  I have a master display group with 12 other display groups layers to help simulate a 3D environment.  All the pinch zoom / pan events modify the master display group and the nested groups update accordingly.  All of a sudden the child display groups lost their coordination and their z-indexing and the display corrupts.

This is after the display group disruption

this is after a restart

any ideas?  No errors logged on simulator

Hi Rob,

Unfortunately the low memory event is only on iOS (according to https://docs.coronalabs.com/api/event/memoryWarning/index.html) so how should I handle this on Android? (my install base is 80% Android)
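Absent a platform event, one fallback might be to poll Lua's own memory counter from a timer and report when it crosses a threshold. The sketch below uses only the standard `collectgarbage` function; `checkLuaMemory` and the threshold value are hypothetical, and this measures Lua heap usage only, not texture memory.

```lua
local LUA_MEMORY_LIMIT_KB = 64 * 1024   -- assumed threshold, to be tuned per game

-- Calls onHigh( usedKB ) and forces a collection when Lua memory exceeds limitKB.
local function checkLuaMemory( limitKB, onHigh )
    local usedKB = collectgarbage( "count" )   -- Lua heap size in kilobytes
    if usedKB > limitKB then
        onHigh( usedKB )
        collectgarbage( "collect" )
        return true
    end
    return false
end

-- Demo: a zero limit always triggers, a huge limit never does.
local reported = nil
local tripped = checkLuaMemory( 0, function( kb ) reported = kb end )
local quiet = checkLuaMemory( math.huge, function() end )
print( tripped, quiet, LUA_MEMORY_LIMIT_KB )
```

The `onHigh` callback is where an analytics ping could go, so that a session ending shortly after a high-memory report can be told apart from other crashes.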

Any idea what is causing the displayObjects to change their hierarchy and their x, y coords relative to their parent displayGroup?

Basically they appear to reverse the order in which they were inserted into their parent displayGroup (as per the screenshots in the post above).  If it does happen it is only after 20-30 minutes of playing (I have 5 customers that have reported the issue).  Memory usage is around 250MB when it goes pear-shaped.  It is happening on a Galaxy Tab 4, Galaxy S5, HTC One and LG G3, so it is not localised to low end devices.

I know for a fact that there is no code to shuffle the order as this would make absolutely no sense in my game so therefore the problem is with the SDK somewhere.  I can’t submit a use case unless it is my entire project.

Thanks
