com.ansca.corona.CoronaService ANR affects 20k users

An Application Not Responding (ANR) error is being reported by the Play Console:

executing service org.***.****/com.ansca.corona.CoronaService

com.ansca.corona.CoronaService

 

"main" tid=1 Native
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x75de81e0 self=0xea985400
  | sysTid=29183 nice=-4 cgrp=default sched=0/0 handle=0xedbf3534
  | state=S schedstat=( 0 0 0 ) utm=94 stm=23 core=0 HZ=100
  | stack=0xff735000-0xff737000 stackSize=8MB
  | held mutexes=
  #00 pc 0000000000017530 /system/lib/libc.so (syscall+28)
  #01 pc 00000000000482bf /system/lib/libc.so (pthread_join+146)
  #02 pc 0000000000015b58 /data/app/package-1/lib/arm/libopenal.so (alcDestroyContext+516)
  #03 pc 0000000000008ed7 /data/app/package-1/lib/arm/libalmixer.so (ALmixer_Quit+230)
  #04 pc 000000000011753c /data/app/package-1/lib/arm/libcorona.so (???)
  #05 pc 0000000000119970 /data/app/package-1/lib/arm/libcorona.so (???)
  #06 pc 0000000000129a38 /data/app/package-1/lib/arm/libcorona.so (???)
  #07 pc 0000000000028cb4 /data/app/package-1/lib/arm/libcorona.so (???)
  #08 pc 000000000002bfb0 /data/app/package-1/lib/arm/libcorona.so (Java_com_ansca_corona_JavaToNativeShim_nativeDone+28)
  #09 pc 00000000000205e5 /data/app/package-1/oat/arm/base.odex (Java_com_ansca_corona_JavaToNativeShim_nativeDone__J+80)
  at com.ansca.corona.JavaToNativeShim.nativeDone (Native method)
  at com.ansca.corona.JavaToNativeShim.destroy (JavaToNativeShim.java:277)
  at com.ansca.corona.Controller.destroy (Controller.java:286)
  - locked <0x05bd76d9> (a com.ansca.corona.Controller)
  at com.ansca.corona.CoronaRuntime.dispose (CoronaRuntime.java:88)
  at com.ansca.corona.CoronaActivity.onDestroy (CoronaActivity.java:1580)
  at android.app.Activity.performDestroy (Activity.java:7220)
  at android.app.Instrumentation.callActivityOnDestroy (Instrumentation.java:1161)
  at android.app.ActivityThread.performDestroyActivity (ActivityThread.java:4621)
  at android.app.ActivityThread.handleDestroyActivity (ActivityThread.java:4661)
  at android.app.ActivityThread.-wrap7 (ActivityThread.java)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1703)
  at android.os.Handler.dispatchMessage (Handler.java:102)
  at android.os.Looper.loop (Looper.java:154)
  at android.app.ActivityThread.main (ActivityThread.java:6776)
  at java.lang.reflect.Method.invoke! (Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:1518)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1408)
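
For readers following the trace: the main thread is parked in pthread_join inside alcDestroyContext, reached from CoronaActivity.onDestroy, so the UI thread appears to be waiting for an audio thread to exit. The sketch below is not Corona's code. It only illustrates, in plain Java, the general idea of putting an upper bound on such a wait so the caller can give up instead of hanging. The names BoundedShutdown, startSleeper and joinWithTimeout are hypothetical.

```java
public class BoundedShutdown {

    /** Starts a daemon thread that simply sleeps for the given time (stand-in for a slow worker). */
    static Thread startSleeper(long sleepMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true); // do not keep the JVM alive for the demo worker
        t.start();
        return t;
    }

    /**
     * Waits at most timeoutMs for the worker to finish.
     * Returns true if the worker ended in time, false if the caller gave up.
     */
    static boolean joinWithTimeout(Thread worker, long timeoutMs) {
        if (timeoutMs <= 0) {
            // join(0) means "wait forever", which is exactly what we want to avoid
            return !worker.isAlive();
        }
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        Thread fast = startSleeper(10);
        Thread slow = startSleeper(5000);
        System.out.println("fast worker finished in time: " + joinWithTimeout(fast, 2000));
        System.out.println("slow worker finished in time: " + joinWithTimeout(slow, 50));
    }
}
```

Whether a bounded wait is safe for a real audio teardown depends on what the worker still owns when the caller gives up, so treat this as an illustration of the pattern only.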

I’m experiencing the same. Any idea or fix for this? Calling Corona support.

Our engineers are working on finding a solution.

Rob

Our installs were at 150 a day. With similar errors originating in com.ansca.corona.Controller, Google has quickly demoted us to about 90 a day.

So please, do what can be done. With plenty of RAM and Android versions 5.1, 7.0 and 8.0, this is especially discouraging. I might expect this from slower, old devices.

V20 (elsa): 2 (40.0%)
LG Stylo 3 Plus (sf340n): 1 (20.0%)
Galaxy S8 (dreamqltesq): 1 (20.0%)
Galaxy J7 Prime (j7popeltetmo): 1 (20.0%)

Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 launchParam=MultiScreenLaunchParams { mDisplayId=0 mBaseDisplayId=0 mFlags=0 } (has extras) }

Apr 6, 12:32 PM on app version 273
LGE V20 (elsa), 4096MB RAM, Android 7.0

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x75d0d6a8 self=0xf1305400
  | sysTid=22014 nice=0 cgrp=default sched=0/0 handle=0xf413a534
  | state=S schedstat=( 0 0 0 ) utm=1969 stm=897 core=1 HZ=100
  | stack=0xff66f000-0xff671000 stackSize=8MB
  | held mutexes=
  at com.ansca.corona.Controller.stop (Controller.java:263)
  - waiting to lock <0x0abd65e7> (a com.ansca.corona.Controller) held by thread 11
  at com.ansca.corona.CoronaActivity.requestSuspendCoronaRuntime (CoronaActivity.java:2005)
  at com.ansca.corona.CoronaActivity.onPause (CoronaActivity.java:1828)
  at android.app.Activity.performPause (Activity.java:6894)
  at android.app.Instrumentation.callActivityOnPause (Instrumentation.java:1323)
  at android.app.ActivityThread.performPauseActivityIfNeeded (ActivityThread.java:3791)
  at android.app.ActivityThread.performPauseActivity (ActivityThread.java:3768)
  at android.app.ActivityThread.performPauseActivity (ActivityThread.java:3742)
  at android.app.ActivityThread.handlePauseActivity (ActivityThread.java:3716)
  at android.app.ActivityThread.-wrap16 (ActivityThread.java)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:1516)
  at android.os.Handler.dispatchMessage (Handler.java:102)
  at android.os.Looper.loop (Looper.java:154)
  at android.app.ActivityThread.main (ActivityThread.java:6247)
  at java.lang.reflect.Method.invoke! (Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run (ZygoteInit.java:872)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:762)

Hi Troy, I’ve asked Engineering to look at this as part of the Android Crashing project we are working on.

Thanks

Rob

@Rob, thank you.

Hey, troylyndon. Is this the full traceback? Usually there’s more.

held mutexes=

at com.ansca.corona.Controller.stop (Controller.java:263)

  • waiting to lock <0x0abd65e7> (a com.ansca.corona.Controller) held by thread 11

at com.ansca.corona.CoronaActivity.requestSuspendCoronaRuntime (CoronaActivity.java:2005)

Usually it takes more than one mutex to cause a deadlock.

Are you able to reproduce this crash? If yes, can you provide the surrounding logcat output and, in general, tell us what is going on?

Also, what plugins are you using? You can post your build.settings to answer that. If the live app is free, can you provide a link to it? You can also PM it to me.
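
On the deadlock question: an ANR does not strictly require two locks. A single monitor held by a slow thread is enough to stall the main thread, which would be consistent with the "waiting to lock ... held by thread 11" line, although the trace alone does not show what thread 11 was doing. Below is a minimal Java sketch of that situation. It is not taken from Corona's source: SingleLockStall and acquireWhileHeld are hypothetical names, and a ReentrantLock stands in for the Controller monitor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SingleLockStall {

    /**
     * Starts a background thread that holds one lock for holdMs, then reports
     * whether the calling thread managed to take the same lock within waitMs.
     */
    static boolean acquireWhileHeld(long holdMs, long waitMs) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();      // signal that the lock is now taken
                Thread.sleep(holdMs);  // simulate slow work done under the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.setDaemon(true);
        holder.start();

        try {
            held.await(); // make sure the holder really owns the lock first
            boolean acquired = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // One lock, no deadlock, yet the caller cannot proceed while the holder is busy.
        System.out.println("short wait, long hold: " + acquireWhileHeld(2000, 50));
        System.out.println("long wait, short hold: " + acquireWhileHeld(20, 2000));
    }
}
```

A synchronized block, unlike tryLock, has no timeout, so a caller in the position of Controller.stop would simply wait for as long as the other thread keeps the monitor.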

This is good news and I know a LOT of us would prefer current dev being put on hold for a bit to fix the Android issues we are all facing.  This really impacts our ranking and thus downloads and ultimately our bottom line.

One of my projects went from 1,000+ downloads a day to under 200 because of this. I dropped Vungle SDK 5 and rolled back to Corona build 3200, and after a week rankings are finally (albeit slowly) being restored.

Is it a Corona change or a Vungle change that caused the massive increases in ANRs/Crashes?  Personally, I have no idea.

I can only relay my findings… this has gotten much worse in the past 3 to 4 months (just check all the #metoo forum posts on this subject)

Why build 3200?  Well that was the last build I used on one of my games that doesn’t show this silly escalation in ANRs so that might be a starting point for your investigations.

(Updated: This issue RE: runtime error caused by widget.lua has been moved to https://forums.coronalabs.com/topic/72782-runtime-error-caused-by-widgetlua/)

@vlads, I have not been able to reproduce the crash at all.

@everyone

I found some issues with widget: when a listbox is removed from memory and widget.lua then attempts to clean up and remove it again, it crashes trying to remove an object that has already been removed. Personally, I like that during development Corona’s Lua gives us an error when we attempt to remove objects that don’t exist anymore. But these kinds of errors should not happen during runtime IMHO.

It is my suspicion that this is the same issue or it is other similar runtime errors which should not report during runtime, again IMHO.

Just wanted to chime in and say my experience is roughly the same. I’m down about 60%, and I really don’t have anything else to attribute it to other than the ANR-vs-SEO hit from vitals.


Though my #1 offender remains PackageStateChangedService and the crash of libopenal/libalmixer (accounting for ~80% of total ANRs for any sample period, on a wide range of Android 5.0 - 8.0 devices).

Same problem here with all the ANR crashes! Tests on Android 7.0 and 8.1 show that when the game is freshly installed and asks for permissions it then crashes. If you launch the game again it runs normally. Uninstalling the game, then re-installing will have the permissions pop up and right after it will crash again, so is it something to do with permissions?

@SGS I’m actually using build 3184 and never thought of upgrading yet.

@Rob and Corona support: Thank you for looking into this. I’m in the same situation; about 90% of my ANRs/crashes are related to that “packageStateChangedService” issue.

@Rob, any chance the team can eliminate the runtime error that occurs when widget.lua (OR ANY OTHER MODULE) attempts to remove an object that doesn’t exist, not during development, but on an actual device build? Gotta be just a line or two of code to check first and it will solve a number of ANRs I’ve gotten in the past year from widget.lua.

I just looked through the updates since build 3200 and found iOS 11.3 support, and that was enough for me to stop thinking about an old build, even though the ANRs are on Android. Until Corona informs us that build 3200 will fix the ANR issues, I’m sticking with the latest version for this week’s update.

Can you be more specific? Is there an error in widgets that’s causing this or is it a case that you need to test to make sure the widget exists before you try and remove it?

Rob

We will never go back and patch 3200. Any non-plugin fixes will happen in later daily builds.  Plugins are independent of builds (though they may have a minimum build number). If you need to support iOS 11.3 you should use at least the minimum build that supports that.  

Rob

If I have my code remove a scrollview object, once in a very blue moon I get a runtime error that the scrollview object cannot be removed within “widget.lua”. I have seen similar occasional runtime errors in other plugins, too.

But this is indicative of a larger problem in the Corona engine where object:remove() causes runtime errors when an object doesn’t exist. The only fix I have for this is to use display.remove, but widget.lua is outside of my codebase and it doesn’t use display.remove.

My point was that in device builds, Corona should NEVER CAUSE RUNTIME ERROR OR CRASH when…

  1. object:remove() attempts to remove an object already removed and…

  2. when a variable compare involves one value that is nil. If myVariable<>5 CRASHES when myVariable=nil, that’s absurd, too. If it’s not equal to 5 (even if it is nil), then it should return FALSE. If both compared values are nil, Corona should return TRUE.

These are just two examples of how Corona engine will CRASH on a RUNTIME ERROR when there really is no reason for Corona engine to be confused. During development and in the simulator, these errors help us to fix bugs. But on device builds, they cause CRASHING.

I’m sure there are other circumstances the Corona Team could add about how to resolve runtime errors on device builds, since that’s what this Forum post is about - CRASH and ANRs on Android (for example), but these same issues I’ve mentioned here apply to iOS runtime errors, too.

The Corona Team really should consider what other RUNTIME ERRORS can be reduced with a few lines of additional code in the engine - specifically on device builds.

WHY? Because if I suppress the runtime pop-ups on device builds, as described in the FAQ on the runtime error listener, the error information would not be sent to Google, unless we were to send it to our own servers. https://coronalabs.com/blog/2013/03/13/wednesday-faq-runtime-error-listener/

For this reason, we do not suppress runtime popups, so hopefully this post makes it clearer why the Corona Team should fix these types of issues - just try to have the engine not be confused when the coder’s intent is quite clear. Makes sense, right?

But instead, runtime errors cannot be suppressed (if we want good reporting to Google) and often cause a CRASH, sometimes in plugins, too.

So, here we are, IMHO.

@troylyndon: Sorry in advance for the following harshness but I cannot stand idly after reading your last post.

Although I agree that having an SDK that prevents every possible error from happening would be great, you can’t really expect Corona to add the features you suggest, because those errors are obviously caused by bad coding.

I have had loads of errors occur over the years because I tried removing objects that were not there or tried doing a number calculation with something that is nil. And I had to spend lots of effort and time going through my code to fix the issues. Issues which were of my own making due to sloppy coding.

Really, I have to say your thinking is completely ridiculous in expecting Corona to solve all the problems that clearly you are creating yourself.

Those 2 types of errors you just mentioned have nothing to do with the current ANR errors all the developers are facing.

On a side note, you can have Corona SDK pop up a specially designed window with the error code info/trace and let customers screen cap or email it to you so that you can see any bugs you have missed after release. It may sound silly, but most of my customers who come across bugs are happy to do that, as they then see a developer who actually cares about solving the bugs, instead of the app simply crashing and leaving the customer (and developer) with no idea why the bug happened.

@jacques1, first let me apologize - I never meant to offend you or anyone else. BUT PLEASE READ THE POST ACCURATELY. This has nothing to do with bad coding. Please read again and refrain from insults.

@everyone else: Firstly, I explained that my CRASH and ANRs are caused by “widget.lua”, a Corona plugin failing to recognize when a scrollView object has already been removed.

Secondly, I explained that I solved runtime issues relating to object:remove() by using display.remove(object).

In summary, the fact that “widget.lua” is outside of my ability to modify the code (without using Enterprise) is what’s at issue here, which as I stated, is indicative of a larger issue of runtime errors unnecessarily causing a CRASH or ANRs.

The fact that Corona discourages use of their runtime suppression feature, because it disables reporting to Google, makes the matter more of something the Corona Team should address. And my point was, while they are at it, they can fix a few other issues that cause runtime errors, too - because as is evident in widget.lua, Corona’s own plugins are susceptible to rare, almost impossible to duplicate, bugs too.

Of course, I could start using Enterprise and fix the bug myself in widget.lua (for my game only), but that will cause me more work every time I want to create a build. I’m willing to be patient, hoping that the Corona Team will fix it themselves.

Overall, the point is, the plugins are not perfect, either. Bugs that occur extremely rarely, in complex plugins with 0.5% ANR rate and which are very difficult or impossible to duplicate, are indicative of the larger problem I’ve tried to bring to everyone’s attention.

Why not look at plugging a limited number of runtime errors on device builds? Or make the engine a bit smarter when the coder’s intent is never unclear?

In my case, the author of widget.lua obviously never intended to use object:remove() if the object was already removed. But they did, instead of using display.remove(object). Making the engine smarter on device builds, IMHO, is the best resolution and it will likely be a strategy that can stand in the gap for less experienced coders, too.

And this would likely apply to the ‘controller’ module in Corona, too, where my latest runtime ANRs are occurring.