High ANR rate and poor discoverability

For many months now we have consistently seen high ANR rates on all our apps (above 1%, which exceeds the bad behaviour threshold by some distance).

It was recently announced that Google plans to raise the technical quality bar used to decide how discoverable apps are on the Play Store, and that it intends to use the user-perceived crash and ANR rates directly for this.

The vast majority of the ANRs are related to com.ansca.corona.Controller.stop, and many others appear on android.os.MessageQueue.nativePollOnce.

As for the nature of the problem, many cases are input dispatching timeouts, and there are multiple other issues with the execution of certain services that originate at these call sites.

We only use plugins for ads (Scott’s IronSource plugin), the latest Solar2D Notifications plugin and the new billing library.

I remember reading similar threads recently where people were exceeding the bad behaviour threshold on some of their apps. If this is something in the core engine, I was wondering whether it will be looked at any time soon. With Google planning to take these apps off its discoverability queues, the damage to developers would be substantial.

com.ansca.corona.Controller.stop logs:

  #00  pc 0x00000000004a67e4  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+140)
  #01  pc 0x00000000005b5704  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+372)
  #02  pc 0x00000000005d2ca4  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+924)
  #03  pc 0x00000000005ccb14  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+532)
  #04  pc 0x00000000005cbc94  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1876)
  #05  pc 0x00000000005cb150  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+792)
  #06  pc 0x00000000005764f4  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+196)
  #07  pc 0x000000000058bf7c  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1396)
  #08  pc 0x000000000058af14  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+348)
  #09  pc 0x00000000000b0bd8  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+64)
  #10  pc 0x00000000000505d0  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)
  at com.ansca.corona.Controller.stop (Controller.java)
  at com.ansca.corona.CoronaActivity.requestSuspendCoronaRuntime (CoronaActivity.java:2055)
  at com.ansca.corona.CoronaActivity.onPause (CoronaActivity.java:1878)
  at android.app.Activity.performPause (Activity.java:8174)
  at android.app.Instrumentation.callActivityOnPause (Instrumentation.java:1510)
  at android.app.ActivityThread.performPauseActivityIfNeeded (ActivityThread.java:4771)
  at android.app.ActivityThread.performPauseActivity (ActivityThread.java:4732)
  at android.app.ActivityThread.handlePauseActivity (ActivityThread.java:4683)
  at android.app.servertransaction.PauseActivityItem.execute (PauseActivityItem.java:46)
  at android.app.servertransaction.TransactionExecutor.executeLifecycleState (TransactionExecutor.java:176)
  at android.app.servertransaction.TransactionExecutor.execute (TransactionExecutor.java:97)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:2105)
  at android.os.Handler.dispatchMessage (Handler.java:106)
  at android.os.Looper.loop (Looper.java:223)
  at android.app.ActivityThread.main (ActivityThread.java:7703)
  at java.lang.reflect.Method.invoke (Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:612)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:997)
  at java.lang.Object.wait (Native method)
  at java.lang.Object.wait (Object.java:442)
  at java.lang.ref.ReferenceQueue.remove (ReferenceQueue.java:190)
  at java.lang.ref.ReferenceQueue.remove (ReferenceQueue.java:211)
  at java.lang.Daemons$FinalizerDaemon.runInternal (Daemons.java:273)
  at java.lang.Daemons$Daemon.run (Daemons.java:139)
  at java.lang.Thread.run (Thread.java:923)

Around 50% of the ANRs for android.os.MessageQueue.nativePollOnce occur when the app is in the background, and the logs for these are not loading. I’ll update the post when I’m able to fetch more details on this one.

Recently I have been using Bugsnag for ANRs. I don’t have any solutions at the moment, but the cool thing about Bugsnag is that when an ANR occurs, it tells you how you got there.

So at least for me this is what happens for this error on CoronaActivity:

Input dispatching timed out (Waiting to send non-key event because the touched window has not finished proce

  1. The player opens the game 21 hours before the error:

  2. The player puts the game in the background at some point, then opens and closes the phone twice over the 20 hours following the initial launch:

  3. The player then decides to play my game, and there is an ANR. Notice that it is a resume and there is no Bugsnag start, so the game was still running.

Thanks for posting your findings. My analysis of the stack trace from the dev console also points to something going wrong in the app’s pause/resume/start phases.

Another one of our apps has ANRs in over 4% of daily sessions. That app calculates the time since the user last opened it during the “applicationResume” phase in order to reward them for returning, so the problem does appear to be somewhere in the starting and resuming of the apps.
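
As a rough illustration of the kind of resume-time check described above, here is a minimal sketch that keeps the “applicationResume” handler light: it only computes the elapsed time and defers the reward logic to a timer. This is not a confirmed fix for the ANRs discussed in this thread. The names `elapsedSince`, `grantReturnReward` and `lastSuspendTime` are hypothetical, and a real app would probably also save the timestamp to disk so that it survives the process being killed.

```lua
-- Sketch only: elapsedSince, grantReturnReward and lastSuspendTime are
-- hypothetical names, not part of Solar2D.
local lastSuspendTime = os.time()

-- Seconds elapsed between two os.time() stamps, never negative
-- (the device clock can be set backwards while the app is suspended).
local function elapsedSince(earlier, now)
    local delta = os.difftime(now, earlier)
    if delta < 0 then
        return 0
    end
    return delta
end

local function grantReturnReward(secondsAway)
    -- Placeholder for the app's own reward logic.
    print("Player was away for " .. secondsAway .. " seconds")
end

local function onSystemEvent(event)
    if event.type == "applicationSuspend" then
        -- Do as little as possible here: just remember when we were paused.
        lastSuspendTime = os.time()
    elseif event.type == "applicationResume" then
        local secondsAway = elapsedSince(lastSuspendTime, os.time())
        -- Defer the heavier work so the resume handler itself returns quickly.
        timer.performWithDelay(1, function()
            grantReturnReward(secondsAway)
        end)
    end
end

-- Runtime and timer exist only inside Solar2D; the guard keeps the helper
-- loadable in a plain Lua interpreter for testing.
if Runtime then
    Runtime:addEventListener("system", onSystemEvent)
end
```

The elapsed-time helper is kept separate from the event handler so it can be checked on its own, without a device.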

@vlads just tagging you in to see if there’s any plan to look into this anytime soon. Looking through the forum, this seems to have been a persistent problem for a few years, and if it makes Google stop the apps from being organically discoverable, it will be a major setback.

This would make sense given how many hours Solar2D devs have cumulatively spent on testing their apps and yet no one seems to have been able to reproduce these ANR issues.

I strongly recommend using a crash reporting service like Firebase Crashlytics, especially if you’re above the bad behaviour threshold. It generally has more info than the console.
In my experience most crashes occur on exit/re-enter, for multiple reasons.

In my humble opinion, BugSnag is superior to Crashlytics and it is also “free.”

Has BugSnag helped you find anything in your own code that helped to reduce ANRs? The ANR reports very commonly name the Solar2D function Controller.stop() as the cause, but I’ve long suspected that this could be misleading.
If a bug in our own code (or in a plugin) caused the app to hang or crash and the Solar2D activity to terminate, it may be common for it to get stuck in Controller.stop() even though that isn’t the root cause. If BugSnag lets us follow the stack trace further back to the true cause, that would be extremely useful.

Is the plugin production ready? Is there any documentation on the best way of using it to actually get something meaningful? (Google stack traces don’t really help.)

I’ve just released an update and my ANR rate has doubled. The only changes were some extra assets and moving to the new Google billing library.

I have noticed 2 ANRs that I can resolve within my app. One comes from AdColony and the other from Tapjoy. I plan to remove those two ad providers for now, or to upgrade to a newer version than the one IronSource currently supports, if one is available.

Sorry Alan, I don’t follow what you are saying. The difference between Bugsnag and the other ANR tools is that Bugsnag provides a breadcrumb trail of exactly what happened up to the point of the ANR. It also provides the stack trace, but I’m not sure if it goes further back.

You can read the Bugsnag documentation; it is fairly thorough. I have been using the plugin for a while with no issues. The plugin only implements three methods: init, add a custom breadcrumb, and force an error for testing. Usage of those three methods is documented on the Solar2D GitHub.

Implementation was very simple. Now to add breadcrumbs everywhere and see what happens.

The “what happened before” trace seems very useful.
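
To make the breadcrumb idea concrete, here is a minimal sketch of a breadcrumb trail in plain Lua: a small buffer that keeps only the most recent entries and can forward each one to a crash reporter. `Breadcrumbs`, `leave` and `sink` are hypothetical names invented for this illustration; they are not the Bugsnag plugin’s API, whose function names are documented on GitHub as mentioned above.

```lua
-- Sketch only: Breadcrumbs, leave and sink are hypothetical names,
-- not part of the Bugsnag plugin.
local Breadcrumbs = {}
Breadcrumbs.__index = Breadcrumbs

-- limit: how many recent entries to keep.
-- sink:  optional function(message) where a real crash reporter call would go.
function Breadcrumbs.new(limit, sink)
    return setmetatable({ limit = limit, sink = sink, entries = {} }, Breadcrumbs)
end

function Breadcrumbs:leave(message)
    local entries = self.entries
    entries[#entries + 1] = message
    -- Drop the oldest entry once the buffer is over its limit.
    if #entries > self.limit then
        table.remove(entries, 1)
    end
    if self.sink then
        self.sink(message)
    end
end

local crumbs = Breadcrumbs.new(3)
crumbs:leave("app start")
crumbs:leave("ad requested")
crumbs:leave("ad closed")
crumbs:leave("applicationResume")

-- Inside Solar2D, system events (suspend, resume and so on) can be recorded
-- automatically; the guard keeps this file loadable in plain Lua.
if Runtime then
    Runtime:addEventListener("system", function(event)
        crumbs:leave("system: " .. tostring(event.type))
    end)
end
```

Recording the system events this way would show whether an ANR was preceded by a suspend, a resume or an ad closing, which is the kind of sequence the reports in this thread rely on.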

Hey @agramonte, this looks awesome. I’m looking at implementing it and reading the doc for the plugin. Where do we place the API key they generate per app in our Solar2D project?
Also, when creating the project in Bugsnag, which project type did you choose (Other, Android, etc.)?

Many thanks in advance!

Android Project

Under build settings:

-- build.settings (inside the settings table)
android =
{
    applicationChildElements =
    {
        [[
            <meta-data android:name="com.bugsnag.android.API_KEY"
                android:value="<your API key>"/>
        ]],
    },
},

Hi folks,
I have added Bugsnag to one of my projects.
I like it so much so far. I only had some issues finding the plug-in doc on GitHub, so here’s the link if anyone needs it:

Anyway,
The first error event I received is this one.
As you can see, it occurred on a resume after an ad closed (a Tapjoy rewarded ad, served via IronSource).

Maybe @vlads and @Scott_Harrison can take a look into it?

And this ANR happened during ads plugin init.

Yeah, I will take a look. When did you build and get these errors? I recently made an update to my IronSource plugin.

The first release was on December 3rd, and since then I have released an update roughly every 2 days, trying to figure out what’s going on.
(Way over the bad behaviour threshold.)

I am about to release my app with this integrated. I’ll add my results here too but I imagine it will also be plugin related.

And this ANR may have occurred when resuming from the ATT dialogue (I don’t use the ATT plugin, but IronSource has one integrated).

I’ll stop sending these messages here to not spam, but if you need more, just ask.

ATT plugin would only be on iOS though.

The “reason = recentapps” part makes me think that in this case the player put your app into the background by opening the “recent apps” carousel, which triggered the CoronaActivity.onPause() function and led to a crash.
