Excessive increase in crash rate since Build 3692

These are the most common and most relevant crashes my apps are having with Build 3699.
I have already tried everything I can think of and have not been able to find a solution.

android.view.ViewGroup.offsetRectBetweenParentAndChild
java.lang.IllegalArgumentException

[base.apk] com.ansca.corona.storage.FileServices.copyFile
SIGBUS

[libc.so] __memcpy
SIGBUS

[split_config.arm64_v8a.apk!libalmixer.so] ALmixer_PlayChannelTimed
SIGSEGV

android.media.MediaPlayer.native_finalize
java.util.concurrent.TimeoutException

[split_config.arm64_v8a.apk!libcorona.so]
SIGSEGV

[split_config.arm64_v8a.apk!liblua.so] lua_pcall
SIGSEGV

[base.apk] com.ansca.corona.NativeToJavaBridge.loadBitmap
SIGBUS

Does anyone have any idea how to solve any of these crashes?

Please include the PC address of each frame so we can use addr2line to map it back to the source code.
It looks like this:

#00  pc 0x00000000003889e0 ... libcorona.so

BTW, can you provide the PC address for the memcpy-related crash?
I’m not sure whether 0f62fee fixed this.

Thanks.

Here are the details:

backtrace:
  #00  pc 0x000000000004b884  /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+276)
  #01  pc 0x00000000000386a8  /system/lib64/libandroidfw.so (android::_FileAsset::read(void*, unsigned long)+236)
  #02  pc 0x000000000015ce18  /system/lib64/libandroid_runtime.so (android::NativeAssetRead(_JNIEnv*, _jclass*, long, _jbyteArray*, int, int)+152)
  #03  pc 0x000000000036bd8c  /data/misc/apexdata/com.android.art/dalvik-cache/arm64/boot.oat (art_jni_trampoline+140)
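For anyone collecting these traces: once a frame includes its PC, the address can be handed to addr2line. The sketch below is hypothetical (not part of Solar2D or the NDK). It assumes frames follow the tombstone format of the backtrace above and that an unstripped copy of each library, with the same file name, sits in the current directory. It turns each frame into a command using the standard addr2line flags -C (demangle), -f (print function names) and -e (the binary to read).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns tombstone frames into addr2line command lines. */
public class FrameToAddr2line {
    // Matches frames such as "#00  pc 0x000000000004b884  /apex/.../libc.so (__memcpy+276)".
    // Group 1: frame index, group 2: PC in hex (without 0x), group 3: library path.
    private static final Pattern FRAME =
            Pattern.compile("#(\\d+)\\s+pc\\s+0x([0-9a-fA-F]+)\\s+(\\S+)");

    public static List<String> toCommands(List<String> backtrace) {
        List<String> commands = new ArrayList<>();
        for (String line : backtrace) {
            Matcher m = FRAME.matcher(line);
            if (!m.find()) {
                continue; // skip lines that are not frames
            }
            // Drop leading zeros so the address reads like 0x4b884.
            String pc = Long.toHexString(Long.parseUnsignedLong(m.group(2), 16));
            String path = m.group(3);
            // Keep only the library name; handles both ".../libc.so" and "...apk!libcorona.so".
            int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('!'));
            String lib = path.substring(cut + 1);
            // Assumes an unstripped copy of lib is in the current directory.
            commands.add("addr2line -C -f -e " + lib + " 0x" + pc);
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "  #00  pc 0x000000000004b884  /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+276)",
                "  #01  pc 0x00000000000386a8  /system/lib64/libandroidfw.so (android::_FileAsset::read(void*, unsigned long)+236)");
        toCommands(trace).forEach(System.out::println);
    }
}
```

Running main prints addr2line -C -f -e libc.so 0x4b884 and addr2line -C -f -e libandroidfw.so 0x386a8. Keep in mind that addresses only resolve correctly against the exact library build that crashed.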

Following this now too. We’ve seen a slight increase in crash rate, but mostly our ANRs have been out of control for the last 6-12 months.

Is Solar no longer a viable option for Android builds? Can’t believe this isn’t a high priority.

It is a high priority, but that doesn’t mean there is a clear solution. Having reproducible cases would help a lot.


Maybe CoronaRuntime needs a lifecycle request queue to solve problems like Controller.stop causing an ANR when Lua code (on the GLThread) is doing heavy work and an Android lifecycle callback is triggered (in CoronaActivity, on the main thread).
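A lifecycle request queue along those lines could look roughly like this. It is a minimal sketch, not CoronaRuntime code: LifecycleQueue, Request and drain() are hypothetical names, and it assumes the GL thread calls drain() once per frame, after the Lua work for the previous frame has finished. The main thread only enqueues and returns, so an Activity callback never waits on a busy GL thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical queue: lifecycle requests posted by the main thread, applied on the GL thread. */
public class LifecycleQueue {
    public enum Request { START, PAUSE, RESUME, STOP }

    // Lock-free queue, so posting never blocks the caller.
    private final Queue<Request> pending = new ConcurrentLinkedQueue<>();

    /** Called on the main thread (e.g. from an Activity callback); returns immediately. */
    public void post(Request request) {
        pending.add(request);
    }

    /** Called on the GL thread once per frame; returns the requests in posting order. */
    public List<Request> drain() {
        List<Request> handled = new ArrayList<>();
        Request r;
        while ((r = pending.poll()) != null) {
            handled.add(r); // a real runtime would dispatch r to its start/stop logic here
        }
        return handled;
    }

    public static void main(String[] args) {
        LifecycleQueue queue = new LifecycleQueue();
        queue.post(Request.PAUSE); // main thread: does not wait for the GL thread
        queue.post(Request.STOP);
        System.out.println(queue.drain()); // GL thread: prints [PAUSE, STOP]
    }
}
```

The trade-off is that a request only takes effect when the GL thread next gets to it; while that thread is stuck, requests wait in the queue instead of holding up the main thread.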

Sorry that I can’t help this time; it may be related to Android itself.

This suggestion seems to fit with feedback we had from Google about our ANRs. They said that heavy work on the GL thread (they suspected it could be garbage collection) seemed to be the root cause. Having looked at the source, I found that controller.stop() does make requests to the GL thread; there’s a comment in stop() that says:

    // If we don't do this then there won't be one last onDrawFrame call which means the runtime won't be stopped!
    requestEventRender();

My theory is that this sequence of events happens:

  • The rendering has frozen for some reason.
  • The user tries to close the app.
  • The app goes through its shutdown process, which includes calling the stop function.
  • The stop function has a line that tries to render one more frame, but the GL thread is stuck, so it cannot do this.

I don’t have any ideas on how to resolve this though.
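The suspected sequence can be reproduced in a simplified model. This sketch assumes stop() ends up waiting for that last frame (the real code may instead block on a lock), and a single-thread executor stands in for the GL thread; the class and method names are hypothetical. It shows that the wait lasts as long as whatever the GL thread is already doing, and that a bounded wait at least lets the caller give up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of stop() requesting one last frame from a busy GL thread. */
public class StuckGlThreadDemo {
    /** Returns true if the last frame ran within waitMillis while the GL thread was busy for busyMillis. */
    public static boolean lastFrameRendered(long busyMillis, long waitMillis) throws InterruptedException {
        ExecutorService glThread = Executors.newSingleThreadExecutor(); // stand-in for the GL thread
        CountDownLatch lastFrame = new CountDownLatch(1);
        try {
            // Heavy Lua or GC work already occupying the GL thread.
            glThread.execute(() -> sleepQuietly(busyMillis));
            // stop(): queue one last frame, as the requestEventRender() comment describes.
            glThread.execute(lastFrame::countDown);
            // Bounded wait; an unbounded await() on the main thread is what would become an ANR.
            return lastFrame.await(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            glThread.shutdownNow(); // interrupts the busy task and drops the unrun frame
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(lastFrameRendered(10, 1000));  // GL thread frees up quickly
        System.out.println(lastFrameRendered(2000, 100)); // GL thread stuck longer than the wait
    }
}
```

Running main should print true, then false after roughly 100 ms. Android reports an ANR when the main thread stays blocked for too long, so any unbounded wait on the GL thread during shutdown is a candidate.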


I can try to make a build that does just that: it doesn’t do anything on the GL thread right away, but schedules the update to the GL thread instead. We will lose some events on the Lua side, but we would anyway, since the app would otherwise be killed by the ANR. I will update this post with the links.

Here is the build: Daily Build · coronalabs/corona@6327674 · GitHub (scroll down to the artifacts; you want the one whose name starts with Simulator-)


Controller.start() and the other methods that use synchronized blocks or mutexes also need to be scheduled.
I’ll collect the demos that reproduce the issue, along with prototypes, and send them to you or submit a PR.
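Why synchronized methods matter here: if the GL thread holds a monitor for a whole heavy frame, any lifecycle method that enters the same monitor on the main thread waits for that frame to finish. A minimal model, assuming a single lock shared by both sides; the lock, class and method names are hypothetical and not taken from Controller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a main-thread start() contending with a GL-thread frame. */
public class LockContentionDemo {
    private final Object runtimeLock = new Object(); // assumed lock shared by both threads

    /** GL thread side: holds the monitor for the duration of a heavy frame. */
    public void heavyFrame(long millis, CountDownLatch lockHeld) {
        synchronized (runtimeLock) {
            lockHeld.countDown(); // signal that the monitor is now owned
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Main thread side: returns how many milliseconds start() waited to enter the monitor. */
    public long timedStart() {
        long begin = System.nanoTime();
        synchronized (runtimeLock) {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        }
    }

    /** Runs a heavy frame on a separate thread and measures the main-thread wait. */
    public static long measureStartWait(long heavyMillis) throws InterruptedException {
        LockContentionDemo demo = new LockContentionDemo();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread gl = new Thread(() -> demo.heavyFrame(heavyMillis, lockHeld), "GLThread");
        gl.start();
        lockHeld.await(); // make sure the GL thread owns the monitor first
        long waited = demo.timedStart();
        gl.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("start() waited about " + measureStartWait(300) + " ms");
    }
}
```

This should report a wait of roughly 300 ms. Scaled up to a multi-second Lua or GC stall, the same pattern would block the main thread long enough to trigger an ANR, which is why routing start() through the scheduler as well is attractive.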

I’m pretty sure the scheduler is a queue by itself. You are right, though; this is a proof of concept.

Thanks for this, Vlad. I’ll roll out a new build to a small percentage of players later today and let you know after a few days whether it makes a difference.


Not sure how reliable this is, but here is another idea for tracking crashes: use the Flurry Analytics plugin with the option ‘crashReportingEnabled = true’. If you instrument the app with many different events, such as app resume, showing ads, etc., you may be able to see some kind of pattern. I tried it myself, but I haven’t delved very deeply into it yet and haven’t seen much. Maybe someone more experienced can try this.


Unfortunately, it doesn’t look like this change had any impact.
The ANR rate is unchanged, and my ANR traces still show the app getting stuck in controller.stop().

That is, of course, assuming it actually built with the test version. After downloading that build, I clicked the Setup Native button; there was nothing else I needed to do for the command-line builds to use that version, right?
A month or so ago I tried tweaking the source code and used the Gradle function to send those changes to the Simulator, but as far as I know, downloading a new version and clicking Setup Native would override that.

It seems we have an actual solution on the way. I will make a build later today.


Amazing! Can’t wait to hear more about this. Thanks Vlad!

I have posted the demo and my test results in #643.

I also tested with this Daily Build (2100.9999) on macOS, building for Android, and it reproduced the issue as well.

Exciting to hear! Any chance we can get this today? I’d love to have a new version out for the Black Friday downloads that are about to happen.