Simulator crashing/memory leak with display objects

I am creating thousands of display objects (specifically, shape objects) and then capturing the resulting screen rendering using display.captureScreen(). I do this so that the game engine doesn’t have to keep re-drawing the thousands of shape objects over and over again with each update.

Immediately after calling display.captureScreen(), I remove all of the shape objects from the scene and nil their references.

When I repeat this process, the memory used by the simulator (according to Windows Task Manager) increases by a few hundred MB each time. Eventually, more than 1 GB is used and the simulator crashes with the console output: The stdin connection has closed.

What is causing this? Why is so much memory being used?

I remove all display objects directly after display.captureScreen(), so what’s the problem here?

When you say “repeat this process”, do you mean on the same run of the program, say the next frame or after a timeout? Or after a relaunch? (I know of a bug with the latter very likely to exhibit this problem. I’ve mentioned it to @vlads but not submitted a PR yet.)

“Why is so much memory being used?” Display objects get boiled down to a big soup of geometry and this is realized in a big buffer. Two actually, as there’s a couple of them that get ping-ponged, with one backing the data for OpenGL to use. The bug mentioned above comes down to there being some slight misordering of cleanup operations; I’m wondering if something similar might be going on.

Does it change at all (other than failing more slowly :smile:) if you stagger it a bit, say a capture then clean the stuff up in a couple frames, then wait again before another capture?
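To make that concrete, here is a rough sketch of what I mean by staggering. The helper name staggeredCapture and its frame-count parameters are made up for illustration; only display.captureScreen() and the Runtime "enterFrame" listener are actual API, and I haven't tested this against your project.

```lua
-- Hypothetical helper: capture now, remove the shapes a few frames later,
-- then wait a few more frames before signalling that another cycle may start.
local function staggeredCapture(group, cleanupFrames, restFrames, onDone)
  local capture = display.captureScreen()
  local frame = 0

  local function onFrame()
    frame = frame + 1
    if frame == cleanupFrames then
      -- Clean up the shapes only after a couple of frames have rendered
      group:removeSelf()
    elseif frame == cleanupFrames + restFrames then
      -- Rest period is over: stop listening and hand back the capture
      Runtime:removeEventListener("enterFrame", onFrame)
      if onDone then onDone(capture) end
    end
  end

  Runtime:addEventListener("enterFrame", onFrame)
  return capture
end
```

You would put all the shapes in one group, call something like staggeredCapture(shapeGroup, 2, 30, startNextCycle), and only build the next batch of shapes inside the callback.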

The process is repeated multiple times within the same simulator instance (no relaunching in between).

If I reduce the number of shape objects, each cycle adds less memory to the simulator (roughly 100 MB-150 MB instead of about 200 MB-350 MB), and the memory usage also goes down occasionally (if I wait about 10 seconds, or sometimes when I do another cycle). With fewer shape objects, these drops keep the simulator from crashing.

The memory usage does not go down at all when using an increased number of shape objects as mentioned in the OP.

“and also the memory usage goes down occasionally” Any better / worse if you follow the removals with a collectgarbage()?
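If it helps to compare, something like the following could be called right after the removals. This is only a sketch: logMemory is a made-up helper name, and it relies on the standard collectgarbage("count") plus Solar2D's system.getInfo("textureMemoryUsed").

```lua
-- Hypothetical helper: force a full collection, then report the Lua heap
-- size (in KB) and the texture memory in use (converted from bytes to MB).
local function logMemory(label)
  collectgarbage("collect")
  local luaKB = collectgarbage("count")
  local textureMB = system.getInfo("textureMemoryUsed") / (1024 * 1024)
  print(("%s: Lua %.1f KB, textures %.2f MB"):format(label, luaKB, textureMB))
  return luaKB, textureMB
end
```

As far as I know, neither figure covers the engine's native-side geometry buffers, so Task Manager may keep climbing even if both of these stay flat.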

Could this bug you’re mentioning lead to valid textures becoming invalid between relaunches? Some textures, usually the biggest in the scene, just turn into black rectangles in my case, and this is usually accompanied by warnings along the lines of:

11:11:11.111 WARNING: D:\pathToScriptCreatingTheImage\scriptCreatingTheImage.lua:111: file ‘pathToImage/image.png’ does not contain a valid image

I’ve only seen such behavior with texture memory being maxed before, but that’s not the case in this scenario.

@Gil44liG It’s possible. It might be that the memory allocator has gotten starved of free contiguous blocks that it can allocate for the larger texture, after loading it from file but before uploading it to VRAM.

This is the part that needs addressing: https://github.com/coronalabs/corona/blob/master/librtt/Renderer/Rtt_Renderer.cpp#L153

The cleanup of the geometry pool happens too late: it enqueues the pool for removal, but as you see, the resources get cleaned up just before… and if I remember right, the record of it even gets wiped when everything is reset.

I originally discovered this since the Vulkan backend I was working on was very vocal about this not being cleaned up. :smile:

I have a local fix, just a rearrangement:

    Rtt_DELETE( fGeometryPool ); // <- CHANGE: moved up, ahead of the queued-resource cleanup

    DestroyQueuedGPUResources();

    Rtt_DELETE( fBackCommandBuffer );
    Rtt_DELETE( fFrontCommandBuffer );
    // Rtt_DELETE( fGeometryPool ); <- CHANGE: no longer deleted here

You can actually see my “CREATE” / “DESTROY” tracking for this in this video over on Discord. These are geometry resources being created and destroyed, respectively. Without the fix the "DESTROY"s only show up for indexed mesh geometry.

I can see about doing a PR later tonight.

Ooh, nice, thank you very much for the info!

Not sure how soon I’ll get enough time to try it out, but if the fix doesn’t make it into the main repo by then, I guess I’ll just have to try building the engine for the first time, haha.

I don’t know much about how rendering works, but since I started using the sim on my current Mac Mini (2019 version which was bought right after they were available) I’ve had problems when using captureScreen() in the sim. Often the captured image is corrupted, and sometimes the entire Mac freezes, requiring a hard reset.

Do you think the problem your suggested change addresses could be related to that behaviour?

@Gil44liG Sounds good. My fork is actually a bit of a mess right now, so I might be a while submitting this myself. If you do try it, feel free to do the PR instead. (Test code changes are pretty simple; let me know if you want them.)

@alanFlickGames I don’t think it would be specifically related. Thought experiments are failing me. :smiley:

A while ago, somebody mentioned to me experiencing capture-related issues on Windows, and wanting to give the aforementioned Vulkan backend a go. That ended up not having the problems. We suspected glReadPixels() being the culprit, but haven’t pursued it.

Maybe something like this is at fault. On Windows, for instance, external texture widths are finicky if you’re using the RGB format; I believe this needs to be generalized. (I wrote some workarounds in this regard for the Bytemap plugin, largely from guesswork since Solar wasn’t yet open-source, but since “it works” now, never got around to submitting a proper fix.)

I haven’t run into anything similar on Mac–I use captures rather sparingly, though–but maybe something has changed more recently, say with M1.

Welcome to my world… been struggling with this for ages.

Sim can’t save a screenshot (memory alloc issues; I can’t remember the exact code, but it was on creating a new bitmap), and the same thing also happens randomly on device.

Almost at the point of giving up!

“Any better / worse if you follow the removals with a collectgarbage()?”

Calling collectgarbage() after removing the shape objects doesn’t do anything.

Also, sometimes before the simulator crashes, if I continue doing cycles, the texture becomes corrupt or something. It gets filled with miscolored pixels and such.

“Also, sometimes before the simulator crashes, if I continue doing cycles, the texture becomes corrupt or something. It gets filled with miscolored pixels and such.”

I’ve been reviewing some of the display.capture() et al. crashes, in the hopes of trying to fix something, and I think this might help me recreate some problems others had mentioned.

Unfortunately, probably won’t help in your case! :smiley:

That said, I think I have a plausible situation, in part expanding on some of my comments above.

Display objects and the underlying resources (the “front end” for geometry, textures, etc. but also their GPU-side counterparts) aren’t evicted immediately on removal, but go into an “orphanage” and are periodically removed in bulk. There’s a Collect() function here that does this. As you can see, it staggers the CPU-side resources every four frames (if a GPU resource was needed for that, it will then be removed before the next render); display objects every 32 frames.
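To illustrate why removal doesn't free anything right away, here is a toy model of that staggering in plain Lua. It is not the engine's code: newOrphanage and its fields are invented for the example, and only the 32-frame interval comes from the description above.

```lua
-- Toy model only, NOT the engine's code. Orphaned items pile up until a
-- frame whose number is a multiple of the collection interval comes around.
local function newOrphanage(interval)
  local self = { interval = interval, pending = 0, frame = 0 }

  -- Called when objects are removed: they are only queued, not freed.
  function self.orphan(n)
    self.pending = self.pending + n
  end

  -- Called once per rendered frame: frees the queue every `interval` frames.
  function self.collect()
    self.frame = self.frame + 1
    if self.frame % self.interval == 0 then
      self.pending = 0
    end
  end

  return self
end

-- Display objects are collected every 32 frames in the description above.
local objects = newOrphanage(32)

-- A tight loop removes 5000 objects per iteration but never renders a frame,
-- so collect() never runs and the backlog only grows.
for _ = 1, 10 do
  objects.orphan(5000)
end
print(objects.pending) --> 50000

-- Once frames start ticking, the backlog clears on the 32nd frame.
for _ = 1, 32 do
  objects.collect()
end
print(objects.pending) --> 0
```

The real thing tracks actual resources rather than a counter, of course, but the shape of the problem is the same: nothing is released until the collection step actually gets called.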

The normal scene renderer calls this each frame. The one used by captures does not. (I’m also wondering if that shouldn’t Flush(); it worked without it on a Windows test and I could see where that might have issues. But that’s a tangent.)

So maybe there should be a Collect() here.

That said, by the sounds of it, the 4 and 32 frame counts probably aren’t aggressive enough to keep up with your output. 32 frames of thousands of objects comes to a lot. :smiley: Maybe those numbers could be tuned somehow, say with display.setDefault() and some powers-of-2.

Anyhow, that’s my updated guess.

Okay, I cobbled together a test:

local random = math.random
local cw, ch = display.contentWidth, display.contentHeight

local count = 100

for i = 1, count do
  print("I", i)
  local g = display.newGroup()

  for _ = 1, 5000 do
    local c = display.newCircle(g, random(100, cw - 100), random(100, ch - 100), random(10, 25))
    c:setFillColor(random(), random(), random())
  end

  display.save(display.getCurrentStage(), ("F%i.png"):format(i), system.DocumentsDirectory)

  g:removeSelf()
end

This seems to agree with my last post. As a timer.performWithDelay() with count iterations, there was no problem. With the loop as above, and without a Collect() in the capture logic, it would die around iteration 37 or so. Adding that, it actually holds up, though it goes slightly over 1GB before dropping down (probably just in the nick of time, given the other crashes). I tried changing the 32 frames to 8 in Collect() and it seems to not quite hit 600 MB before dropping back.
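For reference, the timer variant looks roughly like the sketch below. The runStaggered wrapper and the delay value you pass in are my own choices for this write-up; the body of each cycle is the same as in the loop version.

```lua
local random = math.random

-- Hypothetical wrapper: runs `count` capture cycles, one per timer tick,
-- so normal frames (and their per-frame cleanup) happen between cycles.
local function runStaggered(count, delayMs)
  local cw, ch = display.contentWidth, display.contentHeight
  local i = 0

  local function cycle()
    i = i + 1
    local g = display.newGroup()

    -- Same workload as the loop version: 5000 random circles per cycle
    for _ = 1, 5000 do
      local c = display.newCircle(g, random(100, cw - 100), random(100, ch - 100), random(10, 25))
      c:setFillColor(random(), random(), random())
    end

    display.save(display.getCurrentStage(), ("F%i.png"):format(i), system.DocumentsDirectory)

    g:removeSelf()
  end

  -- The third argument to timer.performWithDelay() is the iteration count
  return timer.performWithDelay(delayMs, cycle, count)
end
```

Calling runStaggered(100, 100) would run the 100 cycles about 100 ms apart; that delay is just a guess at something long enough for a few frames to render in between.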

The proximate cause of the crash seems to be failing to allocate the buffer that will receive the captured pixels; in Debug it will assert() the result and fail right there. So a release version is just failing due to that at some later point; it never did give me any distortion, for what it’s worth.

I might see about adding that setDefault() thing and submitting this down the road. Probably will still be tinkering for a bit, though. :smile: