Excessive garbage collection pausing my app during scene changes on Android devices. What can I do?

I’ve gone through and changed most of my text objects to bitmap fonts using Text Candy. There is one huge disadvantage, though: Text Candy doesn’t support Unicode text, i.e. non-English letters such as á, ñ, Ç, etc. This appears to be due to a known limitation of Lua’s string library. The “workaround” for Text Candy, as described by the TC developer, uses lower-case letters as placeholders for non-supported characters. That won’t work in my case since I need to be able to use all characters. (I should also mention that the TC developer appears to no longer update or even provide support for the product he is still selling, so buyer beware.)

So, once again, I’m kind of stuck. I’d like to have localized, multi-language support for my game. I can do that with regular text objects and vector fonts, but then I get the long pauses between scene changes. Or I can use fast and fancy bitmap fonts, but then lose the ability to support multiple languages. This looks like a job for… Corona Labs! Support for fast bitmap fonts and Unicode seems like essential functionality for a game-centric mobile platform.

Perhaps we can convince the Text Candy developer to add unicode support to his library?

Because the real issue here is that the Text Candy library does not support UTF-8 encoded strings. That is, it assumes that all characters are 1-byte ASCII characters. With UTF-8 strings, 1 character can be 1 to 4 bytes long (the original UTF-8 design allowed up to 6 bytes, but the current standard, RFC 3629, stops at 4). This also means that the string length that you get in Lua (or even in C/C++) really represents the number of bytes in the string and not the number of characters. So, in code, you can detect that the next character in the string is a multibyte Unicode character if the most significant bit (i.e. the 8th bit) of its first byte is set to 1, in which case the number of leading 1 bits in that byte tells you how many bytes the character uses. If that bit is not set to 1, then it is a 1-byte ASCII character. If you ever get bored, you can read how UTF-8 encoded strings work here…

   http://en.wikipedia.org/wiki/UTF-8
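To make the byte-versus-character distinction concrete, here is a minimal Lua sketch. The helper name `utf8CharCount` is made up for illustration, and it assumes the input is valid UTF-8. It counts characters by skipping continuation bytes, i.e. bytes in the range 0x80 to 0xBF, which never start a character.

```lua
-- Count UTF-8 characters by counting every byte that is NOT a
-- continuation byte (continuation bytes are 0x80..0xBF).
-- Assumes s is valid UTF-8; the helper name is illustrative only.
local function utf8CharCount(s)
    local count = 0
    for i = 1, #s do
        local b = string.byte(s, i)
        if b < 0x80 or b >= 0xC0 then
            count = count + 1
        end
    end
    return count
end

local s = "ma\195\177ana"   -- "mañana"; ñ is the two bytes C3 B1
print(#s)                   -- 7 (bytes)
print(utf8CharCount(s))     -- 6 (characters)
```

The decimal escapes (`\195\177`) keep the example independent of the encoding of the source file.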

Alternatively, I suppose you could convert your UTF-8 strings to something that is compatible with Text Candy, for example with a function that replaces each Unicode character with a placeholder character, as suggested by the Text Candy developer.
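A rough sketch of that placeholder idea in Lua might look like the following. The mapping table, the placeholder letters, and the function name `toPlaceholders` are all hypothetical; the real choice depends on which glyphs the bitmap font assigns to which ASCII letters. It relies on `string.gsub` accepting a table as the replacement, so matches that are not in the table are left unchanged.

```lua
-- Hypothetical mapping: each unsupported character is swapped for a spare
-- ASCII letter that the bitmap font would draw as the accented glyph.
local placeholders = {
    ["\195\161"] = "a",   -- á (bytes C3 A1)
    ["\195\177"] = "n",   -- ñ (bytes C3 B1)
    ["\195\135"] = "c",   -- Ç (bytes C3 87)
}

-- Matches one multibyte UTF-8 character: a lead byte followed by its
-- continuation bytes. Assumes the input is valid UTF-8.
local MULTIBYTE_CHAR = "[\194-\244][\128-\191]*"

local function toPlaceholders(s, map)
    -- gsub looks each match up in map; a nil lookup keeps the original text.
    -- The outer parentheses drop gsub's second return value (the match count).
    return (s:gsub(MULTIBYTE_CHAR, map))
end

print(toPlaceholders("ma\195\177ana", placeholders))   -- manana
```

Because the scanning happens inside `string.gsub`, which is implemented in C, this should be cheaper than walking the string byte by byte in Lua, though it still shares the limitation that each placeholder letter is no longer available as itself.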

Hi Joshua, thanks for the reply.

I get the feeling the Text Candy developer is no longer supporting his product.  I’ve emailed him over this issue as well as a couple others, and never did get a response.   The TC website mentions it doesn’t support multibyte characters “yet”, implying that it’s intended to in the future, but it’s been a long time since the last TC update. There are other threads on here going back to last year where others are asking for this same feature with limited or no response from the developer.

I did do a bunch of research on UTF-8 and Corona/Lua over the past couple of days while trying to roll my own fix. There are even a few posts on here where people have posted various code snippets and functions that supposedly fix this issue. There are functions similar to string.len and string.sub that are UTF-8 capable, but I couldn’t quite get them all to work with TC (they also killed my framerate when I got them to partially work). I’m guessing the developer tried, failed as I did, gave up, then abandoned his product. Definitely a bummer, since TC is actually quite powerful in every other aspect.

I’m not sure what Corona Labs could do to help, but given how slow the UTF-8 aware string functions are when written in Lua, perhaps there are ways to speed them up by making them part of the API? I’m not sure that would even be enough for TC to work correctly, so perhaps a plugin or more comprehensive support for bitmap fonts is needed.

We have a plugin that allows you to do bit masking and bit manipulation.  It should be a lot faster since it is implemented in C/C++.  Have a look at the documentation for it here…
   http://docs.coronalabs.com/daily/plugin/bit/index.html
   http://bitop.luajit.org/api.html
 
In C/C++, you can determine the number of bytes one character takes like this…

    int sourceStringLength = (int)strlen(sourceString);
    for (int index = 0; index < sourceStringLength; index++) {
        char sourceCharacter = sourceString[index];
        if ((sourceCharacter & 0x80) && ((sourceString[index + 1] & 0xC0) == 0x80)) {
            // This is a multibyte unicode character.
            // Determine how many bytes it takes within the given string.
            int unicodeByteCount = 1;
            if ((sourceCharacter & 0xE0) == 0xC0) {
                // This is an 11-bit unicode character.
                unicodeByteCount = 2;
            } else if ((sourceCharacter & 0xF0) == 0xE0) {
                // This is a 16-bit unicode character.
                unicodeByteCount = 3;
            } else if ((sourceCharacter & 0xF8) == 0xF0) {
                // This is a 21-bit unicode character.
                unicodeByteCount = 4;
            } else if ((sourceCharacter & 0xFC) == 0xF8) {
                // This is a 26-bit unicode character (legacy 5-byte form).
                unicodeByteCount = 5;
            } else if ((sourceCharacter & 0xFE) == 0xFC) {
                // This is a 31-bit unicode character (legacy 6-byte form).
                unicodeByteCount = 6;
            }
            // Skip the continuation bytes so the next iteration
            // starts at the next character.
            index += unicodeByteCount - 1;
        } else {
            // This is a 1 byte ASCII character.
        }
    }

 
Note that I slapped the above code together in a hurry. You’ll have to convert it to Lua, but hopefully it helps you out.

Thanks, Joshua. That looks similar to this Lua function I found on another thread here:

    -- returns the number of bytes used by the UTF-8 character at byte i in s
    -- (only the lead byte is checked; continuation bytes are not validated)
    function utf8charbytes (s, i)
        -- argument defaults
        i = i or 1
        local c = string.byte(s, i)

        -- determine bytes needed for character, based on RFC 3629
        if c > 0 and c <= 127 then
            -- UTF8-1
            return 1
        elseif c >= 194 and c <= 223 then
            -- UTF8-2
            return 2
        elseif c >= 224 and c <= 239 then
            -- UTF8-3
            return 3
        elseif c >= 240 and c <= 244 then
            -- UTF8-4
            return 4
        end

        -- anything else (NUL, a continuation byte, or 192-193 / 245-255)
        -- cannot start a character
        error("invalid UTF-8 lead byte at position " .. i)
    end

And here are the utf8len and utf8sub functions, as well as a UTF-8 character replace function, also found in another thread here:

    -- returns the number of characters in a UTF-8 string
    function utf8len (s)
        local pos = 1
        local bytes = string.len(s)
        local len = 0

        while pos <= bytes do
            len = len + 1
            pos = pos + utf8charbytes(s, pos)
        end

        return len
    end

    -- functions identically to string.sub except that i and j are UTF-8 characters
    -- instead of bytes
    function utf8sub (s, i, j)
        j = j or -1

        if i == nil then
            return ""
        end

        local pos = 1
        local bytes = string.len(s)
        local len = 0

        -- only set l if i or j is negative
        local l = (i >= 0 and j >= 0) or utf8len(s)
        local startChar = (i >= 0) and i or l + i + 1
        local endChar = (j >= 0) and j or l + j + 1

        -- can't have start before end!
        if startChar > endChar then
            return ""
        end

        -- byte offsets to pass to string.sub
        local startByte, endByte = 1, bytes

        while pos <= bytes do
            len = len + 1

            if len == startChar then
                startByte = pos
            end

            pos = pos + utf8charbytes(s, pos)

            if len == endChar then
                endByte = pos - 1
                break
            end
        end

        return string.sub(s, startByte, endByte)
    end

    -- replace UTF-8 characters based on a mapping table
    function utf8replace (s, mapping)
        local pos = 1
        local bytes = string.len(s)
        local charbytes
        local newstr = ""

        while pos <= bytes do
            charbytes = utf8charbytes(s, pos)
            local c = string.sub(s, pos, pos + charbytes - 1)
            newstr = newstr .. (mapping[c] or c)
            pos = pos + charbytes
        end

        return newstr
    end

I tried replacing all the string.len and string.sub calls in the Text Candy library with calls to these functions instead, and it actually seemed to work somewhat. That is, it didn’t crash, but the non-ASCII characters in my bitmap font were still not being displayed, and I don’t understand the workings of the TC library well enough to know why. And as I mentioned, the functions killed my framerate, even on my Mac desktop. I’ll try to wrap my head around the bit plugin, but at first glance I’m not sure how it could be used to speed up these functions.
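One likely contributor to the framerate drop is that utf8sub rescans the string from the first byte on every call, so pulling out characters one at a time costs time proportional to the square of the string length and creates a new string each time. A possible way around that, sketched below under the assumption that the input is valid UTF-8, is to split the string into a table of characters once and index into the table afterwards. The names `UTF8_CHAR` and `utf8chars` are made up for this example, and whether Text Candy would then find the glyphs in its character set is a separate question this sketch does not answer.

```lua
-- Matches one UTF-8 character: an ASCII byte, or a lead byte followed by
-- its continuation bytes. NUL bytes and malformed bytes are simply skipped.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Split s into a table of characters in a single pass. The scanning is
-- done by string.gmatch, which runs in C rather than in a Lua loop.
local function utf8chars(s)
    local chars = {}
    for ch in s:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = ch
    end
    return chars
end

local chars = utf8chars("ni\195\177o \195\161")   -- "niño á"
print(#chars)                          -- 6, the character count
print(chars[3] == "\195\177")          -- true, the third character is ñ
print(table.concat(chars, "", 3, 4))   -- characters 3 to 4, i.e. "ño"
```

With the table in hand, `#chars` stands in for utf8len, `chars[n]` for a one-character utf8sub, and `table.concat(chars, "", i, j)` for a longer one. If the same strings are displayed repeatedly, caching the table per string would avoid splitting them again on every scene change.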