indexes in tables - speed differences between table[1] and table["abc_123"]?

Dear Corona-community,

quick question: In a table

local table = {}

Docs say that I can use whatever key I want. My question:

Is there, speed- or memorywise, a difference between

table[1] = “test”

and

table[“abc_123”] = “test”

So, would “test” get printed out faster when indexed with an integer rather than a complex string?

Thanks :slight_smile:

Yes and No.

Yes, Numerically indexed tables are faster to iterate, but you won’t notice it for reasonably sized tables.

No, your last statement is wrong.  Both will print equally fast since printing is gated by the frame time which is 16.67 or 33.33 ms while index lookup is many orders of magnitude faster than that.

PS - Another thing to consider is that the work you do ‘in the loop’ is going to dominate the loop.

i.e. This

for k,v in pairs( enemies ) do v.x = v.x + 10 end

Is many 100’s of times slower than this:

for k,v in pairs( enemies ) do -- empty, no work end

While this is almost the same as the second (assuming 1000 or so entries):

for i = 1, #enemies do -- empty, no work end

At the end of the day, the indexing mechanism you choose should be based on other factors:

  • “Do I need an ordered list?” - Use numeric (but you can’t have gaps).
  • “Do I need a direct reference to the object?” - Use objects as indexes.
  • “Do I need to have a set of attributes with meaningful names?”  - Use strings.

Forget about optimization.  Think first about ease-of  -development and -maintenance.

I think there is a hidden question here.

“Are pre-defined string indexes hashed or evaluated at runtime?”

Clearly dynamically generated indexes are not hashed, but it is possible pre-defined strings are hashed.

If so (hashed), they should be nearly as fast as numeric indexes.

If anyone knows, please chime in.

PS - I purposely ignored the ‘memory’ facet of this questions because I do not know the memory behavior of Lua tables as a side-effect of indexing choice.

Yes, hashing occurs when the string literal gets lexed, during the phase that gathers the bytecode. (Search for TK_STRING in llex.c, if curious.) Or rather, when any string is instantiated, its hash is computed at the same time, and this string is no different.

Kein-Hong Man long ago wrote up an analysis on the bytecode of Lua 5.1, the flavor used by Corona. You can read up on the behavior in question in the parts where it mentions the  GETTABLE op, in particular when also using a constant (the fixed string key). (In lvm.c, this is found under luaV_gettable , with some of the deeper implementation details if you drill down into luaH_get.)

Thank you Ed and StarCrunch for your explanations.

By the end of the day I’ll have two tables, each filled with ~2500 key-value-pairs. Ed, it’s the tables for the utf8-emoji mapping :slight_smile:

I won’t directly loop through those tables, even though behind the scenes a looping might occur, I dont know what technically is happening in the background.

Consider an emojiMap, where the key is a utf8 codepoint and its value is its respective position in an imageSheet:

local emojiMap = {}
emojiMap[128513] = 294
emojiMap[128515] = 423
emojiMap[128514] = 123

Since new emojis can be cominbations of two or even more emojis or even skincolor-modifiers, (consider ‍‍‍the family-emojy for example, which is a combination of a man, a woman, a boy, and another boy) I need keys like

emojiMap[128513-128542-128256-128356] = 345

So, my original question aimed at finding out wether

emojiMap[“128513-128542-128256-128356”] with a string as index would be way worse than

emojiMap[128513128542128256128356] with possible huge integers.

Now, if I understand you two correct, since my tables are predefined and therefor hashed, I can go for strings without a noticeable difference. Correct?

Integer indices are quite faster and even more so if you may also use frameworks with LuaJIT (there you’ll also benefit a lot from an optional much improved memory usage), but in this case, hashed keys are not only the better but actually the only “choice”.

Given the distribution of integer values used as keys, those would end in the hashmap of Lua anyway as Lua is smart enough to not use arrays for integer keys with huge gaps between (otherwise this would waste huge amounts of memory for no use).

Also, given the huge integers, you would also probably run into precission problems.

So, just use strings - also don’t be afraid to create them dynamically, Lua interns/hashes them on creation, f.i.

local myKey = tostring(codepoint1)…"-"…tostring(codepoint2)…"-" etc. etc.

Lua will create a immutable string, it’ll be hashed and put into Lua’s internal global, all including string hashmap.

Each access to myKey will just use this already hashed and interned string.

Excellent explanations, thanks Michael :slight_smile:

besides, lua can’t created an indexed array that (>~10^24) big (limit 2^32-1)

…and even if it could, you wouldn’t have the memory for it (~several yottabytes?)

…nor could doubles reliably represent all the integers (limit 2^53)

so you have no choice, it’ll have to be hash-keyed (thus perf diff is irrelevant)

Yes and No.

Yes, Numerically indexed tables are faster to iterate, but you won’t notice it for reasonably sized tables.

No, your last statement is wrong.  Both will print equally fast since printing is gated by the frame time which is 16.67 or 33.33 ms while index lookup is many orders of magnitude faster than that.

PS - Another thing to consider is that the work you do ‘in the loop’ is going to dominate the loop.

i.e. This

for k,v in pairs( enemies ) do v.x = v.x + 10 end

Is many 100’s of times slower than this:

for k,v in pairs( enemies ) do -- empty, no work end

While this is almost the same as the second (assuming 1000 or so entries):

for i = 1, #enemies do -- empty, no work end

At the end of the day, the indexing mechanism you choose should be based on other factors:

  • “Do I need an ordered list?” - Use numeric (but you can’t have gaps).
  • “Do I need a direct reference to the object?” - Use objects as indexes.
  • “Do I need to have a set of attributes with meaningful names?”  - Use strings.

Forget about optimization.  Think first about ease-of  -development and -maintenance.

I think there is a hidden question here.

“Are pre-defined string indexes hashed or evaluated at runtime?”

Clearly dynamically generated indexes are not hashed, but it is possible pre-defined strings are hashed.

If so (hashed), they should be nearly as fast as numeric indexes.

If anyone knows, please chime in.

PS - I purposely ignored the ‘memory’ facet of this questions because I do not know the memory behavior of Lua tables as a side-effect of indexing choice.

Yes, hashing occurs when the string literal gets lexed, during the phase that gathers the bytecode. (Search for TK_STRING in llex.c, if curious.) Or rather, when any string is instantiated, its hash is computed at the same time, and this string is no different.

Kein-Hong Man long ago wrote up an analysis on the bytecode of Lua 5.1, the flavor used by Corona. You can read up on the behavior in question in the parts where it mentions the  GETTABLE op, in particular when also using a constant (the fixed string key). (In lvm.c, this is found under luaV_gettable , with some of the deeper implementation details if you drill down into luaH_get.)

Thank you Ed and StarCrunch for your explanations.

By the end of the day I’ll have two tables, each filled with ~2500 key-value-pairs. Ed, it’s the tables for the utf8-emoji mapping :slight_smile:

I won’t directly loop through those tables, even though behind the scenes a looping might occur, I dont know what technically is happening in the background.

Consider an emojiMap, where the key is a utf8 codepoint and its value is its respective position in an imageSheet:

local emojiMap = {}
emojiMap[128513] = 294
emojiMap[128515] = 423
emojiMap[128514] = 123

Since new emojis can be cominbations of two or even more emojis or even skincolor-modifiers, (consider ‍‍‍the family-emojy for example, which is a combination of a man, a woman, a boy, and another boy) I need keys like

emojiMap[128513-128542-128256-128356] = 345

So, my original question aimed at finding out wether

emojiMap[“128513-128542-128256-128356”] with a string as index would be way worse than

emojiMap[128513128542128256128356] with possible huge integers.

Now, if I understand you two correct, since my tables are predefined and therefor hashed, I can go for strings without a noticeable difference. Correct?

Integer indices are quite faster and even more so if you may also use frameworks with LuaJIT (there you’ll also benefit a lot from an optional much improved memory usage), but in this case, hashed keys are not only the better but actually the only “choice”.

Given the distribution of integer values used as keys, those would end in the hashmap of Lua anyway as Lua is smart enough to not use arrays for integer keys with huge gaps between (otherwise this would waste huge amounts of memory for no use).

Also, given the huge integers, you would also probably run into precission problems.

So, just use strings - also don’t be afraid to create them dynamically, Lua interns/hashes them on creation, f.i.

local myKey = tostring(codepoint1)…"-"…tostring(codepoint2)…"-" etc. etc.

Lua will create a immutable string, it’ll be hashed and put into Lua’s internal global, all including string hashmap.

Each access to myKey will just use this already hashed and interned string.

Excellent explanations, thanks Michael :slight_smile:

besides, lua can’t created an indexed array that (>~10^24) big (limit 2^32-1)

…and even if it could, you wouldn’t have the memory for it (~several yottabytes?)

…nor could doubles reliably represent all the integers (limit 2^53)

so you have no choice, it’ll have to be hash-keyed (thus perf diff is irrelevant)