Hash Performance

Hi!

What is the best way (performance-wise) to iterate (many times) over a hash (with many items)?

  1. With pairs

    local hash = {}
    hash[10] = {x=10, y=10}
    hash[20] = {x=12, y=20}
    hash[30] = {x=14, y=15}
    hash[40] = {x=16, y=20}

    for k, v in pairs(hash) do
      -- do something
    end

  2. With an extra index table

    local hash = {}
    hash[10] = {x=10, y=10}
    hash[20] = {x=12, y=20}
    hash[30] = {x=14, y=15}
    hash[40] = {x=16, y=20}

    local index = {}
    index[1] = 10
    index[2] = 20
    index[3] = 30
    index[4] = 40

    for i = 1, #index do
      local v = hash[index[i]]
      -- do something
    end

  3. Others… ?

Thank you!

Hi.

You may set elements of a table to nil while iterating over it with pairs(). Usually this would be the element you’re currently visiting:

hash[k] = nil

but it could just as well be some other key, including the next one you would have iterated over. Since pairs() has no special knowledge of how you’re using it, e.g. maybe you’re not removing elements at all, it has to do some things internally to gracefully and robustly handle such possibilities. This gives pairs() some useful generality, but it’s also “slow”. (Even LuaJIT’s developer has found optimizing pairs() to be a non-starter.)
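
To make that concrete, here is a small sketch (with made-up keys and values) of the one removal that is explicitly safe during a pairs() traversal, namely deleting the key you are currently visiting:

```lua
local hash = { a = 1, b = 2, c = 3, d = 4 }

-- Removing the current key while iterating is allowed by the
-- Lua manual; assigning to *other* keys mid-traversal is not.
for k, v in pairs(hash) do
  if v % 2 == 0 then
    hash[k] = nil  -- safe: k is the key we're visiting right now
  end
end
-- hash now holds only the odd-valued entries (a = 1, c = 3)
```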

That said, “many” is the crucial word when you mention the number of items and iterations, and you’ll need to profile the different approaches. Up to some number of elements, basic “noise” will cancel out any gains in performance, and then there might be a break-even range as well. In my own programs I use pairs() if it fits the problem, then only reconsider if I see actual slowdown.
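
A minimal way to profile the two approaches yourself might look like the following (the element count and key scheme are arbitrary; results will vary with your Lua implementation and data):

```lua
-- Build a hash plus a parallel index table, as in option 2.
local hash, index = {}, {}
for i = 1, 100000 do
  local key = i * 10
  hash[key] = { x = i, y = i }
  index[#index + 1] = key
end

-- Time a pairs() pass.
local t0 = os.clock()
local sum1 = 0
for _, v in pairs(hash) do sum1 = sum1 + v.x end
local pairsTime = os.clock() - t0

-- Time an index-table pass over the same data.
t0 = os.clock()
local sum2 = 0
for i = 1, #index do sum2 = sum2 + hash[index[i]].x end
local indexTime = os.clock() - t0

print(("pairs: %.3fs  index: %.3fs"):format(pairsTime, indexTime))
```

Summing a field in both loops keeps the work identical and stops a clever implementation from optimizing the loop body away entirely.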

The index approach sounds difficult to maintain, although it would let you sort your data, if you need that. As for others, you might also consider a flat array:

local data = {}

for k, v in pairs{
  [10] = { x = 10, y = 10 },
  -- etc.
} do
  data[#data + 1] = k
  data[#data + 1] = v.x
  data[#data + 1] = v.y
end

-- iterate!
for i = 1, #data, 3 do
  local k, x, y = data[i], data[i + 1], data[i + 2]
  -- do something
end

This comes with its own pros and cons. Sorting, for instance, is extremely cumbersome. Also, if you add a nil by mistake, a huge number of elements suddenly wind up in the wrong slots, which can be confusing to debug!

If order is important, removing elements is also expensive: either do 3 up-front **table.remove()**s in the above case, which end up shifting the array contents to close the gap, or stuff sentinel values into the vacated slots and slow down all of your loops with checks for them (not to mention, your array will keep growing).
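
The sentinel variant might be sketched like so (using false as the marker, since it is non-nil and so does not disturb the array length; the helper name is made up):

```lua
-- Mark the 3-slot record starting at position i as removed,
-- instead of shifting everything after it.
local function markRemoved(data, i)
  data[i], data[i + 1], data[i + 2] = false, false, false
end

-- Three records: (10, 1, 2), (20, 3, 4), (30, 5, 6).
local data = { 10, 1, 2,  20, 3, 4,  30, 5, 6 }
markRemoved(data, 4)  -- remove the record with key 20

local seen = {}
for i = 1, #data, 3 do
  local k = data[i]
  if k then  -- every loop now pays for this sentinel check
    seen[#seen + 1] = k
  end
end
-- seen is { 10, 30 }: order preserved, but the dead slots remain
```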

On the other hand, if order doesn’t matter, you can just move the final element into the vacated slots, a so-called “backfill”. An example of doing it during a loop, iterating backward:

local n = #items

for i = n - 2, 1, -3 do
  local k, x, y = items[i], items[i + 1], items[i + 2]
  -- do something
  if DoneWithItem(k, x, y) then
    items[i], items[i + 1], items[i + 2] = items[n - 2], items[n - 1], items[n]
    items[n - 2], items[n - 1], items[n] = nil
    n = n - 3
  end
end

When I find myself doing this, though, I tend to do it less for raw speed and more as a way to avoid generating lots of garbage tables (not an issue if you never need to throw them away, of course).
