ROW_NUMBER(() in SQlite3 and Lua

I’m trying to get the first result of each group in a Sqlite3 table. In the snapshot below, only hightlighted rows should be returned (first buildingOnMapId of each group).

After some research I believe i need to use row_number and partition_by as explained here, unfortunately that doesn’t seem to work in my case. My query is as follows:

for row in globalData.db:nrows("SELECT ROW\_NUMBER() OVER ( PARTITION BY buildingOnMapId ORDER BY id ASC) RowNum, buildingType, x, y FROM builtBuildings") do

Note: I planned to limit the results with  WHERE RowNum = 1  but I wanted to start with the simplest query…

But the console returns an error:

ERROR: Runtime error near "(": syntax error stack traceback: [C]: in function 'nrows'

I can’t see where my syntax is wrong, maybe row_number is not available in sqlite for Lua?

I’m open to any other alternative if there is a better way to achieve this…

Seems like row_number() is a pretty new addition to SQLite (Sept 2018), so I think your only option is to iterate over the rows in Lua, sorted as above, and manually add the required rows to a new table.

Are you just trying to return your ‘id’ columns, one for each unique ‘buildingOnMapId’ value? If so, this should work…

for row in globalData.db:nrows( [[SELECT `id`,`buildingOnMapId`,`buildingType` FROM `builtBuildings` GROUP BY `buildingOnMapId` ORDER BY `id` ASC;]] ) do print("id = " .. row.id .. " , buildingOnMapId = " .. row.buildingOnMapId .. " , buildingType = " .. row.buildingType) end

The GROUP BY clause groups all records where the defined field matches, so that only one of those records is returned. Combined with your ORDER BY this should always be the record with the lowest ID.

As a side note, it’s worth getting into the habit of prefixing your field and table names with something to ensure that they’re not interpreted as SQL commands. I for example, prefix db for databases, tbl for tables, and fld for fields. That way, something like id becomes fldId and there’s no risk of it being interpreted as a statement in database engines that interpret ‘id’ if you forget to escape it.

Sorry my post wasn’t complete, I need the x and y coordinates of the first row of each unique ‘buildingOnMapId’. My current query is the same as the one Richard mentioned:

"SELECT buildingType, x, y FROM builtBuildings GROUP BY buildingOnMapId ORDER BY id ASC"

But that returns the last row of each unique ‘buildingOnMapId’. I.e. id 7 and 12 in the example of my previous spot.

I though of iterating through the rows and only getting the first entry but would have been nice to manage it directly with a simple query. I will end up doing that if we can’t find anything better.

Thanks for the tip on the prefix, I will do that next time I cleanup my code.

It sounds like SQLite performs the group by before the order by then. In which case you’ll need to use an inner select to return an ordered list before grouping, like this:

SELECT * FROM (SELECT id,buildingOnMapId,x,y FROM builtBuildings ORDER BY id ASC) tmp GROUP BY buildingOnMapId

If SQLite is like MariaDB, it’s then also possible that it ignores the order by clause of inner selects for performance. In which case with MariaDB you can work around that by adding a limit clause, so the same might work with SQLite:

SELECT * FROM (SELECT id,buildingOnMapId,x,y FROM builtBuildings ORDER BY id ASC LIMIT 100000) tmp GROUP BY buildingOnMapId

Just make sure the limit is a number higher than anything you’re likely to actually return, so that it’s not actually a limit.

Sorry, I’m on my mobile so can’t format the above properly, or test, and I’m not overly experienced with SQLite specifically - I’m just coming at this from a generic SQL perspective. Hope it works!

Thanks Richard. I tried your suggestion (both with and without the limit) but I get the same result (the last row is returned instead of the first one).

I experimented by changing mine and your query from ORDER BY  id ASC to ORDER BY  id DESC and I now get the first row. I guess that it is the last query that is kept in memory/returned so in that case it is the lowest id.

Just wondering if that is a safe way of doing it?

Strange. SQLite must always return the last record in the group then, in which case your DESC ordering should be safe.

I’ve thought of another, potentially faster method though. If you stick a HAVING on the end of the group, you can filter the records within that group. Filtering by the minimum id should therefore return groups of just one record, like this:

SELECT id,buildingOnMapId,x,y FROM builtBuildings GROUP BY buildingOnMapId HAVING MIN(id);

Not needing an order by or an inner select should be faster, but I’ve no idea what the performance of HAVING() is like. Regardless, if this one works it’s probably more elegant.

Well as usual Richard has the solution :slight_smile:

Works perfectly with the HAVING!

I will check the performance but from my limited testing it looks okay so far.

Glad to hear.

Setting up indexes on the buildingOnMapId and id fields should help with performance regardless, at the cost of using up more storage. 99% of the time, the speed increase of indexes far outweighs the expense of storing them, so as a general rule of thumb you should create an index for any field that you use in a join, group by, or where/having clause.

Seems like row_number() is a pretty new addition to SQLite (Sept 2018), so I think your only option is to iterate over the rows in Lua, sorted as above, and manually add the required rows to a new table.

Are you just trying to return your ‘id’ columns, one for each unique ‘buildingOnMapId’ value? If so, this should work…

for row in globalData.db:nrows( [[SELECT `id`,`buildingOnMapId`,`buildingType` FROM `builtBuildings` GROUP BY `buildingOnMapId` ORDER BY `id` ASC;]] ) do print("id = " .. row.id .. " , buildingOnMapId = " .. row.buildingOnMapId .. " , buildingType = " .. row.buildingType) end

The GROUP BY clause groups all records where the defined field matches, so that only one of those records is returned. Combined with your ORDER BY this should always be the record with the lowest ID.

As a side note, it’s worth getting into the habit of prefixing your field and table names with something to ensure that they’re not interpreted as SQL commands. I for example, prefix db for databases, tbl for tables, and fld for fields. That way, something like id becomes fldId and there’s no risk of it being interpreted as a statement in database engines that interpret ‘id’ if you forget to escape it.

Sorry my post wasn’t complete, I need the x and y coordinates of the first row of each unique ‘buildingOnMapId’. My current query is the same as the one Richard mentioned:

"SELECT buildingType, x, y FROM builtBuildings GROUP BY buildingOnMapId ORDER BY id ASC"

But that returns the last row of each unique ‘buildingOnMapId’. I.e. id 7 and 12 in the example of my previous spot.

I though of iterating through the rows and only getting the first entry but would have been nice to manage it directly with a simple query. I will end up doing that if we can’t find anything better.

Thanks for the tip on the prefix, I will do that next time I cleanup my code.

It sounds like SQLite performs the group by before the order by then. In which case you’ll need to use an inner select to return an ordered list before grouping, like this:

SELECT * FROM (SELECT id,buildingOnMapId,x,y FROM builtBuildings ORDER BY id ASC) tmp GROUP BY buildingOnMapId

If SQLite is like MariaDB, it’s then also possible that it ignores the order by clause of inner selects for performance. In which case with MariaDB you can work around that by adding a limit clause, so the same might work with SQLite:

SELECT * FROM (SELECT id,buildingOnMapId,x,y FROM builtBuildings ORDER BY id ASC LIMIT 100000) tmp GROUP BY buildingOnMapId

Just make sure the limit is a number higher than anything you’re likely to actually return, so that it’s not actually a limit.

Sorry, I’m on my mobile so can’t format the above properly, or test, and I’m not overly experienced with SQLite specifically - I’m just coming at this from a generic SQL perspective. Hope it works!

Thanks Richard. I tried your suggestion (both with and without the limit) but I get the same result (the last row is returned instead of the first one).

I experimented by changing mine and your query from ORDER BY  id ASC to ORDER BY  id DESC and I now get the first row. I guess that it is the last query that is kept in memory/returned so in that case it is the lowest id.

Just wondering if that is a safe way of doing it?

Strange. SQLite must always return the last record in the group then, in which case your DESC ordering should be safe.

I’ve thought of another, potentially faster method though. If you stick a HAVING on the end of the group, you can filter the records within that group. Filtering by the minimum id should therefore return groups of just one record, like this:

SELECT id,buildingOnMapId,x,y FROM builtBuildings GROUP BY buildingOnMapId HAVING MIN(id);

Not needing an order by or an inner select should be faster, but I’ve no idea what the performance of HAVING() is like. Regardless, if this one works it’s probably more elegant.

Well as usual Richard has the solution :slight_smile:

Works perfectly with the HAVING!

I will check the performance but from my limited testing it looks okay so far.

Glad to hear.

Setting up indexes on the buildingOnMapId and id fields should help with performance regardless, at the cost of using up more storage. 99% of the time, the speed increase of indexes far outweighs the expense of storing them, so as a general rule of thumb you should create an index for any field that you use in a join, group by, or where/having clause.