Thursday, August 29, 2024

TIL: SQL LIMIT can be slow

It's been a while since I wrote anything. The main reason is that I have been very busy with work and other things, and consequently have had very little time for any fun experimentation that would (or could) be noteworthy here.

Until now, when I ran into something. I have a PostgreSQL database with several tens of millions of rows. At one point performance got very bad, so I added indexes; problem solved.

Now I was testing a new feature that does data analysis, and ran a simple query to get started (x.col being an indexed column):

select a,b,c from table x where x.col=something and x.id=12345;

A trivial query, and it completed in milliseconds. In initial testing I wanted to start with just one specific row.

When things were working okay for that row, I wanted to update this so I could process rows one at a time - again, to verify things work as expected:

select a,b,c from table x where x.col=something limit 1;

This query took a ridiculously long time to complete, close to a minute or so. And it returned zero rows.

So I did what anyone would do: asked Postgres to EXPLAIN what it did. And it showed a sequential scan. What?
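For reference, a minimal sketch of checking a query plan (table and column names here are placeholders, not my actual schema):

```sql
-- EXPLAIN ANALYZE executes the query and prints the plan that was
-- actually used, with timings. Look for "Seq Scan" vs "Index Scan"
-- in the output to see whether the index was used.
EXPLAIN ANALYZE
SELECT a, b, c FROM x WHERE x.col = 'something' LIMIT 1;
```

Plain EXPLAIN (without ANALYZE) only shows the planner's estimate and does not run the query, which is safer on a slow query like this one.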

Okay, when things fail, fall back to something trivial and work backwards from there, so let's try something simpler...

select a,b,c from table x where x.col=something;
(0 rows returned)

This again ran in milliseconds. So LIMIT did something not fun.

Time for googling, and yes: LIMIT can make Postgres fall back to a sequential scan in some cases. The planner optimizes a LIMIT query for returning the first rows quickly, and if it estimates that matching rows are common, a sequential scan looks cheap - it expects to hit a match almost immediately. That backfires badly when few or zero rows actually match, because the scan ends up reading the whole table.
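One workaround I have seen suggested (placeholder names again, and whether it helps depends on your planner statistics) is adding an ORDER BY on the indexed column, which gives the planner an extra reason to pick the index:

```sql
-- An index scan can satisfy both the filter and the ordering in one
-- pass, so with the ORDER BY present the planner tends to prefer it
-- over a sequential scan followed by a sort.
SELECT a, b, c
FROM x
WHERE x.col = 'something'
ORDER BY x.col
LIMIT 1;
```

Running ANALYZE on the table so the planner has fresh statistics is also worth trying before reaching for query tricks.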

So there. Drop LIMIT from the query and process just one row (for now). And then check in how many places exactly I am using LIMIT needlessly...
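If the actual goal is processing rows one at a time, a server-side cursor is an alternative to LIMIT; a sketch with placeholder names (cursors must live inside a transaction):

```sql
BEGIN;

-- Declare a cursor over the full result set; rows are only
-- materialized as they are fetched, not all at once.
DECLARE row_cur CURSOR FOR
  SELECT a, b, c FROM x WHERE x.col = 'something';

-- Fetch one row at a time; repeat FETCH until it returns no rows.
FETCH 1 FROM row_cur;

CLOSE row_cur;
COMMIT;
```

This keeps the single query plan of the plain SELECT while still handing the application one row per round trip.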

Maybe something many already knew, but I didn't. Ah well.