performance issue using cursors and long queries #2620

pdowler · 2022-09-16T17:03:39Z

pdowler
Sep 16, 2022

Hi all - my first time here so I hope this is the right venuw to ask about this problem.

client: docker/tomcat-9/openjdk-11+ postgresql-jdbc-42.2.18-2.fc34.noarch
server: postgresql-12.3

The client is a web service that let's users execute ~SQL queries via a REST API and the code that executes the query uses PreparedStatement.setFetchSize(1000) (eg) to limit memory consumption when streaming large query results.
relevant java code:

connection.setAutoCommit(false);
          
pstmt = connection.prepareStatement(sql);          
//pstmt = connection.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY,
//                          ResultSet.CONCUR_READ_ONLY, ResultSet.CLOSE_CURSORS_AT_COMMIT);
pstmt.setFetchSize(1000);
pstmt.setFetchDirection(ResultSet.FETCH_FORWARD);

resultSet = pstmt.executeQuery();

The issue I am running into is with the time for pstmt.executeQuery() to return the ResultSet (the first rows).
Small queries that return immediately (a select of one row) also run as expected (in 1-2ms). Simple queries that return large numbers of rows start streaming right away (return the ResultSet in a small amount of time) as expected and I occasionally profile jvm memory usage to make sure it is still consuming a fixed/finite amount.

The performance issue comes up with a long running query that doesn't produce output for a long time, and then a small amount of output (an aggregate query). The table has ~150e6 rows and the query produces about 50 rows.

`select foo,count(*) from massive_table group by foo`

In psql, this query takes ~150sec and produces ~50 rows
From java code, it takes ~370sec to return the ResultSet and produces the same ~50 rows.

If I disable cursor use (eg by setting fetch size to 0) in java code, the query takes ~150sec.

aside: the two varaints of prepareStatement(...) above behave the same; as I understand, those extra flags are the defaults except maybe fetchDirection which I had already been setting explicitly.

Any ideas?

more context: the java code implements a REST API so users can execute ~SQL queries (select only). This implements a community standard (https://www.ivoa.net/documents/TAP/) and we have been operating a variety of these TAP services for more than 10 years. I can't say this issue is new because it seems to only be noticeable with certain kinds of queries and it was only when we started seeing client timeouts querying a new and much larger table that we investigated and noted the performance discrepancy.

davecramer · 2022-09-16T18:26:27Z

davecramer
Sep 16, 2022
Maintainer

It's entirely possible that the cursor query takes longer.

You can replicate it using

execute <your query>```

Let me know what how long that takes

0 replies

sehrope · 2022-09-16T18:35:16Z

sehrope
Sep 16, 2022
Maintainer

It's possible the server is generating a different plan when a cursor is being used. This wouldn't be something specific to this driver, the server would do it for any connection that declares a cursor rather than directly executing the entire command. You can simulate this in psql by manually declaring a cursor: ``` BEGIN; PREPARE my_cursor AS SELECT t.foo, COUNT(*) FROM massive_table t group by t.foo; FETCH 1000 FROM my_cursor; FETCH 1000 FROM my_cursor; FETCH 1000 FROM my_cursor; ``` I would bet that the first fetch operation takes just as long as when it's from via this driver. FYI, the server might change the plan as cursors are optimized for fetching the first batch of rows (rather than everything). Sometimes that leads to bad plans v.s. whatever it would do for efficiently retrieving the entire result. Check out the plans for the two approaches to see the specifics of why it's happening and you may find a way to resolve it.

0 replies

pdowler · 2022-09-16T20:50:30Z

pdowler
Sep 16, 2022
Author

Ok, I have this test:

select now(); 

start transaction;

declare cur NO SCROLL CURSOR FOR 
select foo, count(*)
from massive_table
group by foo order by foo;

select now();

FETCH 1000 FROM cur;

commit transaction;

select now();

which is close to what happens via the driver. The results are:

now              
-------------------------------
 2022-09-16 13:38:19.571161-07
(1 row)

START TRANSACTION
DECLARE CURSOR
             now              
------------------------------
 2022-09-16 13:38:19.57177-07
(1 row)

    foo     |   count   
------------------+-----------
...
(52 rows)

COMMIT
              now              
-------------------------------
 2022-09-16 13:44:23.585818-07
(1 row)

The time between declare and finish (FETCH) is ~374sec, so yes, it looks like it's not a JDBC issue but a cursor/plan issue.

I will try using explain with and without cursor to see how the plan differs.

0 replies

pdowler · 2022-09-16T21:04:58Z

pdowler
Sep 16, 2022
Author

OK, that wasn't too hard and I can see that without a cursor the query plan uses multiple workers (2) doing "Partial Aggregate" and then "Gather" and "Finalize Aggregate"... while with cursor it is just the simple plan and single scan of the massive table.

So, cursors seem to disable/block the ability to use a parallel worker plan. I can see that plan change in other queries too now that I know how to explain it.

Now to figure out how to make my java code know when to not use cursors, but certainly not a JDBC driver issue.

thanks for the help!

1 reply

davecramer Sep 19, 2022
Maintainer

Well that's interesting...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance issue using cursors and long queries #2620

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 4 comments 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

performance issue using cursors and long queries #2620

pdowler Sep 16, 2022

Replies: 4 comments · 1 reply

davecramer Sep 16, 2022 Maintainer

sehrope Sep 16, 2022 Maintainer

pdowler Sep 16, 2022 Author

pdowler Sep 16, 2022 Author

davecramer Sep 19, 2022 Maintainer

pdowler
Sep 16, 2022

Replies: 4 comments 1 reply

davecramer
Sep 16, 2022
Maintainer

sehrope
Sep 16, 2022
Maintainer

pdowler
Sep 16, 2022
Author

pdowler
Sep 16, 2022
Author

davecramer Sep 19, 2022
Maintainer