Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve default setNull performance on PostgreSQL and MS SQL Server (e.g. for NamedParameterJdbcTemplate batch updates) #25679

Closed
ricardoekm opened this issue Sep 2, 2020 · 23 comments
Assignees
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement
Milestone

Comments

@ricardoekm
Copy link

We've been investigating an issue where we had a 100x performance drop when switching from JdbcTemplate to NamedParameterJdbcTemplate in batch update operations.

We found out that the root cause is the handling of null values by StatementCreatorUtils:setNull, which in turn calls getParameterMetadata from the connection. It seems this issue a database call for every row to be inserted (call stack from profiler attached) which basically kills the performance gains of the batch insert.

We're using Postgres as back-end database. Spring JDBC 5.2.5.

Thanks.

namedparameterjdbctemplate_stack

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Sep 2, 2020
@quaff
Copy link
Contributor

quaff commented Sep 3, 2020

Have you tried -Dspring.jdbc.getParameterType.ignore=true?

@quaff
Copy link
Contributor

quaff commented Sep 3, 2020

It seems spring.jdbc.getParameterType.ignore will not avoid ps.getConnection().getMetaData(), should spring need introducing a new spring.jdbc.getMetaData.ignore?

@quaff
Copy link
Contributor

quaff commented Sep 3, 2020

@jhoeller Could you explain why not simply call ps.setObject(paramIndex, null) ?

@jhoeller jhoeller self-assigned this Sep 3, 2020
@jhoeller
Copy link
Contributor

jhoeller commented Sep 3, 2020

ps.getConnection().getMetaData() usually isn't expensive since it's cached in the connection pool. Just getParameterMetaData() may turn out to be expensive, and that's what spring.jdbc.getParameterType.ignore=true covers.

As for why not simply call setObject, it's about compatibility with certain drivers/databases. Quoting the setObject javadoc:

 *<b>Note:</b> Not all databases allow for a non-typed Null to be sent to
 * the backend. For maximum portability, the <code>setNull</code> or the
 * <code>setObject(int parameterIndex, Object x, int sqlType)</code>
 * method should be used
 * instead of <code>setObject(int parameterIndex, Object x)</code>.

@ricardoekm
Copy link
Author

spring.jdbc.getParameterType.ignore=true did the trick, thanks!

@quaff
Copy link
Contributor

quaff commented Sep 4, 2020

@jhoeller I assume all modern databases allow non-typed Null, if it's true, spring.jdbc.getParameterType.ignore should be true by default.

@ricardoekm
Copy link
Author

Reasonable defaults, that's what life is about :)

@jhoeller
Copy link
Contributor

jhoeller commented Sep 4, 2020

We're happy to refine our default assumptions there. However, a few years ago, there were only a few databases supporting untyped setObject(paramIndex, null) calls, e.g. Microsoft's then-new SQL Server driver. For all others, a setNull call with some SQL type argument is needed.

I'd particularly like to learn about recent driver behavior with MySQL and PostgreSQL there. We're effectively using setNull(paramIndex, Types.NULL) for the latter, like for other databases that we don't have explicit rules for. If this is known to work reliably, we can explictly opt into that for such well-known databases, not ever trying getParameterType for those then (independent from the spring.jdbc.getParameterType.ignore setting).

@jhoeller
Copy link
Contributor

jhoeller commented Sep 4, 2020

A StackOverflow answer about support for setObject vs setNull in common JDBC drivers (https://stackoverflow.com/a/10068111, dating back to 2016) and other articles suggest that MySQL does indeed reliably support untyped nulls, whereas PostgreSQL requires the specific SQL type to be set in some cases. I'd appreciate feedback on the latter since according to the indications above, setNull(paramIndex, Types.NULL) does seem to work for your cases here (since that's what we're effectively using when you set spring.jdbc.getParameterType.ignore=true), possibly due to a more recent JDBC driver?

A more recent discussion thread about PostgreSQL null handling (https://www.postgresql-archive.org/quot-could-not-determine-data-type-of-parameter-quot-with-timestamp-td5995489.html) suggests that setNull(paramIndex, Types.NULL) does usually work but that ParameterMetaData.getParameterType should be used for better execution performance. That's pretty much the situation that we have right now with the choice that spring.jdbc.getParameterType.ignore is providing, following the PostgreSQL recommendation by default but allowing to opt out of it.

Another interesting thread on StackOverflow (https://stackoverflow.com/questions/37942063/slow-insert-on-postgresql-using-jdbc) suggests that the pgjdbc-ng driver caches parameter data and is therefore much faster with our default settings.

I'll consider revising our default algorithm to take recent driver updates into account, reopening this issue for it. There might not be much that we can do about PostgreSQL specifically but this is nevertheless worth a fresh look now with 2020-era drivers.

@jhoeller jhoeller reopened this Sep 4, 2020
@jhoeller jhoeller changed the title NamedParameterJdbcTemplate + Batch Update + Null parameters = Bad performance Revise database defaults for setNull handling (e.g. for NamedParameterJdbcTemplate batch updates) Sep 4, 2020
@jhoeller jhoeller added in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Sep 4, 2020
@jhoeller jhoeller added this to the 5.2.9 milestone Sep 4, 2020
jhoeller added a commit that referenced this issue Sep 5, 2020
@jhoeller jhoeller modified the milestones: 5.2.9, 5.3 RC1 Sep 8, 2020
@ricardoekm
Copy link
Author

ricardoekm commented Sep 8, 2020

Great to hear that you're revisiting this implementation.

Setting null with Types.NULL indeed works for PostgresSQL (JDBC) 42.2.12 and C3P0 0.9.5.5.

Adding a bit more information, here is the result of our performance test

JdbcTemplate

Batch 1: 10925 records/sec.
Batch 2: 11189 records/sec.
Batch 3: 11170 records/sec.
Batch 4: 11140 records/sec.
Batch 5: 11179 records/sec.

Average: 11,120

NamedJdbcTemplate with default configuration

Batch 1: 312 records/sec.
Batch 2: 344 records/sec.
Batch 3: 348 records/sec.
Batch 4: 350 records/sec.
Batch 5: 357 records/sec.

Average: 342.2 records/sec.

NamedJdbcTemplate with spring.jdbc.getParameterType.ignore=true

Batch 1: 10384 records/sec.
Batch 2: 10652 records/sec.
Batch 3: 10937 records/sec.
Batch 4: 10697 records/sec.
Batch 5: 10961 records/sec.

Average: 10,726 records/sec.

So in general, JdbcTemplate is a bit faster than NamedJdbcTemplate with spring.jdbc.getParameterType.ignore=true (around 4%), but in the same order of magnitude.

However, there's a huge performance drop by using NamedJdbcTemplate with the default configuration.

@quaff
Copy link
Contributor

quaff commented Sep 9, 2020

@jhoeller I'm wondering why JdbcTemplate not affected.

@quaff
Copy link
Contributor

quaff commented Sep 9, 2020

It seems like JdbcTemplate always use TYPE_UNKNOWN if type is not explicit, so there is some inconsistency with NamedParameterJdbcTemplate.

StatementCreatorUtils.setParameterValue(ps, parameterPosition, SqlTypeValue.TYPE_UNKNOWN, argValue);

@jhoeller
Copy link
Contributor

jhoeller commented Sep 9, 2020

Hmm NamedParameterJdbcTemplate should effectively also end up with SqlTypeValue.UNKNOWN by default... wondering where the performance difference actually comes from. Maybe it's some subtle interaction that I'm missing here.

@jhoeller jhoeller modified the milestones: 5.3 RC1, 5.3 RC2 Sep 14, 2020
@jhoeller jhoeller added this to the 6.1.x milestone Jul 10, 2023
@jhoeller
Copy link
Contributor

jhoeller commented Jul 12, 2023

For the classic PostgresSQL JDBC driver, there is actually a GitHub issue for it: pgjdbc/pgjdbc#621
Seems to have got stuck some time last year, despite a concrete caching approach having been tried.
Can everyone interested in this from Spring's side here cast their vote over there please :-)

@jhoeller jhoeller modified the milestones: 6.1.x, 6.x Backlog Jul 17, 2023
@davecramer
Copy link

@jhoeller can you point me to the concrete caching approach you are referring to?

@jhoeller
Copy link
Contributor

@davecramer it looks like @vlsi was pretty much working in that direction with pgjdbc/pgjdbc#621 already. I suppose a PreparedStatement-level cache would go a long way, a Connection-level cache would be even better.

All we really need is getParameterMetaData() turning into a cheap enough operation for repeated calls on the same PreparedStatement, with any necessary backend interaction only happening once and the resulting SQL type information then being immediately available for repeated parameter setting on the same statement.

On Spring's side, we just need that metadata for passing in a SQL type to a PreparedStatement.setNull call if we don't have more specific type information for a given parameter. If there is another way of sending typed nulls to the DBMS without introspecting getParameterMetaData(), we could support that on Spring's side as well. Generally speaking, we operate under the assumption that setNull(..., Types.NULL) is not good enough by default, and neither is setObject(..., null).

@jhoeller
Copy link
Contributor

jhoeller commented Aug 3, 2023

@davecramer, reviewing other JDBC driver implementations in that respect, the key-based caching approach that @vlsi considered was probably more sophisticated than necessary. The common driver implementations seem to simply cache the ParameterMetaData object in the PreparedStatement instance after obtaining it. So no reactions to intermediate changes in the database schema or the like, and no reuse at the connection level either, just returning the existing metadata object for repeated calls on the same PreparedStatement instance.

@vlsi
Copy link
Contributor

vlsi commented Aug 3, 2023

@jhoeller , thank you for the clarification. Let me see if I can implement the MVP.

@jhoeller jhoeller changed the title Revise database defaults for setNull handling (e.g. for NamedParameterJdbcTemplate batch updates) Improve setNull performance on PostgreSQL (e.g. for NamedParameterJdbcTemplate batch updates) Aug 3, 2023
@jhoeller jhoeller added type: documentation A documentation task and removed type: enhancement A general enhancement labels Aug 3, 2023
@jhoeller jhoeller modified the milestones: 6.x Backlog, 6.0.12 Aug 3, 2023
@sbrannen sbrannen changed the title Improve setNull performance on PostgreSQL (e.g. for NamedParameterJdbcTemplate batch updates) Improve setNull performance on PostgreSQL (e.g. for NamedParameterJdbcTemplate batch updates) Aug 3, 2023
@jhoeller jhoeller modified the milestones: 6.0.12, 6.1.x Sep 5, 2023
@jhoeller jhoeller modified the milestones: 6.1.x, 6.0.13 Sep 15, 2023
@jhoeller jhoeller modified the milestones: 6.0.13, 6.x Backlog Oct 9, 2023
@viniciusxyz
Copy link

I had performance problems in production due to the setNull using MSSQL with the official driver in the latest version available (12.4.1) and JDBCTemplate.update(), when adding the parameter -Dspring.jdbc.getParameterType.ignore=true, the time for preparing the query decreased drastically, giving a 70% lower average execution time and reducing the amount of GC during some tests from 196 executions to just 17 which greatly improved CPU performance.

I would like to understand when the use of this parameter can actually be something negative and if there is any relationship between drivers where it causes problems, currently I use the official mssql driver and the postgresql driver for some demands, is there any way that Do you recommend so that I can test these drivers? If possible, I would like to always set the value of spring.jdbc.getParameterType.ignore to true in some applications I have, but I'm unsure of what exactly might happen if I do that.

Thank you in advance if anyone can clarify these doubts, I don't understand in depth how the drivers work and so it's difficult for me to know exactly which scenarios are problematic with the mentioned parameter.

@jhoeller
Copy link
Contributor

jhoeller commented Dec 9, 2023

spring.jdbc.getParameterType.ignore=true is safe against PostgreSQL and MS SQL Server, as far as we are able to infer, since the drivers (or rather the DBMS wire protocols) do not actually rely on parameter type information for null values there.

I wasn't aware of such a dramatic performance impact on SQL Server yet, this looks like a reason to prefer the bypass (without a getParameterType call) there as well. So for both PostgreSQL and SQL Server, we have a situation where type information for null values does not matter in common scenarios but getParameterType calls are known to be expensive.

For H2, HSQLDB, Derby and MySQL, type information does not matter for null values either but getParameterType is efficient enough there.

On the other side, to the best of my knowledge, type information for null values is important on Oracle, Sybase and DB2, with getParameterType generally being preferable there. Or of course, explicitly specifying SQL type information on the application side through providing SqlParameterValue objects instead of plain null values to begin with.

All in all, I'm inclined to implement a default bypass for PostgreSQL and MS SQL Server for the case where spring.jdbc.getParameterType.ignore is unspecified, similar to the database-specific setObject vs setNull checks that we got there in StatementCreatorUtils already. An explicit true or false value for the flag would still override the bypass.

@jhoeller jhoeller modified the milestones: 6.x Backlog, 6.1.2 Dec 9, 2023
@jhoeller jhoeller added type: enhancement A general enhancement and removed type: documentation A documentation task labels Dec 9, 2023
@jhoeller jhoeller changed the title Improve setNull performance on PostgreSQL (e.g. for NamedParameterJdbcTemplate batch updates) Improve default setNull performance on PostgreSQL and MS SQL Server (e.g. for NamedParameterJdbcTemplate batch updates) Dec 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

8 participants