Performance degradation of XML operations under Java 21 with custom parsers/transformers #3221
@demonti There's connection property, You can create or your class that implements
@davecramer I think we (or more accurately me...) forgot to add But I guess we never added it to the site itself so one would have to look at the enum or the release notes to know about it. A page that's generated off the driver's actual enum values would be nice as it already has the types and descriptions. That way it's always in sync.
Yes, it would! :)
For your information: I just created an issue in the Oracle Java Bug Database. As soon as it is accepted and made public (I don't know their process), I will post a link. While the workaround is certainly a solution, it might still be a good idea to fix the problem directly by reworking the code.
The bug report has been accepted by Oracle. Of course, they do not give any timeframe or priority for when they will address it. The bug ID at Oracle is 8331025; direct link: https://bugs.java.com/bugdatabase/view_bug?bug_id=8331025.
pg-xml-test.tar.gz
Describe the issue
Our software still runs on Java 8, but we are in the process of migrating it to Java 21. During performance tests, my colleague accidentally ran the tests with Java 21, and they showed a considerable performance degradation: they took two to three times longer.
I profiled the application with the Linux "gprofng" tool. The profile showed a large amount of time spent in specific JDBC calls related to XML support, i.e. SQLXML, PreparedStatement.setSQLXML and ResultSet.getSQLXML, and that time was actually consumed by Java class-loading code. I then used the Linux "strace" tool to validate my assumption, as described in the following.
In our project, we replaced the built-in XML parsers and transformers with the original Apache Xerces and the Saxon XSLT processor. The PostgreSQL JDBC driver uses the built-in XML API to convert user-generated XML to the driver's internal representation and vice versa. To obtain actual implementations of the interfaces, the driver uses factories provided by Java itself; this appears to be done by the DefaultPGXmlFactoryFactory class.
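For context on how such a replacement takes effect: the standard JAXP newInstance() methods locate an implementation at runtime, checking the matching javax.xml.* system property, then the jaxp.properties file in the JDK configuration directory, then ServiceLoader providers (META-INF/services) on the classpath, before falling back to the JDK's built-in implementation. The sketch below illustrates the standard JAXP API only, not the driver's code; the class and method names are hypothetical. It prints which implementation the lookup selects and contrasts it with newDefaultInstance() (Java 9+), which skips the lookup entirely.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class: shows which implementation the JAXP lookup picks.
public class JaxpLookupDemo {
    /** Runs the full JAXP lookup and reports which DocumentBuilderFactory it selected. */
    static String selectedDocumentBuilderFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Same lookup for TransformerFactory; a registered third-party implementation would be found here. */
    static String selectedTransformerFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Java 9+: bypasses the lookup and always returns the JDK's built-in implementation. */
    static String builtInDocumentBuilderFactory() {
        return DocumentBuilderFactory.newDefaultInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Selected DocumentBuilderFactory: " + selectedDocumentBuilderFactory());
        System.out.println("Selected TransformerFactory:     " + selectedTransformerFactory());
        System.out.println("Built-in DocumentBuilderFactory: " + builtInDocumentBuilderFactory());
    }
}
```

Run with and without the Xerces/Saxon JARs on the classpath to see whether a third-party implementation is actually being picked up.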
It turns out that each time a factory (not the actual XML parser) is constructed via the newInstance() methods, Java 21 partially rereads the respective JAR file(s). The profile clearly shows the JAR files being loaded, decompressed, and their signatures verified again and again. This wastes resources and is the source of the poor performance we observed.
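A rough way to see the cost of this pattern without a database is to compare creating a fresh factory for every transformer against reusing one factory. The following is a minimal sketch under that assumption (the class and method names are hypothetical, and it is not the test program attached to this issue). Absolute numbers depend heavily on which implementation the lookup finds; the gap reported here appeared with Saxon and Xerces on the classpath under Java 21.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical micro-benchmark: fresh factory per call vs. one reused factory.
public class FactoryCreationCost {
    /** Pattern in question: a fresh factory (full JAXP lookup) for every transformer. */
    static long freshFactoryNanos(int iterations) throws TransformerConfigurationException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            if (t == null) throw new IllegalStateException("no transformer");
        }
        return System.nanoTime() - start;
    }

    /** Alternative: one factory, reused for every transformer. */
    static long cachedFactoryNanos(int iterations) throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Transformer t = factory.newTransformer();
            if (t == null) throw new IllegalStateException("no transformer");
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws TransformerConfigurationException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        // Crude warm-up only; a harness such as JMH gives more reliable numbers.
        freshFactoryNanos(1_000);
        cachedFactoryNanos(1_000);
        System.out.printf("fresh factory each time: %d ms%n", freshFactoryNanos(n) / 1_000_000);
        System.out.printf("cached factory:          %d ms%n", cachedFactoryNanos(n) / 1_000_000);
    }
}
```

The iteration count is taken from the first argument, so the same program can be run under strace with different counts to check whether the number of JAR reads grows with it.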
The same test on Java 8 did not show this behaviour.
Creating a new factory for every parser (or similar) request was a reasonable choice: the Javadocs of older Java versions (e.g. Java 5) explicitly state that these APIs should not be considered thread-safe. However, by Java 8 at the latest these notes had been removed, suggesting that from then on the factories may be considered thread-safe. Implementations like DefaultPGXmlFactoryFactory could therefore keep a copy of the respective factories instead of creating them over and over again.
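One way to keep a copy of the factories without relying on their thread safety is a per-thread cache. The sketch below is hypothetical (CachingXmlFactories is not part of pgjdbc, and it does not implement the driver's factory interface): each thread creates its factories once, while DocumentBuilder and Transformer objects, which are not thread-safe, are still created per call. Secure processing is enabled as an example of where hardening settings would go; a real factory would likely need more.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

/** Hypothetical helper: caches JAXP factories per thread so newInstance() runs once per thread. */
public final class CachingXmlFactories {
    private static final ThreadLocal<DocumentBuilderFactory> DBF = ThreadLocal.withInitial(() -> {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        try {
            // Example hardening setting; all JAXP implementations must support this feature.
            f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("secure processing not supported", e);
        }
        return f;
    });

    private static final ThreadLocal<TransformerFactory> TF =
            ThreadLocal.withInitial(TransformerFactory::newInstance);

    private CachingXmlFactories() {}

    /** New builder per call (builders are not thread-safe); the factory is reused. */
    public static DocumentBuilder newDocumentBuilder() throws ParserConfigurationException {
        return DBF.get().newDocumentBuilder();
    }

    /** New identity transformer per call; the factory is reused. */
    public static Transformer newTransformer() throws TransformerConfigurationException {
        return TF.get().newTransformer();
    }

    // Package-private accessors, useful for checking that instances are reused.
    static DocumentBuilderFactory documentBuilderFactory() { return DBF.get(); }
    static TransformerFactory transformerFactory() { return TF.get(); }
}
```

If the factories are indeed thread-safe, a single static instance would be simpler; it would also avoid a known drawback of ThreadLocal caches in containers with long-lived pooled threads, where they can keep class loaders alive.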
Driver Version?
42.7.3
Java Version?
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment (build 21+35-2513)
OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing)
OS Version?
Ubuntu 23.10 (x64)
PostgreSQL Version?
16.0
To Reproduce
Compile the provided example code, run it under Java 21, and measure the elapsed time.
Repeat this with Java 8 and compare the times.
Alternatively, run the code under strace. The strace output shows a large number of reads of the Saxon JAR file, and this number scales with the number of iterations performed by the test program.
Expected behaviour
The third-party XML libraries should be loaded only once. This is of course also an issue for the Java VM developers at Oracle. I may file an issue there as well, but in my experience resolving such issues takes quite long.
Logs
Logs are omitted because of their size (up to a gigabyte) and their non-public content.
This is a summary of Linux "stat" system calls on JAR files in our software during a load test, showing the topmost candidates:
6 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/regex-21.1.0.jar"
7 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/js-21.1.0.jar"
19 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/woodstox-core-6.2.6.jar"
20 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/truffle-api-21.1.0.jar"
75015 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/Saxon-HE-9.8.0-15.jar"
163383 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/xercesImpl-2.12.1.jar"
In the provided test case, Java 8 performed 1653 read operations on the Saxon JAR file, while Java 21 performed 40546, about 24.5 times as many.