Performance degradation of XML operations under Java 21 with custom parsers/transformers #3221

Open
demonti opened this issue Apr 18, 2024 · 4 comments

demonti commented Apr 18, 2024

pg-xml-test.tar.gz

Describe the issue

Our software is still running on Java 8, but we are in the process of migrating it to Java 21. When doing performance tests, my colleague accidentally ran the tests using Java 21, and it showed a considerable performance degradation – it took two or three times longer.

I profiled the application with the Linux "gprofng" tool. The profile showed a large amount of time spent in specific JDBC calls related to XML support (SQLXML, PreparedStatement.setSQLXML, ResultSet.getSQLXML), and that time is actually consumed by Java class-loading code. I then used the Linux "strace" tool to validate this assumption, as described below.

In our project, we replaced the built-in XML parsers and transformers with the original Apache Xerces and the Saxon XSLT processor. The PostgreSQL JDBC driver uses the built-in XML API to perform various conversions from user-generated XML to the driver's internal representation and vice versa. To get actual implementations of the interfaces, the driver uses factories provided by Java itself. This seems to be done by the DefaultPGXmlFactoryFactory class.

It turns out that each time a factory (not the actual XML parser) is constructed via the .newInstance() methods, Java 21 reloads the respective JAR(s) to some extent. The profiling shows that the respective JAR files are loaded, uncompressed, and their signatures verified again and again. This wastes resources and is the source of the low performance we observed.

Performing the same test on Java 8 did not show a similar behaviour.

The reasoning behind creating a new factory for each new parser or similar object was not wrong: the Javadocs of old Java versions (e.g. 5) clearly note that the APIs should not be considered thread-safe. By Java 8 at the latest, however, these notes had been removed, which suggests that the factories may be considered thread-safe from then on. Implementations like DefaultPGXmlFactoryFactory could therefore keep a copy of the respective factories and avoid creating them over and over again.
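To illustrate the caching idea, here is a minimal sketch that uses only JDK classes. The class name CachedXmlFactories and its methods are hypothetical and not part of the driver. The sketch also skips any hardening that the driver's default implementation may apply to its factories, and it synchronizes builder creation as a precaution in case the factory implementation is not thread-safe after all.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: looks up each JAXP factory once and reuses it.
final class CachedXmlFactories {

    // Holder idiom: the factory lookups (and the class loading behind them)
    // run once, when Holder is first initialized.
    private static final class Holder {
        static final DocumentBuilderFactory BUILDER_FACTORY = createBuilderFactory();
        static final TransformerFactory TRANSFORMER_FACTORY = TransformerFactory.newInstance();
    }

    private CachedXmlFactories() {
    }

    private static DocumentBuilderFactory createBuilderFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory;
    }

    // Returns a fresh builder per call; only the factory is shared.
    static DocumentBuilder newDocumentBuilder() {
        // Assumption: the factory may not be thread-safe, so guard its use.
        synchronized (Holder.BUILDER_FACTORY) {
            try {
                return Holder.BUILDER_FACTORY.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException("Cannot create DocumentBuilder", e);
            }
        }
    }

    // Returns the shared factory instance.
    static TransformerFactory transformerFactory() {
        return Holder.TRANSFORMER_FACTORY;
    }
}
```

With this shape, the provider lookup behind newInstance() happens once per JVM instead of once per request, while each caller still gets its own DocumentBuilder.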

Driver Version?

42.7.3

Java Version?

openjdk version "21" 2023-09-19
OpenJDK Runtime Environment (build 21+35-2513)
OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing)

OS Version?

Ubuntu 23.10 (x64)

PostgreSQL Version?

16.0

To Reproduce

Compile the provided example code, run it under Java 21, and measure the time.
Repeat this with Java 8 and compare the times.

Alternatively, run the code with strace. The trace will show a large number of read accesses to the Saxon JAR file; the count scales with the number of iterations in the test program.
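For readers without the attachment, the following is a rough sketch of the kind of measurement involved. It is not the attached test program. The class FactoryLookupTiming and the iteration count are made up for illustration, and it only exercises TransformerFactory rather than the JDBC driver. The difference should only become visible with a third-party implementation such as Saxon on the classpath.

```java
import java.util.function.Supplier;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical timing harness: compares a factory lookup per call with a shared factory.
final class FactoryLookupTiming {

    private FactoryLookupTiming() {
    }

    // Creates one transformer per iteration from the supplied factory
    // and returns the elapsed wall-clock time in nanoseconds.
    static long nanosFor(int iterations, Supplier<TransformerFactory> factories) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                factories.get().newTransformer();
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 10_000; // arbitrary value chosen for illustration
        TransformerFactory shared = TransformerFactory.newInstance();

        long perCall = nanosFor(iterations, TransformerFactory::newInstance);
        long reused = nanosFor(iterations, () -> shared);

        System.out.printf("new factory per call: %d ms%n", perCall / 1_000_000);
        System.out.printf("shared factory:       %d ms%n", reused / 1_000_000);
    }
}
```

Running the same class under Java 8 and Java 21, optionally under strace, gives a comparison that does not need a database.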

Expected behaviour

The third-party XML libraries should be loaded only once. This is of course also an issue for the Java VM developers at Oracle. Perhaps I will file an issue there as well, but in my experience the resolution of such issues takes quite long.

Logs

Logs are omitted due to their size (up to a Gigabyte) and due to non-public content.

This is a summary of the Linux "stat" system calls on JAR files in our software during a load test, showing the topmost candidates:

     6 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/regex-21.1.0.jar"
     7 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/js-21.1.0.jar"
    19 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/woodstox-core-6.2.6.jar"
    20 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/truffle-api-21.1.0.jar"
 75015 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/Saxon-HE-9.8.0-15.jar"
163383 stat("/home/klaus/projects/tango/svn/tango.2/apps/srs/build/install/srs/lib/xercesImpl-2.12.1.jar"

In the provided test case, Java 8 performed 1653 read operations on the Saxon file, while Java 21 performed 40546 read operations.
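A per-JAR summary like the one above can be produced from raw strace output in a few lines. The sketch below is one possible way to do it, not the method actually used for the numbers in this report. The class name JarStatCounter is hypothetical, and the line format (a stat( call followed by a quoted path ending in .jar) is an assumption based on the excerpt.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts stat() calls per JAR file in strace output.
final class JarStatCounter {

    // Assumed line shape: stat("/some/dir/lib/example.jar", ...
    // The word boundary keeps fstat( and lstat( from matching.
    private static final Pattern STAT_JAR =
            Pattern.compile("\\bstat\\(\"([^\"]+\\.jar)\"");

    private JarStatCounter() {
    }

    // Returns a map from JAR path to the number of matching stat() lines.
    static Map<String, Integer> countByJar(List<String> straceLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : straceLines) {
            Matcher matcher = STAT_JAR.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Lines for other system calls, or for files that are not JARs, are ignored.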

Member

sehrope commented Apr 19, 2024

@demonti
That's very interesting. I'm not sure about merging in JDK-specific behavior in the core driver, but luckily in this case you can handle it in the connection options without waiting for a new release.

There's a connection property, xmlFactoryFactory, that lets you specify a factory class for instantiating the XML factory. The default is the built-in class, DefaultPGXmlFactoryFactory, that acts as you describe (creating a new instance for each request).

You can create your own class that implements PGXmlFactoryFactory with the caching behavior across requests. That should solve your jar loading performance issues, as it could be initialized statically and reused.
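As a rough sketch of the wiring, the property can be passed like any other connection property. The class name com.example.CachingXmlFactoryFactory, the user name, and the JDBC URL in the comment are placeholders; only the property name xmlFactoryFactory comes from the driver. The custom class itself is not shown here.

```java
import java.util.Properties;

// Hypothetical helper that builds connection properties selecting a custom XML factory class.
final class XmlFactoryFactoryConfig {

    private XmlFactoryFactoryConfig() {
    }

    static Properties connectionProperties(String user, String factoryClassName) {
        Properties props = new Properties();
        props.setProperty("user", user);
        // Name of a class implementing PGXmlFactoryFactory (placeholder value in main below).
        props.setProperty("xmlFactoryFactory", factoryClassName);
        return props;
    }

    public static void main(String[] args) {
        Properties props =
                connectionProperties("app_user", "com.example.CachingXmlFactoryFactory");
        // With the driver on the classpath, these properties would be passed to
        // DriverManager.getConnection("jdbc:postgresql://localhost/test", props).
        System.out.println(props.getProperty("xmlFactoryFactory"));
    }
}
```

The named class has to be available on the application's classpath when the connection is opened.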


@davecramer I think we (or more accurately me...) forgot to add xmlFactoryFactory to the connection property descriptions on the website. It got added when we fixed that CVE to allow opt-in fallback to the old insecure behavior: 14b62ac

But I guess we never added it to the site itself so one would have to look at the enum or the release notes to know about it. A page that's generated off the driver's actual enum values would be nice as it already has the types and descriptions. That way it's always in sync.

@davecramer
Member

> But I guess we never added it to the site itself so one would have to look at the enum or the release notes to know about it. A page that's generated off the driver's actual enum values would be nice as it already has the types and descriptions. That way it's always in sync.

Yes, it would! :)

Author

demonti commented Apr 23, 2024

For your information: I just created an issue in the Oracle Java Bug Database. As soon as it has been accepted and made public (I don't know their process), I will post a link.

While the workaround is surely a solution, it might still be a good idea to solve the problem directly by reworking the code.

demonti changed the title from "Performance Degradation of XML operations under Java 21 with custom parsers/transformers" to "Performance degradation of XML operations under Java 21 with custom parsers/transformers" on Apr 23, 2024
Author

demonti commented Apr 24, 2024

The bug report has been accepted by Oracle. Of course, they do not give any timeframe or priority for when they will address it. The bug ID at Oracle is 8331025; direct link: https://bugs.java.com/bugdatabase/view_bug?bug_id=8331025.
