Description
The changes improve performance in two ways: updates are committed in batches of 500, and multiple non-transactional field updates are enclosed in a single transaction.
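The batching idea can be illustrated with a minimal sketch. This is not the code from this PR: `BatchCommitSketch`, `commitInBatches`, and the `commitBatch` callback are hypothetical names, and the callback only stands in for "begin a transaction, apply the updates, commit".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not taken from the PR: groups updates into fixed-size
// batches and hands each batch to a callback that stands in for one
// transaction (begin, apply updates, commit).
final class BatchCommitSketch {

    // Batch size named in the description.
    static final int BATCH_SIZE = 500;

    // Returns the number of commits (callback invocations) performed.
    static <T> int commitInBatches(List<T> updates, int batchSize,
                                   Consumer<List<T>> commitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        for (int start = 0; start < updates.size(); start += batchSize) {
            int end = Math.min(start + batchSize, updates.size());
            // One callback invocation models one transaction for the batch.
            commitBatch.accept(new ArrayList<>(updates.subList(start, end)));
            commits++;
        }
        return commits;
    }
}
```

With a remote database every commit costs at least one network round trip, so under this model 1200 updates need 3 commits rather than 1200.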
Addressed Issue
Link: #2895
Compared to a local in-instance DB, performance with RDS PostgreSQL dropped significantly.
Performance testing with AWS RDS PostgreSQL on db.t3.medium and the API server on t3.xlarge showed that the initial load time for the 2023 JSON was 6 minutes 20 seconds before the changes and 3 minutes 33 seconds after them (about 44% less). For the 2022 JSON the load time went from 11 minutes 37 seconds to 7 minutes 58 seconds (about 31% less).
As a comparison, with a local DB instance the load time for the 2022 data was 3 minutes 46 seconds.
Additional Details
The code changes don't affect the behaviour of the application, only its performance.
To verify this, it is sufficient to run the existing test cases; no new test cases are needed.
Re-running the test cases was successful with TZ=UTC. With the NZST time zone, SnykAnalysisTaskTest.testAnalyzeWithRateLimiting:309 failed.
Output of 'mvn clean verify -P enhance' was:
[INFO] Results:
[INFO]
[WARNING] Tests run: 1173, Failures: 0, Errors: 0, Skipped: 2
Further improvements are possible: latency could be hidden by increasing the number of threads that run read-only queries, and those queries could be distributed to read-only replicas.
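A minimal sketch of the thread-based part of that idea follows. It is an illustration under assumptions, not part of this PR: `ParallelReadSketch`, `readAll`, and the `readOnlyQuery` function are hypothetical, and routing queries to read-only replicas is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not taken from the PR: runs independent read-only
// lookups on a fixed thread pool so their round trips overlap.
final class ParallelReadSketch {

    // Returns the results in the same order as the keys.
    static <K, V> List<V> readAll(List<K> keys, int threads,
                                  Function<K, V> readOnlyQuery) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<V>> pending = new ArrayList<>();
            for (K key : keys) {
                // readOnlyQuery stands in for a query against the database.
                Callable<V> task = () -> readOnlyQuery.apply(key);
                pending.add(pool.submit(task));
            }
            List<V> results = new ArrayList<>();
            for (Future<V> future : pending) {
                results.add(future.get()); // blocks until that query finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("read-only query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

This only helps when the queries are independent of each other and the connection pool can serve the extra threads.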
Checklist