
perf: Handle Trivy scan results asynchronously #3571

Open · wants to merge 3 commits into master
Conversation

@robert-blackman robert-blackman commented Mar 20, 2024

Use CompletableFutures with a FixedThreadPool Executor to asynchronously handle each scan result from Trivy

Description

This change allows the TrivyAnalysisTask to handle scan results asynchronously by borrowing the existing CompletableFutures / CountDownLatch pattern being used in the SnykAnalysisTask. The FixedThreadPool can be configured using TRIVY_THREAD_POOL_SIZE (default: 10).
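The CompletableFuture / CountDownLatch pattern described above can be sketched as follows. This is a simplified illustration, not the actual TrivyAnalysisTask code: the class name, the stand-in scan results, and the hard-coded pool size (mirroring the TRIVY_THREAD_POOL_SIZE default of 10) are all illustrative.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncScanSketch {
    // stand-in for TRIVY_THREAD_POOL_SIZE (default: 10)
    private static final int THREAD_POOL_SIZE = 10;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
        List<String> scanResults = List.of("result-a", "result-b", "result-c");
        CountDownLatch latch = new CountDownLatch(scanResults.size());

        for (String result : scanResults) {
            CompletableFuture
                    .runAsync(() -> System.out.println("handling " + result), executor)
                    .whenComplete((unused, throwable) -> latch.countDown());
        }

        latch.await();       // block until every scan result has been handled
        executor.shutdown();
    }
}
```

The latch ensures the task does not complete until every asynchronous handler has finished, regardless of the order in which the pool executes them.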

Addressed Issue

fixes #3570

Additional Details

  • SCANNER_TRIVY_IGNORE_UNFIXED is no longer checked for each vulnerability, as this is expensive and seemingly unnecessary. Instead, it is checked once per analysis task.

Checklist

  • I have read and understand the contributing guidelines
  • [ ] This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • [ ] This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • [ ] This PR introduces changes to the database model, and I have added corresponding update logic
  • [ ] This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

Commits

  • Use CompletableFutures with a FixedThreadPool Executor to asynchronously handle each scan result from Trivy
    Signed-off-by: robert-blackman <robert.blackman@monster.com>
  • Difference in classloaders seems to cause this when the common pool (used by parallelStream) tries to load a class for the first time
    Signed-off-by: robert-blackman <robert.blackman@monster.com>
Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.01% (target: -1.00%) · Diff coverage: 87.10% (target: 70.00%)

Coverage variation details

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor (3b52284) | 21573 | 15971 | 74.03% |
| Head (28d1d42) | 21597 (+24) | 15991 (+20) | 74.04% (+0.01%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#3571) | 31 | 27 | 87.10% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


nscuro commented Mar 22, 2024

Thanks for the PR @robert-blackman! Very much appreciate folks looking into unreleased code, it's the best opportunity to tweak and fix things!

How is the performance holding up when uploading many BOMs? Processing records in a separate thread pool can potentially have the opposite effect. For example, with a worker thread pool of 20, DT could be executing the VulnerabilityAnalysisTask up to 20 times concurrently, and the Trivy thread pool then acts as a bottleneck, as all 20 worker threads shove tasks into it.

robert-blackman (Author) commented
Good point @nscuro! Maybe a better (if perhaps naive 😂) approach would be to initialise a new executor with each TrivyAnalysisTask and terminate it after each analysis? This simple approach wouldn't allow finer control over the total number of concurrent executions (not to mention the memory overhead), but it would mostly prevent tasks from blocking one another. Any thoughts?
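The per-task lifecycle proposed here would look roughly like the following sketch. The class name and pool size are illustrative; the point is only that the pool is created and torn down with each analysis rather than shared across tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerTaskExecutorSketch {
    public static void main(String[] args) throws InterruptedException {
        // hypothetical pool created for a single analysis task
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            executor.submit(() -> System.out.println("scan result handled"));
        } finally {
            // terminate the pool once this analysis completes
            executor.shutdown();
            executor.awaitTermination(30, TimeUnit.SECONDS);
        }
    }
}
```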

nscuro commented Mar 25, 2024

Maybe a better approach (perhaps a naive one 😂) would be to initialise a new executor with each TrivyAnalysisTask, terminating it after each analysis?

In that case we may end up creating more contention as threads compete for database connections. One TrivyAnalysisTask alone could hog all database connections in the connection pool, depending on how many threads we're allocating per task. You could test this by building a container image with your changes locally, and using it in the Docker Compose dev setup with monitoring. The Grafana dashboard includes metrics for the database connection pools.

I believe throwing concurrency at the problem might not solve it without creating new issues. Instead, we may want to look at making the code more efficient. One example would be here:

final Component componentPersisted = qm.getObjectByUuid(Component.class, component.getUuid());

Where we fetch a Component from the database, for each reported vulnerability. If multiple vulnerabilities are reported for the same component, we're doing redundant queries. By grouping the vulnerabilities by affected component, we should be able to minimize those queries.
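The grouping idea could be sketched like this. The ReportedVuln record and the data are stand-ins for the real model classes; the comment marks where the single per-component database fetch would go.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByComponentSketch {
    // minimal stand-in for the real reported-vulnerability model; names are illustrative
    record ReportedVuln(String componentUuid, String vulnId) {}

    public static void main(String[] args) {
        List<ReportedVuln> reported = List.of(
                new ReportedVuln("uuid-1", "CVE-2024-0001"),
                new ReportedVuln("uuid-1", "CVE-2024-0002"),
                new ReportedVuln("uuid-2", "CVE-2024-0003"));

        // group by affected component so each component is fetched only once
        Map<String, List<ReportedVuln>> byComponent = reported.stream()
                .collect(Collectors.groupingBy(ReportedVuln::componentUuid));

        byComponent.forEach((uuid, vulns) ->
                // a single qm.getObjectByUuid(Component.class, uuid) call would go here
                System.out.println(uuid + " -> " + vulns.size() + " vulnerabilities"));
    }
}
```

With this shape, two vulnerabilities on the same component trigger one fetch instead of two.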

Similarly, super.isEnabled is executed once for each vulnerability: (Edit: You already found and fixed that)

if (!super.isEnabled(ConfigPropertyConstants.SCANNER_TRIVY_IGNORE_UNFIXED) || vulnerability.getStatus() == 3) {

The method executes a DB query behind the scenes, which again can be expensive. Really it only needs to be executed once when the task is started.
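Hoisting that lookup out of the loop could look like the following sketch. The fetch method is a hypothetical stand-in for the DB-backed super.isEnabled(SCANNER_TRIVY_IGNORE_UNFIXED) call, and the status values are made up for the demo.

```java
public class ConfigOnceSketch {
    // pretend expensive DB-backed lookup; stand-in for super.isEnabled(...)
    private static boolean fetchIgnoreUnfixedFromDb() {
        System.out.println("executing config query");
        return true;
    }

    public static void main(String[] args) {
        // query the flag once when the analysis task starts...
        final boolean ignoreUnfixed = fetchIgnoreUnfixedFromDb();

        // ...then reuse the cached value for every reported vulnerability
        for (int status : new int[] {1, 2, 3}) {
            if (!ignoreUnfixed || status == 3) {
                System.out.println("processing vulnerability with status " + status);
            }
        }
    }
}
```

The config query now runs once per task instead of once per vulnerability.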

There's also logic here that checks whether a vulnerability exists, and creates it if necessary:

Vulnerability vulnerability = qm.getVulnerabilityByVulnId(parsedVulnerability.getSource(), parsedVulnerability.getVulnId());
if (vulnerability == null) {
    LOGGER.debug("Creating unavailable vulnerability:" + parsedVulnerability.getSource() + " - " + parsedVulnerability.getVulnId());
    vulnerability = qm.createVulnerability(parsedVulnerability, false);
    addVulnerabilityToCache(componentPersisted, vulnerability);
}

This is executed for every Component<->Vulnerability pair. So if the same vulnerability is reported for multiple components, we're doing redundant work.
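A per-task cache would avoid that redundant work. The sketch below uses a plain map keyed by vulnerability ID; the class, data, and String stand-in for the persisted Vulnerability object are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class VulnSyncCacheSketch {
    public static void main(String[] args) {
        // hypothetical per-task cache keyed by vulnerability ID
        Map<String, String> syncedVulns = new HashMap<>();

        String[][] componentVulnPairs = {
                {"component-a", "GHSA-xxxx-1"},
                {"component-b", "GHSA-xxxx-1"},   // same vulnerability, second component
                {"component-c", "GHSA-xxxx-2"}};

        for (String[] pair : componentVulnPairs) {
            // computeIfAbsent runs the lookup-or-create branch only once per vulnerability
            String vuln = syncedVulns.computeIfAbsent(pair[1], vulnId -> {
                System.out.println("synchronizing " + vulnId);
                return vulnId; // real code would return the persisted Vulnerability
            });
            System.out.println(pair[0] + " is affected by " + vuln);
        }
    }
}
```

Here the lookup-or-create branch runs twice (once per distinct vulnerability) rather than three times (once per Component<->Vulnerability pair).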

So, to summarize, there are multiple opportunities here to structure the code in such a way that:

  • Each component is only fetched once
  • Each vulnerability is only synchronized once
  • Configuration (isEnabled) is only fetched once

robert-blackman (Author) commented
Thanks for the feedback @nscuro 🙂 My original approach revolved around making the smallest change possible in order to meet the release window, but the code in general does seem like it could do with some optimisation. I'll have a go at implementing some form of local cache to handle the proposed optimisations. Did you have an idea of how that would best be implemented? A simple map would do the trick, but I'm not sure whether something more robust is warranted, for example caching vulnerabilities across scans 🤔

I do think (from our usage at least) that concurrency might still be necessary: for example, we are seeing some scans drop from 10-12 minutes down to 1.5-2.5 minutes. There are likely thread pools out there better suited to this use case, where fairness can be taken into account. Did you have any thoughts on how concurrency might best be achieved here?

New plan:

  1. do less things
  2. (only then) do things faster 😉

Development

Successfully merging this pull request may close these issues.

Upcoming Trivy integration is awesome, but a bit slow
2 participants