
Add /v1/project/batchDelete API method #3407

Open
wants to merge 2 commits into master

Conversation

mikael-carneholm-2-wcar

Description

This patch adds a /v1/project/batchDelete method that makes project life-cycle management more effective, since more than one project can be deleted in a single request.
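For illustration, here is a minimal client-side sketch of what a call to the new endpoint could look like, assuming the request shape used by the test scripts further down in this thread (a POST of a JSON array of project UUIDs, authenticated via the X-Api-Key header). The base URL, the environment variable, and the UUID values are placeholders, not part of this PR:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BatchDeleteExample {
    public static void main(String[] args) throws Exception {
        // Placeholder UUIDs of the projects to delete in a single request
        String body = "[\"11111111-1111-1111-1111-111111111111\","
                + "\"22222222-2222-2222-2222-222222222222\"]";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/v1/project/batchDelete"))
                .header("X-Api-Key", System.getenv("DT_API_KEY")) // key with project delete permission
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}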

Addressed Issue

This fixes #3361

Additional Details

Checklist

Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>
@mikael-carneholm-2-wcar mikael-carneholm-2-wcar changed the title Add /batchDelete API method Add /v1/project/batchDelete API method Jan 24, 2024
@nscuro
Member

nscuro commented Jan 24, 2024

Do you have some rough numbers you can share as to how much improvement this achieves over issuing one request per project?

In the end, all projects are still deleted individually, so there's no benefit when it comes to database interaction. Also, because deletion happens synchronously, it will be quite common for clients to time out and for the request to be cancelled before all projects have been deleted.

@mikael-carneholm-2-wcar
Author

These are the results from testing with ~162 000 projects imported into a local Postgres DB container:

Without patch:

real	85m42.879s
user	14m29.058s
sys	11m48.642s

With patch, batch_size 100:

real	58m11.109s
user	0m18.538s
sys	0m14.279s

With patch, batch_size 1000:

real	44m1.921s
user	0m2.921s
sys	0m1.229s

So there is an improvement in throughput, thanks to the reduced per-request overhead. It could surely be improved further by executing something like "delete from project where uuid in (list_of_uuids)" (pseudo-SQL), but I went for a simple approach first, before investing too much time in it.
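As a rough illustration of that query-based deletion, here is a minimal sketch assuming direct access to a javax.jdo PersistenceManager; it ignores cleanup of dependent objects (components, metrics, etc.), which the per-project deletion currently handles, so it is not a drop-in replacement:

import java.util.Collection;
import java.util.UUID;

import javax.jdo.PersistenceManager;
import javax.jdo.Query;

import org.dependencytrack.model.Project;

public final class BulkProjectDelete {

    // JDOQL equivalent of "delete from project where uuid in (list_of_uuids)".
    // Dependent rows are not handled here.
    static long deleteProjects(final PersistenceManager pm, final Collection<UUID> uuids) {
        final Query<Project> query = pm.newQuery(Project.class);
        query.setFilter(":uuids.contains(uuid)");
        return query.deletePersistentAll(uuids);
    }
}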

The upside of the current approach is that it's easier to inform the caller about which projects they don't have access to (something I imagine could otherwise cause time-consuming investigation). That said, the same could probably be done in the query-deletion case: iterate over the requested projects, call qm.hasAccess(super.getPrincipal(), project) for each, remove the inaccessible project UUIDs from the list before executing the query, and finally return a Response that tells the client which projects were inaccessible (roughly as sketched below).

If so desired, I could have a go at updating the PR with the delete-with-query approach, which I'm sure would be even more effective.
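A hypothetical outline of how that could fit together in the resource method (the getObjectByUuid and getPersistenceManager calls are assumptions about the QueryManager API; qm.hasAccess and the LOGGER.warn line are taken from this PR, and deleteProjects refers to the bulk-delete sketch above):

final List<UUID> accessible = new ArrayList<>();
final List<UUID> inaccessible = new ArrayList<>();
for (final UUID uuid : uuids) {
    final Project project = qm.getObjectByUuid(Project.class, uuid); // assumed lookup method
    if (project == null) {
        LOGGER.warn("No project found by UUID " + uuid);
    } else if (qm.hasAccess(super.getPrincipal(), project)) {
        accessible.add(uuid);
    } else {
        inaccessible.add(uuid);
    }
}
// One bulk delete for everything the caller is allowed to touch
deleteProjects(qm.getPersistenceManager(), accessible);
if (!inaccessible.isEmpty()) {
    // Tell the caller which UUIDs were skipped; the exact status code is discussed later in this thread
    return Response.status(207).entity(inaccessible).build();
}
return Response.status(Response.Status.NO_CONTENT).build();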

For transparency, these are the scripts I used:

For importing

#!/usr/bin/env bash

zcat ~/dtrack.data-only.dump.gz | docker exec -i dev-postgres-1 psql -U dtrack

For creating a file containing the project UUIDs to delete

#!/usr/bin/env bash

docker exec -it dev-postgres-1 psql -U dtrack -c 'copy(select "UUID" from "PROJECT") to stdout;' > uuids.all-projects
dos2unix uuids.all-projects

For single project deletion

#!/usr/bin/env bash

ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>

cat uuids.all-projects | while read uuid; do 
    echo "Deleting $uuid"
    curl -H "X-Api-Key:${API_KEY}" -X DELETE "${ENDPOINT}/project/${uuid}"
done

For batch deletion

#!/usr/bin/env bash

# Special requirements: 
# - jo (apt install jo)

ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>

batch_size=1000

n_lines=$(wc -l uuids.all-projects | cut -d " " -f 1)

i=1
until [[ $i -gt $n_lines ]]; do
    # tail -n +N is 1-based; starting at 1 avoids processing the boundary line twice
    slice=($(tail -n +$i uuids.all-projects | head -n $batch_size))
    json_list=$(jo -a "${slice[@]}")
    echo "Deleting ${#slice[@]} projects starting at line $i"
    curl -H "X-Api-Key:${API_KEY}" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d "$json_list" \
        "${ENDPOINT}/project/batchDelete"
    let i+=batch_size
done

LOGGER.warn("No project found by UUID " + uuid);
}
}
if (!inaccessibleProjects.isEmpty()) {
Contributor

When arriving here, we potentially have projects that have already been deleted successfully, yet we respond with 403.
Could it not be confusing for users to receive an error status code when the request has been partly successful? Since it is expected behavior that some projects might be inaccessible, shouldn't we rather respond with a 204 and a body containing the list of inaccessible projects instead?


How about a 207 (Multi-Status) instead of a 204? That maps semantically better to a mixed outcome when some (but not all) projects are inaccessible. I'm thinking something like this:

if (!inaccessibleProjects.isEmpty()) {
    // some, but not all, of the requested projects are inaccessible
    if (inaccessibleProjects.size() != uuids.size()) {
        return Response.status(207)
                .entity(inaccessibleProjects)
                .build();
    }
    // all requested projects are inaccessible
    return Response.status(Response.Status.FORBIDDEN)
            .entity(inaccessibleProjects)
            .build();
}

@mikael-carneholm-2-wcar
Author

mikael-carneholm-2-wcar commented Mar 15, 2024

Umm... that didn't go as intended. I forgot to sign off the commit 0fc3170 and followed the DCO recommendation, which was:

git rebase HEAD~3 --signoff
git push --force-with-lease origin master

...but that obviously dragged in a lot of other changes. I currently have no idea how to deal with this, except starting over with a fresh fork. But maybe there's a smarter way to handle it?

@sebD
Copy link
Contributor

sebD commented Mar 15, 2024

The commit you mention is signed off. If you need to find a commit hash from before the rebase, check git reflog.
Then you can git reset --hard to that prior state. After that you could probably run git commit --amend -s to amend the commit and add the signature.

Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +1.63% (target: -1.00%)
Diff coverage: 90.00% (target: 70.00%)

Coverage variation details:
  Common ancestor commit (ade53f8): 19567 coverable lines, 14189 covered, 72.51% coverage
  Head commit (2177c1f): 21566 coverable lines (+1999), 15991 covered (+1802), 74.15% coverage (+1.63%)

Coverage variation is the difference between the coverage of the head commit and the coverage of the common ancestor commit of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:
  Pull request (#3407): 10 coverable lines, 9 covered, 90.00% diff coverage

Diff coverage is the percentage of coverable lines added or modified by the pull request that are covered by tests: <covered lines added or modified> / <coverable lines added or modified> * 100%


