Add /v1/project/batchDelete API method #3407
base: master
Conversation
Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>
Do you have some rough numbers you can share as to how much improvement this achieves, over issuing one request per project? In the end all projects are still deleted individually, so there's no benefit when it comes to database interaction. Also, because deletion happens synchronously, it will be quite common that clients time out, and the request is cancelled before all projects could be deleted.
These are the results from testing with ~162 000 projects imported into a local Postgres DB container:

- Without patch:
- With patch, batch_size 100:
- With patch, batch_size 1000:
So there is an improvement in throughput, thanks to the reduced overhead of handling each request. It could surely be improved further by executing "delete from project where uuid in (list_of_uuids)" (pseudo-SQL), but I first went for a simple approach before investing too much time in it. The upside is that it's easier to inform the caller about which projects they don't have access to (which I otherwise imagine could be a cause of time-consuming investigation), but that could probably be done in the query-deletion case as well, by iterating through all projects and calling

If so desired, I could have a go at updating the PR with the delete-with-query approach (which I'm sure will be even more effective).

For transparency, these are the scripts I used:

For importing:

```bash
#!/usr/bin/env bash
zcat ~/dtrack.data-only.dump.gz | docker exec -i dev-postgres-1 psql -U dtrack
```

For creating a file containing the project UUIDs to delete:

```bash
#!/usr/bin/env bash
docker exec -it dev-postgres-1 psql -U dtrack -c 'copy(select "UUID" from "PROJECT") to stdout;' > uuids.all-projects
dos2unix uuids.all-projects
```

For single project deletion:

```bash
#!/usr/bin/env bash
ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>

cat uuids.all-projects | while read uuid; do
    echo "Deleting $uuid"
    curl -H "X-Api-Key:${API_KEY}" -X DELETE "${ENDPOINT}/project/${uuid}"
done
```

For batch deletion:

```bash
#!/usr/bin/env bash
# Special requirements:
# - jo (apt install jo)
ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>
batch_size=1000

n_lines=$(wc -l uuids.all-projects | cut -d " " -f 1)
i=0
until [[ $i -gt $n_lines ]]; do
    # tail -n +N is 1-indexed, so skip the first $i lines with +$((i + 1))
    slice=($(tail -n "+$((i + 1))" uuids.all-projects | head -n $batch_size))
    json_list=$(jo -a "${slice[@]}")
    echo "Deleting $batch_size projects starting at index $i"
    curl -H "X-Api-Key:${API_KEY}" \
         -H 'Content-Type: application/json' \
         -X POST \
         -d "$json_list" \
         "${ENDPOINT}/project/batchDelete"
    let i+=batch_size
done
```
```java
                LOGGER.warn("No project found by UUID " + uuid);
            }
        }
        if (!inaccessibleProjects.isEmpty()) {
```
When arriving here, we potentially have projects that have been correctly deleted, yet we respond with 403.
Could it not be confusing for users to receive an error status code when the request has been partly successful? Since it is expected behavior that some projects might be inaccessible, shouldn't we rather respond with a 204 and a body containing the list of inaccessible projects instead?
How about a 207 (Multi-Status) instead of a 204? That maps semantically better to a mixed outcome when some (but not all) projects are inaccessible. I'm thinking something like this:

```java
if (!inaccessibleProjects.isEmpty()) {
    // not all are inaccessible (compare sizes, not references)
    if (inaccessibleProjects.size() != uuids.size()) {
        return Response.status(207)
                .entity(inaccessibleProjects)
                .build();
    }
    // all are inaccessible
    return Response.status(Response.Status.FORBIDDEN)
            .entity(inaccessibleProjects)
            .build();
}
```
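On the client side, a caller could then branch on the returned status. A rough shell sketch; the statuses and messages just mirror the proposal in this thread, nothing here is final API behavior:

```shell
# Map the proposed batchDelete status codes to human-readable outcomes.
# With curl, the status can be captured via: -w '%{http_code}' -o body.json
classify_status() {
  case "$1" in
    204) echo "all projects deleted" ;;
    207) echo "partial success; inaccessible projects listed in the body" ;;
    403) echo "forbidden; no requested project is accessible" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

classify_status 207
# → partial success; inaccessible projects listed in the body
```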
Force-pushed from 9bed588 to 0fc3170
Umm... that didn't go as intended. I forgot to sign off the commit 0fc3170 and followed the DCO recommendation, which was ...but that obviously dragged in a lot of other changes. I currently have no idea how to deal with this, except starting over with a fresh fork. But maybe there's a smarter way to handle it?
The commit you mention is signed off. If you need to find a commit hash from before the rebase, check git reflog.
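For reference, a minimal sketch of that recovery in a throwaway repo (the commit messages and the HEAD@{1} selector are illustrative; in practice you pick whichever reflog entry predates the bad rebase):

```shell
# Demo: two commits, then "undo" the newest one via the reflog,
# the same way you would step back to a pre-rebase state.
repo=$(mktemp -d) && cd "$repo" && git init -q
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "good state"
git -c user.email=me@example.com -c user.name=me commit -q --allow-empty -m "bad rebase result"
git reflog --format='%h %gs'   # lists recent HEAD positions, newest first
git reset --hard 'HEAD@{1}'    # jump back to the entry before the bad one
git log -1 --format=%s
```

After the reset, the branch points at "good state" again; the "bad" commit is still reachable by hash if you need it.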
Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>
Force-pushed from 0fc3170 to 2177c1f
Description

This patch adds a /v1/project/batchDelete method that makes project life-cycle management more effective, since more than one project can be deleted with one request.

Addressed Issue

This fixes #3361

Additional Details

Checklist

- [ ] This PR fixes a defect, and I have provided tests to verify that the fix is effective
- [ ] This PR introduces changes to the database model, and I have added corresponding [update logic](https://github.com/DependencyTrack/dependency-track/tree/master/src/main/java/org/dependencytrack/upgrade)