
Add /v1/project/batchDelete API method #3407

Open
wants to merge 2 commits into master

Conversation

mikael-carneholm-2-wcar

Description

This patch adds a /v1/project/batchDelete method that makes project life-cycle management more effective, since more than one project can be deleted in a single request.
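For illustration, here is a minimal client-side sketch of what a call to the new endpoint could look like, assuming the request shape used by the test scripts further down in this thread (a POST of a JSON array of project UUIDs, authenticated via the X-Api-Key header). The base URL, the environment variable, and the UUID values are placeholders, not part of this PR:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BatchDeleteExample {
    public static void main(String[] args) throws Exception {
        // Placeholder UUIDs of the projects to delete in a single request
        String body = "[\"11111111-1111-1111-1111-111111111111\","
                + "\"22222222-2222-2222-2222-222222222222\"]";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/v1/project/batchDelete"))
                .header("X-Api-Key", System.getenv("DT_API_KEY")) // key with project delete permission
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}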

Addressed Issue

This fixes #3361

Additional Details

Checklist

Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>
@mikael-carneholm-2-wcar mikael-carneholm-2-wcar changed the title Add /batchDelete API method Add /v1/project/batchDelete API method Jan 24, 2024
@nscuro
Member

nscuro commented Jan 24, 2024

Do you have some rough numbers you can share as to how much improvement this achieves over issuing one request per project?

In the end, all projects are still deleted individually, so there's no benefit when it comes to database interaction. Also, because deletion happens synchronously, it will be quite common for clients to time out and for the request to be cancelled before all projects have been deleted.

@mikael-carneholm-2-wcar
Author

These are the results from testing with ~162 000 projects imported into a local Postgres DB container:

Without patch:

real	85m42.879s
user	14m29.058s
sys	11m48.642s

With patch, batch_size 100:

real	58m11.109s
user	0m18.538s
sys	0m14.279s

With patch, batch_size 1000:

real	44m1.921s
user	0m2.921s
sys	0m1.229s

So there is an improvement in throughput, thanks to the reduced per-request overhead. It could surely be improved further by executing something like "delete from project where uuid in (list_of_uuids)" (pseudo-SQL), but I went for a simple approach first, before investing too much time in it.
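As a rough illustration of that query-based deletion, here is a minimal sketch assuming direct access to a javax.jdo PersistenceManager; it ignores cleanup of dependent objects (components, metrics, etc.), which the per-project deletion currently handles, so it is not a drop-in replacement:

import java.util.Collection;
import java.util.UUID;

import javax.jdo.PersistenceManager;
import javax.jdo.Query;

import org.dependencytrack.model.Project;

public final class BulkProjectDelete {

    // JDOQL equivalent of "delete from project where uuid in (list_of_uuids)".
    // Dependent rows are not handled here.
    static long deleteProjects(final PersistenceManager pm, final Collection<UUID> uuids) {
        final Query<Project> query = pm.newQuery(Project.class);
        query.setFilter(":uuids.contains(uuid)");
        return query.deletePersistentAll(uuids);
    }
}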

The upside of the current approach is that it's easier to inform the caller about which projects they don't have access to (something I imagine could otherwise cause time-consuming investigation). That said, the same could probably be done in the query-deletion case: iterate over the requested projects, call qm.hasAccess(super.getPrincipal(), project) for each, remove the inaccessible project UUIDs from the list before executing the query, and finally return a Response that tells the client which projects were inaccessible (roughly as sketched below).

If so desired, I could have a go at updating the PR with the delete-with-query approach, which I'm sure would be even more effective.
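A hypothetical outline of how that could fit together in the resource method (the getObjectByUuid and getPersistenceManager calls are assumptions about the QueryManager API; qm.hasAccess and the LOGGER.warn line are taken from this PR, and deleteProjects refers to the bulk-delete sketch above):

final List<UUID> accessible = new ArrayList<>();
final List<UUID> inaccessible = new ArrayList<>();
for (final UUID uuid : uuids) {
    final Project project = qm.getObjectByUuid(Project.class, uuid); // assumed lookup method
    if (project == null) {
        LOGGER.warn("No project found by UUID " + uuid);
    } else if (qm.hasAccess(super.getPrincipal(), project)) {
        accessible.add(uuid);
    } else {
        inaccessible.add(uuid);
    }
}
// One bulk delete for everything the caller is allowed to touch
deleteProjects(qm.getPersistenceManager(), accessible);
if (!inaccessible.isEmpty()) {
    // Tell the caller which UUIDs were skipped; the exact status code is discussed later in this thread
    return Response.status(207).entity(inaccessible).build();
}
return Response.status(Response.Status.NO_CONTENT).build();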

For transparency, these are the scripts I used:

For importing

#!/usr/bin/env bash

zcat ~/dtrack.data-only.dump.gz | docker exec -i dev-postgres-1 psql -U dtrack

For creating a file containing the project UUIDs to delete

#!/usr/bin/env bash

docker exec -it dev-postgres-1 psql -U dtrack -c 'copy(select "UUID" from "PROJECT") to stdout;' > uuids.all-projects
dos2unix uuids.all-projects

For single project deletion

#!/usr/bin/env bash

ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>

cat uuids.all-projects | while read uuid; do 
    echo "Deleting $uuid"
    curl -H "X-Api-Key:${API_KEY}" -X DELETE "${ENDPOINT}/project/${uuid}"
done

For batch deletion

#!/usr/bin/env bash

# Special requirements: 
# - jo (apt install jo)

ENDPOINT="http://localhost:8080/api/v1"
API_KEY=<my API key>

batch_size=1000

n_lines=$(wc -l uuids.all-projects | cut -d " " -f 1)

i=1
until [[ $i -gt $n_lines ]]; do
    # tail -n +N is 1-based; starting at 1 avoids processing the boundary line twice
    slice=($(tail -n +$i uuids.all-projects | head -n $batch_size))
    json_list=$(jo -a "${slice[@]}")
    echo "Deleting ${#slice[@]} projects starting at line $i"
    curl -H "X-Api-Key:${API_KEY}" \
        -H 'Content-Type: application/json' \
        -X POST \
        -d "$json_list" \
        "${ENDPOINT}/project/batchDelete"
    let i+=batch_size
done

LOGGER.warn("No project found by UUID " + uuid);
}
}
if (!inaccessibleProjects.isEmpty()) {
Contributor

When arriving here, we potentially have projects that have already been deleted successfully, yet we respond with 403.
Could it not be confusing for users to receive an error status code when the request has been partly successful? Since it is expected behavior that some projects might be inaccessible, shouldn't we rather respond with a 204 and a body containing the list of inaccessible projects instead?


How about a 207 (Multi-Status) instead of a 204? That maps semantically better to a mixed outcome when some (but not all) projects are inaccessible. I'm thinking something like this:

if (!inaccessibleProjects.isEmpty()) {
    // some, but not all, of the requested projects are inaccessible
    if (inaccessibleProjects.size() != uuids.size()) {
        return Response.status(207)
                .entity(inaccessibleProjects)
                .build();
    }
    // all requested projects are inaccessible
    return Response.status(Response.Status.FORBIDDEN)
            .entity(inaccessibleProjects)
            .build();
}

@mikael-carneholm-2-wcar
Author

mikael-carneholm-2-wcar commented Mar 15, 2024

Umm... that didn't go as intended. I forgot to sign off the commit 0fc3170 and followed the DCO recommendation, which was:

git rebase HEAD~3 --signoff
git push --force-with-lease origin master

...but that obviously dragged in a lot of other changes. I currently have no idea how to deal with this, except starting over with a fresh fork. But maybe there's a smarter way to handle it?

@sebD
Copy link
Contributor

sebD commented Mar 15, 2024

The commit you mention is signed off. If you need to find a commit hash from before the rebase, check git reflog.
Then you can git reset --hard to that prior state. After that you could probably run git commit --amend -s to amend the commit and add the signature.

Signed-off-by: Mikael Carneholm <mikael.carneholm.2@wirelesscar.com>

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +1.63% (target: -1.00%)
Diff coverage: 90.00% (target: 70.00%)

Coverage variation details:
  Common ancestor commit (ade53f8): 19567 coverable lines, 14189 covered, 72.51% coverage
  Head commit (2177c1f): 21566 coverable lines (+1999), 15991 covered (+1802), 74.15% coverage (+1.63%)

Coverage variation is the difference between the coverage of the head commit and the coverage of the common ancestor commit of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:
  Pull request (#3407): 10 coverable lines, 9 covered, 90.00% diff coverage

Diff coverage is the percentage of coverable lines added or modified by the pull request that are covered by tests: <covered lines added or modified> / <coverable lines added or modified> * 100%


