Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce "collection" projects for better usage of hierarchical view #2041 #3258

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

rkg-mm
Copy link
Contributor

@rkg-mm rkg-mm commented Dec 2, 2023

Description

This change introduces logic for "collection projects". Those are basically projects used as parent for other projects that shall not hold any own component or vulnerability data, but instead get calculated from child projects using different configurable aggregation logics.

Corresponding Frontend PR with Screenshots: DependencyTrack/frontend#658

Addressed Issue

#2041
#657
#2410

Additional Details

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@cheapshot2000
Copy link

Has there been any updates/progress on the feature in completion of the 4.11 release?

@TWpgo
Copy link

TWpgo commented Apr 22, 2024

Looking forward for this feature 👍

@rkg-mm
Copy link
Contributor Author

rkg-mm commented Apr 22, 2024

I messed up the branch trying to rebase onto master to resolve 1 conflicting file... ending up with >1300 changed files now that already are changed in master. no idea what git is doing... will try to fix it but I'm struggling a bit how to get back to a clean state

@rkg-mm rkg-mm force-pushed the 2041-introduce-collection-projects branch from 0aa204e to ddc5c2b Compare April 23, 2024 00:50
@rkg-mm
Copy link
Contributor Author

rkg-mm commented Apr 23, 2024

Warning: I fixed a lot of conflicts and did not have time to test it. Will try to find time tomorrow to test it. I think I finally also fixed the branch.

@rkg-mm
Copy link
Contributor Author

rkg-mm commented Apr 23, 2024

@nscuro I wonder why the unit test still fails. I don't see a change that could influence the parent of a project not being serialized?

@nscuro
Copy link
Member

nscuro commented Apr 23, 2024

@rkg-mm FWIW there can be subtle changes in how serialization libraries (i.e. Jackson) handle certain fields, when those libraries are updated. Or how the persistence framework fetches fields implicitly.

@rkg-mm
Copy link
Contributor Author

rkg-mm commented Apr 23, 2024

Yea so, the PRs backend & frontend seem to be working well. But I have no clue about this unit test and will need some help here. The test explicitly tests the now not working condition, so I guess its important that it works, but I don't see a change that causes the failure.

 Error:  Tests run: 40, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 74.367 s <<< FAILURE! - in org.dependencytrack.resources.v1.ProjectResourceTest
Error:  patchProjectParentTest(org.dependencytrack.resources.v1.ProjectResourceTest)  Time elapsed: 1.483 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 
JSON documents are different:
Different keys found in node "", missing: "parent", expected: 
<{"active":true,"collectionLogic":"NONE","name":"DEF","parent":{"name":"GHI","uuid":"${json-unit.matches:parentProjectUuid}","version":"3.0"},"properties":[],"tags":[],"uuid":"${json-unit.matches:projectUuid}","version":"2.0"}> 
but was: 
<{"active":true,"collectionLogic":"NONE","name":"DEF","properties":[],"tags":[],"uuid":"ba033fa5-aa20-44ae-85de-6a33e5cf36a3","version":"2.0"}>

	at org.dependencytrack.resources.v1.ProjectResourceTest.patchProjectParentTest(ProjectResourceTest.java:733)

* @author Ralf King
* @since 4.11.0
*/
public enum ProjectCollectionLogic {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some documentation about this. In #2041 you mention a few use cases that these address, having those be part of the docs will definitely help others.

@@ -538,6 +547,7 @@ public Project updateProject(Project transientProject, boolean commitIndex) {
project.setDescription(transientProject.getDescription());
project.setVersion(transientProject.getVersion());
project.setClassifier(transientProject.getClassifier());
project.setCollectionLogic(transientProject.getCollectionLogic());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the collection logic from say AGGREGATE_DIRECT_CHILDREN to HIGHEST_SEMVER_CHILD can drastically change metric values on existing projects. There will be no indication as to if and when the logic was changed, which can make it tricky to explain why values changed so much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, but also it would be an intended change. Idk what could be done about this except add such change to the logfile, but that would only be helpful for deep analysis of such issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair.

Naive idea: We could add a Collection logic change flag to ProjectMetrics. We could use that in the metrics graph, in some similar way to what you can do in Grafana with annotations:

image

Copy link
Contributor Author

@rkg-mm rkg-mm May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea, but at least on a first look the frontend chart lib doesn't support this. not sure how difficult this would be to add on top of it. Any experience with that? I never used that component.
From data perspective in backend it should not be too complicated.

edit: or would be adding it to the charts tooltip be enough? Also it seems there are options to style datapoints differently. Maybe we could give such points a different styling + add the info in the tooltip?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have too strong opinions on how we highlight it, but I do think that highlighting it one way or the other would be good.

Tooltip on it's own might be too hidden, but styling the datapoints sounds good.

Comment on lines +44 to +48
/**
* Find the child with the highest SemVer version and show only that child's data.
* Ignores children not using SemVer versions.
*/
HIGHEST_SEMVER_CHILD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How critical is the capability to choose the test child by version?

I'm asking because metrics updates will be done via stored procedures for DT v5.x, since we found calculating them in the app to be too inefficient. What this means is that certain semantics will be harder to achieve, one such example being versioning (it's possible to write functions for them of course).

My main concern is that SemVer will not be sufficient for everyone, there are so many versioning schemes out there, and it can become a true nightmare trying to support multiple of them. That is already true for Java, it will be even more true for SQL.

Copy link
Contributor Author

@rkg-mm rkg-mm May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its pretty important for versioned software.
Only alternative I could think of:
One needs logic in CI scripts to identify the latest SemVer version in Dtrack after creating a version and must auto-manage tags to always append a "highest-version" tag or similar to the latest version on its own and auto-remove it for all other versions. Then you can use the "aggregate by tag" logic to achieve the same. But this requires quite some logic in CI builds.
Or could we at least offer an API to identify the latest SemVer version of a project? That would not be called as often as Metrics update, hence doing it outside SQL would be ok. Then it would be a step easier to implement above logic.

Or even better: one could enable an optional feature that lets the project management APIs auto-tag the latest SemVer version automatically whenever projects are added/removed/reversioned. (Could also be a DB flag instead of a Tag)
Would this be an option? Then we don't lose any comfort AND the info could even be reused for other potential features, while not slowing down or making more complicated the metrics update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or could we at least offer an API to identify the latest SemVer version of a project?

When dealing with versioned software like this, wouldn't it be a fair assumption that BOMs of older versions would have been uploaded earlier than those of newer versions?

At the very least, the latest version should be somewhere among the most recently uploaded BOMs, no? So you could use this endpoint in CI to pull the last, say, 5 projects with name foo and then do the version comparison as needed:

curl -H 'X-Api-Key: <API_KEY>' \
    'http://localhost:8080/api/v1/project?name=foo&sortName=lastBomImport&sortOrder=desc&offset=0&limit=5'

To be clear, I'm not strictly against supporting SemVer in DT, I'm just trying to avoid us steering into a maintenance mess once people start asking for more versioning schemes.

Copy link
Contributor Author

@rkg-mm rkg-mm May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dealing with versioned software like this, wouldn't it be a fair assumption that BOMs of older versions would have been uploaded earlier than those of newer versions?

Not necessarily.
Lets say you have 4 major versions
1.x.x, 2.x.x, 3.x.x, 4.x.x
and you support e.g. the last 3 of those due to business contracts (>=2.x.x) additionally you have active development on the latest version (4.x.x). In this case it can easily happen your upload order is for a while (first to latest):
4.1.0
4.1.1 (bugfix)
3.4.1 (backport of bugfix)
2.3.1 (backport of bugfix)
2.3.2 (additional fix in older version that doesn't apply to newer version)
...

At the very least, the latest version should be somewhere among the most recently uploaded BOMs, no? So you could use this endpoint in CI to pull the last, say, 5 projects with name foo and then do the version comparison as needed:

Likely, yes. I would also assume for our current use cases 5 would be enough. However, idk if that would be the case for everyone. Special cases (e.g. newly using Dtrack and importing existing SBOMs of a lot of SW versions) also not considered.

Not sure whats the best strategy here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also assume for our current use cases 5 would be enough. However, idk if that would be the case for everyone.

Nothing would prevent users to traverse the project list to the very end in order to determine the "latest" version.

Not sure whats the best strategy here...

My suggestion would be to investigate if and how the use case can be addressed with tools we already have available, including pulling all project versions and doing the "latest" determination on the client-side (i.e. CI scripts).

Based on that, figure out what the pains / limitations are with that approach, and how a server-side implementation could help resolve that. Are the identified limitations show-stoppers or merely annoyances?

If we are really sure we need a server-side implementation, we will need something that covers more than vanilla SemVer: Some organizations will version their software according to Maven, others may use .NET semantics, and so on. We also need to consider that organizations may track software they did not build (i.e. container images from vendors), and thus don't have control over versioning. The vers specification lists a few versioning schemes here: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst#some-of-the-known-versioning-schemes

We don't have to cover all cases from the very beginning, but the solutions should at least be extensible such that more can be added in the future.


More generally, my recommendation would be to decouple the version-awareness from this PR, as to not block this from getting merged.

@nscuro
Copy link
Member

nscuro commented Apr 28, 2024

@rkg-mm But I have no clue about this unit test and will need some help here. The test explicitly tests the now not working condition, so I guess its important that it works, but I don't see a change that causes the failure.

I debugged the area in question multiple times now and still don't quite understand how this is happening. The parent is part of the response when I comment out these lines: https://github.com/DependencyTrack/dependency-track/pull/3258/files#diff-75cd29a0a085d84dff3cb4794242e0d1b400385f1acfae172346c3e6780629b0R582-R590

Something about those lines causes parent to be unloaded. But they're literally just calling a setter, so I don't quite get why. The test succeeds when adding project.getParent(); after the lines mentioned above. This kind of "force loading" is used in other areas of the code base. I don't like it, but in this case I think it's OK to do it.

@rkg-mm
Copy link
Contributor Author

rkg-mm commented May 5, 2024

oh come on.. Idk what IntelliJ is doing but whenever I rebase this branch on your master I end up having all changes from master in my PR again. What am I doing wrong here? I don't get it...

edit: ok fixed it again with some hacky workaround. still don't know what I do wrong here...

rkg-mm added 7 commits May 5, 2024 22:49
…CollectionLogics.

Modifies project metrics updates to consider collection project logics.

Added several events to trigger project metrics updates to update collection projects if child projects change in a relevant way.

Signed-off-by: Ralf King <rkg@mm-software.com>
Add test cases for new metrics update logic for collection projects.

Signed-off-by: Ralf King <rkg@mm-software.com>
… all projects

Signed-off-by: Ralf King <rkg@mm-software.com>
Signed-off-by: Ralf King <rkg@mm-software.com>
… wrongly influence numbers

Signed-off-by: Ralf King <rkg@mm-software.com>
Signed-off-by: Ralf King <rkg@mm-software.com>
Signed-off-by: Ralf King <rkg@mm-software.com>
@rkg-mm rkg-mm force-pushed the 2041-introduce-collection-projects branch from efe2c29 to ecfc394 Compare May 5, 2024 20:53
@nscuro
Copy link
Member

nscuro commented May 5, 2024

@rkg-mm How do you do the rebase? I usually run git pull --rebase upstream master in the command line and then use IntelliJ to resolve any conflicts in case there are any.

I never had this particular problem happen, but can imagine IntelliJ might do some unexpected things when merging/rebasing via UI.

@rkg-mm
Copy link
Contributor Author

rkg-mm commented May 5, 2024

In IntelliJ UI I select the upstream master and select "Rebase (my branch) onto master" then push the changes to my remote. The strange thing happens here already, as suddenly my own remote has changes not recognized by my local branch and needs to either be merged (which I tried before and ended up with double changes on same files) or I select "rebase anyway" which avoids double merging. But in both cases I still end up with all those unwanted changes again. Maybe I should try the console attempt next time...

rkg-mm added 2 commits May 6, 2024 18:09
* Add validation to prevent invalid states of collection projects (prevent collection project having Components or services), including several unit tests covering these scenarios.

Signed-off-by: Ralf King <rkg@mm-software.com>
…ject actually changed.

Signed-off-by: Ralf King <rkg@mm-software.com>
@rkg-mm
Copy link
Contributor Author

rkg-mm commented May 6, 2024

@rkg-mm But I have no clue about this unit test and will need some help here. The test explicitly tests the now not working condition, so I guess its important that it works, but I don't see a change that causes the failure.

I debugged the area in question multiple times now and still don't quite understand how this is happening. The parent is part of the response when I comment out these lines: https://github.com/DependencyTrack/dependency-track/pull/3258/files#diff-75cd29a0a085d84dff3cb4794242e0d1b400385f1acfae172346c3e6780629b0R582-R590

Something about those lines causes parent to be unloaded. But they're literally just calling a setter, so I don't quite get why. The test succeeds when adding project.getParent(); after the lines mentioned above. This kind of "force loading" is used in other areas of the code base. I don't like it, but in this case I think it's OK to do it.

Fixed by your suggestion

Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.10% (target: -1.00%) 86.04% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (ff22d4f) 22069 16756 75.93%
Head commit (0cfdd8c) 22259 (+190) 16923 (+167) 76.03% (+0.10%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3258) 222 191 86.04%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy will stop sending the deprecated coverage status from June 5th, 2024. Learn more

@nscuro
Copy link
Member

nscuro commented May 7, 2024

@rkg-mm We have to schedule this for 4.12 to allow us for sufficient time to review and test.

@nscuro nscuro modified the milestones: 4.11, 4.12 May 7, 2024
@rkg-mm
Copy link
Contributor Author

rkg-mm commented May 7, 2024

@rkg-mm We have to schedule this for 4.12 to allow us for sufficient time to review and test.

no problem, I understand. Lets just hope the release cycle is a bit shorter this time :D

@cheapshot2000
Copy link

@rkg-mm We have to schedule this for 4.12 to allow us for sufficient time to review and test.

no problem, I understand. Lets just hope the release cycle is a bit shorter this time :D

Is there a set release cadence? or other logic for determining when releases are completed?

@nscuro
Copy link
Member

nscuro commented May 7, 2024

Is there a set release cadence? or other logic for determining when releases are completed?

There is not at the moment. We originally planned to do monthly releases, but ended up stretching that (by far...) yet again. A problem is that the core team has certain items they want (and have to work on) in a given release, while said core team also needs to review and test community contributions. It's a luxury problem, but it means that more time is needed to meet both our own, and the community's expectations.

This particular PR was planned for inclusion in 4.11 for a long time, but it also touches on the very core of how DT behaves, in turn requiring more review and testing than other, smaller PRs. And of course contributors also don't always have time to answer our annoying requests. :)

We're still in the process of figuring out how to best plan and size releases. Going forward, we're labelling issues with T-shirt sizes, according to their effort. Aim being to limit the number of high-effort items per release, enabling faster cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants