Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include hasFiles and documentDescribes information in relationship comparison #153

Open
meretp opened this issue Feb 14, 2023 · 4 comments
Labels
wontfix This will not be worked on

Comments

@meretp
Copy link

meretp commented Feb 14, 2023

I had a look into the comparison method in the tools-java to find out if we can also use them in the testbed. As this method originates from this repo, I decided to open the issue here. So, apart from the fact that we would need to change one namespace to compare two equal docs, I ran into another issue. This is also an issue within the testbed (#51).
When comparing the doument

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT

with

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace2
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA

the resulting xlsx file lists a difference in the file relationships (which is understandable as the relationship Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT is missing -although it is a duplicate of SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB). But the result also marks document describes with diff although the values in the rows below are the same. The same holds for the file relationships of ./fileA.c.
Why is this marked as diff?
In general I would expect that the comparison is somehow independant from the direction of the relationship although I see that this is a rather complex topic and I can understand that the comparison would only be about the "actual present" relationships.

Another problem, which is related to this, is the following: For json, yaml and xml there are additionally the tags "documentDescribes" and "hasFiles" in packages to represent DESCRIBES, resp. CONTAINS-relationships. The tools-python avoid duplications when serialising and do not write for example SPDXRef-DOCUMENT DESCRIBES SPDXRef-File additionally out, because this information is already mapped in "documentDescribes: [SPDXRef-File]". This leads then with the comparison however again to the fact that two documents are not evaluated as equal.

Do you think it would make sense to add some additional logic to the comparison that checks for this kind of semantic equivalence of the relationships ( so including the information from hasFilesand documentDescribes) and not only for actual existence?

@goneall
Copy link
Member

goneall commented Feb 14, 2023

Thanks @meretp for the detailed explanation of the issues.

In general I would expect that the comparison is somehow independant from the direction of the relationship although I see that this is a rather complex topic and I can understand that the comparison would only be about the "actual present" relationships.

This would involve an assumption of equivalence for different relationship types and directions. I think this is a reasonable assumption, but it would be good to document those in the spec itself.

You're right in that this would be a rather complex fix, but I think it is doable.

Another problem, which is related to this, is the following: For json, yaml and xml there are additionally the tags "documentDescribes" and "hasFiles" in packages to represent DESCRIBES, resp. CONTAINS-relationships. The tools-python avoid duplications when serialising and do not write for example SPDXRef-DOCUMENT DESCRIBES SPDXRef-File additionally out, because this information is already mapped in "documentDescribes: [SPDXRef-File]". This leads then with the comparison however again to the fact that two documents are not evaluated as equal.

The Java deserializer code for JSON, YAML and XML should handle deserializing these properties into a normalized form. So, a document which has a describes property should translate to a DESCRIBES relationship. Perhaps there is an issue with the deserialization?

Do you think it would make sense to add some additional logic to the comparison that checks for this kind of semantic equivalence of the relationships ( so including the information from hasFilesand documentDescribes) and not only for actual existence?

Yes - but it would be a bit of work.

@goneall
Copy link
Member

goneall commented Apr 9, 2023

In looking at the effort involved, I don't think it would make sense to fix this before 3.0.

The 3.0 spec has some structural changes in the relationships that will make this a different solution and perhaps easier.

@meretp - let me know if you agree

@meretp
Copy link
Author

meretp commented Apr 11, 2023

@goneall I agree with you. Considering the effort involved and the rather rare use case, I wouldn't fix it now either.

@goneall goneall added the wontfix This will not be worked on label Apr 12, 2023
@goneall
Copy link
Member

goneall commented Apr 12, 2023

Clarification on the "won't fix" label - won't be fixed before SPDX spec version 3.0. Once the SPDX spec version 3.0 is released, we'll reconsider working on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants