Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Stringer interface on structs, add FromString method where relevant #137

Closed
wants to merge 2 commits into from
Closed

Conversation

ianling
Copy link
Collaborator

@ianling ianling commented Apr 18, 2022

Allows structs to be reliably and consistently converted to strings (e.g. for tag-value and spreadsheet export/writing)

Allows structs to be filled in by parsing them from strings (e.g. for tag-value and spreadsheet import/parsing)

Note: this PR does not completely refactor the tag-value/RDF parsers/writers to use these methods. They are used by the spreadsheet parser in #138.

Depends on and contains commits from #133 and #134

@ianling ianling marked this pull request as draft April 18, 2022 18:05
@ianling ianling changed the title WIP: Implement Stringer interface on structs, add FromString method where relevant Implement Stringer interface on structs, add FromString method where relevant Apr 20, 2022
@ianling ianling marked this pull request as ready for review April 20, 2022 19:00
@ianling ianling marked this pull request as draft April 26, 2022 17:41
Signed-off-by: Ian Ling <ian@iancaling.com>
@ianling ianling marked this pull request as ready for review April 27, 2022 17:03
@ianling
Copy link
Collaborator Author

ianling commented Apr 27, 2022

@swinslow this one's up next. The methods added here should also probably be used in the tag-value and RDF parsers/writers, for consistency. But that can get taken care of in another PR

Copy link
Member

@swinslow swinslow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ianling, thanks for this and apologies again for my delay in reviewing it.

I have a few comments below, mostly around how this handles strings that might have extra whitespace and whether to leverage some of the tag-value parsing functions that I think already address some of these. Let me know your thoughts. Thanks!

spdx/checksum.go Show resolved Hide resolved
// ExternalDocumentRef2_1 is a reference to an external SPDX document
// as defined in section 2.6 for version 2.1 of the spec.
type ExternalDocumentRef2_1 struct {
// DocumentRefID is the ID string defined in the start of the
// reference. It should _not_ contain the "DocumentRef-" part
// of the mandatory ID string.
DocumentRefID string `json:"externalDocumentId"`
DocumentRefID DocElementID `json:"externalDocumentId"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, I believe the reason I had used string rather than DocElementID as the type is that DocElementID is essentially a tuple of an SPDX Document ID and an SPDX Element ID within that Document.

By contrast, External Document Refs only consist of a Document ID. In essence, it is defining a Document ID corresponding to the namespace of the External Document, so that the Document ID can be used throughout the rest of this current Document to make reference to elements in the External Document.

Because of that, I'm not sure we should switch this to DocElementID. It might be confusing to folks since the ElementRefID part of the struct would have no meaning, and shouldn't be rendered here. I think it might be preferable to keep it as just a string. Does that make sense?

Copy link
Collaborator Author

@ianling ianling Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I changed it is because we lose all of the special handling for JSON/YAML exports if we keep this as a normal string and will need to re-implement those cases again.

In other words, if we export DocumentRefID string to JSON, we'll just get myDoc, but if we make it DocumentRefID DocElementID and export it to JSON, we automatically get DocumentRef-myDoc. All of the logic is already in place for the library to properly handle situations where only the DocumentRefID field has a value and not ElementRefID or SpecialID, so only DocumentRef-myDoc would be rendered.

I think it is simplest just to make it clear in comments (and future document validation functions) what the correct usage of DocElementID is inside of an external document ref.

We could also provide a constructor for the ExternalDocumentRef structs if we're worried about making one from scratch being confusing. Something like func MakeExternalDocumentRef(documentRef, uri, algorithm, checksumValue string) ExternalDocumentRef2_1 {...}

spdx/document.go Outdated Show resolved Hide resolved
case 2:
// "SPDXRef-" prefix was present
*e = ElementID(idFields[1])
case 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow this case: if the SPDXRef- prefix is not present, I think we'd want to treat it as an invalid ID rather than ignore that the prefix isn't present? Unless there's a use case here I'm misunderstanding.

Copy link
Collaborator Author

@ianling ianling Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be useful for this library to accept slightly malformed data and sorta "clean it up" if the user chooses to export it again. In this case, if the user parsed a file and an SPDXID field was missing the SPDXRef- prefix, they could export the file again after parsing it and the file would then pass validation (assuming that was the only issue with the file).

If that happens though, we should probably at least issue a warning rather than just charging ahead, or force the user to opt out of validation in order to parse a document that contains problems like this.

I'll give this one some more thought. We frequently have users providing us slightly invalid SPDX documents like this, and we have to walk them through fixing their documents since the parser throws errors for small issues like this, so my preference is just that the library makes a decent guess as to what the user intended, but I also see the value in ensuring that the we uphold the integrity of the spec by at least telling the user they did something wrong.

func MakeDocElementID(docRef string, eltRef string) DocElementID {
docRef = strings.Replace(docRef, "DocumentRef-", "", 1)
eltRef = strings.Replace(eltRef, "SPDXRef-", "", 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit uncomfortable with having this function be able to take strings which might or might not have the DocumentRef- or SPDXRef- prefixes.

This is an edge case probably, but in an SPDX Document you could have two different SPDX elements with the following SPDX Identifiers:

  • SPDXRef-File1
  • SPDXRef-SPDXRef-File1

I wouldn't recommend anyone actually build an SPDX Document this way, but I think it would be syntactically valid.

So if we've got functions that are optionally stripping the first instance of SPDXRef- that it sees, I feel like that will lead to unexpected behavior for edge cases like the one above. I'm having trouble putting my finger on a concrete example, but I guess I'm thinking that it'll be easier to avoid doing the wrong thing if a function like this always requires its inputs to either have the prefix or omit the prefix.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your example should just work, due to the , 1) at the end of the strings.Replace(...) call.

strings.Replace("SPDXRef-File1", "SPDXRef-", "", 1) -> "File1"
strings.Replace("SPDXRef-SPDXRef-File1", "SPDXRef-", "", 1) -> "SPDXRef-File1"
strings.Replace("SPDXRef-DocumentRef-File1", "SPDXRef-", "", 1) -> "DocumentRef-File1"
strings.Replace("File1-SPDXRef", "SPDXRef-", "", 1) -> "File1-SPDXRef"

If we can come up with edge cases we're worried about, I can add test cases for them to ensure that this function is handling them the way we'd want.

But as I mentioned in another comment, this is a situation we have run into repeatedly with real SPDX documents from users, where sometimes they just omit the SPDXRef- prefix on their identifiers, and it's not sustainable for us to fix that manually every time it happens. I think this is a case where the user clearly meant to add the prefix but just forgot/missed it and it's safe for us to make an assumption.

spdx/package.go Outdated Show resolved Hide resolved
tvloader/parser2v1/parse_creation_info.go Outdated Show resolved Hide resolved
@ianling
Copy link
Collaborator Author

ianling commented Jun 7, 2022

Thank you again for your time @swinslow. I moved some code up from the utils folders so that we are using the same logic in all the parsers.


// following tests are copy pasted from tvloader/parser2v2/util_test.go

func TestCanExtractDocumentAndElementRefsFromID(t *testing.T) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made sure to hang on to non-duplicate tests and move them to the proper place after my re-arranging, so we did not lose any coverage.

Signed-off-by: Ian Ling <ian@iancaling.com>
@kzantow
Copy link
Collaborator

kzantow commented Jan 18, 2023

Hi yet again @ianling -- are you interested in updating this PR based on the latest refactoring?

@kzantow
Copy link
Collaborator

kzantow commented May 22, 2023

Hiya @ianling -- thank you for this (and the other) contributions. I'm going to close this for now, since it's been a few months open without movement. We will gladly reopen this if you decide to update/continue it!

@kzantow kzantow closed this May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants