Implement Stringer interface on structs, add FromString method where relevant #137

ianling · 2022-04-18T18:05:29Z

Allows structs to be reliably and consistently converted to strings (e.g. for tag-value and spreadsheet export/writing)

Allows structs to be filled in by parsing them from strings (e.g. for tag-value and spreadsheet import/parsing)

Note: this PR does not completely refactor the tag-value/RDF parsers/writers to use these methods. They are used by the spreadsheet parser in #138.

Depends on and contains commits from #133 and #134

Signed-off-by: Ian Ling <ian@iancaling.com>

ianling · 2022-04-27T17:04:04Z

@swinslow this one's up next. The methods added here should also probably be used in the tag-value and RDF parsers/writers, for consistency. But that can get taken care of in another PR

swinslow

Hi @ianling, thanks for this and apologies again for my delay in reviewing it.

I have a few comments below, mostly around how this handles strings that might have extra whitespace and whether to leverage some of the tag-value parsing functions that I think already address some of these. Let me know your thoughts. Thanks!

spdx/checksum.go

swinslow · 2022-05-15T13:52:53Z

spdx/document.go

 // ExternalDocumentRef2_1 is a reference to an external SPDX document
 // as defined in section 2.6 for version 2.1 of the spec.
 type ExternalDocumentRef2_1 struct {
 	// DocumentRefID is the ID string defined in the start of the
 	// reference. It should _not_ contain the "DocumentRef-" part
 	// of the mandatory ID string.
-	DocumentRefID string `json:"externalDocumentId"`
+	DocumentRefID DocElementID `json:"externalDocumentId"`


For this one, I believe the reason I had used string rather than DocElementID as the type is that DocElementID is essentially a tuple of an SPDX Document ID and an SPDX Element ID within that Document.

By contrast, External Document Refs only consist of a Document ID. In essence, it is defining a Document ID corresponding to the namespace of the External Document, so that the Document ID can be used throughout the rest of this current Document to make reference to elements in the External Document.

Because of that, I'm not sure we should switch this to DocElementID. It might be confusing to folks since the ElementRefID part of the struct would have no meaning, and shouldn't be rendered here. I think it might be preferable to keep it as just a string. Does that make sense?

The reason I changed it is because we lose all of the special handling for JSON/YAML exports if we keep this as a normal string and will need to re-implement those cases again.

In other words, if we export DocumentRefID string to JSON, we'll just get myDoc, but if we make it DocumentRefID DocElementID and export it to JSON, we automatically get DocumentRef-myDoc. All of the logic is already in place for the library to properly handle situations where only the DocumentRefID field has a value and not ElementRefID or SpecialID, so only DocumentRef-myDoc would be rendered.

I think it is simplest just to make it clear in comments (and future document validation functions) what the correct usage of DocElementID is inside of an external document ref.

We could also provide a constructor for the ExternalDocumentRef structs if we're worried about making one from scratch being confusing. Something like func MakeExternalDocumentRef(documentRef, uri, algorithm, checksumValue string) ExternalDocumentRef2_1 {...}

spdx/document.go

swinslow · 2022-05-15T14:00:21Z

spdx/identifier.go

+	case 2:
+		// "SPDXRef-" prefix was present
+		*e = ElementID(idFields[1])
+	case 1:


I'm not sure I follow this case: if the SPDXRef- prefix is not present, I think we'd want to treat it as an invalid ID rather than ignore that the prefix isn't present? Unless there's a use case here I'm misunderstanding.

I think it might be useful for this library to accept slightly malformed data and sorta "clean it up" if the user chooses to export it again. In this case, if the user parsed a file and an SPDXID field was missing the SPDXRef- prefix, they could export the file again after parsing it and the file would then pass validation (assuming that was the only issue with the file).

If that happens though, we should probably at least issue a warning rather than just charging ahead, or force the user to opt out of validation in order to parse a document that contains problems like this.

I'll give this one some more thought. We frequently have users providing us slightly invalid SPDX documents like this, and we have to walk them through fixing their documents since the parser throws errors for small issues like this, so my preference is just that the library makes a decent guess as to what the user intended, but I also see the value in ensuring that the we uphold the integrity of the spec by at least telling the user they did something wrong.

swinslow · 2022-05-15T14:08:42Z

spdx/identifier.go

 func MakeDocElementID(docRef string, eltRef string) DocElementID {
+	docRef = strings.Replace(docRef, "DocumentRef-", "", 1)
+	eltRef = strings.Replace(eltRef, "SPDXRef-", "", 1)


I'm a bit uncomfortable with having this function be able to take strings which might or might not have the DocumentRef- or SPDXRef- prefixes.

This is an edge case probably, but in an SPDX Document you could have two different SPDX elements with the following SPDX Identifiers:

SPDXRef-File1

SPDXRef-SPDXRef-File1

I wouldn't recommend anyone actually build an SPDX Document this way, but I think it would be syntactically valid.

So if we've got functions that are optionally stripping the first instance of SPDXRef- that it sees, I feel like that will lead to unexpected behavior for edge cases like the one above. I'm having trouble putting my finger on a concrete example, but I guess I'm thinking that it'll be easier to avoid doing the wrong thing if a function like this always requires its inputs to either have the prefix or omit the prefix.

I think your example should just work, due to the , 1) at the end of the strings.Replace(...) call.

strings.Replace("SPDXRef-File1", "SPDXRef-", "", 1) -> "File1" strings.Replace("SPDXRef-SPDXRef-File1", "SPDXRef-", "", 1) -> "SPDXRef-File1" strings.Replace("SPDXRef-DocumentRef-File1", "SPDXRef-", "", 1) -> "DocumentRef-File1" strings.Replace("File1-SPDXRef", "SPDXRef-", "", 1) -> "File1-SPDXRef"

If we can come up with edge cases we're worried about, I can add test cases for them to ensure that this function is handling them the way we'd want.

But as I mentioned in another comment, this is a situation we have run into repeatedly with real SPDX documents from users, where sometimes they just omit the SPDXRef- prefix on their identifiers, and it's not sustainable for us to fix that manually every time it happens. I think this is a case where the user clearly meant to add the prefix but just forgot/missed it and it's safe for us to make an assumption.

spdx/package.go

tvloader/parser2v1/parse_creation_info.go

ianling · 2022-06-07T19:43:59Z

Thank you again for your time @swinslow. I moved some code up from the utils folders so that we are using the same logic in all the parsers.

ianling · 2022-06-07T19:46:17Z

rdfloader/parser2v2/utils_test.go

-
-// following tests are copy pasted from tvloader/parser2v2/util_test.go
-
-func TestCanExtractDocumentAndElementRefsFromID(t *testing.T) {


I made sure to hang on to non-duplicate tests and move them to the proper place after my re-arranging, so we did not lose any coverage.

Signed-off-by: Ian Ling <ian@iancaling.com>

kzantow · 2023-01-18T21:49:09Z

Hi yet again @ianling -- are you interested in updating this PR based on the latest refactoring?

kzantow · 2023-05-22T19:30:44Z

Hiya @ianling -- thank you for this (and the other) contributions. I'm going to close this for now, since it's been a few months open without movement. We will gladly reopen this if you decide to update/continue it!

ianling marked this pull request as draft April 18, 2022 18:05

ianling mentioned this pull request Apr 18, 2022

Spreadsheet / XLSX parser and writer #138

Closed

ianling changed the title ~~WIP: Implement Stringer interface on structs, add FromString method where relevant~~ Implement Stringer interface on structs, add FromString method where relevant Apr 20, 2022

ianling marked this pull request as ready for review April 20, 2022 19:00

This was referenced Apr 21, 2022

Support more checksum algorithms #139

Closed

Overhaul structs, refactor JSON parser and saver #133

Merged

ianling marked this pull request as draft April 26, 2022 17:41

Add String and FromString methods to structs

3a40164

Signed-off-by: Ian Ling <ian@iancaling.com>

ianling marked this pull request as ready for review April 27, 2022 17:03

swinslow requested changes May 15, 2022

View reviewed changes

ianling mentioned this pull request Jun 6, 2022

Make FilesAnalyzed a pointer, assume "true" if absent #147

Closed

ianling commented Jun 7, 2022

View reviewed changes

Use utils functions in struct string parsers

c783614

Signed-off-by: Ian Ling <ian@iancaling.com>

kzantow closed this May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Stringer interface on structs, add FromString method where relevant #137

Implement Stringer interface on structs, add FromString method where relevant #137

ianling commented Apr 18, 2022 •

edited

ianling commented Apr 27, 2022 •

edited

swinslow left a comment

swinslow May 15, 2022

ianling Jun 7, 2022 •

edited

swinslow May 15, 2022

ianling Jun 7, 2022 •

edited

swinslow May 15, 2022

ianling Jun 7, 2022

ianling commented Jun 7, 2022

ianling Jun 7, 2022

kzantow commented Jan 18, 2023

kzantow commented May 22, 2023


		// following tests are copy pasted from tvloader/parser2v2/util_test.go

		func TestCanExtractDocumentAndElementRefsFromID(t *testing.T) {

Implement Stringer interface on structs, add FromString method where relevant #137

Implement Stringer interface on structs, add FromString method where relevant #137

Conversation

ianling commented Apr 18, 2022 • edited

ianling commented Apr 27, 2022 • edited

swinslow left a comment

Choose a reason for hiding this comment

swinslow May 15, 2022

Choose a reason for hiding this comment

ianling Jun 7, 2022 • edited

Choose a reason for hiding this comment

swinslow May 15, 2022

Choose a reason for hiding this comment

ianling Jun 7, 2022 • edited

Choose a reason for hiding this comment

swinslow May 15, 2022

Choose a reason for hiding this comment

ianling Jun 7, 2022

Choose a reason for hiding this comment

ianling commented Jun 7, 2022

ianling Jun 7, 2022

Choose a reason for hiding this comment

kzantow commented Jan 18, 2023

kzantow commented May 22, 2023

ianling commented Apr 18, 2022 •

edited

ianling commented Apr 27, 2022 •

edited

ianling Jun 7, 2022 •

edited

ianling Jun 7, 2022 •

edited