
Create SpdxId type for spdxId property #407

Open - wants to merge 2 commits into base: main


davaya
Contributor

@davaya davaya commented Jul 5, 2023

Using a semantic type for Element identifiers enables software to distinguish between IRIs used as Element IDs and IRIs used for other purposes such as Package download location, package URL, and home page.

This allows NamespaceMap to be applied to serialized Element IDs, such as the Relationship "from" and "to" properties, without affecting other IRI properties.
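As a rough illustration of this point, the sketch below assumes hypothetical SpdxId and AnyUri marker types (plain str subclasses) and a hypothetical one-entry namespace map. Serialization compacts a value only when its declared type is SpdxId, so the Relationship references shrink while a download location in the same namespace is written out in full. None of these names, prefixes or IRIs come from the SPDX model.

```python
from dataclasses import dataclass, fields


class AnyUri(str):
    """Hypothetical marker type: any IRI (no syntax check, like xsd:anyURI)."""


class SpdxId(AnyUri):
    """Hypothetical marker type: an IRI that identifies an SPDX Element."""


# Hypothetical namespace map: prefix -> namespace IRI.
NAMESPACE_MAP = {"ex": "https://example.org/spdx/"}


def compact(iri: str) -> str:
    """Shorten an IRI with the first matching namespace prefix, if any."""
    for prefix, namespace in NAMESPACE_MAP.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


@dataclass
class Relationship:
    from_: SpdxId  # "from" is a Python keyword, hence the trailing underscore
    to: SpdxId


@dataclass
class PackageLocation:  # hypothetical cut-down Package
    downloadLocation: AnyUri


def serialize(obj) -> dict:
    """Serialize dataclass fields, compacting only those declared as SpdxId."""
    return {
        f.name: compact(getattr(obj, f.name)) if f.type is SpdxId else str(getattr(obj, f.name))
        for f in fields(obj)
    }
```

A Relationship between .../spdx/pkg1 and .../spdx/file7 serializes to ex:pkg1 and ex:file7, while a downloadLocation of .../spdx/pkg1.tar.gz is left untouched even though it shares the namespace.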

@goneall
Member

goneall commented Jul 5, 2023

Although this would be convenient for some serializations, it would add complexity to others.

In RDF, you can always check the type of the subject to find out if it is an Element.

Also note that the rules for "subclassing" data types are a bit different - see https://www.w3.org/TR/xmlschema11-2/#dt-derived - I'm not sure if our current schema tooling supports deriving one datatype from another.

@davaya
Contributor Author

davaya commented Jul 6, 2023

This is the same issue as #377 (comment). We already have SemVer, DateTime, and MediaType datatypes in the model, and contrary to the assertion that it can't be done, they work fine, as the xsd example shows.

The issue is just a matter of wording - "subclassing" gives the impression that datatypes are subclassed from something. They aren't. They are defined as root types themselves, not subclassed from anything:

CreationInfo, DictionaryEntry, ExternalIdentifier and other datatypes aren't a "SubclassOf" anything.

ExternalMap, ExternalReference, PositiveIntegerRange and others are a "SubclassOf" "none", which means the same thing.

SemVer, DateTime and MediaType say they are "SubclassOf" xsd:string, but they aren't. They are root datatypes like other datatypes, xsd calls them "facets" that have a "base" of xsd:string (as demonstrated in the XSD in #377) but the base is not a subclass relationship.

The model uses "Vocabularies" files that define xsd:string types that can take on a restricted set of values. But the model doesn't claim that Vocabularies are "SubclassOf" xsd:string. Can Vocabularies be implemented in RDF? If so, then other string-based datatypes can do it the same way. Get rid of the "SubclassOf" metadata item from all datatypes, or replace it with "Base" for simple datatypes, along with pattern or value restrictions below like Vocabularies, and the mistaken impression of subclassing goes away.
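A minimal sketch of what "Base" plus restrictions could look like, assuming a hypothetical SimpleDatatype record: DateTime and a Vocabulary-style type are both described by a base of xsd:string plus one facet, with no SubclassOf anywhere. The vocabulary name and its values are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimpleDatatype:
    """A named simple datatype: a base plus optional facets, with no SubclassOf."""

    name: str
    base: str                             # e.g. "xsd:string"
    pattern: Optional[str] = None         # regex that must match the whole value
    enumeration: frozenset = frozenset()  # allowed values (Vocabulary style)

    def accepts(self, value) -> bool:
        # This sketch only handles string bases, so non-str values are rejected.
        if not isinstance(value, str):
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        if self.enumeration and value not in self.enumeration:
            return False
        return True


DATE_TIME = SimpleDatatype(
    "DateTime", "xsd:string", pattern=r"\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ"
)
# Hypothetical vocabulary; the values are illustrative only.
EXAMPLE_VOCAB = SimpleDatatype(
    "ExampleVocabulary", "xsd:string", enumeration=frozenset({"red", "green", "blue"})
)
```

Both types share the same base and differ only in their restriction, which is the relationship the "Base" metadata item would record.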

(Actually Python does allow subclassing built-in types like str and int to do almost anything. But what we are modeling is much simpler, the type is exactly a native string, but with a setter method that only allows certain values. And unlike SemVer that does have a pattern that can be checked, xsd:anyURI doesn't even do any checking. The URI label is purely a semantic marker for a string that can have any value; it's up to the application to ensure that the strings it labels as URIs are usable as URIs.)
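The setter approach described in this aside can be sketched with a Python descriptor. RestrictedString is a hypothetical helper, not code from the model tooling: the attribute always holds an ordinary str, the setter rejects values outside an allowed set, and an anyURI-style attribute with no allowed set checks nothing beyond "is a string".

```python
class RestrictedString:
    """Hypothetical descriptor: stores a plain str; the setter enforces allowed values."""

    def __init__(self, allowed=None):
        self.allowed = None if allowed is None else frozenset(allowed)

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name, None)

    def __set__(self, obj, value):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        if self.allowed is not None and value not in self.allowed:
            raise ValueError(f"{value!r} is not an allowed value")
        setattr(obj, self.private_name, str(value))  # always store a native str


class Example:  # hypothetical class; property names and values are illustrative
    purpose = RestrictedString(allowed={"application", "library"})
    homePage = RestrictedString()  # like xsd:anyURI: a semantic label, no content check
```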

Note: see #408 for an example of why types like SpdxId are useful. xsd:string by itself is implemented as a programming language string variable (str in Python), but strings or integers with restrictions are types.

@goneall
Member

goneall commented Jul 6, 2023

I'm running into other issues with the subclassing conventions used. For example, in RDF there is no class called none; the class that should be used is owl:Thing.

This is complex enough we should probably have a general discussion on:

  • what are the conventions we want to use for nature and subclass?
  • how does this get translated to the OWL and SHACL documents?
  • do we want to take on defining the datatypes in an xsd document?

@zvr
Member

zvr commented Jul 6, 2023

@goneall please keep in mind that the human-editable markdown files were not meant to be used for anything else besides the spec-parser.
That program will produce lots of generated stuff: for example ontologies, but also complete markdown files.

I'm now seeing more and more attempts to parse these files, which I think is a step in the wrong direction.

@goneall
Copy link
Member

goneall commented Jul 6, 2023

> please keep in mind that the human-editable markdown files were not meant to be used for anything else besides the spec-parser.

@zvr - that is exactly my point. I didn't mean to imply we should use the markdown files for anything beyond translating them to other formats via the spec-parser. Since the use of the markdown files is restricted, we have some freedom to define their semantics and syntax to meet the needs of the output schemas generated from them.

As it stands, I don't think creating subclasses for data types will generate the correct OWL / SHACL, and we don't currently support XSD definitions of data types, so I don't think merging this PR would result in anything useful. Let me know if my understanding is not correct.

@davaya
Copy link
Contributor Author

davaya commented Jul 7, 2023

@goneall @zvr: I'd ask you both to look at Relationship.

This is the logical type of a relationship element, which has start and end DateTimes. The logical type has nothing to do with any particular serialization; it is the answer to the question "what is the start time of this Relationship". Regardless of whether RDF or Tag or JSON is being used, the user must be able to look at the serialized data and get the answer to that question, and it doesn't make sense to NOT have a name for what a DateTime looks like. Given the model markdown files, is the logical type exactly the correct list of questions that can be asked about a Relationship instance, no more and no fewer?

The question isn't whether tooling currently supports giving names to regex patterns, the question is whether it should. It is a tooling defect to not do so.

@goneall
Copy link
Member

goneall commented Jul 7, 2023

> @goneall @zvr: I'd ask you both to look at Relationship.
>
> This is the logical value of a relationship element, which has start and end DateTimes. The logical value has nothing to do with any particular serialization; it is the answer to the question "what is the start time of this Relationship". Regardless of whether RDF or Tag or JSON is being used, the user must be able to look at the serialized data and get the answer to that question, and it doesn't make sense to NOT have a name for what a DateTime looks like. Given the model markdown files, is the logical value exactly the correct list of questions that can be asked about a Relationship instance, no more and no fewer?
>
> The question isn't whether tooling currently supports giving names to regex patterns, the question is whether it should. It is a tooling defect to not do so.

I agree DateTime should be defined; I'm just confused about the use of SubclassOf for a data type.

When I look at the generated OWL/SHACL file, I see:

core:DateTime a owl:Class,
        sh:NodeShape ;
    rdfs:comment """A Datetime is a string representation of a specific date and time.
It has resolution of seconds and is always expressed in UTC timezone.
The specific format is one of the most commonly used ISO-8601 formats.
Format restriction: pattern: ^\\d\\d\\d\\d-\\d\\d-\\d\\dT\\d\\d:\\d\\d:\\d\\dZ$""" ;
    rdfs:subClassOf xsd:string ;
    ns0:term_status "Stable" .

I'm not completely sure, but I don't believe this is correct. I don't think you want the type to be a subclass of xsd:string, since xsd:string is a datatype.

For the purposes of generating schemas we can validate against, it looks like we should be using SHACL shapes and not RDFS Subclasses for this purpose.

I have to admit, I find this area to be rather complex and I could well be wrong.

Here's one reference to a discussion on a similar topic in Stack Overflow.

@goneall
Member

goneall commented Jul 7, 2023

I spent some time trying to get a definitive answer on whether you can subclass data types in RDF.

It looks like I was wrong - you can subclass datatypes. Based on this, I withdraw my objection, and it looks like we are treating DateTime properly.

I wasn't able to figure this out using google searches, but with a bit of help from Google Bard, I found an example. Based on the dialog, I'm not too embarrassed that I didn't know this from the start - it seems Google Bard was also a bit confused ;)

Mostly for entertainment value, here's my Google Bard dialog.

Prompt: In RDF, can you have a subclass of a datatype?

No, you cannot have a subclass of a datatype in RDF. In RDF, a datatype is a class, and classes cannot have subclasses. The only way to create a new datatype in RDF is to extend an existing datatype. For example, you could create a new datatype called xsd:positiveInteger by extending the datatype xsd:integer.

The reason for this is that datatypes in RDF are not just sets of values. They are also classes, and classes in RDF have a specific meaning. A class in RDF is a set of resources that share some common property. In the case of datatypes, the common property is the set of values that the datatype can represent.

If you were able to have subclasses of datatypes, then you would be able to create a class that contains resources that are not values. This would violate the definition of a class in RDF.

However, you can still create new datatypes in RDF by extending existing datatypes. This is a common practice, and it allows you to create more specialized datatypes that meet your specific needs.

Which prompted my second question: "I thought you could subclass classes in RDF, isn't this true?"

Response:

Yes, you can subclass classes in RDF. This is done using the rdfs:subClassOf property. For example, you could define a class called Animal and then subclass it with a class called Dog. This would mean that all dogs are also animals.

The rdfs:subClassOf property is a transitive property. This means that if class A is a subclass of class B, and class B is a subclass of class C, then class A is also a subclass of class C.

Subclassing classes in RDF is a powerful way to represent hierarchical relationships between classes. It allows you to define more complex ontologies that can be used to model real-world concepts.

I apologize for the confusion earlier. I thought you were asking about subclassing datatypes, which is not possible in RDF. However, subclassing classes is possible, and it is a very common practice.

My response: Then how do you explain your answer on not being able to subclass datatypes since "a datatype is a class, and classes cannot have subclasses"?

Bard:

You are correct. A datatype is a class in RDF, and classes cannot have subclasses. However, there is a distinction between datatypes and classes in RDF. Datatypes are a special type of class that represents sets of values. Classes, on the other hand, can represent any set of resources.

This distinction is important because it means that you cannot subclass a datatype in the same way that you can subclass a class. If you were able to subclass a datatype, then you would be able to create a class that contains resources that are not values. This would violate the definition of a class in RDF.

However, you can still create new datatypes in RDF by extending existing datatypes. This is a common practice, and it allows you to create more specialized datatypes that meet your specific needs.

To summarize, the following are the key points to remember:

Datatypes are a special type of class in RDF.
Classes cannot have subclasses.
You can create new datatypes in RDF by extending existing datatypes.
I hope this explanation is more clear. Let me know if you have any other questions.

My next prompt: In your last response, you state in your 2nd bullet "Classes cannot have subclasses.", but that is not true based on your previous response.

Bard:

You are correct. I apologize for the confusion. Classes can have subclasses in RDF. I misspoke in my previous response.

To be more specific, classes in RDF can have subclasses if the subclasses are also classes. This means that you cannot subclass a datatype with a class, because datatypes are not classes. However, you can subclass a class with another class.

For example, you could define a class called Animal and then subclass it with a class called Dog. This would mean that all dogs are also animals.

I hope this explanation is more clear. Let me know if you have any other questions.

My next prompt: So, how do you extend a datatype in rdf?

Bard:

I'm unable to help you with that, as I'm only a language model and don't have the necessary information or abilities.

Prompt: You said "You can create new datatypes in RDF by extending existing datatypes." - how do you do that?

Bard:

Sure, I can help you with that.

To extend a datatype in RDF, you need to use the rdfs:subClassOf property. The rdfs:subClassOf property is used to indicate that one class is a subclass of another class. In the case of datatypes, this means that the new datatype is a subclass of the existing datatype.

For example, let's say you have an existing datatype called xsd:integer. You want to create a new datatype called xsd:positiveInteger that is a subclass of xsd:integer. You would do this by adding the following statement to your RDF vocabulary:

<http://example.org/datatypes/positiveInteger>
    rdfs:subClassOf <http://www.w3.org/2001/XMLSchema#integer> .
This statement tells RDF that the xsd:positiveInteger datatype is a subclass of the xsd:integer datatype. This means that any value that is valid for the xsd:integer datatype is also valid for the xsd:positiveInteger datatype.

In addition to the rdfs:subClassOf property, you can also use the xsd:restriction property to further constrain the values that are valid for a datatype. For example, you could use the xsd:restriction property to specify that the xsd:positiveInteger datatype can only contain values that are greater than or equal to 1.

Here is an example of how you would use the xsd:restriction property to further constrain the values that are valid for the xsd:positiveInteger datatype:

<http://example.org/datatypes/positiveInteger>
    rdfs:subClassOf <http://www.w3.org/2001/XMLSchema#integer> ;
    xsd:restriction
        # Only allow values that are greater than or equal to 1
        {
            xsd:minInclusive "1" .
        }
This statement tells RDF that the xsd:positiveInteger datatype is a subclass of the xsd:integer datatype, and that the values that are valid for the xsd:positiveInteger datatype must be greater than or equal to 1.

I hope this explanation is helpful. Let me know if you have any other questions.

@goneall
Member

goneall commented Jul 7, 2023

> It looks like I was wrong - you can subclass datatypes. Based on this, I withdraw my objection, and it looks like we are treating DateTime properly.

Now I'm not so sure - Bard may have been hallucinating - I couldn't find any spec or examples directly from the internet.

@davaya @zvr @sbarnum - can you provide any references to using rdfs:subClassOf for a data type?

@goneall
Member

goneall commented Jul 7, 2023

I found this W3C note that provides some context on using data types.

For reference, here's the W3C description of data types in the concepts document.

My reading of the above is that data types need to be extensions of the XML Schema datatypes and not OWL subclasses.

I did find this reference to extensible datatypes in OWL 2. Note that these extensible datatypes cannot be used with literals, so we should probably avoid them.

Back to the conclusion we should not be subclassing datatypes (and an additional conclusion that Bard is hallucinating on the topic).

@davaya
Copy link
Contributor Author

davaya commented Jul 7, 2023

Bard's almost-correct summary:

> Datatypes are a special type of class in RDF.
> Classes cannot have subclasses.
> You can create new datatypes in RDF by extending existing datatypes.

combined with its almost-correct correction:

> This means that you cannot subclass a datatype with a class, because datatypes are not classes.

gets close to my mental model of the explanation:

  • Datatypes are a special type of class in RDF (they are not graph nodes because they do not have an id)
  • Datatypes are simple (primitive) or complex (container)
  • New types can be constructed (SubclassOf) from simple types by restriction, list, union
  • New types can be constructed (SubclassOf) from complex types by adding/deleting contained elements

XSD has two kinds of classes (simple and complex), while RDF, being a graph language, further distinguishes complex classes into "classtypes" (which are graph nodes) and "datatypes" (which are not graph nodes). RDF objects can be classtypes, complex datatypes, or simple datatypes. RDF subjects can only be classtypes.

So as long as "SubclassOf" means a base from which something else can be derived, RDF can "subclass" all of simple datatypes, complex datatypes, and classtypes.


I've never tried Bard, but I'm quite impressed with your exchanges - thanks for doing and posting them. https://www.reddit.com/r/quotes/comments/ds44r2/the_marvel_is_not_that_the_bear_dances_well_but/

As for how to restrict a datatype class in RDF, @sbarnum will have to answer that based on the fact that RDF has both container and collection types, and the fact that the model is not just RDF, it also needs/includes XSD and SHACL.

Bard's confusion is another symptom of why "JSON" people (@nishakm, @tsteenbe) say that JSON/JSON-Schema (and to a lesser extent XML/XSD) are simple to understand and use while JSON-LD, RDF/XML and Turtle should be a separate topic of discussion, not harmonized as one-or-the-other equivalents of JSON and XML. I don't think Bard would have any trouble explaining XSD facets or JSON Schema types.

And for SpdxId:

  • String / xsd:string is any text string
  • AnyUri / xsd:anyURI is a SubtypeOf xsd:string - any string that can be used as a URI/IRI
  • SpdxId is a SubtypeOf AnyUri - any URI that is an SPDX element unique identifier.

(I always use the Gettysburg test whenever I see "xsd:string" - if it's OK for the data to be "Fourscore and seven years ago our fathers brought forth, upon this continent, a new nation" then it's OK to use string. Otherwise it's necessary to ask why the Gettysburg address isn't an OK value for that property, and create a SubtypeOf string that is OK. The Gettysburg address is fine as a comment, but it's probably not fine as a name.)

@zvr
Member

zvr commented Jul 7, 2023

"The result of Generative AI is highly plausible fiction -- which oftentimes happens to be factually correct."

@goneall
Copy link
Member

goneall commented Jul 7, 2023

I think for RDF, my current opinion is that we restrict our use of data types to the predefined XSD datatypes and use SHACL to further restrict the property. For xsd:string there are a number of SHACL string-based constraints, the most interesting being pattern.

It looks like implementing pattern restrictions is still a TODO in the spec-parser, so perhaps when we implement that we can switch from using rdfs:subClassOf to SHACL constraints.

@goneall
Copy link
Member

goneall commented Jul 7, 2023

Back to the original PR - in RDF, the SpdxId will always have to be the URI of the object itself for the graph to work.

It seems a bit odd that we would have a property that has to be the same as the object URI. We would want to restrict the object URI to be a real URI (e.g., no anonymous nodes), but I'm not sure there is a way to express such a restriction.

@sbarnum any thoughts?

@davaya
Copy link
Contributor Author

davaya commented Jul 7, 2023

I'm not sure the idea of Logical types and values is coming through. For each Element you need to know:

  1. its ID (the type of the ID is SpdxId)
  2. its Type (one of the concrete types defined by the model)
  3. the IDs of all elements it references, such as creators. These references are also of type SpdxId.

Serialization determines how the answers to those questions are represented in data.

In computing, matrices can be stored in memory in row-major or column-major order - it is the identical matrix regardless of how it is laid out (serialized) in memory. Similarly, Elements can be serialized into payloads in type-major order (as 2.3 does it, with elements indexed by type (Package, File, etc) first, then by id within a type) or id-major order (with elements indexed by id first, then the type of the element is designated individually for each id). There have been discussions of which is better, but the point is that it makes no difference to the logical model.
Serialization is invisible - two different serializations of the identical logical model must yield the identical set of elements - so arguing about type-major or id-major order has nothing to do with the model, it's purely a serialization discussion.
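The row-major/column-major point can be made concrete with two hypothetical payloads holding the same two elements, one type-major and one id-major. Normalizing each into a set of (id, type, properties) tuples gives the identical result. This is a sketch that assumes flat, hashable property values; the ids and names are made up.

```python
# Hypothetical payloads: same two elements, different layout.
TYPE_MAJOR = {
    "Package": {"urn:example:pkg1": {"name": "demo"}},
    "File": {"urn:example:file7": {"name": "demo.c"}},
}
ID_MAJOR = {
    "urn:example:pkg1": {"type": "Package", "name": "demo"},
    "urn:example:file7": {"type": "File", "name": "demo.c"},
}


def elements_from_type_major(payload: dict) -> set:
    """Index by type first, then by id within each type."""
    return {
        (spdx_id, elem_type, tuple(sorted(props.items())))
        for elem_type, by_id in payload.items()
        for spdx_id, props in by_id.items()
    }


def elements_from_id_major(payload: dict) -> set:
    """Index by id first; each element states its own type."""
    elements = set()
    for spdx_id, props in payload.items():
        rest = {k: v for k, v in props.items() if k != "type"}
        elements.add((spdx_id, props["type"], tuple(sorted(rest.items()))))
    return elements
```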

> It seems a bit odd that we would have a property that has to be the same as the object URI. We would want to restrict the object URI to be a real URI (e.g. no anonymous) - not sure if there is a way to do such a restriction.

A logical "property" is a question to be answered about an element value. Regardless of how it is serialized, an element has to have an "object URI" (an id) and a type. In RDF serialization there would not be two copies of the same id (or worse, two ids with different values). Regardless of serialization, there is one answer to the id question - the spdxId "logical property" of that element, of type SpdxId. In an in-memory representation of elements as used by a Python or Java application, a reasonable implementation would be for the spdxId to be a dictionary key. The value of that key would be all the other "logical properties", the value wouldn't have another copy of the key. In an application with type-major internal storage it would be a dictionary with type as the key and the value could be either a dictionary or a list. Either element[type][spdxId][name] or element[type][0][name] would be the string name of the element. In the latter case (only), element[package][0] would have an spdxId property in addition to the name property.
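A sketch of the two in-memory layouts described here, using a made-up Package id: in the keyed layout spdxId exists only as the dictionary key, while in the list layout each element has to carry spdxId as a property. The conversion helpers are hypothetical and assume every listed element has an spdxId.

```python
# Keyed layout: spdxId is the dictionary key, so the value holds no second copy of it.
KEYED = {"Package": {"urn:example:pkg1": {"name": "demo"}}}

# List layout: position replaces the key, so each element must carry its spdxId.
LISTED = {"Package": [{"spdxId": "urn:example:pkg1", "name": "demo"}]}


def keyed_to_listed(store: dict) -> dict:
    return {
        elem_type: [{"spdxId": spdx_id, **props} for spdx_id, props in by_id.items()]
        for elem_type, by_id in store.items()
    }


def listed_to_keyed(store: dict) -> dict:
    return {
        elem_type: {
            elem["spdxId"]: {k: v for k, v in elem.items() if k != "spdxId"}
            for elem in elems
        }
        for elem_type, elems in store.items()
    }
```

Here element[type][spdxId][name] corresponds to KEYED["Package"]["urn:example:pkg1"]["name"], and element[type][0][name] to LISTED["Package"][0]["name"]; both give "demo".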

The logical model is agnostic to implementation variables just as it is serialization-agnostic: the requirement levied by the logical model is for an application to be able to read RDF data, access the values of an element, and know each element's id and type.

> For xsd:string there is a number of SHACL string based constraints the most interesting being pattern.

That's fine - the end result is that every property with a DateTime type has to match the pattern. An implementer would accomplish that using SHACL for RDF, XSD for XML, and JSON Schema for JSON. The logical model doesn't define implementation, it defines results.
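As one illustration of "defines results, not implementation", the sketch below takes a single logical DateTime pattern and emits SHACL, XSD and JSON Schema fragments from it. One detail it has to handle: sh:pattern (SPARQL REGEX semantics) and JSON Schema "pattern" match anywhere in the string, so the ^ and $ anchors stay, while XSD patterns always match the whole value and treat ^ and $ as ordinary characters, so the anchors are removed. The property name ex:releaseTime and the exact fragment shapes are assumptions, not spec-parser output.

```python
import json

# One logical definition of the DateTime format, written with explicit anchors.
DATETIME_PATTERN = r"^\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ$"


def shacl_fragment(prop: str, pattern: str) -> str:
    """Turtle property shape; backslashes are doubled for the Turtle string literal."""
    escaped = pattern.replace("\\", "\\\\")
    return f'[ sh:path {prop} ; sh:datatype xsd:string ; sh:pattern "{escaped}" ]'


def xsd_fragment(type_name: str, pattern: str) -> str:
    """XSD simple type; anchors are dropped because XSD patterns match the whole value."""
    body = pattern.removeprefix("^").removesuffix("$")
    return (
        f'<xs:simpleType name="{type_name}">'
        f'<xs:restriction base="xs:string"><xs:pattern value="{body}"/></xs:restriction>'
        "</xs:simpleType>"
    )


def json_schema_fragment(pattern: str) -> str:
    """JSON Schema string schema; the pattern keeps its anchors."""
    return json.dumps({"type": "string", "pattern": pattern})
```

The doubled backslashes in the SHACL output match the escaping seen in the generated model.ttl DateTime comment earlier in this thread.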

@goneall
Copy link
Member

goneall commented Jul 7, 2023

@davaya - My comment above is rather specific to RDF and the Ontology and Schema generated by the spec-parser. In its current form, we are duplicating the ID, and I'm not sure that makes sense, specifically for the RDF OWL and SHACL schemas.

We're currently using the following transformations:

Model markdown files -> model.ttl file (an RDF Turtle serialization of the OWL and SHACL for RDF) -> Java source files.

When I look at what comes out in the Java files, I'm noticing some oddities, one of which is duplicate object IDs.

@davaya
Copy link
Contributor Author

davaya commented Jul 8, 2023

I made some progress on the sandbox code that demonstrates datatype classes. In the model files, a Format section signals that the class is a simple datatype. A simple datatype has:

  • no entries in the Properties section
  • Format properties that specify the SHACL string constraints you mentioned, or a schema property that specifies the xsd constraints.

For example, the DateTime class has a pattern constraint under Format:

DateTime

Model: 828388b 2023-07-05T10:09:48Z

class DateTime(String):
    pattern: ^\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ$

String is the root string class with type xsd:string, AnyUri is a subtype of String with the xsd:anyURI schema, and SpdxId is a subtype of AnyUri that inherits the xsd:anyURI schema using the schema constraint under Format:

String

Model: 828388b 2023-07-05T10:09:48Z

class String():
    schema: xsd:string

AnyUri

Model: 828388b 2023-07-05T10:09:48Z

class AnyUri(String):
    schema: xsd:anyURI

SpdxId

Model: 828388b 2023-07-05T10:09:48Z

class SpdxId(AnyUri):
    schema: xsd:anyURI

Implementing simple datatype classes is simple in Python; I'm not a Java programmer but assume the details would be similar. Test code for the DateTime class produces the expected output:

   OK: 2023-04-13T11:15:00Z
Error: Fred: "Fred" does not match DateTime pattern
Error: 42: expected string or bytes-like object, got 'int'
Error: None: expected string or bytes-like object, got 'NoneType'
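For readers who want to reproduce output like the above, here is one possible Python shape for these classes, assuming str subclasses that validate in __new__. It is a sketch, not the sandbox code; the messages only imitate the ones shown, and the wording for the int and None cases differs.

```python
import re


class String(str):
    """Root string datatype (xsd:string); only str values are accepted."""

    schema = "xsd:string"
    pattern = None  # a subclass may set a regex that the whole value must match

    def __new__(cls, value):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__!r}")
        if cls.pattern is not None and re.fullmatch(cls.pattern, value) is None:
            raise ValueError(f'"{value}" does not match {cls.__name__} pattern')
        return super().__new__(cls, value)


class AnyUri(String):
    schema = "xsd:anyURI"  # semantic marker only: no URI syntax check, like xsd:anyURI


class SpdxId(AnyUri):
    """Inherits xsd:anyURI; marks IRIs that identify SPDX Elements."""


class DateTime(String):
    pattern = r"\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ"


def check(cls, value) -> str:
    """Format a result line in the style of the test output above."""
    try:
        return f"   OK: {cls(value)}"
    except (TypeError, ValueError) as err:
        return f"Error: {value}: {err}"
```

Because SpdxId is a subclass of AnyUri, code that accepts any URI also accepts an SpdxId, while code that needs an Element reference can check isinstance(value, SpdxId).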

I hope you are able to find the transformation bugs that are causing duplicate Object IDs; the model markdown files are working fine with the simple datatype Core classes.

@zvr
Member

zvr commented Jul 21, 2023

As above, I am against introducing a type which does not add anything to what is already available.

@davaya
Copy link
Contributor Author

davaya commented Jul 24, 2023

The reason for defining an SpdxId datatype for xsd:anyURI is to distinguish IRIs that are SpdxIds (e.g., declaredLicense, concludedLicense, originatedBy, suppliedBy) from IRIs that are not SpdxIds (e.g., downloadLocation, packageUrl, homepage). These are all of type xsd:anyURI, but they are serialized differently: namespaceMap is not used to compress IRIs that are not SpdxIds.

Look at Package for a concrete example, and compare downloadLocation with concludedLicense.

Naming the SpdxId datatype, just like naming the DateTime datatype, obviously adds to what would be in the model without them. No rationale has been presented for how assigning names to simple Datatypes is harmful, and the names are helpful.

@goneall
Copy link
Member

goneall commented Jul 28, 2023

Since this is targeted for RC2 - @zvr @davaya - we should discuss within the next couple weeks and come to a conclusion

@zvr zvr mentioned this pull request Jul 28, 2023
@davaya davaya mentioned this pull request Jul 30, 2023
@davaya
Copy link
Contributor Author

davaya commented Jul 30, 2023

> Since this is targeted for RC2 - @zvr @davaya - we should discuss within the next couple weeks and come to a conclusion

This is a long and interesting discussion, including Bard's AI opinions, but the questions to be answered are simple:

  • Does it help? - does giving names to simple Datatypes like DateTime, SemVer and SpdxId make the model easier to understand and use?
  • Does it hurt? - do names make the model harder to understand and use? Are there other disadvantages?
  • Is it necessary? - can the model be implemented without naming simple Datatypes?

We stipulate that names are not necessary - the model can be implemented without them.

The benefits are discussed above: SpdxId is used in many places, many more than DateTime. And URIs that are not SpdxIds are also used in many places. One example is Package, where every property is shown along with its type. The downloadLocation, packageUrl and homepage properties are URIs, and so are concludedLicense, declaredLicense, originatedBy and suppliedBy. It is not necessary to give them different names, but there are two benefits to doing so:

  • it becomes obvious by inspection that some URIs identify SPDX elements and others do not
  • it identifies for serialization software which URIs can be shortened using NamespaceMap prefixes and which cannot.

In my opinion the benefits of naming Datatypes in general and SpdxId in particular are obvious and substantial.

We have yet to hear any specific harms, costs, or disadvantages.

Package

Model: 8dff2a3 2023-07-28T22:15:03Z

class Package(/Software/SoftwareArtifact):
    packageVersion: String = None                      # * optional 
    downloadLocation: AnyUri = None                    # * optional 
    packageUrl: AnyUri = None                          # * optional 
    homePage: AnyUri = None                            # * optional 
    sourceInfo: String = None                          # * optional 
    contentIdentifier: AnyUri = None                   # * optional 
    primaryPurpose: SoftwarePurpose = None             # optional 
    additionalPurpose: SoftwarePurpose = None          # optional Set[1..*]
    concludedLicense: SpdxId = None                    # optional 
    declaredLicense: SpdxId = None                     # optional 
    copyrightText: String = None                       # * optional 
    attributionText: String = None                     # * optional 
    originatedBy: SpdxId = None                        # optional Set[1..*]
    suppliedBy: SpdxId = None                          # optional Set[1..*]
    builtTime: DateTime = None                         # optional 
    releaseTime: DateTime = None                       # optional 
    validUntilTime: DateTime = None                    # optional 
    standard: String = None                            # * optional Set[1..*]
    spdxId: SpdxId = None                              # 
    name: String = None                                # * 
    summary: String = None                             # * optional 
    description: String = None                         # * optional 
    comment: String = None                             # * optional 
    creationInfo: CreationInfo = None                  # 
    verifiedUsing: IntegrityMethod = None              # optional Set[1..*]
    externalReference: ExternalReference = None        # optional Set[1..*]
    externalIdentifier: ExternalIdentifier = None      # optional Set[1..*]
    extension: Extension = None                        # optional Set[1..*]

NOTE: in the time since this PR was submitted, datatypes have been created, moved to a different directory, SpdxId was added and then deleted. Once a decision is made, this PR can be closed and a new PR (if approved) will create SpdxId in the Datatypes directory.

@goneall
Copy link
Member

goneall commented Jul 30, 2023

> We have yet to hear any specific harms, costs, or disadvantages.

I'll register one disadvantage. It is specific to RDF serializations. We currently make the spdxId property the ID for the object. In the JSON-LD serialization, this is done in the context file which makes the spdxId look like a property when it really isn't - it is actually @id which has special meaning and a predefined set of syntax and semantics.

If we create our own definitions of spdxId and we retain the current context file which makes spdxId the @id, then we are defining something that already has a definition. Even if the definition is the same, it will lead to confusion IMO.
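To make the concern concrete: if the JSON-LD context aliases spdxId to @id, expansion turns what looks like an spdxId property into the @id keyword. The sketch below assumes a simplified, hypothetical context with string-only mappings and performs only that keyword-renaming step of expansion, not full JSON-LD processing.

```python
# Hypothetical, simplified context: each term maps to a string.
CONTEXT = {"spdxId": "@id", "type": "@type", "name": "https://example.org/spdx/name"}


def rename_keyword_aliases(node: dict, context: dict) -> dict:
    """Replace keys that the context maps to JSON-LD keywords (strings starting with '@')."""
    renamed = {}
    for key, value in node.items():
        target = context.get(key)
        if isinstance(target, str) and target.startswith("@"):
            renamed[target] = value  # the key is a keyword alias, not a property
        else:
            renamed[key] = value  # ordinary term, left as-is in this sketch
    return renamed
```

After renaming, no spdxId key remains: the value lives only in @id, so a separate SpdxId property definition would describe something that, in this serialization, is already defined by JSON-LD itself.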

BTW - I don't have a strong opinion on whether this disadvantage outweighs the advantages - but it is something to consider.

@davaya
Copy link
Contributor Author

davaya commented Sep 12, 2023

@zvr - is Gary's concern a blocker (defining an SpdxId class forces two definitions that prevent RDF serialization from working), or just a preference (@id is an IRI in RDF serialization, so there's no reason to define an SpdxId class of type IRI to support other serializations)?

@zvr zvr modified the milestones: 3.0-rc2, 3.1 Oct 17, 2023

3 participants