Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON format spec review (for scope creep, ambiguity etc) #963

Open
jskeet opened this issue Mar 2, 2022 · 16 comments
Open

JSON format spec review (for scope creep, ambiguity etc) #963

jskeet opened this issue Mar 2, 2022 · 16 comments

Comments

@jskeet
Copy link
Contributor

jskeet commented Mar 2, 2022

As promised, this is the result of trawling through the JSON format spec, and seeing which aspects should possibly be specified elsewhere, or generally clarified. There was actually less scope creep than I expected to find - partly as one example ("all events in the batch must have the same specversion") is bizarrely in the HTTP spec.

2. Attributes - duplication

Current text:

In particular, extensions are placed as top-level JSON properties. Extensions MUST be serialized as a top-level JSON property.

It's not clear how these two sentences are expected to say anything other than exactly the same thing. Proposal: move "MUST" into the first sentence, and delete the second sentence (which has singular/plural issues anyway).

2.2 Type system mapping - "primary vs secondary mappings"

Current text:

Extension specifications MAY define secondary mapping rules for the values of attributes they define, but MUST also include the previously defined primary mapping.

For instance, the attribute value might be a data structure defined in a standard outside of CloudEvents, with a formal JSON mapping, and there might be risk of translation errors or information loss when the original format is not preserved.

An extension specification that defines a secondary mapping rule for JSON, and any revision of such a specification, MUST also define explicit mapping rules for all other event formats that are part of the CloudEvents core at the time of the submission or revision.

This was raised as a point of confusion in a discussion.

Proposal: delete. The intention is already covered elsewhere, as Doug demonstrated in the discussion. If we feel there is something JSON-specific that we want to include, we should make it much clearer.

2.2 Type system mapping - ambiguity in extension types

Current text:

If required, like when decoding Maps, the CloudEvents type can be determined by inference using the rules from the mapping table, whereby the only potentially ambiguous JSON data type is string. The value is compatible with the respective CloudEvents type when the mapping rules are fulfilled.

Firstly, I suspect that "like when decoding Maps" harks back to a pre-1.0 era where "map" was a CloudEvent attribute type. It should probably be removed.

Secondly, it's noted that an extension attribute with a JSON type of "string" is ambiguous, but no information is provided about how to resolve this. This is primarily a practical issue for SDKs parsing JSON-format events, but it's a conceptual one as well. If a JSON object represents a CloudEvent, what is the type of an extension specified using a string value?

I can give the answer from a C# SDK perspective: it's inferred to be string unless there is pre-existing information to the contrary. (A user can provide a known set of extension attributes to the parsing operation.) I've no idea whether other SDKs do the same thing though. While I know we try to avoid providing too much SDK-specific guidance in formats (and I'd argue we should probably give more), in this particular case I think we could do so without going too deep. I'm happy to create a PR if that would be useful.

3. Envelope - clarity/accuracy and typo

Current text:

OPTIONAL not omitted attributes MAY be represeted as a null JSON value.

Clearly "represeted" should be "represented" - but should "not omitted" actually be "omitted"? I thought we had agreed before that "null" wasn't really a "present" value anyway, conceptually. In other words, there should be no such thing as a "not omitted attribute with a value of null".

3.1.2 Payload deserialization - handling of non-string values

This was what #934 was about. I knew I'd got the impression from somewhere that strings were important...

Current text:

If the datacontenttype does not declare JSON-formatted data content, then the data member SHOULD be treated as an encoded content string. An implementation MAY fail to deserialize the event if the data member is not a string, or if it is unable to interpret the data with the datacontenttype.

Given that this is about deserialization, it's not really clear what it means to say that the data member SHOULD be treated as an encoded string. When we're deserializing, we don't get to decide its type - the JSON does! The equivalent part of serialization is more reasonable, basically saying we shouldn't create JSON representations with non-JSON datacontenttype and non-string values.

4. JSON batch format: document or object

Current text:

In the JSON Batch Format several CloudEvents are batched into a single JSON document.

I'd argue that really only a JSON object is specified. Indeed, RFC 7159 doesn't even describe what a JSON document is... but either way, it would be entirely reasonable to create a JSON object which had a JSON batch object as the value of one property.

And yes, this is very much like my objection to the use of document in the XML format.

4. JSON batch format: separate formats?

Current text:

Although the JSON Batch Format builds ontop of the JSON Format, it is considered as a separate format

We've got this in every format that supports batches. Rather than repeating this text everywhere, I think it would be better to specify it in a single place, for "event formats that support batch". A new document about event formats in general would be useful - that could also give more details of structured vs binary mode and the relationship between them and event formats.

4. JSON batch format: context for an empty batch

Current text:

An example of an empty batch of CloudEvents (typically used in a response, but also valid in a request):

It's useful to state that a batch may be empty, and to give an example, but we should probably remove any expectations as to when it'll be used.

@erikerikson
Copy link
Member

Largely lovely and appreciated observations. Some nits...

  1. - Agreed, though the word "placed" seems more ambiguous than "serialized". We are talking about the format, right? So maybe we switch those words out when collapsing to the first sentence?

should "not omitted" actually be "omitted"?

  1. - I think it could be correct "as is" and referencing a semantically significant language behavior (i.e. the JS part of JSON). In the browser console:
>> o = { "a": "b", "c": null, "d": undefined }
<- Object { a: "b", c: null, d: undefined }
>> JSON.stringify(o)
<- "{\"a\":\"b\",\"c\":null}"

I don't recall the intention but I expect the existing spec language is meant to specify that behavior. As written, it could potentially leave a choice to implementers about how to handle attributes explicitly sent to undefined (i.e. omitting them as per the stringify behavior or including them with value null).

4."document or object" - ❤️

@jskeet
Copy link
Contributor Author

jskeet commented Mar 14, 2022

Placed: yes, it's about the format, but I don't think it has to be about serialization. For example, it would be reasonable to have a JSON file that's expected to represent a CloudEvent, and talk about its meaning, without any SDK being involved. I wonder if "represented" would be unambiguous?

Omitted/not-omitted: thanks for the clarification. It's probably worth clarifying that :)

@erikerikson
Copy link
Member

RE Placed - it's a good point. Perhaps "...extensions MUST be top-level..."

@jskeet
Copy link
Contributor Author

jskeet commented Mar 14, 2022

RE Placed - it's a good point. Perhaps "...extensions MUST be top-level..."

Works for me.

@duglin
Copy link
Collaborator

duglin commented Mar 16, 2022

  1. Perhaps also say "Extension context attributes" instead of just "Extensions". Right now the main spec only talks about extension context attributes and it might be best to future proof things in case we ever allow extensions that are not "attributes". While I'm ok with keeping just one sentence, I would also be ok with deleting both since the main spec already says extension attributes must be serialized just like all other context attributes. So this is a bit repetitive.

3.1.2 Note the spec also says "Unset attributes MAY be encoded to the JSON value of null.". So present w/null is valid.

  1. Could we just put that in the main spec? It's non-normative and kind of a pointless sentence anyway :-) Unless the intent is to say that each gets their own mime type?? Which might be useful for authors of future formats - and then does belong in a global spot... spec or primer. I prefer spec.

rest LGTM

@duglin
Copy link
Collaborator

duglin commented Jan 19, 2023

@jskeet status of this one?

@jskeet
Copy link
Contributor Author

jskeet commented Jan 19, 2023

@duglin: Waiting for me to have more time :)

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no
activity. Mark as fresh by updating e.g., adding the comment /remove-lifecycle stale.

@jskeet
Copy link
Contributor Author

jskeet commented Feb 27, 2023

/remove-lifecycle stale

I should get back to this at some point...

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no
activity. Mark as fresh by updating e.g., adding the comment /remove-lifecycle stale.

@jskeet
Copy link
Contributor Author

jskeet commented Apr 23, 2023

/remove-lifecycle stale

I do still want to do this - the recent discussions in #1186 demonstrate the importance of it.

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no
activity. Mark as fresh by updating e.g., adding the comment /remove-lifecycle stale.

@jskeet
Copy link
Contributor Author

jskeet commented May 24, 2023

/remove-lifecycle stale

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no
activity. Mark as fresh by updating e.g., adding the comment /remove-lifecycle stale.

@jskeet
Copy link
Contributor Author

jskeet commented Sep 23, 2023

/remove-lifecycle stale

@github-actions
Copy link

This issue is stale because it has been open for 30 days with no
activity. Mark as fresh by updating e.g., adding the comment /remove-lifecycle stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants