Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBORParser does not accept "undefined value" #93

Closed
mbaril opened this issue Jun 7, 2017 · 10 comments
Closed

CBORParser does not accept "undefined value" #93

mbaril opened this issue Jun 7, 2017 · 10 comments
Labels

Comments

@mbaril
Copy link

mbaril commented Jun 7, 2017

I'm trying to use the CBORParser to parse a byte stream but the method nextToken() throws JsonParseException when the stream contains "undefined value" (Major Type: 7, Additional info: 23).

I guess it is because there in no equivalent for this datatype in JSON.

Is it possible to use the CBORParser when input stream contain "undefined value"?

Would that be a good idea to modify the CBORParser class to add a new method "nextCborToken" to add support for "undefined value"?

@cowtowncoder
Copy link
Member

@mbaril Could you point to a document or article that explains semantics of the type? I assume this is one of (optional) extension types?

I can't really comment on how such type should behave before reading about its semantics within CBOR, but there are ways to expose values for which there is no JSON counterpart; typically as VALUE_EMBEDDED_OBJECT tokens (which is most commonly used to expose embedded binary data). Sometimes it may make sense to add separate format-specific accessors.

@mbaril
Copy link
Author

mbaril commented Jun 9, 2017

@cowtowncoder
Copy link
Member

cowtowncoder commented Jun 9, 2017

Ok thanks. I assume this to be a relatively new addition, since I did not remember such a value....
I guess my next question is this: what purpose does this value serve? Isn't it just null?
What kind of use case is this used with data you receive?

I am guessing this might try to somehow capture logical case of "not defined vs explicit null"?
Or... not sure.

Anyway: the simplest way would seem to be to just expose this as JsonToken.VALUE_NULL, and perhaps add a CBORParser method of allowing further introspection.
Alternatively/additionally, could make getEmbeddedValue() return null (but not throw exception).

@cowtowncoder
Copy link
Member

Further from the spec:

3.8.  Undefined Values

   In some CBOR-based protocols, the simple value (Section 2.3) of
   Undefined might be used by an encoder as a substitute for a data item
   with an encoding problem, in order to allow the rest of the enclosing
   data items to be encoded without harm.

which suggest two likely courses of action:

  1. Expose just as null
  2. Throw an exception to indicate like encoding problem by encoder

I think the simplest initial approach would be (1) here, and at very least I think that would make sense for 2.8 (minimal change for patch of a maintenance branch). For 2.9 other changes are still possible, although we are getting very close to 2.9 API freeze.

@mbaril
Copy link
Author

mbaril commented Jun 9, 2017

IMHO it is important to be able to make the difference between "null" (Major Type: 7, Additional info: 22) and "undefined value" (Major Type: 7, Additional info: 23) because the meaning is not exactly the same (as you pointed out (section 3.8).

If I understand your propositions correctly, I don't think it will be possible to make such distinction with option #1.

For that reason, my preference would be to go with option #2.

Right now, when we call the CBOR parser with an undefined value the next token function throws an JsonParseException. For that reason, I think that option #2 is also the option that is the most appropriate if you want to keep the same behaviour as before.

@cowtowncoder
Copy link
Member

Do you have specific reason to differentiate explicit null from undefined here? I am bit surprised that failing to decode seems better than coercing this value into null.
I am also not sure how adding such coercion would break use cases: encoded data that used to be unprocessable would become available, to the degree that null is legal for value in question.

As to distinguishing the "undefined" value; for most data-binding use cases (taking CBOR, getting POJOs) it seems null is about the only workable option, as I can not see any other JVM value that would have better semantics.
For streaming decoding application could perhaps take some other steps but that is bit unclear.

The other option, exposing JsonToken.VALUE_EMBEDDED_OBJECT, and making getEmbeddedObject() return null (or if you can think of better JDK type to represent this concept, that -- in Java 8 could conceivably use Optional.empty(), but that can't be done before Jackson 3.x), is still a possibility as well.

Still, exposing as null vs throwing exception, based on new CBORParser.Feature value, seems simplest at this point.

@bgiori
Copy link

bgiori commented May 29, 2018

Hi, is this issue still on the table? I would prefer a null value to an exception. There may be a slight difference between the two, but I think the ability parse CBOR to JSON is paramount.

In the meantime I will fork to make the change myself as I need to be able to parse undefined values. I can submit a PR if that is something you'd be interested in pulling in.

@cowtowncoder
Copy link
Member

@bgiori Yes, no progress. I think my take on this is as follows:

  1. For patch release, I think exposing as null is the way to go. Seems to me like a better choice than decoding failure, essentially.
  2. There is now plan to do 2.10 before 3.x, and for that, something additional may be planned -- such as a new method in CBORParser (which at least allows custom handling, even if not conveniently)
  3. Alternatively/additionally for 2.10 / 3.x may add CBORParser.Feature to enable alternate handling -- but still not quite sure what that'd be. Exception (with proper JsonParseException subtype, if there is one, and message) is an option, just not sure how useful.

So; PR against 2.9 branch for just returning null would make sense, as the first step.

@cowtowncoder
Copy link
Member

Hmmh. After re-reading the specification, I am beginning to think that encoding use of 0xF7 is a REALLY REALLY bad practice, and I can't see why on earth anyone would want to do that.
I can support for accepting that, but it... I don't think that is how formats should work.
There is no equivalent concept in JSON (reference from CBOR to Javascript is bit nonsensical in that sense; there are no special symbols for, say, Lisp or Cobol, so why Javascript?), nor most other formats. There are LOGICAL concepts for something like that (one could argue null might be sort of similar), at higher levels, but at encoding it just... makes no sense. At all.

Sigh. I like CBOR specification in many places, but just is just Weird, and not in a good way.

@cowtowncoder cowtowncoder changed the title [cbor] CborParser do not support "undefined value" [cbor] CBORParser does not accept "undefined value" May 30, 2018
@cowtowncoder cowtowncoder changed the title [cbor] CBORParser does not accept "undefined value" CBORParser does not accept "undefined value" May 30, 2018
cowtowncoder added a commit that referenced this issue May 30, 2018
@cowtowncoder
Copy link
Member

Implemented so that "undefined" is now return as JsonToken.VALUE_NULL, for Jackson 2.9.6 (and rest of 2.9 patches).

Created #137 as follow-up item for improvements in 2.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants