Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json primitive map keys #319

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

OndrejSpanel
Copy link
Contributor

@OndrejSpanel OndrejSpanel commented Nov 11, 2020

The code submitted supports Short, Int and Long in both reader and renderer.

Can you please check if the overall style is acceptable for you? I am unsure about a few things:

  • I made the JSON reader to accept Chars anywhere an integral type (int, long, short) is expected. I am not sure if this is desired behaviour. If the strings as numbers should be accepted in map keys only, please advise how can I recognize I am parsing a key
  • there is very little error handling. Once the decoder decodes a Long sucessfully, it will be silently truncated to Int or Short if such type is expected. If you want more error handling, I can add it

@OndrejSpanel OndrejSpanel marked this pull request as draft November 11, 2020 13:55
@codecov-io
Copy link

Codecov Report

Merging #319 (9e46b0a) into master (4f4ac9c) will decrease coverage by 0.03%.
The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #319      +/-   ##
==========================================
- Coverage   74.02%   73.99%   -0.04%     
==========================================
  Files          63       64       +1     
  Lines        4893     4902       +9     
  Branches      575      588      +13     
==========================================
+ Hits         3622     3627       +5     
- Misses       1271     1275       +4     
Impacted Files Coverage Δ
core/src/main/scala/io/bullet/borer/Reader.scala 70.34% <50.00%> (-0.21%) ⬇️
...main/scala/io/bullet/borer/json/JsonRenderer.scala 75.43% <100.00%> (-0.59%) ⬇️
.../io/bullet/borer/derivation/DerivationConfig.scala 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update adff294...9e46b0a. Read the comment docs.

@OndrejSpanel
Copy link
Contributor Author

@sirthias Can you please take a look and tell me what you think about it?

def readShort(): Short =
if (hasShort) {
clearDataItem()
receptacle.intValue.toShort
} else if (hasChars) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think it's a good idea to generally attempt to parse a String element when the user want's to read a number. If the user wants to read a short we shouldn't accept a String.
Remember that borer is first and foremost a CBOR serialization library and supports JSON only as a subset of what CBOR offers.

If we want to enable this "read-simple-types-from-strings" feature we should at least put it behind a configuration flag. Also, we'd have to provide rock-solid error handling since we are now parsing in the Reader itself. And: We'd have to make sure that this additional code doesn't end up hurting performance, e.g. by increasing the method size beyond some inlining limit.

Luckily we are outside of the main hot path here and can easily move all the code in the else branch out into a separate method, where it doesn't mess with JVM inlining scopes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to enable this "read-simple-types-from-strings" feature

I would prefer enabling this for map keys only, but as I wrote, I have no idea how to check if I am parsing a map key or not. Do you see some way for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rock-solid error handling

Is there any parsing with error handling ready for individual numeric types I could use here, or should I use readLongFromString as I do now and check for the range?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... what's your reasoning behind only allowing this auto-conversion on map keys and not other elements as well? What's so special about map keys? To me they are are simply data elements like every other one as well.

The Reader.Config interface already has two other config flags called readIntegersAlsoAsFloatingPoint and readDoubleAlsoAsFloat. We can simply add readStringsAlsoAsPrimitives, which would enable this auto-conversion everywhere (but the default should be false).

(And while we are at it, breaking binary compatibility, we can also rename readDoubleAlsoAsFloat to readDoublesAlsoAsFloat for better consistency.)

Is there any parsing with error handling ready for individual numeric types I could use here, or should I use readLongFromString as I do now and check for the range?

I would simply let java.lang.Integer.parseInteger, java.lang.Short.parseShort, java.lang.Float.parseFloat, etc. do the job and wrap all exceptions in a Borer.InvalidInputData.
When reading booleans from a String I would also allow for "off" and "no" to be read as false and "on" and "yes" to be read as true, in addition to "false" and "true".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not thinking about it as auto-conversion, more like map key encoding / decoding - which is a topic of this PR. I have implemented the encoding, that I think is easy and straightforward. I need some decoding as well. A universal auto-conversion is a possible way, but as it achieves more than desired, it raises the need of config flags.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get the point of your last comment.
What is it that you are suggesting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just trying to explain why I wanted to limit the string parsing to map key. I have no idea how to achieve that, therefore I am unable to suggest something. I leave the decision to you - if you think a broader implementation accepting strings as primitives everywhere is good with you, I can proceed.

There is only one minor point I do not like about this: it will be possible to encode a map to JSON without any flags, the numeric keys will be encoded as string, but when one wants to decode the result of such encoding, it will be necessary to use the config allowing readStringsAlsoAsPrimitives. If you think this is fine, I can live with that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Reader abstraction has no way of knowing what the role of the next data element is. It doesn't have to know, because in CBOR all data element types can appear in all "positions". It's only JSON that limits certain positions (map keys) to certain types (Strings).
As such there is no readily available way to restrict the "primitives-from-string" auto-conversion to map keys at the reader level and I wouldn't like to change the design or add additional layers/complexity just for this (minor, IMHO) feature.
So, I'm afraid it's going to be an all-or-nothing decision for or against the readStringsAlsoAsPrimitives functionality.

There is only one minor point...

Yes, I agree. This isn't perfect but acceptable to me. borer offers configuration options at the encoding or decoding level, not overarching, across everything. A certain amount of asymmetry is ok, I think.
We already have readIntegersAlsoAsFloatingPoint and readDoubleAlsoAsFloat, which are also somewaht asymmetric.

@@ -74,7 +74,9 @@ final private[borer] class JsonRenderer(var out: Output) extends Renderer {
def onLong(value: Long): Unit =
if (isNotMapKey) {
out = count(writeLong(sep(out), value))
} else failCannotBeMapKey("integer values")
} else {
out = count(writeLong(sep(out).writeAsByte('"'), value).writeAsByte('"'))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move that into it's own method, so that it doesn't get inlined when the JVM decides to inline the onLong method.

Also, what about booleans, ints, overlongs, floats, doubles and number strings?
Should we also stringify null and undefined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about booleans, ints, overlongs, floats, doubles and number strings

I will add that. As I wrote, I first wanted to be sure the style I am implementing it is OK with you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants