-
Notifications
You must be signed in to change notification settings - Fork 980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding on-demand support for a subset of json path. #2127
Conversation
- Not yet tested - Not yet compiled - Not yet used in ondemand array/object/etc...
Pending: - Building - Adding tests for each of the on-demand classes usage of at_path - Properly documenting the subset of json path that is currently supported.
Will check the failures tomorrow |
…n function. Now json_path_to_pointer_conversion returns a std::string.
@FranciscoThiesen Looks very good to me. Please consider my comments. (All minor stuff so far.) |
@mbasmanova Would you have a look? At least at the documentation? There is still time to clarify the documentation, adjust the API and so forth. |
Thanks for the review @lemire, I didn't know about the string suffix format so I learned something new (: Will make the adjustments later today! |
Look at my PR, it should become clear... Let me repeat myself: we can't use it inside the library because we require that the library be compatible with C++11... but I fully expect nearly everyone that uses simdutf to be at C++17 right now... or better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FranciscoThiesen @lemire Francisco, thank you for working on this. Daniel, thank you for enabling this work. I read the docs and found them very nice and easy to understand. I made a few tiny comments.
@@ -1080,10 +1081,10 @@ auto cars = parser.iterate(cars_json); | |||
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9 | |||
``` | |||
|
|||
A JSON Path is a sequence of segments each starting with the '/' character. Within arrays, an integer | |||
A JSON Pointer path is a sequence of segments each starting with the '/' character. Within arrays, an integer | |||
index allows you to select the indexed node. Within objects, the string value of the key allows you to | |||
select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your keys contain the characters '/' or '~'
I understand that / is used as a separator, hence, needs escaping, but I wonder why ~ needs escaping.
Curious, do we allow unicode characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from the json pointer spec: https://datatracker.ietf.org/doc/html/rfc6901#section-4
Since we use ~1
to escape '/', there has to be a way for the user to specify ~1
as part of a valid string. Imagine that we want to specify the following path "/someField~1/...", without having the ~1
being replaced by "/". For this to happen, we need to escape the ~
as ~0
and then the string can be represented as /someField~01/
@lemire can you clarify regarding unicode character support? (I see it in the JSON pointer spec, but you have a lot more context here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use '~1' to escape '/', there has to be a way for the user to specify...
Got it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the markdown text for the message got super weird from using ~
. Just updated the comment and it seems that the strikethrough effect is now gone from the markdown format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemire can you chime in here regarding @mbasmanova question on unicode characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Regarding my comment, I know we mean JSON Path, but I was answering regarding what exists currently, prior to this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FranciscoThiesen @mbasmanova My statement was incorrect. We will support unicode (UTF-8) just fine, but not if there are escaped Unicode sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the key "école"
would be matched. No problem. But only if it appears byte-by-byte identical in the path and in the JSON document.
It has to be UTF-8 throughout. We don't do normalization. We don't escape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemire Goti it. Thank you for clarifying. If not too much trouble, it would be nice to document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be documented for the next release.
@FranciscoThiesen I suggest you have a look at @mbasmanova's comments. If needed, I can chime in/help. |
Thank you @mbasmanova for the review! Will address all the comments later today. |
@FranciscoThiesen Do you recommend merging? |
@lemire yes Regarding improvements that I can do in a follow-up PR, so far I have:
Also, do you think that adding at_path support for DOM makes sense, or it would be more interesting for me to work on any other issues? |
Sure !!! If you have lots of talent and energy, the most sought after issue is #1386 |
Merging. I will improve the documentation before the release. |
The documentation regarding Unicode characters is coming: #2132 Note that I changed "JSON Path" to "JSONPath" throughout as the in-progress RFC uses JSONPath and we tend to follow the conventions present in the RFCs. |
Implementation for #2070
After some reviews I can also introduce this in the DOM (since it won't be much different)