Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding on-demand support for a subset of json path. #2127

Merged
merged 18 commits into from
Feb 15, 2024

Conversation

FranciscoThiesen
Copy link
Member

@FranciscoThiesen FranciscoThiesen commented Feb 11, 2024

Implementation for #2070

  • Added at_path() method + documentation
  • Created a new ondemand_json_path_tests file to stress test the newly introduced method

After some reviews I can also introduce this in the DOM (since it won't be much different)

@FranciscoThiesen
Copy link
Member Author

Will check the failures tomorrow

…n function. Now json_path_to_pointer_conversion returns a std::string.
doc/basics.md Outdated Show resolved Hide resolved
doc/basics.md Show resolved Hide resolved
doc/basics.md Outdated Show resolved Hide resolved
@lemire lemire mentioned this pull request Feb 12, 2024
@lemire
Copy link
Member

lemire commented Feb 12, 2024

@FranciscoThiesen Looks very good to me. Please consider my comments. (All minor stuff so far.)

@lemire
Copy link
Member

lemire commented Feb 12, 2024

@mbasmanova Would you have a look? At least at the documentation?

There is still time to clarify the documentation, adjust the API and so forth.

@FranciscoThiesen
Copy link
Member Author

Thanks for the review @lemire, I didn't know about the string suffix format so I learned something new (:

Will make the adjustments later today!

@lemire
Copy link
Member

lemire commented Feb 13, 2024

I didn't know about the string suffix format so I learned something new

Look at my PR, it should become clear...

#2128

Let me repeat myself: we can't use it inside the library because we require that the library be compatible with C++11... but I fully expect nearly everyone that uses simdutf to be at C++17 right now... or better.

Copy link

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FranciscoThiesen @lemire Francisco, thank you for working on this. Daniel, thank you for enabling this work. I read the docs and found them very nice and easy to understand. I made a few tiny comments.

doc/basics.md Outdated Show resolved Hide resolved
@@ -1080,10 +1081,10 @@ auto cars = parser.iterate(cars_json);
cout << cars.at_pointer("/0/tire_pressure/1") << endl; // Prints 39.9
```

A JSON Path is a sequence of segments each starting with the '/' character. Within arrays, an integer
A JSON Pointer path is a sequence of segments each starting with the '/' character. Within arrays, an integer
index allows you to select the indexed node. Within objects, the string value of the key allows you to
select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your keys contain the characters '/' or '~'

I understand that / is used as a separator, hence, needs escaping, but I wonder why ~ needs escaping.

Curious, do we allow unicode characters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from the json pointer spec: https://datatracker.ietf.org/doc/html/rfc6901#section-4

Since we use ~1 to escape '/', there has to be a way for the user to specify ~1 as part of a valid string. Imagine that we want to specify the following path "/someField~1/...", without having the ~1 being replaced by "/". For this to happen, we need to escape the ~ as ~0 and then the string can be represented as /someField~01/

@lemire can you clarify regarding unicode character support? (I see it in the JSON pointer spec, but you have a lot more context here)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use '~1' to escape '/', there has to be a way for the user to specify...

Got it. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the markdown text for the message got super weird from using ~. Just updated the comment and it seems that the strikethrough effect is now gone from the markdown format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lemire can you chime in here regarding @mbasmanova question on unicode characters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Regarding my comment, I know we mean JSON Path, but I was answering regarding what exists currently, prior to this PR.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FranciscoThiesen @mbasmanova My statement was incorrect. We will support unicode (UTF-8) just fine, but not if there are escaped Unicode sequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the key "école" would be matched. No problem. But only if it appears byte-by-byte identical in the path and in the JSON document.

It has to be UTF-8 throughout. We don't do normalization. We don't escape.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lemire Goti it. Thank you for clarifying. If not too much trouble, it would be nice to document this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be documented for the next release.

doc/basics.md Outdated Show resolved Hide resolved
doc/basics.md Outdated Show resolved Hide resolved
doc/basics.md Outdated Show resolved Hide resolved
@lemire
Copy link
Member

lemire commented Feb 13, 2024

@FranciscoThiesen I suggest you have a look at @mbasmanova's comments. If needed, I can chime in/help.

@FranciscoThiesen
Copy link
Member Author

Thank you @mbasmanova for the review! Will address all the comments later today.

@lemire
Copy link
Member

lemire commented Feb 14, 2024

@FranciscoThiesen Do you recommend merging?

@FranciscoThiesen
Copy link
Member Author

@lemire yes

Regarding improvements that I can do in a follow-up PR, so far I have:

  • Adding better/more informative error handling during path -> pointer conversion

Also, do you think that adding at_path support for DOM makes sense, or it would be more interesting for me to work on any other issues?

@lemire
Copy link
Member

lemire commented Feb 15, 2024

DOM makes sense

Sure !!!

If you have lots of talent and energy, the most sought after issue is #1386

@lemire
Copy link
Member

lemire commented Feb 15, 2024

Merging. I will improve the documentation before the release.

@lemire lemire merged commit 2b53248 into simdjson:master Feb 15, 2024
41 checks passed
@lemire
Copy link
Member

lemire commented Feb 15, 2024

The documentation regarding Unicode characters is coming: #2132

Note that I changed "JSON Path" to "JSONPath" throughout as the in-progress RFC uses JSONPath and we tend to follow the conventions present in the RFCs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants