Add full support for JSONPath #2070

Open
lemire opened this issue Sep 26, 2023 · 24 comments
Labels: good first issue, help wanted

@lemire
Member

lemire commented Sep 26, 2023

We support JSON Pointer, but we should support JSON Path. It is more work, but also more useful.

Currently, a limited subset of JSONPath is supported, see #2127

@mbasmanova

@lemire We are interested in this functionality for Velox. Curious if you have a timeline in mind.

@lemire
Member Author

lemire commented Oct 13, 2023

@mbasmanova Work on this feature will start 'soon' (1 week or 2 weeks). JSON Path is quite rich, and it is (if nothing else) challenging to test support. However, we should have partial support in the coming weeks, at the prototypical level.

@mbasmanova

This is great. Keep us posted.

@lemire
Member Author

lemire commented Dec 10, 2023

@mbasmanova Sorry for the delay. We fully support JSON Pointer with high performance. Supporting JSON Path with high performance is... challenging. A subset of the language could be supported, but this subset has a significant overlap with JSON Pointer...

@lemire
Member Author

lemire commented Dec 10, 2023

Basically, there are engineering issues involved to do it efficiently. If you don't care about performance, then it is easy, of course, but providing slow code is not in the spirit of this project. So... it is a challenge...

I do recommend people consider JSON Pointer.

@mbasmanova

@lemire Daniel, thank you for the update. I'm wondering if you could share some more details. In particular, I'm curious what the challenges are in supporting JSON Path efficiently and what subset can be supported. I haven't looked at JSON Pointer yet, but do you happen to know whether it is possible to automatically rewrite a subset of JSON Path queries into JSON Pointer queries?

lemire added the 'help wanted' and 'good first issue' labels on Dec 11, 2023
@lemire
Member Author

lemire commented Dec 11, 2023

but do you happen to know whether it is possible to automatically rewrite a subset of JSON Path queries into JSON Pointer queries?

Basically JSON Pointer provides forward queries...

Given

{ "c" :{ "foo": { "a": [ 10, 20, 30 ] }}, "d": { "foo2": { "a": [ 10, 20, 30 ] }} , "e": 120 }

You have the following JSON Pointer queries...

  • "/c/foo/a/1" is 20
  • "/d/foo2/a/2" is 30
  • "/e" is 120

The equivalent in JSON Path might be... (up to potential semantics differences)

  • $.c.foo.a[1]
  • $.d.foo2.a[2]
  • $.e

JSON Pointer is a well-established standard.

See https://www.rfc-editor.org/rfc/rfc6901

I should stress that JSON Pointer queries are very much still used in production and the standard is very much alive.
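
To make the structure of a JSON Pointer concrete, here is a minimal sketch (standard C++17 only, not simdjson code) that splits a pointer into its reference tokens and undoes the two escapes defined by RFC 6901, where ~0 stands for '~' and ~1 stands for '/'. The function name split_json_pointer is hypothetical and only serves this illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper for illustration (not part of simdjson).
// Splits a JSON Pointer (RFC 6901) into its unescaped reference tokens.
// Returns std::nullopt when the pointer is malformed.
std::optional<std::vector<std::string>> split_json_pointer(std::string_view pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) { return tokens; }  // "" designates the whole document
  if (pointer.front() != '/') { return std::nullopt; }
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); i++) {
    char c = pointer[i];
    if (c == '/') {
      // A slash ends the current token and starts the next one.
      tokens.push_back(current);
      current.clear();
    } else if (c == '~') {
      // "~0" encodes '~' and "~1" encodes '/'; anything else is invalid.
      if (i + 1 >= pointer.size()) { return std::nullopt; }
      char next = pointer[++i];
      if (next == '0') { current.push_back('~'); }
      else if (next == '1') { current.push_back('/'); }
      else { return std::nullopt; }
    } else {
      current.push_back(c);
    }
  }
  tokens.push_back(current);
  return tokens;
}
```

With this sketch, "/c/foo/a/1" yields the four tokens c, foo, a and 1. Whether the last token is read as an array index or as an object key depends on the kind of node it is applied to.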

We also support an extension whereby you can apply a JSON Pointer from the current node, as in...

// Assumes #include "simdjson.h" and using namespace simdjson;
auto cars_json = R"( [
  { "make": "Toyota", "model": "Camry",  "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
  { "make": "Kia",    "model": "Soul",   "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
  { "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;

ondemand::parser parser;
ondemand::document cars = parser.iterate(cars_json);
std::vector<double> measured;
for (auto car_element : cars) {
  // The JSON Pointer is applied relative to the current array element.
  double x = (double) car_element.at_pointer("/tire_pressure/1");
  measured.push_back(x);
}

// measured == {39.9, 31.0, 30.0}

I'm curious what are the challenges

We support JSON Pointer highly efficiently. There is no heap memory allocation and no need for additional dependencies.

As far as I can tell, JSON Path implementations are currently not guaranteed to be efficient.

The current state of the art for implementing JSON Path efficiently is JSONSki, but it provides only a partial implementation... It has no support for descendant selectors, and its wildcard selector implements only a part of the JSONPath specification, stepping into every entry of an array, but not into every field of an object.

The JSON Path queries that would be challenging to implement efficiently are queries such as $.*[?@.price < 10]..a[?search(@.b, | {"b": "j"}].

It is doable if you have enough engineering effort, and I am not closing this issue. In fact, I am marking it as 'help wanted' and 'good first issue'. A couple of talented engineers could implement JSON Path on top of, say, the On Demand API. But it would take more than a few days. I would be interested in working on this, and I might still work on this, but it is not trivial.

@FranciscoThiesen
Member

@lemire do you still think this is a good-first-issue?

Issue looks challenging and interesting.

@lemire
Member Author

lemire commented Feb 2, 2024

@FranciscoThiesen It can be quite challenging, and maybe difficult as a starting point. However, you are welcome to give it a try, it might prove to be easier than I anticipate. Furthermore, it is not necessary to implement the full specification.

@mbasmanova

@lemire Daniel, thank you for the detailed explanation. I think I'm getting it. It sounds like we could support a subset of JSONPath that can be rewritten into JSON Pointer.

@lemire
Member Author

lemire commented Feb 2, 2024

@mbasmanova Yes, such support could be done relatively quickly.

@lemire
Member Author

lemire commented Feb 2, 2024

Maybe @FranciscoThiesen could be interested !!!

@FranciscoThiesen
Member

I'll give it a shot! @lemire can you assign it to me?

@lemire
Member Author

lemire commented Feb 2, 2024

@FranciscoThiesen Done.

@FranciscoThiesen
Member

I took some time this weekend to familiarize myself with the codebase + PRs introducing json pointers in the past years + some Json Path resources like (https://goessner.net/articles/JsonPath/).

@lemire @mbasmanova do you believe the strategy of (efficiently) converting json path -> json pointer and then just leveraging the current at_pointer functionality makes sense and adds value? (at least as a starting point)

The json path -> json pointer conversion appears to be much simpler than implementing an at_path() method from scratch.

@mbasmanova

do you believe the strategy of (efficiently) converting json path -> json pointer and then just leveraging the current at_pointer functionality makes sense and adds value? (at least as a starting point)

I feel this would be valuable.

@lemire
Member Author

lemire commented Feb 6, 2024

@FranciscoThiesen Give it a try.

@FranciscoThiesen
Member

Just wanted to give an update. I am actively working on it and am currently trying to solve some linker errors.

@mbasmanova

mbasmanova commented Feb 10, 2024

@FranciscoThiesen Thank you for the update. Super excited about this functionality becoming available soon.

@PHILO-HE

PHILO-HE commented Apr 26, 2024

When using brackets to specify a field name in a JSON path, e.g., $['store']['book'][0]['title'], I found that the single quotes have to be removed beforehand. Otherwise, simdjson cannot return the correct result. Is this expected behavior? cc @lemire, @FranciscoThiesen

@mbasmanova

To add to @PHILO-HE's question, does simdjson support json paths with keys with dots, e.g. $['store.1.2.3'] ?

@lemire
Member Author

lemire commented Apr 26, 2024

The documentation is as follows:

The subset of JSONPath that is implemented is the subset that is trivially convertible into the JSON Pointer format, using . to access a field and [] to access a specific index.

It is obviously underspecified. We fully support JSON Pointer (the entire specification), and @FranciscoThiesen essentially mapped a subset of JSONPath to the equivalent JSON Pointer.
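
As an illustration of what such a mapping can look like, here is a minimal sketch in standard C++17. It is not the code used inside simdjson: the function name path_to_pointer is hypothetical, and the sketch assumes that a path is an optional leading $ followed only by .field and [index] steps.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not simdjson's implementation: rewrites the subset of
// JSONPath made of ".field" and "[index]" steps into a JSON Pointer string.
// Returns std::nullopt for anything outside that subset.
std::optional<std::string> path_to_pointer(std::string_view path) {
  std::size_t i = 0;
  if (!path.empty() && path[0] == '$') { i = 1; }  // optional root marker
  std::string pointer;
  while (i < path.size()) {
    if (path[i] == '.') {
      i++;
      std::size_t start = i;
      while (i < path.size() && path[i] != '.' && path[i] != '[') { i++; }
      if (i == start) { return std::nullopt; }  // empty field name
      pointer.push_back('/');
      // Escape the two characters that are special in a JSON Pointer.
      for (char c : path.substr(start, i - start)) {
        if (c == '~') { pointer += "~0"; }
        else if (c == '/') { pointer += "~1"; }
        else { pointer.push_back(c); }
      }
    } else if (path[i] == '[') {
      i++;
      std::size_t start = i;
      while (i < path.size() && std::isdigit(static_cast<unsigned char>(path[i]))) { i++; }
      // Require at least one digit followed by the closing bracket.
      if (i == start || i >= path.size() || path[i] != ']') { return std::nullopt; }
      pointer.push_back('/');
      pointer.append(path.substr(start, i - start));
      i++;  // skip ']'
    } else {
      return std::nullopt;  // wildcards, filters, quoted names, etc.
    }
  }
  return pointer;
}
```

Under these assumptions, $.c.foo.a[1] becomes /c/foo/a/1 and .store.book[0].title becomes /store/book/0/title. Anything richer (wildcards, filters, descendant selectors, quoted names) is rejected rather than guessed at.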

Observe that the issue is still open: we provide strictly limited support for JSON Path.

We are certainly inviting further contributions.

Question 1.

When using brackets to specify a field name in a JSON path, e.g., $['store']['book'][0]['title'], I found that the single quotes have to be removed beforehand. Otherwise, simdjson cannot return the correct result. Is this expected behavior?

I believe that the expected JSON Path is .store.book[0].title. Example:

// Assumes #include "simdjson.h", #include <iostream> and using namespace simdjson;
void demo1() {
  auto json = R"( {
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville"
      }
    ]
  }
  })"_padded;
  ondemand::parser parser;
  auto doc = parser.iterate(json);
  // prints "Sayings of the Century"
  std::cout << doc.at_path(".store.book[0].title") << std::endl;
}

Question 2.

does simdjson support json paths with keys with dots, e.g. $['store.1.2.3'] ?

It works in the sense of the examples below...

void demo2() {
  auto json = R"( {
  "store": ["aa", ["humbug", "Montreal",  ["a", "christmas", "carol", "by", "charles", "dickens"]]]
  })"_padded;
  ondemand::parser parser;
  auto doc = parser.iterate(json);
  // prints "by"
  std::cout << doc.at_path(".store.1.2.3") << std::endl;
}


void demo3() {
  auto json = R"( {
  "store": {"1":{ "2":{ "3": "by" } } }
  })"_padded;
  ondemand::parser parser;
  auto doc = parser.iterate(json);
  // prints "by"
  std::cout << doc.at_path(".store.1.2.3") << std::endl;
}

@PHILO-HE

PHILO-HE commented May 3, 2024

Question 1.

When using brackets to specify a field name in a JSON path, e.g., $['store']['book'][0]['title'], I found that the single quotes have to be removed beforehand. Otherwise, simdjson cannot return the correct result. Is this expected behavior?

I believe that the expected JSON Path is .store.book[0].title.

Hi @lemire, thanks for your comment! In Jayway JsonPath, I note that an example of bracket notation is ['<name>' (, '<name>')], see https://github.com/json-path/JsonPath?tab=readme-ov-file#operators. Perhaps simdjson could support this usage of bracket notation in the future?
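
For what it is worth, a caller could bridge this gap today by translating bracket notation into a JSON Pointer before calling at_pointer. The following is only a sketch in standard C++17: bracket_path_to_pointer is a hypothetical name, it handles just ['name'] and [index] steps after a leading $, and it assumes names contain no escaped quotes. The comma-separated multi-name form is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of simdjson): converts a bracket-only
// JSONPath such as $['store']['book'][0]['title'] into a JSON Pointer.
// Assumption: quoted names never contain an escaped single quote.
std::optional<std::string> bracket_path_to_pointer(std::string_view path) {
  if (path.empty() || path[0] != '$') { return std::nullopt; }
  std::string pointer;
  std::size_t i = 1;
  while (i < path.size()) {
    if (path[i] != '[') { return std::nullopt; }
    i++;
    if (i >= path.size()) { return std::nullopt; }
    pointer.push_back('/');
    if (path[i] == '\'') {
      i++;
      bool closed = false;
      // Copy the name up to the closing quote, escaping '~' and '/'.
      while (i < path.size()) {
        char c = path[i++];
        if (c == '\'') { closed = true; break; }
        if (c == '~') { pointer += "~0"; }
        else if (c == '/') { pointer += "~1"; }
        else { pointer.push_back(c); }
      }
      if (!closed) { return std::nullopt; }
    } else {
      std::size_t start = i;
      while (i < path.size() && std::isdigit(static_cast<unsigned char>(path[i]))) { i++; }
      if (i == start) { return std::nullopt; }  // neither a quoted name nor an index
      pointer.append(path.substr(start, i - start));
    }
    if (i >= path.size() || path[i] != ']') { return std::nullopt; }
    i++;  // skip ']'
  }
  return pointer;
}
```

With this sketch, $['store']['book'][0]['title'] maps to /store/book/0/title, and a key containing dots such as $['store.1.2.3'] maps to /store.1.2.3, since a dot has no special meaning in a JSON Pointer.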

@lemire
Member Author

lemire commented May 3, 2024

The issue is open. We are eagerly inviting contributions.
