Edge cases and Gotchas #2

Open
jdesrosiers opened this issue Feb 27, 2024 · 11 comments

Comments

@jdesrosiers

Here are a few things to look out for when implementing something like this.

The set of properties that are considered keywords depends on the dialect

In the following example, additionalItems should not be highlighted as a keyword because it was removed in 2020-12.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "definitions": {}, // <- not a keyword
  "aaa": 42 // <- not a keyword
}

When we change the dialect, the set of properties that are considered keywords changes.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "prefixItems": [true], // <- not a keyword
  "additionalItems": false,
  "items": false,
  "definitions": {},
  "aaa": 42 // <- not a keyword
}
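
One way to model this is a lookup table keyed by dialect URI. The sketch below is only illustrative: the `KEYWORDS_BY_DIALECT` table and the `is_keyword` helper are hypothetical names, and the keyword sets are deliberately incomplete (a real table would have to be built from the specs or meta-schemas).

```python
# Hypothetical, deliberately incomplete keyword tables keyed by dialect URI.
DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

KEYWORDS_BY_DIALECT = {
    DRAFT_07: frozenset({
        "$schema", "$id", "definitions", "items", "additionalItems",
        "properties", "type",
    }),
    DRAFT_2020_12: frozenset({
        "$schema", "$id", "$defs", "prefixItems", "items",
        "properties", "type",
    }),
}


def is_keyword(dialect, name):
    """Return True if `name` is a keyword in `dialect`.

    A dialect that is missing from the table is treated as having no keywords.
    """
    return name in KEYWORDS_BY_DIALECT.get(dialect, frozenset())
```

With these tables, `is_keyword(DRAFT_07, "additionalItems")` is true while `is_keyword(DRAFT_2020_12, "additionalItems")` is false, matching the two examples above.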

Properties are only keywords inside schemas

Not every object in a JSON Schema document is a schema, so you need to know when you're in a schema and when you're not. Here are a couple of examples.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "$id": "foo" // <- not a keyword
  }
}

In the next example, type isn't considered a keyword because definitions isn't a keyword in 2020-12. Therefore, the values under definitions aren't schemas, and the properties of those values shouldn't be considered keywords.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "definitions": {
    "foo": {
      "type": "string" // <- not a keyword
    }
  }
}
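
A minimal sketch of tracking "am I in a schema?", assuming the document has already been parsed into Python objects and a single 2020-12 dialect. The `VALUE_KIND` table and `keyword_paths` function are hypothetical names, and the table covers only a handful of keywords. The idea is that each keyword says how its value should be read, and anything that is not a keyword is never searched for schemas.

```python
# Assumed, partial table for 2020-12: how each keyword's value should be read.
SCHEMA = "schema"            # the value is itself a schema
SCHEMA_MAP = "schema-map"    # object whose values are schemas (keys are plain names)
SCHEMA_LIST = "schema-list"  # array of schemas
OTHER = "other"              # plain data, never descended into

VALUE_KIND = {
    "$schema": OTHER, "$id": OTHER, "type": OTHER,
    "properties": SCHEMA_MAP, "$defs": SCHEMA_MAP,
    "items": SCHEMA, "not": SCHEMA,
    "prefixItems": SCHEMA_LIST, "allOf": SCHEMA_LIST,
}


def keyword_paths(schema, path=()):
    """Yield the path of every property that is a keyword, starting from a schema."""
    if not isinstance(schema, dict):  # boolean schemas have no properties
        return
    for name, value in schema.items():
        kind = VALUE_KIND.get(name)
        if kind is None:
            continue  # not a keyword here, so its value is not searched for schemas
        yield path + (name,)
        if kind == SCHEMA:
            yield from keyword_paths(value, path + (name,))
        elif kind == SCHEMA_MAP and isinstance(value, dict):
            # The keys of the map are ordinary names; only the values are schemas.
            for key, sub in value.items():
                yield from keyword_paths(sub, path + (name, key))
        elif kind == SCHEMA_LIST and isinstance(value, list):
            for index, sub in enumerate(value):
                yield from keyword_paths(sub, path + (name, index))
```

For the two examples above, this yields only `("$schema",)` and `("properties",)` for the first document and only `("$schema",)` for the second, so neither the `$id` under properties nor the `type` under definitions is reported.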

Embedded schemas can have a different dialect

It's possible for embedded schemas to have a different dialect than their parent schema. In the following example, the same keywords are highlighted differently depending on which schema resource the keyword appears in.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "$id": "https://example.com/schema/embedded",
      "prefixItems": [true], // <- not a keyword
      "additionalItems": false,
      "items": false,
      "definitions": {}
    }
  }
}
@sudo-jarvis
Contributor

sudo-jarvis commented Feb 27, 2024

@jdesrosiers, the current implementation in #1 supports only the latest dialect, not multiple or earlier dialects. Any idea how to dynamically fetch the keywords for each dialect?

@jdesrosiers
Author

Any idea how to dynamically fetch the keywords for each dialect?

There isn't a convenient list anywhere you can just fetch. You'll need to build the lists yourself from the spec or meta-schemas or whatever other source you can find.

@Julian
Member

Julian commented Feb 28, 2024

My plan is probably for a simple list of keywords to eventually live in the jsonschema-specifications project, which essentially represents "give me the JSON Schema specifications in Python at runtime".

But that plan includes also writing type annotations for them, so it's a bit medium term.

For now simply copying / writing them down is the right thing.

@Julian
Member

Julian commented Feb 28, 2024

(Oh and definitely awesome! Thanks again Jason for sharing your learnings!)

@sudo-jarvis
Contributor

@Julian To add support for multiple dialects, once the lexer gives us a list of tokens we can iterate from left to right and maintain a stack, using which we will find for each keyword which is its nearest $schema to the left.

We'll push each token onto the stack, and when we encounter a } we'll pop all tokens back to the matching {. This ensures that, even with nesting, the nearest $schema remaining on the stack is the one that applies to the current token.

Then, once we know it, we'll check the dict for that particular dialect to see whether the token should be treated as a keyword or not.
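
A rough sketch of that stack idea, under some loud assumptions: it uses a tiny regex tokenizer of its own rather than Pygments tokens, it keeps one stack entry per open object instead of every token, and `keys_with_nearest_schema` is a hypothetical name. It implements only the "nearest $schema to the left" rule as stated here and makes no attempt to decide whether that $schema actually takes effect.

```python
import re

# Minimal JSON tokenizer: strings, punctuation, and bare literals. It does not
# handle the // comments used in the examples in this thread.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[{}\[\]:,]|[^\s{}\[\]:,"]+')


def keys_with_nearest_schema(text):
    """Pair each object key with the nearest enclosing "$schema" seen so far."""
    tokens = TOKEN.findall(text)
    stack = []    # one entry per open object: its "$schema" value, or None
    results = []
    for index, token in enumerate(tokens):
        if token == "{":
            stack.append(None)
        elif token == "}":
            if stack:
                stack.pop()  # leaving an object discards its "$schema"
        elif token.startswith('"') and tokens[index + 1:index + 2] == [":"]:
            key = token[1:-1]  # escapes are left undecoded in this sketch
            value = tokens[index + 2] if index + 2 < len(tokens) else ""
            if key == "$schema" and value.startswith('"') and stack:
                stack[-1] = value[1:-1]
            # Search from the innermost object outwards for a declared dialect.
            nearest = next((s for s in reversed(stack) if s is not None), None)
            results.append((key, nearest))
    return results
```

Because the pass is strictly left to right, a $schema that appears after other properties in the same object is not seen for those earlier properties; a second pass or a per-object lookahead would be needed to cover that.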

@Julian
Member

Julian commented Feb 29, 2024

Doesn't Pygments's JSON lexer already handle the recursion? It presumably must, since it notices when an object literal is encountered, so the stack you're talking about must already be there. "All" we should have to do is intercept that object literal parsing once it's done, look at the $schema keyword if present, and then decide how to handle the other keywords, I'd think. But clearly I haven't looked closely.

@sudo-jarvis
Contributor

@Julian, yes, you are right, we can do that when the whole document uses a single dialect. However, I was talking about the case where, say, the outer object declares the draft 2020-12 dialect and some inner object declares draft-07.

As mentioned by @jdesrosiers here:

It's possible for embedded schemas to have a different dialect than their parent schema. In the following example, the same keywords are highlighted differently depending on which schema resource the keyword appears in.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "$id": "https://example.com/schema/embedded",
      "prefixItems": [true], // <- not a keyword
      "additionalItems": false,
      "items": false,
      "definitions": {}
    }
  }
}

@Julian
Member

Julian commented Feb 29, 2024

Yes, I know that bit of course, but I forgot Pygments doesn't do any AST parsing, just a flat list of tokens, so it doesn't tell us where objects start and end... OK, that's unfortunate, but what you say sounds fine then. And you can get the list of keywords for each dialect by adding a dependency on jsonschema -- for 2020-12, the keywords you need are then jsonschema.Draft202012Validator.VALIDATORS.keys(), and likewise for the validator classes of the other drafts.
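
A hedged sketch of that suggestion. `VALIDATORS` is a mapping from keyword name to validation function, so as far as I can tell it only lists keywords with validation behavior; things like $schema, $id, $defs, or title would probably have to be added by hand. The `validation_keywords` helper and the `EXTRA_2020_12` supplement are hypothetical, and the supplement is deliberately incomplete. The import is guarded because jsonschema is a third-party package.

```python
def validation_keywords(validator_name="Draft202012Validator"):
    """Return the keyword names jsonschema validates for one draft, or None.

    None means the third-party jsonschema package (or that validator class)
    is not available in this environment.
    """
    try:
        import jsonschema
    except ImportError:
        return None
    validator_class = getattr(jsonschema, validator_name, None)
    if validator_class is None:
        return None
    return frozenset(validator_class.VALIDATORS)


# Hypothetical hand-maintained supplement for keywords that have no validation
# function; incomplete on purpose.
EXTRA_2020_12 = frozenset(
    {"$schema", "$id", "$defs", "$anchor", "$comment", "title", "description"}
)


def highlight_keywords_2020_12():
    """Combine jsonschema's validation keywords with the supplement above."""
    found = validation_keywords("Draft202012Validator")
    return None if found is None else found | EXTRA_2020_12
```

The same call with `"Draft7Validator"` or `"Draft4Validator"` would give the validation keywords for those drafts, each needing its own supplement.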

@jdesrosiers
Author

find for each keyword which is its nearest $schema to the left.

It's a little more complicated than that. $schema only has an effect when it's at the root of a schema resource. The presence of an identifier ($id or id, depending on the dialect) determines that the subschema is a schema resource.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-07/schema#", // <- no $id, so this keyword has no effect
      "prefixItems": [true],
      "additionalItems": false, // <- not a keyword
      "items": false,
      "definitions": {} // <- not a keyword
    }
  }
}

Keep in mind that you can't just look for $id or id; you have to look for the one appropriate to the dialect that $schema declares.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "http://json-schema.org/draft-04/schema#", // <- no id, so this keyword has no effect
      "$id": "https://example.com/schema/embedded", // <- $id doesn't apply for draft-04
      "prefixItems": [true],
      "additionalItems": false, // <- not a keyword
      "items": false,
      "definitions": {} // <- not a keyword
    }
  }
}

Unfortunately, there's an ambiguous situation that you're going to have to figure out how to deal with. Imagine that $schema declares a dialect that you don't recognize.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "https://example.com/unknown-dialect",
      "$id": "https://example.com/schema/embedded", // <- is this an identifier or not?
      "prefixItems": [true], // <- is this a keyword or not?
      "additionalItems": false, // <- is this a keyword or not?
      "items": false, // <- is this a keyword or not?
      "definitions": {} // <- is this a keyword or not?
    }
  }
}

If you don't understand the dialect, you don't know which keyword it uses to identify a schema resource. Therefore, it's ambiguous whether $schema should be respected or not. My solution, for the purpose of highlighting, is to treat a subschema with an unknown dialect as an embedded schema, even though I don't know whether it declares an identifier, and to treat all of its properties as non-keywords. It's not perfect, but it's the best we can do with limited information.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "prefixItems": [true],
  "additionalItems": false, // <- not a keyword
  "items": false,
  "$defs": {
    "foo": {
      "$schema": "https://example.com/unknown-dialect",
      "$id": "https://example.com/schema/embedded", // <- not a keyword
      "prefixItems": [true], // <- not a keyword
      "additionalItems": false, // <- not a keyword
      "items": false, // <- not a keyword
      "definitions": {} // <- not a keyword
    }
  }
}
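
The rules in this comment can be sketched as a small decision function. This is only an illustration: `ID_KEYWORD`, `UNKNOWN`, and `effective_dialect` are hypothetical names, the table lists just three dialects, and the subschema is assumed to be an already-parsed Python dict.

```python
DRAFT_04 = "http://json-schema.org/draft-04/schema#"
DRAFT_07 = "http://json-schema.org/draft-07/schema#"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

# Which property identifies a schema resource in each dialect we know about.
ID_KEYWORD = {DRAFT_04: "id", DRAFT_07: "$id", DRAFT_2020_12: "$id"}

UNKNOWN = None  # stands for an unrecognized dialect: highlight no keywords


def effective_dialect(subschema, parent_dialect, is_root=False):
    """Decide which dialect's keyword list applies to `subschema`."""
    declared = subschema.get("$schema")
    if not isinstance(declared, str):
        return parent_dialect  # no $schema: inherit from the enclosing schema
    if declared not in ID_KEYWORD:
        # We cannot tell which property would be the identifier, so assume an
        # embedded schema whose properties are all non-keywords.
        return UNKNOWN
    if is_root or ID_KEYWORD[declared] in subschema:
        return declared  # $schema sits at the root of a schema resource
    return parent_dialect  # no identifier: $schema has no effect here
```

For example, `effective_dialect({"$schema": DRAFT_04, "$id": "https://example.com/schema/embedded"}, DRAFT_2020_12)` returns `DRAFT_2020_12`, because draft-04 identifies resources with id rather than $id.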

@sudo-jarvis
Contributor

@jdesrosiers, so basically we first need to look at the dialect, which tells us whether the identifier keyword is id or $id. The presence of that identifier then tells us whether the keywords in that subschema are treated according to its own dialect or according to the dialect of the enclosing schema. Is that right?

@jdesrosiers
Author

Correct, but don't forget to also handle the case where you don't know the dialect that's specified (the ambiguous situation described in my last comment).
