GSoC: Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects #599

Open
jviotti opened this issue Jan 30, 2024 · 67 comments
Labels
gsoc Google Summer of Code Project Idea

Comments

@jviotti
Member

jviotti commented Jan 30, 2024

Project title

Define upgrade/downgrade language agnostic declarative transformation rules for all JSON Schema dialects.

Brief Description

The Alterschema project defines a set of JSON-based formal transformation rules for upgrading schemas between Draft 4 and 2020-12, and all dialects in between. These rules are defined using JSON Schema and JSON-e and live within the Alterschema project.

We would like to revise these rules, extend them to support every dialect of JSON Schema (potentially including OpenAPI's old dialects too), and attempt to support some level of downgrading.

Instead of having these rules on the Alterschema repository, we want to have them on the JSON Schema organization for everybody to consume, including Alterschema itself.

Revising the rule format should consider currently unresolved edge cases in Alterschema like tweaking references after a subschema is moved.

Expected Outcomes

A new repository in the JSON Schema organization with upgrade/downgrade rules defined using JSON.

Skills Required

Understanding of various dialects of JSON Schema and their differences.

Mentors

@jviotti

Expected Difficulty

Medium

Expected Time Commitment

350 hours

@benjagm benjagm added the gsoc Google Summer of Code Project Idea label Jan 30, 2024
@benjagm
Collaborator

benjagm commented Jan 31, 2024

Thanks Juan. This looks amazing!

@Era-cell

Era-cell commented Feb 22, 2024

Hey @jviotti, I read through the problem statement and loved how clearly the description explains it. I would love to work on this problem statement under GSoC with the mentors. Can you guide me toward a better understanding of it, and tell me where to start? 😁
Also, would it be good to read through all of the repositories?

@jviotti
Member Author

jviotti commented Feb 23, 2024

Hey there! I'd first suggest getting acquainted with https://github.com/sourcemeta/alterschema. This is the original project where I prototyped something like what we want to do here, using JSON-e (https://json-e.js.org), but ended up hitting some blockers. You can take a look at all the upgrade transformation rules I support here: https://github.com/sourcemeta/alterschema/tree/master/rules. Try to read them, and understand them mainly in conjunction with JSON Schema's official migration guide: https://json-schema.org/specification#migrating-from-older-drafts.

The way Alterschema works is pretty simple. It recursively traverses every subschema of the given schema in a top-down manner, applying all the rules it knows about to every subschema over and over again until no more transformation rules can be executed. Its core business logic is literally a small JavaScript file: https://github.com/sourcemeta/alterschema/blob/master/bindings/node/index.js
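
The traversal described above can be sketched roughly as follows. This is a minimal Python illustration, not Alterschema's actual code: the `Rule` shape (a `condition` predicate plus a `transform` function) and the helper names are assumptions made for brevity, whereas the real rules are expressed in JSON. The sketch also treats every nested object or array as a potential subschema, which a real engine would not do.

```python
from typing import Any, Callable, NamedTuple


class Rule(NamedTuple):
    # Hypothetical rule shape, for illustration only
    condition: Callable[[dict], bool]
    transform: Callable[[dict], dict]


def apply_rules(schema: Any, rules: list[Rule]) -> Any:
    """Apply rules top-down, repeating on each node until none matches."""
    if isinstance(schema, list):
        return [apply_rules(item, rules) for item in schema]
    if not isinstance(schema, dict):
        return schema
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.condition(schema):
                schema = rule.transform(schema)
                changed = True
    # Simplification: every nested value is visited as if it were a subschema
    return {key: apply_rules(value, rules) for key, value in schema.items()}


def rename_key(node: dict, old: str, new: str) -> dict:
    # Rebuild the object so that key order is preserved
    return {(new if key == old else key): value for key, value in node.items()}


rules = [
    Rule(
        condition=lambda s: isinstance(s.get("items"), list),
        transform=lambda s: rename_key(s, "items", "prefixItems"),
    ),
]

upgraded = apply_rules({"type": "array", "items": [{"type": "string"}]}, rules)
```

Note that a rule whose condition still holds after its own transformation would loop forever, so each rule has to make its own condition false.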

For example, Alterschema rules for upgrading JSON Schema 2019-09 to 2020-12 are defined here: https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json, based on what JSON Schema published here: https://json-schema.org/draft/2020-12/release-notes.

Now, what we would like to do in this GSoC initiative is learn from what we did in Alterschema to do another take on the problem that solves the limitations of Alterschema. The main limitation is this one: sourcemeta/alterschema#43.

In summary, a JSON Schema may reference other parts of itself using URI encoded JSON Pointers along with the $ref and $dynamicRef keywords. The current JSON-e rules that I have on Alterschema will only look at the current subschema and blindly transform it according to what the template says.

However, what happens if a reference in another part of the schema becomes invalid after a schema transformation you did somewhere else? We don't have a deterministic way of detecting this, let alone of knowing how to "fix up" the reference pointers.

The conclusion I got from this is that JSON-e, while powerful, is too low level and doesn't carry semantics about what the transformation actually did. For example, if you upgrade definitions to $defs, that's a simple rename. Knowing that it is indeed just a simple rename, it's easy to know how to fix any pointers that included /definitions in it.

So what I'm thinking about is that we can study the transformation rules that we want to perform, and break them down into higher level sub transformations. For example, are you completely deleting something? Are we performing just a rename? Are we moving the contents around? If we design a JSON language that works at a higher level of abstraction, we can deterministically know how we should fix any affected pointer.

@jviotti
Member Author

jviotti commented Feb 23, 2024

So I'd say the phases in this project are like this:

  • Research JSON Schema transformation rules, categorize them, etc
  • Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))
  • Then do a prototype of implementing upgrade rules with this language, ensuring it solves the limitations of Alterschema
  • If we have more time, we use this language to attempt some level of downgrading support, etc

@jviotti
Member Author

jviotti commented Feb 23, 2024

As an initial qualifying task for this project (cc @benjagm), I propose:

  • Go through every upgrade transformation rule from JSON Schema 2019-09 to 2020-12 in the official upgrade guide (https://json-schema.org/draft/2020-12/release-notes) and on Alterschema (https://github.com/sourcemeta/alterschema/blob/master/rules/jsonschema-2019-09-to-2020-12.json) and categorize them in a spreadsheet/table based on what they are doing. For example, are they simple renames? Are they completely moving stuff around? Are they doing something even more complicated? Up to you to figure out how to categorize them

  • Propose a toy JSON-based DSL transformation language (perhaps inspired by JSON-e and JSON Patch) that encapsulates how to perform these 2019-09 to 2020-12 upgrade rules in a way that you can algorithmically tell how to fix any $ref JSON Pointer that went through the transformed schema

  • Describe a pseudo-algorithm to fix up $refs

@jviotti
Member Author

jviotti commented Feb 23, 2024

As a more specific (though probably a bit artificial and silly 😅) example of the $ref issue, consider the following JSON Schema 2019-09:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "array",
  "items": [
    { "type": "string" },
    { "type": "number" }
  ],
  "additionalItems": { 
    "$ref": "#/items/0" 
  }
}

To turn it into a JSON Schema 2020-12, we need to:

  • Replace $schema with https://json-schema.org/draft/2020-12/schema
  • Rename /items to /prefixItems
  • Rename /additionalItems to /items

However, if you blindly perform these transformations, you would end up with the following schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/items/0" 
  }
}

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

This one is a bit simple, but think about more complex variations of the same problem. You might have long references where many of its components will need to be updated, and in some cases, it will be more than just a component rename.
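
To make the breakage concrete, here is a minimal Python sketch that detects local references left dangling by a transformation. The function names are hypothetical, and it only handles `#`-prefixed JSON Pointer fragments within a single document. It also inspects every `$ref` key it finds, even in places that are not really schemas, which is a simplification.

```python
from urllib.parse import unquote


def resolve_fragment(document, ref):
    """Follow a local '#/...' reference; raise LookupError if it dangles."""
    pointer = unquote(ref[1:])  # drop '#', undo URI percent-encoding
    if pointer and not pointer.startswith("/"):
        raise LookupError(ref)  # plain-name anchors are out of scope here
    node = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        if isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        elif isinstance(node, dict) and token in node:
            node = node[token]
        else:
            raise LookupError(ref)
    return node


def dangling_refs(document):
    """Collect local $ref values that do not resolve within `document`."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#"):
                try:
                    resolve_fragment(document, ref)
                except LookupError:
                    found.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(document)
    return found


blindly_upgraded = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "prefixItems": [{"type": "string"}, {"type": "number"}],
    "items": {"$ref": "#/items/0"},
}
broken = dangling_refs(blindly_upgraded)
```

Detection alone does not say what the reference should become, which is where the semantics of the transformation come in.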

@jviotti
Member Author

jviotti commented Feb 23, 2024

Or if you can think of a better way to deterministically solve this problem, please propose it and we can work on it together!

@MeastroZI

MeastroZI commented Feb 23, 2024

However note that the /items/$ref, which still says #/items/0 is now invalid. We first renamed prefixItems to items, so the $ref should have been updated to #/prefixItems/0 too.

I'm confused by this line. Are we supposed to convert prefixItems to items for the reference to be #/prefixItems/0 as part of the conversion from 2019-09 to 2020-12?

Perhaps you meant items to prefixItems, or maybe I am misunderstanding? 😕

@jviotti
Member Author

jviotti commented Feb 23, 2024

@MeastroZI The reference was originally #/items/0, but because we rename items to prefixItems, for the schema to be valid, we should have also adjusted the reference from #/items/0 to #/prefixItems/0. The expected end result should have been this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "number" }
  ],
  "items": { 
    "$ref": "#/prefixItems/0" 
  }
}

@MeastroZI

MeastroZI commented Feb 23, 2024

Hasn't this problem already been addressed with the pattern

"pattern": "/items/\\d+"

"$eval": "replace(schema['$ref'], '/items/(\\d+)', '/prefixItems/$1')"

or is there a possibility that this approach might not cover all cases? If so, could you please specify which cases it might not handle, so I can gain a better understanding of the issue?

@jviotti
Member Author

jviotti commented Feb 23, 2024

@MeastroZI For this very trivial rename case yes, but it's very easy to construct valid JSON Schemas where that simple pattern won't do. Take this one as a silly example:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "items": {
      "items": [
        { "type": "string" }
      ]
    },
    "extra": {
      "$ref": "#/properties/items/items/0" 
    }
  }
}

It has an object property called items which is not the actual JSON Schema keyword. In this case, you need to rename only /properties/items/items to /properties/items/prefixItems, and thus only rename the second occurrence of items in the JSON Pointer. In JSON Schema 2019-09, items can also be either a schema or a collection of schemas, so you can have items be a schema that declares items as an array inside and get into a similar situation. You can probably come up with more edge cases around it.

In any case, items to prefixItems is just a simple rename upgrade example. Other JSON Schema keywords may require more than just a simple renaming, making this even harder to resolve for all cases.

Keep in mind that a tool that upgrades schemas must be able to handle ANY valid JSON Schema document that the user passes to it, and handle these tricky edge cases accordingly.

@jviotti
Member Author

jviotti commented Feb 23, 2024

The definitions to $defs case in the Alterschema issue I shared is even trickier, because you cannot rely on the next component being an integer to improve the pattern like we do for items to prefixItems.

@jviotti
Member Author

jviotti commented Feb 23, 2024

Here is a fun one that is valid and breaks the \\d part of the regex:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "#/$defs/items/0" 
    }
  },
  "$defs": {
    "items": {
      "0": {
        "type": "string"
      }
    }
  }
}
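
As a quick illustration, the `/items/(\d+)` rewrite discussed above can be rendered in Python with `re.sub` (the helper name `naive_fix` is hypothetical). It happens to produce the right answer for `#/properties/items/items/0`, but it wrongly rewrites the reference in this schema, where `items` is just a name under `$defs`:

```python
import re


def naive_fix(ref: str) -> str:
    # Purely textual rewrite: no knowledge of what each pointer token means
    return re.sub(r"/items/(\d+)", r"/prefixItems/\1", ref)


works_by_luck = naive_fix("#/properties/items/items/0")
wrongly_rewritten = naive_fix("#/$defs/items/0")
```

Here `wrongly_rewritten` becomes `#/$defs/prefixItems/0`, which points at nothing, because the schema never had an `items` keyword at that location to rename.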

@jviotti
Member Author

jviotti commented Feb 23, 2024

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers means (i.e. does the /items part of #/$defs/items correspond to the actual items 2019-09 applicator in array form?). That, plus additional semantics around what the transformation does, could help us resolve every case.
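
One way to picture that static analysis is a walk that labels each JSON Pointer token by the role it plays in the schema. The following Python sketch is only an illustration under stated assumptions: the keyword sets cover a small hand-picked subset of 2019-09 applicators, the label names are invented, and pointer escaping is ignored.

```python
# Hand-picked subset of 2019-09 applicators (an assumption, not a full list)
SCHEMA_VALUED = {"additionalItems", "additionalProperties", "contains", "not",
                 "if", "then", "else", "propertyNames",
                 "unevaluatedItems", "unevaluatedProperties"}
MAP_VALUED = {"properties", "patternProperties", "$defs", "dependentSchemas"}
LIST_VALUED = {"allOf", "anyOf", "oneOf"}


def classify(schema, pointer):
    """Label each pointer token as 'keyword', 'name', 'index' or 'unknown'."""
    labels, node, state = [], schema, "schema"
    for token in pointer.split("/")[1:]:
        if state == "map":  # inside e.g. properties: token is a user-chosen name
            label, next_state = "name", "schema"
        elif state == "list":  # inside e.g. allOf: token is an array index
            label, next_state = "index", "schema"
        elif state == "opaque":  # below something we do not understand
            label, next_state = "unknown", "opaque"
        elif token in MAP_VALUED:
            label, next_state = "keyword", "map"
        elif token in LIST_VALUED or (
            token == "items" and isinstance(node.get("items"), list)
        ):
            label, next_state = "keyword", "list"
        elif token in SCHEMA_VALUED or token == "items":
            label, next_state = "keyword", "schema"
        else:
            label, next_state = "unknown", "opaque"
        labels.append((token, label))
        node = node[int(token)] if isinstance(node, list) else node[token]
        state = next_state
    return labels


tricky = {
    "properties": {"foo": {"$ref": "#/$defs/items/0"}},
    "$defs": {"items": {"0": {"type": "string"}}},
}
labels = classify(tricky, "/$defs/items/0")
```

With labels like these, a rename of the `items` keyword would only touch tokens labelled `keyword`, so `/$defs/items/0` is left alone while the second `items` in `/properties/items/items/0` is rewritten.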

@Era-cell

What I'm thinking about is that we can statically analyze the schema first, and know what each component of the pointers means (i.e. does the /items part of #/$defs/items correspond to the actual items 2019-09 applicator in array form?). That, plus additional semantics around what the transformation does, could help us resolve every case.

Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic, hierarchical flow. Am I right? Like checking whether it's an array or object, if it's only a real item, and then casting the 0 to a string? Is that what semantics means?

@Era-cell

Era-cell commented Feb 23, 2024 via email

@jviotti
Member Author

jviotti commented Feb 24, 2024

Hi, so instead of handling every single case for the keywords to be transformed, it is better to make checks based on the semantic, hierarchical flow. Am I right? Like checking whether it's an array or object, if it's only a real item, and then casting the 0 to a string? Is that what semantics means?

Not 100% sure what you mean, but what I mean by semantics is being able to statically analyze the actual transformation DSL and actually understand what it does. For example, you cannot very easily tell from a JSON-e template that such template is actually a property rename. And if we can tell that i.e. a rule is actually a rename for A to B, then we might know how to handle the reference fix ups.

Coming back to the items to prefixItems example we've been discussing so far, this is the corresponding JSON-e rule we have in Alterschema:

{
  "$merge": [
    { "$eval": "omit(schema, 'items')" },
    {
      "prefixItems": {
        "$eval": "schema.items"
      }
    }
  ]
}

What if instead of that weird-looking low-level complex JSON template, we instead had:

[
  { "type": "rename", "from": "items", "to": "prefixItems" }
]

The latter is a LOT more machine readable.

I guess the main challenge is that leaving the simple prefixItems rule aside, some upgrade rules are more complex and involve even more cryptic JSON-e templates that do more than just renames. So the problem statement is: can we come up with a set of higher level operations that capture everything we need, AND that is machine readable enough for us to deterministically do $ref fix-ups in every possible case?
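
To show why a declared rename is easier to reason about, here is a minimal Python sketch of an engine step that applies the `rename` operation above and then fixes local references. Everything beyond the operation's `type`/`from`/`to` shape is an assumption: the `rename` function, the `location` argument (the JSON Pointer of the subschema being transformed, `""` for the root), and the decision to ignore pointer escaping and non-local references.

```python
import copy


def rename(schema, location, op):
    """Apply one rename op at `location` and rewrite affected local $refs."""
    result = copy.deepcopy(schema)
    node = result
    for token in location.split("/")[1:]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    node[op["to"]] = node.pop(op["from"])
    # Because the op is a known rename at a known location, the pointer
    # prefix that moved is known exactly
    old = f"{location}/{op['from']}"
    new = f"{location}/{op['to']}"
    _fix_refs(result, old, new)
    return result


def _fix_refs(node, old, new):
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#"):
            pointer = ref[1:]
            if pointer == old or pointer.startswith(old + "/"):
                node["$ref"] = "#" + new + pointer[len(old):]
        for value in node.values():
            _fix_refs(value, old, new)
    elif isinstance(node, list):
        for item in node:
            _fix_refs(item, old, new)


schema = {
    "type": "array",
    "items": [{"type": "string"}, {"type": "number"}],
    "additionalItems": {"$ref": "#/items/0"},
}
step1 = rename(schema, "", {"type": "rename", "from": "items", "to": "prefixItems"})
step2 = rename(step1, "", {"type": "rename", "from": "additionalItems", "to": "items"})
```

The order of the two renames matters in this sketch: renaming `additionalItems` to `items` first would overwrite the array-form `items` before it has been moved out of the way.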

@Era-cell

So I'd say the phases in this project are like this:

  • Research JSON Schema transformation rules, categorize them, etc
  • Come up with a higher-level transformation language than JSON-e that carries semantics about how we are actually transforming the schema (I was thinking something similar to JSON Patch (https://jsonpatch.com))
  • Then do a prototype of implementing upgrade rules with this language, ensuring it solves the limitations of Alterschema
  • If we have more time, we use this language to attempt some level of downgrading support, etc

@jviotti one question on this: should the high-level transformation language call JSON-e in the backend (in other words, should the high-level one be written on top of JSON-e itself)?

@jviotti
Member Author

jviotti commented Feb 24, 2024

@Era-cell Maybe. I'm open to both building it on top of JSON-e or as a standalone thing. Whatever is easier I guess

@benjagm
Collaborator

benjagm commented Feb 27, 2024

Thanks a lot for joining JSON Schema org for this edition of GSoC!!

Qualification tasks will be published as comments in the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC, where we'll present the ideas and the relevant dates to consider at this stage of the program.

Please use this link to join the session:
🌐 Zoom
📅 2024-02-29 18:30 UTC

See you there!

@jviotti
Member Author

jviotti commented Feb 27, 2024

For the qualifying task, just to echo back what I said before: the main thing we want to see on proposals is that you have a good grasp on what the problem of upgrading JSON Schemas is and are capable of understanding the upgrade rules that would need to be implemented.

So for that, you can focus only on 2019-09 to 2020-12 for the proposal (we'll cover other drafts later), list down the transformation rules that need to happen on all those drafts, and try to categorize them based on different criteria to understand them better. For example, what vocabulary they involve, what type of operation they are (rename, wrap, etc), whether they affect other sibling or non sibling keywords, etc. Be creative! Good grouping criteria can surface patterns that we might not be thinking about and that could influence the DSL. You can present this as a spreadsheet, list, or any form you want.

Then, once accepted, we will continue building on this analysis to design the DSL, and finally implement it. If we did the previous phases well (mainly the one on understanding and categorizing the transformation rules), the rest will be easy.

@MeastroZI

MeastroZI commented Feb 29, 2024

{
  "$schema": "https://json-schema.org/draft/2020-12",
  "$id": "https://example.com/anotherthing/agains/customer",

  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "$ref": "/schema/common#/$defs/phone" },
    "address": { "$ref": "/schema/address" }
  },

  "$defs": {
    "https://example.com/schema/address": {
      "$id": "https://example.com/schema/address",

      "type": "object",
      "properties": {
        "address": { "type": "string" },
        "city": { "type": "string" },
        "postalCode": { "$ref": "/schema/common#/$defs/usaPostalCode" },
        "state": { "$ref": "#/$defs/states" }
      },

      "$defs": {
        "states": {
          "enum": [4, 4]
        }
      }
    },
    "https://example.com/schema/common": {
      "$schema": "https://json-schema.org/draft/2019-09",
      "$id": "https://example.com/schema/common",

      "$defs": {
        "phone": {
          "type": "number"
        },
        "usaPostalCode": {
          "type": "string",
          "pattern": "^[0-9]{5}(?:-[0-9]{4})?$"
        },
        "unsignedInt": {
          "type": "integer",
          "minimum": 0
        }
      }
    }
  }
}

@jviotti I am not able to understand how, in this case, this $ref under:

"phone": { "$ref": "/schema/common#/$defs/phone" }

which has the relative path, gets resolved by the schema validator. I mean, how is the base URL for this calculated even if there is nothing common in the relative path under $ref and the $id of the root?

@Era-cell

Era-cell commented Feb 29, 2024

{
  "$schema": "https://json-schema.org/draft/2020-12",
  "$id": "https://example.com/anotherthing/agains/customer",

  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "$ref": "/schema/common#/$defs/phone" },
    "address": { "$ref": "/schema/address" }
  },

  "$defs": {
    "https://example.com/schema/address": {
      "$id": "https://example.com/schema/address",

      "type": "object",
      "properties": {
        "address": { "type": "string" },
        "city": { "type": "string" },
        "postalCode": { "$ref": "/schema/common#/$defs/usaPostalCode" },
        "state": { "$ref": "#/$defs/states" }
      },

      "$defs": {
        "states": {
          "enum": [4, 4]
        }
      }
    },
    "https://example.com/schema/common": {
      "$schema": "https://json-schema.org/draft/2019-09",
      "$id": "https://example.com/schema/common",

      "$defs": {
        "phone": {
          "type": "number"
        },
        "usaPostalCode": {
          "type": "string",
          "pattern": "^[0-9]{5}(?:-[0-9]{4})?$"
        },
        "unsignedInt": {
          "type": "integer",
          "minimum": 0
        }
      }
    }
  }
}

@jviotti I am not able to understand how, in this case, this $ref under:

"phone": { "$ref": "/schema/common#/$defs/phone" }

which has the relative part, gets resolved by the schema validator. I mean, how is the base URL for this calculated even if there is nothing common in the relative path under $ref and the $id of the root?

Did you try to run it? I am thinking this is related to how schemas are stored

@MeastroZI

MeastroZI commented Feb 29, 2024

@Era-cell, I have read somewhere that a $ref is resolved by pointing directly at the part of the schema it refers to. So my question is: how does the schema validator resolve this $ref with a relative path? Even if the validator stores these schemas in the definitions section, or in some other way under the hood, it still needs to resolve the $ref in order to reach them.

@Era-cell

Era-cell commented Feb 29, 2024 via email

@MeastroZI

MeastroZI commented Feb 29, 2024

The schema I provided is not failing; it works and successfully validates the JSON data.

You can try it here:
https://www.jsonschemavalidator.net/

Edited: Sorry, I am typing from my phone, so you may find typos in my messages.

@jviotti
Member Author

jviotti commented Feb 29, 2024

@MeastroZI Your reference, /schema/common#/$defs/phone is a URI reference, where /schema/common is the URI path and #/$defs/phone is the URI fragment. Furthermore, that URI reference is relative.

According to JSON Schema use of URI and the URI RFC, that relative URI is resolved taking https://example.com/anotherthing/agains/customer (the $id of the schema resource that contains such reference), as the base URI.

Following standard URI behavior, the result of resolving /schema/common#/$defs/phone against https://example.com/anotherthing/agains/customer results in https://example.com/schema/common#/$defs/phone. Then, when resolving that reference, JSON Schema will look for https://example.com/schema/common, which is an embedded schema resource in the schema you shared, and from then, resolve #/$defs/phone as a JSON Pointer.
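
The resolution steps above can be reproduced with Python's standard library, which implements RFC 3986 reference resolution in `urllib.parse.urljoin`. This is only an illustration of the URI arithmetic, not of how any particular validator is implemented.

```python
from urllib.parse import urldefrag, urljoin

base = "https://example.com/anotherthing/agains/customer"  # $id of the enclosing resource
reference = "/schema/common#/$defs/phone"                  # the relative $ref

absolute = urljoin(base, reference)
# Split into the schema resource to look up and the JSON Pointer fragment
resource, fragment = urldefrag(absolute)
```

Here `absolute` is `https://example.com/schema/common#/$defs/phone`, `resource` is `https://example.com/schema/common` (the embedded schema resource to find), and `fragment` is `/$defs/phone` (the JSON Pointer to resolve inside it).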

If URI behavior is the confusing part, I recommend reading the URI RFC: https://www.rfc-editor.org/rfc/rfc3986

@Era-cell

Era-cell commented Mar 2, 2024

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

Now, for ["aaa", "a", "bn", "an"], I thought "an" would be left unevaluated, since "a" was already taken care of.
I expected the result to be true, but it is false. If even "an" is evaluated, can I get an example where "items" is present and some values are left unevaluated?

@MeastroZI

MeastroZI commented Mar 3, 2024

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" },
            { "pattern": "^b" }
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

Just tell me one thing: is it possible for a string to start with a and simultaneously start with b? No such string exists, and that is why you are getting the error. Try this:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://example.com/lets_uneval",
    "type": "array",
    "prefixItems": [{
        "const": "aaa"
    }],
    "items": {
        "type": "string",
        "allOf": [
            { "pattern": "^a" }, 
            { "pattern": "b$" }  
        ]
    },
    "uniqueItems": true,
    "unevaluatedItems": {
        "type": "string",
        "pattern": "^an"
    }
}

On this instance:
["aaa", "aab", "aaab"]

it gives the result true. But if you add any string that does not both start with a and end with b, that element is caught by the items keyword. As I said earlier, items checks all the elements not covered by prefixItems, and does not let any element reach unevaluatedItems!

Correct me please if I am wrong 😺
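
The point about the two `allOf` patterns can be checked in isolation. This small Python sketch only tests strings against the patterns. It is not a JSON Schema validator, and it relies on Python's `re` behaving like the ECMA-262 dialect for these very simple expressions. The function name is hypothetical.

```python
import re


def satisfies_all(value: str, patterns: list[str]) -> bool:
    # JSON Schema `pattern` is unanchored, which re.search mirrors
    return all(re.search(p, value) is not None for p in patterns)


candidates = ["a", "an", "bn", "aab", "aaab"]

# No string can start with both "a" and "b"
both_prefixes = [c for c in candidates if satisfies_all(c, ["^a", "^b"])]

# Starting with "a" and ending with "b" is satisfiable
prefix_and_suffix = [c for c in candidates if satisfies_all(c, ["^a", "b$"])]
```

`both_prefixes` comes out empty, while `prefix_and_suffix` keeps only `"aab"` and `"aaab"`.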

@Era-cell

Era-cell commented Mar 3, 2024

@jviotti, I have some more questions about Alterschema:
Why are there rules from 2019 to 2019 and from 2020 to 2020? What are they needed for?
Why did you choose JSON-e over JavaScript functions? Because it was more intuitive?
Is there a need for an imperative DSL, or is a declarative DSL, like OOP, what you meant (which gives a higher level of abstraction)?
Are you going to keep using Alterschema, or will it be abandoned?

@jviotti
Member Author

jviotti commented Mar 4, 2024

@MeastroZI

@jviotti can you explain this

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "prefixItems": [{ "type": "string" }, { "type": "string" }],
  "not": {
    "items": {
      "not": { "type": "string", "minLength": 3 }
    }
  },
  "unevaluatedItems": false
}

My understanding is that it dictates that there must not be any items in the array that are strings with a length less than 3. Therefore, the schema should only accept arrays where all elements have a minimum length of 3. However, it seems to also accept arrays like ["axd", "d"]. Could you clarify this?

That schema looks overly complicated. Maybe what you want is this instead?

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "minLength": 3
  }
}

@jviotti
Member Author

jviotti commented Mar 4, 2024

@Era-cell

Also the unevaluatedItems behaviour is a bit weird:

The unevaluatedItems behavior depends on other adjacent array-related keywords. As its name implies, unevaluatedItems will only kick in for array items that have not been evaluated by adjacent array keywords, so the presence of items and prefixItems will indeed affect its behavior.

@jviotti
Member Author

jviotti commented Mar 4, 2024

@Era-cell

I have some more questions in alterschema: Why are rules mentioned 2019 to 2019, 2020 to 2020 -- what is the need of these

These perform simplifications within the same version to make it easier to process the other rules. i.e. you could simplify the use of certain keywords on the input schema without changing the version, before you attempt to upgrade it.

Why did you opt to choose json-e over javascript functions.. because it was more intuitive?

The whole point of this project is to make rule definitions programming language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable on ANY language out there. That's why the rules are pure JSON.

Is there a need of imperative DSL or is declarative DSL like OOP is what you meant (which gives higher level of abstraction) ?

Not sure I follow this. Can you give me an example?

Are you going to use alterschema or that will be abandoned?

I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine.

@Era-cell

Era-cell commented Mar 4, 2024

@Era-cell

Also the unevaluatedItems behaviour is a bit weird:

The unevaluatedItems behavior depends on other adjacent array-related keywords. As its name implies, unevaluatedItems will only kick in for array items that have not been evaluated by adjacent array keywords, so the presence of items and prefixItems will indeed affect its behavior.

@jviotti
My query on this is:
In the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated?
(Can you give an example where, even in the presence of the "items" keyword, some unevaluated values are left over?)

@jviotti
Member Author

jviotti commented Mar 4, 2024

In the presence of the items keyword, wouldn't items evaluate each and every instance value, so that none of them is left unevaluated?

Correct. Maybe this example helps clarifying that: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/unevaluatedItems.json#L64-L78

@Era-cell

Era-cell commented Mar 4, 2024

@Era-cell

I have some more questions in alterschema: Why are rules mentioned 2019 to 2019, 2020 to 2020 -- what is the need of these

These perform simplifications within the same version to make it easier to process the other rules. i.e. you could simplify the use of certain keywords on the input schema without changing the version, before you attempt to upgrade it.

Why did you opt to choose json-e over javascript functions.. because it was more intuitive?

The whole point of this project is to make rule definitions programming language agnostic. We don't want to just create an upgrade tool for JavaScript, but one that is embeddable and implementable on ANY language out there. That's why the rules are pure JSON.

Is there a need of imperative DSL or is declarative DSL like OOP is what you meant (which gives higher level of abstraction) ?

Not sure I follow this. Can you give me an example?

Are you going to use alterschema or that will be abandoned?

I will. The idea is for the JSON-based rules to be moved to the JSON Schema org while Alterschema is (one of many, potentially?) an implementation of the actual engine.

  1. Like, do we need parsers, lexers, and a new grammar defining the language, or can we use an abstraction over JSON-e or JavaScript (or any other language, to create functions with arguments) itself?

@jviotti
Member Author

jviotti commented Mar 4, 2024

@Era-cell

Like, do we need parsers, lexers, and a new grammar defining the language, or can we use an abstraction over JSON-e or JavaScript (or any other language, to create functions with arguments) itself?

It should be all JSON based. No need for a new grammar. Just use JSON's grammar. But don't embed an actual programming language like JavaScript on the JSON. JSON-e is one valid way of doing it. It expresses the transformations purely using JSON.

@Era-cell

Era-cell commented Mar 9, 2024

Hi @jviotti, when the algorithm/DSL is included in the JSON Schema org, will access to external JSON Schema documents be provided? For example:

"$ref":"other.json#/$defs/items/0"

Here the referenced schema resource isn't present in the document being altered. At this point, does the external schema document (which is an external resource) also need to be altered?

@jviotti
Member Author

jviotti commented Mar 11, 2024

Hi @Era-cell

whose schema resource isnt present in the document which is being altered, at this point the external schema document(which is external resource) also needs to be altered?

Great question! Yes on both cases:

  • A JSON Schema is allowed to externally reference another JSON Schema that makes use of a different draft. i.e. you can have a JSON Schema 2020-12 that externally references a JSON Schema Draft 4. So in that case, it is not really required to i.e. upgrade the other schema and we can simply ignore it if we don't have access to it

  • That said, while this cross-version referencing is supposed to work, I think many implementations out there don't properly support it, and the JSON Schema test suite doesn't cover it either. For these cases, what you can do is perform JSON Schema Bundling (https://json-schema.org/blog/posts/bundling-json-schema-compound-documents) before upgrading that schema. Bundling will bring all externally referenced schemas into a single schema with nested schema resources, and then we upgrade them all together

But in both cases, our upgrader shouldn't really mind. If its passed a schema with unresolved remote references, it will do what it can, and if its passed a bundled schema, it will transform the entire thing.

@MeastroZI

MeastroZI commented Mar 11, 2024

Hi @jviotti! I have one more question about bundling schemas. Can I assume that the name (key) of a schema in $defs will always be the $id of that schema, or can it be anything? For example, in this schema the names under $defs are set to the $id of each schema:

{
  "$id": "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Must be a non-negative integer",
  "$comment": "A JSON Schema Compound Document. Aka a bundled schema.",
  "$defs": {
    "https://jsonschema.dev/schemas/mixins/integer": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/integer",
      "description": "Must be an integer",
      "type": "integer"
    },
    "https://jsonschema.dev/schemas/mixins/non-negative": {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://jsonschema.dev/schemas/mixins/non-negative",
      "description": "Not allowed to be negative",
      "minimum": 0
    },
    "nonNegativeInteger": {
      "allOf": [
        {
          "$ref": "/schemas/mixins/integer"
        },
        {
          "$ref": "/schemas/mixins/non-negative"
        }
      ]
    }
  },
  "$ref": "#/$defs/nonNegativeInteger"
}

@mwadams

mwadams commented Mar 11, 2024

It can be anything.

@mwadams

mwadams commented Mar 11, 2024

(The value of the $ref is applied to the current scope and the schema is resolved from that reference.)
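
A minimal sketch of that resolution, assuming only the root and its `$defs` entries need indexing. The `index_by_id` helper is hypothetical and the bundle is a trimmed copy of the one above, with one `$defs` key deliberately renamed to show that the key plays no part in the lookup.

```python
from urllib.parse import urljoin


def index_by_id(schema, base_uri=""):
    """Map each absolute $id to its schema resource (root and $defs only)."""
    base = urljoin(base_uri, schema.get("$id", ""))
    index = {base: schema} if "$id" in schema else {}
    for sub in schema.get("$defs", {}).values():
        if isinstance(sub, dict):
            index.update(index_by_id(sub, base))
    return index


ROOT = "https://jsonschema.dev/schemas/examples/non-negative-integer-bundle"
bundle = {
    "$id": ROOT,
    "$defs": {
        # The key is arbitrary; only the embedded $id matters for lookup.
        "anything": {
            "$id": "https://jsonschema.dev/schemas/mixins/integer",
            "type": "integer",
        },
        "nonNegativeInteger": {"allOf": [{"$ref": "/schemas/mixins/integer"}]},
    },
}

index = index_by_id(bundle)
ref = bundle["$defs"]["nonNegativeInteger"]["allOf"][0]["$ref"]
# nonNegativeInteger has no $id of its own, so its base URI is the root's.
target = index[urljoin(ROOT, ref)]
```

Here `target` is the embedded integer schema, found through its `$id` rather than through its `$defs` key.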

@Era-cell

Era-cell commented Mar 11, 2024

Okay, so if we have access to the external resource and it is resolved, we don't change the external schema itself; we bundle it into the present document instead, right?
That is because the user may use the external schema for other purposes too, right?

@jviotti
Member Author

jviotti commented Mar 11, 2024

Keep in mind the project would not be able to "modify" any schema in place. What it does is create a copy of the input schema with the given transformations. So:

  • If the schema is bundled, you transform the entire thing, including the bundled resources
  • If the schema is NOT bundled, you just transform the immediate schema only
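
The copy-then-transform behaviour could look roughly like the sketch below. It assumes bundled resources sit under `$defs`, and `bump_dialect` is only a stand-in for a real rule set: it retargets `$schema` and nothing else. The helper names and the example.com URIs are hypothetical.

```python
import copy

DRAFT_2019_09 = "https://json-schema.org/draft/2019-09/schema"
DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def bump_dialect(schema):
    """Stand-in for a real rule set: only retargets the $schema keyword."""
    if schema.get("$schema") == DRAFT_2019_09:
        schema["$schema"] = DRAFT_2020_12
    return schema


def transform_copy(schema, transform_resource):
    """Return a transformed deep copy; the input schema is never mutated."""
    result = transform_resource(copy.deepcopy(schema))
    defs = result.get("$defs")
    if isinstance(defs, dict):
        for name, sub in defs.items():
            if isinstance(sub, dict):
                # Bundled resources live under $defs, so they are transformed too.
                defs[name] = transform_copy(sub, transform_resource)
    return result


bundled = {
    "$schema": DRAFT_2019_09,
    "$id": "https://example.com/root",
    "$ref": "https://example.com/other",
    "$defs": {
        "https://example.com/other": {
            "$schema": DRAFT_2019_09,
            "$id": "https://example.com/other",
            "type": "string",
        }
    },
}
# Not bundled: the remote reference is simply left as it is.
unbundled = {"$schema": DRAFT_2019_09, "$ref": "other.json#/$defs/items/0"}

upgraded_bundle = transform_copy(bundled, bump_dialect)
upgraded_plain = transform_copy(unbundled, bump_dialect)
```

In both cases the caller gets a new schema back and the original document stays untouched, so it remains usable for other purposes.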

@benjagm
Collaborator

benjagm commented Mar 18, 2024

🚩 IMPORTANT INSTRUCTIONS REGARDING HOW AND WHERE TO SUBMIT YOUR APPLICATION 🚩

Please join this discussion in the JSON Schema Slack to get the latest, very important details on how best to submit your application to JSON Schema.

See communication here.

@Era-cell

Hi, @jviotti where should the qualification task be submitted, and what is the deadline for it?

@jviotti
Member Author

jviotti commented Mar 18, 2024

@Era-cell I believe there is a GSoC portal that you should use. @benjagm Can you clarify?

@Era-cell

Era-cell commented Mar 18, 2024

@Era-cell I believe there is a GSoC portal that you should use. @benjagm Can you clarify?

@jviotti I guess that is for the proposal. Should I embed the qualification task inside the proposal itself?
@benjagm

@benjagm
Collaborator

benjagm commented Mar 18, 2024

@Era-cell yes please. Make sure you add the details of the qualification task to the proposal! Feel free to join the #gsoc channel in our Slack workspace to get an immediate response to these types of questions.

@MeastroZI

Hi @jviotti,

First of all, I apologize for using the Alterschema UI to display my DSL transformation. It's only temporary!

Could you please review the transformation from the 2019-09 draft to the 2020-12 draft on this site? I've embedded the qualification task's DSL transformation code and have tried my best to cover all edge cases. However, if I've missed any, please let me know.

@jviotti
Member Author

jviotti commented Mar 23, 2024

@MeastroZI Not much I can comment on given a single example, but looking forward to the explanations, proposed rules, etc in the proposal!

@MeastroZI

@jviotti, I submitted my proposal (Name: Pandit Vinit) to JSON Schema. Could you please review it and provide any suggestions if possible?

@jviotti
Member Author

jviotti commented Mar 26, 2024

I will, thanks a lot for the submission! ❤️

@MeastroZI

MeastroZI commented Mar 28, 2024

@jviotti In the 2019-09 draft I am not able to find any difference between additionalItems and unevaluatedItems.
Here it is written that unevaluatedItems is "Similar to additionalItems, but can "see" into subschemas and across references", but when I tested the schema below, additionalItems also did all of this.
Here is the example:

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$defs": {
    "stringArray": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "numberArray": {
      "oneOf": [
        {
          "type": "array",
          "items": [
            {
              "type": "number"
            },
            {
              "$ref": "#/$defs/stringArray"
            }
          ]
        },
        {
          "type": "boolean"
        }
      ]
    }
  },
  "type": "array",
  "items": [
    {
      "$ref": "#/$defs/stringArray"
    }
  ],
  "additionalItems": {
    "$ref": "#/$defs/numberArray"
  }
}

Validated against: [[""], [5, [""]]] and [[""], true]

So my question is: what is the difference between additionalItems and unevaluatedItems in the 2019-09 draft, and is there an example schema that shows the difference between them?

@jviotti
Member Author

jviotti commented Mar 28, 2024

@MeastroZI Take a look at the official test suite examples: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2019-09/unevaluatedItems.json. additionalItems matches any array element not covered by an adjacent items. unevaluatedItems applies to array items that were not evaluated (as its name implies) by any other relevant keyword (whether adjacent or not).
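
To make the adjacent vs. non-adjacent distinction concrete, here is a toy evaluator for a tiny subset of 2019-09 (`type`, `allOf`, `items`, `additionalItems`, `unevaluatedItems`). It is a hypothetical sketch for illustration only, not a conforming validator. The two schemas differ only in the last keyword, and the tuple-form `items` is nested inside `allOf` rather than adjacent.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
}


def evaluate(schema, instance):
    """Return (valid, evaluated item indices) for a tiny 2019-09 subset."""
    if isinstance(schema, bool):
        return schema, set()
    valid, seen = True, set()
    if "type" in schema:
        valid = TYPE_CHECKS[schema["type"]](instance)
    for sub in schema.get("allOf", []):
        ok, sub_seen = evaluate(sub, instance)
        valid, seen = valid and ok, seen | sub_seen
    if isinstance(instance, list):
        items = schema.get("items")
        if isinstance(items, list):
            for i, (sub, item) in enumerate(zip(items, instance)):
                valid = evaluate(sub, item)[0] and valid
                seen.add(i)
            # additionalItems only looks at the adjacent array-form items.
            if "additionalItems" in schema:
                for i in range(len(items), len(instance)):
                    valid = evaluate(schema["additionalItems"], instance[i])[0] and valid
                    seen.add(i)
        elif items is not None:
            for i, item in enumerate(instance):
                valid = evaluate(items, item)[0] and valid
                seen.add(i)
        # unevaluatedItems also sees indices evaluated inside allOf.
        for i, item in enumerate(instance):
            if "unevaluatedItems" in schema and i not in seen:
                valid = evaluate(schema["unevaluatedItems"], item)[0] and valid
                seen.add(i)
    return valid, seen


nested_tuple = {"items": [{"type": "string"}]}
with_additional = {"allOf": [nested_tuple], "additionalItems": False}
with_unevaluated = {"allOf": [nested_tuple], "unevaluatedItems": False}

additional_ok, _ = evaluate(with_additional, ["a", 1])
unevaluated_ok, _ = evaluate(with_unevaluated, ["a", 1])
```

With no adjacent `items`, `additionalItems` is ignored and `["a", 1]` passes. `unevaluatedItems` knows that only index 0 was evaluated inside `allOf`, so it rejects the extra element.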

@MeastroZI

MeastroZI commented Apr 4, 2024

@jviotti, I need some direction on how to approach downgrading JSON Schema. Is it even possible to do this for all the dialects? With each new version, new keywords are introduced, and I'm unsure whether it's feasible to replicate their behavior using the previous version.

Regarding upgrading, I've developed the DSL, and I believe it's capable of handling all upgrades. Please review the recent changes I made in the repository and provide feedback if possible.

@jviotti
Member Author

jviotti commented Apr 4, 2024

@MeastroZI It is not always feasible, but I think you can go a long way with it, and we can think about how to handle the problematic cases. I think if the resulting downgraded schema accepts a superset of what the original schema accepts (i.e. it doesn't add more constraints), then it's probably acceptable.
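
One way to picture the superset idea is the sketch below, which drops `unevaluatedItems` and `unevaluatedProperties` (keywords with no Draft 7 counterpart) only in positions where loosening a subschema can only loosen the whole schema. The `relax` helper and its keyword sets are hypothetical. It does not follow `$ref`, so it assumes no reference from a negated context points into a relaxed subschema, and it is only the "drop" step, not a complete downgrade to Draft 7.

```python
import copy

# Keywords with no Draft 7 counterpart in this sketch.
DROPPED = {"unevaluatedItems", "unevaluatedProperties"}
# Positions where loosening a subschema can only loosen the parent.
SCHEMA_MAPS = {"properties", "patternProperties"}
SCHEMA_LISTS = {"allOf", "anyOf"}
SINGLE_SCHEMAS = {"additionalItems", "additionalProperties"}


def relax(schema):
    """Return a copy without DROPPED keywords in safe (monotone) positions."""
    if not isinstance(schema, dict):
        return copy.deepcopy(schema)
    result = {}
    for key, value in schema.items():
        if key in DROPPED:
            continue
        if key in SCHEMA_MAPS and isinstance(value, dict):
            result[key] = {name: relax(sub) for name, sub in value.items()}
        elif (key in SCHEMA_LISTS or key == "items") and isinstance(value, list):
            result[key] = [relax(sub) for sub in value]
        elif key in SINGLE_SCHEMAS or key == "items":
            result[key] = relax(value)
        else:
            # not, oneOf, if/then/else, contains, $defs, ... are copied as-is:
            # loosening a subschema there could tighten the overall schema.
            result[key] = copy.deepcopy(value)
    return result


schema = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": [{"type": "string"}],
            "unevaluatedItems": False,
        }
    },
    "unevaluatedProperties": False,
    "not": {"unevaluatedProperties": False},
}
relaxed = relax(schema)
```

The keyword under `not` is left in place on purpose: removing a constraint inside a negation would make the overall schema stricter, which is exactly what the superset criterion forbids. Cases like that are the problematic ones that would need separate handling.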
