JSON Schema for pypyr YAML files #229

vlcinsky · 2021-08-13T13:11:20Z

Having a JSON Schema for pypyr YAML configuration files would be handy. Many editors (at least my neovim) can validate a file while you author it if a JSON Schema is available.

My question is: is there already a JSON Schema for pypyr in place? My quick research did not reveal any.

If it does not exist, I am considering authoring one (pydantic makes it very simple to generate such a schema).

yaythomas · 2021-08-13T13:37:27Z

great idea @vlcinsky! 🎉

is your idea to create a model file for Pydantic somewhere like schema/pipelinemodel.py?

sadly there isn't a one-stop "valid" pipeline schema. . . but the valid keys & structure are fully documented:

the top-level pipeline structure keys look like this: https://pypyr.io/docs/pipelines/pipeline-structure/. keep in mind that a user can create custom-step groups, so it's not just context_parser, steps, on_success & on_failure that are valid as top-level keys, you could easily have "mycustomgroup". The only mandatory key is steps.
and these are all the possible attributes for a step (including default values) - AKA decorators: https://pypyr.io/docs/decorators/
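To make the two points above concrete, here is a minimal sketch of a top-level schema written as a plain Python dict: only steps is required, and any other top-level key is accepted as a custom step group as long as it is a list. The names `PIPELINE_SCHEMA`, `STEP_LIST` and `check_top_level` are hypothetical, and the hand-rolled check is only a simplified stand-in for a real JSON Schema validator so that the example has no dependencies.

```python
import json

# Hypothetical top-level schema: only "steps" is mandatory, and any
# unknown key is treated as a custom step group (a list of steps).
STEP_LIST = {"type": "array", "items": {"type": ["string", "object"]}}

PIPELINE_SCHEMA = {
    "type": "object",
    "properties": {
        "context_parser": {"type": "string"},
        "steps": STEP_LIST,
        "on_success": STEP_LIST,
        "on_failure": STEP_LIST,
    },
    "additionalProperties": STEP_LIST,
    "required": ["steps"],
}


def check_top_level(pipeline):
    """Return a list of problems; a simplified stand-in for a schema validator."""
    if not isinstance(pipeline, dict):
        return ["pipeline must be a mapping"]
    errors = [f"missing required key: {key}"
              for key in PIPELINE_SCHEMA["required"] if key not in pipeline]
    for key, value in pipeline.items():
        if key == "context_parser":
            if not isinstance(value, str):
                errors.append("context_parser must be a string")
        elif not isinstance(value, list):
            # Every other top-level key is a step group, built-in or custom.
            errors.append(f"{key} must be a list of steps")
    return errors


pipeline = {"steps": ["pypyr.steps.echo"], "mycustomgroup": ["pypyr.steps.echo"]}
print(check_top_level(pipeline))  # []
print(json.dumps(PIPELINE_SCHEMA, indent=2))
```

The `additionalProperties` entry is what lets "mycustomgroup" through; a schema that lists only the four well-known keys would reject it.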

Also keep in mind that YAML (and also pypyr!) supports YAML references/anchors - I don't know how/if this works with a schema.
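On the anchors question, a small sketch may help. It assumes PyYAML (pypyr's own loader may behave differently) and uses made-up pipeline content. With this kind of loader, aliases are resolved while parsing, so a validator that runs on the loaded data only sees the expanded structure. Editor-side validators work on the YAML text itself, and how they treat anchors can vary.

```python
import yaml  # PyYAML, assumed to be installed

# Made-up pipeline text: the "steps" group reuses the list anchored as &shared.
TEXT = """
shared_steps: &shared
  - pypyr.steps.echo
steps: *shared
"""

data = yaml.safe_load(TEXT)

# The alias is already expanded into ordinary Python data at this point,
# so anything validating `data` only ever sees plain lists and strings.
print(data["steps"])
```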

vlcinsky · 2021-08-20T20:39:40Z

@yaythomas thanks for the hints. Last weekend I worked on it a bit. It is probably doable, but you are right that it is an extensive task, and there is a risk that it would not serve as well as it could.

For that reason I am putting this task on hold on my side.

My current attempt (in the form of a pytest test file) looks as follows:

from typing import List, Mapping, Optional, Union, Any

import jsonschema
import pytest
import yaml
from pydantic import BaseModel, Field, constr

pymodule = constr(regex=r"^[a-z][a-z]*(\.[a-z][a-z0-9]*)*$")
substring = constr(regex=r".*{.+}.*")  # expect at least one pair of {}
pystring = constr(regex=r"^!py .+")  # line starting with '!py '


class Step(BaseModel):
    name: str
    description: Optional[str]
    comment: Optional[str]
    incontext: Mapping = Field(alias="in")
    run: bool
    skip: bool
    swallow: bool
    foreach: Optional[Union[List, substring, pystring]]
    onError: Optional[Any]


class MainModel(BaseModel):
    """
    This is the description of the main model
    """

    context_parser: pymodule
    steps: List[Union[pymodule, Step]]
    on_success: Optional[List[Union[pymodule, Step]]]
    on_failure: Optional[List[Union[pymodule, Step]]]

    class Config:
        title = "Main"


@pytest.fixture
def schema():
    return MainModel.schema()


@pytest.fixture
def data():
    fname = "tests/data/pipelinename.yaml"
    with open(fname, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
    return data


def test_schema(schema, data):
    print(schema)
    jsonschema.validate(schema=schema, instance=data)

and the schema looks as follows:

{
  "title": "Main",
  "description": "This is the description of the main model",
  "type": "object",
  "properties": {
    "context_parser": {
      "title": "Context Parser",
      "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$",
      "type": "string"
    },
    "steps": {
      "title": "Steps",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    },
    "on_success": {
      "title": "On Success",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    },
    "on_failure": {
      "title": "On Failure",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    }
  },
  "required": [
    "context_parser",
    "steps"
  ],
  "definitions": {
    "Step": {
      "title": "Step",
      "type": "object",
      "properties": {
        "name": {
          "title": "Name",
          "type": "string"
        },
        "description": {
          "title": "Description",
          "type": "string"
        },
        "comment": {
          "title": "Comment",
          "type": "string"
        },
        "in": {
          "title": "In",
          "type": "object"
        },
        "run": {
          "title": "Run",
          "type": "boolean"
        },
        "skip": {
          "title": "Skip",
          "type": "boolean"
        },
        "swallow": {
          "title": "Swallow",
          "type": "boolean"
        },
        "foreach": {
          "title": "Foreach",
          "anyOf": [
            {
              "type": "array",
              "items": {}
            },
            {
              "type": "string",
              "pattern": ".*{.+}.*"
            },
            {
              "type": "string",
              "pattern": "^!py .+"
            }
          ]
        },
        "onError": {
          "title": "Onerror"
        }
      },
      "required": [
        "name",
        "in",
        "run",
        "skip",
        "swallow"
      ]
    }
  }
}

Keep in mind, this is WIP, it is definitely not complete.

yaythomas · 2021-08-20T21:15:28Z

Thanks so much @vlcinsky , this looks great! The regex-ing to make a custom "type" for a py module and pypyr-style !py strings is a very nice touch!

Yes, agree with you, this is very much an extensive task. To illustrate this even more, we've not even talked about individual step inputs themselves yet. . . because from experience/feedback I've received in real-world usage, most mistakes in yaml-authoring actually happen in the in section of each step - because each step takes its own shape of input arguments. But one step at a time!
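One way to picture the per-step problem is a lookup from step name to the input keys that step expects. The sketch below is only an illustration: the registry `REQUIRED_IN_KEYS` and the helper `check_step_inputs` are hypothetical, and the two entries should be checked against the pypyr step documentation.

```python
# Hypothetical registry: step name -> keys its "in" mapping must contain.
# The two entries are illustrative; check them against the pypyr step docs.
REQUIRED_IN_KEYS = {
    "pypyr.steps.echo": {"echoMe"},
    "pypyr.steps.cmd": {"cmd"},
}


def check_step_inputs(step):
    """Return the required "in" keys that are missing for a known step."""
    required = REQUIRED_IN_KEYS.get(step.get("name"), set())
    provided = set(step.get("in") or {})
    return sorted(required - provided)


good = {"name": "pypyr.steps.echo", "in": {"echoMe": "hello"}}
typo = {"name": "pypyr.steps.echo", "in": {"echome": "hello"}}
print(check_step_inputs(good))  # []
print(check_step_inputs(typo))  # ['echoMe']
```

A real check would have to be more lenient than this, because values can also reach a step through context that was set earlier in the pipeline rather than through its own in section.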

So what you've done already is a great start - if nothing else, I'm sure at some point it will be useful to someone at least to have the broad outline of the schema for the over-all structure like you've done here, even if the individual built-in steps are out of scope.

As useful as a schema like this will be for those who like a more "full" IDE experience. . . there is actually another purpose too, to do validation of pipelines before/without actually running them, per ref #116. I've not fully (haha, or at all, really) thought through how this should work best, but you've definitely given me food for thought here that we might be able to serve both objectives at the same time.

Thank you again!

vlcinsky · 2021-08-20T21:38:14Z

@yaythomas you are welcome.

I am astonished by pydantic too. It looks like magic that it can turn classes into a JSON Schema, including rules such as regular expressions.

So far there can be two goals:

[authoring] support users editing the YAML document by giving them hints and highlighting possible errors. If we have a schema and it is provided to the editor, some editors can provide such a service.
[validation] validate the resulting document for errors.

I would say that [authoring] could be feasible.
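For the [authoring] goal, the schema mostly needs to exist as a file the editor can find. A minimal sketch, assuming a heavily shortened stand-in for the generated schema and a hypothetical file name `pypyr-pipeline.schema.json`:

```python
import json
import tempfile
from pathlib import Path

# Heavily shortened stand-in for the generated schema shown earlier in the thread.
schema = {
    "title": "Main",
    "type": "object",
    "properties": {"steps": {"title": "Steps", "type": "array"}},
    "required": ["steps"],
}

# Hypothetical file name; written to a temp dir so the example has no side effects.
out_path = Path(tempfile.mkdtemp()) / "pypyr-pipeline.schema.json"
out_path.write_text(json.dumps(schema, indent=2), encoding="utf-8")

# Round-trip to confirm the file is valid JSON that an editor could load.
loaded = json.loads(out_path.read_text(encoding="utf-8"))
print(out_path.name, sorted(loaded["properties"]))
```

How the file is wired up depends on the editor. Editors built on yaml-language-server, for example, can associate a schema with a document through a `# yaml-language-server: $schema=<path or URL>` comment at the top of the YAML file.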

[validation] could turn out to be very difficult or impossible. If we aim for this, we would need custom validators, which would not be reflected in the JSON Schema.
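An example of a rule that a JSON Schema pattern cannot express is whether a step name actually resolves to an importable module. The sketch below uses only the standard library; the helper name `step_is_importable` is hypothetical, and pypyr's own step lookup may consider more locations than the current Python path.

```python
import importlib.util


def step_is_importable(name):
    """True if `name` resolves to a module on the current Python path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


# "json" is in the standard library; the second name is deliberately bogus.
print(step_is_importable("json"))                    # True
print(step_is_importable("no_such_pkg_xyz.mystep"))  # False
```

A check like this has to run in the environment where the pipeline will execute, which is one reason it cannot live in a static schema handed to an editor.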

Handy links:

use pydantic to create JSON schema
extensive pydantic set of field types - these shall be reflected in the JSON schema
code generator - it should allow converting a YAML doc into whatever, incl. pydantic Python code (I have discovered it just now)
pydantic custom validators - note that these will not be reflected in the resulting JSON schema
pydantic for Settings management - not related to this issue

yaythomas · 2021-08-20T23:16:00Z

the links are very handy indeed, thank you!

yaythomas added this to To do in pypyr roadmap via automation Aug 13, 2021

yaythomas added the enhancement new features label Aug 13, 2021

yaythomas assigned vlcinsky Aug 13, 2021

yaythomas added the type: env system, environmental, python runtime label Aug 13, 2021

yaythomas added the on hold waiting on something else label Aug 20, 2021

lucasrcezimbra mentioned this issue Sep 25, 2023

Models and loader #332

Closed
