JSON Schema for pypyr YAML files #229

Open
vlcinsky opened this issue Aug 13, 2021 · 5 comments
Labels: enhancement (new features), on hold (waiting on something else), type: env (system, environmental, python runtime)

Comments

@vlcinsky
Contributor

Having a JSON schema for pypyr YAML configuration files would be handy. Many editors (at least my neovim) can be configured to validate a file as you write it if a JSON schema is available.

My question is: is there already a JSON schema for pypyr? My quick search did not reveal one.

If it does not exist, I am considering authoring one (pydantic makes it very simple to generate such a schema).

@yaythomas yaythomas added this to To do in pypyr roadmap via automation Aug 13, 2021
@yaythomas yaythomas added the enhancement new features label Aug 13, 2021
@yaythomas
Member

yaythomas commented Aug 13, 2021

great idea @vlcinsky! 🎉

is your idea to create a model file for Pydantic somewhere like schema/pipelinemodel.py?

sadly there isn't a one-stop "valid" pipeline schema. . . but the valid keys & structure are fully documented:

  1. the top-level pipeline structure keys look like this: https://pypyr.io/docs/pipelines/pipeline-structure/. keep in mind that a user can create custom-step groups, so it's not just context_parser, steps, on_success & on_failure that are valid as top-level keys, you could easily have "mycustomgroup". The only mandatory key is steps.

  2. and these are all the possible attributes for a step (including default values) - AKA decorators: https://pypyr.io/docs/decorators/
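The top-level rule in point 1 can be sketched as a plain-Python check. This is only an illustration, not part of pypyr: `check_pipeline` and `NON_GROUP_KEYS` are hypothetical names, and the sketch assumes `context_parser` is the only top-level key that is not a step group, which may not hold for every pypyr version.

```python
from typing import Any

# Assumption: context_parser is the only top-level key that is not a step
# group. Every other key (steps, on_success, on_failure, mycustomgroup, ...)
# is treated as a list of steps.
NON_GROUP_KEYS = {"context_parser"}


def check_pipeline(pipeline: dict[str, Any]) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    errors: list[str] = []
    if "steps" not in pipeline:
        errors.append("missing mandatory key 'steps'")
    parser = pipeline.get("context_parser")
    if parser is not None and not isinstance(parser, str):
        errors.append("'context_parser' must be a string")
    for key, group in pipeline.items():
        if key in NON_GROUP_KEYS:
            continue
        if not isinstance(group, list):
            errors.append(f"step group '{key}' must be a list")
            continue
        for index, step in enumerate(group):
            # A step is either a bare module name or a mapping with 'name'.
            if isinstance(step, str):
                continue
            if isinstance(step, dict) and "name" in step:
                continue
            errors.append(
                f"{key}[{index}] is neither a string nor a mapping with 'name'"
            )
    return errors


pipeline = {
    "context_parser": "pypyr.parser.keyvaluepairs",
    "steps": ["pypyr.steps.echo"],
    "mycustomgroup": [{"name": "pypyr.steps.echo", "in": {"echoMe": "hi"}}],
}
print(check_pipeline(pipeline))
```

Because custom group names are arbitrary, the check accepts any extra key as long as its value is a list of steps; a JSON schema would need an equivalent "any additional property" rule rather than a fixed list of keys.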

Also keep in mind that YAML (and also pypyr!) supports YAML references/anchors - I don't know how/if this works with a schema.

@yaythomas yaythomas added the type: env system, environmental, python runtime label Aug 13, 2021
@vlcinsky
Contributor Author

vlcinsky commented Aug 20, 2021

@yaythomas thanks for the hints. Last weekend I worked on it a bit. It is probably doable, but you are right that it is an extensive task, and there is a risk that the result would not serve as well as it could.

For that reason I am putting this task on hold on my side.

My current attempt (in the form of a pytest test file) looks as follows:

from typing import List, Mapping, Optional, Union, Any

import jsonschema
import pytest
import yaml
from pydantic import BaseModel, Field, constr

pymodule = constr(regex=r"^[a-z][a-z]*(\.[a-z][a-z0-9]*)*$")
substring = constr(regex=r".*{.+}.*")  # expect at least one pair of {}
pystring = constr(regex=r"^!py .+")  # line starting with '!py '


class Step(BaseModel):
    name: str
    description: Optional[str]
    comment: Optional[str]
    incontext: Mapping = Field(alias="in")
    run: bool
    skip: bool
    swallow: bool
    foreach: Optional[Union[List, substring, pystring]]
    onError: Optional[Any]


class MainModel(BaseModel):
    """
    This is the description of the main model
    """

    context_parser: pymodule
    steps: List[Union[pymodule, Step]]
    on_success: Optional[List[Union[pymodule, Step]]]
    on_failure: Optional[List[Union[pymodule, Step]]]

    class Config:
        title = "Main"


@pytest.fixture
def schema():
    return MainModel.schema()


@pytest.fixture
def data():
    fname = "tests/data/pipelinename.yaml"
    with open(fname, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
    return data


def test_schema(schema, data):
    print(schema)
    jsonschema.validate(schema=schema, instance=data)

and the schema looks as follows:

{
  "title": "Main",
  "description": "This is the description of the main model",
  "type": "object",
  "properties": {
    "context_parser": {
      "title": "Context Parser",
      "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$",
      "type": "string"
    },
    "steps": {
      "title": "Steps",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    },
    "on_success": {
      "title": "On Success",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    },
    "on_failure": {
      "title": "On Failure",
      "type": "array",
      "items": {
        "anyOf": [
          {
            "type": "string",
            "pattern": "^[a-z][a-z]*(\\.[a-z][a-z0-9]*)*$"
          },
          {
            "$ref": "#/definitions/Step"
          }
        ]
      }
    }
  },
  "required": [
    "context_parser",
    "steps"
  ],
  "definitions": {
    "Step": {
      "title": "Step",
      "type": "object",
      "properties": {
        "name": {
          "title": "Name",
          "type": "string"
        },
        "description": {
          "title": "Description",
          "type": "string"
        },
        "comment": {
          "title": "Comment",
          "type": "string"
        },
        "in": {
          "title": "In",
          "type": "object"
        },
        "run": {
          "title": "Run",
          "type": "boolean"
        },
        "skip": {
          "title": "Skip",
          "type": "boolean"
        },
        "swallow": {
          "title": "Swallow",
          "type": "boolean"
        },
        "foreach": {
          "title": "Foreach",
          "anyOf": [
            {
              "type": "array",
              "items": {}
            },
            {
              "type": "string",
              "pattern": ".*{.+}.*"
            },
            {
              "type": "string",
              "pattern": "^!py .+"
            }
          ]
        },
        "onError": {
          "title": "Onerror"
        }
      },
      "required": [
        "name",
        "in",
        "run",
        "skip",
        "swallow"
      ]
    }
  }
}

Keep in mind, this is WIP, it is definitely not complete.
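To see what the three custom string patterns above accept and reject, they can be exercised directly with the standard `re` module. This is a rough sketch: the helper names `is_module`, `has_substitution` and `is_py_string` are made up for illustration, and it uses `re.match`, whereas JSON Schema validators apply `pattern` as an unanchored search (the results should agree for these three patterns on single-line strings).

```python
import re

# The same patterns as in the pydantic model above.
PYMODULE = re.compile(r"^[a-z][a-z]*(\.[a-z][a-z0-9]*)*$")
SUBSTRING = re.compile(r".*{.+}.*")
PYSTRING = re.compile(r"^!py .+")


def is_module(value: str) -> bool:
    """True if value looks like a dotted lowercase module path."""
    return PYMODULE.match(value) is not None


def has_substitution(value: str) -> bool:
    """True if value contains at least one non-empty {...} pair."""
    return SUBSTRING.match(value) is not None


def is_py_string(value: str) -> bool:
    """True if value starts with '!py ' followed by an expression."""
    return PYSTRING.match(value) is not None


for name in ["pypyr.steps.echo", "mypackage.my_step", "step2", "pkg.step2"]:
    print(name, is_module(name))
for text in ["echo {name}", "{}", "plain text"]:
    print(repr(text), has_substitution(text))
print(is_py_string("!py len(items) > 0"))
```

Two limits of the WIP module pattern show up here: underscores are rejected everywhere, and digits are rejected in the first path segment, so names such as `mypackage.my_step` or `step2` would fail validation even though they are valid Python module names.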

@yaythomas
Member

Thanks so much @vlcinsky, this looks great! The regex-ing to make a custom "type" for a py module and pypyr-style !py strings is a very nice touch!

Yes, agree with you, this is very much an extensive task. To illustrate this even more, we've not even talked about individual step inputs themselves yet. . . because from experience/feedback I've received in real-world usage, most of the mistakes in yaml-authoring actually happen in the "in" section of each step - because each step takes its own shape of input arguments. But one step at a time!
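As a rough sketch of what "each step takes its own shape of input arguments" could mean for a checker, the per-step rules might live in a lookup table keyed by step name. Everything here is hypothetical (`REQUIRED_INPUTS`, `missing_inputs`), and the table lists only two built-in steps for illustration. A step can also read its inputs from context set by earlier steps, so a key missing from `in` is a hint, not proof of an error.

```python
# Hypothetical table: step name -> keys the step expects to find.
# Only two built-in steps are listed, purely as an illustration.
REQUIRED_INPUTS: dict[str, set[str]] = {
    "pypyr.steps.echo": {"echoMe"},
    "pypyr.steps.cmd": {"cmd"},
}


def missing_inputs(step: dict) -> set[str]:
    """Return expected keys that the step's 'in' mapping does not provide."""
    required = REQUIRED_INPUTS.get(step.get("name"), set())
    provided = set(step.get("in") or {})
    return required - provided


step = {"name": "pypyr.steps.cmd", "in": {"save": True}}
print(missing_inputs(step))
```

Steps that are not in the table, such as custom steps, simply produce no findings, which keeps the check useful for authoring hints without rejecting pipelines it knows nothing about.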

So what you've done already is a great start - if nothing else, I'm sure at some point it will be useful to someone at least to have the broad outline of the schema for the over-all structure like you've done here, even if the individual built-in steps are out of scope.

As useful as a schema like this will be for those who like a more "full" IDE experience. . . there is actually another purpose too, to do validation of pipelines before/without actually running them, per ref #116. I've not fully (haha, or at all, really) thought through how this should work best, but you've definitely given me food for thought here that we might be able to serve both objectives at the same time.

Thank you again!

@yaythomas yaythomas added the on hold waiting on something else label Aug 20, 2021
@vlcinsky
Contributor Author

@yaythomas you are welcome.

I am astonished by pydantic too. It looks like magic that it can turn the classes into a JSON schema, including rules such as regular expressions.

So far there can be two goals:

  1. [authoring] support users editing the YAML document by giving them hints and highlighting possible errors. If a schema exists and is supplied to the editor, some editors can provide this service.
  2. [validation] validate resulting document for errors.

I would say that [authoring] could be feasible.

[validation] could turn out to be very difficult or impossible. If we aim for this, we would need custom validators, which would not be reflected in the JSON schema.

Handy links:

@yaythomas
Member

the links are very handy indeed, thank you!
