Umbrella issue for defining schema in some common machine readable format #124

Open
programmer04 opened this issue Apr 25, 2022 · 15 comments
Labels
enhancement New feature or request question Further information is requested schema Shape of YAMLs

Comments

@programmer04
Member

programmer04 commented Apr 25, 2022

Problem to solve

As discussed many times, having a definition of the schema in some parsable format is highly desirable. Let's discuss possible solutions here, weigh their pros and cons, and decide which one we want to choose.

Please keep in mind that YAML is only the format we use to describe this because it's typical for the configuration/infrastructure world, but our YAMLs are convertible to JSON. Furthermore, the platform APIs will probably expect those definitions in JSON anyway. Basically, YAMLs are more human-readable than a bare schema. It should be easy to validate YAMLs against the schema defined with the chosen solution.

Nice to have

  • popular / widely supported
  • support validation (conditional validation is required)
  • support aggregation (define a piece of schema and reuse it)
  • code generation for popular languages
  • use it seamlessly in https://github.com/OpenSLO/oslo
  • anything else?

Proposal

@programmer04 added the enhancement (New feature or request) and question (Further information is requested) labels on Apr 25, 2022
@proffalken
Collaborator

OpenAPI gets my vote given that it will auto-document in Backstage.io: https://backstage.io/docs/features/software-catalog/descriptor-format#kind-api

@hob

hob commented May 3, 2022

OpenAPI is meant more for speccing APIs, no? This seems more like a request for a way to spec schemas. There are a couple of interesting PRs to that effect open:

#129
#113

@ian-bartholomew
Contributor

I agree that OpenAPI is more focused on APIs, so I don't think that's the right fit.

I don't think this has to be an either/or proposition, though. I like the idea of having the spec in Go structs in here from @kenfinnigan's PR #113. We could use those to build on the work that @CRThaze did in #129 and generate the json-schema from the structs (probably via oslo), as well as cue.

@hob

hob commented May 3, 2022

fwiw, I've already used @kenfinnigan's Go structs with great success to generate OpenSLO yaml docs from a DB of SLOs that I've been maintaining.

@kenfinnigan
Collaborator

Thanks @ian-bartholomew

I agree it's not a one-and-done situation. All we need is a central mechanism to define the schema; CI processes can then generate the other formats as part of a release or through other tools, whichever we prefer.

@proffalken
Collaborator

> OpenAPI is meant more for speccing APIs, no? This seems more like a request for a way to spec schemas. There are a couple of interesting PRs to that effect open:
>
> #129 #113

OK, yeah, that's a good point, I'll drop my request for OpenAPI and instead add my voice to the "Let's make Oslo generate them from the Structs" perspective. :)

@CRThaze
Contributor

CRThaze commented Jun 15, 2022

So now that we've merged the JSON-Schema definitions, I've been looking at ways we might leverage them. I think @kenfinnigan is correct in their comment. We should decide what the ultimate source of truth for a common machine-readable format will be before we stumble too far down the road in one direction or another.

Currently, we have two handcrafted machine-readable definitions:

I'm of the opinion that we should hopefully be maintaining only one of these two, and deriving the other from it. This will avoid drift between them and reduce the complexity of future changes.

Though as the author of the JSON-Schema approach (and as someone who also needs the spec defined as JSON-Schema draft-07 for my Backstage work) I'm not an unbiased commenter; I'd like to make a pitch for having JSON-Schema be that primary source of truth.

Oslo Validation

99% of the validations we want Oslo to perform (and possibly more than Oslo supports today) are already implemented in the unit-test code checked into this repo, using far less code (and less still if you ignore the test table of inputs in that file). This is a strong advantage of using a broadly adopted and language-agnostic definition for the schema. However, as I pointed out in the PR, there are a couple of rules defined in the spec which are difficult to validate using a static definition like JSON-Schema. Those remaining rules are where additional logic will need to be written in Oslo. Yet the overall code footprint of Oslo could be greatly reduced by reusing the logic from the unit test, and then running targeted checks on certain fields, after deserializing a document into a generic map[string]interface{}, for the remaining validations.
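
To make the "targeted checks on a generic map" idea concrete, here is a minimal sketch. The rule it enforces (exactly one of ratioMetric or thresholdMetric under spec) is borrowed from later in this thread purely as an illustration; it is not necessarily one of the rules meant above, and `checkMetricXOR` and `validate` are invented names, not part of Oslo.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkMetricXOR is a hypothetical targeted check: exactly one of
// ratioMetric or thresholdMetric must be present under spec.
func checkMetricXOR(doc map[string]interface{}) error {
	spec, ok := doc["spec"].(map[string]interface{})
	if !ok {
		return errors.New("spec must be an object")
	}
	_, hasRatio := spec["ratioMetric"]
	_, hasThreshold := spec["thresholdMetric"]
	// Both present or both absent violates the rule.
	if hasRatio == hasThreshold {
		return errors.New("spec must set exactly one of ratioMetric or thresholdMetric")
	}
	return nil
}

// validate decodes a JSON document into a generic map and runs the
// targeted check. Schema-based validation would run before this step.
func validate(raw []byte) error {
	var doc map[string]interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	return checkMetricXOR(doc)
}

func main() {
	good := []byte(`{"kind":"SLI","spec":{"ratioMetric":{}}}`)
	bad := []byte(`{"kind":"SLI","spec":{"ratioMetric":{},"thresholdMetric":{}}}`)
	fmt.Println("good:", validate(good))
	fmt.Println("bad:", validate(bad))
}
```

The type assertions are the price of skipping structs: every field access has to be checked by hand.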

That is not to say that there aren't still other benefits to having the spec represented as structs, and then using those structs within Oslo to perform the additional validations. That would, after all, probably result in cleaner and even shorter logic for those remaining validations. Instead, I'm trying to highlight that a native Go representation of the schema is not necessary for Oslo to validate OpenSLO documents.

SDK/Library Code

A key advantage to having a native Go struct representation of the schema, as I see it, would be to assist those wanting to write code which consumes and implements OpenSLO. However, that use-case is not limited to projects written in Go. And it's very easy to imagine that other languages would also benefit from having a library they can use to serialize/deserialize OpenSLO documents using objects native to the language. That then gives an advantage to making the source of truth a language agnostic standard like JSON-Schema, which can then be used to generate schema models in various languages using popular polyglot tools like quicktype.

OpenAPI

A quick note about OpenAPI and JSON-Schema. OpenAPI 2.0+ is technically an “extended subset” of JSON-Schema, and as of OpenAPI 3.0 the divergence is narrowing. There are tools which attempt to translate between them by removing unsupported keywords from JSON-Schema documents, but those who want an API spec to simply import our published JSON-Schema would likely need to perform some translation. This is a situation where the tools for generating from Go (or other language) native objects are probably better suited. But we can still provide OpenAPI object definitions from JSON-Schema through either JSON-Schema -> OpenAPI Schema or JSON-Schema -> <Go or some other language model> -> OpenAPI Schema.
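
The "removing unsupported keywords" approach can be sketched roughly as follows. This is not how any particular converter works; `stripKeywords` is an invented helper, and the keyword list is an illustrative assumption about what an OpenAPI 3.0 Schema Object does not accept, not an exhaustive or authoritative list.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unsupported is an assumed, non-exhaustive list of JSON Schema keywords
// to drop when targeting OpenAPI 3.0 Schema Objects.
var unsupported = map[string]bool{
	"$schema": true, "$id": true, "const": true, "contains": true,
	"dependencies": true, "patternProperties": true, "propertyNames": true,
}

// nameMaps are keywords whose children are keyed by user-defined names,
// so those keys must never be treated as keywords themselves.
var nameMaps = map[string]bool{"properties": true, "definitions": true}

// stripKeywords returns a copy of a schema node without unsupported keywords.
// Simplification: it also walks non-schema values such as "enum" or "default".
func stripKeywords(node interface{}) interface{} {
	switch v := node.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(v))
		for key, val := range v {
			if unsupported[key] {
				continue
			}
			if children, ok := val.(map[string]interface{}); ok && nameMaps[key] {
				cleaned := make(map[string]interface{}, len(children))
				for name, sub := range children {
					cleaned[name] = stripKeywords(sub)
				}
				out[key] = cleaned
				continue
			}
			out[key] = stripKeywords(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(v))
		for i, item := range v {
			out[i] = stripKeywords(item)
		}
		return out
	default:
		return v
	}
}

func main() {
	raw := []byte(`{"$schema":"http://json-schema.org/draft-07/schema#","type":"object",
		"properties":{"kind":{"type":"string","const":"SLO"}}}`)
	var schema interface{}
	if err := json.Unmarshal(raw, &schema); err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(stripKeywords(schema), "", "  ")
	fmt.Println(string(out))
}
```

Note that stripping is lossy: dropping `const` here removes the constraint entirely, which is why some translation beyond plain removal is usually needed.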

quicktype

I haven't used quicktype before, but the project seems fairly robust (>8K ⭐ and extensible language support).

Running the following produced some interesting results very quickly:

npm install -g quicktype
quicktype -s schema --src ./schemas/v1 -o models.go

The output left me scratching my head a bit and might need some massaging, or maybe our schema will need to be adjusted to help it along, but it seems promising and worth exploring.

Conclusion

Using Go structs as the source of truth certainly seems possible, and there are tools that can render JSON-Schema from them. But right now my impression is that going from JSON-Schema to Go and other languages will be more flexible, and is a more well-trodden path. Either approach will, however, require a bit more investigation. So long as we end up with both JSON-Schema that Backstage can use and native language objects, both derived from a single source of truth, I'll be happy, no matter the direction.

@djetelina

Hi everybody, I've been trying to read through various issues and I have to admit it's not really easy to follow what's the current status of everything, or perhaps the intentions/goals/vision.

I see that a lot of the discussion is centered around oslo and there's not much else built on top of OpenSLO out in the open, at least not that I can find. I have a use case where I'd love to let our developers manage their SLOs purely through Kubernetes, through OpenSLO, and manage the specifics of our observability stack at the cluster level. I'd love to use OpenSLO for that, and initially I thought I would just figure out how to take the JSON Schema in the main repository and turn it into CRDs. Then I read through all these issues, and I'm not sure that would be the best approach. Some issues mentioned that it is not up to date...

So, what would the recommended approach be, to get CRDs, in order to start building some tooling on top of them?

@nieomylnieja
Member

Hey @djetelina, I think for now the best way to move forward is with oslo. As of v0.11.0 it is a usable Go package, here's an example usage: https://github.com/OpenSLO/oslo/blob/main/examples/example.go, so if there's going to be an operator managing these CRDs, you could just use oslo to validate the extracted/converted definition. I understand the problem you're facing would be easier to solve with a JSON Schema, and I want to push in this direction myself, but I'm still not sure if pure JSON Schema is the way to go or whether we'd be better off with something like cue. Feel free to get involved in these discussions too :)

@nieomylnieja
Member

#87 - pinning for reference. If you had the schema to work with, @djetelina, would you consider making the CRDs public and part of OpenSLO?

@djetelina

Definitely, it wouldn't make much sense to start building anything on these foundations if I (or my team) weren't willing to make them stronger by our contributions :)

I don't (and can't) promise anything, but I've managed to infect at least one of my colleagues with the idea of Kubernetes SLO operator, so we'll see where the idea is going to take us.

For some backstory, I wrote the very simplistic https://github.com/heureka/omni-slo-generator in the past, which we're running in production. One of its major flaws is that it just wraps https://github.com/google/slo-generator, which creates a very inefficient load on our Mimir cluster (loads of 28d queries instead of using the ruler). It is also fairly complicated to get up and running, doesn't solve alerting, etc. As we spend more and more time helping our teams get it up and running, which leads to an unnecessarily overscaled cluster, it's slowly starting to be worth our time to build a robust solution.

@fourstepper
Contributor

fourstepper commented Jun 24, 2023

I am definitely for quicktype @CRThaze mentioned.

With some well defined JSON-Schemas including good titling for it to properly name things as we would like, we (or the consumers of the OpenSLO spec) would be able to (what seems like reliably) generate code for whatever project would need it. (EDIT: sorry for this terrible sentence, hope it makes sense)

From my (biased) point of view it would be great if we had:

  1. The JSON-Schema as the source of truth
  2. Validated Go structs generated by quicktype based on the JSON-Schema
  3. Generated Kubernetes CRDs either directly from the JSON-Schema or from the Go structs

EDIT2: I looked into CUE a bit more as well, and while it looks sleek, it seems like it currently doesn't support generating a JSON-Schema out of the CUE definitions. It also doesn't support generating code for any language other than Go.

I think it really depends if we care mainly about the Go and Kube CRDs ecosystem and keeping that neat and tidy or if we are more interested in providing some higher standard such as the JSON-Schema for other languages to consume as well, at the expense of readability and workability with the JSON-Schema

@nieomylnieja
Member

hey @fourstepper thanks for chiming in!

I'm torn on this one, mainly because I've had some decent JSON Schema exposure in the past, and it was pure pain. The main problems with JSON Schema revolve around the fact that validation results vary based on the implementation.
I've had yamlls and ajv not only resolve the URIs between the subschemas differently but also produce different errors altogether. I dealt with the first issue by bundling them into a single file, but the latter had no answer.
Crafting a proper schema is not an easy task; what works well on one implementation of the parser might not go so well on another. There are also caveats to the schema itself: I mentioned to @CRThaze in #151 the example of oneOf not doing what you might think it does with complex types (objects). The errors you get are crafted by the implementation, and they often tend to be ambiguous, to say the least.
You also have to take the schema version into consideration: many implementations support draft 07 and don't plan on moving forward, which obviously limits what you can do with the schema.
Coming back to Go, both libraries mentioned in the aforementioned PR are not maintained anymore, leaving us with almost nothing. While we don't want to focus on Go and k8s specifically, we should take that into consideration since oslo is written in it.
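
One common way oneOf surprises people with objects can be simulated in a few lines. This is a hand-rolled model of the semantics, not a real validator, and it is not necessarily the exact case discussed in #151; the helper names and the use of ratioMetric/thresholdMetric as property names are assumptions for illustration.

```go
package main

import "fmt"

type object = map[string]interface{}

// propertyIsObject mimics the subschema
//   {"properties": {"<name>": {"type": "object"}}}
// which also passes when the property is absent, because "properties"
// only constrains keys that are actually present.
func propertyIsObject(name string) func(object) bool {
	return func(doc object) bool {
		v, present := doc[name]
		if !present {
			return true
		}
		_, isObj := v.(object)
		return isObj
	}
}

// requiredObject mimics the stricter subschema that additionally lists
// the property under "required".
func requiredObject(name string) func(object) bool {
	return func(doc object) bool {
		_, isObj := doc[name].(object)
		return isObj
	}
}

// oneOf passes only when exactly one subschema accepts the document.
func oneOf(doc object, subschemas ...func(object) bool) bool {
	matches := 0
	for _, accepts := range subschemas {
		if accepts(doc) {
			matches++
		}
	}
	return matches == 1
}

func main() {
	doc := object{"ratioMetric": object{}}
	loose := oneOf(doc, propertyIsObject("ratioMetric"), propertyIsObject("thresholdMetric"))
	strict := oneOf(doc, requiredObject("ratioMetric"), requiredObject("thresholdMetric"))
	fmt.Println("loose oneOf accepts:", loose)
	fmt.Println("strict oneOf accepts:", strict)
}
```

In the loose variant, a document that sets only ratioMetric is rejected because both branches match it, while the variant with required properties accepts it.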

Cue converts natively to both openapi and CRDs, having openapi spec generated opens up a way to convert it to JSON Schema eventually. If you have openapi you CAN generate into any language you'd like.
What it brings to the table I really like compared to JSON Schema is that it is reliable, it will produce the same exact error wherever you run it, given you use cue to do the validation. It reduces the toil of maintaining the JSON Schema to be compatible with different implementations. It can be also easily embedded into oslo and is generally much more readable and easier to write than JSON Schema.

@nieomylnieja
Member

I'm currently picking up what was initially done in #32. As a minimum, I'd like to have a fully working SLI schema defined in cue, as it has the XOR logic on an object member (ratioMetric vs thresholdMetric), which is a bit tricky to do in JSON Schema. I wonder how well that would translate to JSON Schema through the OpenAPI spec. If it translates and we can generate JSON Schema from cue, we should go with cue, even if only for our sanity's sake, imo ;)

@fourstepper
Contributor

@nieomylnieja You sound convincing :)

If OpenAPI is enough for languages to get up and running and we would be able to generate that using CUE, I think that would be enough on the "support other languages than Golang" front


9 participants