ADTs as a Foundation for the IDL Vocabulary #54

Open
koehlma opened this issue Sep 15, 2023 · 0 comments

koehlma commented Sep 15, 2023

After looking at the individual issues, I would like to share a few general and overarching thoughts. I feel scattering them around the individual issues will lead to confusion. Hence, I decided to create a new issue.

For context, I am currently working on an IDL and my thoughts have been shaped by that.

TL;DR: I think algebraic data types (ADTs) would be a good foundation for the IDL vocabulary.

I am looking forward to your feedback and comments. :)

General Approach

According to the philosophy outlined by @gregsdennis in issue #47:

“[We] must start with those languages by enumerating their features, and, for each language feature, find a way to represent it in JSON Schema using either existing keywords or defining new ones.”

I think that this is exactly the right approach. In my mind, it means defining a type system that has all the relevant features and then finding a way to map between the types of that system and JSON Schema extended with the IDL vocabulary.

My suggestion is to turn to type theory to get a foundational understanding of the different features of programming languages. In particular, I think we should start from algebraic data types (ADTs) and see how those map to JSON Schema.
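The "algebraic" in ADT can be made concrete with a small sketch. It assumes a hypothetical type representation `Type` with only unit, Boolean, product, and sum types (not taken from any existing IDL). Counting the values of a finite type shows the algebra: a product type has as many values as the product of its fields' counts, a sum type as many as the sum of its variants' counts.

```rust
// Hypothetical type representation for illustration only; not part of any existing IDL.
enum Type {
    /// A type with exactly one value (e.g. a variant without data).
    Unit,
    Bool,
    /// Product type (struct): every named field is present.
    Product(Vec<(String, Type)>),
    /// Sum type (enum): exactly one named variant is present.
    Sum(Vec<(String, Type)>),
}

/// Counts the values of a finite type: products multiply, sums add.
fn cardinality(ty: &Type) -> u64 {
    match ty {
        Type::Unit => 1,
        Type::Bool => 2,
        Type::Product(fields) => fields.iter().map(|(_, t)| cardinality(t)).product(),
        Type::Sum(variants) => variants.iter().map(|(_, t)| cardinality(t)).sum(),
    }
}
```

For example, an `Option<bool>`-like sum with variants `None` (unit) and `Some` (Boolean) has 1 + 2 = 3 values, while a struct of two Booleans has 2 × 2 = 4.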

Why ADTs?

First of all, I would like to point out that ADTs are supported by most languages. In my opinion, their widespread availability alone makes them a perfect fit for the type system of an IDL and, thereby, for the IDL vocabulary. If we can map a JSON schema to a set of ADTs, then we can also easily generate code for a wide variety of languages.

At the same time, ADTs are very general and, in my experience, sufficient to describe any data model one could want. I also consider it important for the IDL vocabulary that JSON formats already existing in the wild can be faithfully captured. Note that I do not mean that any existing JSON schema can simply be annotated to support code generation. It has already been pointed out that this does not make much sense (see #47). Instead, I mean that I can (re)write a schema for some format to accommodate code generation. While this may not be possible in all cases, it is nevertheless a goal worth having. I am quite confident that, due to their generality, ADTs as a foundation would allow capturing many existing formats.

To summarize, I think ADTs are a good starting point for the IDL vocabulary because (a) they are supported by most languages (or can be encoded), and (b) they are very general covering most data modeling needs.

ADTs and JSON

To give you an idea about how ADTs map to JSON, let's have a look at an example for two-dimensional coordinates (in Rust):

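```rust
// Sum type: a coordinate is either Cartesian or polar.
enum Coordinate {
    Cartesian(CartesianCoordinate),
    Polar(PolarCoordinate),
}

// Product type: both components are always present.
struct CartesianCoordinate {
    x: f64,
    y: f64,
}

// Product type: radius and angle.
struct PolarCoordinate {
    r: f64,
    phi: f64,
}
```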

There are multiple options to map such coordinates to JSON. Here are three options¹ for the Cartesian coordinate (4, 5):

  1. { "x": 4, "y": 5 } (implicitly tagged)
  2. { "type": "Cartesian", "x": 4, "y": 5 } (internally tagged)
  3. { "Cartesian": { "x": 4, "y": 5 } } (externally tagged)
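To make the three encodings concrete, here is a minimal std-only sketch (no Serde) that renders a `Coordinate` in each style. The `Tagging` enum, the `tag` property name, and the helper functions are hypothetical; the string building only handles the flat numeric fields of this example and is not a general JSON serializer.

```rust
// Same types as the example above; repeated so the sketch compiles on its own.
enum Coordinate {
    Cartesian(CartesianCoordinate),
    Polar(PolarCoordinate),
}

struct CartesianCoordinate { x: f64, y: f64 }
struct PolarCoordinate { r: f64, phi: f64 }

/// Hypothetical: how the variant of a sum type is recorded in JSON.
enum Tagging {
    /// No tag; the variant must be inferred from the fields.
    Implicit,
    /// The tag is an extra property, named `tag`, inside the object.
    Internal { tag: &'static str },
    /// The variant name is the single key of a wrapper object.
    External,
}

/// Splits a coordinate into its variant name and its `"key": value` pairs.
/// Assumes finite values (`Display` would print NaN as `NaN`, which is not JSON).
fn parts(c: &Coordinate) -> (&'static str, Vec<String>) {
    match c {
        Coordinate::Cartesian(p) => ("Cartesian", vec![format!("\"x\": {}", p.x), format!("\"y\": {}", p.y)]),
        Coordinate::Polar(p) => ("Polar", vec![format!("\"r\": {}", p.r), format!("\"phi\": {}", p.phi)]),
    }
}

/// Renders a coordinate as a JSON string using the given tagging strategy.
fn encode(c: &Coordinate, tagging: &Tagging) -> String {
    let (variant, fields) = parts(c);
    let body = fields.join(", ");
    match tagging {
        Tagging::Implicit => format!("{{ {} }}", body),
        Tagging::Internal { tag } => format!("{{ \"{}\": \"{}\", {} }}", tag, variant, body),
        Tagging::External => format!("{{ \"{}\": {{ {} }} }}", variant, body),
    }
}
```

For `Coordinate::Cartesian(CartesianCoordinate { x: 4.0, y: 5.0 })` with `Internal { tag: "type" }`, the output is `{ "type": "Cartesian", "x": 4, "y": 5 }`, since Rust's `Display` for `f64` prints `4.0` as `4`.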

I guess, you are all able to build the respective JSON schemas in your head. ;)
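For reference, the corresponding schemas share one pattern: a `oneOf` over one object schema per variant, differing only in how the variant is pinned down. The sketch below generates them as plain strings. The `Tagging` enum and the function names are hypothetical, all fields are assumed to be numbers, and `"additionalProperties": false` is a design choice rather than a requirement.

```rust
/// Hypothetical tagging strategies matching the three encodings of `Coordinate`.
enum Tagging {
    Implicit,
    Internal(&'static str),
    External,
}

/// Object schema with the given property schemas; all listed names are required.
fn object(props: &[String], required: &[&str]) -> String {
    let req: Vec<String> = required.iter().map(|r| format!("\"{}\"", r)).collect();
    format!(
        "{{ \"type\": \"object\", \"properties\": {{ {} }}, \"required\": [{}], \"additionalProperties\": false }}",
        props.join(", "),
        req.join(", ")
    )
}

/// Schema for one variant whose fields are all numbers.
fn variant_schema(name: &str, fields: &[&str], tagging: &Tagging) -> String {
    let mut props: Vec<String> = fields
        .iter()
        .map(|f| format!("\"{}\": {{ \"type\": \"number\" }}", f))
        .collect();
    let mut required: Vec<&str> = fields.to_vec();
    match tagging {
        Tagging::Implicit => object(&props, &required),
        Tagging::Internal(tag) => {
            // The tag is one more required property, fixed to the variant name with `const`.
            props.insert(0, format!("\"{}\": {{ \"const\": \"{}\" }}", tag, name));
            required.insert(0, *tag);
            object(&props, &required)
        }
        Tagging::External => {
            // The variant name is the only allowed key of a wrapper object.
            let inner = object(&props, &required);
            object(&[format!("\"{}\": {}", name, inner)], &[name])
        }
    }
}

/// A sum type becomes a `oneOf` over its variants.
fn sum_schema(variants: &[(&str, Vec<&str>)], tagging: &Tagging) -> String {
    let alts: Vec<String> = variants
        .iter()
        .map(|(name, fields)| variant_schema(name, fields, tagging))
        .collect();
    format!("{{ \"oneOf\": [{}] }}", alts.join(", "))
}
```

With internal tagging, each alternative fixes the `"type"` property with `const`, which is what lets a validator, and a code generator, tell the variants apart.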

Note that enumerations in the traditional sense (as in C) are a special case of sum types, namely sum types whose variants carry no data. Here is another example:

```rust
// A sum type whose variants carry no data.
enum TrafficLightColor {
    Red,
    Yellow,
    Green,
}
```

Again, there are multiple options to encode the color of a traffic light. For instance, "Red", "RED", 0, and { "color": "Red" } are all imaginable encodings of the variant Red of the TrafficLightColor type.
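The same choice arises for data-less variants. A minimal sketch, assuming a hypothetical `ColorEncoding` enum that names the four encodings listed above:

```rust
#[derive(Clone, Copy)]
enum TrafficLightColor {
    Red,
    Yellow,
    Green,
}

/// Hypothetical: four possible JSON encodings of a variant.
enum ColorEncoding {
    Name,      // "Red"
    UpperName, // "RED"
    Index,     // 0 (declaration order)
    Wrapped,   // { "color": "Red" }
}

fn name(c: TrafficLightColor) -> &'static str {
    match c {
        TrafficLightColor::Red => "Red",
        TrafficLightColor::Yellow => "Yellow",
        TrafficLightColor::Green => "Green",
    }
}

/// Renders one variant as a JSON string in the chosen encoding.
fn encode(c: TrafficLightColor, encoding: &ColorEncoding) -> String {
    match encoding {
        ColorEncoding::Name => format!("\"{}\"", name(c)),
        ColorEncoding::UpperName => format!("\"{}\"", name(c).to_uppercase()),
        // For a fieldless enum, `as` yields the discriminant: 0, 1, 2 in declaration order.
        ColorEncoding::Index => (c as u8).to_string(),
        ColorEncoding::Wrapped => format!("{{ \"color\": \"{}\" }}", name(c)),
    }
}
```

The index encoding ties the wire format to declaration order: reordering the variants in the source silently changes what `0` means.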

How to proceed?

I suggest we focus the effort on mapping ADTs to JSON Schema and vice versa. To this end, we may create a list of possible JSON encodings of ADTs and their respective JSON schemas, i.e., identify the encoding patterns. The examples I have shown cover simple objects (#46), enumerations (#43), sum types (#48), and, to some extent, polymorphism² (#49). Reconstructing ADTs from a JSON schema of a particular representation without any further annotations is challenging, to say the least. This is where I see the role of the IDL vocabulary: to provide the necessary information.
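To illustrate why reconstruction needs extra information, consider decoding an implicitly tagged object: the only evidence for the variant is the set of keys present. The sketch below (hypothetical `Variant` and `candidates` names; it compares field names only and ignores value types) lists every variant whose fields match the keys exactly. If a hypothetical third variant `Screen { x, y }` were added to `Coordinate`, the object `{ "x": 4, "y": 5 }` would no longer identify a single variant.

```rust
/// Hypothetical: a variant described only by the names of its fields.
struct Variant {
    name: &'static str,
    fields: &'static [&'static str],
}

/// Returns every variant whose field names are exactly the keys of the JSON object
/// (keys are assumed to be unique, as in a well-formed object).
fn candidates(variants: &[Variant], keys: &[&str]) -> Vec<&'static str> {
    variants
        .iter()
        .filter(|v| v.fields.len() == keys.len() && v.fields.iter().all(|f| keys.contains(f)))
        .map(|v| v.name)
        .collect()
}
```

A tag, whether internal or external, resolves the ambiguity, but only if a consumer of the schema knows which property plays that role.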

Tooling

As I said, I am working on an IDL and, as part of that, I am currently working on the generation of JSON Schema from the types of the IDL. I could, in principle, come up with some keywords to preserve the type information when doing the mapping. I also see this as a kind of test bed for the different encodings. It is already possible to specify different JSON encodings by annotating the type definitions in the IDL. This is also how I managed to define the structure of JSON Schema itself in the IDL.

Footnotes

  1. I took some inspiration from Serde here.

  2. Sum types enable polymorphism. For instance, in Java one may use a sealed interface to implement them. For the example, one could envision a function taking a Coordinate which in OOP land may be either an instance of CartesianCoordinate or PolarCoordinate.
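In Rust, the counterpart of a Java method taking a sealed interface is a function taking the enum and dispatching with `match`. A small sketch; `to_cartesian` and `norm` are illustrative additions, and the angle `phi` is assumed to be in radians:

```rust
enum Coordinate {
    Cartesian(CartesianCoordinate),
    Polar(PolarCoordinate),
}

struct CartesianCoordinate { x: f64, y: f64 }
struct PolarCoordinate { r: f64, phi: f64 }

/// Accepts either variant; `match` must cover both, much like an exhaustive switch over a sealed interface.
fn to_cartesian(c: &Coordinate) -> CartesianCoordinate {
    match c {
        Coordinate::Cartesian(p) => CartesianCoordinate { x: p.x, y: p.y },
        // Assumes `phi` is measured in radians.
        Coordinate::Polar(p) => CartesianCoordinate { x: p.r * p.phi.cos(), y: p.r * p.phi.sin() },
    }
}

/// Distance from the origin, computed via the Cartesian form.
fn norm(c: &Coordinate) -> f64 {
    let p = to_cartesian(c);
    p.x.hypot(p.y)
}
```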
