V2 roadmap proposal #230

Open · febus982 wants to merge 3 commits into base `main`
Conversation

febus982 (Contributor) commented Nov 24, 2023

Initial draft for a major refactor of the repository.

One line description for the changelog

V2 roadmap proposal

Signed-off-by: Federico Busetti <729029+febus982@users.noreply.github.com>
pre-commit-ci bot and others added 2 commits November 24, 2023 14:49
febus982 commented Dec 2, 2023

@xSAVIKx pinging you in case you missed this. I realize it's a bit verbose; if it helps, I can add a class/method diagram for the current version and the proposed changes to better show the differences.

xSAVIKx (Member) left a comment


@febus982 thanks a lot for your work! This is indeed a great proposal and roadmap. For now, I've just added some comments to the places where I think we should discuss things a little bit more.

I'd love to dedicate more time to this right away, but there are multiple other things everyone needs to get done before the end of the year, so I just don't think I'll have any available slots soon, but we can keep this async.

Again, thanks a lot for your time and effort and let's try to make this happen 👍 🙏

* Validate required fields and generation of defaults
* Marshalling/unmarshalling the cloudevents core and extensions fields
* Marshalling/unmarshalling the `data` field
* Encode/decode the marshalled fields to/from the specific binding format
xSAVIKx (Member):

I'd arguably say this is the responsibility of the specific binding format extension rather than the core library.

febus982 (Author):

Yes, this is a list of all the operations needed to transform a Python class into a specific event format implementation.
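The listed operations could be sketched as separate steps. This is a hedged, minimal sketch: every function name here is illustrative, not part of the actual SDK API, and the validation covers only the required core attributes.

```python
# Hypothetical sketch of the operations listed in the proposal, split
# into independent steps. Names and signatures are illustrative only.
from typing import Any


def validate_event(attributes: dict) -> dict:
    """Validate required fields (defaults generation would also live here)."""
    required = {"id", "source", "specversion", "type"}
    missing = required - attributes.keys()
    if missing:
        raise ValueError(f"missing required attributes: {sorted(missing)}")
    return attributes


def marshal_attributes(attributes: dict) -> dict:
    """Marshal the core and extension fields to wire-friendly values."""
    return {k: str(v) for k, v in attributes.items()}


def marshal_data(data: Any) -> bytes:
    """Marshal the `data` field (content-type specific in a real SDK)."""
    return data if isinstance(data, bytes) else str(data).encode()
```

The encode/decode step to a specific binding format would then consume the output of these functions.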

Comment on lines +45 to +46
* `cloudevents.formats` - This would contain the structured formats implementations (JSON, AVRO, Protobuf)
* `cloudevents.bindings` - This would contain the specific binding implementations (kafka, http)
xSAVIKx (Member):

I like the idea of moving forward and scoping modules as they are scoped in the spec. Looks like a good go-to solution to me.

Comment on lines +60 to +63
The `data` field depends on the specific event and the `datacontenttype` attribute, and
its marshalling/unmarshalling logic should be the same for any binding/format. It seems
to be a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods)
xSAVIKx (Member):

So from one perspective, I do agree with the statement, but from the other, it is sometimes beneficial to have the marshalling/unmarshalling logic moved elsewhere. IMO we can achieve that by using mixin classes. Again, this is more of an implementation detail.

a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods)

Ideally the end user will be able to extend the `CloudEvent` class for their needs, e.g.:
xSAVIKx (Member):

Extend it or create a separate one from scratch; either way the behavior should be the same. I guess we may benefit from having a Protocol with a behavior description.
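The Protocol idea could look something like the sketch below. The class and method names are assumptions for illustration (based on the `get_attributes`/`get_data` interface mentioned elsewhere in this thread), not an agreed API.

```python
# Hypothetical sketch of a behavioral Protocol: any class exposing these
# methods counts as a CloudEvent, whether it extends a base class or not.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CloudEventProtocol(Protocol):
    def get_attributes(self) -> dict: ...

    def get_data(self) -> Any: ...


class MyEvent:
    """A from-scratch implementation that satisfies the protocol."""

    def __init__(self, attributes: dict, data: Any = None) -> None:
        self._attributes = attributes
        self._data = data

    def get_attributes(self) -> dict:
        return self._attributes

    def get_data(self) -> Any:
        return self._data
```

With `@runtime_checkable`, `isinstance(MyEvent({}, None), CloudEventProtocol)` holds even though `MyEvent` never inherits from the protocol.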


Ideally the end user will be able to extend the `CloudEvent` class for their needs, e.g.:

* adding new extensions fields
xSAVIKx (Member):

It's the responsibility of the library's CloudEvent implementation to handle all the possible extension fields, as far as I can tell. There may be special extension-specific mixins that provide additional behavior beneficial for this or that implementation, but overall we should be able to digest any extension fields with just the base implementation.

febus982 (Author):

> There may be special extension-specific mixins that provide additional behavior

Yes, I used the word "extend" in a generic meaning. It could be also a mixin with some custom implementation.

The idea is that it should be possible (and easy) to create a `class MyBaseEvent(MyMixin, CloudEvent)` at the application level to extend the SDK functionality.
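A minimal sketch of that application-level mixin pattern, under the assumption of a simple attribute-holding base class. `CloudEvent` here is a stand-in, not the real SDK class, and `TracingMixin`/`MyBaseEvent` are hypothetical names.

```python
# Hedged sketch: combining an application mixin with the SDK base class.
class CloudEvent:
    """Stand-in for the SDK base class (illustrative only)."""

    def __init__(self, attributes: dict, data=None) -> None:
        self.attributes = attributes
        self.data = data


class TracingMixin:
    """Application-level mixin adding behavior for a custom extension field."""

    attributes: dict  # provided by CloudEvent at runtime

    def get_trace_id(self) -> str:
        return self.attributes.get("traceid", "")


class MyBaseEvent(TracingMixin, CloudEvent):
    """Application base event: SDK behavior plus custom logic."""
```

The MRO puts the mixin first, so application code can also override SDK behavior without touching the library.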

Comment on lines +71 to +72
We also want a DTO that can be accepted by the most common libraries (ie. JSON) to keep the
`CloudEvent` class decoupled from the formats/bindings. Using a dictionary seems a good idea.
xSAVIKx (Member):

So what we want is a canonical representation. That's what `Event` and `BaseEvent` are in the current implementation, and I don't like the way it is right now.

But I'm also not sure we want to always keep the canonical representation as a dict. Maybe it's better to have a dataclass instead, or to expose some immutable data structure over the dict.
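The frozen-dataclass alternative could look like this. A hedged sketch: the class name and field set are illustrative (required core attributes only), not a proposed final shape.

```python
# Hypothetical sketch of an immutable canonical representation as a
# frozen dataclass, with a dict view for libraries that expect mappings.
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class CanonicalEvent:
    id: str
    source: str
    specversion: str
    type: str
    data: Any = None
    extensions: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Dict view for libraries (e.g. the stdlib json module)."""
        return asdict(self)
```

`frozen=True` gives the immutability mentioned above, while `to_dict()` keeps the dict-friendly interop the dict-based proposal was after.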

febus982 (Author):

I would expect the end user to never deal with this DTO. They should always rely on either the `CloudEvent` class or the specific format output, which could be a class (e.g. `KafkaEvent`).

This DTO structure is intended to be used only internally, to move data between the base class, the formats, and the bindings.

I thought about dictionaries because they are fast, and the libraries we might want to use for formats and bindings will probably support them out of the box (the classic example being the JSON library that maps JSON objects to dicts).

xSAVIKx (Member):

Maybe we'll then use the Protocol definition as the base DTO. It is already immutable, well-defined in the spec, and has `to_json` and `to_text` implementations out of the box. The drawback here is that we're dragging Protobuf inside the core.

I'm kinda just thinking out loud here as I don't have a strong opinion on this at the moment. Also depending on whether the canonical representation is gonna be part of the CloudEvent instance or can be just created on the fly we might need to use different approaches.

febus982 (Author):

> Maybe we'll then use the Protocol definition as the base DTO. It is already immutable, well-defined in the spec, and has `to_json` and `to_text` implementations out of the box. The drawback here is that we're dragging Protobuf inside the core.

Ideally, I think it would be more maintainable to keep format implementations separate from the base class, so that they can be implemented and tested independently.

The more formats we implement, the bigger the class, and consequently the created objects, will become. I am thinking about performance in systems where we want to handle a high number of events/sec, where allocating memory for larger objects could become impactful.

febus982 (Author), Dec 4, 2023:

We could also keep the existing implementation interface, where we have `get_attributes` and `get_data`, and use them in bindings and formats.

If we find at some point that having a dict in the middle of the pipeline might help we can introduce it.

Comment on lines +76 to +77
QUESTION: Can we define a data type for the `data` field after it's been marshalled?
(ie. can we say it will be either `str` or binary data?)
xSAVIKx (Member):

I believe the type should be known in advance.
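If the type is known in advance, the question above could be answered with a simple type alias. A sketch, assuming the answer is "text or binary" as the question suggests; the alias name is illustrative.

```python
# Hypothetical alias pinning down the marshalled `data` type.
from typing import Union

MarshalledData = Union[str, bytes]
```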

Comment on lines +79 to +96
### The `cloudevents.formats` module

Formats solve a single responsibility:

* Marshalling/unmarshalling the cloudevents core and extensions fields

Their implementations have to be reused when bindings implement their
structured input/output, to make sure the implementations are compatible.

Each format accepts the dictionary from the `CloudEvent` as a parameter
and produces an output based on the format itself, e.g.:

```python
def to_json(event: dict) -> str: ...
def from_json(event: str) -> dict: ...
```

TODO: Review typing as JSON produces a string but other formats might be different.
xSAVIKx (Member):

This part is also a little bit tricky. I'd rather avoid doing a CloudEvent -> dict -> JSON conversion if we can do CloudEvent -> JSON without the dict part (also not sure if that's possible; most probably it'd be done somewhere under the hood anyway).

The next part is the API. As we've faced in the current implementation, it should be structured in a way that makes it super easy to swap the underlying implementation (e.g. replace native json with something else) or at least to propagate some configuration options. I'm also not sure if we want to have that capability as part of the API, or if we should just provide an easy way to extend the API instead, e.g. have our own different versions of to_json that use different libraries and have them as optional dependencies (or maybe just as a separate library like cloudevents-ext).
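The "easy to swap the underlying implementation" idea could be sketched by making the (de)serializer an injectable parameter. These signatures are hypothetical, defaulting to the stdlib `json` module; a caller could pass an `orjson`-backed callable instead.

```python
# Hedged sketch: format functions with a swappable serializer.
import json
from typing import Callable


def to_json(event: dict, dumps: Callable[[dict], str] = json.dumps) -> str:
    """Serialize the canonical dict; `dumps` is the swap point."""
    return dumps(event)


def from_json(event: str, loads: Callable[[str], dict] = json.loads) -> dict:
    """Deserialize back to the canonical dict; `loads` is the swap point."""
    return loads(event)
```

Keeping the default arguments in the signature means the common case stays as simple as the original `to_json(event)` proposal.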


Bindings solve these responsibilities:

* Marshalling/unmarshalling the cloudevents core and extensions fields **(if binary format)**
xSAVIKx (Member):

Bindings sometimes support structured formats as well.

febus982 (Author), Dec 4, 2023:

The idea is for structured formats to use the formats implementation to marshal the specific fields. (The assumption here is that formats are the same for all the binding implementations.)

Ie.

If I need to marshall a CloudEvent in a JSON structured HTTP event, I expect the SDK to:

* use the JSON formatter (which would marshal the single fields as specified by the format)
* compose headers and data fields (but not marshal the single fields)

xSAVIKx (Member):

I believe that assumption is not true, taking into account the spec. I'm not quite sure I fully understand the example, but in an HTTP event, attributes are converted to headers and data is the HTTP request payload, AFAIK.
So I'm not sure how the JSON formatter is relevant here. Can you maybe expand your example a little bit?

febus982 (Author), Dec 4, 2023:

In the HTTP structured mode the whole event (attributes and data) is marshalled into the body.

Headers MAY include the attributes, but attributes MUST be in the body.

For JSON conversion I'd expect something like this:

```python
def to_http_json(cloudevent: CloudEvent):
    dict_event = cloudevent.to_dict()
    body = formats.to_json(dict_event)
    headers = []  # compose headers according to the spec, possibly grabbing data from dict_event
    return headers, body
```

Comment on lines +174 to +201
## Pydantic support

The current Pydantic implementation looks like a possible substitute for the
HTTP Event class, judging by its conversion module and tests, but it's used
by calling the functions in the conversion module.

Pydantic offers good support and performance for data validation, defaults and
JSON serialization (some of the requirements we have identified).

We need to make a decision:

* We use pydantic as a base for the `CloudEvent` class:
  * PROs
    * No need to implement data validation
    * Performance (pydantic V2 is written in Rust)
    * We can lay down some base logic on `data` field serialization (ie. nested pydantic models)
    * Enables users to use pydantic for automatic documentation functionalities
  * CONs
    * We introduce a new package dependency, and depend on its implementation
    * _minor_ If we use pydantic JSON serialization it is implemented in the model class (slightly different from what is proposed, but we can use wrapper functions in the JSON format module)
* We remove pydantic support:
  * PROs
    * We have more control over validation and defaults implementation
    * We don't depend on other packages
  * CONs
    * Performance on validation and defaults (we can still use libraries like `orjson` for JSON performance)
    * Additional effort in writing and maintaining data validation and required/default fields
...
xSAVIKx (Member):

Even though I love Pydantic, I don't think we should have it as a core dependency; I'd rather have a Pydantic-specific extension/implementation of CloudEvents.

I do agree that it kinda solves a lot of questions out of the box, but it is still a dependency we can't drag in, especially while multiple projects are still using Pydantic v1 even though v2 is kinda mainstream already.
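Keeping Pydantic optional could be as simple as probing for it at import time. A sketch: the helper name is illustrative, and the commented-out module paths are hypothetical, not the real package layout.

```python
# Hedged sketch: detect the optional `pydantic` extra without importing it.
import importlib.util


def pydantic_available() -> bool:
    """True when the optional `pydantic` dependency is installed."""
    return importlib.util.find_spec("pydantic") is not None


# A package __init__ could then select an implementation, e.g.:
# if pydantic_available():
#     from .pydantic_event import CloudEvent
# else:
#     from .plain_event import CloudEvent
```

`find_spec` only checks for the package on the import path, so the probe itself adds no dependency and no import cost when Pydantic is absent.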

febus982 (Author):

I don't have a strong opinion here. I just thought it might be easier to maintain a single approach for more consistency.

xSAVIKx (Member):

It definitely might be, but as I mentioned, some libraries currently depend on CloudEvents and may not be able to move forward if CloudEvents brings in Pydantic as a required dependency.

febus982 commented Dec 4, 2023

> @febus982 thanks a lot for your work! this indeed is a great proposal and a roadmap. For now, I've just added some comments to the places where I think we should discuss smth a little bit more.
>
> I'd love to dedicate more time to this right away, but there are multiple other things everyone needs to get done till the end of the year, so just don't think I'll have any available slots soon, but we can keep this async.
>
> Again, thanks a lot for your time and effort and let's try to make this happen 👍 🙏

@xSAVIKx I replied to some comments but I'm happy to see we're roughly on the same page, with only details to be discussed. I'm happy to continue asynchronously.

febus982 (Author):

Hey @xSAVIKx , unfortunately because of unforeseen circumstances I don't think I can dedicate more time to this roadmap. Feel free to pick up from this idea, if you think it's still relevant, and iterate over it.

xSAVIKx commented Apr 12, 2024

Hey @febus982, thanks for the update. I hope everything is OK.

I also hope to find some extra time myself at some point to move forward with the v2.
