
Subscriptions manager API and Kafka, and event logging producers in general #1121

Open · jroper opened this issue Nov 10, 2022 · 7 comments

jroper (Contributor) commented Nov 10, 2022

I've had a look at the subscriptions spec, and I'm not sure where the right place to initiate this conversation is, but it seems rather unidiomatic in the way it uses Kafka.

Kafka in general doesn't have a concept of subscriptions, nor does it need one, since Kafka is conceptually equivalent to ad-hoc consumers tailing logs. You don't need to register a subscription on a log file to be able to run the tail command on it.
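To make the analogy concrete, here's a minimal sketch of ad-hoc consumption with the standard Kafka Java client; the broker address, group id, and topic name are all illustrative. Note that the only "registration" is picking a fresh group id; nothing is created on the broker ahead of time:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AdHocTail {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("group.id", "ad-hoc-reader");           // any new group id works; no registration step
        props.put("auto.offset.reset", "earliest");       // a fresh group starts from the oldest retained record
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));        // "tail" the topic, much like tail -f on a log file
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset %d: %s%n", record.offset(), record.value());
                }
            }
        }
    }
}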

So, suppose I have a system built on Kafka, and a source of events that is a Kafka topic (pushed to by some producer, but that producer pushes to the topic regardless of whether there are any consumers, so the fact that it pushes to the topic shouldn't be considered a subscription). If I register a subscriber, then according to this spec as I understand it, a new Kafka topic would be created, and events would be consumed from the original topic and put onto the new one. The problem is that this just isn't idiomatic Kafka usage. In fact it's actively wasteful: the topic is now duplicated, and you have two identical logs.

Is my understanding of how this is meant to work with Kafka correct, or have I misunderstood things?

I do think there are, in general, two different ways the transmission of events is handled. One is the MQTT or AMQP way, where events are inherently transient and are "delivered" to explicitly registered destinations; once consumed, or delivered to all consumers, these events disappear from the producer or intermediary. The other way is logging, which is the Kafka style, and also the style of any producer that uses event sourcing. (This is why I'm so interested in this spec: at Lightbend with Kalix, we offer event sourced serverless entities, so our entities are inherently sources of events, and it would be great to have a generic, CloudEvents-style way to subscribe to them.) In the logging style, events are persistent, sometimes indefinitely (as with event sourcing), sometimes with a time to live. Either way, the consumption of an event by a consumer has no relation to when, or whether, the event gets discarded by the producer or intermediary, and therefore the producer doesn't need any knowledge of a particular subscriber. Subscribers can join in an ad-hoc manner to consume events, without needing to register any intention of subscribing first.

I think the subscriptions spec should acknowledge this axis of event distribution, logging vs transient delivery, and be compatible with it in an idiomatic way, rather than requiring the concept of subscriptions to be bolted on where it doesn't belong. In particular, I'd love to see a generic, non-proprietary protocol for pull-style consumption of events from a producer. I'm not sure exactly what this would look like, but I could imagine that when a client posts to a subscription manager, the manager, rather than actually doing anything, responds with instructions for how to pull messages from the producer. Or perhaps the subscription manager could respond directly with the event stream.

jroper (Contributor, author) commented Nov 10, 2022

To put a potential solution for a generic way to subscribe to pull-based protocols out there: let's say we decided that Server-Sent Events would be the protocol used for pull streaming (SSE is almost perfect for this use case, as it has built-in stream resumption semantics, which are necessary when a consumer is responsible for tracking how far it has got in its consumption). I could imagine something like this:

Request

POST /subscriptions HTTP/1.1

{
  "id": "sub-193-18365",
  "filters": [
    { "prefix": { "type": "com.example." } }
  ],
  "protocol": "SSE"
}

Response

HTTP/1.1 300 Multiple Choices
Link: <https://my.producer/stream?prefix=type:com.example.>; rel="event-source"; type="application/cloudevents+json"

Then, when the client sees this, it can make an SSE request to the linked resource and expect to get CloudEvents back.
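To make the resumption semantics concrete, the follow-up exchange might look like this sketch (the URL, event ids, and payloads are illustrative): on reconnect the consumer replays the standard SSE Last-Event-ID header, and the producer resumes the stream from the next event.

GET /stream?prefix=type:com.example. HTTP/1.1
Host: my.producer
Accept: text/event-stream
Last-Event-ID: 1041

HTTP/1.1 200 OK
Content-Type: text/event-stream

id: 1042
data: {"specversion":"1.0","type":"com.example.order.created","source":"/orders","id":"evt-1042"}

id: 1043
data: {"specversion":"1.0","type":"com.example.order.shipped","source":"/orders","id":"evt-1043"}

Because the consumer tracks its own position, this fits the log-style model: the producer never needs to know which subscribers exist or how far along they are.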

clemensv (Contributor) commented:

In the latest revisions of the discovery spec, we distinguish producer, consumer, and subscriber endpoints. An existing Kafka consumer group is a consumer endpoint and does not need the subscriber model. The same applies to existing queues on a queue broker, and even to ad-hoc subscriptions on a pub/sub broker. We have not yet spelled out the interactions.

The subscriber protocol for Kafka might be used to create a new consumer group via this standardized API and then wire up Kafka Connect or MirrorMaker to point to the designated endpoint to flow events there.
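For illustration only, that wiring might amount to a MirrorMaker 2 configuration along these lines (the cluster aliases, bootstrap addresses, and topic name are all hypothetical):

# replicate the "events" topic from the origin cluster to the subscriber's cluster
clusters = origin, subscriber
origin.bootstrap.servers = origin-kafka:9092
subscriber.bootstrap.servers = subscriber-kafka:9092
origin->subscriber.enabled = true
origin->subscriber.topics = events

The subscription manager's job would then reduce to generating this configuration and handing back connection details for the replicated topic.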

jroper (Contributor, author) commented Nov 22, 2022

OK, I see; so essentially, for something like Kafka, the subscriptions spec says that using Kafka itself is part of the spec.

I guess then that what I'm interested in seeing has nothing to do with Kafka. Currently, the subscriptions spec provides a generic spec for push-based eventing; I'd like to see something similar for pull-based eventing. It could piggy-back on an existing open standard for pull-based eventing (e.g., Server-Sent Events), so it could be just as simple as the push-based one.

duglin (Collaborator) commented Jan 18, 2023

@jroper Are you looking for just an HTTP-based pull eventing model, or a transport-agnostic one? That is, is your proposed HTTP example just a sample of the idea, or the full scope of what you're looking for?

clemensv (Contributor) commented:

We'll get some more clarity into this when we formally define the protocol details for discovery/registry.

Take a consumer endpoint declaration as an example (the one I have in mind is for MQTT). For Kafka, the options would contain the consumer group name and possibly an initial read offset (e.g., "oldest" or "latest"), and the URI would identify the bootstrap address. Otherwise, pulling from Kafka is as easy as walking up to it and starting to read.
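As a sketch of what such a Kafka consumer endpoint declaration might look like (the field names here are illustrative guesses, since the protocol details are not yet formally defined):

{
  "usage": "consumer",
  "protocol": "KAFKA",
  "uri": "kafka://broker.example.com:9092",
  "options": {
    "consumergroup": "orders-reader",
    "initialoffset": "oldest"
  }
}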

jroper (Contributor, author) commented Feb 15, 2023

I'm looking for an HTTP-based eventing model. I think that does require defining some transport-agnostic concepts, the same way CloudEvents defines a bunch of transport-agnostic concepts. But just as CloudEvents itself would be of no use if there weren't a single concrete encoding, like the JSON one, I think a protocol-agnostic pull-based eventing model wouldn't be useful (there would be no way to validate it) without a single concrete transport encoding.

github-actions commented:

This issue is stale because it has been open for 30 days with no activity. Mark it as fresh by updating it, e.g., by adding the comment /remove-lifecycle stale.
