Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kinesis: Allow for more than one consumer per function/stream #9491

Closed
aheuermann opened this issue May 17, 2021 · 38 comments · Fixed by #9706
Closed

Kinesis: Allow for more than one consumer per function/stream #9491

aheuermann opened this issue May 17, 2021 · 38 comments · Fixed by #9706

Comments

@aheuermann
Copy link

aheuermann commented May 17, 2021

Use case description

I have a use-case where we want to deploy a new "version" of the same function that consumes from kinesis stream. We deploy the function in parallel by using a different stage which creates a new lambda; however, it fails when trying to create a kinesis fanout consumer because the consumer name conflicts between the two functions. Resulting in the following error in cloudformation:

xxx-xxxConsumer|arn:aws:kinesis:us-west-2:xxxxxx:stream/production-model already exists in stack arn:aws:cloudformation:us-west-2:x:stack/stream-consumers-production/xxxxx

Proposed solution

If we could override the consumer name used when creating the kinesis fanout consumer. I realize we could create the consumer outside of the serverless cloudformation and reference the ARN, but would ideally like it to be created in the same cloudformation stack so that it is also destroyed when the lambda is destroyed.

Currently consumer is assumed to be an string (arn just used as-is) or true (consumer is created with name based on function and stream). Would it be possible to first check whether the string is an arn format otherwise create stream using the string as the name?

@pgrzesik
Copy link
Contributor

Hello @aheuermann, thanks a lot for reporting. I see how that can be potentially problematic as the logical IDs overlap due to the same name of the consumer. As for the proposal to inspect the string, it is a possibility, but I'm usually a bit against properties that can accept values with totally different meanings - maybe we could introduce a separate property that would allow to define the explicit name of the consumer? What are your thoughts on this?

@aheuermann
Copy link
Author

maybe we could introduce a separate property that would allow to define the explicit name of the consumer? What are your thoughts on this?

Yeah that is a better idea. I'm already taking a look, hopefully get a PR out soon. Thanks!

@medikoo
Copy link
Contributor

medikoo commented May 20, 2021

@pgrzesik isn't the issue here that created costumer name is not unique agaist <service>-<stage>? We had similiar issue with request validators, and we solved it by ensuring that validator name created with one service cannot end same as in other. Shouldn't we apply same fix here?

@pgrzesik
Copy link
Contributor

pgrzesik commented May 20, 2021

That is a great point @medikoo - given that, I believe a better choice would be to instead of allowing to specify the customerName, instead generate it with service and stage (here:

getStreamConsumerName(functionName, streamName) {
) to ensure it's unique across different deployments. Sorry @aheuermann for confusion here.

@preshetin
Copy link
Contributor

@pgrzesik I'd like to take this one. So I assume the solution would look like this:

--- a/lib/plugins/aws/lib/naming.js
+++ b/lib/plugins/aws/lib/naming.js
@@ -372,7 +372,9 @@ module.exports = {
     )}${this.normalizeNameToAlphaNumericOnly(streamName)}`;
   },
   getStreamConsumerName(functionName, streamName) {
-    return `${functionName}${streamName}Consumer`;
+    const serviceName = this.provider.serverless.service.getServiceName();
+    const stage = this.provider.getStage();
+    return `${functionName}${streamName}${serviceName}${stage}Consumer`;
   },
   getStreamConsumerLogicalId(streamConsumerName) {

If we go this way then several tests would have to be adjusted.

I will be happy to create a PR for this

@pgrzesik
Copy link
Contributor

pgrzesik commented Jul 6, 2021

Hey @preshetin 👋 We'd love to accept a PR for this one 🙌 That's pretty much the approach that we should take here - there's one caveat though - we need to confirm if it's not breaking for existing users as it's going to alter the logical ID of stream consumer.

@preshetin
Copy link
Contributor

there's one caveat though - we need to confirm if it's not breaking for existing users as it's going to alter the logical ID of stream consumer.

Hey @pgrzesik I can confirm it works for existing users. Let me share how I did it. ⚠️ Disclaimer: I'm new to Kinesis so feel free to correct me if I'm doing it the wrong way.

First, I deployed a service with old naming. Then re-deployed it with a patched version of sls command. It went smoothly. It looks like the new consumer resource gets created and the old one gets deleted. Here's from from CF console logs:

...
CloudFormation - CREATE_COMPLETE - AWS::Kinesis::StreamConsumer - HelloFooslsDashkinesisdevConsumerStreamConsumer
...
CloudFormation - DELETE_COMPLETE - AWS::Kinesis::StreamConsumer - HelloFooConsumerStreamConsumer
...

I then put a record to my Kinesis stream and can confirm it was processed.

I'm going to create a PR with instructions how to test it.

Let me know if there are any concerns

@medikoo
Copy link
Contributor

medikoo commented Jul 7, 2021

@preshetin great thanks for checking it out.

Still isn't there a risk, that old instances of lambda will attempt to push to old stream, at the moment it is replaced?

I think we do not have guarantee that old lambdas will be immediately replaced. e.g. when deployment will be going on, the old instances might be at some invocation and still attempt to work with old streams.

Do we have clarity how it'll work?

@preshetin
Copy link
Contributor

preshetin commented Jul 7, 2021

Still isn't there a risk, that old instances of lambda will attempt to push to old stream, at the moment it is replaced?

Hi @medikoo, this issue is about adjusting stream consumers, not producers.

By default, Kinesis stream cannot be removed if there are existing consumers. In such a case, EnforceConsumerDeletion parameter has to be added to aws command. Or consumers have to be deleted first.

Do we have clarity how it'll work?

I haven't worked with Kinesis before, so I may not see the nuances that are important. If some guidance is provided at what I should look at then I can try to give it another try.

I might have been too impatient when I created my PR so if this is not what we want then I'm totally fine if it is closed.

@medikoo
Copy link
Contributor

medikoo commented Jul 7, 2021

@preshetin great thanks for the PR. We will be very happy to take it, but let's just first inspect all eventual implications.

I also do not have significant experience with Kinesis to be able to tell.

Does it have any chance to affect stream producers? (so entities that feed the stream), or is that tightly bound with stream configuration, and producers are automatically detached with stream removal (?)

@preshetin
Copy link
Contributor

I agree @medikoo it is important to make sure nothing breaks. Especially when data can potentially be lost.

I don't think it would affect stream producers at all. In Kinesis, there is no such CF resource as producer. Only stream, consumer, and few resources with AWS::KinesisAnalytics:: prefix.

My understanding of a producer is it's an app that sends data to stream via API. To put data to a stream, only stream name has to be specified. Stream ARN looks like arn:aws:kinesis:region:account-id:stream/stream-name and is composed of three properties, stream name, AWS account ID, and AWS region.

The only thing solution from #9491 (comment) does is it ensures consumer name is different for different services or stages. That's all it does. In my PR, I've also outlined steps to reproduce this bug.

Does this make sense? If there are still any concerns I would try to address them

@medikoo
Copy link
Contributor

medikoo commented Jul 8, 2021

@preshetin great thanks for explanations. That looks great to me. Let's just wait for @pgrzesik to double confirm (he will be back on Monday)

@pgrzesik
Copy link
Contributor

pgrzesik commented Jul 12, 2021

Thanks a lot for elaborate discussion above - it all sounds great in general and I believe the risk is small, but I'm wondering - what about time-critical applications that rely on kinesis integration. With the replacement above, is there a risk that there will be a time window in which the data from stream won't be processed until the new StreamConsumer will be provisioned? Did you check that by any chance @preshetin ?

@preshetin
Copy link
Contributor

@pgrzesik I looked into the concern you raised. The way Kinesis works is when a producer puts data to stream, Kinesis will keep it at least 24 hours. Here's from Kinesis docs:

The retention period is the length of time that data records are accessible after they are added to the stream. A stream’s retention period is set to a default of 24 hours after creation.

I ran a small experiment during which I created a stream, then put a record into it, and only then created a service with Serverlsss Framework that contained consumer resource. Once the service was deployed I checked logs and can confirm that piece of data was processed.

Let me know if there are still any concerns

@pgrzesik
Copy link
Contributor

Thanks for testing it out and for clarification @preshetin - I was wondering also about the other thing - assuming that application is time-sensitive (it's processing data form the stream as soon as possible) - does that replacement cause a significant downtime in processing records from the stream? For example, during redeployment, events are not processed by 1 minute or more. Do you think that might be the case here?

@preshetin
Copy link
Contributor

I think there may be some downtime during deployment between the time the old consumer is deleted and the new consumer is created. To address that, there are deployment techniques like Blue/Green deployment that creates two parallel app versions ("Blue" and "Green") and then quickly switch to a new version. For this, two consumer apps have to be created and then switched from the old one to the new one.

For Lambda functions, I think this is why Lamba version resources exist. During deployment, a new version is created and then the latest version is used. Such architecture achieves zero downtime deployments.

In Kinesis consumers, there is no such resource as the consumer version. So there can be some delay in consuming real time data. The time of this delay is about the same as service deployment time. For big apps, it can be more than one minute.

I hope that answers your question @pgrzesik

@pgrzesik
Copy link
Contributor

pgrzesik commented Jul 14, 2021

Thanks a lot @preshetin - that answers my question very well. Given the fact that this change can introduce an unexpected change, and cause even temporary delay in consuming data, I think it should be opt-in in context of v2 version and become the default with v3 version of the Framework, with deprecation notice in context of v2. What do you think @medikoo? Or to not clutter users with deprecations, we're okay with risk of potential unexpected delay in consuming data after redeployment with newer version? I personally prefer to err on the side of never breaking existing functionality, even in minor way in such situations and would vote for making it opt-in in v2.

@medikoo
Copy link
Contributor

medikoo commented Jul 14, 2021

@pgrzesik I trust your judgement on that. If you feel migration process may be harmful for any case, we definitely should make this opt-in

@pgrzesik
Copy link
Contributor

@medikoo Thanks for your input 🙇 I believe, as also @preshetin outlined, that it's possible that this change can cause unexpected downtime of the service. Given that, I believe we should extend the current implementation with two things:

  1. Make it opt-in via an explicit flag
  2. If someone uses that event and does not have a flag configured, they should see a deprecation with a notice that this will become the default starting with next major

One question I still have is if the flag should be per-event/function or rather on provider level for the whole service. From your experience @aheuermann - what do you think would be a better approach?

@aheuermann
Copy link
Author

aheuermann commented Jul 16, 2021 via email

@pgrzesik
Copy link
Contributor

Thanks a lot @aheuermann and congratulations 🎉

As for definition on provider level - if you set it on provider, it will apply to all functions, but you can also specify it only on specific functions/events.

@preshetin what do you think? Would you be interested in continuing work on that given the proposed changes above?

@preshetin
Copy link
Contributor

preshetin commented Jul 29, 2021

I'm leaving for vacation until 22th of August, or even until 14th of September. Before that time, I won't be available.

So @pgrzesik if later time works for you I'd be happy to implement it. I guess we should agree on the implementation. You may provide how you see it, or I can propose something once I'm available.

If it feels like too much time feel free to de-assign me

@pgrzesik
Copy link
Contributor

pgrzesik commented Aug 3, 2021

Thanks for the update @preshetin - that all sounds good, let's come back to it once you're back. Have great vacations 🎉

@preshetin
Copy link
Contributor

Hey @pgrzesik I'm back now. To move forward with this, we have to agree on the implementation. It looks like we can have it done the way it was described in #9496 (comment)

The way I understood that proposal is we expand consumer parameter to also be an object. Currently it can be only boolean or string starting with arn:.... When defined as an object, consumer can have a name property that would solve the issue.

Let me know your thoughts

@pgrzesik
Copy link
Contributor

Hello @preshetin, hope you had good vacations 🙌

The discussion here uncovered that setting the consumerName might not be necessary after all, so my linked comment might not apply at all. As @medikoo pointed out in #9491 (comment), it's a problem of the combination not being unique across stage and service combination. So, I think that solution you suggested here: #9491 (comment) sounds good, but we also need to apply that logic only if user explicitly opts in as it will cause redeployment and potential downtime/delay in processing.

@preshetin
Copy link
Contributor

Thanks @pgrzesik for the explanations. I was thinking to go with the previous solution because it actually provides a clear way for user to explicitly opt-in.

When we proceed with #9491 (comment), how do you see a way for a user to opt-in? The only idea that comes to my mind is we introduce a forceRedeploy property which is false by default. If it is true, then an updated naming is applied. This feels a bit unnatural to me however.

Any ideas on how to allow a user to explicitly opt-in?

@pgrzesik
Copy link
Contributor

I agree that the first solution allows a nice way for users to opt-in, but the problem with that is that it doesn't really solve the underlying issue that should be addressed by the framework, which is ensuring uniqueness of the generate name for service and stage pair. The approach that I think would make the most sense to me would be similar to a flag that we have for http or httpApi events like this one #9758

That would allow users to explicitly opt-in to a new naming scheme. After thinking about it more, I think we should not make it granular and provide a single opt-in flag on provider level instead of supporting it on each event separately for keeping things simpler.

@aheuermann @preshetin @medikoo How does that sound to you?

cc @mnapoli

@medikoo
Copy link
Contributor

medikoo commented Aug 27, 2021

@pgrzesik it sounds great to me.

Concerning property name. I suggest to introduce:

provider.kinesis.consumerNamingMode: "serviceSpecific"

Make it a default, and show deprecation if it's not set (assuming service relies on kinesis stream and is subject to naming issue).

@pgrzesik @preshetin What do you think?

@pgrzesik
Copy link
Contributor

Make it a default

I think we can make it a default only with new v3, is that what you had in mind?

As for the actual property, I was thinking about boolean flag as I don't expect we will ever support more options here so something like serviceSpecificConsumerNamingMode: true or similar but I know it's quite verbose name of the property.

What do you guys think?

@medikoo
Copy link
Contributor

medikoo commented Aug 27, 2021

I think we should not be that afraid of verboseness (it's better if name is clear in what it address).

I proposed to consumerNamingMode: "serviceSpecific" as maybe there would be a case in a future to again revisit that (and then we can again use same name, but with different value). Still anyway I think this properly should be supported only in v2, and since v3 we should just introduce service specific handling. So what you proposed @pgrzesik also works for me.

@preshetin
Copy link
Contributor

I'm preparing a PR for this, hoping it'll be ready in a few hours.

provider.kinesis.consumerNamingMode: "serviceSpecific"

I suggest we introduce provider.stream.kinesisConsumerNamingMode instead since there's no such event like kinesis. So what we are doing here is configuring stream event at the provider level.

@medikoo
Copy link
Contributor

medikoo commented Sep 2, 2021

I suggest we introduce provider.stream.kinesisConsumerNamingMode instead since there's no such event like kinesis.

That's true. I proposed kinesis namespace as we were considering separating those two events #8137 Still it's unlikely we will follow with that (it'll be a breaking change, and it's always hard to migrate all users)

Anyway as on event level it appeared to be problematic to handle both in one bucket I think it's wise now to rely on separate namespaces for that on provider

@preshetin
Copy link
Contributor

Thanks @medikoo I'll stick with what you initially proposed then. Just wanted to make sure there's no typo here

@stickystyle
Copy link

Does anyone have any insight if this issue is still being worked on?

@medikoo
Copy link
Contributor

medikoo commented May 12, 2023

@stickystyle There was a PR, that was near finalized I think, yet postponed due to v3 release.

We'll be happy to take it, but it needs to be rebased against lastest main, @preshetin are you still open for that?

@preshetin
Copy link
Contributor

Hey @medikoo I will see what I can do. Will let you know

@medikoo
Copy link
Contributor

medikoo commented May 12, 2023

Thank you @preshetin 🙇

@preshetin
Copy link
Contributor

The PR with solution was updated with the latest code and now targets main branch.

@medikoo can you take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment