New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(aws): Add support for msk as event trigger #8164
feat(aws): Add support for msk as event trigger #8164
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8164 +/- ##
==========================================
+ Coverage 88.04% 88.10% +0.05%
==========================================
Files 248 250 +2
Lines 9329 9357 +28
==========================================
+ Hits 8214 8244 +30
+ Misses 1115 1113 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @pgrzesik that looks really good! Please see my suggestions and let me know what do you think
docs/providers/aws/events/msk.md
Outdated
|
||
<!-- DOCS-SITE-LINK:START automatically generated --> | ||
|
||
### [Read this on the main serverless docs site](https://www.serverless.com/framework/docs/providers/aws/events/cognito-user-pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/cognito-user-pool/msk/
const EventSourceArn = event.msk.arn; | ||
const Topic = event.msk.topic; | ||
const BatchSize = event.msk.batchSize || 100; | ||
const Enabled = event.msk.enabled != null ? event.msk.enabled : true; | ||
const StartingPosition = event.msk.startingPosition || 'TRIM_HORIZON'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style suggestion: In JS convention is to use capitalized var names only for function constructors.
Therefore it'll be less confusing to use not capitalized var names, although I know it'll get more repetitive at resource configuration.
(Also we already do dependsOn
instead of DependsOn
)
Unfortunately this is hard to guard by lint rules, as in most cases linter can't be sure if we deal with function constructor or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes perfect sense - I was trying to follow the convention I found in similar places (stream
or sqs
events) but I totally agree - changed 👍
lib/plugins/aws/lib/naming.js
Outdated
return `${this.getNormalizedFunctionName( | ||
functionName | ||
)}EventSourceMappingMSK${this.normalizeNameToAlphaNumericOnly( | ||
clusterName | ||
)}${this.normalizeNameToAlphaNumericOnly(topicName)}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 255
max chars length. It'll be nice to probably secure we do not overflow that.
I think we can assume that normalizedFunction name, won't exceed 96 chars (at least such constraints will be guaranteed after we implement #7056).
Them maybe we should set max of 79
chars for both normalized clusterName
and topicName
(I would probably just trim it, if it exceeds, with appropriate comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I followed your suggestion here
return dependsOn; | ||
} | ||
|
||
getMSKClusterName(eventSourceArn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's maybe rename it to getMSKClusterNameToken
(as we're not necessarily resolve the cluster name)
if (event.msk) { | ||
const EventSourceArn = event.msk.arn; | ||
const Topic = event.msk.topic; | ||
const BatchSize = event.msk.batchSize || 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not set default for optional properties (I know we sometimes do that in old code, but we have also history of errors cause of doing that).
It's safe to pass it as undefined
to resource definition, as undefined
values are automatically stripped on JSON stringification
const newMSKObject = { | ||
[mskEventLogicalId]: mskResource, | ||
}; | ||
|
||
Object.assign(cfTemplate.Resources, newMSKObject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler would be just:
cfTemplate.Resources[mskEventLogicalId] = mskResource;
return fixtures | ||
.extend('function', { | ||
functions: { | ||
foo: { | ||
events: [ | ||
{ | ||
msk: { | ||
topic, | ||
arn, | ||
batchSize, | ||
enabled, | ||
startingPosition, | ||
}, | ||
}, | ||
], | ||
}, | ||
}, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For tests performance it'll be nice to rely on just one fixture, so use multiple events that resemble different tested configurations in scope of one service.
const serverless = new Serverless(); | ||
serverless.setProvider('aws', new AwsProvider(serverless)); | ||
awsCompileMSKEvents = new AwsCompileMSKEvents(serverless); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not test it like that, as it may be problematic for eventual internal refactors.
I'd rather introduce all three cases in runServerless
tests, and just confirm the outcome is expected. (that should be good enough)
Eventually we may seclude getMSKClusterName
to standalone util (e.g. to getMSKClusterName.js
in same folder) and write unit tests for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with moving it out to separate file as I feel it's a bit "cleaner" to unit test this. Please let me know what do you think about such approach.
resolveDependsOn(funcRole) { | ||
let dependsOn = 'IamRoleLambdaExecution'; | ||
|
||
if (funcRole) { | ||
if ( | ||
// check whether the custom role is an ARN | ||
typeof funcRole === 'string' && | ||
funcRole.indexOf(':') !== -1 | ||
) { | ||
dependsOn = []; | ||
} else if ( | ||
// otherwise, check if we have an in-service reference to a role ARN | ||
typeof funcRole === 'object' && | ||
'Fn::GetAtt' in funcRole && | ||
Array.isArray(funcRole['Fn::GetAtt']) && | ||
funcRole['Fn::GetAtt'].length === 2 && | ||
typeof funcRole['Fn::GetAtt'][0] === 'string' && | ||
typeof funcRole['Fn::GetAtt'][1] === 'string' && | ||
funcRole['Fn::GetAtt'][1] === 'Arn' | ||
) { | ||
dependsOn = funcRole['Fn::GetAtt'][0]; | ||
} else if ( | ||
// otherwise, check if we have an import or parameters ref | ||
typeof funcRole === 'object' && | ||
('Fn::ImportValue' in funcRole || 'Ref' in funcRole) | ||
) { | ||
dependsOn = []; | ||
} else if (typeof funcRole === 'string') { | ||
dependsOn = funcRole; | ||
} | ||
} | ||
|
||
return dependsOn; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #8167 I've secluded this logic to this.provider.resolveFunctionIamRoleResourceName(functionObj)
, so let's rely on this instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for this 🙇
@pgrzesik concerning this, if we're just guessing the regex, I'd rather stick to |
a4f1d4c
to
e52c2c4
Compare
Thanks @medikoo for such a quick review 🙇 As for ARN regex - I followed your advice and replaced it with As for testing everything in one fixture - I moved as much as possible to one fixture, but still decided to go with another run to verify if Last question - how do you feel about the lack of integration test for this one? Thanks once again 🙇 |
@pgrzesik if configuring (via CloudFormation) a infrastructure (which will allow us to test it repeatedly) is not specifically difficult, I would go for it, but taking new approach (as you pointed provisioning is timetaking): What do you think about that:
|
Hey @medikoo, thanks a lot for the suggestion about different approach to integration test. Setting up infrastructure is not that tricky, the tricky part is publishing messages to MSK topic so it can be consumed by Lambda. I've experimented with doing that via EC2 instances and it should work fine (I've tested with a few manual steps, but it should be possible to automate), I'll see if it would be possible to implement a Lambda producer in a reasonable manner. |
Great thanks @pgrzesik that looks very promising! This new integration tests infra setup, can open door to test other important parts of the framework, we do not test currently due to implied fuss. |
Hello again 👋 I've landed on an approach with EC2 instance periodically pushing messages to Kafka topic and was considering Lambda as I'm not 100% sold on EC2 route, however, I'm not entirely sure what would be the recommended way to set up the infrastructure for that. In order to publish messages to MSK directly from another Lambda I need to use a third-party library (e.g. https://kafka.js.org/) and I would like to ask your opinion about it:
Thanks in advance for the response. Also, I'll be taking a short break so I won't be able to do any development till upcoming Thursday (10.09) and I might be slow to respond to messages here. |
@pgrzesik great thanks for exploring that.
I understand that it'll be tricky to do it without this 3rd party package (as tricky as e.g. trying to operate on AWS services without using
Yes, so far it was the case of services that do not require an additional installation step. I believe this needs to be supported out of a box by our fixtures engine, let me look into it and get back to you.
Have a great break! :) |
@pgrzesik I've improved our fixtures engine, refactored current tests to use it, and converted all integration tests to rely on it In new engine setup if fixture contains serverless/test/integration/fileSystemConfig/index.test.js Lines 52 to 63 in 0bed880
|
Hello there 👋 Thanks a lot for that @medikoo - I will be working on the integration test for MSK over the weekend and the improved fixture functionality will be invaluable 🙇 |
e52c2c4
to
8a88c9e
Compare
Hello @medikoo, sorry for a bit lengthy radio silence - it turned out that there were a few more obstacles to implement it than I initially anticipated. There is some good news and, sadly, some bad news too. The need for Good news is that the integration test works and takes about 31 minutes to execute (I've ran it >10 times today in the background), so it might be acceptable (or not!) to run it as standalone integration test. Unfortunately, there's also some bad news :( From time to time (I observed it for about every 4th run), the CloudFormation stack fails to delete. The reason for that is the fact that when Lambda is configured with MSK, it creates an ENI underneath. In some cases, that ENI is not removed right after removing Lambda function and it makes it impossible for the stack to be successfully deleted (VPC + Subnet cannot be deleted due to ENI still existing). I was looking for solutions to that and the only thing I've found is this thread (https://forums.aws.amazon.com/message.jspa?messageID=734756#734756) which suggest to "just" wait up to 6 hours before removing the other stack. Other solution is to remove ENI manually but that doesn't seem like a great solution for automated tests. I was also researching the another approach with having the separate scripts for setting up infrastructure and tearing it down, however, it suffers from similar issue - it might take a long time before it will be possible to tear down the infrastructure. It also creates a few other challenges with retaining specific parameters (e.g. ClusterConfigurationArn) between setting up and tearing down the infrastructure. I'm not entirely sure where to go with this one. I don't think it would be possible to fully automate this test due to issues with tearing down the CloudFormation stack with dependencies. Do you think it's acceptable to account for this issue and go with separate provisioning/de-provisioning scripts for infrastructure? |
@pgrzesik great thanks for looking into this thoroughly. It's amazing! Concerning ENI removal issue. Maybe we can steal approach from that plugin: https://github.com/medikoo/serverless-plugin-vpc-eni-cleanup (internal implementation is pretty straightforward) It deletes all interfaces manually, and that ensures that stack is removed promptly. I've used this for removal of services with VPC involved and it worked. Do you think it can work in this case? |
Hello @medikoo - thanks for a great suggestion - I've added an additional step to cleanup that looks for any "available" ENIs (there should be at most one in VPC created via CF for this test) and removes them, I've run the tests ~15 times afterward and it works correctly both when dangling ENI is present or it was already cleaned up automatically. The last question is - should it stay as an integration test or should it depend on the separate flow with
So as you can see, in most cases, it's around 33 minutes, with some runs taking as much as 41 minutes. |
@pgrzesik great to hear clearing ENI helped! I think it'll be good to introduce |
dd8b79e
to
102296a
Compare
Hello 👋 I've introduced, as suggested, separate commands for setting up and tearing down the infrastructure for tests. MSK test skips execution when the dependency stack is not available. I'm not entirely sure that my approach is what was expected so I'm open to feedback 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgrzesik that looks great! I'm really happy we've paved the path for more sophisticated integration testing.
I've proposed just minor improvements, as otherwise it looks as ready to take. Let me know what you think.
f6431c5
to
21fe37e
Compare
Hello, I believe I've applied all suggested changes. One last question that I have at the moment - are integration tests run as a part of any automated pipeline at the moment? The reason I ask is that after switching from skipping MSK test to failing it thanks to throwing error, the CI check will be failing all the time which might not be a desirable thing. (Unless it's also going to setup and teardown corresponding infra) |
@pgrzesik good question. Yes they're run automatically on each commit that lands in I was also thinking on whether to not run it automatically if resources are not detected, but It's probably too invasive. Anyway we may introduce it if we'd agree it'll be helpful |
Thanks for a quick answer and explanation. If manual approach works then that's alright, what I'm worried about is that it might become time consuming in the long run if it will require from you to run that setup multiple times. The alternative is to just leave this infrastructure up and running, but that will incur additional costs (~60$/month for just MSK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgrzesik It looks very good. I have just one last concern towards eventual updates to infra. Please see my comments
StackName: SHARED_INFRA_TESTS_CLOUDFORMATION_STACK, | ||
}); | ||
log.error('Integration tests CloudFormation stack already exists. Quitting.'); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering, that it'll probably be good to proceed with update operation if stack is there.
I can imagine that at some point we may want to extend this stack, and currently this script won't handle that well (we will have to remove the stack and re-create it, which might in many cases be quite time taking).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense - I believe in that case I would like to have it as two separate branches of logic as when we want to perform an update, we will need to skip some of the operations (e.g. creation of MSK Cluster Configuration). Note: We might still need to go through the whole process if there are dependencies that are not set up via CloudFormation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted the setup script so now it'll check if the stack can be updated and will try to perform an update 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgrzesik great job! I've just tested it, and it worked flawlessly!
I've proposed just one minor exit code handling improvement, and logger namespace suggestion
Thanks for a review and patience @medikoo 🙇 I believe all suggestions should be addressed at the moment 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's outstanding @pgrzesik ! Thank you
Add support for MSK as event trigger for AWS Lambda
A few (questionable) choices:
IamRoleLambdaExecution
- I've identified this as the minimal set of permissions that Lambda needs to successfully configure MSK trigger. Withoutec2
permissions, it was failing to successfully enable the trigger.Fn::ImportValue
andRef
. Support forFn::GetAtt
does not make sense as ARN is not exposed as a resource attribute (you can get it withRef
).sqs
andstream
events when it comes to resolvingdependsOn
property. I noticed that all of these actually share the same logic. I think it makes sense to unify it in a form of a helper function, but I'm not entirely sure how do you feel about it and if it's even the proper PR to implement such change. For now, I've moved it to a separate method, but I'm open to refactoring it even further.^arn:aws[a-zA-Z-]*:kafka:[a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-[1-9]{1}:[0-9]{12}:cluster
pattern forarn
validation. I didn't find exact ARN regex anywhere in AWS docs and I feel like this one is "good enough", but I'm happy to try to reach out support or try to establish the better version if needed.Closes: #8117