Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow self managed Apache Kafka as an event source #8718

Closed
lewgordon opened this issue Jan 5, 2021 · 19 comments · Fixed by #8784
Closed

Allow self managed Apache Kafka as an event source #8718

lewgordon opened this issue Jan 5, 2021 · 19 comments · Fixed by #8784

Comments

@lewgordon
Copy link
Contributor

lewgordon commented Jan 5, 2021

Use case description

We'd like to utilize our self managed Kafka cluster as an event source, now that AWS will allow it.

Proposed solution

I would think it would look fairly similar to what was implemented for #8117. See https://docs.aws.amazon.com/lambda/latest/dg/kafka-smaa.html for additional details. I also provided an example of what I'm thinking the serverless.yml events would look like, although feedback is appreciated!

functions:
  compute:
    handler: handler.compute
    events:
      - kafka:
          saslScram512Auth: arn:aws:secretsmanager:us-east-1:01234567890:secret:MyBrokerSecretName
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092
      - kafka:
          vpcSubnets:
            - subnet-0011001100
            - subnet-0022002200
          vpcSecurityGroups:
            - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092

I didn't enumerate all of the options that would be provided, but I figure it'd support all of the options for the https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-eventsourcemapping.html resource, but being focused on using SelfManagedEventSource specifically.

@pgrzesik
Copy link
Contributor

pgrzesik commented Jan 6, 2021

Hello @lewgordon 👋 Thank you for your proposal, it looks good overall, however, I have a few questions:

  1. Do you happen to know if SourceAccessConfigurations in CloudFormation supports SASL_SCRAM_512_AUTH type? In current documentation, it seems like it only works for Amazon MQ's BASIC_AUTH.
  2. saslScram512Auth - could we name it e.g. accessConfiguration? For now it will only support SASL_SCRAM_512_AUTH, but if in the future there's another type supported, then we will have a nice upgrade path to accessConfiguration.type and accessConfiguration.uri setting instead of adding similar property.

Thanks!

@lewgordon
Copy link
Contributor Author

Hello @lewgordon Thank you for your proposal, it looks good overall, however, I have a few questions:

  1. Do you happen to know if SourceAccessConfigurations in CloudFormation supports SASL_SCRAM_512_AUTH type? In current documentation, it seems like it only works for Amazon MQ's BASIC_AUTH.

Will check on this today!

  1. saslScram512Auth - could we name it e.g. accessConfiguration? For now it will only support SASL_SCRAM_512_AUTH, but if in the future there's another type supported, then we will have a nice upgrade path to accessConfiguration.type and accessConfiguration.uri setting instead of adding similar property.

Yeah I think that makes sense. However, things like VPC_SUBNET and VPC_SECURITY_GROUP are also technically accessConfigurations. Do you think that we should make accessConfiguration accept a list of {type, uri} properties? I had split it out since it seemed easier to understand at a glance, but I'm curious as to what your opinion is.

@lewgordon
Copy link
Contributor Author

Hello @lewgordon Thank you for your proposal, it looks good overall, however, I have a few questions:

  1. Do you happen to know if SourceAccessConfigurations in CloudFormation supports SASL_SCRAM_512_AUTH type? In current documentation, it seems like it only works for Amazon MQ's BASIC_AUTH.

Will check on this today!

Hi @pgrzesik, I believe all the new types that aren't mentioned in the documentation should work. I opened a PR with them (awsdocs/aws-cloudformation-user-guide#898) to just confirm this though. However, speaking with an Amazon support rep I believe the internal team has been notified, too.

@pgrzesik
Copy link
Contributor

pgrzesik commented Jan 7, 2021

Thanks for following up @lewgordon and for clarification around VPC_SUBNET and VPC_SECURITY_GROUP - I missed the fact that they're also accessConfigrations. The question I have here, how should we format them when passing to CloudFormation's SourceAccessConfiguration.Uri? As for potential implementation in Framework, what do you think about the approach below (it still depends on the answer to my previous question as I don't fully understand how to properly construct CF Resource when e.g. using VPC_SECURITY_GROUP)

functions:
  compute:
    handler: handler.compute
    events:
      - kafka:
          accessConfiguration:
            type: saslScram512Auth
            arn(or uri): arn:aws:secretsmanager:us-east-1:01234567890:secret:MyBrokerSecretName
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092
      - kafka:
          accessConfiguration:
            type: vpcSubnet
            vpcSubnets:
              - subnet-0011001100
              - subnet-0022002200
          vpcSecurityGroups:
            - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092
      - kafka:
          accessConfiguration:
            type: vpcSecurityGroups
            vpcSecurityGroups:
              - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092

Let me know what do you think 💯

@lewgordon
Copy link
Contributor Author

Here's a snippet from the Cloudformation I ran yesterday, hopefully it clarifies a bit on what the resource needs to look like:

"KafkaEventSource": {
  "Type" : "AWS::Lambda::EventSourceMapping",
  "Properties": {
    "Topics": ["ConfigurationChanged"],
    "FunctionName": {"Ref": "HelloLambdaFunction"},
    "SelfManagedEventSource": {
      "Endpoints": {"KafkaBootstrapServers": ["abc2.xyz.com:9092"]}
    },
    "SourceAccessConfigurations": [
      {"Type": "VPC_SECURITY_GROUP", "URI": "security_group:sg-1234567890"},
      {"Type": "VPC_SUBNET", "URI": "subnet:subnet-1234567890"}
    ]
  }
},

I do like the accessConfiguration YAML mapping as I think that seems fairly extensible as new properties come in. Although as my example Cloudformation shows we'd still need a sequence of accessConfigurations. In my mind it'd look like this (modifying your example):

functions:
  compute:
    handler: handler.compute
    events:
      - kafka:
          accessConfigurations:
            - type: saslScram512Auth
              arn(or uri): arn:aws:secretsmanager:us-east-1:01234567890:secret:MyBrokerSecretName
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092
      - kafka:
          accessConfigurations:
            - type: vpcSubnet
              vpcSubnets:
                - subnet-0011001100
                - subnet-0022002200
            - type: vpcSecurityGroup
              vpcSecurityGroups:
                - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092
      - kafka:
          accessConfigurations:
            - type: vpcSecurityGroup
              vpcSecurityGroups:
                - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092

Also my initial thought was to simplify the URIs that are needed for security-group and subnet, since in my opinion they seem a bit redundant. To be more explicit I mean so that the end user doesn't need to specify security-group:sg-0123456789 for example. Hopefully this makes a bit more sense though and looking forward to your thoughts, @pgrzesik !

@pgrzesik
Copy link
Contributor

pgrzesik commented Jan 8, 2021

Thank you @lewgordon - I missed the fact that it's multiple accessConfigurations, not a single one so your proposal makes perfect sense. I really like the simplification of securityGroup and subnet as well. 👍

I think there are no other outstanding questions and it's ready for implementation - let me know what do you think. Would you be okay with updating the issue with the final version of the implementation proposal? And of course, we'd be more than happy to accept a PR 🙌

@lewgordon
Copy link
Contributor Author

Sure I wouldn't necessarily assign it to me right away, but I may look into the implementation next week!

@lewgordon
Copy link
Contributor Author

@pgrzesik thinking through some of the implementation a bit I found another "quirk" that'll have to do with Ref. Say a user provides:

functions:
  compute:
    handler: handler.compute
    events:
      - kafka:
          accessConfigurations:
            - type: vpcSubnet
              vpcSubnets:
                Ref: VpcSubnets
            - type: vpcSecurityGroup
              vpcSecurityGroups:
                - sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092

Where VpcSubnets is a CommaDelimitedList for example. In this case we couldn't be able to do the transform to the Cloudformation object without dereferencing the Ref, since we don't know the size. For example, we'd be unable to determine this:

"KafkaEventSource": {
  "Type" : "AWS::Lambda::EventSourceMapping",
  "Properties": {
    "Topics": ["mytopic"],
    "FunctionName": {"Ref": "ComputerLambdaFunction"},
    "SelfManagedEventSource": {
      "Endpoints": {"KafkaBootstrapServers": ["abc3.xyz.com:9092", "abc2.xyz.com:9092"]}
    },
    "SourceAccessConfigurations": [
      {"Type": "VPC_SECURITY_GROUP", "URI": "security_group:sg-0123456789"},
      {"Type": "VPC_SUBNET", "URI": "subnet:???"},
    ]
  }
},

I think the we could disallow Ref in certain spots, but that seems like a poor user experience. I think we may just want to further modify the YAML to look like this:

functions:
  compute:
    handler: handler.compute
    events:
      - kafka:
          accessConfigurations:
            - vpcSubnet: subnet-0011001100
            - vpcSubnet: subnet-0022002200
            - vpcSecurityGroup: sg-0123456789
          topic: mytopic
          bootstrapServers:
            - abc3.xyz.com:9092
            - abc2.xyz.com:9092

Sorry if we'll need to go through more rounds of discussion around the event object, but I think it'll be worth it. Looking forward to your thoughts!

@pgrzesik
Copy link
Contributor

Hey @lewgordon, thanks for mentioning that issue, you're totally right. In the proposed example, we still will have to disallow the use of CF intrinsic functions, or do you think we'd be just fine with using Sub when crafting the SourceAccessConfigurations?

@lewgordon
Copy link
Contributor Author

do you think we'd be just fine with using Sub when crafting the SourceAccessConfigurations?

That was my initial thought at least. Basically, for any intrinsic function we could use Fn::Sub to transform the value. Although I'm not sure if that has possible unexpected behavior implications down the line.

@pgrzesik
Copy link
Contributor

If that's going to work, I believe your latest suggestion is the best way to implement it @lewgordon 👍

@medikoo
Copy link
Contributor

medikoo commented Jan 20, 2021

Just as a side note. We support CF intrinsic functions only in properties which we pass directly (as-is) to CloudFormation template (where they're supported).

So if the case here is different, I believe we simply should not support CF intrinsic functions (there definitely should not be the need to resolve those functions)

@lewgordon
Copy link
Contributor Author

That sounds good to me. I think it'll simplify this PR a bit, too.

@medikoo
Copy link
Contributor

medikoo commented Jan 20, 2021

Concerning accessConfigurations why not configuration as:

accessConfigurations:
  saslScram512Auth: arn:aws:secretsmanager:us-east-1:01234567890:secret:MyBrokerSecretName # can be an array
  vpcSubnet:
    - subnet-0011001100
    - subnet-0022002200
  vpcSecurityGroup: sg-0123456789 # can be an array

At least what's currently reminds me the design of function event on our side, which is troublesome and doesn't feel natural (we've scheduled to change it: #7601)

@lewgordon
Copy link
Contributor Author

@medikoo I think that's valid now that we won't accept the use of the intrinsic functions. The problem with that configuration initially was how it'd translate to Cloudformation. For example, if we allowed the use of Ref:

accessConfigurations:
  saslScram512Auth: arn:aws:secretsmanager:us-east-1:01234567890:secret:MyBrokerSecretName # can be an array
  vpcSubnet:
    - subnet-0011001100
    - subnet-0022002200
  vpcSecurityGroup: sg-0123456789 # can be an array

Would not translate to a valid Cloudformation template for this resource, since we wouldn't be able to determine how many values were in Ref when packaging the function:

"KafkaEventSource": {
  "Type" : "AWS::Lambda::EventSourceMapping",
  "Properties": {
    "Topics": ["mytopic"],
    "FunctionName": {"Ref": "ComputerLambdaFunction"},
    "SelfManagedEventSource": {
      "Endpoints": {"KafkaBootstrapServers": ["abc3.xyz.com:9092", "abc2.xyz.com:9092"]}
    },
    "SourceAccessConfigurations": [
      {"Type": "VPC_SECURITY_GROUP", "URI": "security_group:sg-0123456789"},
      {"Type": "VPC_SUBNET", "URI": "subnet:???"},
    ]
  }
},

Hope that makes sense and I think that by not allowing the intrinisic functions your proposal does seem more natural.

@medikoo
Copy link
Contributor

medikoo commented Jan 20, 2021

@lewgordon thanks for explanation, yes we definitely should not support any CF intristic functions here, as it's not a value we directly pass to CF template.

In such case I suggest to improve the syntax to one now proposed. What do you think? @pgrzesik what's your opinion on that?

@pgrzesik
Copy link
Contributor

I agree with the proposal to simplify the syntax if we're not allowing CloudFormation intrinsic functions - that was the only reason for current syntax 👍

@lewgordon
Copy link
Contributor Author

Hi @pgrzesik @medikoo, after some testing yesterday I think the experience without the use of intrinsic functions is going to make this event painful to use. My (if potentially biased) workflow would to be able to contain all the resources within the one Serverless stack and definitely discourage the pattern of doing a double deploy. For example:

functions:
  compute:
    events:
      - kafka:
          accessConfigurations:
            vpcSecurityGroup:
              - Fn::Sub:
                   - "security_group:${SecurityGroupId}"
                   - SecurityGroupId:
                       Ref: LambdaSecurityGroup
          topic: MyTopic
          bootstrapServers:
            - abc.xyc.com:9092

resources:
  Resources:
    LambdaSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties: {...}

I think this would be much more preferable, even if I needed to use Fn::Sub to achieve it. If we only allowed strings I think I'd need to do two passes and hardcode the physical ids of the security group, rather than letting CF determine it for me. I think we can still use the same model as we discussed, though. Looking forward to hearing your thoughts.

@medikoo
Copy link
Contributor

medikoo commented Jan 22, 2021

@lewgordon I'm not sure if that's possible.

AFAIK we need to resolve a Fn::Sub and Ref locally to be able to compile accessConfiguration into result CF template, and we probably do not have a means to deduct id for LambdaSecurityGroup resource (?)

Technically the way things work in a Framework:

  • If you want to reuse some value across different properties (1) define it in custom block, (2) reference it via configuration variable (${self:custom...}) in those properties
  • CF intrinsic functions are allowed only in values passed "as is" to compiled template. In many cases it's not possible to resolve them on spot, and in many other cases resolution rules are not straightfoward. We have few exceptions where we attempt to resolve those functions (e.g. for local invocation), but still it's limited and incomplete (as it cannot be complete), and it's only in cases where in main scenario those values are used "as-is" in CF template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants