
feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges #1802

Closed
wants to merge 5 commits

Conversation

@ashoksrirama commented Oct 23, 2023

Description

Motivation and Context

How was this change tested?

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Yes, I have updated the docs for this feature
  • Yes, I ran pre-commit run -a with this PR

Additional Notes

@ashoksrirama requested a review from a team as a code owner October 23, 2023 14:48
@ashoksrirama changed the title from "[feat] Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges #1765" to "feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges #1765" Oct 23, 2023
@ashoksrirama changed the title from "feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges #1765" to "feat: Multi-cluster architecture to increase resiliency and reduce inter-az data transfer charges" Oct 23, 2023
data "aws_availability_zones" "available" {}

locals {
cluster_name = format("%s-%s", basename(path.cwd), "shared")
Contributor

Let's follow the current norm used in the other patterns:

Suggested change
cluster_name = format("%s-%s", basename(path.cwd), "shared")
name = basename(path.cwd)

Author

Done

azs = slice(data.aws_availability_zones.available.names, 0, 3)

tags = {
Blueprint = local.cluster_name
Contributor

Suggested change
Blueprint = local.cluster_name
Blueprint = local.name

Author

Done

source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"

name = local.cluster_name
Contributor

Suggested change
name = local.cluster_name
name = local.name

Author

Done

@@ -0,0 +1,47 @@
provider "aws" {
Contributor

I don't believe the cluster-per-AZ design requires splitting up the Terraform configurations into multiple directories. We should collapse this back down to a single directory, but have multiple cluster definitions - one for each AZ used. This can be shown with a set of definitions split into multiple files - for example:

  • az1.tf
  • az1.yaml
  • az2.tf
  • az2.yaml
  • az3.tf
  • az3.yaml

Within each of these AZ specific Terraform files we'll have:

  • EKS cluster definition
  • Addons definition
  • Kubernetes and Helm aliased providers scoped to that cluster and addon definition

Then within each of the AZ specific YAML files will be the Karpenter specific manifests for that AZ and cluster within that AZ.

Thoughts?
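
A minimal sketch of the aliased providers such an az1.tf could contain, assuming an EKS module named eks_az1 is defined in the same file; the names, versions, and module outputs below are illustrative (based on the terraform-aws-modules/eks module), not the PR's actual code:

# Hypothetical az1.tf: Kubernetes and Helm providers scoped to the AZ 1 cluster
provider "kubernetes" {
  alias                  = "az1"
  host                   = module.eks_az1.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_az1.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # Authenticate against this AZ's cluster specifically
    args = ["eks", "get-token", "--cluster-name", module.eks_az1.cluster_name]
  }
}

provider "helm" {
  alias = "az1"

  kubernetes {
    host                   = module.eks_az1.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks_az1.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks_az1.cluster_name]
    }
  }
}

az2.tf and az3.tf would repeat the same shape with their own aliases, and each file's addon modules would receive providers = { kubernetes = kubernetes.az1, helm = helm.az1 } accordingly.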

Author

Done, made changes as suggested. We were following the Istio multi-cluster pattern structure before.


This example shows how to provision a cell-based Amazon EKS cluster.

* Deploy EKS Cluster with one managed node group in a VPC and AZ
Contributor

What is the motivation for mixing Fargate, managed node groups, and Karpenter in this design?

Author

It was about showing how to use them in a single-AZ pattern. Removed Fargate; now using one managed node group + Karpenter.
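
A rough sketch of that shape, assuming the terraform-aws-modules/eks module; the cluster name, version, sizing, and VPC references here are illustrative:

# Illustrative only: one small managed node group hosts cluster-critical addons
# (including the Karpenter controller); Karpenter provisions everything else.
module "eks_az1" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.16"

  cluster_name    = "${local.name}-az1"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = [module.vpc.private_subnets[0]] # only the AZ 1 subnet

  eks_managed_node_groups = {
    initial = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 3
      desired_size   = 2
    }
  }

  tags = merge(local.tags, {
    # Discovery tag referenced by Karpenter's subnet/security group selectors
    "karpenter.sh/discovery" = "${local.name}-az1"
  })
}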

3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
4. [helm](https://helm.sh/docs/helm/helm_install/)

## Deploy
Contributor

Please see the other pattern READMEs for the "standard" README structure.

Author

Done

}
}

provider "kubectl" {
Contributor

We are actively moving away from this provider (#1675).

You can see the start of this in #1819, which also includes updates for Karpenter 0.32.

Author

Removed the use of the kubectl provider and added instructions in README.md to install Karpenter and the sample app.
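
For reference, one way to install Karpenter without the kubectl provider is a plain helm_release through the aliased Helm provider; the chart version and the module.karpenter_az1 IAM role reference below are assumptions, and the PR ultimately documents a helm CLI install in the README instead:

# Sketch only; the PR installs Karpenter via README instructions instead.
resource "helm_release" "karpenter_az1" {
  provider = helm.az1

  name             = "karpenter"
  namespace        = "karpenter"
  create_namespace = true
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "v0.32.1"

  values = [
    yamlencode({
      settings = {
        clusterName = module.eks_az1.cluster_name
      }
      serviceAccount = {
        annotations = {
          # Hypothetical IRSA role for the Karpenter controller
          "eks.amazonaws.com/role-arn" = module.karpenter_az1.iam_role_arn
        }
      }
    })
  ]
}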

# Karpenter
################################################################################

resource "aws_security_group" "karpenter_sg" {
Contributor

Are these Karpenter security group resources required?

Author

Nope, cleaned them up.


resource "kubectl_manifest" "karpenter_provisioner" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1alpha5
Contributor

We'll want to use v0.32 and the v1beta1 APIs.

Author

Done
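
For reference, a minimal v1beta1 equivalent of the old v1alpha5 Provisioner, as it might appear in a per-AZ manifest file such as the proposed az1.yaml; the AZ, role name, and discovery tag values are illustrative:

# Hypothetical az1.yaml: Karpenter v0.32 / v1beta1 manifests for the AZ 1 cluster
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a"] # pin capacity to this cluster's AZ
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-az1 # hypothetical node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: cell-based-eks-az1
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: cell-based-eks-az1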

@@ -0,0 +1,12 @@

variable "name" {
Contributor

We don't provide variables unless we absolutely require something from users to deploy (e.g., a domain name or hosted zone) - these are not consumed in place; they are references.

Author

Removed the references

github-actions bot commented Dec 5, 2023

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Dec 5, 2023
@ashoksrirama (Author)

I'll work on the changes and get back with an updated PR.

@github-actions github-actions bot removed the stale label Dec 6, 2023
@ashoksrirama (Author)

Updates made based on the above feedback.

github-actions bot commented Jan 27, 2024

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Jan 27, 2024
@ashoksrirama (Author)

A gentle reminder to review this.

@github-actions github-actions bot removed the stale label Jan 30, 2024
github-actions bot commented Mar 1, 2024

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Mar 1, 2024
github-actions bot
Pull request closed due to inactivity.

@github-actions github-actions bot closed this Mar 12, 2024
@ashoksrirama (Author) commented Apr 11, 2024

Another gentle reminder to re-open this PR and review the latest changes.

@allamand (Contributor)

Hi @ashoksrirama, so in this pattern we create one EKS cluster per AZ and you deploy an app in each cluster. I am not sure I understand what is new here and why it needs another pattern? The title is multi-cluster with the objective to reduce inter-AZ costs, so basically you do it just by having 3 clusters. What about the complexity of managing 3 clusters against 1? What if you deploy microservices that need to talk to each other? How do you ensure high availability in case of a problem in 1 AZ? I think I'm missing something here.
