
Correct EKS getting started docs #543

Open · wants to merge 2 commits into base: main

Conversation

@lwr20 (Member) commented Apr 5, 2023

EKS getting started docs are wrong:

  • the Helm install instructions don't mention that you need to create and specify a namespace
  • the nodegroup can be added once the `aws-node` daemon set is deleted, which means the Helm instructions won't fail at the `helm install` step
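To make the correction concrete, a sketch of the corrected ordering might look like the following. This is written to a script file here so the sequence can be reviewed without a live cluster; the cluster name and node type are the placeholders used elsewhere in this thread, and `--create-namespace` is one possible way to address the missing-namespace problem (explicit `kubectl create namespace` would work too):

```shell
# Sketch of the corrected ordering (placeholders throughout); written to a
# file rather than executed, since the commands need a live EKS cluster.
cat > eks-calico-install.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# 1. Remove the AWS VPC CNI before any nodes join the cluster.
kubectl delete daemonset -n kube-system aws-node

# 2. The nodegroup can now be added; pods will wait for Calico CNI.
eksctl create nodegroup --cluster my-calico-cluster --node-type t3.medium

# 3. Create the target namespace as part of the install.
helm install calico projectcalico/tigera-operator \
  --namespace tigera-operator --create-namespace
EOF
chmod +x eks-calico-install.sh
```
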

Product Version(s):

Calico v3.24 upwards.

Issue:

Raised by CU Byron Mansfield in Slack thread: https://calicousers.slack.com/archives/CPTH1KS00/p1680705961183599

Link to docs preview:

SME review:

  • An SME has approved this change.

DOCS review:

  • A member of the docs team has approved this change.

Additional information:

@lwr20 lwr20 requested a review from a team as a code owner April 5, 2023 15:27
@netlify netlify bot commented Apr 5, 2023

Deploy Preview succeeded! Please check the logs for test failures which will cause the production (main) build to fail.

Built without sensitive environment variables

Name Link
🔨 Latest commit c425edf
🔍 Latest deploy log https://app.netlify.com/sites/tigera/deploys/642ec56d3f96c5000805ddd8
😎 Deploy Preview https://deploy-preview-543--tigera.netlify.app

@girishshankaran (Contributor) left a comment


LGTM


```shell
eksctl create nodegroup --cluster my-calico-cluster --node-type t3.medium --max-pods-per-node 100
helm install calico projectcalico/tigera-operator --namespace tigera-operator --version {{releaseTitle}}
```

If this is EKS-specific, and it uses the tigera-operator Helm chart, I believe the values file here might want to set the provider to "EKS". In which case you would need to pass the --set installation.kubernetesProvider="EKS" argument to the helm install command.

@lwr20 (Member, Author) commented Apr 6, 2023


According to https://github.com/tigera/operator/blob/master/pkg/controller/installation/core_controller.go#L499
installation.kubernetesProvider="EKS" causes tigera-operator to:

  • pick AmazonVPC CNI
  • specify default IP pool of "172.16.0.0/16" + VXLAN
  • disable BGP
  • set autodetection to CanReach: "8.8.8.8"

Since the point of this section of the doc is to set up Calico CNI, not AmazonVPC CNI, I think we do NOT want kubernetesProvider="EKS"

Member


It will only pick those settings if no other settings are provided - they are the defaults. Even if you set the provider to empty, the operator will know it is running on EKS and default that field to EKS anyway, and thus default to AmazonVPC CNI.

To use Calico on EKS, you will need to explicitly tell it to use Calico CNI as part of your Installation (via values.yaml, or explicit CLI flags to set particular values).

Member


I think we will want to create a values.yaml file and pass it to helm, with something like this:

```shell
cat > values.yaml <<EOF
installation:
  kubernetesProvider: EKS
  cni:
    type: Calico
  calicoNetwork:
    bgp: Disabled
EOF
```

Potentially also include IP pools, if we need to:

```yaml
  ipPools:
  - cidr: <default cidr>
    encapsulation: VXLAN
```

Member


@ctauchen this particular bit of feedback needs to be addressed before merge. I think it's the only outstanding technical feedback.

I think we need the following as part of this step in order to install correctly:


  1. Create a values.yaml file to configure Calico as your network provider:

```shell
cat > values.yaml <<EOF
installation:
  kubernetesProvider: EKS
  cni:
    type: Calico
  calicoNetwork:
    bgp: Disabled
    ipPools:
    - encapsulation: VXLAN
      cidr: 192.168.0.0/16
EOF
```

  2. Install Calico using Helm:

```shell
helm install calico projectcalico/tigera-operator --namespace tigera-operator --version {{releaseTitle}} -f values.yaml
```

@ctauchen (Collaborator) left a comment


A few comments, which should be carried through globally.

Is there anyone you'd like to nominate for a technical review?

Co-authored-by: Christopher Tauchen <tauchen@gmail.com>
@lwr20 (Member, Author) commented Apr 6, 2023

Is there anyone you'd like to nominate for a technical review?

Perhaps @caseydavenport? I have a recollection of Casey adding the namespace and helm installing into that namespace before.

@caseydavenport (Member)

Per @lwr20's request, there are a few other things we need to do for our Helm docs across the board:

Our generic helm docs here have some content that is just wrong.

If you are installing on a cluster installed by EKS, GKE, AKS or Mirantis Kubernetes Engine (MKE), or you need to customize TLS certificates, you must customize this Helm chart by creating a values.yaml file. Otherwise, you can skip this step.

This is misleading and both too specific and too general. It should be re-written to be more precise and explain that for basic setups, default settings will just work. However there are many, many reasons you might want to customize your settings. I'm not sure why it specifically calls out TLS as a reason there - it's one reason, but the implication by omission is that it's the only reason, which is wrong!

Our helm upgrade docs are also wrong - they omit the step where you need to manually replace the CRDs prior to upgrade.
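A sketch of what that missing step might look like, again written to a script for review rather than executed. The manifest path and version below are assumptions based on the Calico repo layout, not confirmed against the docs; the general shape is "apply CRDs manually, then run the chart upgrade", since Helm does not update CRDs on upgrade:

```shell
# Sketch only: URL and version are placeholders, not verified values.
cat > upgrade-calico.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
CALICO_VERSION="v3.26.0"   # placeholder

# Replace the CRDs manually first (Helm skips CRDs on upgrade)...
kubectl apply --server-side --force-conflicts \
  -f "https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/operator-crds.yaml"

# ...then run the chart upgrade.
helm upgrade calico projectcalico/tigera-operator \
  --namespace tigera-operator -f values.yaml
EOF
```
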

We can handle those in this PR, or in separate follow-ons depending on how you want to do it. But it would be great to attack these while we're thinking about helm.

@bmckercher123 (Contributor) commented Jul 26, 2023

@byronmansfield Looks like the code change that this docs PR is for still hasn't been merged yet - it's failing tests. We had someone restart it. If you can rebase, we can get this merged.

@bmckercher123 bmckercher123 added the community-contributor Author is from outside of Tigera label Jul 26, 2023
@ctauchen (Collaborator)

@lwr20 I'm trying to clear out stale PRs. This one seems both stale and necessary. I get the impression that the changes are sound, but need to be updated to later versions. Casey's comments may have slowed folks down, but the general Helm changes can be considered separately.

Have I got the right idea here? I'm happy to put up a new PR with these changes in the later versions. And put something in the backlog for the other Helm business.

@@ -70,12 +70,18 @@ Before you get started, make sure you have downloaded and configured the [necess

```shell
eksctl create cluster --name my-calico-cluster --without-nodegroup
```

1. Since this cluster will use {{prodname}} for networking, you must delete the `aws-node` daemon set to disable AWS VPC networking for pods.
Member


I don't understand why the new wording is more desirable - the former was more abstractly correct, whereas the latter is one particular reason among several for why we don't want the aws-node daemonset.

Member


Maybe "to ensure that nodes are not configured with AWS VPC networking for pods" would be better than "to disable"?

Labels
community-contributor Author is from outside of Tigera Needs docs review

6 participants