Inconsistent terminationGracePeriodSeconds set in different versions of calico-node daemonset #8691

Open
BenjaminHuang opened this issue Apr 3, 2024 · 5 comments

Comments

@BenjaminHuang

BenjaminHuang commented Apr 3, 2024

The calico-node daemonset has terminationGracePeriodSeconds set.

In the manifest version, it's coded as 0:
terminationGracePeriodSeconds: 0

But in the version generated by tigera-operator, it's coded as 5:
terminationGracePeriodSeconds: 5

However, both versions have a preStop hook specified:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/calico-node
              - -shutdown
  • If terminationGracePeriodSeconds is set to 0,
    the preStop hook never gets a chance to run, which minimizes the impact of deleting calico-node.

  • If terminationGracePeriodSeconds is set to a non-zero value,
    the preStop hook will cause the NetworkUnavailable status to be set:

  conditions:
  - lastHeartbeatTime: "2024-04-03T08:18:42Z"
    lastTransitionTime: "2024-04-03T08:18:42Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable

which eventually causes kube-controller-manager to add a no-schedule taint:

  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    timeAdded: "2024-04-03T07:17:46Z"
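
For reference, the condition and taint above can be read directly with something like the following (the node name is a placeholder):

  kubectl get node <node> -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")]}'
  kubectl get node <node> -o jsonpath='{.spec.taints}'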

However, I'm not sure which is the desired behavior.

Expected Behavior

terminationGracePeriodSeconds should be consistent in the calico-node daemonset, both in the manifest and in the tigera-operator-generated version.

Current Behavior

terminationGracePeriodSeconds is inconsistent in the calico-node daemonset between the manifest and the tigera-operator-generated version.

Possible Solution

Set terminationGracePeriodSeconds to 0 in both versions of the calico-node daemonset.

Steps to Reproduce (for bugs)

  1. Go to the installation guide, e.g. https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
  2. Choose an installation method, manifest or operator, in a k8s cluster (optionally using kind)
  3. Follow the instructions to complete the installation
  4. Try deleting a calico-node instance and inspect the corresponding node status/taints during deletion (see the sketch after this list)
  5. Compare the behavior between the two installations, focusing on the difference in terminationGracePeriodSeconds
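
A minimal sketch of steps 4–5, assuming the default namespaces (kube-system for the manifest install, calico-system for the operator install) and a placeholder <node> name:

  # compare the value set by each installation method
  kubectl -n kube-system get ds calico-node -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
  kubectl -n calico-system get ds calico-node -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'

  # delete the calico-node pod on one node, then inspect that node's
  # NetworkUnavailable condition and taints as shown above
  kubectl -n calico-system delete pod -l k8s-app=calico-node --field-selector spec.nodeName=<node>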

Context

I want a Calico installation from the manifest and one from the tigera-operator to behave the same way during calico-node deletion.

Your Environment

  • Calico version 3.20.6
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.20.7
  • Operating System and version: Ubuntu 20.04
  • Link to your project (optional):
@caseydavenport
Member

I suspect the manifest value needs to be increased to match what the operator is setting, and to enable the preStop hook to run.

@cyclinder
Contributor

If terminationGracePeriodSeconds is set to a non-zero value,
the preStop hook will cause the NetworkUnavailable status to be set:

Does this look like it's expected? If so, we need to adjust the manifest value to 5.

@BenjaminHuang
Author

BenjaminHuang commented Apr 10, 2024

If terminationGracePeriodSeconds is set to a non-zero value,
the preStop hook will cause the NetworkUnavailable status to be set:

Does this look like it's expected? If so, we need to adjust the manifest value to 5.

That depends on your situation:

  • If all of your pods are on the overlay network and you don't want new pods scheduled onto the node while calico-node crashes/restarts, a positive value is better.
  • If all of your pods are on the host network and you just want to delete calico-node from the cluster, leaving it at zero is fine.

@BenjaminHuang
Author

BenjaminHuang commented Apr 10, 2024

I suspect the manifest value needs to be increased to match what the operator is setting, and to enable the preStop hook to run.

I'm not sure whether setting both to a positive value would be better.

I guess adding a comment above this parameter, describing the impact on node status, would be good enough.

Without that, it's hard to know what will happen when changing it; you have to dig the details out of the source code.

@caseydavenport
Member

If all of your pods are on the host network and you just want to delete calico-node from the cluster, leaving it at zero is fine.

Agreed, although I would classify this as an exceptional case and far from the expected scenario in 90% of Kubernetes clusters using Calico.

I think we should:

  • Adjust the manifest to use the same value as the operator.
  • Add a comment explaining why the value is set that way, to aid anyone who might want to adjust it (a rough sketch is below).
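
For illustration, the manifest entry could look something like this; the exact wording of the comment (and whether 5 remains the right number) is just a sketch:

  # Give calico-node time to run its preStop hook (/bin/calico-node -shutdown) before
  # the container is killed. Note that with a non-zero value the node may briefly be
  # reported NetworkUnavailable (and tainted) while calico-node shuts down.
  terminationGracePeriodSeconds: 5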
