
upgrade-health-check Job fails on a single control plane node cluster after drain #3050

Closed
ilia1243 opened this issue Apr 23, 2024 · 12 comments
Assignees
Labels
area/upgrades · kind/bug · kind/regression · priority/important-soon
Milestone

Comments

@ilia1243

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.30.0

Environment:

  • Kubernetes version (use kubectl version): v1.30.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.1 LTS
  • Kernel (e.g. uname -a): 5.15.0-50-generic
  • Container runtime (CRI) (e.g. containerd, cri-o): containerd=1.6.12-0ubuntu1~22.04.3
  • Container networking plugin (CNI) (e.g. Calico, Cilium): calico
  • Others:

What happened?

  1. Install a single control plane node cluster v1.29.1
  2. Drain the only node
  3. kubeadm upgrade apply v1.30.0 fails with
[ERROR CreateJob]: Job "upgrade-health-check-lvr8s" in the namespace "kube-system" did not complete in 15s: client rate limiter Wait returned an error: context deadline exceeded

It seems that previously the pod was Pending as well, but this was ignored, because the job was successfully deleted in the defer and the return value was overridden with nil.

defer func() {
    lastError = deleteHealthCheckJob(client, ns, jobName)
}()

https://github.com/kubernetes/kubernetes/blob/v1.29.1/cmd/kubeadm/app/phases/upgrade/health.go#L151
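
For illustration, a minimal standalone Go sketch (the function names are simplified stand-ins, not the actual kubeadm code) of how a deferred assignment to a named return value masks the earlier error:

package main

import (
	"errors"
	"fmt"
)

// createHealthCheckJob stands in for the real check: on a drained
// single-node cluster the job pod stays Pending, so an error is returned.
func createHealthCheckJob() error {
	return errors.New("job pod is Pending")
}

// deleteHealthCheckJob stands in for the deferred cleanup, which succeeds.
func deleteHealthCheckJob() error {
	return nil
}

// runHealthCheck mirrors the v1.29 pattern: the deferred assignment to the
// named return value lastError unconditionally overwrites the Pending error
// with the (nil) result of the cleanup.
func runHealthCheck() (lastError error) {
	defer func() {
		lastError = deleteHealthCheckJob()
	}()
	return createHealthCheckJob()
}

func main() {
	fmt.Println(runHealthCheck()) // prints "<nil>": the original error is masked
}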

Similar issue #2035

What you expected to happen?

There might be no need to create the job.

How to reproduce it (as minimally and precisely as possible)?

See What happened?

@ilia1243 ilia1243 changed the title upgrade-health-check Job fails on a single control plane node cluster with drain upgrade-health-check Job fails on a single control plane node cluster after drain Apr 23, 2024
@neolit123 neolit123 added the kind/bug and priority/important-soon labels Apr 23, 2024
@neolit123 neolit123 added this to the v1.31 milestone Apr 23, 2024
@neolit123
Member

thanks for testing @ilia1243

this is a tricky problem. but either way, there is a regression in 1.30 that we need to fix.

the only problem here seems to be with the CreateJob logic.

k drain node
sudo kubeadm upgrade apply -f v1.30.0 --ignore-preflight-errors=CreateJob

^ this completes the upgrade of a single node CP and addons are applied correctly.
but the CreateJob check will always fail.

one option is to skip this check if there is a single CP node in the cluster.
WDYT?

cc @SataQiu @pacoxu @carlory

@neolit123
Member

neolit123 commented Apr 23, 2024

one option is to skip this check if there is a single CP node in the cluster.

another option (for me less preferred) is to make the CreateJob health check return a warning instead of an error.
it will always show a warning on a single node CP cluster.

@pacoxu
Member

pacoxu commented Apr 23, 2024

+1 for skip

I need to check it again. I think I failed for another reason: I did not install a CNI on the control plane, and the pod failed because there was no CNI (not sure if this is a general use case; in that case the job should be run with hostNetwork). I will do some tests tomorrow.
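
For reference, opting the job's pod into host networking would amount to setting the corev1 PodSpec field below; the helper name is made up for illustration, and whether kubeadm should do this is left open above.

package sketch

import corev1 "k8s.io/api/core/v1"

// useHostNetwork sketches the aside above: if the health check pod should not
// depend on a CNI being installed yet, its pod spec could run on the host network.
func useHostNetwork(spec *corev1.PodSpec) {
	spec.HostNetwork = true
}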

@SataQiu
Member

SataQiu commented Apr 24, 2024

My suggestion is to print a warning and skip Job creation when there are no nodes to schedule. WDYT?

@neolit123
Member

My suggestion is to print a warning and skip Job creation when there are no nodes to schedule. WDYT?

IIUC, the only way to test whether a job pod can be scheduled somewhere is to try to create such a job? the problem is that this preflight check's purpose is exactly that - to check whether the cluster accepts workloads.

i don't even remember why we added it, but now we need to fix it right away. perhaps later we can discuss removing it.

@neolit123
Member


we could look at Unschedulable taints on nodes, which means they were cordoned.

but listing all nodes on every kubeadm upgrade command would be very expensive in large clusters with many nodes.

so i am starting to think we should just convert this check to a preflight warning.
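
A rough client-go sketch of the "look at Unschedulable" idea (the function is illustrative, not what was eventually merged), which shows why it needs a full node List:

package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countCordonedNodes lists all nodes and counts the cordoned ones. As noted
// above, a full List on every kubeadm upgrade command is expensive in large
// clusters, which is why this approach was not preferred.
func countCordonedNodes(ctx context.Context, client kubernetes.Interface) (int, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	cordoned := 0
	for _, node := range nodes.Items {
		// kubectl cordon/drain sets spec.unschedulable, which also puts the
		// node.kubernetes.io/unschedulable:NoSchedule taint on the node.
		if node.Spec.Unschedulable {
			cordoned++
		}
	}
	return cordoned, nil
}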

@neolit123 neolit123 modified the milestones: v1.31, v1.30 Apr 24, 2024
@neolit123 neolit123 added the kind/regression label Apr 24, 2024
@neolit123 neolit123 assigned neolit123 and unassigned carlory Apr 24, 2024
@carlory
Member

carlory commented Apr 24, 2024

I'm not sure whether this is the expected patch for this issue?

i.e. add a new toleration to the job, like the following (see the sketch below)

{key=node.kubernetes.io/unschedulable, effect:NoSchedule} 

FYI:

Or just convert this check to a preflight warning?
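
For concreteness, a sketch of the toleration suggestion above, using the batchv1/corev1 types (the helper name is made up; this is not the patch that was merged):

package sketch

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// tolerateUnschedulable adds a toleration for the taint that cordoning puts on
// a node, so the health check pod could still be scheduled on a drained node.
// Whether that is desirable is exactly the open question in this thread.
func tolerateUnschedulable(job *batchv1.Job) {
	job.Spec.Template.Spec.Tolerations = append(job.Spec.Template.Spec.Tolerations,
		corev1.Toleration{
			Key:      "node.kubernetes.io/unschedulable",
			Operator: corev1.TolerationOpExists,
			Effect:   corev1.TaintEffectNoSchedule,
		})
}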

@neolit123
Member

Or just convert this check to a preflight warning?

i have a WIP PR for this.

I'm not sure whether this is the expected patch for this issue?

i don't know... ideally a node should be drained before upgrading kubelet.
so if we allow pods to schedule after the node is drained with the {key=node.kubernetes.io/unschedulable, effect:NoSchedule} hack, we are breaking this rule. i don't even know if it will work.

we do upgrade coredns and kube-proxy for a single node cluster while the node is drained with kubeadm upgrade apply, but we ignore daemon sets anyway, and the coredns pods will remain Pending if the node is not schedulable. so technically, for the addons, we don't schedule new pods IIUC.

@neolit123
Member

i have a WIP PR for this.

please see kubernetes/kubernetes#124503
and my comments there.

@neolit123
Member

@carlory came up with a good idea how to catch the scenario.
kubernetes/kubernetes#124503 (comment)
the PR is updated.

more reviews are appreciated.

@neolit123
Member

fix will be added to 1.30.1
kubernetes/kubernetes#124570

@neolit123
Member

1.30.1 is out with the fix
