Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix panic in linuxNodeHandler.NodeDelete #25851

Merged
merged 1 commit into from
Jun 21, 2023

Conversation

cyclinder
Copy link
Contributor

@cyclinder cyclinder commented Jun 2, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #25543

Fix crash of cilium-agent happening when a remote node without node IP addresses is removed.

@cyclinder cyclinder requested review from a team as code owners June 2, 2023 03:45
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 2, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jun 2, 2023
@pchaigno
Copy link
Member

pchaigno commented Jun 2, 2023

Can you provide details on how this causes a panic? Do you have a stacktrace for the panic and could you include it in the commit description?

@bimmlerd
Copy link
Member

bimmlerd commented Jun 2, 2023

Can you provide details on how this causes a panic? Do you have a stacktrace for the panic and could you include it in the commit description?

See #25543 for details on the panic. The commit msg could use a link too, though :)

@pchaigno
Copy link
Member

pchaigno commented Jun 2, 2023

Can you provide details on how this causes a panic? Do you have a stacktrace for the panic and could you include it in the commit description?

See #25543 for details on the panic. The commit msg could use a link too, though :)

At least for the datapath, we try to have the commit descriptions as self contained as possible, so I'd rather have the full stack trace in the description than a link.

@cyclinder
Copy link
Contributor Author

hi @pchaigno, I have updated the commit message, Please tell me if you feel this is good.

@pchaigno
Copy link
Member

pchaigno commented Jun 4, 2023

hi @pchaigno, I have updated the commit message, Please tell me if you feel this is good.

I think the formatting is broken as there are no newlines. You will also need to squash your two commits in one.

@cyclinder cyclinder requested a review from a team as a code owner June 4, 2023 11:34
@maintainer-s-little-helper
Copy link

Commit 37604abb942d388f92f8583ec20d4f7dd96073f7 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

1 similar comment
@maintainer-s-little-helper
Copy link

Commit 37604abb942d388f92f8583ec20d4f7dd96073f7 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 4, 2023
@cyclinder
Copy link
Contributor Author

It seems git rebase was broken, Now cilium use main or master? 🤔

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 4, 2023
@pchaigno
Copy link
Member

pchaigno commented Jun 4, 2023

Cilium uses main and your commit now looks good to me.

Do you know under what conditions the panic happens? I'd like to write a release note and that requires understanding why our CI didn't catch this. Is there something special about your deployment? Why don't nodes have IP addresses? 🤔

@cyclinder
Copy link
Contributor Author

sorry, I can't explain the root cause, but according to #25543 description, this happens in the case of node rebuilding (deleted and created), and the node loses its IP, maybe ask @egsysoev for more details if needed.

@egsysoev
Copy link

egsysoev commented Jun 5, 2023

I am the reporter of #25543. I can't reproduce the issue, but it happened two times during the abnormal shutdown of hypervisor:

  1. esxi hypervisor failed and rebooted.
  2. VMs with attached GPU device could not be migrated to another hypervisor.
  3. Cluster API operator deleted the failed VMs: drain, delete node from k8s, remove VM from esxi.
  4. Cluster API operator created new VMs.

@pchaigno
Copy link
Member

pchaigno commented Jun 5, 2023

Thanks for the additional context @egsysoev!

Cluster API operator deleted the failed VMs: drain, delete node from k8s, remove VM from esxi.

So the VM wasn't fully provisioned yet when it was deleted? If so, that could explain how the deleted node doesn't have IP addresses.

@cyclinder
Copy link
Contributor Author

@pchaigno Hi, It's time to merge this PR?

@pchaigno pchaigno added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 12, 2023
@pchaigno
Copy link
Member

@cyclinder Note that every time you push to the branch, we will need to restart the tests. So I would only push if there are conflicts or if we ask you to because some test is failing. Not sure if that was the case here, but I thought I'd notify you 🙂

@cyclinder
Copy link
Contributor Author

I see, thanks for the clarification @pchaigno

@ti-mo
Copy link
Contributor

ti-mo commented Jun 20, 2023

@cyclinder Sorry to pull you in the other direction; would you be able to rebase to pick up the recent changes to CI? (Jenkins is no longer)

@ti-mo ti-mo added kind/bug This is a bug in the Cilium logic. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Jun 20, 2023
Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)

github.com/cilium/cilium/pkg/datapath/linux.(*linuxNodeHandler).deallocateIDForNode(0xc003e871f0?, 0xc003e87108?)
	/go/src/github.com/cilium/cilium/pkg/datapath/linux/node_ids.go:110 +0x335

Signed-off-by: qifeng guo <qifeng.guo@daocloud.io>
@cyclinder
Copy link
Contributor Author

@ti-mo Sure, Updated.

@ti-mo
Copy link
Contributor

ti-mo commented Jun 20, 2023

/test

@gandro
Copy link
Member

gandro commented Jun 21, 2023

Two ci-gingo jobs failed with ssh: connect to host localhost port 2222: Connection refused - restarting affected jobs

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 21, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.5 Jun 21, 2023
@borkmann borkmann merged commit 104dafd into cilium:main Jun 21, 2023
61 of 62 checks passed
@nbusseneau nbusseneau mentioned this pull request Jun 22, 2023
19 tasks
@nbusseneau nbusseneau added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 22, 2023
@tklauser tklauser added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jun 29, 2023
@gentoo-root gentoo-root moved this from Needs backport from main to Backport done to v1.13 in 1.13.5 Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.13.5
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

Panic in linuxNodeHandler.NodeDelete