The NotifReceive function is blocked and the notifHandler goroutine cannot exit. #30

Open
neblen opened this issue Jan 11, 2023 · 4 comments

Comments

@neblen

neblen commented Jan 11, 2023

Description

The NotifReceive function is blocked and the notifHandler goroutine cannot exit.
When a container spawns a new process, the seccomp agent starts a notifHandler goroutine to monitor that process's abnormal syscalls. After the process dies, the notifHandler goroutine is still there, blocked in the NotifReceive function.

Impact

A large number of useless notifHandler goroutines accumulate in the seccomp agent container.

Environment and steps to reproduce

k8s version v1.21.4+rke2r2
linux system Ubuntu 20.04.1 LTS
kernel version 5.15.0-50-generic

  1. Set-up: [ describe the environment Flatcar/Lokomotive/Nebraska etc was running in when encountering the bug; Platform etc. ]
  2. Task: [ describe the task performing when encountering the bug ]
  3. Action(s): [ sequence of actions that triggered the bug, see example below ]
    a. [ requested the start of a new pod or container ]
    b. [ container image downloaded ]
  4. Error: [describe the error that was triggered]

Expected behavior

[ describe what you expected to happen at 4. above but instead got an error ]

Additional information

Please add any information here that does not fit the above format.

@alban
Member

alban commented Jan 11, 2023

x-ref seccomp/libseccomp-golang#104

@rata
Member

rata commented Jan 11, 2023

Thanks for the report!

Given that libseccomp-golang bug, and that the proper fix might be in the kernel, there might be some workarounds we can do until that fix is written, merged, and backported. For example, a switch statement with a case that executes this blocking call and a default case that sleeps and then checks somehow whether the process is still running (exit if the process is not running anymore; otherwise loop and try to receive a notification again). Maybe something like this, or some other workaround, can be used in the seccomp agent in the meantime.

To know whether a process is still valid, without being fooled by PID recycling, we could use the pidfd of the process. But I'm not sure we can get that without a race, so I'm not sure we can use it...

And there doesn't seem to be any way to check if the seccomp fd is still valid either, so... yeah, maybe we can't work around this? It seems weird, I guess LXC/LXD handles this in some way, so maybe we can have a look at what they do to see if there is any way to detect this?

@neblen do you want to experiment with this and see whether we have any options to work around this issue?
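
The loop described in the first paragraph of this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `notif`, `receiveOrGiveUp`, `errTargetGone`, and the `receive` and `alive` callbacks are hypothetical names, not part of the seccomp agent or libseccomp-golang. `receive` stands in for the blocking NotifReceive call, and `alive` stands in for whatever liveness check turns out to be race-free (the open question raised above).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// notif is a hypothetical stand-in for a seccomp notification.
type notif struct{ ID uint64 }

// errTargetGone is returned when the liveness check reports the process is gone.
var errTargetGone = errors.New("monitored process exited")

// receiveOrGiveUp runs the blocking receive in a helper goroutine and, while
// waiting for it, calls alive() once per interval. It returns the received
// notification, or errTargetGone as soon as alive() reports false.
func receiveOrGiveUp(receive func() (notif, error), alive func() bool, interval time.Duration) (notif, error) {
	type result struct {
		n   notif
		err error
	}
	// Buffered so the helper can always deliver its result and finish,
	// even if nobody is listening anymore.
	done := make(chan result, 1)
	go func() {
		n, err := receive()
		done <- result{n, err}
	}()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case r := <-done:
			return r.n, r.err
		case <-ticker.C:
			if !alive() {
				return notif{}, errTargetGone
			}
		}
	}
}

// demoGone simulates a receive that never returns while the target is dead.
func demoGone() error {
	block := make(chan struct{})
	defer close(block) // releases the simulated blocking call afterwards
	_, err := receiveOrGiveUp(
		func() (notif, error) { <-block; return notif{}, nil },
		func() bool { return false },
		time.Millisecond,
	)
	return err
}

// demoReceived simulates a receive that returns a notification immediately.
func demoReceived() (notif, error) {
	return receiveOrGiveUp(
		func() (notif, error) { return notif{ID: 7}, nil },
		func() bool { return true },
		time.Hour,
	)
}

func main() {
	fmt.Println("dead target:", demoGone())
	n, err := demoReceived()
	fmt.Println("live target:", n.ID, err)
}
```

One limitation of this shape: the helper goroutine stays blocked in `receive` until that call returns, so the handler can exit but the blocked call itself is not released unless something else interrupts it.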

@neblen
Author

neblen commented Jan 11, 2023

Hi~ @rata
Yes, I can do some experiments to verify whether the notifHandler goroutine can be terminated.
I have an idea: after the process monitored by a notifHandler goroutine exits, that goroutine is blocked in the NotifReceive function. At that point, another goroutine could write a message to the notifHandler goroutine's SeccompFd to wake up NotifReceive, and the notifHandler goroutine could then exit by itself.

But I am not yet sure how to successfully write data to the SeccompFd from another goroutine.

@rata
Member

rata commented Jan 11, 2023

Talking with alban, he remembered you get a POLLHUP event on the fd: torvalds/linux@99cdb8b

Userspace is currently not polling on this, IIRC, but that can be a solution in the meantime. With that option, though, I'm not sure it is worth writing a patch to improve things for users who only call blocking functions.
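
A minimal, Linux-only sketch of what polling for the hang-up could look like, using only the standard `syscall` package. The names `waitForHangup` and `demoPipeHangup` are hypothetical, and a pipe whose write end has been closed stands in for the seccomp notification fd so the example runs without a seccomp filter. Whether the agent's fd can be polled this way is an assumption based on the commit linked above.

```go
package main

import (
	"fmt"
	"syscall"
)

// waitForHangup blocks until fd is readable or the kernel reports a hang-up.
// It returns (readable, hup, err). EPOLLHUP is always reported by epoll, so
// only EPOLLIN needs to be requested explicitly.
func waitForHangup(fd int) (bool, bool, error) {
	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(epfd)

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return false, false, err
	}

	events := make([]syscall.EpollEvent, 1)
	for {
		n, err := syscall.EpollWait(epfd, events, -1)
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil {
			return false, false, err
		}
		if n == 0 {
			continue
		}
		got := events[0].Events
		return got&syscall.EPOLLIN != 0, got&syscall.EPOLLHUP != 0, nil
	}
}

// demoPipeHangup closes the write end of a pipe and reports whether polling
// the read end observes a hang-up. The pipe is only a stand-in for the
// seccomp fd.
func demoPipeHangup() (bool, error) {
	var p [2]int
	if err := syscall.Pipe2(p[:], syscall.O_CLOEXEC); err != nil {
		return false, err
	}
	defer syscall.Close(p[0])
	if err := syscall.Close(p[1]); err != nil {
		return false, err
	}
	_, hup, err := waitForHangup(p[0])
	return hup, err
}

func main() {
	hup, err := demoPipeHangup()
	fmt.Println("hang-up observed:", hup, "err:", err)
}
```

In a handler loop, a check like this would run before each blocking receive: return when a hang-up is reported, and call the receive function only when the fd is readable.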
