Fix the error of runc doesn't work with go1.22 #4193

Draft: wants to merge 4 commits into base: main

lifubang
Member

@lifubang lifubang commented Feb 7, 2024

As described in #4233, there is a bug in glibc: pthread_self() returns
wrong information after we call clone(CLONE_PARENT) in libct/nsenter,
which prevents runc from working with Go 1.22.*. This PR therefore
replaces clone(2) with fork(2) in libct/nsenter. Because nsenter does a
double fork, we also need PR_SET_CHILD_SUBREAPER so that runc can reap
the grandchild process created in libct/nsenter.

Fix #4233
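
To illustrate the double-fork problem described above, here is a minimal sketch, not runc code, of how PR_SET_CHILD_SUBREAPER lets the original process reap a grandchild after the intermediate process has exited. The helper name reap_grandchild and the pipe used to pass the grandchild's pid are assumptions made for this example only; it assumes Linux 3.4 or later.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of runc): double-fork, let the intermediate
 * child exit, then reap the orphaned grandchild from the original process.
 * Returns the grandchild's exit status, or -1 on error.
 */
static int reap_grandchild(int grandchild_status)
{
	int pipefd[2];
	int status = 0;
	pid_t child, grandchild = -1;
	ssize_t n;

	assert(grandchild_status >= 0 && grandchild_status <= 255);

	/* From now on, orphaned descendants are reparented to this process. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
		return -1;
	if (pipe(pipefd) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (child == 0) {
		/* Intermediate process: fork the grandchild, report its pid, exit. */
		close(pipefd[0]);
		grandchild = fork();
		if (grandchild < 0)
			_exit(1);
		if (grandchild == 0)
			_exit(grandchild_status);
		if (write(pipefd[1], &grandchild, sizeof(grandchild)) != (ssize_t)sizeof(grandchild))
			_exit(1);
		_exit(0);
	}

	close(pipefd[1]);
	n = read(pipefd[0], &grandchild, sizeof(grandchild));
	close(pipefd[0]);

	/* Reap the intermediate child; its exit hands the grandchild over to us. */
	if (waitpid(child, &status, 0) != child || n != (ssize_t)sizeof(grandchild))
		return -1;

	/* Without the subreaper flag this waitpid() would typically fail with ECHILD. */
	if (waitpid(grandchild, &status, 0) != grandchild)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the sketch leaves the subreaper flag set on the calling process, so any other orphaned descendants would also have to be reaped by it.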

@lifubang
Member Author

lifubang commented Feb 7, 2024

Error output with Go 1.22.0:

DEBU[0000]libcontainer/dmz/cloned_binary_linux.go:202 libcontainer/dmz.IsCloned() F_GET_SEALS on /proc/self/exe failed: invalid argument 
DEBU[0000]libcontainer/dmz/cloned_binary_linux.go:177 libcontainer/dmz.CloneBinary() cloning runc-dmz binary (8736 bytes)         
DEBU[0000]libcontainer/container_linux.go:537 libcontainer.(*Container).newParentProcess() runc-dmz: using runc-dmz                     
DEBU[0000] nsexec[42599]: => nsexec container setup     
DEBU[0000] nsexec-0[42599]: ~> nsexec stage-0           
DEBU[0000] nsexec-0[42599]: spawn stage-1               
DEBU[0000] nsexec-0[42599]: -> stage-1 synchronisation loop 
DEBU[0000] nsexec-1[42600]: ~> nsexec stage-1           
DEBU[0000] nsexec-1[42600]: unshare remaining namespaces 
DEBU[0000] nsexec-1[42600]: spawn stage-2               
DEBU[0000] nsexec-1[42600]: request stage-0 to forward stage-2 pid (42601) 
DEBU[0000] nsexec-0[42599]: stage-1 requested pid to be forwarded 
DEBU[0000] nsexec-0[42599]: forward stage-1 (42600) and stage-2 (42601) pids to runc 
DEBU[0000] nsexec-2[1]: ~> nsexec stage-2               
DEBU[0000] nsexec-1[42600]: signal completion to stage-0 
DEBU[0000] nsexec-1[42600]: <~ nsexec stage-1           
DEBU[0000]libcontainer/process_linux.go:457 libcontainer.(*initProcess).goCreateMountSources.func1() mount source thread: successfully running in container mntns 
DEBU[0000] nsexec-0[42599]: stage-1 complete            
DEBU[0000] nsexec-0[42599]: <- stage-1 synchronisation loop 
DEBU[0000] nsexec-0[42599]: -> stage-2 synchronisation loop 
DEBU[0000] nsexec-0[42599]: signalling stage-2 to run   
DEBU[0000] nsexec-2[1]: signal completion to stage-0    
DEBU[0000] nsexec-2[1]: <= nsexec container setup       
DEBU[0000] nsexec-2[1]: booting up go runtime ...       
DEBU[0000] nsexec-0[42599]: stage-2 complete            
DEBU[0000] nsexec-0[42599]: <- stage-2 synchronisation loop 
DEBU[0000] nsexec-0[42599]: <~ nsexec stage-0           
DEBU[0000]libcontainer/sync.go:127 libcontainer.doReadSync() reading sync                                 
DEBU[0000] sync pipe closed                             
DEBU[0000] mount source thread: closing thread: context canceled 
ERRO[0000] runc run failed: unable to start container process: error during container init: procReady not received

@kolyshkin
Contributor

Now that Go 1.22.0 has been released, 1.20.x is no longer listed in https://go.dev/dl/?mode=json.

This can be fixed by adding &include=all (i.e. use https://go.dev/dl/?mode=json&include=all). I'll open a PR.

@kolyshkin
Contributor

Interestingly, both runc 1.1.12 and runc from git HEAD built with go1.22.0 work fine on my machine (all tests are passing).

@kolyshkin
Contributor

We also need to fix this for Go 1.22:

# (in test file tests/integration/spec.bats, line 37)
#   `GO111MODULE=auto go get github.com/xeipuuv/gojsonschema' failed
# runc spec (status=0):
#
# Cloning into 'runtime-spec'...
# HEAD is now at 4fec88f merge #1219 into main
# go: go.mod file not found in current directory or any parent directory.
# 	'go get' is no longer supported outside a module.
# 	To build and install a command, use 'go install' with a version,
# 	like 'go install example.com/cmd@latest'
# 	For more information, see https://golang.org/doc/go-get-install-deprecation
# 	or run 'go help get' or 'go help install'.

I don't remember why I haven't switched to go install; I guess it's not as easy as it seems.

@lifubang
Member Author

lifubang commented Feb 9, 2024

> Interestingly, both runc 1.1.12 and runc from git HEAD built with go1.22.0 work fine on my machine (all tests are passing).

It seems that cgo may be broken with clone(2) in go1.22.0?
golang/go#65625
PTAL

@kolyshkin
Contributor

> > Interestingly, both runc 1.1.12 and runc from git HEAD built with go1.22.0 work fine on my machine (all tests are passing).
>
> It seems that cgo may be broken with clone(2) in go1.22.0? golang/go#65625 PTAL

Again, I can't repro locally.

[kir@kir-tp1 cgoclone2]$ go version
go version go1.21.6 linux/amd64
[kir@kir-tp1 cgoclone2]$ go run main.go 
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main!
[kir@kir-tp1 cgoclone2]$ go1.22.0 version
go version go1.22.0 linux/amd64
[kir@kir-tp1 cgoclone2]$ go1.22.0 run main.go 
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main!

Maybe it's your kernel version @lifubang? Can you show uname -a?

@kolyshkin
Contributor

@lifubang also if you can repro that (alas I can not), you can git bisect golang between 1.21.0 and 1.22.0.

@kolyshkin
Contributor

Note in CI it happens with Ubuntu 20.04 but not Ubuntu 22.04. Will try to repro in a VM.

@kolyshkin
Contributor

On Ubuntu 20.04, when running the binary compiled with go 1.22, I am seeing a SIGSEGV:

--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xf8} ---

Can't yet figure out what's going on there; will continue tomorrow.

@kolyshkin
Contributor

@lifubang I did a bisect, here are the results: golang/go#65625 (comment)

Will continue tomorrow.

@kolyshkin
Contributor

> It seems that cgo may be broken with clone(2) in go1.22.0?
> golang/go#65625

So, to summarize the investigation done there -- it's a glibc bug, in fact, two bugs:

  1. pthread_self() returns wrong info after we do what we do in libct/nsenter
  2. pthread_getattr_np(pthread_self(), &attr) (which Go 1.22 calls internally) does a NULL pointer dereference, so the app gets SIGSEGV.

These two bugs are apparently specific to glibc used by Ubuntu 20.04 (libc6 2.31-0ubuntu9.14) and maybe also Debian 10 (libc6 2.28-10+deb10u2), as I was able to reproduce on both. With Debian 10, it even prints error from free: free(): invalid pointer, maybe due to some extra Debian-specific patches, but still gets SIGABRT.

For some reason I was unable to repro on older Fedora (F32 with glibc-2.31-2.fc32, F33 with glibc-2.32-10.fc33) and Debian 11 (libc6 2.31-7).

The bad news is, every version of glibc has bug 1 above, and https://go-review.googlesource.com/c/go/+/563379 may make it so that Go 1.22.x fails runc init on every version of glibc.

Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that pthread_self works.

stgraber added a commit to zabbly/incus that referenced this pull request Feb 16, 2024
Go 1.22 currently causes crashes on older Debian/Ubuntu systems.

lxc/incus#497
golang/go#65625
opencontainers/runc#4193

Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
@AkihiroSuda
Member

> Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that pthread_self works.

👍

@kolyshkin
Contributor

Rebasing this to re-run with Go 1.22.1

@kolyshkin kolyshkin force-pushed the feat-go-1.21-1.22 branch 2 times, most recently from c54384f to 5907889 Compare March 28, 2024 01:16
@kolyshkin
Contributor

Sorry @lifubang, I've hijacked your PR: I needed to run it with Go 1.22.1, and I added the missing changes to go.sum to fix the failing CI (https://github.com/opencontainers/runc/actions/runs/8460901105/job/23179867537).

@kolyshkin
Contributor

OK, Go 1.22.1 makes no difference. I guess we have to disable Go 1.22 for now.

@cyphar
Member

cyphar commented Apr 5, 2024

> But maybe this will cause the issues like runc-dmz if we use execve in stage-2?

We still have full capabilities at the beginning of stage-2 (both with and without user namespaces) and haven't applied any LSM labels or anything like that. I wouldn't expect there to be any issues.

@lifubang
Member Author

Do you think moving the C code in stage-0 and stage-2 to Go could fix this issue? I don't know whether setjmp and longjmp can work in Go, or whether they can be used together with C.
Is it worth trying?

@lifubang
Member Author

lifubang commented Apr 11, 2024

> or whether they can be used together with C.

I think probably not: after clone(2), execution is already in a goroutine, so it can't longjmp to C.

So, it seems that there is no other way to fix this issue?

@cyphar
Member

cyphar commented Apr 13, 2024

This patch also works, while still allowing us to use CLONE_PARENT. Yes, I'm sure we agree it's not lovely, but IMHO using fork() is depending on glibc internals just as much as this is. If glibc stops using CLONE_CHILD_CLEARTID then fork() will also stop working. The only downside of this approach is that it only works with CONFIG_CHECKPOINT_RESTORE=y but I suspect most people running with containers have that enabled.

diff --git a/libcontainer/nsenter/nsexec.c b/libcontainer/nsenter/nsexec.c
index c771ac7e1165..319899bd9b71 100644
--- a/libcontainer/nsenter/nsexec.c
+++ b/libcontainer/nsenter/nsexec.c
@@ -15,6 +15,7 @@
 #include <stdbool.h>
 #include <string.h>
 #include <unistd.h>
+#include <pthread.h> /* _only_ used for pthread_self() in debug log */
 
 #include <sys/ioctl.h>
 #include <sys/prctl.h>
@@ -319,7 +320,41 @@ static int clone_parent(jmp_buf *env, int jmpval)
 		.jmpval = jmpval,
 	};
 
-	return clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD, &ca);
+	/*
+	 * Since glibc 2.25 (see c579f48edba88380635ab98cb612030e3ed8691e),
+	 * glibc no longer updates the TLS state containing the current process
+	 * tid after clone(2). This results in stale TIDs being used when Go
+	 * 1.22 and later call pthread_getattr_np(pthread_self()), resulting
+	 * in crashes on ancient glibcs and errors on newer glibcs.
+	 *
+	 * Luckily, because the same address is used for CLONE_PARENT_SETTID,
+	 * we can poke around in glibc's internal cache by getting the address
+	 * using PR_GET_TID_ADDRESS (only available in Linux >= 3.5, with
+	 * CONFIG_CHECKPOINT_RESTORE=y) and then overwriting it with
+	 * CLONE_CHILD_SETTID. CLONE_CHILD_CLEARTID is set to allow descendant
+	 * PR_GET_TID_ADDRESS calls to work, as well as matching what glibc
+	 * does in arch_fork().
+	 *
+	 * Yes, this is pretty horrific, but the core issue here is that we
+	 * need to run Go code ("runc init") in the child after fork(), which
+	 * is not allowed by glibc (see signal-safety(7)). We cannot exec to
+	 * solve the problem because we are in a security critical situation
+	 * here, and doing an exec would allow for container escapes (obvious
+	 * issues include that the shared libraries loaded from a re-exec would
+	 * come from the container, and doing an exec here would clear the bit
+	 * that makes non-dumpable flags effective for userns containers with
+	 * CAP_SYS_PTRACE).
+	 */
+	pid_t *tid_addr = NULL;
+	if (prctl(PR_GET_TID_ADDRESS, &tid_addr) < 0)
+		/* what should we do here... */;
+	write_log(DEBUG, "nsenter clone: get_tid_address gave us %p (pthread_self=%p)", tid_addr, (void *) pthread_self());
+	if (!tid_addr || *tid_addr != gettid())
+		write_log(WARNING, "nsenter clone: glibc private tid address is wrong: *%p %d != gettid() %d", tid_addr, tid_addr ? *tid_addr : -1, gettid());
+
+	return clone(child_func, ca.stack_ptr,
+		     CLONE_PARENT | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, &ca,
+		     NULL /* parent_tid */ , NULL /* tls */ , tid_addr);
 }
 
 /* Returns the clone(2) flag for a namespace, given the name of a namespace. */

@kolyshkin wdyt?
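
The idea behind the patch above can be tried in isolation with a minimal sketch like the following. It is not the runc code: the names cached_tid_after_clone and check_cached_tid are hypothetical, CLONE_PARENT is left out so that the caller can wait for the child itself, and the example assumes Linux with glibc and CONFIG_CHECKPOINT_RESTORE=y. The child compares the TID cached at the address reported by PR_GET_TID_ADDRESS with its real TID, once after a plain clone() and once with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID pointing at that same address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_GET_TID_ADDRESS
#define PR_GET_TID_ADDRESS 40
#endif

/* Stack for the cloned child; the child gets its own copy of this memory. */
static char demo_stack[64 * 1024] __attribute__((aligned(16)));

/* Runs in the child: exit status 0 if the cached TID is fresh, 1 if stale. */
static int check_cached_tid(void *arg)
{
	pid_t *tid_addr = arg;

	return *tid_addr == (pid_t)syscall(SYS_gettid) ? 0 : 1;
}

/*
 * Hypothetical helper (not part of runc). Returns 0 if the TID cached at the
 * PR_GET_TID_ADDRESS location is correct in the child, 1 if it is stale, and
 * -1 if the check could not be run (for example, the prctl is unavailable or
 * the address does not hold the caller's TID, as with a non-glibc libc).
 */
static int cached_tid_after_clone(int fix_with_settid)
{
	pid_t *tid_addr = NULL;
	pid_t pid;
	int status = 0;
	int flags = SIGCHLD;

	assert(fix_with_settid == 0 || fix_with_settid == 1);

	if (prctl(PR_GET_TID_ADDRESS, &tid_addr, 0, 0, 0) < 0 || !tid_addr)
		return -1;
	if (*tid_addr != (pid_t)syscall(SYS_gettid))
		return -1;

	/* Ask the kernel to overwrite the cached TID in the child. */
	if (fix_with_settid)
		flags |= CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID;

	/* The stack grows down, so pass the top of the buffer. */
	pid = clone(check_cached_tid, demo_stack + sizeof(demo_stack), flags,
		    tid_addr, NULL /* parent_tid */, NULL /* tls */, tid_addr);
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

On a glibc that no longer refreshes the cached TID in clone(), cached_tid_after_clone(0) would be expected to report a stale value while cached_tid_after_clone(1) should not; on other setups the helper may simply return -1.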

@cyphar
Member

cyphar commented Apr 13, 2024

@lifubang I can also take a look next week at whether we can somehow remove stage-1 so that we don't need a grandchild (which would remove the need for PR_SET_CHILD_SUBREAPER).

@lifubang
Member Author

This PR needs some refactoring, so I'm converting it to a draft.

runc/exec.go

Line 184 in df04ed4

enableSubreaper: false,

Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang force-pushed the feat-go-1.21-1.22 branch 2 times, most recently from 1253158 to bdec4c7 Compare May 17, 2024 10:07
@lifubang lifubang marked this pull request as ready for review May 17, 2024 10:44
@lifubang
Member Author

More suggestions are welcome.

As described in opencontainers#4233, there is a bug in glibc: pthread_self()
returns wrong information after we call `clone(CLONE_PARENT)` in libct/nsenter,
which prevents runc from working with `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter. Because nsenter does a double fork, we also
need `PR_SET_CHILD_SUBREAPER` so that runc can reap the grandchild
process created in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
This reverts commit ac31da6.

Signed-off-by: lifubang <lifubang@acmcoder.com>
This reverts commit e377e16.

Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang marked this pull request as draft May 17, 2024 23:38
@lifubang lifubang marked this pull request as ready for review May 17, 2024 23:43
@lifubang lifubang marked this pull request as draft May 18, 2024 00:03
logrus.Warn(err)
}
}
func newSignalHandler(notifySocket *notifySocket) *signalHandler {
Member Author

As @kolyshkin pointed out in #4278 (comment), this PR already does these things. But there is still one thing I can't decide whether we should do:

If we use PR_SET_CHILD_SUBREAPER and fork(2) to replace clone(CLONE_PARENT), I think we should move this signal handler to libcontainer. Otherwise, anyone who uses libcontainer directly in their code will have to write a signal handler themselves.

Labels
backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1
Development

Successfully merging this pull request may close these issues.

runc doesn't work with go1.22
4 participants