update: key too big for map: argument list too long: unknown #33

Open
113xiaoji opened this issue Jan 22, 2024 · 0 comments
After running for a while, containerd continuously logs the following error:

level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master1,Uid:4bba31f5bbd08c1ecb43f3eeca03effb,Namespace:kube-system,Attempt:221,} failed, error" error="failed to create containerd task: failed to create init process: failed to insert taskinfo for init process(id=5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a, namespace=k8s.io): update: key too big for map: argument list too long: unknown"

The error appears to occur while updating an eBPF map. The following Go code seems to be involved:

// traceInitProcess checks that the init process is alive and starts to trace
// its exit event by the exitsnoop bpf tracepoint.
func (m *monitor) traceInitProcess(init *initProcess) (retErr error) {
	m.Lock()
	defer m.Unlock()

	fd, err := pidfd.Open(uint32(init.Pid()), 0)
	if err != nil {
		return fmt.Errorf("failed to open pidfd for %s: %w", init, err)
	}
	defer func() {
		if retErr != nil {
			unix.Close(int(fd))
		}
	}()

	// NOTE: The pid might be reused before pidfd.Open(like oom-killer or
	// manually kill), so that we need to check the runc-init's exec.fifo
	// file descriptor which is the "identity" of runc-init. :)
	//
	// Why we don't use runc-state commandline?
	//
	// The runc-state command only checks /proc/$pid/status's starttime,
	// which is not reliable. And then it only checks exec.fifo exist in
	// disk, but the runc-init has been killed. So we can't just use it.
	if err := checkRuncInitAlive(init); err != nil {
		return err
	}

	nsInfo, err := getPidnsInfo(uint32(init.Pid()))
	if err != nil {
		return fmt.Errorf("failed to get pidns info: %w", err)
	}

	if err := m.initStore.Trace(uint32(init.Pid()), &exitsnoop.TaskInfo{
		TraceID:   init.traceEventID,
		PidnsInfo: nsInfo,
	}); err != nil {
		return fmt.Errorf("failed to insert taskinfo for %s: %w", init, err)
	}
	defer func() {
		if retErr != nil {
			m.initStore.DeleteTracingTask(uint32(init.Pid()))
			m.initStore.DeleteExitedEvent(init.traceEventID)
		}
	}()

	// Before trace it, the init-process might be killed and the exitsnoop
	// tracepoint will not work, we need to check it alive again by pidfd.
	if err := fd.SendSignal(0, 0); err != nil {
		return err
	}

	if err := m.pidPoller.Add(fd, func() error {
		// TODO(fuweid): do we need to check the pid value in event?
		status, err := m.initStore.GetExitedEvent(init.traceEventID)
		if err != nil {
			init.SetExited(unexpectedExitCode)
			return fmt.Errorf("failed to get exited status: %w", err)
		}

		init.SetExited(int(status.ExitCode))
		return nil
	}); err != nil {
		return err
	}
	return nil
}

It seems that the key is not being validated properly. The ID 5585c9eb3702e459fb2c73b0314e2d77670df6af8b23b0662c4032e7e328af1a shown in the log above is just one example; the same error occurs for other IDs, such as 1ea7f8369914d19bda8da29673e4f4e037c1b39e185f6f4da0dc167539754ca2 and 578193dfea54c854054abdea0a7bea11ab99e35a8d89c6469ed28084d5ab5080.

113xiaoji changed the title from "Background" to "update: key too big for map: argument list too long: unknown" on Jan 22, 2024