Seemingly inconsistent behavior when using "args" argument in different namespaces #109

Open
b-t-g opened this issue May 21, 2020 · 1 comment

b-t-g commented May 21, 2020

This is my first day using kubectl-trace, so I apologize if this is a silly question or if my assessment of the problem is wrong.

output of kubectl trace version:
git commit: d34d1d5
build date: 2019-09-19 09:00:13 -0600 MDT

output of uname -r on the node:
5.4.0-1009-aws (Ubuntu 20.04 AMI)

Platform:
AWS (with kops)

Problem:

I've been able to successfully run this command:
kubectl trace run -e "tracepoint:syscalls:sys_enter_* { if(args->ret < 0) {@[ustack] = count();} }" pod/some-pod -a

However, when I try to run this command in a different namespace (on a pod in the kube-system namespace for example), I get:

if your program has maps to print, send a SIGINT using Ctrl-C, if you want to interrupt the execution send SIGINT two times
definitions.h:55:3: error: unknown type name 'key_serial_t'
definitions.h:123:3: error: unknown type name 'cap_user_header_t'
definitions.h:124:3: error: unknown type name 'cap_user_data_t'
definitions.h:133:3: error: unknown type name 'cap_user_header_t'
definitions.h:134:9: error: unknown type name 'cap_user_data_t'
definitions.h:375:9: error: unknown type name 'sigset_t'
definitions.h:1102:3: error: unknown type name 'aio_context_t'
definitions.h:1113:3: error: unknown type name 'aio_context_t'
definitions.h:1122:3: error: unknown type name 'aio_context_t'
definitions.h:1135:3: error: unknown type name 'aio_context_t'
definitions.h:1150:3: error: unknown type name 'aio_context_t'
definitions.h:1159:3: error: unknown type name 'aio_context_t'
definitions.h:1174:9: error: unknown type name 'sigset_t'
definitions.h:1987:3: error: unknown type name 'siginfo_t'
definitions.h:2071:9: error: unknown type name 'sigset_t'
definitions.h:2124:3: error: unknown type name 'rwf_t'
definitions.h:2229:3: error: unknown type name 'rwf_t'
definitions.h:2240:3: error: unknown type name 'qid_t'
definitions.h:2417:3: error: unknown type name 'key_serial_t'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
exit status 1

Steps taken:

At first I thought it had something to do with trying to use the default service account, but following the instructions in the README did not appear to make a difference.

Afterwards, I noticed this issue on bpftrace which seems to indicate that this is an issue with headers not being installed; however, running the same command with --fetch-headers appended yields:

WARNING: Cannot find distro-specific headers for "Ubuntu". Fetching generic headers.
++ uname -r
+ BUILD_DIR=/linux-generic-5.4.0-1009-aws
++ uname -r
+ SOURCES_DIR=/usr/src/linux-generic-5.4.0-1009-aws
+ '[' '!' -e /usr/src/linux-generic-5.4.0-1009-aws/.installed ']'
+ echo 'Installing kernel headers for generic kernel'
+ fetch_generic_linux_sources
Installing kernel headers for generic kernel
++ uname -r
+ kernel_version=5.4.0-1009-aws
++ echo 5.4.0-1009-aws
Fetching upstream kernel sources for 5.4.0-1009-aws.
+ major_version=5
+ echo 'Fetching upstream kernel sources for 5.4.0-1009-aws.'
+ mkdir -p /linux-generic-5.4.0-1009-aws
+ curl -sL https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.4.0-1009-aws.tar.gz
+ tar --strip-components=1 -xzf - -C /linux-generic-5.4.0-1009-aws
tar: invalid magic
tar: short read
real    0m0.271s
user    0m0.038s
sys    0m0.004s
+ generate_headers
Generating kernel headers
+ echo 'Generating kernel headers'
+ cd /linux-generic-5.4.0-1009-aws
+ zcat /proc/config.gz
zcat: /proc/config.gz: No such file or directory
+ make ARCH=x86 oldconfig
make: *** No rule to make target 'oldconfig'.  Stop.
+ make ARCH=x86 prepare
make: *** No rule to make target 'prepare'.  Stop.
+ find /linux-generic-5.4.0-1009-aws -regex '.*\.c\|.*\.txt\|.*Makefile\|.*Build\|.*Kconfig' -type f -delete
real    0m0.006s
user    0m0.005s
sys    0m0.000s
+ mv /linux-generic-5.4.0-1009-aws /usr/src
real    0m0.001s
user    0m0.001s
sys    0m0.000s
+ touch /usr/src/linux-generic-5.4.0-1009-aws/.installed
+ HEADERS_TARGET=/usr/src/linux-generic-5.4.0-1009-aws
++ uname -r
+ mkdir -p /lib/modules/5.4.0-1009-aws
++ uname -r
+ ln -sf /usr/src/linux-generic-5.4.0-1009-aws /lib/modules/5.4.0-1009-aws/source
++ uname -r
+ ln -sf /usr/src/linux-generic-5.4.0-1009-aws /lib/modules/5.4.0-1009-aws/build
+ touch /lib/modules/.installed
stream closed
kubectl-trace-67b1691a-9ae6-11ea-be33-acde48001122 if your program has maps to print, send a SIGINT using Ctrl-C, if you want to interrupt the execution send SIGINT two times
kubectl-trace-67b1691a-9ae6-11ea-be33-acde48001122 fatal error: '/lib/modules/5.4.0-1009-aws/source/include/linux/kconfig.h' file not found
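The fatal error above names a specific file under /lib/modules. A quick way to confirm whether headers are visible for the running kernel is to test for that file directly. The following is a minimal sketch: the function name `has_kernel_headers` and its optional root-prefix argument are hypothetical (the prefix exists only so the check can be pointed at a scratch directory), and it would need to run wherever the trace job sees /lib/modules.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kconfig.h is reachable for a kernel release.
# $1 = kernel release (for example the output of `uname -r`)
# $2 = optional root prefix (assumption, useful for testing against a temp dir)
has_kernel_headers() {
    release="$1"
    root="${2:-}"
    # -e follows symlinks, so a dangling "source" link counts as missing.
    [ -e "${root}/lib/modules/${release}/source/include/linux/kconfig.h" ]
}

if has_kernel_headers "$(uname -r)"; then
    echo "headers found for $(uname -r)"
else
    echo "headers missing for $(uname -r)"
fi
```

In the log above, the symlinks were created even though the download failed, so a check like this would report the headers as missing.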

When I change the bpftrace expression to just "tracepoint:syscalls:sys_enter_* { @[ustack] = count(); }", it works regardless of pod/namespace, which leads me to believe that the problem has something to do with the args parameter.

Apologies again if this is the wrong place to ask this kind of question or if my assessment of my problem is incorrect.

b-t-g commented Aug 20, 2020

I took a look at this again and these lines in the fetch-headers logs posted above caught my eye:

+ kernel_version=5.4.0-1009-aws
++ echo 5.4.0-1009-aws
Fetching upstream kernel sources for 5.4.0-1009-aws.
+ major_version=5
+ echo 'Fetching upstream kernel sources for 5.4.0-1009-aws.'
+ mkdir -p /linux-generic-5.4.0-1009-aws
+ curl -sL https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.4.0-1009-aws.tar.gz

Looking at the URL in question, kernel.org has no entry for the -aws kernel version. Looking at the source code for fetch-headers, I see this line, which is supposed to strip out such information.
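The "tar: invalid magic" message in the log is consistent with curl handing tar something other than a gzip archive, since `curl -sL` does not fail on an HTTP error status. One way to catch that early is to check the gzip magic bytes (1f 8b) before extracting. This is only a sketch: `is_gzip` is a hypothetical helper and not part of the fetch-headers script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    # Read the first two bytes and print them as hex with separators removed.
    magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
    [ "$magic" = "1f8b" ]
}

# Possible usage (not executed here; $url and the paths are placeholders):
#   curl -fsSL "$url" -o /tmp/linux.tar.gz \
#     && is_gzip /tmp/linux.tar.gz \
#     && tar --strip-components=1 -xzf /tmp/linux.tar.gz -C "$BUILD_DIR"
```

Adding `-f` to curl makes it exit non-zero on an HTTP error, so the later steps would not run against an error page.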

Running this script locally (macOS with the supplied version of AWK, i.e., BSD AWK and not GNU AWK), I get 5.4.0-1009-aws. However, after installing GNU AWK (and mawk, which is apparently standard on Ubuntu and Debian), I get 5.4.0, as expected. I'll see if I can create a new cluster with the same or a similar image and see what happens when I run the linked command in the cluster later on.
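Since the result differs between AWK implementations, the suffix could also be stripped with plain POSIX parameter expansion, which behaves the same in any POSIX shell. The sketch below is an assumption about what the stripping is meant to achieve, not the code in fetch-headers; `strip_kernel_suffix` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: drop everything from the first '-' or '+' onward,
# e.g. "5.4.0-1009-aws" becomes "5.4.0". No awk involved.
strip_kernel_suffix() {
    printf '%s\n' "${1%%[-+]*}"
}

kernel_version="$(strip_kernel_suffix "$(uname -r)")"
major_version="${kernel_version%%.*}"
echo "kernel_version=${kernel_version} major_version=${major_version}"
```

Even with the suffix removed, the tarball name may need a further adjustment: as far as can be told from the kernel.org listing, the first release of a series is published without the trailing .0 (linux-5.4.tar.gz rather than linux-5.4.0.tar.gz).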
