Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU hotplug breaks interval probes #3125

Open
tnovak opened this issue Apr 16, 2024 · 2 comments
Open

CPU hotplug breaks interval probes #3125

tnovak opened this issue Apr 16, 2024 · 2 comments
Labels
bug Something isn't working kernel Issue may require kernel work

Comments

@tnovak
Copy link
Contributor

tnovak commented Apr 16, 2024

Interval probes are attached to PERF_COUNT_SW_CPU_CLOCK events that fire on CPU 0. If that CPU goes offline once an interval probe is attached, that probe never runs, even if the CPU comes back up later:

bpftrace -e 'interval:s:3 { exit(); }' &
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 1 > /sys/devices/system/cpu/cpu0/online
# exit() doesn't get called

(If CPU0 is already offline, perf_event_open() in attach_interval() returns ENODEV, and bpftrace reports an error. This case can be handled by using the first online CPU instead of hardcoding CPU0.)

This issue was observed on an Android device where CPU hotplug can be used to save power.

From my understanding of the kernel code here, once the CPU goes offline (perf_event_exit_cpu), associated perf events transition into PERF_EVENT_STATE_OFF and stay there until they're explicitly re-enabled (e.g. via PERF_EVENT_IOC_ENABLE). Bpftrace doesn't consume the SW_CPU_CLOCK event directly, so it doesn't detect when it becomes disabled.

Looks like there was a proposal to handle this automatically in the kernel, but the patch was rejected: https://lore.kernel.org/all/aee872be-c34d-729b-575d-df488a46f2cf@codeaurora.org/T/

Here are some ideas on how bpftrace might be able to handle this edge case. Unfortunately none of them seems particularly elegant nor trivial to implement.

  • Attach the interval probe to a perf event for each online CPU (just like for profile probes), but emit extra BPF code that ensures it's called only once (per iteration). This could presumably be implemented via a counter and atomic BPF instructions (e.g. BPF_CMPXCHG).
  • Add perf event fds to the epoll fd set; when an event is disabled, attach the BPF program to a new event on one of the online CPUs. Period may need to be adjusted when reattaching.
  • Implement interval probes in userspace, e.g. by having bpftrace spawn a new thread that sleeps (in a loop) for the specific duration before calling a trigger function that has a uprobe attached to it (similar to how BEGIN/END probes work).
@tnovak tnovak added the bug Something isn't working label Apr 16, 2024
@ajor ajor added the kernel Issue may require kernel work label Apr 18, 2024
@ajor
Copy link
Member

ajor commented Apr 18, 2024

It looks like this affects all probe types using perf events (profile, interval, software and hardware).

Ideally, our solution would work for all these probe types and not just interval. We'll want to avoid the situation where a long-running profile gradually loses visibility into more and more CPUs as they go offline and online again.

Tbh the kernel managing this for us seems like the best option to me... Although Peter Zijlstra didn't like the patch as submitted, it doesn't sound like he was completely opposed to the concept of preserving perf events across hotplugs.

Tagging this with "kernel" in case there's anyone who fancies putting in an upstream Linux patch :)

@tnovak
Copy link
Contributor Author

tnovak commented Apr 18, 2024

Ideally, our solution would work for all these probe types and not just interval. We'll want to avoid the situation where a long-running profile gradually loses visibility into more and more CPUs as they go offline and online again.

Tbh the kernel managing this for us seems like the best option to me... Although Peter Zijlstra didn't like the patch as submitted, it doesn't sound like he was completely opposed to the concept of preserving perf events across hotplugs.

I believe there are two slightly different issues, and the proposed kernel patch only addresses the first:

  1. Perf events remain in the off state when CPUs come back online (this affects all probe types using perf events). Ideally this would be handled by the kernel (as in the linked patch), but a not-so-great alternative is bpftrace internally monitoring hotplug events.
  2. Probes that are pinned to a specific CPU (namely, interval probes) stop firing when that CPU is offline.

It doesn't matter which CPU an interval probe runs on (as long as it fires only once per period), so attaching it to all CPUs may still be required even with the kernel patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working kernel Issue may require kernel work
Projects
None yet
Development

No branches or pull requests

2 participants