Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subsys: samples: doc: arch: Make perf subsystem #72890

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

KushnerovMikhail
Copy link

@KushnerovMikhail KushnerovMikhail commented May 16, 2024

Add profiling util based on periodic stack unwinding. Perf from Linux was taken as a reference.

Code is based on Sampling profiler (#29304) by @Jongy.

The operation of module is based on frame pointer usage and saving registers during interruption handling by zephyr.

General work description:
The unwinding function stay in timer as expiry function so is called during interruption handling. Thus the function have access to saved registers (program counter and frame pointer in particular) of the current thread and use it to unwind the thread stack.
Timer starting and results printing function are made as shell commands for conveniency.
Obtained samples can be translated into flamegraph using stackcollapse.py script.

Flame graph example, generated from echo_server sample:

graph

Kconfig optioins:

  • CONFIG_PERF enables perf subsystem in compilation.
  • CONFIG_PERF_BUFFER_SIZE specifies size of buffer, where samples are stored.

Kconfig requirements:

  • CONFIG_THREAD_STACK_INFO=y provides start and size thread stack values.
  • CONFIG_SMP=n SMP support is not implemented yet.
  • CONFIG_OMIT_FRAME_POINTER=n && CONFIG_OVERRIDE_FRAME_POINTER_DEFAULT=y provide frame pointer.
  • CONFIG_SHELL=y needs because subsystem provides shell commands.
  • CONFIG_RISCV=y || CONFIG_X86_64=y - subsystem implemented only for x86_64 and riscv yet.

@zephyrbot zephyrbot added area: X86 x86 Architecture (32-bit) area: Debugging area: RISCV RISCV Architecture (32-bit & 64-bit) area: Samples Samples labels May 16, 2024
Copy link

Hello @KushnerovMikhail, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

cfriedt
cfriedt previously approved these changes May 17, 2024
Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨.Beautiful ✨.


.. code-block:: bash

./FlameGraph/flamegraph.pl perf_buf.foolded > graph.svg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
./FlameGraph/flamegraph.pl perf_buf.foolded > graph.svg
./FlameGraph/flamegraph.pl perf_buf.folded > graph.svg

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Corrected it.


.. code-block:: bash

python zephyr/scripts/perf/stackcollapse.py perf_buf <build_dir>/zephyr/zephyr.elf > perf_buf.foolded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
python zephyr/scripts/perf/stackcollapse.py perf_buf <build_dir>/zephyr/zephyr.elf > perf_buf.foolded
python zephyr/scripts/perf/stackcollapse.py perf_buf <build_dir>/zephyr/zephyr.elf > perf_buf.folded

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Corrected it.

Copy link
Member

@nashif nashif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split the commit into multiple commits, i.e. 1 commit per arch, docs, adding the subsystem code and finally the sample.
Also, perf itself does not qualify as standalone subsystem. I suggest putting this under subsys/profiling/

Copy link
Collaborator

@ycsin ycsin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARM64 supports stack unwinding as well. The docs can use some review from @kartben

@@ -0,0 +1,14 @@
# cOPYRIGHT (C) 2023 kns gROUP llc (yadro)
Copy link
Collaborator

@ycsin ycsin May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and other similar ones, be consistent

Suggested change
# cOPYRIGHT (C) 2023 kns gROUP llc (yadro)
# Copyright (c) 2023 KNS Group LLC (YADRO)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how I let this happen. Thanks for the comment. Corrected it.

menuconfig PROFILING
bool "Profiling tools"
help
Enable profiling tools, such as perf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and other Kconfig files, fix indent

Suggested change
Enable profiling tools, such as perf
Enable profiling tools, such as perf

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Corrected it.


config PROFILING_PERF
bool "Perf support"
default n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n is already default

Suggested change
default n

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected it.

Comment on lines 10 to 11
depends on !OMIT_FRAME_POINTER
depends on OVERRIDE_FRAME_POINTER_DEFAULT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use CONFIG_FRAME_POINTER after #72646 is merged

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Сhanged code to use a CONFIG_FRAME_POINTER.

perf_data.idx += trace_length;
} else {
--perf_data.idx;
printf("perf buf override!\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should print to the shell, have a look at k_timer_user_data_set

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create an additional handler thread because the shell_print does not work in interrupts.
I hope I understood correctly how you wanted me to use k_timer_user_data_set.


k_timer_start(&perf_data.timer, K_NO_WAIT, K_NSEC(period));

printf("Enabled perf\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should print to the shell

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected it. It prints to shell now.


int cmd_perf_print(const struct shell *sh, size_t argc, char **argv)
{
printf("Perf buf length %zu\n", perf_data.idx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please print to shell

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected it. It prints to shell now.

}

SHELL_STATIC_SUBCMD_SET_CREATE(m_sub_perf,
SHELL_CMD_ARG(record, NULL, "Start recording for X ms on Y Hz", cmd_perf_record, 3, 0),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like

#define CMD_HELP_RECORD						\
	"Start recording for <duration> ms on <frequency> Hz\n"	\
	"Usage: record <duration> <frequency>\n"
Suggested change
SHELL_CMD_ARG(record, NULL, "Start recording for X ms on Y Hz", cmd_perf_record, 3, 0),
SHELL_CMD_ARG(record, NULL, CMD_HELP_RECORD, cmd_perf_record, 3, 0),

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your help string is really better then mine. Thanks for help

}

/*
* This function use frame poiters to unwind stack and get trace of return addresses.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This function use frame poiters to unwind stack and get trace of return addresses.
* This function use frame pointers to unwind stack and get trace of return addresses.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected it.

* Return addresses are translated in corresponding function's names using .elf file.
* So we get function call trace
*/
size_t arch_perf_current_stack_trace(uintptr_t *buf, size_t size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance that this and https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/riscv/core/stacktrace.c can share some of the unwinding logics?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #73587

Mikhail Kushnerov added 6 commits May 31, 2024 18:55
Add profiling subsystem.

Add perf util based on periodic stack unwinding. Perf from Linux
was taken as a reference.

The operation of module is based on frame pointer usage and saving
registers during interruption handling.
The unwinding function stay in timer as expiry functioin so is called
during interruption handling. Thus the function have access to saved
registers (program counter and frame pointer in particular) of the current
thread and use it to unwind the thread stack.
Timer starting and results printing function are made as shell commands
for conveniency.

Originally-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
Implement stack trace function for riscv arch, that get required
thread register values and unwind stack with it.

Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
Implement stack trace function for x86_64 arch, that get required
thread register values and unwind stack with it.

Originally-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
Samples, that were obtained by profiling perf tool, can be be translated
into flamegraph using stackcollapse.py script.

Originally-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
Add doc for profiling perf tool

Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
The operation of perf tool can be checked using this sample.

Signed-off-by: Mikhail Kushnerov <m.kushnerov@yadro.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Debugging area: RISCV RISCV Architecture (32-bit & 64-bit) area: Samples Samples area: X86 x86 Architecture (32-bit)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants