Multiple intermittent failures in several SMP targets due to UART misbehaviour #72858
Comments
There are some instability problems with qemu_riscv32 (zephyrproject-rtos#72858) that cause this sample test to fail in CI at random. Let's change the integration platform to a reliable one, so that this test focuses on the sample and does not produce false test failures due to the platform.
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
CC @dcpleung
Hi @aescolar, I've checked other QEMU-based platforms that have SMP variants, and I could get this behavior to trigger with
So, just FYI, this affects other architectures as well.
Lowering priority to medium, as this happens rarely enough that the CI's 3-retry mechanism usually avoids CI failures, and when it does not, a rerun will very likely be enough.
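A rough way to see why retries mask the problem, assuming each run fails independently with the same probability $p$ (an assumption: flaky failures may well be correlated across reruns): with $k$ attempts, a CI job only fails if every attempt fails. The value $p = 0.05$ below is purely illustrative, and whether the "3 retry" mechanism means $k = 3$ or $k = 4$ (three retries after the first run) is left open.

$$
P(\text{job fails}) = p^{k}, \qquad p = 0.05:\quad p^{3} = 1.25 \times 10^{-4}, \quad p^{4} = 6.25 \times 10^{-6}
$$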
I'm a little confused. Is that ' My vague guess is that this is a twister bug. @nashif?
There are some instability problems with qemu_riscv32 (zephyrproject-rtos/zephyr#72858) that cause this sample test to fail in CI at random. Let's change the integration platform to a reliable one, so that this test focuses on the sample and does not produce false test failures due to the platform.
(cherry picked from commit 19df415)
Original-Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
GitOrigin-RevId: 19df415
Change-Id: Ie35bb6f889211353018ad1139744845aa48bae93
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/5549530
Reviewed-by: Fabio Baltieri <fabiobaltieri@google.com>
Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com>
Commit-Queue: Fabio Baltieri <fabiobaltieri@google.com>
That string comes from here and is used here. I've seen this issue on our system when using SMP without twister. Unfortunately, I haven't been able to come up with a consistent repro for the issue :/. All I know is that it happens sporadically with
So the bug is just that multiple contexts are multiplexing the console output and interleaving? Have you tried CONFIG_PRINTK_SYNC=y? That will separate the output cleanly by individual printk() calls (usually, but not always, the same as "lines") and might work around your bug. But it seems like the actual bug is in the test, which makes assumptions about serialization that don't hold in SMP.
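The idea behind serializing output per printk() call can be illustrated with a host-side Python sketch. This is not Zephyr's printk implementation: `FakeUart`, `sync_print`, and the messages are hypothetical, and two threads stand in for two CPUs. The "UART" accepts only one character per call, and a lock held across the whole message keeps each message contiguous.

```python
import threading


class FakeUart:
    """Hypothetical stand-in for a shared UART that accepts one character per call."""

    def __init__(self) -> None:
        self._chars: list[str] = []
        self._byte_lock = threading.Lock()  # makes each single-character write atomic

    def putc(self, c: str) -> None:
        with self._byte_lock:
            self._chars.append(c)

    def output(self) -> str:
        return "".join(self._chars)


uart = FakeUart()
msg_lock = threading.Lock()  # held for a whole message, like per-call printk serialization


def sync_print(msg: str) -> None:
    # Holding msg_lock across every character means another writer can only
    # start emitting after this entire message has reached the "UART".
    with msg_lock:
        for c in msg:
            uart.putc(c)


def writer(msg: str, count: int) -> None:
    for _ in range(count):
        sync_print(msg)


threads = [
    threading.Thread(target=writer, args=("core0: hello\n", 200)),
    threading.Thread(target=writer, args=("core1: world\n", 200)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = uart.output().splitlines()
corrupted = [line for line in lines if line not in ("core0: hello", "core1: world")]
print(f"lines={len(lines)} corrupted={len(corrupted)}")
```

If `sync_print` relied only on the per-character lock, characters from the two writers could interleave within a line, much like the mingled output reported here. Note that a lock like this only serializes writers that actually share it.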
That's actually enabled by default when SMP is enabled and more than one core is enabled (see line 13 in 19f645e).
This is as far as my debugging got me before I got stumped. I was going to try to create and post a repro here, but I'm sidetracked right now :/
I have seen related issues when running subsys/ipc tests/samples on an521, where printk messages were corrupted because core0 and core1 were using the same UART concurrently. In these cases, there were two independent Zephyr instances running (one on each core). I haven't had time to investigate how to fix it properly, so I temporarily removed one of the offending messages so that it doesn't corrupt the one the test is looking for. I have also seen printk output corruption in the tflm-ethosu sample where
Describe the bug
Multiple tests are failing intermittently in CI on qemu_riscv32/qemu_virt_riscv32/smp and qemu_cortex_a53//smp.
From the logs, it appears that characters from the UART shell prompt and the test output are being intermingled, breaking messages and losing characters, which causes tests that check the output against a regex to fail.
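To see why this breaks regex-based checks, here is a minimal host-side Python sketch. The expected line and pattern are hypothetical, and the prompt string is assumed to look like Zephyr's UART shell prompt; the helpers `interleave` and `drop_char` are illustrative only and are not part of twister. They splice in or drop characters to mimic what the logs show.

```python
import re

# Hypothetical pattern and prompt for illustration; real tests use their own regexes.
EXPECTED = re.compile(r"^Hello World! qemu_riscv32$")
PROMPT = "uart:~$ "


def interleave(output: str, other: str, at: int) -> str:
    """Splice `other` into `output` at index `at`, mimicking a second,
    unsynchronized writer emitting text in the middle of a line."""
    return output[:at] + other + output[at:]


def drop_char(output: str, at: int) -> str:
    """Remove the character at index `at`, mimicking a lost UART byte."""
    return output[:at] + output[at + 1:]


clean = "Hello World! qemu_riscv32"
mingled = interleave(clean, PROMPT, 5)
lossy = drop_char(clean, 4)

for line in (clean, mingled, lossy):
    print(f"{line!r}: match={EXPECTED.search(line) is not None}")
```

Only the clean line matches; a single spliced-in prompt or a single lost character is enough to make an anchored regex check fail.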
To Reproduce
Steps to reproduce the behavior:
OR
Expected behavior
No test errors (and no intermingled shell prompt and output messages).
Impact
Main CI fails sometimes.
Logs and console output
https://github.com/zephyrproject-rtos/zephyr/actions/runs/9108203173/job/25038560360#step:11:245
https://github.com/zephyrproject-rtos/zephyr/actions/runs/9108203173/job/25038560360#step:11:1245
Environment (please complete the following information):
Additional info
The issue can be reproduced at least as far back as 527e712.