Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stability of riscv-tests for targets with low remotetimeout value #1065

Open
aap-sc opened this issue May 2, 2024 · 5 comments
Open

Comments

@aap-sc
Copy link
Collaborator

aap-sc commented May 2, 2024

After riscv-software-src/riscv-tests#553 was merged in @en-sc expressed the concern that this is a rather radical change and it reduces coverage of the test suite.

During a very heated discussion we've decided that:

  1. It would be nice if we could bring smaller timeouts back (for all targets)
  2. However, in order to do that we should get rid of the sporadic failures when these timeouts are small. Most likely the presence of these timeout errors is a result of missing keep_alive packets that OpenOCD should send to GDB - this needs to be investigated and fixed (if needed)

@en-sc , @TommyMurphyTM1234 FYI

@JanMatCodasip
Copy link
Collaborator

@aap-sc Please, what is the relationship between the timeouts in the testsuite and the coverage? In what way does the extended timeout reduce the coverage? Thank you for explanation.

@TommyMurphyTM1234
Copy link
Collaborator

I presumed that the reasoning here was that (subject to further investigation/analysis) some of the unexpected test failures/exceptions might point to problems with OpenOCD itself - in particular not sending sufficient "keep alive" packets to GDB to avoid anomalous timeouts?

FWIW at the moment I'm encountering several problems when running tests:

@aap-sc
Copy link
Collaborator Author

aap-sc commented May 3, 2024

@JanMatCodasip

what is the relationship between the timeouts in the testsuite and the coverage? In what way does the extended timeout reduce the coverage?

One may have an impression that the current testsuite from riscv-tests is a set of directed tests. It is not. It is more like a set of end-to-end tests that involve a rather complex communication between gdb, openocd, spike/jtag adapter.

Typical GDB command should never fail due to a timeout value even if the target is relatively slow (there are limits of course). Most ordinary users won't touch (and should not touch) remotetimeout settings.

"Reducing coverage" in this context means that after I've increased the remotetimeout for all targets we avoid a situation where OpenOCD has to send a "keep_alive" packet for operation to be successful. This just hides an underlying issues with some operations (and this was the intention till the test suite is stabilized - this is what this ticket is about).

For example, in context of https://github.com/riscv-collab/riscv-openocd/issues/1049 ( (sporadic DownloadTest failure part) the underlying issue is that when OpenOCD processes qCRC request it does not send keep_alive packets. Here is the patch to address this: https://review.openocd.org/c/openocd/+/8227

If we always have these timeouts increased, situations like the one presented above won't happen. Means it won't be tested -> coverage is reduced.

@aap-sc
Copy link
Collaborator Author

aap-sc commented May 3, 2024

I presumed that the reasoning here was that (subject to further investigation/analysis) some of the unexpected test failures/exceptions might point to problems with OpenOCD itself - in particular not sending sufficient "keep alive" packets to GDB to avoid anomalous timeouts?

Yes, this is precisely the reason.

@JanMatCodasip
Copy link
Collaborator

@aap-sc @TommyMurphyTM1234 Understood, thank you for explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants