Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address the race condition #101

Open
Kixunil opened this issue Dec 23, 2022 · 6 comments
Open

Address the race condition #101

Kixunil opened this issue Dec 23, 2022 · 6 comments

Comments

@Kixunil
Copy link

Kixunil commented Dec 23, 2022

If my understanding is correct this crate launches a separate instance per test, which is nice but because of the race condition and tests being run in parallel this can cause flaky tests which is quite bad. It'd be great to fix the race completely.

I took a look at the obvious "what happens if I set -port 0 -rpcport 0" The answer is `bitcoind seems to attempt to bind the default port anyway (WTF?) and doesn't log the actual port so if we do this we have no way of figuring it out.

I see only two solutions that include existing bitcoind instances:

  1. On platforms that support it we could LD_PRELOAD a library that overrides bind() to pass in port 0 and then reports the mapping over a pipe. We could read that report and learn the actual port number.
  2. Create private network namespace which has a custom IP and is somehow allowed to communicate with the parent. This should be possible on all platforms which support Docker because it presumably does this but it may require root which is not nice.

Of course the long term solution is to add direct support to bitcoind itself.

@RCasatta
Copy link
Collaborator

Do you have evidence of flaky tests?

Because ATM bitcoind is tried to be started three times, every time with different ports asked to the OS so it should be pretty hard to hit the race condition.

But yes if bitcoind get some way to communicate which random ports it is started with, like writing a file in datadir, it would be better.

@Kixunil
Copy link
Author

Kixunil commented Dec 25, 2022

It's just my guess that it could happen. I was looking at using this library but so far I went for extending a shell script which I already had. I'm still contemplating rewriting, though.

@MaxFangX
Copy link

Do you have evidence of flaky tests?

🙋‍♀️ I just tried to upgrade our bitcoind from version 22_0 to 23_0 and it introduced flakiness in our tests. We have two tests that spin up a BitcoinD instance, with the result that sometimes the tests will both pass, sometimes they will both fail, and sometimes only 1 of the 2 will fail.

We have not observed any flakiness when using 22_0, so it seems that there's something specific to 23_0 that causes these problems. I am not sure if it is related to the ports issue you guys are discussing; the error I get is SQLite trying to acquire a lock on the database in order to verify a wallet file:

failures:

---- runner::tests::integration::load_shutdown_runner stdout ----
thread 'runner::tests::integration::load_shutdown_runner' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14

---- runner::tests::integration::load_status_shutdown_node stdout ----
thread 'runner::tests::integration::load_status_shutdown_node' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14


failures:
    runner::tests::integration::load_shutdown_runner
    runner::tests::integration::load_status_shutdown_node

test result: FAILED. 30 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.45s

The tests use the default bitcoind::Conf, only pushing an -rpcauth=user:passwordhash to the Conf::args.

@tnull
Copy link
Contributor

tnull commented Mar 14, 2023

Can second the wallet flakiness, we're seeing the same issue pop up in CI (cf lightningdevkit/rust-lightning#2098 (comment)):

---- test_esplora_syncs stdout ----
thread 'test_esplora_syncs' panicked at 'called `Result::unwrap()` on an `Err` value: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', tests/integration_tests.rs:35:65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Not sure if this the same issue as OP though, and not sure what exactly is the source of this.

@Kixunil
Copy link
Author

Kixunil commented Mar 14, 2023

I think your races are unrelated to port binding.

@RCasatta
Copy link
Collaborator

Not sure if this the same issue as OP though, and not sure what exactly is the source of this.

For the record, this was most likely due to wrapping BitcoinD inside OnceCell causing the drop to not be called: lightningdevkit/rust-lightning#2132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants