Deflake transport timeout case #14861

Merged
merged 2 commits into from Nov 27, 2022


fuweid
Contributor

@fuweid fuweid commented Nov 27, 2022

There is a race on the `stop` channel. After the write timeout is verified, the test does not wait for the first `blocker` to receive the close signal from `stop`. If the new `blocker`, which is used to verify the read timeout, gets its dial result immediately, it might receive the message from `stop` before the old one does and close its connection. `conn.Read` then returns `EOF` instead of the expected read timeout error.
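A minimal Go sketch of the race, assuming a simplified setup with no real connections: two hypothetical blockers wait on one shared `stop` channel, and a single send releases exactly one of them. Which blocker takes the signal depends on goroutine scheduling, so the read-timeout blocker can win. The names `sharedStop` and `released` are illustrative, not etcd's code.

```go
package main

import (
	"fmt"
	"time"
)

// sharedStop mimics the original pattern: two hypothetical blockers wait on
// the same stop channel and a single signal is sent. It returns how many
// blockers woke up and which one took the signal.
func sharedStop() (woken int, who string) {
	stop := make(chan struct{}, 1) // shared by both blockers
	released := make(chan string, 2)
	for _, id := range []string{"write", "read"} {
		go func(id string) {
			// In the real test, the blocker's connection stays open until this receive.
			<-stop
			released <- id
		}(id)
	}

	// One send releases exactly one blocker; which one depends on scheduling.
	stop <- struct{}{}
	who = <-released
	woken = 1

	// Confirm the other blocker is still waiting: only one value was sent.
	select {
	case <-released:
		woken++
	case <-time.After(20 * time.Millisecond):
	}

	// Release the remaining blocker so its goroutine does not leak.
	stop <- struct{}{}
	return woken, who
}

func main() {
	n, who := sharedStop()
	// n is always 1; who may be "write" or "read" from run to run.
	fmt.Println(n, who)
}
```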

How to reproduce this on a Linux devbox?

Use `taskset` to pin the test process to a single CPU:

```bash
cd ./client/pkg/transport
go test -c -o /tmp/test --race=true ./
taskset -c 0 /tmp/test -test.run TestWriteReadTimeoutListener -test.v -test.cpu 4 -test.count=10000 -test.failfast
```

To fix this, I suggest using a separate `stop` channel for each blocker to prevent the race.
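A sketch of the fix under the same simplified assumptions: each hypothetical blocker receives its own stop channel, so the signal sent after the write-timeout check can only release the write-timeout blocker. The names `blocker`, `writeStop`, and `readStop` are illustrative; the merged change may structure this differently.

```go
package main

import (
	"fmt"
	"time"
)

// blocker is a hypothetical stand-in for the test helper: in the real test
// it keeps its dialed connection open until it receives from stop.
func blocker(id string, stop <-chan struct{}, released chan<- string) {
	<-stop
	released <- id
}

// signalWriteOnly starts a write-timeout and a read-timeout blocker, each
// with its own stop channel, signals only the write blocker, and reports
// which blocker woke up and whether the read blocker is still waiting.
func signalWriteOnly() (woke string, readStillBlocked bool) {
	released := make(chan string, 2)
	writeStop := make(chan struct{}, 1)
	readStop := make(chan struct{}, 1)
	go blocker("write", writeStop, released)
	go blocker("read", readStop, released)

	// Only the write blocker can consume this signal.
	writeStop <- struct{}{}
	woke = <-released

	select {
	case <-released:
		readStillBlocked = false
	case <-time.After(20 * time.Millisecond):
		readStillBlocked = true
	}

	// Release the read blocker so no goroutine is leaked.
	readStop <- struct{}{}
	return woke, readStillBlocked
}

func main() {
	woke, blocked := signalWriteOnly()
	fmt.Println(woke, blocked) // write true
}
```

Because the two blockers never share a channel, the outcome no longer depends on which goroutine the scheduler runs first.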

Signed-off-by: Wei Fu <fuweid89@gmail.com>


REF: #14834 (comment)

@codecov-commenter

Codecov Report

Merging #14861 (cd9ade5) into main (cdb9b8b) will decrease coverage by 0.22%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##             main   #14861      +/-   ##
==========================================
- Coverage   75.56%   75.33%   -0.23%
==========================================
  Files         457      457
  Lines       37423    37423
==========================================
- Hits        28278    28194      -84
- Misses       7371     7443      +72
- Partials     1774     1786      +12
```
| Flag | Coverage Δ |
|---|---|
| all | 75.33% <ø> (-0.23%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| server/storage/mvcc/watchable_store.go | 84.78% <0.00%> (-5.80%) ⬇️ |
| pkg/adt/interval_tree.go | 81.95% <0.00%> (-5.27%) ⬇️ |
| raft/rafttest/node.go | 95.00% <0.00%> (-5.00%) ⬇️ |
| server/proxy/grpcproxy/watch.go | 92.48% <0.00%> (-4.05%) ⬇️ |
| server/etcdserver/txn/util.go | 75.47% <0.00%> (-3.78%) ⬇️ |
| client/v3/leasing/txn.go | 88.09% <0.00%> (-3.18%) ⬇️ |
| server/etcdserver/api/v3rpc/watch.go | 84.76% <0.00%> (-2.86%) ⬇️ |
| server/etcdserver/api/rafthttp/peer.go | 85.06% <0.00%> (-1.95%) ⬇️ |
| pkg/traceutil/trace.go | 96.15% <0.00%> (-1.93%) ⬇️ |
| client/v3/leasing/kv.go | 89.03% <0.00%> (-1.33%) ⬇️ |

... and 10 more


@serathius serathius merged commit 1503f46 into etcd-io:main Nov 27, 2022
@fuweid fuweid deleted the deflake-transport-timeout-case branch November 28, 2022 02:11