statedb: Fix race between Observable and DB stopping #30816

Merged: 1 commit, Feb 17, 2024

joamaki
Contributor

@joamaki joamaki commented Feb 16, 2024

Since "Observable" forks a goroutine that is not tied to the lifecycle of the application, the "observe" goroutine may call DeleteTracker.Close after DB.Stop, leading to:

    panic: send on closed channel

    goroutine 106 [running]:
    github.com/cilium/cilium/pkg/statedb.(*DeleteTracker[...]).Close(0x0)
        /host/pkg/statedb/deletetracker.go:76 +0x21e
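
The failure mode can be reproduced in isolation. The following is a minimal sketch and not the statedb code: `unsafeDB`, its `requests` channel, and `closeTracker` are hypothetical names, and the sketch assumes that stopping closes a channel which a late Close then sends on. The panic is recovered only so that the result can be inspected.

```go
package main

import "fmt"

// unsafeDB is a hypothetical stand-in for a database whose Stop closes an
// internal channel that trackers send on when they are closed.
type unsafeDB struct {
	requests chan string
}

// Stop closes the channel. Any later send on it panics.
func (d *unsafeDB) Stop() { close(d.requests) }

// closeTracker mimics a tracker Close that notifies the database. It turns
// the runtime panic into an error so the caller can look at it.
func closeTracker(d *unsafeDB) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	d.requests <- "tracker closed"
	return nil
}
```

With a buffered channel, calling `closeTracker` before `Stop` succeeds, while calling it after `Stop` yields an error wrapping the same "send on closed channel" message as in the trace above.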

Ideally, goroutines created by statedb would be tied to its lifecycle so that Stop() could wait for, e.g., all observable goroutines to finish. That alone is not enough, however, as DeleteTrackers may be created outside statedb and closed after the DB has stopped. This commit therefore changes the logic to make it safe to call DeleteTracker.Close() even after the DB has stopped.
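
One common way to make a late close harmless is sketched below. It is an assumption for illustration and not necessarily how this commit implements the fix: the `safeDB` type, its fields, and `notifyClose` are hypothetical. The idea is to never close the channel that is sent on, and to signal shutdown through a separate channel that senders select on.

```go
package main

import "sync"

// safeDB is a hypothetical database whose request channel is never closed,
// so a send on it cannot panic. Shutdown is signalled via stopped instead.
type safeDB struct {
	requests chan string
	stopped  chan struct{}
	stopOnce sync.Once
}

func newSafeDB() *safeDB {
	return &safeDB{
		requests: make(chan string, 1),
		stopped:  make(chan struct{}),
	}
}

// Stop may be called any number of times; stopped is closed only once.
func (d *safeDB) Stop() {
	d.stopOnce.Do(func() { close(d.stopped) })
}

// notifyClose reports whether the message was handed to the database.
// After Stop it returns false instead of panicking.
func (d *safeDB) notifyClose(msg string) bool {
	// Check for shutdown first so a stopped database never accepts new
	// work, even if the buffer still has room.
	select {
	case <-d.stopped:
		return false
	default:
	}
	select {
	case d.requests <- msg:
		return true
	case <-d.stopped:
		return false
	}
}
```

In this sketch a tracker's Close would call `notifyClose` and treat a `false` result as "the database is gone, nothing left to clean up".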

The fix was validated by adding "defer time.Sleep(100*time.Millisecond)" to observable.go before the "dt.Close()" call to force it to run after DB.Stop. This failed with "send on closed channel" before the fix and passed after it.
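
The ordering trick used for validation relies on deferred calls running in last-in, first-out order. The sketch below only illustrates that mechanism: `eventLog` and `runDelayedClose` are hypothetical helpers, the string events stand in for the real DB.Stop and tracker close calls, and the ordering depends on the delay being long enough for the caller to run first.

```go
package main

import (
	"sync"
	"time"
)

// eventLog records events in the order they happen.
type eventLog struct {
	mu     sync.Mutex
	events []string
}

func (l *eventLog) add(event string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, event)
}

// runDelayedClose starts a goroutine whose simulated tracker close is held
// back by a deferred sleep, records a simulated DB.Stop in the meantime,
// and returns the observed order of the two events.
func runDelayedClose(delayMs int) []string {
	log := &eventLog{}
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Registered before the sleep, so it runs after the sleep.
		defer log.add("tracker.Close")
		// Registered last, so it runs first when the function returns.
		defer time.Sleep(time.Duration(delayMs) * time.Millisecond)
	}()
	log.add("DB.Stop")
	<-done
	return log.events
}
```

With a delay of some tens of milliseconds, the simulated close is expected to be recorded after the simulated stop, which is the situation the validation needed to provoke.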

As a future follow-up, it would make sense to use a Hive job group tied to the DB's lifecycle to make sure all goroutines are cleaned up (this follow-up will be done against the cilium/statedb repo, as the code is being moved there). The fix in this commit is already part of the cilium/statedb repo and does not need to be ported.
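
The general idea of tying goroutines to a lifecycle can be sketched with the standard library alone. This is not the Hive job API, which is not shown here: `goroutineGroup` and its methods are hypothetical names, and the sketch only illustrates a Stop that signals shutdown and then waits for every goroutine it started.

```go
package main

import "sync"

// goroutineGroup is a hypothetical owner of background goroutines whose
// Stop does not return until all of them have finished.
type goroutineGroup struct {
	done     chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

func newGoroutineGroup() *goroutineGroup {
	return &goroutineGroup{done: make(chan struct{})}
}

// Go runs fn in a new goroutine. fn should return once done is closed.
func (g *goroutineGroup) Go(fn func(done <-chan struct{})) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		fn(g.done)
	}()
}

// Stop signals shutdown and blocks until every goroutine started with Go
// has returned, so their cleanup cannot run after Stop completes.
func (g *goroutineGroup) Stop() {
	g.stopOnce.Do(func() { close(g.done) })
	g.wg.Wait()
}
```

As the description notes, this kind of waiting does not cover trackers created and closed outside the database, which is why the close path itself also has to tolerate a stopped DB.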

Fixes: #30806
Fixes: 23b0492 ("statedb2: StateDB v2.0 with per-table locks and deletion tracking")

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki added the release-note/bug and needs-backport/1.15 labels Feb 16, 2024
@joamaki requested a review from a team as a code owner February 16, 2024 14:59
@maintainer-s-little-helper bot added this to Needs backport from main in 1.15.2 Feb 16, 2024
@pippolo84
Member

/test

@joamaki added this pull request to the merge queue Feb 17, 2024
@maintainer-s-little-helper bot added the ready-to-merge label Feb 17, 2024
Merged via the queue into cilium:main with commit 48bd2ac Feb 17, 2024
63 checks passed
@joamaki deleted the pr/joamaki/fix-deletetracker-close branch February 17, 2024 10:47
@tklauser mentioned this pull request Feb 20, 2024
@tklauser added the backport-pending/1.15 label and removed the needs-backport/1.15 label Feb 20, 2024
@maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.15 in 1.15.2 Feb 20, 2024
@github-actions bot added the backport-done/1.15 label and removed the backport-pending/1.15 label Feb 21, 2024
@maintainer-s-little-helper bot moved this from Backport pending to v1.15 to Backport done to v1.15 in 1.15.2 Feb 21, 2024
@maintainer-s-little-helper bot removed this from Backport pending to v1.15 in 1.15.2 Feb 21, 2024
Labels
backport-done/1.15: The backport for Cilium 1.15.x for this PR is done.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues.

pkg/statedb: panic in statedb.(*DeleteTracker[...]).Close
3 participants