storage/backend: fatal if there is panic during defrag #17757

Open · wants to merge 1 commit into base: main
Conversation

fuweid (Contributor) commented Apr 9, 2024

We should exit as soon as possible if there is a panic during defrag, because the tx might already be closed. An in-flight request might then use the invalid tx and panic as well, and the real panic can be shadowed by the new one, depending on goroutine scheduling. So we should fatal here.

How to reproduce bbolt#715:

```diff
diff --git a/server/etcdserver/api/v3rpc/maintenance.go b/server/etcdserver/api/v3rpc/maintenance.go
index 4f55c7c74..0a91b4225 100644
--- a/server/etcdserver/api/v3rpc/maintenance.go
+++ b/server/etcdserver/api/v3rpc/maintenance.go
@@ -89,7 +89,13 @@ func NewMaintenanceServer(s *etcdserver.EtcdServer, healthNotifier notifier) pb.
 func (ms *maintenanceServer) Defragment(ctx context.Context, sr *pb.DefragmentRequest) (*pb.DefragmentResponse, error) {
        ms.lg.Info("starting defragment")
        ms.healthNotifier.defragStarted()
-       defer ms.healthNotifier.defragFinished()
+       defer func() {
+               ms.healthNotifier.defragFinished()
+               if rerr := recover(); rerr != nil {
+                       time.Sleep(30 * time.Second)
+                       panic(rerr)
+               }
+       }()
        err := ms.bg.Backend().Defrag()
        if err != nil {
                ms.lg.Warn("failed to defragment", zap.Error(err))
```
```bash
$ make gofail-enable
$ make
$ GOFAIL_HTTP="127.0.0.1:1234" bin/etcd

# New terminal
$ curl http://127.0.0.1:1234/defragBeforeRename -XPUT -d'panic()'
$ bin/etcdctl defrag

# Old terminal
{"level":"info","ts":"2024-04-09T23:28:54.084434+0800","caller":"v3rpc/maintenance.go:90","msg":"starting defragment"}
{"level":"info","ts":"2024-04-09T23:28:54.087363+0800","caller":"backend/backend.go:509","msg":"defragmenting","path":"default.etcd/member/snap/db","current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
{"level":"info","ts":"2024-04-09T23:28:54.091229+0800","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbb2553]

goroutine 156 [running]:
go.etcd.io/bbolt.(*Tx).Bucket(...)
        go.etcd.io/bbolt@v1.4.0-alpha.0/tx.go:112
go.etcd.io/etcd/server/v3/storage/backend.(*baseReadTx).UnsafeRange(0xc00061ac80, {0x12b7780, 0x1a73420}, {0x1a113e8, 0xe, 0xe}, {0x0, 0x0, 0x0}, 0x1)
        go.etcd.io/etcd/server/v3/storage/backend/read_tx.go:103 +0x233
go.etcd.io/etcd/server/v3/storage/schema.UnsafeReadStorageVersion({0x7f6280613618?, 0xc00061ac80?})
        go.etcd.io/etcd/server/v3/storage/schema/version.go:35 +0x5d
go.etcd.io/etcd/server/v3/storage/schema.UnsafeDetectSchemaVersion(0xc000334b80, {0x7f6280613618, 0xc00061ac80})
        go.etcd.io/etcd/server/v3/storage/schema/schema.go:94 +0x47
go.etcd.io/etcd/server/v3/storage/schema.DetectSchemaVersion(0xc000334b80, {0x12b77f0, 0xc00061ac80})
        go.etcd.io/etcd/server/v3/storage/schema/schema.go:89 +0xf2
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).StorageVersion(0xc000393c08)
        go.etcd.io/etcd/server/v3/etcdserver/server.go:2216 +0x105
go.etcd.io/etcd/server/v3/etcdserver.(*serverVersionAdapter).GetStorageVersion(0x0?)
        go.etcd.io/etcd/server/v3/etcdserver/adapters.go:77 +0x16
go.etcd.io/etcd/server/v3/etcdserver/version.(*Monitor).UpdateStorageVersionIfNeeded(0xc00002df70)
        go.etcd.io/etcd/server/v3/etcdserver/version/monitor.go:112 +0x5d
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).monitorStorageVersion(0xc000393c08)
        go.etcd.io/etcd/server/v3/etcdserver/server.go:2259 +0xa8
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach.func1()
        go.etcd.io/etcd/server/v3/etcdserver/server.go:2440 +0x59
created by go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).GoAttach in goroutine 1
        go.etcd.io/etcd/server/v3/etcdserver/server.go:2438 +0xf9
```

REF: etcd-io/bbolt#715
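
The fix itself is small. Below is a minimal, self-contained sketch of the fail-fast pattern, using a hypothetical `fatalf` as a stand-in for `b.lg.Fatal` (which also flushes pending log entries), and a child process so the `os.Exit(1)` can be observed from outside:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// fatalf is a hypothetical stand-in for b.lg.Fatal: it reports the panic
// value and exits immediately, so the process never unwinds far enough to
// release the defrag locks to other goroutines.
func fatalf(msg string, v interface{}) {
	fmt.Printf("fatal: %s: %v\n", msg, v)
	os.Exit(1)
}

// defrag sketches the pattern this PR adds: recover inside the deferred
// function and fail fast instead of letting the panic unwind the stack.
func defrag() {
	defer func() {
		if rerr := recover(); rerr != nil {
			fatalf("unexpected panic during defrag", rerr)
		}
	}()
	panic("defragBeforeRename") // injected failure, as in the repro above
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "child" {
		defrag()
		return
	}
	// Run defrag in a child process so we can observe the fail-fast exit.
	out, err := exec.Command(os.Args[0], "child").CombinedOutput()
	fmt.Printf("output=%q exited=%v\n", string(out), err != nil)
}
```

Running it prints the real panic value exactly once, followed by the nonzero exit, instead of a second shadowing panic.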

Signed-off-by: Wei Fu <fuweid89@gmail.com>
fuweid (Contributor, Author) commented Apr 12, 2024

Ping @ahrtr @serathius @jmhbnz @siyuanfoundation, requesting a review on this. I hope it can help with failpoint testing.

Comment on lines +480 to +488

```go
defer func() {
	// NOTE: We should exit as soon as possible because that tx
	// might be closed. The inflight request might use invalid
	// tx and then panic as well. The real panic reason might be
	// shadowed by new panic. So, we should fatal here with lock.
	if rerr := recover(); rerr != nil {
		b.lg.Fatal("unexpected panic during defrag", zap.Any("panic", rerr))
	}
}()
```
Member:
During defragmentation, we acquire all locks, so no TX (neither batchTx nor readTx) will ever be closed by other operations, because all other operations are blocked. The only exception is that there may be some in-flight concurrent read operations, but those never close any TX either.

Also, etcdserver waits for all read TXs to finish when closing the bbolt db, so the defrag will not impact any read TXs either.

I agree that printing the real panic is a little better, so that it won't be shadowed by any other new panic. But I would suggest we pursue an overall solution, as mentioned in etcd-io/bbolt#715 (comment).

fuweid (Contributor, Author):

> But I would suggest we pursue an overall solution, as mentioned.

I do have a plan. I'm still working on a POC and will file a pull request for RFC when it's ready.

> I agree that printing the real panic is a little better, so that it won't be shadowed by any other new panic.

Side note: the original issue came from the robustness test suite. The failpoint was injected while serving requests, not in the shutdown phase.

Member:

> Side note: the original issue came from the robustness test suite. The failpoint was injected while serving requests, not in the shutdown phase.

In a real production environment, the only possible causes of a panic during defragmentation are system errors, e.g. power-off, OOM, or I/O issues.

fuweid (Contributor, Author) commented Apr 14, 2024:

Yes. So, in this patch, fatal within the lock is a kind of fail-fast. :) Even if we had a better drain solution, it would still be possible to hit a nil pointer while handling other in-flight requests.

Member:

I suggest not using b.lg.Fatal:

  • It calls os.Exit(1) to terminate the process immediately. Deferred functions won't be executed, so no cleanup tasks will run.
  • As mentioned above, during defragmentation we acquire all locks, so defragmentation won't conflict with any other operations. The TXN won't be closed by any other operation/goroutine during defragmentation.
  • Also, I don't see any relationship between this PR and the issue "UnsafeRange panicking during shutdown" #17223.
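
The first point is easy to verify on its own: `os.Exit` terminates the process without running deferred functions. A small stand-alone demo (re-executing itself as a child process so the exit can be observed):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func child() {
	defer fmt.Println("cleanup") // never runs: os.Exit skips deferred functions
	fmt.Println("exiting")
	os.Exit(1)
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "child" {
		child()
		return
	}
	out, _ := exec.Command(os.Args[0], "child").CombinedOutput()
	fmt.Printf("child printed %q\n", out) // note "cleanup" is absent
}
```

This is the trade-off under discussion: fail-fast loses cleanup, but guarantees nothing else runs against the invalid tx.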

fuweid (Contributor, Author):

Fatal calls os.Exit(1) after flushing pending log entries. I think that's better than a bare os.Exit(1) here.

> Also, I don't see any relationship between this PR and the issue #17223.

Yes, it's about bbolt#715. I was thinking it related to the termination phase, so I linked it. Sorry for the confusion.

> As mentioned above, during defragmentation we acquire all locks, so defragmentation won't conflict with any other operations. The TXN won't be closed by any other operation/goroutine during defragmentation.

Yes. However, if there is a panic, all the deferred functions will be called and the lock will be released. If any goroutine is pending on acquiring the lock, it can then get the lock and run a query. Whether that happens depends on goroutine scheduling: before the process exits because of the panic, other goroutines still have a chance to execute. That depends on the kernel and the Go runtime.

Based on the bbolt#715 comment, the failpoint causes the original db reference to be closed. The unsafe query then panics as well and shadows the original panic.
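
The scheduling hazard described above can be demonstrated in a few lines: a deferred unlock runs while the panic is unwinding, so a goroutine blocked on the same mutex gets to run before the process dies. (The `defrag`-flavored naming here is illustrative, not etcd's actual code.)

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex
	acquired := make(chan string, 1)

	mu.Lock() // the defrag-style path holds the lock

	go func() {
		mu.Lock() // an in-flight request blocked on the same lock
		acquired <- "waiter got the lock during panic unwinding"
		mu.Unlock()
	}()

	func() {
		defer func() {
			// By the time this recover runs, the deferred Unlock below has
			// already executed (defers run LIFO), so the waiter has had a
			// chance to acquire the lock and touch state the panicking
			// path considered invalid.
			if r := recover(); r != nil {
				fmt.Println(<-acquired)
			}
		}()
		defer mu.Unlock() // runs first during unwinding, releasing the lock
		panic("defrag failed")
	}()
}
```

Calling Fatal from inside the recover, while still holding the lock, closes exactly this window.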

serathius self-requested a review on April 19, 2024.