"fatal error: concurrent map writes" when submitting transaction #570

Closed
SupaHam opened this issue Feb 6, 2023 · 6 comments · Fixed by #638

Comments

SupaHam commented Feb 6, 2023

Summary

When span.Finish is called, there is a chance of a fatal concurrent map write, because the environment integration forcefully sets the device parameters arch and cpu.

Steps To Reproduce

  1. Set "device" context on scope concurrently using scope.SetContext
  2. In parallel, create and Finish a span as is.
  3. fatal error: concurrent map writes
scope.SetContext("device", map[string]interface{}{
  "name": model,
})

Expected Behavior

Don't crash the Go application just because integrations have no concept of concurrency-safe access to contexts.

Screenshots

Sentry.io Event

Environment

SDK

  • sentry-go version: v0.16.0
  • Go version: 1.18
  • Using Go Modules? yes

Sentry

  • Using hosted Sentry in sentry.io? yes
  • Using your own Sentry installation? Version:
  • Anything particular to your environment that could be related to this issue?

Additional context


goroutine 140293796 [running]:
runtime.throw({0x2935f76?, 0x24b8a60?})
        /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc01795ddc0 sp=0xc01795dd90 pc=0x43bcb1
runtime.mapassign_faststr(0x24b8640, 0xc0297caff0, {0x2909cb4, 0x4})
        /usr/local/go/src/runtime/map_faststr.go:295 +0x38b fp=0xc01795de28 sp=0xc01795ddc0 pc=0x416acb
github.com/getsentry/sentry-go.(*environmentIntegration).processor(0x0?, 0xc009852a00, 0xc01795dd38?)
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/integrations.go:89 +0x21a fp=0xc01795ded0 sp=0xc01795de28 pc=0x8d62ba
github.com/getsentry/sentry-go.(*environmentIntegration).processor-fm(0xc006d61560?, 0xc009852a00?)
        <autogenerated>:1 +0x2f fp=0xc01795def8 sp=0xc01795ded0 pc=0x8e5b6f
github.com/getsentry/sentry-go.(*Client).prepareEvent(0xc000d6e000, 0xc009852a00, 0x0?, {0x30983c0, 0xc006d61560})
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/client.go:642 +0x6a3 fp=0xc01795ecf0 sp=0xc01795def8 pc=0x8d2683
github.com/getsentry/sentry-go.(*Client).processEvent(0xc000d6e000, 0xc009852a00, 0x0, {0x30983c0, 0xc006d61560})
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/client.go:569 +0x185 fp=0xc01795eff8 sp=0xc01795ecf0 pc=0x8d1d85
github.com/getsentry/sentry-go.(*Client).CaptureEvent(...)
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/client.go:395
github.com/getsentry/sentry-go.(*Hub).CaptureEvent(0xc0255b39b0, 0xc009852a00)
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/hub.go:224 +0x66 fp=0xc01795f040 sp=0xc01795eff8 pc=0x8d5146
github.com/getsentry/sentry-go.(*Span).Finish(0xc000e4fb20)
        /build/.go/pkg/mod/github.com/getsentry/sentry-go@v0.16.0/tracing.go:185 +0x174 fp=0xc01795f0a0 sp=0xc01795f040 pc=0x8defb4
...

This issue has been around for at least a year. The environment is a game server that receives a lot of unique connections and has multiple goroutines running at connection initialisation that spawn spans while asynchronously manipulating the context as new data comes in.

tonyo (Member) commented Feb 6, 2023

Thanks for reporting, we'll look into this 👍

cleptric (Member) commented Feb 6, 2023

I think we need locks here https://github.com/getsentry/sentry-go/blob/master/integrations.go#L87 and in all other places.
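
For illustration, a minimal sketch of the kind of locking being suggested. This is a hypothetical wrapper, not the actual sentry-go code: every write to the shared contexts map goes through a mutex.

// lockedContexts is a hypothetical wrapper, not sentry-go code; it only shows
// the shape of the locking being suggested here.
type lockedContexts struct {
	mu       sync.Mutex
	contexts map[string]map[string]interface{}
}

// setDeviceKey writes a single key into the "device" context while holding the
// lock, so concurrent callers cannot trigger a concurrent map write.
func (l *lockedContexts) setDeviceKey(key string, value interface{}) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.contexts == nil {
		l.contexts = map[string]map[string]interface{}{}
	}
	device, ok := l.contexts["device"]
	if !ok {
		device = map[string]interface{}{}
		l.contexts["device"] = device
	}
	device[key] = value
}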

tonyo (Member) commented Feb 23, 2023

Spent some time on this.
At the moment I don't see a clear scenario of how this specific error could be triggered, given the info provided in this issue 🤔
Like, scope.SetContext and event.Contexts here are essentially operating on different maps, so I suspect there might be something else in play.

Tried to reproduce it with the following test (runs for 10 minutes), but it passes:

// Running like:
// go test -run ^TestSetContextAndFinishTransaction$ -count 1 -race
//
func TestSetContextAndFinishTransaction(t *testing.T) {
	ctx := NewTestContext(ClientOptions{
		EnableTracing:    true,
		TracesSampleRate: 1.0,
	})
	hub := GetHubFromContext(ctx)
	scope := hub.Scope()

	transaction := StartTransaction(ctx, "op")

	go func() {
		for {
			transaction.Finish()
		}
	}()

	go func() {
		for {
			scope.SetContext("device", map[string]interface{}{
				"name": "value",
			})
		}
	}()

	time.Sleep(600 * time.Second)

	fmt.Println("Done!")
}

For comparison, the following test with concurrent map writes fails within 1 second:

func TestConcurrentMapWrite(t *testing.T) {
	m := map[string]string{}

	go func() {
		for {
			m["key1"] = "value1"
		}
	}()

	go func() {
		for {
			m["key2"] = "value2"
		}
	}()

	time.Sleep(1 * time.Second)

	fmt.Println("Done!")
}

For now I can think of the following alternative scenarios that may cause this:

  1. span.Finish (for a transaction span) is called multiple times (for whatever reason), which leads to a data race here. This is something that we generally want to prevent/fix, opened a separate issue for it: span.Finish() should be a noop when span is already finished #587
  2. scope.SetContext(key, newContext) takes newContext that is also modified elsewhere in parallel (see the sketch below).
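
For the second scenario, a hypothetical sketch (not taken from the report) of how a context map passed to SetContext can keep being written to elsewhere:

// Hypothetical illustration of scenario 2: the map handed to SetContext keeps
// being mutated by another goroutine, so anything that later reads or writes
// that same map instance can race with this write.
func setDeviceContext(scope *sentry.Scope, model string) {
	deviceCtx := map[string]interface{}{"name": model}
	scope.SetContext("device", deviceCtx)

	go func() {
		// Still writing into the exact map instance the scope now references.
		deviceCtx["battery_level"] = 42
	}()
}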

@SupaHam if at some point you have any additional information (e.g. more stack trace examples, ideally where the traces of both goroutines are present), that would help us a lot 🙏

SupaHam (Author) commented Mar 2, 2023

Hi @tonyo, sorry for the wait.

Your test will not trigger the issue because SetContext is locked, so it will always be safe to run asynchronously. The reported issue, specifically step 2, is multiple spans (transactions) modifying the same scope asynchronously. Apologies if the issue wasn't clearly described; here is a test that should trigger it for you locally - increase runs if the test passes, to increase the chances of hitting the fatal panic.

func TestAB(t *testing.T) {
	ctx := NewTestContext(sentry.ClientOptions{
		EnableTracing:    true,
		TracesSampleRate: 1,
	})
	hub := sentry.GetHubFromContext(ctx)
	c := make(chan *sentry.Span)
	const runs = 100
	var producers sync.WaitGroup
	for i := 0; i < runs; i++ {
		producers.Add(1)
		go func() {
			defer producers.Done()
			span := sentry.StartSpan(ctx, "test", sentry.TransactionName("test"))
			c <- span
			go hub.Scope().SetContext("device", map[string]interface{}{"name": "value"})
		}()
	}
	// Close the channel once every span has been sent, so the consumers below
	// can leave their range loops and wg.Wait() can return.
	go func() {
		producers.Wait()
		close(c)
	}()
	var wg sync.WaitGroup
	defer wg.Wait()
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for span := range c {
				go span.Finish()
			}
		}()
	}
}

cleptric (Member) commented Mar 6, 2023

Thanks for the follow-up.
I do acknowledge that we need to fix the race condition, however, I wanted to follow up on your initial example.

We normally recommend that each go routine uses its own hub, as described here: https://docs.sentry.io/platforms/go/concurrency/
As you're using the global hub, there might be further issues like hub/scope bleed happening. For error reporting, I would highly recommend you use a local hub, in case of performance monitoring it depends on your use case. Do you need to mutate the global scope in your case?
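
A minimal sketch of the per-goroutine hub pattern from those docs (illustrative only; handleConnection and the context values are hypothetical):

// Each goroutine clones the current hub, so scope changes stay local to that
// goroutine instead of mutating shared state.
func handleConnection(model string) {
	go func() {
		localHub := sentry.CurrentHub().Clone()
		localHub.ConfigureScope(func(scope *sentry.Scope) {
			scope.SetContext("device", map[string]interface{}{"name": model})
		})
		localHub.CaptureMessage("connection initialised")
	}()
}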

SupaHam (Author) commented Mar 6, 2023

Hi @cleptric. Firstly, I'd like to apologise for not being able to expose our internal proprietary code; the test above was only meant to demonstrate how it fails. Just to clarify, there is one hub per player in our game, but the loading of player data is done asynchronously. Changing that to be synchronous would add serious wait times, and we cannot afford that solution. The problem is that there is still a sliver of a chance of a panic, and it disconnects players from their gaming experience, which is very abrupt and ugly for user retention.

I'm not quite sure I see an issue with the use of asynchronous loading; I think the locking can be justified.

EDIT: I would like to update you on this issue. We have prevented the panics by forcefully setting the properties when creating the context, so this specific faulty map write never happens again. But that doesn't seem user friendly, and I can imagine more users coming across this issue.
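
For reference, a sketch of that kind of workaround (hypothetical code; it assumes the environment integration only fills in device keys such as arch and num_cpu when they are missing, so pre-setting them means the racy write never happens):

// Hypothetical sketch of the workaround described above: pre-populate the
// device keys that the environment integration would otherwise fill in, so
// its "set if missing" writes never touch the shared map. Assumes the keys
// in question are "arch" and "num_cpu".
func setDeviceContextWorkaround(scope *sentry.Scope, model string) {
	scope.SetContext("device", map[string]interface{}{
		"name":    model,
		"arch":    runtime.GOARCH,
		"num_cpu": runtime.NumCPU(),
	})
}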
