Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in v1.12.2 #1058

Closed
nicolaasuni opened this issue May 25, 2022 · 6 comments
Closed

Race condition in v1.12.2 #1058

nicolaasuni opened this issue May 25, 2022 · 6 comments

Comments

@nicolaasuni
Copy link

After upgrading to github.com/prometheus/client_golang v1.12.2 I can consistently observe a race condition in our tests.
This issue does not happen with the previous versions.

I am using:

  • go version go1.18.2 linux/amd64
  • Linux **** 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

The errors can be reproduced by using the open source library:
https://github.com/nexmoinc/gosrvlib

And running the metrics tests after upgrading to prometheus v1.12.2:
gotest -race -failfast -v github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus

The results are:

$ gotest -race -failfast -v github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus
=== RUN   TestNew
=== PAUSE TestNew
=== RUN   TestInstrumentHandler
=== PAUSE TestInstrumentHandler
=== RUN   TestInstrumentRoundTripper
=== PAUSE TestInstrumentRoundTripper
=== RUN   TestIncLogLevelCounter
=== PAUSE TestIncLogLevelCounter
=== RUN   TestIncErrorCounter
=== PAUSE TestIncErrorCounter
=== RUN   TestClose
=== PAUSE TestClose
=== RUN   TestInstrumentDB
=== PAUSE TestInstrumentDB
=== RUN   TestWithHandlerOpts
=== PAUSE TestWithHandlerOpts
=== RUN   TestWithCollector
=== PAUSE TestWithCollector
=== RUN   TestWithInboundRequestSizeBuckets
=== PAUSE TestWithInboundRequestSizeBuckets
=== RUN   TestWithInboundResponseSizeBuckets
=== PAUSE TestWithInboundResponseSizeBuckets
=== RUN   TestWithInboundRequestDurationBuckets
=== PAUSE TestWithInboundRequestDurationBuckets
=== RUN   TestWithOutboundRequestDurationBuckets
=== PAUSE TestWithOutboundRequestDurationBuckets
=== CONT  TestNew
=== CONT  TestWithCollector
=== RUN   TestNew/succeeds_with_empty_options
=== CONT  TestIncLogLevelCounter
=== PAUSE TestNew/succeeds_with_empty_options
=== RUN   TestNew/succeeds_with_custom_options
=== CONT  TestWithInboundRequestSizeBuckets
=== CONT  TestWithInboundResponseSizeBuckets
=== PAUSE TestNew/succeeds_with_custom_options
=== CONT  TestInstrumentRoundTripper
=== CONT  TestWithOutboundRequestDurationBuckets
=== RUN   TestNew/fails_with_invalid_option
=== CONT  TestClose
=== CONT  TestIncErrorCounter
=== CONT  TestWithHandlerOpts
=== CONT  TestInstrumentHandler
=== CONT  TestWithInboundRequestDurationBuckets
--- PASS: TestWithInboundRequestSizeBuckets (0.00s)
=== PAUSE TestNew/fails_with_invalid_option
=== RUN   TestNew/fails_with_duplicate_collector
--- PASS: TestWithCollector (0.00s)
--- PASS: TestWithInboundResponseSizeBuckets (0.00s)
=== CONT  TestInstrumentDB
=== PAUSE TestNew/fails_with_duplicate_collector
=== CONT  TestNew/succeeds_with_empty_options
==================
WARNING: DATA RACE
Read at 0x00c000426120 by goroutine 79:
  runtime.mapaccess2_faststr()
      /usr/local/go/src/runtime/map_faststr.go:108 +0x0
  runtime/metrics.runtime_readMetrics()
      /usr/local/go/src/runtime/metrics.go:563 +0xb9
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:446 +0x131

Previous write at 0x00c000426120 by goroutine 70:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:203 +0x0
  runtime.initMetrics()
      /usr/local/go/src/runtime/metrics.go:67 +0x1fc
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:446 +0x131

Goroutine 79 (running) created at:
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:457 +0x764
  github.com/prometheus/client_golang/prometheus/testutil.GatherAndCount()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/testutil/testutil.go:139 +0x68
  github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus.TestIncErrorCounter()
      /home/user/go/src/github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus/client_test.go:160 +0x16c
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1486 +0x47

Goroutine 70 (running) created at:
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/registry.go:457 +0x764
  github.com/prometheus/client_golang/prometheus/testutil.GatherAndCount()
      /home/user/go/pkg/mod/github.com/prometheus/client_golang@v1.12.2/prometheus/testutil/testutil.go:139 +0x68
  github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus.TestIncLogLevelCounter()
      /home/user/go/src/github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus/client_test.go:144 +0x151
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1439 +0x213
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1486 +0x47
==================
=== CONT  TestNew/fails_with_duplicate_collector
=== CONT  TestNew/succeeds_with_custom_options
--- PASS: TestWithOutboundRequestDurationBuckets (0.00s)
--- PASS: TestWithHandlerOpts (0.00s)
--- PASS: TestClose (0.00s)
--- PASS: TestWithInboundRequestDurationBuckets (0.00s)
=== CONT  TestNew/succeeds_with_empty_options
    testing.go:1312: race detected during execution of test
=== CONT  TestNew/fails_with_invalid_option
=== CONT  TestInstrumentDB
    testing.go:1312: race detected during execution of test
--- FAIL: TestInstrumentDB (0.00s)
--- FAIL: TestNew (0.00s)
    --- FAIL: TestNew/succeeds_with_empty_options (0.00s)
    --- PASS: TestNew/fails_with_invalid_option (0.00s)
    --- PASS: TestNew/fails_with_duplicate_collector (0.00s)
    --- PASS: TestNew/succeeds_with_custom_options (0.00s)
=== CONT  TestIncLogLevelCounter
    testing.go:1312: race detected during execution of test
--- FAIL: TestIncLogLevelCounter (0.01s)
=== CONT  TestIncErrorCounter
    testing.go:1312: race detected during execution of test
--- FAIL: TestIncErrorCounter (0.01s)
=== CONT  TestInstrumentRoundTripper
    testing.go:1312: race detected during execution of test
--- FAIL: TestInstrumentRoundTripper (0.01s)
=== CONT  TestInstrumentHandler
    testing.go:1312: race detected during execution of test
--- FAIL: TestInstrumentHandler (0.01s)
FAIL
FAIL	github.com/nexmoinc/gosrvlib/pkg/metrics/prometheus	0.029s
FAIL
@kakkoyun
Copy link
Member

Thanks for reporting. We will have look at it soon.

@kakkoyun kakkoyun added this to the v1.13.0 milestone Jun 17, 2022
@dnwe
Copy link

dnwe commented Jun 22, 2022

@kakkoyun FYI, with a quick bisect on the release-1.12 branch since v1.11.0 (which was fine), this race seems to have been originally introduced by 22da949 (#955) and was then fixed by various race condition improvements that were included in v1.12.1 (which was then passing). Up until d498b3c the branch was still passing without issue, but surprisingly the race condition was occurring again from 7eb9d11 onward and was hence included in v1.12.2 again

The same issue is still present on main (currently 4ad265f) and was introduced by 11ee9ad (the same commit as 7eb9d11, just on main rather than release-1.12 branch)

@kakkoyun
Copy link
Member

@dnwe Thanks for the detailed investigation. I haven't had much time to focus on this. I'll do my best.

However, help is always welcome. If anyone has a clear fix, please go for it.

@bboreham
Copy link
Member

The read is of a package-local variable metrics at https://github.com/golang/go/blob/go1.18.2/src/runtime/metrics.go#L563

The previous write is at https://github.com/golang/go/blob/go1.18.2/src/runtime/metrics.go#L67
which is called from https://github.com/golang/go/blob/go1.18.2/src/runtime/metrics.go#L555.

As far as I can see, a semaphore prevents both of these running concurrently, so I wonder if the Go race-detection library is throwing a false positive?

@cristiangreco
Copy link

@bboreham is right that this is a false positive, reproduced and reported upstream here.

@kakkoyun
Copy link
Member

Thanks, @bboreham and @cristiangreco verifying this. Closing it now.

@kakkoyun kakkoyun closed this as not planned Won't fix, can't repro, duplicate, stale Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants