
Bug(?) in prometheus.CounterVec #1429

Open · niciki opened this issue Jan 9, 2024 · 4 comments

@niciki commented Jan 9, 2024

Hello everyone! I have run into a problem with metrics apparently being registered twice. The most recent report of a similar problem I could find is #242.

Link to the repository with the sample code: https://github.com/niciki/prom_bug/tree/main

OS: macOS 14.1.1, M1 Pro
Go: 1.21
client_golang: 1.18.0

I am creating the following metric:

httpNbReq := promauto.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: ns,
		Subsystem: "http",
		Name:      "nnnnnn",
	}, []string{"ID"},
)

and I plan to count calls to the method, distinguished by a small number of different IDs.
The problem appears when the following requests are made at roughly the same time:

curl --location 'http://localhost:8080/v1/info/79'
curl --location 'http://localhost:8080/v1/info/711'
curl --location 'http://localhost:8080/v1/info/7'
curl --location 'http://localhost:8080/v1/info/171'

and after them I try to fetch the metrics:

curl --location 'http://localhost:9090/'

I get an error like the following:

 collected metric "ns_http_nnnnnn" { label:{name:"ID"  value:"171"}  counter:{value:1  created_timestamp:{seconds:1704806774  nanos:95948000}}} was collected before with the same name and label values

I register this metric only once, so this error caught me by surprise. I logged the id values being passed to sm.WithLabelValues(id).Add(1), thinking I had made a mistake somewhere, but everything was correct. Out of curiosity I dug into the prometheus source code and noticed the following.
I found that the internal structure that stores hashes and label values appears to be modified after the fact:

[screenshot: debugger view of the stored hash/label-value entries]

It seems that the label value stored under a given hash changes, but I could not work out the reason for this behavior:

[screenshot: debugger view showing the mutated label value]
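
For context, the increment side presumably looks roughly like the sketch below. This is a minimal reconstruction from the description above; the handler wiring and names are assumptions (the actual code is in the linked repo, which uses Fiber, as pointed out later in the thread):

package main

import (
	"github.com/gofiber/fiber/v2"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical reconstruction; names are illustrative, not copied from
// the linked repo. The metrics endpoint on :9090 is omitted for brevity.
var httpNbReq = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "ns",
		Subsystem: "http",
		Name:      "nnnnnn",
	}, []string{"ID"},
)

func main() {
	app := fiber.New() // note: Immutable defaults to false
	app.Get("/v1/info/:id", func(c *fiber.Ctx) error {
		id := c.Params("id") // without Immutable, only valid until the handler returns
		httpNbReq.WithLabelValues(id).Add(1)
		return c.SendString("ok")
	})
	_ = app.Listen(":8080")
}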

@theskch commented Jan 25, 2024

I ran into the same issue in our production code and traced it back to metricMap overwriting the label value while the hash remains the one computed for the old value. It baffles me how easy it is to reproduce the issue with the code @niciki shared, while I did something similar with no luck (no errors, everything worked as expected). The requests don't even have to be executed at the same time; I managed to get the error by making a curl call every 2 seconds or so, after the third request.
Worth noting that the same issue happens for GaugeVec and HistogramVec. It looks like metricMap gets updated before it enters getOrCreateMetricWithLabelValues, although I didn't manage to trace where the update happens. I would really appreciate some help here, as it's making our metrics unreliable: the service is unable to serve metrics at all because of these errors.
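
The symptom described here, a stored label value changing while its hash stays stale, matches a classic string-aliasing hazard in Go. The following is not client_golang's actual data structure, just a self-contained illustration of how a map entry can be corrupted when the bytes backing a key are mutated after insertion:

package main

import (
	"fmt"
	"unsafe"
)

// unsafeString reinterprets b as a string without copying, mimicking the
// zero-copy conversions fasthttp/Fiber use for request data (Go 1.20+).
func unsafeString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	buf := []byte("79") // pretend this is a reused request buffer
	m := map[string]int{}
	m[unsafeString(buf)]++ // key hashed as "79"; its bytes are shared with buf

	copy(buf, "17") // the "next request" overwrites the buffer

	fmt.Println(m)                // map[17:1] — the key's text has changed...
	fmt.Println(m["17"], m["79"]) // typically 0 0 — ...and lookups now miss
}

In client_golang the analogous structure is the metricVec's metricMap, which stores each child's label values under a uint64 hash precomputed at creation time. If those strings mutate afterwards, two children can end up with identical label values, which would explain Gather failing with "was collected before with the same name and label values".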

@niciki (Author) commented Jan 25, 2024

> I ran into the same issue in our production code and traced it back to metricMap overwriting the label value, while the hash remains from the old value. […]

Hello! Glad to meet someone with a similar problem. Try copying the value into a new variable before passing it to the Prometheus function. I can't explain why, but that seems to make it work. I still haven't been able to figure out why the original doesn't work. By the way, I couldn't find where in the source code this map could be changed.

@theskch commented Jan 26, 2024

Hey @niciki, unfortunately no luck with variable passing; still the same issue. I'll try to dig in more. It would be nice if we could get some clarity on why and where the previously added metric gets updated with a new label value.
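
One likely reason plain reassignment is unreliable here: in Go, assigning a string to a new variable copies only the string header (pointer and length), not the underlying bytes, so the new variable still aliases the reused buffer. Forcing a real copy takes something like the sketch below (variable names are illustrative; strings.Clone needs Go 1.18+):

package main

import (
	"fmt"
	"strings"
)

func main() {
	id := "79" // imagine this string aliases a reused request buffer

	idAlias := id                 // copies only the header; bytes still shared
	idCopy := strings.Clone(id)   // Go 1.18+: copies the bytes too
	idCopy2 := string([]byte(id)) // pre-1.18 equivalent; also a real copy

	fmt.Println(idAlias, idCopy, idCopy2)
}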

@mihail641

@niciki, you are using Fiber; in Fiber, HTTP parameters and the request body are unsafe and mutable (their underlying memory is reused between requests). Set Immutable: true in fiber.Config.
Or you can copy the unsafe string value into an immutable string with fmt.Sprintf("%s", myVal).
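
A minimal sketch of the two fixes suggested above (Immutable is a real field of Fiber v2's fiber.Config; the handler shape and names are illustrative):

// Option 1: make Fiber copy all context-derived strings and bytes.
app := fiber.New(fiber.Config{
	Immutable: true,
})

// Option 2: keep the default config and copy only the values you retain.
app.Get("/v1/info/:id", func(c *fiber.Ctx) error {
	id := fmt.Sprintf("%s", c.Params("id")) // forces a copy of the mutable value
	httpNbReq.WithLabelValues(id).Add(1)    // the stored label no longer aliases it
	return c.SendString("ok")
})

Immutable: true trades a bit of performance for safety in every handler, while the per-value copy keeps Fiber's zero-copy fast path everywhere else.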
