Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix metric sort error #1443

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

fix metric sort error #1443

wants to merge 1 commit into from

Conversation

linuxgcc
Copy link

@linuxgcc linuxgcc commented Feb 2, 2024

@ArthurSens
Copy link
Member

ArthurSens commented Feb 2, 2024

Although I agree that sometimes humans debug exposed metrics by manually looking the /metrics endpoint, I believe this endpoint it focused on metrics being scraped by Prometheus. Prometheus doesn't care about the number ordering 🤷

If we introduce a regex sort, I wonder how much are we sacrificing in performance for this change. Do you mind doing some benchmarking and posting the results here?

@linuxgcc linuxgcc force-pushed the main branch 2 times, most recently from b1bb9b5 to 1d8e814 Compare March 5, 2024 06:55
@linuxgcc
Copy link
Author

linuxgcc commented Mar 5, 2024

@ArthurSens
Sorry, I don't have time recently, benchmark has been completed.

1 . touch such file prometheus/internal/metric_sort_test.go
2. with out reg sort benchmark

[root@trace client_golang]# go test -bench="BenchmarkSort" -benchtime=100x -benchmem ./prometheus/internal
goos: linux
goarch: amd64
pkg: github.com/prometheus/client_golang/prometheus/internal
cpu: Intel Xeon Processor (Cascadelake)
BenchmarkSort-16 100 2020 ns/op 265 B/op 8 allocs/op
PASS
ok github.com/prometheus/client_golang/prometheus/internal 0.012s

  1. with reg sort benchmark

[root@trace client_golang]# go test -bench="BenchmarkSort" -benchtime=100x -benchmem ./prometheus/internal
goos: linux
goarch: amd64
pkg: github.com/prometheus/client_golang/prometheus/internal
cpu: Intel Xeon Processor (Cascadelake)
BenchmarkSort-16 100 88625 ns/op 52093 B/op 626 allocs/op
PASS
ok github.com/prometheus/client_golang/prometheus/internal 0.022s

[root@trace client_golang]#

It can be seen from the above comparison that the comparison of 16-core CPU data before and after using regular expressions takes no more than 0.1ms

@linuxgcc linuxgcc changed the title fix metric sort error fix metric sort error Mar 8, 2024
@linuxgcc linuxgcc closed this Mar 8, 2024
@linuxgcc linuxgcc reopened this Mar 8, 2024
@ArthurSens
Copy link
Member

ArthurSens commented Mar 15, 2024

We can also see a huge increase in memory allocation though 😱 , 8 allocations per operation up to 626 allocations. The allocated memory also increased a lot

Signed-off-by: heyitao <heyitao@uniontech.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for this!

I added more points to the discussion #1442 (comment)

@ArthurSens is right, as you said it's "only" 0.1ms but if ops is one comparison.. we do this potentially 10 000 times per scrape making it a second. Number of allocs is one thing (a lot more) but the bump from 200 B to 50Kb is not minor (per comparison!). Maybe I am misinterpreting what "ops" is in your benchmark, maybe it's per full sorting (but for how many series?) In future please paste the benchmark code too, so we can verify.

But performance is one thing, the most important part is:

  • if readability hit of NOT having this is big enough to make this change
  • if numerical sorting would NOT actually surprise (thus impact readability) of ppl who got used to it

@@ -46,6 +48,21 @@ func (s MetricSorter) Swap(i, j int) {
s[i], s[j] = s[j], s[i]
}

func (s MetricSorter) compare(si, sj string) bool {
// When the number of devices exceeds two digits, a sorting error occurs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very specific to node exporter, not generic client_golang use, right? client_golang is used in more places than node_exporter, so we have to care about more use cases.

func (s MetricSorter) compare(si, sj string) bool {
// When the number of devices exceeds two digits, a sorting error occurs.
// dealing with sorting problems, especially CPU sorting errors
re := regexp.MustCompile(`\d+$`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I think we can't use regex in such a hot path, we might need to be more smarter here if we ever do this ( let's talk on the issue) - or do it optionally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants