Use runtime/metrics package for metrics collection in Go 1.16+ #842

Closed
mknyszek opened this issue Mar 8, 2021 · 16 comments

Comments

@mknyszek
Contributor

mknyszek commented Mar 8, 2021

Following up from https://groups.google.com/g/prometheus-developers/c/FyRu5C4Yqzo.

I would like to modify the Go Prometheus client to use the new runtime/metrics package introduced in Go 1.16. The new package:

  • Supersedes existing APIs like runtime.ReadMemStats and runtime.NumGoroutine, merging them into one API.
  • Provides human-readable descriptions of each metric.
  • Namespaces each metric name and includes a unit.
  • Does not suffer from the same latency and whole-program performance issues that runtime.ReadMemStats does.

Specifically, I would like to provide an alternative implementation of the goCollector structure found in prometheus/collector.go that's used for Go 1.16+, chosen via build tag. This new structure would export all metrics it finds in runtime/metrics.All and transform the names it finds via:

// translateMetricName translates a metric name from one provided by the runtime/metrics
// package to one usable by Prometheus.
func translateMetricName(name string) string {
    // Replace / and : with _.
    return strings.ReplaceAll(strings.ReplaceAll(name, "/", "_"), ":", "_")
}

It will also re-export metrics that are currently provided by the existing goCollector under the same names, since they all appear in the runtime/metrics package.

The set of available metrics will grow over time, and more metrics will magically appear in everyone's dashboards. Automatic improvements in telemetry! One concern here is that reading more metrics increases the cost of each collection, but I think that's OK because:

  1. Today the sampling latency is O(microseconds), even in the 99th percentile. It's unlikely to increase by an order of magnitude overnight, bugs notwithstanding.
  2. They're going to be added relatively slowly, so if it becomes a problem, it wouldn't be hard to just sample some of them at a lower rate. Part of the point of the package is you can choose what you want to sample.
  3. Particularly performance sensitive metrics will be marked as such in their Descriptions, so identifying them at runtime will be straightforward.

I'm happy to do the implementation if someone is willing to review!

@beorn7
Member

beorn7 commented Mar 9, 2021

That's great.

However, we have to be careful with changing the existing metrics. (Adding metrics is mostly harmless, and changing the help string of existing metrics is probably also acceptable.)

Since this library is used so heavily, even minor changes usually come with a lot of fallout.

Let's see how many deviations there will be. From the above, I'd assume that essentially all the go_... metrics will change. To deal with that, we broadly have the following options:

  1. Just add the new metrics while keeping the old ones around.
  2. Like (1) but provide an opt-out for the old metrics.
  3. Only add the new metrics when explicitly opted in. (That would be the most conservative option. We would essentially provide a different goCollector that users of the library could register instead of or in addition to the old one.)

@mknyszek
Contributor Author

mknyszek commented Mar 9, 2021

In my original post, I proposed (1), and that's my preference. I think getting new metrics out to users with new Go releases is just a good thing in general; better telemetry, better understanding. I also understand not wanting to break anybody's dashboard that watches things like Go-reported memory use and goroutine count. To disambiguate, we could modify the old description to point to the new metric name. Also, I omitted it in my original post but adding a go_ prefix to the metric names is what I was thinking anyway; it should be clear that these are metrics exported by the Go runtime.

But, there is the other side of this that when the Go runtime implementation changes, some metrics might go away (the API requires users check first if they're only pulling a single specific metric). Generally speaking this will happen slowly over a couple Go releases, and will be announced beforehand. To be clear, this isn't ever going to be something like "heap memory use" or "goroutine count" but something implementation-specific like a hit count for an internal cache in the allocator or something. These are things that are useful for performance debugging, but not really all that useful to actively monitor.

Is that kind of metric removal acceptable for you?
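To make the "check first" point concrete: a minimal sketch of reading one specific metric while tolerating its absence. runtime/metrics reports an unknown name with kind KindBad rather than failing. The readUint64 helper and the second metric name are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64 (hypothetical helper) samples a single metric and reports
// whether the running Go version supports it as a uint64 value.
func readUint64(name string) (uint64, bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	// Unsupported or removed metrics come back as KindBad, so checking
	// the kind covers both "missing" and "wrong type".
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	if v, ok := readUint64("/memory/classes/heap/objects:bytes"); ok {
		fmt.Println("heap object bytes:", v)
	}
	// This name is invented for the example and should not exist.
	if _, ok := readUint64("/hypothetical/removed/metric:events"); !ok {
		fmt.Println("metric not supported by this Go version; skipping")
	}
}
```

A collector built this way would simply stop exporting a metric that a future runtime drops, instead of crashing or reporting a stale value.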

@beorn7
Member

beorn7 commented Mar 10, 2021

Sure, if a metric just doesn't make sense anymore because the thing that this metric is about has changed, then I think it's the best way to remove the metric, thereby making the change obvious.

My concern here is about simple renaming or a different representation of the same thing. That is particularly annoying if you have version drift across your fleet.

I think going with (1) is fine, but let's take a raincheck when we actually know the number of added metrics. Perhaps an additional function NewGoCollectorWithOpts(GoCollectorOpts) could be added where a user can pick the sets of metrics they desire. (We might need that anyway because OpenMetrics managed to make the current set of metrics impossible to expose as legal OpenMetrics… see #829 😢 )

@mknyszek
Contributor Author

but let's take a raincheck when we actually know the number of added metrics.

Can you clarify this? The number of added metrics would be exactly what you see at https://golang.org/pkg/runtime/metrics/ so I assume you mean something else. Or, if you mean just changing the underlying implementation but not the exposed metrics, I'm OK with that for now.

Here's an idea: what if we expose all the same values as today as they are, but let any new metrics that don't fit into that set get added via the automatic naming?

That is, /memory/classes/heap/objects:bytes will always get exposed as alloc_bytes instead of go_memory_classes_heap_objects_bytes, but any new metric will still just appear (there aren't any new ones today except for totals, which we should probably just give a different name to line up more closely with the existing ones).

That's unfortunate to hear about OpenMetrics. I would like to make new metrics opt-out instead of opt-in, ideally, but we can make the existing metrics changing name opt-in (or just not even make that an option). Luckily I think we sidestep the magic _total issue with OpenMetrics in the runtime/metrics package because the suffix of the metric name is always the unit.

@beorn7
Member

beorn7 commented Mar 11, 2021

The number of added metrics would be exactly what you see at https://golang.org/pkg/runtime/metrics/

Yeah, but you also said: “The set of available metrics will grow over time, and more metrics will magically appear in everyone's dashboards.” That was the point where I made a mental note like "before we merge this, let's reflect on the number of metrics that we can expect to be added in this way". I assume it won't be a massive amount, and we are probably fine. It's still something to keep in mind. (And it won't be about a "no, we won't do it at all" anyway, it would be more like "let's make it opt-in for the existing library as this might surprise the long-time users". But again, as said, most likely we are fine.)

Here's an idea: what if we expose all the same values as today as they are, but let any new metrics that don't fit into that set get added via the automatic naming?

Yes, that would be a way to save on the number of metrics, but I actually like the consistency of a direct mapping. So as long as the number of metrics stays at the quite low amount we have now, I'd just do the parallel exposition (old name for compatibility, new name for consistency) as discussed before.

Luckily I think we sidestep the magic _total issue with OpenMetrics in the runtime/metrics package because the suffix of the metric name is always the unit.

Even without OpenMetrics, we should still have a _total suffix on all counters. (OpenMetrics merely mandates what was a very strongly recommended practice before.) The problem in #829 is not directly related. It has more to do with the fact that OpenMetrics turned the _total into a “real” suffix so that the actual name does not contain the _total anymore, thereby creating a naming collision in those cases where a “counter” version and a “gauge” version of the same thing existed so far (with go_memstats_alloc_bytes being a perfect example). I had a number of ideas about how to deal with the problem differently in OpenMetrics, but they didn't make it into the spec.

@mknyszek
Contributor Author

That was the point where I made a mental note like "before we merge this, let's reflect on the number of metrics that we can expect to be added in this way". I assume it won't be a massive amount, and we are probably fine. It's still something to keep in mind. (And it won't be about a "no, we won't do it at all" anyway, it would be more like "let's make it opt-in for the existing library as this might surprise the long-time users". But again, as said, most likely we are fine.)

Ah hah, I understand.

Just to set expectations, this is my current best guess: I'd expect maybe 1 or 2 new metrics to be added in each Go release on average (so 1 or 2 every 6 months) with this eventually flattening out to zero, save for major changes to the runtime which are somewhat unpredictable. The runtime changes a lot over time, but usually not in a way that suggests a new metric. We'll probably be adding a few over the next few releases as we fill in the gaps in our telemetry (for instance, I'd like to add a scheduler latency distribution this next release). What do you think about that?

RE: the _total suffix, sounds good. The metric description already has an "is it a counter" boolean so we can do this programmatically.

Yes, that would be a way to save on the number of metrics, but I actually like the consistency of a direct mapping. So as long as the number of metrics stays at the quite low amount we have now, I'd just do the parallel exposition (old name for compatibility, new name for consistency) as discussed before.

Got it. :) Alrighty, I'll send a PR then, and we'll go from there?

@beorn7
Member

beorn7 commented Mar 11, 2021

What do you think about that?

That sounds all mostly harmless.

What do you think about that?

Yes, please.

@Pryz

Pryz commented Mar 23, 2021

While more work for the end user, have you thought about exposing an additional (and optional) collector in its own package that people could opt in to?

This way you could imagine that goCollector stays with runtime + runtime/debug while the new one (e.g. goMetricsCollector) could be based on runtime/metrics and auto-magically export new metrics as they come.

@mknyszek
Contributor Author

I considered it, but to put it bluntly I'm not sure it buys you that much. In terms of compatibility, the addition of new metrics I figure should be totally fine (as long as the old names are still there). In terms of maintainability, the project still needs to maintain two goCollector implementations. If the new one is the default, the old one can (maybe) eventually get dropped in a backwards-incompatible version bump (I don't know if Prometheus does those). Finally, everyone gets to stop thinking about the runtime.MemStats performance impact without doing anything.

I will admit that there is room for some degree of confusion with having two metrics that mean the same thing, but I believe that can be resolved with documentation. It wouldn't be hard to have the descriptions for the old and new metrics point to each other saying "X and Y are the same, but X still exists for compatibility." Assuming these descriptions are surfaced in any UX (IIRC Grafana does surface them in autocomplete, last I tried?) then I think that should minimize confusion. But I don't know. Maybe this is a bigger issue than I'm making it out to be. Or maybe I'm just missing something else. :)

(Also, apologies for the delay in sending the PR; I've been pretty busy these last few weeks but will likely have some time in early April to do this properly.)

@mknyszek
Contributor Author

I've finally got the bandwidth to make this happen, so I'm circling back to it, and I realized in the weeks leading up to the Go 1.17 release that a bunch of statistics were actually missing (oops), notably related to total allocation count. These are important, and will be available in Go 1.17. So, I think it makes sense to introduce different behavior with Go 1.17 instead of Go 1.16.

I also went back and walked over the list of Prometheus Go metrics, and there are still 2 metrics generated by runtime.MemStats that are not represented in the runtime/metrics package: last_gc_time_seconds, and gc_cpu_fraction.

The former doesn't really have all that much practical use to most people. However, it is actually possible to pull out last_gc_time_seconds from runtime/debug.ReadGCStats, which notably does not stop the world, so that is an option for retaining it.

The latter is not a very good metric because it doesn't actually give you a good sense of how much of the application's CPU time the GC is eating up. It's an average over all time, so if, for example, your application is idle for 2 hours and then has a spike of activity, it will likely just sit at 0 and you'll never even notice the activity. I plan to replace that with a better, per-GC measure (that's pushed out after each GC, so one can actually build reliable time series out of it and not miss anything) in the future. last_gc_time_seconds would also work better in this format.
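A quick worked example of why an all-time average hides spikes. The numbers are invented for illustration (a single core, 2 hours idle, then a 60-second burst in which the GC uses half of the CPU), and allTimeFraction is a hypothetical helper, not how the runtime computes the value.

```go
package main

import "fmt"

// allTimeFraction (hypothetical helper) divides the CPU seconds spent in
// the GC by the CPU seconds available over the same window.
func allTimeFraction(gcCPUSeconds, elapsedSeconds float64) float64 {
	return gcCPUSeconds / elapsedSeconds
}

func main() {
	idle := 2 * 3600.0           // seconds spent idle, no GC work
	spike := 60.0                // seconds of heavy activity
	gcDuringSpike := 0.5 * spike // GC uses half the CPU during the spike

	fmt.Printf("since program start: %.4f\n", allTimeFraction(gcDuringSpike, idle+spike))
	fmt.Printf("during the spike:    %.4f\n", allTimeFraction(gcDuringSpike, spike))
}
```

Under these assumptions the all-time value stays well below 1% even though the GC consumed 50% of the CPU during the spike, which is why a per-GC measure would be more informative.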

Overall, I'd like to move forward without them (the former because it's not actually that useful, the latter because it's actively misleading), or at least stub them out and have them always say "0" or some other reasonable value.

Is that possible and/or reasonable for Prometheus? I'm absolutely willing to work within the bounds of your backwards-compatibility constraints, so if either of those options is unacceptable, I can try to come up with something else.

@beorn7
Member

beorn7 commented Jun 21, 2021

First of all: New maintainers of this repo are @kakkoyun and @bwplotka . So they are the new "default deciders". (o:

What you said above all sounds very reasonable to me. If a metric is positively useless, I guess we can dare to set it to "0" and perhaps update the HELP accordingly or remove it outright. A similar thing would happen anyway if a future Go version worked so differently that certain go metrics wouldn't make sense anymore. That would still not trigger a major release of prometheus/client_golang, IMHO.

I'm a bit torn if it is better to remove the metric or export it with "0". Currently, I'm leaning to the former (if we potentially break use cases, we better break them noticeably rather than subtly).

In the spirit of the above, we can also be conservative and make it all opt-in (switching for good in the next major release). But as said, it doesn't sound like it's required in this case.

@helios741

So when will collection using runtime/metrics be done, and is there a schedule? @mknyszek @beorn7

I think the two new metrics exposed in runtime/metrics in 1.17 (/sched/goroutines:goroutines and /sched/latencies:seconds) are very good. We have already exposed them by our own means, and I hope the official client will follow.

@mknyszek
Contributor Author

@helios741 I keep getting side-tracked, but I am working on a PR. This has bubbled up on my priorities recently so I'll send something soon.

mknyszek added a commit to mknyszek/client_golang that referenced this issue Dec 30, 2021
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 6, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 11, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
* This change adds a `go generate` script to generate a list of expected
  runtime/metrics names for a given Go version for auditing. Users of
  new versions of Go will transparently be allowed to use new metrics,
  however.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 11, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
* This change adds a `go generate` script to generate a list of expected
  runtime/metrics names for a given Go version for auditing. Users of
  new versions of Go will transparently be allowed to use new metrics,
  however.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 11, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
* This change adds a `go generate` script to generate a list of expected
  runtime/metrics names for a given Go version for auditing. Users of
  new versions of Go will transparently be allowed to use new metrics,
  however.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 11, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
* This change adds a `go generate` script to generate a list of expected
  runtime/metrics names for a given Go version for auditing. Users of
  new versions of Go will transparently be allowed to use new metrics,
  however.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
mknyszek added a commit to mknyszek/client_golang that referenced this issue Jan 11, 2022
This change introduces use of the runtime/metrics package in place of
runtime.MemStats for Go 1.17 or later. The runtime/metrics package was
introduced in Go 1.16, but not all the old metrics were accounted for
until 1.17.

The runtime/metrics package offers several advantages over using
runtime.MemStats:
* The list of metrics and their descriptions are machine-readable,
  allowing new metrics to get added without any additional work.
* Detailed histogram-based metrics are now available, offering much
  deeper insights into the Go runtime.
* The runtime/metrics API is significantly more efficient than
  runtime.MemStats, even with the additional metrics added, because
  it does not require any stop-the-world events.

That being said, integrating the package comes with some caveats, some
of which were discussed in prometheus#842. Namely:
* The old MemStats-based metrics need to continue working, so they're
  exported under their old names backed by equivalent runtime/metrics
  metrics.
* Earlier versions of Go need to continue working, so the old code
  remains, but behind a build tag.

Finally, a few notes about the implementation:
* This change includes a whole bunch of refactoring to avoid significant
  code duplication.
* This change adds a new histogram metric type specifically optimized
  for runtime/metrics histograms. This type's methods also include
  additional logic to deal with differences in bounds conventions.
* This change makes a whole bunch of decisions about how runtime/metrics
  names are translated.
* This change adds a `go generate` script to generate a list of expected
  runtime/metrics names for a given Go version for auditing. Users of
  new versions of Go will transparently be allowed to use new metrics,
  however.

Signed-off-by: Michael Anthony Knyszek <mknyszek@google.com>
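
The note about bounds conventions may be easier to follow with an example. A `metrics.Float64Histogram` stores one count per bucket plus `len(Counts)+1` boundaries, and its first and last boundaries may be infinite; Prometheus histograms instead expose cumulative counts keyed by upper bound. The sketch below shows only that reshaping. The function name `toCumulative` is hypothetical, and the sketch deliberately ignores the inclusive versus exclusive upper-bound difference, so it should not be read as the logic used in this change.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// toCumulative turns per-bucket counts into cumulative counts keyed by each
// bucket's upper bound. bounds must have exactly one more element than counts.
// A +Inf upper bound is dropped because the total count already covers it.
// (Hypothetical helper; inclusive vs. exclusive bounds are not handled.)
func toCumulative(bounds []float64, counts []uint64) (upper []float64, cumulative []uint64, total uint64) {
	if len(bounds) != len(counts)+1 {
		return nil, nil, 0
	}
	for i, c := range counts {
		total += c
		ub := bounds[i+1]
		if math.IsInf(ub, 1) {
			continue
		}
		upper = append(upper, ub)
		cumulative = append(cumulative, total)
	}
	return upper, cumulative, total
}

func main() {
	s := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("histogram not available in this Go version")
		return
	}
	h := s[0].Value.Float64Histogram()
	upper, cumulative, total := toCumulative(h.Buckets, h.Counts)
	fmt.Printf("%d finite upper bounds, %d cumulative counts, %d observations\n",
		len(upper), len(cumulative), total)
}
```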
bwplotka pushed a commit that referenced this issue Jan 16, 2022
@kakkoyun
Member

Fixed by #955

@bboreham
Member

> I plan to replace [gc_cpu_fraction] with a better, per-GC measure (that's pushed out after each GC

Did this happen? What was it replaced by?

BTW I would expect to see the removal of a metric called out in CHANGELOG.

@mknyszek
Contributor Author

On the Go side, a replacement is currently in the Go tree for 1.20: direct CPU time metrics. Apologies for the delay.
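
As a rough illustration of how the CPU time metrics could stand in for the old lifetime average, the sketch below reads two cumulative counters and computes the GC share of CPU time over a window chosen by the caller. It assumes the runtime/metrics names `/cpu/classes/gc/total:cpu-seconds` and `/cpu/classes/total:cpu-seconds` from Go 1.20; the helpers `readCPU` and `fraction` are hypothetical, and this windowed calculation is one possible use, not something prescribed in this thread.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

const (
	gcCPUName    = "/cpu/classes/gc/total:cpu-seconds"
	totalCPUName = "/cpu/classes/total:cpu-seconds"
)

// readCPU returns the cumulative GC and total CPU seconds. ok is false when
// the running Go version does not provide these metrics.
func readCPU() (gc, total float64, ok bool) {
	s := []metrics.Sample{{Name: gcCPUName}, {Name: totalCPUName}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64 || s[1].Value.Kind() != metrics.KindFloat64 {
		return 0, 0, false
	}
	return s[0].Value.Float64(), s[1].Value.Float64(), true
}

// fraction returns the share of CPU time spent in GC between two readings.
// It returns 0 when no CPU time elapsed between them.
func fraction(gc0, total0, gc1, total1 float64) float64 {
	elapsed := total1 - total0
	if elapsed <= 0 {
		return 0
	}
	return (gc1 - gc0) / elapsed
}

func main() {
	gc0, total0, ok := readCPU()
	if !ok {
		fmt.Println("CPU class metrics are not available in this Go version")
		return
	}
	time.Sleep(100 * time.Millisecond)
	gc1, total1, _ := readCPU()
	fmt.Printf("GC CPU fraction over the window: %.4f\n", fraction(gc0, total0, gc1, total1))
}
```

In a Prometheus setup the same windowing would normally be done at query time over the exported counters rather than in the process itself.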

mangalaman93 pushed a commit to dgraph-io/dgraph that referenced this issue Feb 9, 2023
This removes the metric `go_memstats_gc_cpu_fraction`.
This metric is not useful and often misleading due to the
fact that it's an average over the lifetime of the process.
See prometheus/client_golang#842 (comment)

These are the new metrics that are added:

go_cgo_go_to_c_calls_calls_total
go_cpu_classes_gc_mark_assist_cpu_seconds_total
go_cpu_classes_gc_mark_dedicated_cpu_seconds_total
go_cpu_classes_gc_mark_idle_cpu_seconds_total
go_cpu_classes_gc_pause_cpu_seconds_total
go_cpu_classes_gc_total_cpu_seconds_total
go_cpu_classes_idle_cpu_seconds_total
go_cpu_classes_scavenge_assist_cpu_seconds_total
go_cpu_classes_scavenge_background_cpu_seconds_total
go_cpu_classes_scavenge_total_cpu_seconds_total
go_cpu_classes_total_cpu_seconds_total
go_cpu_classes_user_cpu_seconds_total
go_gc_cycles_automatic_gc_cycles_total
go_gc_cycles_forced_gc_cycles_total
go_gc_cycles_total_gc_cycles_total

go_gc_heap_allocs_by_size_bytes_bucket
go_gc_heap_allocs_by_size_bytes_count
go_gc_heap_allocs_by_size_bytes_sum
go_gc_heap_allocs_bytes_total
go_gc_heap_allocs_objects_total
go_gc_heap_frees_by_size_bytes_bucket
go_gc_heap_frees_by_size_bytes_count
go_gc_heap_frees_by_size_bytes_sum
go_gc_heap_frees_bytes_total
go_gc_heap_frees_objects_total
go_gc_heap_goal_bytes
go_gc_heap_objects_objects
go_gc_heap_tiny_allocs_objects_total
go_gc_limiter_last_enabled_gc_cycle
go_gc_pauses_seconds_bucket
go_gc_pauses_seconds_count
go_gc_pauses_seconds_sum
go_gc_stack_starting_size_bytes

go_memory_classes_heap_free_bytes
go_memory_classes_heap_objects_bytes
go_memory_classes_heap_released_bytes
go_memory_classes_heap_stacks_bytes
go_memory_classes_heap_unused_bytes
go_memory_classes_metadata_mcache_free_bytes
go_memory_classes_metadata_mcache_inuse_bytes
go_memory_classes_metadata_mspan_free_bytes
go_memory_classes_metadata_mspan_inuse_bytes
go_memory_classes_metadata_other_bytes
go_memory_classes_os_stacks_bytes
go_memory_classes_other_bytes
go_memory_classes_profiling_buckets_bytes
go_memory_classes_total_bytes

go_sched_gomaxprocs_threads
go_sched_goroutines_goroutines
go_sched_latencies_seconds_bucket
go_sched_latencies_seconds_count
go_sched_latencies_seconds_sum
go_sync_mutex_wait_total_seconds_total
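
The names above appear to follow a simple pattern relative to their runtime/metrics originals: a `go` prefix, with `/`, `:` and `-` replaced by `_`, and a `_total` suffix on cumulative counters. The sketch below reproduces that pattern; it is inferred from the list and is not the client_golang implementation. The function name `promName` and the `cumulative` flag are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// promName maps a runtime/metrics name to the Prometheus-style name seen in
// the list above. cumulative marks counters, which get a "_total" suffix.
// (Hypothetical helper, inferred from the listed names.)
func promName(name string, cumulative bool) string {
	r := strings.NewReplacer("/", "_", ":", "_", "-", "_")
	out := "go" + r.Replace(name)
	if cumulative {
		out += "_total"
	}
	return out
}

func main() {
	fmt.Println(promName("/cgo/go-to-c-calls:calls", true))
	fmt.Println(promName("/sched/gomaxprocs:threads", false))
	fmt.Println(promName("/gc/heap/allocs-by-size:bytes", false))
}
```

For histograms such as `/gc/heap/allocs-by-size:bytes`, the `_bucket`, `_count` and `_sum` suffixes in the list come from the Prometheus exposition format rather than from the name translation.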