Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/metrics: add simple metrics #4527

Closed

Conversation

lukedirtwalker
Copy link
Collaborator

Introduce simple version of the pkg/metrics objects. The simpler versions don't have a With method. The With version can have quite some overhead if used in a hot path. For now the simple counters are used in snet.

Introduce simple version of the pkg/metrics objects. The simpler
versions don't have a With method. The With version can have quite some
overhead if used in a hot path. For now the simple counters are used in
snet.
@lukedirtwalker lukedirtwalker requested a review from a team as a code owner May 16, 2024 08:54
@matzf
Copy link
Member

matzf commented May 16, 2024

This change is Reviewable

Copy link
Member

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructing the SimpleCounter (same for other SimpleX, I'll just stick to Counter as an example) types with NewCounter seems very restricted now, it can only be used when all the label values have been set via the prometheus library. Like this, I don't see much of a point in using this wrapper library, we might as well just use prometheus.Counter instead of metrics.SimpleCounter.
Instead of this special case, we could add an additional method to "bake" a Counter into a SimpleCounter once all label values have been "With-ed".

type Counter interface {
   SimpleCounter                  // compatibility / shortcut for To().Add()
   With(labelValues ...string) Counter
   To() SimpleCounter          // "bake" 
}

type SimpleCounter interface {
   Add(delta float64)
}


// .... in prometheus.go:

func (c *counter) To() SimpleCounter {
  return &simpleCounter {
    c.cv.With(makeLabels(c.lvs...))
  }
}

func (c *counter) Add(delta float64) {
  c.To().Add(delta)
}

With this, we don't need the new NewCounter constructor.

I wonder if With could directly return SimpleCounter, or if we have many use cases where we incrementally apply the label values.
Ultimately, I think I'd remove the SimpleCounter interface from Counter, requiring that the caller makes this "baking" explicit. But this would be rather lot of type changes, so maybe could make sense to do this later.

"Simple" might not be an ideal name under this proposal. I find prometheus terminology quite sensible, CounterVec corresponding to our Counter, and Counter corresponding to SimpleCounter.

Reviewable status: 0 of 9 files reviewed, all discussions resolved

Copy link
Contributor

@oncilla oncilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the Counter interface has the wrong interface from the get go.

Almost no metrics library I have seen, the client can choose the set of label keys after the counter has been initialized.
Thus, it does not make sense to push the responsibility of choosing the key names to the client of the interface.
Doing so, tightly and implicitly couples the initialization and consumer code.

If I could start out again, I would only have the SimpleCounter interface. Having a With is ill-advised IMO.

Like this, I don't see much of a point in using this wrapper library, we might as well just use prometheus.Counter instead of metrics.SimpleCounter.

I disagree, the point is that you can stick any counter implementation. Could be prometheus, opentelemetry, opencensus.

Reviewable status: 0 of 9 files reviewed, all discussions resolved

Copy link
Member

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, and I agree that the API is not ideal. To me, one of the biggest weaknesses of the current API is that there is no way to tell from the type if the counter has been "fully" initialized with label values.
But I don't think that just adding a completely separate way to do metrics improves the situation much. The newest approach to do metrics is then to handle "provisioning" the counters with all label values somewhere in the main application modules and pass only these to the libraries. This seems a good approach to me, but re-organizing the existing libraries to follow this exact approach will be a very large amount of effort (and thus will virtually never be completed). My suggestion thus is to improve the situation for both new code and the old libraries. I thought it was too daring to suggest before, but I actually thought that restricting the Counter interface even more, and make it act as only a "builder" would be a good goal:

type Counter interface {           // rename -> CounterVec / CounterBuilder / CounterWither / CounterLabeler? 
   With(labelValues ...string) SimpleCounter
}
type SimpleCounter interface {   // rename -> Counter?
     Add(delta float64)
}

I tried this as an experiment and it was surprisingly straight forward. There are some cases in the gateway that do require some simple reorganization because the same struct is (ab)-used for both the "builder" and the "built" counters, but otherwise it was a rather easy pass, just changing the types.

Reviewable status: 0 of 9 files reviewed, all discussions resolved

Copy link
Member

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @lukedirtwalker)


pkg/metrics/prometheus.go line 115 at r1 (raw file):

}

func (g *simpleGauge) Set(value float64) {

Do we even need these wrapper structs? It seems that the prometheus interfaces already implement the SimpleX interfaces.
If we keep the structs, I think I'd make them value types, the additional pointer indirection seems unnecessary.

Copy link
Contributor

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this API was every designed with the intent that the client constructs the labeled counter reference on-the-fly, although it's certainly a prototyping simplification. However, the ability to build the label sets incrementally is convenient and I am no too sure what we gain by trying to remove it. For example in the router we have things like:

metrics.DroppedPacketsTotal.MustCurryWith(ifLabels).MustCurryWith(scLabels).With(reasonMap)

Sure, one can always build a single label dic from all the input dics, but that's just extra boiler-plate code for the same result. Also, the fact that the router only uses the incremental building in fluent-style one-liners is a side effect of refactoring done for other reasons. Earlier there was code that built the label dics incrementally by passing curried vectors to other functions. Not vital, but why to we want to remove that ability? I still don't get what the cost of it is (unless used clearly improperly).

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @lukedirtwalker and @matzf)

Copy link
Member

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only tried to focus my suggestion snippet as much as possible. IMO, the important part is to

  • make it possible to create the "SimpleCounter" from the "Counter" to allow transitioning our existing uses of the library to a cleaner and more efficient pattern, and
  • if feasible, make it mandatory to explicitly create the "SimpleCounter" from the "Counter" before using the increment/set functions, in order to untangle the creation from the use of the metrics.

Regarding applying partial label sets incrementally, I wouldn't oppose in keeping this if it seems useful. My impression from an experimental pass over the code was that the few cases where this feature is used are rather hard to follow because the incremental construction is not at all apparent from the type signatures. For example, look at the gateway_frames_discarded_total metric, which has label values specified in three separate places.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @lukedirtwalker)

Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I added a separate set of types was to prevent breaking changes. If we are fine with breaking changes, I would just do aways with the whole With method and let that be handled outside.

With is bad because it causes a lot of allocations and is quite CPU intensive. A counter increment on a counter that uses "With" label values causes 2 allocations. Furthermore the lookup of the correct counter in prometheus happens all on every increment.

For high traffic operations, .e.g. path probes in the gateway or potentially packet counters in the router, metrics which allocate per packet are absolutely not an option.

If you are fine with a breaking change I can also make a proposal on an API that does away with the whole With clauses.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @matzf)

Copy link
Contributor

@jiceatscion jiceatscion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not realized that using with() caused counters to become expensive during use. Or are you referring to cases of using with() every time the counter is changed? I guess what you really want is to make that faux-pas impossible to commit. Right?

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @lukedirtwalker and @matzf)

Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No even counters initialized with "With" once and re-used will cause allocations:
So for example the counter Add implementation:

func (c *counter) Add(delta float64) {
c.cv.With(makeLabels(c.lvs...)).Add(delta)
}

makeLabels creates a dictionary which is then passed to prometheus.
Also in prometheus the label values have to be hashed we call the With function (via Add).

Use the following to compare the 2:

func BenchmarkMetrics(b *testing.B) {
	auto := promauto.With(prometheus.NewRegistry())
	probesSent := auto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "gateway_path_probes_sent_" + b.Name(),
			Help: "Number of path probes being sent.",
		},
		[]string{"isd_as", "remote_isd_as"},
	)
	create1 := func(local, remote addr.IA) metrics.Counter {
		return metrics.CounterWith(
			metrics.NewPromCounter(probesSent),
			"isd_as", local.String(),
			"remote_isd_as", remote.String(),
		)
	}
	create2 := func(local, remote addr.IA) metrics.SimpleCounter {
		return metrics.NewCounter(probesSent.With(prometheus.Labels{
			"isd_as":        local.String(),
			"remote_isd_as": remote.String(),
		}))
	}
	mustIA := func(s string) addr.IA {
		ia, err := addr.ParseIA(s)
		if err != nil {
			b.Fatal(err)
		}
		return ia
	}
	var c metrics.Counter
	_ = create1(mustIA("1-ff00:0:1"), mustIA("2-ff00:0:2"))
	c = create2(mustIA("1-ff00:0:1"), mustIA("2-ff00:0:2"))
	for n := 0; n < b.N; n++ {
		metrics.CounterInc(c)
	}
}

create1 has 2 allocations, create2 0.

Yes I think we should prevent this in general, metrics should not cause allocations/or high computation overhead.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @matzf)

Copy link
Member

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not confuse the suboptimal implementation of the current interface with the flaws of the interface.
Even the current interface could be easily tweaked to avoid the makeLabels call in Add, by implementing With by means prometheus' built in Curry functions instead of tracking the label value set ourselves. Furthermore, the first call to Add could cache the internal counter object to avoid prometheus' internal map lookups for subsequent invocations on the same Counter object.

That being said, it seems that we mostly agree that the interface is flawed on its own; the types do not convey any information about whether all the label values have already been set (and thus whether Add can be used), requiring the programmer to keep track of the state of the initialization. And, as @oncilla pointed out, it encourages stringly-typed, tight coupling of the initialization and the usage of the metrics. Without such a With, setting the label values is forced to be closer to the initialization of the metrics, using explicit (typed!) call backs where dynamic label values are required.

The reason why I added a separate set of types was to prevent breaking changes. If we are fine with breaking changes, I would just do aways with the whole With method and let that be handled outside.

I think breaking changes here are ok; even this is on the periphery of the public API, we don't need to be overly conservative here IMO.
Removing the With generally sounds like a good plan to me. My impression was that this will require a lot of restructuring. If you think this is feasible, I'd fully agree that this would be ideal. The suggestion with a split of CounterBuilder and SimpleCounter intended to at least partially disentangle the With/Add without having fully restructure this immediately.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @lukedirtwalker)

Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's see, I'll try it out, when I have a bit of time.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @matzf)

Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I wonder if we should just introduce a metrics/v2 package for the time being and gradually migrate to it. I fear that migrating everything in one pass will be too large of a PR to review at once.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @matzf)

lukedirtwalker added a commit to lukedirtwalker/scion that referenced this pull request May 23, 2024
Introduce a v2 version of pkg/metrics. The current metrics package is
not optimal:
- Using objects that have been called with With allocated on
  each increment/set value call.
- Having the option to call With ties consumer code with initialization
  code.
- There is no way whether a counter is fully initialized with all
  labels.

A full discussion can be found in
scionproto#4527.

The new v2 version is compatible with prometheus, so no explicit
conversion functions are offered, instead the prometheus creation
functions can be used directly.

So far the new library has been plugged in selected places. The
intention is that it will eventually replace the current library
completely, however doing so in a single sweep would be too extensive.
Copy link
Collaborator Author

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a v2 package: #4530

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion (waiting on @matzf)

@lukedirtwalker
Copy link
Collaborator Author

Closing in favor of #4530

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants