Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

collect metrics in a separate go routine #82

Merged
merged 3 commits into from Jul 15, 2021
Merged

Conversation

marten-seemann
Copy link
Contributor

Fixes #81.

I really wish there was a way to pass a Prometheus struct to tcp-transport, and only collect metrics in case metrics are actually scraped. This PR introduces an eternal Go routine that periodically asks the kernel for metrics, and removes a connection from connection map when it is closed.

metrics.go Outdated
Comment on lines 123 to 131
info, err := conn.getTCPInfo()
if err != nil {
if strings.Contains(err.Error(), "use of closed network connection") {
c.closedConn(conn)
continue
}
log.Errorf("Failed to get TCP info: %s", err)
continue
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a problem that if we conn.getTCPInfo() errors differently from expected such that we'll never close the connection?

I've seen 2021-07-08T21:39:47.925-0400 ERROR tcp-tpt go-tcp-transport@v0.2.2/metrics.go:118 Failed to get TCP info: raw-control tcp 192.168.1.6:4001: getsockopt: not implemented on Windows so I'm concerned this could cause us problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it in my Windows VM, and I'm getting the same error. Should we disable metrics collection on Windows?

metrics.go Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented Jul 14, 2021

Following up here, with the recent Slack conversations in mind.

Only collect stats when Collect is called.

That is the Prometheus idiomatic way. In other words, just-in-time (on scrape / on Collect) metric generation, instead of pre-generated metric generation decoupled from the Prometheus scrape interval, improves data accuracy all the way to the Prometheus data store.

@BigLep
Copy link

BigLep commented Jul 15, 2021

@marten-seemann : what are the next steps here? Who do we need a review from?

@marten-seemann
Copy link
Contributor Author

@marten-seemann : what are the next steps here? Who do we need a review from?

I wasn't done yet with my changes, that's why I didn't request any reviews. Now I am, and will do so.

@marten-seemann
Copy link
Contributor Author

@aschmahmann @Stebalien I updated the PR. Now

  1. we use calls to Close to garbage collect the connection map.
  2. we start the Go routine that collects the metrics (and performs all the syscalls) when Collect is called, i.e. only when Prometheus is actually scraping metrics.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small fix but it should work otherwise.

metrics.go Show resolved Hide resolved
@marten-seemann marten-seemann merged commit 398a666 into master Jul 15, 2021
@marten-seemann marten-seemann deleted the fix-unbounded-state branch July 15, 2021 20:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup prometheus metrics more predictably
5 participants