Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TSDB max exemplars config per tenant #5080

Merged
merged 9 commits into from
Feb 2, 2023
Merged

Conversation

sahnib
Copy link
Contributor

@sahnib sahnib commented Jan 5, 2023

Signed-off-by: sahnib sahnib@amazon.com

What this PR does:

Makes TSDB max exemplars config per tenant. Note that the MaxExemplars value is passed down to prometheus tsdb at tsdb.Open call, hence this configuration would not be hot loaded at this time - unless the ingesters re-open the database handle, or go through a restart..

Which issue(s) this PR fixes:
Fixes #5016

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: sahnib <sahnib@amazon.com>
@sahnib sahnib marked this pull request as ready for review January 5, 2023 02:23
Signed-off-by: sahnib <sahnib@amazon.com>
@yeya24
Copy link
Collaborator

yeya24 commented Jan 5, 2023

Thanks for the contribution!

Note that the MaxExemplars value is passed down to prometheus tsdb at tsdb.Open call, hence this configuration would not be hot loaded at this time - unless the ingesters re-open the database handle, or go through a restart..

Can we make it hot reloadable then as it is already reloadable in Prometheus. We can update it in the loop https://github.com/cortexproject/cortex/blob/master/pkg/ingester/ingester.go#L776 here by calling https://github.com/prometheus/prometheus/blob/main/tsdb/head.go#L855.

@sahnib
Copy link
Contributor Author

sahnib commented Jan 6, 2023

I have added the hot reload changes to the PR.

Testing

Ran cortex using docker with a update exemplars time of 10 seconds. Below are logs showing updates happening for a "fake" org_id.

level=info ts=2023-01-09T21:00:23.119451717Z caller=ingester.go:846 org_id=fake meg="updating max exemplars configuration."
level=info ts=2023-01-09T21:00:33.119455937Z caller=ingester.go:846 org_id=fake meg="updating max exemplars configuration."
level=info ts=2023-01-09T21:00:43.119592139Z caller=ingester.go:846 org_id=fake meg="updating max exemplars configuration."
level=info ts=2023-01-09T21:00:53.119561225Z caller=ingester.go:846 org_id=fake meg="updating max exemplars configuration."
level=info ts=2023-01-09T21:01:03.11941392Z caller=ingester.go:846 org_id=fake meg="updating max exemplars configuration."

Signed-off-by: sahnib <sahnib@amazon.com>
}

level.Info(logutil.WithUserID(userID, i.logger)).Log("meg", "updating max exemplars configuration.")
err := userDB.db.ApplyConfig(cfg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method currently updates OutOfOrderTimeWindow and MaxExemplars value. I could not find usage of OutOfOrderTimeWindow in ingester. Please point me to where its configured per tsdb if it needs to be updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is fine. The OOO config is still WIP so we can ignore that in this pr.

Signed-off-by: sahnib <sahnib@amazon.com>
@@ -821,6 +829,31 @@ func (i *Ingester) updateLoop(ctx context.Context) error {
}
}

func (i *Ingester) updateTSDBMaxExemplars() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we will support OOO config in the future, can we rename the function to not only updateTSDBMaxExemplars? Maybe something like updateUserTSDBs or updateDBConfigs, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about updateUserTSDBConfigs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

@@ -92,7 +94,8 @@ type Config struct {
// Config for metadata purging.
MetadataRetainPeriod time.Duration `yaml:"metadata_retain_period"`

RateUpdatePeriod time.Duration `yaml:"rate_update_period"`
RateUpdatePeriod time.Duration `yaml:"rate_update_period"`
MaxExemplarsUpdatePeriod time.Duration `yaml:"max_exemplars_update_period"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to use a different config name to not only cover exemplars.

sahnib added 2 commits January 11, 2023 17:21
…db config.

Signed-off-by: sahnib <sahnib@amazon.com>
Signed-off-by: sahnib <sahnib@amazon.com>
@sahnib
Copy link
Contributor Author

sahnib commented Jan 12, 2023

I have made the API renaming changes.

pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@@ -173,7 +170,6 @@ func (cfg *TSDBConfig) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.WALSegmentSizeBytes, "blocks-storage.tsdb.wal-segment-size-bytes", wal.DefaultSegmentSize, "TSDB WAL segments files max size (bytes).")
f.BoolVar(&cfg.FlushBlocksOnShutdown, "blocks-storage.tsdb.flush-blocks-on-shutdown", false, "True to flush blocks to storage on shutdown. If false, incomplete blocks will be reused after restart.")
f.DurationVar(&cfg.CloseIdleTSDBTimeout, "blocks-storage.tsdb.close-idle-tsdb-timeout", 0, "If TSDB has not received any data for this duration, and all blocks from TSDB have been shipped, TSDB is closed and deleted from local disk. If set to positive value, this value should be equal or higher than -querier.query-ingesters-within flag to make sure that TSDB is not closed prematurely, which could cause partial query results. 0 or negative value disables closing of idle TSDB.")
f.IntVar(&cfg.MaxExemplars, "blocks-storage.tsdb.max-exemplars", 0, "Enables support for exemplars in TSDB and sets the maximum number that will be stored. 0 or less means disabled.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanprot Is it okay to remove an existing flag cause it breaks backward compatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shuold keep around for 2 releases as per governance, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Maybe we can do something like #5068

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the original flag in storage config, and marked it as deprecated.

Further, cortex will fall back to storage config value if exemplars value for the user is set to zero. Discussed these changes with @alanprot offline.

sahnib added 2 commits January 12, 2023 13:48
Signed-off-by: sahnib <sahnib@amazon.com>
Signed-off-by: sahnib <sahnib@amazon.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jan 31, 2023
Signed-off-by: sahnib <sahnib@amazon.com>
@sahnib sahnib requested review from yeya24 and alanprot and removed request for yeya24 and alanprot February 2, 2023 17:58
Copy link
Collaborator

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work!

@yeya24 yeya24 merged commit 204e251 into cortexproject:master Feb 2, 2023
alexqyle pushed a commit to alexqyle/cortex that referenced this pull request May 2, 2023
* Make TSDB max exemplars config per tenant - cortexproject#5016

Signed-off-by: sahnib <sahnib@amazon.com>

* Fix documentation as per new flags

Signed-off-by: sahnib <sahnib@amazon.com>

* Enable hot reload of max exemplars value.

Signed-off-by: sahnib <sahnib@amazon.com>

* Fix goimports in ingester.

Signed-off-by: sahnib <sahnib@amazon.com>

* Rename config flag, and method to be more generic towards updating tsdb config.

Signed-off-by: sahnib <sahnib@amazon.com>

* Remove un-intended single process config change.

Signed-off-by: sahnib <sahnib@amazon.com>

* Remove un-necessary logging.

Signed-off-by: sahnib <sahnib@amazon.com>

* Add fallback on previous limit in block storage config

Signed-off-by: sahnib <sahnib@amazon.com>

* Added unit-test cases for fallback logic, and updated docs.

Signed-off-by: sahnib <sahnib@amazon.com>

---------

Signed-off-by: sahnib <sahnib@amazon.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make TSDB max exemplars config per tenant
3 participants