Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit get series API call time range when start param is not specified #4976

Merged
merged 3 commits into from
Nov 28, 2022

Conversation

yeya24
Copy link
Collaborator

@yeya24 yeya24 commented Nov 18, 2022

Signed-off-by: Ben Ye benye@amazon.com

What this PR does:

  1. We ensure select start time is always a non-negative value (we set it to 0 if it is < 0) to avoid int64 overflow issue.
  2. For /api/v1/series API and start time of 0, we just fallback to query ingesters only to avoid OOM kills. This should work well for those clients who don't support specifying start on series API.

Which issue(s) this PR fixes:

/api/v1/series API supports optional start and end parameters. Since Cortex uses the Prometheus API code, right now if start is not specified, it will use the min value here https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L725. This happens for some clients that don't configure these params.

The issue here is that if start time is too small, we will end up querying all the data for the tenant. For series API this usually means we need to fetch a lot of data and do a lot of series duplications. It is really easy to get OOM killed.

https://github.com/cortexproject/cortex/blob/master/pkg/querier/querier.go#L344 Here we checks the max query length. However, it is broken because the endTime.Sub(startTime) overflows int64.

Other notes:

  1. Why maxQueryLookback is not enough?
    maxQueryLookback could limit the start time. But since maxQueryLookback is a limit which applies to both query, query range and series queries, We might not want to use the same range for all those 3 queries. For example, you may want to query 3 months data for query range but querying 3 months data using Series might not make sense in terms of performance. And if maxQueryLookback is not set then we don't have a fallback mechanism to protect ourselves.

  2. What if the user specifies start but start is a really small value?
    We ignore this case for now because this usually is not a valid use case. If the user intentionally does this, it is better to use maxQueryLookback.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@alanprot
Copy link
Member

I'm ok with this change.. but we are changing the behavior a bit (before no arguments = everything and now we only return the data stored on ingesters).

@alvinlin123 WDYT?

@alvinlin123
Copy link
Member

alvinlin123 commented Nov 23, 2022

So according to https://cortexmetrics.io/docs/api/#get-series-by-label-matchers Cortex already ignore start/end parameter by default. If -querier.query-store-for-labels-enabled then it will use it, but it's experimental feature. So I think we can move forward with this.

Let's update https://cortexmetrics.io/docs/api/#get-series-by-label-matchers with the default value.

@@ -335,6 +335,12 @@ func (q querier) Select(sortSeries bool, sp *storage.SelectHints, matchers ...*l
sp.Start = startMs
sp.End = endMs

// For series queries without specifying the start time, we prefer to
// only query ingesters and not to query maxQueryLength to avoid OOM kill.
if sp.Func == "series" && startMs == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if we merge this to line 307 above. Because what they are doing are pretty much the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I wanted to merge the two but to do this we need to do L318 ~ L336 first before L307.
This is probably a behavior change because we will do the time range validation first and then send to the metadata querier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we can do something along the line

userID, err := tenant.TenantID(ctx)
if err != nil {
  return storage.ErrSeriesSet(err)
}
startMs, endMs, err := validateQueryTimeRange(ctx, userID, sp.Start, sp.End, q.limits, q.maxQueryIntoFuture)

if sp == nil {
  sp = &storage.SelectHints{Start: q.mint, End: q.maxt}
} else if sp.Func == "series" && (!q.queryStoreForLabels ||  (err != nil && startMs == 0)) {	
   return q.metadataQuerier.Select(true, sp, matchers...)
} 

if err == errEmptyTimeRange {
    return storage.NoopSeriesSet()
} else if err != nil {
   return storage.ErrSeriesSet(err)
}

Essentially if start is 0 we treat it as if queryStoreForLabels is false. I also think that validateQueryTimeRange should return -1 for startMs and endMs when err != nil to make our life easier.

There is an edge case where user actually specifies "0" for "start", and Cortex would still return last 24 hours data, but I think this behaviour is ok for the edge case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think this could work...
But is it worth? For readability I prefer to separate them. It is just 1 duplicated line.
This is not very clear tbh.
if sp.Func == "series" && (!q.queryStoreForLabels || (err != nil && startMs == 0))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that's why I was thinking maybe validateQueryTimeRange should return -1s when err is not nil. But this is fine, I don't think we need to do it change anything for now; it's up to you.

I think this PR is ok, but I will let @alanprot have a chance to voice any other concerns based on what I said :)

Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 merged commit fcfa5a0 into cortexproject:master Nov 28, 2022
@yeya24 yeya24 deleted the limit-get-series-time-range branch November 28, 2022 18:10
t00350320 pushed a commit to t00350320/cortex that referenced this pull request Nov 29, 2022
cortexproject#4976)

* limit get series API call time range when start is not specified

Signed-off-by: Ben Ye <benye@amazon.com>

* add changelog

Signed-off-by: Ben Ye <benye@amazon.com>

* update API doc

Signed-off-by: Ben Ye <benye@amazon.com>

Signed-off-by: Ben Ye <benye@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants