-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
limit get series API call time range when start param is not specified #4976
limit get series API call time range when start param is not specified #4976
Conversation
I'm ok with this change.. but we are changing the behavior a bit (before no arguments = everything and now we only return the data stored on ingesters). @alvinlin123 WDYT? |
So according to https://cortexmetrics.io/docs/api/#get-series-by-label-matchers Cortex already ignore start/end parameter by default. If Let's update https://cortexmetrics.io/docs/api/#get-series-by-label-matchers with the default value. |
@@ -335,6 +335,12 @@ func (q querier) Select(sortSeries bool, sp *storage.SelectHints, matchers ...*l | |||
sp.Start = startMs | |||
sp.End = endMs | |||
|
|||
// For series queries without specifying the start time, we prefer to | |||
// only query ingesters and not to query maxQueryLength to avoid OOM kill. | |||
if sp.Func == "series" && startMs == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better if we merge this to line 307 above. Because what they are doing are pretty much the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I wanted to merge the two but to do this we need to do L318 ~ L336 first before L307.
This is probably a behavior change because we will do the time range validation first and then send to the metadata querier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we can do something along the line
userID, err := tenant.TenantID(ctx)
if err != nil {
return storage.ErrSeriesSet(err)
}
startMs, endMs, err := validateQueryTimeRange(ctx, userID, sp.Start, sp.End, q.limits, q.maxQueryIntoFuture)
if sp == nil {
sp = &storage.SelectHints{Start: q.mint, End: q.maxt}
} else if sp.Func == "series" && (!q.queryStoreForLabels || (err != nil && startMs == 0)) {
return q.metadataQuerier.Select(true, sp, matchers...)
}
if err == errEmptyTimeRange {
return storage.NoopSeriesSet()
} else if err != nil {
return storage.ErrSeriesSet(err)
}
Essentially if start is 0
we treat it as if queryStoreForLabels
is false
. I also think that validateQueryTimeRange
should return -1
for startMs
and endMs
when err != nil
to make our life easier.
There is an edge case where user actually specifies "0" for "start", and Cortex would still return last 24 hours data, but I think this behaviour is ok for the edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think this could work...
But is it worth? For readability I prefer to separate them. It is just 1 duplicated line.
This is not very clear tbh.
if sp.Func == "series" && (!q.queryStoreForLabels || (err != nil && startMs == 0))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's why I was thinking maybe validateQueryTimeRange
should return -1
s when err
is not nil. But this is fine, I don't think we need to do it change anything for now; it's up to you.
I think this PR is ok, but I will let @alanprot have a chance to voice any other concerns based on what I said :)
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
122779d
to
8af14e5
Compare
Signed-off-by: Ben Ye <benye@amazon.com>
1d478ee
to
d0a4a61
Compare
cortexproject#4976) * limit get series API call time range when start is not specified Signed-off-by: Ben Ye <benye@amazon.com> * add changelog Signed-off-by: Ben Ye <benye@amazon.com> * update API doc Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye benye@amazon.com
What this PR does:
/api/v1/series
API and start time of 0, we just fallback to query ingesters only to avoid OOM kills. This should work well for those clients who don't support specifyingstart
on series API.Which issue(s) this PR fixes:
/api/v1/series
API supports optionalstart
andend
parameters. Since Cortex uses the Prometheus API code, right now ifstart
is not specified, it will use the min value here https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L725. This happens for some clients that don't configure these params.The issue here is that if start time is too small, we will end up querying all the data for the tenant. For series API this usually means we need to fetch a lot of data and do a lot of series duplications. It is really easy to get OOM killed.
https://github.com/cortexproject/cortex/blob/master/pkg/querier/querier.go#L344 Here we checks the max query length. However, it is broken because the
endTime.Sub(startTime)
overflows int64.Other notes:
Why
maxQueryLookback
is not enough?maxQueryLookback
could limit the start time. But sincemaxQueryLookback
is a limit which applies to bothquery
,query range
andseries
queries, We might not want to use the same range for all those 3 queries. For example, you may want to query 3 months data for query range but querying 3 months data usingSeries
might not make sense in terms of performance. And ifmaxQueryLookback
is not set then we don't have a fallback mechanism to protect ourselves.What if the user specifies
start
butstart
is a really small value?We ignore this case for now because this usually is not a valid use case. If the user intentionally does this, it is better to use
maxQueryLookback
.Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]