Batch Iterator optimization #5237
Merged
alanprot force-pushed the batch-opmization branch 2 times, most recently from 79742f8 to 050f9d8 (March 29, 2023 00:33)
alanprot force-pushed the batch-opmization branch from 6bc676c to b48a851 (March 29, 2023 03:05)
yeya24 reviewed Mar 29, 2023
Great work overall. This is huge! Thanks
Signed-off-by: Alan Protasio <alanprot@gmail.com>
alanprot force-pushed the batch-opmization branch from ccf58b1 to 5a0086a (March 29, 2023 17:45)
Signed-off-by: Alan Protasio <alanprot@gmail.com>
alanprot force-pushed the batch-opmization branch from 05eec5f to 017db76 (March 29, 2023 18:07)
yeya24 approved these changes Mar 29, 2023
Thanks!
yeya24 pushed a commit to yeya24/cortex that referenced this pull request Mar 31, 2023
* Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com>
yeya24 pushed a commit to yeya24/cortex that referenced this pull request Mar 31, 2023
* Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com>
yeya24 added a commit that referenced this pull request Apr 1, 2023
* Batch Iterator optimization (#5237) * Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * Store Gateway: Convert metrics from summary to histograms (#5239) * Convert following metrics from summary to histogram cortex_bucket_store_series_blocks_queried cortex_bucket_store_series_data_fetched cortex_bucket_store_series_data_size_touched_bytes cortex_bucket_store_series_data_size_fetched_bytes cortex_bucket_store_series_data_touched cortex_bucket_store_series_result_series Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * Update changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * fix changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> --------- Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * update changelog Signed-off-by: Ben Ye <benye@amazon.com> * Catch context error in the s3 bucket client (#5240) Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * bump RC version Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: 
Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Friedrich Gonzalez <friedrichg@gmail.com> Co-authored-by: Xiaochao Dong <the.xcdong@gmail.com>
friedrichg added a commit that referenced this pull request Apr 23, 2023
* prepare 1.15.0-rc release (#5235) Signed-off-by: Ben Ye <benye@amazon.com> * Cherry-pick fixes to release 1.15 branch (#5241) * Batch Iterator optimization (#5237) * Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * Store Gateway: Convert metrics from summary to histograms (#5239) * Convert following metrics from summary to histogram cortex_bucket_store_series_blocks_queried cortex_bucket_store_series_data_fetched cortex_bucket_store_series_data_size_touched_bytes cortex_bucket_store_series_data_size_fetched_bytes cortex_bucket_store_series_data_touched cortex_bucket_store_series_result_series Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * Update changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * fix changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> --------- Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * update changelog Signed-off-by: Ben Ye <benye@amazon.com> * Catch context error in the s3 bucket client (#5240) Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * bump RC version Signed-off-by: Ben Ye 
<benye@amazon.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Friedrich Gonzalez <friedrichg@gmail.com> Co-authored-by: Xiaochao Dong <the.xcdong@gmail.com> * Cherry-pick fixes to release 1.15 for new RC (#5259) * fix remote read error in query frontend (#5257) * fix remote read error in query frontend Signed-off-by: Ben Ye <benye@amazon.com> * fix integration test Signed-off-by: Ben Ye <benye@amazon.com> * add extra one query Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> * bump RC version Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> * Fix splitByInterval incorrect error response format (#5260) (#5261) * fix query frontend incorrect error response format * update changelog * fix integration test --------- Signed-off-by: Ben Ye <benye@amazon.com> * release 1.15.0 (#5274) Signed-off-by: Ben Ye <benye@amazon.com> * merge 1.15 into master and resolve changelog conflicts Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Friedrich Gonzalez <friedrichg@gmail.com> Co-authored-by: Xiaochao Dong <the.xcdong@gmail.com>
yeya24 pushed a commit to yeya24/cortex that referenced this pull request Apr 28, 2023
* Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com>
yeya24 added a commit to yeya24/cortex that referenced this pull request Apr 28, 2023
* prepare 1.15.0-rc release (cortexproject#5235) Signed-off-by: Ben Ye <benye@amazon.com> * Cherry-pick fixes to release 1.15 branch (cortexproject#5241) * Batch Iterator optimization (cortexproject#5237) * Batch Opmization Signed-off-by: Alan Protasio <alanprot@gmail.com> * Add test bacj Signed-off-by: Alan Protasio <alanprot@gmail.com> * Testing Multiples scrape intervals Signed-off-by: Alan Protasio <alanprot@gmail.com> * no assimption Signed-off-by: Alan Protasio <alanprot@gmail.com> * Using max chunk ts Signed-off-by: Alan Protasio <alanprot@gmail.com> * test with scrape 10 Signed-off-by: Alan Protasio <alanprot@gmail.com> * rename method Signed-off-by: Alan Protasio <alanprot@gmail.com> * comments Signed-off-by: Alan Protasio <alanprot@gmail.com> * using next Signed-off-by: Alan Protasio <alanprot@gmail.com> * change test name Signed-off-by: Alan Protasio <alanprot@gmail.com> * changelog/comments Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * Store Gateway: Convert metrics from summary to histograms (cortexproject#5239) * Convert following metrics from summary to histogram cortex_bucket_store_series_blocks_queried cortex_bucket_store_series_data_fetched cortex_bucket_store_series_data_size_touched_bytes cortex_bucket_store_series_data_size_fetched_bytes cortex_bucket_store_series_data_touched cortex_bucket_store_series_result_series Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * Update changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> * fix changelog Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> --------- Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> * update changelog Signed-off-by: Ben Ye <benye@amazon.com> * Catch context error in the s3 bucket client (cortexproject#5240) Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Signed-off-by: Ben Ye 
<benye@amazon.com> * bump RC version Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Friedrich Gonzalez <friedrichg@gmail.com> Co-authored-by: Xiaochao Dong <the.xcdong@gmail.com> * Cherry-pick fixes to release 1.15 for new RC (cortexproject#5259) * fix remote read error in query frontend (cortexproject#5257) * fix remote read error in query frontend Signed-off-by: Ben Ye <benye@amazon.com> * fix integration test Signed-off-by: Ben Ye <benye@amazon.com> * add extra one query Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> * bump RC version Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> * Fix splitByInterval incorrect error response format (cortexproject#5260) (cortexproject#5261) * fix query frontend incorrect error response format * update changelog * fix integration test --------- Signed-off-by: Ben Ye <benye@amazon.com> * release 1.15.0 (cortexproject#5274) Signed-off-by: Ben Ye <benye@amazon.com> * merge 1.15 into master and resolve changelog conflicts Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Friedrich Gonzalez <friedrichg@gmail.com> Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com> Co-authored-by: Alan Protasio <approtas@amazon.com> Co-authored-by: Friedrich Gonzalez <friedrichg@gmail.com> Co-authored-by: Xiaochao Dong <the.xcdong@gmail.com>
What this PR does:
This PR is still a work in progress, but it seems we can speed up some range queries quite a bit:
We noticed very high CPU utilization in the `seek` method when running range queries whose steps do not overlap. For example: `q=sum(rate(up[1m]))&step=120` (a 1-minute matrix selector with a 2-minute step).
This combination of query and step is quite common, since Grafana defaults to a 120s step when displaying a 1-day graph.
In our test environment, this change reduced the latency of the query `sum(rate(up[1m]))&step=120&start=now-600m&end=now` over 30K series from 30s to 5s.
One interesting point is that if we set the step to `60s`, the seek is not performed and the query returns in 9s instead of 43s.
The benchmarks below show that there is a sweet spot: this change can be about 20% slower than the current implementation when the steps skip many samples inside the same chunk. That said, the change still seems worthwhile because:
1. It improves the most common use cases by 90% or more.
2. The penalty only occurs in cases that are already very fast (~15ms).
3. It makes the latency far less sensitive to the step: before, a `30s` step took 600ms and a `300s` step took 12ms; now all cases are under 20ms.

The crux of this change is that `seek` causes the iterator to traverse all the samples from the beginning of the chunk up to the time `t`. In other words, with step=60s traversing the samples costs O(n), but with step=120s it costs O(n*s), where s is the number of steps. See:
cortex/pkg/querier/batch/chunk.go, line 45 in f694529
cortex/pkg/chunk/encoding/prometheus_chunk.go, lines 90 to 95 in f694529
https://github.com/prometheus/prometheus/blob/211ae4f1f0a2cdaae09c4c52735f75345c1817c6/tsdb/chunkenc/xor.go#L247
We could create a new iterator only if `t` is before the iterator's current position at this point, but the batching makes this impossible.

** Memory allocation benchmarks were omitted, as they show no change at all **
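The traversal cost described above can be illustrated with a toy model. This is only a sketch and not the Cortex or Prometheus code: `chunk`, `iter`, `seekFromStart`, `advance` and `compare` are hypothetical names, and the chunk is modeled as a slice of timestamps that can only be read sequentially. It counts how many samples are decoded when every seek restarts from the beginning of the chunk, versus when a single iterator is kept and moved forward.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for an encoded chunk whose samples can
// only be decoded sequentially from the start.
type chunk struct {
	timestamps []int64
}

// iter is a forward-only iterator that counts the samples it decodes.
type iter struct {
	c       *chunk
	pos     int // index of the next sample to decode
	decoded int // number of samples decoded so far
}

func (it *iter) next() (int64, bool) {
	if it.pos >= len(it.c.timestamps) {
		return 0, false
	}
	t := it.c.timestamps[it.pos]
	it.pos++
	it.decoded++
	return t, true
}

// seekFromStart models the costly pattern: each seek builds a fresh iterator
// and decodes from the first sample until it reaches a timestamp >= t.
func seekFromStart(c *chunk, t int64) int {
	it := &iter{c: c}
	for {
		ts, ok := it.next()
		if !ok || ts >= t {
			return it.decoded
		}
	}
}

// advance models the cheaper pattern: reuse the iterator and only move forward.
func advance(it *iter, t int64) {
	for {
		ts, ok := it.next()
		if !ok || ts >= t {
			return
		}
	}
}

// compare builds a chunk of n samples spaced interval apart, seeks to every
// step boundary, and returns the samples decoded by each strategy.
func compare(n int, interval, step int64) (restart, forward int) {
	c := &chunk{timestamps: make([]int64, n)}
	for i := range c.timestamps {
		c.timestamps[i] = int64(i) * interval
	}
	last := c.timestamps[n-1]
	it := &iter{c: c}
	for t := int64(0); t <= last; t += step {
		restart += seekFromStart(c, t)
		advance(it, t)
	}
	return restart, it.decoded
}

func main() {
	// 120 samples scraped every 15s, queried with a 120s step.
	restart, forward := compare(120, 15, 120)
	fmt.Println("decoded when restarting on every seek:", restart)
	fmt.Println("decoded when moving forward only:", forward)
}
```

In this model the restart strategy decodes a number of samples that grows with the number of steps, while the forward-only strategy never decodes a sample more than once.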
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]