Hey @Nashluffy, thanks for this new issue. 😄
Yes, the described steps would work and sound pretty nice.
I'd like to have something less hacky that doesn't rely on lifecycle hooks.
I just opened a new issue on the Thanos project; let's see what people think about it: thanos-io/thanos#7295
Thanks! I'll keep the prometheus-operator discussion here.
Just another point: I think a call to the flush endpoint should also be part of the Prometheus finalizer, not just something that happens when scaling down shards. That would cover my use case, since we don't use shards.
I think we could extend the proposed API to also provide this alternative as a shutdown option. Of course, that requires me to continue and finish my PR 😅
Component(s)
Prometheus
What is missing? Please describe.
We run several short-lived clusters (some live for only an hour). With the Thanos sidecar approach, downscaling a Prometheus replica (either permanently or by removing shards) loses all data still held in the head block, because the sidecar only uploads blocks that have already been written to disk.
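To make the at-risk data concrete: blocks that Prometheus has already written to disk are what the sidecar can upload, and anything newer than the most recent block's maxTime exists only in the head (memory plus WAL). The Python sketch below estimates that head-only window from the data directory. It assumes the standard on-disk layout, where each block is a directory containing a meta.json whose minTime/maxTime are milliseconds since the epoch. The function names are hypothetical, and the sketch does not check whether the sidecar has actually finished uploading the persisted blocks. On a cluster that is only an hour old there may be no persisted block yet, in which case both functions return None and the whole history is head-only.

```python
import json
import os
import time


def newest_persisted_max_time(data_dir):
    """Largest block maxTime (ms since epoch) in data_dir, or None if no blocks exist.

    A block is any top-level directory holding a meta.json; wal/, chunks_head/,
    snapshots/ and lock files are skipped because they have no meta.json.
    """
    newest = None
    with os.scandir(data_dir) as entries:
        for entry in entries:
            meta_path = os.path.join(entry.path, "meta.json")
            if not entry.is_dir() or not os.path.isfile(meta_path):
                continue
            with open(meta_path) as f:
                max_time = json.load(f)["maxTime"]
            if newest is None or max_time > newest:
                newest = max_time
    return newest


def head_only_window_ms(data_dir, now_ms=None):
    """Milliseconds of samples that exist only in the head right now.

    Returns None when nothing has been persisted yet (everything is head-only).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    newest = newest_persisted_max_time(data_dir)
    if newest is None:
        return None
    return max(0, now_ms - newest)
```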
Several existing issues have touched on this in one form or another:
#4967
prometheus/prometheus#12261
thanos-io/thanos#1849
It would be great to have native support in prometheus-operator for flushing and uploading what's in the head (likely requiring changes to other components as well).
Unfortunately, there's no TSDB API for "flushing" the head, but you can create a TSDB snapshot, which includes the head by default, and then move all new blocks in that snapshot into the top-level data dir.
The Thanos sidecar can then perform its own "flush" by uploading the blocks one last time.
prometheus-operator feels like the most natural place to orchestrate this, but I'm open to discussion!
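A minimal sketch of the snapshot-and-move step, assuming Prometheus runs with --web.enable-admin-api (required for the /api/v1/admin/tsdb/snapshot endpoint) and that the script can see the same data directory. The snapshot lands in <data-dir>/snapshots/<name>; block ULIDs that already exist at the top level are skipped, so only the newly written blocks (normally the one built from the head) are moved. The helper names and the default URL are assumptions for illustration, not part of prometheus-operator or Thanos.

```python
import json
import os
import shutil
import urllib.request


def parse_snapshot_name(body):
    """Extract the snapshot directory name from the admin API's JSON reply."""
    reply = json.loads(body)
    if reply.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {reply}")
    return reply["data"]["name"]


def take_snapshot(prometheus_url="http://localhost:9090"):
    """POST to the TSDB snapshot endpoint; the head is included unless skip_head=true."""
    req = urllib.request.Request(
        prometheus_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot", method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_snapshot_name(resp.read())


def move_new_blocks(data_dir, snapshot_name):
    """Move blocks that exist only in the snapshot into the top-level data dir.

    Blocks whose ULID already exists in data_dir are left in place; returns the
    names of the blocks that were moved.
    """
    snap_dir = os.path.join(data_dir, "snapshots", snapshot_name)
    moved = []
    for name in sorted(os.listdir(snap_dir)):
        block_path = os.path.join(snap_dir, name)
        if not os.path.isfile(os.path.join(block_path, "meta.json")):
            continue  # not a block directory
        target = os.path.join(data_dir, name)
        if os.path.exists(target):
            continue  # already persisted; the sidecar has seen it
        shutil.move(block_path, target)
        moved.append(name)
    return moved
```

In a preStop hook this would run as take_snapshot() followed by move_new_blocks(data_dir, name); the final upload is left to the sidecar or to thanos tools bucket upload-blocks, whose flags are not shown here.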
Describe alternatives you've considered.
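I'm currently achieving this in a separate container that uses a preStop hook to:
call the snapshot endpoint of Prometheus
move the new blocks from that snapshot dir into the top-level data dir
run thanos tools bucket upload-blocks
The snapshot doesn't take much storage, as the existing blocks in the snapshot are hard links to the actual blocks.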
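To check the storage claim on a real volume, one can compare inodes: files in the snapshot that resolve to the same (device, inode) as a file elsewhere in the data dir cost no extra space, so only genuinely new data, such as the block cut from the head, counts. This is a rough sketch with hypothetical helper names; it uses os.stat, so links to existing files (hard or symbolic) are treated as shared, and it ignores directory and filesystem metadata overhead.

```python
import os


def _inodes_outside_snapshots(data_dir):
    """(device, inode) pairs for every file in data_dir, excluding snapshots/."""
    seen = set()
    for root, dirs, files in os.walk(data_dir):
        if root == data_dir and "snapshots" in dirs:
            dirs.remove("snapshots")  # prune so the snapshot isn't counted as "live"
        for name in files:
            st = os.stat(os.path.join(root, name))
            seen.add((st.st_dev, st.st_ino))
    return seen


def snapshot_extra_bytes(data_dir, snapshot_name):
    """Bytes in the snapshot that are not shared with files in the live data dir."""
    shared = _inodes_outside_snapshots(data_dir)
    extra = 0
    snap_dir = os.path.join(data_dir, "snapshots", snapshot_name)
    for root, _dirs, files in os.walk(snap_dir):
        for name in files:
            st = os.stat(os.path.join(root, name))
            if (st.st_dev, st.st_ino) not in shared:
                extra += st.st_size
    return extra
```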
We previously used a Thanos receiver setup, which avoided this problem altogether, but it was far more expensive and carried a lot more operational overhead.
Environment Information.
Kubernetes Version: 1.27
Prometheus-Operator Version: 0.73