
Less surprising behavior for tiered storage topics when remote hasn't been constructed #18398

Open
andrwng opened this issue May 10, 2024 · 1 comment
Labels
area/cloud-storage Shadow indexing subsystem kind/enhance New feature or request

Comments

@andrwng (Contributor)

andrwng commented May 10, 2024

Today, when you change tiered storage configs (e.g. to start using tiered storage), the underlying cloud_storage::remote doesn't get constructed until the process is restarted. It would be nice if the remote could be rebuilt at runtime, but that is a tricky task given how many abstractions the remote leaks into, each with the expectation that it is constructed once at startup.

Some surprises that come out of this:

  • Tiered storage topics will tell RPK that they have tiered storage enabled, but they will be unable to upload anything.
  • This can result in the disk filling up, because space management will not trim anything for local-only topics.

We should refine the behavior during the window between setting tiered storage configs and the next restart. A simple strawman proposal is to at least reject topic creation. That doesn't solve everything, though: topics that already exist can exhibit equally surprising behavior when tiered storage cluster configs are toggled.

Some other, spicier options to consider:

  • Nuke everything from within Redpanda rather than relying on external services to restart it. On a config change that touches a needs-restart property, perhaps we should shut down and rebuild most of the application. Doing this automatically seems a bit risky, but it would avoid the problem entirely.
  • Don't allow changing tiered storage via cluster configs at all, and force users to go through the bootstrap YAML. Alternatively, make toggling tiered storage a more explicit, heavyweight operation that forces users to confront its effects (perhaps RPK-driven, showing the topics that would be affected by the toggle).

JIRA Link: CORE-2911

@andrwng andrwng added kind/enhance New feature or request area/cloud-storage Shadow indexing subsystem labels May 10, 2024
@hcoyote

hcoyote commented May 10, 2024

Wonder if we should bubble up to the health status the fact that the cluster config is in a state requiring a restart. Right now it's buried in an API the user wouldn't normally access. Alternatively, treat it as a metric state that they can alert on (similar to what we do now with some of the disk alert functions).
