Don't backup when compaction is running #3770

Open
Michal-Leszczynski opened this issue Mar 28, 2024 · 2 comments

@Michal-Leszczynski
Collaborator

Michal-Leszczynski commented Mar 28, 2024

Creating snapshots while compaction is running can lead to increased disk consumption. It might be a good idea for SM to wait for running compactions to finish first, as described in https://github.com/scylladb/scylla-enterprise/issues/3809#issuecomment-1931976661. Note that there is no way to prevent compaction from starting once the snapshots have been taken.

Connected issues:

cc: @karol-kokoszka @tzach

@karol-kokoszka
Collaborator

Candidate for 3.2.9 (or 3.3.1, depending on tablets)

@karol-kokoszka
Collaborator

grooming notes

To check for ongoing compaction, you should:

1. query /task_manager/list_module_tasks/compaction;
2. filter out the tasks whose state is in {done, failed};
3. wait for the rest (/task_manager/wait_task/{task_id}).

If regular compaction should also be waited for, you should instead:

1. query /task_manager/list_module_tasks/compaction with the internal flag on;
2. filter out the tasks whose state is in {done, failed} or that have a non-zero parent_id;
3. wait for the rest (/task_manager/wait_task/{task_id}).
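The two procedures above could be sketched roughly as follows. This is only an illustration, not SM code: the function names, the `base_url` parameter, the `internal=true` query-string spelling, the all-zero UUID used to mean "no parent", and the exact JSON field names (`task_id`, `state`, `parent_id`) are assumptions that should be checked against the Scylla REST API.

```python
import json
import urllib.request

FINISHED_STATES = {"done", "failed"}
# Assumption: a task without a parent reports an all-zero UUID as parent_id.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def tasks_to_wait_for(tasks, include_regular=False):
    """Return the ids of compaction tasks that still have to be waited for.

    tasks: list of dicts as returned by list_module_tasks (assumed shape).
    include_regular: apply the second procedure, which also drops child
    tasks (non-zero parent_id) from the internal task listing.
    """
    pending = []
    for task in tasks:
        if task.get("state") in FINISHED_STATES:
            continue  # already done or failed, nothing to wait for
        if include_regular:
            parent = task.get("parent_id") or ZERO_UUID
            if parent != ZERO_UUID:
                continue  # child task, covered by waiting for its parent
        pending.append(task["task_id"])
    return pending


def wait_for_compactions(base_url, include_regular=False):
    """Query the compaction module and block until pending tasks finish.

    base_url is hypothetical, e.g. the address of a node's REST API.
    """
    url = f"{base_url}/task_manager/list_module_tasks/compaction"
    if include_regular:
        url += "?internal=true"  # assumed spelling of the internal flag
    with urllib.request.urlopen(url) as resp:
        tasks = json.load(resp)
    for task_id in tasks_to_wait_for(tasks, include_regular):
        # wait_task is expected to return only once the task has finished
        with urllib.request.urlopen(f"{base_url}/task_manager/wait_task/{task_id}") as resp:
            resp.read()
```

Keeping the filtering in a separate pure function makes it possible to exercise the state and parent_id rules without a running node.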


The worst case scenario:
While compaction is running, Scylla rewrites SSTable files. If a snapshot covering the same SSTables is taken just before they are compacted away, hard links to these SSTables are created. As a result, an SSTable that has already been compacted cannot be removed from disk, because a hard link still points to the file. The file keeps existing and consuming disk space even though Scylla no longer needs it (it is needed only to complete the backup).

Disk utilization could double in this case. The probability of such a situation seems low, but it is not zero.
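The hard-link effect described above can be reproduced on any POSIX filesystem with a small stand-alone sketch. The file and directory names are made up for illustration and have nothing to do with Scylla's real on-disk layout. The point is only that removing the original path does not free the data while a snapshot link still exists.

```python
import os
import tempfile


def snapshot_keeps_data_alive():
    """Show that a hard link keeps file data on disk after the original is removed."""
    with tempfile.TemporaryDirectory() as root:
        sstable = os.path.join(root, "sstable-Data.db")  # hypothetical name
        snapshot_dir = os.path.join(root, "snapshots")
        os.mkdir(snapshot_dir)

        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        # "Snapshot": create a hard link to the same inode.
        snapshot_link = os.path.join(snapshot_dir, "sstable-Data.db")
        os.link(sstable, snapshot_link)
        links_before = os.stat(snapshot_link).st_nlink

        # "Compaction finished": the input file is removed from the table dir.
        os.remove(sstable)
        links_after = os.stat(snapshot_link).st_nlink

        # The data is still fully present through the snapshot link.
        size_after = os.stat(snapshot_link).st_size
        return links_before, links_after, size_after
```

On a filesystem that supports hard links, the link count drops from 2 to 1 but the 4096 bytes stay allocated until the snapshot link is removed as well.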


The problem that we want to address here mainly concerns the major compaction process: https://opensource.docs.scylladb.com/stable/kb/compaction.html


The proposal is to back off and retry the backup task until no major compaction is running. However, major compaction may last a long time, which creates the risk that the backup won't be created at the expected time.
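One way to picture the trade-off is a retry loop with an upper bound on how long the backup may be delayed. This is a hypothetical sketch only: the `max_wait_s` cap, the exponential delay, and the `is_compacting` callback are assumptions for illustration and are not part of the proposal itself.

```python
import time


def wait_until_no_compaction(is_compacting, max_wait_s, initial_delay_s=1.0,
                             max_delay_s=60.0, sleep=time.sleep,
                             clock=time.monotonic):
    """Back off and retry until is_compacting() returns False.

    Returns True when no compaction is running, or False when max_wait_s
    elapsed first, so the caller can decide whether to back up anyway.
    """
    deadline = clock() + max_wait_s
    delay = initial_delay_s
    while is_compacting():
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # gave up: compaction outlasted the allowed wait
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
    return True
```

With a cap like this, a long-running major compaction delays the backup by at most `max_wait_s` instead of blocking it indefinitely. What to do on `False` (run the backup anyway or skip it) is a policy decision left open here.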

Because of the risk of not having a backup at the scheduled time, we need to bring this issue to planning.
The priority of this issue is rather low.
It looks like an edge case.

(cc: @tzach )
