Backup commands run forever causing excessive disk usage #313

Closed
smlx opened this issue May 6, 2024 · 0 comments · Fixed by #314
Labels
bug Something isn't working

Comments


smlx commented May 6, 2024

Redis persistence works like this:

  • Redis forks. We now have a child and a parent process.
  • The child starts to write the dataset to a temporary RDB file.
  • When the child is done writing the new RDB file, it replaces the old one.
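The write-then-rename sequence above is what makes RDB saves safe against partial writes. A minimal Python sketch of the same pattern (the function and file names are illustrative, not Redis code):

```python
import os
import tempfile

def save_rdb(data: bytes, path: str) -> None:
    """Write a new dump to a temporary file, then atomically replace the
    old one -- the same write-then-rename pattern Redis uses for RDB
    persistence, so readers only ever see a complete old or new file."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix="tmp-rdb-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the data is on disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise

# usage: replace an existing dump without readers ever seeing a partial file
workdir = tempfile.mkdtemp()
dump = os.path.join(workdir, "dump.rdb")
save_rdb(b"old dataset", dump)
save_rdb(b"new dataset", dump)
```

Note that `os.replace` removes the old file's directory entry, but the old data blocks are only freed once nothing holds them open, which is the crux of this issue.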

Meanwhile, the persistent redis backup command in this repository is:

/bin/sh -c "/bin/busybox tar -cf - -C /data ."

k8up can sometimes lose track of the commands it has executed, so inside a redis pod you'll see multiple backup processes like this:

  426 user      0:00 /bin/busybox tar -cf - -C /data .
  854 user      0:00 /bin/busybox tar -cf - -C /data .

When tar is executed like this, it opens the files in /data, bundles them into an archive, and writes the result to the stdout pipe. Once the stdout pipe fills (as can happen if k8up pods restart or otherwise stop reading from the exec pipe), tar blocks on the write but retains an open file handle to /data/restic.rdb.
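The blocking behaviour is ordinary POSIX pipe semantics: a pipe has a finite kernel buffer, and a writer with no reader sleeps once it fills. A short Python sketch (the non-blocking write is only there so the demonstration itself doesn't hang the way the stuck tar does):

```python
import os

# A pipe's kernel buffer is finite (typically 64 KiB on Linux). A writer
# that nobody reads from fills it and then blocks forever -- the state the
# stuck tar processes are in. Non-blocking mode lets us observe the limit
# without hanging.
r, w = os.pipe()
os.set_blocking(w, False)

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # The buffer is full. A blocking writer (like tar) would now sleep
    # inside write(2) indefinitely, keeping all its file handles open.
    pass

os.close(r)
os.close(w)
```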

Redis will rotate restic.rdb files as outlined above, but the disk space will never actually be reclaimed while tar holds the open file handle.

This fills the disk space in /data with old phantom copies of restic.rdb, and we get disk usage alerts.
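The "never reclaimed" part is standard unlink semantics: the kernel frees a deleted file's blocks only when the last open descriptor on it is closed. A minimal Python sketch (file names are illustrative):

```python
import os
import tempfile

# A deleted file's blocks are freed only when the last open file descriptor
# is closed. A plain open descriptor stands in for tar's handle, and
# unlink() stands in for Redis rotating the dump file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "restic.rdb")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

fd = os.open(path, os.O_RDONLY)  # the "blocked tar" still holds this
os.unlink(path)                  # rotation: the directory entry is gone...

gone_from_tree = not os.path.exists(path)
size_still_allocated = os.fstat(fd).st_size  # ...but the data isn't freed

os.close(fd)  # only now can the kernel actually reclaim the space
```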

I think the best solution is to set a timeout on the tar command so that it cannot enter this infinite-runtime state when the stdout pipe fills. Maybe set the backup command to something like this to enforce a 4-hour maximum runtime:

/bin/sh -c "timeout 14400 tar -cf - -C /data ."
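The effect of wrapping the command in `timeout` can be sketched with Python's own subprocess timeout, which mirrors the same kill-after-deadline behaviour (a 1-second limit stands in for the proposed 14400 seconds, and `sleep 10` stands in for a tar blocked on a full pipe):

```python
import subprocess

# Enforce a maximum runtime on a child process, as `timeout 14400 tar ...`
# would: the child is killed once the deadline passes instead of blocking
# forever, so its open file handles are released.
try:
    subprocess.run(["sleep", "10"], timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True  # the child was killed at the deadline
```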
@smlx smlx added the bug Something isn't working label May 6, 2024
@tobybellwood tobybellwood transferred this issue from uselagoon/lagoon-images May 15, 2024