Check WAL shipping setup only in archiver mode #720

Open
wants to merge 2 commits into base: master

andrey-utkin

Setups with only streaming_archiver enabled fail this check. This blocks the "backup" operation, even though receive-wal works.

This appears to be a previously reported issue; for example, see #298.

This bug may be explained by the fact that enabling both streaming_archiver and archiver (WAL shipping) was previously the recommended configuration, per the documentation. In such a configuration this check would pass.

@andrey-utkin
Author

Hey, I'm glad to offer this tiny patch, but please take it very critically as it's my first time ever using barman (although not the first time setting up Postgres replication), so I might not know what I've done here.

@mikewallace1979
Contributor

Hi @andrey-utkin - thanks for reporting the issue and providing a patch. The patch itself looks fine; however, skipping this particular check if archiver = off isn't quite the full solution.

The short explanation is that the WAL archive check is checking whether Barman knows about any WALs at all, be they received either via the PostgreSQL archive_command or barman receive-wal. When this check fails it is because the file used to store WAL metadata (xlog.db) is either not present or empty; this means that Barman has not yet received any complete WALs.
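As a rough illustration of the condition described above, the sketch below treats the check as "the metadata file exists and is not empty". The function name `has_wal_metadata` and its path argument are assumptions made for this example; this is not Barman's actual implementation.

```python
import os


def has_wal_metadata(xlog_db_path):
    """Return True if the WAL metadata file exists and is non-empty.

    Hypothetical helper: it only mirrors the failure condition described
    in the comment above (xlog.db missing or empty).
    """
    return os.path.isfile(xlog_db_path) and os.path.getsize(xlog_db_path) > 0
```

A missing file and an empty file both make this return False, which corresponds to "Barman has not yet received any complete WALs".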

Whether archiver or streaming_archiver (or both) is used, the WAL archive check is still relevant, because it is verifying whether any WALs at all exist in Barman's WAL archive. The usual way to deal with this failing check when setting up a new server in Barman is to force a WAL switch on the PostgreSQL server using barman switch-wal with the --force and --archive options.
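For scripted setups, the forced WAL switch mentioned above could be wrapped as in the sketch below. The helper names and the server name `pg` used in the usage note are assumptions for illustration; only the `switch-wal` command and its `--force` and `--archive` options come from the comment above.

```python
import subprocess


def switch_wal_command(server_name):
    """Build the argument list for a forced WAL switch on one server."""
    return ["barman", "switch-wal", "--force", "--archive", server_name]


def force_wal_switch(server_name):
    """Run the command; raises CalledProcessError if barman exits non-zero."""
    return subprocess.run(switch_wal_command(server_name), check=True)
```

Calling `force_wal_switch("pg")` would run the command for a server configured under the hypothetical name `pg`.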

However, when streaming_archiver is used and archiver is not, it is not necessary to wait for a complete WAL to be copied into Barman's WAL archive; it might be enough to verify that a .partial file is present in the path specified by streaming_wals_directory. While this would tell us whether WAL streaming is working, it would not verify that the full process of writing WALs into Barman's WAL archive works end-to-end (for example, the end-to-end process could fail if wals_directory pointed to a path that was not writable by the barman user).

So there is still an opportunity here to improve the check when archiver = off and streaming_archiver = on, but in addition to skipping the xlog.db check we need to do the following:

  1. Check for the presence of a .partial file in streaming_wals_directory.
  2. Check that wals_directory is writable by the barman user.
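The two checks listed above could look roughly like the standalone sketch below. The function name `streaming_only_checks` and the message wording are assumptions; the real patch operates on Barman's configuration and check-strategy objects rather than on plain paths.

```python
import glob
import os


def streaming_only_checks(streaming_wals_directory, wals_directory):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # 1. A .partial file suggests that WAL streaming has started.
    if not glob.glob(os.path.join(streaming_wals_directory, "*.partial")):
        problems.append("no .partial file found in streaming_wals_directory")
    # 2. The WAL archive must be writable by the current (barman) user.
    if not os.access(wals_directory, os.W_OK):
        problems.append("wals_directory %s must be writable" % wals_directory)
    return problems
```

Note that `os.access` evaluates permissions for the user running the process, so the sketch only makes sense when run as the barman user.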

- wals_directory must be writable
- partial file should be present in case of streaming archiver
@andrey-utkin
Author

Hi, thanks a lot for the guidance!
Sorry for the long delay.

I've added the two checks you suggested. I have tested it on a new streaming-only setup, so it works at least on my machine.
How do they look to you?
I am not sure whether I should have used an f-string. Do you support Python older than 3.6?

@andrey-utkin
Author

andrey-utkin commented Feb 13, 2023 via email

@mikewallace1979
Contributor

Hi @andrey-utkin - thanks for the reminder, we'll re-review this shortly.

Contributor

@mikewallace1979 mikewallace1979 left a comment

Thanks for making the changes and sorry for making the suggestion about checking for .partial files.

We still need to come up with a better way of verifying that WAL streaming is working as intended in order to skip the xlog.db check; however, we'll need to think about it a bit more to figure out the right approach.

        hint="please make sure WAL shipping is setup",
    )
if self.config.streaming_archiver:
    if not glob(os.path.join(self.config.streaming_wals_directory,
Contributor

Unfortunately when I suggested this check I didn't take into account the fact that the .partial file is not always going to be present. For example, when a WAL switch occurs there is no guarantee that the .partial file for the next segment will have been created by the time the completed segment is renamed.

While there's still a need to improve on how Barman verifies WAL streaming is working, checking for the .partial file is not the way to do it and we'll need to come up with something better.
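The timing gap described above can be reproduced with plain files. The sketch below is only a simulation under stated assumptions: the segment name is made up, and the rename stands in for what happens when a streamed segment completes. It does not talk to PostgreSQL or Barman.

```python
import glob
import os
import tempfile


def partial_files(streaming_dir):
    """List the .partial files currently present in the directory."""
    return glob.glob(os.path.join(streaming_dir, "*.partial"))


def simulate_wal_switch(streaming_dir, segment):
    """Rename SEGMENT.partial to SEGMENT, as when a segment is completed."""
    os.rename(
        os.path.join(streaming_dir, segment + ".partial"),
        os.path.join(streaming_dir, segment),
    )


with tempfile.TemporaryDirectory() as d:
    seg = "000000010000000000000001"  # hypothetical segment name
    open(os.path.join(d, seg + ".partial"), "w").close()
    before = len(partial_files(d))
    simulate_wal_switch(d, seg)
    # The next segment's .partial file has not been created yet.
    after = len(partial_files(d))
```

A check that runs between the rename and the creation of the next .partial file would see no .partial file at all, even though streaming is healthy.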

if not os.access(self.config.wals_directory, os.W_OK):
    check_strategy.result(
        self.config.name, False,
        hint=f"wals_directory {self.config.wals_directory} must be writable"
Contributor

This f-string should be fine for future Barman releases because they will require Python 3.6 as a minimum. If we did want to backport it to the 3.4.x branch then it would need to be compatible with Python 2.7, but that would be a small enough change to deal with if it came up.
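If a backport were ever needed, one possible form of that small change is to build the hint with `str.format`, which works on both Python 2.7 and Python 3. The helper name `writable_hint` and the path in the usage note are assumptions for this sketch.

```python
def writable_hint(wals_directory):
    """Build the hint text without an f-string (Python 2.7 compatible)."""
    return "wals_directory {0} must be writable".format(wals_directory)
```

For a hypothetical path, `writable_hint("/var/lib/barman/pg/wals")` produces the same text as the f-string in the patch.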
