WAL functions cannot be executed in standby servers #899

Open. Wants to merge 1 commit into base: master.


fmbiete commented Sep 1, 2023

WAL functions cannot be executed in standby servers

ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.

In PostgreSQL 16, replication slots persist on the standby servers.
This is also the case when using extensions like pg_failover_slots, which transfer the slot information to the standby.

The added condition makes those queries return values only on the primary node, which avoids the error above.

Signed-off-by: Francisco Miguel Biete Banon <fbiete@gmail.com>

elpavel commented Sep 20, 2023

What about using logic like this, branching on pg_is_in_recovery()?

 		{
 			semver.MustParseRange(">=9.4.0 <10.0.0"),
 			`
-			SELECT slot_name, database, active, pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
+			SELECT slot_name, database, active,
+				(case pg_is_in_recovery() when 't' then pg_xlog_location_diff(pg_last_xlog_receive_location(), restart_lsn) else pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) end) as pg_xlog_location_diff
 			FROM pg_replication_slots
 			`,
 		},
 		{
 			semver.MustParseRange(">=10.0.0"),
 			`
-			SELECT slot_name, database, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
+			SELECT slot_name, database, active,
+				(case pg_is_in_recovery() when 't' then pg_wal_lsn_diff(pg_last_wal_receive_lsn(), restart_lsn) else pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) end) as pg_wal_lsn_diff
 			FROM pg_replication_slots
 			`,
 		},


fmbiete commented Sep 22, 2023

> What about using logic like this, branching on pg_is_in_recovery()?

That doesn't throw an error, but there are some caveats.

  • pg_last_wal_receive_lsn() can return NULL, so we would need to make the statement resilient to that

  • I'm not sure about the correctness of that data: it shows two different things depending on the server status

    • when in recovery, it would show the difference between the last LSN sent by the primary to this standby and the received LSN of the remote standby
    • when not in recovery, it would show the difference between the current position and the received LSN of the remote standby

The value reported while in recovery could be quite far off compared to the normal value.

What do you think?
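
To make the NULL concern above concrete, here is a minimal Go sketch of one way the statement and its consumer could tolerate a NULL result. Everything named here is an assumption for illustration: the constant `standbyAwareSlotQuery`, the helper `lagOrSkip`, and the choice to fall back to `pg_last_wal_replay_lsn()` are not part of the proposal in this thread. It also does nothing about the second concern, that the value means something different on a standby.

```go
package main

import (
	"database/sql"
	"fmt"
)

// standbyAwareSlotQuery is a hypothetical variant of the proposed query for
// PostgreSQL >= 10. On a standby it falls back to the replay position when
// pg_last_wal_receive_lsn() is NULL. The result can still be NULL (for
// example when restart_lsn is NULL), so the caller must handle that too.
const standbyAwareSlotQuery = `
SELECT slot_name, database, active,
       pg_wal_lsn_diff(
           CASE WHEN pg_is_in_recovery()
                THEN COALESCE(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
                ELSE pg_current_wal_lsn()
           END,
           restart_lsn) AS pg_wal_lsn_diff
FROM pg_replication_slots`

// lagOrSkip converts a raw column value, as a database/sql driver would hand
// it over, into a metric value. It reports false when the value is NULL or
// cannot be parsed, so the caller can skip the sample instead of failing.
func lagOrSkip(raw interface{}) (float64, bool) {
	var v sql.NullFloat64
	if err := v.Scan(raw); err != nil || !v.Valid {
		return 0, false
	}
	return v.Float64, true
}

func main() {
	fmt.Println(standbyAwareSlotQuery)
	fmt.Println(lagOrSkip(nil))            // NULL column: sample is skipped
	fmt.Println(lagOrSkip([]byte("1024"))) // numeric column delivered as text
}
```

Skipping NULL samples keeps the scrape from erroring, at the cost of emitting no lag value for that slot.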


elpavel commented Dec 22, 2023

I am not sure what the best solution is here, but anything would be better than the original failing query.

mbanck-ntap commented

In any case, the patch proposed here leads to no metrics being sent back from the standby at all. Is that what we want?


fmbiete commented May 18, 2024

Yes, that query should only be executed on the primary.
