Is there an existing issue already for this bug?
I have searched for an existing issue, and could not find anything. I believe this is a new bug.
I have read the troubleshooting guide
I have read the troubleshooting guide and I think this is a new bug.
I am running a supported version of CloudNativePG
I have read the troubleshooting guide and I think this is a new bug.
Contact Details
(I'll watch this issue)
Version
1.23.0
What version of Kubernetes are you using?
1.26
What is your Kubernetes environment?
Cloud: Other (provider using Gardener on OpenStack)
How did you install the operator?
Helm
What happened?
Start with a database that has backups configured (base + WAL) and make sure a base backup is present.
Add a (temporary) tablespace by updating the Cluster resource (roughly as sketched below), and make sure the WALs are backed up. I also deleted the cluster afterwards.
Create a new cluster by bootstrapping from that backup using the recovery method.
The recovery process fails with a fatal error during redo of the Tablespace/CREATE record.
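For reference, the tablespace change in step 2 amounted to adding a tablespaces stanza to the original cluster, roughly like this (a sketch; the name, size and temporary flag are the same ones that appear in the commented-out section of the cluster resource below):

spec:
  tablespaces:
    - name: temporary
      storage:
        size: 40Gi
      temporary: true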
I was able to recover to just before the affected WAL record without data loss, but I can imagine that this could cause unexpected downtime during recovery in certain situations, hence this report.
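For completeness, stopping recovery just before the offending record can be expressed with a recoveryTarget in the bootstrap stanza. This is only a sketch of one way to do it, not necessarily the exact manifest I used; the LSN below is the one from the failing redo record in the log output, and the target would need to be chosen so that replay stops before the Tablespace/CREATE record:

bootstrap:
  recovery:
    source: clusterBackup
    recoveryTarget:
      # LSN from "WAL redo at 277/AE0031F0" in the log output below;
      # pick a target that stops replay before this record
      targetLSN: "277/AE0031F0"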
At the very least, I would recommend mentioning something in the tablespaces limitations, e.g. advising users to create a base backup right after changing tablespaces (which I assume would work fine) and/or noting that changing the tablespaces of an existing cluster can introduce recovery issues.
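To illustrate the first suggestion, an on-demand base backup taken right after the tablespace change would look something like this (a sketch; the Backup name is made up and I have not verified that this actually avoids the problem):

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  # hypothetical name for the post-change backup
  name: db-after-tablespace-change
spec:
  cluster:
    name: db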
Cluster resource
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: db2
  labels:
    cnpg.io/reload: 'true'
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  storage:
    size: 80Gi
  # tried restoring both with and without this tablespace
  # in both cases, restore failed because the tablespace was not setup
  # ideally, I would be able to restore with a different temporary tablespace setup
  #tablespaces:
  #  - name: temporary
  #    storage:
  #      size: 40Gi
  #    temporary: true
  env:
    - name: PSQL_HISTORY
      value: /run/psql_history
  bootstrap:
    recovery:
      source: clusterBackup
      database: db
      owner: db
      secret:
        name: db-secret
  resources:
    limits:
      memory: 6000Mi
    requests:
      memory: 3000Mi
  postgresql:
    parameters:
      checkpoint_completion_target: "0.9"
      default_statistics_target: "100"
      effective_cache_size: 4608MB
      effective_io_concurrency: "300"
      maintenance_work_mem: 384MB
      max_connections: "20"
      max_standby_streaming_delay: "300000"
      max_wal_size: 4GB
      min_wal_size: 1GB
      pg_stat_statements.max: "10000"
      pg_stat_statements.track: all
      random_page_cost: "1.1"
      shared_buffers: 1536MB
      timezone: Europe/Amsterdam
      wal_buffers: 16MB
      work_mem: 19660kB
  externalClusters:
    - name: clusterBackup
      barmanObjectStore:
        destinationPath: "s3://dbbackup/pg16"
        endpointURL: "https://s3.example.com"
        serverName: db
        s3Credentials:
          accessKeyId:
            name: db-s3-secret
            key: key
          secretAccessKey:
            name: db-s3-secret
            key: secret
        wal:
          maxParallel: 4
Relevant log output
This is the log output from when the tablespace was not present in the new cluster definition. When the tablespace was present, the restore failed with the error "PGData already exists, can't overwrite" instead of the missing-directory error shown below.
...
{"level":"info","ts":"2024-04-29T21:28:43Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:43.892 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18117","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000C00000277000000AE\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:44Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:44.368 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18118","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000C00000277000000AF\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:44Z","msg":"Checking if the server is still in recovery","logging_pod":"db2-1-full-recovery","recovery":true}
{"level":"info","ts":"2024-04-29T21:28:44Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:44.824 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18119","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000C00000277000000B0\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:45Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:45.296 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18120","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000C00000277000000B1\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:45Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:45.871 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18121","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000C00000277000000B2\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:45Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:45.895 UTC","process_id":"37","session_id":"662fe84e.25","session_line_num":"18122","session_start_time":"2024-04-29 18:34:54 UTC","virtual_transaction_id":"1/0","transaction_id":"0","error_severity":"FATAL","sql_state_code":"58P01","message":"directory \"/var/lib/postgresql/tablespaces/temporary/data\" does not exist","hint":"Create this directory for the tablespace before restarting the server.","context":"WAL redo at 277/AE0031F0 for Tablespace/CREATE: 3383535 \"/var/lib/postgresql/tablespaces/temporary/data\"","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:45Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:45.953 UTC","process_id":"33","session_id":"662fe84e.21","session_line_num":"6","session_start_time":"2024-04-29 18:34:54 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"startup process (PID 37) exited with exit code 1","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:45Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:45.953 UTC","process_id":"33","session_id":"662fe84e.21","session_line_num":"7","session_start_time":"2024-04-29 18:34:54 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"terminating any other active server processes","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:46Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:46.141 UTC","process_id":"33","session_id":"662fe84e.21","session_line_num":"8","session_start_time":"2024-04-29 18:34:54 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"shutting down due to startup process failure","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:46Z","logger":"postgres","msg":"record","logging_pod":"db2-1-full-recovery","record":{"log_time":"2024-04-29 21:28:46.310 UTC","process_id":"33","session_id":"662fe84e.21","session_line_num":"9","session_start_time":"2024-04-29 18:34:54 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"database system is shut down","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-04-29T21:28:49Z","logger":"pg_ctl","msg":"pg_ctl: no server running\n","pipe":"stdout","logging_pod":"db2-1-full-recovery"}
{"level":"info","ts":"2024-04-29T21:28:49Z","msg":"Error while deactivating instance","logging_pod":"db2-1-full-recovery","err":"instance is not running"}
{"level":"info","ts":"2024-04-29T21:28:49Z","msg":"Exited log pipe","fileName":"/controller/log/postgres.csv","logging_pod":"db2-1-full-recovery"}
{"level":"error","ts":"2024-04-29T21:28:49Z","msg":"Error while restoring a backup","logging_pod":"db2-1-full-recovery","error":"while waiting for PostgreSQL to stop recovery mode: error while reading results of pg_is_in_recovery: failed to connect to `host=/controller/run user=postgres database=postgres`: dial error (dial unix /controller/run/.s.PGSQL.5432: connect: no such file or directory)","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:125\ngithub.com/cloudnative-pg/cloudnative-pg/pkg/management/log.Error\n\tpkg/management/log/log.go:163\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.restoreSubCommand\n\tinternal/cmd/manager/instance/restore/cmd.go:92\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/restore.NewCmd.func2\n\tinternal/cmd/manager/instance/restore/cmd.go:63\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\tcmd/manager/main.go:66\nruntime.main\n\t/opt/hostedtoolcache/go/1.22.2/x64/src/runtime/proc.go:271"}
Code of Conduct
I agree to follow this project's Code of Conduct
You are right. This is something that needs to be documented, at the very least. IMO, if you set the 'future' state (i.e. with the tablespace declared in the new cluster), we should be able to support it.