
PVC creation pending due to deadline exceeded error on external cluster, using official 1.10 Rook docs; persists after applying the fix for issue 8696 #11347

Closed
egulatee opened this issue Nov 24, 2022 · 24 comments

@egulatee

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
PVC stuck in pending

Expected behavior:
The PVC should be created and bound.

How to reproduce it (minimal and precise):

I followed the instructions provided here: https://rook.io/docs/rook/v1.10/CRDs/Cluster/external-cluster/

  • Applied all Kubernetes manifests via Argo CD (the crds, common, operator, common-external, and cluster-external YAML files).

  • Manually generated the Ceph cluster environment variables via the shell script on the source cluster.

  • Manually set the environment variables and ran the import-external-cluster shell script.

  • Created a sample PVC using the RBD SC, which remains pending. (I will attach logs; a minimal example PVC is sketched after this list.)

  • Before I opened this issue, I tried the solutions from issue New BlockPool / SC + Parallel RBD Volume Creation hangs and fails #8696:
    1 - I ran rbd pool init rook_rbd_storage and then recreated the PVC. No change; the PVC remains pending.
    2 - I tried creating the toolbox, but it doesn't seem to work with external clusters.
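
For reference, a PVC of roughly this shape reproduces the issue. A minimal sketch, assuming the RBD StorageClass is named ceph-rbd (substitute whatever name import-external-cluster.sh actually created):

```yaml
# Hypothetical test PVC; "ceph-rbd" stands in for the real RBD StorageClass name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd
```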

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Using https://raw.githubusercontent.com/rook/rook/3bccf60c5fd853fb80ecd4d3e8e0d146aa7226a9/deploy/examples/cluster-external.yaml

Logs to submit:

  • Operator's logs, if necessary

kubectl_logs_ceph_operator.txt

  • Crashing pod(s) logs, if necessary

kubectl_logs_rbdplugin-provisioner.txt

To get logs, use kubectl -n <namespace> logs <pod name>
When pasting logs, always surround them with backticks or use the insert-code button in the GitHub UI.
Read GitHub documentation if you need help.

Cluster Status to submit:
kubectl_describe_cephcluster.txt

  • Output of krew commands, if necessary
```
ceph status
  cluster:
    id:     a896bcd1-6089-470f-a973-85f2fabe5149
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum nuc9034,nuc9036,nuc9038,nuc9039,nuc9037 (age 45h)
    mgr: nuc9035(active, since 22h), standbys: nuc9036, nuc9034
    mds: 2/2 daemons up, 2 standby
    osd: 8 osds: 8 up (since 2d), 8 in (since 2d)

  data:
    volumes: 2/2 healthy
    pools:   7 pools, 169 pgs
    objects: 14.12k objects, 54 GiB
    usage:   161 GiB used, 20 TiB / 20 TiB avail
    pgs:     169 active+clean

  io:
    client: 600 KiB/s wr, 0 op/s rd, 135 op/s wr
```

    To get the health of the cluster, use kubectl rook-ceph health
    To get the status of the cluster, use kubectl rook-ceph ceph status
    For more details, see the Rook Krew Plugin

Environment:

  • OS (e.g. from /etc/os-release):
    PRETTY_NAME="Ubuntu 22.04.1 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.1 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy
  • Kernel (e.g. uname -a):
    Linux k3s-node1 5.15.0-1021-kvm #26-Ubuntu SMP Tue Oct 25 18:39:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
    rook version
    rook: v1.10.6
    go: go1.18.7
  • Storage backend version (e.g. for ceph do ceph -v):
    Inside Rook Pod: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
    Proxmox Ceph version: 16.2.9
  • Kubernetes version (use kubectl version):
    kubectl version
    WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
    Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:17:11Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"darwin/arm64"}
    Kustomize Version: v4.5.4
    Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+k3s1", GitCommit:"f2585c1671b31b4b34bddbb3bf4e7d69662b0821", GitTreeState:"clean", BuildDate:"2022-10-25T19:59:38Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
    K3s on 7 Intel NUCs with dual NICs and VLANs
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
    Unable to bring up toolbox.
@egulatee egulatee added the bug label Nov 24, 2022
@egulatee
Author

A few questions / comments:

1 - If I want to leverage an external Ceph cluster so that Rook can give Kubernetes the ability to use SCs, PVCs, and PVs, I should be using cluster-external.yaml, correct? (I ask because I'm unclear whether cluster-external-management.yaml does management of the cluster beyond just consuming it.) I get different errors going down that approach, but I'm under the impression it's not necessary.

2- When extracting the config from the source cluster, I am not specifying the RGW endpoint. Might that be the cause?
https://rook.io/docs/rook/v1.10/CRDs/Cluster/external-cluster/#1-create-all-users-and-keys
python3 create-external-cluster-resources.py --cephfs-filesystem-name <filesystem-name> --rbd-data-pool-name <pool_name> --cluster-name <cluster-name> --restricted-auth-permission true --format <bash> --rgw-endpoint <rgw_endpoint> --namespace <rook-ceph-external>
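
For illustration, a filled-in invocation might look like the following. This is a sketch: the filesystem and cluster names are hypothetical, the pool name is taken from the rbd pool init command above, and --rgw-endpoint is omitted because I did not specify it:

```sh
# Hypothetical invocation; substitute real names from your source cluster.
python3 create-external-cluster-resources.py \
  --cephfs-filesystem-name myfs \
  --rbd-data-pool-name rook_rbd_storage \
  --cluster-name rook-ceph-external \
  --restricted-auth-permission true \
  --format bash \
  --namespace rook-ceph-external
```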

@parth-gr
Member

parth-gr commented Nov 29, 2022

Created a sample PVC using the RBD SC, which remains pending. (I will attach logs)

Please share the logs and the output of kubectl describe for the PVC.

1- If I want to leverage an external ceph cluster so that rook can provide Kubernetes the capability to use SC, PVC and PV. I should be using cluster-external.yaml. (I ask because I'm unclear what cluster-external-management.yaml does management of the cluster, beyond just consuming it). I get different errors going down that approach but I'm under the impression it's not necessary

cluster-external-management is not necessary.
As far as I can see, it just adds some additional fields to the CR:

dataDirHostPath: /var/lib/rook
cephVersion:
  image: quay.io/ceph/ceph:v17.2.5 # Should match external cluster version

Let me know what exactly you want and are not able to get; then I can help you investigate.

2- When extracting the config from the source cluster, I am not specifying the RGW endpoint. Might that be the cause?
https://rook.io/docs/rook/v1.10/CRDs/Cluster/external-cluster/#1-create-all-users-and-keys
python3 create-external-cluster-resources.py --cephfs-filesystem-name <filesystem-name> --rbd-data-pool-name <pool_name> --cluster-name <cluster-name> --restricted-auth-permission true --format <bash> --rgw-endpoint <rgw_endpoint> --namespace <rook-ceph-external>

The --rgw-endpoint flag is RGW-specific, so omitting it won't be the problem here...

@jasminstrkonjic

Hello,

I have the same problem: the PVC is also stuck in Pending when I try to create it.
After following the documentation, OKD is successfully connected to the external cluster.

Client Version: 4.10.0-0.okd-2022-04-23-131357
Server Version: 4.10.0-0.okd-2022-04-23-131357
Kubernetes Version: v1.23.5-rc.0.2062+9ce5071670476d-dirty

ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

oc get CephCluster -n rook-ceph-external
NAME                 DATADIRHOSTPATH   MONCOUNT   AGE   PHASE       MESSAGE                          HEALTH      EXTERNAL
rook-ceph-external                                8h    Connected   Cluster connected successfully   HEALTH_OK   true

There are no error logs in operator:

2023-01-25 21:09:11.343453 W | ceph-spec: waiting for connection info of the external cluster. retrying in 1m0s.
2023-01-25 21:09:57.365779 I | ceph-spec: parsing mon endpoints: ceph01=xxxxxxxx:6789
2023-01-25 21:09:57.365863 I | op-k8sutil: ROOK_OBC_WATCH_OPERATOR_NAMESPACE="true" (configmap)
2023-01-25 21:09:57.365870 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "rook-ceph-external.ceph.rook.io/bucket"
2023-01-25 21:09:57.366259 I | op-bucket-prov: successfully reconciled bucket provisioner
I0125 21:09:57.366327 1 manager.go:135] objectbucket.io/provisioner-manager "msg"="starting provisioner" "name"="rook-ceph-external.ceph.rook.io/bucket"
2023-01-25 21:10:11.351168 I | ceph-spec: parsing mon endpoints: ceph01=xxxxxxxx:6789
2023-01-25 21:10:11.351238 I | ceph-spec: found the cluster info to connect to the external cluster. will use "client.healthchecker" to check health and monitor status. mons=map[ceph01.king-paas.local:0xc002f13a40]
2023-01-25 21:10:11.354183 I | cephclient: writing config file /var/lib/rook/rook-ceph-external/rook-ceph-external.config
2023-01-25 21:10:11.354312 I | cephclient: generated admin config in /var/lib/rook/rook-ceph-external
2023-01-25 21:10:11.354324 I | ceph-cluster-controller: external cluster identity established
2023-01-25 21:10:11.376852 I | ceph-csi: successfully created csi config map "rook-ceph-csi-config"
2023-01-25 21:10:11.386000 I | ceph-cluster-controller: successfully updated csi config map
2023-01-25 21:10:11.386027 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "rook-ceph-external"
2023-01-25 21:10:11.386045 I | ceph-cluster-controller: enabling ceph status monitoring goroutine for cluster "rook-ceph-external"
2023-01-25 21:10:56.914323 I | op-mon: new external mon "ceph02" found: xxxxxxxx:6789, adding it
2023-01-25 21:10:56.914353 I | op-mon: new external mon "ceph03" found: xxxxxxxx:6789, adding it
2023-01-25 21:10:56.971235 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph-external","monitors":["xxxxxxx:6789","xxxxxxxx:6789","xxxxxxxx:6789"],"namespace":""}] data:ceph01=xxxxxxxx:6789,ceph02=xxxxxxxx:6789,ceph03=xxxxxxxx:6789 mapping:{"node":{}} maxMonId:2 outOfQuorum:]
2023-01-25 21:10:56.989508 I | cephclient: writing config file /var/lib/rook/rook-ceph-external/rook-ceph-external.config
2023-01-25 21:10:56.989708 I | cephclient: generated admin config in /var/lib/rook/rook-ceph-external

In project rook-ceph on server

deployment/csi-cephfsplugin-provisioner deploys registry.k8s.io/sig-storage/csi-attacher:v4.1.0,registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1,registry.k8s.io/sig-storage/csi-resizer:v1.7.0,registry.k8s.io/sig-storage/csi-provisioner:v3.4.0,quay.io/cephcsi/cephcsi:v3.7.2
deployment #1 running for 9 hours - 0/2 pods growing to 2

deployment/csi-rbdplugin-provisioner deploys registry.k8s.io/sig-storage/csi-provisioner:v3.4.0,registry.k8s.io/sig-storage/csi-resizer:v1.7.0,registry.k8s.io/sig-storage/csi-attacher:v4.1.0,registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1,quay.io/cephcsi/cephcsi:v3.7.2
deployment #1 running for 9 hours - 0/2 pods growing to 2

deployment/rook-ceph-operator deploys rook/ceph:v1.10.10
deployment #1 running for 9 hours - 1 pod

daemonset/csi-rbdplugin manages registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0,quay.io/cephcsi/cephcsi:v3.7.2
generation #1 running for 9 hours

daemonset/csi-cephfsplugin manages registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0,quay.io/cephcsi/cephcsi:v3.7.2
generation #1 running for 9 hours

In project rook-ceph-external

You have no services, deployment configs, or build configs.
Run 'oc new-app' to create an application.

Also, I do not have the CSI provisioner pod created, the same situation @egulatee has.
It seems that everything is fine, but PVC provisioning does not work.
Can someone help, please?

BR,
Jasmin

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

@jasminstrkonjic can you please check whether https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/ helps you identify the problem? If not, please provide the required CSI logs for debugging this.
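
If it helps, the kind of CSI logs being asked for can be collected with something like this. A sketch: the rook-ceph namespace, label, and container names are assumptions; adjust them to your deployment:

```sh
# Hypothetical log collection for the RBD provisioner pods.
kubectl -n rook-ceph get pods -l app=csi-rbdplugin-provisioner
kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-provisioner
kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-rbdplugin
```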

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

Also, I do not have the CSI provisioner pod created, the same situation @egulatee has.
It seems that everything is fine, but PVC provisioning does not work.

Sorry, I missed this part. Check that you have the CSI deployments and daemonsets created; if not, I think you might need to check the SCC associated with them. Run oc describe on the CSI deployment and daemonset for more details.
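
Concretely, those checks might look like this (a sketch; the namespace is an assumption):

```sh
# Hypothetical commands for the suggested checks.
oc -n rook-ceph get deployments,daemonsets | grep csi
oc -n rook-ceph describe deployment csi-rbdplugin-provisioner
oc -n rook-ceph describe daemonset csi-rbdplugin
```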

@jasminstrkonjic

@Madhu-1 thank you very much for your help.
I had no deployments for any CSI component, which is strange.
SCC was the culprit for the operator: after giving it anyuid, the deployment was scaled up successfully, and after running import-external-cluster.sh the operator connected to the external cluster.

Now I have deleted both namespaces and all resources.
Going to give the whole procedure a second run.
Will let you know how it went.
Do you recommend consolidating all resources into a single namespace?

BR,
Jasmin

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

Everything should work as it is; there is no need for that.

@jasminstrkonjic

Now I have consolidated everything into a single namespace.
The CephCluster is in Connected state and the CSI deployments are created, but the ReplicaSets cannot be scaled up due to these events:

Error creating: pods "csi-cephfsplugin-provisioner-5b6959d889-" is forbidden: unable to validate against any security context constraint: [spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

Error creating: pods "csi-rbdplugin-provisioner-d6df8c994-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

I have tried to grant permissions the same way I did for the operator deployment, but with no luck getting them to start:
oc adm policy add-scc-to-user anyuid -z rook-csi-cephfs-provisioner-sa
oc adm policy add-scc-to-user anyuid -z rook-csi-rbd-provisioner-sa

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

@jasminstrkonjic can you check whether you have created the required SCC for the CSI driver, https://github.com/rook/rook/blob/master/deploy/examples/operator-openshift.yaml#L53-L97? If yes, it looks like some issue with the existing SCCs in the cluster, since OpenShift chooses the SCC based on the capabilities requested.
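
For context, that SCC looks roughly like the abridged sketch below; the linked operator-openshift.yaml is the authoritative definition, so treat this as an approximation:

```yaml
# Abridged sketch of the rook-ceph-csi SCC; see operator-openshift.yaml for the real one.
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: rook-ceph-csi
allowPrivilegedContainer: true
allowHostNetwork: true
allowHostDirVolumePlugin: true
allowHostPorts: true
allowHostPID: true
allowHostIPC: true
allowedCapabilities: ["*"]
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes: ["*"]
users:
  # Service accounts for the CSI driver pods (the namespace is an assumption).
  - system:serviceaccount:rook-ceph:rook-csi-rbd-plugin-sa
  - system:serviceaccount:rook-ceph:rook-csi-rbd-provisioner-sa
  - system:serviceaccount:rook-ceph:rook-csi-cephfs-plugin-sa
  - system:serviceaccount:rook-ceph:rook-csi-cephfs-provisioner-sa
```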

@jasminstrkonjic

jasminstrkonjic commented Jan 26, 2023

Thank you very much, this SCC solved the problem; the CSI pods are now created.
Now I have tried to create a PVC, and it is failing with this event:

failed to provision volume with StorageClass "ceph-rbd": error getting secret rook-csi-rbd-provisioner in namespace rook-ceph-external: secrets "rook-csi-rbd-provisioner" not found

Obviously some secrets have not been created... will try to fix it.

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

You need to change the secret namespace in the StorageClass: you are now using only the rook-ceph namespace for both Ceph and the Rook operator.
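
To make that concrete, here is a sketch of the secret-related StorageClass parameters. The names and namespaces below are assumptions based on this thread; clusterID and the provisioner prefix must match your actual operator namespace:

```yaml
# Hypothetical RBD StorageClass: the *-secret-namespace parameters must name
# the namespace where the rook-csi-* secrets were actually imported.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rook-ceph.rbd.csi.ceph.com  # <operator-namespace>.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph                   # namespace of the CephCluster CR
  pool: <pool_name>
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
```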

@jasminstrkonjic

Found the problem. There was:
Error from server (AlreadyExists): error when creating "STDIN": storageclasses.storage.k8s.io "ceph-rbd" already exists
When I deleted the resources, I forgot to delete the SC.

Fixed by deleting the old SC and creating a new one with the valid namespace parameter.
The PVC is now successfully bound.

Thank you so much.
Owe you a beer

@jasminstrkonjic

@Madhu-1 I have one last request to make this finally work.
The ceph-rbd StorageClass works without any problems for auto-provisioning PVs and PVCs.

CephFS, however, returns permission denied in the client pod (e.g. postgres).
The same errors are visible in the csi-cephfsplugin driver on the node where the client pod is trying to start.

MountVolume.MountDevice failed for volume "pvc-048e93f4-298f-4ba9-a5e8-996310d95793" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph xxxxxxxx:6789,xxxxxxxx:6789,xxxxxxxx:6789:/volumes/csi/csi-vol-ddcbbf39-9d8d-11ed-8436-0a580afe0d21/13e07e0e-f48e-48c6-93ae-68b8593e1a4c /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-048e93f4-298f-4ba9-a5e8-996310d95793/globalmount -o name=csi-cephfs-provisioner,secretfile=/tmp/csi/keys/keyfile-2824771535,mds_namespace=okdfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon 2023-01-26T15:28:05.375+0000 7fbd6ded7f40 -1 failed for service _ceph-mon._tcp mount error 13 = Permission denied

Maybe you can give me a hint on where to focus, since I no longer know where to look.

@Madhu-1
Member

Madhu-1 commented Jan 26, 2023

996310d95793/globalmount -o name=csi-cephfs-provisioner,secretfile=/tmp/csi/keys/keyfile-

You should not use the csi-cephfs-provisioner Ceph user for the stage secret in the StorageClass, because the provisioner user does not have the access required to mount the CephFS volume. It would be best to use the node user, which is in the node secret:

csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
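
In StorageClass terms, the CephFS secrets are split roughly like this (a fragment; the rook-ceph namespace is an assumption):

```yaml
# Provisioner user for creating volumes; node user for staging/mounting them.
csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
```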

@jasminstrkonjic

jasminstrkonjic commented Jan 26, 2023

996310d95793/globalmount -o name=csi-cephfs-provisioner,secretfile=/tmp/csi/keys/keyfile-

You should not use the csi-cephfs-provisioner Ceph user for the stage secret in the StorageClass, because the provisioner user does not have the access required to mount the CephFS volume. It would be best to use the node user, which is in the node secret:

csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node

Thank you again. Dynamic provisioning works like a charm for both storage classes, but only for a moment.
After I deleted a pod, I picked up this event:
Multi-Attach error for volume "pvc-xxxxx" Volume is already exclusively attached to one node and can't be attached to another

I tried switching the PVC between RWO and RWX, but no luck...

@jasminstrkonjic

Also, here are the events. Every pod using the CephFS storage class (as the default storage class) is crashing with not much info...

6s Warning BackOff pod/mariadb-1-5gdv2 Back-off restarting failed container
102s Normal Scheduled pod/mariadb-1-deploy Successfully assigned jasmin/mariadb-1-deploy to worker0
100s Normal AddedInterface pod/mariadb-1-deploy Add eth0 [xxxxxxxxxx] from openshift-sdn
100s Normal Pulled pod/mariadb-1-deploy Container image "quay.io/openshift/okd-content@sha256:6925d60b97415da6e8d02e8bd1289ff7fe5a82719f76e7e12c97c52eb681580d" already present on machine
100s Normal Created pod/mariadb-1-deploy Created container deployment
100s Normal Started pod/mariadb-1-deploy Started container deployment
99s Normal SuccessfulCreate replicationcontroller/mariadb-1 Created pod: mariadb-1-5gdv2
104s Normal ExternalProvisioning persistentvolumeclaim/mariadb waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
104s Normal Provisioning persistentvolumeclaim/mariadb External provisioner is provisioning volume for claim "jasmin/mariadb"
103s Normal ProvisioningSucceeded persistentvolumeclaim/mariadb Successfully provisioned volume pvc-e52e9e62-a32a-42fd-9744-7ba5817b0689
102s Normal DeploymentCreated deploymentconfig/mariadb Created new replication controller "mariadb-1" for version 1

@Madhu-1
Member

Madhu-1 commented Jan 27, 2023

@jasminstrkonjic, please check the pod logs/previous logs to see why it's failing.

@Madhu-1
Member

Madhu-1 commented Jan 27, 2023

Multi-Attach error for volume "pvc-xxxxx" Volume is already exclusively attached to one node and can't be attached to another

This happens for RWO volumes, and the error comes from Kubernetes/OCP, not Rook.

@jasminstrkonjic

jasminstrkonjic commented Jan 27, 2023

@jasminstrkonjic, please check the pod logs/previous logs to see why it's failing.

The problem is only with the CephFS storageclass. If I instantiate, for example, the postgres template, the pod ends up in an error state after a couple of CrashLoopBackOffs, while the PVC and PV are successfully dynamically provisioned and bound.
If I run the pod in debug mode, I can see that CephFS is mounted, and I have tried to run the initdb command. Here is the output:

```
[root@omgr01 ~]# oc debug postgresql-1-56mds
Starting pod/postgresql-1-56mds-debug, command was: container-entrypoint run-postgresql
Pod IP: 10.254.10.64
If you don't see a command prompt, try pressing enter.
sh-4.4$ df -h
Filesystem Size Used Avail Use% Mounted on
overlay 120G 14G 106G 12% /
tmpfs 64M 0 64M 0% /dev
shm 64M 0 64M 0% /dev/shm
tmpfs 3.2G 59M 3.1G 2% /etc/passwd
/dev/sda4 120G 14G 106G 12% /etc/hosts
10.23.13.23:6789,10.23.13.24:6789,10.23.13.25:6789:/volumes/csi/csi-vol-e9d09cd3-9e3f-11ed-8436-0a580afe0d21/f226764c-c134-4be7-af64-4d8c09f607c0 1.0G 0 1.0G 0% /var/lib/pgsql/data
tmpfs 512M 20K 512M 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 7.9G 0 7.9G 0% /proc/acpi
tmpfs 7.9G 0 7.9G 0% /proc/scsi
tmpfs 7.9G 0 7.9G 0% /sys/firmware
sh-4.4$ initdb -D /var/lib/pgsql/
.bash_profile backups/ data/
sh-4.4$ initdb -D /var/lib/pgsql/data/userdata/
The files belonging to this database system will be owned by user "1000610000".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: directory "/var/lib/pgsql/data/userdata" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/pgsql/data/userdata" or run initdb
with an argument other than "/var/lib/pgsql/data/userdata".
sh-4.4$ rm -rf /var/lib/pgsql/data/userdata/*
sh-4.4$ initdb -D /var/lib/pgsql/data/userdata/
The files belonging to this database system will be owned by user "1000610000".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
initdb: could not open file "/var/lib/pgsql/data/userdata/postgresql.conf" for writing: Permission denied
initdb: removing contents of data directory "/var/lib/pgsql/data/userdata"
could not stat file or directory "/var/lib/pgsql/data/userdata/postgresql.conf": Permission denied
initdb: failed to remove contents of data directory
sh-4.4$ ls -al /var/lib/pgsql/
.bash_profile backups/ data/
sh-4.4$ ls -al /var/lib/pgsql/data/userdata/
ls: cannot access '/var/lib/pgsql/data/userdata/postgresql.conf': Permission denied
total 0
drwx------. 2 1000610000 1000610000 1 Jan 27 13:51 .
drwxrwsr-x. 3 root 1000610000 1 Jan 27 13:22 ..
-?????????? ? ? ? ? ? postgresql.conf
```

Also, if I try to manually create a file or directory, it works fine. I'm totally confused now.

```
sh-4.4$ pwd
/var/lib/pgsql/data/userdata
sh-4.4$ touch testfile
sh-4.4$ echo "test" >> testfile
sh-4.4$ cat testfile
test
sh-4.4$ ls -al
ls: cannot access 'postgresql.conf': Permission denied
total 1
drwx------. 2 1000610000 1000610000 2 Jan 27 13:57 .
drwxrwsr-x. 3 root 1000610000 1 Jan 27 13:22 ..
-?????????? ? ? ? ? ? postgresql.conf
-rw-r--r--. 1 1000610000 root 5 Jan 27 13:57 testfile
```

Any help would be much appreciated.

@jasminstrkonjic

The permission denied error occurs in every deployment pod that uses CephFS... I have tried with Jenkins and JBoss.

@jasminstrkonjic

jasminstrkonjic commented Jan 27, 2023

I believe this issue is a lot like this one, related to the SELinux boolean "container_use_cephfs":
ceph/ceph-csi#1097
and more likely this:
okd-project/okd#1160

I can't find a proper note on how to set this up on Fedora CoreOS for OKD...

@jasminstrkonjic

I can confirm that this was the solution for OKD 4.10 on Fedora CoreOS 35:
okd-project/okd#1160

You need to add this parameter to the CephFS storageclass:
kernelMountOptions: wsync

It works! Hooray!
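
For anyone landing here later, the resulting CephFS StorageClass would look roughly like the sketch below. The SC name, clusterID, and secret namespaces are assumptions; fsName is taken from the mds_namespace=okdfs mount option earlier in the thread:

```yaml
# Hypothetical CephFS StorageClass with the wsync fix applied.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: okdfs
  kernelMountOptions: wsync  # the fix from okd-project/okd#1160
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
```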

@parth-gr parth-gr removed their assignment Mar 10, 2023
@github-actions

github-actions bot commented May 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 16, 2023