
Having to add privileged:true in osd deployment spec for blkdevmapper to fix permission denied #9186

Closed
mikementzmaersk opened this issue Nov 16, 2021 · 6 comments · Fixed by #9191

@mikementzmaersk

Is this a bug report or feature request?

Bug Report

Deviation from expected behavior:

OSD Deployments are getting stuck in the pod initialisation phase; the init container blkdevmapper logs "permission denied" when creating a special file.
Adding privileged: true to the blkdevmapper container's security context resolves it.
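For reference, the manual workaround is a small patch to the generated OSD Deployment (sketch only; surrounding fields elided):

```yaml
spec:
  template:
    spec:
      initContainers:
        - name: blkdevmapper
          # ... existing fields unchanged ...
          securityContext:
            privileged: true   # added manually as a workaround
```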

Expected behavior:
Shouldn't have to edit the deployment YAML.

How to reproduce it (minimal and precise):

rook-ceph-operator version v1.7.7-16.g20b74f0

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator's logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>
When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI.
Read GitHub documentation if you need help.

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): OpenShift
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@travisn
Member

travisn commented Nov 16, 2021

This may be fixed by #9158 that just merged. Can you test with the latest test image rook/ceph:v1.7.7-16.g20b74f0 to see if it solves this issue? This will be included in the v1.7.8 release in the next couple days.

@travisn
Member

travisn commented Nov 16, 2021

Ah, just noticed you're already using that image. @Omar007 is rook/ceph:v1.7.7-16.g20b74f0 working for you now? It seems the pod still needs to be privileged.

@travisn travisn added this to To do in v1.7 via automation Nov 16, 2021
@travisn travisn moved this from To do to Blocking Release in v1.7 Nov 16, 2021
@Omar007
Contributor

Omar007 commented Nov 16, 2021

I'll give that version a shot later but at the time of the PR, it solved it on my end. So unless I've missed a different permission that is also relevant or it somehow wasn't granted, it should be working 🤔

@mikementzmaersk Is there any output/logging available? Or is it the exact same as in #9156 ?

@mikementzmaersk
Author

Hi
Just took a look at #9156 - it is certainly the same error message.

What I did note though was that the MKNOD capability was already in the blkdevmapper deployment (as we also had to add MKNOD to the allowedCapabilities section of the SCC) in order to get the pod created.
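(For anyone else on OpenShift: the SCC change mentioned above looks roughly like this; the SCC name here is illustrative and should match whatever your Rook install uses:)

```yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: rook-ceph          # illustrative; use your cluster's SCC name
allowedCapabilities:
  - MKNOD                  # lets the blkdevmapper init container mknod
# ... rest of the SCC unchanged ...
```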

Just now trying to figure out whether it really should run as privileged or not. The other init containers all have privileged: true in their security contexts.

@Omar007
Contributor

Omar007 commented Nov 16, 2021

Since you're on OpenShift I assume you are running with hostpathRequiresPrivileged: true, which would be why the others (and blkdevmapper before my change) all run fully privileged.
As blkdevmapper isn't doing anything with host paths, that flag was completely dropped in that change, and I still would not expect privileged: true to be required for this container at all. At the moment it looks to me like something specific to OpenShift that was always hidden because the container simply ran privileged (just like the MKNOD capability only came to light because I ran non-privileged, on a container engine that does not grant that capability by default).
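For context, the failing operation is essentially a mknod of a block special file, which the kernel gates on CAP_MKNOD; a minimal illustration (path and device numbers are made up):

```python
import os
import stat

def create_block_device_node(path: str, major: int, minor: int) -> bool:
    """Create a block device special file, roughly what blkdevmapper does.

    Without CAP_MKNOD (and, on SELinux-enforcing hosts such as OpenShift
    nodes, a permitting policy) the mknod syscall fails, which surfaces
    as the "permission denied creating special file" error in this issue.
    """
    try:
        os.mknod(path, mode=stat.S_IFBLK | 0o600,
                 device=os.makedev(major, minor))
        return True
    except PermissionError:
        return False

# os.makedev packs major/minor into the dev_t the kernel expects;
# 8:0 is traditionally the first SCSI disk (/dev/sda).
dev = os.makedev(8, 0)
```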

However, I do not have an OpenShift cluster in my back pocket to try this on and my Linux knowledge does not go much further than those base capabilities. I do know OpenShift/Red Hat tends to do a lot of SELinux level stuff, maybe that becomes relevant and also requires certain capabilities? Sadly that is completely out of my area of expertise/knowledge so I can't really help there :(

In the meantime, if we do not know what is involved for this case and it's apparently breaking OpenShift deployments, maybe we need to update the function I added to listen to the hostpathRequiresPrivileged flag as well, keeping the backward-compatible behaviour (workaround) in place until this can be figured out in more detail and with the proper set of settings 🤔
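The fallback I have in mind would look something like this (sketched in Python for brevity; the real operator code is Go, and all names here are hypothetical):

```python
def blkdevmapper_security_context(hostpath_requires_privileged: bool) -> dict:
    """Hypothetical sketch of the proposed fallback.

    Keep blkdevmapper fully privileged whenever the operator-wide
    hostpathRequiresPrivileged flag is set (the pre-change behaviour);
    otherwise run it non-privileged with only the MKNOD capability,
    which is what the mknod of the device node actually needs.
    """
    if hostpath_requires_privileged:
        # Backward-compatible workaround: fully privileged, as before.
        return {"privileged": True}
    # Minimal privileges path.
    return {"privileged": False, "capabilities": {"add": ["MKNOD"]}}
```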

@travisn
Member

travisn commented Nov 16, 2021

@leseb is investigating the minimal privileges with #9175 and testing on OpenShift. In the meantime, it sounds reasonable to listen to the flag and run them privileged.

v1.7 automation moved this from Blocking Release to Done Nov 17, 2021