
ceph: modify CephFS provisioner permission #8739

Merged
merged 1 commit on Sep 22, 2021

Conversation

@Madhu-1
Member

Madhu-1 commented Sep 17, 2021

Description of your changes:
Like RBD, the CephFS provisioner pod need not run as privileged: it does not perform any operation like the plugin pods, which do the mounting and unmounting, so this change removes those permissions.
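To illustrate the distinction (a minimal sketch using corev1 types; the function names and field values are illustrative, not the exact template touched by this PR): the plugin (node) pods keep an elevated security context because they mount and unmount volumes on the host, while the provisioner can run with a plain one.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func boolPtr(b bool) *bool { return &b }

// pluginSecurityContext: the node plugin mounts and unmounts volumes on the
// host, so it keeps the elevated permissions.
func pluginSecurityContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{
		Privileged:               boolPtr(true),
		AllowPrivilegeEscalation: boolPtr(true),
	}
}

// provisionerSecurityContext: the provisioner only talks to the Ceph cluster
// and the Kubernetes API; it performs no mounts, so nothing privileged is set.
func provisionerSecurityContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{}
}

func main() {
	fmt.Printf("plugin: %+v\nprovisioner: %+v\n", pluginSecurityContext(), provisionerSecurityContext())
}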

Signed-off-by: Madhu Rajanna madhupr007@gmail.com

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

mergify bot added the ceph main ceph tag label Sep 17, 2021
leseb added the security label Sep 17, 2021
Member

leseb left a comment

After further thought: I see you have a similar change in ceph/ceph-csi#2518. It's "a lot" of back and forth.
What if ceph-csi would provide its yaml files as code? So that Rook, when pulling a new version for ceph-csi (through go modules) can also get the updated yamls?

@Madhu-1
Member Author

Madhu-1 commented Sep 17, 2021

We don't have such a mechanism in cephcsi, and we are not exposing any functions for external consumption.

@leseb
Member

leseb commented Sep 17, 2021

Yes, but you do have an examples directory, right? And it needs to be synced with Rook all the time, right?

@Madhu-1
Member Author

Madhu-1 commented Sep 17, 2021

Yes, correct.

@leseb
Member

leseb commented Sep 17, 2021

Honestly, it feels like an annoyance to me. @BlaineEXE @travisn thoughts?

@travisn
Member

travisn commented Sep 17, 2021

This would introduce a code dependency between rook and ceph-csi that we don't have today, right? But removing the duplication between repos would certainly be convenient. Anyway, that's a much bigger change, separate from this PR.

@BlaineEXE
Member

I think we wouldn't need any code synchronization if the templates CSI needed were stored using Go 1.16's embed file feature, right?

@travisn
Member

travisn commented Sep 17, 2021

Actually, how would this change work? If the manifests are embedded in the CSI image rather than the Rook image, how would Rook create them?

@leseb
Member

leseb commented Sep 20, 2021

If the manifests are stored with embed, the code still needs to parse and unmarshal them into a known type, e.g. a DaemonSet.

@travisn
Member

travisn commented Sep 20, 2021

But if Rook is going to read the manifest stored with embed, doesn't it have to be embedded in the Rook binary? In that case I don't see how we can move the manifest to the CSI repo.

@Madhu-1
Member Author

Madhu-1 commented Sep 21, 2021

AFAIK, yes, we would need to embed this into the Rook binary. Again, if we need that, it adds more interdependency between Rook and CSI releases, and it is more work in cephcsi to support it.

@leseb
Member

leseb commented Sep 21, 2021

embed only puts the template into the binary; the template still remains as a file within the package itself. So I believe that if CSI provides the template as code in its API, it should work. Rook would not need to read the template file, only the resulting object, e.g. a corev1.DaemonSet.
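As a rough sketch of that idea (everything below is hypothetical: the package, the embedded file name, and the function are made up for illustration and are not an existing ceph-csi API), a ceph-csi package could embed its manifest and hand callers such as Rook a typed object instead of a YAML file:

// Hypothetical ceph-csi package exposing its template as code.
package templates

import (
	_ "embed"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/yaml"
)

// The manifest stays a normal file in the ceph-csi repo; go:embed compiles it
// into whatever binary imports this package (e.g. the Rook operator).
//go:embed cephfs-provisioner-deployment.yaml
var cephfsProvisionerYAML []byte

// CephFSProvisionerDeployment parses the embedded manifest and returns it as a
// typed object that the caller can patch (image, namespace, resources, ...).
func CephFSProvisionerDeployment() (*appsv1.Deployment, error) {
	d := &appsv1.Deployment{}
	if err := yaml.Unmarshal(cephfsProvisionerYAML, d); err != nil {
		return nil, err
	}
	return d, nil
}

Rook would then pull this in through go modules and never have to copy the YAML by hand, which is roughly what the suggestion above amounts to.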

Member

leseb left a comment

LGTM for now; I will bring the topic up in the next CSI meeting.

@travisn
Member

travisn commented Sep 21, 2021

@Madhu-1 The file test is failing consistently in the smoke suite:

2021-09-17 08:38:14.317974 I | testutil: Giving up waiting for pod file-test in namespace smoke-ns to be running
    ceph_base_file_test.go:319: 
        	Error Trace:	ceph_base_file_test.go:319
        	            				ceph_base_file_test.go:262
        	            				ceph_smoke_test.go:127
        	Error:      	Should be true
        	Test:       	TestCephSmokeSuite/TestFileStorage_SmokeTest
        	Messages:   	make sure file-test pod is in running state

Member

travisn left a comment

CI needs investigation

@Madhu-1
Member Author

Madhu-1 commented Sep 21, 2021

I am able to create a PVC and mount it to an application pod locally with this change:

[root@dhcp53-170 ceph]# kubectl get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
cephfs-pvc-1   Bound    pvc-21ab0a8c-db0d-4aba-ae9a-45f27bd4f653   1Gi        RWO            rook-cephfs       3m51s
rbd-pvc-test   Bound    pvc-9a2db094-d507-4664-959c-b6bb2a5c1bda   8Gi        RWO            rook-ceph-block   8h
[root@dhcp53-170 ceph]# 
[root@dhcp53-170 ceph]# kubectl get po
NAME                 READY   STATUS    RESTARTS   AGE
csicephfs-demo-pod   1/1     Running   0          26s

[root@dhcp53-170 ceph]# kubectl get deployment -nrook-ceph csi-cephfsplugin-provisioner -oyaml |grep -i privileged
[root@dhcp53-170 ceph]# kubectl get deployment -nrook-ceph csi-cephfsplugin-provisioner -oyaml |grep -i secu
      securityContext: {}
[root@dhcp53-170 ceph]# 

Let me try to run CI locally.

@subhamkrai
Contributor

FYI @Madhu-1, you can log into the GitHub Actions runner.

@Madhu-1
Member Author

Madhu-1 commented Sep 21, 2021

ssh 4Db8VxxQtXZPbbVqRkAAycKSR@sfo2.tmate.io
The session host disconnected 4 days ago.
Hopefully it will reconnect soon. You may try again later.
Connection to sfo2.tmate.io closed.

Let me try to rerun the test tomorrow and see

@subhamkrai
Contributor

Yes, it has a timeout of 1 hr.

@Madhu-1
Member Author

Madhu-1 commented Sep 22, 2021

runner@fv-az120-915:~/work/rook/rook$ kubectl get po -nsmoke-ns
NAME                                                     READY   STATUS      RESTARTS   AGE
rook-ceph-crashcollector-fv-az120-915-68d566c77f-fcvvh   1/1     Running     0          112s
rook-ceph-mgr-a-78c6ff567c-5z4j8                         1/1     Running     0          10m
rook-ceph-mon-a-68bbf96d58-qdc54                         1/1     Running     0          11m
rook-ceph-mon-b-74978dc6dd-szfqh                         1/1     Running     0          11m
rook-ceph-mon-c-85896c99b6-g2pwg                         1/1     Running     0          11m
rook-ceph-osd-0-5f9c94f77d-4jfsb                         1/1     Running     0          9m21s
rook-ceph-osd-prepare-fv-az120-915-9zb57                 0/1     Completed   0          10m
rook-ceph-tools-7865b9c9f6-tdgcl                         1/1     Running     0          10m
runner@fv-az120-915:~/work/rook/rook$ kubectl exec -it rook-ceph-tools-7865b9c9f6-tdgcl -nsmoke-ns -- sh
sh-4.4# ceph -s
  cluster:
    id:     7ae97632-c490-4967-801e-a773d04ae288
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
            3 pool(s) have no replicas configured
 
  services:
    mon: 3 daemons, quorum a,b,c (age 11m)
    mgr: a(active, since 10m)
    mds: 0/0 daemons up
    osd: 1 osds: 1 up (since 9m), 1 in (since 11m)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 192 pgs
    objects: 3 objects, 0 B
    usage:   7.9 MiB used, 14 GiB / 14 GiB avail
    pgs:     192 active+clean
 
sh-4.4#

The ceph cluster is in an error state. @travisn @subhamkrai can anyone help here?

@subhamkrai
Contributor

Looks similar to this issue: #8745

@Madhu-1
Member Author

Madhu-1 commented Sep 22, 2021

@leseb I am frequently hitting this issue in the CI, any idea what could be wrong?

JFYI, the same template changes work fine in cephcsi (ceph/ceph-csi#2518) and the CI passed.

Like RBD, the CephFS provisioner pod need not
run as privileged, as it is not doing any operation
like the plugin pods, which do the mounting and
unmounting; remove the permissions for the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@leseb
Member

leseb commented Sep 22, 2021

Looks like you just needed to rebase.

leseb requested a review from travisn September 22, 2021 08:47
@Madhu-1
Member Author

Madhu-1 commented Sep 22, 2021

That helped, thank you.

@lhzw

lhzw commented Sep 23, 2021

Hi there, when will this be merged to release-1.7? Thank you.

mergify bot added a commit that referenced this pull request Sep 23, 2021
ceph: modify CephFS provisioner permission (backport #8739)
@travisn
Member

travisn commented Sep 23, 2021

Merged now, and it will be included in v1.7.4, planned for today.
