
mgr/rook: update ceph orch apply nfs #43046

Merged — 4 commits merged on Nov 11, 2021

Conversation

josephsawaya (Contributor):

This PR updates the ceph orch apply nfs command in the Rook orchestrator to keep up with the changes made to the NFS module: it prevents the creation of NFS daemons that do not use the '.nfs' RADOS pool, and it creates that pool if it does not exist when a user tries to create an NFS daemon.

Depends on rook #8501
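The pool handling described above can be sketched roughly as follows. This is a simplified illustration, not the actual module code: `check_nfs_pool` and `existing_pools` are hypothetical names, and the real module issues a mon command to create the pool.

```python
# Hypothetical sketch of the pool handling described above: NFS specs that
# name any RADOS pool other than '.nfs' are rejected, and '.nfs' is created
# when it is missing. check_nfs_pool / existing_pools are illustrative names.
NFS_POOL = '.nfs'

def check_nfs_pool(requested_pool, existing_pools):
    """Validate the requested pool and return the pools that must be created."""
    if requested_pool is not None and requested_pool != NFS_POOL:
        raise ValueError(
            "NFS daemons must use the %r RADOS pool, got %r"
            % (NFS_POOL, requested_pool))
    # Create the well-known pool only if it does not already exist.
    return [] if NFS_POOL in existing_pools else [NFS_POOL]
```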

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@varshar16 (Contributor) left a comment:

Please address the following comments

src/pybind/mgr/rook/module.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
@varshar16 (Contributor) left a comment:

Apart from the comment below, it would be great if you could add QA tests.

src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
@varshar16 (Contributor) left a comment:

Please move the imports. Otherwise looks good.

src/pybind/mgr/rook/rook_cluster.py (resolved)
@varshar16 (Contributor) left a comment:

Please run the QA tests after the PR this depends on is merged. Otherwise looks good.

@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
return self.rook_cluster.rm_service('cephobjectstores', service_name)
elif service_type == 'nfs':
return self.rook_cluster.rm_service('cephnfses', service_name)
elif service_type == 'ingress':
A Member left a comment:

This is not harmful, but I think nobody is going to create an ingress service in a Rook cluster, so this point is difficult to reach. In any case, I think the operation that must be blocked is the creation of an orchestrator ingress service within the Rook orchestrator.
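Blocking ingress at apply time, as the comment suggests, could be sketched like this. The function name is hypothetical, not the real module entry point, and the real orchestrator would raise through its own error types:

```python
# Hypothetical sketch: reject ingress specs up front in the Rook orchestrator
# instead of letting the operation fail later at removal time.
def apply_service(service_type: str) -> str:
    if service_type == 'ingress':
        raise NotImplementedError(
            "the Rook orchestrator does not support ingress services")
    return f"scheduled {service_type}"
```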

# TODO use spec.placement
# TODO warn if spec.extended has entries we don't know how
# to action.
# TODO Number of pods should be based on the list of hosts in the
# PlacementSpec.
assert spec.service_id, "service id in NFS service spec cannot be an empty string or None " # for mypy typing
@jmolmo (Member) commented on Sep 14, 2021:

No error management (in the whole method). We have commented on this problem in other PRs. Basically, if we do not include basic error management in these methods, we report programming and unexpected errors in the same way as operational errors.
At least we should avoid dumping the stack trace in the console (logging it is fine, but on the console it causes panic in end users).

Example:

# ceph orch apply nfs jmo_nfs

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1564, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 167, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 415, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1155, in _apply_nfs
    return self._apply_misc([spec], dry_run, format, no_overwrite)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1062, in _apply_misc
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 224, in raise_if_exception
    raise e
kubernetes.client.rest.ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': '611aedf2-314e-4f9b-a828-1c2c5a292027', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '21f0f9d3-30f5-4bef-a93b-e25595220817', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ce436ed7-8716-4a58-8b63-07fd783fdd5c', 'Date': 'Tue, 14 Sep 2021 11:10:23 GMT', 'Content-Length': '908'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"CephNFS.ceph.rook.io \"jmo_nfs\" is invalid: metadata.name: Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","reason":"Invalid","details":{"name":"jmo_nfs","group":"ceph.rook.io","kind":"CephNFS","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}
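One way to address the comment above, sketched under assumed names (not the actual orchestrator API), is to keep the full traceback in the mgr log while surfacing only a short message to the CLI:

```python
import logging

log = logging.getLogger(__name__)

def run_with_friendly_errors(fn, *args, **kwargs):
    """Run fn; keep the full stack trace in the log, not on the console."""
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        log.exception("orchestrator operation failed")  # full trace -> log only
        # Re-raise a short, user-facing error without the original traceback.
        raise RuntimeError(f"Operation failed: {e}") from None
```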

@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
return self.rook_cluster.rm_service('cephobjectstores', service_name)
elif service_type == 'nfs':
A Member left a comment:

Ceph auth keys for the NFS service are not deleted. This cleanup is an operation that needs to be done in ALL the orchestrators:

# ceph auth ls

client.nfs-ganesha.jmonfs.a
	key: AQChg0Bh5V7ZHRAAFMydhUKjSPKvZRxV7qTBsg==
	caps: [mon] allow r
	caps: [osd] allow rw pool=.nfs namespace=jmonfs
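Cleanup could be sketched like this; the entity naming follows the `client.nfs-ganesha.<cluster>.<id>` pattern shown above, but the function and parameter names are hypothetical. In a real mgr module each computed name would be passed to an `auth rm` mon command; here only the names are computed:

```python
# Hypothetical sketch of enumerating the auth entities to delete when an NFS
# service is removed, following the client.nfs-ganesha.<cluster>.<id> pattern.
def nfs_auth_entities(cluster_id, daemon_ids):
    """Return the auth entity names backing an NFS service's daemons."""
    return [f"client.nfs-ganesha.{cluster_id}.{d}" for d in daemon_ids]
```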

@jmolmo (Member) left a comment:

Apart from the issues pointed out in the comments, the service list is not properly managed:

[root@rook-ceph-tools-78cdfd976c-qc6fm /]# ceph orch ls
NAME          PORTS  RUNNING  REFRESHED  AGE  PLACEMENT  
crash                    3/3  0s ago     38m  *          
mgr                      1/1  0s ago     38m  count:1    
mon                      3/3  0s ago     39m  count:3    
nfs.test2nfs             0/1  0s ago     -    count:1  <--------------  Number of nfs pods not updated properly
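Computing the RUNNING column from pod states could look like the sketch below (a hypothetical helper; the real fix queries the Kubernetes API for the pods backing each service and fills in the orchestrator's ServiceDescription):

```python
# Hypothetical sketch: render the 'running/desired' column of `ceph orch ls`
# from Kubernetes pod phases, counting only pods in the 'Running' phase.
def format_running(pod_phases, desired):
    running = sum(1 for phase in pod_phases if phase == 'Running')
    return f"{running}/{desired}"
```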

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@sebastian-philipp (Contributor):

jenkins test sign


Joseph Sawaya added 4 commits November 10, 2021 12:07
This commit moves the functionality for creating the .nfs pool from the
nfs module to the rook module and makes the rook module use the .nfs
pool when creating an NFS daemon.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit prevents ceph orch apply nfs from creating NFS clusters that don't use the .nfs RADOS pool.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit adds apply nfs to the rook qa task to check that the
command runs without errors; it doesn't actually verify that an NFS
daemon was created.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit updates orch ls to show the age and the number of running nfs
pods, removes auth entities when removing an nfs service and implements
better error checking when creating nfs daemons.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
5 participants