mgr/rook: update ceph orch apply nfs
#43046
Conversation
Please address the following comments
Force-pushed from cf671fd to f29c194
Apart from the comment below, it would be great if you could add qa tests.
Force-pushed from f29c194 to 68d5044
Please move the imports. Otherwise looks good.
Force-pushed from 68d5044 to 114a74d
Please run the qa tests after the PR this depends on is merged. Otherwise looks good.
@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
            return self.rook_cluster.rm_service('cephobjectstores', service_name)
        elif service_type == 'nfs':
            return self.rook_cluster.rm_service('cephnfses', service_name)
+       elif service_type == 'ingress':
This is not harmful, but I think nobody is going to create an ingress service in a Rook cluster, so it is difficult to reach this point. In any case, I think the operation that must be blocked is the creation of an orchestrator ingress service within the Rook orchestrator.
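A minimal sketch of the suggestion above, with assumed method and exception names (the real Rook orchestrator module may structure this differently): reject the ingress spec at apply time, instead of only handling 'ingress' in remove_service.

```python
# Hypothetical sketch: block creation of an ingress service in the Rook
# orchestrator at apply time, as the reviewer suggests. The function and
# exception names are assumptions, not the module's actual API.
class OrchestratorError(Exception):
    """Error surfaced to the user by the orchestrator CLI."""


def apply_ingress(spec):
    # Fail fast with a clear message rather than letting the request
    # reach Rook and fail in an opaque way later.
    raise OrchestratorError(
        "ingress services are not supported by the Rook orchestrator"
    )
```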
        # TODO use spec.placement
        # TODO warn if spec.extended has entries we don't know how
        # to action.
        # TODO Number of pods should be based on the list of hosts in the
        # PlacementSpec.
        assert spec.service_id, "service id in NFS service spec cannot be an empty string or None"  # for mypy typing
No error management (in the whole method). We have commented on this problem in other PRs. Basically, if we do not include basic error management in these methods, we end up reporting programming and unexpected errors in the same way as operational errors.
At least we should avoid dumping the stack trace to the console (it is good to do that in the log, but in the console it causes panic for end users).
Example:
# ceph orch apply nfs jmo_nfs
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1564, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 167, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 415, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 1155, in _apply_nfs
return self._apply_misc([spec], dry_run, format, no_overwrite)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 1062, in _apply_misc
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 224, in raise_if_exception
raise e
kubernetes.client.rest.ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': '611aedf2-314e-4f9b-a828-1c2c5a292027', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '21f0f9d3-30f5-4bef-a93b-e25595220817', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ce436ed7-8716-4a58-8b63-07fd783fdd5c', 'Date': 'Tue, 14 Sep 2021 11:10:23 GMT', 'Content-Length': '908'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"CephNFS.ceph.rook.io \"jmo_nfs\" is invalid: metadata.name: Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","reason":"Invalid","details":{"name":"jmo_nfs","group":"ceph.rook.io","kind":"CephNFS","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}
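The 422 above could be caught before the request ever reaches Kubernetes. A sketch of one way to do it (the helper name `validate_service_id` is an assumption, not code from this PR): validate the service id against the same RFC 1123 subdomain rule Kubernetes applies to metadata.name, so the user sees a one-line error instead of a traceback.

```python
import re

# RFC 1123 subdomain pattern, taken from the validation regex in the
# Kubernetes error body above. Hypothetical helper; the rook module
# does not currently ship this check.
RFC1123_SUBDOMAIN = re.compile(
    r'^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
)


def validate_service_id(service_id):
    """Reject names Kubernetes would bounce with a 422."""
    if not service_id:
        raise ValueError("service id in NFS service spec cannot be empty")
    if not RFC1123_SUBDOMAIN.match(service_id):
        raise ValueError(
            f"invalid service id {service_id!r}: must be a lowercase "
            "RFC 1123 subdomain (lowercase alphanumerics, '-' or '.')"
        )


# 'jmo_nfs' is the name from the traceback above; the underscore is
# rejected here with a short message instead of a stack trace.
try:
    validate_service_id('jmo_nfs')
except ValueError as e:
    print(f"Error EINVAL: {e}")
```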
@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
            return self.rook_cluster.rm_service('cephobjectstores', service_name)
        elif service_type == 'nfs':
Ceph auth keys for the nfs service are not deleted. This is an operation that needs to be done in ALL the orchestrators:
# ceph auth ls
client.nfs-ganesha.jmonfs.a
key: AQChg0Bh5V7ZHRAAFMydhUKjSPKvZRxV7qTBsg==
caps: [mon] allow r
caps: [osd] allow rw pool=.nfs namespace=jmonfs
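One possible shape for that cleanup, as a sketch (the helper name and the entity-name prefix are assumptions inferred from the `ceph auth ls` output above): collect the nfs-ganesha entities belonging to the service and build the `auth rm` mon commands the orchestrator would then issue via `mon_command`.

```python
# Hypothetical sketch of auth-key cleanup on NFS service removal.
# The entity prefix mirrors 'client.nfs-ganesha.jmonfs.a' from the
# 'ceph auth ls' output above; the helper name is an assumption.
def auth_rm_commands(auth_entities, service_id):
    """Build 'auth rm' mon-command dicts for a service's ganesha keys."""
    prefix = f'client.nfs-ganesha.{service_id}.'
    return [
        {'prefix': 'auth rm', 'entity': entity}
        for entity in auth_entities
        if entity.startswith(prefix)
    ]


entities = ['client.admin', 'client.nfs-ganesha.jmonfs.a']
for cmd in auth_rm_commands(entities, 'jmonfs'):
    print(cmd)  # each dict would be passed to mon_command()
```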
Apart from the issues pointed out in the comments, the listing of the service is not properly managed:
[root@rook-ceph-tools-78cdfd976c-qc6fm /]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
crash 3/3 0s ago 38m *
mgr 1/1 0s ago 38m count:1
mon 3/3 0s ago 39m count:3
nfs.test2nfs 0/1 0s ago - count:1 <-------------- Number of nfs pods not updated properly
Force-pushed from 114a74d to 31ad8cd
Force-pushed from 31ad8cd to 6beb03c
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
Force-pushed from 6beb03c to c06d253
jenkins test sign
Force-pushed from c06d253 to 90e62f2
Force-pushed from 90e62f2 to 88b0cc7
This commit moves the functionality for creating the .nfs pool from the nfs module to the rook module and makes the rook module use the .nfs pool when creating an NFS daemon. Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit prevents the creation of NFS clusters that don't use the .nfs RADOS pool using ceph orch apply nfs. Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit adds apply nfs to the rook qa task to see if the command runs with no errors, this doesn't actually check if an NFS daemon was created. Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit updates orch ls to show the age and the number of running nfs pods, removes auth entities when removing an nfs service and implements better error checking when creating nfs daemons. Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Force-pushed from 88b0cc7 to 672e904
This PR updates the ceph orch apply nfs command in the Rook orchestrator to keep up with the changes made to the NFS module, by preventing the creation of NFS daemons that do not use the '.nfs' RADOS pool and by creating that pool if it doesn't exist when a user tries to create an NFS daemon.

Depends on rook #8501
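The pool restriction described above can be sketched as a simple check (names here are illustrative assumptions, not the PR's actual code):

```python
# Sketch of the '.nfs' pool restriction this PR describes. The function
# name is an assumption; the actual rook module code may differ.
NFS_POOL = '.nfs'


def check_nfs_pool(spec_pool):
    """Reject NFS specs that target any RADOS pool other than '.nfs'.

    The real change also creates the '.nfs' pool if it does not exist;
    that step needs a cluster connection and is omitted from this sketch.
    """
    if spec_pool and spec_pool != NFS_POOL:
        raise ValueError(
            f"NFS clusters must use the {NFS_POOL!r} RADOS pool, "
            f"got {spec_pool!r}"
        )
```

An unset pool passes (the orchestrator can then default it to '.nfs'), while an explicit non-'.nfs' pool is rejected before anything is created.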
Checklist
Available Jenkins commands:
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox