
mgr/rook: update ceph orch apply nfs #43046

Merged — 4 commits merged on Nov 11, 2021

Conversation

josephsawaya (Contributor):

This PR updates the ceph orch apply nfs command in the Rook orchestrator to keep up with the changes made to the NFS module: it prevents the creation of NFS daemons that do not use the '.nfs' RADOS pool, and it creates that pool if it does not exist when a user tries to create an NFS daemon.

Depends on rook #8501
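The pool handling described above can be sketched roughly as follows. This is a simplified illustration, not the actual module code: `check_nfs_pool` and `existing_pools` are hypothetical names, and the real module issues a mon command to create the pool.

```python
# Hypothetical sketch of the pool handling described above: NFS specs that
# name any RADOS pool other than '.nfs' are rejected, and '.nfs' is created
# when it is missing. check_nfs_pool / existing_pools are illustrative names.
NFS_POOL = '.nfs'

def check_nfs_pool(requested_pool, existing_pools):
    """Validate the requested pool and return the pools that must be created."""
    if requested_pool is not None and requested_pool != NFS_POOL:
        raise ValueError(
            "NFS daemons must use the %r RADOS pool, got %r"
            % (NFS_POOL, requested_pool))
    # Create the well-known pool only if it does not already exist.
    return [] if NFS_POOL in existing_pools else [NFS_POOL]
```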

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@varshar16 (Contributor) left a comment:

Please address the following comments

src/pybind/mgr/rook/module.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
@varshar16 (Contributor) left a comment:

Apart from the comment below, it would be great if you could add QA tests.

src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
src/pybind/mgr/rook/rook_cluster.py (outdated, resolved)
@varshar16 (Contributor) left a comment:

Please move the imports. Otherwise looks good.

src/pybind/mgr/rook/rook_cluster.py (resolved)
@varshar16 (Contributor) left a comment:

Please run the QA tests after the PR this depends on is merged. Otherwise looks good.

@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
return self.rook_cluster.rm_service('cephobjectstores', service_name)
elif service_type == 'nfs':
return self.rook_cluster.rm_service('cephnfses', service_name)
elif service_type == 'ingress':
A Member left a comment:

This is not harmful, but I think nobody is going to create an ingress service in a Rook cluster, so this point is difficult to reach. In any case, I think the operation that must be blocked is the creation of an orchestrator ingress service within the Rook orchestrator.
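Blocking ingress at apply time, as the comment suggests, could be sketched like this. The function name is hypothetical, not the real module entry point, and the real orchestrator would raise through its own error types:

```python
# Hypothetical sketch: reject ingress specs up front in the Rook orchestrator
# instead of letting the operation fail later at removal time.
def apply_service(service_type: str) -> str:
    if service_type == 'ingress':
        raise NotImplementedError(
            "the Rook orchestrator does not support ingress services")
    return f"scheduled {service_type}"
```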

# TODO use spec.placement
# TODO warn if spec.extended has entries we don't know how
# to action.
# TODO Number of pods should be based on the list of hosts in the
# PlacementSpec.
assert spec.service_id, "service id in NFS service spec cannot be an empty string or None " # for mypy typing
@jmolmo (Member) commented on Sep 14, 2021:

No error management (in the whole method). We have commented on this problem in other PRs. Basically, if we do not include basic error management in these methods, we report programming and unexpected errors in the same way as operational errors.
At least we should avoid dumping the stack trace in the console (logging it is fine, but on the console it causes panic in end users).

Example:

# ceph orch apply nfs jmo_nfs

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1564, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 167, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 415, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1155, in _apply_nfs
    return self._apply_misc([spec], dry_run, format, no_overwrite)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1062, in _apply_misc
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 224, in raise_if_exception
    raise e
kubernetes.client.rest.ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': '611aedf2-314e-4f9b-a828-1c2c5a292027', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '21f0f9d3-30f5-4bef-a93b-e25595220817', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ce436ed7-8716-4a58-8b63-07fd783fdd5c', 'Date': 'Tue, 14 Sep 2021 11:10:23 GMT', 'Content-Length': '908'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"CephNFS.ceph.rook.io \"jmo_nfs\" is invalid: metadata.name: Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","reason":"Invalid","details":{"name":"jmo_nfs","group":"ceph.rook.io","kind":"CephNFS","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"jmo_nfs\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}
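One way to address the comment above, sketched under assumed names (not the actual orchestrator API), is to keep the full traceback in the mgr log while surfacing only a short message to the CLI:

```python
import logging

log = logging.getLogger(__name__)

def run_with_friendly_errors(fn, *args, **kwargs):
    """Run fn; keep the full stack trace in the log, not on the console."""
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        log.exception("orchestrator operation failed")  # full trace -> log only
        # Re-raise a short, user-facing error without the original traceback.
        raise RuntimeError(f"Operation failed: {e}") from None
```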

@@ -431,6 +433,9 @@ def remove_service(self, service_name: str) -> str:
return self.rook_cluster.rm_service('cephobjectstores', service_name)
elif service_type == 'nfs':
A Member left a comment:

Ceph auth keys for the NFS service are not deleted. This cleanup is an operation that needs to be done in ALL the orchestrators:

# ceph auth ls

client.nfs-ganesha.jmonfs.a
	key: AQChg0Bh5V7ZHRAAFMydhUKjSPKvZRxV7qTBsg==
	caps: [mon] allow r
	caps: [osd] allow rw pool=.nfs namespace=jmonfs
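Cleanup could be sketched like this; the entity naming follows the `client.nfs-ganesha.<cluster>.<id>` pattern shown above, but the function and parameter names are hypothetical. In a real mgr module each computed name would be passed to an `auth rm` mon command; here only the names are computed:

```python
# Hypothetical sketch of enumerating the auth entities to delete when an NFS
# service is removed, following the client.nfs-ganesha.<cluster>.<id> pattern.
def nfs_auth_entities(cluster_id, daemon_ids):
    """Return the auth entity names backing an NFS service's daemons."""
    return [f"client.nfs-ganesha.{cluster_id}.{d}" for d in daemon_ids]
```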

@jmolmo (Member) left a comment:

Apart from the issues pointed out in the comments, the service list is not properly managed:

[root@rook-ceph-tools-78cdfd976c-qc6fm /]# ceph orch ls
NAME          PORTS  RUNNING  REFRESHED  AGE  PLACEMENT  
crash                    3/3  0s ago     38m  *          
mgr                      1/1  0s ago     38m  count:1    
mon                      3/3  0s ago     39m  count:3    
nfs.test2nfs             0/1  0s ago     -    count:1  <--------------  Number of nfs pods not updated properly
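Computing the RUNNING column from pod states could look like the sketch below (a hypothetical helper; the real fix queries the Kubernetes API for the pods backing each service and fills in the orchestrator's ServiceDescription):

```python
# Hypothetical sketch: render the 'running/desired' column of `ceph orch ls`
# from Kubernetes pod phases, counting only pods in the 'Running' phase.
def format_running(pod_phases, desired):
    running = sum(1 for phase in pod_phases if phase == 'Running')
    return f"{running}/{desired}"
```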

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@sebastian-philipp (Contributor):

jenkins test sign


Joseph Sawaya added 4 commits November 10, 2021 12:07
This commit moves the functionality for creating the .nfs pool from the
nfs module to the rook module and makes the rook module use the .nfs
pool when creating an NFS daemon.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit prevents ceph orch apply nfs from creating NFS clusters that don't use the .nfs RADOS pool.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit adds apply nfs to the rook qa task to check that the
command runs without errors; it doesn't actually verify that an NFS
daemon was created.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
This commit updates orch ls to show the age and the number of running nfs
pods, removes auth entities when removing an nfs service and implements
better error checking when creating nfs daemons.

Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
5 participants