ceph: remove RADOS options from CephNFS and use .nfs pool #8501
Conversation
- What's the upgrade story? For example, if someone has nfs configured and they upgrade, will their existing shares stop working?
- In what Ceph version is the .nfs pool required?
Looks like this should help answer your questions.
I'm not too familiar with how Rook handles upgrading, but it looks like the NFS module should handle it on the backend. I'm not sure what would happen to the NFS CRDs, though, since there would be fields that no longer exist.
I'm also not too familiar with Ceph versioning, but it looks like the pool would be required in later versions of Pacific and onward.
It looks like this is for Quincy only and won't be backported to Pacific. But still, we need a good upgrade story for this. Perhaps we can just rename whatever pool was used for NFS data to .nfs. That should be sufficient, IMO, to not break existing clusters. However, this needs to be carefully tested.
pkg/operator/ceph/nfs/nfs.go
Outdated
@@ -254,6 +255,21 @@ func instanceName(n *cephv1.CephNFS, name string) string {
	return fmt.Sprintf("%s-%s-%s", AppName, n.Name, name)
}

// create and enable .nfs RADOS pool
func createNFSRADOSPool(context *clusterd.Context, clusterInfo *cephclient.ClusterInfo) error {
	args := []string{"osd", "pool", "create", ".nfs"}
Suggested change:
- args := []string{"osd", "pool", "create", ".nfs"}
+ args := []string{"osd", "pool", "create", radosPoolName}
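The suggestion replaces the string literal with a shared constant. A minimal, self-contained sketch of how the two CLI argument lists discussed in this thread might be assembled around such a constant (the `radosPoolName` constant and the helper names here are illustrative, not Rook's actual code):

```go
package main

import "fmt"

// radosPoolName mirrors the constant proposed in the review;
// the ".nfs" pool name is hardcoded in Ceph's NFS module.
const radosPoolName = ".nfs"

// poolCreateArgs builds the `ceph osd pool create .nfs` argument list.
func poolCreateArgs() []string {
	return []string{"osd", "pool", "create", radosPoolName}
}

// appEnableArgs builds the `ceph osd pool application enable .nfs nfs`
// argument list used right after the pool is created.
func appEnableArgs() []string {
	return []string{"osd", "pool", "application", "enable", radosPoolName, "nfs"}
}

func main() {
	fmt.Println(poolCreateArgs())
	fmt.Println(appEnableArgs())
}
```

Using one constant in both places keeps the create and application-enable calls from drifting apart if the pool name ever changes.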
pkg/operator/ceph/nfs/nfs.go
Outdated
if err != nil {
	return err
}
args = []string{"osd", "pool", "application", "enable", ".nfs", "nfs"}
ditto
pkg/operator/ceph/nfs/nfs.go
Outdated
// Create the .nfs pool if it doesn't already exist
err := createNFSRADOSPool(context, clusterInfo)
if err != nil {
	return errors.Wrapf(err, ".nfs pool not found and unable to create it ")
Suggested change:
- return errors.Wrapf(err, ".nfs pool not found and unable to create it ")
+ return errors.Wrapf(err, "failed to find %q pool and unable to create it", radosPoolName)
I think it's possible that before, users were able to use a different pool for each NFS daemon, so there might have been multiple pools containing NFS data. I don't think renaming the pool would work in that case.
Force-pushed from 4335b6e to c66cbd4
🤔 Hmm, yes, if you had multiple CephNFS servers running? AFAIK Ganesha HA support hasn't been that good, and I wonder how common this is. Still, we should document a migration path that we believe would work in some scenarios.
As discussed in huddle, after we confirm which version of Ceph has the changes, we can add a check in Rook for the version where the new approach applies. Search for
So the CephNFS CRD would have to support both approaches?
Yes, Rook really needs to deal with the versioning issues, since Rook supports multiple versions of Ceph. NFS is not commonly used, so realistically we don't need to maintain full compatibility, but we at least need some time period where both modes are supported and a migration plan that is fully understood.
OK, so I guess an approach could be to keep the CRDs the same as they are right now and just make it clear through the documentation that, for Ceph versions newer than Pacific 16.2.5 (or whatever version the NFS changes land in), those fields in the CRD aren't actually used and Ceph uses the default ".nfs" pool. For the unit tests and integration tests, how would I write tests for different Ceph versions?
Yes, if we leave the settings in the CRD and then document the difference, that should work. In newer Ceph versions, those properties would just be ignored.
We already run the integration tests on multiple versions of Kubernetes, so we should have coverage across versions in those tests. For the unit tests, you could just test smaller functions that don't depend on the Ceph version.
Force-pushed from 6a23c94 to b6ce581
LGTM, just one final question.
Force-pushed from de4c6b1 to 2917e08
LGTM. To test the upgrades manually, I'd suggest you deploy nfs from the rook master and ceph/ceph:v16.2.5, and verify it's working based on the old approach. Then you can update the ceph image in the cephcluster CR to latest pacific (quay.io/ceph/daemon-base:latest-pacific-devel) and confirm the nfs upgrade works.
There are more issues with this: I tried running a cluster on Ceph 16.2.5, and it looks like that version of Ceph (and some preceding versions) also has a default pool name, so it's impossible to create exports. I'm not sure if the diff between 16.2.5 and the master branch is just the name of the default pool or if anything else in the NFS module changed. I somewhat fixed the issue mentioned above, but the exports aren't moved to the correct pool when you upgrade the Ceph version.
@batrick @liewegas Any suggestions for migrating to the new .nfs pool in latest Pacific? Not sure many users will be impacted, but at least some manual procedure to update would be helpful.
Force-pushed from 2917e08 to e8801ea
pkg/operator/ceph/nfs/controller.go
Outdated
if r.clusterInfo.CephVersion.IsAtLeast(version.CephVersion{Major: 16, Minor: 2, Extra: 6}) {
	cephNFS.Spec.RADOS.Pool = dotNFSPoolName
} else {
	cephNFS.Spec.RADOS.Pool = nfsGaneshaPoolName
Prior to 16.2.6, didn't the user have to set this in the CR? If so, we shouldn't overwrite it.
Your PR currently doesn't allow it to be set for octopus either. Seems like we can remove the else block to keep the current behavior that they set the pool name in the CR if older than 16.2.6.
So the way I see it:
- Octopus: Customization is allowed, so don't change the pool and namespace
- Pacific before 16.2.6: No customization, default pool name is nfs-ganesha
- Pacific 16.2.6 and later: No customization, default pool name is .nfs
This code changes the pool and namespace only if the version is Pacific: if it's at or above 16.2.6, it overwrites the pool to ".nfs"; otherwise it changes it to "nfs-ganesha". If the version precedes Pacific, it doesn't change anything, and the pool stays whatever the user provided in the spec.
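The three cases can be sketched as a small, self-contained function. The `CephVersion` struct and `defaultPool` helper below are simplified stand-ins for Rook's `version.CephVersion` and the controller logic, not the actual implementation:

```go
package main

import "fmt"

// CephVersion is a simplified stand-in for Rook's version.CephVersion type.
type CephVersion struct {
	Major, Minor, Extra int
}

// IsAtLeast reports whether v is at or above the given version.
func (v CephVersion) IsAtLeast(o CephVersion) bool {
	if v.Major != o.Major {
		return v.Major > o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor > o.Minor
	}
	return v.Extra >= o.Extra
}

// defaultPool returns the RADOS pool to use for a given Ceph version;
// userPool is whatever the CR specified (honored pre-Pacific only).
func defaultPool(v CephVersion, userPool string) string {
	if v.Major < 16 {
		// Octopus and earlier: customization allowed, keep the CR value
		return userPool
	}
	if v.IsAtLeast(CephVersion{Major: 16, Minor: 2, Extra: 6}) {
		return ".nfs" // hardcoded in Ceph from 16.2.6 onward
	}
	return "nfs-ganesha" // Pacific default before 16.2.6
}

func main() {
	fmt.Println(defaultPool(CephVersion{Major: 16, Minor: 2, Extra: 6}, "ignored"))
}
```

Keeping the pre-Pacific branch as a pass-through is what preserves the user-provided pool for Octopus clusters, which is the concern raised in this thread.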
Ok I see now that the pre-pacific settings won't be overwritten. Thanks for the clarification.
I think @josephsawaya's last comment is valuable and should be a comment before this conditional block.
The code changes look good. There is just an open question about what it will take to migrate existing users so we can add documentation.
Documentation/ceph-nfs-crd.md
Outdated
@@ -91,6 +93,8 @@ ceph dashboard set-ganesha-clusters-rados-pool-namespace <cluster_id>:<pool_name
* `pool`: The pool where ganesha recovery backend and supplemental configuration objects will be stored
* `namespace`: The namespace in `pool` where ganesha recovery backend and supplemental configuration objects will be stored

> **NOTE**: The RADOS settings aren't used in Ceph versions equal to or greater than Pacific 16.2.6, default values are used instead ".nfs" for the RADOS pool and the CephNFS CR's name for the RADOS namespace. However, RADOS settings are mandatory for versions preceding Pacific 16.2.6.
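For versions preceding Pacific 16.2.6, where the RADOS settings remain mandatory, a CephNFS CR might therefore look like the following hypothetical example (the pool and namespace values are illustrative, not required names):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephNFS
metadata:
  name: my-nfs
  namespace: rook-ceph
spec:
  rados:
    # required before Pacific 16.2.6; ignored from 16.2.6 onward,
    # where Ceph always uses the ".nfs" pool and the CR name as the namespace
    pool: nfs-ganesha
    namespace: my-nfs
  server:
    active: 1
```

From 16.2.6 onward the `rados` block can be left out entirely, per the note above.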
This should be part of the release note too. Probably in the breaking change section.
Is there something I can do about this on my end?
If it were in a minor release (v1.8), we would add it to the PendingReleaseNotes.md, but this will need to be in a patch release so I'll make a mental note for a special release note. Do we have an idea how soon 16.2.6 will be out? It would be good to include this in the rook release immediately before that ceph release.
Why not just wait for v1.8?
Rook will not be usable with NFS in v16.2.6 or newer until we merge this. Since v16.2.6 is about to be released, we should get this in the next v1.7.x patch.
pkg/operator/ceph/nfs/controller.go
Outdated
@@ -204,6 +205,15 @@ func (r *ReconcileCephNFS) reconcile(request reconcile.Request) (reconcile.Resul
}
r.clusterInfo.CephVersion = currentCephVersion

if r.clusterInfo.CephVersion.IsAtLeastPacific() {
	if r.clusterInfo.CephVersion.IsAtLeast(version.CephVersion{Major: 16, Minor: 2, Extra: 6}) {
Please add version.CephVersion{Major: 16, Minor: 2, Extra: 6} as a variable and comment it to explain its purpose.
pkg/operator/ceph/nfs/nfs.go
Outdated
@@ -37,6 +38,8 @@ import (

const (
	ganeshaRadosGraceCmd = "ganesha-rados-grace"
	dotNFSPoolName = ".nfs"
We need a more meaningful name for this, like defaultNFSPoolName? Also add a comment that this is hardcoded in Ceph.
pkg/operator/ceph/nfs/nfs.go
Outdated
@@ -37,6 +38,8 @@ import (

const (
	ganeshaRadosGraceCmd = "ganesha-rados-grace"
	dotNFSPoolName = ".nfs"
	nfsGaneshaPoolName = "nfs-ganesha"
ditto, different name?
Force-pushed from e8801ea to 96d601b
The manual process for moving cluster
Sounds good to go ahead with this since we have the migration steps. Any word on how soon the 16.2.6 release will be? It would be ideal to have the next rook release with this change close to the same time.
The pacific branch is blocked and QE started yesterday. So if we don't backport this, we can go ahead, and once v1.8 is out, v16.2.6 will have been around for some time.
Does this depend on ceph/ceph#43075?
Yes
does this depend on ceph/ceph#43075 ?
Yes
@sebastian-philipp That is planned to be merged now for v16.2.6?
Documentation/ceph-nfs-crd.md
Outdated
@@ -91,6 +93,8 @@ ceph dashboard set-ganesha-clusters-rados-pool-namespace <cluster_id>:<pool_name
* `pool`: The pool where ganesha recovery backend and supplemental configuration objects will be stored
* `namespace`: The namespace in `pool` where ganesha recovery backend and supplemental configuration objects will be stored

> **NOTE**: The RADOS settings aren't used in Ceph versions equal to or greater than Pacific 16.2.6, default values are used instead ".nfs" for the RADOS pool and the CephNFS CR's name for the RADOS namespace. However, RADOS settings are mandatory for versions preceding Pacific 16.2.6.
Rook will not be usable with NFS in v16.2.6 or newer until we merge this. Since v16.2.6 is about to be released, we should get this in the next v1.7.x patch.
I have tested this and it works properly in the Rook part.
Force-pushed from 96d601b to 31e4291
Per discussion, we will go ahead and merge this based on the 16.2.7 code check. The new code won't be triggered until that release though.
Documentation/ceph-nfs-crd.md
Outdated
@@ -25,6 +25,8 @@ metadata:
name: my-nfs
namespace: rook-ceph
spec:
# rados property is not used in versions of Ceph equal to or greater than
# 16.2.6, see note in RADOS settings section below.
There are a few places like this in the docs/comments that still need to be changed from .6 to .7
Force-pushed from 31e4291 to 4cdb375
pkg/operator/ceph/nfs/nfs.go
Outdated
err := createDefaultNFSRADOSPool(context, clusterInfo, n.Spec.RADOS.Pool)
if err != nil {
	return errors.Wrapf(err, "failed to find %q pool and unable to create it", n.Spec.RADOS.Pool)
}
We need to return nil here, right? Or else the reconcile will fail a few lines below when it returns pool not found
Suggested change:
  }
  }
+ return nil
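The control-flow issue flagged here (successfully creating the pool but then falling through to a later "pool not found" failure) can be illustrated with a simplified stand-in; `ensureDefaultPool`, `poolExists`, and `createPool` are hypothetical names, not Rook's actual functions:

```go
package main

import "fmt"

// simulated cluster state standing in for real Ceph lookups
var pools = map[string]bool{}

func poolExists(name string) bool { return pools[name] }

func createPool(name string) error {
	pools[name] = true
	return nil
}

// ensureDefaultPool mirrors the reviewed control flow: once the default
// pool has been created we return nil immediately; without that early
// return, the caller would continue and fail with "pool not found".
func ensureDefaultPool(name string) error {
	if poolExists(name) {
		return nil
	}
	if err := createPool(name); err != nil {
		return fmt.Errorf("failed to find %q pool and unable to create it: %w", name, err)
	}
	return nil
}

func main() {
	fmt.Println(ensureDefaultPool(".nfs"), poolExists(".nfs"))
}
```

The explicit `return nil` after the successful create is the whole point of the suggestion above.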
Force-pushed from 4cdb375 to 4ef7342
Thanks for the tests!
This pull request has merge conflicts that must be resolved before it can be merged. @josephsawaya please rebase it. https://rook.io/docs/rook/master/development-flow.html#updating-your-fork
Force-pushed from 4ef7342 to 828f440
@mergify rebase
This commit updates the CephNFS CR to make the RADOS settings optional for Ceph versions above 16.2.7 due to the NFS module changes in Ceph. The changes in Ceph make it so the RADOS pool is always ".nfs" and the RADOS namespace is always the name of the NFS cluster. This commit also handles the changes in Ceph Pacific versions before 16.2.7, where the default pool name is "nfs-ganesha" instead of ".nfs".

Closes: rook#8450
Signed-off-by: Joseph Sawaya <jsawaya@redhat.com>
Hey, I reacted but my real name is @Mergifyio
Force-pushed from 828f440 to ee791b0
ceph: remove RADOS options from CephNFS and use .nfs pool (backport #8501)
This commit updates the CephNFS CR to make the RADOS settings optional
for Ceph versions above 16.2.6 due to the NFS module changes in Ceph.
The changes in Ceph make it so the RADOS pool is always ".nfs" and the
RADOS namespace is always the name of the NFS cluster.
Description of your changes:
Which issue is resolved by this Pull Request:
Resolves #8450
Checklist:
- make codegen has been run to update object specifications, if necessary.