
Enable nfs mgr module configuration of CephNFS #9021

Closed · 3 tasks done
BlaineEXE opened this issue Oct 21, 2021 · 13 comments

BlaineEXE (Member) commented Oct 21, 2021

Is this a bug report or feature request?

  • Feature Request

The new nfs mgr module in Ceph makes managing NFS exports much easier from the CLI. We would like to allow users to manage NFS exports using this interface.

Current state

In Octopus, users could create NFS exports only from the dashboard. The CLI alternative was to issue curl commands in what is frankly a pretty hacky experience, as noted in the Octopus docs: https://docs.ceph.com/en/octopus/cephfs/nfs/#configure-nfs-ganesha-exports. Notably, the line below:

We need to download and run this script to pass the JSON file contents.

Configuring the dashboard to "see" a manually-created CephNFS involves running the command found here (assuming multiple CephNFSes): https://rook.github.io/docs/rook/v1.7/ceph-nfs-crd.html#samples

ceph dashboard set-ganesha-clusters-rados-pool-namespace <cluster_id>:<pool_name>[/<namespace>](,<cluster_id>:<pool_name>[/<namespace>])*

The dashboard interface continues to work when upgrading to Pacific v16.2.6. The upgrade also causes the nfs-ganesha pool to be created with an empty conf for the NFS cluster, but no exports are migrated. (Update: the nfs-ganesha pool might be created by Rook, not Ceph.)

As of 21 Oct. 2021, the dashboard interface also continues to work when upgrading to latest-pacific-devel (will become v16.2.7), and still continues to work when upgrading to latest-master-devel (will become Quincy). This also causes the .nfs pool to be created with an empty conf for the NFS cluster, and again with no exports migrated. (Update: .nfs pool might be created by Rook, not Ceph)

As of Ceph v16.2.6, the ceph nfs export apply <cluster> command does not exist. It is added in the upcoming v16.2.7. The similar ceph nfs export update command exists, but it lacks the <cluster> argument, and I couldn't get it to work in my testing. This effectively means that currently, users cannot migrate existing exports to new pools until v16.2.7.

Desired state

Ideally, we want to get into a state where the dashboard and nfs mgr module can observe and create the same NFS exports. Ceph seems to have made some efforts to align these in the Pacific release, and Rook is lagging behind on the effort.

Firstly, I am not certain that the Ceph dashboard and nfs mgr module are able to operate on NFS exports in the same way, even in Pacific/Quincy releases. Some of my investigation suggests that the nfs mgr module modifies the config object to add a URL for each export whereas the dashboard does not. This remains to be investigated.

In order to get to the desired state in Rook, Rook needs to be able to either (1) migrate NFS exports and config to the new pool layout automatically or (2) give users good documentation about how to migrate NFS exports and config to the new pool layout for themselves manually.

So far, we have been planning on option 2 given that we believe the NFS user base is small. There are some issues reported to Rook that indicate there are users of CephNFS. I linked some issues below and skipped issues reported by the same username.

Technical proposal

CAVEAT: I still need to prove out that the below will actually work. The testing I have done so far suggests it will.

In order to keep the NFS code legible around pool names/namespaces, when releasing Rook v1.8, I would like to be able to force all users to use the new .nfs pool if possible regardless of the Ceph version. I think this will be the best experience for users. They will ideally only have to migrate their pools ONCE. It will be a poor experience for users if they have to migrate to nfs-ganesha for v16.2.[0-6] and migrate again to .nfs for v16.2.7+. The migration will only be required when they upgrade to Rook v1.8 where they will have the Rook upgrade guide to help them. If we require the migration to happen only on changes to the Ceph version, users are more likely to miss the updates.

Given the state of the nfs mgr module in Pacific before v16.2.7, I believe we must instruct users that the dashboard method is the only usable method for NFS management until v16.2.7.

For users on v16.2.6 and below (including Octopus), this will mean copying export-N items to pool .nfs in namespace <name-of-CephNFS> and using ceph dashboard set-ganesha-clusters-rados-pool-namespace to change the pool configuration.

For users on v16.2.7, the ceph nfs export apply command is working, and users can use that to migrate pools. Even if users have previously migrated exports to .nfs before this using the method above, it may be necessary for the user to use ceph nfs export apply to re-consume the export configurations if the nfs mgr module still expects URL lines to be present in the NFS config.

Ideally, users should have to do nothing if they upgrade from an already-migrated v16.2.6- to v16.2.7+.

Still to do

  • verify that it's possible to move exports to .nfs in Pacific 16.2.6 and below (including Octopus) and have the dashboard management continue working
  • verify that it's possible to move exports to .nfs in Pacific 16.2.7 and up using ceph nfs export apply
  • investigate the dashboard's management of the URL configuration line compared to the nfs mgr module's use of the line to ensure exports created with the dashboard can be managed via the nfs mgr module also (this may lead to a fix needed in Ceph)

Alternatives

We could allow users to migrate only if they actually want to use the nfs mgr module. This would inconvenience fewer users, but this would also mean we need to keep and maintain the migration docs for much longer in Rook for the eventuality that users want to migrate later. That comes with risks that docs might get outdated, and we would have to spend more time maintaining them. It seems better overall to prove out the migration steps for v1.8 and not have to worry about it after v1.9 is released.

BlaineEXE commented

@travisn @leseb @jmolmo please take a look


BlaineEXE commented Oct 22, 2021

To migrate to the .nfs pool in Octopus (v15.2.11) while continuing to use the dashboard, I followed the steps below. In this example, the configured pool was myfs-data0 in namespace nfs-ns, and the name of the CephNFS was my-nfs.

  1. Create the .nfs pool
    # ceph osd pool create .nfs
    pool '.nfs' created
  2. Using toolbox pod, list all objects:
    # rados -p myfs-data0 -N nfs-ns ls
    grace
    rec-0000000000000002:my-nfs.a
    export-1
    export-2
    conf-nfs.my-nfs
  3. Copy the conf-* object and all export-* objects with the below template. The conf object was empty in my test.
    # rados -p myfs-data0 -N nfs-ns get ${obj} - > ${obj}.conf
    # rados -p .nfs -N my-nfs put ${obj} ${obj}.conf
  4. Update the pool name/namespace using the mgr command
    # ceph dashboard set-ganesha-clusters-rados-pool-namespace my-nfs:.nfs/my-nfs
    Option GANESHA_CLUSTERS_RADOS_POOL_NAMESPACE updated
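The object copy in step 3 can be scripted for all conf-* and export-* objects rather than done one at a time. Below is a minimal sketch, assuming the rados CLI from the toolbox pod and the pool/namespace names used in this example; the function name is my own, not an existing tool.

```shell
#!/usr/bin/env sh
# Copy all conf-* and export-* RADOS objects from one pool/namespace to
# another. A sketch for step 3 above; assumes the `rados` CLI is available
# (e.g. in the toolbox pod). migrate_nfs_objects is a hypothetical helper.
migrate_nfs_objects() {
  src_pool="$1"; src_ns="$2"; dst_pool="$3"; dst_ns="$4"
  rados -p "$src_pool" -N "$src_ns" ls | grep -E '^(conf-|export-)' |
  while read -r obj; do
    # Fetch each object to a temp file, then write it to the destination.
    rados -p "$src_pool" -N "$src_ns" get "$obj" "/tmp/$obj"
    rados -p "$dst_pool" -N "$dst_ns" put "$obj" "/tmp/$obj"
    echo "copied $obj"
  done
}

# Only run against a real cluster when the rados CLI is present.
if command -v rados >/dev/null 2>&1; then
  migrate_nfs_objects myfs-data0 nfs-ns .nfs my-nfs
fi
```

Note that grace and rec-* objects are intentionally skipped; they are recovery/grace state, not export configuration.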

I verified with the dashboard that I can still see the existing 2 exports, and I was able to create a third into the .nfs pool as expected. This seems like a successful migration.

I then upgraded to Ceph v16.2.6, and the dashboard still works.


I'm still having problems actually using the NFS exports. I haven't been able to get them to work from Rook in any state, and I'm not sure how to debug what's actually going wrong. The Ganesha server has the right access to see the right rados pool/namespace and objects.

[Update] I tried below based on this comment: #6636 (comment)

# mount -t nfs -o nfsvers=4,proto=tcp <pod-ip>:/ /mnt/nfs
# # above is successful w/ no output
# ls /mnt/nfs
# # no output  :/



BlaineEXE commented Oct 25, 2021

After upgrading to v16.2.6, the dashboard no longer works for NFS management. In Pacific v16.2.6, I am no longer able to add NFS exports via dashboard. It seems to me that dashboard management of NFS exports is no longer allowed in Pacific in favor of the orchestrator interface.

It is necessary to change CephNFS.spec.rados.pool=.nfs and CephNFS.spec.rados.namespace=my-nfs and to unset the GANESHA_CLUSTERS_RADOS_POOL_NAMESPACE variable by issuing ceph dashboard set-ganesha-clusters-rados-pool-namespace "". This is because of an error rendered by the lines in Ceph linked below. These lines (and possibly others) must be changed in Ceph v16.2.7 in order to allow CephNFS resources to be specified with the spec.rados config omitted (it's now optional after #8501). Ceph should assume the default pool .nfs. https://github.com/ceph/ceph/blob/9c8bdbc8afe90dacd18f2ee15044f48109538abe/src/pybind/mgr/dashboard/services/ganesha.py#L69-L79

Additionally, it seems that the dashboard interface never added URLs to the Ganesha config object, so the exports were not working. I manually added exports in v16.2.6 to fix this, and the exports now work.

# cat exports.conf
%url   rados://.nfs/my-nfs/export-1
%url   rados://.nfs/my-nfs/export-2
%url   rados://.nfs/my-nfs/export-3

# rados -p .nfs -N my-nfs put conf-nfs.my-nfs exports.conf

# rados -p .nfs -N my-nfs get conf-nfs.my-nfs -
%url   rados://.nfs/my-nfs/export-1
%url   rados://.nfs/my-nfs/export-2
%url   rados://.nfs/my-nfs/export-3
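Writing those %url lines by hand doesn't scale past a few exports. A small sketch (assuming the toolbox rados CLI; the function name is my own) that builds the conf from whatever export-* objects exist in the pool/namespace:

```shell
#!/usr/bin/env sh
# Emit a Ganesha conf with one %url line per export-* object found in the
# given pool/namespace. A sketch; make_nfs_conf is a hypothetical helper.
make_nfs_conf() {
  pool="$1"; ns="$2"
  rados -p "$pool" -N "$ns" ls | grep '^export-' | sort |
  while read -r obj; do
    printf '%%url   rados://%s/%s/%s\n' "$pool" "$ns" "$obj"
  done
}

# Only touch a real cluster when the rados CLI is present.
if command -v rados >/dev/null 2>&1; then
  make_nfs_conf .nfs my-nfs > exports.conf
  rados -p .nfs -N my-nfs put conf-nfs.my-nfs exports.conf
fi
```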

The problem with my approach for Pacific v16.2.6 seems to be that the orchestrator interface does not "see" any of the migrated exports, even after the URLs have been added as above. This version is using pool nfs-ganesha and namespace my-nfs even though the CephNFS resource has spec.rados configured.

# ceph nfs export ls my-nfs
[]

# rados -p .nfs -N my-nfs ls
conf-nfs.my-nfs
export-1
export-3
grace
rec-0000000000000004:my-nfs.a
export-2

From this, I can gather that we cannot force users to use pool .nfs on Pacific versions 16.2.6 and lower as I had hoped.


BlaineEXE commented Oct 25, 2021

Still in Ceph v16.2.6, I can't get the nfs mgr module to read exports that were migrated from a dashboard-created export.

A dashboard export looks like below:

# rados -p nfs-ganesha -N my-nfs get export-1 -
EXPORT {
    export_id = 1;
    path = "/";
    pseudo = "/test";
    access_type = "RW";
    squash = "no_root_squash";
    protocols = 4;
    transports = "TCP";
    FSAL {
        name = "CEPH";
        user_id = "admin";
        filesystem = "myfs";
        secret_access_key = "AQBz63FhE9/gIxAAOsp+z0AW0/l0LFot/Em+Pg==";
    }
}

A similar nfs mgr module export looks like below:

# rados -p nfs-ganesha -N my-nfs get export-1 -
EXPORT {
    export_id = 1;
    path = "/";
    pseudo = "/from-cli";
    access_type = "RW";
    squash = "no_root_squash";
    security_label = true;
    protocols = 4;
    transports = "TCP";
    FSAL {
        name = "CEPH";
        user_id = "my-nfs1";
        filesystem = "myfs";
        secret_access_key = "AQBLCHdh7LCqNhAAzMkx3OsqPfsr7m/ACLyXrw==";
    }
}

I don't see where exactly the mgr module is failing to read the export. The mgr reports a log like below that doesn't include very much helpful info:

debug 2021-10-25T20:18:43.778+0000 7fbc15f54700  0 log_channel(audit) log [DBG] : from='client.35496 -' entity='client.admin' cmd=[{"prefix": "nfs export ls", "clusterid": "my-nfs", "target": ["mon-mgr", ""]}]: dispatch
debug 2021-10-25T20:18:43.793+0000 7fbc0d753700  0 [rook WARNING root] CephNFS {'apiVersion': 'ceph.rook.io/v1', 'items': [{'apiVersion': 'ceph.rook.io/v1', 'kind': 'CephNFS', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"ceph.rook.io/v1","kind":"CephNFS","metadata":{"annotations":{},"name":"my-nfs","namespace":"rook-ceph"},"spec":{"rados":{"namespace":"nfs-ns","pool":"myfs-data0"},"server":{"active":1,"logLevel":"NIV_INFO"}}}\n'}, 'creationTimestamp': '2021-10-21T22:37:27Z', 'finalizers': ['cephnfs.ceph.rook.io'], 'generation': 7, 'managedFields': [{'apiVersion': 'ceph.rook.io/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:kubectl.kubernetes.io/last-applied-configuration': {}}}, 'f:spec': {'.': {}, 'f:server': {'.': {}, 'f:active': {}, 'f:logLevel': {}}}}, 'manager': 'kubectl-client-side-apply', 'operation': 'Update', 'time': '2021-10-21T22:37:27Z'}, {'apiVersion': 'ceph.rook.io/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"cephnfs.ceph.rook.io"': {}}}, 'f:spec': {'f:server': {'f:placement': {}, 'f:resources': {}}}}, 'manager': 'rook', 'operation': 'Update', 'time': '2021-10-21T22:37:27Z'}, {'apiVersion': 'ceph.rook.io/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:status': {'.': {}, 'f:phase': {}}}, 'manager': 'rook', 'operation': 'Update', 'subresource': 'status', 'time': '2021-10-21T22:37:47Z'}, {'apiVersion': 'ceph.rook.io/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:spec': {'f:rados': {'.': {}, 'f:namespace': {}, 'f:pool': {}}}}, 'manager': 'kubectl-edit', 'operation': 'Update', 'time': '2021-10-25T19:07:03Z'}], 'name': 'my-nfs', 'namespace': 'rook-ceph', 'resourceVersion': '64902', 'uid': '9155e201-58ea-46ea-8a32-11afb870ae2b'}, 'spec': {'rados': {'namespace': 'my-nfs', 'pool': 'nfs-ganesha'}, 'server': {'active': 1, 'logLevel': 'NIV_INFO', 'placement': {}, 'resources': {}}}, 'status': {'phase': 'Ready'}}], 'kind': 
'CephNFSList', 'metadata': {'continue': '', 'resourceVersion': '67003'}}
debug 2021-10-25T20:18:43.802+0000 7fbc0d753700  0 [nfs WARNING nfs.export] No exports to list for my-nfs

Given that I have been unsuccessful in migrating Ceph Octopus dashboard-created exports to Ceph v16.2.6, I think we may have to consider v16.2.0-v16.2.6 not migratable for CephNFS purposes. For v16.2.7, we can use the new ceph nfs export apply command to import old exports.


BlaineEXE commented Oct 25, 2021

Ceph v16.2.7 (from latest-pacific-devel) does seem to be able to read the directly-copied dashboard exports. This is a big win!

I also verified that I can create a new export from the nfs mgr module which shows up in my mounted nfs share.

For some reason, the new export I created does not show up in the dashboard, however. I removed the spec.rados config from the CephNFS to make sure the dashboard would use the new default of .nfs, with no change. I also made sure the NFS config var was unset by re-issuing ceph dashboard set-ganesha-clusters-rados-pool-namespace "". I am not sure what is going wrong here. This may be an area where the Ceph team needs to make an update? I wonder if the dashboard is still using nfs-ganesha as the pool in my test, but I'm not sure how to validate that.
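One way to check which pool/namespace the dashboard is actually configured with would be to read the setting back. This is an assumption on my part: I am assuming the set- command used earlier has a get- counterpart, as other dashboard settings do.

```shell
# Inspect the dashboard's current Ganesha pool/namespace setting.
# ASSUMPTION: the get- counterpart of the set- command exists, as it
# does for other dashboard settings; an empty result would mean the
# dashboard falls back to its built-in default pool.
ceph dashboard get-ganesha-clusters-rados-pool-namespace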

From the dashboard, I'm still unable to create exports because it reports that there are no clusters available (see screenshot).
[screenshot: NFS export creation form reporting no clusters available]


I tried to get and re-apply my export-1 as shown below and hit an error:

# ceph nfs export get my-nfs /test
{
  "export_id": 1,
  "path": "/",
  "cluster_id": "my-nfs",
  "pseudo": "/test",
  "access_type": "RW",
  "squash": "no_root_squash",
  "security_label": true,
  "protocols": [
    4
  ],
  "transports": [
    "TCP"
  ],
  "fsal": {
    "name": "CEPH",
    "user_id": "admin",
    "fs_name": "myfs"
  },
  "clients": []
}
# ceph nfs export get my-nfs /test > test.conf
# ceph nfs export apply my-nfs -i test.conf
Error EINVAL: export FSAL user_id must be 'nfs.my-nfs.1'

When I do as the error message says and use that user_id, the apply succeeds. However, the dashboard still shows the "CephFS user" as "admin" even after the change, which continues to suggest to me that the dashboard is still using the nfs-ganesha pool instead of the new .nfs pool.
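The user_id rewrite the error asks for can be done inline between the get and the apply. A sed-based sketch; the expected format nfs.&lt;cluster&gt;.&lt;export_id&gt; is taken from the error message above, and fix_export_user_id is a name I made up.

```shell
#!/usr/bin/env sh
# Rewrite the FSAL user_id in an exported NFS export JSON to the form
# the mgr module demands (nfs.<cluster>.<export_id>, per the error
# message above). fix_export_user_id is a hypothetical helper.
fix_export_user_id() {
  cluster="$1"; export_id="$2"
  sed "s/\"user_id\": \"[^\"]*\"/\"user_id\": \"nfs.${cluster}.${export_id}\"/"
}

# Only run against a real cluster when the ceph CLI is present.
if command -v ceph >/dev/null 2>&1; then
  ceph nfs export get my-nfs /test | fix_export_user_id my-nfs 1 > test.conf
  ceph nfs export apply my-nfs -i test.conf
fi
```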

FOR CEPH TEAM Is this a bug in the Ceph code, or is there something else going on? If it is a bug, it seems like this should be fixed in v16.2.7 before it's released.

BlaineEXE commented

Giving a summary of my findings so far:

  1. In Octopus, the dashboard can continue to view existing exports and create new ones.
  2. In Pacific v16.2.{0..6}, mgr module is broken, and dashboard is read-only
    • I have been unable to migrate exports to the nfs-ganesha pool in a way that the nfs mgr module can see them
    • The dashboard can still view migrated exports in the nfs-ganesha pool but not create new exports
  3. In Pacific latest-devel (soon to be v16.2.7), mgr module is working, and dashboard is broken
    • The nfs mgr module can see migrated exports from previous dashboard-created ones
    • The nfs mgr module can also import dashboard-created exports via ceph nfs export apply (with some user modification)
    • The dashboard does not show the latest exports, and I believe it is still using the nfs-ganesha pool

epuertat commented

Thanks for the thorough summary, @BlaineEXE ! Ceph-Dashboard team plans to deliver the mgr/nfs - Dashboard integration by Pacific 16.2.7 (it's currently merged in master).

BlaineEXE commented

> Thanks for the thorough summary, @BlaineEXE ! Ceph-Dashboard team plans to deliver the mgr/nfs - Dashboard integration by Pacific 16.2.7 (it's currently merged in master).

Thanks @epuertat ! I think that is the last thing needed for users to have a good process migrating NFS to Pacific with Rook. :)

sebastian-philipp (Member) commented

ceph/ceph#42526

BlaineEXE added this to To do in v1.8 via automation Nov 2, 2021

BlaineEXE commented Nov 2, 2021

Even with latest-master-devel, the dashboard seems (based on the error message reported) to be using the nfs-ganesha pool rather than the .nfs pool. Is there a PR that still needs to make it into master?

[screenshot: dashboard error message referencing the nfs-ganesha pool]

This is from Ceph build 17.0.0-8135-gf5b96461 (f5b96461081c2f508d1393202ebcd94e6bd2ea3f) quincy (dev)

alfonsomthd (Contributor) commented

> Even with latest-master-devel, the dashboard seems (based on the error message reported) to be using the nfs-ganesha pool rather than the .nfs pool. Is there a PR that still needs to make it into master?

@BlaineEXE Somehow the container must be outdated (or at least the dashboard pkg) as that is the old exports list (i.e. in master the Daemons column no longer exists):
[screenshot: NFS export list as rendered by current master]

BlaineEXE commented

Everything looks to be working with the latest Pacific devel image (soon v16.2.7) and Rook v1.8 beta. Thanks everyone for your help. :)

v1.8 automation moved this from To do to Done Dec 8, 2021