cephfs: add support for cache management callbacks #605

Open
qiankunli opened this issue Oct 20, 2021 · 9 comments

@qiankunli

qiankunli commented Oct 20, 2021

go-ceph version: v0.8.0

There is a ceph-service pod running in k8s which uses go-ceph to operate on cephfs (e.g. create/delete dirs), but the mds reports:

mds.sh-bs-b1-303-j14-ceph-133-2(mds.0): Client xdl-ceph-client-d6795749-z988c:xdl failing to respond to cache pressure client_id: 6704

When I restart ceph-service, the mds is ok again. Is this a bug in go-ceph?

@qiankunli qiankunli changed the title failing to respond to cache pressure client_id failing to respond to cache pressure client_id xx Oct 20, 2021
@phlogistonjohn
Collaborator

go-ceph version: v0.8.0

This version is a bit old at this point (8 months ago), but let's assume that it's not an issue for the moment.

Can you also please let us know what versions of the ceph libraries are being linked with go-ceph and the version of ceph running on the server side? Thanks.

There is a ceph-service pod running in k8s which uses go-ceph to operate on cephfs (e.g. create/delete dirs), but the mds reports:

OK, I think I understand the general use-case.

mds.sh-bs-b1-303-j14-ceph-133-2(mds.0): Client xdl-ceph-client-d6795749-z988c:xdl failing to respond to cache pressure client_id: 6704

OK thanks. That warning is issued when the mds wants clients to revoke inodes from the client cache. Refs:

I restart ceph-service; the uat environment mds is ok, the product environment mds is not ok,

When you restart the ceph-service pod the warnings stop?
I'm not clear on what "uat environment" is.
I don't know what you mean by "product environment mds is not ok".

is it a bug in go-ceph?

Possibly, but to determine if that's the case we need to understand more about how this condition is expected to be resolved by clients using the high-level API calls. go-ceph relies upon the C-based APIs provided by ceph. Currently, the only API calls we use are what I call the high-level calls, and there's not a lot of control we have over the behavior of things like the inode cache or caps.

My gut feeling is that this is more likely an issue with the version of the ceph libs in use or the high-level libs in general. I suggest first checking to see if this is a known issue with libcephfs. If there are API calls pertaining to cache management, I am not aware of them, so we might want to strike up a conversation with the maintainers of libcephfs too.

@qiankunli
Author

qiankunli commented Oct 21, 2021

@phlogistonjohn you can ignore the "uat and product environment" part; when I restart the ceph-service pod, the warnings stop.

ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6) nautilus (stable)

ceph library versions (libcephfs / librados2 / librbd1):

sh-4.2# yum info libcephfs-devel
Loaded plugins: fastestmirror, ovl
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: ftp.sjtu.edu.cn
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
Installed Packages
Name        : libcephfs-devel
Arch        : x86_64
Epoch       : 2
Version     : 15.2.13
Release     : 0.el7
Size        : 78 k
Repo        : installed
From repo   : ceph
Summary     : Ceph distributed file system headers
URL         : http://ceph.com/


sh-4.2# yum info librbd1-devel
Loaded plugins: fastestmirror, ovl
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: ftp.sjtu.edu.cn
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
Available Packages
Name        : librbd1-devel
Arch        : i686
Epoch       : 1
Version     : 10.2.5
Release     : 4.el7
Size        : 23 k
Repo        : base/7/x86_64
Summary     : RADOS block device headers
URL         : http://ceph.com/


sh-4.2# yum info librados2-devel
Loaded plugins: fastestmirror, ovl
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: ftp.sjtu.edu.cn
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
Available Packages
Name        : librados2-devel
Arch        : i686
Epoch       : 1
Version     : 10.2.5
Release     : 4.el7
Size        : 490 k
Repo        : base/7/x86_64
Summary     : RADOS headers
URL         : http://ceph.com/

@phlogistonjohn
Collaborator

A pleasant coincidence (a ceph mailing list post indicating that nfs-ganesha once encountered similar problems) and a bit more searching show that there are now API hooks intended for managing client cache pressure:

I also see that there is a newer version: ceph_ll_register_callbacks2

What I'm not clear about is how the low-level api and the high level apis are meant to interact with regards to the cache and these callbacks.

@jtlayton if I may bother you for a moment, since you added the original set of callbacks: is it correct to say that using (only) the high level API doesn't automatically handle the cache pressure requests? Assuming we were to add support for the callbacks how would existing code that uses the high-level api make use of the callbacks? What, if any, additional low-level API functions would be needed to make it useful to someone who has a use-case like @qiankunli's?

Thanks for your time!

@jtlayton

Yeah -- we probably need to add some documentation comments to struct ceph_client_callback_args. If you wouldn't mind opening a bug at tracker.ceph.com, we can add some comments in the near future. For now though, you unfortunately have to look at the userland client code and suss out how each gets called.

For this problem, you're probably mostly interested in .dentry_cb and .ino_release_cb. These get called when the MDS wants to trim the client's record of a Dentry or an Inode (usually to reduce its own memory consumption).

Ganesha keeps a cache of Inode references, and it registers an .ino_release_cb that tries to release the corresponding NFS filehandle when a request to trim that Inode comes in. It's not guaranteed to work if the FH is in use, but these usually come in sets and eventually some get released.

->dentry_cb works in a similar fashion for Dentry objects. Ganesha's FSAL_CEPH doesn't use that one, but ceph-fuse does.
The rest are somewhat specific to the ceph-fuse use-case.

Probably, you want to expose most or all of these via go-ceph. It's been a while since I did any Go work though, so I won't presume how you should go about it.
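
(For illustration, here is a minimal C sketch of what registering such a callback against libcephfs might look like. ceph_ll_register_callbacks2 and the .ino_release_cb / .dentry_cb fields are the ones named in this thread; the callback signature, the use of the handle field and the body are assumptions to be checked against the typedefs in cephfs/libcephfs.h, not a tested implementation.)

#include <cephfs/libcephfs.h>

/* Called when the MDS asks this client to drop its reference to an inode.
 * The (handle, vinodeno_t) signature is an assumption based on the typedefs
 * in cephfs/libcephfs.h; verify it against the header for your version. */
static void my_ino_release_cb(void *handle, vinodeno_t ino)
{
    /* A real application would look the inode up in its own cache of Inode
     * references and drop whatever references it can (e.g. via ceph_ll_put),
     * so the client can answer the MDS cache-pressure request. */
    (void)handle;
    (void)ino;
}

static int register_cache_callbacks(struct ceph_mount_info *cmount)
{
    struct ceph_client_callback_args args = {
        .handle = NULL,                      /* opaque pointer passed back to callbacks */
        .ino_release_cb = my_ino_release_cb,
        /* .dentry_cb could be set similarly to react to dentry trimming */
    };
    return ceph_ll_register_callbacks2(cmount, &args);
}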

@jtlayton

So, to be clear:

@jtlayton if I may bother you for a moment, since you added the original set of callbacks: is it correct to say that using (only) the high level API doesn't automatically handle the cache pressure requests?

Correct. This is really an application-level problem with the ll_* API. The application is generally holding references to Dentry and Inode objects, and usually needs to do something application-specific to release them in response to a request from the MDS to trim them.

@phlogistonjohn
Collaborator

Yeah -- we probably need to add some documentation comments to struct ceph_client_callback_args. If you wouldn't mind opening a bug at tracker.ceph.com, we can add some comments in the near future.

Thanks for the suggestion. Done at https://tracker.ceph.com/issues/53004

For now though, you unfortunately have to look at the userland client code and suss out how each gets called.

OK, thanks for the heads up. I've already looked over the test C file and that helped somewhat. I'll take a look at some other examples too.

For this problem, you're probably mostly interested in .dentry_cb and .ino_release_cb. These get called when the MDS wants to trim the client's record of a Dentry or an Inode (usually to reduce its own memory consumption).

Ganesha keeps a cache of Inode references, and it registers an .ino_release_cb that tries to release the corresponding NFS filehandle when a request to trim that Inode comes in. It's not guaranteed to work if the FH is in use, but these usually come in sets and eventually some get released.

OK, thanks!

->dentry_cb works in a similar fashion for Dentry objects. Ganesha's FSAL_CEPH doesn't use that one, but ceph-fuse does. The rest are somewhat specific to the ceph-fuse use-case.

Sure. I can look into how both systems use the callback APIs down the road, to help serve as examples.

Probably, you want to expose most or all of these via go-ceph. It's been a while since I did any Go work though, so I won't presume how you should go about it.

Yeah, we currently have only bindings for a (good, but incomplete) set of the high-level API functions. I have suspected for a while that we'd eventually want to cover more of the low-level APIs too. On the plus side, we've done some callback support for the rbd package, so we have some experience with callbacks between the C and Go layers.

So, to be clear:

@jtlayton if I may bother you for a moment, since you added the original set of callbacks: is it correct to say that using (only) the high level API doesn't automatically handle the cache pressure requests?

Correct. This is really an application-level problem with the ll_* API. The application is generally holding references to Dentry and Inode objects, and usually needs to do something application-specific to release them in response to a request from the MDS to trim them.

Thanks, that's probably the cause of @qiankunli's issues then. One thing I'm still not clear on (but perhaps I just need to review the code more closely) is how the high-level and low-level APIs interact. Is there something like a (private) cache the high-level calls use that we need to be aware of?

@phlogistonjohn
Collaborator

@qiankunli based on this conversation it appears that the issue is not a bug per se, but something more architectural. As such, we're interested in improving go-ceph to handle this case, but this could be a long process.

I'm changing this from a question to an enhancement. However, I can't make any promises as to when I or other contributors will start working on this directly, and when we do it will likely land in parts over time.

In the meantime, perhaps it could help if your application periodically "reset" during downtime, or you may just continue what you've been doing...

@phlogistonjohn phlogistonjohn changed the title failing to respond to cache pressure client_id xx cephfs: add support for cache management callbacks Oct 21, 2021
@jtlayton

The high-level API was made to mirror the POSIX filesystem API. It has its own file descriptor table, etc., to closely mirror how the kernel syscall API works, e.g.:

int ceph_read(struct ceph_mount_info *cmount, int fd, char *buf, int64_t size, int64_t offset);

The fd is synthetic and generated by libcephfs as the result of an earlier ceph_open call.

The low-level API works with Inode and Fh objects (which are really like file descriptions). So you (usually) do a lookup and then use the resulting Inode or Fh object to do other calls. Once you're done you have to put references to the objects, to ensure there aren't leaks.

Aside from that, they mostly end up calling the same code under the hood.
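
(To make the contrast concrete, here is a minimal sketch of the high-level, fd-based pattern described above; the path and error handling are illustrative assumptions. The low-level ll_* calls would instead hand back Inode/Fh pointers that the caller must eventually drop, e.g. with ceph_ll_put.)

#include <fcntl.h>
#include <cephfs/libcephfs.h>

/* High-level API usage: the fd below is synthetic, handed out by libcephfs
 * itself (not a kernel file descriptor), as described above. */
static int read_example(struct ceph_mount_info *cmount)
{
    char buf[4096];
    int fd = ceph_open(cmount, "/some/file", O_RDONLY, 0);
    if (fd < 0)
        return fd;

    /* Read up to 4096 bytes starting at offset 0. */
    int n = ceph_read(cmount, fd, buf, sizeof(buf), 0);

    ceph_close(cmount, fd);
    return n;
}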

@qiankunli
Author

qiankunli commented Oct 22, 2021

@phlogistonjohn just to describe our usage of go-ceph:
I need a RESTful API which supports CRUD (create/update/read/delete) operations on cephfs directories, so that other services (java/go etc.) can call it easily without integrating with the .so library. Most services run on physical machines, so it is difficult to install the .so library on every machine that runs a service.
So I created the ceph-service running in k8s, which uses go-ceph to operate on cephfs and provides the RESTful API.
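
(For context, the directory operations such a service performs map onto the high-level, path-based libcephfs calls that go-ceph wraps. A rough sketch at the C level, with the config defaults, directory name and error handling being illustrative assumptions:)

#include <cephfs/libcephfs.h>

/* Create and remove a directory via the high-level (path-based) API,
 * roughly the flow that go-ceph drives under the hood. */
static int dir_crud_example(void)
{
    struct ceph_mount_info *cmount;
    int ret = ceph_create(&cmount, NULL);   /* NULL = default client id */
    if (ret)
        return ret;

    ceph_conf_read_file(cmount, NULL);      /* read the default ceph.conf */
    ret = ceph_mount(cmount, "/");          /* mount at the cephfs root */
    if (ret)
        goto out;

    ret = ceph_mkdir(cmount, "/example-dir", 0755);
    if (ret == 0)
        ret = ceph_rmdir(cmount, "/example-dir");

    ceph_unmount(cmount);
out:
    ceph_release(cmount);
    return ret;
}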
