[Question] throttling: how to limit memory usage of nfs-ganesha? #1099
Are you using the async I/O mechanism in the FSAL? Which FSAL? You may be exposing an issue: we may need some throttling of async requests.
Yeah, we do use async read/write in the FSAL of the distributed file system we are developing.
We run FIO with the following parameters from an NFS client. NFS-Ganesha runs on a VM with 8 GB of memory and 8 cores. It turns out that nfs-ganesha takes up 4 GB+ of memory, and OOM eventually occurs.
You may want to investigate throttling. It may well be that async I/O is not ready for production yet.
Could we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This is effective at bounding memory use: for example, with a limit of 256 and a request size of 1 MB, about 256 MB of memory is used. It could be implemented in the alloc_nfs_request() and free_nfs_request() functions.
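A minimal sketch of the proposed global throttle, assuming a mutex/condvar pair guarding an inflight counter; the function names `throttle_acquire`/`throttle_release` are hypothetical hooks that alloc_nfs_request() and free_nfs_request() would call, not existing Ganesha API:

```c
#include <pthread.h>

/* With a limit of 256 and ~1 MB per request, peak request-buffer
 * memory stays around 256 MB. */
#define INFLIGHT_LIMIT 256

static pthread_mutex_t throttle_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t throttle_cv = PTHREAD_COND_INITIALIZER;
static unsigned inflight;

/* Hypothetical hook for alloc_nfs_request(): block until a slot frees up. */
static void throttle_acquire(void)
{
	pthread_mutex_lock(&throttle_mtx);
	while (inflight >= INFLIGHT_LIMIT)
		pthread_cond_wait(&throttle_cv, &throttle_mtx);
	inflight++;
	pthread_mutex_unlock(&throttle_mtx);
}

/* Hypothetical hook for free_nfs_request(): free the slot, wake one waiter. */
static void throttle_release(void)
{
	pthread_mutex_lock(&throttle_mtx);
	inflight--;
	pthread_cond_signal(&throttle_cv);
	pthread_mutex_unlock(&throttle_mtx);
}
```

Note this sketch already hints at the hazard discussed later in the thread: if every RPC thread blocks in `throttle_acquire`, no thread is left to process completions and call `throttle_release`.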
If we do throttling, we want to do fair throttling.
Is there any plan to add a throttling mechanism to nfs-ganesha?
It is on our short list, but just when we will get to it is very up in the air.
We implemented this mechanism in our environment, and our experiments show it is effective at avoiding overload from async operations (READ/WRITE). Would this feature be acceptable in the community repository?
something is definitely needed for async, and we aspire to some QoS. it seems to me that since the current async dispatch mechanism (and the selection of sync vs async) is at the FSAL level, you probably shouldn't be using a global throttle to control it (although that would probably work ok for a lot of bespoke setups). Past that, we have requests for fairness on other dimensions--in particular, exports/shares. We've been treating that as something different.
I agree that it is better to add the throttling mechanism at the FSAL layer rather than globally. Our goal is to avoid nfs-ganesha taking up too much memory under a flood of async read/write requests, so we would add a limit on the number of inflight async ops in the FSAL layer. Fairness on other dimensions like exports/shares may not be very much in need right now. Is this design reasonable? If so, I can implement it, verify it in our environment, and contribute it to the community :)
fsal ops limit/budget? yes, I think so, please proceed governor :)
If you want to submit something for review and discussion, that would be most welcome.
The draft implementation above can deadlock: all RPC threads end up waiting on the condition variable, leaving no thread to handle the completion of an issued request and wake up the blocked ones. Our libntirpc framework supports both sync and async requests but places no limit on request concurrency. I think there may be two solutions here:
Maybe solution 1 is better because the application no longer has to worry about overload.
If you have something implemented that is working, please submit it to Gerrithub for review and discussion. Matt and I had a conversation about this; we will need a solution ourselves soon, so anything you have would be of interest to us. Your approach of throttling overall inflight requests is at least a good start. Another option, which might be a bit harder because it requires ntirpc and Ganesha to talk, is to limit the amount of memory in I/O buffers (which is the big killer), so we stop accepting requests when we hit that limit. Maybe that's hard to do. Longer term, it would be useful for the FSAL to be involved so it can be smart about what types of requests are backing up. Eventually we will also want some fairness/QoS so one client doesn't hog the whole budget.
Thanks for your reply. The draft above has a deadlock bug, so we now have the FSAL return ERR_FSAL_DELAY when there are too many inflight requests. The error code is translated into NFS4ERR_DELAY or NFS3ERR_JUKEBOX, and the NFS client retries according to its mount options. What's your opinion of this solution?
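The fail-fast variant could look roughly like this: instead of blocking, the FSAL rejects work when its async budget is exhausted, and the caller maps the error onto NFS4ERR_DELAY / NFS3ERR_JUKEBOX. This is a self-contained sketch; the enum values stand in for Ganesha's real `fsal_errors_t`, and `async_op_start`/`async_op_done` are illustrative names, not existing API:

```c
#include <stdatomic.h>

/* Illustrative stand-ins for Ganesha's fsal_errors_t values. */
typedef enum { ERR_FSAL_NO_ERROR = 0, ERR_FSAL_DELAY = 1 } fsal_errors_t;

#define ASYNC_OP_LIMIT 256
static atomic_uint inflight_ops;

/* Try to admit one async READ/WRITE; fail fast instead of blocking,
 * so no RPC thread ever sleeps holding the budget. */
static fsal_errors_t async_op_start(void)
{
	unsigned cur = atomic_load(&inflight_ops);
	do {
		if (cur >= ASYNC_OP_LIMIT)
			return ERR_FSAL_DELAY;	/* client will retry later */
	} while (!atomic_compare_exchange_weak(&inflight_ops, &cur, cur + 1));
	return ERR_FSAL_NO_ERROR;
}

/* Called from the async completion path. */
static void async_op_done(void)
{
	atomic_fetch_sub(&inflight_ops, 1);
}
```

Because nothing blocks, the deadlock from the condvar draft cannot recur; the cost is an extra client round trip whenever the budget is exhausted.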
That could work, though there's no guarantee the client pauses sending other requests. It would be better to do something that blocks the client's IP stream. That would require a signal back to the RPC layer, or maybe simply a limit on inflight requests per SVCXPRT (basically per client): if that limit is hit, the RPC layer stops reading from that TCP stream until the inflight requests drop below the limit. You might want high and low water marks for hysteresis.
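The per-SVCXPRT idea with hysteresis can be sketched as a tiny state machine: stop reading the socket at the high-water mark, and resume only once inflight requests fall below the low-water mark, so the throttle doesn't flap on every completion. All names and thresholds below are illustrative, not ntirpc API:

```c
#include <stdbool.h>

#define HI_WATER 128	/* stop reading at or above this */
#define LO_WATER 96	/* resume reading below this */

/* Hypothetical per-connection (per-SVCXPRT) throttle state. */
struct xprt_throttle {
	unsigned inflight;
	bool paused;	/* true => stop reading from this TCP stream */
};

/* Called when a request is decoded from this transport. */
static void xprt_request_start(struct xprt_throttle *t)
{
	if (++t->inflight >= HI_WATER)
		t->paused = true;	/* e.g. drop epoll read interest */
}

/* Called when a request from this transport completes. */
static void xprt_request_done(struct xprt_throttle *t)
{
	if (--t->inflight < LO_WATER && t->paused)
		t->paused = false;	/* re-arm the read side */
}
```

The gap between the two marks means a briefly misbehaving client is paused once and drained substantially before reads resume, rather than toggling the socket on every request boundary.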
I agree with you that it would be better to add the throttling mechanism in the RPC layer.
Feedback into the RPC layer is an important addition, but in the longer term, I don't think the RPC layer can own all of flow control, as it lacks knowledge of the I/O targets (shares, FSALs, paths). I am not bringing up client fairness because we agreed earlier this change isn't attempting it.
Is there any patch on the Gerrit host that we can review? @zhitaoli-6
I haven't submitted a patch because the initial design deadlocks.
In our environment, nfs-ganesha takes up too much memory when there is a huge amount of read/write I/O load from an NFS client, and then OOM occurs.
I think the following config items will not help, because file read/write is implemented asynchronously: a worker thread receives an nfs4_op_write request and submits it to the FSAL, again and again. Even a single worker thread can cause high memory usage.
Are there any mechanisms to limit memory usage? This may be an overload case for nfs-ganesha. If overload happens, perhaps nfs-ganesha could stop responding to new NFS requests until it recovers.