
[Question] throttling: how to limit memory usage of nfs-ganesha? #1099

Open
zhitaoli-6 opened this issue Mar 16, 2024 · 20 comments
Labels: enhancement, Need Info (Need more information from the reporter)

Comments

@zhitaoli-6
Contributor

In our environment, nfs-ganesha consumes too much memory when an NFS client generates a large amount of read/write I/O load, and it eventually gets OOM-killed.

I think the following config items will not help, because file read/write is implemented asynchronously: a worker thread receives an nfs4_op_write request and submits it to the FSAL, over and over, so even a single worker thread can drive memory usage very high.

       RPC_Max_Connections(uint32, range 1 to 10000, default 1024)
              Maximum number of connections for TIRPC.

       MaxRPCRecvBufferSize(uint32, range 1 to 1048576*9, default 1048576)
              Size of RPC receive buffer.

       RPC_Ioq_ThrdMax(uint32, range 1 to 1024*128, default 200)
              TIRPC ioq max simultaneous io threads

Are there any mechanisms to limit memory usage? This may be an overload case for nfs-ganesha; when overload happens, perhaps nfs-ganesha should stop responding to new NFS requests until it recovers.

@ffilz
Member

ffilz commented Mar 17, 2024

Are you using the async I/O mechanism in the FSAL? Which FSAL? You may be exposing an issue where we need some throttling of async requests.

@zhitaoli-6
Contributor Author

Yeah, we do use async read/write in the FSAL of the distributed file system we are developing.

	 void (*write2)(struct fsal_obj_handle *obj_hdl,
			bool bypass,
			fsal_async_cb done_cb,
			struct fsal_io_arg *write_arg,
			void *caller_arg);
	 void (*read2)(struct fsal_obj_handle *obj_hdl,
		       bool bypass,
		       fsal_async_cb done_cb,
		       struct fsal_io_arg *read_arg,
		       void *caller_arg);

We run FIO with the following parameters from an NFS client. nfs-ganesha runs on a VM with 8 GB of memory and 8 cores, and it ends up using more than 4 GB of memory; OOM happens eventually.

[global]
filesize=12G
time_based=1
numjobs=32
startdelay=5
exitall_on_error=1
create_serialize=0
filename_format=$jobnum/$filenum/bw.$jobnum.$filenum
directory=/mnt/vd
group_reporting=1
clocksource=gettimeofday
runtime=300
ioengine=psync
disk_util=0
iodepth=1

[read_throughput]
bs=1m
rw=read
direct=1
new_group 

@ffilz
Member

ffilz commented Mar 18, 2024

You may want to investigate throttling.

It may well be that async I/O is not ready for production yet.

@zhitaoli-6
Contributor Author

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and 1 MB per request, at most about 256 MB is used.

It can be implemented in the alloc_nfs_request() and free_nfs_request() functions.
The pseudocode would look something like this:

alloc_nfs_request() {
// wait for a condition variable until the value is less than threshold
// inflight_num += 1;
}
free_nfs_request() {
// inflight_num -= 1;
// notify this change to unblock some threads
}
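For concreteness, here is a minimal sketch of that counting throttle using a plain pthread mutex and condition variable; inflight_get()/inflight_put() and inflight_max are illustrative names, not existing Ganesha symbols:

#include <pthread.h>

static pthread_mutex_t inflight_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight_cond = PTHREAD_COND_INITIALIZER;
static unsigned int inflight_num;
static const unsigned int inflight_max = 256;	/* ~256 MB at 1 MB per request */

/* Called on the alloc_nfs_request() path: block while we are at the limit. */
static void inflight_get(void)
{
	pthread_mutex_lock(&inflight_mtx);
	while (inflight_num >= inflight_max)
		pthread_cond_wait(&inflight_cond, &inflight_mtx);
	inflight_num++;
	pthread_mutex_unlock(&inflight_mtx);
}

/* Called on the free_nfs_request() path: drop the count and wake one waiter. */
static void inflight_put(void)
{
	pthread_mutex_lock(&inflight_mtx);
	inflight_num--;
	pthread_cond_signal(&inflight_cond);
	pthread_mutex_unlock(&inflight_mtx);
}

(As discussed further down the thread, blocking the RPC worker threads this way risks deadlock if every thread ends up waiting and none remains to complete a request and signal the condition variable.)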

@ffilz
Member

ffilz commented Mar 21, 2024

If we do throttling, we want to do fair throttling.

@zhitaoli-6
Contributor Author

Is there any plan to add a throttling mechanism to nfs-ganesha?

@ffilz
Member

ffilz commented Mar 22, 2024

It is on our short list, but just when we will get to it is very up in the air.

@zhitaoli-6 zhitaoli-6 changed the title [Question] How to limit memory usage of nfs-ganesha? [Question] throttling: how to limit memory usage of nfs-ganesha? Mar 22, 2024
@zhitaoli-6
Contributor Author

zhitaoli-6 commented Mar 29, 2024

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and 1 MB per request, at most about 256 MB is used.

We implemented this mechanism in our environment, and the experiment shows that it effectively avoids overload from async operations (READ/WRITE). Would this feature be acceptable in the community repository?

@mattbenjamin
Contributor

Something is definitely needed for async, and we aspire to some QoS. It seems to me that since the current async dispatch mechanism (and the selection of sync vs. async) is at the FSAL level, you probably shouldn't be using a global throttle to control it (although that would probably work OK for a lot of bespoke setups). Past that, we have requests for fairness on other dimensions, in particular exports/shares; we've been treating that as something different.

@zhitaoli-6
Contributor Author

I agree with you that it is better to add the throttling mechanism at the FSAL layer rather than globally. Our goal is to avoid the case where nfs-ganesha uses too much memory because of a large number of async read/write requests, so we need to add a limit on the number of inflight async ops at the FSAL layer. Fairness on other dimensions, like exports/shares, is probably not needed right now.

Is this design reasonable? If so, I can implement it, verify it in our environment, and contribute it to the community :)

@mattbenjamin
Contributor

fsal ops limit/budget? yes, I think so, please proceed governor :)

@ffilz
Member

ffilz commented Apr 1, 2024

If you want to submit something for review and discussion, that would be most welcome.

@ffilz ffilz added the Need Info Need more information from the reporter label Apr 1, 2024
@zhitaoli-6
Contributor Author

zhitaoli-6 commented Apr 2, 2024

The draft implementation above can deadlock: all RPC threads end up waiting on the condition variable, and no thread is left to handle the completion of an issued request and wake up the blocked threads.

Our libntirpc framework supports both sync and async requests, but it places no limit on request concurrency. I think there are two possible solutions here:

  1. Add throttling to libntirpc. If too many requests are in flight, it stops handling new connections and data-receive events on the sockets.
  2. Add throttling to the FSAL layer of nfs-ganesha. Put a limit on async requests, and when it is exceeded, the FSAL falls back to sync requests. This would require adding more callback functions to the FSAL API, such as sync write/read.

Maybe solution 1 is better, because then the application doesn't need to worry about overload at all.

@ffilz
Member

ffilz commented Apr 4, 2024

If you have something implemented that is working, please submit to Gerrithub for review and discussion.

Matt and I had a conversation about this. We will need a solution ourselves soon, so anything you have would be of interest to us.

Your solution of throttling overall in-flight requests is at least a good start. Another option, which might be a bit harder because it requires ntirpc and Ganesha to talk to each other, is to limit the amount of memory tied up in I/O buffers (which is the big killer), so we stop accepting requests when we hit that limit. Maybe that's hard to do.

Longer term, it would be useful for the FSAL to be involved so it can be smart about what types of requests are backing up.

Eventually we will also want some fairness/QOS so one client doesn't hog all the budget.

@zhitaoli-6
Contributor Author

Thanks for your reply. The draft above has a deadlock bug, so now we have the FSAL return ERR_FSAL_DELAY when there are too many inflight requests. The error code is translated into NFS4ERR_DELAY or NFS3ERR_JUKEBOX, and the NFS client retries according to its mount options.
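For concreteness, a minimal sketch of this check at the top of our FSAL's write2 looks roughly like the following. The counter, the limit, and the exact done_cb argument order are assumptions for illustration; fsalstat() and ERR_FSAL_DELAY are the existing Ganesha helpers named above.

/* In a FSAL source file that includes fsal_api.h. myfs_inflight and
 * myfs_inflight_max are hypothetical fields of our FSAL, not Ganesha symbols. */
#include <stdatomic.h>

static atomic_int myfs_inflight;
static const int myfs_inflight_max = 256;

static void myfs_write2(struct fsal_obj_handle *obj_hdl, bool bypass,
			fsal_async_cb done_cb, struct fsal_io_arg *write_arg,
			void *caller_arg)
{
	if (atomic_fetch_add(&myfs_inflight, 1) >= myfs_inflight_max) {
		atomic_fetch_sub(&myfs_inflight, 1);
		/* Too many async ops in flight: ask the client to retry later.
		 * ERR_FSAL_DELAY maps to NFS4ERR_DELAY / NFS3ERR_JUKEBOX. */
		done_cb(obj_hdl, fsalstat(ERR_FSAL_DELAY, 0), write_arg,
			caller_arg);
		return;
	}

	/* ... submit the async write to the backend; the completion path does
	 * atomic_fetch_sub(&myfs_inflight, 1) and then calls done_cb() with
	 * the real status. */
}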

What's your opinion about this solution?

@ffilz
Member

ffilz commented Apr 8, 2024

That could work, though there's no guarantee the client pauses sending other requests. It would be better to do something that blocks the client's IP stream. That would require a signal back to the RPC layer, or maybe simply a limit on inflight requests per SVCXPRT (basically per client). If that limit is hit, the RPC layer stops reading from that TCP stream until the inflight requests drop below the limit. Might want hi and lo water mark for hysteresis.
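A rough shape of that per-SVCXPRT hysteresis might look like the sketch below; every name here (the struct, the counters, and the pause/resume hooks) is hypothetical, since ntirpc does not currently expose such an interface. The gap between the high and low water marks is what provides the hysteresis.

/* Hypothetical per-transport bookkeeping; locking omitted for brevity
 * (the real thing would serialize updates under the xprt's lock). */
struct xprt_throttle {
	uint32_t inflight;	/* requests decoded but not yet replied to */
	uint32_t hi_water;	/* stop reading from the TCP stream at this count */
	uint32_t lo_water;	/* resume reading once we drop back to this count */
	bool paused;
};

static void xprt_request_started(struct xprt_throttle *t, SVCXPRT *xprt)
{
	if (++t->inflight >= t->hi_water && !t->paused) {
		t->paused = true;
		svc_xprt_pause_recv(xprt);	/* hypothetical: stop polling the socket */
	}
}

static void xprt_request_done(struct xprt_throttle *t, SVCXPRT *xprt)
{
	if (--t->inflight <= t->lo_water && t->paused) {
		t->paused = false;
		svc_xprt_resume_recv(xprt);	/* hypothetical: resume polling */
	}
}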

@zhitaoli-6
Contributor Author

zhitaoli-6 commented Apr 9, 2024

It would be better to do something that blocks the client's IP stream.

I agree that it would be better to add the throttling mechanism in the RPC layer.

@mattbenjamin
Contributor

mattbenjamin commented Apr 9, 2024

Feedback into the RPC layer is an important addition, but in the longer term, I don't think the RPC layer can own all of flow control as it lacks knowledge of the i/o targets (shares, fsals, paths). I am not bringing up client fairness because we agreed earlier this change isn't attempting it.

@xiaods

xiaods commented May 9, 2024

Is there any patch on the Gerrit host that we can review? @zhitaoli-6

@zhitaoli-6
Contributor Author

I haven't submitted a patch, because the initial design can deadlock.
