
[Question] throttling: how to limit memory usage of nfs-ganesha? #1099

Open
zhitaoli-6 opened this issue Mar 16, 2024 · 20 comments
Labels: enhancement, Need Info (Need more information from the reporter)

Comments

@zhitaoli-6
Contributor

In our environment, nfs-ganesha consumes too much memory when an NFS client generates a large amount of read/write I/O load, and it eventually gets OOM-killed.

I think the following config items will not help, because file read/write is implemented asynchronously: a worker thread receives an nfs4_op_write request and submits it to the FSAL, over and over, so even a single worker thread can drive memory usage very high.

       RPC_Max_Connections(uint32, range 1 to 10000, default 1024)
              Maximum number of connections for TIRPC.

       MaxRPCRecvBufferSize(uint32, range 1 to 1048576*9, default 1048576)
              Size of RPC receive buffer.

       RPC_Ioq_ThrdMax(uint32, range 1 to 1024*128, default 200)
              TIRPC ioq max simultaneous io threads

Are there any mechanisms to limit memory usage? This may be an overload case for nfs-ganesha; when overload happens, perhaps nfs-ganesha should stop responding to new NFS requests until it recovers.

@ffilz
Member

ffilz commented Mar 17, 2024

Are you using the async I/O mechanism in the FSAL? Which FSAL? You may be exposing an issue where we need some throttling of async requests.

@zhitaoli-6
Contributor Author

Yeah, we do use async read/write in the FSAL of the distributed file system we are developing.

	 void (*write2)(struct fsal_obj_handle *obj_hdl,
			bool bypass,
			fsal_async_cb done_cb,
			struct fsal_io_arg *write_arg,
			void *caller_arg);
	 void (*read2)(struct fsal_obj_handle *obj_hdl,
		       bool bypass,
		       fsal_async_cb done_cb,
		       struct fsal_io_arg *read_arg,
		       void *caller_arg);

We run FIO with the following parameters from an NFS client. nfs-ganesha runs on a VM with 8 GB of memory and 8 cores, and it ends up using more than 4 GB of memory; OOM happens eventually.

[global]
filesize=12G
time_based=1
numjobs=32
startdelay=5
exitall_on_error=1
create_serialize=0
filename_format=$jobnum/$filenum/bw.$jobnum.$filenum
directory=/mnt/vd
group_reporting=1
clocksource=gettimeofday
runtime=300
ioengine=psync
disk_util=0
iodepth=1

[read_throughput]
bs=1m
rw=read
direct=1
new_group 

@ffilz
Member

ffilz commented Mar 18, 2024

You may want to investigate throttling.

It may well be that async I/O is not ready for production yet.

@zhitaoli-6
Contributor Author

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and 1 MB per request, at most about 256 MB is used.

It can be implemented in the alloc_nfs_request() and free_nfs_request() functions.
The pseudocode would look something like this:

alloc_nfs_request() {
// wait for a condition variable until the value is less than threshold
// inflight_num += 1;
}
free_nfs_request() {
// inflight_num -= 1;
// notify this change to unblock some threads
}
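For concreteness, here is a minimal sketch of that counting throttle using a plain pthread mutex and condition variable; inflight_get()/inflight_put() and inflight_max are illustrative names, not existing Ganesha symbols:

#include <pthread.h>

static pthread_mutex_t inflight_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight_cond = PTHREAD_COND_INITIALIZER;
static unsigned int inflight_num;
static const unsigned int inflight_max = 256;	/* ~256 MB at 1 MB per request */

/* Called on the alloc_nfs_request() path: block while we are at the limit. */
static void inflight_get(void)
{
	pthread_mutex_lock(&inflight_mtx);
	while (inflight_num >= inflight_max)
		pthread_cond_wait(&inflight_cond, &inflight_mtx);
	inflight_num++;
	pthread_mutex_unlock(&inflight_mtx);
}

/* Called on the free_nfs_request() path: drop the count and wake one waiter. */
static void inflight_put(void)
{
	pthread_mutex_lock(&inflight_mtx);
	inflight_num--;
	pthread_cond_signal(&inflight_cond);
	pthread_mutex_unlock(&inflight_mtx);
}

(As discussed further down the thread, blocking the RPC worker threads this way risks deadlock if every thread ends up waiting and none remains to complete a request and signal the condition variable.)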

@ffilz
Member

ffilz commented Mar 21, 2024

If we do throttling, we want to do fair throttling.

@zhitaoli-6
Contributor Author

Is there any plan to add a throttling mechanism to nfs-ganesha?

@ffilz
Member

ffilz commented Mar 22, 2024

It is on our short list, but just when we will get to it is very up in the air.

@zhitaoli-6 zhitaoli-6 changed the title [Question] How to limit memory usage of nfs-ganesha? [Question] throttling: how to limit memory usage of nfs-ganesha? Mar 22, 2024
@zhitaoli-6
Contributor Author

zhitaoli-6 commented Mar 29, 2024

Can we add a simple throttling algorithm by setting a limit on the total number of inflight requests? If the count exceeds the threshold, the thread blocks. This effectively bounds memory use: for example, with a limit of 256 and 1 MB per request, at most about 256 MB is used.

We implemented this mechanism in our environment, and the experiment shows that it effectively avoids overload from async operations (READ/WRITE). Would this feature be acceptable in the community repository?

@mattbenjamin
Contributor

Something is definitely needed for async, and we aspire to some QoS. It seems to me that since the current async dispatch mechanism (and the selection of sync vs. async) is at the FSAL level, you probably shouldn't be using a global throttle to control it (although that would probably work OK for a lot of bespoke setups). Past that, we have requests for fairness on other dimensions, in particular exports/shares; we've been treating that as something different.

@zhitaoli-6
Contributor Author

I agree with you that it is better to add the throttling mechanism at the FSAL layer rather than globally. Our goal is to avoid the case where nfs-ganesha uses too much memory because of a large number of async read/write requests, so we need to add a limit on the number of inflight async ops at the FSAL layer. Fairness on other dimensions, like exports/shares, is probably not needed right now.

Is this design reasonable? If so, I can implement it, verify it in our environment, and contribute it to the community :)

@mattbenjamin
Contributor

fsal ops limit/budget? yes, I think so, please proceed governor :)

@ffilz
Member

ffilz commented Apr 1, 2024

If you want to submit something for review and discussion, that would be most welcome.

@ffilz ffilz added the Need Info Need more information from the reporter label Apr 1, 2024
@zhitaoli-6
Contributor Author

zhitaoli-6 commented Apr 2, 2024

The draft implementation above can deadlock: all RPC threads end up waiting on the condition variable, and no thread is left to handle the completion of an issued request and wake up the blocked threads.

Our libntirpc framework supports both sync and async requests, but it places no limit on request concurrency. I think there are two possible solutions here:

  1. Add throttling to libntirpc. If too many requests are in flight, it stops handling new connections and data-receive events on the sockets.
  2. Add throttling to the FSAL layer of nfs-ganesha. Put a limit on async requests, and when it is exceeded, the FSAL falls back to sync requests. This would require adding more callback functions to the FSAL API, such as sync write/read.

Maybe solution 1 is better, because then the application doesn't need to worry about overload at all.

@ffilz
Member

ffilz commented Apr 4, 2024

If you have something implemented that is working, please submit to Gerrithub for review and discussion.

Matt and I had a conversation about this. We will need a solution ourselves soon, so anything you have would be of interest to us.

Your solution of throttling overall in-flight requests is at least a good start. Another option, which might be a bit harder because it requires ntirpc and Ganesha to talk to each other, is to limit the amount of memory tied up in I/O buffers (which is the big killer), so we stop accepting requests when we hit that limit. Maybe that's hard to do.

Longer term, it would be useful for the FSAL to be involved so it can be smart about what types of requests are backing up.

Eventually we will also want some fairness/QOS so one client doesn't hog all the budget.

@zhitaoli-6
Contributor Author

Thanks for your reply. The draft above has a deadlock bug, so now we have the FSAL return ERR_FSAL_DELAY when there are too many inflight requests. The error code is translated into NFS4ERR_DELAY or NFS3ERR_JUKEBOX, and the NFS client retries according to its mount options.
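For concreteness, a minimal sketch of this check at the top of our FSAL's write2 looks roughly like the following. The counter, the limit, and the exact done_cb argument order are assumptions for illustration; fsalstat() and ERR_FSAL_DELAY are the existing Ganesha helpers named above.

/* In a FSAL source file that includes fsal_api.h. myfs_inflight and
 * myfs_inflight_max are hypothetical fields of our FSAL, not Ganesha symbols. */
#include <stdatomic.h>

static atomic_int myfs_inflight;
static const int myfs_inflight_max = 256;

static void myfs_write2(struct fsal_obj_handle *obj_hdl, bool bypass,
			fsal_async_cb done_cb, struct fsal_io_arg *write_arg,
			void *caller_arg)
{
	if (atomic_fetch_add(&myfs_inflight, 1) >= myfs_inflight_max) {
		atomic_fetch_sub(&myfs_inflight, 1);
		/* Too many async ops in flight: ask the client to retry later.
		 * ERR_FSAL_DELAY maps to NFS4ERR_DELAY / NFS3ERR_JUKEBOX. */
		done_cb(obj_hdl, fsalstat(ERR_FSAL_DELAY, 0), write_arg,
			caller_arg);
		return;
	}

	/* ... submit the async write to the backend; the completion path does
	 * atomic_fetch_sub(&myfs_inflight, 1) and then calls done_cb() with
	 * the real status. */
}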

What's your opinion about this solution?

@ffilz
Member

ffilz commented Apr 8, 2024

That could work, though there's no guarantee the client pauses sending other requests. It would be better to do something that blocks the client's IP stream. That would require a signal back to the RPC layer, or maybe simply a limit on inflight requests per SVCXPRT (basically per client). If that limit is hit, the RPC layer stops reading from that TCP stream until the inflight requests drop below the limit. Might want hi and lo water mark for hysteresis.
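A rough shape of that per-SVCXPRT hysteresis might look like the sketch below; every name here (the struct, the counters, and the pause/resume hooks) is hypothetical, since ntirpc does not currently expose such an interface. The gap between the high and low water marks is what provides the hysteresis.

/* Hypothetical per-transport bookkeeping; locking omitted for brevity
 * (the real thing would serialize updates under the xprt's lock). */
struct xprt_throttle {
	uint32_t inflight;	/* requests decoded but not yet replied to */
	uint32_t hi_water;	/* stop reading from the TCP stream at this count */
	uint32_t lo_water;	/* resume reading once we drop back to this count */
	bool paused;
};

static void xprt_request_started(struct xprt_throttle *t, SVCXPRT *xprt)
{
	if (++t->inflight >= t->hi_water && !t->paused) {
		t->paused = true;
		svc_xprt_pause_recv(xprt);	/* hypothetical: stop polling the socket */
	}
}

static void xprt_request_done(struct xprt_throttle *t, SVCXPRT *xprt)
{
	if (--t->inflight <= t->lo_water && t->paused) {
		t->paused = false;
		svc_xprt_resume_recv(xprt);	/* hypothetical: resume polling */
	}
}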

@zhitaoli-6
Contributor Author

zhitaoli-6 commented Apr 9, 2024

It would be better to do something that blocks the client's IP stream.

I agree that it would be better to add the throttling mechanism in the RPC layer.

@mattbenjamin
Contributor

mattbenjamin commented Apr 9, 2024

Feedback into the RPC layer is an important addition, but in the longer term, I don't think the RPC layer can own all of flow control as it lacks knowledge of the i/o targets (shares, fsals, paths). I am not bringing up client fairness because we agreed earlier this change isn't attempting it.

@xiaods

xiaods commented May 9, 2024

Is there any patch on the Gerrit host that we can review? @zhitaoli-6

@zhitaoli-6
Contributor Author

I haven't submitted a patch, because the initial design can deadlock.
