[Question] What is the meaning of NFS v4 grace period EVENT: EVENT_RELEASE_IP, EVENT_TAKE_IP? #1108
Closing this issue because it may be a duplicate of another issue, or is unclear or no longer maintained.
Reopen it to wait for confirmation from our community :)
I don't think these events are related at all. EVENT_RELEASE_IP appears to be used to force-expire a client. EVENT_TAKE_IP is used for server failover.
There is no description of force-expiring an NFS client in the NFS v4.x RFCs. And according to this issue, EVENT_RELEASE_IP cannot actually force-expire a client at all: no NFS client records will match the released IP.
When a nfs-ganesha node, NodeA, crashes, the VIP (virtual IP) is assigned to another node, NodeB. When NodeA comes back, the VIP is assigned to NodeA again. We need an event that tells NodeB to release the client records and states from memory while still keeping the states in the FSAL layer. The NFS clients connected through that VIP will then reclaim their states with NodeA. So I think we need an opposite event. Looking forward to help from our community.
OK, so your question is how to implement fail back. EVENT_RELEASE_IP is for client-assisted fail back: the cluster is put into a grace period, and the release-IP event is sent to the node that holds the state, which releases it from the FSAL. Take-IP is sent to the recovered node to let it know to accept reclaims on that IP address. The clients will discover that their clientid has failed and will reclaim. So, for the clients attached to NodeA's virtual IP, it will appear that their server crashed. We have floated the idea of state transfer without this disruption, but no one has stepped up with development resources for that yet.
In our case, lock state is persisted in the FSAL layer. When a lock is reclaimed, we check the existence of that lock state in the underlying storage; if it doesn't exist, the lock reclaim request from the NFS client fails. So in our usage, EVENT_RELEASE_IP should release states from nfs-ganesha memory but must NOT release the lock state from the FSAL layer (the underlying storage). I want to make an adaptation for our case. Before the adaptation, the call path that releases a lock from the FSAL is as follows:
I want to add an argument
Is this change acceptable for other cases? And are there any potential problems that I have missed? Please help review this draft change :)
Is your persistent lock state enough to be able to reclaim locks without client assistance? That is something folks are interested in. But to your question: I think the best route might be to add a support bit to the FSAL's supported features, and then add a parameter, set by RELEASE_IP, that tests that support bit down in do_lock_op.
Our NFS v4.x lock implementation is, in general, as follows. When a lock is acquired, we persist the lock record into the FSAL (our backend) with its
On failures:
So in our case, EVENT_RELEASE_IP is used to mark the NFS clients that access the moving VIP as stale and to release their states from nfs-ganesha memory, while still keeping the states in the FSAL. Those NFS clients will receive NFS4ERR_STALE_CLIENTID and try to reclaim their states with the destination nfs-ganesha node. Are there any correctness issues in our NFS v4.x lock design? If it is OK, I will move ahead with this suggestion and submit a patch to the mainline.
That looks good, please do submit a patch via Gerrithub.
We also changed the ip_match() implementation (issue #985) to match the NFS clients that access the VIP. Should these changes be merged into one patch?
Please submit the other change as a separate patch.
Some FSALs persist enough information about locks that the FSAL itself can control whether to allow a lock reclaim, without completely trusting the NFS client. If the VIP is detached, the FSAL expects the lock record to be kept in the backend storage, where it is used to check the subsequent reclaim. See nfs-ganesha#1108 Change-Id: Idb223b115fe846968644c86ddc7f9797fca32957 Signed-off-by: zhitaoli <zhitao.li@iomesh.com>
Please review this patch on Gerrithub :)
What can I do to move this patch forward?
I have seen it is in "Ready to submit" status.
Sorry, two weeks ago, I was in a rush to finish the weekly merge and missed this patch, and then last week I ran out of time to do a weekly merge. I should get it merged this week. |
/close as resolved
After seeing the implementation of state recovery, my understanding is as follows.
EVENT_TAKE_IP means that when a new IP is added to a nfs-ganesha node, nfs-ganesha enters a grace period to wait for state recovery by the old NFS clients. This IP is where the NFS server runs.
EVENT_RELEASE_IP means that the NFS clients from that IP are marked as expired, and their OP_RECLAIM_COMPLETE will be cancelled. Ganesha enters a grace period and waits for them to complete their state reclaim. This IP is where the NFS clients run. I see no caller of this event. What is the background of this event? Is it necessary for nfs-ganesha to enter grace?
These two events are asymmetrical and quite confusing. Looking forward to some introduction to these events.