rpc: Improve rpc clnt connection cleanup process #4329

Open
wants to merge 1 commit into
base: devel
from

Conversation

mohit84
Contributor

@mohit84 mohit84 commented Apr 3, 2024

On the first rpc clnt submission we take a reference on the rpc object and register the call_bail function with the timer thread, which then calls call_bail every 10s. When a client triggers a shutdown request, it calls rpc_clnt_connection_cleanup to clean up the rpc connection. This cleanup cannot complete successfully because the cleanup_started flag has already been set by the upper xlator. The rpc reference is released only after call_bail runs, so if call_bail happens to fire just before the shutdown starts, the application has to wait 10s for the rpc connection to be cleaned up, and the process becomes slow.

Solution: Unref the rpc object based on the conn->timer/conn->reconnect pointer value, as we already do for ping_timer. These pointers are always modified inside the critical section, so if a pointer is valid we can assume the rpc reference is also valid.
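The idea can be illustrated with a minimal, self-contained sketch. This is not the patch itself: `toy_conn`, `toy_timer`, `toy_submit` and `toy_cleanup` are hypothetical stand-ins for the real rpc clnt structures and functions, and the reference count is a plain integer. It only shows the pattern of treating a non-NULL timer pointer, checked under the lock, as proof that a reference is still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the GlusterFS types. */
struct toy_timer {
    bool cancelled;
};

struct toy_conn {
    pthread_mutex_t lock;
    struct toy_timer *timer; /* non-NULL only while the timer holds a ref */
    int refcount;            /* references held on the owning rpc object */
};

void toy_conn_init(struct toy_conn *conn)
{
    pthread_mutex_init(&conn->lock, NULL);
    conn->timer = NULL;
    conn->refcount = 1; /* the owner's own reference */
}

/* First submission: take a reference and arm the timer, both under the lock. */
void toy_submit(struct toy_conn *conn, struct toy_timer *timer)
{
    pthread_mutex_lock(&conn->lock);
    if (conn->timer == NULL) {
        timer->cancelled = false;
        conn->timer = timer;
        conn->refcount++;
    }
    pthread_mutex_unlock(&conn->lock);
}

/*
 * Cleanup: if the timer pointer is still set, cancel the timer, clear the
 * pointer and drop the reference it was holding, without waiting for the
 * next timer tick. Returns the number of references released (0 or 1).
 */
int toy_cleanup(struct toy_conn *conn)
{
    int released = 0;

    pthread_mutex_lock(&conn->lock);
    if (conn->timer != NULL) {
        conn->timer->cancelled = true;
        conn->timer = NULL;
        assert(conn->refcount > 1); /* the timer's ref must still exist */
        conn->refcount--;
        released = 1;
    }
    pthread_mutex_unlock(&conn->lock);

    return released;
}
```

Because the pointer is cleared in the same critical section in which the reference is dropped, calling `toy_cleanup` a second time finds a NULL pointer and releases nothing, so the reference cannot be dropped twice.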

Fixes: #4320
credits: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: Ib947b8bfcbe1b49e1ed05a50a84de6f92afbca13

@mohit84
Contributor Author

mohit84 commented Apr 3, 2024

/run regression

@mohit84
Contributor Author

mohit84 commented Apr 3, 2024

/run regression

@mohit84
Contributor Author

mohit84 commented Apr 3, 2024

/run regression

@gluster-ant
Collaborator

0 test(s) failed

1 test(s) generated core
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t

1 test(s) needed retry
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t

1 flaky test(s) marked as success even though they failed
./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
https://build.gluster.org/job/gh_centos7-regression/3389/

On the first rpc clnt submission we take a reference on the rpc object
and register the call_bail function with the timer thread, which then
calls call_bail every 10s. When a client triggers a shutdown request,
it calls rpc_clnt_connection_cleanup to clean up the rpc connection.
This cleanup cannot complete successfully because the cleanup_started
flag has already been set by the upper xlator. The rpc reference is
released only after call_bail runs, so if call_bail happens to fire
just before the shutdown starts, the application has to wait 10s for
the rpc connection to be cleaned up, and the process becomes slow.

Solution: Unref the rpc object based on the conn->timer/conn->reconnect
pointer value, as we already do for ping_timer. These pointers are
always modified inside the critical section, so if a pointer is valid
we can assume the rpc reference is also valid.

Fixes: gluster#4320
credits: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: Ib947b8bfcbe1b49e1ed05a50a84de6f92afbca13
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
@mohit84
Contributor Author

mohit84 commented Apr 3, 2024

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/basic/ec/ec-badfd.t

0 test(s) generated core

3 test(s) needed retry
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/afr/ta-shd.t
./tests/basic/ec/ec-badfd.t
https://build.gluster.org/job/gh_centos7-regression/3390/

@mohit84
Contributor Author

mohit84 commented Apr 4, 2024

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/basic/ec/ec-badfd.t

0 test(s) generated core

1 test(s) needed retry
./tests/basic/ec/ec-badfd.t
https://build.gluster.org/job/gh_centos7-regression/3391/

Development

Successfully merging this pull request may close these issues.

qemu-img create via libfapi is slow
2 participants