Q2 transfer stalls occasionally with resend-released enabled flows #1288

Open
kgiusti opened this issue Nov 2, 2023 · 2 comments
Labels
blocked Cannot resolve due to external factor (see comments)

Comments


kgiusti commented Nov 2, 2023

This stall does not recover. It appears to occur when the Q2 limit is hit before the PN_RECEIVED arrives from the endpoint. My assumption is that the buffers sent before hitting Q2 do not have their refcounts decremented (by design, so we can resend them if necessary), so they remain in the message after the PN_RECEIVED arrives and are never released. This prevents the Q2 low-water mark from ever being reached.
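To make the suspected sequence concrete, here is a minimal toy model in Python. It is a sketch only, not the router's code: the `Message` class, the `Q2_UPPER`/`Q2_LOWER` values, and the `free_sent_buffers` switch are all hypothetical, and it assumes that PN_RECEIVED is the point after which the retained buffers are no longer needed for a resend.

```python
from dataclasses import dataclass, field

Q2_UPPER = 4  # hypothetical high-water mark, in buffers
Q2_LOWER = 2  # hypothetical low-water mark, in buffers


@dataclass
class Message:
    """Toy stand-in for a streaming message; not the router's real structure."""
    resend_released: bool
    buffers: list = field(default_factory=list)  # buffers still held by the message
    sent: int = 0             # how many held buffers were already transmitted
    q2_blocked: bool = False

    def receive(self, buf):
        """Producer side: refuse new data while Q2 is holding the input off."""
        if self.q2_blocked:
            return False
        self.buffers.append(buf)
        if len(self.buffers) >= Q2_UPPER:
            self.q2_blocked = True
        return True

    def send_all(self):
        """Consumer side: transmit every pending buffer."""
        if self.resend_released:
            # Keep the buffers so the message can be resent if it is released.
            self.sent = len(self.buffers)
        else:
            self.buffers.clear()
            self.sent = 0
        self._check_low_water()

    def on_received(self, free_sent_buffers):
        """Peer signalled PN_RECEIVED (assumed: a resend is no longer needed)."""
        self.resend_released = False
        if free_sent_buffers:
            del self.buffers[:self.sent]
            self.sent = 0
        self._check_low_water()

    def _check_low_water(self):
        if self.q2_blocked and len(self.buffers) <= Q2_LOWER:
            self.q2_blocked = False


def run(free_sent_buffers):
    msg = Message(resend_released=True)
    for i in range(Q2_UPPER):
        msg.receive(i)                  # fill until Q2 blocks the producer
    msg.send_all()                      # buffers go out but are retained
    msg.on_received(free_sent_buffers)
    return msg.q2_blocked


print(run(free_sent_buffers=False))  # still blocked: the suspected stall
print(run(free_sent_buffers=True))   # unblocked once the sent buffers are dropped
```

In this model the only difference between the stalled and the recovering run is whether the already-sent buffers are dropped when PN_RECEIVED arrives, which matches the assumption above.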

@kgiusti kgiusti self-assigned this Nov 2, 2023

kgiusti commented Nov 9, 2023

Status update: this issue appears to have a couple of underlying causes.

  1. There is a bug in how the qd_buffer_t reference counts are managed that causes the stall when Q2 is hit.
  2. Aborting the released outgoing delivery is not properly detected in some cases, so the abort is ignored. The released outgoing delivery then stays stalled on the outgoing link, which prevents the link from being returned to the streaming link pool, and the associated message's qd_buffer_t buffers are not freed until the outgoing connection is dropped.
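
The consequence described in the second cause can be sketched with another toy model. Everything here is hypothetical (the `Link` and `Connection` classes, the `detected` flag); it does not reproduce how the router detects aborts, only what happens to the link and its buffers when an abort is missed.

```python
class Link:
    """Hypothetical outgoing streaming link; not the router's real type."""
    def __init__(self):
        self.delivery_active = False
        self.buffers = []          # message buffers pinned by the delivery


class Connection:
    def __init__(self, link_count):
        self.pool = [Link() for _ in range(link_count)]  # idle streaming links
        self.busy = []

    def start_delivery(self, data):
        link = self.pool.pop()
        link.delivery_active = True
        link.buffers = list(data)
        self.busy.append(link)
        return link

    def abort(self, link, detected):
        # If the abort is missed, nothing happens and the link stays busy.
        if detected:
            self._recycle(link)

    def drop(self):
        # Dropping the connection tears down every busy link.
        for link in list(self.busy):
            self._recycle(link)

    def _recycle(self, link):
        link.delivery_active = False
        link.buffers.clear()
        self.busy.remove(link)
        self.pool.append(link)


conn = Connection(link_count=1)
link = conn.start_delivery([b"a", b"b"])
conn.abort(link, detected=False)
print(len(conn.pool), len(link.buffers))  # link not pooled, buffers still held
conn.drop()
print(len(conn.pool), len(link.buffers))  # link pooled again, buffers freed
```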

I think I have a simple fix for the first problem. The second problem is more complex: the CI abort functionality tests have been failing intermittently ("flaky"), so something in the abort implementation needs fixing.

@kgiusti kgiusti added the blocked Cannot resolve due to external factor (see comments) label Nov 9, 2023

kgiusti commented Nov 9, 2023

Blocked - depends on fix to #1293
