-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
struck on Connection timeout error #54
Comments
@fracasula looks like you added that change recently, can you help me look into it. |
Looks like the issue is due to the use of
and then calling As |
Looks like there is also issue with defering waitgroup wait method in cleanup function, Its basically struck waiting. I added the defer waitgroup.Wait() to the start of connect method and it was working for me. I have the change in my local commit here -> golthitarun@324831c I can create a PR if this looks good, lemme know |
The issue seems to be with the mutex like @MattBrittan pointed out. It's acquired by |
I was able to replicate with a simple test and found 2 issues so far:
I'll investigate further and I also think we should handle all context errors in c.debug.Println("waiting for CONNACK")
var cap *packets.Connack
select {
case <-connCtx.Done():
if e := connCtx.Err(); e != nil {
switch e {
case context.DeadlineExceeded:
c.debug.Println("timeout waiting for CONNACK")
case context.Canceled:
c.debug.Println("context canceled")
default:
c.debug.Println(fmt.Sprintf("unknown context error: %v", e))
}
cleanup()
return nil, e
}
case cap = <-c.caCtx.Return:
} |
Well spotted on the issue handling the context - It looks like the same issue is in a few other places (e.g. here, here, here etc). This is definitely a bug - the parent context may be cancelled for another reason and this should result in the process terminating (so if
I think it's enough to just add the context error to the log (e.g. |
Yes I agree @MattBrittan, it does make sense to return if Also, thanks for the other links, I'll have a look at those too. About the deadlock I was finally able to identify it. The Here's the stack:
|
I've been working on a somewhat similar issue in the v3 library and had meant to check how the v5 client dealt with this :-) ... Managing mutexs with lots of goRoutines is not much fun. Your stack trace is from a modified version so the line numbers don't match up with the current |
Please have a look at this PR: #56 |
Thank you @fracasula |
Describe the bug
When connection timeout occurs during client connection the code is blocked in cleanup method.
I was able to make it run by removing the sync mutex lock code added here recently
paho.golang/paho/client.go
Line 155 in 6f81099
To reproduce
Try to reproduce connection timeout in connect method
Debug output
Not seeing any output, during debug I observed the code being struck in gopark method of go. https://golang.org/pkg/runtime/?m=all#gopark
Expected behaviour
An error response
Software used:
Additional context
The text was updated successfully, but these errors were encountered: