isConnectionOpen() and the return token from Publish() didn't precisely reflect the TCP connection #449
Comments
Hi @KShih, The response to
With your options I would expect internalConnLost to leave your messages in the store (clean up only happens
I wonder if your issue relates to issue #442 (can you please confirm which broker you are using?); perhaps try pull request #443 and see if that makes a difference. When I experienced the issue (using Mosquitto) I was sending messages more quickly (4 a second) but experienced something similar. Since implementing that change this unit has been sending 4 messages a second over an unreliable satellite connection for a month without any noticeable packet loss (sometimes a failure on the client end causes messages not to be generated, so it's difficult to be sure, but when I've looked all of the packet loss has been attributable to that).
To enable further investigation it would be useful to have debug logs (see this issue for instructions); ideally run this with pull request #443 in place (because that avoids reusing message IDs, it makes it a lot easier to trace each message in cases like this). It may also be worth checking the broker's logs to see if there is any record of the missing message IDs there (I assume you have subscribed at QOS 2 as well).
Matt
Hi @MattBrittan , Thanks for your help!
Referring to this article, my situation might seem like
Here is the debug log; let me briefly explain the experiment:
DEBUG Log
Trying some workaround
So, I've continued working on it by adjusting the
DEBUG log with 2 payloads lost (val=7 and val=8)
I'm not sure whether this is the right direction to work in, or whether there is a better workaround.
One question on that. You are connecting with CleanSession set to true. Is whatever you are using to monitor the session also connecting with CleanSession = true and, if so, will it also lose its connection when you pull out the network lead? CleanSession tells the broker to forget about any subscriptions when the connection is lost, so it will drop any messages received while the subscriber is disconnected; this means that your subscriber would only start receiving messages again once it had connected AND sent a SUBSCRIBE (anything in the interim would be lost). I would suggest setting CleanSession at both ends to FALSE when testing this (if your aim is to process all messages then CleanSession=TRUE is a bad idea).
Looking at the logs:
So message 19 got sent, then there was an attempt to publish another message (20 - "enter Publish"), and this detected the error on send and called
Checking these logs would be simpler with PR #446 because it adds extra logging in strategic locations (@alsm any chance of this being accepted?).
This will help but it's really just hiding the issue (setting these will enable the package to detect the network link drop more quickly, but my main concern is the lost messages - that should not happen however long the link is down). This library should be able to handle a long network outage without losing any messages.
Hi @MattBrittan , I hope you had a good weekend.
Thanks for pointing this out! Although I manage the disconnect and reconnect logic in my own code (I will talk about this later), it sounds better to keep it
Okay, it's because I handle this part in my own code, and the reason for that is the returned

if !client.IsConnectionOpen() {
	for _, payload := range payloads {
		WriteToSqlite(payload)
	}
	result = false
} else {
	for _, payload := range payloads {
		if token := client.Publish(topic, mqttQoS["AtLeaseOnce"], true, payload); token.Wait() && token.Error() != nil {
			fmt.Println(token.Error())
			WriteToSqlite(payload)
			result = false
		}
	}
}
return result

As you can see in the
This is the debug log if I comment out the
DEBUG LOG
Disconnected at val=5; reconnected at val=23; lost packets val=6 to val=22.
This is what I thought too. For me, the main issue here is: why didn't the token.Error() returned by Publish() reflect the network condition? And how can we solve it on Mac and Ubuntu (there is no issue on Windows)?
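For anyone who wants to experiment with the fallback logic from this comment without a broker, here is a minimal sketch. The `token` and `publisher` interfaces, `publishOrStore`, and the fake client are hypothetical stand-ins written for this example. Only the method names `IsConnectionOpen`, `Publish`, `Wait`, and `Error` mirror the calls used in the snippet above, and a real client would need a thin adapter to satisfy `publisher`.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a local stand-in exposing only the two methods the snippet
// in this thread calls on the value returned by Publish.
type token interface {
	Wait() bool
	Error() error
}

// publisher is a hypothetical abstraction over the MQTT client so the
// fallback logic can run without a broker.
type publisher interface {
	IsConnectionOpen() bool
	Publish(topic string, qos byte, retained bool, payload interface{}) token
}

// publishOrStore publishes each payload and passes every payload that
// could not be published to store. It returns true only if all payloads
// were published without a reported error.
func publishOrStore(p publisher, topic string, qos byte, payloads []string, store func(string)) bool {
	if !p.IsConnectionOpen() {
		for _, payload := range payloads {
			store(payload)
		}
		return false
	}
	ok := true
	for _, payload := range payloads {
		t := p.Publish(topic, qos, true, payload)
		if t.Wait() && t.Error() != nil {
			store(payload)
			ok = false
		}
	}
	return ok
}

// fakeToken completes immediately and reports a fixed error (or nil).
type fakeToken struct{ err error }

func (t fakeToken) Wait() bool   { return true }
func (t fakeToken) Error() error { return t.err }

// fakeClient simulates a client whose Publish fails for one payload.
type fakeClient struct {
	open   bool
	failOn string
}

func (c fakeClient) IsConnectionOpen() bool { return c.open }

func (c fakeClient) Publish(topic string, qos byte, retained bool, payload interface{}) token {
	if payload == c.failOn {
		return fakeToken{err: errors.New("simulated publish failure")}
	}
	return fakeToken{}
}

func main() {
	var stored []string
	store := func(s string) { stored = append(stored, s) }

	ok := publishOrStore(fakeClient{open: true, failOn: "b"}, "demo/topic", 1, []string{"a", "b", "c"}, store)
	fmt.Println(ok, stored) // prints: false [b]
}
```

The sketch makes one property of this approach visible: the fallback only catches failures that the token actually reports. If a token completes without an error, the payload is never stored locally.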
I'm afraid I'll have to disagree. The test you are performing (unplugging the network lead) is unrealistic; the computer is able to detect that immediately (even if that does not immediately have an impact at the TCP layer). While that may happen with a production system it seems more likely that the issue will arise somewhere else (perhaps the broker is stopped without properly shutting down; maybe a router is restarted and the NAT entry lost) and it is often not possible to detect these conditions immediately (you can get close using keepalives but doing so will chew through bandwidth). To my way of thinking the critical thing is that the message gets through eventually whatever happens to the connection (so long as it can be re-established at some point).
You have
I think that I'll need a Minimal, Reproducible Example to continue debugging this. The issue is that the final outcome depends upon both the library and your code, so without access to a complete program it's difficult to determine what is happening. Would you be able to try your scenario using the broker at test.mosquitto.org (see http://test.mosquitto.org for full details)? If the issue occurs with that broker then you would have a standalone example you could share with me; if it works OK then we will be one step nearer to finding the issue. As mentioned in my previous message, PR #446 adds some logging that may help trace the issue, but the version you are using logs very little when it comes to resending messages. From the logs you included it looks like messages from the store are not being resubmitted, but there is insufficient information to identify the cause.
Hi @MattBrittan , After taking your advice to enable the DEBUG log and create a minimal, reproducible example, I found out that the lost packets were caused by my own silly mistake.
And finally, I found out the

// const.go
var mqttQoS = map[string]byte{
	"AtMostOnce":  0,
	"AtLeastOnce": 1,
	"ExactlyOnce": 2,
}
// and I used it as:
client.Publish(topic, mqttQoS["AtLeaseOnce"], true, payload)
// equivalent to: client.Publish(topic, 0, true, payload)

Thank you for your help, and I'm sorry that this whole bunch of issues arose from a typo.
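The silent fallback to QoS 0 follows from Go's map semantics: indexing a map with a key that is not present returns the zero value of the value type, here `byte(0)`. Below is a minimal sketch of two ways to make such a typo fail loudly. The constant names and the `qosByName` helper are hypothetical, introduced only for this illustration; they are not part of the library.

```go
package main

import "fmt"

// Typed constants: a misspelled identifier such as QoSAtLeaseOnce is a
// compile error instead of a silent zero value.
const (
	QoSAtMostOnce  byte = 0
	QoSAtLeastOnce byte = 1
	QoSExactlyOnce byte = 2
)

// mqttQoS is the same lookup table as in the comment above.
var mqttQoS = map[string]byte{
	"AtMostOnce":  QoSAtMostOnce,
	"AtLeastOnce": QoSAtLeastOnce,
	"ExactlyOnce": QoSExactlyOnce,
}

// qosByName uses the comma-ok form of the map lookup so that an unknown
// name is reported as an error rather than returned as QoS 0.
func qosByName(name string) (byte, error) {
	qos, ok := mqttQoS[name]
	if !ok {
		return 0, fmt.Errorf("unknown QoS name %q", name)
	}
	return qos, nil
}

func main() {
	// A plain index with the misspelled key yields the zero value.
	fmt.Println(mqttQoS["AtLeaseOnce"]) // prints: 0

	// The checked lookup surfaces the typo.
	if _, err := qosByName("AtLeaseOnce"); err != nil {
		fmt.Println("error:", err) // prints: error: unknown QoS name "AtLeaseOnce"
	}

	qos, _ := qosByName("AtLeastOnce")
	fmt.Println(qos) // prints: 1
}
```

Of the two, the constants are the simpler choice when the QoS level is fixed in code; the checked lookup is only needed when the name comes from configuration or user input.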
No worries - happy it's fixed (and I frequently find the problem when trying to make a simple example!). If you are happy this is resolved please close the issue.
Env
github.com/eclipse/paho.mqtt.golang v1.2.1-0.20200609161119-ca94c5368c77
Experiment
Sample Code
Result
isConnectionOpen: false, immediately after disconnecting the internet
read tcp LOCAL_IP -> BROKER_IP:1883: wsarecv: A socket operation was attempted to an unreachable network.
isConnectionOpen: true, even when internet connection is lost for 16 seconds, then isConnectionOpen: false
write tcp LOCAL_IP -> BROKER_IP:1883: write: can't assign requested address
isConnectionOpen: true, even when internet connection is lost for 26 seconds, then isConnectionOpen: false
pinggresp not received, disconnecting
Discuss
isConnectionOpen() status, and even will not result in token.Error() immediately when disconnecting the internet.