New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
With persistence enabled, delivery of incoming messages can hang #423
Comments
I'll add some notes below that may explain some of the behaviour but as the options play a big role in issues like this please do submit the code/logs (if messages are not being received then logs from mosquitto can be helpful too). Without full details its difficult to say too much (but I do have quite a few instances that have been running for six months plus on dodgy connections with persistence enabled; so it can work). Please note that there have been some fairly major changes to the package recently so it would be great if you could use
The library should handle these messages but I would expect them to be dropped. If you subscribe with QOS 1+ and
This is fine if |
Okay I have not been using
I'm able to reproduce the bug regardless of auto-connect setting, and the bug happens immediately when starting up. Also does not seem to matter where I put the subscribe so much as when it is called. Adding a Sleep in that method gets it to happen pretty much every time, but it's important to note it can still happen without the delay. I've attached a zip of a small test repo to reproduce the bug complete with a publish shell script, mosquitto.conf, and some instructions on reproducing. Should only take a minute to reproduce. |
I agree that improving the documentation for this would be beneficial. I generally setup a default handler that simply logs anything it receives as an error (on the basis that I should not receive any unexpected messages); there may be cases where unexpected messages are not an error (so building this into the library is probably not the right approach).
That comment related to your use of calling Subscribe after a successful Connect (really just a general suggestion to call subscribe from within Before I take a detailed look at your code can you please confirm what "the bug" is (i.e. a hang, lost messages or a failure to subscribe?). Your initial issue mentioned a number of problems so I want to be sure what I am looking for. If the issue is simply that you are loosing messages if you do not call |
Totally agree. I think the important thing is just to make sure the client doesn't lock up when it gets the persisted messages without having a route or default publisher, and then just document the right way to receive these messages if you want them.
Yeah I didn't ever really explain the bug in full: In the test, I am constantly publishing QoS1 messages to a persisted topic. The bug occurs when I am starting up my go program after having already connected to the broker once (so it knows it needs to store/send me messages if I close the app and reopen). In the test code (with the delay taken out, you'll see that in the comments in the onConnect callback), sometimes the paho library seems to connect, subscribe, and receive persisted messages just fine. But sometimes, in what we'll call the failure case, the paho library seems to stall in delivering messages to my callback(s), permanently. This happens pretty much immediately after the connect/subscribe. After this, no messages come through in my onMsg handler(s) or into the paho go as far as I can tell, despite the paho go logs showing it is sending/receiving pings to the broker. It appears the paho go client is successfully sending the sub and getting suckack, but due to these "stray" persisted messages that come in beforehand, it appears delivery to onMsg breaks down somewhere. The thing that's weird to me is that the broker seems to stop offloading messages, so if we don't hit the failure case on the next run, the messages come through. I suppose this is because broker is not getting ack'd. To me it seems like a stalled goroutine, or sending on a nil channel, but I'm not really sure. The desired behavior can be seen by using |
This issue is caused by a number of factors:
My previous comment:
is both correct and somewhat misleading. The library will ignore packets for which there is no handler completly; this means that no This could be addressed by acknowledging such messages but that would lead to lost messages (if we do not acknowledge them mosquitto will resend them when it is next restarted). So while the current behaviour is not ideal it may be the best option available to minimise data loss. For now I have raised PR #424 which adds debug messages when this occurs to highlight the issue and also suggests adding routes before calling connect in this situation. |
Great analysis, thank you for looking into it. That makes sense with what I was seeing.
I think that's reasonable. Should be pretty clear to anyone that may experience this in the future. |
Closing this because I believe it was resolved (and has been inactive for quite a while). |
An application I'm working on requires that we have persistence enabled for the broker. While testing persistence, I found that occasionally my go program would start up and appear to "hang" (runs but not receiving messages) when it should be receiving stored and/or new messages from the broker (mosquitto on localhost in this case). Sometimes after restarting the program, the messages come through, other times they don't. But once they do come through, they come through as expected, so it doesn't look like I'm missing messages.
Initially I had my client calling
Subscribe
after a successfulConnect
in the same method. After reading another issue here, I tried putting the subscribe inside my onConnect callback. This seemed to lower the rate of the issue happening, but on occasion it would still occur.After enabling the
MQTT.DEBUG
logs it looked like everything was fine in terms of data to/from the broker. And once the "hang" did happen, ping req/response were still going through. The startup log is a bit different every time due to goroutines executing at different times, so it's difficult to use those to pinpoint an issue, but I noticed that in the failure case the broker was receiving a message before it had successfully subscribed.Given all of that, I tried first calling
AddRoute
on the topics before callingConnect
. After that change, I was unable to get my application to hang. In my case I'm subscribing to 2 different wildcard topics and the issue was reproducible with 1.2.0 and current master.I can provide test code and logs but I wanted to make sure this was not just an undocumented quirk in the library? If not it seems that there's a bug with the library not able to handle incoming messages immediately upon connecting unless
AddRoute
has been called beforehand.The text was updated successfully, but these errors were encountered: