Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isConnectionOpen() and the return token from Publish() didn't precisely reflect the TCP connection #449

Closed
KShih opened this issue Aug 20, 2020 · 7 comments

Comments

@KShih
Copy link

KShih commented Aug 20, 2020

Env

  • GO 1.13.14
  • package version github.com/eclipse/paho.mqtt.golang v1.2.1-0.20200609161119-ca94c5368c77
clientOptions.SetAutoReconnect(true)
clientOptions.SetConnectRetry(true)
clientOptions.SetCleanSession(true)
clientOptions.SetConnectRetryInterval(1 * time.Second)
clientOptions.SetMaxReconnectInterval(1 * time.Second)
clientOptions.SetConnectionLostHandler(myHandlerFunction)

Experiment

  1. Publish value per second, each value increase by 1 per published
  2. Disconnect the internet
  3. (During the disconnection, doing some recovery step to keep those publishing failed data)
  4. Reconnect the internet
  5. Publish those publishing failed data

Sample Code

for _, payload := range payloads {
        fmt.Println("isConnectionOpen: "client.IsConnectionOpen())
	if token := client.Publish(topic, 1, true, payload); token.Wait() && token.Error() != nil {
		fmt.Println(token.Error())
		// recovery step, e.g.: store the payload into local sqlite
	}
}

Result

  1. on Windows10: no packets lost
  • Print: isConnectionOpen: false, immediately after disconnecting the internet
  • Print: read tcp LOCAL_IP -> BROKER_IP:1883: wsarecv: A socket operation was attempted to an unreachable network.
  1. on macOS: 16 packets lost
  • Print: isConnectionOpen: true, even when internet connection is lost for 16 seconds, then
  • Print: isConnectionOpen: false
  • Print: write tcp LOCAL_IP -> BROKER_IP:1883: write: can't assign requested address
  1. on Ubuntu: 26 packets lost
  • Print: isConnectionOpen: true, even when internet connection is lost for 26 seconds, then
  • Print: isConnectionOpen: false
  • Print: pinggresp not received, disconnecting

Discuss

  1. For me, it looks like different OS and their implementation of TCP connect and disconnect event will result in different isConnectionOpen() status, and even will not result in token.Error() immediately when disconnecting the internet.
  2. I'm not an expert in the Network area, so any feedback or discussion would be wonderful 😃 Thanks!
@MattBrittan
Copy link
Contributor

Hi @KShih,

The response to IsConnectionOpen() should change as soon as the loss of connection is detected (see internalConnLost) but how long it takes to detect the loss of connection will vary (see this article for a discussion of some of the issues). So the delay in the response to IsConnectionOpen() is unavoidable (without frequent keep-alives which generate additional traffic) and may be impacted by the OS TCP stack; however I am concerned by the packet loss (because the MemoryStore is used by default and should resend the messages automatically upon re-connection).

clientOptions.SetAutoReconnect(true)
clientOptions.SetConnectRetry(true)
clientOptions.SetCleanSession(true)

With your options I would expect internalConnLost to leave your messages in the store (clean up only happens if c.options.CleanSession && !c.options.AutoReconnect) (despite you receiving the token error the message should be resent when the connection comes back up). Upon reconnection resume is called which should resend any in progress messages (from the store; held in memory by default). The idea with AutoReconnect is that once you call Publish your message should get delivered if at all possible (and you cannot fully rely on the tokens; a cancelled token does not mean the message will not be retried).

I wonder if your issue relates to this issue #442 (can you please confirm which broker you are using?); perhaps try pull request #443 and see if that makes a difference. When I experienced the issue (using Mosquitto) I was sending messages more quickly (4 a second) but experienced something similar. Since implementing that change this unit has been sending 4 messages a second over an unreliable satellite connection for a month without any noticeable packet loss (sometimes a failure on the client end causes messages not to be generated so it's difficult to be sure but when I've looked all of the packet loss has been attributable to that).

To enable further investigation it would be useful to have debug logs (see this issue for instructions; ideally run this with pull request #443 in place (because that avoids reusing message ID's it makes it a lot easier to trace each message in cases like this). It may also be worth checking the brokers logs to see if there is any record of the missing message ID's there (I assume you have subscribed at QOS 2 as well).

Matt

@KShih
Copy link
Author

KShih commented Aug 21, 2020

Hi @MattBrittan ,

Thanks for your help!
Quick response to your question:

  1. I'm using RMQ as the broker, QoS = 1.
  2. And yes, all the packets lost are being checked by the broker logs and visualization tool.

Referred to this article, my situation might seem like Network cable unplugged as the author mentioned in Causes of Half-Open Connections, so I expect that the local socket should detect the connection is lost in the next message I published (which I thought might be implemented in token := Publish (), and token.Error() would be un-nil, please correct me if I'm wrong), but it detected after 16 seconds of disconnecting the internet and that is where I'm confused. But yes, it might be related to the TCP stack of the OS (Windows deal with it perfectly, so I'm also curious if there is some workaround to Ubuntu or macOS).

Here is the debug log, let me briefly explain the experiment:

  1. client Publish the payload with 2 tags(ATag1, ATag2) per second, followed by the tagName is its value.
  2. the first payload is value=1 : {[{Device1 ATag1 1} {Device1 ATag2 1}] 2020-08-21 13:34:40.949602 +0800 CST m=+1.817490109}, ATag1's value is 1, ATag2's value is also 1 (they will always be the same value).
  3. I disconnect the internet on nearly the fifth payload(value=5), which is {[{Device1 ATag1 5} {Device1 ATag2 5}] 2020-08-21 13:34:44.9541 +0800 CST m=+5.822118576}
  4. here is the different result begin:
    a. on Windows, the token.Error() that return from the client.Publish will be un nil, and enter into reconnect event immediately
    b. on Mac, the token.Error() is nil not until the payload value=20 {[{Device1 ATag1 20} {Device1 ATag2 20}] 2020-08-21 13:34:59.951302 +0800 CST m=+20.819811591}
    1. MemoryStore is used, internalConnLost is triggered, and IsConnectionOpen() finally set to false
    2. reconnect the internet
    3. Under {[{Device1 ATag1 33} {Device1 ATag2 33}] 2020-08-21 13:35:12.951768 +0800 CST m=+33.820703861}, payloads(value=21~value=30) resend, and follow by [Debug] [store] memorystore del: message 1 was deleted, so I assume MemoryStore works fine.
      [Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
      [Debug] [net]      obound wrote msg, id: 1
      [Debug] [net]      outgoing waiting for an outbound message
      [Debug] [net]      Received Message
      [Debug] [net]      startIncommingComms: got msg on ibound
      [Debug] [store]    memorystore del: message 1 was deleted
   iv. payloads after value=34, works fine.
   v. However, payloads during the value=6 ~ value=20 are lost.
DEBUG Log
[Debug] [client]   Connect()
[Debug] [store]    memorystore initialized
[Debug] [client]   about to write new connect msg
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [client]   startCommsWorkers done
[Debug] [client]   exit startClient
[Debug] [pinger]   keepalive starting
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      done putting puback msg on obound
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
true
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 1} {Device1 ATag2 1}] 2020-08-21 13:34:40.949602 +0800 CST m=+1.817490109}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 2} {Device1 ATag2 2}] 2020-08-21 13:34:41.949637 +0800 CST m=+2.817557070}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 3} {Device1 ATag2 3}] 2020-08-21 13:34:42.951465 +0800 CST m=+3.819418060}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 4} {Device1 ATag2 4}] 2020-08-21 13:34:43.953158 +0800 CST m=+4.821144514}
[Debug] [pinger]   ping check 0.793584889
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 5} {Device1 ATag2 5}] 2020-08-21 13:34:44.9541 +0800 CST m=+5.822118576}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 6} {Device1 ATag2 6}] 2020-08-21 13:34:45.9498 +0800 CST m=+6.817851007}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 7} {Device1 ATag2 7}] 2020-08-21 13:34:46.954008 +0800 CST m=+7.822092403}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 8} {Device1 ATag2 8}] 2020-08-21 13:34:47.953972 +0800 CST m=+8.822089035}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 9} {Device1 ATag2 9}] 2020-08-21 13:34:48.953928 +0800 CST m=+9.822078009}
[Debug] [pinger]   ping check 0.792819086
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 10} {Device1 ATag2 10}] 2020-08-21 13:34:49.954033 +0800 CST m=+10.822215642}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 11} {Device1 ATag2 11}] 2020-08-21 13:34:50.953963 +0800 CST m=+11.822177848}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 12} {Device1 ATag2 12}] 2020-08-21 13:34:51.94968 +0800 CST m=+12.817927472}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 13} {Device1 ATag2 13}] 2020-08-21 13:34:52.952657 +0800 CST m=+13.820937984}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 14} {Device1 ATag2 14}] 2020-08-21 13:34:53.951835 +0800 CST m=+14.820148563}
[Debug] [pinger]   ping check 0.792418158
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 15} {Device1 ATag2 15}] 2020-08-21 13:34:54.950543 +0800 CST m=+15.818889107}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 16} {Device1 ATag2 16}] 2020-08-21 13:34:55.952049 +0800 CST m=+16.820428408}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 17} {Device1 ATag2 17}] 2020-08-21 13:34:56.953036 +0800 CST m=+17.821447539}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 18} {Device1 ATag2 18}] 2020-08-21 13:34:57.951742 +0800 CST m=+18.820186635}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 19} {Device1 ATag2 19}] 2020-08-21 13:34:58.950866 +0800 CST m=+19.819343083}
[Debug] [pinger]   ping check 0.79279819
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 20} {Device1 ATag2 20}] 2020-08-21 13:34:59.951302 +0800 CST m=+20.819811591}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   internalConnLost called
write tcp HOST:64812->BROKER_IP:1883: write: can't assign requested address
[Debug] [client]   stopCommsWorkers called
[Debug] [net]      incoming complete
[Debug] [client]   startCommsWorkers output redirector finnished
[Debug] [pinger]   keepalive stopped
[Debug] [net]      startIncommingComms: ibound complete
[Debug] [net]      startIncommingComms goroutine complete
[Debug] [router]   matchAndDispatch exiting
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   stopCommsWorkers waiting for workers
[Debug] [client]   stopCommsWorkers waiting for comms
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing comms stopping
[Debug] [net]      startComms gorouting exiting
[Debug] [client]   comms goroutine done
[Debug] [client]   stopCommsWorkers done
[Debug] [client]   internalConnLost stopped workers
[Debug] [client]   internalConnLost exiting
Reconnecting...
write tcp HOST:64812->BROKER_IP:1883: write: can't assign requested address
[Debug] [client]   enter reconnect
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup TARGET_BROKER: no such host
disconnect successfully
[Debug] [client]   about to write new connect msg
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [pinger]   keepalive starting
[Debug] [client]   enter Subscribe
[Debug] [client]   startCommsWorkers done
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      done putting puback msg on obound
true
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 31} {Device1 ATag2 31}] 2020-08-21 13:35:10.948886 +0800 CST m=+31.817756157}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 32} {Device1 ATag2 32}] 2020-08-21 13:35:11.950198 +0800 CST m=+32.819100476}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 33} {Device1 ATag2 33}] 2020-08-21 13:35:12.951768 +0800 CST m=+33.820703861}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 34} {Device1 ATag2 34}] 2020-08-21 13:35:13.948728 +0800 CST m=+34.817696178}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 35} {Device1 ATag2 35}] 2020-08-21 13:35:14.950231 +0800 CST m=+35.819232250}
[Debug] [pinger]   ping check 0.369608827
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 36} {Device1 ATag2 36}] 2020-08-21 13:35:15.949439 +0800 CST m=+36.818473772}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 37} {Device1 ATag2 37}] 2020-08-21 13:35:16.951392 +0800 CST m=+37.820458883}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 38} {Device1 ATag2 38}] 2020-08-21 13:35:17.950013 +0800 CST m=+38.819113393}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 39} {Device1 ATag2 39}] 2020-08-21 13:35:18.952828 +0800 CST m=+39.821961161}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 40} {Device1 ATag2 40}] 2020-08-21 13:35:19.950926 +0800 CST m=+40.820091296}
[Debug] [pinger]   ping check 0.366472644
^Csignal: interrupt

Trying Some workaround

So, I've continully working on it by adjusting the SetKeepAlive, SetPingTimeout; the best combination I could got is 2 payloads lost by setting:

	clientOptions.SetAutoReconnect(true)
	clientOptions.SetConnectRetry(true)
	clientOptions.SetConnectRetryInterval(time.Duration(1 * time.Second)
	clientOptions.SetCleanSession(true)
	clientOptions.SetKeepAlive(time.Second * 2)
	clientOptions.SetPingTimeout(time.Second * 1)
	//clientOptions.SetConnectTimeout(time.Second * 2)
	clientOptions.SetMaxReconnectInterval(1 * time.Second)
	payload := newWillMessage().getPayload()
	clientOptions.SetWill(topic, payload, mqttQoS["AtLeastOnce"], true)

	clientOptions.SetOnConnectHandler(handleOnConnect)
	clientOptions.SetConnectionLostHandler(handleConnLost)
DEBUG log with 2 payloads lost (val=7 and val=8)
[Debug] [client]   Connect()
[Debug] [store]    memorystore initialized
[Debug] [client]   about to write new connect msg
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [client]   startCommsWorkers done
[WARN]  [store]    memorystore wiped
[Debug] [client]   exit startClient
[Debug] [client]   enter Subscribe
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   keepalive starting
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   ping check 0.386174925
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      done putting puback msg on obound
true
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 1} {Device1 ATag2 1}] 2020-08-21 14:27:16.271087 +0800 CST m=+2.143573558}
[Debug] [pinger]   ping check 0.382836915
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 2} {Device1 ATag2 2}] 2020-08-21 14:27:17.26822 +0800 CST m=+3.140739241}
[Debug] [pinger]   ping check 0.385806805
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 3} {Device1 ATag2 3}] 2020-08-21 14:27:18.26821 +0800 CST m=+4.140762007}
[Debug] [pinger]   ping check 0.3856215
[Debug] [pinger]   keepalive sending ping
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received pingresp
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 4} {Device1 ATag2 4}] 2020-08-21 14:27:19.269038 +0800 CST m=+5.141623333}
[Debug] [pinger]   ping check 0.384860591
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 5} {Device1 ATag2 5}] 2020-08-21 14:27:20.271919 +0800 CST m=+6.144536184}
[Debug] [pinger]   ping check 0.380566526
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 6} {Device1 ATag2 6}] 2020-08-21 14:27:21.268387 +0800 CST m=+7.141037538}
[Debug] [pinger]   ping check 0.384015045
[Debug] [pinger]   keepalive sending ping
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 7} {Device1 ATag2 7}] 2020-08-21 14:27:22.268057 +0800 CST m=+8.140740046}
[Debug] [pinger]   ping check 0.384309933
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 8} {Device1 ATag2 8}] 2020-08-21 14:27:23.271071 +0800 CST m=+9.143786490}
[Debug] [pinger]   ping check 0.381371504
[CRIT] [pinger]   pingresp not received, disconnecting
[Debug] [client]   internalConnLost called
[Debug] [client]   stopCommsWorkers called
[Debug] [client]   startCommsWorkers output redirector finnished
[Debug] [net]      incoming complete
[Debug] [net]      startIncommingComms: ibound complete
[Debug] [client]   stopCommsWorkers waiting for workers
[Debug] [net]      startIncommingComms goroutine complete
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [router]   matchAndDispatch exiting
[Debug] [client]   stopCommsWorkers waiting for comms
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing comms stopping
[Debug] [net]      startComms gorouting exiting
[Debug] [client]   comms goroutine done
[Debug] [client]   stopCommsWorkers done
[Debug] [client]   internalConnLost stopped workers
[Debug] [client]   internalConnLost exiting
Reconnecting...
[Debug] [client]   enter reconnect
pingresp not received, disconnecting
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   about to write new connect msg
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [client]   startCommsWorkers done
[Debug] [pinger]   keepalive starting
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Subscribe
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 24} {Device1 ATag2 24}] 2020-08-21 14:27:39.273922 +0800 CST m=+25.147161700}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 2
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
disconnect successfully
[Debug] [net]      Received Message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 2 was deleted
[Debug] [net]      received puback, id: 2
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   ping check 0.241499894
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 25} {Device1 ATag2 25}] 2020-08-21 14:27:40.27028 +0800 CST m=+26.143552675}
[Debug] [net]      Received Message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      done putting puback msg on obound
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
true
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   ping check 0.031691686
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 26} {Device1 ATag2 26}] 2020-08-21 14:27:41.268479 +0800 CST m=+27.141784080}
[Debug] [pinger]   ping check 0.796129529
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 27} {Device1 ATag2 27}] 2020-08-21 14:27:42.270736 +0800 CST m=+28.144074381}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 2
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 2 was deleted
[Debug] [net]      received puback, id: 2
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 2
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 2 was deleted
[Debug] [net]      received puback, id: 2
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 2
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 2 was deleted
[Debug] [net]      received puback, id: 2
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 2
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [pinger]   ping check 0.015466843
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 2 was deleted
[Debug] [net]      received puback, id: 2
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 28} {Device1 ATag2 28}] 2020-08-21 14:27:43.27142 +0800 CST m=+29.144791095}
[Debug] [pinger]   ping check 0.789816388
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 29} {Device1 ATag2 29}] 2020-08-21 14:27:44.270631 +0800 CST m=+30.144033946}
[Debug] [pinger]   ping check 0.794322194
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message

I'm not sure if it's the right direction that I should work on, or is there any better workaround on it.
That would be so great if you could give me any direction. 😃 Thanks!

@MattBrittan
Copy link
Contributor

I'm using RMQ as the broker, QoS = 1.
Thanks - unfortunately I have no experience with Rabbit MQ so cannot offer any advice there.

And yes, all the packets lost are being checked by the broker logs and visualization tool.

One question on that. You are connecting with CleanSession set to true. Is whatever you are using to monitor the session also connecting with CleanSession = true and, if so, will it also loose connection when you pull out the network lead? CleanSession tells the broker to forget about any subscriptions when the connection is lost so it will drop any messages received while the subscriber is disconnected; this means that your subscriber would only start receiving messages again once it had connected AND sent a SUBSCRIBE (anything in the interim would be lost). I would suggest setting CleanSession at both ends to FALSE when testing this (if your aim is to process all messages then CleanSession=TRUE is a bad idea).

Looking at the logs:

{[{Device1 ATag1 20} {Device1 ATag2 20}] 2020-08-21 13:34:59.951302 +0800 CST m=+20.819811591}
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   internalConnLost called
write tcp HOST:64812->BROKER_IP:1883: write: can't assign requested address
[Debug] [client]   stopCommsWorkers called
...
Reconnecting...
write tcp HOST:64812->BROKER_IP:1883: write: can't assign requested address
[Debug] [client]   enter reconnect
... (attempts to reconnect)
disconnect successfully
[Debug] [client]   about to write new connect msg
... Connection up (mostly)
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
... Connection up fully
connect successfully
...
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 31} {Device1 ATag2 31}] 2020-08-21 13:35:10.948886 +0800 CST m=+31.817756157}

So message 19 got sent then there was an attempt to publish another message (20 - "enter Publish") and this detected the error on send and called internalConnLost. I am a little confused as to why there are no calls to Publish in the interim; I would expect to see regular "enter Publish" entries in the logs if you are calling Publish during the reconnection phase. This seems wrong to me; can you please check that your code is calling Publish as you believe it is (every call to Publish should be logged). Please do note that with the connection down Publish will not return until the outgoing packet has been sent (so it will block if you are not calling it on a go routine).

Checking these logs would be simpler with PR #446 because this adds extra logging in strategic locations (@alsm any chance of this being accepted?).

I've continully working on it by adjusting the SetKeepAlive, SetPingTimeout

This will help but it's really just hiding the issue (setting these will enable the package to detect the network link drop more quickly but my main concern is the lost messages - that should not happen however long the link is down). This library should be able to handle a long network outage without loosing any messages.

@KShih
Copy link
Author

KShih commented Aug 24, 2020

Hi @MattBrittan , I hope you had a good weekend.

this means that your subscriber would only start receiving messages again once it had connected AND sent a SUBSCRIBE (anything in the interim would be lost).

Thanks for notifying this! Although I manage the disconnect and reconnect logic on my own code(will talk about this later), it sounds better to keep it false.

I am a little confused as to why there are no calls to Publish in the interim

Okay, it's because I handle this part on my own code, and the reason for that is the returned token.Error() from the Publish didn't properly set to not nil despite the internet connection is lost. Here is my code:

         if !client.IsConnectionOpen() {
		for _, payload := range payloads {
			WriteToSqlite(payload)
		}
		result = false
	} else {
		for _, payload := range payloads {
			if token := client.Publish(topic, mqttQoS["AtLeaseOnce"], true, payload); token.Wait() && token.Error() != nil {
				fmt.Println(token.Error())
				WriteToSqlite(payload)
				result = false
			}
		}
	}
	return result

As you can see in the if clause, I directly write the payload to the recover.sqlite if the isConnectionOpen() is set to false.
And I'll do the resend from the recover.sqlite whenever the client reconnects.

This is the debug log if I comment out the if part (also set CleanSession to false ):
And it could solve:

this seems wrong to me; can you please check that your code is calling Publish as you believe it is (every call to Publish should be logged).

DEBUG LOG
[Debug] [client]   Connect()
[Debug] [store]    memorystore initialized
[Debug] [client]   about to write new connect msg
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [pinger]   keepalive starting
[Debug] [client]   startCommsWorkers done
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   exit startClient
[Debug] [client]   enter Subscribe
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
true
[Debug] [net]      done putting puback msg on obound
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   ping check 0.586102023
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 1} {Device1 ATag2 1}] 2020-08-24 10:23:47.001404 +0800 CST m=+1.901773858}
[Debug] [pinger]   ping check 0.795029173
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 2} {Device1 ATag2 2}] 2020-08-24 10:23:48.003999 +0800 CST m=+2.904405702}
[Debug] [pinger]   ping check 0.789016366
[Debug] [pinger]   keepalive sending ping
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received pingresp
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 3} {Device1 ATag2 3}] 2020-08-24 10:23:49.001257 +0800 CST m=+3.901700195}
[Debug] [pinger]   ping check 0.791692396
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 4} {Device1 ATag2 4}] 2020-08-24 10:23:50.001296 +0800 CST m=+4.901775863}
[Debug] [pinger]   ping check 0.793116481
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 5} {Device1 ATag2 5}] 2020-08-24 10:23:51.00462 +0800 CST m=+5.905137451}
[Debug] [pinger]   ping check 0.792566318
[Debug] [pinger]   keepalive sending ping
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 6} {Device1 ATag2 6}] 2020-08-24 10:23:52.002372 +0800 CST m=+6.902926606}
[Debug] [pinger]   ping check 0.791077519
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 7} {Device1 ATag2 7}] 2020-08-24 10:23:53.004261 +0800 CST m=+7.904852155}
[Debug] [pinger]   ping check 0.791390239
[CRIT] [pinger]   pingresp not received, disconnecting
[Debug] [client]   internalConnLost called
[Debug] [client]   stopCommsWorkers called
[Debug] [client]   startCommsWorkers output redirector finnished
[Debug] [net]      incoming complete
[Debug] [net]      startIncommingComms: ibound complete
[Debug] [client]   stopCommsWorkers waiting for workers
[Debug] [net]      startIncommingComms goroutine complete
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [router]   matchAndDispatch exiting
[Debug] [client]   stopCommsWorkers waiting for comms
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      outgoing comms stopping
[Debug] [net]      startComms gorouting exiting
[Debug] [client]   comms goroutine done
[Debug] [client]   stopCommsWorkers done
[Debug] [client]   internalConnLost stopped workers
[Debug] [client]   internalConnLost exiting
INTO my connectionLostHandler function....
pingresp not received, disconnecting
[Debug] [client]   enter reconnect
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 8} {Device1 ATag2 8}] 2020-08-24 10:23:54.005815 +0800 CST m=+8.906443827}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 9} {Device1 ATag2 9}] 2020-08-24 10:23:55.001385 +0800 CST m=+9.902050133}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 10} {Device1 ATag2 10}] 2020-08-24 10:23:56.005318 +0800 CST m=+10.906020865}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 11} {Device1 ATag2 11}] 2020-08-24 10:23:57.005981 +0800 CST m=+11.906720598}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 12} {Device1 ATag2 12}] 2020-08-24 10:23:58.001327 +0800 CST m=+12.902103194}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 13} {Device1 ATag2 13}] 2020-08-24 10:23:59.004632 +0800 CST m=+13.905445597}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 14} {Device1 ATag2 14}] 2020-08-24 10:24:00.001826 +0800 CST m=+14.902676731}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 15} {Device1 ATag2 15}] 2020-08-24 10:24:01.003691 +0800 CST m=+15.904578820}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 16} {Device1 ATag2 16}] 2020-08-24 10:24:02.005925 +0800 CST m=+16.906849244}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 17} {Device1 ATag2 17}] 2020-08-24 10:24:03.001411 +0800 CST m=+17.902372155}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 18} {Device1 ATag2 18}] 2020-08-24 10:24:04.000881 +0800 CST m=+18.901879985}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 19} {Device1 ATag2 19}] 2020-08-24 10:24:05.004286 +0800 CST m=+19.905321673}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 20} {Device1 ATag2 20}] 2020-08-24 10:24:06.004492 +0800 CST m=+20.905564718}
[Debug] [client]   about to write new connect msg
[ERROR] [client]   dial tcp: lookup BROKER_HOST: no such host
[WARN]  [client]   failed to connect to broker, trying next
[Debug] [client]   Reconnect failed, sleeping for 1 seconds: network Error : dial tcp: lookup BROKER_HOST: no such host
[Debug] [client]   enter Publish
{[{Device1 ATag1 21} {Device1 ATag2 21}] 2020-08-24 10:24:07.002968 +0800 CST m=+21.904077292}
[Debug] [client]   about to write new connect msg
[Debug] [client]   enter Publish
{[{Device1 ATag1 22} {Device1 ATag2 22}] 2020-08-24 10:24:08.003518 +0800 CST m=+22.904664347}
[Debug] [client]   socket connected to broker
[Debug] [client]   Using MQTT 3.1.1 protocol
[Debug] [net]      connect started
[Debug] [net]      received connack
[Debug] [client]   startCommsWorkers called
[Debug] [client]   client is connected/reconnected
[Debug] [net]      incoming started
[Debug] [net]      startIncommingComms started
[Debug] [net]      outgoing started
[Debug] [net]      startComms started
[Debug] [client]   startCommsWorkers done
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      startIncommingComms: inboundFromStore complete
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [pinger]   keepalive starting
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/cmd]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/cmd
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Subscribe
[Debug] [client]   SUBSCRIBE: dup: false qos: 1 retain: false rLength: 0 MessageID: 1 topics: [TARGET_TOPIC/ack]
[Debug] [client]   sending subscribe message, topic: TARGET_TOPIC/ack
[Debug] [client]   exit Subscribe
[Debug] [net]      obound priority msg to write, type *packets.SubscribePacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received suback, id: 1
[Debug] [net]      granted qoss [1]
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
connect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/cfg
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/conn
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received publish, msgId: 1
[Debug] [net]      logic waiting for msg on ibound
[Debug] [net]      putting puback msg on obound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      done putting puback msg on obound
true
[Debug] [net]      obound priority msg to write, type *packets.PubackPacket
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 23} {Device1 ATag2 23}] 2020-08-24 10:24:09.005315 +0800 CST m=+23.906498339}
[Debug] [pinger]   ping check 0.180074804
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 24} {Device1 ATag2 24}] 2020-08-24 10:24:10.000601 +0800 CST m=+24.901821494}
[Debug] [pinger]   ping check 0.182278471
disconnect successfully
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 25} {Device1 ATag2 25}] 2020-08-24 10:24:11.004959 +0800 CST m=+25.906217366}
[Debug] [pinger]   ping check 0.179799681
[Debug] [pinger]   keepalive sending ping
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received pingresp
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 26} {Device1 ATag2 26}] 2020-08-24 10:24:12.001029 +0800 CST m=+26.902323974}
[Debug] [pinger]   ping check 0.182583519
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 27} {Device1 ATag2 27}] 2020-08-24 10:24:13.000886 +0800 CST m=+27.902217734}
[Debug] [pinger]   ping check 0.181812162
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 28} {Device1 ATag2 28}] 2020-08-24 10:24:14.004193 +0800 CST m=+28.905562138}
[Debug] [pinger]   ping check 0.179706808
[Debug] [pinger]   keepalive sending ping
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [net]      received pingresp
[Debug] [net]      logic waiting for msg on ibound
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 29} {Device1 ATag2 29}] 2020-08-24 10:24:15.002497 +0800 CST m=+29.903902715}
[Debug] [pinger]   ping check 0.180089008
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message
{[{Device1 ATag1 30} {Device1 ATag2 30}] 2020-08-24 10:24:16.000443 +0800 CST m=+30.901885935}
[Debug] [pinger]   ping check 0.183516051
[Debug] [client]   enter Publish
[Debug] [client]   sending publish message, topic: TARGET_TOPIC/data
[Debug] [net]      obound wrote msg, id: 0
[Debug] [net]      outgoing waiting for an outbound message

Disconnect on val=5; reconnect on val=23, lost package val=6~val=22.
The memoryStore is not worked as you mentioned in the previous discussion, I'm not sure if it's because I didn't provide the function for SetReconnectingHandler to set.

This library should be able to handle a long network outage without losing any messages.

This is what I thought too. For me, I think the main issue here is Why the token.Error() that the Publish() return, didn't reflect the network condition? And how could we solve it on Mac and Ubuntu (there is no issue on Windows)?

@MattBrittan
Copy link
Contributor

For me, I think the main issue here is Why the token.Error() that the Publish() return, didn't reflect the network condition? And how could we solve it on Mac and Ubuntu (there is no issue on Windows)?

I'm afraid I'll have to disagree. The test you are performing (unplugging the network lead) is unrealistic; the computer is able to detect that immediately (even if that does not immediately have an impact at the TCP layer). While that may happen with a production system it seems more likely that the issue will arise somewhere else (perhaps the broker is stopped without properly shutting down; maybe a router is restarted and the NAT entry lost) and it is often not possible to detect these conditions immediately (you can get close using keepalives but doing so will chew through bandwidth). To my way of thinking the critical thing is that the message gets through eventually whatever happens to the connection (so long as it can be re-established at some point).

Although I manage the disconnect and reconnect logic on my own code(will talk about this later)...

You have SetAutoReconnect(true) so the library is handling the reconnection (the logs seem to agree on this). When you say "I manage the disconnect.." I think that you mean that you detect when the connection is down and store rather than publishing messages until the connection come up. This is unnecessary in most cases because the library has the functionality to store the messages (to disk if you wish) so as long as the connection comes back before around 65535 messages are queued the library should ensure they get sent.

Okay, it's because I handle this part on my own code, and the reason for that is the returned token.Error() from the Publish didn't properly set to not nil despite the internet connection is lost.

I think that I'll need a Minimal, Reproducible Example to continue debugging this. The issue is that the final outcome depends upon both the library and your code so without access to a complete program its difficult to determine what is happening. Would you be able to try your scenario using the broker at test.mosquitto.org (see http://test.mosquitto.org for full details). If the issue occurs with that broker then you would have a standalone example you could share with me; if it works OK then we will be one step nearer to finding the issue).

As mentioned in my previous message PR #446 adds some logging that may help trace the issue but the version you are using logs very little when it comes to resending messages. From the logs you included it looks like messages from the store are not being resubmitted but there is insufficient information to identify the cause.

@KShih
Copy link
Author

KShih commented Aug 25, 2020

Hi @MattBrittan ,

After taking your advice of enabling the DEBUG log and create minimal and reproducible example, I found out it's my silly mistake to let the package lost.
In my minimal example, I found out there is always ibound message received like the log below, which I didn't receive from my code.

[Debug] [net]      obound wrote msg, id: 1
[Debug] [net]      outgoing waiting for an outbound message
[Debug] [net]      Received Message
[Debug] [net]      startIncommingComms: got msg on ibound
[Debug] [store]    memorystore del: message 1 was deleted
[Debug] [net]      received puback, id: 1
[Debug] [net]      logic waiting for msg on ibound

And finally, I found out the QoS attribute in Publish is mistakenly referenced to default type of byte which is zero, so that the Publish will not wait for the payload to be successfully published but directly go to the next Publish call.

// const.go
var mqttQoS = map[string]byte{
	"AtMostOnce":  0,
	"AtLeastOnce": 1,
	"ExactlyOnce": 2,
}

// and I used it as:
client.Publish(topic, mqttQoS["AtLeaseOnce"], true, payload) 
// equivalent to: client.Publish(topic, 0, true, payload)

Thank you for your help, and I'm sorry that the whole bunch of issues is raising from the typo.
I think the correct result of running my typo-app in Windows but not appear in other OS, put me into the wrong debug direction. But for sure the minimal example helps me a lot. Thanks! 🙏

@MattBrittan
Copy link
Contributor

No worries - happy its fixed (and I frequently find the problem when trying to make a simple example!).

If you are happy this is resolved please close the issue,

@KShih KShih closed this as completed Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants