MQTT connection drops periodically and messages are lost

I wrote my own Python client based on the paho-mqtt package. It connects to the MQTT broker at us-west.thethings.network using SSL on port 8883, with keepalive set to 60 seconds, QoS set to 1 and clean_session set to False. It connects successfully, and messages from my three temperature/humidity sensors, which report about every 30 minutes, are received.
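A minimal sketch of the connection setup described above, assuming paho-mqtt 1.x and TTN v2-style credentials (application ID as username, application access key as password). The function name `connect_ttn` is hypothetical. The client is passed in so the sketch has no hard dependency on paho; in real use it would be something like `paho.mqtt.client.Client(client_id="my-listener", clean_session=False)`, where `"my-listener"` is a made-up ID. paho 1.x raises `ValueError` if `clean_session=False` is combined with an empty client ID, because a persistent session is keyed on that ID.

```python
# Sketch of the connection settings from the question (assumes paho-mqtt 1.x).
# The client object is passed in, so this module does not import paho itself.

BROKER_HOST = "us-west.thethings.network"
BROKER_PORT = 8883          # MQTT over TLS
KEEPALIVE_S = 60            # seconds between PINGREQs when the link is idle
UPLINK_TOPIC = "+/devices/+/up"
QOS = 1


def connect_ttn(client, app_id, access_key, ca_certs=None):
    """Set credentials and TLS, then open the connection.

    app_id / access_key: TTN v2-style credentials (assumption).
    ca_certs: optional path to a CA bundle; None uses the system default CAs.
    """
    client.username_pw_set(app_id, access_key)
    client.tls_set(ca_certs=ca_certs)
    client.connect(BROKER_HOST, BROKER_PORT, keepalive=KEEPALIVE_S)
```

After this, `client.loop_forever()` (or `loop_start()`) drives the network loop and keepalive pings.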

There are two problems:

  1. Despite the non-zero keepalive, the connection to the server drops once or twice a day with the errors shown below. I recall reading somewhere that the "out of memory" message is a misinterpretation of the error code. In any case, the program keeps running and is not out of memory; it immediately reconnects and continues fine until the next disconnect.

  2. Because QoS is 1 and clean_session is False, this should create a persistent subscription, so I would have expected any messages published during the disconnect to be delivered on reconnect; however, they are lost. I can tell messages are missing from a counter field that the sensor provides, which takes a unique, sequential value for each new measurement. I understand that even if the client sets QoS to 1, the TTN MQTT broker would also need to support persistent subscriptions, and perhaps it does not. I have compared the messages received via MQTT with those sent to the data storage integration (which can be fetched over HTTP), and the missing messages are present in the data storage.
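The counter-based check in problem 2 can be automated. This sketch assumes the sensor's counter increases by exactly 1 per measurement and never wraps around; `find_missing_counters` is a hypothetical helper, not part of paho-mqtt.

```python
from typing import Iterable, List


def find_missing_counters(received: Iterable[int]) -> List[int]:
    """Return counter values absent from `received`.

    Assumes counters increase by 1 per measurement and do not wrap around.
    Duplicates (e.g. QoS 1 redeliveries) are ignored.
    """
    values = sorted(set(received))
    missing: List[int] = []
    for prev, cur in zip(values, values[1:]):
        # Every integer strictly between two consecutive received values is a gap.
        missing.extend(range(prev + 1, cur))
    return missing
```

For example, `find_missing_counters([10, 11, 13, 14, 17])` returns `[12, 15, 16]`.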

Is anyone else using the MQTT broker with the paho-mqtt package, and has anyone experienced connection drops with message loss?

I haven’t tried a different broker; it only just occurred to me to try that, or even to run two separate instances of the program, each connecting to a different broker.

2020-11-21 17:49:49,818 - root - ERROR - MainThread:mqtt_comms:134 - failed to receive on socket: [Errno 60] Operation timed out
2020-11-21 17:49:49,822 - lora.mqtt - WARNING - MainThread:mqtt_comms:122 - Disconnected status Out of memory.
2020-11-21 17:49:49,822 - lora.mqtt - ERROR - MainThread:db_streamer:114 - Upstream disconnected - Out of memory.
2020-11-21 17:49:51,361 - lora.mqtt - INFO - MainThread:mqtt_comms:88 - Connected with result code 0 Connection Accepted.
2020-11-21 17:49:51,363 - lora.mqtt - INFO - MainThread:mqtt_comms:96 - Subscribed to messages for all devices +/devices/+/up returned mid 2
2020-11-21 17:49:51,461 - lora.mqtt - INFO - MainThread:mqtt_comms:112 - Subscribed mid 2 Granted QOS (1,)

  1. TTN does not persist messages. Anything received while you are not connected will be lost. However, you could use the data integration to get the missing messages.
  2. With mosquitto_sub I am not seeing frequent disconnects; hardly any at all. You might have an issue where your program consumes memory that is only released when the failure occurs. That is hard to diagnose without the code.
  3. With one connection you should be able to get all messages arriving at the back-end. Connecting to a second broker won’t help, and inter-region traffic is known to experience issues. Only use the region the gateways are connected to, and register your application with the same handler to avoid running into known issues.
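Reply 1 suggests recovering missing messages from the data integration. A sketch under loud assumptions: the URL layout (`https://<app-id>.data.thethingsnetwork.org/api/v2/query?last=...`) and the `Authorization: key <access key>` header are my understanding of the TTN v2 Data Storage integration and should be checked against your application's API page; `counter_field="counter"` is a hypothetical name for the sensor's counter. The request is built but not sent.

```python
import urllib.request
from typing import Dict, Iterable, List, Set


def build_storage_query(app_id: str, access_key: str, last: str = "1d") -> urllib.request.Request:
    """Build (but do not send) a query to the TTN v2 Data Storage integration.

    URL layout and header format are assumptions; verify them for your app.
    """
    url = f"https://{app_id}.data.thethingsnetwork.org/api/v2/query?last={last}"
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"key {access_key}"},
    )


def records_missing_from_mqtt(records: Iterable[Dict], seen: Set[int],
                              counter_field: str = "counter") -> List[Dict]:
    """Return stored records whose counter value never arrived over MQTT."""
    return [r for r in records if r.get(counter_field) not in seen]
```

To fetch, something like `json.load(urllib.request.urlopen(build_storage_query(app_id, key)))` would return the stored records, which can then be filtered against the counters seen over MQTT.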

I am using the paho MQTT client from Java. So far it appears to be stable.
My Java code uses the default settings with automatic reconnect enabled. This causes a reconnect attempt, with exponential back-off, when a keep-alive response is not received in time. With the Java defaults, clean-session is true. My code subscribes with QoS level 1.
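The Python client has a comparable built-in mechanism: in paho-mqtt 1.x, `loop_forever()` reconnects after an unexpected disconnect, and `reconnect_delay_set(min_delay, max_delay)` controls the back-off. To illustrate what exponential back-off does, here is a standalone sketch of a manual reconnect loop; it is not paho's internal code, and the 1 s / 120 s limits and `max_attempts=10` are illustrative assumptions. `connect` would be something like `client.reconnect`, which raises `OSError` subclasses on network failure.

```python
import time
from typing import Callable, Iterator


def backoff_delays(min_delay: float = 1.0, max_delay: float = 120.0) -> Iterator[float]:
    """Yield reconnect delays that double after each failure, capped at max_delay."""
    delay = min_delay
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


def reconnect_with_backoff(connect: Callable[[], None],
                           sleep: Callable[[float], None] = time.sleep,
                           max_attempts: int = 10) -> int:
    """Call connect() until it succeeds; return the number of attempts used.

    Re-raises the last OSError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(next(delays))  # wait 1, 2, 4, ... seconds between attempts
    raise RuntimeError("unreachable")
```

The first nine delays are 1, 2, 4, 8, 16, 32, 64, 120 and 120 seconds; capping the delay keeps a long outage from pushing the retry interval out indefinitely.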

In the connectComplete callback, my code explicitly restores the subscription (this is not restored along with the connection if clean-session is true). You can find it here:
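The Python equivalent of restoring the subscription in `connectComplete` is to subscribe inside `on_connect`, which paho-mqtt 1.x calls after every successful connect, including automatic reconnects. A sketch, assuming the paho 1.x callback signature `on_connect(client, userdata, flags, rc)` and the topic from the logs; it would be registered with `client.on_connect = on_connect`.

```python
import logging

log = logging.getLogger("lora.mqtt")

UPLINK_TOPIC = "+/devices/+/up"


def on_connect(client, userdata, flags, rc):
    """paho-mqtt 1.x on_connect callback: (re)subscribe after every successful connect.

    Subscribing here, rather than once after the first connect(), restores the
    subscription on each reconnect regardless of what the broker kept.
    """
    if rc != 0:
        log.warning("connect refused, rc=%s", rc)
        return
    result, mid = client.subscribe(UPLINK_TOPIC, qos=1)
    log.info("subscribe to %s sent, result=%s mid=%s", UPLINK_TOPIC, result, mid)
```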

I expect the Python client behaves in very much the same way as the Java client.

Actually, I think that with certain settings (e.g. clean-session false) and QoS levels, an MQTT broker should persist messages; otherwise it cannot provide the requested QoS.
If the TTN MQTT server has specific settings, requirements or limitations, I think these should be documented clearly.
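Whether a broker kept a session can be checked from the client side: the MQTT 3.1.1 CONNACK carries a "session present" flag, which paho-mqtt 1.x exposes as `flags["session present"]` in `on_connect`. If it is 0 on a reconnect despite `clean_session=False`, the broker started a fresh session and queued nothing. The function names below are hypothetical.

```python
import logging
from typing import Mapping

log = logging.getLogger("lora.mqtt")


def broker_resumed_session(flags: Mapping[str, int]) -> bool:
    """True if the CONNACK's session-present flag was set."""
    return bool(flags.get("session present", 0))


def on_connect_check_session(client, userdata, flags, rc):
    """paho-mqtt 1.x on_connect callback that logs the session-present flag.

    The flag is 0 whenever the broker holds no stored session for this client
    ID (for example, the first time it connects); on a reconnect with
    clean_session=False, 0 means nothing was queued while disconnected.
    """
    if rc != 0:
        return
    if broker_resumed_session(flags):
        log.info("broker resumed the previous session")
    else:
        log.warning("broker started a new session (session present = 0)")
```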

Yes, the Python application does pretty much exactly what your Java client does. I’m not convinced that the “out of memory” message is real: this GitHub issue https://github.com/eclipse/paho.mqtt.python/issues/340 says that mapping RC = 1 to “Out of memory” is incorrect, and that it is just a generic error.
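A sketch of an `on_disconnect` callback (paho-mqtt 1.x signature) that reports rc=1 as a generic error instead of relying on `error_string()`. In paho 1.x, `MQTT_ERR_NOMEM` is numerically 1, which is why "Out of memory." is printed; the log in the question shows "failed to receive on socket" immediately before it, consistent with the GitHub issue linked above. The helper names are hypothetical.

```python
import logging

log = logging.getLogger("lora.mqtt")

# In paho-mqtt 1.x, on_disconnect gets rc=0 only when the client itself called
# disconnect(); any other value means the connection was lost unexpectedly.
# rc=1 equals MQTT_ERR_NOMEM numerically but is also used as a generic error.
GENERIC_ERROR_RC = 1


def describe_disconnect(rc: int) -> str:
    """Human-readable reason for a disconnect, without the misleading mapping."""
    if rc == 0:
        return "clean disconnect requested by the client"
    if rc == GENERIC_ERROR_RC:
        return ("unexpected disconnect (rc=1: generic error such as a socket "
                "timeout, not necessarily out of memory)")
    return f"unexpected disconnect (rc={rc})"


def on_disconnect(client, userdata, rc):
    """paho-mqtt 1.x on_disconnect callback."""
    log.warning("Disconnected: %s", describe_disconnect(rc))
```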

Yep, I realized after I submitted the topic that to connect to a different region I would have to register an application with that region's handler, so that is a dead end. I may try running my own MQTT broker and feeding test messages through it to test the client; I can send them at a much faster rate and see whether this triggers the out-of-memory error sooner.
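For the local-broker test, a sketch that publishes a burst of fake uplinks with an incrementing counter. Assumptions: a local broker (e.g. Mosquitto on localhost), a topic layout `<app>/devices/<dev>/up` matching the `+/devices/+/up` subscription, and a payload loosely modeled on TTN v2 uplinks with hypothetical `payload_fields` names. The client is passed in; with paho it would be a connected `Client` whose network loop is running (`loop_start()`), so QoS 1 acknowledgements are processed.

```python
import json
import time
from typing import Callable


def make_uplink(app_id: str, dev_id: str, counter: int) -> str:
    """Build a fake uplink JSON document; field names are illustrative only."""
    return json.dumps({
        "app_id": app_id,
        "dev_id": dev_id,
        "payload_fields": {"counter": counter, "temperature": 21.5, "humidity": 40.0},
    })


def publish_burst(client, app_id: str, dev_id: str, count: int,
                  interval_s: float = 0.01,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Publish `count` fake uplinks with counters 0..count-1 at a fixed interval."""
    topic = f"{app_id}/devices/{dev_id}/up"
    for counter in range(count):
        client.publish(topic, make_uplink(app_id, dev_id, counter), qos=1)
        sleep(interval_s)
```

Because the counters are consecutive, any gap on the receiving side points to a lost message rather than a sensor that skipped a reading.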

Of course, there could be many reasons for the connection to drop; it could simply be the Wi-Fi connection to my router or my connection to the internet.

I will add some profiling to monitor memory as messages come in and see whether there is anything to the out-of-memory error. That is a good tip: it could be something that is cleared when the connection drops and then builds up again as more messages arrive. Yes, I could get the measurements from the data integration via an HTTP request, but the MQTT message has many more fields, including the RSSI, frequency, data rate and so on. As far as I can tell, the data integration only has the payload fields.
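One way to do that profiling is the standard-library `tracemalloc` module, sampled once per received message. A sketch; `MemoryProbe` is a hypothetical helper. `tracemalloc` only counts memory allocated through Python's allocators, so buffers held inside C extensions or the OS are not included.

```python
import logging
import tracemalloc
from typing import Tuple

log = logging.getLogger("lora.mqtt")


class MemoryProbe:
    """Log traced Python heap usage every `every` messages."""

    def __init__(self, every: int = 10):
        self.every = every
        self.count = 0
        if not tracemalloc.is_tracing():
            tracemalloc.start()

    def tick(self) -> Tuple[int, int]:
        """Call once per received message; returns (current, peak) in bytes."""
        self.count += 1
        current, peak = tracemalloc.get_traced_memory()
        if self.count % self.every == 0:
            log.info("after %d messages: current=%.1f KiB peak=%.1f KiB",
                     self.count, current / 1024, peak / 1024)
        return current, peak
```

Calling `probe.tick()` at the top of the paho `on_message(client, userdata, msg)` callback would show whether memory climbs steadily between disconnects and drops after each one.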

The code is on GitHub but I’m in the middle of adding some more features and it is split between the master branch and a feature branch; when I merge to master I will add a link here, if I haven’t figured out a fix before that.

Thanks for getting back to me. It is useful to know that TTN doesn’t support persistent subscriptions (I don’t believe it is required to); at least that explains one part of the problems I’m seeing.