Multitech Conduit - Device stops sending data to network after 1-2 hours

Hello everyone,

I am testing a LoRaWAN network with the following schema:

LoRaWAN Nodes --> Multitech GW (Packet Forwarder) --> TTN Network Server

I have experienced a HUGE amount of data loss and I wonder why it is happening. By HUGE I mean gaps of 30-40 minutes without a decoded frame appearing in my Application Data screen, with a transmission period of 3 minutes. It works perfectly at first and then the network starts doing strange things. Eventually I lose the device completely and I have to reset either the gateway or the device to get data from the node arriving at TTN again.

I have monitored the Traffic Analyzer of the Gateway in TTN and sometimes I see a Join Request, followed by a response of Join Accepted… and the device doesn’t send the frame but sends a Join Request again. This happens up to three times. Then the device goes to sleep until the next scheduled transmission. After this has happened several times, the node stops trying and I have to reset either the GW or the node.

[Image: 3_retries]

As you can see in this picture, the device tried to join three times and only the last attempt succeeded. Sometimes it tries three times every three minutes without success, and eventually the device stops trying.

It seems to be a software problem, and I think it must be in the GW configuration (the device works perfectly until this happens, so I don’t think the device is at fault), but I can’t locate it. The GW never loses its connection to TTN, so it should be something in the configuration.

As additional information, I didn’t run install.sh, which sets up the TTN configuration on the Multitech, because the GW already comes with the configuration for connecting to TTN as a packet forwarder. I thought it wouldn’t be necessary. Should I run it anyway, even though the packets are arriving correctly?

What is going on?! :exploding_head:

Resetting a gateway doesn’t accomplish anything protocol-wise, as gateways don’t hold any state relevant to the LoRaWAN protocol. They are merely transparent translators - think of them as a “telepresence robot” that lets the TTN server act as if it were running at your gateway’s install site, rather than somewhere off in the cloud. The only thing restarting a gateway could do is get its radio or its connection to TTN going again. But if you saw traffic of any sort continuing, including traffic from other people’s nodes, that would not be the problem.

Watching any on-gateway logs for received packets, status messages, or errors could be useful.
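To put numbers on the losses, one option is to export the reception times of the node’s uplinks (from the gateway log or from the TTN application data) and look for gaps longer than the 3 minute period. The following is only a rough sketch: it assumes you have already parsed the timestamps yourself, since the log format depends on your packet forwarder, and `find_gaps` and its `tolerance` parameter are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, period=timedelta(minutes=3), tolerance=1.5):
    """Return (last_seen, next_seen, missed) for every gap between
    consecutive uplinks that is longer than tolerance * period.

    `missed` is an estimate of how many uplinks were lost in the gap.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > period * tolerance:
            # e.g. a 30 minute gap at a 3 minute period means 9 missed frames
            missed = round(delta / period) - 1
            gaps.append((prev, cur, missed))
    return gaps


if __name__ == "__main__":
    # Made-up reception times: three good frames, a long silence, two more.
    start = datetime(2020, 1, 1, 12, 0, 0)
    seen = [start + timedelta(minutes=m) for m in (0, 3, 6, 36, 39)]
    for last_seen, next_seen, missed in find_gaps(seen):
        print(f"silent from {last_seen} to {next_seen}: ~{missed} frames missed")
```

Running the same check on what the gateway received and on what the application received shows whether frames are being lost over the air or further along.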

sometimes I see a Join Request, followed by a response of Join Accepted… and the device doesn’t send the frame but sends a Join Request again.

A node ignoring a join accept and sending another join request instead is likely a sign of either a bug in the node firmware (often with respect to receive window timing) or a node that is doing a better job of transmitting than receiving. Many popular node firmwares have turned out to have timing bugs - what exactly is your node? Software version (ideally git hash), specific hardware, and local modifications?
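To illustrate why receive window timing matters: with the default LoRaWAN settings the join accept is sent so that it starts 5 seconds after the end of the join request (JOIN_ACCEPT_DELAY1), with a second chance at 6 seconds (JOIN_ACCEPT_DELAY2), and the node only listens briefly at each point. The sketch below is a deliberately simplified model, not how any particular firmware works: the function names, the listen time, and the preamble duration are assumptions made up for the example, and a real radio needs to hear several preamble symbols rather than any overlap at all.

```python
# Default join accept delays from the LoRaWAN regional parameters, in seconds.
JOIN_ACCEPT_DELAY1_S = 5.0
JOIN_ACCEPT_DELAY2_S = 6.0


def join_rx_windows(tx_end_s):
    """Nominal start times of the two join accept receive windows,
    measured from the end of the join request transmission."""
    return (tx_end_s + JOIN_ACCEPT_DELAY1_S, tx_end_s + JOIN_ACCEPT_DELAY2_S)


def catches_downlink(rx_open_s, listen_s, downlink_start_s, preamble_s):
    """Simplified check: does the node's listening interval overlap the
    downlink preamble at all? (Hypothetical model for illustration.)"""
    rx_close_s = rx_open_s + listen_s
    preamble_end_s = downlink_start_s + preamble_s
    return rx_open_s < preamble_end_s and downlink_start_s < rx_close_s


if __name__ == "__main__":
    rx1, rx2 = join_rx_windows(tx_end_s=100.0)
    # Assumed numbers: 30 ms listen time, 12 ms preamble.
    print(catches_downlink(rx1, 0.03, rx1, 0.012))        # node on time
    print(catches_downlink(rx1 - 0.1, 0.03, rx1, 0.012))  # node 100 ms early
```

In this model a node whose clock runs 100 ms early has already stopped listening when the join accept starts, so from the node’s point of view the join failed and it sends another join request, which matches the pattern in your traffic view.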

In terms of other failure modes, you could try giving the node a nice quiet power supply and making sure it is not near other equipment. Being a few tens of meters from the gateway might be preferable to sitting on the same benchtop, though the sort of software confusion that can result from being too close is really not something that should still be happening, at least not routinely.