MikroTik LORA Gateway restarts and Sensors lose connectivity

Hi,

I have a sensor network of TTGO/OLED boards reporting into a couple of MikroTik gateways.

The sensors are running OTAA.

Often the power to the gateways is disrupted and then the gateways have to be brought back online after a day or two.

In almost every case the sensors fail to reconnect to the gateways and then each sensor has to be manually restarted to reconnect - this is very tedious.

Has anyone else run into this?
Any suggestions?
Would switching to ABP solve this problem?

Many thanks in advance.

The sensors do not know that they aren’t received by the gateway anymore (they do not connect to any gateway).
Gateways are just receivers, transmitting everything they hear with valid CRC to TTS CE via internet.
Only if you use Confirmed Uplinks - and you should avoid this - can the nodes know that they’re no longer heard by a gateway (and TTS).
A node doesn’t care whether a gateway restarts or loses its connection to TTS. It continues transmitting and incrementing the fcnt value. If the node (and its fcnt) is reset without a new OTAA join, it will be ignored by TTS until fcnt exceeds the last value seen. Maybe this is the problem.
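
To illustrate the fcnt point above, here is a minimal Python sketch. It is not how TTS actually implements the check; `is_accepted` and `uplinks_lost_after_reset` are made-up names, and the only rule modelled is "a counter at or below the last one seen is ignored".

```python
def is_accepted(last_fcnt: int, fcnt: int) -> bool:
    """Hypothetical server-side rule: accept only counters above the last one seen."""
    return fcnt > last_fcnt


def uplinks_lost_after_reset(last_fcnt: int) -> int:
    """Count the uplinks ignored when a device restarts its counter at 0
    without doing a new OTAA join."""
    dropped = 0
    fcnt = 0
    while not is_accepted(last_fcnt, fcnt):
        dropped += 1
        fcnt += 1
    return dropped


# A device that had reached fcnt 500 and is then reset without rejoining
print(uplinks_lost_after_reset(500))
```

So a device that is power-cycled but keeps its old session would stay invisible for as many uplinks as it had already sent, which from the outside looks like "it never reconnected".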

Possibly, but you will be gaffer taping something that actually needs fixing properly.

As above, the devices have a ‘connection’ with the Network Server not a gateway, (devices don’t ‘run’ OTAA, it’s just how they negotiate their credentials), so they will carry on regardless and once there is a gateway to hear the transmissions, normal service should resume.

If the gateways are vulnerable to power loss they should not be on TTN - the use of the community network comes with some responsibility to other users of TTN, and unreliable gateways can disrupt other devices. Another TTNer may have devices and a gateway running, but some of their devices may be heard better by one of your MikroTiks, so the NS sets an appropriate data rate via ADR, something that is vital to efficient use of the shared frequencies. Then when the MikroTik goes offline, the device can’t be heard by the more remote gateway until it runs through its link check escalation (if it supports that).

What is the power for the gateways? And what do they report when they have power restored - do uplinks from other devices start coming through?

Without digging in to the details of your firmware, it’s hard to say what might be going on at that end, but for clarity, a device uplinking should be picked up as soon as there is a gateway to hear it. If your firmware does have some form of link check setup, it could end up dropping in to a variety of strategies - decreasing the DR and then eventually trying a re-join. But I’m not aware of this being implemented much in the common LMIC libraries as a default.
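
For anyone wondering what such a link check strategy could look like, here is a rough Python sketch of the "lower the DR, then eventually re-join" idea. It is not taken from LMIC or any other library; `LinkState`, `after_uplink`, `LIMIT` and `DELAY` are hypothetical names and the thresholds are picked purely for illustration. It also assumes the device starts at DR5 (SF7 in the EU868 plan).

```python
from dataclasses import dataclass

LIMIT = 64  # illustrative: silent uplinks tolerated before the first step
DELAY = 32  # illustrative: further silent uplinks between later steps


@dataclass
class LinkState:
    data_rate: int = 5       # assumption: DR5 (SF7 in EU868)
    silent_uplinks: int = 0  # uplinks sent since the last downlink was heard


def after_uplink(state: LinkState, downlink_received: bool) -> str:
    """Return the action to take after one uplink: 'ok', 'lower_dr' or 'rejoin'."""
    if downlink_received:
        state.silent_uplinks = 0
        return "ok"
    state.silent_uplinks += 1
    # Only act at LIMIT, LIMIT + DELAY, LIMIT + 2 * DELAY, ...
    if state.silent_uplinks < LIMIT or (state.silent_uplinks - LIMIT) % DELAY != 0:
        return "ok"
    if state.data_rate > 0:
        state.data_rate -= 1  # lower data rate means more range, more airtime
        return "lower_dr"
    return "rejoin"           # already at the lowest data rate, start a new join


# Simulate a gateway outage: no downlinks at all
state = LinkState()
actions = [after_uplink(state, downlink_received=False) for _ in range(224)]
print(actions.count("lower_dr"), actions[-1])
```

With these made-up thresholds the device steps down five times and only asks for a re-join on its 224th unanswered uplink, so at a low uplink rate this kind of recovery can take a long while to kick in.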

It is always good to have a different device to act as a canary so any issues in the main suite of devices are less likely to impact that, which would then help debug the situation.

That only works if the devices didn’t transmit > 16384 uplinks in the two days the gateways are down. If the devices do exceed 16K uplinks in two days they should be powered down permanently as that would violate TTN fair access policy by quite some margin. (And possibly legal limits as well)
Not to mention them being very bad neighbors for anyone else using LoRaWAN nearby.

Thanks All.

So no one knows why this would happen. OK. Will go a different path and will add a hard reset every 24 hours as a band aid for now.

FYI the sensors transmit 130 times a day and the payload in each case is about 4 bytes. As we always deploy a gateway or two dedicated to the deployed sensors, they usually use SF 7 and I have only once seen SF 8 used. I don’t think we are even close to violating any fair use policy in this case.
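
The airtime behind those numbers can be checked with a short Python sketch using the time-on-air formula from the Semtech SX127x datasheet. The assumptions here are mine, not from the posts: 125 kHz bandwidth, coding rate 4/5, an 8-symbol preamble, explicit header with CRC, and 13 bytes of LoRaWAN overhead on top of the 4-byte payload (no FOpts). `lora_time_on_air` is just a helper name for this example.

```python
import math


def lora_time_on_air(payload_bytes: int, sf: int, bw_hz: int = 125_000,
                     cr: int = 1, preamble: int = 8,
                     crc: bool = True, implicit_header: bool = False) -> float:
    """Time on air in seconds for one LoRa packet (SX127x datasheet formula)."""
    t_sym = (2 ** sf) / bw_hz
    # Low data rate optimisation applies when a symbol lasts longer than 16 ms
    low_dr_opt = 1 if t_sym > 0.016 else 0
    numerator = (8 * payload_bytes - 4 * sf + 28
                 + (16 if crc else 0) - (20 if implicit_header else 0))
    denominator = 4 * (sf - 2 * low_dr_opt)
    n_payload = 8 + max(math.ceil(numerator / denominator) * (cr + 4), 0)
    return (preamble + 4.25 + n_payload) * t_sym


UPLINKS_PER_DAY = 130
PHY_PAYLOAD = 4 + 13  # 4 application bytes + assumed 13 bytes LoRaWAN overhead

for sf in (7, 8):
    per_packet = lora_time_on_air(PHY_PAYLOAD, sf)
    print(f"SF{sf}: {per_packet * 1000:.1f} ms per uplink, "
          f"{per_packet * UPLINKS_PER_DAY:.1f} s per day")
```

Under these assumptions that comes to roughly 6.7 s per day at SF7 and about 12 s per day at SF8, against the TTN Fair Use Policy figure of 30 seconds of uplink airtime per device per day.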

You haven’t given us anything to go on other than an overview of the symptoms - gateways should stay online, devices should just keep on trucking - my second device originally deployed 100m away from you was moved to Bolton and is still transmitting - at about the 175,000 uplinks mark now.

Apart from being mildly abusive of the system, there is an expectation that a device joins and only needs to rejoin under very occasional circumstances (battery change being the most usual). Your patch may not work if the firmware somehow gets bogged down when it’s not getting a response from the gateways - if it locks up, it will need to be an external hard reset. As demonstrated above, there is no need for 24 hour hard resets.

That’s not the point of LoRaWAN / TTN, there’s no need for a gateway to sensor deployment match - just have some gateways to provide coverage across the Hope & Edale valleys and Bradwell - then anyone can deploy a sensor at will - like gate open or water butt empty etc etc. One in each of the main villages would be a good start.

No one said you did, what @kersing was saying was there is a potential gotcha in that if a device manages to get >16384 uplinks ahead of where the Network Server last heard, then the Network Server will drop the packets.
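
To put a number on that gotcha, a quick back-of-envelope calculation in Python, using only the 16384 figure quoted in this thread and the 130 uplinks a day reported above (`days_until_gap_exceeded` is just a name for this example):

```python
MAX_FCNT_GAP = 16384   # gap figure quoted in this thread
UPLINKS_PER_DAY = 130  # rate reported for these sensors


def days_until_gap_exceeded(uplinks_per_day: float, max_gap: int = MAX_FCNT_GAP) -> float:
    """Days of continuous outage before a device's counter runs further
    ahead than the gap the Network Server will tolerate."""
    return max_gap / uplinks_per_day


outage_days = 2
print(f"Counter advance during a {outage_days}-day outage: {outage_days * UPLINKS_PER_DAY}")
print(f"Outage needed to exceed the gap: {days_until_gap_exceeded(UPLINKS_PER_DAY):.0f} days")
```

At that rate a two-day outage moves the counter on by 260, and it would take about 126 days of silence to hit the limit, so the gap on its own should not explain what is being seen here.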

If you need direct input on making this stuff right, please say so, the Hope Valley has much to gain from a wide range of sensing applications but it won’t work if the two gateways that exist are unreliable and the devices are significantly non-compliant. Let’s do it right, better to get it right now than have to unpick some issues further down the line …

Thanks Nick. Your points noted.

In that case they don’t require reset from network point of view so the device firmware is faulty. Like Nick I have devices that are up and running for ages and all survive gateway outages without any issues.

Some observations from troubleshooting a MikroTik gateway I’ve got deployed:

  • Make sure power is stable. When using PoE use high quality cable and make sure connectors are crimped well. Also make sure cables are fixed and do not ‘pull’ at the ethernet connector.
  • When upgrading MikroTik firmware, make sure to also upgrade the bootloader/low level firmware otherwise the unit will be unstable.

LOL - then 80% of gateways in SA need to be turned off.

The difference being that pretty much ALL gateways in an area are off, not some.

The very best way to mess up a deployment of devices is to park a SCPF in the middle of the area with a decently mounted antenna AND to turn it off at night …

Then you need to look at our load shedding schedules: not all in the same area, sometimes all in the same area, random, 2 to 12 hours a day off (if you are lucky). It is crazy.

Thread hijack… just like our power… :wink: