Data backup on Gateway in case of network outage?

Are there gateways that can save data locally on SDCard in case of network outages? (Or, always log to SDCard as a backup)? I see that all gateways have SD card for firmware image storage, but not advertising this as backup storage?

It’s been done, and covered here before, but it’s not really compatible with LoRaWAN.

The reason is the frame counter - a network server will reject any packets where the frame counter value appears to have moved backwards in time. So if anyone else’s gateway gets a packet from your node through during the outage, your entire archive is rejected. And if you don’t replay your archive in chronological order before you start reporting in timely packets, it gets rejected.

Granted, for data historical integrity purposes you can probably get the relevant keys out of the TTN server and decode the packets yourself… but it’s an awkward fit with the design of LoRaWAN.

3 Likes

Got it, thanks for the explanation.

The very loverly @cslorabox isn’t a fan of that, not a bit and I fully understand why.

But the higher end RAK gateways do this once setup and tested. They do it by holding the unsubmitted messages in order and passing them on once they have connectivity restored. So very useful for GSM/LTE connections. However I believe there are issues with timestamps that I haven’t got a handle on, see: https://forum.rakwireless.com/t/rak7258-lte-automatic-data-recovery/2193/10

However, due to the security design, particularly the frame counters, this can go wrong so should be a nice to have in your design and not a definitely must must work.

RAK’s challenge isn’t so much an implementation one as the classic one of diving into writing code without there being a strategy which solves the actual problem. And then when users unsurprisingly have issues, trying to fix a systemic problem in the code of one component.

Upon reconnection, if they start being a gateway immediately, then the archive gets submitted out of order and a LoRaWAN server enforcing frame counts rejects it

Conversely if they drain the archive in order before they start being a gateway again, now the duration of the outage of real time data (and during which ADR keeps falling back due to loss) is magnified. And should any other gateway get a packet in, it’s all for naught.

I think if you really want to do this, you’re better off taking any gateway where you control the code and writing your own backup scheme which uses the storage media in a way that you feel is safe. Then get your node keys and do your own offline decoding of historical data outside of TTN, and write those packets to your historic log of application data with some “recovered” flag on them.

That still doesn’t mean it is a good solution. It will work for gateways out in the boonies with infrequently transmitting nodes and shorter connection interruptions.
In my opinion this tries to solve a fundamental issue of LoRaWAN, the assumption the gateway is always connected to the backend. Solving this requires a rewrite of (parts) of the specification, not hacking the packet forwarder and hoping to get away with it. (Btw it seems kerlinks cpf buffers packets as well so RAK is in good company)

Users at this level should know better.

Ditto - hence the setup and tested bit.

I fully appreciate that a metropolitan based gateway is going to be hammered if it tries to store & forward - but if it’s in that sort of area, there should be overlapping coverage of gateways.

As you say Jac, it’s for the boonies when the mobile network goes off line because it’s raining hard or it’s foggy (something I’ve actually lived with).

Meh… the real market for RAK (and Dragino) boxes running stock firmware is people who want to just plug something in, click through some gui menus, and have it work.

For a user with reliability concerns, they’re more a demonstration of the fact that an MT76x8 chip (or the competing AR9331 that Dragino uses) can be a decent inexpensive gateway platform potentially robustly booting from NOR flash.

You then have the choice of either putting custom software on the offered boxes (via changes captured in the overlay filesystem, or by rebuilding OpenWrt from source) and/or making a custom hardware platform adding in key things that were left out, such as a USB hub to allow using the sole host port for more than just the LTE modem.

NAND to Tetris has nothing on you! When you need a chip, do you go to the beach for the raw materials? :wink:

I suspect the real problem here is that the memo about the fail-fast product to market philosophy that companies are now using hasn’t reached the users who aren’t aware that features are included that may not work until they (we) have tested it and move it out of beta (if we are lucky, sometimes we move it out of alpha). It’s not unusual for me to include a button I think should be on a device to see if anyone needs it and then wait for them to press it and tell me that it didn’t work. Most of the buttons don’t get pressed.

No, I buy the same parts but shuffle them around until they are connected in the right order… for example putting in the missing USB hub. I didn’t even bother routing the SoC to the DDR, that’s a submodule, as of course are the concentrator and the LTE.

It’s not about doing everything yourself, it’s about re-doing the things that need fixing.

That I can agree with, but iterating by buying successive generations of boxes can be costly. And frequent change in offerings is bad for fleet deployment, too.

Or won’t work at all… often the big problem is software written by people who didn’t start from a clear vision of what it needed to do in the overall system context. That and features invented by a marketing department similarly lacking contextual awareness.

There’s a big difference between room for future expansion where the user/customer/client/licensee has the engineering materials to run with these ideas and turn them into something workable, vs. where they’d have to tear substantial parts up and rebuild them from scratch to move forward.

I think if someone wants packet backup on a gateway, they need to validate how it’s going to fit in architecturally and then write their own software to do it. I personally prefer to have that as an entirely separate subsystem outside the packet forwarder and live backhaul and decoding path.

Given the price of memory / FRAM / Flash / grains of sand, I’d cache data on the device with my own over-arching mechanism to tell it that it’s OK to purge - or more likely as I’d anticipate having weeks worth of data stored, just FIFO it and have a re-send range mechanism so if there is a gap, send a message with the range of data to resend.