The very lovely @cslorabox isn't a fan of that, not one bit, and I fully understand why.
But the higher-end RAK gateways do this once set up and tested. They do it by holding the unsubmitted messages in order and passing them on once connectivity is restored, so it's very useful for GSM/LTE connections. However, I believe there are issues with timestamps that I haven't got a handle on; see: https://forum.rakwireless.com/t/rak7258-lte-automatic-data-recovery/2193/10
However, due to the security design, particularly the frame counters, this can go wrong, so it should be a nice-to-have in your design and not something that absolutely must work.
RAK's challenge isn't so much an implementation one as the classic one of diving into writing code without a strategy that solves the actual problem, and then, when users unsurprisingly have issues, trying to fix a systemic problem in the code of one component.
Upon reconnection, if they start acting as a gateway immediately, the archive gets submitted out of order and a LoRaWAN server enforcing frame counters rejects it.
Conversely, if they drain the archive in order before they start being a gateway again, the duration of the outage of real-time data (during which ADR keeps falling back due to loss) is magnified. And should any other gateway get a packet in, it's all for naught.
I think if you really want to do this, you're better off taking any gateway where you control the code and writing your own backup scheme which uses the storage media in a way you feel is safe. Then get your node keys and do your own offline decoding of historical data outside of TTN, and write those packets to your historic log of application data with some "recovered" flag on them.
That still doesn't mean it is a good solution. It will work for gateways out in the boonies with infrequently transmitting nodes and shorter connection interruptions.
In my opinion this tries to solve a fundamental issue of LoRaWAN: the assumption that the gateway is always connected to the backend. Solving this requires a rewrite of (parts of) the specification, not hacking the packet forwarder and hoping to get away with it. (Btw, it seems Kerlink's CPF buffers packets as well, so RAK is in good company.)
I fully appreciate that a metropolitan gateway is going to be hammered if it tries to store & forward, but if it's in that sort of area, there should be overlapping coverage from other gateways.
As you say Jac, it's for the boonies, when the mobile network goes offline because it's raining hard or it's foggy (something I've actually lived with).
Meh… the real market for RAK (and Dragino) boxes running stock firmware is people who want to just plug something in, click through some GUI menus, and have it work.
For a user with reliability concerns, they're more a demonstration that an MT76x8 chip (or the competing AR9331 that Dragino uses) can be a decent, inexpensive gateway platform, potentially robustly booting from NOR flash.
You then have the choice of putting custom software on the offered boxes (via changes captured in the overlay filesystem, or by rebuilding OpenWrt from source) and/or making a custom hardware platform that adds in key things that were left out, such as a USB hub to allow using the sole host port for more than just the LTE modem.
NAND to Tetris has nothing on you! When you need a chip, do you go to the beach for the raw materials?
I suspect the real problem here is that the memo about the fail-fast, product-to-market philosophy that companies now use hasn't reached the users, who aren't aware that features are included that may not work until we have tested them and moved them out of beta (if we are lucky; sometimes we move them out of alpha). It's not unusual for me to include a button I think should be on a device to see if anyone needs it, and then wait for them to press it and tell me that it didn't work. Most of the buttons don't get pressed.
No, I buy the same parts but shuffle them around until they are connected in the right order… for example, putting in the missing USB hub. I didn't even bother routing the SoC to the DDR; that's a submodule, as of course are the concentrator and the LTE.
It's not about doing everything yourself, it's about re-doing the things that need fixing.
That I can agree with, but iterating by buying successive generations of boxes can be costly. And frequent change in offerings is bad for fleet deployment, too.
Or won't work at all… often the big problem is software written by people who didn't start from a clear vision of what it needed to do in the overall system context. That, and features invented by a marketing department similarly lacking contextual awareness.
There's a big difference between room for future expansion where the user/customer/client/licensee has the engineering materials to run with these ideas and turn them into something workable, vs. where they'd have to tear substantial parts up and rebuild them from scratch to move forward.
I think if someone wants packet backup on a gateway, they need to validate how it's going to fit in architecturally and then write their own software to do it. I personally prefer to have that as an entirely separate subsystem, outside the packet forwarder and the live backhaul and decoding path.
Given the price of memory / FRAM / flash / grains of sand, I'd cache data on the device with my own overarching mechanism to tell it that it's OK to purge. Or, more likely, as I'd anticipate having weeks' worth of data stored, just FIFO it and have a re-send range mechanism: if there is a gap, send a message with the range of data to resend.
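To make the FIFO-plus-resend-range idea concrete, here is a minimal sketch (names and capacity are my own invention, not from any gateway firmware): locally assigned sequence numbers let the backend spot a gap and ask for exactly that range back.

```python
from collections import deque

class PacketCache:
    """FIFO cache of uplink records; oldest evicted first.
    Sequence numbers are assigned locally so the backend can
    request a resend range when it spots a gap."""

    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # oldest dropped automatically
        self.seq = 0

    def store(self, payload):
        """Record one raw uplink; returns its local sequence number."""
        self.seq += 1
        self.buf.append((self.seq, payload))
        return self.seq

    def resend_range(self, first, last):
        """Return cached records with first <= seq <= last,
        i.e. the gap the backend reported."""
        return [(s, p) for s, p in self.buf if first <= s <= last]
```

The point of the local sequence number is that it is independent of LoRaWAN frame counters, so the backend's gap detection doesn't interact with the network server's replay protection at all.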
@descartes Hi, the timestamp in the RAK store-and-forward backup is a time delta in microseconds between the last packet sent and the current one. So you have to do a bit of work on your backend to re-hydrate the correct timestamps after a backhaul outage.
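Assuming the deltas really do chain from one known-good timestamp (which is my reading of the claim, not something RAK documents), the backend re-hydration is just a running sum:

```python
def rehydrate(records, last_known_epoch_us):
    """records: list of (delta_us, payload), where delta_us is the
    microsecond gap since the previous packet. Returns a list of
    (absolute_epoch_us, payload), anchored on the last packet whose
    wall-clock time we knew before the outage."""
    out = []
    t = last_known_epoch_us
    for delta_us, payload in records:
        t += delta_us
        out.append((t, payload))
    return out
```

Note that any single lost or corrupted record shifts every later timestamp, which is exactly why a chained-delta scheme is fragile.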
The issue for many of us who use cellular backhaul by default is to get around the following. But that brings its own issues.
// Corp
IT: Gateway can't be on the corp LAN/WAN, must be hidden (often in a ceiling void)
Cyber security: Gateway can never be on the LAN/WAN, or only after 3 months of security evaluation
// Medium sized enterprise:
You will have to talk to Dave, he's real busy, always got a lot on…
// Small biz
Oh, my niece does the IT after school on Thursdays
However, even when you have a wired backhaul, I've encountered the following issues…
Gateway stolen
Power turned off at weekends
Hairy arsed electricians pulling out power cables to gateways as it was needed for something else
Had a client that would call to say the server was down around 5pm each day; turned out the cleaner was removing the note about not unplugging it…
So for all those situations where the gateway goes AWOL for whatever reason, my scheme of keeping data points on the device (subject to power & cost issues) seems like a plan.
I'd say it's not worth the trouble of the duplicated administration required for decryption. And above all: LoRaWAN being radio in license-free spectrum, maybe one should not rely on the data to start with?
Probably not. I suspect this is a confusion based on the free-running microsecond counter of the gateway concentrator chip, which is what the server uses to time downlink replies. It does not measure the time "since" a previous packet, as that would break in the case of backhaul packet loss; it is merely a local counter stamp. And it's the same thing a gateway normally sends when backhaul is live.
It does indeed take some doing to convert it into an actual time: you have to have a sample of both at a point in the same run of the same packet forwarder program, and enough sense of the time to know how many times it has rolled over. And if the concentrator / packet forwarder has been restarted, that breaks the meaning of the counter unless you have a record of the value right before the restart. Uplink frame counts may make as much sense.
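As a sketch of what that conversion involves (assuming a 32-bit microsecond counter, which is how the Semtech packet forwarder's `tmst` field behaves, and assuming you somehow know the rollover count since your reference sample):

```python
ROLLOVER = 1 << 32  # the concentrator counter wraps every ~71.6 minutes

def counter_to_epoch_us(tmst, ref_tmst, ref_epoch_us, rollovers=0):
    """Convert a concentrator counter value to wall-clock microseconds,
    given one reference pair (ref_tmst, ref_epoch_us) captured during
    the same run of the packet forwarder, plus the number of full
    counter wraps that have happened since that reference."""
    delta = (tmst - ref_tmst) % ROLLOVER  # modulo handles one wrap
    return ref_epoch_us + rollovers * ROLLOVER + delta
```

The fragility the post describes is visible in the signature: lose the reference pair (packet forwarder restart) or miscount `rollovers`, and every converted time is silently wrong.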
But this is also why someone who really wants a packet backup on the gateway should probably write their own. Including an RTC time is quite simple (and yes, if you're going to this trouble you want a battery-backed RTC, not a situation where you only have time after you can connect to an NTP server).
If you're really stuck, put an RTC in a node and have it announce the time (it doesn't have to be right, just monotonic) and use that as a time standard to measure other nodes' packets by.
Hi, yes, I've been using frame counts and appreciate the counter rollovers, but the free-running counter has been consistent enough in terms of delta T to give approximates, which is all I'm after. In my case with the RAK it was part of an evaluation: they make the claim but don't provide any details on how you would use the feature in a practical way. So one might say it's a rather nebulous feature, but it does show gateway manufacturers are considering the issue.
I've used Multitech AEP gateways with onboard LNS & app server, where we can store & forward before posting to, say, AWS IoT.
@cslorabox already explained why queueing is going to be a nightmare, and won't work with the idea that the TTN public network can have multiple gateways that receive the uplinks, today or in the future. I just wanted to add that anyone trying to backup/decrypt on the gateway itself, rather than queueing/forwarding, will find that such is a nightmare too. (And, above all: one should expect missing packets anyway.)
Hi everyone, I'm completely with @arjanvanb on this one. If higher resilience to outages is required, then LoRaWAN needs to deploy using the standard distributed control systems architecture of the last 25 years:
Full system stack at the edge; for LoRaWAN this would be a small cluster of gateways at the edge combined with dual micro LNS with applications and failover.
Continuous asynchronous replication of data from the edge to the centre.
Intelligence in the centre to handle fresh data differently from stale data.
Intelligence in the centre to replicate identity data out to the edge.
A few lost packets because it's raining hard is one thing, but having a framework that can store on the node and re-send after someone has chopped through a comms cable and taken out a network for 36+ hours isn't such a bad thing, if it doesn't cost much and is simple to manage.
If you are collecting data for a study, larger gaps in data can be irritating at best and ruin the data set at worst. But it's not time critical, so this would be a useful example, although in some situations you could just go and get an SD card from the node.
If you are running something system-critical, then that's a totally different set of circumstances where it would be prudent to implement one or more additional channels to complement LoRaWAN, transports such as GSM/LTE/4G, Sigfox, NB-IoT or even Iridium. In that respect I'd use LoRaWAN for the monitoring data as notionally cost-free transport, but keep command & control on other channels, while still leaving room for a downlink if the other channel(s) manage to go offline.
Backing up and decrypting are two completely different things.
It makes no sense to decrypt on the gateway; it's both useless and severely compromises key management. (And even if the gateway had only the network session key, it still couldn't autonomously reply to a node to maintain an ADR path, since it has no way of knowing if another gateway still in contact with the network server has been asked to.)
However, if one wants to do a backup on the gateway, then it's probably necessary to feed the packets to a centralized decryptor parallel to TTN's server, because such stale packets are not LoRaWAN-compliant.
Building such is not all that complicated. However, the debug tool that got built eventually became its own network server.
Very true. But assuming we're still talking about doing this on the gateway: backing up and forwarding are two completely different things too.
I very much agree.
That's the same waste of time, I feel.
Only if one has the (OTAA) session keys that were applicable at the time the packets were received, and if one either (intelligently) brute-forces the 16 MSBs of the frame counters or keeps track of the frame counters too. Like I wrote, that's not as easy as one may think, I feel.
My impression is that you can get the current ones via an API from TTN, though I could be wrong. When I did it, while trying to figure out why loraserver was losing session keys, I grabbed them from the join accepts, since I was able to intercept the backhaul traffic to all gateways, something obviously not the case on TTN.
And I think(?) you get an application-level message for a join accept when the keys change, so it's not like you have to randomly poll for changes.
In a well-functioning network nodes should almost never join anew.
It's about what a network server normally does; there aren't that many realistic possibilities: {MSB=MSB, MSB=MSB-1, MSB=MSB+1, MSB=0}. Like I said, my debug monitor eventually became its own network server, since in the end that was the shortest path to the functionality we needed and let us stop fighting the "features" that only caused endless trouble.
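The candidate-set trick above can be sketched like this. The function and the `mic_ok` callback are my own illustrative names; in a real decryptor `mic_ok` would recompute the packet's MIC via AES-CMAC with the network session key and the candidate 32-bit counter (omitted here):

```python
def expand_fcnt(fcnt_lsb, last_fcnt32, mic_ok):
    """Recover the 32-bit frame counter from the 16 bits carried on air.
    Try the realistic candidates for the upper half: same as the last
    counter seen, one below, one above, or zero (node started over).
    mic_ok(fcnt32) must verify the packet's MIC with that candidate."""
    msb = last_fcnt32 >> 16
    for cand_msb in (msb, msb - 1, msb + 1, 0):
        if cand_msb < 0:
            continue  # can't go below zero
        cand = (cand_msb << 16) | fcnt_lsb
        if mic_ok(cand):
            return cand
    return None  # no candidate verified: wrong keys or a counter reset
```

This is essentially the bookkeeping a network server already does for every uplink, which is why building an out-of-band decryptor tends to grow into one.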