There are several situations where the frame count (FCnt) sent by a device does not match the one expected by the gateway. Examples include:
The gateway is offline, resulting in the device transmitting packets with a higher frame count than expected.
The sensor is too far from the gateway for packets to be received reliably, so the device transmits a higher frame count than expected.
Because most packets are unacknowledged, how is desynchronisation managed?
My understanding is that devices should request acknowledgments and reconnect to the network on occasion (which is also a good security practice to refresh keys on OTAA). By reconnecting to the network, the frame count is reset, which is no issue for OTAA (due to new keys), but means that ABP devices that do not retain/save their frame count will have their packets rejected (the network server considers these frames as already having been sent).
The forum post calls out a maxFCntGap = 16384. I believe this means that there can be up to a difference of 16384 frames between one frame and the next which are deemed acceptable/valid, after which the counter will “resync”. In other words, frame counts aren’t very strict.
But reviewing RP002-1.0.4 Regional Parameters, it states the following: MAX_FCNT_GAP was deprecated and removed from LoRaWAN 1.0.4 and subsequent versions.
Does this mean The Things Stack decides on the acceptable gap? If so, what is it now and is there a link/source to where it is called out?
I would also appreciate it if someone can confirm if my understanding is correct.
LoRaWAN revisions 1.0.4 and 1.1 both mention the complete removal of MAX_FCNT_GAP in their changelogs, as the FCnt fields were upgraded to 32-bit. Thus, jumps greater than 16384 are allowed.
The GitHub repo is free to look at for yourself…
I’m not completely sure how to interpret this - line 137 looks like FCnt rollover to me but line 138 does not do a rollover, it’s line 140 that does a rollover. But then I have zero experience with Go nor TTS. Two minutes ago I wondered how fCnt names a type but only then found out that Go switches type and name in a function definition…
Either way, looks like they allow a single 16-bit rollover, regardless of whether I can decode the switch.
Isn’t that code for when the FCnt rollover occurs (given that a 16-bit value is reported by the device, but FCnt is tracked as a 32-bit value) rather than managing the gap that is “acceptable” between frames?
Help yourself by using the symbol panel that opens when you click a function name. Although TBF, this one comes with a substantial number of references throughout the stack, so I’ll give you one more hint:
If your device hasn’t been heard for more than 65k uplinks, you’re either not at all a friendly radio-neighbour, or apparently aren’t interested in the device’s measurements at all given that you haven’t noticed 65k missing uplinks. As we’ve mentioned before: if you’re asking these questions, you’re likely bordering proper use of LoRaWAN. Why is it that you’re asking this? What happened?
The reason for my question is to understand the implications of a gateway outage (for example a power outage). If there is no tolerance (where there could not be any missed frames), then a single missed frame (for example the gateway sending a downlink at the same time) would result in any future packets failing to be received. If the tolerance is too big, then the security benefits of the frame counter are negated.
Try to remember that many big brains in very large corporations sat around going through the excruciating detail of the specifications and that very large corporations use LoRaWAN at massive scale (100,000s of devices) to allow them to bill their customers.
Gateways go down for hours at a time, particularly in Spain last week, so this sort of issue is well catered for.
Not everything is documented - some is left to implementation on what is appropriate for the developer - and some items are sufficiently esoteric that if they get the settings wrong, the users notice and things are changed.
And also, perhaps do some thought experiments of your own: how long does it take to do, say, 1K uplinks on a reasonably configured device? Couple that with a link check every week. Does it seem to work?
Losing the security that the frame counter brings also requires someone to have decrypted the packets. If by security you mean the loss of packets, that is so easily tracked & flagged on your own server.
With sensor deployments as large as what you have described, there indeed needs to be confidence in the network, particularly when there are outages.
Surely there is a threshold which The Things Network considers “acceptable” for the frame counter being above that expected. By knowing this number, you would be able to determine how long an outage would need to be in order to require a rejoin.
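To put rough numbers on that, here is a minimal sketch of the arithmetic. The function name and the 10-minute uplink interval are my own assumptions for illustration. The two gap values are the deprecated MAX_FCNT_GAP of 16384 and 65535, which is what a single 16-bit roll-over (suggested elsewhere in this thread) would allow.

```python
from datetime import timedelta


def time_to_exceed_gap(max_gap: int, uplink_interval: timedelta) -> timedelta:
    """How long a device must go unheard before its FCnt is more than
    max_gap ahead of the last value seen by the network server.

    Assumes the device keeps transmitting at a fixed interval and that
    every uplink increments FCnt by one (no retransmissions).
    """
    return max_gap * uplink_interval


# Hypothetical device sending one uplink every 10 minutes.
interval = timedelta(minutes=10)

for gap in (16384, 65535):
    outage = time_to_exceed_gap(gap, interval)
    print(f"gap {gap}: about {outage.days} days of missed uplinks")
```

Under these assumptions the outage would have to last roughly 113 days for a gap of 16384, or roughly 455 days for a gap of 65535, before the counter runs out of tolerance.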
I appreciate that experimenting may lead to an answer, but isn’t there a max frame count gap programmed into the configuration to replace the previously defined standard of 16384?
As I have outlined above: it appears to be 65k. And I’ve checked for you: Chirpstack does the same. So there is some (probably coincidental) consensus of this replacement value.
So what numbers did you end up with for a reasonably-paced uplink rate?
I guess the current thresholds work nicely, given that there is zero evidence of your question being asked before…
Thank you Steven, I misinterpreted your earlier comment.
Would you care to explain how you were able to determine it was 65k (I’m assuming you mean 65535)?
“I appreciate that experimenting may lead to an answer”
In my comment, I was saying that by experimenting you may find the answer, but given it is based on some defined parameter, reviewing source material would be more definitive.
“I guess the current thresholds work nicely, given that there is zero evidence of your question being asked before…”
Makes it a worthwhile question to ask then. Always better to know than to assume the best (or worst).
For a 100,000 device deployment, I’d rather assume the worst. Would be a bad show if you assumed the exact theoretical boundary and reality turned out to be harsher.
Chiming in here as I saw the question on GitHub. Feel free to mention me here about LoRaWAN questions.
In LoRaWAN 1.0, 1.0.1, 1.0.2 and 1.0.3, frame counters can be 16 or 32-bit. 16-bit frame counters simply roll over. So with 16-bit frame counters, there has to be a maximum gap much less than 65K, otherwise every FCnt would be valid, making replay attacks too easy. 16K was chosen somewhat arbitrarily. The gap also applies to 32-bit counters for consistency.
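As a rough illustration of why a 16-bit counter needs a gap limit, here is a minimal sketch of such a check. It is not taken from any network server; the function name is hypothetical, and treating a repeated counter value (gap of zero) as invalid is a simplification that ignores retransmissions.

```python
MAX_FCNT_GAP = 16384      # value used by LoRaWAN 1.0 to 1.0.3
FCNT_MODULUS = 1 << 16    # a 16-bit counter wraps at 65536


def accept_16bit_fcnt(last_fcnt: int, received_fcnt: int,
                      max_gap: int = MAX_FCNT_GAP) -> bool:
    """Accept a frame only if its counter is ahead of the last one by
    at most max_gap, measuring the distance modulo 2**16 so that a
    roll-over from 65535 to 0 still counts as moving forward."""
    gap = (received_fcnt - last_fcnt) % FCNT_MODULUS
    return 0 < gap <= max_gap


print(accept_16bit_fcnt(100, 101))    # next frame
print(accept_16bit_fcnt(65530, 5))    # 11 frames ahead, across the roll-over
print(accept_16bit_fcnt(100, 90))     # old frame: modular gap is 65526
```

Without the `max_gap` bound, every value other than the current one would look like a valid "future" frame after wrapping, which is the replay problem described above.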
In LoRaWAN 1.0.4 and 1.1, the frame counters are always 32-bit wide so there is no need for the maximum gap.
Now, how many times the NS rolls over is actually not specified. The Things Stack does it once. I think that makes most sense. End-devices (also) don’t have the resources to check for many roll-overs on class B and C downlink because that is compute intensive. So I think everyone settles on one roll-over max.
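To make "one roll-over" concrete: only the low 16 bits of FCnt are sent over the air, so the receiver has to infer the upper 16 bits. The sketch below is my own illustration of that idea with hypothetical names. It is not The Things Stack's implementation, and a real network server would also verify the MIC using the inferred value.

```python
def full_fcnt(wire_fcnt: int, last_full_fcnt: int) -> int:
    """Infer the 32-bit frame counter from the 16 bits in the frame,
    allowing the low 16 bits to have rolled over at most once since
    the last accepted frame."""
    upper = last_full_fcnt & 0xFFFF0000
    lower = last_full_fcnt & 0x0000FFFF
    if wire_fcnt >= lower:
        # No roll-over needed: same upper half as the last frame.
        return upper | wire_fcnt
    # The low 16 bits went backwards, so assume exactly one roll-over.
    return (upper + 0x10000 + wire_fcnt) & 0xFFFFFFFF


print(full_fcnt(11, 10))       # 11: simple increment
print(full_fcnt(5, 65530))     # 65541: one roll-over assumed
print(full_fcnt(9, 10))        # 65545: the largest forward jump, 65535 frames
```

With this scheme the inferred counter can never be more than 65535 ahead of the last accepted value, which would explain a gap of "at most 65K" even though the counter itself is 32 bits wide.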
Gaps should never be this big anyway. For end-devices using ADR, there is ADR backoff. Otherwise there is a similar mechanism described in TR007 that also lets the device detect link loss and revert to join mode. So these gaps only apply to ABP I think.
Thanks for taking the time to respond, it’s great to get insight directly from you.
Just to clarify:
“In short, for LoRaWAN 1.0.4 and 1.1, the maximum gap is one roll-over. So it depends on the counter values, but the gap is at most 65K.”
In LoRaWAN 1.0.4 and 1.1, does that mean the maximum acceptable gap is the max 32-bit value (~4.29 billion)? Or does The Things Stack impose a gap limit of 65K (the max 16-bit value), even though the frame counter is a 32-bit value? Quote is from your GitHub comment.
Is the primary purpose of frame counters to protect against replay attacks, rather than to reject uplinks with higher frame counters?
“Now, how many times the NS rolls over is actually not specified. The Things Stack does it once. I think that makes most sense.”
Are rollovers implemented across all versions, including 1.0 to 1.0.3 as well as 1.0.4 and 1.1?
If rollovers apply to 1.0.4 and 1.1 (with 32-bit counters), how is the frame counter gap managed, assuming a rollover occurs? I imagine the likelihood of a 32-bit counter reaching 4 billion is incredibly small, but curious how that’s handled in theory or in practice.