How is frame count (FCnt) desynchronisation managed? (1.0.4+)

ElectronicallyE · May 6, 2025, 1:35am

There are several situations where the frame count (FCnt) sent by a device does not match the one expected by the gateway. Examples include:

The gateway is offline, resulting in the device transmitting packets with a higher frame count than expected.
The sensor is too far from the gateway for its packets to be received reliably, resulting in the device transmitting a higher frame count than expected.

Because most packets are unacknowledged, how is desynchronisation managed?
My understanding is that devices should request acknowledgments and reconnect to the network on occasion (which is also a good security practice to refresh keys on OTAA). By reconnecting to the network, the frame count is reset, which is no issue for OTAA (due to new keys), but means that ABP devices that do not retain/save their frame count will have their packets rejected (the network server considers these frames as already having been sent).

But there are devices which do not reconnect to the network or request acknowledgments. How do these continue to work correctly if there is patchy coverage?
I have come across the following forum post from 2017:
How much frame count deviation is allowed? - End Devices (Nodes) - The Things Network

The forum post calls out a maxFCntGap = 16384. I believe this means that a frame is deemed acceptable/valid as long as its count is no more than 16384 above that of the previous frame, after which the counter will “resync”. In other words, frame counts aren’t very strict.

But reviewing RP002-1.0.4 Regional Parameters, it states the following:
MAX_FCNT_GAP was deprecated and removed from LoRaWAN 1.0.4 and subsequent versions.

Does this mean The Things Stack decides on the acceptable gap? If so, what is it now and is there a link/source to where it is called out?

I would also appreciate it if someone can confirm if my understanding is correct.

stevencellist · May 6, 2025, 5:31am

LoRaWAN revisions 1.0.4 and 1.1 both mention the complete removal of MAX_FCNT_GAP in their changelogs, as the FCnt fields were upgraded to 32-bit. Thus, jumps greater than 16384 are allowed.

ElectronicallyE · May 6, 2025, 6:21am

Thanks @stevencellist. Do you know what The Things Network permits as an acceptable frame count difference from one frame to the next?

stevencellist · May 6, 2025, 6:58am

The GitHub repo is free to look at for yourself…

github.com/TheThingsNetwork/lorawan-stack

pkg/networkserver/internal/utils.go

3b2442c81

          // FullFCnt returns full FCnt given fCnt, lastFCnt and whether or not 32-bit FCnts are supported.
          func FullFCnt(fCnt uint16, lastFCnt uint32, supports32BitFCnt bool) uint32 {
          	switch {
          	case fCnt == 0 && lastFCnt == 0:
          		return 0
          	case !supports32BitFCnt:
          		return uint32(fCnt)
          	case uint32(fCnt) >= lastFCnt&0xffff:
          		return lastFCnt&^0xffff | uint32(fCnt)
          	case lastFCnt < 0xffff0000:
          		return (lastFCnt+0x10000)&^0xffff | uint32(fCnt)
          	default:
          		return uint32(fCnt)
          	}
          }

I’m not completely sure how to interpret this - line 137 looks like FCnt rollover to me but line 138 does not do a rollover, it’s line 140 that does a rollover. But then I have zero experience with Go nor TTS. Two minutes ago I wondered how fCnt names a type but only then found out that Go switches type and name in a function definition…

Either way, looks like they allow a single 16-bit rollover, regardless of whether I can decode the switch.
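
To make the switch easier to follow, here is a small runnable sketch. `FullFCnt` is copied from the snippet above; the inputs in `main` are hypothetical values picked for illustration, and only the counter reconstruction is shown, not the MIC check that decides whether the uplink is finally accepted.

```go
package main

import "fmt"

// FullFCnt is copied from the snippet quoted above: it reconstructs the
// 32-bit counter from the 16-bit value carried in the frame header.
func FullFCnt(fCnt uint16, lastFCnt uint32, supports32BitFCnt bool) uint32 {
	switch {
	case fCnt == 0 && lastFCnt == 0:
		return 0
	case !supports32BitFCnt:
		return uint32(fCnt)
	case uint32(fCnt) >= lastFCnt&0xffff:
		// Lower 16 bits did not go backwards: keep the upper 16 bits.
		return lastFCnt&^0xffff | uint32(fCnt)
	case lastFCnt < 0xffff0000:
		// Lower 16 bits went backwards: assume exactly one rollover.
		return (lastFCnt+0x10000)&^0xffff | uint32(fCnt)
	default:
		// Upper 16 bits are already at their maximum: wrap the 32-bit counter.
		return uint32(fCnt)
	}
}

func main() {
	// Hypothetical inputs, chosen for illustration only.
	fmt.Println(FullFCnt(5, 3, true))       // 5: small forward step
	fmt.Println(FullFCnt(2, 0xFFFE, true))  // 65538: one rollover assumed
	fmt.Println(FullFCnt(4, 0x10005, true)) // 131076: 65535 ahead of last
	fmt.Println(FullFCnt(2, 0xFFFE, false)) // 2: 16-bit counters only
}
```

Read this way, a 16-bit value that is lower than the low half of the last counter is always taken to be in the next 64K window, so the largest forward jump that still reconstructs correctly is 65535.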

ElectronicallyE · May 6, 2025, 7:06am

Isn’t that code for when the FCnt rollover occurs (given that a 16-bit value is reported by the device, but FCnt is tracked as a 32-bit value) rather than managing the gap that is “acceptable” between frames?

stevencellist · May 6, 2025, 7:14am

Help yourself by using the symbol panel that opens when you click a function name. Although TBF, this one comes with a substantial number of references throughout the stack, so I’ll give you one more hint:

github.com/TheThingsNetwork/lorawan-stack

pkg/networkserver/grpc_gsns.go

3b2442c81

          			return nil, false, nil
          		}
          	}
          
          	// Current session match
          	if matchType != pendingMatch &&
          		dev.Session != nil &&
          		dev.MacState != nil &&
          		devAddr.Equal(types.MustDevAddr(dev.Session.DevAddr).OrZero()) &&
          		macspec.UseLegacyMIC(cmacFMatchResult.LoRaWANVersion) == macspec.UseLegacyMIC(dev.MacState.LorawanVersion) &&
          		(cmacFMatchResult.FullFCnt == FullFCnt(uint16(pld.FHdr.FCnt), dev.Session.LastFCntUp, mac.DeviceSupports32BitFCnt(dev, ns.defaultMACSettings, profile.GetMacSettings())) || // nolint: gosec, lll
          			cmacFMatchResult.FullFCnt == pld.FHdr.FCnt) {
          		fNwkSIntKey, err := cryptoutil.UnwrapAES128Key(ctx, dev.Session.Keys.FNwkSIntKey, ns.KeyService())
          		if err != nil {
          			log.FromContext(ctx).WithError(err).WithField("kek_label", dev.Session.Keys.FNwkSIntKey.KekLabel).Warn("Failed to unwrap FNwkSIntKey")
          			return nil, false, nil
          		}
          		if cmacFMatchResult.FNwkSIntKey.Equal(fNwkSIntKey) {
          			ctx = log.NewContextWithFields(ctx, log.Fields(
          				"last_f_cnt_up", dev.Session.LastFCntUp,
          				"mac_version", dev.MacState.LorawanVersion,

ElectronicallyE · May 6, 2025, 7:21am

To be honest, I’m completely lost. Are you sure the acceptable gap isn’t called out somewhere in some documentation?

stevencellist · May 6, 2025, 7:22am

Google is your friend… if you can find it, let me know.

ElectronicallyE · May 6, 2025, 7:25am

That’s particularly odd then. Wouldn’t you think that this would be important information to know how much tolerance there is for missed frames?

In my searching I could only find references to the now-outdated maxFCntGap.

stevencellist · May 6, 2025, 7:35am

If your device hasn’t been heard for more than 65k uplinks, you’re either not at all a friendly radio-neighbour, or apparently aren’t interested in the device’s measurements at all given that you haven’t noticed 65k missing uplinks. As we’ve mentioned before: if you’re asking these questions, you’re likely bordering proper use of LoRaWAN. Why is it that you’re asking this? What happened?

ElectronicallyE · May 6, 2025, 9:04am

The reason for my question is to understand the implications of a gateway outage (for example a power outage). If there is no tolerance (where there could not be any missed frames), then a single missed frame (for example the gateway sending a downlink at the same time) would result in any future packets failing to be received. If the tolerance is too big, then the security benefits of the frame counter are negated.

descartes · May 6, 2025, 9:22am

Try to remember that many big brains in very large corporations sat around going through the excruciating detail of the specifications and that very large corporations use LoRaWAN at massive scale (100,000s of devices) to allow them to bill their customers.

Gateways go down for hours at a time, particularly in Spain last week, so this sort of issue is well catered for.

Not everything is documented - some is left to implementation on what is appropriate for the developer - and some items are sufficiently esoteric that if they get the settings wrong, the users notice and things are changed.

And also, perhaps do some thought experiments of your own: how long does it take to do, say, 1K uplinks on a reasonably configured device? Couple that with a link check every week and see whether it seems to work.

Losing the security that the frame counter brings also requires someone to have decrypted the packets. If by security you mean the loss of packets, that is so easily tracked & flagged on your own server.

ElectronicallyE · May 6, 2025, 12:28pm

With sensor deployments as large as what you have described, there indeed needs to be confidence in the network, particularly when there are outages.

Surely there is a threshold which The Things Network considers “acceptable” for the frame counter being above that expected. By knowing this number, you would be able to determine how long an outage would need to be in order to require a rejoin.

I appreciate that experimenting may lead to an answer, but isn’t there a max frame count gap programmed into the configuration to replace the previously defined standard of 16384?

stevencellist · May 6, 2025, 12:53pm

As I have outlined above: it appears to be 65k. And I’ve checked for you: Chirpstack does the same. So there is some (probably coincidental) consensus of this replacement value.

So what numbers did you end up with for a reasonably-paced uplink rate?

I guess the current thresholds work nicely, given that there is zero evidence of your question being asked before…

ElectronicallyE · May 6, 2025, 1:03pm

Thank you Steven, I misinterpreted your earlier comment.

Would you care to explain how you were able to determine it was 65k (I’m assuming you mean 65535)?

“I appreciate that experimenting may lead to an answer”

In my comment, I was saying that by experimenting you may find the answer, but given it is based on some defined parameter, reviewing source material would be more definitive.

“I guess the current thresholds work nicely, given that there is zero evidence of your question being asked before…”

Makes it a worthwhile question to ask then. Always better to know than to assume the best (or worst).

stevencellist · May 6, 2025, 1:29pm

^

For a 100,000 device deployment, I’d rather assume the worst. Would be a bad show if you assumed the exact theoretical boundary and reality appears to be more harsh.

kersing · May 6, 2025, 2:13pm

Current knowledge may be invalid in the future. Just as the previous 16K gap got removed from the spec, the newly implemented value may change.

I still assume 16K to be the value to use as most available devices implement versions < 1.0.4 anyway.

descartes · May 6, 2025, 4:48pm

Please, pretty please, try this.

johan · May 6, 2025, 5:19pm

Chiming in here as I saw the question on GitHub. Feel free to mention me here about LoRaWAN questions.

In LoRaWAN 1.0, 1.0.1, 1.0.2 and 1.0.3, frame counters can be 16-bit or 32-bit. 16-bit frame counters simply roll over. So with 16-bit frame counters, there has to be a maximum gap much less than 65K, otherwise every FCnt would be valid, making replay attacks too easy. 16K was chosen somewhat arbitrarily. The gap also applies to 32-bit counters for consistency.

In LoRaWAN 1.0.4 and 1.1, the frame counters are always 32-bit wide so there is no need for the maximum gap.

Now, how many times the NS rolls over is actually not specified. The Things Stack does it once. I think that makes most sense. End-devices (also) don’t have the resources to check for many roll-overs on class B and C downlink because that is compute intensive. So I think everyone settles on one roll-over max.

Gaps should never be this big anyway. For end-devices using ADR, there is ADR backoff. Otherwise there is a similar mechanism described in TR007 that also lets the device detect link loss and revert to join mode. So these gaps only apply to ABP I think.
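
The point about 16-bit counters needing a maximum gap can be illustrated with a minimal sketch. `acceptWithGap` is a hypothetical helper written for this example, not code from the specification or from The Things Stack, and treating the boundary as inclusive is an assumption.

```go
package main

import "fmt"

// acceptWithGap is a hypothetical helper, not taken from any specification
// text or network server. It accepts a 16-bit FCnt only if it moved forward
// by at most maxGap, counting rollover (uint16 subtraction wraps modulo 65536).
// The inclusive boundary (<=) is an assumption.
func acceptWithGap(last, received, maxGap uint16) bool {
	diff := received - last
	return diff != 0 && diff <= maxGap
}

func main() {
	// Legitimate rollover: 65530 -> 4 is only 10 frames ahead.
	fmt.Println(acceptWithGap(65530, 4, 16384)) // true
	// Replay of the previous frame: 99 looks like 65535 frames ahead.
	fmt.Println(acceptWithGap(100, 99, 16384)) // false
	// With a gap of 65535 the same replay would pass.
	fmt.Println(acceptWithGap(100, 99, 65535)) // true
}
```

With the gap set to 65535, every value except an exact repeat of the last counter counts as "ahead", which is why a 16-bit counter needs a much smaller limit.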

ElectronicallyE · May 7, 2025, 1:37am

Hi Johan,

Thanks for taking the time to respond, it’s great to get insight directly from you.

Just to clarify:

“In short, for LoRaWAN 1.0.4 and 1.1, the maximum gap is one roll-over. So it depends on the counter values, but the gap is at most 65K.”

In LoRaWAN 1.0.4 and 1.1, does that mean the maximum acceptable gap is the max 32-bit value (~4.29 billion)? Or does The Things Stack impose a gap limit of 65K (the max 16-bit value), even though the frame counter is a 32-bit value? Quote is from your GitHub comment.
Is the primary purpose of frame counters to protect against replay attacks, rather than to reject uplinks with higher frame counters?

“Now, how many times the NS rolls over is actually not specified. The Things Stack does it once. I think that makes most sense.”

Are rollovers implemented across all versions, including 1.0 to 1.0.3 as well as 1.0.4 and 1.1?
If rollovers apply to 1.0.4 and 1.1 (with 32-bit counters), how is the frame counter gap managed, assuming a rollover occurs? I imagine the likelihood of a 32-bit counter reaching 4 billion is incredibly small, but curious how that’s handled in theory or in practice.