DevAddr/NwkAddr limits

We’re planning to run an instance of The Things Stack Open Source, and are considering paying the small yearly fee to the LoRa Alliance for a block of 16 Type 7 NetIDs.

Looking at the specs it seems to suggest that for one Type 7 NetID, you get 7 bits for the NwkAddr, which would suggest you can only have 128 devices in that network. Is that correct? (https://lora-alliance.org/wp-content/uploads/2020/11/TS002-1.1.0_LoRaWAN_Backend_Interfaces.pdf - page 51, table 3)

On a page of the TTS documentation though it says “Please keep in mind that the DevAddr value is not unique - multiple devices can have the same DevAddr.” (ABP vs OTAA | The Things Stack for LoRaWAN)

Does this mean that TTS can somehow handle more than 128 devices in the same NetID, despite the 128 NwkAddr limit, or is that sentence just pointing out that there can be clashes occasionally and clashes are handled?

128 devices isn’t enough for our use case, but we can’t justify the full cost of paying for a larger LoRa Alliance membership to get a larger allocation, so I’m just trying to understand some of our options.

This is sort of the IPv4 problem all over again, with a partial saving grace and a big caveat: 32 bits is already a lot of address overhead to transmit over a low-bandwidth radio link that may operate under dwell-time-limited regulations (there are places where LoRaWAN’s overhead alone is illegal to transmit at SF11BW125, even without any message payload).

First, note the part of the document that mentions the two experimental NetIDs, which are usable if you don’t need interoperation and roaming. Or, for that matter, what the spec doesn’t mention: a fully private network doesn’t have to be by-the-spec LoRaWAN at all.

In terms of address re-use, it’s really the combination of device address and network session key that uniquely identifies an end device. In that view, the primary role of a partially unique device address is to carry a unique network prefix, so that a roaming packet can be routed back to the right network server.
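To make the routing role concrete, here is a rough sketch of how a Type 7 DevAddr splits into a network part and a device part. The 7-bit NwkAddr is the figure from the question; the 8-bit type prefix (binary 11111110) and 17-bit NwkID are my reading of the same table and should be checked against the spec. The function names are made up for illustration.

```python
# Assumed Type 7 layout: 8-bit type prefix | 17-bit NwkID | 7-bit NwkAddr
TYPE7_PREFIX = 0b11111110  # assumption: taken from my reading of the spec table
NWKID_BITS = 17            # assumption, same source
NWKADDR_BITS = 7           # the figure quoted in the question


def split_type7_devaddr(dev_addr: int) -> tuple[int, int]:
    """Split a 32-bit Type 7 DevAddr into (NwkID, NwkAddr)."""
    if dev_addr >> 24 != TYPE7_PREFIX:
        raise ValueError("not a Type 7 DevAddr")
    nwk_id = (dev_addr >> NWKADDR_BITS) & ((1 << NWKID_BITS) - 1)
    nwk_addr = dev_addr & ((1 << NWKADDR_BITS) - 1)
    return nwk_id, nwk_addr


def unique_addresses(net_ids: int) -> int:
    """Number of distinct DevAddrs available across several Type 7 NetIDs."""
    return net_ids * (1 << NWKADDR_BITS)
```

Only the NwkID part matters for routing between networks. With a block of 16 such NetIDs, `unique_addresses(16)` gives 2048 distinct addresses before any re-use is needed.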

If two distinct devices have the same device address, the idea is that the cryptographic checksum on a packet (the MIC) should only verify with the network session key of one of them. So re-use mostly costs the network server a modest, bounded multiplier on computation: it searches for a MIC match among the finite list of devices in its database that share that device address (at times also checking a possible skip-forward of the frame count past the 16-bit rollover point, if the last heard value was sufficiently close to it).
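The frame count part of that search can be sketched as follows. Only the lower 16 bits of the counter go over the air, so the server has to guess the full 32-bit value before it can check the MIC. The function name and the default `max_gap` of 16384 are assumptions for illustration; I am not describing what TTS actually does.

```python
def candidate_fcnts(last_fcnt32: int, fcnt16: int, max_gap: int = 16384) -> list[int]:
    """Return plausible full 32-bit frame counters for a 16-bit on-air value.

    Tries the current upper 16 bits and the next value up (a rollover),
    keeping only counters that move forward by at most max_gap.
    """
    upper = last_fcnt32 & ~0xFFFF
    candidates = []
    for base in (upper, upper + 0x10000):
        full = base | fcnt16
        if last_fcnt32 < full <= last_fcnt32 + max_gap:
            candidates.append(full)
    return candidates
```

Each candidate returned here is one more MIC check per device sharing the address, which is where the "finite multiplier" comes from.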

I don’t know off the top of my head if TTS in the configuration you’d be running is practically willing to re-use device addresses when it joins OTAA nodes or if it will allow you to assign overlapping ones for ABP nodes.

The more interesting question is the possibility of a MIC collision, where the same 32-bit MIC would be valid under two different devices’ network session keys, possibly at two different states of the mutually tracked but untransmitted upper 16 bits of the frame counter, or once you allow for message corruption that survives the radio-level integrity check. Because the MIC space is only 2^32 while the network session key space is 2^128 (the keys are AES-128), collisions are definitely possible: a given message produces a given MIC not with one unique key but, on statistical average, with 2^96 different keys. However, the chance of a given message colliding for two known end devices is still very small (about 1 in 2^32 per check). It would seem that “birthday problem” type quadratic scaling applies to the device address reuse count, with the number of trials (packets) contributing only linearly to the chance of seeing one.
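As back-of-envelope arithmetic for that scaling, assume a wrong key produces a matching 32-bit MIC with probability 1 in 2^32, and that every uplink is checked against all other devices sharing its address. Both are simplifying assumptions of mine, and the function name is hypothetical.

```python
def expected_false_matches(devices_per_addr: int, uplinks_per_device: int) -> float:
    """Expected number of wrong-key MIC matches within one shared DevAddr.

    Each of the k devices sends n uplinks, and each uplink is tested against
    the other (k - 1) keys, so the count grows as k * (k - 1) * n / 2**32:
    quadratic in the reuse count, linear in the number of packets.
    """
    k = devices_per_addr
    n = uplinks_per_device
    return k * (k - 1) * n / 2**32


# Example: 10 devices on one address, 100,000 uplinks each
print(expected_false_matches(10, 100_000))
```

With those example numbers the expectation works out to roughly 0.002 false matches per shared address, so it is rare, but not negligible once multiplied across many reused addresses and long deployments.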

But if it did happen, the consequence of a MIC collision could be rather bad. The issue wouldn’t be the one mistakenly reported packet, but that its misassignment could lastingly set the expected frame count of a device to the wrong value. All further traffic from that device would then be ignored for a substantial period, until the misidentified “lagging” device’s actual transmissions caught up to the frame count value wrongly taken from the other. One technique for resistance would be to have the “match search” code not break at the first match, but instead check every end device with that device address, and if more than one MIC matches, discard the packet entirely to avoid breaking the ongoing session state (on manual review, the frame count would likely look sensible for one end device as a small increment, and not for the other as a larger skip).
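A minimal sketch of that "don't stop at the first match" idea follows. Everything here is hypothetical: the `Session` record, the function names, and especially `standin_mic`, which uses truncated HMAC-SHA256 purely so the example runs with the standard library. Real LoRaWAN computes the MIC with AES-CMAC over a B0 block plus the message. The frame count is taken as an already reconstructed 32-bit value to keep things short.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Session:
    dev_eui: str
    dev_addr: int
    nwk_s_key: bytes
    last_fcnt: int


def standin_mic(key: bytes, dev_addr: int, fcnt: int, payload: bytes) -> bytes:
    # STAND-IN ONLY: not the LoRaWAN MIC (which is AES-CMAC based).
    msg = dev_addr.to_bytes(4, "little") + fcnt.to_bytes(4, "little") + payload
    return hmac.new(key, msg, hashlib.sha256).digest()[:4]


def match_uplink(sessions, dev_addr, fcnt, payload, mic, mic_fn=standin_mic):
    """Return the single session whose key verifies the MIC, else None.

    Every candidate with this DevAddr is checked. If more than one verifies,
    the packet is dropped and no session state is touched.
    """
    matches = [
        s for s in sessions
        if s.dev_addr == dev_addr
        and fcnt > s.last_fcnt
        and hmac.compare_digest(mic_fn(s.nwk_s_key, dev_addr, fcnt, payload), mic)
    ]
    if len(matches) != 1:
        return None  # no match, or ambiguous: leave all frame counters alone
    matches[0].last_fcnt = fcnt
    return matches[0]
```

The cost is that the search always runs over the full candidate list instead of stopping early, which is the price of never advancing the wrong device’s frame count on an ambiguous packet.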

A collision with somebody else’s end device would be theoretically possible too (especially on the shared NetIDs). However, you’re much less likely to be receiving some random other party’s uplinks at your gateway than your own, and the “check all known possibilities for collision” idea wouldn’t work for the unknown possibility of other people’s devices. Tracking the history of the expected uplink frame count more than one element deep, however, could.

These would appear to be contradictory statements: running TTS OS isn’t trivial if you need to rely on it. It will need weekly sanity checks, you should not fall too far behind on the updates that come out regularly, and you’ll need to organise a backup regime. The hosting won’t cost a huge amount, but staff time can add up.

So the question is why run TTS OS at all?