OTAA LMIC node no longer showing in TTN console after 65535 frames up

I have fielded a handful of air quality sensors around the city, starting a couple of years ago. The project uses an ESP32 + LMIC to communicate with TTN. The software collects readings from the attached sensors and sends the telemetry to TTN every minute. If any problems are encountered, the device reboots itself. This masked an issue that I’m only now getting my head around. Once the group managing the gateways had sorted out the local network issues, my devices were able to stay online for months on end. Then I noticed that my devices would stop sending data after reaching 65535 uplinks (~45 days). The number is clearly the result of a 16-bit overflow condition… somewhere.

These are outdoor devices mounted on signal poles, which makes pulling debug data difficult. I’ve been working on a new version of the software to add some features and to switch to the MCCI LMIC port, in the hope that I’m just seeing some 16-bit issue that has since been solved (despite finding no GitHub issues about this in the original IBM LMIC port). I set up a test device to send data every ~3 seconds, which has now allowed me to observe this error after a few days.

What I was expecting was some sort of error shown on the device: a hang, maybe a WDT reset, something like that. What I woke up to today was TTN showing my device as not responding, with exactly 65535 frames sent:
[Screenshot: TTN Console showing the device as inactive with 65535 frames up]

However, the device itself is online and believes it is sending data. I have it spraying debug logs over serial, and it’s happily cranking away, currently at uplink sequence number 72260. Both of my local gateways see the device, but no data is reported to TTN, and the device has shown as dark for the past 7 hours (per the image above).

Here’s one of the gateway logs:
[Screenshot: gateway log showing the device’s uplinks]

Note the frame counter at 6700-something and counting. Add 65536 (2^16) to that and you get the frame counter my device thinks it has, which is reported via LMIC.seqnoUp. Another classic sign of 32 bits of data in a 16-bit bucket.
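
As a simplified sketch of why the gateway shows 6700-something: LoRaWAN carries only the low 16 bits of the uplink frame counter (FCnt) in each frame header, and a network server set to 32-bit counters infers the upper 16 bits from the last value it accepted. The helper names below are hypothetical, and the model ignores the MAX_FCNT_GAP and replay details a real network server applies.

```python
# Simplified model of LoRaWAN uplink frame counters (hypothetical helper names).
# The device keeps a 32-bit counter (LMIC.seqnoUp), but only the low 16 bits
# are transmitted in the FCnt field of the frame header.

FCNT_WIRE_MASK = 0xFFFF


def wire_fcnt(seqno_up: int) -> int:
    """Low 16 bits of the device's 32-bit uplink counter, as sent over the air."""
    return seqno_up & FCNT_WIRE_MASK


def server_fcnt_32(last_fcnt: int, received16: int) -> int:
    """32-bit mode: rebuild the full counter from the last value the server accepted."""
    candidate = ((last_fcnt >> 16) << 16) | received16
    if candidate < last_fcnt:
        # The low half wrapped past 0xFFFF, so carry into the upper 16 bits.
        candidate += 1 << 16
    return candidate


def accepted_16bit(last16: int, received16: int) -> bool:
    """16-bit mode: the server only sees the low 16 bits, so a wrap looks like a replay."""
    return received16 > last16


print(wire_fcnt(72260))          # 6724: what the gateway log shows
print(server_fcnt_32(65535, 0))  # 65536: first frame after the wrap, 32-bit mode
print(accepted_16bit(65535, 0))  # False: same frame, 16-bit mode
```

In 16-bit mode every frame after the wrap compares as older than 65535, which fits a device that keeps transmitting while showing as dark in the console.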

While I’m not much of an embedded programmer, I don’t see how sloppy code on my side could affect an internal counter within the LMIC library. Apart from the debug code that prints LMIC.seqnoUp, nothing on my side should be interacting with the sequence numbers directly. As these are permanently powered devices, I’m not persisting frame counters to local storage (as one might on an OTAA device that enters deep sleep).

Has anyone seen anything like this?

Check the device settings page and switch “Frame counter width” to the other setting.

One thing to note is that you are violating the TTN fair access policy with each of these devices. You are allowed just 30 seconds of uplink airtime per device per day. Sending too often means you are using an inappropriate share of a shared medium (the few frequencies LoRaWAN uses).

That was exactly the problem! Changing the setting for my benchtop device on the TTN settings page immediately restored communication with it. Thanks, man!

Also, totally understood on the airtime thing. The “report every 3 seconds” device is in my basement on my own gateway, which should hopefully limit the scope of the dumb crap I’m doing here. Now that this is sorted, I can go back to 5-minute reports, which, given my packet size and spreading factor, should fit within the 1% airtime fairness rules.

Thanks @kersing!

If there are 65535 packets in 45 days, then that suggests you have the nodes set to send a packet once every 60 seconds.

If SF9 is typical, then that’s 388 seconds of airtime per day; the fair access limit is 30 seconds per day.
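
The arithmetic in that reply can be checked in a few lines. The 388 s/day figure is taken as given (it depends on payload size, which the thread doesn’t state); the per-uplink airtime below is simply that figure divided by one uplink per minute.

```python
SECONDS_PER_DAY = 86_400
FUP_AIRTIME_S = 30  # TTN fair access policy: uplink airtime per node per day

# Interval implied by 65535 uplinks in ~45 days
uplinks_per_day = 65_535 / 45
interval_s = SECONDS_PER_DAY / uplinks_per_day

# Per-uplink airtime implied by 388 s/day at one uplink per minute
daily_airtime_s = 388
airtime_per_uplink_s = daily_airtime_s / (SECONDS_PER_DAY / 60)

# How many times over the daily allowance that is
overage = daily_airtime_s / FUP_AIRTIME_S

print(f"interval ~{interval_s:.1f} s")
print(f"airtime per uplink ~{airtime_per_uplink_s * 1000:.0f} ms")
print(f"~{overage:.1f}x the fair access allowance")
```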

For testing, one can also use the ttnctl command line tool to set the frame counter to some specific value, and then make the test node start with that value too, rather than starting at zero. This might also be needed when the difference between the device’s counter and TTN’s exceeds MAX_FCNT_GAP, which is 16,384. (But when adhering to the Fair Access Policy, that should only happen when erroneously resetting the frame counters in TTN Console.)
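
A minimal sketch of the gap check described above, comparing full 32-bit counter values. The function name is hypothetical, and rejecting a difference of exactly MAX_FCNT_GAP (strict less-than) is an assumption; a real network server may handle the edge cases differently.

```python
MAX_FCNT_GAP = 16_384  # value quoted above


def fcnt_acceptable(server_fcnt: int, device_fcnt: int) -> bool:
    """True if the device counter is ahead of the server's, but by less than MAX_FCNT_GAP."""
    gap = device_fcnt - server_fcnt
    return 0 < gap < MAX_FCNT_GAP


# After resetting the counter in the console (server back at 0), a long-running
# node at 72260 is far outside the window; setting the server counter close to
# the node's value brings it back in range.
print(fcnt_acceptable(0, 72260))      # False
print(fcnt_acceptable(72250, 72260))  # True
```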

Just for future reference: what is the correct setting (for your LMIC device)? (I’d assume 32 bits, which is also the default in TTN Console.)

The correct setting was 32-bit, and every single device I had created (a couple of years ago) was set to 16-bit. I don’t recall ever having a reason to change that; was the default different some time ago? IIRC (this is going back a bit), I might have used ttnctl to create the device definitions. Maybe I messed something up with the command there?

You are mixing two things. First, there is the legal 1% duty-cycle limit applicable in Europe to the frequencies used for EU868. Second, there is TTN’s fair access policy, which states that every node is allowed an average of 30 seconds of uplink airtime and 10 downlinks each day.
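
To put numbers on the difference: a 1% duty cycle allows 864 s of transmission per day and forces an off-time of 99 times each transmission, while the fair access policy caps a node at 30 s per day. The sketch assumes a single 1% sub-band and uses the ~0.27 s per uplink implied by 388 s/day at one uplink per minute; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
DUTY_CYCLE = 0.01   # assumed: a single EU868 sub-band with a 1% limit
FUP_AIRTIME_S = 30  # TTN fair access policy, per node per day

legal_daily_s = DUTY_CYCLE * SECONDS_PER_DAY  # 864 s


def min_off_time_s(airtime_s: float, duty_cycle: float = DUTY_CYCLE) -> float:
    """Silence required after a transmission so that on-time stays at duty_cycle."""
    return airtime_s * (1 / duty_cycle - 1)


airtime_s = 388 / 1440  # ~0.27 s per SF9 uplink, from the 388 s/day figure
print(f"legal: up to {legal_daily_s:.0f} s/day, next uplink after {min_off_time_s(airtime_s):.1f} s")
print(f"fair access: {FUP_AIRTIME_S} s/day, {legal_daily_s / FUP_AIRTIME_S:.1f}x stricter")
```

So the regulations would permit an SF9 uplink roughly every half minute, while the fair access policy works out to one every several minutes: the “can vs should” distinction made later in the thread.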

To stay within the 30 seconds, a node can send about 10 bytes every 3 minutes at SF7. Larger packets, more frequent transmissions, or SF8 and above will result in a node exceeding the allowed airtime.
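
The “10 bytes every 3 minutes at SF7” rule of thumb follows from Semtech’s LoRa time-on-air formula. The sketch assumes typical EU868 LoRaWAN uplink settings (125 kHz bandwidth, coding rate 4/5, 8-symbol preamble, explicit header, CRC on) and 13 bytes of LoRaWAN overhead with no MAC commands in FOpts; the function name and defaults are illustrative choices, not an LMIC or TTN API.

```python
import math

SECONDS_PER_DAY = 86_400
FUP_AIRTIME_S = 30


def lora_airtime_s(app_payload: int, sf: int, bw_hz: int = 125_000,
                   preamble: int = 8, cr: int = 1, overhead: int = 13) -> float:
    """Time on air of one uplink, per Semtech's LoRa time-on-air formula."""
    pl = app_payload + overhead  # PHYPayload: MHDR 1 + FHDR 7 + FPort 1 + MIC 4
    de = 1 if sf >= 11 and bw_hz == 125_000 else 0  # low data rate optimisation
    crc, ih = 1, 0                                   # CRC on, explicit header
    t_sym = (2 ** sf) / bw_hz
    t_preamble = (preamble + 4.25) * t_sym
    num = 8 * pl - 4 * sf + 28 + 16 * crc - 20 * ih
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return t_preamble + n_payload * t_sym


for sf in (7, 8, 9):
    t = lora_airtime_s(10, sf)
    interval = SECONDS_PER_DAY * t / FUP_AIRTIME_S  # shortest average interval
    print(f"SF{sf}: {t * 1000:.1f} ms per 10-byte uplink, min interval ~{interval / 60:.1f} min")
```

At SF7 this gives about 62 ms per uplink and roughly one uplink every 3 minutes; each step up in SF roughly doubles the symbol time, and the allowed interval grows accordingly.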

Hi @Allen, as @Kersing states, you are potentially conflating the two issues. I would add that whilst the regs allow higher duty cycles, there have been many posts on the Forum and elsewhere about why you should not take that as permission, with regard to social responsibility, let alone the TTN FUP, e.g. this recent post from yours truly :wink: If you wouldn’t mind, take a moment to read it and note the can vs should element :slight_smile:

Understood and acknowledged. We can live with 5min intervals and that was the plan from the start. Thanks for the help guys, this sort of guidance is really useful!

If your uplinks are arriving at SF9, you need to be using a 12.5-minute interval.
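
Working backwards from the 30 s budget gives the shortest average interval. Using the ~0.27 s per uplink implied by 388 s/day at one uplink per minute (so the same unstated payload size as that estimate), this comes out at roughly 13 minutes, in the same ballpark as the figure above. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
FUP_AIRTIME_S = 30


def min_interval_s(airtime_per_uplink_s: float, budget_s: float = FUP_AIRTIME_S) -> float:
    """Shortest average uplink interval that keeps daily airtime within budget_s."""
    return SECONDS_PER_DAY * airtime_per_uplink_s / budget_s


sf9_airtime_s = 388 / 1440  # per-uplink airtime implied by 388 s/day at 1 uplink/min
print(f"~{min_interval_s(sf9_airtime_s) / 60:.1f} min between uplinks")
```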