Possible frame-counter issues with Stack v3

Hello there,

I’ve got some issues with the new V3 stack and hope someone can help me.

Let me start from the beginning:
I have some devices that were connected to V2 over my TBMH100 gateway.
Now I switched to V3 ( no migration ) as I’m still trying to get
the hang of all this stuff.
I added the gateway, all good.
I added the application, all good.
I added the devices, all good.

But after a while the devices stopped working. When I trigger
them ( via a button ) to send their data, I can see in the gateway
log that the data arrives, but without the intended payload. Also,
the data is never forwarded to the application. When I delete
the devices and add them as new, they work for a while. I’ve been
in contact with the creator of the devices, and it seems like it might
be an issue with the frame counter, as it’s stored in volatile(?)
memory and is reset when the device is powered off or loses power.

This has never been an issue with V2 since you could set the V2 console to ignore the
frame counter. I did some research but am still unsure about that.
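
As far as I understand it, the check that trips the devices up looks roughly like the sketch below. This is only my simplified illustration, not the actual V3 Network Server code: the function name `accept_uplink`, the `frame_counter_checks` switch and the counter value 120 are made up for the example, and the real stack also deals with things like 16-bit rollover that are left out here.

```python
from typing import Optional


def accept_uplink(last_f_cnt_up: Optional[int], f_cnt: int,
                  frame_counter_checks: bool = True) -> bool:
    """Simplified uplink frame counter check (hypothetical helper).

    last_f_cnt_up: highest uplink counter seen in this session, or None
                   if no uplink has been received yet.
    f_cnt:         counter carried by the incoming uplink.
    """
    if not frame_counter_checks:
        # Relaxed mode: accept whatever counter value arrives.
        return True
    if last_f_cnt_up is None:
        # First uplink of the session: nothing to compare against.
        return True
    # Strict mode: the counter must move forward.
    return f_cnt > last_f_cnt_up


# Illustrative numbers: the session had reached 120 uplinks, then the
# device lost power and restarted counting from 0.
last_seen = 120
for f_cnt in (0, 1, 2, 3):
    strict = accept_uplink(last_seen, f_cnt)
    relaxed = accept_uplink(last_seen, f_cnt, frame_counter_checks=False)
    print(f"f_cnt={f_cnt}: strict={strict} relaxed={relaxed}")
```

With strict checks every uplink after the power loss is dropped until the counter passes the old value again, which would match what I see: the gateway receives the frame, but nothing reaches the application.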

So maybe someone can answer a few questions for me:

  • Is this correct? Is this issue caused by the frame counter?
  • Is there a way to reset the frame counter of a device with the CLI? ( See below )
  • Is there a way to join/rejoin a device with the CLI? I read that this would also reset the frame counter.
  • Is there another solution to this? The thing is, those devices cannot store the frame counter in
    non-volatile memory due to a hardware limitation.

About resetting the frame counter I found some hints in the forums but they didn’t work. Like:
> ttn-lw-cli end-devices set [app-id] [dev-id] --session.last-f-cnt-up=1 --session.dev_addr="[dev-addr]"

Just throws an error:

WARN Finished unary call {"duration": 0.055, "error": "rpc error: code = Internal desc = error:pkg/rpcserver:rpc_recovered (Internal Server Error)", "error_correlation_id": "41c54a19ec4745f8a935132135ae82a5", "error_name": "rpc_recovered", "error_namespace": "pkg/rpcserver", "grpc.method": "Set", "grpc.service": "ttn.lorawan.v3.NsEndDeviceRegistry", "grpc_code": "Internal", "namespace": "grpc", "request_id": "c197d9db-ec03-47ae-a28e-dc2bb2360cb8"} error:pkg/rpcserver:rpc_recovered (Internal Server Error) correlation_id=41c54a19ec4745f8a935132135ae82a5

The devices are connected with OTAA.

This is the data / payload I ( the gateway ) receive when only the gateway receives data:

> {
>   "name": "gs.up.receive",
>   "time": "2021-08-24T14:42:38.278489636Z",
>   "identifiers": [
>     {
>       "gateway_ids": {
>         "gateway_id": "ds-58a0cbffe802852"
>       }
>     },
>     {
>       "gateway_ids": {
>         "gateway_id": "ds-58a0cbffe802852",
>         "eui": "58A0CBFFFE802852"
>       }
>     }
>   ],
>   "data": {
>     "@type": "type.googleapis.com/ttn.lorawan.v3.UplinkMessage",
>     "raw_payload": "QJZLCybAAwABeVTduEjSQM3/xOYBSbSzmQ==",
>     "payload": {
>       "m_hdr": {
>         "m_type": "UNCONFIRMED_UP"
>       },
>       "mic": "SbSzmQ==",
>       "mac_payload": {
>         "f_hdr": {
>           "dev_addr": "260B4B96",
>           "f_ctrl": {
>             "adr": true,
>             "adr_ack_req": true
>           },
>           "f_cnt": 3
>         },
>         "f_port": 1,
>         "frm_payload": "eVTduEjSQM3/xOYB"
>       }
>     },
>     "settings": {
>       "data_rate": {
>         "lora": {
>           "bandwidth": 125000,
>           "spreading_factor": 12
>         }
>       },
>       "coding_rate": "4/5",
>       "frequency": "867100000",
>       "timestamp": 4024851540,
>       "time": "2021-08-24T14:42:38.198283910Z"
>     },
>     "rx_metadata": [
>       {
>         "gateway_ids": {
>           "gateway_id": "ds-58a0cbffe802852",
>           "eui": "58A0CBFFFE802852"
>         },
>         "time": "2021-08-24T14:42:38.198283910Z",
>         "timestamp": 4024851540,
>         "rssi": -94,
>         "channel_rssi": -94,
>         "snr": 7,
>         "uplink_token": "CiAKHgoSZHMtNThhMGNiZmZlODAyODUyEghYoMv//oAoUhDUuJn/DhoMCN6KlIkGEKf0vIQBIKDQ4t6R6QQ="
>       }
>     ],
>     "received_at": "2021-08-24T14:42:38.277821991Z",
>     "correlation_ids": [
>       "gs:conn:01FDVNJS2P8A8KDW5K9Q0C75BZ",
>       "gs:uplink:01FDW9SY2616Y6Q38M9S45TBA2"
>     ]
>   },
>   "correlation_ids": [
>     "gs:conn:01FDVNJS2P8A8KDW5K9Q0C75BZ",
>     "gs:uplink:01FDW9SY2616Y6Q38M9S45TBA2"
>   ],
>   "origin": "ip-10-100-5-213.eu-west-1.compute.internal",
>   "context": {
>     "tenant-id": "CgN0dG4="
>   },
>   "visibility": {
>     "rights": [
>       "RIGHT_GATEWAY_TRAFFIC_READ",
>       "RIGHT_GATEWAY_TRAFFIC_READ"
>     ]
>   },
>   "unique_id": "01FDW9SY26QRJFTNHCBMRQZFE9"
> }
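
To double-check what is actually inside that frame, the `raw_payload` above can be decoded by hand. The following is a minimal Python sketch, assuming the standard LoRaWAN 1.0.x uplink layout (MHDR, DevAddr, FCtrl, FCnt, optional FOpts, FPort, FRMPayload, MIC) and assuming an FPort is present. `parse_uplink` is my own helper name, not part of any TTN tooling, and it does no MIC verification or decryption.

```python
import base64


def parse_uplink(raw_payload_b64: str) -> dict:
    """Split a base64 LoRaWAN uplink into its unencrypted header fields."""
    raw = base64.b64decode(raw_payload_b64)
    mhdr = raw[0]
    f_ctrl = raw[5]
    f_opts_len = f_ctrl & 0x0F           # length of optional MAC commands
    pos = 8 + f_opts_len                 # index of FPort (assumed present)
    return {
        "m_type": mhdr >> 5,             # 2 = unconfirmed data up
        # DevAddr and FCnt are little-endian on the air.
        "dev_addr": raw[1:5][::-1].hex().upper(),
        "adr": bool(f_ctrl & 0x80),
        "adr_ack_req": bool(f_ctrl & 0x40),
        "f_cnt": int.from_bytes(raw[6:8], "little"),
        "f_port": raw[pos],
        # Still encrypted; only the length and bytes are visible here.
        "frm_payload": base64.b64encode(raw[pos + 1:-4]).decode(),
        "mic": base64.b64encode(raw[-4:]).decode(),
    }


frame = parse_uplink("QJZLCybAAwABeVTduEjSQM3/xOYBSbSzmQ==")
for key, value in frame.items():
    print(f"{key}: {value}")
```

The fields come out the same as in the gateway event (`dev_addr` 260B4B96, `f_cnt` 3, `f_port` 1), so the frame itself looks well-formed. The very low `f_cnt` would be consistent with the counter having restarted after a power loss.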

This is the data I ( the application ) receive after I delete the device and add it again from scratch with a new EUI:

> {
>   "name": "as.up.data.forward",
>   "time": "2021-08-24T15:13:39.703536794Z",
>   "identifiers": [
>     {
>       "device_ids": {
>         "device_id": "eui-70b3d57ed00446f5",
>         "application_ids": {
>           "application_id": "loradetest"
>         }
>       }
>     },
>     {
>       "device_ids": {
>         "device_id": "eui-70b3d57ed00446f5",
>         "application_ids": {
>           "application_id": "loradetest"
>         },
>         "dev_eui": "70B3D57ED00446F5",
>         "join_eui": "0000000000000000",
>         "dev_addr": "260B5356"
>       }
>     }
>   ],
>   "data": {
>     "@type": "type.googleapis.com/ttn.lorawan.v3.ApplicationUp",
>     "end_device_ids": {
>       "device_id": "eui-70b3d57ed00446f5",
>       "application_ids": {
>         "application_id": "loradetest"
>       },
>       "dev_eui": "70B3D57ED00446F5",
>       "join_eui": "0000000000000000",
>       "dev_addr": "260B5356"
>     },
>     "correlation_ids": [
>       "as:up:01FDWBJQVBNX282MJW1QATKDXP",
>       "gs:conn:01FDVNJS2P8A8KDW5K9Q0C75BZ",
>       "gs:up:host:01FDVNJS373BFEHYESGN5HM1W5",
>       "gs:uplink:01FDWBJQMVT9BSPNSK258B5R44",
>       "ns:uplink:01FDWBJQMXSC2WHCQGEJ6B1AHV",
>       "rpc:/ttn.lorawan.v3.GsNs/HandleUplink:01FDWBJQMWQ7ZSV075WN3GH2ZZ",
>       "rpc:/ttn.lorawan.v3.NsAs/HandleUplink:01FDWBJQVA8JZ8JR60Q9VTXYPZ"
>     ],
>     "received_at": "2021-08-24T15:13:39.692753587Z",
>     "uplink_message": {
>       "session_key_id": "AXt4uUmyW2hX9qxIZ09gIQ==",
>       "f_port": 1,
>       "frm_payload": "ECoRBBIzAxlUAOcFNVYnBAcACABZAL0=",
>       "decoded_payload": {
>         "command": 1,
>         "reader1": {
>           "id": 0,
>           "value": 4.2
>         },
>         "reader10": {
>           "id": 9,
>           "value": 18.9
>         },
>         "reader2": {
>           "id": 1,
>           "value": 0.4
>         },
>         "reader3": {
>           "id": 2,
>           "value": 5.1
>         },
>         "reader4": {
>           "id": 3,
>           "value": 25
>         },
>         "reader5": {
>           "id": 4,
>           "value": 23.1
>         },
>         "reader6": {
>           "id": 5,
>           "value": 53
>         },
>         "reader7": {
>           "id": 6,
>           "value": 998.8
>         },
>         "reader8": {
>           "id": 7,
>           "value": 0
>         },
>         "reader9": {
>           "id": 8,
>           "value": 0
>         },
>         "sensorcount": 10
>       },
>       "rx_metadata": [
>         {
>           "gateway_ids": {
>             "gateway_id": "ds-58a0cbffe802852",
>             "eui": "58A0CBFFFE802852"
>           },
>           "time": "2021-08-24T15:13:39.416470050Z",
>           "timestamp": 1591138891,
>           "rssi": -84,
>           "channel_rssi": -84,
>           "snr": 9.75,
>           "uplink_token": "CiAKHgoSZHMtNThhMGNiZmZlODAyODUyEghYoMv//oAoUhDLtNv2BRoMCKOZlIkGELLm1uYBIPjpkrqnnwU="
>         }
>       ],
>       "settings": {
>         "data_rate": {
>           "lora": {
>             "bandwidth": 125000,
>             "spreading_factor": 7
>           }
>         },
>         "data_rate_index": 5,
>         "coding_rate": "4/5",
>         "frequency": "868100000",
>         "timestamp": 1591138891,
>         "time": "2021-08-24T15:13:39.416470050Z"
>       },
>       "received_at": "2021-08-24T15:13:39.485111933Z",
>       "consumed_airtime": "0.077056s",
>       "network_ids": {
>         "net_id": "000013",
>         "tenant_id": "ttn",
>         "cluster_id": "ttn-eu1"
>       }
>     }
>   },
>   "correlation_ids": [
>     "as:up:01FDWBJQVBNX282MJW1QATKDXP",
>     "gs:conn:01FDVNJS2P8A8KDW5K9Q0C75BZ",
>     "gs:up:host:01FDVNJS373BFEHYESGN5HM1W5",
>     "gs:uplink:01FDWBJQMVT9BSPNSK258B5R44",
>     "ns:uplink:01FDWBJQMXSC2WHCQGEJ6B1AHV",
>     "rpc:/ttn.lorawan.v3.GsNs/HandleUplink:01FDWBJQMWQ7ZSV075WN3GH2ZZ",
>     "rpc:/ttn.lorawan.v3.NsAs/HandleUplink:01FDWBJQVA8JZ8JR60Q9VTXYPZ"
>   ],
>   "origin": "ip-10-100-15-175.eu-west-1.compute.internal",
>   "context": {
>     "tenant-id": "CgN0dG4="
>   },
>   "visibility": {
>     "rights": [
>       "RIGHT_APPLICATION_TRAFFIC_READ",
>       "RIGHT_APPLICATION_TRAFFIC_READ"
>     ]
>   },
>   "unique_id": "01FDWBJQVQP8TMTPFV9KTSDK91"
> }

Many thanks to everyone who can help with this. Please tell me if I can provide any more info.
Also, sorry for the formatting, I really tried my best.

Rather than worrying about ‘fixing’ count reset issues (which can only ever be a kluge and leave you a bit more exposed from a security p.o.v.), why not set the devices up as proper OTAA? Then if they lose power or reset for any reason, the fcount will clear as part of the rejoin process. That said, regular rejoins are not recommended as a long-term operating mode, as there are only so many DevNonces to draw on before you need to reset the device/application. On V2, it sounds like the ‘creator’ may have cut corners or been a bit spartan in the implementation, in a way that was tolerated by the V2 stack and is likely now getting caught out, as V3 is closer to standard implementations of an NS stack and much less tolerant. Problems can only get worse for you from here, I suspect :wink:
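
To put a rough number on “only so many DevNonces”: the DevNonce field in a join request is 16 bits wide, so there are at most 65536 distinct values. Here is a back-of-the-envelope sketch, assuming every rejoin consumes a fresh value that can never be reused (which is stricter than what some network servers enforce for older LoRaWAN versions). The rejoin rates are hypothetical and only there to show the scale.

```python
DEV_NONCE_BITS = 16  # width of the DevNonce field in a join request


def joins_available() -> int:
    """Number of distinct DevNonce values a device can ever use."""
    return 2 ** DEV_NONCE_BITS


def years_until_exhausted(joins_per_day: float) -> float:
    """Rough lifetime of the DevNonce space at a steady rejoin rate."""
    return joins_available() / joins_per_day / 365


# Hypothetical rejoin rates: once a day, once an hour, once a minute.
for rate in (1, 24, 24 * 60):
    years = years_until_exhausted(rate)
    print(f"{rate} joins/day: about {years:.2f} years")
```

An occasional rejoin after a winter power loss barely dents the budget, but a device that rejoins on every transmission would burn through it quickly.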

There are two key questions:

  • What is the device (vendor & model OR if not commercially available, MCU + radio)?
  • What is the firmware on it (version, source on internet)?

Is this something I can do with the console or CLI, or does this require software/hardware/firmware changes?

That I don’t know. As far as I understood, there is not enough non-volatile memory in the devices for the
frame counter and changing that would require hardware changes. This is planned in the next revision but
as I already have those devices it can’t be changed. Also, I’m not very fond of throwing things away if they’re
still good.

Sorry, I don’t know that. I’m going to ask, but it might take a while to get a reply.

You don’t know what the device is at all? No part numbers, what it measures or senses?

Ok, that was a bit misleading.

The devices are not publicly available yet, so there’s no source on
the internet or firmware version to check.
The “creator”, as I called him, is a friend and business partner who built
the devices from scratch. They were developed to measure data from various
optional sensors, like temperature, humidity and so on. They were initially
developed to use the local network / wifi; the “need” to use LoRa came later.

The devices themselves work great, and his implementation of LoRa worked
with V2 as it wasn’t as strict as V3 seems to be. Now, as mentioned, the
devices lack the ability to store the frame counter over a power loss.
This is no problem for new devices as they will be revised, but there’s
a bunch of devices that can’t be changed anymore.
As these devices are really low in power consumption, they are unlikely
to run into that problem most of the time, but in winter they surely
will, at least every now and then. And it would be great to have an
automated way to “fix” this, as there is already a script to collect
and distribute their data which would recognize when those devices are
out of power.
But unfortunately I can’t tell you anything about the hardware in use
( like the MCU ) as I don’t know about that sort of thing.

Sorry to say, but I heard the “Mission Impossible” theme tune in my head whilst reading “power loss” and “fix” - the fix is to have LoRaWAN compliant firmware & hardware. Power loss is not really a thing for LoRaWAN - sure it happens occasionally but it should be avoided for exactly these reasons.

If you can find out the details from your “friend” we can advise on the feasibility of upgrading the firmware to a more compliant version - for which we will need to know what the microcontroller & radio chips are and what the current firmware is.