Unable to send downlink to V1.1.0 end device

AnachronisticPenguin · July 5, 2023, 11:36am

Summary of the problem

I have built a class A end device running the Zephyr RTOS. It joins OTA and sends an unconfirmed uplink every 10 minutes. I have registered it in my application on The Things Stack Community Edition (EU1) as a LoRaWAN V1.1.0 device using the RP002 1.0.1 regional parameters, as this matches the version of the loramac-node library used by Zephyr.

I am unable to send Downlink messages to the device – the Downlink tab in the TTN console gives me the following error: “Downlinks can only be scheduled for end devices with a valid session. Please make sure your end device is properly connected to the network.”

Can anyone give me some tips on how to troubleshoot this, or does anyone know what might be causing it? As far as I can tell, my device has successfully joined the network and should have a valid session. I think that the problem is not on the end device and is instead related to the Application Session (see below), but I have hit a wall in trying to troubleshoot further.

Troubleshooting thus far

Join procedure

The device appears to join OTA successfully. Examining the live data for the device, I see the expected steps for a V1.1 join procedure:

Join-request message from the end device, with a DevNonce value that counts up from 0.

{
    "name": "ns.up.join.receive",
    "time": "2023-07-05T02:52:14.097832Z",
    "data": {
      "@type": "type.googleapis.com/ttn.lorawan.v3.UplinkMessage",
      "raw_payload": "AAAAAAAAAAAACvQF0H7Vs3ADAFoZlRc=",
      "payload": {
        "m_hdr": {},
        "mic": "WhmVFw==",
        "join_request_payload": {
          "join_eui": "0000000000000000",
          "dev_eui": "70B3D57ED005F40A",
          "dev_nonce": "0003"
        }
      },
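The raw_payload in this event can be cross-checked against the decoded fields by hand. The following is a minimal Python sketch, assuming the standard 23-byte Join-request layout (MHDR, JoinEUI, DevEUI, DevNonce, MIC, with the EUIs and DevNonce sent little-endian). The function name parse_join_request is made up for illustration and is not part of any library.

```python
import base64


def parse_join_request(raw_b64: str) -> dict:
    """Split a Join-request PHYPayload into its fixed-width fields."""
    raw = base64.b64decode(raw_b64)
    if len(raw) != 23:
        raise ValueError(f"expected 23 bytes, got {len(raw)}")
    return {
        "mhdr": raw[0],
        # EUIs are transmitted little-endian; reverse them for display.
        "join_eui": raw[1:9][::-1].hex().upper(),
        "dev_eui": raw[9:17][::-1].hex().upper(),
        "dev_nonce": int.from_bytes(raw[17:19], "little"),
        "mic": raw[19:23].hex().upper(),
    }


fields = parse_join_request("AAAAAAAAAAAACvQF0H7Vs3ADAFoZlRc=")
print(fields)
```

For the payload above this should give the same dev_eui (70B3D57ED005F40A) and dev_nonce (3) that the console shows.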

Join-accept reply from Join server with appropriate JoinNonce value and OptNeg bit set (indicating V1.1).

{
    "name": "ns.down.join.schedule.attempt",
    "time": "2023-07-05T02:52:15.900862406Z",
    "data": {
      "@type": "type.googleapis.com/ttn.lorawan.v3.DownlinkMessage",
      "raw_payload": "IPMAQjYO2Y/hW4+IhfEW4bzkuSeyQoKjdoXr2UkWFoOx",
      "payload": {
        "m_hdr": {
          "m_type": "JOIN_ACCEPT"
        },
        "join_accept_payload": {
          "net_id": "000013",
          "dev_addr": "260B6B15",
          "dl_settings": {
            "rx2_dr": 8,
            "opt_neg": true
          },
          "rx_delay": 5,
          "cf_list": {

(I have independently decrypted the raw payload to confirm that the JoinNonce value is sensible)

RekeyInd MAC command in FOpts payload of first unconfirmed-data-up message from end device.

{
    "name": "ns.up.data.receive",
    "time": "2023-07-05T02:52:19.625521894Z",
    "data": {
      "@type": "type.googleapis.com/ttn.lorawan.v3.UplinkMessage",
      "raw_payload": "QBVrCyaCAQCyzgIZYFiQqkdTuPg=",
      "payload": {
        "m_hdr": {
          "m_type": "UNCONFIRMED_UP"
        },
        "mic": "R1O4+A==",
        "mac_payload": {
          "f_hdr": {
            "dev_addr": "260B6B15",
            "f_ctrl": {
              "adr": true
            },
            "f_cnt": 1,
            "f_opts": "ss4="
          },
          "f_port": 2,
          "frm_payload": "GWBYkKo=",
          "full_f_cnt": 1
        }

Decrypting the f_opts field, I get the value 0x0b01, which is a RekeyInd MAC command from the end device.
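To check where the encrypted FOpts bytes sit in the frame, the raw_payload can be split by hand. This is a rough Python sketch, assuming the usual data-frame layout (MHDR, DevAddr, FCtrl, FCnt, FOpts, optional FPort and FRMPayload, MIC). parse_data_frame is a hypothetical helper, and it only separates fields; it does not decrypt anything or verify the MIC.

```python
import base64

MTYPES = {
    0b010: "UNCONFIRMED_UP",
    0b011: "UNCONFIRMED_DOWN",
    0b100: "CONFIRMED_UP",
    0b101: "CONFIRMED_DOWN",
}


def parse_data_frame(raw_b64: str) -> dict:
    """Split a LoRaWAN data frame into header fields, FOpts, payload and MIC."""
    raw = base64.b64decode(raw_b64)
    mhdr, f_ctrl = raw[0], raw[5]
    f_opts_len = f_ctrl & 0x0F          # low nibble of FCtrl
    body_start = 8 + f_opts_len         # first byte after FOpts
    body = raw[body_start:-4]           # FPort + FRMPayload (may be empty)
    return {
        "m_type": MTYPES.get(mhdr >> 5, "OTHER"),
        "dev_addr": raw[1:5][::-1].hex().upper(),  # little-endian on air
        "adr": bool(f_ctrl & 0x80),
        "f_cnt": int.from_bytes(raw[6:8], "little"),
        "f_opts": raw[8:body_start].hex(),         # still encrypted in 1.1
        "f_port": body[0] if body else None,
        "frm_payload": body[1:].hex(),             # still encrypted
        "mic": raw[-4:].hex(),
    }


up = parse_data_frame("QBVrCyaCAQCyzgIZYFiQqkdTuPg=")
print(up)
```

The two f_opts bytes it returns (b2ce) are the ciphertext that decrypts to 0x0b01.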

RekeyConf MAC reply from network server.

  {
    "name": "ns.down.data.schedule.attempt",
    "time": "2023-07-05T02:52:20.037002128Z",
    "data": {
      "@type": "type.googleapis.com/ttn.lorawan.v3.DownlinkMessage",
      "raw_payload": "YBVrCyaHAAByps9P0VlgsF+LIg==",
      "payload": {
        "m_hdr": {
          "m_type": "UNCONFIRMED_DOWN"
        },
        "mic": "sF+LIg==",
        "mac_payload": {
          "f_hdr": {
            "dev_addr": "260B6B15",
            "f_ctrl": {
              "adr": true
            },
            "f_opts": "cqbPT9FZYA=="
          }
        }
      },

Decrypting the f_opts field, I get the value 0x0b01033100ff01, which has multiple MAC commands. 0b01 completes the key update (RekeyConf), and 033100ff01 updates the end device power and channel mask (LinkADRReq).
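Splitting a decrypted FOpts string into individual commands can be sketched in Python as below. The length table covers only a handful of network-originated commands and is filled in from my reading of the LoRaWAN 1.1 MAC command definitions, so treat it as an assumption to verify. split_mac_commands is an illustrative name, not an existing API.

```python
# CID -> (name, payload length in bytes) for a few downlink MAC commands.
# Assumed from the LoRaWAN 1.1 specification; extend as needed.
DOWNLINK_MAC_COMMANDS = {
    0x02: ("LinkCheckAns", 2),
    0x03: ("LinkADRReq", 4),
    0x05: ("RXParamSetupReq", 4),
    0x06: ("DevStatusReq", 0),
    0x0B: ("RekeyConf", 1),
}


def split_mac_commands(f_opts: bytes) -> list:
    """Walk a decrypted downlink FOpts field and return (name, payload hex) pairs."""
    commands, pos = [], 0
    while pos < len(f_opts):
        cid = f_opts[pos]
        if cid not in DOWNLINK_MAC_COMMANDS:
            raise ValueError(f"unknown CID 0x{cid:02x} at offset {pos}")
        name, length = DOWNLINK_MAC_COMMANDS[cid]
        payload = f_opts[pos + 1 : pos + 1 + length]
        if len(payload) != length:
            raise ValueError(f"truncated {name} at offset {pos}")
        commands.append((name, payload.hex()))
        pos += 1 + length
    return commands


commands = split_mac_commands(bytes.fromhex("0b01033100ff01"))
print(commands)
```

Run on the decrypted value above, this yields a RekeyConf with payload 01 followed by a LinkADRReq with payload 3100ff01.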

Because the join procedure completes, the Join-accept message and the FOpts and FRMPayload fields of both uplinks and downlinks are being decrypted correctly by the end device and the server, so both sides must share the same NwkSEncKey and AppSKey.

Downlinks

The Network Server is successfully downlinking MAC commands to the end device to confirm rekeying (RekeyConf, 0x0B) (see Join procedure above), but also to adjust its data rate (LinkADRReq, 0x03) and to request device status (DevStatusReq, 0x06). This series of successful MAC-level exchanges between the end device and the Network Server implies that the Network Session context is fine.

My conclusion is that the problem lies with the other part of a LoRaWAN session: the Application Session context. This context consists of AppSKey, FCntUp, and AFCntDown. The fact that I can successfully decrypt FRMPayload means that the end device and the Application Server share the same state for AppSKey and FCntUp (these values are used in the encryption of FRMPayload). I can’t figure out how to examine AFCntDown.
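To make the dependence on the frame counter concrete: as I understand the FRMPayload encryption scheme in the LoRaWAN specification, each 16-byte keystream block is produced by AES-encrypting an input block A_i that embeds the direction, DevAddr and 32-bit frame counter. The Python sketch below builds only that input block (the standard library has no AES), and build_a_block is a hypothetical helper name.

```python
def build_a_block(dev_addr: int, f_cnt: int, uplink: bool, i: int) -> bytes:
    """Build the 16-byte A_i input block for FRMPayload encryption.

    Assumed layout: 0x01 | 4 x 0x00 | Dir | DevAddr (LE) | FCnt (LE) | 0x00 | i
    where i is the 1-based index of the 16-byte keystream block.
    """
    return (
        b"\x01"
        + b"\x00" * 4
        + bytes([0 if uplink else 1])       # Dir: 0 = uplink, 1 = downlink
        + dev_addr.to_bytes(4, "little")
        + f_cnt.to_bytes(4, "little")       # FCntUp, or the downlink counter
        + b"\x00"
        + bytes([i])
    )


block = build_a_block(dev_addr=0x260B6B15, f_cnt=1, uplink=True, i=1)
print(block.hex())
```

If the two sides disagreed on the counter, this block (and therefore the keystream) would differ and the payload would decrypt to garbage, which is why a readable FRMPayload implies a matching FCntUp.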

Is this the source of my problem? Or is there somewhere else I should be looking?

descartes · July 5, 2023, 1:56pm

Not sure this is correct (for very high values of not sure) - it looks more like v1.0.4 would be the supported version, based on what the README says and on the release year matching.

v1.1 is a whole new bag of extra spec that 90%+ of LoRaWAN devs haven’t looked at yet, as it’s rather blue-sky thinking. Getting some users off 1.0.2 and on to 1.0.3 is WIP. LMIC is only tested against 1.0.3. Depending on the day of the week, I alternate between 3 or 4 of LMn (LoRaMac-node) using the ST Cube CodeSoup.

cprovidenti · July 5, 2023, 4:23pm

Try refreshing that page (i.e., “the Downlink tab in the TTN console”) after the device joins. That usually works for me.

AnachronisticPenguin · July 6, 2023, 12:41am

Thank you, that worked! You have no idea how many hours I spent learning the intricacies of the LoRaWAN V1.1.0 standard trying to debug this, and it turned out to have nothing at all to do with that.

On the plus side, I learned a lot that I’m sure will be useful. I derived the attached diagram showing the OTAA join procedure from the LoRaWAN 1.1 specification – hopefully someone else will find it useful.
Join_procedure_1.1-Page-1.pdf (208.0 KB)

AnachronisticPenguin · July 6, 2023, 12:49am

The loramac-node library supports both 1.0.4 and 1.1.0.

I know that most people are still implementing 1.0.3, or even 1.0.2, but for Class A devices the only changes of real significance for the end device developer in going to 1.1 are the need to provision each device with an extra encryption key and the need to track DevNonce. Maybe I’m missing something though. LoRaWAN gateways should be able to handle V1.1 Class A devices without any firmware upgrades – from the gateway’s perspective, the structure and PHY encoding of payloads are unchanged. My gateway claims to only support up to 1.0.3, but has handled all my experiments with 1.1 devices without objection. So as long as your network server / join server supports V1.1 (as The Things Stack does), why not use it for new devices?
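As a rough illustration of what tracking DevNonce involves, here is a small Python sketch of the server-side check, assuming the rule that a Join-request is only accepted when its DevNonce is higher than the last one accepted for that device. DevNonceTracker is a made-up class; a real join server would persist this state, and the device must likewise keep its counter in non-volatile storage so it survives a reboot.

```python
class DevNonceTracker:
    """Accept a Join-request only if its DevNonce exceeds the last accepted one."""

    def __init__(self):
        self._last = {}  # DevEUI -> last accepted DevNonce (in memory only)

    def accept(self, dev_eui: str, dev_nonce: int) -> bool:
        last = self._last.get(dev_eui)
        if last is not None and dev_nonce <= last:
            return False  # replayed or stale DevNonce
        self._last[dev_eui] = dev_nonce
        return True


tracker = DevNonceTracker()
results = [tracker.accept("70B3D57ED005F40A", n) for n in (0, 1, 1, 3, 2)]
print(results)
```

A device that resets its counter to 0 after a power cycle would be rejected by a check like this until its DevNonce passes the last accepted value again.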

descartes · July 6, 2023, 8:55am

The repo does not say it was released against 1.0.4/1.1.1 - and in fact it says there are one active branches (sic) and goes on to list three, so YMMV.

The issue here is the body of collective knowledge - if you want to go Chris Columbus and explore new territories, that’s great, but it will leave you figuring some stuff out alone.

Plus 1.0.4 requires us to track the DevNonce - there are lots of little subtle changes - so navigation can need more than no map and just a sextant.

AnachronisticPenguin · July 6, 2023, 10:53am

Sorry, I should have been less succinct. I agree: the ReadMe for the LoRaMac-node repository is contradictory and confusing. Examining the code, it appears to correctly implement V1.1 if the appropriate compile-time flag is set and defaults to 1.0.4 otherwise, see e.g.:

github.com

zephyrproject-rtos/loramac-node/blob/zephyr/src/mac/LoRaMacCrypto.c#L178


      
            {
                buffer[bufferIndex + i] = buffer[bufferIndex + i] ^ sBlock[i];
            }
            size -= 16;
            bufferIndex += 16;
        }

        return LORAMAC_CRYPTO_SUCCESS;
    }

    #if( USE_LRWAN_1_1_X_CRYPTO == 1 )
    /*
     * Encrypts the FOpts
     *
     * \param[IN]  address          - Address
     * \param[IN]  dir              - Frame direction ( Uplink or Downlink )
     * \param[IN]  fCntID           - Frame counter identifier
     * \param[IN]  frameCounter     - Frame counter
     * \param[IN]  size             - Size of data
     * \param[IN/OUT]  buffer       - Data buffer
     * \retval                      - Status of the operation

The commit version that is checked out by Zephyr’s west tool has USE_LRWAN_1_1_X_CRYPTO set to 1 in LoRaMacCrypto.h, so the LoRaWAN subsystem on Zephyr defaults to V1.1.

So far, thanks to the help I’ve received here, V1.1 appears to be working correctly with The Things Network. The only hiccup I encountered was this browser window refresh bug, which isn’t due to any flaw on the device-side implementation of V1.1, but appears to be a Things Stack bug.

So I’ll continue on this voyage and warn of any pitfalls I discover.
