Are confirmed uplinks indeed broken?

For quite some time the consensus on the forum seems to be that the design of network-level LoRaWAN confirmed uplinks is broken, as the network would not send an ACK if it detects a retry. So, if the first acknowledgement for a confirmed uplink is somehow not received by the device, and the network would not send an ACK for subsequent retries, then the device will transmit the very same confirmed uplink again and again, hopefully giving up at some point.

However, some basic tests seem to show all works fine for me. Using confirmed uplinks on SF9 with an LMIC ABP node for EU868, non-ADR, while erroneously not setting LMIC.dn2Dr = DR_SF9, nicely forces LMIC to retry the uplink as it will never receive the ACK in RX2 then. (TTN is likely to use RX2 for uplinks that use SF9 or worse, and indeed does in the test below.) But MQTT shows that TTN is also nicely sending the ACKs for each of the repeated uplinks.

Am I missing something? Does it matter that I am not using ADR in this test? Maybe also setting an application-level downlink changes things? Is the LoRaWAN specification wrong, but does TTN V2 implement it better than specified? Could things be different for other regions on TTN? Should I not blindly trust the MQTT messages but get an SDR receiver running to ensure the gateway is indeed transmitting the downlink?

The JSON payloads of the MQTT messages, and some more details, follow.

1st try, 868.5 MHz, payload CQ==, uplink counter 1

This does not include is_retry in the MQTT payload:

{
  "app_id": "arjanvanb-app-testing",
  "dev_id": "arjanvanb-heltec-dr-testing",
  "hardware_serial": "008F5B4F83EDA082",
  "port": 9,
  "counter": 1,
  "confirmed": true,
  "payload_raw": "CQ==",
  "metadata": {
    "time": "2020-10-21T15:24:24.571157853Z",
    "frequency": 868.5,
    "modulation": "LORA",
    "data_rate": "SF9BW125",
    "airtime": 164864000,
    "coding_rate": "4/5",
    "gateways": [
      {
        "gtw_id": "arjanvanb-gw-1",
        "timestamp": 3231208652,
        "time": "2020-10-21T15:24:24Z",
        "channel": 0,
        "rssi": -93,
        "snr": 8.5,
        "rf_chain": 0
      }
    ]
  }
}

1st ACK, RX2, 869.525 MHz, port 0, downlink counter 0

The downlink counter being zero, it’s somehow suppressed in the JSON payload of the MQTT API below. However, decoding the payload shows FCnt = 0 and FCtrl has its ACK bit set.

{
  "payload": "YBcTASYgAAD1oPvr",
  "message": {
    "app_id": "arjanvanb-app-testing",
    "dev_id": "arjanvanb-heltec-dr-testing",
    "port": 0
  },
  "gateway_id": "arjanvanb-gw-1",
  "config": {
    "modulation": "LORA",
    "data_rate": "SF9BW125",
    "airtime": 144384000,
    "frequency": 869525000,
    "power": 27
  }
}
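For anyone who wants to check that decoding themselves, here is a minimal Go sketch that pulls the header fields out of the Base64 PHYPayload. The struct and function names are made up for illustration. It assumes a plain LoRaWAN 1.0.x data downlink (MHDR, 4-byte DevAddr, FCtrl, 2-byte FCnt, multi-byte fields little-endian) and does not verify the MIC.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// downlinkHeader holds the few header fields of interest here.
// Hypothetical type, not part of any TTN or LMIC API.
type downlinkHeader struct {
	MType   uint8  // 3 = unconfirmed data down, 5 = confirmed data down
	DevAddr uint32 // device address
	ACK     bool   // FCtrl bit 5
	FCnt    uint16 // lower 16 bits of the downlink frame counter
}

// decodeHeader decodes a Base64 PHYPayload and extracts MType, DevAddr,
// the ACK bit and FCnt. The MIC is not checked.
func decodeHeader(b64 string) (downlinkHeader, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return downlinkHeader{}, err
	}
	// MHDR (1) + DevAddr (4) + FCtrl (1) + FCnt (2) + MIC (4) = 12 bytes minimum.
	if len(raw) < 12 {
		return downlinkHeader{}, errors.New("PHYPayload too short")
	}
	return downlinkHeader{
		MType:   raw[0] >> 5,
		DevAddr: binary.LittleEndian.Uint32(raw[1:5]),
		ACK:     raw[5]&0x20 != 0,
		FCnt:    binary.LittleEndian.Uint16(raw[6:8]),
	}, nil
}

func main() {
	// The three ACK payloads from the MQTT messages in this post.
	for _, p := range []string{"YBcTASYgAAD1oPvr", "YBcTASYgAQAY8AQi", "YBcTASYgAgDpAYjM"} {
		h, err := decodeHeader(p)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("mtype=%d devaddr=%08X ack=%t fcnt=%d\n", h.MType, h.DevAddr, h.ACK, h.FCnt)
	}
}
```

For the three ACK payloads in this post it reports FCnt 0, 1 and 2, each an unconfirmed data down (MType 3) with the ACK bit set.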

2nd try, 867.1 MHz, same payload, same uplink counter 1

Below, is_retry is now included and set to true:

{
  "app_id": "arjanvanb-app-testing",
  "dev_id": "arjanvanb-heltec-dr-testing",
  "hardware_serial": "008F5B4F83EDA082",
  "port": 9,
  "counter": 1,
  "confirmed": true,
  "is_retry": true,
  "payload_raw": "CQ==",
  "metadata": {
    "time": "2020-10-21T15:24:41.095941199Z",
    "frequency": 867.1,
    "modulation": "LORA",
    "data_rate": "SF9BW125",
    "airtime": 164864000,
    "coding_rate": "4/5",
    "gateways": [
      {
        "gtw_id": "arjanvanb-gw-1",
        "timestamp": 3247684956,
        "time": "2020-10-21T15:24:40Z",
        "channel": 0,
        "rssi": -83,
        "snr": 11.75,
        "rf_chain": 0
      }
    ]
  }
}
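On the application side, a consumer of these MQTT messages could use that field to spot retries, for example to avoid storing the same reading twice. A rough Go sketch, assuming only the JSON fields visible in the payloads above; the `uplink` struct, `parseUplink` and the trimmed sample messages are illustrative, not a TTN SDK API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uplink mirrors a subset of the uplink JSON shown in this post.
type uplink struct {
	DevID      string `json:"dev_id"`
	Counter    uint32 `json:"counter"`
	Confirmed  bool   `json:"confirmed"`
	IsRetry    bool   `json:"is_retry"`    // absent on the first try, so it defaults to false
	PayloadRaw []byte `json:"payload_raw"` // encoding/json decodes the Base64 string
}

// Trimmed copies of the first and second try from this post.
const firstTry = `{"dev_id":"arjanvanb-heltec-dr-testing","port":9,"counter":1,"confirmed":true,"payload_raw":"CQ=="}`
const secondTry = `{"dev_id":"arjanvanb-heltec-dr-testing","port":9,"counter":1,"confirmed":true,"is_retry":true,"payload_raw":"CQ=="}`

// parseUplink decodes one MQTT uplink message.
func parseUplink(data []byte) (uplink, error) {
	var u uplink
	err := json.Unmarshal(data, &u)
	return u, err
}

func main() {
	for _, msg := range []string{firstTry, secondTry} {
		u, err := parseUplink([]byte(msg))
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("counter=%d confirmed=%t retry=%t payload=%x\n",
			u.Counter, u.Confirmed, u.IsRetry, u.PayloadRaw)
	}
}
```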
2nd ACK, RX2, 869.525 MHz, port 0, downlink counter 1

{
  "payload": "YBcTASYgAQAY8AQi",
  "message": {
    "app_id": "arjanvanb-app-testing",
    "dev_id": "arjanvanb-heltec-dr-testing",
    "port": 0
  },
  "gateway_id": "arjanvanb-gw-1",
  "config": {
    "modulation": "LORA",
    "data_rate": "SF9BW125",
    "airtime": 144384000,
    "counter": 1,
    "frequency": 869525000,
    "power": 27
  }
}

3rd try, 867.3 MHz, same payload, same uplink counter 1

{
  "app_id": "arjanvanb-app-testing",
  "dev_id": "arjanvanb-heltec-dr-testing",
  "hardware_serial": "008F5B4F83EDA082",
  "port": 9,
  "counter": 1,
  "confirmed": true,
  "is_retry": true,
  "payload_raw": "CQ==",
  "metadata": {
    "time": "2020-10-21T15:24:57.716178026Z",
    "frequency": 867.3,
    "modulation": "LORA",
    "data_rate": "SF10BW125",
    "airtime": 288768000,
    "coding_rate": "4/5",
    "gateways": [
      {
        "gtw_id": "arjanvanb-gw-1",
        "timestamp": 3264285164,
        "time": "2020-10-21T15:24:57Z",
        "channel": 0,
        "rssi": -74,
        "snr": 9.75,
        "rf_chain": 0
      }
    ]
  }
}

3rd ACK, RX2, 869.525 MHz, port 0, downlink counter 2

{
  "payload": "YBcTASYgAgDpAYjM",
  "message": {
    "app_id": "arjanvanb-app-testing",
    "dev_id": "arjanvanb-heltec-dr-testing",
    "port": 0
  },
  "gateway_id": "arjanvanb-gw-1",
  "config": {
    "modulation": "LORA",
    "data_rate": "SF9BW125",
    "airtime": 144384000,
    "counter": 2,
    "frequency": 869525000,
    "power": 27
  }
}

@cslorabox, help? :flushed:


Are frame counter checks by any chance disabled in the registration of this device?


This issue #272 would appear to be why and where TTN started acking retries (concerns or confusion about the switch statement fallthrough in a later message…)

And in an amazing coincidence, issue #272 of a different network server also makes an argument for why they are not being ack’d there:

To be clear, my initial awareness and subsequent airing of the issue came from experiments against an old version of what is now Chirpstack, not against TTN, and it seems that the behavior may differ between the two because of different interpretations of the spec.


They were. :blush: But: enabling makes no difference. :sweat_smile:

Aside, the repeated test shows that MCCI LMIC increases the SF up to SF12, trying twice for each SF, and then gives up to continue with its next uplink at SF9.

TTN Console gateway traffic for MCCI LMIC

At the moment I’m puzzled by this code:

switch {
case macPayload.FHDR.FCnt > device.FCntUp && macPayload.FHDR.FCnt-device.FCntUp <= maxFCntGap:
	// FCnt higher than latest and within max FCnt gap (normal case)
case device.DisableFCntCheck:
	// FCnt Check disabled. Rely on MIC check only
case device.FCntUp == 0:
	// FCntUp is reset. We don't know where the device will start sending.
case macPayload.FHDR.FCnt == device.FCntUp:
	if phyPayload.MHDR.MType == lorawan.ConfirmedDataUp {
		// Retry of confirmed uplink
		break
	}
	fallthrough
case macPayload.FHDR.FCnt <= device.FCntUp:
	return errors.NewErrInvalidArgument("FCnt", "not high enough")
case macPayload.FHDR.FCnt-device.FCntUp > maxFCntGap:
	return errors.NewErrInvalidArgument("FCnt", "too high")
default:
	return errors.NewErrInternal("FCnt check failed")
}

I don’t even see how that duplicate check is specific to the case of a duplicate, since the if(confirmed) would seemingly fire for any uplink that gets through the FCnt check, such as a normal increment or a device for which the checks are waived.

Maybe I should make some coffee.
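To see which branch actually fires, the switch can be lifted into a stand-alone Go function and fed a few counter combinations. This is only a sketch of the snippet above, not the TTN code itself: `checkFCnt`, its parameters and the error values are made up, and `maxFCntGap` is assumed to be 16384 (the MAX_FCNT_GAP of LoRaWAN 1.0.x). In a Go switch without an expression the cases are tried top to bottom and only the first true one runs; `break` inside the `if` leaves the whole switch, while `fallthrough` runs the next case body without testing its condition.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed value; the snippet above does not show the constant's definition.
const maxFCntGap = 16384

var (
	errNotHighEnough = errors.New("FCnt not high enough")
	errTooHigh       = errors.New("FCnt too high")
)

// checkFCnt is a hypothetical stand-alone copy of the switch above.
// It returns nil when the uplink would be accepted.
func checkFCnt(fCnt, fCntUp uint32, confirmed, disableFCntCheck bool) error {
	switch {
	case fCnt > fCntUp && fCnt-fCntUp <= maxFCntGap:
		// Normal case: higher than the last counter and within the gap.
	case disableFCntCheck:
		// Checks disabled: accept anything.
	case fCntUp == 0:
		// Counter was reset: accept anything.
	case fCnt == fCntUp:
		if confirmed {
			// Retry of a confirmed uplink: leave the switch, accept.
			break
		}
		fallthrough // unconfirmed duplicate: reject like an old frame
	case fCnt <= fCntUp:
		return errNotHighEnough
	case fCnt-fCntUp > maxFCntGap:
		return errTooHigh
	default:
		return errors.New("FCnt check failed")
	}
	return nil
}

func main() {
	fmt.Println("confirmed retry, FCnt 1 after 1:   ", checkFCnt(1, 1, true, false))
	fmt.Println("unconfirmed duplicate, 1 after 1:  ", checkFCnt(1, 1, false, false))
	fmt.Println("confirmed but older, 1 after 5:    ", checkFCnt(1, 5, true, false))
	fmt.Println("older with checks disabled:        ", checkFCnt(1, 5, false, true))
}
```

Under these assumptions the retry exemption only applies when the counter equals the last one seen: a confirmed uplink with an older counter is still rejected, and a normal increment is accepted by the first case regardless of whether it is confirmed.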

Orne Brocaar at the Chirpstack (ex LoRaServer) project seems to feel pretty strongly, as reflected in the issue for his code, that a repeated uplink is not supposed to be ack’d in LoRaWAN 1.0.2, because this could lead to a denial of service of gateway downlink (and by extension, uplink) capacity. His code implements things accordingly, and in practical terms that’s what I was testing against when I first became aware of the issue. (Most of my own current efforts are now against neither that nor TTN, but a custom solution that isn’t even really LoRaWAN anymore.)

A contrasting hint is the diagram in section 18.1 of the LoRaWAN 1.0.2 spec, which does seem to show ACKs being generated for both the original uplink and a retry.

He then goes on to point out that in LoRaWAN 1.1 additional details of the uplink are included in the MIC coverage, which makes a replay attack on a different channel impossible, so the server could use a change of channel as a criterion for a valid uplink. How much people are actually concerned about attackers is an interesting real-world question. And of course TTN does not implement LoRaWAN 1.1…

At the end of the day though, confirmed traffic is hugely expensive in downlink capacity even in the ideal case. And I maintain the belief that in the rare cases where it’s actually justified, it’s better to implement this at application level, where the mechanism can be tuned to a particular need, than at MAC level, where it depends on the details (or reading) of the bluntly generic instrument which is a spec.
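To put a number on that cost, the `airtime` values in the MQTT messages earlier in this thread can be reproduced with the usual Semtech time-on-air formula. A Go sketch under some assumptions not stated in the thread: 8-symbol preamble, explicit header, coding rate 4/5, PHY CRC on for uplinks and off for downlinks, low data rate optimization only for SF11 and SF12 at 125 kHz, a 14-byte PHYPayload for the 1-byte uplink (MHDR 1 + FHDR 7 + FPort 1 + payload 1 + MIC 4) and 12 bytes for the bare ACK. The function name is made up.

```go
package main

import "fmt"

// airtimeNs returns the LoRa time on air in nanoseconds for a PHYPayload
// of phyLen bytes, assuming an 8-symbol preamble, explicit header and
// coding rate 4/5. Integer arithmetic keeps the result exact.
func airtimeNs(phyLen, sf int, bwHz int64, crcOn bool) int64 {
	// Symbol duration: 2^SF / BW, here in nanoseconds.
	tSym := (int64(1) << uint(sf)) * 1000000000 / bwHz

	// Low data rate optimization (assumed for SF11/SF12 at 125 kHz only).
	lowDataRateOpt := 0
	if bwHz == 125000 && sf >= 11 {
		lowDataRateOpt = 1
	}
	crc := 0
	if crcOn {
		crc = 1
	}

	// Payload symbols: 8 + max(ceil(num/den) * (CR+4), 0) with CR = 1.
	num := 8*phyLen - 4*sf + 28 + 16*crc
	den := 4 * (sf - 2*lowDataRateOpt)
	symbols := 8
	if num > 0 {
		symbols += (num + den - 1) / den * 5
	}

	// Preamble: (8 + 4.25) symbols = 49/4 symbols.
	preamble := tSym * 49 / 4
	return preamble + int64(symbols)*tSym
}

func main() {
	fmt.Println("uplink SF9:  ", airtimeNs(14, 9, 125000, true))
	fmt.Println("ACK in RX2:  ", airtimeNs(12, 9, 125000, false))
	fmt.Println("uplink SF10: ", airtimeNs(14, 10, 125000, true))
}
```

With those assumptions the results are 164864000, 144384000 and 288768000 ns, matching the `airtime` fields reported for the SF9 uplinks, the SF9 ACKs and the SF10 retry. So each bare ACK at SF9 occupies the gateway transmitter for roughly 144 ms, almost as long as the uplink it confirms.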


Nice find. And that code is still the same in today’s V2.

The only disadvantage that I can think of is that an application-level downlink may be postponed until the next uplink, so may need an additional uplink if the device really wants to know if its LoRa transmission was received. (At the same time that’s very much an advantage, hopefully making folks think twice before considering any downlink.)

Like for my basic data rate tester, it is nice to ensure the downlink is always sent immediately (and I’ve explicitly disabled retries there). But that’s testing only, of course, not a real life use case. (And it would be better to use LinkCheckReq for that, or I could just add the uplink counter to the application-level downlink if confirmed uplinks were not supported. And most often I do not use downlinks at all but just use 4G to peek into TTN Console instead.)