Confirmed downlink ACK bits being ignored by TTS

I have run into an issue in which confirmed downlinks are being acknowledged by the end device, but TTN keeps transmitting the downlink.

My end device is a Class A LoRaWAN v1.1.0 device on the Zephyr RTOS.

Example

I have a simple program that joins via OTAA and then uplinks the payload “hello” every minute. It prints downlink payloads to console, but does nothing more with them. It is derived from this sample, modified to save and correctly handle DevNonce and to use LoRaWAN v1.1.0.

A couple of minutes after joining, I send the end device an unconfirmed downlink using the TTN Console with the payload 0x000000. It receives the downlink and prints it to the console.

A few minutes later I schedule a confirmed downlink with the payload 0x010203 using the Console; I do this only once. The end device receives the payload, prints it to the screen, and sets the ACK bit in the FCtrl field in its next uplink to confirm that it received the downlink. So far this is exactly as expected.

However, TTN then continues to periodically send the same confirmed downlink message. It appears to be ignoring the ACK bit sent by the end device.

The table below shows the contents of the traffic exchanged between the end device and TTN. In step 13, the server sends a confirmed downlink. In step 14, the end device sets the ACK bit in its next uplink. However, the server continues to attempt to send the downlink (steps 16 and 19) as if it didn’t receive the ACK response.

Step  Time [s]    Type                   ACK  FCntUp  [A,N]FCntDown  FOpt            FRMPayload
1     000.000000  join-request           -    -       -              -               -
2     001.805991  join-accept            -    -       -              -               -
3     005.511201  unconfirmed data up    0    1       -              0b01            68656c6c6f
4     005.921428  unconfirmed data down  0    -       0              0b01033100ff01  -
5     070.772556  unconfirmed data up    0    2       -              0307            68656c6c6f
6     137.055912  unconfirmed data up    0    3       -              -               68656c6c6f
7     203.330603  unconfirmed data up    0    4       -              -               68656c6c6f
8     269.606463  unconfirmed data up    0    5       -              -               68656c6c6f
9     270.013984  unconfirmed data down  0    -       1              -               000000
10    334.785355  unconfirmed data up    0    6       -              -               68656c6c6f
11    401.056721  unconfirmed data up    0    7       -              -               68656c6c6f
12    467.336085  unconfirmed data up    0    8       -              -               68656c6c6f
13    467.745774  confirmed data down    0    -       2              -               010203
14    532.504250  unconfirmed data up    1    9       -              -               68656c6c6f
15    598.787881  unconfirmed data up    0    10      -              -               68656c6c6f
16    599.213361  confirmed data down    0    -       3              -               010203
17    663.947792  unconfirmed data up    1    11      -              -               68656c6c6f
18    730.233779  unconfirmed data up    0    12      -              -               68656c6c6f
19    730.654144  confirmed data down    0    -       4              -               010203
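
For anyone reading along: the Type and ACK columns can be read straight from the unencrypted header of a data frame (the top three bits of MHDR, and bit 5 of FCtrl). Below is a minimal Python sketch of that decoding. The function name `describe_frame` and the frame at the bottom are made up for illustration (hypothetical DevAddr, placeholder ciphertext, zeroed MIC); it is not a frame captured from this trace.

```python
# MType values from the MHDR byte (bits 7..5).
MTYPES = {
    0b000: "join-request",
    0b001: "join-accept",
    0b010: "unconfirmed data up",
    0b011: "unconfirmed data down",
    0b100: "confirmed data up",
    0b101: "confirmed data down",
}


def describe_frame(phy_payload: bytes) -> dict:
    """Decode MType, ACK, FOptsLen and FCnt from a raw frame (no keys needed)."""
    mtype = phy_payload[0] >> 5
    info = {"type": MTYPES.get(mtype, "other")}
    if mtype in (0b010, 0b011, 0b100, 0b101):
        # Data frame layout: MHDR(1) | DevAddr(4) | FCtrl(1) | FCnt(2) | ...
        fctrl = phy_payload[5]
        info["ack"] = (fctrl >> 5) & 1      # ACK is bit 5 of FCtrl
        info["fopts_len"] = fctrl & 0x0F    # low nibble is FOptsLen
        info["fcnt"] = int.from_bytes(phy_payload[6:8], "little")
    return info


# Hypothetical frame: unconfirmed data up, DevAddr 0x01020304, ACK set, FCnt 9,
# FPort 1, five placeholder payload bytes and a zeroed MIC.
frame = bytes.fromhex("40" "04030201" "20" "0900" "01" "0000000000" "00000000")
print(describe_frame(frame))
# {'type': 'unconfirmed data up', 'ack': 1, 'fopts_len': 0, 'fcnt': 9}
```

This only shows that the ACK bit is present in the frame; whether the network server accepts the frame still depends on the MIC being valid.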

In this state (steps 14 onwards), the downlink queue for the device is empty as reported by the TTN CLI tool. The only way to stop the endless downlinking is to run this command:

ttn-lw-cli dev set $APP $DEV --unset mac-state.pending-application-downlink

It seems to me that the cause of this unexpected behavior is on the application or network server side of The Things Stack v3; the end device appears to be complying exactly with the LoRaWAN v1.1.0 specification. I’m at a loss for how to proceed with troubleshooting. Any tips, hints, or useful ideas?

Whilst there is clearly something to investigate here, good practice is never to use confirmed downlinks: the application side (not the LNS) can end up stuck in a similar loop, or local interference can block the ack attempt. Or, as a client discovered, a confirmed downlink carrying a reboot request can reboot the device before it has sent the confirmation, leaving it stuck in a loop until I found the CLI command to clear it all down!

It is preferable to have the device include an ack in the payload of its next uplink, even if that uplink follows the downlink by only a few seconds. I used to use a flag in the first byte, which carries device status, to ack a downlink command; I’m moving to a byte counter that says which downlink was heard (wrapping around at 255 is deeply unlikely to occur).
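
A minimal Python sketch of the byte-counter idea, to make it concrete. The payload layout is an assumption for illustration only (byte 0 of each downlink is the server's counter, byte 0 of each uplink echoes the last counter heard), and the names `DownlinkAckTracker` and `is_acked` are hypothetical.

```python
class DownlinkAckTracker:
    """Device-side bookkeeping for an application-level downlink ack."""

    def __init__(self) -> None:
        self.last_heard = 0  # assumed convention: 0 means "nothing heard yet"

    def on_downlink(self, payload: bytes) -> bytes:
        # Assumed layout: byte 0 is the server's downlink counter,
        # the remaining bytes are the actual command.
        self.last_heard = payload[0]
        return payload[1:]

    def build_uplink(self, readings: bytes) -> bytes:
        # Byte 0 of every uplink echoes the last downlink counter heard.
        return bytes([self.last_heard]) + readings


def is_acked(sent_counter: int, uplink: bytes) -> bool:
    """Server side: the downlink is acked once the device echoes its counter."""
    return uplink[0] == sent_counter


tracker = DownlinkAckTracker()
command = tracker.on_downlink(bytes([7, 0x01, 0x02, 0x03]))
uplink = tracker.build_uplink(b"hello")
print(command.hex(), uplink.hex(), is_acked(7, uplink))
# 010203 0768656c6c6f True
```

Because every uplink repeats the counter, a single lost uplink does not lose the ack, and the server simply stops resending once it sees its own counter come back.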

Confirmed uplinks aren’t a thing either - maybe once every few days for a link check - but if the application is so important that it needs lots of confirmations, it’s perhaps not a good use case for LoRaWAN. For “I want all the dataz” applications I put the previous & current readings in an uplink. Deltas & bit packing help with making this smaller, which allows the potential for more than one prior reading plus current. I also have a replay request option if need be - for when there are gaps in the data that need filling in. There is also a variant on Forward Error Correction that gives future readings, but a flux capacitor is a huge drain on the battery life so rarely deployed.
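
As a rough illustration of the "previous & current readings with deltas" approach, here is a minimal Python sketch. The encoding is an assumption made up for this example (current reading as a little-endian int16, each older reading as a signed one-byte delta from the reading after it); the function names are hypothetical.

```python
import struct


def pack_readings(current: int, previous: list[int]) -> bytes:
    """Pack the current reading plus older readings (newest first) as deltas."""
    out = struct.pack("<h", current)
    newer = current
    for older in previous:
        delta = older - newer
        if not -128 <= delta <= 127:
            raise ValueError("delta does not fit in one signed byte")
        out += struct.pack("<b", delta)
        newer = older
    return out


def unpack_readings(payload: bytes) -> list[int]:
    """Inverse of pack_readings: returns readings newest first."""
    (current,) = struct.unpack_from("<h", payload, 0)
    deltas = struct.unpack(f"<{len(payload) - 2}b", payload[2:])
    readings = [current]
    for delta in deltas:
        readings.append(readings[-1] + delta)
    return readings


payload = pack_readings(2150, [2143, 2139])
print(payload.hex(), unpack_readings(payload))
# 6608f9fc [2150, 2143, 2139]
```

Three readings fit in four bytes here instead of six, so a lost uplink can be recovered from the next one at the cost of one byte per prior reading.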

I’ve no quick access to v1.1 firmware, but I have a test rig on v1.0.4 on which I can try to replicate this.

In the meantime, the big brains of @adrianmares can probably explain all of this in one sentence or less!

I see that Zephyr operates with a fork of LoRaMac-node, but their fork is a bit weird: they do not seem to be on the latest release of LoRaMac-node (per this; I read that message as saying they are not using 4.7.0, but 4.6.0 with some cherry-picked fixes).

Versions before 4.7.0 have a bug in how they handle the MIC of the uplink that confirms a received downlink (LoRaWAN 1.1 uses a different MIC calculation for uplinks and downlinks that confirm a message). You can find the original issue here.
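
To illustrate where the MIC of an acknowledging uplink differs, here is a minimal Python sketch of the 16-byte blocks that LoRaWAN 1.1 prepends to an uplink before running AES-CMAC, as I read the specification. It only builds the blocks; the CMAC itself is left out. The helper name `mic_block` and the DevAddr, data rate and channel values are assumptions for illustration, while the counters are taken from steps 13 and 14 of the trace above.

```python
import struct


def mic_block(dev_addr: int, fcnt_up: int, msg_len: int,
              conf_fcnt: int = 0, tx_dr: int = 0, tx_ch: int = 0) -> bytes:
    """Build the block prepended to an uplink before AES-CMAC.

    With the defaults this is B0 (four zero bytes after 0x49).
    With conf_fcnt, tx_dr and tx_ch filled in it is B1.
    Layout: 0x49 | ConfFCnt(2) | TxDr(1) | TxCh(1) | Dir=0x00 |
            DevAddr(4) | FCntUp(4) | 0x00 | len(msg), all little-endian.
    """
    return struct.pack("<BHBBBIIBB", 0x49, conf_fcnt & 0xFFFF, tx_dr, tx_ch,
                       0x00, dev_addr, fcnt_up, 0x00, msg_len)


# Step 14: uplink FCntUp 9 has the ACK bit set and acknowledges the confirmed
# downlink with FCntDown 2, so ConfFCnt is 2. For an uplink without the ACK
# bit, ConfFCnt is 0. DevAddr, TxDr and TxCh below are invented values.
msg_len = 1 + 7 + 1 + 5  # MHDR + FHDR (no FOpts) + FPort + "hello"
b1_ack = mic_block(0x01020304, 9, msg_len, conf_fcnt=2, tx_dr=5, tx_ch=0)
b1_wrong = mic_block(0x01020304, 9, msg_len, conf_fcnt=0, tx_dr=5, tx_ch=0)
print(b1_ack.hex())
# 4902000500000403020109000000000e
print(b1_ack == b1_wrong)
# False
```

In 1.1 the uplink MIC combines two bytes of the CMAC over B1 | msg with two bytes of the CMAC over B0 | msg, so a device and a server that disagree on ConfFCnt will compute different MICs for the acknowledging uplink.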

If upgrading their fork to the latest release is too hard, it may be easier for you to just incorporate the fix for that problem. But for the record, there are many other fixes in the 4.7.0 release.

“Appears” being the operative word. LMn is just too complicated to keep up with the latest version, particularly when the code base has passed through an open source collective. Even the ST version, which is thoroughly processed to fit ST’s CubeMX world, can result in some oddities, so I end up one version behind on CubeMX, which is generally 6 to 9 months behind the LMn release. I am looking at ways of comparing the latest MX against the prior MX and the latest official release, but it is a totally unaffordable time sink.

I’ll still try confirmed downlinks on my v1.0.4 setup if you care to do the same and we can compare notes!

Thanks, this is great advice. I think I’ll do something similar to confirm uplinks and downlinks without using the MAC layer.

Will try to get 1.0.4 working on Zephyr to test! Not entirely confident I can bend it to my will, but I’ll try.

Thanks for the links! I’ll try incorporating that fix and see if it resolves the problem.

There is no need to change your firmware to operate with version 1.0.4.

You just need to tell the TTN server that the end-device is a 1.0.4 device instead of a 1.1.0 device.

After this change, the TTN server will set the OptNeg bit of the JoinAccept frame to 0, which indicates to the end-device that the network server is a 1.0.x server.

When the network server operates with a version greater than 1.0.4, it sets the OptNeg bit to 1, which tells the end-device to operate as a 1.1.0 device.
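
For reference, OptNeg is the top bit of the one-byte DLSettings field in the join-accept, alongside the RX1 data rate offset and the RX2 data rate. A minimal Python sketch of splitting that byte is below; the function name `parse_dl_settings` and the example values are made up, and the byte has to be taken from the decrypted join-accept.

```python
def parse_dl_settings(dl_settings: int) -> dict:
    """Split the DLSettings byte of a (decrypted) join-accept into its fields."""
    return {
        "opt_neg": (dl_settings >> 7) & 1,          # 1: server speaks 1.1, 0: 1.0.x
        "rx1_dr_offset": (dl_settings >> 4) & 0x7,  # bits 6..4
        "rx2_data_rate": dl_settings & 0x0F,        # bits 3..0
    }


# Hypothetical values: the same RX settings as seen from a 1.1 and a 1.0.x server.
print(parse_dl_settings(0x83))
# {'opt_neg': 1, 'rx1_dr_offset': 0, 'rx2_data_rate': 3}
print(parse_dl_settings(0x03))
# {'opt_neg': 0, 'rx1_dr_offset': 0, 'rx2_data_rate': 3}
```

Logging this bit on the device after the join is a quick way to check which protocol version the session was negotiated with.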

I would also recommend that you open an issue on the Zephyr GitHub project asking them to update LoRaMac-node to version 4.7.0.

I created a 1.0.4 device in TTN and modified my end-device code slightly (assigned the APP_KEY to NWK_KEY when configuring the join structure). This version behaves correctly for both unconfirmed and confirmed downlinks – confirmed downlinks are ACKed and the ACK is accepted by the server. This confirms that the issue only affects v1.1.0 devices.

I then applied the patch suggested by @adrianmares to the LoRaMac-node in my local copy of Zephyr and built a v1.1.0 end device and… it worked! ACKs from confirmed downlinks are accepted by the TTS server.

Thank you all for your help. I will open an issue at the Zephyr project as suggested, asking them to upgrade LoRaMac-node to 4.7.0.
