Confirmed downlink ACK bits being ignored by TTS

AnachronisticPenguin · July 13, 2023, 3:30am

I have run into an issue in which confirmed downlinks are being acknowledged by the end device, but TTN keeps transmitting the downlink.

My end device is a Class A LoRaWAN v1.1.0 device on the Zephyr RTOS.

Example

I have a simple program that joins via OTAA and then uplinks the payload “hello” every minute. It prints downlink payloads to console, but does nothing more with them. It is derived from this sample, modified to save and correctly handle DevNonce and to use LoRaWAN v1.1.0.

A couple of minutes after joining, I send the end device an unconfirmed downlink using the TTN Console with the payload 0x000000. It receives the downlink and prints it to the console.

A few minutes later I schedule a confirmed downlink with the payload 0x010203 using the Console; I do this only once. The end device receives the payload, prints it to the screen, and sets the ACK bit in the FCtrl field in its next uplink to confirm that it received the downlink. So far this is exactly as expected.
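
For readers following along, here is a minimal sketch of where that ACK bit sits in a data uplink. It assumes the standard frame layout (MHDR, DevAddr, FCtrl, FCnt, then options, port, payload, and MIC); the function name and the example frame are hypothetical and not taken from my firmware.

```python
def parse_uplink_header(phy_payload: bytes) -> dict:
    """Decode the unencrypted header fields of a LoRaWAN data uplink."""
    # MHDR (1) + DevAddr (4) + FCtrl (1) + FCnt (2) + MIC (4) is the minimum.
    if len(phy_payload) < 12:
        raise ValueError("frame too short for a data uplink")
    mhdr = phy_payload[0]
    fctrl = phy_payload[5]
    return {
        "mtype": mhdr >> 5,  # 2 = unconfirmed data up, 4 = confirmed data up
        "dev_addr": int.from_bytes(phy_payload[1:5], "little"),
        "adr": bool(fctrl & 0x80),
        "ack": bool(fctrl & 0x20),  # acknowledges the last confirmed downlink
        "fopts_len": fctrl & 0x0F,
        "fcnt": int.from_bytes(phy_payload[6:8], "little"),
    }


# Hypothetical frame resembling step 14: DevAddr 0x01020304, ACK set, FCnt 9.
# The payload and MIC bytes are placeholders, not real ciphertext.
example = (
    bytes([0x40]) + (0x01020304).to_bytes(4, "little") + bytes([0x20])
    + (9).to_bytes(2, "little") + bytes([0x01]) + bytes(5) + bytes(4)
)
header = parse_uplink_header(example)
```

Note that FCtrl is outside the encrypted FRMPayload, so the ACK bit is visible in a raw gateway capture without any keys.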

However, TTN then continues to periodically send the same confirmed downlink message. It appears to be ignoring the ACK bit sent by the end device.

The table below shows the contents of the traffic exchanged between the end device and TTN. In step 13, the server sends a confirmed downlink. In step 14, the end device sets the ACK bit in its next uplink. However, the server continues to attempt to send the downlink (steps 16 and 19) as if it didn’t receive the ACK response.

Step	Time [s]	Type	ACK	FCntUp	[A,N]FCntDown	FOpt	FRMPayload
1	000.000000	join-request
2	001.805991	join-accept
3	005.511201	unconfirmed data up	0	1		0b01	68656c6c6f
4	005.921428	unconfirmed data down	0		0	0b01033100ff01
5	070.772556	unconfirmed data up	0	2		0307	68656c6c6f
6	137.055912	unconfirmed data up	0	3			68656c6c6f
7	203.330603	unconfirmed data up	0	4			68656c6c6f
8	269.606463	unconfirmed data up	0	5			68656c6c6f
9	270.013984	unconfirmed data down	0		1		000000
10	334.785355	unconfirmed data up	0	6			68656c6c6f
11	401.056721	unconfirmed data up	0	7			68656c6c6f
12	467.336085	unconfirmed data up	0	8			68656c6c6f
13	467.745774	confirmed data down	0		2		010203
14	532.504250	unconfirmed data up	1	9			68656c6c6f
15	598.787881	unconfirmed data up	0	10			68656c6c6f
16	599.213361	confirmed data down	0		3		010203
17	663.947792	unconfirmed data up	1	11			68656c6c6f
18	730.233779	unconfirmed data up	0	12			68656c6c6f
19	730.654144	confirmed data down	0		4		010203

In this state (steps 14 onwards), the downlink queue for the device is empty as reported by the TTN CLI tool. The only way to stop the endless downlinking is to run this command:

ttn-lw-cli dev set $APP $DEV --unset mac-state.pending-application-downlink

It seems to me that the cause of this unexpected behavior is on the application or network server side of The Things Stack v3 – the end device appears to be complying exactly with the LoRaWAN v1.1.0 specification. I’m at a loss for how to proceed with troubleshooting. Any tips, hints, or useful ideas?

descartes · July 13, 2023, 8:28am

Whilst there is clearly something to investigate here, good practice is to never use confirmed downlinks, due to the potential for the application side (not the LNS) to end up stuck in a similar loop, or for local interference to block the ack attempt. Or, as a client discovered, sending a confirmed reboot request to a device that rebooted before it could send the confirmation, and so ended up stuck in a loop until I found the CLI command to clear it all down!

It is preferable to have the device include an ack in its next uplink, even if that uplink follows the downlink a few seconds later. I used to use a flag in the first byte, used for device status, to ack a downlink command; I’m moving to a byte counter to say which downlink was heard (255 wrap around is deeply unlikely to occur).
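
A rough sketch of the byte-counter idea, seen from the application side. Everything here is an assumption for illustration: the class name, the choice of byte 0 as the counter position in both directions, and the limit of one outstanding command at a time.

```python
class CommandTracker:
    """Application-side bookkeeping for one outstanding downlink command."""

    def __init__(self) -> None:
        self.seq = 0          # counter value of the last command sent
        self.pending = None   # payload still awaiting an ack, if any

    def build_downlink(self, command: bytes) -> bytes:
        # Prefix the command with a one-byte counter that wraps at 256.
        self.seq = (self.seq + 1) % 256
        self.pending = bytes([self.seq]) + command
        return self.pending

    def on_uplink(self, payload: bytes) -> bool:
        # Byte 0 of every uplink echoes the counter of the last downlink heard.
        if self.pending is not None and payload and payload[0] == self.seq:
            self.pending = None
            return True
        return False
```

The downlink itself is sent unconfirmed. If `on_uplink` keeps returning `False`, the application decides whether and when to queue the command again, so the retry policy stays under its control and not the network server's.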

Confirmed uplinks aren’t a thing either - maybe once every few days for a link check - but if the application is so important that it needs lots of confirmations it’s perhaps not a good use case for LoRaWAN. For “I want all the dataz” applications I put the previous & current readings in an uplink. Deltas & bit packing help with making this smaller, which allows the potential for more than one prior reading plus current. I also have a replay request option if need be - for when there are gaps in the data that need filling in. There is also a variant on Forward Error Correction that gives future readings but a flux capacitor is a huge drain on the battery life so rarely deployed.
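
As an illustration of the previous-plus-current idea, here is a minimal sketch that packs the current reading as a signed 16-bit value and the previous one as a signed 8-bit delta, three bytes in total. The payload layout, the sentinel value, and the function names are assumptions made for this example, not a format anyone in this thread actually uses.

```python
import struct

NO_PREVIOUS = -128  # sentinel: delta did not fit, previous reading omitted


def encode_readings(current: int, previous: int) -> bytes:
    """Pack the current reading plus the previous one as a small delta."""
    delta = previous - current
    if not -127 <= delta <= 127:
        delta = NO_PREVIOUS
    return struct.pack("<hb", current, delta)


def decode_readings(payload: bytes):
    """Return (current, previous); previous is None if it was omitted."""
    current, delta = struct.unpack("<hb", payload)
    previous = None if delta == NO_PREVIOUS else current + delta
    return current, previous
```

If one uplink is lost, the next one still carries the missing reading, which covers single gaps without any confirmation traffic.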

I’ve no quick access to v1.1 firmware but I have a test rig on v1.0.4 on which I can try to replicate this.

In the meanwhile, the big brains of @adrianmares can probably explain all of this in one sentence or less!

adrianmares · July 13, 2023, 9:08am

I see that Zephyr operates with a fork of LoRaMac-node, but their fork is a bit weird - they do not seem to be on the latest release of LoRaMac-node (per this - I interpret that message as saying that they are not using 4.7.0, but 4.6.0 with some cherry-picked fixes).

Versions before 4.7.0 have a bug in the handling of the MIC for the uplink which should confirm that the downlink has been received (LoRaWAN 1.1 has different MIC calculations for uplinks/downlinks which confirm a message). You can find the original issue here.
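
For context, a rough sketch of the extra MIC input block (B1) that LoRaWAN 1.1 adds for uplinks, based on a reading of the 1.1 specification and not on the LoRaMac-node source. The point is the ConfFCnt field: when the uplink has the ACK bit set, it carries the frame counter of the confirmed downlink being acknowledged, so a wrong value makes the network server reject the MIC. The function and parameter names are hypothetical, and the AES-CMAC step itself is left out.

```python
def uplink_mic_block_b1(dev_addr: int, fcnt_up: int, msg_len: int,
                        tx_dr: int, tx_ch: int,
                        ack: bool = False, acked_fcnt_down: int = 0) -> bytes:
    """Build the 16-byte B1 block used as a prefix for the 1.1 uplink MIC."""
    # ConfFCnt is the acknowledged downlink counter modulo 2^16, else zero.
    conf_fcnt = (acked_fcnt_down & 0xFFFF) if ack else 0
    return (
        bytes([0x49])
        + conf_fcnt.to_bytes(2, "little")
        + bytes([tx_dr, tx_ch, 0x00])      # 0x00 = uplink direction
        + dev_addr.to_bytes(4, "little")
        + fcnt_up.to_bytes(4, "little")
        + bytes([0x00, msg_len])           # msg_len excludes the 4-byte MIC
    )
```

For step 14 in the trace above, this would be called with `fcnt_up=9`, `ack=True`, and `acked_fcnt_down=2`; the DevAddr, data rate, and channel index are not shown in the trace and would have to come from the device.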

If upgrading their fork to the latest release is too hard, it may be easier for you to just incorporate the fix for that problem. But for the record, there are many fixes done in the 4.7.0 release.

descartes · July 13, 2023, 10:07am

From:

Appears being the operative word - LMn is just too complicated to work with the latest version, particularly when the code base has passed through an open source collective - even the ST version which is thoroughly processed to fit ST’s CubeMX world can result in some oddities - so I end up one version behind on CubeMX which is generally 6 to 9 months behind LMn release. I am looking at ways of reviewing the latest MX vs prior vs latest official release but it is a totally unaffordable time sink.

I’ll still try confirmed downlinks on my v1.0.4 setup if you care to do the same and we can compare notes!

AnachronisticPenguin · July 13, 2023, 11:24am

Thanks, this is great advice. I think I’ll do something similar to confirm uplinks and downlinks without using the MAC layer.

AnachronisticPenguin · July 13, 2023, 11:25am

Will try to get 1.0.4 working on Zephyr to test! Not entirely confident I can bend it to my will, but I’ll try.

AnachronisticPenguin · July 13, 2023, 11:26am

Thanks for the links! I’ll try incorporating that fix and see if it resolves the problem.

mluis1 · July 13, 2023, 12:09pm

There is no need to change your firmware to operate with 1.0.4 version.

You just need to tell the TTN server that the end-device is a 1.0.4 device instead of 1.1.0.

After this change the TTN server will set the JoinAccept frame OptNeg bit to 0 which indicates to the end-device that the network server is a 1.0.x server.

When the network server operates with version greater than 1.0.4 it sets the OptNeg bit to 1 which indicates to the end-device to operate as a 1.1.0 device.
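
To make the OptNeg bit concrete, here is a minimal sketch that decodes the DLSettings byte of a decrypted JoinAccept, assuming the LoRaWAN 1.1 layout (OptNeg in bit 7, RX1DRoffset in bits 6..4, RX2 data rate in bits 3..0). The function name is hypothetical.

```python
def decode_dl_settings(dl_settings: int) -> dict:
    """Split the JoinAccept DLSettings byte into its three fields."""
    return {
        "opt_neg": bool(dl_settings & 0x80),   # True: server speaks 1.1
        "rx1_dr_offset": (dl_settings >> 4) & 0x07,
        "rx2_data_rate": dl_settings & 0x0F,
    }
```

A value of `0x03`, for example, decodes to OptNeg clear with RX2 data rate 3, which tells the end device to fall back to 1.0.x behaviour.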

I would also recommend that you open an issue on the Zephyr GitHub project asking them to update the LoRaMac-node project to version 4.7.0.

AnachronisticPenguin · July 14, 2023, 1:05am

I created a 1.0.4 device in TTN and modified my end-device code slightly (assigned the APP_KEY to NWK_KEY when configuring the join structure). This version behaves correctly for both unconfirmed and confirmed downlinks – confirmed downlinks are ACKed and the ACK is accepted by the server. This confirms that the issue only affects v1.1.0 devices.

I then applied the patch suggested by @adrianmares to the LoRaMac-node in my local copy of Zephyr and built a v1.1.0 end device and… it worked! ACKs from confirmed downlinks are accepted by the TTS server.

Thank you all for your help. I will open an issue at the Zephyr project as suggested, asking them to upgrade LoRaMac-node to 4.7.0.
