How to clear downlink queue programmatically?

acutetech · June 28, 2021, 10:56pm

There is an instruction here to clear the downlink queue from a command line with this:

$ ttn-lw-cli end-devices downlink clear app1 dev1

I need to do the same from my PHP server: when I receive an incoming uplink webhook message I need to be certain that the device will receive the downlink response that I am about to compose, and not an old one still queued on the server. I am using the “replace” form, but sometimes an old downlink message is sent to the device:
/api/v3/as/applications/{application_id}/webhooks/{webhook_id}/devices/{device_id}/down/replace

How is this done? Perhaps POST to:
/api/v3/as/applications/{application_id}/webhooks/{webhook_id}/devices/{device_id}/down/clear
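One possible approach, offered as an untested sketch rather than a confirmed answer: reuse the `down/replace` path above with an empty list of downlinks, on the assumption that replacing the queue with nothing leaves it empty. Whether a `down/clear` path exists is not confirmed here. The function name, the example base URL, and the bearer-token header are assumptions for illustration, and the code only builds the request without sending it.

```python
import json
import urllib.request


def build_clear_request(base_url, app_id, webhook_id, device_id, api_key):
    """Build (but do not send) a POST that replaces the queue with nothing.

    Assumption: an empty "downlinks" list on the replace path empties the queue.
    """
    url = (
        f"{base_url}/api/v3/as/applications/{app_id}"
        f"/webhooks/{webhook_id}/devices/{device_id}/down/replace"
    )
    body = json.dumps({"downlinks": []}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical cluster address and identifiers, for illustration only.
    req = build_clear_request(
        "https://eu1.cloud.thethings.network", "app1", "hook1", "dev1", "API_KEY"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. The same URL and JSON body should translate directly to a PHP cURL call.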

acutetech · June 29, 2021, 8:07am

Hmmm… the problem might lie elsewhere. Example:

A device joins and sends a first uplink message which is forwarded to a webhook application, which then replies with a first downlink message. However, the device does not receive the message (say it is reset).

The device then powers on again, and joins again. It sends a second uplink message in the direction of the webhook application, but the server sends the FIRST downlink message which it has been saving. The webhook sends a second response, which is queued at the server. From that point on the device and server are out of sync. When the device sends its third uplink message, it receives the queued second downlink message, and the webhook provides the third downlink message which the server queues.

In other words: at rejoin the state of the device has been reset, but the state of the server has not.
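The off-by-one behaviour described above can be reproduced with a toy model. This is only a sketch under a stated assumption, not the server's actual scheduler: on every uplink the network sends whatever is queued first, and the webhook's reply to that uplink reaches the queue only afterwards. The function name `simulate` and the `reply-N` labels are made up for illustration.

```python
from collections import deque


def simulate(uplinks, mode="push"):
    """Return what the device receives after each uplink in the toy model.

    Assumption: the reply to uplink N is queued only after the network has
    already chosen the downlink for uplink N.
    """
    queue = deque()
    received = []
    for n in range(1, uplinks + 1):
        # The network sends the oldest queued downlink, if there is one.
        received.append(queue.popleft() if queue else None)
        # The webhook's reply to this uplink arrives too late for this window.
        if mode == "replace":
            queue.clear()
        queue.append(f"reply-{n}")
    return received


if __name__ == "__main__":
    print(simulate(3, mode="push"))
    print(simulate(3, mode="replace"))
```

In this model both modes give the same result: every uplink is answered with the reply to the previous one, because the queue never holds more than one message when the reply arrives.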

I should be able to deal with this at the level of my device application, but: in general, which aspects of the server state are reset when a device rejoins? In principle, should queued downlink messages be purged when a device rejoins, or should they be preserved and delivered to the rejoined device? Should this be user-configurable, or hard-coded at the server? Is this a common issue which has a best-practice resolution, and if so, what is it?

descartes · June 29, 2021, 8:13am

Give me a day to test a solution provided to me after I got a device into a reboot cycle whilst testing.

kersing · June 29, 2021, 9:01am

The network doesn’t know about the device not receiving the message. After the uplink the downlink will be sent regardless of the device state. Unless you are using ACKed downlinks which is a very bad idea by itself (search the forum if you want to know why) the network considers the downlink sent.
For V3 this will ‘fail’ if the downlink arrives after the 5 second RX window. At that point it will be scheduled for another uplink.

acutetech · June 29, 2021, 10:03am

Is there a general problem here, and if so what is the general solution?

In general a device can send an uplink message and hope for a downlink response. The response can arrive before the server sends its downlink message, in which case the device might receive the “current” response, or the response can arrive after the server sends its downlink message, in which case the device won’t get the response “now”, but (probably) will get this earlier response the next time it sends an uplink message. The API “push” vs “replace” choice is almost irrelevant here, since the device will get the earlier response anyway. Of course, any message can get lost at any point.

So if there is a general problem here, does it need to be handled at the device application level? How? For instance, could a solution involve the device’s uplink fcnt being returned in a response message payload, so the device can match downlink message responses with earlier uplink messages? Or what else?
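A minimal sketch of the counter-matching idea, assuming a payload layout invented here for illustration: the first two bytes of the downlink carry the low 16 bits of the uplink fcnt it answers, followed by the real data. The function names are hypothetical, and on a real device the matching half would be written in the firmware's own language.

```python
import struct


def build_reply(uplink_f_cnt, data):
    """Server side: prefix the reply with the uplink counter it answers."""
    return struct.pack(">H", uplink_f_cnt & 0xFFFF) + data


def match_reply(downlink, last_uplink_f_cnt):
    """Device side: return the data if the reply answers the latest uplink.

    Returns None for a reply that is too short or that echoes another counter.
    """
    if len(downlink) < 2:
        return None
    (echoed,) = struct.unpack(">H", downlink[:2])
    if echoed != (last_uplink_f_cnt & 0xFFFF):
        return None  # stale or mismatched reply: ignore it
    return downlink[2:]


if __name__ == "__main__":
    reply = build_reply(7, b"\x01\x02")
    print(match_reply(reply, 7))  # answers uplink 7
    print(match_reply(reply, 8))  # stale by the time uplink 8 is sent
```

One caveat: the uplink counter restarts after a join, so a reply left over from before a rejoin can echo the same value as a fresh uplink. A random token chosen by the device at power-up and echoed back alongside the counter would be one way to tell the two apart.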

Second question: if the server sees a device rejoin, should it not purge any queued downlink messages? (This is not a solution to the problem presented here, but it seems good form to me…).

Third question: is there any way to view queued downlink messages through the console (feature request…).

descartes · June 29, 2021, 10:48am

Yes, general problem with sequencing, no, there is no general solution, it’s very much situational.

Whilst we now have a potential 3 or so seconds to process an uplink and queue a response, this is ambitious and, as we need to code for uplinks or downlinks not arriving as expected, prone to some interesting failure modes.

This possibly indicates you are thinking of downlinks as a thing. Downlinks should be treated as an exception.

Everything and anything - but again, it sounds like a scheme predicated on downlinks being a common occurrence.

Having an uplink that expects a response should allow for “eventually”, and then you embed counters into the message so the device knows which uplink the downlink is responding to. But if you have a device that has some uplinks awaiting a response then the device isn’t a great candidate for LoRaWAN.

Again, devices should be highly unlikely to rejoin and as downlinks shouldn’t be prevalent, the conjunction of rejoin and pending downlinks should be rare.

No, but again it’s not its use case and as you aren’t using downlinks often, you should remember that you’ve queued one and you can see it in the console log. If you use the console to queue a downlink it’s because you are developing / testing. If you use downlinks, you really need an application / database that runs command & control and knows about the state of such things.

You can view the downlink messages via the CLI - which is the official preferred way to manage a deployment of devices.

ttn-lw-cli end-devices downlink list/clear/push/replace [application-id] [device-id] [flags]

The scenario I ran into was a remote reboot that was sent as acknowledged by an enthusiastic tester. The device promptly rebooted, joined, hadn’t ack’d the downlink so rebooted, joined, hadn’t ack’d the downlink so rebooted, joined, hadn’t ack’d the downlink so rebooted, joined, hadn’t ack’d the downlink etc etc

It took a few goes to stop that from the command line. I think I ended up resetting the MAC state after it had joined but before it had uplinked, so it didn’t have a session, so didn’t connect, so didn’t get the downlink. We’d then have to wait for it to realise it had lost its join, but as it was on a bench, the finger of reset was used.

To deal with this, I have a test scheduled for the CLI command that purges the pending downlink acks.

The command & control database for this device won’t allow ack’d reboot commands - or indeed any ack’d downlinks - we can either detect the change (say in uplink frequency) or we ack the change in the next scheduled uplink as long as it’s in a reasonable timescale (like the next couple of hours). Otherwise we send a short info uplink.

acutetech · June 29, 2021, 10:07pm

Thanks for your advice Nick. Indeed, my devices won’t use downlinks much, but are expecting one the first time they power up, and it was that code I was testing, and ran into this problem. It has been a lesson…

Regarding confirmed or unconfirmed downlink messages: my device seems to receive unconfirmed messages, but I am not explicitly determining which I generate. I guess the default is unconfirmed, and if I did want a confirmed downlink, I would add

"confirmed": true

alongside “frm_payload” and “f_port”?
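For illustration, a sketch of how such a downlink body might be composed with the flag set explicitly, so that nothing depends on the default. It assumes the message sits inside a `downlinks` list and that `frm_payload` is base64-encoded. The function name is hypothetical.

```python
import base64
import json


def build_downlink_body(payload, f_port, confirmed=False):
    """Return the JSON text for a single downlink message.

    Assumptions: the message is wrapped in a "downlinks" list and the raw
    payload bytes are sent base64-encoded in "frm_payload".
    """
    message = {
        "frm_payload": base64.b64encode(payload).decode("ascii"),
        "f_port": f_port,
        "confirmed": confirmed,
    }
    return json.dumps({"downlinks": [message]})


if __name__ == "__main__":
    print(build_downlink_body(b"\x01\x02", 15))
    print(build_downlink_body(b"\x01\x02", 15, confirmed=True))
```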

Also, the gateway console does not seem to display downlink Mtype, as it does with uplink messages:

"m_hdr": {
  "m_type": "CONFIRMED_UP"
},