After resetting frame counters, payload is shown on gateway traffic but not in application

Hello,

I am experiencing a strange issue. I have a Libelium Smart Agriculture device that sends its payload every 2 minutes to a registered Kerlink Wirnet Station gateway on TTN. It has been sending data without any problems for the last 5 months! It is registered to send data via OTAA.

Today, I tried to add the Collos integration to the TTN application, and after that I could no longer see any data flow in the console. I reset the frame counters just to check whether that was the problem. For about 30 minutes there was no gateway traffic either (I monitor the gateway via the console). After that, my device showed up in the gateway traffic again, with the same Device Address as before. The device is still sending data to the gateway, but for some reason the data is not forwarded to the application console.

Notes:

  • The device and gateway are located somewhere far away from me (around 40km). None of them has been restarted or reprogrammed.
  • The OTAA credentials have not changed and the device address remains the same as before.
  • First, I tried to reset the frame counter. Then, I removed the Collos integration.
  • In the gateway traffic there is no “drop” message in the trace section of the device. Everything seems to be normal!

I don’t know if adding the Collos integration was just a coincidence. It does not make any sense to me that this could mess things up, since it is just an API call after the app server receives the data.

Can anyone suggest how to debug this situation without having to restart my device which is really far away? Is it a TTN backend issue?

Thank you very much!


:warning: Moderator note: this topic’s title has been changed for future reference, after the cause of the problems was determined in one of the subsequent responses.


Did you happen to add/change any access keys for the Collos integration? (I don’t think that should cause this, but who knows…)

Are you using any other integration, like the Data Storage Integration, or another API, like the MQTT Data API?

If yes: any data there?

Can you subscribe to the MQTT API error event? See https://www.thethingsnetwork.org/docs/applications/mqtt/api.html#error-events

As an aside: be careful when doing that. It will also reset the downlink counter, which now is lower than the one known in your device (should that be using downlinks or ADR).

Hello and thank you for your quick response.

I did not change any access keys for the Collos integration. Also, the Data Storage integration does not have any new data, and I cannot get any from the MQTT Data API.

The device uses ADR. The strange thing is that I reset the frame counters only 3-4 minutes after the time I expected to receive data (the device sends data every 2 minutes; I reset the frame counters 5-6 minutes after the last recorded payload).

I subscribed to the MQTT API error event by using the NodeJS SDK:

var ttn = require("ttn")

var appID = "xxxxx"
var accessKey = "xxxxxxxxx"

// discover handler and open mqtt connection
ttn.data(appID, accessKey)
  .then(function (client) {
    console.log("Connected")

    // "+" subscribes to the error events of all devices in the application
    client.on("event", "+", "up/errors", function (devID, message) {
      console.log("Uplink Errors")
      console.log("Received uplink error from ", devID)
      console.log(message)
    })
    client.on("event", "+", "down/errors", function (devID, message) {
      console.log("Downlink Errors")
      console.log("Received downlink error from ", devID)
      console.log(message)
    })
    client.on("event", "+", "activations/errors", function (devID, message) {
      console.log("Activation Errors")
      console.log("Received activation error from ", devID)
      console.log(message)
    })
  })
  .catch(function (error) {
    console.error("Error", error)
    process.exit(1)
  })

Am I missing something? Because I get connected to the broker but I do not receive any error messages. Nothing at all to be exact. Can you tell me how to subscribe to the Error API properly?

Again, thank you very much!

Your MQTT code works for me; if I make the Payload Format do return 1/0; then, when simulating an uplink through the device in TTN Console, I get:

Connected
Uplink Errors
Received uplink error from  my-device-id
{ error: 'Unable to decode payload fields: Decoder not valid: does not return an object' }

What are you trying to say here? Why is that strange? (I’d say it’s quite soon for drastic measures like that; sometimes data just isn’t received, so I wouldn’t press that button after only a few minutes… But for uplinks, it should not matter much I guess.)

Does the trace information in the Traffic page in TTN Console look something like the following?

image

And is the Payload Format still okay, like “Custom”?

image

As a backup, I’d copy the current DevAddr, AppSKey and NwkSKey into a safe place. I’m not sure if you can remove the application and device and register new ones with these details; I’d not try that without being sure, but I’d secure the secrets anyhow.

Your MQTT code works for me; if I make the Payload Format do return 1/0; then, when simulating an uplink through the device in TTN Console, I get:

Connected
Uplink Errors
Received uplink error from  my-device-id
{ error: 'Unable to decode payload fields: Decoder not valid: does not return an object' }

It is also working for me, when simulating an uplink. But for real data I don’t get any uplink, downlink or activation errors. I don’t get anything.

Is there any way to find the reason why the message is not forwarded to the application? Is there a proper way to debug the data flow from the network server to the application? It keeps sending data to the gateway and I can see the device address is still the same with the one on the application.

Does the trace information in the Traffic page in TTN Console look something like the following?

image

And is the Payload Format still okay, like “Custom”?

image

Exactly like that! I have also disabled the frame counter checks (last reply at Data received by the gateway but not on application) but that did not make a difference. The device looks dead on the application console.

Just wondering if and how it is possible to find the error log. The subscription to the MQTT API error events does not give any results. The data seems not to be forwarded to the application at all, for some reason.

Once again, thank you for your help!

Seems like a backend issue indeed; I’ve pinged @htdvisser but surely the team is always busy.

Is the Handler for the application in TTN Console still okay? Something like the following in the application’s Overview:

image

Is the Collos integration available to be added again? (Just wondering if that has been removed completely, though if available that might not tell you much.) Do you have any other OTAA devices with which you could test if one can remove the application and device and register new ones with the same DevAddr and secrets?

As the trace in gateway’s Traffic page shows no errors, I’d assume that TTN has matched the DevAddr and NwkSKey to an application, so I’d guess these are still unchanged at the TTN side as well. (Surely the node is still using the same OTAA settings it has been using for a long time.) But just in case: do you have any way to validate if the secret AppSKey and NwkSKey are still the same as before? (I guess you don’t.)

No, not possible. You’ll need help from the team for that. Please post the gateway ID, application ID and DevAddr here. (These are not secret.)

Also, is a simulated uplink (through TTN Console’s Device’s page) shown in TTN Console’s Application Data page? And does ttnctl devices info <device-id> show anything weird? See https://www.thethingsnetwork.org/docs/network/cli/quick-start.html

Device Addresses are not unique. There are hundreds of devices with the same DevAddr, so I wouldn’t be too sure about this.

This may help for ABP devices, but for OTAA devices this could make things worse (as downlinks will be rejected by the device).

Unfortunately not. We’re working on this for v3.

Backend telemetry shows nothing out of the ordinary. If it were a backend problem we would see more than one user/device being impacted.

Unfortunately, even if we had time, it would be really difficult to trace individual messages with our current network stack, so I can’t help you with that. The only thing I can say is that we haven’t made any changes to our servers, and that adding an integration has absolutely no influence on the uplink flow of the device.

I disabled the frame counter checks. Shouldn’t that have solved any frame counter issue? Also, if the device has ADR enabled, does it take the downlink counter into consideration? If so, I can manually set the up/down frame counters from the CLI. Should the uplink counter be set lower than the one seen in the gateway traffic, and the downlink counter higher than the one on the device (unknown, but it could be a large number)?

Yes the simulated uplink is shown on the console and captured by the MQTT Data API subscription too. Nothing weird in the device info!

It is not possible to assign a custom DevAddr anyway. This is assigned automatically by the TTN backend.

AppID
Capture3

DevAddr
Capture

GatewayID
Capture2

If the frame counter check isn’t the cause, and the network and application credentials have not changed, why does the payload not arrive at the console?

What are the possible issues? Why would the application server reject a payload when all the keys are correct? Why is it not arriving at the application server at all? If you can, please make this clear!

Thank you again! I guess I will have to go there and reset it :stuck_out_tongue:

I really doubt that will help. Resetting the node would make it execute a new OTAA join, but why would that help…? (Update: it likely will help; see next posts.)

Indeed, I very much wonder about that as well.

For TTN: yes maybe; see next post. For your node: no. The node does not know you changed that setting.

Yes. (But the uplink counter known to TTN will already have adjusted itself (see next posts). It’s only the downlink counter that might be troublesome for ADR, as the value known to TTN will be lower than the one known in the node. But I’d assume all that does not affect your current problems. Doing a new OTAA Join will fix all counters as well.)

Maybe the following will do the job; see also https://www.thethingsnetwork.org/docs/network/cli/api.html#ttnctl-devices-set

But: I’ve never tested that. (And the post above is about DevAddr that starts with 00.)

So, do you have any other devices to test the above?

Hmmm, that is not “Exactly like that!” when comparing to my screenshot? But it seems the additional “bridge” is due to different type of gateways; my screenshot was for a TTN Gateway and indeed for a Kerlink I see two more lines, referring to some “bridge”. So that’s probably not related to any problems either.

(As an aside: my note about posting screenshots for terminal output surely also applies to posting the DevAddr and all: please don’t post screenshots for data that someone might need to copy. Would you really expect the team to re-type the DevAddr, gateway ID and all from a screenshot…? But well, they already said they cannot investigate any further :frowning: )

You can actually check that: paste the full LoRaWAN packet from the Gateway Traffic into an online decoder and also provide the secret keys.

This will tell you if the NwkSKey is correct, as then the MIC will validate, assuming the uplink counter value does not exceed 65,536; if it does exceed that value, then ignore MIC validation errors. And regardless of the MIC, if the AppSKey is correct, then this will give you an unencrypted application payload that you can paste into the Payload Format decoder’s test input (if any) or decode yourself…
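To make it clearer what such a decoder reads from the raw packet, here is a minimal Node.js sketch that splits a LoRaWAN 1.0.x data uplink into its unencrypted header fields. It is only an illustration: the function name parseUplinkHeader is made up for this example, the sample packet is an arbitrary illustrative value (not from this thread), and the sketch neither validates the MIC nor decrypts the payload (both need the secret keys).

```javascript
// Sketch only: split a LoRaWAN 1.0.x data message (PHYPayload, hex) into header fields.
// Layout: MHDR(1) | DevAddr(4, little-endian) | FCtrl(1) | FCnt(2, little-endian) | ... | MIC(4)
// parseUplinkHeader is a hypothetical helper name, not part of any library.
function parseUplinkHeader(hex) {
  const buf = Buffer.from(hex, 'hex');
  if (buf.length < 12) {
    throw new Error('Too short for a LoRaWAN data message');
  }
  const fctrl = buf[5];
  return {
    // Message type: 2 = unconfirmed data up, 4 = confirmed data up
    mtype: buf[0] >> 5,
    // DevAddr is sent least-significant byte first, so reverse a copy for display
    devAddr: Buffer.from(buf.subarray(1, 5)).reverse().toString('hex').toUpperCase(),
    adr: (fctrl & 0x80) !== 0,
    // Only the 16 least-significant bits of the frame counter are in the packet
    fCnt16: buf.readUInt16LE(6),
    mic: buf.subarray(buf.length - 4).toString('hex').toUpperCase(),
  };
}

// Illustrative packet; yields devAddr '49BE7DF1' and fCnt16 2
console.log(parseUplinkHeader('40F17DBE4900020001954378762B11FF0D'));
```

Comparing devAddr with the one in TTN Console, and watching fCnt16 across a few packets, already shows whether the node is still counting up as expected.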


For the sake of completeness:

…actually: maybe not. If the difference between the device’s uplink counter and the zero value in TTN exceeded the maximum difference between two subsequent received values (MAX_FCNT_GAP or 16,384), then maybe TTN cannot recover from that.

Or even if the code knows the counter has been reset and hence simply knows it can accept anything, then if the node’s uplink counter exceeds 65,536 (or maybe twice that value; 131,072) then it might not be able to find the matching MSB. In LoRaWAN 1.0.x, the value of FCnt only holds the 16 least-significant bits (LSB) of the actual frame counter. But for a 32 bits frame counter still all 32 bits are used when calculating the MIC. So, the server needs to guess or try the other 16 bits when validating the MIC. The server can use its own internal counters for a best guess. And due to the maximum allowed gap between the last known value and current value, the server needs to try, and will try, only one additional value for the MSB. And that will fail after resetting the counters, if the node’s uplink counter is large.
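The guessing described above can be sketched as follows. This is a simplified model and an assumption about how a network server might behave, not TTN’s actual code; the function name inferFullFCnt is invented for the illustration, and the MIC calculation itself is left out.

```javascript
// Sketch only: how a server might rebuild the 32-bit counter used for the MIC
// from the 16 bits in the packet plus its own last known value.
const MAX_FCNT_GAP = 16384;

// inferFullFCnt is a hypothetical helper, not a TTN or lora-packet function.
function inferFullFCnt(lastKnown, fCnt16) {
  // First guess: same 16 most-significant bits as the last known counter
  let candidate = Math.floor(lastKnown / 0x10000) * 0x10000 + fCnt16;
  if (candidate < lastKnown) {
    // The 16-bit value rolled over, so try the next MSB value (and only that one)
    candidate += 0x10000;
  }
  // Reject anything beyond the maximum allowed gap
  return candidate - lastKnown <= MAX_FCNT_GAP ? candidate : null;
}

// Normal operation: server knows 69998, node sends 70000 (16 bits on the wire: 4464)
console.log(inferFullFCnt(69998, 70000 & 0xFFFF)); // 70000, MIC can match

// After a reset to 0: the guess is 4464, not 70000, so the MIC cannot match
console.log(inferFullFCnt(0, 70000 & 0xFFFF)); // 4464
```

In the second call the guess lies within the allowed gap, so nothing looks wrong, yet the MIC is computed over 4464 instead of 70000 and will never validate.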

Even though you also already disabled the frame counter checks, TTN would then still be unable to find a matching MIC. I’ve tested by changing a NwkSKey of an ABP node in TTN Console. This also automatically reset the frame counters :frowning:; even just changing “Frame Counter Checks” or its description will also reset the counters! (Bug 746.) So even after restoring the correct NwkSKey TTN was still unable to find a matching MIC. In the trace part of the Gateway Traffic this showed the exact same thing that we saw for my earlier success and your failure. So, indeed: TTN will not report if it cannot match the MIC.

So: do you know the last 32-bit uplink counter value? If not, you can use a Node.js script to brute-force the MSB and determine the full 32-bit counter:

/*
 * Brute-forces finding a LoRaWAN uplink counter's 16 most-significant bits.
 * This needs:
 *   npm install lora-packet
 */
const lorapacket = require('lora-packet');
// Buffer.from and Buffer.alloc replace the deprecated `new Buffer(...)` constructor
const nwkSKey = Buffer.from('7A47...', 'hex');
const uplink = Buffer.from('4053...', 'hex');

const packet = lorapacket.fromWire(uplink);
console.log(packet.toString());

const msb = Buffer.alloc(2);
for (let i = 0; i <= 0xFFFF; i++) {
    console.log(`Trying: ${i}`);
    msb.writeUInt16LE(i, 0);
    if (lorapacket.verifyMIC(packet, nwkSKey, null, msb)) {
        console.log(`Found MSB: 0x${i.toString(16).padStart(4, '0')}`);
        // Multiply instead of `i << 16`, which turns negative for i >= 0x8000
        console.log(`32 bits FCnt: ${i * 0x10000 + packet.getFCnt()}`);
        break;
    }
}

Meanwhile I’ve also enhanced the online decoder to brute-force the FCnt’s MSB. So, the online decoder will also give you the 32 bits FCnt if it can find a valid MIC, even though only 16 bits are given in the LoRaWAN packet.

Indeed, the following fixed my ABP node again:

ttnctl devices set my-device --fcnt-up 123456

…and then there is hope!

Although I’ve no idea how to determine the last known downlink counter… So I guess ADR is now no longer working for that node, until it’s made to do a new OTAA Join.


I would like to say a big thank you for your effort. For real, I’ve never seen a person so supportive in a forum.

With your detailed guidelines my problem got fixed. Also lora-packet is a great tool! Great job! In brief what I’ve done:

  • Copied the device’s Network Session Key from the TTN application into the lora-packet JavaScript code (the nwkSKey constant).

  • Copied the physical payload from the gateway’s traffic into the same code (the uplink constant).

  • Ran your code with Node.js, found a valid MIC and got the 32-bit FCnt (you were right: it exceeded 65,536, and for that reason the server couldn’t validate the MIC).

  • Then set it via ttnctl: ttnctl devices set my-device --fcnt-up 32bitsFCnt

  • My device is live again!

Summary: always be careful when resetting the frame counters. If for any reason you can see your payload in the gateway traffic but not in your TTN application console: a) make sure you have the same secrets as before, b) find the 32-bit FCnt and set it for your device via ttnctl!
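For anyone repeating these steps, the last part can be sketched as below: combine the brute-forced 16 MSB with the 16 bits from the packet, then build the ttnctl command shown earlier in this topic. The helper names combineFCnt and ttnctlSetFcntUp are made up for this example, and the device ID is a placeholder.

```javascript
// Sketch only: combine the found MSB with the FCnt from the packet.
// combineFCnt and ttnctlSetFcntUp are hypothetical helpers for illustration.
function combineFCnt(msb16, lsb16) {
  // Multiply instead of shifting: JavaScript's `<<` works on signed 32-bit
  // integers, so `0x8000 << 16` would come out negative.
  return msb16 * 0x10000 + lsb16;
}

function ttnctlSetFcntUp(devId, fcnt) {
  return `ttnctl devices set ${devId} --fcnt-up ${fcnt}`;
}

// MSB 0x0001 and LSB 0xE240 give 123456
console.log(ttnctlSetFcntUp('my-device', combineFCnt(0x0001, 0xE240)));
// prints: ttnctl devices set my-device --fcnt-up 123456
```

Note that this only restores the uplink counter; the downlink counter is a separate value.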


Since the device is not moving I do not need ADR at all. So no problem for the downlink counter!


Actually, you should not use ADR for devices that do move. Devices that are in a fixed location can use ADR to adapt to the changing network around them, like when gateways are added or removed. ADR is too slow (it needs too many uplinks and all) to be used for moving devices.


Hello there. I have the same problem. Any news? Thanks.

Hi,

please supply more info (router, gateway, etc.) and describe YOUR problem… tnx

or maybe it’s more related to this topic ?