DevAddress Changed and TTN doesn't process payload

jayjay75 · April 15, 2022, 2:44pm

Hello guys,
I have this situation and I can’t figure out a way to fix it.
Gateway: Basics Station (LNS server)
Network: TTN V3
End device: LoRaWAN 1.0.2
The end device sends its payload as confirmed uplinks (ack on).

First I set the end device up and it registered correctly, got its DevAddress, and uplinked data; everything was working as it should.

Then I set up another end device and gave it the same DevEUI by mistake, so naturally it joined and registered in TTN V3 with a new DevAddress and DevNonce, and its transmission was received. Next I realized the mistake, turned off the second end device, changed its DevEUI to a different one, and turned it back on. It joined TTN V3 and its transmission went through fine.

Unfortunately the first end device continues to send its payload using the very first DevAddress it received for that DevEUI. TTN doesn’t process that data, and the end device doesn’t receive any error and continues to think it’s connected, since TTN V3 obviously replaced the DevAddress and DevNonce for that DevEUI when the second, impersonating device joined.

So my issue is: how would my first end device know that TTN no longer accepts its payload? Sending a LinkCheckReq didn’t help, as it returned valid since the end device is still connected to the gateway. Deleting the end device from TTN V3 and recreating it didn’t help either; I still can’t see the payload. Can anyone please tell me how to fix this? The end device is in a remote location and I need to find a way to force it to rejoin. Oh, and rebooting the gateway didn’t help either.

I’m out of ideas…
Jay

kersing · April 15, 2022, 3:24pm

First of all, devices do not connect to gateways. Devices connect to TTN. (As a result the device works anywhere where a gateway using the same frequencies and TTN cluster receives its uplink without the need to rejoin)

The second device’s successful join did not only reset the device address but also the session keys at TTN. As a result any transmissions from the first device cannot be decrypted/verified and are discarded (and no answer will be sent). If you did not save the session keys, there is no way to recover from this apart from the first device joining again and exchanging new session information with TTN.
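
To make the mechanism concrete, here is a toy model of the behaviour described above. It is only a sketch: the names (`session_t`, `ns_join`, `ns_uplink_accepted`) are hypothetical and not taken from The Things Stack, and a plain key comparison stands in for the real MIC verification. It assumes one session slot per DevEUI, so a second join with the same DevEUI overwrites the DevAddr and key, after which frames from the old session no longer match anything and are dropped without any reply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: one session slot per DevEUI.
 * All names are hypothetical, not network server internals. */
typedef struct {
    uint64_t dev_eui;
    uint32_t dev_addr;
    uint8_t  nwk_s_key[16];
    bool     active;
} session_t;

#define MAX_SESSIONS 4
static session_t sessions[MAX_SESSIONS];

/* A successful join overwrites the slot for that DevEUI:
 * new DevAddr, new session key. The old session is gone. */
void ns_join(uint64_t dev_eui, uint32_t new_addr, const uint8_t key[16])
{
    session_t *slot = NULL;
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (sessions[i].active && sessions[i].dev_eui == dev_eui) {
            slot = &sessions[i]; /* existing session for this DevEUI */
            break;
        }
        if (!sessions[i].active && slot == NULL)
            slot = &sessions[i]; /* remember the first free slot */
    }
    if (slot == NULL)
        return; /* table full; ignored in this sketch */
    slot->dev_eui = dev_eui;
    slot->dev_addr = new_addr;
    memcpy(slot->nwk_s_key, key, 16);
    slot->active = true;
}

/* An uplink is accepted only if its DevAddr maps to a session and the key
 * matches (standing in for the MIC check). On failure the frame is simply
 * dropped; the device gets no downlink and no error. */
bool ns_uplink_accepted(uint32_t dev_addr, const uint8_t key[16])
{
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (sessions[i].active && sessions[i].dev_addr == dev_addr &&
            memcmp(sessions[i].nwk_s_key, key, 16) == 0)
            return true;
    }
    return false;
}
```

Joining twice with the same DevEUI and then calling `ns_uplink_accepted` with the first DevAddr and key returns false, which is the situation the first device is in.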

If you are lucky the firmware in your device automatically rejoins after numerous failed attempts to get ack/linkcheckrequest responses. If the firmware does not do that you will need to reset the device.

What device and what firmware are you using?

jayjay75 · April 16, 2022, 11:12pm

Hello @kersing
Thank you for the explanation, it makes sense.
First let me answer your question: the device is based on the Heltec CubeCell ASR6550 and the firmware is custom. My questions are as follows:

How should the firmware recover from such a situation? When a frame is sent with TxConfirm on, the firmware gets a timeout after the usual NBTrials, goes to sleep, and does nothing until the next duty cycle is triggered. Then it tries again using a lower data rate, and when that fails it goes back to sleep, and the cycle goes on until it arrives at data rate 0.
For some reason I always thought that TTN would reply with a certain packet and error code when the session keys do not match or when a device that tries to join isn’t recognized. But the way I see it, it simply doesn’t reply and the RX window times out, which makes it difficult to react to the situation properly, unless of course I’m missing something.

So what is the proper way to actually do a rejoin? If I have to wait for the send event to go through all data rates from DR5 to DR0, the firmware has to wait for 5 send cycles, and with the duty cycle set to 6 hours, for example, it will fail to report data for more than 24 hours, which means a lot of data will be missed.
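
As a rough worked figure for the delay described here: the helper below is a hypothetical illustration (not part of any stack) and assumes the data rate drops by exactly one step per uplink cycle.

```c
#include <stdint.h>

/* Minutes between the first failed uplink and the failed attempt at DR0,
 * assuming (hypothetically) one data-rate step down per uplink cycle.
 * start_dr = 5 means the first failure happened at DR5. */
uint32_t minutes_until_dr0_fails(uint8_t start_dr, uint32_t cycle_minutes)
{
    return (uint32_t)start_dr * cycle_minutes;
}
```

With a 6-hour cycle (360 minutes) starting at DR5 this gives 1800 minutes, i.e. 30 hours, before a failure at DR0 is seen. Shortening the retry interval to 1 minute while acks are missing brings the same walk down to about 5 minutes, ignoring any regional duty-cycle limits.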

How do professional devices handle these situations, please? I looked everywhere and couldn’t find a resource that describes the best way to trigger a rejoin when these timeouts happen.

I’d really appreciate it if you could share a link to source code or a resource that handles these situations, for any device of course, as I just need to understand the logic used.
I went through this document and couldn’t see how to handle a similar RxTimeout situation: https://www.st.com/resource/en/application_note/an5406-how-to-build-a-lora-application-with-stm32cubewl-stmicroelectronics.pdf

I appreciate your help.
Jay

descartes · April 17, 2022, 9:35am

If the network server can’t match anything in the session database it won’t have encryption keys, so no, it’s not part of the LoRaWAN spec to do anything like that.

It’s not particular to professional devices, but many of the MACs (the bit our firmware asks to do the LoRaWAN bit) have a Link Check built in that is in the spec, which you can read up on for the details. Basically, after ~60 uplinks the MAC will send the uplink as confirmed. If it doesn’t get an ack, it tries different settings, ending in a re-join.

In firmware you can add your own safeguards, particularly if there are larger gaps between uplinks like yours.
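
One possible shape for such a safeguard is a small application-level counter of consecutive unacknowledged confirmed uplinks. This is a minimal sketch that does not depend on any particular LoRaWAN stack; `link_watchdog_t`, `watchdog_init` and `watchdog_on_result` are hypothetical names, and the threshold is an assumption to be tuned per deployment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-level watchdog, independent of the MAC layer. */
typedef struct {
    uint8_t consecutive_failures; /* confirmed uplinks in a row with no ack */
    uint8_t rejoin_threshold;     /* assumption: tune per deployment */
} link_watchdog_t;

void watchdog_init(link_watchdog_t *w, uint8_t threshold)
{
    w->consecutive_failures = 0;
    w->rejoin_threshold = threshold;
}

/* Call once per confirmed uplink result. Returns true when the
 * application should drop the session and start a new join. */
bool watchdog_on_result(link_watchdog_t *w, bool ack_received)
{
    if (ack_received) {
        w->consecutive_failures = 0; /* any ack proves the session is alive */
        return false;
    }
    if (w->consecutive_failures < UINT8_MAX)
        w->consecutive_failures++;
    if (w->consecutive_failures >= w->rejoin_threshold) {
        w->consecutive_failures = 0; /* start fresh after the rejoin */
        return true;
    }
    return false;
}
```

The application would call `watchdog_on_result` from its confirm callback and switch to its join state when it returns true. Counting failures rather than watching the data rate keeps the decision independent of how the stack steps through its settings.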

The STM32WL code base uses LoRaMac-node, the ‘official’ code, which has the Link Check built in. You may need to adjust the regional parameters to suit your situation.

For LoRaWAN overall, you should plan for some data loss, including some outage for a sustained period. Again, there are things you can do, such as sending a moving window of data or saving summaries on the device, if appropriate.
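
A moving window can be as simple as a small ring buffer whose contents ride along in every uplink, so that one lost frame is covered by the next. The sketch below is only an illustration: `window_t`, `window_push`, `window_pack`, the window size of 4 and the 2-byte big-endian encoding are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW 4 /* assumption: how many recent readings ride in each uplink */

typedef struct {
    int16_t samples[WINDOW];
    uint8_t count; /* valid samples, saturates at WINDOW */
    uint8_t head;  /* index where the next sample will be written */
} window_t;

/* Store a new reading, overwriting the oldest once the window is full. */
void window_push(window_t *w, int16_t sample)
{
    w->samples[w->head] = sample;
    w->head = (uint8_t)((w->head + 1) % WINDOW);
    if (w->count < WINDOW)
        w->count++;
}

/* Pack the window newest-first, big-endian, 2 bytes per sample.
 * Returns the number of bytes written to out. */
size_t window_pack(const window_t *w, uint8_t *out, size_t out_len)
{
    size_t n = 0;
    for (uint8_t i = 0; i < w->count && n + 2 <= out_len; i++) {
        uint8_t idx = (uint8_t)((w->head + WINDOW - 1 - i) % WINDOW);
        uint16_t v = (uint16_t)w->samples[idx];
        out[n++] = (uint8_t)(v >> 8);
        out[n++] = (uint8_t)(v & 0xFF);
    }
    return n;
}
```

With a window of 4 and a 6-hour interval, each uplink carries the last 24 hours of readings, at the cost of a payload four times larger. A zero-initialised `window_t` is a valid empty window.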

jayjay75 · April 18, 2022, 1:28am

Thanks @descartes for the explanations, I’m getting there.

Can you share the link to the source code, please? I would like to check that section out to understand how it’s implemented.

Also, how would I know whether it’s the gateway that is out or TTN that is not finding matching session keys, as described in the situation above?
If it’s the gateway that is out, then I wouldn’t need to rejoin and could instead just sleep and try again at a later time. But if it’s a TTN issue, then I would need to force a rejoin, so I need a way to distinguish between those two situations. How do you all do that?

cheers,
Jay

descartes · April 18, 2022, 12:50pm

You wouldn’t because you won’t get a response in either situation.

The muddle over DevEUIs should be a one-time learning experience for you, so something you shouldn’t need to code around. Not that I’m sure where to start with trying to code around something like that without it becoming a potential liability for the firmware, thereby introducing other issues.

So if you work on the basis that devices are provisioned properly, the LoRaWAN spec will solve the gateway out of range / down problem as best it can for you.

The official LoRaMac-node code is on GitHub, as are the various flavours of STM32WL implementations; the official ST version is in the CubeMX package from their site.

jayjay75 · April 20, 2022, 1:54am

Thank you @descartes again for the explanations.
Well, after reviewing the CubeCell firmware I noticed that it was missing a few events, so I added those. Now I have an event triggered on Rx timeout, so I receive the timeouts properly when the end device can’t reach TTN because another device was provisioned after it. I also noticed that the LoRaWAN stack keeps lowering the DR properly and calls the timeout event every time it can’t reach the gateway (or the server, in my case), looping through the NBTrials count, which gives me a good start to tinker with how to force a rejoin.

So far I have put in a switch and an if statement that force a rejoin when the NBTrials count reaches its highest number (8 in my case, as set in the firmware) and the DR reaches DR_0, which works great so far. I still have to test this with the situation where the end device is simply too far from the gateway and needs to go through all of SF7 to SF12, because right now I can’t tell whether it goes all the way to SF12 or is simply still at SF7 with DR_0. Once I figure out how to know that the end device goes through all the SFs and DRs, I can at least guess that something is wrong and force a rejoin when the gateway or the server is unreachable at the highest SF12 and DR_0.
This is how I have it set right now in my RxTimeout event. Any suggestions are welcome, please.

int timeoutTxDutyCycle; // temporary retry interval (ms) used while acks are missing

void downLinkAckHandle(McpsConfirm_t *mcpsConfirm)
{
  switch (mcpsConfirm->Status)
  {
  case LORAMAC_EVENT_INFO_STATUS_RX1_TIMEOUT:
  case LORAMAC_EVENT_INFO_STATUS_RX2_TIMEOUT:
  case LORAMAC_EVENT_INFO_STATUS_RX1_ERROR:
  case LORAMAC_EVENT_INFO_STATUS_RX2_ERROR:
    if (!mcpsConfirm->AckReceived && (int)mcpsConfirm->Datarate == 0)
    {
      // no ack and already at DR_0: give up on this session and re-join
      printf("something is wrong, let's re-join");
      deviceState = DEVICE_STATE_INIT;
    }
    else if (!mcpsConfirm->AckReceived)
    {
      // no ack yet: temporarily shorten the duty cycle to about 1 minute
      // (normal is 6 hours) so the next attempt comes quickly
      timeoutTxDutyCycle = (1 * 60 * 1000) + randr(-APP_TX_DUTYCYCLE_RND, APP_TX_DUTYCYCLE_RND);
      appTxDutyCycle = timeoutTxDutyCycle;
      deviceState = DEVICE_STATE_CYCLE;
    }
    break;
  default:
    // any other status (e.g. success): restore the normal 6-hour duty cycle
    // if it had been shortened above
    if (appTxDutyCycle == timeoutTxDutyCycle)
    {
      appTxDutyCycle = 6 * 60 * (60 * 1000); // back to 6 hours
      deviceState = DEVICE_STATE_CYCLE;
    }
    break;
  }
}

Or do you think the above is risky?

Cheers,
Jay

descartes · April 20, 2022, 8:28am

Are you trying to write code specifically to deal with duplicate EUIs out in the wild?

If so, I can’t see how re-joining is going to help if both devices have the same setup. You’ll just hijack the second device, and then they’ll ping-pong between them for all eternity.

As for advice, apart from being careful not to mess with the stack, I’d create a workflow that prevents two identically provisioned devices from ever being deployed.

jayjay75 · April 20, 2022, 12:22pm

Hi,
Not necessarily just for that situation. I’m trying to code around any situation where the end device can no longer reach the server, whether due to a duplicate or other reasons, and force a rejoin. As I noticed, if I don’t do a rejoin the end device will continue to send its packets into the wind forever…

A similar issue can surface if, for example, we migrate from one TTN application to another. According to TTN I would have to delete the end device from the old application and recreate it in the new one. That is something I would have to test, as I’m not sure whether TTN will re-route the traffic to the newly created application or the device would have to re-join to be recognized under the new application. Another situation is when one decides to change the server completely, going from TTN to a client-chosen server (maybe ChirpStack). How should an end device deal with this, apart from detecting the timeouts and doing a rejoin after a certain number of failures? Obviously I’m just trying to cover the most basic situations so that we won’t have a rogue end device in the wild. Maybe I’m overthinking and complicating this…
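
If the firmware does rejoin after repeated failures, a common precaution is to space the join attempts out with a growing, randomised delay, so that a fleet that lost the server at the same time does not retry in lockstep. The sketch below is one way to do that; the function name, the 60-second base and the one-day cap are assumptions, not values from any specification.

```c
#include <stdint.h>

#define JOIN_BACKOFF_BASE_S 60u    /* assumption: first retry after about a minute */
#define JOIN_BACKOFF_MAX_S  86400u /* assumption: never wait more than a day */

/* Delay in seconds before join retry number `attempt` (0 = first retry).
 * Doubles on each attempt up to the cap; `jitter_s` is a caller-supplied
 * random offset so devices do not all retry at the same moment. */
uint32_t join_retry_delay_s(uint8_t attempt, uint32_t jitter_s)
{
    uint32_t delay = JOIN_BACKOFF_BASE_S;
    for (uint8_t i = 0; i < attempt && delay < JOIN_BACKOFF_MAX_S; i++)
        delay *= 2u;
    if (delay > JOIN_BACKOFF_MAX_S)
        delay = JOIN_BACKOFF_MAX_S;
    return delay + jitter_s;
}
```

The jitter would typically come from the platform's random source, for example the `randr` helper already used in the handler posted earlier in this thread.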

descartes · April 20, 2022, 3:15pm

You may benefit from reading “TR007 Developing LoRaWAN® Devices” which has recommendations from the LA on such things.

jayjay75 · April 21, 2022, 2:05am

Thank you, exactly what I was looking for.
Here is the link in case others need it: https://lora-alliance.org/wp-content/uploads/2021/05/TR007_Developing_LoRaWAN_Devices-v1.0.0.pdf

cheers,
Jay
