LoRaWAN simulation issues

Hi there!

First of all, forgive me if I am posting this new topic in the wrong place; I am a newbie here and I was not able to decide where to put it.

Anyway, I would like to explain the issue I have as thoroughly as possible; it is related to the Semtech UDP Packet Forwarder communication. I am developing an application to simulate some LoRaWAN devices, and I need to connect them to the TTN platform. I downloaded the TTN stack (with Docker) and was able to set it up properly: everything works well as far as the gateway “stat” packets are concerned. But when I try to simulate the “join-accept” procedure, something weird happens when I execute it three times in a row; from then on, it answers with the error message shown below:

join fail
More clues:

  • The issue happens when this is executed against the Docker stack, but when I execute the same code against The Things Network console (https://console.thethingsnetwork.org/) it works perfectly every time!
  • Device -> MAC V1.0.3 & PHY V1.0.3 REV A.
  • The test cases I performed were the following:

A)
GW connect
ND connect
ND disconnect
…~2min…
ND connect
OK

B)
GW connect
ND connect
ND disconnect
ND connect
!OK

C)
GW connect
ND connect
ND disconnect
ND connect
!OK
GW disconnect
ND connect
OK

Is there a configuration difference between the console and the Docker stack? Is there something I am missing? Thanks beforehand, looking forward to hearing from you.

Kind regards.
Daniel.

Looks like you try again too soon, resulting in airtime issues for the gateway. Every device needs to observe airtime limits, and when you ask for too many transmissions in too short a time, the gateway has no airtime available, so the stack fails to schedule the downlink because there is no available gateway.
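For a sense of scale, here is a rough sketch in Kotlin of the standard LoRa time-on-air calculation (SX127x datasheet formula; coding rate 4/5, explicit header and CRC assumed, the example values are only an illustration, not taken from this thread):

    import kotlin.math.ceil
    import kotlin.math.max
    import kotlin.math.pow

    // Rough LoRa time-on-air in milliseconds for a PHY payload of `payloadBytes`.
    fun timeOnAirMs(payloadBytes: Int, sf: Int, bwKhz: Double = 125.0): Double {
        val tSym = 2.0.pow(sf) / bwKhz                     // symbol duration in ms
        val de = if (sf >= 11 && bwKhz <= 125.0) 1 else 0  // low data rate optimisation
        val payloadSymbols = 8 + max(
            ceil((8.0 * payloadBytes - 4 * sf + 44) / (4.0 * (sf - 2 * de))) * 5,
            0.0
        )
        return (8 + 4.25) * tSym + payloadSymbols * tSym   // preamble + payload
    }

    // Example: a 33-byte join-accept at SF9/125 kHz is roughly 247 ms on air, so a
    // burst of join attempts quickly eats the duty-cycle budget of the downlink channel.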

Hi @dpergarc, your post says:

Are you running a real gateway or have you built a software simulation of a gateway and devices?

If it works as you expect against the TTN V2 stack on the community network but not against the TTI V3 stack that you have built, have you used a network sniffer like Wireshark to compare the legacy UDP protocol traffic between the working and non-working cases?

Hi guys!

First of all, thank you for answering so quickly; let me go through what you recommend:

Yes, that is true for a scenario with real devices, taking into consideration the receive-window limitations of LoRaWAN, but this is not the case here: I am simulating both devices and gateways in software through the UDP Packet Forwarder protocol. What I do not get is why the same packet sent to two different destinations (TTN console vs. TTN stack) behaves differently, working on one side and not on the other.

It is a software simulation of both devices and gateways; there are no real gadgets… yet. If everything works as expected, I will move on to a real case.
Well, I can run the test you are suggesting, but I believe the real question here is: what are the differences between the TTN V2 stack and the TTI V3 stack? Since I am sending exactly the same packet, the difference cannot be on my side.

Additional information:
I am attaching the issue with further details (see joinError.json (4.7 KB)), as it might give you some clues about what is going on.

Thanks beforehand, and looking forward to hearing from you.

Kind regards.
Daniel.

Hi @dpergarc, thanks for the additional information. You say:

This obviously only covers one half of the situation. Your simulation might be the same, but the other half, the V2 stack or the V3 stack, is different; very different. V3 is much more complex.

I think that you need to look in detail at all the packets being sent by the LoRaWAN core and see what is triggering the situation. I think that the answer lies in the bits and bytes at the wire level of the UDP protocol.
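For reference, a minimal sketch in Kotlin of the legacy Semtech UDP framing the packet forwarder uses for uplinks (the gateway EUI and JSON body here are placeholders, not values from this thread):

    import java.nio.ByteBuffer
    import kotlin.random.Random

    // PUSH_DATA datagram: [0] protocol version 0x02, [1..2] random token,
    // [3] identifier 0x00 (PUSH_DATA), [4..11] gateway EUI, [12..] UTF-8 JSON
    // such as {"rxpk":[...]} or {"stat":{...}}.
    fun buildPushData(gatewayEui: ByteArray, json: String): ByteArray {
        require(gatewayEui.size == 8)
        val body = json.toByteArray(Charsets.UTF_8)
        val buf = ByteBuffer.allocate(12 + body.size)
        buf.put(0x02.toByte())                               // protocol version 2
        buf.putShort(Random.nextInt(0x10000).toShort())      // token, echoed back in PUSH_ACK
        buf.put(0x00.toByte())                               // PUSH_DATA identifier
        buf.put(gatewayEui)
        buf.put(body)
        return buf.array()
    }

The server answers with a PUSH_ACK (identifier 0x01) echoing the same token, and downlinks arrive as PULL_RESP (0x03) on the socket that last sent a PULL_DATA (0x02). Comparing these bytes between the V2 and V3 captures is where the difference should show up.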

That doesn't mean airtime limits won't be applied by the server; as far as it is concerned, they are real. And it wouldn't be a very useful simulation if it waived real-world constraints.

(Even if a gateway implements airtime limit accounting itself for legal reasons, the network server still needs to for performance reasons, because it would rather send a downlink through a gateway that has available airtime allowance to transmit it, rather than assign it to one that is just going to drop it.)


The stack does not know your simulation is not a real gateway and a real node, so it will consider them real and handle them appropriately.
The difference in behavior is probably due to the difference in versions of the stack. In V3, lessons learned have been implemented…

Hi there!

Thank you very much for your responses; they give me encouragement to keep going!

I get what all of you are saying… therefore, I can clearly see there is only one way to achieve it, but I do not know whether it is possible or not, so: is there any way to avoid the LoRaWAN limitations for a specific case like this one? The main aim is to simulate the traffic, the coverage and the way gateways could be overloaded; in this very first approach, real devices are not mandatory.

Thanks beforehand, looking forward to hearing from you.

Kind regards.
Daniel.

As the sources are available, there probably is. Then again, I hope not, as everyone always has special use cases where the legal/fair use/whatever limits do not need observing…

Then you really need to include the airtime limits in the simulation!

is there any way to avoid the LoRaWAN limitations for a specific case like this one?

Simulate a setup in a region that doesn’t have downlink airtime limits - but then your result only applies to that region, and not one which does.

But really, if you’re hitting downlink airtime limits, you’re already shooting yourself in the foot, because it means your gateway is spending too much time transmitting and thus unable to hear uplinks from nodes…

Did you accidentally “power on” a bunch of nodes that need to register all at the same time? That’s not very realistic, simulate instead a person putting batteries in them one at a time, or pulling out a battery saver tape or whatever.

Hi @dpergarc, please do not do any form of stress testing on a network that is being used by other people for production work.

None of which particularly needs to replicate the protocol. You can use real life to figure out metrics like transmission time (which can actually be calculated to a reasonable level), typical gateway processing, backhaul responses and network processing times. Then write some algorithms, run simulations and unleash some R to evaluate the results.

It would help if you could tell us why you need to figure this out.

That is an issue we regularly experienced at workshops. Last few workshops I ran I made sure to have multiple gateways on site.

Seems like it could also be handled with better timing randomization at startup and between join attempts.

Also possibly code optimized for such a setting by using only fast SFs for the first five minutes or so before starting to include slower ones.
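A minimal sketch in Kotlin of what that randomisation could look like in the simulator (the delay values are my own assumptions, not requirements from the LoRaWAN specification):

    import kotlin.random.Random

    // Stagger "power-on" instead of starting every simulated node at the same instant,
    // and back off with jitter between join attempts.
    fun powerOnDelayMs(): Long = Random.nextLong(0L, 60_000L)    // spread start-up over a minute

    fun joinRetryDelayMs(attempt: Int): Long {
        val base = 10_000L shl attempt.coerceAtMost(5)           // exponential back-off, capped
        return base + Random.nextLong(base)                      // plus random jitter
    }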


Hi guys!

Thank you very much again for answering so quick!!!

Do not worry, I am not trying to execute this in a real scenario; I am using the TTN stack with Docker on my computer :slight_smile:

That might be (it actually makes sense), but I am afraid this is not the case, since I am doing the following sequence:

  1. Join-accept procedure → OK.
  2. Wait 15 minutes.
  3. Join-accept procedure → OK.
  4. Wait 15 minutes.
  5. Join-accept procedure → NO OK.

That is weird; I am not sending anything else to the gateway, neither from the emulated node nor from any other node.

I will look into it, just to run a little test to rule out the airtime limitations - would you mind telling me one such region? It might be quicker, I guess :smiley:

Thanks beforehand, looking forward to hearing from you soon.

Kind regards.
Daniel.

How are you implementing time and timestamps in your simulation?

Hi there!

I am attaching a real example of the Semtech UDP packets used this morning (that is to say, both the rxpk and the stat packets, which correspond to the first steps of the communication):

/*******************************************************************************************/

  • Attempt 1 - OK

Stat
{"time":"2020-10-29 08:29:44 GMT","lati":0,"long":0,"alti":0,"rxnb":0,"rxok":0,"rxfw":0,"ackr":100,"dwnb":0,"txnb":0}
Rxpk
{"time":"2020-10-29T08:29:47.678004Z","tmst":1603956587,"powe":14,"chan":0,"rfch":0,"freq":868.1,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","rssi":-35,"lsnr":5,"size":46,"data":"AO/Nq5B4VjQSAJmqu8zd7v/Me0zhInc="}

  • Attempt 2 - OK

Stat
{"time":"2020-10-29 08:32:06 GMT","lati":0,"long":0,"alti":0,"rxnb":0,"rxok":0,"rxfw":0,"ackr":100,"dwnb":0,"txnb":0}
Rxpk
{"time":"2020-10-29T08:32:08.878004Z","tmst":1603956728,"powe":14,"chan":0,"rfch":0,"freq":868.1,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","rssi":-35,"lsnr":5,"size":46,"data":"AO/Nq5B4VjQSAJmqu8zd7v9wb/g86W0="}

  • Attempt 3 - FAIL

Stat
{"time":"2020-10-29 08:32:38 GMT","lati":0,"long":0,"alti":0,"rxnb":0,"rxok":0,"rxfw":0,"ackr":100,"dwnb":0,"txnb":0}
Rxpk
{"time":"2020-10-29T08:32:41.766004Z","tmst":1603956761,"powe":14,"chan":0,"rfch":0,"freq":868.1,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","rssi":-35,"lsnr":5,"size":46,"data":"AO/Nq5B4VjQSAJmqu8zd7v/d1L8pvIA="}

  • Note

Most of the information has been made up, but I also tried simulating with random values and got the same wrong result.

/*******************************************************************************************/

I hope this helps, if you need further information please, do not hesitate and ask for it, I will be glad to share it!

Thanks beforehand, looking forward to hearing from you.

Kind regards.
Daniel.

As I suspected you have not correctly simulated the hardware timestamp.

Your packets purport to be a couple of minutes apart, but their timestamps are only 141 microseconds apart, which is simply impossible - those packets would overlap!

As a guess, you are putting a Unix epoch time in seconds (1603956587 and so on) into a field that expects a free-running microsecond counter, so consecutive packets end up only a handful of "microseconds" apart.

Or at least something like that.

Your fake packets need to have hardware timestamps which show a progression of time roughly matching the actual progression of time between when they are submitted.

Assuming you are running your simulation in real time, what you need to do is convert the time since program start to microseconds, mask it to 32 bits so it rolls over, and use that.
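A minimal sketch in Kotlin of what that could look like, assuming the simulation runs in real time (the names are mine, purely illustrative):

    // Free-running 32-bit microsecond counter, like the concentrator's tmst field.
    val simulationStartNanos = System.nanoTime()

    fun currentTmst(): Long {
        val micros = (System.nanoTime() - simulationStartNanos) / 1_000
        return micros and 0xFFFF_FFFFL   // keep the low 32 bits so the counter rolls over
    }

Two packets submitted two minutes apart would then differ by roughly 120 000 000 in tmst, instead of 141.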

Modeling the uplink airtime is relatively unimportant, unless you're also trying to do your own accounting of when frequencies are occupied (though in that case you have to model real-world radio behavior too, like close nodes blanking out distant ones on other channels, and some notion of to what degree you model the theoretical orthogonality of distinct spreading factors on the same channel at the same time).


Hi there!

Thank you very much @cslorabox, that seems to be the key!!! I am currently hard-coding that with random values, just as shown below:

  • 7911574
  • 8449704
  • 8892065
  • 9220230
  • 9947056
  • 404184
  • 2161830

So, according to what you say, the real operation here should be:

  1. Calendar.getInstance().timeInMillis * 1000 // microseconds value → For instance, right now: 1604063010987524.
  2. Delete the first six digits → in order to have a 32-bit value.

Is that correct? Could it be a random value emulating steps of two minutes?

Thanks beforehand, looking forward to hearing from you.

Kind regards.
Daniel.

No, you need to mask to a 32-bit value (keep only the low 32 bits, i.e. take the value modulo 2^32). That is not an operation which has any direct equivalent in dropping decimal digits, but it is rather elementary computer programming.
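In other words, something like this (Kotlin), keeping your wall-clock step 1 but replacing the digit-dropping with a bit mask:

    import java.util.Calendar

    val micros = Calendar.getInstance().timeInMillis * 1000L   // milliseconds -> microseconds
    val tmst = micros and 0xFFFF_FFFFL                         // low 32 bits, i.e. the value modulo 2^32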

I’m sorry to say that your approach to all of this is a bit haphazard - if you want to end up with a result that has any real meaning, you’re going to need to first take time to better understand how such a network works.