Start OTAA join on SF7BW125 or SF12BW125? (EU 868MHz)

When doing OTAA join (ADR either enabled or disabled):

  1. LMIC-Arduino tries to OTAA join at SF7BW125 first. If the join does not succeed after some attempts, it tries SF8BW125; if that does not succeed either, it tries SF9BW125, and so on up to SF12BW125. Later on, ADR should kick in (if ADR-based changes are required).

  2. The (LoRaMAC-node based) LoRaWAN library included with the STM32L0 Arduino core starts OTAA join at SF12BW125. Then after some time ADR kicks in to guide the node to the preferred SF/BW.
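In pseudocode the difference is roughly this (illustrative names only, not actual LMIC or LoRaMac-node APIs):

```cpp
// Illustrative pseudocode only; these are invented names, not LMIC or LoRaMac-node APIs.

// Method 1 (LMIC-Arduino style): start at the fastest data rate, slow down on failure.
// EU868 data rates: DR5 = SF7BW125 ... DR0 = SF12BW125.
int method1JoinDataRate(int failedAttempts) {
    const int attemptsPerStep = 2;                    // a couple of tries per data rate
    int dr = 5 - failedAttempts / attemptsPerStep;
    return dr < 0 ? 0 : dr;                           // bottom out at SF12BW125
}

// Method 2 (LoRaMac-node / STM32L0 core style): always join at SF12BW125 and
// let ADR raise the data rate after the join has succeeded.
int method2JoinDataRate(int /*failedAttempts*/) {
    return 0;
}
```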

Method 1: If the node is unable to reach a(ny) gateway at SF7 but it is able to reach a gateway at SF11 then it can take quite a long time before it can actually join the network.
For distant nodes this is not very power friendly. This method is also a disadvantage for mobile (moving) nodes, because such a node needs to join as soon as possible (before it moves out of reach).

Method 2: If any gateway can pick up your signal at all, it will probably pick up the very first join request. This gives a more or less instant join experience.

With method 2 I have actually seen ADR come into action. I never saw ADR come into action with method 1 (which probably depends on my situation and limited test setup).

Questions:

  • Two different methods. Are both methods LoRaWAN compliant?
  • Which is preferred (or required) and why?

Maybe @htdvisser or @arjanvanb can shine some light on this?

The 1.0.2B spec says that the JOIN_REQUEST can be transmitted at any data rate.

Some carriers have language like this in their specs (machineQ, US915):

"All LoRaWAN end devices operating in OTAA mode MUST transmit
a join request on a different channel for each transmission, across
all 72 channels, at the lowest data rate for each channel, and at the
maximum power level. "

Hence this is what the LoRaWAN class does.

Thanks.
I am still curious why LMIC(-Arduino) does exactly the opposite: start at the highest data rate (with worst coverage).

I’d go out on a limb and speculate: if you can get a join accept at a higher data rate, you’ll waste less of your duty cycle, because substantially less airtime is involved.

However, that does not hold water: per spec you are supposed to back off appropriately after a failed join request. So if your first attempt fails, you might pay a higher price then. Also, TTN mitigated the duty-cycle problem by aggressively using the RX2 window as well.

The back-off is only mandatory for uplinks that:

  1. Require a response
  2. Are retransmitted if there is no response
  3. Can be triggered by an external event (which may synchronize transmissions from many devices)

It is recommended to implement it in devices that don’t meet all the criteria to make it mandatory.

The back-off mechanism has quite simple requirements:

  1. There is a random delay between the end of RX2 and the retransmitted uplink.
    If you have 1000 devices, they should obviously not all follow the same retransmission pattern.
  2. In the first hour after a reset, the aggregated transmission time must be below 36 seconds
  3. In the 10 hours after the first hour, the aggregated transmission time must be below 36 seconds
  4. After that, the aggregated transmission time must be below 8.7 seconds per 24 hours
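For illustration, a minimal sketch of those budgets (my own code, not from the spec or any stack):

```cpp
#include <cstdint>

// Minimal sketch (my own, not from the spec or any stack) of the join back-off
// budget listed above: returns the aggregated TX time (ms) allowed in the window
// that the current time since reset falls into.
uint32_t txBudgetMs(uint32_t msSinceReset) {
    const uint32_t HOUR_MS = 3600UL * 1000UL;
    if (msSinceReset < HOUR_MS)        return 36000;  // first hour: 36 s
    if (msSinceReset < 11UL * HOUR_MS) return 36000;  // next 10 hours: another 36 s
    return 8700;                                      // afterwards: 8.7 s per rolling 24 h
}
// The device still has to keep its own aggregated-airtime counter per window and
// add a random delay after RX2 before every retransmission (requirement 1 above).
```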

There are a couple of strategies that we’ve seen in the wild. Most strategies roughly follow the “re-establish connection” procedure from the LoRaWAN Best Practices document that is being written by the LoRa Alliance. The procedure works by:

  1. Restoring the TxPower to the maximum allowed TxPower
  2. Progressively decreasing the data rate (increasing the spreading factor SF7->SF8->SF9->SF10->SF11->SF12) until you get a response.
    In each step you should make 3 attempts to account for “normal” packet loss or for gateways that don’t have the duty-cycle available for downlink.

Of course you can make this smarter if you already have knowledge about the coverage in the area where your node is deployed. The approach for static nodes would be to survey the location and set an appropriate starting data rate for the procedure (this also means that you probably need a mechanism to update this setting when the network becomes more dense). Another approach could be to have your node record data rates of successful past uplinks, store that to persistent memory, and use it to choose the starting point for the reconnect procedure.
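A rough sketch of that reconnect procedure, as I read it (the helper declarations are placeholders for whatever your stack actually provides):

```cpp
// Rough sketch of the reconnect procedure described above; my own reading, with
// placeholder declarations standing in for the real radio stack.
enum { DR_SF12 = 0, DR_SF7 = 5, MAX_TX_POWER_DBM = 14 /* EU868 */ };
void setTxPower(int dBm);
void sendJoinRequest(int dataRate);
bool waitForJoinAccept();
void backOff();                     // random delay + duty-cycle budget from the spec
int  loadLastGoodDr(int fallback);  // read from persistent memory, if available
void saveLastGoodDr(int dr);

bool reconnect() {
    setTxPower(MAX_TX_POWER_DBM);            // 1. restore the maximum allowed TX power
    int dr = loadLastGoodDr(DR_SF7);         // optionally start from a known-good data rate
    for (; dr >= DR_SF12; --dr) {            // 2. DR5 (SF7) down to DR0 (SF12)
        for (int attempt = 0; attempt < 3; ++attempt) {   // 3 tries per data rate
            sendJoinRequest(dr);
            if (waitForJoinAccept()) {
                saveLastGoodDr(dr);          // remember what worked for next time
                return true;
            }
            backOff();
        }
    }
    return false;                            // nothing reachable; sleep long and retry later
}
```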

One last thing I wanted to mention is that a single SF12 join consumes just as much airtime as SF7+SF8+SF9+SF10+SF11 joins.
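For reference, plugging a 23-byte join request into the SX127x time-on-air formula (BW125, CR4/5, 8-symbol preamble, low-data-rate optimization at SF11/SF12) gives roughly 62/113/206/371/823/1483 ms for SF7..SF12, so SF7+...+SF11 ≈ 1575 ms versus ≈ 1483 ms for a single SF12 join:

```cpp
#include <cmath>
#include <cstdio>

// Time on air (ms) of one LoRa frame, using the formula from Semtech's SX127x
// datasheet: 8-symbol preamble, explicit header, CRC on, coding rate 4/5,
// low-data-rate optimization for SF11/SF12 at 125 kHz. A join request is 23 bytes.
double timeOnAirMs(int sf, int payloadBytes) {
    const double bwHz = 125000.0;
    const int cr = 1;                                        // 4/5
    const int de = (sf >= 11) ? 1 : 0;                       // low data rate optimize
    double tSym = std::pow(2.0, sf) / bwHz * 1000.0;         // symbol time in ms
    double num  = 8.0 * payloadBytes - 4.0 * sf + 28 + 16;   // explicit header, CRC on
    double nPayload = 8 + std::fmax(std::ceil(num / (4.0 * (sf - 2 * de))) * (cr + 4), 0.0);
    return (8 + 4.25) * tSym + nPayload * tSym;              // preamble + payload
}

int main() {
    double sum = 0.0;
    for (int sf = 7; sf <= 12; ++sf) {
        double t = timeOnAirMs(sf, 23);
        std::printf("SF%d: %7.1f ms\n", sf, t);
        if (sf <= 11) sum += t;
    }
    std::printf("SF7..SF11 combined: %.1f ms, single SF12: %.1f ms\n",
                sum, timeOnAirMs(12, 23));
    return 0;
}
```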


I assume that a draft is currently not publicly available?

I’m afraid not. It hasn’t been published yet, and the draft that I have falls under our LoRa Alliance NDA.

I already suspected that. :sunglasses:

Would be interesting to have access to this “LoRaWAN Best Practices” document.

Back-off is mandatory for the join request as well; the 1.0.2 spec actually lists that as an example of why back-off is needed (page 37). What I was missing in the spec, though, is: what is this random delay, min/max?

The re-establish connection mechanism does not seem all that different from the ADR procedure described in 4.3.1.1, except that an “attempt” is ADR_ACK_DELAY frames.

Is the “LoRaWAN Best Practices” document then suggesting something along the lines of a node-based ADR scheme? I.e. one where the data rate is not under the application’s control, but the stack’s?
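For reference, my reading of the node side of 4.3.1.1 boils down to something like this (sketch only, EU868 defaults):

```cpp
// Sketch of how I read the 4.3.1.1 fallback (stack-level, not application code).
// EU868 defaults: ADR_ACK_LIMIT = 64, ADR_ACK_DELAY = 32.
const int ADR_ACK_LIMIT = 64;
const int ADR_ACK_DELAY = 32;

struct AdrState {
    int adrAckCnt = 0;   // uplinks sent since the last downlink of any kind
    int dataRate  = 5;   // DR5 = SF7BW125 ... DR0 = SF12BW125
};

// Call before each uplink; returns whether the ADRACKReq bit should be set.
bool beforeUplink(AdrState &s) {
    if (s.adrAckCnt >= ADR_ACK_LIMIT + ADR_ACK_DELAY) {
        if (s.dataRate > 0) --s.dataRate;   // still no answer: step down one data rate
        s.adrAckCnt = ADR_ACK_LIMIT;        // step down again after ADR_ACK_DELAY more uplinks
    }
    ++s.adrAckCnt;
    return s.adrAckCnt >= ADR_ACK_LIMIT;
}

// Call whenever any downlink is received.
void onAnyDownlink(AdrState &s) { s.adrAckCnt = 0; }
```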

The airtime (TX) argument is interesting, except for the back-off: worst case, you may wait around quite a while before you end up at a spreading factor that can connect. Guess I need to dig through more operator recommendations to see what they consider best. The LoRaMac-node code relies on the application to pass in the data rate per request, but internally filters out illegal selections.

“Spreading Factor: Orange recommendation is to use the SF12 by default at device initialization in order to maximize the efficiency of the communication, even under difficult radio conditions.”

Not sure this recommendation is for JOIN_REQUEST as well …

Found this regarding the Microchip modules (it discusses recommendations vs. reality):

https://www.microchip.com/forums/m974487.aspx

So they changed the default to SF12, but recommend that the user change it to SF7 …

Here from KPN:

“Reference configuration. Each End-Device must be able to use a reference configuration for troubleshooting. The reference configuration is a Class A device with ADR ON, which sends the first message on Spreading Factor (SF) 12.”

There is one more detail I blanked on. With OTAA you have a 16-bit DevNonce. There has been a lot of chatter regarding security, and also changes between 1.0.2 and 1.1. At the end of the day, for a given DevEui/AppEui combination you should never reuse a DevNonce. So there are only 65536 JOIN_REQUESTs that you can ever send. Going from SF7 to SF12 potentially wastes 6 of those for no good reason.
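One way to never burn a nonce twice is to persist a counter instead of drawing random values, e.g. (sketch, with made-up nvmRead/nvmWrite storage helpers, along the lines of what 1.1 mandates anyway):

```cpp
#include <cstdint>

// Sketch only: nvmRead/nvmWrite are placeholders for whatever non-volatile
// storage the device has (EEPROM, flash page, FRAM, ...).
uint16_t nvmRead();
void     nvmWrite(uint16_t value);

// LoRaWAN 1.1 style: DevNonce is a counter that must never repeat for a given
// DevEUI/JoinEUI, so persist it across resets instead of picking it at random.
uint16_t nextDevNonce() {
    uint16_t nonce = nvmRead();
    nvmWrite(static_cast<uint16_t>(nonce + 1));
    return nonce;
}
```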


Looks like I am utterly not awake yet to have missed that.

If you join, realistically you need to scan all join channels for a given data rate. For EU868 this is 3 channels.

Hence a single SF12 join is less airtime than the worst case of (SF7+SF8+SF9+SF10+SF11) * 3.

LMIC for EU868 seems to lower the data rate every 2 join attempts (nextJoinState()). Another repository that has added more regions to LMIC properly steps through all the default channels before lowering the data rate.
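Something along these lines, i.e. cycle through the default channels before lowering the data rate (illustrative only, not the actual nextJoinState() code):

```cpp
// Illustration of the "all default channels first, then lower the data rate" order;
// not the actual LMIC nextJoinState() implementation.
struct JoinTrial { double freqMHz; int dataRate; };

// EU868 default join channels; DR5 = SF7BW125 ... DR0 = SF12BW125.
JoinTrial nextJoinTrial(int attempt) {
    static const double kChannels[3] = { 868.1, 868.3, 868.5 };
    int dr = 5 - (attempt / 3);           // lower the DR only after all 3 channels failed
    if (dr < 0) dr = 0;                   // stick to SF12 once the bottom is reached
    return { kChannels[attempt % 3], dr };
}
```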

Please share the location. This might benefit others too.

Why is that?

So that’s the time for a single (repeated) uplink/join? (Not the total airtime for all different uplinks of a single node, leaving the TTN Fair Access Policy aside.)

Because the premise of using a lower spreading factor for join is to minimize the time on air for the gateway. So rather than forcing it to a higher spreading factor, it’s more reasonable to try out all 3 channels (for EU868).

There is also the secondary thought that you might want to continue with the same spreading factor that you were able to join with. In that case it seems desirable to aim for the lowest spreading factor you can get away with.

No question goes without an answer here, though it may take some time. :wink: I think this may be the one?


Late to the discussion, but I have been looking into this with ChirpStack.
ChirpStack uses the Semtech-recommended ADR algorithm.
It works very well for controlling devices with high uplink SNR figures at the gateway, as it can command TX power both up and down to optimize the link.
BUT: this ADR engine cannot step down in data rate, so if you are using LMIC and start out at SF7 straight after join, and the end device is on the SNR margin, it can get "stuck" there because ADR cannot send commands to reduce the data rate and improve the uplink SNR margin.
It's a compromise I guess; better to have the option to start LMIC at lower data rates.
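For context, my simplified understanding of that server-side ADR: it works from an SNR margin and only ever moves the data rate up (sketch below, not the actual ChirpStack code; required-SNR values from the SX1276 datasheet, 10 dB default installation margin):

```cpp
// Simplified view of the Semtech/ChirpStack server-side ADR logic (not the actual code):
// compute a margin from the best recent uplink SNR, then raise the data rate and/or
// lower TX power in 3 dB steps while the margin is positive. There is no branch that
// lowers the data rate again, which is the "stuck at SF7" problem described above.
float requiredSnr(int sf) {
    static const float table[] = { -7.5f, -10.0f, -12.5f, -15.0f, -17.5f, -20.0f }; // SF7..SF12
    return table[sf - 7];
}

int adrSteps(float maxUplinkSnr, int sf, float installationMargin = 10.0f) {
    float margin = maxUplinkSnr - requiredSnr(sf) - installationMargin;
    return static_cast<int>(margin / 3.0f);   // >0: raise DR / lower power; <0: raise power only
}
```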