Trouble getting stable OTAA on SF12 with a RN2483 module


(Morten Guldager) #1

I am experiencing some trouble getting my OTAA’s accepted reliably.

It seems that the lower the data rate I use, the less likely the join is to be accepted. When tracing the UDP:1700 traffic on the network it looks like the join accept arrives too late, even though the regular PULL_ACKs are reported to have only about 45ms delay.
Some timing measurements done on the gateway (ethernet):

  • with “mac set dr 5” I measure 4045ms between join request and join ack
  • with “mac set dr 3” I measure 4045ms between join request and join ack
  • with “mac set dr 1” I measure 5041ms between join request and join ack

And that response time of more than 5 seconds is enough to be too late for the RN2483 to catch the ack, which then of course responds “denied”.

In the TTN console gateway traffic view I see the same ~4 seconds on the two first joins and the ~5 seconds on the last one.

My setup is boiled down to the simplest I could make it:

  • A RN2483 module hooked to a serial terminal for me to act as the node
  • A gateway consisting of an IC880A-spi wired to a RPi
  • “node” and gateway are a few meters apart, I have a -20dB dampener on the gateway antenna line and -30dB on the node antenna to make sure I do not pick up other traffic.

The gateway runs the regular Semtech packet-forwarder, communicating on UDP:1700. I have been using the same hardware and software with loraserver without any problems, so I think the wiring is good. Only global_conf.json and local_conf.json are changed to match TTN.

I would appreciate suggestions on how to debug this. What other information is needed?

Also please let me know if I should have posted somewhere else instead - I couldn’t really figure out the optimal category for this mess

/mogul


(Jac Kersing) #2

For SF11 and higher the join response will be in RX2 which matches the slightly over 5 seconds delivery of the data to the gateway. (RX2 for joins is 6 seconds after transmission)
However the RN2483 has a known issue of some modules not being able to join at SF12 which might hold true for SF11 as well. Basically the module seems not to wait for RX2.
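
To put numbers on the two windows, here is a minimal sketch. It assumes the LoRaWAN default join delays (5 seconds for RX1, 6 seconds for RX2) and that the delays measured on the gateway are roughly relative to the end of the uplink. The function name `earliest_join_window` is made up for illustration.

```python
# Default LoRaWAN join accept delays, counted from the end of the uplink.
JOIN_ACCEPT_DELAY1_MS = 5000  # RX1
JOIN_ACCEPT_DELAY2_MS = 6000  # RX2


def earliest_join_window(arrival_ms):
    """Return (window, margin_ms) for the first join window the gateway can still serve.

    arrival_ms is the delay between the end of the join request and the moment
    the join accept reaches the gateway (an assumption about what was measured).
    """
    for name, opens_at in (("RX1", JOIN_ACCEPT_DELAY1_MS),
                           ("RX2", JOIN_ACCEPT_DELAY2_MS)):
        if arrival_ms < opens_at:
            return name, opens_at - arrival_ms
    return None, 0  # too late for both windows


for measured in (4045, 5041):
    print(measured, earliest_join_window(measured))
```

Under these assumptions the 5041ms measurement is not late at all: it is simply aimed at RX2, about a second ahead of transmission, just like the 4045ms ones are aimed at RX1.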


(Morten Guldager) #3

Ah I see this is a more widespread issue, now you gave me the hints on what to google for - thanks!

Still I’m a little puzzled that the same RN2483 module connects happily on SF12 to the same gateway if I’m using loraserver or Loriot as the backend.
Is it because the TTN setup is more standards-compliant and that strictness exposes a bug in my RN2483?

It bugs me a tad not being able to use RN2483-based devices reliably on the low data rates; after all it’s a quite popular module, and we have many devices using it. I haven’t found any mention of workarounds yet. Pointers?


(Arjan) #4

For the other networks, you’re probably seeing RX1 for SF11 and SF12 then? Selecting RX1 or RX2 is up to the provider.

Thinking about it, I wonder why TTN prefers RX2 for an OTAA Join Request in SF11 and 12.

(It’s understandable for regular downlinks that are responses to uplinks using a high SF, as then, for RX2, TTN uses SF9 with a higher output power, where the lower SF9 limits the valuable airtime, and limits the time that the half-duplex gateways are “deaf” while transmitting a downlink. But for a Join Response, it needs to use the LoRaWAN defaults anyhow, for which RX2 does not seem to have much advantage over RX1, at least not for EU868? Or maybe the duty cycle is better for RX2?)


(Morten Guldager) #5

Just repeated the test with loraserver, easier to measure when you have your own stack running. Yes, it seems to use RX1 (5 seconds) for the ack on all spreading factors.

Good question, let’s see if one of the devs is up for an answer.

Not sure I understand it with the regular downlinks either. I had the impression the timer starts when the uplink transmission has finished, and then exactly one or two seconds after that last blip the receiver turns on. If that is the case, I see no reason why lower data rates would require a longer delay.


(Arjan) #6

The delay is just a result of using either RX1 or RX2. And using RX1 or RX2 is just a choice of the server.

For regular uplinks, when using RX1 then the downlink uses the same SF as the uplink. The higher the SF, the longer the transmission time. But for RX2, the frequency and SF are fixed, at least for EU868. And for RX2, TTN uses SF9 in EU868 (for the regular downlinks; not for the OTAA Join Accept).

Now:

  • Any regular uplink that uses a low (fast) SF, like SF7 or SF8, would also use that low (fast) SF in RX1, but would use the slower SF9 in RX2. So, TTN prefers RX1 then.

  • For an uplink using a high (slow) SF, like SF11 or SF12, RX1 would also need to use that high (slow) SF, while for RX2 it would again use SF9, which then is faster compared to RX1.

Or, like Hylke wrote:

I guess the same reasoning applies for a Join Accept.
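
To make the RX1 versus RX2 trade-off concrete, here is a rough sketch of the time-on-air formula from the Semtech SX127x datasheets. The assumptions are mine: 125 kHz bandwidth, coding rate 4/5, 8 preamble symbols, explicit header, low data rate optimisation whenever the symbol time exceeds 16 ms, and a 17-byte downlink sent without CRC. The function name `lora_airtime_ms` is just for illustration.

```python
import math


def lora_airtime_ms(payload_bytes, sf, bw_hz=125_000, coding_rate=1,
                    preamble_symbols=8, explicit_header=True, crc=True):
    """Time on air in milliseconds for one LoRa packet (Semtech datasheet formula)."""
    t_sym_ms = (2 ** sf) / bw_hz * 1000.0
    # Low data rate optimisation is used when a symbol lasts longer than 16 ms
    # (SF11 and SF12 at 125 kHz).
    low_dr_opt = 1 if t_sym_ms > 16.0 else 0
    implicit = 0 if explicit_header else 1
    numerator = (8 * payload_bytes - 4 * sf + 28
                 + 16 * (1 if crc else 0) - 20 * implicit)
    denominator = 4 * (sf - 2 * low_dr_opt)
    payload_symbols = 8 + max(math.ceil(numerator / denominator) * (coding_rate + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym_ms


# Assumed example: a 17-byte downlink without CRC at each spreading factor.
for sf in range(7, 13):
    print(f"SF{sf}: {lora_airtime_ms(17, sf, crc=False):.1f} ms")
```

With these settings each step up in SF roughly doubles the airtime, so answering an SF12 uplink at SF9 in RX2 keeps the gateway transmitting (and deaf) for a fraction of the time an SF12 reply in RX1 would need.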


(Jac Kersing) #7

No need to bother the devs. RX2 frequency has a 10% duty cycle where for RX1 it is 1%, so longer transmissions make more sense in RX2 to prevent the gateway from running out of airtime.
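
As a back-of-the-envelope sketch of what that difference means for the gateway: the 1% and 10% figures are the ones above, while the 1500 ms airtime is a made-up round number for a long downlink, and `max_tx_per_hour` is a hypothetical helper.

```python
def max_tx_per_hour(airtime_ms, duty_cycle_percent):
    """How many packets of the given airtime fit in one hour's duty cycle budget."""
    budget_ms = 3_600_000 * duty_cycle_percent // 100
    return budget_ms // airtime_ms


# Assumed example: a long downlink of about 1.5 seconds.
print(max_tx_per_hour(1500, 1))   # budget on a 1% sub-band
print(max_tx_per_hour(1500, 10))  # budget on the 10% sub-band
```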


(Morten Guldager) #8

OK, let me try to sum this up to see if I got it right.

Case: OTAA is not working reliably when using

  • RN2483 module
  • low data rate - SF12
  • OTAA - OverTheAirActivation

Cause: For low data rate (SF12) OTAA, TTN uses the RX2 window and channel when responding to an OTAA request. The RN2483 module does not receive this RX2 transmission reliably when doing OTAA.

Conclusion: This is a bug in the RN2483 firmware. (or is it just an incompatibility ?)

Background: TTN has decided to use RX2 for the SF12-OTAA-responses from a band utilisation standpoint. RX2 has a much higher duty cycle.

Now the question is if I can do anything to mitigate this? I see a few options, but none are optimal

  • deploy more gateways to reduce the need for SF12
  • stop using OTAA
  • replace existing RN2483 based devices
  • leave TTN, other networks seem more compatible with RN2483

Comments / corrections ?


(Jac Kersing) #9

What about contacting Microchip (not the forum, log a support case with them) and asking for a solution?


(Morten Guldager) #10

If this is actually a bug in the RN2483 module, I have to be able to qualify this very firmly, which is why I’m reaching out here.
It would be VERY easy for Microchip to simply respond “It works everywhere else, it’s a bug in TTN”.
Even if it’s a bug in the RN2483, many devices do not support firmware upgrades anyway.

I just had a look in the LoRaWAN specification 1.0.2 and the corresponding regional parameters 1.0. It is indeed within specs for TTN to use RX2 after 6 seconds for the OTAA join.

If so, it seems that TTN is operating in an unusual way, though fully within specs, while Microchip only implemented the most usual behaviour in the RN2483.

Perhaps TTN needs a “compatibility mode” flag on the devices? To instruct TTN never to use RX2 on OTAA responses.


#11

Is it the RN2483 or the RN2483A, and what firmware version?


(Morten Guldager) #12

The three I have with me right here all respond: RN2483 1.0.1 Dec 15 2015 09:38:09

  • a Libelium module
  • a naked module I have wired up myself
  • a microchip mote

They have been purchased over the last few years


(Jac Kersing) #13

You could at least upgrade to a more recent firmware version… For the non-A modules I would suggest 1.0.3.


(Morten Guldager) #14

Done. Had to fight it a little. (damn you Microchip, that java application, it could use a little love you know)

Anyway, I upgraded to 1.0.4 downloaded here

It appears to have helped some. Earlier I could only get it to join after a factoryRESET or a power cycle, now it will also join after a normal sys reset.

I will do a lot more tests on this to make sure I have covered every corner worth covering. I will get back to this thread if I learn more that would affect the conclusions drawn so far.


#15

:rofl:


#16

I can throw some fuel on this fire …

The Microchip bug you mention was actually a complex one relating to : i) module hardware (A vs non-A), ii) module firmware, iii) LoRaWAN settings (SF12 and long packets), plus iv) SX1301 driver in the gateway

There have already been plenty of comments on i) & ii) so I’ll skip over those, but of course generally newer versions are better than old!

iii) The issue is basically frequency deviation, so the timebase slips out of sync with very long packets, hence worse with SF12, and I’d guess an OTAA join/accept is a long packet. Not much you can do about this other than try to join at a lower SF. I expect you’ll respond that SF12 gives the longest range and greatest chance to reach the network, but it is actually better practice to place the OTAA join in a loop and start at SF7, then work up to SF12, dropping out of the loop when you hear the accept. On average it’s more efficient this way.

iv) Semtech released a SX1301 driver update, I’m going to say it was around Feb 2017. Before that there was a bug making SX1301 intolerant to negative frequency offsets. So an unlucky combination of RN2483(non-A) with negative crystal offset, long SF12 packet and old SX1301 driver would show the problem. I’m flagging this because you mentioned Loriot server worked OK. I think with Loriot you may be using their full gateway agent, which has the newer SX1301 driver. When switching back to TTN, are you using a different packet forwarder and any idea of its build date?
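
The join loop from iii) could look roughly like the sketch below. It only assumes the RN2483 text commands already seen in this thread (“mac set dr”, the “denied” reply) plus “mac join otaa” and its “accepted” reply, with DR5 down to DR0 standing for SF7 up to SF12 in EU868. The `command` and `wait_line` callables and the `FakeModule` class are hypothetical stand-ins for a real serial link, so this runs without hardware.

```python
def join_with_rising_sf(command, wait_line, attempts_per_dr=1):
    """Try OTAA from DR5 (SF7) down to DR0 (SF12); return the DR that joined, or None.

    command(line) sends one command and returns the module's immediate reply.
    wait_line() blocks until the module reports the join result.
    """
    for dr in range(5, -1, -1):
        if command(f"mac set dr {dr}") != "ok":
            continue  # module refused this data rate, try the next one
        for _ in range(attempts_per_dr):
            if command("mac join otaa") != "ok":
                continue  # join request was not started
            if wait_line() == "accepted":
                return dr
    return None


class FakeModule:
    """Stand-in for the serial link; joins only at or below max_dr."""

    def __init__(self, max_dr):
        self.max_dr = max_dr
        self.dr = 5

    def command(self, line):
        if line.startswith("mac set dr "):
            self.dr = int(line.rsplit(" ", 1)[1])
        return "ok"

    def wait_line(self):
        return "accepted" if self.dr <= self.max_dr else "denied"


module = FakeModule(max_dr=3)
print(join_with_rising_sf(module.command, module.wait_line))
```

On real hardware you would also want a pause between attempts to respect the duty cycle; that is left out here to keep the sketch short.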


(Morten Guldager) #17

Sounds a bit like a “manual” ADR, but yes, I do that too. I have a collection of antenna dampeners so I can simulate various degrees of dampening, including some so bad that only SF12 will get through.

I have 3 networks to work with, using the same gateway: Loriot, through a Danish operator, a self hosted loraserver and then TTN.
Only with TTN have I had this problem. I used loraserver with the exact same packet-forwarder and did not have problems using SF12 for OTAA. When using Loriot I use a different gateway, so that is probably not the best reference.
Perhaps I should see if I can figure out how to load new firmware onto the IC880A-SPI board. Pointers?


#18

From what you say about the same gateway on loraserver and TTN, my comment about the SX1301 driver may be a red herring. I was hoping I’d spotted something with the difference on Loriot.

At least it would be worth asking IMST if they are aware of the SX1301 update. It wasn’t well publicised.
This was the release: https://github.com/Lora-net/lora_gateway/releases/tag/v4.1.3


(Morten Guldager) #19

I think I have tried my experimental gateway with Loriot too, but it might have been a different one in my lab. I will redo that test in the not too distant future. Now I know that the test should be done both with Loriot’s software package and with the raw packet-forwarder I’m using with TTN/loraserver.

When I start my lora_pkt_fwd it says:

*** Beacon Packet Forwarder for Lora Gateway ***
Version: 4.0.1
*** Lora concentrator HAL library version info ***
Version: 5.0.1;

I read that as meaning I’m already using the updated SX1301 code. Or do I have to push some microcode to the chip itself?