Downlink Problem (join) OTAA The Things Indoor Gateway(TTIG)

I have the problem that the join fails too often with my node. One answer came from WASN, see here https://github.com/HelTecAutomation/ASR650x-Arduino/issues/48#issuecomment-570359453 He said he had the same problem with it. What should I do?

While the TTIG does have stability issues, I’d expect those to appear as a total loss of uplink and downlink, which is not what is being seen.

My first suspicion would be issues in the node, especially timing related.

Heltec says timing should not be a problem.
WASN had the same problem too. And why would it work well several times and then suddenly stop?
Saying the node has timing problems is always the easiest answer.

The postings in that thread did not seem informed by any great degree of understanding.

When someone has used a scope to verify the node timing in the actual incident where a downlink is missed, then I’ll start to believe that the node timing is correct.

What is a problem with the TTIG is that it isn’t easy to examine local log output from the gateway during a test.

@cslorabox I have now bought a USB digital analyzer. I hope I can add debug code to the Heltec library.

I am having the same issue with my TTIG. Uplink and Join messages are received, but Downlink and Join accepts are not sent. After unplugging and plugging the TTIG back in it works for a day or two and after that the same issue occurs.

For the nodes I am using a Heltec Wireless Stick and an ESP32 LoRa v1.

The timing of those boards must be correct, since with other gateways (in other cities) it works just fine.

So today the issue happened again and I hooked up the TTIG to my PC to see the debug messages. Apparently it has no more TX time, even though it has not sent a single message for two days. Any ideas?

2020-03-04 18:22:09.771 [S2E:VERB] RX 868.3MHz DR5 SF7/BW125 snr=8.2 rssi=-29 xtime=0xAC0001D03CA9C3 - jreq MHdr=00 JoinEui=70b3:d57e:d002:b253 DevEui=4b:aecf:e2d2:61fd DevNonce=14431 MIC=1710948409
2020-03-04 18:22:09.989 [SYS:DEBU]   Free Heap: 16880 (min=16336) wifi=5 mh=7 cups=8 tc=4
2020-03-04 18:22:10.993 [SYS:DEBU]   Free Heap: 18464 (min=16336) wifi=5 mh=7 cups=8 tc=4
2020-03-04 18:22:11.997 [SYS:DEBU]   Free Heap: 18464 (min=16336) wifi=5 mh=7 cups=8 tc=4
2020-03-04 18:22:13.001 [SYS:DEBU]   Free Heap: 18464 (min=16336) wifi=5 mh=7 cups=8 tc=4
2020-03-04 18:22:14.005 [SYS:DEBU]   Free Heap: 18464 (min=16336) wifi=5 mh=7 cups=8 tc=4
2020-03-04 18:22:14.308 [S2E:VERB] ::0 diid=8295 [ant#0] - class A has no more alternate TX time
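For anyone capturing the TTIG debug output to a file, here is a minimal Python sketch that scans for this failure signature. It assumes the line layout shown in the excerpt above (timestamp, `[MODULE:LEVEL]`, message) and only looks for the literal text "has no more alternate TX time". The function names `parse_line` and `find_dropped_downlinks` are made up for illustration and are not part of any TTIG or Basic Station tooling.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log excerpt: "<date> <time> [MOD:LEVEL] message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+):(\w+)\]\s+(.*)$"
)
DROP_MARKER = "has no more alternate TX time"


def parse_line(line):
    """Split one log line into (timestamp, module, level, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group(2), m.group(3), m.group(4)


def find_dropped_downlinks(lines):
    """Return (timestamp, seconds since last uplink RX) for each dropped downlink."""
    last_rx = None
    drops = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, module, _level, msg = parsed
        if module == "S2E" and msg.startswith("RX "):
            last_rx = ts  # remember when the most recent uplink was logged
        elif DROP_MARKER in msg:
            delta = (ts - last_rx).total_seconds() if last_rx else None
            drops.append((ts, delta))
    return drops


sample = [
    "2020-03-04 18:22:09.771 [S2E:VERB] RX 868.3MHz DR5 SF7/BW125 snr=8.2 rssi=-29 "
    "xtime=0xAC0001D03CA9C3 - jreq MHdr=00 JoinEui=70b3:d57e:d002:b253 "
    "DevEui=4b:aecf:e2d2:61fd DevNonce=14431 MIC=1710948409",
    "2020-03-04 18:22:14.308 [S2E:VERB] ::0 diid=8295 [ant#0] - class A has no more alternate TX time",
]
for ts, delta in find_dropped_downlinks(sample):
    print(ts, "dropped", delta, "s after the last uplink")
```

On the two lines from the excerpt, the drop is logged about 4.5 seconds after the join request was received.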

@cslorabox thank you for that :wink:
I make my money with LoRaWAN network deployment and the development of nodes.
And you expect only a total loss.
Sometimes failures are not what you expect. That sending is not working does not mean that receiving is broken too.

With a frequency analyzer I can see the join request on the frequency used, but no join accept is sent from the TTIG.

All nodes that I or my customers own work without problems on all the other gateways I have deployed.
Only the TTIG causes problems, after about 1 or 2 days of uptime.

For other readers, the same thing is being discussed and debugged in #ops in Slack: https://thethingsnetwork.slack.com/archives/C1Q5XLNDT/p1583346968055000

How can I log in to that Slack?

I have a Slack account (info@wasn.eu)

You can request an (automatic) invite through https://account.thethingsnetwork.org

I just learned something on Slack about the 7th byte in xtime, which might help people debug other problems:

anton 2020-03-06 09:31

@arjanvanb the uppermost byte is the xtime session - basically a random value generated at the time the radio is started. So the actual time passed is 0x1D03CA9C3/1e6/3600 which is a bit more than 2 hours.

(Note that the TTIG’s log apparently does not print leading zeroes, so the 64-bit value for the example above is 0x00AC0001D03CA9C3.)
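To make that arithmetic easy to repeat on other log lines, here is a small Python sketch. It assumes the layout implied by the quote and the example value: the low 48 bits are a microsecond counter and the next byte up (bits 48 to 55) is the session. The helper name `decode_xtime` is made up for illustration, and the layout is specific to this implementation, so treat the result as a debugging aid only.

```python
def decode_xtime(xtime: int) -> dict:
    """Split an xtime value into session byte and elapsed time.

    Assumed layout (from the example above, not from a protocol guarantee):
      bits 0..47  microseconds since the radio was started
      bits 48..55 session, a random value picked when the radio starts
    """
    micros = xtime & ((1 << 48) - 1)
    session = (xtime >> 48) & 0xFF
    return {
        "session": session,
        "micros": micros,
        "hours": micros / 1e6 / 3600,
    }


# Value copied from the log line above; leading zeroes do not matter for an int.
decoded = decode_xtime(0xAC0001D03CA9C3)
print(hex(decoded["session"]), decoded["micros"], round(decoded["hours"], 2))
# prints: 0xac 7788603843 2.16
```

A session byte that changes between two log lines would suggest the radio was restarted in between, so the microsecond counters on either side of the change are not comparable.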

Here is a full definition of the xtime field in Basic Station:


However, the protocol specifically does not define the structure of xtime, so it must be regarded as implementation-specific. The reasoning is that the protocol is designed such that the xtime value is opaque to the LNS and just needs to be passed along between uplink and Class A downlink (eliminating all rollover problems, etc.):
https://lora-developers.semtech.com/resources/tools/basic-station/the-lns-protocol#radio-meta-data
The Class A downlink transmission time is controlled by the RxDelay parameter which is a well known LoRaWAN parameter:

Thanks, @bei! I also noticed your GitHub issue about downlinks on low-traffic TTIGs, which also seems to apply to the above. Great troubleshooting!

@arjanvanb FYI I am seeing this issue persist on my TTIG, and it is reproducible just by leaving the gateway on overnight with a node frequently transmitting (every 15 seconds in an isolated basement, just for testing purposes).

Downlink messages do not go through to the device, even though TTN believes they have been sent successfully. If I power cycle the TTIG, the problem is fixed for another 24 hours or so.

I just unplugged my TTIG and am trying other Gateways to isolate the issue.

Is there a new firmware or something to help solve this on the TTIG, or am I looking at the problem wrong? Thank you for your help!

How do you know it’s not getting to the device, do you have some sort of logging on the device? What is the device? What is the code base? Can you add the debug UART port to the TTIG so you can see what it thinks is going on?

TTN doesn’t have a faith system, it’s a computer*, if it says it sent the request to the gateway, that’s a sort of fact. The gateway may be failing (as you suspect), or the device may not hear the downlink due to a whole variety of reasons or may just not respond to the downlink as you expect.

* Unless it’s a relative of an Electric Monk, cf. Dirk Gently’s Holistic Detective Agency

Hi @descartes, thanks for the response! The device is essentially an Adafruit Feather M0 running LMIC v3.2. I can see in the serial logs that it simply does not see a downlink message at all.
The moment I power cycle the gateway, the downlink messages go through without issue.

This particular message changes the sampling/transmit interval, so it’s quite easy to see whether the downlink has worked for other nodes not connected to a serial monitor as well.

It’s really tempting to pin that on the gateway and its closed firmware.

The problem, of course, is that its very closedness makes it hard to tell if it is really to blame… or not.

I know some people have cracked them open and gotten a serial log, but it’s unclear how informative that really is.

Getting your hands on a more ordinary gateway might be a step forward.

Or there’s the long toyed with idea of ripping apart a TTIG and wiring its radio deck to a pi…

The one thing you can do without getting into the gateway is use a scope or logic analyzer to make absolutely sure that the node is opening the receive window at the appropriate time, and after-the-fact serial logging to make sure it did so on the appropriate frequency and spreading factor (especially in EU868 there’s that issue with RX2 settings toggling back and forth). It’s not immediately obvious how restarting the gateway would help with that, but in the interest of ruling out everything else…
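As a rough aid for that scope or logic analyzer check, here is a Python sketch that compares measured receive-window openings against where Class A expects them. It assumes you have timestamps in seconds, on one common time base, for the end of the uplink and for the moment the radio enters receive, and that you know the RX1 delay in use (5 seconds for a join accept, otherwise whatever RxDelay the network has configured). RX2 opens one second after RX1. The function names and the example timestamps are made up for illustration.

```python
JOIN_ACCEPT_DELAY1_S = 5.0  # LoRaWAN delay from end of join request to RX1
RX2_OFFSET_S = 1.0          # RX2 opens one second after RX1


def expected_rx_windows(tx_end_s, rx1_delay_s):
    """Return the expected (rx1, rx2) opening times for a Class A uplink."""
    rx1 = tx_end_s + rx1_delay_s
    return rx1, rx1 + RX2_OFFSET_S


def window_errors(tx_end_s, measured_rx1_s, measured_rx2_s, rx1_delay_s):
    """Return how late (positive) or early (negative) each window was opened."""
    rx1, rx2 = expected_rx_windows(tx_end_s, rx1_delay_s)
    return measured_rx1_s - rx1, measured_rx2_s - rx2


# Hypothetical capture: join request ends at t = 10.000 s on the analyzer.
err1, err2 = window_errors(10.000, 15.003, 16.001, JOIN_ACCEPT_DELAY1_S)
print(f"RX1 off by {err1 * 1000:.1f} ms, RX2 off by {err2 * 1000:.1f} ms")
```

How much error is acceptable depends on the spreading factor and on how the node's stack positions its window, so no pass or fail threshold is built in here.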