Using TTN traffic monitor to troubleshoot a TTIG-based setup

codename5281 · April 1, 2021, 3:20pm

Hello again,

Let me try to sum everything up, as I have just reread my emails with Sensoterra, this thread and the PMs with @descartes. Putting the pieces together helps me understand a bit more, so it may be helpful for other people too.

We bought 8 Sensoterra soil moisture probes. These probes come already registered on TTN by Sensoterra, so we just have to wake them up for them to start joining and sending packets through TTN to an endpoint managed by Sensoterra. That endpoint handles the raw data processing (soil-specific recalibration done on the server side), displays the data, and exposes an API for delivery and subsequent use of the cleaned data. The probes are enclosed and autonomous, and come with a battery that is supposed to last at least 3 years.

This particular design implies that…

In order to provide connectivity, we bought a TTIG gateway that is powered by a solar panel setup, with a 4G dongle providing the Wi-Fi backhaul. We tested everything beforehand in the office, and have a way to remotely monitor that everything is powered up and connected to the internet.

Now, this setup was installed mid-March in a citrus orchard with the following layout, with the TTIG installed in Arbre 1 at 2m high and all probes installed at ground level. Max distance between the TTIG and the probes is ~100m. Signal quality reported in the console is 9 ± 1 dB for SNR and -105 ± 5 dBm for RSSI.
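As a rough sanity check on the RSSI figure above, the free-space path loss at 100 m can be estimated. This is only a sketch: it assumes the EU868 band (868 MHz), a 14 dBm transmit power and 0 dBi antennas, none of which is stated in this thread, and probes at ground level under foliage will always lose much more than free space.

```python
import math

def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20 * math.log10(distance_m / 1000) + 20 * math.log10(freq_mhz) + 32.44

# Assumed values, not taken from the thread: EU868 band, 14 dBm TX, 0 dBi antennas.
FREQ_MHZ = 868.0
TX_POWER_DBM = 14.0
OBSERVED_RSSI_DBM = -105.0  # centre of the range reported in the console

loss = fspl_db(100, FREQ_MHZ)
ideal_rssi = TX_POWER_DBM - loss
extra_loss = ideal_rssi - OBSERVED_RSSI_DBM  # gap between free space and reality

print(f"FSPL at 100 m: {loss:.1f} dB")
print(f"Ideal free-space RSSI: {ideal_rssi:.1f} dBm")
print(f"Extra loss vs. observed RSSI: {extra_loss:.1f} dB")
```

Under these assumptions the gap between the free-space figure and the observed RSSI gives an idea of how much the ground-level placement and the vegetation cost; it does not by itself say whether the link is good enough.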

When we installed the whole setup, all the probes showed up in Sensoterra’s interface, meaning that each of them successfully performed at least one join and one uplink. Later on, some randomly stopped sending data, while others started sending again. For info, these probes measure hourly in normal mode and once a day in stock mode. They cache the last six measurements and send that “trail” of six values with every packet, which covers up to 6 hours in normal mode and up to 6 days in stock mode. Chart of the points logged by Sensoterra’s server during the past 15 days below:
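Regarding the six-value trail mentioned above: it means a measurement is only lost when six consecutive uplinks go missing. A minimal sketch of that redundancy, assuming one uplink right after each hourly measurement and using plain hour indices as hypothetical identifiers (the actual payload format is Sensoterra's and is not known here):

```python
TRAIL_LENGTH = 6  # each uplink carries the last six measurements (from the post)

def recovered_measurements(received_uplinks, trail_length=TRAIL_LENGTH):
    """Return the set of measurement indices recoverable from the uplinks that arrived.

    Assumes (hypothetically) that uplink n is sent right after measurement n
    and carries measurements n-5 .. n.
    """
    recovered = set()
    for n in received_uplinks:
        first = max(0, n - trail_length + 1)
        recovered.update(range(first, n + 1))
    return recovered

# 24 hourly measurements (0..23); only three uplinks made it through.
received = [5, 10, 23]
got = recovered_measurements(received)
missing = sorted(set(range(24)) - got)
print("missing hours:", missing)
```

So holes in the chart point to stretches where more than five uplinks in a row never reached Sensoterra's server, not to single lost packets.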

Just to be clear, I understand you have the TTIG on the free to use TTN v2 console and that you see traffic on the TTIG’s console?

After that, yes, we decided to have a look at the TTN monitor to try to understand what was going on. Here’s an example of what we see (html file). We registered the TTIG on the TTN console and started monitoring. We noticed that we were receiving a high number of join requests, followed by join accepts, without consistent uplinks or downlinks. It was around this time that we started talking with the Sensoterra team, posted this thread, and began discussing in PM with @descartes.
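One way to put numbers on this pattern is to count join requests versus data uplinks per device from an export of the traffic monitor. The sketch below assumes a hypothetical, simplified list of events with `dev_eui` and `type` fields; the real export looks different and would need to be adapted first.

```python
from collections import Counter, defaultdict

def summarize(events):
    """Count event types per device.

    `events` is a list of dicts with hypothetical keys 'dev_eui' and 'type',
    where type is one of 'join_request', 'join_accept', 'uplink', 'downlink'.
    """
    per_device = defaultdict(Counter)
    for event in events:
        per_device[event["dev_eui"]][event["type"]] += 1
    return per_device

def suspicious(per_device, max_joins=1):
    """Devices that sent more than `max_joins` join requests in the window."""
    return sorted(
        dev for dev, counts in per_device.items()
        if counts["join_request"] > max_joins
    )

# Made-up sample data for illustration only.
sample = [
    {"dev_eui": "AAAA", "type": "join_request"},
    {"dev_eui": "AAAA", "type": "join_accept"},
    {"dev_eui": "AAAA", "type": "uplink"},
    {"dev_eui": "BBBB", "type": "join_request"},
    {"dev_eui": "BBBB", "type": "join_accept"},
    {"dev_eui": "BBBB", "type": "join_request"},
    {"dev_eui": "BBBB", "type": "join_accept"},
]
stats = summarize(sample)
print(suspicious(stats))
```

A device that keeps appearing in the `suspicious` list without any uplinks in between is one that joins, gets an accept sent to it, and then joins again.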

First, gateway placement and RF attenuation issues were ruled out based on Sensoterra’s experience: the signal strength of the sensors was checked and the values did not seem out of the ordinary. This runs contrary to @Jeff-UK’s posts:

Then, the discussion focused on putative packet forwarder issues, probably because of a confusion between TTI and TTIG.

Sensoterra tech team:

We see you use TTI with packet forwarder. We have little experience with this unfortunately. Could it be a flaw in the forwarder? Are packages “overlapping” each other? […]

Still, as mentioned in the previous mail we think it could possibly be the packet forwarding in TTI and we would love to be able to take a look into it, however TTI is a private network which is beyond our control, so unfortunately we can only provide very limited support. So a solution would be to have someone take a look at it that does have knowledge of the system (and package forwarder) on your end. […]

A possible solution to exclude that the irregularities are caused by the TTI/TTN packet forwarder, would be to create an endpoint in our backend for your (private) TTI network. […]

The behaviour we see from our sensors is very strange (and rare), which is why we suspect that the packet forwarder could be the issue. We also don’t think it is caused by your RF setup. It seems like some sensors have issues with joining, which could be caused because the sensors sometimes also need to receive packets (the accepts, which are not all received in your case) from the network after sending. […]

This is why we are suggesting to create a direct connection to our backend by creating an endpoint. This would bypass the packet forwarder. The main reason for this, is that TTI is originally meant as a private LoRaWAN network. And all our sensor data needs to be decoded and calibrated on our servers, which is why this connection is very important. […]

TTI is a private version of TTN. All our probes work by default on TTN gateways. Because TTI is private, there is a workaround called the packet forwarder, which is linking private TTI data to the public TTN network. Although we thought this would work well, we now suspect it is causing issues with the up- and downlinks to our sensors, this is in line with what you see in the communication. Everything else you are describing in your field, i.e. distance to gateway, signal strength when the sensors do check in etc. looks completely normal to us, usually this would cause no issues whatsoever. Therefore we are suspecting the packet forwarder to be the issue.

Meanwhile, the discussion in this thread and in PM with @descartes raised the join issues:

descartes_PM:

There is absolutely nothing that can cause a device to spontaneously decide to re-join - the classic reason is that the battery fades, regains some strength, the device reboots, sends a join request, uses all its power up with the tx so can’t stay alive long enough to hear the rx.
There may be a downlink command in their firmware that can send a reboot / rejoin request - I hope so otherwise migration to v3 will be problematic - perhaps something in their back end is sending these out. […]

The bottom line is that devices join once in the life of their battery. That’s the LoRaWAN design and that’s what TheThingsIndustries ask us to do on TTN and that’s what we ask of the community because a gateway can’t hear uplinks when it is transmitting as well as issues with running out of Join Nonces (unique random keys) if a device tries to join too often.[…]

I know I keep saying this, but unless they are being remotely commanded or they have overly sensitive “loss of connection” firmware, either of which Sensoterra can explain, devices do not suddenly decide to re-join each morning. I have devices on batteries for over 2+ years that have joined once. […]

Devices should only join once. They can rejoin if they have their batteries changed or they are remotely commanded to. They should NOT re-join spontaneously.

After a phone discussion with Sensoterra, we had confirmation that the sensors perform a new join request every 24h when they are not able to reach the gateway after 6 to 8 tries.
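This fallback can be pictured as a small state machine. The sketch below only models the behaviour as it was described to us on the phone, it is not Sensoterra's firmware: the class and method names are hypothetical, the threshold of 6 is picked from the stated "6 to 8" range, and how the probe actually decides that a try has failed is not known.

```python
class ProbeLinkState:
    """Toy model: drop the session after too many consecutive failed tries."""

    def __init__(self, max_failed_tries=6):
        self.max_failed_tries = max_failed_tries  # assumed; source says 6 to 8
        self.failed_tries = 0
        self.joined = True
        self.join_requests = 0

    def on_uplink_result(self, reached_network: bool) -> None:
        """Record the outcome of one uplink attempt while joined."""
        if not self.joined:
            return
        if reached_network:
            self.failed_tries = 0
            return
        self.failed_tries += 1
        if self.failed_tries >= self.max_failed_tries:
            self.joined = False  # session dropped, probe must join again

    def on_daily_tick(self, join_accept_heard: bool) -> None:
        """Called once every 24 h; sends a join request if not joined."""
        if self.joined:
            return
        self.join_requests += 1
        if join_accept_heard:
            self.joined = True
            self.failed_tries = 0

probe = ProbeLinkState()
for _ in range(6):
    probe.on_uplink_result(reached_network=False)
# Three days where the join accept is not heard, then one where it is.
for heard in (False, False, False, True):
    probe.on_daily_tick(join_accept_heard=heard)
print(probe.joined, probe.join_requests)
```

With such a policy, any probe that misses its downlinks for long enough ends up sending a fresh join request every day until one join accept is finally heard.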

At some point, Sensoterra understood that we were working with a TTIG gateway and not a gateway connected to TTI, and that the TTIG is so straightforward that it does not allow any endpoint to be redefined manually. They then more or less switched to blaming the device itself.

To which @descartes again took some time to explain to me in PM:

descartes_PM:

the gateway has nothing to do with any of this. It’s about the devices spontaneously re-joining, failing to do that completely and you get a blizzard of ‘wake-ups’ each day. […]

If there was anything likely to work well with the TTN network, it’s their own TTIG […]

As pretty much all of LoRaWAN is specified in considerable detail, if a device & gateway is compliant, uplinks will be received and passed on to the Gateway / Network Server which has a de facto standard by virtue of the original UDP code that Semtech created. I think most of the gateways I’ve used or recommended use this.

The TTIG uses BasicStation, an updated protocol. The only reason for ticking “I’m using the legacy Semtech forwarder” on the console when setting it up is because it is identified by its EUI. There is an interface between BasicStation devices and the main v2 Network Server. […]

The gateway hears the start of a transmission (called the preamble, the LoRa wake up call). Between them, the closed black box receiver chip (again, one supplier, Semtech) and the well understood & quite simple gateway software arrange to listen to the incoming packet on one of the channels. This is checked for completeness using a simple mathematical algorithm (CRC) and then forwarded on “as-is” to the Network Server via some sort of network interface - in many cases UDP but for the TTIG, WebSockets, a much more reliable protocol. The Network Server looks up info & decrypts and passes on the cryptographically checked (ie, much better than simple CRC) uplink on to the Application Server.

So I fail to see how the packet forwarder can be an issue with their back end. At the time it is in the gateway it is just an encrypted sequence of bytes. If the packet forwarder changes anything, the decryption will break.
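The point about tampering can be illustrated with a keyed integrity tag. LoRaWAN computes its 4-byte MIC with AES-CMAC; the sketch below swaps in HMAC-SHA256 truncated to 4 bytes, purely because it is available in the Python standard library, and the key and frame bytes are made up.

```python
import hashlib
import hmac

def tag4(key: bytes, frame: bytes) -> bytes:
    """4-byte integrity tag. Stand-in for the LoRaWAN MIC, which uses AES-CMAC."""
    return hmac.new(key, frame, hashlib.sha256).digest()[:4]

def network_server_accepts(key: bytes, frame_with_tag: bytes) -> bool:
    """Recompute the tag over the frame and compare it with the received one."""
    frame, received = frame_with_tag[:-4], frame_with_tag[-4:]
    return hmac.compare_digest(tag4(key, frame), received)

key = bytes(range(16))                              # made-up session key
frame = bytes.fromhex("40112233440001000a1b2c3d")   # made-up frame bytes
on_air = frame + tag4(key, frame)

# A forwarder that relays the bytes untouched: the frame is accepted.
untouched_ok = network_server_accepts(key, on_air)

# A forwarder that flips a single bit: the frame is rejected.
tampered = bytearray(on_air)
tampered[5] ^= 0x01
tampered_ok = network_server_accepts(key, bytes(tampered))

print(untouched_ok, tampered_ok)
```

The gateway never holds the key, so it can neither read nor quietly alter a frame: either the bytes arrive as sent and the check passes, or the Network Server drops the frame.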

Sensoterra then added observations about putative timing issues of the TTIG:

To which @descartes added :

As mine is still on 2.0.0, I also went to check whether this kind of issue could be solved with the latest TTIG firmware, but according to the community the issues we’re facing do not seem to be fixable that way.

Last, the Sensoterra team suggested trying to keep the port socket open. I read some topics mentioning “keep awake” issues and some workarounds for the TTIG, such as having a beacon node whose sole purpose is to maintain an active connection (from what I understood this is what you call a canary, right?). This seemed to help some people having trouble with the TTIG around here, but as it wasn’t proposed in this topic I guess it may not be that relevant. Still, as we probably won’t be able to do anything on the Sensoterra probe side, do you think it could be a viable workaround that would let us keep using the TTIG?

In any case, we ended up ordering a RAK gateway in order to do some troubleshooting; the thing is, with the new French lockdown, time will fly until the day we get it and bring it to the orchard!