How reliable are downlinks?

cslorabox · June 12, 2020, 12:18pm

I have to agree that if LoRaWAN is going to be used here, it should be class C. The pump sounds like it probably consumes a fair amount of power, so there’s likely either a mains power supply or at least a hefty battery/solar one that probably wouldn’t be drained by running a LoRa receiver continuously.

That said, if at all possible it’s better to close the control loop locally - running the pump until the tank is full is something that should be done via local control logic.

More generally, while class A operation is not really suitable here, it doesn’t sound like the current implementation is really working even for that. Downlinks require timing to be right, and timing with LMiC can definitely be an issue, though the Feather M0 is one of the better supported platforms in the MCCI repo. A mixup of uplink/downlink frequency or other air settings when moving to a less common bandplan could also be at fault.

The asker really, really needs to gain access to the gateway to effectively debug, or even buy another one or hang a concentrator on a pi (make a cost argument, it’s not worth wasting even a few hours of work over this!). There’s a very narrow window of time for the server to get a downlink request back to the gateway (even for a join accept, the server has to wait until just before the deadline as it assumes a queue-less packet forwarder) so moving the gateway to wired Ethernet would be good.

If LMiC is modified so that the transmit frequency is known in advance, then the downlink frequency can be figured out as well and an RTL-SDR dongle tuned there to try to catch the downlink.

But really, working out timing bugs should be done with a scope or logic analyzer watching both a GPIO on the node and the gateway’s transmit LED. Or in theory a simple RF power meter type receiver could catch both uplink and downlink from nearby sources irrespective of frequency, and let you check that the interval between the blips is exactly the receive window delay.
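A minimal sketch of that interval check, assuming you have captured a timestamp for the end of the uplink and the start of the downlink (from a logic analyzer or power detector). The delays listed are the LoRaWAN specification defaults; a network may configure different values. The `classify_gap` name and the 20 ms tolerance are illustrative assumptions, not taken from any library.

```python
# Default Class A delays in seconds, measured from the end of the uplink
# to the start of the receive window. These are LoRaWAN spec defaults;
# the network may be configured with different values.
WINDOWS = {
    "RX1": 1.0,
    "RX2": 2.0,
    "JOIN_RX1": 5.0,
    "JOIN_RX2": 6.0,
}


def classify_gap(uplink_end_s, downlink_start_s, tolerance_s=0.02):
    """Return (window_name, error_s) for a measured uplink-to-downlink gap.

    window_name is None when the gap matches no known window; in that case
    the second element is the raw gap so it can be inspected.
    tolerance_s is a hypothetical value chosen for illustration.
    """
    gap = downlink_start_s - uplink_end_s
    for name, delay in WINDOWS.items():
        if abs(gap - delay) <= tolerance_s:
            return name, gap - delay
    return None, gap
```

A gap that lands between windows (say 1.5 s) would suggest the gateway transmitted late or the measurement points are not what you think they are.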

mfalkvidd · June 12, 2020, 7:39pm

At only 6m from the gateway, the signal strength of the downlink can overpower the node’s radio front end.

It might be worth trying to move the node further away.

dajt · June 12, 2020, 10:33pm

Thanks for the ideas everyone. It seems this level of performance is not expected which is encouraging.

My testing over the last couple of days has shown it to be pretty good with the latest version of MCCI LMIC as given by the Arduino IDE. I think all the downlinks have arrived within 6 uplinks. It’s just annoying not knowing if the problems are due to the gateway or not so I wanted to see if others were having the same downlink problems meaning that was just how it is.

This is a university project and while the customer really does want automatic control for their pump, I don’t think they’ll spend money on it over and above what they already have buying the gateways and feathers etc. they’re using in other projects.

The virus situation over the last few months has not helped - they’ve generously loaned me the GSM gateway to use at home, but on-site is a different one I hope to get access to when I can go and visit them. I am hoping that solves the downlink problem.

Class C would be good - we have power all the time because the pump is connected to mains electricity. But LMIC doesn’t support class C operation and it looks like that would also require paying for a special ThingsNetwork server which also rules it out.

The pump does have existing automatic switch-off mechanisms so we’re not going to damage it if we don’t get the switch-off command on time. We also have hours to receive the switch-on command because the low-water mark is 50% of tank capacity. So in practice I expect even the lousy downlink performance we have now would work, but it’s annoying not knowing whether the pump will switch on within 10 minutes (first timed status message from the controller), 1 hour (within 6 messages), or some longer time if things go really badly.
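As a rough way to put numbers on that uncertainty, here is a sketch that assumes each uplink gives the queued downlink an independent chance `p` of getting through (a simplifying assumption; real failures are probably correlated) and uses the 10 minute status interval mentioned above. The function names are made up for illustration.

```python
def prob_within(p, n):
    """Probability the downlink has arrived by the n-th uplink,
    assuming each attempt succeeds independently with probability p."""
    return 1.0 - (1.0 - p) ** n


def expected_wait_minutes(p, interval_min=10.0):
    """Expected wait if the command is queued just after an uplink:
    on average 1/p uplinks are needed, one every interval_min minutes."""
    return interval_min / p
```

For example, with a 50% chance per uplink, `prob_within(0.5, 6)` is about 0.98, which lines up with "within 6 messages" being the usual worst case rather than a guarantee.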

Regards, David.

arjanvanb · June 13, 2020, 8:20am

I’d call that really bad. If you find that you need multiple uplinks for your confirmed downlink with the non-GSM gateway as well, then I really feel you should fix it, or look for alternatives. Don’t take a non-optimal system into production. (I’d not even start at all until you can use Class C.)

Did you confirm that the nearby region is used for the gateway’s router? And that the same region is used for your application’s handler? (I’m not sure which component keeps the downlink queue.)

The gateway owner could add you as a “collaborator” for the gateway in TTN Console, so you could at least see the gateway’s Traffic page in TTN Console, and tell if TTN has commanded the gateway to (re-)transmit the downlink. It probably has, so then access to the gateway’s raw logs is invaluable to determine if latency is the issue. Well, @cslorabox explained more above.

Note that a downlink that is scheduled while handling an uplink with that measurement, is likely not even transmitted until the next uplink: My application's downlink is always queued for next uplink.

As for confirmed downlinks, you may also want to check what happens if you replace it while it was not yet confirmed. (Say, the “switch on” downlink was transmitted but not yet confirmed, and meanwhile you determined that a “switch off” downlink needs to be scheduled instead. Does TTN delete the non-confirmed downlink from the downlink queue, or does it wait forever for the confirmation?)

When using the TTN Community Network which is operated on best effort only, you should also take long network outages into account. And maybe in a few years your LoRaWAN gateway or device will just break.

Controlling a pump to fill a tank doesn’t feel like a good LoRaWAN use case to me, not even for Class C. I very much agree with:

Curious: how far away is the tank (or its water level sensor) from the pump?

ame · June 13, 2020, 9:07am

I’m planning something similar, but instead of turning the output on there will be some local smarts in the node that accepts commands to turn on and off, but starts a timer when an “on” command is received. Then, if the network dies the timer will time out and the output will turn off automatically. But, if everything is good and I want the device to be on longer I’ll re-send the “on” command before the timer expires which will re-set it.
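A minimal sketch of that timer logic, written in Python for readability rather than as node firmware. The class name, the command strings, and the injectable `clock` are assumptions made for illustration.

```python
import time


class TimedOutput:
    """Output that turns itself off unless "on" is re-sent in time."""

    def __init__(self, hold_s, clock=time.monotonic):
        self.hold_s = hold_s    # how long one "on" command keeps the output on
        self.clock = clock      # injectable so the logic can be tested
        self.deadline = None    # None means the output is off

    def command(self, cmd):
        if cmd == "on":
            # Start the timer, or restart it if it is already running.
            self.deadline = self.clock() + self.hold_s
        elif cmd == "off":
            self.deadline = None

    def is_on(self):
        # Expire the output once the deadline has passed.
        if self.deadline is not None and self.clock() >= self.deadline:
            self.deadline = None
        return self.deadline is not None
```

The point of the design is that a lost downlink can only ever leave the output off, never stuck on.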

arjanvanb · June 13, 2020, 10:13am

Ah, the following was actually confirmed to work:

So, I’m quite sure confirmed downlinks can be replaced with a different confirmed downlink just fine.

dajt · June 13, 2020, 11:35am

Replacing unacked confirmed downlinks is an interesting case I had not considered! Glad it works.

I agree taking 4-6 attempts to get a downlink is terrible, but given I only managed to get about 3 in a month when this project started my expectations are pretty low at this point and I’m very pleased anything happens at all.

I have written the code to work with either a wifi or lorawan feather because testing with wifi is a lot easier, and in case this just doesn’t work with lorawan I can suggest we use wifi. But I think the project sponsor wants to prove lorawan can either do this or can’t for cases where farms have tanks further away from the house than wifi can reach. GSM is another option but there are already solutions using that. The wifi/lorawan code do not co-exist - you get one or the other so there is no time wasted on a stack not being used. Class C would require either us to write the class C code or move to a different device that does support it, both are way outside the spec for this project.

At the moment there is no “on” automation for the pump so even if the gateway or the feather dies they’re no worse off. The tank level readings are taken from a separate sensor - nothing to do with the feather. There is already at least one automated “off” mechanism, we’re going to allow for timeouts to be sent with the “on” messages, and we also check a couple of input signals that will make the feather switch the pump off without a downlink message. We’re pretty good for switching it off I think. The annoyance will be wondering why it hasn’t switched on when it should have, and sending too many downlinks if the performance isn’t any better with the gateway on-site.

Right now, it’s 10x better than it was 3 days ago and I can demo it without being too embarrassed. We have next semester to hopefully get on-site and see how it goes in place.

cslorabox · June 13, 2020, 2:49pm

It still doesn’t make sense why you are putting the radio in the turn-on/turn-off path at all instead of having that locally automatic and using LoRaWAN only to report status and possibly change control-loop settings.

Even if your water level sensor is remote from the pump, you probably want to connect them via a shorter range point-to-point link (LoRa or otherwise) and not through the LoRaWAN gateway.

That’s wholly apart from how your LoRaWAN node-gateway interactions don’t yet seem to be working as they should.

ame · June 14, 2020, 12:29am

I am not the OP. Sorry for the confusion. The system I am preparing will be locally controlled, with LoRaWAN used for reporting the state, but I am hoping to use downlink commands to control a relay to modify the state, but even if the downlink fails, or is delayed, or the relay fails, the system will still operate safely.

Besides, what’s the point of a downlink if it can’t be used?

My installation is slightly different to the OP: I have a tank on a hill. There is no LoRaWAN coverage there, but there is cellphone coverage. I have a pump in a valley, 1km away. There is no LoRaWAN or cellphone coverage there.

Phase one is to install a water level sensor in the tank connected to a LoRaWAN analogue node. Next to the tank I will install a LoRaWAN gateway with a cellular modem. The sensor, node, gateway, and modem will be powered by a small solar array and battery bank. The node will be only a few metres from the gateway, so it’s a bit pointless, but it allows me to start getting data in a consistent way.

Phase two is to install a sensor on the pump in the valley and connect it to a LoRaWAN digital node. It will report the status of the pump (running/not running) by connecting to the gateway on the hill that was installed in phase one. 1km is still not that far for LoRaWAN. A digital output on the node will do something with the pump (but I don’t recall what it is just now…), but it’s not critical as the pump operates automatically based on a pressure switch. In other words, my use of the downlink to control an output is not part of a control loop.

arjanvanb · June 14, 2020, 8:45am

Downlinks are very useful: for OTAA, for ADR, for remote configuration, maybe even for a remote reboot to allow for joining a different network. (Unfortunately, downlinks for confirmed uplinks apparently have a design or implementation flaw.) Downlinks should work properly, and if not then one should fix that. But controlling things is just not a good use case for Class A LoRaWAN.

Just for the sake of completeness, though already mentioned in many other places: even if downlinks work fine, there’s also the limit of 10 downlinks per day. I’d assume that retries for confirmed downlinks count against the Fair Access Policy as well, as the network cannot be blamed for that.

ame · June 14, 2020, 9:34am

Yes, I have borne that in mind. The node is a class C device, and we plan to turn it on (or off) at most once a day, but probably not very often at all.

arjanvanb · June 14, 2020, 10:20am

Class C devices are just Class A on the TTN Community network:

cslorabox · June 14, 2020, 2:25pm

Beware that LoRaWAN gateways are quite power hungry due to the multichannel DSP baseband receiver chip (some sellers will claim the 8-channel cards are 49-channel due to the number of distinct combinations that could be demodulated, ironically their power consumption isn’t far off from 49x that of a node radio). You may need a bit larger solar setup than you’d expect to keep this up across cloudy days.

Personally I’d look into a custom point-to-point LoRa link using lower power node-class radios. Your box on the hill can wake up the mobile data modem periodically and report in, along with water level. And then it can command the pump via LoRa, probably something like “this repeatable message means run for 15 minutes”.

The one possibly tricky part is having the two node-class radios “find each other” if you conclude you need to use multiple channels, but you presumably have a fair amount of power at the pump and can keep that radio receiving during searches. Once you establish communication you can keep a schedule of windows which grow wider if a transmission is missed.
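A rough sketch of such a widening-window schedule. The base width, doubling factor, and cap are hypothetical values picked for illustration; a real link would tune them to the clock drift of the two radios.

```python
def window_width(base_width_s, misses, growth=2.0, max_width_s=60.0):
    """Listen-window width after a number of consecutive missed
    transmissions: grows geometrically, capped at max_width_s."""
    return min(base_width_s * growth ** misses, max_width_s)


def listen_window(expected_time_s, base_width_s, misses):
    """Return (start, end) of a receive window centered on the time
    the next transmission is expected."""
    half = window_width(base_width_s, misses) / 2.0
    return expected_time_s - half, expected_time_s + half
```

After a successful reception the miss count would go back to zero, so the receiver spends most of its time in the narrow (cheap) window.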

bluejedi · June 14, 2020, 9:01pm

FYI: I have updated the topic title.

Downlinks themselves are not unreliable. It is a matter of what you want to use them for, and whether LoRaWAN as a technology is suitable for that application. “How unreliable are downlinks?” implies that downlinks are unreliable by default, which is not a correct statement.

ame · June 14, 2020, 9:51pm

This is all great information. I am using TTN for test/development, and it might be appropriate for deployment. If I need something “better” I can pay for TTI, or use our incumbent telecom operator’s offerings.

The MikroTik gateway I am using consumes 7W maximum. The Teltonika modem uses <5W. The Ursalink node uses <2.5W and the sensor about 1W. The solar power system will be sized accordingly.
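A back-of-envelope version of that sizing, using the maximum power figures above. The autonomy days, depth of discharge, peak sun hours, and system efficiency are placeholder assumptions, not measured values for this site; substitute real figures for the location.

```python
# Maximum power draw in watts, taken from the figures quoted above.
LOADS_W = {"gateway": 7.0, "modem": 5.0, "node": 2.5, "sensor": 1.0}


def daily_energy_wh(loads_w):
    """Worst-case energy per day if every load runs at maximum for 24 h."""
    return sum(loads_w.values()) * 24.0


def battery_wh(daily_wh, autonomy_days=3.0, depth_of_discharge=0.5):
    """Battery capacity needed to ride through sunless days.
    autonomy_days and depth_of_discharge are assumed values."""
    return daily_wh * autonomy_days / depth_of_discharge


def panel_w(daily_wh, peak_sun_hours=4.0, system_efficiency=0.7):
    """Panel rating needed to replace one day's energy.
    peak_sun_hours and system_efficiency are assumed values."""
    return daily_wh / (peak_sun_hours * system_efficiency)
```

With these numbers the loads total 15.5 W, or 372 Wh per day at worst case, which is the figure the battery and panel have to cover.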

ame · June 15, 2020, 10:26pm

Ok. I have clarified what we want the digital output for. It is to prevent the pump from running when we are not expecting to use water.

If the tank is full the pump will automatically shut-off (because of a pressure switch). But, it will attempt to restart every 15 minutes. To reduce wear and tear we want to have an override switch on the pump. This will be turned on or off at most once a day, probably with a few days between each switching event, i.e. when we turn it off it’s because we don’t want water for a while, and when we turn it on, we’ll leave it on for a while. Basically this output is permitting the system to run (with its own automatic controls) or not, and is not part of the control loop.

Is this an acceptable use case?

descartes · June 16, 2020, 9:59pm

But it is if it overrides the automatic filling of the tank.

Only you can decide if this is an acceptable use case - if you send an OFF command and then someone uses the water but the override is still OFF and for whatever reason downlinks aren’t getting through, is this OK?

ame · June 17, 2020, 3:40am

Well, it depends on how reliable downlinks are.

To answer your specific example, yes. It is acceptable.

If we are not getting any telemetry, or we can’t send downlinks, then we need to investigate. This is why we need to know how reliable downlinks (and uplinks) are.

dajt · June 17, 2020, 6:34am

From my brief experience, it’s variable and probably mostly to do with the gateways. But the sponsor in my case won’t be any worse off than they are now and the farm manager will have to go and switch the pump on manually if too many downlinks go missing.

The GSM backed gateway I have access to most of the time is pretty hopeless as described above.

I’ve parked out near the customer site a few times over the last couple of months (social distancing!) to see if their on-site gateway was any better.

The first two times I was just trying to see if I could get a downlink at all, and did seem to be getting them but it was only with a hello world program. After that, assuming downlinks were ok, I spent most of my time developing with a wifi feather because it was so much easier, and I wouldn’t be abusing the fair use policy.

The 3rd time I was trying to record a demo. I parked in a different place, closer to their building, and the performance was terrible, like the GSM gateway. This visit is what sparked this thread as I then doubted my impression from the first two visits.

The last time I was across the road again to see if that made a difference and the performance was fine. Joined in 6 or 8 seconds, every downlink came through at the first opportunity, and I did quite a few of them. I was able to record the demo.

The gateways seem to be the deciding factor here. I don’t know what the variables are but could they be things like:

Do you have line of sight?
Are you too close?
Is your signal better to a dud gateway than a good one, so only the dud is sending your downlinks? I think I read only one gateway is asked to send a downlink. I heard the sponsor may have more than one gateway on site so perhaps I was talking to one of the not so good ones?

bluejedi · June 17, 2020, 6:49am

It all boils down to whether the worst case scenario (e.g. downlinks and/or uplinks don’t come through for any period of time, for whatever reason) can cause any dangerous / unacceptable situation.

“Then we need to investigate” is a very fuzzy statement and may not prevent a dangerous / unacceptable situation.

Whether that is acceptable for your case is up to you to determine/decide.
There is however no (simple) formula to determine the ‘reliability of downlinks’ (or in other words, to determine the probability that an unacceptable situation can occur due to downlinks not arriving in time for whatever reason).