How reliable are downlinks?

Well, it depends on how reliable downlinks are. :slight_smile:

To answer your specific example, yes. It is acceptable.

If we are not getting any telemetry, or we can’t send downlinks, then we need to investigate. This is why we need to know how reliable downlinks (and uplinks) are.

From my brief experience, it’s variable and probably mostly to do with the gateways. But the sponsor in my case won’t be any worse off than they are now, and the farm manager will have to go and switch the pump on manually if too many downlinks go missing.

The GSM-backed gateway I have access to most of the time is pretty hopeless, as described above.

I’ve parked out near the customer site a few times over the last couple of months (social distancing!) to see if their on-site gateway was any better.

The first two times I was just trying to see if I could get a downlink at all, and I did seem to be getting them, though only with a hello-world program. After that, assuming downlinks were OK, I spent most of my time developing with a WiFi Feather because it was so much easier and I wouldn’t be abusing the fair use policy.

The third time I was trying to record a demo. I parked in a different place, closer to their building, and the performance was terrible, like the GSM gateway. This visit is what sparked this thread, as it made me doubt my impression from the first two visits.

The last time I was across the road again to see if that made a difference and the performance was fine. Joined in 6 or 8 seconds, every downlink came through at the first opportunity, and I did quite a few of them. I was able to record the demo.

The gateways seem to be the deciding factor here. I don’t know what the variables are but could they be things like:

  • Do you have line of sight?
  • Are you too close?
  • Is your signal better to a dud gateway than to a good one, so only the dud is sending your downlinks? I think I read that only one gateway is asked to send a downlink. I heard the sponsor may have more than one gateway on site, so perhaps I was talking to one of the not-so-good ones?

It all boils down to whether the worst-case scenario (e.g. downlinks and/or uplinks don’t come through for any period of time, for whatever reason) can cause any dangerous / unacceptable situation.

“Then we need to investigate” is a very fuzzy statement and may not prevent a dangerous / unacceptable situation.

Whether that is acceptable for your case is up to you to decide.
There is, however, no (simple) formula to determine the ‘reliability of downlinks’ (or, in other words, the probability that an unacceptable situation can occur because downlinks do not arrive in time, for whatever reason).

Yup. It is entirely acceptable. If the command to prevent the pump running is not successful then the pump will still be controlled by its own pressure switch, so the system is not compromised. If the command to allow the pump to run is not successful then water will still be available for a while, but the telemetry for the water level will show it is dropping (and not being refilled). If there is no telemetry we will generate a warning that there is no telemetry, and we need to investigate.
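
As a code sketch of that fail-safe (the pin number is hypothetical, and the pressure switch is assumed to be hard-wired in series with the relay, as described):

```cpp
#include <Arduino.h>

// Hypothetical pin assignment for illustration only;
// pinMode(PUMP_ENABLE_PIN, OUTPUT) in setup() as usual.
const uint8_t PUMP_ENABLE_PIN = 5;   // relay that *allows* the pump to run

// Last command received from the server; defaults to "allow" so a node
// that never hears a downlink simply leaves the pump on local control.
bool remoteAllowRun = true;

void applyPumpPolicy() {
  // The pump's own pressure switch is wired in series with this relay,
  // so the worst case after any missed downlink is the pump running
  // under its normal local control - never an unsafe state.
  digitalWrite(PUMP_ENABLE_PIN, remoteAllowRun ? HIGH : LOW);
}
```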

Wow, we may as well be writing the same system :slight_smile:

I have a pump, a tank, and a switch. There are very few combinations we can make. :slight_smile:

That’s not necessarily a sound conclusion at all.

By far the most common cause of downlink failure in new projects is timing errors in the node, causing it to not be receiving at the correct time.
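
If the node runs the MCCI arduino-lmic port, the usual first mitigation is to tell the stack how far off the MCU clock might be, so it opens the receive windows early enough. A minimal sketch, assuming the usual pin map and OTAA keys are defined elsewhere:

```cpp
#include <lmic.h>
#include <hal/hal.h>

// lmic_pinmap and join configuration assumed to be defined elsewhere as usual.

void setup() {
  os_init();
  LMIC_reset();
  // Tell LMIC the clock may be off by up to 1% so it opens the RX
  // windows earlier and keeps them open longer - the standard fix for
  // boards clocked from a ceramic resonator rather than a crystal.
  LMIC_setClockError(MAX_CLOCK_ERROR * 1 / 100);
}
```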

However, an unreliable or slow Internet connection between the gateway and the servers can also cause a problem, unpredictably over time. There’s only one second available to complete the roundtrip between the node, gateway, server, gateway and node. And unfortunately, because the original packet forwarder code could only have one transmit request outstanding at a time (no queue), even in a case where the RX window is later, the server has to hold onto the transmit request until just before it needs to be sent anyway, because it can’t risk collisions between transmit requests sent down to the gateway out of order. The packet forwarder code has long since gained an internal queue to order things, but the server can’t count on a gateway running a version with that capability.

The biggest thing you need to do for testing is gain actual access to the gateway - what do its logs say, and did the transmit light blink? Ideally, get one scope probe on a GPIO on the node driven for the duration of the receive window, and another on the gateway’s transmit LED, and see both whether the gateway transmitted at all and whether the timing lined up.
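
On the node side, the MCCI LMIC port signals EV_RXSTART as each receive window opens, so the debug pin can be driven from the event handler. A sketch with a hypothetical pin number - it marks the receive period only approximately, since it clears the pin at EV_TXCOMPLETE rather than at the exact window close:

```cpp
const uint8_t RX_DEBUG_PIN = 6;  // hypothetical spare GPIO for the scope probe

void onEvent(ev_t ev) {
  switch (ev) {
    case EV_RXSTART:        // MCCI LMIC: a receive window is opening;
      digitalWrite(RX_DEBUG_PIN, HIGH);  // keep this handler fast
      break;
    case EV_TXCOMPLETE:     // fires once the RX windows are done
      digitalWrite(RX_DEBUG_PIN, LOW);
      break;
    default:
      break;
  }
}
```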

Is this most common on amateur projects with *duino-like devices using LMIC? If someone was serious about developing a product, are there other radio modules that ‘just work’ because they have their own crystal, microcontroller, and LoRaWAN software on board, so you just give it your uplinks and get your downlinks? If so, do they still need a bit of tuning due to crystal variability?

There are delegate modules which have their own MCU to run the LoRaWAN stack.

Personally I would not depend on one where that stack is not open source and repairable when (not if, but when) issues are found.

There are many modules around from the likes of Microchip or RAKwireless that present you with a serial port and documented AT commands so you can delegate the whole messy business to their module and no, they don’t need any tuning.

Some of the modules actually allow you to program them at some level (entirely or via an API) so you can put a simple application on the module itself.

Or you can do all you need to do with your microcontroller of choice (Arduino, BluePill, micro:bit, Pi etc etc) and just pass over the data packet you want sending.
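
To illustrate how little the host MCU ends up doing with one of these serial modules - the command strings below follow RAK’s documented style, but syntax, port and baud rate vary by vendor and firmware, so treat this as a sketch rather than copy-paste:

```cpp
#include <Arduino.h>
#include <stdio.h>

void setup() {
  Serial1.begin(115200);        // module on a hardware UART; RAK defaults to 115200 8N1
  Serial1.println("at+join");   // OTAA join; keys configured on the module beforehand
}

void sendLevel(uint16_t level) {
  char cmd[32];
  // Uplink on port 2 with the payload as hex, per RAK's serial command set.
  snprintf(cmd, sizeof(cmd), "at+send=lora:2:%04X", (unsigned)level);
  Serial1.println(cmd);
}
```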

There are a number of posts on here that start “I’ve created my own device using an MSP430 and an RFM95” where the developer was somewhat overwhelmed by getting the base software for the MCU as well as a LoRaWAN stack running.

So for your own sanity, I’d start out with one of these pre-baked no-brainer modules to get started and leave creating your own MCU & radio module masterpiece out of a critical path timeline.

That said, after a lot of testing, I’ve got an Arduino Pro Mini + RFM95 combo running reliably: https://github.com/descartes/Arduino-LMIC-example

But the experience wasn’t inspiring and I’ve run out of memory, so now I use a RAK4200 over serial.

There are also plenty of people complaining about problems with those, made worse by having little to no debug visibility into what is actually going on internally.

The Microchip module is mature enough that most of the issues might have been worked out; as for the others, I’m dubious.

I’ll keep my thoughts on the randomness of the industry to myself; suffice to say, it’s rare that anything works straight out of the box. I don’t like it, but I’m resigned to it being part of the cost of doing business.

If you want a part-module part-open, then look at the Microchip ATSAMR34J18 which Microchip provides the source code for:

https://www.microchip.com/wwwproducts/en/ATSAMR34J18

which you can get as a development board:

https://www.microchip.com/developmenttools/ProductDetails/DM320111

and when you are happy to go over to a module, you can get it pre-packaged from RAK as a dev board, a breakout board and a module.

There were certainly some hiccups with RAK’s first release, but between staff, resellers & technical end-users it is now working like the Microchip board.

This does not do AT commands, so this is definitely a step up from the RAK811 / RAK4200 based modules.

It’s certainly an odd development world we are in - the rush to market can cause some huge issues. I try for a v1 whose firmware I can update remotely whilst the low-cost v2 module, developed in parallel, is being rigorously tested.

My choice of hardware and RF protocol was determined by the sponsor - they’ve settled on the Feather form factor for all their apps. This might work as intended, given they’re getting both WiFi and LoRaWAN versions of the firmware and should be able to drop the WiFi board in place if they desire. But the experiment is to get LoRaWAN working for specific use cases where WiFi doesn’t suit.

It’s been frustrating, but also a lot more interesting than my day job.

The number of LMIC forks is a horror show, but it’s open source so you have to live with it. The MCCI one seems pretty good and is getting useful extensions and properly updated documentation. But at the start I was told I must use a different specific variant, without being told why or what was different about it. In desperation I started trying others and settled on MCCI. I’ve held back from using the useful extensions to keep compatibility with the other 1000 forks, just in case. It’s a confusing mess when you are starting out, but after a month or two you know your way around. Some of the original LMIC design, like the idea that it becomes the main loop of the app and that you should use its scheduling function, could be flagged as unnecessary and deprecated a bit more clearly in the *duino port documentation.

If you’re looking at using RAK4260s there’s some useful stuff at
http://www.marvellconsultants.co.uk/LoRaNode
Also see:
http://www.marvellconsultants.co.uk/LoRa
for some notes on downlink issues with the legacy UDP packet forwarder and how to fix them.

For which RAK should be eternally in your debt for taking their release and making it work!

Yeah, took me about a month to drop scheduling my own jobs and just call os_runloop_once() in the main loop.

The best mess I’ve seen so far is an ESP32 version with RTOS that then has the LMIC scheduler inside it all!

I know that people will ask why, but I’m tempted to facilitate a re-work from scratch whilst using LMIC as a crib - it will work on small Arduinos with a couple of sensors, but as I found (see my repo comments, linked above), it’s mostly broken now.

LOL! I briefly looked at RTOSs before quickly deciding that was a stupid idea for such a simple app, and was worried they might get in the way of LMIC.

The library is probably doing nothing the majority of the time, given it’s a class A node, but I have no idea how often the pins have to be polled once I ask to send a message, so I figured it’s best to just call os_runloop_once() as often as possible. Power is no problem; there is lots of electricity at the pump installation.
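
For what it’s worth, that matches the MCCI examples: os_runloop_once() checks LMIC’s job queue and the radio each time it is called and returns immediately when there is nothing to do, so on a mains-powered class A node the whole loop can be:

```cpp
void loop() {
  // Returns immediately when LMIC has nothing pending, so calling it
  // as fast as possible is fine when power is not a constraint.
  os_runloop_once();
}
```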

It would also be useful to document which events can happen in each mode, as I see the same huge switch statement copied from example to example (as is usual with Arduino cargo-cult programming).

And what’s a GSM-backed gateway good for? Nodes that don’t join very often (so save the mountains of state required or use ABP) and don’t use downlinks for their own app? This thing has been a terrible source of confusion for me and wasted a lot of my time. Apparently someone tried to use it last year for a project and had trouble with it too, and they weren’t even using downlinks in their project other than the join ones.

A lot of the problem comes from not understanding what it needs to do and how it goes about doing it.

Mostly that switch statement is to print debug messages. You are of course free to only respond to the events you actually care about.
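
A cut-down handler can be just a few lines - handleDownlink() and scheduleNextUplink() below are hypothetical application functions, and the LMIC_setLinkCheckMode(0) call after joining is what the TTN examples do:

```cpp
void onEvent(ev_t ev) {
  switch (ev) {
    case EV_JOINED:
      LMIC_setLinkCheckMode(0);  // as the TTN examples do after joining
      break;
    case EV_TXCOMPLETE:
      if (LMIC.dataLen > 0) {
        // A downlink arrived in one of the RX windows.
        handleDownlink(LMIC.frame + LMIC.dataBeg, LMIC.dataLen);  // hypothetical
      }
      scheduleNextUplink();  // hypothetical
      break;
    default:
      break;  // everything else is only interesting as debug output
  }
}
```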

Places you need a gateway where you don’t have other connectivity, naturally.

There’s no proof that the issues you see are related to the GSM backhaul.

Sure, it’s possible though, by reason of two design decisions that look poor in hindsight.

The first was the decision to use a 1-second RX delay for TTN, rather than something longer to allow for more backhaul latency.

The second was to assume that all packet forwarders lack a TX JIT queue (when in fact builds of the current code have one). This means that even if the RX window delay is longer (as in the case of a join), the network server still has to hold onto packets until just before the deadline, to make sure that requests aren’t sent out of order.
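
As back-of-envelope arithmetic (the RX delays are the usual TTN/LoRaWAN defaults; the gateway keying margin is an assumption for illustration):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t rx1_data_ms = 1000;  // TTN's RX1 delay for data downlinks
  const uint32_t rx1_join_ms = 5000;  // RX1 delay for join accepts
  const uint32_t margin_ms   = 100;   // assumed lead time the gateway needs to key up

  // If the server must assume the gateway has no transmit queue, it holds
  // the frame and releases it just before the window, so the backhaul only
  // ever gets the margin - the longer join window buys it nothing.
  printf("data downlink released at T+%lu ms of a %lu ms window\n",
         (unsigned long)(rx1_data_ms - margin_ms), (unsigned long)rx1_data_ms);
  printf("join accept released at T+%lu ms of a %lu ms window\n",
         (unsigned long)(rx1_join_ms - margin_ms), (unsigned long)rx1_join_ms);
  return 0;
}
```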

The big issue here may ultimately come down to the difference between the parts of the LoRaWAN spec that are exercised on a daily basis, and all of the corner cases that spec compliance actually requires, some of which are possibilities in theory only, and others of which could be hit by seemingly trivial changes in server code or conditions.

One of the reasons the MCCI LoRaWAN repo has diverged so far is that they’re actually trying to satisfy the LoRaWAN compliance tests, on top of an LMIC that originated as more of a “proof of concept” codebase.

If you say “these are the only things I care about” then it gets a whole lot simpler. But then it’s also not technically LoRaWAN. And it might work on TTN today, but not after someone else makes a trivial change tomorrow. The variety of regional possibilities also complicates things - most people test only in their home region, as they don’t have a “closed” test setup to verify operation against regional settings that use frequencies that are only legal elsewhere.

That’s an interesting one! Is that part of the LoRaWAN spec or just how the network software is written at present?