Update - 1000+ nodes, one Gateway

This question may be outside the scope of TTN, but let's see if anyone has experienced this.

I want to deploy a large number of nodes (1000) in one area, less than 500 m apart. All of them uplink only once every 1-2 hours with a one-byte payload, so it's a really small and light payload, but densely deployed.

Now the question: I want to send one downlink per week. This is a slightly bigger payload, 15-20 bytes, and this update downlink will be sent in one go to all the nodes. How will the network react?
Bearing in mind the gateway is deaf while it sends the downlink, will I need to use confirmed downlinks?

I am not too concerned about whether I receive the uplinks; they are more just "I am alive, is there a downlink for me?"

That won't work. Your gateway has to respect the legal airtime limits, so after a few downlinks it won't be able to transmit anything. Even if all nodes checked in with a uniform distribution during the time window, you would run out of airtime.
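To put rough numbers on that: a minimal sketch using the standard LoRa time-on-air formula, assuming EU868, 125 kHz bandwidth, CR 4/5, an 8-symbol preamble, and a 20-byte application payload (about 33 bytes of PHYPayload once the 13 bytes of LoRaWAN overhead are added). The duty-cycle figures assume 1% for a typical RX1 sub-band and 10% for the RX2 sub-band.

```python
from math import ceil

def airtime_ms(sf: int, phy_payload: int = 33, bw: int = 125_000) -> float:
    """Classic LoRa time-on-air formula (explicit header, CR 4/5, 8-symbol preamble)."""
    t_sym = (2 ** sf) / bw * 1000            # symbol duration in ms
    de = 1 if sf >= 11 else 0                # low-data-rate optimisation at SF11/12
    n_payload = 8 + max(
        ceil((8 * phy_payload - 4 * sf + 28 + 16) / (4 * (sf - 2 * de))) * 5, 0)
    return (12.25 + n_payload) * t_sym       # preamble (8 + 4.25 symbols) + payload

for sf, duty in ((7, 0.01), (9, 0.10), (12, 0.10)):
    t = airtime_ms(sf)
    print(f"SF{sf}: {t:.0f} ms per downlink, "
          f"max {int(duty * 3_600_000 / t)} per hour at {duty:.0%} duty cycle")
```

Even these ceilings are theoretical: each downlink must land in one specific node's receive window, and the gateway hears nothing while transmitting, so the usable rate is considerably lower.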

This is the same issue we’re hitting when too many nodes try to join during a workshop.

The solution could be to deploy sufficient gateways. Having multiple at a workshop is the workaround we usually use.

Don't use confirmed downlinks. Use data in the uplink to signal the last successfully received downlink (a one-byte sequence number should work) and retransmit when a node reports the wrong downlink ID.
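A minimal sketch of that scheme; `queue_downlink()` is a hypothetical stand-in for whatever API your network server offers for scheduling downlinks.

```python
CURRENT_SEQ = 42                 # one-byte sequence number of the latest settings
SETTINGS = bytes(20)             # the 15-20 byte weekly settings payload

def queue_downlink(dev_id: str, payload: bytes) -> None:
    print(f"queue downlink for {dev_id}: {payload.hex()}")   # placeholder for the NS API

def on_uplink(dev_id: str, payload: bytes) -> None:
    # Each uplink's first byte echoes the last downlink sequence the node applied.
    if payload[0] != CURRENT_SEQ:
        # Node missed the latest settings: retransmit in its next receive window
        # instead of relying on confirmed downlinks.
        queue_downlink(dev_id, bytes([CURRENT_SEQ]) + SETTINGS)

on_uplink("node-0001", bytes([41]))   # stale sequence number -> downlink gets queued
```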


How many nodes do you use in the workshop? What is your node-to-gateway ratio?

I have a few nodes that I have set up this way, where the node sends a parity bit back in the downlink to confirm the uplink, but I have seen that sometimes I need to retransmit the uplink several times.

I suppose I can delay each node's downlink until the previous node has confirmed, but this could take days before every node has been updated, as the downlink opportunities only come every 1-2 hours. Need my thinking cap on here now.
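Back-of-envelope arithmetic for that fully serial approach (assuming each node confirms via its next uplink before the next node is served):

```python
# Strictly serial rollout: one node updated per uplink interval.
nodes = 1000
for interval_h in (1, 2):
    print(f"~{nodes * interval_h / 24:.0f} days at one uplink every {interval_h} h")
# -> roughly 42-83 days, so a fully serial rollout is closer to weeks than days
```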

Workshops are 10-15 nodes. With one gateway that is an issue. With two it usually works, provided not all participants try to boot their nodes at the same time.

When using confirmed downlinks the load will be the same, but you shift the scheduling out of your control by offloading it to the network.

Schedule batches staggered in time? By making sure there is a maximum number of potential downlinks, you can avoid overloading a gateway. However, keep in mind other use cases might require downlinks as well (join, for instance), so leave some capacity. And as you mentioned, uplinks will be lost on gateways this busy; consider placing two of them at a site to mitigate that somewhat.
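A sketch of such staggered batches; the hourly cap is an illustrative figure to be tuned against the gateway's airtime budget, leaving headroom for joins and other traffic.

```python
MAX_PER_HOUR = 100   # illustrative cap on downlinks per hour

def batch_schedule(dev_ids: list[str]) -> dict[int, list[str]]:
    """Slot n holds the devices whose downlink is attempted n hours from now."""
    return {
        hour: dev_ids[i:i + MAX_PER_HOUR]
        for hour, i in enumerate(range(0, len(dev_ids), MAX_PER_HOUR))
    }

slots = batch_schedule([f"node-{n:04d}" for n in range(1000)])
print(f"{len(slots)} hourly batches of up to {MAX_PER_HOUR} downlinks")   # -> 10
```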


I've seen some workshops with >100 nodes potentially joining! Obviously not all at the same time, but in some cases through the day in close order… and it fails miserably, especially once into around tens of joins per 10 minutes. As Jac says, the solution in that case is 2 or even 3 GWs. The issue isn't the number of nodes or even the node/GW ratio; it's actually the uplink or downlink rate. If your 1000 nodes all want to join in short order, or if you want to send downlinks in short order, that is where the problems arise.

Some metering (I have seen water meters first hand) and lighting deployments can be exactly what you are looking at, with many hundreds or thousands of nodes per square km. The trick is that they stagger the deployment/join activity, retain credentials to limit the need to rejoin, and then look at mechanisms to cluster, group and address en masse; see also how FUOTA tends to work. Even then, whilst single-GW-per-area deployments are possible, the most effective ones I have seen use overlapping circles of coverage to ensure a large percentage of nodes (ideally >90%) see/are seen by 3 GWs on reasonably short SFs (7/8/9). Obviously, as SF expands the problems seem to grow as a near square law, and the capacity in terms of node service rate over time falls as a near inverse square :frowning:

Also, when first deploying, instead of just throwing nodes out all set to start at say SF12/11/10 and then using/waiting for ADR or fast-ramp algorithms to instruct them to jump to practical shorter SFs, good practice suggests following a bullseye/archery-target model for deployment, with installers told to deploy near-concentric rings of devices preset to an SF with a high confidence of success. One former client had its installers carry what they simply called green, yellow and red devices. Green were preset for SF7 but quickly moved up to 8/9 if needed, yellow were SF9 and moved to 10/11 if needed, red were SF11. All were ADR enabled, and the backend was configured such that they would all slowly balance out and optimise over a 1-2 week period after deployment, much as a broadband DSL modem would 'train' to an optimised speed over time after turn-on.

Also, given you are looking at a short-range area, don't forget that if the antenna is mounted high, then directly below and for a short distance around the GW, sensitivity may be lower close in than a few tens or hundreds of metres out, depending on the antenna's propagation curves. A farm I looked at could only get good reliable connections to a GW antenna on a water tower at the edge of a field when using SF9, whereas identical nodes >400 m out were happy at SF7! :man_shrugging:
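A toy illustration of that bullseye preset model; the ring radii are hypothetical values chosen only for the sketch, and ADR is assumed to take over afterwards.

```python
def initial_sf(distance_m: float) -> int:
    """Conservative starting SF by distance ring; ADR optimises afterwards."""
    if distance_m < 150:   # inner ring: "green" devices
        return 7
    if distance_m < 300:   # middle ring: "yellow" devices
        return 9
    return 11              # outer ring: "red" devices

for d in (50, 200, 450):
    print(f"{d} m -> start at SF{initial_sf(d)}")
```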


Why?

I’m sure you’ll have seen this many times before on here, but context is everything.

If the setup needs all the nodes to be working to a particular set of settings, then I can see you'd need to get the downlinks to them in a short period of time. But if they don't have to be in sync as quickly as possible, then it could all be spread out.

But we don’t have context, so it’s hard to make suggestions …

There are worse things than confirmed uplinks & holding parties during Covid lockdown, and confirmed downlinks are definitely one of them.

There are various ways of continuously acknowledging the status without adding to the payload.

As it happens, your one byte could just as easily be two, as it's exactly the same number of chirps / airtime, so it makes no difference to anything.

Or you could use the port number.

The 'trick' is to embed a number into the downlink that acts like a version, which the device then uses when it uplinks to indicate which version of settings it's heard. So if you get an uplink with a previous version number, you re-do the downlink for that device.
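A variant of the same idea using the port number rather than the payload; the FPort value and the check are illustrative (LoRaWAN leaves FPort 1-223 free for application use).

```python
# Variant: the device echoes the settings version in its uplink FPort,
# keeping the payload itself untouched.
SETTINGS_VERSION = 7

def needs_redo(uplink_fport: int) -> bool:
    """True when the node reports an out-of-date settings version."""
    return uplink_fport != SETTINGS_VERSION

print(needs_redo(6), needs_redo(7))   # -> True False
```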

Overall, 1000 devices in a small area just screams out for a multicast message - any reason why this can't be done?

And with all those devices, I'd have three gateways just for redundancy - it's a bit daft having 1,000 devices and then it all just stops when the only gateway dies…


I am not too concerned about the joining of the nodes (though it is ISM and you need to take others into consideration), as this will be done over one or more days, so it will be spread out.

Are the limits the same as for a node? I can't recall off hand now.

I do understand what you say about the SF; the range is short, nearly WiFi but not quite that short… but I need to take this into consideration… maybe a random SF selection for joining. Definitely a point to implement.

Gain can be as low as possible, as the range is <500m.

I am interested in the uplink content; the downlink is "I am alive, my parity bit is 0/1, I am Class A, here in 5 sec I am waiting for my uplink".

The only reason I was thinking of a confirmed uplink is the number of uplinks and the success rate of uplinks.

As with my experiment, where the backend sends an uplink to the node, the node replies with a parity bit, the backend checks the parity bit, and if the parity bit doesn't match that of a successful uplink, the backend schedules the uplink again to the node. I have not checked the exact percentage of single uplinks being successful, but it is far off 98%, more like 70% or 50%, while the gateway is not extremely busy and the SF is 7.

I could also consider, say, 4 repeated downlinks in succession with an interval of 3 min (for argument's sake) and then deep sleep for +/- 4 hours.
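Worked arithmetic for that: if a single downlink gets through with probability p (the experiment above saw roughly 50-70%), k repeats give P(at least one received) = 1 - (1 - p)^k.

```python
for p in (0.5, 0.7):
    for k in (1, 2, 4):
        print(f"p={p:.0%}, {k} repeat(s) -> {1 - (1 - p) ** k:.1%}")
# 4 repeats reach ~94% at p=50% and ~99% at p=70%: better, but still not guaranteed
```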

Can't be, as the uplink to each node is unique.

Not too concerned about redundancy; as a point, if I don't get the uplink I have 48 hours to fix the network. The consideration will be more about the capacity of the gateway to handle the load and the legal limits (I did not think of the legal limits).

I believe this was a reference to using confirmed downlinks, so I'm not sure how this came to be.

It seems to be a mix-up of uplink/downlink. The downlink saying "I'm alive" would be the gateway saying to the device that it & the backend are alive. And if you are only sending one byte to open a window for receive mode, that would imply, along with the test stats that you've compiled, that you are looking to send commands with some degree of reliability, which is not a good fit for LoRaWAN.

I don't think we're any the wiser as to what this application is for, so your technical investigations are still rather abstract.


Yes, the regulations make no distinction between devices; all transmitting devices need to stay within the airtime allocations.

For TTN in the EU, the RX2 frequency is in a different band with more allowed airtime; however, the fixed SF of 9 negates a (huge) part of that advantage.


Jac, it depends on which end of the telescope you look through. Another way to think of it: if SF wasn't fixed at 9 and any of 7-12 were allowed, then someone using SF12 would struggle more than on a 'normal' band. If we assume each SF step roughly doubles the airtime, SF9 @ 10% DC is like SF10 @ 5%, is like SF11 @ 2.5%, and then someone using SF12 downlinks at 1.25%; so not much difference by the time you are at SF12. But the delay for the channel to open up for another downlink is much shorter if limited to SF9, increasing the chance that other active nodes will get their RX2 downlink in a timely manner.

Also, given the deafness of a GW when downlinking, there is another advantage, especially for GWs that are more lightly loaded from a downlink-activity perspective: the GW has higher availability for uplink capture if limited to SF9, whereas even infrequent downlinks at higher SFs might block the GW for far longer, with greater impact on the community.

What also offsets well for RX2 is the higher TX power allowed under the regulations, delivering greater range and helping offset the lower range of SF9 compared with what is normally associated with, say, SF10/11/12.
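The equivalence chain in numbers, assuming airtime roughly doubles per SF step; the 247 ms figure is the approximate SF9 downlink airtime from the earlier sketch.

```python
base_ms = 247                         # ~airtime of a 20-byte downlink at SF9
for sf in (9, 10, 11, 12):
    airtime = base_ms * 2 ** (sf - 9)
    print(f"SF{sf}: {0.10 * 3_600_000 / airtime:.0f} msgs/hour in the 10% band "
          f"(like SF9 at {0.10 / 2 ** (sf - 9):.2%} duty cycle)")
```

So SF12 in the 10% RX2 band delivers about the message rate SF9 would get in a 1.25% band, which is the "not much difference" point above.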


Looking at it the other way, if RX2 used SF7 (yes, I know that would be a huge sacrifice in range), we would be able to transmit 4 times as much data.

I wasn't trying to imply SF9 isn't a good choice for RX2; I just wanted to make clear the additional airtime would not be that much of an improvement compared to the airtime available on the other frequencies when using SF7.

If 1000 downlinks are required within a relatively small period of time (and I consider a day a small period when it comes to 1000 downlinks), the only viable solution I currently see is to deploy plenty of gateways (not one, not three, but something like 5-10) and try to make sure all nodes can use SF7 uplink so downlinks can be SF7 as well.


[screenshot: RX2 settings options in the console]

There are 3 options; the first two are fixed. Can you control the third option with some parameter in the downlink? I cannot see anything in the documentation (or maybe I have not read the correct page yet).

Because if you could set the node and the NS to use your desired settings, you could try to predict the correct setting and then only use that SF for that specific node. This would need a bit of development (e.g. if a node has not heard a downlink for x time, change to a different SF), and you would still have the possibility of the NS and the node being on different settings.

Is this an option for the downlinks? (SF7)

Yes, if they are close enough to the GWs (as Jac says, by densifying the network), then use the RX1 window: if the uplink is SF7, RX1 is also SF7 on the same frequency. You just need high confidence the message gets through and at the right SF, hence the dense (GW) network. You then think of RX2 (still at SF9, to keep matters simple) as just a safety net and occasional fallback in the event of collisions or short interferers, and cross your fingers :slight_smile:

Think I saw some maths on this in one of the academic studies years ago… and IIRC it was something very simple, like ~>90% RX1 success with >95% cumulative downlink (RX1 + RX2) success delivering a stable network for several '000 nodes within 3 days of start-up, or some such… If I get time I may trawl the library to try and find the paper; think it was one of the North European universities that did the evaluation, several years BC… Also, from what I can remember, this is definitely private-network territory vs trying to co-populate a TTN community deployment… :wink:
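The safety-net arithmetic behind those recalled figures: with RX1 success probability p1 and RX2 catching a fraction p2 of the misses, cumulative delivery is 1 - (1 - p1)(1 - p2). The values below are illustrative, chosen only to match the recollection above.

```python
p1, p2 = 0.90, 0.50   # illustrative RX1 success and RX2 catch rates
print(f"{1 - (1 - p1) * (1 - p2):.0%} cumulative downlink success")   # -> 95%
```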


This can't work, as the node will either have the settings for RX2 hardcoded (ABP) or will get them from the NS at join. So switching to a different SF for RX2 is not an option, as the node will continue to listen at whatever it starts with.
And no, I am not aware of options to change the RX2 SF per node, but take a look at the CLI documentation, which might reveal such an option. However, you are straying into dangerous territory where things might potentially break and require manual intervention. With 1000+ nodes deployed, do you want to have to visit them one by one (and are you able to)? Deploying additional gateways to resolve the issue is a far better and less risky option, in my opinion.


Just thinking of possibilities; it is a bit of a long shot when you start to play with too many of the parameters you are setting.

This is definitely not thrown out, good advice; I am maybe just putting too many of my thoughts out there.

It is also not easy to get 50, let alone 100, nodes on the table to see how it works.