TTN server not "answering" for 2 minutes (receiving join and not answering?)

jeandup · January 25, 2022, 3:41pm

Hello,

I’m testing a gateway (Mikrotik LR8) with multiple Nemeus MM002 modules.
Everything works fine both ways most of the time (uplink and downlink).
But from time to time the devices won’t get any answer to the link check request from the Gateway. Then all the devices go into the join procedure without getting any answer.
From my point of view I would suggest that something is wrong on the server side (all the devices stop communicating at the same time).
Has anyone ever experienced something like this?

descartes · January 25, 2022, 4:38pm

Hmmmm, care to share what you’ve got the link check set to and how they all do it at the same time and how a 2 minute alleged downtime causes all this to happen.

If you are uplinking more than once every 3 minutes at SF7 with a small payload and you have confirmed uplinks on, both would be a huge breach of the FUP.

kersing · January 25, 2022, 4:39pm

A gateway has limited downlink capacity. Could it be the link checks and join responses fail because your gateway is overloaded? This is a known issue when many nodes require downlinks in a small time window. (It is why TTN and other experienced LoRaWAN workshop organisers have multiple gateways present at a location during workshops)

BTW. a node should not rejoin after just one failed link check. That will cause issues whenever there is a small interruption of some kind. Only after it fails to get feedback over a larger amount of time rejoining is an option.
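
The rejoin advice above can be sketched as a small piece of device-side state. This is a minimal illustration, not code from any LoRaWAN stack: the class name `LinkMonitor`, its method names and both thresholds are hypothetical and would need tuning for a real deployment.

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    """Decide when to request a link check and when a rejoin is justified.

    Both thresholds are assumptions chosen for illustration only.
    """
    uplinks_per_check: int = 60   # assumed: ask for a link check every N uplinks
    max_failed_checks: int = 3    # assumed: consecutive misses tolerated before rejoining
    uplinks_since_check: int = 0
    failed_checks: int = 0

    def on_uplink_sent(self) -> bool:
        """Call once per uplink; True means this uplink should carry a link check."""
        self.uplinks_since_check += 1
        if self.uplinks_since_check >= self.uplinks_per_check:
            self.uplinks_since_check = 0
            return True
        return False

    def on_check_result(self, answered: bool) -> bool:
        """Record a link check outcome; True means a rejoin is now due."""
        if answered:
            self.failed_checks = 0   # any answer proves the session is still alive
            return False
        self.failed_checks += 1
        return self.failed_checks >= self.max_failed_checks
```

Because failures are only counted at the (infrequent) link checks, a short interruption never triggers a rejoin on its own; the node has to miss several checks spread over many uplinks first.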

jeandup · January 25, 2022, 5:06pm

Thanks for your replies.

BTW. a node should not rejoin after just one failed link check. That will cause issues whenever there is a small interruption of some kind. Only after it fails to get feedback over a larger amount of time rejoining is an option.

The device tries to join again after 3 failed link checks.
As each message is important I’m just trying to test the gateway/TTN capability to receive those messages.

A gateway has limited downlink capacity. Could it be the link checks and join responses fail because your gateway is overloaded? This is a known issue when many nodes require downlinks in a small time window. (It is why TTN and other experienced LoRaWAN workshop organisers have multiple gateways present at a location during workshops)

What do you mean by limited downlink capacity? Is it possible that the gateway is overloaded and keeps listening but not sending any downlinks?

If you are uplinking more than once every 3 minutes at SF7 with a small payload and you have confirmed uplinks on, both would be a huge breach of the FUP.

I am aware of that, but what if it’s with no payload? Or maybe by payload you mean the raw payload.

I don’t know what to share about the settings of the Link check apart from what I shared above and this:

As you can tell I’m far from a Lora expert and I am testing the capabilities and limitations of the technology.

Jeff-UK · January 25, 2022, 6:45pm

Radio devices have a legally regulated transmission allowance that depends on where in the world they operate. They cannot Tx continuously… and if they did they would never hear anything! LoRaWAN GWs classically cannot listen at the same time as they transmit. Hence there is a community limit on downlink activity as well as a regulatory one… The NS typically enforces the duty cycle or dwell time limits to avoid legal breaches, and if there are too many devices seeking to join or asking for confirmation downlinks, or simply if the NS is firing off too many MAC commands etc., it may exhaust the limits for that given gateway… Having more GWs in a neighbourhood allows the NS to distribute the DL load across them where appropriate.

kersing · January 25, 2022, 6:46pm

A gateway has to adhere to the same limits a node has to adhere to. For Europe that means it can only send 1% of the time in the RX1 window and 10% in the RX2 window (at SF9 hopefully if you use the recommended settings).
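
To put rough numbers on those percentages, here is a small sketch of the arithmetic. The function name and the downlink airtimes used in the comments are illustrative assumptions (real airtimes depend on SF, bandwidth and frame size), and the one-hour window is a simplification of how duty cycle is accounted.

```python
def downlinks_per_hour(duty_cycle_percent: int, airtime_ms: int) -> int:
    """How many downlinks of a given airtime fit into one hour of duty cycle."""
    budget_ms = 3_600_000 * duty_cycle_percent // 100   # 1% of an hour is 36 000 ms
    return budget_ms // airtime_ms


# Assumed example airtimes, not measured values:
print(downlinks_per_hour(1, 50))     # short reply in RX1
print(downlinks_per_hour(1, 1500))   # slow, long reply in RX1
print(downlinks_per_hour(10, 200))   # reply in RX2
```

Under these assumptions a 1% budget covers 720 short replies per hour but only 24 long ones, and that budget is shared by every device that needs a join accept, ACK or link check answer from the same gateway.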

To be blunt, if every message is important LoRaWAN is the wrong technology and should not be used!

And what is the uplink frequency? Anything less than 5 minutes between uplinks is bad and exceeds limits.

The total packet counts (payload, headers, etc.), so no payload does not mean no airtime.

Jeff-UK · January 25, 2022, 6:50pm

Frankly…if that is the case then Radio technology is the wrong one…unless you implement massive redundancy or message Tx repetition and payload overlap etc., establish handshakes/confirmation etc… which then means LoRaWAN…

kersing · January 25, 2022, 6:55pm

If you want to push the limits it would be commendable to at least study the technology and learn about what it can do before causing disruptions for the (local) community (radiowaves) and the TTN community (by loading the servers).

kersing · January 25, 2022, 6:59pm

Which severely limits the number of nodes that can be deployed in that vicinity and certainly is not acceptable in the TTN community network where you are limited in airtime and downlinks. I think 5G would be a better option as handshakes, beam forming and all kinds of other tricks to allow higher density deployments are part of that specification. (That is why the 5G specification fills an entire bookshelf instead of less than 1 cm of that shelf.)

Jeff-UK · January 25, 2022, 7:54pm

Agree, this was meant to mean/be read as “…which then means LoRaWAN…is not the right technology” as you said, just in case you missed the joke & meaning. That’s why I laugh a little when people talk about SLAs in the context of any radio-based system (TTN/TTI included), as it’s all (marketing?) bull: you cannot totally control or mitigate the risk of radio interference, suppression or blocking, be that bursts, intermittent or semi-permanent. Would you like a rubbish skip on that metering manhole cover, sir? Thankfully LoRa is more resilient than most classic (dare I say legacy) radio modulation schemes; it helps, but none are perfect…

descartes · January 25, 2022, 8:18pm

Just to fill in a few of the blanks:

Doesn’t really matter if you don’t send any data - you’ve still occupied the airwaves and a LoRaWAN header requires 13 bytes. This still has utility as you can use the port number as “the message”.

The LoRa Alliance recommends link checks after 60 uplinks.

There is no deployment where using confirmed uplinks all of the time won’t be met by some stiff opposition from other local users. There are many ways of filling in any missing uplinks, but the metric that TTI work to is 10% overall packet loss. If you can’t cope with that, LoRaWAN is not for you unless you can cope with a delay in retrieving the missing transmissions. Or if you have smallish data, you can send the new reading along with the last two (for example).
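
As a sketch of the "send the new with the last two" idea: each uplink carries a sequence counter plus the three most recent readings, so the backend can fill gaps from the next uplink that does arrive. The 8-byte layout, the function names and the sample values are assumptions for illustration; counter wrap-around is not handled.

```python
import struct

FMT = ">H3h"   # assumed layout: uint16 sequence counter + three int16 readings


def encode(seq: int, history: list[int]) -> bytes:
    """Pack the newest reading plus the two before it, newest first."""
    newest_first = (history[::-1] + [0, 0, 0])[:3]   # zero-pad while history is short
    return struct.pack(FMT, seq & 0xFFFF, *newest_first)


def decode(payload: bytes) -> dict[int, int]:
    """Return {sequence number: reading} for every reading in one payload."""
    seq, *readings = struct.unpack(FMT, payload)
    return {seq - i: r for i, r in enumerate(readings) if seq - i >= 0}


# Hypothetical readings; uplinks 1 and 2 are lost, 0 and 3 arrive.
readings = [210, 215, 220, 218]
uplinks = [encode(i, readings[: i + 1]) for i in range(len(readings))]
store: dict[int, int] = {}
for received in (uplinks[0], uplinks[3]):
    store.update(decode(received))
print(sorted(store.items()))   # all four readings are recovered
```

The cost is a slightly longer payload on every uplink, which has to be weighed against the airtime and downlinks that confirmed uplinks would consume instead.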

But as is standard policy for this forum, you’ve not given us context to your application so it’s hard for the community to help, some of whom are actual bona-fide experts.

Now for the really important bit.

Gateways can’t hear anything at all when they are transmitting. So when a device sends a confirmed uplink, for the duration of the gateway’s reply NO OTHER device can be heard.

Which in conjunction with your link check settings, would explain the results you are seeing.

I am a licensed glider pilot, so in flying terms not a power expert, how would AirBus cope with me testing the limits of their next aeroplane?

However if I choose to learn the basics of powered flying, I’d at least know what I don’t know and ask informed questions.

You are very welcome to restart your evaluation if you want to outline what the important data is.

kersing · January 25, 2022, 8:28pm

Or hire someone that has invested (a lot of) time to gain the required knowledge on what is feasible and what isn’t. There are several knowledgeable forum members that can help you and might be available for consulting. PM me if you want names.

jeandup · January 26, 2022, 10:46am

Thank you all for your input.

I understand that I can’t do whatever I want and that I can’t send more than 1 uplink every 3 minutes for everyday usage.

But does it apply for a one time/day deployment of a new gateway where I would just test the areas covered by this gateway?

Does it also apply if I use a private network in Lora without TTN?

I honestly think that I studied the technology enough for testing and making mistakes from which I can learn (without causing problems for anybody) more than just by reading documentation. And asking about it is part of the process of learning in my opinion.

But for some reason I feel like I shouldn’t have asked and should have kept all those questions to myself so I don’t feel like I’m crashing a whole airplane in my backyard.

I believe that TTN is too permissive, or should be able to warn you of your misuse of the network, as it’s easy to monitor, so that people use it correctly. But maybe people shouldn’t start testing before they’re experts in this technology.

Could you maybe suggest some documentation that is necessary for good use of this technology, which I could have missed or wrongly interpreted/translated?

descartes · January 26, 2022, 11:19am

This is not an accurate metric - it ONLY applies to SF7 packets of 13 bytes (payload) or less in the EU but is indicative.
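
For anyone wanting to check where the 3 minute figure comes from, here is a rough sketch. It assumes the standard Semtech LoRa time-on-air formula, EU868 defaults (125 kHz, coding rate 4/5, 8-symbol preamble, explicit header, CRC on), 13 bytes of LoRaWAN overhead, and a fair use budget of 30 seconds of uplink airtime per device per day. That last figure is taken from TTN's published policy, not from this thread; the function and constant names are made up for the example.

```python
import math

MAC_OVERHEAD_BYTES = 13     # LoRaWAN overhead around the application payload
FUP_AIRTIME_S_PER_DAY = 30  # assumed TTN fair use budget per device


def lora_airtime_s(phy_bytes: int, sf: int = 7, bw_hz: int = 125_000,
                   coding_rate: int = 1, preamble_symbols: int = 8) -> float:
    """LoRa time on air in seconds (explicit header, CRC on)."""
    t_sym = (2 ** sf) / bw_hz
    low_dr_opt = 1 if t_sym > 0.016 else 0          # on for SF11/SF12 at 125 kHz
    numerator = 8 * phy_bytes - 4 * sf + 28 + 16    # +16 accounts for the CRC
    payload_symbols = 8 + max(
        math.ceil(numerator / (4 * (sf - 2 * low_dr_opt))) * (coding_rate + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym


def min_interval_s(app_bytes: int, sf: int = 7) -> float:
    """Shortest average uplink interval that stays inside the daily budget."""
    airtime = lora_airtime_s(MAC_OVERHEAD_BYTES + app_bytes, sf)
    return 86_400 * airtime / FUP_AIRTIME_S_PER_DAY


print(round(lora_airtime_s(13) * 1000, 1))   # empty payload at SF7, in ms
print(round(min_interval_s(13)))             # 13-byte payload at SF7, in seconds
```

With these assumptions an empty SF7 uplink still costs about 46 ms, and a 13-byte payload at SF7 works out to roughly one uplink every 178 seconds, which is where the "3 minutes" rule of thumb sits. Higher spreading factors stretch that interval considerably.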

No need to bombard the network for mapping. There is best practise advice freely available if you ask for it.

A private LoRaWAN network is still subject to legal restrictions, as is a plain LoRa network - and reliability would still be the same.

NOTHING stopped you asking about reliability or your testing method beforehand. And a quick search of the inter webs comes up with the 10% metric which should have indicated rather a lot and you could have based your questions on that - rather than wasting time on a testing method we could have totally told you was not appropriate from the off.

Depends on what you are testing - just trying stuff out is testing - but capacity testing that goes way beyond the FUP without appreciating what is going on is likely to get a pointed reaction, like the above.

Ha, yeah, we’d like there to be comprehensive documentation but sadly it’s all spread out - read all of the TTS/TTN documentation and watch some of the videos on the TTI YouTube channel, particularly https://www.youtube.com/watch?v=ZsVhYiX4_6o

As per forum policy, so many people give us no context at all. But the few that ask simple use case questions get €1,000’s worth of free advice within hours.

You are in the first category at present even though you were prompted to give context.

kersing · January 26, 2022, 11:33am

All LoRaWAN networks operate using the same frequencies so ditching TTN would allow you to ignore the fair access policy but not the regulatory requirements and laws of physics. So the gateway still won’t listen when transmitting and the airwaves will still permit only a limited amount of traffic. (By decreasing traffic you can increase the number of nodes.)

We might be a bit heavy handed when answering questions. Apologies for that. It is the result of seeing the same questions, assumptions and mistakes pop up every few months for over 5 years.

You learn by making mistakes. We all did. However don’t be surprised if your conclusions, design and testing methodology are challenged. In your topic starter you suggest something is wrong at the server side, however we know from experience you might well be experiencing gateway/LoRaWAN limitations caused by your testing methodology. Once you’ve gained more experience and insight (and engaging a consultant would accelerate that process) you can spot those kinds of potential bottlenecks yourself.

jeandup · January 26, 2022, 1:49pm

Thank you for your replies again and for the documentation.

I appreciate everybody’s effort to help me understand my problem in my way of using TTN and LoRaWAN. I never wanted to incriminate the server for not working properly and I’m sorry if you felt that way. What I meant is that I am aware that a system not properly set up by the user might not behave as expected.

I get that I need to describe my problem. I was expecting to do it step by step as I don’t know what could cause the problem because of my limited knowledge on this subject. And I am happy and thankful to answer any question related to my problem.

From what I understand, you generally think that my way of communicating with my device(s) overloads the gateway itself, and that even if it keeps listening without problem it’s incapable of transmitting for a while. I’ll try to figure out a way to check that the downtime caused by my pings will not happen again once I’m no longer pinging.

descartes · January 26, 2022, 2:12pm

Nope!

The gateway isn’t overloaded, it’s sending a reply, and when that’s finished, it can listen again, it doesn’t need any form of recovery time.

But if all the other devices are chiming in, there will be a race condition where all the bandwidth will be taken up by uplinks and then confirmations.

I don’t understand this but the wording worries me.

Why not just stop doing your tests and tell us your use case?

kersing · January 26, 2022, 2:22pm

LoRaWAN is an uplink mostly (bordering on exclusively) protocol. Looking at the gateway specifications that is easily deducible: 8 parallel reception channels and just one transmission channel that blocks reception when in use.
When using uplinks you need to make sure there is a reasonable chance of a transmission not interfering with another transmission (which means infrequent transmissions to reduce the chance of collisions) and not to trigger downlinks when not absolutely required.
Within the community network you are allowed 10 downlinks (including ACKs and linkcheck responses!) a day. It currently does not enforce this limit, however too many abusers will result in it being enforced and greatly hinder developers requiring a little excess while building and testing firmware.
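
One way to stay inside that allowance is to confirm only every Nth uplink. A minimal sketch follows; the function name and the idea of reserving a couple of downlinks for join accepts and MAC commands are assumptions, not TTN rules.

```python
def confirm_every(uplinks_per_day: int, daily_downlinks: int = 10,
                  reserved: int = 2) -> int:
    """Return N such that confirming every Nth uplink stays within the budget.

    `reserved` downlinks (assumed value) are kept back for join accepts,
    link check answers and other MAC commands.
    """
    usable = daily_downlinks - reserved
    if usable <= 0:
        raise ValueError("no downlinks left for confirmed uplinks")
    return max(1, -(-uplinks_per_day // usable))   # ceiling division


print(confirm_every(288))   # one uplink every 5 minutes is 288 per day
```

For a device uplinking every 5 minutes this suggests confirming at most one uplink in 36, which leaves the rest of the daily allowance for the network's own traffic.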