Message lora-dl: [TTN] Keepalive timed out

Thunderwolf007 · August 4, 2020, 3:07pm

I have finally connected my MikroTik gateway and antenna. Now I see a lot of this message in the router’s log: Message lora-dl: [TTN] Keepalive timed out. Does anyone have an idea what this means and whether I missed a setting somewhere?

kersing · August 4, 2020, 6:59pm

Does your internet connection allow return packets on port 1700/udp to your gateway? Some internet providers don’t allow the traffic and some internet routers filter it by default.
It can be caused by a slow internet connection as well.
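One way to check the return path from a machine on the same LAN is to send a single PULL_DATA keepalive and wait for the matching PULL_ACK. The following is a minimal sketch, assuming version 2 of the Semtech UDP packet forwarder protocol; the helper names (`build_pull_data`, `is_pull_ack`, `probe`) and the gateway EUI you pass in are illustrative assumptions, and a server may ignore keepalives from an EUI it does not know.

```python
import os
import socket

PROTOCOL_VERSION = 2
PULL_DATA = 0x02  # keepalive sent by the gateway
PULL_ACK = 0x04   # acknowledgement sent back by the server


def build_pull_data(gateway_eui: bytes, token: bytes) -> bytes:
    """Build a PULL_DATA datagram: version, 2-byte token, identifier, 8-byte EUI."""
    if len(gateway_eui) != 8 or len(token) != 2:
        raise ValueError("gateway EUI must be 8 bytes and token 2 bytes")
    return bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + gateway_eui


def is_pull_ack(reply: bytes, token: bytes) -> bool:
    """True if the reply is a PULL_ACK carrying the same token as the request."""
    return (
        len(reply) >= 4
        and reply[0] == PROTOCOL_VERSION
        and reply[1:3] == token
        and reply[3] == PULL_ACK
    )


def probe(host: str, gateway_eui: bytes, port: int = 1700, timeout: float = 5.0) -> bool:
    """Send one keepalive and report whether an acknowledgement came back in time."""
    token = os.urandom(2)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_pull_data(gateway_eui, token), (host, port))
        try:
            reply, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False  # nothing came back: filtered, lost, or too slow
    return is_pull_ack(reply, token)
```

For example, `probe("router.eu.thethings.network", bytes.fromhex("0102030405060708"))` with your own EUI in place of the placeholder. A `False` result only says that no acknowledgement arrived within the timeout; it does not say where the packet was dropped.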

Thunderwolf007 · August 5, 2020, 6:41am

Thanks, I’m going to check the UniFi configuration to see if there is anything there for port 1700.

Thunderwolf007 · August 5, 2020, 7:38am

Can I limit the traffic to router.eu.thethings.network for port 1700/udp?

s56g · August 5, 2020, 9:06am

I’m seeing a lot of these messages in the MikroTik logs … in various configurations (wifi linked, ethernet LAN to fiber …). The traffic still passes without problems. Do you observe missing frames?

kersing · August 5, 2020, 4:26pm

Should be good.

kersing · August 5, 2020, 4:27pm

These messages could indicate downlink issues. Are you using downlinks successfully?

Thunderwolf007 · August 7, 2020, 10:02am

It’s still the same. I have forwarded port 1700 UDP in UniFi, but I still receive the message in the MikroTik router. I have connected the router with a Cat5 cable, so no wifi. Do you know if Ziggo blocks the traffic?

kersing · August 7, 2020, 5:02pm

They didn’t when I was a Ziggo customer a few years back.

cslorabox · August 7, 2020, 5:19pm

Apparently others have noticed in the past that if you have more than one UDP gateway sitting behind a particular NAT, some NAT implementations don’t track these well and so do not end up returning the downstream packets (acks or downlinks) to the correct client.

In theory, NAT is something that could be applied by your provider, not just your own little router/firewall/NAT appliance in your home or office.

One idea for debugging, if you have access to a mobile phone with tethering capability (or can simply take the gateway somewhere else temporarily), would be to briefly try it with a fundamentally different internet connection and see if you get a different result.

Another would be to run tcpdump or tshark or something similar on the router it is connected through and see what is (and isn’t) actually flowing through. Note that it has to run on either the gateway itself or the router; another system on the same network will not see the traffic.

It also wouldn’t be very hard to create a little logging proxy for the UDP protocol, a couple of hours work adapting some generic python UDP example.
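As a rough starting point for such a proxy, here is a minimal sketch, assuming a single gateway pointed at the machine running it and version 2 of the Semtech UDP protocol. The function names, the `state` dictionary and the default server address are illustrative assumptions, not part of any existing tool. Datagrams are routed by their packet identifier (byte 3), because a forwarder typically uses separate sockets for upstream and downstream traffic.

```python
import socket

# Packet identifiers (byte 3 of every datagram in the Semtech UDP protocol).
PACKET_TYPES = {
    0x00: "PUSH_DATA", 0x01: "PUSH_ACK", 0x02: "PULL_DATA",
    0x03: "PULL_RESP", 0x04: "PULL_ACK", 0x05: "TX_ACK",
}


def describe(datagram: bytes) -> str:
    """Summarise the fixed header of a datagram for the log."""
    if len(datagram) < 4:
        return f"short datagram ({len(datagram)} bytes)"
    kind = datagram[3]
    name = PACKET_TYPES.get(kind, f"unknown(0x{kind:02x})")
    text = f"v{datagram[0]} token={datagram[1:3].hex()} {name}"
    # Gateway-originated packets carry the 8-byte gateway EUI after the header.
    if kind in (0x00, 0x02, 0x05) and len(datagram) >= 12:
        text += f" gateway={datagram[4:12].hex()}"
    return text


def route(datagram: bytes, sender, state: dict, server_addr):
    """Return the address to forward to, remembering the gateway's two sockets."""
    kind = datagram[3] if len(datagram) >= 4 else None
    if kind == 0x00:            # PUSH_DATA comes from the upstream socket
        state["up"] = sender
        return server_addr
    if kind in (0x02, 0x05):    # PULL_DATA / TX_ACK come from the downstream socket
        state["down"] = sender
        return server_addr
    if kind == 0x01:            # PUSH_ACK returns to the upstream socket
        return state.get("up")
    if kind in (0x03, 0x04):    # PULL_RESP / PULL_ACK return to the downstream socket
        return state.get("down")
    return None                 # unknown or truncated: log only, do not forward


def run_proxy(listen_port=1700, server=("router.eu.thethings.network", 1700)):
    """Relay datagrams between one gateway and the server, printing each one."""
    server_addr = (socket.gethostbyname(server[0]), server[1])
    state = {}
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", listen_port))
    while True:
        data, sender = sock.recvfrom(65535)
        target = route(data, sender, state, server_addr)
        print(f"{sender} -> {target}: {describe(data)}")
        if target is not None:
            sock.sendto(data, target)
```

To try it, point the gateway’s server address at the machine running `run_proxy()` and watch whether PULL_ACK lines show up after each PULL_DATA. Supporting more than one gateway would need the state keyed per gateway EUI, which this sketch leaves out.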

kersing · August 7, 2020, 6:19pm

That remark triggered something. Ziggo has been moving to IPv6 with DS-Lite, which uses carrier-grade NAT for IPv4. (Dutch article - https://community.ziggo.nl/internetverbinding-102/ipv6-ds-lite-fupc-gebied-en-bridge-mode-waarom-zou-dat-niet-kunnen-30135)

That would probably be hard as the Ziggo hardware is closed. But RouterOS provides a basic packet sniffer in the tools menu (IIRC), so @Thunderwolf007 should be able to check if any port 1700 packets arrive on the gateway itself.

Thunderwolf007 · August 10, 2020, 10:18am

My network knowledge is basic, but I think I see traffic coming in on port 1700 UDP and I also see outgoing traffic on port 1700 UDP. See also the attached screenshot of the sniffer.

fcojmontilla · August 18, 2020, 5:49pm

@Thunderwolf007 judging from your last screenshot, it seems to be working properly.

Been running an open gateway for months, forwarding around 450 msgs/hour, 8000-10000 msgs per day.

Everything is wired, symmetric 600Mbps fiber line, whose health is monitored constantly.

I found that MikroTik TTN gateways fail a lot, but so does TTN (the worst) and, to a lesser degree, DigitalCatapultUK, the most reliable I have found, which still fails a handful of times per day.

The Semtech forwarder UDP protocol, and UDP per se, is far from being bomb proof, bear in mind it is a lightweight UDP protocol, with a “fire and forget” approach, so lost packets are part of the equation.

@kersing Most default, modern firewalls will be stateful, with “allow any traffic from LAN -> WAN, AND any responses to such connections” to flow through.

Unless TTN is expected to initiate a connection towards the LoRaWAN gateway (which AFAIK isn’t the case), no need to open any ports on ROS default firewall rules, it will allow local traffic to go to the internet, and any reply/related traffic to flow back.

Downlinks work fine for me.

A different scenario would be several LoRaWAN gateways/UDP forwarders behind the same internet connection connecting to the same TTN gw. I haven’t come across such a scenario yet, but it may be necessary to program some sort of NAT “helper” or traffic mangle.

ROS already has one that may be repurposed (IP > firewall > services > UDPLite).

Again I haven’t checked this scenario, so just guessing.

cslorabox · August 18, 2020, 6:06pm

This bothers me.

Gateways shouldn’t “fail”.

If there is an incident, it should be a specific identifiable failure.

Most of the software and electrical engineering work put into making our own (and the overwhelming motivation) has not been about getting it to be a gateway, but rather about making sure that when something goes wrong, we know what happened, and can come up with a plan to make sure it doesn’t happen again, or that it is automatically corrected, or else that we automatically notify someone quickly to go address it in person.

A different scenario would be several LoRaWAN devices behind the same internet connection connecting to the same TTN gw. I haven’t come across such a scenario yet, but it may be necessary to program some sort of NAT “helper” or traffic mangle.

If you mean several UDP gateways connecting through the same router, there’s been concern in the past that some NAT implementations didn’t deal with that well, but I didn’t have much luck finding the origins of this the last time I searched the forum for it.

fcojmontilla · August 18, 2020, 6:18pm

I feel you… and think like you. But to ensure that on an open network, the only option is setting up your own.

Was on the verge of setting my own chirpstack to run my apps in fact, but the error % is so low vs the messages trafficked I didn’t look back.

Public services exposed to the internet get used (and abused) a lot. And then there are the scans, attacks… Mix that with UDP and errors are to be expected. What counts is the UDP forwarder implementation retrying the transmission if the gateway doesn’t respond to the keepalive.

200% agree with you, I would like to have good monitoring, but also understand that would mean a substantial investment in resources.

I wouldn’t mind giving a hand if someone is in such a scenario with a MikroTik router.

kersing · August 18, 2020, 6:24pm

Strange, not seeing a lot of downtime impact on my (monitored) gateways and applications.

Yep, that was what I assumed my MikroTik gateway would be (stateful); however, two UDP gateways behind one firewalling router didn’t work reliably.

cslorabox · August 18, 2020, 6:25pm

It’s debatable whether there’s time for a retry that would still allow hitting a 1 second RX1 delay.

Given the RX1 delay, the de-duplication delay also needs to be very short, so if any other gateway received the packet, it’s going out to the application sooner than the one having trouble could get its signal report added.

Patching up the UDP protocol isn’t really the answer; switching to a more sophisticated protocol is. The only gateways that cannot use something else are those where the user can’t change the software.

fcojmontilla · August 18, 2020, 6:30pm

@cslorabox

Couldn’t agree more… I could understand making the protocol so simple (and the choice of UDP) if embedded or low-powered devices were the ones acting as TTN gateways, but this obviously won’t be the case, as the natural place to run them is on internet-connected routers.

I cannot understand the choice of UDP and such a simplistic approach either.

fcojmontilla · August 18, 2020, 6:38pm

The NAT issues are to be expected whether or not ports are open, unless a helper is developed or some sort of workaround is applied so that NAT doesn’t get in the way. The problem most probably isn’t that the ports aren’t open, but that when the router receives the responses, it thinks the message is for one forwarder when it is in fact for a different one.

Were both forwarders connecting to the same TTN gw? Did you try using a different TTN gw for each forwarder?

cslorabox · August 18, 2020, 6:43pm

In other words, the NAT implementation needs to be correct, i.e., it needs to track the source port vs. internal IP even when the destination port and IP are the same, such that when it gets a response to that port it can route it back to the correct internal IP.
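To make that concrete, here is a toy model in Python (a sketch only, not how any particular router implements NAT). `Nat` gives each internal (IP, port) its own external port and keeps a reverse map for replies. `BrokenNat` is a hypothetical faulty variant that remembers just one client per destination, which is the kind of failure described above. All class names and addresses are made up for illustration.

```python
class Nat:
    """Tracks each internal source separately, even for the same destination."""

    def __init__(self, first_port=50000):
        self.next_port = first_port
        self.out = {}   # (internal source, destination) -> external port
        self.back = {}  # (external port, destination) -> internal source

    def outbound(self, src, dst):
        """Translate an outgoing packet; returns the external source port used."""
        key = (src, dst)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[(self.next_port, dst)] = src
            self.next_port += 1
        return self.out[key]

    def inbound(self, external_port, remote):
        """Find the internal client a reply belongs to (None if untracked)."""
        return self.back.get((external_port, remote))


class BrokenNat:
    """Hypothetical faulty NAT: one shared external port, last sender wins."""

    def __init__(self, external_port=1700):
        self.external_port = external_port
        self.last_client = {}  # destination -> most recent internal source

    def outbound(self, src, dst):
        self.last_client[dst] = src
        return self.external_port

    def inbound(self, external_port, remote):
        return self.last_client.get(remote)


server = ("198.51.100.7", 1700)   # example address for the network server
gw_a = ("192.168.1.10", 1700)     # two gateways using the same source port
gw_b = ("192.168.1.11", 1700)

good = Nat()
port_a = good.outbound(gw_a, server)
port_b = good.outbound(gw_b, server)
reply_for_a_good = good.inbound(port_a, server)   # reaches gw_a

bad = BrokenNat()
bad_port_a = bad.outbound(gw_a, server)
bad.outbound(gw_b, server)
reply_for_a_bad = bad.inbound(bad_port_a, server)  # reaches gw_b instead
```

With the correct table, the reply to gateway A’s keepalive goes back to A. With the broken one it lands on B, so A would log a keepalive timeout even though the server answered.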