Message lora-dl: [TTN] Keepalive timed out

kersing · August 18, 2020, 10:46pm

UDP and a simple protocol were chosen because the software was originally intended as a tech demo to be used with the Semtech demo server. It was never intended for widespread production use. At least that is what I’ve been told at some point during the last 5 years. However, as with all demo software, it has taken on a life of its own.

The helper is a requirement, connection tracking provides that. However, the firewall still needs to allow established/related traffic, and that rule might not be in place. In that case, allowing UDP traffic from port 1700 would help.

I can’t quite make out what you are asking but let me describe the setup:
What I was attempting at that moment was to have two gateways, both running a UDP packet forwarder, forward traffic to the TTN back-end for EU. So four UDP streams from two different internal IPs connect to one destination IP/port using masquerading. (Each packet forwarder has two UDP connections, one uplink and one downlink and TTN uses the same port (1700) for both.)
I have had this running at my home office, but at another location it failed. One of the gateways would not get ACK packets when the other (started first) was running. Once the first one was stopped the second one had no issues. Upon restart the first one had issues as long as the second one was running.
As we didn’t need to run the two gateways behind that single router (I just noticed the issues while setting up the gateways), I didn’t get to the bottom of it, but my suspicion was a NAT/connection tracking issue. However, that was two major ROS releases ago, so retesting would be required to see if it can be reproduced.
Around that time at least one other user mentioned he was unable to run two gateways (with UDP packet forwarders) within one NATted network and both connecting to the TTN EU back-end as well.
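
For readers unfamiliar with the keepalive involved, here is a minimal Python sketch of the downlink keepalive exchange as I understand it from the Semtech UDP packet forwarder protocol document: the gateway sends a 12-byte PULL_DATA datagram and expects a 4-byte PULL_ACK carrying the same random token. The function names and the gateway EUI in the usage note are hypothetical, and this is not the MikroTik implementation, which is closed.

```python
import os
from typing import Optional

# Values as described in the Semtech UDP packet forwarder protocol document
# (assumption: protocol version 2, the version commonly seen in the field).
PROTOCOL_VERSION = 0x02
PULL_DATA = 0x02  # gateway -> server keepalive on the downlink socket
PULL_ACK = 0x04   # server -> gateway acknowledgement


def build_pull_data(gateway_eui: bytes, token: Optional[bytes] = None) -> bytes:
    """Build a PULL_DATA datagram: version, 2-byte token, identifier, 8-byte EUI."""
    if len(gateway_eui) != 8:
        raise ValueError("gateway EUI must be exactly 8 bytes")
    if token is None:
        token = os.urandom(2)  # random token, echoed back by the server
    if len(token) != 2:
        raise ValueError("token must be exactly 2 bytes")
    return bytes([PROTOCOL_VERSION]) + token + bytes([PULL_DATA]) + gateway_eui


def is_matching_pull_ack(datagram: bytes, token: bytes) -> bool:
    """Return True if the datagram is a PULL_ACK that echoes the given token."""
    return (
        len(datagram) >= 4
        and datagram[0] == PROTOCOL_VERSION
        and datagram[1:3] == token
        and datagram[3] == PULL_ACK
    )
```

With a made-up EUI, `build_pull_data(bytes.fromhex("0102030405060708"))` gives the datagram a forwarder would send to port 1700. If the NAT does not deliver the matching PULL_ACK back to the right gateway, that gateway never sees its keepalive acknowledged, which matches the symptom described above.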

fcojmontilla · August 19, 2020, 10:04am

Agree.

I’ve been going through the ROS changelogs. LoRa is being actively developed/fixed, and I found this for 6.46.5:

*) lora - properly update source address for packets when routing table is changed;

I suspect the problem isn’t the ROS NAT but the Mikrotik forwarder, which is Mikrotik’s own implementation, or maybe a combination of the two.

Has anyone tested this with a recent ROS version? I’d use a stable channel version, which is 6.47.2 today.

fcojmontilla · August 19, 2020, 10:07am

Heheh yes… it happens.

But I think that’s technical debt, and being a key component it should be properly implemented. Payloads are very small, and routers running forwarders won’t have a problem using TCP or a protocol better suited for reliable signalling.

The default ROS firewall allows established/related connections.

Was this two Mikrotik packet forwarders using a Mikrotik as the router towards the internet?

s56g · August 19, 2020, 10:25am

… after quite a bit of testing it seems that the logged errors have no influence on TTN traffic. Downlinks are reliable too … even on a wifi backbone “community network”. Btw: I have two Mikrotik gateways on the same network and they are NATed. So far no problems observed. Btw: IPv6 would be a solution for CGN networks, but unfortunately TTN is hosted on a non-IPv6 platform. Mikrotiks connect happily to a private TheThingsStack instance over IPv6.

kersing · August 19, 2020, 10:35am

No, this was years before Mikrotik released LoRaWAN compatible hardware. (Three years ago from memory, might be longer.)

bsiege · August 19, 2020, 7:54pm

TL;DR: should work, but it is a partly known problem and mostly debuggable…

That should not be a problem. UDP is robust. As long as you have valid triples out of any connections, the NAT device should track the connection statefully.

Gateway1/RND-SRCportA → NAT-DevicePub/portG → TTNEurope/DSTPort1700
Gateway1/RND-SRCportB → NAT-DevicePub/portH → TTNEurope/DSTPort1700
Gateway2/RND-SRCportC → NAT-DevicePub/portI → TTNEurope/DSTPort1700
Gateway2/RND-SRCportD → NAT-DevicePub/portJ → TTNEurope/DSTPort1700

Every reply from TTNEurope/DSTPort1700 is trackable, and replies to every source are possible. With DNS this works perfectly, but those are queries with immediate replies. Many firewalls, even professional ones, face a trade-off between the limited number of source ports on the public side and the sessions they keep reserved. A 15 s session timeout for UDP is quite common, and there lies the problem: once a session has expired, the NAT device creates a new one and uses a new translated source port. If you can dump packets on both the LAN and the WAN side, I am quite convinced you will see that the sessions are not consistent: the NATting device shuffles source ports and confuses the receiving server about its sessions. With this knowledge and a configurable NAT/firewall, there is a chance to set stickier session timeouts for the protocol and the NAT tables.
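
To make the timeout argument concrete, here is a toy Python model of a masquerading NAT with a UDP session timeout. It is only a sketch under loud assumptions: `ToyNat`, its port range and the internal addresses are invented for illustration, and real connection tracking (ROS included) is more involved and normally refreshes the session on replies too.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (internal IP, internal source port)


@dataclass
class ToyNat:
    """Hypothetical masquerading NAT; not how any real router is implemented."""
    udp_timeout: float = 15.0  # seconds; the "quite common" value mentioned above
    _next_port: count = field(default_factory=lambda: count(40000))
    # internal endpoint -> (translated public port, time of last outbound packet)
    _table: Dict[Endpoint, Tuple[int, float]] = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int, now: float) -> int:
        """Translate an outgoing packet and return the public source port used."""
        key = (src_ip, src_port)
        entry = self._table.get(key)
        if entry is None or now - entry[1] > self.udp_timeout:
            public_port = next(self._next_port)  # expired or new: fresh port
        else:
            public_port = entry[0]               # session alive: same port
        self._table[key] = (public_port, now)
        return public_port

    def inbound(self, public_port: int, now: float) -> Optional[Endpoint]:
        """Map a reply back to the internal endpoint, or None if no live session.

        Simplification: replies do not refresh the session in this toy model.
        """
        for key, (port, last_seen) in self._table.items():
            if port == public_port and now - last_seen <= self.udp_timeout:
                return key
        return None
```

If a gateway sends at `now=0.0` and again at `now=40.0`, the second packet leaves from a different public port, and a server reply sent to the first port no longer maps to anything. Keepalives spaced closer than `udp_timeout`, or a longer timeout on the router, keep the mapping stable in this model.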

Or switch to TCP

kersing · August 19, 2020, 9:46pm

You know what my solution is whenever possible. (That doesn’t include the Mikrotik gateway due to its closed nature.)