TTN Gateway Reboot [Fixed?]


(Datahive) #1

Just like many other users on this forum, I’ve been fighting with TTN gateway reliability. Today I had a bit of a breakthrough:

  • My gateway is connected to both Wi-Fi and Ethernet. My Ethernet gives a steady 1 ms ping with 0% packet loss, while my Wi-Fi is spotty, anywhere between 5 ms and 1000 ms with 0 to 100% packet loss
  • The breakthrough was noticing that when I placed the gateway in one particular spot in my office, it worked 90% of the time (all 4 lights solid blue after boot), while in another spot I never got it to work (constant reboots)
  • All the while, the gateway console showed that it had successfully obtained both an Ethernet and a Wi-Fi IP address, and I was able to ping it at all times (with heavy packet loss in the bad position, of course)

So, based on this, I have a theory about what’s wrong with the TTN Gateway and why it’s so tricky to get to the bottom of the issue:

  • If you have no Wi-Fi at all, there is no problem: you only get an Ethernet IP and off you go
  • If you have 100% reliable Wi-Fi, there is no problem either
  • If you have 50% reliable Wi-Fi, there is roughly a 50% chance that your gateway won’t be able to finish booting. It will likely get as far as receiving an IP through DHCP, but it won’t be able to communicate with TTN after that. It appears that 50% Wi-Fi is good enough for DHCP because that particular code retries until it succeeds, but the rest of the code that communicates with TTN seems much less resilient: if your Wi-Fi drops, the TTN connection crashes badly and doesn’t recover.
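To put rough numbers on why a retrying step like DHCP gets through while a one-shot connection does not, here is a small back-of-the-envelope sketch. It assumes (a simplification, not a measurement) that each attempt succeeds independently with probability p; real Wi-Fi losses tend to come in bursts, so treat the figures as illustrative only. The function name `success_probability` is hypothetical.

```python
def success_probability(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds,
    assuming each try succeeds with probability p (simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Fails only if every attempt fails: (1 - p) ** attempts
    return 1.0 - (1.0 - p) ** attempts


# "50% reliable" link: one try vs. a few retries
for attempts in (1, 3, 10):
    print(f"{attempts:>2} attempt(s): {success_probability(0.5, attempts):.4f}")
```

Under that assumption, a single attempt on a "50%" link succeeds half the time, while ten retries succeed about 99.9% of the time, which is consistent with DHCP eventually working on the same link.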

My fix is to relocate my TTN gateway to a spot with more reliable Wi-Fi (and if that still gives me trouble, I will turn off Wi-Fi altogether).

The proper fix, of course, is for the TTN developers to reproduce this issue by placing a gateway where the Wi-Fi is just barely good enough for DHCP but not good enough for anything after that; hopefully they will see the same behavior I did. After that, the fix should be easy. At minimum, display a proper error in the status instead of an infinite boot loop or reboot, e.g. “Error: poor Wi-Fi connection, try again with a better Wi-Fi signal”. Or, even better, just have the gateway retry without crashing. Those packets ARE getting through, just very unreliably, so if the code can work around that unreliability, things will work.
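The “retry without crashing” idea can be sketched as a retry loop with exponential backoff that reports a status on failure instead of rebooting. This is a minimal illustration, not the gateway firmware’s actual code: `connect_with_retry` and the `connect` callable are hypothetical stand-ins for whatever step the firmware uses to reach TTN, and the delay parameters are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Optional


def connect_with_retry(
    connect: Callable[[], bool],  # hypothetical: returns True once connected
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call `connect` until it returns True, backing off exponentially
    (with jitter) between failures. Returns the attempt number that
    succeeded, or None if every attempt failed, so the caller can report
    an error status instead of rebooting."""
    for attempt in range(1, max_attempts + 1):
        try:
            if connect():
                return attempt
        except OSError:
            # Treat network errors like any other failed attempt.
            pass
        if attempt < max_attempts:
            # Exponential backoff capped at max_delay, with random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    return None


# Usage: a flaky link that fails twice, then succeeds (no real sleeping).
outcomes = iter([False, False, True])
result = connect_with_retry(lambda: next(outcomes), sleep=lambda _: None)
if result is None:
    print("Error: poor Wi-Fi connection, try again with a better Wi-Fi signal")
else:
    print(f"connected on attempt {result}")
```

The jitter keeps many devices from retrying in lockstep, and returning None rather than raising leaves the decision (show an error, keep retrying, fall back to Ethernet) to the caller.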

I hope this helps others.

Note: when I talk about “50% reliable Wi-Fi”, of course there is no such thing; it’s just shorthand for Wi-Fi with high latency and some packet loss at times, roughly 50% of what ideal Wi-Fi looks like.


#2

seems very logical :sunglasses:


(Datahive) #3

Moving the gateway is a no-brainer for sure (I should have done that months ago).

The part that is completely unexpected is that normally you’re either on Wi-Fi or you’re not. Even if you’re dropping packets like crazy, your session is either online or not; there should not be a halfway condition. The TTN Gateway has a fault condition where your Wi-Fi gives you DHCP 100% of the time, you can even ping the gateway with slight packet loss, and yet the Wi-Fi quality is still not sufficient to establish a connection with the TTN network. That should never happen, or if it does, it should not result in a reboot.

I have to stress that this is not a case of “well, duh! Of course if you don’t have internet you can’t connect to TTN”. It’s more a case of “It’s telling me I have internet, I’m pinging it to verify that I have internet, and it’s still not connecting”. If the device can’t recover gracefully (like any other device that loses some packets on Wi-Fi), then the minimum fix is for it to say: “your internet connection failed because the gateway can’t operate with anything less than perfect reception”. Not ideal, but better than the reboots it’s doing now.

Now that I’ve figured this out, I can work around the problem.