Known issue: Gateways appear as offline


(Hylke Visser) #1

Because of an issue in our NOC component, gateways currently appear as offline on the console and on our maps. We are doing our best to have everything back to normal. Routing services are up and running.


(Hylke Visser) #2

#3

Do the gateways only appear offline or are they actually offline?


(Hylke Visser) #4

As I wrote in the first post:


(Hylke Visser) #5

(Hylke Visser) #6

What happened

The NOC

The NOC experienced unexpected timeouts while writing to its database. Instead of backing off, the NOC started even more processes trying to write to the database (an unintended positive feedback loop). This caused an explosion of goroutines and memory usage, and eventually the NOC got stuck in a loop of crashing and restarting.

The Proxy

We have an SSL-terminating proxy in front of the NOC. When the NOC crashed and restarted within a second, the proxy did not close existing connections.

The Routing Services

Because the proxy did not close the connections, the routing services (router, broker, ...) that forward metadata to the NOC did not back off to give the NOC time to recover. Instead, they buffered these messages and dumped a flood of them onto the NOC each time it came back after a crash, which made recovery even harder. For some reason these components also spawned more goroutines, leading to extremely high memory usage and slower message processing.
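One way to keep a sender from building up such a flood is a bounded buffer. The Go sketch below shows the idea under stated assumptions: it is not how the router or broker are implemented, and `boundedBuffer`, `push` and `drain` are hypothetical names. It assumes that stale metadata can be discarded, so when the buffer is full the oldest message is dropped, and the receiver is fed in small batches after it comes back.

```go
package main

import "sync"

// boundedBuffer holds at most capacity messages. When it is full, the oldest
// message is dropped, so memory use stays bounded while the receiver is down.
type boundedBuffer struct {
	mu       sync.Mutex
	items    [][]byte
	capacity int
	dropped  int // number of messages discarded so far
}

func newBoundedBuffer(capacity int) *boundedBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &boundedBuffer{capacity: capacity, items: make([][]byte, 0, capacity)}
}

// push appends msg, discarding the oldest buffered message if necessary.
func (b *boundedBuffer) push(msg []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.items) == b.capacity {
		copy(b.items, b.items[1:])
		b.items = b.items[:len(b.items)-1]
		b.dropped++
	}
	b.items = append(b.items, msg)
}

// drain removes and returns up to n of the oldest messages, so a recovering
// receiver can be fed in small batches instead of all at once.
func (b *boundedBuffer) drain(n int) [][]byte {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n < 0 {
		n = 0
	}
	if n > len(b.items) {
		n = len(b.items)
	}
	out := make([][]byte, n)
	copy(out, b.items[:n])
	remaining := copy(b.items, b.items[n:])
	b.items = b.items[:remaining]
	return out
}
```

Dropping the oldest entry is only one possible policy. Rejecting new messages, or keeping only the latest status per gateway, would be equally valid depending on what the receiver needs.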

Mitigation

We temporarily disabled forwarding metadata to the NOC, but as a result the gateways now appear as offline on the console and maps.

Resolution

We are still working on reproducing the issue in a controlled environment, and will post an update when we know more. We aim to re-enable NOC forwarding within a couple of hours, after which the gateway pages should display the correct gateway status again.


(Hylke Visser) #7

We decided to postpone re-enabling the NOC until tomorrow.


(Arjan) #8

Until then, ttnctl gateways status [gatewayID] still shows the actual status of your gateways.


(Niau) #9

Now miraculously my gateway appeared online :wink:


(Gsethi2409) #10

Two of our gateways appear online now! Finallyyyy!


(Alexbn71) #11

Mine too, yippee! :slight_smile:

Alex


(Mark189) #12

Hello, is the problem solved? Our gateway status is still "not connected".
LoRa traffic is being sent, but the status in TTN is incorrect.


#13

Watch https://status.thethings.network/ for updates.


(Mark189) #14

Thanx Marcelstoer


(Alexbn71) #15

Are there any predictions about returning online on the map?

Thanks
Alex


(Smartohm) #16

The Gateway now appears in the console.

It still doesn't appear when issuing CLI "ttnctl gateways status ....." or shows as inactive with "ttnctl gateways info ...", which I normally would consider more accurate.

Is there any way to fix this?


(Hylke Visser) #17

Did you supply --router-id to ttnctl?


(Alexbn71) #18

Are actions performed in the dashboard on gateways and devices reported by ttnctl in real time, or is there a delay?

Thanks
Alex


(Pat Molloy) #19

ttnctl gateways info eui-ID
and
ttnctl gateways status eui-ID

both work for me when I enter the eui-ID of my gateway.


(Smartohm) #20

The problem was with --router-id, as we are using ttn-router-asia-se.

Thanks htdvisser!

This command seemed to work previously without this flag. Perhaps it was assigned a different router prior to the recent changes.