How to monitor "gateway online"?

I’m thinking about a method / tool like a watchdog to get a message / signal that warns when my TTN-Gateway goes off-line. I could then e.g. start a reboot to get it on-line again.
If, for example, I had a tool to read the “… seconds ago” from the console-page … I’d be happy :wink: , because when the “seconds” turn into “minutes” it is most probably an off-line situation.

Because I have no clue, my question is: does anybody have any ideas?

There are a few assumptions, but this works nicely for me. The main assumption is that you use Wifi to connect your gateway.

By default, when the TTN gateway loses its Wifi connection, it becomes a station/access point. This is the “fault” condition that the hacked code checks for: if the TTN gateway SSID is seen (if the gateway were working OK, it would not be seen), then reboot. It’s a simple test and works well.
The other benefit is that the hacked code also checks whether the broadband Wifi SSID is absent; if it is absent, the TTN gateway is switched off.
When the broadband wifi SSID is seen again, it switches the gateway back on.

Most of the code revolves around checking logic.
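
The checking logic described above can be sketched as a small decision function. This is a minimal sketch, not the actual script: the SSID names, the action labels, and the function name are hypothetical, and both the Wifi scan itself and the power switching are left out.

```python
from typing import Iterable

# Hypothetical SSIDs; replace them with the names seen in your own Wifi scan.
GATEWAY_AP_SSID = "TTN-GATEWAY-SETUP"  # assumed name of the gateway's own access point
BROADBAND_SSID = "HomeBroadband"       # assumed name of the broadband Wifi


def decide_action(visible_ssids: Iterable[str], gateway_powered: bool) -> str:
    """Return 'power_off', 'power_on', 'reboot' or 'none' for one scan result."""
    seen = set(visible_ssids)
    if BROADBAND_SSID not in seen:
        # No broadband Wifi: keep the gateway switched off until it returns.
        return "power_off" if gateway_powered else "none"
    if not gateway_powered:
        # Broadband Wifi is back: switch the gateway on again.
        return "power_on"
    if GATEWAY_AP_SSID in seen:
        # The gateway fell back to access-point mode, the "fault" condition.
        return "reboot"
    return "none"
```

Because the function only looks at which SSIDs are visible, it never needs the Wifi password.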

I did consider more complicated things, but that would have involved having to know the Wifi password. My thinking was that if the error is on the network side, rebooting the gateway is unlikely to fix things.

If you really want to check if the network is running OK, I figured another process (HTTP?) would be better anyway - because you might have 100 gateways, but you only want to check if the (one) network is working OK

If you wanted to, you could change the hacked code to send a message when it has to reboot (and other things perhaps) - but again, this would involve needing to sign into the Wifi (which the current hacked code never has to do)

This hacked code assumes you have a working gateway - and not one that is in a constant reboot loop. If you have one of those, it would never work anyway :slight_smile:

Also, if you can’t ping something remotely, you can’t control it remotely either, so any rebooting has to be locally controlled.

Thanks for your reply. I do use Wifi, so that assumption is right. My gateway is sometimes online for half a day and sometimes for more than a week. Never experienced the reboot loop.
Another problem that I experienced is that sometimes the blue LEDs of the gateway indicate that the wifi connection is active, but the gateway is not ‘pingable’ and the console says ‘seen it minutes/hours ago’.
So I think I need a different method…

I think this may be solved in future releases - maybe :slight_smile:

:warning: :warning: :warning:

This post contains references to outdated APIs.

Please read the following for the new APIs: New API for gateway mapping, status and info


You’d also need to monitor the number of uplinks, if your gateway happens to suffer:

Without the need for any authentication, you can also get those details from a URL such as http://noc.thethingsnetwork.org:8085/api/v2/gateways/your-gateway-id-here
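
A minimal sketch of polling that URL and deciding whether a gateway looks stale. It assumes the endpoint still answers with a JSON object, which the warning above calls into question. The thread does not show the response layout, so extracting the last-seen time from the returned document is left to the reader. The function names and the 120-second threshold are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta

NOC_URL = "http://noc.thethingsnetwork.org:8085/api/v2/gateways/"


def fetch_gateway_status(gateway_id: str, timeout: float = 10.0) -> dict:
    """Fetch the status document for one gateway (needs network access)."""
    with urllib.request.urlopen(NOC_URL + gateway_id, timeout=timeout) as resp:
        return json.load(resp)


def is_stale(last_seen: datetime, now: datetime,
             max_age: timedelta = timedelta(seconds=120)) -> bool:
    """True when the last-seen time is older than max_age."""
    return now - last_seen > max_age
```

A cron job could call `fetch_gateway_status`, derive the last-seen time, and trigger an alert or a reboot when `is_stale` returns true.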

For my TTN gateway I’m monitoring the UART output; see Raspberry Pi to monitor serial output of a node or TNN Gateway, and alert on Slack or Telegram.

(I’m currently also testing a little enhancement to get some statistics out of that; will release a new version this week I guess.)

At first glance I think this might help me.
Could take a while, but I will test it and present the results here. Thanks.

Hi peterq, I have just tried your URL for a few local gateway IDs, but I am unable to get a response:

This site can’t be reached

noc.thethingsnetwork.org took too long to respond.

Is this still supported?


noc was working fine when I looked at it yesterday. I suggest you first try the generic listing of the JSON data (the URL up to gateways), then manually search through it for your gateway ID and check whether what you type matches the way it is listed there; a small difference in descriptor/format will be a problem.

http://noc.thethingsnetwork.org:8085/api/v2/gateways


Thanks Jeff (it appears my work site is blocking port 8085, preventing me from getting access).
But I found this works fine for me:

https://www.thethingsnetwork.org/gateway-data/location?latitude=-32.8835950&longitude=151.7279500&distance=200

Outputting JSON

I wonder if this is suitable for monitoring. As far as I know this is used for the maps on the website, and might not be updated right away: https://www.thethingsnetwork.org/forum/t/how-often-is-the-description-on-the-ttn-gateway-map-updated/20757

Hi Arjan,
The link does reflect the ttnmapper last upload time, but I agree it does not reflect the actual last uplink time. Comparing the sensor data uplink timestamp with the timestamp from the URL above, they can differ significantly. But I have yet to find a more accurate method.

Any ideas anyone?
Regards
Ian

I don’t know if this would be useful for you, but I use node red mqtt connection to backend of ttn. If the flow fails to receive a message for 5 minutes, I send an email message to myself. I expect to see node uplinks at least once per minute.

It’s a simple five minute timer that is reset every time an mqtt message is received. If it expires, I know that nothing has been received for five minutes.

This doesn’t specifically mean that my gateway is down, it could be that my nodes aren’t sending, or that they aren’t being received by a working gateway. At any rate, it means that something is wrong with my system.

Edit. With proper checks it could be modified to look for a specific gateway in the messages. Since I’m the only gateway in the area, I can assume that it’s mine that is having an issue.
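
The timer idea, including the per-gateway check from the edit, can be sketched outside Node-RED as well. This is a minimal sketch, assuming each uplink arrives as a dict whose `metadata.gateways` list carries a `gtw_id` for every receiving gateway. That message layout, the class name, and the gateway ID are assumptions, so adjust them to what your MQTT messages actually contain.

```python
import time
from typing import Callable, Optional


class UplinkWatchdog:
    """Timer that is reset by every uplink and expires after `timeout` seconds."""

    def __init__(self, timeout: float = 300.0, gateway_id: Optional[str] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.gateway_id = gateway_id  # None: any uplink resets the timer
        self.clock = clock
        self.last_reset = clock()

    def on_uplink(self, message: dict) -> None:
        """Call this for every received MQTT uplink message."""
        if self.gateway_id is not None:
            gateways = message.get("metadata", {}).get("gateways", [])
            if not any(g.get("gtw_id") == self.gateway_id for g in gateways):
                return  # received, but not via the gateway being watched
        self.last_reset = self.clock()

    def expired(self) -> bool:
        """True when nothing has reset the timer for `timeout` seconds."""
        return self.clock() - self.last_reset >= self.timeout
```

The MQTT callback would call `on_uplink`, while a periodic check of `expired()` decides when to send the alert.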

I have to say that there is no answer yet to solve my problem: not only to know if my gateway went off-line, but also to know quickly what the cause for the off-line is.
In my situation there are quite a few possible causes for an error:

  1. My gateway has a wifi-connection through a meshpoint in my in-house router network. If the gateway is connected to the meshpoint but the meshpoint has no communication with the router, the gateway seems to have a wifi-connection. No indication of an error.
  2. The router is connected to a modem/router of my provider and only with a tool (available) I can see if there is an internet-connection. Manually controlled tool.
  3. If there is an internet-connection, I can see if there is a connection to the TTN-network by looking at the ‘last seen’ indicator.

So at this moment, only the information ‘last seen’ is available to indicate the end-to-end state, but in case of an error it doesn’t give me a clue whether I have to ‘reset’ my gateway or not in order to get a proper connection to the TTN-network.
I think I need more than 1 indicator in order to know what the state of the gateway in connection with the TTN-network is.
For the time being, I solved it by resetting the gateway periodically and getting a notification from my router if it is off-line, meaning ‘no internet connection’. Not great, but much better than nothing.
Everything is powered by a UPS.

Sounds like you might be able to use a raspberry pi that is connected (direct or WiFi) to your internet provider modem and ping your ttn gateway and the mesh extender, assuming the mesh device has an IP address and can tell you who is connected to it. The raspberry pi could determine if the internet connection is down, if the mesh device is reachable, if the gateway is reachable etc.
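
The ping checks suggested above could be sketched as follows. This is a minimal sketch for a Linux machine such as a raspberry pi; the hop names and IP addresses are hypothetical, and the `ping` flags (`-c` for the count, `-W` for the timeout in seconds) are those of the common Linux `ping`.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical addresses, ordered from the internet side to the gateway.
HOPS = [
    ("internet", "8.8.8.8"),
    ("mesh point", "192.168.1.2"),
    ("gateway", "192.168.1.50"),
]


def ping(host: str) -> bool:
    """Send one ICMP echo request; True if the host answered."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def unreachable(hops: Sequence[Tuple[str, str]],
                probe: Callable[[str], bool] = ping) -> List[str]:
    """Return the names of all hops that did not answer."""
    return [name for name, host in hops if not probe(host)]
```

An empty list combined with a stale ‘last seen’ would point at the gateway software or the TTN side rather than the local network.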

My recommendation is to just use a more reliable gateway. My rak831 has been up 21 days without issue. I rebooted it 21 days ago to try to determine why its last seen wasn’t updating; that turned out to be a problem with the noc, not my gateway. It is currently using a wifi connection too. My plan is to relocate it outdoors and use POE to connect it to my network. When I rebooted it, it had been up for several weeks prior to the reboot.

Some people don’t consider the raspberry pi to be a reliable device, but I’ve personally had great success with them in a variety of applications.

Could you send me the node red flow?
I’ve been thinking of using the integration to the swagger database and then node red to access payload + metadata via an HTTP GET request. But your method sounds much easier.

I don’t know how to cut out just the email alert part and remove my appkey and email login info.

It’s really easy, you just

  1. install node-red-contrib-ttn using the palette manager tool, then place the ttn-uplink thingy and configure it with your appkey
  2. add a function block to add msg.topic to the object. This sets the email subject line.
  3. pass the output to a trigger function that sends nothing and waits five minutes, with the box checked to reset the trigger when another message arrives.
  4. then pass that to an email output (part of social) configured with your outbound/smtp mail settings

Then when nothing comes for five minutes, you’ll get only one email notifying you.
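
For readers not using Node-RED, the email step can be sketched with Python’s standard library. This is a minimal sketch: the addresses, SMTP host, port, and credentials are placeholders, and the subject string plays the role that msg.topic plays in the flow.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, minutes: int = 5) -> EmailMessage:
    """Create the alert mail; the subject mirrors msg.topic in the flow."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"TTN: no uplink for {minutes} minutes"
    msg.set_content("No MQTT uplink message arrived within the timeout.")
    return msg


def send_alert(msg: EmailMessage, host: str, port: int,
               user: str, password: str) -> None:
    """Send the mail through an SMTP server using STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

To get only one email per outage, call `send_alert` once when the timer expires and not again until a new uplink has reset it.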

If this isn’t enough to get you through it, we can probably talk on the phone or something and I can walk you through it.

Thanks! I’ll do that.

Indeed, this is cached. From Slack:

@sebastianb [Feb 25th at 11:51 AM]

A question on gateway locations, hoping someone knows: which database does this API URL pull on? it does not seem to be the same locations i see in the console, and in the case of some gateways returns nonsense values, with 14 decimals. -
https://www.thethingsnetwork.org/gateway-data/location?latitude=55.65959816&longitude=12.59145593&distance=20000

it’s perfectly possible that i’m guilty of having generated these nonsense entries at some point, but where is it that they survive?

@htdvisser [Feb 25th at 12:16 AM]

That API is heavily cached because of the high number of requests to it, so it could take some time to update

The number of decimals is probably because of some 32-bit/64-bit floating-point conversion
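
That guess about the float conversion can be illustrated with a small sketch. It stores the longitude from the URL above in 32 bits and widens it back to 64 bits; whether the API actually does this is not confirmed in the thread.

```python
import struct


def through_float32(value: float) -> float:
    """Round a 64-bit float to 32-bit precision and widen it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 12.59145593
widened = through_float32(original)
# The widened value differs slightly from the original and prints with
# more decimals than were entered.
print(original, widened)
```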

I’m using Zabbix to monitor my servers/devices, and recently I was searching for a way to add monitoring for my TTN gateways.
I initially found this thread and the suggestion to use the API http://noc.thethingsnetwork.org:8085/api/v2/gateways/ … I was thinking of writing my own scripts, but later I found a ready-made script / template for Zabbix: https://github.com/zone11/ttn-zabbix-gw
It uses the same API and is easy to set up if one already has a running Zabbix instance.

I thought this slightly off-topic comment might be helpful for others who would also like to monitor the status of their gateways and maybe already use Zabbix.

Hi,
A different approach: testing gateways with a node and an application, ensuring that uplink/downlink is working fine by sending periodic ping and pong messages.
See the blog link; the code for both components is published here: github link.
