Connection to server "bridge.eu.thethings.network" failed

thomaswa · September 4, 2018, 10:14am

I haven’t been able to connect since yesterday:

04.09.18 12:08:09 (+0200) main 10:08:09 ERROR: [TTN] , retry in 30 seconds
04.09.18 12:08:09 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 0, total 30
04.09.18 12:08:19 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 10, total 30
04.09.18 12:08:29 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 20, total 30
04.09.18 12:08:41 (+0200) main 10:08:41 ERROR: [TTN] Connection to server “bridge.eu.thethings.network” failed, retry in 60 seconds
04.09.18 12:08:41 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 0, total 60
04.09.18 12:08:51 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 10, total 60
04.09.18 12:09:01 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 20, total 60
04.09.18 12:09:11 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 30, total 60
04.09.18 12:09:21 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 40, total 60
04.09.18 12:09:31 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 50, total 60
04.09.18 12:09:44 (+0200) main 10:09:44 ERROR: [TTN] Connection to server “bridge.eu.thethings.network” failed, retry in 120 seconds
04.09.18 12:09:44 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 0, total 120
04.09.18 12:09:54 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 10, total 120
04.09.18 12:10:04 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 20, total 120
04.09.18 12:10:14 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 30, total 120
04.09.18 12:10:24 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 40, total 120
04.09.18 12:10:34 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 50, total 120
04.09.18 12:10:44 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 60, total 120
04.09.18 12:10:54 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 70, total 120
04.09.18 12:11:04 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 80, total 120
04.09.18 12:11:14 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 90, total 120
04.09.18 12:11:24 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 100, total 120
04.09.18 12:11:34 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 110, total 120
04.09.18 12:11:46 (+0200) main 10:11:46 ERROR: [TTN] Connection to server “bridge.eu.thethings.network” failed, retry in 240 seconds
04.09.18 12:11:46 (+0200) main src/ttn_transport.c:371:ttn_connect(): ttn_connect: sleeping() at 0, total 240

EricVdB · September 4, 2018, 10:34am

Replace it with router.eu.thethings.network

Greetz

kersing · September 4, 2018, 12:17pm

No, that is the wrong destination. For the TTN transport protocol, bridge.eu.thethings.network should be used (according to the TTN back-end).

kersing · September 4, 2018, 12:21pm

Check your network connection to the back-end (for instance, try nc -v bridge.eu.thethings.network 1883). If the connection succeeds, check your gateway ID and key (for instance with mosquitto_sub, using ‘-u ID’ and ‘-P key’).
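The reachability check above can also be scripted. Below is a minimal Python sketch that does roughly what the nc test does: it tries to open a TCP connection and reports whether that worked. The function name can_connect and the 5-second timeout are made up for illustration; the host and port are the ones from the nc example. It only tests the TCP connection, not the gateway ID and key.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example use (needs network access):
# print(can_connect("bridge.eu.thethings.network", 1883))
```

If this returns False from inside your network but True from outside, a firewall between the gateway and the back-end is a likely suspect.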

thomaswa · September 4, 2018, 1:20pm

I switched to “Router digitalcatapult-uk-router” in the Gateways settings.
Besides fixing the new connection problem, this also seems to have solved the stability issue I had been having for weeks.

daz37 · September 4, 2018, 2:02pm

Hi,

ERROR: [TTN] Connection to server “bridge.eu.thethings.network” failed, after rebuilding gateway and setting up new gateway on ttn and new app and device on resin

My Raspberry Pi 1 and iC880A gateway started throwing this error after I rebuilt the gateway for a tutorial. The build was working perfectly up to the time it was taken down for the tutorial. I was wondering whether setting up a new gateway with hardware from a previously built, working gateway could cause a conflict. At the moment the previously working gateway is deleted from TTN, and the resin app and device were also deleted and set up again from scratch, with no joy. I don’t know what to try next. I’m hoping someone else has encountered this problem as well and can help me out.

thx in advance

froland · September 4, 2018, 2:54pm

Hi,

I’m using a RAK831 gateway provisioned with https://github.com/jpmeijers/ttn-resin-gateway-rpi and I had a similar issue today.
From what I’ve seen when starting the multi-protocol packet forwarder, the router address received from the account server is mqtts://bridge.eu.thethings.network:8882.
I’m going through a corporate firewall which doesn’t let TCP connections through on that port.
What is strange is that the standard MQTTS port is normally 8883.
Was there a change in the config of the account server?

Anyway, I was able to restore my gateway connection by setting the ROUTER_MQTT_ADDRESS environment variable to mqtt://bridge.eu.thethings.network:1883.

Using nmap, I found that ports 1883, 8882 and 8883 are all open (I tested this from outside my corporate network).
Both 8882 and 8883 present the same SSL certificate.
I’ll ask for these ports to be opened on my corporate firewall and check whether I can then connect again with the default parameters (and secured MQTT).

I hope some of you will find hints in this to get your gateways back up and running.

Best regards

jpmeijers · September 4, 2018, 8:24pm

@froland was on the correct path. The issue is a combination of a few things:

MP packet forwarder only supports MQTT, not MQTTS yet.
The resin setup started passing the port specified by the account server to MP, so that the setup will work with private backends.
The account server very recently started adding the MQTTS ports to the bridge MQTT address for the public routers.

These three in conjunction caused the MQTTS port to be passed to MP. MP would then fail to connect.

As a workaround the two MQTTS addresses 8882 and 8883 are being replaced by 1882 and 1883 respectively before the address is passed to MP. This should solve the connect issue, while not affecting private instances with custom MQTT ports.
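The port replacement described above could look roughly like the following Python sketch. This is not the actual resin setup code: the function name to_plain_mqtt and the switch of the scheme to mqtt:// are assumptions made for illustration. Only the 8882 to 1882 and 8883 to 1883 mapping comes from the description.

```python
from urllib.parse import urlsplit

# MQTTS ports handed out by the account server, mapped to the plain
# MQTT ports that MP can connect to (mapping taken from the post above).
MQTTS_TO_MQTT_PORT = {8882: 1882, 8883: 1883}


def to_plain_mqtt(address):
    """Rewrite a known MQTTS bridge address to its plain MQTT equivalent.

    Addresses with any other port (for example private backends with
    custom MQTT ports) are returned unchanged.
    """
    parts = urlsplit(address)
    plain_port = MQTTS_TO_MQTT_PORT.get(parts.port)
    if plain_port is None:
        return address
    # Assumption: the scheme is switched to mqtt:// along with the port.
    return f"mqtt://{parts.hostname}:{plain_port}"
```

For example, to_plain_mqtt("mqtts://bridge.eu.thethings.network:8882") gives an address on port 1882, while an address with a custom port passes through untouched.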

Please find the updated resin setup on:

daz37 · September 5, 2018, 8:29am

Hi,

Yes, this has sorted the problem. Thanks very much for the help; it is much appreciated, as we spent all day yesterday trying to get this gateway back up and running. What rotten luck that this was the issue on the very day we tore down the gateway for a tutorial. Oh well, the joys of engineering. Again, thanks.

pe1mew · September 5, 2018, 1:03pm

OK, this issue affected two of my gateways that I was building/testing. What a coincidence that the change happened at the same time I was testing.
Question is: will this affect all existing installations of resin.io gateways? I guess yes.

jpmeijers · September 5, 2018, 1:27pm

It is a good idea to keep all your gateways running Resin.io updated to the latest version. So you should pull/push the code periodically.

The change that added support for private backends, and indirectly caused this issue, was added on 2 August. So any gateways that were installed/updated between 2 August and last night will be affected.

But as stated, it’s a good idea to update all your gateways anyway. That is why we are using Resin.io, isn’t it?

thomaswa · September 5, 2018, 1:28pm

Is there a way to set up an automatic, periodic update mechanism?

jpmeijers · September 5, 2018, 1:31pm

Auto update is not necessarily a good idea. An easy one-click supervised update would be better, so that you can see whether it succeeds or fails and whether you need to revert to an older version. You can however run a cron job on a Linux server that does the pull-push automatically once a day, only if the code on GitHub changed.
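A script that such a cron job could call might look like the Python sketch below. It is only an illustration under several assumptions: the local clone has a remote named origin pointing at GitHub and a remote named resin pointing at the Resin.io application, the branch is master, and the function names are made up. It compares the local HEAD with the remote HEAD and only pulls and pushes when they differ.

```python
import subprocess


def run_git(repo_dir, *args):
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def update_if_changed(repo_dir, git=run_git):
    """Pull from origin and push to resin, but only if origin has moved.

    Returns True if an update was pushed, False if nothing changed.
    Remote names 'origin' and 'resin' and branch 'master' are assumptions.
    """
    local_sha = git(repo_dir, "rev-parse", "HEAD")
    # ls-remote prints "<sha>\tHEAD" for the remote's default branch.
    remote_sha = git(repo_dir, "ls-remote", "origin", "HEAD").split()[0]
    if local_sha == remote_sha:
        return False
    git(repo_dir, "pull", "--ff-only", "origin", "master")
    git(repo_dir, "push", "resin", "master")
    return True
```

Because the push is unattended, it is worth keeping the script’s output in a log so a failed build does not go unnoticed.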

If this feature request comes through, it will make the usage of the resin setup much easier:

froland · September 5, 2018, 1:51pm

@jpmeijers, what is missing to support MQTTS?

I can maybe contribute to this feature.

pe1mew · September 5, 2018, 2:01pm

The resin.io solution uses the multi-protocol packet forwarder by Jac Kersing: https://github.com/kersing/packet_forwarder. That is where you can contribute MQTTS support.

thomaswa · September 6, 2018, 7:20pm

Is it possible that the multi-protocol packet forwarder somehow loses its connection to the router and does not re-establish it without a restart of the gateway?
I see “Last Seen 29 minutes ago” in the gateway status, and also in the applications that have nodes near this gateway, but when I check the latest gateway output there is no error and it says connected:

06.09.18 21:19:34 (+0200) main ##### 2018-09-06 19:19:34 GMT #####
06.09.18 21:19:34 (+0200) main ### [UPSTREAM] ###
06.09.18 21:19:34 (+0200) main # RF packets received by concentrator: 0
06.09.18 21:19:34 (+0200) main # CRC_OK: 0.00%, CRC_FAIL: 0.00%, NO_CRC: 0.00%
06.09.18 21:19:34 (+0200) main # RF packets forwarded: 0 (0 bytes)
06.09.18 21:19:34 (+0200) main # PUSH_DATA datagrams sent: 0 (0 bytes)
06.09.18 21:19:34 (+0200) main # PUSH_DATA acknowledged: 0.00%
06.09.18 21:19:34 (+0200) main ### [DOWNSTREAM] ###
06.09.18 21:19:34 (+0200) main # PULL_DATA sent: 0 (0.00% acknowledged)
06.09.18 21:19:34 (+0200) main # PULL_RESP(onse) datagrams received: 0 (0 bytes)
06.09.18 21:19:34 (+0200) main # RF packets sent to concentrator: 0 (0 bytes)
06.09.18 21:19:34 (+0200) main # TX errors: 0
06.09.18 21:19:34 (+0200) main ### BEACON IS DISABLED!
06.09.18 21:19:34 (+0200) main ### [JIT] ###
06.09.18 21:19:34 (+0200) main # INFO: JIT queue contains 0 packets.
06.09.18 21:19:34 (+0200) main # INFO: JIT queue contains 0 beacons.
06.09.18 21:19:34 (+0200) main ### [GPS] ###
06.09.18 21:19:34 (+0200) main # No time keeping possible due to fake gps.
06.09.18 21:19:34 (+0200) main # Manual GPS coordinates: latitude 50.69847, longitude 13.00944, altitude 447 m
06.09.18 21:19:34 (+0200) main ### [PERFORMANCE] ###
06.09.18 21:19:34 (+0200) main # Upstream radio packet quality: 0.00%.
06.09.18 21:19:34 (+0200) main ### [ CONNECTIONS ] ###
06.09.18 21:19:34 (+0200) main # ttn.thingsconnected.net: Connected
06.09.18 21:19:34 (+0200) main # Semtech status report send.
06.09.18 21:19:34 (+0200) main ##### END #####
06.09.18 21:19:34 (+0200) main 19:19:34 INFO: [TTN] ttn.thingsconnected.net RTT33
06.09.18 21:19:34 (+0200) main 19:19:34 INFO: [TTN] send status success for ttn.thingsconnected.net
06.09.18 21:19:38 (+0200) main 19:19:38 INFO: [up] TTN lora packet send to server “ttn.thingsconnected.net”
06.09.18 21:20:04 (+0200) main

kersing · September 6, 2018, 8:00pm

As answered on GitHub, where you asked the same question:

If the connection is lost, it will log an error and attempt to reconnect. Every status update shows the RTT; this is a turnaround time, which requires the back-end to send an MQTT ACK for data received.
The behavior observed points to a disconnect between the back-end components, not an issue with the packet forwarder connection to the back-end.

BTW, this forum is the right place for questions. Just keep in mind I’m not getting paid for this, so I do not sit at the computer 24 hours a day; as a result it might take some time before an answer arrives. (Issues on GitHub are not replied to quicker; which gets handled first depends on what lands in my mailbox first.)

thomaswa · September 6, 2018, 8:06pm

Thank you for your fast answer.

One question I still have: How can the communication be re-established in this case?

thomaswa · September 6, 2018, 8:27pm

I restarted the gateway container (forwarder), and just one second later the gateway was shown as connected in the console again. Its “Last Seen” is updated every 30 seconds again.

So I guess the gateway must be involved in this interrupted communication? It is strange that I could not see any error in the log.

kersing · September 6, 2018, 8:38pm

Your gateway was talking to an active MQTT connection; otherwise it would have complained. Restarting the packet forwarder reconnected it, maybe to the same MQTT instance or maybe to a new one; that is hard to debug without access to the back-end logs. It might just have been an issue in MQTT for that one connection, which is also hard to determine.

Glad you are up and running again. Keep in mind the community network has no SLA, so things like this will happen from time to time. (Talking from experience.)