TTN GATEWAY central

Today my Things Gateway died. I had to move it, so I briefly unplugged the power cable (as I’ve done several times before without issue), and now it’s stuck in the dreaded reboot loop. I’ve never had any issue with this gateway before (it’s been running for 6 weeks and has transferred 120K+ packets). A full reset does not help. Using Ethernet does not help. The firmware is still the original (for me) v1.0.0-917719b9 (2017-06-26T17:59:33Z).

As far as I can see, reading through all the messages here, this may be the first report of a fully functional gateway randomly transitioning to this broken state?

I had the same problem this morning. I tried different paths, with Ethernet and with wifi, but nothing seemed to work. Then I just power-cycled my wifi router and the TTN gateway, and after 10-15 minutes or so of rebooting it worked again…

@Codion @Bokse0011 I had the same problem this morning too. In my case I had to temporarily turn off an otherwise reliable TTN GW while adding a multi-way power strip and a 5-port switch, to allow connection of other collocated GWs for comparative tests in the roof space. After adding the power strip, but before even touching the network side, the GW powered up but then went into a loop: it would cycle through to 4 LEDs, then immediately fall back to the 2nd LED flashing 4 or 5 times (with the 3rd LED solid), before going to 4 again and immediately falling back to the 2nd LED flashing… I left it alone for >1 hr and still had the issue, so I brought it down to the office and connected it directly to the internet gateway: same problem… I left it running in the loop for ~30 mins and it was suddenly stable (note that updates and use of beta are turned off, so it wasn’t an update loading issue). I then took it back to the roof space, quickly re-installed it, and it was immediately fine and has been good for hours since. :slight_smile: It is a lot colder in the roof space (~7/8 deg C vs ~18/19 in the office), so I suspect there may be a temperature dependence, as the GW was still warm when I took it back to restart in the roof space. Given the identified timing issues in the design wrt internal vs external timing source, I suspect that may be a contributing factor. I will experiment more in the coming days/weeks, but it doesn’t bode well for long-term reliability over more extreme diurnal temperature ranges, or for quick self-recovery after any power failure. Then again, it could just be that we all hit a TTN/network issue? :wink: My suggestion is to try warming or cooling your GW before trying again and see if that helps… if it does, let us all know.

My gateway is partially working again. Some notes, in case anyone trying to diagnose this is interested:

I left the gateway in its reboot loop, also connected to Ethernet, with the lid off to stay cool(er), and it appeared to recover itself after about 1 hour. Kind of. I had enabled the beta firmware option in the console, and it upgraded itself to v1.0.1-facdef23 (2018-02-14T08:16:22Z) and forwarded some LoRa packets again. Looked good. I then configured it for wifi (not because I really want to use wifi, but it seems this is currently the only way to get rid of the gateway’s hotspot?) and it got stuck in the reboot loop again. I left it in this state for about 10 minutes and it appeared to become “stable” again, on wifi, but was quietly NOT forwarding any packets to TTN. The LoRa LED would flash, but nothing was forwarded to my TTN application or shown in the console gateway traffic log. The “info” page on the gateway itself also showed “Packets up” increasing. After 10 minutes of this behaviour I dared to power-cycle it again, and after a few more reboot loops it seems to be working once more: forwarding packets, but still occasionally rebooting too.

Really hope for another firmware update soon!

Hope for you guys that this is an issue that is really fixable via firmware. Since the source is available, I think a simple logic error would already have been discovered…

After building my own firmware (with debug logging turned on), my gateway stays up, at least for a little while. It has rebooted once (it seems to require a full power cycle to get it back up again). The uptick counter on the info page showed mid-2000s when it rebooted. And as I was typing, the second reboot occurred at around 1000-ish. Sometimes it comes back, sometimes not (i.e. it’s not yet predictable for me).

My UART device hasn’t arrived yet, and thus I don’t have any logs to share yet. But, as already mentioned on several posts, seemingly this does point to a timing problem.

And on yet a different but related topic … This shows my ignorance on all things LoRaWAN (which I am trying to work on :slight_smile: ) … I have an Adafruit 32u4 with LoRa, onto which I have compiled and seemingly successfully uploaded the LMIC stack and its ttn-abp example (after modifications as described here).

During those periods while the gateway is up, I have been trying to register the device and transmit “hello world”. Thus far seemingly unsuccessfully (I am seeing messages queued on the device, but nothing is showing up in the TTN console yet). Note I don’t have a HackRF yet to even know what kind of signal is leaving the Adafruit board, and I am unclear on many LoRaWAN topics such as device IDs, EUIs, messaging, OTAA vs ABP, and others. I am not sure if you have any pointers that might be useful for me to explore (right now it’s all about me googling, and lots of hit and miss :slight_smile: )

For what it’s worth, my gateway is now actually unusably broken for real use… TTN thinks my gateway is online. The gateway thinks it’s online. The LED flashes when a LoRa packet is received and the counter goes up on the device. Yet nothing in the console.

This was pretty rock-solid reliable, and right now it’s basically useless.

Time to start getting dirty and trying to debug this thing too I guess. Very disappointing.

You mean the gateway’s Traffic page in TTN Console? Maybe bad CRCs, I’d try another node.

8 hours ago there was some problem on the backend (EU?). Packets were not displayed after 22:50 UTC. The packet forwarder log was OK…

I suspect there was something weird happening around that time, as I noticed three collocated GWs under test all showed different info. A Laird and an IMST Lite seemed in sync, showing the same packets, time stamps and counts, and were visible as “last seen” and connected. The TTN GW however kept disappearing, with “last seen” often extending to 2 mins, 3 mins, even 8 mins+. This manifested as 3 open browser windows, one for each GW, showing different received-packet lists. At one point the TTN GW was only showing 1 in 6 packets, its counts seemed to get out of sync with the other 2, and it showed count numbers against the wrong time stamp values. Given my earlier post about suspecting a temperature dependence on the TTN GW (the temperature in the roof space obviously dropping as the night went on), I assumed the problem was at my end. I checked the GW connection around 12:30am as I left the office and could see 4 solid LEDs and regular Ethernet port traffic. All 3 backhaul through the same DSL hub and internet connection. I checked this morning and it looks solid again.

Clarification - collocated as in all at the same premises, but only the IMST and TTN GWs currently share the roof space, as the Laird was only commissioned yesterday and is on test in the office before its move to the roof space later today… hence I was looking carefully at comparative data :sunglasses:

Just for future reference: during the major EU region problems that were happening until a few moments ago, I also got into regular reboots with RQMQTT: Connection failed followed by MAIN: MQTT error and Reboot reason: 0x10.

At other times in the recent logs I see MQTT: Connection lost (rather than failed) like the following:

LORA: Accepted packet
MQTT: Sending UPLINK OK
LORA: Accepted packet
MQTT: Sending UPLINK OK
MQTT: Sending status packet
MQTT: Sending status succeeded: 1
MQTT: Connection lost

MAIN: MQTT error

…followed by many:

MQTT: GOT IP: 52.169.76.203
Connecting to: 52.169.76.203
MQTT: Opening socket timed out, restarting
MQTT: GOT IP: 52.169.76.203
Connecting to: 52.169.76.203
MQTT: Opening socket timed out, restarting
MQTT: GOT IP: 52.169.76.203
Connecting to: 52.169.76.203
MQTT: Opening socket timed out, restarting

So, it seems Connection failed (often) yields a reboot, but Connection lost makes the firmware try forever.
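For anyone wanting to check their own UART logs for the same pattern, here is a minimal C sketch that classifies log lines by the three messages quoted above. The enum and function names are my own and hypothetical, not taken from the gateway firmware, and the matching relies only on the strings shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event classes, based only on the log messages quoted in this thread. */
typedef enum {
    LOG_OTHER = 0,
    LOG_CONN_FAILED,    /* "MQTT: Connection failed": observed to often precede a reboot */
    LOG_CONN_LOST,      /* "MQTT: Connection lost": observed to precede repeated retries */
    LOG_SOCKET_TIMEOUT  /* "MQTT: Opening socket timed out, restarting" */
} log_event_t;

/* Classify one log line by substring match. */
log_event_t classify_log_line(const char *line)
{
    assert(line != NULL);
    if (strstr(line, "MQTT: Connection failed") != NULL) return LOG_CONN_FAILED;
    if (strstr(line, "MQTT: Connection lost") != NULL) return LOG_CONN_LOST;
    if (strstr(line, "MQTT: Opening socket timed out") != NULL) return LOG_SOCKET_TIMEOUT;
    return LOG_OTHER;
}

/* Count how many of the n lines belong to the given event class. */
size_t count_events(const char *const lines[], size_t n, log_event_t kind)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (classify_log_line(lines[i]) == kind) {
            count++;
        }
    }
    return count;
}
```

Counting LOG_SOCKET_TIMEOUT lines after a LOG_CONN_LOST gives a rough idea of how long the firmware kept retrying without rebooting.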

(Right now, without me interfering, it’s reconnected just fine again.)

@htdvisser, if possible, can you provide us with an update of the team’s current progress or status on the “reboot” issue(s)?

I’m not involved with the gateway firmware development, so I don’t know the details. The last thing I heard was that they’re working on changing the baudrate of the UART communication between the microcontroller and the LoRa module. It turns out that this requires more work and testing than expected, so we’ll just have to be patient.

Hi all,

The range of my TTN gateway (stock antenna) leaves a lot to be desired. I am looking for some reference to tell whether I need to adjust my expectations or something else… :wink:

I am aware of how tricky radio transmission is, with near-field effects, obstruction/reflection and what not. At the same time there are use cases with gateways in manholes with a reported usable range of 500 meters, and sensors within shipping containers that reach up to 2 km (babbler.io). Surely my gateway in the attic, behind (mostly) standard Dutch roofing (hardboard with roof tiles) on most sides, should be able to provide a usable range of more than 500 meters?

I have one measurement at 500 meters, but it is an outlier, in most directions I don’t get near that.

I have done some RSSI measurements near the gateway using a Things Node and a Things Uno (-40 dBm at 1,5 m), but I don’t know if that means anything or not. Any suggestions, experiences?

[edited: typo in babbler.io URL]

I did some investigation of my own into how difficult it would be to activate the external clock. According to the PIC datasheet, it is a config word that needs to be set.

REGISTER 25-1: RTCCON: REAL-TIME CLOCK AND CALENDAR CONTROL REGISTER
.
bit 10-9 RTCCLKSEL<1:0>: RTCC Clock Select bits
When a new value is written to these bits, the Seconds Value register should also be written to properly reset the clock prescalers in the RTCC.
11 = Reserved
10 = Reserved
01 = RTCC uses the external 32.768 kHz Secondary Oscillator (SOSC)
00 = RTCC uses the internal 32 kHz oscillator (LPRC)
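To make the quoted bit layout concrete, here is a small C sketch that reads and writes the RTCCLKSEL field (bits 10-9) on a plain 32-bit value. The macro and function names are illustrative assumptions, not the definitions from the Microchip headers, and the code works on an ordinary integer rather than the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names (assumptions), derived from the bit positions quoted above. */
#define RTCCLKSEL_SHIFT 9u
#define RTCCLKSEL_MASK  (0x3u << RTCCLKSEL_SHIFT)  /* bits 10-9 */
#define RTCCLKSEL_LPRC  0x0u  /* 00 = internal 32 kHz oscillator (LPRC) */
#define RTCCLKSEL_SOSC  0x1u  /* 01 = external 32.768 kHz Secondary Oscillator (SOSC) */

/* Extract the clock-select field from an RTCCON value. */
uint32_t rtccon_get_clksel(uint32_t rtccon)
{
    return (rtccon & RTCCLKSEL_MASK) >> RTCCLKSEL_SHIFT;
}

/* Return an RTCCON value with the clock-select field replaced. */
uint32_t rtccon_with_clksel(uint32_t rtccon, uint32_t sel)
{
    assert(sel <= RTCCLKSEL_SOSC);  /* 10 and 11 are reserved */
    return (rtccon & ~RTCCLKSEL_MASK) | (sel << RTCCLKSEL_SHIFT);
}
```

Per the datasheet note quoted above, a real write to these bits should also be followed by a write to the Seconds Value register; that step is not modelled here.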

When I searched the firmware Git repository, I was not able to find anything about RTC / RTCCON / RTCCLKSEL. However, when I tried SOSC I found this in the harmony/framework/system/clk/sys_clk.h file:

void SYS_CLK_SecondaryOscillatorEnable ( void )
This function enables secondary oscillator which can be used as a clock source for peripherals like RTCC, Timer etc… The SOSC requires a warm-up period of 1024 before it can be used as a clock source.

But this function is never used. It looks to me as if this function needs to be called 1024 clock cycles after the SYS_CLK_Initialize function is called in system_init.c (which is a precondition of this function) to fix the problem. Is that right?
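For a sense of scale, here is a tiny C sketch converting that warm-up count into time. It assumes the "1024" is counted in cycles of the 32.768 kHz SOSC; the quoted description gives no unit, so this is only my reading, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the warm-up period of 1024 is measured in SOSC cycles at 32.768 kHz. */
#define SOSC_HZ            32768u
#define SOSC_WARMUP_CYCLES 1024u

/* Duration of a number of oscillator cycles, in microseconds. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t hz)
{
    assert(hz != 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000u) / hz);
}
```

Under that assumption, cycles_to_us(SOSC_WARMUP_CYCLES, SOSC_HZ) gives 31250 µs, so the wait would be about 31 ms before the SOSC could be used as a clock source.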

That is a different clock. You should look for oscillator settings.

I don’t know if this is of any help. It documents my experience - where a Gateway that does set-up subsequently fails (in a sneaky manner)

You are right. But the datasheet tells me that Microchip names the external oscillator “Secondary Oscillator (SOSC)”.

That’s exactly the same thing I find in the description of the void SYS_CLK_SecondaryOscillatorEnable ( void ) function from the firmware. If I have some spare time this weekend, I am going to test whether calling this function during system boot solves the UART communication problem between the PIC and the LoRa module.