TTN GATEWAY central

Firmware: 1.01
Socket part serial: 1636D4

Same reboot loop. 5 lights, 8 second cycle.

Log:

SNTP: State change from 0 to 0
SNTP: State change from 0 to 0



**************************
*   The Things Network   *
*      G A T E W A Y     *
**************************
Firmware name: AmazingAckermann, type: 0, version: 1.0.1, commit: facdef23, timestamp: 1518596182
Bootloader revision: 1, commit: 7167873a, timestamp: 1496411298
Build time: Feb 14 2018 08:16:44
Reboot reason: 0x03
BOOT: (persisted info) 6F 72 72 65 01 03 BB FD 8E 4D E4 2D 9A F7 F2 DC 




WIFI: Entering state 0
WIFI: Entering SCAN state 0

MAIN: Initialisation complete
LORA: Changing state from 0 to 0

MAIN: Leaving state 0
MAIN: Entering state 1
FLASH: Magic bytes found: wifi config present
FLASH: Magic bytes found: activation data present
FLASH: Magic bytes not found: no stored FOTA data present
FLASH: Loading Firmware Data
CNFG: (Firmware HASH (sha256)) FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 
FLASH: Loading WiFi Data
CNFG: WiFi SSID:      ***
CNFG: WiFi key:       ***
CNFG: WiFi conn_type: 1
CNFG: WiFi sec_type:  4
FLASH: Loading Activation Data
CNFG: Gateway ID:         ***
CNFG: Gateway Key:        ***
CNFG: Account Server URL: https://account.thethingsnetwork.org
CNFG: Locked:             true
CNFG: Locked first time:  false

MAIN: Leaving state 1
MAIN: Entering state 2
INET: State change to 0
LORA: Initialisation complete
LORA: Changing state from 0 to 1
WIFI: Entering state 1
ETH: IP Address: 0.0.0.0 
WIFI: Entering state 4
WIFI: Entering SCAN state 1
Scan is completed successfully
WIFI: Entering SCAN state 2
WIFI: Entering SCAN state 5
WIFI: Entering SCAN state 0
WIFI: Entering state 2
WIFI: Disabling modules
Head magic match void: trying to free an already freed block, ignore
WIFI: Entering state 3
SNTP: State change from 0 to 1
WIFI: Enabling modules for client
WIFI: Entering state 6


>WIFI: IP Address: 0.0.0.0 
CB: INET: Gateway has WiFi
INET: State change to 2
INET: Connected to a network, waiting for DHCP lease, checking validity with ping
SNTP: State change from 1 to 2
WIFI: IP Address: 192.168.0.115 
LORA: Wait init complete, waiting for application.
LORA: Changing state from 1 to 2
INET: State change to 3
INET: Ping probe
INET: Error sending probe on Eth
INET: Ping response from MRF24WN, set as default
INET: State change to 4
SNTP: State change from 2 to 3
MON: SYS Stack size: 3959
MON: heap usage: 147KB (156KB), free: 192KB
SNTP: State change from 3 to 4
SNTP: State change from 4 to 5
SNTP: State change from 5 to 6
SNTP: State change from 6 to 1
INET: Initiated NTP request.
SNTP: State change from 1 to 2
MON: SYS Stack size: 3959
MON: heap usage: 147KB (156KB), free: 192KB
SNTP: State change from 2 to 3
SNTP: State change from 3 to 4
SNTP: State change from 4 to 5
SNTP: State change from 5 to 6
SNTP: State change from 6 to 7
INET: State change to 5

MAIN: Leaving state 2
MAIN: Entering state 3

CNFG: Load online user config state change to 4
HTTP: Close active socket 0
HTTP: Starting connection
HTTPS: Connection Opened: Starting TLS Negotiation
HTTP: Wait for TLS Connect
HTTP: TLS Connection Opened: Starting Clear Text Communication
HTTP: Got 1289 bytes
HTTP: Connection Closed
HTTP: Close active socket 1
CONF: Parsing response token: HTTP/1.1 200 OK
CONF: ROUTER URL: mqtts://bridge.eu.thethings.network:8883

CNFG: Load online user config state change to 6
FREQ: APP_URL_Buffer: https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870
HTTP: Starting connection
HTTPS: Connection Opened: Starting TLS Negotiation
HTTP: Wait for TLS Connect
HTTP: TLS Connection Opened: Starting Clear Text Communication
HTTP: Got 1232 bytes
MON: SYS Stack size: 2870
MON: heap usage: 227KB (233KB), free: 111KB
HTTP: Connection Closed
HTTP: Close active socket 1

CNFG: L

Dear all

Earlier this week, @arjanvanb developed a patch for the reboot problem some of you have encountered on the gateways. After further investigation we’re confident we found the root cause of the problem. In short: the Gateway microcontroller uses its internal clock instead of the external clock. This configuration results in unstable firmware.

You can find a more in-depth explanation below, but first, what does this mean for your gateway (i.e. if you have the issue) and how can you get your gateway running?

We are working on a new firmware version that will fix this. To get there, we are working with Microchip and TWTG with the highest priority. This will take some time, and we want to run extensive tests before we make the change available through the over-the-air firmware updates for your gateway.

The fix developed by @arjanvanb solves the reboot issues for many gateways, although it doesn’t fix the root cause: the gateway firmware keeps running on the internal clock and thus, in some cases, out of spec. This patch is already available through our “beta” over-the-air firmware update channel.

At the same time, we’re working on a different patch, which will halve the baudrate to the LoRa module from 115200 to 57600. This will let the gateway work within spec, but effectively reduces the total bandwidth of the gateway (not noticeable for 99% of the users). We hope to make this patch available through our “beta” over-the-air firmware update channel in a few days.
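To put the bandwidth reduction in numbers, here is a rough back-of-the-envelope sketch. It assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit) on the link to the LoRa module, which the post does not state, and the function name is purely illustrative.

```python
def uart_payload_bytes_per_second(baud: int, data_bits: int = 8,
                                  start_bits: int = 1, stop_bits: int = 1) -> float:
    """Upper bound on payload bytes per second for an asynchronous serial link."""
    # Every payload byte is wrapped in start and stop bits (assumed 8N1 here).
    bits_per_frame = start_bits + data_bits + stop_bits
    return baud / bits_per_frame


if __name__ == "__main__":
    for baud in (115200, 57600):
        rate = uart_payload_bytes_per_second(baud)
        print(f"{baud} baud: at most {rate:.0f} payload bytes/s")
```

Under that framing assumption, halving the baudrate halves the theoretical ceiling on bytes per second between the microcontroller and the LoRa module.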

Both patches will bring us a lot closer to resolving the gateway reboots while we work with Microchip and TWTG on a permanent fix.

Click here to subscribe to the mailing list to be informed about gateway firmware updates.

You can enable beta updates from The Things Network Console in the settings of your gateway. Note that beta updates can be unstable. If your gateway is working fine, you’ll probably want to stick with stable firmware updates.

Some gateways reboot too often for an automatic update. To update the gateway via SD card, perform the following steps:

  • Download the most recent beta firmware files: firmware.hex and checksums
  • Power off the gateway.
  • Take a micro SD card and make sure it’s FAT32 formatted. You’ll only need a few MB, so most SD cards should be fine.
  • Create a directory on the SD-Card called: update
  • Copy the firmware.hex and checksums files to the SD-Card update folder
  • Safely eject the card from your computer and insert it into the Gateway
  • Power on the gateway. If the first LED is blinking, it is updating.
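
For those who prefer to script the copy step, the steps above can be sketched roughly as follows. This is only an illustration: the function name and the example mount point are made up, and the script assumes the card is already FAT32 formatted and mounted, and that both files have been downloaded.

```python
import shutil
import sys
from pathlib import Path


def prepare_update_card(card_root: Path, firmware: Path, checksums: Path) -> Path:
    """Create the 'update' directory on the card and copy both files into it."""
    if not card_root.is_dir():
        raise FileNotFoundError(f"SD card mount point not found: {card_root}")
    update_dir = card_root / "update"
    update_dir.mkdir(exist_ok=True)
    for src in (firmware, checksums):
        # copyfile copies contents only, which avoids permission quirks on FAT32.
        shutil.copyfile(src, update_dir / src.name)
    return update_dir


if __name__ == "__main__":
    # Example (hypothetical paths):
    #   python prepare_card.py /media/sdcard firmware.hex checksums
    if len(sys.argv) != 4:
        sys.exit("usage: prepare_card.py <card mount point> <firmware.hex> <checksums>")
    target = prepare_update_card(Path(sys.argv[1]), Path(sys.argv[2]), Path(sys.argv[3]))
    print(f"Copied update files to {target}; now safely eject the card.")
```

The script does not eject the card or verify the checksums; those steps still need to be done by hand.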

Note: These patches need to be put on an SD card. After flashing, keep the SD card in the gateway until the fixes are also made available on our over-the-air update channels; otherwise other firmware may be reinstalled from those channels. Please subscribe to the gateway newsletter to receive updates on the topic and a notification when the over-the-air fix is public and available.

A technical recap of the issue for the more technical users:

After a period of investigation supported by the community, we may have found the root cause of the reboot problem of the gateway.
@arjanvanb proved that the reboots are related to incorrect data flow over UART between the PIC32 and LoRa module. The PIC32-LoRa module communication contains errors that lead to board reboots. This can be explained by an incorrect (factory) clock configuration: the gateway was supposed to be running from the external 24 MHz clock, but instead it is running from the internal 8 MHz oscillator. While the operational speed of the gateway is the same in both cases, the clock accuracy varies. This introduces communication errors when using in-band clocked protocols such as UART.
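
As a rough illustration of why clock accuracy matters for UART, the sketch below uses a textbook idealised model, not the gateway's actual receiver: the receiver resynchronises on the start-bit edge and samples every bit in the middle of its cell, so the sampling point drifts a little further with each bit. The 8N1 framing and the mismatch values are assumptions chosen for illustration; they are not measurements or datasheet figures for the gateway's oscillators.

```python
def drift_at_last_sample(mismatch: float, data_bits: int = 8, stop_bits: int = 1) -> float:
    """Sampling-point drift, in bit periods, at the middle of the last stop bit.

    `mismatch` is the relative difference between the transmitter and
    receiver bit clocks (0.01 means 1 %).
    """
    # Start bit + data bits + stop bits, minus half a bit because the
    # receiver samples in the middle of each bit cell.
    last_sample_position = 1 + data_bits + stop_bits - 0.5
    return abs(mismatch) * last_sample_position


def frame_is_sampled_correctly(mismatch: float, data_bits: int = 8, stop_bits: int = 1) -> bool:
    """True while the last sample still falls inside the intended bit cell."""
    return drift_at_last_sample(mismatch, data_bits, stop_bits) < 0.5


if __name__ == "__main__":
    # Illustrative mismatch values only (assumed, not measured).
    for mismatch in (0.00005, 0.01, 0.03, 0.06):
        drift = drift_at_last_sample(mismatch)
        verdict = "ok" if frame_is_sampled_correctly(mismatch) else "framing errors likely"
        print(f"mismatch {mismatch:.3%}: drift {drift:.3f} bit -> {verdict}")
```

In this idealised model the combined mismatch has to stay below 0.5 / 9.5, roughly 5 %, for a 10-bit frame. Real receivers have a tighter budget because of edge-detection jitter and baud-rate divisor rounding, which is why a less accurate clock source can push an otherwise working link into occasional errors.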

If you need additional assistance or if something does not go as planned, please refer to the issues in Issues · TheThingsProducts/gateway · GitHub

We’re working around the clock (pun intended) to fix issues such as these and push the fixes to you guys as fast as possible. Nonetheless, not everything can be planned, and we need to triple-check that we’re fixing the correct conflicting parts within the firmware. We’re sorry for any inconvenience that this may have caused you.

–
The Things Products Team

5 Likes

Since you posted before TTP’s post above: how did you get that? (Did it fetch it itself, so: did your gateway run fine for some time?)

I’ve added a full log to Gateway loses network connection and needs manual reset to recover, often initiated by MQTT problems or by reboot for firmware upgrade · Issue #8 · TheThingsProducts/gateway · GitHub which, for me, seems to be related to failing to use MQTT while it actually does have an internet connection. It’s not in a reboot loop for that issue though.

Same here: 1635D4 for a mostly working gateway (well, when using updated firmware).

Now installing the new firmware :slight_smile:

Auto-update from the network after blanking the gateway (Power on with MODE switch pressed). Can get through setup and connecting to WiFi/Ethernet, but then will hit reboot loop when talking to the module and stay in it until power-cycle.

After a power-cycle, about one minute of connecting and handshaking before 5 lights and 8 second loop again.

Looks like a better fix is coming though, so happy to wait for that :slight_smile:

Bummer, same here: today’s firmware (installed using an SD card) gets me the original problem again (LORA: Starting reconfiguration is followed by LORA: Configuration failed, retry and eventually LORA: RESET MODULE and Reboot reason: 0x10).
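
For anyone comparing captured serial logs, a small sketch like the following can tally these markers. The marker strings and the `Reboot reason:` format are taken from the logs quoted in this thread; the function name is made up, and the script makes no attempt to interpret what a given reason code means.

```python
import re
import sys
from collections import Counter

REBOOT_RE = re.compile(r"Reboot reason:\s*(0x[0-9A-Fa-f]+)")
LORA_MARKERS = (
    "LORA: Starting reconfiguration",
    "LORA: Configuration failed, retry",
    "LORA: RESET MODULE",
)


def summarise_log(lines):
    """Count reboot reasons and LoRa failure markers in an iterable of log lines."""
    reasons = Counter()
    markers = Counter()
    for line in lines:
        match = REBOOT_RE.search(line)
        if match:
            reasons[match.group(1).lower()] += 1
        for marker in LORA_MARKERS:
            if marker in line:
                markers[marker] += 1
    return reasons, markers


if __name__ == "__main__":
    # Usage: python summarise_log.py captured_serial.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        reasons, markers = summarise_log(handle)
    for reason, count in sorted(reasons.items()):
        print(f"Reboot reason {reason}: {count}x")
    for marker in LORA_MARKERS:
        print(f"{marker}: {markers[marker]}x")
```

Counting boots per reason code over a longer capture makes it easier to tell whether a firmware change actually reduced the reboots or only changed their timing.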

As debug logging is not enabled, I guess I’m going to build a new version myself. But not tonight.

So the downloads get me the old reboot loop. However, when building myself from the develop branch with debug logging enabled then all is fine: https://github.com/TheThingsProducts/gateway/issues/1#issuecomment-365746884

And: building myself without changing anything gets me the same problems!? Is enabling debug logging actually solving the issue!?

I would imagine that enabling debug logging has a similar effect to the proposed beta patch (halving the baudrate): in effect, slowing communication / execution down sufficiently to mitigate the timing issues with the internal clock?

1 Like

The firmware mentioned in this post also gives me the same issue as before … I have confirmed that gateway has been updated through SD card, is running fw version v1.0.1-facdef23 (2018-02-14T08:16:22Z), and I have disabled OTA updates.

I also have a connector with etched number 1636D4 if that matters.

UPDATE 1: after dozens of reboots during 30+ minutes, the gateway finally connected to the broker for the first time. It’s been connected for a few minutes now, which is an improvement compared to before, when it rebooted every minute before even connecting to the broker.

UPDATE 2: after half a day the gateway is again rebooting. Sigh.

I confirm that my malfunctioning gateway has a 1636D4 socket; my gateway seems no longer able to get past the power LED solid with the second LED flashing rapidly; and it’s not visible on the WiFi or wired network or as an AP?

Also tried flashing new firmware without success :’(

It would explain why it’s so hard to track down!? Sounds like a timing issue that is resolved by logging; a missing lock/mutex perhaps? Wonder if it’s worth faking a few on the task to try and identify the area, i.e.

run with only the LoRa, see if it reboots
repeat above adding WiFi
repeat above adding Ethernet
…
You get the picture; @arjanvanb thanks for the info; just hope many gateways can be fixed :’(

I also tried the new firmware. No success. Keeps rebooting.
I also have a 1636D4 connector…

Most likely, the people at TWTG/Microchip only tested the gateway software with debug logging enabled. Understandable when you are developing the software, but any software professional ought to know that you must finally test the software as delivered to the customer. Putting debug logging in dramatically changes the timing or performance, so things suddenly stop working when those delays due to logging (especially to a serial terminal) are removed. As they had two years to complete the software due to hiccups in the hardware manufacturing, one wonders why these issues were not found earlier.
Another sign of the immaturity of the whole is the fact that the gateway has to be rebooted every 24 hours. Could you imagine that being done on the average Linux server? Embedded software should be designed to run for years, not hours!

3 Likes

I agree: for mission-critical use, you can’t rely on volunteer-based GWs because, as you said, if a user removes one or powers it off, you are stuck.
But you can just put your own GW in a safe place, connected to TTN; the backend is pretty stable and you can rely on it. That’s what I’m doing for industrial customers: they never complain, and if they do, we can arrange a local backend or another one; then they look at the price and stay on TTN, and even add reliable GWs to the community. So I don’t agree that TTN is only for hobbyists: the whole network stack and backend is far better than commercial ones, and V3 is coming and may be the best ever seen.
And all of it for free, just unbeatable

2 Likes

@BoRRoZ, I argued that way as well. @GrahameH made that point, not me.

And as I said, I am impressed by the central infrastructure too, @Charles!

1 Like

Hi,

Just received my Gateway, and tried to activate it via https://ttn.fyi/activate

The Gateway properly connects to my WiFi, and I can see the status via http:///info

However, it keeps rebooting, and it never connects to The Things Console.

What can I do to fix this?

Thanks,
Yoeri

Upgraded Firmware from v1.0.0-917719b9 (2017-06-26T17:59:33Z) to v1.0.1-facdef23 (2018-02-14T08:16:22Z). Does not seem to have fixed my reboot loop. Still not connecting to Console.

Status Update:
Updating my Gateway via USB did at least take me to the point where the gateway forwarded the first packet to the TTN network. :+1:
I needed to disable the auto update in my TTN account, as I was stuck in an update loop :wink:
But it is far away from stable. I am not sure if it is a hardware or software issue. Every time I slightly adjust the position of the LoRa card, it seems to be stable for some minutes, but then the next reboot occurs.

Additionally, the packets from my TTN node are transmitted according to the debug log, but I don’t see the device on the activation map. But that is a topic for another thread here, I think.

Michael

Does it say ‘Gateway Card: 868Mhz’ (or your regional frequency) in the info output?

If it doesn’t and it says ‘ND’ instead of a frequency, you may want to try to push the Lora circuitboard in a bit firmer.

Earlier this week I fired up my gateway and it kept on looping as well, pushing the board a bit fixed it for me. It has been completely stable since.

My only concern now is that the gateway struggles to outperform my 2.4GHz wifi signal when it comes to range :expressionless:

But note that the info is updated every second or so. After booting, it will always say “ND” (“not detected”?) for a while.

People, let’s try to keep it a bit professional here. We’re not a bunch of children, so let’s not make this topic about personal attacks and name-calling. Based on the comments in this topic that have been posted over the past 24 hours, I think we should all take a step back and focus on what’s actually going on.

Some people are experiencing reboots of the gateway. This sucks and it doesn’t meet the quality that many of you have come to expect from TTN based on your experience with us over the past years. The @TTP team is working hard on a solution, but that takes time. Based on research by @arjanvanb, and the @TTP team, the problems have been traced back to a clock/timing issue. A patch by @arjanvanb seemed to solve the problem for some gateways, but doesn’t completely fix the problem. The @TTP team is working on a patch that should fix the problem permanently. This takes time and requires long-running tests to make sure that nothing else breaks. As soon as (intermediate) patches can be safely released, they will be pushed to the beta update channel.

Regarding @Ajj’s message: The gateway has undergone extensive testing (with production firmware, not just debug firmware), and there are currently hundreds of The Things Gateways online that do not have any reboot issues. The 24-hour restarts will be disabled as soon as we’re sure that the firmware is completely stable. Until then, these restarts provide a way to roll out critical firmware updates.

I can understand @GrahameH’s frustration, but I’d like to address a couple of his claims that I think are a bit unfair. First of all, @GrahameH seems to generalize the issues with his gateway to failure of the entire project/team. Looking at the progress we’ve made since I joined the core team full-time two years ago, I think we’ve built a pretty awesome project. We’ve brought together over 30,000 people worldwide in our communities, of which 750 attended our conference in Amsterdam. Our free public community network is successfully routing a couple of million messages per day, both for hobbyists and commercial users. So I strongly disagree with the claim that “this product and network has no commercial ability”.

There have also been a couple of messages about “mission critical communication”. I’d like to clarify that neither TTN nor LoRa(WAN) or actually any ISM radio technology can safely be used for truly mission critical communication. ISM spectrum is the wild west of radio, nobody can guarantee a successful transmission. LoRa(WAN) is a young technology, and though it works pretty well, it’s indeed not yet as mature as GPRS/3G. TTN is a community project, don’t rely on people you don’t know to keep their gateways online. TTN’s gateway firmware is still under development, and it’s indeed not completely stable yet. Future firmware updates will fix that. It’s all open source, so feel free to help out.

Sending repeated messages/emails/forum posts about your gateway won’t get it fixed faster. So consider letting the @TTP team work on a solution instead of expecting them to write replies to your messages.

Again, let’s try to keep the criticism constructive. Try to help find solutions, not just highlight problems. Try not to offend people and try not to feel offended. And let’s focus on what we’re all here for: building this thing together.

13 Likes