TTN GATEWAY central


#435

Are any of you with the loop/chronic reboot problem (not the occasional wifi disconnection) located in the United States?


(Onehorse) #436

Yes. I am.


#437

I thought, onehorse, that you seemed to have the occasional wifi disconnection problem, and not the loop/chronic reboot problem that I was trying to allude to, one that may cause crashes every 60 seconds and doesn’t seem to allow the device to work well even for 10 minutes? The former seems like a problem that may be fixed with a firmware update. The latter, more severe, problem seems like it may be a problem with the Lora gateway board… and some have wondered whether there may have been a batch of boards that had a high rate of defects in the manufacturing or assembly process…


(Onehorse) #438

Quite right, my gateway works fine for hours at a time, so this is perhaps a different problem than never being able to activate the gateway, run it stably for more than a few seconds.

I assumed my problem was also a reboot problem since when it stops working, invariable the second led is flashing rapidly indicating failure to connect. Why would it disconnect once connected except upon some kind of restart/reboot event?


#439

"Why would it disconnect once connected except upon some kind of restart/reboot event?"
My sense is that it’s not that rare for client devices to at least briefly lose connection to an access point and a major aspect of the problem you are seeing, that one hopes will be cured via a firmware update, is that once the connection is lost it isn’t quite rapidly reestablished.


(Onehorse) #440

I have been running my MTCAP-915 gateway for 48 hours straight with no touble but this is on ethernet. I suppose to be fair, I need to swap it with the TTN gateway ;>


#441

Yeah, I’m optimistic that your TTN device will be very stable on Ethernet… Mine are. And I’d guess the MTCAP device will do fine on the Wifi. Do let us know the results if you try it.


(Mrtn78) #442

I’m located in the Netherlands and the device isn’t able to get activated. The status is still “not connected”


#443

Thanks for the info. If you haven’t yet had a chance to do much troubleshooting, here are some questions you might ask…
Is it stable in that condition? Or is it rebooting repeatedly? Is it connected to the Internet via Ethernet? What are the condition of the LEDs? One on and the second slowly blinking? There’s a helpful list of reasons associated with different patterns here:
https://www.thethingsnetwork.org/forum/t/ttn-gateway-faq/11173
Do you know its IP address? Can you browse to the IP address adding /info after the IP address? What does it say? Is there any kind of firewall that might be blocking UDP traffic?


#444

Also one of my gateways has the same problem - stuck in a loop.

But two other gateways from an other pledge, which i configure at the same day, same time & same place and conditions, run without any problems .


(Onehorse) #445

OK, whatever caused the outage has been resolved and I have the TTN gateway working again now connected to ethernet. I will report its behavior over the next day or two to see if it has changed.


(CyberJunky) #446

Are you able and willing to swap power supply and lora board between working and non working gw?

And make some detailed pictures of both circuit boards where chip ids are readable?


(Arjan) #447

Peeking in the firmware code I finally understand the weird values in HTTP: Got 1232 bytes: the URLs printed in the logs are not the ones actually used, but ?filter=ttn is appended. And indeed, https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870?filter=ttn is much smaller than https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870

I’m also trying to enable debug logging and add much, much more logging, but then the UART output really starts losing many bytes, making it very hard to decipher my new logging. I tried to double DEBUG_PRINT_BUFFER_SIZE, but then the compiler fails to allocate space. To be continued…

The good news: I do see responses from the LoRa card, like LORA: recv_rpl: 0x23 0x35 0x1 0x0 0x0 0x0 0x0 0x59 0xd.

The bad news: the very first step in the configuration already fails:

status *= configureRXChain(0, appGWActivationData.configuration_sx1301.rfchain[0].enable,
                           appGWActivationData.configuration_sx1301.rfchain[0].freq);

(Willem (pe1dlf)) #448

do we already have the ‘programming’ i.e. the command set details for the LoraCard?


#449

UPDATE: one of the good working gateways, now also in an booting loop. Sometimes it receive an sensor packages, but most of the send sensor packages are lost due to rebooting cyclus. For this gateway, i had switch on the automatic update in the console. I changes the power supply, but no change.

At the moment , the gateway without problems, i have switch-off the automatic updates in the console.

So, my situation, only 1 of the three gateways is usable.


(Arjan) #450

Increasing the value of DRV_USART_BAUD_RATE_IDX0 to, say, 921600 helps making the logs a bit more readable, though still not perfect. These problems might also be caused by my UART-to-USB cable. And of course, the same setting must be used when capturing the output, like in LOG="ttn-gateway-date +’%Y%m%d-%H%M%S’.log"; pio serialports monitor --raw -b 921600 | while read l; do echo "[date +’%F %T’] $l" | tee -a $LOG; done, or when using a Raspberry Pi to capture the output.

This allows for changing the log level to SYS_ERROR_DEBUG in:

…and for enabling some disabled logging in the original code, and adding many more calls to SYS_DEBUG(SYS_ERROR_DEBUG, ...).

I added the following at the end of sendCommand in src/app_lora.c to know when things go wrong (not sure why the code uses status *= ... which will still try all next configuration steps if a previous one has already failed):

SYS_DEBUG(SYS_ERROR_WARNING, "LORA: sendCommand %s\r\n", gotresponse ? "OK" : "ERROR");

…yielding:

CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration

LORA: send_cmd: 0x23 0x31 0x1 0x0 0x0 0x55 0xd 
LORA: recv_rpl: 0x23 0x31 0x1 0x0 0x0 0x55 0xd
LORA: sendCommand OK

LORA: send_cmd: 0x23 0x3a 0x1 0x0 0x0 0x5e 0xd 
LORA: recv_rpl: 0x23 0x3a 0x10 0x0 0x1 0x1 0x4c 0x47 0x38 0x35 0x30 0x31 0x36 0x30 0x31 0x37 0x38 0x32 0x4 0x1 0xd
LORA: sendCommand OK
LORA: version: 01

LORA: configureRXChain(0, ...)
RF: 0,1,867500000
LORA: send_cmd: 0x23 0x34 0x6 0x0 0x0 0x1 0xe0 0xff 0xb4 0x33 0x24 0xd 
LORA: recv_rpl: 0xd
LORA: sendCommand ERROR

LORA: configureRXChain(1, ...)
RF: 1,1,868500000
LORA: send_cmd: 0x23 0x34 0x6 0x0 0x1 0x1 0x20 0x42 0xc4 0x33 0xb8 0xd 
LORA: recv_rpl: 0x23 0x34 0x1 0x0 0x0 0x58 0xd
LORA: sendCommand OK

LORA: configureIFChainX(0, ...)

Above, the call to configureRXChain(1, ...) actually consistently succeeds. Or maybe the if(appData.rx_uart_buffer[1] == command) in the call to configureRXChain(1, ...) is seeing the response of the earlier call to configureRXChain(0, ...), as it somehow gets in too late? The if(TIMEOUT(4)) is not being hit. After the above lines the log is a mess again, but I see both OKs and ERRORs.

I need to run now; just posting my early findings in case it helps someone right now; will try to trigger the default configuration later, to ensure the config that is fetched from the internet is okay. To be continued…


#451

Would it make sense, after the gateway that stopped working properly has been turned on and powered up for you to press the reset button for five seconds so that it needs to be reactivated?. And then reactivate it selecting to not have automatic updates?


(Arjan) #452

Yes! :tada:

Seeing LORA: recv_rpl: 0xd above, which is a newline, I wondered what would happen if I always first read any pending RX from the LoRa module’s UART before sending new commands.

Guess what: in two occasions there was indeed such excessive newline pending in the receive buffer while the firmware was about to send a new command. Even better: discarding those makes the LoRa configuration complete without any error. Next, the activation simply succeeds and it even downloads new firmware, which obviously no longer includes my fixes, but somehow is not booted (yet) as the SD card with my own firmware is still in the gateway, my own firmware is loaded again after the reboot. So, using my own firmware it’s now activated and happily forwarding packets…! And the debug logs no longer show garbage either.

To be continued, but: in my case the reboot loop during activation is apparently totally fixable by just using new firmware.

And I’m not even alarmed by this weird double occurrence of frame counter 9, on two frequencies…

(Well, I am…)


(Willem (pe1dlf)) #453

@arjanvanb Can you share the image you created, curious to see of this fixes the reboots I have every so many hours.


(Arjan) #454

Here goes, based on today’s develop branch: https://drive.google.com/open?id=15UMxp7voWhCAHY0_xvZDkWDPHf74pPuX

I will create a PR tomorrow have created a PR, so if you can wait a few days for TTP/TWTG to validate it: just wait :wink:

If you don’t want to wait:

  • I don’t have the factory firmware for you, so: no way back!
  • No need to wipe any existing configuration.
  • Unpack the /update folder to the root of a FAT32 formatted SD card.
  • Remove power, insert the SD card, attach power.
  • If you’re using a serial cable for logging: set the baudrate of your monitor to 921600.
  • Leave the SD card in your gateway to avoid any downloaded firmware from overwriting it.
  • After the updated firmware was downloaded, when the SD card is still in place, you’ll see:
    FIRM: Starting download
    FIRM: available bytes: 79
    FIRM: (Downloaded FOTA key) 69 AE B7 78 1F 49 4E 7F BC B6 C7 CD 9C 59 4F 5D FA AA 3D 81 D4 9C 56 90 A6 83 81 98 FF 18 88 6A 
    FIRM: (Stored FOTA key) 69 AE B7 78 1F 49 4E 7F BC B6 C7 CD 9C 59 4F 5D FA AA 3D 81 D4 9C 56 90 A6 83 81 98 FF 18 88 6A 
    FIRM: Firmware is already downloaded
    MAIN: No new firmware available
    
    …which makes me think that even when removing the SD card after that, it won’t overwrite the firmware until something new is released. But I did not test.

This basically adds a bit more logging, plus:

// flushUart removes any pending bytes from the receive buffer.
void flushUart(DRV_HANDLE handle)
{
    bool flushing = false;
    uint8_t buffer[1]; 
    while(DRV_USART_Read(handle, buffer, 1) > 0) 
    {
        if(!flushing)
        {
            SYS_DEBUG(SYS_ERROR_DEBUG, "LORA: flushing: ");
            flushing = true;
        }
        SYS_DEBUG(SYS_ERROR_DEBUG, "%02x ", buffer[0]);
    }
    if(flushing)
        SYS_DEBUG(SYS_ERROR_DEBUG, "\r\n");
}

(Trying to adhere to the existing code style… Also, it would be nice to use, e.g., DRV_USART_ReceiverBufferIsEmpty, but the tooling refuses to find that?)

The above is then invoked at the start of:

bool sendCommand(uint8_t command, uint8_t* payload, uint16_t len)
{  
    flushUart(appData.USARTHandle);

    bool     gotresponse   = false;
    
    ...

    SYS_DEBUG(SYS_ERROR_DEBUG, "LORA: sendCommand %s\r\n", gotresponse ? "OK" : "ERROR");
    return gotresponse;
}

I only get LORA: flushing: 0d (being an empty line) once now.

However, the mystery continues: I doubt getting two lines as a reply is normal, and I don’t understand how the code could ever print the following!? (See followup post for an explanation.)

RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d 
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

Okay, the code in my firmware uses "%02x" to print the hexadecimal values, which is different from the "%#x" in the code on GitHub. But how could it ever (consistently!) print two subsequent lines for LORA: recv_rpl for this very command…!?

My logging

This is complete; see next post for an explanation.

CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration

LORA: send_cmd: 23 31 01 00 00 55 0d 
LORA: recv_rpl: 23 31 01 00 00 55 0d
LORA: sendCommand OK

LORA: send_cmd: 23 3a 01 00 00 5e 0d 
LORA: recv_rpl: 23 3a 10 00 01 01 4c 47 38 35 30 31 36 30 31 37 38 32 04 01 0d
LORA: sendCommand OK
LORA: version: 01

LORA: configureRXChain(0, ...)
RF: 0,1,867500000
LORA: flushing: 0d 
LORA: send_cmd: 23 34 06 00 00 01 e0 ff b4 33 24 0d 
LORA: recv_rpl: 23 34 01 00 00 58 0d
LORA: sendCommand OK

LORA: configureRXChain(1, ...)
RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d 
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(1, ...)
IF: 1,1,1,-200000
LORA: send_cmd: 23 35 07 00 01 01 01 c0 f2 fc ff 0f 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(2, ...)
IF: 2,1,1,0
LORA: send_cmd: 23 35 07 00 02 01 01 00 00 00 00 63 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(3, ...)
IF: 3,1,0,-400000
LORA: send_cmd: 23 35 07 00 03 01 00 80 e5 f9 ff c0 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(4, ...)
IF: 4,1,0,-200000
LORA: send_cmd: 23 35 07 00 04 01 00 c0 f2 fc ff 11 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(5, ...)
IF: 5,1,0,0
LORA: send_cmd: 23 35 07 00 05 01 00 00 00 00 00 65 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(6, ...)
IF: 6,1,0,200000
LORA: send_cmd: 23 35 07 00 06 01 00 40 0d 03 00 b6 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChainX(7, ...)
IF: 7,1,0,400000
LORA: send_cmd: 23 35 07 00 07 01 00 80 1a 06 00 07 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK

LORA: configureIFChain8(...)
IF8: 1,1,-200000,250000,7
LORA: send_cmd: 23 36 08 00 01 01 c0 f2 fc ff 02 02 14 0d 
LORA: recv_rpl: 23 36 01 00 00 5a 0d
LORA: sendCommand OK

LORA: configureIFChain9(...)
IF9: 1,1,300000,125000,50000
LORA: send_cmd: 23 37 0b 00 01 01 e0 93 04 00 03 50 c3 00 00 f4 0d 
LORA: recv_rpl: 23 37 01 00 00 5b 0d
LORA: sendCommand OK

LORA: send_cmd: 23 40 01 00 34 98 0d 
LORA: recv_rpl: 23 40 01 00 00 64 0d
LORA: sendCommand OK

LORA: send_cmd: 23 31 01 00 00 55 0d 
LORA: recv_rpl: 23 31 01 00 00 55 0d
LORA: sendCommand OK

LORA: send_cmd: 23 30 01 00 00 54 0d 
MON: SYS Stack size: 2870
MON: heap usage: 152KB (233KB), free: 187KB
LORA: recv_rpl: 23 30 01 00 00 54 0d
LORA: sendCommand OK

LORA: configLora OK
LORA: Configuration succeeded
LORA: Starting operation