Hi Arjan,
I was in for an experiment, so I downloaded your firmware.
It did not work right away, however. I tried a reset, a new activation, etc.
At a certain point I pushed the LORA card and I felt it went a bit further into the connector.
And now it suddenly works.
Could it be a combination of problems?
It is a pity I did not try to push the card before I changed firmware. And since there is no way back…
But at least it seems to work now and that is very nice.
Thanks for your excellent work to find the cause.
The original code is also full of timed waits/delays with comments that they are there to give time to print something. That doesn’t look very stable.
Do you have a debug cable? Then I wonder if you see the line LORA: flushing. (Using 921600 baud.)
About the LoRa board and its connector: it looks like they don’t match, as if the board is thinner than what the connector is built for. When you remove the board and put it back in, there is no pressure when you press it down, like you have when you insert a flash/SD card in a slot like that…
I also wonder if the same UART timing issues and excessive newlines could be present in the LoRa card’s output for received LoRaWAN packets, which it might then drop.
But when peeking in TTN Console I think I don’t see my own test packets missing out on frame counters. (Except for the weird double LoRaWAN packet with two different frequencies that I showed earlier.) So maybe my LoRa card is simply a slightly different revision that has different firmware itself, only affecting the configuration commands? At its bottom side, mine is labeled:
LG8501601782
LG-X271
REV: C
If you see packets in TTN Console, your gw runs longer than mine; mine usually doesn’t get this far. I have module SN LG8451600649, the rest is the same. What I also find weird is that when my gw has been powered off for, say, 24 hrs and I power it on for the first time, it runs the longest. If I do this a few minutes or hours after a boot loop, it barely has time to connect, until I wait a day/night again. It’s not temperature related though.
Yes, it has been running overnight, and a good part of yesterday too. I’ve rebooted manually quite often (it’s a tiring process to copy the firmware to the SD card and so on…) but it has never rebooted unexpectedly.
Any examples? I’ve actually searched for such code but couldn’t find it. I’d assume one can actually peek into the buffer to see if it’s empty, rather than waiting some specific time… But then, the compiler cannot find, e.g., DRV_USART_ReceiverBufferIsEmpty while it can find DRV_USART_Read from the very same header file. So I guess somehow the project defines some static linking that I cannot find in the tooling.
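A minimal sketch of that idea in plain C: poll a "buffer is empty" check with an upper bound, instead of sleeping for a fixed time. The names poll_until, poll_fn and fake_buffer_empty are hypothetical and not part of the gateway firmware or of Harmony. On the real target the predicate would wrap whatever status call the USART driver actually exposes, and each poll would yield to the RTOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type: returns true once the condition holds,
 * e.g. "UART buffer is empty". Not a firmware or Harmony API. */
typedef bool (*poll_fn)(void *ctx);

/* Poll `done` at most `max_polls` times.
 * Returns the number of polls used (1..max_polls), or 0 on timeout. */
static unsigned poll_until(poll_fn done, void *ctx, unsigned max_polls)
{
    assert(done != NULL);
    for (unsigned n = 1; n <= max_polls; n++) {
        if (done(ctx)) {
            return n;
        }
        /* On the real target, yield here (for example a one-tick delay). */
    }
    return 0;
}

/* Stand-in for a UART buffer that drains one byte per poll. */
static bool fake_buffer_empty(void *ctx)
{
    unsigned *pending = ctx;
    if (*pending == 0) {
        return true;
    }
    (*pending)--;
    return false;
}
```

With three bytes pending, poll_until(fake_buffer_empty, &pending, 10) returns on the fourth poll; with a buffer that never drains in time it returns 0, so the caller can log a timeout rather than hang.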
For example the vTaskDelay() calls in system_tasks.c
Okay, I figured out the double response in the log, but wow, this is driving me nuts. Bottom line: even at 921600 baud the/my logging is just too often incomplete.
Like this should create 9 entries in the log:
SYS_DEBUG(SYS_ERROR_DEBUG, "\r\nLORA: configureRXChain(1, ...)\r\n");
status *= configureRXChain(1, appGWActivationData.configuration_sx1301.rfchain[1].enable,
                           appGWActivationData.configuration_sx1301.rfchain[1].freq);
for(i = 0; i <= 7; i++)
{
    SYS_DEBUG(SYS_ERROR_DEBUG, "\r\nLORA: configureIFChainX(%d, ...)\r\n", i);
    status *= configureIFChainX(i, appGWActivationData.configuration_sx1301.ifchain[i].enable,
                                appGWActivationData.configuration_sx1301.ifchain[i].radio,
                                appGWActivationData.configuration_sx1301.ifchain[i].freqOffset);
}
But I get 1 with some weird double response, and only 7 for the loop, where the first iteration (i = 0) of for(i = 0; i <= 7; i++) is not (properly) shown:
LORA: configureRXChain(1, ...)
RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d 
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
// The following is missing!
//    LORA: configureIFChainX(0, ...)
//    IF: 0,1,1,-400000
LORA: configureIFChainX(1, ...)
IF: 1,1,1,-200000
LORA: send_cmd: 23 35 07 00 01 01 01 c0 f2 fc ff 0f 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
// Likewise proper output for i = 2, 3, 4, 5, 6
LORA: configureIFChainX(7, ...)
IF: 7,1,0,400000
LORA: send_cmd: 23 35 07 00 07 01 00 80 1a 06 00 07 0d 
LORA: recv_rpl: 23 35 01 00 00 59 0d
LORA: sendCommand OK
So, I am quite sure that the weird second response (the line LORA: recv_rpl: 23 35 01 00 00 59 0d and the next LORA: sendCommand OK) is actually the response for i = 0, and even more: the payload shown for the RX in the line LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d is actually the payload for the TX in LORA: send_cmd for i = 0!
Compared to when logging happens to be okay, after:
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d 
…apparently the first and last lines of:
LORA: recv_rpl: 23 34 01 00 00 58 0d
LORA: sendCommand OK
LORA: configureIFChainX(0, ...)
IF: 0,1,1,-400000
LORA: send_cmd: 23 35 07 00 00 01 01 80 e5 f9 ff be 0d 
…have somehow been combined into this single bogus line, discarding anything in between:
LORA: recv_rpl: 23 34 07 00 00 01 01 80 e5 f9 ff be 0d
Changing the output format for TX and RX a bit also makes clear it’s indeed just a matter of weird printing. (It’s not like the RX and TX data are mixed up in memory.)
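A spliced log line like that can also be spotted mechanically. In every frame quoted in this thread, the byte just before the trailing 0d equals the low byte of the sum of all bytes before it (for example 23 + 35 + 01 + 00 + 00 = 59). That is an inference from the logged frames, not something taken from the firmware source, and the function names below are hypothetical. A sketch in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low byte of the sum of `len` bytes. */
static uint8_t sum8(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return (uint8_t)(sum & 0xFFu);
}

/* True if the second-to-last byte of `frame` equals the low byte of the
 * sum of everything before it (assumed checksum rule, see text above). */
static bool frame_checksum_ok(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 3) {
        return false;
    }
    return sum8(frame, len - 2) == frame[len - 2];
}
```

Under this rule the bogus line 23 34 07 00 00 01 01 80 e5 f9 ff be 0d fails the check (the bytes sum to ...bd, not be), while the same bytes with 35 as the command byte pass. That fits the reading that the line is a 0x34 reply header glued onto the 0x35 command for i = 0.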
TL;DR: for me the fix to flush the buffer is still working, and I guess I can submit the PR, but one needs to be careful trying to read the log.
The only reference to logging I see is:
But that looks okay, as it’s about to reboot and then logging would be lost without such delay.
Hmmm, not all open source code is quite readable (yet?). It seems the web server resources (HTML, images, JavaScript) have been pushed to Git as some disk image, rather than their source code?
(Of course, one might fetch most, if not all, using a regular browser.)
Do we know what we receive from the lora board?
Because I found out the send_cmd 23 34 06 00 01 01 20 42 c4 33 b8 0d
23 = startbit
34 = LORA_COMMAND_RFCONFIG
06 = lsb cmd length
00 = msb cmd length
01 = RFchain
01 = enable
20 = 0x33c44220 = 868500000
42
c4
33
b8 = checksum
0d = stopbit
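The breakdown above can be checked with a few lines of C. This is only a sketch based on the byte layout as decoded in this post (23 start, command, 16-bit little-endian length, payload, checksum, 0d stop); the struct and function names are hypothetical and do not come from the firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one logged frame. */
typedef struct {
    uint8_t cmd;            /* e.g. 0x34 */
    uint16_t length;        /* payload length from the two length bytes */
    const uint8_t *payload; /* points into the original buffer */
} lora_frame;

/* Read an unsigned 32-bit little-endian value. */
static uint32_t read_le_u32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Split a frame into its fields. Checks only the start byte, the stop
 * byte and the length field; the checksum is not verified here. */
static bool parse_frame(const uint8_t *buf, size_t len, lora_frame *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 6 || buf[0] != 0x23 || buf[len - 1] != 0x0d) {
        return false;
    }
    uint16_t length = (uint16_t)(buf[2] | (buf[3] << 8));
    if ((size_t)length + 6 != len) { /* 6 = start, cmd, 2 length, checksum, stop */
        return false;
    }
    out->cmd = buf[1];
    out->length = length;
    out->payload = buf + 4;
    return true;
}
```

For the send_cmd above this gives cmd 0x34, length 6, payload bytes 01 01 for chain and enable, and read_le_u32(payload + 2) equal to 868500000, matching the RF: 1,1,868500000 log line.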
The recv_rpl would be 23 34 07 00 00 01 01 80 e5 f9 ff be 0d:
23 = startbit
34 = LORA_COMMAND_RFCONFIG
07 = lsb cmd length
00 = msb cmd length
00 = RFchain -> other RF chain?
01 = enable
01 = 0xfff9e58001 = 1099409227777
80
e5
f9
ff
be = checksum
0d = stopbit
That is a strange reply, I think.
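The reply looks less strange if the payload is read with the layout the IF commands in the logs above seem to use: chain, enable, radio, then a signed 32-bit little-endian offset. That layout is inferred from log pairs such as IF: 1,1,1,-200000 and send_cmd 23 35 07 00 01 01 01 c0 f2 fc ff 0f 0d, not from the firmware source, and read_le_i32 is a hypothetical helper. A sketch in C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a signed 32-bit little-endian value (two's complement),
 * without relying on implementation-defined integer conversions. */
static int32_t read_le_i32(const uint8_t *p)
{
    assert(p != NULL);
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if (u <= 0x7FFFFFFFu) {
        return (int32_t)u;
    }
    return (int32_t)((int64_t)u - 4294967296LL); /* subtract 2^32 */
}
```

Read that way, the bytes 80 e5 f9 ff are -400000, so the payload 00 01 01 80 e5 f9 ff is chain 0, enable 1, radio 1, offset -400000, which is exactly the IF: 0,1,1,-400000 entry that went missing from the log.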
Also recv_rpl: 23 35 01 00 00 59 0d would maybe mean:
35 = LORA_COMMAND_IFCONFIG
00 = LORA_TX_STATUS_UNKNOWN
edit:
But the last command is really a guess. It could also mean 00 = LORA_ACK
I got these meanings from https://github.com/TheThingsProducts/gateway/blob/59010b338e018eea06f6fa99f128acb6808ebaba/firmware/src/app.h
I get different replies for those 0x34 commands:
RF: 1,1,868500000
LORA: send_cmd: 23 34 06 00 01 01 20 42 c4 33 b8 0d 
LORA: recv_rpl: 23 34 01 00 00 58 0d
(But like proven above, the logging is often not to be trusted…)
The installation process was straightforward and smooth, and the gateway seems to be active.
However:
Where is the manual?
Do we have to rely on this type of posting for information of the TTN gateway?
Where can I find the schematics and overview of this open hardware/software?
Where can I configure the TTN wifi network node to OFF, as I use Ethernet only?
Regards,
Roland
I have seen similar behaviour on my IC880a RPi gateway with nodes very close to the gateway. It seems as if the concentrator does not detect that both signals are the same and decides to apply two receive branches to the signal for detection.
My compliments to Arjan for the pull request.
This is an outstanding effort of engineering and programming.
Great artwork.
My TTN gateway disconnects after about 24-36hrs and will not connect again unless I cycle the power off and on.
This is no good for me as the gateway is remote.
Any ideas why it is not staying permanently connected to TTN?
Are you using a WiFi connection to the Internet? If so there apparently is work being done on the firmware to try to remediate that known problem. Meanwhile if you can possibly switch to Ethernet that might well cure your problem.
No, I am using Ethernet. Hard-wired Ethernet.
Thanks Arjan! Something new to play with. Sadly no improvement here. Still getting the 8-second reboot loop, but with enhanced logging now. I think the LG8271 might just be plain bad.
Enhanced log of where it tries to config the LoRa module, and then decides to reboot:
CNFG: Load online user config state change to 6
MON: SYS Stack size: 2859
MON: heap usage: 151KB (233KB), free: 188KB
FREQ: APP_URL_Buffer: https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870
FREQ: Valid URL
HTTP: Starting connection
HTTP: - Host: account.thethingsnetwork.org
HTTP: - Path: api/v2/frequency-plans/EU_863_870
HTTP: - Port: 443
HTTPS: Connection Opened: Starting TLS Negotiation
HTTP: Wait for TLS Connect
HTTP: TLS Connection Opened: Starting Clear Text Communication
HTTP: Got 1232 bytes
HTTP: Connection Closed
HTTP: Close active socket 1
FREQ: Parsing response token: HTTP/1.1 200 OK
FREQ: response code: -2147309856
FREQ: - lorawan_public:  true
FREQ: - enable:  true
FREQ: - freq:    867500000
FREQ: - enable:  true
FREQ: - freq:    868500000
7
CNFG: ConfiguringLoRa module
40LORA: anging state from 2to 4
 enable:         true
FREQ: - if:      -200000
FREQ: - radio:   1
FREQ: - e0 55 0d 
FREQ: - if:      0
FREQ: - radio:   1
FREQ: - enable:  true
FREQ: - if:      -400000
FREQ: - radio:   0
FREQ: - enable:  true
FREQ: - if:      -200000
FREQ: - radio:   0
FREQ: - enable:  true
FREQ: - if:      0
FREQ: - radio:   0
FREQ: - enable:  true
FREQ: - if:      200000
FREQ: - radio:   0
FREQ: - enable:  true
FREQ: - if:      400000
FREQ: - radio:   0
FREQ: - chan_Lora_std:   {"enable":true,"if":-200000,"radio":1,"bandwidth":250000,"spread_factor":7}
FREQ: - enable:  true
FREQ: - if:      -200000
FREQ: - radio:   1
FREQ: - bandwidth:       250000
FREQ: - spread_factor:   7
FREQ: - chan_FSK:        {"enable":true,"if":300000,"radio":1,"bandwidth":125000,"datarate":50000}
FREQ: - enable:  true
FREQ: - if:      300000
FREQ: - radio:   1
FREQ: - bandwidth:       125000
FREQ: - datarate:        50000
CNFG: Load online user config state change to 7
CNFG: Configuring LoRa module
LORA: Changing state from 2 to 4
LORA: Starting reconfiguration
LORA: send_cmd: 23 31 01 00 00 55 0d 
MON: SYS Stack size: 2859
MON: heap usage: 151KB (233KB), free: 188KB
SNTP: State change from 0 to 0
SNTP: State change from 0 to 0
**************************
*   The Things Network   *
*      G A T E W A Y     *
**************************
Firmware name: Arjan's v1 (flush UART, extended logging), type: 0, version: 0.0.0, commit: 00000000, timestamp: 0
Bootloader revision: 1, commit: 7167873a, timestamp: 1496411298
Build time: Feb 11 2018 01:32:36
Reboot reason: 0x10
BOOT: (persisted info) 6F 72 72 65 01 03 FB FD 8E 0D E4 0D 9A F7 FA DC
I don’t know if it would help, but have you tried giving the device a fixed IP address?
In case this is helpful re: remote rebooting needs…
http://www.aviosys.com/9255.html
It sells for about $85 U.S.