LIG16 troubleshooting

New fw seems stable. But also with this new firmware the gateway does not show traffic statistics when you use it in Basic Station mode. If I remember correctly someone already contacted Dragino about this months ago (@wolfp was that you?)

For now I have reverted to Semtech UDP again.

I think you’re going to find that’s pretty common with Basic Station on most platforms. Whatever log scanning was used to capture it is expecting log messages from the UDP forwarder, while Basic Station would produce that information in a different format if at all.
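To illustrate the point about log scanning, here is a rough sketch of a scanner tied to the UDP forwarder's log format. It is not Dragino's actual GUI code; the function name `extract_stat` and the regular expression are assumptions, and the sample line is an abridged copy of the `PKTUP~ ... JSON:` stat lines quoted later in this thread. A scanner like this finds nothing if Basic Station logs in a different shape, or not at all.

```python
import json
import re

# Matches forwarder lines of the form "... PKTUP~ [server] JSON: {...}".
# The pattern is an assumption derived from the log excerpts in this thread.
PKTUP_RE = re.compile(r"PKTUP~ \[(?P<server>[^\]]+)\] JSON: (?P<body>\{.*\})\s*$")


def extract_stat(line):
    """Return (server, stat_dict) for a UDP-forwarder stat line, else None."""
    match = PKTUP_RE.search(line)
    if match is None:
        return None
    try:
        payload = json.loads(match.group("body"))
    except json.JSONDecodeError:
        return None
    stat = payload.get("stat")
    if stat is None:
        return None  # e.g. an "rxpk" uplink report, not a statistics report
    return match.group("server"), stat


# Abridged sample line (some fields dropped for brevity).
sample = (
    "Mon Mar  7 13:20:03 2022 daemon.info fwd[2562]: PKTUP~ [server] JSON: "
    '{"stat":{"time":"2022-03-07 05:20:03 UTC","rxnb":1,"rxok":1,"rxfw":1,'
    '"ackr":0.0,"dwnb":111,"txnb":0}}'
)

print(extract_stat(sample))
# A line in any other format is simply ignored by this kind of scanner.
print(extract_stat("some log line in a different format"))
```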

New firmware, tested ok for LIG16.
The GUI for gateway traffic does not show anything; don’t worry about that.

lgw–build-v5.4.1644990565-20220216-1352
*Add LPS8-N support
*Add AT&T disconnection detection
*Add Secondary LoRaWAN Server for Semtech UDP
*LPS8/LG308/DLOS8’s software of pkt_fwd iterates to fwd
*LPS8/LG308/DLOS8 supports the GUI of Gateway traffic
*Fixed fallback address lost after save WiFi setting

For me the new firmware version is not ok. There is still nothing shown in “Gateway Traffic” when you use BasicStation. I need to know what’s going on with my gateway without using the console.
Therefore I installed the new firmware using Semtech UDP.

Due to the differences between the UDP based packet forwarder and BasicStation that is unlikely to change anytime soon. So no need to report this every update.


I see a lot of reboots on my LPS8, on average every 15 hours, and Basic Station freezes after a few days/weeks.
I updated it 2 weeks ago with the new firmware (16 Feb 2022 lgw-5.4.1644990565) and configured it for Basic Station, connected over WiFi. It ran fine for 2 weeks (lots of reboots) but froze yesterday. Completely unresponsive, no ping.
I had noticed the same issue on the previous firmware (24 Dec 2021), Basic Station freezing, so I switched back to UDP and that ran without freezing; lots of reboots but always coming back online. I wanted to give Basic Station a chance with the new firmware but it seems to reboot more often, and now it froze.
The only modifications I made to the firmware are installing nano and a cron job that sends the uptime over MQTT SSL every 5 minutes. Earlier I had tried to avoid the freezes with watchcat (did not work), and redirected the syslog to permanent storage to investigate after a freeze, but I did not find any errors.
Does anyone have similar reboot/freeze issues with Dragino gateways over WiFi?
My WiFi signal is not strong, but acceptable?
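For anyone wanting to count reboots the same way, here is a minimal sketch of the uptime idea. It assumes a Linux `/proc/uptime` on the gateway and a list of uptime values collected at regular intervals on the receiving side; the function names are hypothetical and the MQTT publishing step is left out.

```python
def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds (first field of /proc/uptime)."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def count_reboots(uptimes):
    """Count reboots in a chronological list of reported uptimes (seconds).

    Assumption: a reboot happened whenever a report shows a lower uptime
    than the report before it.
    """
    reboots = 0
    for previous, current in zip(uptimes, uptimes[1:]):
        if current < previous:
            reboots += 1
    return reboots


# Made-up example: reports every 300 s, with two resets in between.
reported = [300, 600, 900, 120, 420, 60]
print(count_reboots(reported))  # → 2
```

A reboot that happens and recovers between two reports can still be missed if the new uptime already exceeds the previous one, which is unlikely with a 5 minute interval but possible with longer ones.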

IMHO the LIG16 and the LPS8 have different hardware but seem to use (nearly) the same firmware. I own both gateways; the LPS8 is still on the “old” firmware and has been running Semtech UDP without any problems for a few months.

The LIG16 has been running the newest firmware with Semtech UDP for a few days without any reboots or freezes.

Your WLAN signal seems to be strong enough, it’s nearly the same signal/noise as I have.

Maybe this is more of an LPS8-related problem when using BasicStation.

Did it freeze before you put this on?

One LIG16 has been up and running Basic Station for 2 weeks now; it hasn’t bothered me since.

Thanks for the feedback. I’ve switched my LPS8 back to the UDP forwarder and upgraded to a beefy 2.5A power supply, but I’m still seeing several reboots over the past days. I’m just wondering if the same thing is happening to other people without them realising it: the uptime is reported in the dashboard, bottom right.
Sorry, I didn’t want to hijack this LIG16 thread; I was just wondering if anyone had a similar problem or if it’s related to firmware. The conclusion seems to be that the LIG16 (both UDP and Basic Station) does not have this problem.

My recommendation would be to get a logread -f running, say from /etc/rc.local, and then run the serial console of the gateway into some other logging system, e.g. a Raspberry Pi or old PC that can sit there and just log everything for days to weeks.

You may also want to determine whether the SoC’s hardware watchdog, or the Linux soft watchdog that can wrap it, is in use.

Basically what you’ll then want to do is search through the external log for Linux boot messages, and see what was happening just before.

That assumes of course that these are Linux reboots; if it’s just the packet forwarder restarting, you’d look for those instances instead.
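A minimal sketch of that search, assuming the external log was captured to a text file and that each boot shows up as a line containing a recognisable marker. The default marker `"U-Boot"` is an assumption; use whatever banner your serial console actually prints, or the packet forwarder's own startup line if only the forwarder is restarting. The function name is hypothetical.

```python
from collections import deque


def lines_before_boot(lines, marker="U-Boot", context=5):
    """Return, for every line containing `marker`, the lines just before it.

    `lines` is any iterable of log lines (for example an open file).
    At most `context` preceding lines are kept for each hit.
    """
    recent = deque(maxlen=context)
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            hits.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return hits


# Made-up log, only to show the shape of the result.
example_log = [
    "normal line 1",
    "normal line 2",
    "normal line 3",
    "U-Boot (bootloader banner)",
    "kernel starting",
    "normal line 4",
    "U-Boot (bootloader banner)",
]
for number, before in enumerate(lines_before_boot(example_log, context=2), 1):
    print(f"boot {number}: preceded by {before}")
```

With a real capture you would pass an open file, e.g. `lines_before_boot(open("gateway.log"))`, where the file name is a placeholder.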

Thanks @cslorabox, I ran logread -f in an SSH session and the LPS8 rebooted after 32 hours, with these last lines before the pipe broke; I don’t see anything unusual. The iot_keep_alive is a Dragino service for Ethernet/WiFi/3G failover.

Mon Mar  7 13:20:02 2022 user.notice iot_keep_alive: Internet Access OK: via wlan0-2
Mon Mar  7 13:20:02 2022 user.notice iot_keep_alive: use WAN or WiFi for internet access now
Mon Mar  7 13:20:03 2022 daemon.info fwd[2562]:
Mon Mar  7 13:20:03 2022 daemon.info fwd[2562]: ################[PKT_SERV] no report of this service ###############
Mon Mar  7 13:20:03 2022 daemon.info fwd[2562]: PKTUP~ [server] JSON: {"stat":{"time":"2022-03-07 05:20:03 UTC","rxnb":1,"rxok":1,"rxfw":1,"ackr":0.0,"dwnb":111,"txnb":0,"pfrm":"SX1308","mail":"tomtobback@gmail.com","desc":"Dragino LoRaWAN Gateway"}}
Mon Mar  7 13:20:03 2022 daemon.info fwd[2562]: PKTUP~ [server2] JSON: {"stat":{"time":"2022-03-07 05:20:03 UTC","rxnb":1,"rxok":1,"rxfw":1,"ackr":0.0,"dwnb":0,"txnb":0,"pfrm":"SX1308","mail":"tomtobback@gmail.com","desc":"Dragino LoRaWAN Gateway"}}
Mon Mar  7 13:20:05 2022 daemon.info fwd[2562]: INFO~ [server-up] received packages from mote: 260D1D33 (fcnt=17)
Mon Mar  7 13:20:05 2022 daemon.info fwd[2562]: PKTUP~ [server] JSON: {"rxpk":[{"jver":1,"tmst":2803997651,"chan":4,"rfch":0,"freq":924.000000,"mid":0,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","rssis":-41,"lsnr":9.0,"foff":0,"rssi":-41,"size":21,"data":"QDMdDSaAEQABanzd5cTGYm5Ja30x"}]}
Mon Mar  7 13:20:05 2022 daemon.info fwd[2562]: INFO~ [server2-up] received packages from mote: 260D1D33 (fcnt=17)
Mon Mar  7 13:20:05 2022 daemon.info fwd[2562]: PKTUP~ [server2] JSON: {"rxpk":[{"jver":1,"tmst":2803997651,"chan":4,"rfch":0,"freq":924.000000,"mid":0,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","rssis":-41,"lsnr":9.0,"foff":0,"rssi":-41,"size":21,"data":"QDMdDSaAEQABanzd5cTGYm5Ja30x"}]}
Mon Mar  7 13:20:05 2022 kern.info kernel: wlan0-2: AP 24:f5:a2:05:e3:ab changed bandwidth, new config is 2437 MHz, width 1 (2437/0 MHz)
Mon Mar  7 13:20:05 2022 daemon.info fwd[2562]: UNCONF_UP:{"ADDR":"260D1D33", "Size":21, "Rssi":-41, "snr":9, "FCtrl":["ADR":1,"ACK":0, "FPending":0, "FOptsLen":0], "FCnt":17, "FPort":1, "MIC":"317D6B49"}
Mon Mar  7 13:20:07 2022 daemon.info fwd[2562]: INFO~ [server-down] PULL_ACK received in 143 ms
Mon Mar  7 13:20:09 2022 kern.info kernel: wlan0-2: AP 24:f5:a2:05:e3:ab changed bandwidth, new config is 2437 MHz, width 2 (2447/0 MHz)
Mon Mar  7 13:20:12 2022 daemon.info fwd[2562]: INFO~ [server-down] PULL_ACK received in 143 ms
Mon Mar  7 13:20:17 2022 daemon.info fwd[2562]: INFO~ [server-down] PULL_ACK received in 145 ms
Mon Mar  7 13:20:18 2022 user.notice iot_keep_alive: Internet Access OK: via wlan0-2
Mon Mar  7 13:20:18 2022 user.notice iot_keep_alive: use WAN or WiFi for internet access now

And another reboot after these final lines:

Tue Mar  8 05:31:13 2022 user.notice iot_keep_alive: Internet Access OK: via wlan0-2
Tue Mar  8 05:31:13 2022 user.notice iot_keep_alive: use WAN or WiFi for internet access now
Tue Mar  8 05:31:17 2022 daemon.info fwd[2329]: INFO~ [server-down] PULL_ACK received in 148 ms
Tue Mar  8 05:31:23 2022 daemon.info fwd[2329]: INFO~ [server-down] PULL_ACK received in 148 ms
Tue Mar  8 05:31:28 2022 daemon.info fwd[2329]: INFO~ [server-down] PULL_ACK received in 148 ms
Tue Mar  8 05:31:28 2022 user.notice iot_keep_alive: Internet Access OK: via wlan0-2
Tue Mar  8 05:31:28 2022 user.notice iot_keep_alive: use WAN or WiFi for internet access now
Tue Mar  8 05:31:33 2022 daemon.info fwd[2329]: INFO~ [server-down] PULL_ACK received in 152 ms
Tue Mar  8 05:31:38 2022 daemon.info fwd[2329]: INFO~ [server-down] PULL_ACK received in 148 ms
Tue Mar  8 05:31:42 2022 daemon.info fwd[2329]:
Tue Mar  8 05:31:42 2022 daemon.info fwd[2329]: ################[PKT_SERV] no report of this service ###############
Tue Mar  8 05:31:42 2022 daemon.info fwd[2329]: PKTUP~ [server2] JSON: {"stat":{"time":"2022-03-07 21:31:42 UTC","rxnb":0,"rxok":0,"rxfw":0,"ackr":0.0,"dwnb":0,"txnb":0,"pfrm":"SX1308","mail":"tomtobback@gmail.com","desc":"Dragino LoRaWAN Gateway"}}
Tue Mar  8 05:31:42 2022 daemon.info fwd[2329]: PKTUP~ [server] JSON: {"stat":{"time":"2022-03-07 21:31:42 UTC","rxnb":0,"rxok":0,"rxfw":0,"ackr":0.0,"dwnb":46,"txnb":0,"pfrm":"SX1308","mail":"tomtobback@gmail.com","desc":"Dragino LoRaWAN Gateway"}}

I don’t see any clues, but as long as it doesn’t freeze I don’t mind the reboots.

I would check the watchdog configuration.

Also consider capturing the logs via a Python script or something similar that can timestamp each line, and see if there’s an unusual time gap between the final log messages and the U-Boot startup messages.
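Something along these lines could do the timestamping and gap detection. It is a sketch only: the function names and the 30 second threshold are assumptions, and the lines are expected to arrive on a stream such as the standard input of the logging machine.

```python
import time


def stamp_lines(stream, clock=time.time):
    """Yield (arrival_time, line) for each line read from `stream`."""
    for line in stream:
        yield clock(), line.rstrip("\n")


def find_gaps(stamped, threshold_s=30.0):
    """Return (gap_seconds, line_before, line_after) for long silences.

    `stamped` is an iterable of (arrival_time, line) pairs in arrival order.
    """
    gaps = []
    previous = None
    for stamp, line in stamped:
        if previous is not None and stamp - previous[0] > threshold_s:
            gaps.append((stamp - previous[0], previous[1], line))
        previous = (stamp, line)
    return gaps


# Made-up arrival times (seconds) to show the output shape.
example = [
    (0.0, "last forwarder message"),
    (5.0, "another message"),
    (100.0, "first bootloader message"),
]
for gap, before, after in find_gaps(example):
    print(f"{gap:.0f} s of silence between {before!r} and {after!r}")
```

For a live capture you would loop over `stamp_lines(sys.stdin)` (after `import sys`) and write each pair to a file. In the excerpts above the forwarder logs a PULL_ACK roughly every 5 seconds, so a silence much longer than that just before a boot banner would suggest a hang followed by a watchdog reset rather than a clean reboot.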

Thanks for all the feedback, I think I’ve figured out the cause: overheating. The LPS8 seems very poorly designed, overheating at around 25 degC ambient air temperature. That may not be a big issue in Europe, but in sub-tropical Hong Kong that is the case for more than half of the year, especially under the roof. The Dragino HQ is less than 50 km away from here in Shenzhen; I wonder how they test their devices. I’ve added a temperature sensor inside my gateway box, a small 5V brushless fan that switches on when the inside temperature goes above 30 degC (at 40 and 35 degC I was still seeing frequent reboots), and a few ventilation holes in the top panel. Detail here. Almost 4 days without a reboot.
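For readers who want to try something similar, the switching logic can be sketched as below. This is not the code used in the build described above. Only the 30 degC switch-on point comes from the post; the lower switch-off point of 27 degC is an assumption, added so the fan does not toggle rapidly around a single threshold, and the function name is hypothetical. Reading the sensor and driving the fan are left out.

```python
def fan_should_run(temp_c, fan_on, on_above=30.0, off_below=27.0):
    """Decide the next fan state from the inside temperature.

    on_above:  switch-on threshold in degC (30 degC as in the post).
    off_below: switch-off threshold in degC (an assumed value).
    Between the two thresholds the fan keeps its current state.
    """
    if temp_c > on_above:
        return True
    if temp_c < off_below:
        return False
    return fan_on


# Made-up temperature readings, starting with the fan off.
state = False
states = []
for reading in [26.0, 29.0, 31.0, 29.0, 28.0, 26.5]:
    state = fan_should_run(reading, state)
    states.append(state)
print(states)  # → [False, False, True, True, True, False]
```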
