Latest MultiTech packet forwarder stops sending packets

mattkrebs · May 17, 2017, 11:04am

Unfortunately the new version still stops forwarding packets after a few minutes. I restarted the forwarder several times.

This is the latest output log:
lora-pkt-fwd.log.pdf (25.2 KB)

The last message is always INFO: Disabling GPS mode for concentrator's counter...

dmesg output and system logs look similar to my earlier post.

kersing · May 17, 2017, 11:09pm

Looking at the code, I do not see any reason why a lock-up should occur at that point, apart from possible USB issues while accessing the LoRaWAN concentrator. I do not understand why this isn’t happening on the Conduits I’m testing the code on.
I’ve created yet another version with additional debugging, please download it here and run it.

WoutD · May 18, 2017, 2:38pm

Hi there,

I have the same issue. It seems that every time I simulate a downlink to update my sensor, the packet forwarder crashes.
I’m still a newbie to this and I don’t know how to install this .ipk file. I know how to download the installer.sh file as described on the site. But no idea how to make sure the latest .ipk file is running… Can you please help?

kersing · May 18, 2017, 4:58pm

Download the file linked in the message above and transfer it to the Conduit, then run:

/etc/init.d/ttn-pkt-forwarder stop
opkg install <filename>
/etc/init.d/ttn-pkt-forwarder start

Where <filename> is the name of the file you transferred.

mattkrebs · May 19, 2017, 7:41am

I tested the new debugging version for almost a day.
During the first try it got stuck again after a few minutes.

Then I tried putting the MTAC-LORA-868 module into slot AP1 instead of AP2. According to the manual, that should not make a difference.
It seemed to be more stable indeed, as it kept going for the rest of the day. But during the night it got stuck again.

Here is the new log from the debug version:
lora-pkt-fwd.log.3.pdf (661.7 KB)

I also suspect that it is a problem accessing the LoRa module, with the lock not being released after INFO: Disabling GPS mode for concentrator's counter....
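
The symptom described here (the log ending on the "Disabling GPS mode" line and then going quiet) can be checked mechanically. The following is only a rough sketch in Python, not part of the packet forwarder: the function names and the 90-second threshold are assumptions chosen for illustration, and the log path has to be supplied by the caller.

```python
import os
import time

# Message that, per the reports in this thread, is the last line before a hang.
STALL_MARKER = "Disabling GPS mode for concentrator's counter"


def last_nonempty_line(text):
    """Return the last non-blank line of the log text, stripped."""
    for line in reversed(text.splitlines()):
        if line.strip():
            return line.strip()
    return ""


def looks_stalled(log_text, seconds_since_update, max_quiet_seconds=90):
    """True if the log ends on the marker and has been quiet for too long.

    max_quiet_seconds is a hypothetical threshold, not a forwarder setting.
    """
    ends_on_marker = STALL_MARKER in last_nonempty_line(log_text)
    return ends_on_marker and seconds_since_update > max_quiet_seconds


def check_log_file(path, max_quiet_seconds=90):
    """Apply looks_stalled() to a log file, using its modification time."""
    with open(path, errors="replace") as fh:
        text = fh.read()
    age = time.time() - os.path.getmtime(path)
    return looks_stalled(text, age, max_quiet_seconds)
```

A healthy log continues with the time offset and "Enabling GPS mode" lines, so only a log that stops on the marker is flagged.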

The last thing I could try is to upgrade to mLinux 3.3.6; currently I am running 3.2.0.
@kersing what is the setup of your test gateways?

kersing · May 21, 2017, 8:26pm

I’ve updated my development gateway from 3.1.0 to the newest available release and am running tests now.

afnic_labs · May 23, 2017, 6:14am

I am having the same issue. With the latest AEP MultiTech Conduit firmware 1.4.1, the ttn-packet-forwarder is restarting every 30 seconds.

afnic_labs · May 23, 2017, 6:52am

I get the following output every 30 seconds:

INFO: Disabling GPS mode for concentrator's counter...
INFO: host/sx1301 time offset=(1495522106s:463333µs) - drift=-1828µs
INFO: Enabling GPS mode for concentrator's counter.

##### 2017-05-23 06:50:30 GMT #####
### [UPSTREAM] ###
# RF packets received by concentrator: 0
# CRC_OK: 0.00%, CRC_FAIL: 0.00%, NO_CRC: 0.00%
# RF packets forwarded: 0 (0 bytes)
# PUSH_DATA datagrams sent: 0 (0 bytes)
# PUSH_DATA acknowledged: 0.00%
### [DOWNSTREAM] ###
# PULL_DATA sent: 0 (0.00% acknowledged)
# PULL_RESP(onse) datagrams received: 0 (0 bytes)
# RF packets sent to concentrator: 0 (0 bytes)
# TX errors: 0
### BEACON IS DISABLED! 
### [JIT] ###
# INFO: JIT queue contains 0 packets.
# INFO: JIT queue contains 0 beacons.
### GPS IS DISABLED! 
### [PERFORMANCE] ###
# Upstream radio packet quality: 0.00%.
# Semtech status report send. 
##### END #####
INFO: [status] TTN send status success

mrkubas · May 23, 2017, 11:01am

I have the same problem on AEP version with firmware 1.4.1.
A few minutes after the node starts transmitting data, the packet forwarder stops working. The following information can be found in the lora_pkt_fwd.log file:

...
INFO: [up] TTN lora packet send to server "bridge.eu.thethings.network"
INFO: [down] TTN received downlink with payload
INFO: [TTN] downlink 12 bytes
lgw_receive:1161: WARNING: 128 = INVALID NUMBER OF PACKETS TO FETCH, ABORTING
INFO: Enabling TX notch filter
...

Maybe this information will be useful to solve the problem.

kersing · May 23, 2017, 8:54pm

That is the normal status update that should be shown every 30 seconds. It is not a restart.
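
To tell a healthy but idle gateway from a stuck one, it can help to read the counters out of that periodic status block instead of eyeballing it. Below is a minimal Python sketch under stated assumptions: the function name and the regular expression are illustrative, it only picks up lines of the form `# name: <integer>`, and percentage lines are deliberately skipped.

```python
import re

# Matches report lines such as "# RF packets forwarded: 0 (0 bytes)".
# Percentage-only lines ("# PUSH_DATA acknowledged: 0.00%") do not match,
# because the integer must be followed by whitespace or end of line.
COUNTER_RE = re.compile(r"^#\s*(?P<name>[^:#]+):\s*(?P<value>\d+)(?=\s|$)")


def parse_status_report(block):
    """Return a dict of integer counters found in one status report block."""
    counters = {}
    for line in block.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters
```

If "RF packets received by concentrator" stays at 0 across many consecutive reports while nodes are known to be transmitting, the forwarder is alive but the concentrator is probably not delivering packets.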

kersing · May 23, 2017, 8:58pm

It is an indication that invalid data is being exchanged between the radio on the mCard and the main processor (which I expected all along). Now I ‘only’ need to find out why there is invalid data.

ssozonoff · May 26, 2017, 4:27pm

Updated to latest version but still seeing failures…

INFO: [main] Starting the concentrator
ERROR: SPI ERROR DURING REGISTER WRITE
ERROR: SPI ERROR DURING REGISTER WRITE
ERROR: SPI ERROR DURING REGISTER BURST WRITE
59.20.0.13.6.2f.bc.9.0.97.21.10.0.8.0.0.20.c5.55.bb.86.b8.f0.d7.69.80.ed.43.85.79.95.d4.5.65.38.6b.9d.58.59.eb.2b.2c.5a.15.24.d3.c8.37.38.end
ERROR: SPI ERROR DURING REGISTER WRITE
INFO: tx_start_delay=1495 (1495.500000) - (1497, bw_delay=1.500000, notch_delay=0.000000)

INFO: End of JIT thread
ERROR: SPI ERROR DURING REGISTER BURST READ

INFO: End of upstream thread
INFO: [TTN] Disconnecting server "router.eu.thethings.network"
Note: success disconnecting the concentrator
INFO: concentrator stopped successfully
INFO: Exiting packet forwarder program

kersing · May 27, 2017, 9:10pm

Have you been able to successfully run the ‘old’ poly forwarder?

I’m trying to recreate the failures but am unsuccessful.

ssozonoff · May 28, 2017, 5:45pm

Yes, the old one worked fine. The “new” one works too, but has a tendency to die…
It can run fine for several days and then stop.

As described before, if I have a single node talking to the gateway and I reboot the node several times, it will usually kill the gateway. So if you want to try to reproduce it, I would recommend having only one node connect to the gateway, using OTAA, and having a watchdog timer restart the node every few minutes. Within 10 cycles you should see the problem… unless it’s more subtle.

mrkubas · May 29, 2017, 12:43pm

In my case the problem occurs very quickly when one node periodically sends packets with confirmation, regardless of the join method.
The packet forwarder seems to be stable when the node transmits data without confirmation (it was started 3 days ago, with no problems so far).

Additional error messages from the log file:
ERROR: SPI ERROR DURING REGISTER BURST READ
ERROR CONNECTING CONCENTRATOR
ERROR: FAIL TO CONNECT BOARD
ERROR: [main] failed to start the concentrator
ERROR: SPI ERROR DURING REGISTER WRITE
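
For comparing logs between gateways, the SPI errors can be tallied by kind. This is a small Python sketch, assuming only the message wording seen in the logs posted in this thread; the function name is made up for illustration.

```python
import re
from collections import Counter

# Captures the operation named in lines such as
# "ERROR: SPI ERROR DURING REGISTER BURST READ".
SPI_ERROR_RE = re.compile(
    r"ERROR: SPI ERROR DURING (REGISTER (?:BURST )?(?:READ|WRITE))"
)


def tally_spi_errors(log_text):
    """Count SPI error lines in the log text, grouped by failed operation."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SPI_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The result is a `Counter` keyed by operation, for example "REGISTER WRITE" or "REGISTER BURST READ", which makes it easier to see whether reads, writes, or both are failing.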

mattkrebs · May 29, 2017, 1:28pm

I have been testing with two gateways running mLinux 3.3.6 and the current packet forwarder (3.0.0-r5).
One gateway is still up and running after 5 days; the other one stopped about 10 minutes after I began to try the method suggested by @ssozonoff.
Before, I had mostly unconfirmed uplinks and occasional confirmed uplinks.
I get the lgw_receive:1161: WARNING: 42 = INVALID NUMBER OF PACKETS TO FETCH, ABORTING message from time to time, but it’s never close to the moment it stops.
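
To check how close these warnings are to the point where the log stops, they can be pulled out together with their position in the file. The Python sketch below is an illustration only: the function name is hypothetical, and since these log lines carry no timestamp, the distance is measured in lines from the end of the log, not in time.

```python
import re

# Matches e.g. "lgw_receive:1161: WARNING: 42 = INVALID NUMBER OF PACKETS TO FETCH, ABORTING"
FETCH_WARNING_RE = re.compile(
    r"lgw_receive:\d+: WARNING: (\d+) = INVALID NUMBER OF PACKETS TO FETCH"
)


def find_fetch_warnings(log_text):
    """Return (line_number, reported_count, lines_before_end) per warning."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        match = FETCH_WARNING_RE.search(line)
        if match:
            lines_before_end = len(lines) - 1 - index
            hits.append((index + 1, int(match.group(1)), lines_before_end))
    return hits
```

A warning whose third value is small sits near the end of the log, which would point to a connection with the stall; consistently large values would support the observation that the two are unrelated.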

kersing · May 29, 2017, 4:02pm

Thank you all for your feedback. I’ll try again to replicate this, so far I have not been able to.

@mattkrebs are both gateways the same hardware? With the mCard in the same slot? (Which slot?)

ssozonoff · May 29, 2017, 5:22pm

I have the mCard in slot 2, in case that matters; slot 1 is empty.

mattkrebs · May 30, 2017, 7:20am

Both gateways are identical: MTCDT-H5-210L-EU-GB with MTAC-LORA-868 (1.0), using the original power supply.
The mCard is in slot AP1 (I switched it from AP2 - with no difference).
Both are running mLinux 3.3.6 since last week. The GSM functionality is not configured, just Ethernet.

I have a third identical gateway, but that one is still on mLinux 3.2.0 and downgraded to the old packet forwarder (I need it to be up and running). It showed the same problems when I upgraded it to the new forwarder one time.

afnic_labs · May 30, 2017, 8:43am

Please correct me if I am wrong, but when there is a “Last seen” status every 30 seconds, I do not receive any valid packets.
I tried the hard way: completely switching off the gateway, letting it cool down, and switching it on again after some time. This way it works sometimes.

When the “Last seen” status shows more than 30 seconds, packets are being sent.