Mikrotik WAP LR8 stops forwarding

I have a Mikrotik WAP LR8 (running 7.18.1) connecting to TTN using Basic Station (BS) with CUPS.

After a few hours (non-deterministic, but less than 12) it just stops forwarding with the log message:

[FWD] gateway-0 forwarder stopped

There are the typical messages in the log (RX packets, LNS signalling, etc.) but nothing to suggest a problem. I know everyone needs a nap from time to time, but this thing can’t stay online.

Has anyone else seen this?

Thanks, Rob

Hi,
I had this issue with firmware 7.17.x, but I’ve since upgraded my gateways to 7.18 and 7.18.2 and have not seen it again. Some gateways have been up for more than a week … I’ll keep monitoring.
Can you try 7.18.2 and see if this is still happening?

Bad news: I got the issue today on 2 WAP LR8s, running 7.18.1 and 7.18.2. One of them, in addition to being disconnected, had lost its link between device & Network Server definition.
I’ve opened a ticket with Mikrotik support.
@robshep if you’re still experiencing the issue I suggest you do the same, to put pressure on them to have this issue investigated & solved, since this is really a showstopper for an otherwise quite nice gateway …

I’ve seen the same behaviour on my gateway (also running v7.18.2).

Though I can’t reproduce it exactly, it seems to occur mostly when the router can’t reach the server on the first attempt or has abnormally high latency while doing so.

My internet connection is “reset” every night and comes back with a different public IP. The router appears to have difficulties re-establishing the connection to the TTN server in this case.

I solved it by automatically rebooting the WAP whenever the public IP changes. My DSL router reports its public IP to an MQTT broker; the Mikrotik subscribes to the relevant topic and runs a script when its value changes.
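For anyone who wants to try the same idea from an external machine instead, here is a minimal Python sketch of just the "did the public IP change?" decision. It is not the RouterOS script described above. `PublicIpWatcher` and `handle_report` are hypothetical names, the reboot is only a callback you would supply yourself, and the MQTT subscription that delivers each reported IP is left out.

```python
from typing import Callable, Optional


class PublicIpWatcher:
    """Calls `on_change(old_ip, new_ip)` when a reported public IP differs from the last one."""

    def __init__(self, on_change: Callable[[str, str], None]) -> None:
        self._last_ip: Optional[str] = None
        self._on_change = on_change  # hypothetical hook, e.g. "reboot the gateway"

    def handle_report(self, payload: str) -> bool:
        """Process one reported IP; return True if the change action was triggered."""
        ip = payload.strip()
        if not ip:
            return False  # ignore empty reports
        previous, self._last_ip = self._last_ip, ip
        if previous is None or previous == ip:
            return False  # first report, or nothing changed
        self._on_change(previous, ip)
        return True


if __name__ == "__main__":
    watcher = PublicIpWatcher(
        lambda old, new: print(f"public IP changed {old} -> {new}: reboot gateway")
    )
    # Documentation-range addresses, used purely as sample data.
    for report in ["203.0.113.10", "203.0.113.10", "198.51.100.7"]:
        watcher.handle_report(report)
```

The first report only records a baseline, so a restart of the watcher itself does not trigger a reboot.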

Mikrotik support finally responded to my bug report, asking me to test on 7.19RC. I’ve not done so yet, but looking at the changelog I don’t think they fixed it. I’ve built monitoring scripts that use the Mikrotik API to check the LoRa status and reboot the gateway when needed. After the reboot they also reconfigure LoRa if necessary, since I’ve experienced LoRa config loss when the issue occurs. This is a real pain …
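The general shape of such a watchdog, as a rough Python sketch rather than the actual scripts: it uses the "forwarder stopped" log text quoted at the top of the thread as the failure signal (the scripts above check the LoRa status through the API instead). The three callables are hypothetical placeholders for whatever access you have to the gateway; no real Mikrotik API calls are shown.

```python
from typing import Callable, Iterable

# Text taken from the log line quoted in the first post.
STOP_MARKER = "forwarder stopped"


def forwarder_stopped(log_lines: Iterable[str]) -> bool:
    """Return True if any of the given log lines reports a stopped forwarder."""
    return any(STOP_MARKER in line for line in log_lines)


def check_and_recover(
    fetch_log_lines: Callable[[], Iterable[str]],  # hypothetical: new log lines since last check
    reboot: Callable[[], None],                    # hypothetical: reboot the gateway
    reconfigure_lora: Callable[[], None],          # hypothetical: restore the LoRa config
) -> bool:
    """Run one watchdog pass; return True if a recovery was attempted."""
    if not forwarder_stopped(fetch_log_lines()):
        return False
    reboot()
    # Reapply the LoRa settings afterwards, since config loss was observed with this fault.
    reconfigure_lora()
    return True


if __name__ == "__main__":
    sample = ["[FWD] gateway-0 forwarder stopped"]
    check_and_recover(
        lambda: sample,
        lambda: print("rebooting gateway"),
        lambda: print("reapplying LoRa config"),
    )
```

Passing the actions in as callables keeps the decision logic testable without a gateway attached. A real version would also have to wait for the gateway to come back before reconfiguring.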

A few days ago I installed v7.19 and shortly after that v7.19.1. With these versions, the problem actually became worse for me. With v7.18.x I had those “stops” only occasionally; with v7.19.x they seem to have grown into a daily occurrence. :frowning:

Hi All,
I tossed this otherwise good gateway in the graveyard and deployed another brand.
I have no capacity to try-this-try-that-release, but I hope you get somewhere.

Thanks

Rob

I’ve just upgraded from 7.18.2 to 7.19.1 on one of my GWs and I’m also experiencing the same behavior where the forwarder stops more frequently.
On another GW I’m testing 7.20Beta2 which seems to behave better (for now).
I’m constantly submitting supout.rif files to support, but it seems they’re not managing to figure out the issue. This is a real pain & waste of time.

I’ve also seen weird issues with duty cycle and I’m not really sure about the cause. It might be a bug caused by the GW …
I’ve seen the message “Sub-band is blocked for XXX min” on TTN’s console for two different GWs when no large amount of DL was sent (no specific app DL etc.). The curious thing is that TTN’s main page for the GW showed 0% duty cycle usage for all the frequencies. On one GW the network was trying to send DLs after devices joined and they were blocked. I had to disable duty cycle enforcement for the GW to get things back to normal.

I’m wondering where this could be coming from. Has anyone else seen this?

So far I’m not getting anywhere :frowning:
The problem is that I’ve pushed several GWs to customers so now I have to ensure it’s working …

On the interface going down, Mikrotik IS working the issue. There is already a working theory as to what might cause the issue, and an interim firmware has been devised to test the theory. Results should be in very shortly (as in: days).

This may still amount to nothing, but there IS movement.

Yes, actually I’ve noticed that here as well, and only with the latest firmware version. But my case might be doubtful because there were a couple of confirmed uplinks beforehand from a single device (not mine). So the block might be genuine.

But don’t these messages come from the uplink (TTN), not from the gateway itself?

Yes, there is movement, but it’s terribly slow. I cannot understand how they cannot reproduce the issue on their end. I’ve been testing internal dev firmware for a while and sending lots of supout.rif files to help debugging.
I’ve now had to upgrade to 7.20Beta2 on 2 GWs since 7.19.1 was unusable. The LoRa connection dropped all the time, and my monitoring scripts that use the API were not working either, since the API server also seems to be broken (at least the LoRa part).
I hope they will really pinpoint the issue because so far every testing release I’ve had did not show any improvement :frowning:

I’m wondering whether the GW itself is reporting a duty cycle exhaustion, which then causes the message to be displayed on TTN. I’ve seen that TTN requires a specific implementation for the xtime field that is used for the duty cycle computation. I’ve reported it to Mikrotik and they want me to send logs. Unfortunately I’m reluctant to turn duty cycle enforcement back on for the impacted GWs, since blocking DL causes lots of issues.
The thing is that, if the TTN duty cycle counter shows a very low % usage as in my case, the only cause of the message might be the gateway. That’s my theory …
If you check the screenshots I’ve sent, the console panel shows 0% usage on all frequencies and 10 DL (ACK) 3 days ago. The GW itself also shows 10 TX packets for a total airtime of 3.950 s. If the packets were sent 3 days ago then the duty cycle shouldn’t be an issue …
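To put that airtime in perspective, here is a back-of-the-envelope check in Python. The one-hour window and the 1% limit are my assumptions for illustration only: duty-cycle limits depend on the region and sub-band, and TTN’s actual accounting may use a different window or method. Even if all 3.950 s had been transmitted within a single hour on one sub-band, that would be about 0.11% usage.

```python
def duty_cycle_usage(airtime_s: float, window_s: float) -> float:
    """Fraction of the observation window spent transmitting."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return airtime_s / window_s


def airtime_budget_s(limit: float, window_s: float) -> float:
    """Maximum transmit time allowed in the window for a given duty-cycle limit."""
    return limit * window_s


if __name__ == "__main__":
    AIRTIME_S = 3.950   # total TX airtime reported by the gateway (from the post above)
    WINDOW_S = 3600.0   # assumption: one-hour accounting window
    LIMIT = 0.01        # assumption: 1% duty-cycle limit on the sub-band

    usage = duty_cycle_usage(AIRTIME_S, WINDOW_S)
    budget = airtime_budget_s(LIMIT, WINDOW_S)
    print(f"usage: {usage:.4%} of the window, budget: {budget:.1f} s")
    print("within limit" if AIRTIME_S <= budget else "over limit")
```

Under those assumptions the reported airtime is roughly a tenth of the budget, which fits the suspicion that the “blocked” message is not caused by genuine downlink traffic.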

Have you tried rolling back to much earlier releases? I deployed one as a test, so I could familiarise myself with the product and do some simple comparative evaluations, and it has run reliably and continually since May 21, bar a couple of power cuts and a broadband outage, from which it recovered autonomously and gracefully :slight_smile: Once I have GWs deployed I find little need to update unless problems or specific security/certificate issues arise, so perhaps an old firmware build will show whether a change in the hardware or some other factor has started causing the problems recently reported. Rob, it might be worth pulling it out of the bin and trying … saves the e-waste :wink:

Which version are you running?

Whatever was stock when I acquired it in early '21 … I would need to go check, which isn’t a quick task at this point, sorry.

Actually I was able to divert and do a quick remote log-in and looks like

RouterOS v6.48.2 (stable)

…if that helps any?

Thanks. That’s indeed a very old version … Are you using the Basic Station protocol or the simple UDP packet forwarder?

Again I’d need to go check, but given its age and my deployments at that time it’s likely UDP. Apart from TTKGWs & later TTIGs, I think 80-90% were UDP back then (it was my ~40th personal, as opposed to client/collaborator, deployment), and I work on the basis of if it ain’t broke don’t fix it! :slight_smile: