Mikrotik Gateway - Offline - not reconnecting

Jeff-UK · August 13, 2020, 9:34pm

This is good/interesting to know…hadn’t seen that one called out before - do you have links to investigations/examples and know which NAT implementations are likely to suffer?

Only ask as I often have 4-12+ gateways running locally for tests and commissioning/updates and only have 3 Broadband/Backhaul connections available (ok, plus occasional 3/4G tethers, or if a GW runs directly on an LTE connection)… so I can have 2-6+ gateways on the same network (peak >10 off!).

Where I can I use @Kersings MP Fwdr vs simple Semtech UDP, which may mitigate the issue, but often I don’t have a choice…

descartes · August 13, 2020, 9:37pm

And therein lies the problem.

People lie at varying levels of technical ability & interest and the resources can be gift wrapped & handed to them on a plate, but they haven’t the skills or aptitude to make use of them. And if they are using a device as an appliance, they may not have the financial resources to get someone else to make use of these resources to even identify issues, let alone implement & test a fix.

As a minimum, I want to be able to have a backup of a known good/working image that I can roll back to if future updates or changes I make go pear shaped.

There is definitely a space in this market for a gateway that can be built from the ground up using open source (as in, you compile the actual source), to allow experimentation and freedom.

But many many gateway owners are limited in resources of time & budget and just want to achieve an end result. So they turn to vendors that have a good track record of delivery, regardless of how open the implementation is, trusting that the system meets basic requirements and that the manufacturer will resolve any fundamental issues with functionality.

I could, from past life experiences, build a kit car from parts from a scrap yard, to the level of taking an engine entirely apart to clean it and put it back together. I know how a carburettor works, how to set the timing on an old style ignition, how to gap spark plugs, bleed brakes, change oil and so forth. But I have neither the time or inclination to do so, as a result, the Mitsubishi Outlander (LPG for the green win) I have is a bit of a mystery to me under the hood and I give it to a local garage for any servicing or maintenance that is required.

But at the moment we have many gateways that are very close to being open enough for total community involvement but not quite close enough, either by design, or in the main, just lack of documentation.

I for one would be happy to be involved in a collaborative effort to identify a hardware & OS platform with a concentrator which was demonstrably open such that even elderly hackers such as myself can spend an evening learning to compile the firmware and install. The more ambitious may even look at reworking the SX1302 reference design in to a four layer board.

I’m not anticipating that the current vendors are going to be persuaded to make everything they do open from the ground up overnight, such persuasive energies may be best deployed in getting some technical assistance from Semtech and from one of the Linux based board developers, perhaps Olimex.

cslorabox · August 13, 2020, 9:47pm

Pragmatically speaking, it would be great to get into the box with ssh or a serial port and run tcpdump or tshark on it.

But it would also be sufficient to run that on a router (or temporary router, such as a Linux PC with internet connection sharing) sitting between it and the Internet. It is not, however sufficient to run it on another host on the subnet, as many modern network topologies don’t even put traffic down the wire to a host where it doesn’t belong.

I’ve implemented the server side of the Semtech UDP protocol in python before, could actually be a fun little project to take a UDP echo example and turn it into a dummy server one could point a gateway at for testing. ACK the gateway requests with the right tokens, print out what it sends, never ask it to do anything (or maybe ask it to transmit a short raw LoRa message as an infrequent test). A proxy mode for man-in-the-middle debugging would be fun, too… might occasionally inject enough latency to cause a missed window, but it would be a brief test kind of thing, not a leave running one. And as a proxy it would be the legitimate destination of the traffic so not need to run on the router.
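As a rough illustration of that dummy-server idea (not the client-owned implementation mentioned in this thread), here is a minimal sketch. It assumes the commonly documented Semtech UDP framing: byte 0 is the protocol version, bytes 1-2 a random token, byte 3 the message identifier, and bytes 4-11 the gateway EUI, with identifiers 0x00 PUSH_DATA, 0x01 PUSH_ACK, 0x02 PULL_DATA, 0x04 PULL_ACK and 0x05 TX_ACK. The function names `build_ack`, `describe` and `serve` are made up for this example.

```python
import socket

# Message identifiers, as assumed from the Semtech packet forwarder protocol notes.
PUSH_DATA, PUSH_ACK, PULL_DATA, PULL_ACK, TX_ACK = 0x00, 0x01, 0x02, 0x04, 0x05


def build_ack(datagram: bytes):
    """Return the ACK for a PUSH_DATA or PULL_DATA datagram, else None."""
    if len(datagram) < 12:  # 4-byte header plus 8-byte gateway EUI
        return None
    ident = datagram[3]
    if ident == PUSH_DATA:
        ack_ident = PUSH_ACK
    elif ident == PULL_DATA:
        ack_ident = PULL_ACK
    else:
        return None  # e.g. TX_ACK, which needs no reply
    # Echo the version byte and the 2-byte token back unchanged.
    return datagram[0:3] + bytes([ack_ident])


def describe(datagram: bytes) -> str:
    """Summarise a gateway datagram (at least 12 bytes) for printing."""
    names = {PUSH_DATA: "PUSH_DATA", PULL_DATA: "PULL_DATA", TX_ACK: "TX_ACK"}
    kind = names.get(datagram[3], f"unknown 0x{datagram[3]:02x}")
    eui = datagram[4:12].hex()
    body = datagram[12:].decode("utf-8", errors="replace")
    return f"{kind} from gateway {eui} {body}".rstrip()


def serve(host: str = "0.0.0.0", port: int = 1700) -> None:
    """Print everything a gateway sends and ACK it; never sends a PULL_RESP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, addr = sock.recvfrom(65535)
            if len(datagram) < 12:
                continue  # too short to be a gateway message
            print(addr, describe(datagram))
            ack = build_ack(datagram)
            if ack is not None:
                sock.sendto(ack, addr)
```

Calling `serve()` on a host the gateway is configured to forward to should then show every datagram the gateway emits, which is enough to tell "nothing is sent" apart from "sent but never acknowledged".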

While the tests substituting a different gateway somewhat suggest otherwise, it can’t be ruled out that we’re chasing some temporary server side problem, especially competing claims to an EUI.

kersing · August 13, 2020, 10:29pm

RouterOS has a built-in packet sniffer. It even allows forwarding packets to a streaming host.
When configuring the capture one just needs to make sure not to capture the traffic generated by the user connection to the MikroTik.
It’s called Packet Sniffer and is located in the Tools menu.

descartes · August 14, 2020, 11:59am

Is there a repro for this?

How do you like my idea for an Open Gateway project?

cslorabox · August 14, 2020, 5:06pm

It would be a small project to create from scratch, but like the majority of the LoRaWAN-related code I’ve written, the actual implementation I currently have is owned by a client as it is a small and integral piece of one of their systems. Although it came to have a different purpose it started out as a drop in replacement out of desperation for an older, buggy version of the LoRaServer UDP to MQTT bridge, quite literally all I did was lookup “python UDP example” and start grafting on logic to handle the sorts of messages the packet forwarder actually generated and wanted. Later when I switched to a more modern packet forwarder I had to fix a few places where I’d been lazy with the details.

It’s a nice idea, but a lot of work that tends to be justified when there’s a volume of similar boxes people are wanting to use. My impression is that people are already doing this for some of the Multitech boxes (eg, taking the open source build and doing more with it, particularly with regard to fleet management for TTN communities), Dragino seems to effectively do it and while I might be mistaken I thought Kersing had some past comments about doing a variation of their image with his packet forwarder on it. And then essentially anything based on a pi already is that way, unless someone chooses a vendor install with extra cheese on top.

descartes · August 14, 2020, 9:46pm

It is a cold day in hell when a client has enough money to pay for the IP, especially if I then can’t use it else where.

As for a gateway, no participation in a project that addresses your concerns then? Be careful, I may get a concentrator design done, implement it on OpenWRT and not fulfil your expectations.

cslorabox · August 14, 2020, 10:41pm

This is both off topic, and utter nonsense. Most people in the world, be they independent contractors or employees, are paid for their efforts. A comparatively smaller number sell (title or licenses to) something over which they ever had ownership.

I frankly forget if it was a cold day in January or a scorching one in July, but it was a quite ordinary day on which I billed a few hours for that particular tiny part of creating their (not “my”) codebase.

This is even more true of boutique software: why develop at your own expense something that would be difficult to sell elsewhere, when the people who need it will simply pay you to create theirs?

If I’d invested in speculatively developing it, and then someone came along and wanted to buy out my investment, then sure that would be expensive. But in this more typical case, I was probably paid more to identify that this particular component needed creation than I was for the time spent to actually create it. I made good money doing it, but in an entirely ordinary and everyday sort of way.

Today, if I had an actual need for a python implementation of the Semtech UDP protocol over which I had ownership (for example, to give away licenses to), it would take an hour or two to create one, starting from a UDP example and the Semtech docs as I described above and did before. The thing is, I don’t have such a need, and the idea that even anyone else had one was simply debug brainstorming.

descartes · August 14, 2020, 10:51pm

I’m not most people - I retain IP in most instances as to get exclusivity to the research, design, implementation & testing is far outside most budgets - I choose not to be prevented from fulfilling similar business problems else where because I wrote some code for someone once that does something similar. I get reasonably remunerated for the first implementation and then make hay on subsequent implementations, often folding back enhancements in to the original/previous clients versions, certainly any fixes that arise. With a clearly identified market place, many potential clients will have similar business problems to solve.

Still interested to hear if you are up for some Open Gateway action.

cslorabox · August 14, 2020, 11:51pm

In terms of the technical effort, this is another “been there, done that but someone else paid and owns it.”

Nothing really prevents me from helping with an open effort, and if something gets going I’m happy to contribute assistance to solve key problems. But the reality is that the people I know who have a serious need for gateways and are willing to buy the parts, set them up in the field, etc, are precisely the people who paid me to design one when we could not find anything that met their full range of needs.

A key part of what that provided was a sufficient scope of need, focused on a single hardware configuration, to make doing it moderately sensible.

So, for example, if a bunch of RAK7258 owners were really clamoring for a tuned DIY build, sure, I’d help out. Or some other viable configuration for which I had access to reference hardware.

But in the course of considering possibilities, I already own three concentrators, one of which matches my client’s current business plan. Sure, I could go buy a pygate and wire it to a pi zero… but in the grand scheme of things, what does that really accomplish, and why should I spend my money on it? I don’t have an unmet gateway need.

Personally, an 8-channel LoRaWAN concentrator is an expensive component.

Professionally? It’s all about the u.fl connectors and mPCIe standoff stacking height.

Building code to run on integrated hardware, or 8cm jumper wire hardware is the easy part.

Bolting stuff together in a box suitable for the field is where it turns out to be hard.

cultsdotelecomatgmai · August 15, 2020, 8:51am

Please can @Johan_Scheepers advise if his gateway is connected to TTN and working normally.

If yes, then that’s good.
If no, then we have to acknowledge that this topic’s “you have a sprained ankle? let’s do a heart transplant” and “if you’re trying to get over there then don’t start from here” approach has not worked.

As a fan of reliability engineering I think that a community of >100k people with >12k connected gateways should be doing better.

Johan_Scheepers · August 15, 2020, 9:00am

Hi,

No, still no luck. I have upgraded, reset, and reconfigured, but it still does not pass the TTN traffic to LoRaWAN or the LoRaWAN traffic to TTN.

I posted on the Mikrotik forum 3 days ago but they have not approved my post or looked at it.

I have sent them a mail now, but it can take up to 3 working days for them to respond.

bwooce · August 15, 2020, 11:33am

Can you try sniffing some packets please? It’d be great to know which TTN endpoint it’s trying to connect to, for a start. Which one is configured in? The default mikrotik ones, and which one did you choose?

It is weird that its RX is okay but it’s not forwarding it on. It’d be great to see a trace of it trying to.

arjanvanb · August 15, 2020, 12:06pm

This.

The gateway does not know any of the LoRaWAN secrets, so cannot validate a LoRaWAN MIC, hence has no way to rightfully assume it’s only seeing LoRaWAN. And yet, still it does. Also, it may indeed be configured to also show traffic that doesn’t pass a basic CRC. (Or even shows packets that have no CRC at all?)

So yes: we’ve seen these gateway logs to be confusing before. When it says something is a join request, join accept, uplink or downlink, then that’s only based on boldly interpreting the message’s first byte, without even knowing if it’s LoRaWAN to start with. (And given the IQ inversion it should not hear traffic from other LoRaWAN gateways, so should not receive join accepts nor downlinks?) And what it’s showing as the DevAddr is only based on whatever is in the message’s 2nd thru 5th byte (1…4 with zero-based index), reversing their order, again boldly assuming it’s LoRaWAN. Also note that LoRaWAN should always use coding rate 4/5.
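To make that "bold interpretation" concrete, here is a small sketch of what such a log line amounts to. It is not the MikroTik or Semtech code; `naive_interpret` is a made-up name, and the message-type table is the MType field (top three bits of the first byte) as I understand it from the LoRaWAN specification.

```python
# MType values from the top three bits of the MHDR (first) byte.
MTYPES = {
    0b000: "Join Request",
    0b001: "Join Accept",
    0b010: "Unconfirmed Data Up",
    0b011: "Unconfirmed Data Down",
    0b100: "Confirmed Data Up",
    0b101: "Confirmed Data Down",
    0b110: "RFU / Rejoin Request",
    0b111: "Proprietary",
}


def naive_interpret(phy_payload: bytes):
    """Guess (message type, DevAddr) without any proof the bytes are LoRaWAN."""
    if not phy_payload:
        raise ValueError("empty payload")
    mtype = MTYPES[phy_payload[0] >> 5]
    dev_addr = None
    # Only data messages carry a DevAddr; it sits in bytes 1..4, little-endian.
    if mtype.endswith(("Data Up", "Data Down")) and len(phy_payload) >= 5:
        dev_addr = phy_payload[1:5][::-1].hex().upper()
    return mtype, dev_addr
```

For example, `naive_interpret(bytes.fromhex("4004030201"))` labels the bytes as an unconfirmed uplink from DevAddr `01020304`, and it would do the same for any non-LoRaWAN packet that happens to start with `0x40`.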

Asides: gateway EUI, DevAddr, and anything else that’s transmitted through the air are not secret. No need to redact those in screenshots.

And as a workaround for the very annoying handling of images in desktop browsers on this installation of the Discourse forum software, one can add a relative image size to the Markdown text, like the ,70% and ,80% in ![image|1000x75,70%](upload://2RGfN8nRVM6LsGHHqkQPdfe2cPC.png) and ![image|799x341,80%](upload://k402ihXXpAp8H3rQgx6mPTNPAyS.png) to get:

cslorabox · August 15, 2020, 1:26pm

I seem to recall intentionally mis-setting this in the past as an experiment and still getting a moderate amount of traffic through

While you’re not wrong, the console logs of the current Semtech packet forwarder repo assume they can extract a device address from messages, too.

One can’t read too much credibility into these things, but they can be handy for debugging when you see something you were expecting to.

Also note that LoRaWAN should always use coding rate 4/5.

That’s probably the most important point, and I’ll readily admit I’d previously overlooked it. When filtering to only CR 4/5 messages, the log gets a lot shorter. But I suspect it does still show some valid traffic.
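The same filter can be applied to captured forwarder traffic rather than the on-device log. The sketch below assumes the Semtech UDP JSON layout, where each `rxpk` entry carries `codr` (e.g. `"4/5"`) and `stat` (1 for CRC OK, -1 for CRC fail, 0 for no CRC); `lorawan_candidates` is a hypothetical helper name.

```python
import json


def lorawan_candidates(push_data: bytes) -> list:
    """Return the rxpk entries of one PUSH_DATA datagram with CR 4/5 and CRC OK."""
    # PUSH_DATA: 4-byte header (identifier 0x00 in byte 3), 8-byte EUI, then JSON.
    if len(push_data) < 12 or push_data[3] != 0x00:
        return []
    body = json.loads(push_data[12:].decode("utf-8"))
    return [
        pkt
        for pkt in body.get("rxpk", [])
        if pkt.get("codr") == "4/5" and pkt.get("stat") == 1
    ]
```

Anything this drops is either not plausible LoRaWAN or failed its CRC, so what remains is a better estimate of the traffic that should be forwarded.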

Although I’ve heard of radios getting into a mode (likely a mis-set internal bit) where they simply invent CRC error packets and never receive any good ones, I’ve not given up on suspicion of a software protocol or IP networking issue. But we’re operating on limited information.

Getting some raw information from the packet capture would be great. Not only would this expose networking issues, it should be possible to take a message from a known node seen in the packet capture, and compare it to the raw message seen by another gateway, or that the node was known to transmit (ie, the post-encryption buffer actually loaded into the node’s radio).
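A sketch of that comparison, assuming the capture contains Semtech UDP PUSH_DATA datagrams whose `rxpk` entries hold the raw message as padded Base64 in `data`. The name `contains_payload` and the idea of passing the node’s known transmit buffer as `expected` are illustrative only.

```python
import base64
import json


def contains_payload(push_data: bytes, expected: bytes) -> bool:
    """True if any rxpk in this PUSH_DATA datagram carries exactly `expected`."""
    if len(push_data) < 12 or push_data[3] != 0x00:
        return False  # not a PUSH_DATA datagram
    body = json.loads(push_data[12:].decode("utf-8"))
    for pkt in body.get("rxpk", []):
        if "data" in pkt and base64.b64decode(pkt["data"]) == expected:
            return True
    return False
```

Running this over the datagrams captured from two gateways with the same `expected` bytes shows whether both actually forwarded the identical message.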

fcojmontilla · August 18, 2020, 10:27am

Hi,

@Johan_Scheepers Will try to give a hand (Mikrotik Consultant & Trainer), need screenshots of:

RouterOS version (can be seen in winbox window titlebar, or system > packages)
Routerboard firmware ( system > routerboard )
IP > DNS
IP > DNS [Static] button
LoRa > Devices, double click yours, switch to Stats tab

[Screenshot, 2020-08-18 12:20:08]

Here the ratio between Valid RX Count (radio received messages) vs RX Forwarded (messages sent to TTN) is what tells us if traffic is being forwarded or not to the gateway. (Device uptime: 14 days.)

I’ve been running a LoRaWAN gw for months, rock solid so far, but for the TTN servers, whose connectivity is flaky, not to mention monitoring.

When it has failed and it wasn’t the gateway (I found DigitalCatapultUK to be the most reliable from here), the problem was usually DNS resolution.

Can you run tools > traceroute to your TTN gateway, leave it running some minutes and post a screenshot?

You have plenty of killer tools inside ROS for troubleshooting, as was mentioned in the thread (packet sniffer, packet capture including live streaming to Wireshark, etc).

I suspect problem here is either ROS version (I wouldn’t use anything older than 6.46.5 for LoRa deployments, while I only use long-term 6.45.x on pure network routers) or UDP masquerading issue.

Networking-wise, the forwarder and its protocol (UDP) aren’t the best fit for having several LoRaWAN devices behind the same internet uplink; it may require some traffic-mangling rules to get things right.

Johan_Scheepers · August 18, 2020, 1:19pm

Hi,

Thanks,

First I can ping via the ethernet interface the RouterOS but can not log into the RouterOS with WinBox.

Now I connect to RouterOS via the WiFi AP; WinBox works and I have full internet access.

system > packages

system > routerboard

IP > DNS

IP > DNS [Static] button

LoRa > Devices, double click yours, switch to Stats tab

Thank you for your time

fcojmontilla · August 18, 2020, 1:48pm

Hello,

First I can ping via the ethernet interface the RouterOS but can not log into the RouterOS with WinBox.

Looks like you kept the default firewall, which blocks connections coming from ether1… as the wAP is a one-ether-port device, it is blocking WinBox from connecting.

As this is LAN device, I assume you have another router/firewall protecting your LAN; so you can safely disable ip > firewall > filter rules; or more easily, go to interfaces, interface list tab, there will be a LAN and a WAN list; remove ether1 from the WAN list and add it to the LAN list.

that’s why you can connect via wireless, because that port (wlan1) is not filtered by the firewall.

Your firmware (sort of router BIOS) is out of sync… press the [Upgrade] button, you’ll receive a “firmware successfully flashed” message; reboot afterwards.

Firmware upgrades come embedded with ROS upgrades, but ROS upgrades won’t flash firmware by default. Best practice is keeping firmware always in sync with ROS version to avoid issues.

I’m checking the eu.thethings.network traceroute and sadly (especially because TTN monitoring is so unreliable that it is useless) it seems they’re set up not to answer ICMP packets, making troubleshooting rather miserable… guess they’re scanned/attacked so much that they had to resort to that… but something tells me the problem doesn’t lie in L3 connectivity towards the TTN gateway.

This is odd… makes me think of some sort of internal issue with the radio. Can you please post the LoRa > your device window (settings, especially antenna gain)? Masking the IDs is fine. Untick forwarding of packets with errors.

In summary:

0.- disable the firewall/add ether1 to LAN interface list to be able to log straight via ethernet into the wAP.
1.- upgrade firmware and reboot. Check the status of the LoRa device.

Let me know how it goes to guide you further if firmware upgrading doesn’t fix it.

fcojmontilla · August 18, 2020, 2:08pm

Here’s a way to confirm if there’s L3 connectivity between your device and TTN gws.

You need to know the IP of the TTN gw first, e.g. 52.169.76.203 for router.eu.thethings.network.
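The address quoted here may well change over time, and DNS resolution was named earlier in this thread as a usual suspect, so it can be worth resolving the name yourself from a PC on the same network. A minimal sketch using Python’s standard `socket.getaddrinfo`; `resolve_ipv4` is a made-up helper, and the `resolver` parameter exists only so the lookup can be swapped out when testing offline.

```python
import socket


def resolve_ipv4(hostname: str, port: int = 1700, resolver=socket.getaddrinfo) -> list:
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

`resolve_ipv4("router.eu.thethings.network")` then gives the address(es) to enter in the connection filter; a `socket.gaierror` at this point would suggest DNS rather than the radio.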

Open IP > Firewall and go to the Connections tab; there should be some connections listed on your wAP, as apparently you have the default firewall configuration. If this is not the case, you’ll need to check this on your border router connecting to the internet.

Now click the funnel icon, and set it, (with 52.169.76.203 as IP) like

[Screenshot, 2020-08-18 16:03:56]

Here we can see a ping I’m running without any response (0 bytes reply), and two successfully established connections with the TTN gateway IP (same port 1700 for message upload and download) that sit established but idle (0/0 Rx/Tx bytes/s).

SACs stands for Seen reply, Assured, Confirmed, src-natted.

The first three flags (SAC) mean there’s proper bidirectional communication.

Johan_Scheepers · August 18, 2020, 2:17pm

Thanks, all up now; I have re-synced.

From the firewall I can see the TTN

Thank you for your help, very nice to meet you, and good to have Mikrotik Consultant & Trainer assistance.