Lora Packet Loss

We are just about to start our first larger application with LoRaWAN, but are still testing the system. Our setup:

  • 3 gateways (different types) in separate spaces
  • Some commercial nodes, e.g. one “theThingsNode”, some LMIC-Nodes for testing
  • different ranges, but all nodes still running on SF7

We are using node-red to receive packets from the discovery-Server and forward them to Thingsboard.

Overall results are good, but we are still seeing packet loss. As we are not sure about the reasons, we are investigating. The main question is: if/how can we reduce the loss rate?

Single packet loss is not so critical, but we also see some clusters of lost messages. The following log shows the time difference between two received messages and the packet loss derived from the node frame counters. Most of the time only a single packet is lost, but frequently we also lose up to 7 packets in a row.
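The counter-based loss evaluation described above can be sketched as follows. This is a minimal illustration, not the poster's actual node-red flow: the function names are made up for the example, and it assumes a 16-bit frame counter that is never reset and no duplicate deliveries.

```python
def lost_between(prev_fcnt: int, curr_fcnt: int, modulo: int = 1 << 16) -> int:
    """Uplinks missing between two consecutively received frame counters.

    Assumes a 16-bit counter that may wrap around, and no duplicates.
    """
    return (curr_fcnt - prev_fcnt - 1) % modulo


def loss_report(fcnts):
    """Summarise loss for a list of received frame counters, in arrival order."""
    gaps = [lost_between(a, b) for a, b in zip(fcnts, fcnts[1:])]
    lost = sum(gaps)
    sent = lost + len(fcnts)  # uplinks between first and last received, inclusive
    return {
        "received": len(fcnts),
        "lost": lost,
        "loss_rate": lost / sent if sent else 0.0,
        "longest_burst": max(gaps, default=0),
    }


# Made-up counters: one single loss (102) and one burst of 7 (105..111).
report = loss_report([100, 101, 103, 104, 112])
print(report)
```

Tracking the longest burst separately from the average rate matters here, because the clusters are the real concern, not the single losses.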

Sure, the packet rate is high (one message every 20 s for test purposes), but things are similar with lower rates. There are hours without a single loss, but most of the time it looks like the following log. The average loss rate is between 2% and 10%, but we also have times with 25% loss.

Any suggestions on how to analyze the situation? As far as we can see, there is no downlink on the gateway.

Here are some logs:


Hi @EFthings01, you ask:

… if/how can we reduce the loss rate?

You reduce the loss rate by identifying where the loss is happening and then making and testing changes around the point of loss.

Are you confident that all the nodes are actually sending all the uplinks? Check using debug monitors, etc.
Is the LoRa RF working well? Check the gateways and applications on TTN web console and via the MQTT feeds. Are the uplinks being received on the core by all the gateways as you expect? Are the RSSI and SNR numbers as you expect? Are they well within operating margins?
Are you using UDP-based packet forwarders? The public Internet typically loses about 1% of packets, both as single packet losses and as burst losses. TCP recovers from this loss, which tricks people into thinking that the public Internet is reliable; UDP does not recover. The legacy Semtech UDP packet forwarder is 100% guaranteed to lose traffic when used across the public Internet.

Once you narrow down the source[s] of the loss you can start to deal with it. Until then no-one on this forum can help much.

Hi Tim,

yes, that's what we are trying. But first of all: how much loss has to be expected under perfect conditions, 1%, 10%, 30%?!?

Following your advice I started a test: I switched off all nodes except one, which sits here with an RSSI of -65 dBm and sends one message every minute. That way I could watch the traffic at the gateway, in the application and in node-red, which receives the messages.

So far, all messages displayed in the gateway traffic have been forwarded correctly. But not all sent messages are shown in the gateway traffic.

Here is an example from the gateway log. This was one packet that was not forwarded. We see that the gateway received the message, but TTN did not get the data. This is a Kerlink iFemtocell running SPF:

Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: INFO: Received pkt from mote: 26012395 (fcnt=16298)
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: JSON up: {"rxpk":[{"tmst":552221707,"chan":1,"rfch":0,"freq":867.300000,"stat":1,"modu":"LORA","datr":"SF7BW125","codr":"4/5","lsnr":9.0,"rssi":-63,"size":21,"data":"QJUjASaAqj8Bk616AQ+63HXICLhD"}]}
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: ##### 2019-09-16 15:45:07 GMT #####
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: ### [UPSTREAM] ###
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: # RF packets received by concentrator: 1
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: # CRC_OK: 100.00%, CRC_FAIL: 0.00%, NO_CRC: 0.00%
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: # RF packets forwarded: 1 (21 bytes)
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: # PUSH_DATA datagrams sent: 2 (312 bytes)
Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: # PUSH_DATA acknowledged: 0.00%
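To match a gateway log entry like this against the counters seen at the backend, the unencrypted frame header can be read straight from the base64 "data" field. A minimal sketch, assuming the standard LoRaWAN 1.0.x uplink layout (MHDR, then DevAddr and FCnt in little-endian order); the function name is made up for the example, and it does not decrypt the payload or check the MIC.

```python
import base64


def parse_uplink_header(data_b64: str) -> dict:
    """Read the plain-text header fields of a LoRaWAN data uplink."""
    raw = base64.b64decode(data_b64)
    return {
        "mtype": raw[0] >> 5,                        # 2 = unconfirmed data up
        "devaddr": raw[1:5][::-1].hex().upper(),     # stored little-endian on air
        "fctrl": raw[5],
        "fcnt": int.from_bytes(raw[6:8], "little"),  # lower 16 bits of the counter
        "size": len(raw),
    }


# The "data" value from the rxpk line in the log above.
hdr = parse_uplink_header("QJUjASaAqj8Bk616AQ+63HXICLhD")
print(hdr["devaddr"], hdr["fcnt"], hdr["size"])  # 26012395 16298 21
```

The decoded address and counter agree with the "Received pkt from mote: 26012395 (fcnt=16298)" line, so the rxpk entries alone are enough to build the list of counters the gateway actually heard.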

We can change the gateway and use a TTN indoor gateway. There I cannot read the logs, but the loss is nearly the same. RSSI is slightly higher (-60 dBm), but we lose lots of traffic anyway.

True, we should send less often, but airtime is only 36 ms and there are no other nodes around.

So it seems pretty clear that losses happen between gateway and TTN, but it is not the Gateway itself.

Might the reason be our internet connection or the receiver?!? The internet connection here is very stable and fast (above 50 Mbit/s) and we had the same issue in other places, so it is unlikely to be an issue with the connection.

Any suggestions on how to dig deeper?

Hi @EFthings01, you ask how much loss to expect under perfect conditions… none of course, perfect is perfect. The world is not perfect so there will be some loss. However you use the data, it must be tolerant to incomplete data. You are doing the right thing to investigate any persistent loss.

So… what happens when you run the device with per-60s uplinks into both gateways active at the same time?

You can watch the TTN web console application / data and see the gateways metadata to see if the uplink is arriving via zero, one or two gateways.

If some uplinks are not delivered at all and one of the gateways has a much lower delivery rate then that could indicate different problems with the two gateways.
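The same comparison can be done offline on uplinks saved from the MQTT feed. A minimal sketch, assuming each saved message is a dict with a "counter" field and a "metadata" → "gateways" list whose entries carry a "gtw_id", as in the TTN v2 uplink JSON; the gateway IDs and the function name below are made up for the example.

```python
def gateway_delivery(uplinks, first_fcnt, last_fcnt):
    """Per-gateway delivery rate over a known range of frame counters."""
    expected = last_fcnt - first_fcnt + 1
    per_gateway = {}
    for msg in uplinks:
        for gw in msg["metadata"]["gateways"]:
            per_gateway.setdefault(gw["gtw_id"], set()).add(msg["counter"])
    arrived = {msg["counter"] for msg in uplinks}
    return {
        "missed_by_all": expected - len(arrived),
        "rate_per_gateway": {
            gw_id: len(counters) / expected
            for gw_id, counters in per_gateway.items()
        },
    }


# Hypothetical capture: counter 12 never arrived, gw-b also missed counter 11.
sample = [
    {"counter": 10, "metadata": {"gateways": [{"gtw_id": "gw-a"}, {"gtw_id": "gw-b"}]}},
    {"counter": 11, "metadata": {"gateways": [{"gtw_id": "gw-a"}]}},
    {"counter": 13, "metadata": {"gateways": [{"gtw_id": "gw-a"}, {"gtw_id": "gw-b"}]}},
]
print(gateway_delivery(sample, first_fcnt=10, last_fcnt=13))
```

Uplinks missed by every gateway point at the node or the RF path, while a single gateway with a clearly lower rate points at that gateway or its backhaul.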

There are known stability problems with the TTIG. Please search on the forum. The TTIG uses the new Semtech packet forwarder based on TCP but also has the problem of working via the temporary bridge at the TTN core to connect to the v2 core software.
I suspect that the Kerlink is using the legacy Semtech UDP forwarder (SPF in the log above). This will always lose some of the traffic, however good your local Internet access is.

The future should be better. The TTIG stability problems will hopefully be fixed. TTN will hopefully move to a v3 core with native support for the new Semtech packet forwarder and existing gateways, such as the Kerlink, will get support for the new Semtech packet forwarder.

However, if you’re looking for perfect then you’re in the wrong place!

There is no such thing as ‘perfect’ conditions of course, unless the TX and RX are in a ‘perfect’ Faraday shield with no other electronics or EMI around, so not likely.

To be more practical, on the bench in my shed, with a PC running nearby and at that signal strength I would expect packet loss, due mainly to CRC failures, to be under 0.1%.

The real world has a lot of EMI and interference, so that sort of packet loss is no surprise.

Please read my post: All packets are received by the gateway! CRC_OK: 100.00%

PUSH_DATA datagrams sent: 2
PUSH_DATA acknowledged: 0.00%

Technically I do not know how the connection works, but the source of loss is CLEARLY the connection from gateway to TTN.

Maybe there are some problems with the old SPF, but things are similar with the TTN indoor gateway. I understand that on the RF side there might be some reasons for loss. But for an IP connection between TTN gateway and TTN there is no good reason! Maybe somebody from TTN/TTI can explain?

Rewrite: Found this: https://github.com/gotthardp/lorawan-server/issues/227

PUSH ACK response time: "After upgrading to the latest code, I started seeing a lot of “ack_lost” warning messages. Then I realized that my gateway was not receiving any PUSH ACK from the server. It was a timeout problem, changing the packet forwarder parameter push_timeout_ms from 100 to 300 solved the problem."

People did. It is less apparent if you read their replies.

These statements have no relation to each other.

There’s nothing you have posted so far that would indicate that all packets transmitted have been received by your gateway. If in fact they have been, you could show a log of that. It would be quite useful to be able to show packets that were received, but not processed. Also if you control the gateway, at least in the ABP case (and even in the OTAA case, if your gateway handled the join) it is possible to write code to recover the payloads yourself, independent of submission to TTN.

That all of the packets the gateway tried to receive had valid CRC says nothing about packets it did not even notice or try to receive.

Technically I do not know how the connection works, but the source of loss is CLEARLY the connection from gateway to TTN.

One of the messages you received in response explained how the legacy UDP interconnect tends to lose data.

things are similar with the TTN indoor gateway

Another message explained the known TTIG issues. That situation indeed points out how keeping the TTIG firmware closed and thus unrepairable is utterly inexcusable, but that is distinct from your attempt to complain that no one understood your message.

If you want to do something productive, figure out if the messages are actually being received by your gateway. If they are, and you need a degree of performance which TTN cannot currently offer, you may want to think about implementing local decoding in parallel with the route through TTN. At the very least you’ll come to a sound understanding of all the moving parts, and gain insight into exactly where things are failing.

Hi all,

sorry if my post was misunderstood. I edited the post to make it more readable. Of course I read the replies, but from our test it was obvious where the loss occurs. We carefully watched the traffic for some time and checked the gateway logs later. Sorry I cannot present a log from there, but we clearly see the data in the gateway traffic, the application data and in node-red. We have a reduced environment with only one node sending and one gateway receiving, so we can watch the traffic through TTN easily.

The log is only an example of one lost packet! We can see that the packet was received by the gateway but did not arrive at TTN; the question is why.

We have been logging the hourly packet loss of three nodes for some days. Average loss is about 10%; some hours we have no loss, but all nodes have lots of hours with loss rates of more than 25%.

If it's a timeout issue, it could depend on the internet connection. I have increased push_timeout_ms in our gateway and we can see that the loss is lower now (though not zero). Maybe this is not the only reason, but there seems to be a connection between loss and timeout.
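For anyone who wants to try the same change, here is a minimal sketch of patching the setting in a forwarder configuration. It assumes the configuration follows the layout of the Semtech reference packet forwarder, where push_timeout_ms sits under "gateway_conf"; the file location on a Kerlink SPF install is not covered here, and the value 300 is simply the one from the GitHub issue quoted earlier, not a recommendation. The helper name is made up for the example.

```python
import json


def with_push_timeout(conf_text: str, timeout_ms: int) -> str:
    """Return the configuration text with gateway_conf.push_timeout_ms set.

    Note: json.loads rejects the C-style comments found in some stock
    global_conf.json files, so those would have to be stripped first.
    """
    conf = json.loads(conf_text)
    conf.setdefault("gateway_conf", {})["push_timeout_ms"] = timeout_ms
    return json.dumps(conf, indent=2)


# Hypothetical minimal configuration for illustration only.
original = '{"gateway_conf": {"serv_port_up": 1700, "push_timeout_ms": 100}}'
print(with_push_timeout(original, 300))
```

A longer timeout only changes how long the forwarder waits for the PUSH_ACK before counting it as missing. The forwarder does not resend the datagram, so if the PUSH_DATA itself was dropped on the way, the uplink is still lost.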

Here is a log of last night's transmissions. The diagram shows the time between two received messages. The values are normalized, so the interval should be 1. If one packet is lost, the interval is 2. From this we can estimate how many packets are missing.
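The estimate from normalized intervals can be sketched like this. It is a minimal illustration, assuming arrival timestamps in seconds and a fixed, known send period; the function name and the sample timestamps are made up.

```python
def missing_from_intervals(timestamps, period_s):
    """Estimate lost messages per gap from arrival times and the send period.

    Each interval is normalized by the period: about 1 means nothing was
    lost, about 2 means one message is missing, and so on.
    """
    missing = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        normalized = (later - earlier) / period_s
        missing.append(max(round(normalized) - 1, 0))
    return missing


# Made-up arrival times for a node sending every 60 s, with some jitter.
arrivals = [0, 60, 181, 240, 479]
print(missing_from_intervals(arrivals, period_s=60))  # [0, 1, 0, 3]
```

Rounding absorbs small timing jitter, but the frame counters remain the more reliable source when they are available.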


Followup: (Decoding SPF logs is not so easy)

After some homework we could compare the SPF-Logs and our data transmission logs. Here we can see exactly what happens. Sure, there are SOME packets lost by RF transmission, but not many.

Of 1140 packets, only 6 were lost because of a CRC error. About 228 transmissions were lost between gateway and backend. This is 20% of the data, which is surely too much.
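A comparison like this can be scripted. The sketch below is not the script used for the numbers above: it assumes the SPF log contains "Received pkt from mote: <DevAddr> (fcnt=<n>)" lines as in the excerpt posted earlier in this thread, and that the counters which reached the backend are available as a plain list. All function names are made up for the example.

```python
import re

MOTE_LINE = re.compile(r"Received pkt from mote: ([0-9A-Fa-f]{8}) \(fcnt=(\d+)\)")


def fcnts_seen_by_gateway(log_lines, devaddr):
    """Frame counters the gateway logged for one device address."""
    seen = set()
    for line in log_lines:
        match = MOTE_LINE.search(line)
        if match and match.group(1).upper() == devaddr.upper():
            seen.add(int(match.group(2)))
    return seen


def lost_after_gateway(gateway_fcnts, backend_fcnts):
    """Counters the gateway received but the backend never delivered."""
    return sorted(set(gateway_fcnts) - set(backend_fcnts))


def share(part, total):
    """Percentage of total."""
    return 100.0 * part / total


log = ["Sep 16 15:45:07 klk-wifc-040187 local1.notice spf: INFO: "
       "Received pkt from mote: 26012395 (fcnt=16298)"]
seen = fcnts_seen_by_gateway(log, "26012395")
print(lost_after_gateway(seen, backend_fcnts=[16297, 16299]))  # [16298]

# The figures quoted above: 1140 sent, 6 CRC errors, 228 lost after the gateway.
print(round(share(6, 1140), 2), share(228, 1140))  # 0.53 20.0
```

With those figures the RF side accounts for roughly half a percent of the packets, and the gateway-to-backend path for 20%.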

Main question is: What can be done?

I assume that there are not many applications that can tolerate up to 7 messages lost in a row.

Almost exactly 2 years ago Johan Stokking made the latest commit to the TTN packet forwarder to address the known problems with the original Semtech UDP packet forwarder that was released almost 6 years ago.

There were other developments by specific LoRaWAN operators, equipment manufacturers and members of the TTN community.

Semtech has now released their Basic Station software to address the shortcomings and fragmentation: https://doc.sm.tc/station/

What you do is your business. What I’m doing is:

  • Using the TCP-based TTN packet forwarder wherever I can (RPi + ic880a systems) - works well to TTN, no loss.
  • Learning about the whole system so that as TTI delivers v3 software with Basic Station support, I will be ready.
  • Rather enjoying the process of learning how packet/data loss causes stress to back-end systems and how to cope with that.

Check your other topic for the cause, I’m not going to duplicate replies…

Hi Tim,

when we started this post, we did not know that UDP could be the source of high packet losses. We did not even know what loss we could expect (see above). It seems that Kerlink uses outdated software on their gateways, right?

Now we have a bunch of these devices and need to find a solution. There is newer software from Kerlink, but they say it is not compatible with our device version (?!?). As far as I can see there is no easy way to adapt a new PF for the iFemtocell. If you have different experience, please let me know.

Best regards

It's a hard lesson to learn the hard way, as they say, but I’m told this is “Internet Communications 101” type stuff, so it sounds like you need to spend time on both better understanding and investigating the Internet, LoRa & LoRaWAN… As you have found, you have tapped into a good source of knowledge and often support through the TTN forum :slight_smile: so that should help with the latter, and given the breadth of experience I have witnessed with many of the Forumites, it is good for help with the former too! :wink:

I haven't analysed the details for quite some time, but my general perception for UDP over the Internet on many of my connections is that a loss of 0.1% to 1% is not uncommon, with bursty peaks of >10% at some times of day (so your experience of 20% may be within the bounds of probability). Also, if you read widely on the TTN forum you will see references to the increasing difficulty for the back end of handling the ever increasing scale of UDP traffic (vs TCP/IP), so, especially in recent days and weeks, it may be that the stresses increasingly seen on the back end handling UDP are also showing up and biting you. There have been some general discussions around potentially migrating older UDP-based GWs to other (e.g. TCP/IP) mechanisms to help offload and mitigate the stresses seen from the increasing load…

As for

I might suggest you try to stay on Jac’s good side and even offer to cross his palms with silver, as @kersing developed the mp-p-f (standing on the shoulders of giants, one might say). If you have enough units to make it worth your while funding a development, vs waiting to see if Kerlink step up or scrapping and starting again with other GWs and PFs, and if such packet loss is truly not tolerable, you may be able to persuade him to spin something specific that can help move away from use of UDP. :thinking:

If such losses are a problem in your application then I would question whether any RF-based solution is the right solution for your application, even a LoRa-based one with its improved properties over conventional RF modulation schemes like FSK wrt interference mitigation, resilience and the ability to extract signals from below the noise floor.

Answer: Lora is, UDP is not.

As we see in our current test, less than 1% loss was caused on the RF side; that's perfectly suitable for a monitoring task. But hours of loss caused by a traffic jam on TTN are not.

The decision from Semtech to use UDP is hard to understand in different ways. Not only loss is an issue, but security is too. It's like putting wooden wheels on a fast car.

There are many good reasons to go with LoRa! We have tried many others like NB-IoT and Bluetooth LE, but they also have several flaws regarding range, building penetration and costs. And the hard lesson we learned from commercial offers was that you may not trust advertising promises. With LoRaWAN we have an open standard. OK, now we are facing some problems with packet loss. But there is a chance to use other gateways with better technology.

So, overall I think LoRaWAN is a great step forward. The good news for us is that, with some help from the forum, we found the source of the high packet losses and are now able to find a solution.

So, I would like to thank you all for your help!


There are always tradeoffs.

On a traditional IP network (vs some mobile modems) UDP is normally one of the fastest schemes, so it was what was traditionally used for time-sensitive tasks like streaming audio or video.

The architecture of LoRaWAN cannot really make full use of stale data anyway (once you don’t complete the round trip within the receive window, the opportunity is lost), so something such as TCP that does retries may not have been seen as advantageous. For example in naive (and ideally avoidable) cases TCP may actually sit on the data for 200 ms before even starting to send it.
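One well-known way a TCP sender ends up holding back small writes is Nagle's algorithm interacting with delayed ACKs on the peer; treating that as the "naive" case meant here is an assumption. Where it applies, the usual way to avoid it is to disable Nagle on the socket, sketched below with a made-up helper name. No connection is opened in the example.

```python
import socket


def open_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled.

    With TCP_NODELAY set, small writes are sent immediately instead of
    being held back while earlier data is still unacknowledged.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = open_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

This only removes the sender-side wait; TCP retransmissions after real packet loss can still take longer than the receive window allows.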

That said, UDP may well not be the right protocol for your use today. It doesn’t really seem however that you are tied to it - most gateway hardware can run alternate software stacks built from source.


The decision from Semtech to use UDP is hard to understand in different ways.

My understanding is that the original Semtech UDP/IP packet forwarder was developed as an R&D tool and technology demonstrator, rather like the IBM LMiC software for devices.

Both of these software systems were developed before LoRaWAN exploded, little did anyone expect…
The LoRaWAN standards have had to have forklift upgrades.
The packet forwarders and device libraries are also undergoing major replacement work.

I worked on industrial systems through the network wars of the 1980s and 1990s. LoRaWAN is smooth running compared with those times!


A number of my nodes run on 3G backhaul and I find the mobile networks sometimes respond faster on UDP than std TCP IP sessions with less jeopardy for RX1 Window, relying less on RX2 etc. So it’s horses for courses.

TTN have acknowledged the difficulty in scaling UDP support, hence other threads… not surprising if you consider the rate of growth of the TTN & TTI user base. Over time we have celebrated key milestones in that growth (again search other threads) but wrt UDP handling it has come at a cost.

If I may refer to my own engagement experience: over the last 7+ quarters my number of GWs (near squaring at times, and ignoring stand-alone GWs or deployments on other nets) has gone 1, 2, 4, 7, 12, 18, 25, 30… now 31, with #32 pending & 2 ‘retired’ off network. Even with many of the later units now deployed without UDP, and a slow project to migrate older units off UDP, if anyone else has a similar story you can see the challenge for TTN to keep up! Note: mp-p-f is your friend :wink: even if sometimes the TTIG is not (yet!)

Remember with UDP the traffic jam is not necessarily in the TTN infrastructure but on the net in general…esp at some times of day or with special events given (video) media consumption growth over last year or 2!

Update: just caught up with Tim’s @cultsdotelecomatgmai & @cslorabox comments and concur they reflect what I have seen over the years and since engaging with TTN :slight_smile:

…reminds me of the old British Rail advert…”We’re getting there” :wink: :rofl:
