Data loss in the backend?

Hi there,

We are logging data losses for some test nodes, and over the last few days the loss rates have increased. The following log was written in Node-RED. Each record gives the node's packet counter, the time since the last reception (in seconds) and a timestamp, plus the number of lost packets wherever there was a gap:

{"arduino_test":10377,"arduino_test_delta":2463.921,"timestamp":"2019-10-22 14:13:19","arduino_test_lost":40}
{"arduino_test":10416,"arduino_test_delta":2336.506,"timestamp":"2019-10-22 14:52:16","arduino_test_lost":38}
{"arduino_test":10427,"arduino_test_delta":664.953,"timestamp":"2019-10-22 15:03:20","arduino_test_lost":10}
{"arduino_test":10428,"arduino_test_delta":57.415,"timestamp":"2019-10-22 15:04:18"}
{"arduino_test":10454,"arduino_test_delta":1557.818,"timestamp":"2019-10-22 15:30:16","arduino_test_lost":25}
{"arduino_test":10496,"arduino_test_delta":2519.84,"timestamp":"2019-10-22 16:12:16","arduino_test_lost":41}
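For anyone who wants to reproduce the loss count outside Node-RED, here is a minimal Python sketch. It assumes the node increments its packet counter by exactly one per transmission and that the counter never resets or wraps. Under that assumption the number of lost packets is the counter gap minus one, which reproduces the `arduino_test_lost` values logged above (e.g. 10416 - 10377 - 1 = 38). The function name `annotate_losses` is only illustrative.

```python
import json

# Records copied from the Node-RED log above; "arduino_test_delta" is in seconds.
LOG_LINES = [
    '{"arduino_test":10377,"arduino_test_delta":2463.921,"timestamp":"2019-10-22 14:13:19","arduino_test_lost":40}',
    '{"arduino_test":10416,"arduino_test_delta":2336.506,"timestamp":"2019-10-22 14:52:16","arduino_test_lost":38}',
    '{"arduino_test":10427,"arduino_test_delta":664.953,"timestamp":"2019-10-22 15:03:20","arduino_test_lost":10}',
    '{"arduino_test":10428,"arduino_test_delta":57.415,"timestamp":"2019-10-22 15:04:18"}',
    '{"arduino_test":10454,"arduino_test_delta":1557.818,"timestamp":"2019-10-22 15:30:16","arduino_test_lost":25}',
    '{"arduino_test":10496,"arduino_test_delta":2519.84,"timestamp":"2019-10-22 16:12:16","arduino_test_lost":41}',
]


def annotate_losses(lines, key="arduino_test"):
    """Return (timestamp, counter, lost) for each record.

    lost is the counter gap minus one. It is None for the first record,
    because there is no previous counter to compare with.
    Assumes the counter increases by 1 per uplink and never wraps or resets.
    """
    results = []
    previous = None
    for line in lines:
        record = json.loads(line)
        counter = record[key]
        lost = None if previous is None else counter - previous - 1
        results.append((record["timestamp"], counter, lost))
        previous = counter
    return results


if __name__ == "__main__":
    for timestamp, counter, lost in annotate_losses(LOG_LINES):
        print(timestamp, counter, lost)
```

The first record yields `None` because the counter that preceded it (needed to confirm the logged 40) is not part of the excerpt.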

Checking the gateway logs, I found that lots of data had been sent during the "downtime":

Oct 22 15:48:45 klk-wifc-040187 local1.info lorafwd[3036]: <6> Uplink message (EF17) sent
Oct 22 15:48:45 klk-wifc-040187 local1.info lorafwd[3036]: <6> Uplink message (EF17) acknowledged in 39.104 ms
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> Received uplink message: 
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> | lora uplink (7B305FE4), payload 27 B, channel 868.5 MHz, crc ok, bw 125 kHz, sf 7, cr 4/5
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> | Unconfirmed Data Up, DevAddr 27008866, FCtrl [ADR], FCnt 4895, FPort 1
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> |  - radio (00000107)
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> |   - demodulator counter 2126864916, UTC time 2019-10-22T13:48:48.059476Z, rssi -62.2 dB, snr 4< 7 <9.75 dB
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> Uplink message (EF18) sent
Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> Uplink message (EF18) acknowledged in 37.5419 ms
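To check which frames actually reached the gateway, the frame counters can be pulled out of the `lorafwd` syslog lines. Below is a minimal Python sketch, assuming the log format is exactly as in the excerpt above (other gateway firmware versions may print it differently); `frame_counters` is a hypothetical helper name. Note that `FCnt` is the LoRaWAN frame counter, not the application counter logged in Node-RED, so the two sequences have to be compared separately.

```python
import re

# Matches e.g. "| Unconfirmed Data Up, DevAddr 27008866, FCtrl [ADR], FCnt 4895, FPort 1"
# (the pattern also matches "Confirmed Data Up").
FRAME_RE = re.compile(r"Data Up, DevAddr (?P<devaddr>[0-9A-Fa-f]{8}),.*?FCnt (?P<fcnt>\d+)")
# Syslog prefix such as "Oct 22 15:48:48" (single-digit days may be space-padded).
SYSLOG_TS_RE = re.compile(r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})")


def frame_counters(lines, devaddr=None):
    """Return (syslog timestamp, DevAddr, FCnt) for every uplink data frame found.

    If devaddr is given, only frames from that device address are kept.
    """
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a "Data Up" line (radio info, ack, etc.)
        if devaddr is not None and match["devaddr"].upper() != devaddr.upper():
            continue
        ts = SYSLOG_TS_RE.match(line)
        frames.append((ts["ts"] if ts else None, match["devaddr"], int(match["fcnt"])))
    return frames


SAMPLE = [
    "Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> | lora uplink (7B305FE4), payload 27 B, channel 868.5 MHz, crc ok, bw 125 kHz, sf 7, cr 4/5",
    "Oct 22 15:48:48 klk-wifc-040187 local1.info lorafwd[3036]: <6> | Unconfirmed Data Up, DevAddr 27008866, FCtrl [ADR], FCnt 4895, FPort 1",
]

if __name__ == "__main__":
    print(frame_counters(SAMPLE, devaddr="27008866"))
```

Collecting these FCnt values over a day and comparing them with what the application received shows whether a given frame was lost before or after the gateway.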

So it looks like the data was transmitted but does not show up in the backend. We are using node-red-contrib-ttn to receive data in Node-RED.

I know there is a status page, but it only shows the current status. Is there any status history that would show whether data losses are caused by downtime or maintenance?


Follow-up: After lots of debugging we caught one case. This is the gateway traffic:

[Screenshot: gateway traffic]

this is the device traffic:

[Screenshot: device traffic]

The platform was running for hours without any issue, but suddenly lots of data got lost in the backend.

Any explanation for this?

Current state below. Is this "normal" operation?

[Screenshot: gateway traffic (left) and device traffic (right)]

Left side: gateway traffic for a device; right side: device traffic. Nearly every second message is lost.
To be honest, not all losses are caused by the backend, but currently the transmission does not seem to be stable. These are the loss rates for the last week (1 means 1 packet lost):

[Screenshot: packet loss per reception over the last week]

Nobody with similar issues?

It does not seem to be random. I would expect this kind of behavior if some Fair Use Policy (FUP) enforcement were active.

Which gateway and which TTN handler/router are being used?

Which packet forwarder is running? If it is the legacy UDP-based one (I suspect so, as the gateway name/EUI is the usual alphanumeric jumble rather than a more human-friendly name), that may be the issue: UDP traffic is generally more likely to get lost "on the net" before it even reaches the TTN backend, and it has no retry or authentication mechanism.

Not sure if these are related, but we experienced the same thing over in the US last week and again today: our "typically running fine" nodes all of a sudden do not receive a join accept (JA) from TTN, even in the best of RF environments and without any recent change to any variable:
[Screenshot]
We've checked both the uplink and the downlink spectrum with a spectrum analyzer and everything is "clean", with no interference. That doesn't surprise me, given that 1) you can see the messages getting to TTN, just not being "replied" to by TTN, and 2) the messages are 6 to 10+ times above the noise (SNR 8 to 10 dB), so that's not it. Something has been going on in the backend lately. I had a conversation with TTN last week (10/23); they confirmed there was a "hiccup" for 30 minutes, but I see this happening a lot more often than that and for a lot longer. Although this is an "as is" service for all to enjoy, work, play and collaborate with, I wish there was a better way of knowing whether a "downtime" is happening, especially since for lots of people out there (:red_circle: included) TTN is a springboard to TTI.
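For reference, the "6 to 10+ times" figure follows from SNR being a power ratio expressed in decibels: ratio = 10^(SNR_dB / 10), so 8 dB is about 6.3x and 10 dB is exactly 10x. A small Python check (`db_to_power_ratio` is just an illustrative name):

```python
def db_to_power_ratio(db):
    """Convert a power ratio in decibels to a linear ratio: 10 ** (dB / 10)."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    for snr_db in (8, 10):
        ratio = db_to_power_ratio(snr_db)
        print(f"SNR {snr_db} dB -> signal power {ratio:.1f}x the noise power")
```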

Here's more info from our end:
LoRa Broker: TTN US West
Nodes: OTC RadioBridge
GWs: Laird Sentrius RG191
Spec: US 902-928 MHz (typically starting at LoRa sub-band 2 for OTAA: 903.9 MHz, channel 8, 125 kHz, up to 905.3 MHz, channel 15, 125 kHz; 500 kHz center channel at 904.6 MHz):
[Screenshot]
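The frequencies listed above follow the US902-928 uplink channel plan in the LoRaWAN Regional Parameters: 125 kHz channels 0-63 start at 902.3 MHz in 200 kHz steps, and 500 kHz channels 64-71 start at 903.0 MHz in 1.6 MHz steps. The Python sketch below (helper names are illustrative) reproduces the sub-band 2 numbers: channels 8-15 plus 500 kHz channel 65, whose 904.6 MHz sits midway between 903.9 and 905.3 MHz.

```python
def us915_uplink_mhz(channel):
    """Uplink centre frequency (MHz) of a US902-928 channel number 0-71."""
    if 0 <= channel <= 63:
        # 125 kHz channels: 902.3 MHz plus 200 kHz per channel
        return round(902.3 + 0.2 * channel, 1)
    if 64 <= channel <= 71:
        # 500 kHz channels: 903.0 MHz plus 1.6 MHz per channel
        return round(903.0 + 1.6 * (channel - 64), 1)
    raise ValueError(f"US902-928 uplink channel must be 0-71, got {channel}")


def sub_band_channels(sub_band):
    """Channels of 1-based sub-band 1-8: eight 125 kHz channels plus one 500 kHz channel."""
    if not 1 <= sub_band <= 8:
        raise ValueError(f"sub-band must be 1-8, got {sub_band}")
    first = 8 * (sub_band - 1)
    return list(range(first, first + 8)) + [63 + sub_band]


if __name__ == "__main__":
    for ch in sub_band_channels(2):
        bw = "500 kHz" if ch >= 64 else "125 kHz"
        print(f"channel {ch:2d}: {us915_uplink_mhz(ch):.1f} MHz / {bw}")
```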


Hi Jeff,

As the packets show up in the gateway traffic console, the UDP connection cannot be the issue here (the console only shows packets that have passed the UDP link). And the Fair Use Policy is not enforced at the moment (there is not even an airtime counter in TTN).

I heard this might be an overload issue, but I have no details.

Sure, TTN is a springboard to TTI, but if TTN is losing most of the packets, that is not good promotion for TTI either.


It seems we've been back to normal since about 2:40 AM EST, after 4 hours of downtime last night:
[Screenshot]
You can see communication was intermittent around 1 PM yesterday, then at 5 PM, and then from 10 PM until almost 3 AM. I let TTN know, and they have been looking into it since last night. :red_circle:

Same here…
We have a gateway with 30 devices sending an update every 5 minutes, and a random 10% of those updates do arrive in the Gateway Traffic list (correctly labelled, indistinguishable from other payloads, etc.) but, AFAICT, not in the Application Data list and not on the MQTT API.
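To put a number on this kind of gap per device, one approach is to compare the frame counters seen in the Gateway Traffic list with those that reached the application (Application Data, MQTT or a storage integration). For scale: 30 devices with one update every 5 minutes is 30 × 12 = 360 uplinks per hour, so 10% is roughly 36 missing uplinks per hour. A minimal Python sketch; how the two FCnt lists are collected is left open, and the function names and sample values are hypothetical:

```python
def missing_frames(gateway_fcnts, application_fcnts):
    """FCnt values seen by the gateway but never delivered to the application."""
    return sorted(set(gateway_fcnts) - set(application_fcnts))


def backend_loss_ratio(gateway_fcnts, application_fcnts):
    """Fraction of gateway-received frames that did not reach the application."""
    received = set(gateway_fcnts)
    if not received:
        return 0.0
    return len(received - set(application_fcnts)) / len(received)


if __name__ == "__main__":
    # Hypothetical example for a single device (one DevAddr).
    gateway = list(range(100, 110))                  # 10 frames in Gateway Traffic
    application = [f for f in gateway if f != 104]   # frame 104 never arrived
    print(missing_frames(gateway, application))
    print(backend_loss_ratio(gateway, application))
```

Compare per DevAddr, since the Gateway Traffic list mixes all devices, and restart the comparison after a device rejoins, because OTAA joins reset the frame counter.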

Same issue?

Yes, it seems to be the same. I am working with a GPS device and it has been losing packets since I started it (3 days ago). Sometimes it loses more, sometimes fewer. But all packets appear in the gateway backend. I don't know where the problem could be.

FYI: what we did to "solve" the issue was to add a Data Storage integration to the TTN application and use the REST API of that storage instead of the MQTT interface.
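For anyone who wants to try this, stored uplinks can be fetched over HTTPS. The sketch below assumes the API layout of the TTN V2 Data Storage integration, namely `https://<app-id>.data.thethingsnetwork.org/api/v2/query[/<device-id>]?last=<duration>` with an `Authorization: key <access key>` header. Treat the URL, the `last` parameter and the header format as assumptions and verify them against the API documentation linked from the integration page. The app ID, device ID and key are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(app_id, access_key, device_id=None, last="1h"):
    """Build a GET request for stored uplinks.

    ASSUMPTION: URL layout and "key <access key>" auth header of the TTN V2
    Data Storage integration; verify against the integration's API docs.
    """
    url = f"https://{app_id}.data.thethingsnetwork.org/api/v2/query"
    if device_id is not None:
        url += "/" + urllib.parse.quote(device_id, safe="")
    url += "?" + urllib.parse.urlencode({"last": last})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"key {access_key}"},
    )


def fetch_stored_uplinks(request, timeout=30):
    """Send the request and decode the JSON body (an empty body is returned as [])."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return json.loads(body) if body else []


if __name__ == "__main__":
    # Placeholders: replace with your own application ID, device ID and access key.
    req = build_query_request("my-app-id", "ttn-account-v2.XXXX", device_id="my-device", last="7d")
    print(req.full_url)
    # records = fetch_stored_uplinks(req)  # needs network access and a valid key
```

Polling the storage periodically also gives a second copy of the data against which gaps in the live MQTT stream can be checked.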

Hi,

Thank you very much, I did not know it existed. Did it really help?