Is_retry: true, but not for confirmed uplinks?

Hi,

I’m going through our database of messages trying to filter out duplicates. After marking the actual duplicates - exactly the same json as received from the MQTT topic recorded in the table more than once - I started looking at messages with the is_retry flag set to true.

Some of these are obviously confirmed uplinks where the ack was missed, and I can find those using framecounts or raw_payload content or whatever, and the json says it wants a confirmation.
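A rough sketch of that filtering pass in Python, in case it helps anyone doing the same. It assumes the messages are already loaded as dicts in the order they were received, and that confirmed uplinks carry a `confirmed` key set to true; the function name and the label strings are made up for illustration.

```python
import json

def classify_messages(messages):
    """Label each message, in receive order, with one of:
    'original', 'exact_duplicate', 'confirmed_retry', 'unexplained_retry'."""
    seen_json = set()        # canonical JSON of every message seen so far
    last_by_device = {}      # hardware_serial -> last non-duplicate message
    labels = []
    for msg in messages:
        key = json.dumps(msg, sort_keys=True)
        if key in seen_json:
            # Same JSON recorded more than once: a true duplicate row.
            labels.append("exact_duplicate")
            continue
        seen_json.add(key)

        serial = msg.get("hardware_serial")
        prev = last_by_device.get(serial)
        if msg.get("is_retry"):
            same_frame = (
                prev is not None
                and prev.get("counter") == msg.get("counter")
                and prev.get("payload_raw") == msg.get("payload_raw")
            )
            # Assumption: a "confirmed" key marks confirmed uplinks.
            if same_frame and (msg.get("confirmed") or prev.get("confirmed")):
                labels.append("confirmed_retry")
            else:
                labels.append("unexplained_retry")
        else:
            labels.append("original")
        last_by_device[serial] = msg
    return labels
```

Anything that comes back as `unexplained_retry` is the kind of message this thread is about and needs a closer look by hand.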

But a random message I chose to examine wasn’t like this. I looked at the previous message for that deveui and it wasn’t a confirmed uplink.

The retry message is below. Any idea why it was marked as a retry, and why it seemed to be received through the same gateway twice?

I’m assuming the 10 minute time difference on the gateways is due to an incorrect setting on one of them as the prior message showed it also.
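To check that assumption over many messages, something like the following Python sketch could compare each gateway's reported time against `metadata.time`. It assumes all timestamps are UTC and end in `Z`, as in the message below; the helper names are invented for this example.

```python
from datetime import datetime, timezone

def parse_utc_time(value):
    """Parse an RFC 3339 UTC timestamp such as 2019-03-14T04:38:42.395469689Z.
    Fractional seconds are truncated to microseconds, since datetime
    cannot hold nanoseconds."""
    value = value.rstrip("Z")
    if "." in value:
        whole, frac = value.split(".", 1)
        value = f"{whole}.{frac[:6].ljust(6, '0')}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)

def gateway_clock_offsets(message):
    """Return {gtw_id: seconds by which the gateway time is ahead of metadata.time}."""
    server_time = parse_utc_time(message["metadata"]["time"])
    offsets = {}
    for gw in message["metadata"]["gateways"]:
        if gw.get("time"):  # some gateways may not report a time at all
            delta = parse_utc_time(gw["time"]) - server_time
            offsets[gw["gtw_id"]] = delta.total_seconds()
    return offsets
```

A gateway whose offset stays at several minutes across unrelated messages most likely has a wrong clock rather than a slow backhaul.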

The only weird thing I see is that the counter value for both this message and the prior one, which was received minutes before this one, is “1”. I’m guessing a rejoin might have happened between the messages.

Is there enough info here to guess what’s happening or would I need gateway info or info from the console as well?

{
    "port": 15,
    "app_id": "cwn_tank_level",
    "dev_id": "15",
    "counter": 1,
    "is_retry": true,
    "metadata": {
        "time": "2019-03-14T04:38:42.395469689Z",
        "airtime": 370688000,
        "gateways": [
            {
                "snr": -14.25,
                "rssi": -95,
                "time": "2019-03-14T04:47:03Z",
                "gtw_id": "orange-ag-institute",
                "channel": 3,
                "altitude": 920,
                "latitude": -33.32198,
                "rf_chain": 1,
                "longitude": 149.08578,
                "timestamp": 2506235244,
                "gtw_trusted": true,
                "location_source": "registry"
            },
            {
                "snr": 9.25,
                "rssi": -24,
                "time": "2019-03-14T04:47:03Z",
                "gtw_id": "orange-ag-institute",
                "channel": 0,
                "altitude": 920,
                "latitude": -33.32198,
                "rf_chain": 0,
                "longitude": 149.08578,
                "timestamp": 2506235260,
                "gtw_trusted": true,
                "location_source": "registry"
            },
            {
                "snr": 10,
                "rssi": -70,
                "time": "2019-03-14T04:36:58Z",
                "gtw_id": "oai-multitech_02",
                "channel": 0,
                "latitude": -33.32246,
                "rf_chain": 0,
                "longitude": 149.08528,
                "timestamp": 3233188364,
                "gtw_trusted": true,
                "location_source": "registry"
            }
        ],
        "data_rate": "SF10BW125",
        "frequency": 922.4,
        "modulation": "LORA",
        "coding_rate": "4/5"
    },
    "payload_raw": "AbkACC5bASQ=",
    "hardware_serial": "70B3D5CD000101B9"
}

Your packet shows up as received by the same gateway twice because the real signal at -24 RSSI is so absurdly close that it’s overloading the front end and also being falsely received on another channel at -95 RSSI with a bad SNR. Short answer: don’t have nodes so close to a gateway. If you absolutely have to, consider replacing their antenna with a 50 ohm SMT resistor.
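If you want to flag these in your database, here is a minimal Python sketch. It looks for a gateway listed more than once in one message, where one report is far weaker than the strongest. The 40 dB gap is an arbitrary threshold chosen for illustration, not a value from any spec, and the function name is hypothetical.

```python
def find_phantom_receptions(message, min_rssi_gap_db=40):
    """Return the weaker reports from gateways that list the same uplink twice.

    min_rssi_gap_db is an assumed threshold; tune it against your own data.
    """
    by_gateway = {}
    for gw in message["metadata"]["gateways"]:
        by_gateway.setdefault(gw["gtw_id"], []).append(gw)

    suspects = []
    for gtw_id, reports in by_gateway.items():
        if len(reports) < 2:
            continue  # gateway heard the frame once, nothing odd
        strongest = max(reports, key=lambda r: r["rssi"])
        for report in reports:
            gap = strongest["rssi"] - report["rssi"]
            if report is not strongest and gap >= min_rssi_gap_db:
                suspects.append({
                    "gtw_id": gtw_id,
                    "channel": report["channel"],
                    "rssi": report["rssi"],
                    "snr": report["snr"],
                    "gap_db": gap,
                })
    return suspects
```

On the message above this would flag the channel 3 report from orange-ag-institute, which is 71 dB weaker than the channel 0 report from the same gateway.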

On the retry, you should really reconsider whether confirmed uplink is a good idea. Most of the time, it’s not.

In terms of rejoining, what’s the behavior of your node? Rejoins should be extremely rare; ideally a device only joins once ever. But development devices can be a more complex situation.


Very interesting, thank you!

This message wasn’t a confirmed uplink one. We don’t use them, but some devices we’ve had from other orgs do. Those are the retries I’m trying to filter out at the moment, because they should be easy to find by looking at the previous message from that device.

This message has is_retry set to true but no confirmed property to signal it was a confirmed uplink. So it’s a weird message.

We will occasionally have devices close to gateways, such as when we’re configuring one or the other in our lab or office. I guess you’ve given me a clue as to what can happen and what to look for in that case.

Do you have frame counter checks disabled in the registration of that device?

Frame counter values aren’t supposed to repeat, except when they’re a retry of a confirmed message. But if someone has an ABP node and reboots it, typically they will illicitly repeat, even though the spec says that the frame count has to be eternally maintained (e.g., stuck in some sort of nvram somewhere safe from routine restarts).
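A small Python sketch for spotting this in stored data: given one device's `counter` values in receive order, it marks where the counter repeated or went backwards. The function name and labels are hypothetical, and it deliberately ignores counter rollover, so a legitimate wrap would also show up as a reset.

```python
def find_counter_anomalies(counters):
    """Return (index, kind) pairs for counters that fail to increase.

    kind is 'repeat' when the value equals the previous one and 'reset'
    when it is lower (e.g. a rebooted ABP node or a rejoin).
    Assumption: rollover of the counter is not handled here.
    """
    anomalies = []
    previous = None
    for index, counter in enumerate(counters):
        if previous is not None:
            if counter == previous:
                anomalies.append((index, "repeat"))
            elif counter < previous:
                anomalies.append((index, "reset"))
        previous = counter
    return anomalies
```

For example, the sequence 10, 11, 11, 12, 1, 1, 2 gives a repeat at index 2, a reset at index 4 and another repeat at index 5.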

There are nodes that repeat every message twice or even three times without incrementing FCnt on different channels. imho this behaviour is caused by a MAC-command.

Not legitimately there aren’t.
This is either:

  • a firmware bug
  • a misunderstanding of a situation such as overloading the gateway front end causing phantom reception on additional channels, or how the poorly thought through confirmed uplink mechanism works
  • a firmware that either intentionally violates the LoRaWAN spec or was hacked on by someone who didn’t really understand the spec.

There are valid arguments for making different decisions than the LoRaWAN spec, but the thing is, if you’re going to do that, you need to be on a private network with a server that matches your alternate scheme, not on a public LoRaWAN network.

We are already discussing this problem here: LT-22222 transmitting twice - #8 by wolfp

Hardly discussed, as you never provided the requested details.

Fact remains, that’s not valid behavior of a LoRaWAN node. The spec is quite explicit that apart from a retry of confirmed traffic, the frame count must increment.

Double taps are also seen in this thread, which you are aware of: Dropouts with a brand new LHT65 node from Dragino - #13 by Jeff-UK. I have seen such double taps on other devices where the manufacturer does it as a redundant Tx method: send once, then again with the same fcount on an alternate frequency to avoid any interferers, on the basis that chances are one or the other will get through to the NS. The NS then either takes the 1st and rejects the 2nd as a duplicate fcount, or the 1st fails and hopefully the 2nd gets through… It’s better than relying on conf Tx as there is less impact on the GW (no conf Tx by the GW needed, so it spends more time listening :) ) but still not a great socially conscious approach. The other time is when conf Tx is used but the node doesn’t see the conf back, so the node times out and re-Tx’s on a different channel (again to avoid assumed interferers) with the same fcount… (as I suspect with the Dragino).


The spec is quite explicit that the frame count must increment apart from retries.

Really, it’s rarely worth re-sending increasingly stale data. Instead, send more current data.

If it all gets through, great, you have even more data points. If some is missing, you know that the points you did receive are the most current possible for the times at which they transmitted.

Anyone trying to move a block of data over LoRaWAN with absolute integrity needs to not only understand why that’s a challenging mismatch, but to be implementing their own chunk identifiers, not relying on the frame count.
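As a rough illustration of what "own chunk identifiers" could look like, here is a Python sketch that prefixes each payload with a three-byte header (transfer id, chunk index, chunk count). The header layout and function names are entirely made up for this example and are not part of LoRaWAN or any library.

```python
import struct

# Hypothetical application-level header: transfer id, chunk index, chunk count.
HEADER = struct.Struct(">BBB")

def make_chunks(transfer_id, data, chunk_size):
    """Split data into payloads that each carry their own chunk identifier."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    if len(pieces) > 255:
        raise ValueError("too many chunks for a one-byte chunk count")
    return [
        HEADER.pack(transfer_id & 0xFF, index, len(pieces)) + piece
        for index, piece in enumerate(pieces)
    ]

def reassemble(payloads):
    """Return (data, missing_indexes); data stays None until all chunks arrived.

    Duplicates and out-of-order arrival are harmless because each payload
    says which chunk it is, independent of the frame count.
    """
    received = {}
    count = None
    for payload in payloads:
        _transfer_id, index, count = HEADER.unpack_from(payload)
        received[index] = payload[HEADER.size:]
    if count is None:
        return None, []
    missing = [i for i in range(count) if i not in received]
    if missing:
        return None, missing
    return b"".join(received[i] for i in range(count)), []
```

The receiving application can then see exactly which chunks are missing and decide for itself whether to ask for them again.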

I believe the “is_retry” field in the uplink message has nothing to do with confirmed uplinks from the node or with duplicate fcnt, in this case.

I have seen this for balloon trackers where there were a really large number of receivers. There was a first uplink on the MQTT stream, with the bulk of the gateways in metadata, followed by another uplink with the “is_retry” flag set, with a small number of gateways (possibly 1). I suspect this had something to do with the deduplication process in the backend, like some interaction with high-latency gateways that deliver the message after the backend already decided it was done deduplicating.
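To make that suspicion concrete, here is a toy Python model of a fixed de-duplication window. It is only an illustration of the hypothesis, not how the backend is known to work; the 0.2 s window and the rule for setting `is_retry` are assumptions.

```python
def deduplicate(reports, window_s=0.2):
    """Group gateway reports of ONE frame into the uplinks a backend might emit.

    reports: iterable of (arrival_time_seconds, gtw_id).
    Toy model: the first report opens a window; anything arriving after the
    window closes is emitted as a separate uplink flagged is_retry.
    """
    uplinks = []
    window_end = None
    for arrival, gtw_id in sorted(reports):
        if window_end is None or arrival > window_end:
            # bool(uplinks) is False for the first uplink, True for later ones.
            uplinks.append({"gateways": [], "is_retry": bool(uplinks)})
            window_end = arrival + window_s
        uplinks[-1]["gateways"].append(gtw_id)
    return uplinks
```

With three fast gateways and one that delivers 1.5 seconds late, this model emits one uplink with three gateways followed by a second one, flagged as a retry, with only the slow gateway in its metadata.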

I can imagine that in this case the receiver overload caused a similar problem. The backend software may have been confused into thinking there were actually two messages instead of one because they were received on different frequencies, throwing the deduplication logic off.

Except that it is a duplicate fcount, for example

A report arriving after the de-dupe window is a duplicate. For all the infrastructure knows, it could be a replay attack.

Here the bleedover got listed in the same signal report, so no, it wasn’t confused.

In contrast the repeating fcount was claimed to be substantially separated in time.

All of my Dragino nodes use unconfirmed uplinks. But suddenly they started to transmit several times with the same FCnt. As @cslorabox said, there seems to be a software bug that leads to this behaviour.
I am still investigating, but at the moment I have no access to my nodes, and in the meantime this behaviour has stopped.
But I see another node in the log of my gateway showing the same behaviour. Its RSSI is very low, -119, and as @Jeff-UK said, the repetition increases the probability of receiving a valid message.