First f_cnt is empty/null - world wide technical debt in progress

Please be aware that at present the first f_cnt is empty/null, i.e. not 0. Anyone writing any form of integration will need to handle this, as any reasonable database should be validating the f_cnt as a number.

See First f_cnt is empty/null · Issue #3874 · TheThingsNetwork/lorawan-stack · GitHub

Yep… the good old if (x) because it might be null, and then x turns out to be 0 which is to say false, a mistake I’ve seen in more than a few other cases - the fun one being a limited resolution temperature of exactly 0 C and suddenly the data platform is showing gaps in the graph overnight.

Workaround code will probably need to be things like uplink.get("f_cnt", 0) or similar providing a default value, though fixing the original issue would be better.
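As a minimal sketch of that workaround, assuming the uplink has already been parsed into a dict and that the counter lives under the key `f_cnt` (the helper name `frame_count` and the sample messages are made up for illustration):

```python
import json


def frame_count(uplink_message: dict) -> int:
    """Return the frame counter, treating a missing or null f_cnt as 0."""
    value = uplink_message.get("f_cnt")
    # Compare against None explicitly. `if value:` or `value or default`
    # would also misfire on a genuine 0, which is the trap described above.
    if value is None:
        return 0
    return int(value)


# Hypothetical messages: the first one has no f_cnt at all.
first = json.loads('{"f_port": 1, "frm_payload": "AQ=="}')
later = json.loads('{"f_port": 1, "f_cnt": 7, "frm_payload": "AQ=="}')

print(frame_count(first), frame_count(later))
```

The explicit `is None` test covers both an omitted key and a JSON `null`, so the same helper keeps working if the upstream issue is later fixed and a real 0 starts arriving.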

From @jpmeijers on Slack:

“That’s a common issue in golang. If a field of a struct is ‘not set’ it’s actually the zero value for the type. And when you marshal the struct to JSON, fields that are not set are left out. So in this case the fcnt of 0 is incorrectly seen as not set”

Fixing the original issue is the only way - don’t mind trapping for missing fields in decoders now we have them at device level, but not something fundamental that should be in every single uplink packet.

Ouch, so does that language have a way to explicitly set 0 such that it will marshal to the JSON?

I seem to recall seeing a similar issue with protocol buffers, where maybe they can’t encode whatever the “default” value is, but it’s been a while so I may not be remembering exactly.

What, Protocol Buffers from the people that brought us Go?

I see a pattern here …

:rofl:

So, as a temporary workaround we should start uplinks with 1?

(Sorry for the silly question)

We/you/anyone can’t.

They start at 0 as per spec

Sorry, I don’t get it.

  1. I can set the f_cnt on my devices. It’s a problem to start with 1?

  2. I transferred ABP devices from V2 to V3. The devices were already at non-zero f_cnt

Actually skipping to 1 is fine, because the receiver looks ahead by up to 16384. Skipping to 16383 would be fine (though not recommended, as if that packet fails to be heard then you really would have a problem)

The network doesn’t know that it hasn’t missed packets.

Skipping forward is a novel solution, but not really in violation of the spec in any practical sense, since it’s pretty much indistinguishable from something the spec expects to regularly happen.
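The look-ahead described above can be sketched roughly as follows. This is a simplified model, not the stack's actual code: the function name `accepts` is made up, counter rollover is ignored, and the exact handling of the boundary value is an assumption that may differ between implementations.

```python
MAX_FCNT_GAP = 16384  # look-ahead window quoted in the thread


def accepts(expected: int, received: int) -> bool:
    """Simplified check: accept a frame whose counter is at or ahead of
    the next expected counter, but less than MAX_FCNT_GAP ahead of it.

    Assumption: plain integers, no 16-bit rollover handling.
    """
    gap = received - expected
    return 0 <= gap < MAX_FCNT_GAP


# A fresh session expects 0 next. Starting at 1 just looks like one lost frame.
print(accepts(0, 0), accepts(0, 1), accepts(0, 16383), accepts(0, 20000))
```

From the network's side, a device that starts at 1 is indistinguishable from a device whose frame 0 was never heard.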

The preferred configuration is OTAA, which will start at zero, as per spec

What you’d want to do is make the device itself start at 1.

If you set the f_cnt in TTN to one, then TTN is going to ignore the f_cnt 0 packet. That keeps it out of your data feed, but still wastes it.

In a typical implementation for either OTAA or ABP, yes. But a firmware modified to start each OTAA session at 1 after getting a join accept would really not cause any problems, the stack would just think there was a previous packet between the join accept and the first one received that had gotten lost, which happens often enough that it’s completely expected - it’s a stack that couldn’t handle getting “1” as the first frame count actually received which would be out-of-spec.

That said, it really makes more sense to armor one’s data platform against a problem input, if 0 isn’t going to be filled for the missing f_cnt then just make sure there’s an exception handler that can drop that packet and survive to accept the next one with a non-missing frame count.
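A minimal sketch of that kind of armor, assuming each uplink arrives as a JSON string with `f_cnt` and `frm_payload` keys. The `store` function is a hypothetical stand-in for a database insert that validates the frame count as a number:

```python
import json
import logging

log = logging.getLogger("uplink-ingest")


def store(f_cnt, payload, sink: list) -> None:
    """Stand-in for a database insert that insists on a numeric f_cnt."""
    if not isinstance(f_cnt, int) or isinstance(f_cnt, bool):
        raise ValueError(f"f_cnt must be an integer, got {f_cnt!r}")
    sink.append((f_cnt, payload))


def ingest(raw_messages, sink: list) -> int:
    """Store every well-formed uplink; log and drop the rest.

    Returns the number of dropped messages.
    """
    dropped = 0
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
            store(msg["f_cnt"], msg["frm_payload"], sink)
        except (KeyError, TypeError, ValueError) as exc:
            # json.JSONDecodeError is a subclass of ValueError, so malformed
            # JSON is caught here as well as a missing or non-numeric f_cnt.
            log.warning("dropping uplink: %r", exc)
            dropped += 1
    return dropped


rows = []
dropped = ingest(
    ['{"frm_payload": "AQ=="}', '{"f_cnt": 1, "frm_payload": "Ag=="}'],
    rows,
)
print(dropped, rows)
```

The point is only that one bad message is confined to its own iteration; the loop survives to accept the next uplink with a frame count present.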

It’s better to not confuse other users with suggestions like ‘start with 1’ and then explain why this would not be an issue in practice.
From a protocol standards perspective that is just plain wrong.

The first frame should be frame 0. And if that frame number is missing then that is a bug that needs to be fixed, independent of the behavior of certain programming languages.

The number 0 is semantically different from null, void, or whatever it is called in a certain programming language.

As a temporary workaround I would like to suggest interpreting a missing frame number as the number 0.
But a workaround should not be needed here, and may cause unwanted side-effects. Instead the bug needs to be fixed asap.

(Based on @descartes’ findings. I have not tested it myself.)

Found this:

“Fields that are empty or zero are not returned in requests, even if they are specified in a field mask.”

on Fields and Field Masks | The Things Stack for LoRaWAN

Wonder how we tell if something was, for numbers, null or zero and for strings, null or empty.

Oops. As a temporary workaround we are decoding on our own server.

(Just kidding!)

I’m deeply opposed to such schemes.

They force those who use them (and those who use things that use them) to make sure that:

  • only one of 0 / missing / null / empty can ever possibly occur in the system for a given field, including as an error case (eg, what do you transmit if that sensor is broken?)

  • code must carefully map “unmentioned” back to the usage-specific singular possibility

As such things go, the frame count is not a particularly problematic example. As long as one can distinguish application packets from joins, then not having a frame count is not even a possibility for an uplink message - in that case, “unmentioned” maps back to the singular possibility of 0.

It gets far more nasty if you have a situation using a similar coding mechanism where say temperature could be 0, but there are also packet formats that don’t include temperature. And worse, not only is temperature a reading, but the 2nd level packet parsing code needs temperature as an input to compensate other readings included in the packets that should have a temperature (but where that temperature might be 0 C). That can make for some interesting exception trees…

Database schemas mostly use null or an actual value, so you can tell whether something was set, and to what value, or not set at all. This is something many developers are used to, indeed expect.

If the packet format doesn’t include the temperature, I’d not try & extract the temperature from it.

And if the sensor is broken, the firmware should flag this within the payload so that when decoded any value that is supplied (unlikely to be null, potentially 0, maybe an extreme value) is ignored, or as I do with DS18B20’s that have had their cable cut or ripped out, use an extreme value as a flag.

But this isn’t applicable to the situation at hand as we either write the JavaScript decoder which isn’t Go so doesn’t suffer from this issue, or, as I prefer to do, unpack the Base64 encoded payload so I’m not reliant on the Application Server running the JavaScript (which fails enough on v2 to cause issues) and also means I have control over the processing and can re-process if (when) someone changes the firmware but not the decoder.
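For anyone unfamiliar with that approach, unpacking the Base64 payload on your own server could look roughly like this. The payload layout (a signed 16-bit big-endian temperature in tenths of a degree) and the function name are invented for illustration; a real device defines its own format.

```python
import base64
import struct


def decode_temperature(frm_payload_b64: str) -> float:
    """Decode a hypothetical 2-byte payload: signed big-endian tenths of a degree C."""
    raw = base64.b64decode(frm_payload_b64)
    if len(raw) < 2:
        raise ValueError("payload too short for a temperature reading")
    (tenths,) = struct.unpack(">h", raw[:2])
    return tenths / 10.0


# Build sample payloads the way a device would send them, then decode them.
warm = base64.b64encode(struct.pack(">h", 235)).decode()
zero = base64.b64encode(struct.pack(">h", 0)).decode()
cold = base64.b64encode(struct.pack(">h", -55)).decode()

print(decode_temperature(warm), decode_temperature(zero), decode_temperature(cold))
```

Because the raw payload is kept, stored messages can be re-decoded later if the firmware format changes.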

The problem arises when a field in the stack is passed out via one of the integrations - we will not be able to tell if the value wasn’t set due to planned circumstances, an issue within the stack or something the device did or didn’t do.

If we get all the fields that should appear but either null or a value, we can then code for what we receive. But if the field is just missing due to being null or 0, we have no way of knowing what the situation is, was it never set (null) or is it actually 0?

And we have to code for the field being missing as well, no big deal, but just more code to write & test.
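To illustrate the three cases a consumer would like to tell apart, here is a small sketch using a sentinel object. The names `MISSING` and `classify` are made up for this example:

```python
import json

MISSING = object()  # sentinel meaning "the key is not present at all"


def classify(message: dict, field: str) -> str:
    """Report whether a field is absent, explicitly null, or carries a value."""
    value = message.get(field, MISSING)
    if value is MISSING:
        return "missing"
    if value is None:
        return "null"
    return f"value={value!r}"


for raw in ('{"f_cnt": 0}', '{"f_cnt": null}', "{}"):
    print(raw, "->", classify(json.loads(raw), "f_cnt"))
```

With the current behaviour, both an unset field and a genuine 0 land in the "missing" branch, which is exactly the ambiguity being described.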

Frame count doesn’t go through the javascript decoder

“as I prefer to do, unpack the Base64 encoded payload so I’m not reliant on the Application Server running the JavaScript”

Frame count is in the metadata, not the payload

“(which fails enough on v2 to cause issues) and also means I have control over the processing and can re-process if (when) someone changes the firmware but not the decoder.”

Though indeed, doing it on your own platform is wise for that reason

“If we get all the fields that should appear but either null or a value, we can then code for what we receive. But if the field is just missing due to being null or 0, we have no way of knowing what the situation is, was it never set (null) or is it actually 0?”

Hence my point that this forces designing such that only one of those two is ever possible.

For frame count, only zero is possible. Any LoRaWAN application or MAC uplink must have a frame count; only join requests do not, and a system should handle and report those distinctly.

I don’t like mechanisms which force this, but for frame count, there’s no ambiguity.

I never said it did. You brought up fields in the payload that have nothing to do with the issue at hand.

Your focus only on the frame count doesn’t address other potential problems that may arise.

As this is not an academic issue, but one that we’d all benefit from having addressed in the Go code base, please can we stick to discussions that move this forward. If you write server side code for integrations, tell us how you’d handle this.

And again, my explanation is that using a language with this anti-feature forces those who use it, or use things (such as TTN) which depend on it, to:

  1. Make sure that only one of 0/null/empty/unmentioned can ever even occur for a given field, including after an earlier exception or failure.

  2. Make code which accepts this feed map “unmentioned” back to the usage-specific singular possible value of that field, whatever it is (for frame count, the singular possibility is 0)
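One way to express those two points in integration code is a small table of fields known to have a single possible omitted value, with everything else refused rather than guessed. This is only a sketch: the table name, the helper `read_field` and the choice of `LookupError` are assumptions made for illustration, and only `f_cnt` is listed because it is the only field this thread establishes as unambiguous.

```python
# Fields for which "unmentioned" can only mean one thing.
SINGLE_POSSIBLE_VALUE = {"f_cnt": 0}


def read_field(message: dict, field: str):
    """Return the field's value, mapping an omitted field back to its single
    possible value when one is known, and refusing to guess otherwise."""
    value = message.get(field)
    if value is not None:
        return value
    if field in SINGLE_POSSIBLE_VALUE:
        return SINGLE_POSSIBLE_VALUE[field]
    raise LookupError(f"{field!r} is absent and has no unambiguous default")


print(read_field({}, "f_cnt"), read_field({"f_cnt": 5}, "f_cnt"))
```

A field such as a temperature reading would deliberately stay out of the table, so an omitted temperature raises instead of silently becoming 0.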

Personally I would not use or recommend a language with such an anti-feature.