V3 with PyCom LoPy4 LoRa nvsram restore issue

teus · November 30, 2021, 3:59pm

Migration from V2 to V3 TTN for LoPy works OK after using the trick: power off LoPy4, reset LoRa nvsram (even when never dumped), power on.

Issue with LoPy4 using LoRa nvsram and TTN V3:
However, when using LoRa nvsram dump/restore, e.g. after deepsleep, the second and following LoRa datagrams are not forwarded by TTN V3.
When no LoRa nvsram dump/restore is used, the TTN V3 upload works OK.

In the time of TTN V2 this all went OK.

Is this a TTN V3 / LoPy4 firmware issue or just a software issue in how we use LoRa?

cslorabox · November 30, 2021, 5:25pm

Can you get the channel and spreading factor and frame count of the ignored packets vs the accepted one?

You might be able to get this from the raw gateway traffic (if it’s being received but not decoded) or from debug UART output of the board itself.

Presumably what is going on is that the firmware is violating one of the darker corners of the LoRaWAN spec, which matters more in V3 than it did in V2.

For example, the prohibition on using optional channels (in bandplans that have them) until those have been commanded via a MAC downlink from the server.

teus · November 30, 2021, 6:42pm

Thanks for the quick reply.

The spreading factor is SF7, which we used to start the join. The first datagram, with frame count 1 (payload is 51 bytes), is received via the MQTT interface of TTN V3.
After the join and the first datagram the LoPy4 is instructed to save the LoRa state in nvsram, which is followed by a deepsleep of ca. 16 minutes.
After waking up from deepsleep the LoRa state is restored (a join is not needed), measurements are done (ca. 2 minutes) and the second LoRa datagram with the new count number is sent off. All datagrams with a count higher than 1 are not received via the MQTT TTN V3 interface.

Without the LoRa nvsram dump, deepsleep and restore all LoRa datagrams are received via the MQTT TTN V3 interface.

LoPy firmware version is 1.20.2.rc10 (Jan 2021).

We will try to get the exact values of SF, channel, etc. at a later stage, and will try to see the raw TTN V3 data of the first and later datagrams passing the LoRa gateways used, if possible.
If needed we will send those data later.

Thanks so far.
Teus Hagen

kersing · November 30, 2021, 10:13pm

Hi Teus,

From the symptoms it sounds like one or more parameters are not saved and restored correctly. V2 was very forgiving, V3 requires stricter adherence to the LoRaWAN standard and as a result discards packets V2 accepted.

Your message doesn’t explicitly state so but I gather you are using OTAA?

Best regards,

Jac

teus · December 1, 2021, 8:38am

Jac,
We’re using OTAA.
Your remark about ‘maybe the LoRa status dump/restore’ is exactly the reason why I also raised this issue on the PyCom LoRa forum.

Currently all solar-equipped air quality measurement kits that the non-profit, open-source based organisation has are on hold :-(.

What I did not expect is that we had to explicitly erase the LoRa nvsram before we could connect to TTN V3, also for kits not using the LoRa nvsram status handling. Mysterious.
If anyone has an idea how to fix this without needing to do a join for every packet sent, we’ll be very happy.
teus

descartes · December 1, 2021, 10:32am

This is not a TTN issue - they don’t write code that runs on devices - they write a LoRaWAN compliant stack that communicates to a known standard.

What downlink options are there? Maybe there is something you can send that will help?

How many of these devices are deployed?

kersing · December 1, 2021, 11:29am

That is not an option as you are only able to join 64K times before you need to delete and re-add the device at TTN. And that only works if every join attempt uses a unique nonce (so it needs to be a counter that is saved in NVM), with random nonces duplicates are inevitable well before the 65536 possibilities are exhausted.
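To put a rough number on "well before": the sketch below computes the chance that at least two random 16-bit nonces collide after a given number of joins (the classic birthday problem). The function name is only illustrative, and it assumes nonces are drawn uniformly at random.

```python
def p_duplicate(joins, space=65536):
    """Probability that at least two of `joins` uniformly random
    nonces drawn from `space` possible values are identical."""
    p_unique = 1.0
    for k in range(joins):
        # the (k+1)-th nonce must avoid the k values already used
        p_unique *= (space - k) / space
    return 1.0 - p_unique


if __name__ == "__main__":
    for n in (100, 302, 1000):
        print(n, "joins ->", round(p_duplicate(n), 3))
```

With 65536 possible values the chance of a duplicate passes 50% at roughly 300 joins, so a device that joins for every packet with random nonces would run into rejected joins long before the nonce space is used up.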

teus · December 1, 2021, 11:51am

Indeed it is not a TTN V3 problem. The article is meant to see if others are encountering the same problem. Currently 20 end devices use this method.

Agree fully with Jac that a join for every packet sent is a no go.
Yes, the problem hints at a packet counter problem in the LoRa nvsram restore, but with TTN V2 this ran perfectly.
Still there are 2 routes: the software we use in the LoPy4, or the LoPy LoRa firmware?

descartes · December 1, 2021, 12:00pm

Just ensuring those that find this thread don’t start thinking that TTN somehow influences the device firmware.

Can you point us to the Save to NVSRAM code in use so firmware peeps can comment further - in theory you could clean up the result but ideally if it’s a system call then the counter should be included in the save so would be a firmware fix.

I suspect you will be planning to visit each device to update them.

teus · December 1, 2021, 2:00pm

LoRa state dump/restore: the LoPy4 runs MicroPython. User software on the LoPy4 has no direct access to the nvsram other than via a call to the PyCom firmware module library. The same holds for the LoRa state dump/restore nvsram handling. One may call this a ‘system call’, but in reality it is a Python class module API call made available by the microprocessor manufacturer (PyCom API lora.nvram_save() and lora.nvram_restore()). So the actions are (after a first power on/join/nvram save/deepsleep): wake from deepsleep, get access to the LoRa class, restore the LoRa state via nvsram restore, do measurements, send the packet and check success, dump the LoRa state in nvsram by a ‘system call’ (we have no real access to that deeper software layer).
As this went OK with V2, my guess is that the frame counter update is not the problem. So my guess is that either the firmware LoRa API is not saving/restoring all necessary parameters, or our software uses the API wrongly somewhere.
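The wake-up sequence described above can be sketched roughly as follows. This is not the MySense code: `send`, `measure` and `join` are hypothetical callables standing in for the socket send, the sensor readout and the OTAA join. Only `nvram_restore()`, `has_joined()` and `nvram_save()` are PyCom LoRa methods. Saving after the send is an assumption about where the save belongs, so that the incremented frame counter is what gets stored.

```python
def wake_cycle(lora, send, measure, join):
    """One wake-up: restore state, join only if needed, send, save state.

    lora    -- object offering nvram_restore(), has_joined(), nvram_save()
               (on a LoPy4 this would be a network.LoRa instance)
    send    -- hypothetical callable transmitting a payload (e.g. socket.send)
    measure -- hypothetical callable returning the payload bytes
    join    -- hypothetical callable doing the OTAA join and waiting for it
    """
    lora.nvram_restore()        # bring back session keys and frame counters
    if not lora.has_joined():   # first power-on, or nothing was saved yet
        join()
    send(measure())
    lora.nvram_save()           # assumption: save AFTER sending, so the
                                # updated counter is what ends up in nvram
    # On the device a machine.deepsleep(...) call would follow here.
```

Passing the LoRa object and helpers in as arguments keeps the sequence testable on a PC with a stub object, which makes it easier to experiment with where the save and restore calls go.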

Planning a visit: we have to go round all end nodes anyway, as we discovered that we need to power cycle and erase the LoRa nvsram in any case, i.e. also for those end nodes which were not using a LoRa nvsram state dump/restore sequence. Strange, in my opinion. A lot of field work, but that is what it is.

descartes · December 1, 2021, 3:44pm

I thought that PyCom had released the source for their LoRaWAN modules. Or is that just the gateways?

teus · December 1, 2021, 3:51pm

As far as I know the source for the microprocessor, e.g. the LoPy4, is published. Digging into that part and doing our own release of the PyCom LoPy firmware is a no go. Maybe the problem can be detected there.
That is the next step to do. It is a step that can be avoided if we know whether or not someone else has this LoRa state restore issue with the LoPy4. We may be looking in the wrong direction.

descartes · December 1, 2021, 4:29pm

I’m happy to look at the code but I’m not going to look for it - as above, help us to help you by providing a link.

No one suggested that. Providing a fix to PyCom would make it part of the main firmware.

teus · December 1, 2021, 6:33pm

Nick thanks for your replies.

Seems PyCom LoPy LoRa coding can be found in:
pycom-micropython-sigfox/LoRaMac.c at Dev · pycom/pycom-micropython-sigfox · GitHub (it seems this code is more than 9 months old).
Routine for nvs handling: void LoRaMacNvsSave( void ){…}
I did not yet detect the NvsRestore() function.
I do not know if this code is the one which is used by the LoPy firmware.
My guess is that this C part is what is made available at the MicroPython level. I lack the knowledge to see if the code is LoRaWAN compliant.
If needed, our user part can be found in https://github.com/teusH/MySense/PyCom: MySense.py (the main driving part, handling the LoRa nvsram state) and lib/lora.y (the interface to the LoPy LoRa class firmware).

cslorabox · December 1, 2021, 6:52pm

Typically the best way to debug something like this is to get a gateway log of all the uplink/downlink traffic payloads, in raw still encrypted air form along with their frequencies.

Presumably what is happening is that after receiving the join accept, the node is transmitting again, but with improper details such that the packet is being received by the gateway, but rejected by TTN. If in fact the node is ceasing to transmit, that would appear to be a local program logic bug more than a spec compliance bug.

Optionally you can click into details of the various packets, but it may be harder to share that.

Key things you’d be looking for are downlinks with MAC commands, and the full details of any uplink packet apparently from the node in question that is being received by the gateway but not processed by TTN.

teus · December 1, 2021, 8:41pm

Agree, and thanks for the thought. Will buy/install a gateway to be able to look into the logs for this. This will take some time.
For now the priority is to get all the non-solar-equipped end nodes running on V3. In other words: there is a handyman problem to get all end nodes operational which are able to run without the LoRa state dump/restore.
As soon as I have more info I will report it in this thread. Thanks for sharing thoughts.
If anyone does not encounter this problem using the LoPy4 in this LoRa dump/restore way, please say so.

teus · December 2, 2021, 12:14pm

Currently it looks like the LoPy, after a deepsleep/nvsram status restore, sends the LoRa datagram with the same f_cnt (frame count) number as the last datagram sent and received via MQTT. So the datagram is of course skipped by TTN. Something strange is going on with the frame count. Will try a short idle sleep to give the LoRa part time to empty the buffer of pending datagrams.
We also noticed that the first LoRa datagram after a join (f_cnt = 0 probably) is not present as “f_cnt”:0 in the MQTT json data record.
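Why a repeated f_cnt gets skipped can be shown with a simplified model of the replay check a network server applies. This is only a sketch: the function names are made up for illustration, and it ignores counter rollover and any "resets frame counters" setting.

```python
def accept_uplink(last_f_cnt, f_cnt):
    """True if an uplink with counter f_cnt passes a strict replay check.

    last_f_cnt is None before the first uplink of a session.
    """
    return last_f_cnt is None or f_cnt > last_f_cnt


def replay(counters):
    """Feed a sequence of frame counters through the check and
    return the ones that would be accepted."""
    last = None
    accepted = []
    for c in counters:
        if accept_uplink(last, c):
            accepted.append(c)
            last = c
    return accepted
```

A node that restores an old counter after every deepsleep would produce something like `replay([0, 1, 1, 1])`, where only the first two uplinks get through; that matches datagrams reaching the gateway but never showing up on MQTT.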

descartes · December 2, 2021, 1:33pm

That’s a standard gotcha that comes with Google Protocol buffers, see:

The question is, if you hotwire the f_cnt by one, does it then store the new value?

Or put another way, does it ever increment?

I’d go back to PyCom and invite them to fix this immediately - if they are storing the f_cnt wrong it shouldn’t be that hard to store the new one AND it rather makes the whole functionality totally useless without the f_cnt being stored.
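On the missing "f_cnt":0: Protocol Buffers leave a field out of the JSON when it holds its default value, and 0 is the default for integers. A consumer of the MQTT data therefore has to treat an absent f_cnt as 0. A minimal sketch, assuming the counter sits under an `uplink_message` object in the MQTT record (that key layout is an assumption here, and the function name is hypothetical):

```python
import json


def frame_count(message_json):
    """Return the uplink frame counter from a JSON record,
    treating a missing f_cnt as 0 (protobuf default omitted)."""
    msg = json.loads(message_json)
    # assumed layout: {"uplink_message": {"f_cnt": ...}}
    uplink = msg.get("uplink_message", {})
    return uplink.get("f_cnt", 0)
```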

teus · December 15, 2021, 7:56pm

A very basic test program with the LoRa nvram restore/save and deepsleep sequence has been tested. That code works well. So the focus has now shifted to putting the bells and whistles from the troubled code back one by one, and to experimenting with where to put the LoRa nvram save/restore/save instructions. E.g. according to the docs, LoRa nvram restore seems to reset the LoRa nvram. The issue can be closed for now.
Good playing exercise for Xmas?