Compression strategies for LoRaWAN packets

In my application, I want to fit as much data into each LoRaWAN packet as possible. I am willing to send a payload of up to 100 bytes, but I want to make the most of every byte in that packet.

The question is: how can I carry out lossless compression for such a small packet size? I have already carried out bit packing and truncation to keep the payload as small as possible. Lossless compression schemes such as LZSS and Huffman coding have large overheads that make the “compressed” packet even larger than the original.

Has anyone attempted lossless compression for LoRaWAN packets? I would be interested in hearing your strategies.

The overhead in Huffman coding is the Huffman tree, which is normally sent with every packet. If the tree is predetermined, it does not have to be sent at all, saving a lot of space. Would this work?
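
As a sketch of how that might look (the frequency table below is invented; in practice you would build it offline from representative sample payloads and bake the same table into both the node firmware and the decoder, so the tree itself is never transmitted):

```python
import heapq
import itertools

# Hypothetical symbol frequencies measured offline from representative payloads.
# Node and backend both hold this table, so the Huffman tree is never sent.
FREQS = {0x00: 40, 0x01: 25, 0x02: 15, 0x03: 10, 0x7F: 5, 0xFF: 5}

def build_code(freqs):
    """Return {symbol: bit-string} for a Huffman code over a fixed frequency table."""
    tick = itertools.count()          # tie-breaker so the heap never compares lists
    heap = [(f, next(tick), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    code = {s: "" for s in freqs}
    while len(heap) > 1:
        f0, _, s0 = heapq.heappop(heap)
        f1, _, s1 = heapq.heappop(heap)
        for s in s0:                  # one branch gets a leading 0...
            code[s] = "0" + code[s]
        for s in s1:                  # ...the other a leading 1
            code[s] = "1" + code[s]
        heapq.heappush(heap, (f0 + f1, next(tick), s0 + s1))
    return code

def encode(data, code):
    """Pack the symbols of `data` into bytes using the pre-agreed code."""
    bits = "".join(code[b] for b in data)
    bits += "0" * (-len(bits) % 8)    # pad the final byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

code = build_code(FREQS)
sample = bytes([0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0xFF, 0x00])
print(len(sample), "->", len(encode(sample, code)), "bytes")   # 8 -> 2 for this skewed data
```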

One thing is for sure: sending larger payloads less frequently is better (in terms of airtime) than sending smaller payloads more frequently, because of the per-packet overheads (FCnt, CRC, etc.) in a LoRaWAN packet.
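
To put rough numbers on that: every uplink carries roughly 13 bytes of LoRaWAN framing (MHDR + DevAddr + FCtrl + FCnt + FPort + MIC) on top of the application payload, before the radio preamble, PHY header and CRC are even counted. A quick back-of-the-envelope sketch, assuming that fixed 13-byte figure and no MAC commands piggybacked in FOpts:

```python
# Approximate LoRaWAN MAC overhead per uplink, assuming no MAC commands in FOpts:
# MHDR(1) + DevAddr(4) + FCtrl(1) + FCnt(2) + FPort(1) + MIC(4) = 13 bytes.
MAC_OVERHEAD = 13

for app_bytes in (5, 10, 25, 50, 100):
    total = app_bytes + MAC_OVERHEAD
    print(f"{app_bytes:3d} B payload -> {total:3d} B MAC payload, "
          f"{MAC_OVERHEAD / total:5.1%} of it framing overhead")
```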

Well… actually that depends on your tolerance for data loss. If you pack a lot of data into one packet, you’ll lose a lot if that packet does not make it to the backend; with smaller packets, less data will be missing.
Larger packets also run a greater risk of interference during transmission, increasing the chance of data loss.

Finding the right balance between acceptable data loss and used airtime is key for any application.

If you know the data space and you’re able to pre-calculate the tree, that is certainly possible. I used a similar technique about 25 years ago for another low-bandwidth transport channel.

That’s a bit much for LoRaWAN under anything but the best connectivity conditions, where you can use a fast spreading factor.

Have you run the airtime calculator against your fair usage allowance?

How can I carry out lossless compression for such a small packet size?

There is of course no such thing as lossless compression which works on arbitrary data; only things which work with some types of data. Otherwise you could just feed the output to the input of another copy and shrink some more.

You also have the port number to play with. Ports 1 to 223 are application specific. Not a complete byte but still useful. If you can model your data set before deploying the nodes, and assuming the node is smart enough, you have the potential to use up to 222 different predefined compression combinations.
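
As a sketch of how the decoder side of that could look (the port numbers and scheme names below are invented purely for illustration):

```python
# Hypothetical mapping of application FPorts (1..223) to pre-agreed payload schemes.
# The node picks the port matching how it packed the payload; the decoder dispatches
# on the port instead of spending payload bytes on a format marker.
DECODERS = {
    10: "full_fix_list",       # absolute fixes, no compression
    11: "delta_fix_list",      # current fix plus deltas from it
    12: "huffman_table_a",     # pre-agreed Huffman code, table A
}

def decode_uplink(f_port: int, payload: bytes):
    scheme = DECODERS.get(f_port)
    if scheme is None:
        raise ValueError(f"unknown FPort {f_port}")
    print(f"port {f_port}: decode {len(payload)} bytes with scheme '{scheme}'")
    # ...call the matching decoder here...

decode_uplink(11, bytes(47))
```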

If you can share the data sources, types & sizes we may be able to help in more detail. Are you packing the payload for battery savings or because you have a lot of data to move?

A simple compression scheme is best evaluated by experimenting: try various ways of generating a common tree, which, combined with bit packing and different value encodings, may pay off down at the nibble (4-bit) level. Quite often the grouping or ordering of the payload fields will make a difference.

Rather like solving a Sudoku or routing a PCB, looking at the data for patterns may reveal something useful, so having lots of sample data on screen can help. Watch any of the Enigma films for inspiration.

You could look at rate of change as well, and using the suggested port number scheme above, send payloads with different data at different intervals depending on how quickly it changes.

I’ve used a byte as 8 flags to indicate which fields I’m sending, so if something hasn’t changed since the last transmission or since the last packet was appended, a 0 indicates that field isn’t included. If you have 8 integer (2-byte) values and half don’t change, you only need a single flag byte plus 4 x 2 bytes = 9 bytes rather than 8 x 2 = 16. If you use a bit as a flag and a bit for a sign, you can take a 2-byte field and send it as a change of roughly ±64.
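
A sketch of that flag-byte scheme, assuming eight unsigned 16-bit fields with made-up names:

```python
import struct

FIELDS = ["temp", "hum", "press", "vbat", "lat", "lon", "alt", "sats"]  # invented names

def pack_changed(current: dict, previous: dict) -> bytes:
    """One flag byte: bit i set means FIELDS[i] follows as a 2-byte value."""
    flags, body = 0, b""
    for i, name in enumerate(FIELDS):
        if current[name] != previous.get(name):
            flags |= 1 << i
            body += struct.pack(">H", current[name])
    return bytes([flags]) + body

def unpack_changed(payload: bytes, previous: dict) -> dict:
    """Rebuild the full field set, reusing the previous value where the flag bit is 0."""
    flags, offset = payload[0], 1
    result = dict(previous)
    for i, name in enumerate(FIELDS):
        if flags & (1 << i):
            (result[name],) = struct.unpack_from(">H", payload, offset)
            offset += 2
    return result

prev = {name: 100 for name in FIELDS}
curr = dict(prev, temp=101, lat=523, lon=4077, alt=10712)   # only 4 fields changed
pkt = pack_changed(curr, prev)
print(len(pkt), "bytes instead of", 2 * len(FIELDS))        # 9 bytes instead of 16
assert unpack_changed(pkt, prev) == curr
```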

Alternatively, if you have a happy zone for a value that is only going to be aggregated / decimated almost as soon as it’s in your database, you can do that on the device - I rarely end up showing more detail than an hourly average for acceptable values, so in theory I could have the device send the averages for me. If the value is close to a boundary or in an alarm condition, I can code it to send the detail. Or send the average when the value is acceptable and not changing faster than a certain rate, and switch over to detail if there is a sudden (but smoothed) spike in the reading.

The device could save all the readings for a time period and use a variety of schemes to get the best value out of the payload. You can then employ a mechanism that allows you to request a set of data at various levels of detail if you need it.

If you have GPS data, setting your own geographic reference point or having a confirmed uplink for a daily reference point will then allow deltas to be sent.

If you have the time, patience and some scrupulous documentation, combining a range of schemes can produce some remarkable results.

Agreed, there is a balance to strike between acceptable data loss and airtime.

Thanks for the reply

A 100-byte payload can still comply with the 30-second daily airtime fair use limitation. Restricting transmissions to Data Rate 5 (Spreading Factor 7 in EU868 settings) yields a TX time of 174.34 milliseconds.
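
For anyone wanting to check such numbers, they can be reproduced with the standard LoRa time-on-air formula from the SX127x datasheet. The sketch below assumes the 100 bytes is the whole PHY payload, explicit header, CRC on, coding rate 4/5 and an 8-symbol preamble:

```python
import math

def lora_airtime_ms(payload_bytes, sf=7, bw_hz=125_000, cr=1,
                    preamble_syms=8, explicit_header=True, crc=True):
    """Time on air per the standard LoRa formula (SX127x datasheet)."""
    t_sym = (2 ** sf) / bw_hz * 1000                  # symbol duration in ms
    de = 1 if (bw_hz == 125_000 and sf >= 11) else 0  # low data rate optimisation
    ih = 0 if explicit_header else 1
    payload_syms = 8 + max(
        math.ceil((8 * payload_bytes - 4 * sf + 28 + 16 * crc - 20 * ih)
                  / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return (preamble_syms + 4.25 + payload_syms) * t_sym

t = lora_airtime_ms(100)   # ~174.3 ms at SF7 / 125 kHz
print(f"{t:.2f} ms per packet, {30_000 / t:.0f} packets within a 30 s daily allowance")
```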

The application I have in mind is long-distance LoRaWAN floater balloons that fly at 10 km altitude. These are solar-powered devices that transmit only during daylight hours (~7 hours a day at the moment).

Good idea with using the port number. Thanks

Thanks for the detailed reply.

My application is long-distance floater balloons that transmit LoRaWAN packets.
Each packet carries 13 position fixes: the current position as well as 12 previously saved positions that were not received earlier due to poor TTN coverage.

The packet is like this:

  1. Longitude: 2 bytes
  2. Latitude: 2 bytes
  3. Altitude (meters): 2 bytes
  4. Timestamp (minutes since 1/1/2020): 3 bytes
    Repeat steps 1-4 a total of 13 times.

There is indeed some repetition in the structure of the data: repeated longitude/latitude/altitude/timestamp groups. There must be some pattern in there that can be compressed; I just don’t know how to exploit it yet.

The idea of sending each position as a delta is not very attractive to me because the quantisation error adds up. The delta between the first and second positions carries some quantisation error, and that error is then added onto each subsequent position.

Medad

Doh! Should have spotted the name. Perhaps I should look at some disposable launches, I’ve got gas begging to be used stuck in the shed all this past year.

But you / we / UKHAS have loads of data that can be used - or even made up based on your own prior launches. Both Peter & Steve have run some LoRaWAN flights this year, and Steve definitely fed his into HABHub, so there should be some data there. Put some lines into a spreadsheet, get a beverage and stare at it.

Not if the first fix in a payload is an absolute one, and even that doesn’t need the full digits, as a reference point based on a moving window coupled with the last direction of travel can be used.

After some analysis, it may be that the quantisation errors are relatively insignificant over a run of fixes, depending on how many decimal places of accuracy you want for the location. Again, model it in a spreadsheet; you never know what may turn up.

As you are solar powered, if you can charge a supercap, then the first frame can carry the absolute time and subsequent frames can be assumed to have happened on schedule. The timestamp could be two bytes with a baseline of the first transmission after turn-on, giving you 45 days before roll-over - if the balloon goes quiet for that long, it should be feasible to figure out where in its cycle it is when it’s next heard.

Just caught up with the ICSS blog - you’ve been keeping busy!

Maybe with the geofencing & results thus far you could store one fix a day whilst over the no-coverage areas and then find a scheme to weave them into the uplinks when over better areas. Or add some FRAM to keep more of the data.

Additionally, if the panels are generating enough power and are over the right areas, you could open the RX windows to try to get some feedback and purge some of the prior fixes.

That’s something you most certainly do not want to do! You want to be able to use the longer range, slower SF’s for such a purpose. It would only make sense to use SF7 for something like uploading historic data, after getting a reply that indicated you were in range of a gateway.

the current position as well as 12 previously saved positions

You should use maybe 4, spaced exponentially further into the past.

The idea of sending each position as a delta is not very attractive to me because the quantisation error adds up.

Send them each as deltas from the current position, so that you only take that quantisation hit once.
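
As a sketch of that layout, assuming the same 2-byte fixed-point lat/lon/altitude grid as the existing format, 1-byte signed deltas for history fixes that stay within ±127 grid steps of the current one, and timestamps left out for brevity (anything larger would need an escape back to a full fix, not shown):

```python
import struct

def pack_fixes(current, history):
    """current = (lat, lon, alt) in grid units; history = list of the same.

    The current fix is sent absolutely (3 x int16); each history fix is sent as
    signed 1-byte deltas from the *current* fix, so the quantisation hit is taken
    once and never accumulates from fix to fix.
    """
    payload = struct.pack(">hhh", *current)
    for fix in history:
        deltas = [f - c for f, c in zip(fix, current)]
        if not all(-128 <= d <= 127 for d in deltas):
            raise ValueError("delta too large for 1 byte; fall back to a full fix")
        payload += struct.pack(">bbb", *deltas)
    return payload

def unpack_fixes(payload, n_history):
    current = struct.unpack_from(">hhh", payload, 0)
    fixes = [current]
    for i in range(n_history):
        d = struct.unpack_from(">bbb", payload, 6 + 3 * i)
        fixes.append(tuple(c + x for c, x in zip(current, d)))
    return fixes

current = (5234, -407, 1071)                        # made-up grid units
history = [(5230, -410, 1065), (5220, -420, 1050)]
pkt = pack_fixes(current, history)
print(len(pkt), "bytes")                            # 6 bytes + 3 per history fix
assert unpack_fixes(pkt, len(history))[1:] == history
```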

That’s something you most certainly do not want to do! You want to be able to use the longer range, slower SF’s for such a purpose.

From experience on flights, the high data rate is not a problem at all; it’s the curvature of the earth getting in the way that is more of a problem. Our transmissions are regularly picked up at the edge of the line-of-sight limit. We have had transmissions picked up 700 km away when the balloon was at 10.7 km altitude; the gateway was 4 km up on the Matterhorn.

We are better off using a high data rate (SF7 or SF8) and transmitting more regularly, while sticking to the TTN airtime limits.

Refer to image below for 700km+ range:

Steve Randall launched ICSPACE23 on 17 Dec but it is yet to show up again. Its internal EEPROM can save 650 positions/timestamps, which I think is plenty. The challenge is to send everything down over areas of TTN coverage. Global TTN coverage is largely limited to Europe, Japan and America, and it may be up to 10 days before the balloon reaches these areas.

Currently, in each transmission it sends the current position and 12 semi-randomly selected past positions (with no repeats) to give us an idea of the past flight path. Subsequent transmissions fill in the gaps.

Using a downlink (TX from ground to balloon) to purge the non-volatile memory of already-received position fixes is a good idea. On the last flight, while over Belarus, only one of my downlinks was acknowledged by the balloon. Getting packets up to the balloon seems to be more difficult.

Medad

Perhaps get it to send a once-a-day fix for the time it was over the no-coverage area, so you have a better idea of where it was, and then it can progressively fill in more of the detail rather than picking semi-randomly.

Whilst sticking to the lower (faster) SFs to push the data through expeditiously, you may want to send a smaller payload at a higher SF every so often - and if it picks up a downlink with reasonable RSSI/SNR, perhaps get busy sending a few more packets.

Why do you need to include the timestamp for each position in the payload? It takes up a lot of space, even as minutes since the start of 2020. Is there a reason why you can’t use the time from the metadata that comes for free with your received packet?
If you really do need the timestamp, do you have to be accurate to the minute, or would every 10 minutes do, reducing it to 2 bytes per position?

I believe that’s because a packet can include multiple distinct measurements from previous portions of the flight

Strategic resolution is indeed important.

It’s possible, with a fixed stages-back-in-time scheme (especially the exponential intervals I was arguing for), that the individual times don’t need to be sent at all, since they’d be at known intervals before the transmission time.

And as long as the system is open loop, a fixed scheme is really as good as any other. Even a “random” repetition could be pseudo-random; in particular, encrypting the measurement index offset can be a way to turn an ordinal sequence [1, 2, 3…] into an effectively random one which visits all of the same points. A dithering algorithm is probably better than a random one. But given there are going to be very long periods with both good and bad coverage, I continue to think that a sort of exponential history is probably going to be best.
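
A tiny sketch of such a fixed exponential schedule (the count and spacing here are arbitrary):

```python
def history_offsets(n=4, base=2):
    """Offsets back in time for the history fixes: 1, 2, 4, 8, ... readings ago.

    Because the schedule is fixed and known on both ends, the timestamps of these
    fixes never need to be transmitted; they are implied by the time of the packet
    that carries them.
    """
    return [base ** i for i in range(n)]

print(history_offsets())     # [1, 2, 4, 8]
print(history_offsets(6))    # [1, 2, 4, 8, 16, 32]
```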

Yes indeed, we get one timestamp for “free” from the metadata. Notice that it is only one timestamp, and it can conveniently be used to timestamp the current position of the balloon. However, in each packet I also send 12 previously recorded positions of the balloon, so that I get an idea of where the balloon was when it was out of range of TTN gateways. Each of those position fixes needs a timestamp so that I can plot the flight path. Notably, there is no TTN coverage over the oceans or anywhere outside Europe, America and Japan.

The reason for minute precision is that we calculate the sun elevation at which the balloon transmits. The tracker is powered by 6 solar cells, and the power output depends on the solar elevation. When the sun is shining head-on at the solar cells (elevation of 90 degrees), they generate maximum power; at sunrise and sunset, the solar elevation is lowest. We want to know the lowest solar elevation at which the tracker can operate. If we know the position and timestamp accurately, we can calculate the elevation of the sun. From our last flight, ICSPACE23, we found that the tracker’s GPS is operational at just 1.36 degrees of sun elevation, which works out to only about 2% of the power it would get with the sun shining head-on.
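
For anyone wanting to reproduce that kind of check from a fix and a UTC timestamp, a rough solar-elevation estimate looks something like the sketch below. It uses the standard declination / hour-angle formula but ignores the equation of time and atmospheric refraction, so it is only good to a degree or two:

```python
import math
from datetime import datetime, timezone

def solar_elevation_deg(lat_deg, lon_deg, when_utc):
    """Approximate solar elevation; ignores the equation of time and refraction."""
    day = when_utc.timetuple().tm_yday
    # Approximate solar declination for this day of the year.
    decl = math.radians(-23.44 * math.cos(math.radians(360.0 / 365.0 * (day + 10))))
    # Local solar time approximated as UTC plus 4 minutes per degree of longitude.
    solar_hours = (when_utc.hour + when_utc.minute / 60 + when_utc.second / 3600
                   + lon_deg / 15.0)
    hour_angle = math.radians(15.0 * (solar_hours - 12.0))
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(sin_elev))

# Made-up example: a fix at 47°N, 8°E around local solar noon in late December.
print(solar_elevation_deg(47.0, 8.0, datetime(2020, 12, 21, 11, 28, tzinfo=timezone.utc)))
```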

A picture of our tracker is below.

That’s superb, Medad!
So rather than calculating minutes since 1/1/20, could you calculate minutes before the transmission time for the 12 previous positions? That would be a much smaller number, so it could fit into 2 bytes (or fewer) rather than 3.
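
A quick sketch of that idea, assuming the decoder recovers the absolute times from the gateway metadata timestamp (2 bytes of minutes covers about 45 days, so roll-over should not be an issue here):

```python
import struct
from datetime import datetime, timedelta, timezone

def pack_age_minutes(fix_times, tx_time):
    """Encode each past fix as whole minutes before the transmission time (uint16)."""
    ages = [int((tx_time - t).total_seconds() // 60) for t in fix_times]
    return struct.pack(f">{len(ages)}H", *ages)

def unpack_fix_times(payload, rx_metadata_time):
    """Recover absolute fix times using the 'free' timestamp from the uplink metadata."""
    ages = struct.unpack(f">{len(payload) // 2}H", payload)
    return [rx_metadata_time - timedelta(minutes=a) for a in ages]

tx = datetime(2021, 1, 5, 12, 0, tzinfo=timezone.utc)
fixes = [tx - timedelta(hours=h) for h in (1, 6, 24, 72)]
blob = pack_age_minutes(fixes, tx)
print(len(blob), "bytes for", len(fixes), "timestamps")   # 2 bytes each instead of 3
assert unpack_fix_times(blob, tx) == fixes
```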

Regards, Mark