Simplest low-cost forever-persistent storage method for sensor data

pomplesiegel · May 20, 2020, 6:03pm

Hello TTN community,

I am trying to solve a problem that may seem obvious to others, but not to me.

My goal is to create an app + database that stores all device payload data indefinitely at a low cost. This is not for a web-app. Instead, it’s so i can download it all and analyze the data using tools like Pandas in python offline - it can be slow.

However, I can’t seem to find a simple, low-cost solution to accomplish this. Is there a way to inexpensively log all the JSON data into a s3 bucket and just download that when needed?

The current “storage” integration on TTN is perfect and does exactly what I need it do, but it only stores for 7 days.

Any ideas are welcome!

Thank you,
Michael

wasn · May 20, 2020, 6:27pm

i am using telegraf to subscribe to mqtt and save all data to an influx db.

you can do that with every low cost virtual server hoster

pomplesiegel · May 20, 2020, 6:43pm

@wasn, Thank you. You have all of this consolidated on a particular host? Would you mind sharing your particular setup and whether you would recommend it?

InfluxData seemed like they provided 90%, but you still need another server to forward the data using telegraf.

wasn · May 20, 2020, 6:56pm

we have 2 little datacenters (each with a half size rack) where we have one vm with all telegraf instances we have, one telegraf instance for each application (about 30 right now) working in an HA setup.

this telegrafs sends the data to an other vm (nginx) in ha setup which make the loadbalancing.
an for influx db we are running 4 vms, 2 in each datacenter.

the loadbalancer balances all queries mainly internal in the datacenter but can access the dbs in the other datacenter if the other center will fail.

all writes to the db are send to influxdb relays installed on the influxdb vms. these relays are making the magic of mirroring all data to all influxdb vms.

But you can make it a lot more simpler.
Just install telegraf and influxdb on one vm.
it all depends on the datavolume and bandwidth you need.

pomplesiegel · May 20, 2020, 6:58pm

@wasn, thank you. Our needs are quite modest.
What do you think would be the simplest cloud-hosted solution to achieve this for one application with low volume?

wasn · May 20, 2020, 7:12pm

just use a simple linux virtual server from any hoster.
install telegraf and influxdb.
write the configs

bluejedi · May 20, 2020, 7:43pm

You could run the same on a small low-power Linux system at home (e.g. Intel Nuc) without monthly recurring VPS hosting costs.

(E.g. Intel NUC NUC6CAYH with 4GB memory and decent 256GB SSD can be bought for around €200.)

pomplesiegel · May 20, 2020, 7:47pm

@bluejedi, totally! I was looking to see if there was an easy cloud-hosted option, just so I wouldn’t have to worry about data ingestion outages due to my ISP…

I was initially quite interested in something like influxdata for this reason, but I believe it requires a separate telegraf instance in order to bring the data from TTN to influxdata.

wasn · May 20, 2020, 8:20pm

you can run it at home on a small low power device but you asked about cloud hosting.

there is no direct way to get data from ttn to your influxdb or any other database hosted by yourself…
you always have to use a “gateway” in between like telegraf.

it is not hard to config it.
just config the mqtt input part with ttn server, your application name and a key
and on the outside just config your influxdb credentials and db name.

cslorabox · May 20, 2020, 10:23pm

Promise not to laugh?

I literally have mosquitto-sub piped into tee running under tmux on a basic cloud server

I then use tail to grab some chunk of interest and pipe it into a python script tuned to look at whatever I’m interested in that day.

OK, it’s for pre-production debug data from the custom server of a test network, but…

cultsdotelecomatgmai · May 21, 2020, 6:54am

Hi @cslorabox, I’m not laughing at all.

For all my TTI & TTN data I run a “black box data recorder” system. I run MQTT “mosquitto_sub -d” on topic “#” on each application from systemd with restart and the output simply appended to a txt file. The process is restarted every 24 hrs so I get a series of dated record files. This gives me all uplinks, downlinks, outages, etc. Data is very easily examined and extracted.

For some applications integrity of the record really matters so at 00:00 hrs each day, the previous day’s txt file has its SHA256 fingerprint taken and added to a blockchain.

EDIT: I should have answered @pomplesiegel 's question. The data that I record is stored on cloud servers on block storage volumes that are routinely backed up. You ask about cheap and forever. “Cheap” is a difficult question; my cost is a few $ per month for cloud storage. To me that’s cheap. The best answer to “forever” is to make sure that the data can always be accessed using standard POSIX software tools that will always be available. The next best answer to “forever” is to keep 2 copies in 2 places.

hdtodd · May 23, 2020, 8:16pm

Michael,

This may be too incomplete to be useful for you yet, I fear, and may not be the kind of solution you’re looking for, but I modified the ttn-gateway code for lora_pkt_fwd to store encrypted packets from the gateway server as they’re passed on to the TTN network. The packets are stored with datetime and devadr stamps but are AES encrypted. I got interrupted before I could figure out which AES algorithm to use to decrypt the packets (using separately-maintained decryption keys, with a separate program, in C). I think I’ve got a Python version that would work, but I haven’t tried it yet.

I added objects to local-conf.json to provide parameters for the database storage. The lora_pkt_fwd code that stores the packets can store to either a sqlite3 database (locally) or mysql database (locally or remotely), but that option is a compile-time option, not a runtime option. The database entries are inserted into a table with the following schema:

CREATE TABLE IF NOT EXISTS "ttn_packets" (DTS INT, DevAdr INT, pSize INT, 
Packet CHAR(341) );

DTS is Unix Epoch Time; DevAdr is the device address; pSize is the size of the data packet; Packet is the encrypted packet; the packet is decoded from Base64 but not decrypted.

My intention was to finish up the packet retrieval and decryption program, then package it up and make it available. I’ve avoided changing the functionality of the lora_pkt_fwd code, so my plan was to distribute my code as a patch on the lora_pkt_fwd code, if I can figure out how to do that. To run it, just turn off the local storage from the local-conf.json file if you just want the TTN packet-forwarding functionality; turn it on by setting the parameters in local-conf.json if you want to add local storage.

I also had to modify the Makefile to accommodate the compile-time options and use of sqlite3 or mysql libraries. I built it as a tree “ttn-storer”, parallel to “ttn-forward” under /opt on my Raspberry Pi, and I created a .service file for it so it’s managed by systemd in place of ttn-forward. But overall, not many changes from the source distribution on github outside of the additions to the C code in lora_pkt_fwd.c.

There are any number of potential problems with the code I’ve written – mostly with throughput. I only have two LoRaWAN nodes in my LoRaWAN working radius – mine! Under load, the database storage part would likely become a bottleneck and need some reworking. I didn’t add any filtering for device addresses, either: it stores whatever is being forwarded to TTN. If you’re in a densely-LoRaWAN-populated area, you might want to filter to store just the packets from your devices to avoid throughput problems.

Anyway, I was just getting ready to go back to this project. If this is the sort of thing you have in mind, let me know and I’ll happily share what I’ve got (acknowledging at the outset that I own the rights to none of that lora_pkt_fwd code! ). Unfortunately, it’s not well documented yet, either, though if you know how to “make”, can edit a .json file, and know some sql, you’d get started quickly.

I had originally planned to use MQTT to catch and log those packets, but I had no experience with MQTT. I decided the complexity of learning, setting up, and maintaining that in the long term was greater than just logging the code from the forwarder and would be a distraction from the sensor node work that I really wanted to do. And I learned a lot about the forwarder and TTN from the exercise.

But if you find or can quickly develop a viable MQTT solution, that might be a better bet for you right now.

David

arjanvanb · May 23, 2020, 8:40pm

Nice exercise, but beware that decryption needs a lot of details for each device:

For ABP, your gateway will need to know the DevAddr, and the secret AppSKey and NwkSKey for each device.
For OTAA, you will need to capture both the device’s OTAA Join Request and the network’s Join Accept, and use those to derive the dynamic DevAddr and calculate the secret AppSKey and NwkSKey, just like the devices do. This assumes devices have not already joined, and that TTN uses your gateway for the OTAA Join Accept. OTAA needs your gateway to know the DevEUI, AppEUI and AppKey for each of your devices.
For both ABP and OTAA you will also have to validate the MIC to know for sure which device has sent the data, as a DevAddr is not unique.
To validate the MIC, you need to keep track of the full uplink counter, as the LoRaWAN packet only includes 16 bits of that counter, but the MIC is calculated using the full counter, which might be 32 bits.
Using the AppSKey you could then indeed decrypt the application payload. Like the MIC, this also needs the full uplink counter. (Unlike the MIC, you won’t know if you used the wrong value; you’ll just get a different result.)

So, you’ll need to duplicate a lot of device configuration, and might also want to implement frame counter security. And the beauty of TTN’s community network is that your devices’ transmissions might be received by any other TTN gateway that is around. Some packets might not even be received by your own gateway when other gateways receive them.

When using TTN, I’d really say you should use its MQTT API or the HTTP Integration to get your device’s data (and the LoRaWAN metadata).

(I merged the remarks about the full uplink counter from a later answer into this one, for easy future reference. See the pencil icon in the top right corner for the change history.)

hdtodd · May 23, 2020, 8:56pm

@ arjanvanb

Thanks for the comments! All good points that I failed to mention in my haste. My use case may be unique.

I think your first point is really critical: only works with ABP.

My implementation is a little different than you’re suggesting: I don’t expect the gateway to know anything about AppSKey or NwkSKey … those are only known by the decrypting code, quite likely running on a completely different system for security.

But what’s the reason for your last paragraph?

When using TTN, I’d really say you should use its MQTT API or the HTTP Integration to get your device’s data (and the LoRaWAN metadata).

I thought I’d seen MQTT as the recommended route to get the device’s data. What would be the recommended way to secure a local copy of data as it passes through the gateway?

arjanvanb · May 23, 2020, 9:10pm

I’d say that both the MQTT API and the HTTP Integration are equally fine, just depending on your taste or needs. (Unless you’re using some third party that offers some specialized integration.)

arjanvanb · May 23, 2020, 9:21pm

…which also needs you to keep track of the full uplink counter, as the LoRaWAN packet only includes 16 bits of that counter, but the MIC is calculated using the full counter, which might be 32 bits. You’ll need the full uplink counter for decryption of the application payload as well.

kersing · May 23, 2020, 9:29pm

Reading the posts I’ve been wondering about the legal implications of potentially storing third party data. When you run a ‘normal’ gateway you’re just forwarding data to TTN, if a legal entity decides they want a copy of all data you received you’re able to tell them you don’t have any and point them to TTN (which is required to have policies in place in case this happens because of the Dutch law).
Once you start storing data things become different. Now you’re not forwarding data but also storing it and you might be challenged to hand the data over to the authorities. Are you equipped to handle those requests?

When I added the gwtraf transport to mp forwarder I purposely decided to only forward the meta data of up- and downlinks and drop the data to make sure I would face this kind of issue. (Gwtraf forwards metadata of all traffic to a UDP receiver to allow statistics and gateway performance analysis)

BTW, using poly forwarder or mp forwarder you could just forward any uplinks to a second ‘backend’ and use that to store all the data. No changes required to these packet forwarders

hdtodd · May 24, 2020, 11:45am

@ kersing

Thanks for the additional and insightful perspective on what I thought was a pretty simple project!

I wasn’t concerned about security issues in storing the data since it’s encrypted. But you’re right – I could be asked to provide that data if the authorities have other ways to get the keys. I’ll have to think about that a bit, but at the outset, I’d say I’d be OK providing whatever I have, under court order. On the other hand, that would be a good reason to add a filter to store just the packets from DevAdr’s I own. Pretty easy addition to my code, with the DevAdr’s provided through the local-conf.json file (needing no ongoing modifications to the lora_pkt_fwd.c code as I add new devices).

Again, I’m “lucky” at this point in my development – I’m the only LoRaWAN source within range. But considerations like this are important to factor into the code now, rather than be surprised by issues like this in the future.

Thanks for the tip … might be the better approach for @pomplesiegel to go as he’s just starting out. And I’ll keep it in mind as an alternative to this local database approach if I run into problems, though at this point the storing function seems to be working just fine. I rather prefer the architecture where all this works off one Pi (forwarding and storing) for ongoing maintenance simplicity: that’s one reason I avoided the MQTT approach in the first place.

But you’ve given me a number of other factors to consider in the implementation. Thanks!

David

arjanvanb · May 24, 2020, 12:10pm

Just for the sake of completeness:

Like I wrote, a DevAddr is not unique, so this is not fool proof either. Even more, the structured way they are assigned already defines about 20 of the 32 bits. So, e.g., there are only about 4,096 different values for all ABP devices in the EU region on the TTN community network.

hdtodd · May 24, 2020, 12:25pm

@ arjanvanb

Thanks. Yes, I’ve noted that as something I’ll have to pay attention to when I get back into lora_pkt_fwd.c again. At this point, storage works with DevAdr, and my next task is to finish the C code for retrieving and decrypting packets from the database. Then back to lora_pkt_fwd.c to reconsider device addresses and filtering.

David