Simplest low-cost forever-persistent storage method for sensor data

@wasn, thank you. Our needs are quite modest.
What do you think would be the simplest cloud-hosted solution to achieve this for one application with low volume?

1 Like

Just use a simple Linux virtual server from any hosting provider.
Install Telegraf and InfluxDB.
Write the configs.

1 Like

You could run the same on a small low-power Linux system at home (e.g. Intel Nuc) without monthly recurring VPS hosting costs.

(E.g. Intel NUC NUC6CAYH with 4GB memory and decent 256GB SSD can be bought for around €200.)

1 Like

@bluejedi, totally! I was looking to see if there was an easy cloud-hosted option, just so I wouldn’t have to worry about data ingestion outages due to my ISP…

I was initially quite interested in something like InfluxData for this reason, but I believe it requires a separate Telegraf instance to bring the data from TTN into InfluxDB.

1 Like

You can run it at home on a small low-power device, but you asked about cloud hosting.

There is no direct way to get data from TTN to your InfluxDB or any other database hosted by yourself…
you always have to use a "gateway" in between, like Telegraf.

It is not hard to configure:
just configure the MQTT input part with the TTN server, your application name and a key,
and on the output side configure your InfluxDB credentials and database name.
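
For anyone wanting a concrete starting point, a minimal Telegraf config along those lines might look roughly like this (broker address, application name, key and database name are all placeholders; check them against your TTN console and InfluxDB setup):

[[inputs.mqtt_consumer]]
  servers = ["tcp://eu.thethings.network:1883"]  # placeholder TTN MQTT broker
  topics = ["my-app-id/devices/+/up"]            # placeholder application name
  username = "my-app-id"                         # TTN application ID
  password = "ttn-account-v2.XXXXXXXX"           # TTN access key (placeholder)
  data_format = "json"

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "sensors"                           # placeholder database name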

1 Like

Promise not to laugh?

I literally have mosquitto_sub piped into tee running under tmux on a basic cloud server.

I then use tail to grab some chunk of interest and pipe it into a Python script tuned to look at whatever I'm interested in that day.
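
In case it helps anyone reproduce that setup, the pipeline amounts to something like this (broker, topic, file and script names are placeholders, not the exact commands used here):

# inside tmux: capture everything to a log file (and the terminal)
mosquitto_sub -h broker.example.com -t '#' -v | tee -a uplinks.log

# later: grab a chunk of interest and feed it to a script
tail -n 1000 uplinks.log | python3 inspect_payloads.py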

OK, it’s for pre-production debug data from the custom server of a test network, but…

4 Likes

Hi @cslorabox, I’m not laughing at all.

For all my TTI & TTN data I run a "black box data recorder" system: for each application I run MQTT "mosquitto_sub -d" on topic "#" from systemd with restart, with the output simply appended to a txt file. The process is restarted every 24 hrs so I get a series of dated record files. This gives me all uplinks, downlinks, outages, etc. The data is very easily examined and extracted.
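
A systemd unit for that kind of recorder could look roughly like this (a sketch with placeholder broker, application ID, access key and paths; the dated-file rotation described above is left out):

[Unit]
Description=TTN black box recorder (one unit per application)
After=network-online.target
Wants=network-online.target

[Service]
# placeholders: broker host, application ID and access key
ExecStart=/usr/bin/mosquitto_sub -d -h eu.thethings.network -u my-app-id -P my-access-key -t '#'
StandardOutput=append:/var/log/ttn/my-app-id.txt
Restart=always
# restart every 24 hours, matching the daily record cycle described above
RuntimeMaxSec=86400

[Install]
WantedBy=multi-user.target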

For some applications the integrity of the record really matters, so at 00:00 hrs each day the previous day's txt file has its SHA256 fingerprint taken and added to a blockchain.
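
The daily fingerprint step itself can be as small as this (file names are placeholders):

# run from cron at 00:00; fingerprint the previous day's record file
sha256sum "my-app-$(date -d yesterday +%Y%m%d).txt" >> fingerprints.sha256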

EDIT: I should have answered @pomplesiegel 's question. The data that I record is stored on cloud servers on block storage volumes that are routinely backed up. You ask about cheap and forever. “Cheap” is a difficult question; my cost is a few $ per month for cloud storage. To me that’s cheap. The best answer to “forever” is to make sure that the data can always be accessed using standard POSIX software tools that will always be available. The next best answer to “forever” is to keep 2 copies in 2 places.

3 Likes

Michael,

This may be too incomplete to be useful for you yet, I fear, and may not be the kind of solution you’re looking for, but I modified the ttn-gateway code for lora_pkt_fwd to store encrypted packets from the gateway server as they’re passed on to the TTN network. The packets are stored with datetime and devadr stamps but are AES encrypted. I got interrupted before I could figure out which AES algorithm to use to decrypt the packets (using separately-maintained decryption keys, with a separate program, in C). I think I’ve got a Python version that would work, but I haven’t tried it yet.

I added objects to local-conf.json to provide parameters for the database storage. The lora_pkt_fwd code that stores the packets can store to either a sqlite3 database (locally) or mysql database (locally or remotely), but that option is a compile-time option, not a runtime option. The database entries are inserted into a table with the following schema:

CREATE TABLE IF NOT EXISTS "ttn_packets" (DTS INT, DevAdr INT, pSize INT, 
Packet CHAR(341) );

DTS is Unix Epoch Time; DevAdr is the device address; pSize is the size of the data packet; Packet is the encrypted packet; the packet is decoded from Base64 but not decrypted.
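
For illustration, reading rows back out of that table (e.g. with Python and sqlite3) would look something like this – just a sketch, with a placeholder database file name:

import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("ttn_packets.db")  # placeholder file name
for dts, dev_adr, p_size, packet in conn.execute(
        "SELECT DTS, DevAdr, pSize, Packet FROM ttn_packets ORDER BY DTS"):
    when = datetime.fromtimestamp(dts, tz=timezone.utc)
    print("{}  DevAddr=0x{:08X}  {} bytes".format(when.isoformat(), dev_adr, p_size))
    # 'packet' is still AES-encrypted here (only the Base64 encoding has been removed)
conn.close()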

My intention was to finish up the packet retrieval and decryption program, then package it up and make it available. I’ve avoided changing the functionality of the lora_pkt_fwd code, so my plan was to distribute my code as a patch on the lora_pkt_fwd code, if I can figure out how to do that. To run it, just turn off the local storage from the local-conf.json file if you just want the TTN packet-forwarding functionality; turn it on by setting the parameters in local-conf.json if you want to add local storage.

I also had to modify the Makefile to accommodate the compile-time options and use of sqlite3 or mysql libraries. I built it as a tree “ttn-storer”, parallel to “ttn-forward” under /opt on my Raspberry Pi, and I created a .service file for it so it’s managed by systemd in place of ttn-forward. But overall, not many changes from the source distribution on github outside of the additions to the C code in lora_pkt_fwd.c.

There are any number of potential problems with the code I’ve written – mostly with throughput. I only have two LoRaWAN nodes in my LoRaWAN working radius – mine! Under load, the database storage part would likely become a bottleneck and need some reworking. I didn’t add any filtering for device addresses, either: it stores whatever is being forwarded to TTN. If you’re in a densely-LoRaWAN-populated area, you might want to filter to store just the packets from your devices to avoid throughput problems.

Anyway, I was just getting ready to go back to this project. If this is the sort of thing you have in mind, let me know and I’ll happily share what I’ve got (acknowledging at the outset that I own the rights to none of that lora_pkt_fwd code! :slight_smile: ). Unfortunately, it’s not well documented yet, either, though if you know how to “make”, can edit a .json file, and know some sql, you’d get started quickly.

I had originally planned to use MQTT to catch and log those packets, but I had no experience with MQTT. I decided the complexity of learning, setting up, and maintaining that in the long term was greater than just logging the data from the forwarder, and would be a distraction from the sensor node work that I really wanted to do. And I learned a lot about the forwarder and TTN from the exercise.

But if you find or can quickly develop a viable MQTT solution, that might be a better bet for you right now.

David

Nice exercise, but beware that decryption needs a lot of details for each device:

  • For ABP, your gateway will need to know the DevAddr, and the secret AppSKey and NwkSKey for each device.

  • For OTAA, you will need to capture both the device’s OTAA Join Request and the network’s Join Accept, and use those to derive the dynamic DevAddr and calculate the secret AppSKey and NwkSKey, just like the devices do. This assumes devices have not already joined, and that TTN uses your gateway for the OTAA Join Accept. OTAA needs your gateway to know the DevEUI, AppEUI and AppKey for each of your devices.

  • For both ABP and OTAA you will also have to validate the MIC to know for sure which device has sent the data, as a DevAddr is not unique.

  • To validate the MIC, you need to keep track of the full uplink counter, as the LoRaWAN packet only includes 16 bits of that counter, but the MIC is calculated using the full counter, which might be 32 bits.

  • Using the AppSKey you could then indeed decrypt the application payload. Like the MIC, this also needs the full uplink counter. (Unlike the MIC, you won’t know if you used the wrong value; you’ll just get a different result.) A minimal sketch of this decryption step follows below.
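
For LoRaWAN 1.0.x, that decryption step boils down to XOR-ing the payload with an AES-128 keystream; a minimal Python sketch (using pycryptodome, with placeholder key, DevAddr and counter values):

from Crypto.Cipher import AES

def decrypt_frmpayload(app_skey, dev_addr, fcnt_up, frm_payload, direction=0):
    """Decrypt a LoRaWAN 1.0.x FRMPayload: XOR with the AES-128 keystream."""
    aes = AES.new(app_skey, AES.MODE_ECB)
    keystream = b""
    for i in range(1, (len(frm_payload) + 15) // 16 + 1):
        block = (bytes([0x01, 0, 0, 0, 0, direction])   # direction 0 = uplink
                 + dev_addr.to_bytes(4, "little")
                 + fcnt_up.to_bytes(4, "little")         # the FULL 32-bit counter
                 + bytes([0x00, i]))
        keystream += aes.encrypt(block)
    return bytes(p ^ k for p, k in zip(frm_payload, keystream))

# placeholder values, for illustration only:
# plaintext = decrypt_frmpayload(bytes.fromhex("00" * 16), 0x26011234, 42, frm_payload)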

So, you’ll need to duplicate a lot of device configuration, and might also want to implement frame counter security. And the beauty of TTN’s community network is that your devices’ transmissions might be received by any other TTN gateway that is around. Some packets might not even be received by your own gateway when other gateways receive them.

When using TTN, I’d really say you should use its MQTT API or the HTTP Integration to get your device’s data (and the LoRaWAN metadata).

(I merged the remarks about the full uplink counter from a later answer into this one, for easy future reference. See the pencil icon in the top right corner for the change history.)

@arjanvanb

Thanks for the comments! All good points that I failed to mention in my haste. My use case may be unique.

I think your first point is really critical: only works with ABP.

My implementation is a little different from what you’re suggesting: I don’t expect the gateway to know anything about AppSKey or NwkSKey … those are only known by the decrypting code, quite likely running on a completely different system for security.

But what’s the reason for your last paragraph?

When using TTN, I’d really say you should use its MQTT API or the HTTP Integration to get your device’s data (and the LoRaWAN metadata).

I thought I’d seen MQTT as the recommended route to get the device’s data. What would be the recommended way to secure a local copy of data as it passes through the gateway?

I’d say that both the MQTT API and the HTTP Integration are equally fine, just depending on your taste or needs. (Unless you’re using some third party that offers some specialized integration.)

…which also needs you to keep track of the full uplink counter, as the LoRaWAN packet only includes 16 bits of that counter, but the MIC is calculated using the full counter, which might be 32 bits. You’ll need the full uplink counter for decryption of the application payload as well.
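
Reconstructing that full counter from the 16 bits in the packet usually goes something like this (a sketch, assuming you persist the last known full counter per device and miss at most one rollover):

def full_fcnt(fcnt_16, last_full):
    """Extend the 16-bit frame counter from the packet to the full 32-bit value."""
    candidate = (last_full & 0xFFFF0000) | fcnt_16
    if candidate < last_full:   # the 16-bit counter wrapped around since last_full
        candidate += 0x10000
    return candidate & 0xFFFFFFFF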

Reading the posts I’ve been wondering about the legal implications of potentially storing third-party data. When you run a ‘normal’ gateway you’re just forwarding data to TTN; if a legal entity decides they want a copy of all data you received, you’re able to tell them you don’t have any and point them to TTN (which is required to have policies in place for when this happens, because of Dutch law).
Once you start storing data things become different. Now you’re not just forwarding data but also storing it, and you might be challenged to hand the data over to the authorities. Are you equipped to handle those requests?

When I added the gwtraf transport to mp forwarder I purposely decided to only forward the metadata of up- and downlinks and drop the data, to make sure I would not face this kind of issue. (Gwtraf forwards metadata of all traffic to a UDP receiver to allow statistics and gateway performance analysis.)

BTW, using poly forwarder or mp forwarder you could just forward any uplinks to a second ‘backend’ and use that to store all the data. No changes required to these packet forwarders :wink:

3 Likes

@kersing

Thanks for the additional and insightful perspective on what I thought was a pretty simple project! :slight_smile:

I wasn’t concerned about security issues in storing the data since it’s encrypted. But you’re right – I could be asked to provide that data if the authorities have other ways to get the keys. I’ll have to think about that a bit, but at the outset, I’d say I’d be OK providing whatever I have, under court order. On the other hand, that would be a good reason to add a filter to store just the packets from DevAdr’s I own. Pretty easy addition to my code, with the DevAdr’s provided through the local-conf.json file (needing no ongoing modifications to the lora_pkt_fwd.c code as I add new devices).

Again, I’m “lucky” at this point in my development – I’m the only LoRaWAN source within range. But considerations like this are important to factor into the code now, rather than be surprised by issues like this in the future.

Thanks for the tip … that might be the better approach for @pomplesiegel to go with as he’s just starting out. And I’ll keep it in mind as an alternative to this local database approach if I run into problems, though at this point the storing function seems to be working just fine. I rather prefer the architecture where all this works off one Pi (forwarding and storing) for ongoing maintenance simplicity: that’s one reason I avoided the MQTT approach in the first place.

But you’ve given me a number of other factors to consider in the implementation. Thanks!

David

Just for the sake of completeness:

Like I wrote, a DevAddr is not unique, so this is not foolproof either. What’s more, the structured way they are assigned already defines about 20 of the 32 bits. So, e.g., there are only about 4,096 different values for all ABP devices in the EU region on the TTN community network.

@arjanvanb

Thanks. Yes, I’ve noted that as something I’ll have to pay attention to when I get back into lora_pkt_fwd.c again. At this point, storage works with DevAdr, and my next task is to finish the C code for retrieving and decrypting packets from the database. Then back to lora_pkt_fwd.c to reconsider device addresses and filtering.

David

Fantastic ideas all around here and so great to learn what everyone is implementing.

After much searching and internal turmoil I found a simple “serverless” solution which fits well to my personal needs.

My entire flow is:
(Device) -> (Gateway) -> (TTN) -> (TTN HTTP integration) -> (Google function - python) -> (Google Bigquery)

I have found this works quite well as a “serverless” option in the cloud. Within the Google function I am checking devices against a whitelist and blacklist, decoding their payloads and metadata, then saving the resulting data into a row of a BigQuery table using a pre-defined schema.

The billing model for these services is quite consistent and cost-competitive. For anyone else with consistent sensor payloads and an attraction to “serverless” options, this may be of interest.

Thanks!
Michael

Using BigQuery as a source for Google Data Studio would also be great!

Hi @pomplesiegel ,

Could you share the google function that you used?

Hi @tomascharad, here is a version of the function with our specific info / sensor stuff stripped out. I left in a temperature data point in addition to the TTN metadata.

Function in Python 3.9:

from flask import Response
from google.cloud import bigquery

bqClient = bigquery.Client() #persistent object as our big query client

#to simplify our strings
DATASET_PREFIX = "sensor-data-ttn.sensorDataset"

#devices to ignore completely
DEVICE_BLACKLIST = [
    "sensor-2" #this will turn off storage for this device
]

DEVICE_TO_TABLE_ID_MAP = {
    "sensor-1": DATASET_PREFIX + ".internal-test_base"
}


TABLE_ID_TO_TABLE_OBJECT_MAP = {
    DATASET_PREFIX + ".internal_base": bqClient.get_table(DATASET_PREFIX + '.internal_base')
}


def insertTupleRowDataIntoTable(table_id, arrayOfTupleRowData):
    # (when testing locally, you can replace the body with 'pass')

    #look up the table object based on its table_id
    #and insert our array of tuple row data into it
    errors = bqClient.insert_rows(TABLE_ID_TO_TABLE_OBJECT_MAP[table_id], arrayOfTupleRowData)
    if errors:
        raise Exception("Errors inserting rows into table '{}': {}".format(table_id, errors))

#For Canopy v4 and the associated table schema (enhanced air-sensor fields stripped out of this example)
#input the TTN uplink JSON from a sensor and it will output the exact row tuple to send to the BigQuery table
def convertSensorJSONIntoSchemaTuple(rawJSON):
    return [
        (rawJSON["end_device_ids"]["device_id"],  # dev-id
         rawJSON["uplink_message"]["rx_metadata"][0]["time"],  # time - for first gateway
         rawJSON["uplink_message"]["decoded_payload"]["sensorType"],  # sensorType
         rawJSON["uplink_message"]["decoded_payload"]["ambientTemperatureInC"]  # ambientTemperatureInC
         )
    ]

def httpHandler(request):
    try: #everything in a try block
        jsonData = request.get_json(silent=True) #turn request into json
        print("Raw JSON payload received: " + str(jsonData))

        #if the port is not 1, then we don't care about this message
        if jsonData["uplink_message"]["f_port"] != 1:
            print("Ignoring message with unexpected port #{}".format(jsonData["uplink_message"]["f_port"]))
            return f'Ignored port'

        #if this device is in our blacklist / ignore list
        if jsonData["end_device_ids"]["device_id"] in DEVICE_BLACKLIST:
            print("Ignoring device '{}' in blacklist".format( jsonData["end_device_ids"]["device_id"] ))
            return f'Ignored device'

        # Map dev_id to the table it should go to (acts as a whitelist)
        try:
            table_id = DEVICE_TO_TABLE_ID_MAP[ jsonData["end_device_ids"]["device_id"] ]
        except KeyError:
            print("dev_id {} not found in LUT. Ignoring".format( jsonData["end_device_ids"]["device_id"] )) #unknown device
            return f'OK'  # acknowledge and drop unknown devices
            # raise Exception("dev_id '{}' not found in LUT".format(jsonData["end_device_ids"]["device_id"]))

        #map sensor type to a converter function
        if jsonData["uplink_message"]["decoded_payload"]["sensorType"] == "mySensorType":
            tupleOfSensorData = convertSensorJSONIntoSchemaTuple(jsonData)
        #put other sensor type cases here
        else: #default case
            print("Unrecognized sensorType")
            return f'OK'  # if everything goes ok
            # raise Exception("Unrecognized sensorType")

        #if we're here, everything has gone very well
        #we recognize this sensor, we have its data formatted correctly for a table
        #and now it's time to store that data into a table

        print("Storing device '{}' data into table '{}':". \
              format(jsonData["end_device_ids"]["device_id"], table_id)
              + str(tupleOfSensorData) )

        insertTupleRowDataIntoTable(table_id, tupleOfSensorData)

        return f'OK' #if everything goes ok

    except Exception as e: #if anything fails at all just return a 403
        print("Some kind of exception occurred: " + str(e) )
        return Response("403 Forbidden", status=403, mimetype='text/plain')