Feedback needed: Website to log all MQTT data for you

Hello everybody, my name is Medad Newman.

From using The Things Network, I have found that saving MQTT data is quite complicated. TTN does not store the data for you, so I have had to roll my own MQTT logger: set up a server and run a program that logs everything that comes through. It is not simple.
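For anyone curious what "rolling your own" looks like, here is a minimal sketch along the lines of what I run, using paho-mqtt against the v2 MQTT broker. The host, application ID and access key are placeholders for your own values:

```python
import paho.mqtt.client as mqtt

APP_ID = "my-app-id"                # placeholder: your TTN application ID
ACCESS_KEY = "ttn-account-v2.xxxx"  # placeholder: your TTN access key
HOST = "eu.thethings.network"       # placeholder: your region's broker

def on_connect(client, userdata, flags, rc):
    # Subscribe to uplinks from every device in the application.
    client.subscribe("+/devices/+/up")

def on_message(client, userdata, msg):
    # Each uplink arrives as a JSON payload; append it to a log file.
    with open("uplinks.log", "ab") as f:
        f.write(msg.payload + b"\n")

client = mqtt.Client()
client.username_pw_set(APP_ID, ACCESS_KEY)
client.on_connect = on_connect
client.on_message = on_message
client.connect(HOST, 1883, 60)
client.loop_forever()  # has to stay up 24/7, hence the server
```

The script itself is short; keeping it connected and running around the clock is the part that needs a server.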

I can make things easier for you. I am building a website where all you have to do is key in your:

  • Application ID
  • Access key

And I will log everything that comes through, night and day.

The data can be easily exported.

Is this a good idea? Any suggestions to improve it? Would you use it? What would you like to see on the website?

Let me know in the comments!

Preview image: [image]

As you will be handling third-party data, what is the privacy policy? How are you going to secure the data? Where are you located (i.e. which laws apply to you), and where is the server you are using located (which laws apply to that server)?
How long will you be storing data?
TTN already provides a Data Storage option where application data is stored for up to a week - what makes your offering different?

As you can see from my questions, providing this service is not as simple as writing the code and deploying it; you need to think about the non-technical stuff as well.

Yeah, sorry @MedadNewman - @kersing is touching on the legals that will royally bite you very early on in this well-meaning foray. And then you will get questions about how to decode payloads that are specific to each user's application but will appear to involve you. If your service goes down, you will be shouted at. And as you have to subscribe to multiple feeds, you need to consider scaling issues.

And there are already multiple websites that automagically feed in the uplinks and then go on to do all sorts of magic with charts, tables & alerts.

Creating a Python MQTT script that initialises a MySQL database and installs itself as a service, especially if it runs on a Raspberry Pi, would be a winner, as you won't have any of the issues above - apart from, er, well, how to decode the payload, why it won't install, how to run it behind a firewall, what ports to open, etc.
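As a sketch of the database side (standard-library sqlite3 here to keep it self-contained - the same shape applies to MySQL, and the JSON field names follow the v2 uplink format, so check them against your own traffic):

```python
import json
import sqlite3

db = sqlite3.connect("uplinks.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS uplinks (
        received_at TEXT,
        device_id   TEXT,
        payload_raw TEXT   -- base64 payload, decoded later
    )"""
)

def store_uplink(msg_payload: bytes):
    # Call this from the MQTT on_message callback.
    uplink = json.loads(msg_payload)
    db.execute(
        "INSERT INTO uplinks VALUES (?, ?, ?)",
        (uplink["metadata"]["time"], uplink["dev_id"], uplink["payload_raw"]),
    )
    db.commit()
```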

Take a look at:

https://www.thethingsnetwork.org/docs/applications/storage/
https://www.thethingsnetwork.org/docs/applications/http/

I use these for convenience; the second is live, the first I have to pull data from. Using both means that if my web server goes AWOL I can still get the data from Storage. The simplest script I have for HTTP just saves each incoming uplink into a text file - that's a three-liner. You may find these fit your needs better.
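For flavour, a Python equivalent of that save-to-file HTTP receiver, stdlib only - the port and filename are arbitrary, and in real use you'd put it behind TLS:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class UplinkHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The HTTP integration POSTs each uplink as a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        with open("uplinks.log", "ab") as f:
            f.write(self.rfile.read(length) + b"\n")
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 8080), UplinkHandler).serve_forever()
```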

Publishing the source is definitely a good idea, though it probably makes as much sense to target a cloud virtual machine (perhaps even with a cloneable machine image) as a Pi, since it's a lot easier to safely expose that to the various places one might want to see the data from (including a phone that might be on a mobile network even when one is physically sitting in the living room of one's own home).

Ultimately most good ways of doing this would be fairly portable between a cloud instance and a Pi anyway, so if someone really wants to use a Pi as a data appliance, that remains an option.

Absolutely, there was a degree of sarcasm in there about making it run on a Pi :wink:

A Python MQTT template is fairly generic - it's decoding the payload that trips people up, along with either having code that automagically adds new fields to the database or remembering to add them when changing the decoder.
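A sketch of the "automagically adds new fields" approach, assuming the decoder hands you a flat, non-empty dict of fields (sqlite3 again; field names come from your own decoder, not from untrusted input):

```python
import sqlite3

db = sqlite3.connect("uplinks.db")
db.execute("CREATE TABLE IF NOT EXISTS readings (received_at TEXT)")

def store_reading(received_at: str, fields: dict):
    # fields is assumed non-empty, e.g. {"temperature": 21.3, "humidity": 40.2}.
    # Add a column for any decoded field we haven't seen before.
    existing = {row[1] for row in db.execute("PRAGMA table_info(readings)")}
    for name in fields:
        if name not in existing:
            db.execute(f'ALTER TABLE readings ADD COLUMN "{name}"')
    cols = ", ".join(f'"{n}"' for n in fields)
    marks = ", ".join("?" for _ in fields)
    db.execute(
        f"INSERT INTO readings (received_at, {cols}) VALUES (?, {marks})",
        [received_at, *fields.values()],
    )
    db.commit()
```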

That does raise the option of simply saving the decrypted payloads and metadata, and doing the parsing into fields and the assignment of meaning at query time - either server-side, or even with some way to save and apply JavaScript decoders that run in the browser.

The downside is that it largely precludes using the database engine to find packets with atypical readings.

They still have to write the decoder, and if you perform a query more than once, decoding every time becomes relatively expensive, particularly for range queries and decimation.
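To make the trade-off concrete, a sketch of decode-at-query-time against the raw-uplink table from the earlier sketch - the two-int16 payload layout is entirely hypothetical:

```python
import base64
import sqlite3
import struct

db = sqlite3.connect("uplinks.db")

def decode(payload_raw: str) -> dict:
    # Hypothetical layout: two little-endian int16s,
    # temperature and humidity, each scaled by 100.
    temp, hum = struct.unpack("<hh", base64.b64decode(payload_raw))
    return {"temperature": temp / 100, "humidity": hum / 100}

# The decoder runs on every row, every time the query runs -
# which is exactly the repeated cost being pointed out above.
for received_at, payload_raw in db.execute(
    "SELECT received_at, payload_raw FROM uplinks"
):
    print(received_at, decode(payload_raw))
```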

Agreed on your data handling points. I will have a think about it.

I have not really figured out how to use the Data Storage integration on TTN. I will have a look at it.

Medad

Just add it as an integration to an application.

Then you'll find there is a link on the integration overview page to a Swagger instance where you can try out the data retrieval API - it's fairly simplistic, so it's not really a query tool; it just lets you retrieve at application or device level for a time range of up to 7 days.

You get the entire uplink package but none of the metadata, so it's a poor relation to MQTT or HTTP, and as with all of them, you need to work on the basis that you will decode the raw payload yourself.
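For reference, pulling from it programmatically looks roughly like this - the hostname pattern and `last` parameter are from the v2 Data Storage API as I understand it, and the IDs are placeholders:

```python
import requests

APP_ID = "my-app-id"                # placeholder
ACCESS_KEY = "ttn-account-v2.xxxx"  # placeholder

# Fetch up to the last 7 days of uplinks for the whole application.
resp = requests.get(
    f"https://{APP_ID}.data.thethingsnetwork.org/api/v2/query",
    headers={"Authorization": f"key {ACCESS_KEY}", "Accept": "application/json"},
    params={"last": "7d"},
)
resp.raise_for_status()
for uplink in resp.json():
    # Available fields depend on your payload decoder; "raw" is the base64 payload.
    print(uplink.get("device_id"), uplink.get("time"), uplink.get("raw"))
```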

Do you think the metadata is useful for your applications? And is 7 days' storage time sufficient?

I suspect people will need more long-term storage. What do you think?

Useful? Absolutely - but only for debugging frequency, signal strength & gateway coverage, which isn't a daily thing.

Is 7 days sufficient? Absolutely not. But it's very useful if/when an HTTP or MQTT integration has a glitch and you end up with a gap. And it's brilliant for interactive development, coupled with the JavaScript decoder.

One of my desktop dashboard & management packages pulls the data from Data Storage under its own steam - so there's no remote server for HTTP/MQTT required and no internal MQTT process that can suffer; I can send an installer EXE, they install it and it just works.

Sure - long-term data, trends, machine learning: all sorts of things come from lots of data points. But if you can organise a gateway and set up (or even create) devices, you should be able to cope with installing & configuring an appropriately documented PHP script on a low-cost virtual host, or the same with Python.

There's nothing wrong with your idea per se, but you need to go into this with your eyes open and with Ts&Cs that protect you, so you can stop if it gets out of hand, or limit people's expectations about functionality & data assurance (backup & integrity). If you start charging, that's when the pain begins, and freemium as a road to riches is an urban myth - getting 5% of your users to pay is considered a success. And in this arena, as we see from people travelling hopefully in the expectation that TTN can service their commercial needs for free (absolutely fine, BTW) but with a responsive SLA (not here, use TTI), most TTN users are either tech self-service or aren't working at a scale where they'd want to pay for data storage.

I think scaling is also something you need to consider. I've devices collecting 10 data points every 15 minutes - that's 960 per day per device. Go for a smallish-scale deployment, say 50 devices (I'm thinking refrigeration unit management at a distribution centre), and that's 48,000 data points per day plus the 4,800 raw records. One year on, you are storing over 17.5 million data points, and then a manager makes a mistake with a query that Access/Excel/Tableau lets them generate and it sets your database on fire.
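Back-of-envelope, for anyone checking those numbers:

```python
uplinks_per_device_per_day = 24 * 60 // 15                    # 96 uplinks at 15-minute intervals
points_per_device_per_day = 10 * uplinks_per_device_per_day   # 960
devices = 50

daily_points = devices * points_per_device_per_day   # 48,000
daily_raw = devices * uplinks_per_device_per_day     # 4,800
yearly_points = daily_points * 365                   # 17,520,000
print(daily_points, daily_raw, yearly_points)
```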

You can consider raw data only, but at some point it has to be translated into data points, so the user will need to download it, process it and put it into a database, which rather diminishes the point of your proposal - they may as well use the 7-day Data Storage.

What is your primary goal here? Creating a community resource for self-deployment has merit. But a commercial venture needs a lot of consideration.

If this idea is inspired solely off the back of pernickety details in the HTTP or MQTT integrations (and there are some really good ways to suffer an exception that you didn't expect to ever be a thing), start another topic and I'd hope we can get you up & running.

Thanks @descartes for the very detailed info on the things to consider.

In summary, you say that:

  1. Reliability is very important: it should be possible to work this out. Reliability is the core selling point, I think. Anyone can create a logging system, but guaranteeing 99.9999% uptime is hard. If I work on ensuring high uptime, my service to customers is reliability. This, I think, is worth paying for.
  2. Metadata is semi-useful: I personally think it's quite useful during setup of end nodes, but not much later.
  3. Scaling needs to be considered: I agree, but I have to start somewhere. I plan to use virtual servers (AWS etc.), so I think scaling should not be a huge problem. It can be optimised for scaling as more devices are connected.
  4. "Freemium as a road to riches": I have to start somewhere, right? The only way to find out if freemium works is to try it, I think.
  5. Deployment issues: I have a system up and running that has not gone down for 2 months, so I think it works for now. It uses Python and runs on a virtual private server.

One of the peculiarities of the world of creating a product - a service or something 3D - is that you start out with one idea and, as it develops, you discover a niche that does pay. So, sure, build something, but keep evaluating, keep reviewing & keep your eyes peeled.

Reliability - six nines is about 32 seconds of downtime a year (even four nines is only 52 minutes) - if that's access, they can probably live with it; if it's collecting data, that's not so great.
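Spelling out the availability sums:

```python
minutes_per_year = 365 * 24 * 60           # 525,600
print(minutes_per_year * (1 - 0.999999))   # six nines: ~0.53 min, about 32 seconds
print(minutes_per_year * (1 - 0.9999))     # four nines: ~52.6 minutes
```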

Metadata could be stored as part of a paid plan - I use it to identify devices that are at risk due to only being heard by one or two gateways at the edge of their transmission capability vs battery life.

I actually put "freemium as a road to riches is an urban myth" - I'm saying don't bet the farm on such a strategy; it rarely works well or as you expect. Google for the many case studies & how-tos so that you use best practice. Review the obvious Google hits for other similar services (Adafruit.io, for instance) to see what & how they offer, and see if you can find something they've missed.

Good luck!


Thanks for the feedback @descartes! Really useful stuff.

Metadata has enduring value.

E.g. consider:

“3/4 of my sensors vanished”

vs

“3/4 of my sensors vanished and the surviving ones are 40 dB weaker than they were”

Typically storage should be eternal, as you learn the most interesting things looking at the long term.


Thanks for the feedback about the value of the metadata.