How to reduce the size of a payload formatter script now that larger ones are blocked

I just noticed in the release notes that there is now a limit on the payload formatter script size.
See the PR

My ESPEasy decoder is now (hopefully) part of the predefined vendor selection, but for users who would like to use their own (altered) version of the decoder script, it may no longer work.
See the decoder script
It is just shy of 40k.

I do understand there needs to be some limit on the sizes, but this limit is clearly going to make it impossible for ESPEasy users to upload their own decoder scripts and I really would like to encourage users to extend what I made.

So is there another way, besides rather bloated JavaScript, to implement a more efficient decoder?
This decoder will only get larger in the future as new plugins are added to ESPEasy.

Running the script through a minifier brings it down to about 18k, which is still way above the 4k limit though.

It would definitely be possible to optimize the existing code for size (at the cost of readability and maintainability), but that would not make it easy for people to extend the code.

Maybe make it easy for people to only include code for the data types they actually use? Like a web page with checkboxes that generates a JS file with only the stuff the user needs?

I can also add some other ways of avoiding big switch statements, but 4k is still unusably low for my use case.
I have over 100 devices to support in ESPEasy, and I am also trying to keep the transmitted data as small as possible.

So one thing to make the script smaller would be to have some kind of library of functions we could include (not sure if JavaScript works like that, I really don't have experience with JS and also don't like the language…)
For example, why do we all have to implement some kind of parser? Why isn't there a decoder to recreate the most basic types from a binary stream, like float, 32-bit int and obviously the basic types even missing from most other languages, like int24_t :wink:
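
Something along these lines is what I mean by a small shared library. This is only a sketch from my side; the helper names and the little-endian byte order are my assumptions, not part of the existing decoder:

```javascript
// Unsigned integer of `size` bytes starting at `offset` (little-endian).
function uintLE(bytes, offset, size) {
  var value = 0;
  for (var i = size - 1; i >= 0; i--) {
    value = value * 256 + bytes[offset + i];
  }
  return value;
}

// Signed 24-bit integer (the "int24_t" case): sign-extend bit 23.
function int24LE(bytes, offset) {
  var u = uintLE(bytes, offset, 3);
  return u >= 0x800000 ? u - 0x1000000 : u;
}

// IEEE-754 32-bit float, decoded with plain arithmetic so it does not
// depend on DataView being available in the formatter sandbox.
function floatLE(bytes, offset) {
  var bits = uintLE(bytes, offset, 4);
  var sign = bits >= 0x80000000 ? -1 : 1;
  var exp = Math.floor(bits / 0x800000) % 256;   // exponent bits 23..30
  var frac = bits % 0x800000;                    // mantissa bits 0..22
  if (exp === 0) return sign * frac * Math.pow(2, -149);  // subnormals
  if (exp === 255) return frac ? NaN : sign * Infinity;   // NaN / Infinity
  return sign * (1 + frac / 0x800000) * Math.pow(2, exp - 127);
}
```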

But 4k is just way, way too small: with 100 device types that is just 40 bytes per type. Just the strings of the plugin names and variable names will use more than that.

Apart from that, I don't even have a proper build-selector web page to let users build their own firmware, so I guess that has a higher priority compared to a web-based JavaScript decoder script generator for a controller that less than 1% of my users are currently using.

This is the original issue that was raised internally to TTI:

You'll note that the v3 stack is already seeing timeouts (not just TTI sysops; I have seen timeouts on the console as well as in webhook packages for <1k decoders), so some correction was inevitable. This happens often enough on v2 as well. The bottom line seems to be that payload formatting is a convenience function that should not be relied upon for production deployment, i.e. decode the raw payload on your own servers.

I really like the template system you've got for defining the byte stream and the corresponding value label, but that, plus the flexibility in the number of decimal places for different sizes, sure does add up.

I'm finalising some examples of doing this with Python & PHP. It is entirely feasible to use a JS engine to run a JavaScript decoder from other languages; I guess it all depends on what the user's end-point is, i.e. webhook or MQTT, and how it's hosted.

How do ESPEasy users generally get their data from TTN? As a webhook would require access to a full-time internet-facing web server, I guess MQTT would be good to run on a machine or a Pi or similar. Perhaps a worked solution in Python that can output the data in a number of different ways would help?
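
For reference, a rough sketch of the MQTT route, written in JavaScript (Node.js with the mqtt package) since that is the language of this thread; the same flow maps directly onto Python with paho-mqtt. The host, application id and API key below are placeholders:

```javascript
// Subscribe to TTN v3 uplinks over MQTT and pull out the raw and, when a
// payload formatter ran, the decoded payload.
const mqtt = require("mqtt");

const client = mqtt.connect("mqtts://eu1.cloud.thethings.network:8883", {
  username: "my-espeasy-app@ttn",   // <application id>@<tenant>
  password: "NNSXS.XXXXXXXX"        // API key with uplink read rights
});

client.on("connect", function () {
  client.subscribe("v3/my-espeasy-app@ttn/devices/+/up");
});

client.on("message", function (topic, message) {
  const uplink = JSON.parse(message.toString());
  // frm_payload holds the raw bytes as base64; decoded_payload is only
  // present when a payload formatter produced something.
  const raw = Buffer.from(uplink.uplink_message.frm_payload, "base64");
  console.log(topic, raw, uplink.uplink_message.decoded_payload);
});
```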

PS, there is also some discussion on the payload formatters included in the device registry which you may want to look at.


Your template style would be good for that, and as soon as I thought of it, I figured you could look at CayenneLPP, which is built in to TTS but sadly doesn't use meaningful field names.

Many of the payload formatters I've assisted on have many branches for port numbers, firmware revisions and internal flags that do tend to get out of hand; I gather that some have hit the 200K mark!

I think there is a lot of room for some thinking & innovation here.


I just realized one other important side effect of this changed limit on decoder size.
It makes it impossible to test a new version of my decoder, as it has to be sent in as a pull request and then added, tested, etc.

You can only test small chunks and hope you don't mess up merging them into the larger decoder.

Another approach, at least for my decoder, is to have a single decoder file for everyone (the big part of my decoder JS) and then just some JSON to describe the fields of those plugins that don't have a very specific decoder.
E.g. my system information and GPS plugins send way more than the standard 4 values per task, in a compressed form where I base all values on the number of bits I need. A longitude/latitude only needs 24 bits, but needs some scaling to use the maximum resolution, and HDOP can also be packed into a single byte.
I could even generate such a JSON from ESPEasy, so the user only needs to copy/paste that information for their specific build.
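
To make that concrete, here is a hypothetical shape for such a description JSON plus the generic code consuming it. The field names and the four-bytes-per-value layout are made up for illustration; floatLE() is the little helper sketched earlier:

```javascript
// Hypothetical per-build description, generated by ESPEasy and pasted in by
// the user; only this part would differ between nodes.
var taskDescriptions = {
  "1": { plugin: "BME280", labels: ["temperature", "humidity", "pressure"] },
  "2": { plugin: "Analog", labels: ["level"] }
};

// Generic fallback: read one little-endian float32 per label. Plugin-specific
// decoders (GPS, sysinfo, ...) would still live in the shared decoder itself.
function decodeGenericTask(bytes, offset, taskIndex) {
  var task = taskDescriptions[String(taskIndex)] || { labels: [] };
  var values = {};
  for (var i = 0; i < task.labels.length; i++) {
    values[task.labels[i]] = floatLE(bytes, offset + i * 4);
  }
  return values;
}
```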

I have looked at Cayenne, but I found it way too limited, as you have to fit your sensor values into a very limited set of predefined units of measure. Domoticz also restricts values to a predefined set of units of measure (albeit a much larger set compared to Cayenne), and thus a lot of sensor types are forced to use "custom" sensors.

Also, one very interesting use case is to collect multiple sensor values which "belong" together. For example, in the yearly Delsbo Electric competition in Sweden (students build a rail vehicle to transport people over a set distance using as little energy as possible) I made the measuring box. This sends GPS info and values from the energy measurement sensors (among other sensor values).
Those values "belong" together, as the energy consumption is related to the position on the track.
So in the header I describe the task (you can have several 'tasks', some of which may run the same 'plugin'), but also a sample set counter.
Since you are likely sending the sensor values in a burst, you always start with one specific sensor. For example, the GPS may trigger a new event every N meters travelled. That triggers the burst of values and thus marks the beginning of such a burst. This is what the sample set counter is used for:
it marks which values 'belong' to each other.
Such concepts I have not seen in other existing payload formats.
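
On the receiving side, making use of that is as simple as grouping by the counter; a tiny illustration (the record shape and field name are invented, only the idea of the counter matters):

```javascript
// Group decoded task records by the sample set counter carried in each
// record's header, so values measured together stay together.
function groupBySampleSet(records) {
  var sets = {};
  records.forEach(function (rec) {
    var key = rec.sampleSetCounter;   // hypothetical field name
    if (!sets[key]) sets[key] = [];
    sets[key].push(rec);
  });
  return sets; // e.g. { "17": [gpsRecord, energyRecord, ...], "18": [...] }
}
```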

I don't know how others use the TTN values.
I have used them via Python to collect live charts during the Delsbo Electric event last year, but MQTT is also used quite a lot.

N.B. I am also planning to add an MQTT controller definition for ESPEasy to interact with other nodes linked via TTN.
This way you can even send commands/events from a node in the field to a connected node via MQTT.


The typical ESPEasy user is using it because of the "Easy" part.
This means requiring a separate server to decode the data may be out of the question.
Of course, if the data were only received by other ESPEasy nodes, it would be less of a problem, as such a decoder can be added for every plugin: just have a build including that plugin and you can read the data. But that is also severely limiting the use cases and adds a kind of "vendor lock-in" behavior to which I'm truly allergic.

Another approach could be to only decode those messages that are actually processed, but it gets complex very quickly to check whether someone is connected and subscribed to an MQTT topic and/or whether a message needs to be published with the retain flag, etc.
These checks may soon take more resources (to maintain and process) than are saved by not decoding.

But at least I can consider adding a very basic decoder for those who need to receive the packed payload and decode it only on ESPEasy.
And since we have it in JavaScript already, I guess it can also be decoded when shown in a web frontend.

Do you have a link?
I don't see it here as a subforum, or maybe I need some more coffee…
I did already create a "vendor" entry for ESPEasy, which has been merged and should be present when the new version is active. That one does include my decoder, so it should not (yet) be affected by the newly imposed limits.

Maybe, but I don't use the console for tricky ones. I'll upload my very simple web page for payload formatter testing: edit the code in your editor of choice and use the web browser's developer console to debug; breakpoints, single stepping, the whole shebang.
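
Something as simple as this does the job. It is only a sketch of the idea, not the exact page I'll upload; decodeUplink(input) with a bytes array and fPort is the entry point the v3 stack expects from a formatter:

```javascript
// Load decoder.js and this file from a bare HTML page with two <script>
// tags, then open the browser's developer console to inspect and debug.

// A captured uplink: byte array plus fPort (example values only).
var testFrame = { bytes: [0x01, 0x02, 0x14, 0x00, 0x00, 0x42], fPort: 1 };

// Call the formatter exactly as the stack would and pretty-print the result.
console.log(JSON.stringify(decodeUplink(testFrame), null, 2));
```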

Not sure I understand this: you won't be able to determine if someone is connected via MQTT unless they connect via a broker you host.

I can think of several ways of potentially solving this problem but need some clarity on how TTN is used:

  • Do users set up their own account on TTN?
  • How do they currently receive uplinks into the ESPEasy eco-system?
  • Does the information end up in a database, and if so, which?
  • Which I guess means: is there some desktop software or web app?

It's a GitHub issue:

Basically, I raised a concern about generic devices (in that case, the Arduino MKR WAN 1310) having a payload formatter defined. I'm not entirely clear whether ESPEasy falls under this use case; I suspect about 30% yes, but as your core device code supports a wide range of sensors, about 60% no. But just letting you know that having a payload formatter in the device repo seems applicable if the device is set in stone: usually a handful of sensors, so a smallish formatter. Other cases need careful review. I suspect you fall into the 'interesting' category where in theory the devices could be considered fixed, but the flip side is that the decoder code is pretty substantial.

Like I suggested before, I'm not really a JavaScript programmer.
As soon as StackOverflow is offline, I lose all my JavaScript capabilities.
I'm more C++ oriented.
Sure, I can debug JS code in the browser, but that's not how I work.
Apart from that, if the JS decoder can no longer be > N kB in size, I cannot use it in the TTN console anymore for testing. That was the point I wanted to make.

ESPEasy users for sure need their own TTN account as soon as they want to use this platform for receiving their data, as I don't want anything to do with data that's not mine.
My goal is to make it as easy as possible to use sensors to collect data yourself and to encourage users to be as creative as possible in how they use it.
This is also why I took quite a lot of effort to make the TTN link from ESPEasy as easy as possible, so there is no steep learning curve to get started, and thus also added an "ESPEasy vendor definition" to pick from.
When using this route, there is a single decoder for potentially all ESPEasy users, but it is clear some will make their own plugins and will extend the decoder, so there will be forks of the decoder script for sure.
I really do understand why some limits may be needed here, so I would like to know how something can be implemented that allows scaling up without adding a lot of complexity.
One way could be to have a decoder which only needs to be fed a (small) JSON per node to provide the strings needed to make the decoded data structure more readable.
But that's already a rather different approach from what's currently present in the TTN environment.

How ESPEasy users receive and process their data is unknown to me.
Like I said, I use it via Python and MQTT, but as the number of options to get the data grows, it gets more complex to implement decoders and/or to detect per message whether it needs to be decoded. Therefore I'm not sure that's the way to go.

If the TTN backend does have performance/resource issues decoding using JavaScript, I can also port my decoder to another language, and like I said, if decoding and "interpreting" could be separated to ease resource usage, I'm all in favor of implementing that for ESPEasy.
But then I need to get an idea of what really causes these performance issues.
What is the main problem here?
I can imagine that having a 100k decoder script for every node out there uses quite a lot of storage.
So if that can be split into, let's say, a 100k decoder for all ESPEasy users and a 1k description JSON file per set of nodes, that would already make a big difference.
The decoding itself is only needed when a message arrives, so that may scale roughly linearly with the number of messages being processed.
And like I said, we can of course distinguish between "decode everything" and "only decode the header" for those who may process the data on ESPEasy nodes. (N.B. there is also RpiEasy, which is built for the Raspberry Pi, with the same UI, rules and plugin/controller structure as ESPEasy.)

So in short, I need to understand where the real bottleneck is, and also whether it is feasible to split the decoder and the node description into some form that tackles the potential bottleneck(s) and still keeps it as easy to use as possible.
But I would like to avoid having to maintain several decoders for several use cases.
One to run on the TTN servers and one inside ESPEasy nodes would be doable, but not more than that.

As you may have seen, the current decoder already defaults to 1…4 floating point values with 4 decimals for those plugins that are unknown to the decoder. So you will not see a descriptive label, but it may already work, as that's the format usable for the majority of ESPEasy plugins.
The "description JSON" I mentioned could be as simple as just describing the labels for a specific plugin and/or task, and the decoder can be extended for those plugins that really benefit from a plugin-specific decoder, like the GPS and sysinfo plugins I mentioned (and some more already implemented).
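
In other words, the description JSON could be as small as a label list per task, with the shared decoder just renaming its default output. The names below are purely illustrative:

```javascript
// Hypothetical per-node label set: task index -> names for values 1..4.
var labelOverrides = {
  "3": ["soil_moisture", "soil_temperature"]
};

// Rename the default value_1..value_4 output of the shared decoder;
// anything without a label keeps its generic name.
function applyLabels(taskIndex, values) {
  var labels = labelOverrides[String(taskIndex)] || [];
  var out = {};
  for (var i = 0; i < values.length; i++) {
    out[labels[i] || ("value_" + (i + 1))] = values[i];
  }
  return out;
}
```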

As ESPEasy is (finally) getting more attention from others who are actively (and frequently) committing code, the number of plugins has been growing quite fast lately. So I would not be surprised if we reach the current limit of 255 plugins in the next 2 years.

Nice idea, but I can't see that being implemented any time soon. I'm not aware that they are actually going to restrict device repo payload formatters, so you may well be OK, subject to you getting sorted with a testing framework.

Sadly not, see below

Scaling:

The decoders have to be held in memory (Redis) so that they are almost instantly available, and whilst there is a hierarchy (so only one copy of the device repo decoder is in memory), if you have 10,000 applications with half of them having a payload decoder and 10% having individual device decoders, the memory use gets over the top, even if there is only one instance of your 'big' decoder.

And then the JavaScript has to be run: pulled from memory into a JS runtime, executed, and the output inserted into the uplink package and passed on to the next stage. So if half a dozen larger decoders happen to be running at once by some random coincidence of uplinks and there is a surge of other uplinks all needing decoding, the server has to split its processing n ways; each of those n slows down, which expands the window of overlap, more uplinks come along, and you have a classic case study in scalability for a CS undergraduate.

As decoding on TTN seems to have potential issues, perhaps a poll of your users to find out how they retrieve their data & where they put it would help.

For those behind a router, I'd go with MQTT & Python, have the decoder in the Python, and then provide a number of templates for data output: a tab-separated file or an insert into a SQL database.

OK, that makes it clear that "end decoding" is by far preferable to giving users full freedom in custom decoding.

You mentioned Cayenne being decoded a bit differently.
So maybe I can also look into how that's done and derive from it an implementation that causes less of a load on the TTN servers.
After all, it is not as if decoding this should turn into an unmanageable mess when scaling;
it should be the same for the biggest part of the ESPEasy users.

I really don't know how many ESPEasy nodes are out there.
I do know that the number of GitHub downloads per build is roughly 20k, so I guess there will be somewhere between 20k and 100k actively used nodes, maybe more, as not all users update all their nodes on every build.
The number of ESPEasy users using the TTN controller is currently rather low, as it is a relatively new implementation and you need to build your own board using the RN2xx3. But as soon as I have support for those ESP32 boards with SX127x chips, I guess it may grow. (Current estimate: about 100 ESPEasy nodes running the TTN controller, of which roughly 40 were built by me.)
So better to have it ready for larger deployment before the number of users grows :slight_smile:

I will sleep on it some more and maybe also have a call with @kersing to discuss the possibilities, as I'm currently rather in the dark about the internal structure of how data is processed in the TTN architecture. After all, he introduced me to TTN and sparked my enthusiasm :slight_smile:


It's embedded in the Go code, so it "just works".

Mostly, no one needs to know: you put data into a payload on a device and TTN spits it out the other end using a number of different connection types, and we are deeply unlikely to influence any changes to it, so it's a bit academic.

If you want to Zoom with @kersing & me, just let me know.


I understand that adding another system to the mix is not what we want, because it makes it harder for people to understand and manage. But how about using a system that is dedicated to decoding? Bitdecoder supports custom JavaScript functions.

Another alternative might be if TTN supported a "standardized" description format like https://kaitai.io/. In that case, TTN would only need a single parser (and could use an implementation that is more efficient at execution time than JavaScript; Rust or Go, for example), and each device would have its own YAML file describing the payload.

I use kaitai to decode lorawan payloads inside Azure StreamAnalytics. It works very well.

Cool stuff to read, thanks.

I'd personally prefer more tools for people to 'easily' build their own decoders on their own backends and take any load we can off the TTN servers; most of everything else falls under a specification, whereas the JS decoder has many potential levels of efficiency.
