TTS Open Source - Starting a local version leads to a network server error

mattp · January 28, 2025, 10:53am

Hello dear experts.

Here is some context.

I want to run The Things Stack Open Source Edition on a device with a customized OS, which will be permanently running a local instance of the server.
I want to use it to connect gateways and devices and forward the data to another server. That other server will also ask the TTS instance to manage devices (I will use the API for that).

It is important to note that I cannot use Docker on this device.
This is why I deployed TTS from the “releases” files (for armv7).

I managed to launch the following (I’ll explain later why I am forced to use the component flags):

./ttn-lw-stack start as is js console gs

It starts correctly.
I can log in as an admin, and create a new application.

I assume that I cannot use the console to add devices to an application (I get an ‘Error 404’ when I try) because the Network Server is not running.
I do have the “data” folders, with the lorawan-devices, lorawan-devices-index and lorawan-frequency-plans folders.

INFO    Client error    {"auth.token_id": "4ZGMSUJYQ2AYGIG5LGKCHSYCRWHDNCRGBW23XCQ", "auth.token_type": "AccessToken", "duration": 0.0193, "http.method": "GET", "http.path": "/api/v3/dr/applications/testing-app/brands", "http.status": 404, "namespace": "web", "peer.address": "192.168.2.150:53771", "request_id": "01C2T4HS3HFGW3BDE6466AQ0EM"}

But I cannot add the ‘ns’ flag (or use ‘start’ with no component flags at all) without the following error appearing:

DEBUG   Subscribed      {"namespace": "applicationserver/io/packages"}
error:cmd/internal/shared:initialize_join_server (initialize Join Server)
    correlation_id=40604e0fc88b4d4e8e5fa4605fa9f06b
--- error:pkg/errors:syscall (`write` failed)
    timeout=false
    syscall=write
    error=broken pipe
    correlation_id=d7f88daf33bf4f5dbf20bda90d5c5ab4
--- broken pipe

Adding the Network Server makes the Join Server fail to initialize.

EDIT:

I tried without the JS, but with the NS. (I also disabled the JS in the “Console” section of the configuration file to test).
It does not crash, but I have this constant error:

WARN    Task failed     {"error": "broken pipe", "error_cause": "broken pipe", "invocation": 8, "namespace": "networkserver", "syscall": "write", "task_id": "process_downlink_0", "timeout": false}
ERROR   Failed to pop entry from downlink task queue    {"error": "broken pipe", "error_cause": "broken pipe", "namespace": "networkserver", "syscall": "write", "timeout": false}
go.thethings.network/lorawan-stack/v3/pkg/log.(*zapHandler).HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/zap_handler.go:84
go.thethings.network/lorawan-stack/v3/pkg/log.(*Logger).Use.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/logger.go:55
go.thethings.network/lorawan-stack/v3/pkg/log.HandlerFunc.HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/handler.go:38
go.thethings.network/lorawan-stack/v3/pkg/log/middleware/observability.(*observability).Wrap.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/middleware/observability/observability.go:86
go.thethings.network/lorawan-stack/v3/pkg/log.HandlerFunc.HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/handler.go:38
go.thethings.network/lorawan-stack/v3/pkg/log.(*Logger).commit
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/logger.go:75
go.thethings.network/lorawan-stack/v3/pkg/log.(*entry).commit
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/entry.go:69
go.thethings.network/lorawan-stack/v3/pkg/log.(*entry).Error
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/entry.go:94
go.thethings.network/lorawan-stack/v3/pkg/networkserver.(*NetworkServer).processDownlinkTask
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/networkserver/downlink.go:2234
go.thethings.network/lorawan-stack/v3/pkg/networkserver.New.(*NetworkServer).createProcessDownlinkTask.func17
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/networkserver/downlink.go:1892
go.thethings.network/lorawan-stack/v3/pkg/task.Func.Execute
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/task/task.go:47
go.thethings.network/lorawan-stack/v3/pkg/task.DefaultStartTask.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/task/task.go:167
WARN    Task failed     {"error": "broken pipe", "error_cause": "broken pipe", "invocation": 9, "namespace": "networkserver", "syscall": "write", "task_id": "process_downlink_0", "timeout": false}

It may well be because of my configuration.
I am using the .ttn-lw-stack.yml file, but some things are still hard for me to understand.
Of course, because I am running the stack on localhost, I am not using TLS, and I set everything related to TLS and certificates to empty or disabled.

First of all, which servers do I really need to run?
Then, what about interoperability? I did not really understand whether I need it for my case, or whether it has to be configured for the other components to work.

Thanks for your help.

If necessary, I can paste my configuration file, but I will only paste my NS configuration for a better reading of this already long post.

Network Server configuration part:

ns:
  application-uplink-queue:
    buffer-size: 1000
    fast-buffer-size: 16384
    fast-num-consumers: 128
    num-consumers: 1
  cluster-id: ""
  cooldown-window: 1s
  deduplication-window: 200ms
  default-mac-settings:
    adr-margin: 15
    class-b-timeout: 10m0s
    class-c-timeout: 5m0s
    desired-adr-ack-delay-exponent: ""
    desired-adr-ack-limit-exponent: ""
    desired-max-duty-cycle: ""
    desired-rx1-delay: "5"
    status-count-periodicity: 200
    status-time-periodicity: 24h0m0s
  dev-addr-prefixes: []
  device-kek-label: ""
  downlink-priorities:
    join-accept: highest
    mac-commands: highest
    max-application-downlink: high
  downlink-queue-capacity: 10000
  downlink-task-queue:
    num-consumers: 1
  interop:
    blob:
      bucket: ""
      path: ""
    config-source: ""
    directory: ""
    id: ""
    url: ""
  net-id: "000000"

descartes · January 28, 2025, 11:17pm

There are many people that get in to a tangle just setting up TTS OS via Docker using the instructions supplied, and I’ve not seen any posts from anyone who’s gone direct or even compiled the source, so responses are likely to be limited.

I’d start out with running the Docker version so I can poke about to see how it links together. And then look at the Docker support files.

Or use a supported ARM platform like a Raspberry Pi? If this isn’t possible, please tell us what is unique about your board with customised OS so we can make suggestions.

mattp · January 29, 2025, 11:39am

Thanks for your answer.

I tried the same configuration under Docker and … it worked.
(It was under the amd64 version)
No errors at startup, and I was able to access the console.
I used the same configuration file; I only changed the database and folders sections to avoid errors with the Docker environment.

I opened all the necessary ports on my custom OS (I looked at the ones set in the Docker yml file), but what I do not understand is why a “write” error appears, and only when I start both the Join and Network Servers.
The issue clearly comes from the Network Server, because the Join Server alone does not seem to misbehave, while I get the constant broken pipe error with the NS.

When looking into the Network server configuration (or others components config involving it), I do not see any settings that might be causing this error.

Now the best solution would be to use Docker, but the device I use can’t run it.

To be more specific about my situation, I am currently working on replacing an older LoRaWAN server on a “homemade” device (and OS) owned by the company I work at, with memory and power constraints (as for the OS, it runs on armv7 but has no package manager, which is why I need to cross-compile or use releases).

This is why I can’t use Docker, or more precisely, why it would be the very last resort: it would require a lot of configuration and changes to the device’s system, testing, etc.

I do not have access to a Raspberry Pi right now (but I might buy one for myself), so I can only try under a virtualized Docker.
I did it a few weeks ago, but it was very slow. I can retry with the settings I got working under the amd64 Docker version.

However, I hope someone knows what the Network Server does that might be causing the broken pipe errors.
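
For readers unfamiliar with the message itself: "broken pipe" is the operating system's EPIPE error, returned when a process writes to a pipe or socket whose other end has already been closed. The sketch below reproduces that error class with a plain pipe in Python. It is only an illustration; it does not show which connection inside TTS is being closed, which this thread never establishes.

```python
import errno
import os


def write_after_reader_closed() -> int:
    """Write to a pipe whose read end is closed and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the peer goes away, as a stopped or crashed service would
    try:
        os.write(write_fd, b"downlink task")
    except BrokenPipeError as exc:  # OSError subclass that Python raises for EPIPE
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    code = write_after_reader_closed()
    print(errno.errorcode.get(code, "no error"))
```

In other words, the error says that whatever the Network Server was writing to stopped listening; finding out which peer that is, and why it went away, is the actual diagnosis.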

Thanks again for your answer, if I manage to resolve this issue, I’ll update this topic to explain how I did.

descartes · January 29, 2025, 1:08pm

This is a common situation of trying to get new software working on older hardware which can’t cope for any number of inexplicable reasons. Experience says that you are likely to put a lot of effort in to this, when a hardware upgrade in parallel would ensure continuity of supply and be able to cope with the new, improved TTS. A Pi 4 should be able to cope if it’s using an SSD, and you can roll your own boards with the CM4 and have access to a huge set of resources for assistance. I’d expect a Pi 5 to be a bit over the top but that will surely set new records, particularly a 16GB model.

mattp · February 3, 2025, 3:20pm

Thanks for your answer.

Unfortunately, I am not able to change the device for now, because the microchip and RAM memory cannot be changed (I need to keep the same device for production).

However, I tried to check the memory consumption, and it does not seem to be the issue.
The hardware I’m working on is pretty limited, but it still manages to run the console and the other working component servers without trouble.

After further testing, I found out that starting both the Network Server and the Identity Server triggers the error.

I also found that the error appears, more precisely, when the NS tries to process a downlink.
I do not know if this is supposed to happen by default, or if I did something wrong with my settings.
Looking at the settings, I did not find any that turn downlink sending on or off, except for the MQTT protocols, which I turned off because I will not need them for the moment.

To summarize it (using the ttn-lw-stack start command, with the flags) :

(OK = no crash or errors and the console is usable, but I cannot manually add devices (404 error), and uplinks and downlinks from devices simulated with the ELoRa emulator are not taken into account: the devices show no activity, and neither the emulator nor the TTS console/terminal shows any error.)

The console and as flags are always used.

ns alone => OK
js with anything else except ns => OK
ns with js => instant crash, broken pipe error :

error:cmd/internal/shared:initialize_join_server (initialize Join Server)
    correlation_id=0de832151bea4130993eea2777b3bd0c
--- error:pkg/errors:syscall (`write` failed)
    syscall=write
    error=broken pipe
    timeout=false
    correlation_id=fa3ca07a814d4592a1f6a4ee24808f88
--- broken pipe

ns with is => multiple errors, coming from a failed downlink attempt by the ns :

ERROR   Failed to pop entry from downlink task queue    {"error": "broken pipe", "error_cause": "broken pipe", "namespace": "networkserver", "syscall": "write", "timeout": false}
go.thethings.network/lorawan-stack/v3/pkg/log.(*zapHandler).HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/zap_handler.go:84
go.thethings.network/lorawan-stack/v3/pkg/log.(*Logger).Use.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/logger.go:55
go.thethings.network/lorawan-stack/v3/pkg/log.HandlerFunc.HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/handler.go:38
go.thethings.network/lorawan-stack/v3/pkg/log/middleware/observability.(*observability).Wrap.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/middleware/observability/observability.go:86
go.thethings.network/lorawan-stack/v3/pkg/log.HandlerFunc.HandleLog
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/handler.go:38
go.thethings.network/lorawan-stack/v3/pkg/log.(*Logger).commit
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/logger.go:75
go.thethings.network/lorawan-stack/v3/pkg/log.(*entry).commit
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/entry.go:69
go.thethings.network/lorawan-stack/v3/pkg/log.(*entry).Error
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/log/entry.go:94
go.thethings.network/lorawan-stack/v3/pkg/networkserver.(*NetworkServer).processDownlinkTask
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/networkserver/downlink.go:2234
go.thethings.network/lorawan-stack/v3/pkg/networkserver.New.(*NetworkServer).createProcessDownlinkTask.func17
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/networkserver/downlink.go:1892
go.thethings.network/lorawan-stack/v3/pkg/task.Func.Execute
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/task/task.go:47
go.thethings.network/lorawan-stack/v3/pkg/task.DefaultStartTask.func1
        /home/runner/work/lorawan-stack/lorawan-stack/pkg/task/task.go:167
WARN    Task failed     {"error": "broken pipe", "error_cause": "broken pipe", "invocation": 5, "namespace": "networkserver", "syscall": "write", "task_id": "process_downlink_0", "timeout": false}
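
A rough way to see which components report the error is to tally the structured log lines by namespace. The sketch below assumes the console log format shown above (a level, a message, then a JSON object on the same line). `count_errors_by_namespace` is a hypothetical helper name, not part of TTS.

```python
import json
from collections import Counter


def count_errors_by_namespace(lines, needle="broken pipe"):
    """Count log lines whose JSON "error" field contains `needle`, per namespace."""
    counts = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # stack-trace frames and plain-text lines carry no JSON
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # a brace that does not start a valid JSON object
        if needle in str(fields.get("error", "")):
            counts[fields.get("namespace", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'WARN    Task failed     {"error": "broken pipe", "namespace": "networkserver"}',
        "go.thethings.network/lorawan-stack/v3/pkg/task.Func.Execute",
        'INFO    Client error    {"http.status": 404, "namespace": "web"}',
    ]
    print(count_errors_by_namespace(sample))
```

Running it over the output of each flag combination would show whether the error is ever reported outside the networkserver namespace.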

Do you have any leads I can explore to solve this problem ?

descartes · February 3, 2025, 3:50pm

You could reinitialise the database so this doesn’t happen. It certainly can’t happen unless there are registered devices that have joined/uplinked at some point. And from what you’ve said above, you can’t really ever have got to a point where an application & a device were registered, let alone joined.

The broken pipe could be something with this custom OS not having enough resources.

Overall I’d put myself firmly in the category of muse & old-timer who could elicit ideas - but only in the same room to poke the hardware with a pencil, think out loud whilst drinking a really strong cup of coffee. Without knowing the build of this custom OS and the hardware details, I think anyone is going to struggle with specific ideas.

As for updating to more recent hardware, you appear to have two choices - either update it so you can “just run TTS OS” or stick with it until the wheels come off the project and you update it so you can “just run TTS OS”. From experience, the third option of it all suddenly falling in to place occurring is wishing on a star.

mattp · February 3, 2025, 4:39pm

Thanks again for your help !

I tried reinitializing the database, but I still get the downlink attempts when starting the ns.

To give more detail about the custom OS I’m using, it is built from an “empty” Linux for armv7.
I have access to a very limited set of programs and dependencies.
I did not make it myself, but I can give you more detail about it tomorrow.
Also, I’ll ask my coworker to help me check what’s different between our OS and the one running with Docker, to maybe find a dependency we do not have.

Here are some hardware details:

CPU: i.MX 6ULL (NXP) - ARM Cortex-A7 32-bit, 792 MHz
RAM: 512 MB
SD Card: 8 GB

I hope it is enough to run TTS.

What I find hard to understand is why starting the Network Server would cause broken pipe errors due to resources/power, when I can run the rest with no issue, and even other programs and connected devices on the same product (that is in production; for testing I am only running the OS and TTS).
Does that mean the NS is maybe doing something particularly resource-intensive ?

descartes · February 3, 2025, 6:13pm

This is a rather worrying lack of engineering concepts - the other programs do other things - you could have 100 x Hello World running and then say that launching an MQTT broker fails - they are not at all comparable.

Do you know what the Network Server, the beating heart of a LoRaWAN network, has to do? And how it does it?

Have you tried checking the memory to see what happens to it when you start all of these programs?

Again, I’m more of a generalist but I’d be hard pressed to see how you get the stack plus Redis plus Postgres to run in 512MB of RAM. That’s the same as a Pi Zero which can run a gateway, but in my wildest dreams I’d not expect to stuff all those other processes in to such a small working area.
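
On Linux, the memory check suggested above can be done by reading `/proc/meminfo` before and after each component starts. A minimal sketch, assuming a kernel recent enough to expose `MemAvailable` (it falls back to `MemFree` otherwise); the function name is illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    total_mib = info["MemTotal"] / 1024
    available_mib = info.get("MemAvailable", info["MemFree"]) / 1024
    print(f"total {total_mib:.0f} MiB, available {available_mib:.0f} MiB")
```

Comparing the available figure with only Redis and Postgres running, then with each additional stack component, shows how much headroom is left at each step.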

mattp · February 3, 2025, 8:16pm

I am sorry for the misunderstanding. I did not give you all the details so as not to make my post too long, and my ability to explain things in English may well be lacking (especially for this kind of specific case).

The device I’m working on did and still can run a whole LoRaWAN server and MQTT broker (Mosquitto) while computing statistics and managing data traffic from other devices for energy supervision, so it is quite comparable.
It is capable of running this server: GitHub - gotthardp/lorawan-server: Compact server for private LoRaWAN networks, and supports around 50 or more devices.
But as I said in my second post, I’m in charge of replacing it, because it is no longer developed and lacks the latest version of the LoRaWAN protocol.
And to give even more context, this is my 5th-year internship project, so I may lack some experience, and even more knowledge, on the LoRaWAN topic, but I have worked on it and am still working on it (the documentation and videos provided by The Things Network, as well as other documentation and my coworkers, helped a lot).

However, I did not find a technical explanation of how the different components work together that might help me solve the issue I’m encountering, and that’s why I turned to the forum.

This is what I meant by “run the rest”, and I deeply apologize for making such a shortcut (this is what happens when you explain things to many people during the day and start to have the reflex to explain like everyone has the whole context … I’ll gladly take that really strong cup of coffee you talked about).
Of course, for my testing I do not run anything other than the OS and TTS, not the other server and instances.

And this is why I think it is a setting issue, and not a hardware one.
However, I might be wrong, and tomorrow I will try to make sure my device can handle it.
Maybe the previous server we have been using is not as resource-intensive as TTS, but I doubt the difference is that huge.

And I am as surprised as you, but the previous server also needed to run a Redis and a Postgres instance, and the device still managed to handle it in addition to other programs.
I have done multiple tests, and the databases took around 40-50 MB of RAM with the emulator set to 200 devices.

I’ll post here any progress or leads.

descartes · February 3, 2025, 9:27pm

No, not even a little bit. I’m not being patronising, but after using computers for FORTY SIX years, the differences that come about over just a few years are vast, huge, chasmic. The pace of change in the last few years has been exponential, but then it always has been, but exponential growth means the changes are larger, faster.

I’d be happy with Win98 using Office 2003 - to the extent that I do use such a combo. It works, it does almost everything I need. I run it in a VM on my MacBook. It has 1Gb allocated on a machine that face-plants when I run Word & Excel and has 16 GIGABYTE of memory on a SSD that runs TEN times faster than the typical disk that ran XP.

On my shelves I have a plastic box with half-a-dozen spare 8Gb SO-DIMMS for the random collection of Mac minis (4) and MacBooks (4) that are sat on that shelf next to the 20+ hard disks & 5+ smaller (<512Gb) SSDs. Storage here is in tens of Terabytes, in one room of my house I have my first hard disk for a Mac Plus (1Mb of RAM) - it’s 20Mb.

My main server before had 4 cores, 8 threads and 16Mb of memory. The second hand one I now use has 28 cores on each of its 2 CPUs and has 128Gb and cost less than its predecessor.

The ATmega328, the heart of the Arduino Uno, has 32KB of Flash and 2KB of RAM. I could run a LW device on that. Now I need at least 64KB of Flash and 8KB of RAM, which is not a problem as the STM32 MCU’s have 256KB of Flash and 64KB of RAM and runs 5 times faster. ESP32s have MB of Flash and 400KB of RAM and runs 20x faster.

I’ve put this to overemphasise the situation so you can re-evaluate your understanding based on the following points:

Petr Gotthard’s LoRaWAN server was designed to include custom Linux targets.
It uses a built-in aka embedded database, mnesia, designed for these sorts of applications
It supports up to LW v1.0.3
The ReadMe states “It will probably never support the sophisticated management features of the commercial-grade network-servers.”

Whereas TTS is:

Designed to meet the full specification of LW v1.0.0 to v1.0.4 and v1.1.0 and all its extensions.
Uses two external databases, Redis for holding pretty much all active data in memory and Postgres for the rest.
Was designed to run on “Big Metal” using a language not originally intended for embedded; it runs on a Pi because the Pi has increased in memory & can use SSDs, not because TTS has been optimised for it.
A fully featured bells & whistles LNS with a separate Join Server that is designed to be spread over several servers - so each component comes with an overhead in terms of its runtime - which is not an issue when you have multiple GB of memory, SSDs and a processor that can run multiple threads in parallel & is optimised for this level of multi-tasking.

Petr’s LW server was a nice tidy EV run-about that’s charged at home from solar panels. TTS is a Cybertruck that has single digit MPG and has an auto-puppy-run-over sensor.

The rate of change of hardware requirements is partly driven by the reduction in cost, the newer techniques & facilities of the development environments and, to be frank, the poor quality of current CS syllabuses, which fail to explain how it all works right down at the level of the registers. Having a memory safe, garbage collected runtime with all the data structures you can wish for means it is mostly OK to write code with abandon. But that environment is not one suited for a single core processor with less memory than my Biro.

What previous server? Not Petr Gotthard’s from my understanding, not that I’ve done much with it.

mattp · February 3, 2025, 10:33pm

Many thanks for your answer.

Do not worry, it was very interesting and instructive, and my assumption was clearly wrong.
I prefer being wrong, getting a good and instructive answer and then learning, to staying mistaken. And I wouldn’t ask for help if I was sure of myself!
I didn’t realize the difference in scale between the two servers in terms of resource requirements and usage.

It would be interesting to have minimal hardware requirements or resource consumption statistics available on the TTS OS GitHub page or in the documentation for embedded system deployment.

You are right, I got mixed up: that was the ChirpStack server, which we also tested earlier. It is no longer suitable for our case for many reasons, but the V3 version gave convincing results in memory testing.

So after checking the resource consumption on the device tomorrow, I’ll have two choices:

A - The device can surprisingly handle it and I have to modify settings/architecture (I hope that a peak of memory/CPU usage will show up before the NS crashes ? And if it doesn’t, it would mean a setting issue ?)
B - The device is too limited, and I’ll see with my coworkers if we must find another stack, or if we can upgrade the hardware to handle it.
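
One way to tell option A from option B is to record the peak resident memory of the stack process. On Linux, `/proc/<pid>/status` exposes the current resident set as `VmRSS` and its high-water mark as `VmHWM`. The sketch below only parses those two fields; the PID is passed on the command line, and the helper name is an assumption for illustration.

```python
import sys


def rss_and_peak_kb(status_text):
    """Return (VmRSS, VmHWM) in kB from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmHWM"):
            fields[key] = int(rest.split()[0])
    return fields.get("VmRSS"), fields.get("VmHWM")


if __name__ == "__main__":
    pid = sys.argv[1]  # PID of the running ttn-lw-stack process
    with open(f"/proc/{pid}/status") as handle:
        current, peak = rss_and_peak_kb(handle.read())
    print(f"current RSS {current} kB, peak RSS {peak} kB")
```

Because the `/proc` entry disappears when the process exits, this has to be polled while the stack is still running. A peak that approaches the total RAM shortly before the errors start would point towards option B.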

In any case, I’ll talk with my coworkers and share the details you highlighted.
I will make sure to establish whether this is due to hardware limitations or to settings.

Thank you again.

descartes · February 4, 2025, 12:13am

The requirements are listed on the Enterprise version, 4 vCPUs & 16GB memory, for Docker; other configs are listed, albeit for 100,000 devices. That’s 160KB per device. From that alone it starts to flag potential issues with a 512MB device.
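
The per-device figure follows from dividing the quoted memory by the quoted device count. A quick check in Python, using decimal units (1 GB = 10^9 bytes) and taking the 16GB and 100,000-device figures above as given rather than independently verified:

```python
GB = 10**9
KB = 10**3

enterprise_memory_bytes = 16 * GB  # memory figure quoted above
enterprise_devices = 100_000       # device count quoted above

# Average memory budget per device implied by those two figures.
per_device_bytes = enterprise_memory_bytes / enterprise_devices
print(f"{per_device_bytes / KB:.0f} KB per device")
```

This is an average over the whole deployment, so it includes the fixed footprint of the stack, Redis and Postgres, which does not shrink with the number of devices.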

There has never been any hints that TTI intended for TTS to be deployed on a resource constrained system, ever. The most likely explanation for the lack of documentation for embedded system deployment is because it’s not intended to be. Just like IBM haven’t released docs on running DB2 on an AVR.

I’d note that TTI have done a very thorough job of documenting TTS and, given that it’s open source, you can read the code yourself to see if they left out some setting. But using the reasoning I’ve outlined above, that setting is unlikely to exist. If it did exist, I’m pretty sure it would be well documented as it would imply that the memory use can be dramatically lowered, which would be good for all users of TTS OS on a full fat server as much as a smaller system. The fact that it’s not documented speaks to the likelihood of its existence.

However you really need to ask the TTI engineers directly for a definitive answer, as I allude to in a comment below.

You’ve encountered a situation where Product L can run on embedded devices using Technology E. This does not mean that similar Product T using Technology G that is 5 years further down the development timescale is going to run on embedded devices and given the evolution of code bloat, is deeply unlikely to run on a hardware design that’s +5 years old.

This is about 33% experience and about 66% common sense, something that needs regular use to maintain its effectiveness.

Whoever did the feasibility study on this could have checked in here before ground was broken. Or called up TTI to ask. Any company that wants to deploy TTS in to an unusual, non-documented system that is shipped as a closed box would benefit from having a direct support contract.

I fully appreciate that there is a hierarchy and you were probably handed the poisoned chalice by someone who didn’t know what they were delegating and is potentially unable to appreciate the details or nuances of the situation.

If “they” are giving you a hard time for not being able to perform miracles, then feel free to invite them to contribute to this thread to explain how they think it should work as is. That is, not how they’d like it to work, but how it might get working with the current version.

It’s not something we can get in to here, but Chirpstack v4 is written in Rust (that is designed with embedded systems in mind) and has a SQLite flavour for its database and has an armhf on the downloads page …