MAX_CLOCK_ERROR vs. better xtal?

Hi,

I keep finding that many common LoRa/LMIC issues get fixed by adding the MAX_CLOCK_ERROR coefficient. By the way, it also worked wonders for me and fixed my problem with RX2 join acceptance from the gateway.

My question is whether this firmware fix could be avoided with a better hardware implementation and more accurate MCU timing. Generally, what are the best practices for obtaining robust MCU clock ticks and avoiding such timing issues?

Cheers,

This is very unlikely to be actual crystal error over the short interval of even a join.

Rather, typically the problem has one of two sources:

  • bad software architecture with respect to timekeeping, for example the ESP32 Arduino port seems to have (or at least have had) some serious issues

  • use of an RC oscillator, especially one not dynamically calibrated against a watch crystal, rather than using the watch crystal itself for timekeeping
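To illustrate the second point, here is a minimal sketch of calibrating an RC-clocked timer against a 32.768 kHz watch crystal. It counts how many RC ticks elapse during a known number of crystal periods, derives the actual RC frequency, and then converts tick counts to time with that measured value. The function names are hypothetical and are not taken from any particular MCU HAL or LMIC port.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency of a standard watch crystal, in Hz. */
#define XTAL_HZ 32768u

/* Measured RC frequency (Hz), given how many RC-clocked timer ticks
 * were counted while exactly `xtal_periods` crystal periods elapsed.
 * On real hardware both counts would come from timer peripherals. */
uint32_t rc_measured_hz(uint32_t rc_ticks, uint32_t xtal_periods)
{
    assert(xtal_periods > 0);
    /* rc_hz = rc_ticks / (xtal_periods / XTAL_HZ) */
    return (uint32_t)(((uint64_t)rc_ticks * XTAL_HZ) / xtal_periods);
}

/* Convert RC ticks to microseconds using the calibrated frequency
 * instead of the nominal one from the datasheet. */
uint64_t rc_ticks_to_us(uint64_t ticks, uint32_t rc_hz)
{
    assert(rc_hz > 0);
    return ticks * 1000000u / rc_hz;
}
```

For example, with a nominal 8 MHz RC oscillator that actually runs 2% fast, trusting the nominal value turns 8,160,000 ticks into 1,020,000 µs instead of 1,000,000 µs: a 20 ms error per second. Recalibrating periodically would also track drift with temperature.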

Given that you’ve failed to mention your hardware and software platform, it’s going to be hard for anyone to say anything much more specific.

I could understand that timing might be an issue on nodes using, say, the internal RC oscillator of a processor (ATmega etc.), or possibly even on nodes built around Arduino Pro Minis, which often use a resonator.

But nodes already using crystals should surely be more accurate, shouldn’t they?

Boards with an 8-bit 8 MHz AVR MCU such as the ATmega328 or ATmega32u4 normally require LMIC MAX_CLOCK_ERROR to be set to a value higher than 0.
Some of these boards are Arduino Pro Mini and Adafruit/BSFrance LoRa32u4.

My assumption was that this is related to the relatively low clock speed (combined with 8-bit architecture) but maybe a resonator could play a role here as well.

There are two popular makes of Arduino Pro Mini available on AliExpress:

  • The Simple (blue PCB) uses a resonator.
  • RobotDyn (black PCB) uses a crystal.
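To put the resonator-versus-crystal comparison into numbers, here is a sketch of the worst-case timing error accumulated over a receive delay (drift = delay × tolerance). The tolerances used below, roughly ±0.5% (5000 ppm) for a ceramic resonator and ±20 ppm for a crystal, are typical datasheet figures assumed for illustration, not measurements of these particular boards.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case timing error (µs) accumulated over `delay_us` for a
 * clock source whose tolerance is given in parts per million. */
uint32_t drift_us(uint32_t delay_us, uint32_t tolerance_ppm)
{
    assert(tolerance_ppm <= 1000000u);
    return (uint32_t)(((uint64_t)delay_us * tolerance_ppm) / 1000000u);
}
```

Over the LoRaWAN default 5 s join-accept RX1 delay, a 20 ppm crystal drifts at most about 100 µs, while a 5000 ppm resonator can drift about 25 ms (30 ms for RX2 at 6 s). For comparison, one SF7BW125 symbol lasts about 1.024 ms, so resonator error alone could plausibly matter at low spreading factors, while crystal error is negligible.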

It would be interesting to test if the RobotDyn can work with a lower MAX_CLOCK_ERROR than The Simple.

Whether MAX_CLOCK_ERROR needs to be set (and to what value) also depends on how quickly events are handled in the onEvent() event handler. This is also relevant for faster 32-bit MCUs.

While event handler code should be as short and fast as possible, the Arduino framework does not provide simple alternatives such as multitasking, where longer-running work is spun off into separate tasks that the OS schedules by priority (so that longer-running code does not disrupt LMIC timing).
So if event handler code takes longer to do its work (e.g. printing to the serial port), it may be necessary to set or increase MAX_CLOCK_ERROR, or else receive windows may be missed, causing downlinks (including join accepts) to fail.
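A rough estimate of how long debug output can block helps here. This sketch assumes a blocking serial write with 8N1 framing (10 bits on the wire per character) and ignores the small transmit buffer that Arduino cores typically provide, which lets very short prints return early.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to shift `chars` characters out of a UART at
 * `baud`, assuming 8N1 framing: 1 start + 8 data + 1 stop bit. */
uint32_t uart_tx_time_us(uint32_t chars, uint32_t baud)
{
    assert(baud > 0);
    const uint64_t bits = (uint64_t)chars * 10u;
    return (uint32_t)(bits * 1000000u / baud);
}
```

A single 100-character line takes about 104 ms at 9600 baud and about 8.7 ms at 115200 baud. Either figure is many SF7BW125 symbol times (about 1.024 ms each), which is why printing inside onEvent() between transmit and receive is risky.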


There’s really no issue with an 8 MHz processor or an 8-bit CPU.

Any apparent problem comes from faulty software design.

If you want to see what is really going on, get a cheap USB logic analyzer on the DIO pin that signals transmit complete, and on the SPI lines, and look at the actual time delay between the end of the uplink packet and putting the radio in receive mode for the receive window.

For simplicity what I typically actually do is have the firmware blip a GPIO when starting to receive, trigger a scope on that, and put the other probe on the gateway’s transmit LED, so I can simply see how the node receive lines up with what it is actually trying to receive.
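Once the timestamps are captured, the comparison is simple arithmetic. This sketch assumes all times are in microseconds relative to the TX-done edge marking the end of the uplink, with the receive start taken from the GPIO blip and the downlink start from the gateway's transmit LED; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timestamps (µs) read off the analyzer or scope, all relative to the
 * TX-done edge that marks the end of the uplink. Hypothetical layout. */
struct rx_capture {
    int64_t node_rx_start_us; /* GPIO blip: node started receiving */
    int64_t gw_tx_start_us;   /* gateway TX LED: downlink begins   */
};

/* How far ahead of the downlink the node was listening (µs).
 * Positive: node opened early by that much (and must stay open that
 * long). Negative: node was late and missed part of the preamble. */
int64_t rx_lead_us(const struct rx_capture *c)
{
    assert(c != NULL);
    return c->gw_tx_start_us - c->node_rx_start_us;
}

/* Error of the node's receive start relative to the nominal delay. */
int64_t rx_open_error_us(const struct rx_capture *c, int64_t nominal_us)
{
    assert(c != NULL);
    return c->node_rx_start_us - nominal_us;
}
```

For a join accept with the default 5 s RX1 delay, a node that starts receiving at 4,998,000 µs while the gateway starts sending at 5,000,050 µs opened 2 ms early and was listening about 2.05 ms before the downlink began.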

That is a very simplified and very generic statement.

Given the limitations of the Arduino framework (no RTOS), combined with a LoRaWAN LMIC stack that runs on the same (mostly single-core) MCU as the application, and a community of developers and makers who are not all skilled and seasoned diehard embedded developers, saying that “any apparent problem comes from faulty software design” isn’t useful, because it does not take into account the practical context that many/most forum users are dealing with.

Apparently there ARE practical issues with 8-bit 8 MHz MCUs when using the Arduino framework and the LMIC library. Otherwise the included basic example TTN-OTAA.ino would work without issues when joining on the lower spreading factors (which is standard), but it does not, and it requires relaxing LMIC timing instead.

So where is the faulty software design, in the Arduino framework, in the used MCU specific Arduino Core implementation, in the used LMIC library implementation or in the example TTN-OTAA.ino?
And what is that going to help the users that want to create a practical application based on these building blocks?

No, there really are not.

The things that need to happen need to happen at least an order of magnitude (if not several) more slowly than the system is capable of making them happen.
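To put “orders of magnitude” into numbers, this sketch counts CPU clock cycles over the intervals involved. It assumes an 8 MHz clock, and the symbol time used below (about 1.024 ms) is that of SF7 at 125 kHz bandwidth.

```c
#include <assert.h>
#include <stdint.h>

/* CPU clock cycles that elapse during `duration_us` at `cpu_hz`. */
uint64_t cycles_in_us(uint64_t duration_us, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return duration_us * cpu_hz / 1000000u;
}
```

At 8 MHz, a 1 s RX1 delay spans 8,000,000 cycles, the 5 s join-accept delay spans 40,000,000, and even a single SF7BW125 symbol spans 8192 cycles. Since most AVR instructions take one or two cycles, the radio setup for a receive window is a small fraction of that budget.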

So where is the faulty software design, in the Arduino framework, in the used MCU specific Arduino Core implementation, in the used LMIC library implementation or in the example TTN-OTAA.ino?

In some ports, for example the one to the ESP32, the mistakes appear to be downright absurd design decisions by the authors of the Arduino port.

In other cases, the mistakes are bugs in a particular version of LMiC.

And in some cases, the mistakes are introduced by the end user, for example when adding debug output or when trying to sleep in between operations.

As the asker provided zero information about the platform or code version in use, only generic answers are possible.

But getting a radio ready to receive 1 or 5 seconds after it finished transmitting is really not very challenging at all. The “heavy lifting” (if it can be called that) of things like encryption and decryption happens outside of the critical time - before the transmission and only after the reception (if any) while it is only the timing between transmission and reception which is critical.

LMiC doesn’t need an RTOS, it has its own task scheduler.

The reality is that a typical node doesn’t really need to do anything while waiting for the receive window. It can simply busy-wait, and then start getting the radio ready to receive far enough in advance of the time it needs to be receiving that the setup will complete in time.
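A minimal host-side sketch of that approach: compute when radio setup must begin, then busy-wait on a monotonic clock until that moment. POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` stands in for an MCU tick counter here, and the function names and setup lead time are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds (stand-in for an MCU tick counter). */
int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Moment to begin radio setup so that it has finished by the time
 * the receive window opens. */
int64_t setup_start_us(int64_t tx_done_us, int64_t rx_delay_us,
                       int64_t setup_lead_us)
{
    assert(setup_lead_us >= 0 && setup_lead_us < rx_delay_us);
    return tx_done_us + rx_delay_us - setup_lead_us;
}

/* Spin until `deadline_us`; returns the time at which it stopped. */
int64_t busy_wait_until(int64_t deadline_us)
{
    int64_t t;
    while ((t = now_us()) < deadline_us) {
        /* nothing else to do: the node just waits for the window */
    }
    return t;
}
```

With TX done at t = 0, a 1 s receive delay and a 2 ms setup allowance, setup starts at 998,000 µs. The design choice is simply that nothing else competes for the CPU during this interval.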

You should stop repeating that there are no practical issues with 8-bit MCUs at 8 MHz.
The practical issues that exist are clearly described below.

This is incorrect.

Be specific: Which particular versions of LMIC are you referring to and what are those bugs?

‘appear to be’: I have seen more remarks about timing issues with the ESP32 in combination with LMIC, but instead of hearsay it would be useful to add references to where this is actually confirmed and explained in detail.

Really? If true, that should be able to solve all timing-related issues in LMIC-based applications.
But unfortunately, it appears that just as the LMIC onEvent() event handler code should not spend too much time handling events, the same applies to LMIC jobs scheduled by the LMIC scheduler: they should not spend too much time running. I do not have a reference for this information at hand.
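Both points are consistent with a run-to-completion (cooperative) scheduler: it needs no RTOS, but a job that runs long delays whatever is due next. The sketch below is a simplified simulation of that property only. It is not LMIC's actual scheduler, which is driven through calls such as `os_setTimedCallback()` and `os_runloop_once()`, and the job timings are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One job in a run-to-completion scheduler (simplified model). */
struct job {
    int64_t due_us; /* when the job asked to run   */
    int64_t run_us; /* how long its callback takes */
};

/* Simulate jobs (sorted by due time) on a single core with no
 * preemption. Returns how late (µs) job `watch` actually starts. */
int64_t start_lateness_us(const struct job *jobs, size_t n, size_t watch)
{
    assert(jobs != NULL && watch < n);
    int64_t clock = 0;
    for (size_t i = 0; i < n; i++) {
        if (clock < jobs[i].due_us)
            clock = jobs[i].due_us;  /* idle until the job is due */
        if (i == watch)
            return clock - jobs[i].due_us;
        clock += jobs[i].run_us;     /* runs to completion        */
    }
    return 0; /* not reached, since watch < n */
}
```

If a 30 ms job (for example, a long debug print) is due at 990 ms and the receive setup job is due at 998 ms, the setup starts 22 ms late. If that earlier job takes only 1 ms, the setup starts on time.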

Facts

  • What are popular, well-known boards using 8-bit MCUs running at 8 MHz?
    Arduino Pro Mini 8 MHz and Adafruit/BSFrance LoRa32u4 (II).

  • How many well-known Arduino LMIC implementations exist for 8-bit AVR MCUs? Exactly two.
    See: Overview of LoRaWAN Libraries [HowTo]

  • What is the most basic OTAA example included with each (Arduino) LMIC library?
    ttn-otaa.ino.

Verified Facts

Are there any issues with 8-bit MCUs running at 8 MHz in combination with the Arduino framework and the two well-known Arduino LMIC implementations? Yes, there are!

A very simple test proves the facts:

  • Take a BSFrance LoRa32 II v1.2 board.
    Any of the boards mentioned should do but Pro Mini will require a separate SPI LoRa board.

  • Use one of the two well-known Arduino LMIC libraries,
    either LMIC-Arduino or the newer and improved MCCI LoRaWAN LMIC library.

  • Take the ttn-otaa.ino example (use the one that is included with the LMIC library used)
    and configure the correct LMIC pin mappings for the board (don’t forget that for this board DIO1 must be manually wired) and enter the proper LoRaWAN keys/IDs.

  • Upload the sketch to the board and let it run.

Will the sketch immediately succeed in joining on SF7BW125 as it should? NO.
Instead, it will fail to join on SF7 and SF8 and only join on SF9, as can be found in earlier posts on the forum.
In practice this means that it will take 6+ to 8+ minutes before a join request succeeds.
And this does not even take into account the testing of proper handling of downlinks.

To solve the issue, MAX_CLOCK_ERROR has to be set to a value higher than 0
(actual required value depends on hardware and LMIC library used).
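In the MCCI LoRaWAN LMIC library this is typically done right after `LMIC_reset()` with a call such as `LMIC_setClockError(MAX_CLOCK_ERROR * 1 / 100);`, where `MAX_CLOCK_ERROR` stands for 100% clock error. As a rough model of what that setting buys (a simplification, not the library's actual computation), the tolerated error fraction times the receive delay gives how much earlier the window must open, and twice that gives how much longer it must stay open to also cover a fast clock. `MODEL_MAX_CLOCK_ERROR` is a local stand-in whose value of 65536 is assumed to mirror the library's full-scale value.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the full-scale value (assumed: 65536 == 100 %). */
#define MODEL_MAX_CLOCK_ERROR 65536u

struct rx_widening {
    uint32_t open_early_us; /* start listening this much earlier   */
    uint32_t extra_len_us;  /* and keep listening this much longer */
};

/* Simplified model: the local clock may be off by +/- drift over the
 * receive delay, so the window opens `drift` earlier and is widened
 * by 2 * drift to cover both a slow and a fast clock. */
struct rx_widening widen_rx_window(uint32_t rx_delay_us, uint32_t clock_error)
{
    assert(clock_error <= MODEL_MAX_CLOCK_ERROR);
    uint32_t drift = (uint32_t)(((uint64_t)rx_delay_us * clock_error)
                                / MODEL_MAX_CLOCK_ERROR);
    struct rx_widening w = { drift, 2u * drift };
    return w;
}
```

With a setting equivalent to 1% (655 out of 65536) and a 5 s join-accept delay, the model opens the window about 50 ms early and keeps it open about 100 ms longer. Under these assumptions, that is far more margin than crystal error alone would need.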

The issues occur independent of which of the two LMIC libraries is used.

So are there any practical issues with 8-bit MCUs at 8 MHz with Arduino + LMIC? Definitely yes.
But they can be worked around by setting MAX_CLOCK_ERROR.

It’s quite ironic that you would charge that, as it is your own post which is full of generalizations and unsupportable claims - at best you seem to be saying that a lack of popularity indicates a problem, when in fact it typically indicates tangential things:

  • there’s not much reason to use an 8-bit processor in a new low power design today
  • volunteer maintainers of LMiC repos do their development and testing on platforms they prefer

There’s no reason you can’t make LMiC work properly on an 8-bit processor, it’s just no one is currently volunteering to maintain a works-out-of-the-box example for that. There are, in actuality, precious few LMiC examples that reliably “work out of the box” on any platform - if you follow the issue trackers and changelogs, the situations that come closest to that keep having new issues discovered and fixed.

Speaking as someone who has been deep inside of the code of LMiC as part of turning it into something very different, and spent a lot of time looking at receive window timing with actual test equipment, your claims are based on a misunderstanding of what a node needs to do, and how LMiC goes about doing that. They have absolutely nothing to do with a processor being 8-bit.

Unfortunately it appears that just like the LMIC onEvent() event handler code should not spend too much time to handle events

The onEvent() method runs in many different circumstances. When that or something similar runs to signal a situation falling within the time between transmission and the following receive window, yes, code must be careful not to take too much time - as I’ve been saying all along, the critical timing is between transmission and reception. But outside of that period, timing is not critical. You can spend as much time as you like printing debug messages between a failed join and the next attempt, before preparing and sending a measurement packet, or after receiving a downlink.

There are indeed many ways in which a LoRaWAN stack, or the way it is plugged into a particular platform can get things wrong. When the asker of the original question clarifies their situation, it will be possible to directly address their specific difficulty.


The TTN Forum serves both professionals and enthusiasts/hobbyists.

You seem to know very well how things should be, what is wrong and that there is little reason to use an 8-bit MCU for new low power designs today (this topic is not about low-power designs btw).

But that is not going to help those users who are currently running into issues with their 8-bit MCU based configurations (where MAX_CLOCK_ERROR comes into play). Not all users on the forum are professionally/commercially involved with LoRaWAN (yet) and neither are all of them professional hardware designers or professional embedded software developers.
Many users starting with LoRaWAN use basic LoRa components, boards that are not too expensive and are readily available, which includes 8-bit MCU based boards like Adafruit/BSFrance LoRa32u4.

As already shown above, just take the most basic OTAA example included with one of the LMIC libraries and you will run into the described issues. That is the starting point for people who want to use LMIC with Arduino on an 8-bit MCU. No more, no less.

And yes, if the original poster wants advice he/she should provide more details.
But the main question was whether using a better crystal could prevent the need for using MAX_CLOCK_ERROR (where it is currently being used/needed).
The answer to that question has already been given: no, they are in fact unrelated. If a resonator is used instead of a crystal, that may play a role, but it is not something that can be changed on already existing implementations (‘boards’).


Besides the timing impact of the crystal/oscillator (as discussed above), does anyone here have experience with whether timing can be improved by using hardware interrupts to timestamp DIO/DIO1 signals?

I recently spotted and fixed a bug in MCCI LMIC that previously prevented the use of interrupts on the ESP32 platform (but probably not on AVR). Since it is working now, I have tested it for several weeks with different ESP32 boards. My hope was that this would improve join speed, i.e. solve the timing issues when joining on SF7. But I can’t see any difference.

I’m wondering if this join issue on the ESP32 platform may be caused by a certain software interrupt of the arduino-esp32 platform.
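Where an interrupt helps, the gain comes from capturing the timestamp at the moment the DIO line changes, rather than when the main loop happens to notice it. Below is a host-side sketch of that pattern. On real hardware `dio_isr()` would be attached to the DIO pin and would read the timer itself; here the time is passed in so the effect can be shown without hardware. All names are hypothetical and are not the MCCI LMIC HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static volatile uint32_t dio_edge_us; /* time captured in the ISR    */
static volatile bool dio_pending;     /* set by ISR, cleared by loop */

/* Would be the DIO interrupt handler on real hardware. */
void dio_isr(uint32_t now_us)
{
    dio_edge_us = now_us;
    dio_pending = true;
}

/* Called from the main loop whenever it gets around to it. The RX
 * start is derived from the captured edge time, so it does not depend
 * on how late the loop runs. On an 8-bit MCU the multi-byte read of
 * dio_edge_us would need interrupts disabled around it. */
bool rx_start_from_edge(uint32_t rx_delay_us, uint32_t *rx_start_us)
{
    assert(rx_start_us != NULL);
    if (!dio_pending)
        return false;
    dio_pending = false;
    *rx_start_us = dio_edge_us + rx_delay_us;
    return true;
}
```

If the loop only notices the edge 3 ms after it happened and uses its own current time instead, the computed window starts 3 ms late. With the captured timestamp, loop latency no longer matters, as long as the loop still acts before the window.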

That depends entirely on what is happening while waiting for the DIO pin to change, and how it is being monitored. If code were efficiently busy-waiting on an I/O pin, that could actually be faster than an interrupt, as it wouldn’t incur the context-switch latency (keeping the processor awake while waiting on the radio is a relatively small premium over the radio’s own consumption). Or one could use a hardware timer capture input. But (at least with the exception of waking up from sleep) all of these options are many times faster than actually required.

Typical actual issues are receiving too late, or actually receiving too early, and as a result stopping the receiver before the preamble of the packet has been detected.

Causes tend to be things like user add-ons that conflict with the way LMiC’s scheduler works (i.e., don’t add things between transmit and receive unless you know exactly how they will interact).

Or bugs in the software environment’s timekeeping - if you look at the issue tracker for the ESP32 Arduino core, they’ve had many difficulties there, in trying to come up with something that doesn’t break Arduino traditions and doesn’t suffer consistency issues from the dual-core nature of the ESP32. Another category here is using an uncalibrated RC oscillator, which may shift with temperature, etc.

Also the amount of time the receiver should check hasn’t always been calculated correctly. The SX127x chips specify this as a number of symbol periods, so the conversion between time allowance and the register value (which is split across 2 or 3 registers) depends on the spreading factor. If you start the radio a bit early because the clock might be slow, then you also have to widen the radio’s preamble detection window, in case the clock wasn’t wrong, or in case the clock was fast.
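Here is a sketch of converting a time allowance into the symbol-timeout value. It assumes the SX1276 LoRa-mode layout, where the 10-bit SymbTimeout is split between bits 1:0 of RegModemConfig2 (0x1E) and RegSymbTimeoutLsb (0x1F); other SX127x variants may differ. The window covers the clock uncertainty on both sides (±drift) plus a minimum number of symbols for preamble detection, and that minimum is a hypothetical tuning parameter.

```c
#include <assert.h>
#include <stdint.h>

/* LoRa symbol period in microseconds: 2^SF / BW. */
uint32_t symbol_us(unsigned sf, uint32_t bw_hz)
{
    assert(sf >= 6 && sf <= 12 && bw_hz > 0);
    return (uint32_t)(((uint64_t)1000000u << sf) / bw_hz);
}

/* Symbols to stay in preamble search: cover +/- drift_us (rounded up
 * to whole symbols) plus min_syms for detection, clamped to the
 * 10-bit register field (max 1023). */
uint16_t rx_timeout_symbols(uint32_t drift_us, unsigned sf, uint32_t bw_hz,
                            uint16_t min_syms)
{
    uint32_t tsym = symbol_us(sf, bw_hz);
    uint32_t syms = (2u * drift_us + tsym - 1u) / tsym + min_syms;
    return (uint16_t)(syms > 1023u ? 1023u : syms);
}

/* Split a 10-bit SymbTimeout into the two SX1276 register fields. */
void split_symb_timeout(uint16_t syms, uint8_t *msb2, uint8_t *lsb8)
{
    assert(syms <= 1023u);
    *msb2 = (uint8_t)((syms >> 8) & 0x03u); /* RegModemConfig2 bits 1:0 */
    *lsb8 = (uint8_t)(syms & 0xFFu);        /* RegSymbTimeoutLsb        */
}
```

The same ±10 ms uncertainty needs 28 symbols at SF7BW125 (1.024 ms per symbol) but only 9 at SF12BW125 (32.768 ms per symbol), assuming 8 detection symbols. That spreading-factor dependence is exactly where a conversion error would show up.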

This is, without a doubt, a hard set of issues to debug - as I’ve said before, to investigate an issue it’s best to use hardware test equipment to look at the timing between transmit and receive. And it requires looking through the history of local customizations, LMiC repos, and any target-specific platform code like an Arduino port.

If someone wants an LMiC solution with a fighting chance of working, it’s not about which platforms or categories of platform to avoid - rather, it’s about picking only a board target that has a maintainer actively supporting and testing LMiC on it - and preferably doing so for the LoRaWAN regional parameters and TTN settings which will be needed.