Feather M0 with LMIC takes more than 2 minutes to transmit

Masoo93 · December 1, 2020, 11:50am

Hi,
I’m having trouble with understanding an issue with an end-node prototype I’m working on.
This end-node is constituted by a Feather M0 LoRa for the EU region and three sensors (DHT22, BMP388 and a contact sensor) and it sends the data collected by the sensors encoded in a payload of 10 bytes. I’m using this LMiC library to handle the LoRa communication and the Arduino LowPower and the RTCZero libraries for the deep sleep implementation, plus all the standard libraries needed by the sensors.
I have 4 Feathers with which I’m testing the code, one with all the above mentioned sensors and the other 3 with all the sensors minus the BMP388. The join procedure chosen is OTAA (widely tested, it works).

The issue I’m experiencing with all four devices on which i loaded the sketch is that the end-nodes transmission period is 1 hour, 2 minutes and 20 seconds (±10 seconds) but the sleep time I’ve specified in the code is an hour (therefore the end-node stays on for circa 2’ and 20’’ for each transmission).
I suspect that this problem is somehow induced by the deep sleep cycle, since I’ve tried the same sketch replacing that function with a simple delay of the same length and the time needed to transmit dropped to more “acceptable” numbers: I kept track of it using the elapsedMillis library and the most common values were 5 or 17 seconds, rarely they went up to 30-45 seconds (including the code execution).

This is the content of the lmic_project_config.h file:
#define CFG_eu868 1
#define CFG_sx1276_radio 1
#define LMIC_USE_INTERRUPTS
#define DISABLE_PING
#define DISABLE_BEACONS
#define LMIC_DEBUG_LEVEL 1

Here are the relevant parts of the code I’m using (it’s more than 500 lines long so I thought it would be better to include only these):

const lmic_pinmap lmic_pins = {
  .nss = 8,
  .rxtx = LMIC_UNUSED_PIN,
  //.rst = 4,
  .rst = LMIC_UNUSED_PIN,
  .dio = {3, 6, LMIC_UNUSED_PIN},
  .rxtx_rx_active = 0,
  .rssi_cal = 8,              // LBT cal for the Adafruit Feather M0 LoRa, in dB
  .spi_freq = 8000000,
};

void setup() {
  pinMode(DHT_PIN,INPUT);
  pinMode(CONTACT_SENSOR_PIN,INPUT_PULLUP);
  pinMode(BATTERY_PIN, INPUT);

  delay(3000);

  rtc_sleep.begin();
  rtc_sleep.setEpoch(0);

  // The code from here to detachInterrupt() has been adapted from a post I've found online
  // and it is used to make the interrupts work on the edges (RISING, FALLING and CHANGE
  // I don't really understand it and I'd be curious to understand its impact on the power draw during
  // the deep sleep
  attachInterrupt(digitalPinToInterrupt(CONTACT_SENSOR_PIN), interrupt_RISING, RISING);

  // Set the XOSC32K to run in standby
  SYSCTRL->XOSC32K.bit.RUNSTDBY = 1;

  // Configure EIC to use GCLK1 which uses XOSC32K
  // This has to be done after the first call to attachInterrupt()
  GCLK->CLKCTRL.reg = GCLK_CLKCTRL_ID(GCM_EIC) |
                     GCLK_CLKCTRL_GEN_GCLK1 |
                     GCLK_CLKCTRL_CLKEN;

  detachInterrupt(digitalPinToInterrupt(CONTACT_SENSOR_PIN));

  dht.begin();

  bmp.begin();

  // LMIC init
  os_init();
  // Reset the MAC state. Session and pending data transfers will be discarded.
  LMIC_reset();

  // Disable link-check mode and ADR, because ADR tends to complicate testing.
  LMIC_setLinkCheckMode(0);
  // Set the data rate to Spreading Factor 7.  This is the fastest supported rate for 125 kHz channels, and it
  // minimizes air time and battery power. Set the transmission power to 14 dBi (25 mW).
  LMIC_setDrTxpow(DR_SF7,14);
  // in the US, with TTN, it saves join time if we start on subband 1 (channels 8-15). This will
  // get overridden after the join by parameters from the network. If working with other
  // networks or in other regions, this will need to be changed.
  // I'm in the EU region, do I need to call this function with other parameters..?
  //LMIC_selectSubBand(1);  
  LMIC_setClockError(MAX_CLOCK_ERROR * 10 / 100);
  // Start job (sending automatically starts OTAA too)
  do_send(&sendjob);
}

void loop() {
  os_runloop_once();
}

// I included only the TX_COMPLETE event because it's the only one in which I actual do something
void onEvent (ev_t ev) {
  Serial.print(os_getTime());
  Serial.print(": ");
  switch(ev) {              
          
      case EV_TXCOMPLETE:
      
          // Enable the interrupt before the deep sleep
          attachInterrupt(digitalPinToInterrupt(CONTACT_SENSOR_PIN), interrupt_RISING, RISING);
          // Call deep sleep function
          goToSleep(TX_interval);
          // Disable the interrupt after the deep sleep
          detachInterrupt(digitalPinToInterrupt(CONTACT_SENSOR_PIN));
          
          do_send(&sendjob);
          break;
    }
  }

void goToSleep (int sleep_time) {
  rtc_sleep.setEpoch(0);            // Probably not needed..
  
  long unsigned int curr_time = rtc_sleep.getEpoch();               // Start deep sleep epoch
  long unsigned int wake_up_time = curr_time + sleep_time;        // End deep sleep epoch

  // Auxiliary variable needed in case the deepSleep function is stopped by an interrupt
  long unsigned int sleep_time_left = sleep_time;

  while (curr_time < wake_up_time) {
    sleep_time_left = wake_up_time - curr_time;      
    
    if (sleep_time_left > 0) {
      LowPower.deepSleep(sleep_time_left * 1000);   // * 1000 poiché il tempo qua è in ms
    } else {
      wake_up_time = 0;
    }
    
    curr_time = rtc_sleep.getEpoch();
  }
}

void do_send(osjob_t* j){
    // Check if there is not a current TX/RX job running
    if (LMIC.opmode & OP_TXRXPEND) {
        Serial.println(F("OP_TXRXPEND, not sending"));
    } else {
        DHT_reading();
        door_state = digitalRead(CONTACT_SENSOR_PIN);
        battery_reading();
        BMP_reading();

        preparePayload();
        
        door_openings = 0;

        /* prepare upstream data transmission at the next possible time.
         * transmit on port 1 (the first parameter); you can use any value from 1 to 223 (others are reserved).
         * don't request an ack (the last parameter, if not zero, requests an ack from the network).
         * Remember, acks consume a lot of network resources; don't ask for an ack unless you really need it. */
        tx_error_code = LMIC_setTxData2(1, payload, sizeof(payload), 0);
        handle_tx_errors();   // Just built the skeleton of the function, at the moment it's almost empty
    }
}

I’ve not included the code for the sensors readings, the payload preparation and other stuff but I can guarantee that those parts have been tested and they work as expected (other than the issue subject of the post everything works perfectly, the payload arrives as expected, etc.).
When I tried this sketch with the delay() function in place of the deep sleep cycle, I monitored the time needed to execute the code between just before the do_send execution in TX_COMPLETE and the end of its execution (which corresponds to the sensor readings and the payload preparation - everything that the sketch does between two transmissions) and it amounted to 1.3 seconds.

cslorabox · December 1, 2020, 3:42pm

The general nature of the problem is that when you wake up, LMiC isn’t aware of any time having elapsed during the sleep.

This will throw off all of LMiC’s timing.

As to why it thinks it shouldn’t transmit again “immediately” you’d have to dig into the details, but probably it’s trying for a bit of sanity there.

Generally when people sleep a platform running LMiC, they make sure to advance the clock that LMiC is looking at.

Masoo93 · December 1, 2020, 4:21pm

Thanks for the reply.
Do you have any idea of what could be the clock used by LMiC and/or how could I keep it running during the sleep?

Also I’ve seen various sketches similar to mine except for the library used for the sleep cycle, i.e. Adafruit SleepyDog; I didn’t use it because I experienced some problems with its implementation, especially for the recognition of edges for the interrupts, but I could try again. Do you know if it automatically handles the clock used by LMiC? Because I do not recall seeing any code related to that in those sketches.

descartes · December 1, 2020, 5:30pm

At the risk of bringing the world raining down on my head, I don’t use the LMIC for timing of send - you don’t have to initialise or maintain the send job, you can command a send when you require.

So I have an Arduino that runs through a series of 8 second sleep cycles and when it has completed, does a data collection, sets a ‘stay awake’ flag, commands a send, resets the flag when the send has completed and then goes to sleep.

The LMIC uses a faux scheduler to know when to open the RX1 & RX2 windows - so I give it free rein without any other ‘tasks’ running during the send & potential receive cycle. But I manage the rest of it’s sorry existence. Given the number of these scattered around me and out in the world as simple canary & temp/humidity devices that just keep on trucking, I’m very unlikely to be persuaded that this is a bad thing - I’ve checked, no puppies were hurt in the process of coding them.

See https://github.com/descartes/Arduino-LMIC-example

Masoo93 · December 2, 2020, 9:31am

Thanks for posting your github repo, I’m trying to understand it.
Shouldn’t my sketch work almost in the same way, since I’m not using the LMiC function that schedules the next send job but i’m sending right away at the next possible time with the LMIC_setTxData2(1, payload, sizeof(payload), 0); function just after the data collection and payload preparation functions?
If not, what changes do you think I should apply to make it work like you said?

descartes · December 2, 2020, 11:24am

I think the key difference is I have the sleep woven in to the loop() rather than acting outside of it - at present you respond to a LMIC complete event by going to sleep and when you wake up, schedule a send and then, and only then, does it all fall back through in to loop() with its call of os_runloop_once() - there may be some side effects to consider.

I’m not saying my solution is necessarily better, but I’m using state flags to decide what is the next appropriate action in the loop() which means we don’t end up with a deeply nested set of function calls - bearing in mind that when LMIC calls the onEvent function, there’s already a whole chain of ‘stuff’ that happened for it to get there, to which you are effectively layering another send job and without re-reading the source, that may well get jammed up awaiting a os_runloop_once()!

cslorabox · December 2, 2020, 4:13pm

This doesn’t actually send anything, rather what it does is set state to indicate that sending is desired and cache the data.

Actual transmission occurs from engineUpdate out of the os runloop when it decides the necessary criteria are met - which will often include a minimum time since the end of the previous transmission.

Do you have any idea of what could be the clock used by LMiC

micros() but in a way that has its own scaling and thus overflow logic.

github.com

mcci-catena/arduino-lmic/blob/master/src/hal/hal.cpp#L226


      
              hal_spi_trx(cmd, buf, len, 1);
          }
          
          // -----------------------------------------------------------------------------
          // TIME
          
          static void hal_time_init () {
              // Nothing to do
          }
          
          u4_t hal_ticks () {
              // Because micros() is scaled down in this function, micros() will
              // overflow before the tick timer should, causing the tick timer to
              // miss a significant part of its values if not corrected. To fix
              // this, the "overflow" serves as an overflow area for the micros()
              // counter. It consists of three parts:
              //  - The US_PER_OSTICK upper bits are effectively an extension for
              //    the micros() counter and are added to the result of this
              //    function.
              //  - The next bit overlaps with the most significant bit of
              //    micros(). This is used to detect micros() overflows.

Look through the issues (including closed ones) on the MCCI repo, people have thought about how to sleep this before

Masoo93 · December 3, 2020, 11:27am

Thanks for the explanation, now I think I understand the difference between the two routines and what could be wrong (or at least, not optimal) with mine.
I’m thinking of a sort of “hybrid” solution for my code, i.e. changing my goToSleep function in order to split the sleep cycles in subperiods of, let’s say, 30 seconds or a minute each, and in between each period call os_runloop_once; do you think it could work?
I came up with this solution just to avoid to restructure completely my code, but eventually I’ll do that if it’s needed.

descartes · December 3, 2020, 11:29am

Possibly - but my programming intuition tells me you need to unwind the multiple levels of code and that in fact you may be nesting another layer doing it this way. Putting it all in to the loop() with some flags shouldn’t take too long.

cslorabox · December 3, 2020, 3:59pm

That won’t help by itself. LMiC needs to see time elapsing in terms of the time function mentioned in previous posts above. In terms of when LMiC next wants to do something, that’s information known to its scheduler. Eg, you could put your sleep in LMiC’s wait-until-next-task code (probably only run it if the wait is going to be more than a second though; also you probably want to wake up early and let the ordinary code handle the last little bit).

In terms of depth of call nesting, that can indeed be a concern in LMiC though I’m not sure it’s really at issue here. The sleep process likely isn’t too deep on its own. And calling the data queuing function in the event method isn’t uncommon in non-sleep cases. These are separate excursions of the stack, so no clear conflict there.

But yes, it would be a lot cleaner to sleep in cooperation with LMiC scheduler rather than by “hitting pause on it” as in the current attempt.

Masoo93 · December 3, 2020, 5:18pm

I have restructured my code as suggested by @descartes (taking inspiration by his code - thanks a lot for sharing!), so now the loop function looks like this:

void loop() {
  bool send_flag = false;
  
  os_runloop_once();

  // If  enough sleep cycles have been completed and the TX interval has been exceeded
  if (sleep_cycles_done * sleep_subperiod >= TX_interval) {
    send_flag = true;
  }

  if (send_flag) {
    send_flag = false;
    sleep_flag = false;
    sleep_cycles_done = 0;
    do_send(&sendjob);
  }

  if (sleep_flag) {
    goToSleep(sleep_subperiod);
    sleep_cycles_done++;
  }
}

and in the TX_COMPLETE event i simply put the sleep_flag to true.
Anyway, I tried to run this code on a Feather: the message after the first sleep period arrived after 1h 1’ 10’‘, but all the following three messages took more than 1h 3’ to arrive, demonstrating an even worse performance that the original code. Therefore I think that as @cslorabox said probably the issue is in the timing of the library’s scheduler; I tried to look through issues of the library but until now I haven’t found anything that could help me solve the problem - I’ll keep searching in there.

As per your last message @cslorabox, I didn’t fully understand this part:

I (briefly) looked into the code of the LMiC library and I didn’t manage to find that “wait-until-next-task code”; anyway what you’re suggesting would be to (find it and) put the sleep “mini-cycles” in there?

cslorabox · December 3, 2020, 6:31pm

See arduino-lmic/src/hal/hal.cpp at master · mcci-catena/arduino-lmic · GitHub but I would also check the code for any other loops that call hal_ticks() or hal_checkTimer().

To do this, you’d also have to fixup the result of the above routines so that they reflect the time elapsed during sleep (but you’re going to have to do that for any solution that doesn’t completely remove and replace the LMiC schedular).

And you’d need to have your event method schedule your next transmission job for a future time using the LMiC scheduler.

But as hinted at before, you should probably start by studying

Issues · mcci-catena/arduino-lmic · GitHub

And maybe try words like “suspend” too. Make sure not to search with the isopen criteria…

descartes · December 3, 2020, 9:24pm

Hmmmm, key difference here is that you have told the LMIC to schedule the task to run at some point in the future which is up to LMIC. And if it’s been frozen in time, it may take some time to get to a point where it thinks it’s time to send.

Whereas I tell the LMIC to transmit, which as long as you haven’t contravened it’s algorithm for duty limits, will do a transmit via it’s own internal threaded system. And, as you can see from the table below, it’s reasonably consistent.

I’d set LMIC_DEBUG_LEVEL to 1 in config.h in the library so you can see far more detail on the serial port as to what is going on under the hood.

Here’s the last 10 records from the Production office temperature device. The code has to turn on the IO & ADC, then each sensors is turned on, some settle time, a reading (or more to iron out noise) and then turn off. One sensor is a DHT and the other is a DS18B20, both of which have to be re-initialised. There’s also a 4 second extra as a rounding error on the set interval of 300 seconds. It’s not optimised and I’ve never timed it as I’ve no particular need for an Arduino + RFM95 node as I don’t deploy them for production, just as Canarys (end to end monitor) or as give-aways.

Timestamp	Difference	Diff from Ave
03/12/2020 19:40:00	5m 34s 526ms	-0m 0s 652ms
03/12/2020 19:45:34	5m 34s 730ms	-0m 0s 856ms
03/12/2020 19:51:08	5m 33s 389ms	0m 0s 485ms
03/12/2020 19:56:41	5m 32s 876ms	0m 0s 998ms
03/12/2020 20:02:14	5m 33s 6ms	0m 0s 868ms
03/12/2020 20:07:48	5m 34s 15ms	-0m 0s 141ms
03/12/2020 20:13:22	5m 34s 563ms	-0m 0s 689ms
03/12/2020 20:18:56	5m 33s 403ms	0m 0s 470ms
03/12/2020 20:24:30	5m 34s 231ms	-0m 0s 357ms
03/12/2020 20:30:04	5m 34s 1ms	-0m 0s 127ms

Given I’ve never really looked at my Arduino implementation at this level of detail, I’m reasonably happy but my interest is peaked so I’ll instrument a copy of the device at some point over the holidays.

cslorabox · December 4, 2020, 6:19am

Not sure what function you’re using to do that (apparently not LMIC_setTxData2() which is the only official method) but anyway, if the entire time between the last transmission and this request is an anesthesia type “unaware of time passing” sleep, then that request will indeed violate limits.

Maso93’s transmissions are occurring at an interval which is the sum of the duration of the “lost awareness” sleep, plus LMiC’s idea of reasonable pacing as the passage of non-sleep time it is aware of.

Fixing up LMiC’s sense of elapsed time is pretty critical - LMiC needs to see that low power sleep time as interval between packets, rather than only start measuring interval after waking up.

descartes · December 4, 2020, 10:36am

github.com

descartes/Arduino-LMIC-example/blob/d1c6d6ce10e546c36bb3203602638675cabc92d9/TinyThing/TinyThing.ino#L54


      
          void doSend() {
            TRACE_PRINTLN("")
            TRACE_S("doSend")
            // Check if there is not a current TX/RX job running
            if (LMIC.opmode & OP_TXRXPEND) {
              TRACE_S("OP_TXRXPEND, not sending")
          
            } else {
              BuildPayload();
              // Prepare upstream data transmission at the next possible time.
              LMIC_setTxData2(1, mydata, sizeof(mydata) - 1, 0);
              DEBUG_S("Packet queued")
            }
            // Next TX is scheduled after TX_COMPLETE event.
          }
          
          
          
          // Respond to incoming data
          void processIncoming() {

Masoo93 · December 4, 2020, 12:07pm

Thanks for your insights; for what I’ve understood now I have two options to solve this LMiC timing issue:

somehow fix the library code in order to make it aware of the sleeping time - for which I may have found a solution inside this issue published on the now deprecated LMiC library mantained by Matthijs Kooijman (hoping it will be compatible with the LMiC library I’m currently using)
understand which clock is used by the LMiC library to keep track of time and how to save it - for

Anyway I didn’t fully understand this part:

You are implying that it would be better to use the os_setTimedCallback function? If yes, why? I saw that function used on the examples provided in the library’s repo, but I think that even if I managed to correct the timing issue it would be better not to rely on the LMiC library for the transmission scheduling but “impose” it using LMIC_setTxData2(); also I’m afraid that the eventual correction would probably lead to some imprecisions in the LMiC clock.

descartes · December 4, 2020, 1:13pm

Many have tried this, for instance:

and many others to find by various searches on the forum.

Masoo93 · December 4, 2020, 1:26pm

I may have just realized what’s the difference you’re talking about in terms of the transmission scheduling: in mine, I execute the LMIC_setTxData2() inside the do_send function, while in yours you execute it directly in a function you defined.

I haven’t really understood how the do_send function works, in particular with respect to its parameter (osjob_t* j), but I think that using it may somehow leave to the LMiC library the decision on when to send the transmission with all its implications; whereas directly calling the LMIC_setTxData2() as you did maybe will enforce the library to actually send asap, without taking into account the timing and other parameters that may cause issues like the one I’m experiencing.

Regarding possible duty cycle violations I don’t think I should be in danger of violating them, since the payload I’m sending each hour is really small (10 bytes) and using the TTN airtime calculator the airtime should range from 60 to 1500 ms, in any case way below the 1% duty cycle established for most of the bands (which should correspond to a maximum of 36 seconds per hour).

Unfortunately I won’t be able to work on this project until monday, but thanks to the advices you and @cslorabox gave me I have plenty of tests to prepare and run on my Feathers!

Anyway thanks again for sharing your insights and the results of your work, which by the way is quite impressive!

(Also thanks for the last link you sent, I only noticed it now and I’ll keep it in mind for the tests I’ll do next monday)

descartes · December 4, 2020, 2:07pm

The LMIC has a mini-task scheduler built-in - not a RTOS but an element of one. Internally it creates, starts, stops and deletes jobs that the schedule will then process according to the jobs criteria.

They create two jobs for the user - one for data collection and one for sending. The data structure is osjob_t* and that is used by sendjob but it could be called anything because it’s only a placeholder to attach the next scheduled time and the function to call, which is do_send. This requires the job structure as a parameter as that’s how the scheduler calls it, as the structure can hold other information if need be.

All that said, calling do_send should be similar to me calling LMIC_setTXData2 but with some initial checks so it doesn’t try whilst it is already processing. My state flags tell me it won’t be, as my structure is either sleeping, creating a payload or LMIC active.

But I don’t have visibility on all your code so I can’t be sure what else is going on - for instance, I don’t even define the ‘sendjob’.

All of this has to be in the context of me trying many many things summer 2019, including trying to adjust timings and so forth. I stripped out various elements from the main example, had debug of the LMIC on full and read the source code. I arrived at a stable solution that works that I’m happy I’ve not warped time & space to achieve, pragmatic but not ideal. Read the README.md for my archive to get a better picture.

At present for co-developer (usually juniors), leveraging Arduino libraries, I use Uno/Nano/Pro Mini/Nano Every with a RAK4200 module on AT commands. I have done a project using a Nano Every (48K Flash, 6K RAM) with an RFM95 using LMIC to good effect. I’m trying out the Adafruit Feather M0 with RFM95 as this is a first class citizen of the MCCI LMIC repro but like many other of these projects, they tend to stop at the basic examples - your requirement for sleep is very common but it’s not been included as there are other things for the developers to do and it’s possibly a bit application specific.

I’ve been to “Planet Change LMIC Timing” and lost days of my life as I detailed on my repro. I’ve learn’t much since then but I’m not tempted back, I’d rather waste my time re-writing a LoRaWAN stack for smaller devices. I was highlighting that with a forum search you can find other people with similar issues over the years. Question is, do you feel lucky (punk), well do you?

cslorabox · December 4, 2020, 4:11pm

It’s critical to remember that this only attempts to stage a packet and set state variables.

it does not send anything!

That only happens if and when LMiC decides it’s appropriate to during a future execution of the runloop.

Typically you wouldn’t call this until you were ready to send, but the actual sending is still going to happen sometime later which takes minimum intervals into account.

The reason one would want to use LMiC’s scheduler to plan the transmission for a future time is so that there can be one scheduling mechanism, which you can then modify so that if it has nothing to do for a while it sleeps.

Otherwise you end up with two scheduling mechanisms that aren’t well aware of each other and both of which need to be made compatible with the fact that at least one of them (the custom one) can cause a sleep which effects both.