Lmic plus rfm95w staying awake for long periods

OK, here we go:

It may be important to know that a DS3231 is connected with an alarm set up every minute on the minute.
Doing this, I can have the Arduino sleep forever, wake every minute and if it is the full hour, I do a send.

The immediate downlink consists of three bytes, being hour, minute and second to avoid the RTC from drifting away over time. But I see the problem on all my 328 nodes when I put them to sleep. The others have no downlinks.

The mentioned raincounter is not connected yet but would also fire the IRQ.

I am pretty sure that the problem has to do with putting the Arduino to sleep and hacking the timer counter. But from reading the internet everybody does it like that and there seems to be no other solution.
Testing of the whole stuff is a but cumbersome as it always takes about 3 hours for the problem to show up.
Except if I add - as mentioned before - huge amount of time to the counter. Then it happens after three minutes. Somehow the timing is messed up. But why and where?
Btw. I tried to enable the debug for the LMIC lib but the 328 has not enough memory for that.

/*******************************************************************************
 * Copyright (c) 2015 Thomas Telkamp and Matthijs Kooijman
 * Copyright (c) 2018 Terry Moore, MCCI
 *
 * Permission is hereby granted, free of charge, to anyone
 * obtaining a copy of this document and accompanying files,
 * to do whatever they want with them without any restriction,
 * including, but not limited to, copying, modification and redistribution.
 * NO WARRANTY OF ANY KIND IS PROVIDED.
 *
 * This example sends a valid LoRaWAN packet with payload "Hello,
 * world!", using frequency and encryption settings matching those of
 * the The Things Network.
 *
 * This uses OTAA (Over-the-air activation), where where a DevEUI and
 * application key is configured, which are used in an over-the-air
 * activation procedure where a DevAddr and session keys are
 * assigned/generated for use with all further communication.
 *
 * Note: LoRaWAN per sub-band duty-cycle limitation is enforced (1% in
 * g1, 0.1% in g2), but not the TTN fair usage policy (which is probably
 * violated by this sketch when left running for longer)!

 * To use this sketch, first register your application and device with
 * the things network, to set or generate an AppEUI, DevEUI and AppKey.
 * Multiple devices can use the same AppEUI, but each device has its own
 * DevEUI and AppKey.
 *
 * Do not forget to define the radio type correctly in
 * arduino-lmic/project_config/lmic_project_config.h or from your BOARDS.txt.
 *
 *******************************************************************************/

#include <lmic.h>
#include <hal/hal.h>
#include <SPI.h>
#include <LoraMessage.h>

#include "LowPower.h"

#include <Wire.h>

#include <RtcDS3231.h>
RtcDS3231<TwoWire> Rtc(Wire);
#include <TimeLib.h>

#include "Adafruit_SI1145.h"
Adafruit_SI1145 uv = Adafruit_SI1145();

// Use pin 2 as wake up pin
const int IRQwakeUpPin = 2;

//Pin for UV Meter
const int UVpin = A2;

// marked volatile so interrupt can safely modify them and
// other code can safely read and modify them
//volatile uint16_t interuptCount = 0;
volatile bool IRQinterruptFlag = false;

void IRQwakeUp() {
  // since this interupted any other running code,
  // don't do anything that takes long and especially avoid
  // any communications calls within this routine
  IRQinterruptFlag = true;
}

volatile boolean powerdown=false;

time_t t;

unsigned int Vbat;
unsigned int Vsolar;

unsigned int ValueCounter;

unsigned int UVindex;
unsigned int UVlevel;
unsigned int VISlevel;
unsigned int IRlevel;

unsigned long UVindexSum;
unsigned long UVlevelSum;
unsigned long VISlevelSum;
unsigned long IRlevelSum;

unsigned int raincount;

// This EUI must be in little-endian format, so least-significant-byte
// first. When copying an EUI from ttnctl output, this means to reverse
// the bytes. For TTN issued EUIs the last bytes should be 0xD5, 0xB3,
// 0x70.
static const u1_t PROGMEM APPEUI[8] = { XXX };
void os_getArtEui (u1_t* buf) {
  memcpy_P(buf, APPEUI, 8);
}

// This should also be in little endian format, see above.
static const u1_t PROGMEM DEVEUI[8] = { XXX };
void os_getDevEui (u1_t* buf) {
  memcpy_P(buf, DEVEUI, 8);
}

// This key should be in big endian format (or, since it is not really a
// number but a block of memory, endianness does not really apply). In
// practice, a key taken from ttnctl can be copied as-is.
// The key shown here is the semtech default key.
static const u1_t PROGMEM APPKEY[16] = { XXX };
void os_getDevKey (u1_t* buf) {
  memcpy_P(buf, APPKEY, 16);
}

static osjob_t sendjob;

// Pin mapping
const lmic_pinmap lmic_pins = {
  .nss = 10,
  .rxtx = LMIC_UNUSED_PIN,
  .rst = 4,
  .dio = {6, 7, LMIC_UNUSED_PIN},
};


void onEvent (ev_t ev) {
    Serial.print(os_getTime());
    Serial.print(": ");
    switch(ev) {
        case EV_SCAN_TIMEOUT:
            Serial.println(F("EV_SCAN_TIMEOUT"));
            break;
        case EV_BEACON_FOUND:
            Serial.println(F("EV_BEACON_FOUND"));
            break;
        case EV_BEACON_MISSED:
            Serial.println(F("EV_BEACON_MISSED"));
            break;
        case EV_BEACON_TRACKED:
            Serial.println(F("EV_BEACON_TRACKED"));
            break;
        case EV_JOINING:
            Serial.println(F("EV_JOINING"));
            break;
        case EV_JOINED:
            Serial.println(F("EV_JOINED"));
            /*
            {
              u4_t netid = 0;
              devaddr_t devaddr = 0;
              u1_t nwkKey[16];
              u1_t artKey[16];
              LMIC_getSessionKeys(&netid, &devaddr, nwkKey, artKey);
              Serial.print("netid: ");
              Serial.println(netid, DEC);
              Serial.print("devaddr: ");
              Serial.println(devaddr, HEX);
              Serial.print("artKey: ");
              for (int i=0; i<sizeof(artKey); ++i) {
                Serial.print(artKey[i], HEX);
              }
              Serial.println("");
              Serial.print("nwkKey: ");
              for (int i=0; i<sizeof(nwkKey); ++i) {
                Serial.print(nwkKey[i], HEX);
              }
              Serial.println("");
            }
            */
            // Disable link check validation (automatically enabled
            // during join, but because slow data rates change max TX
            // size, we don't use it in this example.
            LMIC_setLinkCheckMode(0);
            break;
        /*
        || This event is defined but not used in the code. No
        || point in wasting codespace on it.
        ||
        || case EV_RFU1:
        ||     Serial.println(F("EV_RFU1"));
        ||     break;
        */
        case EV_JOIN_FAILED:
            Serial.println(F("EV_JOIN_FAILED"));
            break;
        case EV_REJOIN_FAILED:
            Serial.println(F("EV_REJOIN_FAILED"));
            break;

    case EV_TXCOMPLETE:
      
        Serial.println(F("EV_TXCOMPLETE (includes waiting for RX windows)"));

        if (LMIC.txrxFlags & TXRX_ACK)
        Serial.println(F("Received ack"));

        if (LMIC.dataLen==3) {
        
        t = Rtc.GetDateTime();
        Serial.print(F("alt: "));
        Serial.print(hour(t));
        Serial.print(F(":"));
        Serial.print(minute(t));
        Serial.print(F(":"));
        Serial.println(second(t));
        
        //Set time to received value
        setTime((int)LMIC.frame[LMIC.dataBeg + 0],(int)LMIC.frame[LMIC.dataBeg + 1],(int)LMIC.frame[LMIC.dataBeg + 2],day(t),month(t),year(t));
        Rtc.SetDateTime(now());
        
        t = Rtc.GetDateTime();
        Serial.print(F("neu: "));
        Serial.print(hour(t));
        Serial.print(F(":"));
        Serial.print(minute(t));
        Serial.print(F(":"));
        Serial.println(second(t));

        // data received in rx slot after tx
        Serial.print(F("Received "));
        Serial.print(LMIC.dataLen);
        Serial.print(F(" bytes of payload"));
        
        /*
        for (int i = 0; i < LMIC.dataLen; i++) {
          if (LMIC.frame[LMIC.dataBeg + i] < 0x10) {
            Serial.print(F("0"));
          }
          Serial.print(LMIC.frame[LMIC.dataBeg + i], HEX);
        }
        */
        Serial.println();
        
        }
      


      Serial.println(F("sent"));

      //send/receive cycle completed
      powerdown=true;
      break;
    
        case EV_LOST_TSYNC:
            Serial.println(F("EV_LOST_TSYNC"));
            break;
        case EV_RESET:
            Serial.println(F("EV_RESET"));
            break;
        case EV_RXCOMPLETE:
            // data received in ping slot
            Serial.println(F("EV_RXCOMPLETE"));
            break;
        case EV_LINK_DEAD:
            Serial.println(F("EV_LINK_DEAD"));
            break;
        case EV_LINK_ALIVE:
            Serial.println(F("EV_LINK_ALIVE"));
            break;
        /*
        || This event is defined but not used in the code. No
        || point in wasting codespace on it.
        ||
        || case EV_SCAN_FOUND:
        ||    Serial.println(F("EV_SCAN_FOUND"));
        ||    break;
        */
        case EV_TXSTART:
            Serial.println(F("EV_TXSTART"));
            break;
        default:
            Serial.print(F("Unknown event: "));
            Serial.println((unsigned) ev);
            break;
    }
}

void do_send(osjob_t* j) {
  
  // Check if there is not a current TX/RX job running
  if (LMIC.opmode & OP_TXRXPEND) {
    Serial.println(F("OP_TXRXPEND, not sending"));
  } else {

    //Prepare data
    LoraMessage message;

    message.addUint16(raincount);
    message.addUint16(UVindex);
    message.addUint16(UVlevel);
    message.addUint16(VISlevel);
    message.addUint16(IRlevel);
    message.addUint16(Vbat);
    message.addUint16(Vsolar);

    ValueCounter=0;
    raincount=0;
    UVindexSum=0;
    UVlevelSum=0;
    VISlevelSum=0;
    IRlevelSum=0;
    
    Serial.println(F(" sending..."));
    // Prepare upstream data transmission at the next possible time.
    LMIC_setTxData2(1, message.getBytes(), message.getLength(), 0);
    Serial.print(os_getTime());
    Serial.print(": ");
    Serial.println(F("Packet queued"));

  }
  // Next TX is scheduled after TX_COMPLETE event.
}



void setup() {

  Serial.begin(115200);
  Serial.println(F("Starting"));

  // Configure wake up pins as input.
  // This will consumes few uA of current.
  pinMode(IRQwakeUpPin, INPUT_PULLUP);   
  
  //Start RTC CLock
  Rtc.Begin();
  
  //Start UV/VIS/IR-Sensor
  uv.begin();
  
  Rtc.Enable32kHzPin(false);
  Rtc.SetSquareWavePin(DS3231SquareWavePin_ModeAlarmTwo);

  // Alarm 2 set to trigger at the top of the minute
  DS3231AlarmTwo alarm2(
    0,
    0,
    0,
    DS3231AlarmTwoControl_OncePerMinute);
  Rtc.SetAlarmTwo(alarm2);

  // throw away any old alarm state before we ran
  Rtc.LatchAlarmsTriggeredFlags();

  // LMIC init
  os_init();

  // Reset the MAC state. Session and pending data transfers will be discarded.
  LMIC_reset();
  LMIC_setClockError(MAX_CLOCK_ERROR * 1 / 100);

  // Start job (sending automatically starts OTAA too)
  measure();  //Do a first measurement at startup
  powerdown=false;
  os_setTimedCallback(&sendjob, os_getTime() + ms2osticks(10), do_send);

  //Attach Interrupt for RTC
  attachInterrupt(digitalPinToInterrupt(IRQwakeUpPin), IRQwakeUp, FALLING);

}

void loop() {

  extern volatile unsigned long timer0_overflow_count;
  os_runloop_once();  //check send status

  if (powerdown) {

    Serial.println(F(" ... go to sleep"));
    Serial.flush();
    
    // Enter power down state with ADC and BOD module disabled.
    LowPower.powerDown(SLEEP_FOREVER, ADC_OFF, BOD_OFF); 
    
    Serial.begin(115200);
    Serial.print(F("woken ... "));

    if (IRQinterruptFlag) { 
      IRQinterruptFlag=false; //Remove flag
      Serial.println(F("IRQ"));
      DS3231AlarmFlag flag = Rtc.LatchAlarmsTriggeredFlags();
      if (flag & DS3231AlarmFlag_Alarm2) { //RTC-Interrupt
        Serial.print(F("-RTC-"));
        cli();
        timer0_overflow_count += 60 * 64 * clockCyclesPerMicrosecond(); //give back 60 seconds of sleep
        sei();

        measure();    

        t = Rtc.GetDateTime();
        Serial.print(F(" "));
        Serial.print(minute(t));
        Serial.print(F(":"));
        Serial.print(second(t));

      
        if (minute(t)==0) {   //send on full hour
          powerdown=false;
          os_setTimedCallback(&sendjob, os_getTime() + ms2osticks(10), do_send);
        }
        
      } else {  //If not woken by RTC it must be Rain sensor
        raincount+=1;  
        Serial.print(F("rain: "));
        Serial.println(raincount);
      }
   } 
}


} //loop


void measure() {
  Vbat = analogRead(A0) / 1024.0 * 3300 * 2;
  Vsolar = analogRead(A1) / 1024.0 * 3300 * 2;

  //Measure UVlevel from analog ML8511
  UVlevelSum+= analogRead(A2);

  //Measure IR, VIS and UVindex from SI1145
  VISlevelSum+=uv.readVisible();
  IRlevelSum+=uv.readIR();
  UVindexSum+=uv.readUV();

  ValueCounter++;
  UVindex=UVindexSum/ValueCounter;
  UVlevel=UVlevelSum/ValueCounter;
  VISlevel=VISlevelSum/ValueCounter;
  IRlevel=IRlevelSum/ValueCounter;
}

I think you’re right :sunglasses:

you just goto deepsleep and wake up… without re initialising the I2C bus … so The DS3231 can react a bit strange sometimes imho

’ to avoid the RTC from drifting away over time ’
not neccessary - it’s a very accurate RTC… only check once a month / year

Sure, you are totally correct there :face_with_monocle:
The downlink evrey hour is neither important to me nor do I believe it is part of the problem.
I just added it out of curiosity (before noticing the other problem) so I kept it for now.
If everything else works, I could change code to only request a downlink every week/month or so for correction.

The DS3231 itself on the other hand is important for me because with the Arduino in deepsleep I have absolutely no other means of time tracking. But it is behaving neatly, so far.

Any idea why my LMIC timing is getting messed up, though? This really drives me crazy and it is hard to believe that I am the only one having seen this issue. Well, I am not, see start of post. But it seemed to have gone away for the others or they did not post their solution.

Ah, good point. Now I know what you mean. Well, I will have to deal with it and reinitialize the I2C to avoid more new problems.

Ah, that code explains much of the logging you posted earlier!

Sounds silly, but what if you’d print a bit more to see if changing timer0_overflow_count makes the expected difference? Untested:

Serial.print(micros());
Serial.print(F(" / "));
Serial.println(os_getTime() / OSTICKS_PER_SEC);

Serial.println(F("-RTC-"));
cli();
timer0_overflow_count += 60 * 64 * clockCyclesPerMicrosecond(); //give back 60 seconds of sleep
sei();

Serial.print(micros());
Serial.print(F(" / "));
Serial.println(os_getTime() / OSTICKS_PER_SEC);

Also note the following from @matthijskooijman, which also changes timer0_millis:

A while ago, I implemented a workaround by updating the Arduino timer value. This uses internal Arduino variables, so is likely to break at some point if Arduino changes its implementation. It is also specific to the AVR Arduino core, other architectures likely need something else. Still, here it is (from here):

ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
      extern volatile unsigned long timer0_millis;
      extern volatile unsigned long timer0_overflow_count;
      timer0_millis += slept;
      // timer0 uses a /64 prescaler and overflows every 256 timer ticks
      timer0_overflow_count += microsecondsToClockCycles((uint32_t)slept * 1000) / (64 * 256);
}

I’ve no idea if that’s needed… The code that is referenced by Matthijs, mentions:

// Update the millis() and micros() counters, so duty cycle
// calculations remain correct. This is a hack, fiddling with
// Arduino's internal variables, which is needed until
// https://github.com/arduino/Arduino/issues/5087 is fixed.
1 Like

As silly as it may sound, I had a similar suspicion and checked that. It changes os_time.
Anyway, I added the code to have a better view.

Here is how the debug looks like:

-RTC-
1605335192 / 1605
2136670000 / 2136
28:0 … go to sleep
woken … IRQ
-RTC-
2136679776 / 2136
2668014600 / 2668
29:0 … go to sleep
woken … IRQ

The really interesting thing is the other part you dug out. :face_with_raised_eyebrow:
What makes me wonder is that it not only changes timer0_overflow_count but also timer0_millis. One could suspect that not changing one of the two might get something out of sync.
This is definitely worth a try… test is running.

As far as I understand, LMiC only uses micros(), where timer0_overflow_count might suffice. But your other code might trigger usage of millis() too?

I was expecting a difference of 60 seconds there? You’re seeing 2,668,014,600 - 2,136,679,776 = 531,334,824 microseconds, hence 531 seconds? (Same for 2,668 - 2,136 = 532 seconds.)

:roll_eyes:
You are good! You really are!

Oops, this happens when you are careless and assign a value of 60000 milliseconds to a 16bit integer :rofl:
Fixed it, now it’s 60 seconds:

-RTC-
72687736 / 72
132687248 / 132

I did a couple of minutes adding 600 seconds per minute. This might be nonsense but doing so without updating also millis caused the problem to occur within minutes instead of hours. So far, safe and sound. I will monitor the proper time correction over night. I will keep you posted.

Update: The problem is still there.

It took some time to show up, but LMIC still starts to delay the sending occasionally. This keeps the RFM95 awake for anything between 2 and up to 5 minutes while using using 8 mA or so. With this behaviour, really low power applications with the 328P are impossible.
I can see here https://www.thethingsnetwork.org/forum/t/full-arduino-mini-lorawan-below-1ua-sleep-mode/8059 that @Charles obviously did something similar and I am wondering what I do fundamentally wrong?

You’re not seeing this when not sleeping at all, right? (Not sleeping, hence no code to fix the timers either.) What’s the payload length and SF? (I wonder if we can figure out if the 2 to 5 minutes is related to duty cycles.)

Also: do you get any larger downlinks (larger in size, hence including some server-side MAC command that’s piggy-backing on the downlink of your timestamp)? (I wonder if some earlier MAC command might have messed with LMiC’s state.)

For your delay i think you should track when LMIC really want to send data , for that you can add some printf in “engineUpdate” for the variable “txbeg”

Someting like

#endif // !DISABLE_BEACONS

        // Trace to show the delay due to duty cycle. 
        LMIC_DEBUG_PRINTF("%ul: engineUpdate, tx will happen at %ul\n", os_getTime(), txbeg);

        // Earliest possible time vs overhead to setup radio
        if( txbeg - (now + TX_RAMPUP) < 0 ) {

And if the delay is due to txbeg then you will be able to track where it come from.

1 Like

I fancy, I need to go back two steps here at least checking more basic things.
I have to admit, I never let the “same” code on the same node run for long without the sleep. Did not feel the necessity, though it seems to be better now.
I have now set up a very simplified node, sending 2 bytes every 5 minutes not getting downlinks.
First test will be with a delay(2000), 150 times between the sends. No sleep and no tinkering with the timer.

Edit:
You also asked for the SF. This is what the data says for one example:

{
  "time": "2019-01-06T17:44:41.269366308Z",
  "frequency": 868.5,
  "modulation": "LORA",
  "data_rate": "SF7BW125",
  "coding_rate": "4/5",
  "gateways": [
    {
      "gtw_id": "eui-b827ebfffe032416",
      "timestamp": 1130906507,
      "time": "",
      "channel": 2,
      "rssi": -83,
      "snr": 7.8,
      "rf_chain": 1,
      "latitude": 47.53158,
      "longitude": 7.81542,
      "altitude": 395
    }
  ]
}

And from the gateway:

{
  "gw_id": "eui-b827ebfffe032416",
  "payload": "QCQuASaAAgABmJwwYjTD",
  "f_cnt": 2,
  "lora": {
    "spreading_factor": 7,
    "bandwidth": 125,
    "air_time": 46336000
  },
  "coding_rate": "4/5",
  "timestamp": "2019-01-06T17:53:34.110Z",
  "rssi": -78,
  "snr": 9.2,
  "dev_addr": "26012E24",
  "frequency": 868300000
}
1 Like

Here I am a bit at a loss.
How do I get the LMIC_DEBUG_PRINTF to put something out to serial?
I can increase debug_level to 1 which gives me out of memory in variable space. Bad.
So I changed most of the #if LMIC_DEBUG_LEVEL > 0 to #if LMIC_DEBUG_LEVEL > 5 just to make them not print.
Now it compiles, but nothing of the PRINTFs is sent to serial. :thinking:

To get LMIC_DEBUG_PRINTF to work, you have to uncomment

//#define LMIC_PRINTF_TO Serial

Or you can replace the LMIC_DEBUG_PRINTF with some Serial.print().

Serial.print(os_getTime());
Serial.print(F(" : engineUpdate => tx will happen at "));
Serial.println(txbeg);

2 Likes

OK, got it. Thank you a lot for helping me here.
Had to change the line to that, though, so to get the actual values:

LMIC_DEBUG_PRINTF("%lu: engineUpdate, tx will happen at %lu\n", os_getTime(), txbeg);

Btw: I did some other test. If I send packets with a payload of 25 bytes as fast as I can, the LMIC soon delays it to 12 seconds between the packets. They have an airtime of about 100 ms, so that’s about 1% airtime.
Now, reverting to one send every 5 minutes without sleep, I can sit and wait what happens.

After having run some code for 6h and 9h respectively, I have more information but nothing conclusive yet.
I changed the time output to be divided by OSTICKS_PER_SEC to get seconds which are easier to read

So, 6h of sending every 5 minutes without sleep (only 150x 2s delay) and no tinkering with the timers:
Debug looks like this from beginning to end:
go to sleep … … woke up …sending…

22345: Queue packet
22345: engineUpdate, tx at 21738; now is: 22345
22345: EV_TXSTART
22345: tx done
22351: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep … … woke up …sending…
22651: Queue packet
22651: engineUpdate, tx at 22085; now is: 22651
22651: EV_TXSTART
22651: tx done
22656: EV_TXCOMPLETE (includes waiting for RX windows)
sent

Then changed to sleep for 2 seconds instead of the delay. First 6h no difference, no inexpected delays:

go to sleep … … woke up …sending…
33976: Queue packet
33976: engineUpdate, tx at 33369; now is: 33976
33976: EV_TXSTART
33976: tx done
33982: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep … … woke up …sending…
34282: Queue packet
34282: engineUpdate, tx at 33716; now is: 34282
34282: EV_TXSTART
34282: tx done
34288: EV_TXCOMPLETE (includes waiting for RX windows)
sent

Then, after 9.52 hours, this happens:

go to sleep … … woke up …sending…
34282: Queue packet
34282: engineUpdate, tx at 33716; now is: 34282
34282: EV_TXSTART
34282: tx done
34288: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep … … woke up …sending…
-34131: Queue packet
4294933165: engineUpdate, tx at 33981; now is: 4294933165
-34131: EV_TXSTART
4294933165: tx done
-34126: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep … … woke up …sending…
-33826: Queue packet
4294933470: engineUpdate, tx at 34328; now is: 4294933470
-33826: EV_TXSTART
4294933470: tx done
-33819: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep … … woke up …sending…
-33519: Queue packet
4294933777: engineUpdate, tx at 4294933170; now is: 4294933777
-33519: EV_TXSTART
4294933777: tx done
-33513: EV_TXCOMPLETE (includes waiting for RX windows)
sent

So, rollover happend, first for now, then for txbeg. No problem so far, as still no send delays.
Next test will be sending not every 5 minutes but every 1 hour. Maybe this causes the problem to show up.

1 Like

I don’t know if rollover occurred yet; it seems the debug print is just thinking it’s a signed value, rather than an unsigned value. For an unsigned number, the value seems to have passed half its maximum then. But sure: at some point it will rollover.

Ah, forget the negative printed values, these are the lines that you’re referring to:

34282: engineUpdate, tx at 33716; now is: 34282
...
4294933165: engineUpdate, tx at 33981; now is: 4294933165
...
4294933470: engineUpdate, tx at 34328; now is: 4294933470
...
4294933777: engineUpdate, tx at 4294933170; now is: 4294933777

Are all times in each line divided by OSTICKS_PER_SEC?

Something I noticed which is a bit weird, but might not affect things, os_getTime is defined as:

ostime_t os_getTime () {
    return hal_ticks();
}

…with hal_ticks expected to return an unsigned 32 bits number:

u4_t hal_ticks () {
    ...
}

…but ostime_t being a signed 32 bits number:

typedef uint32_t           u4_t;
typedef int32_t            s4_t;
...
// the HAL needs to give us ticks, so it ought to know the right type.
typedef              s4_t  ostime_t;

Yes, all are divided by OSTICKS_PER_SEC.
Regarding the negative numbers. Next time I stop the script, I will check if I can/have to cast some printouts as unsigned long not to look different because of them flipping over to negative.

And now, having it run with one wakeup per hour and 1800 times a 2 second sleep with same number of 2 seconds “corrections” on the timer, I can see “bad thing happening” after only a couple of hours:

Starting
 ...sending...
0: Queue packet
0: EV_JOINING
0: engineUpdate, tx at 3; now is: 0
0: next engine update in 3
3: engineUpdate, tx at 3; now is: 3
3: EV_TXSTART
3: tx done
8: EV_JOINED
8: engineUpdate, tx at 8; now is: 8
8: EV_TXSTART
8: tx done
13: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep ... ... woke up ...sending...
3612: Queue packet
3612: engineUpdate, tx at 8; now is: 3612
3612: EV_TXSTART
3612: tx done
3617: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep ... ... woke up ...sending...
2921: Queue packet
2921: engineUpdate, tx at 54; now is: 2921
2921: EV_TXSTART
2921: tx done
2928: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep ... ... woke up ...sending...
2231: Queue packet
2231: engineUpdate, tx at 2967; now is: 2231
2231: next engine update in 2967
2967: engineUpdate, tx at 2967; now is: 2967
2967: EV_TXSTART
2967: tx done
2973: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep ... ... woke up ...sending...
2277: Queue packet
2277: engineUpdate, tx at 3014; now is: 2277
2277: next engine update in 3014
3014: engineUpdate, tx at 3014; now is: 3014
3014: EV_TXSTART
3014: tx done
3019: EV_TXCOMPLETE (includes waiting for RX windows)
sent
go to sleep ... 

Time seems to be, well, running very slow or even backwards in between. And all of a sudden, tx is after now and it delays.

Again, this is the code used instead of a simple delay(2000);:

  LowPower.powerDown(SLEEP_2S, ADC_OFF, BOD_OFF);
  //Give the AVR back the 2 slept seconds
  unsigned long slept=2*1000;
  ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
    extern volatile unsigned long timer0_millis;
    extern volatile unsigned long timer0_overflow_count;
    timer0_millis += slept;
    // timer0 uses a /64 prescaler and overflows every 256 timer ticks
    timer0_overflow_count += microsecondsToClockCycles((uint32_t)slept * 1000) / (64 * 256);
  }

Strange, isnt’ it?

I tried a different approach with a different node running on a AA battery (no serial though):
If I do not try to give it back all the time it slept, but only just enough so that the duty cycle is within limits (e.g. 1 minute per send or per hour, probably 10 sec would also suffice), it seems to run nice.
I do not know if I just delay the problem doing so, or if it is now gone completely. But the node is running for >24h now without showing any problems. It might take weeks to find out, but with now no longer showing the power-hungry delays, the battery should last about a year now on that node, according to my estimates :wink:
I know this is just a nasty patch, but unless somebody has a good idea what causes these time-effects, I am lost in getting it fixed properly.

1 Like

As an aside: at least for EU868, it seems the value for tx at 8 is taken from the earliest time any of the channels is available, which indeed can be in the past. So that looks okay.

Some theory…

os_time is based on millis, it is a scaled down version of millis to avoid to overflow too often.
To keep the os_time counter accurate, it needs to detect millis overflow, which happens after approx 71 minutes.
The detection algorithm looks at a flip of the most significant bit of millis.

Now if you sleep one hour, most of the time (1 out of 6) you will have a millis overflow, and it won’t be detected as the increment is big enough to keep the msb unchanged!
And if you are out of luck, it will go backwards…

This is not a proven theory, it is just my gut feeling reading the code, but you could validate it easily by calling os_getTime() each time you wake up (every 2 secs) to allows overflow detection.

4 Likes