Node Fails after sending ~200 Messages


(Akshaim) #1

Hi,

I have recently started working on LMIC library and it was working fine so far, now I am facing a serious issue. I have ABP node which starts off working fine. The node captures magnetometer data and sends to a gateway on a 30-second interval. The code can be found here < https://github.com/akshaim/LoRaNode_Arduino/tree/master/LoRa_865_Battery >

The node fails to send data after around 200 packets and LMIC Debug data show “uplink delayed until …” error. Last few lines of debug log(along with comments) is attached for reference. Any help is deeply appreciated.

[17:49:12.962350 25.631989] Sleep complete //node wakes up from sleep
[17:49:12.978556 0.016206] a_xy228 a_xz233 a_yz229
[17:49:13.009979 0.031424] Enter do_send
[17:49:13.025927 0.015948] 347136201: engineUpdate, opmode=0x908
[17:49:13.058001 0.032074] 347137705: Uplink data pending
[17:49:13.090223 0.032222] 347139720: Considering band 0, which is available at 388
[17:49:13.153837 0.063614] 347143426: Considering band 3, which is available at 0
[17:49:13.201948 0.048111] 347147009: No channel found in band 3
[17:49:13.249569 0.047618] 347149470: Considering band 0, which is available at 388
[17:49:13.313015 0.063448] 347153184: No channel found in band 0
[17:49:13.345306 0.032292] 347155645: Considering band 1, which is available at 345710800
[17:49:13.409153 0.063846] 347159740: Considering band 2, which is available at 388
[17:49:13.472689 0.063538] 347163454: No channel found in band 2
[17:49:13.504812 0.032122] 347165915: Considering band 1, which is available at 345710800
[17:49:13.568289 0.063478] 347170019: No channel found in band 1
[17:49:13.616145 0.047856] 347172479: Airtime available at 345710800 (channel duty limit)
[17:49:13.679799 0.063653] 347176574: Airtime available at 345818595 (global duty limit)
[17:49:13.743777 0.063979] 347180604: Ready for uplink
[17:49:13.775581 0.031803] 347183211: Updating info for TX at 347137704, airtime will be 3536. Setting available time for band 1 to 1273495552
[17:49:13.887426 0.111845] 347190039: TXMODE, freq=865062500, len=21, SF=7, BW=125, CR=4/5, IH=0
[17:49:13.966923 0.079497] 347190036: irq: dio: 0x0 flags: 0x8
[17:49:13.999007 0.032085] 347192387: Scheduled job 0x2ea, cb 0x240e ASAP
[17:49:14.046888 0.047880] Packet queued
[17:49:14.062758 0.015870] Leave do_send
[17:49:14.078713 0.015956] 347197382: Running job 0x2ea, cb 0x240e, deadline 0
[17:49:14.142103 0.063389] 347200766: Scheduled job 0x2ea, cb 0x2421 at 347252505
[17:49:14.907146 0.765042] 347252511: Running job 0x2ea, cb 0x2421, deadline 347252505
[17:49:14.955563 0.048417] 347252684: RXMODE_SINGLE, freq=865062500, SF=7, BW=125, CR=4/5, IH=0
[17:49:15.034720 0.079158] 347252851: irq: dio: 0x1 flags: 0x80
[17:49:15.067107 0.032386] 347255267: Scheduled job 0x2ea, cb 0x2429 ASAP
[17:49:15.114876 0.047770] 347258313: Running job 0x2ea, cb 0x2429, deadline 0
[17:49:15.178321 0.063445] 347261696: Scheduled job 0x2ea, cb 0x2440 at 347317981
[17:49:16.007143 0.828822] 347317987: Running job 0x2ea, cb 0x2440, deadline 347317981
[17:49:16.071155 0.064012] 347318161: RXMODE_SINGLE, freq=869525000, SF=12, BW=125, CR=4/5, IH=0
[17:49:16.182312 0.111156] 347324454: irq: dio: 0x1 flags: 0x80
[17:49:16.214699 0.032388] 347324538: Scheduled job 0x2ea, cb 0x2396 ASAP
[17:49:16.262575 0.047875] 347325715: Running job 0x2ea, cb 0x2396, deadline 0
[17:49:16.310673 0.048099] Enter onEvent
[17:49:16.326525 0.015852] EV_TXCOMPLETE (includes waiting for RX windows) // successful transmission
[17:49:16.389758 0.063233] Leave onEvent
[17:49:16.405718 0.015960] 347334226: engineUpdate, opmode=0x900
[17:49:16.437840 0.032122] Enter sleeping for 3 cycles of 8 seconds
[17:49:42.069730 25.631889] Sleep complete
[17:49:42.086105 0.016376] a_xy228 a_xz233 a_yz229
[17:49:42.117703 0.031598] Enter do_send
[17:49:42.133661 0.015958] 348916902: engineUpdate, opmode=0x908
[17:49:42.165775 0.032114] 348918406: Uplink data pending
[17:49:42.197666 0.031891] 348920421: Considering band 1, which is available at 347491304
[17:49:42.261485 0.063819] 348924516: Considering band 2, which is available at 388
[17:49:42.325137 0.063651] 348928221: Considering band 3, which is available at 0
[17:49:42.388628 0.063492] 348931804: No channel found in band 3
[17:49:42.420754 0.032126] 348934266: Considering band 1, which is available at 347491304
[17:49:42.484632 0.063877] 348938361: Considering band 2, which is available at 388
[17:49:42.548213 0.063582] 348942074: No channel found in band 2
[17:49:42.580477 0.032263] 348944536: Considering band 1, which is available at 347491304
[17:49:42.659480 0.079004] 348948640: No channel found in band 1
[17:49:42.691620 0.032139] 348951110: No channel found in band 0
[17:49:42.723945 0.032326] 348953570: Airtime available at 2148918405 (channel duty limit)
[17:49:42.803013 0.079067] 348957730: Uplink delayed until 2148918405 //Error !!!
[17:49:42.835500 0.032488] 348960530: Scheduled job 0x2ea, cb 0x473 at 2148918280
[17:49:42.898912 0.063412] Packet queued
[17:49:42.914871 0.015959] Leave do_send


#2

try increasing the time between two transmissions


(Arjan) #3

I don’t understand why the following observations would cause the issue you’re seeing, but a few things seem weird:

For The Things Network, the last line is wrong, as RX2 should use SF9, not SF12. As this is an ABP node, you’ll need to ensure the correct network settings are programmed into your node. (For OTAA, LMiC should be using the correct settings as received in the OTAA Join Accept message.) At least, the following seems to be missing, maybe more:

Also, you’ve disabled 5 of the 8 channels in your code, so the duty cycle limits will kick in sooner than you think. Like BoRRoZ wrote: increase the time between transmissions. Or better yet: enable the other channels?

The above does not explain why the node halts after around 200 uplinks. Which version of LMiC are you using?

The following might indicate LMiC is running into some overflow of the timer variables, as the “available at” is lower than the current timestamp in the following? By itself a delay is not a problem, but indeed 2,148,918,405 is a lot larger than 348,957,730, so that’s a long delay, but is that why you think the following is an error?

Maybe the above might be explained by something like the following:

Be sure to read all of Adafruit Feather 32u4 LoRa - long transmission time after deep sleep to see if something applies to your code while in deep sleep.

Your code is already using timer0_overflow_count += 8 * 64 * clockCyclesPerMicrosecond() after the sleep, but that lacks any documentation as for what that should achieve. Any more details?

As an aside, the timestamps in the log seem okay. Like this:

…tells you that 347,334,226 - 347,136,201 = 198,025 equals about 3.2 seconds. And then this:

…shows that 348,916,902 - 347,334,226 = 1,582,676 equals 25.7 seconds. And indeed 1,582,676 / 198,025 = 8 times as long as those first 3.2 seconds, so that seems to match the human readable timestamps. But still, one does not know if the LMiC internals know about the deep sleep?

Finally, probably unrelated:

I wonder what the unit is above. The following defines 4 times 2 bytes of data:

struct {
  unsigned short high_axy;
  unsigned short high_ayz;
  unsigned short high_axz;
  unsigned short battery_percentage;
} mydata;

…which in SF7 would take about 57 milliseconds. Are you indeed receiving 8 bytes when all is fine?


(Akshaim) #4

Hi,

Thanks for the quick help. I have now changed it to SF10, since I am using the device in India.

On the channels, unfortunately, India has only three uplink channels hence disabled the rest.
Uplink:

865.0625 - SF7BW125 to SF12BW125
865.4025 - SF7BW125 to SF12BW125
865.9850 - SF7BW125 to SF12BW125

https://www.thethingsnetwork.org/docs/lorawan/frequency-plans.html

I am using LMIC v1.5 < https://github.com/matthijskooijman/arduino-lmic >

Yes, it’s a long delay !! I cannot figure out how did it increase to such values after previous trasmission.

I am checking the code now. I shall move the delay to the on_event() and check. A few days back I tried but then the node stopped after sending 862 packets.

Yes , indeed. All are received and can be decoded.

Its strange that I had no such issues with RN2483 though using ttn library.


(Akshaim) #5

I can but then my application needs data to be transmitted every 25~35 seconds.


#6

how many bytes do you transmit ?


(Akshaim) #7

8 bytes . (2 bytes x 4 )


#8

just some thought , remove deepsleep and see what happends

exactly what sensor do you use ?


(Akshaim) #9

Its a magnetic sensor. QMC5883L. < https://github.com/dthain/QMC5883L >

I wanted to keep power consumption under 10uA idle. :frowning:


#10

yes I understand… but at this stage we are trying to find the source of your problem.

The GY-271 is supposed to have an HMC5883L, but some knockoff boards have a QMC5883L which has the same basic capability, but is not compatible with the H.

what do you have ?

some I2c implementations on sensors can give problems when the processor wakes up.
they should be reinitialised before reading.

so to test if its a sensor/mcu combi problem :slight_smile:

  • disable deepsleep and try
  • write some test code that reads the sensor, output the readings to monitor and back to sleep/wake up
    see how that goes and what’s the max speed.

(Akshaim) #11

We have the knockoffs !! Honeywell stopped making HMC5833.

I have tried without using QMC but the issue persists.

I shall try it now.


(Akshaim) #12

@BoRRoZ @arjanvanb Any ideas on why LMIC transmits only on channel 0 ? I have tried LMIC_setupChannel(). The log shows No channel found in band 0, No channel found in band 2, and No channel found in band 3 but it transmits on Band 1 ?
But Band 1 is 865402500. Confused !!!

LMIC_setupChannel(0, 865062500, DR_RANGE_MAP(DR_SF7, DR_SF12), BAND_CENTI);
LMIC_setupChannel(1, 865402500, DR_RANGE_MAP(DR_SF7, DR_SF12), BAND_CENTI);
LMIC_setupChannel(2, 865985000, DR_RANGE_MAP(DR_SF7, DR_SF12), BAND_CENTI);


#13

and please post your code complete (here or elsewhere) to get a better idea of your settings.
but use FORMAT


(Lagunacomputer) #14

cheap fix you could have the whole node just reboot/reset after 180 messages…


#15

:rofl:


(Akshaim) #16

Thanks everyone. The node works fine now. The following are the things that I have done.

  1. Enabled all three channels on the node.
  2. Disabled Duty Cap
  3. Increased Delay between messages.

I have been testing the updated node for the past 15 Days. No issues so far :slight_smile: