[OBS-3927] All devices/sensors read null for a single interval

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • CE-21.10
    • Alerting

    Description

      It seems every device/sensor/etc. in the database returns NULL for a single polling interval and then returns to normal. So while polling works for 287 out of 288 checks a day, that one errant check sends our engineering team and CEO about 1000 scary emails, kicking all the response teams into action. I updated to 21.10 two weeks ago, shortly after it was released, and there was no issue until yesterday and today. I looked in the cron listing and didn't see anything running around that time.
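The cron check described above was done by eye. As a minimal sketch of narrowing that search, the filter below prints the crontab lines whose hour field can match a given hour. `cron_lines_for_hour` is a hypothetical helper, not part of Observium. It understands only plain numbers, comma lists, ranges, and `/step` values in the hour field, and it skips `@daily`-style shortcuts, comments, and environment lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of Observium): read crontab-format lines on
# stdin and print those whose hour field (2nd column) can match hour $1.
cron_lines_for_hour() {
    awk -v h="$1" '
        # Return 1 if cron hour expression "field" matches hour h.
        function hour_matches(field, h,    n, parts, i, p, r, step, lo, hi) {
            n = split(field, parts, ",")
            for (i = 1; i <= n; i++) {
                p = parts[i]
                step = 1
                if (split(p, r, "/") == 2) { p = r[1]; step = r[2] + 0 }
                if (step < 1) step = 1
                if (p == "*") { lo = 0; hi = 23 }
                else if (split(p, r, "-") == 2) { lo = r[1] + 0; hi = r[2] + 0 }
                else { lo = p + 0; hi = p + 0 }
                if (h + 0 >= lo && h + 0 <= hi && (h - lo) % step == 0) return 1
            }
            return 0
        }
        # Only consider lines that start with a schedule (digit or "*").
        $1 ~ /^[0-9*]/ && NF >= 6 && hour_matches($2, h) { print }
    '
}
```

It could be fed the system crontab or a user crontab, for example `cron_lines_for_hour 7 < /etc/crontab` or `crontab -l | cron_lines_for_hour 7`. The hour is interpreted in the server's local time zone, and systemd timers or jobs on other hosts would not show up here.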

       

      Love this tool, thanks so much for all of your hard work. If there is anything else I can pull to help close this issue out please let me know.

        Activity

          erobin Eamon C Robinson added a comment - Still looking for a root cause here, but the server replacement resolved the issue. I was able to use the simple instructions to migrate the config/rrd/db to a new instance and it all spun up and started working beautifully. I'm at a loss as to what was happening here. Thanks again!

          erobin Eamon C Robinson added a comment - I'm going to throw in the towel on this one, build a new VM, and copy the data/config over. 7 AM is a low-utilization time in our environment and I cannot find any reason why this keeps occurring.

          Log for a single interface, for timing (I'm in UTC-4, EDT):

            2021-11-01 07:10:53 Interface DELETED mark removed
            2021-11-01 07:05:52 Interface was marked as DELETED
            2021-10-31 07:10:52 Interface DELETED mark removed
            2021-10-31 07:05:14 Interface was marked as DELETED
            2021-10-30 07:10:49 Interface DELETED mark removed
            2021-10-30 07:05:33 Interface was marked as DELETED
            2021-10-29 07:10:53 Interface DELETED mark removed
            2021-10-29 07:05:32 Interface was marked as DELETED
            2021-10-28 07:10:51 Interface DELETED mark removed
            2021-10-28 07:05:44 Interface was marked as DELETED
            2021-10-27 07:10:52 Interface DELETED mark removed
            2021-10-27 07:05:20 Interface was marked as DELETED
            2021-10-26 07:10:51 Interface DELETED mark removed
            2021-10-26 07:05:44 Interface was marked as DELETED
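To make the timing in the interface log easier to compare across days, the following sketch pairs each "marked as DELETED" entry with the "mark removed" entry from the same date and prints the gap in seconds. `deleted_windows` is a hypothetical helper name. It assumes the log format shown above (date, time, then message) and that each window starts and ends on the same calendar day.

```shell
#!/bin/sh
# Hypothetical helper: read interface event log lines on stdin and print
# "<date> <marked time> <removed time> <gap in seconds>" per day, oldest first.
deleted_windows() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /marked as DELETED/    { marked[$1]  = $2 }
        /DELETED mark removed/ { removed[$1] = $2 }
        END {
            for (d in marked)
                if (d in removed)
                    printf "%s %s %s %d\n", d, marked[d], removed[d],
                           secs(removed[d]) - secs(marked[d])
        }
    ' | sort
}
```

Run over the entries above, every gap comes out at roughly five minutes, which is consistent with a single missed polling interval each day rather than a longer outage.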

          erobin Eamon C Robinson added a comment - Third day in a row. At 7:05 AM EDT all variables are set to 0/NULL; on the next check at 7:10 AM EDT everything returns to normal. This is the exact same timing three days in a row. Were it a network issue, I'd expect all the devices to fail to poll rather than return NULLs for all fields, but as I don't know how that module is coded I'm just guessing.

          adama Adam Armstrong added a comment - I have no idea what might cause this. I've not encountered all devices having an issue like this before.

          bot Observium Bot added a comment - General questions and device support can always be discussed in our Discord channel.

          Please collect and attach additional information about the device:

          • Full SNMP dump from the device:

            snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -ObentxU <hostname> .1 > myagent.snmpwalk
            snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -ObentxU <hostname> .1.3.6.1.4.1 >> myagent.snmpwalk

            If the device does not support SNMP version 2c, replace -v2c with -v1.

          • If you have problems with the discovery or poller processes, please run these debug commands and attach their output:

            ./discovery.php -d -h <device>
            ./poller.php -d -h <device>

          • Additionally, attach device- and/or vendor-specific MIB files.

          This comment is added automatically.
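Because the failure in this ticket is confined to one polling interval, the poller debug output requested above is most useful if it is captured during that interval. The wrapper below is a rough sketch of one way to do that. `capture_poller_debug`, the `POLLER` override, and the log file naming are assumptions made for illustration; only the `./poller.php -d -h <device>` invocation comes from the comment above.

```shell
#!/bin/sh
# Hypothetical wrapper: run the poller in debug mode for one device and save
# stdout and stderr to a timestamped log file, then print the log path.
# Usage: capture_poller_debug DEVICE [OUTDIR]
capture_poller_debug() {
    device=$1
    outdir=${2:-/tmp}                 # assumed default output directory
    poller=${POLLER:-./poller.php}    # assumed: run from the Observium directory
    out="$outdir/poller-debug-$device-$(date +%Y%m%dT%H%M%S).log"
    "$poller" -d -h "$device" > "$out" 2>&1
    printf '%s\n' "$out"
}
```

Scheduling this wrapper shortly before the interval that fails would leave a log covering the affected poll, which could then be attached to the issue.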

          People

            adama Adam Armstrong
            erobin Eamon C Robinson