Observium / OBS-2464

r8892 Appears to Break Uptime Monitoring on Linux Systems

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Professional Edition
    • Poller
    • None
    • SLES 11/12 and RHEL 7

    Description

      Hi,

      r8892 breaks uptime monitoring on Linux servers, as far as we can see. I've just updated to r8893 and received a "device rebooted" alert for 30+ servers, and the reported uptime for those servers is wrong. It seems to affect newer distros only: old CentOS boxes aren't affected, but new RHEL and SLES ones are.

      SVN log shows:
      r8892 | mike | 2017-10-12 14:11:41 +0100 (Thu, 12 Oct 2017) | 2 lines

      [MINOR] Prioritizing snmpEngineTime over hrSystemUptime and sysUptime. Clean old geolocation parts.

      This seems to be the wrong thing to do for Linux systems, because only hrSystemUptime seems to report the correct system uptime, as reported by the "uptime" command.

      For example, we've a server that has been up for 15 days, 22 hours.
      $ uptime
      14:43pm up 15 days 22:39, 1 user, load average: 0.02, 0.05, 0.01

      Observium's device page now reports it as Uptime 3h 50m 35s

      snmpwalk shows this:
      DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1300251) 3:36:42.51
      SNMP-FRAMEWORK-MIB::snmpEngineTime.0 = INTEGER: 13357 seconds
      HOST-RESOURCES-MIB::hrSystemUptime.0 = Timeticks: (137807404) 15 days, 22:47:54.04
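
The three values above use different units, which makes them awkward to compare by eye. The following is a minimal sketch (not Observium's code) that converts them to a common type. The helper names `from_timeticks` and `longest_reading` are hypothetical, and picking the longest reading is only an illustrative rule, not necessarily how the poller should choose.

```python
from datetime import timedelta


def from_timeticks(ticks: int) -> timedelta:
    """Convert SNMP TimeTicks (hundredths of a second) to a timedelta."""
    # 1 tick = 10 ms; integer milliseconds avoid float rounding.
    return timedelta(milliseconds=ticks * 10)


# Values copied from the snmpwalk output above.
readings = {
    "sysUpTimeInstance": from_timeticks(1300251),
    "snmpEngineTime": timedelta(seconds=13357),  # already in seconds
    "hrSystemUptime": from_timeticks(137807404),
}


def longest_reading(values: dict) -> str:
    """Hypothetical rule: return the name of the source reporting the longest uptime."""
    name, _ = max(values.items(), key=lambda item: item[1])
    return name


for name, value in readings.items():
    print(f"{name}: {value}")
print("longest:", longest_reading(readings))
```

On this host, sysUpTimeInstance and snmpEngineTime both come out at under four hours, while only hrSystemUptime agrees with the 15 days reported by "uptime".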

      Please can this priority be fixed.

      Thanks,

      Steven

        Activity

          landy Mike Stupalov made changes -
          Status Original: Resolved [ 5 ] New: Closed [ 6 ]

          fadly.tabrani@gmail.com Fadly Tabrani added a comment - Looks to have fixed it!
          fadly.tabrani@gmail.com Fadly Tabrani made changes -
          Attachment New: screenshot-1.png [ 14389 ]

          stevenr Steven Robson added a comment - Confirmed, fixed in r8918. No incorrect "device rebooted" alerts since updating. Thanks!
          landy Mike Stupalov made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Pending Response [ 10000 ] New: Resolved [ 5 ]

          landy Mike Stupalov added a comment - Fixed in r8918.
          landy Mike Stupalov made changes -
          Priority Original: Major [ 3 ] New: Minor [ 4 ]
          Status Original: Reopened [ 4 ] New: Pending Response [ 10000 ]

          landy Mike Stupalov added a comment - fadly.tabrani@gmail.com Can you provide temporary (SSH) access to your Observium server? I want to find out why this happens. If possible, write to me by email: mike@observium.org
          landy Mike Stupalov made changes -
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
          fadly.tabrani@gmail.com Fadly Tabrani added a comment - - edited

          On 17.10.8912, these alerts are appearing on my RHEL7 boxes as well, and they have not been rebooted lately.

          2017-10-19 10:46:22 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 18h 55m 36s
          2017-10-19 10:21:15 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 18h 30m 30s
          2017-10-19 10:01:26 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 18h 10m 38s
          2017-10-19 09:41:20 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 17h 50m 44s
          2017-10-19 09:31:22 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 17h 40m 40s
          2017-10-19 09:11:24 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 17h 20m 35s
          2017-10-19 08:21:18 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 16h 30m 37s
          2017-10-19 07:51:15 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 16h 37s
          2017-10-19 07:41:12 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 15h 50m 33s
          2017-10-19 07:26:16 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 15h 35m 32s
          2017-10-19 06:11:12 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 14h 20m 31s
          2017-10-19 05:51:16 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 14h 32s
          2017-10-19 04:51:15 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 13h 45s
          2017-10-19 04:31:18 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 12h 40m 34s
          2017-10-19 04:11:19 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 12h 20m 37s
          2017-10-19 04:01:25 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 12h 10m 43s
          2017-10-19 03:16:15 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 11h 25m 37s
          2017-10-19 03:01:21 hostxxx hostxxx.xxx.com Device rebooted: after 1 day, 11h 10m 32s
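
One way repeated alerts like these could arise is sketched below. It assumes, without confirmation from this ticket, that the poller flags a reboot whenever the polled uptime is lower than the value seen at the previous poll. The function `reboot_events` is hypothetical. The first two sample values are the hrSystemUptime and snmpEngineTime readings from the description, in seconds; the last two are made up to represent a later poll.

```python
from typing import Iterable, List


def reboot_events(uptimes: Iterable[int]) -> List[int]:
    """Return the indexes of polls whose uptime (seconds) is lower than the previous poll's."""
    events = []
    previous = None
    for index, current in enumerate(uptimes):
        # Assumed rule: uptime going backwards is treated as a reboot.
        if previous is not None and current < previous:
            events.append(index)
        previous = current
    return events


# Hypothetical sequence: polls alternate between a long host uptime
# and a short SNMP agent uptime for the same, never-rebooted server.
samples = [1378074, 13357, 1378674, 13957]
print(reboot_events(samples))  # [1, 3]
```

Under that assumption, any poll that falls back from a host-level uptime to an agent-level one would look like a reboot, even though the server stayed up.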

          People

            Assignee: landy Mike Stupalov
            Reporter: stevenr Steven Robson