Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Community Edition
    • Poller
    • Linux (Debian 11)
      Observium CE 20.9.10731

    Description

      I was surprised to see a reduction in ping times in the poller Ping Response graph after upgrading some Linux operating systems a few weeks ago.  I thought that was pretty awesome.  I also noticed that there was no reduction of random spikes that are about an order of magnitude higher than regular pings.  I wondered if the spikes might be caused by first-ping conditions.  I understand that there are other lower-layer protocols that happen behind the scenes that might increase the response time of the first ping, like ARP discovery.

      I have attached a graph of the Observium ping times for one machine that is representative of all the other Linux machines I have.  Ping times drop considerably after September 1st, which is when the operating system was upgraded.

      I thought the spikes might be reduced by averaging the ping time of several pings within the same amount of poller time, so I patched `includes/functions.inc.php` to pass `fping` arguments that make it send out 5 pings 1 millisecond apart and then wait up to the Observium `$timeout` (500 ms by default) for all of the responses.  I have attached a patch that illustrates my simple fping argument changes.  (Additionally, modern `fping` implementations accept the arguments `--ipv4` and `--ipv6` instead of requiring the separate executables `fping` and `fping6`, so I modified that too.)

      I discovered a wonderful result that is also clearly visible in the same attached graph after the 14th of September.  The spikes dissipated entirely.  All of my Ping graphs since then have very decent ping trends, with no spikes.  I believe that this is closer to reality.

      Please consider my idea for inclusion in Observium, or at least allow the configuration for the number of pings to send during polling so that the "Ping Response" graph can be recorded more accurately.  Thanks for listening.

      Attachments

        Activity

          [OBS-3878] Ping reliability
          troy Troy Bowman added a comment - - edited

          I noticed that the new community edition 21.10.11666 has optimized some of the ping code to pre-resolve names to addresses.  That's great, but other factors, such as ARP resolution, can still cause spikes in the response time of a single ping.

          I still needed to patch "includes/functions.inc.php" and customize the fping arguments to:

          --period 10 --count 5

          Don't worry: sending five pings with "period 10" does not noticeably increase the time it takes to perform the ping probe.  The pings are sent ten milliseconds apart, the replies are collected as soon as they arrive, and "$config['ping']['timeout']" still applies.

          If you do not want to change the default way fping behaves for more reliable ping times, I believe it would be nice to at least allow end-users to customize fping in $config for the number of pings to send at once (`--count`), as well as the delay between ping requests (`--period`).

           

          --- includes/functions.inc.php.backup 2021-10-17 15:01:41.095645539 -0600
          +++ includes/functions.inc.php 2021-10-17 15:21:45.910222637 -0600
          @@ -1910,7 +1910,7 @@
           elseif ($retries > 10) { $retries = 10; }
           
           // Fping always requested by IP address
          - $cmd = array_tag_replace($tags, '%fping% -t %timeout% -c 1 -q %host% 2>&1');
          + $cmd = array_tag_replace($tags, '%fping% --timeout %timeout% --period 10 --count 5 --quiet %host% 2>&1');
           
           // Sleep interval between retries, max 1 sec, min 333ms (1s/3),
           // next retry will increase interval by 1.5 Backoff factor (see fping -B option)
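
For anyone who wants to try the patched arguments outside of Observium, here is a minimal shell sketch. The function names `ping_probe` and `fping_avg` are hypothetical and are not part of Observium. It assumes an fping version that accepts the long options used in the patch above and that prints its usual quiet-mode summary line (`host : xmt/rcv/%loss = ..., min/avg/max = ...`).

```shell
#!/bin/sh
# Hypothetical helpers, not Observium code.

# Run fping with the same arguments as the patch above.
# Count, period (ms) and timeout (ms) are parameters with assumed defaults.
ping_probe() {
    host=$1; count=${2:-5}; period=${3:-10}; timeout=${4:-500}
    # fping writes the quiet-mode summary to stderr, hence 2>&1.
    fping --timeout "$timeout" --period "$period" --count "$count" --quiet "$host" 2>&1
}

# Extract the average round-trip time (ms) from one fping summary line, e.g.
#   192.0.2.1 : xmt/rcv/%loss = 5/5/0%, min/avg/max = 0.41/0.52/0.88
# Prints nothing and returns non-zero when the line has no min/avg/max
# field (that is, when no reply was received).
fping_avg() {
    printf '%s\n' "$1" \
        | sed -n 's|.*min/avg/max = [0-9.]*/\([0-9.]*\)/[0-9.]*.*|\1|p' \
        | grep .
}
```

Usage would look like `fping_avg "$(ping_probe 192.0.2.1)"`. Using the average of the five replies rather than a single reply is what smooths out a slow first ping.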

           


          troy Troy Bowman added a comment

          Thanks, Observium Bot, but this really has little to do with SNMP or a specific device.

          bot Observium Bot added a comment

          Please collect and attach additional information about the device:

          • full snmp dump from device:

            snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -ObentxU <hostname> .1 > myagent.snmpwalk
            snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -ObentxU <hostname> .1.3.6.1.4.1 >> myagent.snmpwalk

            If the device does not support SNMP version 2c, replace -v2c with -v1.

          • If you have problems with the discovery or poller processes, please run these commands and attach their debug output:

            ./discovery.php -d -h <device>
            ./poller.php -d -h <device>

          • additionally attach device and/or vendor specific MIB files

          Note, this comment is added automatically.

          People

            landy Mike Stupalov
            troy Troy Bowman
            Votes: 0
            Watchers: 2