Details
Description
I was surprised to see a reduction in ping times in the poller Ping Response graph after upgrading some Linux operating systems a few weeks ago. I thought that was pretty awesome. I also noticed that there was no reduction in the random spikes, which are about an order of magnitude higher than regular pings. I wondered if the spikes might be caused by first-ping conditions. I understand that lower-layer protocol exchanges that happen behind the scenes, like ARP discovery, can increase the response time of the first ping.
I have attached a graph of the Observium ping times of one machine that is representative of all of the other Linux machines I have. It is easy to see that ping times drop considerably after September 1st. That was the operating system upgrade.
I thought that the spikes might be reduced by averaging the ping time of several pings within the same amount of poller time, so I patched `includes/functions.inc.php` to pass arguments to `fping` that make it send out 5 pings 1 millisecond apart, and then wait for the Observium `$timeout` for all of the responses (500 ms by default). I have attached a patch that illustrates my simple fping argument changes. (Additionally, modern `fping` implementations have the arguments `--ipv4` and `--ipv6` instead of the separate executables, `fping` and `fping6`, so I modified that too.)
I discovered a wonderful result that is also clearly visible in the same attached graph after the 14th of September. The spikes dissipated entirely. All of my Ping graphs since then have very decent ping trends, with no spikes. I believe that this is closer to reality.
Please consider my idea for inclusion in Observium, or at least allow the configuration for the number of pings to send during polling so that the "Ping Response" graph can be recorded more accurately. Thanks for listening.
I noticed that the new community edition 21.10.11666 has optimized some of the ping code to pre-resolve names to addresses. That's great, but other things, like ARP, could still cause spikes in the response time of a single ping.
I still needed to patch `includes/functions.inc.php` and customize the fping arguments to:
`--period 10 --count 5`
Don't worry: sending five pings with `--period 10` does not noticeably increase the time it takes to perform the ping probe. The pings are sent ten milliseconds apart, each reply is recorded as soon as it arrives, and `$config['ping']['timeout']` still applies.
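To illustrate how the averaged value could be read back, here is a minimal Python sketch (not the Observium code, which is PHP) that parses the per-target summary line `fping` prints in quiet count mode (`--quiet --count N`). The function name `parse_fping_summary` and the sample address and timings are made up for illustration.

```python
import re

# Summary line printed by fping in quiet count mode for each target, e.g.
#   192.0.2.10 : xmt/rcv/%loss = 5/5/0%, min/avg/max = 0.41/0.52/0.93
# The min/avg/max part is absent when no reply was received.
SUMMARY_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*xmt/rcv/%loss = "
    r"(?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
    r"(?:, min/avg/max = (?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+))?"
)


def parse_fping_summary(line):
    """Return a dict of counters and the average RTT, or None if unparsable.

    Hypothetical helper: the name and return shape are assumptions.
    """
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    avg = match.group("avg")
    return {
        "host": match.group("host"),
        "sent": int(match.group("xmt")),
        "received": int(match.group("rcv")),
        "loss_percent": int(match.group("loss")),
        # None signals "no reply at all", i.e. nothing to average.
        "avg_ms": float(avg) if avg is not None else None,
    }
```

Recording `avg_ms` instead of a single round-trip time is what smooths out a slow first reply: one outlier among five samples moves the average far less than it moves a one-ping measurement.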
If you do not want to change the default way fping behaves for more reliable ping times, I believe it would be nice to at least allow end-users to customize fping in `$config` for the number of pings to send at once (`--count`), as well as the delay between ping requests (`--period`).
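As a rough sketch of what such a configurable probe could look like, the Python snippet below builds an `fping` argument list from a settings dictionary with defaults. It is an illustration only, not Observium's implementation: the `count` and `period` keys are hypothetical settings that do not exist in Observium today, and `build_fping_args` is a made-up name. The defaults keep the current single-ping behaviour and the 500 ms timeout mentioned above.

```python
# Hypothetical defaults mirroring a $config['ping'] section.
# Only the 500 ms timeout comes from the description above;
# "count" and "period" are proposed, not existing, settings.
DEFAULT_PING = {
    "timeout": 500,  # ms to wait for replies
    "count": 1,      # pings per probe (1 = unchanged behaviour)
    "period": 10,    # ms between pings to the same target
}


def build_fping_args(host, ping_config=None, ipv6=False):
    """Build the fping command line as an argument list.

    User-supplied values override the defaults key by key.
    """
    settings = {**DEFAULT_PING, **(ping_config or {})}
    return [
        "fping",
        # One executable with an address-family flag
        # instead of separate fping/fping6 binaries.
        "--ipv6" if ipv6 else "--ipv4",
        "--quiet",
        "--timeout", str(settings["timeout"]),
        "--count", str(settings["count"]),
        "--period", str(settings["period"]),
        host,
    ]
```

For example, `build_fping_args("192.0.2.10", {"count": 5})` keeps the default period and timeout and only raises the number of pings, so users who change nothing would see no difference in behaviour.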