Description
There is an issue in Observium where some customers have 192+ petabytes of data usage recorded on their bill. The spike occurs on a single day (a different day for each customer), and usage returns to normal the next day.
What would be causing these spikes in data usage?
Attachments
Activity
Status | Original: In Review [ 10101 ] | New: Pending Response [ 10000 ] |
Comment |
[ _*General questions and device support can be discussed in [our Discord channel, click here to join|https://discord.gg/GjpNXKWm8W].*_
----
Please make and attach additional information about the device:
* full snmp dump from device:
{noformat}
snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -Ih -ObentxU <hostname> .1 > myagent.snmpwalk
snmpwalk -v2c -c <community> -t 3 -Cc --hexOutputLength=0 -Ih -ObentxU <hostname> .1.3.6.1.4.1 >> myagent.snmpwalk
{noformat}
_If the device does not support SNMP version 2c, replace -v2c with -v1._
* If you have problems with discovery or poller processes, please run and attach these debugs:
{noformat}
./discovery.php -d -h <device>
./poller.php -d -h <device>
{noformat}
* additionally attach device and/or vendor specific MIB files
----
{color:#505F79}_This comment is added automatically._{color} ] |
Assignee | Original: Mike Stupalov [ landy ] | New: Adam Armstrong [ adama ] |
Status | Original: Pending Response [ 10000 ] | New: In Review [ 10101 ] |
Status | Original: Open [ 1 ] | New: Pending Response [ 10000 ] |
Hi,
The only thing that could really cause this is the device returning incorrect data, so that the delta between two polled datapoints is very large. 192PB is probably the maximum value of the signed database field, or thereabouts.
This should not really happen though.
How distant are the devices being polled from the polling system? What hardware are they?
You should be able to see the deltas in the database, but we don't record the raw number returned by the polled device.
Do you get similar odd behaviour in the RRD-based port graphs? RRD probably clips the number off because of the maximum value on the DS, though.
Our logic to generate the in/out delta is quite simple:
if (is_numeric($period) && is_numeric($entity['bill_port_counter_in']) && is_numeric($entity['bill_port_counter_out']) && $data['in'] >= $entity['bill_port_counter_in'] && $data['out'] >= $entity['bill_port_counter_out']) {
    // Counters are higher or equal to before, seems legit.
    $in_delta = int_sub($data['in'], $entity['bill_port_counter_in']);
    $out_delta = int_sub($data['out'], $entity['bill_port_counter_out']);
    echo("Counters valid, delta generated.\n");
} elseif (is_numeric($period) && is_numeric($entity['bill_port_counter_in']) && is_numeric($entity['bill_port_counter_out'])) {
    // Counters are lower, we must have wrapped. We'll take the measurement as the amount for this period.
    $in_delta = $data['in'];
    $out_delta = $data['out'];
    echo("Counters wrapped, delta fudged.\n");
} else {
    // First update. delta is zero, only insert counters.
    echo("No existing counters.\n");
    $in_delta = 0;
    $out_delta = 0;
}
if ($in_delta == $data['in']) {
    // Deltas are equal to counters. Clearly fail.
    echo("In Deltas equal counters. Resetting.");
    $in_delta = 0;
}
if ($out_delta == $data['out']) {
    // Deltas are equal to counters. Clearly fail.
    echo("Out Deltas equal counters. Resetting.");
    $out_delta = 0;
}
The last two if statements were added to try to prevent these sorts of spikes. We could potentially add another filter to zero any spike greater than some threshold, though probably not one based on the device-reported port speed, because that's quite unreliable.
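A filter of that kind could be sketched roughly as follows. This is only an illustration, not Observium code: `filter_spike()` and the `$max_rate_bps` parameter are hypothetical names, and the ceiling is assumed to be set by the operator rather than taken from the device-reported port speed.

```php
<?php
// Hypothetical helper, not part of Observium: zero any delta larger than
// what the link could plausibly have carried during the polling period.

/**
 * @param int|float $delta        Bytes counted for this period.
 * @param int       $period       Seconds since the previous poll.
 * @param int       $max_rate_bps Operator-configured ceiling in bits per second (assumed setting).
 * @return int|float The delta, or 0 if it is implausibly large.
 */
function filter_spike($delta, $period, $max_rate_bps)
{
    if (!is_numeric($delta) || !is_numeric($period) || $period <= 0) {
        return 0; // Nothing sensible to compare against.
    }

    // Maximum number of bytes that fit through the link in $period seconds.
    $max_bytes = ($max_rate_bps / 8) * $period;

    return ($delta > $max_bytes) ? 0 : $delta;
}

// Example: 300-second period with a 10 Gbit/s ceiling (made-up values).
$in_delta  = filter_spike(1500000000, 300, 10000000000);         // plausible, kept
$out_delta = filter_spike(192000000000000000, 300, 10000000000); // roughly 192 PB, zeroed
```

Zeroing the delta drops that period's traffic from the bill, so the ceiling should be generous enough that real bursts are never discarded.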
I suspect what's happening is that this device is returning a maximum value, overflowing the database field. I don't think it's caused by the device returning zeros, because we filter out the negative spike from the counters zeroing, then ignore the spike caused by the counters returning to normal.
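To illustrate that reasoning, here is a simplified, standalone re-statement of the delta logic above, run against two invented sequences of counter readings. `bill_delta()` and `run_polls()` are hypothetical names, plain subtraction stands in for `int_sub()`, the `$period` check is omitted, and `PHP_INT_MAX` is used only as a stand-in for "some bogus maximum value".

```php
<?php
// Simplified re-statement of the delta logic for a single direction.
// Hypothetical helper; plain integer subtraction stands in for int_sub().
function bill_delta($previous, $current)
{
    if (is_numeric($previous) && $current >= $previous) {
        $delta = $current - $previous; // Counter grew: normal delta.
    } elseif (is_numeric($previous)) {
        $delta = $current;             // Counter dropped: treated as a wrap.
    } else {
        $delta = 0;                    // First update: no previous counter.
    }

    if ($delta == $current) {
        $delta = 0;                    // Delta equals counter: discarded.
    }

    return $delta;
}

// Feed a list of successive counter readings through bill_delta().
function run_polls(array $readings)
{
    $previous = null;
    $deltas   = [];
    foreach ($readings as $current) {
        $deltas[] = bill_delta($previous, $current);
        $previous = $current;
    }
    return $deltas;
}

// Invented readings: the third poll returns zero, or a bogus maximum.
$deltas_zero = run_polls([1000, 2000, 0, 3000, 4000]);
$deltas_max  = run_polls([1000, 2000, PHP_INT_MAX, 3000, 4000]);

echo implode(', ', $deltas_zero), "\n";
echo implode(', ', $deltas_max), "\n";
```

Under these assumptions the zeroed reading produces no spike at all, while the single bogus maximum produces exactly one very large delta; the drop back to normal on the following poll is discarded.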
If you know what you're doing, you could add some debugging output that dumps the polled data to a text file, to try to spot exactly what the device is returning.
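As a rough sketch of what that debugging output might look like: append one line per poll to a text file. `bill_debug_log()`, the log location, and the field names are all hypothetical; in the poller the values would come from `$entity`, `$data`, `$period` and the computed deltas.

```php
<?php
// Hypothetical debugging helper: append one JSON line per poll so the raw
// counters and computed deltas can be inspected after a spike.
function bill_debug_log($file, array $fields)
{
    // Prepend a timestamp so entries can be matched to the spike time.
    $fields = ['time' => date('c')] + $fields;

    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved
    // lines when several poller processes write at once.
    return file_put_contents($file, json_encode($fields) . "\n", FILE_APPEND | LOCK_EX);
}

// Example call with made-up values.
$log_file = sys_get_temp_dir() . '/bill_debug.log'; // assumed location
bill_debug_log($log_file, [
    'port_id'   => 42,
    'period'    => 300,
    'prev_in'   => 1000,
    'prev_out'  => 500,
    'cur_in'    => 2000,
    'cur_out'   => 900,
    'in_delta'  => 1000,
    'out_delta' => 400,
]);
```

One line per poll keeps the file easy to search for the timestamp of a spike.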
You could also look in the database at the times the spikes occur to see exactly what deltas are being recorded.
adam.