Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Professional Edition
- Debian 8.3 64-bit
Description
When I netcat the observium-agent of our storage-server I get the following for nfsd:
<<<app-nfsd>>>
rc 0 3262276 11714942
fh 0 0 0 0 0
io 1012421615 1079238768
th 32 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 64 0 0 0 0 0 0 0 0 0 0 0
net 14977013 0 14976885 6
rpc 14976901 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 0 304 0 18 20 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0
proc4 2 2 14976465
proc4ops 59 0 0 0 699531 4021436 4346 1438 0 0 13846271 4239317 0 31 0 30 84271 0 0 4366901 0 792334 0 14793981 0 90018 1165970 82534 822 469 11358 13046 4209683 4379447 0 72935 90018 90018 0 2975085 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The problem is that I see no RRD being generated in observium/rrd/storage1/, while I can see RRDs for other metrics. Also, no graphs are visible on the nfsd page of the storage machine (only titles), while other graphs work fine.
Attachments
- poller.log (107 kB)
- poller.log (15 kB)
Activity
On our systems I don't see a difference in the number of values:
# uname -a
Linux storage1 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux

# cat /proc/net/rpc/nfsd
rc 0 828027905 643218917
fh 12 0 0 0 0
io 1302794912 2664671801
th 128 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 256 0 0 0 0 0 0 0 0 0 0 0
net 1471294987 0 1471180616 18
rpc 1471250321 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 11 1471231510
proc4ops 59 0 0 0 19377503 17214607 1023247 528078 0 9602946 428639048 7219272 0 1292 0 1273 7889412 0 0 19043186 0 9 3 1472650292 0 20 194325745 3163520 6106 4402702 2625660 101809 0 2625660 0 9729857 6 6 0 832521748 12 0 2 8 11 6 1261 0 0 0 0 0 0 8 447666891 0 0 0 3 8
# uname -a
Linux bernard 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/net/rpc/nfsd
rc 0 0 0
fh 0 0 0 0 0
io 0 0
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 32 0 0 0 0 0 0 0 0 0 0 0
net 0 0 0 0
rpc 0 0 0 0 0
proc3 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 0 0
proc4ops 72 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Do you have access to multiple kernel versions to check that proc3 has the same number of variables?
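One way to compare kernels without eyeballing the lines is to count the values on each line. This is a minimal sketch, assuming the raw text of /proc/net/rpc/nfsd is available as a string; nfsd_value_counts() is a hypothetical helper and not part of Observium.

```php
<?php
// Hypothetical helper (not part of Observium): for each line of
// /proc/net/rpc/nfsd output, report how many values follow the line name.
function nfsd_value_counts(string $raw): array
{
    $counts = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        $tokens = preg_split('/\s+/', trim($line));
        if ($tokens[0] === '') {
            continue; // skip blank lines
        }
        $name = array_shift($tokens); // first token is the line name
        $counts[$name] = count($tokens);
    }
    return $counts;
}

// Two lines copied from the storage1 output above.
$sample = "net 1471294987 0 1471180616 18\nproc4 2 11 1471231510\n";
print_r(nfsd_value_counts($sample));
```

Running this against the output of each kernel and comparing the resulting arrays shows which lines changed length. Note that the count for a proc* line includes its leading number (for example the 22 in proc3).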
I agree. I can confirm that removing the extra array_shift() fixes the stats. I didn't add the dummy label; I don't think it's needed, but I can't check since we use NFSv4.
We locally applied this patch to fix things for now:
Index: includes/polling/applications/nfsd.inc.php
===================================================================
--- includes/polling/applications/nfsd.inc.php (revision 9671)
+++ includes/polling/applications/nfsd.inc.php (working copy)
@@ -51,7 +51,6 @@
   {
     $base = strtolower($tokens[0]);
     array_shift($tokens);
-    array_shift($tokens);
     foreach ($tokens as $k => $v)
     {
       $datas[$base.($nfsLabel[$base][$k])] = $v;
Adding proc4ops to the stats would be nice.
It should be something like this, but it looks like code would also have to be added in some other places. The module would also need rewriting for readability, so that it is easier to maintain and to add new features:
$nfsLabel['proc4ops'] = array(
  'unused',
  'access_close_commit_create',
  'delegpurge',
  'recovery',
  'delegreturn',
  'getattr',
  'getfh',
  'link',
  'lock',
  'lockt',
  'locku',
  'lookup',
  'lookupp',
  'nverify',
  'open',
  'openattr',
  'open_confirm',
  'open_dgrd',
  'putfh',
  'putpubfh',
  'putrootfh',
  'read_readdir_readlink_remove_rename',
  'renew',
  'restorefh',
  'savefh',
  'secinfo',
  'setattr',
  'setcltid',
  'setcltidconf',
  'verify',
  'write',
  'rellockowner'
);
It also seems that the number of data points returned differs based on kernel version, which is pretty absurd.
Jesus, whoever originally wrote this code gave zero fucks for any future readability. Why is any of it doing what it's doing?
The whole thing just needs to be rewritten so that it's actually readable, rather than arsing around with arrays of text labels and somehow building RRDs out of them.
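For illustration only, a more readable parser could map each line name to an explicit list of field names and refuse to label a line whose length does not match. This is a sketch under assumptions: NFSD_FIELDS and parse_nfsd() are hypothetical, and apart from nocache, r_bytes, w_bytes and t_conn (which appear on the graphs mentioned in this ticket) the field names are made up.

```php
<?php
// Hypothetical field table; only lines listed here are parsed.
// Field names other than nocache, r_bytes, w_bytes and t_conn are invented.
const NFSD_FIELDS = [
    'rc'  => ['hits', 'misses', 'nocache'],
    'io'  => ['r_bytes', 'w_bytes'],
    'net' => ['cnt', 'udp', 'tcp', 't_conn'],
];

// Hypothetical parser: returns [line name => [field name => value]].
function parse_nfsd(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        $tokens = preg_split('/\s+/', trim($line));
        $name = strtolower(array_shift($tokens));
        if (!array_key_exists($name, NFSD_FIELDS)) {
            continue; // unknown or unsupported line
        }
        $fields = NFSD_FIELDS[$name];
        if (count($tokens) !== count($fields)) {
            continue; // unexpected layout: skip rather than mislabel
        }
        $stats[$name] = array_combine($fields, $tokens);
    }
    return $stats;
}

print_r(parse_nfsd("rc 0 3262276 11714942\nio 1012421615 1079238768\n"));
```

The length check means a kernel that reports a different number of values produces a missing metric instead of silently shifted ones.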
It seems that the array of tokens is shifted one time too many, so the last key is never read. The patch below may help (it removes the extra array_shift() and adds a "dummy" key to the NFSv3 labels to compensate).
diff --git a/includes/polling/applications/nfsd.inc.php b/includes/polling/applications/nfsd.inc.php
index fc396b88..e386c4f1 100644
--- a/includes/polling/applications/nfsd.inc.php
+++ b/includes/polling/applications/nfsd.inc.php
@@ -36,6 +36,7 @@ if (!empty($agent_data['app']['nfsd']))
   );
 
   $nfsLabel['proc3'] = array(
+    "dummy",
     "null", "getattr", "setattr", "lookup", "access", "readlink",
     "read", "write", "create", "mkdir", "symlink", "mknod",
     "remove", "rmdir", "rename", "link", "readdr", "readdirplus",
@@ -51,7 +52,6 @@ if (!empty($agent_data['app']['nfsd']))
   {
     $base = strtolower($tokens[0]);
     array_shift($tokens);
-    array_shift($tokens);
     foreach ($tokens as $k => $v)
     {
       $datas[$base.($nfsLabel[$base][$k])] = $v;
Bump: is there any progress on this issue? I'm willing to provide extra information if necessary. I still can't see the number of writes on the storage server in nfsd...
I wasn't paying attention to the issue for a while, but it seems some stats for the module have been generated since April 13th... Ironic.
Probably this commit:
r7746 | sid3windr | 2016-04-12 23:26:08 +0200 (di, 12 apr 2016) | 1 line
Major: introduction of new RRD create/update framework, code conversion is over halfway, low hanging fruit mostly picked. Simplifies a lot of the code, provides insight into our current RRD file settings. Could later be used to send certain named metrics to other places from the core functions.
I see graphs now but a few metrics are still missing:
NFSd RC
"nocache" shows -nan
NFSd I/O
w_bytes shows -nan (I do get r_bytes, but I'm pretty sure the box is also written to, because VMs are running on this machine)
NFSd Net
t_conn shows -nan
The NFSd RPC and NFSd v3 graphs show only zeros, but I would expect some stats here too.
The new logfile as requested:
poller.log
Haha, Adam wrote the incorrect module name in the command.
Please run this and attach the debug output again:
./poller.php -d -m unix-agent -h <device>
I'm available for testing patches if needed, I'll keep an eye on email.