Details
Bug
Resolution: Unresolved
Minor
Description
I am not 100% sure what the problem is, so the summary could be wrong.
We have about 70 devices in Observium. All of them have some sort of alerts, and these all work well. One device is a special case: it has two alerts for disk usage over 80%, each with different recipients. Neither set of recipients gets the emails, although all other checks generate emails.
When the disk usage is over 80%, I see in the logs that one of the two alerts always has the status FAIL and the other FAIL_DELAYED. This goes on for several hours until we delete some files. It looks like FAIL_DELAYED is somehow applied to both alerts when Observium decides whether to send an email. It is also strange that we get FAIL_DELAYED at all, since both checks have a delay of zero.
Any ideas how I could troubleshoot further? We are on version 23.2.12520.