
no more alert notifications since last update

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • None
    • Alerting
    • None
    • Debian 9.13

    Description

      Hi,
      Since I updated Observium to the latest revision on Monday, alert notifications are no longer sent.
      Previously, every "Checks failed" entry was systematically followed by "Alert notification sent"; now this second step no longer happens.
      I have checked all the settings on the master and the 5 pollers, and I can't see where this problem comes from.

      Attachments

          Activity

            [OBS-3679] no more alert notifications since last update

            rdumas Raphaël Dumas added a comment:

            As I said: there was no scheduled maintenance, and none appears in the history.

            adama Adam Armstrong added a comment:

            Do you have a scheduled maintenance entry in the relevant database table which covers this period?

            It would produce a different alert entry status though, so it should have been obvious.

            adam.
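The check asked for above (is there a maintenance entry that covers the period without notifications?) comes down to an interval-overlap test. The following is a minimal sketch, assuming the maintenance entries have already been read from the database into (name, start, end) records with Unix timestamps. The `MaintenanceWindow` type and its field names are hypothetical and are not taken from the Observium schema.

```python
from typing import List, NamedTuple


class MaintenanceWindow(NamedTuple):
    """Hypothetical record for one scheduled maintenance entry."""
    name: str
    start: int  # Unix timestamp, inclusive
    end: int    # Unix timestamp, inclusive


def windows_covering(windows: List[MaintenanceWindow],
                     period_start: int,
                     period_end: int) -> List[MaintenanceWindow]:
    """Return every window that overlaps [period_start, period_end].

    Two closed intervals overlap when each one starts before
    (or exactly when) the other one ends.
    """
    return [
        w for w in windows
        if w.start <= period_end and w.end >= period_start
    ]
```

An empty result for the week in question would support the statement that no scheduled maintenance covered it. A non-empty result would point at the entries worth inspecting.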
            rdumas Raphaël Dumas added a comment (edited):

            I don't understand this at all: since last night, without any new action on our part, we are receiving all the alerts again.
            It is as if the tool had been in global maintenance for 1 week.
            Except that, of course, there was no scheduled maintenance, and none appears in the history.

            Could the tool have been honoring a "ghost" maintenance?
            That is, something that was scheduled and then cancelled, but was still applied without appearing in the interface?

            I'm glad it's working now, but it's really strange...
            If you have any explanation or theory, I'd welcome it.
            It could be useful in the future.

            adama Adam Armstrong added a comment:

            You don't really need to do the rebuild thing, certainly not after updates or reboots.

            The alert test script uses basically the same path for generating a notification as the normal poller does, so I'm not sure what might not be working, since you're getting other notifications via other contacts, and testing the telegram contact works.

            You should probably take screenshots of all of the relevant parts in the chain, showing the configuration of the checker and contact, and the alert log entries which didn't generate a telegram notification.

            Adam.

            rdumas Raphaël Dumas added a comment:

            As I said in my comment 3 days ago:
            I clicked on "rebuild" after the update and reboot of the master.

            I also tried setting up a new alert, then clicking rebuild again.

            Is there another procedure we should follow?

            We didn't get any notifications during the whole weekend, and it's becoming very annoying not to be able to react in time.

            adama Adam Armstrong added a comment:

            Are you sure the alert was triggered?

            rdumas Raphaël Dumas added a comment:

            Yes, we received this test alert message on telegram.

            We still do not receive the regular notifications that are supposed to be sent by the checkers.

            landy Mike Stupalov added a comment:

            OK, as far as I can see, the test notification is sent normally to the external API.

            REQUEST[https://api.telegram.org/bot11111111663:XXXXXXXXXXXXX/sendMessage]
            REQUEST STATUS[TRUE]
            REQUEST RUNTIME[3.2403s]
            RESPONSE CODE[200 OK]
            Response [1] valid: [ok] eq [1]

            Did you receive this test message?
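To rule out the bot token and recipient independently of Observium, the same `sendMessage` call can be reproduced by hand. The following is a minimal sketch using only the Python standard library. The function names are illustrative and are not part of Observium, and the token and chat id must be replaced with real values. `chat_id` and `text` are the standard parameters of the Telegram Bot API `sendMessage` method.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(token: str, chat_id: str,
                               text: str) -> urllib.request.Request:
    """Build (but do not send) a Telegram Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    # Form-encoded body; chat_id is the recipient (group ids are negative).
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    return urllib.request.Request(url, data=body.encode("utf-8"),
                                  method="POST")


def send_message(token: str, chat_id: str, text: str,
                 timeout: float = 10.0) -> bool:
    """Send the message and return the API's "ok" flag.

    A wrong token or chat id makes the API answer with an HTTP error
    status, which urlopen raises as urllib.error.HTTPError.
    """
    request = build_send_message_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return bool(payload.get("ok"))
```

If `send_message` returns `True` and the message shows up in the chat, the token and recipient are fine, and the problem lies earlier in the chain (checker, contact association, or alert state).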

            rdumas Raphaël Dumas added a comment:

            Well... it works better with an accurate contact id...

            (the number used in the last test was the "recipient":"-1001158003268")

            See the attached test_alert-c1.txt file; note that we did get a message on telegram after running this script.

            adama Adam Armstrong added a comment:

            You didn't include the command you ran to get this output.

            Why does it think you gave it that weird number as a contact id? What contact id did you give it?

            People

              landy Mike Stupalov
              rdumas Raphaël Dumas
              Votes: 0
              Watchers: 4
