SNMP Connectivity to VMware ESXi 7.0 Hosts Flaky

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • None
    • Component/s: Poller
    • Environment: CentOS Linux release 7.9.2009 (Core)

    Description

      Running version 22.11.12360

      SNMP polling succeeds only intermittently, both for new ESXi 7.0 hosts and for existing (previously monitored) ESXi hosts since they were upgraded from v6.7 to v7.0.

      I will attach SNMP walk results both from when the system is "visible" (live) and from when polling fails (dead), as well as poller and discovery results.

      With all apologies, I don't know if this is a problem in Observium or in ESXi but wanted to start the ball rolling.  I have a support ticket open with VMware as well, and have sent them logs from two of the affected systems.

      Attachments

        Activity

          [OBS-4300] SNMP Connectivity to VMware ESXi 7.0 Hosts Flaky
          landy Mike Stupalov made changes -
          Status Original: In Review [ 10101 ] New: Pending Response [ 10000 ]

          landy Mike Stupalov added a comment -

          Yes, I still don't have a solution. Observium just uses common external commands: fping for ping and net-snmp for SNMP.

          However, it is strange that I don't see any fping command being run in your poller debug output.

          Is the device configured to skip ping checks?

          Can you try upgrading your CentOS system? It uses old software versions.
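Since the comment above says polling relies on fping and net-snmp, the two checks can be reproduced by hand, outside the poller, to see which one fails. The following is a minimal sketch, not Observium's own code. The function names, the default probe counts and timeouts, and the choice of sysUpTime.0 as the test OID are assumptions made for illustration.

```python
import shutil
import subprocess

# sysUpTime.0 in numeric form, so no MIB files are needed (assumed test OID).
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def fping_cmd(host, count=3, timeout_ms=500):
    """Build an fping command: `count` probes, quiet per-probe output."""
    return ["fping", "-q", "-c", str(count), "-t", str(timeout_ms), host]


def snmpget_cmd(host, community, timeout_s=2, retries=1):
    """Build a net-snmp snmpget command for a single SNMP v2c request."""
    return ["snmpget", "-v2c", "-c", community,
            "-t", str(timeout_s), "-r", str(retries),
            host, SYS_UPTIME_OID]


def run_check(cmd):
    """Return True/False from the exit status, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

Calling `run_check(fping_cmd(host))` and `run_check(snmpget_cmd(host, community))` against an affected host, with your own host name and community string, shows whether ICMP and SNMP fail together or independently.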

          NegwerIT Scott Driemeier-Showers added a comment -

          Working with VMware support, they've been running packet captures.

          When the requests fail, they see the SNMP requests come in from Observium on a random port (e.g. 37981), but when the host attempts to respond, the port is unavailable.  VMware is wondering why the port would close before the response can be sent.

          I am attaching four activity captures, two from the Observium side and two from the ESXi side.
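One ordinary way for a requester's source port to be gone when a reply arrives is that the requester timed out and closed its UDP socket first. The sketch below only illustrates that mechanism on the loopback interface. It does not establish that this is what happened in the captures above, and every name in it (`slow_udp_responder`, `request_with_timeout`, `demo`) is hypothetical.

```python
import socket
import threading
import time


def slow_udp_responder(sock, delay_s):
    """Answer one datagram after `delay_s` seconds, like an agent replying late."""
    data, addr = sock.recvfrom(2048)
    time.sleep(delay_s)
    try:
        sock.sendto(b"reply:" + data, addr)
    except OSError:
        pass  # a late reply may fail because the requester's port is closed


def request_with_timeout(server_addr, timeout_s):
    """Send one request from an ephemeral port; return the reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout_s)
        client.sendto(b"get", server_addr)
        try:
            reply, _ = client.recvfrom(2048)
            return reply
        except socket.timeout:
            # Leaving the with-block closes the socket and frees its port.
            return None


def demo(delay_s, timeout_s):
    """Run one request against a local responder that waits `delay_s` first."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(
        target=slow_udp_responder, args=(server, delay_s), daemon=True
    )
    worker.start()
    try:
        return request_with_timeout(server.getsockname(), timeout_s)
    finally:
        worker.join()
        server.close()
```

With `demo(0.0, 1.0)` the reply arrives in time. With `demo(0.5, 0.1)` the requester gives up first, so the late reply is sent to a port that no longer has a listener.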
          NegwerIT Scott Driemeier-Showers made changes -
          Attachment New: 20221121_Observium_Poll_Failed.txt [ 20268 ]
          Attachment New: 20221121_Observium_Poll_Succeed.txt [ 20269 ]
          Attachment New: 20221121_STLVH004_Rcv_pcap.txt [ 20270 ]
          Attachment New: 20221121_STLVH004_Snd_pcap.txt [ 20271 ]
          landy Mike Stupalov made changes -
          Status Original: Pending Response [ 10000 ] New: In Review [ 10101 ]

          NegwerIT Scott Driemeier-Showers added a comment -

          Thank you.  I will continue to work this with VMware Support.

          --Scott
          landy Mike Stupalov made changes -
          Status Original: In Review [ 10101 ] New: Pending Response [ 10000 ]

          landy Mike Stupalov added a comment -

          OK, the trouble is mainly with the SNMP responses from the device.
          This is not related to Observium in any way.
          I'm not sure how I can help. I see you have already set the SNMP timeout to 2 sec.
          Try setting max repetitions for the device to 0 (this will disable snmpbulkwalk), but I'm not sure that this will help.
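To compare the two walk modes mentioned above by hand, the equivalent net-snmp commands can be built as follows. This is a sketch under assumptions, not Observium's implementation: the function name `walk_cmd` and its defaults are hypothetical, and the rule "0 means plain snmpwalk" simply mirrors the comment above.

```python
def walk_cmd(host, community, oid, max_repetitions=10, timeout_s=2, retries=1):
    """Build a net-snmp walk command for SNMP v2c.

    A positive max_repetitions selects snmpbulkwalk (GETBULK requests);
    0 falls back to snmpwalk (one GETNEXT request per object).
    """
    common = ["-v2c", "-c", community, "-t", str(timeout_s), "-r", str(retries)]
    if max_repetitions > 0:
        # -Cr<NUM> sets the max-repetitions field of the GETBULK requests.
        return ["snmpbulkwalk", *common, f"-Cr{max_repetitions}", host, oid]
    return ["snmpwalk", *common, host, oid]
```

Running both variants against an affected host, with the same timeout, shows whether the failures depend on bulk requests or also occur with plain GETNEXT walks.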

          NegwerIT Scott Driemeier-Showers added a comment -

          Device event log screenshot attached.

          --Scott
          landy Mike Stupalov made changes -
          Status Original: Pending Response [ 10000 ] New: In Review [ 10101 ]
          NegwerIT Scott Driemeier-Showers made changes -
          Attachment New: 20221102_Observium_STLVH004_EventLog.png [ 20218 ]

          People

            landy Mike Stupalov
            NegwerIT Scott Driemeier-Showers