  1. Observium
  2. OBS-4079

Newer IOS-XR Reports Wild/Wrong dBm values

Details

    • Vendor Bug
    • Resolution: Fixed
    • Major
    • None
    • Professional Edition
    • Poller
    • None

    Description

      Hi!

      Following an IOS-XR upgrade to 6.4.2 on some of our 9000 series, we are now seeing wildly incorrect dBm readings. We also poll these from Cacti and have seen the same change. Having looked into it, the reason is that Cisco now includes an additional digit of accuracy, but since the SNMP value is an integer, it is essentially now a 10x larger value. The OID is in mW, so when you convert it you get rather large dBm values.

      The fix in Cacti was to amend our script to div 1000 instead of div 100, before applying the various log(10) maths to it to convert it to dBm. This is now working correctly, but the same fix is needed on Observium.

      So, to give a worked example, the OID element for Te0/0/0/0 is now as follows:

      iso.3.6.1.4.1.9.9.91.1.1.1.1.4.59395242 = INTEGER: 7773

      And the PHY interface itself shows:

      Rx Power: 0.77730 mW (-1.09411 dBm)

      Previously this would have given an integer value to 3 decimal places, so 777 instead.
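To make the arithmetic in this worked example concrete, here is a minimal Python sketch (not Observium or Cacti code; the function name `mw_to_dbm` is made up for illustration). It assumes the raw integer is a milliwatt reading with an implied number of decimal places, and shows how reading 7773 with 3 decimals instead of 4 inflates the result by 10 dB.

```python
import math

def mw_to_dbm(milliwatts: float) -> float:
    """Convert optical power in milliwatts to dBm."""
    return 10 * math.log10(milliwatts)

raw = 7773  # entSensorValue from the walk above

# Interpreted with 3 decimal places (what the device advertises): 7.773 mW
wrong_mw = raw / 10**3
# Interpreted with 4 decimal places (what the CLI shows): 0.7773 mW
right_mw = raw / 10**4

print(f"{wrong_mw} mW -> {mw_to_dbm(wrong_mw):.5f} dBm")
print(f"{right_mw} mW -> {mw_to_dbm(right_mw):.5f} dBm")
```

The second interpretation lines up with the `Rx Power: 0.77730 mW (-1.09411 dBm)` shown on the PHY interface.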

      Interestingly, the full entity OID for reference is:

      iso.3.6.1.4.1.9.9.91.1.1.1.1.1.59395242 = INTEGER: 6
      iso.3.6.1.4.1.9.9.91.1.1.1.1.2.59395242 = INTEGER: 8
      iso.3.6.1.4.1.9.9.91.1.1.1.1.3.59395242 = INTEGER: 3
      iso.3.6.1.4.1.9.9.91.1.1.1.1.4.59395242 = INTEGER: 7773
      iso.3.6.1.4.1.9.9.91.1.1.1.1.5.59395242 = INTEGER: 1
      iso.3.6.1.4.1.9.9.91.1.1.1.1.6.59395242 = Timeticks: (0) 0:00:00.00
      iso.3.6.1.4.1.9.9.91.1.1.1.1.7.59395242 = INTEGER: 30
      iso.3.6.1.4.1.9.9.91.1.1.1.1.8.59395242 = INTEGER: 11201333

      Now I believe iso.3.6.1.4.1.9.9.91.1.1.1.1.3 should really be '4' in this case, as there are 4 decimal places, so that might be a bug or just my lack of understanding.
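For reference, the columns in that walk can be decoded roughly as follows. This is a sketch that assumes the standard sensor MIB semantics (type 6 = watts, scale 8 = milli, column 3 = number of decimal places); the table below is only a subset of the scale enumeration, and the helper names are illustrative rather than anything from Observium.

```python
import math

# Subset of the sensor data scale enumeration: enum value -> power of ten.
SCALE_EXPONENT = {6: -9, 7: -6, 8: -3, 9: 0, 10: 3}  # nano, micro, milli, units, kilo

def decode_sensor(raw: int, scale: int, precision: int) -> float:
    """Return the reading in base units (watts for a type 6 sensor)."""
    return raw * 10.0 ** (SCALE_EXPONENT[scale] - precision)

def watts_to_dbm(watts: float) -> float:
    """Convert watts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(watts * 1000)

# As reported by the device: scale 8 (milli), precision 3 -> 7.773 mW
reported = decode_sensor(7773, scale=8, precision=3)
# With the precision the CLI output implies: precision 4 -> 0.7773 mW
expected = decode_sensor(7773, scale=8, precision=4)

print(f"{watts_to_dbm(reported):.5f} dBm vs {watts_to_dbm(expected):.5f} dBm")
```

Under those assumptions, a precision of 4 is the value that reproduces the reading shown on the interface.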

      If this is indeed an edge case specific to this IOS version, then please can I ask that Cisco IOS-XR 6.4.2 is matched during polling/discovery and this fix applied purely to this release number? We now have a number of units running this version (and will for some time), and as such a huge number of dBm levels that are reporting and alerting incorrectly.

      Do let me know what else may be required to assist with this.

      All the best!

      Attachments

        1. Screenshot 2022-04-12 143455.png (55 kB)
        2. Screenshot 2022-04-11 093544.png (49 kB)
        3. screenshot-1.png (95 kB)
        4. poller.txt (4.90 MB)
        5. image-2022-04-12-11-49-05-358.png (40 kB)
        6. discovery.txt (7.41 MB)

        Activity

          landy Mike Stupalov added a comment - Please try rediscovery with latest rolling r12306. https://docs.observium.org/updating/#switch-between-rolling-and-stable-trains

          landy Mike Stupalov added a comment - Hi Robert, sorry for the delay.
          We don't like doing this, but I can try to set an "individual" model/firmware hack.

          I will check your poller/discovery debug again, but if possible please make an snmpdump from this device (as Observium Bot requested in the first comment).

          robertw Robert Williams added a comment - Unfortunately not, the RSP-440 doesn't support the 64-bit releases you link to (this is the battle we have had with Cisco), trust me.

          The latest version we can possibly run is 6.4.2 (SP10); this is coming direct from TAC. We are running the later versions on other RSPs and it is definitely fixed on them, even with non-Cisco optics, but we simply cannot use those on the RSP-440 as it is 32-bit only.

          landy Mike Stupalov added a comment - And just to be sure, I see that the current MD release for your platform (ASR 9006) is 7.3.2:
          https://software.cisco.com/download/home/282423206/type/280805694/release/7.3.2

          robertw Robert Williams added a comment - Mmm... might not be the best example, but they do have a chunk of Cisco-branded ones in there and I've got zero correct readings on any of them at all.

          Either way, is it possible to get a flag/knob/switch etc. that we can use to override this Cisco-ness? It has broken about 200 optics' worth of dBm readings and made them totally useless. We are also forced to run this version indefinitely now, which sucks quite a lot...

          landy Mike Stupalov added a comment - I think this is also a non-Cisco module, but with Cisco-compatible firmware.

          robertw Robert Williams added a comment - Hi - Thanks, although I have confirmed this affects Cisco and non-Cisco SFPs equally. I've uploaded a couple of screen grabs showing the value for a 10G Cisco SFP+ as an example.

          Since this is across the board on 6.4.2, is it possible to give us a variable/switch which we can use to manually override the milliwatts decimal accuracy value on a per-device basis?

          This way, it has no impact on anyone who is not using it and won't interfere with them. Those who need it can then activate it (and set it to 4).

          Your assistance is much appreciated!
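The kind of per-device switch being asked for could look roughly like the sketch below. Everything here is hypothetical: the setting name, the hostname, and the function are made up for illustration and are not an existing Observium option.

```python
from typing import Dict

# Hypothetical per-device override table: hostname -> decimal places to
# assume for milliwatt optical sensors. The hostname is invented.
PRECISION_OVERRIDE: Dict[str, int] = {"asr9k-example": 4}

def effective_precision(hostname: str, reported: int) -> int:
    """Use the override if one is set for this device, else the SNMP value."""
    return PRECISION_OVERRIDE.get(hostname, reported)

print(effective_precision("asr9k-example", 3))  # overridden device
print(effective_precision("some-other-router", 3))  # untouched device
```

Devices without an entry keep whatever precision they report, so nobody else is affected.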

          landy Mike Stupalov added a comment - This problem can only happen with non-Cisco SFP+ modules (and probably not with all of them).
          You probably need to "fix" the firmware on these module(s) or change the module (if that is possible).

          Unfortunately it is impossible to make an exception for such a sensor; there are too many possible parameters on which the sensor depends.

          robertw Robert Williams added a comment - Sorry, to confirm the 'what' element: as far as I can see, it impacts sensors which are in milliwatts. The sensors for RPM, voltage, temperature and watts (PSUs) are all correct. It's purely the milliwatt sensors for the optics which are broken.

          robertw Robert Williams added a comment - Hi - unfortunately this is the only (final) supported/recommended version for the RSP-440 series in all A9K chassis. So it's going to stay broken long-term on this platform until these older RSPs are retired and replaced. As such, it will affect anyone running the final supported software on that platform. Very annoying...!

          My understanding is that it is fixed in releases beyond 6.4.2 (i.e. for the RSP-880 onwards), and I can confirm that it is definitely correct in 6.7.3, as I have boxes running that release.

          I would therefore recommend matching 'only' 6.4.2 as in our detection example here, and applying the edge-case code to that version in isolation.

          It would be very much appreciated if this could be achieved, as we have been forced by Cisco to upgrade in order to maintain support on some of these older deployments and so are now unable to go any further in terms of upgrades to fix this. To be honest, it's caused more issues than it has fixed, but we are powerless to avoid it at this stage...

          Cheers!!
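A version-gated workaround along the lines suggested here might be sketched as follows. This is an assumption-laden illustration, not Observium's actual fix: the `"iosxr"` OS label, the function names, and the choice to add one decimal place are all hypothetical, and the version string is assumed to start with a dotted `major.minor.patch` number.

```python
import re
from typing import Optional, Tuple

WATTS = 6  # sensor type seen in the walk
MILLI = 8  # sensor scale seen in the walk

def parse_version(version: str) -> Optional[Tuple[int, ...]]:
    """Parse a leading 'major.minor.patch' into a tuple, or None if absent."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    return tuple(int(part) for part in match.groups()) if match else None

def corrected_precision(os_name: str, version: str,
                        sensor_type: int, scale: int, precision: int) -> int:
    """Add one decimal place only for milliwatt sensors on IOS-XR 6.4.2."""
    affected = (
        os_name == "iosxr"                      # assumed OS label
        and parse_version(version) == (6, 4, 2)
        and sensor_type == WATTS
        and scale == MILLI
    )
    return precision + 1 if affected else precision

print(corrected_precision("iosxr", "6.4.2", WATTS, MILLI, 3))
print(corrected_precision("iosxr", "6.7.3", WATTS, MILLI, 3))
```

Comparing parsed tuples rather than raw strings avoids accidentally matching something like 6.4.20, and the type and scale checks leave the RPM, voltage and temperature sensors alone.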

          People

            landy Mike Stupalov
            robertw Robert Williams
            Votes: 0
            Watchers: 4
