OBS-3260: hardware requirements for around 3K devices

Details

    • Type: Bug
    • Resolution: Not A Bug
    • Priority: Trivial
    • None
    • Community Edition
    • Documentation
    • None
    • Environment: virtual machine with 12 cores (Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz) and 16 GB of RAM running CentOS 7

    Description

      During a trial, we installed the system in a virtual machine with 12 cores (Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz) and 16 GB of RAM.

      At the moment we have added the following inventory:

      The CPU usage of the virtual machine is at 100%, so the graphs are showing lots of gaps.

      Is there any recommended hardware to use?


        Activity

          Adam Armstrong added a comment:

          Your deployment is very large, and this system is sort of old and not that fast.

          The E5-2620 has a cpumark of ~5273 and a single-thread score of 1109. The Ryzen 7 5800X in my desktop has a cpumark of 28381 and a single-thread score of 3487.

          Observium scales in two ways. First, with the aggregate amount of CPU power available, by running multiple poller processes via the poller-wrapper thread count (2x cores is recommended, up to perhaps 3x as a maximum). Second, MySQL queries are single-threaded, so they rely on the single-thread performance of the CPU. The E5-2620 has pretty poor single-thread performance, so every individual MySQL query will be quite slow.
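          As a rough sketch of where the thread count is set (the install path and wrapper name below are assumptions and vary by version and packaging), the poller-wrapper thread count is normally the numeric argument in the Observium cron entry:

              # /etc/cron.d/observium (example; assumes a default install in /opt/observium)
              # 24 threads on a 12-core machine, i.e. 2x cores
              */5 * * * * root /opt/observium/poller-wrapper.py 24 >> /dev/null 2>&1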

          You could try moving the MySQL process to a system with faster individual cores, but ideally you should just find a more modern system. Your deployment is quite large and would be difficult to scale even on modern hardware.
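          As a minimal sketch, pointing Observium at a database running on a separate host only needs the database settings in config.php (the hostname and credentials below are placeholders):

              // config.php: use a MySQL/MariaDB server on a separate host
              // with better single-thread performance
              $config['db_host'] = 'db01.example.com';
              $config['db_user'] = 'observium';
              $config['db_pass'] = 'changeme';
              $config['db_name'] = 'observium';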

          You should keep increasing the poller threads for as long as the poller time keeps decreasing, until it gets down to about 300 seconds. If it stops decreasing before you can get it to 300 seconds, you need more raw CPU performance to poll everything.

          You might also have an I/O performance bottleneck. Observium is quite I/O intensive, and rrdcached helps a bit, but not as much as one might expect.
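          As a sketch, assuming the RRDs live in /opt/observium/rrd and a local UNIX socket (both assumptions; adjust to your install), rrdcached runs as a daemon and Observium is pointed at its socket in config.php:

              # run rrdcached so RRD updates are batched in memory and flushed periodically
              # (-l listen socket, -b base directory, -B restrict writes to it, -w write timeout, -z write jitter)
              rrdcached -l unix:/var/run/rrdcached.sock -b /opt/observium/rrd -B -w 1800 -z 900 -p /var/run/rrdcached.pid

              // config.php: send RRD updates through rrdcached
              $config['rrdcached'] = "unix:/var/run/rrdcached.sock";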

          BTW, this sort of thing should be addressed to the mailing list rather than Jira; Jira is for bugs.

          adam.

          Denis Klimek added a comment:

          You can also try disabling poller modules for specific devices if you don't need or want them, to decrease polling time. Check the poller stats at https://observium/pollerlog/view=devices/ and try to locate slow devices or slow modules on a specific device.
          For example, we've disabled ARP/FDB on all devices.
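          As a sketch (module names taken from the ARP/FDB example above; check the defaults shipped with your version), modules can be disabled globally in config.php, while per-device toggles are in each device's settings in the web UI:

              // config.php: stop polling ARP and FDB tables on all devices
              $config['poller_modules']['arp-table'] = FALSE;
              $config['poller_modules']['fdb-table'] = FALSE;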


          Denis Klimek added a comment:

          Try more threads... I think we have 32 threads running and the target time is below 300s. Based on a simple calculation: if you need 1500s with 7 threads, you will need 7 threads * 5 = 35 threads to get down to 300s.


          Samuel Fernandez Marroquín added a comment:

          Thanks for your response.

          I have configured the poller threads to 7, without significant performance change.

          The poller time is high.

          We tried to configure rrdcached but could not; it shows errors when we try to start it.

          Denis Klimek added a comment (edited):

          You have to adjust the poller threads to avoid problems like yours. If you already have enough threads, try increasing the number of vCPUs in your VM. Using rrdcached can also improve the performance of your setup.

          Increase the pollers and check https://yourip/pollerlog/ - the total time must be lower than 300s.


          People

            Assignee: Tom Laermans
            Reporter: Samuel Fernandez Marroquín
            Votes: 0
            Watchers: 3
