
Ability to send poller stats to InfluxDB

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major

    Description

      It would be helpful to be able to send device polling telemetry to InfluxDB.

      At present I have hacked poller-wrapper.py to make a number of calls to InfluxDB via os.system, sending per-device metrics and global poller-wrapper run stats, and have built a Grafana dashboard on top of that data to provide a live overview of, and alerting on, poll timers across my Observium nodes.

          Activity

            [OBS-2336] Ability to send poller stats to InfluxDB

            jyates Jason Yates added a comment:

            I'm polling over 800 devices and it's writing on average 200 metrics/s during the polling cycles with 128 threads.

            adama Adam Armstrong added a comment:

            Your install must be small! It totally destroyed my dev server to the point that it had to be rebooted.

            It's probably preferable to use an external queuing system, so that we don't have to deal with it ourselves.

            A major blocker to a correct implementation of this is being able to pass the right information to the rrdtool functions so that metrics sent to influx are tagged correctly.

            jyates Jason Yates added a comment:

            I've been using this patch for a few weeks now with no real issues. It's adding up to 1 second to the polling time of each device.

            To my knowledge, the patch would have to handle queuing the inserts and then send them in chunks at the end of the polling cycle.

            The best way I can think of doing this is having the influxdb_update function add the update to a global array, and then having the poller process perform the bulk update via UDP using stream_socket_client.

            Thoughts?
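            A minimal sketch of the queue-and-flush idea described in the comment above, in the spirit of Observium's PHP poller code. The influxdb_update()/influxdb_flush() names, the $GLOBALS['influx_queue'] buffer and the UDP listener address are assumptions for illustration, not existing Observium functions:

                // Sketch only: queue one line-protocol point per update, flush once per cycle.
                function influxdb_update($measurement, $tags, $fields)
                {
                  $tag_str = '';
                  foreach ($tags as $k => $v)   { $tag_str .= ',' . $k . '=' . $v; }

                  $field_arr = array();
                  foreach ($fields as $k => $v) { $field_arr[] = $k . '=' . $v; }

                  // One InfluxDB line-protocol point: measurement,tag=val field=val
                  $GLOBALS['influx_queue'][] = $measurement . $tag_str . ' ' . implode(',', $field_arr);
                }

                // Called once at the end of the device's polling run.
                function influxdb_flush()
                {
                  if (empty($GLOBALS['influx_queue'])) { return; }

                  // udp://127.0.0.1:8089 is an assumed address for InfluxDB's UDP listener.
                  $socket = stream_socket_client('udp://127.0.0.1:8089', $errno, $errstr, 2);
                  if ($socket === FALSE) { return; }

                  // Send in chunks so each datagram stays reasonably small.
                  foreach (array_chunk($GLOBALS['influx_queue'], 25) as $chunk)
                  {
                    fwrite($socket, implode("\n", $chunk) . "\n");
                  }

                  fclose($socket);
                  $GLOBALS['influx_queue'] = array();
                }

            Chunking matters here because a single UDP datagram larger than the path MTU can be fragmented or silently dropped.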

            adama Adam Armstrong added a comment:

            Yes, I also fucked my server by using your patch!
            adama Adam Armstrong added a comment (edited):

            It seems that using https://github.com/influxdata/influxdb/tree/master/services/udp would be better?

            It queues up inserts to be sent to the server in chunks, which seems to make more sense than trying to control that ourselves.
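            For reference, the UDP input linked above is enabled in the InfluxDB server configuration; a minimal sketch, assuming InfluxDB 1.x, a database named "observium", and illustrative batch settings (none of these values come from this issue):

                [[udp]]
                  enabled = true
                  bind-address = ":8089"
                  database = "observium"
                  batch-size = 5000       # points buffered before a batch write
                  batch-timeout = "1s"    # flush even if the batch isn't full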

            fenestro Bill Fenner added a comment:

            I added:

                CURLOPT_TIMEOUT => 2,       # don't let hung influx make us wait forever.

            to the curl options, to try to avoid this kind of hang.
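            For illustration, this is roughly where such a timeout would sit if the HTTP write were done with PHP's cURL API; the endpoint, database name and influxdb_write_http() wrapper below are assumptions, not code from the attached diff:

                // Sketch only: HTTP write with hard timeouts so a hung InfluxDB
                // cannot stall the poller indefinitely.
                function influxdb_write_http($lines)
                {
                  $ch = curl_init('http://127.0.0.1:8086/write?db=observium');   // assumed endpoint
                  curl_setopt_array($ch, array(
                    CURLOPT_POST           => TRUE,
                    CURLOPT_POSTFIELDS     => implode("\n", $lines),
                    CURLOPT_RETURNTRANSFER => TRUE,
                    CURLOPT_CONNECTTIMEOUT => 2,  // give up quickly if influx is unreachable
                    CURLOPT_TIMEOUT        => 2,  // don't let hung influx make us wait forever
                  ));
                  $result = curl_exec($ch);
                  curl_close($ch);
                  return ($result !== FALSE);
                }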

            fenestro Bill Fenner added a comment:

            A caution if you're going to try my diff: our influx server hung and observium went insane, hanging forever trying to push the data. I'll investigate timeouts in the curl API.

            fenestro Bill Fenner added a comment:

            I wrote some code based on Adam's idea.

            Limitations:

            • If the poller uses rrdtool_update() (not _ng), the values passed to influx have no tags. Solution: change the poller to use rrdtool_update_ng().
            • The index values that make sense to Observium may not make sense to influx users (e.g., you may want to use a port's ifName instead of its ifIndex). Solution: add another index argument to rrdtool_update_ng() that is just passed through to influx?
            • We make one request to influx per point, instead of queueing them up. It should be fairly straightforward to create the queue, but in my tests there is only about a 3% overhead to this, so it may be OK for now.

            Attachment: influx-updates.diff
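            To make the tagging limitation concrete: in InfluxDB's line protocol, tags sit between the measurement name and the fields, so an untagged point carries nothing to filter or group by. The measurement, tag and field names below are hypothetical, for illustration only:

                # rrdtool_update() path: values only, no tags
                port value=123456789

                # rrdtool_update_ng() path, with a passed-through ifName tag
                port,hostname=router1,ifName=ge-0/0/0 value=123456789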

            adama Adam Armstrong added a comment:

            There is existing code to send metrics to statsd, but it's added per-module. It's possible that this could be done using the rrd functions instead.

            People

              Assignee: adama Adam Armstrong
              Reporter: jyates Jason Yates
              Votes: 4
              Watchers: 7
