Uploaded image for project: 'Observium'
  1. Observium
  2. OBS-5211

`poller-wrapper.py` distributed-poller SQL uses `@rownum := @rownum + 1` — non-deterministic on MariaDB 10.3+, causes random device reshuffling between cron ticks and NaN gaps in RRDs

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Enterprise Edition
    • Poller

    Description

      I used AI to generate this ticket so if you need more information from me, let me know.

      TL;DR

      Observium's distributed-poller wrapper splits devices across multiple poller
      hosts using a clever-but-fragile SQL trick that is only guaranteed to work
      on MySQL by accident. On MariaDB, the same trick gives different answers on
      different runs. The real-world consequence:

      1. Each cron tick, every poller host asks the database "which devices are
        mine to poll?"
      2.  On MariaDB, the answer randomly changes between runs.
      3. Each device ends up polled on a random schedule instead of every 5 minutes.
      4. When the gap exceeds 10 minutes, RRDtool records `NaN` (no value).
      5. Graphs in the web UI show holes.

      Summary

      When using distributed pollers (`observium-wrapper poller -i N -n i` with
      `N >= 2`) against a MariaDB 10.3+ master database, devices get randomly
      reshuffled across poller instances on every cron tick. Many devices end up
      polled at intervals exceeding the RRD `minimal_heartbeat` (default 600s),
      causing `-nan` rows in the resulting RRDs. The same configuration on MySQL
      8/9 happens to work because of an optimizer detail that masks the underlying
      SQL bug; on MariaDB it is exposed.

      Symptom

      `rrdtool fetch` on a recent RRD shows alternating real values and `-nan` rows
      on a regular cadence:

      1777343700: 1.7722e+08 1.7764e+08 ...
      1777344000: -nan -nan -nan ...
      1777344300: -nan -nan -nan ...
      1777344600: -nan -nan -nan ...
      1777344900: 1.7732e+08 1.7748e+08 ...
      1777345200: -nan -nan -nan ...

      In the `devices` table, `last_polled` for individual devices drifts to 15-45
      minutes ago even though `observium-wrapper poller` reports "processed N
      devices in X seconds" every cron tick on every distributed poller. Each cron
      tick the wrapper processes a different random subset of the devices it
      should own.

      Root cause

      `poller-wrapper.py` (around lines 793-815 in CE 26.1) uses this SQL to
      compute the per-instance shard:

      SELECT `device_id` FROM (SELECT @rownum := 0) r,
      (SELECT @rownum := @rownum + 1 AS rownum, device_id
      FROM `devices`
      WHERE ...
      ORDER BY `device_id` ASC) temp
      WHERE MOD(temp.rownum, %s) = %s

      The MySQL/MariaDB manuals both state that "the order of evaluation for
      expressions involving user variables is undefined" (see Sources, claims 1-2).
      On MariaDB the optimizer typically chooses a secondary index (e.g.
      `disabled_lastpolled`) for this query and adds `Using temporary;
      Using filesort` to satisfy the inner `ORDER BY device_id`. The
      `@rownum := @rownum + 1` assignment runs during the secondary-index scan
      {}before{} the filesort, so the rownum baked into each row reflects
      secondary-index scan order, not sorted-device_id position. Combined with
      InnoDB stats auto-recalc, the chosen plan can shift between executions,
      producing different `MOD(rownum, N) = i` row sets across cron ticks.

      `EXPLAIN` on MariaDB 10.6:

      key: disabled_lastpolled
      type: range
      Extra: Using index condition; Using where; Using temporary; Using filesort

      `EXPLAIN` on MySQL 8

      key: PRIMARY
      type: index
      Extra: (none)

      MySQL's optimizer is generally more biased than MariaDB's toward picking an
      index that already satisfies `ORDER BY` so a filesort can be skipped.
      `prefer_ordering_index=on` is one example of this bias (specifically scoped
      in the docs to queries with `LIMIT`, which our query does not have); a
      broader pattern of similar preferences leads MySQL to typically pick the
      PRIMARY index for our query and MariaDB to typically pick a secondary index.
      With the PRIMARY scan, rows emerge already in `device_id` order, no filesort
      is needed, and rownum stays correlated with the final result order. This is
      what masks the underlying user-variable-evaluation issue on MySQL. The SQL
      is still technically undefined per the manual; the deprecation label has
      been on it since MySQL 8.0 (see Sources, claim 2).

      Reproduction

      On any MariaDB 10.3+ Observium master with distributed polling enabled and
      at least a few hundred devices:

      Run the wrapper's exact shard query 20 times, hash the result each time.

      for i in $(seq 1 20); do
      mysql observium -N -e "
      SELECT \`device_id\` FROM (SELECT @rownum := 0) r,
      (SELECT @rownum := @rownum + 1 AS rownum, device_id
      FROM \`devices\`
      WHERE disabled != 1 AND poller_id = 1
      ORDER BY \`device_id\` ASC) temp
      WHERE MOD(temp.rownum, 4) = 0
      ORDER BY device_id;" \
      |md5sum
      done
      

      Observed on MariaDB 10.6.5 with 313 active devices (`-i 4 -n 0`):

      • distinct MD5 hashes across 20 runs{} — the shard membership is
        changing between executions.

      Observed on MySQL 8.0.41 with the same dataset:

      • MD5 hash across 20 runs{} — stable. (The resulting partition is
        technically not what the code's surface form claims — `@rownum` reflects
        PK scan position, not sorted-device_id rank — but it is at least a stable
        bijection on the active-device set, which is what the wrapper depends on.)

      Proposed fix

      Replace both the `bills` and `devices` distributed-shard subqueries with the
      SQL-standard `ROW_NUMBER() OVER (ORDER BY id ASC)` window function.
      `ROW_NUMBER()` is supported on MariaDB 10.2+ and MySQL 8+, both of which
      Observium already requires (see Sources, claims 4-5).

      — a/poller-wrapper.py
      +++ b/poller-wrapper.py
      @@ -793,21 +793,17 @@
       # Use distributed wrapper
      if process == 'billing':
       # billing
       
       - query = """SELECT `bill_id` FROM (SELECT @rownum :=0) r,
       - (
       - SELECT @rownum := @rownum +1 AS rownum, bill_id
      + query = """SELECT `bill_id` FROM (
      + SELECT ROW_NUMBER() OVER (ORDER BY `bill_id` ASC) AS rownum, `bill_id`
      FROM `bills`
       - ORDER BY `bill_id` ASC
      ) temp
      WHERE MOD(temp.rownum, %s) = %s""" % (instances_count, instance_number)
      else:
       
       # poller or discovery
       
       - query = """SELECT `device_id` FROM (SELECT @rownum :=0) r,
       - (
       - SELECT @rownum := @rownum +1 AS rownum, device_id
      + query = """SELECT `device_id` FROM (
      + SELECT ROW_NUMBER() OVER (ORDER BY `device_id` ASC) AS rownum, `device_id`
      FROM `devices`
      %s
       - ORDER BY `device_id` ASC
      ) temp
      WHERE MOD(temp.rownum, %s) = %s""" % (where_devices, instances_count, instance_number)
      

      After the fix, the query is:

      • Standards-conformant - no reliance on undefined user-variable evaluation
        order).
      • Deterministic across runs and engines.
      • Forward-compatible with MySQL 9.x, where the `@var := ...` inline
        assignment is deprecated (see Sources, claim 2).

      Possibly related public ticket

      OBS-5164 ("Issue with multiple pollers alerts while they can reach
      Observium server and the polled device", filed 2026-01-26, currently Pending
      Response):

      • Observium Enterprise 24.8.13606
      • MariaDB 10.11.6
      • 9 distributed pollers
      • Symptom: random `device_down` alerts with no underlying network or device
        problem
      • Reporter blamed DB connection timeouts but couldn't pin down a real cause

      The symptoms are consistent with this bug. When the shard membership
      reshuffles each cron tick, devices that "should have been" polled by
      instance N this tick get skipped; an alert checker then sees stale
      `last_polled` and fires `device_down` until the next reshuffle happens to
      include that device. Worth investigating whether OBS-5164 closes once this
      fix is in.

      Notes

      • The bug is silent: `observium-wrapper poller` reports "processed N devices
        in X seconds" successfully every tick, and the wrapper's own RRDs
        (`poller-wrapper.rrd`) look healthy. The corruption is downstream in the
        per-device RRDs.
      • It scales with deployment size: a deployment small enough that MariaDB
        picks the PRIMARY index for the shard query may not see the bug. The
        threshold is determined by the optimizer's selectivity estimate of the
        `disabled` column versus PK scan cost.
      • The wrapper's `billing` shard query has the same bug. The patch above
        covers both.
      1.  

      Sources

      Claim 1 — User-variable evaluation order is officially undefined

      • MySQL 9.6 Reference Manual, §11.4 "User-Defined Variables"
        https://dev.mysql.com/doc/refman/9.6/en/user-variables.html>
        > "The order of evaluation for expressions involving user variables is
        > undefined. For example, there is no guarantee that
        > `SELECT @a, @a:=@a+1` evaluates `@a` first and then performs the
        > assignment."
      • MariaDB Knowledge Base, "User-Defined Variables"
        https://mariadb.com/kb/en/user-defined-variables/>
        > "It is unsafe to read a user-defined variable and set its value in the
        > same statement (unless the command is `SET`), because the order of
        > these actions is undefined."

      This is the engine-agnostic foundation: both vendors document that the
      `@rownum := @rownum + 1` pattern has no guaranteed evaluation order. A
      working result on any given engine/version is incidental, not contractual.

      Claim 2 — The inline `@var := ...` assignment style is deprecated in MySQL

      Cited in: "Notes on engine portability" / "Proposed fix" sections, where the
      ticket says the `@rownum := ...` pattern is "deprecated since MySQL 8.0 and
      subject to removal in a future release."

      • MySQL 8.0 Reference Manual, §11.4 "User-Defined Variables"
        https://dev.mysql.com/doc/refman/8.0/en/user-variables.html
        > "Previous releases of MySQL made it possible to assign a value to a user
        > variable in statements other than `SET`. **This functionality is
        > supported in MySQL 8.0 for backward compatibility but is subject to
        > removal in a future release of MySQL.**"

      Claim 3 — `prefer_ordering_index` exists only on MySQL, defaults `on`, and is part of why MySQL tends to mask the bug

      Cited in: "Root cause" section, where the ticket says MySQL's optimizer is
      biased toward picking an index that already satisfies `ORDER BY`.

      • MySQL 8.0 Reference Manual, §10.9.2 "Switchable Optimizations"
        <https://dev.mysql.com/doc/refman/8.0/en/switchable-optimizations.html>
        > "`prefer_ordering_index` (default `on`) — Controls whether, in the case
        > of a query having an `ORDER BY` or `GROUP BY` with a `LIMIT` clause,
        > the optimizer tries to use an ordered index instead of an unordered
        > index, a filesort, or some other optimization. This optimization is
        > performed by default whenever the optimizer determines that using it
        > would allow for faster execution of the query. [...] Prior to MySQL
        > 8.0.21, it was not possible to disable this optimization, but in
        > MySQL 8.0.21 and later, while it remains the default behavior, it can
        > be disabled by setting the `prefer_ordering_index` flag to `off`."

      Important precision. The flag's documented scope is `ORDER BY` /
      `GROUP BY` with a `LIMIT`. Our buggy query does not have `LIMIT`, so the
      flag is not strictly the cause of MySQL picking the PRIMARY index in our
      case. The MySQL optimizer separately also prefers an index that already
      satisfies `ORDER BY` (so it can avoid a filesort). The combined practical
      effect is that MySQL tends to pick the PRIMARY index for our query and
      MariaDB tends not to. The MariaDB optimizer has no comparable "prefer
      ordering index" toggle.

      Claim 4 — `ROW_NUMBER()` is supported on MariaDB 10.2+ and MySQL 8.0+

      Cited in: "Proposed fix" section.

      Claim 5 — Observium's own minimum versions already require ROW_NUMBER()-capable engines

      Cited in: "Proposed fix" section, where the ticket says ROW_NUMBER() is
      within Observium's already-required engine versions.

      MariaDB 10.2.7 ≥ 10.2 → has `ROW_NUMBER()`.
      MySQL 5.7.8 lacks `ROW_NUMBER()` (window functions were added in 8.0). So
      there is a corner case: a Community Edition deployment running
      Observium-minimum MySQL 5.7.8 would not have `ROW_NUMBER()` available.
      However, Distributed Polling — the only feature that exercises this query
      path — is Enterprise-only, and Enterprise installs are virtually always on
      recent MySQL 8.x or MariaDB 10.5+.

      Claim 76— DBA community has independently called the rownum trick out as undefined

      Useful as third-party (non-vendor) corroboration:

      • **MySQL official blog archive, "Row numbering, ranking: how to use less
        user variables in MySQL queries"**
        <https://dev.mysql.com/blog-archive/row-numbering-ranking-how-to-use-less-user-variables-in-mysql-queries/>
        > Discusses why the `@rownum` trick happens to work on a single-table
        > query when the optimizer pre-orders rows: "MySQL sees that there is
        > only one table and that the ordering expression is a column of this
        > table, so it makes an optimization: it first orders the rows of people,
        > then reads this ordered result and, for each row, calculates selected
        > expressions (including `@rownum`). So `@rownum` increases in order, as
        > desired." The blog post is explicit that this is an *optimization
        > side-effect*, not a guarantee.

      Attachments

        Issue Links

          Activity

            People

              landy Mike Stupalov
              konri Lance Wang
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: