* Monitoring btrfs with Prometheus (and soon OpenMonitoring)
@ 2018-10-07 13:37 Holger Hoffstätte
  2018-10-08 12:29 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 2+ messages in thread
From: Holger Hoffstätte @ 2018-10-07 13:37 UTC (permalink / raw)
  To: linux-btrfs


The Prometheus statistics collection/aggregation/monitoring/alerting system
[1] is quite popular, easy to use and will probably be the basis for the
upcoming OpenMetrics "standard" [2].

Prometheus collects metrics by polling host-local "exporters" that respond
to http requests; many such exporters exist, from the generic node_exporter
for OS metrics to all sorts of application-/service-specific varieties.
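
The response format those http requests return is deliberately simple: plain
text, one sample per line, optionally preceded by HELP/TYPE comments. A
hypothetical btrfs metric (name and label invented purely for illustration)
would look like:

```
# HELP btrfs_allocation_data_bytes_used Bytes used in data block groups
# TYPE btrfs_allocation_data_bytes_used gauge
btrfs_allocation_data_bytes_used{uuid="example"} 104857600
```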

Since btrfs already exposes quite a lot of monitorable and - more
importantly - actionable runtime information in sysfs, it only makes sense
to expose these metrics for visualization & alerting. I noodled over the
idea some time ago but got sidetracked, besides not being thrilled at all
by the idea of doing this in golang (which I *really* dislike).

However, exporters can be written in any language as long as they speak
the standard response protocol, so an alternative would be to use one
of the other official exporter clients. These provide language-native
"mini-frameworks" where one only has to fill in the blanks (see [3]
for examples).
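
To show how little the protocol actually demands, here's a dependency-free
sketch of such an exporter in plain stdlib python - the official client
library would normally handle the formatting, and the metric name/value here
are made up for illustration:

```python
# Minimal Prometheus-style exporter sketch (stdlib only; a real exporter
# would use the official client library and read live values from sysfs).
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(samples):
    # samples: dict mapping metric name -> (labels dict, value)
    lines = []
    for name, (labels, value) in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder sample; a btrfs_exporter would collect from sysfs here.
        samples = {"btrfs_demo_bytes_used": ({"uuid": "example"}, 42)}
        body = render_metrics(samples).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 9400 is arbitrary; pick any free port for the scrape target.
    HTTPServer(("", 9400), MetricsHandler).serve_forever()
```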

Since the issue just came up in the node_exporter bugtracker [4] I
figured I'd ask if anyone here is interested in helping build a proper
standalone btrfs_exporter in C++? :D

..just kidding, I'd probably use python (which I kind of don't really
know either :) and build on Hans' python-btrfs library for anything
not covered by sysfs.

Anybody interested in helping? Apparently there are also golang libs
for btrfs [5] but I don't know anything about them (if you do, please
comment on the bug), and the idea of adding even more stuff into the
monolithic, already creaky and somewhat bloated node_exporter is not
appealing to me.

A potential problem wrt. btrfs is access to root-only information,
such as the btrfs device stats/errors in the aforementioned bug,
since exporters are really supposed to run unprivileged due to network
exposure. The S.M.A.R.T. exporter [6] solves this with dual-process
contortions; obviously it would be better if all relevant metrics were
accessible directly in sysfs and not require privileged access, but
forking a tiny privileged process every polling interval is probably
not that bad.
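
As a sketch of that fork-a-helper approach (the sudo invocation and the
exact `btrfs device stats` output format are assumptions - verify against
your btrfs-progs version), the unprivileged exporter side could look like:

```python
# Sketch: unprivileged exporter forks a tiny privileged helper per scrape
# and parses its output. Output lines are assumed to look like
# '[/dev/sda].write_io_errs   0'; check your btrfs-progs version.
import re
import subprocess

STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

def parse_device_stats(text):
    """Turn '[/dev/sda].write_io_errs 0' lines into {dev: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats

def collect(mountpoint="/"):
    # The privileged part, forked once per polling interval via sudo -n
    # (assumes a matching NOPASSWD sudoers entry for the exporter user).
    out = subprocess.run(
        ["sudo", "-n", "btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True).stdout
    return parse_device_stats(out)
```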

All ideas welcome!

cheers,
Holger

[1] https://www.prometheus.io/
[2] https://openmetrics.io/
[3] https://github.com/prometheus/client_python,
     https://github.com/prometheus/client_ruby
[4] https://github.com/prometheus/node_exporter/issues/1100
[5] https://github.com/prometheus/node_exporter/issues/1100#issuecomment-427651028
[6] https://github.com/cloudandheat/prometheus_smart_exporter


* Re: Monitoring btrfs with Prometheus (and soon OpenMonitoring)
  2018-10-07 13:37 Monitoring btrfs with Prometheus (and soon OpenMonitoring) Holger Hoffstätte
@ 2018-10-08 12:29 ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 2+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-08 12:29 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-btrfs

On 2018-10-07 09:37, Holger Hoffstätte wrote:
> 
> The Prometheus statistics collection/aggregation/monitoring/alerting system
> [1] is quite popular, easy to use and will probably be the basis for the
> upcoming OpenMetrics "standard" [2].
> 
> Prometheus collects metrics by polling host-local "exporters" that respond
> to http requests; many such exporters exist, from the generic node_exporter
> for OS metrics to all sorts of application-/service-specific varieties.
> 
> Since btrfs already exposes quite a lot of monitorable and - more
> importantly - actionable runtime information in sysfs it only makes sense
> to expose these metrics for visualization & alerting. I noodled over the
> idea some time ago but got sidetracked, besides not being thrilled at all
> by the idea of doing this in golang (which I *really* dislike).
> 
> However, exporters can be written in any language as long as they speak
> the standard response protocol, so an alternative would be to use one
> of the other official exporter clients. These provide language-native
> "mini-frameworks" where one only has to fill in the blanks (see [3]
> for examples).
> 
> Since the issue just came up in the node_exporter bugtracker [4] I
> figured I'd ask if anyone here is interested in helping build a proper
> standalone btrfs_exporter in C++? :D
> 
> ..just kidding, I'd probably use python (which I kind of don't really
> know either :) and build on Hans' python-btrfs library for anything
> not covered by sysfs.
> 
> Anybody interested in helping? Apparently there are also golang libs
> for btrfs [5] but I don't know anything about them (if you do, please
> comment on the bug), and the idea of adding even more stuff into the
> monolithic, already creaky and somewhat bloated node_exporter is not
> appealing to me.
> 
> Potential problems wrt. btrfs are access to root-only information,
> like e.g. the btrfs device stats/errors in the aforementioned bug,
> since exporters are really supposed to run unprivileged due to network
> exposure. The S.M.A.R.T. exporter [6] solves this with dual-process
> contortions; obviously it would be better if all relevant metrics were
> accessible directly in sysfs and not require privileged access, but
> forking a tiny privileged process every polling interval is probably
> not that bad.
> 
> All ideas welcome!
You might be interested in what Netdata [1] is doing.  We've already got 
tracking of space allocations via the sysfs interface (fun fact, you 
actually don't have to be root on most systems to read that data), and 
also ship some pre-defined alarms that trigger when the volume gets 
close to full at a low level (more specifically, if total chunk 
allocations exceed 90% of the total space of all the devices in the volume).
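
That 90% check can be reproduced with a few sysfs reads; the
allocation/<type>/disk_total file names below match what I see on recent
kernels but should be treated as assumptions, as should the idea that the
caller supplies the summed device size:

```python
# Sketch: compute the fraction of total device space already allocated to
# chunks, from <fs_sysfs_dir>/allocation/{data,metadata,system}/disk_total
# (fs_sysfs_dir being e.g. /sys/fs/btrfs/<UUID>). The sysfs layout is an
# assumption; check it against your kernel version.
import os

def chunk_allocation_ratio(fs_sysfs_dir, total_device_bytes):
    """Return allocated-chunk bytes / total device bytes for one filesystem."""
    allocated = 0
    for chunk_type in ("data", "metadata", "system"):
        path = os.path.join(fs_sysfs_dir, "allocation", chunk_type, "disk_total")
        try:
            with open(path) as f:
                allocated += int(f.read())
        except FileNotFoundError:
            pass  # tolerate chunk types this kernel doesn't expose
    return allocated / total_device_bytes

def allocation_alarm(fs_sysfs_dir, total_device_bytes, threshold=0.9):
    # Mirrors the described alarm: fire once chunk allocation exceeds 90%.
    return chunk_allocation_ratio(fs_sysfs_dir, total_device_bytes) > threshold
```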

Actual data collection is being done in C (Netdata already has a lot of 
infrastructure for parsing things out of /proc or /sys), and there has 
been some discussion in the past of adding collection of device error 
counters (I've been working on and off on it myself, but I still don't 
have a good enough understanding of the C code to get anything actually 
working yet).

[1] https://my-netdata.io/

