* Monitoring btrfs with Prometheus (and soon OpenMonitoring)
@ 2018-10-07 13:37 Holger Hoffstätte
  2018-10-08 12:29 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 2+ messages in thread
From: Holger Hoffstätte @ 2018-10-07 13:37 UTC (permalink / raw)
  To: linux-btrfs


The Prometheus statistics collection/aggregation/monitoring/alerting system
[1] is quite popular, easy to use and will probably be the basis for the
upcoming OpenMetrics "standard" [2].

Prometheus collects metrics by polling host-local "exporters" that respond
to http requests; many such exporters exist, from the generic node_exporter
for OS metrics to all sorts of application-/service-specific varieties.
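
The response format those http requests return is deliberately simple: plain
text, one sample per line, optionally preceded by HELP/TYPE comments. A
hypothetical btrfs metric (name and label invented purely for illustration)
would look like:

```
# HELP btrfs_allocation_data_bytes_used Bytes used in data block groups
# TYPE btrfs_allocation_data_bytes_used gauge
btrfs_allocation_data_bytes_used{uuid="example"} 104857600
```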

Since btrfs already exposes quite a lot of monitorable and - more
importantly - actionable runtime information in sysfs, it only makes sense
to expose these metrics for visualization & alerting. I noodled over the
idea some time ago but got sidetracked, besides not being thrilled at all
by the idea of doing this in golang (which I *really* dislike).

However, exporters can be written in any language as long as they speak
the standard response protocol, so an alternative would be to use one
of the other official exporter clients. These provide language-native
"mini-frameworks" where one only has to fill in the blanks (see [3]
for examples).
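
To show how little the protocol actually demands, here's a dependency-free
sketch of such an exporter in plain stdlib python - the official client
library would normally handle the formatting, and the metric name/value here
are made up for illustration:

```python
# Minimal Prometheus-style exporter sketch (stdlib only; a real exporter
# would use the official client library and read live values from sysfs).
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(samples):
    # samples: dict mapping metric name -> (labels dict, value)
    lines = []
    for name, (labels, value) in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder sample; a btrfs_exporter would collect from sysfs here.
        samples = {"btrfs_demo_bytes_used": ({"uuid": "example"}, 42)}
        body = render_metrics(samples).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 9400 is arbitrary; pick any free port for the scrape target.
    HTTPServer(("", 9400), MetricsHandler).serve_forever()
```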

Since the issue just came up in the node_exporter bugtracker [4] I
figured I'd ask if anyone here is interested in helping build a proper
standalone btrfs_exporter in C++? :D

..just kidding, I'd probably use python (which I kind of don't really
know either :) and build on Hans' python-btrfs library for anything
not covered by sysfs.

Anybody interested in helping? Apparently there are also golang libs
for btrfs [5] but I don't know anything about them (if you do, please
comment on the bug), and the idea of adding even more stuff into the
monolithic, already creaky and somewhat bloated node_exporter is not
appealing to me.

A potential problem wrt. btrfs is access to root-only information,
such as the btrfs device stats/errors in the aforementioned bug,
since exporters are really supposed to run unprivileged due to network
exposure. The S.M.A.R.T. exporter [6] solves this with dual-process
contortions; obviously it would be better if all relevant metrics were
accessible directly in sysfs and not require privileged access, but
forking a tiny privileged process every polling interval is probably
not that bad.
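
As a sketch of that fork-a-helper approach (the sudo invocation and the
exact `btrfs device stats` output format are assumptions - verify against
your btrfs-progs version), the unprivileged exporter side could look like:

```python
# Sketch: unprivileged exporter forks a tiny privileged helper per scrape
# and parses its output. Output lines are assumed to look like
# '[/dev/sda].write_io_errs   0'; check your btrfs-progs version.
import re
import subprocess

STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

def parse_device_stats(text):
    """Turn '[/dev/sda].write_io_errs 0' lines into {dev: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats

def collect(mountpoint="/"):
    # The privileged part, forked once per polling interval via sudo -n
    # (assumes a matching NOPASSWD sudoers entry for the exporter user).
    out = subprocess.run(
        ["sudo", "-n", "btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True).stdout
    return parse_device_stats(out)
```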

All ideas welcome!

cheers,
Holger

[1] https://www.prometheus.io/
[2] https://openmetrics.io/
[3] https://github.com/prometheus/client_python,
     https://github.com/prometheus/client_ruby
[4] https://github.com/prometheus/node_exporter/issues/1100
[5] https://github.com/prometheus/node_exporter/issues/1100#issuecomment-427651028
[6] https://github.com/cloudandheat/prometheus_smart_exporter


* Re: Monitoring btrfs with Prometheus (and soon OpenMonitoring)
  2018-10-07 13:37 Monitoring btrfs with Prometheus (and soon OpenMonitoring) Holger Hoffstätte
@ 2018-10-08 12:29 ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 2+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-08 12:29 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-btrfs

On 2018-10-07 09:37, Holger Hoffstätte wrote:
> 
> The Prometheus statistics collection/aggregation/monitoring/alerting system
> [1] is quite popular, easy to use and will probably be the basis for the
> upcoming OpenMetrics "standard" [2].
> 
> Prometheus collects metrics by polling host-local "exporters" that respond
> to http requests; many such exporters exist, from the generic node_exporter
> for OS metrics to all sorts of application-/service-specific varieties.
> 
> Since btrfs already exposes quite a lot of monitorable and - more
> importantly - actionable runtime information in sysfs it only makes sense
> to expose these metrics for visualization & alerting. I noodled over the
> idea some time ago but got sidetracked, besides not being thrilled at all
> by the idea of doing this in golang (which I *really* dislike).
> 
> However, exporters can be written in any language as long as they speak
> the standard response protocol, so an alternative would be to use one
> of the other official exporter clients. These provide language-native
> "mini-frameworks" where one only has to fill in the blanks (see [3]
> for examples).
> 
> Since the issue just came up in the node_exporter bugtracker [4] I
> figured I'd ask if anyone here is interested in helping build a proper
> standalone btrfs_exporter in C++? :D
> 
> ..just kidding, I'd probably use python (which I kind of don't really
> know either :) and build on Hans' python-btrfs library for anything
> not covered by sysfs.
> 
> Anybody interested in helping? Apparently there are also golang libs
> for btrfs [5] but I don't know anything about them (if you do, please
> comment on the bug), and the idea of adding even more stuff into the
> monolithic, already creaky and somewhat bloated node_exporter is not
> appealing to me.
> 
> Potential problems wrt. btrfs are access to root-only information,
> like e.g. the btrfs device stats/errors in the aforementioned bug,
> since exporters are really supposed to run unprivileged due to network
> exposure. The S.M.A.R.T. exporter [6] solves this with dual-process
> contortions; obviously it would be better if all relevant metrics were
> accessible directly in sysfs and not require privileged access, but
> forking a tiny privileged process every polling interval is probably
> not that bad.
> 
> All ideas welcome!
You might be interested in what Netdata [1] is doing.  We've already got 
tracking of space allocations via the sysfs interface (fun fact, you 
actually don't have to be root on most systems to read that data), and 
also ship some pre-defined alarms that trigger when the volume gets 
close to full at a low level (more specifically, if total chunk 
allocations exceed 90% of the total space of all the devices in the volume).
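
That 90% check can be reproduced with a few sysfs reads; the
allocation/<type>/disk_total file names below match what I see on recent
kernels but should be treated as assumptions, as should the idea that the
caller supplies the summed device size:

```python
# Sketch: compute the fraction of total device space already allocated to
# chunks, from <fs_sysfs_dir>/allocation/{data,metadata,system}/disk_total
# (fs_sysfs_dir being e.g. /sys/fs/btrfs/<UUID>). The sysfs layout is an
# assumption; check it against your kernel version.
import os

def chunk_allocation_ratio(fs_sysfs_dir, total_device_bytes):
    """Return allocated-chunk bytes / total device bytes for one filesystem."""
    allocated = 0
    for chunk_type in ("data", "metadata", "system"):
        path = os.path.join(fs_sysfs_dir, "allocation", chunk_type, "disk_total")
        try:
            with open(path) as f:
                allocated += int(f.read())
        except FileNotFoundError:
            pass  # tolerate chunk types this kernel doesn't expose
    return allocated / total_device_bytes

def allocation_alarm(fs_sysfs_dir, total_device_bytes, threshold=0.9):
    # Mirrors the described alarm: fire once chunk allocation exceeds 90%.
    return chunk_allocation_ratio(fs_sysfs_dir, total_device_bytes) > threshold
```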

Actual data collection is being done in C (Netdata already has a lot of 
infrastructure for parsing things out of /proc or /sys), and there has 
been some discussion in the past of adding collection of device error 
counters (I've been working on and off on it myself, but I still don't 
have a good enough understanding of the C code to get anything actually 
working yet).

[1] https://my-netdata.io/

