* Monitoring ceph and prometheus
From: Jan Fajerski @ 2017-05-11 11:52 UTC
  To: ceph-devel

Hi list,
I recently looked into Ceph monitoring with prometheus. There is already a ceph
exporter for this purpose: https://github.com/digitalocean/ceph_exporter.

Prometheus encourages software projects to instrument their code directly and 
expose this data, instead of using an external piece of code. Several libraries 
are provided for this purpose: 
https://prometheus.io/docs/instrumenting/clientlibs/

I think there are arguments for adding this instrumentation to Ceph directly.
Generally speaking, it should reduce overall complexity in the code (no extra
exporter component outside of ceph) and in operations (no extra package or
configuration).

The direct instrumentation could happen in two places:
1)
Directly in Ceph's C++ code using https://github.com/jupp0r/prometheus-cpp.  This
would mean daemons expose their metrics directly via the prometheus http
interface. This would be the most direct way of exposing metrics; prometheus
would simply poll all endpoints. Service discovery for scrape targets, say added
or removed OSDs, would however have to be handled somewhere. Orchestration tools
à la k8s, ansible, salt, ... either have this feature already or could add it
easily. Deployments not using such a tool need another approach: Prometheus
offers various mechanisms
(https://prometheus.io/docs/operating/configuration/#%3Cscrape_config%3E), or a
ceph component (say mon or mgr) could handle this.

2)
Add a ceph-mgr plugin that exposes the metrics available to ceph-mgr as a
prometheus scrape target (using https://github.com/prometheus/client_python).
This would handle the service discovery issue for ceph daemons out of the box
(though not for the actual mgr daemon, which is the scrape target). The code
would also live in a central location instead of being scattered across several
places. It does, however, add a (maybe pointless) level of indirection
($ceph_daemon -> ceph-mgr -> prometheus) and requires two different scrape
intervals (assuming mgr polls metrics from daemons).
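
For illustration, a rough sketch of what such a module's core could look like.
The prometheus parts use the real client_python API (start_http_server, Gauge);
collect_ceph_perf_counters() and the port number are placeholders for whatever
interface ceph-mgr ends up offering, not actual Ceph code:

    from prometheus_client import Gauge, start_http_server
    import time

    gauges = {}

    def collect_ceph_perf_counters():
        # Placeholder: would return something like
        # {('osd', '0'): {'osd.op_r': 1234, 'osd.op_w': 567}, ...}
        return {}

    def refresh():
        for (svc_type, svc_id), counters in collect_ceph_perf_counters().items():
            for path, value in counters.items():
                name = 'ceph_' + path.replace('.', '_').replace('-', '_')
                if name not in gauges:
                    gauges[name] = Gauge(name, path, ['ceph_daemon'])
                gauges[name].labels(
                    ceph_daemon='%s.%s' % (svc_type, svc_id)).set(value)

    if __name__ == '__main__':
        start_http_server(9128)   # port chosen arbitrarily for the sketch
        while True:
            refresh()
            time.sleep(10)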

I'm aware of the current dashboard efforts based on ceph-mgr exported data. I'm 
sure the data export for the dashboard and prometheus could be unified at some 
point.

Best,
Jan

-- 
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


* Re: Monitoring ceph and prometheus
From: John Spray @ 2017-05-11 12:14 UTC
  To: Ceph Development

On Thu, May 11, 2017 at 12:52 PM, Jan Fajerski <jfajerski@suse.com> wrote:
> 2)
> Add a ceph-mgr plugin that exposes the metrics available to ceph-mgr as a
> prometheus scrape target (using
> https://github.com/prometheus/client_python).  This would handle the service
> discovery issue for ceph daemons out of the box (though not for the actual
> mgr-daemon which is the scrape target). The code would also be in a central
> location instead of being scattered in several places. It does however add a
> (maybe pointless) level of indirection ($ceph_daemon -> ceph-mgr ->
> prometheus) and adds the need for two different scrape intervals (assuming
> mgr polls metrics from daemons).

I would love to see a mgr module for prometheus integration!

John



* Re: Monitoring ceph and prometheus
From: Sage Weil @ 2017-05-11 12:47 UTC
  To: John Spray; +Cc: Ceph Development

On Thu, 11 May 2017, John Spray wrote:
> I would love to see a mgr module for prometheus integration!

Me too!  It might make more sense to do it in C++ than python, though, for 
performance reasons.

sage


* Re: Monitoring ceph and prometheus
From: Brad Hubbard @ 2017-05-12  1:03 UTC
  To: Sage Weil; +Cc: John Spray, Ceph Development



On Thu, May 11, 2017 at 10:47 PM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 11 May 2017, John Spray wrote:
>>
>> I would love to see a mgr module for prometheus integration!
>
> Me too!  It might make more sense to do it in C++ than python, though, for
> performance reasons.

Can we define "metrics" here? What, specifically, are we planning to gather?

Let's start with an example from "ceph_exporter". It exposes a metric,
ApplyLatency, which it obtains by connecting to the cluster via a rados client
connection, running the "osd perf" command and gathering the apply_latency_ms
result. I believe this stat is the equivalent of the apply_latency perf counter
statistic.
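
(For reference, a sketch of how an external exporter can get at this today via
librados and a mon command; the python-rados calls are real, but treat the exact
JSON field names as illustrative:)

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # "osd perf" returns the per-OSD apply/commit latencies as JSON.
        ret, outbuf, errs = cluster.mon_command(
            json.dumps({'prefix': 'osd perf', 'format': 'json'}), b'')
        for osd in json.loads(outbuf)['osd_perf_infos']:
            stats = osd['perf_stats']
            print(osd['id'], stats['apply_latency_ms'], stats['commit_latency_ms'])
    finally:
        cluster.shutdown()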

Does the manager currently export the performance counters? If not, option 1
looks more viable for gathering these sorts of metrics (think "perf dump"),
unless the manager can proxy calls such as "osd perf" back to the MONs.

Part of the problem with gathering metrics from ceph is working out which set of
metrics you want to collect from the large assortment available, IMHO.

>
> sage



-- 
Cheers,
Brad


* Re: Monitoring ceph and prometheus
From: Sage Weil @ 2017-05-12  1:07 UTC
  To: Brad Hubbard; +Cc: John Spray, Ceph Development

On Fri, 12 May 2017, Brad Hubbard wrote:
> On Thu, May 11, 2017 at 10:47 PM, Sage Weil <sage@newdream.net> wrote:
> > On Thu, 11 May 2017, John Spray wrote:
> >>
> >> I would love to see a mgr module for prometheus integration!
> >
> > Me too!  It might make more sense to do it in C++ than python, though, for
> > performance reasons.
> 
> Can we define "metrics" here? What, specifically, are we planning to gather?
> 
> Let's start with an example from "ceph_exporter". It exposes a metric
> ApplyLatency which it obtains by connecting to the cluster via a rados client
> connection and running the "osd perf" command and gathering the apply_latency_ms
> result. I believe this stat is the equivalent of the apply_latency perf counters
> statistic.
> 
> Does the manager currently export the performance counters? If not option 1 is
> looking more viable for gathering these sorts (think "perf dump") of
> metrics unless the manager can proxy calls such as "osd perf" back to the MONs?

Right now all of the perf counters are reported to ceph-mgr.  We shouldn't
need to do 'osd perf' (which just reports the two metrics that the osds have
historically reported to the mon).
 
> Part of the problem with gathering metrics from ceph is working out what set of
> metrics you want to collect from a large assortment available IMHO.

We could collect them all. Or, we recently introduced a 'priority' field 
so we can collect everything above a threshold (although then we have to 
go assign meaningful priorities to most of the counters).

BTW, one of the cool things about prometheus is that it has a histogram
type, which means we can take our 2D histogram data and report that
(flattened into one or the other dimension).
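
(As an illustration of the histogram point -- exporting pre-aggregated bucket
counts through client_python's custom-collector interface; the metric name, the
bucket boundaries and the flatten_op_latency() helper are made up:)

    from prometheus_client.core import HistogramMetricFamily, REGISTRY

    def flatten_op_latency():
        # Placeholder: collapse the 2D (latency x size) counts along the size
        # axis, returning cumulative per-bucket counts plus a sum of observations.
        return [('0.005', 10), ('0.05', 42), ('0.5', 57), ('+Inf', 60)], 3.2

    class CephHistogramCollector(object):
        def collect(self):
            buckets, sum_value = flatten_op_latency()
            h = HistogramMetricFamily('ceph_osd_op_latency_seconds',
                                      'OSD op latency, flattened from 2D data',
                                      labels=['ceph_daemon'])
            h.add_metric(['osd.0'], buckets, sum_value)
            yield h

    REGISTRY.register(CephHistogramCollector())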

sage


* Re: Monitoring ceph and prometheus
From: Brad Hubbard @ 2017-05-12  1:16 UTC
  To: Sage Weil; +Cc: John Spray, Ceph Development

On Fri, May 12, 2017 at 11:07 AM, Sage Weil <sage@newdream.net> wrote:
> On Fri, 12 May 2017, Brad Hubbard wrote:
>> On Thu, May 11, 2017 at 10:47 PM, Sage Weil <sage@newdream.net> wrote:
>> > On Thu, 11 May 2017, John Spray wrote:
>> >>
>> >> I would love to see a mgr module for prometheus integration!
>> >
>> > Me too!  It might make more sense to do it in C++ than python, though, for
>> > performance reasons.
>>
>> Can we define "metrics" here? What, specifically, are we planning to gather?
>>
>> Let's start with an example from "ceph_exporter". It exposes a metric
>> ApplyLatency which it obtains by connecting to the cluster via a rados client
>> connection and running the "osd perf" command and gathering the apply_latency_ms
>> result. I believe this stat is the equivalent of the apply_latency perf counters
>> statistic.
>>
>> Does the manager currently export the performance counters? If not option 1 is
>> looking more viable for gathering these sorts (think "perf dump") of
>> metrics unless the manager can proxy calls such as "osd perf" back to the MONs?
>
> Right now all of the perfcounters are reported to ceph-mgr.  We shouldn't
> need to do 'osd perf' (which is just reporting those 2 metrics that the
> osds have historically reported to the mon).

Ah, in DaemonState.* and MgrClient.cc. I see the mechanics now, thanks.




-- 
Cheers,
Brad


* Re: Monitoring ceph and prometheus
From: Lars Marowsky-Bree @ 2017-05-13 10:14 UTC
  To: Ceph Development

On 2017-05-11T12:47:21, Sage Weil <sage@newdream.net> wrote:

> > I would love to see a mgr module for prometheus integration!
> Me too!  It might make more sense to do it in C++ than python, though, for 
> performance reasons.

I'm leaning the other way. (Disclaimer: I started this dialogue
internally and was originally thinking of putting it into ceph-mgr.)

prometheus implements a pull model for time series data / metrics. For
those to be pull-able from ceph-mgr, either ceph-mgr needs to pull them
itself, or the daemons need to stream to it. Clearly it can't pull something
that's not there.

Both have slightly different issues with aligning the periods/intervals.

Prometheus can also scale by polling via several instances; if we pull
everything from ceph-mgr, that becomes a single chokepoint.

Further, if ceph-mgr were to pull data from individual daemons - why not
have prometheus do this directly? What benefit does this additional
indirection step offer?

If we have rather detailed stats per daemon, ceph-mgr would either relay
them on as-is (pure overhead), or aggregate them - and likely not
aggregate them as well/flexibly as Prometheus would allow via PromQL.

Now, that's not to say that ceph-mgr would not benefit from a Prometheus
interface! I could easily see ceph-mgr have stats of its own that are
worth monitoring, and we should make it easy to export those.

So, in short, I believe an easy way to export per-daemon metrics is
desirable. ceph-mgr might choose to pull these in as well if it has a
use for them, but I think Prometheus would best attach to the daemons
directly too.


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



* Re: Monitoring ceph and prometheus
From: John Spray @ 2017-05-14 22:27 UTC
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Sat, May 13, 2017 at 11:14 AM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2017-05-11T12:47:21, Sage Weil <sage@newdream.net> wrote:
>
>> > I would love to see a mgr module for prometheus integration!
>> Me too!  It might make more sense to do it in C++ than python, though, for
>> performance reasons.
>
> I'm leaning the other way. (Disclaimer: I started this dialogue
> internally and was originally thinking of putting it into ceph-mgr.)
>
> prometheus implements a pull model for time series data / metrics. For
> those to be pull-able from ceph-mgr, either ceph-mgr needs to pull
> itself, or daemons stream to it. Clearly it can't pull something that's
> not there.
>
> Both have slightly different issues with aligning the periods/intervals.
>
> Prometheus also can scale through polling via several instances; if we
> pull everything from ceph-mgr, that is a single chokepoint.

The question of bottlenecks comes up regularly when discussing this.
Before going and adding new interfaces to the OSDs to talk to
prometheus, I think it would be useful to find out if there really is
a problem.  We're talking about pretty small messages here, doing
nothing but updating some counters in memory, and it's a lot less work
than the OSDs already do.

When passing that data onwards, I don't know whether prometheus has an
issue dealing with a single endpoint that gives it a huge amount of
data.  I have not looked into it, but I wonder if the federation
interface[1] would be appropriate: make ceph-mgr look like a federated
prometheus instance instead of a normal endpoint.

1. https://prometheus.io/docs/operating/federation/

> Further, if ceph-mgr were to pull data from individual daemons - why not
> have prometheus do this directly? What benefit does this additional
> indirection step offer?

Simplicity.  It makes it super simple for a user with nothing but
vanilla Ceph and vanilla Prometheus to connect the two things
together.  Anything that requires lots of per-daemon configuration
relies on some additional orchestration tool to do that plumbing.

While that orchestration is not intrinsically complex, it's an area of
fragmentation in the community, whereas things we can simply build
into Ceph have a better chance of wider adoption.  If we build this
into ceph-mgr, then we can have a super-simple page on docs.ceph.com
that tells people how to plug any Ceph cluster into Prometheus in a
couple of commands.  If it relies on (various) external orchestrators,
we lose that.

In conversations about this topic (centralized vs. per-daemon stats),
we usually come to the conclusion that both are useful: the simple
"batteries included" configuration where we present a single endpoint,
vs. the configuration where some external program is aware of all
individual daemons and monitoring them directly.  If we end up with
both, that's not an awful thing.

As you point out, one ends up putting a prometheus endpoint into the
mgr anyway to expose the cluster-wide stats (as opposed to the daemon
perf counters), so it's probably absurdly easy to just make it expose
the perf counters too, even if one also continues to add code to
(optionally?) expose perf counters directly from daemons too.

John



* Re: Monitoring ceph and prometheus
From: Lars Marowsky-Bree @ 2017-05-15  6:44 UTC
  To: Ceph Development

On 2017-05-14T23:27:03, John Spray <jspray@redhat.com> wrote:

> a problem.  We're talking about pretty small messages here, doing
> nothing but updating some counters in memory, and it's a lot less work
> than the OSDs already do.

True, but sending (or exposing) them to ceph-mgr only for ceph-mgr to
pass them on on demand to Prometheus still strikes me as a redundant
hop.

> Simplicity.  It makes it super simple for a user with nothing but
> vanilla Ceph and vanilla Prometheus to connect the two things
> together.  Anything that requires lots of per-daemon configuration
> relies on some addition orchestration tool to do that plumbing.

Prometheus doesn't usually require per-daemon configuration; it has all
the hooks to dynamically update the list of daemons to monitor:
https://github.com/prometheus/docs/blob/master/content/docs/operating/configuration.md

Ok, so maybe we don't want to use consul/marathon/k8s/serversets.
(Surprised not to see etcd, actually ;-)

But Ceph *does* have a service that tells a client about all
OSD/MDS/MON/... instances, doesn't it? All the maps. This might be better
solved by a ceph_sd_configs section that points Prometheus at a Ceph
cluster with a single stanza.
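
(There is no ceph_sd_configs in Prometheus today, but its generic
file_sd_configs mechanism could be fed from the maps: something -- a mgr module,
a cron job -- writes the daemon endpoints in file_sd's JSON format and
Prometheus watches that file. A sketch, with get_daemon_endpoints(), the
addresses and the output path purely hypothetical:)

    import json

    def get_daemon_endpoints():
        # Placeholder for "read the cluster maps and return a metrics
        # host:port per daemon".
        return {'osd.0': 'host1:9200', 'osd.1': 'host2:9200', 'mds.a': 'host3:9200'}

    targets = [{'targets': [addr], 'labels': {'ceph_daemon': name}}
               for name, addr in sorted(get_daemon_endpoints().items())]

    # A file_sd_configs stanza in prometheus.yml would point at this file.
    with open('/etc/prometheus/targets/ceph.json', 'w') as f:
        json.dump(targets, f, indent=2)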

So, OK, ceph-mgr could additionally keep track of radosgw or nfs-ganesha
instances, possibly more - and possibly strip out the parts of the maps
Prometheus doesn't need to know about. And possibly provide an API that
doesn't require CephX.

So, perhaps exposing this - the dynamic service/target discovery via
ceph-mgr to Prometheus, and then having Prometheus pull directly - is a
synthesis of both positions?

> In conversations about this topic (centralized vs. per-daemon stats),
> we usually come to the conclusion that both are useful: the simple
> "batteries included" configuration where we present a single endpoint,
> vs. the configuration where some external program is aware of all
> individual daemons and monitoring them directly.  If we end up with
> both, that's not an awful thing.

Perhaps the above is the one that can converge both positions into
one?


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



* Re: Monitoring ceph and prometheus
From: John Spray @ 2017-05-15 12:33 UTC
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Mon, May 15, 2017 at 7:44 AM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2017-05-14T23:27:03, John Spray <jspray@redhat.com> wrote:
>
>> a problem.  We're talking about pretty small messages here, doing
>> nothing but updating some counters in memory, and it's a lot less work
>> than the OSDs already do.
>
> True, but sending (or exposing) them to ceph-mgr only for ceph-mgr to
> pass them on on-demand to Prometheus just still strikes me as a
> redundant hop.

At the risk of being a bit picky, it's only redundant if prometheus is
the only thing consuming them.  If the user is also using some mgr
modules (including things like handy CLI views) that consume the
stats, it's not redundant at all.  I'd like to keep these stats around
in the mgr because we're not quite sure yet what kinds of modules
we'll end up with.

Sage's recent change to add the importance thresholds to perf counters
could be interesting here: we might end up sending everything that's
"reasonably important" and higher to the mgr for exposing in CLI tools
etc. (I'm thinking of things like the OSD throughput, the number of each
MDS op per second, etc.), while perhaps the really obscure stuff would
only get collected (into prometheus?) if someone actively chose that.
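
(A sketch of the threshold idea -- only counters at or above a configured
priority get forwarded/exposed; the priority values and the example counters
below are illustrative, not the real Ceph definitions:)

    PRIO_DEBUGONLY, PRIO_USEFUL, PRIO_INTERESTING, PRIO_CRITICAL = 0, 5, 8, 10

    counters = {
        'osd.op_w_latency':       {'value': 0.004,  'prio': PRIO_CRITICAL},
        'osd.op_r_out_bytes':     {'value': 123456, 'prio': PRIO_INTERESTING},
        'bluestore.kv_flush_lat': {'value': 0.0001, 'prio': PRIO_DEBUGONLY},
    }

    def counters_to_expose(threshold=PRIO_INTERESTING):
        # Keep only the "reasonably important" counters.
        return {name: c['value'] for name, c in counters.items()
                if c['prio'] >= threshold}

    print(counters_to_expose())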

>> Simplicity.  It makes it super simple for a user with nothing but
>> vanilla Ceph and vanilla Prometheus to connect the two things
>> together.  Anything that requires lots of per-daemon configuration
>> relies on some addition orchestration tool to do that plumbing.
>
> Prometheus doesn't usually require per-daemon configuration; it has all
> the hooks to deal with dynamically update the list of daemons to
> monitor.
> https://github.com/prometheus/docs/blob/master/content/docs/operating/configuration.md
>
> Ok, so maybe we don't want to use consul/marathon/k8s/serversets.
> (Surprised not to see etcd, actually ;-)
>
> But Ceph *does* have a service that tells a client about all
> OSDs/MDSs/MONs/... instances, don't we? All the maps. This might better
> be solved by a ceph_sd_configs section to point it at a Ceph cluster
> with a single stanza?
>
> So, OK, ceph-mgr could additionally keep track of radosgw or nfs-ganesha
> instances, possibly more - and possibly strip out the parts of the maps
> Prometheus doesn't need to know about. And possibly provide an API that
> doesn't require CephX.
>
> So, perhaps exposing this - the dynamic service/target discovery via
> ceph-mgr to Prometheus, and then having Prometheus pull directly - is a
> synthesis of both positions?

It would certainly be ++good to build in the service discovery so that
the user only needs to point prometheus at one place to discover
everything.  Anything that avoids the need for extra external tools to
set things up makes me happy.

John



* Re: Monitoring ceph and prometheus
From: Lars Marowsky-Bree @ 2017-05-18  8:37 UTC
  To: Ceph Development

On 2017-05-15T13:33:29, John Spray <jspray@redhat.com> wrote:

> At the risk of being a bit picky, it's only redundant if prometheus is
> the only thing consuming them.  If the user is also using some mgr
> modules (including things like handy CLI views) that consume the
> stats, it's not redundant at all.  I'd like to keep these stats around
> in the mgr because we're not quite sure yet what kinds of modules
> we'll end up with.

Fair enough. The point that they may wish to gather information at
different frequencies still remains, though - a ceph-mgr module may do it
on demand for certain tasks, event-driven, or periodically; prometheus
(or other trending) would want to poll certain counters at various
frequencies, etc.

(e.g., maybe the OSD ones every 10s, SMART every 3h, whatever)

Aligning these would be annoying, and it seems to me that it makes more
sense to allow them to poll independently from the same interfaces.

> Sage's recent change to add the importance thresholds to perf counters
> could be interesting here: we might end up sending everything that's
> "reasonably important" and higher to the mgr for exposing in CLI tools
> etc (I'm thinking of things like the OSD throughput, the MDS number of
> each op per second, etc), while perhaps the really obscure stuff would
> only get collected (into prometheus?) if someone actively chose that.

That's actually somewhat related to how SMART classifies things: value,
threshold, type (old-age, pre-fail; we could add a "perf" one).

I take the point - there's also a need for an event-driven channel that
needs to be push by default. (From simple operation completion
notification to "OMFG the disk caught fire.")

I could see those going to ceph-mgr for handling/relaying.

> > So, perhaps exposing this - the dynamic service/target discovery via
> > ceph-mgr to Prometheus, and then having Prometheus pull directly - is a
> > synthesis of both positions?
> It would certainly be ++good build in the service discovery so that
> the user only needs to point prometheus at one place to discover
> everything.  Anything that avoids the need for extra external tools to
> set things up makes me happy.

Yes, I think that'd be great to have. And at least in my head the idea
of where information goes becomes clearer.

Notifications/events go to and through ceph-mgr. ceph-mgr keeps track of
Ceph services. Trending/metrics should IMNSHO be polled directly as
needed.


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



* Re: Monitoring ceph and prometheus
From: John Spray @ 2017-05-18  9:03 UTC
  To: Lars Marowsky-Bree; +Cc: Ceph Development

On Thu, May 18, 2017 at 9:37 AM, Lars Marowsky-Bree <lmb@suse.com> wrote:
> On 2017-05-15T13:33:29, John Spray <jspray@redhat.com> wrote:
>
>> At the risk of being a bit picky, it's only redundant if prometheus is
>> the only thing consuming them.  If the user is also using some mgr
>> modules (including things like handy CLI views) that consume the
>> stats, it's not redundant at all.  I'd like to keep these stats around
>> in the mgr because we're not quite sure yet what kinds of modules
>> we'll end up with.
>
> Fair enough. The point that they may wish to gather information at
> different frequencies still remains though - a ceph-mgr module may do it
> on-demand for certain tasks, event driven, or periodically, prometheus
> (or other trending) would want to poll certain counters at various
> frequencies, etc.

I'm slightly getting the impression that you might not have noticed
the existing functionality here -- the perf counters are sent
continuously from the daemons to the mgr, rather than being polled.
The mgr is in control of how often that is (via the MMgrConfigure
message).

>
> (e.g., maybe the OSD ones every 10s, SMART every 3h, whatever)

To be clear, when I talk about stats I'm talking about the perf
counters -- if SMART monitoring is added at some stage then I would
imagine sending that using a different mechanism.  As you say, sending
SMART counters at the same frequency as normal perf counters wouldn't
make sense.

>
> Aligning these would be annoying, and it seems to me that it makes more
> sense to allow them to poll independently from the same interfaces.
>
>> Sage's recent change to add the importance thresholds to perf counters
>> could be interesting here: we might end up sending everything that's
>> "reasonably important" and higher to the mgr for exposing in CLI tools
>> etc (I'm thinking of things like the OSD throughput, the MDS number of
>> each op per second, etc), while perhaps the really obscure stuff would
>> only get collected (into prometheus?) if someone actively chose that.
>
> That's actually somewhat related to how smart classifies. Value,
> threshold, type (old-age, pre-fail, we could add a "perf" one).
>
> I take the point - there's also a need for an event-driven channel that
> needs to be push by default. (From simple operation completion
> notification to "OMFG the disk caught fire.")

Again, that would be something separate from the existing perf counter
functionality.

> I could see those going to ceph-mgr for handling/relaying.

Yep.

>
>> > So, perhaps exposing this - the dynamic service/target discovery via
>> > ceph-mgr to Prometheus, and then having Prometheus pull directly - is a
>> > synthesis of both positions?
>> It would certainly be ++good build in the service discovery so that
>> the user only needs to point prometheus at one place to discover
>> everything.  Anything that avoids the need for extra external tools to
>> set things up makes me happy.
>
> Yes, I think that'd be great to have. And at least in my head the idea
> of where information goes becomes clearer.
>
> Notifications/events go to and through ceph-mgr. ceph-mgr keeps track of
> Ceph services. Trending/metrics should IMNSHO be polled directly as
> needed.

I'm not opposed to having a polling interface there if you want to add
it -- it could be useful for anyone who chooses to turn off the
existing stats transmission.  However, we should be mindful that it
will complicate the lives of plugin authors if they are uncertain
whether they're running on a polling-configured system (reading stats
is a network op) or a streaming-configured one (reading stats is super
fast).

John



* Re: Monitoring ceph and prometheus
From: Lars Marowsky-Bree @ 2017-05-19 11:00 UTC
  To: Ceph Development

On 2017-05-18T10:03:25, John Spray <jspray@redhat.com> wrote:

> > Fair enough. The point that they may wish to gather information at
> > different frequencies still remains though - a ceph-mgr module may do it
> > on-demand for certain tasks, event driven, or periodically, prometheus
> > (or other trending) would want to poll certain counters at various
> > frequencies, etc.
> I'm slightly getting the impression that you might not have noticed
> the existing functionality here -- the perf counters are sent
> continuously from the daemons to the mgr, rather than being polled.
> The mgr is in control of how often that is (via the MMgrConfigure
> message).

Oh, sorry, I should have been clearer. I'm aware of this; I just don't
like it - and the fact that the mgr needs to adjust the push interval
highlights why: if something like Prometheus then wanted to pull from
the mgr, those intervals would need to be aligned.


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


