* Interpreting ceph osd pool stats output
@ 2017-03-10  2:37 Paul Cuzner
  2017-03-10  9:55 ` John Spray
  2017-03-20 14:20 ` Ruben Kerkhof
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-10  2:37 UTC (permalink / raw)
  To: ceph-devel

Hi,

I've been putting together a collectd plugin for ceph, since the old
ones I could find no longer work. I'm gathering data from the mon's
admin socket, merged with a couple of commands I issue through the
rados mon_command interface.
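In case it helps, the mon_command call is roughly this (a minimal
python-rados sketch assuming the default ceph.conf path and an admin
keyring - not the actual plugin code):

import json
import rados

# Assumptions: default conf path and a client.admin keyring are available.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# "osd pool stats" issued through the mon_command interface, asking for JSON.
cmd = json.dumps({"prefix": "osd pool stats", "format": "json"})
ret, outbuf, errs = cluster.mon_command(cmd, b'')
if ret == 0:
    pool_stats = json.loads(outbuf)
cluster.shutdown()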

Nothing complicated, but the data has me a little confused.

When I run "osd pool stats" I get *two* different sets of metrics that
describe client i/o and recovery i/o. Since the metrics are different
I can't merge them to get a consistent view of what the cluster is
doing as a whole at any given point in time. For example, client i/o
reports in bytes_sec, but the recovery dict is empty and the
recovery_rate is in objects_sec...

i.e.

}, {
    "pool_name": "rados-bench-cbt",
    "pool_id": 86,
    "recovery": {},
    "recovery_rate": {
        "recovering_objects_per_sec": 3530,
        "recovering_bytes_per_sec": 14462655,
        "recovering_keys_per_sec": 0,
        "num_objects_recovered": 7148,
        "num_bytes_recovered": 29278208,
        "num_keys_recovered": 0
    },
    "client_io_rate": {}

This is running Jewel - 10.2.5-37.el7cp

Is this a bug or a 'feature' :)

Cheers,

Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
@ 2017-03-10  9:55 ` John Spray
  2017-03-10 20:52   ` Paul Cuzner
  2017-03-20 14:20 ` Ruben Kerkhof
  1 sibling, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-10  9:55 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

The reason they're different is that they originate from separate
internal counters:
 * The client_io_rate bits come from
https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
 * The recovery bits come from
https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146

Not sure what you mean about bytes_sec vs objects_sec: client IO and
recovery rate each have both objects and bytes counters.

The empty dicts are something that annoys me too; some of the output
functions have an if() right at the start that drops the output when
none of the deltas are nonzero.  I doubt anyone would have a big
problem with changing these to output the zeros rather than skipping
the fields.
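In the meantime, a consumer of the JSON can just treat a missing dict
or field as zero, e.g. (a rough python sketch against the structure
you posted, not anything that exists in the tree):

def rate(pool_entry, section, field):
    # Treat an omitted section ("client_io_rate"/"recovery_rate")
    # or an omitted field as zero.
    return pool_entry.get(section, {}).get(field, 0)

# e.g. for one entry from the parsed "osd pool stats" list:
# client_read_bps = rate(pool, "client_io_rate", "read_bytes_sec")
# recovery_bps    = rate(pool, "recovery_rate", "recovering_bytes_per_sec")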

BTW I'm not sure it's smart to merge these in practice: it would result
in showing users a "your cluster is doing 10GB/s" statistic while
their workload is crawling because all that IO is really recovery.
Confusing.

John


On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Hi,
>
> I've been putting together a collectd plugin for ceph - since the old
> one's I could find no longer work. I'm gathering data from the mon's
> admin socket, merged with a couple of commands I issue through the
> rados mon_command interface.
>
> Nothing complicated, but the data has me a little confused
>
> When I run "osd pool stats" I get *two* different sets of metrics that
> describe client i/o and recovery i/o. Since the metrics are different
> I can't merge them to get a consistent view of what the cluster is
> doing as a whole at any given point in time. For example, client i/o
> reports in bytes_sec, but the recovery dict is empty and the
> recovery_rate is in objects_sec...
>
> i.e.
>
> }, {
> "pool_name": "rados-bench-cbt",
> "pool_id": 86,
> "recovery": {},
> "recovery_rate": {
> "recovering_objects_per_sec": 3530,
> "recovering_bytes_per_sec": 14462655,
> "recovering_keys_per_sec": 0,
> "num_objects_recovered": 7148,
> "num_bytes_recovered": 29278208,
> "num_keys_recovered": 0
> },
> "client_io_rate": {}
>
> This is running Jewel - 10.2.5-37.el7cp
>
> Is this a bug or a 'feature' :)
>
> Cheers,
>
> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10  9:55 ` John Spray
@ 2017-03-10 20:52   ` Paul Cuzner
  2017-03-11 20:49     ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-10 20:52 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

Thanks John

This is weird then. When I look at the data with client load I see the
following:
{
    "pool_name": "default.rgw.buckets.index",
    "pool_id": 94,
    "recovery": {},
    "recovery_rate": {},
    "client_io_rate": {
        "read_bytes_sec": 19242365,
        "write_bytes_sec": 0,
        "read_op_per_sec": 12514,
        "write_op_per_sec": 0
    }

No object-related counters - they're all block-based. The plugin I
have rolls up the block metrics across all pools to provide total
client load.
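The roll-up itself is nothing clever - roughly along these lines (a
simplified sketch, not the exact plugin code):

def total_client_io(pool_stats):
    # Sum the per-pool client_io_rate counters; pools with an
    # empty or missing dict contribute zero.
    totals = {"read_bytes_sec": 0, "write_bytes_sec": 0,
              "read_op_per_sec": 0, "write_op_per_sec": 0}
    for pool in pool_stats:
        rate = pool.get("client_io_rate", {})
        for field in totals:
            totals[field] += rate.get(field, 0)
    return totals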

And as in the prior email, the recovery_rate counters are object-related.

As far as merging the stats is concerned, I *do* believe it's useful
info for the admin to know - and maybe even the admin's boss :)

It would answer questions like how busy the cluster is as a whole,
and with both client and recovery metrics aligned you could then drill
down into the client/recovery components. It might also be interesting
to derive a ratio metric of client:recovery and maybe key off that for
automation (alerts/notifications, automated tuning, etc.)

On Fri, Mar 10, 2017 at 10:55 PM, John Spray <jspray@redhat.com> wrote:
> The reason they're different is that they originate from separate
> internal counters:
>  * The client_io_rate bits come from
> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
>  * The recovery bits come from
> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146
>
> Not sure what you mean about bytes_sec vs objects_sec: client io and
> recovery rate both have both objects and bytes counters.
>
> The empty dicts are something that annoys me too, some of the output
> functions have an if() right at the start that drops the output when
> none of the deltas are nonzero.  I doubt anyone would have a big
> problem with changing these to output the zeros rather than skipping
> the fields.
>
> BTW I'm not sure it's smart to merge these in practice: would result
> in showing users a "your cluster is doing 10GB/s" statistics while
> their workload is crawling because all that IO is really recovery.
> Confusing.
>
> John
>
>
> On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Hi,
>>
>> I've been putting together a collectd plugin for ceph - since the old
>> one's I could find no longer work. I'm gathering data from the mon's
>> admin socket, merged with a couple of commands I issue through the
>> rados mon_command interface.
>>
>> Nothing complicated, but the data has me a little confused
>>
>> When I run "osd pool stats" I get *two* different sets of metrics that
>> describe client i/o and recovery i/o. Since the metrics are different
>> I can't merge them to get a consistent view of what the cluster is
>> doing as a whole at any given point in time. For example, client i/o
>> reports in bytes_sec, but the recovery dict is empty and the
>> recovery_rate is in objects_sec...
>>
>> i.e.
>>
>> }, {
>> "pool_name": "rados-bench-cbt",
>> "pool_id": 86,
>> "recovery": {},
>> "recovery_rate": {
>> "recovering_objects_per_sec": 3530,
>> "recovering_bytes_per_sec": 14462655,
>> "recovering_keys_per_sec": 0,
>> "num_objects_recovered": 7148,
>> "num_bytes_recovered": 29278208,
>> "num_keys_recovered": 0
>> },
>> "client_io_rate": {}
>>
>> This is running Jewel - 10.2.5-37.el7cp
>>
>> Is this a bug or a 'feature' :)
>>
>> Cheers,
>>
>> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10 20:52   ` Paul Cuzner
@ 2017-03-11 20:49     ` John Spray
  2017-03-11 21:24       ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-11 20:49 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Thanks John
>
> This is weird then. When I look at the data with client load I see the
> following;
> {
> "pool_name": "default.rgw.buckets.index",
> "pool_id": 94,
> "recovery": {},
> "recovery_rate": {},
> "client_io_rate": {
> "read_bytes_sec": 19242365,
> "write_bytes_sec": 0,
> "read_op_per_sec": 12514,
> "write_op_per_sec": 0
> }
>
> No object related counters - they're all block based. The plugin I
> have rolls-up the block metrics across all pools to provide total
> client load.

Where are you getting the idea that these counters have to do with
block storage?  What Ceph is telling you about here is the number of
operations (or bytes in those operations) being handled by OSDs.

John

> And as in the prior email recovery_rate counters are object related.
>
> As far as merging the stats is concerned, I *do* believe it's useful
> info for the admin to know - and maybe even the admin's boss :)
>
> It would answer questions like - how busy is the cluster as a whole,
> and with both client and recovery metrics aligned you could then drill
> down into client/recovery components. It might also be interesting to
> derive a ratio metric of client:recovery and maybe key of that for
> automation (alerts/notifications, automated tuning etc etc)
>
>
>
>
>
>
>
>
>
>
> On Fri, Mar 10, 2017 at 10:55 PM, John Spray <jspray@redhat.com> wrote:
>> The reason they're different is that they originate from separate
>> internal counters:
>>  * The client_io_rate bits come from
>> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
>>  * The recovery bits come from
>> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146
>>
>> Not sure what you mean about bytes_sec vs objects_sec: client io and
>> recovery rate both have both objects and bytes counters.
>>
>> The empty dicts are something that annoys me too, some of the output
>> functions have an if() right at the start that drops the output when
>> none of the deltas are nonzero.  I doubt anyone would have a big
>> problem with changing these to output the zeros rather than skipping
>> the fields.
>>
>> BTW I'm not sure it's smart to merge these in practice: would result
>> in showing users a "your cluster is doing 10GB/s" statistics while
>> their workload is crawling because all that IO is really recovery.
>> Confusing.
>>
>> John
>>
>>
>> On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Hi,
>>>
>>> I've been putting together a collectd plugin for ceph - since the old
>>> one's I could find no longer work. I'm gathering data from the mon's
>>> admin socket, merged with a couple of commands I issue through the
>>> rados mon_command interface.
>>>
>>> Nothing complicated, but the data has me a little confused
>>>
>>> When I run "osd pool stats" I get *two* different sets of metrics that
>>> describe client i/o and recovery i/o. Since the metrics are different
>>> I can't merge them to get a consistent view of what the cluster is
>>> doing as a whole at any given point in time. For example, client i/o
>>> reports in bytes_sec, but the recovery dict is empty and the
>>> recovery_rate is in objects_sec...
>>>
>>> i.e.
>>>
>>> }, {
>>> "pool_name": "rados-bench-cbt",
>>> "pool_id": 86,
>>> "recovery": {},
>>> "recovery_rate": {
>>> "recovering_objects_per_sec": 3530,
>>> "recovering_bytes_per_sec": 14462655,
>>> "recovering_keys_per_sec": 0,
>>> "num_objects_recovered": 7148,
>>> "num_bytes_recovered": 29278208,
>>> "num_keys_recovered": 0
>>> },
>>> "client_io_rate": {}
>>>
>>> This is running Jewel - 10.2.5-37.el7cp
>>>
>>> Is this a bug or a 'feature' :)
>>>
>>> Cheers,
>>>
>>> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-11 20:49     ` John Spray
@ 2017-03-11 21:24       ` Paul Cuzner
  2017-03-12 12:13         ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-11 21:24 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Thanks John
>>
>> This is weird then. When I look at the data with client load I see the
>> following;
>> {
>> "pool_name": "default.rgw.buckets.index",
>> "pool_id": 94,
>> "recovery": {},
>> "recovery_rate": {},
>> "client_io_rate": {
>> "read_bytes_sec": 19242365,
>> "write_bytes_sec": 0,
>> "read_op_per_sec": 12514,
>> "write_op_per_sec": 0
>> }
>>
>> No object related counters - they're all block based. The plugin I
>> have rolls-up the block metrics across all pools to provide total
>> client load.
>
> Where are you getting the idea that these counters have to do with
> block storage?  What Ceph is telling you about here is the number of
> operations (or bytes in those operations) being handled by OSDs.
>

Perhaps it's my poor choice of words - apologies.

read_op_per_sec is the read IOP count to the OSDs from client activity
against the pool.

My point is that client I/O is expressed in these terms, but recovery
activity is not. I was hoping that both recovery and client I/O would
be reported in the same way so you gain a view of the activity of the
system as a whole. I can sum bytes_sec from client I/O with the
recovery_rate bytes per sec, which is something, but I can't see inside
recovery activity to see how much is read or write, or how much IOP
load is coming from recovery.
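So for a whole-cluster figure, the best I can do is something like the
following (just a sketch, and subject to the caveat that the two sets
of counters come from different places):

def cluster_bytes_sec(pool_stats):
    # Crude roll-up: client read+write bytes/sec plus recovery
    # bytes/sec across all pools, returned separately so a
    # client:recovery ratio could also be derived.
    client = recovery = 0
    for pool in pool_stats:
        io = pool.get("client_io_rate", {})
        client += io.get("read_bytes_sec", 0) + io.get("write_bytes_sec", 0)
        recovery += pool.get("recovery_rate", {}).get(
            "recovering_bytes_per_sec", 0)
    return client, recovery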


* Re: Interpreting ceph osd pool stats output
  2017-03-11 21:24       ` Paul Cuzner
@ 2017-03-12 12:13         ` John Spray
  2017-03-13 21:50           ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-12 12:13 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Thanks John
>>>
>>> This is weird then. When I look at the data with client load I see the
>>> following;
>>> {
>>> "pool_name": "default.rgw.buckets.index",
>>> "pool_id": 94,
>>> "recovery": {},
>>> "recovery_rate": {},
>>> "client_io_rate": {
>>> "read_bytes_sec": 19242365,
>>> "write_bytes_sec": 0,
>>> "read_op_per_sec": 12514,
>>> "write_op_per_sec": 0
>>> }
>>>
>>> No object related counters - they're all block based. The plugin I
>>> have rolls-up the block metrics across all pools to provide total
>>> client load.
>>
>> Where are you getting the idea that these counters have to do with
>> block storage?  What Ceph is telling you about here is the number of
>> operations (or bytes in those operations) being handled by OSDs.
>>
>
> Perhaps it's my poor choice of words - apologies.
>
> read_op_per_sec is read IOP count to the OSDs from client activity
> against the pool
>
> My point is that client-io is expressed in these terms, but recovery
> activity is not. I was hoping that both recovery and client I/O would
> be reported in the same way so you gain a view of the activity of the
> system as a whole. I can sum bytes_sec from client i/o with
> recovery_rate bytes_sec, which is something, but I can't see inside
> recovery activity to see how much is read or write, or how much IOP
> load is coming from recovery.

What would it mean to you for a recovery operation (one OSD sending
some data to another OSD) to be read vs. write?

John


* Re: Interpreting ceph osd pool stats output
  2017-03-12 12:13         ` John Spray
@ 2017-03-13 21:50           ` Paul Cuzner
  2017-03-13 22:13             ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-13 21:50 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

Fundamentally, the metrics that describe the IO the OSD performs in
response to a recovery operation should be the same as the metrics for
client I/O. So in the context of a recovery operation, one OSD would
report a read (recovery source) and another report a write (recovery
target), together with their corresponding num_bytes. To my mind this
provides transparency, and maybe helps potential automation.

On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> Thanks John
>>>>
>>>> This is weird then. When I look at the data with client load I see the
>>>> following;
>>>> {
>>>> "pool_name": "default.rgw.buckets.index",
>>>> "pool_id": 94,
>>>> "recovery": {},
>>>> "recovery_rate": {},
>>>> "client_io_rate": {
>>>> "read_bytes_sec": 19242365,
>>>> "write_bytes_sec": 0,
>>>> "read_op_per_sec": 12514,
>>>> "write_op_per_sec": 0
>>>> }
>>>>
>>>> No object related counters - they're all block based. The plugin I
>>>> have rolls-up the block metrics across all pools to provide total
>>>> client load.
>>>
>>> Where are you getting the idea that these counters have to do with
>>> block storage?  What Ceph is telling you about here is the number of
>>> operations (or bytes in those operations) being handled by OSDs.
>>>
>>
>> Perhaps it's my poor choice of words - apologies.
>>
>> read_op_per_sec is read IOP count to the OSDs from client activity
>> against the pool
>>
>> My point is that client-io is expressed in these terms, but recovery
>> activity is not. I was hoping that both recovery and client I/O would
>> be reported in the same way so you gain a view of the activity of the
>> system as a whole. I can sum bytes_sec from client i/o with
>> recovery_rate bytes_sec, which is something, but I can't see inside
>> recovery activity to see how much is read or write, or how much IOP
>> load is coming from recovery.
>
> What would it mean to you for a recovery operation (one OSD sending
> some data to another OSD) to be read vs. write?
>
> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 21:50           ` Paul Cuzner
@ 2017-03-13 22:13             ` John Spray
  2017-03-13 22:14               ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-13 22:13 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Fundamentally, the metrics that describe the IO the OSD performs in
> response to a recovery operation should be the same as the metrics for
> client I/O.

Ah, so the key part here I think is "describe the IO that the OSD
performs" -- the counters you've been looking at do not do that.  They
describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
doing as a result.

That's why you don't get an apples-to-apples comparison between client
IO and recovery -- if you were looking at disk IO stats from both, it
would be perfectly reasonable to combine/compare them.  When you're
looking at Ceph's own counters of client ops vs. recovery activity,
that no longer makes sense.

> So in the context of a recovery operation, one OSD would
> report a read (recovery source) and another report a write (recovery
> target), together with their corresponding num_bytes. To my mind this
> provides transparency, and maybe helps potential automation.

Okay, so if we were talking about disk IO counters, this would
probably make sense (one read wouldn't necessarily correspond to one
write), but if you had a counter that was telling you how many Ceph
recovery push/pull ops were "reading" (being sent) vs "writing" (being
received) the totals would just be zero.

John

>

>
>
>
>
>
> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> Thanks John
>>>>>
>>>>> This is weird then. When I look at the data with client load I see the
>>>>> following;
>>>>> {
>>>>> "pool_name": "default.rgw.buckets.index",
>>>>> "pool_id": 94,
>>>>> "recovery": {},
>>>>> "recovery_rate": {},
>>>>> "client_io_rate": {
>>>>> "read_bytes_sec": 19242365,
>>>>> "write_bytes_sec": 0,
>>>>> "read_op_per_sec": 12514,
>>>>> "write_op_per_sec": 0
>>>>> }
>>>>>
>>>>> No object related counters - they're all block based. The plugin I
>>>>> have rolls-up the block metrics across all pools to provide total
>>>>> client load.
>>>>
>>>> Where are you getting the idea that these counters have to do with
>>>> block storage?  What Ceph is telling you about here is the number of
>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>
>>>
>>> Perhaps it's my poor choice of words - apologies.
>>>
>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>> against the pool
>>>
>>> My point is that client-io is expressed in these terms, but recovery
>>> activity is not. I was hoping that both recovery and client I/O would
>>> be reported in the same way so you gain a view of the activity of the
>>> system as a whole. I can sum bytes_sec from client i/o with
>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>> recovery activity to see how much is read or write, or how much IOP
>>> load is coming from recovery.
>>
>> What would it mean to you for a recovery operation (one OSD sending
>> some data to another OSD) to be read vs. write?
>>
>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 22:13             ` John Spray
@ 2017-03-13 22:14               ` John Spray
  2017-03-14  3:13                 ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Fundamentally, the metrics that describe the IO the OSD performs in
>> response to a recovery operation should be the same as the metrics for
>> client I/O.
>
> Ah, so the key part here I think is "describe the IO that the OSD
> performs" -- the counters you've been looking at do not do that.  They
> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
> doing as a result.
>
> That's why you don't get an apples-to-apples comparison between client
> IO and recovery -- if you were looking at disk IO stats from both, it
> would be perfectly reasonable to combine/compare them.  When you're
> looking at Ceph's own counters of client ops vs. recovery activity,
> that no longer makes sense.
>
>> So in the context of a recovery operation, one OSD would
>> report a read (recovery source) and another report a write (recovery
>> target), together with their corresponding num_bytes. To my mind this
>> provides transparency, and maybe helps potential automation.
>
> Okay, so if we were talking about disk IO counters, this would
> probably make sense (one read wouldn't necessarily correspond to one
> write), but if you had a counter that was telling you how many Ceph
> recovery push/pull ops were "reading" (being sent) vs "writing" (being
> received) the totals would just be zero.

Sorry, that should have said the totals would just be equal.

John

>
> John
>
>>
>
>>
>>
>>
>>
>>
>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>> Thanks John
>>>>>>
>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>> following;
>>>>>> {
>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>> "pool_id": 94,
>>>>>> "recovery": {},
>>>>>> "recovery_rate": {},
>>>>>> "client_io_rate": {
>>>>>> "read_bytes_sec": 19242365,
>>>>>> "write_bytes_sec": 0,
>>>>>> "read_op_per_sec": 12514,
>>>>>> "write_op_per_sec": 0
>>>>>> }
>>>>>>
>>>>>> No object related counters - they're all block based. The plugin I
>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>> client load.
>>>>>
>>>>> Where are you getting the idea that these counters have to do with
>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>
>>>>
>>>> Perhaps it's my poor choice of words - apologies.
>>>>
>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>> against the pool
>>>>
>>>> My point is that client-io is expressed in these terms, but recovery
>>>> activity is not. I was hoping that both recovery and client I/O would
>>>> be reported in the same way so you gain a view of the activity of the
>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>> recovery activity to see how much is read or write, or how much IOP
>>>> load is coming from recovery.
>>>
>>> What would it mean to you for a recovery operation (one OSD sending
>>> some data to another OSD) to be read vs. write?
>>>
>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 22:14               ` John Spray
@ 2017-03-14  3:13                 ` Paul Cuzner
  2017-03-14  9:49                   ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-14  3:13 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

First of all - thanks John for your patience!

I guess I still can't get past the different metrics being used -
client I/O is described in one way, recovery in another, and yet
fundamentally they both send ops to the OSDs, right? To me, what's
interesting is that the recovery_rate metrics from pool stats seem to
be a higher-level 'product' of lower-level information - for example,
recovering_objects_per_sec: is this not a product of multiple
read/write ops to OSDs?

Also, don't get me wrong - the recovery_rate dict is cool and it gives
a great view of object-level recovery - I was just hoping for common
metrics for the OSD ops that are shared by client and recovery
activity.

Since this isn't the case, what's the recommended way to determine how
busy a cluster is - across recovery and client (rbd/rgw) requests?


On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>> response to a recovery operation should be the same as the metrics for
>>> client I/O.
>>
>> Ah, so the key part here I think is "describe the IO that the OSD
>> performs" -- the counters you've been looking at do not do that.  They
>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>> doing as a result.
>>
>> That's why you don't get an apples-to-apples comparison between client
>> IO and recovery -- if you were looking at disk IO stats from both, it
>> would be perfectly reasonable to combine/compare them.  When you're
>> looking at Ceph's own counters of client ops vs. recovery activity,
>> that no longer makes sense.
>>
>>> So in the context of a recovery operation, one OSD would
>>> report a read (recovery source) and another report a write (recovery
>>> target), together with their corresponding num_bytes. To my mind this
>>> provides transparency, and maybe helps potential automation.
>>
>> Okay, so if we were talking about disk IO counters, this would
>> probably make sense (one read wouldn't necessarily correspond to one
>> write), but if you had a counter that was telling you how many Ceph
>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>> received) the totals would just be zero.
>
> Sorry, that should have said the totals would just be equal.
>
> John
>
>>
>> John
>>
>>>
>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>>> Thanks John
>>>>>>>
>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>>> following;
>>>>>>> {
>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>>> "pool_id": 94,
>>>>>>> "recovery": {},
>>>>>>> "recovery_rate": {},
>>>>>>> "client_io_rate": {
>>>>>>> "read_bytes_sec": 19242365,
>>>>>>> "write_bytes_sec": 0,
>>>>>>> "read_op_per_sec": 12514,
>>>>>>> "write_op_per_sec": 0
>>>>>>> }
>>>>>>>
>>>>>>> No object related counters - they're all block based. The plugin I
>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>>> client load.
>>>>>>
>>>>>> Where are you getting the idea that these counters have to do with
>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>>
>>>>>
>>>>> Perhaps it's my poor choice of words - apologies.
>>>>>
>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>> against the pool
>>>>>
>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>> be reported in the same way so you gain a view of the activity of the
>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>> load is coming from recovery.
>>>>
>>>> What would it mean to you for a recovery operation (one OSD sending
>>>> some data to another OSD) to be read vs. write?
>>>>
>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14  3:13                 ` Paul Cuzner
@ 2017-03-14  9:49                   ` John Spray
  2017-03-14 13:13                     ` Sage Weil
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-14  9:49 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> First of all - thanks John for your patience!
>
> I guess, I still can't get past the different metrics being used -
> client I/O is described in one way, recovery in another and yet
> fundamentally they both send ops to the OSD's right? To me, what's
> interesting is that the recovery_rate metrics from pool stats seems to
> be a higher level 'product' of lower level information - for example
> recovering_objects_per_sec : is this not a product of multiple
> read/write ops to OSD's?

While there is data being moved around, it would be misleading to say
it's all just ops.  The path that client ops go down is different to
the path that recovery messages go down.  Recovery data is gathered up
into big vectors of object extents that are sent between OSDs, whereas
client ops are sent individually from clients.  An OSD servicing 10
writes from 10 different clients is not directly comparable to an OSD
servicing an MOSDPush message from another OSD that happens to contain
updates to 10 objects.

Client ops are also logically meaningful to consumers of the
cluster, while the recovery stuff is a total implementation detail.
The implementation of recovery could change any time, and any counter
generated from it will only be meaningful to someone who understands
how recovery works on that particular version of the ceph code.

> Also, don't get me wrong - the recovery_rate dict is cool and it gives
> a great view of object level recovery - I was just hoping for common
> metrics for the OSD ops that are shared by client and recovery
> activity.
>
> Since this isn't the case, what's the recommended way to determine how
> busy a cluster is - across recovery and client (rbd/rgw) requests?

I would say again that how busy a cluster is doing its job (client
IO) is a very separate thing from how busy it is doing internal
housekeeping.  Imagine exposing this as a speedometer dial in a GUI
(as people sometimes do) -- a cluster that was killing itself with
recovery and completely blocking its clients would look like it was
going nice and fast.  In my view, exposing two separate numbers is the
right thing to do, not a shortcoming.

If you truly want to come up with some kind of single metric then you
can: you could take the rate of change of the objects recovered, for
example.  If you wanted to, you could think of finishing recovery of
one object as an "op".  I would tend to think of this as the job of a
higher-level tool though, rather than a collectd plugin.  Especially
if the collectd plugin is meant to be general purpose, it should avoid
inventing things like this.
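To make the rate-of-change idea concrete, it would be something like
this (purely illustrative python; sample() stands in for whatever
returns the cumulative num_objects_recovered value):

import time

def recovered_objects_per_sec(sample, interval=10):
    # Difference the cumulative counter between two polls and
    # normalise by the polling interval.
    prev = sample()
    time.sleep(interval)
    curr = sample()
    return (curr - prev) / float(interval)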

John

>
>
>
>
>
>
>
>
>
> .
>
> On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>> response to a recovery operation should be the same as the metrics for
>>>> client I/O.
>>>
>>> Ah, so the key part here I think is "describe the IO that the OSD
>>> performs" -- the counters you've been looking at do not do that.  They
>>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>> doing as a result.
>>>
>>> That's why you don't get an apples-to-apples comparison between client
>>> IO and recovery -- if you were looking at disk IO stats from both, it
>>> would be perfectly reasonable to combine/compare them.  When you're
>>> looking at Ceph's own counters of client ops vs. recovery activity,
>>> that no longer makes sense.
>>>
>>>> So in the context of a recovery operation, one OSD would
>>>> report a read (recovery source) and another report a write (recovery
>>>> target), together with their corresponding num_bytes. To my mind this
>>>> provides transparency, and maybe helps potential automation.
>>>
>>> Okay, so if we were talking about disk IO counters, this would
>>> probably make sense (one read wouldn't necessarily correspond to one
>>> write), but if you had a counter that was telling you how many Ceph
>>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>> received) the totals would just be zero.
>>
>> Sorry, that should have said the totals would just be equal.
>>
>> John
>>
>>>
>>> John
>>>
>>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>>>> Thanks John
>>>>>>>>
>>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>>>> following;
>>>>>>>> {
>>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>>>> "pool_id": 94,
>>>>>>>> "recovery": {},
>>>>>>>> "recovery_rate": {},
>>>>>>>> "client_io_rate": {
>>>>>>>> "read_bytes_sec": 19242365,
>>>>>>>> "write_bytes_sec": 0,
>>>>>>>> "read_op_per_sec": 12514,
>>>>>>>> "write_op_per_sec": 0
>>>>>>>> }
>>>>>>>>
>>>>>>>> No object related counters - they're all block based. The plugin I
>>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>>>> client load.
>>>>>>>
>>>>>>> Where are you getting the idea that these counters have to do with
>>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>>>
>>>>>>
>>>>>> Perhaps it's my poor choice of words - apologies.
>>>>>>
>>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>>> against the pool
>>>>>>
>>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>>> be reported in the same way so you gain a view of the activity of the
>>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>>> load is coming from recovery.
>>>>>
>>>>> What would it mean to you for a recovery operation (one OSD sending
>>>>> some data to another OSD) to be read vs. write?
>>>>>
>>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14  9:49                   ` John Spray
@ 2017-03-14 13:13                     ` Sage Weil
  2017-03-20  3:57                       ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Sage Weil @ 2017-03-14 13:13 UTC (permalink / raw)
  To: John Spray; +Cc: Paul Cuzner, Ceph Development

On Tue, 14 Mar 2017, John Spray wrote:
> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> > First of all - thanks John for your patience!
> >
> > I guess, I still can't get past the different metrics being used -
> > client I/O is described in one way, recovery in another and yet
> > fundamentally they both send ops to the OSD's right? To me, what's
> > interesting is that the recovery_rate metrics from pool stats seems to
> > be a higher level 'product' of lower level information - for example
> > recovering_objects_per_sec : is this not a product of multiple
> > read/write ops to OSD's?
> 
> While there is data being moved around, it would be misleading to say
> it's all just ops.  The path that client ops go down is different to
> the path that recovery messages go down.  Recovery data is gathered up
> into big vectors of object extents that are sent between OSDs, client
> ops are sent individually from clients.  An OSD servicing 10 writes
> from 10 different clients is not directly comparable to an OSD
> servicing an MOSDPush message from another OSD that happens to contain
> updates to 10 objects.
> 
> Client ops are also a logically meaningful to consumers of the
> cluster, while the recovery stuff is a total implementation detail.
> The implementation of recovery could change any time, and any counter
> generated from it will only be meaningful to someone who understands
> how recovery works on that particular version of the ceph code.
> 
> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
> > a great view of object level recovery - I was just hoping for common
> > metrics for the OSD ops that are shared by client and recovery
> > activity.
> >
> > Since this isn't the case, what's the recommended way to determine how
> > busy a cluster is - across recovery and client (rbd/rgw) requests?
> 
> I would say again that how busy a cluster is doing it's job (client
> IO) is a very separate thing from how busy it is doing internal
> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
> (as people sometimes do) -- a cluster that was killing itself with
> recovery and completely blocking it's clients would look like it was
> going nice and fast.  In my view, exposing two separate numbers is the
> right thing to do, not a shortcoming.
> 
> If you truly want to come up with some kind of single metric then you
> can: you could take the rate of change of the objects recovered for
> example.  If you wanted to, you could think of finishing recovery of
> one object as an "op".  I would tend to think of this as the job of a
> higher level tool though, rather than a collectd plugin.  Especially
> if the collectd plugin is meant to be general purpose, it should avoid
> inventing things like this.

I think the only other option is to take a measurement at a lower layer.  
BlueStore doesn't currently but could easily have metrics for bytes read 
and written.  But again, this is a secondary product of client and 
recovery: a client write, for example, will result in 3 writes across 3
OSDs (in a 3x replicated pool).
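Back of the envelope, with illustrative numbers only:

# 3x replicated pool: each client write lands on 3 OSDs.
replica_count = 3
client_write_mb_s = 100.0                             # hypothetical client write rate
raw_write_mb_s = client_write_mb_s * replica_count    # ~300 MB/s at the disks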

sage


 > 
> John
> 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > .
> >
> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
> >>>> response to a recovery operation should be the same as the metrics for
> >>>> client I/O.
> >>>
> >>> Ah, so the key part here I think is "describe the IO that the OSD
> >>> performs" -- the counters you've been looking at do not do that.  They
> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
> >>> doing as a result.
> >>>
> >>> That's why you don't get an apples-to-apples comparison between client
> >>> IO and recovery -- if you were looking at disk IO stats from both, it
> >>> would be perfectly reasonable to combine/compare them.  When you're
> >>> looking at Ceph's own counters of client ops vs. recovery activity,
> >>> that no longer makes sense.
> >>>
> >>>> So in the context of a recovery operation, one OSD would
> >>>> report a read (recovery source) and another report a write (recovery
> >>>> target), together with their corresponding num_bytes. To my mind this
> >>>> provides transparency, and maybe helps potential automation.
> >>>
> >>> Okay, so if we were talking about disk IO counters, this would
> >>> probably make sense (one read wouldn't necessarily correspond to one
> >>> write), but if you had a counter that was telling you how many Ceph
> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
> >>> received) the totals would just be zero.
> >>
> >> Sorry, that should have said the totals would just be equal.
> >>
> >> John
> >>
> >>>
> >>> John
> >>>
> >>>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>>>>>> Thanks John
> >>>>>>>>
> >>>>>>>> This is weird then. When I look at the data with client load I see the
> >>>>>>>> following;
> >>>>>>>> {
> >>>>>>>> "pool_name": "default.rgw.buckets.index",
> >>>>>>>> "pool_id": 94,
> >>>>>>>> "recovery": {},
> >>>>>>>> "recovery_rate": {},
> >>>>>>>> "client_io_rate": {
> >>>>>>>> "read_bytes_sec": 19242365,
> >>>>>>>> "write_bytes_sec": 0,
> >>>>>>>> "read_op_per_sec": 12514,
> >>>>>>>> "write_op_per_sec": 0
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> No object related counters - they're all block based. The plugin I
> >>>>>>>> have rolls-up the block metrics across all pools to provide total
> >>>>>>>> client load.
> >>>>>>>
> >>>>>>> Where are you getting the idea that these counters have to do with
> >>>>>>> block storage?  What Ceph is telling you about here is the number of
> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
> >>>>>>>
> >>>>>>
> >>>>>> Perhaps it's my poor choice of words - apologies.
> >>>>>>
> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
> >>>>>> against the pool
> >>>>>>
> >>>>>> My point is that client-io is expressed in these terms, but recovery
> >>>>>> activity is not. I was hoping that both recovery and client I/O would
> >>>>>> be reported in the same way so you gain a view of the activity of the
> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
> >>>>>> recovery activity to see how much is read or write, or how much IOP
> >>>>>> load is coming from recovery.
> >>>>>
> >>>>> What would it mean to you for a recovery operation (one OSD sending
> >>>>> some data to another OSD) to be read vs. write?
> >>>>>
> >>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14 13:13                     ` Sage Weil
@ 2017-03-20  3:57                       ` Paul Cuzner
  2017-03-20  7:54                         ` Brad Hubbard
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  3:57 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development, Sage Weil

John/Sage, thanks for the clarification and info. At this stage, I'll
stick with the data I have, bearing in mind John's caveats.

The challenge in understanding the load going on in a cluster is
definitely interesting since the choke points are different depending
on whether you look at the cluster through a hardware or software
'lens'.

I think the interesting question is: how does a customer know how
'full' their cluster is from a performance standpoint - i.e. when do I
need to buy more or different hardware? Holy grail type stuff :)

Is there any work going on in this space, perhaps analyzing the
underlying components within the cluster, like CPU, RAM or disk
utilization rates across the nodes?



On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
> On Tue, 14 Mar 2017, John Spray wrote:
>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> > First of all - thanks John for your patience!
>> >
>> > I guess, I still can't get past the different metrics being used -
>> > client I/O is described in one way, recovery in another and yet
>> > fundamentally they both send ops to the OSD's right? To me, what's
>> > interesting is that the recovery_rate metrics from pool stats seems to
>> > be a higher level 'product' of lower level information - for example
>> > recovering_objects_per_sec : is this not a product of multiple
>> > read/write ops to OSD's?
>>
>> While there is data being moved around, it would be misleading to say
>> it's all just ops.  The path that client ops go down is different to
>> the path that recovery messages go down.  Recovery data is gathered up
>> into big vectors of object extents that are sent between OSDs, client
>> ops are sent individually from clients.  An OSD servicing 10 writes
>> from 10 different clients is not directly comparable to an OSD
>> servicing an MOSDPush message from another OSD that happens to contain
>> updates to 10 objects.
>>
>> Client ops are also a logically meaningful to consumers of the
>> cluster, while the recovery stuff is a total implementation detail.
>> The implementation of recovery could change any time, and any counter
>> generated from it will only be meaningful to someone who understands
>> how recovery works on that particular version of the ceph code.
>>
>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>> > a great view of object level recovery - I was just hoping for common
>> > metrics for the OSD ops that are shared by client and recovery
>> > activity.
>> >
>> > Since this isn't the case, what's the recommended way to determine how
>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>
>> I would say again that how busy a cluster is doing it's job (client
>> IO) is a very separate thing from how busy it is doing internal
>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>> (as people sometimes do) -- a cluster that was killing itself with
>> recovery and completely blocking it's clients would look like it was
>> going nice and fast.  In my view, exposing two separate numbers is the
>> right thing to do, not a shortcoming.
>>
>> If you truly want to come up with some kind of single metric then you
>> can: you could take the rate of change of the objects recovered for
>> example.  If you wanted to, you could think of finishing recovery of
>> one object as an "op".  I would tend to think of this as the job of a
>> higher level tool though, rather than a collectd plugin.  Especially
>> if the collectd plugin is meant to be general purpose, it should avoid
>> inventing things like this.
>
> I think the only other option is to take a measurement at a lower layer.
> BlueStore doesn't currently but could easily have metrics for bytes read
> and written.  But again, this is a secondary product of client and
> recovery: a client write, for example, will result in 3 writes across 3
> osds (in a 3x replicated pool).
>
> sage
>
>
>  >
>> John
>>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > .
>> >
>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>> >>>> response to a recovery operation should be the same as the metrics for
>> >>>> client I/O.
>> >>>
>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>> >>> performs" -- the counters you've been looking at do not do that.  They
>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>> >>> doing as a result.
>> >>>
>> >>> That's why you don't get an apples-to-apples comparison between client
>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>> >>> would be perfectly reasonable to combine/compare them.  When you're
>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>> >>> that no longer makes sense.
>> >>>
>> >>>> So in the context of a recovery operation, one OSD would
>> >>>> report a read (recovery source) and another report a write (recovery
>> >>>> target), together with their corresponding num_bytes. To my mind this
>> >>>> provides transparency, and maybe helps potential automation.
>> >>>
>> >>> Okay, so if we were talking about disk IO counters, this would
>> >>> probably make sense (one read wouldn't necessarily correspond to one
>> >>> write), but if you had a counter that was telling you how many Ceph
>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>> >>> received) the totals would just be zero.
>> >>
>> >> Sorry, that should have said the totals would just be equal.
>> >>
>> >> John
>> >>
>> >>>
>> >>> John
>> >>>
>> >>>>
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>>>>>> Thanks John
>> >>>>>>>>
>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>> >>>>>>>> following;
>> >>>>>>>> {
>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>> >>>>>>>> "pool_id": 94,
>> >>>>>>>> "recovery": {},
>> >>>>>>>> "recovery_rate": {},
>> >>>>>>>> "client_io_rate": {
>> >>>>>>>> "read_bytes_sec": 19242365,
>> >>>>>>>> "write_bytes_sec": 0,
>> >>>>>>>> "read_op_per_sec": 12514,
>> >>>>>>>> "write_op_per_sec": 0
>> >>>>>>>> }
>> >>>>>>>>
>> >>>>>>>> No object related counters - they're all block based. The plugin I
>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>> >>>>>>>> client load.
>> >>>>>>>
>> >>>>>>> Where are you getting the idea that these counters have to do with
>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>> >>>>>>>
>> >>>>>>
>> >>>>>> Perhaps it's my poor choice of words - apologies.
>> >>>>>>
>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>> >>>>>> against the pool
>> >>>>>>
>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>> >>>>>> be reported in the same way so you gain a view of the activity of the
>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>> >>>>>> load is coming from recovery.
>> >>>>>
>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>> >>>>> some data to another OSD) to be read vs. write?
>> >>>>>
>> >>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-20  3:57                       ` Paul Cuzner
@ 2017-03-20  7:54                         ` Brad Hubbard
  2017-03-20  8:40                           ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Brad Hubbard @ 2017-03-20  7:54 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: John Spray, Ceph Development, Sage Weil



On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> John/Sage, thanks for the clarification and info. At this stage, I'll
> stick with the data I have with John's caveats.
>
> The challenge in understanding the load going on in a cluster is
> definitely interesting since the choke points are different depending
> on whether you look at the cluster through a hardware or software
> 'lens'.
>
> I think the interesting question is how does a customer know how
> 'full' their cluster is from a performance standpoint - ie. when do I
> need to buy more or different hardware? Holy grail type stuff :)
>
> Is there any work going on in this space, perhaps analyzing the
> underlying components within the cluster like cpu, ram or disk util
> rates across the nodes?

Wouldn't this be reinventing the wheel, since it's something that tools like
pcp (collectd?) already do very well?

>
>
>
> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>> On Tue, 14 Mar 2017, John Spray wrote:
>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> > First of all - thanks John for your patience!
>>> >
>>> > I guess, I still can't get past the different metrics being used -
>>> > client I/O is described in one way, recovery in another and yet
>>> > fundamentally they both send ops to the OSD's right? To me, what's
>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>> > be a higher level 'product' of lower level information - for example
>>> > recovering_objects_per_sec : is this not a product of multiple
>>> > read/write ops to OSD's?
>>>
>>> While there is data being moved around, it would be misleading to say
>>> it's all just ops.  The path that client ops go down is different to
>>> the path that recovery messages go down.  Recovery data is gathered up
>>> into big vectors of object extents that are sent between OSDs, client
>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>> from 10 different clients is not directly comparable to an OSD
>>> servicing an MOSDPush message from another OSD that happens to contain
>>> updates to 10 objects.
>>>
>>> Client ops are also a logically meaningful to consumers of the
>>> cluster, while the recovery stuff is a total implementation detail.
>>> The implementation of recovery could change any time, and any counter
>>> generated from it will only be meaningful to someone who understands
>>> how recovery works on that particular version of the ceph code.
>>>
>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>> > a great view of object level recovery - I was just hoping for common
>>> > metrics for the OSD ops that are shared by client and recovery
>>> > activity.
>>> >
>>> > Since this isn't the case, what's the recommended way to determine how
>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>
>> >>> I would say again that how busy a cluster is doing its job (client
>>> IO) is a very separate thing from how busy it is doing internal
>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>> (as people sometimes do) -- a cluster that was killing itself with
>> >>> recovery and completely blocking its clients would look like it was
>>> going nice and fast.  In my view, exposing two separate numbers is the
>>> right thing to do, not a shortcoming.
>>>
>>> If you truly want to come up with some kind of single metric then you
>>> can: you could take the rate of change of the objects recovered for
>>> example.  If you wanted to, you could think of finishing recovery of
>>> one object as an "op".  I would tend to think of this as the job of a
>>> higher level tool though, rather than a collectd plugin.  Especially
>>> if the collectd plugin is meant to be general purpose, it should avoid
>>> inventing things like this.
>>
>> I think the only other option is to take a measurement at a lower layer.
>> BlueStore doesn't currently but could easily have metrics for bytes read
>> and written.  But again, this is a secondary product of client and
>> recovery: a client write, for example, will result in 3 writes across 3
>> osds (in a 3x replicated pool).
>>
>> sage
>>
>>
>>  >
>>> John
>>>
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > .
>>> >
>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>> >>>> response to a recovery operation should be the same as the metrics for
>>> >>>> client I/O.
>>> >>>
>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>> >>> doing as a result.
>>> >>>
>>> >>> That's why you don't get an apples-to-apples comparison between client
>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>> >>> that no longer makes sense.
>>> >>>
>>> >>>> So in the context of a recovery operation, one OSD would
>>> >>>> report a read (recovery source) and another report a write (recovery
>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>> >>>> provides transparency, and maybe helps potential automation.
>>> >>>
>>> >>> Okay, so if we were talking about disk IO counters, this would
>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>> >>> write), but if you had a counter that was telling you how many Ceph
>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>> >>> received) the totals would just be zero.
>>> >>
>>> >> Sorry, that should have said the totals would just be equal.
>>> >>
>>> >> John
>>> >>
>>> >>>
>>> >>> John
>>> >>>
>>> >>>>
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>>>>>> Thanks John
>>> >>>>>>>>
>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>> >>>>>>>> following;
>>> >>>>>>>> {
>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>> >>>>>>>> "pool_id": 94,
>>> >>>>>>>> "recovery": {},
>>> >>>>>>>> "recovery_rate": {},
>>> >>>>>>>> "client_io_rate": {
>>> >>>>>>>> "read_bytes_sec": 19242365,
>>> >>>>>>>> "write_bytes_sec": 0,
>>> >>>>>>>> "read_op_per_sec": 12514,
>>> >>>>>>>> "write_op_per_sec": 0
>>> >>>>>>>> }
>>> >>>>>>>>
>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>> >>>>>>>> client load.
>>> >>>>>>>
>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>> >>>>>>
>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>> >>>>>> against the pool
>>> >>>>>>
>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>> >>>>>> load is coming from recovery.
>>> >>>>>
>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>> >>>>> some data to another OSD) to be read vs. write?
>>> >>>>>
>>> >>>>> John
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cheers,
Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20  7:54                         ` Brad Hubbard
@ 2017-03-20  8:40                           ` Paul Cuzner
  2017-03-20  8:41                             ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  8:40 UTC (permalink / raw)
  To: Brad Hubbard; +Cc: John Spray, Ceph Development, Sage Weil

I was suggesting inventing the data collector - more about how
(formulas, etc.) and which metrics we aggregate to derive meaningful
metrics. pcp, collectd etc give us a single component - what's the
framework that ties all those pieces together to give us the
cluster-wide view? If there is something out there, great... I'm not a
fan of reinventing the wheel either :)
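
To make that concrete, here is a rough per-pool 'formula' of the kind I
have in mind - purely illustrative, not an existing plugin. The field
names come from the "osd pool stats" json shown earlier in the thread;
the function and gauge names are made up:

# Hedged sketch: derive per-pool gauges from one entry of
# "ceph osd pool stats -f json".  Client and recovery activity are
# kept as separate numbers, per John's caveat about not collapsing
# them into a single figure.
def pool_gauges(pool_stats):
    cio = pool_stats.get("client_io_rate", {})   # {} when the pool is idle
    rec = pool_stats.get("recovery_rate", {})    # {} when no recovery is running
    return {
        "pool": pool_stats.get("pool_name", "unknown"),
        # client-facing load, in the units the cluster reports
        "client_bytes_sec": (cio.get("read_bytes_sec", 0) +
                             cio.get("write_bytes_sec", 0)),
        "client_ops_sec": (cio.get("read_op_per_sec", 0) +
                           cio.get("write_op_per_sec", 0)),
        # internal housekeeping, reported as objects/bytes recovered
        "recovery_bytes_sec": rec.get("recovering_bytes_per_sec", 0),
        "recovery_objects_sec": rec.get("recovering_objects_per_sec", 0),
    }

Whatever framework ties the collectors together would then just sum
these per-pool dicts to get the cluster-wide picture, while still
reporting client and recovery load side by side.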



On Mon, Mar 20, 2017 at 8:54 PM, Brad Hubbard <bhubbard@redhat.com> wrote:
>
>
> On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> John/Sage, thanks for the clarification and info. At this stage, I'll
>> stick with the data I have with John's caveats.
>>
>> The challenge in understanding the load going on in a cluster is
>> definitely interesting since the choke points are different depending
>> on whether you look at the cluster through a hardware or software
>> 'lens'.
>>
>> I think the interesting question is how does a customer know how
>> 'full' their cluster is from a performance standpoint - ie. when do I
>> need to buy more or different hardware? Holy grail type stuff :)
>>
>> Is there any work going on in this space, perhaps analyzing the
>> underlying components within the cluster like cpu, ram or disk util
>> rates across the nodes?
>
> Wouldn't this be reinventing the wheel, since this is something that tools
> like pcp (or collectd?) already do very well?
>
>>
>>
>>
>> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>>> On Tue, 14 Mar 2017, John Spray wrote:
>>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> > First of all - thanks John for your patience!
>>>> >
>>>> > I guess, I still can't get past the different metrics being used -
>>>> > client I/O is described in one way, recovery in another and yet
>>>> > fundamentally they both send ops to the OSDs, right? To me, what's
>>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>>> > be a higher level 'product' of lower level information - for example
>>>> > recovering_objects_per_sec : is this not a product of multiple
>>>> > read/write ops to OSDs?
>>>>
>>>> While there is data being moved around, it would be misleading to say
>>>> it's all just ops.  The path that client ops go down is different to
>>>> the path that recovery messages go down.  Recovery data is gathered up
>>>> into big vectors of object extents that are sent between OSDs, client
>>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>>> from 10 different clients is not directly comparable to an OSD
>>>> servicing an MOSDPush message from another OSD that happens to contain
>>>> updates to 10 objects.
>>>>
>>>> Client ops are also logically meaningful to consumers of the
>>>> cluster, while the recovery stuff is a total implementation detail.
>>>> The implementation of recovery could change any time, and any counter
>>>> generated from it will only be meaningful to someone who understands
>>>> how recovery works on that particular version of the ceph code.
>>>>
>>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>>> > a great view of object level recovery - I was just hoping for common
>>>> > metrics for the OSD ops that are shared by client and recovery
>>>> > activity.
>>>> >
>>>> > Since this isn't the case, what's the recommended way to determine how
>>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>>
>>>> I would say again that how busy a cluster is doing its job (client
>>>> IO) is a very separate thing from how busy it is doing internal
>>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>>> (as people sometimes do) -- a cluster that was killing itself with
>>>> recovery and completely blocking its clients would look like it was
>>>> going nice and fast.  In my view, exposing two separate numbers is the
>>>> right thing to do, not a shortcoming.
>>>>
>>>> If you truly want to come up with some kind of single metric then you
>>>> can: you could take the rate of change of the objects recovered for
>>>> example.  If you wanted to, you could think of finishing recovery of
>>>> one object as an "op".  I would tend to think of this as the job of a
>>>> higher level tool though, rather than a collectd plugin.  Especially
>>>> if the collectd plugin is meant to be general purpose, it should avoid
>>>> inventing things like this.
>>>
>>> I think the only other option is to take a measurement at a lower layer.
>>> BlueStore doesn't currently but could easily have metrics for bytes read
>>> and written.  But again, this is a secondary product of client and
>>> recovery: a client write, for example, will result in 3 writes across 3
>>> osds (in a 3x replicated pool).
>>>
>>> sage
>>>
>>>
>>>  >
>>>> John
>>>>
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > .
>>>> >
>>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>> >>>> response to a recovery operation should be the same as the metrics for
>>>> >>>> client I/O.
>>>> >>>
>>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>>> >>> doing as a result.
>>>> >>>
>>>> >>> That's why you don't get an apples-to-apples comparison between client
>>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>>> >>> that no longer makes sense.
>>>> >>>
>>>> >>>> So in the context of a recovery operation, one OSD would
>>>> >>>> report a read (recovery source) and another report a write (recovery
>>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>>> >>>> provides transparency, and maybe helps potential automation.
>>>> >>>
>>>> >>> Okay, so if we were talking about disk IO counters, this would
>>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>>> >>> write), but if you had a counter that was telling you how many Ceph
>>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>>> >>> received) the totals would just be zero.
>>>> >>
>>>> >> Sorry, that should have said the totals would just be equal.
>>>> >>
>>>> >> John
>>>> >>
>>>> >>>
>>>> >>> John
>>>> >>>
>>>> >>>>
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>>>>>> Thanks John
>>>> >>>>>>>>
>>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>>> >>>>>>>> following;
>>>> >>>>>>>> {
>>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>> >>>>>>>> "pool_id": 94,
>>>> >>>>>>>> "recovery": {},
>>>> >>>>>>>> "recovery_rate": {},
>>>> >>>>>>>> "client_io_rate": {
>>>> >>>>>>>> "read_bytes_sec": 19242365,
>>>> >>>>>>>> "write_bytes_sec": 0,
>>>> >>>>>>>> "read_op_per_sec": 12514,
>>>> >>>>>>>> "write_op_per_sec": 0
>>>> >>>>>>>> }
>>>> >>>>>>>>
>>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>> >>>>>>>> client load.
>>>> >>>>>>>
>>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>>> >>>>>>
>>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>> >>>>>> against the pool
>>>> >>>>>>
>>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>>> >>>>>> load is coming from recovery.
>>>> >>>>>
>>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>>> >>>>> some data to another OSD) to be read vs. write?
>>>> >>>>>
>>>> >>>>> John
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Cheers,
> Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20  8:40                           ` Paul Cuzner
@ 2017-03-20  8:41                             ` Paul Cuzner
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  8:41 UTC (permalink / raw)
  To: Brad Hubbard; +Cc: John Spray, Ceph Development, Sage Weil

s/i was/i wasn't/

doh...it's late

On Mon, Mar 20, 2017 at 9:40 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> I was suggesting inventing the data collector - more about how
> (formulas, etc.) and which metrics we aggregate to derive meaningful
> metrics. pcp, collectd etc give us a single component - what's the
> framework that ties all those pieces together to give us the
> cluster-wide view? If there is something out there, great... I'm not a
> fan of reinventing the wheel either :)
>
>
>
> On Mon, Mar 20, 2017 at 8:54 PM, Brad Hubbard <bhubbard@redhat.com> wrote:
>>
>>
>> On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> John/Sage, thanks for the clarification and info. At this stage, I'll
>>> stick with the data I have with John's caveats.
>>>
>>> The challenge in understanding the load going on in a cluster is
>>> definitely interesting since the choke points are different depending
>>> on whether you look at the cluster through a hardware or software
>>> 'lens'.
>>>
>>> I think the interesting question is how does a customer know how
>>> 'full' their cluster is from a performance standpoint - ie. when do I
>>> need to buy more or different hardware? Holy grail type stuff :)
>>>
>>> Is there any work going on in this space, perhaps analyzing the
>>> underlying components within the cluster like cpu, ram or disk util
>>> rates across the nodes?
>>
>> Wouldn't this be reinventing the wheel, since this is something that tools
>> like pcp (or collectd?) already do very well?
>>
>>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>>>> On Tue, 14 Mar 2017, John Spray wrote:
>>>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> > First of all - thanks John for your patience!
>>>>> >
>>>>> > I guess, I still can't get past the different metrics being used -
>>>>> > client I/O is described in one way, recovery in another and yet
>>>>> > fundamentally they both send ops to the OSDs, right? To me, what's
>>>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>>>> > be a higher level 'product' of lower level information - for example
>>>>> > recovering_objects_per_sec : is this not a product of multiple
>>>>> > read/write ops to OSDs?
>>>>>
>>>>> While there is data being moved around, it would be misleading to say
>>>>> it's all just ops.  The path that client ops go down is different to
>>>>> the path that recovery messages go down.  Recovery data is gathered up
>>>>> into big vectors of object extents that are sent between OSDs, client
>>>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>>>> from 10 different clients is not directly comparable to an OSD
>>>>> servicing an MOSDPush message from another OSD that happens to contain
>>>>> updates to 10 objects.
>>>>>
>>>>> Client ops are also logically meaningful to consumers of the
>>>>> cluster, while the recovery stuff is a total implementation detail.
>>>>> The implementation of recovery could change any time, and any counter
>>>>> generated from it will only be meaningful to someone who understands
>>>>> how recovery works on that particular version of the ceph code.
>>>>>
>>>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>>>> > a great view of object level recovery - I was just hoping for common
>>>>> > metrics for the OSD ops that are shared by client and recovery
>>>>> > activity.
>>>>> >
>>>>> > Since this isn't the case, what's the recommended way to determine how
>>>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>>>
>>>>> I would say again that how busy a cluster is doing its job (client
>>>>> IO) is a very separate thing from how busy it is doing internal
>>>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>>>> (as people sometimes do) -- a cluster that was killing itself with
>>>>> recovery and completely blocking its clients would look like it was
>>>>> going nice and fast.  In my view, exposing two separate numbers is the
>>>>> right thing to do, not a shortcoming.
>>>>>
>>>>> If you truly want to come up with some kind of single metric then you
>>>>> can: you could take the rate of change of the objects recovered for
>>>>> example.  If you wanted to, you could think of finishing recovery of
>>>>> one object as an "op".  I would tend to think of this as the job of a
>>>>> higher level tool though, rather than a collectd plugin.  Especially
>>>>> if the collectd plugin is meant to be general purpose, it should avoid
>>>>> inventing things like this.
>>>>
>>>> I think the only other option is to take a measurement at a lower layer.
>>>> BlueStore doesn't currently but could easily have metrics for bytes read
>>>> and written.  But again, this is a secondary product of client and
>>>> recovery: a client write, for example, will result in 3 writes across 3
>>>> osds (in a 3x replicated pool).
>>>>
>>>> sage
>>>>
>>>>
>>>>  >
>>>>> John
>>>>>
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > .
>>>>> >
>>>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>>> >>>> response to a recovery operation should be the same as the metrics for
>>>>> >>>> client I/O.
>>>>> >>>
>>>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>>>> >>> doing as a result.
>>>>> >>>
>>>>> >>> That's why you don't get an apples-to-apples comparison between client
>>>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>>>> >>> that no longer makes sense.
>>>>> >>>
>>>>> >>>> So in the context of a recovery operation, one OSD would
>>>>> >>>> report a read (recovery source) and another report a write (recovery
>>>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>>>> >>>> provides transparency, and maybe helps potential automation.
>>>>> >>>
>>>>> >>> Okay, so if we were talking about disk IO counters, this would
>>>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>>>> >>> write), but if you had a counter that was telling you how many Ceph
>>>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>>>> >>> received) the totals would just be zero.
>>>>> >>
>>>>> >> Sorry, that should have said the totals would just be equal.
>>>>> >>
>>>>> >> John
>>>>> >>
>>>>> >>>
>>>>> >>> John
>>>>> >>>
>>>>> >>>>
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>>>>>> Thanks John
>>>>> >>>>>>>>
>>>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>> >>>>>>>> following;
>>>>> >>>>>>>> {
>>>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>> >>>>>>>> "pool_id": 94,
>>>>> >>>>>>>> "recovery": {},
>>>>> >>>>>>>> "recovery_rate": {},
>>>>> >>>>>>>> "client_io_rate": {
>>>>> >>>>>>>> "read_bytes_sec": 19242365,
>>>>> >>>>>>>> "write_bytes_sec": 0,
>>>>> >>>>>>>> "read_op_per_sec": 12514,
>>>>> >>>>>>>> "write_op_per_sec": 0
>>>>> >>>>>>>> }
>>>>> >>>>>>>>
>>>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>> >>>>>>>> client load.
>>>>> >>>>>>>
>>>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>> >>>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>>>> >>>>>>
>>>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>> >>>>>> against the pool
>>>>> >>>>>>
>>>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>> >>>>>> load is coming from recovery.
>>>>> >>>>>
>>>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>>>> >>>>> some data to another OSD) to be read vs. write?
>>>>> >>>>>
>>>>> >>>>> John
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Cheers,
>> Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
  2017-03-10  9:55 ` John Spray
@ 2017-03-20 14:20 ` Ruben Kerkhof
  2017-03-21  1:31   ` Paul Cuzner
  1 sibling, 1 reply; 18+ messages in thread
From: Ruben Kerkhof @ 2017-03-20 14:20 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: ceph-devel

On Fri, Mar 10, 2017 at 3:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Hi,

Hi Paul,

>
> I've been putting together a collectd plugin for ceph - since the old
> one's I could find no longer work.

Did you try the ceph plugin in upstream collectd?
It has been in collectd since 5.5.

Kind regards,

Ruben Kerkhof

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20 14:20 ` Ruben Kerkhof
@ 2017-03-21  1:31   ` Paul Cuzner
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-21  1:31 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: ceph-devel

I'm working on downstream stuff (RHEL) which made the plugins a little
harder to find - but yes, I've tried the ceph plugin (@ 5.7).

With the plugin on a mon I can see all the counters from perf dump,
but since I'm working on benchmarking, I wanted more :)

- how many hosts in the cluster
- what the client ops performance looks like, by pool and in total
- what recovery work is going on, by pool and in total

tbh, I haven't used the plugin on the osd nodes - I've been looking
for higher-level metrics to try to get a cluster-wide view (see the
sketch below).
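
Something like this - just a sketch, assuming the python-rados
bindings (rados.Rados.mon_command) and that "osd tree" returns json
the same way "osd pool stats" does; the helper and key names I've
introduced (mon_json, cluster_summary, the totals dict) are made up:

import json
import rados

def mon_json(cluster, prefix):
    # issue a mon command and decode the json reply
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": prefix, "format": "json"}), b'')
    if ret != 0:
        raise RuntimeError("%s failed: %s" % (prefix, errs))
    return json.loads(out)

def cluster_summary(conffile='/etc/ceph/ceph.conf'):
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        pools = mon_json(cluster, "osd pool stats")   # list of per-pool dicts
        tree = mon_json(cluster, "osd tree")          # CRUSH tree nodes
    finally:
        cluster.shutdown()

    totals = {"hosts": sum(1 for n in tree.get("nodes", [])
                           if n.get("type") == "host"),
              "client_bytes_sec": 0, "client_ops_sec": 0,
              "recovery_bytes_sec": 0, "recovery_objects_sec": 0}
    for p in pools:
        cio = p.get("client_io_rate", {})
        rec = p.get("recovery_rate", {})
        totals["client_bytes_sec"] += (cio.get("read_bytes_sec", 0) +
                                       cio.get("write_bytes_sec", 0))
        totals["client_ops_sec"] += (cio.get("read_op_per_sec", 0) +
                                     cio.get("write_op_per_sec", 0))
        totals["recovery_bytes_sec"] += rec.get("recovering_bytes_per_sec", 0)
        totals["recovery_objects_sec"] += rec.get("recovering_objects_per_sec", 0)
    return totals

if __name__ == '__main__':
    print(cluster_summary())

Per John's earlier caveat, I'd still surface the client and recovery
totals side by side rather than fold them into one 'cluster busy'
number.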




On Tue, Mar 21, 2017 at 3:20 AM, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> On Fri, Mar 10, 2017 at 3:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Hi,
>
> Hi Paul,
>
>>
>> I've been putting together a collectd plugin for ceph - since the old
>> one's I could find no longer work.
>
> Did you try the ceph plugin in upstream collectd?
> It has been in collectd since 5.5.
>
> Kind regards,
>
> Ruben Kerkhof

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-21  1:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
2017-03-10  9:55 ` John Spray
2017-03-10 20:52   ` Paul Cuzner
2017-03-11 20:49     ` John Spray
2017-03-11 21:24       ` Paul Cuzner
2017-03-12 12:13         ` John Spray
2017-03-13 21:50           ` Paul Cuzner
2017-03-13 22:13             ` John Spray
2017-03-13 22:14               ` John Spray
2017-03-14  3:13                 ` Paul Cuzner
2017-03-14  9:49                   ` John Spray
2017-03-14 13:13                     ` Sage Weil
2017-03-20  3:57                       ` Paul Cuzner
2017-03-20  7:54                         ` Brad Hubbard
2017-03-20  8:40                           ` Paul Cuzner
2017-03-20  8:41                             ` Paul Cuzner
2017-03-20 14:20 ` Ruben Kerkhof
2017-03-21  1:31   ` Paul Cuzner
