* Proposition - latency histogram
@ 2016-11-28 16:22 Bartłomiej Święcki
  2016-11-28 16:46 ` Allen Samuels
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bartłomiej Święcki @ 2016-11-28 16:22 UTC (permalink / raw)
  To: Ceph Development

Hi,


Currently we can query an OSD for op latency, but it's given as an average.
An average may not give the best information in this case - e.g. spikes can
easily get hidden in it.

Instead of an average we could easily do a simple histogram - quantize the
latency into a predefined set of time intervals, keep a simple performance
counter for each of them, and increment one of them on each op. Since those
are per OSD, we could have pretty high resolution with a small fraction of
the memory usage; the performance impact should be negligible since only one
counter (two if split into read and write) would be incremented per OSD op.

In addition we could also do this in 2D - each counter matching a given
latency range and op size range. Having such a 2D table would show the
latency histogram, the request size histogram, and combinations of those
(e.g. the latency histogram of ~4k ops only).
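As a rough illustration of the mechanism (a hypothetical C++ sketch, not
actual Ceph code - names and bucket counts are made up), the per-OSD 2D
counter grid could look like this, with both axes log2-quantized:

    #include <array>
    #include <atomic>
    #include <cstdint>

    class OpHistogram2D {
      static constexpr int LAT_BUCKETS  = 32;  // latency in ns, log2-quantized
      static constexpr int SIZE_BUCKETS = 32;  // op size in bytes, log2-quantized
      std::array<std::array<std::atomic<uint64_t>, SIZE_BUCKETS>, LAT_BUCKETS> bins{};

      static int bucket(uint64_t v, int max) {
        int b = 0;
        while (v >>= 1) ++b;           // floor(log2(v)); 0 for v <= 1
        return b < max ? b : max - 1;  // clamp overflows into the last bucket
      }

    public:
      // One relaxed atomic increment per op, so the overhead stays negligible.
      void record(uint64_t latency_ns, uint64_t op_bytes) {
        bins[bucket(latency_ns, LAT_BUCKETS)][bucket(op_bytes, SIZE_BUCKETS)]
            .fetch_add(1, std::memory_order_relaxed);
      }
    };

Summing across one axis recovers the 1D latency or size histogram, and
slicing a single size bucket gives the latency histogram of just those ops.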

What do you think about this idea? I can prepare some code - a simple
proof of concept looks really straightforward to implement.


Bartek


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Proposition - latency histogram
  2016-11-28 16:22 Proposition - latency histogram Bartłomiej Święcki
@ 2016-11-28 16:46 ` Allen Samuels
  2016-11-28 23:05   ` Milosz Tanski
  2016-11-28 16:51 ` Sage Weil
  2017-01-09 11:27 ` Bartłomiej Święcki
  2 siblings, 1 reply; 11+ messages in thread
From: Allen Samuels @ 2016-11-28 16:46 UTC (permalink / raw)
  To: Bartłomiej Święcki, Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Bartlomiej Swiecki
> Sent: Monday, November 28, 2016 8:22 AM
> To: Ceph Development <ceph-devel@vger.kernel.org>
> Subject: Proposition - latency histogram
> 
> Hi,
> 
> 
> Currently we can query an OSD for op latency, but it's given as an average.
> An average may not give the best information in this case - e.g. spikes can
> easily get hidden in it.
>
> Instead of an average we could easily do a simple histogram - quantize the
> latency into a predefined set of time intervals, keep a simple performance
> counter for each of them, and increment one of them on each op. Since those
> are per OSD, we could have pretty high resolution with a small fraction of
> the memory usage; the performance impact should be negligible since only one
> counter (two if split into read and write) would be incremented per OSD op.
> 

+1

A reminder: there are different latency domains for the different media types (flash, HDD). One solution is to make the buckets parameterized.
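For example (a sketch, with hypothetical names - the edges would come from
per-media configuration rather than being hardcoded):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct ParamHistogram {
      std::vector<uint64_t> edges;   // ascending latency bounds (us), from config
      std::vector<uint64_t> counts;  // edges.size() + 1 buckets
      explicit ParamHistogram(std::vector<uint64_t> e)
        : edges(std::move(e)), counts(edges.size() + 1, 0) {}
      void record(uint64_t latency_us) {
        // index of the first edge greater than the sample
        ++counts[std::upper_bound(edges.begin(), edges.end(), latency_us)
                 - edges.begin()];
      }
    };

A flash OSD might then configure edges like {50, 100, 250, 500, 1000} us
while an HDD OSD uses {1000, 5000, 10000, 50000, 100000} us.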

> In addition we could also do this in 2D - each counter matching a given
> latency range and op size range. Having such a 2D table would show the
> latency histogram, the request size histogram, and combinations of those
> (e.g. the latency histogram of ~4k ops only).
>
> What do you think about this idea? I can prepare some code - a simple
> proof of concept looks really straightforward to implement.
> 
> 
> Bartek
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2016-11-28 16:22 Proposition - latency histogram Bartłomiej Święcki
  2016-11-28 16:46 ` Allen Samuels
@ 2016-11-28 16:51 ` Sage Weil
  2016-11-28 17:43   ` John Spray
  2017-01-09 11:27 ` Bartłomiej Święcki
  2 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2016-11-28 16:51 UTC (permalink / raw)
  To: Bartłomiej Święcki; +Cc: Ceph Development

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1362 bytes --]

On Mon, 28 Nov 2016, Bartłomiej Święcki wrote:
> Hi,
> 
> Currently we can query an OSD for op latency, but it's given as an average.
> An average may not give the best information in this case - e.g. spikes can
> easily get hidden in it.
>
> Instead of an average we could easily do a simple histogram - quantize the
> latency into a predefined set of time intervals, keep a simple performance
> counter for each of them, and increment one of them on each op. Since those
> are per OSD, we could have pretty high resolution with a small fraction of
> the memory usage; the performance impact should be negligible since only one
> counter (two if split into read and write) would be incremented per OSD op.
>
> In addition we could also do this in 2D - each counter matching a given
> latency range and op size range. Having such a 2D table would show the
> latency histogram, the request size histogram, and combinations of those
> (e.g. the latency histogram of ~4k ops only).
>
> What do you think about this idea? I can prepare some code - a simple
> proof of concept looks really straightforward to implement.

This sounds like a great idea.  I think the main issue is that the data 
won't be easily exposed via the perfcounter interface... at least not in a 
way that generic tools can visualize.  Unless there is a standardish way 
to report histogram metrics?

sage

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2016-11-28 16:51 ` Sage Weil
@ 2016-11-28 17:43   ` John Spray
  2016-11-29  2:43     ` Josh Durgin
  0 siblings, 1 reply; 11+ messages in thread
From: John Spray @ 2016-11-28 17:43 UTC (permalink / raw)
  To: Sage Weil; +Cc: Bartłomiej Święcki, Ceph Development

On Mon, Nov 28, 2016 at 4:51 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 28 Nov 2016, Bartłomiej Święcki wrote:
>> Hi,
>>
>> Currently we can query an OSD for op latency, but it's given as an average.
>> An average may not give the best information in this case - e.g. spikes can
>> easily get hidden in it.
>>
>> Instead of an average we could easily do a simple histogram - quantize the
>> latency into a predefined set of time intervals, keep a simple performance
>> counter for each of them, and increment one of them on each op. Since those
>> are per OSD, we could have pretty high resolution with a small fraction of
>> the memory usage; the performance impact should be negligible since only one
>> counter (two if split into read and write) would be incremented per OSD op.
>>
>> In addition we could also do this in 2D - each counter matching a given
>> latency range and op size range. Having such a 2D table would show the
>> latency histogram, the request size histogram, and combinations of those
>> (e.g. the latency histogram of ~4k ops only).
>>
>> What do you think about this idea? I can prepare some code - a simple
>> proof of concept looks really straightforward to implement.
>
> This sounds like a great idea.  I think the main issue is that the data
> won't be easily exposed via the perfcounter interface... at least not in a
> way that generic tools can visualize.  Unless there is a standardish way
> to report histogram metrics?

Newer tools are waking up to the need for histograms, e.g. Prometheus
has a histogram datatype:
https://prometheus.io/docs/concepts/metric_types/#histogram

Someone has done some work on adding support in grafana:
https://github.com/grafana/grafana/issues/600

Should be reasonably straightforward to add a histogram type to the
perf counters: people might end up flattening it to a series of scalar
time series with _bucket suffixes or whatever, but I'd definitely be
in favour of us adding an explicit histogram type internally.
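For example, the flattened form would look roughly like this in
Prometheus's text exposition format (metric name hypothetical; the `le`
buckets are cumulative by convention):

    ceph_osd_op_latency_seconds_bucket{le="0.001"} 24054
    ceph_osd_op_latency_seconds_bucket{le="0.01"} 33444
    ceph_osd_op_latency_seconds_bucket{le="0.1"} 100392
    ceph_osd_op_latency_seconds_bucket{le="+Inf"} 144320
    ceph_osd_op_latency_seconds_sum 53423.2
    ceph_osd_op_latency_seconds_count 144320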

John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2016-11-28 16:46 ` Allen Samuels
@ 2016-11-28 23:05   ` Milosz Tanski
  0 siblings, 0 replies; 11+ messages in thread
From: Milosz Tanski @ 2016-11-28 23:05 UTC (permalink / raw)
  To: Allen Samuels; +Cc: Bartłomiej Święcki, Ceph Development

On Mon, Nov 28, 2016 at 11:46 AM, Allen Samuels
<Allen.Samuels@sandisk.com> wrote:
>
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> > owner@vger.kernel.org] On Behalf Of Bartlomiej Swiecki
> > Sent: Monday, November 28, 2016 8:22 AM
> > To: Ceph Development <ceph-devel@vger.kernel.org>
> > Subject: Proposition - latency histogram
> >
> > Hi,
> >
> >
> > Currently we can query an OSD for op latency, but it's given as an average.
> > An average may not give the best information in this case - e.g. spikes can
> > easily get hidden in it.
> >
> > Instead of an average we could easily do a simple histogram - quantize the
> > latency into a predefined set of time intervals, keep a simple performance
> > counter for each of them, and increment one of them on each op. Since those
> > are per OSD, we could have pretty high resolution with a small fraction of
> > the memory usage; the performance impact should be negligible since only one
> > counter (two if split into read and write) would be incremented per OSD op.
> >
>
> +1
>
> A reminder: there are different latency domains for the different media types (flash, HDD). One solution is to make the buckets parameterized.


The histogram can be represented using a Count-Min Sketch, which can
compress a lot of buckets into a small space, giving us more resolution on
the X axis in exchange for some error on the Y axis. You can later
transform it on the fly into something that is closely related to the
buckets you want to use. If you have a cluster that uses different
kinds of storage (nvme, ssd, spinning disk and maybe EC) you will end
up with values all over the map (as you mentioned).

And while approximate, a Count-Min Sketch should be enough to estimate
and show a visual representation of the PDF or CDF
(probability/cumulative density function) from the discretized estimate.
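A minimal sketch of the structure (illustrative only - a real
implementation would use pairwise-independent hash functions rather than
a perturbed std::hash):

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    // Count-Min Sketch: d rows of w counters; an item's estimate is the
    // minimum of its counters, so collisions can only inflate the result.
    class CountMinSketch {
      size_t w, d;
      std::vector<std::vector<uint64_t>> rows;
      size_t slot(uint64_t key, size_t row) const {
        // toy per-row hash: perturb the key with a row-dependent constant
        return std::hash<uint64_t>{}(key ^ (0x9e3779b97f4a7c15ULL * (row + 1))) % w;
      }
    public:
      CountMinSketch(size_t width, size_t depth)
        : w(width), d(depth), rows(depth, std::vector<uint64_t>(width, 0)) {}
      void add(uint64_t key) {                 // e.g. key = quantized latency
        for (size_t r = 0; r < d; ++r) ++rows[r][slot(key, r)];
      }
      uint64_t estimate(uint64_t key) const {
        uint64_t m = UINT64_MAX;
        for (size_t r = 0; r < d; ++r) m = std::min(m, rows[r][slot(key, r)]);
        return m;
      }
    };

Roughly, with width w and depth d the overestimate stays within about
e*N/w with probability 1 - e^-d, so a few kilobytes of counters can cover
a very fine-grained key space.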

There are also other sketches for doing histograms like these, but I'm
less familiar with them. I'm guessing that somebody with a
stats/science background can point to them.


>
>
> > In addition we could also do this in 2D - each counter matching a given
> > latency range and op size range. Having such a 2D table would show the
> > latency histogram, the request size histogram, and combinations of those
> > (e.g. the latency histogram of ~4k ops only).
> >
> > What do you think about this idea? I can prepare some code - a simple
> > proof of concept looks really straightforward to implement.
> >
> >
> > Bartek
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2016-11-28 17:43   ` John Spray
@ 2016-11-29  2:43     ` Josh Durgin
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Durgin @ 2016-11-29  2:43 UTC (permalink / raw)
  To: John Spray, Sage Weil; +Cc: Bartłomiej Święcki, Ceph Development

On 11/28/2016 09:43 AM, John Spray wrote:
> On Mon, Nov 28, 2016 at 4:51 PM, Sage Weil <sage@newdream.net> wrote:
>> On Mon, 28 Nov 2016, Bartłomiej Święcki wrote:
>>> Hi,
>>>
>>> Currently we can query an OSD for op latency, but it's given as an average.
>>> An average may not give the best information in this case - e.g. spikes can
>>> easily get hidden in it.
>>>
>>> Instead of an average we could easily do a simple histogram - quantize the
>>> latency into a predefined set of time intervals, keep a simple performance
>>> counter for each of them, and increment one of them on each op. Since those
>>> are per OSD, we could have pretty high resolution with a small fraction of
>>> the memory usage; the performance impact should be negligible since only one
>>> counter (two if split into read and write) would be incremented per OSD op.
>>>
>>> In addition we could also do this in 2D - each counter matching a given
>>> latency range and op size range. Having such a 2D table would show the
>>> latency histogram, the request size histogram, and combinations of those
>>> (e.g. the latency histogram of ~4k ops only).
>>>
>>> What do you think about this idea? I can prepare some code - a simple
>>> proof of concept looks really straightforward to implement.
>>
>> This sounds like a great idea.  I think the main issue is that the data
>> won't be easily exposed via the perfcounter interface... at least not in a
>> way that generic tools can visualize.  Unless there is a standardish way
>> to report histogram metrics?
>
> Newer tools are waking up to the need for histograms, e.g. Prometheus
> has a histogram datatype:
> https://prometheus.io/docs/concepts/metric_types/#histogram
>
> Someone has done some work on adding support in grafana:
> https://github.com/grafana/grafana/issues/600
>
> Should be reasonably straightforward to add a histogram type to the
> perf counters: people might end up flattening it to a series of scalar
> time series with _bucket suffixes or whatever, but I'd definitely be
> in favour of us adding an explicit histogram type internally.

There are also existing libraries like HdrHistogram that have nice
serialized formats that could be extracted in windowed intervals for
monitoring systems, or later analysis, and have existing scripts for
graphing [0].
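For instance, with the C port (HdrHistogram_c), recording and querying
look roughly like this (a sketch from memory - check the library docs for
exact usage):

    #include <hdr/hdr_histogram.h>
    #include <cstdio>

    int main() {
      struct hdr_histogram *h = nullptr;
      // Track 1 us .. 1 hour at 3 significant digits of precision.
      hdr_init(1, 3600LL * 1000 * 1000, 3, &h);
      hdr_record_value(h, 1234);  // one op's latency, in usec
      std::printf("p99 = %lld us\n",
                  (long long)hdr_value_at_percentile(h, 99.0));
      hdr_close(h);
      return 0;
    }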

It also has support for correcting the reporting of outliers in common
benchmark architectures ("coordinated omission"), which would be handy
for a number of our benchmarks [1][2][3].

Josh

[0] https://hdrhistogram.github.io/HdrHistogram/
[1] http://psy-lob-saw.blogspot.com/2015/02/hdrhistogram-better-latency-capture.html
[2] http://repository.cmu.edu/cgi/viewcontent.cgi?article=1872&context=compsci
[3] http://www.azulsystems.com/sites/default/files/images/HowNotToMeasureLatency_LLSummit_NYC_12Nov2013.pdf

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2016-11-28 16:22 Proposition - latency histogram Bartłomiej Święcki
  2016-11-28 16:46 ` Allen Samuels
  2016-11-28 16:51 ` Sage Weil
@ 2017-01-09 11:27 ` Bartłomiej Święcki
  2017-01-31 15:22   ` Bartłomiej Święcki
  2 siblings, 1 reply; 11+ messages in thread
From: Bartłomiej Święcki @ 2017-01-09 11:27 UTC (permalink / raw)
  To: Ceph Development

Hi,

I've made a simple implementation of performance histograms. The
implementation is not very sophisticated, but I think it could be a good
start for a more detailed discussion.

Here's the PR: https://github.com/ceph/ceph/pull/12829


Regards,
Bartek


On 11/28/2016 05:22 PM, Bartłomiej Święcki wrote:
> Hi,
>
>
> Currently we can query an OSD for op latency, but it's given as an average.
> An average may not give the best information in this case - e.g. spikes can
> easily get hidden in it.
>
> Instead of an average we could easily do a simple histogram - quantize the
> latency into a predefined set of time intervals, keep a simple performance
> counter for each of them, and increment one of them on each op. Since those
> are per OSD, we could have pretty high resolution with a small fraction of
> the memory usage; the performance impact should be negligible since only one
> counter (two if split into read and write) would be incremented per OSD op.
>
> In addition we could also do this in 2D - each counter matching a given
> latency range and op size range. Having such a 2D table would show the
> latency histogram, the request size histogram, and combinations of those
> (e.g. the latency histogram of ~4k ops only).
>
> What do you think about this idea? I can prepare some code - a simple
> proof of concept looks really straightforward to implement.
>
>
> Bartek
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2017-01-09 11:27 ` Bartłomiej Święcki
@ 2017-01-31 15:22   ` Bartłomiej Święcki
  2017-01-31 15:33     ` Bartłomiej Święcki
  0 siblings, 1 reply; 11+ messages in thread
From: Bartłomiej Święcki @ 2017-01-31 15:22 UTC (permalink / raw)
  To: Ceph Development

Hi,

Bringing back performance histograms:
https://github.com/ceph/ceph/pull/12829
I've updated the PR, rebased it on master, and made the internal changes
less aggressive.

All ctest tests are passing and I haven't seen any issues with performance
(and I can now actually see much more clearly what the performance
characteristics are).

Waiting for your comments,
Bartek




On 01/09/2017 12:27 PM, Bartłomiej Święcki wrote:
> Hi,
>
> I've made a simple implementation of performance histograms. The
> implementation is not very sophisticated, but I think it could be a good
> start for a more detailed discussion.
>
> Here's the PR: https://github.com/ceph/ceph/pull/12829
>
>
> Regards,
> Bartek
>
>
> On 11/28/2016 05:22 PM, Bartłomiej Święcki wrote:
>> Hi,
>>
>>
>> Currently we can query an OSD for op latency, but it's given as an average.
>> An average may not give the best information in this case - e.g. spikes can
>> easily get hidden in it.
>>
>> Instead of an average we could easily do a simple histogram - quantize the
>> latency into a predefined set of time intervals, keep a simple performance
>> counter for each of them, and increment one of them on each op. Since those
>> are per OSD, we could have pretty high resolution with a small fraction of
>> the memory usage; the performance impact should be negligible since only one
>> counter (two if split into read and write) would be incremented per OSD op.
>>
>> In addition we could also do this in 2D - each counter matching a given
>> latency range and op size range. Having such a 2D table would show the
>> latency histogram, the request size histogram, and combinations of those
>> (e.g. the latency histogram of ~4k ops only).
>>
>> What do you think about this idea? I can prepare some code - a simple
>> proof of concept looks really straightforward to implement.
>>
>>
>> Bartek
>>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2017-01-31 15:22   ` Bartłomiej Święcki
@ 2017-01-31 15:33     ` Bartłomiej Święcki
  2017-02-06 11:54       ` John Spray
  0 siblings, 1 reply; 11+ messages in thread
From: Bartłomiej Święcki @ 2017-01-31 15:33 UTC (permalink / raw)
  To: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]

Attached is a sample output from the test Python script (included in the
PR) that displays live results.


On 01/31/2017 04:22 PM, Bartłomiej Święcki wrote:
> Hi,
>
> Bringing back performance histograms:
> https://github.com/ceph/ceph/pull/12829
> I've updated the PR, rebased it on master, and made the internal changes
> less aggressive.
>
> All ctest tests are passing and I haven't seen any issues with performance
> (and I can now actually see much more clearly what the performance
> characteristics are).
>
> Waiting for your comments,
> Bartek
>
>
>
>
> On 01/09/2017 12:27 PM, Bartłomiej Święcki wrote:
>> Hi,
>>
>> I've made a simple implementation of performance histograms. The
>> implementation is not very sophisticated, but I think it could be a good
>> start for a more detailed discussion.
>>
>> Here's the PR: https://github.com/ceph/ceph/pull/12829
>>
>>
>> Regards,
>> Bartek
>>
>>
>> On 11/28/2016 05:22 PM, Bartłomiej Święcki wrote:
>>> Hi,
>>>
>>>
>>> Currently we can query an OSD for op latency, but it's given as an average.
>>> An average may not give the best information in this case - e.g. spikes can
>>> easily get hidden in it.
>>>
>>> Instead of an average we could easily do a simple histogram - quantize the
>>> latency into a predefined set of time intervals, keep a simple performance
>>> counter for each of them, and increment one of them on each op. Since those
>>> are per OSD, we could have pretty high resolution with a small fraction of
>>> the memory usage; the performance impact should be negligible since only one
>>> counter (two if split into read and write) would be incremented per OSD op.
>>>
>>> In addition we could also do this in 2D - each counter matching a given
>>> latency range and op size range. Having such a 2D table would show the
>>> latency histogram, the request size histogram, and combinations of those
>>> (e.g. the latency histogram of ~4k ops only).
>>>
>>> What do you think about this idea? I can prepare some code - a simple
>>> proof of concept looks really straightforward to implement.
>>>
>>>
>>> Bartek
>>>
>>
>


[-- Attachment #2: Screenshot - 01312017 - 04:29:15 PM.png --]
[-- Type: image/png, Size: 70819 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2017-01-31 15:33     ` Bartłomiej Święcki
@ 2017-02-06 11:54       ` John Spray
  2017-02-07 16:30         ` Bartłomiej Święcki
  0 siblings, 1 reply; 11+ messages in thread
From: John Spray @ 2017-02-06 11:54 UTC (permalink / raw)
  To: Bartłomiej Święcki; +Cc: Ceph Development

On Tue, Jan 31, 2017 at 3:33 PM, Bartłomiej Święcki
<bartlomiej.swiecki@corp.ovh.com> wrote:
> Attached is a sample output from test python script (included in PR)
> to display live results.

This is very cool, and inspired me to build a colored version into a
toy GUI that I've been using to exercise ceph-mgr.
http://imgur.com/a/TG5kc (it's a super-primitive rendering using a linear
color scale on the cells of a <table>)

I did that by exposing the perf counters via the MCommand (`tell`)
interface on the OSD so that the UI could poll them.

I think there are two main use cases for these plots:
 * System-wide (average of OSDs): what are my clients doing/experiencing?
 * Individual OSD: is this OSD healthy?  Potentially we would plot this
as a delta against the system-wide average to highlight OSDs behaving
badly.

Currently, ordinary perf counters are getting shipped back to ceph-mgr
continuously, so we would need to decide whether we want to do the
same for the larger histogram ones, or whether we would expose them
via `tell` so that any interested parties could fetch them on demand.
The main benefit to continuously sending them would be that ceph-mgr
could maintain a continuous sum/average across all the OSDs.  The cost
depends on how widely we use this data type: if there were only a few
histograms per osd (osd read, osd write, store read, store write),
then I suspect we could get away with transmitting them around quite
freely.
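For the sum/average case the merge itself is cheap - element-wise
addition of the per-OSD grids (a sketch, names hypothetical):

    #include <cstdint>
    #include <vector>

    using Grid = std::vector<std::vector<uint64_t>>;

    // ceph-mgr side: fold one OSD's 2D histogram into a cluster-wide total.
    // Assumes all OSDs report the same bucket layout.
    void merge_into(Grid& total, const Grid& osd) {
      for (size_t i = 0; i < total.size(); ++i)
        for (size_t j = 0; j < total[i].size(); ++j)
          total[i][j] += osd[i][j];
    }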

The 2D data is awesome and I can't see us not wanting this, though
there will also be at least some key places we want 1D data,
especially for the MDS where metadata ops don't have a size dimension.

John


>
>
> On 01/31/2017 04:22 PM, Bartłomiej Święcki wrote:
>>
>> Hi,
>>
>> Bringing back performance histograms:
>> https://github.com/ceph/ceph/pull/12829
>> I've updated the PR, rebased it on master, and made the internal changes
>> less aggressive.
>>
>> All ctest tests are passing and I haven't seen any issues with performance
>> (and I can now actually see much more clearly what the performance
>> characteristics are).
>>
>> Waiting for your comments,
>> Bartek
>>
>>
>>
>>
>> On 01/09/2017 12:27 PM, Bartłomiej Święcki wrote:
>>>
>>> Hi,
>>>
>>> I've made a simple implementation of performance histograms. The
>>> implementation is not very sophisticated, but I think it could be a good
>>> start for a more detailed discussion.
>>>
>>> Here's the PR: https://github.com/ceph/ceph/pull/12829
>>>
>>>
>>> Regards,
>>> Bartek
>>>
>>>
>>> On 11/28/2016 05:22 PM, Bartłomiej Święcki wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> Currently we can query an OSD for op latency, but it's given as an average.
>>>> An average may not give the best information in this case - e.g. spikes can
>>>> easily get hidden in it.
>>>>
>>>> Instead of an average we could easily do a simple histogram - quantize the
>>>> latency into a predefined set of time intervals, keep a simple performance
>>>> counter for each of them, and increment one of them on each op. Since those
>>>> are per OSD, we could have pretty high resolution with a small fraction of
>>>> the memory usage; the performance impact should be negligible since only one
>>>> counter (two if split into read and write) would be incremented per OSD op.
>>>>
>>>> In addition we could also do this in 2D - each counter matching a given
>>>> latency range and op size range. Having such a 2D table would show the
>>>> latency histogram, the request size histogram, and combinations of those
>>>> (e.g. the latency histogram of ~4k ops only).
>>>>
>>>> What do you think about this idea? I can prepare some code - a simple
>>>> proof of concept looks really straightforward to implement.
>>>>
>>>>
>>>> Bartek
>>>>
>>>
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Proposition - latency histogram
  2017-02-06 11:54       ` John Spray
@ 2017-02-07 16:30         ` Bartłomiej Święcki
  0 siblings, 0 replies; 11+ messages in thread
From: Bartłomiej Święcki @ 2017-02-07 16:30 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

Hi John,

This looks awesome! I really hope to see this kind of tool in ceph-mgr.

Actually, the idea of adding histograms came up after some nasty
performance issues we investigated in one of our production clusters,
where this kind of tool would have helped us identify the issue much
more quickly.

What I think would be really helpful is to:
  1) be able to automatically detect anomalies - in the case of this 2D
     histogram it could be as simple as splitting the whole square into a
     set of regions (i.e. regions meaning: OK, WARN, ERROR) and checking
     whether there's activity inside them (see the sketch below)
  2) if there's an anomaly, provide manual tools to "zoom" into the issue
     so that the cause can be precisely investigated
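A minimal sketch of 1), assuming rectangular severity regions over the
(latency, size) bucket grid (all names hypothetical):

    #include <cstdint>
    #include <vector>

    enum class Severity { OK, WARN, ERROR };

    // A rectangular region of the 2D histogram with an attached severity.
    struct Region {
      int lat_lo, lat_hi, size_lo, size_hi;  // inclusive bucket index bounds
      Severity sev;
    };

    // Worst severity among regions that saw any activity since the last check.
    Severity classify(const std::vector<std::vector<uint64_t>>& delta,
                      const std::vector<Region>& regions) {
      Severity worst = Severity::OK;
      for (const auto& r : regions) {
        uint64_t hits = 0;
        for (int i = r.lat_lo; i <= r.lat_hi; ++i)
          for (int j = r.size_lo; j <= r.size_hi; ++j)
            hits += delta[i][j];
        if (hits > 0 && r.sev > worst) worst = r.sev;
      }
      return worst;
    }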

From what I understood so far, ceph-mgr is going exactly in that direction.

Regards,
Bartek

On 02/06/2017 12:54 PM, John Spray wrote:
> On Tue, Jan 31, 2017 at 3:33 PM, Bartłomiej Święcki
> <bartlomiej.swiecki@corp.ovh.com> wrote:
>> Attached is a sample output from test python script (included in PR)
>> to display live results.
> This is very cool, and inspired me to build a colored version into a
> toy GUI that I've been using to exercise ceph-mgr.
> http://imgur.com/a/TG5kc (it's a super-primitive rendering using a linear
> color scale on the cells of a <table>)
>
> I did that by exposing the perf counters via the MCommand (`tell`)
> interface on the OSD so that the UI could poll them.
>
> I think there are two main use cases for these plots:
>   * System-wide (average of OSDs): what are my clients doing/experiencing?
>   * Individual OSD: is this OSD healthy?  Potentially we would plot this
> as a delta against the system-wide average to highlight OSDs behaving
> badly.
>
> Currently, ordinary perf counters are getting shipped back to ceph-mgr
> continuously, so we would need to decide whether we want to do the
> same for the larger histogram ones, or whether we would expose them
> via `tell` so that any interested parties could fetch them on demand.
> The main benefit to continuously sending them would be that ceph-mgr
> could maintain a continuous sum/average across all the OSDs.  The cost
> depends on how widely we use this data type: if there were only a few
> histograms per osd (osd read, osd write, store read, store write),
> then I suspect we could get away with transmitting them around quite
> freely.
>
> The 2D data is awesome and I can't see us not wanting this, though
> there will also be at least some key places we want 1D data,
> especially for the MDS where metadata ops don't have a size dimension.
>
> John
>
>
>>
>> On 01/31/2017 04:22 PM, Bartłomiej Święcki wrote:
>>> Hi,
>>>
>>> Bringing back performance histograms:
>>> https://github.com/ceph/ceph/pull/12829
>>> I've updated the PR, rebased it on master, and made the internal changes
>>> less aggressive.
>>>
>>> All ctest tests are passing and I haven't seen any issues with performance
>>> (and I can now actually see much more clearly what the performance
>>> characteristics are).
>>>
>>> Waiting for your comments,
>>> Bartek
>>>
>>>
>>>
>>>
>>> On 01/09/2017 12:27 PM, Bartłomiej Święcki wrote:
>>>> Hi,
>>>>
>>>> I've made a simple implementation of performance histograms. The
>>>> implementation is not very sophisticated, but I think it could be a good
>>>> start for a more detailed discussion.
>>>>
>>>> Here's the PR: https://github.com/ceph/ceph/pull/12829
>>>>
>>>>
>>>> Regards,
>>>> Bartek
>>>>
>>>>
>>>> On 11/28/2016 05:22 PM, Bartłomiej Święcki wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> Currently we can query an OSD for op latency, but it's given as an average.
>>>>> An average may not give the best information in this case - e.g. spikes can
>>>>> easily get hidden in it.
>>>>>
>>>>> Instead of an average we could easily do a simple histogram - quantize the
>>>>> latency into a predefined set of time intervals, keep a simple performance
>>>>> counter for each of them, and increment one of them on each op. Since those
>>>>> are per OSD, we could have pretty high resolution with a small fraction of
>>>>> the memory usage; the performance impact should be negligible since only one
>>>>> counter (two if split into read and write) would be incremented per OSD op.
>>>>>
>>>>> In addition we could also do this in 2D - each counter matching a given
>>>>> latency range and op size range. Having such a 2D table would show the
>>>>> latency histogram, the request size histogram, and combinations of those
>>>>> (e.g. the latency histogram of ~4k ops only).
>>>>>
>>>>> What do you think about this idea? I can prepare some code - a simple
>>>>> proof of concept looks really straightforward to implement.
>>>>>
>>>>>
>>>>> Bartek
>>>>>
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-02-07 16:30 UTC | newest]

Thread overview: 11+ messages
2016-11-28 16:22 Proposition - latency histogram Bartłomiej Święcki
2016-11-28 16:46 ` Allen Samuels
2016-11-28 23:05   ` Milosz Tanski
2016-11-28 16:51 ` Sage Weil
2016-11-28 17:43   ` John Spray
2016-11-29  2:43     ` Josh Durgin
2017-01-09 11:27 ` Bartłomiej Święcki
2017-01-31 15:22   ` Bartłomiej Święcki
2017-01-31 15:33     ` Bartłomiej Święcki
2017-02-06 11:54       ` John Spray
2017-02-07 16:30         ` Bartłomiej Święcki
