Proposition

* Proposition - latency histogram
@ 2016-11-28 16:22 Bartłomiej Święcki
  2016-11-28 16:46 ` Allen Samuels
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bartłomiej Święcki @ 2016-11-28 16:22 UTC (permalink / raw)
  To: Ceph Development

Hi,

Currently we can query OSD for op latency but it's given as an average. 
Average may not give
the bets information in this case - i.e. spikes can easily get hidden there.

Instead of an average we could easily do a simple histogram - quantize 
the latency into
predefined set of time intervals, for each of them have a simple 
performance counter,
at each op increase one of them. Since those are per OSD, we could have 
pretty high resolution
with fractional memory usage, performance impact should be negligible 
since only one (two if split
into read and write) of those counters would be incremented per one osd op.

In addition we could also do this in 2D - each counter matching given 
latency range and op size range.
having such 2D table would show both latency histogram, request size 
histogram and combinations of those
(i.e. latency histogram of ~4k ops only).

What do you think about this idea? I can prepare some code - a simple 
proof of concept looks really
straightforward to implement.

Bartek

^ permalink raw reply	[flat|nested] 11+ messages in thread