* ceph osd df
@ 2015-01-05 19:03 Sage Weil
  2015-01-10  9:31 ` Mykola Golub
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2015-01-05 19:03 UTC (permalink / raw)
  To: ceph-devel

We see a fair number of issues and confusion with OSD utilization and 
unfortunately there is no easy way to see a summary of the current OSD 
utilization state.  'ceph pg dump' includes raw data but it is not very 
friendly.  'ceph osd tree' shows weights but not actual utilization.  
'ceph health detail' tells you the nearfull osds, but only when they reach 
the warning threshold.

Opened a ticket for a new command that summarizes just the relevant info:

	http://tracker.ceph.com/issues/10452

Suggestions welcome.  It's a pretty simple implementation (the mon has 
all the info; just need to add the command to present it) so I'm hoping it 
can get into hammer.  If anyone is interested in doing the 
implementation that would be great too!

sage


* Re: ceph osd df
  2015-01-05 19:03 ceph osd df Sage Weil
@ 2015-01-10  9:31 ` Mykola Golub
  2015-01-10 18:39   ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Mykola Golub @ 2015-01-10  9:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Mon, Jan 05, 2015 at 11:03:40AM -0800, Sage Weil wrote:
> We see a fair number of issues and confusion with OSD utilization and 
> unfortunately there is no easy way to see a summary of the current OSD 
> utilization state.  'ceph pg dump' includes raw data but it is not very 
> friendly.  'ceph osd tree' shows weights but not actual utilization.  
> 'ceph health detail' tells you the nearfull osds, but only when they reach 
> the warning threshold.
> 
> Opened a ticket for a new command that summarizes just the relevant info:
> 
> 	http://tracker.ceph.com/issues/10452
> 
> Suggestions welcome.  It's a pretty simple implementation (the mon has 
> all the info; just need to add the command to present it) so I'm hoping it 
> can get into hammer.  If anyone is interested in doing the 
> implementation that would be great too!

I am interested in implementing this.

Here is my approach, for preliminary review and discussion.

https://github.com/ceph/ceph/pull/3347

Only plain text format is available currently. As both "osd only" and
"tree" outputs look useful, I implemented both and added a "tree" option
to choose between them.

In http://tracker.ceph.com/issues/10452#note-2 Travis Rhoden suggested
extending the 'ceph osd tree' command to provide this data instead, but
I prefer to have many small specialized commands rather than one command
with large output. But if other people also think that it is better to
add a '--detail' option to 'ceph osd tree' instead of a new command, I
will change this.

Also, I am not sure I understood how the standard deviation should be
calculated. Sage's note in 10452:

 - standard deviation (of normalized
   actual_osd_utilization/crush_weight/reweight value)
   
I don't see why utilization should be normalized by the
reweight/crush_weight ratio. As I understand it, the goal is to have
utilization be the same for all devices (and thus the deviation as small
as possible), no matter what reweight values we have?
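
In other words, what I would compute is simply this (a simplified C++
sketch of the proposal, not the exact PR code):

 #include <cmath>
 #include <cstdio>
 #include <vector>

 // Simplified sketch of the proposal (not the exact PR code):
 // utilization is percent used per OSD, VAR is util/avg, and DEV is the
 // plain standard deviation of the per-OSD utilization -- no
 // reweight/crush_weight normalization.
 void summarize(const std::vector<double>& util)
 {
   double avg = 0;
   for (double u : util)
     avg += u;
   avg /= util.size();

   double sumsq = 0;
   for (double u : util) {
     printf("%%UTIL %.2f VAR %.2f\n", u, u / avg);   // per-OSD VAR column
     sumsq += (u - avg) * (u - avg);
   }
   printf("AVG %%UTIL: %.2f  DEV: %.2f\n",
          avg, std::sqrt(sumsq / util.size()));      // summary line
 }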

Some examples of command output for my dev environments:

 % ceph osd df
 ID WEIGHT REWEIGHT %UTIL VAR  
 0    1.00     1.00 18.12 1.00 
 1    1.00     1.00 18.14 1.00 
 2    1.00     1.00 18.13 1.00 
 --
 AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
 
 % ceph osd df tree
 ID WEIGHT REWEIGHT %UTIL VAR  NAME            
 -1   3.00        - 18.13 1.00 root default    
 -2   3.00        - 18.13 1.00     host zhuzha 
 0    1.00     1.00 18.12 1.00         osd.0   
 1    1.00     1.00 18.14 1.00         osd.1   
 2    1.00     1.00 18.13 1.00         osd.2   
 --
 AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
 
 % ceph osd df
 ID WEIGHT REWEIGHT %UTIL VAR  
 0    1.00     1.00 38.15 0.91 
 1    1.00     1.00 44.15 1.06 
 2    1.00     1.00 45.66 1.09 
 3    1.00     1.00 44.15 1.06 
 4    1.00     0.80 36.82 0.88 
 --
 AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19
 
 % ceph osd df tree
 ID WEIGHT REWEIGHT %UTIL VAR  NAME          
 -1   5.00        - 41.78 1.00 root default  
 -2   1.00        - 38.15 0.91     host osd1 
 0    1.00     1.00 38.15 0.91         osd.0 
 -3   1.00        - 44.15 1.06     host osd2 
 1    1.00     1.00 44.15 1.06         osd.1 
 -4   1.00        - 45.66 1.09     host osd3 
 2    1.00     1.00 45.66 1.09         osd.2 
 -5   1.00        - 44.15 1.06     host osd4 
 3    1.00     1.00 44.15 1.06         osd.3 
 -6   1.00        - 36.82 0.88     host osd5 
 4    1.00     0.80 36.82 0.88         osd.4 
 --
 AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19

-- 
Mykola Golub


* Re: ceph osd df
  2015-01-10  9:31 ` Mykola Golub
@ 2015-01-10 18:39   ` Sage Weil
  2015-01-11 16:31     ` Mykola Golub
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2015-01-10 18:39 UTC (permalink / raw)
  To: Mykola Golub; +Cc: ceph-devel

On Sat, 10 Jan 2015, Mykola Golub wrote:
> On Mon, Jan 05, 2015 at 11:03:40AM -0800, Sage Weil wrote:
> > We see a fair number of issues and confusion with OSD utilization and 
> > unfortunately there is no easy way to see a summary of the current OSD 
> > utilization state.  'ceph pg dump' includes raw data but it is not very 
> > friendly.  'ceph osd tree' shows weights but not actual utilization.  
> > 'ceph health detail' tells you the nearfull osds, but only when they reach 
> > the warning threshold.
> > 
> > Opened a ticket for a new command that summarizes just the relevant info:
> > 
> > 	http://tracker.ceph.com/issues/10452
> > 
> > Suggestions welcome.  It's a pretty simple implementation (the mon has 
> > all the info; just need to add the command to present it) so I'm hoping it 
> > can get into hammer.  If anyone is interested in doing the 
> > implementation that would be great too!
> 
> I am interested in implementing this.
> 
> Here is my approach, for preliminary review and discussion.
>
> https://github.com/ceph/ceph/pull/3347

Awesome!  I made a few comments on the pull request.

> Only plain text format is available currently. As both "osd only" and
> "tree" outputs look useful, I implemented both and added a "tree" option
> to choose between them.

This sounds fine to me.  We will want to include the formatted output 
before merging, though!

> In http://tracker.ceph.com/issues/10452#note-2 Travis Rhoden suggested
> extending the 'ceph osd tree' command to provide this data instead, but
> I prefer to have many small specialized commands rather than one command
> with large output. But if other people also think that it is better to
> add a '--detail' option to 'ceph osd tree' instead of a new command, I
> will change this.

Works for me.
 
> Also, I am not sure I understood how the standard deviation should be
> calculated. Sage's note in 10452:
> 
>  - standard deviation (of normalized
>    actual_osd_utilization/crush_weight/reweight value)
>    
> I don't see why utilization should be normalized by the
> reweight/crush_weight ratio. As I understand it, the goal is to have
> utilization be the same for all devices (and thus the deviation as small
> as possible), no matter what reweight values we have?

Yeah, I think you're right.  If I'm reading the code correctly, you're still 
including reweight in there, but I think it can be safely dropped.

> Some examples of command output for my dev environments:
> 
>  % ceph osd df
>  ID WEIGHT REWEIGHT %UTIL VAR  
>  0    1.00     1.00 18.12 1.00 
>  1    1.00     1.00 18.14 1.00 
>  2    1.00     1.00 18.13 1.00 

I wonder if we should try to standardize the table formats.  'ceph osd 
tree' currently looks like

# id	weight	type name	up/down	reweight
-1	3	root default
-2	3		host maetl
0	1			osd.0	up	1	
1	1			osd.1	up	1	
2	1			osd.2	up	1	

That is, lowercase headers (with a # header prefix).  It's also not using 
TableFormatter (which it predates).

It's also pretty sloppy with the precision and formatting:

$ ./ceph osd crush reweight osd.1 .0001
reweighted item id 1 name 'osd.1' to 0.0001 in crush map
$ ./ceph osd tree
# id	weight	type name	up/down	reweight
-1	2	root default
-2	2		host maetl
0	1			osd.0	up	1	
1	9.155e-05			osd.1	up	1	
2	1			osd.2	up	1	
$ ./ceph osd crush reweight osd.1 .001
reweighted item id 1 name 'osd.1' to 0.001 in crush map
$ ./ceph osd tree
# id	weight	type name	up/down	reweight
-1	2.001	root default
-2	2.001		host maetl
0	1			osd.0	up	1	
1	0.0009918			osd.1	up	1	
2	1			osd.2	up	1	

Given that the *actual* precision of these weights is 16.16 bit 
fixed-point, that's a lower bound of .00001.  I'm not sure we want to 
print 1.00000 all the time, though?  Although I suppose it's better than

      1
      2
 .00001

In a perfect world I suppose TableFormatter (or whatever) would adjust the 
precision of all printed values to the highest precision needed by any 
item in the list, but maybe just sticking to 5 digits for 
everything is best for simplicity.

Anyway, any interest in making a single stringify_weight() helper and 
fixing up 'ceph osd tree' to use it and TableFormatter as well?  :)
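
Something like this is the rough idea for the helper (untested sketch,
and the signature is just a guess; the raw CRUSH weight is 16.16
fixed-point, so always print it with five decimal places):

 #include <cstdio>
 #include <string>

 // Untested sketch: take the raw 16.16 fixed-point CRUSH weight and
 // render it with a fixed five decimal places, so columns line up and
 // we never fall back to scientific notation.
 std::string stringify_weight(unsigned raw)
 {
   char buf[32];
   snprintf(buf, sizeof(buf), "%.5f", (double)raw / 0x10000);
   return std::string(buf);
 }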

sage


>  --
>  AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
>  
>  % ceph osd df tree
>  ID WEIGHT REWEIGHT %UTIL VAR  NAME            
>  -1   3.00        - 18.13 1.00 root default    
>  -2   3.00        - 18.13 1.00     host zhuzha 
>  0    1.00     1.00 18.12 1.00         osd.0   
>  1    1.00     1.00 18.14 1.00         osd.1   
>  2    1.00     1.00 18.13 1.00         osd.2   
>  --
>  AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
>  
>  % ceph osd df
>  ID WEIGHT REWEIGHT %UTIL VAR  
>  0    1.00     1.00 38.15 0.91 
>  1    1.00     1.00 44.15 1.06 
>  2    1.00     1.00 45.66 1.09 
>  3    1.00     1.00 44.15 1.06 
>  4    1.00     0.80 36.82 0.88 
>  --
>  AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19
>  
>  % ceph osd df tree
>  ID WEIGHT REWEIGHT %UTIL VAR  NAME          
>  -1   5.00        - 41.78 1.00 root default  
>  -2   1.00        - 38.15 0.91     host osd1 
>  0    1.00     1.00 38.15 0.91         osd.0 
>  -3   1.00        - 44.15 1.06     host osd2 
>  1    1.00     1.00 44.15 1.06         osd.1 
>  -4   1.00        - 45.66 1.09     host osd3 
>  2    1.00     1.00 45.66 1.09         osd.2 
>  -5   1.00        - 44.15 1.06     host osd4 
>  3    1.00     1.00 44.15 1.06         osd.3 
>  -6   1.00        - 36.82 0.88     host osd5 
>  4    1.00     0.80 36.82 0.88         osd.4 
>  --
>  AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19
> 
> -- 
> Mykola Golub
> 
> 


* Re: ceph osd df
  2015-01-10 18:39   ` Sage Weil
@ 2015-01-11 16:31     ` Mykola Golub
  2015-01-11 17:33       ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Mykola Golub @ 2015-01-11 16:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Sat, Jan 10, 2015 at 10:39:41AM -0800, Sage Weil wrote:

> I wonder if we should try to standardize the table formats.  'ceph osd 
> tree' currently looks like
> 
> # id	weight	type name	up/down	reweight
> -1	3	root default
> -2	3		host maetl
> 0	1			osd.0	up	1	
> 1	1			osd.1	up	1	
> 2	1			osd.2	up	1	
> 
> That is, lowercase headers (with a # header prefix).  It's also not using 
> TableFormatter (which it predates).
> 
> It's also pretty sloppy with the precision and formatting:
> 
> $ ./ceph osd crush reweight osd.1 .0001
> reweighted item id 1 name 'osd.1' to 0.0001 in crush map
> $ ./ceph osd tree
> # id	weight	type name	up/down	reweight
> -1	2	root default
> -2	2		host maetl
> 0	1			osd.0	up	1	
> 1	9.155e-05			osd.1	up	1	
> 2	1			osd.2	up	1	
> $ ./ceph osd crush reweight osd.1 .001
> reweighted item id 1 name 'osd.1' to 0.001 in crush map
> $ ./ceph osd tree
> # id	weight	type name	up/down	reweight
> -1	2.001	root default
> -2	2.001		host maetl
> 0	1			osd.0	up	1	
> 1	0.0009918			osd.1	up	1	
> 2	1			osd.2	up	1	
> 
> Given that the *actual* precision of these weights is 16.16 bit 
> fixed-point, that's a lower bound of .00001.  I'm not sure we want to 
> print 1.00000 all the time, though?  Although I suppose it's better than
> 
>       1
>       2
>  .00001
> 
> In a perfect world I suppose TableFormatter (or whatever) would adjust the 
> precision of all printed values to the highest precision needed by any 
> item in the list, but maybe just sticking to 5 digits for 
> everything is best for simplicity.
> 
> Anyway, any interest in making a single stringify_weight() helper and 
> fixing up 'ceph osd tree' to use it and TableFormatter as well?  :)

Sure :) Thanks for the comments and suggestions. I will follow up with
an update.

BTW, wouldn't disk usage in bytes (size, used, avail) be useful in this
output too? E.g. something like below:

# id weight  reweight size  used avail %util var  
0    1.00000  1.00000 886G  171G  670G 19.30 1.00 
...
--
total size/used/avail: 886G/171G/670G
avg %util: 41.78  min/max var: 0.88/1.09  dev: 6.19
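
The byte columns could be pretty-printed with the usual binary-prefix
suffixes; e.g. a small helper along these lines (hypothetical sketch,
just to illustrate the formatting of the sample above):

 #include <cstdint>
 #include <cstdio>
 #include <string>

 // Hypothetical helper, only to illustrate the formatting: render a
 // byte count with a binary-prefix suffix (886G, 171G, ...).
 std::string pretty_bytes(uint64_t n)
 {
   static const char *units[] = { "", "k", "M", "G", "T", "P" };
   int i = 0;
   double v = n;
   while (v >= 1024 && i < 5) {
     v /= 1024;
     ++i;
   }
   char buf[32];
   snprintf(buf, sizeof(buf), "%.0f%s", v, units[i]);
   return std::string(buf);
 }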

-- 
Mykola Golub


* Re: ceph osd df
  2015-01-11 16:31     ` Mykola Golub
@ 2015-01-11 17:33       ` Sage Weil
  2015-01-12  8:22         ` Mykola Golub
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2015-01-11 17:33 UTC (permalink / raw)
  To: Mykola Golub; +Cc: ceph-devel

On Sun, 11 Jan 2015, Mykola Golub wrote:
> On Sat, Jan 10, 2015 at 10:39:41AM -0800, Sage Weil wrote:
> 
> > I wonder if we should try to standardize the table formats.  'ceph osd 
> > tree' currently looks like
> > 
> > # id	weight	type name	up/down	reweight
> > -1	3	root default
> > -2	3		host maetl
> > 0	1			osd.0	up	1	
> > 1	1			osd.1	up	1	
> > 2	1			osd.2	up	1	
> > 
> > That is, lowercase headers (with a # header prefix).  It's also not using 
> > TableFormatter (which it predates).
> > 
> > It's also pretty sloppy with the precision and formatting:
> > 
> > $ ./ceph osd crush reweight osd.1 .0001
> > reweighted item id 1 name 'osd.1' to 0.0001 in crush map
> > $ ./ceph osd tree
> > # id	weight	type name	up/down	reweight
> > -1	2	root default
> > -2	2		host maetl
> > 0	1			osd.0	up	1	
> > 1	9.155e-05			osd.1	up	1	
> > 2	1			osd.2	up	1	
> > $ ./ceph osd crush reweight osd.1 .001
> > reweighted item id 1 name 'osd.1' to 0.001 in crush map
> > $ ./ceph osd tree
> > # id	weight	type name	up/down	reweight
> > -1	2.001	root default
> > -2	2.001		host maetl
> > 0	1			osd.0	up	1	
> > 1	0.0009918			osd.1	up	1	
> > 2	1			osd.2	up	1	
> > 
> > Given that the *actual* precision of these weights is 16.16 bit 
> > fixed-point, that's a lower bound of .00001.  I'm not sure we want to 
> > print 1.00000 all the time, though?  Although I suppose it's better than
> > 
> >       1
> >       2
> >  .00001
> > 
> > In a perfect world I suppose TableFormatter (or whatever) would adjust the 
> > precision of all printed values to the highest precision needed by any 
> > item in the list, but maybe just sticking to 5 digits for 
> > everything is best for simplicity.
> > 
> > Anyway, any interest in making a single stringify_weight() helper and 
> > fixing up 'ceph osd tree' to use it and TableFormatter as well?  :)
> 
> Sure :) Thanks for the comments and suggestions. I will follow up with
> an update.
> 
> BTW, wouldn't disk usage in bytes (size, used, avail) be useful in this
> output too? E.g. something like below:
> 
> # id weight  reweight size  used avail %util var  
> 0    1.00000  1.00000 886G  171G  670G 19.30 1.00 

Yeah, sure!

By the way, I took another look and I'm not sure that it is worth 
duplicating all of the tree logic for a tree view.  It seems easier to 
either include this optionally in the tree output (the utilization calc is 
simpler than the tree traversal stack)... or generalize it somehow?

sage


* Re: ceph osd df
  2015-01-11 17:33       ` Sage Weil
@ 2015-01-12  8:22         ` Mykola Golub
  2015-01-16 15:51           ` Mykola Golub
  0 siblings, 1 reply; 7+ messages in thread
From: Mykola Golub @ 2015-01-12  8:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Sun, Jan 11, 2015 at 09:33:57AM -0800, Sage Weil wrote:

> By the way, I took another look and I'm not sure that it is worth 
> duplicating all of the tree logic for a tree view.  It seems easier to 
> either include this optionally in the tree output (the utilization calc is 
> simpler than the tree traversal stack)... or generalize it somehow?

Note, we already have duplication, at least CrushWrapper::dump_tree()
and OSDMap::print_tree(). I will work on a generalization, probably
some kind of tree dumper in CrushWrapper.
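
The rough shape I have in mind is to keep the CRUSH tree traversal in
one place and let callers supply only the per-item output, e.g. an
interface like this (hypothetical sketch; the names will certainly
change):

 class CrushWrapper;  // Ceph's existing CRUSH map wrapper

 // Hypothetical sketch: a visitor that is invoked once per bucket or
 // device during a depth-first walk of the CRUSH hierarchy, with the
 // nesting depth available for indentation.
 struct TreeDumpVisitor {
   virtual ~TreeDumpVisitor() {}
   virtual void dump_item(const CrushWrapper& crush, int id, int depth) = 0;
 };

 // Walk the hierarchy from each root and invoke the visitor; 'ceph osd
 // tree' and 'ceph osd df tree' would both be implemented on top of it.
 void dump_tree(const CrushWrapper& crush, TreeDumpVisitor *visitor);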

-- 
Mykola Golub


* Re: ceph osd df
  2015-01-12  8:22         ` Mykola Golub
@ 2015-01-16 15:51           ` Mykola Golub
  0 siblings, 0 replies; 7+ messages in thread
From: Mykola Golub @ 2015-01-16 15:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Mon, Jan 12, 2015 at 10:22:51AM +0200, Mykola Golub wrote:
> On Sun, Jan 11, 2015 at 09:33:57AM -0800, Sage Weil wrote:
> 
> > By the way, I took another look and I'm not sure that it is worth 
> > duplicating all of the tree logic for a tree view.  It seems easier to 
> > either include this optionally in the tree output (the utilization calc is 
> > simpler than the tree traversal stack)... or generalize it somehow?
> 
> Note, we already have duplication, at least CrushWrapper::dump_tree()
> and OSDMap::print_tree(). I will work on a generalization, probably
> some kind of tree dumper in CrushWrapper.

I have updated the code. https://github.com/ceph/ceph/pull/3347

-- 
Mykola Golub



Thread overview: 7+ messages
2015-01-05 19:03 ceph osd df Sage Weil
2015-01-10  9:31 ` Mykola Golub
2015-01-10 18:39   ` Sage Weil
2015-01-11 16:31     ` Mykola Golub
2015-01-11 17:33       ` Sage Weil
2015-01-12  8:22         ` Mykola Golub
2015-01-16 15:51           ` Mykola Golub
