* ceph osd df
From: Sage Weil @ 2015-01-05 19:03 UTC (permalink / raw)
To: ceph-devel
We see a fair number of issues and confusion with OSD utilization, and
unfortunately there is no easy way to see a summary of the current OSD
utilization state. 'ceph pg dump' includes the raw data but is not very
friendly. 'ceph osd tree' shows weights but not actual utilization.
'ceph health detail' tells you the nearfull OSDs, but only once they
reach the warning threshold.
Opened a ticket for a new command that summarizes just the relevant info:
http://tracker.ceph.com/issues/10452
Suggestions welcome. It's a pretty simple implementation (the mon has
all the info; just need to add the command to present it) so I'm hoping it
can get into hammer. If anyone is interested in doing the
implementation that would be great too!
sage
* Re: ceph osd df
From: Mykola Golub @ 2015-01-10 9:31 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Mon, Jan 05, 2015 at 11:03:40AM -0800, Sage Weil wrote:
> We see a fair number of issues and confusion with OSD utilization, and
> unfortunately there is no easy way to see a summary of the current OSD
> utilization state. 'ceph pg dump' includes the raw data but is not very
> friendly. 'ceph osd tree' shows weights but not actual utilization.
> 'ceph health detail' tells you the nearfull OSDs, but only once they
> reach the warning threshold.
>
> Opened a ticket for a new command that summarizes just the relevant info:
>
> http://tracker.ceph.com/issues/10452
>
> Suggestions welcome. It's a pretty simple implementation (the mon has
> all the info; just need to add the command to present it) so I'm hoping it
> can get into hammer. If anyone is interested in doing the
> implementation that would be great too!
I am interested in implementing this.
Here is my approach, for preliminary review and discussion.
https://github.com/ceph/ceph/pull/3347
Only plain text format is available currently. As both "osd only" and
"tree" outputs look useful, I implemented both and added a "tree"
option to select between them.
In http://tracker.ceph.com/issues/10452#note-2 Travis Rhoden suggested
extending the 'ceph osd tree' command to provide this data instead, but
I prefer to have many small specialized commands instead of one with
a large output. But if other people also think that it is better to add
a '--detail' option to osd tree instead of a new command, I will change this.
Also, I am not sure I understand how the standard deviation should be
calculated. Sage's note in 10452:
- standard deviation (of normalized
actual_osd_utilization/crush_weight/reweight value)
I don't see why utilization should be normalized by the
reweight/crush_weight ratio. As I understand it, the goal is to have
utilization be the same for all devices (thus keeping the deviation as
small as possible), no matter what reweight values we have.
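To make the question concrete, here is a minimal Python sketch (not the PR code) of the summary statistics, computed from the raw per-OSD utilization percentages of the five-OSD example below. Note this is one plausible reading: the DEV a real implementation reports may use a different normalization than the plain population standard deviation used here.

```python
import math

# Per-OSD utilization percentages (from the five-OSD example below).
utils = [38.15, 44.15, 45.66, 44.15, 36.82]

# AVG %UTIL: plain mean of the utilization percentages.
avg = sum(utils) / len(utils)

# VAR: each OSD's utilization relative to the average (1.00 = perfectly
# balanced), with no reweight/crush_weight normalization applied.
var = [u / avg for u in utils]

# DEV: population standard deviation of the raw percentages.
dev = math.sqrt(sum((u - avg) ** 2 for u in utils) / len(utils))

print(f"AVG %UTIL: {avg:.2f}  MIN/MAX VAR: {min(var):.2f}/{max(var):.2f}")
```

This reproduces the 0.88/1.09 MIN/MAX VAR figures in the sample output; the DEV shown there (6.19) is evidently computed with some other formula, which is exactly the ambiguity raised above.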
Some examples of command output for my dev environments:
% ceph osd df
ID WEIGHT REWEIGHT %UTIL VAR
0 1.00 1.00 18.12 1.00
1 1.00 1.00 18.14 1.00
2 1.00 1.00 18.13 1.00
--
AVG %UTIL: 18.13 MIN/MAX VAR: 1.00/1.00 DEV: 0
% ceph osd df tree
ID WEIGHT REWEIGHT %UTIL VAR NAME
-1 3.00 - 18.13 1.00 root default
-2 3.00 - 18.13 1.00 host zhuzha
0 1.00 1.00 18.12 1.00 osd.0
1 1.00 1.00 18.14 1.00 osd.1
2 1.00 1.00 18.13 1.00 osd.2
--
AVG %UTIL: 18.13 MIN/MAX VAR: 1.00/1.00 DEV: 0
% ceph osd df
ID WEIGHT REWEIGHT %UTIL VAR
0 1.00 1.00 38.15 0.91
1 1.00 1.00 44.15 1.06
2 1.00 1.00 45.66 1.09
3 1.00 1.00 44.15 1.06
4 1.00 0.80 36.82 0.88
--
AVG %UTIL: 41.78 MIN/MAX VAR: 0.88/1.09 DEV: 6.19
% ceph osd df tree
ID WEIGHT REWEIGHT %UTIL VAR NAME
-1 5.00 - 41.78 1.00 root default
-2 1.00 - 38.15 0.91 host osd1
0 1.00 1.00 38.15 0.91 osd.0
-3 1.00 - 44.15 1.06 host osd2
1 1.00 1.00 44.15 1.06 osd.1
-4 1.00 - 45.66 1.09 host osd3
2 1.00 1.00 45.66 1.09 osd.2
-5 1.00 - 44.15 1.06 host osd4
3 1.00 1.00 44.15 1.06 osd.3
-6 1.00 - 36.82 0.88 host osd5
4 1.00 0.80 36.82 0.88 osd.4
--
AVG %UTIL: 41.78 MIN/MAX VAR: 0.88/1.09 DEV: 6.19
--
Mykola Golub
* Re: ceph osd df
From: Sage Weil @ 2015-01-10 18:39 UTC (permalink / raw)
To: Mykola Golub; +Cc: ceph-devel
On Sat, 10 Jan 2015, Mykola Golub wrote:
> On Mon, Jan 05, 2015 at 11:03:40AM -0800, Sage Weil wrote:
> > We see a fair number of issues and confusion with OSD utilization, and
> > unfortunately there is no easy way to see a summary of the current OSD
> > utilization state. 'ceph pg dump' includes the raw data but is not very
> > friendly. 'ceph osd tree' shows weights but not actual utilization.
> > 'ceph health detail' tells you the nearfull OSDs, but only once they
> > reach the warning threshold.
> >
> > Opened a ticket for a new command that summarizes just the relevant info:
> >
> > http://tracker.ceph.com/issues/10452
> >
> > Suggestions welcome. It's a pretty simple implementation (the mon has
> > all the info; just need to add the command to present it) so I'm hoping it
> > can get into hammer. If anyone is interested in doing the
> > implementation that would be great too!
>
> I am interested in implementing this.
>
> Here is my approach, for preliminary review and discussion.
>
> https://github.com/ceph/ceph/pull/3347
Awesome! I made a few comments on the pull request.
> Only plain text format is available currently. As both "osd only" and
> "tree" outputs look useful, I implemented both and added a "tree"
> option to select between them.
This sounds fine to me. We will want to include the formatted output
before merging, though!
> In http://tracker.ceph.com/issues/10452#note-2 Travis Rhoden suggested
> extending the 'ceph osd tree' command to provide this data instead, but
> I prefer to have many small specialized commands instead of one with
> a large output. But if other people also think that it is better to add
> a '--detail' option to osd tree instead of a new command, I will change this.
Works for me.
> Also, I am not sure I understand how the standard deviation should be
> calculated. Sage's note in 10452:
>
> - standard deviation (of normalized
> actual_osd_utilization/crush_weight/reweight value)
>
> I don't see why utilization should be normalized by the
> reweight/crush_weight ratio. As I understand it, the goal is to have
> utilization be the same for all devices (thus keeping the deviation as
> small as possible), no matter what reweight values we have.
Yeah, I think you're right. If I'm reading the code correctly, you're
still including reweight in there, but I think it can be safely dropped.
> Some examples of command output for my dev environments:
>
> % ceph osd df
> ID WEIGHT REWEIGHT %UTIL VAR
> 0 1.00 1.00 18.12 1.00
> 1 1.00 1.00 18.14 1.00
> 2 1.00 1.00 18.13 1.00
I wonder if we should try to standardize the table formats. 'ceph osd
tree' currently looks like
# id weight type name up/down reweight
-1 3 root default
-2 3 host maetl
0 1 osd.0 up 1
1 1 osd.1 up 1
2 1 osd.2 up 1
That is, lowercase headers (with a # header prefix). It's also not using
TableFormatter (which it predates).
It's also pretty sloppy with the precision and formatting:
$ ./ceph osd crush reweight osd.1 .0001
reweighted item id 1 name 'osd.1' to 0.0001 in crush map
$ ./ceph osd tree
# id weight type name up/down reweight
-1 2 root default
-2 2 host maetl
0 1 osd.0 up 1
1 9.155e-05 osd.1 up 1
2 1 osd.2 up 1
$ ./ceph osd crush reweight osd.1 .001
reweighted item id 1 name 'osd.1' to 0.001 in crush map
$ ./ceph osd tree
# id weight type name up/down reweight
-1 2.001 root default
-2 2.001 host maetl
0 1 osd.0 up 1
1 0.0009918 osd.1 up 1
2 1 osd.2 up 1
Given that the *actual* precision of these weights is 16.16-bit
fixed-point, that's a lower bound of .00001. I'm not sure we want to
print 1.00000 all the time, though? Although I suppose it's better than
1
2
.00001
In a perfect world I suppose TableFormatter (or whatever) would adjust the
precision of all printed values to the highest precision needed by any
item in the list, but maybe just sticking to 5 digits for
everything is best for simplicity.
Anyway, any interest in making a single stringify_weight() helper and
fixing up 'ceph osd tree' to also use it and TableFormatter too? :)
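For reference, CRUSH weights are stored as 16.16 fixed-point integers (1.0 == 0x10000), which is why the reweighted values above come back quantized. A hypothetical stringify_weight() helper along the lines suggested could look like this sketch (the function names are illustrative, not the actual Ceph API):

```python
FIXED_ONE = 0x10000  # 1.0 in 16.16 fixed point; resolution is 1/65536

def to_fixed(w: float) -> int:
    """Quantize a float weight the way the CRUSH map stores it."""
    return int(w * FIXED_ONE)

def from_fixed(f: int) -> float:
    return f / FIXED_ONE

def stringify_weight(f: int) -> str:
    """Render a 16.16 fixed-point weight with a fixed 5-digit precision."""
    return f"{from_fixed(f):.5f}"

# Quantization explains the odd values in the transcript above:
# .001  -> 65/65536 = 0.0009918...
# .0001 ->  6/65536 = 9.155e-05
print(stringify_weight(to_fixed(0.001)))  # 0.00099
print(stringify_weight(to_fixed(1.0)))    # 1.00000
```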
sage
> --
> AVG %UTIL: 18.13 MIN/MAX VAR: 1.00/1.00 DEV: 0
>
> % ceph osd df tree
> ID WEIGHT REWEIGHT %UTIL VAR NAME
> -1 3.00 - 18.13 1.00 root default
> -2 3.00 - 18.13 1.00 host zhuzha
> 0 1.00 1.00 18.12 1.00 osd.0
> 1 1.00 1.00 18.14 1.00 osd.1
> 2 1.00 1.00 18.13 1.00 osd.2
> --
> AVG %UTIL: 18.13 MIN/MAX VAR: 1.00/1.00 DEV: 0
>
> % ceph osd df
> ID WEIGHT REWEIGHT %UTIL VAR
> 0 1.00 1.00 38.15 0.91
> 1 1.00 1.00 44.15 1.06
> 2 1.00 1.00 45.66 1.09
> 3 1.00 1.00 44.15 1.06
> 4 1.00 0.80 36.82 0.88
> --
> AVG %UTIL: 41.78 MIN/MAX VAR: 0.88/1.09 DEV: 6.19
>
> % ceph osd df tree
> ID WEIGHT REWEIGHT %UTIL VAR NAME
> -1 5.00 - 41.78 1.00 root default
> -2 1.00 - 38.15 0.91 host osd1
> 0 1.00 1.00 38.15 0.91 osd.0
> -3 1.00 - 44.15 1.06 host osd2
> 1 1.00 1.00 44.15 1.06 osd.1
> -4 1.00 - 45.66 1.09 host osd3
> 2 1.00 1.00 45.66 1.09 osd.2
> -5 1.00 - 44.15 1.06 host osd4
> 3 1.00 1.00 44.15 1.06 osd.3
> -6 1.00 - 36.82 0.88 host osd5
> 4 1.00 0.80 36.82 0.88 osd.4
> --
> AVG %UTIL: 41.78 MIN/MAX VAR: 0.88/1.09 DEV: 6.19
>
> --
> Mykola Golub
>
>
* Re: ceph osd df
From: Mykola Golub @ 2015-01-11 16:31 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Sat, Jan 10, 2015 at 10:39:41AM -0800, Sage Weil wrote:
> I wonder if we should try to standardize the table formats. 'ceph osd
> tree' currently looks like
>
> # id weight type name up/down reweight
> -1 3 root default
> -2 3 host maetl
> 0 1 osd.0 up 1
> 1 1 osd.1 up 1
> 2 1 osd.2 up 1
>
> That is, lowercase headers (with a # header prefix). It's also not using
> TableFormatter (which it predates).
>
> It's also pretty sloppy with the precision and formatting:
>
> $ ./ceph osd crush reweight osd.1 .0001
> reweighted item id 1 name 'osd.1' to 0.0001 in crush map
> $ ./ceph osd tree
> # id weight type name up/down reweight
> -1 2 root default
> -2 2 host maetl
> 0 1 osd.0 up 1
> 1 9.155e-05 osd.1 up 1
> 2 1 osd.2 up 1
> $ ./ceph osd crush reweight osd.1 .001
> reweighted item id 1 name 'osd.1' to 0.001 in crush map
> $ ./ceph osd tree
> # id weight type name up/down reweight
> -1 2.001 root default
> -2 2.001 host maetl
> 0 1 osd.0 up 1
> 1 0.0009918 osd.1 up 1
> 2 1 osd.2 up 1
>
> Given that the *actual* precision of these weights is 16.16 bit
> fixed-point, that's a lower bound of .00001. I'm not sure we want to
> print 1.00000 all the time, though? Although I suppose it's better than
>
> 1
> 2
> .00001
>
> In a perfect world I suppose TableFormatter (or whatever) would adjust the
> precision of all printed values to the highest precision needed by any
> item in the list, but maybe just sticking to 5 digits for
> everything is best for simplicity.
>
> Anyway, any interest in making a single stringify_weight() helper and
> fixing up 'ceph osd tree' to also use it and TableFormatter too? :)
Sure :) Thanks for the comments and suggestions. I will come back with
an update.
BTW, wouldn't disk usage in bytes (size, used, avail) be useful in this
output too? I.e., something like below:
# id weight reweight size used avail %util var
0 1.00000 1.00000 886G 171G 670G 19.30 1.00
...
--
total size/used/avail: 886G/171G/670G
avg %util: 41.78 min/max var: 0.88/1.09 dev: 6.19
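A rough sketch of how such a row could be derived from the per-OSD stats the monitor already has. The field and function names here are illustrative only; in real OSD stats, avail comes from statfs, so size does not equal used + avail exactly (filesystem overhead, reserved blocks).

```python
def human_gb(nbytes: int) -> str:
    # Crude binary-gigabyte formatter, for illustration only.
    return f"{nbytes // (1 << 30)}G"

def util_row(osd_id, weight, reweight, kb, kb_used, kb_avail):
    # kb/kb_used/kb_avail are KiB counts as reported by the OSD; avail
    # is reported separately, so it is generally less than kb - kb_used.
    pct = 100.0 * kb_used / kb
    return (osd_id, f"{weight:.5f}", f"{reweight:.5f}",
            human_gb(kb * 1024), human_gb(kb_used * 1024),
            human_gb(kb_avail * 1024), f"{pct:.2f}")

# 886G total, 171G used, 670G avail -> %util 19.30, as in the mock row.
print(util_row(0, 1.0, 1.0, kb=886 << 20, kb_used=171 << 20, kb_avail=670 << 20))
```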
--
Mykola Golub
* Re: ceph osd df
From: Sage Weil @ 2015-01-11 17:33 UTC (permalink / raw)
To: Mykola Golub; +Cc: ceph-devel
On Sun, 11 Jan 2015, Mykola Golub wrote:
> On Sat, Jan 10, 2015 at 10:39:41AM -0800, Sage Weil wrote:
>
> > I wonder if we should try to standardize the table formats. 'ceph osd
> > tree' currently looks like
> >
> > # id weight type name up/down reweight
> > -1 3 root default
> > -2 3 host maetl
> > 0 1 osd.0 up 1
> > 1 1 osd.1 up 1
> > 2 1 osd.2 up 1
> >
> > That is, lowercase headers (with a # header prefix). It's also not using
> > TableFormatter (which it predates).
> >
> > It's also pretty sloppy with the precision and formatting:
> >
> > $ ./ceph osd crush reweight osd.1 .0001
> > reweighted item id 1 name 'osd.1' to 0.0001 in crush map
> > $ ./ceph osd tree
> > # id weight type name up/down reweight
> > -1 2 root default
> > -2 2 host maetl
> > 0 1 osd.0 up 1
> > 1 9.155e-05 osd.1 up 1
> > 2 1 osd.2 up 1
> > $ ./ceph osd crush reweight osd.1 .001
> > reweighted item id 1 name 'osd.1' to 0.001 in crush map
> > $ ./ceph osd tree
> > # id weight type name up/down reweight
> > -1 2.001 root default
> > -2 2.001 host maetl
> > 0 1 osd.0 up 1
> > 1 0.0009918 osd.1 up 1
> > 2 1 osd.2 up 1
> >
> > Given that the *actual* precision of these weights is 16.16 bit
> > fixed-point, that's a lower bound of .00001. I'm not sure we want to
> > print 1.00000 all the time, though? Although I suppose it's better than
> >
> > 1
> > 2
> > .00001
> >
> > In a perfect world I suppose TableFormatter (or whatever) would adjust the
> > precision of all printed values to the highest precision needed by any
> > item in the list, but maybe just sticking to 5 digits for
> > everything is best for simplicity.
> >
> > Anyway, any interest in making a single stringify_weight() helper and
> > fixing up 'ceph osd tree' to also use it and TableFormatter too? :)
>
> Sure :) Thanks for the comments and suggestions. I will come with
> update.
>
> BTW, wouldn't disk usage in bytes (size used avail) be useful in this
> output too? I.e something like below:
>
> # id weight reweight size used avail %util var
> 0 1.00000 1.00000 886G 171G 670G 19.30 1.00
Yeah, sure!
By the way, I took another look and I'm not sure that it is worth
duplicating all of the tree logic for a tree view. It seems easier to
either include this optionally in the tree output (the utilization calc
is simpler than the tree traversal stack)... or generalize it somehow?
sage
* Re: ceph osd df
From: Mykola Golub @ 2015-01-12 8:22 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Sun, Jan 11, 2015 at 09:33:57AM -0800, Sage Weil wrote:
> By the way, I took another look and I'm not sure that it is worth
> duplicating all of the tree logic for a tree view. It seems easier to
> either include this optionally in the tree output (the utilization calc
> is simpler than the tree traversal stack)... or generalize it somehow?
Note, we already have duplication, at least CrushWrapper::dump_tree()
and OSDMap::print_tree(). I will work on a generalization; I am thinking
of a tree dumper in CrushWrapper.
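The generalization could be as simple as one depth-first walker that takes a visitor callback, so each command only supplies its own row renderer. A toy sketch of the idea (not the CrushWrapper API; the bucket layout here is made up):

```python
def walk_tree(children, node, visit, depth=0):
    """Depth-first walk; 'children' maps bucket id -> list of child ids.

    Negative ids are buckets, non-negative ids are OSDs, following the
    CRUSH convention."""
    visit(node, depth)
    for child in children.get(node, []):
        walk_tree(children, child, visit, depth + 1)

# root default (-1) -> host (-2) -> osd.0, osd.1, osd.2
children = {-1: [-2], -2: [0, 1, 2]}

rows = []
walk_tree(children, -1, lambda node, depth: rows.append((depth, node)))
print(rows)  # [(0, -1), (1, -2), (2, 0), (2, 1), (2, 2)]
```

'ceph osd tree' and 'ceph osd df tree' would then differ only in the visitor, keeping the traversal logic in one place.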
--
Mykola Golub
* Re: ceph osd df
From: Mykola Golub @ 2015-01-16 15:51 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Mon, Jan 12, 2015 at 10:22:51AM +0200, Mykola Golub wrote:
> On Sun, Jan 11, 2015 at 09:33:57AM -0800, Sage Weil wrote:
>
> > By the way, I took another look and I'm not sure that it is worth
> > duplicating all of the tree logic for a tree view. It seems easier to
> > either include this optionally in the tree output (the utilization calc
> > is simpler than the tree traversal stack)... or generalize it somehow?
>
> Note, we already have duplication, at least CrushWrapper::dump_tree()
> and OSDMap::print_tree(). I will work on generalization, I think some
> tree dumper in CrushWrapper.
I have updated the code. https://github.com/ceph/ceph/pull/3347
--
Mykola Golub