* ceph mgr balancer bad distribution
@ 2018-02-28 12:47 Stefan Priebe - Profihost AG
       [not found] ` <1ac5678e-ec95-3ab6-38bf-bdb889e1cd23-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-02-28 12:47 UTC (permalink / raw)
  To: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hello,

With Jewel we always used the Python CRUSH optimizer, which gave us a
pretty good distribution of the used space.

Since Luminous we're using the included ceph mgr balancer, but the
distribution is far from perfect and much worse than with the old method.

Is there any way to tune the mgr balancer?

Currently, after a balance, we still have
75% to 92% disk usage, which is a pretty unfair spread.

Greets,
Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found] ` <1ac5678e-ec95-3ab6-38bf-bdb889e1cd23-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-02-28 12:58   ` Dan van der Ster
       [not found]     ` <CABZ+qqmgOb459reQ2=MkhQLBho_O5AM8OA=0PuUQ1Zz=uGrMpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-03-01  7:27   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-02-28 12:58 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hi Stefan,

Which balancer mode are you using? crush-compat scores using a mix of
nobjects, npgs, and size. It's doing pretty well over here as long as
you have a relatively small number of empty PGs.
I believe that upmap uses nPGs only, and I haven't tested it enough
yet to know if it actually improves things.

Also, did you only run one iteration of the balancer? It only moves up
to 5% of objects each iteration, so it can take several to fully
balance things.
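
For reference, the mode and the current score can be checked and changed
from the CLI; a quick sketch with the standard balancer commands:

    ceph balancer status              # active?, mode, queued plans
    ceph balancer mode crush-compat   # or: upmap
    ceph balancer eval                # score of the current distribution (lower is better)
    ceph balancer on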

-- dan


On Wed, Feb 28, 2018 at 1:47 PM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> Hello,
>
> with jewel we always used the python crush optimizer which gave us a
> pretty good distribution fo the used space.
>
> Since luminous we're using the included ceph mgr balancer but the
> distribution is far from perfect and much worse than the old method.
>
> Is there any way to tune the mgr balancer?
>
> Currently after a balance we still have:
> 75% to 92% disk usage which is pretty unfair
>
> Greets,
> Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]     ` <CABZ+qqmgOb459reQ2=MkhQLBho_O5AM8OA=0PuUQ1Zz=uGrMpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-28 13:59       ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-02-28 13:59 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

On 28.02.2018 at 13:58, Dan van der Ster wrote:
> Hi Stefan,
> 
> Which balancer mode are you using? crush-compat scores using a mix of
> nobjects, npgs, and size. It's doing pretty well over here as long as
> you have a relatively small number of empty PGs.
>
> I believe that upmap uses nPGs only, and I haven't tested it enough
> yet to know if it actually improves things.
> 
> Also, did you only run one iteration of the balancer? It only moves up
> to 5% of objects each iteration, so it can take several to fully
> balance things.

crush-compat mode

Yes, only one iteration, but I set max_misplaced to 20%:
    "mgr/balancer/max_misplaced": "20.00",

> 
> -- dan
> 
> 
> On Wed, Feb 28, 2018 at 1:47 PM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> Hello,
>>
>> with jewel we always used the python crush optimizer which gave us a
>> pretty good distribution fo the used space.
>>
>> Since luminous we're using the included ceph mgr balancer but the
>> distribution is far from perfect and much worse than the old method.
>>
>> Is there any way to tune the mgr balancer?
>>
>> Currently after a balance we still have:
>> 75% to 92% disk usage which is pretty unfair
>>
>> Greets,
>> Stefan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found] ` <1ac5678e-ec95-3ab6-38bf-bdb889e1cd23-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  2018-02-28 12:58   ` Dan van der Ster
@ 2018-03-01  7:27   ` Stefan Priebe - Profihost AG
       [not found]     ` <b5d774be-a2e2-b57c-d201-b5df71868d49-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-01  7:27 UTC (permalink / raw)
  To: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Does anybody have some more input?

I have kept the balancer active for 24h now and it is rebalancing 1-3%
every 30 minutes, but the distribution is still bad.

It seems to balance from left to right and then back from right to left...

Greets,
Stefan

On 28.02.2018 at 13:47, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> with jewel we always used the python crush optimizer which gave us a
> pretty good distribution fo the used space.
> 
> Since luminous we're using the included ceph mgr balancer but the
> distribution is far from perfect and much worse than the old method.
> 
> Is there any way to tune the mgr balancer?
> 
> Currently after a balance we still have:
> 75% to 92% disk usage which is pretty unfair
> 
> Greets,
> Stefan
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]     ` <b5d774be-a2e2-b57c-d201-b5df71868d49-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-01  8:03       ` Dan van der Ster
       [not found]         ` <CABZ+qqnQ+GrhRR7+9GmuzBA3STfwmtSzfMpSU2tPZWocMGHB8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01  8:03 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Is the score improving?

    ceph balancer eval

It should be decreasing over time as the variances drop toward zero.

You mentioned a crush optimizer at the beginning... how did that
leave your cluster? The mgr balancer assumes that the crush weight of
each OSD is equal to its size in TB.
Do you have any osd reweights? crush-compat will gradually adjust
those back to 1.0.
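
Both are quick to check, for example:

    ceph osd df tree    # crush WEIGHT, REWEIGHT, %USE and PG count per OSD
    ceph osd tree       # crush weights plus the reweight column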

Cheers, Dan



On Thu, Mar 1, 2018 at 8:27 AM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> Does anybody have some more input?
>
> I keeped the balancer active for 24h now and it is rebalancing 1-3%
> every 30 minutes but the distribution is still bad.
>
> It seems to balance from left to right and than back from right to left...
>
> Greets,
> Stefan
>
> Am 28.02.2018 um 13:47 schrieb Stefan Priebe - Profihost AG:
>> Hello,
>>
>> with jewel we always used the python crush optimizer which gave us a
>> pretty good distribution fo the used space.
>>
>> Since luminous we're using the included ceph mgr balancer but the
>> distribution is far from perfect and much worse than the old method.
>>
>> Is there any way to tune the mgr balancer?
>>
>> Currently after a balance we still have:
>> 75% to 92% disk usage which is pretty unfair
>>
>> Greets,
>> Stefan
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]         ` <CABZ+qqnQ+GrhRR7+9GmuzBA3STfwmtSzfMpSU2tPZWocMGHB8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-01  8:31           ` Stefan Priebe - Profihost AG
       [not found]             ` <da7136f6-cc57-0b28-428c-ccaaef34dfa7-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-01  8:31 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hi,
On 01.03.2018 at 09:03, Dan van der Ster wrote:
> Is the score improving?
> 
>     ceph balancer eval
> 
> It should be decreasing over time as the variances drop toward zero.
> 
> You mentioned a crush optimize code at the beginning... how did that
> leave your cluster? The mgr balancer assumes that the crush weight of
> each OSD is equal to its size in TB.
> Do you have any osd reweights? crush-compat will gradually adjust
> those back to 1.0.

I reweighted them all back to their correct weight.

Now the mgr balancer module says:
mgr[balancer] Failed to find further optimization, score 0.010646

But as you can see it's heavily imbalanced:


Example:
49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49

vs:

48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49

45% usage vs. 63%
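
(Those lines look like `ceph osd df` output; the columns there are ID,
CLASS, WEIGHT, REWEIGHT, SIZE, USE, AVAIL, %USE, VAR and PGS, so both
OSDs carry 49 PGs despite the very different byte usage.) A rough way to
eyeball the spread:

    ceph osd df | sort -n -k8    # sort OSD rows by the %USE column (header/summary lines sort oddly)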

Greets,
Stefan

> 
> Cheers, Dan
> 
> 
> 
> On Thu, Mar 1, 2018 at 8:27 AM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> Does anybody have some more input?
>>
>> I keeped the balancer active for 24h now and it is rebalancing 1-3%
>> every 30 minutes but the distribution is still bad.
>>
>> It seems to balance from left to right and than back from right to left...
>>
>> Greets,
>> Stefan
>>
>> Am 28.02.2018 um 13:47 schrieb Stefan Priebe - Profihost AG:
>>> Hello,
>>>
>>> with jewel we always used the python crush optimizer which gave us a
>>> pretty good distribution fo the used space.
>>>
>>> Since luminous we're using the included ceph mgr balancer but the
>>> distribution is far from perfect and much worse than the old method.
>>>
>>> Is there any way to tune the mgr balancer?
>>>
>>> Currently after a balance we still have:
>>> 75% to 92% disk usage which is pretty unfair
>>>
>>> Greets,
>>> Stefan
>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]             ` <da7136f6-cc57-0b28-428c-ccaaef34dfa7-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-01  8:42               ` Dan van der Ster
       [not found]                 ` <CABZ+qqmONpy74yXqr7e_zt_24aaxcFomPrwz0Mu2ncf0gYW3Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01  8:42 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> Hi,
> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>> Is the score improving?
>>
>>     ceph balancer eval
>>
>> It should be decreasing over time as the variances drop toward zero.
>>
>> You mentioned a crush optimize code at the beginning... how did that
>> leave your cluster? The mgr balancer assumes that the crush weight of
>> each OSD is equal to its size in TB.
>> Do you have any osd reweights? crush-compat will gradually adjust
>> those back to 1.0.
>
> I reweighted them all back to their correct weight.
>
> Now the mgr balancer module says:
> mgr[balancer] Failed to find further optimization, score 0.010646
>
> But as you can see it's heavily imbalanced:
>
>
> Example:
> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>
> vs:
>
> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>
> 45% usage vs. 63%

Ahh... but look, the num PGs are perfectly balanced, which implies
that you have a relatively large number of empty PGs.

But regardless, this is annoying and I expect lots of operators to get
this result. (I've also observed that the num PGs gets balanced
perfectly at the expense of the other score metrics.)

I was thinking of a patch around here [1] that lets operators add a
score weight on pgs, objects, bytes so we can balance how we like.

Spandan: you were the last to look at this function. Do you think it
can be improved as I suggested?

Cheers, Dan

[1] https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L558

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                 ` <CABZ+qqmONpy74yXqr7e_zt_24aaxcFomPrwz0Mu2ncf0gYW3Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-01  8:52                   ` Stefan Priebe - Profihost AG
       [not found]                     ` <3b2c1d04-c7bd-1906-6239-b783e4fd585a-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-01  8:52 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

Hi,

On 01.03.2018 at 09:42, Dan van der Ster wrote:
> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> Hi,
>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>> Is the score improving?
>>>
>>>     ceph balancer eval
>>>
>>> It should be decreasing over time as the variances drop toward zero.
>>>
>>> You mentioned a crush optimize code at the beginning... how did that
>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>> each OSD is equal to its size in TB.
>>> Do you have any osd reweights? crush-compat will gradually adjust
>>> those back to 1.0.
>>
>> I reweighted them all back to their correct weight.
>>
>> Now the mgr balancer module says:
>> mgr[balancer] Failed to find further optimization, score 0.010646
>>
>> But as you can see it's heavily imbalanced:
>>
>>
>> Example:
>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>
>> vs:
>>
>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>
>> 45% usage vs. 63%
> 
> Ahh... but look, the num PGs are perfectly balanced, which implies
> that you have a relatively large number of empty PGs.
> 
> But regardless, this is annoying and I expect lots of operators to get
> this result. (I've also observed that the num PGs is gets balanced
> perfectly at the expense of the other score metrics.)
> 
> I was thinking of a patch around here [1] that lets operators add a
> score weight on pgs, objects, bytes so we can balance how we like.
> 
> Spandan: you were the last to look at this function. Do you think it
> can be improved as I suggested?

Yes, the PGs are perfectly distributed - but I think most people
would like a distribution by bytes and not by PGs.

Is this possible? I mean, in the code there is already a dict for pgs,
objects and bytes - but I don't know how to change the logic. Just
remove the pgs and objects entries from the dict?

> Cheers, Dan
> 
> [1] https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L558
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                     ` <3b2c1d04-c7bd-1906-6239-b783e4fd585a-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-01  8:58                       ` Dan van der Ster
       [not found]                         ` <CABZ+qqkKVsdr+Tch=ZOrpzbbSdmWo-eOdCspWxCRSTnK=buEFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01  8:58 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> Hi,
>
> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>> Hi,
>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>> Is the score improving?
>>>>
>>>>     ceph balancer eval
>>>>
>>>> It should be decreasing over time as the variances drop toward zero.
>>>>
>>>> You mentioned a crush optimize code at the beginning... how did that
>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>> each OSD is equal to its size in TB.
>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>> those back to 1.0.
>>>
>>> I reweighted them all back to their correct weight.
>>>
>>> Now the mgr balancer module says:
>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>
>>> But as you can see it's heavily imbalanced:
>>>
>>>
>>> Example:
>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>
>>> vs:
>>>
>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>
>>> 45% usage vs. 63%
>>
>> Ahh... but look, the num PGs are perfectly balanced, which implies
>> that you have a relatively large number of empty PGs.
>>
>> But regardless, this is annoying and I expect lots of operators to get
>> this result. (I've also observed that the num PGs is gets balanced
>> perfectly at the expense of the other score metrics.)
>>
>> I was thinking of a patch around here [1] that lets operators add a
>> score weight on pgs, objects, bytes so we can balance how we like.
>>
>> Spandan: you were the last to look at this function. Do you think it
>> can be improved as I suggested?
>
> Yes the PGs are perfectly distributed - but i think most of the people
> would like to have a dsitribution by bytes and not pgs.
>
> Is this possible? I mean in the code there is already a dict for pgs,
> objects and bytes - but i don't know how to change the logic. Just
> remove the pgs and objects from the dict?

It's worth a try to remove the pgs and objects from this dict:

https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552

You can update that directly in the Python code on your mgrs. Turn
the balancer off, then fail over to the next mgr so it reloads the
module. Then:

ceph balancer eval
ceph balancer optimize myplan
ceph balancer eval myplan

Does it move in the right direction?
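
Spelled out, the whole test cycle might look like this; a sketch assuming
the active mgr is called mgr.a (`ceph mgr fail` forces a standby to take
over, which reloads the edited module):

    ceph balancer off
    ceph mgr fail a                  # make a standby mgr go active
    ceph balancer eval               # score of the current distribution
    ceph balancer optimize myplan
    ceph balancer eval myplan        # score the plan would produce
    ceph balancer execute myplan     # only if the plan's score is better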

-- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                         ` <CABZ+qqkKVsdr+Tch=ZOrpzbbSdmWo-eOdCspWxCRSTnK=buEFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-01  9:24                           ` Stefan Priebe - Profihost AG
       [not found]                             ` <bea62c27-0faf-1b47-ca1e-9577e98ec6b1-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-01  9:24 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w


On 01.03.2018 at 09:58, Dan van der Ster wrote:
> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> Hi,
>>
>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>> Hi,
>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>> Is the score improving?
>>>>>
>>>>>     ceph balancer eval
>>>>>
>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>
>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>> each OSD is equal to its size in TB.
>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>> those back to 1.0.
>>>>
>>>> I reweighted them all back to their correct weight.
>>>>
>>>> Now the mgr balancer module says:
>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>
>>>> But as you can see it's heavily imbalanced:
>>>>
>>>>
>>>> Example:
>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>
>>>> vs:
>>>>
>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>
>>>> 45% usage vs. 63%
>>>
>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>> that you have a relatively large number of empty PGs.
>>>
>>> But regardless, this is annoying and I expect lots of operators to get
>>> this result. (I've also observed that the num PGs is gets balanced
>>> perfectly at the expense of the other score metrics.)
>>>
>>> I was thinking of a patch around here [1] that lets operators add a
>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>
>>> Spandan: you were the last to look at this function. Do you think it
>>> can be improved as I suggested?
>>
>> Yes the PGs are perfectly distributed - but i think most of the people
>> would like to have a dsitribution by bytes and not pgs.
>>
>> Is this possible? I mean in the code there is already a dict for pgs,
>> objects and bytes - but i don't know how to change the logic. Just
>> remove the pgs and objects from the dict?
> 
> It's worth a try to remove the pgs and objects from this dict:
> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552

Do I have to change this 3 to a 1, since we now have only one item in the dict?
I'm not sure where the 3 comes from:
        pe.score /= 3 * len(roots)


> You can update that directly in the python code on your mgr's. Turn
> the ceph balancer off then failover to the next mgr so it reloads the
> module. Then:
> 
> ceph balancer eval
> ceph balancer optimize myplan
> ceph balancer eval myplan
> 
> Does it move in the right direction?
> 
> -- dan
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                             ` <bea62c27-0faf-1b47-ca1e-9577e98ec6b1-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-01  9:38                               ` Dan van der Ster
       [not found]                                 ` <CABZ+qqnRwQa8Jrg9=DPc5VnzqG4cjq0RvdhfFG74NgLMs_4EwQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01  9:38 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>
> Am 01.03.2018 um 09:58 schrieb Dan van der Ster:
>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>> Hi,
>>>
>>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>> Hi,
>>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>>> Is the score improving?
>>>>>>
>>>>>>     ceph balancer eval
>>>>>>
>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>
>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>> each OSD is equal to its size in TB.
>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>> those back to 1.0.
>>>>>
>>>>> I reweighted them all back to their correct weight.
>>>>>
>>>>> Now the mgr balancer module says:
>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>
>>>>> But as you can see it's heavily imbalanced:
>>>>>
>>>>>
>>>>> Example:
>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>
>>>>> vs:
>>>>>
>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>
>>>>> 45% usage vs. 63%
>>>>
>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>> that you have a relatively large number of empty PGs.
>>>>
>>>> But regardless, this is annoying and I expect lots of operators to get
>>>> this result. (I've also observed that the num PGs is gets balanced
>>>> perfectly at the expense of the other score metrics.)
>>>>
>>>> I was thinking of a patch around here [1] that lets operators add a
>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>
>>>> Spandan: you were the last to look at this function. Do you think it
>>>> can be improved as I suggested?
>>>
>>> Yes the PGs are perfectly distributed - but i think most of the people
>>> would like to have a dsitribution by bytes and not pgs.
>>>
>>> Is this possible? I mean in the code there is already a dict for pgs,
>>> objects and bytes - but i don't know how to change the logic. Just
>>> remove the pgs and objects from the dict?
>>
>> It's worth a try to remove the pgs and objects from this dict:
>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>
> Do i have to change this 3 to 1 cause we have only one item in the dict?
> I'm not sure where the 3 comes from.
>         pe.score /= 3 * len(roots)
>

I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
change that to 1.

I'm trying this on our test cluster here too. The last few lines of
output from `ceph balancer eval-verbose` will confirm that the score
is based only on bytes.
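
In other words, something like this (a sketch; eval-verbose prints the
detailed stats behind the score):

    ceph balancer eval-verbose            # breakdown for the current distribution
    ceph balancer optimize myplan
    ceph balancer eval-verbose myplan     # same breakdown for the proposed plan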

But I'm not sure this is going to work -- indeed the score here went
from ~0.02 to 0.08, but do_crush_compat doesn't manage to find a
better score.

-- Dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                 ` <CABZ+qqnRwQa8Jrg9=DPc5VnzqG4cjq0RvdhfFG74NgLMs_4EwQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-01  9:40                                   ` Dan van der Ster
  2018-03-01 10:30                                     ` Dan van der Ster
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01  9:40 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:
> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>
>> Am 01.03.2018 um 09:58 schrieb Dan van der Ster:
>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>> Hi,
>>>>
>>>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>>> Hi,
>>>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>>>> Is the score improving?
>>>>>>>
>>>>>>>     ceph balancer eval
>>>>>>>
>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>
>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>> each OSD is equal to its size in TB.
>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>> those back to 1.0.
>>>>>>
>>>>>> I reweighted them all back to their correct weight.
>>>>>>
>>>>>> Now the mgr balancer module says:
>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>
>>>>>> But as you can see it's heavily imbalanced:
>>>>>>
>>>>>>
>>>>>> Example:
>>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>>
>>>>>> vs:
>>>>>>
>>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>>
>>>>>> 45% usage vs. 63%
>>>>>
>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>> that you have a relatively large number of empty PGs.
>>>>>
>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>> this result. (I've also observed that the num PGs is gets balanced
>>>>> perfectly at the expense of the other score metrics.)
>>>>>
>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>
>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>> can be improved as I suggested?
>>>>
>>>> Yes the PGs are perfectly distributed - but i think most of the people
>>>> would like to have a dsitribution by bytes and not pgs.
>>>>
>>>> Is this possible? I mean in the code there is already a dict for pgs,
>>>> objects and bytes - but i don't know how to change the logic. Just
>>>> remove the pgs and objects from the dict?
>>>
>>> It's worth a try to remove the pgs and objects from this dict:
>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>
>> Do i have to change this 3 to 1 cause we have only one item in the dict?
>> I'm not sure where the 3 comes from.
>>         pe.score /= 3 * len(roots)
>>
>
> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
> change that to 1.
>
> I'm trying this on our test cluster here too. The last few lines of
> output from `ceph balancer eval-verbose` will confirm that the score
> is based only on bytes.
>
> But I'm not sure this is going to work -- indeed the score here went
> from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
> better score.

Maybe this:

https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682

I'm trying with that = 'bytes'

-- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
  2018-03-01  9:40                                   ` Dan van der Ster
@ 2018-03-01 10:30                                     ` Dan van der Ster
       [not found]                                       ` <CABZ+qqm-gMs9COEg2TVfNwEhVja8mGox00=0y5wQB7Z2QoSjSQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01 10:30 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:
> On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:
>> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>
>>> Am 01.03.2018 um 09:58 schrieb Dan van der Ster:
>>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>> Hi,
>>>>>
>>>>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>>>> Hi,
>>>>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>>>>> Is the score improving?
>>>>>>>>
>>>>>>>>     ceph balancer eval
>>>>>>>>
>>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>>
>>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>>> each OSD is equal to its size in TB.
>>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>>> those back to 1.0.
>>>>>>>
>>>>>>> I reweighted them all back to their correct weight.
>>>>>>>
>>>>>>> Now the mgr balancer module says:
>>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>>
>>>>>>> But as you can see it's heavily imbalanced:
>>>>>>>
>>>>>>>
>>>>>>> Example:
>>>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>>>
>>>>>>> vs:
>>>>>>>
>>>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>>>
>>>>>>> 45% usage vs. 63%
>>>>>>
>>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>>> that you have a relatively large number of empty PGs.
>>>>>>
>>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>>> this result. (I've also observed that the num PGs is gets balanced
>>>>>> perfectly at the expense of the other score metrics.)
>>>>>>
>>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>>
>>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>>> can be improved as I suggested?
>>>>>
>>>>> Yes the PGs are perfectly distributed - but i think most of the people
>>>>> would like to have a dsitribution by bytes and not pgs.
>>>>>
>>>>> Is this possible? I mean in the code there is already a dict for pgs,
>>>>> objects and bytes - but i don't know how to change the logic. Just
>>>>> remove the pgs and objects from the dict?
>>>>
>>>> It's worth a try to remove the pgs and objects from this dict:
>>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>>
>>> Do i have to change this 3 to 1 cause we have only one item in the dict?
>>> I'm not sure where the 3 comes from.
>>>         pe.score /= 3 * len(roots)
>>>
>>
>> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
>> change that to 1.
>>
>> I'm trying this on our test cluster here too. The last few lines of
>> output from `ceph balancer eval-verbose` will confirm that the score
>> is based only on bytes.
>>
>> But I'm not sure this is going to work -- indeed the score here went
>> from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
>> better score.
>
> Maybe this:
>
> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682
>
> I'm trying with that = 'bytes'

That seems to be working. I sent this PR as a start
https://github.com/ceph/ceph/pull/20665

I'm not sure we need to mess with the score function, on second thought.

-- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                       ` <CABZ+qqm-gMs9COEg2TVfNwEhVja8mGox00=0y5wQB7Z2QoSjSQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-01 12:08                                         ` Stefan Priebe - Profihost AG
       [not found]                                           ` <3d244da6-25c2-b6d8-d4c2-a6a28b897509-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-01 12:08 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

Nice, thanks - I will try that soon.

Can you tell me how to change the log level to info for the balancer module?

On 01.03.2018 at 11:30, Dan van der Ster wrote:
> On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:
>> On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:
>>> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>
>>>> Am 01.03.2018 um 09:58 schrieb Dan van der Ster:
>>>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>>>>>> Hi,
>>>>>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>>>>>> Is the score improving?
>>>>>>>>>
>>>>>>>>>     ceph balancer eval
>>>>>>>>>
>>>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>>>
>>>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>>>> each OSD is equal to its size in TB.
>>>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>>>> those back to 1.0.
>>>>>>>>
>>>>>>>> I reweighted them all back to their correct weight.
>>>>>>>>
>>>>>>>> Now the mgr balancer module says:
>>>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>>>
>>>>>>>> But as you can see it's heavily imbalanced:
>>>>>>>>
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>>>>
>>>>>>>> vs:
>>>>>>>>
>>>>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>>>>
>>>>>>>> 45% usage vs. 63%
>>>>>>>
>>>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>>>> that you have a relatively large number of empty PGs.
>>>>>>>
>>>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>>>> this result. (I've also observed that the num PGs is gets balanced
>>>>>>> perfectly at the expense of the other score metrics.)
>>>>>>>
>>>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>>>
>>>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>>>> can be improved as I suggested?
>>>>>>
>>>>>> Yes the PGs are perfectly distributed - but i think most of the people
>>>>>> would like to have a dsitribution by bytes and not pgs.
>>>>>>
>>>>>> Is this possible? I mean in the code there is already a dict for pgs,
>>>>>> objects and bytes - but i don't know how to change the logic. Just
>>>>>> remove the pgs and objects from the dict?
>>>>>
>>>>> It's worth a try to remove the pgs and objects from this dict:
>>>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>>>
>>>> Do i have to change this 3 to 1 cause we have only one item in the dict?
>>>> I'm not sure where the 3 comes from.
>>>>         pe.score /= 3 * len(roots)
>>>>
>>>
>>> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
>>> change that to 1.
>>>
>>> I'm trying this on our test cluster here too. The last few lines of
>>> output from `ceph balancer eval-verbose` will confirm that the score
>>> is based only on bytes.
>>>
>>> But I'm not sure this is going to work -- indeed the score here went
>>> from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
>>> better score.
>>
>> Maybe this:
>>
>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682
>>
>> I'm trying with that = 'bytes'
> 
> That seems to be working. I sent this PR as a start
> https://github.com/ceph/ceph/pull/20665
> 
> I'm not sure we need to mess with the score function, on second thought.
> 
> -- dan
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                           ` <3d244da6-25c2-b6d8-d4c2-a6a28b897509-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-01 12:12                                             ` Dan van der Ster
       [not found]                                               ` <CABZ+qq=xs5CYAXn55JEGbA4OSZayGdvbFnpwDz7AZDa0A0T2aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-01 12:12 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> nice thanks will try that soon.
>
> Can you tell me how to change the log lever to info for the balancer module?

debug mgr = 4/5
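
That goes into ceph.conf on the mgr hosts (picked up after a mgr restart
or failover). If the admin socket is available it can also be changed at
runtime; a sketch, assuming the active mgr id is mgr.a:

    ceph daemon mgr.a config set debug_mgr 4/5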

-- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                               ` <CABZ+qq=xs5CYAXn55JEGbA4OSZayGdvbFnpwDz7AZDa0A0T2aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-02  9:12                                                 ` Stefan Priebe - Profihost AG
       [not found]                                                   ` <88BB07AB-D6C8-4106-953F-2131E56081BD-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  2018-03-02 10:13                                                 ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-02  9:12 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w


Thanks! Your patch works great! The only problem I still see is that the balancer kicks in even when the previous optimization has not finished. It seems it only evaluates the degraded value, but while remapping it can happen that none are degraded while a lot are still misplaced.

I think the balancer should evaluate the ceph health status as well.

Stefan

Excuse my typo sent from my mobile phone.

> Am 01.03.2018 um 13:12 schrieb Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org>:
> 
> On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> nice thanks will try that soon.
>> 
>> Can you tell me how to change the log lever to info for the balancer module?
> 
> debug mgr = 4/5
> 
> -- dan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                               ` <CABZ+qq=xs5CYAXn55JEGbA4OSZayGdvbFnpwDz7AZDa0A0T2aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-03-02  9:12                                                 ` Stefan Priebe - Profihost AG
@ 2018-03-02 10:13                                                 ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-02 10:13 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

Thanks! Your patch works great! The only problem I still see is that the
balancer kicks in even when the previous optimization has not finished.
It seems it only evaluates the degraded value, but while remapping it can
happen that none are degraded while a lot are still misplaced.

I think the balancer should evaluate the ceph health status as well.

Stefan

Excuse my typo sent from my mobile phone.

On 01.03.2018 at 13:12, Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org> wrote:

> On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> nice thanks will try that soon.
>>
>> Can you tell me how to change the log lever to info for the balancer
>> module?
>
> debug mgr = 4/5
>
> -- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                                   ` <88BB07AB-D6C8-4106-953F-2131E56081BD-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-02 13:29                                                     ` Dan van der Ster
       [not found]                                                       ` <CABZ+qqkdAHvkLv0q8ysDhjx+dHC_TCYrcQT9Nv_ddLt0krGzgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Dan van der Ster @ 2018-03-02 13:29 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

On Fri, Mar 2, 2018 at 10:12 AM, Stefan Priebe - Profihost AG
<s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
> Thanks! Your patch works great!

Cool! I plan to add one more feature to allow operators to switch off
components of the score function. Currently, by only changing the key
to 'bytes', we aren't able to fully balance things because at some
point the pgs score gets too suboptimal and the overall score reaches
a min value.

> The only problem I still see is that the
> balancer kicks in even when the old optimize has not finished. It seems it
> only evaluated the degraded of value. But while remapping it can happen that
> none are degraded but a lot are still misplaced.
>
> I think the balancer should evaluate the ceph health status as well.

I guess this point is debatable. On our clusters we use max_misplaced
= 0.01 and set the begin_hour end_hour to daytime hours so that by the
late evening, every day, the cluster is back to HEALTH_OK.
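
For reference, those throttles live under the same mgr/balancer/
config-key prefix; a sketch (the begin_time/end_time key names and the
HHMM format are assumptions here, so check the module source for your
version):

    ceph config-key set mgr/balancer/max_misplaced 0.01   # maximum allowed misplaced ratio
    ceph config-key set mgr/balancer/begin_time 0830      # only balance between these times
    ceph config-key set mgr/balancer/end_time 1800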

Cheers, Dan

> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> Am 01.03.2018 um 13:12 schrieb Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org>:
>
> On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>
> nice thanks will try that soon.
>
>
> Can you tell me how to change the log lever to info for the balancer module?
>
>
> debug mgr = 4/5
>
> -- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                                       ` <CABZ+qqkdAHvkLv0q8ysDhjx+dHC_TCYrcQT9Nv_ddLt0krGzgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-02 20:21                                                         ` Stefan Priebe - Profihost AG
       [not found]                                                           ` <173aba9e-16ae-c9d6-3afa-2c25683b0dbe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-02 20:21 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

Hi,

On 02.03.2018 at 14:29, Dan van der Ster wrote:
> On Fri, Mar 2, 2018 at 10:12 AM, Stefan Priebe - Profihost AG
> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>> Thanks! Your patch works great!
> 
> Cool! I plan to add one more feature to allow operators to switch off
> components of the score function. Currently, by only changing the key
> to 'bytes', we aren't able to fully balance things because at some
> point the pgs score gets too suboptimal and the overall score reaches
> a min value.

OK, great - but it still works perfectly for my clusters: a 2%-3%
difference across all OSDs.

>> The only problem I still see is that the
>> balancer kicks in even when the old optimize has not finished. It seems it
>> only evaluated the degraded of value. But while remapping it can happen that
>> none are degraded but a lot are still misplaced.
>>
>> I think the balancer should evaluate the ceph health status as well.
> 
> I guess this point is debatable. On our clusters we use max_misplaced
> = 0.01 and set the begin_hour end_hour to daytime hours so that by the
> late evening, every day, the cluster is back to HEALTH_OK.

Ah, OK. Is it true that begin_time and end_time are GMT and not local
time? Can we change this? As it stands it makes the configuration of
monitoring systems impossible - the time also shifts between summer and
winter time.

Greets,
Stefan

> Cheers, Dan
> 
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> Am 01.03.2018 um 13:12 schrieb Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org>:
>>
>> On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>
>> nice thanks will try that soon.
>>
>>
>> Can you tell me how to change the log lever to info for the balancer module?
>>
>>
>> debug mgr = 4/5
>>
>> -- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: ceph mgr balancer bad distribution
       [not found]                                                           ` <173aba9e-16ae-c9d6-3afa-2c25683b0dbe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2018-03-03 20:04                                                             ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-03-03 20:04 UTC (permalink / raw)
  To: Dan van der Ster
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w

Hi,

On 02.03.2018 at 21:21, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> Am 02.03.2018 um 14:29 schrieb Dan van der Ster:
>> On Fri, Mar 2, 2018 at 10:12 AM, Stefan Priebe - Profihost AG
>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>> Thanks! Your patch works great!
>>
>> Cool! I plan to add one more feature to allow operators to switch off
>> components of the score function. Currently, by only changing the key
>> to 'bytes', we aren't able to fully balance things because at some
>> point the pgs score gets too suboptimal and the overall score reaches
>> a min value.
> 
> OK great but it still works perfectly for my clusters. 2%-3% difference
> for all OSDs.

OK, I have another cluster where this seems to happen. It just says
"Failed to find further optimization", but the bytes difference is 12%.

>>> The only problem I still see is that the
>>> balancer kicks in even when the old optimize has not finished. It seems it
>>> only evaluated the degraded of value. But while remapping it can happen that
>>> none are degraded but a lot are still misplaced.
>>>
>>> I think the balancer should evaluate the ceph health status as well.
>>
>> I guess this point is debatable. On our clusters we use max_misplaced
>> = 0.01 and set the begin_hour end_hour to daytime hours so that by the
>> late evening, every day, the cluster is back to HEALTH_OK.
> 
> ah OK. Is it true that the begin_time and end_time is GMT and not local
> time? Can we change this as it makes configuration of monitoring systems
> impossible - time changes also with summer and winter time.

Sorry - at least in my case this works fine.

> 
> Greets,
> Stefan
> 
>> Cheers, Dan
>>
>>> Stefan
>>>
>>> Excuse my typo sent from my mobile phone.
>>>
>>> Am 01.03.2018 um 13:12 schrieb Dan van der Ster <dan-EOCVfBHj35C+XT7JhA+gdA@public.gmane.org>:
>>>
>>> On Thu, Mar 1, 2018 at 1:08 PM, Stefan Priebe - Profihost AG
>>> <s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org> wrote:
>>>
>>> nice thanks will try that soon.
>>>
>>>
>>> Can you tell me how to change the log lever to info for the balancer module?
>>>
>>>
>>> debug mgr = 4/5
>>>
>>> -- dan

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-03-03 20:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-28 12:47 ceph mgr balancer bad distribution Stefan Priebe - Profihost AG
     [not found] ` <1ac5678e-ec95-3ab6-38bf-bdb889e1cd23-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-02-28 12:58   ` Dan van der Ster
     [not found]     ` <CABZ+qqmgOb459reQ2=MkhQLBho_O5AM8OA=0PuUQ1Zz=uGrMpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-28 13:59       ` Stefan Priebe - Profihost AG
2018-03-01  7:27   ` Stefan Priebe - Profihost AG
     [not found]     ` <b5d774be-a2e2-b57c-d201-b5df71868d49-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-01  8:03       ` Dan van der Ster
     [not found]         ` <CABZ+qqnQ+GrhRR7+9GmuzBA3STfwmtSzfMpSU2tPZWocMGHB8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-01  8:31           ` Stefan Priebe - Profihost AG
     [not found]             ` <da7136f6-cc57-0b28-428c-ccaaef34dfa7-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-01  8:42               ` Dan van der Ster
     [not found]                 ` <CABZ+qqmONpy74yXqr7e_zt_24aaxcFomPrwz0Mu2ncf0gYW3Ng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-01  8:52                   ` Stefan Priebe - Profihost AG
     [not found]                     ` <3b2c1d04-c7bd-1906-6239-b783e4fd585a-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-01  8:58                       ` Dan van der Ster
     [not found]                         ` <CABZ+qqkKVsdr+Tch=ZOrpzbbSdmWo-eOdCspWxCRSTnK=buEFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-01  9:24                           ` Stefan Priebe - Profihost AG
     [not found]                             ` <bea62c27-0faf-1b47-ca1e-9577e98ec6b1-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-01  9:38                               ` Dan van der Ster
     [not found]                                 ` <CABZ+qqnRwQa8Jrg9=DPc5VnzqG4cjq0RvdhfFG74NgLMs_4EwQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-01  9:40                                   ` Dan van der Ster
2018-03-01 10:30                                     ` Dan van der Ster
     [not found]                                       ` <CABZ+qqm-gMs9COEg2TVfNwEhVja8mGox00=0y5wQB7Z2QoSjSQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-01 12:08                                         ` Stefan Priebe - Profihost AG
     [not found]                                           ` <3d244da6-25c2-b6d8-d4c2-a6a28b897509-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-01 12:12                                             ` Dan van der Ster
     [not found]                                               ` <CABZ+qq=xs5CYAXn55JEGbA4OSZayGdvbFnpwDz7AZDa0A0T2aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-02  9:12                                                 ` Stefan Priebe - Profihost AG
     [not found]                                                   ` <88BB07AB-D6C8-4106-953F-2131E56081BD-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-02 13:29                                                     ` Dan van der Ster
     [not found]                                                       ` <CABZ+qqkdAHvkLv0q8ysDhjx+dHC_TCYrcQT9Nv_ddLt0krGzgg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-02 20:21                                                         ` Stefan Priebe - Profihost AG
     [not found]                                                           ` <173aba9e-16ae-c9d6-3afa-2c25683b0dbe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2018-03-03 20:04                                                             ` Stefan Priebe - Profihost AG
2018-03-02 10:13                                                 ` Stefan Priebe - Profihost AG
