From: Dan van der Ster
Subject: Re: ceph mgr balancer bad distribution
Date: Thu, 1 Mar 2018 10:40:51 +0100
To: Stefan Priebe - Profihost AG
Cc: "ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org", "ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", spandankumarsahu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
List-Id: ceph-devel.vger.kernel.org

On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster wrote:
> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
> wrote:
>>
>> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>> wrote:
>>>> Hi,
>>>>
>>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>> wrote:
>>>>>> Hi,
>>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>>> Is the score improving?
>>>>>>>
>>>>>>> ceph balancer eval
>>>>>>>
>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>
>>>>>>> You mentioned a crush optimization script at the beginning... how did
>>>>>>> that leave your cluster? The mgr balancer assumes that the crush
>>>>>>> weight of each OSD is equal to its size in TB.
>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>> those back to 1.0.
>>>>>>
>>>>>> I reweighted them all back to their correct weight.
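For reference, the crush-weight assumption Dan describes can be sanity-checked numerically. This is my own sketch (the helper name is hypothetical, not part of Ceph): it computes what the balancer would expect an OSD's crush weight to be from its raw size.

```python
# Hedged sketch: the mgr balancer assumes each OSD's crush weight
# matches its size in TiB. This helper (hypothetical, not in Ceph)
# derives the expected weight from a raw size in bytes.
def expected_crush_weight(size_bytes):
    return size_bytes / (1 << 40)  # bytes -> TiB

# An 864 GiB OSD should therefore carry a crush weight of about 0.84,
# which matches the 0.84000 weights in the `ceph osd df` output below.
print(round(expected_crush_weight(864 * (1 << 30)), 2))
```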
>>>>>>
>>>>>> Now the mgr balancer module says:
>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>
>>>>>> But as you can see it's heavily imbalanced:
>>>>>>
>>>>>> Example:
>>>>>> 49 ssd 0.84000 1.00000 864G 546G 317G 63.26 1.13 49
>>>>>>
>>>>>> vs:
>>>>>>
>>>>>> 48 ssd 0.84000 1.00000 864G 397G 467G 45.96 0.82 49
>>>>>>
>>>>>> 45% usage vs. 63%
>>>>>
>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>> that you have a relatively large number of empty PGs.
>>>>>
>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>> this result. (I've also observed that the num PGs gets balanced
>>>>> perfectly at the expense of the other score metrics.)
>>>>>
>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>> score weight on pgs, objects, and bytes so we can balance how we like.
>>>>>
>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>> can be improved as I suggested?
>>>>
>>>> Yes, the PGs are perfectly distributed - but I think most people
>>>> would like to have a distribution by bytes and not by PGs.
>>>>
>>>> Is this possible? I mean, in the code there is already a dict for pgs,
>>>> objects, and bytes - but I don't know how to change the logic. Just
>>>> remove the pgs and objects from the dict?
>>>
>>> It's worth a try to remove the pgs and objects from this dict:
>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>
>> Do I have to change this 3 to 1, since we have only one item in the
>> dict? I'm not sure where the 3 comes from.
>> pe.score /= 3 * len(roots)
>>
>
> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
> change that to 1.
>
> I'm trying this on our test cluster here too. The last few lines of
> output from `ceph balancer eval-verbose` will confirm that the score
> is based only on bytes.
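For what it's worth, the divisor question can be illustrated with a toy model. This is my own sketch, not the actual `module.py` code (the function name and dict shape are assumptions): the final score is an average over per-metric, per-root scores, so the divisor has to track how many metrics remain in the dict.

```python
# Toy illustration (NOT the real balancer code): the combined score
# averages over metrics and crush roots, which is why `3 * len(roots)`
# becomes `1 * len(roots)` once only 'bytes' is left in the dict.
def combined_score(per_metric, roots):
    # per_metric: {metric_name: {root: score}} -- hypothetical shape
    total = sum(per_metric[m][r] for m in per_metric for r in roots)
    return total / (len(per_metric) * len(roots))

# With all three metrics, divide by 3 * len(roots)...
three_metrics = combined_score(
    {'pgs': {'default': 0.25},
     'objects': {'default': 0.25},
     'bytes': {'default': 0.25}},
    ['default'])

# ...with only 'bytes' remaining, divide by 1 * len(roots).
bytes_only = combined_score({'bytes': {'default': 0.25}}, ['default'])
```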
>
> But I'm not sure this is going to work -- indeed the score here went
> from ~0.02 to 0.08, but do_crush_compat doesn't manage to find a
> better score.

Maybe this: https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682

I'm trying with that = 'bytes'

-- dan
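Balancing on 'bytes' targets utilization deviation rather than PG counts. A rough sketch of what a bytes-only deviation looks like, using the 546G/864G and 397G/864G figures from the `ceph osd df` output quoted earlier (my own illustration, not the real `do_crush_compat` logic):

```python
# Illustration only: per-OSD utilization deviation from the cluster
# mean, using the two SSD OSDs quoted in the thread.
def utilization_deviation(osds):
    # osds: {osd_id: (used_bytes, total_bytes)} -- hypothetical shape
    mean = sum(u for u, _ in osds.values()) / sum(t for _, t in osds.values())
    return {oid: used / total - mean for oid, (used, total) in osds.items()}

G = 1 << 30
dev = utilization_deviation({49: (546 * G, 864 * G),
                             48: (397 * G, 864 * G)})
# osd.49 sits well above the mean utilization and osd.48 well below --
# exactly the imbalance a bytes-only score would penalize, even though
# both OSDs hold the same number of PGs (49 each).
```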