From: Xusangdi <xu.sangdi@h3c.com>
To: "Chen, Xiaoxi" <xiaoxi.chen@intel.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: RE: chooseleaf may cause some unnecessary pg migrations
Date: Mon, 19 Oct 2015 02:08:34 +0000
Message-ID: <FF8D83DAE5DF57468726B79904321E06047D8A@H3CMLB12-EX.srv.huawei-3com.com>
In-Reply-To: <6F3FA899187F0043BA1827A69DA2F7CC03633AAB@shsmsx102.ccr.corp.intel.com>


> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of
> Chen, Xiaoxi
> Sent: Monday, October 19, 2015 9:11 AM
> To: xusangdi 11976 (RD)
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: chooseleaf may cause some unnecessary pg migrations
>
> Sorry but not following...
>
> > then shut down one or more osds (please don't touch the crushmap, just stop the osd service or kill its process).
>
> In this case, OSD is only down but not out, but will be marked out after 300s.
>
> So in what case your patch is helping?
>
>       If you said your patch helps on "down and out" , then my experiment is exactly the case,
>

I am afraid it probably is not. Could you tell me how you simulated the osd "down and out" situation using crushtool? If it was done with arguments such as '--remove-item' or '--reweight-item', then it modified the crushmap, which is not the case I am aiming at. Some details, as I said in the previous discussion:
        > I think the situation is the same:
        >
        Well, I am not sure if this is what you were referring to, but I can confirm that 'mark osd out' is definitely different from 'remove osd from crushmap'. In case it is not clear, by 'crushmap' I mean what we can obtain from the 'ceph osd getcrushmap' command.

        > when the osd is marked "out" the weight[i] value goes to zero and we
        > will reject it at
        >
        > https://github.com/ceph/ceph/pull/6242/files#diff-0df13ad294f6585c322588cfe026d701R535
        >
        Yes, but we do pick this 'out' osd first in the crush_bucket_choose(...) function, which would never return this osd if it had been removed from the crushmap. From my understanding, CRUSH first makes a choice based on values parsed from the crushmap (the crush_bucket_choose function), and then decides whether to actually choose this item depending on real-time weights (the is_out function). Hence, if an osd is only marked out but not manually removed from the crushmap (or maybe it will automatically be removed after a certain time? this is what I asked), the crush_bucket_choose(...) function will return exactly the same results as before on each attempt.

        > (An osd goes down but not out doesn't have any effect on crush--the
        > ceph code just filters those osds out of the crush result.)
In short, from the code's point of view the 'crushmap' is stable while a cluster is running and does not change even when some device goes down and out. So again, if you test by simulating a running cluster, you will see the difference. Please note that this is only my personal understanding based on my own knowledge and experience, and it has not yet been confirmed by Sage or any other expert.
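
To make the distinction concrete, here is a rough sketch of the two-stage behaviour as I understand it, written in Python rather than Ceph's C. Only the names crush_bucket_choose and is_out come from the discussion above; draw(), bucket_choose(), choose(), the SHA-256 hash and the weighting formula are hypothetical stand-ins of mine, not the real CRUSH code. Stage 1 looks only at the crushmap, stage 2 rejects the pick using run-time weights and retries.

```python
import hashlib


def draw(pg, osd, attempt):
    """Deterministic pseudo-random value in [0, 1) for (pg, osd, attempt).
    Toy hash, not the hash CRUSH uses."""
    digest = hashlib.sha256(f"{pg}:{osd}:{attempt}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def bucket_choose(crushmap, pg, attempt):
    """Stage 1, a stand-in for crush_bucket_choose: the pick depends only
    on the crushmap (item -> positive weight), never on run-time state."""
    return max(crushmap, key=lambda osd: draw(pg, osd, attempt) ** (1.0 / crushmap[osd]))


def is_out(runtime_weight, osd):
    """Stage 2, a stand-in for is_out: reject items whose run-time weight is zero."""
    return runtime_weight.get(osd, 0.0) == 0.0


def choose(crushmap, runtime_weight, pg, max_tries=50):
    """Retry stage 1 until stage 2 accepts the pick.
    Returns (accepted osd or None, list of all stage-1 picks)."""
    picks = []
    for attempt in range(max_tries):
        osd = bucket_choose(crushmap, pg, attempt)
        picks.append(osd)
        if not is_out(runtime_weight, osd):
            return osd, picks
    return None, picks


# Same crushmap in both runs; only the run-time weight of osd.2 differs.
crushmap = {f"osd.{i}": 1.0 for i in range(6)}
all_in = dict(crushmap)
marked_out = {**all_in, "osd.2": 0.0}

# A different crushmap: osd.2 removed (roughly what '--remove-item' does).
removed = {osd: w for osd, w in crushmap.items() if osd != "osd.2"}

for pg in range(8):
    _, before = choose(crushmap, all_in, pg)
    final, after = choose(crushmap, marked_out, pg)
    print(pg, "first pick:", before[0], "| marked out ->", final,
          "| removed ->", choose(removed, removed, pg)[0])
```

In this toy model the first stage-1 pick is identical whether osd.2 is "in" or marked "out", because the crushmap is the same; the out osd is only rejected afterwards, which triggers a retry. When osd.2 is removed from the map, stage 1 can never return it in the first place.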

Thank you,
---Sangdi

>       If you said your patch helps on "down but not out" case, then we do need a new way to
> simulate.
>
>

