* crush_reweight_uniform_bucket documentation
From: Loic Dachary @ 2017-01-24 23:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

Hi Sage,

While documenting crush_reweight_bucket[1] I came across something that I don't understand when reweighting uniform buckets[2]. The associated commit[3] is six years old, but maybe you remember why the item_weight had to be adjusted with the average of the weights of the child buckets... but only if there are more buckets than leaves?

Cheers

[1] http://libcrush.org/main/libcrush/blob/wip-2-doxygen/crush/builder.h#L111
[2] http://libcrush.org/main/libcrush/blob/wip-2-doxygen/crush/builder.c#L1282
[3] http://libcrush.org/main/libcrush/commit/60f627f88c6314c5a89bb7119ead907ca8b8ef37

-- 
Loïc Dachary, Artisan Logiciel Libre

* Re: crush_reweight_uniform_bucket documentation
From: Sage Weil @ 2017-01-24 23:43 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Wed, 25 Jan 2017, Loic Dachary wrote:
> Hi Sage,
> 
> While documenting crush_reweight_bucket[1] I came across something that
> I don't understand when reweighting uniform buckets[2]. The associated
> commit[3] is six years old, but maybe you remember why the item_weight
> had to be adjusted with the average of the weights of the child buckets...
> but only if there are more buckets than leaves?

I think it's just a half-hearted attempt to Do The Right Thing when the
situation is nonsensical.  Uniform buckets are meant to be used with leaf
(device) items of fixed weight (item_weight).  If you (ab)use them with
bucket children, the algorithm can't really do the right thing because it
doesn't understand the child bucket weights.  If bucket children outnumber
the leaves, it resets item_weight to their average.
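
The reweighting behaviour discussed here can be modelled in a few lines. This is a minimal Python sketch, not the libcrush C code: the `Bucket` class and the `reweight_uniform` function are hypothetical names, weights are plain integers, and the final step (bucket weight = item_weight times the number of items) is an assumption about how a uniform bucket's total is derived. The only convention borrowed from CRUSH is that bucket ids are negative and device ids are non-negative.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical stand-in for a uniform bucket; not the libcrush struct."""
    item_weight: int                            # weight every item is assumed to have
    items: list = field(default_factory=list)   # item ids: < 0 bucket, >= 0 device
    weight: int = 0                             # total weight of the bucket


def reweight_uniform(bucket, child_weights):
    """Recompute bucket.weight; child_weights maps child bucket id -> weight."""
    child_sum = 0
    n_buckets = 0
    n_leaves = 0
    for item in bucket.items:
        if item < 0:                            # child bucket
            child_sum += child_weights[item]
            n_buckets += 1
        else:                                   # leaf device
            n_leaves += 1
    # The adjustment asked about in the thread: only when child buckets
    # outnumber leaves is item_weight replaced by the children's average.
    if n_buckets > n_leaves:
        bucket.item_weight = child_sum // n_buckets
    # Assumption: a uniform bucket's weight is item_weight times its size.
    bucket.weight = bucket.item_weight * len(bucket.items)
    return bucket.weight


# Two child buckets and one device: the average (300 + 500) // 2 takes over.
mostly_buckets = Bucket(item_weight=100, items=[-2, -3, 0])
reweight_uniform(mostly_buckets, {-2: 300, -3: 500})

# One child bucket and two devices: item_weight is left alone.
mostly_leaves = Bucket(item_weight=100, items=[-2, 0, 1])
reweight_uniform(mostly_leaves, {-2: 300})
```

In the second case the child bucket's weight of 300 is ignored entirely, which is the sense in which the algorithm "doesn't understand the child bucket weights".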

This is probably pointless... we could just remove it, and perhaps warn 
(or error out?) in CrushCompiler if a uniform bucket child is a 
non-device.
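
A check along those lines could look roughly like the following. It is only a sketch of the idea in Python, not CrushCompiler code: `non_device_children` is a hypothetical helper, and it relies solely on the CRUSH convention that bucket ids are negative while device ids are non-negative.

```python
def non_device_children(alg, items):
    """Return the child ids that are buckets when the parent is uniform.

    alg is the bucket algorithm name as written in a CRUSH map
    ("uniform", "straw2", ...); items are the child ids.
    """
    if alg != "uniform":
        return []                       # other bucket types may nest freely
    return [item for item in items if item < 0]


offenders = non_device_children("uniform", [-2, -3, 4])
for item in offenders:
    # A compiler could warn here, or refuse the map outright.
    print(f"warning: uniform bucket has non-device child {item}")
```

Whether this should be a warning or a hard error is the open question; the sketch only shows that detecting the case is cheap.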

s

* Re: crush_reweight_uniform_bucket documentation
From: Loic Dachary @ 2017-01-25  5:39 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development



On 01/25/2017 12:43 AM, Sage Weil wrote:
> On Wed, 25 Jan 2017, Loic Dachary wrote:
>> Hi Sage,
>>
>> While documenting crush_reweight_bucket[1] I came across something that
>> I don't understand when reweighting uniform buckets[2]. The associated
>> commit[3] is six years old, but maybe you remember why the item_weight
>> had to be adjusted with the average of the weights of the child buckets...
>> but only if there are more buckets than leaves?
> 
> I think it's just a half-hearted attempt to Do The Right Thing when the
> situation is nonsensical.  Uniform buckets are meant to be used with leaf
> (device) items of fixed weight (item_weight).  If you (ab)use them with
> bucket children, the algorithm can't really do the right thing because it
> doesn't understand the child bucket weights.  If bucket children outnumber
> the leaves, it resets item_weight to their average.
> 
> This is probably pointless... we could just remove it, and perhaps warn 
> (or error out?) in CrushCompiler if a uniform bucket child is a 
> non-device.

Understood. http://libcrush.org/main/libcrush/issues/8 was created to track that.

Thanks !

-- 
Loïc Dachary, Artisan Logiciel Libre

* Re: crush_reweight_uniform_bucket documentation
From: Wido den Hollander @ 2017-01-25  7:45 UTC (permalink / raw)
  To: Loic Dachary, Sage Weil; +Cc: Ceph Development


> On 25 January 2017 at 0:43, Sage Weil <sweil@redhat.com> wrote:
> 
> 
> On Wed, 25 Jan 2017, Loic Dachary wrote:
> > Hi Sage,
> > 
> > While documenting crush_reweight_bucket[1] I came across something that
> > I don't understand when reweighting uniform buckets[2]. The associated
> > commit[3] is six years old, but maybe you remember why the item_weight
> > had to be adjusted with the average of the weights of the child buckets...
> > but only if there are more buckets than leaves?
> 
> I think it's just a half-hearted attempt to Do The Right Thing when the
> situation is nonsensical.  Uniform buckets are meant to be used with leaf
> (device) items of fixed weight (item_weight).  If you (ab)use them with
> bucket children, the algorithm can't really do the right thing because it
> doesn't understand the child bucket weights.  If bucket children outnumber
> the leaves, it resets item_weight to their average.
> 
> This is probably pointless... we could just remove it, and perhaps warn 
> (or error out?) in CrushCompiler if a uniform bucket child is a 
> non-device.
> 

So I know of a setup which uses something like this:

datacenter dc1 {
    alg straw2
    hash 0
    item rack1
    item rack2
}

root ams {
    alg uniform
    hash 0
    item dc1
    item dc2
    item dc3
    item dc4
}

They want all 3 replicas spread over 3 different DCs, and they want to be able to handle a complete DC failure and recover from it. The aim is to prevent data from shuffling between DCs when a weight is changed in a rack; data may still move inside the DC.
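
The effect they are after can be illustrated with a toy model. The Python sketch below is not CRUSH: SHA-256 stands in for CRUSH's own hash, `pick_uniform` is a simplified weight-blind choice rather than the permutation a real uniform bucket uses, and `pick_weighted` imitates a straw2-style draw (largest ln(u) / weight wins). All names and the DC weights are made up for illustration.

```python
import hashlib
import math


def unit_hash(*parts):
    """Map the inputs to a float in (0, 1]; SHA-256 stands in for CRUSH's hash."""
    digest = hashlib.sha256(":".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def pick_uniform(items, x):
    """Weight-blind choice: only x and the item list matter."""
    return items[int(unit_hash("uniform", x) * len(items)) % len(items)]


def pick_weighted(weights, x):
    """straw2-style choice: each item draws ln(u) / weight, the largest wins."""
    return max(weights, key=lambda item: math.log(unit_hash(item, x)) / weights[item])


before = {"dc1": 40, "dc2": 40, "dc3": 40, "dc4": 40}
after = dict(before, dc1=20)          # weight inside dc1 dropped, e.g. in one rack
inputs = range(1000)

# The uniform root never looks at the weights, so its choices cannot change.
uniform_moves = sum(
    pick_uniform(list(before), x) != pick_uniform(list(after), x) for x in inputs
)
# A weighted root moves some inputs out of dc1 and into the other DCs.
weighted_moves = sum(
    pick_weighted(before, x) != pick_weighted(after, x) for x in inputs
)
print(uniform_moves, weighted_moves)
```

With the weight-blind root nothing moves between DCs; with the weighted root a share of the inputs that used to land on dc1 is reassigned elsewhere, which is the cross-DC shuffling this setup tries to avoid.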

From your comment I understand that was never the intention of uniform buckets?

Wido

* Re: crush_reweight_uniform_bucket documentation
From: Sage Weil @ 2017-01-25 12:47 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: Loic Dachary, Ceph Development

On Wed, 25 Jan 2017, Wido den Hollander wrote:
> > On 25 January 2017 at 0:43, Sage Weil <sweil@redhat.com> wrote:
> > 
> > 
> > On Wed, 25 Jan 2017, Loic Dachary wrote:
> > > Hi Sage,
> > > 
> > > While documenting crush_reweight_bucket[1] I came across something that
> > > I don't understand when reweighting uniform buckets[2]. The associated
> > > commit[3] is six years old, but maybe you remember why the item_weight
> > > had to be adjusted with the average of the weights of the child buckets...
> > > but only if there are more buckets than leaves?
> > 
> > I think it's just a half-hearted attempt to Do The Right Thing when the
> > situation is nonsensical.  Uniform buckets are meant to be used with leaf
> > (device) items of fixed weight (item_weight).  If you (ab)use them with
> > bucket children, the algorithm can't really do the right thing because it
> > doesn't understand the child bucket weights.  If bucket children outnumber
> > the leaves, it resets item_weight to their average.
> > 
> > This is probably pointless... we could just remove it, and perhaps warn 
> > (or error out?) in CrushCompiler if a uniform bucket child is a 
> > non-device.
> > 
> 
> So I know of a setup which uses something like this:
> 
> datacenter dc1 {
>     alg straw2
>     hash 0
>     item rack1
>     item rack2
> }
> 
> root ams {
>     alg uniform
>     hash 0
>     item dc1
>     item dc2
>     item dc3
>     item dc4
> }
> 
> They want all 3 replicas spread over 3 different DCs, and they want to be
> able to handle a complete DC failure and recover from it. The aim is to
> prevent data from shuffling between DCs when a weight is changed in a
> rack; data may still move inside the DC.
> 
> From your comment I understand that was never the intention of uniform 
> buckets?

It was certainly not the intention.  I think it ought to work,
though, provided the code that tries to keep the weights summing
up the tree "behaves" (is effectively a no-op) on the uniform buckets.

sage
