All of lore.kernel.org
 help / color / mirror / Atom feed
* crush multiweight implementation details
@ 2017-04-10 14:04 Loic Dachary
  2017-04-10 14:11 ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Loic Dachary @ 2017-04-10 14:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

Hi Sage,

We could have:

struct crush_choose_arg {
  __u32 bucket_id;
  __u32 num_items;
  __u32 *ids; // override the bucket items for placement
  __u32 num_positions;
  __u32 *weights; // size is num_positions*num_items
};
 
struct crush_choose_arg_map {
  struct crush_choose_arg *args;
  __u32 size;
};

and

void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
...
if (m->buckets[b]->id == arg_map[b]->bucket_id)
   w->work[b]->arg = arg_map[b];
...
}

with

struct crush_work_bucket {
    __u32 perm_x; /* @x for which *perm is defined */
    __u32 perm_n; /* num elements of *perm that are permuted/defined */
    __u32 *perm;  /* Permutation of the bucket's items */
    struct crush_choose_arg *arg;
};

There would be no need to change the code path since crush_bucket_choose already is given the crush_work_bucket. And crush_init_workspace already has logic that is algorithm dependent. And all the sanity checks could be done in crush_init_workspace so that the choose function only does what's absolutely necessary.

What do you think ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-10 14:04 crush multiweight implementation details Loic Dachary
@ 2017-04-10 14:11 ` Sage Weil
  2017-04-10 14:39   ` Loic Dachary
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2017-04-10 14:11 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Mon, 10 Apr 2017, Loic Dachary wrote:
> Hi Sage,
> 
> We could have:
> 
> struct crush_choose_arg {
>   __u32 bucket_id;
>   __u32 num_items;
>   __u32 *ids; // override the bucket items for placement
>   __u32 num_positions;
>   __u32 *weights; // size is num_positions*num_items
> };
>  
> struct crush_choose_arg_map {
>   struct crush_choose_arg *args;
>   __u32 size;
> };
> 
> and
> 
> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
> ...
> if (m->buckets[b]->id == arg_map[b]->bucket_id)
>    w->work[b]->arg = arg_map[b];
> ...
> }
> 
> with
> 
> struct crush_work_bucket {
>     __u32 perm_x; /* @x for which *perm is defined */
>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
>     __u32 *perm;  /* Permutation of the bucket's items */
>     struct crush_choose_arg *arg;
> };
> 
> There would be no need to change the code path since crush_bucket_choose 
> already is given the crush_work_bucket. And crush_init_workspace already 
> has logic that is algorithm dependent. And all the sanity checks could 
> be done in crush_init_workspace so that the choose function only does 
> what's absolutely necessary.

Allowing overrides of the bucket items too makes me nervous (do we have a 
use for that yet?), but otherwise this looks reasonable!

sage

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-10 14:11 ` Sage Weil
@ 2017-04-10 14:39   ` Loic Dachary
  2017-04-10 14:44     ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Loic Dachary @ 2017-04-10 14:39 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development



On 04/10/2017 04:11 PM, Sage Weil wrote:
> On Mon, 10 Apr 2017, Loic Dachary wrote:
>> Hi Sage,
>>
>> We could have:
>>
>> struct crush_choose_arg {
>>   __u32 bucket_id;
>>   __u32 num_items;
>>   __u32 *ids; // override the bucket items for placement
>>   __u32 num_positions;
>>   __u32 *weights; // size is num_positions*num_items
>> };
>>  
>> struct crush_choose_arg_map {
>>   struct crush_choose_arg *args;
>>   __u32 size;
>> };
>>
>> and
>>
>> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
>> ...
>> if (m->buckets[b]->id == arg_map[b]->bucket_id)
>>    w->work[b]->arg = arg_map[b];
>> ...
>> }
>>
>> with
>>
>> struct crush_work_bucket {
>>     __u32 perm_x; /* @x for which *perm is defined */
>>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
>>     __u32 *perm;  /* Permutation of the bucket's items */
>>     struct crush_choose_arg *arg;
>> };
>>
>> There would be no need to change the code path since crush_bucket_choose 
>> already is given the crush_work_bucket. And crush_init_workspace already 
>> has logic that is algorithm dependent. And all the sanity checks could 
>> be done in crush_init_workspace so that the choose function only does 
>> what's absolutely necessary.
> 
> Allowing overrides of the bucket items too makes me nervous (do we have a 
> use for that yet?), 

Maybe I misunderstood what you were after with bucket_id in http://pad.ceph.com/p/crush-multiweight around here ?

struct crush_bucket_weight_set {
  __u32 bucket_id;  /* used as input to hash in place of bucket id */
  __u32 num_positions, num_items;
  __u32 *data; // index like [item*num_items+pos]
};

> but otherwise this looks reasonable!

Cool :-)

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-10 14:39   ` Loic Dachary
@ 2017-04-10 14:44     ` Sage Weil
  2017-04-10 14:53       ` Loic Dachary
  0 siblings, 1 reply; 7+ messages in thread
From: Sage Weil @ 2017-04-10 14:44 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2147 bytes --]

On Mon, 10 Apr 2017, Loic Dachary wrote:
> On 04/10/2017 04:11 PM, Sage Weil wrote:
> > On Mon, 10 Apr 2017, Loic Dachary wrote:
> >> Hi Sage,
> >>
> >> We could have:
> >>
> >> struct crush_choose_arg {
> >>   __u32 bucket_id;
> >>   __u32 num_items;
> >>   __u32 *ids; // override the bucket items for placement
> >>   __u32 num_positions;
> >>   __u32 *weights; // size is num_positions*num_items
> >> };
> >>  
> >> struct crush_choose_arg_map {
> >>   struct crush_choose_arg *args;
> >>   __u32 size;
> >> };
> >>
> >> and
> >>
> >> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
> >> ...
> >> if (m->buckets[b]->id == arg_map[b]->bucket_id)
> >>    w->work[b]->arg = arg_map[b];
> >> ...
> >> }
> >>
> >> with
> >>
> >> struct crush_work_bucket {
> >>     __u32 perm_x; /* @x for which *perm is defined */
> >>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
> >>     __u32 *perm;  /* Permutation of the bucket's items */
> >>     struct crush_choose_arg *arg;
> >> };
> >>
> >> There would be no need to change the code path since crush_bucket_choose 
> >> already is given the crush_work_bucket. And crush_init_workspace already 
> >> has logic that is algorithm dependent. And all the sanity checks could 
> >> be done in crush_init_workspace so that the choose function only does 
> >> what's absolutely necessary.
> > 
> > Allowing overrides of the bucket items too makes me nervous (do we have a 
> > use for that yet?), 
> 
> Maybe I misunderstood what you were after with bucket_id in http://pad.ceph.com/p/crush-multiweight around here ?
> 
> struct crush_bucket_weight_set {
>   __u32 bucket_id;  /* used as input to hash in place of bucket id */
>   __u32 num_positions, num_items;
>   __u32 *data; // index like [item*num_items+pos]
> };

It's just the bucket id that matters; the members of the bucket don't need 
to change.  I'm not sure that the __u32 *ids above is needed for 
anything... unless I'm misunderstanding something?

sage

 
> > but otherwise this looks reasonable!
> 
> Cool :-)
> 
> -- 
> Loïc Dachary, Artisan Logiciel Libre
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-10 14:44     ` Sage Weil
@ 2017-04-10 14:53       ` Loic Dachary
  2017-04-11  6:49         ` Loic Dachary
  0 siblings, 1 reply; 7+ messages in thread
From: Loic Dachary @ 2017-04-10 14:53 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development



On 04/10/2017 04:44 PM, Sage Weil wrote:
> On Mon, 10 Apr 2017, Loic Dachary wrote:
>> On 04/10/2017 04:11 PM, Sage Weil wrote:
>>> On Mon, 10 Apr 2017, Loic Dachary wrote:
>>>> Hi Sage,
>>>>
>>>> We could have:
>>>>
>>>> struct crush_choose_arg {
>>>>   __u32 bucket_id;
>>>>   __u32 num_items;
>>>>   __u32 *ids; // override the bucket items for placement
>>>>   __u32 num_positions;
>>>>   __u32 *weights; // size is num_positions*num_items
>>>> };
>>>>  
>>>> struct crush_choose_arg_map {
>>>>   struct crush_choose_arg *args;
>>>>   __u32 size;
>>>> };
>>>>
>>>> and
>>>>
>>>> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
>>>> ...
>>>> if (m->buckets[b]->id == arg_map[b]->bucket_id)
>>>>    w->work[b]->arg = arg_map[b];
>>>> ...
>>>> }
>>>>
>>>> with
>>>>
>>>> struct crush_work_bucket {
>>>>     __u32 perm_x; /* @x for which *perm is defined */
>>>>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
>>>>     __u32 *perm;  /* Permutation of the bucket's items */
>>>>     struct crush_choose_arg *arg;
>>>> };
>>>>
>>>> There would be no need to change the code path since crush_bucket_choose 
>>>> already is given the crush_work_bucket. And crush_init_workspace already 
>>>> has logic that is algorithm dependent. And all the sanity checks could 
>>>> be done in crush_init_workspace so that the choose function only does 
>>>> what's absolutely necessary.
>>>
>>> Allowing overrides of the bucket items too makes me nervous (do we have a 
>>> use for that yet?), 
>>
>> Maybe I misunderstood what you were after with bucket_id in http://pad.ceph.com/p/crush-multiweight around here ?
>>
>> struct crush_bucket_weight_set {
>>   __u32 bucket_id;  /* used as input to hash in place of bucket id */
>>   __u32 num_positions, num_items;
>>   __u32 *data; // index like [item*num_items+pos]
>> };
> 
> It's just the bucket id that matters; the members of the bucket don't need 
> to change.  I'm not sure that the __u32 *ids above is needed for 
> anything... unless I'm misunderstanding something?

The code would look like this:

static int bucket_straw2_choose(const struct crush_bucket_straw2 *bucket,
				int x, int r, const struct crush_choose_arg_list *arg_map, int position)
{
        struct crush_choose_arg_at_position *arg = get_straw2_choose_arg(bucket, arg_map, position);
	unsigned int i, high = 0;
	unsigned int u;
	unsigned int w;
        unsigned int id;
	__s64 ln, draw, high_draw = 0;

	for (i = 0; i < bucket->h.size; i++) {
		w = arg->weights[i];
                id = arg->ids[i];
                dprintk("weight 0x%x item %d\n", w, id);
		if (w) {
			u = crush_hash32_3(bucket->h.hash, x, id, r);

I don't see how changing the bucket id alone would work otherwise since it's not used.

> 
> sage
> 
>  
>>> but otherwise this looks reasonable!
>>
>> Cool :-)
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-10 14:53       ` Loic Dachary
@ 2017-04-11  6:49         ` Loic Dachary
  2017-04-11 13:31           ` Sage Weil
  0 siblings, 1 reply; 7+ messages in thread
From: Loic Dachary @ 2017-04-11  6:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development



On 04/10/2017 04:53 PM, Loic Dachary wrote:
> 
> 
> On 04/10/2017 04:44 PM, Sage Weil wrote:
>> On Mon, 10 Apr 2017, Loic Dachary wrote:
>>> On 04/10/2017 04:11 PM, Sage Weil wrote:
>>>> On Mon, 10 Apr 2017, Loic Dachary wrote:
>>>>> Hi Sage,
>>>>>
>>>>> We could have:
>>>>>
>>>>> struct crush_choose_arg {
>>>>>   __u32 bucket_id;
>>>>>   __u32 num_items;
>>>>>   __u32 *ids; // override the bucket items for placement
>>>>>   __u32 num_positions;
>>>>>   __u32 *weights; // size is num_positions*num_items
>>>>> };
>>>>>  
>>>>> struct crush_choose_arg_map {
>>>>>   struct crush_choose_arg *args;
>>>>>   __u32 size;
>>>>> };
>>>>>
>>>>> and
>>>>>
>>>>> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
>>>>> ...
>>>>> if (m->buckets[b]->id == arg_map[b]->bucket_id)
>>>>>    w->work[b]->arg = arg_map[b];
>>>>> ...
>>>>> }
>>>>>
>>>>> with
>>>>>
>>>>> struct crush_work_bucket {
>>>>>     __u32 perm_x; /* @x for which *perm is defined */
>>>>>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
>>>>>     __u32 *perm;  /* Permutation of the bucket's items */
>>>>>     struct crush_choose_arg *arg;
>>>>> };
>>>>>
>>>>> There would be no need to change the code path since crush_bucket_choose 
>>>>> already is given the crush_work_bucket. And crush_init_workspace already 
>>>>> has logic that is algorithm dependent. And all the sanity checks could 
>>>>> be done in crush_init_workspace so that the choose function only does 
>>>>> what's absolutely necessary.
>>>>
>>>> Allowing overrides of the bucket items too makes me nervous (do we have a 
>>>> use for that yet?), 
>>>
>>> Maybe I misunderstood what you were after with bucket_id in http://pad.ceph.com/p/crush-multiweight around here ?
>>>
>>> struct crush_bucket_weight_set {
>>>   __u32 bucket_id;  /* used as input to hash in place of bucket id */
>>>   __u32 num_positions, num_items;
>>>   __u32 *data; // index like [item*num_items+pos]
>>> };
>>
>> It's just the bucket id that matters; the members of the bucket don't need 
>> to change.  I'm not sure that the __u32 *ids above is needed for 
>> anything... unless I'm misunderstanding something?
> 
> The code would look like this:
> 
> static int bucket_straw2_choose(const struct crush_bucket_straw2 *bucket,
> 				int x, int r, const struct crush_choose_arg_list *arg_map, int position)
> {
>         struct crush_choose_arg_at_position *arg = get_straw2_choose_arg(bucket, arg_map, position);
> 	unsigned int i, high = 0;
> 	unsigned int u;
> 	unsigned int w;
>         unsigned int id;
> 	__s64 ln, draw, high_draw = 0;
> 
> 	for (i = 0; i < bucket->h.size; i++) {
> 		w = arg->weights[i];
>                 id = arg->ids[i];
>                 dprintk("weight 0x%x item %d\n", w, id);
> 		if (w) {
> 			u = crush_hash32_3(bucket->h.hash, x, id, r);
> 
> I don't see how changing the bucket id alone would work otherwise since it's not used.

Another reason for having __u32 *ids to remap each value in bucket->items instead of having a single __u32 bucket_id is that it allows us to remap both bucket ids and device ids.

Maybe we should just leave this id remapping thing for later since we don't currently have a use case for it ? Implementing the weight set as part of the workspace is straightforward and we can follow the same code path at a later time if / when substituting bucket / device ids becomes useful.

Cheers

>>
>> sage
>>
>>  
>>>> but otherwise this looks reasonable!
>>>
>>> Cool :-)
>>>
>>> -- 
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: crush multiweight implementation details
  2017-04-11  6:49         ` Loic Dachary
@ 2017-04-11 13:31           ` Sage Weil
  0 siblings, 0 replies; 7+ messages in thread
From: Sage Weil @ 2017-04-11 13:31 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Tue, 11 Apr 2017, Loic Dachary wrote:
> On 04/10/2017 04:53 PM, Loic Dachary wrote:
> > On 04/10/2017 04:44 PM, Sage Weil wrote:
> >> On Mon, 10 Apr 2017, Loic Dachary wrote:
> >>> On 04/10/2017 04:11 PM, Sage Weil wrote:
> >>>> On Mon, 10 Apr 2017, Loic Dachary wrote:
> >>>>> Hi Sage,
> >>>>>
> >>>>> We could have:
> >>>>>
> >>>>> struct crush_choose_arg {
> >>>>>   __u32 bucket_id;
> >>>>>   __u32 num_items;
> >>>>>   __u32 *ids; // override the bucket items for placement
> >>>>>   __u32 num_positions;
> >>>>>   __u32 *weights; // size is num_positions*num_items
> >>>>> };
> >>>>>  
> >>>>> struct crush_choose_arg_map {
> >>>>>   struct crush_choose_arg *args;
> >>>>>   __u32 size;
> >>>>> };
> >>>>>
> >>>>> and
> >>>>>
> >>>>> void crush_init_workspace(const struct crush_map *m, struct crush_choose_arg_map *arg_map, void *v) {
> >>>>> ...
> >>>>> if (m->buckets[b]->id == arg_map[b]->bucket_id)
> >>>>>    w->work[b]->arg = arg_map[b];
> >>>>> ...
> >>>>> }
> >>>>>
> >>>>> with
> >>>>>
> >>>>> struct crush_work_bucket {
> >>>>>     __u32 perm_x; /* @x for which *perm is defined */
> >>>>>     __u32 perm_n; /* num elements of *perm that are permuted/defined */
> >>>>>     __u32 *perm;  /* Permutation of the bucket's items */
> >>>>>     struct crush_choose_arg *arg;
> >>>>> };
> >>>>>
> >>>>> There would be no need to change the code path since crush_bucket_choose 
> >>>>> already is given the crush_work_bucket. And crush_init_workspace already 
> >>>>> has logic that is algorithm dependent. And all the sanity checks could 
> >>>>> be done in crush_init_workspace so that the choose function only does 
> >>>>> what's absolutely necessary.
> >>>>
> >>>> Allowing overrides of the bucket items too makes me nervous (do we have a 
> >>>> use for that yet?), 
> >>>
> >>> Maybe I misunderstood what you were after with bucket_id in http://pad.ceph.com/p/crush-multiweight around here ?
> >>>
> >>> struct crush_bucket_weight_set {
> >>>   __u32 bucket_id;  /* used as input to hash in place of bucket id */
> >>>   __u32 num_positions, num_items;
> >>>   __u32 *data; // index like [item*num_items+pos]
> >>> };
> >>
> >> It's just the bucket id that matters; the members of the bucket don't need 
> >> to change.  I'm not sure that the __u32 *ids above is needed for 
> >> anything... unless I'm misunderstanding something?
> > 
> > The code would look like this:
> > 
> > static int bucket_straw2_choose(const struct crush_bucket_straw2 *bucket,
> > 				int x, int r, const struct crush_choose_arg_list *arg_map, int position)
> > {
> >         struct crush_choose_arg_at_position *arg = get_straw2_choose_arg(bucket, arg_map, position);
> > 	unsigned int i, high = 0;
> > 	unsigned int u;
> > 	unsigned int w;
> >         unsigned int id;
> > 	__s64 ln, draw, high_draw = 0;
> > 
> > 	for (i = 0; i < bucket->h.size; i++) {
> > 		w = arg->weights[i];
> >                 id = arg->ids[i];
> >                 dprintk("weight 0x%x item %d\n", w, id);
> > 		if (w) {
> > 			u = crush_hash32_3(bucket->h.hash, x, id, r);
> > 
> > I don't see how changing the bucket id alone would work otherwise since it's not used.
> 
> Another reason for having __u32 *ids to remap each value in 
> bucket->items instead of having a single __u32 bucket_id is that it 
> allows us to remap both bucket ids and device ids.
> 
> Maybe we should just leave this id remapping thing for later since we 
> don't currently have a use case for it ? Implementing the weight set as 
> part of the workspace is straightforward and we can follow the same code 
> path at a later time if / when substituting bucket / device ids becomes 
> useful.

Oh, I misunderstood... I thought these were changing the actual bucket 
items, not just substituting ids for input in the placement hash.

There are several ways that I can imagine using it, but all of them are 
now covered by the fact that we can feed in alternative weight values.

The one exception is (oddly enough) the device classes.  We have a shadow 
hierarchy with different bucket ids.  This can also be accomplished by 
having alternate weight sets for each device class.  However, if the user 
starts using this cluster now, without requiring luminous clients, they 
are signing up for the alternate bucket ids, and if we want to eventually 
transition this feature to make use of the weight sets instead then we'll 
need the id remapping.

So I'm inclined to include it (as you originally proposed)...

sage

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-11 13:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-10 14:04 crush multiweight implementation details Loic Dachary
2017-04-10 14:11 ` Sage Weil
2017-04-10 14:39   ` Loic Dachary
2017-04-10 14:44     ` Sage Weil
2017-04-10 14:53       ` Loic Dachary
2017-04-11  6:49         ` Loic Dachary
2017-04-11 13:31           ` Sage Weil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.