* crushmap rule issue: choose vs. chooseleaf
From: Jim Schutt @ 2010-06-23 19:20 UTC
  To: ceph-devel

Hi,

I've been trying to get custom CRUSH maps to work, based on
http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH

I've not had any success until I dumped the map from
a simple 4 device setup.  I noticed that map had a
rule using:
  step choose firstn 0 type device

whereas all the custom maps I was trying to build used
chooseleaf rather than choose.  So I modified those
default 4 device map rules to be:
  step chooseleaf firstn 0 type device

and built a new file system using that map.
It would not start.

I.e., a file system built using this CRUSH map works
for me:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 domain
type 2 pool

# buckets
domain root {
        id -1           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item device0 weight 1.000
        item device1 weight 1.000
        item device2 weight 1.000
        item device3 weight 1.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type device
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type device
        step emit
}
rule casdata {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type device
        step emit
}
rule rbd {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type device
        step emit
}

# end crush map

but a file system built using this CRUSH map
does not:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 domain
type 2 pool

# buckets
domain root {
        id -1           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item device0 weight 1.000
        item device1 weight 1.000
        item device2 weight 1.000
        item device3 weight 1.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take root
        step chooseleaf firstn 0 type device
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take root
        step chooseleaf firstn 0 type device
        step emit
}
rule casdata {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take root
        step chooseleaf firstn 0 type device
        step emit
}
rule rbd {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take root
        step chooseleaf firstn 0 type device
        step emit
}

# end crush map


Based on that, I reworked some of the test maps with deeper device
hierarchies I had been trying, and got them to work
(i.e. the file system started) when I avoided chooseleaf rules.

E.g. with a device hierarchy like this
(a device here is a partition, as I am still
testing on limited hardware):

type 0 device
type 1 disk
type 2 controller
type 3 host
type 4 root

a map with rules like this worked:

rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take root
        step choose firstn 0 type host
        step choose firstn 0 type controller
        step choose firstn 0 type disk
        step choose firstn 0 type device
        step emit
}

but a map with rules like this didn't:

rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take root
        step chooseleaf firstn 0 type controller
        step emit
}


Am I missing something?

Thanks -- Jim




* Re: crushmap rule issue: choose vs. chooseleaf
From: Sage Weil @ 2010-06-23 21:20 UTC
  To: Jim Schutt; +Cc: ceph-devel

On Wed, 23 Jun 2010, Jim Schutt wrote:
> I've been trying to get custom CRUSH maps to work, based on
> http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
> 
> I've not had any success until I dumped the map from
> a simple 4 device setup.  I noticed that map had a
> rule using:
>   step choose firstn 0 type device
> 
> whereas all the custom maps I was trying to build used
> chooseleaf rather than choose.  So I modified those
> default 4 device map rules to be:
>   step chooseleaf firstn 0 type device

Hmm.  It's non-obvious, and should probably work, but chooseleaf on a 
'device' (which is the leaf) currently doesn't work.  If you have a 
hierarchy like

root
host
controller
disk
device

You can either

         step take root
         step choose firstn 0 type controller
         step choose firstn 1 type device
         step emit

to get N distinct controllers, and then for each of those, choose 1 
device.  Or,

         step take root
         step chooseleaf firstn 0 type controller
         step emit

to choose (a device nested beneath) N distinct controllers.  The 
difference is the latter will try to pick a nested device for each 
controller and, if it can't find one, reject the controller choice and 
continue.  It prevents situations where you have a controller with no 
usable devices beneath it: the first rule picks one of those controllers 
in the 'choose firstn 0 type controller' step, then can't find a 
device, and you end up with (n-1) results.
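
To make the reject-and-retry behavior concrete, here's a toy sketch in 
plain Python.  It is NOT the real crush code -- no hashing, weights, or 
straw buckets, and it only looks at direct children of the bucket it 
starts from -- just an illustration of why chooseleaf can skip a 
controller that choose would happily return:

    import random

    # A node is (type, name, children); a device has children = None.
    def devices_under(node, usable):
        node_type, name, children = node
        if node_type == "device":
            return [name] if name in usable else []
        found = []
        for child in children:
            found.extend(devices_under(child, usable))
        return found

    def choose(bucket, n, want_type):
        # Pick n distinct children of want_type; no check that a usable
        # device actually exists somewhere beneath each pick.
        candidates = [c for c in bucket[2] if c[0] == want_type]
        return [c[1] for c in random.sample(candidates, min(n, len(candidates)))]

    def chooseleaf(bucket, n, want_type, usable):
        # Pick n distinct buckets of want_type, but reject any whose
        # subtree holds no usable device and keep trying the rest.
        candidates = [c for c in bucket[2] if c[0] == want_type]
        random.shuffle(candidates)
        picked = []
        for cand in candidates:
            leaves = devices_under(cand, usable)
            if leaves:                      # reject if nothing usable below
                picked.append((cand[1], random.choice(leaves)))
            if len(picked) == n:
                break
        return picked

    host0 = ("host", "host0", [
        ("controller", "controller0", [
            ("disk", "disk0", [("device", "device0", None)]),
            ("disk", "disk1", [("device", "device1", None)])]),
        ("controller", "controller1", [
            ("disk", "disk2", [("device", "device2", None)]),
            ("disk", "disk3", [("device", "device3", None)])])])

    usable = {"device0", "device1"}   # pretend controller1's devices are down
    print(choose(host0, 2, "controller"))             # still returns controller1
    print(chooseleaf(host0, 2, "controller", usable)) # controller1 is rejected

With controller1's devices marked unusable, choose still returns 
controller1 (so the later device step would come up empty for it), while 
chooseleaf rejects it immediately; with more controllers to fall back on, 
it would keep looking until it found n with usable leaves.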

The first problem you had was a bug when chooseleaf was given the leaf 
type (device).  It normally takes an intermediate type in the hierarchy, not 
the leaf type.  That's now fixed, and should give an identical result to 
'choose' in that case.


> Based on that, I reworked some of the test maps with deeper device
> hierarchies I had been trying, and got them to work
> (i.e. the file system started) when I avoided chooseleaf rules.
> 
> E.g. with a device hierarchy like this
> (a device here is a partition, as I am still
> testing on limited hardware):
> 
> type 0 device
> type 1 disk
> type 2 controller
> type 3 host
> type 4 root
> 
> a map with rules like this worked:
> 
> rule data {
>         ruleset 0
>         type replicated
>         min_size 2
>         max_size 2
>         step take root
>         step choose firstn 0 type host
>         step choose firstn 0 type controller
>         step choose firstn 0 type disk
>         step choose firstn 0 type device
>         step emit
> }
> 
> but a map with rules like this didn't:
> 
> rule data {
>         ruleset 0
>         type replicated
>         min_size 2
>         max_size 2
>         step take root
>         step chooseleaf firstn 0 type controller
>         step emit
> }

Hmm, this should work (assuming there are actually nodes of type 
controller in the tree).  Can you send along the actual map you're trying?

Thanks-
sage


* Re: crushmap rule issue: choose vs. chooseleaf
From: Jim Schutt @ 2010-06-23 21:57 UTC
  To: Sage Weil; +Cc: ceph-devel


On Wed, 2010-06-23 at 15:20 -0600, Sage Weil wrote:
> On Wed, 23 Jun 2010, Jim Schutt wrote:
> > I've been trying to get custom CRUSH maps to work, based on
> > http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
> > 
> > I've not had any success until I dumped the map from
> > a simple 4 device setup.  I noticed that map had a
> > rule using:
> >   step choose firstn 0 type device
> > 
> > whereas all the custom maps I was trying to build used
> > chooseleaf rather than choose.  So I modified those
> > default 4 device map rules to be:
> >   step chooseleaf firstn 0 type device
> 
> Hmm.  It's non-obvious, and should probably work, but chooseleaf on a 
> 'device' (which is the leaf) currently doesn't work.  If you have a 
> hierarchy like
> 
> root
> host
> controller
> disk
> device
> 
> You can either
> 
>          step take root
>          step choose firstn 0 type controller
>          step choose firstn 1 type device
>          step emit
> 
> to get N distinct controllers, and then for each of those, choose 1 
> device.  Or,
> 
>          step take root
>          step chooseleaf firstn 0 type controller
>          step emit
> 
> to choose (a device nested beneath) N distinct controllers.  The 
> difference is the latter will try to pick a nested device for each 
> controller and, if it can't find one, reject the controller choice and 
> continue.  It prevents situations where you have a controller with no 
> usable devices beneath it: the first rule picks one of those controllers 
> in the 'choose firstn 0 type controller' step, then can't find a 
> device, and you end up with (n-1) results.
> 
> The first problem you had was a bug when chooseleaf was given the leaf 
> type (device).  It normally takes an intermediate type in the hierarchy, not 
> the leaf type.  That's now fixed, and should give an identical result to 
> 'choose' in that case.

OK, thanks.

> 
> 
> > Based on that, I reworked some of the test maps with deeper device
> > hierarchies I had been trying, and got them to work
> > (i.e. the file system started) when I avoided chooseleaf rules.
> > 
> > E.g. with a device hierarchy like this
> > (a device here is a partition, as I am still
> > testing on limited hardware):
> > 
> > type 0 device
> > type 1 disk
> > type 2 controller
> > type 3 host
> > type 4 root
> > 
> > a map with rules like this worked:
> > 
> > rule data {
> >         ruleset 0
> >         type replicated
> >         min_size 2
> >         max_size 2
> >         step take root
> >         step choose firstn 0 type host
> >         step choose firstn 0 type controller
> >         step choose firstn 0 type disk
> >         step choose firstn 0 type device
> >         step emit
> > }

Based on your above explanation, I suspect this wasn't
doing what I wanted.

> > 
> > but a map with rules like this didn't:
> > 
> > rule data {
> >         ruleset 0
> >         type replicated
> >         min_size 2
> >         max_size 2
> >         step take root
> >         step chooseleaf firstn 0 type controller
> >         step emit
> > }
> 
> Hmm, this should work (assuming there are actually nodes of type 
> controller in the tree).  Can you send along the actual map you're trying?

Sure.  I've been using multiple partitions
per disk for learning about CRUSH maps, so 
in this map a device is a partition.

Here it is:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 disk
type 2 controller
type 3 host
type 4 root

# buckets
disk disk0 {
	id -1		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device0 weight 1.000 pos 0
}
disk disk1 {
	id -2		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device1 weight 1.000 pos 0
}
disk disk2 {
	id -3		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device2 weight 1.000 pos 0
}
disk disk3 {
	id -4		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device3 weight 1.000 pos 0
}
controller controller0 {
	id -5		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk0 weight 1.000 pos 0
	item disk1 weight 1.000 pos 1
}
controller controller1 {
	id -6		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk2 weight 1.000 pos 0
	item disk3 weight 1.000 pos 1
}
host host0 {
	id -7		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item controller0 weight 2.000 pos 0
	item controller1 weight 2.000 pos 1
}
root root {
	id -8		# do not change unnecessarily
	alg straw
	hash 0	# rjenkins1
	item host0 weight 4.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule casdata {
	ruleset 2
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule rbd {
	ruleset 3
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}

# end crush map

When I try to start a file system built with the above map,
the monitor never accepts connections (from either ceph -w
or the cosd instances).

Thanks for taking a look.

-- Jim

> 
> Thanks-
> sage




* Re: crushmap rule issue: choose vs. chooseleaf
From: Sage Weil @ 2010-06-24 18:20 UTC
  To: Jim Schutt; +Cc: ceph-devel

Hi Jim,

Okay, I fixed another bug and am now able to use your map without 
problems.  The fix is pushed to the unstable branch in ceph.git.

I'm surprised we didn't run into this before.. it looks like it's been 
broken for a while.  I'm adding a tracker item to set up some unit tests 
for this stuff so we can avoid this sort of regression.. the crush code 
should be really easy to check.
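
Roughly, the kind of check I mean is something like this -- just a 
sketch, and the 'mapping' callable below is a stand-in for whatever ends 
up exposing the rule -> devices computation, not an existing function:

    # Feed many inputs through a rule and check the results look sane.
    def check_rule(mapping, rule, num_rep, num_inputs=10000):
        for x in range(num_inputs):
            out = mapping(rule, x, num_rep)
            assert len(out) == num_rep, (rule, x, out)       # right count
            assert len(set(out)) == num_rep, (rule, x, out)  # all distinct

    # Trivial stand-in so the check itself can be run:
    def fake_mapping(rule, x, num_rep):
        return [(x + i) % 4 for i in range(num_rep)]

    check_rule(fake_mapping, rule="data", num_rep=2)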

sage


On Wed, 23 Jun 2010, Jim Schutt wrote:

> 
> On Wed, 2010-06-23 at 15:20 -0600, Sage Weil wrote:
> > On Wed, 23 Jun 2010, Jim Schutt wrote:
> > > I've been trying to get custom CRUSH maps to work, based on
> > > http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
> > > 
> > > I've not had any success until I dumped the map from
> > > a simple 4 device setup.  I noticed that map had a
> > > rule using:
> > >   step choose firstn 0 type device
> > > 
> > > whereas all the custom maps I was trying to build used
> > > chooseleaf rather than choose.  So I modified those
> > > default 4 device map rules to be:
> > >   step chooseleaf firstn 0 type device
> > 
> > Hmm.  It's non-obvious, and should probably work, but chooseleaf on a 
> > 'device' (which is the leaf) currently doesn't work.  If you have a 
> > hierarchy like
> > 
> > root
> > host
> > controller
> > disk
> > device
> > 
> > You can either
> > 
> >          step take root
> >          step choose firstn 0 type controller
> >          step choose firstn 1 type device
> >          step emit
> > 
> > to get N distinct controllers, and then for each of those, choose 1 
> > device.  Or,
> > 
> >          step take root
> >          step chooseleaf firstn 0 type controller
> >          step emit
> > 
> > to choose (a device nested beneath) N distinct controllers.  The 
> > difference is the latter will try to pick a nested device for each 
> > controller and, if it can't find one, reject the controller choice and 
> > continue.  It prevents situations where you have a controller with no 
> > usable devices beneath it: the first rule picks one of those controllers 
> > in the 'choose firstn 0 type controller' step, then can't find a 
> > device, and you end up with (n-1) results.
> > 
> > The first problem you had was a bug when chooseleaf was given the leaf 
> > type (device).  It normally takes an intermediate type in the hierarchy, not 
> > the leaf type.  That's now fixed, and should give an identical result to 
> > 'choose' in that case.
> 
> OK, thanks.
> 
> > 
> > 
> > > Based on that, I reworked some of the test maps with deeper device
> > > hierarchies I had been trying, and got them to work
> > > (i.e. the file system started) when I avoided chooseleaf rules.
> > > 
> > > E.g. with a device hierarchy like this
> > > (a device here is a partition, as I am still
> > > testing on limited hardware):
> > > 
> > > type 0 device
> > > type 1 disk
> > > type 2 controller
> > > type 3 host
> > > type 4 root
> > > 
> > > a map with rules like this worked:
> > > 
> > > rule data {
> > >         ruleset 0
> > >         type replicated
> > >         min_size 2
> > >         max_size 2
> > >         step take root
> > >         step choose firstn 0 type host
> > >         step choose firstn 0 type controller
> > >         step choose firstn 0 type disk
> > >         step choose firstn 0 type device
> > >         step emit
> > > }
> 
> Based on your above explanation, I suspect this wasn't
> doing what I wanted.
> 
> > > 
> > > but a map with rules like this didn't:
> > > 
> > > rule data {
> > >         ruleset 0
> > >         type replicated
> > >         min_size 2
> > >         max_size 2
> > >         step take root
> > >         step chooseleaf firstn 0 type controller
> > >         step emit
> > > }
> > 
> > Hmm, this should work (assuming there are actually nodes of type 
> > controller in the tree).  Can you send along the actual map you're trying?
> 
> Sure.  I've been using multiple partitions
> per disk for learning about CRUSH maps, so 
> in this map a device is a partition.
> 
> Here it is:
> 
> # begin crush map
> 
> # devices
> device 0 device0
> device 1 device1
> device 2 device2
> device 3 device3
> 
> # types
> type 0 device
> type 1 disk
> type 2 controller
> type 3 host
> type 4 root
> 
> # buckets
> disk disk0 {
> 	id -1		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device0 weight 1.000 pos 0
> }
> disk disk1 {
> 	id -2		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device1 weight 1.000 pos 0
> }
> disk disk2 {
> 	id -3		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device2 weight 1.000 pos 0
> }
> disk disk3 {
> 	id -4		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device3 weight 1.000 pos 0
> }
> controller controller0 {
> 	id -5		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item disk0 weight 1.000 pos 0
> 	item disk1 weight 1.000 pos 1
> }
> controller controller1 {
> 	id -6		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item disk2 weight 1.000 pos 0
> 	item disk3 weight 1.000 pos 1
> }
> host host0 {
> 	id -7		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item controller0 weight 2.000 pos 0
> 	item controller1 weight 2.000 pos 1
> }
> root root {
> 	id -8		# do not change unnecessarily
> 	alg straw
> 	hash 0	# rjenkins1
> 	item host0 weight 4.000
> }
> 
> # rules
> rule data {
> 	ruleset 0
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule metadata {
> 	ruleset 1
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule casdata {
> 	ruleset 2
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule rbd {
> 	ruleset 3
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> 
> # end crush map
> 
> When I try to start a file system built with the above map,
> the monitor never accepts connections (from either ceph -w
> or the cosd instances).
> 
> Thanks for taking a look.
> 
> -- Jim
> 
> > 
> > Thanks-
> > sage


* Re: crushmap rule issue: choose vs. chooseleaf
From: Jim Schutt @ 2010-06-24 19:44 UTC
  To: Sage Weil; +Cc: ceph-devel


On Thu, 2010-06-24 at 12:20 -0600, Sage Weil wrote:
> Hi Jim,
> 
> Okay, I fixed another bug and am now able to use your map without 
> problems.  The fix is pushed to the unstable branch in ceph.git.

Great, thanks!  I really appreciate you being
able to take a look so quickly.

> 
> I'm surprised we didn't run into this before.. it looks like it's been 
> broken for a while.  I'm adding a tracker item to set up some unit tests 
> for this stuff so we can avoid this sort of regression.. the crush code 
> should be really easy to check.

That sounds great.

I'm still having a little trouble, though.

My map works for me now, in the sense that I can mount
the file system from a client.

But when I try to write to it, vmstat on the server shows
a little burst of I/O, and then nothing.

The same ceph config but using the default map works
great - vmstat on the server shows 200-300 MB/s.

FWIW, here's my custom map again, queried 
via ceph osd getcrushmap:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 disk
type 2 controller
type 3 host
type 4 root

# buckets
disk disk0 {
	id -1		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device0 weight 1.000 pos 0
}
disk disk1 {
	id -2		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device1 weight 1.000 pos 0
}
disk disk2 {
	id -3		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device2 weight 1.000 pos 0
}
disk disk3 {
	id -4		# do not change unnecessarily
	alg uniform	# do not change bucket size (1) unnecessarily
	hash 0	# rjenkins1
	item device3 weight 1.000 pos 0
}
controller controller0 {
	id -5		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk0 weight 1.000 pos 0
	item disk1 weight 1.000 pos 1
}
controller controller1 {
	id -6		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item disk2 weight 1.000 pos 0
	item disk3 weight 1.000 pos 1
}
host host0 {
	id -7		# do not change unnecessarily
	alg uniform	# do not change bucket size (2) unnecessarily
	hash 0	# rjenkins1
	item controller0 weight 2.000 pos 0
	item controller1 weight 2.000 pos 1
}
root root {
	id -8		# do not change unnecessarily
	alg straw
	hash 0	# rjenkins1
	item host0 weight 4.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule casdata {
	ruleset 2
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}
rule rbd {
	ruleset 3
	type replicated
	min_size 2
	max_size 2
	step take root
	step chooseleaf firstn 0 type controller
	step emit
}

# end crush map


and for completeness, here's the default map, also via query:

# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3

# types
type 0 device
type 1 domain
type 2 pool

# buckets
domain root {
	id -1		# do not change unnecessarily
	alg straw
	hash 0	# rjenkins1
	item device0 weight 1.000
	item device1 weight 1.000
	item device2 weight 1.000
	item device3 weight 1.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take root
	step choose firstn 0 type device
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take root
	step choose firstn 0 type device
	step emit
}
rule casdata {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take root
	step choose firstn 0 type device
	step emit
}
rule rbd {
	ruleset 3
	type replicated
	min_size 1
	max_size 10
	step take root
	step choose firstn 0 type device
	step emit
}

# end crush map

Here's the ceph.conf I use for both tests.  Note
that for the default map case I just make sure the 
crush map file I configured doesn't exist; mkcephfs -v
output suggests that the right thing happens in both
cases.

; global

[global]
	pid file = /var/run/ceph/$name.pid

	; some minimal logging (just message traffic) to aid debugging
	debug ms = 4

; monitor daemon common options
[mon]
	crush map = /mnt/projects/ceph/root/crushmap
	debug mon = 10

; monitor daemon options per instance
; need an odd number of instances
[mon0]
	host = sasa008
	mon addr = 192.168.204.111:6788
	mon data = /mnt/disk/disk.00p1/mon

; mds daemon common options

[mds]
	debug mds = 10

; mds daemon options per instance
[mds0]
	host = sasa008
	mds addr = 192.168.204.111
	keyring = /mnt/disk/disk.00p1/mds/keyring.$name

; osd daemon common options

[osd]
	; osd client message size cap = 67108864
	debug osd = 10

; osd options per instance; i.e. per crushmap device.

[osd0]
	host = sasa008
	osd addr = 192.168.204.111
	keyring     = /mnt/disk/disk.00p1/osd/keyring.$name
	osd journal = /dev/sdb2
	; btrfs devs  = /dev/sdb5
	; btrfs path  = /mnt/disk/disk.00p5
	osd data    = /mnt/disk/disk.00p5
[osd1]
	host = sasa008
	osd addr = 192.168.204.111
	keyring     = /mnt/disk/disk.01p1/osd/keyring.$name
	osd journal = /dev/sdc2
	; btrfs devs  = /dev/sdc5
	; btrfs path  = /mnt/disk/disk.01p5
	osd data    = /mnt/disk/disk.01p5
[osd2]
	host = sasa008
	osd addr = 192.168.204.111
	keyring     = /mnt/disk/disk.02p1/osd/keyring.$name
	osd journal = /dev/sdj2
	; btrfs devs  = /dev/sdj5
	; btrfs path  = /mnt/disk/disk.02p5
	osd data    = /mnt/disk/disk.02p5
[osd3]
	host = sasa008
	osd addr = 192.168.204.111
	keyring     = /mnt/disk/disk.03p1/osd/keyring.$name
	osd journal = /dev/sdk2
	; btrfs devs  = /dev/sdk5
	; btrfs path  = /mnt/disk/disk.03p5
	osd data    = /mnt/disk/disk.03p5

Maybe I'm still missing something?

Thanks -- Jim

> 
> sage
> 





* Re: crushmap rule issue: choose vs. chooseleaf
From: Sage Weil @ 2010-06-24 20:18 UTC
  To: Jim Schutt; +Cc: ceph-devel

On Thu, 24 Jun 2010, Jim Schutt wrote:
> On Thu, 2010-06-24 at 12:20 -0600, Sage Weil wrote:
> > Hi Jim,
> > 
> > Okay, I fixed another bug and am now able to use your map without 
> > problems.  The fix is pushed to the unstable branch in ceph.git.
> 
> Great, thanks!  I really appreciate you being able to take a look so 
> quickly.

No problem!

> > I'm surprised we didn't run into this before.. it looks like it's been 
> > broken for a while.  I'm adding a tracker item to set up some unit tests 
> > for this stuff so we can avoid this sort of regression.. the crush code 
> > should be really easy to check.
> 
> That sounds great.
> 
> I'm still having a little trouble, though.
> 
> My map works for me now, in the sense that I can mount
> the file system from a client.
> 
> But when I try to write to it, vmstat on the server shows
> a little burst of I/O, and then nothing.

Oh, the same fix needs to be applied to the kernel code as well.  I've 
just pushed that out (ceph-client.git master and 
ceph-client-standalone.git master+master-backport branches).  Hopefully 
that will clear it up?

sage


* Re: crushmap rule issue: choose vs. chooseleaf
From: Jim Schutt @ 2010-06-24 21:17 UTC
  To: Sage Weil; +Cc: ceph-devel


On Thu, 2010-06-24 at 14:18 -0600, Sage Weil wrote:
> On Thu, 24 Jun 2010, Jim Schutt wrote:
> > On Thu, 2010-06-24 at 12:20 -0600, Sage Weil wrote:
> > > Hi Jim,
> > > 
> > > Okay, I fixed another bug and am now able to use your map without 
> > > problems.  The fix is pushed to the unstable branch in ceph.git.
> > 
> > Great, thanks!  I really appreciate you being able to take a look so 
> > quickly.
> 
> No problem!
> 
> > > I'm surprised we didn't run into this before.. it looks like it's been 
> > > broken for a while.  I'm adding a tracker item to set up some unit tests 
> > > for this stuff so we can avoid this sort of regression.. the crush code 
> > > should be really easy to check.
> > 
> > That sounds great.
> > 
> > I'm still having a little trouble, though.
> > 
> > My map works for me now, in the sense that I can mount
> > the file system from a client.
> > 
> > But when I try to write to it, vmstat on the server shows
> > a little burst of I/O, and then nothing.
> 
> Oh, the same fix needs to be applied to the kernel code as well.  I've 
> just pushed that out (ceph-client.git master and 
> ceph-client-standalone.git master+master-backport branches).  Hopefully 
> that will clear it up?

Yes, indeed.

Thanks again!

-- Jim

> 
> sage
> 


