From: Sage Weil <sage@newdream.net>
To: Jim Schutt <jaschut@sandia.gov>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: crushmap rule issue: choose vs. chooseleaf
Date: Thu, 24 Jun 2010 11:20:36 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.1006241059570.30180@cobra.newdream.net>
In-Reply-To: <1277330264.29400.34.camel@sale659.sandia.gov>

Hi Jim,

Okay, I fixed another bug and am now able to use your map without 
problems.  The fix is pushed to the unstable branch in ceph.git.

I'm surprised we didn't run into this before; it looks like it's been 
broken for a while.  I'm adding a tracker item to set up some unit tests 
for this stuff so we can avoid this sort of regression.  The crush code 
should be really easy to check.

sage


On Wed, 23 Jun 2010, Jim Schutt wrote:

> 
> On Wed, 2010-06-23 at 15:20 -0600, Sage Weil wrote:
> > On Wed, 23 Jun 2010, Jim Schutt wrote:
> > > I've been trying to get custom CRUSH maps to work, based on
> > > http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
> > > 
> > > I've not had any success until I dumped the map from
> > > a simple 4 device setup.  I noticed that map had a
> > > rule using:
> > >   step choose firstn 0 type device
> > > 
> > > whereas all the custom maps I was trying to build used
> > > chooseleaf rather than choose.  So I modified those
> > > default 4 device map rules to be:
> > >   step chooseleaf firstn 0 type device
> > 
> > Hmm.  It's non-obvious, and should probably work, but chooseleaf on a 
> > 'device' (which is the leaf) currently doesn't work.  If you have a 
> > hierarchy like
> > 
> > root
> > host
> > controller
> > disk
> > device
> > 
> > You can either
> > 
> >          step take root
> >          step choose firstn 0 type controller
> >          step choose firstn 1 type device
> >          step emit
> > 
> > to get N distinct controllers, and then for each of those, choose 1 
> > device.  Or,
> > 
> >          step take root
> >          step chooseleaf firstn 0 type controller
> >          step emit
> > 
> > to choose (a device nested beneath) N distinct controllers.  The 
> > difference is that the latter will try to pick a nested device for each 
> > controller and, if it can't find one, reject the controller choice and 
> > continue.  This prevents situations where you have a controller with no 
> > usable devices beneath it: the first rule picks one of those controllers 
> > in the 'choose firstn 0 type controller' step, but then can't find a 
> > device, and you end up with (N-1) results.
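The difference between the two rules can be sketched with a toy model. This is not Ceph's mapper: the `(x + r) % len(children)` pick is a hypothetical stand-in for CRUSH's hash, and the three-controller tree, the `USABLE` set (controller2 has no usable devices), and every function name are assumptions chosen only to reproduce the (N-1) case described above.

```python
# Toy model of "choose" vs "chooseleaf" (hypothetical; NOT Ceph's CRUSH mapper).
# Assumption: (x + r) % len(children) stands in for CRUSH's hash-based choice.

TREE = {
    "root": ["controller0", "controller1", "controller2"],
    "controller0": ["device0", "device1"],
    "controller1": ["device2", "device3"],
    "controller2": ["device4", "device5"],  # nothing usable below this one
}
USABLE = {"device0", "device1", "device2", "device3"}
MAX_TRIES = 10


def type_of(node):
    # Type is the name minus trailing digits, e.g. "device3" -> "device".
    return node.rstrip("0123456789")


def descend(node, want_type, x, r):
    """Walk down from node, picking one child per level, until want_type is reached."""
    while type_of(node) != want_type:
        children = TREE.get(node, [])
        if not children:
            return None
        node = children[(x + r) % len(children)]
    return node


def choose_firstn(start, n, want_type, x, accept):
    """Pick up to n distinct items of want_type, retrying with a new r on rejection."""
    out = []
    for r in range(MAX_TRIES):
        if len(out) == n:
            break
        item = descend(start, want_type, x, r)
        if item is not None and item not in out and accept(item):
            out.append(item)
    return out


def usable_leaf(node, x):
    """Return one usable device beneath node, or None."""
    found = choose_firstn(node, 1, "device", x, lambda d: d in USABLE)
    return found[0] if found else None


def two_step_choose(x, n):
    # step choose firstn n type controller / step choose firstn 1 type device
    result = []
    for ctrl in choose_firstn("root", n, "controller", x, lambda c: True):
        result += choose_firstn(ctrl, 1, "device", x, lambda d: d in USABLE)
    return result


def chooseleaf(x, n):
    # step chooseleaf firstn n type controller: a controller is accepted
    # only if a usable device can be found beneath it.
    ctrls = choose_firstn("root", n, "controller", x,
                          lambda c: usable_leaf(c, x) is not None)
    return [usable_leaf(c, x) for c in ctrls]


for x in range(3):
    print(x, two_step_choose(x, 2), chooseleaf(x, 2))
```

In this toy model, x = 1 and x = 2 land on controller2 in the two-step rule and come back with a single device, while chooseleaf rejects controller2, retries, and returns two devices for every x.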
> > 
> > The first problem you had was a bug when chooseleaf was given the leaf 
> > type (device).  It normally takes an intermediate type in the hierarchy, not 
> > the leaf type.  That's now fixed, and should give an identical result to 
> > 'choose' in that case.
> 
> OK, thanks.
> 
> > 
> > 
> > > Based on that, I reworked some of the test maps with deeper device
> > > hierarchies I had been trying, and got them to work
> > > (i.e. the file system started) when I avoided chooseleaf rules.
> > > 
> > > E.g. with a device hierarchy like this
> > > (a device here is a partition, as I am still
> > > testing on limited hardware):
> > > 
> > > type 0 device
> > > type 1 disk
> > > type 2 controller
> > > type 3 host
> > > type 4 root
> > > 
> > > a map with rules like this worked:
> > > 
> > > rule data {
> > >         ruleset 0
> > >         type replicated
> > >         min_size 2
> > >         max_size 2
> > >         step take root
> > >         step choose firstn 0 type host
> > >         step choose firstn 0 type controller
> > >         step choose firstn 0 type disk
> > >         step choose firstn 0 type device
> > >         step emit
> > > }
> 
> Based on your above explanation, I suspect this wasn't
> doing what I wanted.
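Jim's suspicion can be checked with rough arithmetic. With `firstn 0`, each step asks for as many items as there are replicas (here N = 2) beneath every item the previous step produced, so the intermediate set can grow past N. The sketch below is a hypothetical helper, not CRUSH: it ignores hashing, retries, and whatever truncation Ceph applies, and it assumes the same buckets as the map Jim posts later in this message, where every bucket at a given level has the same fan-out.

```python
# Hypothetical helper: the most items each chained "choose firstn 0" step
# could yield for Jim's tree. Ignores hashing, retries, and any truncation
# CRUSH applies; assumes firstn 0 means N = replica count at every step.

TREE = {
    "root": ["host0"],
    "host0": ["controller0", "controller1"],
    "controller0": ["disk0", "disk1"],
    "controller1": ["disk2", "disk3"],
    "disk0": ["device0"],
    "disk1": ["device1"],
    "disk2": ["device2"],
    "disk3": ["device3"],
}


def type_of(node):
    # Type is the name minus trailing digits, e.g. "disk2" -> "disk".
    return node.rstrip("0123456789")


def below_of_type(node, want_type):
    """Nodes of want_type beneath node, stopping at the first match on each path."""
    found, queue = [], list(TREE.get(node, []))
    while queue:
        child = queue.pop(0)
        if type_of(child) == want_type:
            found.append(child)
        else:
            queue.extend(TREE.get(child, []))
    return found


def max_items_per_step(steps, n):
    working, counts = ["root"], []
    for want_type in steps:
        # Each input item contributes at most n distinct items of want_type.
        working = [c for node in working for c in below_of_type(node, want_type)[:n]]
        counts.append(len(working))
    return counts


print(max_items_per_step(["host", "controller", "disk", "device"], n=2))
```

For that tree the counts come out as [1, 2, 4, 4]: there is only one host, but the controller, disk, and device steps fan out to as many as four candidate devices for a 2-replica rule.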
> 
> > > 
> > > but a map with rules like this didn't:
> > > 
> > > rule data {
> > >         ruleset 0
> > >         type replicated
> > >         min_size 2
> > >         max_size 2
> > >         step take root
> > >         step chooseleaf firstn 0 type controller
> > >         step emit
> > > }
> > 
> > Hmm, this should work (assuming there are actually nodes of type 
> > controller in the tree).  Can you send along the actual map you're trying?
> 
> Sure.  I've been using multiple partitions
> per disk for learning about CRUSH maps, so 
> in this map a device is a partition.
> 
> Here it is:
> 
> # begin crush map
> 
> # devices
> device 0 device0
> device 1 device1
> device 2 device2
> device 3 device3
> 
> # types
> type 0 device
> type 1 disk
> type 2 controller
> type 3 host
> type 4 root
> 
> # buckets
> disk disk0 {
> 	id -1		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device0 weight 1.000 pos 0
> }
> disk disk1 {
> 	id -2		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device1 weight 1.000 pos 0
> }
> disk disk2 {
> 	id -3		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device2 weight 1.000 pos 0
> }
> disk disk3 {
> 	id -4		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (1) unnecessarily
> 	hash 0	# rjenkins1
> 	item device3 weight 1.000 pos 0
> }
> controller controller0 {
> 	id -5		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item disk0 weight 1.000 pos 0
> 	item disk1 weight 1.000 pos 1
> }
> controller controller1 {
> 	id -6		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item disk2 weight 1.000 pos 0
> 	item disk3 weight 1.000 pos 1
> }
> host host0 {
> 	id -7		# do not change unnecessarily
> 	alg uniform	# do not change bucket size (2) unnecessarily
> 	hash 0	# rjenkins1
> 	item controller0 weight 2.000 pos 0
> 	item controller1 weight 2.000 pos 1
> }
> root root {
> 	id -8		# do not change unnecessarily
> 	alg straw
> 	hash 0	# rjenkins1
> 	item host0 weight 4.000
> }
> 
> # rules
> rule data {
> 	ruleset 0
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule metadata {
> 	ruleset 1
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule casdata {
> 	ruleset 2
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> rule rbd {
> 	ruleset 3
> 	type replicated
> 	min_size 2
> 	max_size 2
> 	step take root
> 	step chooseleaf firstn 0 type controller
> 	step emit
> }
> 
> # end crush map
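Sage's condition ("assuming there are actually nodes of type controller in the tree") can be checked mechanically on a decompiled map. A minimal sketch, assuming the text format shown above (bucket blocks of the form `<type> <name> { ... item <name> ... }`); `parse_buckets` and `leaves_under` are hypothetical helpers, not part of crushtool, and the embedded excerpt keeps only the bucket header and item lines from Jim's map.

```python
import re

# Bucket section of Jim's map, with ids, alg/hash lines and comments dropped.
MAP_EXCERPT = """
disk disk0 {
    item device0 weight 1.000 pos 0
}
disk disk1 {
    item device1 weight 1.000 pos 0
}
disk disk2 {
    item device2 weight 1.000 pos 0
}
disk disk3 {
    item device3 weight 1.000 pos 0
}
controller controller0 {
    item disk0 weight 1.000 pos 0
    item disk1 weight 1.000 pos 1
}
controller controller1 {
    item disk2 weight 1.000 pos 0
    item disk3 weight 1.000 pos 1
}
host host0 {
    item controller0 weight 2.000 pos 0
    item controller1 weight 2.000 pos 1
}
root root {
    item host0 weight 4.000
}
"""

HEADER = re.compile(r"^(\w+)\s+(\w+)\s*\{$")
ITEM = re.compile(r"^item\s+(\S+)")


def parse_buckets(text):
    """Map bucket name -> (bucket type, [child names]); rule blocks are skipped."""
    buckets, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        header = HEADER.match(line)
        if header and header.group(1) != "rule":
            current = header.group(2)
            buckets[current] = (header.group(1), [])
        elif line == "}":
            current = None
        elif current is not None:
            item = ITEM.match(line)
            if item:
                buckets[current][1].append(item.group(1))
    return buckets


def leaves_under(buckets, name):
    """Devices (names that are not buckets) reachable beneath name."""
    if name not in buckets:
        return [name]
    return [leaf for child in buckets[name][1] for leaf in leaves_under(buckets, child)]


buckets = parse_buckets(MAP_EXCERPT)
for name, (btype, _) in buckets.items():
    if btype == "controller":
        print(name, leaves_under(buckets, name))
```

For this excerpt both controllers exist and each has two devices beneath it, so the map itself satisfies the condition for `chooseleaf firstn 0 type controller`.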
> 
> When I try to start a file system built with the above map,
> the monitor never accepts connections (from either ceph -w
> or the cosd instances).
> 
> Thanks for taking a look.
> 
> -- Jim
> 
> > 
> > Thanks-
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 
> 
> 

Thread overview: 7+ messages
2010-06-23 19:20 crushmap rule issue: choose vs. chooseleaf Jim Schutt
2010-06-23 21:20 ` Sage Weil
2010-06-23 21:57   ` Jim Schutt
2010-06-24 18:20     ` Sage Weil [this message]
2010-06-24 19:44       ` Jim Schutt
2010-06-24 20:18         ` Sage Weil
2010-06-24 21:17           ` Jim Schutt
