From: Sage Weil <sweil@redhat.com>
To: Henrik Korkuc <lists@kirneh.eu>
Cc: John Spray <jspray@redhat.com>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: noout equivalent for temporary OSD rm?
Date: Wed, 8 Feb 2017 14:53:02 +0000 (UTC)
Message-ID: <alpine.DEB.2.11.1702081451280.7782@piezo.novalocal>
In-Reply-To: <108f414f-9ff8-6d05-d406-e73a157151ec@kirneh.eu>

On Wed, 8 Feb 2017, Henrik Korkuc wrote:
> On 17-02-08 16:23, Sage Weil wrote:
> > On Wed, 8 Feb 2017, John Spray wrote:
> > > So I've just finished upgrading my home cluster OSDs to bluestore by
> > > killing them one by one and then letting backfill happen to "new" OSDs
> > > on the same drives.  Hooray!
> > > 
> > > One slightly awkward thing I ran into was that even though I had noout
> > > set throughout, during the period between removing the old OSD and
> > > adding the "new" one, some PGs would of course get remapped (and start
> > > generating backfill IO to third party OSDs).  This does make sense
> > > when you think about it (noout doesn't make the cluster magically
> > > remember OSDs that have been removed), but is still an undesirable
> > > behaviour.
> > > 
> > > A) Do we currently have a mechanism to tell the cluster "even though I
> > > removed this OSD, don't go moving PGs around just yet"?  Should we add
> > > one?
> > There's 'ceph osd set norebalance'...
> > 
> > > B) Was there a way for me to avoid this by e.g. skipping the "osd rm
> > > X" and "osd crush rm osd.X" that I'm currently doing before adding the
> > > new OSD that will take the old OSD's ID?
> > This keeps coming up but I don't think we've ever proposed a good
> > solution.  Perhaps the simplest thing is to allow ceph-disk to take an OSD
> > id as an argument.  Normally this is probably a no-no since the OSD might
> > exist elsewhere and have real data on it, and you don't want multiple OSDs
> > with the same id, but we could make this safer by requiring that the
> > OSD be marked 'lost' before its id can be reused...
> what about "replace"? It could remove the old OSD from CRUSH and add a new one
> in the same location under a new id, setting all params to be the same (weight,
> reweight, affinity)?

The problem is that one of the inputs to CRUSH's pseudorandom placement 
decision is the OSD id.  If the id changes, even if the OSD is in the same 
position in the hierarchy, that OSD will get a different pseudorandom 
subset of the PGs.
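
To make the id dependence concrete, here is a toy sketch, not Ceph's code. It assumes a single flat bucket of equal-weight OSDs, one replica per PG, and a straw2-style draw (each item draws ln(u)/weight and the largest draw wins). The names `hash01`, `straw2_choose` and `pgs_on` are hypothetical, and SHA-256 plus a float logarithm stand in for CRUSH's real integer hash and fixed-point log.

```python
import hashlib
import math
from typing import Dict, Set


def hash01(pg: int, osd_id: int, replica: int) -> float:
    """Hypothetical stand-in for CRUSH's integer hash: deterministically map
    (pg, osd_id, replica) to a float in (0, 1]."""
    digest = hashlib.sha256(f"{pg}:{osd_id}:{replica}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def straw2_choose(pg: int, osds: Dict[int, float], replica: int = 0) -> int:
    """Straw2-style choice in a flat bucket: every OSD draws ln(u) / weight,
    where u depends on the OSD id, and the largest draw wins."""
    best_id, best_draw = -1, -math.inf
    for osd_id, weight in osds.items():
        draw = math.log(hash01(pg, osd_id, replica)) / weight
        if draw > best_draw:
            best_id, best_draw = osd_id, draw
    return best_id


def pgs_on(osd_id: int, osds: Dict[int, float], num_pgs: int) -> Set[int]:
    """Return the PGs (single replica) that land on osd_id."""
    return {pg for pg in range(num_pgs) if straw2_choose(pg, osds) == osd_id}


NUM_PGS = 256
before = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
# osd.3 is removed and a "replacement" with id 4 goes into the same
# position with the same weight.
after = {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0}

old = pgs_on(3, before, NUM_PGS)
new = pgs_on(4, after, NUM_PGS)
moved = {pg for pg in range(NUM_PGS)
         if straw2_choose(pg, before) != straw2_choose(pg, after)}

print(f"PGs on osd.3 before:  {len(old)}")
print(f"PGs on osd.4 after:   {len(new)}")
print(f"PGs on both:          {len(old & new)}")
print(f"PGs that changed OSD: {len(moved)}")
```

In this toy model the PGs that change OSD are exactly `old | new`: PGs in `old - new` land on one of osd.0 to osd.2, and PGs in `new - old` are pulled off them, which is the third-party backfill described earlier in the thread. Had the replacement kept id 3, every draw would be unchanged and the mapping would be identical.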

sage


Thread overview: 12+ messages
2017-02-08 12:26 noout equivalent for temporary OSD rm? John Spray
2017-02-08 12:41 ` Blair Bethwaite
2017-02-08 12:50   ` John Spray
2017-02-08 13:11     ` Dan van der Ster
2017-02-08 13:17       ` Blair Bethwaite
2017-02-08 13:13     ` Blair Bethwaite
2017-02-08 12:47 ` Dan van der Ster
2017-02-08 12:52 ` Henrik Korkuc
2017-02-08 14:23 ` Sage Weil
2017-02-08 14:37   ` Henrik Korkuc
2017-02-08 14:53     ` Sage Weil [this message]
2017-02-08 14:42   ` Wido den Hollander
