* noout equivalent for temporary OSD rm?
From: John Spray @ 2017-02-08 12:26 UTC
  To: Ceph Development

So I've just finished upgrading my home cluster OSDs to bluestore by
killing them one by one and then letting backfill happen to "new" OSDs
on the same drives.  Hooray!

One slightly awkward thing I ran into was that even though I had noout
set throughout, during the period between removing the old OSD and
adding the "new" one, some PGs would of course get remapped (and start
generating backfill IO to third-party OSDs).  This does make sense
when you think about it (noout doesn't make the cluster magically
remember OSDs that have been removed), but it's still undesirable
behaviour.

A) Do we currently have a mechanism to tell the cluster "even though I
removed this OSD, don't go moving PGs around just yet"?  Should we add
one?
B) Was there a way for me to avoid this by e.g. skipping the "osd rm
X" and "osd crush rm osd.X" that I'm currently doing before adding the
new OSD that will take the old OSD's ID?
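
For reference, the sequence I've been using per OSD is roughly the
following (from memory; X is the OSD id, and the systemd step reflects
my setup rather than anything canonical):

  ceph osd set noout
  systemctl stop ceph-osd@X        # stop the old OSD
  ceph osd rm X                    # from here until the new OSD is added,
  ceph osd crush rm osd.X          # PGs remap and start backfilling
  ceph auth del osd.X
  # ...recreate a bluestore OSD on the same drive; it takes id X...
  ceph osd unset noout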

John


* Re: noout equivalent for temporary OSD rm?
From: Blair Bethwaite @ 2017-02-08 12:41 UTC
  To: John Spray; +Cc: Ceph Development

Hi John,

ceph osd set nobackfill/norecover/norebalance ?

It's not something you want to accidentally leave set, but it's useful
nonetheless - I'm using it right at this moment to load an edited
crushmap and examine the PG remapping impact before actually pulling
the trigger and letting things sort themselves out (if I decide not to,
I can always re-inject the previous/current crushmap).
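
Roughly, the workflow (from memory, filenames arbitrary):

  ceph osd set norebalance
  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt ...
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new
  ceph -s                          # inspect the remapped PGs before committing
  # happy: ceph osd unset norebalance
  # not happy: ceph osd setcrushmap -i crush.bin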

Cheers,

On 8 February 2017 at 23:26, John Spray <jspray@redhat.com> wrote:
> [...]
>
> A) Do we currently have a mechanism to tell the cluster "even though I
> removed this OSD, don't go moving PGs around just yet"?  Should we add
> one?
> [...]



-- 
Cheers,
~Blairo


* Re: noout equivalent for temporary OSD rm?
From: Dan van der Ster @ 2017-02-08 12:47 UTC
  To: John Spray; +Cc: Ceph Development

Hi John,

I usually set nobackfill and norecover until all the OSDs are in their
final spots.
PGs get remapped but nothing moves.
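
I.e., something like:

  ceph osd set nobackfill
  ceph osd set norecover
  # remove / recreate the OSDs; PGs peer and remap, but no data moves
  ceph osd unset nobackfill
  ceph osd unset norecover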

-- Dan


On Wed, Feb 8, 2017 at 1:26 PM, John Spray <jspray@redhat.com> wrote:
> [...]

* Re: noout equivalent for temporary OSD rm?
From: John Spray @ 2017-02-08 12:50 UTC
  To: Blair Bethwaite; +Cc: Ceph Development

On Wed, Feb 8, 2017 at 12:41 PM, Blair Bethwaite
<blair.bethwaite@gmail.com> wrote:
> Hi John,
>
> ceph osd set nobackfill/norecover/norebalance ?
>
> It's not something you want to accidentally leave set, but it's useful
> nonetheless - I'm using it right at this moment to load an edited
> crushmap and examine the PG remapping impact before actually pulling
> the trigger and letting things sort themselves out (if I decide not to,
> I can always re-inject the previous/current crushmap).

Ah ha, of course nobackfill is the one.  I am exposing my lack of
experience in actually operating a cluster here :-)

John


* Re: noout equivalent for temporary OSD rm?
From: Henrik Korkuc @ 2017-02-08 12:52 UTC
  To: John Spray, Ceph Development

On 17-02-08 14:26, John Spray wrote:
> [...]
> A) Do we currently have a mechanism to tell the cluster "even though I
> removed this OSD, don't go moving PGs around just yet"?  Should we add
> one?
I successfully used "nobackfill" for that purpose. The cluster does
peering, but no actual data movement happens.


* Re: noout equivalent for temporary OSD rm?
From: Dan van der Ster @ 2017-02-08 13:11 UTC
  To: John Spray; +Cc: Blair Bethwaite, Ceph Development

On Wed, Feb 8, 2017 at 1:50 PM, John Spray <jspray@redhat.com> wrote:
>> [...]
>
> Ah ha, of course nobackfill is the one.  I am exposing my lack of
> experience in actually operating a cluster here :-)
>

That said, it might make sense for Ceph to wait a few minutes before
starting to backfill after any osdmap changes.  The current behaviour
can be a little erratic at times.
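
If memory serves there's already a knob in that spirit,
osd_recovery_delay_start - e.g. in ceph.conf (value made up):

  [osd]
  # wait this many seconds after peering before starting recovery
  osd recovery delay start = 60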

-- Dan



* Re: noout equivalent for temporary OSD rm?
From: Blair Bethwaite @ 2017-02-08 13:13 UTC
  To: John Spray; +Cc: Ceph Development

I'm heartened by the fact you have a home cluster. No children I presume? :-)

-- 
Cheers,
~Blairo


* Re: noout equivalent for temporary OSD rm?
From: Blair Bethwaite @ 2017-02-08 13:17 UTC
  To: Dan van der Ster; +Cc: John Spray, Ceph Development

On 9 February 2017 at 00:11, Dan van der Ster <dan@vanderster.com> wrote:
>> [...]
>
> That said, it might make sense for Ceph to wait a few minutes before
> starting to backfill after any osdmap changes.  The current behaviour
> can be a little erratic at times.

Agreed. And the scenario I described above must be very common in
operations, where I suspect more often than not people just make the
change and hope all will be well. It's true crushtool can simulate
mappings, but what I really want to see is the `ceph -s` after the
crush change but before the cluster starts actually acting on it - that
gives you a chance to see, e.g., the amount of data that will move and
whether the number of impacted PGs makes sense for the change.
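
For completeness, the crushtool simulation I mean is something like this
(flags from memory):

  crushtool -i crush.bin --test --show-mappings --num-rep 3 > before.txt
  crushtool -i crush.new --test --show-mappings --num-rep 3 > after.txt
  diff before.txt after.txt | wc -l     # rough count of changed mappings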

-- 
Cheers,
~Blairo


* Re: noout equivalent for temporary OSD rm?
From: Sage Weil @ 2017-02-08 14:23 UTC
  To: John Spray; +Cc: Ceph Development

On Wed, 8 Feb 2017, John Spray wrote:
> [...]
> A) Do we currently have a mechanism to tell the cluster "even though I
> removed this OSD, don't go moving PGs around just yet"?  Should we add
> one?

There's 'ceph osd set norebalance'...

> B) Was there a way for me to avoid this by e.g. skipping the "osd rm
> X" and "osd crush rm osd.X" that I'm currently doing before adding the
> new OSD that will take the old OSD's ID?

This keeps coming up but I don't think we've ever proposed a good
solution.  Perhaps the simplest thing is to allow ceph-disk to take an
OSD id as an argument.  Normally this is probably a no-no since the OSD
might exist elsewhere and have real data on it, and you don't want
multiple OSDs with the same id, but we could make this safer by
requiring that the OSD be marked 'lost' before its id can be reused...
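
Something like this (sketch only - 'osd lost' exists today, the id
argument to ceph-disk is the hypothetical part):

  ceph osd lost 12 --yes-i-really-mean-it    # declare the old osd.12 gone
  ceph-disk prepare --osd-id 12 /dev/sdb     # hypothetical flag: reuse id 12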

sage


* Re: noout equivalent for temporary OSD rm?
From: Henrik Korkuc @ 2017-02-08 14:37 UTC
  To: Sage Weil, John Spray; +Cc: Ceph Development

On 17-02-08 16:23, Sage Weil wrote:
>> [...]
> This keeps coming up but I don't think we've ever proposed a good
> solution.  Perhaps the simplest thing is to allow ceph-disk to take an OSD
> id as an argument.  Normally this is probably a no-no since the OSD might
> exist elsewhere and have real data on it, and you don't want multiple OSDs
> with the same id, but we could make this safer by requiring that the
> OSD be marked 'lost' before its id can be reused...
what about "replace"? It could remove the old osd from the crush map and
add a new one in the same location under a new id, setting all params to
the same values (weight, reweight, affinity)?
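
The manual equivalent with existing commands would be roughly this (id,
weight and host below are placeholders):

  ceph osd crush rm osd.X
  ceph osd crush add osd.Y 1.81959 host=myhost   # same weight, same spot
  ceph osd reweight Y 1.0
  ceph osd primary-affinity osd.Y 1.0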



* Re: noout equivalent for temporary OSD rm?
From: Wido den Hollander @ 2017-02-08 14:42 UTC
  To: Sage Weil, John Spray; +Cc: Ceph Development


> On 8 February 2017 at 15:23, Sage Weil <sage@newdream.net> wrote:
>
> [...]
>
> This keeps coming up but I don't think we've ever proposed a good 
> solution.  Perhaps the simplest thing is to allow ceph-disk to take an OSD 
> id as an argument.  Normally this is probably a no-no since the OSD might 
> exist elsewhere and have real data on it, and you don't want multiple OSDs 
> with the same id, but we could make this safer by requiring that the 
> OSD be marked 'lost' before its id can be reused...
> 

Or the OSD should not exist yet? Allowing ceph-disk to re-use an ID and
UUID would make it a lot easier when switching from FileStore to
BlueStore as well.

ceph-disk --zap-disk -i 0 --uuid XXX-XXXX-XXXX-XXXX prepare /dev/sda

That way you can wipe an existing disk and redeploy it as BlueStore.

Wido


* Re: noout equivalent for temporary OSD rm?
From: Sage Weil @ 2017-02-08 14:53 UTC
  To: Henrik Korkuc; +Cc: John Spray, Ceph Development

On Wed, 8 Feb 2017, Henrik Korkuc wrote:
> > [...]
> what about "replace"? It could remove the old osd from the crush map and
> add a new one in the same location under a new id, setting all params to
> the same values (weight, reweight, affinity)?

The problem is that one of the inputs to CRUSH's pseudorandom placement
decision is the OSD id.  If the id changes, even if the OSD is in the
same position in the hierarchy, that OSD will get a different
pseudorandom subset of the PGs.
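
This is easy to see with crushtool, e.g. (untested sketch; the two maps
are identical except one device is renumbered in place):

  crushtool -i map-old.bin --test --show-mappings --num-rep 3 > old.txt
  crushtool -i map-newid.bin --test --show-mappings --num-rep 3 > new.txt
  diff old.txt new.txt    # the renumbered device gets a different PG subset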

sage

