* set_alloc_hint old osds
@ 2014-09-11 20:19 Samuel Just
  2014-09-11 20:30 ` Gregory Farnum
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Just @ 2014-09-11 20:19 UTC (permalink / raw)
  To: Ilya Dryomov, Ilya Dryomov, ceph-devel

http://tracker.ceph.com/issues/9419

librbd unconditionally sends set_alloc_hint.  Do we require that users
upgrade the osds first?  Also, should the primary respond with
ENOTSUPP if any replicas don't support it?
-Sam


* Re: set_alloc_hint old osds
  2014-09-11 20:19 set_alloc_hint old osds Samuel Just
@ 2014-09-11 20:30 ` Gregory Farnum
  2014-09-11 20:33   ` Samuel Just
  0 siblings, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2014-09-11 20:30 UTC (permalink / raw)
  To: Samuel Just; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

On Thu, Sep 11, 2014 at 1:19 PM, Samuel Just <sam.just@inktank.com> wrote:
> http://tracker.ceph.com/issues/9419
>
> librbd unconditionally sends set_alloc_hint.  Do we require that users
> upgrade the osds first?  Also, should the primary respond with
> ENOTSUPP if any replicas don't support it?

Something closer to the second option, I think...but then you run into
the problem where maybe the PG gets moved from a set of new OSDs to a
set of old ones that don't support the op. :/ I think for anything
that goes to disk you need to go through a full features-in-the-osdmap
process like we did for erasure coding.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
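
A minimal sketch of the features-in-the-osdmap idea mentioned above, under stated assumptions: each OSD advertises a bitmask of features, and an op that goes to disk is enabled only once every OSD in the map advertises the matching bit. The bit names, values, and helper functions are hypothetical and are not Ceph's actual definitions.

```python
from functools import reduce

# Hypothetical feature bits; names and values are illustrative only.
FEATURE_ERASURE_CODE = 1 << 0
FEATURE_ALLOC_HINT = 1 << 1


def common_features(osd_features):
    """Return the feature bits advertised by every OSD in the map."""
    if not osd_features:
        return 0
    return reduce(lambda a, b: a & b, osd_features.values())


def op_allowed(osd_features, required_bit):
    """An on-disk op is enabled only once all OSDs advertise its bit."""
    return bool(common_features(osd_features) & required_bit)


cluster = {
    "osd.0": FEATURE_ERASURE_CODE | FEATURE_ALLOC_HINT,
    "osd.1": FEATURE_ERASURE_CODE | FEATURE_ALLOC_HINT,
    "osd.2": FEATURE_ERASURE_CODE,  # not yet upgraded
}
```

In this model a single old OSD anywhere in the map keeps the op disabled, which is why moving a PG to older OSDs can no longer surprise anyone.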


* Re: set_alloc_hint old osds
  2014-09-11 20:30 ` Gregory Farnum
@ 2014-09-11 20:33   ` Samuel Just
  2014-09-11 20:40     ` Gregory Farnum
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Just @ 2014-09-11 20:33 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

That part is harmless: the transaction would be recreated for the new
acting set, taking into account the new acting set's features.  It
doesn't have any actual effect on the contents of the object.
-Sam
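
One possible reading of the point above, as a sketch only: the on-disk transaction is built fresh for whatever acting set is current, so a hint-only op can be left out when some member does not support it. The function name, the op names as strings, and the choice to drop the hint rather than reject the request are all assumptions made for illustration.

```python
# Ops that only advise the object store and never change object contents.
# The membership of this set is an assumption for the sketch.
HINT_ONLY_OPS = {"set_alloc_hint"}


def build_transaction(ops, acting_set_supported):
    """Build the transaction for the current acting set.

    `acting_set_supported` is a list with one set of supported op names per
    acting-set member. Hint-only ops that some member lacks are left out.
    """
    txn = []
    for op in ops:
        everyone_supports = all(op in s for s in acting_set_supported)
        if op in HINT_ONLY_OPS and not everyone_supports:
            continue  # dropping a hint does not alter the object's data
        txn.append(op)
    return txn
```

Because the transaction is rebuilt on each resend, a PG that moves from new OSDs to old ones simply gets a transaction without the hint.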


* Re: set_alloc_hint old osds
  2014-09-11 20:33   ` Samuel Just
@ 2014-09-11 20:40     ` Gregory Farnum
  2014-09-11 20:46       ` Samuel Just
  0 siblings, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2014-09-11 20:40 UTC (permalink / raw)
  To: Samuel Just; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

Does the hint not go into the pg log, where it could be retried on an older OSD?


* Re: set_alloc_hint old osds
  2014-09-11 20:40     ` Gregory Farnum
@ 2014-09-11 20:46       ` Samuel Just
  2014-09-11 21:05         ` Gregory Farnum
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Just @ 2014-09-11 20:46 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

No, we don't put the transaction into the pg log.
-Sam


* Re: set_alloc_hint old osds
  2014-09-11 20:46       ` Samuel Just
@ 2014-09-11 21:05         ` Gregory Farnum
  2014-09-11 21:21           ` Samuel Just
  0 siblings, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2014-09-11 21:05 UTC (permalink / raw)
  To: Samuel Just; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

Oh, in that case the peers could just share their supported ops with
the primary or something (like we do with mon commands). That sounds
good to me, anyway?
-Greg
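
The idea of peers sharing their supported ops with the primary can be sketched roughly as follows. This assumes each acting-set member reports a set of op names and the primary rejects a request with -EOPNOTSUPP if any member lacks one of the requested ops. The data layout and function names are hypothetical, not an existing Ceph interface.

```python
import errno


def common_ops(supported_by_osd):
    """Ops that every OSD in the acting set reports as supported."""
    op_sets = [set(ops) for ops in supported_by_osd.values()]
    if not op_sets:
        return set()
    return set.intersection(*op_sets)


def primary_check(requested_ops, supported_by_osd):
    """Return 0 if the whole acting set supports each op, else -EOPNOTSUPP."""
    common = common_ops(supported_by_osd)
    if all(op in common for op in requested_ops):
        return 0
    return -errno.EOPNOTSUPP


acting_set = {
    "primary": {"write", "read", "set_alloc_hint"},
    "replica.1": {"write", "read", "set_alloc_hint"},
    "replica.2": {"write", "read"},  # older OSD
}
```

Checking against the intersection means the primary never forwards an op that a replica would fail on.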


* Re: set_alloc_hint old osds
  2014-09-11 21:05         ` Gregory Farnum
@ 2014-09-11 21:21           ` Samuel Just
  2014-09-12  8:15             ` Ilya Dryomov
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Just @ 2014-09-11 21:21 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ilya Dryomov, Ilya Dryomov, ceph-devel

Yeah, so that's part of it.  The larger question is whether it's ok
for the client to indiscriminately send that op in the first place.
-Sam


* Re: set_alloc_hint old osds
  2014-09-11 21:21           ` Samuel Just
@ 2014-09-12  8:15             ` Ilya Dryomov
  0 siblings, 0 replies; 8+ messages in thread
From: Ilya Dryomov @ 2014-09-12  8:15 UTC (permalink / raw)
  To: Samuel Just; +Cc: Gregory Farnum, Ilya Dryomov, ceph-devel

On Fri, Sep 12, 2014 at 1:21 AM, Samuel Just <sam.just@inktank.com> wrote:
> Yeah, so that's part of it.  The larger question is whether it's ok
> for the client to indiscriminately send that op in the first place.

FWIW, I think it's got to be.  We don't control all the clients, and
I believe I mentioned this to Sage or Josh a while back.  We set FAILOK
to make older OSDs ignore the alloc hint op, but of course that
doesn't help if it's (one of) the replica OSDs that is older.  When
merging alloc hint, it was understood that if there are any older OSDs
in the acting set they will crash in FileStore, but nothing was done
about it.

The full feature bit sounded like overkill, especially given that
alloc hint doesn't affect the data layout, older OSDs can still read
and write fine, and after all it's just a hint.  Having the primary
return -EOPNOTSUPP based on lists of supported ops sounds to me like
a good idea, both for the alloc hint op and for future ops.

Thanks,

                Ilya
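
A rough model of the FAILOK behaviour described above, assuming a per-op flag that lets the OSD executing the request skip an op it cannot perform instead of failing the whole request. The flag value, the `Op` type, and `execute` are illustrative stand-ins rather than Ceph's real definitions.

```python
import errno
from dataclasses import dataclass

FLAG_FAILOK = 0x2  # hypothetical value, for illustration only


@dataclass
class Op:
    name: str
    flags: int = 0


def execute(ops, supported):
    """Run ops in order on one OSD; return (result, names of executed ops)."""
    done = []
    for op in ops:
        if op.name not in supported:
            if op.flags & FLAG_FAILOK:
                continue  # failure tolerated: skip this op and keep going
            return -errno.EOPNOTSUPP, done
        done.append(op.name)
    return 0, done
```

The flag is evaluated only where the request is interpreted, so in this model it protects an older primary but does nothing for an older replica that receives an already-built transaction.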


