* divergent datastructure changes
@ 2016-08-25 21:10 Yehuda Sadeh-Weinraub
  2016-08-25 21:32 ` Sage Weil
  2016-08-25 21:41 ` Gregory Farnum
  0 siblings, 2 replies; 6+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-08-25 21:10 UTC (permalink / raw)
  To: ceph-devel

The encode/decode functionality that we use for [de]marshalling is
fine, as long as we always move forward. Here's a typical example
(redacted for simplicity).

  void encode(bufferlist& bl) const {
    ENCODE_START(8, 1, bl);
    ::encode(domain_root, bl);
    ::encode(control_pool, bl);
    ::encode(gc_pool, bl);
    ::encode(log_pool, bl);
    ::encode(intent_log_pool, bl);
    ::encode(usage_log_pool, bl);
    ::encode(user_keys_pool, bl);
    ::encode(user_email_pool, bl);
    ::encode(user_swift_pool, bl);
    ::encode(user_uid_pool, bl);
...
    ::encode(system_key, bl);
    ::encode(placement_pools, bl);
    ::encode(metadata_heap, bl);
    ::encode(realm_id, bl);
...
    ENCODE_FINISH(bl);
  }

  void decode(bufferlist::iterator& bl) {
    DECODE_START(8, bl);
    ::decode(domain_root, bl);
    ::decode(control_pool, bl);
    ::decode(gc_pool, bl);
    ::decode(log_pool, bl);
    ::decode(intent_log_pool, bl);
    ::decode(usage_log_pool, bl);
    ::decode(user_keys_pool, bl);
    ::decode(user_email_pool, bl);
    ::decode(user_swift_pool, bl);
    ::decode(user_uid_pool, bl);
...
    if (struct_v >= 3)
      ::decode(system_key, bl);
    if (struct_v >= 4)
      ::decode(placement_pools, bl);
    if (struct_v >= 5)
      ::decode(metadata_heap, bl);
    if (struct_v >= 6) {
      ::decode(realm_id, bl);
    }
...
    DECODE_FINISH(bl);
  }

So the idea is that whenever we add a field, we bump up the encoded
version, add the field at the end. Decoding is done in order, and we
test struct_v to determine whether we need to decode the next param.
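
For instance (tenant_pool here is a made-up field, just to show the
mechanics), adding one more member on top of the structure above would
bump the version to 9 and guard the new decode at the end:

  void encode(bufferlist& bl) const {
    ENCODE_START(9, 1, bl);   // was ENCODE_START(8, 1, bl)
...
    ::encode(realm_id, bl);
...
    ::encode(tenant_pool, bl);  // new fields always go at the very end
    ENCODE_FINISH(bl);
  }

  void decode(bufferlist::iterator& bl) {
    DECODE_START(9, bl);
...
    if (struct_v >= 6) {
      ::decode(realm_id, bl);
    }
...
    if (struct_v >= 9)
      ::decode(tenant_pool, bl);  // older encodings simply stop before this
    DECODE_FINISH(bl);
  }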

The main issue I'm having trouble with right now is what to do when we
need to backport a change that needs a datastructure change. For
example, in the above example, let's say that we need to backport the
realm_id field to an older version that was only up to V3.

One solution would be to make sure that when backporting such a
change, we drag with us all the other fields that lead up to the one
that we need (e.g., we want realm_id, but we also need to bring
placement_pools and metadata_heap with us). This might not be so
sustainable. The above example might be trivial, but what if
metadata_heap were not a string, but rather a complex data type that,
in order to build correctly, requires backporting another feature (and
brings with it the same issues)?
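
To make that concrete (sketch only, reusing the field names from the
example above): backporting realm_id to that V3-era branch means the
branch also has to grow the placement_pools and metadata_heap members
and their encode/decode lines, with the struct version bumped all the
way to 6, e.g.:

    if (struct_v >= 3)
      ::decode(system_key, bl);
    if (struct_v >= 4)
      ::decode(placement_pools, bl);   // dragged along, unused there
    if (struct_v >= 5)
      ::decode(metadata_heap, bl);     // dragged along, unused there
    if (struct_v >= 6)
      ::decode(realm_id, bl);          // the field we actually wanted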

It seems to me that for issues like that we might want to consider
adding a more sophisticated encoding scheme that will be more feature
oriented, rather than just blindly putting everything one after the
other.

E.g., some kind of a bit field with offsets into the data, along the
following lines:

feature.encode(0, system_key);
feature.encode(1, placement_pools);
feature.encode(2, metadata_heap);
::encode(features, bl);

and on the decoding side:

::decode(features, bl);
features.decode(0, system_key);
features.decode(1, placement_pools);
features.decode(2, metadata_heap);

In the above example, if we only need metadata_heap, then we can just do this:

::decode(features, bl);
features.decode(2, metadata_heap);

The indexes of the fields would obviously need to be defined
appropriately, and kept consistent across versions. I think that should
be easier to maintain than making sure we keep the data structures
consistent across divergent branches.
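
Roughly, the container could look something like the following (just a
sketch of the shape; all the names here are made up): a map from a
fixed feature index to that field's own encoded blob, so a decoder
picks out the indexes it knows about and silently skips the rest:

  struct feature_map {
    map<uint64_t, bufferlist> entries;   // feature index -> encoded field

    template <class T>
    void encode_field(uint64_t index, const T& v) {
      bufferlist fbl;
      ::encode(v, fbl);
      entries[index] = fbl;
    }

    template <class T>
    bool decode_field(uint64_t index, T& v) {
      auto it = entries.find(index);
      if (it == entries.end())
        return false;                    // not present, keep the default
      bufferlist::iterator p = it->second.begin();
      ::decode(v, p);
      return true;
    }

    // the map itself goes through the regular ::encode/::decode for
    // map<uint64_t, bufferlist>, so unknown indexes are carried through
    // and skipped without the decoder having to understand them
    void encode(bufferlist& bl) const { ::encode(entries, bl); }
    void decode(bufferlist::iterator& bl) { ::decode(entries, bl); }
  };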

Any thoughts?

Thanks,
Yehuda


* Re: divergent datastructure changes
  2016-08-25 21:10 divergent datastructure changes Yehuda Sadeh-Weinraub
@ 2016-08-25 21:32 ` Sage Weil
  2016-08-25 21:59   ` Yehuda Sadeh-Weinraub
  2016-08-25 21:41 ` Gregory Farnum
  1 sibling, 1 reply; 6+ messages in thread
From: Sage Weil @ 2016-08-25 21:32 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub; +Cc: ceph-devel

On Thu, 25 Aug 2016, Yehuda Sadeh-Weinraub wrote:
> The encode/decode functionality that we use for [de]marshalling is
> fine, as long as we always move forward. Here's a typical example
> (redacted for simplicity).
> 
>   void encode(bufferlist& bl) const {
>     ENCODE_START(8, 1, bl);
>     ::encode(domain_root, bl);
>     ::encode(control_pool, bl);
>     ::encode(gc_pool, bl);
>     ::encode(log_pool, bl);
>     ::encode(intent_log_pool, bl);
>     ::encode(usage_log_pool, bl);
>     ::encode(user_keys_pool, bl);
>     ::encode(user_email_pool, bl);
>     ::encode(user_swift_pool, bl);
>     ::encode(user_uid_pool, bl);
> ...
>     ::encode(system_key, bl);
>     ::encode(placement_pools, bl);
>     ::encode(metadata_heap, bl);
>     ::encode(realm_id, bl);
> ...
>     ENCODE_FINISH(bl);
>   }
> 
>   void decode(bufferlist::iterator& bl) {
>     DECODE_START(8, bl);
>     ::decode(domain_root, bl);
>     ::decode(control_pool, bl);
>     ::decode(gc_pool, bl);
>     ::decode(log_pool, bl);
>     ::decode(intent_log_pool, bl);
>     ::decode(usage_log_pool, bl);
>     ::decode(user_keys_pool, bl);
>     ::decode(user_email_pool, bl);
>     ::decode(user_swift_pool, bl);
>     ::decode(user_uid_pool, bl);
> ...
>     if (struct_v >= 3)
>       ::decode(system_key, bl);
>     if (struct_v >= 4)
>       ::decode(placement_pools, bl);
>     if (struct_v >= 5)
>       ::decode(metadata_heap, bl);
>     if (struct_v >= 6) {
>       ::decode(realm_id, bl);
>     }
> ...
>     DECODE_FINISH(bl);
>   }
> 
> So the idea is that whenever we add a field, we bump up the encoded
> version, add the field at the end. Decoding is done in order, and we
> test struct_v to determine whether we need to decode the next param.
> 
> The main issue I'm having trouble with right now is what to do when we
> need to backport a change that needs a datastructure change. For
> example, in the above example, let's say that we need to backport the
> realm_id field to an older version that was only up to V3.
>
> One solution would be to make sure that when backporting such a
> change, we drag with us all the other fields that lead up to the one
> that we need (e.g., we want realm_id, but we also need to bring
> placement_pools and metadata_heap with us). This might not be so
> sustainable. The above example might be trivial, but what if
> metadata_heap were not a string, but rather a complex data type that,
> in order to build correctly, requires backporting another feature (and
> brings with it the same issues)?

In the past we've just dragged the other fields with us.  I can only 
recall this happening a handful of times.  I try to put any new fields 
into a separate commit with *just* the data structure changes so that 
(among other things) it can be easily backported without dragging anything 
else along with it.
 
> It seems to me that for issues like that we might want to consider
> adding a more sophisticated encoding scheme that will be more feature
> oriented, rather than just blindly putting everything one after the
> other.
> 
> E.g., some kind of a bit field with offsets into the data, along the
> following lines:
> 
> feature.encode(0, system_key);
> feature.encode(1, placement_pools);
> feature.encode(2, metadata_heap);
> ::encode(features, bl);
> 
> and on the decoding side:
> 
> ::decode(features, bl);
> features.decode(0, system_key);
> features.decode(1, placement_pools);
> features.decode(2, metadata_heap);
> 
> In the above example, if we only need metadata_heap, then we can just do this:
> 
> ::decode(features, bl);
> features.decode(2, metadata_heap);
> 
> The indexes of the fields would obviously need to be defined
> appropriately, and kept consistent across versions. I think that should
> be easier to maintain than making sure we keep the data structures
> consistent across divergent branches.

This seems useful if we need to backport things out of order, but how 
often does that really happen?  The cost is some CPU time during decoding, 
incurred for every encode/decode for all time, to cover a case that should 
be pretty rare.

Have we seen a case where the intervening fields are complex data
structure(s) and difficult to backport?

sage


* Re: divergent datastructure changes
  2016-08-25 21:10 divergent datastructure changes Yehuda Sadeh-Weinraub
  2016-08-25 21:32 ` Sage Weil
@ 2016-08-25 21:41 ` Gregory Farnum
  1 sibling, 0 replies; 6+ messages in thread
From: Gregory Farnum @ 2016-08-25 21:41 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub; +Cc: ceph-devel

On Thu, Aug 25, 2016 at 2:10 PM, Yehuda Sadeh-Weinraub
<yehuda@redhat.com> wrote:
> The encode/decode functionality that we use for [de]marshalling is
> fine, as long as we always move forward. Here's a typical example
> (redacted for simplicity).
>
>   void encode(bufferlist& bl) const {
>     ENCODE_START(8, 1, bl);
>     ::encode(domain_root, bl);
>     ::encode(control_pool, bl);
>     ::encode(gc_pool, bl);
>     ::encode(log_pool, bl);
>     ::encode(intent_log_pool, bl);
>     ::encode(usage_log_pool, bl);
>     ::encode(user_keys_pool, bl);
>     ::encode(user_email_pool, bl);
>     ::encode(user_swift_pool, bl);
>     ::encode(user_uid_pool, bl);
> ...
>     ::encode(system_key, bl);
>     ::encode(placement_pools, bl);
>     ::encode(metadata_heap, bl);
>     ::encode(realm_id, bl);
> ...
>     ENCODE_FINISH(bl);
>   }
>
>   void decode(bufferlist::iterator& bl) {
>     DECODE_START(8, bl);
>     ::decode(domain_root, bl);
>     ::decode(control_pool, bl);
>     ::decode(gc_pool, bl);
>     ::decode(log_pool, bl);
>     ::decode(intent_log_pool, bl);
>     ::decode(usage_log_pool, bl);
>     ::decode(user_keys_pool, bl);
>     ::decode(user_email_pool, bl);
>     ::decode(user_swift_pool, bl);
>     ::decode(user_uid_pool, bl);
> ...
>     if (struct_v >= 3)
>       ::decode(system_key, bl);
>     if (struct_v >= 4)
>       ::decode(placement_pools, bl);
>     if (struct_v >= 5)
>       ::decode(metadata_heap, bl);
>     if (struct_v >= 6) {
>       ::decode(realm_id, bl);
>     }
> ...
>     DECODE_FINISH(bl);
>   }
>
> So the idea is that whenever we add a field, we bump up the encoded
> version, add the field at the end. Decoding is done in order, and we
> test struct_v to determine whether we need to decode the next param.
>
> The main issue I'm having trouble with right now is what to do when we
> need to backport a change that needs a datastructure change. For
> example, in the above example, let's say that we need to backport the
> realm_id field to an older version that was only up to V3.
>
> One solution would be to make sure that when backporting such a
> change, we drag with us all the other fields that lead up to the one
> that we need (e.g., we want realm_id, but we also need to bring
> placement_pools and metadata_heap with us). This might not be so
> sustainable. The above example might be trivial, but what if
> metadata_heap were not a string, but rather a complex data type that,
> in order to build correctly, requires backporting another feature (and
> brings with it the same issues)?
>
> It seems to me that for issues like that we might want to consider
> adding a more sophisticated encoding scheme that will be more feature
> oriented, rather than just blindly putting everything one after the
> other.
>
> E.g., some kind of a bit field with offsets into the data, along the
> following lines:
>
> feature.encode(0, system_key);
> feature.encode(1, placement_pools);
> feature.encode(2, metadata_heap);
> ::encode(features, bl);
>
> and on the decoding side:
>
> ::decode(features, bl);
> features.decode(0, system_key);
> features.decode(1, placement_pools);
> features.decode(2, metadata_heap);
>
> In the above example, if we only need metadata_heap, then we can just do this:
>
> ::decode(features, bl);
> features.decode(2, metadata_heap);
>
> The indexes of the fields would obviously need to be defined
> appropriately, and kept consistent across versions. I think that should
> be easier to maintain than making sure we keep the data structures
> consistent across divergent branches.
>
> Any thoughts?

This kind of scheme looks like it makes the encoding way fatter in
terms of metadata overhead (adding an offset for every new member,
many of which are offset-sized and repeated constantly in the FS
code), which I'm not comfortable with. I'm not sure whether I've run
into this specific problem, but generally we've been able to just
blindly encode default values into the backport, or else use feature
bit/CompatSet info to supplement the raw data structure encoding when
determining what to do.
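
To spell out the "blindly encode default values" option (sketch only,
reusing the field names from Yehuda's example): the backport gains
realm_id but not the features behind the intervening fields, so it
keeps their slots in the stream and just feeds them default-constructed,
never-populated members:

    ENCODE_START(6, 1, bl);          // bumped so realm_id lands in its usual slot
...
    ::encode(system_key, bl);
    ::encode(placement_pools, bl);   // placeholder, exists only for the wire format
    ::encode(metadata_heap, bl);     // placeholder, decoded on read and then ignored
    ::encode(realm_id, bl);          // the field the backport actually needs
    ENCODE_FINISH(bl);

That keeps the encoding identical to newer branches without backporting
any of the logic behind the placeholder fields.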
-Greg


* Re: divergent datastructure changes
  2016-08-25 21:32 ` Sage Weil
@ 2016-08-25 21:59   ` Yehuda Sadeh-Weinraub
  2016-08-25 22:04     ` Mark Nelson
  0 siblings, 1 reply; 6+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-08-25 21:59 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Thu, Aug 25, 2016 at 2:32 PM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 25 Aug 2016, Yehuda Sadeh-Weinraub wrote:
>> The encode/decode functionality that we use for [de]marshalling is
>> fine, as long as we always move forward. Here's a typical example
>> (redacted for simplicity).
>>
>>   void encode(bufferlist& bl) const {
>>     ENCODE_START(8, 1, bl);
>>     ::encode(domain_root, bl);
>>     ::encode(control_pool, bl);
>>     ::encode(gc_pool, bl);
>>     ::encode(log_pool, bl);
>>     ::encode(intent_log_pool, bl);
>>     ::encode(usage_log_pool, bl);
>>     ::encode(user_keys_pool, bl);
>>     ::encode(user_email_pool, bl);
>>     ::encode(user_swift_pool, bl);
>>     ::encode(user_uid_pool, bl);
>> ...
>>     ::encode(system_key, bl);
>>     ::encode(placement_pools, bl);
>>     ::encode(metadata_heap, bl);
>>     ::encode(realm_id, bl);
>> ...
>>     ENCODE_FINISH(bl);
>>   }
>>
>>   void decode(bufferlist::iterator& bl) {
>>     DECODE_START(8, bl);
>>     ::decode(domain_root, bl);
>>     ::decode(control_pool, bl);
>>     ::decode(gc_pool, bl);
>>     ::decode(log_pool, bl);
>>     ::decode(intent_log_pool, bl);
>>     ::decode(usage_log_pool, bl);
>>     ::decode(user_keys_pool, bl);
>>     ::decode(user_email_pool, bl);
>>     ::decode(user_swift_pool, bl);
>>     ::decode(user_uid_pool, bl);
>> ...
>>     if (struct_v >= 3)
>>       ::decode(system_key, bl);
>>     if (struct_v >= 4)
>>       ::decode(placement_pools, bl);
>>     if (struct_v >= 5)
>>       ::decode(metadata_heap, bl);
>>     if (struct_v >= 6) {
>>       ::decode(realm_id, bl);
>>     }
>> ...
>>     DECODE_FINISH(bl);
>>   }
>>
>> So the idea is that whenever we add a field, we bump up the encoded
>> version, add the field at the end. Decoding is done in order, and we
>> test struct_v to determine whether we need to decode the next param.
>>
>> The main issue I'm having trouble with right now is what to do when we
>> need to backport a change that needs a datastructure change. For
>> example, in the above example, let's say that we need to backport the
>> realm_id field to an older version that was only up to V3.
>>
>> One solution would be to make sure that when backporting such a
>> change, we drag with us all the other fields that lead up to the one
>> that we need (e.g., we want realm_id, but we also need to bring
>> placement_pools and metadata_heap with us). This might not be so
>> sustainable. The above example might be trivial, but what if
>> metadata_heap were not a string, but rather a complex data type that,
>> in order to build correctly, requires backporting another feature (and
>> brings with it the same issues)?
>
> In the past we've just dragged the other fields with us.  I can only
> recall this happening a handful of times.  I try to put any new fields
> into a separate commit with *just* the data structure changes so that
> (among other things) it can be easily backported without dragging anything
> else along with it.

Right. It's a delicate job, and it only gets more complicated when
dealing with multiple versions that are not necessarily going in the
same direction.

>
>> It seems to me that for issues like that we might want to consider
>> adding a more sophisticated encoding scheme that will be more feature
>> oriented, rather than just blindly putting everything one after the
>> other.
>>
>> E.g., some kind of a bit field with offsets into the data, along the
>> following lines:
>>
>> feature.encode(0, system_key);
>> feature.encode(1, placement_pools);
>> feature.encode(2, metadata_heap);
>> ::encode(features, bl);
>>
>> and on the decoding side:
>>
>> ::decode(features, bl);
>> features.decode(0, system_key);
>> features.decode(1, placement_pools);
>> features.decode(2, metadata_heap);
>>
>> In the above example, if we only need metadata_heap, then we can just do this:
>>
>> ::decode(features, bl);
>> features.decode(2, metadata_heap);
>>
>> The indexes of the fields would obviously need to be defined
>> appropriately, and kept consistent across versions. I think that should
>> be easier to maintain than making sure we keep the data structures
>> consistent across divergent branches.
>
> This seems useful if we need to backport things out of order, but how
> often does that really happen?  The cost is some CPU time during decoding,
> incurred for every encode/decode for all time, to cover a case that should
> be pretty rare.
>
> Have we seen a case where the intervening fields are complex data
> structure(s) and difficult to backport?
>

We've managed pretty well when mainly going forward. However, this
limits our ability to backport complex fixes and/or entire sets of
features. As I said earlier, keeping the current scheme is possible,
but needing to maintain divergent versions makes our lives much more
complicated.
I'm well aware of the impact on both performance and structure sizes.
We could maybe limit this scheme to just the encoding of the distinct
set of changes that we know we'll need to backport (we need to know
that in advance anyway), and continue with the current method as is.
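
As a rough sketch of what I mean by side by side (the field and key
names are made up): the existing versioned fields stay exactly as they
are, and only the change we expect to backport goes into a keyed blob
appended at the end:

  void encode(bufferlist& bl) const {
    ENCODE_START(9, 1, bl);
...
    map<uint64_t, bufferlist> extra;   // only for backport-prone additions
    {
      bufferlist fbl;
      ::encode(new_backport_prone_field, fbl);
      extra[FEATURE_NEW_FIELD] = fbl;
    }
    ::encode(extra, bl);
    ENCODE_FINISH(bl);
  }

  void decode(bufferlist::iterator& bl) {
    DECODE_START(9, bl);
...
    if (struct_v >= 9) {
      map<uint64_t, bufferlist> extra;
      ::decode(extra, bl);
      auto it = extra.find(FEATURE_NEW_FIELD);
      if (it != extra.end()) {
        bufferlist::iterator p = it->second.begin();
        ::decode(new_backport_prone_field, p);
      }
    }
    DECODE_FINISH(bl);
  }

(This assumes the keyed section itself is introduced at a struct_v that
both branches already share, so every decoder knows where to find it.)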

Yehuda


* Re: divergent datastructure changes
  2016-08-25 21:59   ` Yehuda Sadeh-Weinraub
@ 2016-08-25 22:04     ` Mark Nelson
  2016-08-25 22:20       ` Yehuda Sadeh-Weinraub
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Nelson @ 2016-08-25 22:04 UTC (permalink / raw)
  To: Yehuda Sadeh-Weinraub, Sage Weil; +Cc: ceph-devel



On 08/25/2016 04:59 PM, Yehuda Sadeh-Weinraub wrote:
> On Thu, Aug 25, 2016 at 2:32 PM, Sage Weil <sage@newdream.net> wrote:
>> On Thu, 25 Aug 2016, Yehuda Sadeh-Weinraub wrote:
>>> The encode/decode functionality that we use for [de]marshalling is
>>> fine, as long as we always move forward. Here's a typical example
>>> (redacted for simplicity).
>>>
>>>   void encode(bufferlist& bl) const {
>>>     ENCODE_START(8, 1, bl);
>>>     ::encode(domain_root, bl);
>>>     ::encode(control_pool, bl);
>>>     ::encode(gc_pool, bl);
>>>     ::encode(log_pool, bl);
>>>     ::encode(intent_log_pool, bl);
>>>     ::encode(usage_log_pool, bl);
>>>     ::encode(user_keys_pool, bl);
>>>     ::encode(user_email_pool, bl);
>>>     ::encode(user_swift_pool, bl);
>>>     ::encode(user_uid_pool, bl);
>>> ...
>>>     ::encode(system_key, bl);
>>>     ::encode(placement_pools, bl);
>>>     ::encode(metadata_heap, bl);
>>>     ::encode(realm_id, bl);
>>> ...
>>>     ENCODE_FINISH(bl);
>>>   }
>>>
>>>   void decode(bufferlist::iterator& bl) {
>>>     DECODE_START(8, bl);
>>>     ::decode(domain_root, bl);
>>>     ::decode(control_pool, bl);
>>>     ::decode(gc_pool, bl);
>>>     ::decode(log_pool, bl);
>>>     ::decode(intent_log_pool, bl);
>>>     ::decode(usage_log_pool, bl);
>>>     ::decode(user_keys_pool, bl);
>>>     ::decode(user_email_pool, bl);
>>>     ::decode(user_swift_pool, bl);
>>>     ::decode(user_uid_pool, bl);
>>> ...
>>>     if (struct_v >= 3)
>>>       ::decode(system_key, bl);
>>>     if (struct_v >= 4)
>>>       ::decode(placement_pools, bl);
>>>     if (struct_v >= 5)
>>>       ::decode(metadata_heap, bl);
>>>     if (struct_v >= 6) {
>>>       ::decode(realm_id, bl);
>>>     }
>>> ...
>>>     DECODE_FINISH(bl);
>>>   }
>>>
>>> So the idea is that whenever we add a field, we bump up the encoded
>>> version, add the field at the end. Decoding is done in order, and we
>>> test struct_v to determine whether we need to decode the next param.
>>>
>>> The main issue I'm having trouble with right now is what to do when we
>>> need to backport a change that needs a datastructure change. For
>>> example, in the above example, let's say that we need to backport the
>>> realm_id field to an older version that was only up to V3.
>>>
>>> One solution would be to make sure that when backporting such a
>>> change, we drag with us all the other fields that lead up to the one
>>> that we need (e.g., we want realm_id, but we also need to bring
>>> placement_pools and metadata_heap with us). This might not be so
>>> sustainable. The above example might be trivial, but what if
>>> metadata_heap were not a string, but rather a complex data type that,
>>> in order to build correctly, requires backporting another feature (and
>>> brings with it the same issues)?
>>
>> In the past we've just dragged the other fields with us.  I can only
>> recall this happening a handful of times.  I try to put any new fields
>> into a separate commit with *just* the data structure changes so that
>> (among other things) it can be easily backported without dragging anything
>> else along with it.
>
> Right. It's a delicate job, and it only gets more complicated when
> dealing with multiple versions that are not necessarily going in the
> same direction.
>
>>
>>> It seems to me that for issues like that we might want to consider
>>> adding a more sophisticated encoding scheme that will be more feature
>>> oriented, rather than just blindly putting everything one after the
>>> other.
>>>
>>> E.g., some kind of a bit field with offsets into the data, along the
>>> following lines:
>>>
>>> feature.encode(0, system_key);
>>> feature.encode(1, placement_pools);
>>> feature.encode(2, metadata_heap);
>>> ::encode(features, bl);
>>>
>>> and on the decoding side:
>>>
>>> ::decode(features, bl);
>>> features.decode(0, system_key);
>>> features.decode(1, placement_pools);
>>> features.decode(2, metadata_heap);
>>>
>>> In the above example, if we only need metadata_heap, then we can just do this:
>>>
>>> ::decode(features, bl);
>>> features.decode(2, metadata_heap);
>>>
>>> The indexes of the fields would obviously need to be defined
>>> appropriately, and kept consistent across versions. I think that should
>>> be easier to maintain than making sure we keep the data structures
>>> consistent across divergent branches.
>>
>> This seems useful if we need to backport things out of order, but how
>> often does that really happen?  The cost is some CPU time during decoding,
>> incurred for every encode/decode for all time, to cover a case that should
>> be pretty rare.
>>
>> Have we seen a case where the intervening fields are complex data
>> structure(s) and difficult to backport?
>>
>
> We've managed pretty well when mainly going forward. However, this
> limits our ability to backport complex fixes and/or entire sets of
> features. As I said earlier, keeping the current scheme is possible,
> but needing to maintain divergent versions makes our lives much more
> complicated.
> I'm well aware of the impact on both performance and structure sizes.
> We could maybe limit this scheme to just the encoding of the distinct
> set of changes that we know we'll need to backport (we need to know
> that in advance anyway), and continue with the current method as is.

fwiw, I'm very sensitive to encoding overhead right now.  It's like the 
#1 thing I'm grumpy about with maybe the exception of the split/merge 
issues in filestore.  I'd really prefer not doing anything that's going 
to make things worse, at least not until we make it better!

Mark

>
> Yehuda


* Re: divergent datastructure changes
  2016-08-25 22:04     ` Mark Nelson
@ 2016-08-25 22:20       ` Yehuda Sadeh-Weinraub
  0 siblings, 0 replies; 6+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-08-25 22:20 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Sage Weil, ceph-devel

On Thu, Aug 25, 2016 at 3:04 PM, Mark Nelson <mnelson@redhat.com> wrote:
>
>
> On 08/25/2016 04:59 PM, Yehuda Sadeh-Weinraub wrote:
>>
>> On Thu, Aug 25, 2016 at 2:32 PM, Sage Weil <sage@newdream.net> wrote:
>>>
>>> On Thu, 25 Aug 2016, Yehuda Sadeh-Weinraub wrote:
>>>>
>>>> The encode/decode functionality that we use for [de]marshalling is
>>>> fine, as long as we always move forward. Here's a typical example
>>>> (redacted for simplicity).
>>>>
>>>>   void encode(bufferlist& bl) const {
>>>>     ENCODE_START(8, 1, bl);
>>>>     ::encode(domain_root, bl);
>>>>     ::encode(control_pool, bl);
>>>>     ::encode(gc_pool, bl);
>>>>     ::encode(log_pool, bl);
>>>>     ::encode(intent_log_pool, bl);
>>>>     ::encode(usage_log_pool, bl);
>>>>     ::encode(user_keys_pool, bl);
>>>>     ::encode(user_email_pool, bl);
>>>>     ::encode(user_swift_pool, bl);
>>>>     ::encode(user_uid_pool, bl);
>>>> ...
>>>>     ::encode(system_key, bl);
>>>>     ::encode(placement_pools, bl);
>>>>     ::encode(metadata_heap, bl);
>>>>     ::encode(realm_id, bl);
>>>> ...
>>>>     ENCODE_FINISH(bl);
>>>>   }
>>>>
>>>>   void decode(bufferlist::iterator& bl) {
>>>>     DECODE_START(8, bl);
>>>>     ::decode(domain_root, bl);
>>>>     ::decode(control_pool, bl);
>>>>     ::decode(gc_pool, bl);
>>>>     ::decode(log_pool, bl);
>>>>     ::decode(intent_log_pool, bl);
>>>>     ::decode(usage_log_pool, bl);
>>>>     ::decode(user_keys_pool, bl);
>>>>     ::decode(user_email_pool, bl);
>>>>     ::decode(user_swift_pool, bl);
>>>>     ::decode(user_uid_pool, bl);
>>>> ...
>>>>     if (struct_v >= 3)
>>>>       ::decode(system_key, bl);
>>>>     if (struct_v >= 4)
>>>>       ::decode(placement_pools, bl);
>>>>     if (struct_v >= 5)
>>>>       ::decode(metadata_heap, bl);
>>>>     if (struct_v >= 6) {
>>>>       ::decode(realm_id, bl);
>>>>     }
>>>> ...
>>>>     DECODE_FINISH(bl);
>>>>   }
>>>>
>>>> So the idea is that whenever we add a field, we bump up the encoded
>>>> version, add the field at the end. Decoding is done in order, and we
>>>> test struct_v to determine whether we need to decode the next param.
>>>>
>>>> The main issue I'm having trouble with right now is what to do when we
>>>> need to backport a change that needs a datastructure change. For
>>>> example, in the above example, let's say that we need to backport the
>>>> realm_id field to an older version that was only up to V3.
>>>>
>>>> One solution would be to make sure that when backporting such a
>>>> change, we drag with us all the other fields that lead up to the one
>>>> that we need (e.g., we want realm_id, but we also need to bring
>>>> placement_pools and metadata_heap with us). This might not be so
>>>> sustainable. The above example might be trivial, but what if
>>>> metadata_heap were not a string, but rather a complex data type that,
>>>> in order to build correctly, requires backporting another feature (and
>>>> brings with it the same issues)?
>>>
>>>
>>> In the past we've just dragged the other fields with us.  I can only
>>> recall this happening a handful of times.  I try to put any new fields
>>> into a separate commit with *just* the data structure changes so that
>>> (among other things) it can be easily backported without dragging
>>> anything
>>> else along with it.
>>
>>
>> Right. It's a delicate job, and it only gets more complicated when
>> dealing with multiple versions that are not necessarily going in the
>> same direction.
>>
>>>
>>>> It seems to me that for issues like that we might want to consider
>>>> adding a more sophisticated encoding scheme that will be more feature
>>>> oriented, rather than just blindly putting everything one after the
>>>> other.
>>>>
>>>> E.g., some kind of a bit field with offsets into the data, along the
>>>> following lines:
>>>>
>>>> feature.encode(0, system_key);
>>>> feature.encode(1, placement_pools);
>>>> feature.encode(2, metadata_heap);
>>>> ::encode(features, bl);
>>>>
>>>> and on the decoding side:
>>>>
>>>> ::decode(features, bl);
>>>> features.decode(0, system_key);
>>>> features.decode(1, placement_pools);
>>>> features.decode(2, metadata_heap);
>>>>
>>>> In the above example, if we only need metadata_heap, then we can just do
>>>> this:
>>>>
>>>> ::decode(features, bl);
>>>> features.decode(2, metadata_heap);
>>>>
>>>> The indexes of the fields would obviously need to be defined
>>>> appropriately, and kept consistent across versions. I think that should
>>>> be easier to maintain than making sure we keep the data structures
>>>> consistent across divergent branches.
>>>
>>>
>>> This seems useful if we need to backport things out of order, but how
>>> often does that really happen?  The cost is some CPU time during
>>> decoding,
>>> incurred for every encode/decode for all time, to cover a case that
>>> should
>>> be pretty rare.
>>>
>>> Have we seen a case where the intervening fields are complex data
>>> structure(s) and difficult to backport?
>>>
>>
>> We've managed pretty well when mainly going forward. However, this
>> limits our ability to backport complex fixes and/or entire sets of
>> features. As I said earlier, keeping the current scheme is possible,
>> but needing to maintain divergent versions makes our lives much more
>> complicated.
>> I'm well aware of the impact on both performance and structure sizes.
>> We could maybe limit this scheme to just the encoding of the distinct
>> set of changes that we know we'll need to backport (we need to know
>> that in advance anyway), and continue with the current method as is.
>
>
> fwiw, I'm very sensitive to encoding overhead right now.  It's like the #1
> thing I'm grumpy about with maybe the exception of the split/merge issues in
> filestore.  I'd really prefer not doing anything that's going to make things
> worse, at least not until we make it better!
>


Not everything is in the critical path, and for certain data structures
we can sustain the extra data size and extra computation time. We
don't need to convert existing data structures; the approach I'm
suggesting can live side by side with the current scheme.

Yehuda

> Mark
>
>>
>> Yehuda
>

