* ceph encoding optimization
@ 2015-11-04 15:00 池信泽
  2015-11-04 15:05 ` Sage Weil
  2015-11-04 15:07 ` Gregory Farnum
  0 siblings, 2 replies; 13+ messages in thread
From: 池信泽 @ 2015-11-04 15:00 UTC (permalink / raw)
  To: ceph-devel

hi, all:

     I am focusing on the CPU usage of Ceph at the moment. I find that
encoding and decoding of structs (such as pg_info_t, transactions, and
so on) consume too much CPU.

     Currently we encode every member variable one by one, with each
call ultimately reaching encode_raw. When a struct has many members, we
pay that per-field cost many times. I think we could reduce this in
some cases.

     For example, given struct A { int a; int b; int c; };, Ceph
encodes int a, then int b, and finally int c. In this case we could
instead call bufferlist.append((char *)(&a), sizeof(A)), because there
are no padding bytes in the struct.

     With this optimization, the CPU usage of object_stat_sum_t
encoding drops from 0.5% to 0% (I can no longer see it with perf).

     This is only one case; I think we could apply a similar
optimization to other structs. We just need to pay attention to struct
padding.

-- 
Regards,
xinze

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ceph encoding optimization
  2015-11-04 15:00 ceph encoding optimization 池信泽
@ 2015-11-04 15:05 ` Sage Weil
  2015-11-04 15:07 ` Gregory Farnum
  1 sibling, 0 replies; 13+ messages in thread
From: Sage Weil @ 2015-11-04 15:05 UTC (permalink / raw)
  To: 池信泽; +Cc: ceph-devel, jdurgin

On Wed, 4 Nov 2015, 池信泽 wrote:
> hi, all:
> 
>      I am focusing on the CPU usage of Ceph at the moment. I find that
> encoding and decoding of structs (such as pg_info_t, transactions, and
> so on) consume too much CPU.
> 
>      Currently we encode every member variable one by one, with each
> call ultimately reaching encode_raw. When a struct has many members, we
> pay that per-field cost many times. I think we could reduce this in
> some cases.
> 
>      For example, given struct A { int a; int b; int c; };, Ceph
> encodes int a, then int b, and finally int c. In this case we could
> instead call bufferlist.append((char *)(&a), sizeof(A)), because there
> are no padding bytes in the struct.
> 
>      With this optimization, the CPU usage of object_stat_sum_t
> encoding drops from 0.5% to 0% (I can no longer see it with perf).
> 
>      This is only one case; I think we could apply a similar
> optimization to other structs. We just need to pay attention to struct
> padding.

We have to be careful because we need to ensure that the encoding 
is little-endian.  I think the way to do this is to define a struct like

struct foo_t {
  __le64 a;
  __le64 b;
  __le64 c;
  ...
};

and just make a foo_t *p that points into the buffer.  There were some 
patches that did this that came out of the Portland hackathon, but I'm not 
sure where they are... Josh, do you remember?

FWIW I think pg_stat_t (and friends) is a good first start since it is 
expensive and part of the MOSDOp and MOSDRepOp.

sage


* Re: ceph encoding optimization
  2015-11-04 15:00 ceph encoding optimization 池信泽
  2015-11-04 15:05 ` Sage Weil
@ 2015-11-04 15:07 ` Gregory Farnum
  2015-11-04 15:33   ` 池信泽
  1 sibling, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2015-11-04 15:07 UTC (permalink / raw)
  To: 池信泽; +Cc: ceph-devel

On Wed, Nov 4, 2015 at 7:00 AM, 池信泽 <xmdxcxz@gmail.com> wrote:
> hi, all:
>
>      I am focusing on the CPU usage of Ceph at the moment. I find that
> encoding and decoding of structs (such as pg_info_t, transactions, and
> so on) consume too much CPU.
>
>      Currently we encode every member variable one by one, with each
> call ultimately reaching encode_raw. When a struct has many members, we
> pay that per-field cost many times. I think we could reduce this in
> some cases.
>
>      For example, given struct A { int a; int b; int c; };, Ceph
> encodes int a, then int b, and finally int c. In this case we could
> instead call bufferlist.append((char *)(&a), sizeof(A)), because there
> are no padding bytes in the struct.
>
>      With this optimization, the CPU usage of object_stat_sum_t
> encoding drops from 0.5% to 0% (I can no longer see it with perf).
>
>      This is only one case; I think we could apply a similar
> optimization to other structs. We just need to pay attention to struct
> padding.

The problem with this approach is that the encoded versions need to be
platform-independent — they are shared over the wire and written to
disks that might get transplanted to different machines. Apart from
padding bytes, we also need to worry about endianness of the machine,
etc. *And* we often mutate structures across versions in order to add
new abilities, relying on the encode-decode process to deal with any
changes to the system. How could we deal with that if we just dump the
raw memory?

Now, maybe we could make these changes on some carefully-selected
structs, I'm not sure. But we'd need a way to pick them out, guarantee
that we aren't breaking interoperability concerns, etc; and it would
need to be something we can maintain as a group going forward. I'm not
sure how to satisfy those constraints without burning a little extra
CPU. :/
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: ceph encoding optimization
  2015-11-04 15:07 ` Gregory Farnum
@ 2015-11-04 15:33   ` 池信泽
  2015-11-04 17:19     ` Piotr.Dalek
  0 siblings, 1 reply; 13+ messages in thread
From: 池信泽 @ 2015-11-04 15:33 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

I agree that pg_stat_t (and friends) is a good place to start.
eversion_t and utime_t are also good choices, since they are used in
many places.

2015-11-04 23:07 GMT+08:00 Gregory Farnum <gfarnum@redhat.com>:
> On Wed, Nov 4, 2015 at 7:00 AM, 池信泽 <xmdxcxz@gmail.com> wrote:
>> hi, all:
>>
>>      I am focusing on the CPU usage of Ceph at the moment. I find that
>> encoding and decoding of structs (such as pg_info_t, transactions, and
>> so on) consume too much CPU.
>>
>>      Currently we encode every member variable one by one, with each
>> call ultimately reaching encode_raw. When a struct has many members, we
>> pay that per-field cost many times. I think we could reduce this in
>> some cases.
>>
>>      For example, given struct A { int a; int b; int c; };, Ceph
>> encodes int a, then int b, and finally int c. In this case we could
>> instead call bufferlist.append((char *)(&a), sizeof(A)), because there
>> are no padding bytes in the struct.
>>
>>      With this optimization, the CPU usage of object_stat_sum_t
>> encoding drops from 0.5% to 0% (I can no longer see it with perf).
>>
>>      This is only one case; I think we could apply a similar
>> optimization to other structs. We just need to pay attention to struct
>> padding.
>
> The problem with this approach is that the encoded versions need to be
> platform-independent — they are shared over the wire and written to
> disks that might get transplanted to different machines. Apart from
> padding bytes, we also need to worry about endianness of the machine,
> etc. *And* we often mutate structures across versions in order to add
> new abilities, relying on the encode-decode process to deal with any
> changes to the system. How could we deal with that if just dumping the
> raw memory?
>
> Now, maybe we could make these changes on some carefully-selected
> structs, I'm not sure. But we'd need a way to pick them out, guarantee
> that we aren't breaking interoperability concerns, etc; and it would
> need to be something we can maintain as a group going forward. I'm not
> sure how to satisfy those constraints without burning a little extra
> CPU. :/
> -Greg



-- 
Regards,
xinze


* RE: ceph encoding optimization
  2015-11-04 15:33   ` 池信泽
@ 2015-11-04 17:19     ` Piotr.Dalek
  2015-11-04 17:29       ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Piotr.Dalek @ 2015-11-04 17:19 UTC (permalink / raw)
  To: 池信泽, Gregory Farnum; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of 池信泽
> Sent: Wednesday, November 04, 2015 4:34 PM
> To: Gregory Farnum
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: ceph encoding optimization
> 
> I agree that pg_stat_t (and friends) is a good place to start.
> eversion_t and utime_t are also good choices, since they are used in
> many places.

At the Ceph Hackathon, Josh Durgin took initial steps in the right direction on pg_stat_t encoding and decoding optimization, with the endianness-awareness part left out. Even in that state, the performance improvement offered by this change was large enough to make it worthwhile. I'm attaching the patch, but please note that it is a prototype based on the mid-August state of the code, so you may need to adapt it when applying.


With best regards / Pozdrawiam
Piotr Dałek


[-- Attachment #2: 0001-prototype-of-fixed-size-encode-decode-for-pg_stat_t.patch --]
[-- Type: application/octet-stream, Size: 16011 bytes --]

From 72c3ccb3aba19a3bf0f9f7ea16cc30597c66f7fa Mon Sep 17 00:00:00 2001
From: Josh Durgin <jdurgin@redhat.com>
Date: Thu, 13 Aug 2015 10:27:27 -0700
Subject: [PATCH] prototype of fixed-size encode/decode for pg_stat_t

---
 src/osd/osd_types.cc | 271 +++++++++++++++++++++++++++------------------------
 src/osd/osd_types.h  | 136 ++++++++++++++++++++++++--
 src/test/encoding.cc |  23 +++++
 3 files changed, 297 insertions(+), 133 deletions(-)

diff --git a/src/osd/osd_types.cc b/src/osd/osd_types.cc
index df9f597..89d2e7b 100644
--- a/src/osd/osd_types.cc
+++ b/src/osd/osd_types.cc
@@ -1947,8 +1947,8 @@ void pg_stat_t::encode(bufferlist &bl) const
   ::encode(stats, bl);
   ::encode(log_size, bl);
   ::encode(ondisk_log_size, bl);
-  ::encode(up, bl);
-  ::encode(acting, bl);
+   ::encode(up, bl);
+   ::encode(acting, bl);
   ::encode(last_fresh, bl);
   ::encode(last_change, bl);
   ::encode(last_active, bl);
@@ -1971,145 +1971,162 @@ void pg_stat_t::encode(bufferlist &bl) const
   ::encode(hitset_bytes_stats_invalid, bl);
   ::encode(last_peered, bl);
   ::encode(last_became_peered, bl);
+  /*
+ ENCODE_START(22, 22, bl);
+  ::encode(packed, bl);
+  ::encode(up, bl);
+  ::encode(acting, bl);
+  ::encode(blocked_by, bl);
+ */
   ENCODE_FINISH(bl);
 }
 
 void pg_stat_t::decode(bufferlist::iterator &bl)
 {
-  DECODE_START_LEGACY_COMPAT_LEN(20, 8, 8, bl);
-  ::decode(version, bl);
-  ::decode(reported_seq, bl);
-  ::decode(reported_epoch, bl);
-  ::decode(state, bl);
-  ::decode(log_start, bl);
-  ::decode(ondisk_log_start, bl);
-  ::decode(created, bl);
-  if (struct_v >= 7)
-    ::decode(last_epoch_clean, bl);
-  else
-    last_epoch_clean = 0;
-  if (struct_v < 6) {
-    old_pg_t opgid;
-    ::decode(opgid, bl);
-    parent = opgid;
-  } else {
-    ::decode(parent, bl);
-  }
-  ::decode(parent_split_bits, bl);
-  ::decode(last_scrub, bl);
-  ::decode(last_scrub_stamp, bl);
-  if (struct_v <= 4) {
-    ::decode(stats.sum.num_bytes, bl);
-    uint64_t num_kb;
-    ::decode(num_kb, bl);
-    ::decode(stats.sum.num_objects, bl);
-    ::decode(stats.sum.num_object_clones, bl);
-    ::decode(stats.sum.num_object_copies, bl);
-    ::decode(stats.sum.num_objects_missing_on_primary, bl);
-    ::decode(stats.sum.num_objects_degraded, bl);
-    ::decode(log_size, bl);
-    ::decode(ondisk_log_size, bl);
-    if (struct_v >= 2) {
-      ::decode(stats.sum.num_rd, bl);
-      ::decode(stats.sum.num_rd_kb, bl);
-      ::decode(stats.sum.num_wr, bl);
-      ::decode(stats.sum.num_wr_kb, bl);
+  DECODE_START_LEGACY_COMPAT_LEN(22, 8, 8, bl);
+  if (struct_v < 22) {
+    ::decode(version, bl);
+    ::decode(reported_seq, bl);
+    ::decode(reported_epoch, bl);
+    ::decode(state, bl);
+    ::decode(log_start, bl);
+    ::decode(ondisk_log_start, bl);
+    ::decode(created, bl);
+
+    if (struct_v >= 7)
+      ::decode(last_epoch_clean, bl);
+    else
+      last_epoch_clean = 0;
+    if (struct_v < 6) {
+      old_pg_t opgid;
+      ::decode(opgid, bl);
+      parent = opgid;
+    } else {
+      ::decode(parent, bl);
     }
-    if (struct_v >= 3) {
+    ::decode(parent_split_bits, bl);
+    ::decode(last_scrub, bl);
+    ::decode(last_scrub_stamp, bl);
+    if (struct_v <= 4) {
+      ::decode(stats.sum.num_bytes, bl);
+      uint64_t num_kb;
+      ::decode(num_kb, bl);
+      ::decode(stats.sum.num_objects, bl);
+      ::decode(stats.sum.num_object_clones, bl);
+      ::decode(stats.sum.num_object_copies, bl);
+      ::decode(stats.sum.num_objects_missing_on_primary, bl);
+      ::decode(stats.sum.num_objects_degraded, bl);
+      ::decode(log_size, bl);
+      ::decode(ondisk_log_size, bl);
+      if (struct_v >= 2) {
+	::decode(stats.sum.num_rd, bl);
+	::decode(stats.sum.num_rd_kb, bl);
+	::decode(stats.sum.num_wr, bl);
+	::decode(stats.sum.num_wr_kb, bl);
+      }
+      if (struct_v >= 3) {
+	::decode(up, bl);
+      }
+      if (struct_v == 4) {
+	::decode(stats.sum.num_objects_unfound, bl);  // sigh.
+      }
+      ::decode(acting, bl);
+    } else {
+      ::decode(stats, bl);
+      ::decode(log_size, bl);
+      ::decode(ondisk_log_size, bl);
       ::decode(up, bl);
+      ::decode(acting, bl);
+      if (struct_v >= 9) {
+	::decode(last_fresh, bl);
+	::decode(last_change, bl);
+	::decode(last_active, bl);
+	::decode(last_clean, bl);
+	::decode(last_unstale, bl);
+	::decode(mapping_epoch, bl);
+	if (struct_v >= 10) {
+	  ::decode(last_deep_scrub, bl);
+	  ::decode(last_deep_scrub_stamp, bl);
+	}
+      }
+    }
+    if (struct_v < 11) {
+      stats_invalid = false;
+    } else {
+      ::decode(stats_invalid, bl);
     }
-    if (struct_v == 4) {
-      ::decode(stats.sum.num_objects_unfound, bl);  // sigh.
+    if (struct_v >= 12) {
+      ::decode(last_clean_scrub_stamp, bl);
+    } else {
+      last_clean_scrub_stamp = utime_t();
+    }
+    if (struct_v >= 13) {
+      ::decode(last_became_active, bl);
+    } else {
+      last_became_active = last_active;
+    }
+    if (struct_v >= 14) {
+      ::decode(dirty_stats_invalid, bl);
+    } else {
+      // if we are decoding an old encoding of this object, then the
+      // encoder may not have supported num_objects_dirty accounting.
+      dirty_stats_invalid = true;
+    }
+    if (struct_v >= 15) {
+      ::decode(up_primary, bl);
+      ::decode(acting_primary, bl);
+    } else {
+      up_primary = up.size() ? up[0] : -1;
+      acting_primary = acting.size() ? acting[0] : -1;
+    }
+    if (struct_v >= 16) {
+      ::decode(omap_stats_invalid, bl);
+    } else {
+      // if we are decoding an old encoding of this object, then the
+      // encoder may not have supported num_objects_omap accounting.
+      omap_stats_invalid = true;
+    }
+    if (struct_v >= 17) {
+      ::decode(hitset_stats_invalid, bl);
+    } else {
+      // if we are decoding an old encoding of this object, then the
+      // encoder may not have supported num_objects_hit_set_archive accounting.
+      hitset_stats_invalid = true;
+    }
+    if (struct_v >= 18) {
+      ::decode(blocked_by, bl);
+    } else {
+      blocked_by.clear();
+    }
+    if (struct_v >= 19) {
+      ::decode(last_undegraded, bl);
+      ::decode(last_fullsized, bl);
+    } else {
+      last_undegraded = utime_t();
+      last_fullsized = utime_t();
+    }
+    if (struct_v >= 20) {
+      ::decode(hitset_bytes_stats_invalid, bl);
+    } else {
+      // if we are decoding an old encoding of this object, then the
+      // encoder may not have supported num_bytes_hit_set_archive accounting.
+      hitset_bytes_stats_invalid = true;
+    }
+    if (struct_v >= 21) {
+      ::decode(last_peered, bl);
+      ::decode(last_became_peered, bl);
+    } else {
+      last_peered = last_active;
+      last_became_peered = last_became_active;
     }
-    ::decode(acting, bl);
   } else {
-    ::decode(stats, bl);
-    ::decode(log_size, bl);
-    ::decode(ondisk_log_size, bl);
+    ::decode(packed, bl);
     ::decode(up, bl);
     ::decode(acting, bl);
-    if (struct_v >= 9) {
-      ::decode(last_fresh, bl);
-      ::decode(last_change, bl);
-      ::decode(last_active, bl);
-      ::decode(last_clean, bl);
-      ::decode(last_unstale, bl);
-      ::decode(mapping_epoch, bl);
-      if (struct_v >= 10) {
-        ::decode(last_deep_scrub, bl);
-        ::decode(last_deep_scrub_stamp, bl);
-      }
-    }
-  }
-  if (struct_v < 11) {
-    stats_invalid = false;
-  } else {
-    ::decode(stats_invalid, bl);
-  }
-  if (struct_v >= 12) {
-    ::decode(last_clean_scrub_stamp, bl);
-  } else {
-    last_clean_scrub_stamp = utime_t();
-  }
-  if (struct_v >= 13) {
-    ::decode(last_became_active, bl);
-  } else {
-    last_became_active = last_active;
-  }
-  if (struct_v >= 14) {
-    ::decode(dirty_stats_invalid, bl);
-  } else {
-    // if we are decoding an old encoding of this object, then the
-    // encoder may not have supported num_objects_dirty accounting.
-    dirty_stats_invalid = true;
-  }
-  if (struct_v >= 15) {
-    ::decode(up_primary, bl);
-    ::decode(acting_primary, bl);
-  } else {
-    up_primary = up.size() ? up[0] : -1;
-    acting_primary = acting.size() ? acting[0] : -1;
-  }
-  if (struct_v >= 16) {
-    ::decode(omap_stats_invalid, bl);
-  } else {
-    // if we are decoding an old encoding of this object, then the
-    // encoder may not have supported num_objects_omap accounting.
-    omap_stats_invalid = true;
-  }
-  if (struct_v >= 17) {
-    ::decode(hitset_stats_invalid, bl);
-  } else {
-    // if we are decoding an old encoding of this object, then the
-    // encoder may not have supported num_objects_hit_set_archive accounting.
-    hitset_stats_invalid = true;
-  }
-  if (struct_v >= 18) {
     ::decode(blocked_by, bl);
-  } else {
-    blocked_by.clear();
-  }
-  if (struct_v >= 19) {
-    ::decode(last_undegraded, bl);
-    ::decode(last_fullsized, bl);
-  } else {
-    last_undegraded = utime_t();
-    last_fullsized = utime_t();
-  }
-  if (struct_v >= 20) {
-    ::decode(hitset_bytes_stats_invalid, bl);
-  } else {
-    // if we are decoding an old encoding of this object, then the
-    // encoder may not have supported num_bytes_hit_set_archive accounting.
-    hitset_bytes_stats_invalid = true;
-  }
-  if (struct_v >= 21) {
-    ::decode(last_peered, bl);
-    ::decode(last_became_peered, bl);
-  } else {
-    last_peered = last_active;
-    last_became_peered = last_became_active;
+    return;
   }
+
   DECODE_FINISH(bl);
 }
 
diff --git a/src/osd/osd_types.h b/src/osd/osd_types.h
index 0a30458..1b0b937 100644
--- a/src/osd/osd_types.h
+++ b/src/osd/osd_types.h
@@ -658,16 +658,14 @@ class eversion_t {
 public:
   version_t version;
   epoch_t epoch;
-  __u32 __pad;
-  eversion_t() : version(0), epoch(0), __pad(0) {}
-  eversion_t(epoch_t e, version_t v) : version(v), epoch(e), __pad(0) {}
+  eversion_t() : version(0), epoch(0) {}
+  eversion_t(epoch_t e, version_t v) : version(v), epoch(e) {}
 
   eversion_t(const ceph_eversion& ce) : 
     version(ce.version),
-    epoch(ce.epoch),
-    __pad(0) { }
+    epoch(ce.epoch) { }
 
-  eversion_t(bufferlist& bl) : __pad(0) { decode(bl); }
+  eversion_t(bufferlist& bl) { decode(bl); }
 
   static eversion_t max() {
     eversion_t max;
@@ -1503,6 +1501,129 @@ inline bool operator==(const object_stat_collection_t& l,
   return l.sum == r.sum;
 }
 
+struct packed_pg_stat {
+  __le64 version_version;
+  __le32 version_epoch;
+
+  __le64 reported_seq;  // sequence number
+  __le32 reported_epoch;  // epoch of this report
+
+  __le32 state;
+
+  __le32 last_fresh_sec;   // last reported
+  __le32 last_fresh_nsec;
+
+  __le32 last_change_sec;  // new state != previous state
+  __le32 last_change_nsec;
+
+  __le32 last_active_sec;  // state & PG_STATE_ACTIVE
+  __le32 last_active_nsec;
+
+  __le32 last_peered_sec;  // state & PG_STATE_ACTIVE || state & PG_STATE_PEERED
+  __le32 last_peered_nsec;
+
+  __le32 last_clean_sec;   // state & PG_STATE_CLEAN
+  __le32 last_clean_nsec;
+
+  __le32 last_unstale_sec; // (state & PG_STATE_STALE) == 0
+  __le32 last_unstale_nsec;
+
+  __le32 last_undegraded_sec; // (state & PG_STATE_DEGRADED) == 0
+  __le32 last_undegraded_nsec;
+
+  __le32 last_fullsized_sec; // (state & PG_STATE_UNDERSIZED) == 0
+  __le32 last_fullsized_nsec;
+
+  __le64 log_start_version;         // (log_start,version]
+  __le32 log_start_epoch;
+
+  __le64 ondisk_log_start_version;  // there may be more on disk
+  __le32 ondisk_log_start_epoch;
+
+  __le32 created;
+  __le32 last_epoch_clean;
+
+  __u8 parent_v;
+  __le64 parent_pool;
+  __le32 parent_seed;
+  __le32 parent_preferred;
+
+  __u32 parent_split_bits;
+
+  __le64 last_scrub_version;
+  __le32 last_scrub_epoch;
+
+  __le64 last_deep_scrub_version;
+  __le32 last_deep_scrub_epoch;
+
+  __le32 last_scrub_stamp_sec;
+  __le32 last_scrub_stamp_nsec;
+  __le32 last_deep_scrub_stamp_sec;
+  __le32 last_deep_scrub_stamp_nsec;
+  __le32 last_clean_scrub_stamp_sec;
+  __le32 last_clean_scrub_stamp_nsec;
+
+  __le64 num_bytes;    // in bytes
+  __le64 num_objects;
+  __le64 num_object_clones;
+  __le64 num_object_copies;  // num_objects * num_replicas
+  __le64 num_objects_missing_on_primary;
+  __le64 num_objects_degraded;
+  __le64 num_objects_misplaced;
+  __le64 num_objects_unfound;
+  __le64 num_rd;
+  __le64 num_rd_kb;
+  __le64 num_wr;
+  __le64 num_wr_kb;
+  __le64 num_scrub_errors;	// total deep and shallow scrub errors
+  __le64 num_shallow_scrub_errors;
+  __le64 num_deep_scrub_errors;
+  __le64 num_objects_recovered;
+  __le64 num_bytes_recovered;
+  __le64 num_keys_recovered;
+  __le64 num_objects_dirty;
+  __le64 num_whiteouts;
+  __le64 num_objects_omap;
+  __le64 num_objects_hit_set_archive;
+  __le64 num_bytes_hit_set_archive;
+  __le64 num_flush;
+  __le64 num_flush_kb;
+  __le64 num_evict;
+  __le64 num_evict_kb;
+  __le64 num_promote;
+
+  __u8 stats_invalid;
+
+  __le64 log_size;
+  __le64 ondisk_log_size;    // >= active_log_size
+
+  __le32  mapping_epoch;
+
+  __le32 last_became_active_sec;
+  __le32 last_became_active_nsec;
+
+  __le32 last_became_peered_sec;
+  __le32 last_became_peered_nsec;
+
+  /// true if num_objects_dirty is not accurate (because it was not
+  /// maintained starting from pool creation)
+  __u8 dirty_stats_invalid;
+  __u8 omap_stats_invalid;
+  __u8 hitset_stats_invalid;
+  __u8 hitset_bytes_stats_invalid;
+
+  /// up, acting primaries
+  __le32 up_primary;
+  __le32 acting_primary;
+
+  void encode(bufferlist& bl) const {
+    bl.append((char*)this, sizeof(struct packed_pg_stat));
+  }
+  void decode(bufferlist::iterator &bl) {
+    bl.copy(sizeof(struct packed_pg_stat), (char*)this);
+  }
+} __attribute__ ((packed));
+WRITE_CLASS_ENCODER(packed_pg_stat)
 
 /** pg_stat
  * aggregate stats for a single PG.
@@ -1511,6 +1632,7 @@ struct pg_stat_t {
   /**************************************************************************
    * WARNING: be sure to update the operator== when adding/removing fields! *
    **************************************************************************/
+
   eversion_t version;
   version_t reported_seq;  // sequence number
   epoch_t reported_epoch;  // epoch of this report
@@ -1563,6 +1685,8 @@ struct pg_stat_t {
   int32_t up_primary;
   int32_t acting_primary;
 
+  struct packed_pg_stat packed;
+
   pg_stat_t()
     : reported_seq(0),
       reported_epoch(0),
diff --git a/src/test/encoding.cc b/src/test/encoding.cc
index 4f2b26c..c96dd90 100644
--- a/src/test/encoding.cc
+++ b/src/test/encoding.cc
@@ -1,6 +1,7 @@
 #include "common/config.h"
 #include "include/buffer.h"
 #include "include/encoding.h"
+#include "osd/osd_types.h"
 
 #include "gtest/gtest.h"
 
@@ -15,6 +16,28 @@ static void test_encode_and_decode(const T& src)
   ASSERT_EQ(src, dst) << "Encoding roundtrip changed the string: orig=" << src << ", but new=" << dst;
 }
 
+TEST(EncodeBench, PGStat) {
+  struct pg_stat_t stats;
+  for (int j = 0; j < 1000; ++j) {
+      bufferlist bl;
+      for (int i = 0; i < 1000; ++i) {
+          encode(stats, bl);
+      }
+  }
+}
+
+TEST(DecodeBench, PGStat) {
+  struct pg_stat_t stats;
+  bufferlist bl;
+  encode(stats, bl);
+  for (int j = 0; j < 1000; ++j) {
+      for (int i = 0; i < 1000; ++i) {
+          bufferlist::iterator it(bl.begin());
+          decode(stats, it);
+      }
+  }
+}
+/*
 TEST(EncodingRoundTrip, StringSimple) {
   string my_str("I am the very model of a modern major general");
   test_encode_and_decode < std::string >(my_str);
-- 
1.9.1



* Re: ceph encoding optimization
  2015-11-04 17:19     ` Piotr.Dalek
@ 2015-11-04 17:29       ` Haomai Wang
  2015-11-07 13:40         ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2015-11-04 17:29 UTC (permalink / raw)
  To: Piotr.Dalek; +Cc: 池信泽, Gregory Farnum, ceph-devel

On Thu, Nov 5, 2015 at 1:19 AM, Piotr.Dalek@ts.fujitsu.com
<Piotr.Dalek@ts.fujitsu.com> wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of 池信泽
>> Sent: Wednesday, November 04, 2015 4:34 PM
>> To: Gregory Farnum
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: ceph encoding optimization
>>
>> I agree with pg_stat_t (and friends) is a good first start.
>> The eversion_t and utime_t are also good choice to start because they are
>> used at many places.
>
> On Ceph Hackathon, Josh Durgin made initial steps in right direction in terms of pg_stat_t encoding and decoding optimization, with the endianness-awareness thing left out. Even in that state, performance improvements offered by this change were huge enough to make it worthwhile. I'm attaching the patch, but please note that this is prototype and based on mid-August state of code, so you might need to take that into account when applying the patch.

Cool, that's exactly what we want to see.

>
>
> With best regards / Pozdrawiam
> Piotr Dałek
>


* Re: ceph encoding optimization
  2015-11-04 17:29       ` Haomai Wang
@ 2015-11-07 13:40         ` Haomai Wang
  2015-11-08 14:28           ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2015-11-07 13:40 UTC (permalink / raw)
  To: Piotr.Dalek; +Cc: 池信泽, Gregory Farnum, ceph-devel

Hi Sage,

Could you let us know your progress on refactoring MSubOP and on the
hobject_t and pg_stat_t decode problem?

We could build on your work if there is any.


On Thu, Nov 5, 2015 at 1:29 AM, Haomai Wang <haomai@xsky.com> wrote:
> On Thu, Nov 5, 2015 at 1:19 AM, Piotr.Dalek@ts.fujitsu.com
> <Piotr.Dalek@ts.fujitsu.com> wrote:
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>>> owner@vger.kernel.org] On Behalf Of 池信泽
>>> Sent: Wednesday, November 04, 2015 4:34 PM
>>> To: Gregory Farnum
>>> Cc: ceph-devel@vger.kernel.org
>>> Subject: Re: ceph encoding optimization
>>>
>>> I agree with pg_stat_t (and friends) is a good first start.
>>> The eversion_t and utime_t are also good choice to start because they are
>>> used at many places.
>>
>> On Ceph Hackathon, Josh Durgin made initial steps in right direction in terms of pg_stat_t encoding and decoding optimization, with the endianness-awareness thing left out. Even in that state, performance improvements offered by this change were huge enough to make it worthwhile. I'm attaching the patch, but please note that this is prototype and based on mid-August state of code, so you might need to take that into account when applying the patch.
>
> Cool, it's exactly we want to see.
>
>>
>>
>> With best regards / Pozdrawiam
>> Piotr Dałek
>>


* Re: ceph encoding optimization
  2015-11-07 13:40         ` Haomai Wang
@ 2015-11-08 14:28           ` Sage Weil
  2015-11-09 15:05             ` Gregory Farnum
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2015-11-08 14:28 UTC (permalink / raw)
  To: Haomai Wang
  Cc: Piotr.Dalek, 池信泽, Gregory Farnum, ceph-devel

On Sat, 7 Nov 2015, Haomai Wang wrote:
> Hi sage,
> 
> Could we know about your progress to refactor MSubOP and hobject_t,
> pg_stat_t decode problem?
> 
> We could work on this based on your work if any.

See Piotr's last email on this thread... it has Josh's patch attached.

sage


> 
> 
> On Thu, Nov 5, 2015 at 1:29 AM, Haomai Wang <haomai@xsky.com> wrote:
> > On Thu, Nov 5, 2015 at 1:19 AM, Piotr.Dalek@ts.fujitsu.com
> > <Piotr.Dalek@ts.fujitsu.com> wrote:
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> >>> owner@vger.kernel.org] On Behalf Of 池信泽
> >>> Sent: Wednesday, November 04, 2015 4:34 PM
> >>> To: Gregory Farnum
> >>> Cc: ceph-devel@vger.kernel.org
> >>> Subject: Re: ceph encoding optimization
> >>>
> >>> I agree with pg_stat_t (and friends) is a good first start.
> >>> The eversion_t and utime_t are also good choice to start because they are
> >>> used at many places.
> >>
> >> On Ceph Hackathon, Josh Durgin made initial steps in right direction in terms of pg_stat_t encoding and decoding optimization, with the endianness-awareness thing left out. Even in that state, performance improvements offered by this change were huge enough to make it worthwhile. I'm attaching the patch, but please note that this is prototype and based on mid-August state of code, so you might need to take that into account when applying the patch.
> >
> > Cool, it's exactly we want to see.
> >
> >>
> >>
> >> With best regards / Pozdrawiam
> >> Piotr Dałek
> >>
> 
> 


* Re: ceph encoding optimization
  2015-11-08 14:28           ` Sage Weil
@ 2015-11-09 15:05             ` Gregory Farnum
  2015-11-09 15:24               ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Gregory Farnum @ 2015-11-09 15:05 UTC (permalink / raw)
  To: Sage Weil, Dan Mick
  Cc: Haomai Wang, Piotr.Dalek, 池信泽, ceph-devel

On Wed, Nov 4, 2015 at 7:07 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
> The problem with this approach is that the encoded versions need to be
> platform-independent — they are shared over the wire and written to
> disks that might get transplanted to different machines. Apart from
> padding bytes, we also need to worry about endianness of the machine,
> etc. *And* we often mutate structures across versions in order to add
> new abilities, relying on the encode-decode process to deal with any
> changes to the system. How could we deal with that if just dumping the
> raw memory?
>
> Now, maybe we could make these changes on some carefully-selected
> structs, I'm not sure. But we'd need a way to pick them out, guarantee
> that we aren't breaking interoperability concerns, etc; and it would
> need to be something we can maintain as a group going forward. I'm not
> sure how to satisfy those constraints without burning a little extra
> CPU. :/
> -Greg

So it turns out we've actually had issues with this. Sage merged
(wrote?) some little-endian-only optimizations to the cephx code that
broke big-endian systems by doing a direct memcpy. Apparently our
tests don't find these issues, which makes me even more nervous about
taking that sort of optimization into the tree. :(
-Greg
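To make the hazard concrete, here is a minimal sketch (hypothetical helpers, not the actual cephx code): a raw memcpy of an integer emits host byte order, so the same call produces different wire bytes on little- and big-endian machines, while an explicit per-byte encoder is portable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Host-order copy: fast, but the bytes written depend on the machine.
static void encode_raw(uint32_t v, unsigned char out[4]) {
  std::memcpy(out, &v, sizeof(v));  // not portable across endianness
}

// Explicit little-endian encoding: same bytes on every host.
static void encode_le(uint32_t v, unsigned char out[4]) {
  for (int i = 0; i < 4; ++i)
    out[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xff);
}
```

On a big-endian host encode_raw writes 0x01 first while encode_le always writes 0x04 first, which is exactly the class of bug that only shows up when tests actually run on a big-endian box.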


On Sun, Nov 8, 2015 at 6:28 AM, Sage Weil <sage@newdream.net> wrote:
> On Sat, 7 Nov 2015, Haomai Wang wrote:
>> Hi sage,
>>
>> Could you let us know your progress on refactoring the MSubOP, hobject_t,
>> and pg_stat_t decode problems?
>>
>> We could build on your work, if any is available.
>
> See Piotr's last email on this thread... it has Josh's patch attached.
>
> sage
>
>
>>
>>
>> On Thu, Nov 5, 2015 at 1:29 AM, Haomai Wang <haomai@xsky.com> wrote:
>> > On Thu, Nov 5, 2015 at 1:19 AM, Piotr.Dalek@ts.fujitsu.com
>> > <Piotr.Dalek@ts.fujitsu.com> wrote:
>> >>> -----Original Message-----
>> >>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> >>> owner@vger.kernel.org] On Behalf Of 池信泽
>> >>> Sent: Wednesday, November 04, 2015 4:34 PM
>> >>> To: Gregory Farnum
>> >>> Cc: ceph-devel@vger.kernel.org
>> >>> Subject: Re: ceph encoding optimization
>> >>>
>> >>> I agree with pg_stat_t (and friends) is a good first start.
>> >>> The eversion_t and utime_t are also good choice to start because they are
>> >>> used at many places.
>> >>
>> >> At the Ceph Hackathon, Josh Durgin made initial steps in the right direction in terms of pg_stat_t encoding and decoding optimization, with the endianness-awareness left out. Even in that state, the performance improvements offered by this change were big enough to make it worthwhile. I'm attaching the patch, but please note that this is a prototype based on the mid-August state of the code, so you might need to take that into account when applying it.
>> >
>> > Cool, it's exactly what we want to see.
>> >
>> >>
>> >>
>> >> With best regards / Pozdrawiam
>> >> Piotr Dałek
>> >>


* Re: ceph encoding optimization
  2015-11-09 15:05             ` Gregory Farnum
@ 2015-11-09 15:24               ` Sage Weil
  2015-11-09 17:54                 ` Milosz Tanski
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2015-11-09 15:24 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: Dan Mick, Haomai Wang, Piotr.Dalek, 池信泽, ceph-devel

On Mon, 9 Nov 2015, Gregory Farnum wrote:
> On Wed, Nov 4, 2015 at 7:07 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
> > The problem with this approach is that the encoded versions need to be
> > platform-independent — they are shared over the wire and written to
> > disks that might get transplanted to different machines. Apart from
> > padding bytes, we also need to worry about endianness of the machine,
> > etc. *And* we often mutate structures across versions in order to add
> > new abilities, relying on the encode-decode process to deal with any
> > changes to the system. How could we deal with that if just dumping the
> > raw memory?
> >
> > Now, maybe we could make these changes on some carefully-selected
> > structs, I'm not sure. But we'd need a way to pick them out, guarantee
> > that we aren't breaking interoperability concerns, etc; and it would
> > need to be something we can maintain as a group going forward. I'm not
> > sure how to satisfy those constraints without burning a little extra
> > CPU. :/
> > -Greg
> 
> So it turns out we've actually had issues with this. Sage merged
> (wrote?) some little-endian-only optimizations to the cephx code that
> broke big-endian systems by doing a direct memcpy. Apparently our
> tests don't find these issues, which makes me even more nervous about
> taking that sort of optimization into the tree. :(

I think the way to make this maintainable will be to

1) Find a clean approach with a simple #if or #ifdef condition for 
little endian and/or architectures that can handle unaligned int pointer 
access.

2) Maintain the parallel optimized implementation next to the generic 
encode/decode in a way that makes it as easy as possible to make changes 
and keep them in sync.

3) Optimize *only* the most recent encoding to minimize complexity.

4) Ensure that there is a set of encode/decode tests that verify they both 
work, triggered by make check (so that a simple make check on a big 
endian box will catch errors).  Ideally this'd be part of the 
test/encoding/readable.sh so that we run it over the entire corpus of old 
encodings..
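A minimal sketch of what point (1) could look like, with hypothetical field names (not the real object_stat_sum_t): one encoder with an #if-guarded fast path for little-endian hosts and a generic byte-by-byte fallback that both produce identical wire bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stat struct: all members fixed-width, wire format little-endian.
struct stat_sum {
  uint64_t num_bytes;
  uint64_t num_objects;
  uint32_t num_rd;
  uint32_t num_wr;
} __attribute__((packed));

static void encode(const stat_sum &s, std::string &bl) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  // Fast path: in-memory layout already matches the wire format,
  // so the whole struct is appended in one call.
  bl.append(reinterpret_cast<const char *>(&s), sizeof(s));
#else
  // Generic path: emit each member byte by byte in little-endian order.
  auto put = [&bl](uint64_t v, int n) {
    for (int i = 0; i < n; ++i)
      bl.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  };
  put(s.num_bytes, 8);
  put(s.num_objects, 8);
  put(s.num_rd, 4);
  put(s.num_wr, 4);
#endif
}
```

Keeping both paths in the same function body is what makes point (2) tractable: a member added to the struct forces an edit in one place, and a make check on a big-endian box (point 4) exercises the fallback.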


sage


* Re: ceph encoding optimization
  2015-11-09 15:24               ` Sage Weil
@ 2015-11-09 17:54                 ` Milosz Tanski
  2015-12-03  9:17                   ` Xinze Chi (信泽)
  0 siblings, 1 reply; 13+ messages in thread
From: Milosz Tanski @ 2015-11-09 17:54 UTC (permalink / raw)
  To: Sage Weil
  Cc: Gregory Farnum, Dan Mick, Haomai Wang, Piotr.Dalek,
	池信泽,
	ceph-devel

On Mon, Nov 9, 2015 at 10:24 AM, Sage Weil <sweil@redhat.com> wrote:
> On Mon, 9 Nov 2015, Gregory Farnum wrote:
>> On Wed, Nov 4, 2015 at 7:07 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
>> > The problem with this approach is that the encoded versions need to be
>> > platform-independent — they are shared over the wire and written to
>> > disks that might get transplanted to different machines. Apart from
>> > padding bytes, we also need to worry about endianness of the machine,
>> > etc. *And* we often mutate structures across versions in order to add
>> > new abilities, relying on the encode-decode process to deal with any
>> > changes to the system. How could we deal with that if just dumping the
>> > raw memory?
>> >
>> > Now, maybe we could make these changes on some carefully-selected
>> > structs, I'm not sure. But we'd need a way to pick them out, guarantee
>> > that we aren't breaking interoperability concerns, etc; and it would
>> > need to be something we can maintain as a group going forward. I'm not
>> > sure how to satisfy those constraints without burning a little extra
>> > CPU. :/
>> > -Greg
>>
>> So it turns out we've actually had issues with this. Sage merged
>> (wrote?) some little-endian-only optimizations to the cephx code that
>> broke big-endian systems by doing a direct memcpy. Apparently our
>> tests don't find these issues, which makes me even more nervous about
>> taking that sort of optimization into the tree. :(
>
> I think the way to make this maintainable will be to
>
> 1) Find a clean approach with a simple #if or #ifdef condition for
> little endian and/or architectures that can handle unaligned int pointer
> access.
>

In C++ you can also do that with a template <bool Endian> or using
std::enable_if. The upside is the same as the downside (depending on
how you look at it): it adds compile-time checks (because the unused
path won't be discarded by the preprocessor) and it'll take longer to
build, but you get the extra checking and the compiler will discard
the unused code later.

If you do that, it should be easier to write unit tests for the functionality.
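A minimal sketch of that approach (hypothetical helper, not Ceph's actual API): both branches are real code the compiler must type-check on every platform, and the unused one is dropped per instantiation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Selected at compile time by a bool template parameter instead of #ifdef.
template <bool LittleEndianHost>
typename std::enable_if<LittleEndianHost>::type
encode_u32(uint32_t v, std::string &bl) {
  // Host order already equals the little-endian wire order: raw copy.
  bl.append(reinterpret_cast<const char *>(&v), sizeof(v));
}

template <bool LittleEndianHost>
typename std::enable_if<!LittleEndianHost>::type
encode_u32(uint32_t v, std::string &bl) {
  // Big-endian host: swap into little-endian wire order explicitly.
  for (int i = 0; i < 4; ++i)
    bl.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}
```

A unit test can instantiate both encode_u32<true> and encode_u32<false> on a single machine and assert they produce the same bytes, which is harder to do with preprocessor-excluded code.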

> 2) Maintain the parallel optimized implementation next to the generic
> encode/decode in a way that makes it as easy as possible to make changes
> and keep them in sync.
>
> 3) Optimize *only* the most recent encoding to minimize complexity.
>
> 4) Ensure that there is a set of encode/decode tests that verify they both
> work, triggered by make check (so that a simple make check on a big
> endian box will catch errors).  Ideally this'd be part of the
> test/encoding/readable.sh so that we run it over the entire corpus of old
> encodings..
>
>
> sage



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com


* Re: ceph encoding optimization
  2015-11-09 17:54                 ` Milosz Tanski
@ 2015-12-03  9:17                   ` Xinze Chi (信泽)
  2015-12-03  9:29                     ` Piotr.Dalek
  0 siblings, 1 reply; 13+ messages in thread
From: Xinze Chi (信泽) @ 2015-12-03  9:17 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: Sage Weil, Gregory Farnum, Dan Mick, Haomai Wang, Piotr.Dalek,
	ceph-devel

I wrote a new patch about this:
https://github.com/XinzeChi/ceph/commit/06eb471e463a4687e251273d0b5dfe170acbef2d

If we use __attribute__((packed)) on a struct, we can encode many
struct members in a batch. There is no compatibility problem if we
keep the order of the members defined in the struct.

Waiting for your comments.
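The idea can be sketched like this, with hypothetical fields (not the real eversion_t and not the patch itself): packing removes padding, so a run of fixed-width members round-trips through a single append/memcpy instead of one encode_raw call per member.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

struct eversion_packed {
  uint64_t version;
  uint32_t epoch;
} __attribute__((packed));

static_assert(sizeof(eversion_packed) == 12,
              "no padding: the batch copy equals the member-by-member encoding");

static void encode(const eversion_packed &e, std::string &bl) {
  bl.append(reinterpret_cast<const char *>(&e), sizeof(e));  // one call
}

static void decode(eversion_packed &e, const std::string &bl, size_t &off) {
  std::memcpy(&e, bl.data() + off, sizeof(e));
  off += sizeof(e);
}
```

Note this sketch still assumes a little-endian host; the static_assert only guards the layout, not the byte order, so the endianness concerns raised earlier in the thread still apply.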

2015-11-10 1:54 GMT+08:00 Milosz Tanski <milosz@adfin.com>:
> On Mon, Nov 9, 2015 at 10:24 AM, Sage Weil <sweil@redhat.com> wrote:
>> On Mon, 9 Nov 2015, Gregory Farnum wrote:
>>> On Wed, Nov 4, 2015 at 7:07 AM, Gregory Farnum <gfarnum@redhat.com> wrote:
>>> > The problem with this approach is that the encoded versions need to be
>>> > platform-independent — they are shared over the wire and written to
>>> > disks that might get transplanted to different machines. Apart from
>>> > padding bytes, we also need to worry about endianness of the machine,
>>> > etc. *And* we often mutate structures across versions in order to add
>>> > new abilities, relying on the encode-decode process to deal with any
>>> > changes to the system. How could we deal with that if just dumping the
>>> > raw memory?
>>> >
>>> > Now, maybe we could make these changes on some carefully-selected
>>> > structs, I'm not sure. But we'd need a way to pick them out, guarantee
>>> > that we aren't breaking interoperability concerns, etc; and it would
>>> > need to be something we can maintain as a group going forward. I'm not
>>> > sure how to satisfy those constraints without burning a little extra
>>> > CPU. :/
>>> > -Greg
>>>
>>> So it turns out we've actually had issues with this. Sage merged
>>> (wrote?) some little-endian-only optimizations to the cephx code that
>>> broke big-endian systems by doing a direct memcpy. Apparently our
>>> tests don't find these issues, which makes me even more nervous about
>>> taking that sort of optimization into the tree. :(
>>
>> I think the way to make this maintainable will be to
>>
>> 1) Find a clean approach with a simple #if or #ifdef condition for
>> little endian and/or architectures that can handle unaligned int pointer
>> access.
>>
>
> In C++ you can also do that with a template <bool Endian> or using
> std::enable_if. The upside is the same as the downside (depending on
> how you look at it). So it'll add to compile time checks (because it
> won't be discarded by the processor) and it'll take longer to build,
> but you get extra checks and the compiler will later discard the
> unused code.
>
> If you do that, it should be easier to write unit tests for the functionality.
>
>> 2) Maintain the parallel optimized implementation next to the generic
>> encode/decode in a way that makes it as easy as possible to make changes
>> and keep them in sync.
>>
>> 3) Optimize *only* the most recent encoding to minimize complexity.
>>
>> 4) Ensure that there is a set of encode/decode tests that verify they both
>> work, triggered by make check (so that a simple make check on a big
>> endian box will catch errors).  Ideally this'd be part of the
>> test/encoding/readable.sh so that we run it over the entire corpus of old
>> encodings..
>>
>>
>> sage
>
>
>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: milosz@adfin.com



-- 
Regards,
Xinze Chi


* RE: ceph encoding optimization
  2015-12-03  9:17                   ` Xinze Chi (信泽)
@ 2015-12-03  9:29                     ` Piotr.Dalek
  0 siblings, 0 replies; 13+ messages in thread
From: Piotr.Dalek @ 2015-12-03  9:29 UTC (permalink / raw)
  To: Xinze Chi (信泽), Milosz Tanski
  Cc: Sage Weil, Gregory Farnum, Dan Mick, Haomai Wang, ceph-devel

> -----Original Message-----
> From: Xinze Chi (信泽) [mailto:xmdxcxz@gmail.com]
> Sent: Thursday, December 03, 2015 10:18 AM
> To: Milosz Tanski
> Cc: Sage Weil; Gregory Farnum; Dan Mick; Haomai Wang; Dałek, Piotr; ceph-
> devel@vger.kernel.org
> Subject: Re: ceph encoding optimization
> 
> I write a new patch about this
> https://github.com/XinzeChi/ceph/commit/06eb471e463a4687e251273d0b5
> dfe170acbef2d
> 
> If you use __attribute__ ((packed)) after struct, we could encode many
> struct member in a batch. There is not compatibility problem if we keep the
> order of members defined in struct.
> 
> Wait for your comment.

Fields in embedded structs might be aligned, and that will inhibit the packed attribute. Also, you can't just use the packed attribute on every struct that is encoded/decoded, because that will eventually incur a performance hit. What I had in mind with https://github.com/XinzeChi/ceph/commit/06eb471e463a4687e251273d0b5dfe170acbef2d#commitcomment-14755615 is to use a temporary, packed struct, copy the data from the original struct/class into it, and then copy the data from the packed struct in one go.
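A rough sketch of that temporary-mirror idea, with hypothetical types (not the real pg_info_t): the working struct keeps natural alignment for fast member access, and only at encode time are the fields copied into a packed mirror that is appended in a single call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct pg_info {              // naturally aligned; sizeof includes padding
  uint32_t epoch;
  uint64_t last_user_version;
  uint8_t  flags;
};

struct pg_info_wire {         // packed mirror: exact wire layout, 13 bytes
  uint32_t epoch;
  uint64_t last_user_version;
  uint8_t  flags;
} __attribute__((packed));

static void encode(const pg_info &p, std::string &bl) {
  pg_info_wire w{p.epoch, p.last_user_version, p.flags};     // per-field copy
  bl.append(reinterpret_cast<const char *>(&w), sizeof(w));  // one append
}
```

The per-field copy into the mirror is cheap register traffic, so the batched append still wins over one encode call per member while the hot in-memory struct keeps its natural alignment.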


With best regards / Pozdrawiam
Piotr Dałek





Thread overview: 13+ messages
2015-11-04 15:00 ceph encoding optimization 池信泽
2015-11-04 15:05 ` Sage Weil
2015-11-04 15:07 ` Gregory Farnum
2015-11-04 15:33   ` 池信泽
2015-11-04 17:19     ` Piotr.Dalek
2015-11-04 17:29       ` Haomai Wang
2015-11-07 13:40         ` Haomai Wang
2015-11-08 14:28           ` Sage Weil
2015-11-09 15:05             ` Gregory Farnum
2015-11-09 15:24               ` Sage Weil
2015-11-09 17:54                 ` Milosz Tanski
2015-12-03  9:17                   ` Xinze Chi (信泽)
2015-12-03  9:29                     ` Piotr.Dalek
