* [PATCH v3 00/18] ceph: Inline data support
@ 2013-11-27 13:40 Li Wang
  2013-11-27 13:40 ` [PATCH 01/18] ceph: Add inline data feature Li Wang
                   ` (17 more replies)
  0 siblings, 18 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

This patch series implements inline data support for Ceph.
It can also be pulled from:
https://github.com/kylinstorage/ceph.git inline

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
Changes since v2:
Streamline the inline data migration with the subsequent read/write.

Changes since v1:
Simplify the handling of the multiple-writer case; see
http://pad.ceph.com/p/mds-inline-data
http://www.spinics.net/lists/ceph-devel/msg16018.html

Li Wang (18):
  ceph: Add inline data feature
  ceph: Add inline state definition
  mds: Add inline fields to inode_t
  mds: Add inline encode/decode to inode_t
  ceph: Add inline fields to MClientCaps
  osdc: Add write method with truncate parameters
  mds: Add inline fields to Capability
  mds: Push inline data to client in cap message
  ceph: Add inline fields to InodeStat
  mds: Push inline data to client in inodestat
  mds: Receive updated inline data from client
  client: Add inline fields to Inode
  client: Receive inline data pushed from mds
  client: Push inline data to mds by send cap
  client: Add inline data migration helper
  client: Read inline data path
  client: Write inline data path
  client: Fallocate inline data path

 src/ceph_mds.cc             |    1 +
 src/client/Client.cc        |  277 +++++++++++++++++++++++++++++++++++++++----
 src/client/Client.h         |    4 +
 src/client/Inode.h          |    5 +
 src/include/ceph_features.h |    2 +
 src/include/ceph_fs.h       |    3 +
 src/mds/CInode.cc           |   22 ++++
 src/mds/Capability.h        |    2 +
 src/mds/Locker.cc           |    7 ++
 src/mds/mdstypes.cc         |   12 +-
 src/mds/mdstypes.h          |    3 +
 src/messages/MClientCaps.h  |   19 ++-
 src/messages/MClientReply.h |    9 ++
 src/osdc/Objecter.h         |   10 +-
 14 files changed, 346 insertions(+), 30 deletions(-)

-- 
1.7.9.5



* [PATCH 01/18] ceph: Add inline data feature
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 02/18] ceph: Add inline state definition Li Wang
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/ceph_mds.cc             |    1 +
 src/include/ceph_features.h |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/ceph_mds.cc b/src/ceph_mds.cc
index 88b807b..dac676f 100644
--- a/src/ceph_mds.cc
+++ b/src/ceph_mds.cc
@@ -243,6 +243,7 @@ int main(int argc, const char **argv)
     CEPH_FEATURE_UID |
     CEPH_FEATURE_NOSRCADDR |
     CEPH_FEATURE_DIRLAYOUTHASH |
+    CEPH_FEATURE_MDS_INLINE_DATA |
     CEPH_FEATURE_PGID64 |
     CEPH_FEATURE_MSG_AUTH;
   uint64_t required =
diff --git a/src/include/ceph_features.h b/src/include/ceph_features.h
index c0f01cc..70ee921 100644
--- a/src/include/ceph_features.h
+++ b/src/include/ceph_features.h
@@ -40,6 +40,7 @@
 #define CEPH_FEATURE_MON_SCRUB      (1ULL<<33)
 #define CEPH_FEATURE_OSD_PACKED_RECOVERY (1ULL<<34)
 #define CEPH_FEATURE_OSD_CACHEPOOL (1ULL<<35)
+#define CEPH_FEATURE_MDS_INLINE_DATA     (1ULL<<36)
 
 /*
  * The introduction of CEPH_FEATURE_OSD_SNAPMAPPER caused the feature
@@ -103,6 +104,7 @@ static inline unsigned long long ceph_sanitize_features(unsigned long long f) {
 	 CEPH_FEATURE_MON_SCRUB	|	    \
 	 CEPH_FEATURE_OSD_PACKED_RECOVERY | \
 	 CEPH_FEATURE_OSD_CACHEPOOL | \
+	 CEPH_FEATURE_MDS_INLINE_DATA | \
 	 0ULL)
 
 #define CEPH_FEATURES_SUPPORTED_DEFAULT  CEPH_FEATURES_ALL
-- 
1.7.9.5
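
Editorial sketch (not part of the patch): the feature bit above is bit 36,
so it only fits in a 64-bit mask, which is why the definition uses 1ULL.
The snippet below illustrates how such a bit is typically tested before
inline fields are put on the wire. The names FEATURE_MDS_INLINE_DATA,
peer_supports_inline and negotiated_features are hypothetical stand-ins,
not Ceph APIs.

```cpp
#include <cassert>
#include <cstdint>

// Bit position copied from the patch; the constant name is a local stand-in
// for CEPH_FEATURE_MDS_INLINE_DATA.
constexpr uint64_t FEATURE_MDS_INLINE_DATA = 1ULL << 36;

// True only if the peer advertised inline data support.
inline bool peer_supports_inline(uint64_t peer_features) {
  return (peer_features & FEATURE_MDS_INLINE_DATA) != 0;
}

// Assumption for illustration: a connection may use only the features
// that both sides advertise.
inline uint64_t negotiated_features(uint64_t local, uint64_t peer) {
  return local & peer;
}
```

A peer that lacks the bit would then be sent the older encoding, which is
what the later MClientCaps and InodeStat patches in this series do.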



* [PATCH 02/18] ceph: Add inline state definition
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
  2013-11-27 13:40 ` [PATCH 01/18] ceph: Add inline data feature Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 03/18] mds: Add inline fields to inode_t Li Wang
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/include/ceph_fs.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/include/ceph_fs.h b/src/include/ceph_fs.h
index 47ec1f1..07a78b8 100644
--- a/src/include/ceph_fs.h
+++ b/src/include/ceph_fs.h
@@ -526,6 +526,9 @@ struct ceph_filelock {
 
 int ceph_flags_to_mode(int flags);
 
+/* inline data state */
+#define CEPH_INLINE_NONE	((__u64)-1)
+#define CEPH_INLINE_SIZE	(1 << 12)
 
 /* capability bits */
 #define CEPH_CAP_PIN         1  /* no specific capabilities beyond the pin */
-- 
1.7.9.5
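
Editorial sketch (not part of the patch): CEPH_INLINE_NONE is the largest
64-bit value, so the test "inline_version < CEPH_INLINE_NONE" used later in
the series reads as "this inode still carries inline data". The helper names
below are hypothetical, and treating CEPH_INLINE_SIZE (4096 bytes) as the
upper bound for inline payloads is an assumption based on its name.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the two macros added by the patch.
constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);
constexpr uint64_t INLINE_SIZE = 1u << 12;  // 4096

// Any version below the sentinel means the inode has inline data.
inline bool has_inline_data(uint64_t inline_version) {
  return inline_version < INLINE_NONE;
}

// Assumed policy: a write may stay inline only if it ends within INLINE_SIZE.
inline bool fits_inline(uint64_t end_offset) {
  return end_offset <= INLINE_SIZE;
}
```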



* [PATCH 03/18] mds: Add inline fields to inode_t
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
  2013-11-27 13:40 ` [PATCH 01/18] ceph: Add inline data feature Li Wang
  2013-11-27 13:40 ` [PATCH 02/18] ceph: Add inline state definition Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 04/18] mds: Add inline encode/decode " Li Wang
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/mdstypes.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mds/mdstypes.h b/src/mds/mdstypes.h
index bd53c85..aacc41c 100644
--- a/src/mds/mdstypes.h
+++ b/src/mds/mdstypes.h
@@ -336,6 +336,8 @@ struct inode_t {
   utime_t    mtime;   // file data modify time.
   utime_t    atime;   // file data access time.
   uint32_t   time_warp_seq;  // count of (potential) mtime/atime timewarps (i.e., utimes())
+  bufferlist inline_data;
+  uint64_t   inline_version;
 
   map<client_t,client_writeable_range_t> client_ranges;  // client(s) can write to these ranges
 
@@ -358,6 +360,7 @@ struct inode_t {
 	      truncate_seq(0), truncate_size(0), truncate_from(0),
 	      truncate_pending(0),
 	      time_warp_seq(0),
+	      inline_version(1),
 	      version(0), file_data_version(0), xattr_version(0), backtrace_version(0) {
     clear_layout();
     memset(&dir_layout, 0, sizeof(dir_layout));
-- 
1.7.9.5



* [PATCH 04/18] mds: Add inline encode/decode to inode_t
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (2 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 03/18] mds: Add inline fields to inode_t Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 05/18] ceph: Add inline fields to MClientCaps Li Wang
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/mdstypes.cc |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/mds/mdstypes.cc b/src/mds/mdstypes.cc
index df6cd8e..01a04e8 100644
--- a/src/mds/mdstypes.cc
+++ b/src/mds/mdstypes.cc
@@ -204,7 +204,7 @@ ostream& operator<<(ostream& out, const client_writeable_range_t& r)
  */
 void inode_t::encode(bufferlist &bl) const
 {
-  ENCODE_START(8, 6, bl);
+  ENCODE_START(9, 9, bl);
 
   ::encode(ino, bl);
   ::encode(rdev, bl);
@@ -239,13 +239,15 @@ void inode_t::encode(bufferlist &bl) const
   ::encode(backtrace_version, bl);
   ::encode(old_pools, bl);
   ::encode(max_size_ever, bl);
+  ::encode(inline_version, bl);
+  ::encode(inline_data, bl);
 
   ENCODE_FINISH(bl);
 }
 
 void inode_t::decode(bufferlist::iterator &p)
 {
-  DECODE_START_LEGACY_COMPAT_LEN(7, 6, 6, p);
+  DECODE_START_LEGACY_COMPAT_LEN(9, 6, 6, p);
 
   ::decode(ino, p);
   ::decode(rdev, p);
@@ -299,6 +301,12 @@ void inode_t::decode(bufferlist::iterator &p)
     backtrace_version = 0; // note inode which has no backtrace
   if (struct_v >= 8)
     ::decode(max_size_ever, p);
+  if (struct_v >= 9) {
+    ::decode(inline_version, p);
+    ::decode(inline_data, p);
+  } else {
+    inline_version = CEPH_INLINE_NONE;
+  }
 
   DECODE_FINISH(p);
 }
-- 
1.7.9.5



* [PATCH 05/18] ceph: Add inline fields to MClientCaps
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (3 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 04/18] mds: Add inline encode/decode " Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 06/18] osdc: Add write method with truncate parameters Li Wang
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/messages/MClientCaps.h |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/messages/MClientCaps.h b/src/messages/MClientCaps.h
index 117f241..a506c53 100644
--- a/src/messages/MClientCaps.h
+++ b/src/messages/MClientCaps.h
@@ -21,7 +21,7 @@
 
 class MClientCaps : public Message {
 
-  static const int HEAD_VERSION = 2;   // added flock metadata
+  static const int HEAD_VERSION = 3;   // added flock metadata, inline data
   static const int COMPAT_VERSION = 1;
 
  public:
@@ -29,6 +29,8 @@ class MClientCaps : public Message {
   bufferlist snapbl;
   bufferlist xattrbl;
   bufferlist flockbl;
+  uint64_t   inline_version;
+  bufferlist inline_data;
 
   int      get_caps() { return head.caps; }
   int      get_wanted() { return head.wanted; }
@@ -151,6 +153,13 @@ public:
     // conditionally decode flock metadata
     if (header.version >= 2)
       ::decode(flockbl, p);
+
+    if (header.version >= 3) {
+      ::decode(inline_version, p);
+      ::decode(inline_data, p);
+    } else {
+      inline_version = CEPH_INLINE_NONE;
+    }
   }
   void encode_payload(uint64_t features) {
     head.snap_trace_len = snapbl.length();
@@ -165,6 +174,14 @@ public:
       ::encode(flockbl, payload);
     } else {
       header.version = 1;  // old
+      return;
+    }
+
+    if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
+      ::encode(inline_version, payload);
+      ::encode(inline_data, payload);
+    } else {
+      header.version = 2;
     }
   }
 };
-- 
1.7.9.5
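
Editorial sketch (not part of the patch): the encode/decode pair above is
version gated. The sender appends the inline fields only when the peer has
the feature and otherwise downgrades the header version; the receiver falls
back to CEPH_INLINE_NONE for old headers. The toy below mimics that shape
with a plain byte vector. ToyCaps, toy_encode and toy_decode are
hypothetical, and the byte layout is not Ceph's wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);

struct ToyCaps {
  uint64_t inline_version = 0;
  std::string inline_data;
};

// Appends the inline fields only for capable peers.
// Returns the header version that was used (3 = with inline, 2 = without).
inline int toy_encode(const ToyCaps& c, bool peer_has_inline,
                      std::vector<uint8_t>& out) {
  if (!peer_has_inline)
    return 2;
  uint8_t raw[sizeof(uint64_t)];
  std::memcpy(raw, &c.inline_version, sizeof(raw));
  out.insert(out.end(), raw, raw + sizeof(raw));
  out.insert(out.end(), c.inline_data.begin(), c.inline_data.end());
  return 3;
}

// Reads the inline fields for version >= 3, else reports "no inline data".
inline ToyCaps toy_decode(int header_version, const std::vector<uint8_t>& in) {
  ToyCaps c;
  if (header_version >= 3 && in.size() >= sizeof(uint64_t)) {
    std::memcpy(&c.inline_version, in.data(), sizeof(uint64_t));
    c.inline_data.assign(in.begin() + sizeof(uint64_t), in.end());
  } else {
    c.inline_version = INLINE_NONE;
  }
  return c;
}
```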



* [PATCH 06/18] osdc: Add write method with truncate parameters
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (4 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 05/18] ceph: Add inline fields to MClientCaps Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 07/18] mds: Add inline fields to Capability Li Wang
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/osdc/Objecter.h |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/osdc/Objecter.h b/src/osdc/Objecter.h
index 41973dd..40f03de 100644
--- a/src/osdc/Objecter.h
+++ b/src/osdc/Objecter.h
@@ -279,8 +279,16 @@ struct ObjectOperation {
     out_handler[p] = h;
     out_rval[p] = prval;
   }
-  void write(uint64_t off, bufferlist& bl) {
+  void write(uint64_t off, bufferlist& bl,
+             uint64_t truncate_size,
+             uint32_t truncate_seq) {
     add_data(CEPH_OSD_OP_WRITE, off, bl.length(), bl);
+    OSDOp& o = *ops.rbegin();
+    o.op.extent.truncate_size = truncate_size;
+    o.op.extent.truncate_seq = truncate_seq;
+  }
+  void write(uint64_t off, bufferlist& bl) {
+    write(off, bl, 0, 0);
   }
   void write_full(bufferlist& bl) {
     add_data(CEPH_OSD_OP_WRITEFULL, 0, bl.length(), bl);
-- 
1.7.9.5



* [PATCH 07/18] mds: Add inline fields to Capability
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (5 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 06/18] osdc: Add write method with truncate parameters Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 08/18] mds: Push inline data to client in cap message Li Wang
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/Capability.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mds/Capability.h b/src/mds/Capability.h
index fb6b3dc..995ea3a 100644
--- a/src/mds/Capability.h
+++ b/src/mds/Capability.h
@@ -209,6 +209,7 @@ private:
 public:
   snapid_t client_follows;
   version_t client_xattr_version;
+  uint64_t client_inline_version;
   
   xlist<Capability*>::item item_session_caps;
   xlist<Capability*>::item item_snaprealm_caps;
@@ -223,6 +224,7 @@ public:
     mseq(0),
     suppress(0), stale(false),
     client_follows(0), client_xattr_version(0),
+    client_inline_version(0),
     item_session_caps(this), item_snaprealm_caps(this) {
     g_num_cap++;
     g_num_capa++;
-- 
1.7.9.5



* [PATCH 08/18] mds: Push inline data to client in cap message
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (6 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 07/18] mds: Add inline fields to Capability Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 09/18] ceph: Add inline fields to InodeStat Li Wang
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/CInode.cc |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index c8b00ef..4756865 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2989,6 +2989,13 @@ void CInode::encode_cap_message(MClientCaps *m, Capability *cap)
   i->atime.encode_timeval(&m->head.atime);
   m->head.time_warp_seq = i->time_warp_seq;
 
+  if (cap->client_inline_version < i->inline_version) {
+    m->inline_version = cap->client_inline_version = i->inline_version;
+    m->inline_data = i->inline_data;
+  } else {
+    m->inline_version = 0;
+  }
+
   // max_size is min of projected, actual.
   uint64_t oldms = oi->client_ranges.count(client) ? oi->client_ranges[client].range.last : 0;
   uint64_t newms = pi->client_ranges.count(client) ? pi->client_ranges[client].range.last : 0;
-- 
1.7.9.5
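
Editorial sketch (not part of the patch): the hunk above sends the inline
payload only when the version recorded for the client's capability is older
than the inode's, and otherwise sends version 0 with no data. A minimal
model of that rule, using hypothetical ToyInode, ToyCap and ToyMsg types in
place of the MDS classes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct ToyInode { uint64_t inline_version; std::string inline_data; };
struct ToyCap   { uint64_t client_inline_version = 0; };
struct ToyMsg   { uint64_t inline_version = 0; std::string inline_data; };

// Fills the message and records what the client has now been sent.
inline ToyMsg fill_cap_message(const ToyInode& i, ToyCap& cap) {
  ToyMsg m;
  if (cap.client_inline_version < i.inline_version) {
    m.inline_version = cap.client_inline_version = i.inline_version;
    m.inline_data = i.inline_data;
  } else {
    m.inline_version = 0;  // client is already up to date; send no payload
  }
  return m;
}
```

Sending the same message twice therefore carries the data only the first
time, which keeps repeated cap messages small.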



* [PATCH 09/18] ceph: Add inline fields to InodeStat
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (7 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 08/18] mds: Push inline data to client in cap message Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 10/18] mds: Push inline data to client in inodestat Li Wang
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/messages/MClientReply.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/messages/MClientReply.h b/src/messages/MClientReply.h
index 896245f..47908e9 100644
--- a/src/messages/MClientReply.h
+++ b/src/messages/MClientReply.h
@@ -108,6 +108,8 @@ struct InodeStat {
   uint64_t truncate_size;
   utime_t ctime, mtime, atime;
   version_t time_warp_seq;
+  bufferlist inline_data;
+  uint64_t inline_version;
 
   frag_info_t dirstat;
   nest_info_t rstat;
-- 
1.7.9.5



* [PATCH 10/18] mds: Push inline data to client in inodestat
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (8 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 09/18] ceph: Add inline fields to InodeStat Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 11/18] mds: Receive updated inline data from client Li Wang
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/CInode.cc           |   15 +++++++++++++++
 src/messages/MClientReply.h |    7 +++++++
 2 files changed, 22 insertions(+)

diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index 4756865..ed51e1d 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2824,6 +2824,16 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
   e.files = i->dirstat.nfiles;
   e.subdirs = i->dirstat.nsubdirs;
 
+  // inline data
+  uint64_t inline_version = 0;
+  bufferlist inline_data;
+  if (!cap || (cap->client_inline_version < i->inline_version)) {
+    inline_version = i->inline_version;
+    inline_data = i->inline_data;
+    if (cap)
+      cap->client_inline_version = i->inline_version;
+  }
+
   // nest (do same as file... :/)
   i->rstat.rctime.encode_timeval(&e.rctime);
   e.rbytes = i->rstat.rbytes;
@@ -2862,6 +2872,7 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
     bytes += (sizeof(__u32) + sizeof(__u32)) * dirfragtree._splits.size();
     bytes += sizeof(__u32) + symlink.length();
     bytes += sizeof(__u32) + xbl.length();
+    bytes += sizeof(__u64) + sizeof(__u32) + inline_data.length();
     if (bytes > max_bytes)
       return -ENOSPC;
   }
@@ -2957,6 +2968,10 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
     ::encode(i->dir_layout, bl);
   }
   ::encode(xbl, bl);
+  if (session->connection->has_feature(CEPH_FEATURE_MDS_INLINE_DATA)) {
+    ::encode(inline_version, bl);
+    ::encode(inline_data, bl);
+  }
 
   return valid;
 }
diff --git a/src/messages/MClientReply.h b/src/messages/MClientReply.h
index 47908e9..ebb3b9b 100644
--- a/src/messages/MClientReply.h
+++ b/src/messages/MClientReply.h
@@ -176,6 +176,13 @@ struct InodeStat {
 
     xattr_version = e.xattr_version;
     ::decode(xattrbl, p);
+
+    if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
+      ::decode(inline_version, p);
+      ::decode(inline_data, p);
+    } else {
+      inline_version = CEPH_INLINE_NONE;
+    }
   }
   
   // see CInode::encode_inodestat for encoder.
-- 
1.7.9.5
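
Editorial sketch (not part of the patch): the size accounting added to
encode_inodestat reserves room for a 64-bit version, a 32-bit length prefix
and the payload itself. The helper below only restates that arithmetic; the
function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes reserved for the inline fields:
// 8 (version) + 4 (length prefix) + payload length.
inline uint64_t inline_encoded_bytes(uint64_t payload_len) {
  return sizeof(uint64_t) + sizeof(uint32_t) + payload_len;
}
```

For example, an empty payload costs 12 bytes and a 4096-byte payload costs
4108 bytes.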



* [PATCH 11/18] mds: Receive updated inline data from client
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (9 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 10/18] mds: Push inline data to client in inodestat Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 12/18] client: Add inline fields to Inode Li Wang
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/mds/Locker.cc |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 63e0e08..4b02a56 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -2691,6 +2691,7 @@ void Locker::_update_cap_fields(CInode *in, int dirty, MClientCaps *m, inode_t *
     utime_t mtime = m->get_mtime();
     utime_t ctime = m->get_ctime();
     uint64_t size = m->get_size();
+    uint64_t inline_version = m->inline_version;
     
     if (((dirty & CEPH_CAP_FILE_WR) && mtime > pi->mtime) ||
 	((dirty & CEPH_CAP_FILE_EXCL) && mtime != pi->mtime)) {
@@ -2710,6 +2711,12 @@ void Locker::_update_cap_fields(CInode *in, int dirty, MClientCaps *m, inode_t *
       pi->size = size;
       pi->rstat.rbytes = size;
     }
+    if (in->inode.is_file() &&
+        (dirty & CEPH_CAP_FILE_WR) &&
+        inline_version > pi->inline_version) {
+      pi->inline_version = inline_version;
+      pi->inline_data = m->inline_data;
+    }
     if ((dirty & CEPH_CAP_FILE_EXCL) && atime != pi->atime) {
       dout(7) << "  atime " << pi->atime << " -> " << atime
 	      << " for " << *in << dendl;
-- 
1.7.9.5



* [PATCH 12/18] client: Add inline fields to Inode
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (10 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 11/18] mds: Receive updated inline data from client Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 13/18] client: Receive inline data pushed from mds Li Wang
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Inode.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/client/Inode.h b/src/client/Inode.h
index cc054a6..bb17706 100644
--- a/src/client/Inode.h
+++ b/src/client/Inode.h
@@ -111,6 +111,10 @@ class Inode {
   version_t version;           // auth only
   version_t xattr_version;
 
+  // inline data
+  uint64_t   inline_version;
+  bufferlist inline_data;
+
   bool is_symlink() const { return (mode & S_IFMT) == S_IFLNK; }
   bool is_dir()     const { return (mode & S_IFMT) == S_IFDIR; }
   bool is_file()    const { return (mode & S_IFMT) == S_IFREG; }
@@ -207,6 +211,7 @@ class Inode {
       rdev(0), mode(0), uid(0), gid(0), nlink(0),
       size(0), truncate_seq(1), truncate_size(-1),
       time_warp_seq(0), max_size(0), version(0), xattr_version(0),
+      inline_version(0),
       flags(0),
       dir_hashed(false), dir_replicated(false), auth_cap(NULL),
       dirty_caps(0), flushing_caps(0), flushing_cap_seq(0), shared_gen(0), cache_gen(0),
-- 
1.7.9.5



* [PATCH 13/18] client: Receive inline data pushed from mds
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (11 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 12/18] client: Add inline fields to Inode Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 14/18] client: Push inline data to mds by send cap Li Wang
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |   22 ++++++++++++++++++++--
 src/client/Client.h  |    1 +
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index a4d5550..19d31e0 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -494,6 +494,8 @@ void Client::update_inode_file_bits(Inode *in,
 				    uint64_t time_warp_seq, utime_t ctime,
 				    utime_t mtime,
 				    utime_t atime,
+				    uint64_t inline_version,
+				    bufferlist& inline_data,
 				    int issued)
 {
   bool warn = false;
@@ -504,6 +506,11 @@ void Client::update_inode_file_bits(Inode *in,
 	   << " local " << in->time_warp_seq << dendl;
   uint64_t prior_size = in->size;
 
+  if (inline_version > in->inline_version) {
+    in->inline_data = inline_data;
+    in->inline_version = inline_version;
+  }
+
   if (truncate_seq > in->truncate_seq ||
       (truncate_seq == in->truncate_seq && size > in->size)) {
     ldout(cct, 10) << "size " << in->size << " -> " << size << dendl;
@@ -520,6 +527,13 @@ void Client::update_inode_file_bits(Inode *in,
 	_invalidate_inode_cache(in, truncate_size, prior_size - truncate_size, true);
       }
     }
+
+    // truncate inline data
+    if (in->inline_version < CEPH_INLINE_NONE) {
+      uint32_t len = in->inline_data.length();
+      if (size < len)
+        in->inline_data.splice(size, len - size);
+    }
   }
   if (truncate_seq >= in->truncate_seq &&
       in->truncate_size != truncate_size) {
@@ -654,6 +668,7 @@ Inode * Client::add_update_inode(InodeStat *st, utime_t from, MetaSession *sessi
   
     update_inode_file_bits(in, st->truncate_seq, st->truncate_size, st->size,
 			   st->time_warp_seq, st->ctime, st->mtime, st->atime,
+			   st->inline_version, st->inline_data,
 			   issued);
   }
 
@@ -3524,7 +3539,9 @@ void Client::handle_cap_trunc(MetaSession *session, Inode *in, MClientCaps *m)
   issued |= implemented;
   update_inode_file_bits(in, m->get_truncate_seq(), m->get_truncate_size(),
                          m->get_size(), m->get_time_warp_seq(), m->get_ctime(),
-                         m->get_mtime(), m->get_atime(), issued);
+                         m->get_mtime(), m->get_atime(),
+                         m->inline_version, m->inline_data,
+                         issued);
   m->put();
 }
 
@@ -3674,7 +3691,8 @@ void Client::handle_cap_grant(MetaSession *session, Inode *in, Cap *cap, MClient
     in->xattr_version = m->head.xattr_version;
   }
   update_inode_file_bits(in, m->get_truncate_seq(), m->get_truncate_size(), m->get_size(),
-			 m->get_time_warp_seq(), m->get_ctime(), m->get_mtime(), m->get_atime(), issued);
+			 m->get_time_warp_seq(), m->get_ctime(), m->get_mtime(), m->get_atime(),
+			 m->inline_version, m->inline_data, issued);
 
   // max_size
   if (cap == in->auth_cap &&
diff --git a/src/client/Client.h b/src/client/Client.h
index 649bacc..48f1fea 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -508,6 +508,7 @@ protected:
   void update_inode_file_bits(Inode *in,
 			      uint64_t truncate_seq, uint64_t truncate_size, uint64_t size,
 			      uint64_t time_warp_seq, utime_t ctime, utime_t mtime, utime_t atime,
+			      uint64_t inline_version, bufferlist& inline_data,
 			      int issued);
   Inode *add_update_inode(InodeStat *st, utime_t ttl, MetaSession *session);
   Dentry *insert_dentry_inode(Dir *dir, const string& dname, LeaseStat *dlease, 
-- 
1.7.9.5
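
Editorial sketch (not part of the patch): on the client, two rules from the
hunks above matter. Inline data is replaced only when the incoming version
is newer, and the cached inline buffer is cut down when the file size drops
below its length. The model below uses std::string in place of bufferlist;
ToyClientInode and both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);

struct ToyClientInode {
  uint64_t inline_version = 0;
  std::string inline_data;
};

// Accept pushed inline data only if it is newer than what is cached.
inline void accept_if_newer(ToyClientInode& in, uint64_t version,
                            const std::string& data) {
  if (version > in.inline_version) {
    in.inline_data = data;
    in.inline_version = version;
  }
}

// Drop the tail of the inline buffer when the file shrinks below its length.
inline void truncate_inline(ToyClientInode& in, uint64_t size) {
  if (in.inline_version < INLINE_NONE && size < in.inline_data.size())
    in.inline_data.resize(static_cast<std::size_t>(size));
}
```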



* [PATCH 14/18] client: Push inline data to mds by send cap
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (12 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 13/18] client: Receive inline data pushed from mds Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 15/18] client: Add inline data migration helper Li Wang
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 19d31e0..3beab8f 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2399,6 +2399,11 @@ void Client::send_cap(Inode *in, MetaSession *session, Cap *cap,
   in->ctime.encode_timeval(&m->head.ctime);
   m->head.time_warp_seq = in->time_warp_seq;
     
+  if (flush & CEPH_CAP_FILE_WR) {
+    m->inline_version = in->inline_version;
+    m->inline_data = in->inline_data;
+  }
+
   in->reported_size = in->size;
   m->set_snap_follows(follows);
   cap->wanted = want;
-- 
1.7.9.5



* [PATCH 15/18] client: Add inline data migration helper
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (13 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 14/18] client: Push inline data to mds by send cap Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 16/18] client: Read inline data path Li Wang
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |   41 +++++++++++++++++++++++++++++++++++++++++
 src/client/Client.h  |    3 +++
 2 files changed, 44 insertions(+)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 3beab8f..bbd56f2 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5767,6 +5767,47 @@ void Client::unlock_fh_pos(Fh *f)
   f->pos_locked = false;
 }
 
+int Client::uninline_data(Inode *in, Context *onfinish)
+{
+  char oid_buf[32];
+  snprintf(oid_buf, sizeof(oid_buf), "%llx.00000000", (long long unsigned)in->ino);
+  object_t oid = oid_buf;
+
+  ObjectOperation create_ops;
+  create_ops.create(false);
+
+  objecter->mutate(oid,
+                   OSDMap::file_to_object_locator(in->layout),
+                   create_ops,
+                   in->snaprealm->get_snap_context(),
+                   ceph_clock_now(cct),
+                   0,
+                   NULL,
+                   NULL);
+
+  bufferlist inline_version_bl;
+  ::encode(in->inline_version, inline_version_bl);
+
+  ObjectOperation uninline_ops;
+  uninline_ops.cmpxattr("inline_version",
+                        CEPH_OSD_CMPXATTR_OP_GT,
+                        CEPH_OSD_CMPXATTR_MODE_U64,
+                        inline_version_bl);
+  bufferlist inline_data = in->inline_data;
+  uninline_ops.write(0, inline_data, in->truncate_size, in->truncate_seq);
+  uninline_ops.setxattr("inline_version", inline_version_bl);
+
+  objecter->mutate(oid,
+                   OSDMap::file_to_object_locator(in->layout),
+                   uninline_ops,
+                   in->snaprealm->get_snap_context(),
+                   ceph_clock_now(cct),
+                   0,
+                   NULL,
+                   onfinish);
+
+  return 0;
+}
 
 // 
 
diff --git a/src/client/Client.h b/src/client/Client.h
index 48f1fea..63f0c41 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -429,6 +429,9 @@ protected:
 
   void handle_lease(MClientLease *m);
 
+  // inline data
+  int uninline_data(Inode *in, Context *onfinish);
+
   // file caps
   void check_cap_issue(Inode *in, Cap *cap, unsigned issued);
   void add_update_cap(Inode *in, MetaSession *session, uint64_t cap_id,
-- 
1.7.9.5
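
The version guard in uninline_data() can be illustrated with a small in-memory model. This is only a sketch, not Ceph code: ToyObject, toy_uninline and migration_settled are hypothetical names, and the sketch assumes that the cmpxattr guard passes only when the client's inline_version is strictly greater than the value stored on the object, with a missing xattr counted as 0. A failed guard is reported as -ECANCELED, which the callers in the later patches treat as "already migrated".

```cpp
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for one RADOS object: payload plus u64 xattrs.
struct ToyObject {
  std::string data;
  std::map<std::string, uint64_t> xattrs;
};

// Hypothetical model of the compound op: cmpxattr + write(0, ...) + setxattr.
// Assumption: the guard passes only if client_version > stored version,
// and a missing "inline_version" xattr counts as 0.
int toy_uninline(ToyObject &obj, uint64_t client_version,
                 const std::string &inline_data)
{
  uint64_t stored = 0;
  auto it = obj.xattrs.find("inline_version");
  if (it != obj.xattrs.end())
    stored = it->second;

  if (!(client_version > stored))
    return -ECANCELED;  // an equal or newer copy was already written

  // Overwrite the object starting at offset 0 with the inline bytes.
  obj.data.replace(0, inline_data.size(), inline_data);
  obj.xattrs["inline_version"] = client_version;
  return 0;
}

// Mirrors the callers' check: success or -ECANCELED both mean the
// inline copy can be dropped on the client.
bool migration_settled(int ret)
{
  return ret >= 0 || ret == -ECANCELED;
}
```

With this model, a second client that races with the same (or an older) inline_version gets -ECANCELED and leaves the object untouched.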



* [PATCH 16/18] client: Read inline data path
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (14 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 15/18] client: Add inline data migration helper Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
  2013-11-27 13:40 ` [PATCH 18/18] client: Fallocate " Li Wang
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index bbd56f2..6b08155 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5853,6 +5853,41 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
     movepos = true;
   }
 
+  Mutex uninline_flock("Client::_read_uninline_data flock");
+  Cond uninline_cond;
+  bool uninline_done = false;
+  int uninline_ret = 0;
+  Context *onuninline = NULL;
+
+  if (in->inline_version < CEPH_INLINE_NONE) {
+    if (!(have & CEPH_CAP_FILE_CACHE)) {
+      onuninline = new C_SafeCond(&uninline_flock,
+                                  &uninline_cond,
+                                  &uninline_done,
+                                  &uninline_ret);
+      uninline_data(in, onuninline);
+    } else {
+      uint32_t len = in->inline_data.length();
+
+      uint64_t endoff = offset + size;
+      if (endoff > in->size)
+        endoff = in->size;
+
+      if (offset < len) {
+        if (endoff <= len) {
+          bl->substr_of(in->inline_data, offset, endoff - offset);
+        } else {
+          bl->substr_of(in->inline_data, offset, len - offset);
+          bl->append_zero(endoff - len);
+        }
+      } else if (offset < endoff) {
+        bl->append_zero(endoff - offset);
+      }
+
+      goto success;
+    }
+  }
+
   if (!conf->client_debug_force_sync_read &&
       (cct->_conf->client_oc && (have & CEPH_CAP_FILE_CACHE))) {
 
@@ -5869,6 +5904,8 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
     goto done;
   }
 
+success:
+
   if (movepos) {
     // adjust fd pos
     f->pos = offset+bl->length();
@@ -5890,6 +5927,24 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
 
 done:
   // done!
+
+  if (onuninline) {
+    client_lock.Unlock();
+    uninline_flock.Lock();
+    while (!uninline_done)
+      uninline_cond.Wait(uninline_flock);
+    uninline_flock.Unlock();
+    client_lock.Lock();
+
+    if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+      in->inline_data.clear();
+      in->inline_version = CEPH_INLINE_NONE;
+      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+      check_caps(in, false);
+    } else
+      r = uninline_ret;
+  }
+
   put_cap_ref(in, CEPH_CAP_FILE_RD);
   return r;
 }
-- 
1.7.9.5
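
The inline branch of the read path above can be sketched with std::string standing in for bufferlist. This is a minimal model under stated assumptions, not the patch itself: read_inline is a hypothetical helper, and the early return for reads that start at or beyond the file size is an addition made for the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical helper mirroring the inline read logic: bytes that lie past
// the stored inline data but inside the file size are returned as zeros.
std::string read_inline(const std::string &inline_data, uint64_t file_size,
                        uint64_t offset, uint64_t size)
{
  uint64_t len = inline_data.size();
  uint64_t endoff = std::min(offset + size, file_size);  // clamp to EOF

  std::string out;
  if (offset >= endoff)
    return out;  // nothing to read (guard added in this sketch)

  if (offset < len) {
    // Copy the part of the range that is backed by inline data.
    uint64_t n = std::min(endoff, len) - offset;
    out.append(inline_data, offset, n);
  }
  // Zero-fill whatever remains of the requested range.
  out.append(endoff - offset - out.size(), '\0');
  return out;
}
```

For example, reading 10 bytes at offset 1 from a file of size 8 whose inline data is "abc" yields "bc" followed by five zero bytes.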



* [PATCH 17/18] client: Write inline data path
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (15 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 16/18] client: Read inline data path Li Wang
@ 2013-11-27 13:40 ` Li Wang
  2013-11-28  3:02   ` Yan, Zheng
  2013-11-27 13:40 ` [PATCH 18/18] client: Fallocate " Li Wang
  17 siblings, 1 reply; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |   55 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6b08155..c913e35 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
 
   ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
 
+  Mutex uninline_flock("Client::_write_uninline_data flock");
+  Cond uninline_cond;
+  bool uninline_done = false;
+  int uninline_ret = 0;
+  Context *onuninline = NULL;
+
+  if (in->inline_version < CEPH_INLINE_NONE) {
+    if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
+      onuninline = new C_SafeCond(&uninline_flock,
+                                  &uninline_cond,
+                                  &uninline_done,
+                                  &uninline_ret);
+      uninline_data(in, onuninline);
+    } else {
+      get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+
+      uint32_t len = in->inline_data.length();
+
+      if (endoff < len)
+        in->inline_data.copy(endoff, len - endoff, bl);
+
+      if (offset < len)
+        in->inline_data.splice(offset, len - offset);
+      else if (offset > len)
+        in->inline_data.append_zero(offset - len);
+
+      in->inline_data.append(bl);
+      in->inline_version++;
+
+      put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+
+      goto success;
+    }
+  }
+
   if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
     // do buffered write
     if (!in->oset.dirty_or_tx)
@@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
   }
 
   // if we get here, write was successful, update client metadata
-
+success:
   // time
   lat = ceph_clock_now(cct);
   lat -= start;
@@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
   mark_caps_dirty(in, CEPH_CAP_FILE_WR);
 
 done:
+
+  if (onuninline) {
+    client_lock.Unlock();
+    uninline_flock.Lock();
+    while (!uninline_done)
+      uninline_cond.Wait(uninline_flock);
+    uninline_flock.Unlock();
+    client_lock.Lock();
+
+    if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+      in->inline_data.clear();
+      in->inline_version = CEPH_INLINE_NONE;
+      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+      check_caps(in, false);
+    } else
+      r = uninline_ret;
+  }
+
   put_cap_ref(in, CEPH_CAP_FILE_WR);
   return r;
 }
-- 
1.7.9.5
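
The in-place update performed by the inline branch of the write path can be sketched as follows, again with std::string standing in for bufferlist. write_inline is a hypothetical helper. The sketch keeps the preserved tail in its own variable, whereas the patch appends it to the write buffer bl, so it assumes bl holds only the bytes of this write.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns the inline data after writing buf at offset.
std::string write_inline(std::string inline_data, uint64_t offset,
                         const std::string &buf)
{
  uint64_t len = inline_data.size();
  uint64_t endoff = offset + buf.size();

  // Keep the bytes that follow the written range, if any.
  std::string tail;
  if (endoff < len)
    tail = inline_data.substr(endoff);

  if (offset < len)
    inline_data.erase(offset);                  // drop everything from offset on
  else if (offset > len)
    inline_data.append(offset - len, '\0');     // zero-fill the gap before offset

  inline_data += buf;
  inline_data += tail;
  return inline_data;
}
```

Writing "XY" at offset 2 into "abcdef" gives "abXYef", and writing "Z" at offset 5 into "abc" gives "abc", two zero bytes, then "Z".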



* [PATCH 18/18] client: Fallocate inline data path
  2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
                   ` (16 preceding siblings ...)
  2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
@ 2013-11-27 13:40 ` Li Wang
  17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 src/client/Client.cc |   99 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index c913e35..d77e4ce 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -8002,34 +8002,69 @@ int Client::_fallocate(Fh *fh, int mode, int64_t offset, int64_t length)
   if (r < 0)
     return r;
 
+  Mutex uninline_flock("Client::_fallocate_uninline_data flock");
+  Cond uninline_cond;
+  bool uninline_done = false;
+  int uninline_ret = 0;
+  Context *onuninline = NULL;
+
   if (mode & FALLOC_FL_PUNCH_HOLE) {
-    Mutex flock("Client::_punch_hole flock");
-    Cond cond;
-    bool done = false;
-    Context *onfinish = new C_SafeCond(&flock, &cond, &done);
-    Context *onsafe = new C_Client_SyncCommit(this, in);
+    if (in->inline_version < CEPH_INLINE_NONE &&
+        (have & CEPH_CAP_FILE_BUFFER)) {
+      bufferlist bl;
+      int len = in->inline_data.length();
+      if (offset < len) {
+        if (offset > 0)
+          in->inline_data.copy(0, offset, bl);
+        int size = length;
+        if (offset + size > len)
+          size = len - offset;
+        if (size > 0)
+          bl.append_zero(size);
+        if (offset + size < len)
+          in->inline_data.copy(offset + size, len - offset - size, bl);
+        in->inline_data = bl;
+        in->inline_version++;
+      }
+      in->mtime = ceph_clock_now(cct);
+      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+    } else {
+      if (in->inline_version < CEPH_INLINE_NONE) {
+        onuninline = new C_SafeCond(&uninline_flock,
+                                    &uninline_cond,
+                                    &uninline_done,
+                                    &uninline_ret);
+        uninline_data(in, onuninline);
+      }
 
-    unsafe_sync_write++;
-    get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+      Mutex flock("Client::_punch_hole flock");
+      Cond cond;
+      bool done = false;
+      Context *onfinish = new C_SafeCond(&flock, &cond, &done);
+      Context *onsafe = new C_Client_SyncCommit(this, in);
 
-    _invalidate_inode_cache(in, offset, length, true);
-    r = filer->zero(in->ino, &in->layout,
-                    in->snaprealm->get_snap_context(),
-                    offset, length,
-                    ceph_clock_now(cct),
-                    0, true, onfinish, onsafe);
-    if (r < 0)
-      goto done;
+      unsafe_sync_write++;
+      get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
 
-    in->mtime = ceph_clock_now(cct);
-    mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+      _invalidate_inode_cache(in, offset, length, true);
+      r = filer->zero(in->ino, &in->layout,
+                      in->snaprealm->get_snap_context(),
+                      offset, length,
+                      ceph_clock_now(cct),
+                      0, true, onfinish, onsafe);
+      if (r < 0)
+        goto done;
 
-    client_lock.Unlock();
-    flock.Lock();
-    while (!done)
-      cond.Wait(flock);
-    flock.Unlock();
-    client_lock.Lock();
+      in->mtime = ceph_clock_now(cct);
+      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+
+      client_lock.Unlock();
+      flock.Lock();
+      while (!done)
+        cond.Wait(flock);
+      flock.Unlock();
+      client_lock.Lock();
+    }
   } else if (!(mode & FALLOC_FL_KEEP_SIZE)) {
     uint64_t size = offset + length;
     if (size > in->size) {
@@ -8044,6 +8079,24 @@ int Client::_fallocate(Fh *fh, int mode, int64_t offset, int64_t length)
   }
 
 done:
+
+  if (onuninline) {
+    client_lock.Unlock();
+    uninline_flock.Lock();
+    while (!uninline_done)
+      uninline_cond.Wait(uninline_flock);
+    uninline_flock.Unlock();
+    client_lock.Lock();
+
+    if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+      in->inline_data.clear();
+      in->inline_version = CEPH_INLINE_NONE;
+      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+      check_caps(in, false);
+    } else
+      r = uninline_ret;
+  }
+
   put_cap_ref(in, CEPH_CAP_FILE_WR);
   return r;
 }
-- 
1.7.9.5
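
The inline punch-hole branch above zeroes the part of the requested range that overlaps the inline data and leaves its length unchanged. A minimal sketch of that behaviour, with std::string in place of bufferlist, is shown below. punch_hole_inline is a hypothetical helper, and the sketch uses 64-bit arithmetic throughout and ignores negative or empty ranges, which are choices made for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: zero [offset, offset + length) within the inline data.
std::string punch_hole_inline(std::string inline_data, int64_t offset,
                              int64_t length)
{
  int64_t len = static_cast<int64_t>(inline_data.size());
  if (offset < 0 || length <= 0 || offset >= len)
    return inline_data;  // hole does not touch the inline bytes

  // Clamp the hole to the end of the inline data.
  std::size_t pos = static_cast<std::size_t>(offset);
  std::size_t n = static_cast<std::size_t>(std::min(length, len - offset));

  // Replace n bytes at pos with n zero bytes; the size stays the same.
  inline_data.replace(pos, n, n, '\0');
  return inline_data;
}
```

Punching 2 bytes at offset 2 out of "abcdef" gives "ab", two zero bytes, then "ef".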



* Re: [PATCH 17/18] client: Write inline data path
  2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
@ 2013-11-28  3:02   ` Yan, Zheng
  2013-11-29 17:01     ` Matt W. Benjamin
  2013-12-02  8:03     ` Li Wang
  0 siblings, 2 replies; 23+ messages in thread
From: Yan, Zheng @ 2013-11-28  3:02 UTC (permalink / raw)
  To: Li Wang; +Cc: ceph-devel, Sage Weil, Yunchuan Wen

On Wed, Nov 27, 2013 at 9:40 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
> ---
>  src/client/Client.cc |   55 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 6b08155..c913e35 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>
>    ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
>
> +  Mutex uninline_flock("Client::_write_uninline_data flock");
> +  Cond uninline_cond;
> +  bool uninline_done = false;
> +  int uninline_ret = 0;
> +  Context *onuninline = NULL;
> +
> +  if (in->inline_version < CEPH_INLINE_NONE) {
> +    if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
> +      onuninline = new C_SafeCond(&uninline_flock,
> +                                  &uninline_cond,
> +                                  &uninline_done,
> +                                  &uninline_ret);
> +      uninline_data(in, onuninline);

If the client does sequential 4k writes, the second write always triggers
the "uninline" procedure, which is suboptimal. It would be better to just
copy the inline data to the object cacher.

Besides, this feature should be disabled by default because it is not
compatible with old clients and it imposes overhead on the mds. We
need a config option or directory attribute to enable it.

Regards
Yan, Zheng
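
The sequential-write case can be checked against the condition in the quoted hunk with a small sketch. The value of CEPH_INLINE_SIZE is not shown in this thread, so kAssumedInlineSize = 4096 is purely an assumption for illustration, and write_triggers_uninline is a hypothetical helper.

```cpp
#include <cstdint>

// Assumed threshold for illustration only; the real CEPH_INLINE_SIZE
// is defined elsewhere in the series.
constexpr uint64_t kAssumedInlineSize = 4096;

// Mirrors the condition in the hunk:
//   endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)
bool write_triggers_uninline(bool is_inline, bool have_buffer_cap,
                             uint64_t offset, uint64_t size)
{
  if (!is_inline)
    return false;  // nothing left to migrate
  uint64_t endoff = offset + size;
  return endoff > kAssumedInlineSize || !have_buffer_cap;
}
```

Under that assumption, the first 4k write at offset 0 stays inline, while the second one at offset 4096 ends past the threshold and starts the migration.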


> +    } else {
> +      get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
> +
> +      uint32_t len = in->inline_data.length();
> +
> +      if (endoff < len)
> +        in->inline_data.copy(endoff, len - endoff, bl);
> +
> +      if (offset < len)
> +        in->inline_data.splice(offset, len - offset);
> +      else if (offset > len)
> +        in->inline_data.append_zero(offset - len);
> +
> +      in->inline_data.append(bl);
> +      in->inline_version++;
> +
> +      put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
> +
> +      goto success;
> +    }
> +  }
> +
>    if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
>      // do buffered write
>      if (!in->oset.dirty_or_tx)
> @@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>    }
>
>    // if we get here, write was successful, update client metadata
> -
> +success:
>    // time
>    lat = ceph_clock_now(cct);
>    lat -= start;
> @@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>    mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>
>  done:
> +
> +  if (onuninline) {
> +    client_lock.Unlock();
> +    uninline_flock.Lock();
> +    while (!uninline_done)
> +      uninline_cond.Wait(uninline_flock);
> +    uninline_flock.Unlock();
> +    client_lock.Lock();
> +
> +    if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
> +      in->inline_data.clear();
> +      in->inline_version = CEPH_INLINE_NONE;
> +      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
> +      check_caps(in, false);
> +    } else
> +      r = uninline_ret;
> +  }
> +
>    put_cap_ref(in, CEPH_CAP_FILE_WR);
>    return r;
>  }
> --
> 1.7.9.5
>


* Re: [PATCH 17/18] client: Write inline data path
  2013-11-28  3:02   ` Yan, Zheng
@ 2013-11-29 17:01     ` Matt W. Benjamin
  2013-12-02  8:20       ` Li Wang
  2013-12-02  8:03     ` Li Wang
  1 sibling, 1 reply; 23+ messages in thread
From: Matt W. Benjamin @ 2013-11-29 17:01 UTC (permalink / raw)
  To: Zheng Yan; +Cc: ceph-devel, Sage Weil, Yunchuan Wen, Li Wang

Hi,

I wondered about this.  Were you able to measure an effect?

Thanks,

Matt

----- "Zheng Yan" <ukernel@gmail.com> wrote:

> 
> Besides, this feature should be disabled by default because it is not
> compatible with old clients and it imposes overhead on the mds. We
> need a config option or directory attribute to enable it.
> 
> Regards
> Yan, Zheng
> 

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 


* Re: [PATCH 17/18] client: Write inline data path
  2013-11-28  3:02   ` Yan, Zheng
  2013-11-29 17:01     ` Matt W. Benjamin
@ 2013-12-02  8:03     ` Li Wang
  1 sibling, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-12-02  8:03 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, Sage Weil, Yunchuan Wen

Hi Zheng,
   Thanks for your comments.
   Regarding the configuration option, it was in our original plan, and
it will appear in the next version :)
   As for the write path, your comment reminds us of an optimization: if
the inline data length is zero, we won't bother to do the migration.
This covers the case where the application's write buffer is larger than
the inline threshold, so sequential writes will not incur a migration. It
also covers the case where a client performs some inline reads/writes,
truncates the file to zero, and then starts writing past the inline
threshold.

Cheers,
Li Wang
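
The proposed optimization can be sketched as a decision function. This is a guess at how the check might look, not code from v3 or v4: WriteAction and plan_write are hypothetical names, and the threshold is passed in as a parameter because its value is not shown in this thread.

```cpp
#include <cstdint>

enum class WriteAction {
  WriteInline,        // update the inline buffer in place
  MigrateThenWrite,   // uninline the data, then write to objects
  WriteObjectsOnly    // write to objects, no migration needed
};

// Hypothetical planner for a write that ends at endoff.
WriteAction plan_write(bool is_inline, uint64_t inline_len,
                       bool have_buffer_cap, uint64_t endoff,
                       uint64_t inline_threshold)
{
  if (!is_inline)
    return WriteAction::WriteObjectsOnly;
  if (endoff <= inline_threshold && have_buffer_cap)
    return WriteAction::WriteInline;
  // Proposed shortcut: with no inline bytes there is nothing to migrate.
  if (inline_len == 0)
    return WriteAction::WriteObjectsOnly;
  return WriteAction::MigrateThenWrite;
}
```

A first write larger than the threshold into an empty inline file therefore skips the migration, as does a write past the threshold after a truncate to zero.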

On 11/28/2013 11:02 AM, Yan, Zheng wrote:
> On Wed, Nov 27, 2013 at 9:40 PM, Li Wang <liwang@ubuntukylin.com> wrote:
>> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
>> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
>> ---
>>   src/client/Client.cc |   55 +++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/client/Client.cc b/src/client/Client.cc
>> index 6b08155..c913e35 100644
>> --- a/src/client/Client.cc
>> +++ b/src/client/Client.cc
>> @@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>>
>>     ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
>>
>> +  Mutex uninline_flock("Client::_write_uninline_data flock");
>> +  Cond uninline_cond;
>> +  bool uninline_done = false;
>> +  int uninline_ret = 0;
>> +  Context *onuninline = NULL;
>> +
>> +  if (in->inline_version < CEPH_INLINE_NONE) {
>> +    if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
>> +      onuninline = new C_SafeCond(&uninline_flock,
>> +                                  &uninline_cond,
>> +                                  &uninline_done,
>> +                                  &uninline_ret);
>> +      uninline_data(in, onuninline);
>
> If the client does sequential 4k writes, the second write always triggers
> the "uninline" procedure, which is suboptimal. It would be better to just
> copy the inline data to the object cacher.
>
> Besides, this feature should be disabled by default because it is not
> compatible with old clients and it imposes overhead on the mds. We
> need a config option or directory attribute to enable it.
>
> Regards
> Yan, Zheng
>
>
>> +    } else {
>> +      get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
>> +
>> +      uint32_t len = in->inline_data.length();
>> +
>> +      if (endoff < len)
>> +        in->inline_data.copy(endoff, len - endoff, bl);
>> +
>> +      if (offset < len)
>> +        in->inline_data.splice(offset, len - offset);
>> +      else if (offset > len)
>> +        in->inline_data.append_zero(offset - len);
>> +
>> +      in->inline_data.append(bl);
>> +      in->inline_version++;
>> +
>> +      put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
>> +
>> +      goto success;
>> +    }
>> +  }
>> +
>>     if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
>>       // do buffered write
>>       if (!in->oset.dirty_or_tx)
>> @@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>>     }
>>
>>     // if we get here, write was successful, update client metadata
>> -
>> +success:
>>     // time
>>     lat = ceph_clock_now(cct);
>>     lat -= start;
>> @@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>>     mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>>
>>   done:
>> +
>> +  if (onuninline) {
>> +    client_lock.Unlock();
>> +    uninline_flock.Lock();
>> +    while (!uninline_done)
>> +      uninline_cond.Wait(uninline_flock);
>> +    uninline_flock.Unlock();
>> +    client_lock.Lock();
>> +
>> +    if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
>> +      in->inline_data.clear();
>> +      in->inline_version = CEPH_INLINE_NONE;
>> +      mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>> +      check_caps(in, false);
>> +    } else
>> +      r = uninline_ret;
>> +  }
>> +
>>     put_cap_ref(in, CEPH_CAP_FILE_WR);
>>     return r;
>>   }
>> --
>> 1.7.9.5
>>
>


* Re: [PATCH 17/18] client: Write inline data path
  2013-11-29 17:01     ` Matt W. Benjamin
@ 2013-12-02  8:20       ` Li Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-12-02  8:20 UTC (permalink / raw)
  To: Matt W. Benjamin; +Cc: Zheng Yan, ceph-devel, Sage Weil, Yunchuan Wen

Hi Matt,
   This feature is intended for workloads that store massive numbers of
tiny files, and there are some further designs to reduce the overhead
when the file is not tiny:
(1) The migration is done asynchronously with the subsequent read/write
(v3);
(2) Migration is skipped when the inline data length is zero, which
almost eliminates the migration overhead in some situations (v4);
(3) It could be implicitly turned off at mount time by the client (v4);
(4) It could be turned off globally by configuring the mds (v4).

v4 is coming soon.

Cheers,
Li Wang

On 11/30/2013 01:01 AM, Matt W. Benjamin wrote:
> Hi,
>
> I wondered about this.  Were you able to measure an effect?
>
> Thanks,
>
> Matt
>
> ----- "Zheng Yan" <ukernel@gmail.com> wrote:
>
>>
>> Besides, this feature should be disabled by default because it is not
>> compatible with old clients and it imposes overhead on the mds. We
>> need a config option or directory attribute to enable it.
>>
>> Regards
>> Yan, Zheng
>>
>


Thread overview: 23+ messages
2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
2013-11-27 13:40 ` [PATCH 01/18] ceph: Add inline data feature Li Wang
2013-11-27 13:40 ` [PATCH 02/18] ceph: Add inline state definition Li Wang
2013-11-27 13:40 ` [PATCH 03/18] mds: Add inline fields to inode_t Li Wang
2013-11-27 13:40 ` [PATCH 04/18] mds: Add inline encode/decode " Li Wang
2013-11-27 13:40 ` [PATCH 05/18] ceph: Add inline fields to MClientCaps Li Wang
2013-11-27 13:40 ` [PATCH 06/18] osdc: Add write method with truncate parameters Li Wang
2013-11-27 13:40 ` [PATCH 07/18] mds: Add inline fields to Capability Li Wang
2013-11-27 13:40 ` [PATCH 08/18] mds: Push inline data to client in cap message Li Wang
2013-11-27 13:40 ` [PATCH 09/18] ceph: Add inline fields to InodeStat Li Wang
2013-11-27 13:40 ` [PATCH 10/18] mds: Push inline data to client in inodestat Li Wang
2013-11-27 13:40 ` [PATCH 11/18] mds: Receive updated inline data from client Li Wang
2013-11-27 13:40 ` [PATCH 12/18] client: Add inline fields to Inode Li Wang
2013-11-27 13:40 ` [PATCH 13/18] client: Receive inline data pushed from mds Li Wang
2013-11-27 13:40 ` [PATCH 14/18] client: Push inline data to mds by send cap Li Wang
2013-11-27 13:40 ` [PATCH 15/18] client: Add inline data migration helper Li Wang
2013-11-27 13:40 ` [PATCH 16/18] client: Read inline data path Li Wang
2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
2013-11-28  3:02   ` Yan, Zheng
2013-11-29 17:01     ` Matt W. Benjamin
2013-12-02  8:20       ` Li Wang
2013-12-02  8:03     ` Li Wang
2013-11-27 13:40 ` [PATCH 18/18] client: Fallocate " Li Wang
