* [PATCH v3 00/18] ceph: Inline data support
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
This patch series implements inline data support for Ceph.
It can also be pulled from:
https://github.com/kylinstorage/ceph.git inline
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
Changes since v2:
Streamline inline data migration with the subsequent read/write.
Changes since v1:
Simplify the multiple-writer case, following the discussion at
http://pad.ceph.com/p/mds-inline-data and
http://www.spinics.net/lists/ceph-devel/msg16018.html
Li Wang (18):
ceph: Add inline data feature
ceph: Add inline state definition
mds: Add inline fields to inode_t
mds: Add inline encode/decode to inode_t
ceph: Add inline fields to MClientCaps
osdc: Add write method with truncate parameters
mds: Add inline fields to Capability
mds: Push inline data to client in cap message
ceph: Add inline fields to InodeStat
mds: Push inline data to client in inodestat
mds: Receive updated inline data from client
client: Add inline fields to Inode
client: Receive inline data pushed from mds
client: Push inline data to mds by send cap
client: Add inline data migration helper
client: Read inline data path
client: Write inline data path
client: Fallocate inline data path
src/ceph_mds.cc | 1 +
src/client/Client.cc | 277 +++++++++++++++++++++++++++++++++++++++----
src/client/Client.h | 4 +
src/client/Inode.h | 5 +
src/include/ceph_features.h | 2 +
src/include/ceph_fs.h | 3 +
src/mds/CInode.cc | 22 ++++
src/mds/Capability.h | 2 +
src/mds/Locker.cc | 7 ++
src/mds/mdstypes.cc | 12 +-
src/mds/mdstypes.h | 3 +
src/messages/MClientCaps.h | 19 ++-
src/messages/MClientReply.h | 9 ++
src/osdc/Objecter.h | 10 +-
14 files changed, 346 insertions(+), 30 deletions(-)
--
1.7.9.5
* [PATCH 01/18] ceph: Add inline data feature
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/ceph_mds.cc | 1 +
src/include/ceph_features.h | 2 ++
2 files changed, 3 insertions(+)
diff --git a/src/ceph_mds.cc b/src/ceph_mds.cc
index 88b807b..dac676f 100644
--- a/src/ceph_mds.cc
+++ b/src/ceph_mds.cc
@@ -243,6 +243,7 @@ int main(int argc, const char **argv)
CEPH_FEATURE_UID |
CEPH_FEATURE_NOSRCADDR |
CEPH_FEATURE_DIRLAYOUTHASH |
+ CEPH_FEATURE_MDS_INLINE_DATA |
CEPH_FEATURE_PGID64 |
CEPH_FEATURE_MSG_AUTH;
uint64_t required =
diff --git a/src/include/ceph_features.h b/src/include/ceph_features.h
index c0f01cc..70ee921 100644
--- a/src/include/ceph_features.h
+++ b/src/include/ceph_features.h
@@ -40,6 +40,7 @@
#define CEPH_FEATURE_MON_SCRUB (1ULL<<33)
#define CEPH_FEATURE_OSD_PACKED_RECOVERY (1ULL<<34)
#define CEPH_FEATURE_OSD_CACHEPOOL (1ULL<<35)
+#define CEPH_FEATURE_MDS_INLINE_DATA (1ULL<<36)
/*
* The introduction of CEPH_FEATURE_OSD_SNAPMAPPER caused the feature
@@ -103,6 +104,7 @@ static inline unsigned long long ceph_sanitize_features(unsigned long long f) {
CEPH_FEATURE_MON_SCRUB | \
CEPH_FEATURE_OSD_PACKED_RECOVERY | \
CEPH_FEATURE_OSD_CACHEPOOL | \
+ CEPH_FEATURE_MDS_INLINE_DATA | \
0ULL)
#define CEPH_FEATURES_SUPPORTED_DEFAULT CEPH_FEATURES_ALL
--
1.7.9.5
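Editorial sketch for readers following the series: a feature bit such as the one added above is usually tested by masking the feature set a peer advertises. The block below is a standalone illustration, not Ceph code. The bit positions are copied from the patch, while the constant names and the `has_feature` helper are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions copied from the patch; the names are local stand-ins.
constexpr uint64_t FEATURE_OSD_CACHEPOOL   = 1ULL << 35;
constexpr uint64_t FEATURE_MDS_INLINE_DATA = 1ULL << 36;

// Hypothetical helper: true if every bit in `wanted` is set in `peer_features`.
inline bool has_feature(uint64_t peer_features, uint64_t wanted) {
  return (peer_features & wanted) == wanted;
}
```

A sender would call `has_feature(peer, FEATURE_MDS_INLINE_DATA)` before appending the inline fields to a message, which is the same pattern the later patches use.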
* [PATCH 02/18] ceph: Add inline state definition
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/include/ceph_fs.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/include/ceph_fs.h b/src/include/ceph_fs.h
index 47ec1f1..07a78b8 100644
--- a/src/include/ceph_fs.h
+++ b/src/include/ceph_fs.h
@@ -526,6 +526,9 @@ struct ceph_filelock {
int ceph_flags_to_mode(int flags);
+/* inline data state */
+#define CEPH_INLINE_NONE ((__u64)-1)
+#define CEPH_INLINE_SIZE (1 << 12)
/* capability bits */
#define CEPH_CAP_PIN 1 /* no specific capabilities beyond the pin */
--
1.7.9.5
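Editorial sketch: the two definitions above act as a sentinel and a size limit. Any real version number compares below the all-ones sentinel, and the write path later in the series falls back to object storage once a write would end past the limit. The block below restates that under local names; `is_inline` and `fits_inline` are hypothetical helpers, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);  // mirrors CEPH_INLINE_NONE
constexpr uint64_t INLINE_SIZE = 1u << 12;                   // mirrors CEPH_INLINE_SIZE (4096 bytes)

// Hypothetical predicate: the inode still carries inline data.
inline bool is_inline(uint64_t inline_version) {
  return inline_version < INLINE_NONE;
}

// Hypothetical predicate: a write ending at `endoff` still fits inline.
inline bool fits_inline(uint64_t endoff) {
  return endoff <= INLINE_SIZE;
}
```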
* [PATCH 03/18] mds: Add inline fields to inode_t
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/mdstypes.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mds/mdstypes.h b/src/mds/mdstypes.h
index bd53c85..aacc41c 100644
--- a/src/mds/mdstypes.h
+++ b/src/mds/mdstypes.h
@@ -336,6 +336,8 @@ struct inode_t {
utime_t mtime; // file data modify time.
utime_t atime; // file data access time.
uint32_t time_warp_seq; // count of (potential) mtime/atime timewarps (i.e., utimes())
+ bufferlist inline_data;
+ uint64_t inline_version;
map<client_t,client_writeable_range_t> client_ranges; // client(s) can write to these ranges
@@ -358,6 +360,7 @@ struct inode_t {
truncate_seq(0), truncate_size(0), truncate_from(0),
truncate_pending(0),
time_warp_seq(0),
+ inline_version(1),
version(0), file_data_version(0), xattr_version(0), backtrace_version(0) {
clear_layout();
memset(&dir_layout, 0, sizeof(dir_layout));
--
1.7.9.5
* [PATCH 04/18] mds: Add inline encode/decode to inode_t
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/mdstypes.cc | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/mds/mdstypes.cc b/src/mds/mdstypes.cc
index df6cd8e..01a04e8 100644
--- a/src/mds/mdstypes.cc
+++ b/src/mds/mdstypes.cc
@@ -204,7 +204,7 @@ ostream& operator<<(ostream& out, const client_writeable_range_t& r)
*/
void inode_t::encode(bufferlist &bl) const
{
- ENCODE_START(8, 6, bl);
+ ENCODE_START(9, 9, bl);
::encode(ino, bl);
::encode(rdev, bl);
@@ -239,13 +239,15 @@ void inode_t::encode(bufferlist &bl) const
::encode(backtrace_version, bl);
::encode(old_pools, bl);
::encode(max_size_ever, bl);
+ ::encode(inline_version, bl);
+ ::encode(inline_data, bl);
ENCODE_FINISH(bl);
}
void inode_t::decode(bufferlist::iterator &p)
{
- DECODE_START_LEGACY_COMPAT_LEN(7, 6, 6, p);
+ DECODE_START_LEGACY_COMPAT_LEN(9, 6, 6, p);
::decode(ino, p);
::decode(rdev, p);
@@ -299,6 +301,12 @@ void inode_t::decode(bufferlist::iterator &p)
backtrace_version = 0; // note inode which has no backtrace
if (struct_v >= 8)
::decode(max_size_ever, p);
+ if (struct_v >= 9) {
+ ::decode(inline_version, p);
+ ::decode(inline_data, p);
+ } else {
+ inline_version = CEPH_INLINE_NONE;
+ }
DECODE_FINISH(p);
}
--
1.7.9.5
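Editorial sketch: the decode hunk above reads the new fields only when the encoded struct version is at least 9 and otherwise marks the inode as having no inline data. The block below shows that pattern on a toy byte format. Everything in it (`ToyInode`, `Buf`, `put_u64`, `get_u64`, the wire layout) is hypothetical and much simpler than Ceph's real encoding macros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);
using Buf = std::vector<uint8_t>;

struct ToyInode {
  uint64_t size = 0;
  uint64_t inline_version = 1;
  std::string inline_data;
};

// Little-endian helpers for the toy format.
void put_u64(Buf& b, uint64_t v) {
  for (int i = 0; i < 8; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
uint64_t get_u64(const Buf& b, std::size_t& pos) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v |= static_cast<uint64_t>(b.at(pos++)) << (8 * i);
  return v;
}

// Toy layout: [struct_v][size]; from struct_v 9 on, also
// [inline_version][data length][data bytes].
Buf encode(const ToyInode& in, uint8_t struct_v) {
  Buf b;
  b.push_back(struct_v);
  put_u64(b, in.size);
  if (struct_v >= 9) {
    put_u64(b, in.inline_version);
    put_u64(b, in.inline_data.size());
    b.insert(b.end(), in.inline_data.begin(), in.inline_data.end());
  }
  return b;
}

ToyInode decode(const Buf& b) {
  ToyInode out;
  std::size_t pos = 0;
  const uint8_t struct_v = b.at(pos++);
  out.size = get_u64(b, pos);
  if (struct_v >= 9) {
    out.inline_version = get_u64(b, pos);
    const uint64_t len = get_u64(b, pos);
    if (len > b.size() - pos) throw std::out_of_range("truncated inline data");
    const auto first = b.begin() + static_cast<std::ptrdiff_t>(pos);
    out.inline_data.assign(first, first + static_cast<std::ptrdiff_t>(len));
  } else {
    // Older encodings carry no inline fields: treat the inode as not inline.
    out.inline_version = INLINE_NONE;
    out.inline_data.clear();
  }
  return out;
}
```

Decoding a version-8 buffer yields `INLINE_NONE`, so an upgraded reader never mistakes a pre-existing file for one with an empty inline payload.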
* [PATCH 05/18] ceph: Add inline fields to MClientCaps
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/messages/MClientCaps.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/src/messages/MClientCaps.h b/src/messages/MClientCaps.h
index 117f241..a506c53 100644
--- a/src/messages/MClientCaps.h
+++ b/src/messages/MClientCaps.h
@@ -21,7 +21,7 @@
class MClientCaps : public Message {
- static const int HEAD_VERSION = 2; // added flock metadata
+ static const int HEAD_VERSION = 3; // added flock metadata, inline data
static const int COMPAT_VERSION = 1;
public:
@@ -29,6 +29,8 @@ class MClientCaps : public Message {
bufferlist snapbl;
bufferlist xattrbl;
bufferlist flockbl;
+ uint64_t inline_version;
+ bufferlist inline_data;
int get_caps() { return head.caps; }
int get_wanted() { return head.wanted; }
@@ -151,6 +153,13 @@ public:
// conditionally decode flock metadata
if (header.version >= 2)
::decode(flockbl, p);
+
+ if (header.version >= 3) {
+ ::decode(inline_version, p);
+ ::decode(inline_data, p);
+ } else {
+ inline_version = CEPH_INLINE_NONE;
+ }
}
void encode_payload(uint64_t features) {
head.snap_trace_len = snapbl.length();
@@ -165,6 +174,14 @@ public:
::encode(flockbl, payload);
} else {
header.version = 1; // old
+ return;
+ }
+
+ if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
+ ::encode(inline_version, payload);
+ ::encode(inline_data, payload);
+ } else {
+ header.version = 2;
}
}
};
--
1.7.9.5
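Editorial sketch: the encode hunk above picks the message header version from the peer's feature bits, returning early for peers without flock support and falling back to version 2 for peers without inline data support. The block below condenses that decision into one function. The flock bit position is a placeholder assumed for this sketch, and `caps_header_version` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t FEATURE_FLOCK           = 1ULL << 5;   // placeholder position, assumed for the sketch
constexpr uint64_t FEATURE_MDS_INLINE_DATA = 1ULL << 36;  // position from patch 01

// Hypothetical helper: which header version the encoder would stamp
// for a peer advertising `peer_features`.
int caps_header_version(uint64_t peer_features) {
  if (!(peer_features & FEATURE_FLOCK))
    return 1;  // old peer: no flock metadata, no inline fields
  if (!(peer_features & FEATURE_MDS_INLINE_DATA))
    return 2;  // flock metadata only
  return 3;    // flock metadata plus inline_version and inline_data
}
```

The decoder mirrors this: it reads the inline fields only when the header version is at least 3 and otherwise sets `inline_version` to `CEPH_INLINE_NONE`.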
* [PATCH 06/18] osdc: Add write method with truncate parameters
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/osdc/Objecter.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/osdc/Objecter.h b/src/osdc/Objecter.h
index 41973dd..40f03de 100644
--- a/src/osdc/Objecter.h
+++ b/src/osdc/Objecter.h
@@ -279,8 +279,16 @@ struct ObjectOperation {
out_handler[p] = h;
out_rval[p] = prval;
}
- void write(uint64_t off, bufferlist& bl) {
+ void write(uint64_t off, bufferlist& bl,
+ uint64_t truncate_size,
+ uint32_t truncate_seq) {
add_data(CEPH_OSD_OP_WRITE, off, bl.length(), bl);
+ OSDOp& o = *ops.rbegin();
+ o.op.extent.truncate_size = truncate_size;
+ o.op.extent.truncate_seq = truncate_seq;
+ }
+ void write(uint64_t off, bufferlist& bl) {
+ write(off, bl, 0, 0);
}
void write_full(bufferlist& bl) {
add_data(CEPH_OSD_OP_WRITEFULL, 0, bl.length(), bl);
--
1.7.9.5
* [PATCH 07/18] mds: Add inline fields to Capability
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/Capability.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/mds/Capability.h b/src/mds/Capability.h
index fb6b3dc..995ea3a 100644
--- a/src/mds/Capability.h
+++ b/src/mds/Capability.h
@@ -209,6 +209,7 @@ private:
public:
snapid_t client_follows;
version_t client_xattr_version;
+ uint64_t client_inline_version;
xlist<Capability*>::item item_session_caps;
xlist<Capability*>::item item_snaprealm_caps;
@@ -223,6 +224,7 @@ public:
mseq(0),
suppress(0), stale(false),
client_follows(0), client_xattr_version(0),
+ client_inline_version(0),
item_session_caps(this), item_snaprealm_caps(this) {
g_num_cap++;
g_num_capa++;
--
1.7.9.5
* [PATCH 08/18] mds: Push inline data to client in cap message
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/CInode.cc | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index c8b00ef..4756865 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2989,6 +2989,13 @@ void CInode::encode_cap_message(MClientCaps *m, Capability *cap)
i->atime.encode_timeval(&m->head.atime);
m->head.time_warp_seq = i->time_warp_seq;
+ if (cap->client_inline_version < i->inline_version) {
+ m->inline_version = cap->client_inline_version = i->inline_version;
+ m->inline_data = i->inline_data;
+ } else {
+ m->inline_version = 0;
+ }
+
// max_size is min of projected, actual.
uint64_t oldms = oi->client_ranges.count(client) ? oi->client_ranges[client].range.last : 0;
uint64_t newms = pi->client_ranges.count(client) ? pi->client_ranges[client].range.last : 0;
--
1.7.9.5
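Editorial sketch: the hunk above sends the inline payload only when the version recorded for the client's capability is older than the inode's, and otherwise sends version 0 so the client keeps what it has. The block below models that exchange with toy types. All type and function names are hypothetical, and the client-side check is modelled on the comparison that patch 13 adds to `update_inode_file_bits`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct ToyCap         { uint64_t client_inline_version = 0; };
struct ToyMdsInode    { uint64_t inline_version = 1; std::string inline_data; };
struct ToyCapMsg      { uint64_t inline_version = 0; std::string inline_data; };
struct ToyClientInode { uint64_t inline_version = 0; std::string inline_data; };

// MDS side: attach the payload only if the client's copy is stale.
ToyCapMsg encode_cap_message(const ToyMdsInode& i, ToyCap& cap) {
  ToyCapMsg m;
  if (cap.client_inline_version < i.inline_version) {
    m.inline_version = cap.client_inline_version = i.inline_version;
    m.inline_data = i.inline_data;
  } else {
    m.inline_version = 0;  // nothing newer to send
  }
  return m;
}

// Client side: accept the payload only if it is newer than the local copy.
void apply_cap_message(ToyClientInode& in, const ToyCapMsg& m) {
  if (m.inline_version > in.inline_version) {
    in.inline_data = m.inline_data;
    in.inline_version = m.inline_version;
  }
}
```

Because the second message carries version 0, the client's comparison fails and its cached data is left untouched.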
* [PATCH 09/18] ceph: Add inline fields to InodeStat
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/messages/MClientReply.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/messages/MClientReply.h b/src/messages/MClientReply.h
index 896245f..47908e9 100644
--- a/src/messages/MClientReply.h
+++ b/src/messages/MClientReply.h
@@ -108,6 +108,8 @@ struct InodeStat {
uint64_t truncate_size;
utime_t ctime, mtime, atime;
version_t time_warp_seq;
+ bufferlist inline_data;
+ uint64_t inline_version;
frag_info_t dirstat;
nest_info_t rstat;
--
1.7.9.5
* [PATCH 10/18] mds: Push inline data to client in inodestat
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/CInode.cc | 15 +++++++++++++++
src/messages/MClientReply.h | 7 +++++++
2 files changed, 22 insertions(+)
diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index 4756865..ed51e1d 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2824,6 +2824,16 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
e.files = i->dirstat.nfiles;
e.subdirs = i->dirstat.nsubdirs;
+ // inline data
+ uint64_t inline_version = 0;
+ bufferlist inline_data;
+ if (!cap || (cap->client_inline_version < i->inline_version)) {
+ inline_version = i->inline_version;
+ inline_data = i->inline_data;
+ if (cap)
+ cap->client_inline_version = i->inline_version;
+ }
+
// nest (do same as file... :/)
i->rstat.rctime.encode_timeval(&e.rctime);
e.rbytes = i->rstat.rbytes;
@@ -2862,6 +2872,7 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
bytes += (sizeof(__u32) + sizeof(__u32)) * dirfragtree._splits.size();
bytes += sizeof(__u32) + symlink.length();
bytes += sizeof(__u32) + xbl.length();
+ bytes += sizeof(__u64) + sizeof(__u32) + inline_data.length();
if (bytes > max_bytes)
return -ENOSPC;
}
@@ -2957,6 +2968,10 @@ int CInode::encode_inodestat(bufferlist& bl, Session *session,
::encode(i->dir_layout, bl);
}
::encode(xbl, bl);
+ if (session->connection->has_feature(CEPH_FEATURE_MDS_INLINE_DATA)) {
+ ::encode(inline_version, bl);
+ ::encode(inline_data, bl);
+ }
return valid;
}
diff --git a/src/messages/MClientReply.h b/src/messages/MClientReply.h
index 47908e9..ebb3b9b 100644
--- a/src/messages/MClientReply.h
+++ b/src/messages/MClientReply.h
@@ -176,6 +176,13 @@ struct InodeStat {
xattr_version = e.xattr_version;
::decode(xattrbl, p);
+
+ if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
+ ::decode(inline_version, p);
+ ::decode(inline_data, p);
+ } else {
+ inline_version = CEPH_INLINE_NONE;
+ }
}
// see CInode::encode_inodestat for encoder.
--
1.7.9.5
* [PATCH 11/18] mds: Receive updated inline data from client
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/mds/Locker.cc | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 63e0e08..4b02a56 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -2691,6 +2691,7 @@ void Locker::_update_cap_fields(CInode *in, int dirty, MClientCaps *m, inode_t *
utime_t mtime = m->get_mtime();
utime_t ctime = m->get_ctime();
uint64_t size = m->get_size();
+ uint64_t inline_version = m->inline_version;
if (((dirty & CEPH_CAP_FILE_WR) && mtime > pi->mtime) ||
((dirty & CEPH_CAP_FILE_EXCL) && mtime != pi->mtime)) {
@@ -2710,6 +2711,12 @@ void Locker::_update_cap_fields(CInode *in, int dirty, MClientCaps *m, inode_t *
pi->size = size;
pi->rstat.rbytes = size;
}
+ if (in->inode.is_file() &&
+ (dirty & CEPH_CAP_FILE_WR) &&
+ inline_version > pi->inline_version) {
+ pi->inline_version = inline_version;
+ pi->inline_data = m->inline_data;
+ }
if ((dirty & CEPH_CAP_FILE_EXCL) && atime != pi->atime) {
dout(7) << " atime " << pi->atime << " -> " << atime
<< " for " << *in << dendl;
--
1.7.9.5
* [PATCH 12/18] client: Add inline fields to Inode
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Inode.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/client/Inode.h b/src/client/Inode.h
index cc054a6..bb17706 100644
--- a/src/client/Inode.h
+++ b/src/client/Inode.h
@@ -111,6 +111,10 @@ class Inode {
version_t version; // auth only
version_t xattr_version;
+ // inline data
+ uint64_t inline_version;
+ bufferlist inline_data;
+
bool is_symlink() const { return (mode & S_IFMT) == S_IFLNK; }
bool is_dir() const { return (mode & S_IFMT) == S_IFDIR; }
bool is_file() const { return (mode & S_IFMT) == S_IFREG; }
@@ -207,6 +211,7 @@ class Inode {
rdev(0), mode(0), uid(0), gid(0), nlink(0),
size(0), truncate_seq(1), truncate_size(-1),
time_warp_seq(0), max_size(0), version(0), xattr_version(0),
+ inline_version(0),
flags(0),
dir_hashed(false), dir_replicated(false), auth_cap(NULL),
dirty_caps(0), flushing_caps(0), flushing_cap_seq(0), shared_gen(0), cache_gen(0),
--
1.7.9.5
* [PATCH 13/18] client: Receive inline data pushed from mds
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 22 ++++++++++++++++++++--
src/client/Client.h | 1 +
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index a4d5550..19d31e0 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -494,6 +494,8 @@ void Client::update_inode_file_bits(Inode *in,
uint64_t time_warp_seq, utime_t ctime,
utime_t mtime,
utime_t atime,
+ uint64_t inline_version,
+ bufferlist& inline_data,
int issued)
{
bool warn = false;
@@ -504,6 +506,11 @@ void Client::update_inode_file_bits(Inode *in,
<< " local " << in->time_warp_seq << dendl;
uint64_t prior_size = in->size;
+ if (inline_version > in->inline_version) {
+ in->inline_data = inline_data;
+ in->inline_version = inline_version;
+ }
+
if (truncate_seq > in->truncate_seq ||
(truncate_seq == in->truncate_seq && size > in->size)) {
ldout(cct, 10) << "size " << in->size << " -> " << size << dendl;
@@ -520,6 +527,13 @@ void Client::update_inode_file_bits(Inode *in,
_invalidate_inode_cache(in, truncate_size, prior_size - truncate_size, true);
}
}
+
+ // truncate inline data
+ if (in->inline_version < CEPH_INLINE_NONE) {
+ uint32_t len = in->inline_data.length();
+ if (size < len)
+ in->inline_data.splice(size, len - size);
+ }
}
if (truncate_seq >= in->truncate_seq &&
in->truncate_size != truncate_size) {
@@ -654,6 +668,7 @@ Inode * Client::add_update_inode(InodeStat *st, utime_t from, MetaSession *sessi
update_inode_file_bits(in, st->truncate_seq, st->truncate_size, st->size,
st->time_warp_seq, st->ctime, st->mtime, st->atime,
+ st->inline_version, st->inline_data,
issued);
}
@@ -3524,7 +3539,9 @@ void Client::handle_cap_trunc(MetaSession *session, Inode *in, MClientCaps *m)
issued |= implemented;
update_inode_file_bits(in, m->get_truncate_seq(), m->get_truncate_size(),
m->get_size(), m->get_time_warp_seq(), m->get_ctime(),
- m->get_mtime(), m->get_atime(), issued);
+ m->get_mtime(), m->get_atime(),
+ m->inline_version, m->inline_data,
+ issued);
m->put();
}
@@ -3674,7 +3691,8 @@ void Client::handle_cap_grant(MetaSession *session, Inode *in, Cap *cap, MClient
in->xattr_version = m->head.xattr_version;
}
update_inode_file_bits(in, m->get_truncate_seq(), m->get_truncate_size(), m->get_size(),
- m->get_time_warp_seq(), m->get_ctime(), m->get_mtime(), m->get_atime(), issued);
+ m->get_time_warp_seq(), m->get_ctime(), m->get_mtime(), m->get_atime(),
+ m->inline_version, m->inline_data, issued);
// max_size
if (cap == in->auth_cap &&
diff --git a/src/client/Client.h b/src/client/Client.h
index 649bacc..48f1fea 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -508,6 +508,7 @@ protected:
void update_inode_file_bits(Inode *in,
uint64_t truncate_seq, uint64_t truncate_size, uint64_t size,
uint64_t time_warp_seq, utime_t ctime, utime_t mtime, utime_t atime,
+ uint64_t inline_version, bufferlist& inline_data,
int issued);
Inode *add_update_inode(InodeStat *st, utime_t ttl, MetaSession *session);
Dentry *insert_dentry_inode(Dir *dir, const string& dname, LeaseStat *dlease,
--
1.7.9.5
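Editorial sketch: besides accepting newer payloads, the patch above trims the cached inline buffer when the file size shrinks below the buffer length. The block below shows that step with `std::string` standing in for `bufferlist`; `truncate_inline` is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t INLINE_NONE = static_cast<uint64_t>(-1);

// Drop bytes past `new_size` from the cached inline buffer.
// Does nothing if the inode is not inline or the buffer is already short enough.
void truncate_inline(uint64_t inline_version, std::string& inline_data, uint64_t new_size) {
  if (inline_version >= INLINE_NONE)
    return;
  if (new_size < inline_data.size())
    inline_data.erase(static_cast<std::string::size_type>(new_size));
}
```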
* [PATCH 14/18] client: Push inline data to mds by send cap
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 19d31e0..3beab8f 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2399,6 +2399,11 @@ void Client::send_cap(Inode *in, MetaSession *session, Cap *cap,
in->ctime.encode_timeval(&m->head.ctime);
m->head.time_warp_seq = in->time_warp_seq;
+ if (flush & CEPH_CAP_FILE_WR) {
+ m->inline_version = in->inline_version;
+ m->inline_data = in->inline_data;
+ }
+
in->reported_size = in->size;
m->set_snap_follows(follows);
cap->wanted = want;
--
1.7.9.5
* [PATCH 15/18] client: Add inline data migration helper
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 41 +++++++++++++++++++++++++++++++++++++++++
src/client/Client.h | 3 +++
2 files changed, 44 insertions(+)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 3beab8f..bbd56f2 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5767,6 +5767,47 @@ void Client::unlock_fh_pos(Fh *f)
f->pos_locked = false;
}
+int Client::uninline_data(Inode *in, Context *onfinish)
+{
+ char oid_buf[32];
+ snprintf(oid_buf, sizeof(oid_buf), "%llx.00000000", (long long unsigned)in->ino);
+ object_t oid = oid_buf;
+
+ ObjectOperation create_ops;
+ create_ops.create(false);
+
+ objecter->mutate(oid,
+ OSDMap::file_to_object_locator(in->layout),
+ create_ops,
+ in->snaprealm->get_snap_context(),
+ ceph_clock_now(cct),
+ 0,
+ NULL,
+ NULL);
+
+ bufferlist inline_version_bl;
+ ::encode(in->inline_version, inline_version_bl);
+
+ ObjectOperation uninline_ops;
+ uninline_ops.cmpxattr("inline_version",
+ CEPH_OSD_CMPXATTR_OP_GT,
+ CEPH_OSD_CMPXATTR_MODE_U64,
+ inline_version_bl);
+ bufferlist inline_data = in->inline_data;
+ uninline_ops.write(0, inline_data, in->truncate_size, in->truncate_seq);
+ uninline_ops.setxattr("inline_version", inline_version_bl);
+
+ objecter->mutate(oid,
+ OSDMap::file_to_object_locator(in->layout),
+ uninline_ops,
+ in->snaprealm->get_snap_context(),
+ ceph_clock_now(cct),
+ 0,
+ NULL,
+ onfinish);
+
+ return 0;
+}
//
diff --git a/src/client/Client.h b/src/client/Client.h
index 48f1fea..63f0c41 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -429,6 +429,9 @@ protected:
void handle_lease(MClientLease *m);
+ // inline data
+ int uninline_data(Inode *in, Context *onfinish);
+
// file caps
void check_cap_issue(Inode *in, Cap *cap, unsigned issued);
void add_update_cap(Inode *in, MetaSession *session, uint64_t cap_id,
--
1.7.9.5
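Editorial sketch: the helper above guards the migration write with a version comparison on the object's "inline_version" xattr, and the read path in the next patch treats -ECANCELED as an acceptable outcome. The block below models one reading of that intent: the write lands only when the caller's version is greater than the stored one, and a failed guard means another client already migrated the data. This is an assumption about the intended semantics, not a statement of how the OSD evaluates the comparison, and all names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Toy stand-in for the first object of a file.
struct ToyObject {
  uint64_t inline_version_xattr = 0;  // a missing xattr is modelled as 0 (assumption)
  std::string data;
};

// Guarded migration: write the inline bytes at offset 0 and record the version,
// but only if the caller's version is newer than what the object already holds.
// Returns 0 on success and -ECANCELED when the guard fails.
int guarded_uninline(ToyObject& obj, uint64_t my_version, const std::string& inline_data) {
  if (!(my_version > obj.inline_version_xattr))
    return -ECANCELED;
  obj.data.replace(0, inline_data.size(), inline_data);
  obj.inline_version_xattr = my_version;
  return 0;
}

// Mirrors the caller's check: a cancelled write still counts as "migrated".
bool uninline_succeeded(int ret) {
  return ret >= 0 || ret == -ECANCELED;
}
```

With two clients racing, the one holding the older version gets -ECANCELED and leaves the object untouched, which is why the caller can still clear its local inline buffer afterwards.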
* [PATCH 16/18] client: Read inline data path
From: Li Wang @ 2013-11-27 13:40 UTC
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index bbd56f2..6b08155 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5853,6 +5853,41 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
movepos = true;
}
+ Mutex uninline_flock("Client::_read_uninline_data flock");
+ Cond uninline_cond;
+ bool uninline_done = false;
+ int uninline_ret = 0;
+ Context *onuninline = NULL;
+
+ if (in->inline_version < CEPH_INLINE_NONE) {
+ if (!(have & CEPH_CAP_FILE_CACHE)) {
+ onuninline = new C_SafeCond(&uninline_flock,
+ &uninline_cond,
+ &uninline_done,
+ &uninline_ret);
+ uninline_data(in, onuninline);
+ } else {
+ uint32_t len = in->inline_data.length();
+
+ uint64_t endoff = offset + size;
+ if (endoff > in->size)
+ endoff = in->size;
+
+ if (offset < len) {
+ if (endoff <= len) {
+ bl->substr_of(in->inline_data, offset, endoff - offset);
+ } else {
+ bl->substr_of(in->inline_data, offset, len - offset);
+ bl->append_zero(endoff - len);
+ }
+ } else if (offset < endoff) {
+ bl->append_zero(endoff - offset);
+ }
+
+ goto success;
+ }
+ }
+
if (!conf->client_debug_force_sync_read &&
(cct->_conf->client_oc && (have & CEPH_CAP_FILE_CACHE))) {
@@ -5869,6 +5904,8 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
goto done;
}
+success:
+
if (movepos) {
// adjust fd pos
f->pos = offset+bl->length();
@@ -5890,6 +5927,24 @@ int Client::_read(Fh *f, int64_t offset, uint64_t size, bufferlist *bl)
done:
// done!
+
+ if (onuninline) {
+ client_lock.Unlock();
+ uninline_flock.Lock();
+ while (!uninline_done)
+ uninline_cond.Wait(uninline_flock);
+ uninline_flock.Unlock();
+ client_lock.Lock();
+
+ if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+ in->inline_data.clear();
+ in->inline_version = CEPH_INLINE_NONE;
+ mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+ check_caps(in, false);
+ } else
+ r = uninline_ret;
+ }
+
put_cap_ref(in, CEPH_CAP_FILE_RD);
return r;
}
--
1.7.9.5
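For readers following the read hunk above, the inline branch boils down to "clamp the range to the file size, copy what the inline buffer holds, zero-fill the rest". The following is only a minimal sketch of that arithmetic, with std::string standing in for bufferlist. The function read_inline and its parameter names are illustrative and not part of the patch, and the sketch assumes the inline buffer is never longer than the file size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the inline branch of Client::_read().
// inline_data plays the role of in->inline_data, file_size of in->size.
// Assumption (not checked by the patch): inline_data.size() <= file_size.
std::string read_inline(const std::string &inline_data, uint64_t file_size,
                        uint64_t offset, uint64_t size) {
  assert(inline_data.size() <= file_size);
  const uint64_t len = inline_data.size();
  // Clamp the end of the read to the logical file size.
  const uint64_t endoff = std::min(offset + size, file_size);
  std::string out;
  if (offset < len) {
    if (endoff <= len) {
      // The whole range lies inside the inline buffer.
      out = inline_data.substr(offset, endoff - offset);
    } else {
      // The range runs past the inline bytes: copy the tail of the
      // buffer, then pad with zeros up to endoff.
      out = inline_data.substr(offset, len - offset);
      out.append(endoff - len, '\0');
    }
  } else if (offset < endoff) {
    // The range starts beyond the inline bytes but inside the file.
    out.append(endoff - offset, '\0');
  }
  return out;
}
```

The three branches correspond to the substr_of / append_zero calls in the hunk; a read that starts at or beyond the file size returns an empty buffer.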
* [PATCH 17/18] client: Write inline data path
2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
` (15 preceding siblings ...)
2013-11-27 13:40 ` [PATCH 16/18] client: Read inline data path Li Wang
@ 2013-11-27 13:40 ` Li Wang
2013-11-28 3:02 ` Yan, Zheng
2013-11-27 13:40 ` [PATCH 18/18] client: Fallocate " Li Wang
17 siblings, 1 reply; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6b08155..c913e35 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
+ Mutex uninline_flock("Client::_write_uninline_data flock");
+ Cond uninline_cond;
+ bool uninline_done = false;
+ int uninline_ret = 0;
+ Context *onuninline = NULL;
+
+ if (in->inline_version < CEPH_INLINE_NONE) {
+ if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
+ onuninline = new C_SafeCond(&uninline_flock,
+ &uninline_cond,
+ &uninline_done,
+ &uninline_ret);
+ uninline_data(in, onuninline);
+ } else {
+ get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+
+ uint32_t len = in->inline_data.length();
+
+ if (endoff < len)
+ in->inline_data.copy(endoff, len - endoff, bl);
+
+ if (offset < len)
+ in->inline_data.splice(offset, len - offset);
+ else if (offset > len)
+ in->inline_data.append_zero(offset - len);
+
+ in->inline_data.append(bl);
+ in->inline_version++;
+
+ put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+
+ goto success;
+ }
+ }
+
if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
// do buffered write
if (!in->oset.dirty_or_tx)
@@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
}
// if we get here, write was successful, update client metadata
-
+success:
// time
lat = ceph_clock_now(cct);
lat -= start;
@@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
mark_caps_dirty(in, CEPH_CAP_FILE_WR);
done:
+
+ if (onuninline) {
+ client_lock.Unlock();
+ uninline_flock.Lock();
+ while (!uninline_done)
+ uninline_cond.Wait(uninline_flock);
+ uninline_flock.Unlock();
+ client_lock.Lock();
+
+ if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+ in->inline_data.clear();
+ in->inline_version = CEPH_INLINE_NONE;
+ mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+ check_caps(in, false);
+ } else
+ r = uninline_ret;
+ }
+
put_cap_ref(in, CEPH_CAP_FILE_WR);
return r;
}
--
1.7.9.5
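The inline branch of the write hunk above overwrites a range of the inline buffer in place. The sketch below restates that copy/splice/append sequence with std::string in place of bufferlist. The name write_inline is illustrative only. The sketch leaves out the cap references, the inline_version bump, and the check against CEPH_INLINE_SIZE that the caller performs in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the inline branch of Client::_write().
// Overwrites inline_data at [offset, offset + data.size()).
void write_inline(std::string &inline_data, uint64_t offset,
                  const std::string &data) {
  const uint64_t len = inline_data.size();
  const uint64_t endoff = offset + data.size();
  std::string bl = data;
  // Keep any old bytes that lie beyond the end of the new write.
  if (endoff < len)
    bl.append(inline_data, endoff, len - endoff);
  if (offset < len)
    inline_data.erase(offset);               // drop [offset, len)
  else if (offset > len)
    inline_data.append(offset - len, '\0');  // zero-fill the gap
  inline_data.append(bl);
}
```

Writing "XY" at offset 2 into "abcdef" yields "abXYef"; writing past the current end first pads the gap with zeros.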
* [PATCH 18/18] client: Fallocate inline data path
2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
` (16 preceding siblings ...)
2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
@ 2013-11-27 13:40 ` Li Wang
17 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-11-27 13:40 UTC (permalink / raw)
To: ceph-devel; +Cc: Sage Weil, Li Wang, Yunchuan Wen
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
src/client/Client.cc | 99 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 76 insertions(+), 23 deletions(-)
diff --git a/src/client/Client.cc b/src/client/Client.cc
index c913e35..d77e4ce 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -8002,34 +8002,69 @@ int Client::_fallocate(Fh *fh, int mode, int64_t offset, int64_t length)
if (r < 0)
return r;
+ Mutex uninline_flock("Client::_fallocate_uninline_data flock");
+ Cond uninline_cond;
+ bool uninline_done = false;
+ int uninline_ret = 0;
+ Context *onuninline = NULL;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
- Mutex flock("Client::_punch_hole flock");
- Cond cond;
- bool done = false;
- Context *onfinish = new C_SafeCond(&flock, &cond, &done);
- Context *onsafe = new C_Client_SyncCommit(this, in);
+ if (in->inline_version < CEPH_INLINE_NONE &&
+ (have & CEPH_CAP_FILE_BUFFER)) {
+ bufferlist bl;
+ int len = in->inline_data.length();
+ if (offset < len) {
+ if (offset > 0)
+ in->inline_data.copy(0, offset, bl);
+ int64_t size = length;
+ if (offset + size > len)
+ size = len - offset;
+ if (size > 0)
+ bl.append_zero(size);
+ if (offset + size < len)
+ in->inline_data.copy(offset + size, len - offset - size, bl);
+ in->inline_data = bl;
+ in->inline_version++;
+ }
+ in->mtime = ceph_clock_now(cct);
+ mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+ } else {
+ if (in->inline_version < CEPH_INLINE_NONE) {
+ onuninline = new C_SafeCond(&uninline_flock,
+ &uninline_cond,
+ &uninline_done,
+ &uninline_ret);
+ uninline_data(in, onuninline);
+ }
- unsafe_sync_write++;
- get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
+ Mutex flock("Client::_punch_hole flock");
+ Cond cond;
+ bool done = false;
+ Context *onfinish = new C_SafeCond(&flock, &cond, &done);
+ Context *onsafe = new C_Client_SyncCommit(this, in);
- _invalidate_inode_cache(in, offset, length, true);
- r = filer->zero(in->ino, &in->layout,
- in->snaprealm->get_snap_context(),
- offset, length,
- ceph_clock_now(cct),
- 0, true, onfinish, onsafe);
- if (r < 0)
- goto done;
+ unsafe_sync_write++;
+ get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
- in->mtime = ceph_clock_now(cct);
- mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+ _invalidate_inode_cache(in, offset, length, true);
+ r = filer->zero(in->ino, &in->layout,
+ in->snaprealm->get_snap_context(),
+ offset, length,
+ ceph_clock_now(cct),
+ 0, true, onfinish, onsafe);
+ if (r < 0)
+ goto done;
- client_lock.Unlock();
- flock.Lock();
- while (!done)
- cond.Wait(flock);
- flock.Unlock();
- client_lock.Lock();
+ in->mtime = ceph_clock_now(cct);
+ mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+
+ client_lock.Unlock();
+ flock.Lock();
+ while (!done)
+ cond.Wait(flock);
+ flock.Unlock();
+ client_lock.Lock();
+ }
} else if (!(mode & FALLOC_FL_KEEP_SIZE)) {
uint64_t size = offset + length;
if (size > in->size) {
@@ -8044,6 +8079,24 @@ int Client::_fallocate(Fh *fh, int mode, int64_t offset, int64_t length)
}
done:
+
+ if (onuninline) {
+ client_lock.Unlock();
+ uninline_flock.Lock();
+ while (!uninline_done)
+ uninline_cond.Wait(uninline_flock);
+ uninline_flock.Unlock();
+ client_lock.Lock();
+
+ if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
+ in->inline_data.clear();
+ in->inline_version = CEPH_INLINE_NONE;
+ mark_caps_dirty(in, CEPH_CAP_FILE_WR);
+ check_caps(in, false);
+ } else
+ r = uninline_ret;
+ }
+
put_cap_ref(in, CEPH_CAP_FILE_WR);
return r;
}
--
1.7.9.5
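The inline punch-hole branch in the hunk above rebuilds the inline buffer as head + zeros + tail. A minimal sketch of the same effect, zeroing the range in place, is shown here with std::string in place of bufferlist. The name punch_hole_inline is illustrative, and the sketch uses 64-bit arithmetic throughout. It omits the mtime update and cap dirtying done by the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the inline FALLOC_FL_PUNCH_HOLE branch of
// Client::_fallocate(). Returns true when the hole starts inside the
// inline buffer, which is when the patch bumps inline_version.
bool punch_hole_inline(std::string &inline_data, int64_t offset,
                       int64_t length) {
  assert(offset >= 0 && length >= 0);
  const int64_t len = static_cast<int64_t>(inline_data.size());
  if (offset >= len)
    return false;  // the hole lies entirely beyond the inline bytes
  // Clamp the hole to the end of the inline buffer.
  const int64_t n = std::min(length, len - offset);
  if (n > 0)
    std::fill_n(inline_data.begin() + offset, n, '\0');
  return true;
}
```

The buffer length never changes: punching a hole only zeroes bytes, it does not truncate or extend the inline data.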
* Re: [PATCH 17/18] client: Write inline data path
2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
@ 2013-11-28 3:02 ` Yan, Zheng
2013-11-29 17:01 ` Matt W. Benjamin
2013-12-02 8:03 ` Li Wang
0 siblings, 2 replies; 23+ messages in thread
From: Yan, Zheng @ 2013-11-28 3:02 UTC (permalink / raw)
To: Li Wang; +Cc: ceph-devel, Sage Weil, Yunchuan Wen
On Wed, Nov 27, 2013 at 9:40 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
> ---
> src/client/Client.cc | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 6b08155..c913e35 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>
> ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
>
> + Mutex uninline_flock("Client::_write_uninline_data flock");
> + Cond uninline_cond;
> + bool uninline_done = false;
> + int uninline_ret = 0;
> + Context *onuninline = NULL;
> +
> + if (in->inline_version < CEPH_INLINE_NONE) {
> + if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
> + onuninline = new C_SafeCond(&uninline_flock,
> + &uninline_cond,
> + &uninline_done,
> + &uninline_ret);
> + uninline_data(in, onuninline);
If a client does sequential 4k writes, the second write always triggers
the "uninline" procedure, which is suboptimal. It would be better to just
copy the inline data into the object cacher.
Besides, this feature should be disabled by default because it's not
compatible with old clients and it imposes overhead on the MDS. We
need a config option or directory attribute to enable it.
Regards
Yan, Zheng
> + } else {
> + get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
> +
> + uint32_t len = in->inline_data.length();
> +
> + if (endoff < len)
> + in->inline_data.copy(endoff, len - endoff, bl);
> +
> + if (offset < len)
> + in->inline_data.splice(offset, len - offset);
> + else if (offset > len)
> + in->inline_data.append_zero(offset - len);
> +
> + in->inline_data.append(bl);
> + in->inline_version++;
> +
> + put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
> +
> + goto success;
> + }
> + }
> +
> if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
> // do buffered write
> if (!in->oset.dirty_or_tx)
> @@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
> }
>
> // if we get here, write was successful, update client metadata
> -
> +success:
> // time
> lat = ceph_clock_now(cct);
> lat -= start;
> @@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
> mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>
> done:
> +
> + if (onuninline) {
> + client_lock.Unlock();
> + uninline_flock.Lock();
> + while (!uninline_done)
> + uninline_cond.Wait(uninline_flock);
> + uninline_flock.Unlock();
> + client_lock.Lock();
> +
> + if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
> + in->inline_data.clear();
> + in->inline_version = CEPH_INLINE_NONE;
> + mark_caps_dirty(in, CEPH_CAP_FILE_WR);
> + check_caps(in, false);
> + } else
> + r = uninline_ret;
> + }
> +
> put_cap_ref(in, CEPH_CAP_FILE_WR);
> return r;
> }
> --
> 1.7.9.5
>
* Re: [PATCH 17/18] client: Write inline data path
2013-11-28 3:02 ` Yan, Zheng
@ 2013-11-29 17:01 ` Matt W. Benjamin
2013-12-02 8:20 ` Li Wang
2013-12-02 8:03 ` Li Wang
1 sibling, 1 reply; 23+ messages in thread
From: Matt W. Benjamin @ 2013-11-29 17:01 UTC (permalink / raw)
To: Zheng Yan; +Cc: ceph-devel, Sage Weil, Yunchuan Wen, Li Wang
Hi,
I wondered about this. Were you able to measure an effect?
Thanks,
Matt
----- "Zheng Yan" <ukernel@gmail.com> wrote:
>
> Besides, this feature should be disabled by default because it's not
> compatible with old clients and it imposes overhead on the MDS. We
> need a config option or directory attribute to enable it.
>
> Regards
> Yan, Zheng
>
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
* Re: [PATCH 17/18] client: Write inline data path
2013-11-28 3:02 ` Yan, Zheng
2013-11-29 17:01 ` Matt W. Benjamin
@ 2013-12-02 8:03 ` Li Wang
1 sibling, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-12-02 8:03 UTC (permalink / raw)
To: Yan, Zheng; +Cc: ceph-devel, Sage Weil, Yunchuan Wen
Hi Zheng,
Thanks for your comments.
Regarding the configuration option, it was in our original plan, and
it will appear in the upcoming version :)
As for the write path, your comment does remind us of an optimization:
if the inline data length is zero, we need not bother with the
migration. This covers the case where an application uses a write
buffer larger than the inline threshold, so its sequential writes will
not incur a migration. It also covers the case where a client performs
some inline reads/writes, truncates the file to zero, and then starts
writing past the inline threshold.
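The decision discussed above can be written as two small predicates. This is only a sketch: the threshold value 4096 is an assumption for illustration (the real CEPH_INLINE_SIZE is defined elsewhere in the series and may differ), and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed threshold for illustration only; the real CEPH_INLINE_SIZE
// may have a different value.
const uint64_t kInlineThreshold = 4096;

// Condition used by the v3 write hunk: leave the inline path when the
// write ends past the threshold or the client lacks the buffer cap.
bool needs_uninline_v3(uint64_t endoff, bool have_buffer_cap) {
  return endoff > kInlineThreshold || !have_buffer_cap;
}

// Refinement described above (hypothetical form): when the inline
// buffer is empty there are no bytes to copy out, so the data
// migration itself can be skipped.
bool needs_migration_proposed(uint64_t inline_len, uint64_t endoff,
                              bool have_buffer_cap) {
  return inline_len > 0 && needs_uninline_v3(endoff, have_buffer_cap);
}
```

Under these assumptions, a first write of 64k into an empty file (inline_len 0, endoff 65536) needs no data migration, while a sequential 4k writer still has 4096 inline bytes to copy out when its second write ends at 8192.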
Cheers,
Li Wang
On 11/28/2013 11:02 AM, Yan, Zheng wrote:
> On Wed, Nov 27, 2013 at 9:40 PM, Li Wang <liwang@ubuntukylin.com> wrote:
>> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
>> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
>> ---
>> src/client/Client.cc | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/client/Client.cc b/src/client/Client.cc
>> index 6b08155..c913e35 100644
>> --- a/src/client/Client.cc
>> +++ b/src/client/Client.cc
>> @@ -6215,6 +6215,41 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>>
>> ldout(cct, 10) << " snaprealm " << *in->snaprealm << dendl;
>>
>> + Mutex uninline_flock("Client::_write_uninline_data flock");
>> + Cond uninline_cond;
>> + bool uninline_done = false;
>> + int uninline_ret = 0;
>> + Context *onuninline = NULL;
>> +
>> + if (in->inline_version < CEPH_INLINE_NONE) {
>> + if (endoff > CEPH_INLINE_SIZE || !(have & CEPH_CAP_FILE_BUFFER)) {
>> + onuninline = new C_SafeCond(&uninline_flock,
>> + &uninline_cond,
>> + &uninline_done,
>> + &uninline_ret);
>> + uninline_data(in, onuninline);
>
> If a client does sequential 4k writes, the second write always triggers
> the "uninline" procedure, which is suboptimal. It would be better to just
> copy the inline data into the object cacher.
>
> Besides, this feature should be disabled by default because it's not
> compatible with old clients and it imposes overhead on the MDS. We
> need a config option or directory attribute to enable it.
>
> Regards
> Yan, Zheng
>
>
>> + } else {
>> + get_cap_ref(in, CEPH_CAP_FILE_BUFFER);
>> +
>> + uint32_t len = in->inline_data.length();
>> +
>> + if (endoff < len)
>> + in->inline_data.copy(endoff, len - endoff, bl);
>> +
>> + if (offset < len)
>> + in->inline_data.splice(offset, len - offset);
>> + else if (offset > len)
>> + in->inline_data.append_zero(offset - len);
>> +
>> + in->inline_data.append(bl);
>> + in->inline_version++;
>> +
>> + put_cap_ref(in, CEPH_CAP_FILE_BUFFER);
>> +
>> + goto success;
>> + }
>> + }
>> +
>> if (cct->_conf->client_oc && (have & CEPH_CAP_FILE_BUFFER)) {
>> // do buffered write
>> if (!in->oset.dirty_or_tx)
>> @@ -6265,7 +6300,7 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>> }
>>
>> // if we get here, write was successful, update client metadata
>> -
>> +success:
>> // time
>> lat = ceph_clock_now(cct);
>> lat -= start;
>> @@ -6293,6 +6328,24 @@ int Client::_write(Fh *f, int64_t offset, uint64_t size, const char *buf)
>> mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>>
>> done:
>> +
>> + if (onuninline) {
>> + client_lock.Unlock();
>> + uninline_flock.Lock();
>> + while (!uninline_done)
>> + uninline_cond.Wait(uninline_flock);
>> + uninline_flock.Unlock();
>> + client_lock.Lock();
>> +
>> + if (uninline_ret >= 0 || uninline_ret == -ECANCELED) {
>> + in->inline_data.clear();
>> + in->inline_version = CEPH_INLINE_NONE;
>> + mark_caps_dirty(in, CEPH_CAP_FILE_WR);
>> + check_caps(in, false);
>> + } else
>> + r = uninline_ret;
>> + }
>> +
>> put_cap_ref(in, CEPH_CAP_FILE_WR);
>> return r;
>> }
>> --
>> 1.7.9.5
>>
>
* Re: [PATCH 17/18] client: Write inline data path
2013-11-29 17:01 ` Matt W. Benjamin
@ 2013-12-02 8:20 ` Li Wang
0 siblings, 0 replies; 23+ messages in thread
From: Li Wang @ 2013-12-02 8:20 UTC (permalink / raw)
To: Matt W. Benjamin; +Cc: Zheng Yan, ceph-devel, Sage Weil, Yunchuan Wen
Hi Matt,
This feature is intended for workloads that store massive numbers of
tiny files, and there are some further design points to reduce the
overhead when a file is not tiny:
(1) The migration is done asynchronously with the subsequent read/write
(v3);
(2) Migration is skipped when the inline data length is zero, which
almost eliminates the migration overhead in some situations (v4);
(3) It can be implicitly turned off at mount time by the client (v4);
(4) It can be turned off globally by configuring the MDS (v4).
v4 is coming soon.
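Point (1) follows the pattern visible in the v3 hunks: start the migration with a completion callback, issue the foreground I/O, and block only at the done: label. The sketch below models that ordering with the standard library instead of Ceph's Mutex, Cond and C_SafeCond. Completion and io_with_migration are hypothetical names, and the sketch leaves out the client_lock handling around the wait.

```cpp
#include <cassert>
#include <cerrno>
#include <condition_variable>
#include <mutex>

// Plays the role of the flock/cond/done/ret quadruple handed to
// C_SafeCond in the patch.
struct Completion {
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;
  int ret = 0;

  // Called by whoever finishes the migration, possibly another thread.
  void complete(int r) {
    std::lock_guard<std::mutex> g(lock);
    assert(!done);  // a completion fires once
    ret = r;
    done = true;
    cond.notify_all();
  }

  // Blocks until complete() has been called, then returns its result.
  int wait() {
    std::unique_lock<std::mutex> g(lock);
    cond.wait(g, [this] { return done; });
    return ret;
  }
};

// start_migration(Completion&) kicks off the migration and returns;
// do_io() performs the foreground read or write and returns its result.
template <typename StartFn, typename IoFn>
int io_with_migration(StartFn start_migration, IoFn do_io) {
  Completion c;
  start_migration(c);   // may complete immediately or later
  int r = do_io();      // foreground I/O is not delayed by the migration
  int mret = c.wait();  // wait only once the I/O has been issued
  // As in the patch, -ECANCELED counts as success; any other negative
  // result from the migration overrides the I/O result.
  if (mret < 0 && mret != -ECANCELED)
    r = mret;
  return r;
}
```

A caller that completes from a background thread gets the overlap described in point (1); a caller that completes synchronously inside start_migration gets the same result without the overlap.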
Cheers,
Li Wang
On 11/30/2013 01:01 AM, Matt W. Benjamin wrote:
> Hi,
>
> I wondered about this. Were you able to measure an effect?
>
> Thanks,
>
> Matt
>
> ----- "Zheng Yan" <ukernel@gmail.com> wrote:
>
>>
>> Besides, this feature should be disabled by default because it's not
>> compatible with old clients and it imposes overhead on the MDS. We
>> need a config option or directory attribute to enable it.
>>
>> Regards
>> Yan, Zheng
>>
>
Thread overview: 23+ messages
2013-11-27 13:40 [PATCH v3 00/18] ceph: Inline data support Li Wang
2013-11-27 13:40 ` [PATCH 01/18] ceph: Add inline data feature Li Wang
2013-11-27 13:40 ` [PATCH 02/18] ceph: Add inline state definition Li Wang
2013-11-27 13:40 ` [PATCH 03/18] mds: Add inline fields to inode_t Li Wang
2013-11-27 13:40 ` [PATCH 04/18] mds: Add inline encode/decode " Li Wang
2013-11-27 13:40 ` [PATCH 05/18] ceph: Add inline fields to MClientCaps Li Wang
2013-11-27 13:40 ` [PATCH 06/18] osdc: Add write method with truncate parameters Li Wang
2013-11-27 13:40 ` [PATCH 07/18] mds: Add inline fields to Capability Li Wang
2013-11-27 13:40 ` [PATCH 08/18] mds: Push inline data to client in cap message Li Wang
2013-11-27 13:40 ` [PATCH 09/18] ceph: Add inline fields to InodeStat Li Wang
2013-11-27 13:40 ` [PATCH 10/18] mds: Push inline data to client in inodestat Li Wang
2013-11-27 13:40 ` [PATCH 11/18] mds: Receive updated inline data from client Li Wang
2013-11-27 13:40 ` [PATCH 12/18] client: Add inline fields to Inode Li Wang
2013-11-27 13:40 ` [PATCH 13/18] client: Receive inline data pushed from mds Li Wang
2013-11-27 13:40 ` [PATCH 14/18] client: Push inline data to mds by send cap Li Wang
2013-11-27 13:40 ` [PATCH 15/18] client: Add inline data migration helper Li Wang
2013-11-27 13:40 ` [PATCH 16/18] client: Read inline data path Li Wang
2013-11-27 13:40 ` [PATCH 17/18] client: Write " Li Wang
2013-11-28 3:02 ` Yan, Zheng
2013-11-29 17:01 ` Matt W. Benjamin
2013-12-02 8:20 ` Li Wang
2013-12-02 8:03 ` Li Wang
2013-11-27 13:40 ` [PATCH 18/18] client: Fallocate " Li Wang