ceph-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] ceph: size handling for the fscrypt
@ 2021-10-28  9:14 xiubli
  2021-10-28  9:14 ` [PATCH v3 1/4] ceph: add __ceph_get_caps helper support xiubli
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: xiubli @ 2021-10-28  9:14 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This patch series is based on the fscrypt_size_handling branch in
https://github.com/lxbsz/linux.git, which is based Jeff's
ceph-fscrypt-content-experimental branch in
https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git,
has reverted one useless commit and added some upstream commits.

I will keep this patch set as simple as possible to review since
this is still one framework code. It works and still in developing
and need some feedbacks and suggestions for two corner cases below.

====

This approach is based on the discussion from V1 and V2, which will
pass the encrypted last block contents to MDS along with the truncate
request.

This will send the encrypted last block contents to MDS along with
the truncate request when truncating to a smaller size and at the
same time new size does not align to BLOCK SIZE.

The MDS side patch is raised in PR
https://github.com/ceph/ceph/pull/43588, which is also based Jeff's
previous great work in PR https://github.com/ceph/ceph/pull/41284.

The MDS will use the filer.write_trunc(), which could update and
truncate the file in one shot, instead of filer.truncate().

This just assume kclient won't support the inline data feature, which
will be remove soon, more detail please see:
https://tracker.ceph.com/issues/52916

Changed in V3:
- Fix possibly corrupting the file just before the MDS acquires the
  xlock for FILE lock, another client has updated it.
- Flush the pagecache buffer before reading the last block for the
  when filling the truncate request.
- Some other minore fixes.

Xiubo Li (4):
  ceph: add __ceph_get_caps helper support
  ceph: add __ceph_sync_read helper support
  ceph: return the real size readed when hit EOF
  ceph: add truncate size handling support for fscrypt

 fs/ceph/caps.c              |  21 +++--
 fs/ceph/file.c              |  47 +++++++---
 fs/ceph/inode.c             | 182 ++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h             |   5 +
 include/linux/ceph/crypto.h |  23 +++++
 5 files changed, 247 insertions(+), 31 deletions(-)
 create mode 100644 include/linux/ceph/crypto.h

-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/4] ceph: add __ceph_get_caps helper support
  2021-10-28  9:14 [PATCH v3 0/4] ceph: size handling for the fscrypt xiubli
@ 2021-10-28  9:14 ` xiubli
  2021-10-28  9:23   ` Xiubo Li
  2021-10-28  9:14 ` [PATCH v3 2/4] ceph: add __ceph_sync_read " xiubli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: xiubli @ 2021-10-28  9:14 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/caps.c  | 19 +++++++++++++------
 fs/ceph/super.h |  2 ++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d628dcdbf869..4e2a588465c5 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2876,10 +2876,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
  * due to a small max_size, make sure we check_max_size (and possibly
  * ask the mds) so we don't get hung up indefinitely.
  */
-int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
+		    int want, loff_t endoff, int *got)
 {
-	struct ceph_file_info *fi = filp->private_data;
-	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	int ret, _got, flags;
@@ -2888,7 +2887,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 	if (ret < 0)
 		return ret;
 
-	if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+	if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
 	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
 		return -EBADF;
 
@@ -2896,7 +2895,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 
 	while (true) {
 		flags &= CEPH_FILE_MODE_MASK;
-		if (atomic_read(&fi->num_locks))
+		if (fi && atomic_read(&fi->num_locks))
 			flags |= CHECK_FILELOCK;
 		_got = 0;
 		ret = try_get_cap_refs(inode, need, want, endoff,
@@ -2941,7 +2940,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 				continue;
 		}
 
-		if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+		if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
 		    fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
 			if (ret >= 0 && _got)
 				ceph_put_cap_refs(ci, _got);
@@ -3004,6 +3003,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 	return 0;
 }
 
+int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+{
+	struct ceph_file_info *fi = filp->private_data;
+	struct inode *inode = file_inode(filp);
+
+	return __ceph_get_caps(inode, fi, need, want, endoff, got);
+}
+
 /*
  * Take cap refs.  Caller must already know we hold at least one ref
  * on the caps in question or we don't know this is safe.
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7f3976b3319d..027d5f579ba0 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1208,6 +1208,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
 				      struct inode *dir,
 				      int mds, int drop, int unless);
 
+extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
+			   int need, int want, loff_t endoff, int *got);
 extern int ceph_get_caps(struct file *filp, int need, int want,
 			 loff_t endoff, int *got);
 extern int ceph_try_get_caps(struct inode *inode,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/4] ceph: add __ceph_sync_read helper support
  2021-10-28  9:14 [PATCH v3 0/4] ceph: size handling for the fscrypt xiubli
  2021-10-28  9:14 ` [PATCH v3 1/4] ceph: add __ceph_get_caps helper support xiubli
@ 2021-10-28  9:14 ` xiubli
  2021-10-28 18:21   ` Jeff Layton
  2021-10-28  9:14 ` [PATCH v3 3/4] ceph: return the real size readed when hit EOF xiubli
  2021-10-28  9:14 ` [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt xiubli
  3 siblings, 1 reply; 19+ messages in thread
From: xiubli @ 2021-10-28  9:14 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/file.c  | 31 +++++++++++++++++++++----------
 fs/ceph/super.h |  2 ++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6e677b40410e..74db403a4c35 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -901,20 +901,17 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
  * If we get a short result from the OSD, check against i_size; we need to
  * only return a short read to the caller if we hit EOF.
  */
-static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
-			      int *retry_op)
+ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+			 struct iov_iter *to, int *retry_op)
 {
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	ssize_t ret;
-	u64 off = iocb->ki_pos;
+	u64 off = *ki_pos;
 	u64 len = iov_iter_count(to);
 
-	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
-	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
 
 	if (!len)
 		return 0;
@@ -1058,18 +1055,32 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 			break;
 	}
 
-	if (off > iocb->ki_pos) {
+	if (off > *ki_pos) {
 		if (ret >= 0 &&
 		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
 			*retry_op = CHECK_EOF;
-		ret = off - iocb->ki_pos;
-		iocb->ki_pos = off;
+		ret = off - *ki_pos;
+		*ki_pos = off;
 	}
 out:
 	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
 	return ret;
 }
 
+static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
+			      int *retry_op)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+
+	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
+	     (unsigned)iov_iter_count(to),
+	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+
+	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+
+}
+
 struct ceph_aio_request {
 	struct kiocb *iocb;
 	size_t total_len;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 027d5f579ba0..57bc952c54e1 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
 extern int ceph_open(struct inode *inode, struct file *file);
 extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			    struct file *file, unsigned flags, umode_t mode);
+extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+				struct iov_iter *to, int *retry_op);
 extern int ceph_release(struct inode *inode, struct file *filp);
 extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
 				  char *data, size_t len);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/4] ceph: return the real size readed when hit EOF
  2021-10-28  9:14 [PATCH v3 0/4] ceph: size handling for the fscrypt xiubli
  2021-10-28  9:14 ` [PATCH v3 1/4] ceph: add __ceph_get_caps helper support xiubli
  2021-10-28  9:14 ` [PATCH v3 2/4] ceph: add __ceph_sync_read " xiubli
@ 2021-10-28  9:14 ` xiubli
  2021-10-28 18:11   ` Jeff Layton
  2021-10-28  9:14 ` [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt xiubli
  3 siblings, 1 reply; 19+ messages in thread
From: xiubli @ 2021-10-28  9:14 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 74db403a4c35..1988e75ad4a2 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -910,6 +910,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 	ssize_t ret;
 	u64 off = *ki_pos;
 	u64 len = iov_iter_count(to);
+	u64 i_size = i_size_read(inode);
 
 	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
 
@@ -933,7 +934,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		struct page **pages;
 		int num_pages;
 		size_t page_off;
-		u64 i_size;
 		bool more;
 		int idx;
 		size_t left;
@@ -980,7 +980,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 
 		ceph_osdc_put_request(req);
 
-		i_size = i_size_read(inode);
 		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
 		     off, len, ret, i_size, (more ? " MORE" : ""));
 
@@ -1056,11 +1055,14 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 	}
 
 	if (off > *ki_pos) {
-		if (ret >= 0 &&
-		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
+		if (off >= i_size) {
 			*retry_op = CHECK_EOF;
-		ret = off - *ki_pos;
-		*ki_pos = off;
+			ret = i_size - *ki_pos;
+			*ki_pos = i_size;
+		} else {
+			ret = off - *ki_pos;
+			*ki_pos = off;
+		}
 	}
 out:
 	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-28  9:14 [PATCH v3 0/4] ceph: size handling for the fscrypt xiubli
                   ` (2 preceding siblings ...)
  2021-10-28  9:14 ` [PATCH v3 3/4] ceph: return the real size readed when hit EOF xiubli
@ 2021-10-28  9:14 ` xiubli
  2021-10-28 19:17   ` Jeff Layton
  3 siblings, 1 reply; 19+ messages in thread
From: xiubli @ 2021-10-28  9:14 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will transfer the encrypted last block contents to the MDS
along with the truncate request only when new size is smaller and
not aligned to the fscrypt BLOCK size.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/caps.c              |   2 -
 fs/ceph/file.c              |  10 +-
 fs/ceph/inode.c             | 182 ++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h             |   3 +-
 include/linux/ceph/crypto.h |  23 +++++
 5 files changed, 206 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/ceph/crypto.h

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 4e2a588465c5..c9624b059eb0 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1299,8 +1299,6 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 	 * fscrypt_auth holds the crypto context (if any). fscrypt_file
 	 * tracks the real i_size as an __le64 field (and we use a rounded-up
 	 * i_size in * the traditional size field).
-	 *
-	 * FIXME: should we encrypt fscrypt_file field?
 	 */
 	ceph_encode_32(&p, arg->fscrypt_auth_len);
 	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1988e75ad4a2..2a780e0133df 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -902,7 +902,8 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
  * only return a short read to the caller if we hit EOF.
  */
 ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
-			 struct iov_iter *to, int *retry_op)
+			 struct iov_iter *to, int *retry_op,
+			 u64 *assert_ver)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
@@ -978,6 +979,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 					 req->r_end_latency,
 					 len, ret);
 
+		/* Grab assert version. It must be non-zero. */
+		*assert_ver = req->r_version;
+		WARN_ON_ONCE(assert_ver == 0);
 		ceph_osdc_put_request(req);
 
 		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
@@ -1074,12 +1078,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	u64 assert_ver;
 
 	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
 	     (unsigned)iov_iter_count(to),
 	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
 
-	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op,
+				&assert_ver);
 
 }
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9b798690fdc9..abb634866897 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -21,6 +21,7 @@
 #include "cache.h"
 #include "crypto.h"
 #include <linux/ceph/decode.h>
+#include <linux/ceph/crypto.h>
 
 /*
  * Ceph inode operations
@@ -1034,10 +1035,14 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		pool_ns = old_ns;
 
 		if (IS_ENCRYPTED(inode) && size &&
-		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
-			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
-			if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
-				pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size);
+		    (iinfo->fscrypt_file_len >= sizeof(__le64))) {
+			u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
+			if (fsize) {
+				size = fsize;
+				if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
+					pr_warn("size=%llu fscrypt_file=%llu\n",
+						info->size, size);
+			}
 		}
 
 		queue_trunc = ceph_fill_file_size(inode, issued,
@@ -2229,6 +2234,130 @@ static const struct inode_operations ceph_encrypted_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
+/*
+ * Transfer the encrypted last block to the MDS and the MDS
+ * will update the file when truncating a smaller size.
+ *
+ * We don't support a PAGE_SIZE that is smaller than the
+ * CEPH_FSCRYPT_BLOCK_SIZE.
+ */
+static int fill_fscrypt_truncate(struct inode *inode,
+				 struct ceph_mds_request *req,
+				 struct iattr *attr)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE;
+	loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE);
+	u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT;
+	struct ceph_pagelist *pagelist = NULL;
+	struct kvec iov;
+	struct iov_iter iter;
+	struct page *page = NULL;
+	struct ceph_fscrypt_truncate_size_header header;
+	int retry_op = 0;
+	int len = CEPH_FSCRYPT_BLOCK_SIZE;
+	loff_t i_size = i_size_read(inode);
+	bool fill_header_only = false;
+	u64 assert_ver = cpu_to_le64(0);
+	int got, ret, issued;
+
+	ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got);
+	if (ret < 0)
+		return ret;
+
+	dout("%s size %lld -> %lld got cap refs on %s\n", __func__,
+	     i_size, attr->ia_size, ceph_cap_string(got));
+
+	issued = __ceph_caps_issued(ci, NULL);
+
+	/* Try to writeback the dirty pagecaches */
+	if (issued & (CEPH_CAP_FILE_BUFFER))
+		filemap_fdatawrite(&inode->i_data);
+
+	page = __page_cache_alloc(GFP_KERNEL);
+	if (page == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	iov.iov_base = kmap_local_page(page);
+	iov.iov_len = len;
+	iov_iter_kvec(&iter, READ, &iov, 1, len);
+
+	pos = orig_pos;
+	ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &assert_ver);
+
+	/*
+	 * If we hit a hole here, we should just skip filling
+	 * the fscrypt for the request, because once the fscrypt
+	 * is enabled, the file will be split into many blocks
+	 * with the size of CEPH_FSCRYPT_BLOCK_SIZE, if there
+	 * has a hole, the hole size should be multiple of block
+	 * size.
+	 */
+	if (pos < i_size && ret < len) {
+		dout("%s hit hole, ppos %lld < size %lld\n",
+		     __func__, pos, i_size);
+
+		ret = 0;
+		fill_header_only = true;
+		goto fill_last_block;
+	}
+
+	/* truncate and zero out the extra contents for the last block */
+	memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);
+
+	/* encrypt the last block */
+	ret = fscrypt_encrypt_block_inplace(inode, page,
+					    CEPH_FSCRYPT_BLOCK_SIZE,
+					    0, block,
+					    GFP_KERNEL);
+	if (ret)
+		goto out;
+
+fill_last_block:
+	pagelist = ceph_pagelist_alloc(GFP_KERNEL);
+	if (!pagelist)
+		return -ENOMEM;
+
+	/* Insert the header first */
+	header.ver = 1;
+	header.compat = 1;
+	if (fill_header_only) {
+		header.data_len = cpu_to_le32(8 + 8 + 4);
+		header.assert_ver = cpu_to_le64(0);
+		header.file_offset = cpu_to_le64(0);
+		header.block_size = cpu_to_le64(0);
+	} else {
+		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
+		header.assert_ver = assert_ver;
+		header.file_offset = cpu_to_le64(orig_pos);
+		header.block_size = cpu_to_le64(CEPH_FSCRYPT_BLOCK_SIZE);
+	}
+	ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
+	if (ret)
+		goto out;
+
+	if (!fill_header_only) {
+		/* Append the last block contents to pagelist */
+		ret = ceph_pagelist_append(pagelist, iov.iov_base,
+					   CEPH_FSCRYPT_BLOCK_SIZE);
+		if (ret)
+			goto out;
+	}
+	req->r_pagelist = pagelist;
+out:
+	dout("%s %p size dropping cap refs on %s\n", __func__,
+	     inode, ceph_cap_string(got));
+	kunmap_local(iov.iov_base);
+	if (page)
+		__free_pages(page, 0);
+	if (ret && pagelist)
+		ceph_pagelist_release(pagelist);
+	ceph_put_cap_refs(ci, got);
+	return ret;
+}
+
 int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
@@ -2236,12 +2365,14 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 	struct ceph_mds_request *req;
 	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
 	struct ceph_cap_flush *prealloc_cf;
+	loff_t isize = i_size_read(inode);
 	int issued;
 	int release = 0, dirtied = 0;
 	int mask = 0;
 	int err = 0;
 	int inode_dirty_flags = 0;
 	bool lock_snap_rwsem = false;
+	bool fill_fscrypt;
 
 	prealloc_cf = ceph_alloc_cap_flush();
 	if (!prealloc_cf)
@@ -2254,6 +2385,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		return PTR_ERR(req);
 	}
 
+retry:
+	fill_fscrypt = false;
 	spin_lock(&ci->i_ceph_lock);
 	issued = __ceph_caps_issued(ci, NULL);
 
@@ -2367,10 +2500,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		}
 	}
 	if (ia_valid & ATTR_SIZE) {
-		loff_t isize = i_size_read(inode);
-
 		dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size);
-		if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
+		/*
+		 * Only when the new size is smaller and not aligned to
+		 * CEPH_FSCRYPT_BLOCK_SIZE will the RMW is needed.
+		 */
+		if (IS_ENCRYPTED(inode) && attr->ia_size < isize &&
+		    (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) {
+			mask |= CEPH_SETATTR_SIZE;
+			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
+				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
+			set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+			mask |= CEPH_SETATTR_FSCRYPT_FILE;
+			req->r_args.setattr.size =
+				cpu_to_le64(round_up(attr->ia_size,
+						     CEPH_FSCRYPT_BLOCK_SIZE));
+			req->r_args.setattr.old_size =
+				cpu_to_le64(round_up(isize,
+						     CEPH_FSCRYPT_BLOCK_SIZE));
+			req->r_fscrypt_file = attr->ia_size;
+			fill_fscrypt = true;
+		} else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
 			if (attr->ia_size > isize) {
 				i_size_write(inode, attr->ia_size);
 				inode->i_blocks = calc_inode_blocks(attr->ia_size);
@@ -2393,7 +2543,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 					cpu_to_le64(round_up(isize,
 							     CEPH_FSCRYPT_BLOCK_SIZE));
 				req->r_fscrypt_file = attr->ia_size;
-				/* FIXME: client must zero out any partial blocks! */
 			} else {
 				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
 				req->r_args.setattr.old_size = cpu_to_le64(isize);
@@ -2465,7 +2614,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 	if (inode_dirty_flags)
 		__mark_inode_dirty(inode, inode_dirty_flags);
 
-
 	if (mask) {
 		req->r_inode = inode;
 		ihold(inode);
@@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		req->r_args.setattr.mask = cpu_to_le32(mask);
 		req->r_num_caps = 1;
 		req->r_stamp = attr->ia_ctime;
+		if (fill_fscrypt) {
+			err = fill_fscrypt_truncate(inode, req, attr);
+			if (err)
+				goto out;
+		}
+
+		/*
+		 * The truncate will return -EAGAIN when some one
+		 * has updated the last block before the MDS hold
+		 * the xlock for the FILE lock. Need to retry it.
+		 */
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
+		if (err == -EAGAIN) {
+			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
+			     inode, err, ceph_cap_string(dirtied), mask);
+			goto retry;
+		}
 	}
 out:
 	dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 57bc952c54e1..c8144273ff28 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1236,7 +1236,8 @@ extern int ceph_open(struct inode *inode, struct file *file);
 extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			    struct file *file, unsigned flags, umode_t mode);
 extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
-				struct iov_iter *to, int *retry_op);
+				struct iov_iter *to, int *retry_op,
+				u64 *assert_ver);
 extern int ceph_release(struct inode *inode, struct file *filp);
 extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
 				  char *data, size_t len);
diff --git a/include/linux/ceph/crypto.h b/include/linux/ceph/crypto.h
new file mode 100644
index 000000000000..e0f799d4d4ee
--- /dev/null
+++ b/include/linux/ceph/crypto.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _FS_CEPH_CRYPTO_H
+#define _FS_CEPH_CRYPTO_H
+
+#include <linux/types.h>
+
+/*
+ * Header for the crypted file when truncating the size, this
+ * will be sent to MDS, and MDS will do the encrypted last
+ * block's update and size truncating.
+ */
+struct ceph_fscrypt_truncate_size_header {
+       __u8  ver;
+       __u8  compat;
+       /* length of sizeof(assert_ver + file_offset + block_size + CEPH_FSCRYPT_BLOCK_SIZE) */
+       __le32 data_len;
+
+       __le64 assert_ver;
+       __le64 file_offset;
+       __le32 block_size;
+} __packed;
+
+#endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] ceph: add __ceph_get_caps helper support
  2021-10-28  9:14 ` [PATCH v3 1/4] ceph: add __ceph_get_caps helper support xiubli
@ 2021-10-28  9:23   ` Xiubo Li
  2021-10-28 10:44     ` Jeff Layton
  0 siblings, 1 reply; 19+ messages in thread
From: Xiubo Li @ 2021-10-28  9:23 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

Hi Jeff,

Not sure why the cover-letter is not displayed in both the mail list and 
ceph patchwork, locally it was successfully sent out.

Any idea ?

Thanks

-- Xiubo


On 10/28/21 5:14 PM, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>   fs/ceph/caps.c  | 19 +++++++++++++------
>   fs/ceph/super.h |  2 ++
>   2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index d628dcdbf869..4e2a588465c5 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2876,10 +2876,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>    * due to a small max_size, make sure we check_max_size (and possibly
>    * ask the mds) so we don't get hung up indefinitely.
>    */
> -int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
> +int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
> +		    int want, loff_t endoff, int *got)
>   {
> -	struct ceph_file_info *fi = filp->private_data;
> -	struct inode *inode = file_inode(filp);
>   	struct ceph_inode_info *ci = ceph_inode(inode);
>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>   	int ret, _got, flags;
> @@ -2888,7 +2887,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>   	if (ret < 0)
>   		return ret;
>   
> -	if ((fi->fmode & CEPH_FILE_MODE_WR) &&
> +	if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
>   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>   		return -EBADF;
>   
> @@ -2896,7 +2895,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>   
>   	while (true) {
>   		flags &= CEPH_FILE_MODE_MASK;
> -		if (atomic_read(&fi->num_locks))
> +		if (fi && atomic_read(&fi->num_locks))
>   			flags |= CHECK_FILELOCK;
>   		_got = 0;
>   		ret = try_get_cap_refs(inode, need, want, endoff,
> @@ -2941,7 +2940,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>   				continue;
>   		}
>   
> -		if ((fi->fmode & CEPH_FILE_MODE_WR) &&
> +		if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
>   		    fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
>   			if (ret >= 0 && _got)
>   				ceph_put_cap_refs(ci, _got);
> @@ -3004,6 +3003,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>   	return 0;
>   }
>   
> +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
> +{
> +	struct ceph_file_info *fi = filp->private_data;
> +	struct inode *inode = file_inode(filp);
> +
> +	return __ceph_get_caps(inode, fi, need, want, endoff, got);
> +}
> +
>   /*
>    * Take cap refs.  Caller must already know we hold at least one ref
>    * on the caps in question or we don't know this is safe.
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 7f3976b3319d..027d5f579ba0 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -1208,6 +1208,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
>   				      struct inode *dir,
>   				      int mds, int drop, int unless);
>   
> +extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
> +			   int need, int want, loff_t endoff, int *got);
>   extern int ceph_get_caps(struct file *filp, int need, int want,
>   			 loff_t endoff, int *got);
>   extern int ceph_try_get_caps(struct inode *inode,


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] ceph: add __ceph_get_caps helper support
  2021-10-28  9:23   ` Xiubo Li
@ 2021-10-28 10:44     ` Jeff Layton
  2021-10-28 12:41       ` Xiubo Li
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2021-10-28 10:44 UTC (permalink / raw)
  To: Xiubo Li; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

I see your cover letter for this series on the list. You may just be
running across how gmail's (infuriating) labeling works? I don't think
they ever show up in patchwork though (since they don't have patches in
them).

-- Jeff


On Thu, 2021-10-28 at 17:23 +0800, Xiubo Li wrote:
> Hi Jeff,
> 
> Not sure why the cover-letter is not displayed in both the mail list and 
> ceph patchwork, locally it was successfully sent out.
> 
> Any idea ?
> 
> Thanks
> 
> -- Xiubo
> 
> 
> On 10/28/21 5:14 PM, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >   fs/ceph/caps.c  | 19 +++++++++++++------
> >   fs/ceph/super.h |  2 ++
> >   2 files changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index d628dcdbf869..4e2a588465c5 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -2876,10 +2876,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
> >    * due to a small max_size, make sure we check_max_size (and possibly
> >    * ask the mds) so we don't get hung up indefinitely.
> >    */
> > -int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
> > +int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
> > +		    int want, loff_t endoff, int *got)
> >   {
> > -	struct ceph_file_info *fi = filp->private_data;
> > -	struct inode *inode = file_inode(filp);
> >   	struct ceph_inode_info *ci = ceph_inode(inode);
> >   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >   	int ret, _got, flags;
> > @@ -2888,7 +2887,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
> >   	if (ret < 0)
> >   		return ret;
> >   
> > -	if ((fi->fmode & CEPH_FILE_MODE_WR) &&
> > +	if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
> >   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
> >   		return -EBADF;
> >   
> > @@ -2896,7 +2895,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
> >   
> >   	while (true) {
> >   		flags &= CEPH_FILE_MODE_MASK;
> > -		if (atomic_read(&fi->num_locks))
> > +		if (fi && atomic_read(&fi->num_locks))
> >   			flags |= CHECK_FILELOCK;
> >   		_got = 0;
> >   		ret = try_get_cap_refs(inode, need, want, endoff,
> > @@ -2941,7 +2940,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
> >   				continue;
> >   		}
> >   
> > -		if ((fi->fmode & CEPH_FILE_MODE_WR) &&
> > +		if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
> >   		    fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
> >   			if (ret >= 0 && _got)
> >   				ceph_put_cap_refs(ci, _got);
> > @@ -3004,6 +3003,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
> >   	return 0;
> >   }
> >   
> > +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
> > +{
> > +	struct ceph_file_info *fi = filp->private_data;
> > +	struct inode *inode = file_inode(filp);
> > +
> > +	return __ceph_get_caps(inode, fi, need, want, endoff, got);
> > +}
> > +
> >   /*
> >    * Take cap refs.  Caller must already know we hold at least one ref
> >    * on the caps in question or we don't know this is safe.
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 7f3976b3319d..027d5f579ba0 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -1208,6 +1208,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
> >   				      struct inode *dir,
> >   				      int mds, int drop, int unless);
> >   
> > +extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
> > +			   int need, int want, loff_t endoff, int *got);
> >   extern int ceph_get_caps(struct file *filp, int need, int want,
> >   			 loff_t endoff, int *got);
> >   extern int ceph_try_get_caps(struct inode *inode,
> 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/4] ceph: add __ceph_get_caps helper support
  2021-10-28 10:44     ` Jeff Layton
@ 2021-10-28 12:41       ` Xiubo Li
  0 siblings, 0 replies; 19+ messages in thread
From: Xiubo Li @ 2021-10-28 12:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/28/21 6:44 PM, Jeff Layton wrote:
> I see your cover letter for this series on the list. You may just be
> running across how gmail's (infuriating) labeling works? I don't think
> they ever show up in patchwork though (since they don't have patches in
> them).

Okay, I searched the cover-letters everywhere, still couldn't see them 
suddenly since last week :-(


>
> -- Jeff
>
>
> On Thu, 2021-10-28 at 17:23 +0800, Xiubo Li wrote:
>> Hi Jeff,
>>
>> Not sure why the cover-letter is not displayed in both the mail list and
>> ceph patchwork, locally it was successfully sent out.
>>
>> Any idea ?
>>
>> Thanks
>>
>> -- Xiubo
>>
>>
>> On 10/28/21 5:14 PM, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>    fs/ceph/caps.c  | 19 +++++++++++++------
>>>    fs/ceph/super.h |  2 ++
>>>    2 files changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>> index d628dcdbf869..4e2a588465c5 100644
>>> --- a/fs/ceph/caps.c
>>> +++ b/fs/ceph/caps.c
>>> @@ -2876,10 +2876,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>>>     * due to a small max_size, make sure we check_max_size (and possibly
>>>     * ask the mds) so we don't get hung up indefinitely.
>>>     */
>>> -int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
>>> +int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
>>> +		    int want, loff_t endoff, int *got)
>>>    {
>>> -	struct ceph_file_info *fi = filp->private_data;
>>> -	struct inode *inode = file_inode(filp);
>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>    	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>    	int ret, _got, flags;
>>> @@ -2888,7 +2887,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>>>    	if (ret < 0)
>>>    		return ret;
>>>    
>>> -	if ((fi->fmode & CEPH_FILE_MODE_WR) &&
>>> +	if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
>>>    	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>>    		return -EBADF;
>>>    
>>> @@ -2896,7 +2895,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>>>    
>>>    	while (true) {
>>>    		flags &= CEPH_FILE_MODE_MASK;
>>> -		if (atomic_read(&fi->num_locks))
>>> +		if (fi && atomic_read(&fi->num_locks))
>>>    			flags |= CHECK_FILELOCK;
>>>    		_got = 0;
>>>    		ret = try_get_cap_refs(inode, need, want, endoff,
>>> @@ -2941,7 +2940,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>>>    				continue;
>>>    		}
>>>    
>>> -		if ((fi->fmode & CEPH_FILE_MODE_WR) &&
>>> +		if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
>>>    		    fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
>>>    			if (ret >= 0 && _got)
>>>    				ceph_put_cap_refs(ci, _got);
>>> @@ -3004,6 +3003,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
>>>    	return 0;
>>>    }
>>>    
>>> +int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
>>> +{
>>> +	struct ceph_file_info *fi = filp->private_data;
>>> +	struct inode *inode = file_inode(filp);
>>> +
>>> +	return __ceph_get_caps(inode, fi, need, want, endoff, got);
>>> +}
>>> +
>>>    /*
>>>     * Take cap refs.  Caller must already know we hold at least one ref
>>>     * on the caps in question or we don't know this is safe.
>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>> index 7f3976b3319d..027d5f579ba0 100644
>>> --- a/fs/ceph/super.h
>>> +++ b/fs/ceph/super.h
>>> @@ -1208,6 +1208,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
>>>    				      struct inode *dir,
>>>    				      int mds, int drop, int unless);
>>>    
>>> +extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
>>> +			   int need, int want, loff_t endoff, int *got);
>>>    extern int ceph_get_caps(struct file *filp, int need, int want,
>>>    			 loff_t endoff, int *got);
>>>    extern int ceph_try_get_caps(struct inode *inode,


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/4] ceph: return the real size readed when hit EOF
  2021-10-28  9:14 ` [PATCH v3 3/4] ceph: return the real size readed when hit EOF xiubli
@ 2021-10-28 18:11   ` Jeff Layton
  2021-10-30  4:35     ` Xiubo Li
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2021-10-28 18:11 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/file.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 74db403a4c35..1988e75ad4a2 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -910,6 +910,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>  	ssize_t ret;
>  	u64 off = *ki_pos;
>  	u64 len = iov_iter_count(to);
> +	u64 i_size = i_size_read(inode);
>  
>  	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
>  
> @@ -933,7 +934,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>  		struct page **pages;
>  		int num_pages;
>  		size_t page_off;
> -		u64 i_size;
>  		bool more;
>  		int idx;
>  		size_t left;
> @@ -980,7 +980,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>  
>  		ceph_osdc_put_request(req);
>  
> -		i_size = i_size_read(inode);

I wonder a little about removing this i_size fetch. Then again, the
i_size could change any time after we fetch it so it doesn't seem
worthwhile to do so.

>  		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
>  		     off, len, ret, i_size, (more ? " MORE" : ""));
>  
> @@ -1056,11 +1055,14 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>  	}
>  
>  	if (off > *ki_pos) {
> -		if (ret >= 0 &&
> -		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
> +		if (off >= i_size) {
>  			*retry_op = CHECK_EOF;
> -		ret = off - *ki_pos;
> -		*ki_pos = off;
> +			ret = i_size - *ki_pos;
> +			*ki_pos = i_size;
> +		} else {
> +			ret = off - *ki_pos;
> +			*ki_pos = off;
> +		}
>  	}
>  out:
>  	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);

I think I'll go ahead and pull this patch into the testing branch, since
it seems to be correct.

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] ceph: add __ceph_sync_read helper support
  2021-10-28  9:14 ` [PATCH v3 2/4] ceph: add __ceph_sync_read " xiubli
@ 2021-10-28 18:21   ` Jeff Layton
  2021-10-29  8:14     ` Xiubo Li
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2021-10-28 18:21 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/file.c  | 31 +++++++++++++++++++++----------
>  fs/ceph/super.h |  2 ++
>  2 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 6e677b40410e..74db403a4c35 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -901,20 +901,17 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
>   * If we get a short result from the OSD, check against i_size; we need to
>   * only return a short read to the caller if we hit EOF.
>   */
> -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> -			      int *retry_op)
> +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> +			 struct iov_iter *to, int *retry_op)
>  {
> -	struct file *file = iocb->ki_filp;
> -	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>  	struct ceph_osd_client *osdc = &fsc->client->osdc;
>  	ssize_t ret;
> -	u64 off = iocb->ki_pos;
> +	u64 off = *ki_pos;
>  	u64 len = iov_iter_count(to);
>  
> -	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
> -	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> +	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
>  
>  	if (!len)
>  		return 0;
> @@ -1058,18 +1055,32 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  			break;
>  	}
>  
> -	if (off > iocb->ki_pos) {
> +	if (off > *ki_pos) {
>  		if (ret >= 0 &&
>  		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
>  			*retry_op = CHECK_EOF;
> -		ret = off - iocb->ki_pos;
> -		iocb->ki_pos = off;
> +		ret = off - *ki_pos;
> +		*ki_pos = off;
>  	}
>  out:
>  	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
>  	return ret;
>  }
>  
> +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> +			      int *retry_op)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file_inode(file);
> +
> +	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
> +	     (unsigned)iov_iter_count(to),
> +	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> +
> +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
> +
> +}
> +
>  struct ceph_aio_request {
>  	struct kiocb *iocb;
>  	size_t total_len;
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 027d5f579ba0..57bc952c54e1 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
>  extern int ceph_open(struct inode *inode, struct file *file);
>  extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
>  			    struct file *file, unsigned flags, umode_t mode);
> +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> +				struct iov_iter *to, int *retry_op);
>  extern int ceph_release(struct inode *inode, struct file *filp);
>  extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
>  				  char *data, size_t len);

I went ahead and picked this patch too since #3 had a dependency on it.
If we decide we want #3 for stable though, then we probably ought to
respin these to avoid it.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-28  9:14 ` [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt xiubli
@ 2021-10-28 19:17   ` Jeff Layton
  2021-10-30  5:58     ` Xiubo Li
  2021-10-30  6:20     ` Xiubo Li
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff Layton @ 2021-10-28 19:17 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This will transfer the encrypted last block contents to the MDS
> along with the truncate request only when new size is smaller and
> not aligned to the fscrypt BLOCK size.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/caps.c              |   2 -
>  fs/ceph/file.c              |  10 +-
>  fs/ceph/inode.c             | 182 ++++++++++++++++++++++++++++++++++--
>  fs/ceph/super.h             |   3 +-
>  include/linux/ceph/crypto.h |  23 +++++
>  5 files changed, 206 insertions(+), 14 deletions(-)
>  create mode 100644 include/linux/ceph/crypto.h
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 4e2a588465c5..c9624b059eb0 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1299,8 +1299,6 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
>  	 * fscrypt_auth holds the crypto context (if any). fscrypt_file
>  	 * tracks the real i_size as an __le64 field (and we use a rounded-up
>  	 * i_size in * the traditional size field).
> -	 *
> -	 * FIXME: should we encrypt fscrypt_file field?
>  	 */
>  	ceph_encode_32(&p, arg->fscrypt_auth_len);
>  	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 1988e75ad4a2..2a780e0133df 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -902,7 +902,8 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
>   * only return a short read to the caller if we hit EOF.
>   */
>  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> -			 struct iov_iter *to, int *retry_op)
> +			 struct iov_iter *to, int *retry_op,
> +			 u64 *assert_ver)
>  {
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> @@ -978,6 +979,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>  					 req->r_end_latency,
>  					 len, ret);
>  
> +		/* Grab assert version. It must be non-zero. */
> +		*assert_ver = req->r_version;
> +		WARN_ON_ONCE(assert_ver == 0);
>  		ceph_osdc_put_request(req);
>  
>  		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
> @@ -1074,12 +1078,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file_inode(file);
> +	u64 assert_ver;
>  
>  	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
>  	     (unsigned)iov_iter_count(to),
>  	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>  
> -	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
> +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op,
> +				&assert_ver);
>  
>  }
>  
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 9b798690fdc9..abb634866897 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -21,6 +21,7 @@
>  #include "cache.h"
>  #include "crypto.h"
>  #include <linux/ceph/decode.h>
> +#include <linux/ceph/crypto.h>
>  
>  /*
>   * Ceph inode operations
> @@ -1034,10 +1035,14 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>  		pool_ns = old_ns;
>  
>  		if (IS_ENCRYPTED(inode) && size &&
> -		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
> -			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
> -			if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
> -				pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size);
> +		    (iinfo->fscrypt_file_len >= sizeof(__le64))) {
> +			u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
> +			if (fsize) {
> +				size = fsize;
> +				if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
> +					pr_warn("size=%llu fscrypt_file=%llu\n",
> +						info->size, size);
> +			}
>  		}
>  
>  		queue_trunc = ceph_fill_file_size(inode, issued,
> @@ -2229,6 +2234,130 @@ static const struct inode_operations ceph_encrypted_symlink_iops = {
>  	.listxattr = ceph_listxattr,
>  };
>  
> +/*
> + * Transfer the encrypted last block to the MDS and the MDS
> + * will update the file when truncating a smaller size.
> + *
> + * We don't support a PAGE_SIZE that is smaller than the
> + * CEPH_FSCRYPT_BLOCK_SIZE.
> + */
> +static int fill_fscrypt_truncate(struct inode *inode,
> +				 struct ceph_mds_request *req,
> +				 struct iattr *attr)
> +{
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +	int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE;
> +	loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE);
> +	u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT;
> +	struct ceph_pagelist *pagelist = NULL;
> +	struct kvec iov;
> +	struct iov_iter iter;
> +	struct page *page = NULL;
> +	struct ceph_fscrypt_truncate_size_header header;
> +	int retry_op = 0;
> +	int len = CEPH_FSCRYPT_BLOCK_SIZE;
> +	loff_t i_size = i_size_read(inode);
> +	bool fill_header_only = false;
> +	u64 assert_ver = cpu_to_le64(0);
> +	int got, ret, issued;
> +
> +	ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got);
> +	if (ret < 0)
> +		return ret;
> +
> +	dout("%s size %lld -> %lld got cap refs on %s\n", __func__,
> +	     i_size, attr->ia_size, ceph_cap_string(got));
> +
> +	issued = __ceph_caps_issued(ci, NULL);
> +
> +	/* Try to writeback the dirty pagecaches */
> +	if (issued & (CEPH_CAP_FILE_BUFFER))
> +		filemap_fdatawrite(&inode->i_data);
> +
> +	page = __page_cache_alloc(GFP_KERNEL);
> +	if (page == NULL) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	iov.iov_base = kmap_local_page(page);
> +	iov.iov_len = len;
> +	iov_iter_kvec(&iter, READ, &iov, 1, len);
> +
> +	pos = orig_pos;
> +	ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &assert_ver);
> +
> +	/*
> +	 * If we hit a hole here, we should just skip filling
> +	 * the fscrypt for the request, because once the fscrypt
> +	 * is enabled, the file will be split into many blocks
> +	 * with the size of CEPH_FSCRYPT_BLOCK_SIZE, if there
> +	 * has a hole, the hole size should be multiple of block
> +	 * size.
> +	 */
> +	if (pos < i_size && ret < len) {
> +		dout("%s hit hole, ppos %lld < size %lld\n",
> +		     __func__, pos, i_size);
> +
> +		ret = 0;
> +		fill_header_only = true;
> +		goto fill_last_block;
> +	}
> 
> 
You can drop the goto above if you just put everything between here and
fill_last_block into an else block.

> +
> +	/* truncate and zero out the extra contents for the last block */
> +	memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);
> +
> +	/* encrypt the last block */
> +	ret = fscrypt_encrypt_block_inplace(inode, page,
> +					    CEPH_FSCRYPT_BLOCK_SIZE,
> +					    0, block,
> +					    GFP_KERNEL);
> +	if (ret)
> +		goto out;
> 
> 

Careful with the above. We don't _really_ want to run the encryption
just yet, until we're ready to turn on content encryption everywhere.
Might want to #ifdef 0 that out for now.

> +
> +fill_last_block:
> +	pagelist = ceph_pagelist_alloc(GFP_KERNEL);
> +	if (!pagelist)
> +		return -ENOMEM;
> +
> +	/* Insert the header first */
> +	header.ver = 1;
> +	header.compat = 1;
> +	if (fill_header_only) {
> +		header.data_len = cpu_to_le32(8 + 8 + 4);
> +		header.assert_ver = cpu_to_le64(0);
> +		header.file_offset = cpu_to_le64(0);
> +		header.block_size = cpu_to_le64(0);
> +	} else {
> +		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
> +		header.assert_ver = assert_ver;
> +		header.file_offset = cpu_to_le64(orig_pos);
> +		header.block_size = cpu_to_le64(CEPH_FSCRYPT_BLOCK_SIZE);
> +	}
> +	ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
> +	if (ret)
> +		goto out;
> +
> +	if (!fill_header_only) {
> +		/* Append the last block contents to pagelist */
> +		ret = ceph_pagelist_append(pagelist, iov.iov_base,
> +					   CEPH_FSCRYPT_BLOCK_SIZE);
> +		if (ret)
> +			goto out;
> +	}
> +	req->r_pagelist = pagelist;
> +out:
> +	dout("%s %p size dropping cap refs on %s\n", __func__,
> +	     inode, ceph_cap_string(got));
> +	kunmap_local(iov.iov_base);
> +	if (page)
> +		__free_pages(page, 0);
> +	if (ret && pagelist)
> +		ceph_pagelist_release(pagelist);
> +	ceph_put_cap_refs(ci, got);
> +	return ret;
> +}
> +
>  int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
>  {
>  	struct ceph_inode_info *ci = ceph_inode(inode);
> @@ -2236,12 +2365,14 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  	struct ceph_mds_request *req;
>  	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
>  	struct ceph_cap_flush *prealloc_cf;
> +	loff_t isize = i_size_read(inode);
>  	int issued;
>  	int release = 0, dirtied = 0;
>  	int mask = 0;
>  	int err = 0;
>  	int inode_dirty_flags = 0;
>  	bool lock_snap_rwsem = false;
> +	bool fill_fscrypt;
>  
>  	prealloc_cf = ceph_alloc_cap_flush();
>  	if (!prealloc_cf)
> @@ -2254,6 +2385,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		return PTR_ERR(req);
>  	}
>  
> +retry:
> +	fill_fscrypt = false;
>  	spin_lock(&ci->i_ceph_lock);
>  	issued = __ceph_caps_issued(ci, NULL);
>  
> @@ -2367,10 +2500,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		}
>  	}
>  	if (ia_valid & ATTR_SIZE) {
> -		loff_t isize = i_size_read(inode);
> -
>  		dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size);
> -		if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
> +		/*
> +		 * Only when the new size is smaller and not aligned to
> +		 * CEPH_FSCRYPT_BLOCK_SIZE will the RMW is needed.
> +		 */
> +		if (IS_ENCRYPTED(inode) && attr->ia_size < isize &&
> +		    (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) {
> +			mask |= CEPH_SETATTR_SIZE;
> +			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
> +				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
> +			set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
> +			mask |= CEPH_SETATTR_FSCRYPT_FILE;
> +			req->r_args.setattr.size =
> +				cpu_to_le64(round_up(attr->ia_size,
> +						     CEPH_FSCRYPT_BLOCK_SIZE));
> +			req->r_args.setattr.old_size =
> +				cpu_to_le64(round_up(isize,
> +						     CEPH_FSCRYPT_BLOCK_SIZE));
> +			req->r_fscrypt_file = attr->ia_size;
> +			fill_fscrypt = true;
> +		} else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
>  			if (attr->ia_size > isize) {
>  				i_size_write(inode, attr->ia_size);
>  				inode->i_blocks = calc_inode_blocks(attr->ia_size);
> @@ -2393,7 +2543,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  					cpu_to_le64(round_up(isize,
>  							     CEPH_FSCRYPT_BLOCK_SIZE));
>  				req->r_fscrypt_file = attr->ia_size;
> -				/* FIXME: client must zero out any partial blocks! */
>  			} else {
>  				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
>  				req->r_args.setattr.old_size = cpu_to_le64(isize);
> @@ -2465,7 +2614,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  	if (inode_dirty_flags)
>  		__mark_inode_dirty(inode, inode_dirty_flags);
>  
> -
>  	if (mask) {
>  		req->r_inode = inode;
>  		ihold(inode);
> @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		req->r_args.setattr.mask = cpu_to_le32(mask);
>  		req->r_num_caps = 1;
>  		req->r_stamp = attr->ia_ctime;
> +		if (fill_fscrypt) {
> +			err = fill_fscrypt_truncate(inode, req, attr);
> +			if (err)
> +				goto out;
> +		}
> +
> +		/*
> +		 * The truncate will return -EAGAIN when some one
> +		 * has updated the last block before the MDS hold
> +		 * the xlock for the FILE lock. Need to retry it.
> +		 */
>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> +		if (err == -EAGAIN) {
> +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> +			     inode, err, ceph_cap_string(dirtied), mask);
> +			goto retry;
> +		}

The rest looks reasonable. We may want to cap the number of retries in
case something goes really wrong or in the case of a livelock with a
competing client. I'm not sure what a reasonable number of tries would
be though -- 5? 10? 100? We may want to benchmark out how long this rmw
operation takes and then we can use that to determine a reasonable
number of tries.

If you run out of tries, you could probably  just return -EAGAIN in that
case. That's not listed in the truncate(2) manpage, but it seems like a
reasonable way to handle that sort of problem.

>  	}
>  out:
>  	dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 57bc952c54e1..c8144273ff28 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -1236,7 +1236,8 @@ extern int ceph_open(struct inode *inode, struct file *file);
>  extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
>  			    struct file *file, unsigned flags, umode_t mode);
>  extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> -				struct iov_iter *to, int *retry_op);
> +				struct iov_iter *to, int *retry_op,
> +				u64 *assert_ver);
>  extern int ceph_release(struct inode *inode, struct file *filp);
>  extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
>  				  char *data, size_t len);
> diff --git a/include/linux/ceph/crypto.h b/include/linux/ceph/crypto.h
> new file mode 100644
> index 000000000000..e0f799d4d4ee
> --- /dev/null
> +++ b/include/linux/ceph/crypto.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _FS_CEPH_CRYPTO_H
> +#define _FS_CEPH_CRYPTO_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * Header for the crypted file when truncating the size, this
> + * will be sent to MDS, and MDS will do the encrypted last
> + * block's update and size truncating.
> + */
> +struct ceph_fscrypt_truncate_size_header {
> +       __u8  ver;
> +       __u8  compat;
> +       /* length of sizeof(assert_ver + file_offset + block_size + CEPH_FSCRYPT_BLOCK_SIZE) */
> +       __le32 data_len;
> +
> +       __le64 assert_ver;
> +       __le64 file_offset;
> +       __le32 block_size;
> +} __packed;
> +
> +#endif

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] ceph: add __ceph_sync_read helper support
  2021-10-28 18:21   ` Jeff Layton
@ 2021-10-29  8:14     ` Xiubo Li
  2021-10-29 10:29       ` Jeff Layton
  0 siblings, 1 reply; 19+ messages in thread
From: Xiubo Li @ 2021-10-29  8:14 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/29/21 2:21 AM, Jeff Layton wrote:
> On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/file.c  | 31 +++++++++++++++++++++----------
>>   fs/ceph/super.h |  2 ++
>>   2 files changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 6e677b40410e..74db403a4c35 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -901,20 +901,17 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
>>    * If we get a short result from the OSD, check against i_size; we need to
>>    * only return a short read to the caller if we hit EOF.
>>    */
>> -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>> -			      int *retry_op)
>> +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>> +			 struct iov_iter *to, int *retry_op)
>>   {
>> -	struct file *file = iocb->ki_filp;
>> -	struct inode *inode = file_inode(file);
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>   	struct ceph_osd_client *osdc = &fsc->client->osdc;
>>   	ssize_t ret;
>> -	u64 off = iocb->ki_pos;
>> +	u64 off = *ki_pos;
>>   	u64 len = iov_iter_count(to);
>>   
>> -	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
>> -	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>> +	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
>>   
>>   	if (!len)
>>   		return 0;
>> @@ -1058,18 +1055,32 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>   			break;
>>   	}
>>   
>> -	if (off > iocb->ki_pos) {
>> +	if (off > *ki_pos) {
>>   		if (ret >= 0 &&
>>   		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
>>   			*retry_op = CHECK_EOF;
>> -		ret = off - iocb->ki_pos;
>> -		iocb->ki_pos = off;
>> +		ret = off - *ki_pos;
>> +		*ki_pos = off;
>>   	}
>>   out:
>>   	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
>>   	return ret;
>>   }
>>   
>> +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>> +			      int *retry_op)
>> +{
>> +	struct file *file = iocb->ki_filp;
>> +	struct inode *inode = file_inode(file);
>> +
>> +	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
>> +	     (unsigned)iov_iter_count(to),
>> +	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>> +
>> +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
>> +
>> +}
>> +
>>   struct ceph_aio_request {
>>   	struct kiocb *iocb;
>>   	size_t total_len;
>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>> index 027d5f579ba0..57bc952c54e1 100644
>> --- a/fs/ceph/super.h
>> +++ b/fs/ceph/super.h
>> @@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
>>   extern int ceph_open(struct inode *inode, struct file *file);
>>   extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
>>   			    struct file *file, unsigned flags, umode_t mode);
>> +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>> +				struct iov_iter *to, int *retry_op);
>>   extern int ceph_release(struct inode *inode, struct file *filp);
>>   extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
>>   				  char *data, size_t len);
> I went ahead and picked this patch too since #3 had a dependency on it.
> If we decide we want #3 for stable though, then we probably ought to
> respin these to avoid it.

I saw you have merged these two into the testing branch, should I respin 
for the #3 ?



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] ceph: add __ceph_sync_read helper support
  2021-10-29  8:14     ` Xiubo Li
@ 2021-10-29 10:29       ` Jeff Layton
  2021-10-30  4:05         ` Xiubo Li
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2021-10-29 10:29 UTC (permalink / raw)
  To: Xiubo Li; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

On Fri, 2021-10-29 at 16:14 +0800, Xiubo Li wrote:
> On 10/29/21 2:21 AM, Jeff Layton wrote:
> > On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/file.c  | 31 +++++++++++++++++++++----------
> > >   fs/ceph/super.h |  2 ++
> > >   2 files changed, 23 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 6e677b40410e..74db403a4c35 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -901,20 +901,17 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
> > >    * If we get a short result from the OSD, check against i_size; we need to
> > >    * only return a short read to the caller if we hit EOF.
> > >    */
> > > -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> > > -			      int *retry_op)
> > > +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> > > +			 struct iov_iter *to, int *retry_op)
> > >   {
> > > -	struct file *file = iocb->ki_filp;
> > > -	struct inode *inode = file_inode(file);
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > >   	struct ceph_osd_client *osdc = &fsc->client->osdc;
> > >   	ssize_t ret;
> > > -	u64 off = iocb->ki_pos;
> > > +	u64 off = *ki_pos;
> > >   	u64 len = iov_iter_count(to);
> > >   
> > > -	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
> > > -	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> > > +	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
> > >   
> > >   	if (!len)
> > >   		return 0;
> > > @@ -1058,18 +1055,32 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> > >   			break;
> > >   	}
> > >   
> > > -	if (off > iocb->ki_pos) {
> > > +	if (off > *ki_pos) {
> > >   		if (ret >= 0 &&
> > >   		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
> > >   			*retry_op = CHECK_EOF;
> > > -		ret = off - iocb->ki_pos;
> > > -		iocb->ki_pos = off;
> > > +		ret = off - *ki_pos;
> > > +		*ki_pos = off;
> > >   	}
> > >   out:
> > >   	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
> > >   	return ret;
> > >   }
> > >   
> > > +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> > > +			      int *retry_op)
> > > +{
> > > +	struct file *file = iocb->ki_filp;
> > > +	struct inode *inode = file_inode(file);
> > > +
> > > +	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
> > > +	     (unsigned)iov_iter_count(to),
> > > +	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> > > +
> > > +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
> > > +
> > > +}
> > > +
> > >   struct ceph_aio_request {
> > >   	struct kiocb *iocb;
> > >   	size_t total_len;
> > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > index 027d5f579ba0..57bc952c54e1 100644
> > > --- a/fs/ceph/super.h
> > > +++ b/fs/ceph/super.h
> > > @@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
> > >   extern int ceph_open(struct inode *inode, struct file *file);
> > >   extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
> > >   			    struct file *file, unsigned flags, umode_t mode);
> > > +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> > > +				struct iov_iter *to, int *retry_op);
> > >   extern int ceph_release(struct inode *inode, struct file *filp);
> > >   extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
> > >   				  char *data, size_t len);
> > I went ahead and picked this patch too since #3 had a dependency on it.
> > If we decide we want #3 for stable though, then we probably ought to
> > respin these to avoid it.
> 
> I saw you have merged these two into the testing branch, should I respin 
> for the #3 ?
> 
> 

Yeah, it's probably worthwhile to mark this for stable. It _is_ a data
integrity issue after all.

Could you respin it so that it doesn't rely on patch #2?
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/4] ceph: add __ceph_sync_read helper support
  2021-10-29 10:29       ` Jeff Layton
@ 2021-10-30  4:05         ` Xiubo Li
  0 siblings, 0 replies; 19+ messages in thread
From: Xiubo Li @ 2021-10-30  4:05 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/29/21 6:29 PM, Jeff Layton wrote:
> On Fri, 2021-10-29 at 16:14 +0800, Xiubo Li wrote:
>> On 10/29/21 2:21 AM, Jeff Layton wrote:
>>> On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/file.c  | 31 +++++++++++++++++++++----------
>>>>    fs/ceph/super.h |  2 ++
>>>>    2 files changed, 23 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 6e677b40410e..74db403a4c35 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -901,20 +901,17 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
>>>>     * If we get a short result from the OSD, check against i_size; we need to
>>>>     * only return a short read to the caller if we hit EOF.
>>>>     */
>>>> -static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>>> -			      int *retry_op)
>>>> +ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>>> +			 struct iov_iter *to, int *retry_op)
>>>>    {
>>>> -	struct file *file = iocb->ki_filp;
>>>> -	struct inode *inode = file_inode(file);
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>>    	struct ceph_osd_client *osdc = &fsc->client->osdc;
>>>>    	ssize_t ret;
>>>> -	u64 off = iocb->ki_pos;
>>>> +	u64 off = *ki_pos;
>>>>    	u64 len = iov_iter_count(to);
>>>>    
>>>> -	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
>>>> -	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>>>> +	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
>>>>    
>>>>    	if (!len)
>>>>    		return 0;
>>>> @@ -1058,18 +1055,32 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>>>    			break;
>>>>    	}
>>>>    
>>>> -	if (off > iocb->ki_pos) {
>>>> +	if (off > *ki_pos) {
>>>>    		if (ret >= 0 &&
>>>>    		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
>>>>    			*retry_op = CHECK_EOF;
>>>> -		ret = off - iocb->ki_pos;
>>>> -		iocb->ki_pos = off;
>>>> +		ret = off - *ki_pos;
>>>> +		*ki_pos = off;
>>>>    	}
>>>>    out:
>>>>    	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
>>>> return ret;
>>>>     	 }
>>>>    
>>>> +static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>>> +			      int *retry_op)
>>>> +{
>>>> +	struct file *file = iocb->ki_filp;
>>>> +	struct inode *inode = file_inode(file);
>>>> +
>>>> +	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
>>>> +	     (unsigned)iov_iter_count(to),
>>>> +	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>>>> +
>>>> +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
>>>> +
>>>> +}
>>>> +
>>>>    struct ceph_aio_request {
>>>>    	struct kiocb *iocb;
>>>>    	size_t total_len;
>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>> index 027d5f579ba0..57bc952c54e1 100644
>>>> --- a/fs/ceph/super.h
>>>> +++ b/fs/ceph/super.h
>>>> @@ -1235,6 +1235,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
>>>>    extern int ceph_open(struct inode *inode, struct file *file);
>>>>    extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
>>>>    			    struct file *file, unsigned flags, umode_t mode);
>>>> +extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>>> +				struct iov_iter *to, int *retry_op);
>>>>    extern int ceph_release(struct inode *inode, struct file *filp);
>>>>    extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
>>>>    				  char *data, size_t len);
>>> I went ahead and picked this patch too since #3 had a dependency on it.
>>> If we decide we want #3 for stable though, then we probably ought to
>>> respin these to avoid it.
>> I saw you have merged these two into the testing branch, should I respin
>> for the #3 ?
>>
>>
> Yeah, it's probably worthwhile to mark this for stable. It _is_ a data
> integrity issue after all.
>
> Could you respin it so that it doesn't rely on patch #2?

Sure.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/4] ceph: return the real size readed when hit EOF
  2021-10-28 18:11   ` Jeff Layton
@ 2021-10-30  4:35     ` Xiubo Li
  0 siblings, 0 replies; 19+ messages in thread
From: Xiubo Li @ 2021-10-30  4:35 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/29/21 2:11 AM, Jeff Layton wrote:
> On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/file.c | 14 ++++++++------
>>   1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 74db403a4c35..1988e75ad4a2 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -910,6 +910,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   	ssize_t ret;
>>   	u64 off = *ki_pos;
>>   	u64 len = iov_iter_count(to);
>> +	u64 i_size = i_size_read(inode);
>>   
>>   	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
>>   
>> @@ -933,7 +934,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   		struct page **pages;
>>   		int num_pages;
>>   		size_t page_off;
>> -		u64 i_size;
>>   		bool more;
>>   		int idx;
>>   		size_t left;
>> @@ -980,7 +980,6 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   
>>   		ceph_osdc_put_request(req);
>>   
>> -		i_size = i_size_read(inode);
> I wonder a little about removing this i_size fetch. Then again, the
> i_size could change any time after we fetch it so it doesn't seem
> worthwhile to do so.

I checked the code again. There has 2 ways to change the inode size, 
which are writing a file and truncating the size. Both this are safe by 
protecting by 'inode->i_rwsem' and FILE caps.

If there has no any other will do this in MDS side, such as when doing a 
file recovery, which I don't think it will.

I will add it back for now.

>
>>   		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
>>   		     off, len, ret, i_size, (more ? " MORE" : ""));
>>   
>> @@ -1056,11 +1055,14 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   	}
>>   
>>   	if (off > *ki_pos) {
>> -		if (ret >= 0 &&
>> -		    iov_iter_count(to) > 0 && off >= i_size_read(inode))
>> +		if (off >= i_size) {
>>   			*retry_op = CHECK_EOF;
>> -		ret = off - *ki_pos;
>> -		*ki_pos = off;
>> +			ret = i_size - *ki_pos;
>> +			*ki_pos = i_size;
>> +		} else {
>> +			ret = off - *ki_pos;
>> +			*ki_pos = off;
>> +		}
>>   	}
>>   out:
>>   	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
> I think I'll go ahead and pull this patch into the testing branch, since
> it seems to be correct.
>
> Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-28 19:17   ` Jeff Layton
@ 2021-10-30  5:58     ` Xiubo Li
  2021-10-30  6:20     ` Xiubo Li
  1 sibling, 0 replies; 19+ messages in thread
From: Xiubo Li @ 2021-10-30  5:58 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/29/21 3:17 AM, Jeff Layton wrote:
> On Thu, 2021-10-28 at 17:14 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> This will transfer the encrypted last block contents to the MDS
>> along with the truncate request only when new size is smaller and
>> not aligned to the fscrypt BLOCK size.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/caps.c              |   2 -
>>   fs/ceph/file.c              |  10 +-
>>   fs/ceph/inode.c             | 182 ++++++++++++++++++++++++++++++++++--
>>   fs/ceph/super.h             |   3 +-
>>   include/linux/ceph/crypto.h |  23 +++++
>>   5 files changed, 206 insertions(+), 14 deletions(-)
>>   create mode 100644 include/linux/ceph/crypto.h
>>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 4e2a588465c5..c9624b059eb0 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -1299,8 +1299,6 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
>>   	 * fscrypt_auth holds the crypto context (if any). fscrypt_file
>>   	 * tracks the real i_size as an __le64 field (and we use a rounded-up
>>   	 * i_size in * the traditional size field).
>> -	 *
>> -	 * FIXME: should we encrypt fscrypt_file field?
>>   	 */
>>   	ceph_encode_32(&p, arg->fscrypt_auth_len);
>>   	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 1988e75ad4a2..2a780e0133df 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -902,7 +902,8 @@ static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64
>>    * only return a short read to the caller if we hit EOF.
>>    */
>>   ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>> -			 struct iov_iter *to, int *retry_op)
>> +			 struct iov_iter *to, int *retry_op,
>> +			 u64 *assert_ver)
>>   {
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>> @@ -978,6 +979,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>>   					 req->r_end_latency,
>>   					 len, ret);
>>   
>> +		/* Grab assert version. It must be non-zero. */
>> +		*assert_ver = req->r_version;
>> +		WARN_ON_ONCE(assert_ver == 0);
>>   		ceph_osdc_put_request(req);
>>   
>>   		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
>> @@ -1074,12 +1078,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>   {
>>   	struct file *file = iocb->ki_filp;
>>   	struct inode *inode = file_inode(file);
>> +	u64 assert_ver;
>>   
>>   	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
>>   	     (unsigned)iov_iter_count(to),
>>   	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
>>   
>> -	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
>> +	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op,
>> +				&assert_ver);
>>   
>>   }
>>   
>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>> index 9b798690fdc9..abb634866897 100644
>> --- a/fs/ceph/inode.c
>> +++ b/fs/ceph/inode.c
>> @@ -21,6 +21,7 @@
>>   #include "cache.h"
>>   #include "crypto.h"
>>   #include <linux/ceph/decode.h>
>> +#include <linux/ceph/crypto.h>
>>   
>>   /*
>>    * Ceph inode operations
>> @@ -1034,10 +1035,14 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>>   		pool_ns = old_ns;
>>   
>>   		if (IS_ENCRYPTED(inode) && size &&
>> -		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
>> -			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
>> -			if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
>> -				pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size);
>> +		    (iinfo->fscrypt_file_len >= sizeof(__le64))) {
>> +			u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
>> +			if (fsize) {
>> +				size = fsize;
>> +				if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
>> +					pr_warn("size=%llu fscrypt_file=%llu\n",
>> +						info->size, size);
>> +			}
>>   		}
>>   
>>   		queue_trunc = ceph_fill_file_size(inode, issued,
>> @@ -2229,6 +2234,130 @@ static const struct inode_operations ceph_encrypted_symlink_iops = {
>>   	.listxattr = ceph_listxattr,
>>   };
>>   
>> +/*
>> + * Transfer the encrypted last block to the MDS and the MDS
>> + * will update the file when truncating a smaller size.
>> + *
>> + * We don't support a PAGE_SIZE that is smaller than the
>> + * CEPH_FSCRYPT_BLOCK_SIZE.
>> + */
>> +static int fill_fscrypt_truncate(struct inode *inode,
>> +				 struct ceph_mds_request *req,
>> +				 struct iattr *attr)
>> +{
>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>> +	int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE;
>> +	loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE);
>> +	u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT;
>> +	struct ceph_pagelist *pagelist = NULL;
>> +	struct kvec iov;
>> +	struct iov_iter iter;
>> +	struct page *page = NULL;
>> +	struct ceph_fscrypt_truncate_size_header header;
>> +	int retry_op = 0;
>> +	int len = CEPH_FSCRYPT_BLOCK_SIZE;
>> +	loff_t i_size = i_size_read(inode);
>> +	bool fill_header_only = false;
>> +	u64 assert_ver = cpu_to_le64(0);
>> +	int got, ret, issued;
>> +
>> +	ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	dout("%s size %lld -> %lld got cap refs on %s\n", __func__,
>> +	     i_size, attr->ia_size, ceph_cap_string(got));
>> +
>> +	issued = __ceph_caps_issued(ci, NULL);
>> +
>> +	/* Try to writeback the dirty pagecaches */
>> +	if (issued & (CEPH_CAP_FILE_BUFFER))
>> +		filemap_fdatawrite(&inode->i_data);
>> +
>> +	page = __page_cache_alloc(GFP_KERNEL);
>> +	if (page == NULL) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	iov.iov_base = kmap_local_page(page);
>> +	iov.iov_len = len;
>> +	iov_iter_kvec(&iter, READ, &iov, 1, len);
>> +
>> +	pos = orig_pos;
>> +	ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &assert_ver);
>> +
>> +	/*
>> +	 * If we hit a hole here, we should just skip filling
>> +	 * the fscrypt for the request, because once the fscrypt
>> +	 * is enabled, the file will be split into many blocks
>> +	 * with the size of CEPH_FSCRYPT_BLOCK_SIZE, if there
>> +	 * has a hole, the hole size should be multiple of block
>> +	 * size.
>> +	 */
>> +	if (pos < i_size && ret < len) {
>> +		dout("%s hit hole, ppos %lld < size %lld\n",
>> +		     __func__, pos, i_size);
>> +
>> +		ret = 0;
>> +		fill_header_only = true;
>> +		goto fill_last_block;
>> +	}
>>
>>
> You can drop the goto above if you just put everything between here and
> fill_last_block into an else block.

Sure, will fix it.


>> +
>> +	/* truncate and zero out the extra contents for the last block */
>> +	memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);
>> +
>> +	/* encrypt the last block */
>> +	ret = fscrypt_encrypt_block_inplace(inode, page,
>> +					    CEPH_FSCRYPT_BLOCK_SIZE,
>> +					    0, block,
>> +					    GFP_KERNEL);
>> +	if (ret)
>> +		goto out;
>>
>>
> Careful with the above. We don't _really_ want to run the encryption
> just yet, until we're ready to turn on content encryption everywhere.
> Might want to #ifdef 0 that out for now.

The fill_fscrypt_truncate() will be called by __ceph_setattr() if 
IS_ENCRYPTED(inode) is false.

And this patch series should be based on your previous fscrypt patch set.

>
>> +
>> +fill_last_block:
>> +	pagelist = ceph_pagelist_alloc(GFP_KERNEL);
>> +	if (!pagelist)
>> +		return -ENOMEM;
>> +
>> +	/* Insert the header first */
>> +	header.ver = 1;
>> +	header.compat = 1;
>> +	if (fill_header_only) {
>> +		header.data_len = cpu_to_le32(8 + 8 + 4);
>> +		header.assert_ver = cpu_to_le64(0);
>> +		header.file_offset = cpu_to_le64(0);
>> +		header.block_size = cpu_to_le64(0);
>> +	} else {
>> +		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
>> +		header.assert_ver = assert_ver;
>> +		header.file_offset = cpu_to_le64(orig_pos);
>> +		header.block_size = cpu_to_le64(CEPH_FSCRYPT_BLOCK_SIZE);
>> +	}
>> +	ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (!fill_header_only) {
>> +		/* Append the last block contents to pagelist */
>> +		ret = ceph_pagelist_append(pagelist, iov.iov_base,
>> +					   CEPH_FSCRYPT_BLOCK_SIZE);
>> +		if (ret)
>> +			goto out;
>> +	}
>> +	req->r_pagelist = pagelist;
>> +out:
>> +	dout("%s %p size dropping cap refs on %s\n", __func__,
>> +	     inode, ceph_cap_string(got));
>> +	kunmap_local(iov.iov_base);
>> +	if (page)
>> +		__free_pages(page, 0);
>> +	if (ret && pagelist)
>> +		ceph_pagelist_release(pagelist);
>> +	ceph_put_cap_refs(ci, got);
>> +	return ret;
>> +}
>> +
>>   int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
>>   {
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>> @@ -2236,12 +2365,14 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   	struct ceph_mds_request *req;
>>   	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
>>   	struct ceph_cap_flush *prealloc_cf;
>> +	loff_t isize = i_size_read(inode);
>>   	int issued;
>>   	int release = 0, dirtied = 0;
>>   	int mask = 0;
>>   	int err = 0;
>>   	int inode_dirty_flags = 0;
>>   	bool lock_snap_rwsem = false;
>> +	bool fill_fscrypt;
>>   
>>   	prealloc_cf = ceph_alloc_cap_flush();
>>   	if (!prealloc_cf)
>> @@ -2254,6 +2385,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   		return PTR_ERR(req);
>>   	}
>>   
>> +retry:
>> +	fill_fscrypt = false;
>>   	spin_lock(&ci->i_ceph_lock);
>>   	issued = __ceph_caps_issued(ci, NULL);
>>   
>> @@ -2367,10 +2500,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   		}
>>   	}
>>   	if (ia_valid & ATTR_SIZE) {
>> -		loff_t isize = i_size_read(inode);
>> -
>>   		dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size);
>> -		if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
>> +		/*
>> +		 * Only when the new size is smaller and not aligned to
>> +		 * CEPH_FSCRYPT_BLOCK_SIZE will the RMW is needed.
>> +		 */
>> +		if (IS_ENCRYPTED(inode) && attr->ia_size < isize &&
>> +		    (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) {
>> +			mask |= CEPH_SETATTR_SIZE;
>> +			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
>> +				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
>> +			set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
>> +			mask |= CEPH_SETATTR_FSCRYPT_FILE;
>> +			req->r_args.setattr.size =
>> +				cpu_to_le64(round_up(attr->ia_size,
>> +						     CEPH_FSCRYPT_BLOCK_SIZE));
>> +			req->r_args.setattr.old_size =
>> +				cpu_to_le64(round_up(isize,
>> +						     CEPH_FSCRYPT_BLOCK_SIZE));
>> +			req->r_fscrypt_file = attr->ia_size;
>> +			fill_fscrypt = true;
>> +		} else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
>>   			if (attr->ia_size > isize) {
>>   				i_size_write(inode, attr->ia_size);
>>   				inode->i_blocks = calc_inode_blocks(attr->ia_size);
>> @@ -2393,7 +2543,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   					cpu_to_le64(round_up(isize,
>>   							     CEPH_FSCRYPT_BLOCK_SIZE));
>>   				req->r_fscrypt_file = attr->ia_size;
>> -				/* FIXME: client must zero out any partial blocks! */
>>   			} else {
>>   				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
>>   				req->r_args.setattr.old_size = cpu_to_le64(isize);
>> @@ -2465,7 +2614,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   	if (inode_dirty_flags)
>>   		__mark_inode_dirty(inode, inode_dirty_flags);
>>   
>> -
>>   	if (mask) {
>>   		req->r_inode = inode;
>>   		ihold(inode);
>> @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   		req->r_args.setattr.mask = cpu_to_le32(mask);
>>   		req->r_num_caps = 1;
>>   		req->r_stamp = attr->ia_ctime;
>> +		if (fill_fscrypt) {
>> +			err = fill_fscrypt_truncate(inode, req, attr);
>> +			if (err)
>> +				goto out;
>> +		}
>> +
>> +		/*
>> +		 * The truncate will return -EAGAIN when some one
>> +		 * has updated the last block before the MDS hold
>> +		 * the xlock for the FILE lock. Need to retry it.
>> +		 */
>>   		err = ceph_mdsc_do_request(mdsc, NULL, req);
>> +		if (err == -EAGAIN) {
>> +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>> +			     inode, err, ceph_cap_string(dirtied), mask);
>> +			goto retry;
>> +		}
> The rest looks reasonable. We may want to cap the number of retries in
> case something goes really wrong or in the case of a livelock with a
> competing client. I'm not sure what a reasonable number of tries would
> be though -- 5? 10? 100? We may want to benchmark out how long this rmw
> operation takes and then we can use that to determine a reasonable
> number of tries.
>
> If you run out of tries, you could probably  just return -EAGAIN in that
> case. That's not listed in the truncate(2) manpage, but it seems like a
> reasonable way to handle that sort of problem.

Make sesne, will do that.

Thanks

-- Xiubo



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-28 19:17   ` Jeff Layton
  2021-10-30  5:58     ` Xiubo Li
@ 2021-10-30  6:20     ` Xiubo Li
  2021-10-30 10:52       ` Jeff Layton
  1 sibling, 1 reply; 19+ messages in thread
From: Xiubo Li @ 2021-10-30  6:20 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel



[...]

>> @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>   		req->r_args.setattr.mask = cpu_to_le32(mask);
>>   		req->r_num_caps = 1;
>>   		req->r_stamp = attr->ia_ctime;
>> +		if (fill_fscrypt) {
>> +			err = fill_fscrypt_truncate(inode, req, attr);
>> +			if (err)
>> +				goto out;
>> +		}
>> +
>> +		/*
>> +		 * The truncate will return -EAGAIN when some one
>> +		 * has updated the last block before the MDS hold
>> +		 * the xlock for the FILE lock. Need to retry it.
>> +		 */
>>   		err = ceph_mdsc_do_request(mdsc, NULL, req);
>> +		if (err == -EAGAIN) {
>> +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>> +			     inode, err, ceph_cap_string(dirtied), mask);
>> +			goto retry;
>> +		}
> The rest looks reasonable. We may want to cap the number of retries in
> case something goes really wrong or in the case of a livelock with a
> competing client. I'm not sure what a reasonable number of tries would
> be though -- 5? 10? 100? We may want to benchmark out how long this rmw
> operation takes and then we can use that to determine a reasonable
> number of tries.

<7>[  330.648749] ceph:  setattr 00000000197f0d87 issued pAsxLsXsxFsxcrwb
<7>[  330.648752] ceph:  setattr 00000000197f0d87 size 11 -> 2
<7>[  330.648756] ceph:  setattr 00000000197f0d87 mtime 
1635574177.43176541 -> 1635574210.35946684
<7>[  330.648760] ceph:  setattr 00000000197f0d87 ctime 
1635574177.43176541 -> 1635574210.35946684 (ignored)
<7>[  330.648765] ceph:  setattr 00000000197f0d87 ATTR_FILE ... hrm!
...

<7>[  330.653696] ceph:  fill_fscrypt_truncate 00000000197f0d87 size 
dropping cap refs on Fr
...

<7>[  330.697464] ceph:  setattr 00000000197f0d87 result=0 (Fx locally, 
4128 remote)

It takes around 50ms.

Shall we retry 20 times ?

>
> If you run out of tries, you could probably  just return -EAGAIN in that
> case. That's not listed in the truncate(2) manpage, but it seems like a
> reasonable way to handle that sort of problem.
>
[...]


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-30  6:20     ` Xiubo Li
@ 2021-10-30 10:52       ` Jeff Layton
  2021-10-31  8:56         ` Xiubo Li
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2021-10-30 10:52 UTC (permalink / raw)
  To: Xiubo Li; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel

On Sat, 2021-10-30 at 14:20 +0800, Xiubo Li wrote:
> 
> [...]
> 
> > > @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> > >   		req->r_args.setattr.mask = cpu_to_le32(mask);
> > >   		req->r_num_caps = 1;
> > >   		req->r_stamp = attr->ia_ctime;
> > > +		if (fill_fscrypt) {
> > > +			err = fill_fscrypt_truncate(inode, req, attr);
> > > +			if (err)
> > > +				goto out;
> > > +		}
> > > +
> > > +		/*
> > > +		 * The truncate will return -EAGAIN when some one
> > > +		 * has updated the last block before the MDS hold
> > > +		 * the xlock for the FILE lock. Need to retry it.
> > > +		 */
> > >   		err = ceph_mdsc_do_request(mdsc, NULL, req);
> > > +		if (err == -EAGAIN) {
> > > +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> > > +			     inode, err, ceph_cap_string(dirtied), mask);
> > > +			goto retry;
> > > +		}
> > The rest looks reasonable. We may want to cap the number of retries in
> > case something goes really wrong or in the case of a livelock with a
> > competing client. I'm not sure what a reasonable number of tries would
> > be though -- 5? 10? 100? We may want to benchmark out how long this rmw
> > operation takes and then we can use that to determine a reasonable
> > number of tries.
> 
> <7>[  330.648749] ceph:  setattr 00000000197f0d87 issued pAsxLsXsxFsxcrwb
> <7>[  330.648752] ceph:  setattr 00000000197f0d87 size 11 -> 2
> <7>[  330.648756] ceph:  setattr 00000000197f0d87 mtime 
> 1635574177.43176541 -> 1635574210.35946684
> <7>[  330.648760] ceph:  setattr 00000000197f0d87 ctime 
> 1635574177.43176541 -> 1635574210.35946684 (ignored)
> <7>[  330.648765] ceph:  setattr 00000000197f0d87 ATTR_FILE ... hrm!
> ...
> 
> <7>[  330.653696] ceph:  fill_fscrypt_truncate 00000000197f0d87 size 
> dropping cap refs on Fr
> ...
> 
> <7>[  330.697464] ceph:  setattr 00000000197f0d87 result=0 (Fx locally, 
> 4128 remote)
> 
> It takes around 50ms.
> 
> Shall we retry 20 times ?
> 

Sounds like a good place to start.

> > 
> > If you run out of tries, you could probably  just return -EAGAIN in that
> > case. That's not listed in the truncate(2) manpage, but it seems like a
> > reasonable way to handle that sort of problem.
> > 
> [...]
> 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt
  2021-10-30 10:52       ` Jeff Layton
@ 2021-10-31  8:56         ` Xiubo Li
  0 siblings, 0 replies; 19+ messages in thread
From: Xiubo Li @ 2021-10-31  8:56 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, vshankar, pdonnell, khiremat, ceph-devel


On 10/30/21 6:52 PM, Jeff Layton wrote:
> On Sat, 2021-10-30 at 14:20 +0800, Xiubo Li wrote:
>> [...]
>>
>>>> @@ -2473,7 +2621,23 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>>    		req->r_args.setattr.mask = cpu_to_le32(mask);
>>>>    		req->r_num_caps = 1;
>>>>    		req->r_stamp = attr->ia_ctime;
>>>> +		if (fill_fscrypt) {
>>>> +			err = fill_fscrypt_truncate(inode, req, attr);
>>>> +			if (err)
>>>> +				goto out;
>>>> +		}
>>>> +
>>>> +		/*
>>>> +		 * The truncate will return -EAGAIN when some one
>>>> +		 * has updated the last block before the MDS hold
>>>> +		 * the xlock for the FILE lock. Need to retry it.
>>>> +		 */
>>>>    		err = ceph_mdsc_do_request(mdsc, NULL, req);
>>>> +		if (err == -EAGAIN) {
>>>> +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>>>> +			     inode, err, ceph_cap_string(dirtied), mask);
>>>> +			goto retry;
>>>> +		}
>>> The rest looks reasonable. We may want to cap the number of retries in
>>> case something goes really wrong or in the case of a livelock with a
>>> competing client. I'm not sure what a reasonable number of tries would
>>> be though -- 5? 10? 100? We may want to benchmark out how long this rmw
>>> operation takes and then we can use that to determine a reasonable
>>> number of tries.
>> <7>[  330.648749] ceph:  setattr 00000000197f0d87 issued pAsxLsXsxFsxcrwb
>> <7>[  330.648752] ceph:  setattr 00000000197f0d87 size 11 -> 2
>> <7>[  330.648756] ceph:  setattr 00000000197f0d87 mtime
>> 1635574177.43176541 -> 1635574210.35946684
>> <7>[  330.648760] ceph:  setattr 00000000197f0d87 ctime
>> 1635574177.43176541 -> 1635574210.35946684 (ignored)
>> <7>[  330.648765] ceph:  setattr 00000000197f0d87 ATTR_FILE ... hrm!
>> ...
>>
>> <7>[  330.653696] ceph:  fill_fscrypt_truncate 00000000197f0d87 size
>> dropping cap refs on Fr
>> ...
>>
>> <7>[  330.697464] ceph:  setattr 00000000197f0d87 result=0 (Fx locally,
>> 4128 remote)
>>
>> It takes around 50ms.
>>
>> Shall we retry 20 times ?
>>
> Sounds like a good place to start.
>
Cool, will set it to 20 for now.

Thanks


>>> If you run out of tries, you could probably  just return -EAGAIN in that
>>> case. That's not listed in the truncate(2) manpage, but it seems like a
>>> reasonable way to handle that sort of problem.
>>>
>> [...]
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-10-31  8:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28  9:14 [PATCH v3 0/4] ceph: size handling for the fscrypt xiubli
2021-10-28  9:14 ` [PATCH v3 1/4] ceph: add __ceph_get_caps helper support xiubli
2021-10-28  9:23   ` Xiubo Li
2021-10-28 10:44     ` Jeff Layton
2021-10-28 12:41       ` Xiubo Li
2021-10-28  9:14 ` [PATCH v3 2/4] ceph: add __ceph_sync_read " xiubli
2021-10-28 18:21   ` Jeff Layton
2021-10-29  8:14     ` Xiubo Li
2021-10-29 10:29       ` Jeff Layton
2021-10-30  4:05         ` Xiubo Li
2021-10-28  9:14 ` [PATCH v3 3/4] ceph: return the real size readed when hit EOF xiubli
2021-10-28 18:11   ` Jeff Layton
2021-10-30  4:35     ` Xiubo Li
2021-10-28  9:14 ` [PATCH v3 4/4] ceph: add truncate size handling support for fscrypt xiubli
2021-10-28 19:17   ` Jeff Layton
2021-10-30  5:58     ` Xiubo Li
2021-10-30  6:20     ` Xiubo Li
2021-10-30 10:52       ` Jeff Layton
2021-10-31  8:56         ` Xiubo Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).