* [RFC PATCH v10 00/48] ceph+fscrypt: full support
@ 2022-01-11 19:15 Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 01/48] vfs: export new_inode_pseudo Jeff Layton
                   ` (51 more replies)
  0 siblings, 52 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

This patchset represents a (mostly) complete rough draft of fscrypt
support for cephfs. The context, filename and symlink support is more or
less the same as the versions posted before, and comprises the first half
of the patches.

The new bits here are the size handling changes and support for content
encryption, in buffered, direct and synchronous codepaths. Much of this
code is still very rough and needs a lot of cleanup work.

fscrypt support relies on some MDS changes that are being tracked here:

    https://github.com/ceph/ceph/pull/43588

In particular, this PR adds some new opaque fields in the inode that we
use to store fscrypt-specific information, like the context and the real
size of a file. That is slated to be merged for the upcoming Quincy
release (which is sometime this northern spring).

There are still some notable bugs:

1/ we've identified a few more potential races in truncate handling
which will probably necessitate a protocol change, as well as changes to
the MDS and kclient patchsets. The good news is that we think we have
an approach that will resolve this.

2/ the kclient doesn't handle reading sparse regions in OSD objects
properly yet. The client can end up writing to a non-zero offset in a
non-existent object. Then, if the client tries to read the written
region back later, it'll get back zeroes and give you garbage when you
try to decrypt them.

It turns out that the OSD already supports a SPARSE_READ operation, so
I'm working on implementing that in the kclient to make it not try to
decrypt the sparse regions.

Still, I was able to run xfstests on this set yesterday. Bug #2 above
prevented all of the tests from passing, but it didn't oops! I call that
progress! Given that, I figured this is a good time to post what I have
so far.

Note that the buffered I/O changes in this set are not suitable for
merge and will likely end up being discarded. We need to plumb the
encryption in at the netfs layer, so that we can store encrypted data
in fscache.

The non-buffered codepaths will likely also need substantial changes
before merging. It may be simpler to just move that into the netfs layer
too as cifs will need something similar anyway.

My goal is to get most of this into v5.18, but v5.19 might be more
realistic. Hopefully I'll have a non-RFC patchset to send in a few
weeks.

Special thanks to Xiubo who came through with the MDS patches. Also,
thanks to everyone (especially Eric Biggers) for all of the previous
reviews. It's much appreciated!

Jeff Layton (43):
  vfs: export new_inode_pseudo
  fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
  fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  fscrypt: add fscrypt_context_for_new_inode
  ceph: preallocate inode for ops that may create one
  ceph: crypto context handling for ceph
  ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
  ceph: add fscrypt_* handling to caps.c
  ceph: add ability to set fscrypt_auth via setattr
  ceph: implement -o test_dummy_encryption mount option
  ceph: decode alternate_name in lease info
  ceph: add fscrypt ioctls
  ceph: make ceph_mdsc_build_path use ref-walk
  ceph: add encrypted fname handling to ceph_mdsc_build_path
  ceph: send altname in MClientRequest
  ceph: encode encrypted name in dentry release
  ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  ceph: make d_revalidate call fscrypt revalidator for encrypted
    dentries
  ceph: add helpers for converting names for userland presentation
  ceph: add fscrypt support to ceph_fill_trace
  ceph: add support to readdir for encrypted filenames
  ceph: create symlinks with encrypted and base64-encoded targets
  ceph: make ceph_get_name decrypt filenames
  ceph: add a new ceph.fscrypt.auth vxattr
  ceph: add some fscrypt guardrails
  libceph: add CEPH_OSD_OP_ASSERT_VER support
  ceph: size handling for encrypted inodes in cap updates
  ceph: fscrypt_file field handling in MClientRequest messages
  ceph: get file size from fscrypt_file when present in inode traces
  ceph: handle fscrypt fields in cap messages from MDS
  ceph: add infrastructure for file encryption and decryption
  libceph: allow ceph_osdc_new_request to accept a multi-op read
  ceph: disable fallocate for encrypted inodes
  ceph: disable copy offload on encrypted inodes
  ceph: don't use special DIO path for encrypted inodes
  ceph: set encryption context on open
  ceph: align data in pages in ceph_sync_write
  ceph: add read/modify/write to ceph_sync_write
  ceph: plumb in decryption during sync reads
  ceph: set i_blkbits to crypto block size for encrypted inodes
  ceph: add fscrypt decryption support to ceph_netfs_issue_op
  ceph: add encryption support to writepage
  ceph: fscrypt support for writepages

Luis Henriques (1):
  ceph: don't allow changing layout on encrypted files/directories

Xiubo Li (4):
  ceph: add __ceph_get_caps helper support
  ceph: add __ceph_sync_read helper support
  ceph: add object version support for sync read
  ceph: add truncate size handling support for fscrypt

 fs/ceph/Makefile                |   1 +
 fs/ceph/acl.c                   |   4 +-
 fs/ceph/addr.c                  | 128 +++++--
 fs/ceph/caps.c                  | 211 ++++++++++--
 fs/ceph/crypto.c                | 374 +++++++++++++++++++++
 fs/ceph/crypto.h                | 237 +++++++++++++
 fs/ceph/dir.c                   | 209 +++++++++---
 fs/ceph/export.c                |  44 ++-
 fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
 fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
 fs/ceph/ioctl.c                 |  87 +++++
 fs/ceph/mds_client.c            | 349 ++++++++++++++++---
 fs/ceph/mds_client.h            |  24 +-
 fs/ceph/super.c                 |  90 ++++-
 fs/ceph/super.h                 |  43 ++-
 fs/ceph/xattr.c                 |  29 ++
 fs/crypto/fname.c               |  44 ++-
 fs/crypto/fscrypt_private.h     |   9 +-
 fs/crypto/hooks.c               |   6 +-
 fs/crypto/policy.c              |  35 +-
 fs/inode.c                      |   1 +
 include/linux/ceph/ceph_fs.h    |  21 +-
 include/linux/ceph/osd_client.h |   6 +-
 include/linux/ceph/rados.h      |   4 +
 include/linux/fscrypt.h         |  10 +
 net/ceph/osd_client.c           |  32 +-
 26 files changed, 2700 insertions(+), 350 deletions(-)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

-- 
2.34.1



* [RFC PATCH v10 01/48] vfs: export new_inode_pseudo
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 02/48] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode Jeff Layton
                   ` (50 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Al Viro

Ceph needs to be able to allocate inodes ahead of a create that might
involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
but it puts the inode on the sb->s_inodes list and when we go to hash
it, that might be done again.

We could work around that by setting I_CREATING on the new inode, but
that causes ilookup5 to return -ESTALE if something tries to find it
before I_NEW is cleared. This is desirable behavior for most
filesystems, but doesn't work for ceph.

To work around all of this, just use new_inode_pseudo which doesn't add
it to the sb->s_inodes list.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index 6b80a51129d5..7fd85501bb32 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -951,6 +951,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
 	}
 	return inode;
 }
+EXPORT_SYMBOL(new_inode_pseudo);
 
 /**
  *	new_inode 	- obtain an inode
-- 
2.34.1



* [RFC PATCH v10 02/48] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 01/48] vfs: export new_inode_pseudo Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
                   ` (49 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Eric Biggers

Ceph is going to add fscrypt support, but we still want encrypted
filenames to be composed of printable characters, so we can maintain
compatibility with clients that don't support fscrypt.

We could just adopt fscrypt's current nokey name format, but that is
subject to change in the future, and it also contains dirhash fields
that we don't need for cephfs. Because of this, we're going to concoct
our own scheme for encoding encrypted filenames. It's very similar to
fscrypt's current scheme, but doesn't bother with the dirhash fields.

The ceph encoding scheme will use base64 encoding as well, and we also
want it to avoid characters that are illegal in filenames. Export the
fscrypt base64 encoding/decoding routines so we can use them in ceph's
fscrypt implementation.
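
For illustration, the unpadded base64url encoding that these routines
implement can be sketched in userspace C roughly as follows. This is a
minimal sketch modeled on the hunks quoted in this patch, not the kernel
code itself; the demo_* names and DEMO_BASE64URL_CHARS are hypothetical
stand-ins for the fscrypt_* symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same alphabet as base64url_table in the diff: '-' and '_' replace '+' and '/'. */
static const char demo_base64url_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Userspace stand-in for FSCRYPT_BASE64URL_CHARS(): ceil(nbytes * 4 / 3). */
#define DEMO_BASE64URL_CHARS(nbytes)	(((nbytes) * 4 + 2) / 3)

/*
 * Encode srclen bytes from src into dst without '=' padding.
 * dst must hold at least DEMO_BASE64URL_CHARS(srclen) bytes; no NUL is added.
 * Returns the number of characters written.
 */
static int demo_base64url_encode(const uint8_t *src, int srclen, char *dst)
{
	uint32_t ac = 0;	/* bit accumulator */
	int bits = 0;		/* number of unconsumed bits in ac */
	char *cp = dst;
	int i;

	assert(srclen >= 0);

	for (i = 0; i < srclen; i++) {
		ac = (ac << 8) | src[i];
		bits += 8;
		do {
			bits -= 6;
			*cp++ = demo_base64url_table[(ac >> bits) & 0x3f];
		} while (bits >= 6);
	}
	/* Flush the final partial group, left-aligned in a 6-bit symbol. */
	if (bits)
		*cp++ = demo_base64url_table[(ac << (6 - bits)) & 0x3f];
	return (int)(cp - dst);
}
```

Every output character comes from [A-Za-z0-9_-], so the result never
contains '/' or NUL, which is the property the filename encoding needs.
Three input bytes become four characters, so a 32-byte input needs
DEMO_BASE64URL_CHARS(32) = 43 characters.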

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/fname.c       | 8 ++++----
 include/linux/fscrypt.h | 5 +++++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index a9be4bc74a94..1e4233c95005 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -182,8 +182,6 @@ static int fname_decrypt(const struct inode *inode,
 static const char base64url_table[65] =
 	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
 
-#define FSCRYPT_BASE64URL_CHARS(nbytes)	DIV_ROUND_UP((nbytes) * 4, 3)
-
 /**
  * fscrypt_base64url_encode() - base64url-encode some binary data
  * @src: the binary data to encode
@@ -198,7 +196,7 @@ static const char base64url_table[65] =
  * Return: the length of the resulting base64url-encoded string in bytes.
  *	   This will be equal to FSCRYPT_BASE64URL_CHARS(srclen).
  */
-static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
+int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
 {
 	u32 ac = 0;
 	int bits = 0;
@@ -217,6 +215,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
 		*cp++ = base64url_table[(ac << (6 - bits)) & 0x3f];
 	return cp - dst;
 }
+EXPORT_SYMBOL_GPL(fscrypt_base64url_encode);
 
 /**
  * fscrypt_base64url_decode() - base64url-decode a string
@@ -233,7 +232,7 @@ static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst)
  * Return: the length of the resulting decoded binary data in bytes,
  *	   or -1 if the string isn't a valid base64url string.
  */
-static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
+int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
 {
 	u32 ac = 0;
 	int bits = 0;
@@ -256,6 +255,7 @@ static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
 		return -1;
 	return bp - dst;
 }
+EXPORT_SYMBOL_GPL(fscrypt_base64url_decode);
 
 bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 				  u32 orig_len, u32 max_len,
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 91ea9477e9bd..671181d196a8 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -46,6 +46,9 @@ struct fscrypt_name {
 /* Maximum value for the third parameter of fscrypt_operations.set_context(). */
 #define FSCRYPT_SET_CONTEXT_MAX_SIZE	40
 
+/* len of resulting string (sans NUL terminator) after base64 encoding nbytes */
+#define FSCRYPT_BASE64URL_CHARS(nbytes)		DIV_ROUND_UP((nbytes) * 4, 3)
+
 #ifdef CONFIG_FS_ENCRYPTION
 
 /*
@@ -305,6 +308,8 @@ void fscrypt_free_inode(struct inode *inode);
 int fscrypt_drop_inode(struct inode *inode);
 
 /* fname.c */
+int fscrypt_base64url_encode(const u8 *src, int len, char *dst);
+int fscrypt_base64url_decode(const char *src, int len, u8 *dst);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.34.1



* [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 01/48] vfs: export new_inode_pseudo Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 02/48] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-27  1:58   ` Eric Biggers
  2022-01-11 19:15 ` [RFC PATCH v10 04/48] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
                   ` (48 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

For ceph, we want to use our own scheme for handling filenames that are
longer than NAME_MAX after encryption and Base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.
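
As a rough illustration of the size calculation that the new wrapper
delegates to, here is a userspace sketch. It follows the kernel-doc
added in this patch (false if orig_len exceeds max_len, otherwise the
padded length capped at max_len). The demo_* names are hypothetical, the
policy is reduced to its flags byte, and the 16-byte minimum ciphertext
length is an assumption that is not visible in the hunks quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAD_MASK		0x03	/* stand-in for FSCRYPT_POLICY_FLAGS_PAD_MASK */
#define DEMO_MIN_CIPHERTEXT	16	/* assumed minimum encrypted name length */

/*
 * Compute the encrypted length of a filename of orig_len bytes.
 * Returns false if orig_len > max_len; otherwise stores the padded
 * length, capped at max_len, in *encrypted_len_ret and returns true.
 */
static bool demo_fname_encrypted_size(uint8_t policy_flags, uint32_t orig_len,
				      uint32_t max_len,
				      uint32_t *encrypted_len_ret)
{
	/* The low two flag bits select 4, 8, 16 or 32 bytes of padding. */
	uint32_t padding = 4u << (policy_flags & DEMO_PAD_MASK);
	uint32_t len;

	assert(encrypted_len_ret);

	if (orig_len > max_len)
		return false;

	len = orig_len < DEMO_MIN_CIPHERTEXT ? DEMO_MIN_CIPHERTEXT : orig_len;
	/* Round up to the padding size, which is always a power of two. */
	len = (len + padding - 1) & ~(padding - 1);
	*encrypted_len_ret = len < max_len ? len : max_len;
	return true;
}
```

With 32-byte padding (flags value 3), a 17-byte name grows to 32 bytes,
while a 250-byte name would round up to 256 and is therefore capped at a
max_len of 255.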

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/fname.c           | 36 ++++++++++++++++++++++++++++++------
 fs/crypto/fscrypt_private.h |  9 +++------
 fs/crypto/hooks.c           |  6 +++---
 include/linux/fscrypt.h     |  4 ++++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 1e4233c95005..733ae43da6ec 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -79,7 +79,8 @@ static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
 /**
  * fscrypt_fname_encrypt() - encrypt a filename
  * @inode: inode of the parent directory (for regular filenames)
- *	   or of the symlink (for symlink targets)
+ *	   or of the symlink (for symlink targets). Key must already be
+ *	   set up.
  * @iname: the filename to encrypt
  * @out: (output) the encrypted filename
  * @olen: size of the encrypted filename.  It must be at least @iname->len.
@@ -130,6 +131,7 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(fscrypt_fname_encrypt);
 
 /**
  * fname_decrypt() - decrypt a filename
@@ -257,9 +259,9 @@ int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst)
 }
 EXPORT_SYMBOL_GPL(fscrypt_base64url_decode);
 
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret)
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+				    u32 *encrypted_len_ret)
 {
 	int padding = 4 << (fscrypt_policy_flags(policy) &
 			    FSCRYPT_POLICY_FLAGS_PAD_MASK);
@@ -273,6 +275,29 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 	return true;
 }
 
+/**
+ * fscrypt_fname_encrypted_size() - calculate length of encrypted filename
+ * @inode: 		parent inode of dentry name being encrypted. Key must
+ * 			already be set up.
+ * @orig_len:		length of the original filename
+ * @max_len:		maximum length to return
+ * @encrypted_len_ret:	where calculated length should be returned (on success)
+ *
+ * Filenames that are shorter than the maximum length may have their lengths
+ * increased slightly by encryption, due to padding that is applied.
+ *
+ * Return: false if the orig_len is greater than max_len. Otherwise, true and
+ * 	   fill out encrypted_len_ret with the length (up to max_len).
+ */
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret)
+{
+	return __fscrypt_fname_encrypted_size(&inode->i_crypt_info->ci_policy,
+					      orig_len, max_len,
+					      encrypted_len_ret);
+}
+EXPORT_SYMBOL_GPL(fscrypt_fname_encrypted_size);
+
 /**
  * fscrypt_fname_alloc_buffer() - allocate a buffer for presented filenames
  * @max_encrypted_len: maximum length of encrypted filenames the buffer will be
@@ -428,8 +453,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
 		return ret;
 
 	if (fscrypt_has_encryption_key(dir)) {
-		if (!fscrypt_fname_encrypted_size(&dir->i_crypt_info->ci_policy,
-						  iname->len, NAME_MAX,
+		if (!fscrypt_fname_encrypted_size(dir, iname->len, NAME_MAX,
 						  &fname->crypto_buf.len))
 			return -ENAMETOOLONG;
 		fname->crypto_buf.name = kmalloc(fname->crypto_buf.len,
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 5b0a9e6478b5..f3e6e566daff 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -297,14 +297,11 @@ void fscrypt_generate_iv(union fscrypt_iv *iv, u64 lblk_num,
 			 const struct fscrypt_info *ci);
 
 /* fname.c */
-int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
-			  u8 *out, unsigned int olen);
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret);
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+				    u32 *encrypted_len_ret);
 
 /* hkdf.c */
-
 struct fscrypt_hkdf {
 	struct crypto_shash *hmac_tfm;
 };
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index af74599ae1cf..7c01025879b3 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -228,9 +228,9 @@ int fscrypt_prepare_symlink(struct inode *dir, const char *target,
 	 * counting it (even though it is meaningless for ciphertext) is simpler
 	 * for now since filesystems will assume it is there and subtract it.
 	 */
-	if (!fscrypt_fname_encrypted_size(policy, len,
-					  max_len - sizeof(struct fscrypt_symlink_data),
-					  &disk_link->len))
+	if (!__fscrypt_fname_encrypted_size(policy, len,
+					    max_len - sizeof(struct fscrypt_symlink_data),
+					    &disk_link->len))
 		return -ENAMETOOLONG;
 	disk_link->len += sizeof(struct fscrypt_symlink_data);
 
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 671181d196a8..c90e176b5843 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -308,8 +308,12 @@ void fscrypt_free_inode(struct inode *inode);
 int fscrypt_drop_inode(struct inode *inode);
 
 /* fname.c */
+int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
+			  u8 *out, unsigned int olen);
 int fscrypt_base64url_encode(const u8 *src, int len, char *dst);
 int fscrypt_base64url_decode(const char *src, int len, u8 *dst);
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.34.1



* [RFC PATCH v10 04/48] fscrypt: add fscrypt_context_for_new_inode
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (2 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 05/48] ceph: preallocate inode for ops that may create one Jeff Layton
                   ` (47 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Eric Biggers

Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.

Doing so allows it to avoid race windows where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.
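
The shape of this refactoring can be sketched in userspace as below:
one function serializes the context into a caller-supplied buffer with
no storage side effect, and the existing "set" path becomes a thin
wrapper that builds the context and then persists it. All demo_* types
and the one-byte-version-plus-nonce layout are hypothetical
simplifications, not the real fscrypt context format.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value, for platforms that lack it */
#endif

#define DEMO_NONCE_SIZE		16
#define DEMO_CONTEXT_MAX_SIZE	40	/* same value as FSCRYPT_SET_CONTEXT_MAX_SIZE */

struct demo_crypt_info {
	uint8_t policy_version;
	uint8_t nonce[DEMO_NONCE_SIZE];
};

struct demo_inode {
	struct demo_crypt_info *crypt_info;	/* NULL until the inode is prepared */
	uint8_t stored_ctx[DEMO_CONTEXT_MAX_SIZE];
	int stored_len;
};

/* Write the context to ctx; returns its size or a negative error code. */
static int demo_context_for_new_inode(void *ctx, const struct demo_inode *inode)
{
	const struct demo_crypt_info *ci = inode->crypt_info;
	uint8_t *p = ctx;

	assert(ctx != NULL);

	if (!ci)
		return -ENOKEY;

	p[0] = ci->policy_version;
	memcpy(p + 1, ci->nonce, DEMO_NONCE_SIZE);
	return 1 + DEMO_NONCE_SIZE;
}

/* The "set" path: build the context, then persist it on the inode. */
static int demo_set_context(struct demo_inode *inode)
{
	uint8_t ctx[DEMO_CONTEXT_MAX_SIZE];
	int len = demo_context_for_new_inode(ctx, inode);

	if (len < 0)
		return len;
	memcpy(inode->stored_ctx, ctx, (size_t)len);	/* stands in for a setxattr */
	inode->stored_len = len;
	return 0;
}
```

A caller that wants to ship the context inside another message, as ceph
does with its create request, would call only the first function and
copy the returned number of bytes into its own buffer.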

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/policy.c      | 35 +++++++++++++++++++++++++++++------
 include/linux/fscrypt.h |  1 +
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index ed3d623724cd..ec861af96252 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -664,6 +664,32 @@ const union fscrypt_policy *fscrypt_policy_to_inherit(struct inode *dir)
 	return fscrypt_get_dummy_policy(dir->i_sb);
 }
 
+/**
+ * fscrypt_context_for_new_inode() - create an encryption context for a new inode
+ * @ctx: where context should be written
+ * @inode: inode from which to fetch policy and nonce
+ *
+ * Given an in-core "prepared" (via fscrypt_prepare_new_inode) inode,
+ * generate a new context and write it to ctx. ctx _must_ be at least
+ * FSCRYPT_SET_CONTEXT_MAX_SIZE bytes.
+ *
+ * Return: size of the resulting context or a negative error code.
+ */
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode)
+{
+	struct fscrypt_info *ci = inode->i_crypt_info;
+
+	BUILD_BUG_ON(sizeof(union fscrypt_context) !=
+			FSCRYPT_SET_CONTEXT_MAX_SIZE);
+
+	/* fscrypt_prepare_new_inode() should have set up the key already. */
+	if (WARN_ON_ONCE(!ci))
+		return -ENOKEY;
+
+	return fscrypt_new_context(ctx, &ci->ci_policy, ci->ci_nonce);
+}
+EXPORT_SYMBOL_GPL(fscrypt_context_for_new_inode);
+
 /**
  * fscrypt_set_context() - Set the fscrypt context of a new inode
  * @inode: a new inode
@@ -680,12 +706,9 @@ int fscrypt_set_context(struct inode *inode, void *fs_data)
 	union fscrypt_context ctx;
 	int ctxsize;
 
-	/* fscrypt_prepare_new_inode() should have set up the key already. */
-	if (WARN_ON_ONCE(!ci))
-		return -ENOKEY;
-
-	BUILD_BUG_ON(sizeof(ctx) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
-	ctxsize = fscrypt_new_context(&ctx, &ci->ci_policy, ci->ci_nonce);
+	ctxsize = fscrypt_context_for_new_inode(&ctx, inode);
+	if (ctxsize < 0)
+		return ctxsize;
 
 	/*
 	 * This may be the first time the inode number is available, so do any
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index c90e176b5843..530433098f82 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -276,6 +276,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
 int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode);
 int fscrypt_set_context(struct inode *inode, void *fs_data);
 
 struct fscrypt_dummy_policy {
-- 
2.34.1



* [RFC PATCH v10 05/48] ceph: preallocate inode for ops that may create one
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (3 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 04/48] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 06/48] ceph: crypto context handling for ceph Jeff Layton
                   ` (46 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

When creating a new inode, we need to determine the crypto context
before we can transmit the RPC. The fscrypt API has a routine for getting
a crypto context before a create occurs, but it requires an inode.

Change the ceph code to preallocate an inode in advance of a create of
any sort (open(), mknod(), symlink(), etc). Move the existing code that
generates the ACL and SELinux blobs into this routine since that's
mostly common across all the different codepaths.

In most cases, we just want to allow ceph_fill_trace to use that inode
after the reply comes in, so add a new field to the MDS request for it
(r_new_inode).

The async create codepath is a bit different though. In that case, we
want to hash the inode in advance of the RPC so that it can be used
before the reply comes in. If the call subsequently fails with
-EJUKEBOX, then just put the references and clean up the as_ctx. Note
that with this change, we now need to regenerate the as_ctx when this
occurs, but it's quite rare for it to happen.
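
The error handling this introduces in the create paths (store the
result in r_new_inode, reset it to NULL if it is an error pointer, then
jump to a label that puts the request) can be illustrated with a
userspace sketch. The demo_* helpers are hypothetical re-creations of
the kernel's ERR_PTR/IS_ERR/PTR_ERR idiom, and the sketch assumes that
the request's release path drops any inode still attached to it, which
is what the NULL reset suggests but is not shown in the hunks here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO	4095

/* Encode a negative errno in a pointer, as the kernel's ERR_PTR() does. */
static void *demo_err_ptr(long error) { return (void *)(intptr_t)error; }
static long demo_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_inode { int mode; };
struct demo_request { struct demo_inode *new_inode; };

static int demo_live_inodes;	/* allocation counter, to check for leaks */

static struct demo_inode *demo_new_inode(int mode, int fail)
{
	struct demo_inode *inode;

	if (fail)
		return demo_err_ptr(-ENOMEM);
	inode = malloc(sizeof(*inode));
	if (!inode)
		return demo_err_ptr(-ENOMEM);
	inode->mode = mode;
	demo_live_inodes++;
	return inode;
}

/* Drops whatever is still attached; must never see an error pointer. */
static void demo_put_request(struct demo_request *req)
{
	assert(!demo_is_err(req->new_inode));
	if (req->new_inode) {
		free(req->new_inode);
		demo_live_inodes--;
	}
	free(req);
}

static int demo_mknod(int mode, int fail)
{
	struct demo_request *req = calloc(1, sizeof(*req));
	int err = 0;

	if (!req)
		return -ENOMEM;

	req->new_inode = demo_new_inode(mode, fail);
	if (demo_is_err(req->new_inode)) {
		err = (int)demo_ptr_err(req->new_inode);
		req->new_inode = NULL;	/* keep the put path from touching it */
		goto out_req;
	}
	/* ... fill in and submit the request here ... */
out_req:
	demo_put_request(req);
	return err;
}
```

Without the reset to NULL, the cleanup path would treat the encoded
error value as a real inode pointer.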

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c        | 70 ++++++++++++++++++++-----------------
 fs/ceph/file.c       | 62 ++++++++++++++++++++-------------
 fs/ceph/inode.c      | 82 ++++++++++++++++++++++++++++++++++++++++----
 fs/ceph/mds_client.c |  3 +-
 fs/ceph/mds_client.h |  1 +
 fs/ceph/super.h      |  7 +++-
 6 files changed, 160 insertions(+), 65 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 133dbd9338e7..288f6f0b4b74 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -852,13 +852,6 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n",
 	     dir, dentry, mode, rdev);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
@@ -866,6 +859,14 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -875,13 +876,13 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	req->r_args.mknod.rdev = cpu_to_le32(rdev);
 	req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
 	req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
-	if (as_ctx.pagelist) {
-		req->r_pagelist = as_ctx.pagelist;
-		as_ctx.pagelist = NULL;
-	}
+
+	ceph_as_ctx_to_req(req, &as_ctx);
+
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
@@ -904,6 +905,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
 	struct ceph_acl_sec_ctx as_ctx = {};
+	umode_t mode = S_IFLNK | 0777;
 	int err;
 
 	if (ceph_snap(dir) != CEPH_NOSNAP)
@@ -914,21 +916,24 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_path2 = kstrdup(dest, GFP_KERNEL);
 	if (!req->r_path2) {
 		err = -ENOMEM;
-		ceph_mdsc_put_request(req);
-		goto out;
+		goto out_req;
 	}
 	req->r_parent = dir;
 	ihold(dir);
@@ -938,13 +943,13 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	req->r_num_caps = 2;
 	req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
 	req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
-	if (as_ctx.pagelist) {
-		req->r_pagelist = as_ctx.pagelist;
-		as_ctx.pagelist = NULL;
-	}
+
+	ceph_as_ctx_to_req(req, &as_ctx);
+
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (err)
@@ -980,13 +985,6 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	mode |= S_IFDIR;
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
 
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
@@ -994,6 +992,14 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
+	mode |= S_IFDIR;
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -1002,15 +1008,15 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	req->r_args.mkdir.mode = cpu_to_le32(mode);
 	req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
 	req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
-	if (as_ctx.pagelist) {
-		req->r_pagelist = as_ctx.pagelist;
-		as_ctx.pagelist = NULL;
-	}
+
+	ceph_as_ctx_to_req(req, &as_ctx);
+
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err &&
 	    !req->r_reply_info.head->is_target &&
 	    !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5b9104b8e453..ace72a052254 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -570,7 +570,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
 	ceph_mdsc_release_dir_caps(req);
 }
 
-static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
+				    struct dentry *dentry,
 				    struct file *file, umode_t mode,
 				    struct ceph_mds_request *req,
 				    struct ceph_acl_sec_ctx *as_ctx,
@@ -581,7 +582,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_reply_inode in = { };
 	struct ceph_mds_reply_info_in iinfo = { .in = &in };
 	struct ceph_inode_info *ci = ceph_inode(dir);
-	struct inode *inode;
 	struct timespec64 now;
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_vino vino = { .ino = req->r_deleg_ino,
@@ -589,10 +589,6 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 
 	ktime_get_real_ts64(&now);
 
-	inode = ceph_get_inode(dentry->d_sb, vino);
-	if (IS_ERR(inode))
-		return PTR_ERR(inode);
-
 	iinfo.inline_version = CEPH_INLINE_NONE;
 	iinfo.change_attr = 1;
 	ceph_encode_timespec64(&iinfo.btime, &now);
@@ -642,8 +638,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 		ceph_dir_clear_complete(dir);
 		if (!d_unhashed(dentry))
 			d_drop(dentry);
-		if (inode->i_state & I_NEW)
-			discard_new_inode(inode);
+		discard_new_inode(inode);
 	} else {
 		struct dentry *dn;
 
@@ -683,6 +678,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_mds_request *req;
+	struct inode *new_inode = NULL;
 	struct dentry *dn;
 	struct ceph_acl_sec_ctx as_ctx = {};
 	bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
@@ -695,21 +691,21 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 
 	if (dentry->d_name.len > NAME_MAX)
 		return -ENAMETOOLONG;
-
+retry:
 	if (flags & O_CREAT) {
 		if (ceph_quota_is_max_files_exceeded(dir))
 			return -EDQUOT;
-		err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-		if (err < 0)
-			return err;
-		err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-		if (err < 0)
+
+		new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+		if (IS_ERR(new_inode)) {
+			err = PTR_ERR(new_inode);
 			goto out_ctx;
+		}
 	} else if (!d_in_lookup(dentry)) {
 		/* If it's not being looked up, it's negative */
 		return -ENOENT;
 	}
-retry:
+
 	/* do the open */
 	req = prepare_open_request(dir->i_sb, flags, mode);
 	if (IS_ERR(req)) {
@@ -730,25 +726,40 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 
 		req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL;
 		req->r_dentry_unless = CEPH_CAP_FILE_EXCL;
-		if (as_ctx.pagelist) {
-			req->r_pagelist = as_ctx.pagelist;
-			as_ctx.pagelist = NULL;
-		}
-		if (try_async &&
-		    (req->r_dir_caps =
-		      try_prep_async_create(dir, dentry, &lo,
-					    &req->r_deleg_ino))) {
+
+		ceph_as_ctx_to_req(req, &as_ctx);
+
+		if (try_async && (req->r_dir_caps =
+				  try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) {
+			struct ceph_vino vino = { .ino = req->r_deleg_ino,
+						  .snap = CEPH_NOSNAP };
+
 			set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
 			req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
 			req->r_callback = ceph_async_create_cb;
+
+			/* Hash inode before RPC */
+			new_inode = ceph_get_inode(dir->i_sb, vino, new_inode);
+			if (IS_ERR(new_inode)) {
+				err = PTR_ERR(new_inode);
+				new_inode = NULL;
+				goto out_req;
+			}
+			WARN_ON_ONCE(!(new_inode->i_state & I_NEW));
+
 			err = ceph_mdsc_submit_request(mdsc, dir, req);
 			if (!err) {
-				err = ceph_finish_async_create(dir, dentry,
+				err = ceph_finish_async_create(dir, new_inode, dentry,
 							file, mode, req,
 							&as_ctx, &lo);
+				new_inode = NULL;
 			} else if (err == -EJUKEBOX) {
 				restore_deleg_ino(dir, req->r_deleg_ino);
 				ceph_mdsc_put_request(req);
+				discard_new_inode(new_inode);
+				ceph_release_acl_sec_ctx(&as_ctx);
+				memset(&as_ctx, 0, sizeof(as_ctx));
+				new_inode = NULL;
 				try_async = false;
 				goto retry;
 			}
@@ -757,6 +768,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+	req->r_new_inode = new_inode;
+	new_inode = NULL;
 	err = ceph_mdsc_do_request(mdsc,
 				   (flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
 				   req);
@@ -799,6 +812,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 out_req:
 	ceph_mdsc_put_request(req);
+	iput(new_inode);
 out_ctx:
 	ceph_release_acl_sec_ctx(&as_ctx);
 	dout("atomic_open result=%d\n", err);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 1ce6561a0bd3..ec35bb98985b 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -52,17 +52,85 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
 	return 0;
 }
 
-struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino)
+/**
+ * ceph_new_inode - allocate a new inode in advance of an expected create
+ * @dir: parent directory for new inode
+ * @dentry: dentry that may eventually point to new inode
+ * @mode: mode of new inode
+ * @as_ctx: pointer to inherited security context
+ *
+ * Allocate a new inode in advance of an operation to create a new inode.
+ * This allocates the inode and sets up the acl_sec_ctx with appropriate
+ * info for the new inode.
+ *
+ * Returns a pointer to the new inode or an ERR_PTR.
+ */
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx)
+{
+	int err;
+	struct inode *inode;
+
+	inode = new_inode_pseudo(dir->i_sb);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	if (!S_ISLNK(*mode)) {
+		err = ceph_pre_init_acls(dir, mode, as_ctx);
+		if (err < 0)
+			goto out_err;
+	}
+
+	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
+	if (err < 0)
+		goto out_err;
+
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+	return inode;
+out_err:
+	iput(inode);
+	return ERR_PTR(err);
+}
+
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx)
+{
+	if (as_ctx->pagelist) {
+		req->r_pagelist = as_ctx->pagelist;
+		as_ctx->pagelist = NULL;
+	}
+}
+
+/**
+ * ceph_get_inode - find or create/hash a new inode
+ * @sb: superblock to search and allocate in
+ * @vino: vino to search for
+ * @newino: optional new inode to insert if one isn't found (may be NULL)
+ *
+ * Search for or insert a new inode into the hash for the given vino, and return a
+ * reference to it. If newino is non-NULL, its reference is consumed.
+ */
+struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino)
 {
 	struct inode *inode;
 
 	if (ceph_vino_is_reserved(vino))
 		return ERR_PTR(-EREMOTEIO);
 
-	inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
-			     ceph_set_ino_cb, &vino);
-	if (!inode)
+	if (newino) {
+		inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare,
+					ceph_set_ino_cb, &vino);
+		if (inode != newino)
+			iput(newino);
+	} else {
+		inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
+				     ceph_set_ino_cb, &vino);
+	}
+
+	if (!inode) {
+		dout("No inode found for %llx.%llx\n", vino.ino, vino.snap);
 		return ERR_PTR(-ENOMEM);
+	}
 
 	dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode),
 	     ceph_vinop(inode), inode, !!(inode->i_state & I_NEW));
@@ -78,7 +146,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
 		.ino = ceph_ino(parent),
 		.snap = CEPH_SNAPDIR,
 	};
-	struct inode *inode = ceph_get_inode(parent->i_sb, vino);
+	struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	if (IS_ERR(inode))
@@ -1544,7 +1612,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
 		vino.ino = le64_to_cpu(rde->inode.in->ino);
 		vino.snap = le64_to_cpu(rde->inode.in->snapid);
 
-		in = ceph_get_inode(req->r_dentry->d_sb, vino);
+		in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL);
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			dout("new_inode badness got %d\n", err);
@@ -1746,7 +1814,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		if (d_really_is_positive(dn)) {
 			in = d_inode(dn);
 		} else {
-			in = ceph_get_inode(parent->d_sb, tvino);
+			in = ceph_get_inode(parent->d_sb, tvino, NULL);
 			if (IS_ERR(in)) {
 				dout("new_inode badness\n");
 				d_drop(dn);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 5937cbfafd31..57cf21c9199f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -844,6 +844,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 		iput(req->r_parent);
 	}
 	iput(req->r_target_inode);
+	iput(req->r_new_inode);
 	if (req->r_dentry)
 		dput(req->r_dentry);
 	if (req->r_old_dentry)
@@ -3196,7 +3197,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 			.snap = le64_to_cpu(rinfo->targeti.in->snapid)
 		};
 
-		in = ceph_get_inode(mdsc->fsc->sb, tvino);
+		in = ceph_get_inode(mdsc->fsc->sb, tvino, xchg(&req->r_new_inode, NULL));
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			mutex_lock(&session->s_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 97c7f7bfa55f..c3986a412fb5 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -259,6 +259,7 @@ struct ceph_mds_request {
 
 	struct inode *r_parent;		    /* parent dir inode */
 	struct inode *r_target_inode;       /* resulting inode */
+	struct inode *r_new_inode;	    /* new inode (for creates) */
 
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index a12a193bc9ad..532ee9fca878 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -968,6 +968,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
 /* inode.c */
 struct ceph_mds_reply_info_in;
 struct ceph_mds_reply_dirfrag;
+struct ceph_acl_sec_ctx;
 
 extern const struct inode_operations ceph_file_iops;
 
@@ -975,8 +976,12 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb);
 extern void ceph_evict_inode(struct inode *inode);
 extern void ceph_free_inode(struct inode *inode);
 
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx);
+void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as_ctx);
+
 extern struct inode *ceph_get_inode(struct super_block *sb,
-				    struct ceph_vino vino);
+				    struct ceph_vino vino, struct inode *newino);
 extern struct inode *ceph_get_snapdir(struct inode *parent);
 extern int ceph_fill_file_size(struct inode *inode, int issued,
 			       u32 truncate_seq, u64 truncate_size, u64 size);
-- 
2.34.1
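
For readers following the reference handling in the patch above: the
ceph_get_inode() kernel-doc says that a non-NULL newino has its reference
consumed. What follows is a minimal userspace sketch of that ownership
rule only, not the kernel implementation. Every sketch_* name is
hypothetical, the fixed-size table stands in for the inode hash, and,
unlike the real inode hash, the table holds its own reference here so
that the toy refcounting stays memory-safe. is_new loosely stands in for
I_NEW and is never cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode; not a kernel type. */
struct sketch_inode {
	uint64_t ino;
	int refcount;
	int is_new;	/* loosely models I_NEW; never cleared in this sketch */
};

#define SKETCH_TABLE_SIZE 16
static struct sketch_inode *sketch_table[SKETCH_TABLE_SIZE];

/* Allocate an unhashed inode holding one reference, owned by the caller. */
static struct sketch_inode *sketch_alloc(void)
{
	struct sketch_inode *in = calloc(1, sizeof(*in));

	if (in)
		in->refcount = 1;
	return in;
}

/* Drop one reference; free on the last one. NULL is allowed. */
static void sketch_put(struct sketch_inode *in)
{
	if (!in)
		return;
	assert(in->refcount > 0);
	if (--in->refcount == 0)
		free(in);
}

/*
 * Find the inode for @ino, or hash @newino (or a fresh allocation) under
 * that number. The caller's reference on @newino is always consumed: it
 * either becomes the returned reference or is dropped.
 */
static struct sketch_inode *sketch_get_inode(uint64_t ino,
					     struct sketch_inode *newino)
{
	size_t i, free_slot = SKETCH_TABLE_SIZE;

	for (i = 0; i < SKETCH_TABLE_SIZE; i++) {
		if (!sketch_table[i]) {
			if (free_slot == SKETCH_TABLE_SIZE)
				free_slot = i;
			continue;
		}
		if (sketch_table[i]->ino == ino) {
			/* Existing inode wins; the preallocated one is dropped. */
			sketch_table[i]->refcount++;
			sketch_put(newino);
			return sketch_table[i];
		}
	}

	if (!newino)
		newino = sketch_alloc();
	if (!newino || free_slot == SKETCH_TABLE_SIZE) {
		sketch_put(newino);
		return NULL;
	}

	newino->ino = ino;
	newino->is_new = 1;
	newino->refcount++;	/* the table's reference (sketch-only simplification) */
	sketch_table[free_slot] = newino;
	return newino;
}
```

After a call, the caller must not touch the pointer it passed as newino
unless it is also the return value. The handle_reply() hunk follows the
same rule by clearing req->r_new_inode with xchg() before handing it over.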



* [RFC PATCH v10 06/48] ceph: crypto context handling for ceph
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (4 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 05/48] ceph: preallocate inode for ops that may create one Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces Jeff Layton
                   ` (45 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Have set_context do a setattr that sets the fscrypt_auth value, and
get_context just return the contents of that field.
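
As an illustration of the get_context side, here is a minimal userspace
sketch of the checks that ceph_crypt_get_context() in the diff below
applies to the stored blob. It is not the kernel code: the sketch_* names
are hypothetical, the fields are kept in host byte order instead of
__le32, and the value 40 for the maximum context size is an assumed
stand-in for FSCRYPT_SET_CONTEXT_MAX_SIZE. The two bound checks marked
"extra" are not in the patch; they are there only so the sketch is safe
on arbitrary input.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_AUTH_VERSION	1	/* mirrors CEPH_FSCRYPT_AUTH_VERSION */
#define SKETCH_CTX_MAX		40	/* assumed FSCRYPT_SET_CONTEXT_MAX_SIZE */

/* Same shape as struct ceph_fscrypt_auth, host-endian for simplicity. */
struct sketch_fscrypt_auth {
	uint32_t version;
	uint32_t blob_len;
	uint8_t  blob[SKETCH_CTX_MAX];
} __attribute__((packed));

/*
 * Copy the fscrypt context out of @cfa (@auth_len valid bytes) into @ctx.
 * Returns the context length, or a negative errno.
 */
static int sketch_get_context(const struct sketch_fscrypt_auth *cfa,
			      size_t auth_len, void *ctx, size_t len)
{
	const size_t hdr = offsetof(struct sketch_fscrypt_auth, blob);
	uint32_t ctxlen;

	assert(ctx != NULL);

	/* Nonexistent or too short? (upper bound is an extra check) */
	if (!cfa || auth_len < hdr + 1 || auth_len > sizeof(*cfa))
		return -ENOBUFS;

	/* Some format we don't recognize? */
	if (cfa->version != SKETCH_AUTH_VERSION)
		return -ENOBUFS;

	ctxlen = cfa->blob_len;

	/* Extra check, not in the patch: blob must fit in what was stored. */
	if (ctxlen > auth_len - hdr)
		return -ENOBUFS;

	/* Caller's buffer too small? */
	if (len < ctxlen)
		return -ERANGE;

	memcpy(ctx, cfa->blob, ctxlen);
	return (int)ctxlen;
}
```

The return convention matches the patch: the number of context bytes on
success, -ENOBUFS when there is no usable context, and -ERANGE when the
caller's buffer is too small.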

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/Makefile |  1 +
 fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 29 ++++++++++++++++++
 fs/ceph/inode.c  |  3 ++
 fs/ceph/super.c  |  3 ++
 5 files changed, 112 insertions(+)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 50c635dc7f71..1f77ca04c426 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -12,3 +12,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
 
 ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
 ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
+ceph-$(CONFIG_FS_ENCRYPTION) += crypto.o
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
new file mode 100644
index 000000000000..a513ff373b13
--- /dev/null
+++ b/fs/ceph/crypto.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ceph/ceph_debug.h>
+#include <linux/xattr.h>
+#include <linux/fscrypt.h>
+
+#include "super.h"
+#include "crypto.h"
+
+static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_fscrypt_auth *cfa = (struct ceph_fscrypt_auth *)ci->fscrypt_auth;
+	u32 ctxlen;
+
+	/* Nonexistent or too short? */
+	if (!cfa || (ci->fscrypt_auth_len < (offsetof(struct ceph_fscrypt_auth, cfa_blob) + 1)))
+		return -ENOBUFS;
+
+	/* Some format we don't recognize? */
+	if (le32_to_cpu(cfa->cfa_version) != CEPH_FSCRYPT_AUTH_VERSION)
+		return -ENOBUFS;
+
+	ctxlen = le32_to_cpu(cfa->cfa_blob_len);
+	if (len < ctxlen)
+		return -ERANGE;
+
+	memcpy(ctx, cfa->cfa_blob, ctxlen);
+	return ctxlen;
+}
+
+static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data)
+{
+	int ret;
+	struct iattr attr = { };
+	struct ceph_iattr cia = { };
+	struct ceph_fscrypt_auth *cfa;
+
+	WARN_ON_ONCE(fs_data);
+
+	if (len > FSCRYPT_SET_CONTEXT_MAX_SIZE)
+		return -EINVAL;
+
+	cfa = kzalloc(sizeof(*cfa), GFP_KERNEL);
+	if (!cfa)
+		return -ENOMEM;
+
+	cfa->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION);
+	cfa->cfa_blob_len = cpu_to_le32(len);
+	memcpy(cfa->cfa_blob, ctx, len);
+
+	cia.fscrypt_auth = cfa;
+
+	ret = __ceph_setattr(inode, &attr, &cia);
+	if (ret == 0)
+		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+	kfree(cia.fscrypt_auth);
+	return ret;
+}
+
+static bool ceph_crypt_empty_dir(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	return ci->i_rsubdirs + ci->i_rfiles == 1;
+}
+
+static struct fscrypt_operations ceph_fscrypt_ops = {
+	.get_context		= ceph_crypt_get_context,
+	.set_context		= ceph_crypt_set_context,
+	.empty_dir		= ceph_crypt_empty_dir,
+};
+
+void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
new file mode 100644
index 000000000000..6c3831c57c8d
--- /dev/null
+++ b/fs/ceph/crypto.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ceph fscrypt functionality
+ */
+
+#ifndef _CEPH_CRYPTO_H
+#define _CEPH_CRYPTO_H
+
+#include <linux/fscrypt.h>
+
+struct ceph_fscrypt_auth {
+	__le32	cfa_version;
+	__le32	cfa_blob_len;
+	u8	cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE];
+} __packed;
+
+#ifdef CONFIG_FS_ENCRYPTION
+#define CEPH_FSCRYPT_AUTH_VERSION	1
+void ceph_fscrypt_set_ops(struct super_block *sb);
+
+#else /* CONFIG_FS_ENCRYPTION */
+
+static inline void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+}
+
+#endif /* CONFIG_FS_ENCRYPTION */
+
+#endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index ec35bb98985b..649d7a059d7b 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -14,10 +14,12 @@
 #include <linux/random.h>
 #include <linux/sort.h>
 #include <linux/iversion.h>
+#include <linux/fscrypt.h>
 
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 #include <linux/ceph/decode.h>
 
 /*
@@ -638,6 +640,7 @@ void ceph_evict_inode(struct inode *inode)
 	clear_inode(inode);
 
 	ceph_fscache_unregister_inode_cookie(ci);
+	fscrypt_put_encryption_info(inode);
 
 	__ceph_remove_caps(ci);
 
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index f68097c9f61f..fbdf434b4618 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -20,6 +20,7 @@
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 
 #include <linux/ceph/ceph_features.h>
 #include <linux/ceph/decode.h>
@@ -1113,6 +1114,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc)
 	s->s_time_min = 0;
 	s->s_time_max = U32_MAX;
 
+	ceph_fscrypt_set_ops(s);
+
 	ret = set_anon_super_fc(s, fc);
 	if (ret != 0)
 		fsc->sb = NULL;
-- 
2.34.1



* [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (5 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 06/48] ceph: crypto context handling for ceph Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-02-17  8:25   ` Xiubo Li
  2022-01-11 19:15 ` [RFC PATCH v10 08/48] ceph: add fscrypt_* handling to caps.c Jeff Layton
                   ` (44 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

...and store them in the ceph_inode_info.
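
To make the wire format concrete: in the parse_reply_info_in() hunk below,
each of the two new fields is a 32-bit length followed by that many bytes.
The following is a minimal userspace sketch of decoding one such field with
bounds checking. It is an illustration only; sketch_decode_u32() and
sketch_decode_blob() are hypothetical helpers, not the ceph_decode_*
macros, and -EIO is an assumed error code for a truncated buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a little-endian u32 at *p, advancing *p; -EIO if it would overrun. */
static int sketch_decode_u32(const uint8_t **p, const uint8_t *end,
			     uint32_t *v)
{
	if ((size_t)(end - *p) < 4)
		return -EIO;
	*v = (uint32_t)(*p)[0] |
	     ((uint32_t)(*p)[1] << 8) |
	     ((uint32_t)(*p)[2] << 16) |
	     ((uint32_t)(*p)[3] << 24);
	*p += 4;
	return 0;
}

/*
 * Decode one length-prefixed blob. On success *out is a malloc'ed copy
 * (or NULL when the length is zero) and *out_len its size. On failure
 * *out stays NULL, so the caller has nothing to free.
 */
static int sketch_decode_blob(const uint8_t **p, const uint8_t *end,
			      uint8_t **out, uint32_t *out_len)
{
	uint32_t len;
	int ret;

	assert(p && *p && end && out && out_len);

	*out = NULL;
	*out_len = 0;

	ret = sketch_decode_u32(p, end, &len);
	if (ret)
		return ret;
	if (!len)
		return 0;

	/* Check the payload fits before allocating or copying. */
	if ((size_t)(end - *p) < len)
		return -EIO;

	*out = malloc(len);
	if (!*out)
		return -ENOMEM;
	memcpy(*out, *p, len);
	*p += len;
	*out_len = len;
	return 0;
}
```

Calling sketch_decode_blob() twice in a row, once for fscrypt_auth and once
for fscrypt_file, mirrors the order in which the patch reads the fields.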

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c       |  2 ++
 fs/ceph/inode.c      | 18 ++++++++++++++-
 fs/ceph/mds_client.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/mds_client.h |  4 ++++
 fs/ceph/super.h      |  6 +++++
 5 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ace72a052254..5937a25ddddd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -597,6 +597,8 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
 	iinfo.xattr_data = xattr_buf;
 	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
 
+	/* FIXME: set fscrypt_auth and fscrypt_file */
+
 	in.ino = cpu_to_le64(vino.ino);
 	in.snapid = cpu_to_le64(CEPH_NOSNAP);
 	in.version = cpu_to_le64(1);	// ???
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 649d7a059d7b..d090fe081093 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -609,7 +609,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	INIT_WORK(&ci->i_work, ceph_inode_work);
 	ci->i_work_mask = 0;
 	memset(&ci->i_btime, '\0', sizeof(ci->i_btime));
-
+#ifdef CONFIG_FS_ENCRYPTION
+	ci->fscrypt_auth = NULL;
+	ci->fscrypt_auth_len = 0;
+#endif
 	ceph_fscache_inode_init(ci);
 
 	return &ci->vfs_inode;
@@ -620,6 +623,9 @@ void ceph_free_inode(struct inode *inode)
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	kfree(ci->i_symlink);
+#ifdef CONFIG_FS_ENCRYPTION
+	kfree(ci->fscrypt_auth);
+#endif
 	kmem_cache_free(ceph_inode_cachep, ci);
 }
 
@@ -1020,6 +1026,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		xattr_blob = NULL;
 	}
 
+#ifdef CONFIG_FS_ENCRYPTION
+	if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
+		ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
+		ci->fscrypt_auth = iinfo->fscrypt_auth;
+		iinfo->fscrypt_auth = NULL;
+		iinfo->fscrypt_auth_len = 0;
+		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+	}
+#endif
+
 	/* finally update i_version */
 	if (le64_to_cpu(info->version) > ci->i_version)
 		ci->i_version = le64_to_cpu(info->version);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 57cf21c9199f..bd824e989449 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -184,8 +184,50 @@ static int parse_reply_info_in(void **p, void *end,
 			info->rsnaps = 0;
 		}
 
+		if (struct_v >= 5) {
+			u32 alen;
+
+			ceph_decode_32_safe(p, end, alen, bad);
+
+			while (alen--) {
+				u32 len;
+
+				/* key */
+				ceph_decode_32_safe(p, end, len, bad);
+				ceph_decode_skip_n(p, end, len, bad);
+				/* value */
+				ceph_decode_32_safe(p, end, len, bad);
+				ceph_decode_skip_n(p, end, len, bad);
+			}
+		}
+
+		/* fscrypt flag -- ignore */
+		if (struct_v >= 6)
+			ceph_decode_skip_8(p, end, bad);
+
+		info->fscrypt_auth = NULL;
+		info->fscrypt_file = NULL;
+		if (struct_v >= 7) {
+			ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad);
+			if (info->fscrypt_auth_len) {
+				info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL);
+				if (!info->fscrypt_auth)
+					return -ENOMEM;
+				ceph_decode_copy_safe(p, end, info->fscrypt_auth,
+						      info->fscrypt_auth_len, bad);
+			}
+			ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad);
+			if (info->fscrypt_file_len) {
+				info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL);
+				if (!info->fscrypt_file)
+					return -ENOMEM;
+				ceph_decode_copy_safe(p, end, info->fscrypt_file,
+						      info->fscrypt_file_len, bad);
+			}
+		}
 		*p = end;
 	} else {
+		/* legacy (unversioned) struct */
 		if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
 			ceph_decode_64_safe(p, end, info->inline_version, bad);
 			ceph_decode_32_safe(p, end, info->inline_len, bad);
@@ -626,8 +668,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
 
 static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info)
 {
+	int i;
+
+	kfree(info->diri.fscrypt_auth);
+	kfree(info->diri.fscrypt_file);
+	kfree(info->targeti.fscrypt_auth);
+	kfree(info->targeti.fscrypt_file);
 	if (!info->dir_entries)
 		return;
+
+	for (i = 0; i < info->dir_nr; i++) {
+		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
+
+		kfree(rde->inode.fscrypt_auth);
+		kfree(rde->inode.fscrypt_file);
+	}
 	free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size));
 }
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index c3986a412fb5..98a8710807d1 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -88,6 +88,10 @@ struct ceph_mds_reply_info_in {
 	s32 dir_pin;
 	struct ceph_timespec btime;
 	struct ceph_timespec snap_btime;
+	u8 *fscrypt_auth;
+	u8 *fscrypt_file;
+	u32 fscrypt_auth_len;
+	u32 fscrypt_file_len;
 	u64 rsnaps;
 	u64 change_attr;
 };
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 532ee9fca878..5b4092e5f291 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -433,6 +433,12 @@ struct ceph_inode_info {
 	struct work_struct i_work;
 	unsigned long  i_work_mask;
 
+#ifdef CONFIG_FS_ENCRYPTION
+	u32 fscrypt_auth_len;
+	u32 fscrypt_file_len;
+	u8 *fscrypt_auth;
+	u8 *fscrypt_file;
+#endif
 #ifdef CONFIG_CEPH_FSCACHE
 	struct fscache_cookie *fscache;
 #endif
-- 
2.34.1



* [RFC PATCH v10 08/48] ceph: add fscrypt_* handling to caps.c
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (6 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 09/48] ceph: add ability to set fscrypt_auth via setattr Jeff Layton
                   ` (43 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 76 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index f2f1e4db7b6b..87ee9766dc2e 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -13,6 +13,7 @@
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 #include <linux/ceph/decode.h>
 #include <linux/ceph/messenger.h>
 
@@ -1214,15 +1215,12 @@ struct cap_msg_args {
 	umode_t			mode;
 	bool			inline_data;
 	bool			wake;
+	u32			fscrypt_auth_len;
+	u32			fscrypt_file_len;
+	u8			fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context
+	u8			fscrypt_file[sizeof(u64)]; // for size
 };
 
-/*
- * cap struct size + flock buffer size + inline version + inline data size +
- * osd_epoch_barrier + oldest_flush_tid
- */
-#define CAP_MSG_SIZE (sizeof(struct ceph_mds_caps) + \
-		      4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4)
-
 /* Marshal up the cap msg to the MDS */
 static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 {
@@ -1238,7 +1236,7 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 	     arg->size, arg->max_size, arg->xattr_version,
 	     arg->xattr_buf ? (int)arg->xattr_buf->vec.iov_len : 0);
 
-	msg->hdr.version = cpu_to_le16(10);
+	msg->hdr.version = cpu_to_le16(12);
 	msg->hdr.tid = cpu_to_le64(arg->flush_tid);
 
 	fc = msg->front.iov_base;
@@ -1309,6 +1307,21 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 
 	/* Advisory flags (version 10) */
 	ceph_encode_32(&p, arg->flags);
+
+	/* dirstats (version 11) - these are r/o on the client */
+	ceph_encode_64(&p, 0);
+	ceph_encode_64(&p, 0);
+
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+	/* fscrypt_auth and fscrypt_file (version 12) */
+	ceph_encode_32(&p, arg->fscrypt_auth_len);
+	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
+	ceph_encode_32(&p, arg->fscrypt_file_len);
+	ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
+#else /* CONFIG_FS_ENCRYPTION */
+	ceph_encode_32(&p, 0);
+	ceph_encode_32(&p, 0);
+#endif /* CONFIG_FS_ENCRYPTION */
 }
 
 /*
@@ -1430,8 +1443,37 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
 		}
 	}
 	arg->flags = flags;
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+	if (ci->fscrypt_auth_len &&
+	    WARN_ON_ONCE(ci->fscrypt_auth_len != sizeof(struct ceph_fscrypt_auth))) {
+		/* Don't set this if it isn't the right size */
+		arg->fscrypt_auth_len = 0;
+	} else {
+		arg->fscrypt_auth_len = ci->fscrypt_auth_len;
+		memcpy(arg->fscrypt_auth, ci->fscrypt_auth,
+			min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth)));
+	}
+	/* FIXME: use this to track "real" size */
+	arg->fscrypt_file_len = 0;
+#endif /* CONFIG_FS_ENCRYPTION */
 }
 
+#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \
+		      4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4)
+
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static inline int cap_msg_size(struct cap_msg_args *arg)
+{
+	return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len +
+			arg->fscrypt_file_len;
+}
+#else
+static inline int cap_msg_size(struct cap_msg_args *arg)
+{
+	return CAP_MSG_FIXED_FIELDS;
+}
+#endif /* CONFIG_FS_ENCRYPTION */
+
 /*
  * Send a cap msg on the given inode.
  *
@@ -1442,7 +1484,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci)
 	struct ceph_msg *msg;
 	struct inode *inode = &ci->vfs_inode;
 
-	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
+	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(arg), GFP_NOFS, false);
 	if (!msg) {
 		pr_err("error allocating cap msg: ino (%llx.%llx) flushing %s tid %llu, requeuing cap.\n",
 		       ceph_vinop(inode), ceph_cap_string(arg->dirty),
@@ -1468,10 +1510,6 @@ static inline int __send_flush_snap(struct inode *inode,
 	struct cap_msg_args	arg;
 	struct ceph_msg		*msg;
 
-	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, CAP_MSG_SIZE, GFP_NOFS, false);
-	if (!msg)
-		return -ENOMEM;
-
 	arg.session = session;
 	arg.ino = ceph_vino(inode).ino;
 	arg.cid = 0;
@@ -1509,6 +1547,18 @@ static inline int __send_flush_snap(struct inode *inode,
 	arg.flags = 0;
 	arg.wake = false;
 
+	/*
+	 * No fscrypt_auth changes from a capsnap. It will need
+	 * to update fscrypt_file on size changes (TODO).
+	 */
+	arg.fscrypt_auth_len = 0;
+	arg.fscrypt_file_len = 0;
+
+	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg),
+			   GFP_NOFS, false);
+	if (!msg)
+		return -ENOMEM;
+
 	encode_cap_msg(msg, &arg);
 	ceph_con_send(&arg.session->s_con, msg);
 	return 0;
-- 
2.34.1



* [RFC PATCH v10 09/48] ceph: add ability to set fscrypt_auth via setattr
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (7 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 08/48] ceph: add fscrypt_* handling to caps.c Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option Jeff Layton
                   ` (42 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/acl.c                |  4 +--
 fs/ceph/crypto.h             |  9 +++++-
 fs/ceph/inode.c              | 42 ++++++++++++++++++++++++++--
 fs/ceph/mds_client.c         | 54 ++++++++++++++++++++++++++++++------
 fs/ceph/mds_client.h         |  3 ++
 fs/ceph/super.h              |  7 ++++-
 include/linux/ceph/ceph_fs.h | 21 ++++++++------
 7 files changed, 117 insertions(+), 23 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index f4fc8e0b847c..427724c36316 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -139,7 +139,7 @@ int ceph_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
 		newattrs.ia_ctime = current_time(inode);
 		newattrs.ia_mode = new_mode;
 		newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-		ret = __ceph_setattr(inode, &newattrs);
+		ret = __ceph_setattr(inode, &newattrs, NULL);
 		if (ret)
 			goto out_free;
 	}
@@ -150,7 +150,7 @@ int ceph_set_acl(struct user_namespace *mnt_userns, struct inode *inode,
 			newattrs.ia_ctime = old_ctime;
 			newattrs.ia_mode = old_mode;
 			newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-			__ceph_setattr(inode, &newattrs);
+			__ceph_setattr(inode, &newattrs, NULL);
 		}
 		goto out_free;
 	}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 6c3831c57c8d..6dca674f79b8 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -14,8 +14,15 @@ struct ceph_fscrypt_auth {
 	u8	cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE];
 } __packed;
 
-#ifdef CONFIG_FS_ENCRYPTION
 #define CEPH_FSCRYPT_AUTH_VERSION	1
+static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
+{
+	u32 ctxsize = le32_to_cpu(fa->cfa_blob_len);
+
+	return offsetof(struct ceph_fscrypt_auth, cfa_blob) + ctxsize;
+}
+
+#ifdef CONFIG_FS_ENCRYPTION
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
 #else /* CONFIG_FS_ENCRYPTION */
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index d090fe081093..c6653f83b6f0 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2090,7 +2090,7 @@ static const struct inode_operations ceph_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
-int __ceph_setattr(struct inode *inode, struct iattr *attr)
+int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	unsigned int ia_valid = attr->ia_valid;
@@ -2130,6 +2130,43 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
 	}
 
 	dout("setattr %p issued %s\n", inode, ceph_cap_string(issued));
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+	if (cia && cia->fscrypt_auth) {
+		u32 len = ceph_fscrypt_auth_len(cia->fscrypt_auth);
+
+		if (len > sizeof(*cia->fscrypt_auth)) {
+			err = -EINVAL;
+			spin_unlock(&ci->i_ceph_lock);
+			goto out;
+		}
+
+		dout("setattr %llx:%llx fscrypt_auth len %u to %u\n",
+			ceph_vinop(inode), ci->fscrypt_auth_len, len);
+
+		/* It should never be re-set once set */
+		WARN_ON_ONCE(ci->fscrypt_auth);
+
+		if (issued & CEPH_CAP_AUTH_EXCL) {
+			dirtied |= CEPH_CAP_AUTH_EXCL;
+			kfree(ci->fscrypt_auth);
+			ci->fscrypt_auth = (u8 *)cia->fscrypt_auth;
+			ci->fscrypt_auth_len = len;
+		} else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
+			   ci->fscrypt_auth_len != len ||
+			   memcmp(ci->fscrypt_auth, cia->fscrypt_auth, len)) {
+			req->r_fscrypt_auth = cia->fscrypt_auth;
+			mask |= CEPH_SETATTR_FSCRYPT_AUTH;
+			release |= CEPH_CAP_AUTH_SHARED;
+		}
+		cia->fscrypt_auth = NULL;
+	}
+#else
+	if (cia && cia->fscrypt_auth) {
+		err = -EINVAL;
+		spin_unlock(&ci->i_ceph_lock);
+		goto out;
+	}
+#endif /* CONFIG_FS_ENCRYPTION */
 
 	if (ia_valid & ATTR_UID) {
 		dout("setattr %p uid %d -> %d\n", inode,
@@ -2292,6 +2329,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
 		req->r_stamp = attr->ia_ctime;
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
 	}
+out:
 	dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
 	     ceph_cap_string(dirtied), mask);
 
@@ -2332,7 +2370,7 @@ int ceph_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	    ceph_quota_is_max_bytes_exceeded(inode, attr->ia_size))
 		return -EDQUOT;
 
-	err = __ceph_setattr(inode, attr);
+	err = __ceph_setattr(inode, attr, NULL);
 
 	if (err >= 0 && (attr->ia_valid & ATTR_MODE))
 		err = posix_acl_chmod(&init_user_ns, inode, attr->ia_mode);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index bd824e989449..34a4f6dbac9d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -15,6 +15,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 #include <linux/ceph/ceph_features.h>
 #include <linux/ceph/messenger.h>
@@ -920,6 +921,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 	put_cred(req->r_cred);
 	if (req->r_pagelist)
 		ceph_pagelist_release(req->r_pagelist);
+	kfree(req->r_fscrypt_auth);
 	put_request_session(req);
 	ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
 	WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2499,8 +2501,7 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
 	return r;
 }
 
-static void encode_timestamp_and_gids(void **p,
-				      const struct ceph_mds_request *req)
+static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *req)
 {
 	struct ceph_timespec ts;
 	int i;
@@ -2513,6 +2514,20 @@ static void encode_timestamp_and_gids(void **p,
 	for (i = 0; i < req->r_cred->group_info->ngroups; i++)
 		ceph_encode_64(p, from_kgid(&init_user_ns,
 					    req->r_cred->group_info->gid[i]));
+
+	/* v5: altname (TODO: skip for now) */
+	ceph_encode_32(p, 0);
+
+	/* v6: fscrypt_auth and fscrypt_file */
+	if (req->r_fscrypt_auth) {
+		u32 authlen = ceph_fscrypt_auth_len(req->r_fscrypt_auth);
+
+		ceph_encode_32(p, authlen);
+		ceph_encode_copy(p, req->r_fscrypt_auth, authlen);
+	} else {
+		ceph_encode_32(p, 0);
+	}
+	ceph_encode_32(p, 0); // fscrypt_file for now
 }
 
 /*
@@ -2557,12 +2572,14 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		goto out_free1;
 	}
 
+	/* head */
 	len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);
-	len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
-		sizeof(struct ceph_timespec);
-	len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
 
-	/* calculate (max) length for cap releases */
+	/* filepaths */
+	len += 2 * (1 + sizeof(u32) + sizeof(u64));
+	len += pathlen1 + pathlen2;
+
+	/* cap releases */
 	len += sizeof(struct ceph_mds_request_release) *
 		(!!req->r_inode_drop + !!req->r_dentry_drop +
 		 !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
@@ -2572,6 +2589,25 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	if (req->r_old_dentry_drop)
 		len += pathlen2;
 
+	/* MClientRequest tail */
+
+	/* req->r_stamp */
+	len += sizeof(struct ceph_timespec);
+
+	/* gid list */
+	len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
+
+	/* alternate name */
+	len += sizeof(u32);	// TODO
+
+	/* fscrypt_auth */
+	len += sizeof(u32); // fscrypt_auth
+	if (req->r_fscrypt_auth)
+		len += ceph_fscrypt_auth_len(req->r_fscrypt_auth);
+
+	/* fscrypt_file */
+	len += sizeof(u32);
+
 	msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
 	if (!msg) {
 		msg = ERR_PTR(-ENOMEM);
@@ -2591,7 +2627,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	} else {
 		struct ceph_mds_request_head *new_head = msg->front.iov_base;
 
-		msg->hdr.version = cpu_to_le16(4);
+		msg->hdr.version = cpu_to_le16(6);
 		new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
 		head = (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid;
 		p = msg->front.iov_base + sizeof(*new_head);
@@ -2642,7 +2678,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 
 	head->num_releases = cpu_to_le16(releases);
 
-	encode_timestamp_and_gids(&p, req);
+	encode_mclientrequest_tail(&p, req);
 
 	if (WARN_ON_ONCE(p > end)) {
 		ceph_msg_put(msg);
@@ -2751,7 +2787,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
 		rhead->num_releases = 0;
 
 		p = msg->front.iov_base + req->r_request_release_offset;
-		encode_timestamp_and_gids(&p, req);
+		encode_mclientrequest_tail(&p, req);
 
 		msg->front.iov_len = p - msg->front.iov_base;
 		msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 98a8710807d1..e7d2c8a1b9c1 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -278,6 +278,9 @@ struct ceph_mds_request {
 	struct mutex r_fill_mutex;
 
 	union ceph_mds_request_args r_args;
+
+	struct ceph_fscrypt_auth *r_fscrypt_auth;
+
 	int r_fmode;        /* file mode, if expecting cap */
 	const struct cred *r_cred;
 	int r_request_release_offset;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 5b4092e5f291..853577f8d772 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1045,7 +1045,12 @@ static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
 }
 extern int ceph_permission(struct user_namespace *mnt_userns,
 			   struct inode *inode, int mask);
-extern int __ceph_setattr(struct inode *inode, struct iattr *attr);
+
+struct ceph_iattr {
+	struct ceph_fscrypt_auth	*fscrypt_auth;
+};
+
+extern int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia);
 extern int ceph_setattr(struct user_namespace *mnt_userns,
 			struct dentry *dentry, struct iattr *attr);
 extern int ceph_getattr(struct user_namespace *mnt_userns,
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 7ad6c3d0db7d..3776bef67235 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -358,14 +358,19 @@ enum {
 
 extern const char *ceph_mds_op_name(int op);
 
-
-#define CEPH_SETATTR_MODE   1
-#define CEPH_SETATTR_UID    2
-#define CEPH_SETATTR_GID    4
-#define CEPH_SETATTR_MTIME  8
-#define CEPH_SETATTR_ATIME 16
-#define CEPH_SETATTR_SIZE  32
-#define CEPH_SETATTR_CTIME 64
+#define CEPH_SETATTR_MODE              (1 << 0)
+#define CEPH_SETATTR_UID               (1 << 1)
+#define CEPH_SETATTR_GID               (1 << 2)
+#define CEPH_SETATTR_MTIME             (1 << 3)
+#define CEPH_SETATTR_ATIME             (1 << 4)
+#define CEPH_SETATTR_SIZE              (1 << 5)
+#define CEPH_SETATTR_CTIME             (1 << 6)
+#define CEPH_SETATTR_MTIME_NOW         (1 << 7)
+#define CEPH_SETATTR_ATIME_NOW         (1 << 8)
+#define CEPH_SETATTR_BTIME             (1 << 9)
+#define CEPH_SETATTR_KILL_SGUID        (1 << 10)
+#define CEPH_SETATTR_FSCRYPT_AUTH      (1 << 11)
+#define CEPH_SETATTR_FSCRYPT_FILE      (1 << 12)
 
 /*
  * Ceph setxattr request flags.
-- 
2.34.1



* [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (8 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 09/48] ceph: add ability to set fscrypt_auth via setattr Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-02-11 13:50   ` Luís Henriques
  2022-01-11 19:15 ` [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info Jeff Layton
                   ` (41 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 53 ++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 26 ++++++++++++++++
 fs/ceph/inode.c  | 10 ++++--
 fs/ceph/super.c  | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h  | 12 +++++++-
 fs/ceph/xattr.c  |  3 ++
 6 files changed, 177 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index a513ff373b13..017f31eacb74 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -4,6 +4,7 @@
 #include <linux/fscrypt.h>
 
 #include "super.h"
+#include "mds_client.h"
 #include "crypto.h"
 
 static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
@@ -64,9 +65,20 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
 	return ci->i_rsubdirs + ci->i_rfiles == 1;
 }
 
+void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
+}
+
+static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
+{
+	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
+}
+
 static struct fscrypt_operations ceph_fscrypt_ops = {
 	.get_context		= ceph_crypt_get_context,
 	.set_context		= ceph_crypt_set_context,
+	.get_dummy_policy	= ceph_get_dummy_policy,
 	.empty_dir		= ceph_crypt_empty_dir,
 };
 
@@ -74,3 +86,44 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
 }
+
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as)
+{
+	int ret, ctxsize;
+	bool encrypted = false;
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
+	if (ret)
+		return ret;
+	if (!encrypted)
+		return 0;
+
+	as->fscrypt_auth = kzalloc(sizeof(*as->fscrypt_auth), GFP_KERNEL);
+	if (!as->fscrypt_auth)
+		return -ENOMEM;
+
+	ctxsize = fscrypt_context_for_new_inode(as->fscrypt_auth->cfa_blob, inode);
+	if (ctxsize < 0)
+		return ctxsize;
+
+	as->fscrypt_auth->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION);
+	as->fscrypt_auth->cfa_blob_len = cpu_to_le32(ctxsize);
+
+	WARN_ON_ONCE(ci->fscrypt_auth);
+	kfree(ci->fscrypt_auth);
+	ci->fscrypt_auth_len = ceph_fscrypt_auth_len(as->fscrypt_auth);
+	ci->fscrypt_auth = kmemdup(as->fscrypt_auth, ci->fscrypt_auth_len, GFP_KERNEL);
+	if (!ci->fscrypt_auth)
+		return -ENOMEM;
+
+	inode->i_flags |= S_ENCRYPTED;
+
+	return 0;
+}
+
+void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as)
+{
+	swap(req->r_fscrypt_auth, as->fscrypt_auth);
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 6dca674f79b8..cb00fe42d5b7 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -8,6 +8,10 @@
 
 #include <linux/fscrypt.h>
 
+struct ceph_fs_client;
+struct ceph_acl_sec_ctx;
+struct ceph_mds_request;
+
 struct ceph_fscrypt_auth {
 	__le32	cfa_version;
 	__le32	cfa_blob_len;
@@ -25,12 +29,34 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
 #ifdef CONFIG_FS_ENCRYPTION
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
+void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
+
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as);
+void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 }
 
+static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+}
+
+static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+						struct ceph_acl_sec_ctx *as)
+{
+	if (IS_ENCRYPTED(dir))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req,
+						struct ceph_acl_sec_ctx *as_ctx)
+{
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index c6653f83b6f0..55e23e2601df 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -83,12 +83,17 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
 			goto out_err;
 	}
 
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+
 	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
 	if (err < 0)
 		goto out_err;
 
-	inode->i_state = 0;
-	inode->i_mode = *mode;
+	err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
+	if (err)
+		goto out_err;
+
 	return inode;
 out_err:
 	iput(inode);
@@ -101,6 +106,7 @@ void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *a
 		req->r_pagelist = as_ctx->pagelist;
 		as_ctx->pagelist = NULL;
 	}
+	ceph_fscrypt_as_ctx_to_req(req, as_ctx);
 }
 
 /**
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index fbdf434b4618..0b32d31c6fe0 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -45,6 +45,7 @@ static void ceph_put_super(struct super_block *s)
 	struct ceph_fs_client *fsc = ceph_sb_to_client(s);
 
 	dout("put_super\n");
+	ceph_fscrypt_free_dummy_policy(fsc);
 	ceph_mdsc_close_sessions(fsc->mdsc);
 }
 
@@ -162,6 +163,7 @@ enum {
 	Opt_copyfrom,
 	Opt_wsync,
 	Opt_pagecache,
+	Opt_test_dummy_encryption,
 };
 
 enum ceph_recover_session_mode {
@@ -189,6 +191,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
 	fsparam_string	("fsc",				Opt_fscache), // fsc=...
 	fsparam_flag_no ("ino32",			Opt_ino32),
 	fsparam_string	("mds_namespace",		Opt_mds_namespace),
+	fsparam_string	("mon_addr",			Opt_mon_addr),
 	fsparam_flag_no ("poolperm",			Opt_poolperm),
 	fsparam_flag_no ("quotadf",			Opt_quotadf),
 	fsparam_u32	("rasize",			Opt_rasize),
@@ -200,7 +203,8 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
 	fsparam_u32	("rsize",			Opt_rsize),
 	fsparam_string	("snapdirname",			Opt_snapdirname),
 	fsparam_string	("source",			Opt_source),
-	fsparam_string	("mon_addr",			Opt_mon_addr),
+	fsparam_flag	("test_dummy_encryption",	Opt_test_dummy_encryption),
+	fsparam_string	("test_dummy_encryption",	Opt_test_dummy_encryption),
 	fsparam_u32	("wsize",			Opt_wsize),
 	fsparam_flag_no	("wsync",			Opt_wsync),
 	fsparam_flag_no	("pagecache",			Opt_pagecache),
@@ -576,6 +580,16 @@ static int ceph_parse_mount_param(struct fs_context *fc,
 		else
 			fsopt->flags &= ~CEPH_MOUNT_OPT_NOPAGECACHE;
 		break;
+	case Opt_test_dummy_encryption:
+#ifdef CONFIG_FS_ENCRYPTION
+		kfree(fsopt->test_dummy_encryption);
+		fsopt->test_dummy_encryption = param->string;
+		param->string = NULL;
+		fsopt->flags |= CEPH_MOUNT_OPT_TEST_DUMMY_ENC;
+#else
+		warnfc(fc, "FS encryption not supported: test_dummy_encryption mount option ignored");
+#endif
+		break;
 	default:
 		BUG();
 	}
@@ -596,6 +610,7 @@ static void destroy_mount_options(struct ceph_mount_options *args)
 	kfree(args->server_path);
 	kfree(args->fscache_uniq);
 	kfree(args->mon_addr);
+	kfree(args->test_dummy_encryption);
 	kfree(args);
 }
 
@@ -714,6 +729,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
 	if (fsopt->flags & CEPH_MOUNT_OPT_NOPAGECACHE)
 		seq_puts(m, ",nopagecache");
 
+	fscrypt_show_test_dummy_encryption(m, ',', root->d_sb);
+
 	if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
 		seq_printf(m, ",wsize=%u", fsopt->wsize);
 	if (fsopt->rsize != CEPH_MAX_READ_SIZE)
@@ -1041,6 +1058,52 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
 	return root;
 }
 
+#ifdef CONFIG_FS_ENCRYPTION
+static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	/*
+	 * No changing encryption context on remount. Note that
+	 * fscrypt_set_test_dummy_encryption will validate the version
+	 * string passed in (if any).
+	 */
+	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC) {
+		struct ceph_fs_client *fsc = sb->s_fs_info;
+		int err = 0;
+
+		if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE && !fsc->dummy_enc_policy.policy) {
+			errorfc(fc, "Can't set test_dummy_encryption on remount");
+			return -EEXIST;
+		}
+
+		err = fscrypt_set_test_dummy_encryption(sb,
+							fsc->mount_options->test_dummy_encryption,
+							&fsc->dummy_enc_policy);
+		if (err) {
+			if (err == -EEXIST)
+				errorfc(fc, "Can't change test_dummy_encryption on remount");
+			else if (err == -EINVAL)
+				errorfc(fc, "Value of option \"%s\" is unrecognized",
+					fsc->mount_options->test_dummy_encryption);
+			else
+				errorfc(fc, "Error processing option \"%s\" [%d]",
+					fsc->mount_options->test_dummy_encryption, err);
+			return err;
+		}
+		warnfc(fc, "test_dummy_encryption mode enabled");
+	}
+	return 0;
+}
+#else
+static inline int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC)
+		warnfc(fc, "test_dummy_encryption mode ignored");
+	return 0;
+}
+#endif
+
 /*
  * mount: join the ceph cluster, and open root directory.
  */
@@ -1069,6 +1132,10 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc,
 				goto out;
 		}
 
+		err = ceph_set_test_dummy_encryption(fsc->sb, fc, fsc->mount_options);
+		if (err)
+			goto out;
+
 		dout("mount opening path '%s'\n", path);
 
 		ceph_fs_debugfs_init(fsc);
@@ -1277,9 +1344,15 @@ static void ceph_free_fc(struct fs_context *fc)
 
 static int ceph_reconfigure_fc(struct fs_context *fc)
 {
+	int err;
 	struct ceph_parse_opts_ctx *pctx = fc->fs_private;
 	struct ceph_mount_options *fsopt = pctx->opts;
-	struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
+	struct super_block *sb = fc->root->d_sb;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+
+	err = ceph_set_test_dummy_encryption(sb, fc, fsopt);
+	if (err)
+		return err;
 
 	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
 		ceph_set_mount_opt(fsc, ASYNC_DIROPS);
@@ -1293,7 +1366,7 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
 		pr_notice("ceph: monitor addresses recorded, but not used for reconnection");
 	}
 
-	sync_filesystem(fc->root->d_sb);
+	sync_filesystem(sb);
 	return 0;
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 853577f8d772..042ea1f8e5c2 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -17,6 +17,7 @@
 #include <linux/posix_acl.h>
 #include <linux/refcount.h>
 #include <linux/security.h>
+#include <linux/fscrypt.h>
 
 #include <linux/ceph/libceph.h>
 
@@ -24,6 +25,8 @@
 #include <linux/fscache.h>
 #endif
 
+#include "crypto.h"
+
 /* f_type in struct statfs */
 #define CEPH_SUPER_MAGIC 0x00c36400
 
@@ -46,6 +49,7 @@
 #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
 #define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
 #define CEPH_MOUNT_OPT_NOPAGECACHE     (1<<16) /* bypass pagecache altogether */
+#define CEPH_MOUNT_OPT_TEST_DUMMY_ENC  (1<<17) /* enable dummy encryption (for testing) */
 
 #define CEPH_MOUNT_OPT_DEFAULT			\
 	(CEPH_MOUNT_OPT_DCACHE |		\
@@ -102,6 +106,7 @@ struct ceph_mount_options {
 	char *server_path;    /* default NULL (means "/") */
 	char *fscache_uniq;   /* default NULL */
 	char *mon_addr;
+	char *test_dummy_encryption;	/* default NULL */
 };
 
 struct ceph_fs_client {
@@ -141,9 +146,11 @@ struct ceph_fs_client {
 #ifdef CONFIG_CEPH_FSCACHE
 	struct fscache_volume *fscache;
 #endif
+#ifdef CONFIG_FS_ENCRYPTION
+	struct fscrypt_dummy_policy dummy_enc_policy;
+#endif
 };
 
-
 /*
  * File i/o capability.  This tracks shared state with the metadata
  * server that allows us to cache or writeback attributes or to read
@@ -1083,6 +1090,9 @@ struct ceph_acl_sec_ctx {
 #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
 	void *sec_ctx;
 	u32 sec_ctxlen;
+#endif
+#ifdef CONFIG_FS_ENCRYPTION
+	struct ceph_fscrypt_auth *fscrypt_auth;
 #endif
 	struct ceph_pagelist *pagelist;
 };
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index fcf7dfdecf96..5e3522457deb 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -1380,6 +1380,9 @@ void ceph_release_acl_sec_ctx(struct ceph_acl_sec_ctx *as_ctx)
 #endif
 #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
 	security_release_secctx(as_ctx->sec_ctx, as_ctx->sec_ctxlen);
+#endif
+#ifdef CONFIG_FS_ENCRYPTION
+	kfree(as_ctx->fscrypt_auth);
 #endif
 	if (as_ctx->pagelist)
 		ceph_pagelist_release(as_ctx->pagelist);
-- 
2.34.1



* [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (9 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-03-01 10:57   ` Xiubo Li
  2022-01-11 19:15 ` [RFC PATCH v10 12/48] ceph: add fscrypt ioctls Jeff Layton
                   ` (40 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Ceph is a bit different from local filesystems, in that we don't want
to store filenames as raw binary data, since we may also be dealing
with clients that don't support fscrypt.

We could just base64-encode the encrypted filenames, but that could
leave us with filenames longer than NAME_MAX. It turns out that the
MDS doesn't care much about filename length, but the clients do.

To manage this, we've added a new "alternate name" field that can
optionally be attached to any dentry. We'll use it to store the binary
crypttext of the filename if its base64-encoded value would be longer
than NAME_MAX. When a dentry has one of these names attached, the MDS
will send it along in the lease info, which we can then store for
later usage.
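
The length problem comes down to simple arithmetic. The sketch below is
only an illustration: it assumes unpadded base64 (4 output characters per
3 input bytes, rounded up) and a NAME_MAX of 255, and the sketch_* names
are hypothetical, not part of this patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NAME_MAX 255	/* assumed: Linux NAME_MAX */

/* Characters needed to encode nbytes as unpadded base64. */
static size_t sketch_b64_len(size_t nbytes)
{
	return (nbytes * 4 + 2) / 3;
}

/* True if the encoded name would no longer fit in a dentry name. */
static bool sketch_needs_altname(size_t ciphertext_len)
{
	return sketch_b64_len(ciphertext_len) > SKETCH_NAME_MAX;
}
```

Under these assumptions, any crypttext longer than 191 bytes encodes to
more than 255 characters, so it would need the alternate name field.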

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
 fs/ceph/mds_client.h | 11 +++++++----
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 34a4f6dbac9d..709f3f654555 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
 
 static int parse_reply_info_lease(void **p, void *end,
 				  struct ceph_mds_reply_lease **lease,
-				  u64 features)
+				  u64 features, u32 *altname_len, u8 **altname)
 {
+	u8 struct_v;
+	u32 struct_len;
+
 	if (features == (u64)-1) {
-		u8 struct_v, struct_compat;
-		u32 struct_len;
+		u8 struct_compat;
+
 		ceph_decode_8_safe(p, end, struct_v, bad);
 		ceph_decode_8_safe(p, end, struct_compat, bad);
+
 		/* struct_v is expected to be >= 1. we only understand
 		 * encoding whose struct_compat == 1. */
 		if (!struct_v || struct_compat != 1)
 			goto bad;
+
 		ceph_decode_32_safe(p, end, struct_len, bad);
-		ceph_decode_need(p, end, struct_len, bad);
-		end = *p + struct_len;
+	} else {
+		struct_len = sizeof(**lease);
+		*altname_len = 0;
+		*altname = NULL;
 	}
 
-	ceph_decode_need(p, end, sizeof(**lease), bad);
+	ceph_decode_need(p, end, struct_len, bad);
 	*lease = *p;
 	*p += sizeof(**lease);
-	if (features == (u64)-1)
-		*p = end;
+
+	if (features == (u64)-1) {
+		if (struct_v >= 2) {
+			ceph_decode_32_safe(p, end, *altname_len, bad);
+			ceph_decode_need(p, end, *altname_len, bad);
+			*altname = *p;
+			*p += *altname_len;
+		} else {
+			*altname = NULL;
+			*altname_len = 0;
+		}
+	}
 	return 0;
 bad:
 	return -EIO;
@@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
 		info->dname = *p;
 		*p += info->dname_len;
 
-		err = parse_reply_info_lease(p, end, &info->dlease, features);
+		err = parse_reply_info_lease(p, end, &info->dlease, features,
+					     &info->altname_len, &info->altname);
 		if (err < 0)
 			goto out_bad;
 	}
@@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
 		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
 
 		/* dentry lease */
-		err = parse_reply_info_lease(p, end, &rde->lease, features);
+		err = parse_reply_info_lease(p, end, &rde->lease, features,
+					     &rde->altname_len, &rde->altname);
 		if (err)
 			goto out_bad;
+
 		/* inode */
 		err = parse_reply_info_in(p, end, &rde->inode, features);
 		if (err < 0)
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index e7d2c8a1b9c1..128901a847af 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -29,8 +29,8 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,
 	CEPHFS_FEATURE_DELEG_INO,
 	CEPHFS_FEATURE_METRIC_COLLECT,
-
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
+	CEPHFS_FEATURE_ALTERNATE_NAME,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
 };
 
 /*
@@ -45,8 +45,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,		\
 	CEPHFS_FEATURE_DELEG_INO,		\
 	CEPHFS_FEATURE_METRIC_COLLECT,		\
-						\
-	CEPHFS_FEATURE_MAX,			\
+	CEPHFS_FEATURE_ALTERNATE_NAME,		\
 }
 #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
 
@@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
 
 struct ceph_mds_reply_dir_entry {
 	char                          *name;
+	u8			      *altname;
 	u32                           name_len;
+	u32			      altname_len;
 	struct ceph_mds_reply_lease   *lease;
 	struct ceph_mds_reply_info_in inode;
 	loff_t			      offset;
@@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
 	struct ceph_mds_reply_info_in diri, targeti;
 	struct ceph_mds_reply_dirfrag *dirfrag;
 	char                          *dname;
+	u8			      *altname;
 	u32                           dname_len;
+	u32                           altname_len;
 	struct ceph_mds_reply_lease   *dlease;
 
 	/* extra */
-- 
2.34.1



* [RFC PATCH v10 12/48] ceph: add fscrypt ioctls
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (10 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 13/48] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
                   ` (39 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

We gate most of the ioctls on MDS feature support. The exceptions are the
key removal and status functions, which we still want to work if the MDSs
were to (inexplicably) lose the feature.

For the set_policy ioctl, we take Fcx caps to ensure that nothing can
create files in the directory while the ioctl is running. That should
be enough to ensure that the "empty_dir" check is reliable.
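
As a rough userspace model of the gating described above (not the kernel
code itself), the decision can be written as a small table. The enum
values and sketch_* helpers are hypothetical stand-ins for the real
FS_IOC_* commands and the per-session feature test.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fscrypt ioctl commands. */
enum sketch_cmd {
	SKETCH_SET_POLICY,
	SKETCH_GET_POLICY,
	SKETCH_GET_POLICY_EX,
	SKETCH_ADD_KEY,
	SKETCH_REMOVE_KEY,
	SKETCH_REMOVE_KEY_ALL_USERS,
	SKETCH_GET_KEY_STATUS,
	SKETCH_GET_NONCE,
};

/* Key removal and key status must keep working without MDS support. */
static bool sketch_needs_mds_feature(enum sketch_cmd cmd)
{
	switch (cmd) {
	case SKETCH_REMOVE_KEY:
	case SKETCH_REMOVE_KEY_ALL_USERS:
	case SKETCH_GET_KEY_STATUS:
		return false;
	default:
		return true;
	}
}

/* Returns 0 if the command may proceed, -EOPNOTSUPP otherwise. */
static int sketch_vet(enum sketch_cmd cmd, bool mds_has_feature)
{
	if (sketch_needs_mds_feature(cmd) && !mds_has_feature)
		return -EOPNOTSUPP;
	return 0;
}
```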

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/ioctl.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 6e061bf62ad4..477ecc667aee 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -6,6 +6,7 @@
 #include "mds_client.h"
 #include "ioctl.h"
 #include <linux/ceph/striper.h>
+#include <linux/fscrypt.h>
 
 /*
  * ioctls
@@ -268,8 +269,54 @@ static long ceph_ioctl_syncio(struct file *file)
 	return 0;
 }
 
+static int vet_mds_for_fscrypt(struct file *file)
+{
+	int i, ret = -EOPNOTSUPP;
+	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *s = mdsc->sessions[i];
+
+		if (!s)
+			continue;
+		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
+			ret = 0;
+		break;
+	}
+	mutex_unlock(&mdsc->mutex);
+	return ret;
+}
+
+static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
+{
+	int ret, got = 0;
+	struct inode *inode = file_inode(file);
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ret = vet_mds_for_fscrypt(file);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ensure we hold these caps so that we _know_ that the rstats check
+	 * in the empty_dir check is reliable.
+	 */
+	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got);
+	if (ret)
+		return ret;
+
+	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
+	if (got)
+		ceph_put_cap_refs(ci, got);
+
+	return ret;
+}
+
 long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
+	int ret;
+
 	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
 	switch (cmd) {
 	case CEPH_IOC_GET_LAYOUT:
@@ -289,6 +336,42 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 	case CEPH_IOC_SYNCIO:
 		return ceph_ioctl_syncio(file);
+
+	case FS_IOC_SET_ENCRYPTION_POLICY:
+		return ceph_set_encryption_policy(file, arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+
+	case FS_IOC_ADD_ENCRYPTION_KEY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_add_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY:
+		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
+		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
+		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_NONCE:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
 	}
 
 	return -ENOTTY;
-- 
2.34.1



* [RFC PATCH v10 13/48] ceph: make ceph_msdc_build_path use ref-walk
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (11 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 12/48] ceph: add fscrypt ioctls Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 14/48] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
                   ` (38 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Encryption potentially requires allocation, at which point we'll need to
be in a non-atomic context. Convert ceph_mdsc_build_path to take dentry
spinlocks and references instead of using rcu_read_lock to walk the
path.

This is slightly less efficient, and we may want to eventually allow
using RCU when the leaf dentry isn't encrypted.
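
For readers unfamiliar with the walk, here is a simplified userspace
sketch of building a path backwards from the leaf, filling the buffer
from its end. It deliberately leaves out the locking, reference counting
and rename_lock retry that the real function needs; struct sketch_node
and sketch_build_path are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name plus a parent pointer. */
struct sketch_node {
	const char *name;
	const struct sketch_node *parent;	/* NULL at the root */
};

/*
 * Fill buf from the end, walking from the leaf towards the root.
 * Returns a pointer to the start of the path inside buf, or NULL
 * if the buffer is too small.
 */
static char *sketch_build_path(const struct sketch_node *leaf,
			       char *buf, size_t buflen)
{
	size_t pos;
	const struct sketch_node *cur;

	if (buflen == 0)
		return NULL;
	pos = buflen - 1;
	buf[pos] = '\0';

	for (cur = leaf; cur && cur->parent; cur = cur->parent) {
		size_t len = strlen(cur->name);

		if (len > pos)
			return NULL;
		pos -= len;
		memcpy(buf + pos, cur->name, len);

		/* Separate components, but emit no leading '/'. */
		if (cur->parent->parent) {
			if (pos == 0)
				return NULL;
			buf[--pos] = '/';
		}
	}
	return buf + pos;
}
```

In the kernel version each step to the parent has to pin the dentry
(dget_parent/dput), which is what makes it safe to sleep mid-walk.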

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 709f3f654555..68552cee3e8e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2368,7 +2368,8 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 			   int stop_on_nosnap)
 {
-	struct dentry *temp;
+	struct dentry *cur;
+	struct inode *inode;
 	char *path;
 	int pos;
 	unsigned seq;
@@ -2385,34 +2386,35 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	path[pos] = '\0';
 
 	seq = read_seqbegin(&rename_lock);
-	rcu_read_lock();
-	temp = dentry;
+	cur = dget(dentry);
 	for (;;) {
-		struct inode *inode;
+		struct dentry *temp;
 
-		spin_lock(&temp->d_lock);
-		inode = d_inode(temp);
+		spin_lock(&cur->d_lock);
+		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
-			     pos, temp);
-		} else if (stop_on_nosnap && inode && dentry != temp &&
+			     pos, cur);
+		} else if (stop_on_nosnap && inode && dentry != cur &&
 			   ceph_snap(inode) == CEPH_NOSNAP) {
-			spin_unlock(&temp->d_lock);
+			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
 		} else {
-			pos -= temp->d_name.len;
+			pos -= cur->d_name.len;
 			if (pos < 0) {
-				spin_unlock(&temp->d_lock);
+				spin_unlock(&cur->d_lock);
 				break;
 			}
-			memcpy(path + pos, temp->d_name.name, temp->d_name.len);
+			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
 		}
+		temp = cur;
 		spin_unlock(&temp->d_lock);
-		temp = READ_ONCE(temp->d_parent);
+		cur = dget_parent(temp);
+		dput(temp);
 
 		/* Are we at the root? */
-		if (IS_ROOT(temp))
+		if (IS_ROOT(cur))
 			break;
 
 		/* Are we out of buffer? */
@@ -2421,8 +2423,9 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 
 		path[pos] = '/';
 	}
-	base = ceph_ino(d_inode(temp));
-	rcu_read_unlock();
+	inode = d_inode(cur);
+	base = inode ? ceph_ino(inode) : 0;
+	dput(cur);
 
 	if (read_seqretry(&rename_lock, seq))
 		goto retry;
-- 
2.34.1


* [RFC PATCH v10 14/48] ceph: add encrypted fname handling to ceph_mdsc_build_path
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (12 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 13/48] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 15/48] ceph: send altname in MClientRequest Jeff Layton
                   ` (37 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Allow ceph_mdsc_build_path to encrypt and base64 encode the filename
when the parent is encrypted and we're sending the path to the MDS.

In most cases, we just encrypt the filenames and base64 encode them,
but when the ciphertext is longer than CEPH_NOHASH_NAME_MAX, we use a
scheme similar to fscrypt proper and hash the remaining bytes with sha256.

When doing this, we then send along the full crypttext of the name in
the new alternate_name field of the MClientRequest. The MDS can then
send that along in readdir responses and traces.
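
The length arithmetic behind this scheme can be sketched in userspace as
follows. This is only an illustration: the two constants come from this
patch, but base64url_chars() and wire_name_len() are hypothetical helpers,
and the sketch assumes unpadded base64url output.

```c
#include <stddef.h>

/* Values taken from this patch; the helper names below are hypothetical. */
#define SHA256_DIGEST_SIZE	32
#define CEPH_NOHASH_NAME_MAX	(189 - SHA256_DIGEST_SIZE)

/* Length of n bytes in unpadded base64url: ceil(4n / 3) */
static size_t base64url_chars(size_t nbytes)
{
	return (nbytes * 4 + 2) / 3;
}

/* Length of the name placed in the path for a given ciphertext length */
static size_t wire_name_len(size_t ctext_len)
{
	size_t raw = ctext_len;

	/* Long names keep the first CEPH_NOHASH_NAME_MAX bytes plus a digest */
	if (raw > CEPH_NOHASH_NAME_MAX)
		raw = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE;
	return base64url_chars(raw);
}
```

Any ciphertext longer than CEPH_NOHASH_NAME_MAX collapses to 189 raw
bytes, so the encoded name never exceeds 252 characters and stays under
NAME_MAX.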

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c     | 48 ++++++++++++++++++++++++++
 fs/ceph/crypto.h     | 26 ++++++++++++++
 fs/ceph/mds_client.c | 80 ++++++++++++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 017f31eacb74..1f54e948b656 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -127,3 +127,51 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se
 {
 	swap(req->r_fscrypt_auth, as->fscrypt_auth);
 }
+
+int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+	u32 len;
+	int elen;
+	int ret;
+	u8 *cryptbuf;
+
+	WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
+	/*
+	 * convert cleartext dentry name to ciphertext
+	 * if result is longer than CEPH_NOHASH_NAME_MAX,
+	 * sha256 the remaining bytes
+	 *
+	 * See: fscrypt_setup_filename
+	 */
+	if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len))
+		return -ENAMETOOLONG;
+
+	/* Allocate a buffer appropriate to hold the result */
+	cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL);
+	if (!cryptbuf)
+		return -ENOMEM;
+
+	ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ret;
+	}
+
+	/* hash the end if the name is long enough */
+	if (len > CEPH_NOHASH_NAME_MAX) {
+		u8 hash[SHA256_DIGEST_SIZE];
+		u8 *extra = cryptbuf + CEPH_NOHASH_NAME_MAX;
+
+		/* hash the extra bytes and overwrite crypttext beyond that point with it */
+		sha256(extra, len - CEPH_NOHASH_NAME_MAX, hash);
+		memcpy(extra, hash, SHA256_DIGEST_SIZE);
+		len = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE;
+	}
+
+	/* base64 encode the encrypted name */
+	elen = fscrypt_base64url_encode(cryptbuf, len, buf);
+	kfree(cryptbuf);
+	dout("base64-encoded ciphertext name = %.*s\n", elen, buf);
+	return elen;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index cb00fe42d5b7..d5e298383b3e 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -6,6 +6,7 @@
 #ifndef _CEPH_CRYPTO_H
 #define _CEPH_CRYPTO_H
 
+#include <crypto/sha2.h>
 #include <linux/fscrypt.h>
 
 struct ceph_fs_client;
@@ -27,6 +28,24 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
 }
 
 #ifdef CONFIG_FS_ENCRYPTION
+/*
+ * We want to encrypt filenames when creating them, but the encrypted
+ * versions of those names may have illegal characters in them. To mitigate
+ * that, we base64 encode them, but that gives us a result that can exceed
+ * NAME_MAX.
+ *
+ * Follow a similar scheme to fscrypt itself, and cap the filename to a
+ * smaller size. If the ciphertext name is longer than the value below, then
+ * sha256 hash the remaining bytes.
+ *
+ * 189 bytes => 252 bytes base64-encoded, which is <= NAME_MAX (255)
+ *
+ * Note that for long names that end up having their tail portion hashed, we
+ * must also store the full encrypted name (in the dentry's alternate_name
+ * field).
+ */
+#define CEPH_NOHASH_NAME_MAX (189 - SHA256_DIGEST_SIZE)
+
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
 void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
@@ -34,6 +53,7 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
 int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 				 struct ceph_acl_sec_ctx *as);
 void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
+int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf);
 
 #else /* CONFIG_FS_ENCRYPTION */
 
@@ -57,6 +77,12 @@ static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req,
 						struct ceph_acl_sec_ctx *as_ctx)
 {
 }
+
+static inline int ceph_encode_encrypted_fname(const struct inode *parent,
+						struct dentry *dentry, char *buf)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 68552cee3e8e..9552a2eb3e10 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -14,6 +14,7 @@
 #include <linux/bitmap.h>
 
 #include "super.h"
+#include "crypto.h"
 #include "mds_client.h"
 #include "crypto.h"
 
@@ -2355,18 +2356,27 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 	return mdsc->oldest_tid;
 }
 
-/*
- * Build a dentry's path.  Allocate on heap; caller must kfree.  Based
- * on build_path_from_dentry in fs/cifs/dir.c.
+/**
+ * ceph_mdsc_build_path - build a path string to a given dentry
+ * @dentry: dentry to which path should be built
+ * @plen: returned length of string
+ * @pbase: returned base inode number
+ * @for_wire: is this path going to be sent to the MDS?
+ *
+ * Build a string that represents the path to the dentry. This is mostly called
+ * for two different purposes:
  *
- * If @stop_on_nosnap, generate path relative to the first non-snapped
- * inode.
+ * 1) we need to build a path string to send to the MDS (for_wire == true)
+ * 2) we need a path string for local presentation (e.g. debugfs) (for_wire == false)
+ *
+ * The path is built in reverse, starting with the dentry. Walk back up toward
+ * the root, building the path until the first non-snapped inode is reached (for_wire)
+ * or the root inode is reached (!for_wire).
  *
  * Encode hidden .snap dirs as a double /, i.e.
  *   foo/.snap/bar -> foo//bar
  */
-char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
-			   int stop_on_nosnap)
+char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for_wire)
 {
 	struct dentry *cur;
 	struct inode *inode;
@@ -2388,30 +2398,65 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	seq = read_seqbegin(&rename_lock);
 	cur = dget(dentry);
 	for (;;) {
-		struct dentry *temp;
+		struct dentry *parent;
 
 		spin_lock(&cur->d_lock);
 		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
 			     pos, cur);
-		} else if (stop_on_nosnap && inode && dentry != cur &&
-			   ceph_snap(inode) == CEPH_NOSNAP) {
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else if (for_wire && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) {
 			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
-		} else {
+		} else if (!for_wire || !IS_ENCRYPTED(d_inode(cur->d_parent))) {
 			pos -= cur->d_name.len;
 			if (pos < 0) {
 				spin_unlock(&cur->d_lock);
 				break;
 			}
 			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else {
+			int len, ret;
+			char buf[FSCRYPT_BASE64URL_CHARS(NAME_MAX)];
+
+			/*
+			 * Proactively copy name into buf, in case we need to present
+			 * it as-is.
+			 */
+			memcpy(buf, cur->d_name.name, cur->d_name.len);
+			len = cur->d_name.len;
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+
+			ret = __fscrypt_prepare_readdir(d_inode(parent));
+			if (ret < 0) {
+				dput(parent);
+				dput(cur);
+				return ERR_PTR(ret);
+			}
+
+			if (fscrypt_has_encryption_key(d_inode(parent))) {
+				len = ceph_encode_encrypted_fname(d_inode(parent), cur, buf);
+				if (len < 0) {
+					dput(parent);
+					dput(cur);
+					return ERR_PTR(len);
+				}
+			}
+			pos -= len;
+			if (pos < 0) {
+				dput(parent);
+				break;
+			}
+			memcpy(path + pos, buf, len);
 		}
-		temp = cur;
-		spin_unlock(&temp->d_lock);
-		cur = dget_parent(temp);
-		dput(temp);
+		dput(cur);
+		cur = parent;
 
 		/* Are we at the root? */
 		if (IS_ROOT(cur))
@@ -2435,8 +2480,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 		 * A rename didn't occur, but somehow we didn't end up where
 		 * we thought we would. Throw a warning and try again.
 		 */
-		pr_warn("build_path did not end path lookup where "
-			"expected, pos is %d\n", pos);
+		pr_warn("build_path did not end path lookup where expected (pos = %d)\n", pos);
 		goto retry;
 	}
 
@@ -2456,7 +2500,7 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir,
 	rcu_read_lock();
 	if (!dir)
 		dir = d_inode_rcu(dentry->d_parent);
-	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP) {
+	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
 		*pino = ceph_ino(dir);
 		rcu_read_unlock();
 		*ppath = dentry->d_name.name;
-- 
2.34.1


* [RFC PATCH v10 15/48] ceph: send altname in MClientRequest
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (13 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 14/48] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 16/48] ceph: encode encrypted name in dentry release Jeff Layton
                   ` (36 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX,
we'll need to hash the tail of the filename. The client however will
still need to know the full name of the file if it has a key.

To support this, the MClientRequest message has grown a new alternate_name
field that we populate with the full (binary) crypttext of the filename.
The MDS then transmits it to the clients in readdir responses or traces as
part of the dentry lease.

Add support for populating this field when the filenames are very long.
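
As a rough userspace sketch of when the field is populated and how much
room it takes in the request: altname_len() and altname_wire_size() below
are hypothetical helpers that only mirror the length checks in this patch.
They are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the previous patch in this series */
#define SHA256_DIGEST_SIZE	32
#define CEPH_NOHASH_NAME_MAX	(189 - SHA256_DIGEST_SIZE)

/* Hypothetical helper: altname bytes to send for a given ciphertext length */
static uint32_t altname_len(uint32_t ctext_len)
{
	/* Short names are fully recoverable from the path, so send nothing */
	return ctext_len > CEPH_NOHASH_NAME_MAX ? ctext_len : 0;
}

/* Space the field takes in the message: u32 length prefix plus payload */
static size_t altname_wire_size(uint32_t ctext_len)
{
	return sizeof(uint32_t) + altname_len(ctext_len);
}
```

The length prefix is always encoded, so a request for a short name still
spends four bytes on an empty altname.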

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 75 +++++++++++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h |  3 ++
 2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 9552a2eb3e10..8d84995481f2 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -943,6 +943,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 	if (req->r_pagelist)
 		ceph_pagelist_release(req->r_pagelist);
 	kfree(req->r_fscrypt_auth);
+	kfree(req->r_altname);
 	put_request_session(req);
 	ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
 	WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2356,6 +2357,63 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 	return mdsc->oldest_tid;
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	struct inode *dir = req->r_parent;
+	struct dentry *dentry = req->r_dentry;
+	u8 *cryptbuf = NULL;
+	u32 len = 0;
+	int ret = 0;
+
+	/* only encode if we have parent and dentry */
+	if (!dir || !dentry)
+		goto success;
+
+	/* No-op unless this is encrypted */
+	if (!IS_ENCRYPTED(dir))
+		goto success;
+
+	ret = __fscrypt_prepare_readdir(dir);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* No key? Just ignore it. */
+	if (!fscrypt_has_encryption_key(dir))
+		goto success;
+
+	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) {
+		WARN_ON_ONCE(1);
+		return ERR_PTR(-ENAMETOOLONG);
+	}
+
+	/* No need to append altname if name is short enough */
+	if (len <= CEPH_NOHASH_NAME_MAX) {
+		len = 0;
+		goto success;
+	}
+
+	cryptbuf = kmalloc(len, GFP_KERNEL);
+	if (!cryptbuf)
+		return ERR_PTR(-ENOMEM);
+
+	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ERR_PTR(ret);
+	}
+success:
+	*plen = len;
+	return cryptbuf;
+}
+#else
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	*plen = 0;
+	return NULL;
+}
+#endif
+
 /**
  * ceph_mdsc_build_path - build a path string to a given dentry
  * @dentry: dentry to which path should be built
@@ -2576,14 +2634,15 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
 	ceph_encode_timespec64(&ts, &req->r_stamp);
 	ceph_encode_copy(p, &ts, sizeof(ts));
 
-	/* gid_list */
+	/* v4: gid_list */
 	ceph_encode_32(p, req->r_cred->group_info->ngroups);
 	for (i = 0; i < req->r_cred->group_info->ngroups; i++)
 		ceph_encode_64(p, from_kgid(&init_user_ns,
 					    req->r_cred->group_info->gid[i]));
 
-	/* v5: altname (TODO: skip for now) */
-	ceph_encode_32(p, 0);
+	/* v5: altname */
+	ceph_encode_32(p, req->r_altname_len);
+	ceph_encode_copy(p, req->r_altname, req->r_altname_len);
 
 	/* v6: fscrypt_auth and fscrypt_file */
 	if (req->r_fscrypt_auth) {
@@ -2639,7 +2698,13 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		goto out_free1;
 	}
 
-	/* head */
+	req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
+	if (IS_ERR(req->r_altname)) {
+		msg = ERR_CAST(req->r_altname);
+		req->r_altname = NULL;
+		goto out_free2;
+	}
+
 	len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);
 
 	/* filepaths */
@@ -2665,7 +2730,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
 
 	/* alternate name */
-	len += sizeof(u32);	// TODO
+	len += sizeof(u32) + req->r_altname_len;
 
 	/* fscrypt_auth */
 	len += sizeof(u32); // fscrypt_auth
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 128901a847af..6a2ac489e06e 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -284,6 +284,9 @@ struct ceph_mds_request {
 
 	struct ceph_fscrypt_auth *r_fscrypt_auth;
 
+	u8 *r_altname;		    /* fscrypt binary crypttext for long filenames */
+	u32 r_altname_len;	    /* length of r_altname */
+
 	int r_fmode;        /* file mode, if expecting cap */
 	const struct cred *r_cred;
 	int r_request_release_offset;
-- 
2.34.1


* [RFC PATCH v10 16/48] ceph: encode encrypted name in dentry release
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (14 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 15/48] ceph: send altname in MClientRequest Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 17/48] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
                   ` (35 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c       | 31 +++++++++++++++++++++++++++----
 fs/ceph/mds_client.c | 20 ++++++++++++++++----
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 87ee9766dc2e..3a9672e822d9 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4584,6 +4584,18 @@ int ceph_encode_inode_release(void **p, struct inode *inode,
 	return ret;
 }
 
+/**
+ * ceph_encode_dentry_release - encode a dentry release into an outgoing request
+ * @p: outgoing request buffer
+ * @dentry: dentry to release
+ * @dir: dir to release it from
+ * @mds: mds that we're speaking to
+ * @drop: caps being dropped
+ * @unless: unless we have these caps
+ *
+ * Encode a dentry release into an outgoing request buffer. Returns 1 if the
+ * thing was released, 0 if not, or a negative error code on failure.
+ */
 int ceph_encode_dentry_release(void **p, struct dentry *dentry,
 			       struct inode *dir,
 			       int mds, int drop, int unless)
@@ -4616,13 +4628,24 @@ int ceph_encode_dentry_release(void **p, struct dentry *dentry,
 	if (ret && di->lease_session && di->lease_session->s_mds == mds) {
 		dout("encode_dentry_release %p mds%d seq %d\n",
 		     dentry, mds, (int)di->lease_seq);
-		rel->dname_len = cpu_to_le32(dentry->d_name.len);
-		memcpy(*p, dentry->d_name.name, dentry->d_name.len);
-		*p += dentry->d_name.len;
 		rel->dname_seq = cpu_to_le32(di->lease_seq);
 		__ceph_mdsc_drop_dentry_lease(dentry);
+		spin_unlock(&dentry->d_lock);
+		if (IS_ENCRYPTED(dir) && fscrypt_has_encryption_key(dir)) {
+			int ret2 = ceph_encode_encrypted_fname(dir, dentry, *p);
+			if (ret2 < 0)
+				return ret2;
+
+			rel->dname_len = cpu_to_le32(ret2);
+			*p += ret2;
+		} else {
+			rel->dname_len = cpu_to_le32(dentry->d_name.len);
+			memcpy(*p, dentry->d_name.name, dentry->d_name.len);
+			*p += dentry->d_name.len;
+		}
+	} else {
+		spin_unlock(&dentry->d_lock);
 	}
-	spin_unlock(&dentry->d_lock);
 	return ret;
 }
 
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8d84995481f2..1d3334b99047 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2789,15 +2789,23 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		      req->r_inode ? req->r_inode : d_inode(req->r_dentry),
 		      mds, req->r_inode_drop, req->r_inode_unless,
 		      req->r_op == CEPH_MDS_OP_READDIR);
-	if (req->r_dentry_drop)
-		releases += ceph_encode_dentry_release(&p, req->r_dentry,
+	if (req->r_dentry_drop) {
+		ret = ceph_encode_dentry_release(&p, req->r_dentry,
 				req->r_parent, mds, req->r_dentry_drop,
 				req->r_dentry_unless);
-	if (req->r_old_dentry_drop)
-		releases += ceph_encode_dentry_release(&p, req->r_old_dentry,
+		if (ret < 0)
+			goto out_err;
+		releases += ret;
+	}
+	if (req->r_old_dentry_drop) {
+		ret = ceph_encode_dentry_release(&p, req->r_old_dentry,
 				req->r_old_dentry_dir, mds,
 				req->r_old_dentry_drop,
 				req->r_old_dentry_unless);
+		if (ret < 0)
+			goto out_err;
+		releases += ret;
+	}
 	if (req->r_old_inode_drop)
 		releases += ceph_encode_inode_release(&p,
 		      d_inode(req->r_old_dentry),
@@ -2839,6 +2847,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		ceph_mdsc_free_path((char *)path1, pathlen1);
 out:
 	return msg;
+out_err:
+	ceph_msg_put(msg);
+	msg = ERR_PTR(ret);
+	goto out_free2;
 }
 
 /*
-- 
2.34.1


* [RFC PATCH v10 17/48] ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (15 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 16/48] ceph: encode encrypted name in dentry release Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 18/48] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
                   ` (34 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

This is required so that we know to invalidate these dentries when the
directory is unlocked.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 288f6f0b4b74..4fa776d8fa53 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -751,6 +751,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > NAME_MAX)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	if (IS_ENCRYPTED(dir)) {
+		err = __fscrypt_prepare_readdir(dir);
+		if (err)
+			return ERR_PTR(err);
+		if (!fscrypt_has_encryption_key(dir)) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_NOKEY_NAME;
+			spin_unlock(&dentry->d_lock);
+		}
+	}
+
 	/* can we conclude ENOENT locally? */
 	if (d_really_is_negative(dentry)) {
 		struct ceph_inode_info *ci = ceph_inode(dir);
-- 
2.34.1


* [RFC PATCH v10 18/48] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (16 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 17/48] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 19/48] ceph: add helpers for converting names for userland presentation Jeff Layton
                   ` (33 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

If we have a dentry which represents a no-key name, then we need to test
whether the parent directory's encryption key has since been added.  Do
that before we test anything else about the dentry.
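
The check can be modelled in userspace roughly as below. This is a
simplified sketch, not the fscrypt implementation: struct model_dentry
and model_revalidate() are hypothetical, and the RCU-walk and error paths
are left out.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the dentry state that matters here */
struct model_dentry {
	bool nokey_name;	/* models the DCACHE_NOKEY_NAME flag */
	bool dir_has_key;	/* models "parent directory now has its key" */
};

/* Returns 1 if the dentry may still be used, 0 if it must be looked up again */
static int model_revalidate(const struct model_dentry *d)
{
	/* Names created with the key present are never invalidated here */
	if (!d->nokey_name)
		return 1;

	/* A no-key name goes stale as soon as the key shows up */
	return d->dir_has_key ? 0 : 1;
}
```

A zero result makes ceph_d_revalidate return early, before any of the
lease or cap checks run.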

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 4fa776d8fa53..7977484d0317 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1700,6 +1700,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct inode *dir, *inode;
 	struct ceph_mds_client *mdsc;
 
+	valid = fscrypt_d_revalidate(dentry, flags);
+	if (valid <= 0)
+		return valid;
+
 	if (flags & LOOKUP_RCU) {
 		parent = READ_ONCE(dentry->d_parent);
 		dir = d_inode_rcu(parent);
@@ -1712,8 +1716,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		inode = d_inode(dentry);
 	}
 
-	dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry,
-	     dentry, inode, ceph_dentry(dentry)->offset);
+	dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry,
+	     dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME));
 
 	mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;
 
-- 
2.34.1


* [RFC PATCH v10 19/48] ceph: add helpers for converting names for userland presentation
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (17 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 18/48] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 20/48] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
                   ` (32 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 41 ++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 1f54e948b656..35137beb027b 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -175,3 +175,79 @@ int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentr
 	dout("base64-encoded ciphertext name = %.*s\n", elen, buf);
 	return elen;
 }
+
+/**
+ * ceph_fname_to_usr - convert a filename for userland presentation
+ * @fname: ceph_fname to be converted
+ * @tname: temporary name buffer to use for conversion (may be NULL)
+ * @oname: where converted name should be placed
+ * @is_nokey: set to true if key wasn't available during conversion (may be NULL)
+ *
+ * Given a filename (usually from the MDS), format it for presentation to
+ * userland. If the parent (@fname->dir) is not encrypted, just pass it back as-is.
+ *
+ * Otherwise, base64 decode the string, and then ask fscrypt to format it
+ * for userland presentation.
+ *
+ * Returns 0 on success or negative error code on error.
+ */
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+		      struct fscrypt_str *oname, bool *is_nokey)
+{
+	int ret;
+	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str iname;
+
+	if (!IS_ENCRYPTED(fname->dir)) {
+		oname->name = fname->name;
+		oname->len = fname->name_len;
+		return 0;
+	}
+
+	/* Sanity check that the resulting name will fit in the buffer */
+	if (fname->name_len > FSCRYPT_BASE64URL_CHARS(NAME_MAX))
+		return -EIO;
+
+	ret = __fscrypt_prepare_readdir(fname->dir);
+	if (ret)
+		return ret;
+
+	/*
+	 * Use the raw dentry name as sent by the MDS instead of
+	 * generating a nokey name via fscrypt.
+	 */
+	if (!fscrypt_has_encryption_key(fname->dir)) {
+		memcpy(oname->name, fname->name, fname->name_len);
+		oname->len = fname->name_len;
+		if (is_nokey)
+			*is_nokey = true;
+		return 0;
+	}
+
+	if (fname->ctext_len == 0) {
+		int declen;
+
+		if (!tname) {
+			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+			if (ret)
+				return ret;
+			tname = &_tname;
+		}
+
+		declen = fscrypt_base64url_decode(fname->name, fname->name_len, tname->name);
+		if (declen <= 0) {
+			ret = -EIO;
+			goto out;
+		}
+		iname.name = tname->name;
+		iname.len = declen;
+	} else {
+		iname.name = fname->ctext;
+		iname.len = fname->ctext_len;
+	}
+
+	ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname);
+out:
+	fscrypt_fname_free_buffer(&_tname);
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index d5e298383b3e..c2e0cbb5667b 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -13,6 +13,14 @@ struct ceph_fs_client;
 struct ceph_acl_sec_ctx;
 struct ceph_mds_request;
 
+struct ceph_fname {
+	struct inode	*dir;
+	char 		*name;		// b64 encoded, possibly hashed
+	unsigned char	*ctext;		// binary crypttext (if any)
+	u32		name_len;	// length of name buffer
+	u32		ctext_len;	// length of crypttext
+};
+
 struct ceph_fscrypt_auth {
 	__le32	cfa_version;
 	__le32	cfa_blob_len;
@@ -55,6 +63,22 @@ int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
 int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf);
 
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (!IS_ENCRYPTED(parent))
+		return 0;
+	return fscrypt_fname_alloc_buffer(NAME_MAX, fname);
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (IS_ENCRYPTED(parent))
+		fscrypt_fname_free_buffer(fname);
+}
+
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+			struct fscrypt_str *oname, bool *is_nokey);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -83,6 +107,23 @@ static inline int ceph_encode_encrypted_fname(const struct inode *parent,
 {
 	return -EOPNOTSUPP;
 }
+
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	return 0;
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+}
+
+static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+				    struct fscrypt_str *oname, bool *is_nokey)
+{
+	oname->name = fname->name;
+	oname->len = fname->name_len;
+	return 0;
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
-- 
2.34.1


* [RFC PATCH v10 20/48] ceph: add fscrypt support to ceph_fill_trace
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (18 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 19/48] ceph: add helpers for converting names for userland presentation Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 21/48] ceph: add support to readdir for encrypted filenames Jeff Layton
                   ` (31 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

When we get a dentry in a trace, decrypt the name so we can properly
instantiate the dentry.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 55e23e2601df..28a5b70e5521 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1397,8 +1397,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
 		    test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
 		    !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
+			bool is_nokey = false;
 			struct qstr dname;
 			struct dentry *dn, *parent;
+			struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+			struct ceph_fname fname = { .dir	= dir,
+						    .name	= rinfo->dname,
+						    .ctext	= rinfo->altname,
+						    .name_len	= rinfo->dname_len,
+						    .ctext_len	= rinfo->altname_len };
 
 			BUG_ON(!rinfo->head->is_target);
 			BUG_ON(req->r_dentry);
@@ -1406,8 +1413,20 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 			parent = d_find_any_alias(dir);
 			BUG_ON(!parent);
 
-			dname.name = rinfo->dname;
-			dname.len = rinfo->dname_len;
+			err = ceph_fname_alloc_buffer(dir, &oname);
+			if (err < 0) {
+				dput(parent);
+				goto done;
+			}
+
+			err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
+			if (err < 0) {
+				dput(parent);
+				ceph_fname_free_buffer(dir, &oname);
+				goto done;
+			}
+			dname.name = oname.name;
+			dname.len = oname.len;
 			dname.hash = full_name_hash(parent, dname.name, dname.len);
 			tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
 			tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
@@ -1422,9 +1441,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				     dname.len, dname.name, dn);
 				if (!dn) {
 					dput(parent);
+					ceph_fname_free_buffer(dir, &oname);
 					err = -ENOMEM;
 					goto done;
 				}
+				if (is_nokey) {
+					spin_lock(&dn->d_lock);
+					dn->d_flags |= DCACHE_NOKEY_NAME;
+					spin_unlock(&dn->d_lock);
+				}
 				err = 0;
 			} else if (d_really_is_positive(dn) &&
 				   (ceph_ino(d_inode(dn)) != tvino.ino ||
@@ -1436,6 +1461,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				dput(dn);
 				goto retry_lookup;
 			}
+			ceph_fname_free_buffer(dir, &oname);
 
 			req->r_dentry = dn;
 			dput(parent);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 21/48] ceph: add support to readdir for encrypted filenames
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (19 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 20/48] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 22/48] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
                   ` (30 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Add helper functions for buffer management and for decrypting filenames
returned by the MDS. Wire those into the readdir codepaths.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
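Note for readers (not part of the commit): both loops in this patch reuse
one output buffer for every directory entry. Decoding overwrites oname.len
with the decoded length, so the allocated size is saved in olen and put
back after each entry. The userspace sketch below illustrates only that
save/restore pattern. struct name_buf and decode_name() are hypothetical
stand-ins for struct fscrypt_str and ceph_fname_to_usr(); they are not the
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct fscrypt_str: a buffer plus a length. */
struct name_buf {
	char *name;
	size_t len;	/* capacity before a decode, result length after */
};

/*
 * Hypothetical stand-in for ceph_fname_to_usr(): copies src into out and
 * overwrites out->len with the number of bytes produced.
 */
static int decode_name(const char *src, size_t src_len, struct name_buf *out)
{
	if (src_len > out->len)
		return -1;	/* result would not fit in the buffer */
	memcpy(out->name, src, src_len);
	out->len = src_len;
	return 0;
}

/* Decode every entry with one buffer; returns how many were decoded. */
static int decode_all(const char *const *names, int nr, struct name_buf *out)
{
	size_t cap = out->len;	/* remember the allocated size, like olen */
	int i, ok = 0;

	assert(out->name != NULL);
	for (i = 0; i < nr; i++) {
		if (decode_name(names[i], strlen(names[i]), out) == 0)
			ok++;
		out->len = cap;	/* restore capacity before the next entry */
	}
	return ok;
}
```

Without the restore step, a short name early in the listing would shrink
the recorded capacity and later, longer names would be rejected even
though the buffer is large enough.
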
 fs/ceph/dir.c   | 62 +++++++++++++++++++++++++++++++++++++++----------
 fs/ceph/inode.c | 38 +++++++++++++++++++++++++++---
 2 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 7977484d0317..f8812c976ba0 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -9,6 +9,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Directory operations: readdir, lookup, create, link, unlink,
@@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file,  struct dir_context *ctx,
 		di = ceph_dentry(dentry);
 		if (d_unhashed(dentry) ||
 		    d_really_is_negative(dentry) ||
-		    di->lease_shared_gen != shared_gen) {
+		    di->lease_shared_gen != shared_gen ||
+		    ((dentry->d_flags & DCACHE_NOKEY_NAME) &&
+		     fscrypt_has_encryption_key(dir))) {
 			spin_unlock(&dentry->d_lock);
 			dput(dentry);
 			err = -EAGAIN;
@@ -313,6 +316,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	int err;
 	unsigned frag = -1;
 	struct ceph_mds_reply_info_parsed *rinfo;
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	dout("readdir %p file %p pos %llx\n", inode, file, ctx->pos);
 	if (dfi->file_info.flags & CEPH_F_ATEND)
@@ -340,6 +345,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		ctx->pos = 2;
 	}
 
+	err = fscrypt_prepare_readdir(inode);
+	if (err)
+		goto out;
+
 	spin_lock(&ci->i_ceph_lock);
 	/* request Fx cap. if have Fx, we don't need to release Fs cap
 	 * for later create/unlink. */
@@ -360,6 +369,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		spin_unlock(&ci->i_ceph_lock);
 	}
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* proceed with a normal readdir */
 more:
 	/* do we have the correct frag content buffered? */
@@ -387,12 +404,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		dout("readdir fetching %llx.%llx frag %x offset '%s'\n",
 		     ceph_vinop(inode), frag, dfi->last_name);
 		req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
-		if (IS_ERR(req))
-			return PTR_ERR(req);
+		if (IS_ERR(req)) {
+			err = PTR_ERR(req);
+			goto out;
+		}
 		err = ceph_alloc_readdir_reply_buffer(req, inode);
 		if (err) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		/* hints to request -> mds selection code */
 		req->r_direct_mode = USE_AUTH_MDS;
@@ -405,7 +424,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL);
 			if (!req->r_path2) {
 				ceph_mdsc_put_request(req);
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto out;
 			}
 		} else if (is_hash_order(ctx->pos)) {
 			req->r_args.readdir.offset_hash =
@@ -426,7 +446,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
 		if (err < 0) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		dout("readdir got and parsed readdir result=%d on "
 		     "frag %x, end=%d, complete=%d, hash_order=%d\n",
@@ -479,7 +499,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			err = note_last_dentry(dfi, rde->name, rde->name_len,
 					       next_offset);
 			if (err)
-				return err;
+				goto out;
 		} else if (req->r_reply_info.dir_end) {
 			dfi->next_offset = 2;
 			/* keep last name */
@@ -507,22 +527,37 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	}
 	for (; i < rinfo->dir_nr; i++) {
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+		u32 olen = oname.len;
 
 		BUG_ON(rde->offset < ctx->pos);
+		BUG_ON(!rde->inode.in);
 
 		ctx->pos = rde->offset;
 		dout("readdir (%d/%d) -> %llx '%.*s' %p\n",
 		     i, rinfo->dir_nr, ctx->pos,
 		     rde->name_len, rde->name, &rde->inode.in);
 
-		BUG_ON(!rde->inode.in);
+		err = ceph_fname_to_usr(&fname, &tname, &oname, NULL);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		if (!dir_emit(ctx, rde->name, rde->name_len,
+		if (!dir_emit(ctx, oname.name, oname.len,
 			      ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
 			      le32_to_cpu(rde->inode.in->mode) >> 12)) {
 			dout("filldir stopping us...\n");
-			return 0;
+			err = 0;
+			goto out;
 		}
+
+		/* Reset the lengths to their original allocated vals */
+		oname.len = olen;
 		ctx->pos++;
 	}
 
@@ -577,9 +612,12 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 					dfi->dir_ordered_count);
 		spin_unlock(&ci->i_ceph_lock);
 	}
-
+	err = 0;
 	dout("readdir %p file %p done.\n", inode, file);
-	return 0;
+out:
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
+	return err;
 }
 
 static void reset_readdir(struct ceph_dir_file_info *dfi)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 28a5b70e5521..3f3231383780 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1741,7 +1741,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			     struct ceph_mds_session *session)
 {
 	struct dentry *parent = req->r_dentry;
-	struct ceph_inode_info *ci = ceph_inode(d_inode(parent));
+	struct inode *inode = d_inode(parent);
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
 	struct qstr dname;
 	struct dentry *dn;
@@ -1751,6 +1752,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	u32 last_hash = 0;
 	u32 fpos_offset;
 	struct ceph_readdir_cache_control cache_ctl = {};
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	if (test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags))
 		return readdir_prepopulate_inodes_only(req, session);
@@ -1802,14 +1805,36 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	cache_ctl.index = req->r_readdir_cache_idx;
 	fpos_offset = req->r_readdir_offset;
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* FIXME: release caps/leases if error occurs */
 	for (i = 0; i < rinfo->dir_nr; i++) {
+		bool is_nokey = false;
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
 		struct ceph_vino tvino;
+		u32 olen = oname.len;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+
+		err = ceph_fname_to_usr(&fname, &tname, &oname, &is_nokey);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		dname.name = rde->name;
-		dname.len = rde->name_len;
+		dname.name = oname.name;
+		dname.len = oname.len;
 		dname.hash = full_name_hash(parent, dname.name, dname.len);
+		oname.len = olen;
 
 		tvino.ino = le64_to_cpu(rde->inode.in->ino);
 		tvino.snap = le64_to_cpu(rde->inode.in->snapid);
@@ -1840,6 +1865,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 				err = -ENOMEM;
 				goto out;
 			}
+			if (is_nokey) {
+				spin_lock(&dn->d_lock);
+				dn->d_flags |= DCACHE_NOKEY_NAME;
+				spin_unlock(&dn->d_lock);
+			}
 		} else if (d_really_is_positive(dn) &&
 			   (ceph_ino(d_inode(dn)) != tvino.ino ||
 			    ceph_snap(d_inode(dn)) != tvino.snap)) {
@@ -1928,6 +1958,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		req->r_readdir_cache_idx = cache_ctl.index;
 	}
 	ceph_readdir_cache_release(&cache_ctl);
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
 	dout("readdir_prepopulate done\n");
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 22/48] ceph: create symlinks with encrypted and base64-encoded targets
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (20 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 21/48] ceph: add support to readdir for encrypted filenames Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 23/48] ceph: make ceph_get_name decrypt filenames Jeff Layton
                   ` (29 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

When creating symlinks in encrypted directories, encrypt and
base64-encode the target with the new inode's key before sending to the
MDS.

When filling a symlinked inode, base64-decode it into a buffer that
we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer
into a new one that will hang off i_link.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
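Note for readers (not part of the commit): the encrypted target is sent to
the MDS as unpadded base64url text and decoded again when the inode is
filled. The userspace sketch below shows the sizing involved. The encoded
form needs DIV_ROUND_UP(4 * n, 3) characters, and the decoded form is never
longer than the encoded one, so an (enclen + 1)-byte buffer has room for
the data plus a terminator at index declen. b64url_encode(),
b64url_decode() and B64URL_CHARS() are illustrative names written for this
note; they are meant to behave like the fscrypt helpers the patch calls,
but they are not the kernel implementation.

```c
#include <assert.h>
#include <string.h>

/* Characters needed for nbytes of input, unpadded (assumed to match
 * FSCRYPT_BASE64URL_CHARS). */
#define B64URL_CHARS(nbytes) (((nbytes) * 4 + 2) / 3)

static const char b64url_table[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Encode srclen bytes; returns the number of characters written (no NUL). */
static int b64url_encode(const unsigned char *src, int srclen, char *dst)
{
	unsigned int ac = 0;
	int bits = 0, i;
	char *cp = dst;

	assert(src && dst && srclen >= 0);
	for (i = 0; i < srclen; i++) {
		ac = (ac << 8) | src[i];
		bits += 8;
		do {
			bits -= 6;
			*cp++ = b64url_table[(ac >> bits) & 0x3f];
		} while (bits >= 6);
	}
	if (bits)	/* flush the last partial group, zero-filled */
		*cp++ = b64url_table[(ac << (6 - bits)) & 0x3f];
	return (int)(cp - dst);
}

/* Decode srclen characters; returns bytes written, or -1 on bad input. */
static int b64url_decode(const char *src, int srclen, unsigned char *dst)
{
	unsigned int ac = 0;
	int bits = 0, i;
	unsigned char *bp = dst;

	for (i = 0; i < srclen; i++) {
		/* strchr() would match the table's own NUL, so reject it */
		const char *p = src[i] ? strchr(b64url_table, src[i]) : NULL;

		if (!p)
			return -1;
		ac = (ac << 6) | (unsigned int)(p - b64url_table);
		bits += 6;
		if (bits >= 8) {
			bits -= 8;
			*bp++ = (unsigned char)(ac >> bits);
		}
	}
	if (ac & ((1u << bits) - 1))
		return -1;	/* leftover bits must be zero */
	return (int)(bp - dst);
}
```

A 14-byte input, for example, encodes to 19 characters and decodes back to
14 bytes, so the terminator written after decoding lands well inside a
20-byte buffer.
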
 fs/ceph/dir.c   |  51 ++++++++++++++++++++---
 fs/ceph/inode.c | 106 ++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 140 insertions(+), 17 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index f8812c976ba0..bf686e4af27a 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -948,6 +948,40 @@ static int ceph_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return ceph_mknod(mnt_userns, dir, dentry, mode, 0);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	int err;
+	int len = strlen(dest);
+	struct fscrypt_str osd_link = FSTR_INIT(NULL, 0);
+
+	err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link);
+	if (err)
+		goto out;
+
+	err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link);
+	if (err)
+		goto out;
+
+	req->r_path2 = kmalloc(FSCRYPT_BASE64URL_CHARS(osd_link.len) + 1, GFP_KERNEL);
+	if (!req->r_path2) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	len = fscrypt_base64url_encode(osd_link.name, osd_link.len, req->r_path2);
+	req->r_path2[len] = '\0';
+out:
+	fscrypt_fname_free_buffer(&osd_link);
+	return err;
+}
+#else
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 			struct dentry *dentry, const char *dest)
 {
@@ -979,14 +1013,21 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out_req;
 	}
 
-	req->r_path2 = kstrdup(dest, GFP_KERNEL);
-	if (!req->r_path2) {
-		err = -ENOMEM;
-		goto out_req;
-	}
 	req->r_parent = dir;
 	ihold(dir);
 
+	if (IS_ENCRYPTED(req->r_new_inode)) {
+		err = prep_encrypted_symlink_target(req, dest);
+		if (err)
+			goto out_req;
+	} else {
+		req->r_path2 = kstrdup(dest, GFP_KERNEL);
+		if (!req->r_path2) {
+			err = -ENOMEM;
+			goto out_req;
+		}
+	}
+
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 3f3231383780..e7fb212661d1 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -35,6 +35,7 @@
  */
 
 static const struct inode_operations ceph_symlink_iops;
+static const struct inode_operations ceph_encrypted_symlink_iops;
 
 static void ceph_inode_work(struct work_struct *work);
 
@@ -632,6 +633,7 @@ void ceph_free_inode(struct inode *inode)
 #ifdef CONFIG_FS_ENCRYPTION
 	kfree(ci->fscrypt_auth);
 #endif
+	fscrypt_free_inode(inode);
 	kmem_cache_free(ceph_inode_cachep, ci);
 }
 
@@ -829,6 +831,33 @@ void ceph_fill_file_time(struct inode *inode, int issued,
 		     inode, time_warp_seq, ci->i_time_warp_seq);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym)
+{
+	int declen;
+	u8 *sym;
+
+	sym = kmalloc(enclen + 1, GFP_NOFS);
+	if (!sym)
+		return -ENOMEM;
+
+	declen = fscrypt_base64url_decode(encsym, enclen, sym);
+	if (declen < 0) {
+		pr_err("%s: can't decode symlink (%d). Content: %.*s\n", __func__, declen, enclen, encsym);
+		kfree(sym);
+		return -EIO;
+	}
+	sym[declen] = '\0';
+	*decsym = sym;
+	return declen;
+}
+#else
+static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 /*
  * Populate an inode based on info from mds.  May be called on new or
  * existing inodes.
@@ -1062,26 +1091,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		inode->i_fop = &ceph_file_fops;
 		break;
 	case S_IFLNK:
-		inode->i_op = &ceph_symlink_iops;
 		if (!ci->i_symlink) {
 			u32 symlen = iinfo->symlink_len;
 			char *sym;
 
 			spin_unlock(&ci->i_ceph_lock);
 
-			if (symlen != i_size_read(inode)) {
-				pr_err("%s %llx.%llx BAD symlink "
-					"size %lld\n", __func__,
-					ceph_vinop(inode),
-					i_size_read(inode));
+			if (IS_ENCRYPTED(inode)) {
+				if (symlen != i_size_read(inode))
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+
+				err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym);
+				if (err < 0) {
+					pr_err("%s decoding encrypted symlink failed: %d\n",
+						__func__, err);
+					goto out;
+				}
+				symlen = err;
 				i_size_write(inode, symlen);
 				inode->i_blocks = calc_inode_blocks(symlen);
-			}
+			} else {
+				if (symlen != i_size_read(inode)) {
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+					i_size_write(inode, symlen);
+					inode->i_blocks = calc_inode_blocks(symlen);
+				}
 
-			err = -ENOMEM;
-			sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
-			if (!sym)
-				goto out;
+				err = -ENOMEM;
+				sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
+				if (!sym)
+					goto out;
+			}
 
 			spin_lock(&ci->i_ceph_lock);
 			if (!ci->i_symlink)
@@ -1089,7 +1131,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 			else
 				kfree(sym); /* lost a race */
 		}
-		inode->i_link = ci->i_symlink;
+
+		if (IS_ENCRYPTED(inode)) {
+			/*
+			 * Encrypted symlinks need to be decrypted before we can
+			 * cache their targets in i_link. Don't touch it here.
+			 */
+			inode->i_op = &ceph_encrypted_symlink_iops;
+		} else {
+			inode->i_link = ci->i_symlink;
+			inode->i_op = &ceph_symlink_iops;
+		}
 		break;
 	case S_IFDIR:
 		inode->i_op = &ceph_dir_iops;
@@ -2144,6 +2196,29 @@ static void ceph_inode_work(struct work_struct *work)
 	iput(inode);
 }
 
+static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode,
+					   struct delayed_call *done)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+
+	return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done);
+}
+
+static int ceph_encrypted_symlink_getattr(struct user_namespace *mnt_userns,
+					  const struct path *path, struct kstat *stat,
+					  u32 request_mask, unsigned int query_flags)
+{
+	int ret;
+
+	ret = ceph_getattr(mnt_userns, path, stat, request_mask, query_flags);
+	if (ret)
+		return ret;
+	return fscrypt_symlink_getattr(path, stat);
+}
+
 /*
  * symlinks
  */
@@ -2154,6 +2229,13 @@ static const struct inode_operations ceph_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
+static const struct inode_operations ceph_encrypted_symlink_iops = {
+	.get_link = ceph_encrypted_get_link,
+	.setattr = ceph_setattr,
+	.getattr = ceph_encrypted_symlink_getattr,
+	.listxattr = ceph_listxattr,
+};
+
 int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 23/48] ceph: make ceph_get_name decrypt filenames
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (21 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 22/48] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 24/48] ceph: add a new ceph.fscrypt.auth vxattr Jeff Layton
                   ` (28 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

When we do a lookupino to the MDS, we get a filename in the trace.
ceph_get_name uses that name directly, so we must properly decrypt
it before copying it to the name buffer.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/export.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index e0fa66ac8b9f..0ebf2bd93055 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -7,6 +7,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Basic fh
@@ -534,7 +535,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
 {
 	struct ceph_mds_client *mdsc;
 	struct ceph_mds_request *req;
+	struct inode *dir = d_inode(parent);
 	struct inode *inode = d_inode(child);
+	struct ceph_mds_reply_info_parsed *rinfo;
 	int err;
 
 	if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -546,30 +549,47 @@ static int ceph_get_name(struct dentry *parent, char *name,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	inode_lock(d_inode(parent));
-
+	inode_lock(dir);
 	req->r_inode = inode;
 	ihold(inode);
 	req->r_ino2 = ceph_vino(d_inode(parent));
-	req->r_parent = d_inode(parent);
-	ihold(req->r_parent);
+	req->r_parent = dir;
+	ihold(dir);
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_num_caps = 2;
 	err = ceph_mdsc_do_request(mdsc, NULL, req);
+	inode_unlock(dir);
 
-	inode_unlock(d_inode(parent));
+	if (err)
+		goto out;
 
-	if (!err) {
-		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+	rinfo = &req->r_reply_info;
+	if (!IS_ENCRYPTED(dir)) {
 		memcpy(name, rinfo->dname, rinfo->dname_len);
 		name[rinfo->dname_len] = 0;
-		dout("get_name %p ino %llx.%llx name %s\n",
-		     child, ceph_vinop(inode), name);
 	} else {
-		dout("get_name %p ino %llx.%llx err %d\n",
-		     child, ceph_vinop(inode), err);
-	}
+		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+		struct ceph_fname fname = { .dir	= dir,
+					    .name	= rinfo->dname,
+					    .ctext	= rinfo->altname,
+					    .name_len	= rinfo->dname_len,
+					    .ctext_len	= rinfo->altname_len };
+
+		err = ceph_fname_alloc_buffer(dir, &oname);
+		if (err < 0)
+			goto out;
 
+		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
+		if (!err) {
+			memcpy(name, oname.name, oname.len);
+			name[oname.len] = 0;
+		}
+		ceph_fname_free_buffer(dir, &oname);
+	}
+out:
+	dout("get_name %p ino %llx.%llx err %d %s%s\n",
+		     child, ceph_vinop(inode), err,
+		     err ? "" : "name ", err ? "" : name);
 	ceph_mdsc_put_request(req);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 24/48] ceph: add a new ceph.fscrypt.auth vxattr
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (22 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 23/48] ceph: make ceph_get_name decrypt filenames Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 25/48] ceph: add some fscrypt guardrails Jeff Layton
                   ` (27 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Give the client a way to get at the xattr from userland, mostly for
future debugging purposes.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
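Note for readers (not part of the commit): from userland the new vxattr
can be read with getxattr(2) like any other xattr. The callback in this
patch returns the length when called with size 0 and -ERANGE when the
buffer is too small, which matches the usual probe-then-fetch pattern.
A minimal sketch of that pattern follows; read_xattr() is a hypothetical
helper written for this note, and any path passed to it is the caller's
assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Read an xattr in two steps: probe the length with a zero-sized buffer,
 * then fetch the value. Returns a malloc'ed buffer that the caller frees,
 * or NULL on error (errno is left as set by getxattr or malloc).
 */
static void *read_xattr(const char *path, const char *name, size_t *lenp)
{
	ssize_t len, got;
	void *buf;

	assert(path && name && lenp);

	len = getxattr(path, name, NULL, 0);	/* size 0: length only */
	if (len < 0)
		return NULL;

	buf = malloc(len ? (size_t)len : 1);
	if (!buf)
		return NULL;

	got = getxattr(path, name, buf, (size_t)len);
	if (got < 0) {	/* e.g. ERANGE if the value grew in between */
		free(buf);
		return NULL;
	}
	*lenp = (size_t)got;
	return buf;
}
```

A caller would pass the name "ceph.fscrypt.auth" and a path on a CephFS
mount. On kernels built without CONFIG_FS_ENCRYPTION the table entry is
compiled out, so the lookup is expected to fail.
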
 fs/ceph/xattr.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 5e3522457deb..b872673a16a9 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -352,6 +352,23 @@ static ssize_t ceph_vxattrcb_auth_mds(struct ceph_inode_info *ci,
 	return ret;
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static bool ceph_vxattrcb_fscrypt_auth_exists(struct ceph_inode_info *ci)
+{
+	return ci->fscrypt_auth_len;
+}
+
+static ssize_t ceph_vxattrcb_fscrypt_auth(struct ceph_inode_info *ci, char *val, size_t size)
+{
+	if (size) {
+		if (size < ci->fscrypt_auth_len)
+			return -ERANGE;
+		memcpy(val, ci->fscrypt_auth, ci->fscrypt_auth_len);
+	}
+	return ci->fscrypt_auth_len;
+}
+#endif /* CONFIG_FS_ENCRYPTION */
+
 #define CEPH_XATTR_NAME(_type, _name)	XATTR_CEPH_PREFIX #_type "." #_name
 #define CEPH_XATTR_NAME2(_type, _name, _name2)	\
 	XATTR_CEPH_PREFIX #_type "." #_name "." #_name2
@@ -492,6 +509,15 @@ static struct ceph_vxattr ceph_common_vxattrs[] = {
 		.exists_cb = NULL,
 		.flags = VXATTR_FLAG_READONLY,
 	},
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+	{
+		.name = "ceph.fscrypt.auth",
+		.name_size = sizeof("ceph.fscrypt.auth"),
+		.getxattr_cb = ceph_vxattrcb_fscrypt_auth,
+		.exists_cb = ceph_vxattrcb_fscrypt_auth_exists,
+		.flags = VXATTR_FLAG_READONLY,
+	},
+#endif /* CONFIG_FS_ENCRYPTION */
 	{ .name = NULL, 0 }	/* Required table terminator */
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 25/48] ceph: add some fscrypt guardrails
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (23 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 24/48] ceph: add a new ceph.fscrypt.auth vxattr Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 26/48] ceph: don't allow changing layout on encrypted files/directories Jeff Layton
                   ` (26 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Ensure that we call into fscrypt to do a proper check for keys on link,
rename, etc.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   |  8 ++++++++
 fs/ceph/file.c  | 14 +++++++++++++-
 fs/ceph/inode.c |  4 ++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index bf686e4af27a..37c9c589ee27 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1127,6 +1127,10 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return -EROFS;
 
+	err = fscrypt_prepare_link(old_dentry, dir, dentry);
+	if (err)
+		return err;
+
 	dout("link in dir %p old_dentry %p dentry %p\n", dir,
 	     old_dentry, dentry);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_LINK, USE_AUTH_MDS);
@@ -1324,6 +1328,10 @@ static int ceph_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
 	    (!ceph_quota_is_same_realm(old_dir, new_dir)))
 		return -EXDEV;
 
+	err = fscrypt_prepare_rename(old_dir, old_dentry, new_dir, new_dentry, flags);
+	if (err)
+		return err;
+
 	dout("rename dir %p dentry %p to dir %p dentry %p\n",
 	     old_dir, old_dentry, new_dir, new_dentry);
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5937a25ddddd..01e7cdd84c36 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -356,8 +356,13 @@ int ceph_open(struct inode *inode, struct file *file)
 
 	/* filter out O_CREAT|O_EXCL; vfs did that already.  yuck. */
 	flags = file->f_flags & ~(O_CREAT|O_EXCL);
-	if (S_ISDIR(inode->i_mode))
+	if (S_ISDIR(inode->i_mode)) {
 		flags = O_DIRECTORY;  /* mds likes to know */
+	} else if (S_ISREG(inode->i_mode)) {
+		err = fscrypt_file_open(inode, file);
+		if (err)
+			return err;
+	}
 
 	dout("open inode %p ino %llx.%llx file %p flags %d (%d)\n", inode,
 	     ceph_vinop(inode), file, flags, file->f_flags);
@@ -802,6 +807,13 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		dout("atomic_open finish_no_open on dn %p\n", dn);
 		err = finish_no_open(file, dn);
 	} else {
+		if (IS_ENCRYPTED(dir) &&
+		    !fscrypt_has_permitted_context(dir, d_inode(dentry))) {
+			pr_warn("Inconsistent encryption context (parent %llx:%llx child %llx:%llx)\n",
+				ceph_vinop(dir), ceph_vinop(d_inode(dentry)));
+			goto out_req;
+		}
+
 		dout("atomic_open finish_open on dn %p\n", dn);
 		if (req->r_op == CEPH_MDS_OP_CREATE && req->r_reply_info.has_create_ino) {
 			struct inode *newino = d_inode(dentry);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e7fb212661d1..55022fdb1fdf 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2504,6 +2504,10 @@ int ceph_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	if (ceph_inode_is_shutdown(inode))
 		return -ESTALE;
 
+	err = fscrypt_prepare_setattr(dentry, attr);
+	if (err)
+		return err;
+
 	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err != 0)
 		return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 26/48] ceph: don't allow changing layout on encrypted files/directories
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (24 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 25/48] ceph: add some fscrypt guardrails Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 27/48] libceph: add CEPH_OSD_OP_ASSERT_VER support Jeff Layton
                   ` (25 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Luis Henriques

From: Luis Henriques <lhenriques@suse.de>

Encryption is currently only supported on files/directories with layouts
where stripe_count=1.  Forbid changing layouts when encryption is involved.

Signed-off-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/ioctl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 477ecc667aee..480d18bb2ff0 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -294,6 +294,10 @@ static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
+	/* encrypted directories can't have striped layout */
+	if (ci->i_layout.stripe_count > 1)
+		return -EINVAL;
+
 	ret = vet_mds_for_fscrypt(file);
 	if (ret)
 		return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 27/48] libceph: add CEPH_OSD_OP_ASSERT_VER support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (25 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 26/48] ceph: don't allow changing layout on encrypted files/directories Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 28/48] ceph: size handling for encrypted inodes in cap updates Jeff Layton
                   ` (24 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

...and record the user_version in the reply in a new field in
ceph_osd_request, so we can populate the assert_ver appropriately.
Shuffle the fields a bit too so that the new field fits in an
existing hole on x86_64.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/ceph/osd_client.h | 6 +++++-
 include/linux/ceph/rados.h      | 4 ++++
 net/ceph/osd_client.c           | 5 +++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 3431011f364d..90ee000b0124 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -145,6 +145,9 @@ struct ceph_osd_req_op {
 			u32 src_fadvise_flags;
 			struct ceph_osd_data osd_data;
 		} copy_from;
+		struct {
+			u64 ver;
+		} assert_ver;
 	};
 };
 
@@ -199,6 +202,7 @@ struct ceph_osd_request {
 	struct ceph_osd_client *r_osdc;
 	struct kref       r_kref;
 	bool              r_mempool;
+	bool		  r_linger;           /* don't resend on failure */
 	struct completion r_completion;       /* private to osd_client.c */
 	ceph_osdc_callback_t r_callback;
 
@@ -211,9 +215,9 @@ struct ceph_osd_request {
 	struct ceph_snap_context *r_snapc;    /* for writes */
 	struct timespec64 r_mtime;            /* ditto */
 	u64 r_data_offset;                    /* ditto */
-	bool r_linger;                        /* don't resend on failure */
 
 	/* internal */
+	u64 r_version;			      /* data version sent in reply */
 	unsigned long r_stamp;                /* jiffies, send or check time */
 	unsigned long r_start_stamp;          /* jiffies */
 	ktime_t r_start_latency;              /* ktime_t */
diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h
index 43a7a1573b51..73c3efbec36c 100644
--- a/include/linux/ceph/rados.h
+++ b/include/linux/ceph/rados.h
@@ -523,6 +523,10 @@ struct ceph_osd_op {
 		struct {
 			__le64 cookie;
 		} __attribute__ ((packed)) notify;
+		struct {
+			__le64 unused;
+			__le64 ver;
+		} __attribute__ ((packed)) assert_ver;
 		struct {
 			__le64 offset, length;
 			__le64 src_offset;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 1c5815530e0d..8a9416e4893d 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1038,6 +1038,10 @@ static u32 osd_req_encode_op(struct ceph_osd_op *dst,
 		dst->copy_from.src_fadvise_flags =
 			cpu_to_le32(src->copy_from.src_fadvise_flags);
 		break;
+	case CEPH_OSD_OP_ASSERT_VER:
+		dst->assert_ver.unused = cpu_to_le64(0);
+		dst->assert_ver.ver = cpu_to_le64(src->assert_ver.ver);
+		break;
 	default:
 		pr_err("unsupported osd opcode %s\n",
 			ceph_osd_op_name(src->op));
@@ -3763,6 +3767,7 @@ static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg)
 	 * one (type of) reply back.
 	 */
 	WARN_ON(!(m.flags & CEPH_OSD_FLAG_ONDISK));
+	req->r_version = m.user_version;
 	req->r_result = m.result ?: data_len;
 	finish_request(req);
 	mutex_unlock(&osd->lock);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [RFC PATCH v10 28/48] ceph: size handling for encrypted inodes in cap updates
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (26 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 27/48] libceph: add CEPH_OSD_OP_ASSERT_VER support Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 29/48] ceph: fscrypt_file field handling in MClientRequest messages Jeff Layton
                   ` (23 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Transmit the rounded-up size as the normal size, and fill out the
fscrypt_file field with the real file size.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
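For illustration only, a minimal userspace sketch of the size split
described above. It is not kernel code: struct cap_sizes and
cap_sizes_for() are hypothetical names, and the block constants simply
mirror the CEPH_FSCRYPT_BLOCK_* values this patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors CEPH_FSCRYPT_BLOCK_SHIFT/SIZE from fs/ceph/crypto.h in this patch. */
#define FSCRYPT_BLOCK_SHIFT 12
#define FSCRYPT_BLOCK_SIZE  (UINT64_C(1) << FSCRYPT_BLOCK_SHIFT)

/* Hypothetical holder for the two sizes carried in a cap message. */
struct cap_sizes {
	uint64_t size;         /* traditional size field; block-aligned if encrypted */
	uint64_t fscrypt_file; /* real i_size, as carried in fscrypt_file */
};

/* Round x up to the next multiple of the power-of-two crypto block size. */
static uint64_t round_up_block(uint64_t x)
{
	assert((FSCRYPT_BLOCK_SIZE & (FSCRYPT_BLOCK_SIZE - 1)) == 0);
	return (x + FSCRYPT_BLOCK_SIZE - 1) & ~(FSCRYPT_BLOCK_SIZE - 1);
}

/* Compute what would go into each field for a given i_size. */
static struct cap_sizes cap_sizes_for(uint64_t i_size, int encrypted)
{
	struct cap_sizes s;

	s.size = encrypted ? round_up_block(i_size) : i_size;
	s.fscrypt_file = i_size;
	return s;
}
```

With these assumptions, a 5000-byte encrypted file would report 8192 in
the traditional size field and 5000 in fscrypt_file, while an
unencrypted file reports 5000 in both.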
 fs/ceph/caps.c   | 43 +++++++++++++++++++++++++------------------
 fs/ceph/crypto.h |  4 ++++
 2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 3a9672e822d9..8a4f0157854e 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1215,10 +1215,9 @@ struct cap_msg_args {
 	umode_t			mode;
 	bool			inline_data;
 	bool			wake;
+	bool			encrypted;
 	u32			fscrypt_auth_len;
-	u32			fscrypt_file_len;
 	u8			fscrypt_auth[sizeof(struct ceph_fscrypt_auth)]; // for context
-	u8			fscrypt_file[sizeof(u64)]; // for size
 };
 
 /* Marshal up the cap msg to the MDS */
@@ -1253,7 +1252,12 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 	fc->ino = cpu_to_le64(arg->ino);
 	fc->snap_follows = cpu_to_le64(arg->follows);
 
-	fc->size = cpu_to_le64(arg->size);
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+	if (arg->encrypted)
+		fc->size = cpu_to_le64(round_up(arg->size, CEPH_FSCRYPT_BLOCK_SIZE));
+	else
+#endif
+		fc->size = cpu_to_le64(arg->size);
 	fc->max_size = cpu_to_le64(arg->max_size);
 	ceph_encode_timespec64(&fc->mtime, &arg->mtime);
 	ceph_encode_timespec64(&fc->atime, &arg->atime);
@@ -1313,11 +1317,17 @@ static void encode_cap_msg(struct ceph_msg *msg, struct cap_msg_args *arg)
 	ceph_encode_64(&p, 0);
 
 #if IS_ENABLED(CONFIG_FS_ENCRYPTION)
-	/* fscrypt_auth and fscrypt_file (version 12) */
+	/*
+	 * fscrypt_auth and fscrypt_file (version 12)
+	 *
+	 * fscrypt_auth holds the crypto context (if any). fscrypt_file
+	 * tracks the real i_size as an __le64 field (and we use a rounded-up
+	 * i_size in the traditional size field).
+	 */
 	ceph_encode_32(&p, arg->fscrypt_auth_len);
 	ceph_encode_copy(&p, arg->fscrypt_auth, arg->fscrypt_auth_len);
-	ceph_encode_32(&p, arg->fscrypt_file_len);
-	ceph_encode_copy(&p, arg->fscrypt_file, arg->fscrypt_file_len);
+	ceph_encode_32(&p, sizeof(__le64));
+	ceph_encode_64(&p, arg->size);
 #else /* CONFIG_FS_ENCRYPTION */
 	ceph_encode_32(&p, 0);
 	ceph_encode_32(&p, 0);
@@ -1389,7 +1399,6 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
 	arg->follows = flushing ? ci->i_head_snapc->seq : 0;
 	arg->flush_tid = flush_tid;
 	arg->oldest_flush_tid = oldest_flush_tid;
-
 	arg->size = i_size_read(inode);
 	ci->i_reported_size = arg->size;
 	arg->max_size = ci->i_wanted_max_size;
@@ -1443,6 +1452,7 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
 		}
 	}
 	arg->flags = flags;
+	arg->encrypted = IS_ENCRYPTED(inode);
 #if IS_ENABLED(CONFIG_FS_ENCRYPTION)
 	if (ci->fscrypt_auth_len &&
 	    WARN_ON_ONCE(ci->fscrypt_auth_len != sizeof(struct ceph_fscrypt_auth))) {
@@ -1453,21 +1463,21 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
 		memcpy(arg->fscrypt_auth, ci->fscrypt_auth,
 			min_t(size_t, ci->fscrypt_auth_len, sizeof(arg->fscrypt_auth)));
 	}
-	/* FIXME: use this to track "real" size */
-	arg->fscrypt_file_len = 0;
 #endif /* CONFIG_FS_ENCRYPTION */
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
 #define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \
-		      4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4)
+		      4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4 + 8)
 
-#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
 static inline int cap_msg_size(struct cap_msg_args *arg)
 {
-	return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len +
-			arg->fscrypt_file_len;
+	return CAP_MSG_FIXED_FIELDS + arg->fscrypt_auth_len;
 }
 #else
+#define CAP_MSG_FIXED_FIELDS (sizeof(struct ceph_mds_caps) + \
+		      4 + 8 + 4 + 4 + 8 + 4 + 4 + 4 + 8 + 8 + 4 + 8 + 8 + 4 + 4)
+
 static inline int cap_msg_size(struct cap_msg_args *arg)
 {
 	return CAP_MSG_FIXED_FIELDS;
@@ -1546,13 +1556,10 @@ static inline int __send_flush_snap(struct inode *inode,
 	arg.inline_data = capsnap->inline_data;
 	arg.flags = 0;
 	arg.wake = false;
+	arg.encrypted = IS_ENCRYPTED(inode);
 
-	/*
-	 * No fscrypt_auth changes from a capsnap. It will need
-	 * to update fscrypt_file on size changes (TODO).
-	 */
+	/* No fscrypt_auth changes from a capsnap. */
 	arg.fscrypt_auth_len = 0;
-	arg.fscrypt_file_len = 0;
 
 	msg = ceph_msg_new(CEPH_MSG_CLIENT_CAPS, cap_msg_size(&arg),
 			   GFP_NOFS, false);
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index c2e0cbb5667b..ab27a7ed62c3 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -9,6 +9,10 @@
 #include <crypto/sha2.h>
 #include <linux/fscrypt.h>
 
+#define CEPH_FSCRYPT_BLOCK_SHIFT   12
+#define CEPH_FSCRYPT_BLOCK_SIZE    (_AC(1,UL) << CEPH_FSCRYPT_BLOCK_SHIFT)
+#define CEPH_FSCRYPT_BLOCK_MASK	   (~(CEPH_FSCRYPT_BLOCK_SIZE-1))
+
 struct ceph_fs_client;
 struct ceph_acl_sec_ctx;
 struct ceph_mds_request;
-- 
2.34.1


* [RFC PATCH v10 29/48] ceph: fscrypt_file field handling in MClientRequest messages
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (27 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 28/48] ceph: size handling for encrypted inodes in cap updates Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 30/48] ceph: get file size from fscrypt_file when present in inode traces Jeff Layton
                   ` (22 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

For encrypted inodes, transmit a rounded-up size to the MDS as the
normal file size and send the real inode size in the fscrypt_file field.

Fix up creates and truncates to transmit fscrypt_file as well.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
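A rough userspace sketch of the wire layout this patch marshals for
fscrypt_file: a 32-bit length followed by a little-endian 64-bit size
when the field is present, or a bare zero length when it is not. The
helper names (put_le32, put_le64, encode_fscrypt_file) are hypothetical
and only stand in for the ceph_encode_*() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order and advance the cursor. */
static void put_le32(uint8_t **p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		*(*p)++ = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order and advance the cursor. */
static void put_le64(uint8_t **p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		*(*p)++ = (uint8_t)(v >> (8 * i));
}

/*
 * Hypothetical helper: write the fscrypt_file field at the start of buf.
 * Returns the number of bytes written: 12 when the field is present,
 * 4 (just a zero length) when it is not.
 */
static size_t encode_fscrypt_file(uint8_t *buf, size_t buflen,
				  int have_field, uint64_t real_size)
{
	uint8_t *p = buf;

	assert(buflen >= (have_field ? 12u : 4u));
	if (have_field) {
		put_le32(&p, 8);         /* payload length: sizeof(__le64) */
		put_le64(&p, real_size); /* real, unrounded i_size */
	} else {
		put_le32(&p, 0);         /* empty field */
	}
	return (size_t)(p - buf);
}
```

The two return values correspond to the message-length accounting in
create_request_message(): sizeof(u32) always, plus sizeof(__le64) only
when CEPH_MDS_R_FSCRYPT_FILE is set.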
 fs/ceph/dir.c        |  3 +++
 fs/ceph/file.c       |  2 ++
 fs/ceph/inode.c      | 18 ++++++++++++++++--
 fs/ceph/mds_client.c |  9 ++++++++-
 fs/ceph/mds_client.h |  2 ++
 5 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 37c9c589ee27..987c1579614c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -916,6 +916,9 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out_req;
 	}
 
+	if (S_ISREG(mode) && IS_ENCRYPTED(dir))
+		set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 01e7cdd84c36..c65f38045f90 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -727,6 +727,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	req->r_args.open.mask = cpu_to_le32(mask);
 	req->r_parent = dir;
 	ihold(dir);
+	if (IS_ENCRYPTED(dir))
+		set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
 
 	if (flags & O_CREAT) {
 		struct ceph_file_layout lo;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 55022fdb1fdf..4dd84b629850 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2395,11 +2395,25 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 			}
 		} else if ((issued & CEPH_CAP_FILE_SHARED) == 0 ||
 			   attr->ia_size != isize) {
-			req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
-			req->r_args.setattr.old_size = cpu_to_le64(isize);
 			mask |= CEPH_SETATTR_SIZE;
 			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
 				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
+			if (IS_ENCRYPTED(inode) && attr->ia_size) {
+				set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+				mask |= CEPH_SETATTR_FSCRYPT_FILE;
+				req->r_args.setattr.size =
+					cpu_to_le64(round_up(attr->ia_size,
+							     CEPH_FSCRYPT_BLOCK_SIZE));
+				req->r_args.setattr.old_size =
+					cpu_to_le64(round_up(isize,
+							     CEPH_FSCRYPT_BLOCK_SIZE));
+				req->r_fscrypt_file = attr->ia_size;
+				/* FIXME: client must zero out any partial blocks! */
+			} else {
+				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
+				req->r_args.setattr.old_size = cpu_to_le64(isize);
+				req->r_fscrypt_file = 0;
+			}
 		}
 	}
 	if (ia_valid & ATTR_MTIME) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 1d3334b99047..93e5e3c4ba64 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2653,7 +2653,12 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
 	} else {
 		ceph_encode_32(p, 0);
 	}
-	ceph_encode_32(p, 0); // fscrypt_file for now
+	if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags)) {
+		ceph_encode_32(p, sizeof(__le64));
+		ceph_encode_64(p, req->r_fscrypt_file);
+	} else {
+		ceph_encode_32(p, 0);
+	}
 }
 
 /*
@@ -2739,6 +2744,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 
 	/* fscrypt_file */
 	len += sizeof(u32);
+	if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags))
+		len += sizeof(__le64);
 
 	msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
 	if (!msg) {
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 6a2ac489e06e..149a3a828472 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -276,6 +276,7 @@ struct ceph_mds_request {
 #define CEPH_MDS_R_DID_PREPOPULATE	(6) /* prepopulated readdir */
 #define CEPH_MDS_R_PARENT_LOCKED	(7) /* is r_parent->i_rwsem wlocked? */
 #define CEPH_MDS_R_ASYNC		(8) /* async request */
+#define CEPH_MDS_R_FSCRYPT_FILE		(9) /* must marshal fscrypt_file field */
 	unsigned long	r_req_flags;
 
 	struct mutex r_fill_mutex;
@@ -283,6 +284,7 @@ struct ceph_mds_request {
 	union ceph_mds_request_args r_args;
 
 	struct ceph_fscrypt_auth *r_fscrypt_auth;
+	u64	r_fscrypt_file;
 
 	u8 *r_altname;		    /* fscrypt binary crypttext for long filenames */
 	u32 r_altname_len;	    /* length of r_altname */
-- 
2.34.1


* [RFC PATCH v10 30/48] ceph: get file size from fscrypt_file when present in inode traces
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (28 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 29/48] ceph: fscrypt_file field handling in MClientRequest messages Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 31/48] ceph: handle fscrypt fields in cap messages from MDS Jeff Layton
                   ` (21 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 4dd84b629850..2497306eef58 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -982,6 +982,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		     from_kgid(&init_user_ns, inode->i_gid));
 		ceph_decode_timespec64(&ci->i_btime, &iinfo->btime);
 		ceph_decode_timespec64(&ci->i_snap_btime, &iinfo->snap_btime);
+
+#ifdef CONFIG_FS_ENCRYPTION
+		if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
+			ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
+			ci->fscrypt_auth = iinfo->fscrypt_auth;
+			iinfo->fscrypt_auth = NULL;
+			iinfo->fscrypt_auth_len = 0;
+			inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+		}
+#endif
 	}
 
 	if ((new_version || (new_issued & CEPH_CAP_LINK_SHARED)) &&
@@ -1005,6 +1015,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 
 	if (new_version ||
 	    (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) {
+		u64 size = le64_to_cpu(info->size);
 		s64 old_pool = ci->i_layout.pool_id;
 		struct ceph_string *old_ns;
 
@@ -1018,10 +1029,17 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 
 		pool_ns = old_ns;
 
+		if (IS_ENCRYPTED(inode) && size &&
+		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
+			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
+			if (le64_to_cpu(info->size) != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
+				pr_warn("size=%llu fscrypt_file=%llu\n", le64_to_cpu(info->size), size);
+		}
+
 		queue_trunc = ceph_fill_file_size(inode, issued,
 					le32_to_cpu(info->truncate_seq),
 					le64_to_cpu(info->truncate_size),
-					le64_to_cpu(info->size));
+					size);
 		/* only update max_size on auth cap */
 		if ((info->cap.flags & CEPH_CAP_FLAG_AUTH) &&
 		    ci->i_max_size != le64_to_cpu(info->max_size)) {
@@ -1061,16 +1079,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		xattr_blob = NULL;
 	}
 
-#ifdef CONFIG_FS_ENCRYPTION
-	if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
-		ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
-		ci->fscrypt_auth = iinfo->fscrypt_auth;
-		iinfo->fscrypt_auth = NULL;
-		iinfo->fscrypt_auth_len = 0;
-		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
-	}
-#endif
-
 	/* finally update i_version */
 	if (le64_to_cpu(info->version) > ci->i_version)
 		ci->i_version = le64_to_cpu(info->version);
-- 
2.34.1


* [RFC PATCH v10 31/48] ceph: handle fscrypt fields in cap messages from MDS
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (29 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 30/48] ceph: get file size from fscrypt_file when present in inode traces Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 32/48] ceph: add __ceph_get_caps helper support Jeff Layton
                   ` (20 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
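As a sketch of the decoding rule used for fscrypt_file in this patch
(read a 32-bit length; take the payload as the file size only if it is
exactly eight bytes; otherwise skip it), here is a userspace model with
explicit bounds checks. The names get_le32, get_le64 and
decode_fscrypt_file are hypothetical stand-ins for the
ceph_decode_*_safe() macros, and -1 stands in for -EIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a little-endian 64-bit value. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical decoder for one length-prefixed fscrypt_file field.
 * Returns 0 on success and -1 if the message is truncated. *size is
 * written only when the payload is exactly 8 bytes; any other length
 * is skipped without touching *size.
 */
static int decode_fscrypt_file(const uint8_t **p, const uint8_t *end,
			       uint64_t *size)
{
	uint32_t len;

	assert(*p <= end);
	if ((size_t)(end - *p) < 4)
		return -1;
	len = get_le32(*p);
	*p += 4;
	if ((size_t)(end - *p) < len)
		return -1;
	if (len == 8)
		*size = get_le64(*p);
	*p += len;
	return 0;
}
```

Skipping unexpected lengths rather than failing keeps the cursor aligned
for whatever fields follow, which is the same choice the
ceph_decode_skip_n() branch makes.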
 fs/ceph/caps.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 8a4f0157854e..9106340c9c0a 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -3331,6 +3331,9 @@ struct cap_extra_info {
 	/* currently issued */
 	int issued;
 	struct timespec64 btime;
+	u8 *fscrypt_auth;
+	u32 fscrypt_auth_len;
+	u64 fscrypt_file_size;
 };
 
 /*
@@ -3363,6 +3366,14 @@ static void handle_cap_grant(struct inode *inode,
 	bool deleted_inode = false;
 	bool fill_inline = false;
 
+	/*
+	 * If there is at least one crypto block then we'll trust fscrypt_file_size.
+	 * If the real length of the file is 0, then ignore it (it has probably been
+	 * truncated down to 0 by the MDS).
+	 */
+	if (IS_ENCRYPTED(inode) && size)
+		size = extra_info->fscrypt_file_size;
+
 	dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
 	     inode, cap, session->s_mds, seq, ceph_cap_string(newcaps));
 	dout(" size %llu max_size %llu, i_size %llu\n", size, max_size,
@@ -3841,7 +3852,8 @@ static void handle_cap_flushsnap_ack(struct inode *inode, u64 flush_tid,
  */
 static bool handle_cap_trunc(struct inode *inode,
 			     struct ceph_mds_caps *trunc,
-			     struct ceph_mds_session *session)
+			     struct ceph_mds_session *session,
+			     struct cap_extra_info *extra_info)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	int mds = session->s_mds;
@@ -3858,6 +3870,14 @@ static bool handle_cap_trunc(struct inode *inode,
 
 	issued |= implemented | dirty;
 
+	/*
+	 * If there is at least one crypto block then we'll trust fscrypt_file_size.
+	 * If the real length of the file is 0, then ignore it (it has probably been
+	 * truncated down to 0 by the MDS).
+	 */
+	if (IS_ENCRYPTED(inode) && size)
+		size = extra_info->fscrypt_file_size;
+
 	dout("handle_cap_trunc inode %p mds%d seq %d to %lld seq %d\n",
 	     inode, mds, seq, truncate_size, truncate_seq);
 	queue_trunc = ceph_fill_file_size(inode, issued,
@@ -4076,6 +4096,48 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
 	*target_cap = cap;
 }
 
+#ifdef CONFIG_FS_ENCRYPTION
+static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra)
+{
+	u32 len;
+
+	ceph_decode_32_safe(p, end, extra->fscrypt_auth_len, bad);
+	if (extra->fscrypt_auth_len) {
+		ceph_decode_need(p, end, extra->fscrypt_auth_len, bad);
+		extra->fscrypt_auth = kmalloc(extra->fscrypt_auth_len, GFP_KERNEL);
+		if (!extra->fscrypt_auth)
+			return -ENOMEM;
+		ceph_decode_copy_safe(p, end, extra->fscrypt_auth,
+					extra->fscrypt_auth_len, bad);
+	}
+
+	ceph_decode_32_safe(p, end, len, bad);
+	if (len == sizeof(u64))
+		ceph_decode_64_safe(p, end, extra->fscrypt_file_size, bad);
+	else
+		ceph_decode_skip_n(p, end, len, bad);
+	return 0;
+bad:
+	return -EIO;
+}
+#else
+static int parse_fscrypt_fields(void **p, void *end, struct cap_extra_info *extra)
+{
+	u32 len;
+
+	/* Don't care about these fields unless we're encryption-capable */
+	ceph_decode_32_safe(p, end, len, bad);
+	if (len)
+		ceph_decode_skip_n(p, end, len, bad);
+	ceph_decode_32_safe(p, end, len, bad);
+	if (len)
+		ceph_decode_skip_n(p, end, len, bad);
+	return 0;
+bad:
+	return -EIO;
+}
+#endif
+
 /*
  * Handle a caps message from the MDS.
  *
@@ -4194,6 +4256,12 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 		ceph_decode_64_safe(&p, end, extra_info.nsubdirs, bad);
 	}
 
+	if (msg_version >= 12) {
+		int ret = parse_fscrypt_fields(&p, end, &extra_info);
+		if (ret)
+			goto bad;
+	}
+
 	/* lookup ino */
 	inode = ceph_find_inode(mdsc->fsc->sb, vino);
 	ci = ceph_inode(inode);
@@ -4290,7 +4358,8 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 		break;
 
 	case CEPH_CAP_OP_TRUNC:
-		queue_trunc = handle_cap_trunc(inode, h, session);
+		queue_trunc = handle_cap_trunc(inode, h, session,
+						&extra_info);
 		spin_unlock(&ci->i_ceph_lock);
 		if (queue_trunc)
 			ceph_queue_vmtruncate(inode);
@@ -4308,6 +4377,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	iput(inode);
 out:
 	ceph_put_string(extra_info.pool_ns);
+	kfree(extra_info.fscrypt_auth);
 	return;
 
 flush_cap_releases:
-- 
2.34.1


* [RFC PATCH v10 32/48] ceph: add __ceph_get_caps helper support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (30 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 31/48] ceph: handle fscrypt fields in cap messages from MDS Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 33/48] ceph: add __ceph_sync_read " Jeff Layton
                   ` (19 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c  | 19 +++++++++++++------
 fs/ceph/super.h |  2 ++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 9106340c9c0a..944b18b4e217 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2913,10 +2913,9 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
  * due to a small max_size, make sure we check_max_size (and possibly
  * ask the mds) so we don't get hung up indefinitely.
  */
-int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi, int need,
+		    int want, loff_t endoff, int *got)
 {
-	struct ceph_file_info *fi = filp->private_data;
-	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	int ret, _got, flags;
@@ -2925,7 +2924,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 	if (ret < 0)
 		return ret;
 
-	if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+	if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
 	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
 		return -EBADF;
 
@@ -2933,7 +2932,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 
 	while (true) {
 		flags &= CEPH_FILE_MODE_MASK;
-		if (atomic_read(&fi->num_locks))
+		if (fi && atomic_read(&fi->num_locks))
 			flags |= CHECK_FILELOCK;
 		_got = 0;
 		ret = try_get_cap_refs(inode, need, want, endoff,
@@ -2978,7 +2977,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 				continue;
 		}
 
-		if ((fi->fmode & CEPH_FILE_MODE_WR) &&
+		if (fi && (fi->fmode & CEPH_FILE_MODE_WR) &&
 		    fi->filp_gen != READ_ONCE(fsc->filp_gen)) {
 			if (ret >= 0 && _got)
 				ceph_put_cap_refs(ci, _got);
@@ -3041,6 +3040,14 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
 	return 0;
 }
 
+int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got)
+{
+	struct ceph_file_info *fi = filp->private_data;
+	struct inode *inode = file_inode(filp);
+
+	return __ceph_get_caps(inode, fi, need, want, endoff, got);
+}
+
 /*
  * Take cap refs.  Caller must already know we hold at least one ref
  * on the caps in question or we don't know this is safe.
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 042ea1f8e5c2..c60ff747672a 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1230,6 +1230,8 @@ extern int ceph_encode_dentry_release(void **p, struct dentry *dn,
 				      struct inode *dir,
 				      int mds, int drop, int unless);
 
+extern int __ceph_get_caps(struct inode *inode, struct ceph_file_info *fi,
+			   int need, int want, loff_t endoff, int *got);
 extern int ceph_get_caps(struct file *filp, int need, int want,
 			 loff_t endoff, int *got);
 extern int ceph_try_get_caps(struct inode *inode,
-- 
2.34.1


* [RFC PATCH v10 33/48] ceph: add __ceph_sync_read helper support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (31 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 32/48] ceph: add __ceph_get_caps helper support Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 34/48] ceph: add object version support for sync read Jeff Layton
                   ` (18 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c  | 34 ++++++++++++++++++++++------------
 fs/ceph/super.h |  2 ++
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c65f38045f90..4309ff942943 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -883,21 +883,18 @@ enum {
  * If we get a short result from the OSD, check against i_size; we need to
  * only return a short read to the caller if we hit EOF.
  */
-static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
-			      int *retry_op)
+ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+			 struct iov_iter *to, int *retry_op)
 {
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	ssize_t ret;
-	u64 off = iocb->ki_pos;
+	u64 off = *ki_pos;
 	u64 len = iov_iter_count(to);
 	u64 i_size = i_size_read(inode);
 
-	dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
-	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
 
 	if (!len)
 		return 0;
@@ -999,14 +996,14 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 			break;
 	}
 
-	if (off > iocb->ki_pos) {
+	if (off > *ki_pos) {
 		if (off >= i_size) {
 			*retry_op = CHECK_EOF;
-			ret = i_size - iocb->ki_pos;
-			iocb->ki_pos = i_size;
+			ret = i_size - *ki_pos;
+			*ki_pos = i_size;
 		} else {
-			ret = off - iocb->ki_pos;
-			iocb->ki_pos = off;
+			ret = off - *ki_pos;
+			*ki_pos = off;
 		}
 	}
 
@@ -1014,6 +1011,19 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 	return ret;
 }
 
+static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
+			      int *retry_op)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+
+	dout("sync_read on file %p %llu~%u %s\n", file, iocb->ki_pos,
+	     (unsigned)iov_iter_count(to),
+	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
+
+	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+}
+
 struct ceph_aio_request {
 	struct kiocb *iocb;
 	size_t total_len;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index c60ff747672a..eb91586ad8f3 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1258,6 +1258,8 @@ extern int ceph_renew_caps(struct inode *inode, int fmode);
 extern int ceph_open(struct inode *inode, struct file *file);
 extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			    struct file *file, unsigned flags, umode_t mode);
+extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
+				struct iov_iter *to, int *retry_op);
 extern int ceph_release(struct inode *inode, struct file *filp);
 extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
 				  char *data, size_t len);
-- 
2.34.1


* [RFC PATCH v10 34/48] ceph: add object version support for sync read
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (32 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 33/48] ceph: add __ceph_sync_read " Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 35/48] ceph: add infrastructure for file encryption and decryption Jeff Layton
                   ` (17 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Always return the last object's version.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
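A simplified userspace model of the "last object's version" rule, under
the assumption that each OSD reply can be reduced to a result and a
version. struct obj_reply and read_objects() are hypothetical; the real
loop in __ceph_sync_read() also handles short reads and EOF checks that
are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object reply: bytes read (or negative error) and version. */
struct obj_reply {
	long     result;
	uint64_t version;
};

/*
 * Walk a series of replies, stopping at the first error. Returns the
 * total bytes read, or the error if nothing was read. If last_objver is
 * non-NULL and data was read, it receives the version of the last
 * object that returned data; otherwise it is left untouched.
 */
static long read_objects(const struct obj_reply *replies, size_t n,
			 uint64_t *last_objver)
{
	long ret = 0;
	uint64_t objver = 0;

	assert(replies || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (replies[i].result < 0) {
			if (ret == 0)
				ret = replies[i].result;
			break;
		}
		if (replies[i].result > 0) {
			objver = replies[i].version;
			ret += replies[i].result;
		}
	}

	if (last_objver && ret > 0)
		*last_objver = objver;
	return ret;
}
```

Making the out-parameter optional lets existing callers such as
ceph_sync_read() pass NULL and keep their current behaviour.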
 fs/ceph/file.c  | 11 +++++++++--
 fs/ceph/super.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4309ff942943..f14a2999f6d5 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -884,7 +884,8 @@ enum {
  * only return a short read to the caller if we hit EOF.
  */
 ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
-			 struct iov_iter *to, int *retry_op)
+			 struct iov_iter *to, int *retry_op,
+			 u64 *last_objver)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
@@ -893,6 +894,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 	u64 off = *ki_pos;
 	u64 len = iov_iter_count(to);
 	u64 i_size = i_size_read(inode);
+	u64 objver = 0;
 
 	dout("sync_read on inode %p %llu~%u\n", inode, *ki_pos, (unsigned)len);
 
@@ -951,6 +953,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 					 req->r_end_latency,
 					 len, ret);
 
+		if (ret > 0)
+			objver = req->r_version;
 		ceph_osdc_put_request(req);
 
 		i_size = i_size_read(inode);
@@ -1007,6 +1011,9 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		}
 	}
 
+	if (last_objver && ret > 0)
+		*last_objver = objver;
+
 	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
 	return ret;
 }
@@ -1021,7 +1028,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 	     (unsigned)iov_iter_count(to),
 	     (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
 
-	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op);
+	return __ceph_sync_read(inode, &iocb->ki_pos, to, retry_op, NULL);
 }
 
 struct ceph_aio_request {
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index eb91586ad8f3..c17622778720 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1259,7 +1259,8 @@ extern int ceph_open(struct inode *inode, struct file *file);
 extern int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			    struct file *file, unsigned flags, umode_t mode);
 extern ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
-				struct iov_iter *to, int *retry_op);
+				struct iov_iter *to, int *retry_op,
+				u64 *last_objver);
 extern int ceph_release(struct inode *inode, struct file *filp);
 extern void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
 				  char *data, size_t len);
-- 
2.34.1


* [RFC PATCH v10 35/48] ceph: add infrastructure for file encryption and decryption
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (33 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 34/48] ceph: add object version support for sync read Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt Jeff Layton
                   ` (16 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

...and allow test_dummy_encryption to bypass content encryption
if mounted with test_dummy_encryption=clear.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
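To make the index arithmetic in the per-block loops easier to follow,
here is a small userspace sketch of how the i-th crypto block maps to a
page index, an offset within that page, and a logical block number.
struct blk_loc and locate_block() are hypothetical, and page_shift is a
parameter only so that page sizes larger than the 4k crypto block can
be modelled; the kernel code uses PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors CEPH_FSCRYPT_BLOCK_SHIFT from fs/ceph/crypto.h. */
#define FSCRYPT_BLOCK_SHIFT 12

/* Where the i-th crypto block of a page array lives. */
struct blk_loc {
	unsigned int pgidx;  /* index into the page array */
	unsigned int pgoffs; /* byte offset inside that page */
	uint64_t     lblk;   /* logical block number within the file */
};

/*
 * file_off is the file offset of the first block in the array; i counts
 * crypto blocks from the start of the array.
 */
static struct blk_loc locate_block(uint64_t file_off, unsigned int i,
				   unsigned int page_shift)
{
	struct blk_loc loc;
	unsigned int blkoff = i << FSCRYPT_BLOCK_SHIFT;

	assert(page_shift >= FSCRYPT_BLOCK_SHIFT && page_shift < 32);
	loc.pgidx = blkoff >> page_shift;
	loc.pgoffs = blkoff & ((1u << page_shift) - 1);
	loc.lblk = (file_off >> FSCRYPT_BLOCK_SHIFT) + i;
	return loc;
}
```

With 4k pages every block starts at offset 0 of its own page; with a
hypothetical 16k page, block 5 would land at offset 4096 of page 1.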
 fs/ceph/crypto.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h |  66 ++++++++++++++++++++++++++
 fs/ceph/super.c  |   8 ++++
 fs/ceph/super.h  |   1 +
 4 files changed, 196 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 35137beb027b..5a87e7385d3f 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -251,3 +251,124 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
 	fscrypt_fname_free_buffer(&_tname);
 	return ret;
 }
+
+int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+				  struct page *page, unsigned int len,
+				  unsigned int offs, u64 lblk_num)
+{
+	struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options;
+
+	if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR)
+		return 0;
+
+	dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num);
+	return fscrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num);
+}
+
+int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+				  struct page *page, unsigned int len,
+				  unsigned int offs, u64 lblk_num, gfp_t gfp_flags)
+{
+	struct ceph_mount_options *opt = ceph_inode_to_client(inode)->mount_options;
+
+	if (opt->flags & CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR)
+		return 0;
+
+	dout("%s: len %u offs %u blk %llu\n", __func__, len, offs, lblk_num);
+	return fscrypt_encrypt_block_inplace(inode, page, len, offs, lblk_num, gfp_flags);
+}
+
+/**
+ * ceph_fscrypt_decrypt_pages - decrypt an array of pages
+ * @inode: pointer to inode associated with these pages
+ * @page: pointer to page array
+ * @off: offset into the file that the read data starts
+ * @len: max length to decrypt
+ *
+ * Decrypt an array of fscrypt'ed pages and return the amount of
+ * data decrypted. Any data in the page prior to the start of the
+ * first complete block in the read is ignored. Any incomplete
+ * crypto blocks at the end of the array are ignored (and should
+ * probably be zeroed by the caller).
+ *
+ * Returns the length of the decrypted data or a negative errno.
+ */
+int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len)
+{
+	int i, num_blocks;
+	u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT;
+	int ret = 0;
+
+	/*
+	 * We can't deal with partial blocks on an encrypted file, so mask off
+	 * the last bit.
+	 */
+	num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK);
+
+	/* Decrypt each block */
+	for (i = 0; i < num_blocks; ++i) {
+		int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT;
+		int pgidx = blkoff >> PAGE_SHIFT;
+		unsigned int pgoffs = offset_in_page(blkoff);
+		int fret;
+
+		fret = ceph_fscrypt_decrypt_block_inplace(inode, page[pgidx],
+				CEPH_FSCRYPT_BLOCK_SIZE, pgoffs,
+				baseblk + i);
+		if (fret < 0) {
+			if (ret == 0)
+				ret = fret;
+			break;
+		}
+		ret += CEPH_FSCRYPT_BLOCK_SIZE;
+	}
+	return ret;
+}
+
+/**
+ * ceph_fscrypt_encrypt_pages - encrypt an array of pages
+ * @inode: pointer to inode associated with these pages
+ * @page: pointer to page array
+ * @off: offset into the file that the data starts
+ * @len: max length to encrypt
+ * @gfp: gfp flags to use for allocation
+ *
+ * Encrypt an array of cleartext pages and return the amount of
+ * data encrypted. Any data in the page prior to the start of the
+ * first complete block in the read is ignored. Any incomplete
+ * crypto blocks at the end of the array are ignored.
+ *
+ * Returns the length of the encrypted data or a negative errno.
+ */
+int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
+				int len, gfp_t gfp)
+{
+	int i, num_blocks;
+	u64 baseblk = off >> CEPH_FSCRYPT_BLOCK_SHIFT;
+	int ret = 0;
+
+	/*
+	 * We can't deal with partial blocks on an encrypted file, so mask off
+	 * the last bit.
+	 */
+	num_blocks = ceph_fscrypt_blocks(off, len & CEPH_FSCRYPT_BLOCK_MASK);
+
+	/* Encrypt each block */
+	for (i = 0; i < num_blocks; ++i) {
+		int blkoff = i << CEPH_FSCRYPT_BLOCK_SHIFT;
+		int pgidx = blkoff >> PAGE_SHIFT;
+		unsigned int pgoffs = offset_in_page(blkoff);
+		int fret;
+
+		fret = ceph_fscrypt_encrypt_block_inplace(inode, page[pgidx],
+				CEPH_FSCRYPT_BLOCK_SIZE, pgoffs,
+				baseblk + i, gfp);
+		if (fret < 0) {
+			if (ret == 0)
+				ret = fret;
+			break;
+		}
+		ret += CEPH_FSCRYPT_BLOCK_SIZE;
+	}
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index ab27a7ed62c3..b5d360085fe8 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -31,6 +31,10 @@ struct ceph_fscrypt_auth {
 	u8	cfa_blob[FSCRYPT_SET_CONTEXT_MAX_SIZE];
 } __packed;
 
+#define CEPH_FSCRYPT_BLOCK_SHIFT	12
+#define CEPH_FSCRYPT_BLOCK_SIZE		(_AC(1,UL) << CEPH_FSCRYPT_BLOCK_SHIFT)
+#define CEPH_FSCRYPT_BLOCK_MASK		(~(CEPH_FSCRYPT_BLOCK_SIZE-1))
+
 #define CEPH_FSCRYPT_AUTH_VERSION	1
 static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
 {
@@ -83,6 +87,38 @@ static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_s
 int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
 			struct fscrypt_str *oname, bool *is_nokey);
 
+static inline unsigned int ceph_fscrypt_blocks(u64 off, u64 len)
+{
+	/* crypto blocks cannot span more than one page */
+	BUILD_BUG_ON(CEPH_FSCRYPT_BLOCK_SHIFT > PAGE_SHIFT);
+
+	return ((off+len+CEPH_FSCRYPT_BLOCK_SIZE-1) >> CEPH_FSCRYPT_BLOCK_SHIFT) -
+		(off >> CEPH_FSCRYPT_BLOCK_SHIFT);
+}
+
+/*
+ * If we have an encrypted inode then we must adjust the offset and
+ * range of the on-the-wire read to cover an entire encryption block.
+ * The copy will be done using the original offset and length, after
+ * we've decrypted the result.
+ */
+static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len)
+{
+	if (IS_ENCRYPTED(inode)) {
+		*len = ceph_fscrypt_blocks(*off, *len) * CEPH_FSCRYPT_BLOCK_SIZE;
+		*off &= CEPH_FSCRYPT_BLOCK_MASK;
+	}
+}
+
+int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+				  struct page *page, unsigned int len,
+				  unsigned int offs, u64 lblk_num);
+int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+				  struct page *page, unsigned int len,
+				  unsigned int offs, u64 lblk_num, gfp_t gfp_flags);
+int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len);
+int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
+				int len, gfp_t gfp);
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -128,6 +164,36 @@ static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscry
 	oname->len = fname->name_len;
 	return 0;
 }
+
+static inline void fscrypt_adjust_off_and_len(struct inode *inode, u64 *off, u64 *len)
+{
+}
+
+static inline int ceph_fscrypt_decrypt_block_inplace(const struct inode *inode,
+					  struct page *page, unsigned int len,
+					  unsigned int offs, u64 lblk_num)
+{
+	return 0;
+}
+
+static inline int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
+				  struct page *page, unsigned int len,
+				  unsigned int offs, u64 lblk_num, gfp_t gfp_flags)
+{
+	return 0;
+}
+
+static inline int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page,
+					     u64 off, int len)
+{
+	return 0;
+}
+
+static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page,
+					     u64 off, int len, gfp_t gfp)
+{
+	return 0;
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 0b32d31c6fe0..10923d75a876 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1076,6 +1076,14 @@ static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_cont
 			return -EEXIST;
 		}
 
+		/* HACK: allow for cleartext "encryption" in files for testing */
+		if (fsc->mount_options->test_dummy_encryption &&
+		    !strcmp(fsc->mount_options->test_dummy_encryption, "clear")) {
+			fsopt->flags |= CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR;
+			kfree(fsc->mount_options->test_dummy_encryption);
+			fsc->mount_options->test_dummy_encryption = NULL;
+		}
+
 		err = fscrypt_set_test_dummy_encryption(sb,
 							fsc->mount_options->test_dummy_encryption,
 							&fsc->dummy_enc_policy);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index c17622778720..4d2ccb51fe61 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -50,6 +50,7 @@
 #define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
 #define CEPH_MOUNT_OPT_NOPAGECACHE     (1<<16) /* bypass pagecache altogether */
 #define CEPH_MOUNT_OPT_TEST_DUMMY_ENC  (1<<17) /* enable dummy encryption (for testing) */
+#define CEPH_MOUNT_OPT_DUMMY_ENC_CLEAR (1<<18) /* don't actually encrypt content */
 
 #define CEPH_MOUNT_OPT_DEFAULT			\
 	(CEPH_MOUNT_OPT_DCACHE |		\
-- 
2.34.1


* [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (34 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 35/48] ceph: add infrastructure for file encryption and decryption Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-12  8:41   ` Xiubo Li
  2022-01-11 19:15 ` [RFC PATCH v10 37/48] libceph: allow ceph_osdc_new_request to accept a multi-op read Jeff Layton
                   ` (15 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Transfer the encrypted contents of the last block to the MDS along
with the truncate request, but only when the new size is smaller than
the current size and is not aligned to the fscrypt block size. When
the last block lies in a file hole, the truncate request contains
only the header.
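
As a rough userspace sketch of the rule above (not code from the patch),
the decision and the resulting data_len could be written as follows. The
function names are invented for illustration; the 8 + 8 + 4 figure and
the 4096-byte block size come from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CRYPTO_BLOCK_SIZE 4096u	/* mirrors CEPH_FSCRYPT_BLOCK_SIZE */

/* Does this truncate need the last block sent along with the request? */
static bool needs_last_block_rmw(bool encrypted, uint64_t old_size,
				 uint64_t new_size)
{
	return encrypted && new_size < old_size &&
	       (new_size % CRYPTO_BLOCK_SIZE) != 0;
}

/* Bytes of the last block that survive the truncate ("boff" in the patch). */
static uint32_t bytes_kept_in_last_block(uint64_t new_size)
{
	/* Only meaningful for unaligned sizes. */
	assert(new_size % CRYPTO_BLOCK_SIZE != 0);
	return (uint32_t)(new_size % CRYPTO_BLOCK_SIZE);
}

/*
 * data_len covers assert_ver (8) + file_offset (8) + block_size (4),
 * plus one full block unless the object version is 0 (a hole).
 */
static uint32_t truncate_data_len(uint64_t objver)
{
	uint32_t fixed = 8 + 8 + 4;

	return objver ? fixed + CRYPTO_BLOCK_SIZE : fixed;
}
```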

The MDS can fail the truncate if another client or process has
already updated the RADOS object that contains the last block. In
that case it returns -EAGAIN and the kclient needs to retry. The RMW
takes around 50ms, so let it retry up to 20 times for now.
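
The retry behaviour can be sketched in userspace like this. It is only
an illustration: send_with_retry(), send_fn and always_busy() are
hypothetical names, and the callback stands in for rebuilding and
resending the whole setattr request. Note that with the post-decrement
test used in the patch, 20 retries means up to 21 attempts in total.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: build and send one truncate request. */
typedef int (*send_fn)(void *ctx);

/* Resend while the request loses the race (-EAGAIN) and retries remain. */
static int send_with_retry(send_fn send, void *ctx, int retries)
{
	int err;

	assert(send != NULL);
	for (;;) {
		err = send(ctx);
		if (err == -EAGAIN && retries--)
			continue;
		return err;
	}
}

/* Example stand-in: counts attempts and always reports a lost race. */
static int always_busy(void *ctx)
{
	++*(int *)ctx;
	return -EAGAIN;
}
```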

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.h |  21 +++++
 fs/ceph/inode.c  | 217 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/ceph/super.h  |   5 ++
 3 files changed, 229 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index b5d360085fe8..3b7efffecbeb 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -25,6 +25,27 @@ struct ceph_fname {
 	u32		ctext_len;	// length of crypttext
 };
 
+/*
+ * Header for the encrypted file when truncating the size. This
+ * is sent to the MDS, which updates the encrypted last block
+ * and then truncates the size.
+ */
+struct ceph_fscrypt_truncate_size_header {
+       __u8  ver;
+       __u8  compat;
+
+       /*
+	* This is sizeof(assert_ver) + sizeof(file_offset) +
+	* sizeof(block_size) when the last block lies in a file hole
+	* and is empty. Otherwise add CEPH_FSCRYPT_BLOCK_SIZE.
+	*/
+       __le32 data_len;
+
+       __le64 assert_ver;
+       __le64 file_offset;
+       __le32 block_size;
+} __packed;
+
 struct ceph_fscrypt_auth {
 	__le32	cfa_version;
 	__le32	cfa_blob_len;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 2497306eef58..eecda0a73908 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -586,6 +586,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	ci->i_truncate_seq = 0;
 	ci->i_truncate_size = 0;
 	ci->i_truncate_pending = 0;
+	ci->i_truncate_pagecache_size = 0;
 
 	ci->i_max_size = 0;
 	ci->i_reported_size = 0;
@@ -759,6 +760,10 @@ int ceph_fill_file_size(struct inode *inode, int issued,
 		dout("truncate_size %lld -> %llu\n", ci->i_truncate_size,
 		     truncate_size);
 		ci->i_truncate_size = truncate_size;
+		if (IS_ENCRYPTED(inode))
+			ci->i_truncate_pagecache_size = size;
+		else
+			ci->i_truncate_pagecache_size = truncate_size;
 	}
 	return queue_trunc;
 }
@@ -1015,7 +1020,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 
 	if (new_version ||
 	    (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) {
-		u64 size = info->size;
+		u64 size = le64_to_cpu(info->size);
 		s64 old_pool = ci->i_layout.pool_id;
 		struct ceph_string *old_ns;
 
@@ -1030,16 +1035,20 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		pool_ns = old_ns;
 
 		if (IS_ENCRYPTED(inode) && size &&
-		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
-			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
-			if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
-				pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size);
+		    (iinfo->fscrypt_file_len >= sizeof(__le64))) {
+			u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
+			if (fsize) {
+				size = fsize;
+				if (le64_to_cpu(info->size) !=
+				    round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
+					pr_warn("size=%llu fscrypt_file=%llu\n",
+						info->size, size);
+			}
 		}
 
 		queue_trunc = ceph_fill_file_size(inode, issued,
 					le32_to_cpu(info->truncate_seq),
-					le64_to_cpu(info->truncate_size),
-					le64_to_cpu(size));
+					le64_to_cpu(info->truncate_size), size);
 		/* only update max_size on auth cap */
 		if ((info->cap.flags & CEPH_CAP_FLAG_AUTH) &&
 		    ci->i_max_size != le64_to_cpu(info->max_size)) {
@@ -2153,7 +2162,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
 	/* there should be no reader or writer */
 	WARN_ON_ONCE(ci->i_rd_ref || ci->i_wr_ref);
 
-	to = ci->i_truncate_size;
+	to = ci->i_truncate_pagecache_size;
 	wrbuffer_refs = ci->i_wrbuffer_ref;
 	dout("__do_pending_vmtruncate %p (%d) to %lld\n", inode,
 	     ci->i_truncate_pending, to);
@@ -2163,7 +2172,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
 	truncate_pagecache(inode, to);
 
 	spin_lock(&ci->i_ceph_lock);
-	if (to == ci->i_truncate_size) {
+	if (to == ci->i_truncate_pagecache_size) {
 		ci->i_truncate_pending = 0;
 		finish = 1;
 	}
@@ -2244,6 +2253,143 @@ static const struct inode_operations ceph_encrypted_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
+/*
+ * Transfer the encrypted last block to the MDS, which will
+ * update it when truncating to a smaller size.
+ *
+ * We don't support a PAGE_SIZE that is smaller than the
+ * CEPH_FSCRYPT_BLOCK_SIZE.
+ */
+static int fill_fscrypt_truncate(struct inode *inode,
+				 struct ceph_mds_request *req,
+				 struct iattr *attr)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE;
+	loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE);
+	u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT;
+	struct ceph_pagelist *pagelist = NULL;
+	struct kvec iov;
+	struct iov_iter iter;
+	struct page *page = NULL;
+	struct ceph_fscrypt_truncate_size_header header;
+	int retry_op = 0;
+	int len = CEPH_FSCRYPT_BLOCK_SIZE;
+	loff_t i_size = i_size_read(inode);
+	int got, ret, issued;
+	u64 objver;
+
+	ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got);
+	if (ret < 0)
+		return ret;
+
+	issued = __ceph_caps_issued(ci, NULL);
+
+	dout("%s size %lld -> %lld got cap refs on %s, issued %s\n", __func__,
+	     i_size, attr->ia_size, ceph_cap_string(got),
+	     ceph_cap_string(issued));
+
+	/* Try to writeback the dirty pagecaches */
+	if (issued & (CEPH_CAP_FILE_BUFFER))
+		filemap_write_and_wait(inode->i_mapping);
+
+	page = __page_cache_alloc(GFP_KERNEL);
+	if (page == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	pagelist = ceph_pagelist_alloc(GFP_KERNEL);
+	if (!pagelist) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	iov.iov_base = kmap_local_page(page);
+	iov.iov_len = len;
+	iov_iter_kvec(&iter, READ, &iov, 1, len);
+
+	pos = orig_pos;
+	ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &objver);
+	ceph_put_cap_refs(ci, got);
+	if (ret < 0)
+		goto out;
+
+	/* Insert the header first */
+	header.ver = 1;
+	header.compat = 1;
+
+	/*
+	 * Always set the block_size to CEPH_FSCRYPT_BLOCK_SIZE,
+	 * because in MDS it may need this to do the truncate.
+	 */
+	header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE);
+
+	/*
+	 * If we hit a hole here, skip filling in the last block
+	 * for the request. Once fscrypt is enabled, the file is
+	 * split into blocks of CEPH_FSCRYPT_BLOCK_SIZE bytes, so
+	 * the size of any hole should be a multiple of the block
+	 * size.
+	 *
+	 * If the RADOS object doesn't exist, objver will be set
+	 * to 0.
+	 */
+	if (!objver) {
+		dout("%s hit hole, ppos %lld < size %lld\n", __func__,
+		     pos, i_size);
+
+		header.data_len = cpu_to_le32(8 + 8 + 4);
+
+		/*
+		 * An "assert_ver" of 0 means we hit a hole, and the
+		 * MDS will use it to check whether we hit a hole or
+		 * not.
+		 */
+		header.assert_ver = 0;
+		header.file_offset = 0;
+		ret = 0;
+	} else {
+		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
+		header.assert_ver = cpu_to_le64(objver);
+		header.file_offset = cpu_to_le64(orig_pos);
+
+		/* truncate and zero out the extra contents for the last block */
+		memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);
+
+		/* encrypt the last block */
+		ret = ceph_fscrypt_encrypt_block_inplace(inode, page,
+						    CEPH_FSCRYPT_BLOCK_SIZE,
+						    0, block,
+						    GFP_KERNEL);
+		if (ret)
+			goto out;
+	}
+
+	/* Insert the header */
+	ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
+	if (ret)
+		goto out;
+
+	if (header.block_size) {
+		/* Append the last block contents to pagelist */
+		ret = ceph_pagelist_append(pagelist, iov.iov_base,
+					   CEPH_FSCRYPT_BLOCK_SIZE);
+		if (ret)
+			goto out;
+	}
+	req->r_pagelist = pagelist;
+out:
+	dout("%s %p size dropping cap refs on %s\n", __func__,
+	     inode, ceph_cap_string(got));
+	kunmap_local(iov.iov_base);
+	if (page)
+		__free_pages(page, 0);
+	if (ret && pagelist)
+		ceph_pagelist_release(pagelist);
+	return ret;
+}
+
 int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
@@ -2251,13 +2397,17 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 	struct ceph_mds_request *req;
 	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
 	struct ceph_cap_flush *prealloc_cf;
+	loff_t isize = i_size_read(inode);
 	int issued;
 	int release = 0, dirtied = 0;
 	int mask = 0;
 	int err = 0;
 	int inode_dirty_flags = 0;
 	bool lock_snap_rwsem = false;
+	bool fill_fscrypt;
+	int truncate_retry = 20; /* The RMW will take around 50ms */
 
+retry:
 	prealloc_cf = ceph_alloc_cap_flush();
 	if (!prealloc_cf)
 		return -ENOMEM;
@@ -2269,6 +2419,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		return PTR_ERR(req);
 	}
 
+	fill_fscrypt = false;
 	spin_lock(&ci->i_ceph_lock);
 	issued = __ceph_caps_issued(ci, NULL);
 
@@ -2390,10 +2541,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		}
 	}
 	if (ia_valid & ATTR_SIZE) {
-		loff_t isize = i_size_read(inode);
-
 		dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size);
-		if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
+		/*
+		 * The RMW is only needed when the new size is smaller
+		 * and not aligned to CEPH_FSCRYPT_BLOCK_SIZE.
+		 */
+		if (IS_ENCRYPTED(inode) && attr->ia_size < isize &&
+		    (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) {
+			mask |= CEPH_SETATTR_SIZE;
+			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
+				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
+			set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
+			mask |= CEPH_SETATTR_FSCRYPT_FILE;
+			req->r_args.setattr.size =
+				cpu_to_le64(round_up(attr->ia_size,
+						     CEPH_FSCRYPT_BLOCK_SIZE));
+			req->r_args.setattr.old_size =
+				cpu_to_le64(round_up(isize,
+						     CEPH_FSCRYPT_BLOCK_SIZE));
+			req->r_fscrypt_file = attr->ia_size;
+			fill_fscrypt = true;
+		} else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
 			if (attr->ia_size > isize) {
 				i_size_write(inode, attr->ia_size);
 				inode->i_blocks = calc_inode_blocks(attr->ia_size);
@@ -2416,7 +2584,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 					cpu_to_le64(round_up(isize,
 							     CEPH_FSCRYPT_BLOCK_SIZE));
 				req->r_fscrypt_file = attr->ia_size;
-				/* FIXME: client must zero out any partial blocks! */
 			} else {
 				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
 				req->r_args.setattr.old_size = cpu_to_le64(isize);
@@ -2482,8 +2649,10 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 
 	release &= issued;
 	spin_unlock(&ci->i_ceph_lock);
-	if (lock_snap_rwsem)
+	if (lock_snap_rwsem) {
 		up_read(&mdsc->snap_rwsem);
+		lock_snap_rwsem = false;
+	}
 
 	if (inode_dirty_flags)
 		__mark_inode_dirty(inode, inode_dirty_flags);
@@ -2495,7 +2664,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		req->r_args.setattr.mask = cpu_to_le32(mask);
 		req->r_num_caps = 1;
 		req->r_stamp = attr->ia_ctime;
+		if (fill_fscrypt) {
+			err = fill_fscrypt_truncate(inode, req, attr);
+			if (err)
+				goto out;
+		}
+
+		/*
+		 * The truncate request will return -EAGAIN when the
+		 * last block has been updated just before the MDS
+		 * successfully gets the xlock for the FILE lock. To
+		 * avoid corrupting the file contents we need to retry
+		 * it.
+		 */
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
+		if (err == -EAGAIN && truncate_retry--) {
+			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
+			     inode, err, ceph_cap_string(dirtied), mask);
+			ceph_mdsc_put_request(req);
+			ceph_free_cap_flush(prealloc_cf);
+			goto retry;
+		}
 	}
 out:
 	dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 4d2ccb51fe61..cd4a83fcbc0f 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -410,6 +410,11 @@ struct ceph_inode_info {
 	u32 i_truncate_seq;        /* last truncate to smaller size */
 	u64 i_truncate_size;       /*  and the size we last truncated down to */
 	int i_truncate_pending;    /*  still need to call vmtruncate */
+	/*
+	 * For the non-fscrypt case this equals i_truncate_size;
+	 * otherwise it equals fscrypt_file_size.
+	 */
+	u64 i_truncate_pagecache_size;
 
 	u64 i_max_size;            /* max file size authorized by mds */
 	u64 i_reported_size; /* (max_)size reported to or requested of mds */
-- 
2.34.1


* [RFC PATCH v10 37/48] libceph: allow ceph_osdc_new_request to accept a multi-op read
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (35 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 38/48] ceph: disable fallocate for encrypted inodes Jeff Layton
                   ` (14 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Currently we have some special-casing for multi-op writes, but in the
case of a read, we can't really handle it. All of the current multi-op
callers call it with CEPH_OSD_FLAG_WRITE set.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.
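
The selection logic described above amounts to something like the
following userspace sketch. The flag values and the pick_op_counts()
helper are hypothetical and exist only for illustration; the real
CEPH_OSD_FLAG_* values and the __ceph_osdc_alloc_messages() call are in
the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's CEPH_OSD_FLAG_* values differ. */
#define SKETCH_FLAG_READ  0x1u
#define SKETCH_FLAG_WRITE 0x2u

struct op_counts {
	int req_ops;	/* ops to size the request message for */
	int rep_ops;	/* ops to size the reply message for */
};

/* Decide how a multi-op request splits into request and reply ops. */
static int pick_op_counts(unsigned int flags, int num_ops,
			  struct op_counts *out)
{
	assert(out != NULL && num_ops > 1);

	if (flags & SKETCH_FLAG_WRITE) {
		out->req_ops = num_ops;
		out->rep_ops = 0;
	} else if (flags & SKETCH_FLAG_READ) {
		out->req_ops = 0;
		out->rep_ops = num_ops;
	} else {
		return -EINVAL;	/* neither a read nor a write */
	}
	return 0;
}
```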

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 net/ceph/osd_client.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8a9416e4893d..24ccd66cc034 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1125,15 +1125,30 @@ struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
 	if (flags & CEPH_OSD_FLAG_WRITE)
 		req->r_data_offset = off;
 
-	if (num_ops > 1)
+	if (num_ops > 1) {
+		int num_req_ops, num_rep_ops;
+
 		/*
-		 * This is a special case for ceph_writepages_start(), but it
-		 * also covers ceph_uninline_data().  If more multi-op request
-		 * use cases emerge, we will need a separate helper.
+		 * If this is a multi-op write request, assume that we'll need
+		 * request ops. If it's a multi-op read then assume we'll need
+		 * reply ops. Anything else and call it -EINVAL.
 		 */
-		r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_ops, 0);
-	else
+		if (flags & CEPH_OSD_FLAG_WRITE) {
+			num_req_ops = num_ops;
+			num_rep_ops = 0;
+		} else if (flags & CEPH_OSD_FLAG_READ) {
+			num_req_ops = 0;
+			num_rep_ops = num_ops;
+		} else {
+			r = -EINVAL;
+			goto fail;
+		}
+
+		r = __ceph_osdc_alloc_messages(req, GFP_NOFS, num_req_ops,
+						num_rep_ops);
+	} else {
 		r = ceph_osdc_alloc_messages(req, GFP_NOFS);
+	}
 	if (r)
 		goto fail;
 
-- 
2.34.1


* [RFC PATCH v10 38/48] ceph: disable fallocate for encrypted inodes
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (36 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 37/48] libceph: allow ceph_osdc_new_request to accept a multi-op read Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:15 ` [RFC PATCH v10 39/48] ceph: disable copy offload on " Jeff Layton
                   ` (13 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

...hopefully, just for now.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f14a2999f6d5..c79c95138843 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2136,6 +2136,9 @@ static long ceph_fallocate(struct file *file, int mode,
 	if (!S_ISREG(inode->i_mode))
 		return -EOPNOTSUPP;
 
+	if (IS_ENCRYPTED(inode))
+		return -EOPNOTSUPP;
+
 	prealloc_cf = ceph_alloc_cap_flush();
 	if (!prealloc_cf)
 		return -ENOMEM;
-- 
2.34.1


* [RFC PATCH v10 39/48] ceph: disable copy offload on encrypted inodes
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (37 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 38/48] ceph: disable fallocate for encrypted inodes Jeff Layton
@ 2022-01-11 19:15 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 40/48] ceph: don't use special DIO path for " Jeff Layton
                   ` (12 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:15 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

If we have an encrypted inode, then the client will need to re-encrypt
the contents of the new object. Disable copy offload to or from
encrypted inodes.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c79c95138843..1711fde46548 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2462,6 +2462,10 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
 		return -EOPNOTSUPP;
 	}
 
+	/* Every encrypted inode gets its own key, so we can't offload them */
+	if (IS_ENCRYPTED(src_inode) || IS_ENCRYPTED(dst_inode))
+		return -EOPNOTSUPP;
+
 	if (len < src_ci->i_layout.object_size)
 		return -EOPNOTSUPP; /* no remote copy will be done */
 
-- 
2.34.1


* [RFC PATCH v10 40/48] ceph: don't use special DIO path for encrypted inodes
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (38 preceding siblings ...)
  2022-01-11 19:15 ` [RFC PATCH v10 39/48] ceph: disable copy offload on " Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 41/48] ceph: set encryption context on open Jeff Layton
                   ` (11 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Eventually I want to merge the synchronous and direct read codepaths,
possibly via new netfs infrastructure. For now, the direct path is not
crypto-enabled, so use the sync read/write paths instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1711fde46548..b74c9bf2cef1 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1631,7 +1631,9 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		     ceph_cap_string(got));
 
 		if (ci->i_inline_version == CEPH_INLINE_NONE) {
-			if (!retry_op && (iocb->ki_flags & IOCB_DIRECT)) {
+			if (!retry_op &&
+			    (iocb->ki_flags & IOCB_DIRECT) &&
+			    !IS_ENCRYPTED(inode)) {
 				ret = ceph_direct_read_write(iocb, to,
 							     NULL, NULL);
 				if (ret >= 0 && ret < len)
@@ -1863,7 +1865,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 		/* we might need to revert back to that point */
 		data = *from;
-		if (iocb->ki_flags & IOCB_DIRECT)
+		if ((iocb->ki_flags & IOCB_DIRECT) && !IS_ENCRYPTED(inode))
 			written = ceph_direct_read_write(iocb, &data, snapc,
 							 &prealloc_cf);
 		else
-- 
2.34.1


* [RFC PATCH v10 41/48] ceph: set encryption context on open
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (39 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 40/48] ceph: don't use special DIO path for " Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 42/48] ceph: align data in pages in ceph_sync_write Jeff Layton
                   ` (10 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index b74c9bf2cef1..17e26c030f5f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -369,6 +369,12 @@ int ceph_open(struct inode *inode, struct file *file)
 	fmode = ceph_flags_to_mode(flags);
 	wanted = ceph_caps_for_mode(fmode);
 
+	if (S_ISREG(inode->i_mode)) {
+		err = fscrypt_file_open(inode, file);
+		if (err)
+			return err;
+	}
+
 	/* snapped files are read-only */
 	if (ceph_snap(inode) != CEPH_NOSNAP && (file->f_mode & FMODE_WRITE))
 		return -EROFS;
-- 
2.34.1


* [RFC PATCH v10 42/48] ceph: align data in pages in ceph_sync_write
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (40 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 41/48] ceph: set encryption context on open Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write Jeff Layton
                   ` (9 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Encrypted files will need to be dealt with in block-sized chunks and
once we do that, the way that ceph_sync_write aligns the data in the
bounce buffer won't be acceptable.

Change it to align the data the same way it would be aligned in the
pagecache.
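
To illustrate the difference, here is a small userspace sketch comparing
the old page count (data packed from the start of the first page) with
the pagecache-style count that calc_pages_for() gives. It assumes 4 KiB
pages, which is also the crypto block size in this series; the
SKETCH_PAGE_* macros and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, equal to the crypto block size. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Old behaviour: data starts at offset 0 of the first bounce page. */
static size_t pages_packed(size_t len)
{
	return (size_t)((len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}

/* New behaviour: pages needed when data keeps its in-file alignment. */
static size_t pages_aligned(uint64_t pos, size_t len)
{
	assert(len > 0);
	return (size_t)(((pos + len + SKETCH_PAGE_SIZE - 1) >>
			 SKETCH_PAGE_SHIFT) -
			(pos >> SKETCH_PAGE_SHIFT));
}

/* Where in the first page the data starts under the new scheme. */
static size_t first_page_offset(uint64_t pos)
{
	return (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
}
```

A 200-byte write at file offset 4000 fits in one packed page, but needs
two pages once it keeps its alignment, because it straddles the 4096
boundary.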

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 17e26c030f5f..a6305ad5519b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1479,6 +1479,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	bool check_caps = false;
 	struct timespec64 mtime = current_time(inode);
 	size_t count = iov_iter_count(from);
+	size_t off;
 
 	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
 		return -EROFS;
@@ -1516,12 +1517,8 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 			break;
 		}
 
-		/*
-		 * write from beginning of first page,
-		 * regardless of io alignment
-		 */
-		num_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-
+		/* FIXME: express in FSCRYPT_BLOCK_SIZE units */
+		num_pages = calc_pages_for(pos, len);
 		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
 		if (IS_ERR(pages)) {
 			ret = PTR_ERR(pages);
@@ -1529,9 +1526,11 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		}
 
 		left = len;
+		off = pos & ~CEPH_FSCRYPT_BLOCK_MASK;
 		for (n = 0; n < num_pages; n++) {
-			size_t plen = min_t(size_t, left, PAGE_SIZE);
-			ret = copy_page_from_iter(pages[n], 0, plen, from);
+			size_t plen = min_t(size_t, left, CEPH_FSCRYPT_BLOCK_SIZE - off);
+			ret = copy_page_from_iter(pages[n], off, plen, from);
+			off = 0;
 			if (ret != plen) {
 				ret = -EFAULT;
 				break;
@@ -1546,8 +1545,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 
 		req->r_inode = inode;
 
-		osd_req_op_extent_osd_data_pages(req, 0, pages, len, 0,
-						false, true);
+		osd_req_op_extent_osd_data_pages(req, 0, pages, len,
+						 pos & ~CEPH_FSCRYPT_BLOCK_MASK,
+						 false, true);
 
 		req->r_mtime = mtime;
 		ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
-- 
2.34.1



* [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (41 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 42/48] ceph: align data in pages in ceph_sync_write Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-19  3:21   ` Xiubo Li
  2022-01-11 19:16 ` [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads Jeff Layton
                   ` (8 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

When doing a synchronous write on an encrypted inode, we have no
guarantee that the caller is writing crypto block-aligned data. When
that happens, we must do a read/modify/write cycle.

First, expand the range to cover complete blocks. If we had to change
the original pos or length, issue a read to fill the first and/or last
pages, and fetch the version of the object from the result.

We then copy data into the pages as usual, encrypt the result and issue
a write prefixed by an assertion that the version hasn't changed. If it
has changed, we restart the whole cycle.

If there is no object at that position in the file (-ENOENT), we prefix
the write with an exclusive create of the object instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
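Reviewer note (not part of the commit): a minimal userspace sketch of
the decisions described above, assuming a 4096-byte crypto block. All
sketch_* names are hypothetical. The real code also clamps the range to
the end of the current object, which is left out here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 4096ULL /* assumption: crypto block size */

struct sketch_rmw {
	uint64_t write_pos; /* pos rounded down to a block boundary */
	uint64_t write_len; /* length covering complete blocks */
	bool rmw;           /* a read/modify/write cycle is required */
	bool first;         /* the first block must be preloaded */
	bool last;          /* the last block must be preloaded */
};

/* Expand [pos, pos + len) to whole blocks and decide what to preload. */
static struct sketch_rmw sketch_plan(uint64_t pos, uint64_t len)
{
	struct sketch_rmw p;
	uint64_t end = pos + len;
	uint64_t mask = SKETCH_BLOCK_SIZE - 1;

	p.write_pos = pos & ~mask;
	p.write_len = ((end + mask) & ~mask) - p.write_pos;
	p.first = pos != p.write_pos;
	p.last = end != p.write_pos + p.write_len;
	p.rmw = p.first || p.last;

	/* One read is enough if both partial ends sit in the same block. */
	if (p.first && p.write_len == SKETCH_BLOCK_SIZE)
		p.last = false;
	return p;
}

/*
 * After the guarded write fails, decide whether to redo the cycle.
 * A non-zero assert_ver means the write carried a version assertion;
 * zero means it carried an exclusive create.
 */
static bool sketch_must_retry(uint64_t assert_ver, int err)
{
	if (assert_ver)
		return err == -ERANGE || err == -EOVERFLOW;
	return err == -EEXIST;
}
```

A 50-byte write at pos 100 expands to 0~4096 and needs a single
preload read. A write of two full blocks at pos 0 needs no rmw cycle
at all.
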
 fs/ceph/file.c | 260 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 228 insertions(+), 32 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index a6305ad5519b..41766b2012e9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1468,18 +1468,16 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
-	struct ceph_vino vino;
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	struct ceph_osd_request *req;
 	struct page **pages;
 	u64 len;
 	int num_pages;
 	int written = 0;
-	int flags;
 	int ret;
 	bool check_caps = false;
 	struct timespec64 mtime = current_time(inode);
 	size_t count = iov_iter_count(from);
-	size_t off;
 
 	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
 		return -EROFS;
@@ -1499,70 +1497,267 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	if (ret < 0)
 		dout("invalidate_inode_pages2_range returned %d\n", ret);
 
-	flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE;
-
 	while ((len = iov_iter_count(from)) > 0) {
 		size_t left;
 		int n;
+		u64 write_pos = pos;
+		u64 write_len = len;
+		u64 objnum, objoff;
+		u32 xlen;
+		u64 assert_ver;
+		bool rmw;
+		bool first, last;
+		struct iov_iter saved_iter = *from;
+		size_t off;
+
+		fscrypt_adjust_off_and_len(inode, &write_pos, &write_len);
+
+		/* clamp the length to the end of first object */
+		ceph_calc_file_object_mapping(&ci->i_layout, write_pos,
+						write_len, &objnum, &objoff,
+						&xlen);
+		write_len = xlen;
+
+		/* adjust len downward if it goes beyond current object */
+		if (pos + len > write_pos + write_len)
+			len = write_pos + write_len - pos;
 
-		vino = ceph_vino(inode);
-		req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
-					    vino, pos, &len, 0, 1,
-					    CEPH_OSD_OP_WRITE, flags, snapc,
-					    ci->i_truncate_seq,
-					    ci->i_truncate_size,
-					    false);
-		if (IS_ERR(req)) {
-			ret = PTR_ERR(req);
-			break;
-		}
+		/*
+		 * If we had to adjust the length or position to align with a
+		 * crypto block, then we must do a read/modify/write cycle. We
+		 * use a version assertion to redrive the thing if something
+		 * changes in between.
+		 */
+		first = pos != write_pos;
+		last = (pos + len) != (write_pos + write_len);
+		rmw = first || last;
 
-		/* FIXME: express in FSCRYPT_BLOCK_SIZE units */
-		num_pages = calc_pages_for(pos, len);
+		/*
+		 * The data is emplaced into the page as it would be if it were in
+		 * an array of pagecache pages.
+		 */
+		num_pages = calc_pages_for(write_pos, write_len);
 		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
 		if (IS_ERR(pages)) {
 			ret = PTR_ERR(pages);
-			goto out;
+			break;
+		}
+
+		/* Do we need to preload the pages? */
+		if (rmw) {
+			u64 first_pos = write_pos;
+			u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE;
+			u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE;
+
+			/* We should only need to do this for encrypted inodes */
+			WARN_ON_ONCE(!IS_ENCRYPTED(inode));
+
+			/* No need to do two reads if first and last blocks are same */
+			if (first && last_pos == first_pos)
+				last = false;
+
+			/*
+			 * Allocate a read request for one or two extents, depending
+			 * on how the request was aligned.
+			 */
+			req = ceph_osdc_new_request(osdc, &ci->i_layout,
+					ci->i_vino, first ? first_pos : last_pos,
+					&read_len, 0, (first && last) ? 2 : 1,
+					CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
+					NULL, ci->i_truncate_seq,
+					ci->i_truncate_size, false);
+			if (IS_ERR(req)) {
+				ceph_release_page_vector(pages, num_pages);
+				ret = PTR_ERR(req);
+				break;
+			}
+
+			/* Something is misaligned! */
+			if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) {
+				ret = -EIO;
+				break;
+			}
+
+			/* Add extent for first block? */
+			if (first)
+				osd_req_op_extent_osd_data_pages(req, 0, pages,
+							 CEPH_FSCRYPT_BLOCK_SIZE,
+							 offset_in_page(first_pos),
+							 false, false);
+
+			/* Add extent for last block */
+			if (last) {
+				/* Init the other extent if first extent has been used */
+				if (first) {
+					osd_req_op_extent_init(req, 1, CEPH_OSD_OP_READ,
+							last_pos, CEPH_FSCRYPT_BLOCK_SIZE,
+							ci->i_truncate_size,
+							ci->i_truncate_seq);
+				}
+
+				osd_req_op_extent_osd_data_pages(req, first ? 1 : 0,
+							&pages[num_pages - 1],
+							CEPH_FSCRYPT_BLOCK_SIZE,
+							offset_in_page(last_pos),
+							false, false);
+			}
+
+			ret = ceph_osdc_start_request(osdc, req, false);
+			if (!ret)
+				ret = ceph_osdc_wait_request(osdc, req);
+
+			/* FIXME: length field is wrong if there are 2 extents */
+			ceph_update_read_metrics(&fsc->mdsc->metric,
+						 req->r_start_latency,
+						 req->r_end_latency,
+						 read_len, ret);
+
+			/* Ok if object is not already present */
+			if (ret == -ENOENT) {
+				/*
+				 * If there is no object, then we can't assert
+				 * on its version. Set it to 0, and we'll use an
+				 * exclusive create instead.
+				 */
+				ceph_osdc_put_request(req);
+				assert_ver = 0;
+				ret = 0;
+
+				/*
+				 * zero out the soon-to-be uncopied parts of the
+				 * first and last pages.
+				 */
+				if (first)
+					zero_user_segment(pages[0], 0,
+							  offset_in_page(first_pos));
+				if (last)
+					zero_user_segment(pages[num_pages - 1],
+							  offset_in_page(last_pos),
+							  PAGE_SIZE);
+			} else {
+				/* Grab assert version. It must be non-zero. */
+				assert_ver = req->r_version;
+				WARN_ON_ONCE(ret > 0 && assert_ver == 0);
+
+				ceph_osdc_put_request(req);
+				if (ret < 0) {
+					ceph_release_page_vector(pages, num_pages);
+					break;
+				}
+
+				if (first) {
+					ret = ceph_fscrypt_decrypt_block_inplace(inode,
+							pages[0],
+							CEPH_FSCRYPT_BLOCK_SIZE,
+							offset_in_page(first_pos),
+							first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
+					if (ret < 0)
+						break;
+				}
+				if (last) {
+					ret = ceph_fscrypt_decrypt_block_inplace(inode,
+							pages[num_pages - 1],
+							CEPH_FSCRYPT_BLOCK_SIZE,
+							offset_in_page(last_pos),
+							last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
+					if (ret < 0)
+						break;
+				}
+			}
 		}
 
 		left = len;
-		off = pos & ~CEPH_FSCRYPT_BLOCK_MASK;
+		off = offset_in_page(pos);
 		for (n = 0; n < num_pages; n++) {
-			size_t plen = min_t(size_t, left, CEPH_FSCRYPT_BLOCK_SIZE - off);
+			size_t plen = min_t(size_t, left, PAGE_SIZE - off);
+
+			/* copy the data */
 			ret = copy_page_from_iter(pages[n], off, plen, from);
-			off = 0;
 			if (ret != plen) {
 				ret = -EFAULT;
 				break;
 			}
+			off = 0;
 			left -= ret;
 		}
-
 		if (ret < 0) {
+			dout("sync_write write failed with %d\n", ret);
 			ceph_release_page_vector(pages, num_pages);
-			goto out;
+			break;
 		}
 
-		req->r_inode = inode;
+		if (IS_ENCRYPTED(inode)) {
+			ret = ceph_fscrypt_encrypt_pages(inode, pages,
+							 write_pos, write_len,
+							 GFP_KERNEL);
+			if (ret < 0) {
+				dout("encryption failed with %d\n", ret);
+				break;
+			}
+		}
 
-		osd_req_op_extent_osd_data_pages(req, 0, pages, len,
-						 pos & ~CEPH_FSCRYPT_BLOCK_MASK,
-						 false, true);
+		req = ceph_osdc_new_request(osdc, &ci->i_layout,
+					    ci->i_vino, write_pos, &write_len,
+					    rmw ? 1 : 0, rmw ? 2 : 1,
+					    CEPH_OSD_OP_WRITE,
+					    CEPH_OSD_FLAG_WRITE,
+					    snapc, ci->i_truncate_seq,
+					    ci->i_truncate_size, false);
+		if (IS_ERR(req)) {
+			ret = PTR_ERR(req);
+			ceph_release_page_vector(pages, num_pages);
+			break;
+		}
 
+		dout("sync_write write op %lld~%llu\n", write_pos, write_len);
+		osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len,
+						 offset_in_page(write_pos), false,
+						 true);
+		req->r_inode = inode;
 		req->r_mtime = mtime;
-		ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
+
+		/* Set up the assertion */
+		if (rmw) {
+			/*
+			 * Set up the assertion. If we don't have a version number,
+			 * then the object doesn't exist yet. Use an exclusive create
+			 * instead of a version assertion in that case.
+			 */
+			if (assert_ver) {
+				osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0);
+				req->r_ops[0].assert_ver.ver = assert_ver;
+			} else {
+				osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE,
+						CEPH_OSD_OP_FLAG_EXCL);
+			}
+		}
+
+		ret = ceph_osdc_start_request(osdc, req, false);
 		if (!ret)
-			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
+			ret = ceph_osdc_wait_request(osdc, req);
 
 		ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
 					  req->r_end_latency, len, ret);
-out:
 		ceph_osdc_put_request(req);
 		if (ret != 0) {
+			dout("sync_write osd write returned %d\n", ret);
+			/* Version changed! Must re-do the rmw cycle */
+			if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) ||
+			     (!assert_ver && ret == -EEXIST)) {
+				/* We should only ever see this on a rmw */
+				WARN_ON_ONCE(!rmw);
+
+				/* The version should never go backward */
+				WARN_ON_ONCE(ret == -EOVERFLOW);
+
+				*from = saved_iter;
+
+				/* FIXME: limit number of times we loop? */
+				continue;
+			}
 			ceph_set_error_write(ci);
 			break;
 		}
-
 		ceph_clear_error_write(ci);
 		pos += len;
 		written += len;
@@ -1580,6 +1775,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		ret = written;
 		iocb->ki_pos = pos;
 	}
+	dout("sync_write returning %d\n", ret);
 	return ret;
 }
 
-- 
2.34.1



* [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (42 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-19  5:18   ` Xiubo Li
  2022-01-11 19:16 ` [RFC PATCH v10 45/48] ceph: set i_blkbits to crypto block size for encrypted inodes Jeff Layton
                   ` (7 subsequent siblings)
  51 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

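When reading from an encrypted inode, expand the requested range to
cover complete crypto blocks, decrypt the result, and then trim the
partial blocks at either end before copying the data to the caller.

Note that the crypto block may be smaller than a page, but the reverse
cannot be true.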

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
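Reviewer note (not part of the commit): the trimming step in the diff
below can be sketched in userspace as follows. The function name and
parameters are hypothetical. Here "got" stands for the byte count
returned by the decryption helper for a read that started at read_off.

```c
#include <stdint.h>

/*
 * Given a caller request for len bytes at off, and got decrypted bytes
 * starting at the block-aligned read_off (read_off <= off), return how
 * many of those bytes the caller can actually use.
 */
static int64_t sketch_usable(uint64_t off, uint64_t len,
			     uint64_t read_off, int64_t got)
{
	/* drop the partial block in front of the requested range */
	int64_t usable = got - (int64_t)(off - read_off);

	/* short read after a big offset adjustment: nothing is usable */
	if (usable < 0)
		usable = 0;

	/* drop the partial block behind the requested range */
	if ((uint64_t)usable > len)
		usable = (int64_t)len;

	return usable;
}
```

For example, a request for 50 bytes at offset 100 reads the block at
0~4096. After decryption, the 100 leading bytes are skipped and the
result is capped at 50.
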
 fs/ceph/file.c | 94 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 41766b2012e9..b4f2fcd33837 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -926,9 +926,17 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		bool more;
 		int idx;
 		size_t left;
+		u64 read_off = off;
+		u64 read_len = len;
+
+		/* determine new offset/length if encrypted */
+		fscrypt_adjust_off_and_len(inode, &read_off, &read_len);
+
+		dout("sync_read orig %llu~%llu reading %llu~%llu\n",
+		     off, len, read_off, read_len);
 
 		req = ceph_osdc_new_request(osdc, &ci->i_layout,
-					ci->i_vino, off, &len, 0, 1,
+					ci->i_vino, read_off, &read_len, 0, 1,
 					CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
 					NULL, ci->i_truncate_seq,
 					ci->i_truncate_size, false);
@@ -937,10 +945,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			break;
 		}
 
+		/* adjust len downward if the request truncated the len */
+		if (off + len > read_off + read_len)
+			len = read_off + read_len - off;
 		more = len < iov_iter_count(to);
 
-		num_pages = calc_pages_for(off, len);
-		page_off = off & ~PAGE_MASK;
+		num_pages = calc_pages_for(read_off, read_len);
+		page_off = offset_in_page(off);
 		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
 		if (IS_ERR(pages)) {
 			ceph_osdc_put_request(req);
@@ -948,7 +959,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			break;
 		}
 
-		osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off,
+		osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
+						 offset_in_page(read_off),
 						 false, false);
 		ret = ceph_osdc_start_request(osdc, req, false);
 		if (!ret)
@@ -957,23 +969,50 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		ceph_update_read_metrics(&fsc->mdsc->metric,
 					 req->r_start_latency,
 					 req->r_end_latency,
-					 len, ret);
+					 read_len, ret);
 
 		if (ret > 0)
 			objver = req->r_version;
 		ceph_osdc_put_request(req);
-
 		i_size = i_size_read(inode);
 		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
 		     off, len, ret, i_size, (more ? " MORE" : ""));
 
-		if (ret == -ENOENT)
+		if (ret == -ENOENT) {
+			/* No object? Then this is a hole */
 			ret = 0;
+		} else if (ret > 0 && IS_ENCRYPTED(inode)) {
+			int fret;
+
+			fret = ceph_fscrypt_decrypt_pages(inode, pages, read_off, ret);
+			if (fret < 0) {
+				ceph_release_page_vector(pages, num_pages);
+				ret = fret;
+				break;
+			}
+
+			dout("sync_read decrypted fret %d\n", fret);
+
+			/* account for any partial block at the beginning */
+			fret -= (off - read_off);
+
+			/*
+			 * Short read after big offset adjustment?
+			 * Nothing is usable, just call it a zero
+			 * len read.
+			 */
+			fret = max(fret, 0);
+
+			/* account for partial block at the end */
+			ret = min_t(ssize_t, fret, len);
+		}
+
+		/* Short read but not EOF? Zero out the remainder. */
 		if (ret >= 0 && ret < len && (off + ret < i_size)) {
 			int zlen = min(len - ret, i_size - off - ret);
 			int zoff = page_off + ret;
 			dout("sync_read zero gap %llu~%llu\n",
-                             off + ret, off + ret + zlen);
+			     off + ret, off + ret + zlen);
 			ceph_zero_page_vector_range(zoff, zlen, pages);
 			ret += zlen;
 		}
@@ -981,15 +1020,15 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		idx = 0;
 		left = ret > 0 ? ret : 0;
 		while (left > 0) {
-			size_t len, copied;
-			page_off = off & ~PAGE_MASK;
-			len = min_t(size_t, left, PAGE_SIZE - page_off);
+			size_t plen, copied;
+			plen = min_t(size_t, left, PAGE_SIZE - page_off);
 			SetPageUptodate(pages[idx]);
 			copied = copy_page_to_iter(pages[idx++],
-						   page_off, len, to);
+						   page_off, plen, to);
 			off += copied;
 			left -= copied;
-			if (copied < len) {
+			page_off = 0;
+			if (copied < plen) {
 				ret = -EFAULT;
 				break;
 			}
@@ -1006,20 +1045,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 			break;
 	}
 
-	if (off > *ki_pos) {
-		if (off >= i_size) {
-			*retry_op = CHECK_EOF;
-			ret = i_size - *ki_pos;
-			*ki_pos = i_size;
-		} else {
-			ret = off - *ki_pos;
-			*ki_pos = off;
+	if (ret > 0) {
+		if (off > *ki_pos) {
+			if (off >= i_size) {
+				*retry_op = CHECK_EOF;
+				ret = i_size - *ki_pos;
+				*ki_pos = i_size;
+			} else {
+				ret = off - *ki_pos;
+				*ki_pos = off;
+			}
 		}
-	}
-
-	if (last_objver && ret > 0)
-		*last_objver = objver;
 
+		if (last_objver)
+			*last_objver = objver;
+	}
 	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
 	return ret;
 }
@@ -1532,6 +1572,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		last = (pos + len) != (write_pos + write_len);
 		rmw = first || last;
 
+		dout("sync_write ino %llx %lld~%llu adjusted %lld~%llu -- %srmw\n",
+		     ci->i_vino.ino, pos, len, write_pos, write_len, rmw ? "" : "no ");
+
 		/*
 		 * The data is emplaced into the page as it would be if it were in
 		 * an array of pagecache pages.
@@ -1761,6 +1804,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		ceph_clear_error_write(ci);
 		pos += len;
 		written += len;
+		dout("sync_write written %d\n", written);
 		if (pos > i_size_read(inode)) {
 			check_caps = ceph_inode_set_size(inode, pos);
 			if (check_caps)
-- 
2.34.1



* [RFC PATCH v10 45/48] ceph: set i_blkbits to crypto block size for encrypted inodes
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (43 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 46/48] ceph: add fscrypt decryption support to ceph_netfs_issue_op Jeff Layton
                   ` (6 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
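Reviewer note (not part of the commit): the selection order introduced
by the diff below can be sketched in userspace like this. The two shift
values are assumptions for illustration (12 for the crypto block, 22
for the default ceph block), and the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FSCRYPT_BLOCK_SHIFT 12 /* assumption: 4 KiB crypto block */
#define SKETCH_CEPH_BLOCK_SHIFT    22 /* assumption: 4 MiB default block */

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
static int sketch_fls(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Pick the block-size shift for an inode. */
static unsigned int sketch_blkbits(bool encrypted, uint32_t stripe_unit)
{
	/* encrypted inodes always report the crypto block size */
	if (encrypted)
		return SKETCH_FSCRYPT_BLOCK_SHIFT;

	/* directories have a stripe unit of zero */
	if (stripe_unit)
		return (unsigned int)(sketch_fls(stripe_unit) - 1);

	return SKETCH_CEPH_BLOCK_SHIFT;
}
```

The encrypted check comes first, so an encrypted file with a 4 MiB
stripe unit still reports a 4 KiB block size.
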
 fs/ceph/inode.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index eecda0a73908..d7eff9c3e988 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -968,13 +968,6 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 	issued |= __ceph_caps_dirty(ci);
 	new_issued = ~issued & info_caps;
 
-	/* directories have fl_stripe_unit set to zero */
-	if (le32_to_cpu(info->layout.fl_stripe_unit))
-		inode->i_blkbits =
-			fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
-	else
-		inode->i_blkbits = CEPH_BLOCK_SHIFT;
-
 	__ceph_update_quota(ci, iinfo->max_bytes, iinfo->max_files);
 
 	if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
@@ -999,6 +992,15 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 #endif
 	}
 
+	/* directories have fl_stripe_unit set to zero */
+	if (IS_ENCRYPTED(inode))
+		inode->i_blkbits = CEPH_FSCRYPT_BLOCK_SHIFT;
+	else if (le32_to_cpu(info->layout.fl_stripe_unit))
+		inode->i_blkbits =
+			fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
+	else
+		inode->i_blkbits = CEPH_BLOCK_SHIFT;
+
 	if ((new_version || (new_issued & CEPH_CAP_LINK_SHARED)) &&
 	    (issued & CEPH_CAP_LINK_EXCL) == 0)
 		set_nlink(inode, le32_to_cpu(info->nlink));
-- 
2.34.1



* [RFC PATCH v10 46/48] ceph: add fscrypt decryption support to ceph_netfs_issue_op
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (44 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 45/48] ceph: set i_blkbits to crypto block size for encrypted inodes Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 47/48] ceph: add encryption support to writepage Jeff Layton
                   ` (5 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/addr.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index b3d9459c9bbd..dbc587a41fea 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -18,6 +18,7 @@
 #include "mds_client.h"
 #include "cache.h"
 #include "metric.h"
+#include "crypto.h"
 #include <linux/ceph/osd_client.h>
 #include <linux/ceph/striper.h>
 
@@ -200,7 +201,7 @@ static void ceph_netfs_expand_readahead(struct netfs_read_request *rreq)
 	rreq->len = roundup(rreq->len, lo->stripe_unit);
 }
 
-static bool ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq)
+static size_t __ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq)
 {
 	struct inode *inode = subreq->rreq->mapping->host;
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
@@ -211,13 +212,18 @@ static bool ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq)
 	/* Truncate the extent at the end of the current block */
 	ceph_calc_file_object_mapping(&ci->i_layout, subreq->start, subreq->len,
 				      &objno, &objoff, &xlen);
-	subreq->len = min(xlen, fsc->mount_options->rsize);
-	return true;
+	return min(xlen, fsc->mount_options->rsize);
 }
 
+static bool ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq)
+{
+	subreq->len = __ceph_netfs_clamp_length(subreq);
+	return true;
+}
 static void finish_netfs_read(struct ceph_osd_request *req)
 {
-	struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode);
+	struct inode *inode = req->r_inode;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
 	struct netfs_read_subrequest *subreq = req->r_priv;
 	int num_pages;
@@ -235,8 +241,16 @@ static void finish_netfs_read(struct ceph_osd_request *req)
 	else if (err == -EBLOCKLISTED)
 		fsc->blocklisted = true;
 
-	if (err >= 0 && err < subreq->len)
-		__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+	if (err >= 0) {
+		if (err < subreq->len)
+			__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
+		if (IS_ENCRYPTED(inode)) {
+			err = ceph_fscrypt_decrypt_pages(inode, osd_data->pages,
+							 subreq->start, err);
+			if (err > subreq->len)
+				err = subreq->len;
+		}
+	}
 
 	netfs_subreq_terminated(subreq, err, true);
 
@@ -258,8 +272,11 @@ static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq)
 	size_t page_off;
 	int err = 0;
 	u64 len = subreq->len;
+	u64 off = subreq->start;
+
+	fscrypt_adjust_off_and_len(inode, &off, &len);
 
-	req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, subreq->start, &len,
+	req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, off, &len,
 			0, 1, CEPH_OSD_OP_READ,
 			CEPH_OSD_FLAG_READ | fsc->client->osdc.client->options->read_from_replica,
 			NULL, ci->i_truncate_seq, ci->i_truncate_size, false);
@@ -270,7 +287,7 @@ static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq)
 	}
 
 	dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len);
-	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, subreq->start, len);
+	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, off, len);
 	err = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off);
 	if (err < 0) {
 		dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err);
-- 
2.34.1



* [RFC PATCH v10 47/48] ceph: add encryption support to writepage
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (45 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 46/48] ceph: add fscrypt decryption support to ceph_netfs_issue_op Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:16 ` [RFC PATCH v10 48/48] ceph: fscrypt support for writepages Jeff Layton
                   ` (4 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
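Reviewer note (not part of the commit): writepage keeps two lengths,
the number of data bytes in the page (len) and the number of bytes sent
to the OSD (wlen). A minimal userspace sketch, assuming a 4096-byte
crypto block and assuming that unencrypted inodes send len unchanged.
The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 4096ULL /* assumption: crypto block size */

/*
 * Length to put on the wire for a page holding len bytes of data.
 * Encrypted data is always written in whole crypto blocks.
 */
static uint64_t sketch_wire_len(bool encrypted, uint64_t len)
{
	uint64_t mask = SKETCH_BLOCK_SIZE - 1;

	if (!encrypted)
		return len;
	return (len + mask) & ~mask;
}
```

A final page holding 100 bytes of an encrypted file is written as a
full 4096-byte block, while len (100) is still what gets reported as
written.
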
 fs/ceph/addr.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index dbc587a41fea..46ff50a2474e 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -545,10 +545,12 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	loff_t page_off = page_offset(page);
 	int err;
 	loff_t len = thp_size(page);
+	loff_t wlen;
 	struct ceph_writeback_ctl ceph_wbc;
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	struct ceph_osd_request *req;
 	bool caching = ceph_is_cache_enabled(inode);
+	struct page *bounce_page = NULL;
 
 	dout("writepage %p idx %lu\n", page, page->index);
 
@@ -579,6 +581,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 
 	if (ceph_wbc.i_size < page_off + len)
 		len = ceph_wbc.i_size - page_off;
+	wlen = IS_ENCRYPTED(inode) ?
+		round_up(len, CEPH_FSCRYPT_BLOCK_SIZE) : len;
 
 	dout("writepage %p page %p index %lu on %llu~%llu snapc %p seq %lld\n",
 	     inode, page, page->index, page_off, len, snapc, snapc->seq);
@@ -587,22 +591,37 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 	    CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
-	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
-				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
-				    ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
-				    true);
+	req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode),
+				    page_off, &wlen, 0, 1, CEPH_OSD_OP_WRITE,
+				    CEPH_OSD_FLAG_WRITE, snapc,
+				    ceph_wbc.truncate_seq,
+				    ceph_wbc.truncate_size, true);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
+	if (wlen < len)
+		len = wlen;
+
 	set_page_writeback(page);
 	if (caching)
 		ceph_set_page_fscache(page);
 	ceph_fscache_write_to_cache(inode, page_off, len, caching);
 
+	if (IS_ENCRYPTED(inode)) {
+		bounce_page = fscrypt_encrypt_pagecache_blocks(page, CEPH_FSCRYPT_BLOCK_SIZE,
+								0, GFP_NOFS);
+		if (IS_ERR(bounce_page)) {
+			err = PTR_ERR(bounce_page);
+			goto out;
+		}
+	}
 	/* it may be a short write due to an object boundary */
 	WARN_ON_ONCE(len > thp_size(page));
-	osd_req_op_extent_osd_data_pages(req, 0, &page, len, 0, false, false);
-	dout("writepage %llu~%llu (%llu bytes)\n", page_off, len, len);
+	osd_req_op_extent_osd_data_pages(req, 0,
+			bounce_page ? &bounce_page : &page, wlen, 0,
+			false, false);
+	dout("writepage %llu~%llu (%llu bytes, %sencrypted)\n",
+	     page_off, len, wlen, IS_ENCRYPTED(inode) ? "" : "not ");
 
 	req->r_mtime = inode->i_mtime;
 	err = ceph_osdc_start_request(osdc, req, true);
@@ -611,7 +630,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 
 	ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
 				  req->r_end_latency, len, err);
-
+	fscrypt_free_bounce_page(bounce_page);
+out:
 	ceph_osdc_put_request(req);
 	if (err == 0)
 		err = len;
-- 
2.34.1



* [RFC PATCH v10 48/48] ceph: fscrypt support for writepages
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (46 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 47/48] ceph: add encryption support to writepage Jeff Layton
@ 2022-01-11 19:16 ` Jeff Layton
  2022-01-11 19:26 ` [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (3 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:16 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
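Reviewer note (not part of the commit): the sanity check added before
the write op is attached tests offset and length with a single mask. A
minimal userspace sketch, assuming a 4096-byte crypto block and
assuming CEPH_FSCRYPT_BLOCK_MASK is ~(CEPH_FSCRYPT_BLOCK_SIZE - 1). The
function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 4096ULL /* assumption: crypto block size */

/*
 * True when both offset and len are multiples of the crypto block
 * size. OR-ing the two values first means any stray low bit in either
 * one survives the mask, so one test covers both.
 */
static bool sketch_block_aligned(uint64_t offset, uint64_t len)
{
	return ((offset | len) & (SKETCH_BLOCK_SIZE - 1)) == 0;
}
```

The patch only warns when this check fails; the write is still
submitted.
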
 fs/ceph/addr.c   | 61 +++++++++++++++++++++++++++++++++++++++---------
 fs/ceph/crypto.h | 17 ++++++++++++++
 2 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 46ff50a2474e..e9a886282af0 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -507,10 +507,12 @@ static u64 get_writepages_data_length(struct inode *inode,
 				      struct page *page, u64 start)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_snap_context *snapc = page_snap_context(page);
+	struct ceph_snap_context *snapc;
 	struct ceph_cap_snap *capsnap = NULL;
 	u64 end = i_size_read(inode);
+	u64 ret;
 
+	snapc = page_snap_context(ceph_fscrypt_pagecache_page(page));
 	if (snapc != ci->i_head_snapc) {
 		bool found = false;
 		spin_lock(&ci->i_ceph_lock);
@@ -525,9 +527,12 @@ static u64 get_writepages_data_length(struct inode *inode,
 		spin_unlock(&ci->i_ceph_lock);
 		WARN_ON(!found);
 	}
-	if (end > page_offset(page) + thp_size(page))
-		end = page_offset(page) + thp_size(page);
-	return end > start ? end - start : 0;
+	if (end > ceph_fscrypt_page_offset(page) + thp_size(page))
+		end = ceph_fscrypt_page_offset(page) + thp_size(page);
+	ret = end > start ? end - start : 0;
+	if (ret && fscrypt_is_bounce_page(page))
+		ret = round_up(ret, CEPH_FSCRYPT_BLOCK_SIZE);
+	return ret;
 }
 
 /*
@@ -743,6 +748,11 @@ static void writepages_finish(struct ceph_osd_request *req)
 		total_pages += num_pages;
 		for (j = 0; j < num_pages; j++) {
 			page = osd_data->pages[j];
+			if (fscrypt_is_bounce_page(page)) {
+				page = fscrypt_pagecache_page(page);
+				fscrypt_free_bounce_page(osd_data->pages[j]);
+				osd_data->pages[j] = page;
+			}
 			BUG_ON(!page);
 			WARN_ON(!PageUptodate(page));
 
@@ -1001,8 +1011,27 @@ static int ceph_writepages_start(struct address_space *mapping,
 						  BLK_RW_ASYNC);
 			}
 
+			if (IS_ENCRYPTED(inode)) {
+				pages[locked_pages] =
+					fscrypt_encrypt_pagecache_blocks(page,
+						PAGE_SIZE, 0,
+						locked_pages ? GFP_NOWAIT : GFP_NOFS);
+				if (IS_ERR(pages[locked_pages])) {
+					if (PTR_ERR(pages[locked_pages]) == -EINVAL)
+						pr_err("%s: inode->i_blkbits=%hhu\n",
+							__func__, inode->i_blkbits);
+					/* better not fail on first page! */
+					BUG_ON(locked_pages == 0);
+					pages[locked_pages] = NULL;
+					redirty_page_for_writepage(wbc, page);
+					unlock_page(page);
+					break;
+				}
+				++locked_pages;
+			} else {
+				pages[locked_pages++] = page;
+			}
 
-			pages[locked_pages++] = page;
 			pvec.pages[i] = NULL;
 
 			len += thp_size(page);
@@ -1032,7 +1061,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 		}
 
 new_request:
-		offset = page_offset(pages[0]);
+		offset = ceph_fscrypt_page_offset(pages[0]);
 		len = wsize;
 
 		req = ceph_osdc_new_request(&fsc->client->osdc,
@@ -1053,8 +1082,8 @@ static int ceph_writepages_start(struct address_space *mapping,
 						ceph_wbc.truncate_size, true);
 			BUG_ON(IS_ERR(req));
 		}
-		BUG_ON(len < page_offset(pages[locked_pages - 1]) +
-			     thp_size(page) - offset);
+		BUG_ON(len < ceph_fscrypt_page_offset(pages[locked_pages - 1]) +
+			     thp_size(pages[locked_pages - 1]) - offset);
 
 		req->r_callback = writepages_finish;
 		req->r_inode = inode;
@@ -1064,7 +1093,9 @@ static int ceph_writepages_start(struct address_space *mapping,
 		data_pages = pages;
 		op_idx = 0;
 		for (i = 0; i < locked_pages; i++) {
-			u64 cur_offset = page_offset(pages[i]);
+			struct page *page = ceph_fscrypt_pagecache_page(pages[i]);
+
+			u64 cur_offset = page_offset(page);
 			/*
 			 * Discontinuity in page range? Ceph can handle that by just passing
 			 * multiple extents in the write op.
@@ -1093,9 +1124,9 @@ static int ceph_writepages_start(struct address_space *mapping,
 				op_idx++;
 			}
 
-			set_page_writeback(pages[i]);
+			set_page_writeback(page);
 			if (caching)
-				ceph_set_page_fscache(pages[i]);
+				ceph_set_page_fscache(page);
 			len += thp_size(page);
 		}
 		ceph_fscache_write_to_cache(inode, offset, len, caching);
@@ -1111,8 +1142,16 @@ static int ceph_writepages_start(struct address_space *mapping,
 							 offset);
 			len = max(len, min_len);
 		}
+		if (IS_ENCRYPTED(inode))
+			len = round_up(len, CEPH_FSCRYPT_BLOCK_SIZE);
+
 		dout("writepages got pages at %llu~%llu\n", offset, len);
 
+		if (IS_ENCRYPTED(inode) &&
+		    ((offset | len) & ~CEPH_FSCRYPT_BLOCK_MASK))
+			pr_warn("%s: bad encrypted write offset=%lld len=%llu\n",
+				__func__, offset, len);
+
 		osd_req_op_extent_osd_data_pages(req, op_idx, data_pages, len,
 						 0, from_pool, false);
 		osd_req_op_extent_update(req, op_idx, len);
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 3b7efffecbeb..33c11653d177 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -140,6 +140,13 @@ int ceph_fscrypt_encrypt_block_inplace(const struct inode *inode,
 int ceph_fscrypt_decrypt_pages(struct inode *inode, struct page **page, u64 off, int len);
 int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **page, u64 off,
 				int len, gfp_t gfp);
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+        return fscrypt_is_bounce_page(page) ?
+		fscrypt_pagecache_page(page) : page;
+}
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -215,6 +222,16 @@ static inline int ceph_fscrypt_encrypt_pages(struct inode *inode, struct page **
 {
 	return 0;
 }
+
+static inline struct page *ceph_fscrypt_pagecache_page(struct page *page)
+{
+        return page;
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
+static inline loff_t ceph_fscrypt_page_offset(struct page *page)
+{
+        return page_offset(ceph_fscrypt_pagecache_page(page));
+}
+
 #endif
-- 
2.34.1
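
For readers following the length handling in the writepages hunk above, here
is a minimal userspace sketch of the round-up and alignment check. The block
shift of 12 and the SKETCH_* / sketch_* names are assumptions made for
illustration only; the real CEPH_FSCRYPT_BLOCK_* definitions live in
fs/ceph/crypto.h and are not shown in this part of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, standing in for the CEPH_FSCRYPT_BLOCK_* macros. */
#define SKETCH_BLOCK_SHIFT 12
#define SKETCH_BLOCK_SIZE  (UINT64_C(1) << SKETCH_BLOCK_SHIFT)
#define SKETCH_BLOCK_MASK  (~(SKETCH_BLOCK_SIZE - 1))

/* Round len up to the next multiple of the crypto block size. */
static uint64_t sketch_round_up_block(uint64_t len)
{
	/* Guard against wrap-around before adding. */
	assert(len <= UINT64_MAX - (SKETCH_BLOCK_SIZE - 1));
	return (len + SKETCH_BLOCK_SIZE - 1) & SKETCH_BLOCK_MASK;
}

/*
 * True when both offset and len are multiples of the block size.
 * OR-ing the two values lets one mask test cover both, which is the
 * same shape as the (offset | len) check in the hunk above.
 */
static bool sketch_block_aligned(uint64_t offset, uint64_t len)
{
	return ((offset | len) & ~SKETCH_BLOCK_MASK) == 0;
}
```

With these assumed values, a 5000-byte write at offset 8192 is rounded up to
8192 bytes and passes the alignment check, while the same length at offset
8704 would trip the warning.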



* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (47 preceding siblings ...)
  2022-01-11 19:16 ` [RFC PATCH v10 48/48] ceph: fscrypt support for writepages Jeff Layton
@ 2022-01-11 19:26 ` Jeff Layton
  2022-01-27  2:14 ` Eric Biggers
                   ` (2 subsequent siblings)
  51 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-11 19:26 UTC (permalink / raw)
  To: ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, xiubli, Luis Henriques

On Tue, 2022-01-11 at 14:15 -0500, Jeff Layton wrote:
> This patchset represents a (mostly) complete rough draft of fscrypt
> support for cephfs. The context, filename and symlink support is more or
> less the same as the versions posted before, and comprise the first half
> of the patches.
> 
> The new bits here are the size handling changes and support for content
> encryption, in buffered, direct and synchronous codepaths. Much of this
> code is still very rough and needs a lot of cleanup work.
> 
> fscrypt support relies on some MDS changes that are being tracked here:
> 
>     https://github.com/ceph/ceph/pull/43588
> 
> In particular, this PR adds some new opaque fields in the inode that we
> use to store fscrypt-specific information, like the context and the real
> size of a file. That is slated to be merged for the upcoming Quincy
> release (which is sometime this northern spring).
> 
> There are still some notable bugs:
> 
> 1/ we've identified a few more potential races in truncate handling
> which will probably necessitate a protocol change, as well as changes to
> the MDS and kclient patchsets. The good news is that we think we have
> an approach that will resolve this.
> 
> 2/ the kclient doesn't handle reading sparse regions in OSD objects
> properly yet. The client can end up writing to a non-zero offset in a
> non-existent object. Then, if the client tries to read the written
> region back later, it'll get back zeroes and give you garbage when you
> try to decrypt them.
> 
> It turns out that the OSD already supports a SPARSE_READ operation, so
> I'm working on implementing that in the kclient to make it not try to
> decrypt the sparse regions.
> 
> Still, I was able to run xfstests on this set yesterday. Bug #2 above
> prevented all of the tests from passing, but it didn't oops! I call that
> progress! Given that, I figured this is a good time to post what I have
> so far.
> 
> Note that the buffered I/O changes in this set are not suitable for
> merge and will likely end up being discarded. We need to plumb the
> encryption in at the netfs layer, so that we can store encrypted data
> in fscache.
> 
> The non-buffered codepaths will likely also need substantial changes
> before merging. It may be simpler to just move that into the netfs layer
> too as cifs will need something similar anyway.
> 
> My goal is to get most of this into v5.18, but v5.19 might be more
> realistic. Hopefully I'll have a non-RFC patchset to send in a few
> weeks.
> 
> Special thanks to Xiubo who came through with the MDS patches. Also,
> thanks to everyone (especially Eric Biggers) for all of the previous
> reviews. It's much appreciated!
> 
> Jeff Layton (43):
>   vfs: export new_inode_pseudo
>   fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
>   fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>   fscrypt: add fscrypt_context_for_new_inode
>   ceph: preallocate inode for ops that may create one
>   ceph: crypto context handling for ceph
>   ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
>   ceph: add fscrypt_* handling to caps.c
>   ceph: add ability to set fscrypt_auth via setattr
>   ceph: implement -o test_dummy_encryption mount option
>   ceph: decode alternate_name in lease info
>   ceph: add fscrypt ioctls
>   ceph: make ceph_msdc_build_path use ref-walk
>   ceph: add encrypted fname handling to ceph_mdsc_build_path
>   ceph: send altname in MClientRequest
>   ceph: encode encrypted name in dentry release
>   ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>   ceph: make d_revalidate call fscrypt revalidator for encrypted
>     dentries
>   ceph: add helpers for converting names for userland presentation
>   ceph: add fscrypt support to ceph_fill_trace
>   ceph: add support to readdir for encrypted filenames
>   ceph: create symlinks with encrypted and base64-encoded targets
>   ceph: make ceph_get_name decrypt filenames
>   ceph: add a new ceph.fscrypt.auth vxattr
>   ceph: add some fscrypt guardrails
>   libceph: add CEPH_OSD_OP_ASSERT_VER support
>   ceph: size handling for encrypted inodes in cap updates
>   ceph: fscrypt_file field handling in MClientRequest messages
>   ceph: get file size from fscrypt_file when present in inode traces
>   ceph: handle fscrypt fields in cap messages from MDS
>   ceph: add infrastructure for file encryption and decryption
>   libceph: allow ceph_osdc_new_request to accept a multi-op read
>   ceph: disable fallocate for encrypted inodes
>   ceph: disable copy offload on encrypted inodes
>   ceph: don't use special DIO path for encrypted inodes
>   ceph: set encryption context on open
>   ceph: align data in pages in ceph_sync_write
>   ceph: add read/modify/write to ceph_sync_write
>   ceph: plumb in decryption during sync reads
>   ceph: set i_blkbits to crypto block size for encrypted inodes
>   ceph: add fscrypt decryption support to ceph_netfs_issue_op
>   ceph: add encryption support to writepage
>   ceph: fscrypt support for writepages
> 
> Luis Henriques (1):
>   ceph: don't allow changing layout on encrypted files/directories
> 
> Xiubo Li (4):
>   ceph: add __ceph_get_caps helper support
>   ceph: add __ceph_sync_read helper support
>   ceph: add object version support for sync read
>   ceph: add truncate size handling support for fscrypt
> 
>  fs/ceph/Makefile                |   1 +
>  fs/ceph/acl.c                   |   4 +-
>  fs/ceph/addr.c                  | 128 +++++--
>  fs/ceph/caps.c                  | 211 ++++++++++--
>  fs/ceph/crypto.c                | 374 +++++++++++++++++++++
>  fs/ceph/crypto.h                | 237 +++++++++++++
>  fs/ceph/dir.c                   | 209 +++++++++---
>  fs/ceph/export.c                |  44 ++-
>  fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
>  fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
>  fs/ceph/ioctl.c                 |  87 +++++
>  fs/ceph/mds_client.c            | 349 ++++++++++++++++---
>  fs/ceph/mds_client.h            |  24 +-
>  fs/ceph/super.c                 |  90 ++++-
>  fs/ceph/super.h                 |  43 ++-
>  fs/ceph/xattr.c                 |  29 ++
>  fs/crypto/fname.c               |  44 ++-
>  fs/crypto/fscrypt_private.h     |   9 +-
>  fs/crypto/hooks.c               |   6 +-
>  fs/crypto/policy.c              |  35 +-
>  fs/inode.c                      |   1 +
>  include/linux/ceph/ceph_fs.h    |  21 +-
>  include/linux/ceph/osd_client.h |   6 +-
>  include/linux/ceph/rados.h      |   4 +
>  include/linux/fscrypt.h         |  10 +
>  net/ceph/osd_client.c           |  32 +-
>  26 files changed, 2700 insertions(+), 350 deletions(-)
>  create mode 100644 fs/ceph/crypto.c
>  create mode 100644 fs/ceph/crypto.h
> 

I should also mention that I've pushed this series into a new
wip-fscrypt branch in the ceph-client tree for anyone that wants to
check it out.

    https://github.com/ceph/ceph-client/commits/wip-fscrypt

I can't recommend this for general use until the data corruption
bugs are fixed, of course.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt
  2022-01-11 19:15 ` [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt Jeff Layton
@ 2022-01-12  8:41   ` Xiubo Li
  0 siblings, 0 replies; 84+ messages in thread
From: Xiubo Li @ 2022-01-12  8:41 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Hi Jeff,

I have posted V8 of this patch, which switches from 'header.objver' to
'header.change_attr' to gate the truncate operation. That fixes the first
notable bug you mentioned in the cover letter.

Regards

-- Xiubo


On 1/12/22 3:15 AM, Jeff Layton wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> This transfers the encrypted last block contents to the MDS along
> with the truncate request, but only when the new size is smaller
> and not aligned to the fscrypt block size. When the last block is
> located in a file hole, the truncate request will only contain
> the header.
>
> The MDS can fail the truncate if another client or process has
> already updated the RADOS object which contains the last block.
> In that case it returns -EAGAIN and the kclient needs to retry.
> The RMW takes around 50ms, so allow up to 20 retries for now.
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/crypto.h |  21 +++++
>   fs/ceph/inode.c  | 217 ++++++++++++++++++++++++++++++++++++++++++++---
>   fs/ceph/super.h  |   5 ++
>   3 files changed, 229 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> index b5d360085fe8..3b7efffecbeb 100644
> --- a/fs/ceph/crypto.h
> +++ b/fs/ceph/crypto.h
> @@ -25,6 +25,27 @@ struct ceph_fname {
>   	u32		ctext_len;	// length of crypttext
>   };
>   
> +/*
> + * Header for the crypted file when truncating the size, this
> + * will be sent to MDS, and the MDS will update the encrypted
> + * last block and then truncate the size.
> + */
> +struct ceph_fscrypt_truncate_size_header {
> +       __u8  ver;
> +       __u8  compat;
> +
> +       /*
> +	* It will be sizeof(assert_ver + file_offset + block_size)
> +	* if the last block is empty when it's located in a file
> +	* hole. Or the data_len will plus CEPH_FSCRYPT_BLOCK_SIZE.
> +	*/
> +       __le32 data_len;
> +
> +       __le64 assert_ver;
> +       __le64 file_offset;
> +       __le32 block_size;
> +} __packed;
> +
>   struct ceph_fscrypt_auth {
>   	__le32	cfa_version;
>   	__le32	cfa_blob_len;
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 2497306eef58..eecda0a73908 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -586,6 +586,7 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
>   	ci->i_truncate_seq = 0;
>   	ci->i_truncate_size = 0;
>   	ci->i_truncate_pending = 0;
> +	ci->i_truncate_pagecache_size = 0;
>   
>   	ci->i_max_size = 0;
>   	ci->i_reported_size = 0;
> @@ -759,6 +760,10 @@ int ceph_fill_file_size(struct inode *inode, int issued,
>   		dout("truncate_size %lld -> %llu\n", ci->i_truncate_size,
>   		     truncate_size);
>   		ci->i_truncate_size = truncate_size;
> +		if (IS_ENCRYPTED(inode))
> +			ci->i_truncate_pagecache_size = size;
> +		else
> +			ci->i_truncate_pagecache_size = truncate_size;
>   	}
>   	return queue_trunc;
>   }
> @@ -1015,7 +1020,7 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>   
>   	if (new_version ||
>   	    (new_issued & (CEPH_CAP_ANY_FILE_RD | CEPH_CAP_ANY_FILE_WR))) {
> -		u64 size = info->size;
> +		u64 size = le64_to_cpu(info->size);
>   		s64 old_pool = ci->i_layout.pool_id;
>   		struct ceph_string *old_ns;
>   
> @@ -1030,16 +1035,20 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>   		pool_ns = old_ns;
>   
>   		if (IS_ENCRYPTED(inode) && size &&
> -		    (iinfo->fscrypt_file_len == sizeof(__le64))) {
> -			size = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
> -			if (info->size != round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
> -				pr_warn("size=%llu fscrypt_file=%llu\n", info->size, size);
> +		    (iinfo->fscrypt_file_len >= sizeof(__le64))) {
> +			u64 fsize = __le64_to_cpu(*(__le64 *)iinfo->fscrypt_file);
> +			if (fsize) {
> +				size = fsize;
> +				if (le64_to_cpu(info->size) !=
> +				    round_up(size, CEPH_FSCRYPT_BLOCK_SIZE))
> +					pr_warn("size=%llu fscrypt_file=%llu\n",
> +						info->size, size);
> +			}
>   		}
>   
>   		queue_trunc = ceph_fill_file_size(inode, issued,
>   					le32_to_cpu(info->truncate_seq),
> -					le64_to_cpu(info->truncate_size),
> -					le64_to_cpu(size));
> +					le64_to_cpu(info->truncate_size), size);
>   		/* only update max_size on auth cap */
>   		if ((info->cap.flags & CEPH_CAP_FLAG_AUTH) &&
>   		    ci->i_max_size != le64_to_cpu(info->max_size)) {
> @@ -2153,7 +2162,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
>   	/* there should be no reader or writer */
>   	WARN_ON_ONCE(ci->i_rd_ref || ci->i_wr_ref);
>   
> -	to = ci->i_truncate_size;
> +	to = ci->i_truncate_pagecache_size;
>   	wrbuffer_refs = ci->i_wrbuffer_ref;
>   	dout("__do_pending_vmtruncate %p (%d) to %lld\n", inode,
>   	     ci->i_truncate_pending, to);
> @@ -2163,7 +2172,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
>   	truncate_pagecache(inode, to);
>   
>   	spin_lock(&ci->i_ceph_lock);
> -	if (to == ci->i_truncate_size) {
> +	if (to == ci->i_truncate_pagecache_size) {
>   		ci->i_truncate_pending = 0;
>   		finish = 1;
>   	}
> @@ -2244,6 +2253,143 @@ static const struct inode_operations ceph_encrypted_symlink_iops = {
>   	.listxattr = ceph_listxattr,
>   };
>   
> +/*
> + * Transfer the encrypted last block to the MDS and the MDS
> + * will help update it when truncating a smaller size.
> + *
> + * We don't support a PAGE_SIZE that is smaller than the
> + * CEPH_FSCRYPT_BLOCK_SIZE.
> + */
> +static int fill_fscrypt_truncate(struct inode *inode,
> +				 struct ceph_mds_request *req,
> +				 struct iattr *attr)
> +{
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +	int boff = attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE;
> +	loff_t pos, orig_pos = round_down(attr->ia_size, CEPH_FSCRYPT_BLOCK_SIZE);
> +	u64 block = orig_pos >> CEPH_FSCRYPT_BLOCK_SHIFT;
> +	struct ceph_pagelist *pagelist = NULL;
> +	struct kvec iov;
> +	struct iov_iter iter;
> +	struct page *page = NULL;
> +	struct ceph_fscrypt_truncate_size_header header;
> +	int retry_op = 0;
> +	int len = CEPH_FSCRYPT_BLOCK_SIZE;
> +	loff_t i_size = i_size_read(inode);
> +	int got, ret, issued;
> +	u64 objver;
> +
> +	ret = __ceph_get_caps(inode, NULL, CEPH_CAP_FILE_RD, 0, -1, &got);
> +	if (ret < 0)
> +		return ret;
> +
> +	issued = __ceph_caps_issued(ci, NULL);
> +
> +	dout("%s size %lld -> %lld got cap refs on %s, issued %s\n", __func__,
> +	     i_size, attr->ia_size, ceph_cap_string(got),
> +	     ceph_cap_string(issued));
> +
> +	/* Try to writeback the dirty pagecaches */
> +	if (issued & (CEPH_CAP_FILE_BUFFER))
> +		filemap_write_and_wait(inode->i_mapping);
> +
> +	page = __page_cache_alloc(GFP_KERNEL);
> +	if (page == NULL) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	pagelist = ceph_pagelist_alloc(GFP_KERNEL);
> +	if (!pagelist) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	iov.iov_base = kmap_local_page(page);
> +	iov.iov_len = len;
> +	iov_iter_kvec(&iter, READ, &iov, 1, len);
> +
> +	pos = orig_pos;
> +	ret = __ceph_sync_read(inode, &pos, &iter, &retry_op, &objver);
> +	ceph_put_cap_refs(ci, got);
> +	if (ret < 0)
> +		goto out;
> +
> +	/* Insert the header first */
> +	header.ver = 1;
> +	header.compat = 1;
> +
> +	/*
> +	 * Always set the block_size to CEPH_FSCRYPT_BLOCK_SIZE,
> +	 * because in MDS it may need this to do the truncate.
> +	 */
> +	header.block_size = cpu_to_le32(CEPH_FSCRYPT_BLOCK_SIZE);
> +
> +	/*
> +	 * If we hit a hole here, we should just skip filling
> +	 * the fscrypt for the request, because once the fscrypt
> +	 * is enabled, the file will be split into many blocks
> +	 * with the size of CEPH_FSCRYPT_BLOCK_SIZE, if there
> +	 * has a hole, the hole size should be multiple of block
> +	 * size.
> +	 *
> +	 * If the Rados object doesn't exist, it will be set 0.
> +	 */
> +	if (!objver) {
> +		dout("%s hit hole, ppos %lld < size %lld\n", __func__,
> +		     pos, i_size);
> +
> +		header.data_len = cpu_to_le32(8 + 8 + 4);
> +
> +		/*
> +		 * If the "assert_ver" is 0 means hitting a hole, and
> +		 * the MDS will use the it to check whether hitting a
> +		 * hole or not.
> +		 */
> +		header.assert_ver = 0;
> +		header.file_offset = 0;
> +		ret = 0;
> +	} else {
> +		header.data_len = cpu_to_le32(8 + 8 + 4 + CEPH_FSCRYPT_BLOCK_SIZE);
> +		header.assert_ver = cpu_to_le64(objver);
> +		header.file_offset = cpu_to_le64(orig_pos);
> +
> +		/* truncate and zero out the extra contents for the last block */
> +		memset(iov.iov_base + boff, 0, PAGE_SIZE - boff);
> +
> +		/* encrypt the last block */
> +		ret = ceph_fscrypt_encrypt_block_inplace(inode, page,
> +						    CEPH_FSCRYPT_BLOCK_SIZE,
> +						    0, block,
> +						    GFP_KERNEL);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	/* Insert the header */
> +	ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
> +	if (ret)
> +		goto out;
> +
> +	if (header.block_size) {
> +		/* Append the last block contents to pagelist */
> +		ret = ceph_pagelist_append(pagelist, iov.iov_base,
> +					   CEPH_FSCRYPT_BLOCK_SIZE);
> +		if (ret)
> +			goto out;
> +	}
> +	req->r_pagelist = pagelist;
> +out:
> +	dout("%s %p size dropping cap refs on %s\n", __func__,
> +	     inode, ceph_cap_string(got));
> +	kunmap_local(iov.iov_base);
> +	if (page)
> +		__free_pages(page, 0);
> +	if (ret && pagelist)
> +		ceph_pagelist_release(pagelist);
> +	return ret;
> +}
> +
>   int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
>   {
>   	struct ceph_inode_info *ci = ceph_inode(inode);
> @@ -2251,13 +2397,17 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   	struct ceph_mds_request *req;
>   	struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
>   	struct ceph_cap_flush *prealloc_cf;
> +	loff_t isize = i_size_read(inode);
>   	int issued;
>   	int release = 0, dirtied = 0;
>   	int mask = 0;
>   	int err = 0;
>   	int inode_dirty_flags = 0;
>   	bool lock_snap_rwsem = false;
> +	bool fill_fscrypt;
> +	int truncate_retry = 20; /* The RMW will take around 50ms */
>   
> +retry:
>   	prealloc_cf = ceph_alloc_cap_flush();
>   	if (!prealloc_cf)
>   		return -ENOMEM;
> @@ -2269,6 +2419,7 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   		return PTR_ERR(req);
>   	}
>   
> +	fill_fscrypt = false;
>   	spin_lock(&ci->i_ceph_lock);
>   	issued = __ceph_caps_issued(ci, NULL);
>   
> @@ -2390,10 +2541,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   		}
>   	}
>   	if (ia_valid & ATTR_SIZE) {
> -		loff_t isize = i_size_read(inode);
> -
>   		dout("setattr %p size %lld -> %lld\n", inode, isize, attr->ia_size);
> -		if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
> +		/*
> +		 * Only when the new size is smaller and not aligned to
> +		 * CEPH_FSCRYPT_BLOCK_SIZE will the RMW is needed.
> +		 */
> +		if (IS_ENCRYPTED(inode) && attr->ia_size < isize &&
> +		    (attr->ia_size % CEPH_FSCRYPT_BLOCK_SIZE)) {
> +			mask |= CEPH_SETATTR_SIZE;
> +			release |= CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
> +				   CEPH_CAP_FILE_RD | CEPH_CAP_FILE_WR;
> +			set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
> +			mask |= CEPH_SETATTR_FSCRYPT_FILE;
> +			req->r_args.setattr.size =
> +				cpu_to_le64(round_up(attr->ia_size,
> +						     CEPH_FSCRYPT_BLOCK_SIZE));
> +			req->r_args.setattr.old_size =
> +				cpu_to_le64(round_up(isize,
> +						     CEPH_FSCRYPT_BLOCK_SIZE));
> +			req->r_fscrypt_file = attr->ia_size;
> +			fill_fscrypt = true;
> +		} else if ((issued & CEPH_CAP_FILE_EXCL) && attr->ia_size >= isize) {
>   			if (attr->ia_size > isize) {
>   				i_size_write(inode, attr->ia_size);
>   				inode->i_blocks = calc_inode_blocks(attr->ia_size);
> @@ -2416,7 +2584,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   					cpu_to_le64(round_up(isize,
>   							     CEPH_FSCRYPT_BLOCK_SIZE));
>   				req->r_fscrypt_file = attr->ia_size;
> -				/* FIXME: client must zero out any partial blocks! */
>   			} else {
>   				req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
>   				req->r_args.setattr.old_size = cpu_to_le64(isize);
> @@ -2482,8 +2649,10 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   
>   	release &= issued;
>   	spin_unlock(&ci->i_ceph_lock);
> -	if (lock_snap_rwsem)
> +	if (lock_snap_rwsem) {
>   		up_read(&mdsc->snap_rwsem);
> +		lock_snap_rwsem = false;
> +	}
>   
>   	if (inode_dirty_flags)
>   		__mark_inode_dirty(inode, inode_dirty_flags);
> @@ -2495,7 +2664,27 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>   		req->r_args.setattr.mask = cpu_to_le32(mask);
>   		req->r_num_caps = 1;
>   		req->r_stamp = attr->ia_ctime;
> +		if (fill_fscrypt) {
> +			err = fill_fscrypt_truncate(inode, req, attr);
> +			if (err)
> +				goto out;
> +		}
> +
> +		/*
> +		 * The truncate request will return -EAGAIN when the
> +		 * last block has been updated just before the MDS
> +		 * successfully gets the xlock for the FILE lock. To
> +		 * avoid corrupting the file contents we need to retry
> +		 * it.
> +		 */
>   		err = ceph_mdsc_do_request(mdsc, NULL, req);
> +		if (err == -EAGAIN && truncate_retry--) {
> +			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> +			     inode, err, ceph_cap_string(dirtied), mask);
> +			ceph_mdsc_put_request(req);
> +			ceph_free_cap_flush(prealloc_cf);
> +			goto retry;
> +		}
>   	}
>   out:
>   	dout("setattr %p result=%d (%s locally, %d remote)\n", inode, err,
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 4d2ccb51fe61..cd4a83fcbc0f 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -410,6 +410,11 @@ struct ceph_inode_info {
>   	u32 i_truncate_seq;        /* last truncate to smaller size */
>   	u64 i_truncate_size;       /*  and the size we last truncated down to */
>   	int i_truncate_pending;    /*  still need to call vmtruncate */
> +	/*
> +	 * For none fscrypt case it equals to i_truncate_size or it will
> +	 * equals to fscrypt_file_size
> +	 */
> +	u64 i_truncate_pagecache_size;
>   
>   	u64 i_max_size;            /* max file size authorized by mds */
>   	u64 i_reported_size; /* (max_)size reported to or requested of mds */
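
As a reading aid for the patch quoted above, here is a small userspace sketch
of two pieces of its logic: the test for when a truncate needs the
read/modify/write path, and the data_len carried in the header. All sketch_*
names and the 4096-byte block size are assumptions for illustration. The
struct mirrors the field order of ceph_fscrypt_truncate_size_header but uses
host-endian integers rather than the __le32/__le64 wire types, and relies on
the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 4096u	/* assumed crypto block size */

/* Same field order as the header in the patch; host-endian for brevity. */
struct sketch_truncate_header {
	uint8_t  ver;
	uint8_t  compat;
	uint32_t data_len;
	uint64_t assert_ver;
	uint64_t file_offset;
	uint32_t block_size;
} __attribute__((packed));

/* Packed layout: 1 + 1 + 4 + 8 + 8 + 4 bytes, no padding. */
_Static_assert(sizeof(struct sketch_truncate_header) == 26,
	       "unexpected padding in sketch header");

/* RMW is only needed when shrinking to a size that splits a block. */
static bool sketch_truncate_needs_rmw(uint64_t old_size, uint64_t new_size)
{
	return new_size < old_size && (new_size % SKETCH_BLOCK_SIZE) != 0;
}

/*
 * data_len covers assert_ver + file_offset + block_size, plus one
 * encrypted block unless the last block sits in a hole (objver == 0).
 */
static uint32_t sketch_truncate_data_len(uint64_t objver)
{
	uint32_t len = 8 + 8 + 4;

	if (objver != 0)
		len += SKETCH_BLOCK_SIZE;
	return len;
}
```

Under these assumptions, shrinking a 10000-byte file to 5000 bytes takes the
RMW path, while shrinking it to 8192 bytes or growing it does not.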



* Re: [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write
  2022-01-11 19:16 ` [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write Jeff Layton
@ 2022-01-19  3:21   ` Xiubo Li
  2022-01-19  5:08     ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-01-19  3:21 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 1/12/22 3:16 AM, Jeff Layton wrote:
> When doing a synchronous write on an encrypted inode, we have no
> guarantee that the caller is writing crypto block-aligned data. When
> that happens, we must do a read/modify/write cycle.
>
> First, expand the range to cover complete blocks. If we had to change
> the original pos or length, issue a read to fill the first and/or last
> pages, and fetch the version of the object from the result.
>
> We then copy data into the pages as usual, encrypt the result and issue
> a write prefixed by an assertion that the version hasn't changed. If it has
> changed then we restart the whole thing again.
>
> If there is no object at that position in the file (-ENOENT), we prefix
> the write on an exclusive create of the object instead.
>
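
The range expansion described in the commit message above can be sketched in
userspace roughly as follows. This is only an illustration: the sketch_* names
and the 4096-byte block size are assumptions, and the clamp to the end of the
current RADOS object that the patch also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE UINT64_C(4096)	/* assumed crypto block size */

struct sketch_rmw_plan {
	uint64_t write_pos;	/* pos rounded down to a block boundary */
	uint64_t write_len;	/* length of the block-aligned range */
	bool rmw;		/* a read/modify/write cycle is needed */
	int nr_reads;		/* blocks to preload: 0, 1 or 2 */
};

static struct sketch_rmw_plan sketch_plan_rmw(uint64_t pos, uint64_t len)
{
	struct sketch_rmw_plan p;
	uint64_t end, last_pos;
	bool first, last;

	assert(len > 0);

	/* Expand [pos, pos + len) outward to whole crypto blocks. */
	p.write_pos = pos & ~(SKETCH_BLOCK_SIZE - 1);
	end = (pos + len + SKETCH_BLOCK_SIZE - 1) & ~(SKETCH_BLOCK_SIZE - 1);
	p.write_len = end - p.write_pos;

	/* A partial first or last block means existing data must be read. */
	first = pos != p.write_pos;
	last = (pos + len) != (p.write_pos + p.write_len);
	p.rmw = first || last;

	/* One read is enough if first and last are the same block. */
	last_pos = p.write_pos + p.write_len - SKETCH_BLOCK_SIZE;
	if (first && last_pos == p.write_pos)
		last = false;

	p.nr_reads = (first ? 1 : 0) + (last ? 1 : 0);
	return p;
}
```

For example, a 200-byte write at offset 100 expands to one full block and
needs a single preload, while an 8000-byte write at offset 100 expands to two
blocks and needs both the first and the last block preloaded.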
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/file.c | 260 +++++++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 228 insertions(+), 32 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index a6305ad5519b..41766b2012e9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1468,18 +1468,16 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   	struct inode *inode = file_inode(file);
>   	struct ceph_inode_info *ci = ceph_inode(inode);
>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> -	struct ceph_vino vino;
> +	struct ceph_osd_client *osdc = &fsc->client->osdc;
>   	struct ceph_osd_request *req;
>   	struct page **pages;
>   	u64 len;
>   	int num_pages;
>   	int written = 0;
> -	int flags;
>   	int ret;
>   	bool check_caps = false;
>   	struct timespec64 mtime = current_time(inode);
>   	size_t count = iov_iter_count(from);
> -	size_t off;
>   
>   	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
>   		return -EROFS;
> @@ -1499,70 +1497,267 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   	if (ret < 0)
>   		dout("invalidate_inode_pages2_range returned %d\n", ret);
>   
> -	flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE;
> -
>   	while ((len = iov_iter_count(from)) > 0) {
>   		size_t left;
>   		int n;
> +		u64 write_pos = pos;
> +		u64 write_len = len;
> +		u64 objnum, objoff;
> +		u32 xlen;
> +		u64 assert_ver;
> +		bool rmw;
> +		bool first, last;
> +		struct iov_iter saved_iter = *from;
> +		size_t off;
> +
> +		fscrypt_adjust_off_and_len(inode, &write_pos, &write_len);
> +
> +		/* clamp the length to the end of first object */
> +		ceph_calc_file_object_mapping(&ci->i_layout, write_pos,
> +						write_len, &objnum, &objoff,
> +						&xlen);
> +		write_len = xlen;
> +
> +		/* adjust len downward if it goes beyond current object */
> +		if (pos + len > write_pos + write_len)
> +			len = write_pos + write_len - pos;
>   
> -		vino = ceph_vino(inode);
> -		req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
> -					    vino, pos, &len, 0, 1,
> -					    CEPH_OSD_OP_WRITE, flags, snapc,
> -					    ci->i_truncate_seq,
> -					    ci->i_truncate_size,
> -					    false);
> -		if (IS_ERR(req)) {
> -			ret = PTR_ERR(req);
> -			break;
> -		}
> +		/*
> +		 * If we had to adjust the length or position to align with a
> +		 * crypto block, then we must do a read/modify/write cycle. We
> +		 * use a version assertion to redrive the thing if something
> +		 * changes in between.
> +		 */
> +		first = pos != write_pos;
> +		last = (pos + len) != (write_pos + write_len);
> +		rmw = first || last;
>   
> -		/* FIXME: express in FSCRYPT_BLOCK_SIZE units */
> -		num_pages = calc_pages_for(pos, len);
> +		/*
> +		 * The data is emplaced into the page as it would be if it were in
> +		 * an array of pagecache pages.
> +		 */
> +		num_pages = calc_pages_for(write_pos, write_len);
>   		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
>   		if (IS_ERR(pages)) {
>   			ret = PTR_ERR(pages);
> -			goto out;
> +			break;
> +		}
> +
> +		/* Do we need to preload the pages? */
> +		if (rmw) {
> +			u64 first_pos = write_pos;
> +			u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE;
> +			u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE;
> +
> +			/* We should only need to do this for encrypted inodes */
> +			WARN_ON_ONCE(!IS_ENCRYPTED(inode));
> +
> +			/* No need to do two reads if first and last blocks are same */
> +			if (first && last_pos == first_pos)
> +				last = false;
> +
> +			/*
> +			 * Allocate a read request for one or two extents, depending
> +			 * on how the request was aligned.
> +			 */
> +			req = ceph_osdc_new_request(osdc, &ci->i_layout,
> +					ci->i_vino, first ? first_pos : last_pos,
> +					&read_len, 0, (first && last) ? 2 : 1,
> +					CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> +					NULL, ci->i_truncate_seq,
> +					ci->i_truncate_size, false);
> +			if (IS_ERR(req)) {
> +				ceph_release_page_vector(pages, num_pages);
> +				ret = PTR_ERR(req);
> +				break;
> +			}
> +
> +			/* Something is misaligned! */
> +			if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) {
> +				ret = -EIO;
> +				break;
> +			}

Do we need to call "ceph_release_page_vector()" here?



> +
> +			/* Add extent for first block? */
> +			if (first)
> +				osd_req_op_extent_osd_data_pages(req, 0, pages,
> +							 CEPH_FSCRYPT_BLOCK_SIZE,
> +							 offset_in_page(first_pos),
> +							 false, false);
> +
> +			/* Add extent for last block */
> +			if (last) {
> +				/* Init the other extent if first extent has been used */
> +				if (first) {
> +					osd_req_op_extent_init(req, 1, CEPH_OSD_OP_READ,
> +							last_pos, CEPH_FSCRYPT_BLOCK_SIZE,
> +							ci->i_truncate_size,
> +							ci->i_truncate_seq);
> +				}
> +
> +				osd_req_op_extent_osd_data_pages(req, first ? 1 : 0,
> +							&pages[num_pages - 1],
> +							CEPH_FSCRYPT_BLOCK_SIZE,
> +							offset_in_page(last_pos),
> +							false, false);
> +			}
> +
> +			ret = ceph_osdc_start_request(osdc, req, false);
> +			if (!ret)
> +				ret = ceph_osdc_wait_request(osdc, req);
> +
> +			/* FIXME: length field is wrong if there are 2 extents */
> +			ceph_update_read_metrics(&fsc->mdsc->metric,
> +						 req->r_start_latency,
> +						 req->r_end_latency,
> +						 read_len, ret);
> +
> +			/* Ok if object is not already present */
> +			if (ret == -ENOENT) {
> +				/*
> +				 * If there is no object, then we can't assert
> +				 * on its version. Set it to 0, and we'll use an
> +				 * exclusive create instead.
> +				 */
> +				ceph_osdc_put_request(req);
> +				assert_ver = 0;
> +				ret = 0;
> +
> +				/*
> +				 * zero out the soon-to-be uncopied parts of the
> +				 * first and last pages.
> +				 */
> +				if (first)
> +					zero_user_segment(pages[0], 0,

Shouldn't the pages already have been released in "ceph_osdc_put_request()"?


> +							  offset_in_page(first_pos));
> +				if (last)
> +					zero_user_segment(pages[num_pages - 1],
> +							  offset_in_page(last_pos),
> +							  PAGE_SIZE);
> +			} else {
> +				/* Grab assert version. It must be non-zero. */
> +				assert_ver = req->r_version;
> +				WARN_ON_ONCE(ret > 0 && assert_ver == 0);
> +
> +				ceph_osdc_put_request(req);
> +				if (ret < 0) {
> +					ceph_release_page_vector(pages, num_pages);

Shouldn't the pages already have been released in "ceph_osdc_put_request()"?

IMO you should put the request when you are breaking the while loop and 
just before the next "ceph_osdc_new_request()" below.



> +					break;
> +				}
> +
> +				if (first) {
> +					ret = ceph_fscrypt_decrypt_block_inplace(inode,
> +							pages[0],
> +							CEPH_FSCRYPT_BLOCK_SIZE,
> +							offset_in_page(first_pos),
> +							first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
> +					if (ret < 0)
> +						break;
> +				}
> +				if (last) {
> +					ret = ceph_fscrypt_decrypt_block_inplace(inode,
> +							pages[num_pages - 1],
> +							CEPH_FSCRYPT_BLOCK_SIZE,
> +							offset_in_page(last_pos),
> +							last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
> +					if (ret < 0)
> +						break;
> +				}
> +			}
>   		}
>   
>   		left = len;
> -		off = pos & ~CEPH_FSCRYPT_BLOCK_MASK;
> +		off = offset_in_page(pos);
>   		for (n = 0; n < num_pages; n++) {
> -			size_t plen = min_t(size_t, left, CEPH_FSCRYPT_BLOCK_SIZE - off);
> +			size_t plen = min_t(size_t, left, PAGE_SIZE - off);
> +
> +			/* copy the data */
>   			ret = copy_page_from_iter(pages[n], off, plen, from);
> -			off = 0;
>   			if (ret != plen) {
>   				ret = -EFAULT;
>   				break;
>   			}
> +			off = 0;
>   			left -= ret;
>   		}
> -
>   		if (ret < 0) {
> +			dout("sync_write write failed with %d\n", ret);
>   			ceph_release_page_vector(pages, num_pages);
> -			goto out;
> +			break;
>   		}
>   
> -		req->r_inode = inode;
> +		if (IS_ENCRYPTED(inode)) {
> +			ret = ceph_fscrypt_encrypt_pages(inode, pages,
> +							 write_pos, write_len,
> +							 GFP_KERNEL);
> +			if (ret < 0) {
> +				dout("encryption failed with %d\n", ret);
> +				break;
> +			}
> +		}
>   
> -		osd_req_op_extent_osd_data_pages(req, 0, pages, len,
> -						 pos & ~CEPH_FSCRYPT_BLOCK_MASK,
> -						 false, true);

The pages have already been released, you need to allocate new pages 
again here.

> +		req = ceph_osdc_new_request(osdc, &ci->i_layout,
> +					    ci->i_vino, write_pos, &write_len,
> +					    rmw ? 1 : 0, rmw ? 2 : 1,
> +					    CEPH_OSD_OP_WRITE,
> +					    CEPH_OSD_FLAG_WRITE,
> +					    snapc, ci->i_truncate_seq,
> +					    ci->i_truncate_size, false);
> +		if (IS_ERR(req)) {
> +			ret = PTR_ERR(req);
> +			ceph_release_page_vector(pages, num_pages);
> +			break;
> +		}
>   
> +		dout("sync_write write op %lld~%llu\n", write_pos, write_len);
> +		osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len,
> +						 offset_in_page(write_pos), false,
> +						 true);
> +		req->r_inode = inode;
>   		req->r_mtime = mtime;
> -		ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
> +
> +		/* Set up the assertion */
> +		if (rmw) {
> +			/*
> +			 * Set up the assertion. If we don't have a version number,
> +			 * then the object doesn't exist yet. Use an exclusive create
> +			 * instead of a version assertion in that case.
> +			 */
> +			if (assert_ver) {
> +				osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0);
> +				req->r_ops[0].assert_ver.ver = assert_ver;
> +			} else {
> +				osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE,
> +						CEPH_OSD_OP_FLAG_EXCL);
> +			}
> +		}
> +
> +		ret = ceph_osdc_start_request(osdc, req, false);
>   		if (!ret)
> -			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
> +			ret = ceph_osdc_wait_request(osdc, req);
>   
>   		ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
>   					  req->r_end_latency, len, ret);
> -out:
>   		ceph_osdc_put_request(req);
>   		if (ret != 0) {
> +			dout("sync_write osd write returned %d\n", ret);
> +			/* Version changed! Must re-do the rmw cycle */
> +			if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) ||
> +			     (!assert_ver && ret == -EEXIST)) {
> +				/* We should only ever see this on a rmw */
> +				WARN_ON_ONCE(!rmw);
> +
> +				/* The version should never go backward */
> +				WARN_ON_ONCE(ret == -EOVERFLOW);
> +
> +				*from = saved_iter;
> +
> +				/* FIXME: limit number of times we loop? */
> +				continue;
> +			}
>   			ceph_set_error_write(ci);
>   			break;
>   		}
> -
>   		ceph_clear_error_write(ci);
>   		pos += len;
>   		written += len;
> @@ -1580,6 +1775,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   		ret = written;
>   		iocb->ki_pos = pos;
>   	}
> +	dout("sync_write returning %d\n", ret);
>   	return ret;
>   }
>   



* Re: [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write
  2022-01-19  3:21   ` Xiubo Li
@ 2022-01-19  5:08     ` Xiubo Li
  2022-01-19 11:06       ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-01-19  5:08 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 1/19/22 11:21 AM, Xiubo Li wrote:
>
> On 1/12/22 3:16 AM, Jeff Layton wrote:
>> When doing a synchronous write on an encrypted inode, we have no
>> guarantee that the caller is writing crypto block-aligned data. When
>> that happens, we must do a read/modify/write cycle.
>>
>> First, expand the range to cover complete blocks. If we had to change
>> the original pos or length, issue a read to fill the first and/or last
>> pages, and fetch the version of the object from the result.
>>
>> We then copy data into the pages as usual, encrypt the result and issue
>> a write prefixed by an assertion that the version hasn't changed. If it has
>> changed then we restart the whole thing again.
>>
>> If there is no object at that position in the file (-ENOENT), we prefix
>> the write on an exclusive create of the object instead.
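
For reference, the range expansion and the first/last/rmw decision described
above can be sketched in userspace roughly as follows. This is only a model:
the 4 KiB block size, the struct rmw_plan type and the plan_write() helper are
assumptions made for illustration, not names from the patch, and the clamp to
the end of the object is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB crypto blocks, standing in for CEPH_FSCRYPT_BLOCK_SIZE. */
#define BLOCK_SHIFT 12
#define BLOCK_SIZE  ((uint64_t)1 << BLOCK_SHIFT)
#define BLOCK_MASK  (~(BLOCK_SIZE - 1))

/* Hypothetical result type, used only in this sketch. */
struct rmw_plan {
	uint64_t write_pos;	/* pos rounded down to a block boundary */
	uint64_t write_len;	/* length rounded out to whole blocks */
	bool first;		/* head block must be read before the write */
	bool last;		/* tail block must be read before the write */
	bool rmw;		/* a read/modify/write cycle is needed */
};

static struct rmw_plan plan_write(uint64_t pos, uint64_t len)
{
	struct rmw_plan p;
	uint64_t end = pos + len;

	assert(len > 0);

	/* Expand the range to cover complete crypto blocks. */
	p.write_pos = pos & BLOCK_MASK;
	p.write_len = ((end + BLOCK_SIZE - 1) & BLOCK_MASK) - p.write_pos;

	/* Any adjustment at either end means existing data must be read. */
	p.first = pos != p.write_pos;
	p.last = end != p.write_pos + p.write_len;
	p.rmw = p.first || p.last;

	/* First and last block are the same block: one read is enough. */
	if (p.first && p.write_len == BLOCK_SIZE)
		p.last = false;

	return p;
}
```

With these assumptions, a 200-byte write at offset 4000 expands to 0~8192 and
needs both the first and the last block read, while a write of 4096~8192 is
already aligned and skips the read entirely.
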
>>
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>   fs/ceph/file.c | 260 +++++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 228 insertions(+), 32 deletions(-)
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index a6305ad5519b..41766b2012e9 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1468,18 +1468,16 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>       struct inode *inode = file_inode(file);
>>       struct ceph_inode_info *ci = ceph_inode(inode);
>>       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>> -    struct ceph_vino vino;
>> +    struct ceph_osd_client *osdc = &fsc->client->osdc;
>>       struct ceph_osd_request *req;
>>       struct page **pages;
>>       u64 len;
>>       int num_pages;
>>       int written = 0;
>> -    int flags;
>>       int ret;
>>       bool check_caps = false;
>>       struct timespec64 mtime = current_time(inode);
>>       size_t count = iov_iter_count(from);
>> -    size_t off;
>>         if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
>>           return -EROFS;
>> @@ -1499,70 +1497,267 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>       if (ret < 0)
>>           dout("invalidate_inode_pages2_range returned %d\n", ret);
>>   -    flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE;
>> -
>>       while ((len = iov_iter_count(from)) > 0) {
>>           size_t left;
>>           int n;
>> +        u64 write_pos = pos;
>> +        u64 write_len = len;
>> +        u64 objnum, objoff;
>> +        u32 xlen;
>> +        u64 assert_ver;
>> +        bool rmw;
>> +        bool first, last;
>> +        struct iov_iter saved_iter = *from;
>> +        size_t off;
>> +
>> +        fscrypt_adjust_off_and_len(inode, &write_pos, &write_len);
>> +
>> +        /* clamp the length to the end of first object */
>> +        ceph_calc_file_object_mapping(&ci->i_layout, write_pos,
>> +                        write_len, &objnum, &objoff,
>> +                        &xlen);
>> +        write_len = xlen;
>> +
>> +        /* adjust len downward if it goes beyond current object */
>> +        if (pos + len > write_pos + write_len)
>> +            len = write_pos + write_len - pos;
>>   -        vino = ceph_vino(inode);
>> -        req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
>> -                        vino, pos, &len, 0, 1,
>> -                        CEPH_OSD_OP_WRITE, flags, snapc,
>> -                        ci->i_truncate_seq,
>> -                        ci->i_truncate_size,
>> -                        false);
>> -        if (IS_ERR(req)) {
>> -            ret = PTR_ERR(req);
>> -            break;
>> -        }
>> +        /*
>> +         * If we had to adjust the length or position to align with a
>> +         * crypto block, then we must do a read/modify/write cycle. We
>> +         * use a version assertion to redrive the thing if something
>> +         * changes in between.
>> +         */
>> +        first = pos != write_pos;
>> +        last = (pos + len) != (write_pos + write_len);
>> +        rmw = first || last;
>>   -        /* FIXME: express in FSCRYPT_BLOCK_SIZE units */
>> -        num_pages = calc_pages_for(pos, len);
>> +        /*
>> +         * The data is emplaced into the page as it would be if it were in
>> +         * an array of pagecache pages.
>> +         */
>> +        num_pages = calc_pages_for(write_pos, write_len);
>>           pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
>>           if (IS_ERR(pages)) {
>>               ret = PTR_ERR(pages);
>> -            goto out;
>> +            break;
>> +        }
>> +
>> +        /* Do we need to preload the pages? */
>> +        if (rmw) {
>> +            u64 first_pos = write_pos;
>> +            u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE;
>> +            u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE;
>> +
>> +            /* We should only need to do this for encrypted inodes */
>> +            WARN_ON_ONCE(!IS_ENCRYPTED(inode));
>> +
>> +            /* No need to do two reads if first and last blocks are same */
>> +            if (first && last_pos == first_pos)
>> +                last = false;
>> +
>> +            /*
>> +             * Allocate a read request for one or two extents, depending
>> +             * on how the request was aligned.
>> +             */
>> +            req = ceph_osdc_new_request(osdc, &ci->i_layout,
>> +                    ci->i_vino, first ? first_pos : last_pos,
>> +                    &read_len, 0, (first && last) ? 2 : 1,
>> +                    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
>> +                    NULL, ci->i_truncate_seq,
>> +                    ci->i_truncate_size, false);
>> +            if (IS_ERR(req)) {
>> +                ceph_release_page_vector(pages, num_pages);
>> +                ret = PTR_ERR(req);
>> +                break;
>> +            }
>> +
>> +            /* Something is misaligned! */
>> +            if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) {
>> +                ret = -EIO;
>> +                break;
>> +            }
>
> Do we need to call "ceph_release_page_vector()" here ?
>
>
>
>> +
>> +            /* Add extent for first block? */
>> +            if (first)
>> +                osd_req_op_extent_osd_data_pages(req, 0, pages,
>> +                             CEPH_FSCRYPT_BLOCK_SIZE,
>> +                             offset_in_page(first_pos),
>> +                             false, false);
>> +
>> +            /* Add extent for last block */
>> +            if (last) {
>> +                /* Init the other extent if first extent has been used */
>> +                if (first) {
>> +                    osd_req_op_extent_init(req, 1, CEPH_OSD_OP_READ,
>> +                            last_pos, CEPH_FSCRYPT_BLOCK_SIZE,
>> +                            ci->i_truncate_size,
>> +                            ci->i_truncate_seq);
>> +                }
>> +
>> +                osd_req_op_extent_osd_data_pages(req, first ? 1 : 0,
>> +                            &pages[num_pages - 1],
>> +                            CEPH_FSCRYPT_BLOCK_SIZE,
>> +                            offset_in_page(last_pos),
>> +                            false, false);
>> +            }
>> +
>> +            ret = ceph_osdc_start_request(osdc, req, false);
>> +            if (!ret)
>> +                ret = ceph_osdc_wait_request(osdc, req);
>> +
>> +            /* FIXME: length field is wrong if there are 2 extents */
>> +            ceph_update_read_metrics(&fsc->mdsc->metric,
>> +                         req->r_start_latency,
>> +                         req->r_end_latency,
>> +                         read_len, ret);
>> +
>> +            /* Ok if object is not already present */
>> +            if (ret == -ENOENT) {
>> +                /*
>> +                 * If there is no object, then we can't assert
>> +                 * on its version. Set it to 0, and we'll use an
>> +                 * exclusive create instead.
>> +                 */
>> +                ceph_osdc_put_request(req);
>> +                assert_ver = 0;
>> +                ret = 0;
>> +
>> +                /*
>> +                 * zero out the soon-to-be uncopied parts of the
>> +                 * first and last pages.
>> +                 */
>> +                if (first)
>> +                    zero_user_segment(pages[0], 0,
>
> The pages should already be released in "ceph_osdc_put_request()" ?
>
>
>> +                              offset_in_page(first_pos));
>> +                if (last)
>> +                    zero_user_segment(pages[num_pages - 1],
>> +                              offset_in_page(last_pos),
>> +                              PAGE_SIZE);
>> +            } else {
>> +                /* Grab assert version. It must be non-zero. */
>> +                assert_ver = req->r_version;
>> +                WARN_ON_ONCE(ret > 0 && assert_ver == 0);
>> +
>> +                ceph_osdc_put_request(req);
>> +                if (ret < 0) {
>> +                    ceph_release_page_vector(pages, num_pages);
>
> Shouldn't the pages already have been released in "ceph_osdc_put_request()"?
>
> IMO you should put the request when you are breaking the while loop 
> and just before the next "ceph_osdc_new_request()" below.
>
>
Okay, I missed the "own_pages" parameter. It is false for the read request,
so the caller is responsible for releasing the pages.

But you still need to call "ceph_release_page_vector()" when
"ceph_fscrypt_decrypt_block_inplace()" fails below.
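
To illustrate the ownership rule, here is a small userspace model. It assumes
that putting a request frees the page vector only when the ownership flag (the
last argument of osd_req_op_extent_osd_data_pages()) was true. The types and
helpers below (struct page_vec, struct request, alloc_vec(), release_vec(),
put_request(), read_and_decrypt()) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the page vector and the OSD request. */
struct page_vec { int npages; };
struct request { struct page_vec *pages; bool own_pages; };

static int live_vecs;	/* vectors allocated but not yet released */

static struct page_vec *alloc_vec(int n)
{
	struct page_vec *v = malloc(sizeof(*v));

	if (!v)
		return NULL;
	v->npages = n;
	live_vecs++;
	return v;
}

static void release_vec(struct page_vec *v)
{
	assert(live_vecs > 0);
	free(v);
	live_vecs--;
}

/* Models the put: pages are freed only if the request owns them. */
static void put_request(struct request *req)
{
	if (req->own_pages && req->pages)
		release_vec(req->pages);
	req->pages = NULL;
}

/* Models the read half of the rmw cycle; decrypt_ret fakes the decrypt result. */
static int read_and_decrypt(int decrypt_ret)
{
	struct page_vec *pages = alloc_vec(2);
	struct request req;

	if (!pages)
		return -ENOMEM;

	/* Read request: own_pages is false, so the put leaves pages alone. */
	req.pages = pages;
	req.own_pages = false;
	put_request(&req);

	if (decrypt_ret < 0) {
		/* Without this release the vector leaks on the error path. */
		release_vec(pages);
		return decrypt_ret;
	}

	/* Write request: own_pages is true, so the put frees the pages. */
	req.pages = pages;
	req.own_pages = true;
	put_request(&req);
	return 0;
}
```

In this model the live_vecs counter returns to zero on both the failing and the
successful path, which is the property every early "break" after the
allocation has to preserve.
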


>
>> +                    break;
>> +                }
>> +
>> +                if (first) {
>> +                    ret = ceph_fscrypt_decrypt_block_inplace(inode,
>> +                            pages[0],
>> +                            CEPH_FSCRYPT_BLOCK_SIZE,
>> +                            offset_in_page(first_pos),
>> +                            first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
>> +                    if (ret < 0)
>> +                        break;
>> +                }
>> +                if (last) {
>> +                    ret = ceph_fscrypt_decrypt_block_inplace(inode,
>> +                            pages[num_pages - 1],
>> +                            CEPH_FSCRYPT_BLOCK_SIZE,
>> +                            offset_in_page(last_pos),
>> +                            last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
>> +                    if (ret < 0)
>> +                        break;
>> +                }
>> +            }
>>           }
>>             left = len;
>> -        off = pos & ~CEPH_FSCRYPT_BLOCK_MASK;
>> +        off = offset_in_page(pos);
>>           for (n = 0; n < num_pages; n++) {
>> -            size_t plen = min_t(size_t, left, CEPH_FSCRYPT_BLOCK_SIZE - off);
>> +            size_t plen = min_t(size_t, left, PAGE_SIZE - off);
>> +
>> +            /* copy the data */
>>               ret = copy_page_from_iter(pages[n], off, plen, from);
>> -            off = 0;
>>               if (ret != plen) {
>>                   ret = -EFAULT;
>>                   break;
>>               }
>> +            off = 0;
>>               left -= ret;
>>           }
>> -
>>           if (ret < 0) {
>> +            dout("sync_write write failed with %d\n", ret);
>>               ceph_release_page_vector(pages, num_pages);
>> -            goto out;
>> +            break;
>>           }
>>   -        req->r_inode = inode;
>> +        if (IS_ENCRYPTED(inode)) {
>> +            ret = ceph_fscrypt_encrypt_pages(inode, pages,
>> +                             write_pos, write_len,
>> +                             GFP_KERNEL);
>> +            if (ret < 0) {
>> +                dout("encryption failed with %d\n", ret);

Don't the pages need to be released here as well?


>> +                break;
>> +            }
>> +        }
>>   -        osd_req_op_extent_osd_data_pages(req, 0, pages, len,
>> -                         pos & ~CEPH_FSCRYPT_BLOCK_MASK,
>> -                         false, true);
>
> The pages have already been released, you need to allocate new pages 
> again here.
>
>> +        req = ceph_osdc_new_request(osdc, &ci->i_layout,
>> +                        ci->i_vino, write_pos, &write_len,
>> +                        rmw ? 1 : 0, rmw ? 2 : 1,
>> +                        CEPH_OSD_OP_WRITE,
>> +                        CEPH_OSD_FLAG_WRITE,
>> +                        snapc, ci->i_truncate_seq,
>> +                        ci->i_truncate_size, false);
>> +        if (IS_ERR(req)) {
>> +            ret = PTR_ERR(req);
>> +            ceph_release_page_vector(pages, num_pages);
>> +            break;
>> +        }
>>   +        dout("sync_write write op %lld~%llu\n", write_pos, write_len);
>> +        osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len,
>> +                         offset_in_page(write_pos), false,
>> +                         true);
>> +        req->r_inode = inode;
>>           req->r_mtime = mtime;
>> -        ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>> +
>> +        /* Set up the assertion */
>> +        if (rmw) {
>> +            /*
>> +             * Set up the assertion. If we don't have a version number,
>> +             * then the object doesn't exist yet. Use an exclusive create
>> +             * instead of a version assertion in that case.
>> +             */
>> +            if (assert_ver) {
>> +                osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0);
>> +                req->r_ops[0].assert_ver.ver = assert_ver;
>> +            } else {
>> +                osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE,
>> +                        CEPH_OSD_OP_FLAG_EXCL);
>> +            }
>> +        }
>> +
>> +        ret = ceph_osdc_start_request(osdc, req, false);
>>           if (!ret)
>> -            ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>> +            ret = ceph_osdc_wait_request(osdc, req);
>>           ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
>>                         req->r_end_latency, len, ret);
>> -out:
>>           ceph_osdc_put_request(req);
>>           if (ret != 0) {
>> +            dout("sync_write osd write returned %d\n", ret);
>> +            /* Version changed! Must re-do the rmw cycle */
>> +            if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) ||
>> +                 (!assert_ver && ret == -EEXIST)) {
>> +                /* We should only ever see this on a rmw */
>> +                WARN_ON_ONCE(!rmw);
>> +
>> +                /* The version should never go backward */
>> +                WARN_ON_ONCE(ret == -EOVERFLOW);
>> +
>> +                *from = saved_iter;
>> +
>> +                /* FIXME: limit number of times we loop? */
>> +                continue;
>> +            }
>>               ceph_set_error_write(ci);
>>               break;
>>           }
>> -
>>           ceph_clear_error_write(ci);
>>           pos += len;
>>           written += len;
>> @@ -1580,6 +1775,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>           ret = written;
>>           iocb->ki_pos = pos;
>>       }
>> +    dout("sync_write returning %d\n", ret);
>>       return ret;
>>   }



* Re: [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads
  2022-01-11 19:16 ` [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads Jeff Layton
@ 2022-01-19  5:18   ` Xiubo Li
  2022-01-19 18:49     ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-01-19  5:18 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 1/12/22 3:16 AM, Jeff Layton wrote:
> Note that the crypto block may be smaller than a page, but the reverse
> cannot be true.
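
As a reading aid, the length bookkeeping this patch does after decrypting a
block-aligned read can be modelled in userspace as below. The usable_bytes()
helper and its parameter names are assumptions made for this sketch; it only
mirrors the arithmetic (drop the partial block before the requested offset,
clamp at zero, cap at the requested length) and does no decryption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * off/len:   range the caller asked for
 * read_off:  block-aligned offset that was actually read (read_off <= off)
 * got:       number of bytes returned by the aligned read
 *
 * Returns how many of those bytes belong to the caller's range.
 */
static int64_t usable_bytes(uint64_t off, uint64_t len,
			    uint64_t read_off, int64_t got)
{
	int64_t usable;

	assert(read_off <= off);

	/* Account for any partial block at the beginning. */
	usable = got - (int64_t)(off - read_off);

	/* Short read after a big offset adjustment: nothing is usable. */
	if (usable < 0)
		usable = 0;

	/* Account for the partial block at the end. */
	if ((uint64_t)usable > len)
		usable = (int64_t)len;

	return usable;
}
```

For example, a request for 100~50 that is widened to a full 4096-byte block
read at offset 0 yields 50 usable bytes, and a short 500-byte read at 4096 for
a request starting at 5000 yields none.
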
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/file.c | 94 ++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 69 insertions(+), 25 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 41766b2012e9..b4f2fcd33837 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -926,9 +926,17 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   		bool more;
>   		int idx;
>   		size_t left;
> +		u64 read_off = off;
> +		u64 read_len = len;
> +
> +		/* determine new offset/length if encrypted */
> +		fscrypt_adjust_off_and_len(inode, &read_off, &read_len);
> +
> +		dout("sync_read orig %llu~%llu reading %llu~%llu",
> +		     off, len, read_off, read_len);
>   
>   		req = ceph_osdc_new_request(osdc, &ci->i_layout,
> -					ci->i_vino, off, &len, 0, 1,
> +					ci->i_vino, read_off, &read_len, 0, 1,
>   					CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
>   					NULL, ci->i_truncate_seq,
>   					ci->i_truncate_size, false);
> @@ -937,10 +945,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   			break;
>   		}
>   
> +		/* adjust len downward if the request truncated the len */
> +		if (off + len > read_off + read_len)
> +			len = read_off + read_len - off;
>   		more = len < iov_iter_count(to);
>   
> -		num_pages = calc_pages_for(off, len);
> -		page_off = off & ~PAGE_MASK;
> +		num_pages = calc_pages_for(read_off, read_len);
> +		page_off = offset_in_page(off);
>   		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
>   		if (IS_ERR(pages)) {
>   			ceph_osdc_put_request(req);
> @@ -948,7 +959,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   			break;
>   		}
>   
> -		osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off,
> +		osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
> +						 offset_in_page(read_off),
>   						 false, false);
>   		ret = ceph_osdc_start_request(osdc, req, false);
>   		if (!ret)
> @@ -957,23 +969,50 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   		ceph_update_read_metrics(&fsc->mdsc->metric,
>   					 req->r_start_latency,
>   					 req->r_end_latency,
> -					 len, ret);
> +					 read_len, ret);
>   
>   		if (ret > 0)
>   			objver = req->r_version;
>   		ceph_osdc_put_request(req);
> -
>   		i_size = i_size_read(inode);
>   		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
>   		     off, len, ret, i_size, (more ? " MORE" : ""));
>   
> -		if (ret == -ENOENT)
> +		if (ret == -ENOENT) {
> +			/* No object? Then this is a hole */
>   			ret = 0;
> +		} else if (ret > 0 && IS_ENCRYPTED(inode)) {
> +			int fret;
> +
> +			fret = ceph_fscrypt_decrypt_pages(inode, pages, read_off, ret);
> +			if (fret < 0) {
> +				ceph_release_page_vector(pages, num_pages);
> +				ret = fret;
> +				break;
> +			}
> +
> +			dout("sync_read decrypted fret %d\n", fret);
> +
> +			/* account for any partial block at the beginning */
> +			fret -= (off - read_off);
> +
> +			/*
> +			 * Short read after big offset adjustment?
> +			 * Nothing is usable, just call it a zero
> +			 * len read.
> +			 */
> +			fret = max(fret, 0);
> +
> +			/* account for partial block at the end */
> +			ret = min_t(ssize_t, fret, len);
> +		}
> +
> +		/* Short read but not EOF? Zero out the remainder. */
>   		if (ret >= 0 && ret < len && (off + ret < i_size)) {
>   			int zlen = min(len - ret, i_size - off - ret);
>   			int zoff = page_off + ret;
>   			dout("sync_read zero gap %llu~%llu\n",
> -                             off + ret, off + ret + zlen);
> +			     off + ret, off + ret + zlen);
>   			ceph_zero_page_vector_range(zoff, zlen, pages);
>   			ret += zlen;
>   		}
> @@ -981,15 +1020,15 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   		idx = 0;
>   		left = ret > 0 ? ret : 0;
>   		while (left > 0) {
> -			size_t len, copied;
> -			page_off = off & ~PAGE_MASK;
> -			len = min_t(size_t, left, PAGE_SIZE - page_off);
> +			size_t plen, copied;
> +			plen = min_t(size_t, left, PAGE_SIZE - page_off);
>   			SetPageUptodate(pages[idx]);
>   			copied = copy_page_to_iter(pages[idx++],
> -						   page_off, len, to);
> +						   page_off, plen, to);
>   			off += copied;
>   			left -= copied;
> -			if (copied < len) {
> +			page_off = 0;
> +			if (copied < plen) {
>   				ret = -EFAULT;
>   				break;
>   			}
> @@ -1006,20 +1045,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
>   			break;
>   	}
>   
> -	if (off > *ki_pos) {
> -		if (off >= i_size) {
> -			*retry_op = CHECK_EOF;
> -			ret = i_size - *ki_pos;
> -			*ki_pos = i_size;
> -		} else {
> -			ret = off - *ki_pos;
> -			*ki_pos = off;
> +	if (ret > 0) {
> +		if (off > *ki_pos) {
> +			if (off >= i_size) {
> +				*retry_op = CHECK_EOF;
> +				ret = i_size - *ki_pos;
> +				*ki_pos = i_size;
> +			} else {
> +				ret = off - *ki_pos;
> +				*ki_pos = off;
> +			}
>   		}
> -	}
> -
> -	if (last_objver && ret > 0)
> -		*last_objver = objver;
>   
> +		if (last_objver)
> +			*last_objver = objver;
> +	}
>   	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
>   	return ret;
>   }
> @@ -1532,6 +1572,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   		last = (pos + len) != (write_pos + write_len);
>   		rmw = first || last;
>   
> +		dout("sync_write ino %llx %lld~%llu adjusted %lld~%llu -- %srmw\n",
> +		     ci->i_vino.ino, pos, len, write_pos, write_len, rmw ? "" : "no ");
> +

Should this dout() be moved to the previous patch?


>   		/*
>   		 * The data is emplaced into the page as it would be if it were in
>   		 * an array of pagecache pages.
> @@ -1761,6 +1804,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   		ceph_clear_error_write(ci);
>   		pos += len;
>   		written += len;
> +		dout("sync_write written %d\n", written);
>   		if (pos > i_size_read(inode)) {
>   			check_caps = ceph_inode_set_size(inode, pos);
>   			if (check_caps)



* Re: [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write
  2022-01-19  5:08     ` Xiubo Li
@ 2022-01-19 11:06       ` Jeff Layton
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-19 11:06 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Wed, 2022-01-19 at 13:08 +0800, Xiubo Li wrote:
> On 1/19/22 11:21 AM, Xiubo Li wrote:
> > 
> > On 1/12/22 3:16 AM, Jeff Layton wrote:
> > > When doing a synchronous write on an encrypted inode, we have no
> > > guarantee that the caller is writing crypto block-aligned data. When
> > > that happens, we must do a read/modify/write cycle.
> > > 
> > > First, expand the range to cover complete blocks. If we had to change
> > > the original pos or length, issue a read to fill the first and/or last
> > > pages, and fetch the version of the object from the result.
> > > 
> > > We then copy data into the pages as usual, encrypt the result and issue
> > > > a write prefixed by an assertion that the version hasn't changed. If it has
> > > > changed then we restart the whole thing again.
> > > 
> > > If there is no object at that position in the file (-ENOENT), we prefix
> > > the write on an exclusive create of the object instead.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >   fs/ceph/file.c | 260 +++++++++++++++++++++++++++++++++++++++++++------
> > >   1 file changed, 228 insertions(+), 32 deletions(-)
> > > 
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index a6305ad5519b..41766b2012e9 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1468,18 +1468,16 @@ ceph_sync_write(struct kiocb *iocb, struct 
> > > iov_iter *from, loff_t pos,
> > >       struct inode *inode = file_inode(file);
> > >       struct ceph_inode_info *ci = ceph_inode(inode);
> > >       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > > -    struct ceph_vino vino;
> > > +    struct ceph_osd_client *osdc = &fsc->client->osdc;
> > >       struct ceph_osd_request *req;
> > >       struct page **pages;
> > >       u64 len;
> > >       int num_pages;
> > >       int written = 0;
> > > -    int flags;
> > >       int ret;
> > >       bool check_caps = false;
> > >       struct timespec64 mtime = current_time(inode);
> > >       size_t count = iov_iter_count(from);
> > > -    size_t off;
> > >         if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
> > >           return -EROFS;
> > > @@ -1499,70 +1497,267 @@ ceph_sync_write(struct kiocb *iocb, struct 
> > > iov_iter *from, loff_t pos,
> > >       if (ret < 0)
> > >           dout("invalidate_inode_pages2_range returned %d\n", ret);
> > >   -    flags = /* CEPH_OSD_FLAG_ORDERSNAP | */ CEPH_OSD_FLAG_WRITE;
> > > -
> > >       while ((len = iov_iter_count(from)) > 0) {
> > >           size_t left;
> > >           int n;
> > > +        u64 write_pos = pos;
> > > +        u64 write_len = len;
> > > +        u64 objnum, objoff;
> > > +        u32 xlen;
> > > +        u64 assert_ver;
> > > +        bool rmw;
> > > +        bool first, last;
> > > +        struct iov_iter saved_iter = *from;
> > > +        size_t off;
> > > +
> > > +        fscrypt_adjust_off_and_len(inode, &write_pos, &write_len);
> > > +
> > > +        /* clamp the length to the end of first object */
> > > +        ceph_calc_file_object_mapping(&ci->i_layout, write_pos,
> > > +                        write_len, &objnum, &objoff,
> > > +                        &xlen);
> > > +        write_len = xlen;
> > > +
> > > +        /* adjust len downward if it goes beyond current object */
> > > +        if (pos + len > write_pos + write_len)
> > > +            len = write_pos + write_len - pos;
> > >   -        vino = ceph_vino(inode);
> > > -        req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
> > > -                        vino, pos, &len, 0, 1,
> > > -                        CEPH_OSD_OP_WRITE, flags, snapc,
> > > -                        ci->i_truncate_seq,
> > > -                        ci->i_truncate_size,
> > > -                        false);
> > > -        if (IS_ERR(req)) {
> > > -            ret = PTR_ERR(req);
> > > -            break;
> > > -        }
> > > +        /*
> > > +         * If we had to adjust the length or position to align with a
> > > +         * crypto block, then we must do a read/modify/write cycle. We
> > > +         * use a version assertion to redrive the thing if something
> > > +         * changes in between.
> > > +         */
> > > +        first = pos != write_pos;
> > > +        last = (pos + len) != (write_pos + write_len);
> > > +        rmw = first || last;
> > >   -        /* FIXME: express in FSCRYPT_BLOCK_SIZE units */
> > > -        num_pages = calc_pages_for(pos, len);
> > > +        /*
> > > > +         * The data is emplaced into the page as it would be if it were in
> > > +         * an array of pagecache pages.
> > > +         */
> > > +        num_pages = calc_pages_for(write_pos, write_len);
> > >           pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
> > >           if (IS_ERR(pages)) {
> > >               ret = PTR_ERR(pages);
> > > -            goto out;
> > > +            break;
> > > +        }
> > > +
> > > +        /* Do we need to preload the pages? */
> > > +        if (rmw) {
> > > +            u64 first_pos = write_pos;
> > > +            u64 last_pos = (write_pos + write_len) - CEPH_FSCRYPT_BLOCK_SIZE;
> > > +            u64 read_len = CEPH_FSCRYPT_BLOCK_SIZE;
> > > +
> > > +            /* We should only need to do this for encrypted inodes */
> > > +            WARN_ON_ONCE(!IS_ENCRYPTED(inode));
> > > +
> > > +            /* No need to do two reads if first and last blocks are same */
> > > +            if (first && last_pos == first_pos)
> > > +                last = false;
> > > +
> > > +            /*
> > > +             * Allocate a read request for one or two extents, depending
> > > +             * on how the request was aligned.
> > > +             */
> > > +            req = ceph_osdc_new_request(osdc, &ci->i_layout,
> > > +                    ci->i_vino, first ? first_pos : last_pos,
> > > +                    &read_len, 0, (first && last) ? 2 : 1,
> > > +                    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> > > +                    NULL, ci->i_truncate_seq,
> > > +                    ci->i_truncate_size, false);
> > > +            if (IS_ERR(req)) {
> > > +                ceph_release_page_vector(pages, num_pages);
> > > +                ret = PTR_ERR(req);
> > > +                break;
> > > +            }
> > > +
> > > +            /* Something is misaligned! */
> > > +            if (read_len != CEPH_FSCRYPT_BLOCK_SIZE) {
> > > +                ret = -EIO;
> > > +                break;
> > > +            }
> > 
> > Do we need to call "ceph_release_page_vector()" here ?
> > 
> > 
> > 
> > > +
> > > +            /* Add extent for first block? */
> > > +            if (first)
> > > +                osd_req_op_extent_osd_data_pages(req, 0, pages,
> > > +                             CEPH_FSCRYPT_BLOCK_SIZE,
> > > +                             offset_in_page(first_pos),
> > > +                             false, false);
> > > +
> > > +            /* Add extent for last block */
> > > +            if (last) {
> > > +                /* Init the other extent if first extent has been used */
> > > +                if (first) {
> > > +                    osd_req_op_extent_init(req, 1, CEPH_OSD_OP_READ,
> > > +                            last_pos, CEPH_FSCRYPT_BLOCK_SIZE,
> > > +                            ci->i_truncate_size,
> > > +                            ci->i_truncate_seq);
> > > +                }
> > > +
> > > +                osd_req_op_extent_osd_data_pages(req, first ? 1 : 0,
> > > +                            &pages[num_pages - 1],
> > > +                            CEPH_FSCRYPT_BLOCK_SIZE,
> > > +                            offset_in_page(last_pos),
> > > +                            false, false);
> > > +            }
> > > +
> > > +            ret = ceph_osdc_start_request(osdc, req, false);
> > > +            if (!ret)
> > > +                ret = ceph_osdc_wait_request(osdc, req);
> > > +
> > > +            /* FIXME: length field is wrong if there are 2 extents */
> > > +            ceph_update_read_metrics(&fsc->mdsc->metric,
> > > +                         req->r_start_latency,
> > > +                         req->r_end_latency,
> > > +                         read_len, ret);
> > > +
> > > +            /* Ok if object is not already present */
> > > +            if (ret == -ENOENT) {
> > > +                /*
> > > +                 * If there is no object, then we can't assert
> > > +                 * on its version. Set it to 0, and we'll use an
> > > +                 * exclusive create instead.
> > > +                 */
> > > +                ceph_osdc_put_request(req);
> > > +                assert_ver = 0;
> > > +                ret = 0;
> > > +
> > > +                /*
> > > +                 * zero out the soon-to-be uncopied parts of the
> > > +                 * first and last pages.
> > > +                 */
> > > +                if (first)
> > > +                    zero_user_segment(pages[0], 0,
> > 
> > The pages should already be released in "ceph_osdc_put_request()" ?
> > 
> > 
> > > +                              offset_in_page(first_pos));
> > > +                if (last)
> > > +                    zero_user_segment(pages[num_pages - 1],
> > > +                              offset_in_page(last_pos),
> > > +                              PAGE_SIZE);
> > > +            } else {
> > > +                /* Grab assert version. It must be non-zero. */
> > > +                assert_ver = req->r_version;
> > > +                WARN_ON_ONCE(ret > 0 && assert_ver == 0);
> > > +
> > > +                ceph_osdc_put_request(req);
> > > +                if (ret < 0) {
> > > +                    ceph_release_page_vector(pages, num_pages);
> > 
> > Shouldn't the pages already be released in "ceph_osdc_put_request()"?
> > 
> > IMO you should put the request when you are breaking the while loop 
> > and just before the next "ceph_osdc_new_request()" below.
> > 
> > 
> Okay, I missed the "own_pages" parameter; the caller is responsible for
> releasing the pages.
>
> But you need to call "ceph_release_page_vector()" when
> "ceph_fscrypt_decrypt_block_inplace()" fails below.
> 
> 


Well spotted!

Yeah, own_pages is not set here because we need to continue working with
the pages after the read completes. We do set it for the write, however,
so we don't need to release them ourselves after that.
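
To make the ownership rule concrete, here is a small userspace model of it.
This is only a sketch, not the kernel code: `page_vec`, `request`,
`request_attach()`, `request_put()` and `rmw_model()` are hypothetical
stand-ins, and the release counter exists purely so the behaviour can be
checked. The point it illustrates is that a request created without
own_pages leaves the vector with the caller, so every error exit between the
read and the write has to release it by hand.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page vector. */
struct page_vec {
	void **pages;
	int num_pages;
};

/* Counts releases so the model can be checked; not part of any real API. */
static int vec_release_count;

static struct page_vec *vec_alloc(int num_pages)
{
	struct page_vec *v = malloc(sizeof(*v));

	assert(num_pages > 0);
	if (!v)
		return NULL;
	v->pages = calloc(num_pages, sizeof(*v->pages));
	if (!v->pages) {
		free(v);
		return NULL;
	}
	v->num_pages = num_pages;
	return v;
}

static void vec_release(struct page_vec *v)
{
	free(v->pages);
	free(v);
	vec_release_count++;
}

/* Hypothetical stand-in for an OSD request carrying a data buffer. */
struct request {
	struct page_vec *vec;
	bool own_pages;		/* true: dropping the request releases vec */
};

static void request_attach(struct request *req, struct page_vec *v,
			   bool own_pages)
{
	req->vec = v;
	req->own_pages = own_pages;
}

static void request_put(struct request *req)
{
	if (req->own_pages && req->vec)
		vec_release(req->vec);
	req->vec = NULL;
}

/* Read without ownership, then write with ownership. */
static int rmw_model(bool fail_after_read)
{
	struct request rd = { NULL, false }, wr = { NULL, false };
	struct page_vec *v = vec_alloc(2);

	if (!v)
		return -ENOMEM;

	request_attach(&rd, v, false);
	request_put(&rd);		/* pages survive the read request */

	if (fail_after_read) {
		vec_release(v);		/* caller still owns them here */
		return -EIO;
	}

	request_attach(&wr, v, true);
	request_put(&wr);		/* the write request releases them */
	return 0;
}
```

In this model, `rmw_model(true)` corresponds to a decrypt or encrypt failure
after the read: without the explicit `vec_release()` on that path the vector
would be lost.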

I've fixed up the places you noted (and a couple of others) in my tree.
I'll plan to re-push wip-fscrypt after I do some testing.

Thanks!

> > 
> > > +                    break;
> > > +                }
> > > +
> > > +                if (first) {
> > > +                    ret = ceph_fscrypt_decrypt_block_inplace(inode,
> > > +                            pages[0],
> > > +                            CEPH_FSCRYPT_BLOCK_SIZE,
> > > +                            offset_in_page(first_pos),
> > > +                            first_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
> > > +                    if (ret < 0)
> > > +                        break;
> > > +                }
> > > +                if (last) {
> > > +                    ret = ceph_fscrypt_decrypt_block_inplace(inode,
> > > +                            pages[num_pages - 1],
> > > +                            CEPH_FSCRYPT_BLOCK_SIZE,
> > > +                            offset_in_page(last_pos),
> > > +                            last_pos >> CEPH_FSCRYPT_BLOCK_SHIFT);
> > > +                    if (ret < 0)
> > > +                        break;
> > > +                }
> > > +            }
> > >           }
> > >
> > >           left = len;
> > > -        off = pos & ~CEPH_FSCRYPT_BLOCK_MASK;
> > > +        off = offset_in_page(pos);
> > >           for (n = 0; n < num_pages; n++) {
> > > -            size_t plen = min_t(size_t, left, CEPH_FSCRYPT_BLOCK_SIZE - off);
> > > +            size_t plen = min_t(size_t, left, PAGE_SIZE - off);
> > > +
> > > +            /* copy the data */
> > >               ret = copy_page_from_iter(pages[n], off, plen, from);
> > > -            off = 0;
> > >               if (ret != plen) {
> > >                   ret = -EFAULT;
> > >                   break;
> > >               }
> > > +            off = 0;
> > >               left -= ret;
> > >           }
> > > -
> > >           if (ret < 0) {
> > > +            dout("sync_write write failed with %d\n", ret);
> > >               ceph_release_page_vector(pages, num_pages);
> > > -            goto out;
> > > +            break;
> > >           }
> > >
> > > -        req->r_inode = inode;
> > > +        if (IS_ENCRYPTED(inode)) {
> > > +            ret = ceph_fscrypt_encrypt_pages(inode, pages,
> > > +                             write_pos, write_len,
> > > +                             GFP_KERNEL);
> > > +            if (ret < 0) {
> > > +                dout("encryption failed with %d\n", ret);
> 
> And here ?
> 
> 
> > > +                break;
> > > +            }
> > > +        }
> > >
> > > -        osd_req_op_extent_osd_data_pages(req, 0, pages, len,
> > > -                         pos & ~CEPH_FSCRYPT_BLOCK_MASK,
> > > -                         false, true);
> > 
> > The pages have already been released, you need to allocate new pages 
> > again here.
> > 
> > > +        req = ceph_osdc_new_request(osdc, &ci->i_layout,
> > > +                        ci->i_vino, write_pos, &write_len,
> > > +                        rmw ? 1 : 0, rmw ? 2 : 1,
> > > +                        CEPH_OSD_OP_WRITE,
> > > +                        CEPH_OSD_FLAG_WRITE,
> > > +                        snapc, ci->i_truncate_seq,
> > > +                        ci->i_truncate_size, false);
> > > +        if (IS_ERR(req)) {
> > > +            ret = PTR_ERR(req);
> > > +            ceph_release_page_vector(pages, num_pages);
> > > +            break;
> > > +        }
> > >
> > > +        dout("sync_write write op %lld~%llu\n", write_pos, write_len);
> > > +        osd_req_op_extent_osd_data_pages(req, rmw ? 1 : 0, pages, write_len,
> > > +                         offset_in_page(write_pos), false,
> > > +                         true);
> > > +        req->r_inode = inode;
> > >           req->r_mtime = mtime;
> > > -        ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
> > > +
> > > +        /* Set up the assertion */
> > > +        if (rmw) {
> > > +            /*
> > > +             * Set up the assertion. If we don't have a version number,
> > > +             * then the object doesn't exist yet. Use an exclusive create
> > > +             * instead of a version assertion in that case.
> > > +             */
> > > +            if (assert_ver) {
> > > +                osd_req_op_init(req, 0, CEPH_OSD_OP_ASSERT_VER, 0);
> > > +                req->r_ops[0].assert_ver.ver = assert_ver;
> > > +            } else {
> > > +                osd_req_op_init(req, 0, CEPH_OSD_OP_CREATE,
> > > +                        CEPH_OSD_OP_FLAG_EXCL);
> > > +            }
> > > +        }
> > > +
> > > +        ret = ceph_osdc_start_request(osdc, req, false);
> > >           if (!ret)
> > > -            ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
> > > +            ret = ceph_osdc_wait_request(osdc, req);
> > >           ceph_update_write_metrics(&fsc->mdsc->metric, req->r_start_latency,
> > >                         req->r_end_latency, len, ret);
> > > -out:
> > >           ceph_osdc_put_request(req);
> > >           if (ret != 0) {
> > > +            dout("sync_write osd write returned %d\n", ret);
> > > +            /* Version changed! Must re-do the rmw cycle */
> > > +            if ((assert_ver && (ret == -ERANGE || ret == -EOVERFLOW)) ||
> > > +                 (!assert_ver && ret == -EEXIST)) {
> > > +                /* We should only ever see this on a rmw */
> > > +                WARN_ON_ONCE(!rmw);
> > > +
> > > +                /* The version should never go backward */
> > > +                WARN_ON_ONCE(ret == -EOVERFLOW);
> > > +
> > > +                *from = saved_iter;
> > > +
> > > +                /* FIXME: limit number of times we loop? */
> > > +                continue;
> > > +            }
> > >               ceph_set_error_write(ci);
> > >               break;
> > >           }
> > > -
> > >           ceph_clear_error_write(ci);
> > >           pos += len;
> > >           written += len;
> > > @@ -1580,6 +1775,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > >           ret = written;
> > >           iocb->ki_pos = pos;
> > >       }
> > > +    dout("sync_write returning %d\n", ret);
> > >       return ret;
> > >   }
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads
  2022-01-19  5:18   ` Xiubo Li
@ 2022-01-19 18:49     ` Jeff Layton
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-19 18:49 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Wed, 2022-01-19 at 13:18 +0800, Xiubo Li wrote:
> On 1/12/22 3:16 AM, Jeff Layton wrote:
> > Note that the crypto block may be smaller than a page, but the reverse
> > cannot be true.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/ceph/file.c | 94 ++++++++++++++++++++++++++++++++++++--------------
> >   1 file changed, 69 insertions(+), 25 deletions(-)
> > 
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 41766b2012e9..b4f2fcd33837 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -926,9 +926,17 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   		bool more;
> >   		int idx;
> >   		size_t left;
> > +		u64 read_off = off;
> > +		u64 read_len = len;
> > +
> > +		/* determine new offset/length if encrypted */
> > +		fscrypt_adjust_off_and_len(inode, &read_off, &read_len);
> > +
> > +		dout("sync_read orig %llu~%llu reading %llu~%llu",
> > +		     off, len, read_off, read_len);
> >   
> >   		req = ceph_osdc_new_request(osdc, &ci->i_layout,
> > -					ci->i_vino, off, &len, 0, 1,
> > +					ci->i_vino, read_off, &read_len, 0, 1,
> >   					CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> >   					NULL, ci->i_truncate_seq,
> >   					ci->i_truncate_size, false);
> > @@ -937,10 +945,13 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   			break;
> >   		}
> >   
> > +		/* adjust len downward if the request truncated the len */
> > +		if (off + len > read_off + read_len)
> > +			len = read_off + read_len - off;
> >   		more = len < iov_iter_count(to);
> >   
> > -		num_pages = calc_pages_for(off, len);
> > -		page_off = off & ~PAGE_MASK;
> > +		num_pages = calc_pages_for(read_off, read_len);
> > +		page_off = offset_in_page(off);
> >   		pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
> >   		if (IS_ERR(pages)) {
> >   			ceph_osdc_put_request(req);
> > @@ -948,7 +959,8 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   			break;
> >   		}
> >   
> > -		osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_off,
> > +		osd_req_op_extent_osd_data_pages(req, 0, pages, read_len,
> > +						 offset_in_page(read_off),
> >   						 false, false);
> >   		ret = ceph_osdc_start_request(osdc, req, false);
> >   		if (!ret)
> > @@ -957,23 +969,50 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   		ceph_update_read_metrics(&fsc->mdsc->metric,
> >   					 req->r_start_latency,
> >   					 req->r_end_latency,
> > -					 len, ret);
> > +					 read_len, ret);
> >   
> >   		if (ret > 0)
> >   			objver = req->r_version;
> >   		ceph_osdc_put_request(req);
> > -
> >   		i_size = i_size_read(inode);
> >   		dout("sync_read %llu~%llu got %zd i_size %llu%s\n",
> >   		     off, len, ret, i_size, (more ? " MORE" : ""));
> >   
> > -		if (ret == -ENOENT)
> > +		if (ret == -ENOENT) {
> > +			/* No object? Then this is a hole */
> >   			ret = 0;
> > +		} else if (ret > 0 && IS_ENCRYPTED(inode)) {
> > +			int fret;
> > +
> > +			fret = ceph_fscrypt_decrypt_pages(inode, pages, read_off, ret);
> > +			if (fret < 0) {
> > +				ceph_release_page_vector(pages, num_pages);
> > +				ret = fret;
> > +				break;
> > +			}
> > +
> > +			dout("sync_read decrypted fret %d\n", fret);
> > +
> > +			/* account for any partial block at the beginning */
> > +			fret -= (off - read_off);
> > +
> > +			/*
> > +			 * Short read after big offset adjustment?
> > +			 * Nothing is usable, just call it a zero
> > +			 * len read.
> > +			 */
> > +			fret = max(fret, 0);
> > +
> > +			/* account for partial block at the end */
> > +			ret = min_t(ssize_t, fret, len);
> > +		}
> > +
> > +		/* Short read but not EOF? Zero out the remainder. */
> >   		if (ret >= 0 && ret < len && (off + ret < i_size)) {
> >   			int zlen = min(len - ret, i_size - off - ret);
> >   			int zoff = page_off + ret;
> >   			dout("sync_read zero gap %llu~%llu\n",
> > -                             off + ret, off + ret + zlen);
> > +			     off + ret, off + ret + zlen);
> >   			ceph_zero_page_vector_range(zoff, zlen, pages);
> >   			ret += zlen;
> >   		}
> > @@ -981,15 +1020,15 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   		idx = 0;
> >   		left = ret > 0 ? ret : 0;
> >   		while (left > 0) {
> > -			size_t len, copied;
> > -			page_off = off & ~PAGE_MASK;
> > -			len = min_t(size_t, left, PAGE_SIZE - page_off);
> > +			size_t plen, copied;
> > +			plen = min_t(size_t, left, PAGE_SIZE - page_off);
> >   			SetPageUptodate(pages[idx]);
> >   			copied = copy_page_to_iter(pages[idx++],
> > -						   page_off, len, to);
> > +						   page_off, plen, to);
> >   			off += copied;
> >   			left -= copied;
> > -			if (copied < len) {
> > +			page_off = 0;
> > +			if (copied < plen) {
> >   				ret = -EFAULT;
> >   				break;
> >   			}
> > @@ -1006,20 +1045,21 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
> >   			break;
> >   	}
> >   
> > -	if (off > *ki_pos) {
> > -		if (off >= i_size) {
> > -			*retry_op = CHECK_EOF;
> > -			ret = i_size - *ki_pos;
> > -			*ki_pos = i_size;
> > -		} else {
> > -			ret = off - *ki_pos;
> > -			*ki_pos = off;
> > +	if (ret > 0) {
> > +		if (off > *ki_pos) {
> > +			if (off >= i_size) {
> > +				*retry_op = CHECK_EOF;
> > +				ret = i_size - *ki_pos;
> > +				*ki_pos = i_size;
> > +			} else {
> > +				ret = off - *ki_pos;
> > +				*ki_pos = off;
> > +			}
> >   		}
> > -	}
> > -
> > -	if (last_objver && ret > 0)
> > -		*last_objver = objver;
> >   
> > +		if (last_objver)
> > +			*last_objver = objver;
> > +	}
> >   	dout("sync_read result %zd retry_op %d\n", ret, *retry_op);
> >   	return ret;
> >   }
> > @@ -1532,6 +1572,9 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> >   		last = (pos + len) != (write_pos + write_len);
> >   		rmw = first || last;
> >   
> > +		dout("sync_write ino %llx %lld~%llu adjusted %lld~%llu -- %srmw\n",
> > +		     ci->i_vino.ino, pos, len, write_pos, write_len, rmw ? "" : "no ");
> > +
> 
> Should this move to the previous patch ?
> 
> 

Yes, fixed in wip-fscrypt. Thanks!

> >   		/*
> >   		 * The data is emplaced into the page as it would be if it were in
> >   		 * an array of pagecache pages.
> > @@ -1761,6 +1804,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> >   		ceph_clear_error_write(ci);
> >   		pos += len;
> >   		written += len;
> > +		dout("sync_write written %d\n", written);
> >   		if (pos > i_size_read(inode)) {
> >   			check_caps = ceph_inode_set_size(inode, pos);
> >   			if (check_caps)
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  2022-01-11 19:15 ` [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
@ 2022-01-27  1:58   ` Eric Biggers
  0 siblings, 0 replies; 84+ messages in thread
From: Eric Biggers @ 2022-01-27  1:58 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Tue, Jan 11, 2022 at 02:15:23PM -0500, Jeff Layton wrote:
> For ceph, we want to use our own scheme for handling filenames that are
> longer than NAME_MAX after encryption and Base64 encoding. This
> allows us to have a consistent view of the encrypted filenames for
> clients that don't support fscrypt and clients that do but that don't
> have the key.
> 
> Currently, fs/crypto only supports encrypting filenames using
> fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
> can't use that because it handles nokey names in a different way.
> 
> Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
> __fscrypt_fname_encrypted_size and add a new wrapper called
> fscrypt_fname_encrypted_size that takes an inode argument rather than a
> pointer to a fscrypt_policy union.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Acked-by: Eric Biggers <ebiggers@google.com>

Please make sure to run checkpatch.pl, though.  There is still some weird
indentation in this patch.

- Eric

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (48 preceding siblings ...)
  2022-01-11 19:26 ` [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
@ 2022-01-27  2:14 ` Eric Biggers
  2022-01-27 11:08   ` Jeff Layton
  2022-02-14  9:37 ` Xiubo Li
  2022-02-14 17:57 ` Luís Henriques
  51 siblings, 1 reply; 84+ messages in thread
From: Eric Biggers @ 2022-01-27  2:14 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Tue, Jan 11, 2022 at 02:15:20PM -0500, Jeff Layton wrote:
> Still, I was able to run xfstests on this set yesterday. Bug #2 above
> prevented all of the tests from passing, but it didn't oops! I call that
> progress! Given that, I figured this is a good time to post what I have
> so far.

One question: what sort of testing are you doing to show that the file contents
and filenames being stored (i.e., sent by the client to the server in this case)
have been encrypted correctly?  xfstests has tests that verify this for block
device based filesystems; are you doing any equivalent testing?

- Eric

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-27  2:14 ` Eric Biggers
@ 2022-01-27 11:08   ` Jeff Layton
  2022-01-28 20:39     ` Eric Biggers
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-01-27 11:08 UTC (permalink / raw)
  To: Eric Biggers; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Wed, 2022-01-26 at 18:14 -0800, Eric Biggers wrote:
> On Tue, Jan 11, 2022 at 02:15:20PM -0500, Jeff Layton wrote:
> > Still, I was able to run xfstests on this set yesterday. Bug #2 above
> > prevented all of the tests from passing, but it didn't oops! I call that
> > progress! Given that, I figured this is a good time to post what I have
> > so far.
> 
> One question: what sort of testing are you doing to show that the file contents
> and filenames being stored (i.e., sent by the client to the server in this case)
> have been encrypted correctly?  xfstests has tests that verify this for block
> device based filesystems; are you doing any equivalent testing?
> 

I've been testing this pretty regularly with xfstests, and the filenames
portion all seems to be working correctly. Parts of the content
encryption also seem to work ok. I'm still working on that piece, so I
haven't been able to validate that part yet.

At the moment I'm working on switching the ceph client over to doing
sparse reads, which is necessary in order to be able to handle sparse
writes without filling in unwritten holes.
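
For anyone following along, the idea is roughly the following. This is a
userspace sketch under stated assumptions rather than the kclient code: the
extent list stands in for whatever a sparse read reports as present,
`BLOCK_SHIFT` is assumed to be 12 (4 KiB crypto blocks), and `struct extent`,
`block_has_data()` and `plan_decrypt()` are hypothetical names. Blocks that
fall in a hole are left as zeroes and never handed to the decryption code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SHIFT	12			/* assumed 4 KiB crypto block */
#define BLOCK_SIZE	(1ULL << BLOCK_SHIFT)

/* Hypothetical description of one region that actually holds data. */
struct extent {
	uint64_t off;
	uint64_t len;
};

/* True if the crypto block starting at blk_off lies fully inside an extent. */
static bool block_has_data(const struct extent *ext, size_t n,
			   uint64_t blk_off)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (blk_off >= ext[i].off &&
		    blk_off + BLOCK_SIZE <= ext[i].off + ext[i].len)
			return true;
	}
	return false;
}

/*
 * For the block-aligned range [off, off + len), mark which blocks should be
 * decrypted. Unmarked blocks are holes and stay zero-filled. Returns the
 * number of blocks marked for decryption.
 */
static size_t plan_decrypt(const struct extent *ext, size_t n,
			   uint64_t off, uint64_t len,
			   bool *decrypt, size_t max_blocks)
{
	size_t idx = 0, count = 0;
	uint64_t pos;

	assert(off % BLOCK_SIZE == 0 && len % BLOCK_SIZE == 0);

	for (pos = off; pos < off + len && idx < max_blocks;
	     pos += BLOCK_SIZE, idx++) {
		decrypt[idx] = block_has_data(ext, n, pos);
		if (decrypt[idx])
			count++;
	}
	return count;
}
```

With data at 0~4096 and 8192~8192, a 16 KiB read starting at 0 would decrypt
blocks 0, 2 and 3 and leave block 1 as zeroes.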
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-27 11:08   ` Jeff Layton
@ 2022-01-28 20:39     ` Eric Biggers
  2022-01-28 20:47       ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Eric Biggers @ 2022-01-28 20:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Thu, Jan 27, 2022 at 06:08:40AM -0500, Jeff Layton wrote:
> On Wed, 2022-01-26 at 18:14 -0800, Eric Biggers wrote:
> > On Tue, Jan 11, 2022 at 02:15:20PM -0500, Jeff Layton wrote:
> > > Still, I was able to run xfstests on this set yesterday. Bug #2 above
> > > prevented all of the tests from passing, but it didn't oops! I call that
> > > progress! Given that, I figured this is a good time to post what I have
> > > so far.
> > 
> > One question: what sort of testing are you doing to show that the file contents
> > and filenames being stored (i.e., sent by the client to the server in this case)
> > have been encrypted correctly?  xfstests has tests that verify this for block
> > device based filesystems; are you doing any equivalent testing?
> > 
> 
> I've been testing this pretty regularly with xfstests, and the filenames
> portion all seems to be working correctly. Parts of the content
> encryption also seem to work ok. I'm still working on that piece, so I
> haven't been able to validate that part yet.
> 
> At the moment I'm working on switching the ceph client over to doing
> sparse reads, which is necessary in order to be able to handle sparse
> writes without filling in unwritten holes.

To clarify, I'm asking about the correctness of the ciphertext written to
"disk", not about the user-visible filesystem behavior, which is something
different (but also super important, of course).  xfstests includes both
types of tests.

Grepping for _verify_ciphertext_for_encryption_policy in xfstests will show the
tests that verify the ciphertext written to disk.  I doubt that you're running
those, as they rely on a block device.  So you'll need to write some equivalent
tests.  In a pinch, you could simply check that the ciphertext is random rather
than correct (that would at least show that it's not plaintext) like what
generic/399 does.  But actually verifying its correctness would be ideal to
ensure that nothing went wrong along the way.
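
As a very rough illustration of the weaker "random rather than plaintext"
check, something like the sketch below could be run over the raw object data.
Note the assumptions: this is a plain byte-histogram chi-squared test, not
what generic/399 itself does; `byte_chi_squared()` and `looks_random()` are
made-up names; and the threshold is left to the caller because any cutoff
would have to be tuned. A flat histogram is necessary but not sufficient (a
simple counter pattern passes too), so it can only rule out obvious plaintext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Chi-squared statistic of the byte histogram of buf against a uniform
 * distribution over the 256 byte values. Small values mean a flat histogram.
 */
static double byte_chi_squared(const uint8_t *buf, size_t len)
{
	size_t hist[256] = { 0 };
	double expected, chi = 0.0;
	size_t i;

	assert(len > 0);

	for (i = 0; i < len; i++)
		hist[buf[i]]++;

	expected = (double)len / 256.0;
	for (i = 0; i < 256; i++) {
		double d = (double)hist[i] - expected;

		chi += d * d / expected;
	}
	return chi;
}

/* Nonzero if the histogram is flat enough for the caller-chosen threshold. */
static int looks_random(const uint8_t *buf, size_t len, double threshold)
{
	return byte_chi_squared(buf, len) < threshold;
}
```

Verifying the ciphertext against an independent implementation remains the
stronger test; this only catches data that was never encrypted at all.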

- Eric

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-28 20:39     ` Eric Biggers
@ 2022-01-28 20:47       ` Jeff Layton
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-01-28 20:47 UTC (permalink / raw)
  To: Eric Biggers; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Fri, 2022-01-28 at 12:39 -0800, Eric Biggers wrote:
> On Thu, Jan 27, 2022 at 06:08:40AM -0500, Jeff Layton wrote:
> > On Wed, 2022-01-26 at 18:14 -0800, Eric Biggers wrote:
> > > On Tue, Jan 11, 2022 at 02:15:20PM -0500, Jeff Layton wrote:
> > > > Still, I was able to run xfstests on this set yesterday. Bug #2 above
> > > > prevented all of the tests from passing, but it didn't oops! I call that
> > > > progress! Given that, I figured this is a good time to post what I have
> > > > so far.
> > > 
> > > One question: what sort of testing are you doing to show that the file contents
> > > and filenames being stored (i.e., sent by the client to the server in this case)
> > > have been encrypted correctly?  xfstests has tests that verify this for block
> > > device based filesystems; are you doing any equivalent testing?
> > > 
> > 
> > I've been testing this pretty regularly with xfstests, and the filenames
> > portion all seems to be working correctly. Parts of the content
> > encryption also seem to work ok. I'm still working on that piece, so I
> > haven't been able to validate that part yet.
> > 
> > At the moment I'm working on switching the ceph client over to doing
> > sparse reads, which is necessary in order to be able to handle sparse
> > writes without filling in unwritten holes.
> 
> To clarify, I'm asking about the correctness of the ciphertext written to
> "disk", not about the user-visible filesystem behavior, which is something
> different (but also super important, of course).  xfstests includes both
> types of tests.
> 
> Grepping for _verify_ciphertext_for_encryption_policy in xfstests will show the
> tests that verify the ciphertext written to disk.  I doubt that you're running
> those, as they rely on a block device.  So you'll need to write some equivalent
> tests.  In a pinch, you could simply check that the ciphertext is random rather
> than correct (that would at least show that it's not plaintext) like what
> generic/399 does.  But actually verifying its correctness would be ideal to
> ensure that nothing went wrong along the way.
> 

Got it. Yes, that would be a good thing. I'll have to see what I can do
once I get to the point of a fully-functioning prototype.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option
  2022-01-11 19:15 ` [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option Jeff Layton
@ 2022-02-11 13:50   ` Luís Henriques
  2022-02-11 14:52     ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Luís Henriques @ 2022-02-11 13:50 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

Jeff Layton <jlayton@kernel.org> writes:

> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/crypto.c | 53 ++++++++++++++++++++++++++++++++
>  fs/ceph/crypto.h | 26 ++++++++++++++++
>  fs/ceph/inode.c  | 10 ++++--
>  fs/ceph/super.c  | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/ceph/super.h  | 12 +++++++-
>  fs/ceph/xattr.c  |  3 ++
>  6 files changed, 177 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index a513ff373b13..017f31eacb74 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -4,6 +4,7 @@
>  #include <linux/fscrypt.h>
>  
>  #include "super.h"
> +#include "mds_client.h"
>  #include "crypto.h"
>  
>  static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
> @@ -64,9 +65,20 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
>  	return ci->i_rsubdirs + ci->i_rfiles == 1;
>  }
>  
> +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
> +{
> +	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
> +}
> +
> +static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
> +{
> +	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
> +}
> +
>  static struct fscrypt_operations ceph_fscrypt_ops = {
>  	.get_context		= ceph_crypt_get_context,
>  	.set_context		= ceph_crypt_set_context,
> +	.get_dummy_policy	= ceph_get_dummy_policy,
>  	.empty_dir		= ceph_crypt_empty_dir,
>  };
>  
> @@ -74,3 +86,44 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
>  {
>  	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
>  }
> +
> +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
> +				 struct ceph_acl_sec_ctx *as)
> +{
> +	int ret, ctxsize;
> +	bool encrypted = false;
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +
> +	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
> +	if (ret)
> +		return ret;
> +	if (!encrypted)
> +		return 0;
> +
> +	as->fscrypt_auth = kzalloc(sizeof(*as->fscrypt_auth), GFP_KERNEL);
> +	if (!as->fscrypt_auth)
> +		return -ENOMEM;
> +

Isn't this memory allocation leaking below in the error paths?

(Yeah, I'm finally (but slowly) catching up with this series... my memory
is blurry and there are a lot of things I forgot...)
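
Whether it really leaks depends on who frees as->fscrypt_auth when the
function bails out. The userspace model below sketches the case where it does
not leak: the buffer stays attached to the context on error and the caller
releases the context on every exit. All names here (`sec_ctx`,
`prepare_context()`, `release_ctx()`, `create_model()`) are hypothetical, and
I have not checked that the callers in this series actually behave this way;
if they don't, the error paths need their own kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts outstanding allocations so the model can be checked. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-in for the security context passed to the helper. */
struct sec_ctx {
	void *auth;
};

/* Allocates into the context; on a later failure the buffer stays attached. */
static int prepare_context(struct sec_ctx *as, int fail_midway)
{
	as->auth = tracked_alloc(64);
	if (!as->auth)
		return -ENOMEM;

	if (fail_midway)
		return -EINVAL;	/* no local free: the context still owns it */

	return 0;
}

/* Caller-side release, run on success and failure alike. */
static void release_ctx(struct sec_ctx *as)
{
	tracked_free(as->auth);
	as->auth = NULL;
}

static int create_model(int fail_midway)
{
	struct sec_ctx as = { NULL };
	int ret = prepare_context(&as, fail_midway);

	release_ctx(&as);
	return ret;
}
```

If the release step were skipped on the failure path, `live_allocs` would stay
at 1 after `create_model(1)`, which is the leak being asked about.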

Cheers,
-- 
Luís

> +	ctxsize = fscrypt_context_for_new_inode(as->fscrypt_auth->cfa_blob, inode);
> +	if (ctxsize < 0)
> +		return ctxsize;
> +
> +	as->fscrypt_auth->cfa_version = cpu_to_le32(CEPH_FSCRYPT_AUTH_VERSION);
> +	as->fscrypt_auth->cfa_blob_len = cpu_to_le32(ctxsize);
> +
> +	WARN_ON_ONCE(ci->fscrypt_auth);
> +	kfree(ci->fscrypt_auth);
> +	ci->fscrypt_auth_len = ceph_fscrypt_auth_len(as->fscrypt_auth);
> +	ci->fscrypt_auth = kmemdup(as->fscrypt_auth, ci->fscrypt_auth_len, GFP_KERNEL);
> +	if (!ci->fscrypt_auth)
> +		return -ENOMEM;
> +
> +	inode->i_flags |= S_ENCRYPTED;
> +
> +	return 0;
> +}
> +
> +void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as)
> +{
> +	swap(req->r_fscrypt_auth, as->fscrypt_auth);
> +}
> diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
> index 6dca674f79b8..cb00fe42d5b7 100644
> --- a/fs/ceph/crypto.h
> +++ b/fs/ceph/crypto.h
> @@ -8,6 +8,10 @@
>  
>  #include <linux/fscrypt.h>
>  
> +struct ceph_fs_client;
> +struct ceph_acl_sec_ctx;
> +struct ceph_mds_request;
> +
>  struct ceph_fscrypt_auth {
>  	__le32	cfa_version;
>  	__le32	cfa_blob_len;
> @@ -25,12 +29,34 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa)
>  #ifdef CONFIG_FS_ENCRYPTION
>  void ceph_fscrypt_set_ops(struct super_block *sb);
>  
> +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc);
> +
> +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
> +				 struct ceph_acl_sec_ctx *as);
> +void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as);
> +
>  #else /* CONFIG_FS_ENCRYPTION */
>  
>  static inline void ceph_fscrypt_set_ops(struct super_block *sb)
>  {
>  }
>  
> +static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
> +{
> +}
> +
> +static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
> +						struct ceph_acl_sec_ctx *as)
> +{
> +	if (IS_ENCRYPTED(dir))
> +		return -EOPNOTSUPP;
> +	return 0;
> +}
> +
> +static inline void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req,
> +						struct ceph_acl_sec_ctx *as_ctx)
> +{
> +}
>  #endif /* CONFIG_FS_ENCRYPTION */
>  
>  #endif
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index c6653f83b6f0..55e23e2601df 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -83,12 +83,17 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
>  			goto out_err;
>  	}
>  
> +	inode->i_state = 0;
> +	inode->i_mode = *mode;
> +
>  	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
>  	if (err < 0)
>  		goto out_err;
>  
> -	inode->i_state = 0;
> -	inode->i_mode = *mode;
> +	err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
> +	if (err)
> +		goto out_err;
> +
>  	return inode;
>  out_err:
>  	iput(inode);
> @@ -101,6 +106,7 @@ void ceph_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *a
>  		req->r_pagelist = as_ctx->pagelist;
>  		as_ctx->pagelist = NULL;
>  	}
> +	ceph_fscrypt_as_ctx_to_req(req, as_ctx);
>  }
>  
>  /**
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index fbdf434b4618..0b32d31c6fe0 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -45,6 +45,7 @@ static void ceph_put_super(struct super_block *s)
>  	struct ceph_fs_client *fsc = ceph_sb_to_client(s);
>  
>  	dout("put_super\n");
> +	ceph_fscrypt_free_dummy_policy(fsc);
>  	ceph_mdsc_close_sessions(fsc->mdsc);
>  }
>  
> @@ -162,6 +163,7 @@ enum {
>  	Opt_copyfrom,
>  	Opt_wsync,
>  	Opt_pagecache,
> +	Opt_test_dummy_encryption,
>  };
>  
>  enum ceph_recover_session_mode {
> @@ -189,6 +191,7 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
>  	fsparam_string	("fsc",				Opt_fscache), // fsc=...
>  	fsparam_flag_no ("ino32",			Opt_ino32),
>  	fsparam_string	("mds_namespace",		Opt_mds_namespace),
> +	fsparam_string	("mon_addr",			Opt_mon_addr),
>  	fsparam_flag_no ("poolperm",			Opt_poolperm),
>  	fsparam_flag_no ("quotadf",			Opt_quotadf),
>  	fsparam_u32	("rasize",			Opt_rasize),
> @@ -200,7 +203,8 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
>  	fsparam_u32	("rsize",			Opt_rsize),
>  	fsparam_string	("snapdirname",			Opt_snapdirname),
>  	fsparam_string	("source",			Opt_source),
> -	fsparam_string	("mon_addr",			Opt_mon_addr),
> +	fsparam_flag	("test_dummy_encryption",	Opt_test_dummy_encryption),
> +	fsparam_string	("test_dummy_encryption",	Opt_test_dummy_encryption),
>  	fsparam_u32	("wsize",			Opt_wsize),
>  	fsparam_flag_no	("wsync",			Opt_wsync),
>  	fsparam_flag_no	("pagecache",			Opt_pagecache),
> @@ -576,6 +580,16 @@ static int ceph_parse_mount_param(struct fs_context *fc,
>  		else
>  			fsopt->flags &= ~CEPH_MOUNT_OPT_NOPAGECACHE;
>  		break;
> +	case Opt_test_dummy_encryption:
> +#ifdef CONFIG_FS_ENCRYPTION
> +		kfree(fsopt->test_dummy_encryption);
> +		fsopt->test_dummy_encryption = param->string;
> +		param->string = NULL;
> +		fsopt->flags |= CEPH_MOUNT_OPT_TEST_DUMMY_ENC;
> +#else
> +		warnfc(fc, "FS encryption not supported: test_dummy_encryption mount option ignored");
> +#endif
> +		break;
>  	default:
>  		BUG();
>  	}
> @@ -596,6 +610,7 @@ static void destroy_mount_options(struct ceph_mount_options *args)
>  	kfree(args->server_path);
>  	kfree(args->fscache_uniq);
>  	kfree(args->mon_addr);
> +	kfree(args->test_dummy_encryption);
>  	kfree(args);
>  }
>  
> @@ -714,6 +729,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
>  	if (fsopt->flags & CEPH_MOUNT_OPT_NOPAGECACHE)
>  		seq_puts(m, ",nopagecache");
>  
> +	fscrypt_show_test_dummy_encryption(m, ',', root->d_sb);
> +
>  	if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
>  		seq_printf(m, ",wsize=%u", fsopt->wsize);
>  	if (fsopt->rsize != CEPH_MAX_READ_SIZE)
> @@ -1041,6 +1058,52 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
>  	return root;
>  }
>  
> +#ifdef CONFIG_FS_ENCRYPTION
> +static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
> +						struct ceph_mount_options *fsopt)
> +{
> +	/*
> +	 * No changing encryption context on remount. Note that
> +	 * fscrypt_set_test_dummy_encryption will validate the version
> +	 * string passed in (if any).
> +	 */
> +	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC) {
> +		struct ceph_fs_client *fsc = sb->s_fs_info;
> +		int err = 0;
> +
> +		if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE && !fsc->dummy_enc_policy.policy) {
> +			errorfc(fc, "Can't set test_dummy_encryption on remount");
> +			return -EEXIST;
> +		}
> +
> +		err = fscrypt_set_test_dummy_encryption(sb,
> +							fsc->mount_options->test_dummy_encryption,
> +							&fsc->dummy_enc_policy);
> +		if (err) {
> +			if (err == -EEXIST)
> +				errorfc(fc, "Can't change test_dummy_encryption on remount");
> +			else if (err == -EINVAL)
> +				errorfc(fc, "Value of option \"%s\" is unrecognized",
> +					fsc->mount_options->test_dummy_encryption);
> +			else
> +				errorfc(fc, "Error processing option \"%s\" [%d]",
> +					fsc->mount_options->test_dummy_encryption, err);
> +			return err;
> +		}
> +		warnfc(fc, "test_dummy_encryption mode enabled");
> +	}
> +	return 0;
> +}
> +#else
> +static inline int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
> +						struct ceph_mount_options *fsopt)
> +{
> +	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC)
> +		warnfc(fc, "test_dummy_encryption mode ignored");
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * mount: join the ceph cluster, and open root directory.
>   */
> @@ -1069,6 +1132,10 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc,
>  				goto out;
>  		}
>  
> +		err = ceph_set_test_dummy_encryption(fsc->sb, fc, fsc->mount_options);
> +		if (err)
> +			goto out;
> +
>  		dout("mount opening path '%s'\n", path);
>  
>  		ceph_fs_debugfs_init(fsc);
> @@ -1277,9 +1344,15 @@ static void ceph_free_fc(struct fs_context *fc)
>  
>  static int ceph_reconfigure_fc(struct fs_context *fc)
>  {
> +	int err;
>  	struct ceph_parse_opts_ctx *pctx = fc->fs_private;
>  	struct ceph_mount_options *fsopt = pctx->opts;
> -	struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
> +	struct super_block *sb = fc->root->d_sb;
> +	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
> +
> +	err = ceph_set_test_dummy_encryption(sb, fc, fsopt);
> +	if (err)
> +		return err;
>  
>  	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
>  		ceph_set_mount_opt(fsc, ASYNC_DIROPS);
> @@ -1293,7 +1366,7 @@ static int ceph_reconfigure_fc(struct fs_context *fc)
>  		pr_notice("ceph: monitor addresses recorded, but not used for reconnection");
>  	}
>  
> -	sync_filesystem(fc->root->d_sb);
> +	sync_filesystem(sb);
>  	return 0;
>  }
>  
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 853577f8d772..042ea1f8e5c2 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -17,6 +17,7 @@
>  #include <linux/posix_acl.h>
>  #include <linux/refcount.h>
>  #include <linux/security.h>
> +#include <linux/fscrypt.h>
>  
>  #include <linux/ceph/libceph.h>
>  
> @@ -24,6 +25,8 @@
>  #include <linux/fscache.h>
>  #endif
>  
> +#include "crypto.h"
> +
>  /* f_type in struct statfs */
>  #define CEPH_SUPER_MAGIC 0x00c36400
>  
> @@ -46,6 +49,7 @@
>  #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
>  #define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
>  #define CEPH_MOUNT_OPT_NOPAGECACHE     (1<<16) /* bypass pagecache altogether */
> +#define CEPH_MOUNT_OPT_TEST_DUMMY_ENC  (1<<17) /* enable dummy encryption (for testing) */
>  
>  #define CEPH_MOUNT_OPT_DEFAULT			\
>  	(CEPH_MOUNT_OPT_DCACHE |		\
> @@ -102,6 +106,7 @@ struct ceph_mount_options {
>  	char *server_path;    /* default NULL (means "/") */
>  	char *fscache_uniq;   /* default NULL */
>  	char *mon_addr;
> +	char *test_dummy_encryption;	/* default NULL */
>  };
>  
>  struct ceph_fs_client {
> @@ -141,9 +146,11 @@ struct ceph_fs_client {
>  #ifdef CONFIG_CEPH_FSCACHE
>  	struct fscache_volume *fscache;
>  #endif
> +#ifdef CONFIG_FS_ENCRYPTION
> +	struct fscrypt_dummy_policy dummy_enc_policy;
> +#endif
>  };
>  
> -
>  /*
>   * File i/o capability.  This tracks shared state with the metadata
>   * server that allows us to cache or writeback attributes or to read
> @@ -1083,6 +1090,9 @@ struct ceph_acl_sec_ctx {
>  #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
>  	void *sec_ctx;
>  	u32 sec_ctxlen;
> +#endif
> +#ifdef CONFIG_FS_ENCRYPTION
> +	struct ceph_fscrypt_auth *fscrypt_auth;
>  #endif
>  	struct ceph_pagelist *pagelist;
>  };
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index fcf7dfdecf96..5e3522457deb 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -1380,6 +1380,9 @@ void ceph_release_acl_sec_ctx(struct ceph_acl_sec_ctx *as_ctx)
>  #endif
>  #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
>  	security_release_secctx(as_ctx->sec_ctx, as_ctx->sec_ctxlen);
> +#endif
> +#ifdef CONFIG_FS_ENCRYPTION
> +	kfree(as_ctx->fscrypt_auth);
>  #endif
>  	if (as_ctx->pagelist)
>  		ceph_pagelist_release(as_ctx->pagelist);
> -- 
>
> 2.34.1
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option
  2022-02-11 13:50   ` Luís Henriques
@ 2022-02-11 14:52     ` Jeff Layton
  2022-02-14  9:29       ` Luís Henriques
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-02-11 14:52 UTC (permalink / raw)
  To: Luís Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Fri, 2022-02-11 at 13:50 +0000, Luís Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/crypto.c | 53 ++++++++++++++++++++++++++++++++
> >  fs/ceph/crypto.h | 26 ++++++++++++++++
> >  fs/ceph/inode.c  | 10 ++++--
> >  fs/ceph/super.c  | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/ceph/super.h  | 12 +++++++-
> >  fs/ceph/xattr.c  |  3 ++
> >  6 files changed, 177 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > index a513ff373b13..017f31eacb74 100644
> > --- a/fs/ceph/crypto.c
> > +++ b/fs/ceph/crypto.c
> > @@ -4,6 +4,7 @@
> >  #include <linux/fscrypt.h>
> >  
> >  #include "super.h"
> > +#include "mds_client.h"
> >  #include "crypto.h"
> >  
> >  static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
> > @@ -64,9 +65,20 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
> >  	return ci->i_rsubdirs + ci->i_rfiles == 1;
> >  }
> >  
> > +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
> > +{
> > +	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
> > +}
> > +
> > +static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
> > +{
> > +	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
> > +}
> > +
> >  static struct fscrypt_operations ceph_fscrypt_ops = {
> >  	.get_context		= ceph_crypt_get_context,
> >  	.set_context		= ceph_crypt_set_context,
> > +	.get_dummy_policy	= ceph_get_dummy_policy,
> >  	.empty_dir		= ceph_crypt_empty_dir,
> >  };
> >  
> > @@ -74,3 +86,44 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
> >  {
> >  	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
> >  }
> > +
> > +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
> > +				 struct ceph_acl_sec_ctx *as)
> > +{
> > +	int ret, ctxsize;
> > +	bool encrypted = false;
> > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > +
> > +	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
> > +	if (ret)
> > +		return ret;
> > +	if (!encrypted)
> > +		return 0;
> > +
> > +	as->fscrypt_auth = kzalloc(sizeof(*as->fscrypt_auth), GFP_KERNEL);
> > +	if (!as->fscrypt_auth)
> > +		return -ENOMEM;
> > +
> 
> Isn't this memory allocation leaking below in the error paths?
> 
> (Yeah, I'm finally (but slowly) catching up with this series... my memory
> is blurry and there are a lot of things I forgot...)
> 
> Cheers,

No. If an error bubbles back up here, we'll eventually call
ceph_release_acl_sec_ctx on the thing, and it'll be kfreed then.
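
The ownership rule described above can be sketched in userspace C. This is
only an illustration, not the cephfs code: struct sec_ctx, sec_ctx_release(),
prepare_context() and create_thing() are hypothetical stand-ins for
struct ceph_acl_sec_ctx, ceph_release_acl_sec_ctx(),
ceph_fscrypt_prepare_context() and a caller, with calloc/free standing in for
kzalloc/kfree.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ceph_acl_sec_ctx: it owns 'auth'. */
struct sec_ctx {
	void *auth;
};

/* Hypothetical stand-in for ceph_release_acl_sec_ctx(). */
static void sec_ctx_release(struct sec_ctx *ctx)
{
	free(ctx->auth);	/* free(NULL) is a no-op, like kfree(NULL) */
	ctx->auth = NULL;
}

/*
 * Hypothetical stand-in for ceph_fscrypt_prepare_context(). When it fails
 * after the allocation, it returns without freeing: the buffer is still
 * reachable through ctx, so the context's owner can free it later.
 */
static int prepare_context(struct sec_ctx *ctx, int fail_later)
{
	ctx->auth = calloc(1, 64);
	if (!ctx->auth)
		return -ENOMEM;
	if (fail_later)
		return -EINVAL;	/* no free here; ctx still owns the buffer */
	return 0;
}

/* Hypothetical caller: releases the context on success and on error. */
static int create_thing(int fail_later)
{
	struct sec_ctx ctx = { 0 };
	int ret = prepare_context(&ctx, fail_later);

	sec_ctx_release(&ctx);
	return ret;
}
```

In this sketch the error return from prepare_context() is not a leak only
because every caller goes through sec_ctx_release(); a caller that skipped
the release would leak the buffer.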
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option
  2022-02-11 14:52     ` Jeff Layton
@ 2022-02-14  9:29       ` Luís Henriques
  0 siblings, 0 replies; 84+ messages in thread
From: Luís Henriques @ 2022-02-14  9:29 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Fri, 2022-02-11 at 13:50 +0000, Luís Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  fs/ceph/crypto.c | 53 ++++++++++++++++++++++++++++++++
>> >  fs/ceph/crypto.h | 26 ++++++++++++++++
>> >  fs/ceph/inode.c  | 10 ++++--
>> >  fs/ceph/super.c  | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
>> >  fs/ceph/super.h  | 12 +++++++-
>> >  fs/ceph/xattr.c  |  3 ++
>> >  6 files changed, 177 insertions(+), 6 deletions(-)
>> > 
>> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>> > index a513ff373b13..017f31eacb74 100644
>> > --- a/fs/ceph/crypto.c
>> > +++ b/fs/ceph/crypto.c
>> > @@ -4,6 +4,7 @@
>> >  #include <linux/fscrypt.h>
>> >  
>> >  #include "super.h"
>> > +#include "mds_client.h"
>> >  #include "crypto.h"
>> >  
>> >  static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
>> > @@ -64,9 +65,20 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
>> >  	return ci->i_rsubdirs + ci->i_rfiles == 1;
>> >  }
>> >  
>> > +void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
>> > +{
>> > +	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
>> > +}
>> > +
>> > +static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
>> > +{
>> > +	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
>> > +}
>> > +
>> >  static struct fscrypt_operations ceph_fscrypt_ops = {
>> >  	.get_context		= ceph_crypt_get_context,
>> >  	.set_context		= ceph_crypt_set_context,
>> > +	.get_dummy_policy	= ceph_get_dummy_policy,
>> >  	.empty_dir		= ceph_crypt_empty_dir,
>> >  };
>> >  
>> > @@ -74,3 +86,44 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
>> >  {
>> >  	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
>> >  }
>> > +
>> > +int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
>> > +				 struct ceph_acl_sec_ctx *as)
>> > +{
>> > +	int ret, ctxsize;
>> > +	bool encrypted = false;
>> > +	struct ceph_inode_info *ci = ceph_inode(inode);
>> > +
>> > +	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
>> > +	if (ret)
>> > +		return ret;
>> > +	if (!encrypted)
>> > +		return 0;
>> > +
>> > +	as->fscrypt_auth = kzalloc(sizeof(*as->fscrypt_auth), GFP_KERNEL);
>> > +	if (!as->fscrypt_auth)
>> > +		return -ENOMEM;
>> > +
>> 
>> Isn't this memory allocation leaking below in the error paths?
>> 
>> (Yeah, I'm finally (but slowly) catching up with this series... my memory
>> is blurry and there are a lot of things I forgot...)
>> 
>> Cheers,
>
> No. If an error bubbles back up here, we'll eventually call
> ceph_release_acl_sec_ctx on the thing, and it'll be kfreed then.

Right, the callers are expected to ensure that ceph_release_acl_sec_ctx()
is invoked, of course.  Sorry for the noise :-/

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (49 preceding siblings ...)
  2022-01-27  2:14 ` Eric Biggers
@ 2022-02-14  9:37 ` Xiubo Li
  2022-02-14 11:33   ` Jeff Layton
  2022-02-14 17:57 ` Luís Henriques
  51 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-02-14  9:37 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

Hi Jeff,

I am using the 'wip-fscrypt' branch to test another issue and hit:

cp: cannot access './dir___683': No buffer space available
cp: cannot access './dir___686': No buffer space available
cp: cannot access './dir___687': No buffer space available
cp: cannot access './dir___688': No buffer space available
cp: cannot access './dir___689': No buffer space available
cp: cannot access './dir___693': No buffer space available

...

[root@lxbceph1 kcephfs]# diff ./dir___997 /data/backup/kernel/dir___997
diff: ./dir___997: No buffer space available


The dmesg logs:

<7>[ 1256.918228] ceph:  do_getattr inode 0000000089964a71 mask AsXsFs mode 040755
<7>[ 1256.918232] ceph:  __ceph_caps_issued_mask ino 0x100000009be cap 0000000014f1c64b issued pAsLsXsFs (mask AsXsFs)
<7>[ 1256.918237] ceph:  __touch_cap 0000000089964a71 cap 0000000014f1c64b mds0
<7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689 pos 0
<7>[ 1256.918254] ceph:  readdir off 0 -> '.'
<7>[ 1256.918258] ceph:  readdir off 1 -> '..'
<4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105 getting encryption context
<7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689 pos 2
<4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105 getting encryption context
<7>[ 1256.918288] ceph:  release inode 0000000089964a71 dir file 00000000065cb689
<7>[ 1256.918310] ceph:  __ceph_caps_issued_mask ino 0x1 cap 00000000aa2afb8b issued pAsLsXsFs (mask Fs)
<7>[ 1257.574593] ceph:  mdsc delayed_work

I did nothing fscrypt-related after mounting the kclient; I just created
2000 directories, made some snapshots on the root dir, and then
tried to copy the root directory to the backup.

- Xiubo

On 1/12/22 3:15 AM, Jeff Layton wrote:
> This patchset represents a (mostly) complete rough draft of fscrypt
> support for cephfs. The context, filename and symlink support is more or
> less the same as the versions posted before, and comprise the first half
> of the patches.
>
> The new bits here are the size handling changes and support for content
> encryption, in buffered, direct and synchronous codepaths. Much of this
> code is still very rough and needs a lot of cleanup work.
>
> fscrypt support relies on some MDS changes that are being tracked here:
>
>      https://github.com/ceph/ceph/pull/43588
>
> In particular, this PR adds some new opaque fields in the inode that we
> use to store fscrypt-specific information, like the context and the real
> size of a file. That is slated to be merged for the upcoming Quincy
> release (which is sometime this northern spring).
>
> There are still some notable bugs:
>
> 1/ we've identified a few more potential races in truncate handling
> which will probably necessitate a protocol change, as well as changes to
> the MDS and kclient patchsets. The good news is that we think we have
> an approach that will resolve this.
>
> 2/ the kclient doesn't handle reading sparse regions in OSD objects
> properly yet. The client can end up writing to a non-zero offset in a
> non-existent object. Then, if the client tries to read the written
> region back later, it'll get back zeroes and give you garbage when you
> try to decrypt them.
>
> It turns out that the OSD already supports a SPARSE_READ operation, so
> I'm working on implementing that in the kclient to make it not try to
> decrypt the sparse regions.
>
> Still, I was able to run xfstests on this set yesterday. Bug #2 above
> prevented all of the tests from passing, but it didn't oops! I call that
> progress! Given that, I figured this is a good time to post what I have
> so far.
>
> Note that the buffered I/O changes in this set are not suitable for
> merge and will likely end up being discarded. We need to plumb the
> encryption in at the netfs layer, so that we can store encrypted data
> in fscache.
>
> The non-buffered codepaths will likely also need substantial changes
> before merging. It may be simpler to just move that into the netfs layer
> too as cifs will need something similar anyway.
>
> My goal is to get most of this into v5.18, but v5.19 might be more
> realistic. Hopefully I'll have a non-RFC patchset to send in a few
> weeks.
>
> Special thanks to Xiubo who came through with the MDS patches. Also,
> thanks to everyone (especially Eric Biggers) for all of the previous
> reviews. It's much appreciated!
>
> Jeff Layton (43):
>    vfs: export new_inode_pseudo
>    fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
>    fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>    fscrypt: add fscrypt_context_for_new_inode
>    ceph: preallocate inode for ops that may create one
>    ceph: crypto context handling for ceph
>    ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
>    ceph: add fscrypt_* handling to caps.c
>    ceph: add ability to set fscrypt_auth via setattr
>    ceph: implement -o test_dummy_encryption mount option
>    ceph: decode alternate_name in lease info
>    ceph: add fscrypt ioctls
>    ceph: make ceph_msdc_build_path use ref-walk
>    ceph: add encrypted fname handling to ceph_mdsc_build_path
>    ceph: send altname in MClientRequest
>    ceph: encode encrypted name in dentry release
>    ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>    ceph: make d_revalidate call fscrypt revalidator for encrypted
>      dentries
>    ceph: add helpers for converting names for userland presentation
>    ceph: add fscrypt support to ceph_fill_trace
>    ceph: add support to readdir for encrypted filenames
>    ceph: create symlinks with encrypted and base64-encoded targets
>    ceph: make ceph_get_name decrypt filenames
>    ceph: add a new ceph.fscrypt.auth vxattr
>    ceph: add some fscrypt guardrails
>    libceph: add CEPH_OSD_OP_ASSERT_VER support
>    ceph: size handling for encrypted inodes in cap updates
>    ceph: fscrypt_file field handling in MClientRequest messages
>    ceph: get file size from fscrypt_file when present in inode traces
>    ceph: handle fscrypt fields in cap messages from MDS
>    ceph: add infrastructure for file encryption and decryption
>    libceph: allow ceph_osdc_new_request to accept a multi-op read
>    ceph: disable fallocate for encrypted inodes
>    ceph: disable copy offload on encrypted inodes
>    ceph: don't use special DIO path for encrypted inodes
>    ceph: set encryption context on open
>    ceph: align data in pages in ceph_sync_write
>    ceph: add read/modify/write to ceph_sync_write
>    ceph: plumb in decryption during sync reads
>    ceph: set i_blkbits to crypto block size for encrypted inodes
>    ceph: add fscrypt decryption support to ceph_netfs_issue_op
>    ceph: add encryption support to writepage
>    ceph: fscrypt support for writepages
>
> Luis Henriques (1):
>    ceph: don't allow changing layout on encrypted files/directories
>
> Xiubo Li (4):
>    ceph: add __ceph_get_caps helper support
>    ceph: add __ceph_sync_read helper support
>    ceph: add object version support for sync read
>    ceph: add truncate size handling support for fscrypt
>
>   fs/ceph/Makefile                |   1 +
>   fs/ceph/acl.c                   |   4 +-
>   fs/ceph/addr.c                  | 128 +++++--
>   fs/ceph/caps.c                  | 211 ++++++++++--
>   fs/ceph/crypto.c                | 374 +++++++++++++++++++++
>   fs/ceph/crypto.h                | 237 +++++++++++++
>   fs/ceph/dir.c                   | 209 +++++++++---
>   fs/ceph/export.c                |  44 ++-
>   fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
>   fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
>   fs/ceph/ioctl.c                 |  87 +++++
>   fs/ceph/mds_client.c            | 349 ++++++++++++++++---
>   fs/ceph/mds_client.h            |  24 +-
>   fs/ceph/super.c                 |  90 ++++-
>   fs/ceph/super.h                 |  43 ++-
>   fs/ceph/xattr.c                 |  29 ++
>   fs/crypto/fname.c               |  44 ++-
>   fs/crypto/fscrypt_private.h     |   9 +-
>   fs/crypto/hooks.c               |   6 +-
>   fs/crypto/policy.c              |  35 +-
>   fs/inode.c                      |   1 +
>   include/linux/ceph/ceph_fs.h    |  21 +-
>   include/linux/ceph/osd_client.h |   6 +-
>   include/linux/ceph/rados.h      |   4 +
>   include/linux/fscrypt.h         |  10 +
>   net/ceph/osd_client.c           |  32 +-
>   26 files changed, 2700 insertions(+), 350 deletions(-)
>   create mode 100644 fs/ceph/crypto.c
>   create mode 100644 fs/ceph/crypto.h
>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14  9:37 ` Xiubo Li
@ 2022-02-14 11:33   ` Jeff Layton
  2022-02-14 12:08     ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-02-14 11:33 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Mon, 2022-02-14 at 17:37 +0800, Xiubo Li wrote:
> Hi Jeff,
> 
> I am using the 'wip-fscrypt' branch to test another issue and hit:
> 
> cp: cannot access './dir___683': No buffer space available
> cp: cannot access './dir___686': No buffer space available
> cp: cannot access './dir___687': No buffer space available
> cp: cannot access './dir___688': No buffer space available
> cp: cannot access './dir___689': No buffer space available
> cp: cannot access './dir___693': No buffer space available
> 
> ...
> 
> [root@lxbceph1 kcephfs]# diff ./dir___997 /data/backup/kernel/dir___997
> diff: ./dir___997: No buffer space available
> 
> 
> The dmesg logs:
> 
> <7>[ 1256.918228] ceph:  do_getattr inode 0000000089964a71 mask AsXsFs mode 040755
> <7>[ 1256.918232] ceph:  __ceph_caps_issued_mask ino 0x100000009be cap 0000000014f1c64b issued pAsLsXsFs (mask AsXsFs)
> <7>[ 1256.918237] ceph:  __touch_cap 0000000089964a71 cap 0000000014f1c64b mds0
> <7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689 pos 0
> <7>[ 1256.918254] ceph:  readdir off 0 -> '.'
> <7>[ 1256.918258] ceph:  readdir off 1 -> '..'
> <4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105 getting encryption context
> <7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689 pos 2
> <4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105 getting encryption context
> <7>[ 1256.918288] ceph:  release inode 0000000089964a71 dir file 00000000065cb689
> <7>[ 1256.918310] ceph:  __ceph_caps_issued_mask ino 0x1 cap 00000000aa2afb8b issued pAsLsXsFs (mask Fs)
> <7>[ 1257.574593] ceph:  mdsc delayed_work
> 
> I did nothing fscrypt-related after mounting the kclient; I just created
> 2000 directories, made some snapshots on the root dir, and then
> tried to copy the root directory to the backup.
> 
> - Xiubo
> 

That means that ceph_crypt_get_context returned -ENOBUFS (errno 105, "No
buffer space available"), which it can do for several different reasons.
We probably need to add in some debugging there to see which one it is...
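
For reference, the "Error -105" in the dmesg output and the "No buffer space
available" printed by cp and diff are the same errno. A minimal sketch for
checking that mapping on a Linux/glibc system follows; neg_errno_str() is a
hypothetical helper, and the numeric values are Linux-specific assumptions
(other platforms number errnos differently).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: translate a negative kernel-style return code
 * (e.g. -105) into the message userspace tools print for it.
 */
static const char *neg_errno_str(int err)
{
	return strerror(-err);
}
```

Calling neg_errno_str(-ENOBUFS) on such a system yields the string that cp
and diff reported, which ties the userspace failures to the fscrypt warning
in the log.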

TBH, I've done absolutely no testing with snapshots, so it's quite
possible there is some interaction there that is causing problems.

-- Jeff

> On 1/12/22 3:15 AM, Jeff Layton wrote:
> > This patchset represents a (mostly) complete rough draft of fscrypt
> > support for cephfs. The context, filename and symlink support is more or
> > less the same as the versions posted before, and comprise the first half
> > of the patches.
> > 
> > The new bits here are the size handling changes and support for content
> > encryption, in buffered, direct and synchronous codepaths. Much of this
> > code is still very rough and needs a lot of cleanup work.
> > 
> > fscrypt support relies on some MDS changes that are being tracked here:
> > 
> >      https://github.com/ceph/ceph/pull/43588
> > 
> > In particular, this PR adds some new opaque fields in the inode that we
> > use to store fscrypt-specific information, like the context and the real
> > size of a file. That is slated to be merged for the upcoming Quincy
> > release (which is sometime this northern spring).
> > 
> > There are still some notable bugs:
> > 
> > 1/ we've identified a few more potential races in truncate handling
> > which will probably necessitate a protocol change, as well as changes to
> > the MDS and kclient patchsets. The good news is that we think we have
> > an approach that will resolve this.
> > 
> > 2/ the kclient doesn't handle reading sparse regions in OSD objects
> > properly yet. The client can end up writing to a non-zero offset in a
> > non-existent object. Then, if the client tries to read the written
> > region back later, it'll get back zeroes and give you garbage when you
> > try to decrypt them.
> > 
> > It turns out that the OSD already supports a SPARSE_READ operation, so
> > I'm working on implementing that in the kclient to make it not try to
> > decrypt the sparse regions.
> > 
> > Still, I was able to run xfstests on this set yesterday. Bug #2 above
> > prevented all of the tests from passing, but it didn't oops! I call that
> > progress! Given that, I figured this is a good time to post what I have
> > so far.
> > 
> > Note that the buffered I/O changes in this set are not suitable for
> > merge and will likely end up being discarded. We need to plumb the
> > encryption in at the netfs layer, so that we can store encrypted data
> > in fscache.
> > 
> > The non-buffered codepaths will likely also need substantial changes
> > before merging. It may be simpler to just move that into the netfs layer
> > too as cifs will need something similar anyway.
> > 
> > My goal is to get most of this into v5.18, but v5.19 might be more
> > realistic. Hopefully I'll have a non-RFC patchset to send in a few
> > weeks.
> > 
> > Special thanks to Xiubo who came through with the MDS patches. Also,
> > thanks to everyone (especially Eric Biggers) for all of the previous
> > reviews. It's much appreciated!
> > 
> > Jeff Layton (43):
> >    vfs: export new_inode_pseudo
> >    fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
> >    fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
> >    fscrypt: add fscrypt_context_for_new_inode
> >    ceph: preallocate inode for ops that may create one
> >    ceph: crypto context handling for ceph
> >    ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
> >    ceph: add fscrypt_* handling to caps.c
> >    ceph: add ability to set fscrypt_auth via setattr
> >    ceph: implement -o test_dummy_encryption mount option
> >    ceph: decode alternate_name in lease info
> >    ceph: add fscrypt ioctls
> >    ceph: make ceph_msdc_build_path use ref-walk
> >    ceph: add encrypted fname handling to ceph_mdsc_build_path
> >    ceph: send altname in MClientRequest
> >    ceph: encode encrypted name in dentry release
> >    ceph: properly set DCACHE_NOKEY_NAME flag in lookup
> >    ceph: make d_revalidate call fscrypt revalidator for encrypted
> >      dentries
> >    ceph: add helpers for converting names for userland presentation
> >    ceph: add fscrypt support to ceph_fill_trace
> >    ceph: add support to readdir for encrypted filenames
> >    ceph: create symlinks with encrypted and base64-encoded targets
> >    ceph: make ceph_get_name decrypt filenames
> >    ceph: add a new ceph.fscrypt.auth vxattr
> >    ceph: add some fscrypt guardrails
> >    libceph: add CEPH_OSD_OP_ASSERT_VER support
> >    ceph: size handling for encrypted inodes in cap updates
> >    ceph: fscrypt_file field handling in MClientRequest messages
> >    ceph: get file size from fscrypt_file when present in inode traces
> >    ceph: handle fscrypt fields in cap messages from MDS
> >    ceph: add infrastructure for file encryption and decryption
> >    libceph: allow ceph_osdc_new_request to accept a multi-op read
> >    ceph: disable fallocate for encrypted inodes
> >    ceph: disable copy offload on encrypted inodes
> >    ceph: don't use special DIO path for encrypted inodes
> >    ceph: set encryption context on open
> >    ceph: align data in pages in ceph_sync_write
> >    ceph: add read/modify/write to ceph_sync_write
> >    ceph: plumb in decryption during sync reads
> >    ceph: set i_blkbits to crypto block size for encrypted inodes
> >    ceph: add fscrypt decryption support to ceph_netfs_issue_op
> >    ceph: add encryption support to writepage
> >    ceph: fscrypt support for writepages
> > 
> > Luis Henriques (1):
> >    ceph: don't allow changing layout on encrypted files/directories
> > 
> > Xiubo Li (4):
> >    ceph: add __ceph_get_caps helper support
> >    ceph: add __ceph_sync_read helper support
> >    ceph: add object version support for sync read
> >    ceph: add truncate size handling support for fscrypt
> > 
> >   fs/ceph/Makefile                |   1 +
> >   fs/ceph/acl.c                   |   4 +-
> >   fs/ceph/addr.c                  | 128 +++++--
> >   fs/ceph/caps.c                  | 211 ++++++++++--
> >   fs/ceph/crypto.c                | 374 +++++++++++++++++++++
> >   fs/ceph/crypto.h                | 237 +++++++++++++
> >   fs/ceph/dir.c                   | 209 +++++++++---
> >   fs/ceph/export.c                |  44 ++-
> >   fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
> >   fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
> >   fs/ceph/ioctl.c                 |  87 +++++
> >   fs/ceph/mds_client.c            | 349 ++++++++++++++++---
> >   fs/ceph/mds_client.h            |  24 +-
> >   fs/ceph/super.c                 |  90 ++++-
> >   fs/ceph/super.h                 |  43 ++-
> >   fs/ceph/xattr.c                 |  29 ++
> >   fs/crypto/fname.c               |  44 ++-
> >   fs/crypto/fscrypt_private.h     |   9 +-
> >   fs/crypto/hooks.c               |   6 +-
> >   fs/crypto/policy.c              |  35 +-
> >   fs/inode.c                      |   1 +
> >   include/linux/ceph/ceph_fs.h    |  21 +-
> >   include/linux/ceph/osd_client.h |   6 +-
> >   include/linux/ceph/rados.h      |   4 +
> >   include/linux/fscrypt.h         |  10 +
> >   net/ceph/osd_client.c           |  32 +-
> >   26 files changed, 2700 insertions(+), 350 deletions(-)
> >   create mode 100644 fs/ceph/crypto.c
> >   create mode 100644 fs/ceph/crypto.h
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 11:33   ` Jeff Layton
@ 2022-02-14 12:08     ` Xiubo Li
  2022-02-15  0:44       ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-02-14 12:08 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 2/14/22 7:33 PM, Jeff Layton wrote:
> On Mon, 2022-02-14 at 17:37 +0800, Xiubo Li wrote:
>> Hi Jeff,
>>
>> I am using the 'wip-fscrypt' branch to test another issue and hit:
>>
>> cp: cannot access './dir___683': No buffer space available
>> cp: cannot access './dir___686': No buffer space available
>> cp: cannot access './dir___687': No buffer space available
>> cp: cannot access './dir___688': No buffer space available
>> cp: cannot access './dir___689': No buffer space available
>> cp: cannot access './dir___693': No buffer space available
>>
>> ...
>>
>> [root@lxbceph1 kcephfs]# diff ./dir___997 /data/backup/kernel/dir___997
>> diff: ./dir___997: No buffer space available
>>
>>
>> The dmesg logs:
>>
>> <7>[ 1256.918228] ceph:  do_getattr inode 0000000089964a71 mask AsXsFs
>> mode 040755
>> <7>[ 1256.918232] ceph:  __ceph_caps_issued_mask ino 0x100000009be cap
>> 0000000014f1c64b issued pAsLsXsFs (mask AsXsFs)
>> <7>[ 1256.918237] ceph:  __touch_cap 0000000089964a71 cap
>> 0000000014f1c64b mds0
>> <7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689
>> pos 0
>> <7>[ 1256.918254] ceph:  readdir off 0 -> '.'
>> <7>[ 1256.918258] ceph:  readdir off 1 -> '..'
>> <4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105
>> getting encryption context
>> <7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689
>> pos 2
>> <4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105
>> getting encryption context
>> <7>[ 1256.918288] ceph:  release inode 0000000089964a71 dir file
>> 00000000065cb689
>> <7>[ 1256.918310] ceph:  __ceph_caps_issued_mask ino 0x1 cap
>> 00000000aa2afb8b issued pAsLsXsFs (mask Fs)
>> <7>[ 1257.574593] ceph:  mdsc delayed_work
>>
>> I did nothing fscrypt-related after mounting the kclient; I just created
>> 2000 directories, made some snapshots on the root dir, and then tried
>> to copy the root directory to the backup.
>>
>> - Xiubo
>>
> That means that ceph_crypt_get_context returned -ENODATA, which it can

It should be -ENOBUFS.
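
As a quick sanity check of the number in the log, here is a minimal
userspace sketch (not kernel code). It assumes the generic Linux errno
numbering, and the helper name is made up for illustration:

```python
import re

# Generic Linux errno values (include/uapi/asm-generic/errno.h); only the
# two codes discussed in this thread are listed.
LINUX_ERRNO = {61: "ENODATA", 105: "ENOBUFS"}


def decode_fscrypt_error(line):
    """Hypothetical helper: pull 'Error -N' out of a log line and name it."""
    m = re.search(r"Error -(\d+)", line)
    if m is None:
        return None
    code = int(m.group(1))
    return LINUX_ERRNO.get(code, "errno %d" % code)


line = ("fscrypt (ceph, inode 1099511630270): "
        "Error -105 getting encryption context")
print(decode_fscrypt_error(line))

# The inode in the fscrypt warning is the decimal form of the ino printed
# by __ceph_caps_issued_mask a few lines earlier in the same log.
print(hex(1099511630270))
```

That also matches the "No buffer space available" text that cp and diff
print, which is the usual message for ENOBUFS.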

I am not sure whether it is related to the snapshot stuff. I will try
without snapshots later.

I can debug it later, maybe next week.

-- Xiubo

> do for several different reasons. We probably need to add in some
> debugging there to see which one it is...
>
> TBH, I've done absolutely no testing with snapshots, so it's quite
> possible there is some interaction there that is causing problems.
>
> -- Jeff
>
>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>> This patchset represents a (mostly) complete rough draft of fscrypt
>>> support for cephfs. The context, filename and symlink support is more or
>>> less the same as the versions posted before, and comprise the first half
>>> of the patches.
>>>
>>> The new bits here are the size handling changes and support for content
>>> encryption, in buffered, direct and synchronous codepaths. Much of this
>>> code is still very rough and needs a lot of cleanup work.
>>>
>>> fscrypt support relies on some MDS changes that are being tracked here:
>>>
>>>       https://github.com/ceph/ceph/pull/43588
>>>
>>> In particular, this PR adds some new opaque fields in the inode that we
>>> use to store fscrypt-specific information, like the context and the real
>>> size of a file. That is slated to be merged for the upcoming Quincy
>>> release (which is sometime this northern spring).
>>>
>>> There are still some notable bugs:
>>>
>>> 1/ we've identified a few more potential races in truncate handling
>>> which will probably necessitate a protocol change, as well as changes to
>>> the MDS and kclient patchsets. The good news is that we think we have
>>> an approach that will resolve this.
>>>
>>> 2/ the kclient doesn't handle reading sparse regions in OSD objects
>>> properly yet. The client can end up writing to a non-zero offset in a
>>> non-existent object. Then, if the client tries to read the written
>>> region back later, it'll get back zeroes and give you garbage when you
>>> try to decrypt them.
>>>
>>> It turns out that the OSD already supports a SPARSE_READ operation, so
>>> I'm working on implementing that in the kclient to make it not try to
>>> decrypt the sparse regions.
>>>
>>> Still, I was able to run xfstests on this set yesterday. Bug #2 above
>>> prevented all of the tests from passing, but it didn't oops! I call that
>>> progress! Given that, I figured this is a good time to post what I have
>>> so far.
>>>
>>> Note that the buffered I/O changes in this set are not suitable for
>>> merge and will likely end up being discarded. We need to plumb the
>>> encryption in at the netfs layer, so that we can store encrypted data
>>> in fscache.
>>>
>>> The non-buffered codepaths will likely also need substantial changes
>>> before merging. It may be simpler to just move that into the netfs layer
>>> too as cifs will need something similar anyway.
>>>
>>> My goal is to get most of this into v5.18, but v5.19 might be more
>>> realistic. Hopefully I'll have a non-RFC patchset to send in a few
>>> weeks.
>>>
>>> Special thanks to Xiubo who came through with the MDS patches. Also,
>>> thanks to everyone (especially Eric Biggers) for all of the previous
>>> reviews. It's much appreciated!
>>>
>>> Jeff Layton (43):
>>>     vfs: export new_inode_pseudo
>>>     fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
>>>     fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>>>     fscrypt: add fscrypt_context_for_new_inode
>>>     ceph: preallocate inode for ops that may create one
>>>     ceph: crypto context handling for ceph
>>>     ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
>>>     ceph: add fscrypt_* handling to caps.c
>>>     ceph: add ability to set fscrypt_auth via setattr
>>>     ceph: implement -o test_dummy_encryption mount option
>>>     ceph: decode alternate_name in lease info
>>>     ceph: add fscrypt ioctls
>>>     ceph: make ceph_msdc_build_path use ref-walk
>>>     ceph: add encrypted fname handling to ceph_mdsc_build_path
>>>     ceph: send altname in MClientRequest
>>>     ceph: encode encrypted name in dentry release
>>>     ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>>>     ceph: make d_revalidate call fscrypt revalidator for encrypted
>>>       dentries
>>>     ceph: add helpers for converting names for userland presentation
>>>     ceph: add fscrypt support to ceph_fill_trace
>>>     ceph: add support to readdir for encrypted filenames
>>>     ceph: create symlinks with encrypted and base64-encoded targets
>>>     ceph: make ceph_get_name decrypt filenames
>>>     ceph: add a new ceph.fscrypt.auth vxattr
>>>     ceph: add some fscrypt guardrails
>>>     libceph: add CEPH_OSD_OP_ASSERT_VER support
>>>     ceph: size handling for encrypted inodes in cap updates
>>>     ceph: fscrypt_file field handling in MClientRequest messages
>>>     ceph: get file size from fscrypt_file when present in inode traces
>>>     ceph: handle fscrypt fields in cap messages from MDS
>>>     ceph: add infrastructure for file encryption and decryption
>>>     libceph: allow ceph_osdc_new_request to accept a multi-op read
>>>     ceph: disable fallocate for encrypted inodes
>>>     ceph: disable copy offload on encrypted inodes
>>>     ceph: don't use special DIO path for encrypted inodes
>>>     ceph: set encryption context on open
>>>     ceph: align data in pages in ceph_sync_write
>>>     ceph: add read/modify/write to ceph_sync_write
>>>     ceph: plumb in decryption during sync reads
>>>     ceph: set i_blkbits to crypto block size for encrypted inodes
>>>     ceph: add fscrypt decryption support to ceph_netfs_issue_op
>>>     ceph: add encryption support to writepage
>>>     ceph: fscrypt support for writepages
>>>
>>> Luis Henriques (1):
>>>     ceph: don't allow changing layout on encrypted files/directories
>>>
>>> Xiubo Li (4):
>>>     ceph: add __ceph_get_caps helper support
>>>     ceph: add __ceph_sync_read helper support
>>>     ceph: add object version support for sync read
>>>     ceph: add truncate size handling support for fscrypt
>>>
>>>    fs/ceph/Makefile                |   1 +
>>>    fs/ceph/acl.c                   |   4 +-
>>>    fs/ceph/addr.c                  | 128 +++++--
>>>    fs/ceph/caps.c                  | 211 ++++++++++--
>>>    fs/ceph/crypto.c                | 374 +++++++++++++++++++++
>>>    fs/ceph/crypto.h                | 237 +++++++++++++
>>>    fs/ceph/dir.c                   | 209 +++++++++---
>>>    fs/ceph/export.c                |  44 ++-
>>>    fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
>>>    fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
>>>    fs/ceph/ioctl.c                 |  87 +++++
>>>    fs/ceph/mds_client.c            | 349 ++++++++++++++++---
>>>    fs/ceph/mds_client.h            |  24 +-
>>>    fs/ceph/super.c                 |  90 ++++-
>>>    fs/ceph/super.h                 |  43 ++-
>>>    fs/ceph/xattr.c                 |  29 ++
>>>    fs/crypto/fname.c               |  44 ++-
>>>    fs/crypto/fscrypt_private.h     |   9 +-
>>>    fs/crypto/hooks.c               |   6 +-
>>>    fs/crypto/policy.c              |  35 +-
>>>    fs/inode.c                      |   1 +
>>>    include/linux/ceph/ceph_fs.h    |  21 +-
>>>    include/linux/ceph/osd_client.h |   6 +-
>>>    include/linux/ceph/rados.h      |   4 +
>>>    include/linux/fscrypt.h         |  10 +
>>>    net/ceph/osd_client.c           |  32 +-
>>>    26 files changed, 2700 insertions(+), 350 deletions(-)
>>>    create mode 100644 fs/ceph/crypto.c
>>>    create mode 100644 fs/ceph/crypto.h
>>>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
                   ` (50 preceding siblings ...)
  2022-02-14  9:37 ` Xiubo Li
@ 2022-02-14 17:57 ` Luís Henriques
  2022-02-14 18:39   ` Jeff Layton
  51 siblings, 1 reply; 84+ messages in thread
From: Luís Henriques @ 2022-02-14 17:57 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

Jeff Layton <jlayton@kernel.org> writes:

> This patchset represents a (mostly) complete rough draft of fscrypt
> support for cephfs. The context, filename and symlink support is more or
> less the same as the versions posted before, and comprise the first half
> of the patches.
>
> The new bits here are the size handling changes and support for content
> encryption, in buffered, direct and synchronous codepaths. Much of this
> code is still very rough and needs a lot of cleanup work.
>
> fscrypt support relies on some MDS changes that are being tracked here:
>
>     https://github.com/ceph/ceph/pull/43588
>

Please correct me if I'm wrong (and I've a feeling that I *will* be
wrong): we're still missing some mechanism that prevents clients that do
not support fscrypt from creating new files in an encrypted directory,
right?  I'm pretty sure I've discussed this "somewhere" with "someone",
but I can't remember anything else.

At this point, I can create an encrypted directory and, from a different
client (that doesn't support fscrypt), create a new non-encrypted file in
that directory.  The result isn't good, of course.

I guess that a new feature bit can be used so that the MDS won't allow any
sort of operations (or, at least, write/create operations) on encrypted
dirs from clients that don't have this bit set.

So, am I missing something or is this still on the TODO list?

(I can try to have a look at it if this is still missing.)

Cheers,
-- 
Luís


>
> In particular, this PR adds some new opaque fields in the inode that we
> use to store fscrypt-specific information, like the context and the real
> size of a file. That is slated to be merged for the upcoming Quincy
> release (which is sometime this northern spring).
>
> There are still some notable bugs:
>
> 1/ we've identified a few more potential races in truncate handling
> which will probably necessitate a protocol change, as well as changes to
> the MDS and kclient patchsets. The good news is that we think we have
> an approach that will resolve this.
>
> 2/ the kclient doesn't handle reading sparse regions in OSD objects
> properly yet. The client can end up writing to a non-zero offset in a
> non-existent object. Then, if the client tries to read the written
> region back later, it'll get back zeroes and give you garbage when you
> try to decrypt them.
>
> It turns out that the OSD already supports a SPARSE_READ operation, so
> I'm working on implementing that in the kclient to make it not try to
> decrypt the sparse regions.
>
> Still, I was able to run xfstests on this set yesterday. Bug #2 above
> prevented all of the tests from passing, but it didn't oops! I call that
> progress! Given that, I figured this is a good time to post what I have
> so far.
>
> Note that the buffered I/O changes in this set are not suitable for
> merge and will likely end up being discarded. We need to plumb the
> encryption in at the netfs layer, so that we can store encrypted data
> in fscache.
>
> The non-buffered codepaths will likely also need substantial changes
> before merging. It may be simpler to just move that into the netfs layer
> too as cifs will need something similar anyway.
>
> My goal is to get most of this into v5.18, but v5.19 might be more
> realistic. Hopefully I'll have a non-RFC patchset to send in a few
> weeks.
>
> Special thanks to Xiubo who came through with the MDS patches. Also,
> thanks to everyone (especially Eric Biggers) for all of the previous
> reviews. It's much appreciated!
>
> Jeff Layton (43):
>   vfs: export new_inode_pseudo
>   fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
>   fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>   fscrypt: add fscrypt_context_for_new_inode
>   ceph: preallocate inode for ops that may create one
>   ceph: crypto context handling for ceph
>   ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
>   ceph: add fscrypt_* handling to caps.c
>   ceph: add ability to set fscrypt_auth via setattr
>   ceph: implement -o test_dummy_encryption mount option
>   ceph: decode alternate_name in lease info
>   ceph: add fscrypt ioctls
>   ceph: make ceph_msdc_build_path use ref-walk
>   ceph: add encrypted fname handling to ceph_mdsc_build_path
>   ceph: send altname in MClientRequest
>   ceph: encode encrypted name in dentry release
>   ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>   ceph: make d_revalidate call fscrypt revalidator for encrypted
>     dentries
>   ceph: add helpers for converting names for userland presentation
>   ceph: add fscrypt support to ceph_fill_trace
>   ceph: add support to readdir for encrypted filenames
>   ceph: create symlinks with encrypted and base64-encoded targets
>   ceph: make ceph_get_name decrypt filenames
>   ceph: add a new ceph.fscrypt.auth vxattr
>   ceph: add some fscrypt guardrails
>   libceph: add CEPH_OSD_OP_ASSERT_VER support
>   ceph: size handling for encrypted inodes in cap updates
>   ceph: fscrypt_file field handling in MClientRequest messages
>   ceph: get file size from fscrypt_file when present in inode traces
>   ceph: handle fscrypt fields in cap messages from MDS
>   ceph: add infrastructure for file encryption and decryption
>   libceph: allow ceph_osdc_new_request to accept a multi-op read
>   ceph: disable fallocate for encrypted inodes
>   ceph: disable copy offload on encrypted inodes
>   ceph: don't use special DIO path for encrypted inodes
>   ceph: set encryption context on open
>   ceph: align data in pages in ceph_sync_write
>   ceph: add read/modify/write to ceph_sync_write
>   ceph: plumb in decryption during sync reads
>   ceph: set i_blkbits to crypto block size for encrypted inodes
>   ceph: add fscrypt decryption support to ceph_netfs_issue_op
>   ceph: add encryption support to writepage
>   ceph: fscrypt support for writepages
>
> Luis Henriques (1):
>   ceph: don't allow changing layout on encrypted files/directories
>
> Xiubo Li (4):
>   ceph: add __ceph_get_caps helper support
>   ceph: add __ceph_sync_read helper support
>   ceph: add object version support for sync read
>   ceph: add truncate size handling support for fscrypt
>
>  fs/ceph/Makefile                |   1 +
>  fs/ceph/acl.c                   |   4 +-
>  fs/ceph/addr.c                  | 128 +++++--
>  fs/ceph/caps.c                  | 211 ++++++++++--
>  fs/ceph/crypto.c                | 374 +++++++++++++++++++++
>  fs/ceph/crypto.h                | 237 +++++++++++++
>  fs/ceph/dir.c                   | 209 +++++++++---
>  fs/ceph/export.c                |  44 ++-
>  fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
>  fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
>  fs/ceph/ioctl.c                 |  87 +++++
>  fs/ceph/mds_client.c            | 349 ++++++++++++++++---
>  fs/ceph/mds_client.h            |  24 +-
>  fs/ceph/super.c                 |  90 ++++-
>  fs/ceph/super.h                 |  43 ++-
>  fs/ceph/xattr.c                 |  29 ++
>  fs/crypto/fname.c               |  44 ++-
>  fs/crypto/fscrypt_private.h     |   9 +-
>  fs/crypto/hooks.c               |   6 +-
>  fs/crypto/policy.c              |  35 +-
>  fs/inode.c                      |   1 +
>  include/linux/ceph/ceph_fs.h    |  21 +-
>  include/linux/ceph/osd_client.h |   6 +-
>  include/linux/ceph/rados.h      |   4 +
>  include/linux/fscrypt.h         |  10 +
>  net/ceph/osd_client.c           |  32 +-
>  26 files changed, 2700 insertions(+), 350 deletions(-)
>  create mode 100644 fs/ceph/crypto.c
>  create mode 100644 fs/ceph/crypto.h
>
> -- 
> 2.34.1
>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 17:57 ` Luís Henriques
@ 2022-02-14 18:39   ` Jeff Layton
  2022-02-14 21:00     ` Luís Henriques
  2022-02-16 16:13     ` Luís Henriques
  0 siblings, 2 replies; 84+ messages in thread
From: Jeff Layton @ 2022-02-14 18:39 UTC (permalink / raw)
  To: Luís Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Mon, 2022-02-14 at 17:57 +0000, Luís Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > This patchset represents a (mostly) complete rough draft of fscrypt
> > support for cephfs. The context, filename and symlink support is more or
> > less the same as the versions posted before, and comprise the first half
> > of the patches.
> > 
> > The new bits here are the size handling changes and support for content
> > encryption, in buffered, direct and synchronous codepaths. Much of this
> > code is still very rough and needs a lot of cleanup work.
> > 
> > fscrypt support relies on some MDS changes that are being tracked here:
> > 
> >     https://github.com/ceph/ceph/pull/43588
> > 
> 
> Please correct me if I'm wrong (and I've a feeling that I *will* be
> wrong): we're still missing some mechanism that prevents clients that do
> not support fscrypt from creating new files in an encrypted directory,
> right?  I'm pretty sure I've discussed this "somewhere" with "someone",
> but I can't remember anything else.
> 
> At this point, I can create an encrypted directory and, from a different
> client (that doesn't support fscrypt), create a new non-encrypted file in
> that directory.  The result isn't good, of course.
> 
> I guess that a new feature bit can be used so that the MDS won't allow any
> sort of operations (or, at least, write/create operations) on encrypted
> dirs from clients that don't have this bit set.
> 
> So, am I missing something or is this still on the TODO list?
> 
> (I can try to have a look at it if this is still missing.)
> 
> Cheers,

It's still on the TODO list.

Basically, I think we'll want to allow non-fscrypt-enabled clients to
stat and readdir in an fscrypt-enabled directory tree, and unlink files
and directories in it.

They should have no need to do anything else. You can't run backups from
such clients since you wouldn't have the real size or crypto context.
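
A rough model of that policy, as a sketch only: the operation names, the
feature flag and the function below are all hypothetical, and the real
check would live in the MDS rather than in a script like this.

```python
# Operations a client without fscrypt support would still be allowed to
# perform inside an encrypted tree, per the description above. "rmdir"
# stands in for unlinking directories. All names are illustrative.
ALLOWED_WITHOUT_FSCRYPT = {"stat", "readdir", "unlink", "rmdir"}


def mds_allows(op, dir_is_encrypted, client_has_fscrypt_feature):
    """Hypothetical gate: decide whether the MDS should accept `op`."""
    if not dir_is_encrypted or client_has_fscrypt_feature:
        # Unencrypted trees and fscrypt-capable clients are unrestricted.
        return True
    return op in ALLOWED_WITHOUT_FSCRYPT


print(mds_allows("create", True, False))
print(mds_allows("readdir", True, False))
```

With a gate like this, the create-from-an-old-client case described
above would be refused, while listing and cleanup would still work.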
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 18:39   ` Jeff Layton
@ 2022-02-14 21:00     ` Luís Henriques
  2022-02-14 21:10       ` Jeff Layton
  2022-02-16 16:13     ` Luís Henriques
  1 sibling, 1 reply; 84+ messages in thread
From: Luís Henriques @ 2022-02-14 21:00 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Mon, 2022-02-14 at 17:57 +0000, Luís Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > This patchset represents a (mostly) complete rough draft of fscrypt
>> > support for cephfs. The context, filename and symlink support is more or
>> > less the same as the versions posted before, and comprise the first half
>> > of the patches.
>> > 
>> > The new bits here are the size handling changes and support for content
>> > encryption, in buffered, direct and synchronous codepaths. Much of this
>> > code is still very rough and needs a lot of cleanup work.
>> > 
>> > fscrypt support relies on some MDS changes that are being tracked here:
>> > 
>> >     https://github.com/ceph/ceph/pull/43588
>> > 
>> 
>> Please correct me if I'm wrong (and I've a feeling that I *will* be
>> wrong): we're still missing some mechanism that prevents clients that do
>> not support fscrypt from creating new files in an encrypted directory,
>> right?  I'm pretty sure I've discussed this "somewhere" with "someone",
>> but I can't remember anything else.
>> 
>> At this point, I can create an encrypted directory and, from a different
>> client (that doesn't support fscrypt), create a new non-encrypted file in
>> that directory.  The result isn't good, of course.
>> 
>> I guess that a new feature bit can be used so that the MDS won't allow any
>> sort of operations (or, at least, write/create operations) on encrypted
>> dirs from clients that don't have this bit set.
>> 
>> So, am I missing something or is this still on the TODO list?
>> 
>> (I can try to have a look at it if this is still missing.)
>> 
>> Cheers,
>
> It's still on the TODO list.
>
> Basically, I think we'll want to allow non-fscrypt-enabled clients to
> stat and readdir in an fscrypt-enabled directory tree, and unlink files
> and directories in it.
>
> They should have no need to do anything else. You can't run backups from
> such clients since you wouldn't have the real size or crypto context.

Yep, that makes sense.  And do you think that adding a new feature bit is
the best way to sort this out, or did you have another solution in mind?

I'll try to spend some time on this tomorrow.

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 21:00     ` Luís Henriques
@ 2022-02-14 21:10       ` Jeff Layton
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff Layton @ 2022-02-14 21:10 UTC (permalink / raw)
  To: Luís Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

On Mon, 2022-02-14 at 21:00 +0000, Luís Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > On Mon, 2022-02-14 at 17:57 +0000, Luís Henriques wrote:
> > > Jeff Layton <jlayton@kernel.org> writes:
> > > 
> > > > This patchset represents a (mostly) complete rough draft of fscrypt
> > > > support for cephfs. The context, filename and symlink support is more or
> > > > less the same as the versions posted before, and comprise the first half
> > > > of the patches.
> > > > 
> > > > The new bits here are the size handling changes and support for content
> > > > encryption, in buffered, direct and synchronous codepaths. Much of this
> > > > code is still very rough and needs a lot of cleanup work.
> > > > 
> > > > fscrypt support relies on some MDS changes that are being tracked here:
> > > > 
> > > >     https://github.com/ceph/ceph/pull/43588
> > > > 
> > > 
> > > Please correct me if I'm wrong (and I've a feeling that I *will* be
> > > wrong): we're still missing some mechanism that prevents clients that do
> > > not support fscrypt from creating new files in an encrypted directory,
> > > right?  I'm pretty sure I've discussed this "somewhere" with "someone",
> > > but I can't remember anything else.
> > > 
> > > At this point, I can create an encrypted directory and, from a different
> > > client (that doesn't support fscrypt), create a new non-encrypted file in
> > > that directory.  The result isn't good, of course.
> > > 
> > > I guess that a new feature bit can be used so that the MDS won't allow any
> > > sort of operations (or, at least, write/create operations) on encrypted
> > > dirs from clients that don't have this bit set.
> > > 
> > > So, am I missing something or is this still on the TODO list?
> > > 
> > > (I can try to have a look at it if this is still missing.)
> > > 
> > > Cheers,
> > 
> > It's still on the TODO list.
> > 
> > Basically, I think we'll want to allow non-fscrypt-enabled clients to
> > stat and readdir in an fscrypt-enabled directory tree, and unlink files
> > and directories in it.
> > 
> > They should have no need to do anything else. You can't run backups from
> > such clients since you wouldn't have the real size or crypto context.
> 
> Yep, that makes sense.  And do you think that adding a new feature bit is
> the best way to sort this out, or did you have another solution in mind?
> 
> I'll try to spend some time on this tomorrow.
> 

Probably a new cephfs feature bit is fine, and indeed we already have
one for fscrypt anyway. This can probably share a value with the
ALTERNATE_NAME bit.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 12:08     ` Xiubo Li
@ 2022-02-15  0:44       ` Xiubo Li
  0 siblings, 0 replies; 84+ messages in thread
From: Xiubo Li @ 2022-02-15  0:44 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov, xiubli


On 2/14/22 8:08 PM, Xiubo Li wrote:
>
> On 2/14/22 7:33 PM, Jeff Layton wrote:
>> On Mon, 2022-02-14 at 17:37 +0800, Xiubo Li wrote:
>>> Hi Jeff,
>>>
>>> I am using the 'wip-fscrypt' branch to test another issue and hit:
>>>
>>> cp: cannot access './dir___683': No buffer space available
>>> cp: cannot access './dir___686': No buffer space available
>>> cp: cannot access './dir___687': No buffer space available
>>> cp: cannot access './dir___688': No buffer space available
>>> cp: cannot access './dir___689': No buffer space available
>>> cp: cannot access './dir___693': No buffer space available
>>>
>>> ...
>>>
>>> [root@lxbceph1 kcephfs]# diff ./dir___997 /data/backup/kernel/dir___997
>>> diff: ./dir___997: No buffer space available
>>>
>>>
>>> The dmesg logs:
>>>
>>> <7>[ 1256.918228] ceph:  do_getattr inode 0000000089964a71 mask AsXsFs
>>> mode 040755
>>> <7>[ 1256.918232] ceph:  __ceph_caps_issued_mask ino 0x100000009be cap
>>> 0000000014f1c64b issued pAsLsXsFs (mask AsXsFs)
>>> <7>[ 1256.918237] ceph:  __touch_cap 0000000089964a71 cap
>>> 0000000014f1c64b mds0
>>> <7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689
>>> pos 0
>>> <7>[ 1256.918254] ceph:  readdir off 0 -> '.'
>>> <7>[ 1256.918258] ceph:  readdir off 1 -> '..'
>>> <4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105
>>> getting encryption context
>>> <7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689
>>> pos 2
>>> <4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105
>>> getting encryption context
>>> <7>[ 1256.918288] ceph:  release inode 0000000089964a71 dir file
>>> 00000000065cb689
>>> <7>[ 1256.918310] ceph:  __ceph_caps_issued_mask ino 0x1 cap
>>> 00000000aa2afb8b issued pAsLsXsFs (mask Fs)
>>> <7>[ 1257.574593] ceph:  mdsc delayed_work
>>>
>>> I did nothing fscrypt-related after mounting the kclient; I just created
>>> 2000 directories, made some snapshots on the root dir, and then tried
>>> to copy the root directory to the backup.
>>>
>>> - Xiubo
>>>
>> That means that ceph_crypt_get_context returned -ENODATA, which it can
>
> It should be -ENOBUFS.
>
> I am not sure whether it is related to the snapshot stuff. I will try
> without snapshots later.

Without snapshots, I can still see this error.

-- Xiubo

>
> I can debug it later, maybe next week.
>
> -- Xiubo
>
>> do for several different reasons. We probably need to add in some
>> debugging there to see which one it is...
>>
>> TBH, I've done absolutely no testing with snapshots, so it's quite
>> possible there is some interaction there that is causing problems.
>>
>> -- Jeff
>>
>>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>>> This patchset represents a (mostly) complete rough draft of fscrypt
>>>> support for cephfs. The context, filename and symlink support is more or
>>>> less the same as the versions posted before, and comprise the first half
>>>> of the patches.
>>>>
>>>> The new bits here are the size handling changes and support for content
>>>> encryption, in buffered, direct and synchronous codepaths. Much of this
>>>> code is still very rough and needs a lot of cleanup work.
>>>>
>>>> fscrypt support relies on some MDS changes that are being tracked here:
>>>>
>>>>       https://github.com/ceph/ceph/pull/43588
>>>>
>>>> In particular, this PR adds some new opaque fields in the inode that we
>>>> use to store fscrypt-specific information, like the context and the real
>>>> size of a file. That is slated to be merged for the upcoming Quincy
>>>> release (which is sometime this northern spring).
>>>>
>>>> There are still some notable bugs:
>>>>
>>>> 1/ we've identified a few more potential races in truncate handling
>>>> which will probably necessitate a protocol change, as well as changes to
>>>> the MDS and kclient patchsets. The good news is that we think we have
>>>> an approach that will resolve this.
>>>>
>>>> 2/ the kclient doesn't handle reading sparse regions in OSD objects
>>>> properly yet. The client can end up writing to a non-zero offset in a
>>>> non-existent object. Then, if the client tries to read the written
>>>> region back later, it'll get back zeroes and give you garbage when you
>>>> try to decrypt them.
>>>>
>>>> It turns out that the OSD already supports a SPARSE_READ operation, so
>>>> I'm working on implementing that in the kclient to make it not try to
>>>> decrypt the sparse regions.
>>>>
>>>> Still, I was able to run xfstests on this set yesterday. Bug #2 above
>>>> prevented all of the tests from passing, but it didn't oops! I call that
>>>> progress! Given that, I figured this is a good time to post what I have
>>>> so far.
>>>>
>>>> Note that the buffered I/O changes in this set are not suitable for
>>>> merge and will likely end up being discarded. We need to plumb the
>>>> encryption in at the netfs layer, so that we can store encrypted data
>>>> in fscache.
>>>>
>>>> The non-buffered codepaths will likely also need substantial changes
>>>> before merging. It may be simpler to just move that into the netfs layer
>>>> too as cifs will need something similar anyway.
>>>>
>>>> My goal is to get most of this into v5.18, but v5.19 might be more
>>>> realistic. Hopefully I'll have a non-RFC patchset to send in a few
>>>> weeks.
>>>>
>>>> Special thanks to Xiubo who came through with the MDS patches. Also,
>>>> thanks to everyone (especially Eric Biggers) for all of the previous
>>>> reviews. It's much appreciated!
>>>>
>>>> Jeff Layton (43):
>>>>     vfs: export new_inode_pseudo
>>>>     fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode
>>>>     fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>>>>     fscrypt: add fscrypt_context_for_new_inode
>>>>     ceph: preallocate inode for ops that may create one
>>>>     ceph: crypto context handling for ceph
>>>>     ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
>>>>     ceph: add fscrypt_* handling to caps.c
>>>>     ceph: add ability to set fscrypt_auth via setattr
>>>>     ceph: implement -o test_dummy_encryption mount option
>>>>     ceph: decode alternate_name in lease info
>>>>     ceph: add fscrypt ioctls
>>>>     ceph: make ceph_msdc_build_path use ref-walk
>>>>     ceph: add encrypted fname handling to ceph_mdsc_build_path
>>>>     ceph: send altname in MClientRequest
>>>>     ceph: encode encrypted name in dentry release
>>>>     ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>>>>     ceph: make d_revalidate call fscrypt revalidator for encrypted
>>>>       dentries
>>>>     ceph: add helpers for converting names for userland presentation
>>>>     ceph: add fscrypt support to ceph_fill_trace
>>>>     ceph: add support to readdir for encrypted filenames
>>>>     ceph: create symlinks with encrypted and base64-encoded targets
>>>>     ceph: make ceph_get_name decrypt filenames
>>>>     ceph: add a new ceph.fscrypt.auth vxattr
>>>>     ceph: add some fscrypt guardrails
>>>>     libceph: add CEPH_OSD_OP_ASSERT_VER support
>>>>     ceph: size handling for encrypted inodes in cap updates
>>>>     ceph: fscrypt_file field handling in MClientRequest messages
>>>>     ceph: get file size from fscrypt_file when present in inode traces
>>>>     ceph: handle fscrypt fields in cap messages from MDS
>>>>     ceph: add infrastructure for file encryption and decryption
>>>>     libceph: allow ceph_osdc_new_request to accept a multi-op read
>>>>     ceph: disable fallocate for encrypted inodes
>>>>     ceph: disable copy offload on encrypted inodes
>>>>     ceph: don't use special DIO path for encrypted inodes
>>>>     ceph: set encryption context on open
>>>>     ceph: align data in pages in ceph_sync_write
>>>>     ceph: add read/modify/write to ceph_sync_write
>>>>     ceph: plumb in decryption during sync reads
>>>>     ceph: set i_blkbits to crypto block size for encrypted inodes
>>>>     ceph: add fscrypt decryption support to ceph_netfs_issue_op
>>>>     ceph: add encryption support to writepage
>>>>     ceph: fscrypt support for writepages
>>>>
>>>> Luis Henriques (1):
>>>>     ceph: don't allow changing layout on encrypted files/directories
>>>>
>>>> Xiubo Li (4):
>>>>     ceph: add __ceph_get_caps helper support
>>>>     ceph: add __ceph_sync_read helper support
>>>>     ceph: add object version support for sync read
>>>>     ceph: add truncate size handling support for fscrypt
>>>>
>>>>    fs/ceph/Makefile                |   1 +
>>>>    fs/ceph/acl.c                   |   4 +-
>>>>    fs/ceph/addr.c                  | 128 +++++--
>>>>    fs/ceph/caps.c                  | 211 ++++++++++--
>>>>    fs/ceph/crypto.c                | 374 +++++++++++++++++++++
>>>>    fs/ceph/crypto.h                | 237 +++++++++++++
>>>>    fs/ceph/dir.c                   | 209 +++++++++---
>>>>    fs/ceph/export.c                |  44 ++-
>>>>    fs/ceph/file.c                  | 476 +++++++++++++++++++++-----
>>>>    fs/ceph/inode.c                 | 576 +++++++++++++++++++++++++++++---
>>>>    fs/ceph/ioctl.c                 |  87 +++++
>>>>    fs/ceph/mds_client.c            | 349 ++++++++++++++++---
>>>>    fs/ceph/mds_client.h            |  24 +-
>>>>    fs/ceph/super.c                 |  90 ++++-
>>>>    fs/ceph/super.h                 |  43 ++-
>>>>    fs/ceph/xattr.c                 |  29 ++
>>>>    fs/crypto/fname.c               |  44 ++-
>>>>    fs/crypto/fscrypt_private.h     |   9 +-
>>>>    fs/crypto/hooks.c               |   6 +-
>>>>    fs/crypto/policy.c              |  35 +-
>>>>    fs/inode.c                      |   1 +
>>>>    include/linux/ceph/ceph_fs.h    |  21 +-
>>>>    include/linux/ceph/osd_client.h |   6 +-
>>>>    include/linux/ceph/rados.h      |   4 +
>>>>    include/linux/fscrypt.h         |  10 +
>>>>    net/ceph/osd_client.c           |  32 +-
>>>>    26 files changed, 2700 insertions(+), 350 deletions(-)
>>>>    create mode 100644 fs/ceph/crypto.c
>>>>    create mode 100644 fs/ceph/crypto.h
>>>>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 00/48] ceph+fscrypt: full support
  2022-02-14 18:39   ` Jeff Layton
  2022-02-14 21:00     ` Luís Henriques
@ 2022-02-16 16:13     ` Luís Henriques
  1 sibling, 0 replies; 84+ messages in thread
From: Luís Henriques @ 2022-02-16 16:13 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, idryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Mon, 2022-02-14 at 17:57 +0000, Luís Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > This patchset represents a (mostly) complete rough draft of fscrypt
>> > support for cephfs. The context, filename and symlink support is more or
>> > less the same as the versions posted before, and comprise the first half
>> > of the patches.
>> > 
>> > The new bits here are the size handling changes and support for content
>> > encryption, in buffered, direct and synchronous codepaths. Much of this
>> > code is still very rough and needs a lot of cleanup work.
>> > 
>> > fscrypt support relies on some MDS changes that are being tracked here:
>> > 
>> >     https://github.com/ceph/ceph/pull/43588
>> > 
>> 
>> Please correct me if I'm wrong (and I've a feeling that I *will* be
>> wrong): we're still missing some mechanism that prevents clients that do
>> not support fscrypt from creating new files in an encrypted directory,
>> right?  I'm pretty sure I've discussed this "somewhere" with "someone",
>> but I can't remember anything else.
>> 
>> At this point, I can create an encrypted directory and, from a different
>> client (that doesn't support fscrypt), create a new non-encrypted file in
>> that directory.  The result isn't good, of course.
>> 
>> I guess that a new feature bit can be used so that the MDS won't allow any
>> sort of operations (or, at least, write/create operations) on encrypted
>> dirs from clients that don't have this bit set.
>> 
>> So, am I missing something or is this still on the TODO list?
>> 
>> (I can try to have a look at it if this is still missing.)
>> 
>> Cheers,
>
> It's still on the TODO list.
>
> Basically, I think we'll want to allow non-fscrypt-enabled clients to
> stat and readdir in an fscrypt-enabled directory tree, and unlink files
> and directories in it.
>
> They should have no need to do anything else. You can't run backups from
> such clients since you wouldn't have the real size or crypto context.
> -- 
> Jeff Layton <jlayton@kernel.org>

OK, I've looked at the code and I've a patch that works (sort of).  Here's
what I've done:

I'm blocking all the dangerous Ops (CEPH_MDS_OP_{CREATE,MKDIR,...}) early
in the client requests handling code.  I.e., returning -EROFS if the
client session doesn't have the feature *and* the inode has fscrypt_auth
set.
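
A minimal userspace sketch of the check described above, in C for
illustration only (the MDS itself is not written in C). Every name here
(struct session, struct inode_info, enum op, check_fscrypt_access) is a
hypothetical stand-in, not an MDS or kclient symbol. Only CREATE and MKDIR
are listed as dangerous because those are the only ops named above; the
rest of the list is elided there and is left out here too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op codes standing in for the CEPH_MDS_OP_* constants. */
enum op { OP_LOOKUP, OP_GETATTR, OP_READDIR, OP_UNLINK, OP_CREATE, OP_MKDIR };

/* Hypothetical per-client session state: did it advertise fscrypt support? */
struct session {
	bool has_fscrypt_feature;
};

/* Hypothetical inode state: a non-zero length means fscrypt_auth is set. */
struct inode_info {
	size_t fscrypt_auth_len;
};

/* Only the ops named in the mail; the full list is not given there. */
static bool op_is_dangerous(enum op op)
{
	switch (op) {
	case OP_CREATE:
	case OP_MKDIR:
		return true;
	default:
		return false;
	}
}

/*
 * Return -EROFS when a session without the feature bit issues a dangerous
 * op against an inode that has fscrypt_auth set; return 0 otherwise.
 */
static int check_fscrypt_access(const struct session *s,
				const struct inode_info *in, enum op op)
{
	assert(s && in);

	if (!s->has_fscrypt_feature && in->fscrypt_auth_len > 0 &&
	    op_is_dangerous(op))
		return -EROFS;
	return 0;
}
```

With this shape, stat, readdir and unlink from a client without the feature
still go through, which matches what was suggested earlier in the thread. It
does nothing about data writes, since those never pass through this check.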

It sort of works (I still need to find out whether I need any locks; that's
black magic to me!), but it won't prevent a client from doing things like
appending garbage to an encrypted file.  Doing this will obviously make
that file useless, but it's not that much different from non-encrypted
files (sure, in this case it might be possible to recover some data).  But
I'm not seeing an easy way to bring caps into this mix.

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
  2022-01-11 19:15 ` [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces Jeff Layton
@ 2022-02-17  8:25   ` Xiubo Li
  2022-02-17 11:39     ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-02-17  8:25 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 1/12/22 3:15 AM, Jeff Layton wrote:
> ...and store them in the ceph_inode_info.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/file.c       |  2 ++
>   fs/ceph/inode.c      | 18 ++++++++++++++-
>   fs/ceph/mds_client.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
>   fs/ceph/mds_client.h |  4 ++++
>   fs/ceph/super.h      |  6 +++++
>   5 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index ace72a052254..5937a25ddddd 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -597,6 +597,8 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
>   	iinfo.xattr_data = xattr_buf;
>   	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
>   
> +	/* FIXME: set fscrypt_auth and fscrypt_file */
> +
>   	in.ino = cpu_to_le64(vino.ino);
>   	in.snapid = cpu_to_le64(CEPH_NOSNAP);
>   	in.version = cpu_to_le64(1);	// ???
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 649d7a059d7b..d090fe081093 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -609,7 +609,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
>   	INIT_WORK(&ci->i_work, ceph_inode_work);
>   	ci->i_work_mask = 0;
>   	memset(&ci->i_btime, '\0', sizeof(ci->i_btime));
> -
> +#ifdef CONFIG_FS_ENCRYPTION
> +	ci->fscrypt_auth = NULL;
> +	ci->fscrypt_auth_len = 0;
> +#endif
>   	ceph_fscache_inode_init(ci);
>   
>   	return &ci->vfs_inode;
> @@ -620,6 +623,9 @@ void ceph_free_inode(struct inode *inode)
>   	struct ceph_inode_info *ci = ceph_inode(inode);
>   
>   	kfree(ci->i_symlink);
> +#ifdef CONFIG_FS_ENCRYPTION
> +	kfree(ci->fscrypt_auth);
> +#endif
>   	kmem_cache_free(ceph_inode_cachep, ci);
>   }
>   
> @@ -1020,6 +1026,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>   		xattr_blob = NULL;
>   	}
>   
> +#ifdef CONFIG_FS_ENCRYPTION
> +	if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
> +		ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
> +		ci->fscrypt_auth = iinfo->fscrypt_auth;
> +		iinfo->fscrypt_auth = NULL;
> +		iinfo->fscrypt_auth_len = 0;
> +		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
> +	}
> +#endif
> +
>   	/* finally update i_version */
>   	if (le64_to_cpu(info->version) > ci->i_version)
>   		ci->i_version = le64_to_cpu(info->version);
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 57cf21c9199f..bd824e989449 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -184,8 +184,50 @@ static int parse_reply_info_in(void **p, void *end,
>   			info->rsnaps = 0;
>   		}
>   
> +		if (struct_v >= 5) {
> +			u32 alen;
> +
> +			ceph_decode_32_safe(p, end, alen, bad);
> +
> +			while (alen--) {
> +				u32 len;
> +
> +				/* key */
> +				ceph_decode_32_safe(p, end, len, bad);
> +				ceph_decode_skip_n(p, end, len, bad);
> +				/* value */
> +				ceph_decode_32_safe(p, end, len, bad);
> +				ceph_decode_skip_n(p, end, len, bad);
> +			}
> +		}
> +
> +		/* fscrypt flag -- ignore */
> +		if (struct_v >= 6)
> +			ceph_decode_skip_8(p, end, bad);
> +
> +		info->fscrypt_auth = NULL;
> +		info->fscrypt_file = NULL;

The 'fscrypt_auth_len' and 'fscrypt_file_len' should also be reset here.
Otherwise we will hit the issue I mentioned earlier, shown below:


cp: cannot access './dir___683': No buffer space available
cp: cannot access './dir___686': No buffer space available

The dmesg logs:

<7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689
pos 0
<7>[ 1256.918254] ceph:  readdir off 0 -> '.'
<7>[ 1256.918258] ceph:  readdir off 1 -> '..'
<4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105
getting encryption context
<7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689
pos 2
<4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105
getting encryption context


This can be reproduced by using an old ceph cluster without fscrypt
support.

I have also sent out a fix that zeroes the memory when allocating it
in ceph_readdir(), to guard against potential bugs like this.
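
To make the failure mode concrete, here is a minimal userspace sketch (not
the kernel code). It assumes the reply-info struct lives in memory that was
not zeroed, so when the MDS sends struct_v < 7 and only the pointers are
reset, the length fields keep whatever was there before. All names
(reply_info_in, parse_ptrs_only, parse_reset_all, auth_len_nonzero) are
hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct ceph_mds_reply_info_in. */
struct reply_info_in {
	uint8_t *fscrypt_auth;
	uint8_t *fscrypt_file;
	uint32_t fscrypt_auth_len;
	uint32_t fscrypt_file_len;
};

/* Mirrors the patch as posted: only the pointers are reset up front. */
static void parse_ptrs_only(struct reply_info_in *info, int struct_v)
{
	assert(info);

	info->fscrypt_auth = NULL;
	info->fscrypt_file = NULL;
	if (struct_v >= 7) {
		/* Stand-in for decoding: pretend the wire carried empty blobs. */
		info->fscrypt_auth_len = 0;
		info->fscrypt_file_len = 0;
	}
	/* For struct_v < 7 the lengths are never written. */
}

/* Suggested fix: reset the lengths together with the pointers. */
static void parse_reset_all(struct reply_info_in *info, int struct_v)
{
	assert(info);

	info->fscrypt_auth = NULL;
	info->fscrypt_file = NULL;
	info->fscrypt_auth_len = 0;
	info->fscrypt_file_len = 0;
	(void)struct_v; /* a real decoder would fill the fields for v >= 7 */
}

/*
 * First half of the test ceph_fill_inode() applies in the patch before it
 * copies fscrypt_auth and sets S_ENCRYPTED on the inode.
 */
static int auth_len_nonzero(const struct reply_info_in *info)
{
	return info->fscrypt_auth_len != 0;
}
```

If the struct is pre-filled with a junk pattern (for example
memset(&info, 0xAA, sizeof(info))) and parsed with struct_v = 6,
parse_ptrs_only() leaves auth_len_nonzero() true even though the pointer is
NULL, while parse_reset_all() leaves it false. An inode flagged as encrypted
with no usable context is consistent with the "Error -105 getting encryption
context" messages above.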

Thanks

BRs

-- Xiubo


> +		if (struct_v >= 7) {
> +			ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad);
> +			if (info->fscrypt_auth_len) {
> +				info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL);
> +				if (!info->fscrypt_auth)
> +					return -ENOMEM;
> +				ceph_decode_copy_safe(p, end, info->fscrypt_auth,
> +						      info->fscrypt_auth_len, bad);
> +			}
> +			ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad);
> +			if (info->fscrypt_file_len) {
> +				info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL);
> +				if (!info->fscrypt_file)
> +					return -ENOMEM;
> +				ceph_decode_copy_safe(p, end, info->fscrypt_file,
> +						      info->fscrypt_file_len, bad);
> +			}
> +		}
>   		*p = end;
>   	} else {
> +		/* legacy (unversioned) struct */
>   		if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
>   			ceph_decode_64_safe(p, end, info->inline_version, bad);
>   			ceph_decode_32_safe(p, end, info->inline_len, bad);
> @@ -626,8 +668,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
>   
>   static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info)
>   {
> +	int i;
> +
> +	kfree(info->diri.fscrypt_auth);
> +	kfree(info->diri.fscrypt_file);
> +	kfree(info->targeti.fscrypt_auth);
> +	kfree(info->targeti.fscrypt_file);
>   	if (!info->dir_entries)
>   		return;
> +
> +	for (i = 0; i < info->dir_nr; i++) {
> +		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> +
> +		kfree(rde->inode.fscrypt_auth);
> +		kfree(rde->inode.fscrypt_file);
> +	}
>   	free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size));
>   }
>   
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index c3986a412fb5..98a8710807d1 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -88,6 +88,10 @@ struct ceph_mds_reply_info_in {
>   	s32 dir_pin;
>   	struct ceph_timespec btime;
>   	struct ceph_timespec snap_btime;
> +	u8 *fscrypt_auth;
> +	u8 *fscrypt_file;
> +	u32 fscrypt_auth_len;
> +	u32 fscrypt_file_len;
>   	u64 rsnaps;
>   	u64 change_attr;
>   };
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 532ee9fca878..5b4092e5f291 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -433,6 +433,12 @@ struct ceph_inode_info {
>   	struct work_struct i_work;
>   	unsigned long  i_work_mask;
>   
> +#ifdef CONFIG_FS_ENCRYPTION
> +	u32 fscrypt_auth_len;
> +	u32 fscrypt_file_len;
> +	u8 *fscrypt_auth;
> +	u8 *fscrypt_file;
> +#endif
>   #ifdef CONFIG_CEPH_FSCACHE
>   	struct fscache_cookie *fscache;
>   #endif


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
  2022-02-17  8:25   ` Xiubo Li
@ 2022-02-17 11:39     ` Jeff Layton
  2022-02-18  1:09       ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-02-17 11:39 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Thu, 2022-02-17 at 16:25 +0800, Xiubo Li wrote:
> On 1/12/22 3:15 AM, Jeff Layton wrote:
> > ...and store them in the ceph_inode_info.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/ceph/file.c       |  2 ++
> >   fs/ceph/inode.c      | 18 ++++++++++++++-
> >   fs/ceph/mds_client.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
> >   fs/ceph/mds_client.h |  4 ++++
> >   fs/ceph/super.h      |  6 +++++
> >   5 files changed, 84 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index ace72a052254..5937a25ddddd 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -597,6 +597,8 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
> >   	iinfo.xattr_data = xattr_buf;
> >   	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
> >   
> > +	/* FIXME: set fscrypt_auth and fscrypt_file */
> > +
> >   	in.ino = cpu_to_le64(vino.ino);
> >   	in.snapid = cpu_to_le64(CEPH_NOSNAP);
> >   	in.version = cpu_to_le64(1);	// ???
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 649d7a059d7b..d090fe081093 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -609,7 +609,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
> >   	INIT_WORK(&ci->i_work, ceph_inode_work);
> >   	ci->i_work_mask = 0;
> >   	memset(&ci->i_btime, '\0', sizeof(ci->i_btime));
> > -
> > +#ifdef CONFIG_FS_ENCRYPTION
> > +	ci->fscrypt_auth = NULL;
> > +	ci->fscrypt_auth_len = 0;
> > +#endif
> >   	ceph_fscache_inode_init(ci);
> >   
> >   	return &ci->vfs_inode;
> > @@ -620,6 +623,9 @@ void ceph_free_inode(struct inode *inode)
> >   	struct ceph_inode_info *ci = ceph_inode(inode);
> >   
> >   	kfree(ci->i_symlink);
> > +#ifdef CONFIG_FS_ENCRYPTION
> > +	kfree(ci->fscrypt_auth);
> > +#endif
> >   	kmem_cache_free(ceph_inode_cachep, ci);
> >   }
> >   
> > @@ -1020,6 +1026,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
> >   		xattr_blob = NULL;
> >   	}
> >   
> > +#ifdef CONFIG_FS_ENCRYPTION
> > +	if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
> > +		ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
> > +		ci->fscrypt_auth = iinfo->fscrypt_auth;
> > +		iinfo->fscrypt_auth = NULL;
> > +		iinfo->fscrypt_auth_len = 0;
> > +		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
> > +	}
> > +#endif
> > +
> >   	/* finally update i_version */
> >   	if (le64_to_cpu(info->version) > ci->i_version)
> >   		ci->i_version = le64_to_cpu(info->version);
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 57cf21c9199f..bd824e989449 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -184,8 +184,50 @@ static int parse_reply_info_in(void **p, void *end,
> >   			info->rsnaps = 0;
> >   		}
> >   
> > +		if (struct_v >= 5) {
> > +			u32 alen;
> > +
> > +			ceph_decode_32_safe(p, end, alen, bad);
> > +
> > +			while (alen--) {
> > +				u32 len;
> > +
> > +				/* key */
> > +				ceph_decode_32_safe(p, end, len, bad);
> > +				ceph_decode_skip_n(p, end, len, bad);
> > +				/* value */
> > +				ceph_decode_32_safe(p, end, len, bad);
> > +				ceph_decode_skip_n(p, end, len, bad);
> > +			}
> > +		}
> > +
> > +		/* fscrypt flag -- ignore */
> > +		if (struct_v >= 6)
> > +			ceph_decode_skip_8(p, end, bad);
> > +
> > +		info->fscrypt_auth = NULL;
> > +		info->fscrypt_file = NULL;
> 
> The 'fscrypt_auth_len' and 'fscrypt_file_len' should also be reset here. 
> Or we will hit the issue I mentioned below:
> 
> 
> cp: cannot access './dir___683': No buffer space available
> cp: cannot access './dir___686': No buffer space available
> 
> The dmesg logs:
> 
> <7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689
> pos 0
> <7>[ 1256.918254] ceph:  readdir off 0 -> '.'
> <7>[ 1256.918258] ceph:  readdir off 1 -> '..'
> <4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105
> getting encryption context
> <7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689
> pos 2
> <4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105
> getting encryption context
> 
> 
> This can be reproduced when using an old ceph cluster without fscrypt 
> support.
> 
> And also I have sent out one fix to zero the memory when allocating it 
> in ceph_readdir() to fix the potential bug like this.
> 
> Thanks
> 
> BRs
> 
> -- Xiubo
> 
> 

Good catch, Xiubo.

I merged your patch into the testing branch, and fixed this patch to
also zero out the fscrypt_auth_len and fscrypt_file_len. I've also
rebased the wip-fscrypt branch onto the current testing branch.

> > +		if (struct_v >= 7) {
> > +			ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad);
> > +			if (info->fscrypt_auth_len) {
> > +				info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL);
> > +				if (!info->fscrypt_auth)
> > +					return -ENOMEM;
> > +				ceph_decode_copy_safe(p, end, info->fscrypt_auth,
> > +						      info->fscrypt_auth_len, bad);
> > +			}
> > +			ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad);
> > +			if (info->fscrypt_file_len) {
> > +				info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL);
> > +				if (!info->fscrypt_file)
> > +					return -ENOMEM;
> > +				ceph_decode_copy_safe(p, end, info->fscrypt_file,
> > +						      info->fscrypt_file_len, bad);
> > +			}
> > +		}
> >   		*p = end;
> >   	} else {
> > +		/* legacy (unversioned) struct */
> >   		if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
> >   			ceph_decode_64_safe(p, end, info->inline_version, bad);
> >   			ceph_decode_32_safe(p, end, info->inline_len, bad);
> > @@ -626,8 +668,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
> >   
> >   static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info)
> >   {
> > +	int i;
> > +
> > +	kfree(info->diri.fscrypt_auth);
> > +	kfree(info->diri.fscrypt_file);
> > +	kfree(info->targeti.fscrypt_auth);
> > +	kfree(info->targeti.fscrypt_file);
> >   	if (!info->dir_entries)
> >   		return;
> > +
> > +	for (i = 0; i < info->dir_nr; i++) {
> > +		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > +
> > +		kfree(rde->inode.fscrypt_auth);
> > +		kfree(rde->inode.fscrypt_file);
> > +	}
> >   	free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size));
> >   }
> >   
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index c3986a412fb5..98a8710807d1 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -88,6 +88,10 @@ struct ceph_mds_reply_info_in {
> >   	s32 dir_pin;
> >   	struct ceph_timespec btime;
> >   	struct ceph_timespec snap_btime;
> > +	u8 *fscrypt_auth;
> > +	u8 *fscrypt_file;
> > +	u32 fscrypt_auth_len;
> > +	u32 fscrypt_file_len;
> >   	u64 rsnaps;
> >   	u64 change_attr;
> >   };
> > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > index 532ee9fca878..5b4092e5f291 100644
> > --- a/fs/ceph/super.h
> > +++ b/fs/ceph/super.h
> > @@ -433,6 +433,12 @@ struct ceph_inode_info {
> >   	struct work_struct i_work;
> >   	unsigned long  i_work_mask;
> >   
> > +#ifdef CONFIG_FS_ENCRYPTION
> > +	u32 fscrypt_auth_len;
> > +	u32 fscrypt_file_len;
> > +	u8 *fscrypt_auth;
> > +	u8 *fscrypt_file;
> > +#endif
> >   #ifdef CONFIG_CEPH_FSCACHE
> >   	struct fscache_cookie *fscache;
> >   #endif
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces
  2022-02-17 11:39     ` Jeff Layton
@ 2022-02-18  1:09       ` Xiubo Li
  0 siblings, 0 replies; 84+ messages in thread
From: Xiubo Li @ 2022-02-18  1:09 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 2/17/22 7:39 PM, Jeff Layton wrote:
> On Thu, 2022-02-17 at 16:25 +0800, Xiubo Li wrote:
>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>> ...and store them in the ceph_inode_info.
>>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>    fs/ceph/file.c       |  2 ++
>>>    fs/ceph/inode.c      | 18 ++++++++++++++-
>>>    fs/ceph/mds_client.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
>>>    fs/ceph/mds_client.h |  4 ++++
>>>    fs/ceph/super.h      |  6 +++++
>>>    5 files changed, 84 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index ace72a052254..5937a25ddddd 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -597,6 +597,8 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
>>>    	iinfo.xattr_data = xattr_buf;
>>>    	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
>>>    
>>> +	/* FIXME: set fscrypt_auth and fscrypt_file */
>>> +
>>>    	in.ino = cpu_to_le64(vino.ino);
>>>    	in.snapid = cpu_to_le64(CEPH_NOSNAP);
>>>    	in.version = cpu_to_le64(1);	// ???
>>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>>> index 649d7a059d7b..d090fe081093 100644
>>> --- a/fs/ceph/inode.c
>>> +++ b/fs/ceph/inode.c
>>> @@ -609,7 +609,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
>>>    	INIT_WORK(&ci->i_work, ceph_inode_work);
>>>    	ci->i_work_mask = 0;
>>>    	memset(&ci->i_btime, '\0', sizeof(ci->i_btime));
>>> -
>>> +#ifdef CONFIG_FS_ENCRYPTION
>>> +	ci->fscrypt_auth = NULL;
>>> +	ci->fscrypt_auth_len = 0;
>>> +#endif
>>>    	ceph_fscache_inode_init(ci);
>>>    
>>>    	return &ci->vfs_inode;
>>> @@ -620,6 +623,9 @@ void ceph_free_inode(struct inode *inode)
>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>    
>>>    	kfree(ci->i_symlink);
>>> +#ifdef CONFIG_FS_ENCRYPTION
>>> +	kfree(ci->fscrypt_auth);
>>> +#endif
>>>    	kmem_cache_free(ceph_inode_cachep, ci);
>>>    }
>>>    
>>> @@ -1020,6 +1026,16 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
>>>    		xattr_blob = NULL;
>>>    	}
>>>    
>>> +#ifdef CONFIG_FS_ENCRYPTION
>>> +	if (iinfo->fscrypt_auth_len && !ci->fscrypt_auth) {
>>> +		ci->fscrypt_auth_len = iinfo->fscrypt_auth_len;
>>> +		ci->fscrypt_auth = iinfo->fscrypt_auth;
>>> +		iinfo->fscrypt_auth = NULL;
>>> +		iinfo->fscrypt_auth_len = 0;
>>> +		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
>>> +	}
>>> +#endif
>>> +
>>>    	/* finally update i_version */
>>>    	if (le64_to_cpu(info->version) > ci->i_version)
>>>    		ci->i_version = le64_to_cpu(info->version);
>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>> index 57cf21c9199f..bd824e989449 100644
>>> --- a/fs/ceph/mds_client.c
>>> +++ b/fs/ceph/mds_client.c
>>> @@ -184,8 +184,50 @@ static int parse_reply_info_in(void **p, void *end,
>>>    			info->rsnaps = 0;
>>>    		}
>>>    
>>> +		if (struct_v >= 5) {
>>> +			u32 alen;
>>> +
>>> +			ceph_decode_32_safe(p, end, alen, bad);
>>> +
>>> +			while (alen--) {
>>> +				u32 len;
>>> +
>>> +				/* key */
>>> +				ceph_decode_32_safe(p, end, len, bad);
>>> +				ceph_decode_skip_n(p, end, len, bad);
>>> +				/* value */
>>> +				ceph_decode_32_safe(p, end, len, bad);
>>> +				ceph_decode_skip_n(p, end, len, bad);
>>> +			}
>>> +		}
>>> +
>>> +		/* fscrypt flag -- ignore */
>>> +		if (struct_v >= 6)
>>> +			ceph_decode_skip_8(p, end, bad);
>>> +
>>> +		info->fscrypt_auth = NULL;
>>> +		info->fscrypt_file = NULL;
>> The 'fscrypt_auth_len' and 'fscrypt_file_len' should also be reset here.
>> Or we will hit the issue I mentioned below:
>>
>>
>> cp: cannot access './dir___683': No buffer space available
>> cp: cannot access './dir___686': No buffer space available
>>
>> The dmesg logs:
>>
>> <7>[ 1256.918250] ceph:  readdir 0000000089964a71 file 00000000065cb689
>> pos 0
>> <7>[ 1256.918254] ceph:  readdir off 0 -> '.'
>> <7>[ 1256.918258] ceph:  readdir off 1 -> '..'
>> <4>[ 1256.918262] fscrypt (ceph, inode 1099511630270): Error -105
>> getting encryption context
>> <7>[ 1256.918269] ceph:  readdir 0000000089964a71 file 00000000065cb689
>> pos 2
>> <4>[ 1256.918273] fscrypt (ceph, inode 1099511630270): Error -105
>> getting encryption context
>>
>>
>> This can be reproduced when using an old ceph cluster without fscrypt
>> support.
>>
>> And also I have sent out one fix to zero the memory when allocating it
>> in ceph_readdir() to fix the potential bug like this.
>>
>> Thanks
>>
>> BRs
>>
>> -- Xiubo
>>
>>
> Good catch, Xiubo.
>
> I merged your patch into the testing branch, and fixed this patch to
> also zero out the fscrypt_auth_len and fscrypt_file_len. I've also
> rebased the wip-fscrypt branch onto the current testing branch.

Sure, I will test it.

-- Xiubo

>>> +		if (struct_v >= 7) {
>>> +			ceph_decode_32_safe(p, end, info->fscrypt_auth_len, bad);
>>> +			if (info->fscrypt_auth_len) {
>>> +				info->fscrypt_auth = kmalloc(info->fscrypt_auth_len, GFP_KERNEL);
>>> +				if (!info->fscrypt_auth)
>>> +					return -ENOMEM;
>>> +				ceph_decode_copy_safe(p, end, info->fscrypt_auth,
>>> +						      info->fscrypt_auth_len, bad);
>>> +			}
>>> +			ceph_decode_32_safe(p, end, info->fscrypt_file_len, bad);
>>> +			if (info->fscrypt_file_len) {
>>> +				info->fscrypt_file = kmalloc(info->fscrypt_file_len, GFP_KERNEL);
>>> +				if (!info->fscrypt_file)
>>> +					return -ENOMEM;
>>> +				ceph_decode_copy_safe(p, end, info->fscrypt_file,
>>> +						      info->fscrypt_file_len, bad);
>>> +			}
>>> +		}
>>>    		*p = end;
>>>    	} else {
>>> +		/* legacy (unversioned) struct */
>>>    		if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
>>>    			ceph_decode_64_safe(p, end, info->inline_version, bad);
>>>    			ceph_decode_32_safe(p, end, info->inline_len, bad);
>>> @@ -626,8 +668,21 @@ static int parse_reply_info(struct ceph_mds_session *s, struct ceph_msg *msg,
>>>    
>>>    static void destroy_reply_info(struct ceph_mds_reply_info_parsed *info)
>>>    {
>>> +	int i;
>>> +
>>> +	kfree(info->diri.fscrypt_auth);
>>> +	kfree(info->diri.fscrypt_file);
>>> +	kfree(info->targeti.fscrypt_auth);
>>> +	kfree(info->targeti.fscrypt_file);
>>>    	if (!info->dir_entries)
>>>    		return;
>>> +
>>> +	for (i = 0; i < info->dir_nr; i++) {
>>> +		struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
>>> +
>>> +		kfree(rde->inode.fscrypt_auth);
>>> +		kfree(rde->inode.fscrypt_file);
>>> +	}
>>>    	free_pages((unsigned long)info->dir_entries, get_order(info->dir_buf_size));
>>>    }
>>>    
>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>> index c3986a412fb5..98a8710807d1 100644
>>> --- a/fs/ceph/mds_client.h
>>> +++ b/fs/ceph/mds_client.h
>>> @@ -88,6 +88,10 @@ struct ceph_mds_reply_info_in {
>>>    	s32 dir_pin;
>>>    	struct ceph_timespec btime;
>>>    	struct ceph_timespec snap_btime;
>>> +	u8 *fscrypt_auth;
>>> +	u8 *fscrypt_file;
>>> +	u32 fscrypt_auth_len;
>>> +	u32 fscrypt_file_len;
>>>    	u64 rsnaps;
>>>    	u64 change_attr;
>>>    };
>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>> index 532ee9fca878..5b4092e5f291 100644
>>> --- a/fs/ceph/super.h
>>> +++ b/fs/ceph/super.h
>>> @@ -433,6 +433,12 @@ struct ceph_inode_info {
>>>    	struct work_struct i_work;
>>>    	unsigned long  i_work_mask;
>>>    
>>> +#ifdef CONFIG_FS_ENCRYPTION
>>> +	u32 fscrypt_auth_len;
>>> +	u32 fscrypt_file_len;
>>> +	u8 *fscrypt_auth;
>>> +	u8 *fscrypt_file;
>>> +#endif
>>>    #ifdef CONFIG_CEPH_FSCACHE
>>>    	struct fscache_cookie *fscache;
>>>    #endif



* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-01-11 19:15 ` [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info Jeff Layton
@ 2022-03-01 10:57   ` Xiubo Li
  2022-03-01 11:18     ` Xiubo Li
  2022-03-01 13:10     ` Jeff Layton
  0 siblings, 2 replies; 84+ messages in thread
From: Xiubo Li @ 2022-03-01 10:57 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 1/12/22 3:15 AM, Jeff Layton wrote:
> Ceph is a bit different from local filesystems, in that we don't want
> to store filenames as raw binary data, since we may also be dealing
> with clients that don't support fscrypt.
>
> We could just base64-encode the encrypted filenames, but that could
> leave us with filenames longer than NAME_MAX. It turns out that the
> MDS doesn't care much about filename length, but the clients do.
>
> To manage this, we've added a new "alternate name" field that can be
> optionally added to any dentry that we'll use to store the binary
> crypttext of the filename if its base64-encoded value will be longer
> than NAME_MAX. When a dentry has one of these names attached, the MDS
> will send it along in the lease info, which we can then store for
> later usage.
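
To make the length problem concrete, here is a minimal userspace sketch.
It assumes an unpadded base64 encoding (4 output characters per 3 input
bytes) and a NAME_MAX of 255; the helper names are hypothetical and the
real client may apply a different threshold.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NAME_MAX 255	/* assumed NAME_MAX; typical on Linux */

/* Length of n bytes once encoded as unpadded base64 (4 chars per 3 bytes). */
static size_t b64_unpadded_len(size_t n)
{
	return (n * 4 + 2) / 3;
}

/* Nonzero if the encoded ciphertext would no longer fit in a dentry name. */
static int needs_altname(size_t ciphertext_len)
{
	return b64_unpadded_len(ciphertext_len) > SKETCH_NAME_MAX;
}

/* Small self-check around the boundary. */
static void altname_selfcheck(void)
{
	assert(!needs_altname(191));	/* encodes to exactly 255 chars */
	assert(needs_altname(192));	/* encodes to 256 chars */
}
```

Under these assumptions, any ciphertext longer than 191 bytes encodes to
more than NAME_MAX characters, which is the case the alternate name field
is meant to cover.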
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
>   fs/ceph/mds_client.h | 11 +++++++----
>   2 files changed, 37 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 34a4f6dbac9d..709f3f654555 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
>   
>   static int parse_reply_info_lease(void **p, void *end,
>   				  struct ceph_mds_reply_lease **lease,
> -				  u64 features)
> +				  u64 features, u32 *altname_len, u8 **altname)
>   {
> +	u8 struct_v;
> +	u32 struct_len;
> +
>   	if (features == (u64)-1) {
> -		u8 struct_v, struct_compat;
> -		u32 struct_len;
> +		u8 struct_compat;
> +
>   		ceph_decode_8_safe(p, end, struct_v, bad);
>   		ceph_decode_8_safe(p, end, struct_compat, bad);
> +
>   		/* struct_v is expected to be >= 1. we only understand
>   		 * encoding whose struct_compat == 1. */
>   		if (!struct_v || struct_compat != 1)
>   			goto bad;
> +
>   		ceph_decode_32_safe(p, end, struct_len, bad);
> -		ceph_decode_need(p, end, struct_len, bad);
> -		end = *p + struct_len;

Hi Jeff,

This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.

The following patch will fix it. We should skip any extra trailing bytes anyway.


diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 94b4c6508044..3dea96df4769 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
                         goto bad;

                 ceph_decode_32_safe(p, end, struct_len, bad);
+               end = *p + struct_len;
         } else {
                 struct_len = sizeof(**lease);
                 *altname_len = 0;
@@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
                         *altname = NULL;
                         *altname_len = 0;
                 }
+               *p = end;
         }
         return 0;
  bad:



> +	} else {
> +		struct_len = sizeof(**lease);
> +		*altname_len = 0;
> +		*altname = NULL;
>   	}
>   
> -	ceph_decode_need(p, end, sizeof(**lease), bad);
> +	ceph_decode_need(p, end, struct_len, bad);
>   	*lease = *p;
>   	*p += sizeof(**lease);
> -	if (features == (u64)-1)
> -		*p = end;
> +
> +	if (features == (u64)-1) {
> +		if (struct_v >= 2) {
> +			ceph_decode_32_safe(p, end, *altname_len, bad);
> +			ceph_decode_need(p, end, *altname_len, bad);
> +			*altname = *p;
> +			*p += *altname_len;
> +		} else {
> +			*altname = NULL;
> +			*altname_len = 0;
> +		}
> +	}
>   	return 0;
>   bad:
>   	return -EIO;
> @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
>   		info->dname = *p;
>   		*p += info->dname_len;
>   
> -		err = parse_reply_info_lease(p, end, &info->dlease, features);
> +		err = parse_reply_info_lease(p, end, &info->dlease, features,
> +					     &info->altname_len, &info->altname);
>   		if (err < 0)
>   			goto out_bad;
>   	}
> @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
>   		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
>   
>   		/* dentry lease */
> -		err = parse_reply_info_lease(p, end, &rde->lease, features);
> +		err = parse_reply_info_lease(p, end, &rde->lease, features,
> +					     &rde->altname_len, &rde->altname);
>   		if (err)
>   			goto out_bad;
> +
>   		/* inode */
>   		err = parse_reply_info_in(p, end, &rde->inode, features);
>   		if (err < 0)
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e7d2c8a1b9c1..128901a847af 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -29,8 +29,8 @@ enum ceph_feature_type {
>   	CEPHFS_FEATURE_MULTI_RECONNECT,
>   	CEPHFS_FEATURE_DELEG_INO,
>   	CEPHFS_FEATURE_METRIC_COLLECT,
> -
> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
> +	CEPHFS_FEATURE_ALTERNATE_NAME,
> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
>   };
>   
>   /*
> @@ -45,8 +45,7 @@ enum ceph_feature_type {
>   	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>   	CEPHFS_FEATURE_DELEG_INO,		\
>   	CEPHFS_FEATURE_METRIC_COLLECT,		\
> -						\
> -	CEPHFS_FEATURE_MAX,			\
> +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
>   }
>   #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>   
> @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
>   
>   struct ceph_mds_reply_dir_entry {
>   	char                          *name;
> +	u8			      *altname;
>   	u32                           name_len;
> +	u32			      altname_len;
>   	struct ceph_mds_reply_lease   *lease;
>   	struct ceph_mds_reply_info_in inode;
>   	loff_t			      offset;
> @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
>   	struct ceph_mds_reply_info_in diri, targeti;
>   	struct ceph_mds_reply_dirfrag *dirfrag;
>   	char                          *dname;
> +	u8			      *altname;
>   	u32                           dname_len;
> +	u32                           altname_len;
>   	struct ceph_mds_reply_lease   *dlease;
>   
>   	/* extra */



* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 10:57   ` Xiubo Li
@ 2022-03-01 11:18     ` Xiubo Li
  2022-03-01 13:10     ` Jeff Layton
  1 sibling, 0 replies; 84+ messages in thread
From: Xiubo Li @ 2022-03-01 11:18 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 3/1/22 6:57 PM, Xiubo Li wrote:
>
> On 1/12/22 3:15 AM, Jeff Layton wrote:
>> Ceph is a bit different from local filesystems, in that we don't want
>> to store filenames as raw binary data, since we may also be dealing
>> with clients that don't support fscrypt.
>>
>> We could just base64-encode the encrypted filenames, but that could
>> leave us with filenames longer than NAME_MAX. It turns out that the
>> MDS doesn't care much about filename length, but the clients do.
>>
>> To manage this, we've added a new "alternate name" field that can be
>> optionally added to any dentry that we'll use to store the binary
>> crypttext of the filename if its base64-encoded value will be longer
>> than NAME_MAX. When a dentry has one of these names attached, the MDS
>> will send it along in the lease info, which we can then store for
>> later usage.
>>
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>   fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
>>   fs/ceph/mds_client.h | 11 +++++++----
>>   2 files changed, 37 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 34a4f6dbac9d..709f3f654555 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void 
>> *end,
>>     static int parse_reply_info_lease(void **p, void *end,
>>                     struct ceph_mds_reply_lease **lease,
>> -                  u64 features)
>> +                  u64 features, u32 *altname_len, u8 **altname)
>>   {
>> +    u8 struct_v;
>> +    u32 struct_len;
>> +
>>       if (features == (u64)-1) {
>> -        u8 struct_v, struct_compat;
>> -        u32 struct_len;
>> +        u8 struct_compat;
>> +
>>           ceph_decode_8_safe(p, end, struct_v, bad);
>>           ceph_decode_8_safe(p, end, struct_compat, bad);
>> +
>>           /* struct_v is expected to be >= 1. we only understand
>>            * encoding whose struct_compat == 1. */
>>           if (!struct_v || struct_compat != 1)
>>               goto bad;
>> +
>>           ceph_decode_32_safe(p, end, struct_len, bad);
>> -        ceph_decode_need(p, end, struct_len, bad);
>> -        end = *p + struct_len;
>
> Hi Jeff,
>
> This is buggy; for more detail, please see
> https://tracker.ceph.com/issues/54430.
>
> The following patch will fix it. We should skip any extra trailing
> bytes anyway.
>
>
Hi Jeff,

I will send out a patch series to fix this later.

And we could fold the new patch into this one.

- Xiubo



> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 94b4c6508044..3dea96df4769 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void 
> *end,
>                         goto bad;
>
>                 ceph_decode_32_safe(p, end, struct_len, bad);
> +               end = *p + struct_len;
>         } else {
>                 struct_len = sizeof(**lease);
>                 *altname_len = 0;
> @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void 
> *end,
>                         *altname = NULL;
>                         *altname_len = 0;
>                 }
> +               *p = end;
>         }
>         return 0;
>  bad:
>
>
>
>> +    } else {
>> +        struct_len = sizeof(**lease);
>> +        *altname_len = 0;
>> +        *altname = NULL;
>>       }
>>   -    ceph_decode_need(p, end, sizeof(**lease), bad);
>> +    ceph_decode_need(p, end, struct_len, bad);
>>       *lease = *p;
>>       *p += sizeof(**lease);
>> -    if (features == (u64)-1)
>> -        *p = end;
>> +
>> +    if (features == (u64)-1) {
>> +        if (struct_v >= 2) {
>> +            ceph_decode_32_safe(p, end, *altname_len, bad);
>> +            ceph_decode_need(p, end, *altname_len, bad);
>> +            *altname = *p;
>> +            *p += *altname_len;
>> +        } else {
>> +            *altname = NULL;
>> +            *altname_len = 0;
>> +        }
>> +    }
>>       return 0;
>>   bad:
>>       return -EIO;
>> @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void 
>> *end,
>>           info->dname = *p;
>>           *p += info->dname_len;
>>   -        err = parse_reply_info_lease(p, end, &info->dlease, 
>> features);
>> +        err = parse_reply_info_lease(p, end, &info->dlease, features,
>> +                         &info->altname_len, &info->altname);
>>           if (err < 0)
>>               goto out_bad;
>>       }
>> @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, 
>> void *end,
>>           dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
>>             /* dentry lease */
>> -        err = parse_reply_info_lease(p, end, &rde->lease, features);
>> +        err = parse_reply_info_lease(p, end, &rde->lease, features,
>> +                         &rde->altname_len, &rde->altname);
>>           if (err)
>>               goto out_bad;
>> +
>>           /* inode */
>>           err = parse_reply_info_in(p, end, &rde->inode, features);
>>           if (err < 0)
>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>> index e7d2c8a1b9c1..128901a847af 100644
>> --- a/fs/ceph/mds_client.h
>> +++ b/fs/ceph/mds_client.h
>> @@ -29,8 +29,8 @@ enum ceph_feature_type {
>>       CEPHFS_FEATURE_MULTI_RECONNECT,
>>       CEPHFS_FEATURE_DELEG_INO,
>>       CEPHFS_FEATURE_METRIC_COLLECT,
>> -
>> -    CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>> +    CEPHFS_FEATURE_ALTERNATE_NAME,
>> +    CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
>>   };
>>     /*
>> @@ -45,8 +45,7 @@ enum ceph_feature_type {
>>       CEPHFS_FEATURE_MULTI_RECONNECT,        \
>>       CEPHFS_FEATURE_DELEG_INO,        \
>>       CEPHFS_FEATURE_METRIC_COLLECT,        \
>> -                        \
>> -    CEPHFS_FEATURE_MAX,            \
>> +    CEPHFS_FEATURE_ALTERNATE_NAME,        \
>>   }
>>   #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>>   @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
>>     struct ceph_mds_reply_dir_entry {
>>       char                          *name;
>> +    u8                  *altname;
>>       u32                           name_len;
>> +    u32                  altname_len;
>>       struct ceph_mds_reply_lease   *lease;
>>       struct ceph_mds_reply_info_in inode;
>>       loff_t                  offset;
>> @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
>>       struct ceph_mds_reply_info_in diri, targeti;
>>       struct ceph_mds_reply_dirfrag *dirfrag;
>>       char                          *dname;
>> +    u8                  *altname;
>>       u32                           dname_len;
>> +    u32                           altname_len;
>>       struct ceph_mds_reply_lease   *dlease;
>>         /* extra */



* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 10:57   ` Xiubo Li
  2022-03-01 11:18     ` Xiubo Li
@ 2022-03-01 13:10     ` Jeff Layton
  2022-03-01 13:51       ` Xiubo Li
  1 sibling, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-03-01 13:10 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
> On 1/12/22 3:15 AM, Jeff Layton wrote:
> > Ceph is a bit different from local filesystems, in that we don't want
> > to store filenames as raw binary data, since we may also be dealing
> > with clients that don't support fscrypt.
> > 
> > We could just base64-encode the encrypted filenames, but that could
> > leave us with filenames longer than NAME_MAX. It turns out that the
> > MDS doesn't care much about filename length, but the clients do.
> > 
> > To manage this, we've added a new "alternate name" field that can be
> > optionally added to any dentry that we'll use to store the binary
> > crypttext of the filename if its base64-encoded value will be longer
> > than NAME_MAX. When a dentry has one of these names attached, the MDS
> > will send it along in the lease info, which we can then store for
> > later usage.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
> >   fs/ceph/mds_client.h | 11 +++++++----
> >   2 files changed, 37 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 34a4f6dbac9d..709f3f654555 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
> >   
> >   static int parse_reply_info_lease(void **p, void *end,
> >   				  struct ceph_mds_reply_lease **lease,
> > -				  u64 features)
> > +				  u64 features, u32 *altname_len, u8 **altname)
> >   {
> > +	u8 struct_v;
> > +	u32 struct_len;
> > +
> >   	if (features == (u64)-1) {
> > -		u8 struct_v, struct_compat;
> > -		u32 struct_len;
> > +		u8 struct_compat;
> > +
> >   		ceph_decode_8_safe(p, end, struct_v, bad);
> >   		ceph_decode_8_safe(p, end, struct_compat, bad);
> > +
> >   		/* struct_v is expected to be >= 1. we only understand
> >   		 * encoding whose struct_compat == 1. */
> >   		if (!struct_v || struct_compat != 1)
> >   			goto bad;
> > +
> >   		ceph_decode_32_safe(p, end, struct_len, bad);
> > -		ceph_decode_need(p, end, struct_len, bad);
> > -		end = *p + struct_len;
> 
> Hi Jeff,
> 
> > This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
> > 
> > The following patch will fix it. We should skip any extra trailing bytes anyway.
> 
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 94b4c6508044..3dea96df4769 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
>                          goto bad;
> 
>                  ceph_decode_32_safe(p, end, struct_len, bad);
> +               end = *p + struct_len;


There may be a bug here, but this doesn't look like the right fix. "end"
denotes the end of the buffer we're decoding. We don't generally want to
go changing it like this. Consider what would happen if the original
"end" was shorter than *p + struct_len.

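The concern above can be illustrated with a minimal userspace sketch
(not the kernel code; sub_end() is a hypothetical helper). The decode
window is only narrowed after confirming that the encoded length fits
inside the outer buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the end of a nested struct of struct_len
 * bytes starting at p, or NULL if it would run past the outer end.
 * Assumes p <= end.
 */
static const uint8_t *sub_end(const uint8_t *p, const uint8_t *end,
			      uint32_t struct_len)
{
	if ((size_t)(end - p) < struct_len)
		return NULL;	/* encoded length overruns the buffer */
	return p + struct_len;
}

/* Small self-check: lengths up to the buffer size pass, larger ones fail. */
static void sub_end_selfcheck(void)
{
	uint8_t buf[16] = { 0 };

	assert(sub_end(buf, buf + 16, 8) == buf + 8);
	assert(sub_end(buf, buf + 16, 16) == buf + 16);
	assert(sub_end(buf, buf + 16, 17) == NULL);
}
```

Without the check, a struct_len larger than the remaining buffer would
move the window past the real end, and later bounds checks against it
would pass when they should fail.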

>          } else {
>                  struct_len = sizeof(**lease);
>                  *altname_len = 0;
> @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
>                          *altname = NULL;
>                          *altname_len = 0;
>                  }
> +               *p = end;


I think we just have to do the math here. Maybe this should be something
like this?

    *p += struct_len - sizeof(**lease) - *altname_len;

>          }
>          return 0;
>   bad:
> 
> 
> 


> > +	} else {
> > +		struct_len = sizeof(**lease);
> > +		*altname_len = 0;
> > +		*altname = NULL;
> >   	}
> >   
> > -	ceph_decode_need(p, end, sizeof(**lease), bad);
> > +	ceph_decode_need(p, end, struct_len, bad);
> >   	*lease = *p;
> >   	*p += sizeof(**lease);
> > -	if (features == (u64)-1)
> > -		*p = end;
> > +
> > +	if (features == (u64)-1) {
> > +		if (struct_v >= 2) {
> > +			ceph_decode_32_safe(p, end, *altname_len, bad);
> > +			ceph_decode_need(p, end, *altname_len, bad);
> > +			*altname = *p;
> > +			*p += *altname_len;
> > +		} else {
> > +			*altname = NULL;
> > +			*altname_len = 0;
> > +		}
> > +	}
> >   	return 0;
> >   bad:
> >   	return -EIO;
> > @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
> >   		info->dname = *p;
> >   		*p += info->dname_len;
> >   
> > -		err = parse_reply_info_lease(p, end, &info->dlease, features);
> > +		err = parse_reply_info_lease(p, end, &info->dlease, features,
> > +					     &info->altname_len, &info->altname);
> >   		if (err < 0)
> >   			goto out_bad;
> >   	}
> > @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> >   		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
> >   
> >   		/* dentry lease */
> > -		err = parse_reply_info_lease(p, end, &rde->lease, features);
> > +		err = parse_reply_info_lease(p, end, &rde->lease, features,
> > +					     &rde->altname_len, &rde->altname);
> >   		if (err)
> >   			goto out_bad;
> > +
> >   		/* inode */
> >   		err = parse_reply_info_in(p, end, &rde->inode, features);
> >   		if (err < 0)
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index e7d2c8a1b9c1..128901a847af 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -29,8 +29,8 @@ enum ceph_feature_type {
> >   	CEPHFS_FEATURE_MULTI_RECONNECT,
> >   	CEPHFS_FEATURE_DELEG_INO,
> >   	CEPHFS_FEATURE_METRIC_COLLECT,
> > -
> > -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
> > +	CEPHFS_FEATURE_ALTERNATE_NAME,
> > +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
> >   };
> >   
> >   /*
> > @@ -45,8 +45,7 @@ enum ceph_feature_type {
> >   	CEPHFS_FEATURE_MULTI_RECONNECT,		\
> >   	CEPHFS_FEATURE_DELEG_INO,		\
> >   	CEPHFS_FEATURE_METRIC_COLLECT,		\
> > -						\
> > -	CEPHFS_FEATURE_MAX,			\
> > +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
> >   }
> >   #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
> >   
> > @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
> >   
> >   struct ceph_mds_reply_dir_entry {
> >   	char                          *name;
> > +	u8			      *altname;
> >   	u32                           name_len;
> > +	u32			      altname_len;
> >   	struct ceph_mds_reply_lease   *lease;
> >   	struct ceph_mds_reply_info_in inode;
> >   	loff_t			      offset;
> > @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
> >   	struct ceph_mds_reply_info_in diri, targeti;
> >   	struct ceph_mds_reply_dirfrag *dirfrag;
> >   	char                          *dname;
> > +	u8			      *altname;
> >   	u32                           dname_len;
> > +	u32                           altname_len;
> >   	struct ceph_mds_reply_lease   *dlease;
> >   
> >   	/* extra */
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 13:10     ` Jeff Layton
@ 2022-03-01 13:51       ` Xiubo Li
  2022-03-01 13:57         ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-03-01 13:51 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 3/1/22 9:10 PM, Jeff Layton wrote:
> On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>> Ceph is a bit different from local filesystems, in that we don't want
>>> to store filenames as raw binary data, since we may also be dealing
>>> with clients that don't support fscrypt.
>>>
>>> We could just base64-encode the encrypted filenames, but that could
>>> leave us with filenames longer than NAME_MAX. It turns out that the
>>> MDS doesn't care much about filename length, but the clients do.
>>>
>>> To manage this, we've added a new "alternate name" field that can be
>>> optionally added to any dentry that we'll use to store the binary
>>> crypttext of the filename if its base64-encoded value will be longer
>>> than NAME_MAX. When a dentry has one of these names attached, the MDS
>>> will send it along in the lease info, which we can then store for
>>> later usage.
>>>
>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>    fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
>>>    fs/ceph/mds_client.h | 11 +++++++----
>>>    2 files changed, 37 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>> index 34a4f6dbac9d..709f3f654555 100644
>>> --- a/fs/ceph/mds_client.c
>>> +++ b/fs/ceph/mds_client.c
>>> @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
>>>    
>>>    static int parse_reply_info_lease(void **p, void *end,
>>>    				  struct ceph_mds_reply_lease **lease,
>>> -				  u64 features)
>>> +				  u64 features, u32 *altname_len, u8 **altname)
>>>    {
>>> +	u8 struct_v;
>>> +	u32 struct_len;
>>> +
>>>    	if (features == (u64)-1) {
>>> -		u8 struct_v, struct_compat;
>>> -		u32 struct_len;
>>> +		u8 struct_compat;
>>> +
>>>    		ceph_decode_8_safe(p, end, struct_v, bad);
>>>    		ceph_decode_8_safe(p, end, struct_compat, bad);
>>> +
>>>    		/* struct_v is expected to be >= 1. we only understand
>>>    		 * encoding whose struct_compat == 1. */
>>>    		if (!struct_v || struct_compat != 1)
>>>    			goto bad;
>>> +
>>>    		ceph_decode_32_safe(p, end, struct_len, bad);
>>> -		ceph_decode_need(p, end, struct_len, bad);
>>> -		end = *p + struct_len;
>> Hi Jeff,
>>
>> This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
>>
>> The following patch will fix it. We should skip any extra trailing bytes anyway.
>>
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 94b4c6508044..3dea96df4769 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>                           goto bad;
>>
>>                   ceph_decode_32_safe(p, end, struct_len, bad);
>> +               end = *p + struct_len;
>
> There may be a bug here,

Yeah, this crashes when I test with the PR
https://github.com/ceph/ceph/pull/45208.


> but this doesn't look like the right fix. "end"
> denotes the end of the buffer we're decoding. We don't generally want to
> go changing it like this. Consider what would happen if the original
> "end" was shorter than *p + struct_len.
I missed that you also set struct_len in the else branch.
>
>>           } else {
>>                   struct_len = sizeof(**lease);
>>                   *altname_len = 0;
>> @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>                           *altname = NULL;
>>                           *altname_len = 0;
>>                   }
>> +               *p = end;
>
> I think we just have to do the math here. Maybe this should be something
> like this?
>
>      *p += struct_len - sizeof(**lease) - *altname_len;

This approach works, but if we add tens of new fields in the future, we
must subtract them all here.

How about this one:


diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 94b4c6508044..608d077f2eeb 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -313,6 +313,7 @@ static int parse_reply_info_lease(void **p, void *end,
  {
         u8 struct_v;
         u32 struct_len;
+       void *lend;

         if (features == (u64)-1) {
                 u8 struct_compat;
@@ -332,6 +333,7 @@ static int parse_reply_info_lease(void **p, void *end,
                 *altname = NULL;
         }

+       lend = *p + struct_len;
         ceph_decode_need(p, end, struct_len, bad);
         *lease = *p;
         *p += sizeof(**lease);
@@ -347,6 +349,7 @@ static int parse_reply_info_lease(void **p, void *end,
                         *altname_len = 0;
                 }
         }
+       *p = lend;
         return 0;
  bad:
         return -EIO;
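
As a rough illustration of the same idea outside the kernel, here is a
self-contained userspace sketch. It is not the fs/ceph code:
decode_lease_body(), get_le32() and LEASE_FIXED_LEN are hypothetical
names, and the 10-byte fixed size is only a stand-in for
sizeof(**lease). The end of the struct is computed once, after a bounds
check, and the cursor jumps there on success so that unknown trailing
fields are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEASE_FIXED_LEN 10u	/* stand-in for sizeof(**lease); an assumption */

/* Read a little-endian u32 and advance *p; caller has checked bounds. */
static uint32_t get_le32(const uint8_t **p)
{
	const uint8_t *b = *p;

	*p += 4;
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/*
 * Decode one versioned lease body of struct_len bytes. On success *p is
 * left at the end of the struct, so fields this decoder does not know
 * about are skipped. Returns 0, or -1 on a short or malformed buffer.
 */
static int decode_lease_body(const uint8_t **p, const uint8_t *end,
			     uint8_t struct_v, uint32_t struct_len,
			     const uint8_t **altname, uint32_t *altname_len)
{
	const uint8_t *lend;
	uint32_t len;

	*altname = NULL;
	*altname_len = 0;

	if ((size_t)(end - *p) < struct_len || struct_len < LEASE_FIXED_LEN)
		return -1;
	lend = *p + struct_len;		/* computed only after the check */

	*p += LEASE_FIXED_LEN;		/* fixed-size lease fields */

	if (struct_v >= 2) {
		if (lend - *p < 4)
			return -1;
		len = get_le32(p);	/* 4-byte length prefix */
		if ((size_t)(lend - *p) < len)
			return -1;
		*altname = *p;
		*altname_len = len;
		*p += len;
	}

	*p = lend;			/* skip any trailing, newer fields */
	return 0;
}
```

One detail this makes visible: for struct_v >= 2 the bytes consumed also
include the 4-byte length prefix of the alternate name, so an explicit
subtraction would need to account for that as well. Jumping to a
precomputed end avoids having to track each field.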


>>           }
>>           return 0;
>>    bad:
>>
>>
>>
>
>>> +	} else {
>>> +		struct_len = sizeof(**lease);
>>> +		*altname_len = 0;
>>> +		*altname = NULL;
>>>    	}
>>>    
>>> -	ceph_decode_need(p, end, sizeof(**lease), bad);
>>> +	ceph_decode_need(p, end, struct_len, bad);
>>>    	*lease = *p;
>>>    	*p += sizeof(**lease);
>>> -	if (features == (u64)-1)
>>> -		*p = end;
>>> +
>>> +	if (features == (u64)-1) {
>>> +		if (struct_v >= 2) {
>>> +			ceph_decode_32_safe(p, end, *altname_len, bad);
>>> +			ceph_decode_need(p, end, *altname_len, bad);
>>> +			*altname = *p;
>>> +			*p += *altname_len;
>>> +		} else {
>>> +			*altname = NULL;
>>> +			*altname_len = 0;
>>> +		}
>>> +	}
>>>    	return 0;
>>>    bad:
>>>    	return -EIO;
>>> @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
>>>    		info->dname = *p;
>>>    		*p += info->dname_len;
>>>    
>>> -		err = parse_reply_info_lease(p, end, &info->dlease, features);
>>> +		err = parse_reply_info_lease(p, end, &info->dlease, features,
>>> +					     &info->altname_len, &info->altname);
>>>    		if (err < 0)
>>>    			goto out_bad;
>>>    	}
>>> @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
>>>    		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
>>>    
>>>    		/* dentry lease */
>>> -		err = parse_reply_info_lease(p, end, &rde->lease, features);
>>> +		err = parse_reply_info_lease(p, end, &rde->lease, features,
>>> +					     &rde->altname_len, &rde->altname);
>>>    		if (err)
>>>    			goto out_bad;
>>> +
>>>    		/* inode */
>>>    		err = parse_reply_info_in(p, end, &rde->inode, features);
>>>    		if (err < 0)
>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>> index e7d2c8a1b9c1..128901a847af 100644
>>> --- a/fs/ceph/mds_client.h
>>> +++ b/fs/ceph/mds_client.h
>>> @@ -29,8 +29,8 @@ enum ceph_feature_type {
>>>    	CEPHFS_FEATURE_MULTI_RECONNECT,
>>>    	CEPHFS_FEATURE_DELEG_INO,
>>>    	CEPHFS_FEATURE_METRIC_COLLECT,
>>> -
>>> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,
>>> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
>>>    };
>>>    
>>>    /*
>>> @@ -45,8 +45,7 @@ enum ceph_feature_type {
>>>    	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>>>    	CEPHFS_FEATURE_DELEG_INO,		\
>>>    	CEPHFS_FEATURE_METRIC_COLLECT,		\
>>> -						\
>>> -	CEPHFS_FEATURE_MAX,			\
>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
>>>    }
>>>    #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>>>    
>>> @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
>>>    
>>>    struct ceph_mds_reply_dir_entry {
>>>    	char                          *name;
>>> +	u8			      *altname;
>>>    	u32                           name_len;
>>> +	u32			      altname_len;
>>>    	struct ceph_mds_reply_lease   *lease;
>>>    	struct ceph_mds_reply_info_in inode;
>>>    	loff_t			      offset;
>>> @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
>>>    	struct ceph_mds_reply_info_in diri, targeti;
>>>    	struct ceph_mds_reply_dirfrag *dirfrag;
>>>    	char                          *dname;
>>> +	u8			      *altname;
>>>    	u32                           dname_len;
>>> +	u32                           altname_len;
>>>    	struct ceph_mds_reply_lease   *dlease;
>>>    
>>>    	/* extra */



* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 13:51       ` Xiubo Li
@ 2022-03-01 13:57         ` Jeff Layton
  2022-03-01 14:07           ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-03-01 13:57 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Tue, 2022-03-01 at 21:51 +0800, Xiubo Li wrote:
> On 3/1/22 9:10 PM, Jeff Layton wrote:
> > On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
> > > On 1/12/22 3:15 AM, Jeff Layton wrote:
> > > > Ceph is a bit different from local filesystems, in that we don't want
> > > > to store filenames as raw binary data, since we may also be dealing
> > > > with clients that don't support fscrypt.
> > > > 
> > > > We could just base64-encode the encrypted filenames, but that could
> > > > leave us with filenames longer than NAME_MAX. It turns out that the
> > > > MDS doesn't care much about filename length, but the clients do.
> > > > 
> > > > To manage this, we've added a new "alternate name" field that can be
> > > > optionally added to any dentry that we'll use to store the binary
> > > > crypttext of the filename if its base64-encoded value will be longer
> > > > than NAME_MAX. When a dentry has one of these names attached, the MDS
> > > > will send it along in the lease info, which we can then store for
> > > > later usage.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >    fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
> > > >    fs/ceph/mds_client.h | 11 +++++++----
> > > >    2 files changed, 37 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > index 34a4f6dbac9d..709f3f654555 100644
> > > > --- a/fs/ceph/mds_client.c
> > > > +++ b/fs/ceph/mds_client.c
> > > > @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
> > > >    
> > > >    static int parse_reply_info_lease(void **p, void *end,
> > > >    				  struct ceph_mds_reply_lease **lease,
> > > > -				  u64 features)
> > > > +				  u64 features, u32 *altname_len, u8 **altname)
> > > >    {
> > > > +	u8 struct_v;
> > > > +	u32 struct_len;
> > > > +
> > > >    	if (features == (u64)-1) {
> > > > -		u8 struct_v, struct_compat;
> > > > -		u32 struct_len;
> > > > +		u8 struct_compat;
> > > > +
> > > >    		ceph_decode_8_safe(p, end, struct_v, bad);
> > > >    		ceph_decode_8_safe(p, end, struct_compat, bad);
> > > > +
> > > >    		/* struct_v is expected to be >= 1. we only understand
> > > >    		 * encoding whose struct_compat == 1. */
> > > >    		if (!struct_v || struct_compat != 1)
> > > >    			goto bad;
> > > > +
> > > >    		ceph_decode_32_safe(p, end, struct_len, bad);
> > > > -		ceph_decode_need(p, end, struct_len, bad);
> > > > -		end = *p + struct_len;
> > > Hi Jeff,
> > > 
> > > This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
> > > 
> > > The following patch will fix it. We should skip the extra bytes anyway.
> > > 
> > > 
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index 94b4c6508044..3dea96df4769 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > >                           goto bad;
> > > 
> > >                   ceph_decode_32_safe(p, end, struct_len, bad);
> > > +               end = *p + struct_len;
> > 
> > There may be a bug here,
> 
> Yeah, this will crash when I use the PR
> https://github.com/ceph/ceph/pull/45208.
> 
> 
> > but this doesn't look like the right fix. "end"
> > denotes the end of the buffer we're decoding. We don't generally want to
> > go changing it like this. Consider what would happen if the original
> > "end" was shorter than *p + struct_len.
> I missed that you had also set struct_len in the else branch.
> > 
> > >           } else {
> > >                   struct_len = sizeof(**lease);
> > >                   *altname_len = 0;
> > > @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > >                           *altname = NULL;
> > >                           *altname_len = 0;
> > >                   }
> > > +               *p = end;
> > 
> > I think we just have to do the math here. Maybe this should be something
> > like this?
> > 
> >      *p += struct_len - sizeof(**lease) - *altname_len;
> 
> This is correct, but if we add tens of new fields in the future, we
> must subtract them all here.
> 
> How about this one:
> 
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 94b4c6508044..608d077f2eeb 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -313,6 +313,7 @@ static int parse_reply_info_lease(void **p, void *end,
>   {
>          u8 struct_v;
>          u32 struct_len;
> +       void *lend;
> 
>          if (features == (u64)-1) {
>                  u8 struct_compat;
> @@ -332,6 +333,7 @@ static int parse_reply_info_lease(void **p, void *end,
>                  *altname = NULL;
>          }
> 
> +       lend = *p + struct_len;


Looks reasonable. Maybe also add a check like this?

    if (lend > end)
	    return -EIO;


>          ceph_decode_need(p, end, struct_len, bad);
>          *lease = *p;
>          *p += sizeof(**lease);
> @@ -347,6 +349,7 @@ static int parse_reply_info_lease(void **p, void *end,
>                          *altname_len = 0;
>                  }
>          }
> +       *p = lend;
>          return 0;
>   bad:
>          return -EIO;
> 
> 

> > >           }
> > >           return 0;
> > >    bad:
> > > 
> > > 
> > > 
> > 
> > > > +	} else {
> > > > +		struct_len = sizeof(**lease);
> > > > +		*altname_len = 0;
> > > > +		*altname = NULL;
> > > >    	}
> > > >    
> > > > -	ceph_decode_need(p, end, sizeof(**lease), bad);
> > > > +	ceph_decode_need(p, end, struct_len, bad);
> > > >    	*lease = *p;
> > > >    	*p += sizeof(**lease);
> > > > -	if (features == (u64)-1)
> > > > -		*p = end;
> > > > +
> > > > +	if (features == (u64)-1) {
> > > > +		if (struct_v >= 2) {
> > > > +			ceph_decode_32_safe(p, end, *altname_len, bad);
> > > > +			ceph_decode_need(p, end, *altname_len, bad);
> > > > +			*altname = *p;
> > > > +			*p += *altname_len;
> > > > +		} else {
> > > > +			*altname = NULL;
> > > > +			*altname_len = 0;
> > > > +		}
> > > > +	}
> > > >    	return 0;
> > > >    bad:
> > > >    	return -EIO;
> > > > @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
> > > >    		info->dname = *p;
> > > >    		*p += info->dname_len;
> > > >    
> > > > -		err = parse_reply_info_lease(p, end, &info->dlease, features);
> > > > +		err = parse_reply_info_lease(p, end, &info->dlease, features,
> > > > +					     &info->altname_len, &info->altname);
> > > >    		if (err < 0)
> > > >    			goto out_bad;
> > > >    	}
> > > > @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > >    		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
> > > >    
> > > >    		/* dentry lease */
> > > > -		err = parse_reply_info_lease(p, end, &rde->lease, features);
> > > > +		err = parse_reply_info_lease(p, end, &rde->lease, features,
> > > > +					     &rde->altname_len, &rde->altname);
> > > >    		if (err)
> > > >    			goto out_bad;
> > > > +
> > > >    		/* inode */
> > > >    		err = parse_reply_info_in(p, end, &rde->inode, features);
> > > >    		if (err < 0)
> > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > > index e7d2c8a1b9c1..128901a847af 100644
> > > > --- a/fs/ceph/mds_client.h
> > > > +++ b/fs/ceph/mds_client.h
> > > > @@ -29,8 +29,8 @@ enum ceph_feature_type {
> > > >    	CEPHFS_FEATURE_MULTI_RECONNECT,
> > > >    	CEPHFS_FEATURE_DELEG_INO,
> > > >    	CEPHFS_FEATURE_METRIC_COLLECT,
> > > > -
> > > > -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
> > > > +	CEPHFS_FEATURE_ALTERNATE_NAME,
> > > > +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
> > > >    };
> > > >    
> > > >    /*
> > > > @@ -45,8 +45,7 @@ enum ceph_feature_type {
> > > >    	CEPHFS_FEATURE_MULTI_RECONNECT,		\
> > > >    	CEPHFS_FEATURE_DELEG_INO,		\
> > > >    	CEPHFS_FEATURE_METRIC_COLLECT,		\
> > > > -						\
> > > > -	CEPHFS_FEATURE_MAX,			\
> > > > +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
> > > >    }
> > > >    #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
> > > >    
> > > > @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
> > > >    
> > > >    struct ceph_mds_reply_dir_entry {
> > > >    	char                          *name;
> > > > +	u8			      *altname;
> > > >    	u32                           name_len;
> > > > +	u32			      altname_len;
> > > >    	struct ceph_mds_reply_lease   *lease;
> > > >    	struct ceph_mds_reply_info_in inode;
> > > >    	loff_t			      offset;
> > > > @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
> > > >    	struct ceph_mds_reply_info_in diri, targeti;
> > > >    	struct ceph_mds_reply_dirfrag *dirfrag;
> > > >    	char                          *dname;
> > > > +	u8			      *altname;
> > > >    	u32                           dname_len;
> > > > +	u32                           altname_len;
> > > >    	struct ceph_mds_reply_lease   *dlease;
> > > >    
> > > >    	/* extra */
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 13:57         ` Jeff Layton
@ 2022-03-01 14:07           ` Xiubo Li
  2022-03-01 14:14             ` Jeff Layton
  0 siblings, 1 reply; 84+ messages in thread
From: Xiubo Li @ 2022-03-01 14:07 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 3/1/22 9:57 PM, Jeff Layton wrote:
> On Tue, 2022-03-01 at 21:51 +0800, Xiubo Li wrote:
>> On 3/1/22 9:10 PM, Jeff Layton wrote:
>>> On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
>>>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>>>> Ceph is a bit different from local filesystems, in that we don't want
>>>>> to store filenames as raw binary data, since we may also be dealing
>>>>> with clients that don't support fscrypt.
>>>>>
>>>>> We could just base64-encode the encrypted filenames, but that could
>>>>> leave us with filenames longer than NAME_MAX. It turns out that the
>>>>> MDS doesn't care much about filename length, but the clients do.
>>>>>
>>>>> To manage this, we've added a new "alternate name" field that can be
>>>>> optionally added to any dentry that we'll use to store the binary
>>>>> crypttext of the filename if its base64-encoded value will be longer
>>>>> than NAME_MAX. When a dentry has one of these names attached, the MDS
>>>>> will send it along in the lease info, which we can then store for
>>>>> later usage.
>>>>>
>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>> ---
>>>>>     fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
>>>>>     fs/ceph/mds_client.h | 11 +++++++----
>>>>>     2 files changed, 37 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>> index 34a4f6dbac9d..709f3f654555 100644
>>>>> --- a/fs/ceph/mds_client.c
>>>>> +++ b/fs/ceph/mds_client.c
>>>>> @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
>>>>>     
>>>>>     static int parse_reply_info_lease(void **p, void *end,
>>>>>     				  struct ceph_mds_reply_lease **lease,
>>>>> -				  u64 features)
>>>>> +				  u64 features, u32 *altname_len, u8 **altname)
>>>>>     {
>>>>> +	u8 struct_v;
>>>>> +	u32 struct_len;
>>>>> +
>>>>>     	if (features == (u64)-1) {
>>>>> -		u8 struct_v, struct_compat;
>>>>> -		u32 struct_len;
>>>>> +		u8 struct_compat;
>>>>> +
>>>>>     		ceph_decode_8_safe(p, end, struct_v, bad);
>>>>>     		ceph_decode_8_safe(p, end, struct_compat, bad);
>>>>> +
>>>>>     		/* struct_v is expected to be >= 1. we only understand
>>>>>     		 * encoding whose struct_compat == 1. */
>>>>>     		if (!struct_v || struct_compat != 1)
>>>>>     			goto bad;
>>>>> +
>>>>>     		ceph_decode_32_safe(p, end, struct_len, bad);
>>>>> -		ceph_decode_need(p, end, struct_len, bad);
>>>>> -		end = *p + struct_len;
>>>> Hi Jeff,
>>>>
>>>> This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
>>>>
>>>> The following patch will fix it. We should skip the extra bytes anyway.
>>>>
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index 94b4c6508044..3dea96df4769 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>                            goto bad;
>>>>
>>>>                    ceph_decode_32_safe(p, end, struct_len, bad);
>>>> +               end = *p + struct_len;
>>> There may be a bug here,
>> Yeah, this will crash when I use the PR
>> https://github.com/ceph/ceph/pull/45208.
>>
>>
>>> but this doesn't look like the right fix. "end"
>>> denotes the end of the buffer we're decoding. We don't generally want to
>>> go changing it like this. Consider what would happen if the original
>>> "end" was shorter than *p + struct_len.
>> I missed that you had also set struct_len in the else branch.
>>>>            } else {
>>>>                    struct_len = sizeof(**lease);
>>>>                    *altname_len = 0;
>>>> @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>                            *altname = NULL;
>>>>                            *altname_len = 0;
>>>>                    }
>>>> +               *p = end;
>>> I think we just have to do the math here. Maybe this should be something
>>> like this?
>>>
>>>       *p += struct_len - sizeof(**lease) - *altname_len;
>> This is correct, but if we add tens of new fields in the future, we
>> must subtract them all here.
>>
>> How about this one:
>>
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 94b4c6508044..608d077f2eeb 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -313,6 +313,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>    {
>>           u8 struct_v;
>>           u32 struct_len;
>> +       void *lend;
>>
>>           if (features == (u64)-1) {
>>                   u8 struct_compat;
>> @@ -332,6 +333,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>                   *altname = NULL;
>>           }
>>
>> +       lend = *p + struct_len;
>
> Looks reasonable. Maybe also add a check like this?
>
>      if (lend > end)
> 	    return -EIO;

I don't think this is needed, because the existing

   ceph_decode_need(p, end, struct_len, bad);

call already performs that check, doesn't it?
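
To illustrate the point, here is a minimal user-space sketch of why a
passing "need" check already implies lend <= end. has_room() is only
modelled on the kernel's ceph_has_room(), and substruct_end() is a
hypothetical helper invented for this example; neither is the actual
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ceph_has_room(): true if n bytes fit in [*p, end). */
static bool has_room(void **p, void *end, size_t n)
{
	char *cur = *p, *lim = end;

	return lim >= cur && n <= (size_t)(lim - cur);
}

/*
 * Hypothetical helper: return the end of a length-prefixed sub-struct,
 * or NULL if struct_len would run past the end of the buffer.
 */
static void *substruct_end(void **p, void *end, uint32_t struct_len)
{
	assert(p && *p && end);

	if (!has_room(p, end, struct_len))
		return NULL;	/* mirrors the "goto bad" path */

	/* The check above guarantees *p + struct_len <= end here. */
	return (char *)*p + struct_len;
}
```

Once the bounds check has passed, a separate "lend > end" test can
never fire, so it would be dead code.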


>
>
>>           ceph_decode_need(p, end, struct_len, bad);
>>           *lease = *p;
>>           *p += sizeof(**lease);
>> @@ -347,6 +349,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>                           *altname_len = 0;
>>                   }
>>           }
>> +       *p = lend;
>>           return 0;
>>    bad:
>>           return -EIO;
>>
>>
>>>>            }
>>>>            return 0;
>>>>     bad:
>>>>
>>>>
>>>>
>>>>> +	} else {
>>>>> +		struct_len = sizeof(**lease);
>>>>> +		*altname_len = 0;
>>>>> +		*altname = NULL;
>>>>>     	}
>>>>>     
>>>>> -	ceph_decode_need(p, end, sizeof(**lease), bad);
>>>>> +	ceph_decode_need(p, end, struct_len, bad);
>>>>>     	*lease = *p;
>>>>>     	*p += sizeof(**lease);
>>>>> -	if (features == (u64)-1)
>>>>> -		*p = end;
>>>>> +
>>>>> +	if (features == (u64)-1) {
>>>>> +		if (struct_v >= 2) {
>>>>> +			ceph_decode_32_safe(p, end, *altname_len, bad);
>>>>> +			ceph_decode_need(p, end, *altname_len, bad);
>>>>> +			*altname = *p;
>>>>> +			*p += *altname_len;
>>>>> +		} else {
>>>>> +			*altname = NULL;
>>>>> +			*altname_len = 0;
>>>>> +		}
>>>>> +	}
>>>>>     	return 0;
>>>>>     bad:
>>>>>     	return -EIO;
>>>>> @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
>>>>>     		info->dname = *p;
>>>>>     		*p += info->dname_len;
>>>>>     
>>>>> -		err = parse_reply_info_lease(p, end, &info->dlease, features);
>>>>> +		err = parse_reply_info_lease(p, end, &info->dlease, features,
>>>>> +					     &info->altname_len, &info->altname);
>>>>>     		if (err < 0)
>>>>>     			goto out_bad;
>>>>>     	}
>>>>> @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
>>>>>     		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
>>>>>     
>>>>>     		/* dentry lease */
>>>>> -		err = parse_reply_info_lease(p, end, &rde->lease, features);
>>>>> +		err = parse_reply_info_lease(p, end, &rde->lease, features,
>>>>> +					     &rde->altname_len, &rde->altname);
>>>>>     		if (err)
>>>>>     			goto out_bad;
>>>>> +
>>>>>     		/* inode */
>>>>>     		err = parse_reply_info_in(p, end, &rde->inode, features);
>>>>>     		if (err < 0)
>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>> index e7d2c8a1b9c1..128901a847af 100644
>>>>> --- a/fs/ceph/mds_client.h
>>>>> +++ b/fs/ceph/mds_client.h
>>>>> @@ -29,8 +29,8 @@ enum ceph_feature_type {
>>>>>     	CEPHFS_FEATURE_MULTI_RECONNECT,
>>>>>     	CEPHFS_FEATURE_DELEG_INO,
>>>>>     	CEPHFS_FEATURE_METRIC_COLLECT,
>>>>> -
>>>>> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>>>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,
>>>>> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
>>>>>     };
>>>>>     
>>>>>     /*
>>>>> @@ -45,8 +45,7 @@ enum ceph_feature_type {
>>>>>     	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>>>>>     	CEPHFS_FEATURE_DELEG_INO,		\
>>>>>     	CEPHFS_FEATURE_METRIC_COLLECT,		\
>>>>> -						\
>>>>> -	CEPHFS_FEATURE_MAX,			\
>>>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
>>>>>     }
>>>>>     #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>>>>>     
>>>>> @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
>>>>>     
>>>>>     struct ceph_mds_reply_dir_entry {
>>>>>     	char                          *name;
>>>>> +	u8			      *altname;
>>>>>     	u32                           name_len;
>>>>> +	u32			      altname_len;
>>>>>     	struct ceph_mds_reply_lease   *lease;
>>>>>     	struct ceph_mds_reply_info_in inode;
>>>>>     	loff_t			      offset;
>>>>> @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
>>>>>     	struct ceph_mds_reply_info_in diri, targeti;
>>>>>     	struct ceph_mds_reply_dirfrag *dirfrag;
>>>>>     	char                          *dname;
>>>>> +	u8			      *altname;
>>>>>     	u32                           dname_len;
>>>>> +	u32                           altname_len;
>>>>>     	struct ceph_mds_reply_lease   *dlease;
>>>>>     
>>>>>     	/* extra */



* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 14:07           ` Xiubo Li
@ 2022-03-01 14:14             ` Jeff Layton
  2022-03-01 14:30               ` Xiubo Li
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff Layton @ 2022-03-01 14:14 UTC (permalink / raw)
  To: Xiubo Li, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov

On Tue, 2022-03-01 at 22:07 +0800, Xiubo Li wrote:
> On 3/1/22 9:57 PM, Jeff Layton wrote:
> > On Tue, 2022-03-01 at 21:51 +0800, Xiubo Li wrote:
> > > On 3/1/22 9:10 PM, Jeff Layton wrote:
> > > > On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
> > > > > On 1/12/22 3:15 AM, Jeff Layton wrote:
> > > > > > Ceph is a bit different from local filesystems, in that we don't want
> > > > > > to store filenames as raw binary data, since we may also be dealing
> > > > > > with clients that don't support fscrypt.
> > > > > > 
> > > > > > We could just base64-encode the encrypted filenames, but that could
> > > > > > leave us with filenames longer than NAME_MAX. It turns out that the
> > > > > > MDS doesn't care much about filename length, but the clients do.
> > > > > > 
> > > > > > To manage this, we've added a new "alternate name" field that can be
> > > > > > optionally added to any dentry that we'll use to store the binary
> > > > > > crypttext of the filename if its base64-encoded value will be longer
> > > > > > than NAME_MAX. When a dentry has one of these names attached, the MDS
> > > > > > will send it along in the lease info, which we can then store for
> > > > > > later usage.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > ---
> > > > > >     fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
> > > > > >     fs/ceph/mds_client.h | 11 +++++++----
> > > > > >     2 files changed, 37 insertions(+), 14 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > > index 34a4f6dbac9d..709f3f654555 100644
> > > > > > --- a/fs/ceph/mds_client.c
> > > > > > +++ b/fs/ceph/mds_client.c
> > > > > > @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
> > > > > >     
> > > > > >     static int parse_reply_info_lease(void **p, void *end,
> > > > > >     				  struct ceph_mds_reply_lease **lease,
> > > > > > -				  u64 features)
> > > > > > +				  u64 features, u32 *altname_len, u8 **altname)
> > > > > >     {
> > > > > > +	u8 struct_v;
> > > > > > +	u32 struct_len;
> > > > > > +
> > > > > >     	if (features == (u64)-1) {
> > > > > > -		u8 struct_v, struct_compat;
> > > > > > -		u32 struct_len;
> > > > > > +		u8 struct_compat;
> > > > > > +
> > > > > >     		ceph_decode_8_safe(p, end, struct_v, bad);
> > > > > >     		ceph_decode_8_safe(p, end, struct_compat, bad);
> > > > > > +
> > > > > >     		/* struct_v is expected to be >= 1. we only understand
> > > > > >     		 * encoding whose struct_compat == 1. */
> > > > > >     		if (!struct_v || struct_compat != 1)
> > > > > >     			goto bad;
> > > > > > +
> > > > > >     		ceph_decode_32_safe(p, end, struct_len, bad);
> > > > > > -		ceph_decode_need(p, end, struct_len, bad);
> > > > > > -		end = *p + struct_len;
> > > > > Hi Jeff,
> > > > > 
> > > > > This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
> > > > > 
> > > > > The following patch will fix it. We should skip the extra bytes anyway.
> > > > > 
> > > > > 
> > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > index 94b4c6508044..3dea96df4769 100644
> > > > > --- a/fs/ceph/mds_client.c
> > > > > +++ b/fs/ceph/mds_client.c
> > > > > @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > > > >                            goto bad;
> > > > > 
> > > > >                    ceph_decode_32_safe(p, end, struct_len, bad);
> > > > > +               end = *p + struct_len;
> > > > There may be a bug here,
> > > Yeah, this will crash when I use the PR
> > > https://github.com/ceph/ceph/pull/45208.
> > > 
> > > 
> > > > but this doesn't look like the right fix. "end"
> > > > denotes the end of the buffer we're decoding. We don't generally want to
> > > > go changing it like this. Consider what would happen if the original
> > > > "end" was shorter than *p + struct_len.
> > > I missed that you had also set struct_len in the else branch.
> > > > >            } else {
> > > > >                    struct_len = sizeof(**lease);
> > > > >                    *altname_len = 0;
> > > > > @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > > > >                            *altname = NULL;
> > > > >                            *altname_len = 0;
> > > > >                    }
> > > > > +               *p = end;
> > > > I think we just have to do the math here. Maybe this should be something
> > > > like this?
> > > > 
> > > >       *p += struct_len - sizeof(**lease) - *altname_len;
> > > This is correct, but if we add tens of new fields in the future, we
> > > must subtract them all here.
> > > 
> > > How about this one:
> > > 
> > > 
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index 94b4c6508044..608d077f2eeb 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -313,6 +313,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > >    {
> > >           u8 struct_v;
> > >           u32 struct_len;
> > > +       void *lend;
> > > 
> > >           if (features == (u64)-1) {
> > >                   u8 struct_compat;
> > > @@ -332,6 +333,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > >                   *altname = NULL;
> > >           }
> > > 
> > > +       lend = *p + struct_len;
> > 
> > Looks reasonable. Maybe also add a check like this?
> > 
> >      if (lend > end)
> > 	    return -EIO;
> 
> I don't think this is needed, because the existing
> 
>    ceph_decode_need(p, end, struct_len, bad);
> 
> call already performs that check, doesn't it?
> 
> 

Oh, right... good point. That patch looks fine then.

> > 
> > 
> > >           ceph_decode_need(p, end, struct_len, bad);
> > >           *lease = *p;
> > >           *p += sizeof(**lease);
> > > @@ -347,6 +349,7 @@ static int parse_reply_info_lease(void **p, void *end,
> > >                           *altname_len = 0;
> > >                   }
> > >           }
> > > +       *p = lend;
> > >           return 0;
> > >    bad:
> > >           return -EIO;
> > > 
> > > 
> > > > >            }
> > > > >            return 0;
> > > > >     bad:
> > > > > 
> > > > > 
> > > > > 
> > > > > > +	} else {
> > > > > > +		struct_len = sizeof(**lease);
> > > > > > +		*altname_len = 0;
> > > > > > +		*altname = NULL;
> > > > > >     	}
> > > > > >     
> > > > > > -	ceph_decode_need(p, end, sizeof(**lease), bad);
> > > > > > +	ceph_decode_need(p, end, struct_len, bad);
> > > > > >     	*lease = *p;
> > > > > >     	*p += sizeof(**lease);
> > > > > > -	if (features == (u64)-1)
> > > > > > -		*p = end;
> > > > > > +
> > > > > > +	if (features == (u64)-1) {
> > > > > > +		if (struct_v >= 2) {
> > > > > > +			ceph_decode_32_safe(p, end, *altname_len, bad);
> > > > > > +			ceph_decode_need(p, end, *altname_len, bad);
> > > > > > +			*altname = *p;
> > > > > > +			*p += *altname_len;
> > > > > > +		} else {
> > > > > > +			*altname = NULL;
> > > > > > +			*altname_len = 0;
> > > > > > +		}
> > > > > > +	}
> > > > > >     	return 0;
> > > > > >     bad:
> > > > > >     	return -EIO;
> > > > > > @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
> > > > > >     		info->dname = *p;
> > > > > >     		*p += info->dname_len;
> > > > > >     
> > > > > > -		err = parse_reply_info_lease(p, end, &info->dlease, features);
> > > > > > +		err = parse_reply_info_lease(p, end, &info->dlease, features,
> > > > > > +					     &info->altname_len, &info->altname);
> > > > > >     		if (err < 0)
> > > > > >     			goto out_bad;
> > > > > >     	}
> > > > > > @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > > > >     		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
> > > > > >     
> > > > > >     		/* dentry lease */
> > > > > > -		err = parse_reply_info_lease(p, end, &rde->lease, features);
> > > > > > +		err = parse_reply_info_lease(p, end, &rde->lease, features,
> > > > > > +					     &rde->altname_len, &rde->altname);
> > > > > >     		if (err)
> > > > > >     			goto out_bad;
> > > > > > +
> > > > > >     		/* inode */
> > > > > >     		err = parse_reply_info_in(p, end, &rde->inode, features);
> > > > > >     		if (err < 0)
> > > > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > > > > index e7d2c8a1b9c1..128901a847af 100644
> > > > > > --- a/fs/ceph/mds_client.h
> > > > > > +++ b/fs/ceph/mds_client.h
> > > > > > @@ -29,8 +29,8 @@ enum ceph_feature_type {
> > > > > >     	CEPHFS_FEATURE_MULTI_RECONNECT,
> > > > > >     	CEPHFS_FEATURE_DELEG_INO,
> > > > > >     	CEPHFS_FEATURE_METRIC_COLLECT,
> > > > > > -
> > > > > > -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
> > > > > > +	CEPHFS_FEATURE_ALTERNATE_NAME,
> > > > > > +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
> > > > > >     };
> > > > > >     
> > > > > >     /*
> > > > > > @@ -45,8 +45,7 @@ enum ceph_feature_type {
> > > > > >     	CEPHFS_FEATURE_MULTI_RECONNECT,		\
> > > > > >     	CEPHFS_FEATURE_DELEG_INO,		\
> > > > > >     	CEPHFS_FEATURE_METRIC_COLLECT,		\
> > > > > > -						\
> > > > > > -	CEPHFS_FEATURE_MAX,			\
> > > > > > +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
> > > > > >     }
> > > > > >     #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
> > > > > >     
> > > > > > @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
> > > > > >     
> > > > > >     struct ceph_mds_reply_dir_entry {
> > > > > >     	char                          *name;
> > > > > > +	u8			      *altname;
> > > > > >     	u32                           name_len;
> > > > > > +	u32			      altname_len;
> > > > > >     	struct ceph_mds_reply_lease   *lease;
> > > > > >     	struct ceph_mds_reply_info_in inode;
> > > > > >     	loff_t			      offset;
> > > > > > @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
> > > > > >     	struct ceph_mds_reply_info_in diri, targeti;
> > > > > >     	struct ceph_mds_reply_dirfrag *dirfrag;
> > > > > >     	char                          *dname;
> > > > > > +	u8			      *altname;
> > > > > >     	u32                           dname_len;
> > > > > > +	u32                           altname_len;
> > > > > >     	struct ceph_mds_reply_lease   *dlease;
> > > > > >     
> > > > > >     	/* extra */
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info
  2022-03-01 14:14             ` Jeff Layton
@ 2022-03-01 14:30               ` Xiubo Li
  0 siblings, 0 replies; 84+ messages in thread
From: Xiubo Li @ 2022-03-01 14:30 UTC (permalink / raw)
  To: Jeff Layton, ceph-devel, linux-fscrypt; +Cc: linux-fsdevel, idryomov


On 3/1/22 10:14 PM, Jeff Layton wrote:
> On Tue, 2022-03-01 at 22:07 +0800, Xiubo Li wrote:
>> On 3/1/22 9:57 PM, Jeff Layton wrote:
>>> On Tue, 2022-03-01 at 21:51 +0800, Xiubo Li wrote:
>>>> On 3/1/22 9:10 PM, Jeff Layton wrote:
>>>>> On Tue, 2022-03-01 at 18:57 +0800, Xiubo Li wrote:
>>>>>> On 1/12/22 3:15 AM, Jeff Layton wrote:
>>>>>>> Ceph is a bit different from local filesystems, in that we don't want
>>>>>>> to store filenames as raw binary data, since we may also be dealing
>>>>>>> with clients that don't support fscrypt.
>>>>>>>
>>>>>>> We could just base64-encode the encrypted filenames, but that could
>>>>>>> leave us with filenames longer than NAME_MAX. It turns out that the
>>>>>>> MDS doesn't care much about filename length, but the clients do.
>>>>>>>
>>>>>>> To manage this, we've added a new "alternate name" field that can be
>>>>>>> optionally added to any dentry that we'll use to store the binary
>>>>>>> crypttext of the filename if its base64-encoded value will be longer
>>>>>>> than NAME_MAX. When a dentry has one of these names attached, the MDS
>>>>>>> will send it along in the lease info, which we can then store for
>>>>>>> later usage.
>>>>>>>
>>>>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>>>>> ---
>>>>>>>      fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
>>>>>>>      fs/ceph/mds_client.h | 11 +++++++----
>>>>>>>      2 files changed, 37 insertions(+), 14 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>>> index 34a4f6dbac9d..709f3f654555 100644
>>>>>>> --- a/fs/ceph/mds_client.c
>>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>>> @@ -306,27 +306,44 @@ static int parse_reply_info_dir(void **p, void *end,
>>>>>>>      
>>>>>>>      static int parse_reply_info_lease(void **p, void *end,
>>>>>>>      				  struct ceph_mds_reply_lease **lease,
>>>>>>> -				  u64 features)
>>>>>>> +				  u64 features, u32 *altname_len, u8 **altname)
>>>>>>>      {
>>>>>>> +	u8 struct_v;
>>>>>>> +	u32 struct_len;
>>>>>>> +
>>>>>>>      	if (features == (u64)-1) {
>>>>>>> -		u8 struct_v, struct_compat;
>>>>>>> -		u32 struct_len;
>>>>>>> +		u8 struct_compat;
>>>>>>> +
>>>>>>>      		ceph_decode_8_safe(p, end, struct_v, bad);
>>>>>>>      		ceph_decode_8_safe(p, end, struct_compat, bad);
>>>>>>> +
>>>>>>>      		/* struct_v is expected to be >= 1. we only understand
>>>>>>>      		 * encoding whose struct_compat == 1. */
>>>>>>>      		if (!struct_v || struct_compat != 1)
>>>>>>>      			goto bad;
>>>>>>> +
>>>>>>>      		ceph_decode_32_safe(p, end, struct_len, bad);
>>>>>>> -		ceph_decode_need(p, end, struct_len, bad);
>>>>>>> -		end = *p + struct_len;
>>>>>> Hi Jeff,
>>>>>>
>>>>>> This is buggy; for more detail, please see https://tracker.ceph.com/issues/54430.
>>>>>>
>>>>>> The following patch will fix it. We should skip the extra bytes anyway.
>>>>>>
>>>>>>
>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>> index 94b4c6508044..3dea96df4769 100644
>>>>>> --- a/fs/ceph/mds_client.c
>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>> @@ -326,6 +326,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>>>                             goto bad;
>>>>>>
>>>>>>                     ceph_decode_32_safe(p, end, struct_len, bad);
>>>>>> +               end = *p + struct_len;
>>>>> There may be a bug here,
>>>> Yeah, this will crash when I use the PR
>>>> https://github.com/ceph/ceph/pull/45208.
>>>>
>>>>
>>>>> but this doesn't look like the right fix. "end"
>>>>> denotes the end of the buffer we're decoding. We don't generally want to
>>>>> go changing it like this. Consider what would happen if the original
>>>>> "end" was shorter than *p + struct_len.
>>>> I missed that you also set struct_len in the else branch.
>>>>>>             } else {
>>>>>>                     struct_len = sizeof(**lease);
>>>>>>                     *altname_len = 0;
>>>>>> @@ -346,6 +347,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>>>                             *altname = NULL;
>>>>>>                             *altname_len = 0;
>>>>>>                     }
>>>>>> +               *p = end;
>>>>> I think we just have to do the math here. Maybe this should be something
>>>>> like this?
>>>>>
>>>>>        *p += struct_len - sizeof(**lease) - *altname_len;
>>>> This is correct, but if we add tens of new fields in the future, we
>>>> must subtract them all here.
>>>>
>>>> How about this one:
>>>>
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index 94b4c6508044..608d077f2eeb 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -313,6 +313,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>     {
>>>>            u8 struct_v;
>>>>            u32 struct_len;
>>>> +       void *lend;
>>>>
>>>>            if (features == (u64)-1) {
>>>>                    u8 struct_compat;
>>>> @@ -332,6 +333,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>                    *altname = NULL;
>>>>            }
>>>>
>>>> +       lend = *p + struct_len;
>>> Looks reasonable. Maybe also add a check like this?
>>>
>>>       if (lend > end)
>>> 	    return -EIO;
>> I don't think this is needed because the:
>>
>>     ceph_decode_need(p, end, struct_len, bad);
>>
>> before it will already check this?
>>
>>
> Oh, right....good point. That patch looks fine then.

Cool, I will send out a separate patch to fix it in the wip-fscrypt branch.

- Xiubo
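
For readers following along, here is a rough userspace sketch of the approach agreed on above: record a local end pointer (lend) from struct_len, decode the fields this client understands, then jump to lend so that any fields added by a newer encoder are skipped. It is not the kernel code. parse_lease_sketch(), get_u32() and LEASE_SIZE are hypothetical names, the 8-byte lease payload is an assumed size, and the lower-bound check on struct_len and the use of lend (rather than end) to bound the altname are extra assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEASE_SIZE 8	/* assumed size of the fixed lease payload */

/* Hypothetical helper: bounds-checked little-endian u32 read. */
static int get_u32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	       (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return 0;
}

/* Decode one versioned lease and leave *p just past the whole struct. */
static int parse_lease_sketch(const uint8_t **p, const uint8_t *end,
			      const uint8_t **lease,
			      const uint8_t **altname, uint32_t *altname_len)
{
	uint8_t struct_v, struct_compat;
	uint32_t struct_len;
	const uint8_t *lend;

	*altname = NULL;
	*altname_len = 0;

	if ((size_t)(end - *p) < 2)
		return -1;
	struct_v = *(*p)++;
	struct_compat = *(*p)++;
	if (!struct_v || struct_compat != 1)
		return -1;
	if (get_u32(p, end, &struct_len))
		return -1;

	/* Same role as ceph_decode_need(): struct_len bytes must exist. */
	if ((size_t)(end - *p) < struct_len || struct_len < LEASE_SIZE)
		return -1;
	lend = *p + struct_len;	/* cannot exceed end after the check above */

	*lease = *p;
	*p += LEASE_SIZE;

	if (struct_v >= 2) {
		if (get_u32(p, lend, altname_len))
			return -1;
		if ((size_t)(lend - *p) < *altname_len)
			return -1;
		*altname = *p;
		*p += *altname_len;
	}

	*p = lend;	/* skip fields this decoder does not know about */
	return 0;
}
```

Because the length check comes before lend is computed, lend can never point past end, which is why the separate "lend > end" test discussed above is redundant. The caller's end pointer is never modified.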

>
>>>
>>>>            ceph_decode_need(p, end, struct_len, bad);
>>>>            *lease = *p;
>>>>            *p += sizeof(**lease);
>>>> @@ -347,6 +349,7 @@ static int parse_reply_info_lease(void **p, void *end,
>>>>                            *altname_len = 0;
>>>>                    }
>>>>            }
>>>> +       *p = lend;
>>>>            return 0;
>>>>     bad:
>>>>            return -EIO;
>>>>
>>>>
>>>>>>             }
>>>>>>             return 0;
>>>>>>      bad:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> +	} else {
>>>>>>> +		struct_len = sizeof(**lease);
>>>>>>> +		*altname_len = 0;
>>>>>>> +		*altname = NULL;
>>>>>>>      	}
>>>>>>>      
>>>>>>> -	ceph_decode_need(p, end, sizeof(**lease), bad);
>>>>>>> +	ceph_decode_need(p, end, struct_len, bad);
>>>>>>>      	*lease = *p;
>>>>>>>      	*p += sizeof(**lease);
>>>>>>> -	if (features == (u64)-1)
>>>>>>> -		*p = end;
>>>>>>> +
>>>>>>> +	if (features == (u64)-1) {
>>>>>>> +		if (struct_v >= 2) {
>>>>>>> +			ceph_decode_32_safe(p, end, *altname_len, bad);
>>>>>>> +			ceph_decode_need(p, end, *altname_len, bad);
>>>>>>> +			*altname = *p;
>>>>>>> +			*p += *altname_len;
>>>>>>> +		} else {
>>>>>>> +			*altname = NULL;
>>>>>>> +			*altname_len = 0;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>>      	return 0;
>>>>>>>      bad:
>>>>>>>      	return -EIO;
>>>>>>> @@ -356,7 +373,8 @@ static int parse_reply_info_trace(void **p, void *end,
>>>>>>>      		info->dname = *p;
>>>>>>>      		*p += info->dname_len;
>>>>>>>      
>>>>>>> -		err = parse_reply_info_lease(p, end, &info->dlease, features);
>>>>>>> +		err = parse_reply_info_lease(p, end, &info->dlease, features,
>>>>>>> +					     &info->altname_len, &info->altname);
>>>>>>>      		if (err < 0)
>>>>>>>      			goto out_bad;
>>>>>>>      	}
>>>>>>> @@ -423,9 +441,11 @@ static int parse_reply_info_readdir(void **p, void *end,
>>>>>>>      		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
>>>>>>>      
>>>>>>>      		/* dentry lease */
>>>>>>> -		err = parse_reply_info_lease(p, end, &rde->lease, features);
>>>>>>> +		err = parse_reply_info_lease(p, end, &rde->lease, features,
>>>>>>> +					     &rde->altname_len, &rde->altname);
>>>>>>>      		if (err)
>>>>>>>      			goto out_bad;
>>>>>>> +
>>>>>>>      		/* inode */
>>>>>>>      		err = parse_reply_info_in(p, end, &rde->inode, features);
>>>>>>>      		if (err < 0)
>>>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>>>> index e7d2c8a1b9c1..128901a847af 100644
>>>>>>> --- a/fs/ceph/mds_client.h
>>>>>>> +++ b/fs/ceph/mds_client.h
>>>>>>> @@ -29,8 +29,8 @@ enum ceph_feature_type {
>>>>>>>      	CEPHFS_FEATURE_MULTI_RECONNECT,
>>>>>>>      	CEPHFS_FEATURE_DELEG_INO,
>>>>>>>      	CEPHFS_FEATURE_METRIC_COLLECT,
>>>>>>> -
>>>>>>> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>>>>>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,
>>>>>>> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
>>>>>>>      };
>>>>>>>      
>>>>>>>      /*
>>>>>>> @@ -45,8 +45,7 @@ enum ceph_feature_type {
>>>>>>>      	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>>>>>>>      	CEPHFS_FEATURE_DELEG_INO,		\
>>>>>>>      	CEPHFS_FEATURE_METRIC_COLLECT,		\
>>>>>>> -						\
>>>>>>> -	CEPHFS_FEATURE_MAX,			\
>>>>>>> +	CEPHFS_FEATURE_ALTERNATE_NAME,		\
>>>>>>>      }
>>>>>>>      #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
>>>>>>>      
>>>>>>> @@ -98,7 +97,9 @@ struct ceph_mds_reply_info_in {
>>>>>>>      
>>>>>>>      struct ceph_mds_reply_dir_entry {
>>>>>>>      	char                          *name;
>>>>>>> +	u8			      *altname;
>>>>>>>      	u32                           name_len;
>>>>>>> +	u32			      altname_len;
>>>>>>>      	struct ceph_mds_reply_lease   *lease;
>>>>>>>      	struct ceph_mds_reply_info_in inode;
>>>>>>>      	loff_t			      offset;
>>>>>>> @@ -117,7 +118,9 @@ struct ceph_mds_reply_info_parsed {
>>>>>>>      	struct ceph_mds_reply_info_in diri, targeti;
>>>>>>>      	struct ceph_mds_reply_dirfrag *dirfrag;
>>>>>>>      	char                          *dname;
>>>>>>> +	u8			      *altname;
>>>>>>>      	u32                           dname_len;
>>>>>>> +	u32                           altname_len;
>>>>>>>      	struct ceph_mds_reply_lease   *dlease;
>>>>>>>      
>>>>>>>      	/* extra */



end of thread, other threads:[~2022-03-01 14:31 UTC | newest]

Thread overview: 84+ messages
2022-01-11 19:15 [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 01/48] vfs: export new_inode_pseudo Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 02/48] fscrypt: export fscrypt_base64url_encode and fscrypt_base64url_decode Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 03/48] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
2022-01-27  1:58   ` Eric Biggers
2022-01-11 19:15 ` [RFC PATCH v10 04/48] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 05/48] ceph: preallocate inode for ops that may create one Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 06/48] ceph: crypto context handling for ceph Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 07/48] ceph: parse new fscrypt_auth and fscrypt_file fields in inode traces Jeff Layton
2022-02-17  8:25   ` Xiubo Li
2022-02-17 11:39     ` Jeff Layton
2022-02-18  1:09       ` Xiubo Li
2022-01-11 19:15 ` [RFC PATCH v10 08/48] ceph: add fscrypt_* handling to caps.c Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 09/48] ceph: add ability to set fscrypt_auth via setattr Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 10/48] ceph: implement -o test_dummy_encryption mount option Jeff Layton
2022-02-11 13:50   ` Luís Henriques
2022-02-11 14:52     ` Jeff Layton
2022-02-14  9:29       ` Luís Henriques
2022-01-11 19:15 ` [RFC PATCH v10 11/48] ceph: decode alternate_name in lease info Jeff Layton
2022-03-01 10:57   ` Xiubo Li
2022-03-01 11:18     ` Xiubo Li
2022-03-01 13:10     ` Jeff Layton
2022-03-01 13:51       ` Xiubo Li
2022-03-01 13:57         ` Jeff Layton
2022-03-01 14:07           ` Xiubo Li
2022-03-01 14:14             ` Jeff Layton
2022-03-01 14:30               ` Xiubo Li
2022-01-11 19:15 ` [RFC PATCH v10 12/48] ceph: add fscrypt ioctls Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 13/48] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 14/48] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 15/48] ceph: send altname in MClientRequest Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 16/48] ceph: encode encrypted name in dentry release Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 17/48] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 18/48] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 19/48] ceph: add helpers for converting names for userland presentation Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 20/48] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 21/48] ceph: add support to readdir for encrypted filenames Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 22/48] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 23/48] ceph: make ceph_get_name decrypt filenames Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 24/48] ceph: add a new ceph.fscrypt.auth vxattr Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 25/48] ceph: add some fscrypt guardrails Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 26/48] ceph: don't allow changing layout on encrypted files/directories Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 27/48] libceph: add CEPH_OSD_OP_ASSERT_VER support Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 28/48] ceph: size handling for encrypted inodes in cap updates Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 29/48] ceph: fscrypt_file field handling in MClientRequest messages Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 30/48] ceph: get file size from fscrypt_file when present in inode traces Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 31/48] ceph: handle fscrypt fields in cap messages from MDS Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 32/48] ceph: add __ceph_get_caps helper support Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 33/48] ceph: add __ceph_sync_read " Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 34/48] ceph: add object version support for sync read Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 35/48] ceph: add infrastructure for file encryption and decryption Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 36/48] ceph: add truncate size handling support for fscrypt Jeff Layton
2022-01-12  8:41   ` Xiubo Li
2022-01-11 19:15 ` [RFC PATCH v10 37/48] libceph: allow ceph_osdc_new_request to accept a multi-op read Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 38/48] ceph: disable fallocate for encrypted inodes Jeff Layton
2022-01-11 19:15 ` [RFC PATCH v10 39/48] ceph: disable copy offload on " Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 40/48] ceph: don't use special DIO path for " Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 41/48] ceph: set encryption context on open Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 42/48] ceph: align data in pages in ceph_sync_write Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 43/48] ceph: add read/modify/write to ceph_sync_write Jeff Layton
2022-01-19  3:21   ` Xiubo Li
2022-01-19  5:08     ` Xiubo Li
2022-01-19 11:06       ` Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 44/48] ceph: plumb in decryption during sync reads Jeff Layton
2022-01-19  5:18   ` Xiubo Li
2022-01-19 18:49     ` Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 45/48] ceph: set i_blkbits to crypto block size for encrypted inodes Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 46/48] ceph: add fscrypt decryption support to ceph_netfs_issue_op Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 47/48] ceph: add encryption support to writepage Jeff Layton
2022-01-11 19:16 ` [RFC PATCH v10 48/48] ceph: fscrypt support for writepages Jeff Layton
2022-01-11 19:26 ` [RFC PATCH v10 00/48] ceph+fscrypt: full support Jeff Layton
2022-01-27  2:14 ` Eric Biggers
2022-01-27 11:08   ` Jeff Layton
2022-01-28 20:39     ` Eric Biggers
2022-01-28 20:47       ` Jeff Layton
2022-02-14  9:37 ` Xiubo Li
2022-02-14 11:33   ` Jeff Layton
2022-02-14 12:08     ` Xiubo Li
2022-02-15  0:44       ` Xiubo Li
2022-02-14 17:57 ` Luís Henriques
2022-02-14 18:39   ` Jeff Layton
2022-02-14 21:00     ` Luís Henriques
2022-02-14 21:10       ` Jeff Layton
2022-02-16 16:13     ` Luís Henriques
