ceph-devel.vger.kernel.org archive mirror
* [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
@ 2021-04-13 17:50 Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 01/20] vfs: export new_inode_pseudo Jeff Layton
                   ` (20 more replies)
  0 siblings, 21 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

The main change in this posting is in the detection of fscrypted inodes.
The older set would grovel around in the xattr blob to see if it had an
"encryption.ctx" xattr. This was problematic if the MDS didn't send
xattrs in the trace, and not very efficient.

This posting changes it to use the new "fscrypt" flag, which should
always be reported by the MDS (Luis, I'm hoping this may fix the issues
you were seeing with dcache coherency).

This unfortunately requires an MDS fix, but that should hopefully make
it in and be backported to Pacific fairly soon:

    https://github.com/ceph/ceph/pull/40828

We also now handle get_name in the NFS export code correctly.

Aside from that, there are better changelogs, particularly on the
fscrypt and vfs patches, and some smaller bugfixes and optimizations.

Jeff Layton (20):
  vfs: export new_inode_pseudo
  fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  fscrypt: add fscrypt_context_for_new_inode
  ceph: crypto context handling for ceph
  ceph: implement -o test_dummy_encryption mount option
  ceph: preallocate inode for ops that may create one
  ceph: add routine to create fscrypt context prior to RPC
  ceph: make ceph_mdsc_build_path use ref-walk
  ceph: add encrypted fname handling to ceph_mdsc_build_path
  ceph: decode alternate_name in lease info
  ceph: send altname in MClientRequest
  ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  ceph: make d_revalidate call fscrypt revalidator for encrypted
    dentries
  ceph: add helpers for converting names for userland presentation
  ceph: add fscrypt support to ceph_fill_trace
  ceph: add support to readdir for encrypted filenames
  ceph: create symlinks with encrypted and base64-encoded targets
  ceph: make ceph_get_name decrypt filenames
  ceph: add fscrypt ioctls

 fs/ceph/Makefile            |   1 +
 fs/ceph/crypto.c            | 185 ++++++++++++++++++++++
 fs/ceph/crypto.h            | 101 ++++++++++++
 fs/ceph/dir.c               | 178 ++++++++++++++++-----
 fs/ceph/export.c            |  42 +++--
 fs/ceph/file.c              |  58 ++++---
 fs/ceph/inode.c             | 248 ++++++++++++++++++++++++++---
 fs/ceph/ioctl.c             |  93 +++++++++++
 fs/ceph/mds_client.c        | 303 ++++++++++++++++++++++++++++++------
 fs/ceph/mds_client.h        |  15 +-
 fs/ceph/super.c             |  80 +++++++++-
 fs/ceph/super.h             |  15 +-
 fs/ceph/xattr.c             |   5 +
 fs/crypto/fname.c           |  53 +++++--
 fs/crypto/fscrypt_private.h |   9 +-
 fs/crypto/hooks.c           |   6 +-
 fs/crypto/policy.c          |  34 +++-
 fs/inode.c                  |   1 +
 include/linux/fscrypt.h     |  10 ++
 19 files changed, 1263 insertions(+), 174 deletions(-)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v6 01/20] vfs: export new_inode_pseudo
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 02/20] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques, Al Viro

Ceph needs to be able to allocate inodes ahead of a create that might
involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
but it puts the inode on the sb->s_inodes list and when we go to hash
it, that might be done again.

We could work around that by setting I_CREATING on the new inode, but
that causes ilookup5 to return -ESTALE if something tries to find it
before I_NEW is cleared. This is desirable behavior for most
filesystems, but doesn't work for ceph.

To work around all of this, just use new_inode_pseudo which doesn't add
it to the sb->s_inodes list.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index a047ab306f9a..0745dc5d0924 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -935,6 +935,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
 	}
 	return inode;
 }
+EXPORT_SYMBOL(new_inode_pseudo);
 
 /**
  *	new_inode 	- obtain an inode
-- 
2.30.2



* [RFC PATCH v6 02/20] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 01/20] vfs: export new_inode_pseudo Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 03/20] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Ceph is going to add fscrypt support, but we still want encrypted
filenames to be composed of printable characters, so we can maintain
compatibility with clients that don't support fscrypt.

We could just adopt fscrypt's current nokey name format, but that is
subject to change in the future, and it also contains dirhash fields
that we don't need for cephfs. Because of this, we're going to concoct
our own scheme for encoding encrypted filenames. It's very similar to
fscrypt's current scheme, but doesn't bother with the dirhash fields.

The ceph encoding scheme will use base64 encoding as well, and we also
want it to avoid characters that are illegal in filenames. Export the
fscrypt base64 encoding/decoding routines so we can use them in ceph's
fscrypt implementation.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
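For readers following along, the encoding described above can be sketched
in userspace. This is an illustrative sketch, not the kernel code: the
64-character table is copied from lookup_table in fs/crypto/fname.c, while
the loop bodies and every sketch_* name are assumptions made for this
example (bits are packed least-significant first, and no '=' padding is
emitted).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same alphabet as lookup_table in fs/crypto/fname.c: ',' replaces '/'. */
static const char sketch_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Worst-case encoded length, i.e. DIV_ROUND_UP(nbytes * 4, 3). */
#define SKETCH_BASE64_CHARS(nbytes)	(((nbytes) * 4 + 2) / 3)

/* Encode len bytes; returns the number of characters written (no NUL). */
int sketch_base64_encode(const uint8_t *src, int len, char *dst)
{
	int i, bits = 0, ac = 0;
	char *cp = dst;

	for (i = 0; i < len; i++) {
		ac += src[i] << bits;
		bits += 8;
		do {
			*cp++ = sketch_table[ac & 0x3f];
			ac >>= 6;
			bits -= 6;
		} while (bits >= 6);
	}
	if (bits)
		*cp++ = sketch_table[ac & 0x3f];
	return (int)(cp - dst);
}

/* Decode len characters; returns the byte count or a negative value. */
int sketch_base64_decode(const char *src, int len, uint8_t *dst)
{
	int i, bits = 0, ac = 0;
	uint8_t *cp = dst;

	for (i = 0; i < len; i++) {
		const char *p = src[i] ? strchr(sketch_table, src[i]) : NULL;

		if (!p)
			return -2;	/* character outside the alphabet */
		ac += (int)(p - sketch_table) << bits;
		bits += 6;
		if (bits >= 8) {
			*cp++ = ac & 0xff;
			ac >>= 8;
			bits -= 8;
		}
	}
	if (ac)
		return -1;	/* stray non-zero trailing bits */
	return (int)(cp - dst);
}

/* Usage: encode, decode, and compare with the input (len <= 48). */
bool sketch_roundtrip_ok(const uint8_t *src, int len)
{
	char enc[64];
	uint8_t dec[64];
	int elen, dlen;

	if (len < 0 || len > 48)
		return false;
	elen = sketch_base64_encode(src, len, enc);
	if (elen != SKETCH_BASE64_CHARS(len))
		return false;
	dlen = sketch_base64_decode(enc, elen, dec);
	return dlen == len && memcmp(src, dec, (size_t)len) == 0;
}
```

Because the alphabet has no '/', every encoded name is a valid single path
component, which is the property the ceph scheme relies on.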
 fs/crypto/fname.c       | 34 ++++++++++++++++++++++++----------
 include/linux/fscrypt.h |  5 +++++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 6ca7d16593ff..32b1f50433ba 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -178,10 +178,8 @@ static int fname_decrypt(const struct inode *inode,
 static const char lookup_table[65] =
 	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
 
-#define BASE64_CHARS(nbytes)	DIV_ROUND_UP((nbytes) * 4, 3)
-
 /**
- * base64_encode() - base64-encode some bytes
+ * fscrypt_base64_encode() - base64-encode some bytes
  * @src: the bytes to encode
  * @len: number of bytes to encode
  * @dst: (output) the base64-encoded string.  Not NUL-terminated.
@@ -191,7 +189,7 @@ static const char lookup_table[65] =
  *
  * Return: length of the encoded string
  */
-static int base64_encode(const u8 *src, int len, char *dst)
+int fscrypt_base64_encode(const u8 *src, int len, char *dst)
 {
 	int i, bits = 0, ac = 0;
 	char *cp = dst;
@@ -209,8 +207,20 @@ static int base64_encode(const u8 *src, int len, char *dst)
 		*cp++ = lookup_table[ac & 0x3f];
 	return cp - dst;
 }
+EXPORT_SYMBOL(fscrypt_base64_encode);
 
-static int base64_decode(const char *src, int len, u8 *dst)
+/**
+ * fscrypt_base64_decode() - base64-decode some bytes
+ * @src: the bytes to decode
+ * @len: number of bytes to decode
+ * @dst: (output) decoded binary data
+ *
+ * Decode an input string that was previously encoded using
+ * fscrypt_base64_encode.
+ *
+ * Return: length of the decoded binary data
+ */
+int fscrypt_base64_decode(const char *src, int len, u8 *dst)
 {
 	int i, bits = 0, ac = 0;
 	const char *p;
@@ -232,6 +242,7 @@ static int base64_decode(const char *src, int len, u8 *dst)
 		return -1;
 	return cp - dst;
 }
+EXPORT_SYMBOL(fscrypt_base64_decode);
 
 bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 				  u32 orig_len, u32 max_len,
@@ -263,8 +274,9 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 int fscrypt_fname_alloc_buffer(u32 max_encrypted_len,
 			       struct fscrypt_str *crypto_str)
 {
-	const u32 max_encoded_len = BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX);
 	u32 max_presented_len;
+	const u32 max_encoded_len =
+		FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX);
 
 	max_presented_len = max(max_encoded_len, max_encrypted_len);
 
@@ -342,7 +354,7 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
 		     offsetof(struct fscrypt_nokey_name, bytes));
 	BUILD_BUG_ON(offsetofend(struct fscrypt_nokey_name, bytes) !=
 		     offsetof(struct fscrypt_nokey_name, sha256));
-	BUILD_BUG_ON(BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) > NAME_MAX);
+	BUILD_BUG_ON(FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) > NAME_MAX);
 
 	if (hash) {
 		nokey_name.dirhash[0] = hash;
@@ -362,7 +374,8 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
 		       nokey_name.sha256);
 		size = FSCRYPT_NOKEY_NAME_MAX;
 	}
-	oname->len = base64_encode((const u8 *)&nokey_name, size, oname->name);
+	oname->len = fscrypt_base64_encode((const u8 *)&nokey_name, size,
+					   oname->name);
 	return 0;
 }
 EXPORT_SYMBOL(fscrypt_fname_disk_to_usr);
@@ -436,14 +449,15 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
 	 * user-supplied name
 	 */
 
-	if (iname->len > BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX))
+	if (iname->len > FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX))
 		return -ENOENT;
 
 	fname->crypto_buf.name = kmalloc(FSCRYPT_NOKEY_NAME_MAX, GFP_KERNEL);
 	if (fname->crypto_buf.name == NULL)
 		return -ENOMEM;
 
-	ret = base64_decode(iname->name, iname->len, fname->crypto_buf.name);
+	ret = fscrypt_base64_decode(iname->name, iname->len,
+				    fname->crypto_buf.name);
 	if (ret < (int)offsetof(struct fscrypt_nokey_name, bytes[1]) ||
 	    (ret > offsetof(struct fscrypt_nokey_name, sha256) &&
 	     ret != FSCRYPT_NOKEY_NAME_MAX)) {
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..e300f6145ddc 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -46,6 +46,9 @@ struct fscrypt_name {
 /* Maximum value for the third parameter of fscrypt_operations.set_context(). */
 #define FSCRYPT_SET_CONTEXT_MAX_SIZE	40
 
+/* Calculate worst-case base64 encoding inflation */
+#define FSCRYPT_BASE64_CHARS(nbytes)	DIV_ROUND_UP((nbytes) * 4, 3)
+
 #ifdef CONFIG_FS_ENCRYPTION
 /*
  * fscrypt superblock flags
@@ -207,6 +210,8 @@ void fscrypt_free_inode(struct inode *inode);
 int fscrypt_drop_inode(struct inode *inode);
 
 /* fname.c */
+int fscrypt_base64_encode(const u8 *src, int len, char *dst);
+int fscrypt_base64_decode(const char *src, int len, u8 *dst);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.30.2



* [RFC PATCH v6 03/20] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 01/20] vfs: export new_inode_pseudo Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 02/20] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 04/20] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

For ceph, we want to use our own scheme for handling filenames that are
longer than NAME_MAX after encryption and base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
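As a rough illustration of what the size helper computes, here is a
userspace sketch. Only the padding expression (4 << (flags & PAD_MASK)) is
visible in this patch; the remaining steps (reject names longer than
max_len, pad to at least one 16-byte block, round up to the policy
padding, clamp to max_len) are assumptions about the elided body, as are
the sketch_* and SKETCH_* names and the constant values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of FSCRYPT_POLICY_FLAGS_PAD_MASK. */
#define SKETCH_PAD_MASK		0x03
/* Assumed minimum ciphertext length (one cipher block). */
#define SKETCH_BLOCK_SIZE	16u

/*
 * Compute the ciphertext length for a filename of orig_len bytes.
 * Returns false if the plaintext name already exceeds max_len.
 */
bool sketch_fname_encrypted_size(uint8_t policy_flags, uint32_t orig_len,
				 uint32_t max_len, uint32_t *encrypted_len_ret)
{
	/* Padding is 4, 8, 16 or 32 bytes depending on the policy flags. */
	uint32_t padding = 4u << (policy_flags & SKETCH_PAD_MASK);
	uint32_t encrypted_len;

	if (orig_len > max_len)
		return false;

	/* Never shorter than one block, then rounded up to the padding. */
	encrypted_len = orig_len > SKETCH_BLOCK_SIZE ? orig_len
						     : SKETCH_BLOCK_SIZE;
	encrypted_len = (encrypted_len + padding - 1) / padding * padding;

	/* The result may not exceed what the filesystem can store. */
	*encrypted_len_ret = encrypted_len < max_len ? encrypted_len : max_len;
	return true;
}
```

The new inode-based wrapper in this patch only changes where the policy
comes from (inode->i_crypt_info->ci_policy); the arithmetic is unchanged.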
 fs/crypto/fname.c           | 19 ++++++++++++++-----
 fs/crypto/fscrypt_private.h |  9 +++------
 fs/crypto/hooks.c           |  6 +++---
 include/linux/fscrypt.h     |  4 ++++
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 32b1f50433ba..5a794de7f61d 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -126,6 +126,7 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 
 	return 0;
 }
+EXPORT_SYMBOL(fscrypt_fname_encrypt);
 
 /**
  * fname_decrypt() - decrypt a filename
@@ -244,9 +245,9 @@ int fscrypt_base64_decode(const char *src, int len, u8 *dst)
 }
 EXPORT_SYMBOL(fscrypt_base64_decode);
 
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret)
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+				    u32 *encrypted_len_ret)
 {
 	int padding = 4 << (fscrypt_policy_flags(policy) &
 			    FSCRYPT_POLICY_FLAGS_PAD_MASK);
@@ -260,6 +261,15 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 	return true;
 }
 
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret)
+{
+	return __fscrypt_fname_encrypted_size(&inode->i_crypt_info->ci_policy,
+					      orig_len, max_len,
+					      encrypted_len_ret);
+}
+EXPORT_SYMBOL(fscrypt_fname_encrypted_size);
+
 /**
  * fscrypt_fname_alloc_buffer() - allocate a buffer for presented filenames
  * @max_encrypted_len: maximum length of encrypted filenames the buffer will be
@@ -422,8 +432,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
 		return ret;
 
 	if (fscrypt_has_encryption_key(dir)) {
-		if (!fscrypt_fname_encrypted_size(&dir->i_crypt_info->ci_policy,
-						  iname->len,
+		if (!fscrypt_fname_encrypted_size(dir, iname->len,
 						  dir->i_sb->s_cop->max_namelen,
 						  &fname->crypto_buf.len))
 			return -ENAMETOOLONG;
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 3fa965eb3336..195de6d0db40 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -292,14 +292,11 @@ void fscrypt_generate_iv(union fscrypt_iv *iv, u64 lblk_num,
 			 const struct fscrypt_info *ci);
 
 /* fname.c */
-int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
-			  u8 *out, unsigned int olen);
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret);
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+				    u32 *encrypted_len_ret);
 
 /* hkdf.c */
-
 struct fscrypt_hkdf {
 	struct crypto_shash *hmac_tfm;
 };
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..e65c19aae041 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -228,9 +228,9 @@ int fscrypt_prepare_symlink(struct inode *dir, const char *target,
 	 * counting it (even though it is meaningless for ciphertext) is simpler
 	 * for now since filesystems will assume it is there and subtract it.
 	 */
-	if (!fscrypt_fname_encrypted_size(policy, len,
-					  max_len - sizeof(struct fscrypt_symlink_data),
-					  &disk_link->len))
+	if (!__fscrypt_fname_encrypted_size(policy, len,
+					    max_len - sizeof(struct fscrypt_symlink_data),
+					    &disk_link->len))
 		return -ENAMETOOLONG;
 	disk_link->len += sizeof(struct fscrypt_symlink_data);
 
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index e300f6145ddc..b5c31baaa8bf 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -212,6 +212,10 @@ int fscrypt_drop_inode(struct inode *inode);
 /* fname.c */
 int fscrypt_base64_encode(const u8 *src, int len, char *dst);
 int fscrypt_base64_decode(const char *src, int len, u8 *dst);
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret);
+int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
+			  u8 *out, unsigned int olen);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.30.2



* [RFC PATCH v6 04/20] fscrypt: add fscrypt_context_for_new_inode
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (2 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 03/20] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 05/20] ceph: crypto context handling for ceph Jeff Layton
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full blob of xattrs with the create request.

That allows us to avoid race windows where the new inode could be
seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.

Refactor the code a bit to allow us to create a new crypto context,
attach it to the inode, and write it to the buffer, but without calling
set_context on it. ceph can later use this to marshal the context into
the buffer we send along with the create request.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/policy.c      | 34 ++++++++++++++++++++++++++++------
 include/linux/fscrypt.h |  1 +
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index ed3d623724cd..6a895a31560f 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -664,6 +664,31 @@ const union fscrypt_policy *fscrypt_policy_to_inherit(struct inode *dir)
 	return fscrypt_get_dummy_policy(dir->i_sb);
 }
 
+/**
+ * fscrypt_context_for_new_inode() - create an encryption context for a new inode
+ * @ctx: where context should be written
+ * @inode: inode from which to fetch policy and nonce
+ *
+ * Given an in-core "prepared" (via fscrypt_prepare_new_inode) inode,
+ * generate a new context and write it to ctx. ctx _must_ be at least
+ * FSCRYPT_SET_CONTEXT_MAX_SIZE bytes.
+ *
+ * Returns size of the resulting context or a negative error code.
+ */
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode)
+{
+	struct fscrypt_info *ci = inode->i_crypt_info;
+
+	BUILD_BUG_ON(sizeof(union fscrypt_context) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
+
+	/* fscrypt_prepare_new_inode() should have set up the key already. */
+	if (WARN_ON_ONCE(!ci))
+		return -ENOKEY;
+
+	return fscrypt_new_context(ctx, &ci->ci_policy, ci->ci_nonce);
+}
+EXPORT_SYMBOL_GPL(fscrypt_context_for_new_inode);
+
 /**
  * fscrypt_set_context() - Set the fscrypt context of a new inode
  * @inode: a new inode
@@ -680,12 +705,9 @@ int fscrypt_set_context(struct inode *inode, void *fs_data)
 	union fscrypt_context ctx;
 	int ctxsize;
 
-	/* fscrypt_prepare_new_inode() should have set up the key already. */
-	if (WARN_ON_ONCE(!ci))
-		return -ENOKEY;
-
-	BUILD_BUG_ON(sizeof(ctx) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
-	ctxsize = fscrypt_new_context(&ctx, &ci->ci_policy, ci->ci_nonce);
+	ctxsize = fscrypt_context_for_new_inode(&ctx, inode);
+	if (ctxsize < 0)
+		return ctxsize;
 
 	/*
 	 * This may be the first time the inode number is available, so do any
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index b5c31baaa8bf..087fa87bca0b 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -178,6 +178,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
 int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode);
 int fscrypt_set_context(struct inode *inode, void *fs_data);
 
 struct fscrypt_dummy_policy {
-- 
2.30.2



* [RFC PATCH v6 05/20] ceph: crypto context handling for ceph
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (3 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 04/20] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 06/20] ceph: implement -o test_dummy_encryption mount option Jeff Layton
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Store the fscrypt context for an inode as an encryption.ctx xattr,
and wire up the fscrypt operations to use it.

Add the decoding for the new fscrypt flag in the inode trace and
set the S_ENCRYPTED flag on the inode if it's set.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
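The decoding added to parse_reply_info_in() can be pictured with a small
userspace sketch: skip a count-prefixed list of length-prefixed key/value
pairs, then read the one-byte fscrypt flag that follows. The names below
(struct sketch_cursor, sketch_*) are hypothetical, and the sketch assumes
little-endian 32-bit lengths. Unlike the kernel code, it does not gate the
two steps on struct_v >= 5 and struct_v >= 6.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds-checked read position, standing in for the (p, end) pair. */
struct sketch_cursor {
	const uint8_t *p;
	const uint8_t *end;
};

/* Read a little-endian u32, failing if fewer than 4 bytes remain. */
static bool sketch_get_u32(struct sketch_cursor *c, uint32_t *out)
{
	if ((size_t)(c->end - c->p) < 4)
		return false;
	*out = (uint32_t)c->p[0] | (uint32_t)c->p[1] << 8 |
	       (uint32_t)c->p[2] << 16 | (uint32_t)c->p[3] << 24;
	c->p += 4;
	return true;
}

/* Skip n bytes, failing if that would run past the end of the buffer. */
static bool sketch_skip(struct sketch_cursor *c, uint32_t n)
{
	if ((size_t)(c->end - c->p) < n)
		return false;
	c->p += n;
	return true;
}

/*
 * Returns 1 if the fscrypt flag is set, 0 if it is clear, and -1 if the
 * buffer is truncated or malformed.
 */
int sketch_decode_fscrypt_flag(const uint8_t *buf, size_t len)
{
	struct sketch_cursor c = { buf, buf + len };
	uint32_t count, n;

	if (!sketch_get_u32(&c, &count))
		return -1;

	while (count--) {
		/* key */
		if (!sketch_get_u32(&c, &n) || !sketch_skip(&c, n))
			return -1;
		/* value */
		if (!sketch_get_u32(&c, &n) || !sketch_skip(&c, n))
			return -1;
	}

	if (c.p == c.end)
		return -1;	/* flag byte missing */
	return *c.p != 0;
}
```

Checking every length against the remaining buffer before advancing plays
the role of the ceph_decode_*_safe() macros, which jump to the "bad" label
on overrun.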
 fs/ceph/Makefile     |  1 +
 fs/ceph/crypto.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h     | 24 ++++++++++++++++++++++++
 fs/ceph/file.c       |  2 ++
 fs/ceph/inode.c      |  6 ++++++
 fs/ceph/mds_client.c | 20 ++++++++++++++++++++
 fs/ceph/mds_client.h |  1 +
 fs/ceph/super.c      |  3 +++
 fs/ceph/xattr.c      |  5 +++++
 9 files changed, 104 insertions(+)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 50c635dc7f71..1f77ca04c426 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -12,3 +12,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
 
 ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
 ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
+ceph-$(CONFIG_FS_ENCRYPTION) += crypto.o
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
new file mode 100644
index 000000000000..dbe8b60fd1b0
--- /dev/null
+++ b/fs/ceph/crypto.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ceph/ceph_debug.h>
+#include <linux/xattr.h>
+#include <linux/fscrypt.h>
+
+#include "super.h"
+#include "crypto.h"
+
+static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
+{
+	return __ceph_getxattr(inode, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, len);
+}
+
+static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data)
+{
+	int ret;
+
+	WARN_ON_ONCE(fs_data);
+	ret = __ceph_setxattr(inode, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, len, XATTR_CREATE);
+	if (ret == 0)
+		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+	return ret;
+}
+
+static bool ceph_crypt_empty_dir(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	return ci->i_rsubdirs + ci->i_rfiles == 1;
+}
+
+static struct fscrypt_operations ceph_fscrypt_ops = {
+	.get_context		= ceph_crypt_get_context,
+	.set_context		= ceph_crypt_set_context,
+	.empty_dir		= ceph_crypt_empty_dir,
+	.max_namelen		= NAME_MAX,
+};
+
+void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
new file mode 100644
index 000000000000..189bd8424284
--- /dev/null
+++ b/fs/ceph/crypto.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ceph fscrypt functionality
+ */
+
+#ifndef _CEPH_CRYPTO_H
+#define _CEPH_CRYPTO_H
+
+#include <linux/fscrypt.h>
+
+#define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
+
+#ifdef CONFIG_FS_ENCRYPTION
+void ceph_fscrypt_set_ops(struct super_block *sb);
+
+#else /* CONFIG_FS_ENCRYPTION */
+
+static inline void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+}
+
+#endif /* CONFIG_FS_ENCRYPTION */
+
+#endif
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 77fc037d5beb..989d947e81bb 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -595,6 +595,8 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 	iinfo.xattr_data = xattr_buf;
 	memset(iinfo.xattr_data, 0, iinfo.xattr_len);
 
+	iinfo.fscrypt = IS_ENCRYPTED(dir);
+
 	in.ino = cpu_to_le64(vino.ino);
 	in.snapid = cpu_to_le64(CEPH_NOSNAP);
 	in.version = cpu_to_le64(1);	// ???
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e1c63adb196d..301bd859957d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -14,10 +14,12 @@
 #include <linux/random.h>
 #include <linux/sort.h>
 #include <linux/iversion.h>
+#include <linux/fscrypt.h>
 
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 #include <linux/ceph/decode.h>
 
 /*
@@ -569,6 +571,7 @@ void ceph_evict_inode(struct inode *inode)
 	clear_inode(inode);
 
 	ceph_fscache_unregister_inode_cookie(ci);
+	fscrypt_put_encryption_info(inode);
 
 	__ceph_remove_caps(ci);
 
@@ -951,6 +954,9 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		xattr_blob = NULL;
 	}
 
+	if (iinfo->fscrypt && !IS_ENCRYPTED(inode))
+		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+
 	/* finally update i_version */
 	if (le64_to_cpu(info->version) > ci->i_version)
 		ci->i_version = le64_to_cpu(info->version);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e5af591d3bd4..e5efdf7a938e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -183,6 +183,26 @@ static int parse_reply_info_in(void **p, void *end,
 			info->rsnaps = 0;
 		}
 
+		if (struct_v >= 5) {
+			u32 alen;
+
+			ceph_decode_32_safe(p, end, alen, bad);
+
+			while (alen--) {
+				u32 len;
+
+				/* key */
+				ceph_decode_32_safe(p, end, len, bad);
+				ceph_decode_skip_n(p, end, len, bad);
+				/* value */
+				ceph_decode_32_safe(p, end, len, bad);
+				ceph_decode_skip_n(p, end, len, bad);
+			}
+		}
+
+		if (struct_v >= 6)
+			ceph_decode_8_safe(p, end, info->fscrypt, bad);
+
 		*p = end;
 	} else {
 		if (features & CEPH_FEATURE_MDS_INLINE_DATA) {
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 61d67eeef896..1522621d0f7e 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -88,6 +88,7 @@ struct ceph_mds_reply_info_in {
 	s32 dir_pin;
 	struct ceph_timespec btime;
 	struct ceph_timespec snap_btime;
+	bool fscrypt;
 	u64 rsnaps;
 	u64 change_attr;
 };
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 9b1b7f4cfdd4..cdac6ff675e2 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -20,6 +20,7 @@
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 
 #include <linux/ceph/ceph_features.h>
 #include <linux/ceph/decode.h>
@@ -988,6 +989,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc)
 	s->s_time_min = 0;
 	s->s_time_max = U32_MAX;
 
+	ceph_fscrypt_set_ops(s);
+
 	ret = set_anon_super_fc(s, fc);
 	if (ret != 0)
 		fsc->sb = NULL;
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 1242db8d3444..997fa35ee507 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -4,6 +4,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 #include <linux/ceph/decode.h>
 
@@ -1125,6 +1126,10 @@ int __ceph_setxattr(struct inode *inode, const char *name,
 	if (!strncmp(name, XATTR_CEPH_PREFIX, XATTR_CEPH_PREFIX_LEN))
 		goto do_sync_unlocked;
 
+	/* Inform the MDS ASAP if we're setting the encryption context */
+	if (!strcmp(name, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT))
+		goto do_sync_unlocked;
+
 	/* preallocate memory for xattr name, value, index node */
 	err = -ENOMEM;
 	newname = kmemdup(name, name_len + 1, GFP_NOFS);
-- 
2.30.2



* [RFC PATCH v6 06/20] ceph: implement -o test_dummy_encryption mount option
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (4 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 05/20] ceph: crypto context handling for ceph Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 07/20] ceph: preallocate inode for ops that may create one Jeff Layton
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c |  6 ++++
 fs/ceph/crypto.h |  8 +++++
 fs/ceph/super.c  | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h  |  7 ++++-
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index dbe8b60fd1b0..879d9a0d3751 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -29,9 +29,15 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
 	return ci->i_rsubdirs + ci->i_rfiles == 1;
 }
 
+static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
+{
+	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
+}
+
 static struct fscrypt_operations ceph_fscrypt_ops = {
 	.get_context		= ceph_crypt_get_context,
 	.set_context		= ceph_crypt_set_context,
+	.get_dummy_policy	= ceph_get_dummy_policy,
 	.empty_dir		= ceph_crypt_empty_dir,
 	.max_namelen		= NAME_MAX,
 };
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 189bd8424284..0dd043b56096 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -13,12 +13,20 @@
 #ifdef CONFIG_FS_ENCRYPTION
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
+static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
+}
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 }
 
+static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index cdac6ff675e2..48a99da4ff97 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -45,6 +45,7 @@ static void ceph_put_super(struct super_block *s)
 	struct ceph_fs_client *fsc = ceph_sb_to_client(s);
 
 	dout("put_super\n");
+	ceph_fscrypt_free_dummy_policy(fsc);
 	ceph_mdsc_close_sessions(fsc->mdsc);
 }
 
@@ -159,6 +160,7 @@ enum {
 	Opt_quotadf,
 	Opt_copyfrom,
 	Opt_wsync,
+	Opt_test_dummy_encryption,
 };
 
 enum ceph_recover_session_mode {
@@ -197,6 +199,8 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
 	fsparam_u32	("rsize",			Opt_rsize),
 	fsparam_string	("snapdirname",			Opt_snapdirname),
 	fsparam_string	("source",			Opt_source),
+	fsparam_flag	("test_dummy_encryption",	Opt_test_dummy_encryption),
+	fsparam_string	("test_dummy_encryption",	Opt_test_dummy_encryption),
 	fsparam_u32	("wsize",			Opt_wsize),
 	fsparam_flag_no	("wsync",			Opt_wsync),
 	{}
@@ -455,6 +459,16 @@ static int ceph_parse_mount_param(struct fs_context *fc,
 		else
 			fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
 		break;
+	case Opt_test_dummy_encryption:
+#ifdef CONFIG_FS_ENCRYPTION
+		kfree(fsopt->test_dummy_encryption);
+		fsopt->test_dummy_encryption = param->string;
+		param->string = NULL;
+		fsopt->flags |= CEPH_MOUNT_OPT_TEST_DUMMY_ENC;
+#else
+		warnfc(fc, "FS encryption not supported: test_dummy_encryption mount option ignored");
+#endif
+		break;
 	default:
 		BUG();
 	}
@@ -474,6 +488,7 @@ static void destroy_mount_options(struct ceph_mount_options *args)
 	kfree(args->mds_namespace);
 	kfree(args->server_path);
 	kfree(args->fscache_uniq);
+	kfree(args->test_dummy_encryption);
 	kfree(args);
 }
 
@@ -581,6 +596,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
 	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
 		seq_puts(m, ",nowsync");
 
+	fscrypt_show_test_dummy_encryption(m, ',', root->d_sb);
+
 	if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
 		seq_printf(m, ",wsize=%u", fsopt->wsize);
 	if (fsopt->rsize != CEPH_MAX_READ_SIZE)
@@ -916,6 +933,52 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
 	return root;
 }
 
+#ifdef CONFIG_FS_ENCRYPTION
+static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	struct ceph_fs_client *fsc = sb->s_fs_info;
+
+	/*
+	 * No changing encryption context on remount. Note that
+	 * fscrypt_set_test_dummy_encryption will validate the version
+	 * string passed in (if any).
+	 */
+	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC) {
+		int err = 0;
+
+		if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE && !fsc->dummy_enc_policy.policy) {
+			errorfc(fc, "Can't set test_dummy_encryption on remount");
+			return -EEXIST;
+		}
+
+		err = fscrypt_set_test_dummy_encryption(sb,
+							fsc->mount_options->test_dummy_encryption,
+							&fsc->dummy_enc_policy);
+		if (err) {
+			if (err == -EEXIST)
+				errorfc(fc, "Can't change test_dummy_encryption on remount");
+			else if (err == -EINVAL)
+				errorfc(fc, "Value of option \"%s\" is unrecognized",
+					fsc->mount_options->test_dummy_encryption);
+			else
+				errorfc(fc, "Error processing option \"%s\" [%d]",
+					fsc->mount_options->test_dummy_encryption, err);
+			return err;
+		}
+		warnfc(fc, "test_dummy_encryption mode enabled");
+	}
+	return 0;
+}
+#else
+static inline int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	warnfc(fc, "test_dummy_encryption mode ignored");
+	return 0;
+}
+#endif
+
 /*
  * mount: join the ceph cluster, and open root directory.
  */
@@ -944,6 +1007,10 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc,
 				goto out;
 		}
 
+		err = ceph_set_test_dummy_encryption(fsc->sb, fc, fsc->mount_options);
+		if (err)
+			goto out;
+
 		dout("mount opening path '%s'\n", path);
 
 		ceph_fs_debugfs_init(fsc);
@@ -1138,16 +1205,22 @@ static void ceph_free_fc(struct fs_context *fc)
 
 static int ceph_reconfigure_fc(struct fs_context *fc)
 {
+	int err;
 	struct ceph_parse_opts_ctx *pctx = fc->fs_private;
 	struct ceph_mount_options *fsopt = pctx->opts;
-	struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
+	struct super_block *sb = fc->root->d_sb;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+
+	err = ceph_set_test_dummy_encryption(sb, fc, fsopt);
+	if (err)
+		return err;
 
 	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
 		ceph_set_mount_opt(fsc, ASYNC_DIROPS);
 	else
 		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
 
-	sync_filesystem(fc->root->d_sb);
+	sync_filesystem(sb);
 	return 0;
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 0fa2ea9a8907..ed2929f20258 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -17,6 +17,7 @@
 #include <linux/posix_acl.h>
 #include <linux/refcount.h>
 #include <linux/security.h>
+#include <linux/fscrypt.h>
 
 #include <linux/ceph/libceph.h>
 
@@ -45,6 +46,7 @@
 #define CEPH_MOUNT_OPT_NOQUOTADF       (1<<13) /* no root dir quota in statfs */
 #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
 #define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
+#define CEPH_MOUNT_OPT_TEST_DUMMY_ENC  (1<<16) /* enable dummy encryption (for testing) */
 
 #define CEPH_MOUNT_OPT_DEFAULT			\
 	(CEPH_MOUNT_OPT_DCACHE |		\
@@ -97,6 +99,7 @@ struct ceph_mount_options {
 	char *mds_namespace;  /* default NULL */
 	char *server_path;    /* default NULL (means "/") */
 	char *fscache_uniq;   /* default NULL */
+	char *test_dummy_encryption;	/* default NULL */
 };
 
 struct ceph_fs_client {
@@ -136,9 +139,11 @@ struct ceph_fs_client {
 #ifdef CONFIG_CEPH_FSCACHE
 	struct fscache_cookie *fscache;
 #endif
+#ifdef CONFIG_FS_ENCRYPTION
+	struct fscrypt_dummy_policy dummy_enc_policy;
+#endif
 };
 
-
 /*
  * File i/o capability.  This tracks shared state with the metadata
  * server that allows us to cache or writeback attributes or to read
-- 
2.30.2


* [RFC PATCH v6 07/20] ceph: preallocate inode for ops that may create one
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (5 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 06/20] ceph: implement -o test_dummy_encryption mount option Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 08/20] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

When creating a new inode, we need to determine the crypto context
before we can transmit the RPC. The fscrypt API has a routine for getting
a crypto context before a create occurs, but it requires an inode.

Change the ceph code to preallocate an inode in advance of a create of
any sort (open(), mknod(), symlink(), etc). Move the existing code that
generates the ACL and SELinux blobs into this routine since that's
mostly common across all the different codepaths.

In most cases, we just want to allow ceph_fill_trace to use that inode
after the reply comes in, so add a new field to the MDS request for it
(r_new_inode).

The async create codepath is a bit different though. In that case, we
want to hash the inode in advance of the RPC so that it can be used
before the reply comes in. If the call subsequently fails with
-EJUKEBOX, then just put the references and clean up the as_ctx. Note
that with this change, we now need to regenerate the as_ctx when this
occurs, but it's quite rare for it to happen.
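
The reference handoff described above can be modelled in userspace. This is only a sketch: struct toy_inode, struct toy_table and the toy_* helpers are hypothetical stand-ins, and unlike the kernel's inode hash the toy table keeps a reference of its own. The point it shows is that the lookup routine always consumes the preallocated inode, whether or not that inode ends up being the one returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; not the kernel's struct inode. */
struct toy_inode {
	unsigned long ino;
	int refcount;
	struct toy_inode *next;	/* hash chain */
};

struct toy_table {
	struct toy_inode *head;
};

/* Allocate an unhashed inode; the caller owns the only reference. */
static struct toy_inode *toy_new_inode(void)
{
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	if (inode)
		inode->refcount = 1;
	return inode;
}

/* Drop one reference, freeing the inode when the last one goes away. */
static void toy_iput(struct toy_inode *inode)
{
	if (!inode)
		return;
	assert(inode->refcount > 0);
	if (--inode->refcount == 0)
		free(inode);
}

/*
 * Find ino in the table, or insert newino (allocating one if newino is
 * NULL). The caller's reference to newino is always consumed: it is
 * either handed back as the returned inode or dropped here.
 */
static struct toy_inode *toy_get_inode(struct toy_table *t, unsigned long ino,
				       struct toy_inode *newino)
{
	struct toy_inode *cur;

	for (cur = t->head; cur; cur = cur->next) {
		if (cur->ino == ino) {
			cur->refcount++;	/* reference for the caller */
			toy_iput(newino);	/* spare inode not needed */
			return cur;
		}
	}

	if (!newino) {
		newino = toy_new_inode();
		if (!newino)
			return NULL;
	}
	newino->ino = ino;
	newino->refcount++;		/* reference held by the toy table */
	newino->next = t->head;
	t->head = newino;
	return newino;
}
```

Because the spare is consumed on every path, a caller can clear its own pointer right after the call, which is what the xchg() on r_new_inode does in handle_reply.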

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c        | 49 +++++++++++++++++------------
 fs/ceph/file.c       | 56 +++++++++++++++++++++------------
 fs/ceph/inode.c      | 74 +++++++++++++++++++++++++++++++++++++++-----
 fs/ceph/mds_client.c |  3 +-
 fs/ceph/mds_client.h |  1 +
 fs/ceph/super.h      |  5 ++-
 6 files changed, 139 insertions(+), 49 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 570662dec3fe..496d24b003dd 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -851,13 +851,6 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n",
 	     dir, dentry, mode, rdev);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
@@ -865,6 +858,14 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -880,6 +881,7 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
@@ -902,6 +904,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
 	struct ceph_acl_sec_ctx as_ctx = {};
+	umode_t mode = S_IFLNK | 0777;
 	int err;
 
 	if (ceph_snap(dir) != CEPH_NOSNAP)
@@ -912,21 +915,24 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_path2 = kstrdup(dest, GFP_KERNEL);
 	if (!req->r_path2) {
 		err = -ENOMEM;
-		ceph_mdsc_put_request(req);
-		goto out;
+		goto out_req;
 	}
 	req->r_parent = dir;
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
@@ -941,6 +947,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (err)
@@ -976,13 +983,6 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	mode |= S_IFDIR;
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
 
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
@@ -990,6 +990,14 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
+	mode |= S_IFDIR;
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -1006,6 +1014,7 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	    !req->r_reply_info.head->is_target &&
 	    !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 989d947e81bb..dbb5eb9367d7 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -565,7 +565,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
 	ceph_mdsc_release_dir_caps(req);
 }
 
-static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
+				    struct dentry *dentry,
 				    struct file *file, umode_t mode,
 				    struct ceph_mds_request *req,
 				    struct ceph_acl_sec_ctx *as_ctx,
@@ -576,17 +577,12 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_reply_inode in = { };
 	struct ceph_mds_reply_info_in iinfo = { .in = &in };
 	struct ceph_inode_info *ci = ceph_inode(dir);
-	struct inode *inode;
 	struct timespec64 now;
 	struct ceph_vino vino = { .ino = req->r_deleg_ino,
 				  .snap = CEPH_NOSNAP };
 
 	ktime_get_real_ts64(&now);
 
-	inode = ceph_get_inode(dentry->d_sb, vino);
-	if (IS_ERR(inode))
-		return PTR_ERR(inode);
-
 	iinfo.inline_version = CEPH_INLINE_NONE;
 	iinfo.change_attr = 1;
 	ceph_encode_timespec64(&iinfo.btime, &now);
@@ -624,8 +620,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 		ceph_dir_clear_complete(dir);
 		if (!d_unhashed(dentry))
 			d_drop(dentry);
-		if (inode->i_state & I_NEW)
-			discard_new_inode(inode);
+		discard_new_inode(inode);
 	} else {
 		struct dentry *dn;
 
@@ -665,6 +660,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_mds_request *req;
+	struct inode *new_inode = NULL;
 	struct dentry *dn;
 	struct ceph_acl_sec_ctx as_ctx = {};
 	bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
@@ -677,21 +673,21 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 
 	if (dentry->d_name.len > NAME_MAX)
 		return -ENAMETOOLONG;
-
+retry:
 	if (flags & O_CREAT) {
 		if (ceph_quota_is_max_files_exceeded(dir))
 			return -EDQUOT;
-		err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-		if (err < 0)
-			return err;
-		err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-		if (err < 0)
+
+		new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+		if (IS_ERR(new_inode)) {
+			err = PTR_ERR(new_inode);
 			goto out_ctx;
+		}
 	} else if (!d_in_lookup(dentry)) {
 		/* If it's not being looked up, it's negative */
 		return -ENOENT;
 	}
-retry:
+
 	/* do the open */
 	req = prepare_open_request(dir->i_sb, flags, mode);
 	if (IS_ERR(req)) {
@@ -715,21 +711,38 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			req->r_pagelist = as_ctx.pagelist;
 			as_ctx.pagelist = NULL;
 		}
-		if (try_async &&
-		    (req->r_dir_caps =
-		      try_prep_async_create(dir, dentry, &lo,
-					    &req->r_deleg_ino))) {
+
+		if (try_async && (req->r_dir_caps =
+				  try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) {
+			struct ceph_vino vino = { .ino = req->r_deleg_ino,
+						  .snap = CEPH_NOSNAP };
+
 			set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
 			req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
 			req->r_callback = ceph_async_create_cb;
+
+			/* Hash inode before RPC */
+			new_inode = ceph_get_inode(dir->i_sb, vino, new_inode);
+			if (IS_ERR(new_inode)) {
+				err = PTR_ERR(new_inode);
+				new_inode = NULL;
+				goto out_req;
+			}
+			WARN_ON_ONCE(!(new_inode->i_state & I_NEW));
+
 			err = ceph_mdsc_submit_request(mdsc, dir, req);
 			if (!err) {
-				err = ceph_finish_async_create(dir, dentry,
+				err = ceph_finish_async_create(dir, new_inode, dentry,
 							file, mode, req,
 							&as_ctx, &lo);
+				new_inode = NULL;
 			} else if (err == -EJUKEBOX) {
 				restore_deleg_ino(dir, req->r_deleg_ino);
 				ceph_mdsc_put_request(req);
+				discard_new_inode(new_inode);
+				ceph_release_acl_sec_ctx(&as_ctx);
+				memset(&as_ctx, 0, sizeof(as_ctx));
+				new_inode = NULL;
 				try_async = false;
 				goto retry;
 			}
@@ -738,6 +751,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+	req->r_new_inode = new_inode;
+	new_inode = NULL;
 	err = ceph_mdsc_do_request(mdsc,
 				   (flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
 				   req);
@@ -778,6 +793,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 out_req:
 	ceph_mdsc_put_request(req);
+	iput(new_inode);
 out_ctx:
 	ceph_release_acl_sec_ctx(&as_ctx);
 	dout("atomic_open result=%d\n", err);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 301bd859957d..7cf919b530db 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -54,17 +54,77 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
 	return 0;
 }
 
-struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino)
+/**
+ * ceph_new_inode - allocate a new inode in advance of an expected create
+ * @dir: parent directory for new inode
+ * @dentry: dentry that may eventually point to new inode
+ * @mode: mode of new inode
+ * @as_ctx: pointer to inherited security context
+ *
+ * Allocate a new inode in advance of an operation to create a new inode.
+ * This allocates the inode and sets up the acl_sec_ctx with appropriate
+ * info for the new inode.
+ *
+ * Returns a pointer to the new inode or an ERR_PTR.
+ */
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx)
+{
+	int err;
+	struct inode *inode;
+
+	inode = new_inode_pseudo(dir->i_sb);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	if (!S_ISLNK(*mode)) {
+		err = ceph_pre_init_acls(dir, mode, as_ctx);
+		if (err < 0)
+			goto out_err;
+	}
+
+	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
+	if (err < 0)
+		goto out_err;
+
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+	return inode;
+out_err:
+	iput(inode);
+	return ERR_PTR(err);
+}
+
+/**
+ * ceph_get_inode - find or create/hash a new inode
+ * @sb: superblock to search and allocate in
+ * @vino: vino to search for
+ * @newino: optional new inode to insert if one isn't found (may be NULL)
+ *
+ * Search for or insert a new inode into the hash for the given vino, and return a
+ * reference to it. If @newino is non-NULL, its reference is consumed.
+ */
+struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino)
 {
 	struct inode *inode;
 
 	if (ceph_vino_is_reserved(vino))
 		return ERR_PTR(-EREMOTEIO);
 
-	inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
-			     ceph_set_ino_cb, &vino);
-	if (!inode)
+	if (newino) {
+		inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare,
+					ceph_set_ino_cb, &vino);
+		if (inode != newino)
+			iput(newino);
+	} else {
+		inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
+				     ceph_set_ino_cb, &vino);
+	}
+
+	if (!inode) {
+		dout("No inode found for %llx.%llx\n", vino.ino, vino.snap);
 		return ERR_PTR(-ENOMEM);
+	}
 
 	dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode),
 	     ceph_vinop(inode), inode, !!(inode->i_state & I_NEW));
@@ -80,7 +140,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
 		.ino = ceph_ino(parent),
 		.snap = CEPH_SNAPDIR,
 	};
-	struct inode *inode = ceph_get_inode(parent->i_sb, vino);
+	struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	if (IS_ERR(inode))
@@ -1552,7 +1612,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
 		vino.ino = le64_to_cpu(rde->inode.in->ino);
 		vino.snap = le64_to_cpu(rde->inode.in->snapid);
 
-		in = ceph_get_inode(req->r_dentry->d_sb, vino);
+		in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL);
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			dout("new_inode badness got %d\n", err);
@@ -1755,7 +1815,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		if (d_really_is_positive(dn)) {
 			in = d_inode(dn);
 		} else {
-			in = ceph_get_inode(parent->d_sb, tvino);
+			in = ceph_get_inode(parent->d_sb, tvino, NULL);
 			if (IS_ERR(in)) {
 				dout("new_inode badness\n");
 				d_drop(dn);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e5efdf7a938e..87e379d8027a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -850,6 +850,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 		ceph_async_iput(req->r_parent);
 	}
 	ceph_async_iput(req->r_target_inode);
+	ceph_async_iput(req->r_new_inode);
 	if (req->r_dentry)
 		dput(req->r_dentry);
 	if (req->r_old_dentry)
@@ -3263,7 +3264,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 			.snap = le64_to_cpu(rinfo->targeti.in->snapid)
 		};
 
-		in = ceph_get_inode(mdsc->fsc->sb, tvino);
+		in = ceph_get_inode(mdsc->fsc->sb, tvino, xchg(&req->r_new_inode, NULL));
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			mutex_lock(&session->s_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 1522621d0f7e..84c4476bc520 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -262,6 +262,7 @@ struct ceph_mds_request {
 
 	struct inode *r_parent;		    /* parent dir inode */
 	struct inode *r_target_inode;       /* resulting inode */
+	struct inode *r_new_inode;	    /* new inode (for creates) */
 
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index ed2929f20258..fa3a87a4d233 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -961,6 +961,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
 /* inode.c */
 struct ceph_mds_reply_info_in;
 struct ceph_mds_reply_dirfrag;
+struct ceph_acl_sec_ctx;
 
 extern const struct inode_operations ceph_file_iops;
 
@@ -968,8 +969,10 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb);
 extern void ceph_evict_inode(struct inode *inode);
 extern void ceph_free_inode(struct inode *inode);
 
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx);
 extern struct inode *ceph_get_inode(struct super_block *sb,
-				    struct ceph_vino vino);
+				    struct ceph_vino vino, struct inode *newino);
 extern struct inode *ceph_get_snapdir(struct inode *parent);
 extern int ceph_fill_file_size(struct inode *inode, int issued,
 			       u32 truncate_seq, u64 truncate_size, u64 size);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v6 08/20] ceph: add routine to create fscrypt context prior to RPC
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (6 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 07/20] ceph: preallocate inode for ops that may create one Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 09/20] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

After pre-creating a new inode, do an fscrypt prepare on it, fetch a
new encryption context and then marshal that into the security context
to be sent along with the RPC. Call the new function from
ceph_new_inode.
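
As a rough illustration of the marshalling step, the layout that the pagelist calls in this patch produce (a le32 entry count, then for each entry a le32 name length, the name, a le32 value length and the value) can be reproduced over a flat buffer. This is a userspace sketch under that assumption; put_le32(), get_le32() and blob_add_xattr() are hypothetical helpers and not part of ceph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Load a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Append one name/value pair to a blob laid out as:
 *   le32 count, then per entry: le32 name_len, name, le32 val_len, value
 * An empty blob (*len == 0) gets its count header written first; an
 * existing blob has its count bumped in place.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int blob_add_xattr(uint8_t *buf, size_t cap, size_t *len,
			  const char *name, const void *val, uint32_t val_len)
{
	uint32_t name_len = (uint32_t)strlen(name);
	size_t need = 4 + name_len + 4 + val_len;
	size_t pos = *len;

	assert(pos <= cap);
	if (pos == 0)
		need += 4;		/* room for the count header */
	if (need > cap - pos)
		return -1;

	if (pos == 0) {
		put_le32(buf, 0);
		pos = 4;
	}
	put_le32(buf, get_le32(buf) + 1);	/* one more entry */

	put_le32(buf + pos, name_len);
	pos += 4;
	memcpy(buf + pos, name, name_len);
	pos += name_len;
	put_le32(buf + pos, val_len);
	pos += 4;
	memcpy(buf + pos, val, val_len);
	pos += val_len;

	*len = pos;
	return 0;
}
```

The in-place count bump is the flat-buffer counterpart of the le32_add_cpu() call in the patch, which has to locate the first page when the pagelist already holds ACL or security label entries.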

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 12 ++++++++++
 fs/ceph/inode.c  |  9 +++++--
 fs/ceph/super.h  |  3 +++
 4 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 879d9a0d3751..f037a4939026 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -46,3 +46,64 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
 }
+
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as)
+{
+	int ret, ctxsize;
+	size_t name_len;
+	char *name;
+	struct ceph_pagelist *pagelist = as->pagelist;
+	bool encrypted = false;
+
+	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
+	if (ret)
+		return ret;
+	if (!encrypted)
+		return 0;
+
+	inode->i_flags |= S_ENCRYPTED;
+
+	ctxsize = fscrypt_context_for_new_inode(&as->fscrypt, inode);
+	if (ctxsize < 0)
+		return ctxsize;
+
+	/* marshal it in page array */
+	if (!pagelist) {
+		pagelist = ceph_pagelist_alloc(GFP_KERNEL);
+		if (!pagelist)
+			return -ENOMEM;
+		ret = ceph_pagelist_reserve(pagelist, PAGE_SIZE);
+		if (ret)
+			goto out;
+		ceph_pagelist_encode_32(pagelist, 1);
+	}
+
+	name = CEPH_XATTR_NAME_ENCRYPTION_CONTEXT;
+	name_len = strlen(name);
+	ret = ceph_pagelist_reserve(pagelist, 4 * 2 + name_len + ctxsize);
+	if (ret)
+		goto out;
+
+	if (as->pagelist) {
+		BUG_ON(pagelist->length <= sizeof(__le32));
+		if (list_is_singular(&pagelist->head)) {
+			le32_add_cpu((__le32*)pagelist->mapped_tail, 1);
+		} else {
+			struct page *page = list_first_entry(&pagelist->head,
+							     struct page, lru);
+			void *addr = kmap_atomic(page);
+			le32_add_cpu((__le32*)addr, 1);
+			kunmap_atomic(addr);
+		}
+	}
+
+	ceph_pagelist_encode_32(pagelist, name_len);
+	ceph_pagelist_append(pagelist, name, name_len);
+	ceph_pagelist_encode_32(pagelist, ctxsize);
+	ceph_pagelist_append(pagelist, as->fscrypt, ctxsize);
+out:
+	if (pagelist && !as->pagelist)
+		ceph_pagelist_release(pagelist);
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 0dd043b56096..cc4e481bf13a 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -18,6 +18,9 @@ static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
 }
 
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -27,6 +30,15 @@ static inline void ceph_fscrypt_set_ops(struct super_block *sb)
 static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 {
 }
+
+static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+						struct ceph_acl_sec_ctx *as)
+{
+	if (IS_ENCRYPTED(dir))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7cf919b530db..e20d1da9fe71 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -83,12 +83,17 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
 			goto out_err;
 	}
 
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+
 	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
 	if (err < 0)
 		goto out_err;
 
-	inode->i_state = 0;
-	inode->i_mode = *mode;
+	err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
+	if (err)
+		goto out_err;
+
 	return inode;
 out_err:
 	iput(inode);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index fa3a87a4d233..49356f9137ba 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1055,6 +1055,9 @@ struct ceph_acl_sec_ctx {
 #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
 	void *sec_ctx;
 	u32 sec_ctxlen;
+#endif
+#ifdef CONFIG_FS_ENCRYPTION
+	u8	fscrypt[FSCRYPT_SET_CONTEXT_MAX_SIZE];
 #endif
 	struct ceph_pagelist *pagelist;
 };
-- 
2.30.2


* [RFC PATCH v6 09/20] ceph: make ceph_msdc_build_path use ref-walk
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (7 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 08/20] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 10/20] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Encryption potentially requires allocation, at which point we'll need to
be in a non-atomic context. Convert ceph_mdsc_build_path to take dentry
spinlocks and references instead of using rcu_read_lock to walk the
path.

This is slightly less efficient, and we may want to eventually allow
using RCU when the leaf dentry isn't encrypted.
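
For readers unfamiliar with the walk itself, here is a minimal userspace sketch of building a path from the leaf toward the root, filling the buffer from its end. struct toy_dentry and toy_build_path() are hypothetical; the sketch leaves out the locking, reference counting, snapshot handling and rename_lock retry that the real function needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy dentry for illustration; the root's parent points at itself. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;
};

/*
 * Build the path of leaf relative to the root by walking parent
 * pointers and copying each name in front of what is already there.
 * Returns a pointer into buf, or NULL if buf is too small.
 */
static char *toy_build_path(struct toy_dentry *leaf, char *buf, size_t cap)
{
	struct toy_dentry *cur = leaf;
	size_t pos;

	assert(cap > 0);
	pos = cap - 1;
	buf[pos] = '\0';

	while (cur->parent != cur) {
		size_t n = strlen(cur->name);

		if (n > pos)
			return NULL;	/* out of buffer */
		pos -= n;
		memcpy(buf + pos, cur->name, n);

		cur = cur->parent;
		if (cur->parent == cur)
			break;		/* reached the root */

		if (pos == 0)
			return NULL;	/* no room for the separator */
		buf[--pos] = '/';
	}
	return buf + pos;
}
```

In the patch, each step of this loop additionally takes a reference on the parent with dget_parent() and drops the child with dput(), so the walk no longer has to stay inside an RCU read-side section.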

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 87e379d8027a..ad0754a45811 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2357,7 +2357,8 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 			   int stop_on_nosnap)
 {
-	struct dentry *temp;
+	struct dentry *cur;
+	struct inode *inode;
 	char *path;
 	int pos;
 	unsigned seq;
@@ -2374,34 +2375,35 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	path[pos] = '\0';
 
 	seq = read_seqbegin(&rename_lock);
-	rcu_read_lock();
-	temp = dentry;
+	cur = dget(dentry);
 	for (;;) {
-		struct inode *inode;
+		struct dentry *temp;
 
-		spin_lock(&temp->d_lock);
-		inode = d_inode(temp);
+		spin_lock(&cur->d_lock);
+		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
-			     pos, temp);
-		} else if (stop_on_nosnap && inode && dentry != temp &&
+			     pos, cur);
+		} else if (stop_on_nosnap && inode && dentry != cur &&
 			   ceph_snap(inode) == CEPH_NOSNAP) {
-			spin_unlock(&temp->d_lock);
+			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
 		} else {
-			pos -= temp->d_name.len;
+			pos -= cur->d_name.len;
 			if (pos < 0) {
-				spin_unlock(&temp->d_lock);
+				spin_unlock(&cur->d_lock);
 				break;
 			}
-			memcpy(path + pos, temp->d_name.name, temp->d_name.len);
+			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
 		}
+		temp = cur;
 		spin_unlock(&temp->d_lock);
-		temp = READ_ONCE(temp->d_parent);
+		cur = dget_parent(temp);
+		dput(temp);
 
 		/* Are we at the root? */
-		if (IS_ROOT(temp))
+		if (IS_ROOT(cur))
 			break;
 
 		/* Are we out of buffer? */
@@ -2410,8 +2412,9 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 
 		path[pos] = '/';
 	}
-	base = ceph_ino(d_inode(temp));
-	rcu_read_unlock();
+	inode = d_inode(cur);
+	base = inode ? ceph_ino(inode) : 0;
+	dput(cur);
 
 	if (read_seqretry(&rename_lock, seq))
 		goto retry;
-- 
2.30.2


* [RFC PATCH v6 10/20] ceph: add encrypted fname handling to ceph_mdsc_build_path
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (8 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 09/20] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 11/20] ceph: decode alternate_name in lease info Jeff Layton
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Allow ceph_mdsc_build_path to encrypt and base64 encode the filename
when the parent is encrypted and we're sending the path to the MDS.

In most cases, we just encrypt the filenames and base64 encode them,
but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar
scheme to fscrypt proper, and hash the remaining bits with sha256.

When doing this, we then send along the full ciphertext of the name in
the new alternate_name field of the MClientRequest. The MDS can then
send that along in readdir responses and traces.
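
The size arithmetic behind the cutoff can be checked with a small sketch. It assumes an unpadded base64 encoding and assumes that an over-long name is sent as its first CEPH_NOHASH_NAME_MAX ciphertext bytes followed by a sha256 digest of the rest; the TOY_* macros and toy_* helpers below are hypothetical names that only mirror the values in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NAME_MAX		255
#define TOY_SHA256_DIGEST_SIZE	32
/* Mirrors the patch: 189 raw bytes is the cap before base64 encoding. */
#define TOY_NOHASH_NAME_MAX	(189 - TOY_SHA256_DIGEST_SIZE)

/* Length of unpadded base64 output for n raw bytes: ceil(4n / 3). */
static size_t toy_base64_len(size_t n)
{
	return (n * 4 + 2) / 3;
}

/*
 * Raw bytes that get encoded for a ciphertext of clen bytes: short
 * names go out whole, longer ones keep the first TOY_NOHASH_NAME_MAX
 * bytes and replace the remainder with a fixed-size digest.
 */
static size_t toy_wire_raw_len(size_t clen)
{
	if (clen <= TOY_NOHASH_NAME_MAX)
		return clen;
	return TOY_NOHASH_NAME_MAX + TOY_SHA256_DIGEST_SIZE;
}

/* The worst case must still fit in a single path component. */
_Static_assert((189 * 4 + 2) / 3 <= TOY_NAME_MAX,
	       "encoded name must not exceed NAME_MAX");
```

With these numbers the unhashed prefix is 157 bytes, and the longest encoded name is 252 characters, which leaves three bytes of headroom below NAME_MAX.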

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.h     |  16 +++++
 fs/ceph/mds_client.c | 138 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 136 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index cc4e481bf13a..331b9c8da7fb 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -6,11 +6,27 @@
 #ifndef _CEPH_CRYPTO_H
 #define _CEPH_CRYPTO_H
 
+#include <crypto/sha2.h>
 #include <linux/fscrypt.h>
 
 #define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
 
 #ifdef CONFIG_FS_ENCRYPTION
+
+/*
+ * We want to encrypt filenames when creating them, but the encrypted
+ * versions of those names may have illegal characters in them. To mitigate
+ * that, we base64 encode them, but that gives us a result that can exceed
+ * NAME_MAX.
+ *
+ * Follow a similar scheme to fscrypt itself, and cap the filename to a
+ * smaller size. If the cleartext name is longer than the value below, then
+ * sha256 hash the remaining bytes.
+ *
+ * 189 bytes => 252 bytes base64-encoded, which is <= NAME_MAX (255)
+ */
+#define CEPH_NOHASH_NAME_MAX (189 - SHA256_DIGEST_SIZE)
+
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
 static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index ad0754a45811..85e8f578d555 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -13,6 +13,7 @@
 #include <linux/ktime.h>
 
 #include "super.h"
+#include "crypto.h"
 #include "mds_client.h"
 
 #include <linux/ceph/ceph_features.h>
@@ -2344,18 +2345,85 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 	return mdsc->oldest_tid;
 }
 
-/*
- * Build a dentry's path.  Allocate on heap; caller must kfree.  Based
- * on build_path_from_dentry in fs/cifs/dir.c.
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+	u32 len;
+	int elen;
+	int ret;
+	u8 *cryptbuf;
+
+	WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
+	/*
+	 * convert cleartext dentry name to ciphertext
+	 * if result is longer than CEPH_NOHASH_NAME_MAX,
+	 * sha256 the remaining bytes
+	 *
+	 * See: fscrypt_setup_filename
+	 */
+	if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len))
+		return -ENAMETOOLONG;
+
+	/* If we have to hash the end, then we need a full-length buffer */
+	if (len > CEPH_NOHASH_NAME_MAX)
+		len = NAME_MAX;
+
+	cryptbuf = kmalloc(len, GFP_KERNEL);
+	if (!cryptbuf)
+		return -ENOMEM;
+
+	ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ret;
+	}
+
+	/* hash the end if the name is long enough */
+	if (len > CEPH_NOHASH_NAME_MAX) {
+		u8 hash[SHA256_DIGEST_SIZE];
+		u8 *extra = cryptbuf + CEPH_NOHASH_NAME_MAX;
+
+		/* hash the extra bytes and overwrite crypttext beyond that point with it */
+		sha256(extra, len - CEPH_NOHASH_NAME_MAX, hash);
+		memcpy(extra, hash, SHA256_DIGEST_SIZE);
+		len = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE;
+	}
+
+	/* base64 encode the encrypted name */
+	elen = fscrypt_base64_encode(cryptbuf, len, buf);
+	kfree(cryptbuf);
+	dout("base64-encoded ciphertext name = %.*s\n", len, buf);
+	return elen;
+}
+#else
+static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+/**
+ * ceph_mdsc_build_path - build a path string to a given dentry
+ * @dentry: dentry to which path should be built
+ * @plen: returned length of string
+ * @pbase: returned base inode number
+ * @for_wire: is this path going to be sent to the MDS?
+ *
+ * Build a string that represents the path to the dentry. This is mostly called
+ * for two different purposes:
  *
- * If @stop_on_nosnap, generate path relative to the first non-snapped
- * inode.
+ * 1) we need to build a path string to send to the MDS (for_wire == true)
+ * 2) we need a path string for local presentation (e.g. debugfs) (for_wire == false)
+ *
+ * The path is built in reverse, starting with the dentry. Walk back up toward
+ * the root, building the path until the first non-snapped inode is reached (for_wire)
+ * or the root inode is reached (!for_wire).
  *
  * Encode hidden .snap dirs as a double /, i.e.
  *   foo/.snap/bar -> foo//bar
  */
-char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
-			   int stop_on_nosnap)
+char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for_wire)
 {
 	struct dentry *cur;
 	struct inode *inode;
@@ -2377,30 +2445,65 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	seq = read_seqbegin(&rename_lock);
 	cur = dget(dentry);
 	for (;;) {
-		struct dentry *temp;
+		struct dentry *parent;
 
 		spin_lock(&cur->d_lock);
 		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
 			     pos, cur);
-		} else if (stop_on_nosnap && inode && dentry != cur &&
-			   ceph_snap(inode) == CEPH_NOSNAP) {
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else if (for_wire && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) {
 			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
-		} else {
+		} else if (!for_wire || !IS_ENCRYPTED(d_inode(cur->d_parent))) {
 			pos -= cur->d_name.len;
 			if (pos < 0) {
 				spin_unlock(&cur->d_lock);
 				break;
 			}
 			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else {
+			int len, ret;
+			char buf[FSCRYPT_BASE64_CHARS(NAME_MAX)];
+
+			/*
+			 * Proactively copy name into buf, in case we need to present
+			 * it as-is.
+			 */
+			memcpy(buf, cur->d_name.name, cur->d_name.len);
+			len = cur->d_name.len;
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+
+			ret = __fscrypt_prepare_readdir(d_inode(parent));
+			if (ret < 0) {
+				dput(parent);
+				dput(cur);
+				return ERR_PTR(ret);
+			}
+
+			if (fscrypt_has_encryption_key(d_inode(parent))) {
+				len = encode_encrypted_fname(d_inode(parent), cur, buf);
+				if (len < 0) {
+					dput(parent);
+					dput(cur);
+					return ERR_PTR(len);
+				}
+			}
+			pos -= len;
+			if (pos < 0) {
+				dput(parent);
+				break;
+			}
+			memcpy(path + pos, buf, len);
 		}
-		temp = cur;
-		spin_unlock(&temp->d_lock);
-		cur = dget_parent(temp);
-		dput(temp);
+		dput(cur);
+		cur = parent;
 
 		/* Are we at the root? */
 		if (IS_ROOT(cur))
@@ -2424,8 +2527,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 		 * A rename didn't occur, but somehow we didn't end up where
 		 * we thought we would. Throw a warning and try again.
 		 */
-		pr_warn("build_path did not end path lookup where "
-			"expected, pos is %d\n", pos);
+		pr_warn("build_path did not end path lookup where expected (pos = %d)\n", pos);
 		goto retry;
 	}
 
@@ -2445,7 +2547,7 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir,
 	rcu_read_lock();
 	if (!dir)
 		dir = d_inode_rcu(dentry->d_parent);
-	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP) {
+	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
 		*pino = ceph_ino(dir);
 		rcu_read_unlock();
 		*ppath = dentry->d_name.name;
-- 
2.30.2



* [RFC PATCH v6 11/20] ceph: decode alternate_name in lease info
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (9 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 10/20] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 12/20] ceph: send altname in MClientRequest Jeff Layton
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Ceph is a bit different from local filesystems, in that we don't want
to store filenames as raw binary data, since we may also be dealing
with clients that don't support fscrypt.

We could just base64-encode the encrypted filenames, but that could
leave us with filenames longer than NAME_MAX. It turns out that the
MDS doesn't care much about filename length, but the clients do.

To manage this, we've added a new "alternate name" field that can
optionally be attached to any dentry. We use it to store the binary
crypttext of the filename when its base64-encoded value would be longer
than NAME_MAX. When a dentry has one of these names attached, the MDS
will send it along in the lease info, which we can then store for
later usage.
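
The decoding of the optional field can be pictured with a small userspace
sketch (an illustration under assumptions, not the kernel code): it assumes
the altname is a little-endian u32 length followed by that many bytes, and
that it is only present when struct_v >= 2. get_le32() and decode_altname()
are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 without assuming alignment. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical decoder: advance *p past the optional altname and
 * return 0, or return -1 if the buffer is too short.
 */
static int decode_altname(const uint8_t **p, const uint8_t *end,
			  uint8_t struct_v,
			  const uint8_t **altname, uint32_t *altname_len)
{
	uint32_t len;

	*altname = NULL;
	*altname_len = 0;

	if (struct_v < 2)
		return 0;	/* older encoding: no altname present */

	if ((size_t)(end - *p) < sizeof(uint32_t))
		return -1;
	len = get_le32(*p);
	*p += sizeof(uint32_t);

	if ((size_t)(end - *p) < len)
		return -1;
	if (len)
		*altname = *p;	/* points into the message buffer */
	*altname_len = len;
	*p += len;
	return 0;
}
```

As in the patch, the returned pointer refers into the reply buffer rather
than to a copy, so it is only valid for as long as that buffer is.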

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
 fs/ceph/mds_client.h | 11 +++++++----
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 85e8f578d555..77181a1fc900 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -283,27 +283,44 @@ static int parse_reply_info_dir(void **p, void *end,
 
 static int parse_reply_info_lease(void **p, void *end,
 				  struct ceph_mds_reply_lease **lease,
-				  u64 features)
+				  u64 features, u32 *altname_len, u8 **altname)
 {
+	u8 struct_v;
+	u32 struct_len;
+
 	if (features == (u64)-1) {
-		u8 struct_v, struct_compat;
-		u32 struct_len;
+		u8 struct_compat;
+
 		ceph_decode_8_safe(p, end, struct_v, bad);
 		ceph_decode_8_safe(p, end, struct_compat, bad);
+
 		/* struct_v is expected to be >= 1. we only understand
 		 * encoding whose struct_compat == 1. */
 		if (!struct_v || struct_compat != 1)
 			goto bad;
+
 		ceph_decode_32_safe(p, end, struct_len, bad);
-		ceph_decode_need(p, end, struct_len, bad);
-		end = *p + struct_len;
+	} else {
+		struct_len = sizeof(**lease);
+		*altname_len = 0;
+		*altname = NULL;
 	}
 
-	ceph_decode_need(p, end, sizeof(**lease), bad);
+	ceph_decode_need(p, end, struct_len, bad);
 	*lease = *p;
 	*p += sizeof(**lease);
-	if (features == (u64)-1)
-		*p = end;
+
+	if (features == (u64)-1) {
+		if (struct_v >= 2) {
+			ceph_decode_32_safe(p, end, *altname_len, bad);
+			ceph_decode_need(p, end, *altname_len, bad);
+			*altname = *p;
+			*p += *altname_len;
+		} else {
+			*altname = NULL;
+			*altname_len = 0;
+		}
+	}
 	return 0;
 bad:
 	return -EIO;
@@ -333,7 +350,8 @@ static int parse_reply_info_trace(void **p, void *end,
 		info->dname = *p;
 		*p += info->dname_len;
 
-		err = parse_reply_info_lease(p, end, &info->dlease, features);
+		err = parse_reply_info_lease(p, end, &info->dlease, features,
+					     &info->altname_len, &info->altname);
 		if (err < 0)
 			goto out_bad;
 	}
@@ -400,9 +418,11 @@ static int parse_reply_info_readdir(void **p, void *end,
 		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
 
 		/* dentry lease */
-		err = parse_reply_info_lease(p, end, &rde->lease, features);
+		err = parse_reply_info_lease(p, end, &rde->lease, features,
+					     &rde->altname_len, &rde->altname);
 		if (err)
 			goto out_bad;
+
 		/* inode */
 		err = parse_reply_info_in(p, end, &rde->inode, features);
 		if (err < 0)
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 84c4476bc520..676fd994f6b8 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -29,8 +29,8 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,
 	CEPHFS_FEATURE_DELEG_INO,
 	CEPHFS_FEATURE_METRIC_COLLECT,
-
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
+	CEPHFS_FEATURE_ALTERNATE_NAME,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
 };
 
 /*
@@ -45,8 +45,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,		\
 	CEPHFS_FEATURE_DELEG_INO,		\
 	CEPHFS_FEATURE_METRIC_COLLECT,		\
-						\
-	CEPHFS_FEATURE_MAX,			\
+	CEPHFS_FEATURE_ALTERNATE_NAME,		\
 }
 #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
 
@@ -95,7 +94,9 @@ struct ceph_mds_reply_info_in {
 
 struct ceph_mds_reply_dir_entry {
 	char                          *name;
+	u8			      *altname;
 	u32                           name_len;
+	u32			      altname_len;
 	struct ceph_mds_reply_lease   *lease;
 	struct ceph_mds_reply_info_in inode;
 	loff_t			      offset;
@@ -114,7 +115,9 @@ struct ceph_mds_reply_info_parsed {
 	struct ceph_mds_reply_info_in diri, targeti;
 	struct ceph_mds_reply_dirfrag *dirfrag;
 	char                          *dname;
+	u8			      *altname;
 	u32                           dname_len;
+	u32                           altname_len;
 	struct ceph_mds_reply_lease   *dlease;
 
 	/* extra */
-- 
2.30.2



* [RFC PATCH v6 12/20] ceph: send altname in MClientRequest
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (10 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 11/20] ceph: decode alternate_name in lease info Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 13/20] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX,
we'll need to hash the tail of the filename. The client, however, will
still need to know the full name of the file if it has a key.

To support this, the MClientRequest message has grown a new alternate_name
field that we populate with the full (binary) crypttext of the filename.
The MDS then transmits it to the clients in readdir replies or traces as
part of the dentry lease.

Add support for populating this field when the filenames are very long.
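
A minimal userspace sketch of how such a length-prefixed field could be
appended to the request tail (illustrative only; it assumes a little-endian
u32 length followed by the raw bytes, and put_le32() and encode_altname()
are hypothetical names):

```c
#include <stdint.h>
#include <string.h>

/* Store a u32 in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Hypothetical encoder: the length is always written, even when it is
 * zero. The caller must have reserved 4 + len bytes at *p.
 */
static void encode_altname(uint8_t **p, const uint8_t *altname, uint32_t len)
{
	put_le32(*p, len);
	*p += sizeof(uint32_t);
	if (len) {
		memcpy(*p, altname, len);
		*p += len;
	}
}
```

Because the length word is written unconditionally, the message size has to
account for sizeof(u32) plus the altname length, whether or not an altname
is present.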

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 79 +++++++++++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h |  2 ++
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 77181a1fc900..2bcef4ddbe00 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -892,6 +892,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 	put_cred(req->r_cred);
 	if (req->r_pagelist)
 		ceph_pagelist_release(req->r_pagelist);
+	kfree(req->r_altname);
 	put_request_session(req);
 	ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
 	WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2416,11 +2417,66 @@ static int encode_encrypted_fname(const struct inode *parent, struct dentry *den
 	dout("base64-encoded ciphertext name = %.*s\n", len, buf);
 	return elen;
 }
+
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	struct inode *dir = req->r_parent;
+	struct dentry *dentry = req->r_dentry;
+	u8 *cryptbuf = NULL;
+	u32 len = 0;
+	int ret = 0;
+
+	/* only encode if we have parent and dentry */
+	if (!dir || !dentry)
+		goto success;
+
+	/* No-op unless this is encrypted */
+	if (!IS_ENCRYPTED(dir))
+		goto success;
+
+	ret = __fscrypt_prepare_readdir(dir);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* No key? Just ignore it. */
+	if (!fscrypt_has_encryption_key(dir))
+		goto success;
+
+	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) {
+		WARN_ON_ONCE(1);
+		return ERR_PTR(-ENAMETOOLONG);
+	}
+
+	/* No need to append altname if name is short enough */
+	if (len <= CEPH_NOHASH_NAME_MAX) {
+		len = 0;
+		goto success;
+	}
+
+	cryptbuf = kmalloc(len, GFP_KERNEL);
+	if (!cryptbuf)
+		return ERR_PTR(-ENOMEM);
+
+	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ERR_PTR(ret);
+	}
+success:
+	*plen = len;
+	return cryptbuf;
+}
 #else
 static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
 {
 	return -EOPNOTSUPP;
 }
+
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	*plen = 0;
+	return NULL;
+}
 #endif
 
 /**
@@ -2635,7 +2691,7 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
 	return r;
 }
 
-static void encode_timestamp_and_gids(void **p,
+static void encode_mclientrequest_tail(void **p,
 				      const struct ceph_mds_request *req)
 {
 	struct ceph_timespec ts;
@@ -2644,11 +2700,16 @@ static void encode_timestamp_and_gids(void **p,
 	ceph_encode_timespec64(&ts, &req->r_stamp);
 	ceph_encode_copy(p, &ts, sizeof(ts));
 
-	/* gid_list */
+	/* v4: gid_list */
 	ceph_encode_32(p, req->r_cred->group_info->ngroups);
 	for (i = 0; i < req->r_cred->group_info->ngroups; i++)
 		ceph_encode_64(p, from_kgid(&init_user_ns,
 					    req->r_cred->group_info->gid[i]));
+
+	/* v5: altname */
+	ceph_encode_32(p, req->r_altname_len);
+	if (req->r_altname_len)
+		ceph_encode_copy(p, req->r_altname, req->r_altname_len);
 }
 
 /*
@@ -2693,10 +2754,18 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		goto out_free1;
 	}
 
+	req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
+	if (IS_ERR(req->r_altname)) {
+		msg = ERR_CAST(req->r_altname);
+		req->r_altname = NULL;
+		goto out_free2;
+	}
+
 	len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);
 	len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
 		sizeof(struct ceph_timespec);
 	len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
+	len += sizeof(u32) + req->r_altname_len;
 
 	/* calculate (max) length for cap releases */
 	len += sizeof(struct ceph_mds_request_release) *
@@ -2727,7 +2796,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	} else {
 		struct ceph_mds_request_head *new_head = msg->front.iov_base;
 
-		msg->hdr.version = cpu_to_le16(4);
+		msg->hdr.version = cpu_to_le16(5);
 		new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
 		head = (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid;
 		p = msg->front.iov_base + sizeof(*new_head);
@@ -2778,7 +2847,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 
 	head->num_releases = cpu_to_le16(releases);
 
-	encode_timestamp_and_gids(&p, req);
+	encode_mclientrequest_tail(&p, req);
 
 	if (WARN_ON_ONCE(p > end)) {
 		ceph_msg_put(msg);
@@ -2887,7 +2956,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
 		rhead->num_releases = 0;
 
 		p = msg->front.iov_base + req->r_request_release_offset;
-		encode_timestamp_and_gids(&p, req);
+		encode_mclientrequest_tail(&p, req);
 
 		msg->front.iov_len = p - msg->front.iov_base;
 		msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 676fd994f6b8..597d8d8053c0 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -280,6 +280,8 @@ struct ceph_mds_request {
 	struct mutex r_fill_mutex;
 
 	union ceph_mds_request_args r_args;
+	u8 *r_altname;		    /* fscrypt binary crypttext for long filenames */
+	u32 r_altname_len;	    /* length of r_altname */
 	int r_fmode;        /* file mode, if expecting cap */
 	const struct cred *r_cred;
 	int r_request_release_offset;
-- 
2.30.2



* [RFC PATCH v6 13/20] ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (11 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 12/20] ceph: send altname in MClientRequest Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 14/20] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

This is required so that we know to invalidate these dentries when the
directory is unlocked.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 496d24b003dd..72728850e96c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -755,6 +755,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > NAME_MAX)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	if (IS_ENCRYPTED(dir)) {
+		err = __fscrypt_prepare_readdir(dir);
+		if (err)
+			return ERR_PTR(err);
+		if (!fscrypt_has_encryption_key(dir)) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_NOKEY_NAME;
+			spin_unlock(&dentry->d_lock);
+		}
+	}
+
 	/* can we conclude ENOENT locally? */
 	if (d_really_is_negative(dentry)) {
 		struct ceph_inode_info *ci = ceph_inode(dir);
-- 
2.30.2



* [RFC PATCH v6 14/20] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (12 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 13/20] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 15/20] ceph: add helpers for converting names for userland presentation Jeff Layton
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

If we have a dentry which represents a no-key name, then we need to test
whether the parent directory's encryption key has since been added.  Do
that before we test anything else about the dentry.
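
The decision being made can be modelled in a few lines of standalone C.
This is a simplified sketch of the intent described above, not the
fscrypt implementation; nokey_dentry_still_valid() is a hypothetical name,
and the real revalidator also has to deal with RCU-walk and error returns.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check: returns 1 if the dentry may still be
 * valid (carry on with the remaining checks), 0 if it must be dropped
 * because it holds a no-key name and the key has since shown up.
 */
static int nokey_dentry_still_valid(bool is_nokey_name, bool dir_has_key)
{
	if (!is_nokey_name)
		return 1;
	return dir_has_key ? 0 : 1;
}
```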

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 72728850e96c..867e396f44f1 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1697,6 +1697,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct inode *dir, *inode;
 	struct ceph_mds_client *mdsc;
 
+	valid = fscrypt_d_revalidate(dentry, flags);
+	if (valid <= 0)
+		return valid;
+
 	if (flags & LOOKUP_RCU) {
 		parent = READ_ONCE(dentry->d_parent);
 		dir = d_inode_rcu(parent);
@@ -1709,8 +1713,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		inode = d_inode(dentry);
 	}
 
-	dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry,
-	     dentry, inode, ceph_dentry(dentry)->offset);
+	dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry,
+	     dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME));
 
 	mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;
 
-- 
2.30.2



* [RFC PATCH v6 15/20] ceph: add helpers for converting names for userland presentation
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (13 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 14/20] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 16/20] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 41 ++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index f037a4939026..9fed68f37629 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -107,3 +107,79 @@ int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 		ceph_pagelist_release(pagelist);
 	return ret;
 }
+
+/**
+ * ceph_fname_to_usr - convert a filename for userland presentation
+ * @fname: ceph_fname to be converted
+ * @tname: temporary name buffer to use for conversion (may be NULL)
+ * @oname: where converted name should be placed
+ * @is_nokey: set to true if key wasn't available during conversion (may be NULL)
+ *
+ * Given a filename (usually from the MDS), format it for presentation to
+ * userland. If @fname->dir is not encrypted, just pass it back as-is.
+ *
+ * Otherwise, base64 decode the string, and then ask fscrypt to format it
+ * for userland presentation.
+ *
+ * Returns 0 on success or negative error code on error.
+ */
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+		      struct fscrypt_str *oname, bool *is_nokey)
+{
+	int ret;
+	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str iname;
+
+	if (!IS_ENCRYPTED(fname->dir)) {
+		oname->name = fname->name;
+		oname->len = fname->name_len;
+		return 0;
+	}
+
+	/* Sanity check that the resulting name will fit in the buffer */
+	if (fname->name_len > FSCRYPT_BASE64_CHARS(NAME_MAX))
+		return -EIO;
+
+	ret = __fscrypt_prepare_readdir(fname->dir);
+	if (ret)
+		return ret;
+
+	/*
+	 * Use the raw dentry name as sent by the MDS instead of
+	 * generating a nokey name via fscrypt.
+	 */
+	if (!fscrypt_has_encryption_key(fname->dir)) {
+		memcpy(oname->name, fname->name, fname->name_len);
+		oname->len = fname->name_len;
+		if (is_nokey)
+			*is_nokey = true;
+		return 0;
+	}
+
+	if (fname->ctext_len == 0) {
+		int declen;
+
+		if (!tname) {
+			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+			if (ret)
+				return ret;
+			tname = &_tname;
+		}
+
+		declen = fscrypt_base64_decode(fname->name, fname->name_len, tname->name);
+		if (declen <= 0) {
+			ret = -EIO;
+			goto out;
+		}
+		iname.name = tname->name;
+		iname.len = declen;
+	} else {
+		iname.name = fname->ctext;
+		iname.len = fname->ctext_len;
+	}
+
+	ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname);
+out:
+	fscrypt_fname_free_buffer(&_tname);
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 331b9c8da7fb..5a3fb68eb814 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -11,6 +11,14 @@
 
 #define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
 
+struct ceph_fname {
+	struct inode	*dir;
+	char 		*name;		// b64 encoded, possibly hashed
+	unsigned char	*ctext;		// binary crypttext (if any)
+	u32		name_len;	// length of name buffer
+	u32		ctext_len;	// length of crypttext
+};
+
 #ifdef CONFIG_FS_ENCRYPTION
 
 /*
@@ -37,6 +45,22 @@ static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 				 struct ceph_acl_sec_ctx *as);
 
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (!IS_ENCRYPTED(parent))
+		return 0;
+	return fscrypt_fname_alloc_buffer(NAME_MAX, fname);
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (IS_ENCRYPTED(parent))
+		fscrypt_fname_free_buffer(fname);
+}
+
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+			struct fscrypt_str *oname, bool *is_nokey);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -55,6 +79,23 @@ static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *
 	return 0;
 }
 
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	return 0;
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+}
+
+static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+				    struct fscrypt_str *oname, bool *is_nokey)
+{
+	oname->name = fname->name;
+	oname->len = fname->name_len;
+	return 0;
+}
+
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
-- 
2.30.2



* [RFC PATCH v6 16/20] ceph: add fscrypt support to ceph_fill_trace
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (14 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 15/20] ceph: add helpers for converting names for userland presentation Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 17/20] ceph: add support to readdir for encrypted filenames Jeff Layton
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

When we get a dentry in a trace, decrypt the name so we can properly
instantiate the dentry.
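
The conversion relies on ceph_fname_to_usr() from the previous patch, which
chooses its input based on the state of the parent directory. A standalone
sketch of that choice, under the assumption that it follows the order shown
in that patch (the enum and pick_name_source() are hypothetical and exist
only for this illustration):

```c
#include <stdbool.h>
#include <stdint.h>

enum name_source {
	NAME_AS_IS,		/* parent not encrypted: use MDS name directly */
	NAME_RAW_NOKEY,		/* encrypted, no key: present MDS name unchanged */
	NAME_FROM_BASE64,	/* key present, no altname: base64-decode the name */
	NAME_FROM_ALTNAME,	/* key present, altname set: use binary crypttext */
};

/* Hypothetical model of which buffer feeds the fscrypt name decryption. */
static enum name_source pick_name_source(bool dir_encrypted, bool have_key,
					 uint32_t ctext_len)
{
	if (!dir_encrypted)
		return NAME_AS_IS;
	if (!have_key)
		return NAME_RAW_NOKEY;
	if (ctext_len == 0)
		return NAME_FROM_BASE64;
	return NAME_FROM_ALTNAME;
}
```

Only the NAME_RAW_NOKEY case leads to the new dentry being flagged with
DCACHE_NOKEY_NAME.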

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e20d1da9fe71..bf170a4cf6c0 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1377,8 +1377,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
 		    test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
 		    !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
+			bool is_nokey = false;
 			struct qstr dname;
 			struct dentry *dn, *parent;
+			struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+			struct ceph_fname fname = { .dir	= dir,
+						    .name	= rinfo->dname,
+						    .ctext	= rinfo->altname,
+						    .name_len	= rinfo->dname_len,
+						    .ctext_len	= rinfo->altname_len };
 
 			BUG_ON(!rinfo->head->is_target);
 			BUG_ON(req->r_dentry);
@@ -1386,8 +1393,20 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 			parent = d_find_any_alias(dir);
 			BUG_ON(!parent);
 
-			dname.name = rinfo->dname;
-			dname.len = rinfo->dname_len;
+			err = ceph_fname_alloc_buffer(dir, &oname);
+			if (err < 0) {
+				dput(parent);
+				goto done;
+			}
+
+			err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
+			if (err < 0) {
+				dput(parent);
+				ceph_fname_free_buffer(dir, &oname);
+				goto done;
+			}
+			dname.name = oname.name;
+			dname.len = oname.len;
 			dname.hash = full_name_hash(parent, dname.name, dname.len);
 			tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
 			tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
@@ -1402,9 +1421,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				     dname.len, dname.name, dn);
 				if (!dn) {
 					dput(parent);
+					ceph_fname_free_buffer(dir, &oname);
 					err = -ENOMEM;
 					goto done;
 				}
+				if (is_nokey) {
+					spin_lock(&dn->d_lock);
+					dn->d_flags |= DCACHE_NOKEY_NAME;
+					spin_unlock(&dn->d_lock);
+				}
 				err = 0;
 			} else if (d_really_is_positive(dn) &&
 				   (ceph_ino(d_inode(dn)) != tvino.ino ||
@@ -1416,6 +1441,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				dput(dn);
 				goto retry_lookup;
 			}
+			ceph_fname_free_buffer(dir, &oname);
 
 			req->r_dentry = dn;
 			dput(parent);
-- 
2.30.2



* [RFC PATCH v6 17/20] ceph: add support to readdir for encrypted filenames
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (15 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 16/20] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 18/20] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

Add helper functions for buffer management and for decrypting filenames
returned by the MDS. Wire those into the readdir codepaths.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   | 62 +++++++++++++++++++++++++++++++++++++++----------
 fs/ceph/inode.c | 38 +++++++++++++++++++++++++++---
 2 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 867e396f44f1..7fe74c2f3911 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -9,6 +9,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Directory operations: readdir, lookup, create, link, unlink,
@@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file,  struct dir_context *ctx,
 		di = ceph_dentry(dentry);
 		if (d_unhashed(dentry) ||
 		    d_really_is_negative(dentry) ||
-		    di->lease_shared_gen != shared_gen) {
+		    di->lease_shared_gen != shared_gen ||
+		    ((dentry->d_flags & DCACHE_NOKEY_NAME) &&
+		     fscrypt_has_encryption_key(dir))) {
 			spin_unlock(&dentry->d_lock);
 			dput(dentry);
 			err = -EAGAIN;
@@ -313,6 +316,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	int err;
 	unsigned frag = -1;
 	struct ceph_mds_reply_info_parsed *rinfo;
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	dout("readdir %p file %p pos %llx\n", inode, file, ctx->pos);
 	if (dfi->file_info.flags & CEPH_F_ATEND)
@@ -340,6 +345,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		ctx->pos = 2;
 	}
 
+	err = fscrypt_prepare_readdir(inode);
+	if (err)
+		goto out;
+
 	spin_lock(&ci->i_ceph_lock);
 	/* request Fx cap. if have Fx, we don't need to release Fs cap
 	 * for later create/unlink. */
@@ -360,6 +369,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		spin_unlock(&ci->i_ceph_lock);
 	}
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* proceed with a normal readdir */
 more:
 	/* do we have the correct frag content buffered? */
@@ -387,12 +404,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		dout("readdir fetching %llx.%llx frag %x offset '%s'\n",
 		     ceph_vinop(inode), frag, dfi->last_name);
 		req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
-		if (IS_ERR(req))
-			return PTR_ERR(req);
+		if (IS_ERR(req)) {
+			err = PTR_ERR(req);
+			goto out;
+		}
 		err = ceph_alloc_readdir_reply_buffer(req, inode);
 		if (err) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		/* hints to request -> mds selection code */
 		req->r_direct_mode = USE_AUTH_MDS;
@@ -405,7 +424,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL);
 			if (!req->r_path2) {
 				ceph_mdsc_put_request(req);
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto out;
 			}
 		} else if (is_hash_order(ctx->pos)) {
 			req->r_args.readdir.offset_hash =
@@ -426,7 +446,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
 		if (err < 0) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		dout("readdir got and parsed readdir result=%d on "
 		     "frag %x, end=%d, complete=%d, hash_order=%d\n",
@@ -479,7 +499,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			err = note_last_dentry(dfi, rde->name, rde->name_len,
 					       next_offset);
 			if (err)
-				return err;
+				goto out;
 		} else if (req->r_reply_info.dir_end) {
 			dfi->next_offset = 2;
 			/* keep last name */
@@ -507,22 +527,37 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	}
 	for (; i < rinfo->dir_nr; i++) {
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+		u32 olen = oname.len;
 
 		BUG_ON(rde->offset < ctx->pos);
+		BUG_ON(!rde->inode.in);
 
 		ctx->pos = rde->offset;
 		dout("readdir (%d/%d) -> %llx '%.*s' %p\n",
 		     i, rinfo->dir_nr, ctx->pos,
 		     rde->name_len, rde->name, &rde->inode.in);
 
-		BUG_ON(!rde->inode.in);
+		err = ceph_fname_to_usr(&fname, &tname, &oname, NULL);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		if (!dir_emit(ctx, rde->name, rde->name_len,
+		if (!dir_emit(ctx, oname.name, oname.len,
 			      ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
 			      le32_to_cpu(rde->inode.in->mode) >> 12)) {
 			dout("filldir stopping us...\n");
-			return 0;
+			err = 0;
+			goto out;
 		}
+
+		/* Reset the lengths to their original allocated vals */
+		oname.len = olen;
 		ctx->pos++;
 	}
 
@@ -577,9 +612,12 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 					dfi->dir_ordered_count);
 		spin_unlock(&ci->i_ceph_lock);
 	}
-
+	err = 0;
 	dout("readdir %p file %p done.\n", inode, file);
-	return 0;
+out:
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
+	return err;
 }
 
 static void reset_readdir(struct ceph_dir_file_info *dfi)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index bf170a4cf6c0..5bd0717c030a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1722,7 +1722,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			     struct ceph_mds_session *session)
 {
 	struct dentry *parent = req->r_dentry;
-	struct ceph_inode_info *ci = ceph_inode(d_inode(parent));
+	struct inode *inode = d_inode(parent);
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
 	struct qstr dname;
 	struct dentry *dn;
@@ -1732,6 +1733,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	u32 last_hash = 0;
 	u32 fpos_offset;
 	struct ceph_readdir_cache_control cache_ctl = {};
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	if (test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags))
 		return readdir_prepopulate_inodes_only(req, session);
@@ -1783,14 +1786,36 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	cache_ctl.index = req->r_readdir_cache_idx;
 	fpos_offset = req->r_readdir_offset;
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* FIXME: release caps/leases if error occurs */
 	for (i = 0; i < rinfo->dir_nr; i++) {
+		bool is_nokey = false;
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
 		struct ceph_vino tvino;
+		u32 olen = oname.len;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+
+		err = ceph_fname_to_usr(&fname, &tname, &oname, &is_nokey);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		dname.name = rde->name;
-		dname.len = rde->name_len;
+		dname.name = oname.name;
+		dname.len = oname.len;
 		dname.hash = full_name_hash(parent, dname.name, dname.len);
+		oname.len = olen;
 
 		tvino.ino = le64_to_cpu(rde->inode.in->ino);
 		tvino.snap = le64_to_cpu(rde->inode.in->snapid);
@@ -1821,6 +1846,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 				err = -ENOMEM;
 				goto out;
 			}
+			if (is_nokey) {
+				spin_lock(&dn->d_lock);
+				dn->d_flags |= DCACHE_NOKEY_NAME;
+				spin_unlock(&dn->d_lock);
+			}
 		} else if (d_really_is_positive(dn) &&
 			   (ceph_ino(d_inode(dn)) != tvino.ino ||
 			    ceph_snap(d_inode(dn)) != tvino.snap)) {
@@ -1911,6 +1941,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		req->r_readdir_cache_idx = cache_ctl.index;
 	}
 	ceph_readdir_cache_release(&cache_ctl);
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
 	dout("readdir_prepopulate done\n");
 	return err;
 }
-- 
2.30.2
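
A note on the buffer handling in both readdir loops above: the output
buffer is allocated once, and its length is saved before each decode and
restored afterwards. The sketch below models why that reset matters. It is
a userspace illustration only, and it assumes the decode call overwrites
the length field with the number of bytes it produced. struct name_buf,
decode_name() and decode_all() are hypothetical stand-ins, not the kernel
helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct fscrypt_str: a buffer plus a length. */
struct name_buf {
	char *name;
	size_t len;	/* capacity before a decode, result length after it */
};

/*
 * Hypothetical stand-in for the decode step: copies src into out and
 * overwrites out->len with the number of bytes produced.
 */
static int decode_name(const char *src, size_t src_len, struct name_buf *out)
{
	if (src_len > out->len)
		return -1;	/* does not fit in what we believe is the capacity */
	memcpy(out->name, src, src_len);
	out->len = src_len;
	return 0;
}

/* Decode every entry into one shared buffer; returns how many succeeded. */
static int decode_all(const char *const *names, size_t count,
		      struct name_buf *out)
{
	int ok = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		size_t cap = out->len;	/* remember the allocated size */

		if (decode_name(names[i], strlen(names[i]), out) == 0)
			ok++;
		out->len = cap;		/* restore it for the next entry */
	}
	return ok;
}
```

Without the restore, a short name followed by a longer one would fail in
this model, because the second decode would see the first result's length
as the buffer capacity.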


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v6 18/20] ceph: create symlinks with encrypted and base64-encoded targets
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (16 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 17/20] ceph: add support to readdir for encrypted filenames Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 19/20] ceph: make ceph_get_name decrypt filenames Jeff Layton
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

When creating symlinks in encrypted directories, encrypt and
base64-encode the target with the new inode's key before sending to the
MDS.

When filling a symlinked inode, base64-decode it into a buffer that
we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer
into a new one that will hang off i_link.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   | 52 ++++++++++++++++++++++++---
 fs/ceph/inode.c | 95 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 130 insertions(+), 17 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 7fe74c2f3911..e039534a5fab 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -947,6 +947,40 @@ static int ceph_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return ceph_mknod(mnt_userns, dir, dentry, mode, 0);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	int err;
+	int len = strlen(dest);
+	struct fscrypt_str osd_link = FSTR_INIT(NULL, 0);
+
+	err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link);
+	if (err)
+		goto out;
+
+	err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link);
+	if (err)
+		goto out;
+
+	req->r_path2 = kmalloc(FSCRYPT_BASE64_CHARS(osd_link.len) + 1, GFP_KERNEL);
+	if (!req->r_path2) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	len = fscrypt_base64_encode(osd_link.name, osd_link.len, req->r_path2);
+	req->r_path2[len] = '\0';
+out:
+	fscrypt_fname_free_buffer(&osd_link);
+	return err;
+}
+#else
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 			struct dentry *dentry, const char *dest)
 {
@@ -978,12 +1012,20 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out_req;
 	}
 
-	req->r_path2 = kstrdup(dest, GFP_KERNEL);
-	if (!req->r_path2) {
-		err = -ENOMEM;
-		goto out_req;
-	}
 	req->r_parent = dir;
+
+	if (IS_ENCRYPTED(req->r_new_inode)) {
+		err = prep_encrypted_symlink_target(req, dest);
+		if (err)
+			goto out_req;
+	} else {
+		req->r_path2 = kstrdup(dest, GFP_KERNEL);
+		if (!req->r_path2) {
+			err = -ENOMEM;
+			goto out_req;
+		}
+	}
+
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 5bd0717c030a..5afedf779dfc 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -35,6 +35,7 @@
  */
 
 static const struct inode_operations ceph_symlink_iops;
+static const struct inode_operations ceph_encrypted_symlink_iops;
 
 static void ceph_inode_work(struct work_struct *work);
 
@@ -618,6 +619,7 @@ void ceph_free_inode(struct inode *inode)
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	kfree(ci->i_symlink);
+	fscrypt_free_inode(inode);
 	kmem_cache_free(ceph_inode_cachep, ci);
 }
 
@@ -818,6 +820,33 @@ void ceph_fill_file_time(struct inode *inode, int issued,
 		     inode, time_warp_seq, ci->i_time_warp_seq);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym)
+{
+	int declen;
+	u8 *sym;
+
+	sym = kmalloc(enclen + 1, GFP_NOFS);
+	if (!sym)
+		return -ENOMEM;
+
+	declen = fscrypt_base64_decode(encsym, enclen, sym);
+	if (declen < 0) {
+		pr_err("%s: can't decode symlink (%d). Content: %.*s\n", __func__, declen, enclen, encsym);
+		kfree(sym);
+		return -EIO;
+	}
+	sym[declen] = '\0';
+	*decsym = sym;
+	return declen;
+}
+#else
+static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 /*
  * Populate an inode based on info from mds.  May be called on new or
  * existing inodes.
@@ -1042,26 +1071,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		inode->i_fop = &ceph_file_fops;
 		break;
 	case S_IFLNK:
-		inode->i_op = &ceph_symlink_iops;
 		if (!ci->i_symlink) {
 			u32 symlen = iinfo->symlink_len;
 			char *sym;
 
 			spin_unlock(&ci->i_ceph_lock);
 
-			if (symlen != i_size_read(inode)) {
-				pr_err("%s %llx.%llx BAD symlink "
-					"size %lld\n", __func__,
-					ceph_vinop(inode),
-					i_size_read(inode));
+			if (IS_ENCRYPTED(inode)) {
+				if (symlen != i_size_read(inode))
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+
+				err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym);
+				if (err < 0) {
+					pr_err("%s decoding encrypted symlink failed: %d\n",
+						__func__, err);
+					goto out;
+				}
+				symlen = err;
 				i_size_write(inode, symlen);
 				inode->i_blocks = calc_inode_blocks(symlen);
-			}
+			} else {
+				if (symlen != i_size_read(inode)) {
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+					i_size_write(inode, symlen);
+					inode->i_blocks = calc_inode_blocks(symlen);
+				}
 
-			err = -ENOMEM;
-			sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
-			if (!sym)
-				goto out;
+				err = -ENOMEM;
+				sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
+				if (!sym)
+					goto out;
+			}
 
 			spin_lock(&ci->i_ceph_lock);
 			if (!ci->i_symlink)
@@ -1069,7 +1111,18 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 			else
 				kfree(sym); /* lost a race */
 		}
-		inode->i_link = ci->i_symlink;
+
+		if (IS_ENCRYPTED(inode)) {
+			/*
+			 * Encrypted symlinks need to be decrypted before we can
+			 * cache their targets in i_link. Leave it blank for now.
+			 */
+			inode->i_link = NULL;
+			inode->i_op = &ceph_encrypted_symlink_iops;
+		} else {
+			inode->i_link = ci->i_symlink;
+			inode->i_op = &ceph_symlink_iops;
+		}
 		break;
 	case S_IFDIR:
 		inode->i_op = &ceph_dir_iops;
@@ -2141,6 +2194,17 @@ static void ceph_inode_work(struct work_struct *work)
 	iput(inode);
 }
 
+static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode,
+					   struct delayed_call *done)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+
+	return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done);
+}
+
 /*
  * symlinks
  */
@@ -2151,6 +2215,13 @@ static const struct inode_operations ceph_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
+static const struct inode_operations ceph_encrypted_symlink_iops = {
+	.get_link = ceph_encrypted_get_link,
+	.setattr = ceph_setattr,
+	.getattr = ceph_getattr,
+	.listxattr = ceph_listxattr,
+};
+
 int __ceph_setattr(struct inode *inode, struct iattr *attr)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v6 19/20] ceph: make ceph_get_name decrypt filenames
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (17 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 18/20] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-13 17:50 ` [RFC PATCH v6 20/20] ceph: add fscrypt ioctls Jeff Layton
  2021-04-19 10:30 ` [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Luis Henriques
  20 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

When we do a lookupino to the MDS, we get a filename in the trace.
ceph_get_name uses that name directly, so we must properly decrypt
it before copying it to the name buffer.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 65540a4429b2..c81af82568fd 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -7,6 +7,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Basic fh
@@ -524,7 +525,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
 {
 	struct ceph_mds_client *mdsc;
 	struct ceph_mds_request *req;
+	struct inode *dir = d_inode(parent);
 	struct inode *inode = d_inode(child);
+	struct ceph_mds_reply_info_parsed *rinfo;
 	int err;
 
 	if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -536,29 +539,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	inode_lock(d_inode(parent));
-
+	inode_lock(dir);
 	req->r_inode = inode;
 	ihold(inode);
 	req->r_ino2 = ceph_vino(d_inode(parent));
-	req->r_parent = d_inode(parent);
+	req->r_parent = dir;
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_num_caps = 2;
 	err = ceph_mdsc_do_request(mdsc, NULL, req);
+	inode_unlock(dir);
 
-	inode_unlock(d_inode(parent));
+	if (err)
+		goto out;
 
-	if (!err) {
-		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+	rinfo = &req->r_reply_info;
+	if (!IS_ENCRYPTED(dir)) {
 		memcpy(name, rinfo->dname, rinfo->dname_len);
 		name[rinfo->dname_len] = 0;
-		dout("get_name %p ino %llx.%llx name %s\n",
-		     child, ceph_vinop(inode), name);
 	} else {
-		dout("get_name %p ino %llx.%llx err %d\n",
-		     child, ceph_vinop(inode), err);
-	}
+		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+		struct ceph_fname fname = { .dir	= dir,
+					    .name	= rinfo->dname,
+					    .ctext	= rinfo->altname,
+					    .name_len	= rinfo->dname_len,
+					    .ctext_len	= rinfo->altname_len };
+
+		err = ceph_fname_alloc_buffer(dir, &oname);
+		if (err < 0)
+			goto out;
 
+		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
+		if (!err) {
+			memcpy(name, oname.name, oname.len);
+			name[oname.len] = 0;
+		}
+		ceph_fname_free_buffer(dir, &oname);
+	}
+out:
+	dout("get_name %p ino %llx.%llx err %d %s%s\n",
+		     child, ceph_vinop(inode), err,
+		     err ? "" : "name ", err ? "" : name);
 	ceph_mdsc_put_request(req);
 	return err;
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (18 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 19/20] ceph: make ceph_get_name decrypt filenames Jeff Layton
@ 2021-04-13 17:50 ` Jeff Layton
  2021-04-19 10:09   ` Luis Henriques
  2021-04-19 10:30 ` [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Luis Henriques
  20 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2021-04-13 17:50 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fsdevel, linux-fscrypt, lhenriques

We gate most of the ioctls on MDS feature support. The exceptions are the
key removal and status functions, which we still want to work if the MDSs
were to (inexplicably) lose the feature.

For the set_policy ioctl, we take Fcx caps to ensure that nothing can
create files in the directory while the ioctl is running. That should
be enough to ensure that the "empty_dir" check is reliable.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/ioctl.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 6e061bf62ad4..485be1637fc0 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -6,6 +6,7 @@
 #include "mds_client.h"
 #include "ioctl.h"
 #include <linux/ceph/striper.h>
+#include <linux/fscrypt.h>
 
 /*
  * ioctls
@@ -268,8 +269,55 @@ static long ceph_ioctl_syncio(struct file *file)
 	return 0;
 }
 
+static int vet_mds_for_fscrypt(struct file *file)
+{
+	int i, ret = -EOPNOTSUPP;
+	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *s = mdsc->sessions[i];
+
+		if (!s)
+			continue;
+		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
+			ret = 0;
+		break;
+	}
+	mutex_unlock(&mdsc->mutex);
+	return ret;
+}
+
+static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
+{
+	int ret, got = 0;
+	struct inode *inode = file_inode(file);
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ret = vet_mds_for_fscrypt(file);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ensure we hold these caps so that we _know_ that the rstats check
+	 * in the empty_dir check is reliable.
+	 */
+	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got);
+	if (ret)
+		return ret;
+
+	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
+	if (got)
+		ceph_put_cap_refs(ci, got);
+
+	return ret;
+}
+
 long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
+	int ret;
+	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
+
 	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
 	switch (cmd) {
 	case CEPH_IOC_GET_LAYOUT:
@@ -289,6 +337,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 	case CEPH_IOC_SYNCIO:
 		return ceph_ioctl_syncio(file);
+
+	case FS_IOC_SET_ENCRYPTION_POLICY:
+		return ceph_set_encryption_policy(file, arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+
+	case FS_IOC_ADD_ENCRYPTION_KEY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_add_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY:
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
+		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_NONCE:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
 	}
 
 	return -ENOTTY;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-13 17:50 ` [RFC PATCH v6 20/20] ceph: add fscrypt ioctls Jeff Layton
@ 2021-04-19 10:09   ` Luis Henriques
  2021-04-19 12:19     ` Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Luis Henriques @ 2021-04-19 10:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

Hi Jeff!

Jeff Layton <jlayton@kernel.org> writes:
<...>
> +
> +	case FS_IOC_ADD_ENCRYPTION_KEY:
> +		ret = vet_mds_for_fscrypt(file);
> +		if (ret)
> +			return ret;
> +		atomic_inc(&ci->i_shared_gen);

After spending some (well... a lot, actually) time looking at the MDS code
to try to figure out my bug, I'm back at this point in the kernel client
code.  I understand that this code is trying to invalidate the directory
dentries here.  However, I just found that the directory we get at this
point is the filesystem root directory, and not the directory we're trying
to unlock.

So, I still don't fully understand the issue I'm seeing, but I believe the
code above is assuming 'ci' is the inode being unlocked, which isn't
correct.

(Note: I haven't checked if there are other ioctls getting the FS root.)

Cheers,
-- 
Luis
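
The concern above can be modelled in a few lines. This is a minimal
sketch, assuming that cached entries are validated by comparing a saved
generation number against their own directory's counter, as the
lease_shared_gen check in the readdir patch suggests. struct dir_model,
struct dentry_model and the helper functions are hypothetical and do not
correspond to kernel types.

```c
#include <assert.h>

/* Hypothetical model: each directory carries a generation counter. */
struct dir_model {
	int shared_gen;
};

/* Hypothetical cached entry that remembers the generation it was created under. */
struct dentry_model {
	const struct dir_model *dir;
	int lease_gen;
};

/* Cache an entry under dir, recording the directory's current generation. */
static struct dentry_model cache_entry(const struct dir_model *dir)
{
	struct dentry_model d = { dir, dir->shared_gen };

	return d;
}

/* An entry is trusted only while its generation matches its directory's. */
static int entry_is_valid(const struct dentry_model *d)
{
	return d->lease_gen == d->dir->shared_gen;
}

/* Bumping the counter invalidates every entry cached under that directory. */
static void invalidate_dir(struct dir_model *dir)
{
	dir->shared_gen++;
}
```

In this model, bumping the counter of the filesystem root leaves entries
cached under a different directory looking valid, which would be
consistent with the stale names described above if the ioctl really is
issued against the root.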

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (19 preceding siblings ...)
  2021-04-13 17:50 ` [RFC PATCH v6 20/20] ceph: add fscrypt ioctls Jeff Layton
@ 2021-04-19 10:30 ` Luis Henriques
  2021-04-19 12:23   ` Jeff Layton
  20 siblings, 1 reply; 32+ messages in thread
From: Luis Henriques @ 2021-04-19 10:30 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

Jeff Layton <jlayton@kernel.org> writes:

> The main change in this posting is in the detection of fscrypted inodes.
> The older set would grovel around in the xattr blob to see if it had an
> "encryption.ctx" xattr. This was problematic if the MDS didn't send
> xattrs in the trace, and not very efficient.
>
> This posting changes it to use the new "fscrypt" flag, which should
> always be reported by the MDS (Luis, I'm hoping this may fix the issues
> you were seeing with dcache coherency).

I just fetched from your updated 'ceph-fscrypt-fnames' branch (which I
assume contains this RFC series) and I'm now seeing the splat below.

Cheers,
--
Luis

[  149.508364] ============================================
[  149.511075] WARNING: possible recursive locking detected
[  149.513652] 5.12.0-rc4+ #140 Not tainted
[  149.515656] --------------------------------------------
[  149.518293] cat/273 is trying to acquire lock:
[  149.520570] ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.525497]
[  149.525497] but task is already holding lock:
[  149.528420] ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.533163]
[  149.533163] other info that might help us debug this:
[  149.536383]  Possible unsafe locking scenario:
[  149.536383] 
[  149.539344]        CPU0
[  149.540588]        ----
[  149.541870]   lock(&mdsc->mutex);
[  149.543534]   lock(&mdsc->mutex);
[  149.545205] 
[  149.545205]  *** DEADLOCK ***
[  149.545205] 
[  149.548142]  May be due to missing lock nesting notation
[  149.548142] 
[  149.551254] 2 locks held by cat/273:
[  149.552926]  #0: ffff88812296b590 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x959/0xe10
[  149.556923]  #1: ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.560954] 
[  149.560954] stack backtrace:
[  149.562574] CPU: 0 PID: 273 Comm: cat Not tainted 5.12.0-rc4+ #140
[  149.564785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[  149.567573] Call Trace:
[  149.568207]  dump_stack+0x93/0xc2
[  149.569072]  __lock_acquire.cold+0x2e5/0x30f
[  149.570100]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[  149.571318]  ? stack_trace_save+0x91/0xc0
[  149.572272]  ? mark_held_locks+0x65/0x90
[  149.573196]  lock_acquire+0x16d/0x4e0
[  149.574030]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.575340]  ? lock_release+0x410/0x410
[  149.576211]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[  149.577348]  ? mark_lock+0x101/0x1a20
[  149.578145]  ? mark_lock+0x101/0x1a20
[  149.578948]  __mutex_lock+0xfd/0xb80
[  149.579773]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.581031]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.582174]  ? ceph_get_cap_refs+0x1c/0x40 [ceph]
[  149.583171]  ? mutex_lock_io_nested+0xab0/0xab0
[  149.584079]  ? lock_release+0x1ea/0x410
[  149.584869]  ? ceph_mdsc_submit_request+0x42/0x600 [ceph]
[  149.585962]  ? lock_downgrade+0x390/0x390
[  149.586737]  ? lock_is_held_type+0x98/0x110
[  149.587565]  ? ceph_take_cap_refs+0x43/0x220 [ceph]
[  149.588560]  ceph_mdsc_submit_request+0x206/0x600 [ceph]
[  149.589603]  ceph_mdsc_do_request+0x31/0x320 [ceph]
[  149.590554]  __ceph_do_getattr+0xf9/0x2b0 [ceph]
[  149.591453]  __ceph_getxattr+0x2fa/0x480 [ceph]
[  149.592337]  ? find_held_lock+0x85/0xa0
[  149.593055]  ? lock_is_held_type+0x98/0x110
[  149.593799]  ceph_crypt_get_context+0x17/0x20 [ceph]
[  149.594732]  fscrypt_get_encryption_info+0x133/0x220
[  149.595621]  ? fscrypt_prepare_new_inode+0x160/0x160
[  149.596512]  ? dget_parent+0x95/0x2f0
[  149.597166]  ? lock_downgrade+0x390/0x390
[  149.597850]  ? rwlock_bug.part.0+0x60/0x60
[  149.598567]  ? lock_downgrade+0x390/0x390
[  149.599251]  ? do_raw_spin_unlock+0x93/0xf0
[  149.599968]  ? dget_parent+0xc4/0x2f0
[  149.600604]  ceph_mdsc_build_path.part.0+0x367/0x7c0 [ceph]
[  149.601587]  ? remove_session_caps_cb+0x7b0/0x7b0 [ceph]
[  149.602506]  ? __lock_acquire+0x863/0x3070
[  149.603188]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[  149.604030]  ? __is_insn_slot_addr+0xc9/0x140
[  149.604774]  ? mark_lock+0x101/0x1a20
[  149.605365]  ? lock_is_held_type+0x98/0x110
[  149.606040]  ? find_held_lock+0x85/0xa0
[  149.606660]  ? lock_release+0x1ea/0x410
[  149.607279]  ? set_request_path_attr+0x173/0x500 [ceph]
[  149.608174]  ? lock_downgrade+0x390/0x390
[  149.608825]  ? find_held_lock+0x85/0xa0
[  149.609443]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[  149.610267]  ? lock_release+0x1ea/0x410
[  149.610924]  set_request_path_attr+0x1a5/0x500 [ceph]
[  149.611811]  __prepare_send_request+0x30e/0x13c0 [ceph]
[  149.612847]  ? rwlock_bug.part.0+0x60/0x60
[  149.613551]  ? set_request_path_attr+0x500/0x500 [ceph]
[  149.614540]  ? __choose_mds+0x323/0xcb0 [ceph]
[  149.615398]  ? trim_caps_cb+0x3b0/0x3b0 [ceph]
[  149.616215]  ? rwlock_bug.part.0+0x60/0x60
[  149.616953]  ? ceph_get_mds_session+0xad/0x1e0 [ceph]
[  149.617847]  ? ceph_session_state_name+0x30/0x30 [ceph]
[  149.618788]  ? ceph_reserve_caps+0x331/0x5a0 [ceph]
[  149.619626]  __do_request+0x338/0x9b0 [ceph]
[  149.620376]  ? cleanup_session_requests+0x1b0/0x1b0 [ceph]
[  149.621347]  ? lock_is_held_type+0x98/0x110
[  149.622052]  ceph_mdsc_submit_request+0x4af/0x600 [ceph]
[  149.622998]  ceph_mdsc_do_request+0x31/0x320 [ceph]
[  149.623885]  ceph_atomic_open+0x3be/0x1050 [ceph]
[  149.624729]  ? d_alloc_parallel+0x576/0xe50
[  149.625309]  ? ceph_renew_caps+0x270/0x270 [ceph]
[  149.625986]  ? __d_lookup_rcu+0x2e0/0x2e0
[  149.626539]  ? lock_is_held_type+0x98/0x110
[  149.627113]  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0
[  149.627835]  lookup_open.isra.0+0x5d2/0x7f0
[  149.628407]  ? hashlen_string+0xa0/0xa0
[  149.628961]  path_openat+0x457/0xe10
[  149.629468]  ? path_parentat+0xc0/0xc0
[  149.630047]  ? __alloc_pages_slowpath.constprop.0+0x1070/0x1070
[  149.630825]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[  149.631540]  ? mntput_no_expire+0xe6/0x650
[  149.632080]  ? mark_held_locks+0x24/0x90
[  149.632605]  do_filp_open+0x10b/0x220
[  149.633100]  ? may_open_dev+0x50/0x50
[  149.633577]  ? lock_downgrade+0x390/0x390
[  149.634147]  ? do_raw_spin_lock+0x119/0x1b0
[  149.634785]  ? rwlock_bug.part.0+0x60/0x60
[  149.635423]  ? do_raw_spin_unlock+0x93/0xf0
[  149.636094]  ? _raw_spin_unlock+0x1f/0x30
[  149.636735]  ? alloc_fd+0x150/0x300
[  149.637284]  do_sys_openat2+0x115/0x240
[  149.637887]  ? build_open_flags+0x270/0x270
[  149.638511]  ? __ia32_compat_sys_newlstat+0x30/0x30
[  149.639264]  __x64_sys_openat+0xce/0x140
[  149.639878]  ? __ia32_compat_sys_open+0x120/0x120
[  149.640622]  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0
[  149.641389]  ? syscall_enter_from_user_mode+0x1d/0x50
[  149.642175]  ? trace_hardirqs_on+0x32/0x100
[  149.642835]  do_syscall_64+0x33/0x40
[  149.643395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  149.644165] RIP: 0033:0x7f6d190daffb
[  149.644705] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 5
[  149.647306] RSP: 002b:00007ffe706cec20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  149.648294] RAX: ffffffffffffffda RBX: 000055656fd16000 RCX: 00007f6d190daffb
[  149.649200] RDX: 0000000000000000 RSI: 00007ffe706d0eda RDI: 00000000ffffff9c
[  149.650094] RBP: 00007ffe706d0eda R08: 0000000000000000 R09: 0000000000000000
[  149.650983] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  149.651876] R13: 0000000000000002 R14: 00007ffe706cef48 R15: 0000000000020000

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-19 10:09   ` Luis Henriques
@ 2021-04-19 12:19     ` Jeff Layton
  2021-04-19 19:54       ` Eric Biggers
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2021-04-19 12:19 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

On Mon, 2021-04-19 at 11:09 +0100, Luis Henriques wrote:
> Hi Jeff!
> 
> Jeff Layton <jlayton@kernel.org> writes:
> <...>
> > +
> > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > +		ret = vet_mds_for_fscrypt(file);
> > +		if (ret)
> > +			return ret;
> > +		atomic_inc(&ci->i_shared_gen);
> 
> After spending some (well... a lot, actually) time looking at the MDS code
> to try to figure out my bug, I'm back at this point in the kernel client
> code.  I understand that this code is trying to invalidate the directory
> dentries here.  However, I just found that the directory we get at this
> point is the filesystem root directory, and not the directory we're trying
> to unlock.
> 
> So, I still don't fully understand the issue I'm seeing, but I believe the
> code above is assuming 'ci' is the inode being unlocked, which isn't
> correct.
> 
> (Note: I haven't checked if there are other ioctls getting the FS root.)
> 
> Cheers,


Oh, interesting. That was my assumption. I'll have to take a look more
closely at what effect that might have then.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-19 10:30 ` [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Luis Henriques
@ 2021-04-19 12:23   ` Jeff Layton
  2021-04-19 16:03     ` Luis Henriques
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2021-04-19 12:23 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

On Mon, 2021-04-19 at 11:30 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > The main change in this posting is in the detection of fscrypted inodes.
> > The older set would grovel around in the xattr blob to see if it had an
> > "encryption.ctx" xattr. This was problematic if the MDS didn't send
> > xattrs in the trace, and not very efficient.
> > 
> > This posting changes it to use the new "fscrypt" flag, which should
> > always be reported by the MDS (Luis, I'm hoping this may fix the issues
> > you were seeing with dcache coherency).
> 
> I just fetched from your updated 'ceph-fscrypt-fnames' branch (which I
> assume contains this RFC series) and I'm now seeing the splat below.
> 
> Cheers,
> --
> Luis
> 
> [  149.508364] ============================================
> [  149.511075] WARNING: possible recursive locking detected
> [  149.513652] 5.12.0-rc4+ #140 Not tainted
> [  149.515656] --------------------------------------------
> [  149.518293] cat/273 is trying to acquire lock:
> [  149.520570] ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.525497]
> [  149.525497] but task is already holding lock:
> [  149.528420] ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.533163]
> [  149.533163] other info that might help us debug this:
> [  149.536383]  Possible unsafe locking scenario:
> [  149.536383] 
> [  149.539344]        CPU0
> [  149.540588]        ----
> [  149.541870]   lock(&mdsc->mutex);
> [  149.543534]   lock(&mdsc->mutex);
> [  149.545205] 
> [  149.545205]  *** DEADLOCK ***
> [  149.545205] 
> [  149.548142]  May be due to missing lock nesting notation
> [  149.548142] 
> [  149.551254] 2 locks held by cat/273:
> [  149.552926]  #0: ffff88812296b590 (&type->i_mutex_dir_key#7){++++}-{3:3}, at: path_openat+0x959/0xe10
> [  149.556923]  #1: ffff88813b3f6070 (&mdsc->mutex){+.+.}-{3:3}, at: ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.560954] 
> [  149.560954] stack backtrace:
> [  149.562574] CPU: 0 PID: 273 Comm: cat Not tainted 5.12.0-rc4+ #140
> [  149.564785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [  149.567573] Call Trace:
> [  149.568207]  dump_stack+0x93/0xc2
> [  149.569072]  __lock_acquire.cold+0x2e5/0x30f
> [  149.570100]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [  149.571318]  ? stack_trace_save+0x91/0xc0
> [  149.572272]  ? mark_held_locks+0x65/0x90
> [  149.573196]  lock_acquire+0x16d/0x4e0
> [  149.574030]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.575340]  ? lock_release+0x410/0x410
> [  149.576211]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [  149.577348]  ? mark_lock+0x101/0x1a20
> [  149.578145]  ? mark_lock+0x101/0x1a20
> [  149.578948]  __mutex_lock+0xfd/0xb80
> [  149.579773]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.581031]  ? ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.582174]  ? ceph_get_cap_refs+0x1c/0x40 [ceph]
> [  149.583171]  ? mutex_lock_io_nested+0xab0/0xab0
> [  149.584079]  ? lock_release+0x1ea/0x410
> [  149.584869]  ? ceph_mdsc_submit_request+0x42/0x600 [ceph]
> [  149.585962]  ? lock_downgrade+0x390/0x390
> [  149.586737]  ? lock_is_held_type+0x98/0x110
> [  149.587565]  ? ceph_take_cap_refs+0x43/0x220 [ceph]
> [  149.588560]  ceph_mdsc_submit_request+0x206/0x600 [ceph]
> [  149.589603]  ceph_mdsc_do_request+0x31/0x320 [ceph]
> [  149.590554]  __ceph_do_getattr+0xf9/0x2b0 [ceph]
> [  149.591453]  __ceph_getxattr+0x2fa/0x480 [ceph]
> [  149.592337]  ? find_held_lock+0x85/0xa0
> [  149.593055]  ? lock_is_held_type+0x98/0x110
> [  149.593799]  ceph_crypt_get_context+0x17/0x20 [ceph]
> [  149.594732]  fscrypt_get_encryption_info+0x133/0x220
> [  149.595621]  ? fscrypt_prepare_new_inode+0x160/0x160
> [  149.596512]  ? dget_parent+0x95/0x2f0
> [  149.597166]  ? lock_downgrade+0x390/0x390
> [  149.597850]  ? rwlock_bug.part.0+0x60/0x60
> [  149.598567]  ? lock_downgrade+0x390/0x390
> [  149.599251]  ? do_raw_spin_unlock+0x93/0xf0
> [  149.599968]  ? dget_parent+0xc4/0x2f0
> [  149.600604]  ceph_mdsc_build_path.part.0+0x367/0x7c0 [ceph]
> [  149.601587]  ? remove_session_caps_cb+0x7b0/0x7b0 [ceph]
> [  149.602506]  ? __lock_acquire+0x863/0x3070
> [  149.603188]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [  149.604030]  ? __is_insn_slot_addr+0xc9/0x140
> [  149.604774]  ? mark_lock+0x101/0x1a20
> [  149.605365]  ? lock_is_held_type+0x98/0x110
> [  149.606040]  ? find_held_lock+0x85/0xa0
> [  149.606660]  ? lock_release+0x1ea/0x410
> [  149.607279]  ? set_request_path_attr+0x173/0x500 [ceph]
> [  149.608174]  ? lock_downgrade+0x390/0x390
> [  149.608825]  ? find_held_lock+0x85/0xa0
> [  149.609443]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [  149.610267]  ? lock_release+0x1ea/0x410
> [  149.610924]  set_request_path_attr+0x1a5/0x500 [ceph]
> [  149.611811]  __prepare_send_request+0x30e/0x13c0 [ceph]
> [  149.612847]  ? rwlock_bug.part.0+0x60/0x60
> [  149.613551]  ? set_request_path_attr+0x500/0x500 [ceph]
> [  149.614540]  ? __choose_mds+0x323/0xcb0 [ceph]
> [  149.615398]  ? trim_caps_cb+0x3b0/0x3b0 [ceph]
> [  149.616215]  ? rwlock_bug.part.0+0x60/0x60
> [  149.616953]  ? ceph_get_mds_session+0xad/0x1e0 [ceph]
> [  149.617847]  ? ceph_session_state_name+0x30/0x30 [ceph]
> [  149.618788]  ? ceph_reserve_caps+0x331/0x5a0 [ceph]
> [  149.619626]  __do_request+0x338/0x9b0 [ceph]
> [  149.620376]  ? cleanup_session_requests+0x1b0/0x1b0 [ceph]
> [  149.621347]  ? lock_is_held_type+0x98/0x110
> [  149.622052]  ceph_mdsc_submit_request+0x4af/0x600 [ceph]
> [  149.622998]  ceph_mdsc_do_request+0x31/0x320 [ceph]
> [  149.623885]  ceph_atomic_open+0x3be/0x1050 [ceph]
> [  149.624729]  ? d_alloc_parallel+0x576/0xe50
> [  149.625309]  ? ceph_renew_caps+0x270/0x270 [ceph]
> [  149.625986]  ? __d_lookup_rcu+0x2e0/0x2e0
> [  149.626539]  ? lock_is_held_type+0x98/0x110
> [  149.627113]  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0
> [  149.627835]  lookup_open.isra.0+0x5d2/0x7f0
> [  149.628407]  ? hashlen_string+0xa0/0xa0
> [  149.628961]  path_openat+0x457/0xe10
> [  149.629468]  ? path_parentat+0xc0/0xc0
> [  149.630047]  ? __alloc_pages_slowpath.constprop.0+0x1070/0x1070
> [  149.630825]  ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [  149.631540]  ? mntput_no_expire+0xe6/0x650
> [  149.632080]  ? mark_held_locks+0x24/0x90
> [  149.632605]  do_filp_open+0x10b/0x220
> [  149.633100]  ? may_open_dev+0x50/0x50
> [  149.633577]  ? lock_downgrade+0x390/0x390
> [  149.634147]  ? do_raw_spin_lock+0x119/0x1b0
> [  149.634785]  ? rwlock_bug.part.0+0x60/0x60
> [  149.635423]  ? do_raw_spin_unlock+0x93/0xf0
> [  149.636094]  ? _raw_spin_unlock+0x1f/0x30
> [  149.636735]  ? alloc_fd+0x150/0x300
> [  149.637284]  do_sys_openat2+0x115/0x240
> [  149.637887]  ? build_open_flags+0x270/0x270
> [  149.638511]  ? __ia32_compat_sys_newlstat+0x30/0x30
> [  149.639264]  __x64_sys_openat+0xce/0x140
> [  149.639878]  ? __ia32_compat_sys_open+0x120/0x120
> [  149.640622]  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0
> [  149.641389]  ? syscall_enter_from_user_mode+0x1d/0x50
> [  149.642175]  ? trace_hardirqs_on+0x32/0x100
> [  149.642835]  do_syscall_64+0x33/0x40
> [  149.643395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  149.644165] RIP: 0033:0x7f6d190daffb
> [  149.644705] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 5
> [  149.647306] RSP: 002b:00007ffe706cec20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> [  149.648294] RAX: ffffffffffffffda RBX: 000055656fd16000 RCX: 00007f6d190daffb
> [  149.649200] RDX: 0000000000000000 RSI: 00007ffe706d0eda RDI: 00000000ffffff9c
> [  149.650094] RBP: 00007ffe706d0eda R08: 0000000000000000 R09: 0000000000000000
> [  149.650983] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [  149.651876] R13: 0000000000000002 R14: 00007ffe706cef48 R15: 0000000000020000

Ouch. That looks like a real bug, alright.

Basically when building the path, we occasionally need to fetch the
crypto context for parent inodes and such, and that can cause us to
recurse back into __ceph_getxattr and try to issue another RPC to the
MDS.

I'll have to look and see what we can do. Maybe it's safe to drop the
mdsc->mutex while we're building the path? Or maybe this is a good time
to re-think a lot of the really onerous locking in this codepath?

I'm open to suggestions here...
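
To make the recursion concrete, here is a minimal userspace model of it.
This is only a sketch, not the ceph code: the model_* names are
hypothetical, and an error-checking pthread mutex stands in for
mdsc->mutex so that the nested acquisition returns EDEADLK instead of
hanging the way a kernel mutex would.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for mdsc->mutex (non-recursive). */
pthread_mutex_t model_mutex;

/* Initialize the mutex as error-checking so a relock is reported. */
int model_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&model_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

int model_submit_request(bool ctx_cached);

/*
 * Models the path-building step: if the parent's crypto context is not
 * cached, fetching it issues a second request from inside the first.
 */
int model_build_path(bool ctx_cached)
{
	if (ctx_cached)
		return 0;
	/* The nested fetch itself needs no further context. */
	return model_submit_request(true);
}

/* Models request submission: takes the mutex, then builds the path. */
int model_submit_request(bool ctx_cached)
{
	int ret;

	ret = pthread_mutex_lock(&model_mutex);
	if (ret)
		return ret;	/* EDEADLK when this task already holds it */
	ret = model_build_path(ctx_cached);
	pthread_mutex_unlock(&model_mutex);
	return ret;
}
```

With the context cached, model_submit_request() returns 0. Without it,
the nested call fails with EDEADLK, which corresponds to the two
lock(&mdsc->mutex) lines in the lockdep report.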
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-19 12:23   ` Jeff Layton
@ 2021-04-19 16:03     ` Luis Henriques
  2021-04-19 16:28       ` Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Luis Henriques @ 2021-04-19 16:03 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

Jeff Layton <jlayton@kernel.org> writes:

> On Mon, 2021-04-19 at 11:30 +0100, Luis Henriques wrote:
...
> Ouch. That looks like a real bug, alright.
>
> Basically when building the path, we occasionally need to fetch the
> crypto context for parent inodes and such, and that can cause us to
> recurse back into __ceph_getxattr and try to issue another RPC to the
> MDS.
>
> I'll have to look and see what we can do. Maybe it's safe to drop the
> mdsc->mutex while we're building the path? Or maybe this is a good time
> to re-think a lot of the really onerous locking in this codepath?
>
> I'm open to suggestions here...

Yeah, I couldn't see a good fix at first glance.  Dropping the mutex
while building the path was my initial thought too, but it's not easy to
prove that's a safe thing to do.

The other idea I had was to fetch all the needed fscrypt contexts at the
end, after building the path.  But I didn't find a way to do that
because to build the path... we need the contexts.

It looks like this leaves us with the locking rethinking option.

/me tries harder to find another way out

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-19 16:03     ` Luis Henriques
@ 2021-04-19 16:28       ` Jeff Layton
  2021-04-20 10:11         ` Luis Henriques
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2021-04-19 16:28 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

On Mon, 2021-04-19 at 17:03 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > On Mon, 2021-04-19 at 11:30 +0100, Luis Henriques wrote:
> ...
> > Ouch. That looks like a real bug, alright.
> > 
> > Basically when building the path, we occasionally need to fetch the
> > crypto context for parent inodes and such, and that can cause us to
> > recurse back into __ceph_getxattr and try to issue another RPC to the
> > MDS.
> > 
> > I'll have to look and see what we can do. Maybe it's safe to drop the
> > mdsc->mutex while we're building the path? Or maybe this is a good time
> > to re-think a lot of the really onerous locking in this codepath?
> > 
> > I'm open to suggestions here...
> 
> Yeah, I couldn't see a good fix at first glance.  Dropping the mutex
> while building the path was my initial thought too, but it's not easy to
> prove that's a safe thing to do.
> 

Indeed. It's an extremely coarse-grained mutex and not at all clear what
it protects here.

> The other idea I had was to fetch all the needed fscrypt contexts at the
> end, after building the path.  But I didn't find a way to do that
> because to build the path... we need the contexts.
> 
> It looks like this leaves us with the locking rethinking option.
> 
> /me tries harder to find another way out
> 
> Cheers,

The other option I think is to not store the context in an xattr at all,
and instead make a dedicated field in the inode for it that we can
ensure is always present for encrypted inodes.  For the most part the
crypto context is a static thing. The only exception is when we're first
encrypting an empty dir.

We already have the fscrypt bool in the inodestat, and we're going to
need another field to hold the real size for files. It may be worthwhile
to just reconsider the design at that level. Maybe we just need to carve
out a chunk of fscrypt space in the inode for the client and let it
manage that however it sees fit.
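
As a rough sketch of what that could look like on the client side: if
the context always arrives with the inode, fetching it becomes a local
copy rather than a getxattr that may need an RPC. Everything below is a
hypothetical userspace model (struct model_inode, MODEL_CTX_MAX and
model_get_context are made-up names), and the return convention of
"length on success, negative errno on failure" is an assumption chosen
for the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CTX_MAX 40	/* hypothetical upper bound on context size */

/* Hypothetical in-memory inode carrying its own fscrypt context. */
struct model_inode {
	uint8_t fscrypt_ctx[MODEL_CTX_MAX];
	size_t fscrypt_ctx_len;	/* 0 means the inode is not encrypted */
};

/*
 * Copy the context out of the inode. No lock on the request path and no
 * round trip to the server is involved, so this cannot recurse.
 */
int model_get_context(const struct model_inode *inode, void *ctx, size_t len)
{
	if (inode->fscrypt_ctx_len == 0)
		return -ENODATA;
	if (len < inode->fscrypt_ctx_len)
		return -ERANGE;
	memcpy(ctx, inode->fscrypt_ctx, inode->fscrypt_ctx_len);
	return (int)inode->fscrypt_ctx_len;
}
```

The one case this does not cover is the exception mentioned above, where
an empty directory is being encrypted for the first time and the field
has to be set rather than read.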
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-19 12:19     ` Jeff Layton
@ 2021-04-19 19:54       ` Eric Biggers
  2021-04-20  9:34         ` Luis Henriques
  2021-04-20 11:45         ` Jeff Layton
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Biggers @ 2021-04-19 19:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Luis Henriques, ceph-devel, linux-fsdevel, linux-fscrypt

On Mon, Apr 19, 2021 at 08:19:59AM -0400, Jeff Layton wrote:
> On Mon, 2021-04-19 at 11:09 +0100, Luis Henriques wrote:
> > Hi Jeff!
> > 
> > Jeff Layton <jlayton@kernel.org> writes:
> > <...>
> > > +
> > > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > > +		ret = vet_mds_for_fscrypt(file);
> > > +		if (ret)
> > > +			return ret;
> > > +		atomic_inc(&ci->i_shared_gen);
> > 
> > After spending some (well... a lot, actually) time looking at the MDS code
> > to try to figure out my bug, I'm back at this point in the kernel client
> > code.  I understand that this code is trying to invalidate the directory
> > dentries here.  However, I just found that the directory we get at this
> > point is the filesystem root directory, and not the directory we're trying
> > to unlock.
> > 
> > So, I still don't fully understand the issue I'm seeing, but I believe the
> > code above is assuming 'ci' is the inode being unlocked, which isn't
> > correct.
> > 
> > (Note: I haven't checked if there are other ioctls getting the FS root.)
> > 
> > Cheers,
> 
> 
> Oh, interesting. That was my assumption. I'll have to take a look more
> closely at what effect that might have then.
> 

FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS, and FS_IOC_GET_ENCRYPTION_KEY_STATUS can
all be executed on any file or directory on the filesystem (but preferably on
the root directory) because they are operations on the filesystem, not on any
specific file or directory.  They deal with encryption keys, which can protect
any number of encrypted directories (even 0 or a large number) and/or even loose
encrypted files that got moved into an unencrypted directory.

Note that this is all described in the documentation
(https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html).
If the documentation is unclear please suggest improvements to it.

Also, there shouldn't be any need for FS_IOC_ADD_ENCRYPTION_KEY to invalidate
dentries itself because that is the point of fscrypt_d_revalidate(); the
invalidation happens on-demand later.
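
For illustration, a minimal userspace sketch of adding a key through an
arbitrary path on the filesystem. It assumes a v2-policy key identified
by FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and uses the UAPI definitions from
<linux/fscrypt.h>; the helper names build_add_key_arg and add_key_via
are hypothetical.

```c
#include <fcntl.h>
#include <linux/fscrypt.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Allocate and fill the ioctl argument for a raw key of raw_size bytes. */
struct fscrypt_add_key_arg *build_add_key_arg(const unsigned char *raw,
					      unsigned int raw_size)
{
	struct fscrypt_add_key_arg *arg;

	/* calloc() zeroes the reserved fields. */
	arg = calloc(1, sizeof(*arg) + raw_size);
	if (!arg)
		return NULL;
	arg->key_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER;
	arg->raw_size = raw_size;
	memcpy(arg->raw, raw, raw_size);
	return arg;
}

/*
 * Issue the ioctl on any file or directory of the target filesystem.
 * The path only selects the filesystem; it is not "the directory being
 * unlocked". Returns 0 on success, -1 with errno set on failure.
 */
int add_key_via(const char *any_path, struct fscrypt_add_key_arg *arg)
{
	int fd, ret;

	fd = open(any_path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_ADD_ENCRYPTION_KEY, arg);
	close(fd);
	return ret;
}
```

Passing the mount point as any_path matches the "preferably on the root
directory" advice. On success with this specifier type, the kernel fills
in arg->key_spec.u.identifier. A real caller should also wipe the raw
key from memory before freeing the argument.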

- Eric

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-19 19:54       ` Eric Biggers
@ 2021-04-20  9:34         ` Luis Henriques
  2021-04-20 11:45         ` Jeff Layton
  1 sibling, 0 replies; 32+ messages in thread
From: Luis Henriques @ 2021-04-20  9:34 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Jeff Layton, ceph-devel, linux-fsdevel, linux-fscrypt

Eric Biggers <ebiggers@kernel.org> writes:

> On Mon, Apr 19, 2021 at 08:19:59AM -0400, Jeff Layton wrote:
>> On Mon, 2021-04-19 at 11:09 +0100, Luis Henriques wrote:
>> > Hi Jeff!
>> > 
>> > Jeff Layton <jlayton@kernel.org> writes:
>> > <...>
>> > > +
>> > > +	case FS_IOC_ADD_ENCRYPTION_KEY:
>> > > +		ret = vet_mds_for_fscrypt(file);
>> > > +		if (ret)
>> > > +			return ret;
>> > > +		atomic_inc(&ci->i_shared_gen);
>> > 
>> > After spending some (well... a lot, actually) time looking at the MDS code
>> > to try to figure out my bug, I'm back at this point in the kernel client
>> > code.  I understand that this code is trying to invalidate the directory
>> > dentries here.  However, I just found that the directory we get at this
>> > point is the filesystem root directory, and not the directory we're trying
>> > to unlock.
>> > 
>> > So, I still don't fully understand the issue I'm seeing, but I believe the
>> > code above is assuming 'ci' is the inode being unlocked, which isn't
>> > correct.
>> > 
>> > (Note: I haven't checked if there are other ioctls getting the FS root.)
>> > 
>> > Cheers,
>> 
>> 
>> Oh, interesting. That was my assumption. I'll have to take a look more
>> closely at what effect that might have then.
>> 
>
> FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
> FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS, and FS_IOC_GET_ENCRYPTION_KEY_STATUS can
> all be executed on any file or directory on the filesystem (but preferably on
> the root directory) because they are operations on the filesystem, not on any
> specific file or directory.  They deal with encryption keys, which can protect
> any number of encrypted directories (even 0 or a large number) and/or even loose
> encrypted files that got moved into an unencrypted directory.
>
> Note that this is all described in the documentation
> (https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html).
> If the documentation is unclear please suggest improvements to it.
>
> Also, there shouldn't be any need for FS_IOC_ADD_ENCRYPTION_KEY to invalidate
> dentries itself because that is the point of fscrypt_d_revalidate(); the
> invalidation happens on-demand later.

I think the documentation is very clear regarding these ioctls.  I guess I
just need to go refresh my memory, as I read that document a long time
ago.  Thanks for reminding me to do that ;-)

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-19 16:28       ` Jeff Layton
@ 2021-04-20 10:11         ` Luis Henriques
  2021-04-20 15:52           ` Jeff Layton
  0 siblings, 1 reply; 32+ messages in thread
From: Luis Henriques @ 2021-04-20 10:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

Jeff Layton <jlayton@kernel.org> writes:

> On Mon, 2021-04-19 at 17:03 +0100, Luis Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > On Mon, 2021-04-19 at 11:30 +0100, Luis Henriques wrote:
>> ...
>> > Ouch. That looks like a real bug, alright.
>> > 
>> > Basically when building the path, we occasionally need to fetch the
>> > crypto context for parent inodes and such, and that can cause us to
>> > recurse back into __ceph_getxattr and try to issue another RPC to the
>> > MDS.
>> > 
>> > I'll have to look and see what we can do. Maybe it's safe to drop the
>> > mdsc->mutex while we're building the path? Or maybe this is a good time
>> > to re-think a lot of the really onerous locking in this codepath?
>> > 
>> > I'm open to suggestions here...
>> 
>> Yeah, I couldn't see a good fix at first glance.  Dropping the mutex
>> while building the path was my initial thought too, but it's not easy to
>> prove that's a safe thing to do.
>> 
>
> Indeed. It's an extremely coarse-grained mutex and not at all clear what
> it protects here.
>
>> The other idea I had was to fetch all the needed fscrypt contexts at the
>> end, after building the path.  But I didn't find a way to do that
>> because to build the path... we need the contexts.
>> 
>> It looks like this leaves us with the locking rethinking option.
>> 
>> /me tries harder to find another way out
>> 
>> Cheers,
>
> The other option I think is to not store the context in an xattr at all,
> and instead make a dedicated field in the inode for it that we can
> ensure is always present for encrypted inodes.  For the most part the
> crypto context is a static thing. The only exception is when we're first
> encrypting an empty dir.
>
> We already have the fscrypt bool in the inodestat, and we're going to
> need another field to hold the real size for files. It may be worthwhile
> to just reconsider the design at that level. Maybe we just need to carve
> out a chunk of fscrypt space in the inode for the client and let it
> manage that however it sees fit.

That's another solution.  Since the initial (naive) idea of having a
client-only implementation with fscrypt-agnostic MDSs is long gone, the
design can (still) be fixed to do that.  This will definitely allow us to
move forward with the fscrypt implementation.  (But we'll probably be
bitten again by these recursive RPCs in the future!)

Anyway, this is probably the most interesting solution as it also reduces
the need for extra calls to the MDS.  And the fscrypt bool in inodestat
probably becomes redundant and can be dropped.

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 20/20] ceph: add fscrypt ioctls
  2021-04-19 19:54       ` Eric Biggers
  2021-04-20  9:34         ` Luis Henriques
@ 2021-04-20 11:45         ` Jeff Layton
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-20 11:45 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Luis Henriques, ceph-devel, linux-fsdevel, linux-fscrypt

On Mon, 2021-04-19 at 12:54 -0700, Eric Biggers wrote:
> On Mon, Apr 19, 2021 at 08:19:59AM -0400, Jeff Layton wrote:
> > On Mon, 2021-04-19 at 11:09 +0100, Luis Henriques wrote:
> > > Hi Jeff!
> > > 
> > > Jeff Layton <jlayton@kernel.org> writes:
> > > <...>
> > > > +
> > > > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > > > +		ret = vet_mds_for_fscrypt(file);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +		atomic_inc(&ci->i_shared_gen);
> > > 
> > > After spending some (well... a lot, actually) time looking at the MDS code
> > > to try to figure out my bug, I'm back at this point in the kernel client
> > > code.  I understand that this code is trying to invalidate the directory
> > > dentries here.  However, I just found that the directory we get at this
> > > point is the filesystem root directory, and not the directory we're trying
> > > to unlock.
> > > 
> > > So, I still don't fully understand the issue I'm seeing, but I believe the
> > > code above is assuming 'ci' is the inode being unlocked, which isn't
> > > correct.
> > > 
> > > (Note: I haven't checked if there are other ioctls getting the FS root.)
> > > 
> > > Cheers,
> > 
> > 
> > Oh, interesting. That was my assumption. I'll have to take a look more
> > closely at what effect that might have then.
> > 
> 
> FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
> FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS, and FS_IOC_GET_ENCRYPTION_KEY_STATUS can
> all be executed on any file or directory on the filesystem (but preferably on
> the root directory) because they are operations on the filesystem, not on any
> specific file or directory.  They deal with encryption keys, which can protect
> any number of encrypted directories (even 0 or a large number) and/or even loose
> encrypted files that got moved into an unencrypted directory.
> 
> Note that this is all described in the documentation
> (https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html).
> If the documentation is unclear please suggest improvements to it.
> 
> Also, there shouldn't be any need for FS_IOC_ADD_ENCRYPTION_KEY to invalidate
> dentries itself because that is the point of fscrypt_d_revalidate(); the
> invalidation happens on-demand later.


Ok, thanks. I'll plan to drop the invalidation from the ioctl codepaths,
and leave it up to fscrypt_d_revalidate to sort out.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support
  2021-04-20 10:11         ` Luis Henriques
@ 2021-04-20 15:52           ` Jeff Layton
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2021-04-20 15:52 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fsdevel, linux-fscrypt

On Tue, 2021-04-20 at 11:11 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > On Mon, 2021-04-19 at 17:03 +0100, Luis Henriques wrote:
> > > Jeff Layton <jlayton@kernel.org> writes:
> > > 
> > > > On Mon, 2021-04-19 at 11:30 +0100, Luis Henriques wrote:
> > > ...
> > > > Ouch. That looks like a real bug, alright.
> > > > 
> > > > Basically when building the path, we occasionally need to fetch the
> > > > crypto context for parent inodes and such, and that can cause us to
> > > > recurse back into __ceph_getxattr and try to issue another RPC to the
> > > > MDS.
> > > > 
> > > > I'll have to look and see what we can do. Maybe it's safe to drop the
> > > > mdsc->mutex while we're building the path? Or maybe this is a good time
> > > > to re-think a lot of the really onerous locking in this codepath?
> > > > 
> > > > I'm open to suggestions here...
> > > 
> > > Yeah, I couldn't see a good fix at first glance.  Dropping the mutex
> > > while building the path was my initial thought too, but it's not easy to
> > > prove that's a safe thing to do.
> > > 
> > 
> > Indeed. It's an extremely coarse-grained mutex and not at all clear what
> > it protects here.
> > 
> > > The other idea I had was to fetch all the needed fscrypt contexts at the
> > > end, after building the path.  But I didn't find a way to do that
> > > because to build the path... we need the contexts.
> > > 
> > > It looks like this leaves us with the locking rethinking option.
> > > 
> > > /me tries harder to find another way out
> > > 
> > > Cheers,
> > 
> > The other option I think is to not store the context in an xattr at all,
> > and instead make a dedicated field in the inode for it that we can
> > ensure is always present for encrypted inodes.  For the most part the
> > crypto context is a static thing. The only exception is when we're first
> > encrypting an empty dir.
> > 
> > We already have the fscrypt bool in the inodestat, and we're going to
> > need another field to hold the real size for files. It may be worthwhile
> > to just reconsider the design at that level. Maybe we just need to carve
> > out a chunk of fscrypt space in the inode for the client and let it
> > manage that however it sees fit.
> 
> That's another solution.  Since the initial (naive) idea of having a
> client-only implementation with fscrypt-agnostic MDSs is long gone, the
> design can (still) be fixed to do that.  This will definitely allow us to
> move forward with the fscrypt implementation.  (But we'll probably be
> bitten again by these recursive RPCs in the future!)
> 
> Anyway, this is probably the most interesting solution as it also reduces
> the need for extra calls to the MDS.  And the fscrypt bool in inodestat
> probably becomes redundant and can be dropped.
> 

We probably can't drop the bool from the protocol, as it's now in a
released version (Pacific).

What we can do is drop tracking the bool internally in the MDS, and just
set that to true if the fscrypt blob isn't zero-length.
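
Roughly, the flag would stop being stored state and become something
computed at encode time. A tiny sketch of that idea, written in C for
illustration only: struct model_inodestat and its field names are
hypothetical and do not reflect the actual MDS types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of the stat structure sent to clients. */
struct model_inodestat {
	size_t fscrypt_blob_len;	/* length of the opaque fscrypt blob */
	bool fscrypt;			/* existing flag, kept on the wire */
};

/* Derive the wire flag from the blob instead of tracking it separately. */
void model_encode_fscrypt_flag(struct model_inodestat *st)
{
	st->fscrypt = st->fscrypt_blob_len != 0;
}
```

Clients that only know about the bool would keep working, while newer
clients could look at the blob directly.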

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-04-20 15:52 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
2021-04-13 17:50 [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 01/20] vfs: export new_inode_pseudo Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 02/20] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 03/20] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 04/20] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 05/20] ceph: crypto context handling for ceph Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 06/20] ceph: implement -o test_dummy_encryption mount option Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 07/20] ceph: preallocate inode for ops that may create one Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 08/20] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 09/20] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 10/20] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 11/20] ceph: decode alternate_name in lease info Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 12/20] ceph: send altname in MClientRequest Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 13/20] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 14/20] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 15/20] ceph: add helpers for converting names for userland presentation Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 16/20] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 17/20] ceph: add support to readdir for encrypted filenames Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 18/20] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 19/20] ceph: make ceph_get_name decrypt filenames Jeff Layton
2021-04-13 17:50 ` [RFC PATCH v6 20/20] ceph: add fscrypt ioctls Jeff Layton
2021-04-19 10:09   ` Luis Henriques
2021-04-19 12:19     ` Jeff Layton
2021-04-19 19:54       ` Eric Biggers
2021-04-20  9:34         ` Luis Henriques
2021-04-20 11:45         ` Jeff Layton
2021-04-19 10:30 ` [RFC PATCH v6 00/20] ceph+fscrypt: context, filename and symlink support Luis Henriques
2021-04-19 12:23   ` Jeff Layton
2021-04-19 16:03     ` Luis Henriques
2021-04-19 16:28       ` Jeff Layton
2021-04-20 10:11         ` Luis Henriques
2021-04-20 15:52           ` Jeff Layton
