ceph-devel.vger.kernel.org archive mirror
* [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support
@ 2021-03-26 17:32 Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 01/19] vfs: export new_inode_pseudo Jeff Layton
                   ` (20 more replies)
  0 siblings, 21 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

I haven't posted this in a while and there were some bugs shaken out of
the last posting. This adds (partial) support for fscrypt to kcephfs,
including crypto contexts, filenames and encrypted symlink targets. At
this point, the xfstests quick tests that generally pass without fscrypt
also pass with test_dummy_encryption enabled.

There is one lingering bug that I'm having trouble tracking down: xfstest
generic/477 (an open_by_handle_at test) sometimes throws a "Busy inodes
after umount" warning. I've narrowed down the issue a bit, but there is
some raciness involved so I haven't quite nailed it down yet.

This set is quite invasive. There is probably some further work to be
done to add common code helpers and the like, but the final diffstat
probably won't look too different.

This set does not include encryption of file contents. That is turning
out to be a bit trickier than first expected owing to the fact that the
MDS is usually what handles truncation, and the i_size no longer
represents the amount of data stored in the backing store. That will
probably require an MDS change to fix, and we're still sorting out the
details.

Jeff Layton (19):
  vfs: export new_inode_pseudo
  fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  fscrypt: add fscrypt_context_for_new_inode
  ceph: crypto context handling for ceph
  ceph: implement -o test_dummy_encryption mount option
  ceph: preallocate inode for ops that may create one
  ceph: add routine to create fscrypt context prior to RPC
  ceph: make ceph_mdsc_build_path use ref-walk
  ceph: add encrypted fname handling to ceph_mdsc_build_path
  ceph: decode alternate_name in lease info
  ceph: send altname in MClientRequest
  ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  ceph: make d_revalidate call fscrypt revalidator for encrypted
    dentries
  ceph: add helpers for converting names for userland presentation
  ceph: add fscrypt support to ceph_fill_trace
  ceph: add support to readdir for encrypted filenames
  ceph: create symlinks with encrypted and base64-encoded targets
  ceph: add fscrypt ioctls

 fs/ceph/Makefile            |   1 +
 fs/ceph/crypto.c            | 185 +++++++++++++++++++++++
 fs/ceph/crypto.h            | 101 +++++++++++++
 fs/ceph/dir.c               | 178 ++++++++++++++++++-----
 fs/ceph/file.c              |  56 ++++---
 fs/ceph/inode.c             | 255 +++++++++++++++++++++++++++++---
 fs/ceph/ioctl.c             |  94 ++++++++++++
 fs/ceph/mds_client.c        | 283 ++++++++++++++++++++++++++++++------
 fs/ceph/mds_client.h        |  14 +-
 fs/ceph/super.c             |  80 +++++++++-
 fs/ceph/super.h             |  16 +-
 fs/ceph/xattr.c             |  32 ++++
 fs/crypto/fname.c           |  53 +++++--
 fs/crypto/fscrypt_private.h |   9 +-
 fs/crypto/hooks.c           |   6 +-
 fs/crypto/policy.c          |  34 ++++-
 fs/inode.c                  |   1 +
 include/linux/fscrypt.h     |  10 ++
 18 files changed, 1246 insertions(+), 162 deletions(-)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

-- 
2.30.2



* [RFC PATCH v5 01/19] vfs: export new_inode_pseudo
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-04-08  1:08   ` Eric Biggers
  2021-03-26 17:32 ` [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel, Al Viro

Ceph needs to be able to allocate inodes ahead of a create that might
involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
but it puts the inode on the sb->s_inodes list and when we go to hash
it, that might be done again.

We could work around that by setting I_CREATING on the new inode, but
that causes ilookup5 to return -ESTALE if something tries to find it
before I_NEW is cleared. To work around all of this, just use
new_inode_pseudo which doesn't add it to the list.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/inode.c b/fs/inode.c
index a047ab306f9a..0745dc5d0924 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -935,6 +935,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
 	}
 	return inode;
 }
+EXPORT_SYMBOL(new_inode_pseudo);
 
 /**
  *	new_inode 	- obtain an inode
-- 
2.30.2



* [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 01/19] vfs: export new_inode_pseudo Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-04-08  1:06   ` Eric Biggers
  2021-03-26 17:32 ` [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Ceph will need to base64-encode some encrypted filenames, so make
these routines and FSCRYPT_BASE64_CHARS available to modules.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/fname.c       | 34 ++++++++++++++++++++++++----------
 include/linux/fscrypt.h |  5 +++++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 6ca7d16593ff..32b1f50433ba 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -178,10 +178,8 @@ static int fname_decrypt(const struct inode *inode,
 static const char lookup_table[65] =
 	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
 
-#define BASE64_CHARS(nbytes)	DIV_ROUND_UP((nbytes) * 4, 3)
-
 /**
- * base64_encode() - base64-encode some bytes
+ * fscrypt_base64_encode() - base64-encode some bytes
  * @src: the bytes to encode
  * @len: number of bytes to encode
  * @dst: (output) the base64-encoded string.  Not NUL-terminated.
@@ -191,7 +189,7 @@ static const char lookup_table[65] =
  *
  * Return: length of the encoded string
  */
-static int base64_encode(const u8 *src, int len, char *dst)
+int fscrypt_base64_encode(const u8 *src, int len, char *dst)
 {
 	int i, bits = 0, ac = 0;
 	char *cp = dst;
@@ -209,8 +207,20 @@ static int base64_encode(const u8 *src, int len, char *dst)
 		*cp++ = lookup_table[ac & 0x3f];
 	return cp - dst;
 }
+EXPORT_SYMBOL(fscrypt_base64_encode);
 
-static int base64_decode(const char *src, int len, u8 *dst)
+/**
+ * fscrypt_base64_decode() - base64-decode some bytes
+ * @src: the bytes to decode
+ * @len: number of bytes to decode
+ * @dst: (output) decoded binary data
+ *
+ * Decode an input string that was previously encoded using
+ * fscrypt_base64_encode.
+ *
+ * Return: length of the decoded binary data
+ */
+int fscrypt_base64_decode(const char *src, int len, u8 *dst)
 {
 	int i, bits = 0, ac = 0;
 	const char *p;
@@ -232,6 +242,7 @@ static int base64_decode(const char *src, int len, u8 *dst)
 		return -1;
 	return cp - dst;
 }
+EXPORT_SYMBOL(fscrypt_base64_decode);
 
 bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 				  u32 orig_len, u32 max_len,
@@ -263,8 +274,9 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 int fscrypt_fname_alloc_buffer(u32 max_encrypted_len,
 			       struct fscrypt_str *crypto_str)
 {
-	const u32 max_encoded_len = BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX);
 	u32 max_presented_len;
+	const u32 max_encoded_len =
+		FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX);
 
 	max_presented_len = max(max_encoded_len, max_encrypted_len);
 
@@ -342,7 +354,7 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
 		     offsetof(struct fscrypt_nokey_name, bytes));
 	BUILD_BUG_ON(offsetofend(struct fscrypt_nokey_name, bytes) !=
 		     offsetof(struct fscrypt_nokey_name, sha256));
-	BUILD_BUG_ON(BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) > NAME_MAX);
+	BUILD_BUG_ON(FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) > NAME_MAX);
 
 	if (hash) {
 		nokey_name.dirhash[0] = hash;
@@ -362,7 +374,8 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
 		       nokey_name.sha256);
 		size = FSCRYPT_NOKEY_NAME_MAX;
 	}
-	oname->len = base64_encode((const u8 *)&nokey_name, size, oname->name);
+	oname->len = fscrypt_base64_encode((const u8 *)&nokey_name, size,
+					   oname->name);
 	return 0;
 }
 EXPORT_SYMBOL(fscrypt_fname_disk_to_usr);
@@ -436,14 +449,15 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
 	 * user-supplied name
 	 */
 
-	if (iname->len > BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX))
+	if (iname->len > FSCRYPT_BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX))
 		return -ENOENT;
 
 	fname->crypto_buf.name = kmalloc(FSCRYPT_NOKEY_NAME_MAX, GFP_KERNEL);
 	if (fname->crypto_buf.name == NULL)
 		return -ENOMEM;
 
-	ret = base64_decode(iname->name, iname->len, fname->crypto_buf.name);
+	ret = fscrypt_base64_decode(iname->name, iname->len,
+				    fname->crypto_buf.name);
 	if (ret < (int)offsetof(struct fscrypt_nokey_name, bytes[1]) ||
 	    (ret > offsetof(struct fscrypt_nokey_name, sha256) &&
 	     ret != FSCRYPT_NOKEY_NAME_MAX)) {
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 2ea1387bb497..e300f6145ddc 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -46,6 +46,9 @@ struct fscrypt_name {
 /* Maximum value for the third parameter of fscrypt_operations.set_context(). */
 #define FSCRYPT_SET_CONTEXT_MAX_SIZE	40
 
+/* Calculate worst-case base64 encoding inflation */
+#define FSCRYPT_BASE64_CHARS(nbytes)	DIV_ROUND_UP((nbytes) * 4, 3)
+
 #ifdef CONFIG_FS_ENCRYPTION
 /*
  * fscrypt superblock flags
@@ -207,6 +210,8 @@ void fscrypt_free_inode(struct inode *inode);
 int fscrypt_drop_inode(struct inode *inode);
 
 /* fname.c */
+int fscrypt_base64_encode(const u8 *src, int len, char *dst);
+int fscrypt_base64_decode(const char *src, int len, u8 *dst);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.30.2
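
The base64 variant exported by the patch above uses ',' in place of '/' and emits no padding, so it is not interchangeable with standard base64. The following is a minimal userspace sketch of that scheme. The diff shows only fragments of the loop bodies, so the loops here are a reconstruction consistent with those fragments, not a copy of the kernel code. The names b64_table, B64_CHARS, b64_encode, b64_decode and b64_roundtrip_ok are illustrative and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same alphabet as lookup_table in the patch: ',' replaces '/'. */
static const char b64_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Worst-case encoded length, mirroring FSCRYPT_BASE64_CHARS(). */
#define B64_CHARS(nbytes)	(((nbytes) * 4 + 2) / 3)

/* Encode len bytes (no padding, no NUL); returns chars written. */
static int b64_encode(const uint8_t *src, int len, char *dst)
{
	int i, bits = 0;
	unsigned int ac = 0;
	char *cp = dst;

	for (i = 0; i < len; i++) {
		ac += (unsigned int)src[i] << bits;
		bits += 8;
		do {
			*cp++ = b64_table[ac & 0x3f];
			ac >>= 6;
			bits -= 6;
		} while (bits >= 6);
	}
	if (bits)
		*cp++ = b64_table[ac & 0x3f];
	return (int)(cp - dst);
}

/* Decode len chars; returns bytes written, or a negative value on bad input. */
static int b64_decode(const char *src, int len, uint8_t *dst)
{
	int i, bits = 0;
	unsigned int ac = 0;
	uint8_t *cp = dst;

	for (i = 0; i < len; i++) {
		/* A NUL byte would match the table's terminator, so reject it. */
		const char *p = src[i] ? strchr(b64_table, src[i]) : NULL;

		if (!p)
			return -2;
		ac += (unsigned int)(p - b64_table) << bits;
		bits += 6;
		if (bits >= 8) {
			*cp++ = ac & 0xff;
			ac >>= 8;
			bits -= 8;
		}
	}
	if (ac)
		return -1;	/* leftover non-zero bits: not a valid encoding */
	return (int)(cp - dst);
}

/* Returns 1 if buf (at most 64 bytes) survives encode + decode unchanged. */
static int b64_roundtrip_ok(const uint8_t *buf, int len)
{
	char enc[B64_CHARS(64)];
	uint8_t dec[64];
	int n;

	assert(len >= 0 && len <= 64);
	n = b64_encode(buf, len, enc);
	if (n != B64_CHARS(len))
		return 0;
	return b64_decode(enc, n, dec) == len &&
	       memcmp(buf, dec, (size_t)len) == 0;
}
```

Because the encoder emits exactly DIV_ROUND_UP(len * 4, 3) characters, a caller can size the output buffer with the macro alone, which is presumably why the patch moves it into the public header.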



* [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 01/19] vfs: export new_inode_pseudo Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-04-08  1:19   ` Eric Biggers
  2021-03-26 17:32 ` [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

For ceph, we want to use our own scheme for handling filenames that are
longer than NAME_MAX after encryption and base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.

Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than
a pointer to a fscrypt_policy union.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/fname.c           | 19 ++++++++++++++-----
 fs/crypto/fscrypt_private.h |  9 +++------
 fs/crypto/hooks.c           |  6 +++---
 include/linux/fscrypt.h     |  4 ++++
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 32b1f50433ba..5a794de7f61d 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -126,6 +126,7 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 
 	return 0;
 }
+EXPORT_SYMBOL(fscrypt_fname_encrypt);
 
 /**
  * fname_decrypt() - decrypt a filename
@@ -244,9 +245,9 @@ int fscrypt_base64_decode(const char *src, int len, u8 *dst)
 }
 EXPORT_SYMBOL(fscrypt_base64_decode);
 
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret)
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+				    u32 *encrypted_len_ret)
 {
 	int padding = 4 << (fscrypt_policy_flags(policy) &
 			    FSCRYPT_POLICY_FLAGS_PAD_MASK);
@@ -260,6 +261,15 @@ bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
 	return true;
 }
 
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret)
+{
+	return __fscrypt_fname_encrypted_size(&inode->i_crypt_info->ci_policy,
+					      orig_len, max_len,
+					      encrypted_len_ret);
+}
+EXPORT_SYMBOL(fscrypt_fname_encrypted_size);
+
 /**
  * fscrypt_fname_alloc_buffer() - allocate a buffer for presented filenames
  * @max_encrypted_len: maximum length of encrypted filenames the buffer will be
@@ -422,8 +432,7 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname,
 		return ret;
 
 	if (fscrypt_has_encryption_key(dir)) {
-		if (!fscrypt_fname_encrypted_size(&dir->i_crypt_info->ci_policy,
-						  iname->len,
+		if (!fscrypt_fname_encrypted_size(dir, iname->len,
 						  dir->i_sb->s_cop->max_namelen,
 						  &fname->crypto_buf.len))
 			return -ENAMETOOLONG;
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 3fa965eb3336..195de6d0db40 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -292,14 +292,11 @@ void fscrypt_generate_iv(union fscrypt_iv *iv, u64 lblk_num,
 			 const struct fscrypt_info *ci);
 
 /* fname.c */
-int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
-			  u8 *out, unsigned int olen);
-bool fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
-				  u32 orig_len, u32 max_len,
-				  u32 *encrypted_len_ret);
+bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy,
+				    u32 orig_len, u32 max_len,
+                                    u32 *encrypted_len_ret);
 
 /* hkdf.c */
-
 struct fscrypt_hkdf {
 	struct crypto_shash *hmac_tfm;
 };
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index a73b0376e6f3..e65c19aae041 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -228,9 +228,9 @@ int fscrypt_prepare_symlink(struct inode *dir, const char *target,
 	 * counting it (even though it is meaningless for ciphertext) is simpler
 	 * for now since filesystems will assume it is there and subtract it.
 	 */
-	if (!fscrypt_fname_encrypted_size(policy, len,
-					  max_len - sizeof(struct fscrypt_symlink_data),
-					  &disk_link->len))
+	if (!__fscrypt_fname_encrypted_size(policy, len,
+					    max_len - sizeof(struct fscrypt_symlink_data),
+					    &disk_link->len))
 		return -ENAMETOOLONG;
 	disk_link->len += sizeof(struct fscrypt_symlink_data);
 
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index e300f6145ddc..b5c31baaa8bf 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -212,6 +212,10 @@ int fscrypt_drop_inode(struct inode *inode);
 /* fname.c */
 int fscrypt_base64_encode(const u8 *src, int len, char *dst);
 int fscrypt_base64_decode(const char *src, int len, u8 *dst);
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+				  u32 max_len, u32 *encrypted_len_ret);
+int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
+			  u8 *out, unsigned int olen);
 int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
 			   int lookup, struct fscrypt_name *fname);
 
-- 
2.30.2
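
To see why an encrypted and base64-encoded name can exceed NAME_MAX, it helps to work through the size arithmetic. The sketch below is a userspace model, not the kernel function. Only the padding computation and the final "return true" are visible in the diff; the rest of the body (a 16-byte minimum, rounding up to the padding, capping at max_len) is an assumption about how __fscrypt_fname_encrypted_size behaves. All names and constants here (fname_encrypted_size, encoded_name_fits, CRYPTO_BLOCK_SIZE, PAD_MASK) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NAME_MAX_LEN		255	/* NAME_MAX on Linux */
#define CRYPTO_BLOCK_SIZE	16	/* assumed minimum ciphertext length */
#define PAD_MASK		0x03	/* assumed FSCRYPT_POLICY_FLAGS_PAD_MASK */
#define B64_CHARS(nbytes)	(((nbytes) * 4 + 2) / 3)

/*
 * Model of the size calculation: the ciphertext is at least one crypto
 * block, rounded up to the policy's padding (4, 8, 16 or 32 bytes) and
 * capped at max_len. Fails if the plaintext alone exceeds max_len.
 */
static bool fname_encrypted_size(unsigned int policy_flags, uint32_t orig_len,
				 uint32_t max_len, uint32_t *encrypted_len_ret)
{
	uint32_t padding = 4u << (policy_flags & PAD_MASK);
	uint32_t len;

	assert(encrypted_len_ret);
	if (orig_len > max_len)
		return false;
	len = orig_len > CRYPTO_BLOCK_SIZE ? orig_len : CRYPTO_BLOCK_SIZE;
	len = (len + padding - 1) / padding * padding;	/* round up */
	*encrypted_len_ret = len < max_len ? len : max_len;
	return true;
}

/* Would the base64 form of an n-byte ciphertext still fit in NAME_MAX? */
static bool encoded_name_fits(uint32_t encrypted_len)
{
	return B64_CHARS(encrypted_len) <= NAME_MAX_LEN;
}
```

Under these assumptions, a ciphertext of up to 191 bytes encodes to at most 255 characters, while 192 bytes already needs 256. Longer names are the case the commit message says ceph wants to handle with its own scheme.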



* [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (2 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-04-08  1:21   ` Eric Biggers
  2021-03-26 17:32 ` [RFC PATCH v5 05/19] ceph: crypto context handling for ceph Jeff Layton
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

CephFS will need to be able to generate a context for a new "prepared"
inode. Add a new routine for getting the context out of an in-core
inode.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/crypto/policy.c      | 34 ++++++++++++++++++++++++++++------
 include/linux/fscrypt.h |  1 +
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index ed3d623724cd..6a895a31560f 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -664,6 +664,31 @@ const union fscrypt_policy *fscrypt_policy_to_inherit(struct inode *dir)
 	return fscrypt_get_dummy_policy(dir->i_sb);
 }
 
+/**
+ * fscrypt_context_for_new_inode() - create an encryption context for a new inode
+ * @ctx: where context should be written
+ * @inode: inode from which to fetch policy and nonce
+ *
+ * Given an in-core "prepared" (via fscrypt_prepare_new_inode) inode,
+ * generate a new context and write it to ctx. ctx _must_ be at least
+ * FSCRYPT_SET_CONTEXT_MAX_SIZE bytes.
+ *
+ * Returns size of the resulting context or a negative error code.
+ */
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode)
+{
+	struct fscrypt_info *ci = inode->i_crypt_info;
+
+	BUILD_BUG_ON(sizeof(union fscrypt_context) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
+
+	/* fscrypt_prepare_new_inode() should have set up the key already. */
+	if (WARN_ON_ONCE(!ci))
+		return -ENOKEY;
+
+	return fscrypt_new_context(ctx, &ci->ci_policy, ci->ci_nonce);
+}
+EXPORT_SYMBOL_GPL(fscrypt_context_for_new_inode);
+
 /**
  * fscrypt_set_context() - Set the fscrypt context of a new inode
  * @inode: a new inode
@@ -680,12 +705,9 @@ int fscrypt_set_context(struct inode *inode, void *fs_data)
 	union fscrypt_context ctx;
 	int ctxsize;
 
-	/* fscrypt_prepare_new_inode() should have set up the key already. */
-	if (WARN_ON_ONCE(!ci))
-		return -ENOKEY;
-
-	BUILD_BUG_ON(sizeof(ctx) != FSCRYPT_SET_CONTEXT_MAX_SIZE);
-	ctxsize = fscrypt_new_context(&ctx, &ci->ci_policy, ci->ci_nonce);
+	ctxsize = fscrypt_context_for_new_inode(&ctx, inode);
+	if (ctxsize < 0)
+		return ctxsize;
 
 	/*
 	 * This may be the first time the inode number is available, so do any
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index b5c31baaa8bf..087fa87bca0b 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -178,6 +178,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
 int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
 int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
+int fscrypt_context_for_new_inode(void *ctx, struct inode *inode);
 int fscrypt_set_context(struct inode *inode, void *fs_data);
 
 struct fscrypt_dummy_policy {
-- 
2.30.2
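
The kerneldoc above requires the caller's buffer to hold FSCRYPT_SET_CONTEXT_MAX_SIZE (40) bytes, while the return value is the size actually written. The userspace sketch below illustrates why those two numbers can differ. The struct layouts are an assumption modelled on the v1 and v2 context formats in fs/crypto/fscrypt_private.h and are not part of this patch; context_size and context_buffer_ok are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SET_CONTEXT_MAX_SIZE	40	/* FSCRYPT_SET_CONTEXT_MAX_SIZE */

/* Assumed on-disk layouts; every field is a byte, so there is no padding. */
struct context_v1 {
	uint8_t version;
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t master_key_descriptor[8];
	uint8_t nonce[16];
};

struct context_v2 {
	uint8_t version;
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t reserved[4];
	uint8_t master_key_identifier[16];
	uint8_t nonce[16];
};

union context {
	uint8_t version;
	struct context_v1 v1;
	struct context_v2 v2;
};

/* Userspace counterpart of the BUILD_BUG_ON() in the patch. */
static_assert(sizeof(union context) == SET_CONTEXT_MAX_SIZE,
	      "context union must match the advertised maximum size");

/* Bytes used by a context of the given version, or -1 if unknown. */
static int context_size(uint8_t version)
{
	switch (version) {
	case 1:
		return (int)sizeof(struct context_v1);
	case 2:
		return (int)sizeof(struct context_v2);
	default:
		return -1;
	}
}

/* Hypothetical caller-side check: can buf_len bytes hold any context? */
static int context_buffer_ok(size_t buf_len)
{
	return buf_len >= sizeof(union context);
}
```

A caller that always allocates the maximum and then stores only the returned number of bytes works for either version, which matches how fscrypt_set_context uses ctxsize after this refactoring.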



* [RFC PATCH v5 05/19] ceph: crypto context handling for ceph
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (3 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 06/19] ceph: implement -o test_dummy_encryption mount option Jeff Layton
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Store the fscrypt context for an inode as an encryption.ctx xattr.
When we get a new inode in a trace, set the S_ENCRYPTED bit if
the xattr blob has an encryption.ctx xattr.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/Makefile |  1 +
 fs/ceph/crypto.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 24 ++++++++++++++++++++++++
 fs/ceph/inode.c  | 15 +++++++++++++++
 fs/ceph/super.c  |  3 +++
 fs/ceph/super.h  |  1 +
 fs/ceph/xattr.c  | 32 ++++++++++++++++++++++++++++++++
 7 files changed, 118 insertions(+)
 create mode 100644 fs/ceph/crypto.c
 create mode 100644 fs/ceph/crypto.h

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 50c635dc7f71..1f77ca04c426 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -12,3 +12,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
 
 ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
 ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
+ceph-$(CONFIG_FS_ENCRYPTION) += crypto.o
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
new file mode 100644
index 000000000000..dbe8b60fd1b0
--- /dev/null
+++ b/fs/ceph/crypto.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ceph/ceph_debug.h>
+#include <linux/xattr.h>
+#include <linux/fscrypt.h>
+
+#include "super.h"
+#include "crypto.h"
+
+static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len)
+{
+	return __ceph_getxattr(inode, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, len);
+}
+
+static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t len, void *fs_data)
+{
+	int ret;
+
+	WARN_ON_ONCE(fs_data);
+	ret = __ceph_setxattr(inode, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT, ctx, len, XATTR_CREATE);
+	if (ret == 0)
+		inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+	return ret;
+}
+
+static bool ceph_crypt_empty_dir(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	return ci->i_rsubdirs + ci->i_rfiles == 1;
+}
+
+static struct fscrypt_operations ceph_fscrypt_ops = {
+	.get_context		= ceph_crypt_get_context,
+	.set_context		= ceph_crypt_set_context,
+	.empty_dir		= ceph_crypt_empty_dir,
+	.max_namelen		= NAME_MAX,
+};
+
+void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
new file mode 100644
index 000000000000..189bd8424284
--- /dev/null
+++ b/fs/ceph/crypto.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ceph fscrypt functionality
+ */
+
+#ifndef _CEPH_CRYPTO_H
+#define _CEPH_CRYPTO_H
+
+#include <linux/fscrypt.h>
+
+#define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
+
+#ifdef CONFIG_FS_ENCRYPTION
+void ceph_fscrypt_set_ops(struct super_block *sb);
+
+#else /* CONFIG_FS_ENCRYPTION */
+
+static inline void ceph_fscrypt_set_ops(struct super_block *sb)
+{
+}
+
+#endif /* CONFIG_FS_ENCRYPTION */
+
+#endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 2c512475c170..33dda23c99e0 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -14,10 +14,12 @@
 #include <linux/random.h>
 #include <linux/sort.h>
 #include <linux/iversion.h>
+#include <linux/fscrypt.h>
 
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 #include <linux/ceph/decode.h>
 
 /*
@@ -566,6 +568,7 @@ void ceph_evict_inode(struct inode *inode)
 	clear_inode(inode);
 
 	ceph_fscache_unregister_inode_cookie(ci);
+	fscrypt_put_encryption_info(inode);
 
 	__ceph_remove_caps(ci);
 
@@ -944,6 +947,18 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		ceph_forget_all_cached_acls(inode);
 		ceph_security_invalidate_secctx(inode);
 		xattr_blob = NULL;
+
+		/*
+		 * Most inodes inherit the encrypted flag from their parent,
+		 * but empty directories can end up being encrypted later via
+		 * ioctl. Only check for encryption if it's not already encrypted,
+		 * and it's a new inode, or a directory.
+		 */
+		if (!IS_ENCRYPTED(inode) &&
+		    ((inode->i_state & I_NEW) || S_ISDIR(inode->i_mode))) {
+			if (ceph_inode_has_xattr(ci, CEPH_XATTR_NAME_ENCRYPTION_CONTEXT))
+				inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
+		}
 	}
 
 	/* finally update i_version */
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 9b1b7f4cfdd4..cdac6ff675e2 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -20,6 +20,7 @@
 #include "super.h"
 #include "mds_client.h"
 #include "cache.h"
+#include "crypto.h"
 
 #include <linux/ceph/ceph_features.h>
 #include <linux/ceph/decode.h>
@@ -988,6 +989,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc)
 	s->s_time_min = 0;
 	s->s_time_max = U32_MAX;
 
+	ceph_fscrypt_set_ops(s);
+
 	ret = set_anon_super_fc(s, fc);
 	if (ret != 0)
 		fsc->sb = NULL;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 5e0e1aeee1b5..36b12e33b2bc 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1016,6 +1016,7 @@ extern ssize_t ceph_listxattr(struct dentry *, char *, size_t);
 extern struct ceph_buffer *__ceph_build_xattrs_blob(struct ceph_inode_info *ci);
 extern void __ceph_destroy_xattrs(struct ceph_inode_info *ci);
 extern const struct xattr_handler *ceph_xattr_handlers[];
+bool ceph_inode_has_xattr(struct ceph_inode_info *ci, const char *name);
 
 struct ceph_acl_sec_ctx {
 #ifdef CONFIG_CEPH_FS_POSIX_ACL
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 02f59bcb4f27..38ac2968e4a1 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -1360,6 +1360,38 @@ void ceph_release_acl_sec_ctx(struct ceph_acl_sec_ctx *as_ctx)
 		ceph_pagelist_release(as_ctx->pagelist);
 }
 
+/* Return true if inode's xattr blob has an xattr named "name" */
+bool ceph_inode_has_xattr(struct ceph_inode_info *ci, const char *name)
+{
+	void *p, *end;
+	u32 numattr;
+	size_t namelen;
+
+	lockdep_assert_held(&ci->i_ceph_lock);
+
+	if (!ci->i_xattrs.blob || ci->i_xattrs.blob->vec.iov_len <= 4)
+		return false;
+
+	namelen = strlen(name);
+	p = ci->i_xattrs.blob->vec.iov_base;
+	end = p + ci->i_xattrs.blob->vec.iov_len;
+	ceph_decode_32_safe(&p, end, numattr, bad);
+
+	while (numattr--) {
+		u32 len;
+
+		ceph_decode_32_safe(&p, end, len, bad);
+		ceph_decode_need(&p, end, len, bad);
+		if (len == namelen && !memcmp(p, name, len))
+			return true;
+		p += len;
+		ceph_decode_32_safe(&p, end, len, bad);
+		ceph_decode_skip_n(&p, end, len, bad);
+	}
+bad:
+	return false;
+}
+
 /*
  * List of handlers for synthetic system.* attributes. Other
  * attributes are handled directly.
-- 
2.30.2
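
ceph_inode_has_xattr in the patch above walks the raw xattr blob: a 32-bit entry count, then for each entry a length-prefixed name followed by a length-prefixed value. The sketch below models that walk in userspace with explicit bounds checks. It assumes little-endian 32-bit lengths, as in the rest of the Ceph wire format, and leaves out the locking. The helper names (get_le32, put_le32, blob_put_entry, blob_has_xattr) are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian u32 at *p and advance; false if out of bounds. */
static bool get_le32(const uint8_t **p, const uint8_t *end, uint32_t *val)
{
	if ((size_t)(end - *p) < 4)
		return false;
	*val = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	       (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return true;
}

/* Write a little-endian u32 at buf[off]; returns the new offset. */
static size_t put_le32(uint8_t *buf, size_t off, uint32_t v)
{
	buf[off] = v & 0xff;
	buf[off + 1] = (v >> 8) & 0xff;
	buf[off + 2] = (v >> 16) & 0xff;
	buf[off + 3] = (v >> 24) & 0xff;
	return off + 4;
}

/* Append one name/value entry (for building test blobs). */
static size_t blob_put_entry(uint8_t *buf, size_t cap, size_t off,
			     const char *name, const void *val,
			     uint32_t val_len)
{
	uint32_t name_len = (uint32_t)strlen(name);

	assert(off + 8 + name_len + val_len <= cap);
	off = put_le32(buf, off, name_len);
	memcpy(buf + off, name, name_len);
	off = put_le32(buf, off + name_len, val_len);
	memcpy(buf + off, val, val_len);
	return off + val_len;
}

/* Return true if the blob contains an xattr whose name equals "name". */
static bool blob_has_xattr(const uint8_t *blob, size_t blob_len,
			   const char *name)
{
	const uint8_t *p, *end;
	size_t namelen = strlen(name);
	uint32_t numattr, len;

	if (!blob || blob_len <= 4)
		return false;
	p = blob;
	end = blob + blob_len;
	if (!get_le32(&p, end, &numattr))
		return false;

	while (numattr--) {
		/* name length, then the name bytes */
		if (!get_le32(&p, end, &len) || (size_t)(end - p) < len)
			return false;
		if (len == namelen && !memcmp(p, name, len))
			return true;
		p += len;
		/* value length, then skip the value bytes */
		if (!get_le32(&p, end, &len) || (size_t)(end - p) < len)
			return false;
		p += len;
	}
	return false;
}
```

As in the kernel version, a malformed or truncated blob is treated as "not found" rather than as an error, so the inode is simply left unmarked in that case.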



* [RFC PATCH v5 06/19] ceph: implement -o test_dummy_encryption mount option
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (4 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 05/19] ceph: crypto context handling for ceph Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 07/19] ceph: preallocate inode for ops that may create one Jeff Layton
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c |  6 ++++
 fs/ceph/crypto.h |  8 +++++
 fs/ceph/super.c  | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h  |  7 ++++-
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index dbe8b60fd1b0..879d9a0d3751 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -29,9 +29,15 @@ static bool ceph_crypt_empty_dir(struct inode *inode)
 	return ci->i_rsubdirs + ci->i_rfiles == 1;
 }
 
+static const union fscrypt_policy *ceph_get_dummy_policy(struct super_block *sb)
+{
+	return ceph_sb_to_client(sb)->dummy_enc_policy.policy;
+}
+
 static struct fscrypt_operations ceph_fscrypt_ops = {
 	.get_context		= ceph_crypt_get_context,
 	.set_context		= ceph_crypt_set_context,
+	.get_dummy_policy	= ceph_get_dummy_policy,
 	.empty_dir		= ceph_crypt_empty_dir,
 	.max_namelen		= NAME_MAX,
 };
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 189bd8424284..0dd043b56096 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -13,12 +13,20 @@
 #ifdef CONFIG_FS_ENCRYPTION
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
+static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
+}
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 }
 
+static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
+{
+}
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index cdac6ff675e2..48a99da4ff97 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -45,6 +45,7 @@ static void ceph_put_super(struct super_block *s)
 	struct ceph_fs_client *fsc = ceph_sb_to_client(s);
 
 	dout("put_super\n");
+	ceph_fscrypt_free_dummy_policy(fsc);
 	ceph_mdsc_close_sessions(fsc->mdsc);
 }
 
@@ -159,6 +160,7 @@ enum {
 	Opt_quotadf,
 	Opt_copyfrom,
 	Opt_wsync,
+	Opt_test_dummy_encryption,
 };
 
 enum ceph_recover_session_mode {
@@ -197,6 +199,8 @@ static const struct fs_parameter_spec ceph_mount_parameters[] = {
 	fsparam_u32	("rsize",			Opt_rsize),
 	fsparam_string	("snapdirname",			Opt_snapdirname),
 	fsparam_string	("source",			Opt_source),
+	fsparam_flag	("test_dummy_encryption",	Opt_test_dummy_encryption),
+	fsparam_string	("test_dummy_encryption",	Opt_test_dummy_encryption),
 	fsparam_u32	("wsize",			Opt_wsize),
 	fsparam_flag_no	("wsync",			Opt_wsync),
 	{}
@@ -455,6 +459,16 @@ static int ceph_parse_mount_param(struct fs_context *fc,
 		else
 			fsopt->flags |= CEPH_MOUNT_OPT_ASYNC_DIROPS;
 		break;
+	case Opt_test_dummy_encryption:
+#ifdef CONFIG_FS_ENCRYPTION
+		kfree(fsopt->test_dummy_encryption);
+		fsopt->test_dummy_encryption = param->string;
+		param->string = NULL;
+		fsopt->flags |= CEPH_MOUNT_OPT_TEST_DUMMY_ENC;
+#else
+		warnfc(fc, "FS encryption not supported: test_dummy_encryption mount option ignored");
+#endif
+		break;
 	default:
 		BUG();
 	}
@@ -474,6 +488,7 @@ static void destroy_mount_options(struct ceph_mount_options *args)
 	kfree(args->mds_namespace);
 	kfree(args->server_path);
 	kfree(args->fscache_uniq);
+	kfree(args->test_dummy_encryption);
 	kfree(args);
 }
 
@@ -581,6 +596,8 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
 	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
 		seq_puts(m, ",nowsync");
 
+	fscrypt_show_test_dummy_encryption(m, ',', root->d_sb);
+
 	if (fsopt->wsize != CEPH_MAX_WRITE_SIZE)
 		seq_printf(m, ",wsize=%u", fsopt->wsize);
 	if (fsopt->rsize != CEPH_MAX_READ_SIZE)
@@ -916,6 +933,52 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
 	return root;
 }
 
+#ifdef CONFIG_FS_ENCRYPTION
+static int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	struct ceph_fs_client *fsc = sb->s_fs_info;
+
+	/*
+	 * No changing encryption context on remount. Note that
+	 * fscrypt_set_test_dummy_encryption will validate the version
+	 * string passed in (if any).
+	 */
+	if (fsopt->flags & CEPH_MOUNT_OPT_TEST_DUMMY_ENC) {
+		int err = 0;
+
+		if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE && !fsc->dummy_enc_policy.policy) {
+			errorfc(fc, "Can't set test_dummy_encryption on remount");
+			return -EEXIST;
+		}
+
+		err = fscrypt_set_test_dummy_encryption(sb,
+							fsc->mount_options->test_dummy_encryption,
+							&fsc->dummy_enc_policy);
+		if (err) {
+			if (err == -EEXIST)
+				errorfc(fc, "Can't change test_dummy_encryption on remount");
+			else if (err == -EINVAL)
+				errorfc(fc, "Value of option \"%s\" is unrecognized",
+					fsc->mount_options->test_dummy_encryption);
+			else
+				errorfc(fc, "Error processing option \"%s\" [%d]",
+					fsc->mount_options->test_dummy_encryption, err);
+			return err;
+		}
+		warnfc(fc, "test_dummy_encryption mode enabled");
+	}
+	return 0;
+}
+#else
+static inline int ceph_set_test_dummy_encryption(struct super_block *sb, struct fs_context *fc,
+						struct ceph_mount_options *fsopt)
+{
+	warnfc(fc, "test_dummy_encryption mode ignored");
+	return 0;
+}
+#endif
+
 /*
  * mount: join the ceph cluster, and open root directory.
  */
@@ -944,6 +1007,10 @@ static struct dentry *ceph_real_mount(struct ceph_fs_client *fsc,
 				goto out;
 		}
 
+		err = ceph_set_test_dummy_encryption(fsc->sb, fc, fsc->mount_options);
+		if (err)
+			goto out;
+
 		dout("mount opening path '%s'\n", path);
 
 		ceph_fs_debugfs_init(fsc);
@@ -1138,16 +1205,22 @@ static void ceph_free_fc(struct fs_context *fc)
 
 static int ceph_reconfigure_fc(struct fs_context *fc)
 {
+	int err;
 	struct ceph_parse_opts_ctx *pctx = fc->fs_private;
 	struct ceph_mount_options *fsopt = pctx->opts;
-	struct ceph_fs_client *fsc = ceph_sb_to_client(fc->root->d_sb);
+	struct super_block *sb = fc->root->d_sb;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+
+	err = ceph_set_test_dummy_encryption(sb, fc, fsopt);
+	if (err)
+		return err;
 
 	if (fsopt->flags & CEPH_MOUNT_OPT_ASYNC_DIROPS)
 		ceph_set_mount_opt(fsc, ASYNC_DIROPS);
 	else
 		ceph_clear_mount_opt(fsc, ASYNC_DIROPS);
 
-	sync_filesystem(fc->root->d_sb);
+	sync_filesystem(sb);
 	return 0;
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 36b12e33b2bc..831c1e76789d 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -17,6 +17,7 @@
 #include <linux/posix_acl.h>
 #include <linux/refcount.h>
 #include <linux/security.h>
+#include <linux/fscrypt.h>
 
 #include <linux/ceph/libceph.h>
 
@@ -45,6 +46,7 @@
 #define CEPH_MOUNT_OPT_NOQUOTADF       (1<<13) /* no root dir quota in statfs */
 #define CEPH_MOUNT_OPT_NOCOPYFROM      (1<<14) /* don't use RADOS 'copy-from' op */
 #define CEPH_MOUNT_OPT_ASYNC_DIROPS    (1<<15) /* allow async directory ops */
+#define CEPH_MOUNT_OPT_TEST_DUMMY_ENC  (1<<16) /* enable dummy encryption (for testing) */
 
 #define CEPH_MOUNT_OPT_DEFAULT			\
 	(CEPH_MOUNT_OPT_DCACHE |		\
@@ -97,6 +99,7 @@ struct ceph_mount_options {
 	char *mds_namespace;  /* default NULL */
 	char *server_path;    /* default NULL (means "/") */
 	char *fscache_uniq;   /* default NULL */
+	char *test_dummy_encryption;	/* default NULL */
 };
 
 struct ceph_fs_client {
@@ -136,9 +139,11 @@ struct ceph_fs_client {
 #ifdef CONFIG_CEPH_FSCACHE
 	struct fscache_cookie *fscache;
 #endif
+#ifdef CONFIG_FS_ENCRYPTION
+	struct fscrypt_dummy_policy dummy_enc_policy;
+#endif
 };
 
-
 /*
  * File i/o capability.  This tracks shared state with the metadata
  * server that allows us to cache or writeback attributes or to read
-- 
2.30.2



* [RFC PATCH v5 07/19] ceph: preallocate inode for ops that may create one
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (5 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 06/19] ceph: implement -o test_dummy_encryption mount option Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 08/19] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

When creating a new inode, we need to determine the crypto context
before we can transmit the RPC. The fscrypt API has a routine for getting
a crypto context before a create occurs, but it requires an inode.

Change the ceph code to preallocate an inode in advance of a create of
any sort (open(), mknod(), symlink(), etc). Move the existing code that
generates the ACL and SELinux blobs into this routine since that's
mostly common across all the different codepaths.

In most cases, we just want to allow ceph_fill_trace to use that inode
after the reply comes in, so add a new field to the MDS request for it
(r_new_inode).

The async create codepath is a bit different though. In that case, we
want to hash the inode in advance of the RPC so that it can be used
before the reply comes in. If the call subsequently fails with
-EJUKEBOX, then just put the references and clean up the as_ctx. Note
that with this change, we now need to regenerate the as_ctx when this
occurs, but it's quite rare for it to happen.
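
To make the reference handling easier to follow, here is a minimal userspace
sketch of the "preallocate, then hash or discard" pattern. It is not the kernel
code: toy_inode, toy_table, toy_new_inode, toy_iput and toy_get_inode are
hypothetical stand-ins, and the toy table holds its own reference, which the
real inode hash does not. Unlike ceph_get_inode, the toy lookup also returns
NULL instead of allocating when nothing is passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_inode {
	unsigned long ino;
	int refcount;
};

#define TOY_TABLE_SIZE 8

struct toy_table {
	struct toy_inode *slot[TOY_TABLE_SIZE];
};

/* Allocate an unhashed inode before its inode number is known. */
static struct toy_inode *toy_new_inode(void)
{
	struct toy_inode *in = calloc(1, sizeof(*in));

	if (in)
		in->refcount = 1;
	return in;
}

/* Drop one reference and free the inode when the last one goes away. */
static void toy_iput(struct toy_inode *in)
{
	if (in && --in->refcount == 0)
		free(in);
}

/*
 * Look up @ino, or hash @newino under that number if it is not present.
 * The caller's reference to @newino is always consumed: it is handed back
 * through the return value, or dropped when an existing inode wins.
 */
static struct toy_inode *toy_get_inode(struct toy_table *t, unsigned long ino,
				       struct toy_inode *newino)
{
	size_t i, free_idx = TOY_TABLE_SIZE;

	for (i = 0; i < TOY_TABLE_SIZE; i++) {
		if (t->slot[i] && t->slot[i]->ino == ino) {
			toy_iput(newino);	/* already hashed; drop ours */
			t->slot[i]->refcount++;	/* reference for the caller */
			return t->slot[i];
		}
		if (!t->slot[i] && free_idx == TOY_TABLE_SIZE)
			free_idx = i;
	}

	if (!newino || free_idx == TOY_TABLE_SIZE) {
		toy_iput(newino);
		return NULL;
	}

	newino->ino = ino;
	newino->refcount++;		/* reference held by the table */
	t->slot[free_idx] = newino;
	return newino;
}
```

The point of the sketch is that the caller never has to touch the preallocated
inode again after handing it to the lookup, whichever way the lookup goes.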

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c        | 49 ++++++++++++++++++------------
 fs/ceph/file.c       | 56 ++++++++++++++++++++++------------
 fs/ceph/inode.c      | 72 ++++++++++++++++++++++++++++++++++++++++----
 fs/ceph/mds_client.c |  3 +-
 fs/ceph/mds_client.h |  1 +
 fs/ceph/super.h      |  5 ++-
 6 files changed, 138 insertions(+), 48 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 570662dec3fe..496d24b003dd 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -851,13 +851,6 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("mknod in dir %p dentry %p mode 0%ho rdev %d\n",
 	     dir, dentry, mode, rdev);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS);
@@ -865,6 +858,14 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -880,6 +881,7 @@ static int ceph_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
@@ -902,6 +904,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
 	struct ceph_acl_sec_ctx as_ctx = {};
+	umode_t mode = S_IFLNK | 0777;
 	int err;
 
 	if (ceph_snap(dir) != CEPH_NOSNAP)
@@ -912,21 +915,24 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	err = ceph_security_init_secctx(dentry, S_IFLNK | 0777, &as_ctx);
-	if (err < 0)
-		goto out;
-
 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto out;
 	}
+
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_path2 = kstrdup(dest, GFP_KERNEL);
 	if (!req->r_path2) {
 		err = -ENOMEM;
-		ceph_mdsc_put_request(req);
-		goto out;
+		goto out_req;
 	}
 	req->r_parent = dir;
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
@@ -941,6 +947,7 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	err = ceph_mdsc_do_request(mdsc, dir, req);
 	if (!err && !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (err)
@@ -976,13 +983,6 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
-	mode |= S_IFDIR;
-	err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-	if (err < 0)
-		goto out;
-	err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-	if (err < 0)
-		goto out;
 
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
@@ -990,6 +990,14 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out;
 	}
 
+	mode |= S_IFDIR;
+	req->r_new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+	if (IS_ERR(req->r_new_inode)) {
+		err = PTR_ERR(req->r_new_inode);
+		req->r_new_inode = NULL;
+		goto out_req;
+	}
+
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
 	req->r_parent = dir;
@@ -1006,6 +1014,7 @@ static int ceph_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	    !req->r_reply_info.head->is_target &&
 	    !req->r_reply_info.head->is_dentry)
 		err = ceph_handle_notrace_create(dir, dentry);
+out_req:
 	ceph_mdsc_put_request(req);
 out:
 	if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 31542eac7e59..33c00999c202 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -565,7 +565,8 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
 	ceph_mdsc_release_dir_caps(req);
 }
 
-static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
+static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
+				    struct dentry *dentry,
 				    struct file *file, umode_t mode,
 				    struct ceph_mds_request *req,
 				    struct ceph_acl_sec_ctx *as_ctx,
@@ -576,17 +577,12 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_reply_inode in = { };
 	struct ceph_mds_reply_info_in iinfo = { .in = &in };
 	struct ceph_inode_info *ci = ceph_inode(dir);
-	struct inode *inode;
 	struct timespec64 now;
 	struct ceph_vino vino = { .ino = req->r_deleg_ino,
 				  .snap = CEPH_NOSNAP };
 
 	ktime_get_real_ts64(&now);
 
-	inode = ceph_get_inode(dentry->d_sb, vino);
-	if (IS_ERR(inode))
-		return PTR_ERR(inode);
-
 	iinfo.inline_version = CEPH_INLINE_NONE;
 	iinfo.change_attr = 1;
 	ceph_encode_timespec64(&iinfo.btime, &now);
@@ -622,8 +618,7 @@ static int ceph_finish_async_create(struct inode *dir, struct dentry *dentry,
 		ceph_dir_clear_complete(dir);
 		if (!d_unhashed(dentry))
 			d_drop(dentry);
-		if (inode->i_state & I_NEW)
-			discard_new_inode(inode);
+		discard_new_inode(inode);
 	} else {
 		struct dentry *dn;
 
@@ -663,6 +658,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_mds_request *req;
+	struct inode *new_inode = NULL;
 	struct dentry *dn;
 	struct ceph_acl_sec_ctx as_ctx = {};
 	bool try_async = ceph_test_mount_opt(fsc, ASYNC_DIROPS);
@@ -675,21 +671,21 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 
 	if (dentry->d_name.len > NAME_MAX)
 		return -ENAMETOOLONG;
-
+retry:
 	if (flags & O_CREAT) {
 		if (ceph_quota_is_max_files_exceeded(dir))
 			return -EDQUOT;
-		err = ceph_pre_init_acls(dir, &mode, &as_ctx);
-		if (err < 0)
-			return err;
-		err = ceph_security_init_secctx(dentry, mode, &as_ctx);
-		if (err < 0)
+
+		new_inode = ceph_new_inode(dir, dentry, &mode, &as_ctx);
+		if (IS_ERR(new_inode)) {
+			err = PTR_ERR(new_inode);
 			goto out_ctx;
+		}
 	} else if (!d_in_lookup(dentry)) {
 		/* If it's not being looked up, it's negative */
 		return -ENOENT;
 	}
-retry:
+
 	/* do the open */
 	req = prepare_open_request(dir->i_sb, flags, mode);
 	if (IS_ERR(req)) {
@@ -713,21 +709,38 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 			req->r_pagelist = as_ctx.pagelist;
 			as_ctx.pagelist = NULL;
 		}
-		if (try_async &&
-		    (req->r_dir_caps =
-		      try_prep_async_create(dir, dentry, &lo,
-					    &req->r_deleg_ino))) {
+
+		if (try_async && (req->r_dir_caps =
+				  try_prep_async_create(dir, dentry, &lo, &req->r_deleg_ino))) {
+			struct ceph_vino vino = { .ino = req->r_deleg_ino,
+						  .snap = CEPH_NOSNAP };
+
 			set_bit(CEPH_MDS_R_ASYNC, &req->r_req_flags);
 			req->r_args.open.flags |= cpu_to_le32(CEPH_O_EXCL);
 			req->r_callback = ceph_async_create_cb;
+
+			/* Hash inode before RPC */
+			new_inode = ceph_get_inode(dir->i_sb, vino, new_inode);
+			if (IS_ERR(new_inode)) {
+				err = PTR_ERR(new_inode);
+				new_inode = NULL;
+				goto out_req;
+			}
+			WARN_ON_ONCE(!(new_inode->i_state & I_NEW));
+
 			err = ceph_mdsc_submit_request(mdsc, dir, req);
 			if (!err) {
-				err = ceph_finish_async_create(dir, dentry,
+				err = ceph_finish_async_create(dir, new_inode, dentry,
 							file, mode, req,
 							&as_ctx, &lo);
+				new_inode = NULL;
 			} else if (err == -EJUKEBOX) {
 				restore_deleg_ino(dir, req->r_deleg_ino);
 				ceph_mdsc_put_request(req);
+				discard_new_inode(new_inode);
+				ceph_release_acl_sec_ctx(&as_ctx);
+				memset(&as_ctx, 0, sizeof(as_ctx));
+				new_inode = NULL;
 				try_async = false;
 				goto retry;
 			}
@@ -736,6 +749,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+	req->r_new_inode = new_inode;
+	new_inode = NULL;
 	err = ceph_mdsc_do_request(mdsc,
 				   (flags & (O_CREAT|O_TRUNC)) ? dir : NULL,
 				   req);
@@ -776,6 +791,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 out_req:
 	ceph_mdsc_put_request(req);
+	iput(new_inode);
 out_ctx:
 	ceph_release_acl_sec_ctx(&as_ctx);
 	dout("atomic_open result=%d\n", err);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 33dda23c99e0..7b70187cc564 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -54,15 +54,75 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
 	return 0;
 }
 
-struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino)
+/**
+ * ceph_new_inode - allocate a new inode in advance of an expected create
+ * @dir: parent directory for new inode
+ * @dentry: dentry that may eventually point to new inode
+ * @mode: mode of new inode
+ * @as_ctx: pointer to inherited security context
+ *
+ * Allocate a new inode in advance of an operation to create a new inode.
+ * This allocates the inode and sets up the acl_sec_ctx with appropriate
+ * info for the new inode.
+ *
+ * Returns a pointer to the new inode or an ERR_PTR.
+ */
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx)
 {
+	int err;
 	struct inode *inode;
 
-	inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
-			     ceph_set_ino_cb, &vino);
+	inode = new_inode_pseudo(dir->i_sb);
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 
+	if (!S_ISLNK(*mode)) {
+		err = ceph_pre_init_acls(dir, mode, as_ctx);
+		if (err < 0)
+			goto out_err;
+	}
+
+	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
+	if (err < 0)
+		goto out_err;
+
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+	return inode;
+out_err:
+	iput(inode);
+	return ERR_PTR(err);
+}
+
+/**
+ * ceph_get_inode - find or create/hash a new inode
+ * @sb: superblock to search and allocate in
+ * @vino: vino to search for
+ * @newino: optional new inode to insert if one isn't found (may be NULL)
+ *
+ * Search for or insert a new inode into the hash for the given vino, and return a
+ * reference to it. If @newino is non-NULL, its reference is consumed.
+ */
+struct inode *ceph_get_inode(struct super_block *sb, struct ceph_vino vino, struct inode *newino)
+{
+	struct inode *inode;
+
+	if (newino) {
+		inode = inode_insert5(newino, (unsigned long)vino.ino, ceph_ino_compare,
+					ceph_set_ino_cb, &vino);
+		if (inode != newino)
+			iput(newino);
+	} else {
+		inode = iget5_locked(sb, (unsigned long)vino.ino, ceph_ino_compare,
+				     ceph_set_ino_cb, &vino);
+	}
+
+	if (!inode) {
+		dout("No inode found for %llx.%llx\n", vino.ino, vino.snap);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	dout("get_inode on %llu=%llx.%llx got %p new %d\n", ceph_present_inode(inode),
 	     ceph_vinop(inode), inode, !!(inode->i_state & I_NEW));
 	return inode;
@@ -77,7 +137,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
 		.ino = ceph_ino(parent),
 		.snap = CEPH_SNAPDIR,
 	};
-	struct inode *inode = ceph_get_inode(parent->i_sb, vino);
+	struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	if (IS_ERR(inode))
@@ -1556,7 +1616,7 @@ static int readdir_prepopulate_inodes_only(struct ceph_mds_request *req,
 		vino.ino = le64_to_cpu(rde->inode.in->ino);
 		vino.snap = le64_to_cpu(rde->inode.in->snapid);
 
-		in = ceph_get_inode(req->r_dentry->d_sb, vino);
+		in = ceph_get_inode(req->r_dentry->d_sb, vino, NULL);
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			dout("new_inode badness got %d\n", err);
@@ -1759,7 +1819,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		if (d_really_is_positive(dn)) {
 			in = d_inode(dn);
 		} else {
-			in = ceph_get_inode(parent->d_sb, tvino);
+			in = ceph_get_inode(parent->d_sb, tvino, NULL);
 			if (IS_ERR(in)) {
 				dout("new_inode badness\n");
 				d_drop(dn);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 73ecb7d128c9..e3284de74ca4 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -816,6 +816,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 		ceph_async_iput(req->r_parent);
 	}
 	ceph_async_iput(req->r_target_inode);
+	ceph_async_iput(req->r_new_inode);
 	if (req->r_dentry)
 		dput(req->r_dentry);
 	if (req->r_old_dentry)
@@ -3229,7 +3230,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 			.snap = le64_to_cpu(rinfo->targeti.in->snapid)
 		};
 
-		in = ceph_get_inode(mdsc->fsc->sb, tvino);
+		in = ceph_get_inode(mdsc->fsc->sb, tvino, xchg(&req->r_new_inode, NULL));
 		if (IS_ERR(in)) {
 			err = PTR_ERR(in);
 			mutex_lock(&session->s_mutex);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index bf99c5ba47fc..2b18ea24c650 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -260,6 +260,7 @@ struct ceph_mds_request {
 
 	struct inode *r_parent;		    /* parent dir inode */
 	struct inode *r_target_inode;       /* resulting inode */
+	struct inode *r_new_inode;	    /* new inode (for creates) */
 
 #define CEPH_MDS_R_DIRECT_IS_HASH	(1) /* r_direct_hash is valid */
 #define CEPH_MDS_R_ABORTED		(2) /* call was aborted */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 831c1e76789d..8a40374f9154 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -939,6 +939,7 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
 /* inode.c */
 struct ceph_mds_reply_info_in;
 struct ceph_mds_reply_dirfrag;
+struct ceph_acl_sec_ctx;
 
 extern const struct inode_operations ceph_file_iops;
 
@@ -946,8 +947,10 @@ extern struct inode *ceph_alloc_inode(struct super_block *sb);
 extern void ceph_evict_inode(struct inode *inode);
 extern void ceph_free_inode(struct inode *inode);
 
+struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
+			     umode_t *mode, struct ceph_acl_sec_ctx *as_ctx);
 extern struct inode *ceph_get_inode(struct super_block *sb,
-				    struct ceph_vino vino);
+				    struct ceph_vino vino, struct inode *newino);
 extern struct inode *ceph_get_snapdir(struct inode *parent);
 extern int ceph_fill_file_size(struct inode *inode, int issued,
 			       u32 truncate_seq, u64 truncate_size, u64 size);
-- 
2.30.2



* [RFC PATCH v5 08/19] ceph: add routine to create fscrypt context prior to RPC
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (6 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 07/19] ceph: preallocate inode for ops that may create one Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 09/19] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

After pre-creating a new inode, do an fscrypt prepare on it, fetch a
new encryption context and then marshal that into the security context
to be sent along with the RPC. Call the new function from
ceph_new_inode.
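
For readers unfamiliar with the blob layout, the following userspace sketch
shows the encoding as I read it from the patch: a little-endian 32-bit entry
count, followed by (name length, name, value length, value) tuples. The
xattr_blob type and the put_le32, get_le32 and blob_add_xattr helpers are
hypothetical and use a flat buffer rather than a ceph_pagelist. The sketch
starts the count at zero and increments it for each entry, which gives the
same bytes as encoding 1 for a fresh list and bumping an existing count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOB_MAX 256

/* Hypothetical flat stand-in for a ceph_pagelist. */
struct xattr_blob {
	uint8_t data[BLOB_MAX];
	size_t len;		/* bytes used; 0 means no count written yet */
};

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Append one name/value pair. Returns 0 on success, -1 if it won't fit. */
static int blob_add_xattr(struct xattr_blob *b, const char *name,
			  const void *val, uint32_t val_len)
{
	uint32_t name_len = (uint32_t)strlen(name);
	size_t need = 4 + (size_t)name_len + 4 + (size_t)val_len;

	if (b->len == 0)
		need += 4;	/* room for the leading entry count */
	if (b->len + need > BLOB_MAX)
		return -1;

	if (b->len == 0) {
		put_le32(b->data, 0);
		b->len = 4;
	}

	/* Bump the entry count stored in the first four bytes. */
	put_le32(b->data, get_le32(b->data) + 1);

	put_le32(b->data + b->len, name_len);
	b->len += 4;
	memcpy(b->data + b->len, name, name_len);
	b->len += name_len;
	put_le32(b->data + b->len, val_len);
	b->len += 4;
	memcpy(b->data + b->len, val, val_len);
	b->len += val_len;
	return 0;
}
```

Adding "encryption.ctx" with a 4-byte value to an empty blob yields 30 bytes:
4 for the count, 4 + 14 for the name and 4 + 4 for the value.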

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 12 ++++++++++
 fs/ceph/inode.c  |  9 +++++--
 fs/ceph/super.h  |  3 +++
 4 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 879d9a0d3751..f037a4939026 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -46,3 +46,64 @@ void ceph_fscrypt_set_ops(struct super_block *sb)
 {
 	fscrypt_set_ops(sb, &ceph_fscrypt_ops);
 }
+
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as)
+{
+	int ret, ctxsize;
+	size_t name_len;
+	char *name;
+	struct ceph_pagelist *pagelist = as->pagelist;
+	bool encrypted = false;
+
+	ret = fscrypt_prepare_new_inode(dir, inode, &encrypted);
+	if (ret)
+		return ret;
+	if (!encrypted)
+		return 0;
+
+	inode->i_flags |= S_ENCRYPTED;
+
+	ctxsize = fscrypt_context_for_new_inode(&as->fscrypt, inode);
+	if (ctxsize < 0)
+		return ctxsize;
+
+	/* marshal it in page array */
+	if (!pagelist) {
+		pagelist = ceph_pagelist_alloc(GFP_KERNEL);
+		if (!pagelist)
+			return -ENOMEM;
+		ret = ceph_pagelist_reserve(pagelist, PAGE_SIZE);
+		if (ret)
+			goto out;
+		ceph_pagelist_encode_32(pagelist, 1);
+	}
+
+	name = CEPH_XATTR_NAME_ENCRYPTION_CONTEXT;
+	name_len = strlen(name);
+	ret = ceph_pagelist_reserve(pagelist, 4 * 2 + name_len + ctxsize);
+	if (ret)
+		goto out;
+
+	if (as->pagelist) {
+		BUG_ON(pagelist->length <= sizeof(__le32));
+		if (list_is_singular(&pagelist->head)) {
+			le32_add_cpu((__le32*)pagelist->mapped_tail, 1);
+		} else {
+			struct page *page = list_first_entry(&pagelist->head,
+							     struct page, lru);
+			void *addr = kmap_atomic(page);
+			le32_add_cpu((__le32*)addr, 1);
+			kunmap_atomic(addr);
+		}
+	}
+
+	ceph_pagelist_encode_32(pagelist, name_len);
+	ceph_pagelist_append(pagelist, name, name_len);
+	ceph_pagelist_encode_32(pagelist, ctxsize);
+	ceph_pagelist_append(pagelist, as->fscrypt, ctxsize);
+out:
+	if (pagelist && !as->pagelist)
+		ceph_pagelist_release(pagelist);
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 0dd043b56096..cc4e481bf13a 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -18,6 +18,9 @@ static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 	fscrypt_free_dummy_policy(&fsc->dummy_enc_policy);
 }
 
+int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+				 struct ceph_acl_sec_ctx *as);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -27,6 +30,15 @@ static inline void ceph_fscrypt_set_ops(struct super_block *sb)
 static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 {
 }
+
+static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
+						struct ceph_acl_sec_ctx *as)
+{
+	if (IS_ENCRYPTED(dir))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7b70187cc564..64cdc4513c8a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -83,12 +83,17 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
 			goto out_err;
 	}
 
+	inode->i_state = 0;
+	inode->i_mode = *mode;
+
 	err = ceph_security_init_secctx(dentry, *mode, as_ctx);
 	if (err < 0)
 		goto out_err;
 
-	inode->i_state = 0;
-	inode->i_mode = *mode;
+	err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
+	if (err)
+		goto out_err;
+
 	return inode;
 out_err:
 	iput(inode);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 8a40374f9154..3b85a5154b49 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1034,6 +1034,9 @@ struct ceph_acl_sec_ctx {
 #ifdef CONFIG_CEPH_FS_SECURITY_LABEL
 	void *sec_ctx;
 	u32 sec_ctxlen;
+#endif
+#ifdef CONFIG_FS_ENCRYPTION
+	u8	fscrypt[FSCRYPT_SET_CONTEXT_MAX_SIZE];
 #endif
 	struct ceph_pagelist *pagelist;
 };
-- 
2.30.2



* [RFC PATCH v5 09/19] ceph: make ceph_msdc_build_path use ref-walk
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (7 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 08/19] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 10/19] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Encryption potentially requires allocation, at which point we'll need to
be in a non-atomic context. Convert ceph_mdsc_build_path to take dentry
spinlocks and references instead of using rcu_read_lock to walk the
path.

This is slightly less efficient, and we may want to eventually allow
using RCU when the leaf dentry isn't encrypted.
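
As a rough illustration of the walk, the userspace sketch below builds a path
back to front while pinning each parent before releasing the child. The
toy_dentry type and the toy_dget, toy_dput and toy_build_path helpers are
hypothetical; the d_lock handling and the rename_lock retry from the real
function are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry with a plain reference count. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;	/* NULL at the root */
	int refs;
};

static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->refs++;
	return d;
}

static void toy_dput(struct toy_dentry *d)
{
	d->refs--;
}

/*
 * Fill @buf from the end toward the front. Returns a pointer to the start
 * of the path (relative to the root), or NULL if it does not fit.
 */
static char *toy_build_path(struct toy_dentry *leaf, char *buf, size_t size)
{
	struct toy_dentry *cur;
	size_t pos;

	if (size == 0)
		return NULL;
	pos = size - 1;
	buf[pos] = '\0';

	cur = toy_dget(leaf);
	while (cur->parent) {
		size_t len = strlen(cur->name);
		struct toy_dentry *parent;

		if (len > pos)
			goto too_long;
		pos -= len;
		memcpy(buf + pos, cur->name, len);

		/* Pin the parent before letting go of the child. */
		parent = toy_dget(cur->parent);
		toy_dput(cur);
		cur = parent;

		if (cur->parent) {
			if (pos == 0)
				goto too_long;
			buf[--pos] = '/';
		}
	}
	toy_dput(cur);
	return buf + pos;

too_long:
	toy_dput(cur);
	return NULL;
}
```

Because a reference is held on the current dentry at every step, the loop body
is free to sleep, which is what the later filename encryption needs.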

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e3284de74ca4..d7a40e83f12f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2323,7 +2323,8 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 			   int stop_on_nosnap)
 {
-	struct dentry *temp;
+	struct dentry *cur;
+	struct inode *inode;
 	char *path;
 	int pos;
 	unsigned seq;
@@ -2340,34 +2341,35 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	path[pos] = '\0';
 
 	seq = read_seqbegin(&rename_lock);
-	rcu_read_lock();
-	temp = dentry;
+	cur = dget(dentry);
 	for (;;) {
-		struct inode *inode;
+		struct dentry *temp;
 
-		spin_lock(&temp->d_lock);
-		inode = d_inode(temp);
+		spin_lock(&cur->d_lock);
+		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
-			     pos, temp);
-		} else if (stop_on_nosnap && inode && dentry != temp &&
+			     pos, cur);
+		} else if (stop_on_nosnap && inode && dentry != cur &&
 			   ceph_snap(inode) == CEPH_NOSNAP) {
-			spin_unlock(&temp->d_lock);
+			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
 		} else {
-			pos -= temp->d_name.len;
+			pos -= cur->d_name.len;
 			if (pos < 0) {
-				spin_unlock(&temp->d_lock);
+				spin_unlock(&cur->d_lock);
 				break;
 			}
-			memcpy(path + pos, temp->d_name.name, temp->d_name.len);
+			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
 		}
+		temp = cur;
 		spin_unlock(&temp->d_lock);
-		temp = READ_ONCE(temp->d_parent);
+		cur = dget_parent(temp);
+		dput(temp);
 
 		/* Are we at the root? */
-		if (IS_ROOT(temp))
+		if (IS_ROOT(cur))
 			break;
 
 		/* Are we out of buffer? */
@@ -2376,8 +2378,9 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 
 		path[pos] = '/';
 	}
-	base = ceph_ino(d_inode(temp));
-	rcu_read_unlock();
+	inode = d_inode(cur);
+	base = inode ? ceph_ino(inode) : 0;
+	dput(cur);
 
 	if (read_seqretry(&rename_lock, seq))
 		goto retry;
-- 
2.30.2



* [RFC PATCH v5 10/19] ceph: add encrypted fname handling to ceph_mdsc_build_path
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (8 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 09/19] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 11/19] ceph: decode alternate_name in lease info Jeff Layton
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Allow ceph_mdsc_build_path to encrypt and base64 encode the filename
when the parent is encrypted and we're sending the path to the MDS.

In most cases, we just encrypt the filenames and base64 encode them,
but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar
scheme to fscrypt proper, and hash the remaining bits with sha256.

When doing this, we then send along the full crypttext of the name in
the new alternate_name field of the MClientRequest. The MDS can then
send that along in readdir responses and traces.
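
The length arithmetic behind that limit can be checked with a small userspace
sketch. It assumes the base64 variant is unpadded, so n bytes encode to
ceil(4n/3) characters. The TOY_* constants and the toy_base64_len and
toy_wire_raw_len helpers are hypothetical names that mirror the values used in
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NAME_MAX		255
#define TOY_SHA256_DIGEST_SIZE	32
#define TOY_MAX_RAW		189
#define TOY_NOHASH_NAME_MAX	(TOY_MAX_RAW - TOY_SHA256_DIGEST_SIZE)

/* Characters needed to base64-encode @n bytes without padding. */
static size_t toy_base64_len(size_t n)
{
	return (n * 4 + 2) / 3;
}

/*
 * Number of raw bytes that end up base64-encoded for a ciphertext of
 * @ctext_len bytes: short names pass through unchanged, long names keep
 * the first TOY_NOHASH_NAME_MAX bytes followed by a sha256 of the rest.
 */
static size_t toy_wire_raw_len(size_t ctext_len)
{
	assert(ctext_len <= TOY_NAME_MAX);
	if (ctext_len > TOY_NOHASH_NAME_MAX)
		return TOY_NOHASH_NAME_MAX + TOY_SHA256_DIGEST_SIZE;
	return ctext_len;
}
```

With these values the threshold is 157 bytes of ciphertext, and the worst case
is 189 raw bytes, which encode to 252 characters and so stay within NAME_MAX.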

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.h     |  16 +++++
 fs/ceph/mds_client.c | 138 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 136 insertions(+), 18 deletions(-)

diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index cc4e481bf13a..331b9c8da7fb 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -6,11 +6,27 @@
 #ifndef _CEPH_CRYPTO_H
 #define _CEPH_CRYPTO_H
 
+#include <crypto/sha2.h>
 #include <linux/fscrypt.h>
 
 #define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
 
 #ifdef CONFIG_FS_ENCRYPTION
+
+/*
+ * We want to encrypt filenames when creating them, but the encrypted
+ * versions of those names may have illegal characters in them. To mitigate
+ * that, we base64 encode them, but that gives us a result that can exceed
+ * NAME_MAX.
+ *
+ * Follow a similar scheme to fscrypt itself, and cap the filename to a
+ * smaller size. If the ciphertext name is longer than the value below, then
+ * sha256 hash the remaining bytes.
+ *
+ * 189 bytes => 252 bytes base64-encoded, which is <= NAME_MAX (255)
+ */
+#define CEPH_NOHASH_NAME_MAX (189 - SHA256_DIGEST_SIZE)
+
 void ceph_fscrypt_set_ops(struct super_block *sb);
 
 static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d7a40e83f12f..814f74d88748 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -13,6 +13,7 @@
 #include <linux/ktime.h>
 
 #include "super.h"
+#include "crypto.h"
 #include "mds_client.h"
 
 #include <linux/ceph/ceph_features.h>
@@ -2310,18 +2311,85 @@ static inline  u64 __get_oldest_tid(struct ceph_mds_client *mdsc)
 	return mdsc->oldest_tid;
 }
 
-/*
- * Build a dentry's path.  Allocate on heap; caller must kfree.  Based
- * on build_path_from_dentry in fs/cifs/dir.c.
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+	u32 len;
+	int elen;
+	int ret;
+	u8 *cryptbuf;
+
+	WARN_ON_ONCE(!fscrypt_has_encryption_key(parent));
+
+	/*
+	 * convert cleartext dentry name to ciphertext
+	 * if result is longer than CEPH_NOHASH_NAME_MAX,
+	 * sha256 the remaining bytes
+	 *
+	 * See: fscrypt_setup_filename
+	 */
+	if (!fscrypt_fname_encrypted_size(parent, dentry->d_name.len, NAME_MAX, &len))
+		return -ENAMETOOLONG;
+
+	/* If we have to hash the end, then we need a full-length buffer */
+	if (len > CEPH_NOHASH_NAME_MAX)
+		len = NAME_MAX;
+
+	cryptbuf = kmalloc(len, GFP_KERNEL);
+	if (!cryptbuf)
+		return -ENOMEM;
+
+	ret = fscrypt_fname_encrypt(parent, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ret;
+	}
+
+	/* hash the end if the name is long enough */
+	if (len > CEPH_NOHASH_NAME_MAX) {
+		u8 hash[SHA256_DIGEST_SIZE];
+		u8 *extra = cryptbuf + CEPH_NOHASH_NAME_MAX;
+
+		/* hash the extra bytes and overwrite crypttext beyond that point with it */
+		sha256(extra, len - CEPH_NOHASH_NAME_MAX, hash);
+		memcpy(extra, hash, SHA256_DIGEST_SIZE);
+		len = CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE;
+	}
+
+	/* base64 encode the encrypted name */
+	elen = fscrypt_base64_encode(cryptbuf, len, buf);
+	kfree(cryptbuf);
+	dout("base64-encoded ciphertext name = %.*s\n", len, buf);
+	return elen;
+}
+#else
+static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+/**
+ * ceph_mdsc_build_path - build a path string to a given dentry
+ * @dentry: dentry to which path should be built
+ * @plen: returned length of string
+ * @pbase: returned base inode number
+ * @for_wire: is this path going to be sent to the MDS?
+ *
+ * Build a string that represents the path to the dentry. This is mostly called
+ * for two different purposes:
  *
- * If @stop_on_nosnap, generate path relative to the first non-snapped
- * inode.
+ * 1) we need to build a path string to send to the MDS (for_wire == true)
+ * 2) we need a path string for local presentation (e.g. debugfs) (for_wire == false)
+ *
+ * The path is built in reverse, starting with the dentry. Walk back up toward
+ * the root, building the path until the first non-snapped inode is reached (for_wire)
+ * or the root inode is reached (!for_wire).
  *
  * Encode hidden .snap dirs as a double /, i.e.
  *   foo/.snap/bar -> foo//bar
  */
-char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
-			   int stop_on_nosnap)
+char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase, int for_wire)
 {
 	struct dentry *cur;
 	struct inode *inode;
@@ -2343,30 +2411,65 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 	seq = read_seqbegin(&rename_lock);
 	cur = dget(dentry);
 	for (;;) {
-		struct dentry *temp;
+		struct dentry *parent;
 
 		spin_lock(&cur->d_lock);
 		inode = d_inode(cur);
 		if (inode && ceph_snap(inode) == CEPH_SNAPDIR) {
 			dout("build_path path+%d: %p SNAPDIR\n",
 			     pos, cur);
-		} else if (stop_on_nosnap && inode && dentry != cur &&
-			   ceph_snap(inode) == CEPH_NOSNAP) {
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else if (for_wire && inode && dentry != cur && ceph_snap(inode) == CEPH_NOSNAP) {
 			spin_unlock(&cur->d_lock);
 			pos++; /* get rid of any prepended '/' */
 			break;
-		} else {
+		} else if (!for_wire || !IS_ENCRYPTED(d_inode(cur->d_parent))) {
 			pos -= cur->d_name.len;
 			if (pos < 0) {
 				spin_unlock(&cur->d_lock);
 				break;
 			}
 			memcpy(path + pos, cur->d_name.name, cur->d_name.len);
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+		} else {
+			int len, ret;
+			char buf[FSCRYPT_BASE64_CHARS(NAME_MAX)];
+
+			/*
+			 * Proactively copy name into buf, in case we need to present
+			 * it as-is.
+			 */
+			memcpy(buf, cur->d_name.name, cur->d_name.len);
+			len = cur->d_name.len;
+			spin_unlock(&cur->d_lock);
+			parent = dget_parent(cur);
+
+			ret = __fscrypt_prepare_readdir(d_inode(parent));
+			if (ret < 0) {
+				dput(parent);
+				dput(cur);
+				return ERR_PTR(ret);
+			}
+
+			if (fscrypt_has_encryption_key(d_inode(parent))) {
+				len = encode_encrypted_fname(d_inode(parent), cur, buf);
+				if (len < 0) {
+					dput(parent);
+					dput(cur);
+					return ERR_PTR(len);
+				}
+			}
+			pos -= len;
+			if (pos < 0) {
+				dput(parent);
+				break;
+			}
+			memcpy(path + pos, buf, len);
 		}
-		temp = cur;
-		spin_unlock(&temp->d_lock);
-		cur = dget_parent(temp);
-		dput(temp);
+		dput(cur);
+		cur = parent;
 
 		/* Are we at the root? */
 		if (IS_ROOT(cur))
@@ -2390,8 +2493,7 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *pbase,
 		 * A rename didn't occur, but somehow we didn't end up where
 		 * we thought we would. Throw a warning and try again.
 		 */
-		pr_warn("build_path did not end path lookup where "
-			"expected, pos is %d\n", pos);
+		pr_warn("build_path did not end path lookup where expected (pos = %d)\n", pos);
 		goto retry;
 	}
 
@@ -2411,7 +2513,7 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir,
 	rcu_read_lock();
 	if (!dir)
 		dir = d_inode_rcu(dentry->d_parent);
-	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP) {
+	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
 		*pino = ceph_ino(dir);
 		rcu_read_unlock();
 		*ppath = dentry->d_name.name;
-- 
2.30.2



* [RFC PATCH v5 11/19] ceph: decode alternate_name in lease info
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (9 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 10/19] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 12/19] ceph: send altname in MClientRequest Jeff Layton
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Ceph is a bit different from local filesystems, in that we don't want
to store filenames as raw binary data, since we may also be dealing
with clients that don't support fscrypt.

We could just base64-encode the encrypted filenames, but that could
leave us with filenames longer than NAME_MAX. It turns out that the
MDS doesn't care much about filename length, but the clients do.

To manage this, we've added a new "alternate name" field that can
optionally be attached to any dentry. We use it to store the binary
crypttext of the filename when its base64-encoded value would be longer
than NAME_MAX. When a dentry has one of these names attached, the MDS
will send it along in the lease info, which we can then store for
later use.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
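For anyone following the decode logic, here is a minimal userspace sketch
of the optional-field handling described above. It is not the kernel code:
decode_le32() and decode_altname() are hypothetical helpers written for
illustration, and only the v2 altname tail of the lease encoding is
modelled (a little-endian u32 length followed by that many bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a little-endian u32 at *p and advance *p.
 * Returns -1 if fewer than 4 bytes remain before end.
 */
static inline int decode_le32(const uint8_t **p, const uint8_t *end,
			      uint32_t *out)
{
	assert(*p <= end);
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = (uint32_t)(*p)[0] |
	       (uint32_t)(*p)[1] << 8 |
	       (uint32_t)(*p)[2] << 16 |
	       (uint32_t)(*p)[3] << 24;
	*p += 4;
	return 0;
}

/*
 * Hypothetical helper: decode the altname that follows the fixed-size
 * lease when struct_v >= 2. Older encodings carry no altname, so the
 * outputs are cleared and *p is left untouched.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static inline int decode_altname(const uint8_t **p, const uint8_t *end,
				 uint8_t struct_v,
				 const uint8_t **altname,
				 uint32_t *altname_len)
{
	uint32_t len;

	*altname = NULL;
	*altname_len = 0;

	if (struct_v < 2)
		return 0;

	if (decode_le32(p, end, &len))
		return -1;

	/* Bounds-check the payload before exposing a pointer into it. */
	if ((size_t)(end - *p) < len)
		return -1;

	*altname = *p;	/* points into the reply buffer, not a copy */
	*altname_len = len;
	*p += len;
	return 0;
}
```

As in the patch, the returned altname points into the reply buffer rather
than owning a copy, so it is only valid for as long as the reply is.
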
 fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++----------
 fs/ceph/mds_client.h | 11 +++++++----
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 814f74d88748..31a4c9674681 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -256,27 +256,44 @@ static int parse_reply_info_dir(void **p, void *end,
 
 static int parse_reply_info_lease(void **p, void *end,
 				  struct ceph_mds_reply_lease **lease,
-				  u64 features)
+				  u64 features, u32 *altname_len, u8 **altname)
 {
+	u8 struct_v;
+	u32 struct_len;
+
 	if (features == (u64)-1) {
-		u8 struct_v, struct_compat;
-		u32 struct_len;
+		u8 struct_compat;
+
 		ceph_decode_8_safe(p, end, struct_v, bad);
 		ceph_decode_8_safe(p, end, struct_compat, bad);
+
 		/* struct_v is expected to be >= 1. we only understand
 		 * encoding whose struct_compat == 1. */
 		if (!struct_v || struct_compat != 1)
 			goto bad;
+
 		ceph_decode_32_safe(p, end, struct_len, bad);
-		ceph_decode_need(p, end, struct_len, bad);
-		end = *p + struct_len;
+	} else {
+		struct_len = sizeof(**lease);
+		*altname_len = 0;
+		*altname = NULL;
 	}
 
-	ceph_decode_need(p, end, sizeof(**lease), bad);
+	ceph_decode_need(p, end, struct_len, bad);
 	*lease = *p;
 	*p += sizeof(**lease);
-	if (features == (u64)-1)
-		*p = end;
+
+	if (features == (u64)-1) {
+		if (struct_v >= 2) {
+			ceph_decode_32_safe(p, end, *altname_len, bad);
+			ceph_decode_need(p, end, *altname_len, bad);
+			*altname = *p;
+			*p += *altname_len;
+		} else {
+			*altname = NULL;
+			*altname_len = 0;
+		}
+	}
 	return 0;
 bad:
 	return -EIO;
@@ -306,7 +323,8 @@ static int parse_reply_info_trace(void **p, void *end,
 		info->dname = *p;
 		*p += info->dname_len;
 
-		err = parse_reply_info_lease(p, end, &info->dlease, features);
+		err = parse_reply_info_lease(p, end, &info->dlease, features,
+					     &info->altname_len, &info->altname);
 		if (err < 0)
 			goto out_bad;
 	}
@@ -373,9 +391,11 @@ static int parse_reply_info_readdir(void **p, void *end,
 		dout("parsed dir dname '%.*s'\n", rde->name_len, rde->name);
 
 		/* dentry lease */
-		err = parse_reply_info_lease(p, end, &rde->lease, features);
+		err = parse_reply_info_lease(p, end, &rde->lease, features,
+					     &rde->altname_len, &rde->altname);
 		if (err)
 			goto out_bad;
+
 		/* inode */
 		err = parse_reply_info_in(p, end, &rde->inode, features);
 		if (err < 0)
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 2b18ea24c650..b6aeca9b241b 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -29,8 +29,8 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,
 	CEPHFS_FEATURE_DELEG_INO,
 	CEPHFS_FEATURE_METRIC_COLLECT,
-
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
+	CEPHFS_FEATURE_ALTERNATE_NAME,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_ALTERNATE_NAME,
 };
 
 /*
@@ -45,8 +45,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_MULTI_RECONNECT,		\
 	CEPHFS_FEATURE_DELEG_INO,		\
 	CEPHFS_FEATURE_METRIC_COLLECT,		\
-						\
-	CEPHFS_FEATURE_MAX,			\
+	CEPHFS_FEATURE_ALTERNATE_NAME,		\
 }
 #define CEPHFS_FEATURES_CLIENT_REQUIRED {}
 
@@ -93,7 +92,9 @@ struct ceph_mds_reply_info_in {
 
 struct ceph_mds_reply_dir_entry {
 	char                          *name;
+	u8			      *altname;
 	u32                           name_len;
+	u32			      altname_len;
 	struct ceph_mds_reply_lease   *lease;
 	struct ceph_mds_reply_info_in inode;
 	loff_t			      offset;
@@ -112,7 +113,9 @@ struct ceph_mds_reply_info_parsed {
 	struct ceph_mds_reply_info_in diri, targeti;
 	struct ceph_mds_reply_dirfrag *dirfrag;
 	char                          *dname;
+	u8			      *altname;
 	u32                           dname_len;
+	u32                           altname_len;
 	struct ceph_mds_reply_lease   *dlease;
 
 	/* extra */
-- 
2.30.2



* [RFC PATCH v5 12/19] ceph: send altname in MClientRequest
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (10 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 11/19] ceph: decode alternate_name in lease info Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 13/19] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX,
we'll need to hash the tail of the filename. The client however will
still need to know the full name of the file if it has a key.

To support this, the MClientRequest message has grown a new
alternate_name field that we populate with the full (binary) crypttext
of the filename. The MDS then transmits it to clients in readdir replies
or traces as part of the dentry lease.

Add support for populating this field when the filenames are very long.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
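The length rules above can be sketched in userspace C as follows. This is
an illustration only, not the kernel implementation: the _SK names are
made up for the sketch, the base64 length assumes an unpadded encoding
(ceil(4n/3) characters), and the SHA-256 step itself is left out since
only the resulting lengths matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel constants; _SK marks them as sketch-only. */
#define NAME_MAX_SK		255
#define SHA256_DIGEST_SIZE_SK	32
#define NOHASH_NAME_MAX_SK	(189 - SHA256_DIGEST_SIZE_SK)	/* 157 */

/* Characters needed to base64-encode n bytes without padding. */
#define B64_CHARS_SK(n)		(((n) * 4 + 2) / 3)

/* 189 binary bytes encode to 252 characters, which fits in NAME_MAX. */
static_assert(B64_CHARS_SK(189) <= NAME_MAX_SK,
	      "capped name must fit in NAME_MAX once encoded");

/*
 * Number of binary bytes that end up in the dentry name before base64
 * encoding: short crypttext is used as-is, long crypttext is cut at
 * NOHASH_NAME_MAX_SK and followed by a digest of the remaining bytes.
 */
static inline size_t wire_binary_len(size_t ctext_len)
{
	if (ctext_len > NOHASH_NAME_MAX_SK)
		return NOHASH_NAME_MAX_SK + SHA256_DIGEST_SIZE_SK;
	return ctext_len;
}

/* The full crypttext only needs to travel as altname when the tail was hashed. */
static inline bool needs_altname(size_t ctext_len)
{
	return ctext_len > NOHASH_NAME_MAX_SK;
}

/* Length of the base64 string that is placed in the path sent to the MDS. */
static inline size_t wire_name_chars(size_t ctext_len)
{
	return B64_CHARS_SK(wire_binary_len(ctext_len));
}
```

Under these assumptions a 255-byte crypttext goes out as 189 binary bytes
(252 base64 characters) in the path, with the full 255 bytes carried
separately in altname.
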
 fs/ceph/mds_client.c | 79 +++++++++++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h |  2 ++
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 31a4c9674681..a2c25292c4b1 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -858,6 +858,7 @@ void ceph_mdsc_release_request(struct kref *kref)
 	put_cred(req->r_cred);
 	if (req->r_pagelist)
 		ceph_pagelist_release(req->r_pagelist);
+	kfree(req->r_altname);
 	put_request_session(req);
 	ceph_unreserve_caps(req->r_mdsc, &req->r_caps_reservation);
 	WARN_ON_ONCE(!list_empty(&req->r_wait));
@@ -2382,11 +2383,66 @@ static int encode_encrypted_fname(const struct inode *parent, struct dentry *den
 	dout("base64-encoded ciphertext name = %.*s\n", len, buf);
 	return elen;
 }
+
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	struct inode *dir = req->r_parent;
+	struct dentry *dentry = req->r_dentry;
+	u8 *cryptbuf = NULL;
+	u32 len = 0;
+	int ret = 0;
+
+	/* only encode if we have parent and dentry */
+	if (!dir || !dentry)
+		goto success;
+
+	/* No-op unless this is encrypted */
+	if (!IS_ENCRYPTED(dir))
+		goto success;
+
+	ret = __fscrypt_prepare_readdir(dir);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* No key? Just ignore it. */
+	if (!fscrypt_has_encryption_key(dir))
+		goto success;
+
+	if (!fscrypt_fname_encrypted_size(dir, dentry->d_name.len, NAME_MAX, &len)) {
+		WARN_ON_ONCE(1);
+		return ERR_PTR(-ENAMETOOLONG);
+	}
+
+	/* No need to append altname if name is short enough */
+	if (len <= CEPH_NOHASH_NAME_MAX) {
+		len = 0;
+		goto success;
+	}
+
+	cryptbuf = kmalloc(len, GFP_KERNEL);
+	if (!cryptbuf)
+		return ERR_PTR(-ENOMEM);
+
+	ret = fscrypt_fname_encrypt(dir, &dentry->d_name, cryptbuf, len);
+	if (ret) {
+		kfree(cryptbuf);
+		return ERR_PTR(ret);
+	}
+success:
+	*plen = len;
+	return cryptbuf;
+}
 #else
 static int encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf)
 {
 	return -EOPNOTSUPP;
 }
+
+static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
+{
+	*plen = 0;
+	return NULL;
+}
 #endif
 
 /**
@@ -2601,7 +2657,7 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
 	return r;
 }
 
-static void encode_timestamp_and_gids(void **p,
+static void encode_mclientrequest_tail(void **p,
 				      const struct ceph_mds_request *req)
 {
 	struct ceph_timespec ts;
@@ -2610,11 +2666,16 @@ static void encode_timestamp_and_gids(void **p,
 	ceph_encode_timespec64(&ts, &req->r_stamp);
 	ceph_encode_copy(p, &ts, sizeof(ts));
 
-	/* gid_list */
+	/* v4: gid_list */
 	ceph_encode_32(p, req->r_cred->group_info->ngroups);
 	for (i = 0; i < req->r_cred->group_info->ngroups; i++)
 		ceph_encode_64(p, from_kgid(&init_user_ns,
 					    req->r_cred->group_info->gid[i]));
+
+	/* v5: altname */
+	ceph_encode_32(p, req->r_altname_len);
+	if (req->r_altname_len)
+		ceph_encode_copy(p, req->r_altname, req->r_altname_len);
 }
 
 /*
@@ -2659,10 +2720,18 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		goto out_free1;
 	}
 
+	req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
+	if (IS_ERR(req->r_altname)) {
+		msg = ERR_CAST(req->r_altname);
+		req->r_altname = NULL;
+		goto out_free2;
+	}
+
 	len = legacy ? sizeof(*head) : sizeof(struct ceph_mds_request_head);
 	len += pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
 		sizeof(struct ceph_timespec);
 	len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
+	len += sizeof(u32) + req->r_altname_len;
 
 	/* calculate (max) length for cap releases */
 	len += sizeof(struct ceph_mds_request_release) *
@@ -2693,7 +2762,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	} else {
 		struct ceph_mds_request_head *new_head = msg->front.iov_base;
 
-		msg->hdr.version = cpu_to_le16(4);
+		msg->hdr.version = cpu_to_le16(5);
 		new_head->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
 		head = (struct ceph_mds_request_head_old *)&new_head->oldest_client_tid;
 		p = msg->front.iov_base + sizeof(*new_head);
@@ -2744,7 +2813,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 
 	head->num_releases = cpu_to_le16(releases);
 
-	encode_timestamp_and_gids(&p, req);
+	encode_mclientrequest_tail(&p, req);
 
 	if (WARN_ON_ONCE(p > end)) {
 		ceph_msg_put(msg);
@@ -2853,7 +2922,7 @@ static int __prepare_send_request(struct ceph_mds_session *session,
 		rhead->num_releases = 0;
 
 		p = msg->front.iov_base + req->r_request_release_offset;
-		encode_timestamp_and_gids(&p, req);
+		encode_mclientrequest_tail(&p, req);
 
 		msg->front.iov_len = p - msg->front.iov_base;
 		msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index b6aeca9b241b..33b8dba7a44e 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -278,6 +278,8 @@ struct ceph_mds_request {
 	struct mutex r_fill_mutex;
 
 	union ceph_mds_request_args r_args;
+	u8 *r_altname;		    /* fscrypt binary crypttext for long filenames */
+	u32 r_altname_len;	    /* length of r_altname */
 	int r_fmode;        /* file mode, if expecting cap */
 	const struct cred *r_cred;
 	int r_request_release_offset;
-- 
2.30.2



* [RFC PATCH v5 13/19] ceph: properly set DCACHE_NOKEY_NAME flag in lookup
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (11 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 12/19] ceph: send altname in MClientRequest Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 14/19] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 496d24b003dd..72728850e96c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -755,6 +755,17 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 	if (dentry->d_name.len > NAME_MAX)
 		return ERR_PTR(-ENAMETOOLONG);
 
+	if (IS_ENCRYPTED(dir)) {
+		err = __fscrypt_prepare_readdir(dir);
+		if (err)
+			return ERR_PTR(err);
+		if (!fscrypt_has_encryption_key(dir)) {
+			spin_lock(&dentry->d_lock);
+			dentry->d_flags |= DCACHE_NOKEY_NAME;
+			spin_unlock(&dentry->d_lock);
+		}
+	}
+
 	/* can we conclude ENOENT locally? */
 	if (d_really_is_negative(dentry)) {
 		struct ceph_inode_info *ci = ceph_inode(dir);
-- 
2.30.2



* [RFC PATCH v5 14/19] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (12 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 13/19] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 15/19] ceph: add helpers for converting names for userland presentation Jeff Layton
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

If we have a dentry which represents a no-key name, then we need to test
whether the parent directory's encryption key has since been added.  Do
that before we test anything else about the dentry.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 72728850e96c..867e396f44f1 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1697,6 +1697,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 	struct inode *dir, *inode;
 	struct ceph_mds_client *mdsc;
 
+	valid = fscrypt_d_revalidate(dentry, flags);
+	if (valid <= 0)
+		return valid;
+
 	if (flags & LOOKUP_RCU) {
 		parent = READ_ONCE(dentry->d_parent);
 		dir = d_inode_rcu(parent);
@@ -1709,8 +1713,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		inode = d_inode(dentry);
 	}
 
-	dout("d_revalidate %p '%pd' inode %p offset 0x%llx\n", dentry,
-	     dentry, inode, ceph_dentry(dentry)->offset);
+	dout("d_revalidate %p '%pd' inode %p offset 0x%llx nokey %d\n", dentry,
+	     dentry, inode, ceph_dentry(dentry)->offset, !!(dentry->d_flags & DCACHE_NOKEY_NAME));
 
 	mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;
 
-- 
2.30.2



* [RFC PATCH v5 15/19] ceph: add helpers for converting names for userland presentation
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (13 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 14/19] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 16/19] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/crypto.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/crypto.h | 41 ++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index f037a4939026..9fed68f37629 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -107,3 +107,79 @@ int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 		ceph_pagelist_release(pagelist);
 	return ret;
 }
+
+/**
+ * ceph_fname_to_usr - convert a filename for userland presentation
+ * @fname: ceph_fname to be converted
+ * @tname: temporary name buffer to use for conversion (may be NULL)
+ * @oname: where converted name should be placed
+ * @is_nokey: set to true if key wasn't available during conversion (may be NULL)
+ *
+ * Given a filename (usually from the MDS), format it for presentation to
+ * userland. If @fname->dir is not encrypted, just pass it back as-is.
+ *
+ * Otherwise, base64 decode the string, and then ask fscrypt to format it
+ * for userland presentation.
+ *
+ * Returns 0 on success or negative error code on error.
+ */
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+		      struct fscrypt_str *oname, bool *is_nokey)
+{
+	int ret;
+	struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str iname;
+
+	if (!IS_ENCRYPTED(fname->dir)) {
+		oname->name = fname->name;
+		oname->len = fname->name_len;
+		return 0;
+	}
+
+	/* Sanity check that the resulting name will fit in the buffer */
+	if (fname->name_len > FSCRYPT_BASE64_CHARS(NAME_MAX))
+		return -EIO;
+
+	ret = __fscrypt_prepare_readdir(fname->dir);
+	if (ret)
+		return ret;
+
+	/*
+	 * Use the raw dentry name as sent by the MDS instead of
+	 * generating a nokey name via fscrypt.
+	 */
+	if (!fscrypt_has_encryption_key(fname->dir)) {
+		memcpy(oname->name, fname->name, fname->name_len);
+		oname->len = fname->name_len;
+		if (is_nokey)
+			*is_nokey = true;
+		return 0;
+	}
+
+	if (fname->ctext_len == 0) {
+		int declen;
+
+		if (!tname) {
+			ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
+			if (ret)
+				return ret;
+			tname = &_tname;
+		}
+
+		declen = fscrypt_base64_decode(fname->name, fname->name_len, tname->name);
+		if (declen <= 0) {
+			ret = -EIO;
+			goto out;
+		}
+		iname.name = tname->name;
+		iname.len = declen;
+	} else {
+		iname.name = fname->ctext;
+		iname.len = fname->ctext_len;
+	}
+
+	ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname);
+out:
+	fscrypt_fname_free_buffer(&_tname);
+	return ret;
+}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
index 331b9c8da7fb..5a3fb68eb814 100644
--- a/fs/ceph/crypto.h
+++ b/fs/ceph/crypto.h
@@ -11,6 +11,14 @@
 
 #define	CEPH_XATTR_NAME_ENCRYPTION_CONTEXT	"encryption.ctx"
 
+struct ceph_fname {
+	struct inode	*dir;
+	char		*name;		// b64 encoded, possibly hashed
+	unsigned char	*ctext;		// binary crypttext (if any)
+	u32		name_len;	// length of name buffer
+	u32		ctext_len;	// length of crypttext
+};
+
 #ifdef CONFIG_FS_ENCRYPTION
 
 /*
@@ -37,6 +45,22 @@ static inline void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client *fsc)
 int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode,
 				 struct ceph_acl_sec_ctx *as);
 
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (!IS_ENCRYPTED(parent))
+		return 0;
+	return fscrypt_fname_alloc_buffer(NAME_MAX, fname);
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	if (IS_ENCRYPTED(parent))
+		fscrypt_fname_free_buffer(fname);
+}
+
+int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+			struct fscrypt_str *oname, bool *is_nokey);
+
 #else /* CONFIG_FS_ENCRYPTION */
 
 static inline void ceph_fscrypt_set_ops(struct super_block *sb)
@@ -55,6 +79,23 @@ static inline int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *
 	return 0;
 }
 
+static inline int ceph_fname_alloc_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+	return 0;
+}
+
+static inline void ceph_fname_free_buffer(struct inode *parent, struct fscrypt_str *fname)
+{
+}
+
+static inline int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname,
+				    struct fscrypt_str *oname, bool *is_nokey)
+{
+	oname->name = fname->name;
+	oname->len = fname->name_len;
+	return 0;
+}
+
 #endif /* CONFIG_FS_ENCRYPTION */
 
 #endif
-- 
2.30.2



* [RFC PATCH v5 16/19] ceph: add fscrypt support to ceph_fill_trace
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (14 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 15/19] ceph: add helpers for converting names for userland presentation Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 17/19] ceph: add support to readdir for encrypted filenames Jeff Layton
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

When we get a dentry in a trace, decrypt the name so we can properly
instantiate the dentry.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 64cdc4513c8a..39f4c0dfa071 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1381,8 +1381,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 		if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
 		    test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
 		    !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
+			bool is_nokey = false;
 			struct qstr dname;
 			struct dentry *dn, *parent;
+			struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+			struct ceph_fname fname = { .dir	= dir,
+						    .name	= rinfo->dname,
+						    .ctext	= rinfo->altname,
+						    .name_len	= rinfo->dname_len,
+						    .ctext_len	= rinfo->altname_len };
 
 			BUG_ON(!rinfo->head->is_target);
 			BUG_ON(req->r_dentry);
@@ -1390,8 +1397,20 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 			parent = d_find_any_alias(dir);
 			BUG_ON(!parent);
 
-			dname.name = rinfo->dname;
-			dname.len = rinfo->dname_len;
+			err = ceph_fname_alloc_buffer(dir, &oname);
+			if (err < 0) {
+				dput(parent);
+				goto done;
+			}
+
+			err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
+			if (err < 0) {
+				dput(parent);
+				ceph_fname_free_buffer(dir, &oname);
+				goto done;
+			}
+			dname.name = oname.name;
+			dname.len = oname.len;
 			dname.hash = full_name_hash(parent, dname.name, dname.len);
 			tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
 			tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
@@ -1406,9 +1425,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				     dname.len, dname.name, dn);
 				if (!dn) {
 					dput(parent);
+					ceph_fname_free_buffer(dir, &oname);
 					err = -ENOMEM;
 					goto done;
 				}
+				if (is_nokey) {
+					spin_lock(&dn->d_lock);
+					dn->d_flags |= DCACHE_NOKEY_NAME;
+					spin_unlock(&dn->d_lock);
+				}
 				err = 0;
 			} else if (d_really_is_positive(dn) &&
 				   (ceph_ino(d_inode(dn)) != tvino.ino ||
@@ -1420,6 +1445,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
 				dput(dn);
 				goto retry_lookup;
 			}
+			ceph_fname_free_buffer(dir, &oname);
 
 			req->r_dentry = dn;
 			dput(parent);
-- 
2.30.2



* [RFC PATCH v5 17/19] ceph: add support to readdir for encrypted filenames
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (15 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 16/19] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 18/19] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

Add helper functions for buffer management and for decrypting filenames
returned by the MDS. Wire those into the readdir codepaths.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   | 62 +++++++++++++++++++++++++++++++++++++++----------
 fs/ceph/inode.c | 38 +++++++++++++++++++++++++++---
 2 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 867e396f44f1..7fe74c2f3911 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -9,6 +9,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Directory operations: readdir, lookup, create, link, unlink,
@@ -241,7 +242,9 @@ static int __dcache_readdir(struct file *file,  struct dir_context *ctx,
 		di = ceph_dentry(dentry);
 		if (d_unhashed(dentry) ||
 		    d_really_is_negative(dentry) ||
-		    di->lease_shared_gen != shared_gen) {
+		    di->lease_shared_gen != shared_gen ||
+		    ((dentry->d_flags & DCACHE_NOKEY_NAME) &&
+		     fscrypt_has_encryption_key(dir))) {
 			spin_unlock(&dentry->d_lock);
 			dput(dentry);
 			err = -EAGAIN;
@@ -313,6 +316,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	int err;
 	unsigned frag = -1;
 	struct ceph_mds_reply_info_parsed *rinfo;
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	dout("readdir %p file %p pos %llx\n", inode, file, ctx->pos);
 	if (dfi->file_info.flags & CEPH_F_ATEND)
@@ -340,6 +345,10 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		ctx->pos = 2;
 	}
 
+	err = fscrypt_prepare_readdir(inode);
+	if (err)
+		goto out;
+
 	spin_lock(&ci->i_ceph_lock);
 	/* request Fx cap. if have Fx, we don't need to release Fs cap
 	 * for later create/unlink. */
@@ -360,6 +369,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		spin_unlock(&ci->i_ceph_lock);
 	}
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* proceed with a normal readdir */
 more:
 	/* do we have the correct frag content buffered? */
@@ -387,12 +404,14 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		dout("readdir fetching %llx.%llx frag %x offset '%s'\n",
 		     ceph_vinop(inode), frag, dfi->last_name);
 		req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
-		if (IS_ERR(req))
-			return PTR_ERR(req);
+		if (IS_ERR(req)) {
+			err = PTR_ERR(req);
+			goto out;
+		}
 		err = ceph_alloc_readdir_reply_buffer(req, inode);
 		if (err) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		/* hints to request -> mds selection code */
 		req->r_direct_mode = USE_AUTH_MDS;
@@ -405,7 +424,8 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			req->r_path2 = kstrdup(dfi->last_name, GFP_KERNEL);
 			if (!req->r_path2) {
 				ceph_mdsc_put_request(req);
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto out;
 			}
 		} else if (is_hash_order(ctx->pos)) {
 			req->r_args.readdir.offset_hash =
@@ -426,7 +446,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
 		if (err < 0) {
 			ceph_mdsc_put_request(req);
-			return err;
+			goto out;
 		}
 		dout("readdir got and parsed readdir result=%d on "
 		     "frag %x, end=%d, complete=%d, hash_order=%d\n",
@@ -479,7 +499,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 			err = note_last_dentry(dfi, rde->name, rde->name_len,
 					       next_offset);
 			if (err)
-				return err;
+				goto out;
 		} else if (req->r_reply_info.dir_end) {
 			dfi->next_offset = 2;
 			/* keep last name */
@@ -507,22 +527,37 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	}
 	for (; i < rinfo->dir_nr; i++) {
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+		u32 olen = oname.len;
 
 		BUG_ON(rde->offset < ctx->pos);
+		BUG_ON(!rde->inode.in);
 
 		ctx->pos = rde->offset;
 		dout("readdir (%d/%d) -> %llx '%.*s' %p\n",
 		     i, rinfo->dir_nr, ctx->pos,
 		     rde->name_len, rde->name, &rde->inode.in);
 
-		BUG_ON(!rde->inode.in);
+		err = ceph_fname_to_usr(&fname, &tname, &oname, NULL);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		if (!dir_emit(ctx, rde->name, rde->name_len,
+		if (!dir_emit(ctx, oname.name, oname.len,
 			      ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
 			      le32_to_cpu(rde->inode.in->mode) >> 12)) {
 			dout("filldir stopping us...\n");
-			return 0;
+			err = 0;
+			goto out;
 		}
+
+		/* Reset the lengths to their original allocated vals */
+		oname.len = olen;
 		ctx->pos++;
 	}
 
@@ -577,9 +612,12 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 					dfi->dir_ordered_count);
 		spin_unlock(&ci->i_ceph_lock);
 	}
-
+	err = 0;
 	dout("readdir %p file %p done.\n", inode, file);
-	return 0;
+out:
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
+	return err;
 }
 
 static void reset_readdir(struct ceph_dir_file_info *dfi)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 39f4c0dfa071..bf2760e53827 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1726,7 +1726,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 			     struct ceph_mds_session *session)
 {
 	struct dentry *parent = req->r_dentry;
-	struct ceph_inode_info *ci = ceph_inode(d_inode(parent));
+	struct inode *inode = d_inode(parent);
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
 	struct qstr dname;
 	struct dentry *dn;
@@ -1736,6 +1737,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	u32 last_hash = 0;
 	u32 fpos_offset;
 	struct ceph_readdir_cache_control cache_ctl = {};
+	struct fscrypt_str tname = FSTR_INIT(NULL, 0);
+	struct fscrypt_str oname = FSTR_INIT(NULL, 0);
 
 	if (test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags))
 		return readdir_prepopulate_inodes_only(req, session);
@@ -1787,14 +1790,36 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 	cache_ctl.index = req->r_readdir_cache_idx;
 	fpos_offset = req->r_readdir_offset;
 
+	err = ceph_fname_alloc_buffer(inode, &tname);
+	if (err < 0)
+		goto out;
+
+	err = ceph_fname_alloc_buffer(inode, &oname);
+	if (err < 0)
+		goto out;
+
 	/* FIXME: release caps/leases if error occurs */
 	for (i = 0; i < rinfo->dir_nr; i++) {
+		bool is_nokey = false;
 		struct ceph_mds_reply_dir_entry *rde = rinfo->dir_entries + i;
 		struct ceph_vino tvino;
+		u32 olen = oname.len;
+		struct ceph_fname fname = { .dir	= inode,
+					    .name	= rde->name,
+					    .name_len	= rde->name_len,
+					    .ctext	= rde->altname,
+					    .ctext_len	= rde->altname_len };
+
+		err = ceph_fname_to_usr(&fname, &tname, &oname, &is_nokey);
+		if (err) {
+			dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
+			continue;
+		}
 
-		dname.name = rde->name;
-		dname.len = rde->name_len;
+		dname.name = oname.name;
+		dname.len = oname.len;
 		dname.hash = full_name_hash(parent, dname.name, dname.len);
+		oname.len = olen;
 
 		tvino.ino = le64_to_cpu(rde->inode.in->ino);
 		tvino.snap = le64_to_cpu(rde->inode.in->snapid);
@@ -1825,6 +1850,11 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 				err = -ENOMEM;
 				goto out;
 			}
+			if (is_nokey) {
+				spin_lock(&dn->d_lock);
+				dn->d_flags |= DCACHE_NOKEY_NAME;
+				spin_unlock(&dn->d_lock);
+			}
 		} else if (d_really_is_positive(dn) &&
 			   (ceph_ino(d_inode(dn)) != tvino.ino ||
 			    ceph_snap(d_inode(dn)) != tvino.snap)) {
@@ -1915,6 +1945,8 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
 		req->r_readdir_cache_idx = cache_ctl.index;
 	}
 	ceph_readdir_cache_release(&cache_ctl);
+	ceph_fname_free_buffer(inode, &tname);
+	ceph_fname_free_buffer(inode, &oname);
 	dout("readdir_prepopulate done\n");
 	return err;
 }
-- 
2.30.2



* [RFC PATCH v5 18/19] ceph: create symlinks with encrypted and base64-encoded targets
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (16 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 17/19] ceph: add support to readdir for encrypted filenames Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-03-26 17:32 ` [RFC PATCH v5 19/19] ceph: add fscrypt ioctls Jeff Layton
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

When creating symlinks in encrypted directories, encrypt and
base64-encode the target with the new inode's key before sending it to
the MDS.

When filling a symlink inode, base64-decode the target into a buffer that
we'll keep in ci->i_symlink. When get_link is called, decrypt that buffer
into a new one that will hang off i_link.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/dir.c   | 52 ++++++++++++++++++++++++---
 fs/ceph/inode.c | 95 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 130 insertions(+), 17 deletions(-)
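
Not part of the patch: a minimal userspace sketch of the sizing and
termination rules around an unpadded base64 round trip. The alphabet
and bit order below are assumptions made for the example and may not
match what fscrypt_base64_encode() does; B64_CHARS(), b64_encode(),
b64_decode() and decode_to_cstring() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

static const char b64_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Characters needed to encode nbytes without padding. */
#define B64_CHARS(nbytes)	(((nbytes) * 4 + 2) / 3)

/* Encode len bytes; returns the number of characters written (no NUL). */
static int b64_encode(const unsigned char *src, int len, char *dst)
{
	int i, bits = 0, ac = 0;
	char *cp = dst;

	for (i = 0; i < len; i++) {
		ac += src[i] << bits;
		bits += 8;
		do {
			*cp++ = b64_table[ac & 0x3f];
			ac >>= 6;
			bits -= 6;
		} while (bits >= 6);
	}
	if (bits)
		*cp++ = b64_table[ac & 0x3f];
	return (int)(cp - dst);
}

/* Decode len characters; returns the byte count or a negative value. */
static int b64_decode(const char *src, int len, unsigned char *dst)
{
	int i, bits = 0, ac = 0;
	unsigned char *cp = dst;

	for (i = 0; i < len; i++) {
		const char *p = strchr(b64_table, src[i]);

		if (p == NULL || src[i] == 0)
			return -2;	/* character outside the alphabet */
		ac += (int)(p - b64_table) << bits;
		bits += 6;
		if (bits >= 8) {
			*cp++ = ac & 0xff;
			ac >>= 8;
			bits -= 8;
		}
	}
	if (ac)
		return -1;	/* leftover bits: input was not canonical */
	return (int)(cp - dst);
}

/* Decode into a fresh buffer and NUL-terminate it; returns the length. */
static int decode_to_cstring(const char *enc, int enclen, unsigned char **out)
{
	unsigned char *buf = malloc(enclen + 1);
	int declen;

	if (!buf)
		return -1;
	declen = b64_decode(enc, enclen, buf);
	if (declen < 0) {
		free(buf);
		return -1;
	}
	buf[declen] = '\0';	/* index declen is the first byte past the data */
	*out = buf;
	return declen;
}
```

Decoded output is never longer than the encoded input, so enclen + 1
bytes is always enough room for the data plus the terminator.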

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 7fe74c2f3911..e039534a5fab 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -947,6 +947,40 @@ static int ceph_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return ceph_mknod(mnt_userns, dir, dentry, mode, 0);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	int err;
+	int len = strlen(dest);
+	struct fscrypt_str osd_link = FSTR_INIT(NULL, 0);
+
+	err = fscrypt_prepare_symlink(req->r_parent, dest, len, PATH_MAX, &osd_link);
+	if (err)
+		goto out;
+
+	err = fscrypt_encrypt_symlink(req->r_new_inode, dest, len, &osd_link);
+	if (err)
+		goto out;
+
+	req->r_path2 = kmalloc(FSCRYPT_BASE64_CHARS(osd_link.len) + 1, GFP_KERNEL);
+	if (!req->r_path2) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	len = fscrypt_base64_encode(osd_link.name, osd_link.len, req->r_path2);
+	req->r_path2[len] = '\0';
+out:
+	fscrypt_fname_free_buffer(&osd_link);
+	return err;
+}
+#else
+static int prep_encrypted_symlink_target(struct ceph_mds_request *req, const char *dest)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 			struct dentry *dentry, const char *dest)
 {
@@ -978,12 +1012,20 @@ static int ceph_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 		goto out_req;
 	}
 
-	req->r_path2 = kstrdup(dest, GFP_KERNEL);
-	if (!req->r_path2) {
-		err = -ENOMEM;
-		goto out_req;
-	}
 	req->r_parent = dir;
+
+	if (IS_ENCRYPTED(req->r_new_inode)) {
+		err = prep_encrypted_symlink_target(req, dest);
+		if (err)
+			goto out_req;
+	} else {
+		req->r_path2 = kstrdup(dest, GFP_KERNEL);
+		if (!req->r_path2) {
+			err = -ENOMEM;
+			goto out_req;
+		}
+	}
+
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_dentry = dget(dentry);
 	req->r_num_caps = 2;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index bf2760e53827..a1f731d57883 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -35,6 +35,7 @@
  */
 
 static const struct inode_operations ceph_symlink_iops;
+static const struct inode_operations ceph_encrypted_symlink_iops;
 
 static void ceph_inode_work(struct work_struct *work);
 
@@ -615,6 +616,7 @@ void ceph_free_inode(struct inode *inode)
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	kfree(ci->i_symlink);
+	fscrypt_free_inode(inode);
 	kmem_cache_free(ceph_inode_cachep, ci);
 }
 
@@ -814,6 +816,33 @@ void ceph_fill_file_time(struct inode *inode, int issued,
 		     inode, time_warp_seq, ci->i_time_warp_seq);
 }
 
+#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
+static int decode_encrypted_symlink(const char *encsym, int enclen, u8 **decsym)
+{
+	int declen;
+	u8 *sym;
+
+	sym = kmalloc(enclen + 1, GFP_NOFS);
+	if (!sym)
+		return -ENOMEM;
+
+	declen = fscrypt_base64_decode(encsym, enclen, sym);
+	if (declen < 0) {
+		pr_err("%s: can't decode symlink (%d). Content: %.*s\n", __func__, declen, enclen, encsym);
+		kfree(sym);
+		return -EIO;
+	}
+	sym[declen] = '\0';
+	*decsym = sym;
+	return declen;
+}
+#else
+static int decode_encrypted_symlink(const char *encsym, int symlen, u8 **decsym)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 /*
  * Populate an inode based on info from mds.  May be called on new or
  * existing inodes.
@@ -1046,26 +1075,39 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 		inode->i_fop = &ceph_file_fops;
 		break;
 	case S_IFLNK:
-		inode->i_op = &ceph_symlink_iops;
 		if (!ci->i_symlink) {
 			u32 symlen = iinfo->symlink_len;
 			char *sym;
 
 			spin_unlock(&ci->i_ceph_lock);
 
-			if (symlen != i_size_read(inode)) {
-				pr_err("%s %llx.%llx BAD symlink "
-					"size %lld\n", __func__,
-					ceph_vinop(inode),
-					i_size_read(inode));
+			if (IS_ENCRYPTED(inode)) {
+				if (symlen != i_size_read(inode))
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+
+				err = decode_encrypted_symlink(iinfo->symlink, symlen, (u8 **)&sym);
+				if (err < 0) {
+					pr_err("%s decoding encrypted symlink failed: %d\n",
+						__func__, err);
+					goto out;
+				}
+				symlen = err;
 				i_size_write(inode, symlen);
 				inode->i_blocks = calc_inode_blocks(symlen);
-			}
+			} else {
+				if (symlen != i_size_read(inode)) {
+					pr_err("%s %llx.%llx BAD symlink size %lld\n",
+						__func__, ceph_vinop(inode), i_size_read(inode));
+					i_size_write(inode, symlen);
+					inode->i_blocks = calc_inode_blocks(symlen);
+				}
 
-			err = -ENOMEM;
-			sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
-			if (!sym)
-				goto out;
+				err = -ENOMEM;
+				sym = kstrndup(iinfo->symlink, symlen, GFP_NOFS);
+				if (!sym)
+					goto out;
+			}
 
 			spin_lock(&ci->i_ceph_lock);
 			if (!ci->i_symlink)
@@ -1073,7 +1115,18 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,
 			else
 				kfree(sym); /* lost a race */
 		}
-		inode->i_link = ci->i_symlink;
+
+		if (IS_ENCRYPTED(inode)) {
+			/*
+			 * Encrypted symlinks need to be decrypted before we can
+			 * cache their targets in i_link. Leave it blank for now.
+			 */
+			inode->i_link = NULL;
+			inode->i_op = &ceph_encrypted_symlink_iops;
+		} else {
+			inode->i_link = ci->i_symlink;
+			inode->i_op = &ceph_symlink_iops;
+		}
 		break;
 	case S_IFDIR:
 		inode->i_op = &ceph_dir_iops;
@@ -2145,6 +2198,17 @@ static void ceph_inode_work(struct work_struct *work)
 	iput(inode);
 }
 
+static const char *ceph_encrypted_get_link(struct dentry *dentry, struct inode *inode,
+					   struct delayed_call *done)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	if (!dentry)
+		return ERR_PTR(-ECHILD);
+
+	return fscrypt_get_symlink(inode, ci->i_symlink, i_size_read(inode), done);
+}
+
 /*
  * symlinks
  */
@@ -2155,6 +2219,13 @@ static const struct inode_operations ceph_symlink_iops = {
 	.listxattr = ceph_listxattr,
 };
 
+static const struct inode_operations ceph_encrypted_symlink_iops = {
+	.get_link = ceph_encrypted_get_link,
+	.setattr = ceph_setattr,
+	.getattr = ceph_getattr,
+	.listxattr = ceph_listxattr,
+};
+
 int __ceph_setattr(struct inode *inode, struct iattr *attr)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-- 
2.30.2



* [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (17 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 18/19] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
@ 2021-03-26 17:32 ` Jeff Layton
  2021-04-06 15:38   ` Luis Henriques
  2021-03-26 18:38 ` [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
  2021-03-31 20:35 ` [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames Jeff Layton
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 17:32 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

We gate most of the ioctls on MDS feature support. The exceptions are the
key removal and status ioctls, which we still want to work even if the
MDSs were to (inexplicably) lose the feature.

For the set_policy ioctl, we take Fcx caps to ensure that nothing can
create files in the directory while the ioctl is running. That should
be enough to ensure that the "empty_dir" check is reliable.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/ioctl.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
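
Not part of the patch: a minimal userspace sketch of the feature gate
described above, under the assumption that the decision is made from
the first session slot that is populated. struct session,
FEATURE_ALTNAME and vet_first_session() are hypothetical names, and the
bit number is arbitrary.

```c
#include <errno.h>
#include <stddef.h>

#define FEATURE_ALTNAME	3	/* hypothetical feature bit number */

/* Hypothetical stand-in for an MDS session with a feature bitmap. */
struct session {
	unsigned long features;
};

/*
 * Return 0 if the first populated session advertises the feature,
 * -EOPNOTSUPP if it does not or if there are no sessions at all.
 */
static int vet_first_session(struct session **sessions, int max_sessions)
{
	int i;

	for (i = 0; i < max_sessions; i++) {
		struct session *s = sessions[i];

		if (!s)
			continue;	/* empty slot, keep looking */
		if (s->features & (1UL << FEATURE_ALTNAME))
			return 0;
		break;			/* only the first session decides */
	}
	return -EOPNOTSUPP;
}
```

A stricter variant would walk every populated slot and fail if any one
of them lacks the bit. Which of the two is wanted depends on whether a
mixed-version MDS cluster is something the client has to tolerate.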

diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 6e061bf62ad4..34b85bcfcfc7 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -6,6 +6,7 @@
 #include "mds_client.h"
 #include "ioctl.h"
 #include <linux/ceph/striper.h>
+#include <linux/fscrypt.h>
 
 /*
  * ioctls
@@ -268,8 +269,56 @@ static long ceph_ioctl_syncio(struct file *file)
 	return 0;
 }
 
+static int vet_mds_for_fscrypt(struct file *file)
+{
+	int i, ret = -EOPNOTSUPP;
+	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *s = mdsc->sessions[i];
+
+		if (!s)
+			continue;
+		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
+			ret = 0;
+		break;
+	}
+	mutex_unlock(&mdsc->mutex);
+	return ret;
+}
+
+static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
+{
+	int ret, got = 0;
+	struct page *page = NULL;
+	struct inode *inode = file_inode(file);
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ret = vet_mds_for_fscrypt(file);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ensure we hold these caps so that we _know_ that the rstats check
+	 * in the empty_dir check is reliable.
+	 */
+	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got, &page);
+	if (ret)
+		return ret;
+	if (page)
+		put_page(page);
+	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
+	if (got)
+		ceph_put_cap_refs(ci, got);
+	return ret;
+}
+
 long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
+	int ret;
+	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
+
 	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
 	switch (cmd) {
 	case CEPH_IOC_GET_LAYOUT:
@@ -289,6 +338,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 	case CEPH_IOC_SYNCIO:
 		return ceph_ioctl_syncio(file);
+
+	case FS_IOC_SET_ENCRYPTION_POLICY:
+		return ceph_set_encryption_policy(file, arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+
+	case FS_IOC_ADD_ENCRYPTION_KEY:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_add_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY:
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
+
+	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
+		atomic_inc(&ci->i_shared_gen);
+		ceph_dir_clear_ordered(file_inode(file));
+		ceph_dir_clear_complete(file_inode(file));
+		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
+		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
+
+	case FS_IOC_GET_ENCRYPTION_NONCE:
+		ret = vet_mds_for_fscrypt(file);
+		if (ret)
+			return ret;
+		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
 	}
 
 	return -ENOTTY;
-- 
2.30.2



* Re: [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (18 preceding siblings ...)
  2021-03-26 17:32 ` [RFC PATCH v5 19/19] ceph: add fscrypt ioctls Jeff Layton
@ 2021-03-26 18:38 ` Jeff Layton
  2021-03-31 20:35 ` [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames Jeff Layton
  20 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-03-26 18:38 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

On Fri, 2021-03-26 at 13:32 -0400, Jeff Layton wrote:
> I haven't posted this in a while and there were some bugs shaken out of
> the last posting. This adds (partial) support for fscrypt to kcephfs,
> including crypto contexts, filenames and encrypted symlink targets. At
> this point, the xfstests quick tests that generally pass without fscrypt
> also pass with test_dummy_encryption enabled.
> 
> There is one lingering bug that I'm having trouble tracking down: xfstest
> generic/477 (an open_by_handle_at test) sometimes throws a "Busy inodes
> after umount" warning. I've narrowed down the issue a bit, but there is
> some raciness involved so I haven't quite nailed it down yet.
> 
> This set is quite invasive. There is probably some further work to be
> done to add common code helpers and the like, but the final diffstat
> probably won't look too different.
> 
> This set does not include encryption of file contents. That is turning
> out to be a bit trickier than first expected owing to the fact that the
> MDS is usually what handles truncation, and the i_size no longer
> represents the amount of data stored in the backing store. That will
> probably require an MDS change to fix, and we're still sorting out the
> details.
> 
> Jeff Layton (19):
>   vfs: export new_inode_pseudo
>   fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
>   fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
>   fscrypt: add fscrypt_context_for_new_inode
>   ceph: crypto context handling for ceph
>   ceph: implement -o test_dummy_encryption mount option
>   ceph: preallocate inode for ops that may create one
>   ceph: add routine to create fscrypt context prior to RPC
>   ceph: make ceph_msdc_build_path use ref-walk
>   ceph: add encrypted fname handling to ceph_mdsc_build_path
>   ceph: decode alternate_name in lease info
>   ceph: send altname in MClientRequest
>   ceph: properly set DCACHE_NOKEY_NAME flag in lookup
>   ceph: make d_revalidate call fscrypt revalidator for encrypted
>     dentries
>   ceph: add helpers for converting names for userland presentation
>   ceph: add fscrypt support to ceph_fill_trace
>   ceph: add support to readdir for encrypted filenames
>   ceph: create symlinks with encrypted and base64-encoded targets
>   ceph: add fscrypt ioctls
> 
>  fs/ceph/Makefile            |   1 +
>  fs/ceph/crypto.c            | 185 +++++++++++++++++++++++
>  fs/ceph/crypto.h            | 101 +++++++++++++
>  fs/ceph/dir.c               | 178 ++++++++++++++++++-----
>  fs/ceph/file.c              |  56 ++++---
>  fs/ceph/inode.c             | 255 +++++++++++++++++++++++++++++---
>  fs/ceph/ioctl.c             |  94 ++++++++++++
>  fs/ceph/mds_client.c        | 283 ++++++++++++++++++++++++++++++------
>  fs/ceph/mds_client.h        |  14 +-
>  fs/ceph/super.c             |  80 +++++++++-
>  fs/ceph/super.h             |  16 +-
>  fs/ceph/xattr.c             |  32 ++++
>  fs/crypto/fname.c           |  53 +++++--
>  fs/crypto/fscrypt_private.h |   9 +-
>  fs/crypto/hooks.c           |   6 +-
>  fs/crypto/policy.c          |  34 ++++-
>  fs/inode.c                  |   1 +
>  include/linux/fscrypt.h     |  10 ++
>  18 files changed, 1246 insertions(+), 162 deletions(-)
>  create mode 100644 fs/ceph/crypto.c
>  create mode 100644 fs/ceph/crypto.h
> 

Oh, I should mention that this is all in my ceph-fscrypt-fnames branch:

    https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/

This is all still under heavy development, so I'm open to suggestions and
review. If you're daring and want to test with it, please do.

I do think this has the potential to be a "killer feature" for ceph (and
maybe other network filesystems). Being able to store data securely on
an otherwise "public" cluster seems like a very nice thing to have.

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>



* [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames
  2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
                   ` (19 preceding siblings ...)
  2021-03-26 18:38 ` [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
@ 2021-03-31 20:35 ` Jeff Layton
  2021-04-01 11:14   ` Luis Henriques
  20 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-03-31 20:35 UTC (permalink / raw)
  To: ceph-devel; +Cc: linux-fscrypt, linux-fsdevel

When we do a lookupino to the MDS, we get a filename in the trace.
ceph_get_name uses that name directly, so we must properly decrypt
it before copying it to the name buffer.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

This patch is what's needed to fix the "busy inodes after umount"
issue I was seeing with xfstest generic/477, and also makes that
test pass reliably with mounts using -o test_dummy_encryption.

diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 17d8c8f4ec89..f4e3a17ffc01 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -7,6 +7,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#include "crypto.h"
 
 /*
  * Basic fh
@@ -516,7 +517,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
 {
 	struct ceph_mds_client *mdsc;
 	struct ceph_mds_request *req;
+	struct inode *dir = d_inode(parent);
 	struct inode *inode = d_inode(child);
+	struct ceph_mds_reply_info_parsed *rinfo;
 	int err;
 
 	if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -528,29 +531,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	inode_lock(d_inode(parent));
-
+	inode_lock(dir);
 	req->r_inode = inode;
 	ihold(inode);
 	req->r_ino2 = ceph_vino(d_inode(parent));
-	req->r_parent = d_inode(parent);
+	req->r_parent = dir;
 	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 	req->r_num_caps = 2;
 	err = ceph_mdsc_do_request(mdsc, NULL, req);
+	inode_unlock(dir);
 
-	inode_unlock(d_inode(parent));
+	if (err)
+		goto out;
 
-	if (!err) {
-		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
+	rinfo = &req->r_reply_info;
+	if (!IS_ENCRYPTED(dir)) {
 		memcpy(name, rinfo->dname, rinfo->dname_len);
 		name[rinfo->dname_len] = 0;
-		dout("get_name %p ino %llx.%llx name %s\n",
-		     child, ceph_vinop(inode), name);
 	} else {
-		dout("get_name %p ino %llx.%llx err %d\n",
-		     child, ceph_vinop(inode), err);
-	}
+		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
+		struct ceph_fname fname = { .dir	= dir,
+					    .name	= rinfo->dname,
+					    .ctext	= rinfo->altname,
+					    .name_len	= rinfo->dname_len,
+					    .ctext_len	= rinfo->altname_len };
+
+		err = ceph_fname_alloc_buffer(dir, &oname);
+		if (err < 0)
+			goto out;
 
+		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
+		if (!err) {
+			memcpy(name, oname.name, oname.len);
+			name[oname.len] = 0;
+		}
+		ceph_fname_free_buffer(dir, &oname);
+	}
+out:
+	dout("get_name %p ino %llx.%llx err %d %s%s\n",
+		     child, ceph_vinop(inode), err,
+		     err ? "" : "name ", err ? "" : name);
 	ceph_mdsc_put_request(req);
 	return err;
 }
-- 
2.30.2



* Re: [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames
  2021-03-31 20:35 ` [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames Jeff Layton
@ 2021-04-01 11:14   ` Luis Henriques
  2021-04-01 12:15     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Luis Henriques @ 2021-04-01 11:14 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Wed, Mar 31, 2021 at 04:35:20PM -0400, Jeff Layton wrote:
> When we do a lookupino to the MDS, we get a filename in the trace.
> ceph_get_name uses that name directly, so we must properly decrypt
> it before copying it to the name buffer.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 31 insertions(+), 11 deletions(-)
> 
> This patch is what's needed to fix the "busy inodes after umount"
> issue I was seeing with xfstest generic/477, and also makes that
> test pass reliably with mounts using -o test_dummy_encryption.

You mentioned this issue the other day on IRC but I couldn't reproduce.

On the other hand, I'm seeing another issue.  Here's a way to reproduce:

- create an encrypted dir 'd' and create a file 'f'
- umount and mount the filesystem
- unlock dir 'd'
- cat d/f
  cat: d/2: No such file or directory

It happens _almost_ every time I do the umount+mount+unlock+cat.  Looks
like ceph_atomic_open() fails to see that directory as encrypted.  I don't
think the problem is in the open itself, but in the unlock, because a
simple 'ls' also fails to show the decrypted names.  (On the other hand, if
you do an 'ls' _before_ the unlock, everything seems to work fine.)

I didn't have time to dig deeper into this yet, but I don't remember seeing
this behaviour in previous versions of the patchset.

Cheers,
--
Luís

> 
> diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> index 17d8c8f4ec89..f4e3a17ffc01 100644
> --- a/fs/ceph/export.c
> +++ b/fs/ceph/export.c
> @@ -7,6 +7,7 @@
>  
>  #include "super.h"
>  #include "mds_client.h"
> +#include "crypto.h"
>  
>  /*
>   * Basic fh
> @@ -516,7 +517,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
>  {
>  	struct ceph_mds_client *mdsc;
>  	struct ceph_mds_request *req;
> +	struct inode *dir = d_inode(parent);
>  	struct inode *inode = d_inode(child);
> +	struct ceph_mds_reply_info_parsed *rinfo;
>  	int err;
>  
>  	if (ceph_snap(inode) != CEPH_NOSNAP)
> @@ -528,29 +531,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
>  
> -	inode_lock(d_inode(parent));
> -
> +	inode_lock(dir);
>  	req->r_inode = inode;
>  	ihold(inode);
>  	req->r_ino2 = ceph_vino(d_inode(parent));
> -	req->r_parent = d_inode(parent);
> +	req->r_parent = dir;
>  	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
>  	req->r_num_caps = 2;
>  	err = ceph_mdsc_do_request(mdsc, NULL, req);
> +	inode_unlock(dir);
>  
> -	inode_unlock(d_inode(parent));
> +	if (err)
> +		goto out;
>  
> -	if (!err) {
> -		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
> +	rinfo = &req->r_reply_info;
> +	if (!IS_ENCRYPTED(dir)) {
>  		memcpy(name, rinfo->dname, rinfo->dname_len);
>  		name[rinfo->dname_len] = 0;
> -		dout("get_name %p ino %llx.%llx name %s\n",
> -		     child, ceph_vinop(inode), name);
>  	} else {
> -		dout("get_name %p ino %llx.%llx err %d\n",
> -		     child, ceph_vinop(inode), err);
> -	}
> +		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> +		struct ceph_fname fname = { .dir	= dir,
> +					    .name	= rinfo->dname,
> +					    .ctext	= rinfo->altname,
> +					    .name_len	= rinfo->dname_len,
> +					    .ctext_len	= rinfo->altname_len };
> +
> +		err = ceph_fname_alloc_buffer(dir, &oname);
> +		if (err < 0)
> +			goto out;
>  
> +		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
> +		if (!err) {
> +			memcpy(name, oname.name, oname.len);
> +			name[oname.len] = 0;
> +		}
> +		ceph_fname_free_buffer(dir, &oname);
> +	}
> +out:
> +	dout("get_name %p ino %llx.%llx err %d %s%s\n",
> +		     child, ceph_vinop(inode), err,
> +		     err ? "" : "name ", err ? "" : name);
>  	ceph_mdsc_put_request(req);
>  	return err;
>  }
> -- 
> 2.30.2
> 


* Re: [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames
  2021-04-01 11:14   ` Luis Henriques
@ 2021-04-01 12:15     ` Jeff Layton
  2021-04-01 13:05       ` Luis Henriques
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-04-01 12:15 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Thu, 2021-04-01 at 12:14 +0100, Luis Henriques wrote:
> On Wed, Mar 31, 2021 at 04:35:20PM -0400, Jeff Layton wrote:
> > When we do a lookupino to the MDS, we get a filename in the trace.
> > ceph_get_name uses that name directly, so we must properly decrypt
> > it before copying it to the name buffer.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
> >  1 file changed, 31 insertions(+), 11 deletions(-)
> > 
> > This patch is what's needed to fix the "busy inodes after umount"
> > issue I was seeing with xfstest generic/477, and also makes that
> > test pass reliably with mounts using -o test_dummy_encryption.
> 
> You mentioned this issue the other day on IRC but I couldn't reproduce.
> 
> On the other hand, I'm seeing another issue.  Here's a way to reproduce:
> 
> - create an encrypted dir 'd' and create a file 'f'
> - umount and mount the filesystem
> - unlock dir 'd'
> - cat d/f
>   cat: d/2: No such file or directory

I assume the message really says "cat: d/f: No such file or directory"

> 
> It happens _almost_ every time I do the umount+mount+unlock+cat.  Looks
> like ceph_atomic_open() fails to see that directory as encrypted.  I don't
> think the problem is on this open itself, but in the unlock because a
> simple 'ls' also fails to show the decrypted names.  (On the other hand, if
> you do an 'ls' _before_ the unlock, everything seems to work fine.)
> 
> I didn't have time to dig deeper into this yet, but I don't remember seeing
> this behaviour in previous versions of the patchset.
> 
> Cheers,
> --
> Luís
> 

I've tried several times to reproduce this, but I haven't seen it happen
at all. It may be dependent on something in your environment (MDS
version, perhaps?). I'll try some more, but let me know if you track
down the cause.

Thanks,
Jeff

> > 
> > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > index 17d8c8f4ec89..f4e3a17ffc01 100644
> > --- a/fs/ceph/export.c
> > +++ b/fs/ceph/export.c
> > @@ -7,6 +7,7 @@
> >  
> >  #include "super.h"
> >  #include "mds_client.h"
> > +#include "crypto.h"
> >  
> >  /*
> >   * Basic fh
> > @@ -516,7 +517,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
> >  {
> >  	struct ceph_mds_client *mdsc;
> >  	struct ceph_mds_request *req;
> > +	struct inode *dir = d_inode(parent);
> >  	struct inode *inode = d_inode(child);
> > +	struct ceph_mds_reply_info_parsed *rinfo;
> >  	int err;
> >  
> >  	if (ceph_snap(inode) != CEPH_NOSNAP)
> > @@ -528,29 +531,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
> >  	if (IS_ERR(req))
> >  		return PTR_ERR(req);
> >  
> > -	inode_lock(d_inode(parent));
> > -
> > +	inode_lock(dir);
> >  	req->r_inode = inode;
> >  	ihold(inode);
> >  	req->r_ino2 = ceph_vino(d_inode(parent));
> > -	req->r_parent = d_inode(parent);
> > +	req->r_parent = dir;
> >  	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> >  	req->r_num_caps = 2;
> >  	err = ceph_mdsc_do_request(mdsc, NULL, req);
> > +	inode_unlock(dir);
> >  
> > -	inode_unlock(d_inode(parent));
> > +	if (err)
> > +		goto out;
> >  
> > -	if (!err) {
> > -		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
> > +	rinfo = &req->r_reply_info;
> > +	if (!IS_ENCRYPTED(dir)) {
> >  		memcpy(name, rinfo->dname, rinfo->dname_len);
> >  		name[rinfo->dname_len] = 0;
> > -		dout("get_name %p ino %llx.%llx name %s\n",
> > -		     child, ceph_vinop(inode), name);
> >  	} else {
> > -		dout("get_name %p ino %llx.%llx err %d\n",
> > -		     child, ceph_vinop(inode), err);
> > -	}
> > +		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > +		struct ceph_fname fname = { .dir	= dir,
> > +					    .name	= rinfo->dname,
> > +					    .ctext	= rinfo->altname,
> > +					    .name_len	= rinfo->dname_len,
> > +					    .ctext_len	= rinfo->altname_len };
> > +
> > +		err = ceph_fname_alloc_buffer(dir, &oname);
> > +		if (err < 0)
> > +			goto out;
> >  
> > +		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
> > +		if (!err) {
> > +			memcpy(name, oname.name, oname.len);
> > +			name[oname.len] = 0;
> > +		}
> > +		ceph_fname_free_buffer(dir, &oname);
> > +	}
> > +out:
> > +	dout("get_name %p ino %llx.%llx err %d %s%s\n",
> > +		     child, ceph_vinop(inode), err,
> > +		     err ? "" : "name ", err ? "" : name);
> >  	ceph_mdsc_put_request(req);
> >  	return err;
> >  }
> > -- 
> > 2.30.2
> > 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames
  2021-04-01 12:15     ` Jeff Layton
@ 2021-04-01 13:05       ` Luis Henriques
  2021-04-01 13:12         ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Luis Henriques @ 2021-04-01 13:05 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Thu, Apr 01, 2021 at 08:15:51AM -0400, Jeff Layton wrote:
> On Thu, 2021-04-01 at 12:14 +0100, Luis Henriques wrote:
> > On Wed, Mar 31, 2021 at 04:35:20PM -0400, Jeff Layton wrote:
> > > When we do a lookupino to the MDS, we get a filename in the trace.
> > > ceph_get_name uses that name directly, so we must properly decrypt
> > > it before copying it to the name buffer.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
> > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > 
> > > This patch is what's needed to fix the "busy inodes after umount"
> > > issue I was seeing with xfstest generic/477, and also makes that
> > > test pass reliably with mounts using -o test_dummy_encryption.
> > 
> > You mentioned this issue the other day on IRC but I couldn't reproduce.
> > 
> > On the other hand, I'm seeing another issue.  Here's a way to reproduce:
> > 
> > - create an encrypted dir 'd' and create a file 'f'
> > - umount and mount the filesystem
> > - unlock dir 'd'
> > - cat d/f
> >   cat: d/2: No such file or directory
> 
> I assume the message really says "cat: d/f: No such file or directory"

Yes, of course :)

> > 
> > It happens _almost_ every time I do the umount+mount+unlock+cat.  Looks
> > like ceph_atomic_open() fails to see that directory as encrypted.  I don't
> > think the problem is on this open itself, but in the unlock because a
> > simple 'ls' also fails to show the decrypted names.  (On the other hand, if
> > you do an 'ls' _before_ the unlock, everything seems to work fine.)
> > 
> > I didn't have time to dig deeper into this yet, but I don't remember seeing
> > this behaviour in previous versions of the patchset.
> > 
> > Cheers,
> > --
> > Luís
> > 
> 
> I've tried several times to reproduce this, but I haven't seen it happen
> at all. It may be dependent on something in your environment (MDS
> version, perhaps?). I'll try some more, but let me know if you track
> down the cause.

Hmm... it could be indeed.  I'm running a vstart.sh cluster with pacific
(HEAD in eb5d7a868c96 ("Merge PR #40473 into pacific")).  It's trivial to
reproduce here, so I now wonder if I'm really missing something on the MDS
side.  I had a disaster recently (a disk died) and I had to recreate my
test environment.  I don't think I had anything extra to run fscrypt
tests, but I can't really remember.

Anyway, I'll let you know if I get something.

Cheers,
--
Luís


> Thanks,
> Jeff
> 
> > > 
> > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > index 17d8c8f4ec89..f4e3a17ffc01 100644
> > > --- a/fs/ceph/export.c
> > > +++ b/fs/ceph/export.c
> > > @@ -7,6 +7,7 @@
> > >  
> > >  #include "super.h"
> > >  #include "mds_client.h"
> > > +#include "crypto.h"
> > >  
> > >  /*
> > >   * Basic fh
> > > @@ -516,7 +517,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
> > >  {
> > >  	struct ceph_mds_client *mdsc;
> > >  	struct ceph_mds_request *req;
> > > +	struct inode *dir = d_inode(parent);
> > >  	struct inode *inode = d_inode(child);
> > > +	struct ceph_mds_reply_info_parsed *rinfo;
> > >  	int err;
> > >  
> > >  	if (ceph_snap(inode) != CEPH_NOSNAP)
> > > @@ -528,29 +531,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
> > >  	if (IS_ERR(req))
> > >  		return PTR_ERR(req);
> > >  
> > > -	inode_lock(d_inode(parent));
> > > -
> > > +	inode_lock(dir);
> > >  	req->r_inode = inode;
> > >  	ihold(inode);
> > >  	req->r_ino2 = ceph_vino(d_inode(parent));
> > > -	req->r_parent = d_inode(parent);
> > > +	req->r_parent = dir;
> > >  	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> > >  	req->r_num_caps = 2;
> > >  	err = ceph_mdsc_do_request(mdsc, NULL, req);
> > > +	inode_unlock(dir);
> > >  
> > > -	inode_unlock(d_inode(parent));
> > > +	if (err)
> > > +		goto out;
> > >  
> > > -	if (!err) {
> > > -		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
> > > +	rinfo = &req->r_reply_info;
> > > +	if (!IS_ENCRYPTED(dir)) {
> > >  		memcpy(name, rinfo->dname, rinfo->dname_len);
> > >  		name[rinfo->dname_len] = 0;
> > > -		dout("get_name %p ino %llx.%llx name %s\n",
> > > -		     child, ceph_vinop(inode), name);
> > >  	} else {
> > > -		dout("get_name %p ino %llx.%llx err %d\n",
> > > -		     child, ceph_vinop(inode), err);
> > > -	}
> > > +		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > > +		struct ceph_fname fname = { .dir	= dir,
> > > +					    .name	= rinfo->dname,
> > > +					    .ctext	= rinfo->altname,
> > > +					    .name_len	= rinfo->dname_len,
> > > +					    .ctext_len	= rinfo->altname_len };
> > > +
> > > +		err = ceph_fname_alloc_buffer(dir, &oname);
> > > +		if (err < 0)
> > > +			goto out;
> > >  
> > > +		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
> > > +		if (!err) {
> > > +			memcpy(name, oname.name, oname.len);
> > > +			name[oname.len] = 0;
> > > +		}
> > > +		ceph_fname_free_buffer(dir, &oname);
> > > +	}
> > > +out:
> > > +	dout("get_name %p ino %llx.%llx err %d %s%s\n",
> > > +		     child, ceph_vinop(inode), err,
> > > +		     err ? "" : "name ", err ? "" : name);
> > >  	ceph_mdsc_put_request(req);
> > >  	return err;
> > >  }
> > > -- 
> > > 2.30.2
> > > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
> 



* Re: [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames
  2021-04-01 13:05       ` Luis Henriques
@ 2021-04-01 13:12         ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-04-01 13:12 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Thu, 2021-04-01 at 14:05 +0100, Luis Henriques wrote:
> On Thu, Apr 01, 2021 at 08:15:51AM -0400, Jeff Layton wrote:
> > On Thu, 2021-04-01 at 12:14 +0100, Luis Henriques wrote:
> > > On Wed, Mar 31, 2021 at 04:35:20PM -0400, Jeff Layton wrote:
> > > > When we do a lookupino to the MDS, we get a filename in the trace.
> > > > ceph_get_name uses that name directly, so we must properly decrypt
> > > > it before copying it to the name buffer.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/export.c | 42 +++++++++++++++++++++++++++++++-----------
> > > >  1 file changed, 31 insertions(+), 11 deletions(-)
> > > > 
> > > > This patch is what's needed to fix the "busy inodes after umount"
> > > > issue I was seeing with xfstest generic/477, and also makes that
> > > > test pass reliably with mounts using -o test_dummy_encryption.
> > > 
> > > You mentioned this issue the other day on IRC but I couldn't reproduce.
> > > 
> > > On the other hand, I'm seeing another issue.  Here's a way to reproduce:
> > > 
> > > - create an encrypted dir 'd' and create a file 'f'
> > > - umount and mount the filesystem
> > > - unlock dir 'd'
> > > - cat d/f
> > >   cat: d/2: No such file or directory
> > 
> > I assume the message really says "cat: d/f: No such file or directory"
> 
> Yes, of course :)
> 
> > > 
> > > It happens _almost_ every time I do the umount+mount+unlock+cat.  Looks
> > > like ceph_atomic_open() fails to see that directory as encrypted.  I don't
> > > think the problem is on this open itself, but in the unlock because a
> > > simple 'ls' also fails to show the decrypted names.  (On the other hand, if
> > > you do an 'ls' _before_ the unlock, everything seems to work fine.)
> > > 
> > > I didn't have time to dig deeper into this yet, but I don't remember seeing
> > > this behaviour in previous versions of the patchset.
> > > 
> > > Cheers,
> > > --
> > > Luís
> > > 
> > 
> > I've tried several times to reproduce this, but I haven't seen it happen
> > at all. It may be dependent on something in your environment (MDS
> > version, perhaps?). I'll try some more, but let me know if you track
> > down the cause.
> 
> Hmm... it could be indeed.  I'm running a vstart.sh cluster with pacific
> (HEAD in eb5d7a868c96 ("Merge PR #40473 into pacific")).  It's trivial to
> reproduce here, so I now wonder if I'm really missing something on the MDS
> side.  I had a disaster recently (a disk died) and I had to recreate my
> test environment.  I don't think I had anything extra to run fscrypt
> tests, but I can't really remember.
> 
> Anyway, I'll let you know if I get something.
> 

Thanks. FWIW, I'm on a cephadm-built cluster using a pacific(-ish) build
from about 2 weeks ago:

$ sudo ./cephadm version
Using recent ceph image docker.io/ceph/daemon-base@sha256:765d8c56160753aa4a92757a2e007f5821f8c0ec70b5fc998faf334a2b127df2
ceph version 17.0.0-1983-g6a19e303 (6a19e303187c2defceb9c785284ca401a4309c47) quincy (dev)


> Cheers,
> --
> Luís
> 
> 
> > Thanks,
> > Jeff
> > 
> > > > 
> > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > index 17d8c8f4ec89..f4e3a17ffc01 100644
> > > > --- a/fs/ceph/export.c
> > > > +++ b/fs/ceph/export.c
> > > > @@ -7,6 +7,7 @@
> > > >  
> > > >  #include "super.h"
> > > >  #include "mds_client.h"
> > > > +#include "crypto.h"
> > > >  
> > > >  /*
> > > >   * Basic fh
> > > > @@ -516,7 +517,9 @@ static int ceph_get_name(struct dentry *parent, char *name,
> > > >  {
> > > >  	struct ceph_mds_client *mdsc;
> > > >  	struct ceph_mds_request *req;
> > > > +	struct inode *dir = d_inode(parent);
> > > >  	struct inode *inode = d_inode(child);
> > > > +	struct ceph_mds_reply_info_parsed *rinfo;
> > > >  	int err;
> > > >  
> > > >  	if (ceph_snap(inode) != CEPH_NOSNAP)
> > > > @@ -528,29 +531,46 @@ static int ceph_get_name(struct dentry *parent, char *name,
> > > >  	if (IS_ERR(req))
> > > >  		return PTR_ERR(req);
> > > >  
> > > > -	inode_lock(d_inode(parent));
> > > > -
> > > > +	inode_lock(dir);
> > > >  	req->r_inode = inode;
> > > >  	ihold(inode);
> > > >  	req->r_ino2 = ceph_vino(d_inode(parent));
> > > > -	req->r_parent = d_inode(parent);
> > > > +	req->r_parent = dir;
> > > >  	set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
> > > >  	req->r_num_caps = 2;
> > > >  	err = ceph_mdsc_do_request(mdsc, NULL, req);
> > > > +	inode_unlock(dir);
> > > >  
> > > > -	inode_unlock(d_inode(parent));
> > > > +	if (err)
> > > > +		goto out;
> > > >  
> > > > -	if (!err) {
> > > > -		struct ceph_mds_reply_info_parsed *rinfo = &req->r_reply_info;
> > > > +	rinfo = &req->r_reply_info;
> > > > +	if (!IS_ENCRYPTED(dir)) {
> > > >  		memcpy(name, rinfo->dname, rinfo->dname_len);
> > > >  		name[rinfo->dname_len] = 0;
> > > > -		dout("get_name %p ino %llx.%llx name %s\n",
> > > > -		     child, ceph_vinop(inode), name);
> > > >  	} else {
> > > > -		dout("get_name %p ino %llx.%llx err %d\n",
> > > > -		     child, ceph_vinop(inode), err);
> > > > -	}
> > > > +		struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > > > +		struct ceph_fname fname = { .dir	= dir,
> > > > +					    .name	= rinfo->dname,
> > > > +					    .ctext	= rinfo->altname,
> > > > +					    .name_len	= rinfo->dname_len,
> > > > +					    .ctext_len	= rinfo->altname_len };
> > > > +
> > > > +		err = ceph_fname_alloc_buffer(dir, &oname);
> > > > +		if (err < 0)
> > > > +			goto out;
> > > >  
> > > > +		err = ceph_fname_to_usr(&fname, NULL, &oname, NULL);
> > > > +		if (!err) {
> > > > +			memcpy(name, oname.name, oname.len);
> > > > +			name[oname.len] = 0;
> > > > +		}
> > > > +		ceph_fname_free_buffer(dir, &oname);
> > > > +	}
> > > > +out:
> > > > +	dout("get_name %p ino %llx.%llx err %d %s%s\n",
> > > > +		     child, ceph_vinop(inode), err,
> > > > +		     err ? "" : "name ", err ? "" : name);
> > > >  	ceph_mdsc_put_request(req);
> > > >  	return err;
> > > >  }
> > > > -- 
> > > > 2.30.2
> > > > 
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-03-26 17:32 ` [RFC PATCH v5 19/19] ceph: add fscrypt ioctls Jeff Layton
@ 2021-04-06 15:38   ` Luis Henriques
  2021-04-06 16:03     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Luis Henriques @ 2021-04-06 15:38 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

Hi Jeff!

On Fri, Mar 26, 2021 at 01:32:27PM -0400, Jeff Layton wrote:
> We gate most of the ioctls on MDS feature support. The exception is the
> key removal and status functions that we still want to work if the MDS's
> were to (inexplicably) lose the feature.
> 
> For the set_policy ioctl, we take Fcx caps to ensure that nothing can
> create files in the directory while the ioctl is running. That should
> be enough to ensure that the "empty_dir" check is reliable.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/ioctl.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
> index 6e061bf62ad4..34b85bcfcfc7 100644
> --- a/fs/ceph/ioctl.c
> +++ b/fs/ceph/ioctl.c
> @@ -6,6 +6,7 @@
>  #include "mds_client.h"
>  #include "ioctl.h"
>  #include <linux/ceph/striper.h>
> +#include <linux/fscrypt.h>
>  
>  /*
>   * ioctls
> @@ -268,8 +269,56 @@ static long ceph_ioctl_syncio(struct file *file)
>  	return 0;
>  }
>  
> +static int vet_mds_for_fscrypt(struct file *file)
> +{
> +	int i, ret = -EOPNOTSUPP;
> +	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
> +
> +	mutex_lock(&mdsc->mutex);
> +	for (i = 0; i < mdsc->max_sessions; i++) {
> +		struct ceph_mds_session *s = mdsc->sessions[i];
> +
> +		if (!s)
> +			continue;
> +		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
> +			ret = 0;
> +		break;
> +	}
> +	mutex_unlock(&mdsc->mutex);
> +	return ret;
> +}
> +
> +static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
> +{
> +	int ret, got = 0;
> +	struct page *page = NULL;
> +	struct inode *inode = file_inode(file);
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +
> +	ret = vet_mds_for_fscrypt(file);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Ensure we hold these caps so that we _know_ that the rstats check
> +	 * in the empty_dir check is reliable.
> +	 */
> +	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got, &page);
> +	if (ret)
> +		return ret;
> +	if (page)
> +		put_page(page);
> +	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
> +	if (got)
> +		ceph_put_cap_refs(ci, got);
> +	return ret;
> +}
> +
>  long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  {
> +	int ret;
> +	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
> +
>  	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
>  	switch (cmd) {
>  	case CEPH_IOC_GET_LAYOUT:
> @@ -289,6 +338,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  
>  	case CEPH_IOC_SYNCIO:
>  		return ceph_ioctl_syncio(file);
> +
> +	case FS_IOC_SET_ENCRYPTION_POLICY:
> +		return ceph_set_encryption_policy(file, arg);
> +
> +	case FS_IOC_GET_ENCRYPTION_POLICY:
> +		ret = vet_mds_for_fscrypt(file);
> +		if (ret)
> +			return ret;
> +		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
> +
> +	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
> +		ret = vet_mds_for_fscrypt(file);
> +		if (ret)
> +			return ret;
> +		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
> +
> +	case FS_IOC_ADD_ENCRYPTION_KEY:
> +		ret = vet_mds_for_fscrypt(file);
> +		if (ret)
> +			return ret;
> +		atomic_inc(&ci->i_shared_gen);

I've spent a few hours already looking at the bug I reported before, and I
can't really understand this code.  What does it mean to increment
->i_shared_gen at this point?

The reason I'm asking is because it looks like the problem I'm seeing goes
away if I remove this code.  Here's what I'm doing/seeing:

# mount ...
# fscrypt unlock d

  -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
     incremented by the line above.

# cat d/f

  -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
     because ci->i_xattrs.version and info->xattr_version are both
     set to 0.

cat: d/f: No such file or directory

I'm not sure anymore if the issue is on the client or on the MDS side.
Before digging deeper, I wonder if this rings any bells. ;-)

Cheers,
--
Luís


> +		ceph_dir_clear_ordered(file_inode(file));
> +		ceph_dir_clear_complete(file_inode(file));
> +		return fscrypt_ioctl_add_key(file, (void __user *)arg);
> +
> +	case FS_IOC_REMOVE_ENCRYPTION_KEY:
> +		atomic_inc(&ci->i_shared_gen);
> +		ceph_dir_clear_ordered(file_inode(file));
> +		ceph_dir_clear_complete(file_inode(file));
> +		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
> +
> +	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
> +		atomic_inc(&ci->i_shared_gen);
> +		ceph_dir_clear_ordered(file_inode(file));
> +		ceph_dir_clear_complete(file_inode(file));
> +		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
> +
> +	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
> +		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
> +
> +	case FS_IOC_GET_ENCRYPTION_NONCE:
> +		ret = vet_mds_for_fscrypt(file);
> +		if (ret)
> +			return ret;
> +		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
>  	}
>  
>  	return -ENOTTY;
> -- 
> 2.30.2
> 


* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-04-06 15:38   ` Luis Henriques
@ 2021-04-06 16:03     ` Jeff Layton
  2021-04-06 16:24       ` Luis Henriques
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-04-06 16:03 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Tue, 2021-04-06 at 16:38 +0100, Luis Henriques wrote:
> Hi Jeff!
> 
> On Fri, Mar 26, 2021 at 01:32:27PM -0400, Jeff Layton wrote:
> > We gate most of the ioctls on MDS feature support. The exception is the
> > key removal and status functions that we still want to work if the MDS's
> > were to (inexplicably) lose the feature.
> > 
> > For the set_policy ioctl, we take Fcx caps to ensure that nothing can
> > create files in the directory while the ioctl is running. That should
> > be enough to ensure that the "empty_dir" check is reliable.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/ioctl.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 94 insertions(+)
> > 
> > diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
> > index 6e061bf62ad4..34b85bcfcfc7 100644
> > --- a/fs/ceph/ioctl.c
> > +++ b/fs/ceph/ioctl.c
> > @@ -6,6 +6,7 @@
> >  #include "mds_client.h"
> >  #include "ioctl.h"
> >  #include <linux/ceph/striper.h>
> > +#include <linux/fscrypt.h>
> >  
> >  /*
> >   * ioctls
> > @@ -268,8 +269,56 @@ static long ceph_ioctl_syncio(struct file *file)
> >  	return 0;
> >  }
> >  
> > +static int vet_mds_for_fscrypt(struct file *file)
> > +{
> > +	int i, ret = -EOPNOTSUPP;
> > +	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
> > +
> > +	mutex_lock(&mdsc->mutex);
> > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > +		struct ceph_mds_session *s = mdsc->sessions[i];
> > +
> > +		if (!s)
> > +			continue;
> > +		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
> > +			ret = 0;
> > +		break;
> > +	}
> > +	mutex_unlock(&mdsc->mutex);
> > +	return ret;
> > +}
> > +
> > +static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
> > +{
> > +	int ret, got = 0;
> > +	struct page *page = NULL;
> > +	struct inode *inode = file_inode(file);
> > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > +
> > +	ret = vet_mds_for_fscrypt(file);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * Ensure we hold these caps so that we _know_ that the rstats check
> > +	 * in the empty_dir check is reliable.
> > +	 */
> > +	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got, &page);
> > +	if (ret)
> > +		return ret;
> > +	if (page)
> > +		put_page(page);
> > +	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
> > +	if (got)
> > +		ceph_put_cap_refs(ci, got);
> > +	return ret;
> > +}
> > +
> >  long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> >  {
> > +	int ret;
> > +	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
> > +
> >  	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
> >  	switch (cmd) {
> >  	case CEPH_IOC_GET_LAYOUT:
> > @@ -289,6 +338,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> >  
> >  	case CEPH_IOC_SYNCIO:
> >  		return ceph_ioctl_syncio(file);
> > +
> > +	case FS_IOC_SET_ENCRYPTION_POLICY:
> > +		return ceph_set_encryption_policy(file, arg);
> > +
> > +	case FS_IOC_GET_ENCRYPTION_POLICY:
> > +		ret = vet_mds_for_fscrypt(file);
> > +		if (ret)
> > +			return ret;
> > +		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
> > +
> > +	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
> > +		ret = vet_mds_for_fscrypt(file);
> > +		if (ret)
> > +			return ret;
> > +		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
> > +
> > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > +		ret = vet_mds_for_fscrypt(file);
> > +		if (ret)
> > +			return ret;
> > +		atomic_inc(&ci->i_shared_gen);
> 
> I've spent a few hours already looking at the bug I reported before, and I
> can't really understand this code.  What does it mean to increment
> ->i_shared_gen at this point?
> 
> The reason I'm asking is because it looks like the problem I'm seeing goes
> away if I remove this code.  Here's what I'm doing/seeing:
> 
> # mount ...
> # fscrypt unlock d
> 
>   -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
>      incremented by the line above.
> 
> # cat d/f
> 
>   -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
>      because ci->i_xattrs.version and info->xattr_version are both
>      set to 0.
> 

Interesting. That sounds like it might be the bug right there. "d"
should clearly have a fscrypt context in its xattrs at that point. If
the MDS isn't passing that back, then that could be a problem.

I had a concern about that when I was developing this, and I *thought*
Zheng had assured us that the MDS will always pass along the xattr blob
in a trace. Maybe that's not correct?
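
To make the suspected failure mode concrete, here is a minimal userspace
sketch. It assumes the client only applies an xattr blob when the version in
the MDS reply is strictly newer than the cached one; the struct and function
names below are hypothetical simplifications, not the real ceph_fill_inode()
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the client's cached inode state. */
struct cached_inode {
	uint64_t xattr_version;	/* stands in for ci->i_xattrs.version */
	bool encrypted;		/* set once a crypto context has been seen */
};

/* Hypothetical, simplified view of the inode info in an MDS reply. */
struct mds_inode_info {
	uint64_t xattr_version;	/* stands in for info->xattr_version */
	bool has_crypt_ctx;	/* whether the xattr blob carries a context */
};

/*
 * Assumed rule: the reply's xattrs are only applied when they are
 * strictly newer than what is cached. Returns true if they were applied.
 */
static bool fill_inode_xattrs(struct cached_inode *ci,
			      const struct mds_inode_info *info)
{
	assert(ci != NULL && info != NULL);

	if (info->xattr_version <= ci->xattr_version)
		return false;	/* not newer: cached state is left untouched */

	ci->xattr_version = info->xattr_version;
	ci->encrypted = info->has_crypt_ctx;
	return true;
}
```

Under that assumption, a freshly instantiated inode (cached version 0) that
receives a reply with xattr_version 0 never has its blob looked at, so the
directory would not get flagged as encrypted even if a context exists on the
MDS.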

> cat: d/f: No such file or directory
> 
> I'm not sure anymore if the issue is on the client or on the MDS side.
> Before digging deeper, I wonder if this rings any bells. ;-)
> 
> 

No, this is not something I've seen before.

Dentries that live in a directory have a copy of the i_shared_gen of the
directory when they are instantiated. Bumping that value on a directory
should basically ensure that its child dentries end up invalidated,
which is what we want once we add the key to the directory. Once we add
a key, any old dentries in that directory are no longer valid.

That said, I could certainly have missed some subtlety here.
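
As a rough illustration of the generation scheme described above, here is a
small userspace sketch. The types and helpers are hypothetical stand-ins (the
kernel keeps the counter in ci->i_shared_gen as an atomic_t and stores the
snapshot in per-dentry ceph data); only the compare-against-snapshot idea is
taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the directory's ceph inode info. */
struct dir_info {
	int shared_gen;		/* plays the role of ci->i_shared_gen */
};

/* Hypothetical stand-in for a child dentry of that directory. */
struct child_dentry {
	const char *name;
	int shared_gen;		/* snapshot taken when the dentry is set up */
};

/* Instantiate a child dentry, copying the directory's current generation. */
static struct child_dentry instantiate(const struct dir_info *dir,
				       const char *name)
{
	assert(dir != NULL && name != NULL);

	struct child_dentry d = { .name = name, .shared_gen = dir->shared_gen };
	return d;
}

/* Bump the generation, as the key add/remove ioctl paths do. */
static void bump_shared_gen(struct dir_info *dir)
{
	dir->shared_gen++;
}

/* A dentry is only trusted while its snapshot matches the directory. */
static bool dentry_is_valid(const struct dir_info *dir,
			    const struct child_dentry *d)
{
	return d->shared_gen == dir->shared_gen;
}
```

With this model, a dentry created before bump_shared_gen() fails the validity
check afterwards, and one instantiated after the bump passes it again, which
is the behaviour we want around adding or removing a key.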

> 
> > +		ceph_dir_clear_ordered(file_inode(file));
> > +		ceph_dir_clear_complete(file_inode(file));
> > +		return fscrypt_ioctl_add_key(file, (void __user *)arg);
> > +
> > +	case FS_IOC_REMOVE_ENCRYPTION_KEY:
> > +		atomic_inc(&ci->i_shared_gen);
> > +		ceph_dir_clear_ordered(file_inode(file));
> > +		ceph_dir_clear_complete(file_inode(file));
> > +		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
> > +
> > +	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
> > +		atomic_inc(&ci->i_shared_gen);
> > +		ceph_dir_clear_ordered(file_inode(file));
> > +		ceph_dir_clear_complete(file_inode(file));
> > +		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
> > +
> > +	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
> > +		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
> > +
> > +	case FS_IOC_GET_ENCRYPTION_NONCE:
> > +		ret = vet_mds_for_fscrypt(file);
> > +		if (ret)
> > +			return ret;
> > +		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
> >  	}
> >  
> >  	return -ENOTTY;
> > -- 
> > 2.30.2
> > 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-04-06 16:03     ` Jeff Layton
@ 2021-04-06 16:24       ` Luis Henriques
  2021-04-06 17:27         ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Luis Henriques @ 2021-04-06 16:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Tue, Apr 06, 2021 at 12:03:27PM -0400, Jeff Layton wrote:
> On Tue, 2021-04-06 at 16:38 +0100, Luis Henriques wrote:
> > Hi Jeff!
> > 
> > On Fri, Mar 26, 2021 at 01:32:27PM -0400, Jeff Layton wrote:
> > > We gate most of the ioctls on MDS feature support. The exception is the
> > > key removal and status functions that we still want to work if the MDS's
> > > were to (inexplicably) lose the feature.
> > > 
> > > For the set_policy ioctl, we take Fcx caps to ensure that nothing can
> > > create files in the directory while the ioctl is running. That should
> > > be enough to ensure that the "empty_dir" check is reliable.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/ioctl.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 94 insertions(+)
> > > 
> > > diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
> > > index 6e061bf62ad4..34b85bcfcfc7 100644
> > > --- a/fs/ceph/ioctl.c
> > > +++ b/fs/ceph/ioctl.c
> > > @@ -6,6 +6,7 @@
> > >  #include "mds_client.h"
> > >  #include "ioctl.h"
> > >  #include <linux/ceph/striper.h>
> > > +#include <linux/fscrypt.h>
> > >  
> > >  /*
> > >   * ioctls
> > > @@ -268,8 +269,56 @@ static long ceph_ioctl_syncio(struct file *file)
> > >  	return 0;
> > >  }
> > >  
> > > +static int vet_mds_for_fscrypt(struct file *file)
> > > +{
> > > +	int i, ret = -EOPNOTSUPP;
> > > +	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > > +		struct ceph_mds_session *s = mdsc->sessions[i];
> > > +
> > > +		if (!s)
> > > +			continue;
> > > +		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
> > > +			ret = 0;
> > > +		break;
> > > +	}
> > > +	mutex_unlock(&mdsc->mutex);
> > > +	return ret;
> > > +}
> > > +
> > > +static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
> > > +{
> > > +	int ret, got = 0;
> > > +	struct page *page = NULL;
> > > +	struct inode *inode = file_inode(file);
> > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > > +
> > > +	ret = vet_mds_for_fscrypt(file);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	/*
> > > +	 * Ensure we hold these caps so that we _know_ that the rstats check
> > > +	 * in the empty_dir check is reliable.
> > > +	 */
> > > +	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got, &page);
> > > +	if (ret)
> > > +		return ret;
> > > +	if (page)
> > > +		put_page(page);
> > > +	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
> > > +	if (got)
> > > +		ceph_put_cap_refs(ci, got);
> > > +	return ret;
> > > +}
> > > +
> > >  long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > >  {
> > > +	int ret;
> > > +	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
> > > +
> > >  	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
> > >  	switch (cmd) {
> > >  	case CEPH_IOC_GET_LAYOUT:
> > > @@ -289,6 +338,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > >  
> > >  	case CEPH_IOC_SYNCIO:
> > >  		return ceph_ioctl_syncio(file);
> > > +
> > > +	case FS_IOC_SET_ENCRYPTION_POLICY:
> > > +		return ceph_set_encryption_policy(file, arg);
> > > +
> > > +	case FS_IOC_GET_ENCRYPTION_POLICY:
> > > +		ret = vet_mds_for_fscrypt(file);
> > > +		if (ret)
> > > +			return ret;
> > > +		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
> > > +		ret = vet_mds_for_fscrypt(file);
> > > +		if (ret)
> > > +			return ret;
> > > +		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > > +		ret = vet_mds_for_fscrypt(file);
> > > +		if (ret)
> > > +			return ret;
> > > +		atomic_inc(&ci->i_shared_gen);
> > 
> > I've spent a few hours already looking at the bug I reported before, and I
> > can't really understand this code.  What does it mean to increment
> > ->i_shared_gen at this point?
> > 
> > The reason I'm asking is because it looks like the problem I'm seeing goes
> > away if I remove this code.  Here's what I'm doing/seeing:
> > 
> > # mount ...
> > # fscrypt unlock d
> > 
> >   -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
> >      incremented by the line above.
> > 
> > # cat d/f
> > 
> > >   -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
> > >      because ci->i_xattrs.version and info->xattr_version are both
> > >      set to 0.
> > 
> 
> Interesting. That sounds like it might be the bug right there. "d"
> should clearly have a fscrypt context in its xattrs at that point. If
> the MDS isn't passing that back, then that could be a problem.
> 
> I had a concern about that when I was developing this, and I *thought*
> Zheng had assured us that the MDS will always pass along the xattr blob
> in a trace. Maybe that's not correct?

Hmm, that's what I thought too.  I was hoping not to have to go look at the
MDS, but it seems like I'll have to :-)

> > cat: d/f: No such file or directory
> > 
> > I'm not sure anymore if the issue is on the client or on the MDS side.
> > > Before digging deeper, I wonder if this rings any bells. ;-)
> > 
> > 
> 
> No, this is not something I've seen before.
> 
> Dentries that live in a directory have a copy of the i_shared_gen of the
> directory when they are instantiated. Bumping that value on a directory
> should basically ensure that its child dentries end up invalidated,
> which is what we want once we add the key to the directory. Once we add
> a key, any old dentries in that directory are no longer valid.
> 
> That said, I could certainly have missed some subtlety here.

Great, thanks for clarifying.  This should help me investigate a little
bit more.
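
For reference, the invalidation scheme described above can be modelled in
userspace roughly like this. It is only a toy sketch of the idea (a child
remembers the parent's generation and is treated as stale once the parent
bumps it); the toy_* names are hypothetical and this is not the actual
fs/ceph code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy directory: shared_gen stands in for ci->i_shared_gen. */
struct toy_dir {
	atomic_int shared_gen;
};

/* Toy dentry: remembers the parent's generation at instantiation. */
struct toy_dentry {
	struct toy_dir *parent;
	int gen_seen;
};

/* Instantiate a dentry under dir, copying the current generation. */
static void toy_dentry_init(struct toy_dentry *d, struct toy_dir *dir)
{
	assert(d && dir);
	d->parent = dir;
	d->gen_seen = atomic_load(&dir->shared_gen);
}

/* What the key add/remove ioctls do in this model: bump the generation. */
static void toy_dir_bump_gen(struct toy_dir *dir)
{
	atomic_fetch_add(&dir->shared_gen, 1);
}

/* A dentry is only trusted while its copy matches the parent's value. */
static bool toy_dentry_valid(const struct toy_dentry *d)
{
	return d->gen_seen == atomic_load(&d->parent->shared_gen);
}
```

In this model a single bump invalidates every existing child at once, and
only dentries instantiated afterwards are considered valid again.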

[ And I'm also surprised you don't see this behaviour as it's very easy to
  reproduce. ]

Cheers,
--
Luís

> > 
> > > +		ceph_dir_clear_ordered(file_inode(file));
> > > +		ceph_dir_clear_complete(file_inode(file));
> > > +		return fscrypt_ioctl_add_key(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_REMOVE_ENCRYPTION_KEY:
> > > +		atomic_inc(&ci->i_shared_gen);
> > > +		ceph_dir_clear_ordered(file_inode(file));
> > > +		ceph_dir_clear_complete(file_inode(file));
> > > +		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
> > > +		atomic_inc(&ci->i_shared_gen);
> > > +		ceph_dir_clear_ordered(file_inode(file));
> > > +		ceph_dir_clear_complete(file_inode(file));
> > > +		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
> > > +		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
> > > +
> > > +	case FS_IOC_GET_ENCRYPTION_NONCE:
> > > +		ret = vet_mds_for_fscrypt(file);
> > > +		if (ret)
> > > +			return ret;
> > > +		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
> > >  	}
> > >  
> > >  	return -ENOTTY;
> > > -- 
> > > 2.30.2
> > > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
> 


* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-04-06 16:24       ` Luis Henriques
@ 2021-04-06 17:27         ` Jeff Layton
  2021-04-06 18:04           ` Luis Henriques
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2021-04-06 17:27 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Tue, 2021-04-06 at 17:24 +0100, Luis Henriques wrote:
> On Tue, Apr 06, 2021 at 12:03:27PM -0400, Jeff Layton wrote:
> > On Tue, 2021-04-06 at 16:38 +0100, Luis Henriques wrote:
> > > Hi Jeff!
> > > 
> > > On Fri, Mar 26, 2021 at 01:32:27PM -0400, Jeff Layton wrote:
> > > > We gate most of the ioctls on MDS feature support. The exception is the
> > > > key removal and status functions that we still want to work if the MDS's
> > > > were to (inexplicably) lose the feature.
> > > > 
> > > > For the set_policy ioctl, we take Fcx caps to ensure that nothing can
> > > > create files in the directory while the ioctl is running. That should
> > > > be enough to ensure that the "empty_dir" check is reliable.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/ioctl.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 94 insertions(+)
> > > > 
> > > > diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
> > > > index 6e061bf62ad4..34b85bcfcfc7 100644
> > > > --- a/fs/ceph/ioctl.c
> > > > +++ b/fs/ceph/ioctl.c
> > > > @@ -6,6 +6,7 @@
> > > >  #include "mds_client.h"
> > > >  #include "ioctl.h"
> > > >  #include <linux/ceph/striper.h>
> > > > +#include <linux/fscrypt.h>
> > > >  
> > > >  /*
> > > >   * ioctls
> > > > @@ -268,8 +269,56 @@ static long ceph_ioctl_syncio(struct file *file)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +static int vet_mds_for_fscrypt(struct file *file)
> > > > +{
> > > > +	int i, ret = -EOPNOTSUPP;
> > > > +	struct ceph_mds_client	*mdsc = ceph_sb_to_mdsc(file_inode(file)->i_sb);
> > > > +
> > > > +	mutex_lock(&mdsc->mutex);
> > > > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > > > +		struct ceph_mds_session *s = mdsc->sessions[i];
> > > > +
> > > > +		if (!s)
> > > > +			continue;
> > > > +		if (test_bit(CEPHFS_FEATURE_ALTERNATE_NAME, &s->s_features))
> > > > +			ret = 0;
> > > > +		break;
> > > > +	}
> > > > +	mutex_unlock(&mdsc->mutex);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static long ceph_set_encryption_policy(struct file *file, unsigned long arg)
> > > > +{
> > > > +	int ret, got = 0;
> > > > +	struct page *page = NULL;
> > > > +	struct inode *inode = file_inode(file);
> > > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > > > +
> > > > +	ret = vet_mds_for_fscrypt(file);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	/*
> > > > +	 * Ensure we hold these caps so that we _know_ that the rstats check
> > > > +	 * in the empty_dir check is reliable.
> > > > +	 */
> > > > +	ret = ceph_get_caps(file, CEPH_CAP_FILE_SHARED, 0, -1, &got, &page);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +	if (page)
> > > > +		put_page(page);
> > > > +	ret = fscrypt_ioctl_set_policy(file, (const void __user *)arg);
> > > > +	if (got)
> > > > +		ceph_put_cap_refs(ci, got);
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > > >  {
> > > > +	int ret;
> > > > +	struct ceph_inode_info *ci = ceph_inode(file_inode(file));
> > > > +
> > > >  	dout("ioctl file %p cmd %u arg %lu\n", file, cmd, arg);
> > > >  	switch (cmd) {
> > > >  	case CEPH_IOC_GET_LAYOUT:
> > > > @@ -289,6 +338,51 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > > >  
> > > >  	case CEPH_IOC_SYNCIO:
> > > >  		return ceph_ioctl_syncio(file);
> > > > +
> > > > +	case FS_IOC_SET_ENCRYPTION_POLICY:
> > > > +		return ceph_set_encryption_policy(file, arg);
> > > > +
> > > > +	case FS_IOC_GET_ENCRYPTION_POLICY:
> > > > +		ret = vet_mds_for_fscrypt(file);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +		return fscrypt_ioctl_get_policy(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
> > > > +		ret = vet_mds_for_fscrypt(file);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_ADD_ENCRYPTION_KEY:
> > > > +		ret = vet_mds_for_fscrypt(file);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +		atomic_inc(&ci->i_shared_gen);
> > > 
> > > I've spent a few hours already looking at the bug I reported before, and I
> > > can't really understand this code.  What does it mean to increment
> > > ->i_shared_gen at this point?
> > > 
> > > The reason I'm asking is because it looks like the problem I'm seeing goes
> > > away if I remove this code.  Here's what I'm doing/seeing:
> > > 
> > > # mount ...
> > > # fscrypt unlock d
> > > 
> > >   -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
> > >      incremented by the line above.
> > > 
> > > # cat d/f
> > > 
> > > >   -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
> > > >      because ci->i_xattrs.version and info->xattr_version are both
> > > >      set to 0.
> > > 
> > 
> > Interesting. That sounds like it might be the bug right there. "d"
> > should clearly have a fscrypt context in its xattrs at that point. If
> > the MDS isn't passing that back, then that could be a problem.
> > 
> > I had a concern about that when I was developing this, and I *thought*
> > Zheng had assured us that the MDS will always pass along the xattr blob
> > in a trace. Maybe that's not correct?
> 
> Hmm, that's what I thought too.  I was hoping not to have to go look at the
> MDS, but it seems like I'll have to :-)
> 

That'd be good, if possible.

> > > cat: d/f: No such file or directory
> > > 
> > > I'm not sure anymore if the issue is on the client or on the MDS side.
> > > > Before digging deeper, I wonder if this rings any bells. ;-)
> > > 
> > > 
> > 
> > No, this is not something I've seen before.
> > 
> > Dentries that live in a directory have a copy of the i_shared_gen of the
> > directory when they are instantiated. Bumping that value on a directory
> > should basically ensure that its child dentries end up invalidated,
> > which is what we want once we add the key to the directory. Once we add
> > a key, any old dentries in that directory are no longer valid.
> > 
> > That said, I could certainly have missed some subtlety here.
> 
> Great, thanks for clarifying.  This should help me investigate a little
> bit more.
> 
> [ And I'm also surprised you don't see this behaviour as it's very easy to
>   reproduce. ]
> 
> 

It is odd... fwiw, I ran this for 5 mins or so and never saw a problem:

    $ while [ $? -eq 0 ]; do sudo umount /mnt/crypt; sudo mount /mnt/crypt; fscrypt unlock --key=/home/jlayton/fscrypt-keyfile /mnt/crypt/d; cat /mnt/crypt/d/f; done

...do I need some other operations in between? Also, the cluster in this
case is Pacific. It's possible this is a result of changes since then if
you're on a vstart cluster or something.

$ sudo ./cephadm version
Using recent ceph image docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)


> > > 
> > > > +		ceph_dir_clear_ordered(file_inode(file));
> > > > +		ceph_dir_clear_complete(file_inode(file));
> > > > +		return fscrypt_ioctl_add_key(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_REMOVE_ENCRYPTION_KEY:
> > > > +		atomic_inc(&ci->i_shared_gen);
> > > > +		ceph_dir_clear_ordered(file_inode(file));
> > > > +		ceph_dir_clear_complete(file_inode(file));
> > > > +		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
> > > > +		atomic_inc(&ci->i_shared_gen);
> > > > +		ceph_dir_clear_ordered(file_inode(file));
> > > > +		ceph_dir_clear_complete(file_inode(file));
> > > > +		return fscrypt_ioctl_remove_key_all_users(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
> > > > +		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
> > > > +
> > > > +	case FS_IOC_GET_ENCRYPTION_NONCE:
> > > > +		ret = vet_mds_for_fscrypt(file);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +		return fscrypt_ioctl_get_nonce(file, (void __user *)arg);
> > > >  	}
> > > >  
> > > >  	return -ENOTTY;
> > > > -- 
> > > > 2.30.2
> > > > 
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-04-06 17:27         ` Jeff Layton
@ 2021-04-06 18:04           ` Luis Henriques
  2021-04-07 12:47             ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Luis Henriques @ 2021-04-06 18:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Tue, Apr 06, 2021 at 01:27:21PM -0400, Jeff Layton wrote:
<snip>
> > > > I've spent a few hours already looking at the bug I reported before, and I
> > > > can't really understand this code.  What does it mean to increment
> > > > ->i_shared_gen at this point?
> > > > 
> > > > The reason I'm asking is because it looks like the problem I'm seeing goes
> > > > away if I remove this code.  Here's what I'm doing/seeing:
> > > > 
> > > > # mount ...
> > > > # fscrypt unlock d
> > > > 
> > > >   -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
> > > >      incremented by the line above.
> > > > 
> > > > # cat d/f
> > > > 
> > > > >   -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
> > > > >      because ci->i_xattrs.version and info->xattr_version are both
> > > > >      set to 0.
> > > > 
> > > 
> > > Interesting. That sounds like it might be the bug right there. "d"
> > > should clearly have a fscrypt context in its xattrs at that point. If
> > > the MDS isn't passing that back, then that could be a problem.
> > > 
> > > I had a concern about that when I was developing this, and I *thought*
> > > Zheng had assured us that the MDS will always pass along the xattr blob
> > > in a trace. Maybe that's not correct?
> > 
> > Hmm, that's what I thought too.  I was hoping not to have to go look at the
> > MDS, but it seems like I'll have to :-)
> > 
> 
> That'd be good, if possible.
> 
> > > > cat: d/f: No such file or directory
> > > > 
> > > > I'm not sure anymore if the issue is on the client or on the MDS side.
> > > > > Before digging deeper, I wonder if this rings any bells. ;-)
> > > > 
> > > > 
> > > 
> > > No, this is not something I've seen before.
> > > 
> > > Dentries that live in a directory have a copy of the i_shared_gen of the
> > > directory when they are instantiated. Bumping that value on a directory
> > > should basically ensure that its child dentries end up invalidated,
> > > which is what we want once we add the key to the directory. Once we add
> > > a key, any old dentries in that directory are no longer valid.
> > > 
> > > That said, I could certainly have missed some subtlety here.
> > 
> > Great, thanks for clarifying.  This should help me investigate a little
> > bit more.
> > 
> > [ And I'm also surprised you don't see this behaviour as it's very easy to
> >   reproduce. ]
> > 
> > 
> 
> It is odd... fwiw, I ran this for 5 mins or so and never saw a problem:
> 
>     $ while [ $? -eq 0 ]; do sudo umount /mnt/crypt; sudo mount /mnt/crypt; fscrypt unlock --key=/home/jlayton/fscrypt-keyfile /mnt/crypt/d; cat /mnt/crypt/d/f; done
>

TBH I only do this operation once and it almost always fails.  The only
difference I see is that I don't really use a keyfile, but a passphrase
instead.  Not sure if it makes any difference.  Also, it may be worth
adding a delay before the 'cat' to make sure the dentry is pruned.

> ...do I need some other operations in between? Also, the cluster in this
> case is Pacific. It's possible this is a result of changes since then if
> you're on a vstart cluster or something.
> 
> $ sudo ./cephadm version
> Using recent ceph image docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
> ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)

I've re-compiled the cluster after hard-resetting it to commit
6a19e303187c which you mentioned in a previous email in this thread.  But
the result was the same.

Anyway, using a vstart cluster is also a huge difference I guess.  I'll
keep debugging.  Thanks!

Cheers,
--
Luís


* Re: [RFC PATCH v5 19/19] ceph: add fscrypt ioctls
  2021-04-06 18:04           ` Luis Henriques
@ 2021-04-07 12:47             ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-04-07 12:47 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Tue, 2021-04-06 at 19:04 +0100, Luis Henriques wrote:
> On Tue, Apr 06, 2021 at 01:27:21PM -0400, Jeff Layton wrote:
> <snip>
> > > > > I've spent a few hours already looking at the bug I reported before, and I
> > > > > can't really understand this code.  What does it mean to increment
> > > > > ->i_shared_gen at this point?
> > > > > 
> > > > > The reason I'm asking is because it looks like the problem I'm seeing goes
> > > > > away if I remove this code.  Here's what I'm doing/seeing:
> > > > > 
> > > > > # mount ...
> > > > > # fscrypt unlock d
> > > > > 
> > > > >   -> 'd' dentry is eventually pruned at this point *if* ->i_shared_gen was
> > > > >      incremented by the line above.
> > > > > 
> > > > > # cat d/f
> > > > > 
> > > > > >   -> when ceph_fill_inode() is executed, 'd' is *not* set as encrypted
> > > > > >      because ci->i_xattrs.version and info->xattr_version are both
> > > > > >      set to 0.
> > > > > 
> > > > 
> > > > Interesting. That sounds like it might be the bug right there. "d"
> > > > should clearly have a fscrypt context in its xattrs at that point. If
> > > > the MDS isn't passing that back, then that could be a problem.
> > > > 
> > > > I had a concern about that when I was developing this, and I *thought*
> > > > Zheng had assured us that the MDS will always pass along the xattr blob
> > > > in a trace. Maybe that's not correct?
> > > 
> > > Hmm, that's what I thought too.  I was hoping not to have to go look at the
> > > MDS, but it seems like I'll have to :-)
> > > 
> > 
> > That'd be good, if possible.
> > 
> > > > > cat: d/f: No such file or directory
> > > > > 
> > > > > I'm not sure anymore if the issue is on the client or on the MDS side.
> > > > > > Before digging deeper, I wonder if this rings any bells. ;-)
> > > > > 
> > > > > 
> > > > 
> > > > No, this is not something I've seen before.
> > > > 
> > > > Dentries that live in a directory have a copy of the i_shared_gen of the
> > > > directory when they are instantiated. Bumping that value on a directory
> > > > should basically ensure that its child dentries end up invalidated,
> > > > which is what we want once we add the key to the directory. Once we add
> > > > a key, any old dentries in that directory are no longer valid.
> > > > 
> > > > That said, I could certainly have missed some subtlety here.
> > > 
> > > Great, thanks for clarifying.  This should help me investigate a little
> > > bit more.
> > > 
> > > [ And I'm also surprised you don't see this behaviour as it's very easy to
> > >   reproduce. ]
> > > 
> > > 
> > 
> > It is odd... fwiw, I ran this for 5 mins or so and never saw a problem:
> > 
> >     $ while [ $? -eq 0 ]; do sudo umount /mnt/crypt; sudo mount /mnt/crypt; fscrypt unlock --key=/home/jlayton/fscrypt-keyfile /mnt/crypt/d; cat /mnt/crypt/d/f; done
> > 
> 
> TBH I only do this operation once and it almost always fails.  The only
> difference I see is that I don't really use a keyfile, but a passphrase
> instead.  Not sure if it makes any difference.  Also, it may be worth
> adding a delay before the 'cat' to make sure the dentry is pruned.
> 

No joy. I tried different delays between 1-5s and it didn't change
anything.

> > ...do I need some other operations in between? Also, the cluster in this
> > case is Pacific. It's possible this is a result of changes since then if
> > you're on a vstart cluster or something.
> > 
> > $ sudo ./cephadm version
> > Using recent ceph image docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
> > ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
> 
> I've re-compiled the cluster after hard-resetting it to commit
> 6a19e303187c which you mentioned in a previous email in this thread.  But
> the result was the same.
> 
> Anyway, using a vstart cluster is also a huge difference I guess.  I'll
> keep debugging.  Thanks!
> 

I may try to set one up today to see if I can reproduce it. Thanks for
the testing help so far!

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  2021-03-26 17:32 ` [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
@ 2021-04-08  1:06   ` Eric Biggers
  2021-04-08 16:22     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Biggers @ 2021-04-08  1:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Fri, Mar 26, 2021 at 01:32:10PM -0400, Jeff Layton wrote:
> Ceph will need to base64-encode some encrypted filenames, so make
> these routines, and FSCRYPT_BASE64_CHARS available to modules.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

It would be helpful to have a quick explanation here about *why* ceph has to do
base64 encoding/decoding itself.

- Eric


* Re: [RFC PATCH v5 01/19] vfs: export new_inode_pseudo
  2021-03-26 17:32 ` [RFC PATCH v5 01/19] vfs: export new_inode_pseudo Jeff Layton
@ 2021-04-08  1:08   ` Eric Biggers
  2021-04-08 16:18     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Biggers @ 2021-04-08  1:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, Al Viro

On Fri, Mar 26, 2021 at 01:32:09PM -0400, Jeff Layton wrote:
> Ceph needs to be able to allocate inodes ahead of a create that might
> involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
> but it puts the inode on the sb->s_inodes list and when we go to hash
> it, that might be done again.
> 
> We could work around that by setting I_CREATING on the new inode, but
> that causes ilookup5 to return -ESTALE if something tries to find it
> before I_NEW is cleared. To work around all of this, just use
> new_inode_pseudo which doesn't add it to the list.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

IIRC, this looked like a bug in ilookup5().  Did you come to the conclusion that
it's not actually a bug?

- Eric


* Re: [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size
  2021-03-26 17:32 ` [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
@ 2021-04-08  1:19   ` Eric Biggers
  0 siblings, 0 replies; 39+ messages in thread
From: Eric Biggers @ 2021-04-08  1:19 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Fri, Mar 26, 2021 at 01:32:11PM -0400, Jeff Layton wrote:
> For ceph, we want to use our own scheme for handling filenames that are
> longer than NAME_MAX after encryption and base64 encoding. This
> allows us to have a consistent view of the encrypted filenames for
> clients that don't support fscrypt and clients that do but that don't
> have the key.
> 
> Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
> __fscrypt_fname_encrypted_size and add a new wrapper called
> fscrypt_fname_encrypted_size that takes an inode argument rather than
> a pointer to a fscrypt_policy union.

This explanation seems to be missing a logical connection between the first and
second paragraphs.  I think it's missing something along the lines of:
"Currently, fs/crypto/ only supports filename encryption using
fscrypt_setup_filename(), which also handles decoding no-key names.  Ceph can't
use that because it needs to handle no-key names in a different way.  So, we
need to export the functions needed to encrypt filenames separately."

(I might have gotten the explanation a bit wrong... Point is, it's the type of
thing that seems to be missing here.)

- Eric


* Re: [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode
  2021-03-26 17:32 ` [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
@ 2021-04-08  1:21   ` Eric Biggers
  2021-04-08 16:27     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Eric Biggers @ 2021-04-08  1:21 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Fri, Mar 26, 2021 at 01:32:12PM -0400, Jeff Layton wrote:
> CephFS will need to be able to generate a context for a new "prepared"
> inode. Add a new routine for getting the context out of an in-core
> inode.

It would be helpful to briefly mention why fscrypt_set_context() can't be used
instead (like the other filesystems do).

- Eric


* Re: [RFC PATCH v5 01/19] vfs: export new_inode_pseudo
  2021-04-08  1:08   ` Eric Biggers
@ 2021-04-08 16:18     ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-04-08 16:18 UTC (permalink / raw)
  To: Eric Biggers; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel, Al Viro

On Wed, 2021-04-07 at 18:08 -0700, Eric Biggers wrote:
> On Fri, Mar 26, 2021 at 01:32:09PM -0400, Jeff Layton wrote:
> > Ceph needs to be able to allocate inodes ahead of a create that might
> > involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
> > but it puts the inode on the sb->s_inodes list and when we go to hash
> > it, that might be done again.
> > 
> > We could work around that by setting I_CREATING on the new inode, but
> > that causes ilookup5 to return -ESTALE if something tries to find it
> > before I_NEW is cleared. To work around all of this, just use
> > new_inode_pseudo which doesn't add it to the list.
> > 
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> IIRC, this looked like a bug in ilookup5().  Did you come to the conclusion that
> it's not actually a bug?
> 

Yes. Al pointed out that it's desirable behavior for most (simpler)
filesystems.

Basically, nothing should have presented the filehandle for this inode
to a client until after I_NEW has been cleared. So, any attempt to look
it up should give you back ESTALE at this point.

I'm not married to this approach however. If there's a better way to do
this, then I'm happy to consider it.
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode
  2021-04-08  1:06   ` Eric Biggers
@ 2021-04-08 16:22     ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-04-08 16:22 UTC (permalink / raw)
  To: Eric Biggers; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Wed, 2021-04-07 at 18:06 -0700, Eric Biggers wrote:
> On Fri, Mar 26, 2021 at 01:32:10PM -0400, Jeff Layton wrote:
> > Ceph will need to base64-encode some encrypted filenames, so make
> > these routines, and FSCRYPT_BASE64_CHARS available to modules.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> It would be helpful to have a quick explanation here about *why* ceph has to do
> base64 encoding/decoding itself.
> 

Sure. I'll plan to flesh out the changelogs a bit more before the next
posting.

The basic problem is that we want to use printable filenames for storage
on the MDS, but we don't want to tie the format we use to the fscrypt
nokey name format.

So, we have our own nokey name format that we use that's quite similar
to the one in fscrypt. So similar in fact, that we want to use the same
base64 encoding scheme that fscrypt uses for this -- hence the need to
make these available to modules. 
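
To make the "printable filenames" point concrete, here is a small
userspace sketch of an unpadded base64 variant whose alphabet avoids '/',
so the result can be stored as a single path component. The alphabet, the
bit order and the toy_* names are assumptions for illustration; this is
not claimed to be bit-compatible with what fscrypt_base64_encode() emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed alphabet: standard base64 with ',' in place of '/'. */
static const char toy_b64_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Characters needed to encode nbytes without padding: ceil(4n / 3). */
#define TOY_BASE64_CHARS(nbytes) (((nbytes) * 4 + 2) / 3)

/*
 * Encode len bytes from src into dst, which must have room for
 * TOY_BASE64_CHARS(len) + 1 bytes. Returns the encoded length.
 */
static size_t toy_base64_encode(const unsigned char *src, size_t len, char *dst)
{
	size_t i, out = 0;
	unsigned int acc = 0;
	int bits = 0;

	assert(dst != NULL);
	for (i = 0; i < len; i++) {
		acc = (acc << 8) | src[i];
		bits += 8;
		while (bits >= 6) {
			bits -= 6;
			dst[out++] = toy_b64_table[(acc >> bits) & 0x3f];
		}
		acc &= (1u << bits) - 1;	/* keep only the leftover bits */
	}
	if (bits > 0)			/* flush the final partial group */
		dst[out++] = toy_b64_table[(acc << (6 - bits)) & 0x3f];
	dst[out] = '\0';
	return out;
}

/* True if the encoded name is usable as a single path component. */
static int toy_name_is_path_safe(const char *name)
{
	return name[0] != '\0' && strchr(name, '/') == NULL;
}
```

Since the output never contains '/' or padding, any ciphertext bytes map to
a name the MDS can store as-is, at the cost of growing it by about a third.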
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode
  2021-04-08  1:21   ` Eric Biggers
@ 2021-04-08 16:27     ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2021-04-08 16:27 UTC (permalink / raw)
  To: Eric Biggers; +Cc: ceph-devel, linux-fscrypt, linux-fsdevel

On Wed, 2021-04-07 at 18:21 -0700, Eric Biggers wrote:
> On Fri, Mar 26, 2021 at 01:32:12PM -0400, Jeff Layton wrote:
> > CephFS will need to be able to generate a context for a new "prepared"
> > inode. Add a new routine for getting the context out of an in-core
> > inode.
> 
> It would be helpful to briefly mention why fscrypt_set_context() can't be used
> instead (like the other filesystems do).
> 

I'll add this to the changelog as well before the next posting, but
basically, when we send a create request to the MDS, we send along a
full set of attributes, including an xattr blob that includes the
encryption.ctx xattr.

If we used fscrypt_set_context then we'd have to make a separate round
trip to set the xattr on the server for every create. We'd also have a
window of time where the inode exists on the MDS but has no encryption
context attached, which could cause race conditions with other clients.
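
As a rough illustration of that window, here is a toy userspace model
(all toy_* names and the CTX_MAX limit are hypothetical, and nothing here
is the real MDS protocol). A create that carries the context leaves no
observable gap, while create followed by a separate setxattr does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CTX_MAX 40	/* arbitrary cap on context size for this model */

/* Toy stand-in for an inode as stored on the MDS. */
struct toy_mds_inode {
	bool exists;
	size_t ctx_len;		/* 0 means no encryption context attached */
	unsigned char ctx[CTX_MAX];
};

/* Toy create request: may carry the context in its xattr payload. */
struct toy_create_req {
	const unsigned char *ctx;	/* encryption context, or NULL */
	size_t ctx_len;
};

/* One round trip: create the inode and apply any context with it. */
static void toy_mds_create(struct toy_mds_inode *in,
			   const struct toy_create_req *req)
{
	assert(req->ctx_len <= CTX_MAX);
	in->exists = true;
	in->ctx_len = 0;
	if (req->ctx && req->ctx_len) {
		memcpy(in->ctx, req->ctx, req->ctx_len);
		in->ctx_len = req->ctx_len;
	}
}

/* A second round trip: attach the context after the fact. */
static void toy_mds_setxattr(struct toy_mds_inode *in,
			     const unsigned char *ctx, size_t len)
{
	assert(in->exists && len <= CTX_MAX);
	memcpy(in->ctx, ctx, len);
	in->ctx_len = len;
}

/* What another client could observe between the two round trips. */
static bool toy_visible_without_ctx(const struct toy_mds_inode *in)
{
	return in->exists && in->ctx_len == 0;
}
```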
-- 
Jeff Layton <jlayton@kernel.org>



end of thread, other threads:[~2021-04-08 16:27 UTC | newest]

Thread overview: 39+ messages
2021-03-26 17:32 [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 01/19] vfs: export new_inode_pseudo Jeff Layton
2021-04-08  1:08   ` Eric Biggers
2021-04-08 16:18     ` Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 02/19] fscrypt: export fscrypt_base64_encode and fscrypt_base64_decode Jeff Layton
2021-04-08  1:06   ` Eric Biggers
2021-04-08 16:22     ` Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 03/19] fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size Jeff Layton
2021-04-08  1:19   ` Eric Biggers
2021-03-26 17:32 ` [RFC PATCH v5 04/19] fscrypt: add fscrypt_context_for_new_inode Jeff Layton
2021-04-08  1:21   ` Eric Biggers
2021-04-08 16:27     ` Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 05/19] ceph: crypto context handling for ceph Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 06/19] ceph: implement -o test_dummy_encryption mount option Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 07/19] ceph: preallocate inode for ops that may create one Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 08/19] ceph: add routine to create fscrypt context prior to RPC Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 09/19] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 10/19] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 11/19] ceph: decode alternate_name in lease info Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 12/19] ceph: send altname in MClientRequest Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 13/19] ceph: properly set DCACHE_NOKEY_NAME flag in lookup Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 14/19] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 15/19] ceph: add helpers for converting names for userland presentation Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 16/19] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 17/19] ceph: add support to readdir for encrypted filenames Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 18/19] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
2021-03-26 17:32 ` [RFC PATCH v5 19/19] ceph: add fscrypt ioctls Jeff Layton
2021-04-06 15:38   ` Luis Henriques
2021-04-06 16:03     ` Jeff Layton
2021-04-06 16:24       ` Luis Henriques
2021-04-06 17:27         ` Jeff Layton
2021-04-06 18:04           ` Luis Henriques
2021-04-07 12:47             ` Jeff Layton
2021-03-26 18:38 ` [RFC PATCH v5 00/19] ceph+fscrypt: context, filename and symlink support Jeff Layton
2021-03-31 20:35 ` [RFC PATCH v5 20/19] ceph: make ceph_get_name decrypt filenames Jeff Layton
2021-04-01 11:14   ` Luis Henriques
2021-04-01 12:15     ` Jeff Layton
2021-04-01 13:05       ` Luis Henriques
2021-04-01 13:12         ` Jeff Layton
