All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v14 00/12] FUSE passthrough for file io
@ 2023-10-16 16:08 Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 01/12] fs: prepare for stackable filesystems backing file helpers Amir Goldstein
                   ` (12 more replies)
  0 siblings, 13 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Miklos,

I've shared several POC branches since the posting of v13 back in May
and played with several API choices. It is time to post v14.

The API we converged to is server managed shared backing files that are
referenced by backing id plus per-file re-opened backing_file.

This model looks coherent to me. I think that the example server [3]
demonstrates that this API is simple enough to work with.

There is quite a bit of re-factored code in this version - I've actually
declared this common code as a new vfs subsystem [stackable filesystems]
in MAINTAINERS per Christian's request.

The re-factored common code is based on overlayfs-next and Christian's
vfs.misc branch (for the backing_file changes).

I am not posting performance numbers again. Alessio has already posted
performance numbers back in v12 and nothing has changed in this regard.
We are using a variant of v12 patches in production and the performance
improvement is very noticable.

Bernd and Nikolaus have helped with improving running fstests on fuse
passthrough examples.

I have ran the -g auto fstests with v14 patches with the example server.
Compared to the baseline test results with passthrough_hp, the backing
file passthrough passes several more test, mainly tests related to data
coherecy, such as generic/451.

The following tests are the only ones that pass on baseline passthtough_hp
and fail with my backing file passthrough example:

  generic/080 generic/120 generic/193 generic/215 generic/355

Those tests are failing because of missing mtime/atime/ctime updates
in some use cases and failure to strip suid/sgid bits in some cases.

The model of who is responsible for updating timestamps and stripping
suid/sgid bits is not always clear when it comes to backing file
passthrough. I tried to invalidate attr caches similar to how fuse
read/write behaves, but it does not cover all cases correctly.

Let me know what you think of this version and if you think there is
anything else that we need to take care of before upstreaming.

Thanks,
Amir.

Changes from v13 [1]:
- rebase on 6.6-rc6 (and overlayfs and vfs next branches)
- server managed shared backing files without auto-close mode
- open a backing_file per fuse_file with fuse file's path and flags
- factor out common read/write/splice/mmap helpers from overlayfs
- factor out ioctl helpers

[1] https://lore.kernel.org/r/20230519125705.598234-1-amir73il@gmail.com/
[2] https://github.com/amir73il/linux/commits/fuse-backing-fd-v14
[3] https://github.com/amir73il/libfuse/commits/fuse-backing-fd

Amir Goldstein (12):
  fs: prepare for stackable filesystems backing file helpers
  fs: factor out backing_file_{read,write}_iter() helpers
  fs: factor out backing_file_splice_{read,write}() helpers
  fs: factor out backing_file_mmap() helper
  fuse: factor out helper for FUSE_DEV_IOC_CLONE
  fuse: introduce FUSE_PASSTHROUGH capability
  fuse: pass optional backing_id in struct fuse_open_out
  fuse: implement ioctls to manage backing files
  fuse: implement read/write passthrough
  fuse: implement splice_{read/write} passthrough
  fuse: implement passthrough for mmap
  fuse: implement passthrough for readdir

 MAINTAINERS                  |   9 +
 fs/Kconfig                   |   4 +
 fs/Makefile                  |   1 +
 fs/backing-file.c            | 319 ++++++++++++++++++++++++++++
 fs/fuse/Kconfig              |  11 +
 fs/fuse/Makefile             |   1 +
 fs/fuse/cuse.c               |   3 +-
 fs/fuse/dev.c                |  98 ++++++---
 fs/fuse/dir.c                |   2 +-
 fs/fuse/file.c               |  69 ++++--
 fs/fuse/fuse_i.h             |  72 ++++++-
 fs/fuse/inode.c              |  25 +++
 fs/fuse/ioctl.c              |   3 +-
 fs/fuse/passthrough.c        | 392 +++++++++++++++++++++++++++++++++++
 fs/fuse/readdir.c            |  12 +-
 fs/open.c                    |  38 ----
 fs/overlayfs/Kconfig         |   1 +
 fs/overlayfs/file.c          | 246 ++++------------------
 fs/overlayfs/overlayfs.h     |   8 +-
 fs/overlayfs/super.c         |  11 +-
 include/linux/backing-file.h |  42 ++++
 include/linux/fs.h           |   3 -
 include/uapi/linux/fuse.h    |  23 +-
 23 files changed, 1085 insertions(+), 308 deletions(-)
 create mode 100644 fs/backing-file.c
 create mode 100644 fs/fuse/passthrough.c
 create mode 100644 include/linux/backing-file.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v14 01/12] fs: prepare for stackable filesystems backing file helpers
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 02/12] fs: factor out backing_file_{read,write}_iter() helpers Amir Goldstein
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

In preparation for factoring out some backing file io helpers from
overlayfs, move backing_file_open() into a new file fs/backing-file.c
and header.

Add a MAINTAINERS entry for stackable filesystems and add a Kconfig
FS_STACK which stackable filesystems need to select.

For now, the backing_file struct, the backing_file alloc/free functions
and the backing_file_real_path() accessor remain internal to file_table.c.
We may change that in the future.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 MAINTAINERS                  |  9 +++++++
 fs/Kconfig                   |  4 +++
 fs/Makefile                  |  1 +
 fs/backing-file.c            | 48 ++++++++++++++++++++++++++++++++++++
 fs/open.c                    | 38 ----------------------------
 fs/overlayfs/Kconfig         |  1 +
 fs/overlayfs/file.c          |  1 +
 include/linux/backing-file.h | 17 +++++++++++++
 include/linux/fs.h           |  3 ---
 9 files changed, 81 insertions(+), 41 deletions(-)
 create mode 100644 fs/backing-file.c
 create mode 100644 include/linux/backing-file.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a7bd8bd80e9..2e3e9a6c1604 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8050,6 +8050,15 @@ F:	include/linux/fs_types.h
 F:	include/uapi/linux/fs.h
 F:	include/uapi/linux/openat2.h
 
+FILESYSTEMS [STACKABLE]
+M:	Miklos Szeredi <miklos@szeredi.hu>
+M:	Amir Goldstein <amir73il@gmail.com>
+L:	linux-fsdevel@vger.kernel.org
+L:	linux-unionfs@vger.kernel.org
+S:	Maintained
+F:	fs/backing-file.c
+F:	include/linux/backing-file.h
+
 FINTEK F75375S HARDWARE MONITOR AND FAN CONTROLLER DRIVER
 M:	Riku Voipio <riku.voipio@iki.fi>
 L:	linux-hwmon@vger.kernel.org
diff --git a/fs/Kconfig b/fs/Kconfig
index aa7e03cc1941..2af673d7390e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -18,6 +18,10 @@ config VALIDATE_FS_PARSER
 config FS_IOMAP
 	bool
 
+# Stackable filesystems
+config FS_STACK
+	bool
+
 config BUFFER_HEAD
 	bool
 
diff --git a/fs/Makefile b/fs/Makefile
index f9541f40be4e..6fffddc4afde 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF)	+= compat_binfmt_elf.o
 obj-$(CONFIG_BINFMT_ELF_FDPIC)	+= binfmt_elf_fdpic.o
 obj-$(CONFIG_BINFMT_FLAT)	+= binfmt_flat.o
 
+obj-$(CONFIG_FS_STACK)		+= backing-file.o
 obj-$(CONFIG_FS_MBCACHE)	+= mbcache.o
 obj-$(CONFIG_FS_POSIX_ACL)	+= posix_acl.o
 obj-$(CONFIG_NFS_COMMON)	+= nfs_common/
diff --git a/fs/backing-file.c b/fs/backing-file.c
new file mode 100644
index 000000000000..04b33036f709
--- /dev/null
+++ b/fs/backing-file.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Common helpers for stackable filesystems and backing files.
+ *
+ * Copyright (C) 2023 CTERA Networks.
+ */
+
+#include <linux/fs.h>
+#include <linux/backing-file.h>
+
+#include "internal.h"
+
+/**
+ * backing_file_open - open a backing file for kernel internal use
+ * @user_path:	path that the user reuqested to open
+ * @flags:	open flags
+ * @real_path:	path of the backing file
+ * @cred:	credentials for open
+ *
+ * Open a backing file for a stackable filesystem (e.g., overlayfs).
+ * @user_path may be on the stackable filesystem and @real_path on the
+ * underlying filesystem.  In this case, we want to be able to return the
+ * @user_path of the stackable filesystem. This is done by embedding the
+ * returned file into a container structure that also stores the stacked
+ * file's path, which can be retrieved using backing_file_user_path().
+ */
+struct file *backing_file_open(const struct path *user_path, int flags,
+			       const struct path *real_path,
+			       const struct cred *cred)
+{
+	struct file *f;
+	int error;
+
+	f = alloc_empty_backing_file(flags, cred);
+	if (IS_ERR(f))
+		return f;
+
+	path_get(user_path);
+	*backing_file_user_path(f) = *user_path;
+	error = vfs_open(real_path, f);
+	if (error) {
+		fput(f);
+		f = ERR_PTR(error);
+	}
+
+	return f;
+}
+EXPORT_SYMBOL_GPL(backing_file_open);
diff --git a/fs/open.c b/fs/open.c
index 02dc608d40d8..b90142c51797 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1180,44 +1180,6 @@ struct file *kernel_file_open(const struct path *path, int flags,
 }
 EXPORT_SYMBOL_GPL(kernel_file_open);
 
-/**
- * backing_file_open - open a backing file for kernel internal use
- * @user_path:	path that the user reuqested to open
- * @flags:	open flags
- * @real_path:	path of the backing file
- * @cred:	credentials for open
- *
- * Open a backing file for a stackable filesystem (e.g., overlayfs).
- * @user_path may be on the stackable filesystem and @real_path on the
- * underlying filesystem.  In this case, we want to be able to return the
- * @user_path of the stackable filesystem. This is done by embedding the
- * returned file into a container structure that also stores the stacked
- * file's path, which can be retrieved using backing_file_user_path().
- */
-struct file *backing_file_open(const struct path *user_path, int flags,
-			       const struct path *real_path,
-			       const struct cred *cred)
-{
-	struct file *f;
-	int error;
-
-	f = alloc_empty_backing_file(flags, cred);
-	if (IS_ERR(f))
-		return f;
-
-	path_get(user_path);
-	*backing_file_user_path(f) = *user_path;
-	f->f_path = *real_path;
-	error = do_dentry_open(f, d_inode(real_path->dentry), NULL);
-	if (error) {
-		fput(f);
-		f = ERR_PTR(error);
-	}
-
-	return f;
-}
-EXPORT_SYMBOL_GPL(backing_file_open);
-
 #define WILL_CREATE(flags)	(flags & (O_CREAT | __O_TMPFILE))
 #define O_PATH_FLAGS		(O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
 
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index fec5020c3495..2ac67e04a6fb 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config OVERLAY_FS
 	tristate "Overlay filesystem support"
+	select FS_STACK
 	select EXPORTFS
 	help
 	  An overlay filesystem combines two filesystems - an 'upper' filesystem
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index acdd79dd4bfa..19d4d4768fc7 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -13,6 +13,7 @@
 #include <linux/security.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/backing-file.h>
 #include "overlayfs.h"
 
 #include "../internal.h"	/* for sb_init_dio_done_wq */
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
new file mode 100644
index 000000000000..55c9e804f780
--- /dev/null
+++ b/include/linux/backing-file.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Common helpers for stackable filesystems and backing files.
+ *
+ * Copyright (C) 2023 CTERA Networks.
+ */
+
+#ifndef _LINUX_BACKING_FILE_H
+#define _LINUX_BACKING_FILE_H
+
+#include <linux/file.h>
+
+struct file *backing_file_open(const struct path *user_path, int flags,
+			       const struct path *real_path,
+			       const struct cred *cred);
+
+#endif /* _LINUX_BACKING_FILE_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 59b2d2ee2465..9b262516ca93 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2456,9 +2456,6 @@ struct file *dentry_open(const struct path *path, int flags,
 			 const struct cred *creds);
 struct file *dentry_create(const struct path *path, int flags, umode_t mode,
 			   const struct cred *cred);
-struct file *backing_file_open(const struct path *user_path, int flags,
-			       const struct path *real_path,
-			       const struct cred *cred);
 struct path *backing_file_user_path(struct file *f);
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 02/12] fs: factor out backing_file_{read,write}_iter() helpers
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 01/12] fs: prepare for stackable filesystems backing file helpers Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 03/12] fs: factor out backing_file_splice_{read,write}() helpers Amir Goldstein
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Overlayfs submits files io to backing files on other filesystems.
Factor out some common helpers to perform io to backing files, into
fs/backing-file.c.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Link: https://lore.kernel.org/r/CAJfpeguhmZbjP3JLqtUy0AdWaHOkAPWeP827BBWwRFEAUgnUcQ@mail.gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/backing-file.c            | 203 +++++++++++++++++++++++++++++++++++
 fs/overlayfs/file.c          | 187 +++-----------------------------
 fs/overlayfs/overlayfs.h     |   8 +-
 fs/overlayfs/super.c         |  11 +-
 include/linux/backing-file.h |  15 +++
 5 files changed, 240 insertions(+), 184 deletions(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index 04b33036f709..2969ea6295ce 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -2,6 +2,9 @@
 /*
  * Common helpers for stackable filesystems and backing files.
  *
+ * Forked from fs/overlayfs/file.c.
+ *
+ * Copyright (C) 2017 Red Hat, Inc.
  * Copyright (C) 2023 CTERA Networks.
  */
 
@@ -46,3 +49,203 @@ struct file *backing_file_open(const struct path *user_path, int flags,
 	return f;
 }
 EXPORT_SYMBOL_GPL(backing_file_open);
+
+struct backing_aio {
+	struct kiocb iocb;
+	refcount_t ref;
+	struct kiocb *orig_iocb;
+	/* used for aio completion */
+	void (*end_write)(struct file *);
+	struct work_struct work;
+	long res;
+};
+
+static struct kmem_cache *backing_aio_cachep;
+
+#define BACKING_IOCB_MASK \
+	(IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
+
+static rwf_t iocb_to_rw_flags(int flags)
+{
+	return (__force rwf_t)(flags & BACKING_IOCB_MASK);
+}
+
+static void backing_aio_put(struct backing_aio *aio)
+{
+	if (refcount_dec_and_test(&aio->ref)) {
+		fput(aio->iocb.ki_filp);
+		kmem_cache_free(backing_aio_cachep, aio);
+	}
+}
+
+static void backing_aio_cleanup(struct backing_aio *aio, long res)
+{
+	struct kiocb *iocb = &aio->iocb;
+	struct kiocb *orig_iocb = aio->orig_iocb;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
+		kiocb_end_write(iocb);
+		if (aio->end_write)
+			aio->end_write(orig_iocb->ki_filp);
+	}
+
+	orig_iocb->ki_pos = iocb->ki_pos;
+	backing_aio_put(aio);
+}
+
+static void backing_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb);
+	struct kiocb *orig_iocb = aio->orig_iocb;
+
+	backing_aio_cleanup(aio, res);
+	orig_iocb->ki_complete(orig_iocb, res);
+}
+
+static void backing_aio_complete_work(struct work_struct *work)
+{
+	struct backing_aio *aio = container_of(work, struct backing_aio, work);
+
+	backing_aio_rw_complete(&aio->iocb, aio->res);
+}
+
+static void backing_aio_queue_completion(struct kiocb *iocb, long res)
+{
+	struct backing_aio *aio = container_of(iocb, struct backing_aio, iocb);
+
+	/*
+	 * Punt to a work queue to serialize updates of mtime/size.
+	 */
+	aio->res = res;
+	INIT_WORK(&aio->work, backing_aio_complete_work);
+	queue_work(file_inode(aio->orig_iocb->ki_filp)->i_sb->s_dio_done_wq,
+		   &aio->work);
+}
+
+static int backing_aio_init_wq(struct kiocb *iocb)
+{
+	struct super_block *sb = file_inode(iocb->ki_filp)->i_sb;
+
+	if (sb->s_dio_done_wq)
+		return 0;
+
+	return sb_init_dio_done_wq(sb);
+}
+
+
+ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
+			       struct kiocb *iocb, int flags,
+			       struct backing_file_ctx *ctx)
+{
+	struct backing_aio *aio = NULL;
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT &&
+	    !(file->f_mode & FMODE_CAN_ODIRECT))
+		return -EINVAL;
+
+	old_cred = override_creds(ctx->cred);
+	if (is_sync_kiocb(iocb)) {
+		rwf_t rwf = iocb_to_rw_flags(flags);
+
+		ret = vfs_iter_read(file, iter, &iocb->ki_pos, rwf);
+	} else {
+		ret = -ENOMEM;
+		aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL);
+		if (!aio)
+			goto out;
+
+		aio->orig_iocb = iocb;
+		kiocb_clone(&aio->iocb, iocb, get_file(file));
+		aio->iocb.ki_complete = backing_aio_rw_complete;
+		refcount_set(&aio->ref, 2);
+		ret = vfs_iocb_iter_read(file, &aio->iocb, iter);
+		backing_aio_put(aio);
+		if (ret != -EIOCBQUEUED)
+			backing_aio_cleanup(aio, ret);
+	}
+out:
+	revert_creds(old_cred);
+
+	if (ctx->accessed)
+		ctx->accessed(ctx->user_file);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_read_iter);
+
+ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
+				struct kiocb *iocb, int flags,
+				struct backing_file_ctx *ctx)
+{
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	if (iocb->ki_flags & IOCB_DIRECT &&
+	    !(file->f_mode & FMODE_CAN_ODIRECT))
+		return -EINVAL;
+
+	/*
+	 * Stacked filesystems don't support deferred completions, don't copy
+	 * this property in case it is set by the issuer.
+	 */
+	flags &= ~IOCB_DIO_CALLER_COMP;
+
+	old_cred = override_creds(ctx->cred);
+	if (is_sync_kiocb(iocb)) {
+		rwf_t rwf = iocb_to_rw_flags(flags);
+
+		file_start_write(file);
+		ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf);
+		file_end_write(file);
+		if (ctx->end_write)
+			ctx->end_write(ctx->user_file);
+	} else {
+		struct backing_aio *aio;
+
+		ret = backing_aio_init_wq(iocb);
+		if (ret)
+			goto out;
+
+		ret = -ENOMEM;
+		aio = kmem_cache_zalloc(backing_aio_cachep, GFP_KERNEL);
+		if (!aio)
+			goto out;
+
+		aio->orig_iocb = iocb;
+		aio->end_write = ctx->end_write;
+		kiocb_clone(&aio->iocb, iocb, get_file(file));
+		aio->iocb.ki_flags = flags;
+		aio->iocb.ki_complete = backing_aio_queue_completion;
+		refcount_set(&aio->ref, 2);
+		kiocb_start_write(&aio->iocb);
+		ret = vfs_iocb_iter_write(file, &aio->iocb, iter);
+		backing_aio_put(aio);
+		if (ret != -EIOCBQUEUED)
+			backing_aio_cleanup(aio, ret);
+	}
+out:
+	revert_creds(old_cred);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_write_iter);
+
+static int __init backing_aio_init(void)
+{
+	backing_aio_cachep = kmem_cache_create("backing_aio",
+					       sizeof(struct backing_aio),
+					       0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!backing_aio_cachep)
+		return -ENOMEM;
+
+	return 0;
+}
+fs_initcall(backing_aio_init);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 19d4d4768fc7..1376f2d308fe 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -16,19 +16,6 @@
 #include <linux/backing-file.h>
 #include "overlayfs.h"
 
-#include "../internal.h"	/* for sb_init_dio_done_wq */
-
-struct ovl_aio_req {
-	struct kiocb iocb;
-	refcount_t ref;
-	struct kiocb *orig_iocb;
-	/* used for aio completion */
-	struct work_struct work;
-	long res;
-};
-
-static struct kmem_cache *ovl_aio_request_cachep;
-
 static char ovl_whatisit(struct inode *inode, struct inode *realinode)
 {
 	if (realinode != ovl_inode_upper(inode))
@@ -272,83 +259,16 @@ static void ovl_file_accessed(struct file *file)
 	touch_atime(&file->f_path);
 }
 
-#define OVL_IOCB_MASK \
-	(IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
-
-static rwf_t iocb_to_rw_flags(int flags)
-{
-	return (__force rwf_t)(flags & OVL_IOCB_MASK);
-}
-
-static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
-{
-	if (refcount_dec_and_test(&aio_req->ref)) {
-		fput(aio_req->iocb.ki_filp);
-		kmem_cache_free(ovl_aio_request_cachep, aio_req);
-	}
-}
-
-static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
-{
-	struct kiocb *iocb = &aio_req->iocb;
-	struct kiocb *orig_iocb = aio_req->orig_iocb;
-
-	if (iocb->ki_flags & IOCB_WRITE) {
-		kiocb_end_write(iocb);
-		ovl_file_modified(orig_iocb->ki_filp);
-	}
-
-	orig_iocb->ki_pos = iocb->ki_pos;
-	ovl_aio_put(aio_req);
-}
-
-static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
-{
-	struct ovl_aio_req *aio_req = container_of(iocb,
-						   struct ovl_aio_req, iocb);
-	struct kiocb *orig_iocb = aio_req->orig_iocb;
-
-	ovl_aio_cleanup_handler(aio_req);
-	orig_iocb->ki_complete(orig_iocb, res);
-}
-
-static void ovl_aio_complete_work(struct work_struct *work)
-{
-	struct ovl_aio_req *aio_req = container_of(work,
-						   struct ovl_aio_req, work);
-
-	ovl_aio_rw_complete(&aio_req->iocb, aio_req->res);
-}
-
-static void ovl_aio_queue_completion(struct kiocb *iocb, long res)
-{
-	struct ovl_aio_req *aio_req = container_of(iocb,
-						   struct ovl_aio_req, iocb);
-	struct kiocb *orig_iocb = aio_req->orig_iocb;
-
-	/*
-	 * Punt to a work queue to serialize updates of mtime/size.
-	 */
-	aio_req->res = res;
-	INIT_WORK(&aio_req->work, ovl_aio_complete_work);
-	queue_work(file_inode(orig_iocb->ki_filp)->i_sb->s_dio_done_wq,
-		   &aio_req->work);
-}
-
-static int ovl_init_aio_done_wq(struct super_block *sb)
-{
-	if (sb->s_dio_done_wq)
-		return 0;
-
-	return sb_init_dio_done_wq(sb);
-}
-
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct fd real;
-	const struct cred *old_cred;
 	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ovl_creds(file_inode(file)->i_sb),
+		.user_file = file,
+		.accessed = ovl_file_accessed,
+	};
 
 	if (!iov_iter_count(iter))
 		return 0;
@@ -357,37 +277,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (ret)
 		return ret;
 
-	ret = -EINVAL;
-	if (iocb->ki_flags & IOCB_DIRECT &&
-	    !(real.file->f_mode & FMODE_CAN_ODIRECT))
-		goto out_fdput;
-
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	if (is_sync_kiocb(iocb)) {
-		rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags);
-
-		ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf);
-	} else {
-		struct ovl_aio_req *aio_req;
-
-		ret = -ENOMEM;
-		aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
-		if (!aio_req)
-			goto out;
-
-		aio_req->orig_iocb = iocb;
-		kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
-		aio_req->iocb.ki_complete = ovl_aio_rw_complete;
-		refcount_set(&aio_req->ref, 2);
-		ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter);
-		ovl_aio_put(aio_req);
-		if (ret != -EIOCBQUEUED)
-			ovl_aio_cleanup_handler(aio_req);
-	}
-out:
-	revert_creds(old_cred);
-	ovl_file_accessed(file);
-out_fdput:
+	ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags,
+				     &ctx);
 	fdput(real);
 
 	return ret;
@@ -398,9 +289,13 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct fd real;
-	const struct cred *old_cred;
 	ssize_t ret;
 	int ifl = iocb->ki_flags;
+	struct backing_file_ctx ctx = {
+		.cred = ovl_creds(inode->i_sb),
+		.user_file = file,
+		.end_write = ovl_file_modified,
+	};
 
 	if (!iov_iter_count(iter))
 		return 0;
@@ -416,11 +311,6 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (ret)
 		goto out_unlock;
 
-	ret = -EINVAL;
-	if (iocb->ki_flags & IOCB_DIRECT &&
-	    !(real.file->f_mode & FMODE_CAN_ODIRECT))
-		goto out_fdput;
-
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
 		ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
 
@@ -429,42 +319,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * this property in case it is set by the issuer.
 	 */
 	ifl &= ~IOCB_DIO_CALLER_COMP;
-
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	if (is_sync_kiocb(iocb)) {
-		rwf_t rwf = iocb_to_rw_flags(ifl);
-
-		file_start_write(real.file);
-		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
-		file_end_write(real.file);
-		/* Update size */
-		ovl_file_modified(file);
-	} else {
-		struct ovl_aio_req *aio_req;
-
-		ret = ovl_init_aio_done_wq(inode->i_sb);
-		if (ret)
-			goto out;
-
-		ret = -ENOMEM;
-		aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
-		if (!aio_req)
-			goto out;
-
-		aio_req->orig_iocb = iocb;
-		kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
-		aio_req->iocb.ki_flags = ifl;
-		aio_req->iocb.ki_complete = ovl_aio_queue_completion;
-		refcount_set(&aio_req->ref, 2);
-		kiocb_start_write(&aio_req->iocb);
-		ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
-		ovl_aio_put(aio_req);
-		if (ret != -EIOCBQUEUED)
-			ovl_aio_cleanup_handler(aio_req);
-	}
-out:
-	revert_creds(old_cred);
-out_fdput:
+	ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx);
 	fdput(real);
 
 out_unlock:
@@ -776,19 +631,3 @@ const struct file_operations ovl_file_operations = {
 	.copy_file_range	= ovl_copy_file_range,
 	.remap_file_range	= ovl_remap_file_range,
 };
-
-int __init ovl_aio_request_cache_init(void)
-{
-	ovl_aio_request_cachep = kmem_cache_create("ovl_aio_req",
-						   sizeof(struct ovl_aio_req),
-						   0, SLAB_HWCACHE_ALIGN, NULL);
-	if (!ovl_aio_request_cachep)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void ovl_aio_request_cache_destroy(void)
-{
-	kmem_cache_destroy(ovl_aio_request_cachep);
-}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index ca88b2636a57..509a57c85fae 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -417,6 +417,12 @@ int ovl_want_write(struct dentry *dentry);
 void ovl_drop_write(struct dentry *dentry);
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
+
+static inline const struct cred *ovl_creds(struct super_block *sb)
+{
+	return OVL_FS(sb)->creator_cred;
+}
+
 int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
 bool ovl_index_all(struct super_block *sb);
@@ -829,8 +835,6 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
 
 /* file.c */
 extern const struct file_operations ovl_file_operations;
-int __init ovl_aio_request_cache_init(void);
-void ovl_aio_request_cache_destroy(void);
 int ovl_real_fileattr_get(const struct path *realpath, struct fileattr *fa);
 int ovl_real_fileattr_set(const struct path *realpath, struct fileattr *fa);
 int ovl_fileattr_get(struct dentry *dentry, struct fileattr *fa);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 1a95ee237fa9..38ad0d3ca973 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1498,14 +1498,10 @@ static int __init ovl_init(void)
 	if (ovl_inode_cachep == NULL)
 		return -ENOMEM;
 
-	err = ovl_aio_request_cache_init();
-	if (!err) {
-		err = register_filesystem(&ovl_fs_type);
-		if (!err)
-			return 0;
+	err = register_filesystem(&ovl_fs_type);
+	if (!err)
+		return 0;
 
-		ovl_aio_request_cache_destroy();
-	}
 	kmem_cache_destroy(ovl_inode_cachep);
 
 	return err;
@@ -1521,7 +1517,6 @@ static void __exit ovl_exit(void)
 	 */
 	rcu_barrier();
 	kmem_cache_destroy(ovl_inode_cachep);
-	ovl_aio_request_cache_destroy();
 }
 
 module_init(ovl_init);
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 55c9e804f780..0648d548a418 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -9,9 +9,24 @@
 #define _LINUX_BACKING_FILE_H
 
 #include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/fs.h>
+
+struct backing_file_ctx {
+	const struct cred *cred;
+	struct file *user_file;
+	void (*accessed)(struct file *);
+	void (*end_write)(struct file *);
+};
 
 struct file *backing_file_open(const struct path *user_path, int flags,
 			       const struct path *real_path,
 			       const struct cred *cred);
+ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
+			       struct kiocb *iocb, int flags,
+			       struct backing_file_ctx *ctx);
+ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
+				struct kiocb *iocb, int flags,
+				struct backing_file_ctx *ctx);
 
 #endif /* _LINUX_BACKING_FILE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 03/12] fs: factor out backing_file_splice_{read,write}() helpers
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 01/12] fs: prepare for stackable filesystems backing file helpers Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 02/12] fs: factor out backing_file_{read,write}_iter() helpers Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 04/12] fs: factor out backing_file_mmap() helper Amir Goldstein
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

There is not much in those helpers, but it makes sense to have them
logically next to the backing_file_{read,write}_iter() helpers as they
may grow more common logic in the future.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/backing-file.c            | 41 ++++++++++++++++++++++++++++++++++++
 fs/overlayfs/file.c          | 30 ++++++++++++--------------
 include/linux/backing-file.h |  8 +++++++
 3 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index 2969ea6295ce..f32dd9012720 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -10,6 +10,7 @@
 
 #include <linux/fs.h>
 #include <linux/backing-file.h>
+#include <linux/splice.h>
 
 #include "internal.h"
 
@@ -238,6 +239,46 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(backing_file_write_iter);
 
+ssize_t backing_file_splice_read(struct file *in, loff_t *ppos,
+				 struct pipe_inode_info *pipe, size_t len,
+				 unsigned int flags,
+				 struct backing_file_ctx *ctx)
+{
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	old_cred = override_creds(ctx->cred);
+	ret = vfs_splice_read(in, ppos, pipe, len, flags);
+	revert_creds(old_cred);
+
+	if (ctx->accessed)
+		ctx->accessed(ctx->user_file);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_splice_read);
+
+ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
+				  struct file *out, loff_t *ppos, size_t len,
+				  unsigned int flags,
+				  struct backing_file_ctx *ctx)
+{
+	const struct cred *old_cred;
+	ssize_t ret;
+
+	old_cred = override_creds(ctx->cred);
+	file_start_write(out);
+	ret = iter_file_splice_write(pipe, out, ppos, len, flags);
+	file_end_write(out);
+	revert_creds(old_cred);
+
+	if (ctx->end_write)
+		ctx->end_write(ctx->user_file);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_splice_write);
+
 static int __init backing_aio_init(void)
 {
 	backing_aio_cachep = kmem_cache_create("backing_aio",
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 1376f2d308fe..6a7af440733b 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -9,7 +9,6 @@
 #include <linux/xattr.h>
 #include <linux/uio.h>
 #include <linux/uaccess.h>
-#include <linux/splice.h>
 #include <linux/security.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
@@ -332,20 +331,21 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
 			       unsigned int flags)
 {
-	const struct cred *old_cred;
 	struct fd real;
 	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ovl_creds(file_inode(in)->i_sb),
+		.user_file = in,
+		.accessed = ovl_file_accessed,
+	};
 
 	ret = ovl_real_fdget(in, &real);
 	if (ret)
 		return ret;
 
-	old_cred = ovl_override_creds(file_inode(in)->i_sb);
-	ret = vfs_splice_read(real.file, ppos, pipe, len, flags);
-	revert_creds(old_cred);
-	ovl_file_accessed(in);
-
+	ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx);
 	fdput(real);
+
 	return ret;
 }
 
@@ -361,9 +361,13 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				loff_t *ppos, size_t len, unsigned int flags)
 {
 	struct fd real;
-	const struct cred *old_cred;
 	struct inode *inode = file_inode(out);
 	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ovl_creds(inode->i_sb),
+		.user_file = out,
+		.end_write = ovl_file_modified,
+	};
 
 	inode_lock(inode);
 	/* Update mode */
@@ -376,15 +380,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	if (ret)
 		goto out_unlock;
 
-	old_cred = ovl_override_creds(inode->i_sb);
-	file_start_write(real.file);
-
-	ret = iter_file_splice_write(pipe, real.file, ppos, len, flags);
-
-	file_end_write(real.file);
-	/* Update size */
-	ovl_file_modified(out);
-	revert_creds(old_cred);
+	ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx);
 	fdput(real);
 
 out_unlock:
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 0648d548a418..0546d5b1c9f5 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -28,5 +28,13 @@ ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
 ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
 				struct kiocb *iocb, int flags,
 				struct backing_file_ctx *ctx);
+ssize_t backing_file_splice_read(struct file *in, loff_t *ppos,
+				 struct pipe_inode_info *pipe, size_t len,
+				 unsigned int flags,
+				 struct backing_file_ctx *ctx);
+ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
+				  struct file *out, loff_t *ppos, size_t len,
+				  unsigned int flags,
+				  struct backing_file_ctx *ctx);
 
 #endif /* _LINUX_BACKING_FILE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 04/12] fs: factor out backing_file_mmap() helper
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (2 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 03/12] fs: factor out backing_file_splice_{read,write}() helpers Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 05/12] fuse: factor out helper for FUSE_DEV_IOC_CLONE Amir Goldstein
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Assert that the file object is allocated in a backing_file container
so that file_user_path() could be used to display the user path and
not the backing file's path in /proc/<pid>/maps.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/backing-file.c            | 27 +++++++++++++++++++++++++++
 fs/overlayfs/file.c          | 23 ++++++-----------------
 include/linux/backing-file.h |  2 ++
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index f32dd9012720..1601a32e8e6a 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -11,6 +11,7 @@
 #include <linux/fs.h>
 #include <linux/backing-file.h>
 #include <linux/splice.h>
+#include <linux/mm.h>
 
 #include "internal.h"
 
@@ -279,6 +280,32 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
 }
 EXPORT_SYMBOL_GPL(backing_file_splice_write);
 
+int backing_file_mmap(struct file *file, struct vm_area_struct *vma,
+		      struct backing_file_ctx *ctx)
+{
+	const struct cred *old_cred;
+	int ret;
+
+	if (WARN_ON_ONCE(!(file->f_mode & FMODE_BACKING)) ||
+	    WARN_ON_ONCE(ctx->user_file != vma->vm_file))
+		return -EIO;
+
+	if (!file->f_op->mmap)
+		return -ENODEV;
+
+	vma_set_file(vma, file);
+
+	old_cred = override_creds(ctx->cred);
+	ret = call_mmap(vma->vm_file, vma);
+	revert_creds(old_cred);
+
+	if (ctx->accessed)
+		ctx->accessed(ctx->user_file);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(backing_file_mmap);
+
 static int __init backing_aio_init(void)
 {
 	backing_aio_cachep = kmem_cache_create("backing_aio",
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 6a7af440733b..034b8088c408 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -10,7 +10,6 @@
 #include <linux/uio.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/mm.h>
 #include <linux/fs.h>
 #include <linux/backing-file.h>
 #include "overlayfs.h"
@@ -418,23 +417,13 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct file *realfile = file->private_data;
-	const struct cred *old_cred;
-	int ret;
-
-	if (!realfile->f_op->mmap)
-		return -ENODEV;
-
-	if (WARN_ON(file != vma->vm_file))
-		return -EIO;
-
-	vma_set_file(vma, realfile);
-
-	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = call_mmap(vma->vm_file, vma);
-	revert_creds(old_cred);
-	ovl_file_accessed(file);
+	struct backing_file_ctx ctx = {
+		.cred = ovl_creds(file_inode(file)->i_sb),
+		.user_file = file,
+		.accessed = ovl_file_accessed,
+	};
 
-	return ret;
+	return backing_file_mmap(realfile, vma, &ctx);
 }
 
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 0546d5b1c9f5..3f1fe1774f1b 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -36,5 +36,7 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
 				  struct file *out, loff_t *ppos, size_t len,
 				  unsigned int flags,
 				  struct backing_file_ctx *ctx);
+int backing_file_mmap(struct file *file, struct vm_area_struct *vma,
+		      struct backing_file_ctx *ctx);
 
 #endif /* _LINUX_BACKING_FILE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 05/12] fuse: factor out helper for FUSE_DEV_IOC_CLONE
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (3 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 04/12] fs: factor out backing_file_mmap() helper Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 06/12] fuse: introduce FUSE_PASSTHROUGH capability Amir Goldstein
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

In preparation to adding more fuse dev ioctls.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/dev.c | 59 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..eba68b57bd7c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2251,43 +2251,50 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new)
 	return 0;
 }
 
-static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
-			   unsigned long arg)
+static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 {
 	int res;
 	int oldfd;
 	struct fuse_dev *fud = NULL;
 	struct fd f;
 
+	if (get_user(oldfd, argp))
+		return -EFAULT;
+
+	f = fdget(oldfd);
+	if (!f.file)
+		return -EINVAL;
+
+	/*
+	 * Check against file->f_op because CUSE
+	 * uses the same ioctl handler.
+	 */
+	if (f.file->f_op == file->f_op)
+		fud = fuse_get_dev(f.file);
+
+	res = -EINVAL;
+	if (fud) {
+		mutex_lock(&fuse_mutex);
+		res = fuse_device_clone(fud->fc, file);
+		mutex_unlock(&fuse_mutex);
+	}
+
+	fdput(f);
+	return res;
+}
+
+static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+
 	switch (cmd) {
 	case FUSE_DEV_IOC_CLONE:
-		if (get_user(oldfd, (__u32 __user *)arg))
-			return -EFAULT;
+		return fuse_dev_ioctl_clone(file, argp);
 
-		f = fdget(oldfd);
-		if (!f.file)
-			return -EINVAL;
-
-		/*
-		 * Check against file->f_op because CUSE
-		 * uses the same ioctl handler.
-		 */
-		if (f.file->f_op == file->f_op)
-			fud = fuse_get_dev(f.file);
-
-		res = -EINVAL;
-		if (fud) {
-			mutex_lock(&fuse_mutex);
-			res = fuse_device_clone(fud->fc, file);
-			mutex_unlock(&fuse_mutex);
-		}
-		fdput(f);
-		break;
 	default:
-		res = -ENOTTY;
-		break;
+		return -ENOTTY;
 	}
-	return res;
 }
 
 const struct file_operations fuse_dev_operations = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 06/12] fuse: introduce FUSE_PASSTHROUGH capability
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (4 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 05/12] fuse: factor out helper for FUSE_DEV_IOC_CLONE Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 07/12] fuse: pass optional backing_id in struct fuse_open_out Amir Goldstein
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

FUSE_PASSTHROUGH capability to passthrough FUSE operations to backing
files will be made available with kernel config CONFIG_FUSE_PASSTHROUGH.

When requesting FUSE_PASSTHROUGH, userspace needs to specify the
max_stack_depth that is allowed for FUSE on top of backing files.

Introduce a refcounted fuse_backing object that will be used to
associate an open backing file with a fuse inode.

This association will be used when replying to OPEN with the flag
FOPEN_PASSTHROUGH to setup passthrough of read/write operations on the
open fuse file to the associated backing file.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/Kconfig           | 11 +++++++++++
 fs/fuse/Makefile          |  1 +
 fs/fuse/fuse_i.h          | 19 +++++++++++++++++++
 fs/fuse/inode.c           | 20 ++++++++++++++++++++
 fs/fuse/passthrough.c     | 30 ++++++++++++++++++++++++++++++
 include/uapi/linux/fuse.h | 12 ++++++++++--
 6 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 fs/fuse/passthrough.c

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 038ed0b9aaa5..8674dbfbe59d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -52,3 +52,14 @@ config FUSE_DAX
 
 	  If you want to allow mounting a Virtio Filesystem with the "dax"
 	  option, answer Y.
+
+config FUSE_PASSTHROUGH
+	bool "FUSE passthrough operations support"
+	default y
+	depends on FUSE_FS
+	select FS_STACK
+	help
+	  This allows bypassing FUSE server by mapping specific FUSE operations
+	  to be performed directly on a backing file.
+
+	  If you want to allow passthrough operations, answer Y.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35c058d..504acfb71402 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
+fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6e6e721f421b..5be51358542e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -63,6 +63,15 @@ struct fuse_forget_link {
 	struct fuse_forget_link *next;
 };
 
+/** Container for data related to mapping to backing file */
+struct fuse_backing {
+	struct file *file;
+
+	/** refcount */
+	refcount_t count;
+	struct rcu_head rcu;
+};
+
 /** FUSE inode */
 struct fuse_inode {
 	/** Inode data */
@@ -803,6 +812,12 @@ struct fuse_conn {
 	/* Is statx not implemented by fs? */
 	unsigned int no_statx:1;
 
+	/** Passthrough support for read/write IO */
+	unsigned int passthrough:1;
+
+	/** Maximum stack depth for passthrough backing files */
+	int max_stack_depth;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
@@ -1340,4 +1355,8 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
+/* passthrough.c */
+struct fuse_backing *fuse_backing_get(struct fuse_backing *fb);
+void fuse_backing_put(struct fuse_backing *fb);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2e4eb7cf26fb..7e01eb5a04dc 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1234,6 +1234,24 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->create_supp_group = 1;
 			if (flags & FUSE_DIRECT_IO_RELAX)
 				fc->direct_io_relax = 1;
+			/*
+			 * max_stack_depth is the max stack depth of FUSE fs,
+			 * so it has to be at least 1 to support passthrough
+			 * to backing files.
+			 *
+			 * with max_stack_depth > 1, the backing files can be
+			 * on a stacked fs (e.g. overlayfs) themselves and with
+			 * max_stack_depth == 1, FUSE fs can be stacked as the
+			 * underlying fs of a stacked fs (e.g. overlayfs).
+			 */
+			if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) &&
+			    (flags & FUSE_PASSTHROUGH) &&
+			    arg->max_stack_depth > 0 &&
+			    arg->max_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH) {
+				fc->passthrough = 1;
+				fc->max_stack_depth = arg->max_stack_depth;
+				fm->sb->s_stack_depth = arg->max_stack_depth;
+			}
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1289,6 +1307,8 @@ void fuse_send_init(struct fuse_mount *fm)
 #endif
 	if (fm->fc->auto_submounts)
 		flags |= FUSE_SUBMOUNTS;
+	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+		flags |= FUSE_PASSTHROUGH;
 
 	ia->in.flags = flags;
 	ia->in.flags2 = flags >> 32;
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
new file mode 100644
index 000000000000..e8639c0a9ac6
--- /dev/null
+++ b/fs/fuse/passthrough.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE passthrough to backing file.
+ *
+ * Copyright (c) 2023 CTERA Networks.
+ */
+
+#include "fuse_i.h"
+
+#include <linux/file.h>
+
+struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
+{
+	if (fb && refcount_inc_not_zero(&fb->count))
+		return fb;
+	return NULL;
+}
+
+static void fuse_backing_free(struct fuse_backing *fb)
+{
+	if (fb->file)
+		fput(fb->file);
+	kfree_rcu(fb, rcu);
+}
+
+void fuse_backing_put(struct fuse_backing *fb)
+{
+	if (fb && refcount_dec_and_test(&fb->count))
+		fuse_backing_free(fb);
+}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index db92a7202b34..acb42a76f7ff 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -211,6 +211,10 @@
  *  7.39
  *  - add FUSE_DIRECT_IO_RELAX
  *  - add FUSE_STATX and related structures
+ *
+ *  7.40
+ *  - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH flag
+ *  - add FOPEN_PASSTHROUGH
  */
 
 #ifndef _LINUX_FUSE_H
@@ -246,7 +250,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 39
+#define FUSE_KERNEL_MINOR_VERSION 40
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -353,6 +357,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_PASSTHROUGH: passthrough read/write operations for this open file
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -361,6 +366,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_PASSTHROUGH	(1 << 7)
 
 /**
  * INIT request/reply flags
@@ -450,6 +456,7 @@ struct fuse_file_lock {
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_RELAX	(1ULL << 36)
+#define FUSE_PASSTHROUGH	(1ULL << 37)
 
 /**
  * CUSE INIT request/reply flags
@@ -875,7 +882,8 @@ struct fuse_init_out {
 	uint16_t	max_pages;
 	uint16_t	map_alignment;
 	uint32_t	flags2;
-	uint32_t	unused[7];
+	uint32_t	max_stack_depth;
+	uint32_t	unused[6];
 };
 
 #define CUSE_INIT_INFO_MAX 4096
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 07/12] fuse: pass optional backing_id in struct fuse_open_out
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (5 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 06/12] fuse: introduce FUSE_PASSTHROUGH capability Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 08/12] fuse: implement ioctls to manage backing files Amir Goldstein
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

propagae fuse_open_out arguments up to fuse_finish_open() with optional
backing_id member.

This will be used for setting up passthrough to backing file on open
reply with FOPEN_PASSTHROUGH flag and on zero backing_id.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/cuse.c            |  3 ++-
 fs/fuse/dir.c             |  2 +-
 fs/fuse/file.c            | 23 +++++++++++++----------
 fs/fuse/fuse_i.h          |  8 +++++---
 fs/fuse/ioctl.c           |  3 ++-
 include/uapi/linux/fuse.h |  2 +-
 6 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 91e89e68177e..050e97976e1f 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -115,6 +115,7 @@ static int cuse_open(struct inode *inode, struct file *file)
 {
 	dev_t devt = inode->i_cdev->dev;
 	struct cuse_conn *cc = NULL, *pos;
+	struct fuse_open_out outarg;
 	int rc;
 
 	/* look up and get the connection */
@@ -135,7 +136,7 @@ static int cuse_open(struct inode *inode, struct file *file)
 	 * Generic permission check is already done against the chrdev
 	 * file, proceed to open.
 	 */
-	rc = fuse_do_open(&cc->fm, 0, file, 0);
+	rc = fuse_do_open(&cc->fm, 0, file, 0, &outarg);
 	if (rc)
 		fuse_conn_put(&cc->fc);
 	return rc;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d707e6987da9..f2adcb30852d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -698,7 +698,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 		fuse_sync_release(fi, ff, flags);
 	} else {
 		file->private_data = ff;
-		fuse_finish_open(inode, file);
+		fuse_finish_open(inode, file, &outopen);
 		if (fm->fc->atomic_o_trunc && trunc)
 			truncate_pagecache(inode, 0);
 		else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1cdb6327511e..b0a6189f7662 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -126,7 +126,8 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 }
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir)
+				 unsigned int open_flags, bool isdir,
+				 struct fuse_open_out *outargp)
 {
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
@@ -140,13 +141,12 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	/* Default for no-open */
 	ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
 	if (isdir ? !fc->no_opendir : !fc->no_open) {
-		struct fuse_open_out outarg;
 		int err;
 
-		err = fuse_send_open(fm, nodeid, open_flags, opcode, &outarg);
+		err = fuse_send_open(fm, nodeid, open_flags, opcode, outargp);
 		if (!err) {
-			ff->fh = outarg.fh;
-			ff->open_flags = outarg.open_flags;
+			ff->fh = outargp->fh;
+			ff->open_flags = outargp->open_flags;
 
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
@@ -168,9 +168,10 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 }
 
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
-		 bool isdir)
+		 bool isdir, struct fuse_open_out *outargp)
 {
-	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
+	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir,
+					      outargp);
 
 	if (!IS_ERR(ff))
 		file->private_data = ff;
@@ -194,7 +195,8 @@ static void fuse_link_write_file(struct file *file)
 	spin_unlock(&fi->lock);
 }
 
-void fuse_finish_open(struct inode *inode, struct file *file)
+void fuse_finish_open(struct inode *inode, struct file *file,
+		      struct fuse_open_out *outargp)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = get_fuse_conn(inode);
@@ -222,6 +224,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	struct fuse_conn *fc = fm->fc;
+	struct fuse_open_out outarg;
 	int err;
 	bool is_wb_truncate = (file->f_flags & O_TRUNC) &&
 			  fc->atomic_o_trunc &&
@@ -249,9 +252,9 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 	if (is_wb_truncate || dax_truncate)
 		fuse_set_nowrite(inode);
 
-	err = fuse_do_open(fm, get_node_id(inode), file, isdir);
+	err = fuse_do_open(fm, get_node_id(inode), file, isdir, &outarg);
 	if (!err)
-		fuse_finish_open(inode, file);
+		fuse_finish_open(inode, file, &outarg);
 
 	if (is_wb_truncate || dax_truncate)
 		fuse_release_nowrite(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5be51358542e..233344773d29 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1038,7 +1038,8 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
 
 struct fuse_file *fuse_file_alloc(struct fuse_mount *fm);
 void fuse_file_free(struct fuse_file *ff);
-void fuse_finish_open(struct inode *inode, struct file *file);
+void fuse_finish_open(struct inode *inode, struct file *file,
+		      struct fuse_open_out *outargp);
 
 void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
 		       unsigned int flags);
@@ -1259,7 +1260,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
 			     u64 child_nodeid, struct qstr *name, u32 flags);
 
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
-		 bool isdir);
+		 bool isdir, struct fuse_open_out *outargp);
 
 /**
  * fuse_direct_io() flags
@@ -1351,7 +1352,8 @@ int fuse_fileattr_set(struct mnt_idmap *idmap,
 /* file.c */
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir);
+				 unsigned int open_flags, bool isdir,
+				 struct fuse_open_out *outargp);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 726640fa439e..3aeef75c4df3 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -423,6 +423,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	bool isdir = S_ISDIR(inode->i_mode);
+	struct fuse_open_out outarg;
 
 	if (!fuse_allow_current_process(fm->fc))
 		return ERR_PTR(-EACCES);
@@ -433,7 +434,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
 	if (!S_ISREG(inode->i_mode) && !isdir)
 		return ERR_PTR(-ENOTTY);
 
-	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir);
+	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir, &outarg);
 }
 
 static void fuse_priv_ioctl_cleanup(struct inode *inode, struct fuse_file *ff)
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index acb42a76f7ff..0e273f372df4 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -766,7 +766,7 @@ struct fuse_create_in {
 struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
-	uint32_t	padding;
+	int32_t		backing_id;
 };
 
 struct fuse_release_in {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 08/12] fuse: implement ioctls to manage backing files
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (6 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 07/12] fuse: pass optional backing_id in struct fuse_open_out Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-17  9:45   ` Amir Goldstein
  2023-10-16 16:08 ` [PATCH v14 09/12] fuse: implement read/write passthrough Amir Goldstein
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file
descriptor.  If the call succeeds, a backing file identifier is returned.

A later reply to OPEN request with the flag FOPEN_PASSTHROUGH will setup
passthrough of file operations on the open FUSE file to the backing file
associated with the id.  If there is no backing file associated with id,
FOPEN_PASSTHROUGH flag is ignored.

The FUSE server may call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the
backing file by its id.
If there is no backing file with that id, -ENOENT is returned.

This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH
flag is still in progress, the open may or may not end up setting up the
passthrough to the backing file.

In any case, the backing file will be kept open by the FUSE driver until
the last fuse_file that was setup to passthrough to that backing file is
closed AND the FUSE_DEV_IOC_BACKING_CLOSE ioctl was called.

Setting up backing files requires a server with CAP_SYS_ADMIN privileges.
For the backing file to be successfully setup, the backing file must
implement both read_iter and write_iter file operations.

The limitation on the level of filesystem stacking allowed for the
backing file is enforced before setting up the backing file.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/dev.c             |  41 +++++++++
 fs/fuse/file.c            |   5 ++
 fs/fuse/fuse_i.h          |  34 +++++++
 fs/fuse/inode.c           |   5 ++
 fs/fuse/passthrough.c     | 185 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fuse.h |   9 ++
 6 files changed, 279 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index eba68b57bd7c..b680787bd66d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2283,6 +2283,41 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 	return res;
 }
 
+static long fuse_dev_ioctl_backing_open(struct file *file,
+					struct fuse_backing_map __user *argp)
+{
+	struct fuse_dev *fud = fuse_get_dev(file);
+	struct fuse_backing_map map;
+
+	if (!fud)
+		return -EINVAL;
+
+	if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&map, argp, sizeof(map)))
+		return -EFAULT;
+
+	return fuse_backing_open(fud->fc, &map);
+}
+
+static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp)
+{
+	struct fuse_dev *fud = fuse_get_dev(file);
+	int backing_id;
+
+	if (!fud)
+		return -EINVAL;
+
+	if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+		return -EOPNOTSUPP;
+
+	if (get_user(backing_id, argp))
+		return -EFAULT;
+
+	return fuse_backing_close(fud->fc, backing_id);
+}
+
 static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -2292,6 +2327,12 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 	case FUSE_DEV_IOC_CLONE:
 		return fuse_dev_ioctl_clone(file, argp);
 
+	case FUSE_DEV_IOC_BACKING_OPEN:
+		return fuse_dev_ioctl_backing_open(file, argp);
+
+	case FUSE_DEV_IOC_BACKING_CLOSE:
+		return fuse_dev_ioctl_backing_close(file, argp);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b0a6189f7662..83a7b16d682d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -205,6 +205,8 @@ void fuse_finish_open(struct inode *inode, struct file *file,
 		stream_open(inode, file);
 	else if (ff->open_flags & FOPEN_NONSEEKABLE)
 		nonseekable_open(inode, file);
+	else if (ff->open_flags & FOPEN_PASSTHROUGH)
+		fuse_passthrough_open(file, outargp->backing_id);
 
 	if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
 		struct fuse_inode *fi = get_fuse_inode(inode);
@@ -281,6 +283,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
 	struct fuse_conn *fc = ff->fm->fc;
 	struct fuse_release_args *ra = ff->release_args;
 
+	if (fuse_file_passthrough(ff))
+		fuse_passthrough_release(ff);
+
 	/* Inode is NULL on error path of fuse_create_open() */
 	if (likely(fi)) {
 		spin_lock(&fi->lock);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 233344773d29..cb1e2aadf1dc 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -66,6 +66,7 @@ struct fuse_forget_link {
 /** Container for data related to mapping to backing file */
 struct fuse_backing {
 	struct file *file;
+	struct cred *cred;
 
 	/** refcount */
 	refcount_t count;
@@ -238,6 +239,12 @@ struct fuse_file {
 	/** Wait queue head for poll */
 	wait_queue_head_t poll_wait;
 
+#ifdef CONFIG_FUSE_PASSTHROUGH
+	/** Reference to backing file in passthrough mode */
+	struct file *passthrough;
+	const struct cred *cred;
+#endif
+
 	/** Has flock been performed on this file? */
 	bool flock:1;
 };
@@ -867,6 +874,11 @@ struct fuse_conn {
 
 	/* New writepages go into this bucket */
 	struct fuse_sync_bucket __rcu *curr_bucket;
+
+#ifdef CONFIG_FUSE_PASSTHROUGH
+	/** IDR for backing files ids */
+	struct idr backing_files_map;
+#endif
 };
 
 /*
@@ -1360,5 +1372,27 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 /* passthrough.c */
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb);
 void fuse_backing_put(struct fuse_backing *fb);
+void fuse_backing_files_init(struct fuse_conn *fc);
+void fuse_backing_files_free(struct fuse_conn *fc);
+int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map);
+int fuse_backing_close(struct fuse_conn *fc, int backing_id);
+
+void fuse_passthrough_setup(struct file *file, int backing_id);
+void fuse_passthrough_release(struct fuse_file *ff);
+
+static inline void fuse_passthrough_open(struct file *file, int backing_id)
+{
+	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH) && backing_id)
+		fuse_passthrough_setup(file, backing_id);
+}
+
+static inline struct file *fuse_file_passthrough(struct fuse_file *ff)
+{
+#ifdef CONFIG_FUSE_PASSTHROUGH
+	return ff->passthrough;
+#else
+	return NULL;
+#endif
+}
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 7e01eb5a04dc..09280bf6e727 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -875,6 +875,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
 	fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
 
+	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+		fuse_backing_files_init(fc);
+
 	INIT_LIST_HEAD(&fc->mounts);
 	list_add(&fm->fc_entry, &fc->mounts);
 	fm->fc = fc;
@@ -1336,6 +1339,8 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
 void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
+	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
+		fuse_backing_files_free(fc);
 	kfree_rcu(fc, rcu);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index e8639c0a9ac6..2c8e68f1c90e 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -8,6 +8,7 @@
 #include "fuse_i.h"
 
 #include <linux/file.h>
+#include <linux/backing-file.h>
 
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 {
@@ -18,8 +19,11 @@ struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 
 static void fuse_backing_free(struct fuse_backing *fb)
 {
+	pr_debug("%s: fb=0x%p\n", __func__, fb);
+
 	if (fb->file)
 		fput(fb->file);
+	put_cred(fb->cred);
 	kfree_rcu(fb, rcu);
 }
 
@@ -28,3 +32,184 @@ void fuse_backing_put(struct fuse_backing *fb)
 	if (fb && refcount_dec_and_test(&fb->count))
 		fuse_backing_free(fb);
 }
+
+void fuse_backing_files_init(struct fuse_conn *fc)
+{
+	idr_init(&fc->backing_files_map);
+}
+
+static int fuse_backing_id_alloc(struct fuse_conn *fc, struct fuse_backing *fb)
+{
+	int id;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fc->lock);
+	id = idr_alloc_cyclic(&fc->backing_files_map, fb, 1, 0, GFP_ATOMIC);
+	spin_unlock(&fc->lock);
+	idr_preload_end();
+
+	WARN_ON_ONCE(id == 0);
+	return id;
+}
+
+static struct fuse_backing *fuse_backing_id_remove(struct fuse_conn *fc,
+						   int id)
+{
+	struct fuse_backing *fb;
+
+	spin_lock(&fc->lock);
+	fb = idr_remove(&fc->backing_files_map, id);
+	spin_unlock(&fc->lock);
+
+	return fb;
+}
+
+static int fuse_backing_id_free(int id, void *p, void *data)
+{
+	struct fuse_backing *fb = p;
+
+	WARN_ON_ONCE(refcount_read(&fb->count) != 1);
+	fuse_backing_free(fb);
+	return 0;
+}
+
+void fuse_backing_files_free(struct fuse_conn *fc)
+{
+	idr_for_each(&fc->backing_files_map, fuse_backing_id_free, NULL);
+	idr_destroy(&fc->backing_files_map);
+}
+
+int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map)
+{
+	struct file *file;
+	struct super_block *backing_sb;
+	struct fuse_backing *fb = NULL;
+	int res;
+
+	pr_debug("%s: fd=%d flags=0x%x\n", __func__, map->fd, map->flags);
+
+	/* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */
+	res = -EPERM;
+	if (!fc->passthrough || !capable(CAP_SYS_ADMIN))
+		goto out;
+
+	res = -EINVAL;
+	if (map->flags)
+		goto out;
+
+	file = fget(map->fd);
+	res = -EBADF;
+	if (!file)
+		goto out;
+
+	res = -EOPNOTSUPP;
+	if (!file->f_op->read_iter || !file->f_op->write_iter)
+		goto out_fput;
+
+	backing_sb = file_inode(file)->i_sb;
+	res = -ELOOP;
+	if (backing_sb->s_stack_depth >= fc->max_stack_depth)
+		goto out_fput;
+
+	fb = kmalloc(sizeof(struct fuse_backing), GFP_KERNEL);
+	res = -ENOMEM;
+	if (!fb)
+		goto out_fput;
+
+	fb->file = file;
+	fb->cred = prepare_creds();
+	refcount_set(&fb->count, 1);
+
+	res = fuse_backing_id_alloc(fc, fb);
+	if (res < 0) {
+		fuse_backing_free(fb);
+		fb = NULL;
+	}
+
+out:
+	pr_debug("%s: fb=0x%p, ret=%i\n", __func__, fb, res);
+
+	return res;
+
+out_fput:
+	fput(file);
+	goto out;
+}
+
+int fuse_backing_close(struct fuse_conn *fc, int backing_id)
+{
+	struct fuse_backing *fb = NULL;
+	int err;
+
+	pr_debug("%s: backing_id=%d\n", __func__, backing_id);
+
+	/* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */
+	err = -EPERM;
+	if (!fc->passthrough || !capable(CAP_SYS_ADMIN))
+		goto out;
+
+	err = -EINVAL;
+	if (backing_id <= 0)
+		goto out;
+
+	err = -ENOENT;
+	fb = fuse_backing_id_remove(fc, backing_id);
+	if (!fb)
+		goto out;
+
+	fuse_backing_put(fb);
+	err = 0;
+out:
+	pr_debug("%s: fb=0x%p, err=%i\n", __func__, fb, err);
+
+	return err;
+}
+
+/* Setup passthrough to a backing file */
+void fuse_passthrough_setup(struct file *file, int backing_id)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_conn *fc = ff->fm->fc;
+	struct fuse_backing *fb;
+	struct file *backing_file;
+	int err;
+
+	err = -EINVAL;
+	if (backing_id <= 0)
+		goto out;
+
+	rcu_read_lock();
+	fb = idr_find(&fc->backing_files_map, backing_id);
+	fb = fuse_backing_get(fb);
+	rcu_read_unlock();
+
+	err = -ENOENT;
+	if (!fb)
+		goto out;
+
+	/* Allocate backing file per fuse file to store fuse path */
+	backing_file = backing_file_open(&file->f_path, file->f_flags,
+					 &fb->file->f_path, fb->cred);
+	err = PTR_ERR(backing_file);
+	if (IS_ERR(backing_file))
+		goto out;
+
+	err = 0;
+	ff->passthrough = backing_file;
+	ff->cred = get_cred(fb->cred);
+out:
+	pr_debug("%s: backing_id=%d, fb=0x%p, backing_file=0x%p, err=%i\n", __func__,
+		 backing_id, fb, ff->passthrough, err);
+
+	fuse_backing_put(fb);
+	if (!ff->passthrough)
+		ff->open_flags &= ~FOPEN_PASSTHROUGH;
+}
+
+void fuse_passthrough_release(struct fuse_file *ff)
+{
+	fput(ff->passthrough);
+	ff->passthrough = NULL;
+	put_cred(ff->cred);
+	ff->cred = NULL;
+}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 0e273f372df4..eade89f7dc4d 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -1055,9 +1055,18 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_backing_map {
+	int32_t		fd;
+	uint32_t	flags;
+	uint64_t	padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
+#define FUSE_DEV_IOC_BACKING_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, \
+					     struct fuse_backing_map)
+#define FUSE_DEV_IOC_BACKING_CLOSE	_IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 09/12] fuse: implement read/write passthrough
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (7 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 08/12] fuse: implement ioctls to manage backing files Amir Goldstein
@ 2023-10-16 16:08 ` Amir Goldstein
  2023-10-16 16:09 ` [PATCH v14 10/12] fuse: implement splice_{read/write} passthrough Amir Goldstein
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Use the backing file read/write helpers to implement read/write
passthrough to a backing file.

Similar to update size/mtime at the end of fuse_perform_write(),
we need to bump the attr version when we update the inode size,
so extend the ->end_write() callback to report pos and number of
bytes written.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/backing-file.c            |  8 ++--
 fs/fuse/file.c               |  8 +++-
 fs/fuse/fuse_i.h             |  3 ++
 fs/fuse/passthrough.c        | 86 ++++++++++++++++++++++++++++++++++++
 fs/overlayfs/file.c          |  9 +++-
 include/linux/backing-file.h |  2 +-
 6 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index 1601a32e8e6a..133373432b9c 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -57,7 +57,7 @@ struct backing_aio {
 	refcount_t ref;
 	struct kiocb *orig_iocb;
 	/* used for aio completion */
-	void (*end_write)(struct file *);
+	void (*end_write)(struct file *, loff_t, ssize_t);
 	struct work_struct work;
 	long res;
 };
@@ -88,7 +88,7 @@ static void backing_aio_cleanup(struct backing_aio *aio, long res)
 	if (iocb->ki_flags & IOCB_WRITE) {
 		kiocb_end_write(iocb);
 		if (aio->end_write)
-			aio->end_write(orig_iocb->ki_filp);
+			aio->end_write(orig_iocb->ki_filp, iocb->ki_pos, res);
 	}
 
 	orig_iocb->ki_pos = iocb->ki_pos;
@@ -208,7 +208,7 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
 		ret = vfs_iter_write(file, iter, &iocb->ki_pos, rwf);
 		file_end_write(file);
 		if (ctx->end_write)
-			ctx->end_write(ctx->user_file);
+			ctx->end_write(iocb->ki_filp, iocb->ki_pos, ret);
 	} else {
 		struct backing_aio *aio;
 
@@ -274,7 +274,7 @@ ssize_t backing_file_splice_write(struct pipe_inode_info *pipe,
 	revert_creds(old_cred);
 
 	if (ctx->end_write)
-		ctx->end_write(ctx->user_file);
+		ctx->end_write(ctx->user_file, ppos ? *ppos : 0, ret);
 
 	return ret;
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 83a7b16d682d..17964486ba80 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1636,7 +1636,9 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_read_iter(iocb, to);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_read_iter(iocb, to);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_read_iter(iocb, to);
 	else
 		return fuse_direct_read_iter(iocb, to);
@@ -1654,7 +1656,9 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_write_iter(iocb, from);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_write_iter(iocb, from);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_write_iter(iocb, from);
 	else
 		return fuse_direct_write_iter(iocb, from);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index cb1e2aadf1dc..0a43dc93e376 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1395,4 +1395,7 @@ static inline struct file *fuse_file_passthrough(struct fuse_file *ff)
 #endif
 }
 
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter);
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *iter);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 2c8e68f1c90e..0224f63f8cdf 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -10,6 +10,92 @@
 #include <linux/file.h>
 #include <linux/backing-file.h>
 
+static void fuse_file_start_write(struct file *file, loff_t pos, size_t count)
+{
+	struct inode *inode = file_inode(file);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (inode->i_size < pos + count)
+		set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+}
+
+static void fuse_file_end_write(struct file *file, loff_t pos, ssize_t res)
+{
+	struct inode *inode = file_inode(file);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fuse_write_update_attr(inode, pos, res);
+	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+}
+
+static void fuse_file_accessed(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	struct inode *backing_inode = file_inode(backing_file);
+
+	/* Mimic atime update policy of backing inode, not the actual value */
+	if (!timespec64_equal(&backing_inode->i_atime, &inode->i_atime))
+		fuse_invalidate_atime(inode);
+}
+
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	size_t count = iov_iter_count(iter);
+	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ff->cred,
+		.user_file = file,
+		.accessed = fuse_file_accessed,
+	};
+
+
+	pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu\n", __func__,
+		 backing_file, iocb->ki_pos, count);
+
+	if (!count)
+		return 0;
+
+	ret = backing_file_read_iter(backing_file, iter, iocb, iocb->ki_flags,
+				     &ctx);
+
+	return ret;
+}
+
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
+				    struct iov_iter *iter)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	size_t count = iov_iter_count(iter);
+	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ff->cred,
+		.user_file = file,
+		.end_write = fuse_file_end_write,
+	};
+
+	pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu\n", __func__,
+		 backing_file, iocb->ki_pos, count);
+
+	if (!count)
+		return 0;
+
+	inode_lock(inode);
+	fuse_file_start_write(file, iocb->ki_pos, count);
+	ret = backing_file_write_iter(backing_file, iter, iocb, iocb->ki_flags,
+				      &ctx);
+	inode_unlock(inode);
+
+	return ret;
+}
+
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 {
 	if (fb && refcount_inc_not_zero(&fb->count))
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 034b8088c408..3659c7f340e5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -232,6 +232,11 @@ static void ovl_file_modified(struct file *file)
 	ovl_copyattr(file_inode(file));
 }
 
+static void ovl_file_end_write(struct file *file, loff_t pos, ssize_t res)
+{
+	ovl_file_modified(file);
+}
+
 static void ovl_file_accessed(struct file *file)
 {
 	struct inode *inode, *upperinode;
@@ -292,7 +297,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(inode->i_sb),
 		.user_file = file,
-		.end_write = ovl_file_modified,
+		.end_write = ovl_file_end_write,
 	};
 
 	if (!iov_iter_count(iter))
@@ -365,7 +370,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(inode->i_sb),
 		.user_file = out,
-		.end_write = ovl_file_modified,
+		.end_write = ovl_file_end_write,
 	};
 
 	inode_lock(inode);
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 3f1fe1774f1b..98e0b6c30193 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -16,7 +16,7 @@ struct backing_file_ctx {
 	const struct cred *cred;
 	struct file *user_file;
 	void (*accessed)(struct file *);
-	void (*end_write)(struct file *);
+	void (*end_write)(struct file *, loff_t, ssize_t);
 };
 
 struct file *backing_file_open(const struct path *user_path, int flags,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 10/12] fuse: implement splice_{read/write} passthrough
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (8 preceding siblings ...)
  2023-10-16 16:08 ` [PATCH v14 09/12] fuse: implement read/write passthrough Amir Goldstein
@ 2023-10-16 16:09 ` Amir Goldstein
  2023-10-16 16:09 ` [PATCH v14 11/12] fuse: implement passthrough for mmap Amir Goldstein
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

This allows passing fstests generic/249 and generic/591.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c        | 27 ++++++++++++++++++++--
 fs/fuse/fuse_i.h      |  6 +++++
 fs/fuse/passthrough.c | 52 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 17964486ba80..d6fc99245a61 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1664,6 +1664,29 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		return fuse_direct_write_iter(iocb, from);
 }
 
+static ssize_t fuse_splice_read(struct file *in, loff_t *ppos,
+				struct pipe_inode_info *pipe, size_t len,
+				unsigned int flags)
+{
+	struct fuse_file *ff = in->private_data;
+
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_splice_read(in, ppos, pipe, len, flags);
+	else
+		return filemap_splice_read(in, ppos, pipe, len, flags);
+}
+
+static ssize_t fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
+				 loff_t *ppos, size_t len, unsigned int flags)
+{
+	struct fuse_file *ff = out->private_data;
+
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_splice_write(pipe, out, ppos, len, flags);
+	else
+		return iter_file_splice_write(pipe, out, ppos, len, flags);
+}
+
 static void fuse_writepage_free(struct fuse_writepage_args *wpa)
 {
 	struct fuse_args_pages *ap = &wpa->ia.ap;
@@ -3222,8 +3245,8 @@ static const struct file_operations fuse_file_operations = {
 	.lock		= fuse_file_lock,
 	.get_unmapped_area = thp_get_unmapped_area,
 	.flock		= fuse_file_flock,
-	.splice_read	= filemap_splice_read,
-	.splice_write	= iter_file_splice_write,
+	.splice_read	= fuse_splice_read,
+	.splice_write	= fuse_splice_write,
 	.unlocked_ioctl	= fuse_file_ioctl,
 	.compat_ioctl	= fuse_file_compat_ioctl,
 	.poll		= fuse_file_poll,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0a43dc93e376..7dd770efceba 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1397,5 +1397,11 @@ static inline struct file *fuse_file_passthrough(struct fuse_file *ff)
 
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *iter);
+ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos,
+				     struct pipe_inode_info *pipe,
+				     size_t len, unsigned int flags);
+ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
+				      struct file *out, loff_t *ppos,
+				      size_t len, unsigned int flags);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 0224f63f8cdf..fa5b41a4cdc1 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -9,6 +9,7 @@
 
 #include <linux/file.h>
 #include <linux/backing-file.h>
+#include <linux/splice.h>
 
 static void fuse_file_start_write(struct file *file, loff_t pos, size_t count)
 {
@@ -96,6 +97,57 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb,
 	return ret;
 }
 
+ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos,
+				     struct pipe_inode_info *pipe,
+				     size_t len, unsigned int flags)
+{
+	struct fuse_file *ff = in->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	struct backing_file_ctx ctx = {
+		.cred = ff->cred,
+		.user_file = in,
+		.accessed = fuse_file_accessed,
+	};
+
+	pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu, flags=0x%x\n", __func__,
+		 backing_file, ppos ? *ppos : 0, len, flags);
+
+	return backing_file_splice_read(backing_file, ppos, pipe, len, flags,
+					&ctx);
+}
+
+ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
+				      struct file *out, loff_t *ppos,
+				      size_t len, unsigned int flags)
+{
+	struct fuse_file *ff = out->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	struct inode *inode = file_inode(out);
+	ssize_t ret;
+	struct backing_file_ctx ctx = {
+		.cred = ff->cred,
+		.user_file = out,
+		.end_write = fuse_file_end_write,
+	};
+
+	pr_debug("%s: backing_file=0x%p, pos=%lld, len=%zu, flags=0x%x\n", __func__,
+		 backing_file, ppos ? *ppos : 0, len, flags);
+
+	inode_lock(inode);
+	fuse_file_start_write(out, ppos ? *ppos : 0, len);
+	ret = file_remove_privs(out);
+	if (ret)
+		goto out_unlock;
+
+	ret = backing_file_splice_write(pipe, backing_file, ppos, len, flags,
+					&ctx);
+
+out_unlock:
+	inode_unlock(inode);
+
+	return ret;
+}
+
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 {
 	if (fb && refcount_inc_not_zero(&fb->count))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 11/12] fuse: implement passthrough for mmap
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (9 preceding siblings ...)
  2023-10-16 16:09 ` [PATCH v14 10/12] fuse: implement splice_{read/write} passthrough Amir Goldstein
@ 2023-10-16 16:09 ` Amir Goldstein
  2023-10-16 16:09 ` [PATCH v14 12/12] fuse: implement passthrough for readdir Amir Goldstein
  2023-10-19 14:33 ` [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Enabling FUSE passthrough for mmap-ed operations not only affects
performance, but has also been shown as mandatory for the correct
functioning of FUSE passthrough.

yanwu noticed [1] that a FUSE file with passthrough enabled may suffer
data inconsistencies if the same file is also accessed with mmap.
What happens is that read/write operations are directly applied to the
backing file's page cache, while mmap-ed operations are affecting the
FUSE file's page cache.

Extend the FUSE passthrough implementation to also handle memory-mapped
FUSE file, to both fix the cache inconsistencies and extend the
passthrough performance benefits to mmap-ed operations.

[1] https://lore.kernel.org/lkml/20210119110654.11817-1-wu-yan@tcl.com/

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c        |  3 +++
 fs/fuse/fuse_i.h      |  1 +
 fs/fuse/passthrough.c | 16 ++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d6fc99245a61..bae1137426a9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2499,6 +2499,9 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_mmap(file, vma);
+
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/* Can't provide the coherency needed for MAP_SHARED
 		 * if FUSE_DIRECT_IO_RELAX isn't set.
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7dd770efceba..6fee4c33678f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1403,5 +1403,6 @@ ssize_t fuse_passthrough_splice_read(struct file *in, loff_t *ppos,
 ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      struct file *out, loff_t *ppos,
 				      size_t len, unsigned int flags);
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index fa5b41a4cdc1..a05280ceba83 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -148,6 +148,22 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 	return ret;
 }
 
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_file = fuse_file_passthrough(ff);
+	struct backing_file_ctx ctx = {
+		.cred = ff->cred,
+		.user_file = file,
+		.accessed = fuse_file_accessed,
+	};
+
+	pr_debug("%s: backing_file=0x%p, start=%lu, end=%lu\n", __func__,
+		 backing_file, vma->vm_start, vma->vm_end);
+
+	return backing_file_mmap(backing_file, vma, &ctx);
+}
+
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 {
 	if (fb && refcount_inc_not_zero(&fb->count))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v14 12/12] fuse: implement passthrough for readdir
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (10 preceding siblings ...)
  2023-10-16 16:09 ` [PATCH v14 11/12] fuse: implement passthrough for mmap Amir Goldstein
@ 2023-10-16 16:09 ` Amir Goldstein
  2023-10-19 14:33 ` [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
  12 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-16 16:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

Same as for regular file read, when dir is open with FOPEN_PASSTHROUGH,
passthrough readdir to backing directory.

FOPEN_CACHE_DIR is ignored with passthrough readdir and it does not
populated children inode cache as READDIRPLUS does.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c        |  3 +++
 fs/fuse/fuse_i.h      |  1 +
 fs/fuse/passthrough.c | 25 ++++++++++++++++++++++++-
 fs/fuse/readdir.c     | 12 +++++++++++-
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bae1137426a9..a8ebd25765c6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -148,6 +148,9 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 			ff->fh = outargp->fh;
 			ff->open_flags = outargp->open_flags;
 
+			/* Readdir cache not used for passthrough */
+			if (ff->open_flags & FOPEN_PASSTHROUGH)
+				ff->open_flags &= ~FOPEN_CACHE_DIR;
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
 			return ERR_PTR(err);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6fee4c33678f..822226fe96bf 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1404,5 +1404,6 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      struct file *out, loff_t *ppos,
 				      size_t len, unsigned int flags);
 ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
+int fuse_passthrough_readdir(struct file *file, struct dir_context *ctx);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index a05280ceba83..828f26597b16 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -164,6 +164,28 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
 	return backing_file_mmap(backing_file, vma, &ctx);
 }
 
+int fuse_passthrough_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file_inode(file);
+	struct file *backing_file = fuse_file_passthrough(ff);
+	const struct cred *old_cred;
+	bool locked;
+	int ret;
+
+	pr_debug("%s: backing_file=0x%p, pos=%lld\n", __func__,
+		 backing_file, ctx->pos);
+
+	locked = fuse_lock_inode(inode);
+	old_cred = override_creds(ff->cred);
+	ret = iterate_dir(backing_file, ctx);
+	revert_creds(old_cred);
+	fuse_file_accessed(file);
+	fuse_unlock_inode(inode, locked);
+
+	return ret;
+}
+
 struct fuse_backing *fuse_backing_get(struct fuse_backing *fb)
 {
 	if (fb && refcount_inc_not_zero(&fb->count))
@@ -257,7 +279,8 @@ int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map)
 		goto out;
 
 	res = -EOPNOTSUPP;
-	if (!file->f_op->read_iter || !file->f_op->write_iter)
+	if (!file->f_op->iterate_shared &&
+	    !(file->f_op->read_iter && file->f_op->write_iter))
 		goto out_fput;
 
 	backing_sb = file_inode(file)->i_sb;
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 9e6d587b3e67..e59c072ca29c 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -327,7 +327,7 @@ static int parse_dirplusfile(char *buf, size_t nbytes, struct file *file,
 	return 0;
 }
 
-static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
+static int fuse_do_readdir(struct file *file, struct dir_context *ctx)
 {
 	int plus;
 	ssize_t res;
@@ -581,6 +581,16 @@ static int fuse_readdir_cached(struct file *file, struct dir_context *ctx)
 	return res == FOUND_SOME ? 0 : UNCACHED;
 }
 
+static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
+{
+	struct fuse_file *ff = file->private_data;
+
+	if (fuse_file_passthrough(ff))
+		return fuse_passthrough_readdir(file, ctx);
+	else
+		return fuse_do_readdir(file, ctx);
+}
+
 int fuse_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct fuse_file *ff = file->private_data;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 08/12] fuse: implement ioctls to manage backing files
  2023-10-16 16:08 ` [PATCH v14 08/12] fuse: implement ioctls to manage backing files Amir Goldstein
@ 2023-10-17  9:45   ` Amir Goldstein
  0 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-10-17  9:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

On Mon, Oct 16, 2023 at 7:09 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file
> descriptor.  If the call succeeds, a backing file identifier is returned.
>
> A later reply to OPEN request with the flag FOPEN_PASSTHROUGH will setup
> passthrough of file operations on the open FUSE file to the backing file
> associated with the id.  If there is no backing file associated with id,
> FOPEN_PASSTHROUGH flag is ignored.
>
> The FUSE server may call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the
> backing file by its id.
> If there is no backing file with that id, -ENOENT is returned.
>
> This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH
> flag is still in progress, the open may or may not end up setting up the
> passthrough to the backing file.
>
> In any case, the backing file will be kept open by the FUSE driver until
> the last fuse_file that was setup to passthrough to that backing file is
> closed AND the FUSE_DEV_IOC_BACKING_CLOSE ioctl was called.
>
> Setting up backing files requires a server with CAP_SYS_ADMIN privileges.
> For the backing file to be successfully setup, the backing file must
> implement both read_iter and write_iter file operations.
>
> The limitation on the level of filesystem stacking allowed for the
> backing file is enforced before setting up the backing file.
>
> Signed-off-by: Alessio Balsini <balsini@android.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---

[...]

> +int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map)
> +{
> +       struct file *file;
> +       struct super_block *backing_sb;
> +       struct fuse_backing *fb = NULL;
> +       int res;
> +
> +       pr_debug("%s: fd=%d flags=0x%x\n", __func__, map->fd, map->flags);
> +
> +       /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */
> +       res = -EPERM;
> +       if (!fc->passthrough || !capable(CAP_SYS_ADMIN))
> +               goto out;
> +
> +       res = -EINVAL;
> +       if (map->flags)
> +               goto out;
> +
> +       file = fget(map->fd);
> +       res = -EBADF;
> +       if (!file)
> +               goto out;
> +
> +       res = -EOPNOTSUPP;
> +       if (!file->f_op->read_iter || !file->f_op->write_iter)
> +               goto out_fput;
> +

Miklos,

I need to clarify one thing regarding this patch.
Your suggestion was:

- mapping request is done with an O_PATH fd.
- fuse_open() always opens a backing file (just like overlayfs)

The reason I did not use fget_raw() to support O_PATH fd is
because I wanted to keep the EOPNOTSUPP check above
at ioctl time rather than deferring it to open/create reply time,
when the user has no feedback of failure reasons.

We could add O_PATH support later, but I think it is going to be
more important when we support passthrough to inode operations
and the limitation of requiring a non O_PATH fd is not is not that bad.
It also serves as a proof that the server user is allowed to open the
backing file for read/write and avoid the later permission failure of
backing_file_open().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
                   ` (11 preceding siblings ...)
  2023-10-16 16:09 ` [PATCH v14 12/12] fuse: implement passthrough for readdir Amir Goldstein
@ 2023-10-19 14:33 ` Amir Goldstein
  2023-10-30 10:16   ` Miklos Szeredi
  12 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-10-19 14:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

On Mon, Oct 16, 2023 at 7:09 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Miklos,
>
> I've shared several POC branches since the posting of v13 back in May
> and played with several API choices. It is time to post v14.
>
> The API we converged to is server managed shared backing files that are
> referenced by backing id plus per-file re-opened backing_file.
>
> This model looks coherent to me. I think that the example server [3]
> demonstrates that this API is simple enough to work with.
>
> There is quite a bit of re-factored code in this version - I've actually
> declared this common code as a new vfs subsystem [stackable filesystems]
> in MAINTAINERS per Christian's request.
>
> The re-factored common code is based on overlayfs-next and Christian's
> vfs.misc branch (for the backing_file changes).
>
> I am not posting performance numbers again. Alessio has already posted
> performance numbers back in v12 and nothing has changed in this regard.
> We are using a variant of v12 patches in production and the performance
> improvement is very noticable.
>
> Bernd and Nikolaus have helped with improving running fstests on fuse
> passthrough examples.
>
> I have ran the -g auto fstests with v14 patches with the example server.
> Compared to the baseline test results with passthrough_hp, the backing
> file passthrough passes several more test, mainly tests related to data
> coherency, such as generic/451.

FWIW, baseline passthrough_hp --nocache does pass generic/451.
Not surprising considering that the test is for
"Test data integrity when mixing buffered reads and asynchronous
 direct writes a file."

>
> The following tests are the only ones that pass on baseline passthtough_hp
> and fail with my backing file passthrough example:
>
>   generic/080 generic/120 generic/193 generic/215 generic/355
>
> Those tests are failing because of missing mtime/atime/ctime updates

Some more detailed analysis:

generic/120 tests -o noatime and fails because atime is
updated (on the backing file).
This is a general FUSE issue and passthrough_hp --nocache fails
the same test (i.e. it only passed because of attribute cache).

generic/080, generic/215 both test for c/mtime updates after mapped writes.
It is not surprising that backing file passthrough fails these tests -
there is no "passthrough getattr" like overlayfs and there is no opportunity
to invalidate the FUSE inode attribute cache.

> in some use cases and failure to strip suid/sgid bits in some cases.

The suid/sgid strip failures (generic/193, generic/355) were just a silly bug.
Forgot to add file_remove_privs() in passthrough write.
I now moved it from overlayfs into the common backing_file helpers.

I don't think that the issue with mapped writes c/mtime update is a
show stopper?

Thanks,
Amir.

>
> Changes from v13 [1]:
> - rebase on 6.6-rc6 (and overlayfs and vfs next branches)
> - server managed shared backing files without auto-close mode
> - open a backing_file per fuse_file with fuse file's path and flags
> - factor out common read/write/splice/mmap helpers from overlayfs
> - factor out ioctl helpers
>
> [1] https://lore.kernel.org/r/20230519125705.598234-1-amir73il@gmail.com/
> [2] https://github.com/amir73il/linux/commits/fuse-backing-fd-v14
> [3] https://github.com/amir73il/libfuse/commits/fuse-backing-fd
>
> Amir Goldstein (12):
>   fs: prepare for stackable filesystems backing file helpers
>   fs: factor out backing_file_{read,write}_iter() helpers
>   fs: factor out backing_file_splice_{read,write}() helpers
>   fs: factor out backing_file_mmap() helper
>   fuse: factor out helper for FUSE_DEV_IOC_CLONE
>   fuse: introduce FUSE_PASSTHROUGH capability
>   fuse: pass optional backing_id in struct fuse_open_out
>   fuse: implement ioctls to manage backing files
>   fuse: implement read/write passthrough
>   fuse: implement splice_{read/write} passthrough
>   fuse: implement passthrough for mmap
>   fuse: implement passthrough for readdir
>
>  MAINTAINERS                  |   9 +
>  fs/Kconfig                   |   4 +
>  fs/Makefile                  |   1 +
>  fs/backing-file.c            | 319 ++++++++++++++++++++++++++++
>  fs/fuse/Kconfig              |  11 +
>  fs/fuse/Makefile             |   1 +
>  fs/fuse/cuse.c               |   3 +-
>  fs/fuse/dev.c                |  98 ++++++---
>  fs/fuse/dir.c                |   2 +-
>  fs/fuse/file.c               |  69 ++++--
>  fs/fuse/fuse_i.h             |  72 ++++++-
>  fs/fuse/inode.c              |  25 +++
>  fs/fuse/ioctl.c              |   3 +-
>  fs/fuse/passthrough.c        | 392 +++++++++++++++++++++++++++++++++++
>  fs/fuse/readdir.c            |  12 +-
>  fs/open.c                    |  38 ----
>  fs/overlayfs/Kconfig         |   1 +
>  fs/overlayfs/file.c          | 246 ++++------------------
>  fs/overlayfs/overlayfs.h     |   8 +-
>  fs/overlayfs/super.c         |  11 +-
>  include/linux/backing-file.h |  42 ++++
>  include/linux/fs.h           |   3 -
>  include/uapi/linux/fuse.h    |  23 +-
>  23 files changed, 1085 insertions(+), 308 deletions(-)
>  create mode 100644 fs/backing-file.c
>  create mode 100644 fs/fuse/passthrough.c
>  create mode 100644 include/linux/backing-file.h
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-19 14:33 ` [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
@ 2023-10-30 10:16   ` Miklos Szeredi
  2023-10-31 10:28     ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-10-30 10:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel

On Thu, 19 Oct 2023 at 16:33, Amir Goldstein <amir73il@gmail.com> wrote:

> generic/120 tests -o noatime and fails because atime is
> updated (on the backing file).
> This is a general FUSE issue and passthrough_hp --nocache fails
> the same test (i.e. it only passed because of attribute cache).
>
> generic/080, generic/215 both test for c/mtime updates after mapped writes.
> It is not surprising that backing file passthrough fails these tests -
> there is no "passthrough getattr" like overlayfs and there is no opportunity
> to invalidate the FUSE inode attribute cache.

This is what POSIX has to say:

"The last data modification and last file status change timestamps of
a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked
for update at some point in the interval between a write reference to
the mapped region and the next call to msync() with MS_ASYNC or
MS_SYNC for that portion of the file by any process. If there is no
such call and if the underlying file is modified as a result of a
write reference, then these timestamps shall be marked for update at
some time after the write reference."

Not sure if the test is doing msync(), but invalidating cached c/mtime
on msync() shouldn't be too hard (msync -> fsync).

While the standard doesn't seem to require updating c/mtime on
mumap(2) if there was a modification, that might also make sense in
practice.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-30 10:16   ` Miklos Szeredi
@ 2023-10-31 10:28     ` Amir Goldstein
  2023-10-31 11:16       ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-10-31 10:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Mon, Oct 30, 2023 at 12:16 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 19 Oct 2023 at 16:33, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > generic/120 tests -o noatime and fails because atime is
> > updated (on the backing file).
> > This is a general FUSE issue and passthrough_hp --nocache fails
> > the same test (i.e. it only passed because of attribute cache).
> >
> > generic/080, generic/215 both test for c/mtime updates after mapped writes.
> > It is not surprising that backing file passthrough fails these tests -
> > there is no "passthrough getattr" like overlayfs and there is no opportunity
> > to invalidate the FUSE inode attribute cache.
>
> This is what POSIX has to say:
>
> "The last data modification and last file status change timestamps of
> a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked
> for update at some point in the interval between a write reference to
> the mapped region and the next call to msync() with MS_ASYNC or
> MS_SYNC for that portion of the file by any process. If there is no
> such call and if the underlying file is modified as a result of a
> write reference, then these timestamps shall be marked for update at
> some time after the write reference."
>
> Not sure if the test is doing msync(), but invalidating cached c/mtime
> on msync() shouldn't be too hard (msync -> fsync).
>

These tests do not do msync.

Specifically, test generic/080 encodes an expectation that
c/mtime will be changed for mmaped write following mmaped read.
Not sure where this expectation is coming from?


Author: Omer Zilberberg <omzg@plexistor.com>
Date:   Wed Apr 1 15:39:36 2015 +1100

    generic: test that mmap-write updates c/mtime

    When using mmap() for file i/o, writing to the file should update
    it's c/mtime. Specifically if we first mmap-read from a page, then
    memap-write to the same page.

    This test was failing for the initial submission of DAX because
    pfn based mapping do not have an page_mkwrite called for them.
    The new Kernel patches that introduce pfn_mkwrite fixes this test.

    Test adapted from a script originally written by Dave Chinner.


> While the standard doesn't seem to require updating c/mtime on
> mumap(2) if there was a modification, that might also make sense in
> practice.

Makes sense.

I was thinking about a cache coherency model for FUSE passthough:
At any given point, if we have access to a backing file, we can compare
the backing file's inode timestamps with those of the FUSE inode.

If the timestamps differ, we do not copy them over to FUSE inode,
as overlayfs does, but we invalidate the FUSE attribute cache.
We can perform this checkup on open() release() flush() fsync()
and at other points, such as munmap() instead of unconditionally
invalidating attribute cache.

I already used tha model for atime update:

       /* Mimic atime update policy of backing inode, not the actual value */
       if (!timespec64_equal(&backing_inode->i_atime, &inode->i_atime))
               fuse_invalidate_atime(inode);

Do you think that can work?
Do you think that the server should be able to configure this behavior
or we can do it by default?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-31 10:28     ` Amir Goldstein
@ 2023-10-31 11:16       ` Miklos Szeredi
  2023-10-31 12:31         ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-10-31 11:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Tue, 31 Oct 2023 at 11:28, Amir Goldstein <amir73il@gmail.com> wrote:

> These tests do not do msync.
>
> Specifically, test generic/080 encodes an expectation that
> c/mtime will be changed for mmaped write following mmaped read.
> Not sure where this expectation is coming from?

Probably because "update on first store" is the de-facto standard on
linux (filemap_page_mkwrite -> file_update_time).

Thinking about it, the POSIX text is a bit sloppy, I think this is
what it wanted to say:

"The last data modification and last file status change timestamps of
a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked
for update at some point in the interval between the _first_ write reference to
the mapped region _since_the_last_msync()_ call and the next call to msync()
..."

Otherwise the linux behavior would be non-conforming.

> I was thinking about a cache coherency model for FUSE passthough:
> At any given point, if we have access to a backing file, we can compare
> the backing file's inode timestamps with those of the FUSE inode.

Yes, that makes sense as long as there's a 1:1 mapping between backing
files and backed files.

It's not the case for e.g. a blockdev backed fs.   But current
patchset doesn't support that mode yet, so I don't think we need to
care now.

> If the timestamps differ, we do not copy them over to FUSE inode,
> as overlayfs does, but we invalidate the FUSE attribute cache.
> We can perform this checkup on open() release() flush() fsync()
> and at other points, such as munmap() instead of unconditionally
> invalidating attribute cache.

This check can be done directly in  fuse_update_get_attr(), no?

> I already used tha model for atime update:
>
>        /* Mimic atime update policy of backing inode, not the actual value */
>        if (!timespec64_equal(&backing_inode->i_atime, &inode->i_atime))
>                fuse_invalidate_atime(inode);
>
> Do you think that can work?
> Do you think that the server should be able to configure this behavior
> or we can do it by default?

I think this can be the default, and as we generalize the passthrough
behavior we'll add the necessary controls.

Thanks,
Miklos


>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-31 11:16       ` Miklos Szeredi
@ 2023-10-31 12:31         ` Amir Goldstein
  2023-10-31 15:01           ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-10-31 12:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Tue, Oct 31, 2023 at 1:17 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 31 Oct 2023 at 11:28, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > These tests do not do msync.
> >
> > Specifically, test generic/080 encodes an expectation that
> > c/mtime will be changed for mmaped write following mmaped read.
> > Not sure where this expectation is coming from?
>
> Probably because "update on first store" is the de-facto standard on
> linux (filemap_page_mkwrite -> file_update_time).
>
> Thinking about it, the POSIX text is a bit sloppy, I think this is
> what it wanted to say:
>
> "The last data modification and last file status change timestamps of
> a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked
> for update at some point in the interval between the _first_ write reference to
> the mapped region _since_the_last_msync()_ call and the next call to msync()
> ..."
>
> Otherwise the linux behavior would be non-conforming.
>
> > I was thinking about a cache coherency model for FUSE passthough:
> > At any given point, if we have access to a backing file, we can compare
> > the backing file's inode timestamps with those of the FUSE inode.
>
> Yes, that makes sense as long as there's a 1:1 mapping between backing
> files and backed files.
>
> It's not the case for e.g. a blockdev backed fs.   But current
> patchset doesn't support that mode yet, so I don't think we need to
> care now.
>
> > If the timestamps differ, we do not copy them over to FUSE inode,
> > as overlayfs does, but we invalidate the FUSE attribute cache.
> > We can perform this checkup on open() release() flush() fsync()
> > and at other points, such as munmap() instead of unconditionally
> > invalidating attribute cache.
>
> This check can be done directly in  fuse_update_get_attr(), no?
>

Current patch set does not implement "backing inode" for FUSE inode,
so this only works for fuse_update_attributes() (with a file).
We do not do cached read/write/readdir in fuse passthrough mode
that leaves only lseek/llseek(), so I assume it's not what you meant.

IOW, we need to conditionally call fuse_invalidate_attr_mask()
at occasions that we have a backing file/inode.

> > I already used tha model for atime update:
> >
> >        /* Mimic atime update policy of backing inode, not the actual value */
> >        if (!timespec64_equal(&backing_inode->i_atime, &inode->i_atime))
> >                fuse_invalidate_atime(inode);
> >
> > Do you think that can work?
> > Do you think that the server should be able to configure this behavior
> > or we can do it by default?
>
> I think this can be the default, and as we generalize the passthrough
> behavior we'll add the necessary controls.
>

OK, I will try to add this and see if that also solved the 2 failing tests.
It should solve the mmaped write test because the tests do
stat()/open()/mmap()/mem-write/close()/stat()

So it's the close/flush that is going to update mtime, even though that's
not the POSIX semantics, it is a bit like the NFS close-to-open (cto)
semantics regarding data cache coherency.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-31 12:31         ` Amir Goldstein
@ 2023-10-31 15:01           ` Miklos Szeredi
  2023-10-31 17:44             ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-10-31 15:01 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Tue, 31 Oct 2023 at 13:31, Amir Goldstein <amir73il@gmail.com> wrote:

> Current patch set does not implement "backing inode" for FUSE inode,

What prevents us from storing a fuse_backing reference in fuse_inode
while the file(s) are open?

AFAICS it just needs a counter in fuse_inode to account the number of
open instances.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-31 15:01           ` Miklos Szeredi
@ 2023-10-31 17:44             ` Amir Goldstein
  2023-11-01 11:32               ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-10-31 17:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Tue, Oct 31, 2023 at 5:01 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 31 Oct 2023 at 13:31, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Current patch set does not implement "backing inode" for FUSE inode,
>
> What prevents us from storing a fuse_backing reference in fuse_inode
> while the file(s) are open?
>
> AFAICS it just needs a counter in fuse_inode to account the number of
> open instances.
>

The current patches do not enforce that all fuse_file of the same fuse_inode
passthrough to the same backing_file (or same backing inode).

I agree that functionally, we could enforce that, and it may be good for
"backing inode" mapping on lookup going forward
The problem is that at the time of FUSE_DEV_IOC_BACKING_OPEN,
we do not know which fuse_inode is going to be associated with the
fuse_backing object.

v13 patches had a mode that pass nodeid with the fuse_backing_map
request (inode bound mode) and refcount the fuse_backing object
from the fuse_inode (with no backing idr) at the time of the ioctl and
then -EBUSY could be returned for a conflicting backing map request.

The problem with that API was with CREATE of a new nodeid, where
the FUSE_DEV_IOC_BACKING_OPEN ioctl happens before fuse
knows about the new nodeid.

In that case, we would be able to "attach" the fuse_backing object
to fuse_inode on CREATE response. If we end up with a backing map
conflict at this point, we can return -EBUSY error to the user and forget
the inode, but the server would have no easy feedback on its mistake.
Also, -EBUSY to user would be confusing if user did not request O_EXCL.

Do you consider the described "atomic_open conflict" case an API problem?

It is true that with v14 patches that do not enforce backing inode conflicts,
the attribute coherency model that I proposed may result in attribute
cache thrashing if the backing inode of a fuse inode is ambiguous.

Do you have an idea how to handle this case elegantly?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-10-31 17:44             ` Amir Goldstein
@ 2023-11-01 11:32               ` Miklos Szeredi
  2023-11-01 13:23                 ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-01 11:32 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Tue, 31 Oct 2023 at 18:44, Amir Goldstein <amir73il@gmail.com> wrote:

> In that case, we would be able to "attach" the fuse_backing object
> to fuse_inode on CREATE response. If we end up with a backing map
> conflict at this point, we can return -EBUSY error to the user and forget
> the inode, but the server would have no easy feedback on its mistake.
> Also, -EBUSY to user would be confusing if user did not request O_EXCL.

I think -EIO is more appropriate.  Server is broken, WARN_ON_ONCE
could also be used to indicate that.

> Do you consider the described "atomic_open conflict" case an API problem?
>
> It is true that with v14 patches that do not enforce backing inode conflicts,
> the attribute coherency model that I proposed may result in attribute
> cache thrashing if the backing inode of a fuse inode is ambiguous.
>
> Do you have an idea how to handle this case elegantly?

Let me add some perspective.

Currently we have FOPEN_DIRECT_IO that disables caching.  My feeling
when dealing with this interface is that it was a mistake to make this
a property of the open file.  It should insted be a property of the
inode and all open file instances should respect this property
equally.  It makes no sense to have one file do cached reads while the
other is doing direct writes, etc.  Also it should be possible to turn
this on or off for all open file instances at any time.

Passthrough is similar, I think.  At any point in time all I/O should either be

 - cached
 - direct
 - passthrough to a specific backing file

Allowing these to be mixed leads to confusing semantics, especially
cached and non-cached

OTOH allowing passthrough to be switched on at any point in time
presents challenges.   If passthrough is turned on for an inode that
didn't have it previously means that the backing file needs to be set
up before the actual I/O.    So it's definitely more complex than just
setting up the backing at open.  But I feel that longer term we'll end
up with a better interface.

For the first version we can just bypass this whole mess, and say that
passthrough can only be turned on for the first open.  I.e. if there
are already open instances and passthrough has not yet been set up,
then FOPEN_PASSTHROUGH will be ignored.  Similarly if it has already
been set up, then the lack of FOPEN_PASSTHROUGH is also ignored.

Hmm?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 11:32               ` Miklos Szeredi
@ 2023-11-01 13:23                 ` Amir Goldstein
  2023-11-01 14:42                   ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-01 13:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 1, 2023 at 1:32 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 31 Oct 2023 at 18:44, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > In that case, we would be able to "attach" the fuse_backing object
> > to fuse_inode on CREATE response. If we end up with a backing map
> > conflict at this point, we can return -EBUSY error to the user and forget
> > the inode, but the server would have no easy feedback on its mistake.
> > Also, -EBUSY to user would be confusing if user did not request O_EXCL.
>
> I think -EIO is more appropriate.  Server is broken, WARN_ON_ONCE
> could also be used to indicate that.
>

WARN_ON is a kernel assertion - we should not use it for user possible
wrong inputs. we can use pr_warn_ratelimited().

According to your suggestion below that FOPEN_PASSTHROUGH
is an advice flag, we can simply ignore the conflicting passthrough request
and keep passthrough on the existing backing file without any feedback
to the user or server. I can live with that.

> > Do you consider the described "atomic_open conflict" case an API problem?
> >
> > It is true that with v14 patches that do not enforce backing inode conflicts,
> > the attribute coherency model that I proposed may result in attribute
> > cache thrashing if the backing inode of a fuse inode is ambiguous.
> >
> > Do you have an idea how to handle this case elegantly?
>
> Let me add some perspective.
>
> Currently we have FOPEN_DIRECT_IO that disables caching.  My feeling
> when dealing with this interface is that it was a mistake to make this
> a property of the open file.  It should insted be a property of the
> inode and all open file instances should respect this property
> equally.  It makes no sense to have one file do cached reads while the
> other is doing direct writes, etc.  Also it should be possible to turn
> this on or off for all open file instances at any time.
>
> Passthrough is similar, I think.  At any point in time all I/O should either be
>
>  - cached
>  - direct
>  - passthrough to a specific backing file
>
> Allowing these to be mixed leads to confusing semantics, especially
> cached and non-cached
>
> OTOH allowing passthrough to be switched on at any point in time
> presents challenges.   If passthrough is turned on for an inode that
> didn't have it previously means that the backing file needs to be set
> up before the actual I/O.    So it's definitely more complex than just
> setting up the backing at open.  But I feel that longer term we'll end
> up with a better interface.
>
> For the first version we can just bypass this whole mess, and say that
> passthrough can only be turned on for the first open.  I.e. if there
> are already open instances and passthrough has not yet been set up,
> then FOPEN_PASSTHROUGH will be ignored.  Similarly if it has already
> been set up, then the lack of FOPEN_PASSTHROUGH is also ignored.
>
> Hmm?

Sounds like a small change, which will work fine with the example server
as well as with my in-house server requirement for regular files.

However, I can imagine users wanting to do passthrough for read-only
fd without doing passthrough for read-write fd, for example, if data is
never manipulated by the server, but it is inspected on write.
I wonder if this model fits the existing use of Android applications with
the v12 patches.

I have one problem with my in-house server and passthough of readdir.
readdir passthrough cannot do readdirplus to populate inode cache.
For readdir of a large, read-mostly directory, cached readdir is preferred
because of readdirplus.
For a large, write-often directory, when only readdir is needed, readdir
passthough is preferred.
No one size fits all here.

My in-house server chooses whether to do readdir passthrough based
on userspace hint regarding readdir vs. readdirplus.

Since directory has no "write modes" and since FOPEN_CACHE_DIR
is going to stay per-fd, I do not really see a problem with allowing
readdir passthrough and cached/uncached readdir on the same inode.
Do you see a problem with that?

I don't mind dropping the last readdir passthrough patch for the first
version, if you want to take more time to think this over.
I'd just like to know that there is a path forward to make conditional
passthrough per fd possible in future versions.

BTW, the FUSE BPF patches that map a backing inode to
fuse inode allow fallback to server depending on BPF program result.

Thoughts?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 13:23                 ` Amir Goldstein
@ 2023-11-01 14:42                   ` Miklos Szeredi
  2023-11-01 15:06                     ` Amir Goldstein
  2023-11-29  7:25                     ` Amir Goldstein
  0 siblings, 2 replies; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-01 14:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 1 Nov 2023 at 14:23, Amir Goldstein <amir73il@gmail.com> wrote:

> However, I can imagine users wanting to do passthrough for read-only
> fd without doing passthrough for read-write fd, for example, if data is
> never manipulated by the server, but it is inspected on write.

Okay, that could make sense in a read-mostly use case.  But I think
that special case could be enabled with FOPEN_DIRECT_IO since we have
that anyway.  I.e. FOPEN_DIRECT_IO would override the per-inode state,
which is what it does now.

What doesn't make sense is to mix cached I/O with non-cached (direct
or passthrough) I/O.

> I have one problem with my in-house server and passthough of readdir.
> readdir passthrough cannot do readdirplus to populate inode cache.
> For readdir of a large, read-mostly directory, cached readdir is preferred
> because of readdirplus.
> For a large, write-often directory, when only readdir is needed, readdir
> passthough is preferred.
> No one size fits all here.

But in this case the cache part of readdirplus is not useful, only the
prepopulation part.  So why cache in that case?

Now using readdirplus for prepopulation might be the right interface,
but it also might make sense to use it _only_ for that and use
passthrough for the actual readdir.

> I don't mind dropping the last readdir passthrough patch for the first
> version, if you want to take more time to think this over.
> I'd just like to know that there is a path forward to make conditional
> passthrough per fd possible in future versions.

I think first version should just do regular file passthrough.

> BTW, the FUSE BPF patches that map a backing inode to
> fuse inode allow fallback to server depending on BPF program result.

Yep, that also makes sense.  What we need to make sure is that cache
and non-cache access are not mixed, because that way lies madness.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 14:42                   ` Miklos Szeredi
@ 2023-11-01 15:06                     ` Amir Goldstein
  2023-11-01 15:25                       ` Miklos Szeredi
  2023-11-29  7:25                     ` Amir Goldstein
  1 sibling, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-01 15:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 1, 2023 at 4:42 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 1 Nov 2023 at 14:23, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > However, I can imagine users wanting to do passthrough for read-only
> > fd without doing passthrough for read-write fd, for example, if data is
> > never manipulated by the server, but it is inspected on write.
>
> Okay, that could make sense in a read-mostly use case.  But I think
> that special case could be enabled with FOPEN_DIRECT_IO since we have
> that anyway.  I.e. FOPEN_DIRECT_IO would override the per-inode state,
> which is what it does now.
>
> What doesn't make sense is to mix cached I/O with non-cached (direct
> or passthrough) I/O.
>

That can work.

> > I have one problem with my in-house server and passthough of readdir.
> > readdir passthrough cannot do readdirplus to populate inode cache.
> > For readdir of a large, read-mostly directory, cached readdir is preferred
> > because of readdirplus.
> > For a large, write-often directory, when only readdir is needed, readdir
> > passthough is preferred.
> > No one size fits all here.
>
> But in this case the cache part of readdirplus is not useful, only the
> prepopulation part.  So why cache in that case?
>

In my use case, most users use cached readdirplus, so cache is important.
There is one user (samba case insensitive create) that opts-out of
readdirplus.

This user gets passthrough readdir without populating cache nor
prepopulate inodes, which is important because in that use case
(file create) readdir cache always gets invalidated.

This user does not contribute to cache populate of other readdirplus
users and does not use cache populated by other readdirplus users.

> Now using readdirplus for prepopulation might be the right interface,
> but it also might make sense to use it _only_ for that and use
> passthrough for the actual readdir.
>

That would make sense if readdirplus request is sent to server
in parallel to executing readdir passthrough and readdirplus
response does the prepopulation asynchronously.

> > I don't mind dropping the last readdir passthrough patch for the first
> > version, if you want to take more time to think this over.
> > I'd just like to know that there is a path forward to make conditional
> > passthrough per fd possible in future versions.
>
> I think first version should just do regular file passthrough.
>

OK.

> > BTW, the FUSE BPF patches that map a backing inode to
> > fuse inode allow fallback to server depending on BPF program result.
>
> Yep, that also makes sense.  What we need to make sure is that cache
> and non-cache access are not mixed, because that way lies madness.
>

Are you also referring to mixing cached and uncached readdir?
because that seems fine to me.

I will try to add these rules to FOPEN_PASSTHROUGH:
- ignore request on backing inode conflict
- ignore request if inode is already open in caching mode
- make FOPEN_PASSTHROUGH implicit if inode is already open
  in passthrough *unless* file is open with FOPEN_DIRECT_IO

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 15:06                     ` Amir Goldstein
@ 2023-11-01 15:25                       ` Miklos Szeredi
  2023-11-01 18:32                         ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-01 15:25 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 1 Nov 2023 at 16:06, Amir Goldstein <amir73il@gmail.com> wrote:

> That would make sense if readdirplus request is sent to server
> in parallel to executing readdir passthrough and readdirplus
> response does the prepopulation asynchronously.

Yes, that's fine, and it might even make sense to do that regardless
of passthrough.  Readdirplus is similar to read-ahead in this regard.

> Are you also referring to mixing cached and uncached readdir?
> because that seems fine to me.

But is it really fine?  The server must be very careful to prevent
inconsistency between the cached and the uncached readdir, which could
lead to really surprising results.  I just see no point in caching if
the same result is available from a backing directory.

> I will try to add these rules to FOPEN_PASSTHROUGH:
> - ignore request on backing inode conflict
> - ignore request if inode is already open in caching mode
> - make FOPEN_PASSTHROUGH implicit if inode is already open
>   in passthrough *unless* file is open with FOPEN_DIRECT_IO

Sounds good.

There's still this case:

 - file is opened in non-passthrough caching mode, and is being read/written
 - at the same time the file is opened in passthrough mode

The proper thing would be to switch the first file to passthrough mode
while it's open.  It adds some complexity but I think it's worth it.

Thanks,
Miklos



>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 15:25                       ` Miklos Szeredi
@ 2023-11-01 18:32                         ` Amir Goldstein
  2023-11-02 10:46                           ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-01 18:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 1, 2023 at 5:25 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 1 Nov 2023 at 16:06, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > That would make sense if readdirplus request is sent to server
> > in parallel to executing readdir passthrough and readdirplus
> > response does the prepopulation asynchronously.
>
> Yes, that's fine, and it might even make sense to do that regardless
> of passthrough.  Readdirplus is similar to read-ahead in this regard.
>
> > Are you also referring to mixing cached and uncached readdir?
> > because that seems fine to me.
>
> But is it really fine?  The server must be very careful to prevent
> inconsistency between the cached and the uncached readdir, which could
> lead to really surprising results.  I just see no point in caching if
> the same result is available from a backing directory.
>

I see. so if dir has backing inode, we can always
disable FOPEN_CACHE_DIR (as my patch already does)
but instead of alway doing passthrough readdir, we employ the
heuristic whether or not to do readdirplus.
If not doing readdirplus, we can do fuse_passthrough_readdir().
In the future, we could do async readdirplus and always call
fuse_passthrough_readdir() if we have backing inode.

For now, I will just remove the readdir passthough patch.

> > I will try to add these rules to FOPEN_PASSTHROUGH:
> > - ignore request on backing inode conflict
> > - ignore request if inode is already open in caching mode
> > - make FOPEN_PASSTHROUGH implicit if inode is already open
> >   in passthrough *unless* file is open with FOPEN_DIRECT_IO
>
> Sounds good.
>
> There's still this case:
>
>  - file is opened in non-passthrough caching mode, and is being read/written
>  - at the same time the file is opened in passthrough mode
>
> The proper thing would be to switch the first file to passthrough mode
> while it's open.  It adds some complexity but I think it's worth it.

Remember that we would actually need to do backing_file_open()
for all existing open files of the inode.

Also, after the server calls FUSE_DEV_IOC_BACKING_CLOSE,
will the fuse_backing object still be referenced from the inode
(as it was in v13)? and then properly closed only on the last file
close on that inode?

Otherwise, new file opens without explicit FOPEN_PASSTHROUGH
will not have a fuse_backing object to open the backing_file from.

I am not convinced that this complexity is a must for the first version.
If the server always sets FOPEN_PASSTHROUGH (as I expect many
servers will) this is unneeded complexity.

It seems a *lot* easier to do what you suggested and ignore
FOPEN_PASSTHROUGH if the server is not being consistent
with the FOPEN_ flags it uses.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 18:32                         ` Amir Goldstein
@ 2023-11-02 10:46                           ` Miklos Szeredi
  2023-11-02 13:07                             ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-02 10:46 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 1 Nov 2023 at 19:32, Amir Goldstein <amir73il@gmail.com> wrote:

> I see. so if dir has backing inode, we can always
> disable FOPEN_CACHE_DIR (as my patch already does)
> but instead of alway doing passthrough readdir, we employ the
> heuristic whether or not to do readdirplus.
> If not doing readdirplus, we can do fuse_passthrough_readdir().

Okay.

> In the future, we could do async readdirplus and always call
> fuse_passthrough_readdir() if we have backing inode.

Yes.

> For now, I will just remove the readdir passthough patch.

Yes, let try to do the minimal useful thing first.

>
> Remember that we would actually need to do backing_file_open()
> for all existing open files of the inode.

I know.

> Also, after the server calls FUSE_DEV_IOC_BACKING_CLOSE,
> will the fuse_backing object still be referenced from the inode
> (as it was in v13)? and then properly closed only on the last file
> close on that inode?

Yes, those seem the most intuitive semantics.

> I am not convinced that this complexity is a must for the first version.
> If the server always sets FOPEN_PASSTHROUGH (as I expect many
> servers will) this is unneeded complexity.
>
> It seems a *lot* easier to do what you suggested and ignore
> FOPEN_PASSTHROUGH if the server is not being consistent
> with the FOPEN_ flags it uses.

The problem with ignoring is that we can't change the semantics later.

So I think it would be better to enforce consistency such that if
there's already an open file, new open files will have to have the
same FOPEN_PASSTHROUGH state, and just reject with EIO if not.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-02 10:46                           ` Miklos Szeredi
@ 2023-11-02 13:07                             ` Amir Goldstein
  2023-11-02 13:13                               ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-02 13:07 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Thu, Nov 2, 2023 at 12:46 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> >
> > Remember that we would actually need to do backing_file_open()
> > for all existing open files of the inode.
>
> I know.
>
> > Also, after the server calls FUSE_DEV_IOC_BACKING_CLOSE,
> > will the fuse_backing object still be referenced from the inode
> > (as it was in v13)? and then properly closed only on the last file
> > close on that inode?
>
> Yes, those seem the most intuitive semantics.
>

Just to be clear, at the last close for an inode, we would check
if attribute cache needs to be invalidate and the inode will return
to "neutral" mode, when server could legally switch between
caching and passthrough mode.

This part is critical to my in-house server, because when files are
not open, their content may become evicted and they may no longer
be passed through to.

> > I am not convinced that this complexity is a must for the first version.
> > If the server always sets FOPEN_PASSTHROUGH (as I expect many
> > servers will) this is unneeded complexity.
> >
> > It seems a *lot* easier to do what you suggested and ignore
> > FOPEN_PASSTHROUGH if the server is not being consistent
> > with the FOPEN_ flags it uses.
>
> The problem with ignoring is that we can't change the semantics later.
>
> So I think it would be better to enforce consistency such that if
> there's already an open file, new open files will have to have the
> same FOPEN_PASSTHROUGH state, and just reject with EIO if not.
>

EIO works for me.
Just as simple.
Will try to get this ready for early 6.8 cycle.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-02 13:07                             ` Amir Goldstein
@ 2023-11-02 13:13                               ` Miklos Szeredi
  2024-01-26 12:13                                 ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-02 13:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Thu, 2 Nov 2023 at 14:08, Amir Goldstein <amir73il@gmail.com> wrote:

> Just to be clear, at the last close for an inode, we would check
> if attribute cache needs to be invalidate and the inode will return
> to "neutral" mode, when server could legally switch between
> caching and passthrough mode.

Exactly.

> EIO works for me.
> Just as simple.
> Will try to get this ready for early 6.8 cycle.

Sounds great.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-01 14:42                   ` Miklos Szeredi
  2023-11-01 15:06                     ` Amir Goldstein
@ 2023-11-29  7:25                     ` Amir Goldstein
  2023-11-29 14:13                       ` Miklos Szeredi
  1 sibling, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-29  7:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 1, 2023 at 4:42 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 1 Nov 2023 at 14:23, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > However, I can imagine users wanting to do passthrough for read-only
> > fd without doing passthrough for read-write fd, for example, if data is
> > never manipulated by the server, but it is inspected on write.
>
> Okay, that could make sense in a read-mostly use case.  But I think
> that special case could be enabled with FOPEN_DIRECT_IO since we have
> that anyway.  I.e. FOPEN_DIRECT_IO would override the per-inode state,
> which is what it does now.
>
> What doesn't make sense is to mix cached I/O with non-cached (direct
> or passthrough) I/O.
>

I now realized that FOPEN_DIRECT_IO still allows page cache read
and write-through via mmap.

That hinders the idea that disallowing a mix of cached+passthrough
opens on an inode is enough.

My proposed solution is to change the semantics with the init flag
FUSE_PASSTHROUGH to disallow mmap on FOPEN_DIRECT_IO
files.

So with a filesystem that supports FUSE_PASSTHROUGH,
FOPEN_DIRECT_IO files can only be used for direct read/write io and
mmap maps either the fuse inode pages or the backing inode pages.

This also means that FUSE_DIRECT_IO_RELAX conflicts with
FUSE_PASSTHROUGH.

Should we also deny mixing FUSE_HAS_INODE_DAX with
FUSE_PASSTHROUGH? Anything else from virtiofs?

While I have your attention, I am trying to consolidate the validation
of FOPEN_ mode vs inode state into fuse_finish_open().

What do you think about this partial patch that also moves
fuse_finish_open() to after fuse_release_nowrite().
Is it legit?

[patch to fuse_create_open() omitted for brevity]

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -195,8 +195,8 @@ static void fuse_link_write_file(struct file *file)
        spin_unlock(&fi->lock);
 }

-void fuse_finish_open(struct inode *inode, struct file *file)
+int fuse_finish_open(struct inode *inode, struct file *file)
 {
        struct fuse_file *ff = file->private_data;
        struct fuse_conn *fc = get_fuse_conn(inode);
@@ -215,6 +215,9 @@ void fuse_finish_open(struct inode *inode, struct
file *file,
                spin_unlock(&fi->lock);
                file_update_time(file);
                fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
+               truncate_pagecache(inode, 0);
+       } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) {
+               invalidate_inode_pages2(inode->i_mapping);
        }
        if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
                fuse_link_write_file(file);
        }
+
+       return 0;
 }
@@ -253,18 +256,16 @@ int fuse_open_common(struct inode *inode, struct
file *file, bool isdir)
                fuse_set_nowrite(inode);

        err = fuse_do_open(fm, get_node_id(inode), file, isdir);
-       if (!err)
-               fuse_finish_open(inode, file);

        if (is_wb_truncate || dax_truncate)
                fuse_release_nowrite(inode);
+
        if (!err) {
                struct fuse_file *ff = file->private_data;

-               if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
-                       truncate_pagecache(inode, 0);
-               else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
-                       invalidate_inode_pages2(inode->i_mapping);
+               err = fuse_finish_open(inode, file);
+               if (err)
+                       fuse_sync_release(get_fuse_inode(inode), ff, flags);
        }
        if (dax_truncate)
                filemap_invalidate_unlock(inode->i_mapping);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29  7:25                     ` Amir Goldstein
@ 2023-11-29 14:13                       ` Miklos Szeredi
  2023-11-29 15:06                         ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-29 14:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 29 Nov 2023 at 08:25, Amir Goldstein <amir73il@gmail.com> wrote:

> My proposed solution is to change the semantics with the init flag
> FUSE_PASSTHROUGH to disallow mmap on FOPEN_DIRECT_IO
> files.

Why?  FOPEN_DIRECT_IO and FUSE_PASSTHROUGH should mix much more
readily than FOPEN_DIRECT_IO with page cache.

> So with a filesystem that supports FUSE_PASSTHROUGH,
> FOPEN_DIRECT_IO files can only be used for direct read/write io and
> mmap maps either the fuse inode pages or the backing inode pages.
>
> This also means that FUSE_DIRECT_IO_RELAX conflicts with
> FUSE_PASSTHROUGH.
>
> Should we also deny mixing FUSE_HAS_INODE_DAX with
> FUSE_PASSTHROUGH? Anything else from virtiofs?

Virtiofs/DAX and passthrough are similar features in totally different
environments.   We just need to make sure the code paths don't
collide.

> While I have your attention, I am trying to consolidate the validation
> of FOPEN_ mode vs inode state into fuse_finish_open().
>
> What do you think about this partial patch that also moves
> fuse_finish_open() to after fuse_release_nowrite().
> Is it legit?

I suspect it won't work due to the i_size reset being in
fuse_finish_open().  But I feel there's not enough context to
understand why this is needed.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 14:13                       ` Miklos Szeredi
@ 2023-11-29 15:06                         ` Amir Goldstein
  2023-11-29 15:21                           ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-29 15:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 29, 2023 at 4:14 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 29 Nov 2023 at 08:25, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > My proposed solution is to change the semantics with the init flag
> > FUSE_PASSTHROUGH to disallow mmap on FOPEN_DIRECT_IO
> > files.
>
> Why?  FOPEN_DIRECT_IO and FUSE_PASSTHROUGH should mix much more
> readily than FOPEN_DIRECT_IO with page cache.
>

Am I misunderstanding how mmap works with FOPEN_DIRECT_IO file?
My understanding is that mmap of FOPEN_DIRECT_IO reads/writes
from page cache of fuse inode.

To clarify, the plan is to never allow mixing open of passthrough and
cached files on the same inode.

It is allowed to open FOPEN_DIRECT_IO file for inode either in cached
or passthrough mode, but it is NOT allowed to mmap a FOPEN_DIRECT_IO
file for inode in passthrough mode.

However, if inode only has file open in FOPEN_DIRECT_IO mode, then inode
mode is neutral. If we allow mmap in this state then a later open in passthourgh
mode and mmap in passthrough mode will collide with the direct mode mmap.

Therefore, my proposal is than when filesystem is FUSE_PASSTHROUGH capable,
only passthrough file and cached file may be mmaped, but never allow to
mmap a FOPEN_DIRECT_IO file.

Does that make sense?


> > So with a filesystem that supports FUSE_PASSTHROUGH,
> > FOPEN_DIRECT_IO files can only be used for direct read/write io and
> > mmap maps either the fuse inode pages or the backing inode pages.
> >
> > This also means that FUSE_DIRECT_IO_RELAX conflicts with
> > FUSE_PASSTHROUGH.
> >
> > Should we also deny mixing FUSE_HAS_INODE_DAX with
> > FUSE_PASSTHROUGH? Anything else from virtiofs?
>
> Virtiofs/DAX and passthrough are similar features in totally different
> environments.   We just need to make sure the code paths don't
> collide.
>
> > While I have your attention, I am trying to consolidate the validation
> > of FOPEN_ mode vs inode state into fuse_finish_open().
> >
> > What do you think about this partial patch that also moves
> > fuse_finish_open() to after fuse_release_nowrite().
> > Is it legit?
>
> I suspect it won't work due to the i_size reset being in
> fuse_finish_open().  But I feel there's not enough context to
> understand why this is needed.
>

the re-order of fuse_finish_open() and fuse_release_nowrite() is not
needed - it just makes the code a little cleaner.

I will try without reorder.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 15:06                         ` Amir Goldstein
@ 2023-11-29 15:21                           ` Miklos Szeredi
  2023-11-29 15:52                             ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-29 15:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 29 Nov 2023 at 16:06, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Nov 29, 2023 at 4:14 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 29 Nov 2023 at 08:25, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > My proposed solution is to change the semantics with the init flag
> > > FUSE_PASSTHROUGH to disallow mmap on FOPEN_DIRECT_IO
> > > files.
> >
> > Why?  FOPEN_DIRECT_IO and FUSE_PASSTHROUGH should mix much more
> > readily than FOPEN_DIRECT_IO with page cache.
> >
>
> Am I misunderstanding how mmap works with FOPEN_DIRECT_IO file?
> My understanding is that mmap of FOPEN_DIRECT_IO reads/writes
> from page cache of fuse inode.
>
> To clarify, the plan is to never allow mixing open of passthrough and
> cached files on the same inode.

Yep.

But passthrough mmap + direct I/O should work, no?

> It is allowed to open FOPEN_DIRECT_IO file for inode either in cached
> or passthrough mode, but it is NOT allowed to mmap a FOPEN_DIRECT_IO
> file for inode in passthrough mode.
>
> However, if inode only has file open in FOPEN_DIRECT_IO mode, then inode
> mode is neutral. If we allow mmap in this state then a later open in passthourgh
> mode and mmap in passthrough mode will collide with the direct mode mmap.

We can keep track of when there are any page cache mmaps.  Does that
not solve this?

>
> Therefore, my proposal is than when filesystem is FUSE_PASSTHROUGH capable,
> only passthrough file and cached file may be mmaped, but never allow to
> mmap a FOPEN_DIRECT_IO file.
>
> Does that make sense?

I'm not sure I understand how this is supposed to work.   Disallowing
mmap will break applications.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 15:21                           ` Miklos Szeredi
@ 2023-11-29 15:52                             ` Amir Goldstein
  2023-11-29 16:55                               ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-29 15:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 29, 2023 at 5:21 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 29 Nov 2023 at 16:06, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Nov 29, 2023 at 4:14 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 29 Nov 2023 at 08:25, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > My proposed solution is to change the semantics with the init flag
> > > > FUSE_PASSTHROUGH to disallow mmap on FOPEN_DIRECT_IO
> > > > files.
> > >
> > > Why?  FOPEN_DIRECT_IO and FUSE_PASSTHROUGH should mix much more
> > > readily than FOPEN_DIRECT_IO with page cache.
> > >
> >
> > Am I misunderstanding how mmap works with FOPEN_DIRECT_IO file?
> > My understanding is that mmap of FOPEN_DIRECT_IO reads/writes
> > from page cache of fuse inode.
> >
> > To clarify, the plan is to never allow mixing open of passthrough and
> > cached files on the same inode.
>
> Yep.
>
> But passthrough mmap + direct I/O should work, no?
>

direct I/O read()/write() is never a problem.

The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
is denied?


> > It is allowed to open FOPEN_DIRECT_IO file for inode either in cached
> > or passthrough mode, but it is NOT allowed to mmap a FOPEN_DIRECT_IO
> > file for inode in passthrough mode.
> >
> > However, if inode only has file open in FOPEN_DIRECT_IO mode, then inode
> > mode is neutral. If we allow mmap in this state then a later open in passthourgh
> > mode and mmap in passthrough mode will collide with the direct mode mmap.
>
> We can keep track of when there are any page cache mmaps.  Does that
> not solve this?
>

A bit more challenging, because we will need to track unmounts, or at
least track
"was_cached_mmaped" state per file, but doable.

> >
> > Therefore, my proposal is than when filesystem is FUSE_PASSTHROUGH capable,
> > only passthrough file and cached file may be mmaped, but never allow to
> > mmap a FOPEN_DIRECT_IO file.
> >
> > Does that make sense?
>
> I'm not sure I understand how this is supposed to work.   Disallowing
> mmap will break applications.

How is this different from existing -ENODEV in fuse_file_mmap() for
!fc->direct_io_relax? What am I missing?
Is it because VM_MAYSHARE is far less common for applications?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 15:52                             ` Amir Goldstein
@ 2023-11-29 16:55                               ` Miklos Szeredi
  2023-11-29 17:39                                 ` Amir Goldstein
  2023-12-06  9:59                                 ` Amir Goldstein
  0 siblings, 2 replies; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-29 16:55 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:

> direct I/O read()/write() is never a problem.
>
> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?

I think it should.

> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> is denied?

What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?

> A bit more challenging, because we will need to track unmounts, or at
> least track
> "was_cached_mmaped" state per file, but doable.

Tracking unmaps via fuse_vma_close() should not be difficult.


>
> > >
> > > Therefore, my proposal is than when filesystem is FUSE_PASSTHROUGH capable,
> > > only passthrough file and cached file may be mmaped, but never allow to
> > > mmap a FOPEN_DIRECT_IO file.
> > >
> > > Does that make sense?
> >
> > I'm not sure I understand how this is supposed to work.   Disallowing
> > mmap will break applications.
>
> How is this different from existing -ENODEV in fuse_file_mmap() for
> !fc->direct_io_relax? What am I missing?
> Is it because VM_MAYSHARE is far less common for applications?

Yes, MAP_SHARE is far less common than MAP_PRIVATE.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 16:55                               ` Miklos Szeredi
@ 2023-11-29 17:39                                 ` Amir Goldstein
  2023-11-29 20:46                                   ` Bernd Schubert
  2023-12-06  9:59                                 ` Amir Goldstein
  1 sibling, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-29 17:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > direct I/O read()/write() is never a problem.
> >
> > The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> > when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>
> I think it should.
>
> > or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> > vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> > is denied?
>
> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>

I don't have a use case. That's why I was wondering if we should
support it at all, but will try to make it work.

> > A bit more challenging, because we will need to track unmounts, or at
> > least track
> > "was_cached_mmaped" state per file, but doable.
>
> Tracking unmaps via fuse_vma_close() should not be difficult.
>

OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
always prevents an inode from being "neutral".

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 17:39                                 ` Amir Goldstein
@ 2023-11-29 20:46                                   ` Bernd Schubert
  2023-11-29 21:39                                     ` Antonio SJ Musumeci
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schubert @ 2023-11-29 20:46 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner



On 11/29/23 18:39, Amir Goldstein wrote:
> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> direct I/O read()/write() is never a problem.
>>>
>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>>
>> I think it should.
>>
>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
>>> is denied?
>>
>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>>
> 
> I don't have a use case. That's why I was wondering if we should
> support it at all, but will try to make it work.

What is actually the use case for FOPEN_DIRECT_IO and passthrough? 
Avoiding double page cache?

> 
>>> A bit more challenging, because we will need to track unmounts, or at
>>> least track
>>> "was_cached_mmaped" state per file, but doable.
>>
>> Tracking unmaps via fuse_vma_close() should not be difficult.
>>
> 
> OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
> always prevents an inode from being "neutral".
> 


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 20:46                                   ` Bernd Schubert
@ 2023-11-29 21:39                                     ` Antonio SJ Musumeci
  2023-11-29 22:01                                       ` Bernd Schubert
  2023-11-30  7:12                                       ` Amir Goldstein
  0 siblings, 2 replies; 48+ messages in thread
From: Antonio SJ Musumeci @ 2023-11-29 21:39 UTC (permalink / raw)
  To: Bernd Schubert, Amir Goldstein, Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On 11/29/23 14:46, Bernd Schubert wrote:
> 
> 
> On 11/29/23 18:39, Amir Goldstein wrote:
>> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>>>
>>>> direct I/O read()/write() is never a problem.
>>>>
>>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
>>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>>>
>>> I think it should.
>>>
>>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
>>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
>>>> is denied?
>>>
>>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>>>
>>
>> I don't have a use case. That's why I was wondering if we should
>> support it at all, but will try to make it work.
> 
> What is actually the use case for FOPEN_DIRECT_IO and passthrough?
> Avoiding double page cache?
> 
>>
>>>> A bit more challenging, because we will need to track unmounts, or at
>>>> least track
>>>> "was_cached_mmaped" state per file, but doable.
>>>
>>> Tracking unmaps via fuse_vma_close() should not be difficult.
>>>
>>
>> OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
>> always prevents an inode from being "neutral".
>>
> 
> 
> Thanks,
> Bernd
> 

 > Avoiding double page cache?

Currently my users enable direct_io because 1) it is typically 
materially faster than not 2) avoiding double page caching (it is a 
union filesystem).

The only real reason people disable direct_io is because many apps need 
shared mmap. I've implemented a mode to lookup details about the 
requesting app and optionally disable direct_io for apps which are known 
to need shared mmap but that isn't ideal. The relaxed mode being 
discussed would likely be more performant and more transparent to the 
user. That transparency is nice if that can continue as it is already 
pretty difficult to explain all these options to the layman.

Offtopic: What happens in passthrough mode when an error occurs? 
Currently I have a number of behaviors relying on the fact that I can 
intercept and respond to errors. I think my users will gladly give them 
up for near native io perf but if they can get both it would be nice.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 21:39                                     ` Antonio SJ Musumeci
@ 2023-11-29 22:01                                       ` Bernd Schubert
  2023-11-30  7:29                                         ` Amir Goldstein
  2023-11-30  7:12                                       ` Amir Goldstein
  1 sibling, 1 reply; 48+ messages in thread
From: Bernd Schubert @ 2023-11-29 22:01 UTC (permalink / raw)
  To: Antonio SJ Musumeci, Amir Goldstein, Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner



On 11/29/23 22:39, Antonio SJ Musumeci wrote:
> On 11/29/23 14:46, Bernd Schubert wrote:
>>
>>
>> On 11/29/23 18:39, Amir Goldstein wrote:
>>> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>> direct I/O read()/write() is never a problem.
>>>>>
>>>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
>>>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>>>>
>>>> I think it should.
>>>>
>>>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
>>>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
>>>>> is denied?
>>>>
>>>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>>>>
>>>
>>> I don't have a use case. That's why I was wondering if we should
>>> support it at all, but will try to make it work.
>>
>> What is actually the use case for FOPEN_DIRECT_IO and passthrough?
>> Avoiding double page cache?
>>
>>>
>>>>> A bit more challenging, because we will need to track unmounts, or at
>>>>> least track
>>>>> "was_cached_mmaped" state per file, but doable.
>>>>
>>>> Tracking unmaps via fuse_vma_close() should not be difficult.
>>>>
>>>
>>> OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
>>> always prevents an inode from being "neutral".
>>>
>>
>>
>> Thanks,
>> Bernd
>>
> 
>   > Avoiding double page cache?
> 
> Currently my users enable direct_io because 1) it is typically
> materially faster than not 2) avoiding double page caching (it is a
> union filesystem).

3) You want coherency for network file systems (our use case).

So performance kind of means it is preferred to have it enabled for 
passthrough. And with that MAP_SHARED gets rather important, imho. I 
don't know if recent gcc versions still do it, but gcc used to write 
files using MAP_SHARED. In the HPC AI world python tools also tend to do 
IO with MAP_SHARED.

> 
> The only real reason people disable direct_io is because many apps need
> shared mmap. I've implemented a mode to lookup details about the
> requesting app and optionally disable direct_io for apps which are known
> to need shared mmap but that isn't ideal. The relaxed mode being
> discussed would likely be more performant and more transparent to the
> user. That transparency is nice if that can continue as it is already
> pretty difficult to explain all these options to the layman.
> 
> Offtopic: What happens in passthrough mode when an error occurs?
> Currently I have a number of behaviors relying on the fact that I can
> intercept and respond to errors. I think my users will gladly give them
> up for near native io perf but if they can get both it would be nice.
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 21:39                                     ` Antonio SJ Musumeci
  2023-11-29 22:01                                       ` Bernd Schubert
@ 2023-11-30  7:12                                       ` Amir Goldstein
  2023-11-30  8:17                                         ` Miklos Szeredi
  1 sibling, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-11-30  7:12 UTC (permalink / raw)
  To: Antonio SJ Musumeci
  Cc: Bernd Schubert, Miklos Szeredi, Daniel Rosenberg, Paul Lawrence,
	Alessio Balsini, Christian Brauner, fuse-devel, linux-fsdevel,
	Dave Chinner

On Wed, Nov 29, 2023 at 11:39 PM Antonio SJ Musumeci
<trapexit@spawn.link> wrote:
>
> On 11/29/23 14:46, Bernd Schubert wrote:
> >
> >
> > On 11/29/23 18:39, Amir Goldstein wrote:
> >> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>>
> >>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
> >>>
> >>>> direct I/O read()/write() is never a problem.
> >>>>
> >>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> >>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
> >>>
> >>> I think it should.
> >>>
> >>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> >>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> >>>> is denied?
> >>>
> >>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
> >>>
> >>
> >> I don't have a use case. That's why I was wondering if we should
> >> support it at all, but will try to make it work.
> >
> > What is actually the use case for FOPEN_DIRECT_IO and passthrough?
> > Avoiding double page cache?
> >
> >>
> >>>> A bit more challenging, because we will need to track unmounts, or at
> >>>> least track
> >>>> "was_cached_mmaped" state per file, but doable.
> >>>
> >>> Tracking unmaps via fuse_vma_close() should not be difficult.
> >>>
> >>
> >> OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
> >> always prevents an inode from being "neutral".
> >>
> >
> >
> > Thanks,
> > Bernd
> >
>
>  > Avoiding double page cache?
>
> Currently my users enable direct_io because 1) it is typically
> materially faster than not 2) avoiding double page caching (it is a
> union filesystem).
>
> The only real reason people disable direct_io is because many apps need
> shared mmap. I've implemented a mode to lookup details about the
> requesting app and optionally disable direct_io for apps which are known
> to need shared mmap but that isn't ideal. The relaxed mode being
> discussed would likely be more performant and more transparent to the
> user. That transparency is nice if that can continue as it is already
> pretty difficult to explain all these options to the layman.
>
> Offtopic: What happens in passthrough mode when an error occurs?

passthrough is passthrough, in sickness and in health...

whatever read/write/splice on the real file returns, that is what application
will get for read/write/splice on the fuse file.

Regarding passthrough mmap - this means that the memory is really
mapped to the real file (in sickness and in health...).

The semantics of whether we do "normal" mmap or passthrough mmap
on a file that the server opened as FOPEN_DIRECT_IO will be:

If there is any file open for passthrough on this inode, mmap() will be to
the real inode page cache, regardless of FOPEN_DIRECT_IO.

Otherwise, whether there is a file open in cached mode or not, mmap()
will be to fuse inode page cache, regardless of FOPEN_DIRECT_IO.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 22:01                                       ` Bernd Schubert
@ 2023-11-30  7:29                                         ` Amir Goldstein
  0 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2023-11-30  7:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Antonio SJ Musumeci, Miklos Szeredi, Daniel Rosenberg,
	Paul Lawrence, Alessio Balsini, Christian Brauner, fuse-devel,
	linux-fsdevel, Dave Chinner

On Thu, Nov 30, 2023 at 12:01 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 11/29/23 22:39, Antonio SJ Musumeci wrote:
> > On 11/29/23 14:46, Bernd Schubert wrote:
> >>
> >>
> >> On 11/29/23 18:39, Amir Goldstein wrote:
> >>> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>>>
> >>>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
> >>>>
> >>>>> direct I/O read()/write() is never a problem.
> >>>>>
> >>>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> >>>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
> >>>>
> >>>> I think it should.
> >>>>
> >>>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> >>>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> >>>>> is denied?
> >>>>
> >>>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
> >>>>
> >>>
> >>> I don't have a use case. That's why I was wondering if we should
> >>> support it at all, but will try to make it work.
> >>
> >> What is actually the use case for FOPEN_DIRECT_IO and passthrough?
> >> Avoiding double page cache?
> >>
> >>>
> >>>>> A bit more challenging, because we will need to track unmounts, or at
> >>>>> least track
> >>>>> "was_cached_mmaped" state per file, but doable.
> >>>>
> >>>> Tracking unmaps via fuse_vma_close() should not be difficult.
> >>>>
> >>>
> >>> OK. so any existing mmap, whether on FOPEN_DIRECT_IO or not
> >>> always prevents an inode from being "neutral".
> >>>
> >>
> >>
> >> Thanks,
> >> Bernd
> >>
> >
> >   > Avoiding double page cache?
> >
> > Currently my users enable direct_io because 1) it is typically
> > materially faster than not 2) avoiding double page caching (it is a
> > union filesystem).
>
> 3) You want coherency for network file systems (our use case).
>
> So performance kind of means it is preferred to have it enabled for
> passthrough. And with that MAP_SHARED gets rather important, imho. I
> don't know if recent gcc versions still do it, but gcc used to write
> files using MAP_SHARED. In the HPC AI world python tools also tend to do
> IO with MAP_SHARED.
>

I see. good to know.
That means that an inode in passthrough mode should always imply
direct_io_relax behavior.

The way that passthrough mode and network file systems use cases can
overlap is by per-inode basis, which is the typical case for HSM (out use case):
- some inodes are local and some are remote
- local inodes are open passthrough mode
- remote inodes are open in direct or cached mode
- inode can change state local => remote (a.k.a local storage evict) when there
  are no files open on the inode
- inode can change state remote => local if there are no files open in
  cached mode and no mmaps (*)
- specifically, remote files open in direct mode, do not block the transition
  remote => local, where later opens will be in passthrough mode, but
  the files already open in direct mode will not enjoy the performance boost

* If we relax the last condition we will end up with cache incoherency
   that exists in overlayfs after copy up, when lower file is still mmaped

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-30  7:12                                       ` Amir Goldstein
@ 2023-11-30  8:17                                         ` Miklos Szeredi
  0 siblings, 0 replies; 48+ messages in thread
From: Miklos Szeredi @ 2023-11-30  8:17 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Antonio SJ Musumeci, Bernd Schubert, Daniel Rosenberg,
	Paul Lawrence, Alessio Balsini, Christian Brauner, fuse-devel,
	linux-fsdevel, Dave Chinner

On Thu, 30 Nov 2023 at 08:12, Amir Goldstein <amir73il@gmail.com> wrote:

> > Offtopic: What happens in passthrough mode when an error occurs?
>
> passthrough is passthrough, in sickness and in health...

Fuse *could* intercept passthrough errors (except for mmap), and fall
back to direct I/O in that case.   But this is definitely not for the
initial version.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-29 16:55                               ` Miklos Szeredi
  2023-11-29 17:39                                 ` Amir Goldstein
@ 2023-12-06  9:59                                 ` Amir Goldstein
  2023-12-06 23:11                                   ` Bernd Schubert
  1 sibling, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-12-06  9:59 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > direct I/O read()/write() is never a problem.
> >
> > The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> > when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>
> I think it should.
>
> > or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> > vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> > is denied?
>
> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>
> > A bit more challenging, because we will need to track unmounts, or at
> > least track
> > "was_cached_mmaped" state per file, but doable.
>
> Tracking unmaps via fuse_vma_close() should not be difficult.
>

I think that it is.

fuse_vma_close() does not seem to be balanced with fuse_file_mmap()
because IIUC, maps can be cloned via fork() etc.

It tried to implement an iocachectr refcount to track cache mmaps,
but it keeps underflowing in fuse_vma_close().

I would like us to consider a slightly different model.

We agreed that caching and passthrough mode on the same
inode cannot mix and there is no problem with different modes
per inode on the same filesystem.

I have a use case for mixing direct_io and passthrough on the
same inode (i.e. inode in passthrough mode).

I have no use case (yet) for the transition from caching to passthrough
mode on the same inode and direct_io cached mmaps complicate
things quite a bit for this scenario.

My proposal is to taint a direct_io file with FOPEN_CACHE_MMAP
if it was ever mmaped using page cache.
We will not try to clean this flag in fuse_vma_close(), it stays with
the file until release.

An FOPEN_CACHE_MMAP file forces an inode into caching mode,
same as a regular caching open.
We could allow server to set FOPEN_CACHE_MMAP along with
FOPEN_DIRECT_IO to preemptively deny future passthrough open,
but not sure this is important.
If we wanted to, we could let this flag combination have the same
meaning as direct_io_allow_mmap, but per file/inode.

In relation to the FOPEN_PARALLEL_DIRECT_WRITES vs.
FUSE_DIRECT_IO_ALLOW_MMAP discussion, Bernd has suggested
a per inode FUSE_I_CACHE_WRITES, state that tracks if caching writes
were ever done on inode to allow parallel dio on the rest of the inodes
in the filesystem.

FUSE_I_CACHE_WRITES is a sub-state of caching mode inode state.
I think maybe caching mode would be enough for both use cases -
preventing parallel dio and preventing passthrough open.

The result would be that parallel dio would be performed on inodes that
are not currently open in caching mode and have not been mmaped
at all (regardless of writes to page cache) using any of the currently
open direct_io files.

As long as the applications that use mmap write (e.g. compiler)
do not usually work on the same files as the applications that do
parallel dio writes (e.g. db) and as long as files that are typically mmaped
privately (exe and libs) don't need parallel dio writes,
I think that FUSE_I_CACHE_WRITES state will not be needed.

But maybe I am missing some cases. In any case, there is nothing
preventing FUSE_I_CACHE_WRITES to exist along side caching mode
if needed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-12-06  9:59                                 ` Amir Goldstein
@ 2023-12-06 23:11                                   ` Bernd Schubert
  2023-12-07  7:23                                     ` Amir Goldstein
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schubert @ 2023-12-06 23:11 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

Hi Amir,


On 12/6/23 10:59, Amir Goldstein wrote:
> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> direct I/O read()/write() is never a problem.
>>>
>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>>
>> I think it should.
>>
>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
>>> is denied?
>>
>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>>
>>> A bit more challenging, because we will need to track unmounts, or at
>>> least track
>>> "was_cached_mmaped" state per file, but doable.
>>
>> Tracking unmaps via fuse_vma_close() should not be difficult.
>>
> 
> I think that it is.
> 
> fuse_vma_close() does not seem to be balanced with fuse_file_mmap()
> because IIUC, maps can be cloned via fork() etc.
> 
> It tried to implement an iocachectr refcount to track cache mmaps,
> but it keeps underflowing in fuse_vma_close().
> 
> I would like us to consider a slightly different model.
> 
> We agreed that caching and passthrough mode on the same
> inode cannot mix and there is no problem with different modes
> per inode on the same filesystem.
> 
> I have a use case for mixing direct_io and passthrough on the
> same inode (i.e. inode in passthrough mode).
> 
> I have no use case (yet) for the transition from caching to passthrough
> mode on the same inode and direct_io cached mmaps complicate
> things quite a bit for this scenario.
> 
> My proposal is to taint a direct_io file with FOPEN_CACHE_MMAP
> if it was ever mmaped using page cache.
> We will not try to clean this flag in fuse_vma_close(), it stays with
> the file until release.
> 
> An FOPEN_CACHE_MMAP file forces an inode into caching mode,
> same as a regular caching open.

where do you actually want to set that flag? My initial idea for 
FUSE_I_CACHE_WRITES was to set that in fuse_file_mmap, but I would have 
needed the i_rwsem lock and that resulted in a lock ordering issue.

> We could allow server to set FOPEN_CACHE_MMAP along with
> FOPEN_DIRECT_IO to preemptively deny future passthrough open,
> but not sure this is important.
> If we wanted to, we could let this flag combination have the same
> meaning as direct_io_allow_mmap, but per file/inode.
> 
> In relation to the FOPEN_PARALLEL_DIRECT_WRITES vs.
> FUSE_DIRECT_IO_ALLOW_MMAP discussion, Bernd has suggested
> a per inode FUSE_I_CACHE_WRITES, state that tracks if caching writes
> were ever done on inode to allow parallel dio on the rest of the inodes
> in the filesystem.
> 
> FUSE_I_CACHE_WRITES is a sub-state of caching mode inode state.
> I think maybe caching mode would be enough for both use cases -
> preventing parallel dio and preventing passthrough open.
> 
> The result would be that parallel dio would be performed on inodes that
> are not currently open in caching mode and have not been mmaped
> at all (regardless of writes to page cache) using any of the currently
> open direct_io files.
> 
> As long as the applications that use mmap write (e.g. compiler)
> do not usually work on the same files as the applications that do
> parallel dio writes (e.g. db) and as long as files that are typically mmaped
> privately (exe and libs) don't need parallel dio writes,
> I think that FUSE_I_CACHE_WRITES state will not be needed.
> 
> But maybe I am missing some cases. In any case, there is nothing
> preventing FUSE_I_CACHE_WRITES to exist along side caching mode
> if needed.

 From the description is sounds like the caching mode inode state should 
be enough for FOPEN_PARALLEL_DIRECT_WRITES/FUSE_DIRECT_IO_ALLOW_MMAP. 
I'm just not sure where the flag will be set (as above).
I guess at some point also use cases might come up that need MMAP + 
parallel DIO, at different times or different offsets. Though, before 
making the code even more complex I would first like to see a real world 
use case for that. Next part for me is to rebase the dio consolidation 
branch and to get a shared lock for O_DIRECT (without the need for 
FOPEN_DIRECT_IO).


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-12-06 23:11                                   ` Bernd Schubert
@ 2023-12-07  7:23                                     ` Amir Goldstein
  2023-12-07  8:56                                       ` Bernd Schubert
  0 siblings, 1 reply; 48+ messages in thread
From: Amir Goldstein @ 2023-12-07  7:23 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Thu, Dec 7, 2023 at 1:11 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Amir,
>
>
> On 12/6/23 10:59, Amir Goldstein wrote:
> > On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
> >>
> >>> direct I/O read()/write() is never a problem.
> >>>
> >>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
> >>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
> >>
> >> I think it should.
> >>
> >>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
> >>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
> >>> is denied?
> >>
> >> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
> >>
> >>> A bit more challenging, because we will need to track unmounts, or at
> >>> least track
> >>> "was_cached_mmaped" state per file, but doable.
> >>
> >> Tracking unmaps via fuse_vma_close() should not be difficult.
> >>
> >
> > I think that it is.
> >
> > fuse_vma_close() does not seem to be balanced with fuse_file_mmap()
> > because IIUC, maps can be cloned via fork() etc.
> >
> > It tried to implement an iocachectr refcount to track cache mmaps,
> > but it keeps underflowing in fuse_vma_close().
> >
> > I would like us to consider a slightly different model.
> >
> > We agreed that caching and passthrough mode on the same
> > inode cannot mix and there is no problem with different modes
> > per inode on the same filesystem.
> >
> > I have a use case for mixing direct_io and passthrough on the
> > same inode (i.e. inode in passthrough mode).
> >
> > I have no use case (yet) for the transition from caching to passthrough
> > mode on the same inode and direct_io cached mmaps complicate
> > things quite a bit for this scenario.
> >
> > My proposal is to taint a direct_io file with FOPEN_CACHE_MMAP
> > if it was ever mmaped using page cache.
> > We will not try to clean this flag in fuse_vma_close(), it stays with
> > the file until release.
> >
> > An FOPEN_CACHE_MMAP file forces an inode into caching mode,
> > same as a regular caching open.
>
> where do you actually want to set that flag? My initial idea for
> FUSE_I_CACHE_WRITES was to set that in fuse_file_mmap, but I would have
> needed the i_rwsem lock and that resulted in a lock ordering issue.
>

Yes, the idea is to set this flag on the first mmap of a FOPEN_DIRECT_IO file.
Why does that require an i_rwsem lock?

Before setting FUSE_I_CACHE_WRITES inode flag, your patch does:
    wait_event(fi->direct_io_waitq, fi->shared_lock_direct_io_ctr == 0);

We can do the same in direct_io mmap, before setting the
FOPEN_CACHE_MMAP file flag and consequently changing inode
mode to caching. No?

I will try to prepare a patch today to demonstrate.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-12-07  7:23                                     ` Amir Goldstein
@ 2023-12-07  8:56                                       ` Bernd Schubert
  0 siblings, 0 replies; 48+ messages in thread
From: Bernd Schubert @ 2023-12-07  8:56 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner



On 12/7/23 08:23, Amir Goldstein wrote:
> On Thu, Dec 7, 2023 at 1:11 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Amir,
>>
>>
>> On 12/6/23 10:59, Amir Goldstein wrote:
>>> On Wed, Nov 29, 2023 at 6:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>>> On Wed, 29 Nov 2023 at 16:52, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>> direct I/O read()/write() is never a problem.
>>>>>
>>>>> The question is whether mmap() on a file opened with FOPEN_DIRECT_IO
>>>>> when the inode is in passthrough mode, also uses fuse_passthrough_mmap()?
>>>>
>>>> I think it should.
>>>>
>>>>> or denied, similar to how mmap with ff->open_flags & FOPEN_DIRECT_IO &&
>>>>> vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_relax
>>>>> is denied?
>>>>
>>>> What would be the use case for FOPEN_DIRECT_IO with passthrough mmap?
>>>>
>>>>> A bit more challenging, because we will need to track unmounts, or at
>>>>> least track
>>>>> "was_cached_mmaped" state per file, but doable.
>>>>
>>>> Tracking unmaps via fuse_vma_close() should not be difficult.
>>>>
>>>
>>> I think that it is.
>>>
>>> fuse_vma_close() does not seem to be balanced with fuse_file_mmap()
>>> because IIUC, maps can be cloned via fork() etc.
>>>
>>> It tried to implement an iocachectr refcount to track cache mmaps,
>>> but it keeps underflowing in fuse_vma_close().
>>>
>>> I would like us to consider a slightly different model.
>>>
>>> We agreed that caching and passthrough mode on the same
>>> inode cannot mix and there is no problem with different modes
>>> per inode on the same filesystem.
>>>
>>> I have a use case for mixing direct_io and passthrough on the
>>> same inode (i.e. inode in passthrough mode).
>>>
>>> I have no use case (yet) for the transition from caching to passthrough
>>> mode on the same inode and direct_io cached mmaps complicate
>>> things quite a bit for this scenario.
>>>
>>> My proposal is to taint a direct_io file with FOPEN_CACHE_MMAP
>>> if it was ever mmaped using page cache.
>>> We will not try to clean this flag in fuse_vma_close(), it stays with
>>> the file until release.
>>>
>>> An FOPEN_CACHE_MMAP file forces an inode into caching mode,
>>> same as a regular caching open.
>>
>> where do you actually want to set that flag? My initial idea for
>> FUSE_I_CACHE_WRITES was to set that in fuse_file_mmap, but I would have
>> needed the i_rwsem lock and that resulted in a lock ordering issue.
>>
> 
> Yes, the idea is to set this flag on the first mmap of a FOPEN_DIRECT_IO file.
> Why does that require an i_rwsem lock?
> 
> Before setting FUSE_I_CACHE_WRITES inode flag, your patch does:
>      wait_event(fi->direct_io_waitq, fi->shared_lock_direct_io_ctr == 0);
> 
> We can do the same in direct_io mmap, before setting the
> FOPEN_CACHE_MMAP file flag and consequently changing inode
> mode to caching. No?


Yeah right, that will work.

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v14 00/12] FUSE passthrough for file io
  2023-11-02 13:13                               ` Miklos Szeredi
@ 2024-01-26 12:13                                 ` Amir Goldstein
  0 siblings, 0 replies; 48+ messages in thread
From: Amir Goldstein @ 2024-01-26 12:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	Christian Brauner, fuse-devel, linux-fsdevel, Dave Chinner

On Thu, Nov 2, 2023 at 3:13 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 2 Nov 2023 at 14:08, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Just to be clear, at the last close for an inode, we would check
> > if attribute cache needs to be invalidate and the inode will return
> > to "neutral" mode, when server could legally switch between
> > caching and passthrough mode.
>
> Exactly.

FYI, this is now implemented in the fuse-backing-fd branch OTM [1],
including auto-invalidate of attributes in fuse_getattr() when there is
a backing inode.

With this in place, the few fstests that were reported to fail in the v14
cover letter (top of this thread) are now passing.

One detail worth noting is that I found it too complicated to support
writeback cache (for the non-passthrough inodes) along with fuse
passthrough support on the same filesystem, so for now, this
combination is not allowed.

>
> > EIO works for me.
> > Just as simple.
> > Will try to get this ready for early 6.8 cycle.
>

My patches are based on top of the fuse IO mode patches [2] that
were a joined effort with Bernd.

I will wait for Bernd to post his patches before I post the FUSE
passthrough patches.

There are a few questionable behaviors w.r.t mixing parallel dio
with fuse passthrough, but I won't get into them now.
We can discuss them after the patches are posted.

I will mention that I took a design choice that server can
(and is encouraged to) use FOPEN_DIRECT_IO together with
FUSE_PASSTHROUGH and a backing file id to request that read/write
will go directly to server, but mmap will have a backing inode to map to,
so that it won't need to use page cache and deny future passthrough open
of the same inode.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fuse-backing-fd-260124/
[2] https://github.com/amir73il/linux/commits/fuse_io_mode

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-01-26 12:13 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-16 16:08 [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 01/12] fs: prepare for stackable filesystems backing file helpers Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 02/12] fs: factor out backing_file_{read,write}_iter() helpers Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 03/12] fs: factor out backing_file_splice_{read,write}() helpers Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 04/12] fs: factor out backing_file_mmap() helper Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 05/12] fuse: factor out helper for FUSE_DEV_IOC_CLONE Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 06/12] fuse: introduce FUSE_PASSTHROUGH capability Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 07/12] fuse: pass optional backing_id in struct fuse_open_out Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 08/12] fuse: implement ioctls to manage backing files Amir Goldstein
2023-10-17  9:45   ` Amir Goldstein
2023-10-16 16:08 ` [PATCH v14 09/12] fuse: implement read/write passthrough Amir Goldstein
2023-10-16 16:09 ` [PATCH v14 10/12] fuse: implement splice_{read/write} passthrough Amir Goldstein
2023-10-16 16:09 ` [PATCH v14 11/12] fuse: implement passthrough for mmap Amir Goldstein
2023-10-16 16:09 ` [PATCH v14 12/12] fuse: implement passthrough for readdir Amir Goldstein
2023-10-19 14:33 ` [PATCH v14 00/12] FUSE passthrough for file io Amir Goldstein
2023-10-30 10:16   ` Miklos Szeredi
2023-10-31 10:28     ` Amir Goldstein
2023-10-31 11:16       ` Miklos Szeredi
2023-10-31 12:31         ` Amir Goldstein
2023-10-31 15:01           ` Miklos Szeredi
2023-10-31 17:44             ` Amir Goldstein
2023-11-01 11:32               ` Miklos Szeredi
2023-11-01 13:23                 ` Amir Goldstein
2023-11-01 14:42                   ` Miklos Szeredi
2023-11-01 15:06                     ` Amir Goldstein
2023-11-01 15:25                       ` Miklos Szeredi
2023-11-01 18:32                         ` Amir Goldstein
2023-11-02 10:46                           ` Miklos Szeredi
2023-11-02 13:07                             ` Amir Goldstein
2023-11-02 13:13                               ` Miklos Szeredi
2024-01-26 12:13                                 ` Amir Goldstein
2023-11-29  7:25                     ` Amir Goldstein
2023-11-29 14:13                       ` Miklos Szeredi
2023-11-29 15:06                         ` Amir Goldstein
2023-11-29 15:21                           ` Miklos Szeredi
2023-11-29 15:52                             ` Amir Goldstein
2023-11-29 16:55                               ` Miklos Szeredi
2023-11-29 17:39                                 ` Amir Goldstein
2023-11-29 20:46                                   ` Bernd Schubert
2023-11-29 21:39                                     ` Antonio SJ Musumeci
2023-11-29 22:01                                       ` Bernd Schubert
2023-11-30  7:29                                         ` Amir Goldstein
2023-11-30  7:12                                       ` Amir Goldstein
2023-11-30  8:17                                         ` Miklos Szeredi
2023-12-06  9:59                                 ` Amir Goldstein
2023-12-06 23:11                                   ` Bernd Schubert
2023-12-07  7:23                                     ` Amir Goldstein
2023-12-07  8:56                                       ` Bernd Schubert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.