linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v13 00/10] fuse: Add support for passthrough read/write
@ 2023-05-19 12:56 Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 01/10] fs: Generic function to convert iocb to rw flags Amir Goldstein
                   ` (11 more replies)
  0 siblings, 12 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Miklos,

This patch set addresses your review feedback on Alesio's V12 patch set
from 2021 [1] as well as other bugs that I have found since.
This patch set uses refcounted backing files as we discussed recently [2].

I am posting this for several possible outcomes:

1. Either FUSE-BPF develpers can use this as a reference implementation
   for their 1st phase of "backing file passthrough"
2. Or they can tell me which API changes need to made to this patch set
   so the API is flexible enough to extend to "backing inode passthrough"
   and to "BPF filters" later on
3. We find there is little overlap in the APIs and merge this as is

These patches are available on github [3] along with libfuse patches [4].
I tested them by running xfstests (./check -fuse -g quick.rw) with latest
libfuse xfstest support.

Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
which tests mixed buffered/aio writes.
With FOPEN_PASSTHROUGH, this test also passes.

This revision does not set any limitations on the number of backing files
that can be mapped by the server.  I considered several ways to address
this and decided to try a different approach.

Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
API approach. Please see my comments on that patch.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20210125153057.3623715-1-balsini@android.com/
[2] https://lore.kernel.org/linux-fsdevel/CAJfpegvbMKadnsBZmEvZpCxeWaMEGDRiDBqEZqaBSXcWyPZnpA@mail.gmail.com/
[3] https://github.com/amir73il/linux/commits/fuse-passthrough-fd
[4] https://github.com/amir73il/libfuse/commits/fuse-passthrough-fd

Changes since v12:
- Rebase to v6.4-rc2
- Reword 'lower file' language to 'backing file'
- Add explicit FOPEN_PASSTHROUGH open flags
- Remove fuse_passthrough_out container
- Add FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl
- Add experimental FUSE_DEV_IOC_PASSTHROUGH_SETUP ioctl
- Distinguished errors for failures to create passthrough id
  (EBADF, EOPNOTSUPP, ELOOP)
- idr and fuse_file point to refcounted passthrough object
- Use rcu_read_lock() to get passthrough object by id
- Handle errors to setup passthrough in atomic_open()
- Invalidate mtime/size after passthrough write
- Invalidate atime after passthrough read/mmap
- Bump FUSE protocol minor version
Alessio Balsini (2):
  fs: Generic function to convert iocb to rw flags
  fuse: Definitions and ioctl for passthrough

Amir Goldstein (8):
  fuse: Passthrough initialization and release
  fuse: Introduce synchronous read and write for passthrough
  fuse: Handle asynchronous read and write in passthrough
  fuse: Use daemon creds in passthrough mode
  fuse: Introduce passthrough for mmap
  fuse: update inode size/mtime after passthrough write
  fuse: invalidate atime after passthrough read/mmap
  fuse: setup a passthrough fd without a permanent backing id

 fs/fuse/Makefile          |   1 +
 fs/fuse/dev.c             |  76 ++++++++-
 fs/fuse/dir.c             |   7 +-
 fs/fuse/file.c            |  28 +++-
 fs/fuse/fuse_i.h          |  48 +++++-
 fs/fuse/inode.c           |  22 ++-
 fs/fuse/passthrough.c     | 344 ++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/file.c       |  23 +--
 include/linux/fs.h        |   5 +
 include/uapi/linux/fuse.h |  21 ++-
 10 files changed, 535 insertions(+), 40 deletions(-)
 create mode 100644 fs/fuse/passthrough.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v13 01/10] fs: Generic function to convert iocb to rw flags
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
@ 2023-05-19 12:56 ` Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough Amir Goldstein
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

From: Alessio Balsini <balsini@android.com>

OverlayFS implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.

Reduce the OverlayFS code by making the flag conversion function generic
and reusable.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c | 23 +++++------------------
 include/linux/fs.h  |  5 +++++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 7c04f033aadd..759893e4da04 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -15,6 +15,8 @@
 #include <linux/fs.h>
 #include "overlayfs.h"
 
+#define OVL_IOCB_MASK (IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
 struct ovl_aio_req {
 	struct kiocb iocb;
 	refcount_t ref;
@@ -241,22 +243,6 @@ static void ovl_file_accessed(struct file *file)
 	touch_atime(&file->f_path);
 }
 
-static rwf_t ovl_iocb_to_rwf(int ifl)
-{
-	rwf_t flags = 0;
-
-	if (ifl & IOCB_NOWAIT)
-		flags |= RWF_NOWAIT;
-	if (ifl & IOCB_HIPRI)
-		flags |= RWF_HIPRI;
-	if (ifl & IOCB_DSYNC)
-		flags |= RWF_DSYNC;
-	if (ifl & IOCB_SYNC)
-		flags |= RWF_SYNC;
-
-	return flags;
-}
-
 static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
 {
 	if (refcount_dec_and_test(&aio_req->ref)) {
@@ -316,7 +302,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	if (is_sync_kiocb(iocb)) {
 		ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
-				    ovl_iocb_to_rwf(iocb->ki_flags));
+				    iocb_to_rw_flags(iocb->ki_flags,
+						     OVL_IOCB_MASK));
 	} else {
 		struct ovl_aio_req *aio_req;
 
@@ -380,7 +367,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (is_sync_kiocb(iocb)) {
 		file_start_write(real.file);
 		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
-				     ovl_iocb_to_rwf(ifl));
+				     iocb_to_rw_flags(ifl, OVL_IOCB_MASK));
 		file_end_write(real.file);
 		/* Update size */
 		ovl_copyattr(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..65f0b833cc80 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3028,6 +3028,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 	return 0;
 }
 
+static inline rwf_t iocb_to_rw_flags(int ifl, int iocb_mask)
+{
+	return ifl & iocb_mask;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
 	ino_t res;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 01/10] fs: Generic function to convert iocb to rw flags Amir Goldstein
@ 2023-05-19 12:56 ` Amir Goldstein
  2023-05-19 15:12   ` Miklos Szeredi
  2023-05-19 12:56 ` [PATCH v13 03/10] fuse: Passthrough initialization and release Amir Goldstein
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

From: Alessio Balsini <balsini@android.com>

Expose the FUSE_PASSTHROUGH capability to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl, which allows
the FUSE daemon to specify a direct connection between a FUSE file and a
backing file.  The ioctl requires user space to pass the file descriptor
of one of its opened files to the FUSE driver and get an id in return.
This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH
to setup passthrough of read/write operations on the open file.

Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/Makefile          |  1 +
 fs/fuse/dev.c             | 33 ++++++++++++++++++++++++--------
 fs/fuse/dir.c             |  7 ++++++-
 fs/fuse/file.c            | 17 +++++++++++++----
 fs/fuse/fuse_i.h          | 27 ++++++++++++++++++++++++++
 fs/fuse/inode.c           | 21 +++++++++++++++++++-
 fs/fuse/passthrough.c     | 40 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fuse.h | 13 +++++++++++--
 8 files changed, 143 insertions(+), 16 deletions(-)
 create mode 100644 fs/fuse/passthrough.c

diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35c058d..d9e1b47382f3 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
 obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
+fuse-y += passthrough.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..cb00234e7843 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
 	int res;
-	int oldfd;
-	struct fuse_dev *fud = NULL;
+	int fd, id;
+	struct fuse_dev *fud = fuse_get_dev(file);
 	struct fd f;
 
+	if (!fud)
+		return -EINVAL;
+
 	switch (cmd) {
 	case FUSE_DEV_IOC_CLONE:
-		if (get_user(oldfd, (__u32 __user *)arg))
+		if (get_user(fd, (__u32 __user *)arg))
 			return -EFAULT;
 
-		f = fdget(oldfd);
+		f = fdget(fd);
 		if (!f.file)
 			return -EINVAL;
 
@@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 		 * Check against file->f_op because CUSE
 		 * uses the same ioctl handler.
 		 */
-		if (f.file->f_op == file->f_op)
-			fud = fuse_get_dev(f.file);
-
 		res = -EINVAL;
-		if (fud) {
+		if (f.file->f_op == file->f_op) {
 			mutex_lock(&fuse_mutex);
 			res = fuse_device_clone(fud->fc, file);
 			mutex_unlock(&fuse_mutex);
 		}
 		fdput(f);
 		break;
+	case FUSE_DEV_IOC_PASSTHROUGH_OPEN:
+		if (get_user(fd, (__u32 __user *)arg))
+			return -EFAULT;
+
+		f = fdget(fd);
+		if (!f.file)
+			return -EINVAL;
+
+		res = fuse_passthrough_open(fud->fc, fd);
+		fdput(f);
+		break;
+	case FUSE_DEV_IOC_PASSTHROUGH_CLOSE:
+		if (get_user(id, (__u32 __user *)arg))
+			return -EFAULT;
+
+		res = fuse_passthrough_close(fud->fc, id);
+		break;
 	default:
 		res = -ENOTTY;
 		break;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 35bc174f9ba2..1894298e7f7a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -619,6 +619,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 {
 	int err;
 	struct inode *inode;
+	struct fuse_conn *fc = get_fuse_conn(dir);
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 	struct fuse_forget_link *forget;
@@ -700,7 +701,11 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	d_instantiate(entry, inode);
 	fuse_change_entry_timeout(entry, &outentry);
 	fuse_dir_changed(dir);
-	err = finish_open(file, entry, generic_file_open);
+	err = 0;
+	if (ff->open_flags & FOPEN_PASSTHROUGH)
+		err = fuse_passthrough_setup(fc, ff, &outopen);
+	if (!err)
+		err = finish_open(file, entry, generic_file_open);
 	if (err) {
 		fi = get_fuse_inode(inode);
 		fuse_sync_release(fi, ff, flags);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e0..96a46a5aa892 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -132,6 +132,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
+	int err;
 
 	ff = fuse_file_alloc(fm);
 	if (!ff)
@@ -142,16 +143,17 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
 	if (isdir ? !fc->no_opendir : !fc->no_open) {
 		struct fuse_open_out outarg;
-		int err;
 
 		err = fuse_send_open(fm, nodeid, open_flags, opcode, &outarg);
 		if (!err) {
 			ff->fh = outarg.fh;
 			ff->open_flags = outarg.open_flags;
-
+			if (ff->open_flags & FOPEN_PASSTHROUGH)
+				err = fuse_passthrough_setup(fc, ff, &outarg);
+			if (err)
+				goto out_free_ff;
 		} else if (err != -ENOSYS) {
-			fuse_file_free(ff);
-			return ERR_PTR(err);
+			goto out_free_ff;
 		} else {
 			if (isdir)
 				fc->no_opendir = 1;
@@ -166,6 +168,10 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	ff->nodeid = nodeid;
 
 	return ff;
+
+out_free_ff:
+	fuse_file_free(ff);
+	return ERR_PTR(err);
 }
 
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
@@ -279,6 +285,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
 	struct fuse_conn *fc = ff->fm->fc;
 	struct fuse_release_args *ra = ff->release_args;
 
+	fuse_passthrough_put(ff->passthrough);
+	ff->passthrough = NULL;
+
 	/* Inode is NULL on error path of fuse_create_open() */
 	if (likely(fi)) {
 		spin_lock(&fi->lock);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9b7fc7d3c7f1..f52604534ff6 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -173,6 +173,16 @@ struct fuse_conn;
 struct fuse_mount;
 struct fuse_release_args;
 
+/**
+ * Reference to backing file for read/write operations in passthrough mode.
+ */
+struct fuse_passthrough {
+	struct file *filp;
+
+	/** refcount */
+	refcount_t count;
+};
+
 /** FUSE specific file data */
 struct fuse_file {
 	/** Fuse connection for this file */
@@ -218,6 +228,9 @@ struct fuse_file {
 
 	} readdir;
 
+	/** Container for data related to the passthrough functionality */
+	struct fuse_passthrough *passthrough;
+
 	/** RB node to be linked on fuse_conn->polled_files */
 	struct rb_node polled_node;
 
@@ -792,6 +805,9 @@ struct fuse_conn {
 	/* Is tmpfile not implemented by fs? */
 	unsigned int no_tmpfile:1;
 
+	/** Passthrough mode for read/write IO */
+	unsigned int passthrough:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
@@ -841,6 +857,9 @@ struct fuse_conn {
 
 	/* New writepages go into this bucket */
 	struct fuse_sync_bucket __rcu *curr_bucket;
+
+	/** IDR for passthrough files */
+	struct idr passthrough_files_map;
 };
 
 /*
@@ -1324,4 +1343,12 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
+/* passthrough.c */
+int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd);
+int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh);
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg);
+void fuse_passthrough_put(struct fuse_passthrough *passthrough);
+void fuse_passthrough_free(struct fuse_passthrough *passthrough);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d66070af145d..271586fac008 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -840,6 +840,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	INIT_LIST_HEAD(&fc->bg_queue);
 	INIT_LIST_HEAD(&fc->entry);
 	INIT_LIST_HEAD(&fc->devices);
+	idr_init(&fc->passthrough_files_map);
 	atomic_set(&fc->num_waiting, 0);
 	fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
 	fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
@@ -1209,6 +1210,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->init_security = 1;
 			if (flags & FUSE_CREATE_SUPP_GROUP)
 				fc->create_supp_group = 1;
+			if (flags & FUSE_PASSTHROUGH) {
+				fc->passthrough = 1;
+				/* Prevent further stacking */
+				fm->sb->s_stack_depth =
+					FILESYSTEM_MAX_STACK_DEPTH;
+			}
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1254,7 +1261,8 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
-		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP;
+		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
+		FUSE_PASSTHROUGH;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
@@ -1287,9 +1295,20 @@ void fuse_send_init(struct fuse_mount *fm)
 }
 EXPORT_SYMBOL_GPL(fuse_send_init);
 
+static int fuse_passthrough_id_free(int id, void *p, void *data)
+{
+	struct fuse_passthrough *passthrough = p;
+
+	WARN_ON_ONCE(refcount_read(&passthrough->count) != 1);
+	fuse_passthrough_free(passthrough);
+	return 0;
+}
+
 void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
+	idr_for_each(&fc->passthrough_files_map, fuse_passthrough_id_free, NULL);
+	idr_destroy(&fc->passthrough_files_map);
 	kfree_rcu(fc, rcu);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
new file mode 100644
index 000000000000..fc723e004de9
--- /dev/null
+++ b/fs/fuse/passthrough.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "fuse_i.h"
+
+#include <linux/file.h>
+
+/*
+ * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
+ * open response and needs to be released with fuse_passthrough_close().
+ */
+int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
+{
+	return -EINVAL;
+}
+
+int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh)
+{
+	return -EINVAL;
+}
+
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg)
+{
+	return -EINVAL;
+}
+
+void fuse_passthrough_put(struct fuse_passthrough *passthrough)
+{
+	if (passthrough && refcount_dec_and_test(&passthrough->count))
+		fuse_passthrough_free(passthrough);
+}
+
+void fuse_passthrough_free(struct fuse_passthrough *passthrough)
+{
+	if (passthrough && passthrough->filp) {
+		fput(passthrough->filp);
+		passthrough->filp = NULL;
+	}
+	kfree(passthrough);
+}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 1b9d0dfae72d..3da1f59007cf 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -206,6 +206,10 @@
  *  - add extension header
  *  - add FUSE_EXT_GROUPS
  *  - add FUSE_CREATE_SUPP_GROUP
+ *
+ *  7.39
+ *  - add FUSE_PASSTHROUGH
+ *  - add FOPEN_PASSTHROUGH
  */
 
 #ifndef _LINUX_FUSE_H
@@ -241,7 +245,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 38
+#define FUSE_KERNEL_MINOR_VERSION 39
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -314,6 +318,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_PASSTHROUGH: passthrough read/write operations for this open file
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -322,6 +327,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_PASSTHROUGH	(1 << 7)
 
 /**
  * INIT request/reply flags
@@ -406,6 +412,7 @@ struct fuse_file_lock {
 #define FUSE_SECURITY_CTX	(1ULL << 32)
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
+#define FUSE_PASSTHROUGH	(1ULL << 35)
 
 /**
  * CUSE INIT request/reply flags
@@ -698,7 +705,7 @@ struct fuse_create_in {
 struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
-	uint32_t	padding;
+	uint32_t	passthrough_fh;
 };
 
 struct fuse_release_in {
@@ -989,6 +996,8 @@ struct fuse_notify_retrieve_in {
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
+#define FUSE_DEV_IOC_PASSTHROUGH_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, uint32_t)
+#define FUSE_DEV_IOC_PASSTHROUGH_CLOSE	_IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 03/10] fuse: Passthrough initialization and release
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 01/10] fs: Generic function to convert iocb to rw flags Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough Amir Goldstein
@ 2023-05-19 12:56 ` Amir Goldstein
  2023-05-19 12:56 ` [PATCH v13 04/10] fuse: Introduce synchronous read and write for passthrough Amir Goldstein
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Implement the FUSE passthrough ioctl that associates a backing
(passthrough) file with the fuse_file.

The file descriptor passed to the ioctl by the FUSE daemon is used to
access the relative file pointer, that will be copied to the fuse_file
data structure to consolidate the link between the FUSE and the backing
file.

To enable the passthrough mode, FUSE daemon calls the
FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives
back an identifier that will be used at open/create response time in the
fuse_open_out field to associate the FUSE file to the backing file.

The value returned by the ioctl to user space can be:
  > 0: success, the value can be used as part of an open/create reply.
 <= 0: an error occurred.
 == 0: represents an error to preserve backward compatibility - the
fuse_open_out field that is used to pass the backing_id back to the
kernel uses the same bits that were previously as struct padding, and is
commonly zero-initialized (e.g., in the libfuse implementation).

Removing 0 from the correct values fixes the ambiguity between the case
in which 0 corresponds to a real backing_id, a missing implementation
of FUSE passthrough or a request for a normal FUSE file, simplifying the
user space implementation.

For the passthrough mode to be successfully activated, the backing file
filesystem must implement both read_iter and write_iter file operations.
This extra check avoids special pseudo files to be used for passhrough.
Passthrough comes with another limitation: no further filesystem stacking
is allowed for those FUSE filesystems using passthrough.

The FUSE daemon must call FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl to drop
the reference on the backing file.  This can be done immediattely after
responding to open or at a later time.

In any case, the backing file will be kept open by the FUSE driver until
the last fuse_file that was setup to passthrough to that backing file is
closed AND the FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl was called.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/fuse_i.h      |  1 +
 fs/fuse/inode.c       |  1 +
 fs/fuse/passthrough.c | 89 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f52604534ff6..9ad1cc37a5c4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -181,6 +181,7 @@ struct fuse_passthrough {
 
 	/** refcount */
 	refcount_t count;
+	struct rcu_head rcu;
 };
 
 /** FUSE specific file data */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 271586fac008..4200fb13883e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1301,6 +1301,7 @@ static int fuse_passthrough_id_free(int id, void *p, void *data)
 
 	WARN_ON_ONCE(refcount_read(&passthrough->count) != 1);
 	fuse_passthrough_free(passthrough);
+
 	return 0;
 }
 
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index fc723e004de9..8d090ae252f2 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -3,6 +3,7 @@
 #include "fuse_i.h"
 
 #include <linux/file.h>
+#include <linux/idr.h>
 
 /*
  * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
@@ -10,18 +11,98 @@
  */
 int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
 {
-	return -EINVAL;
+	struct file *passthrough_filp;
+	struct inode *passthrough_inode;
+	struct super_block *passthrough_sb;
+	struct fuse_passthrough *passthrough;
+	int res;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	passthrough_filp = fget(backing_fd);
+	if (!passthrough_filp)
+		return -EBADF;
+
+	res = -EOPNOTSUPP;
+	if (!passthrough_filp->f_op->read_iter ||
+	    !passthrough_filp->f_op->write_iter)
+		goto out_fput;
+
+	passthrough_inode = file_inode(passthrough_filp);
+	passthrough_sb = passthrough_inode->i_sb;
+	res = -ELOOP;
+	if (passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH)
+		goto out_fput;
+
+	passthrough = kmalloc(sizeof(struct fuse_passthrough), GFP_KERNEL);
+	res = -ENOMEM;
+	if (!passthrough)
+		goto out_fput;
+
+	passthrough->filp = passthrough_filp;
+	refcount_set(&passthrough->count, 1);
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fc->lock);
+	res = idr_alloc_cyclic(&fc->passthrough_files_map, passthrough, 1, 0,
+			       GFP_ATOMIC);
+	spin_unlock(&fc->lock);
+	idr_preload_end();
+
+	if (res < 0)
+		fuse_passthrough_free(passthrough);
+
+	return res;
+
+out_fput:
+	fput(passthrough_filp);
+
+	return res;
 }
 
 int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh)
 {
-	return -EINVAL;
+	struct fuse_passthrough *passthrough;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	if (passthrough_fh <= 0)
+		return -EINVAL;
+
+	spin_lock(&fc->lock);
+	passthrough = idr_remove(&fc->passthrough_files_map, passthrough_fh);
+	spin_unlock(&fc->lock);
+
+	if (!passthrough)
+		return -ENOENT;
+
+	fuse_passthrough_put(passthrough);
+
+	return 0;
 }
 
 int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
 			   struct fuse_open_out *openarg)
 {
-	return -EINVAL;
+	int passthrough_fh = openarg->passthrough_fh;
+	struct fuse_passthrough *passthrough;
+
+	if (passthrough_fh <= 0)
+		return -EINVAL;
+
+	rcu_read_lock();
+	passthrough = idr_find(&fc->passthrough_files_map, passthrough_fh);
+	if (passthrough && !refcount_inc_not_zero(&passthrough->count))
+		passthrough = NULL;
+	rcu_read_unlock();
+	if (!passthrough)
+		return -ENOENT;
+
+	ff->passthrough = passthrough;
+
+	return 0;
 }
 
 void fuse_passthrough_put(struct fuse_passthrough *passthrough)
@@ -36,5 +117,5 @@ void fuse_passthrough_free(struct fuse_passthrough *passthrough)
 		fput(passthrough->filp);
 		passthrough->filp = NULL;
 	}
-	kfree(passthrough);
+	kfree_rcu(passthrough, rcu);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 04/10] fuse: Introduce synchronous read and write for passthrough
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (2 preceding siblings ...)
  2023-05-19 12:56 ` [PATCH v13 03/10] fuse: Passthrough initialization and release Amir Goldstein
@ 2023-05-19 12:56 ` Amir Goldstein
  2023-05-19 12:57 ` [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough Amir Goldstein
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

All the read and write operations performed on fuse_files which have the
passthrough feature enabled are forwarded to the associated backing file
via VFS.

Sending the request directly to the backing file avoids the userspace
round-trip that, because of possible context switches and additional
operations might reduce the overall performance, especially in those
cases where caching doesn't help, for example in reads at random offsets.

Verifying if a fuse_file has a backing file associated with it
can be done by checking the validity of its passthrough_filp pointer.
This pointer is not NULL only if passthrough has been successfully
enabled via the appropriate ioctl().

When a read/write operation is requested for a FUSE file with passthrough
enabled, a new equivalent VFS request is generated, which instead targets
the backing file.

The VFS layer performs additional checks that allow for safer operations
but may cause the operation to fail if the process accessing the FUSE
file does not have access to the backing file.

This change only implements synchronous requests in passthrough,
returning an error in the case of asynchronous operations, yet covering
the majority of the use cases.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c        |  8 ++++++--
 fs/fuse/fuse_i.h      |  3 +++
 fs/fuse/passthrough.c | 47 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 96a46a5aa892..24d37681ddcd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1682,7 +1682,9 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_read_iter(iocb, to);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (ff->passthrough)
+		return fuse_passthrough_read_iter(iocb, to);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_read_iter(iocb, to);
 	else
 		return fuse_direct_read_iter(iocb, to);
@@ -1700,7 +1702,9 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_write_iter(iocb, from);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (ff->passthrough)
+		return fuse_passthrough_write_iter(iocb, from);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_write_iter(iocb, from);
 	else
 		return fuse_direct_write_iter(iocb, from);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9ad1cc37a5c4..ff09fcd840df 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1352,4 +1352,7 @@ int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
 void fuse_passthrough_put(struct fuse_passthrough *passthrough);
 void fuse_passthrough_free(struct fuse_passthrough *passthrough);
 
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *to);
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *from);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 8d090ae252f2..9d81f3982c96 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -4,6 +4,53 @@
 
 #include <linux/file.h>
 #include <linux/idr.h>
+#include <linux/uio.h>
+
+#define FUSE_IOCB_MASK                                                  \
+	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
+				   struct iov_iter *iter)
+{
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct file *passthrough_filp = ff->passthrough->filp;
+	ssize_t ret;
+	rwf_t rwf;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+	ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
+
+	return ret;
+}
+
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
+				    struct iov_iter *iter)
+{
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct inode *fuse_inode = file_inode(fuse_filp);
+	struct file *passthrough_filp = ff->passthrough->filp;
+	ssize_t ret;
+	rwf_t rwf;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	inode_lock(fuse_inode);
+
+	file_start_write(passthrough_filp);
+	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+	ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
+	file_end_write(passthrough_filp);
+
+	inode_unlock(fuse_inode);
+
+	return ret;
+}
 
 /*
  * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (3 preceding siblings ...)
  2023-05-19 12:56 ` [PATCH v13 04/10] fuse: Introduce synchronous read and write for passthrough Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-05-22 15:20   ` Miklos Szeredi
  2023-05-19 12:57 ` [PATCH v13 06/10] fuse: Use daemon creds in passthrough mode Amir Goldstein
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Extend the passthrough feature by handling asynchronous IO both for read
and write operations.

When an AIO request is received, if the request targets a FUSE file with
the passthrough functionality enabled, a new identical AIO request is
created.  The new request targets the backing file and gets assigned
a special FUSE passthrough AIO completion callback.

When the backing file AIO request is completed, the FUSE
passthrough AIO completion callback is executed and propagates the
completion signal to the FUSE AIO request by triggering its completion
callback as well.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/passthrough.c | 82 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 75 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 9d81f3982c96..2ccd2d6de736 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -9,6 +9,36 @@
 #define FUSE_IOCB_MASK                                                  \
 	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
 
+struct fuse_aio_req {
+	struct kiocb iocb;
+	struct kiocb *iocb_fuse;
+};
+
+static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+	}
+
+	iocb_fuse->ki_pos = iocb->ki_pos;
+	kfree(aio_req);
+}
+
+static void fuse_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct fuse_aio_req *aio_req =
+		container_of(iocb, struct fuse_aio_req, iocb);
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	fuse_aio_cleanup_handler(aio_req);
+	iocb_fuse->ki_complete(iocb_fuse, res);
+}
+
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 				   struct iov_iter *iter)
 {
@@ -21,8 +51,24 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 	if (!iov_iter_count(iter))
 		return 0;
 
-	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
-	ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
+	if (is_sync_kiocb(iocb_fuse)) {
+		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				    rwf);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req)
+			return -ENOMEM;
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
 
 	return ret;
 }
@@ -34,6 +80,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct inode *fuse_inode = file_inode(fuse_filp);
 	struct file *passthrough_filp = ff->passthrough->filp;
+	struct inode *passthrough_inode = file_inode(passthrough_filp);
 	ssize_t ret;
 	rwf_t rwf;
 
@@ -42,11 +89,32 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 
 	inode_lock(fuse_inode);
 
-	file_start_write(passthrough_filp);
-	rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
-	ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos, rwf);
-	file_end_write(passthrough_filp);
-
+	if (is_sync_kiocb(iocb_fuse)) {
+		file_start_write(passthrough_filp);
+		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				     rwf);
+		file_end_write(passthrough_filp);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		file_start_write(passthrough_filp);
+		__sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
+out:
 	inode_unlock(fuse_inode);
 
 	return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 06/10] fuse: Use daemon creds in passthrough mode
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (4 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-05-19 12:57 ` [PATCH v13 07/10] fuse: Introduce passthrough for mmap Amir Goldstein
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

When using FUSE passthrough, read/write operations are directly
forwarded to the backing file through VFS, but there is no guarantee
that the process that is triggering the request has the right
permissions to access the backing file.  This would cause the
read/write access to fail.

In a passthrough FUSE filesystems, where the FUSE daemon is responsible
for the enforcement of the backing file access policies, often happens
that the process dealing with the FUSE filesystem doesn't have access
to the backing file.

Being the FUSE daemon in charge of implementing the FUSE file operations,
that in the case of read/write operations usually simply results in the
copy of memory buffers from/to the backing filesystem respectively,
these operations are executed with the FUSE daemon privileges.

This patch adds a reference to the FUSE daemon credentials, referenced
at FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used
to temporarily override the user credentials when accessing backing file
in passthrough operations.

The process accessing the FUSE file with passthrough enabled temporarily
receives the privileges of the FUSE daemon while performing read/write
operations. Similar behavior is implemented in overlayfs.

These privileges will be reverted as soon as the IO operation completes.
This feature does not provide any higher security privileges to those
processes accessing the FUSE filesystem with passthrough enabled.  This
is because it is still the FUSE daemon responsible for enabling or not
the passthrough feature at file open time, and should enable the feature
only after appropriate access policy checks.

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/fuse_i.h      |  4 +++-
 fs/fuse/passthrough.c | 18 ++++++++++++++++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index ff09fcd840df..61a3968cfc8f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -174,10 +174,12 @@ struct fuse_mount;
 struct fuse_release_args;
 
 /**
- * Reference to backing file for read/write operations in passthrough mode.
+ * Reference to backing file for read/write operations in passthrough mode
+ * and the credentials to be used for passthrough operations.
  */
 struct fuse_passthrough {
 	struct file *filp;
+	struct cred *cred;
 
 	/** refcount */
 	refcount_t count;
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 2ccd2d6de736..06c6926aa85a 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -45,12 +45,14 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 	struct file *fuse_filp = iocb_fuse->ki_filp;
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct file *passthrough_filp = ff->passthrough->filp;
+	const struct cred *old_cred;
 	ssize_t ret;
 	rwf_t rwf;
 
 	if (!iov_iter_count(iter))
 		return 0;
 
+	old_cred = override_creds(ff->passthrough->cred);
 	if (is_sync_kiocb(iocb_fuse)) {
 		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
 		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
@@ -59,8 +61,10 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 		struct fuse_aio_req *aio_req;
 
 		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
-		if (!aio_req)
-			return -ENOMEM;
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
 
 		aio_req->iocb_fuse = iocb_fuse;
 		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
@@ -69,6 +73,8 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 		if (ret != -EIOCBQUEUED)
 			fuse_aio_cleanup_handler(aio_req);
 	}
+out:
+	revert_creds(old_cred);
 
 	return ret;
 }
@@ -81,6 +87,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	struct inode *fuse_inode = file_inode(fuse_filp);
 	struct file *passthrough_filp = ff->passthrough->filp;
 	struct inode *passthrough_inode = file_inode(passthrough_filp);
+	const struct cred *old_cred;
 	ssize_t ret;
 	rwf_t rwf;
 
@@ -89,6 +96,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 
 	inode_lock(fuse_inode);
 
+	old_cred = override_creds(ff->passthrough->cred);
 	if (is_sync_kiocb(iocb_fuse)) {
 		file_start_write(passthrough_filp);
 		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
@@ -115,6 +123,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 			fuse_aio_cleanup_handler(aio_req);
 	}
 out:
+	revert_creds(old_cred);
 	inode_unlock(fuse_inode);
 
 	return ret;
@@ -156,6 +165,7 @@ int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
 		goto out_fput;
 
 	passthrough->filp = passthrough_filp;
+	passthrough->cred = prepare_creds();
 	refcount_set(&passthrough->count, 1);
 
 	idr_preload(GFP_KERNEL);
@@ -232,5 +242,9 @@ void fuse_passthrough_free(struct fuse_passthrough *passthrough)
 		fput(passthrough->filp);
 		passthrough->filp = NULL;
 	}
+	if (passthrough->cred) {
+		put_cred(passthrough->cred);
+		passthrough->cred = NULL;
+	}
 	kfree_rcu(passthrough, rcu);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 07/10] fuse: Introduce passthrough for mmap
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (5 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 06/10] fuse: Use daemon creds in passthrough mode Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-05-19 12:57 ` [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write Amir Goldstein
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Enabling FUSE passthrough for mmap-ed operations not only affects
performance, but has also been shown as mandatory for the correct
functioning of FUSE passthrough.

yanwu noticed [1] that a FUSE file with passthrough enabled may suffer
data inconsistencies if the same file is also accessed with mmap.
What happens is that read/write operations are directly applied to the
backing file's page cache, while mmap-ed operations are affecting the
FUSE file's page cache.

Extend the FUSE passthrough implementation to also handle memory-mapped
FUSE file, to both fix the cache inconsistencies and extend the
passthrough performance benefits to mmap-ed operations.

[1] https://lore.kernel.org/lkml/20210119110654.11817-1-wu-yan@tcl.com/

Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c        |  3 +++
 fs/fuse/fuse_i.h      |  1 +
 fs/fuse/passthrough.c | 27 +++++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 24d37681ddcd..80e20bae569f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2521,6 +2521,9 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+	if (ff->passthrough)
+		return fuse_passthrough_mmap(file, vma);
+
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/* Can't provide the coherency needed for MAP_SHARED */
 		if (vma->vm_flags & VM_MAYSHARE)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 61a3968cfc8f..238a43349298 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1356,5 +1356,6 @@ void fuse_passthrough_free(struct fuse_passthrough *passthrough);
 
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *from);
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 06c6926aa85a..10b370bcc423 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -129,6 +129,33 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	return ret;
 }
 
+ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	int ret;
+	const struct cred *old_cred;
+	struct fuse_file *ff = file->private_data;
+	struct file *passthrough_filp = ff->passthrough->filp;
+
+	if (!passthrough_filp->f_op->mmap)
+		return -ENODEV;
+
+	if (WARN_ON(file != vma->vm_file))
+		return -EIO;
+
+	vma->vm_file = get_file(passthrough_filp);
+
+	old_cred = override_creds(ff->passthrough->cred);
+	ret = call_mmap(vma->vm_file, vma);
+	revert_creds(old_cred);
+
+	if (ret)
+		fput(passthrough_filp);
+	else
+		fput(file);
+
+	return ret;
+}
+
 /*
  * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
  * open response and needs to be released with fuse_passthrough_close().
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (6 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 07/10] fuse: Introduce passthrough for mmap Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-09-25  6:41   ` [External] " Zhang Tianci
  2023-05-19 12:57 ` [PATCH v13 09/10] fuse: invalidate atime after passthrough read/mmap Amir Goldstein
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Similar to update size/mtime at the end of fuse_perform_write(),
we need to bump the attr version when we update the inode size.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/passthrough.c | 53 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 10b370bcc423..8352d6b91e0e 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -14,15 +14,42 @@ struct fuse_aio_req {
 	struct kiocb *iocb_fuse;
 };
 
-static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
+static void fuse_file_start_write(struct file *fuse_file,
+				  struct file *backing_file,
+				  loff_t pos, size_t count)
+{
+	struct inode *inode = file_inode(fuse_file);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (inode->i_size < pos + count)
+		set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+
+	file_start_write(backing_file);
+}
+
+static void fuse_file_end_write(struct file *fuse_file,
+				struct file *backing_file,
+				loff_t pos, ssize_t res)
+{
+	struct inode *inode = file_inode(fuse_file);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	file_end_write(backing_file);
+
+	fuse_write_update_attr(inode, pos, res);
+	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+}
+
+static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req, long res)
 {
 	struct kiocb *iocb = &aio_req->iocb;
 	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+	struct file *filp = iocb->ki_filp;
+	struct file *fuse_filp = iocb_fuse->ki_filp;
 
 	if (iocb->ki_flags & IOCB_WRITE) {
-		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
-				      SB_FREEZE_WRITE);
-		file_end_write(iocb->ki_filp);
+		__sb_writers_acquired(file_inode(filp)->i_sb, SB_FREEZE_WRITE);
+		fuse_file_end_write(fuse_filp, filp, iocb->ki_pos, res);
 	}
 
 	iocb_fuse->ki_pos = iocb->ki_pos;
@@ -35,7 +62,7 @@ static void fuse_aio_rw_complete(struct kiocb *iocb, long res)
 		container_of(iocb, struct fuse_aio_req, iocb);
 	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
 
-	fuse_aio_cleanup_handler(aio_req);
+	fuse_aio_cleanup_handler(aio_req, res);
 	iocb_fuse->ki_complete(iocb_fuse, res);
 }
 
@@ -71,7 +98,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
 		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
 		if (ret != -EIOCBQUEUED)
-			fuse_aio_cleanup_handler(aio_req);
+			fuse_aio_cleanup_handler(aio_req, ret);
 	}
 out:
 	revert_creds(old_cred);
@@ -87,22 +114,25 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	struct inode *fuse_inode = file_inode(fuse_filp);
 	struct file *passthrough_filp = ff->passthrough->filp;
 	struct inode *passthrough_inode = file_inode(passthrough_filp);
+	size_t count = iov_iter_count(iter);
 	const struct cred *old_cred;
 	ssize_t ret;
 	rwf_t rwf;
 
-	if (!iov_iter_count(iter))
+	if (!count)
 		return 0;
 
 	inode_lock(fuse_inode);
 
 	old_cred = override_creds(ff->passthrough->cred);
 	if (is_sync_kiocb(iocb_fuse)) {
-		file_start_write(passthrough_filp);
+		fuse_file_start_write(fuse_filp, passthrough_filp,
+				      iocb_fuse->ki_pos, count);
 		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
 		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
 				     rwf);
-		file_end_write(passthrough_filp);
+		fuse_file_end_write(fuse_filp, passthrough_filp,
+				    iocb_fuse->ki_pos, ret);
 	} else {
 		struct fuse_aio_req *aio_req;
 
@@ -112,7 +142,8 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 			goto out;
 		}
 
-		file_start_write(passthrough_filp);
+		fuse_file_start_write(fuse_filp, passthrough_filp,
+				      iocb_fuse->ki_pos, count);
 		__sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
 
 		aio_req->iocb_fuse = iocb_fuse;
@@ -120,7 +151,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
 		ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
 		if (ret != -EIOCBQUEUED)
-			fuse_aio_cleanup_handler(aio_req);
+			fuse_aio_cleanup_handler(aio_req, ret);
 	}
 out:
 	revert_creds(old_cred);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 09/10] fuse: invalidate atime after passthrough read/mmap
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (7 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-05-19 12:57 ` [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id Amir Goldstein
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

Similar to invalidate atime in fuse_readpages_end().

To minimize requests to server, invalidate atime only if the backing
inode atime has changed during the operation.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/passthrough.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 8352d6b91e0e..2b745b6b2364 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -12,6 +12,7 @@
 struct fuse_aio_req {
 	struct kiocb iocb;
 	struct kiocb *iocb_fuse;
+	struct timespec64 pre_atime;
 };
 
 static void fuse_file_start_write(struct file *fuse_file,
@@ -40,6 +41,21 @@ static void fuse_file_end_write(struct file *fuse_file,
 	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 }
 
+static void fuse_file_start_read(struct file *backing_file,
+				 struct timespec64 *pre_atime)
+{
+	*pre_atime = file_inode(backing_file)->i_atime;
+}
+
+static void fuse_file_end_read(struct file *fuse_file,
+			       struct file *backing_file,
+			       struct timespec64 *pre_atime)
+{
+	/* Mimic atime update policy of passthrough inode, not the value */
+	if (!timespec64_equal(&file_inode(backing_file)->i_atime, pre_atime))
+		fuse_invalidate_atime(file_inode(fuse_file));
+}
+
 static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req, long res)
 {
 	struct kiocb *iocb = &aio_req->iocb;
@@ -50,6 +66,8 @@ static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req, long res)
 	if (iocb->ki_flags & IOCB_WRITE) {
 		__sb_writers_acquired(file_inode(filp)->i_sb, SB_FREEZE_WRITE);
 		fuse_file_end_write(fuse_filp, filp, iocb->ki_pos, res);
+	} else {
+		fuse_file_end_read(fuse_filp, filp, &aio_req->pre_atime);
 	}
 
 	iocb_fuse->ki_pos = iocb->ki_pos;
@@ -81,9 +99,13 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 
 	old_cred = override_creds(ff->passthrough->cred);
 	if (is_sync_kiocb(iocb_fuse)) {
+		struct timespec64 pre_atime;
+
 		rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
+		fuse_file_start_read(passthrough_filp, &pre_atime);
 		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
 				    rwf);
+		fuse_file_end_read(fuse_filp, passthrough_filp, &pre_atime);
 	} else {
 		struct fuse_aio_req *aio_req;
 
@@ -94,6 +116,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 		}
 
 		aio_req->iocb_fuse = iocb_fuse;
+		fuse_file_start_read(passthrough_filp, &aio_req->pre_atime);
 		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
 		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
 		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
@@ -166,6 +189,7 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
 	const struct cred *old_cred;
 	struct fuse_file *ff = file->private_data;
 	struct file *passthrough_filp = ff->passthrough->filp;
+	struct timespec64 pre_atime;
 
 	if (!passthrough_filp->f_op->mmap)
 		return -ENODEV;
@@ -176,7 +200,9 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_file = get_file(passthrough_filp);
 
 	old_cred = override_creds(ff->passthrough->cred);
+	fuse_file_start_read(passthrough_filp, &pre_atime);
 	ret = call_mmap(vma->vm_file, vma);
+	fuse_file_end_read(file, passthrough_filp, &pre_atime);
 	revert_creds(old_cred);
 
 	if (ret)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (8 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 09/10] fuse: invalidate atime after passthrough read/mmap Amir Goldstein
@ 2023-05-19 12:57 ` Amir Goldstein
  2023-06-06 10:22   ` Fwd: " Miklos Szeredi
  2023-05-20 10:28 ` [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
  2023-06-06  9:13 ` Amir Goldstein
  11 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-19 12:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

WIP

Add an ioctl to associate a FUSE server open fd with a request.
A later response to this request get use the FOPEN_PASSTHROUGH flag
to request passthrough to the associated backing file.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Miklos,

After implementing refcounted backing files, I started to think how
to limit the server from mapping too many files.

I wanted to limit the backing files mappings to the number of open fuse
files to simplify backing files accounting (i.e. open files are
effectively accounted to clients).

It occured to me that creatig a 1-to-1 mapping between fuse files and
backing file ids is quite futile if there is no need to manage 1-to-many
backing file mappings.

If only 1-to-1 mapping is desired, the proposed ioctl associates a
backing file with a pending request.  The backing file will be kept
open for as long the request lives, or until its refcount is handed
over to the client, which can then use it to setup passthough to the
backing file without the intermediate idr array.

I have not implemented the full hand over yet, because I wanted to
hear your feedback first.

Thanks,
Amir.


 fs/fuse/dev.c             | 45 ++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          | 16 +++++++++-----
 fs/fuse/passthrough.c     | 12 ++++++++++-
 include/uapi/linux/fuse.h |  8 +++++++
 4 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cb00234e7843..01fb9c5411d2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -61,6 +61,8 @@ static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
 
 static void fuse_request_free(struct fuse_req *req)
 {
+	if (test_bit(FR_PASSTHROUGH, &req->flags))
+		fuse_passthrough_put(req->fpt);
 	kmem_cache_free(fuse_req_cachep, req);
 }
 
@@ -2251,6 +2253,43 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new)
 	return 0;
 }
 
+// Associate an passthrough fd to a fuse request
+static int fuse_handle_ioc_passthrough_setup(struct fuse_dev *fud,
+				struct fuse_passthrough_setup_in __user *arg)
+{
+	struct fuse_pqueue *fpq = &fud->pq;
+	struct fuse_req *req;
+	struct fuse_passthrough_setup_in fpts;
+	struct fuse_passthrough *fpt;
+	int err;
+
+	if (copy_from_user(&fpts, arg, sizeof(fpts)))
+		return -EFAULT;
+
+	if (fpts.padding)
+		return -EINVAL;
+
+	err = fuse_passthrough_open(fud->fc, fpts.fd, &fpt);
+	if (err)
+		return err;
+
+	spin_lock(&fpq->lock);
+	req = NULL;
+	if (fpq->connected)
+		req = request_find(fpq, fpts.unique);
+	if (req) {
+		__set_bit(FR_PASSTHROUGH, &req->flags);
+		req->fpt = fpt;
+	}
+	spin_unlock(&fpq->lock);
+	if (!req) {
+		fuse_passthrough_put(fpt);
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
 static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -2291,7 +2330,7 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 		if (!f.file)
 			return -EINVAL;
 
-		res = fuse_passthrough_open(fud->fc, fd);
+		res = fuse_passthrough_open(fud->fc, fd, NULL);
 		fdput(f);
 		break;
 	case FUSE_DEV_IOC_PASSTHROUGH_CLOSE:
@@ -2300,6 +2339,10 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 
 		res = fuse_passthrough_close(fud->fc, id);
 		break;
+	case FUSE_DEV_IOC_PASSTHROUGH_SETUP:
+		res = fuse_handle_ioc_passthrough_setup(fud,
+			   (struct fuse_passthrough_setup_in __user *)arg);
+		break;
 	default:
 		res = -ENOTTY;
 		break;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 238a43349298..085d7607ba6e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -332,6 +332,7 @@ struct fuse_io_priv {
  * FR_FINISHED:		request is finished
  * FR_PRIVATE:		request is on private list
  * FR_ASYNC:		request is asynchronous
+ * FR_PASSTHROUGH:	request is associated with passthrough file
  */
 enum fuse_req_flag {
 	FR_ISREPLY,
@@ -346,6 +347,7 @@ enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_PASSTHROUGH,
 };
 
 /**
@@ -385,10 +387,13 @@ struct fuse_req {
 	/** Used to wake up the task waiting for completion of request*/
 	wait_queue_head_t waitq;
 
-#if IS_ENABLED(CONFIG_VIRTIO_FS)
-	/** virtio-fs's physically contiguous buffer for in and out args */
-	void *argbuf;
-#endif
+	union {
+		/** virtio-fs's physically contiguous buffer for in/out args */
+		void *argbuf;
+
+		/** passthrough file associated with request */
+		struct fuse_passthrough *fpt;
+	};
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
@@ -1347,7 +1352,8 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
 /* passthrough.c */
-int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd);
+int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd,
+			  struct fuse_passthrough **pfpt);
 int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh);
 int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
 			   struct fuse_open_out *openarg);
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 2b745b6b2364..85a1d72b1666 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -216,8 +216,12 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
 /*
  * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
  * open response and needs to be released with fuse_passthrough_close().
+ *
+ * When @pfpt is non-NULL, an anonynous passthrough handle is returned
+ * via *pfpt on success and the return value is 0.
  */
-int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
+int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd,
+			  struct fuse_passthrough **pfpt)
 {
 	struct file *passthrough_filp;
 	struct inode *passthrough_inode;
@@ -252,6 +256,12 @@ int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
 	passthrough->cred = prepare_creds();
 	refcount_set(&passthrough->count, 1);
 
+	if (pfpt) {
+		/* Return an anonynous passthrough handle */
+		*pfpt = passthrough;
+		return 0;
+	}
+
 	idr_preload(GFP_KERNEL);
 	spin_lock(&fc->lock);
 	res = idr_alloc_cyclic(&fc->passthrough_files_map, passthrough, 1, 0,
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 3da1f59007cf..028a65fe3fa2 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -993,11 +993,19 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_passthrough_setup_in {
+	uint64_t	unique;
+	uint32_t        fd;
+	uint32_t        padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
 #define FUSE_DEV_IOC_PASSTHROUGH_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, uint32_t)
 #define FUSE_DEV_IOC_PASSTHROUGH_CLOSE	_IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
+#define FUSE_DEV_IOC_PASSTHROUGH_SETUP	_IOW(FUSE_DEV_IOC_MAGIC, 3, \
+					     struct fuse_passthrough_setup_in)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough
  2023-05-19 12:56 ` [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough Amir Goldstein
@ 2023-05-19 15:12   ` Miklos Szeredi
  2023-05-20 10:20     ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-05-19 15:12 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> From: Alessio Balsini <balsini@android.com>
>
> Expose the FUSE_PASSTHROUGH capability to user space and declare all the
> basic data structures and functions as the skeleton on top of which the
> FUSE passthrough functionality will be built.
>
> As part of this, introduce the new FUSE passthrough ioctl, which allows
> the FUSE daemon to specify a direct connection between a FUSE file and a
> backing file.  The ioctl requires user space to pass the file descriptor
> of one of its opened files to the FUSE driver and get an id in return.
> This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH
> to setup passthrough of read/write operations on the open file.
>
> Also, add the passthrough functions for the set-up and tear-down of the
> data structures and locks that will be used both when fuse_conns and
> fuse_files are created/deleted.
>
> Signed-off-by: Alessio Balsini <balsini@android.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/fuse/Makefile          |  1 +
>  fs/fuse/dev.c             | 33 ++++++++++++++++++++++++--------
>  fs/fuse/dir.c             |  7 ++++++-
>  fs/fuse/file.c            | 17 +++++++++++++----
>  fs/fuse/fuse_i.h          | 27 ++++++++++++++++++++++++++
>  fs/fuse/inode.c           | 21 +++++++++++++++++++-
>  fs/fuse/passthrough.c     | 40 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fuse.h | 13 +++++++++++--
>  8 files changed, 143 insertions(+), 16 deletions(-)
>  create mode 100644 fs/fuse/passthrough.c
>
> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> index 0c48b35c058d..d9e1b47382f3 100644
> --- a/fs/fuse/Makefile
> +++ b/fs/fuse/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
>  obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
>
>  fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
> +fuse-y += passthrough.o
>  fuse-$(CONFIG_FUSE_DAX) += dax.o
>
>  virtiofs-y := virtio_fs.o
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 1a8f82f478cb..cb00234e7843 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
>                            unsigned long arg)
>  {
>         int res;
> -       int oldfd;
> -       struct fuse_dev *fud = NULL;
> +       int fd, id;
> +       struct fuse_dev *fud = fuse_get_dev(file);

This is broken, see below.

>         struct fd f;
>
> +       if (!fud)
> +               return -EINVAL;
> +
>         switch (cmd) {
>         case FUSE_DEV_IOC_CLONE:
> -               if (get_user(oldfd, (__u32 __user *)arg))
> +               if (get_user(fd, (__u32 __user *)arg))
>                         return -EFAULT;
>
> -               f = fdget(oldfd);
> +               f = fdget(fd);
>                 if (!f.file)
>                         return -EINVAL;
>
> @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
>                  * Check against file->f_op because CUSE
>                  * uses the same ioctl handler.
>                  */
> -               if (f.file->f_op == file->f_op)
> -                       fud = fuse_get_dev(f.file);
> -
>                 res = -EINVAL;
> -               if (fud) {
> +               if (f.file->f_op == file->f_op) {
>                         mutex_lock(&fuse_mutex);
>                         res = fuse_device_clone(fud->fc, file);

We are cloning f.file into file not the other way round.  So fud must
come from f.file.


>                         mutex_unlock(&fuse_mutex);
>                 }
>                 fdput(f);
>                 break;
> +       case FUSE_DEV_IOC_PASSTHROUGH_OPEN:
> +               if (get_user(fd, (__u32 __user *)arg))
> +                       return -EFAULT;
> +
> +               f = fdget(fd);
> +               if (!f.file)
> +                       return -EINVAL;
> +
> +               res = fuse_passthrough_open(fud->fc, fd);
> +               fdput(f);
> +               break;
> +       case FUSE_DEV_IOC_PASSTHROUGH_CLOSE:
> +               if (get_user(id, (__u32 __user *)arg))
> +                       return -EFAULT;
> +
> +               res = fuse_passthrough_close(fud->fc, id);
> +               break;
>         default:
>                 res = -ENOTTY;
>                 break;
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 35bc174f9ba2..1894298e7f7a 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -619,6 +619,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  {
>         int err;
>         struct inode *inode;
> +       struct fuse_conn *fc = get_fuse_conn(dir);
>         struct fuse_mount *fm = get_fuse_mount(dir);
>         FUSE_ARGS(args);
>         struct fuse_forget_link *forget;
> @@ -700,7 +701,11 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>         d_instantiate(entry, inode);
>         fuse_change_entry_timeout(entry, &outentry);
>         fuse_dir_changed(dir);
> -       err = finish_open(file, entry, generic_file_open);
> +       err = 0;
> +       if (ff->open_flags & FOPEN_PASSTHROUGH)
> +               err = fuse_passthrough_setup(fc, ff, &outopen);
> +       if (!err)
> +               err = finish_open(file, entry, generic_file_open);
>         if (err) {
>                 fi = get_fuse_inode(inode);
>                 fuse_sync_release(fi, ff, flags);
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 89d97f6188e0..96a46a5aa892 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -132,6 +132,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>         struct fuse_conn *fc = fm->fc;
>         struct fuse_file *ff;
>         int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
> +       int err;
>
>         ff = fuse_file_alloc(fm);
>         if (!ff)
> @@ -142,16 +143,17 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>         ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
>         if (isdir ? !fc->no_opendir : !fc->no_open) {
>                 struct fuse_open_out outarg;
> -               int err;
>
>                 err = fuse_send_open(fm, nodeid, open_flags, opcode, &outarg);
>                 if (!err) {
>                         ff->fh = outarg.fh;
>                         ff->open_flags = outarg.open_flags;
> -
> +                       if (ff->open_flags & FOPEN_PASSTHROUGH)
> +                               err = fuse_passthrough_setup(fc, ff, &outarg);
> +                       if (err)
> +                               goto out_free_ff;
>                 } else if (err != -ENOSYS) {
> -                       fuse_file_free(ff);
> -                       return ERR_PTR(err);
> +                       goto out_free_ff;
>                 } else {
>                         if (isdir)
>                                 fc->no_opendir = 1;
> @@ -166,6 +168,10 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>         ff->nodeid = nodeid;
>
>         return ff;
> +
> +out_free_ff:
> +       fuse_file_free(ff);
> +       return ERR_PTR(err);
>  }
>
>  int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
> @@ -279,6 +285,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
>         struct fuse_conn *fc = ff->fm->fc;
>         struct fuse_release_args *ra = ff->release_args;
>
> +       fuse_passthrough_put(ff->passthrough);
> +       ff->passthrough = NULL;
> +
>         /* Inode is NULL on error path of fuse_create_open() */
>         if (likely(fi)) {
>                 spin_lock(&fi->lock);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9b7fc7d3c7f1..f52604534ff6 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -173,6 +173,16 @@ struct fuse_conn;
>  struct fuse_mount;
>  struct fuse_release_args;
>
> +/**
> + * Reference to backing file for read/write operations in passthrough mode.
> + */
> +struct fuse_passthrough {
> +       struct file *filp;
> +
> +       /** refcount */
> +       refcount_t count;
> +};
> +
>  /** FUSE specific file data */
>  struct fuse_file {
>         /** Fuse connection for this file */
> @@ -218,6 +228,9 @@ struct fuse_file {
>
>         } readdir;
>
> +       /** Container for data related to the passthrough functionality */
> +       struct fuse_passthrough *passthrough;
> +
>         /** RB node to be linked on fuse_conn->polled_files */
>         struct rb_node polled_node;
>
> @@ -792,6 +805,9 @@ struct fuse_conn {
>         /* Is tmpfile not implemented by fs? */
>         unsigned int no_tmpfile:1;
>
> +       /** Passthrough mode for read/write IO */
> +       unsigned int passthrough:1;
> +
>         /** The number of requests waiting for completion */
>         atomic_t num_waiting;
>
> @@ -841,6 +857,9 @@ struct fuse_conn {
>
>         /* New writepages go into this bucket */
>         struct fuse_sync_bucket __rcu *curr_bucket;
> +
> +       /** IDR for passthrough files */
> +       struct idr passthrough_files_map;
>  };
>
>  /*
> @@ -1324,4 +1343,12 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>  void fuse_file_release(struct inode *inode, struct fuse_file *ff,
>                        unsigned int open_flags, fl_owner_t id, bool isdir);
>
> +/* passthrough.c */
> +int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd);
> +int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh);
> +int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
> +                          struct fuse_open_out *openarg);
> +void fuse_passthrough_put(struct fuse_passthrough *passthrough);
> +void fuse_passthrough_free(struct fuse_passthrough *passthrough);
> +
>  #endif /* _FS_FUSE_I_H */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index d66070af145d..271586fac008 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -840,6 +840,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
>         INIT_LIST_HEAD(&fc->bg_queue);
>         INIT_LIST_HEAD(&fc->entry);
>         INIT_LIST_HEAD(&fc->devices);
> +       idr_init(&fc->passthrough_files_map);
>         atomic_set(&fc->num_waiting, 0);
>         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
>         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> @@ -1209,6 +1210,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>                                 fc->init_security = 1;
>                         if (flags & FUSE_CREATE_SUPP_GROUP)
>                                 fc->create_supp_group = 1;
> +                       if (flags & FUSE_PASSTHROUGH) {
> +                               fc->passthrough = 1;
> +                               /* Prevent further stacking */
> +                               fm->sb->s_stack_depth =
> +                                       FILESYSTEM_MAX_STACK_DEPTH;
> +                       }

Seems too restrictive.  We could specify the max stacking depth in the
protocol and verify that when registering the passthrough file.

I.e. fuse_sb->s_stack_depth of

0 -> passthrough disabled
1 -> backing_sb->s_stack_depth == 0
2 -> backing_sb->stack_depth <= 1
...

>                 } else {
>                         ra_pages = fc->max_read / PAGE_SIZE;
>                         fc->no_lock = 1;
> @@ -1254,7 +1261,8 @@ void fuse_send_init(struct fuse_mount *fm)
>                 FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
>                 FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
>                 FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> -               FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP;
> +               FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
> +               FUSE_PASSTHROUGH;
>  #ifdef CONFIG_FUSE_DAX
>         if (fm->fc->dax)
>                 flags |= FUSE_MAP_ALIGNMENT;
> @@ -1287,9 +1295,20 @@ void fuse_send_init(struct fuse_mount *fm)
>  }
>  EXPORT_SYMBOL_GPL(fuse_send_init);
>
> +static int fuse_passthrough_id_free(int id, void *p, void *data)
> +{
> +       struct fuse_passthrough *passthrough = p;
> +
> +       WARN_ON_ONCE(refcount_read(&passthrough->count) != 1);
> +       fuse_passthrough_free(passthrough);
> +       return 0;
> +}
> +
>  void fuse_free_conn(struct fuse_conn *fc)
>  {
>         WARN_ON(!list_empty(&fc->devices));
> +       idr_for_each(&fc->passthrough_files_map, fuse_passthrough_id_free, NULL);
> +       idr_destroy(&fc->passthrough_files_map);
>         kfree_rcu(fc, rcu);
>  }
>  EXPORT_SYMBOL_GPL(fuse_free_conn);
> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> new file mode 100644
> index 000000000000..fc723e004de9
> --- /dev/null
> +++ b/fs/fuse/passthrough.c
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "fuse_i.h"
> +
> +#include <linux/file.h>
> +
> +/*
> + * Returns passthrough_fh id that can be passed with FOPEN_PASSTHROUGH
> + * open response and needs to be released with fuse_passthrough_close().
> + */
> +int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd)
> +{
> +       return -EINVAL;
> +}
> +
> +int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh)
> +{
> +       return -EINVAL;
> +}
> +
> +int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
> +                          struct fuse_open_out *openarg)
> +{
> +       return -EINVAL;
> +}
> +
> +void fuse_passthrough_put(struct fuse_passthrough *passthrough)
> +{
> +       if (passthrough && refcount_dec_and_test(&passthrough->count))
> +               fuse_passthrough_free(passthrough);
> +}
> +
> +void fuse_passthrough_free(struct fuse_passthrough *passthrough)
> +{
> +       if (passthrough && passthrough->filp) {
> +               fput(passthrough->filp);
> +               passthrough->filp = NULL;
> +       }
> +       kfree(passthrough);
> +}
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 1b9d0dfae72d..3da1f59007cf 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -206,6 +206,10 @@
>   *  - add extension header
>   *  - add FUSE_EXT_GROUPS
>   *  - add FUSE_CREATE_SUPP_GROUP
> + *
> + *  7.39
> + *  - add FUSE_PASSTHROUGH
> + *  - add FOPEN_PASSTHROUGH
>   */
>
>  #ifndef _LINUX_FUSE_H
> @@ -241,7 +245,7 @@
>  #define FUSE_KERNEL_VERSION 7
>
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 38
> +#define FUSE_KERNEL_MINOR_VERSION 39
>
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -314,6 +318,7 @@ struct fuse_file_lock {
>   * FOPEN_STREAM: the file is stream-like (no file position at all)
>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
>   * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
> + * FOPEN_PASSTHROUGH: passthrough read/write operations for this open file
>   */
>  #define FOPEN_DIRECT_IO                (1 << 0)
>  #define FOPEN_KEEP_CACHE       (1 << 1)
> @@ -322,6 +327,7 @@ struct fuse_file_lock {
>  #define FOPEN_STREAM           (1 << 4)
>  #define FOPEN_NOFLUSH          (1 << 5)
>  #define FOPEN_PARALLEL_DIRECT_WRITES   (1 << 6)
> +#define FOPEN_PASSTHROUGH      (1 << 7)
>
>  /**
>   * INIT request/reply flags
> @@ -406,6 +412,7 @@ struct fuse_file_lock {
>  #define FUSE_SECURITY_CTX      (1ULL << 32)
>  #define FUSE_HAS_INODE_DAX     (1ULL << 33)
>  #define FUSE_CREATE_SUPP_GROUP (1ULL << 34)
> +#define FUSE_PASSTHROUGH       (1ULL << 35)
>
>  /**
>   * CUSE INIT request/reply flags
> @@ -698,7 +705,7 @@ struct fuse_create_in {
>  struct fuse_open_out {
>         uint64_t        fh;
>         uint32_t        open_flags;
> -       uint32_t        padding;
> +       uint32_t        passthrough_fh;
>  };
>
>  struct fuse_release_in {
> @@ -989,6 +996,8 @@ struct fuse_notify_retrieve_in {
>  /* Device ioctls: */
>  #define FUSE_DEV_IOC_MAGIC             229
>  #define FUSE_DEV_IOC_CLONE             _IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
> +#define FUSE_DEV_IOC_PASSTHROUGH_OPEN  _IOW(FUSE_DEV_IOC_MAGIC, 1, uint32_t)
> +#define FUSE_DEV_IOC_PASSTHROUGH_CLOSE _IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
>
>  struct fuse_lseek_in {
>         uint64_t        fh;
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough
  2023-05-19 15:12   ` Miklos Szeredi
@ 2023-05-20 10:20     ` Amir Goldstein
  2023-05-22 14:50       ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-20 10:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, May 19, 2023 at 6:13 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > From: Alessio Balsini <balsini@android.com>
> >
> > Expose the FUSE_PASSTHROUGH capability to user space and declare all the
> > basic data structures and functions as the skeleton on top of which the
> > FUSE passthrough functionality will be built.
> >
> > As part of this, introduce the new FUSE passthrough ioctl, which allows
> > the FUSE daemon to specify a direct connection between a FUSE file and a
> > backing file.  The ioctl requires user space to pass the file descriptor
> > of one of its opened files to the FUSE driver and get an id in return.
> > This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH
> > to setup passthrough of read/write operations on the open file.
> >
> > Also, add the passthrough functions for the set-up and tear-down of the
> > data structures and locks that will be used both when fuse_conns and
> > fuse_files are created/deleted.
> >
> > Signed-off-by: Alessio Balsini <balsini@android.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/fuse/Makefile          |  1 +
> >  fs/fuse/dev.c             | 33 ++++++++++++++++++++++++--------
> >  fs/fuse/dir.c             |  7 ++++++-
> >  fs/fuse/file.c            | 17 +++++++++++++----
> >  fs/fuse/fuse_i.h          | 27 ++++++++++++++++++++++++++
> >  fs/fuse/inode.c           | 21 +++++++++++++++++++-
> >  fs/fuse/passthrough.c     | 40 +++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/fuse.h | 13 +++++++++++--
> >  8 files changed, 143 insertions(+), 16 deletions(-)
> >  create mode 100644 fs/fuse/passthrough.c
> >
> > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> > index 0c48b35c058d..d9e1b47382f3 100644
> > --- a/fs/fuse/Makefile
> > +++ b/fs/fuse/Makefile
> > @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
> >  obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
> >
> >  fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
> > +fuse-y += passthrough.o
> >  fuse-$(CONFIG_FUSE_DAX) += dax.o
> >
> >  virtiofs-y := virtio_fs.o
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 1a8f82f478cb..cb00234e7843 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> >                            unsigned long arg)
> >  {
> >         int res;
> > -       int oldfd;
> > -       struct fuse_dev *fud = NULL;
> > +       int fd, id;
> > +       struct fuse_dev *fud = fuse_get_dev(file);
>
> This is broken, see below.

IIUC, *this* is not broken for the new ioctls...

>
> >         struct fd f;
> >
> > +       if (!fud)
> > +               return -EINVAL;
> > +
> >         switch (cmd) {
> >         case FUSE_DEV_IOC_CLONE:
> > -               if (get_user(oldfd, (__u32 __user *)arg))
> > +               if (get_user(fd, (__u32 __user *)arg))
> >                         return -EFAULT;
> >
> > -               f = fdget(oldfd);
> > +               f = fdget(fd);
> >                 if (!f.file)
> >                         return -EINVAL;
> >
> > @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> >                  * Check against file->f_op because CUSE
> >                  * uses the same ioctl handler.
> >                  */
> > -               if (f.file->f_op == file->f_op)
> > -                       fud = fuse_get_dev(f.file);
> > -
> >                 res = -EINVAL;
> > -               if (fud) {
> > +               if (f.file->f_op == file->f_op) {

and this can be fixed by adding:
 +                           fud = fuse_get_dev(f.file);

> >                         mutex_lock(&fuse_mutex);
> >                         res = fuse_device_clone(fud->fc, file);
>
> We are cloning f.file into file not the other way round.  So fud must
> come from f.file.
>
>
> >                         mutex_unlock(&fuse_mutex);
> >                 }
> >                 fdput(f);
> >                 break;
> > +       case FUSE_DEV_IOC_PASSTHROUGH_OPEN:
> > +               if (get_user(fd, (__u32 __user *)arg))
> > +                       return -EFAULT;
> > +
> > +               f = fdget(fd);
> > +               if (!f.file)
> > +                       return -EINVAL;
> > +
> > +               res = fuse_passthrough_open(fud->fc, fd);
> > +               fdput(f);
> > +               break;
> > +       case FUSE_DEV_IOC_PASSTHROUGH_CLOSE:
> > +               if (get_user(id, (__u32 __user *)arg))
> > +                       return -EFAULT;
> > +
> > +               res = fuse_passthrough_close(fud->fc, id);
> > +               break;
> >         default:
> >                 res = -ENOTTY;
> >                 break;
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 35bc174f9ba2..1894298e7f7a 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -619,6 +619,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> >  {
> >         int err;
> >         struct inode *inode;
> > +       struct fuse_conn *fc = get_fuse_conn(dir);
> >         struct fuse_mount *fm = get_fuse_mount(dir);
> >         FUSE_ARGS(args);
> >         struct fuse_forget_link *forget;
> > @@ -700,7 +701,11 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> >         d_instantiate(entry, inode);
> >         fuse_change_entry_timeout(entry, &outentry);
> >         fuse_dir_changed(dir);
> > -       err = finish_open(file, entry, generic_file_open);
> > +       err = 0;
> > +       if (ff->open_flags & FOPEN_PASSTHROUGH)
> > +               err = fuse_passthrough_setup(fc, ff, &outopen);
> > +       if (!err)
> > +               err = finish_open(file, entry, generic_file_open);
> >         if (err) {
> >                 fi = get_fuse_inode(inode);
> >                 fuse_sync_release(fi, ff, flags);
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 89d97f6188e0..96a46a5aa892 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -132,6 +132,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >         struct fuse_conn *fc = fm->fc;
> >         struct fuse_file *ff;
> >         int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
> > +       int err;
> >
> >         ff = fuse_file_alloc(fm);
> >         if (!ff)
> > @@ -142,16 +143,17 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >         ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
> >         if (isdir ? !fc->no_opendir : !fc->no_open) {
> >                 struct fuse_open_out outarg;
> > -               int err;
> >
> >                 err = fuse_send_open(fm, nodeid, open_flags, opcode, &outarg);
> >                 if (!err) {
> >                         ff->fh = outarg.fh;
> >                         ff->open_flags = outarg.open_flags;
> > -
> > +                       if (ff->open_flags & FOPEN_PASSTHROUGH)
> > +                               err = fuse_passthrough_setup(fc, ff, &outarg);
> > +                       if (err)
> > +                               goto out_free_ff;
> >                 } else if (err != -ENOSYS) {
> > -                       fuse_file_free(ff);
> > -                       return ERR_PTR(err);
> > +                       goto out_free_ff;
> >                 } else {
> >                         if (isdir)
> >                                 fc->no_opendir = 1;
> > @@ -166,6 +168,10 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >         ff->nodeid = nodeid;
> >
> >         return ff;
> > +
> > +out_free_ff:
> > +       fuse_file_free(ff);
> > +       return ERR_PTR(err);
> >  }
> >
> >  int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
> > @@ -279,6 +285,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
> >         struct fuse_conn *fc = ff->fm->fc;
> >         struct fuse_release_args *ra = ff->release_args;
> >
> > +       fuse_passthrough_put(ff->passthrough);
> > +       ff->passthrough = NULL;
> > +
> >         /* Inode is NULL on error path of fuse_create_open() */
> >         if (likely(fi)) {
> >                 spin_lock(&fi->lock);
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 9b7fc7d3c7f1..f52604534ff6 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -173,6 +173,16 @@ struct fuse_conn;
> >  struct fuse_mount;
> >  struct fuse_release_args;
> >
> > +/**
> > + * Reference to backing file for read/write operations in passthrough mode.
> > + */
> > +struct fuse_passthrough {
> > +       struct file *filp;
> > +
> > +       /** refcount */
> > +       refcount_t count;
> > +};
> > +
> >  /** FUSE specific file data */
> >  struct fuse_file {
> >         /** Fuse connection for this file */
> > @@ -218,6 +228,9 @@ struct fuse_file {
> >
> >         } readdir;
> >
> > +       /** Container for data related to the passthrough functionality */
> > +       struct fuse_passthrough *passthrough;
> > +
> >         /** RB node to be linked on fuse_conn->polled_files */
> >         struct rb_node polled_node;
> >
> > @@ -792,6 +805,9 @@ struct fuse_conn {
> >         /* Is tmpfile not implemented by fs? */
> >         unsigned int no_tmpfile:1;
> >
> > +       /** Passthrough mode for read/write IO */
> > +       unsigned int passthrough:1;
> > +
> >         /** The number of requests waiting for completion */
> >         atomic_t num_waiting;
> >
> > @@ -841,6 +857,9 @@ struct fuse_conn {
> >
> >         /* New writepages go into this bucket */
> >         struct fuse_sync_bucket __rcu *curr_bucket;
> > +
> > +       /** IDR for passthrough files */
> > +       struct idr passthrough_files_map;
> >  };
> >
> >  /*
> > @@ -1324,4 +1343,12 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >  void fuse_file_release(struct inode *inode, struct fuse_file *ff,
> >                        unsigned int open_flags, fl_owner_t id, bool isdir);
> >
> > +/* passthrough.c */
> > +int fuse_passthrough_open(struct fuse_conn *fc, int backing_fd);
> > +int fuse_passthrough_close(struct fuse_conn *fc, int passthrough_fh);
> > +int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
> > +                          struct fuse_open_out *openarg);
> > +void fuse_passthrough_put(struct fuse_passthrough *passthrough);
> > +void fuse_passthrough_free(struct fuse_passthrough *passthrough);
> > +
> >  #endif /* _FS_FUSE_I_H */
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index d66070af145d..271586fac008 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -840,6 +840,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> >         INIT_LIST_HEAD(&fc->bg_queue);
> >         INIT_LIST_HEAD(&fc->entry);
> >         INIT_LIST_HEAD(&fc->devices);
> > +       idr_init(&fc->passthrough_files_map);
> >         atomic_set(&fc->num_waiting, 0);
> >         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
> >         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> > @@ -1209,6 +1210,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> >                                 fc->init_security = 1;
> >                         if (flags & FUSE_CREATE_SUPP_GROUP)
> >                                 fc->create_supp_group = 1;
> > +                       if (flags & FUSE_PASSTHROUGH) {
> > +                               fc->passthrough = 1;
> > +                               /* Prevent further stacking */
> > +                               fm->sb->s_stack_depth =
> > +                                       FILESYSTEM_MAX_STACK_DEPTH;
> > +                       }
>
> Seems too restrictive.  We could specify the max stacking depth in the
> protocol and verify that when registering the passthrough file.
>
> I.e. fuse_sb->s_stack_depth of
>
> 0 -> passthrough disabled
> 1 -> backing_sb->s_stack_depth == 0
> 2 -> backing_sb->stack_depth <= 1
> ...
>

Ok. Let's see.
What do we stand to gain from the ability to setup nax stacking depth?

We could use it to setup an overlayfs with lower FUSE that allows passthrough
fds to a non-stacked backing fs and we could use it to setup FUSE that allows
passthrough fds to overlayfs.

I pity the FUSE userspace developers that will need to understand this
setup parameter...
So ignoring the possibility of  FILESYSTEM_MAX_STACK_DEPTH changing in
the future, maybe better describe this with two capability flags
instead of an int?

FUSE_PASSTHROUGH_NESTED ???

Would NESTED mean max depth 1 or 2 through? ;-)

> >                 } else {
> >                         ra_pages = fc->max_read / PAGE_SIZE;
> >                         fc->no_lock = 1;
> > @@ -1254,7 +1261,8 @@ void fuse_send_init(struct fuse_mount *fm)
> >                 FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
> >                 FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
> >                 FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> > -               FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP;
> > +               FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
> > +               FUSE_PASSTHROUGH;
> >  #ifdef CONFIG_FUSE_DAX
> >         if (fm->fc->dax)
> >                 flags |= FUSE_MAP_ALIGNMENT;
> > @@ -1287,9 +1295,20 @@ void fuse_send_init(struct fuse_mount *fm)
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_send_init);
> >
> > +static int fuse_passthrough_id_free(int id, void *p, void *data)
> > +{
> > +       struct fuse_passthrough *passthrough = p;
> > +
> > +       WARN_ON_ONCE(refcount_read(&passthrough->count) != 1);
> > +       fuse_passthrough_free(passthrough);
> > +       return 0;
> > +}
> > +
> >  void fuse_free_conn(struct fuse_conn *fc)
> >  {
> >         WARN_ON(!list_empty(&fc->devices));
> > +       idr_for_each(&fc->passthrough_files_map, fuse_passthrough_id_free, NULL);
> > +       idr_destroy(&fc->passthrough_files_map);
> >         kfree_rcu(fc, rcu);
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_free_conn);
> > diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> > new file mode 100644
> > index 000000000000..fc723e004de9
> > --- /dev/null
> > +++ b/fs/fuse/passthrough.c
> > @@ -0,0 +1,40 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +

Alessio et. al.
I noticed that this file does not have any copyright in your original patch.

Do you mind if I retroactively add:

/*
 * FUSE passthrough support.
 *
 * Copyright (c) 2021 Google LLC.
 * Copyright (c) 2023 CTERA Networks.
 */

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (9 preceding siblings ...)
  2023-05-19 12:57 ` [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id Amir Goldstein
@ 2023-05-20 10:28 ` Amir Goldstein
  2023-06-06  9:13 ` Amir Goldstein
  11 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-20 10:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, May 19, 2023 at 3:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Miklos,
>
> This patch set addresses your review feedback on Alesio's V12 patch set
> from 2021 [1] as well as other bugs that I have found since.
> This patch set uses refcounted backing files as we discussed recently [2].
>
> I am posting this for several possible outcomes:
>
> 1. Either FUSE-BPF develpers can use this as a reference implementation
>    for their 1st phase of "backing file passthrough"
> 2. Or they can tell me which API changes need to made to this patch set
>    so the API is flexible enough to extend to "backing inode passthrough"
>    and to "BPF filters" later on
> 3. We find there is little overlap in the APIs and merge this as is
>
> These patches are available on github [3] along with libfuse patches [4].
> I tested them by running xfstests (./check -fuse -g quick.rw) with latest
> libfuse xfstest support.
>
> Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
> which tests mixed buffered/aio writes.
> With FOPEN_PASSTHROUGH, this test also passes.
>
> This revision does not set any limitations on the number of backing files
> that can be mapped by the server.  I considered several ways to address
> this and decided to try a different approach.
>
> Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
> API approach. Please see my comments on that patch.
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/20210125153057.3623715-1-balsini@android.com/
> [2] https://lore.kernel.org/linux-fsdevel/CAJfpegvbMKadnsBZmEvZpCxeWaMEGDRiDBqEZqaBSXcWyPZnpA@mail.gmail.com/
> [3] https://github.com/amir73il/linux/commits/fuse-passthrough-fd
> [4] https://github.com/amir73il/libfuse/commits/fuse-passthrough-fd
>
> Changes since v12:
> - Rebase to v6.4-rc2
> - Reword 'lower file' language to 'backing file'
> - Add explicit FOPEN_PASSTHROUGH open flags
> - Remove fuse_passthrough_out container
> - Add FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl
> - Add experimental FUSE_DEV_IOC_PASSTHROUGH_SETUP ioctl
> - Distinguished errors for failures to create passthrough id
>   (EBADF, EOPNOTSUPP, ELOOP)
> - idr and fuse_file point to refcounted passthrough object
> - Use rcu_read_lock() to get passthrough object by id
> - Handle errors to setup passthrough in atomic_open()
> - Invalidate mtime/size after passthrough write
> - Invalidate atime after passthrough read/mmap
> - Bump FUSE protocol minor version

> Alessio Balsini (2):
>   fs: Generic function to convert iocb to rw flags
>   fuse: Definitions and ioctl for passthrough
>
> Amir Goldstein (8):
>   fuse: Passthrough initialization and release
>   fuse: Introduce synchronous read and write for passthrough
>   fuse: Handle asynchronous read and write in passthrough
>   fuse: Use daemon creds in passthrough mode
>   fuse: Introduce passthrough for mmap
>   fuse: update inode size/mtime after passthrough write
>   fuse: invalidate atime after passthrough read/mmap
>   fuse: setup a passthrough fd without a permanent backing id
>

Hi Alessio et al.

FYI, the authorship of the patches in this set have been randomly
reset by git rebase.

I've restored Alessio's authorship on my branch for patches 4-7
which I have changed only mildly.

OTOH, I have taken over authorship on patches 2-3 which I had
made more significant changes to (i.e. refcounted backing files).

Also, please note my comment on patch 2 for copyright notice
on the new passthrough.c file.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough
  2023-05-20 10:20     ` Amir Goldstein
@ 2023-05-22 14:50       ` Miklos Szeredi
  2023-05-24 10:00         ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-05-22 14:50 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Sat, 20 May 2023 at 12:20, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, May 19, 2023 at 6:13 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > From: Alessio Balsini <balsini@android.com>
> > >
> > > Expose the FUSE_PASSTHROUGH capability to user space and declare all the
> > > basic data structures and functions as the skeleton on top of which the
> > > FUSE passthrough functionality will be built.
> > >
> > > As part of this, introduce the new FUSE passthrough ioctl, which allows
> > > the FUSE daemon to specify a direct connection between a FUSE file and a
> > > backing file.  The ioctl requires user space to pass the file descriptor
> > > of one of its opened files to the FUSE driver and get an id in return.
> > > This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH
> > > to setup passthrough of read/write operations on the open file.
> > >
> > > Also, add the passthrough functions for the set-up and tear-down of the
> > > data structures and locks that will be used both when fuse_conns and
> > > fuse_files are created/deleted.
> > >
> > > Signed-off-by: Alessio Balsini <balsini@android.com>
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  fs/fuse/Makefile          |  1 +
> > >  fs/fuse/dev.c             | 33 ++++++++++++++++++++++++--------
> > >  fs/fuse/dir.c             |  7 ++++++-
> > >  fs/fuse/file.c            | 17 +++++++++++++----
> > >  fs/fuse/fuse_i.h          | 27 ++++++++++++++++++++++++++
> > >  fs/fuse/inode.c           | 21 +++++++++++++++++++-
> > >  fs/fuse/passthrough.c     | 40 +++++++++++++++++++++++++++++++++++++++
> > >  include/uapi/linux/fuse.h | 13 +++++++++++--
> > >  8 files changed, 143 insertions(+), 16 deletions(-)
> > >  create mode 100644 fs/fuse/passthrough.c
> > >
> > > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> > > index 0c48b35c058d..d9e1b47382f3 100644
> > > --- a/fs/fuse/Makefile
> > > +++ b/fs/fuse/Makefile
> > > @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
> > >  obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
> > >
> > >  fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
> > > +fuse-y += passthrough.o
> > >  fuse-$(CONFIG_FUSE_DAX) += dax.o
> > >
> > >  virtiofs-y := virtio_fs.o
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 1a8f82f478cb..cb00234e7843 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> > >                            unsigned long arg)
> > >  {
> > >         int res;
> > > -       int oldfd;
> > > -       struct fuse_dev *fud = NULL;
> > > +       int fd, id;
> > > +       struct fuse_dev *fud = fuse_get_dev(file);
> >
> > This is broken, see below.
>
> IIUC, *this* is not broken for the new ioctls...
>
> >
> > >         struct fd f;
> > >
> > > +       if (!fud)
> > > +               return -EINVAL;
> > > +

This is also broken for the old ioctl.

> > >         switch (cmd) {
> > >         case FUSE_DEV_IOC_CLONE:
> > > -               if (get_user(oldfd, (__u32 __user *)arg))
> > > +               if (get_user(fd, (__u32 __user *)arg))
> > >                         return -EFAULT;
> > >
> > > -               f = fdget(oldfd);
> > > +               f = fdget(fd);
> > >                 if (!f.file)
> > >                         return -EINVAL;
> > >
> > > @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> > >                  * Check against file->f_op because CUSE
> > >                  * uses the same ioctl handler.
> > >                  */
> > > -               if (f.file->f_op == file->f_op)
> > > -                       fud = fuse_get_dev(f.file);
> > > -
> > >                 res = -EINVAL;
> > > -               if (fud) {
> > > +               if (f.file->f_op == file->f_op) {
>
> and this can be fixed by adding:
>  +                           fud = fuse_get_dev(f.file);

Yes, but it's still messy.

I suggest separating out unrelated ioctl commands into different
functions.  Not sure if it's worth doing the open/close in a common
function, I'll leave that to you.

[snip]

> > Seems too restrictive.  We could specify the max stacking depth in the
> > protocol and verify that when registering the passthrough file.
> >
> > I.e. fuse_sb->s_stack_depth of
> >
> > 0 -> passthrough disabled
> > 1 -> backing_sb->s_stack_depth == 0
> > 2 -> backing_sb->stack_depth <= 1
> > ...
> >
>
> Ok. Let's see.
> What do we stand to gain from the ability to setup nax stacking depth?
>
> We could use it to setup an overlayfs with lower FUSE that allows passthrough
> fds to a non-stacked backing fs and we could use it to setup FUSE that allows
> passthrough fds to overlayfs.
>
> I pity the FUSE userspace developers that will need to understand this
> setup parameter...

I guess libfuse could parse it with other common options.  It's
something that needs to be tuned on a per-case basis, not something
the filesystem designer can predict.

Would be better if we could have a per-inode stack depth and then this
wouldn't have to be tuned.  Is that feasible?

> So ignoring the possibility of  FILESYSTEM_MAX_STACK_DEPTH changing in
> the future, maybe better describe this with two capability flags
> instead of an int?

The max depth could be changed, this value was just chosen because we
didn't have a use case for a larger fs stack, I guess.  Some analysis
about the kernel stack usage would be required before doing so (btw.
such analysis was never done, so it would be useful regardless).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-05-19 12:57 ` [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough Amir Goldstein
@ 2023-05-22 15:20   ` Miklos Szeredi
  2023-05-24 10:03     ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-05-22 15:20 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Extend the passthrough feature by handling asynchronous IO both for read
> and write operations.
>
> When an AIO request is received, if the request targets a FUSE file with
> the passthrough functionality enabled, a new identical AIO request is
> created.  The new request targets the backing file and gets assigned
> a special FUSE passthrough AIO completion callback.
>
> When the backing file AIO request is completed, the FUSE
> passthrough AIO completion callback is executed and propagates the
> completion signal to the FUSE AIO request by triggering its completion
> callback as well.

Overlayfs added refcounting to the async req (commit 9a2544037600
("ovl: fix use after free in struct ovl_aio_req")).  Is that not
needed for fuse as well?

Would it make sense to try and merge the two implementations?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough
  2023-05-22 14:50       ` Miklos Szeredi
@ 2023-05-24 10:00         ` Amir Goldstein
  0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-05-24 10:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Mon, May 22, 2023 at 5:50 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sat, 20 May 2023 at 12:20, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, May 19, 2023 at 6:13 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > From: Alessio Balsini <balsini@android.com>
> > > >
> > > > Expose the FUSE_PASSTHROUGH capability to user space and declare all the
> > > > basic data structures and functions as the skeleton on top of which the
> > > > FUSE passthrough functionality will be built.
> > > >
> > > > As part of this, introduce the new FUSE passthrough ioctl, which allows
> > > > the FUSE daemon to specify a direct connection between a FUSE file and a
> > > > backing file.  The ioctl requires user space to pass the file descriptor
> > > > of one of its opened files to the FUSE driver and get an id in return.
> > > > This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH
> > > > to setup passthrough of read/write operations on the open file.
> > > >
> > > > Also, add the passthrough functions for the set-up and tear-down of the
> > > > data structures and locks that will be used both when fuse_conns and
> > > > fuse_files are created/deleted.
> > > >
> > > > Signed-off-by: Alessio Balsini <balsini@android.com>
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/fuse/Makefile          |  1 +
> > > >  fs/fuse/dev.c             | 33 ++++++++++++++++++++++++--------
> > > >  fs/fuse/dir.c             |  7 ++++++-
> > > >  fs/fuse/file.c            | 17 +++++++++++++----
> > > >  fs/fuse/fuse_i.h          | 27 ++++++++++++++++++++++++++
> > > >  fs/fuse/inode.c           | 21 +++++++++++++++++++-
> > > >  fs/fuse/passthrough.c     | 40 +++++++++++++++++++++++++++++++++++++++
> > > >  include/uapi/linux/fuse.h | 13 +++++++++++--
> > > >  8 files changed, 143 insertions(+), 16 deletions(-)
> > > >  create mode 100644 fs/fuse/passthrough.c
> > > >
> > > > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> > > > index 0c48b35c058d..d9e1b47382f3 100644
> > > > --- a/fs/fuse/Makefile
> > > > +++ b/fs/fuse/Makefile
> > > > @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
> > > >  obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
> > > >
> > > >  fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
> > > > +fuse-y += passthrough.o
> > > >  fuse-$(CONFIG_FUSE_DAX) += dax.o
> > > >
> > > >  virtiofs-y := virtio_fs.o
> > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > > index 1a8f82f478cb..cb00234e7843 100644
> > > > --- a/fs/fuse/dev.c
> > > > +++ b/fs/fuse/dev.c
> > > > @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >                            unsigned long arg)
> > > >  {
> > > >         int res;
> > > > -       int oldfd;
> > > > -       struct fuse_dev *fud = NULL;
> > > > +       int fd, id;
> > > > +       struct fuse_dev *fud = fuse_get_dev(file);
> > >
> > > This is broken, see below.
> >
> > IIUC, *this* is not broken for the new ioctls...
> >
> > >
> > > >         struct fd f;
> > > >
> > > > +       if (!fud)
> > > > +               return -EINVAL;
> > > > +
>
> This is also broken for the old ioctl.
>
> > > >         switch (cmd) {
> > > >         case FUSE_DEV_IOC_CLONE:
> > > > -               if (get_user(oldfd, (__u32 __user *)arg))
> > > > +               if (get_user(fd, (__u32 __user *)arg))
> > > >                         return -EFAULT;
> > > >
> > > > -               f = fdget(oldfd);
> > > > +               f = fdget(fd);
> > > >                 if (!f.file)
> > > >                         return -EINVAL;
> > > >
> > > > @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
> > > >                  * Check against file->f_op because CUSE
> > > >                  * uses the same ioctl handler.
> > > >                  */
> > > > -               if (f.file->f_op == file->f_op)
> > > > -                       fud = fuse_get_dev(f.file);
> > > > -
> > > >                 res = -EINVAL;
> > > > -               if (fud) {
> > > > +               if (f.file->f_op == file->f_op) {
> >
> > and this can be fixed by adding:
> >  +                           fud = fuse_get_dev(f.file);
>
> Yes, but it's still messy.
>
> I suggest separating out unrelated ioctl commands into different
> functions.  Not sure if it's worth doing the open/close in a common
> function, I'll leave that to you.
>
> [snip]

ok.

>
> > > Seems too restrictive.  We could specify the max stacking depth in the
> > > protocol and verify that when registering the passthrough file.
> > >
> > > I.e. fuse_sb->s_stack_depth of
> > >
> > > 0 -> passthrough disabled
> > > 1 -> backing_sb->s_stack_depth == 0
> > > 2 -> backing_sb->stack_depth <= 1
> > > ...
> > >
> >
> > Ok. Let's see.
> > What do we stand to gain from the ability to setup nax stacking depth?
> >
> > We could use it to setup an overlayfs with lower FUSE that allows passthrough
> > fds to a non-stacked backing fs and we could use it to setup FUSE that allows
> > passthrough fds to overlayfs.
> >
> > I pity the FUSE userspace developers that will need to understand this
> > setup parameter...
>
> I guess libfuse could parse it with other common options.  It's
> something that needs to be tuned on a per-case basis, not something
> the filesystem designer can predict.
>
> Would be better if we could have a per-inode stack depth and then this
> wouldn't have to be tuned.  Is that feasible?
>

We could do something like:

real = d_real(dentry);
for (depth = 0; real != dentry && depth < FILESYSTEM_MAX_STACK_DEPTH;
      dentry = real, real = d_real(dentry));

Which brings up the question - what should fuse_d_real() return
if we fuse passthrough is per file and not per inode?
Should we add f_real() op?

Doing this check on FUSE passthrough open is easy, because FUSE
server can fall back to non-passthrough, but what about overlayfs?
I don't think we want to fail open in overlayfs because a lower FUSE
file turns out to be using the max stack depth.

This is getting a bit complicated, so I would like to take a step back
and think about which configurations we *really* want/need to support.

I already listed FUSE over overlayfs and overlayfs over FUSE -
they both seem reasonably likely.

How about FUSE over FUSE (nested passthrough) and more combinations
if you would like to take stack depth > 2 into design considerations?

Do you think we can relax allowed combinations to make things easier?
To me, it feels that restricting nested FUSE bypass is not so bad, because
server has the option to fallback to non-passthrough at any level.

So how about:
1. init: FUSE s_stack_depth = FUSE_PASSTHOUGH ? 1 : 0
2. Let stacking fs set a flag FMODE_PASSTHROUGH on backing file
3. on passthrough setup, check if fuse_file has FMODE_PASSTHROUGH
    (i.e. is FUSE the top of the stack?)
3.a. if FUSE is top of the stack, allow backing file stack depth 1
3.b. if FUSE is below something, limit backing file stack depth 0

This implies that the API to setup a backing file should provide
the fuse_file that is expected to be setup for passthrough, so that
server will have an opportunity to react and not reply to the open
request with FOPEN_PASSTHROUGH.
Alternatively, the client can silently ignore FOPEN_PASSTHROUGH,
but that seems less ideal.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-05-22 15:20   ` Miklos Szeredi
@ 2023-05-24 10:03     ` Amir Goldstein
  2023-08-21 15:27       ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-05-24 10:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Extend the passthrough feature by handling asynchronous IO both for read
> > and write operations.
> >
> > When an AIO request is received, if the request targets a FUSE file with
> > the passthrough functionality enabled, a new identical AIO request is
> > created.  The new request targets the backing file and gets assigned
> > a special FUSE passthrough AIO completion callback.
> >
> > When the backing file AIO request is completed, the FUSE
> > passthrough AIO completion callback is executed and propagates the
> > completion signal to the FUSE AIO request by triggering its completion
> > callback as well.
>
> Overlayfs added refcounting to the async req (commit 9a2544037600
> ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> needed for fuse as well?
>
> Would it make sense to try and merge the two implementations?
>

Makes sense - I will look into it.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
                   ` (10 preceding siblings ...)
  2023-05-20 10:28 ` [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
@ 2023-06-06  9:13 ` Amir Goldstein
  2023-06-06  9:49   ` Miklos Szeredi
  11 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-06-06  9:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, May 19, 2023 at 3:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Miklos,
>
> This patch set addresses your review feedback on Alesio's V12 patch set
> from 2021 [1] as well as other bugs that I have found since.
> This patch set uses refcounted backing files as we discussed recently [2].
>
> I am posting this for several possible outcomes:
>
> 1. Either FUSE-BPF develpers can use this as a reference implementation
>    for their 1st phase of "backing file passthrough"
> 2. Or they can tell me which API changes need to made to this patch set
>    so the API is flexible enough to extend to "backing inode passthrough"
>    and to "BPF filters" later on
> 3. We find there is little overlap in the APIs and merge this as is
>
> These patches are available on github [3] along with libfuse patches [4].
> I tested them by running xfstests (./check -fuse -g quick.rw) with latest
> libfuse xfstest support.
>
> Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
> which tests mixed buffered/aio writes.
> With FOPEN_PASSTHROUGH, this test also passes.
>
> This revision does not set any limitations on the number of backing files
> that can be mapped by the server.  I considered several ways to address
> this and decided to try a different approach.
>
> Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
> API approach. Please see my comments on that patch.
>

Miklos,

I wanted to set expectations w.r.t this patch set and the passthrough
feature development in general.

So far I've seen comments from you up to path 5/10, so I assume you
did not get up to RFC patch 10/10.

The comments about adding max stack depth to protocol and about
refactoring overlayfs common code are easy to do.

However, I feel that there are still open core design questions that need
to be spelled out, before we continue.

Do you find the following acceptable for first implementation, or do you
think that those issues must be addressed before merging anything?

1. No lsof visibility of backing files (if server closes them)
2. Derived backing files resource limit (cannot grow beyond nr of fuse files)
3. No data consistency guaranty between different fd to the same inode
    (i.e. backing is per fd not per inode)

Depending on your answers, I will decide if I have the bandwidth to carry
this patch set to the finish line or wait for the Android team to post a more
comprehensive patch set that deals with all of the above.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-06-06  9:13 ` Amir Goldstein
@ 2023-06-06  9:49   ` Miklos Szeredi
  2023-06-06 11:19     ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-06-06  9:49 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, 6 Jun 2023 at 11:13, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, May 19, 2023 at 3:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Miklos,
> >
> > This patch set addresses your review feedback on Alesio's V12 patch set
> > from 2021 [1] as well as other bugs that I have found since.
> > This patch set uses refcounted backing files as we discussed recently [2].
> >
> > I am posting this for several possible outcomes:
> >
> > 1. Either FUSE-BPF develpers can use this as a reference implementation
> >    for their 1st phase of "backing file passthrough"
> > 2. Or they can tell me which API changes need to made to this patch set
> >    so the API is flexible enough to extend to "backing inode passthrough"
> >    and to "BPF filters" later on
> > 3. We find there is little overlap in the APIs and merge this as is
> >
> > These patches are available on github [3] along with libfuse patches [4].
> > I tested them by running xfstests (./check -fuse -g quick.rw) with latest
> > libfuse xfstest support.
> >
> > Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
> > which tests mixed buffered/aio writes.
> > With FOPEN_PASSTHROUGH, this test also passes.
> >
> > This revision does not set any limitations on the number of backing files
> > that can be mapped by the server.  I considered several ways to address
> > this and decided to try a different approach.
> >
> > Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
> > API approach. Please see my comments on that patch.
> >
>
> Miklos,
>
> I wanted to set expectations w.r.t this patch set and the passthrough
> feature development in general.
>
> So far I've seen comments from you up to path 5/10, so I assume you
> did not get up to RFC patch 10/10.
>
> The comments about adding max stack depth to protocol and about
> refactoring overlayfs common code are easy to do.
>
> However, I feel that there are still open core design questions that need
> to be spelled out, before we continue.
>
> Do you find the following acceptable for first implementation, or do you
> think that those issues must be addressed before merging anything?
>
> 1. No lsof visibility of backing files (if server closes them)
> 2. Derived backing files resource limit (cannot grow beyond nr of fuse files)
> 3. No data consistency guaranty between different fd to the same inode
>     (i.e. backing is per fd not per inode)

I think the most important thing is to have the FUSE-BPF team onboard.
   I'm not sure that the per-file part of this is necessary, doing
everything per-inode should be okay.   What are the benefits?

Not having visibility and resource limits would be okay for a first
version, as long as it's somehow constrained to privileged use.  But
I'm not sure it would be worth it that way.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Fwd: [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id
  2023-05-19 12:57 ` [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id Amir Goldstein
@ 2023-06-06 10:22   ` Miklos Szeredi
  2023-06-06 11:00     ` [fuse-devel] " Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-06-06 10:22 UTC (permalink / raw)
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> WIP
>
> Add an ioctl to associate a FUSE server open fd with a request.
> A later response to this request get use the FOPEN_PASSTHROUGH flag
> to request passthrough to the associated backing file.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Miklos,
>
> After implementing refcounted backing files, I started to think how
> to limit the server from mapping too many files.
>
> I wanted to limit the backing files mappings to the number of open fuse
> files to simplify backing files accounting (i.e. open files are
> effectively accounted to clients).
>
> It occured to me that creatig a 1-to-1 mapping between fuse files and
> backing file ids is quite futile if there is no need to manage 1-to-many
> backing file mappings.
>
> If only 1-to-1 mapping is desired, the proposed ioctl associates a
> backing file with a pending request.  The backing file will be kept
> open for as long the request lives, or until its refcount is handed
> over to the client, which can then use it to setup passthough to the
> backing file without the intermediate idr array.

I think I understand what the patch does, but what I don't understand
is how this is going to solve the resource accounting problem.

Can you elaborate?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [fuse-devel] Fwd: [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id
  2023-06-06 10:22   ` Fwd: " Miklos Szeredi
@ 2023-06-06 11:00     ` Amir Goldstein
  2023-06-06 12:46       ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-06-06 11:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, linux-fsdevel, Daniel Rosenberg

On Tue, Jun 6, 2023 at 1:23 PM Miklos Szeredi via fuse-devel
<fuse-devel@lists.sourceforge.net> wrote:
>
> On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > WIP
> >
> > Add an ioctl to associate a FUSE server open fd with a request.
> > A later response to this request get use the FOPEN_PASSTHROUGH flag
> > to request passthrough to the associated backing file.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Miklos,
> >
> > After implementing refcounted backing files, I started to think how
> > to limit the server from mapping too many files.
> >
> > I wanted to limit the backing files mappings to the number of open fuse
> > files to simplify backing files accounting (i.e. open files are
> > effectively accounted to clients).
> >
> > It occured to me that creatig a 1-to-1 mapping between fuse files and
> > backing file ids is quite futile if there is no need to manage 1-to-many
> > backing file mappings.
> >
> > If only 1-to-1 mapping is desired, the proposed ioctl associates a
> > backing file with a pending request.  The backing file will be kept
> > open for as long the request lives, or until its refcount is handed
> > over to the client, which can then use it to setup passthough to the
> > backing file without the intermediate idr array.
>
> I think I understand what the patch does, but what I don't understand
> is how this is going to solve the resource accounting problem.
>
> Can you elaborate?
>

It does not solve the resource accounting in the traditional way
of limiting the number of open files to the resource limit of the
server process.

Instead, it has the similar effect of overlayfs pseudo files
non accounting.

A FUSE passthrough filesystem can contribute the same number
of non accounted open fds as the number of FUSE fds accounted
to different processes.

A non privileged user can indirectly cause unaccounted open fds
with a FUSE passthough fs in the exact same way that the same
user can cause unaccounted open fds with an overlayfs mount
if it can convince other users to open files on the FUSE/ovl that it
has mounted.

Am I making sense?

One possible improvement to this API is to include the nodeid
in FUSE_DEV_IOC_PASSTHROUGH_SETUP to make the
mapping per-inode.

If the mapping was already created during another open of
the same inode, the setup would fail with a special error
(EEXIST) so the caller would close the new backing fd, but
the request already takes a reference to the backing fd of
the inode, so FOPEN_PASSTHROUGH may proceed.

On the last close, the backing fd refcount drops to zero and
then is detached from the inode.

A server that wants to manage the mapping lifetime can still
use FUSE_DEV_IOC_PASSTHROUGH_OPEN/CLOSE to
do so (with nodeid) argument.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-06-06  9:49   ` Miklos Szeredi
@ 2023-06-06 11:19     ` Amir Goldstein
  2023-06-06 13:06       ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-06-06 11:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, Jun 6, 2023 at 12:49 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 6 Jun 2023 at 11:13, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, May 19, 2023 at 3:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Miklos,
> > >
> > > This patch set addresses your review feedback on Alesio's V12 patch set
> > > from 2021 [1] as well as other bugs that I have found since.
> > > This patch set uses refcounted backing files as we discussed recently [2].
> > >
> > > I am posting this for several possible outcomes:
> > >
> > > 1. Either FUSE-BPF develpers can use this as a reference implementation
> > >    for their 1st phase of "backing file passthrough"
> > > 2. Or they can tell me which API changes need to made to this patch set
> > >    so the API is flexible enough to extend to "backing inode passthrough"
> > >    and to "BPF filters" later on
> > > 3. We find there is little overlap in the APIs and merge this as is
> > >
> > > These patches are available on github [3] along with libfuse patches [4].
> > > I tested them by running xfstests (./check -fuse -g quick.rw) with latest
> > > libfuse xfstest support.
> > >
> > > Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
> > > which tests mixed buffered/aio writes.
> > > With FOPEN_PASSTHROUGH, this test also passes.
> > >
> > > This revision does not set any limitations on the number of backing files
> > > that can be mapped by the server.  I considered several ways to address
> > > this and decided to try a different approach.
> > >
> > > Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
> > > API approach. Please see my comments on that patch.
> > >
> >
> > Miklos,
> >
> > I wanted to set expectations w.r.t this patch set and the passthrough
> > feature development in general.
> >
> > So far I've seen comments from you up to path 5/10, so I assume you
> > did not get up to RFC patch 10/10.
> >
> > The comments about adding max stack depth to protocol and about
> > refactoring overlayfs common code are easy to do.
> >
> > However, I feel that there are still open core design questions that need
> > to be spelled out, before we continue.
> >
> > Do you find the following acceptable for first implementation, or do you
> > think that those issues must be addressed before merging anything?
> >
> > 1. No lsof visibility of backing files (if server closes them)
> > 2. Derived backing files resource limit (cannot grow beyond nr of fuse files)
> > 3. No data consistency guaranty between different fd to the same inode
> >     (i.e. backing is per fd not per inode)
>
> I think the most important thing is to have the FUSE-BPF team onboard.

Yeh, I'd love to get some feedback from you guys.

>    I'm not sure that the per-file part of this is necessary, doing
> everything per-inode should be okay.   What are the benefits?
>

I agree that semantics are simpler with per-inode.
The only benefit I see to per-file is the lifetime of the mapping.

It is very easy IMO to program with a mapping scope of
open-to-close that is requested by FOPEN_PASSTHROUGH
and FOPEN_PASSTHROUGH_AUTO_CLOSE.

But I think the same lifetime can still be achieved with per-inode
mapping. I hand waved how I think that could be done in response
to patch 10/10 review.

I think if I can make this patch set work per-inode, the roadmap
from here to FUSE-BPF would be much more clear.

> Not having visibility and resource limits would be okay for a first
> version, as long as it's somehow constrained to privileged use.  But
> I'm not sure it would be worth it that way.
>

Speaking on behalf of my own use case for FUSE passthrough (HSM),
FUSE is used for "do something that does not belong in the kernel",
but running as unprivileged user is a non-requirement.
So I can say with confidence of paying customers that passthrough is
useful and essential even with privileged user constraint.

In summary, I will try to come up with v14 that is:
- privileged user only
- no resource limitation
- per-inode mapping

If at any time FUSE-BFP team would like to take this patch set
of my hands, or propose a replacement for it, I would be very happy
to step down - whatever it takes to land read/write passthrough.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [fuse-devel] Fwd: [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id
  2023-06-06 11:00     ` [fuse-devel] " Amir Goldstein
@ 2023-06-06 12:46       ` Miklos Szeredi
  0 siblings, 0 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-06-06 12:46 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: fuse-devel, linux-fsdevel, Daniel Rosenberg

On Tue, 6 Jun 2023 at 13:00, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Jun 6, 2023 at 1:23 PM Miklos Szeredi via fuse-devel
> <fuse-devel@lists.sourceforge.net> wrote:
> >
> > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > WIP
> > >
> > > Add an ioctl to associate a FUSE server open fd with a request.
> > > A later response to this request get use the FOPEN_PASSTHROUGH flag
> > > to request passthrough to the associated backing file.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Miklos,
> > >
> > > After implementing refcounted backing files, I started to think how
> > > to limit the server from mapping too many files.
> > >
> > > I wanted to limit the backing files mappings to the number of open fuse
> > > files to simplify backing files accounting (i.e. open files are
> > > effectively accounted to clients).
> > >
> > > It occured to me that creatig a 1-to-1 mapping between fuse files and
> > > backing file ids is quite futile if there is no need to manage 1-to-many
> > > backing file mappings.
> > >
> > > If only 1-to-1 mapping is desired, the proposed ioctl associates a
> > > backing file with a pending request.  The backing file will be kept
> > > open for as long the request lives, or until its refcount is handed
> > > over to the client, which can then use it to setup passthough to the
> > > backing file without the intermediate idr array.
> >
> > I think I understand what the patch does, but what I don't understand
> > is how this is going to solve the resource accounting problem.
> >
> > Can you elaborate?
> >
>
> It does not solve the resource accounting in the traditional way
> of limiting the number of open files to the resource limit of the
> server process.
>
> Instead, it has the similar effect of overlayfs pseudo files
> non accounting.
>
> A FUSE passthrough filesystem can contribute the same number
> of non accounted open fds as the number of FUSE fds accounted
> to different processes.
>
> A non privileged user can indirectly cause unaccounted open fds
> with a FUSE passthough fs in the exact same way that the same
> user can cause unaccounted open fds with an overlayfs mount
> if it can convince other users to open files on the FUSE/ovl that it
> has mounted.
>
> Am I making sense?

So this allows double the number of open files as normally would be
allowed, same as overlayfs.

Makes sense.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-06-06 11:19     ` Amir Goldstein
@ 2023-06-06 13:06       ` Miklos Szeredi
  2023-08-29 18:14         ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-06-06 13:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, 6 Jun 2023 at 13:19, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Jun 6, 2023 at 12:49 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 6 Jun 2023 at 11:13, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Fri, May 19, 2023 at 3:57 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > Miklos,
> > > >
> > > > This patch set addresses your review feedback on Alesio's V12 patch set
> > > > from 2021 [1] as well as other bugs that I have found since.
> > > > This patch set uses refcounted backing files as we discussed recently [2].
> > > >
> > > > I am posting this for several possible outcomes:
> > > >
> > > > 1. Either FUSE-BPF develpers can use this as a reference implementation
> > > >    for their 1st phase of "backing file passthrough"
> > > > 2. Or they can tell me which API changes need to made to this patch set
> > > >    so the API is flexible enough to extend to "backing inode passthrough"
> > > >    and to "BPF filters" later on
> > > > 3. We find there is little overlap in the APIs and merge this as is
> > > >
> > > > These patches are available on github [3] along with libfuse patches [4].
> > > > I tested them by running xfstests (./check -fuse -g quick.rw) with latest
> > > > libfuse xfstest support.
> > > >
> > > > Without FOPEN_PASSTHROUGH, one test in this group fails (generic/451)
> > > > which tests mixed buffered/aio writes.
> > > > With FOPEN_PASSTHROUGH, this test also passes.
> > > >
> > > > This revision does not set any limitations on the number of backing files
> > > > that can be mapped by the server.  I considered several ways to address
> > > > this and decided to try a different approach.
> > > >
> > > > Patch 10 (with matching libfuse patch) is an RFC patch for an alternative
> > > > API approach. Please see my comments on that patch.
> > > >
> > >
> > > Miklos,
> > >
> > > I wanted to set expectations w.r.t this patch set and the passthrough
> > > feature development in general.
> > >
> > > So far I've seen comments from you up to path 5/10, so I assume you
> > > did not get up to RFC patch 10/10.
> > >
> > > The comments about adding max stack depth to protocol and about
> > > refactoring overlayfs common code are easy to do.
> > >
> > > However, I feel that there are still open core design questions that need
> > > to be spelled out, before we continue.
> > >
> > > Do you find the following acceptable for first implementation, or do you
> > > think that those issues must be addressed before merging anything?
> > >
> > > 1. No lsof visibility of backing files (if server closes them)
> > > 2. Derived backing files resource limit (cannot grow beyond nr of fuse files)
> > > 3. No data consistency guaranty between different fd to the same inode
> > >     (i.e. backing is per fd not per inode)
> >
> > I think the most important thing is to have the FUSE-BPF team onboard.
>
> Yeh, I'd love to get some feedback from you guys.
>
> >    I'm not sure that the per-file part of this is necessary, doing
> > everything per-inode should be okay.   What are the benefits?
> >
>
> I agree that semantics are simpler with per-inode.
> The only benefit I see to per-file is the lifetime of the mapping.
>
> It is very easy IMO to program with a mapping scope of
> open-to-close that is requested by FOPEN_PASSTHROUGH
> and FOPEN_PASSTHROUGH_AUTO_CLOSE.

Right, and this case the resource limiting is also easy to think about.

I'm not worried about consistency, fuse server can do whatever it
wants with the data anyway.  I am worried about the visibility of the
mapping.  One idea would be to create a kernel thread for each fuse sb
instance and install mapped files into that thread's file table.  In
theory this should be trivial as the VFS has all the helpers that can
do this safely.

>
> But I think the same lifetime can still be achieved with per-inode
> mapping. I hand waved how I think that could be done in response
> to patch 10/10 review.

Yeah, but it's just complicating things further.


>
> I think if I can make this patch set work per-inode, the roadmap
> from here to FUSE-BPF would be much more clear.

One advantage of per-inode non-autoclose would be that open could be
done without a roundtrip to the server.   However the resource
limiting becomes harder to think about.

So it might make sense to just create two separate modes:

 - per-open unmap-on-release (autoclose)
 - per-inode unmap-on-forget (non-autoclose, mapping can be torn down
explicitly)

We also should at least consider (even if not supported in the first
version) the block-fuse case where the backing file(s) contain the
disk image.  In this case the backing fd registration would be per-fs
not per-inode, and the mapping would contain the
backing_fd/backing_offset pair.

>
> > Not having visibility and resource limits would be okay for a first
> > version, as long as it's somehow constrained to privileged use.  But
> > I'm not sure it would be worth it that way.
> >
>
> Speaking on behalf of my own use case for FUSE passthrough (HSM),
> FUSE is used for "do something that does not belong in the kernel",
> but running as unprivileged user is a non-requirement.
> So I can say with confidence of paying customers that passthrough is
> useful and essential even with privileged user constraint.
>
> In summary, I will try to come up with v14 that is:
> - privileged user only
> - no resource limitation
> - per-inode mapping

Okay, that's a logical first step.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-05-24 10:03     ` Amir Goldstein
@ 2023-08-21 15:27       ` Amir Goldstein
  2023-08-22 10:18         ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-08-21 15:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Wed, May 24, 2023 at 1:03 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > Extend the passthrough feature by handling asynchronous IO both for read
> > > and write operations.
> > >
> > > When an AIO request is received, if the request targets a FUSE file with
> > > the passthrough functionality enabled, a new identical AIO request is
> > > created.  The new request targets the backing file and gets assigned
> > > a special FUSE passthrough AIO completion callback.
> > >
> > > When the backing file AIO request is completed, the FUSE
> > > passthrough AIO completion callback is executed and propagates the
> > > completion signal to the FUSE AIO request by triggering its completion
> > > callback as well.
> >
> > Overlayfs added refcounting to the async req (commit 9a2544037600
> > ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> > needed for fuse as well?
> >
> > Would it make sense to try and merge the two implementations?
> >
>
> Makes sense - I will look into it.


Hi Miklos,

Getting back to this.
Did you mean something like that? (only compile tested)

https://github.com/amir73il/linux/commits/backing_fs

If yes, then I wonder:
1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
    (i.e. the APPEND flag) intentional?
2. What would be the right way to do ovl_copyattr() on io completion?
    Pass another completion handler to read/write helpers?
    This seems a bit ugly. Do you have a nicer idea?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-21 15:27       ` Amir Goldstein
@ 2023-08-22 10:18         ` Amir Goldstein
  2023-08-22 11:03           ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-08-22 10:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, May 24, 2023 at 1:03 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, May 22, 2023 at 6:20 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > Extend the passthrough feature by handling asynchronous IO both for read
> > > > and write operations.
> > > >
> > > > When an AIO request is received, if the request targets a FUSE file with
> > > > the passthrough functionality enabled, a new identical AIO request is
> > > > created.  The new request targets the backing file and gets assigned
> > > > a special FUSE passthrough AIO completion callback.
> > > >
> > > > When the backing file AIO request is completed, the FUSE
> > > > passthrough AIO completion callback is executed and propagates the
> > > > completion signal to the FUSE AIO request by triggering its completion
> > > > callback as well.
> > >
> > > Overlayfs added refcounting to the async req (commit 9a2544037600
> > > ("ovl: fix use after free in struct ovl_aio_req")).  Is that not
> > > needed for fuse as well?
> > >
> > > Would it make sense to try and merge the two implementations?
> > >
> >
> > Makes sense - I will look into it.
>
>
> Hi Miklos,
>
> Getting back to this.
> Did you mean something like that? (only compile tested)
>
> https://github.com/amir73il/linux/commits/backing_fs
>
> If yes, then I wonder:
> 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
>     (i.e. the APPEND flag) intentional?
> 2. What would be the right way to do ovl_copyattr() on io completion?
>     Pass another completion handler to read/write helpers?
>     This seems a bit ugly. Do you have a nicer idea?
>

Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
seems a bit racy as it is not done under inode_lock().

I wonder if it is enough to fix that by adding the lock or if we need
to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
for overlayfs aio?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-22 10:18         ` Amir Goldstein
@ 2023-08-22 11:03           ` Miklos Szeredi
  2023-08-22 13:22             ` Amir Goldstein
  2023-08-24 12:11             ` Amir Goldstein
  0 siblings, 2 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-08-22 11:03 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:

> > Getting back to this.
> > Did you mean something like that? (only compile tested)
> >
> > https://github.com/amir73il/linux/commits/backing_fs
> >
> > If yes, then I wonder:
> > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> >     (i.e. the APPEND flag) intentional?

Setting IOCB_APPEND on the backing file doesn't make a difference as
long as the backing file is not modified during the write.

In overlayfs the case of the backing file being modified is not
defined, so I guess that's the reason to omit it.  However I don't see
a problem with setting it on the backing file either, the file
size/position is synchronized after the write, so nothing bad should
happen if the backing file was modified.

> > 2. What would be the right way to do ovl_copyattr() on io completion?
> >     Pass another completion handler to read/write helpers?
> >     This seems a bit ugly. Do you have a nicer idea?
> >

Ugh, I missed that little detail.   I don't have a better idea than to
use a callback function.

>
> Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> seems a bit racy as it is not done under inode_lock().
>
> I wonder if it is enough to fix that by adding the lock or if we need
> to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
> for overlayfs aio?

Quite recently rename didn't take inode lock on source, so
ovl_aio_cleanup_handler() wasn't the only unlocked instance.

I don't see a strong reason to always lock the inode before
ovl_copyattr(), but I could be wrong.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-22 11:03           ` Miklos Szeredi
@ 2023-08-22 13:22             ` Amir Goldstein
  2023-08-22 14:06               ` Miklos Szeredi
  2023-08-24 12:11             ` Amir Goldstein
  1 sibling, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-08-22 13:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > Getting back to this.
> > > Did you mean something like that? (only compile tested)
> > >
> > > https://github.com/amir73il/linux/commits/backing_fs
> > >
> > > If yes, then I wonder:
> > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > >     (i.e. the APPEND flag) intentional?
>
> Setting IOCB_APPEND on the backing file doesn't make a difference as
> long as the backing file is not modified during the write.
>
> In overlayfs the case of the backing file being modified is not
> defined, so I guess that's the reason to omit it.  However I don't see
> a problem with setting it on the backing file either, the file
> size/position is synchronized after the write, so nothing bad should
> happen if the backing file was modified.
>
> > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > >     Pass another completion handler to read/write helpers?
> > >     This seems a bit ugly. Do you have a nicer idea?
> > >
>
> Ugh, I missed that little detail.   I don't have a better idea than to
> use a callback function.
>
> >
> > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > seems a bit racy as it is not done under inode_lock().
> >
> > I wonder if it is enough to fix that by adding the lock or if we need
> > to resort to a more complicated scheme like FUSE_I_SIZE_UNSTABLE
> > for overlayfs aio?
>
> Quite recently rename didn't take inode lock on source, so
> ovl_aio_cleanup_handler() wasn't the only unlocked instance.
>
> I don't see a strong reason to always lock the inode before
> ovl_copyattr(), but I could be wrong.
>

IDK, ovl_copyattr() looks like a textbook example of a race
if not protected by something because it reads a bunch of stuff
from realinode and then writes a bunch of stuff to inode.

Anyway, I guess it wouldn't hurt to wrap it with inode_lock()
in the ovl completion callback.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-22 13:22             ` Amir Goldstein
@ 2023-08-22 14:06               ` Miklos Szeredi
  0 siblings, 0 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-08-22 14:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Tue, 22 Aug 2023 at 15:22, Amir Goldstein <amir73il@gmail.com> wrote:

> IDK, ovl_copyattr() looks like a textbook example of a race
> if not protected by something because it reads a bunch of stuff
> from realinode and then writes a bunch of stuff to inode.

Yeah, you are right.

> Anyway, I guess it wouldn't hurt to wrap it with inode_lock()
> in the ovl completion callback.

Okay.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-22 11:03           ` Miklos Szeredi
  2023-08-22 13:22             ` Amir Goldstein
@ 2023-08-24 12:11             ` Amir Goldstein
  2023-08-29 12:42               ` Miklos Szeredi
  1 sibling, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-08-24 12:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > Getting back to this.
> > > Did you mean something like that? (only compile tested)
> > >
> > > https://github.com/amir73il/linux/commits/backing_fs
> > >
> > > If yes, then I wonder:
> > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > >     (i.e. the APPEND flag) intentional?
>
> Setting IOCB_APPEND on the backing file doesn't make a difference as
> long as the backing file is not modified during the write.
>
> In overlayfs the case of the backing file being modified is not
> defined, so I guess that's the reason to omit it.  However I don't see
> a problem with setting it on the backing file either, the file
> size/position is synchronized after the write, so nothing bad should
> happen if the backing file was modified.
>

That raises the question if FUSE passthrough behavior is defined when
backing file is being modified. Should it be any different than ovl?
I don't really care if we set IOCB_APPEND or not, just if we need
a different mask for ovl and FUSE.

> > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > >     Pass another completion handler to read/write helpers?
> > >     This seems a bit ugly. Do you have a nicer idea?
> > >
>
> Ugh, I missed that little detail.   I don't have a better idea than to
> use a callback function.
>

Ok. added the cleanup callback.
I think it's not that bad?

https://github.com/amir73il/linux/commits/backing_fs

> >
> > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > seems a bit racy as it is not done under inode_lock().
> >

Decided to fix that by taking ovl inode spinlock inside ovl_copyattr()
and did the same for ovl_file_accessed() for read ops.

I've found and fixed two other issues with aio completion on this branch,
one of them is a fix for a possible realfile refcount leak, so will probably
need to backport this one.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough
  2023-08-24 12:11             ` Amir Goldstein
@ 2023-08-29 12:42               ` Miklos Szeredi
  0 siblings, 0 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-08-29 12:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, overlayfs

On Thu, 24 Aug 2023 at 14:12, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 2:03 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 22 Aug 2023 at 12:18, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Mon, Aug 21, 2023 at 6:27 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > > Getting back to this.
> > > > Did you mean something like that? (only compile tested)
> > > >
> > > > https://github.com/amir73il/linux/commits/backing_fs
> > > >
> > > > If yes, then I wonder:
> > > > 1. Is the difference between FUSE_IOCB_MASK and OVL_IOCB_MASK
> > > >     (i.e. the APPEND flag) intentional?
> >
> > Setting IOCB_APPEND on the backing file doesn't make a difference as
> > long as the backing file is not modified during the write.
> >
> > In overlayfs the case of the backing file being modified is not
> > defined, so I guess that's the reason to omit it.  However I don't see
> > a problem with setting it on the backing file either, the file
> > size/position is synchronized after the write, so nothing bad should
> > happen if the backing file was modified.
> >
>
> That raises the question if FUSE passthrough behavior is defined when
> backing file is being modified. Should it be any different than ovl?
> I don't really care if we set IOCB_APPEND or not, just if we need
> a different mask for ovl and FUSE.

Dunno.

I feel that instead of making the behavior defined when file is
modified behind fuse's back, there should be some mechanism between
the server and the client (userspace and kernel) that allows proper
handling of "remote" modification (oplocks/leases/younameit).

I thought about this quite a bit, and even have some patches for the
read-only lease case.  But this is far from trivial.

In the meantime just setting IOCB_APPEND is quite harmless, so I think
we should do it for both overlayfs and fuse for consistency.

>
> > > > 2. What would be the right way to do ovl_copyattr() on io completion?
> > > >     Pass another completion handler to read/write helpers?
> > > >     This seems a bit ugly. Do you have a nicer idea?
> > > >
> >
> > Ugh, I missed that little detail.   I don't have a better idea than to
> > use a callback function.
> >
>
> Ok. added the cleanup callback.
> I think it's not that bad?
>
> https://github.com/amir73il/linux/commits/backing_fs

Looks good.


>
> > >
> > > Hmm. Looking closer, ovl_copyattr() in ovl_aio_cleanup_handler()
> > > seems a bit racy as it is not done under inode_lock().
> > >
>
> Decided to fix that by taking ovl inode spinlock inside ovl_copyattr()
> and did the same for ovl_file_accessed() for read ops.
>
> I've found and fixed two other issues with aio completion on this branch,
> one of them is a fix for a possible realfile refcount leak, so will probably
> need to backport this one.

Great.  Thanks.

Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-06-06 13:06       ` Miklos Szeredi
@ 2023-08-29 18:14         ` Amir Goldstein
  2023-09-20 13:56           ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-08-29 18:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, Jun 6, 2023 at 4:06 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 6 Jun 2023 at 13:19, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Jun 6, 2023 at 12:49 PM Miklos Szeredi <miklos@szeredi.hu> wrote:

[...]

> > >    I'm not sure that the per-file part of this is necessary, doing
> > > everything per-inode should be okay.   What are the benefits?
> > >
> >
> > I agree that semantics are simpler with per-inode.
> > The only benefit I see to per-file is the lifetime of the mapping.
> >
> > It is very easy IMO to program with a mapping scope of
> > open-to-close that is requested by FOPEN_PASSTHROUGH
> > and FOPEN_PASSTHROUGH_AUTO_CLOSE.
>
> Right, and this case the resource limiting is also easy to think about.
>
> I'm not worried about consistency, fuse server can do whatever it
> wants with the data anyway.  I am worried about the visibility of the
> mapping.  One idea would be to create a kernel thread for each fuse sb
> instance and install mapped files into that thread's file table.  In
> theory this should be trivial as the VFS has all the helpers that can
> do this safely.
>

Sounds doable.
I will look into this after I get the basics sorted out.

> >
> > I think if I can make this patch set work per-inode, the roadmap
> > from here to FUSE-BPF would be much more clear.
>
> One advantage of per-inode non-autoclose would be that open could be
> done without a roundtrip to the server.   However the resource
> limiting becomes harder to think about.
>
> So it might make sense to just create two separate modes:
>
>  - per-open unmap-on-release (autoclose)
>  - per-inode unmap-on-forget (non-autoclose, mapping can be torn down
> explicitly)
>

[...]

> > In summary, I will try to come up with v14 that is:
> > - privileged user only
> > - no resource limitation
> > - per-inode mapping
>
> Okay, that's a logical first step.

I said that I would try to start with per-inode operation mode,
but I realize that it does not meet one of my basic project requirements -
I need to be able to passthrough some of the fds of the same inode,
but not all of them.

I was thinking of a slightly different model that could (possibly)
unify those two modes and be flexible enough to be extended with
BPF filters going forward.

The model is based on per-inode association to backing fd.

1. A single association (mapping) can created per-inode using ioctl
 - There is no mapping id - the inode either has a backing_fd or not
 - Trying to set another backing_fd for inode gets EEXIST if one exists
 - A backing_fd can be torn with ioctl
 - The backing_fd is of course auto-closed on forget

2. The backing_fd association itself does not cause any passthrough!
 - Passthrough operations need to be opt-in independently of mapping
   the backing_fd
 - Down the road, passthrough operation mask could be setup in the
   mapping
 - Down the road, a BPF program to decide on passthrough operation
   could be setup in the mapping as the BPF patches intended

3. Initially, the only way to opt-in to passthrough read/write operations
    is by passing the FOPEN_PASSTHROUGH flag on open
 - FOPEN_PASSTHROUGH will have no effect if backing_fd wasn't
   mapped beforehand
 - As long as there are FUSE files open with FOPEN_PASSTHROUGH,
   the inode's backing_fd cannot be unmapped
 - If a file is opened with FOPEN_PASSTHROUGH_AUTOCLOSE,
   when that file is closed, *if it is the last file referencing* the inode,
   the backing_fd is auto-closed

This is not as flexible as being able to map each FUSE fd to a different
backing_fd.

In the future, FUSE fds could have their own individual backing_fd if
needed, but for now, I think that starting with a single shared backing_fd
with per-fd opt-in on open, would be simpler to implement, but still useful.

One obvious downside of the shared backing_fd approach is that
if FUSE fds are a mix of O_RDONLY and O_WRONLY, the shared
backing_fd needs to be setup as O_RDWR in advance.

I think this is not such a strict limitation for the first implementation,
since we anyway agreed that the first implementation would require
the server to be privileged.

Do you think this is an acceptable first step?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-08-29 18:14         ` Amir Goldstein
@ 2023-09-20 13:56           ` Amir Goldstein
  2023-09-20 18:15             ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-20 13:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, Aug 29, 2023 at 9:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Jun 6, 2023 at 4:06 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 6 Jun 2023 at 13:19, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Jun 6, 2023 at 12:49 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> [...]
>
> > > >    I'm not sure that the per-file part of this is necessary, doing
> > > > everything per-inode should be okay.   What are the benefits?
> > > >
> > >
> > > I agree that semantics are simpler with per-inode.
> > > The only benefit I see to per-file is the lifetime of the mapping.
> > >
> > > It is very easy IMO to program with a mapping scope of
> > > open-to-close that is requested by FOPEN_PASSTHROUGH
> > > and FOPEN_PASSTHROUGH_AUTO_CLOSE.
> >
> > Right, and this case the resource limiting is also easy to think about.
> >
> > I'm not worried about consistency, fuse server can do whatever it
> > wants with the data anyway.  I am worried about the visibility of the
> > mapping.  One idea would be to create a kernel thread for each fuse sb
> > instance and install mapped files into that thread's file table.  In
> > theory this should be trivial as the VFS has all the helpers that can
> > do this safely.
> >
>
> Sounds doable.
> I will look into this after I get the basics sorted out.
>
> > >
> > > I think if I can make this patch set work per-inode, the roadmap
> > > from here to FUSE-BPF would be much more clear.
> >
> > One advantage of per-inode non-autoclose would be that open could be
> > done without a roundtrip to the server.   However the resource
> > limiting becomes harder to think about.
> >
> > So it might make sense to just create two separate modes:
> >
> >  - per-open unmap-on-release (autoclose)
> >  - per-inode unmap-on-forget (non-autoclose, mapping can be torn down
> > explicitly)
> >
>

Miklos,

I have a POC for the two modes above, see kernel patches [1] and
passthrough_hp example[2].

As far as I know, I addressed all your review comments on v13:
1. factor out common helpers from overlayfs
2. factor out ioctl helpers
3. per-file and per-inode backing file modes
4. optional auto close flag

Only thing not addressed yet is lsof-visibility of the backing files.

The first passthrough_hp commit:
- Enable passthrough mode for read/write operations

Is the original passthrough_hp POC of Alessio, which implements
the per-open unmap-on-release (autoclose), as the original POC did.

Unlike Alession's version, this implementation uses the explicit ioctl
flag FUSE_BACKING_MAP_ID to setup an "inode unbound" backing file
and the explicit open flag FOPEN_PASSTHROUGH_AUTO_CLOSE
to unmap the backing id, when the backing file is assigned to the fuse file.

FOPEN_PASSTHROUGH (without AUTO_CLOSE) would let the server
manage the backing ids, but that mode is not implemented in the example.

The seconds passthrough_hp commit:
- Enable passthrough of multiple fuse files to the same backing file

Maps all the rdonly FUSE files to the same "inode bound" backing
file that is setup on the first rdonly open of an inode.
The implementation uses the ioctl flag FUSE_BACKING_MAP_INODE
and the open flag FOPEN_PASSTHROUGH (AUTO_CLOSE not
supported in this mode).

The "inode bound" shared backing file is released of inode evict
of course, but the example explicitly unmaps the inode bound
backing file on close of the last open file of an inode.

Writable files still use the per-open unmap-on-release mode.

I ran this POC on fstests using the newly added support for
running fuse in fstests with a mount helper (one libfuse patch
in my branch improves the mount helper).

First I verified that the group -g quick.rw passes on baseline
passthrough_hp and then I tested with backing files enabled.
The tests found some failures with splice_{read/write}, so
I added support for splice_{read/write} to backing files and
then all the tests passed.

NOTE that if the FOPEN_PASSTHROUGH directive fails for
any reason, this will not fail the open - it will just open the file
without passthrough, and if a valid backing_id is provided
with FOPEN_PASSTHROUGH_AUTO_CLOSE, then the
backing file will be auto closed regardless if the open fails.

Please ACK this API design and then I will continue to clean up
the kernel patches and post them for review.

I still plan to:
1. Move more code into backing_file helpers (i.e. splice/mmap)
2. Install fds of backing files in kernel task file table

Please let me know if you have any comments so far and if
you want me to post the patches even without the kernel task,
as that may take me a bit longer to get to.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fuse-backing-fd
[1] https://github.com/amir73il/libfuse/commits/fuse-backing-fd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-20 13:56           ` Amir Goldstein
@ 2023-09-20 18:15             ` Miklos Szeredi
  2023-09-21  7:33               ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-09-20 18:15 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Wed, 20 Sept 2023 at 15:57, Amir Goldstein <amir73il@gmail.com> wrote:

> Unlike Alession's version, this implementation uses the explicit ioctl
> flag FUSE_BACKING_MAP_ID to setup an "inode unbound" backing file
> and the explicit open flag FOPEN_PASSTHROUGH_AUTO_CLOSE
> to unmap the backing id, when the backing file is assigned to the fuse file.

That sounds about right.  I'm not sure the flexibility provided by the
auto/noauto mode is really needed, but taking this out later in the
development process shouldn't be a problem.

>
> FOPEN_PASSTHROUGH (without AUTO_CLOSE) would let the server
> manage the backing ids, but that mode is not implemented in the example.
>
> The seconds passthrough_hp commit:
> - Enable passthrough of multiple fuse files to the same backing file
>
> Maps all the rdonly FUSE files to the same "inode bound" backing
> file that is setup on the first rdonly open of an inode.
> The implementation uses the ioctl flag FUSE_BACKING_MAP_INODE
> and the open flag FOPEN_PASSTHROUGH (AUTO_CLOSE not
> supported in this mode).

I think the natural interface would be to setup the inode mapping in
the lookup reply (using the normal backing ID).

With your current method, what's the good time for establishing the
mapping?  After the lookup succeeded, obviously.  But then it might
already have raced with open...

>
> The "inode bound" shared backing file is released of inode evict
> of course, but the example explicitly unmaps the inode bound
> backing file on close of the last open file of an inode.
>
> Writable files still use the per-open unmap-on-release mode.

What's the reason for using different modes on different types of opens?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-20 18:15             ` Miklos Szeredi
@ 2023-09-21  7:33               ` Amir Goldstein
  2023-09-21  8:21                 ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-21  7:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Wed, Sep 20, 2023 at 9:15 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 20 Sept 2023 at 15:57, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Unlike Alession's version, this implementation uses the explicit ioctl
> > flag FUSE_BACKING_MAP_ID to setup an "inode unbound" backing file
> > and the explicit open flag FOPEN_PASSTHROUGH_AUTO_CLOSE
> > to unmap the backing id, when the backing file is assigned to the fuse file.
>
> That sounds about right.  I'm not sure the flexibility provided by the
> auto/noauto mode is really needed, but taking this out later in the
> development process shouldn't be a problem.
>
> >
> > FOPEN_PASSTHROUGH (without AUTO_CLOSE) would let the server
> > manage the backing ids, but that mode is not implemented in the example.
> >
> > The seconds passthrough_hp commit:
> > - Enable passthrough of multiple fuse files to the same backing file
> >
> > Maps all the rdonly FUSE files to the same "inode bound" backing
> > file that is setup on the first rdonly open of an inode.
> > The implementation uses the ioctl flag FUSE_BACKING_MAP_INODE
> > and the open flag FOPEN_PASSTHROUGH (AUTO_CLOSE not
> > supported in this mode).
>
> I think the natural interface would be to setup the inode mapping in
> the lookup reply (using the normal backing ID).
>
> With your current method, what's the good time for establishing the
> mapping?  After the lookup succeeded, obviously.  But then it might
> already have raced with open...
>

I don't understand the concern.

This API leaves the control of when to setup/teardown the
mapping completely in the hands of the server.

Avoiding races in the "inode bound" mode is also the responsibility
of the server.

Quoting from the kernel commit of FUSE_BACKING_MAP_INODE:

"If an inode bound backing file already exists, the ioctl returns -EEXIST.
 Managing which thread sets up the backing file for concurrent file open
 requests is the responsibility of the server.
...
 The FUSE server may call FUSE_DEV_IOC_BACKING_CLOSE ioctl to break
 the association between the backing file and the inode.

 If there is no inode bound backing file, the ioctl returns -ENOENT.
 Managing which thread detaches the backing file is the responsibility of
 the server."

In my example, the server happens to setup the mapping
of the backing file to inode
*before the first open for read on the inode*
and to teardown the mapping
*after the last close on the inode* (not even last rdonly file close)
but this is an arbitrary implementation choice of the server.

This example server has a lock on the server inode object,
which is already used among other things to safely track the number
of open fd on that inode (inode.nopen), so it was easy to
implement the map/unmap in these code points.

> >
> > The "inode bound" shared backing file is released of inode evict
> > of course, but the example explicitly unmaps the inode bound
> > backing file on close of the last open file of an inode.
> >
> > Writable files still use the per-open unmap-on-release mode.
>
> What's the reason for using different modes on different types of opens?
>

The reason is to demonstrate that different FUSE files can use
different backing file modes and that several FUSE files can share
the same backing file.

In the "inode bound" mode, the backing file is *shared* among all
openers of the inode that set backing_id = 0 in the open reply.

If I wanted to demo share the same backing file for readers and writers
I would have needed to re-open the backing file with O_RDWR
on the first rdonly open.

It was convenient for the purpose of the demo to share the rdonly
file from the first rdonly open, among all readers. This demo is
not 100% correct in the sense that if the first open for read
has O_NOATIME and the other open for read do not, all will
use the backing file with O_NOATIME, but this is a minor
implementation detail that the server is fully able to handle.

From your questions, it seems that you either do not like my
proposal of the special backing_id 0, or maybe you missed it
because I did not point it out.

The current kernel implementation does NOT refcount the
backing_id only the fuse_backing object.

The way that auto-close-on-evict is implemented is simply by
not having a backing_id for the "inode bound" backing files.

The way that auto-close-on-fput is implemented is simply by
handing over the backing_id from server to kernel on open
reply, which unmaps the backing_id and atomically transfers the
ownership of the fuse_backing object to the fuse file.

This is not the most powerful API, but
1. It is simple
2. It can be extended later (i.e. by allowing both
    FUSE_BACKING_MAP_{INODE,ID} or another flag)

Is that clear? Do I need to document it better?
Do you find this API useful/acceptable?

If not, I can drop the FUSE_BACKING_MAP_INODE mode
patch and implement the server managed backing ids mode
in the example (by storing the backing_id in the server inode).

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21  7:33               ` Amir Goldstein
@ 2023-09-21  8:21                 ` Miklos Szeredi
  2023-09-21  9:17                   ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-09-21  8:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, 21 Sept 2023 at 09:33, Amir Goldstein <amir73il@gmail.com> wrote:

> In my example, the server happens to setup the mapping
> of the backing file to inode
> *before the first open for read on the inode*
> and to teardown the mapping
> *after the last close on the inode* (not even last rdonly file close)
> but this is an arbitrary implementation choice of the server.

Okay.

So my question becomes: is this flexibility really needed?

I understand the need to set up the mapping on open.   I think it also
makes sense to set up the mapping on lookup, since then OPEN/RELEASE
can be omitted.

Removing the mapping at a random point int time might also make sense,
because of limitation on the number of open files, but that's
debatable.

What I'm getting at is that I'd prefer the ioctl to just work one way:
register a file and return a ID. Then there would be ways to associate
that ID with an inode (in LOOKUP or OPEN) or with an open file (in
OPEN).

Can you please also remind me why we need the per-open-file mapping
mode?  I'm sure that we've discussed this, but my brain is like a
sieve...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21  8:21                 ` Miklos Szeredi
@ 2023-09-21  9:17                   ` Amir Goldstein
  2023-09-21  9:30                     ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-21  9:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, Sep 21, 2023 at 11:21 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 21 Sept 2023 at 09:33, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > In my example, the server happens to setup the mapping
> > of the backing file to inode
> > *before the first open for read on the inode*
> > and to teardown the mapping
> > *after the last close on the inode* (not even last rdonly file close)
> > but this is an arbitrary implementation choice of the server.
>
> Okay.
>
> So my question becomes: is this flexibility really needed?
>
> I understand the need to set up the mapping on open.   I think it also
> makes sense to set up the mapping on lookup, since then OPEN/RELEASE
> can be omitted.
>
> Removing the mapping at a random point int time might also make sense,
> because of limitation on the number of open files, but that's
> debatable.
>
> What I'm getting at is that I'd prefer the ioctl to just work one way:
> register a file and return a ID. Then there would be ways to associate
> that ID with an inode (in LOOKUP or OPEN) or with an open file (in
> OPEN).

With my current kernel implementation, this change implies changing
the lifetime rules of fuse_backing object, so that last put will also
remove the backing_id from idr.
It complicates things a bit and is not needed IMO (see below).

The thing that I am concerned about is the complexity of
the AUTO_CLOSE semantics for per-inode and per-file.
IOW, explaining who owns the backing_id becomes more complex
to understand and communicate in a simple API.

I was aiming for a simple to use API and I think my example
demonstrates two modes that are simple to use and even
the server managed backing_id mode would be simple to use.

I don't mind dropping the "inode bound" patch altogether
and staying with server managed backing_id without support
for auto-close-on-evict and only support per-file-auto-close
as is already implemented in my POC.

IWO, if the server want to associate a backing file id with an
inode on LOOKUP or on open, it has no problem is keeping
this association in the server internally, replying to any open
with the backing_id that it associated and closing the
backing_id on FORGET or on the last close.

I see no real gain in the kernel handling the inode-backing_id
association for the server. At least not until this association is
needed to passthough inode operations and in that case
server would probably mapping an O_PATH fd to the inode.

I can easily change my example to work like that and drop the
"inode bound" backing file patch.

In fact, it will make the example even more flexible, because
the server can keep 2 files per inode, one rdonly and one rdwr
(same as the kernel nfsd v3 open files cache does) and for the
example, we could let any open request with non trivial flags
(i.e. O_SYNC) use a per-file-auto-close backing file.

>
> Can you please also remind me why we need the per-open-file mapping
> mode?  I'm sure that we've discussed this, but my brain is like a
> sieve...

Different FUSE files may have different open flags
(e.g. O_RDONLY/O_RDWR/O_SYNC) so server may want to use
different backing files for different FUSE files on the same inode,
but perhaps this is not what you were asking?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21  9:17                   ` Amir Goldstein
@ 2023-09-21  9:30                     ` Miklos Szeredi
  2023-09-21 10:31                       ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-09-21  9:30 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, 21 Sept 2023 at 11:17, Amir Goldstein <amir73il@gmail.com> wrote:

> I don't mind dropping the "inode bound" patch altogether
> and staying with server managed backing_id without support
> for auto-close-on-evict and only support per-file-auto-close
> as is already implemented in my POC.

Lets do that, then.

> IWO, if the server want to associate a backing file id with an
> inode on LOOKUP or on open, it has no problem is keeping
> this association in the server internally, replying to any open
> with the backing_id that it associated and closing the
> backing_id on FORGET or on the last close.

Right.   The only gain is when we want to omit the OPEN call.  But
that's definitely not urgent.

> Different FUSE files may have different open flags
> (e.g. O_RDONLY/O_RDWR/O_SYNC) so server may want to use
> different backing files for different FUSE files on the same inode,
> but perhaps this is not what you were asking?

Yes, this answers my question.

THanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21  9:30                     ` Miklos Szeredi
@ 2023-09-21 10:31                       ` Amir Goldstein
  2023-09-21 11:50                         ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-21 10:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, Sep 21, 2023 at 12:30 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 21 Sept 2023 at 11:17, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I don't mind dropping the "inode bound" patch altogether
> > and staying with server managed backing_id without support
> > for auto-close-on-evict and only support per-file-auto-close
> > as is already implemented in my POC.
>
> Lets do that, then.

Ok. Two follow up design questions.

I used this in/out struct for both open/close ioctls,
because I needed the flags for different modes:

struct fuse_backing_map {
       int32_t         fd;
       uint32_t        flags;
       uint32_t        backing_id;
       uint32_t        padding;
};

I prefer to leave it like that (with flags=0) for future extensions
over the ioctl that inputs fd and outputs backing_id.
I hope you are ok with this.

The second thing is mmap passthrough.

I noticed that the current mmap passthough patch
uses the backing file as is, which is fine for io and does
pass the tests, but the path of that file is not a "fake path".

So either FUSE mmap passthrough needs to allocate
a backing file with "fake path"
OR if it is not important, than maybe it is not important for
overlayfs either?

Which one is it?
Do you have an idea how to dig ourselves out of this hole?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21 10:31                       ` Amir Goldstein
@ 2023-09-21 11:50                         ` Miklos Szeredi
  2023-09-22 12:45                           ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Miklos Szeredi @ 2023-09-21 11:50 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, 21 Sept 2023 at 12:31, Amir Goldstein <amir73il@gmail.com> wrote:

> Ok. Two follow up design questions.
>
> I used this in/out struct for both open/close ioctls,
> because I needed the flags for different modes:
>
> struct fuse_backing_map {
>        int32_t         fd;
>        uint32_t        flags;
>        uint32_t        backing_id;
>        uint32_t        padding;
> };
>
> I prefer to leave it like that (with flags=0) for future extensions
> over the ioctl that inputs fd and outputs backing_id.
> I hope you are ok with this.

I'd prefer if backing_id was just a return value for open.  Taking a
struct as input is okay, and possibly it should have some amount of
reserved space for future extensions.

As for close, I don't see why we'd need anything else than the backing ID...

>
> The second thing is mmap passthrough.
>
> I noticed that the current mmap passthough patch
> uses the backing file as is, which is fine for io and does
> pass the tests, but the path of that file is not a "fake path".
>
> So either FUSE mmap passthrough needs to allocate
> a backing file with "fake path"

Drat.

> OR if it is not important, than maybe it is not important for
> overlayfs either?

We took great pains to make sure that /proc/self/maps, etc display the
correct overlayfs path.  I don't see why FUSE should be different.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-21 11:50                         ` Miklos Szeredi
@ 2023-09-22 12:45                           ` Amir Goldstein
  2023-10-08 17:53                             ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-22 12:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Thu, Sep 21, 2023 at 2:50 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 21 Sept 2023 at 12:31, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Ok. Two follow up design questions.
> >
> > I used this in/out struct for both open/close ioctls,
> > because I needed the flags for different modes:
> >
> > struct fuse_backing_map {
> >        int32_t         fd;
> >        uint32_t        flags;
> >        uint32_t        backing_id;
> >        uint32_t        padding;
> > };
> >
> > I prefer to leave it like that (with flags=0) for future extensions
> > over the ioctl that inputs fd and outputs backing_id.
> > I hope you are ok with this.
>
> I'd prefer if backing_id was just a return value for open.  Taking a
> struct as input is okay, and possibly it should have some amount of
> reserved space for future extensions.
>
> As for close, I don't see why we'd need anything else than the backing ID...
>

OK. dropped the inode bound mode, updated the example
to do server managed backing id for the rdonly shared fds.

Pushed this to the fuse-backing-id branches:

[1] https://github.com/amir73il/linux/commits/fuse-backing-fd
[2] https://github.com/amir73il/libfuse/commits/fuse-backing-fd

> >
> > The second thing is mmap passthrough.
> >
> > I noticed that the current mmap passthough patch
> > uses the backing file as is, which is fine for io and does
> > pass the tests, but the path of that file is not a "fake path".
> >
> > So either FUSE mmap passthrough needs to allocate
> > a backing file with "fake path"
>
> Drat.
>
> > OR if it is not important, than maybe it is not important for
> > overlayfs either?
>
> We took great pains to make sure that /proc/self/maps, etc display the
> correct overlayfs path.  I don't see why FUSE should be different.
>

Now I will go have a think about how to ease the pain
in this area for vfs. I have some ideas...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-05-19 12:57 ` [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write Amir Goldstein
@ 2023-09-25  6:41   ` Zhang Tianci
  2023-09-25 10:43     ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Zhang Tianci @ 2023-09-25  6:41 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	fuse-devel, linux-fsdevel

On Fri, May 19, 2023 at 8:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Similar to update size/mtime at the end of fuse_perform_write(),
> we need to bump the attr version when we update the inode size.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/fuse/passthrough.c | 53 ++++++++++++++++++++++++++++++++++---------
>  1 file changed, 42 insertions(+), 11 deletions(-)
>
> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> index 10b370bcc423..8352d6b91e0e 100644
> --- a/fs/fuse/passthrough.c
> +++ b/fs/fuse/passthrough.c
> @@ -14,15 +14,42 @@ struct fuse_aio_req {
>         struct kiocb *iocb_fuse;
>  };
>
> -static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
> +static void fuse_file_start_write(struct file *fuse_file,
> +                                 struct file *backing_file,
> +                                 loff_t pos, size_t count)
> +{
> +       struct inode *inode = file_inode(fuse_file);
> +       struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +       if (inode->i_size < pos + count)
> +               set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> +
> +       file_start_write(backing_file);
> +}
> +
> +static void fuse_file_end_write(struct file *fuse_file,
> +                               struct file *backing_file,
> +                               loff_t pos, ssize_t res)
> +{
> +       struct inode *inode = file_inode(fuse_file);
> +       struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +       file_end_write(backing_file);
> +
> +       fuse_write_update_attr(inode, pos, res);

Hi Amir,
This function(fuse_file_end_write) will execute in interrupt context, but
fuse_write_update_attr() uses fuse_inode->lock, this will cause soft lockup.

So we may have to change all the fuse_inode->lock usage to fixup this bug, but
I think this is one ugly resolution.

Or why should we do aio_clearup_handler()? What is the difference between
fuse_passthrough_write_iter() with ovl_write_iter()?

Thanks,
Tianci

> +       clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> +}
> +
> +static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req, long res)
>  {
>         struct kiocb *iocb = &aio_req->iocb;
>         struct kiocb *iocb_fuse = aio_req->iocb_fuse;
> +       struct file *filp = iocb->ki_filp;
> +       struct file *fuse_filp = iocb_fuse->ki_filp;
>
>         if (iocb->ki_flags & IOCB_WRITE) {
> -               __sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
> -                                     SB_FREEZE_WRITE);
> -               file_end_write(iocb->ki_filp);
> +               __sb_writers_acquired(file_inode(filp)->i_sb, SB_FREEZE_WRITE);
> +               fuse_file_end_write(fuse_filp, filp, iocb->ki_pos, res);
>         }
>
>         iocb_fuse->ki_pos = iocb->ki_pos;
> @@ -35,7 +62,7 @@ static void fuse_aio_rw_complete(struct kiocb *iocb, long res)
>                 container_of(iocb, struct fuse_aio_req, iocb);
>         struct kiocb *iocb_fuse = aio_req->iocb_fuse;
>
> -       fuse_aio_cleanup_handler(aio_req);
> +       fuse_aio_cleanup_handler(aio_req, res);
>         iocb_fuse->ki_complete(iocb_fuse, res);
>  }
>
> @@ -71,7 +98,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
>                 aio_req->iocb.ki_complete = fuse_aio_rw_complete;
>                 ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
>                 if (ret != -EIOCBQUEUED)
> -                       fuse_aio_cleanup_handler(aio_req);
> +                       fuse_aio_cleanup_handler(aio_req, ret);
>         }
>  out:
>         revert_creds(old_cred);
> @@ -87,22 +114,25 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
>         struct inode *fuse_inode = file_inode(fuse_filp);
>         struct file *passthrough_filp = ff->passthrough->filp;
>         struct inode *passthrough_inode = file_inode(passthrough_filp);
> +       size_t count = iov_iter_count(iter);
>         const struct cred *old_cred;
>         ssize_t ret;
>         rwf_t rwf;
>
> -       if (!iov_iter_count(iter))
> +       if (!count)
>                 return 0;
>
>         inode_lock(fuse_inode);
>
>         old_cred = override_creds(ff->passthrough->cred);
>         if (is_sync_kiocb(iocb_fuse)) {
> -               file_start_write(passthrough_filp);
> +               fuse_file_start_write(fuse_filp, passthrough_filp,
> +                                     iocb_fuse->ki_pos, count);
>                 rwf = iocb_to_rw_flags(iocb_fuse->ki_flags, FUSE_IOCB_MASK);
>                 ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
>                                      rwf);
> -               file_end_write(passthrough_filp);
> +               fuse_file_end_write(fuse_filp, passthrough_filp,
> +                                   iocb_fuse->ki_pos, ret);
>         } else {
>                 struct fuse_aio_req *aio_req;
>
> @@ -112,7 +142,8 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
>                         goto out;
>                 }
>
> -               file_start_write(passthrough_filp);
> +               fuse_file_start_write(fuse_filp, passthrough_filp,
> +                                     iocb_fuse->ki_pos, count);
>                 __sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
>
>                 aio_req->iocb_fuse = iocb_fuse;
> @@ -120,7 +151,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
>                 aio_req->iocb.ki_complete = fuse_aio_rw_complete;
>                 ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
>                 if (ret != -EIOCBQUEUED)
> -                       fuse_aio_cleanup_handler(aio_req);
> +                       fuse_aio_cleanup_handler(aio_req, ret);
>         }
>  out:
>         revert_creds(old_cred);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-09-25  6:41   ` [External] " Zhang Tianci
@ 2023-09-25 10:43     ` Amir Goldstein
  2023-09-26 15:31       ` Jens Axboe
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-25 10:43 UTC (permalink / raw)
  To: Zhang Tianci
  Cc: Miklos Szeredi, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	fuse-devel, linux-fsdevel, Jens Axboe, Christian Brauner

On Mon, Sep 25, 2023 at 9:41 AM Zhang Tianci
<zhangtianci.1997@bytedance.com> wrote:
>
> On Fri, May 19, 2023 at 8:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Similar to update size/mtime at the end of fuse_perform_write(),
> > we need to bump the attr version when we update the inode size.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/fuse/passthrough.c | 53 ++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 42 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> > index 10b370bcc423..8352d6b91e0e 100644
> > --- a/fs/fuse/passthrough.c
> > +++ b/fs/fuse/passthrough.c
> > @@ -14,15 +14,42 @@ struct fuse_aio_req {
> >         struct kiocb *iocb_fuse;
> >  };
> >
> > -static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
> > +static void fuse_file_start_write(struct file *fuse_file,
> > +                                 struct file *backing_file,
> > +                                 loff_t pos, size_t count)
> > +{
> > +       struct inode *inode = file_inode(fuse_file);
> > +       struct fuse_inode *fi = get_fuse_inode(inode);
> > +
> > +       if (inode->i_size < pos + count)
> > +               set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> > +
> > +       file_start_write(backing_file);
> > +}
> > +
> > +static void fuse_file_end_write(struct file *fuse_file,
> > +                               struct file *backing_file,
> > +                               loff_t pos, ssize_t res)
> > +{
> > +       struct inode *inode = file_inode(fuse_file);
> > +       struct fuse_inode *fi = get_fuse_inode(inode);
> > +
> > +       file_end_write(backing_file);
> > +
> > +       fuse_write_update_attr(inode, pos, res);
>
> Hi Amir,
> This function(fuse_file_end_write) will execute in interrupt context, but
> fuse_write_update_attr() uses fuse_inode->lock, this will cause soft lockup.
>
> So we may have to change all the fuse_inode->lock usage to fixup this bug, but
> I think this is one ugly resolution.
>
> Or why should we do aio_clearup_handler()? What is the difference between
> fuse_passthrough_write_iter() with ovl_write_iter()?
>

[CC Jens and Christian]

Heh, very good question.
Does this answer your question:

https://lore.kernel.org/linux-unionfs/20230912173653.3317828-2-amir73il@gmail.com/

I queued this patch to overlayfs for 6.7, because I think overlayfs
has a bug that can manifest with concurrent aio completions.

For people who just joined, this is a patch review of the
FUSE passthrough feature, which is expected to share the
common "kiocb_clone" io passthrough helpers with overlayfs.

Jens,

Are there any IOCB flags that overlayfs (or backing_aio) need
to set or clear, besides IOCB_DIO_CALLER_COMP, that
would prevent calling completion from interrupt context?

Or is the proper way to deal with this is to defer completion
to workqueue in the common backing_aio helpers that
I am re-factoring from overlayfs?

IIUC, that could also help overlayfs support
IOCB_DIO_CALLER_COMP?

Is my understanding correct?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-09-25 10:43     ` Amir Goldstein
@ 2023-09-26 15:31       ` Jens Axboe
  2023-09-26 15:48         ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Jens Axboe @ 2023-09-26 15:31 UTC (permalink / raw)
  To: Amir Goldstein, Zhang Tianci
  Cc: Miklos Szeredi, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	fuse-devel, linux-fsdevel, Christian Brauner

On 9/25/23 4:43 AM, Amir Goldstein wrote:
> Jens,
> 
> Are there any IOCB flags that overlayfs (or backing_aio) need
> to set or clear, besides IOCB_DIO_CALLER_COMP, that
> would prevent calling completion from interrupt context?

There are a few flags that may get set (like WAITQ or ALLOC_CACHE), but
I think that should all work ok as-is as the former is just state in
that iocb and that is persistent (and only for the read path), and
ALLOC_CACHE should just be propagated.

> Or is the proper way to deal with this is to defer completion
> to workqueue in the common backing_aio helpers that
> I am re-factoring from overlayfs?

No, deferring to a workqueue would defeat the purpose of the flag, which
is telling you that the caller will ensure that the end_io callback will
happen from task context and need not be deferred to a workqueue. I can
take a peek at how to wire it up properly for overlayfs, have some
travel coming up in a few days.

> IIUC, that could also help overlayfs support
> IOCB_DIO_CALLER_COMP?
> 
> Is my understanding correct?

If you peek at fs.h and find the CALLER_COMP references, it'll tell you
a bit about how it works. This is new with the 6.6-rc kernel, there's a
series of patches from me that went in through the iomap tree that
hooked that up. Those commits have an explanation as well.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-09-26 15:31       ` Jens Axboe
@ 2023-09-26 15:48         ` Amir Goldstein
  2023-09-26 16:19           ` Jens Axboe
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-09-26 15:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Zhang Tianci, Miklos Szeredi, Daniel Rosenberg, Paul Lawrence,
	Alessio Balsini, fuse-devel, linux-fsdevel, Christian Brauner

On Tue, Sep 26, 2023 at 6:31 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/25/23 4:43 AM, Amir Goldstein wrote:
> > Jens,
> >
> > Are there any IOCB flags that overlayfs (or backing_aio) need
> > to set or clear, besides IOCB_DIO_CALLER_COMP, that
> > would prevent calling completion from interrupt context?
>
> There are a few flags that may get set (like WAITQ or ALLOC_CACHE), but
> I think that should all work ok as-is as the former is just state in
> that iocb and that is persistent (and only for the read path), and
> ALLOC_CACHE should just be propagated.
>
> > Or is the proper way to deal with this is to defer completion
> > to workqueue in the common backing_aio helpers that
> > I am re-factoring from overlayfs?
>
> No, deferring to a workqueue would defeat the purpose of the flag, which
> is telling you that the caller will ensure that the end_io callback will
> happen from task context and need not be deferred to a workqueue. I can
> take a peek at how to wire it up properly for overlayfs, have some
> travel coming up in a few days.
>

No worries, this is not urgent.
I queued a change to overlayfs to take a spin lock on completion
for the 6.7 merge window, so if I can get a ACK/NACK until then
It would be nice.

https://lore.kernel.org/linux-unionfs/20230912173653.3317828-2-amir73il@gmail.com/

> > IIUC, that could also help overlayfs support
> > IOCB_DIO_CALLER_COMP?
> >
> > Is my understanding correct?
>
> If you peek at fs.h and find the CALLER_COMP references, it'll tell you
> a bit about how it works. This is new with the 6.6-rc kernel, there's a
> series of patches from me that went in through the iomap tree that
> hooked that up. Those commits have an explanation as well.
>

Sorry, I think my question wasn't clear.
I wasn't asking specifically about CALLER_COMP.

Zhang Tianci commented in review (above) that I am not allowed
to take the inode spinlock in the ovl io completion context, because
it may be called from interrupt.

I wasn't sure if his statement was correct, so this is what I am
asking - whether overlayfs can set any IOCB flags that will force
the completion to be called from task context - this is kind of the
opposite of CALLER_COMP.

Let me know if I wasn't able to explain myself.
I am not that fluent in aio jargon.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-09-26 15:48         ` Amir Goldstein
@ 2023-09-26 16:19           ` Jens Axboe
  2023-09-26 16:56             ` Amir Goldstein
  0 siblings, 1 reply; 55+ messages in thread
From: Jens Axboe @ 2023-09-26 16:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Zhang Tianci, Miklos Szeredi, Daniel Rosenberg, Paul Lawrence,
	Alessio Balsini, fuse-devel, linux-fsdevel, Christian Brauner

On 9/26/23 9:48 AM, Amir Goldstein wrote:
> On Tue, Sep 26, 2023 at 6:31?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 9/25/23 4:43 AM, Amir Goldstein wrote:
>>> Jens,
>>>
>>> Are there any IOCB flags that overlayfs (or backing_aio) need
>>> to set or clear, besides IOCB_DIO_CALLER_COMP, that
>>> would prevent calling completion from interrupt context?
>>
>> There are a few flags that may get set (like WAITQ or ALLOC_CACHE), but
>> I think that should all work ok as-is as the former is just state in
>> that iocb and that is persistent (and only for the read path), and
>> ALLOC_CACHE should just be propagated.
>>
>>> Or is the proper way to deal with this is to defer completion
>>> to workqueue in the common backing_aio helpers that
>>> I am re-factoring from overlayfs?
>>
>> No, deferring to a workqueue would defeat the purpose of the flag, which
>> is telling you that the caller will ensure that the end_io callback will
>> happen from task context and need not be deferred to a workqueue. I can
>> take a peek at how to wire it up properly for overlayfs, have some
>> travel coming up in a few days.
>>
> 
> No worries, this is not urgent.
> I queued a change to overlayfs to take a spin lock on completion
> for the 6.7 merge window, so if I can get a ACK/NACK until then
> It would be nice.
> 
> https://lore.kernel.org/linux-unionfs/20230912173653.3317828-2-amir73il@gmail.com/

That's not going to work for ovl_copyattr(), as ->ki_complete() may very
well be called from interrupt context in general.

>>> IIUC, that could also help overlayfs support
>>> IOCB_DIO_CALLER_COMP?
>>>
>>> Is my understanding correct?
>>
>> If you peek at fs.h and find the CALLER_COMP references, it'll tell you
>> a bit about how it works. This is new with the 6.6-rc kernel, there's a
>> series of patches from me that went in through the iomap tree that
>> hooked that up. Those commits have an explanation as well.
>>
> 
> Sorry, I think my question wasn't clear.
> I wasn't asking specifically about CALLER_COMP.
> 
> Zhang Tianci commented in review (above) that I am not allowed
> to take the inode spinlock in the ovl io completion context, because
> it may be called from interrupt.

That is correct, the inode spinlock is not IRQ safe.

> I wasn't sure if his statement was correct, so this is what I am
> asking - whether overlayfs can set any IOCB flags that will force
> the completion to be called from task context - this is kind of the
> opposite of CALLER_COMP.
> 
> Let me know if I wasn't able to explain myself.
> I am not that fluent in aio jargon.

Ah gotcha. I don't think that'd really work for your case as you don't
need to propagate it, you can just punt your completion handling to a
context that is sane for you, like a workqueue. That is provided that
you don't need any task context there, which presumably you don't since
eg copyattr is already called from IRQ context.

From that context you could then grab the inode lock.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [External] [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write
  2023-09-26 16:19           ` Jens Axboe
@ 2023-09-26 16:56             ` Amir Goldstein
  0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-09-26 16:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Zhang Tianci, Miklos Szeredi, Daniel Rosenberg, Paul Lawrence,
	Alessio Balsini, fuse-devel, linux-fsdevel, Christian Brauner

On Tue, Sep 26, 2023 at 7:19 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/26/23 9:48 AM, Amir Goldstein wrote:
> > On Tue, Sep 26, 2023 at 6:31?PM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 9/25/23 4:43 AM, Amir Goldstein wrote:
> >>> Jens,
> >>>
> >>> Are there any IOCB flags that overlayfs (or backing_aio) need
> >>> to set or clear, besides IOCB_DIO_CALLER_COMP, that
> >>> would prevent calling completion from interrupt context?
> >>
> >> There are a few flags that may get set (like WAITQ or ALLOC_CACHE), but
> >> I think that should all work ok as-is as the former is just state in
> >> that iocb and that is persistent (and only for the read path), and
> >> ALLOC_CACHE should just be propagated.
> >>
> >>> Or is the proper way to deal with this is to defer completion
> >>> to workqueue in the common backing_aio helpers that
> >>> I am re-factoring from overlayfs?
> >>
> >> No, deferring to a workqueue would defeat the purpose of the flag, which
> >> is telling you that the caller will ensure that the end_io callback will
> >> happen from task context and need not be deferred to a workqueue. I can
> >> take a peek at how to wire it up properly for overlayfs, have some
> >> travel coming up in a few days.
> >>
> >
> > No worries, this is not urgent.
> > I queued a change to overlayfs to take a spin lock on completion
> > for the 6.7 merge window, so if I can get a ACK/NACK until then
> > It would be nice.
> >
> > https://lore.kernel.org/linux-unionfs/20230912173653.3317828-2-amir73il@gmail.com/
>
> That's not going to work for ovl_copyattr(), as ->ki_complete() may very
> well be called from interrupt context in general.
>
> >>> IIUC, that could also help overlayfs support
> >>> IOCB_DIO_CALLER_COMP?
> >>>
> >>> Is my understanding correct?
> >>
> >> If you peek at fs.h and find the CALLER_COMP references, it'll tell you
> >> a bit about how it works. This is new with the 6.6-rc kernel, there's a
> >> series of patches from me that went in through the iomap tree that
> >> hooked that up. Those commits have an explanation as well.
> >>
> >
> > Sorry, I think my question wasn't clear.
> > I wasn't asking specifically about CALLER_COMP.
> >
> > Zhang Tianci commented in review (above) that I am not allowed
> > to take the inode spinlock in the ovl io completion context, because
> > it may be called from interrupt.
>
> That is correct, the inode spinlock is not IRQ safe.
>
> > I wasn't sure if his statement was correct, so this is what I am
> > asking - whether overlayfs can set any IOCB flags that will force
> > the completion to be called from task context - this is kind of the
> > opposite of CALLER_COMP.
> >
> > Let me know if I wasn't able to explain myself.
> > I am not that fluent in aio jargon.
>
> Ah gotcha. I don't think that'd really work for your case as you don't
> need to propagate it, you can just punt your completion handling to a
> context that is sane for you, like a workqueue. That is provided that
> you don't need any task context there, which presumably you don't since
> eg copyattr is already called from IRQ context.
>
> From that context you could then grab the inode lock.
>

Yes, that's what I thought.
Thanks for confirming!
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-09-22 12:45                           ` Amir Goldstein
@ 2023-10-08 17:53                             ` Amir Goldstein
  2023-10-10 14:31                               ` Miklos Szeredi
  0 siblings, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-10-08 17:53 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Fri, Sep 22, 2023 at 3:45 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Sep 21, 2023 at 2:50 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
...
> > As for close, I don't see why we'd need anything else than the backing ID...
> >
>
> OK. dropped the inode bound mode, updated the example
> to do server managed backing id for the rdonly shared fds.
>
> Pushed this to the fuse-backing-id branches:
>
> [1] https://github.com/amir73il/linux/commits/fuse-backing-fd
> [2] https://github.com/amir73il/libfuse/commits/fuse-backing-fd
>
> > >
> > > The second thing is mmap passthrough.
> > >
> > > I noticed that the current mmap passthough patch
> > > uses the backing file as is, which is fine for io and does
> > > pass the tests, but the path of that file is not a "fake path".
> > >
> > > So either FUSE mmap passthrough needs to allocate
> > > a backing file with "fake path"
> >
> > Drat.
> >
> > > OR if it is not important, than maybe it is not important for
> > > overlayfs either?
> >
> > We took great pains to make sure that /proc/self/maps, etc display the
> > correct overlayfs path.  I don't see why FUSE should be different.
> >
>
> Now I will go have a think about how to ease the pain
> in this area for vfs. I have some ideas...

Ok, posted your original suggestion for opt-in to fake path:
https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/

Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
the fake (fuse) path is not known.

We can set the fake path on the first FOPEN_PASSTHROUGH response,
but then the whole concept of a backing id that is not bound to a
single file/inode
becomes a bit fuzzy.

One solution is to allocate a backing_file container per fuse file on
FOPEN_PASSTHROUGH response.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-08 17:53                             ` Amir Goldstein
@ 2023-10-10 14:31                               ` Miklos Szeredi
  2023-10-10 15:14                                 ` Amir Goldstein
  2023-10-16 10:30                                 ` Amir Goldstein
  0 siblings, 2 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-10-10 14:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Sun, 8 Oct 2023 at 19:53, Amir Goldstein <amir73il@gmail.com> wrote:

> Ok, posted your original suggestion for opt-in to fake path:
> https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/
>
> Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
> the fake (fuse) path is not known.
>
> We can set the fake path on the first FOPEN_PASSTHROUGH response,
> but then the whole concept of a backing id that is not bound to a
> single file/inode
> becomes a bit fuzzy.
>
> One solution is to allocate a backing_file container per fuse file on
> FOPEN_PASSTHROUGH response.

Right.   How about the following idea:

 - mapping request is done with an O_PATH fd.
 - fuse_open() always opens a backing file (just like overlayfs)

The disadvantage is one more struct file (the third).  The advantage
is that the server doesn't have to care about open flags, hence the
mapping can always be per-inode.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-10 14:31                               ` Miklos Szeredi
@ 2023-10-10 15:14                                 ` Amir Goldstein
  2023-10-14 16:01                                   ` Amir Goldstein
  2023-10-16 10:30                                 ` Amir Goldstein
  1 sibling, 1 reply; 55+ messages in thread
From: Amir Goldstein @ 2023-10-10 15:14 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, Oct 10, 2023 at 5:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sun, 8 Oct 2023 at 19:53, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Ok, posted your original suggestion for opt-in to fake path:
> > https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/
> >
> > Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
> > the fake (fuse) path is not known.
> >
> > We can set the fake path on the first FOPEN_PASSTHROUGH response,
> > but then the whole concept of a backing id that is not bound to a
> > single file/inode
> > becomes a bit fuzzy.
> >
> > One solution is to allocate a backing_file container per fuse file on
> > FOPEN_PASSTHROUGH response.
>
> Right.   How about the following idea:
>
>  - mapping request is done with an O_PATH fd.
>  - fuse_open() always opens a backing file (just like overlayfs)
>

I think it makes sense and it is in the direction that FUSE BPF took
so that's good.

> The disadvantage is one more struct file (the third).  The advantage
> is that the server doesn't have to care about open flags, hence the
> mapping can always be per-inode.

I am fine with that.
One more struct file per inode is not that big of an overhead
and one backing file per fuse file is the same overhead as overlayfs.

Does it mean that we can drop backing_id's and use the "inode bound"
FUSE_BACKING_MAP_INODE mode to map the O_PATH fd to an inode,
where the FUSE_DEV_IOC_BACKING_OPEN ioctl takes a nodeid as
an input argument?

Or do you still think that we need the backing_id, so we can map
the same O_PATH fd to different inodes?

To me, one O_PATH fd per inode without any backing_id seems
like a simpler start.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-10 15:14                                 ` Amir Goldstein
@ 2023-10-14 16:01                                   ` Amir Goldstein
  0 siblings, 0 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-10-14 16:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel

On Tue, Oct 10, 2023 at 6:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Oct 10, 2023 at 5:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Sun, 8 Oct 2023 at 19:53, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > Ok, posted your original suggestion for opt-in to fake path:
> > > https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/
> > >
> > > Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
> > > the fake (fuse) path is not known.
> > >
> > > We can set the fake path on the first FOPEN_PASSTHROUGH response,
> > > but then the whole concept of a backing id that is not bound to a
> > > single file/inode
> > > becomes a bit fuzzy.
> > >
> > > One solution is to allocate a backing_file container per fuse file on
> > > FOPEN_PASSTHROUGH response.
> >
> > Right.   How about the following idea:
> >
> >  - mapping request is done with an O_PATH fd.
> >  - fuse_open() always opens a backing file (just like overlayfs)
> >
>
> I think it makes sense and it is in the direction that FUSE BPF took
> so that's good.
>
> > The disadvantage is one more struct file (the third).  The advantage
> > is that the server doesn't have to care about open flags, hence the
> > mapping can always be per-inode.
>
> I am fine with that.
> One more struct file per inode is not that big of an overhead
> and one backing file per fuse file is the same overhead as overlayfs.
>
> Does it mean that we can drop backing_id's and use the "inode bound"
> FUSE_BACKING_MAP_INODE mode to map the O_PATH fd to an inode,
> where the FUSE_DEV_IOC_BACKING_OPEN ioctl takes a nodeid as
> an input argument?
>
> Or do you still think that we need the backing_id, so we can map
> the same O_PATH fd to different inodes?
>
> To me, one O_PATH fd per inode without any backing_id seems
> like a simpler start.
>

I did not take into account that during create, backing file is setup
before the kernel knows about the nodeid.
I will keep the backing_id and will figure out how to avoid multi
backing ids per inode.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-10 14:31                               ` Miklos Szeredi
  2023-10-10 15:14                                 ` Amir Goldstein
@ 2023-10-16 10:30                                 ` Amir Goldstein
  2023-10-16 13:21                                   ` Miklos Szeredi
  2023-10-16 19:16                                   ` Bernd Schubert
  1 sibling, 2 replies; 55+ messages in thread
From: Amir Goldstein @ 2023-10-16 10:30 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, Jens Axboe

On Tue, Oct 10, 2023 at 5:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sun, 8 Oct 2023 at 19:53, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Ok, posted your original suggestion for opt-in to fake path:
> > https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/
> >
> > Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
> > the fake (fuse) path is not known.
> >
> > We can set the fake path on the first FOPEN_PASSTHROUGH response,
> > but then the whole concept of a backing id that is not bound to a
> > single file/inode
> > becomes a bit fuzzy.
> >
> > One solution is to allocate a backing_file container per fuse file on
> > FOPEN_PASSTHROUGH response.
>
> Right.   How about the following idea:
>
>  - mapping request is done with an O_PATH fd.
>  - fuse_open() always opens a backing file (just like overlayfs)
>
> The disadvantage is one more struct file (the third).  The advantage
> is that the server doesn't have to care about open flags, hence the
> mapping can always be per-inode.
>

OK, this was pushed to the POC branches [2][3].

NOTE that in this version, backing ids are completely server managed.
The passthrough_hp example keeps backing_id in the server's Inode
object and closes the backing file when server's inode.nopen drops to 0.
Even if OPEN/CREATE reply with backing_id failed to create/open the
fuse inode, RELEASE should still be called with the server provided nodeid,
so server should be able to close the backing file after a failed open/create.

I am ready to post v14 patches, but since they depend on vfs and overlayfs
changes queued for 6.7, and since fuse passthrough is not a likely candidate
for 6.7, I thought I will wait with posting, because you must be busy preparing
for 6.7(?).

The remaining question about lsof-visibility of the backing files got me
thinking and I wanted to consult with io_uring developers regarding using
io_uring fixed files table for FUSE backing files [*].

[*] The term "FUSE backing files" is now VERY confusing since we
     have two types of "FUSE backing files", so we desperately need
     better names to describe those two types:
1. struct file, which is referred via backing_id in per FUSE sb map
2. struct backing_file, which is referred via fuse file ->private
    (just like overlayfs backing files)

The concern is about the lsof-visibility of the first type, which the server
can open as many as it wants without having any connection to number
of fuse inodes and file objects in the kernel and server can close those
fds in its process file table, making those open files invisible to users.

This looks and sounds a lot like io_uring fixed files, especially considering
that the server could even pick the backing_id itself. So why do we need
to reinvent this wheel?

Does io_uring expose the fixed files table via lsof or otherwise?

Bernd,

IIUC, your work on FUSE io_uring queue is only for kernel -> user
requests. Is that correct?
Is there also a plan to have a user -> kernel uring as well?

I wouldn't mind if FUSE passthrough depended on FUSE io_uring
queue, because essentially, I think that both features address problems
from the same domain of FUSE performance issues. Do you agree?

Am I over engineering this?

Thanks,
Amir.

Changes from v14-rc1 to v14-rc2 [2]:
- rebase on 6.6-rc6 (and overlayfs and vfs next branches)
- open a backing_file per fuse_file with fuse file's path and flags
- factor out common splice/mmap helpers from overlayfs
- remove inode-bound (no backing id) backing file mode
- use backing_id as input arg for close ioctl
- remove auto-close mode

Changes from v13 to v14-rc1 [1]:
- rebase on 6.6-rc1
- factor out common read/write helpers from overlayfs
- factor out ioctl helpers
- per-file and per-inode backing file modes
- optional auto close flag

[1] https://github.com/amir73il/linux/commits/fuse-backing-fd-rc1
[2] https://github.com/amir73il/linux/commits/fuse-backing-fd-rc2
[3] https://github.com/amir73il/libfuse/commits/fuse-backing-fd

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-16 10:30                                 ` Amir Goldstein
@ 2023-10-16 13:21                                   ` Miklos Szeredi
  2023-10-16 19:16                                   ` Bernd Schubert
  1 sibling, 0 replies; 55+ messages in thread
From: Miklos Szeredi @ 2023-10-16 13:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Bernd Schubert, Daniel Rosenberg, Paul Lawrence, Alessio Balsini,
	fuse-devel, linux-fsdevel, Jens Axboe

On Mon, 16 Oct 2023 at 12:30, Amir Goldstein <amir73il@gmail.com> wrote:

> OK, this was pushed to the POC branches [2][3].
>
> NOTE that in this version, backing ids are completely server managed.
> The passthrough_hp example keeps backing_id in the server's Inode
> object and closes the backing file when server's inode.nopen drops to 0.
> Even if OPEN/CREATE reply with backing_id failed to create/open the
> fuse inode, RELEASE should still be called with the server provided nodeid,
> so server should be able to close the backing file after a failed open/create.
>
> I am ready to post v14 patches, but since they depend on vfs and overlayfs
> changes queued for 6.7, and since fuse passthrough is not a likely candidate
> for 6.7, I thought I will wait with posting, because you must be busy preparing
> for 6.7(?).

Dont' worry about that.

> The remaining question about lsof-visibility of the backing files got me
> thinking and I wanted to consult with io_uring developers regarding using
> io_uring fixed files table for FUSE backing files [*].
>
> [*] The term "FUSE backing files" is now VERY confusing since we
>      have two types of "FUSE backing files", so we desperately need
>      better names to describe those two types:
> 1. struct file, which is referred via backing_id in per FUSE sb map
> 2. struct backing_file, which is referred via fuse file ->private
>     (just like overlayfs backing files)
>
> The concern is about the lsof-visibility of the first type, which the server
> can open as many as it wants without having any connection to number
> of fuse inodes and file objects in the kernel and server can close those
> fds in its process file table, making those open files invisible to users.
>
> This looks and sounds a lot like io_uring fixed files, especially considering
> that the server could even pick the backing_id itself. So why do we need
> to reinvent this wheel?
>
> Does io_uring expose the fixed files table via lsof or otherwise?

I don't think so.

>
> Bernd,
>
> IIUC, your work on FUSE io_uring queue is only for kernel -> user
> requests. Is that correct?
> Is there also a plan to have a user -> kernel uring as well?
>
> I wouldn't mind if FUSE passthrough depended on FUSE io_uring
> queue, because essentially, I think that both features address problems
> from the same domain of FUSE performance issues. Do you agree?

It's a good plan if it all fits together.  I'm not all that familiar
with io_uring to say for sure.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write
  2023-10-16 10:30                                 ` Amir Goldstein
  2023-10-16 13:21                                   ` Miklos Szeredi
@ 2023-10-16 19:16                                   ` Bernd Schubert
  1 sibling, 0 replies; 55+ messages in thread
From: Bernd Schubert @ 2023-10-16 19:16 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi
  Cc: Daniel Rosenberg, Paul Lawrence, Alessio Balsini, fuse-devel,
	linux-fsdevel, Jens Axboe



On 10/16/23 12:30, Amir Goldstein wrote:
> On Tue, Oct 10, 2023 at 5:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Sun, 8 Oct 2023 at 19:53, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> Ok, posted your original suggestion for opt-in to fake path:
>>> https://lore.kernel.org/linux-fsdevel/20231007084433.1417887-1-amir73il@gmail.com/
>>>
>>> Now the problem is that on FUSE_DEV_IOC_BACKING_OPEN ioctl,
>>> the fake (fuse) path is not known.
>>>
>>> We can set the fake path on the first FOPEN_PASSTHROUGH response,
>>> but then the whole concept of a backing id that is not bound to a
>>> single file/inode
>>> becomes a bit fuzzy.
>>>
>>> One solution is to allocate a backing_file container per fuse file on
>>> FOPEN_PASSTHROUGH response.
>>
>> Right.   How about the following idea:
>>
>>   - mapping request is done with an O_PATH fd.
>>   - fuse_open() always opens a backing file (just like overlayfs)
>>
>> The disadvantage is one more struct file (the third).  The advantage
>> is that the server doesn't have to care about open flags, hence the
>> mapping can always be per-inode.
>>
> 
> OK, this was pushed to the POC branches [2][3].
> 
> NOTE that in this version, backing ids are completely server managed.
> The passthrough_hp example keeps backing_id in the server's Inode
> object and closes the backing file when server's inode.nopen drops to 0.
> Even if OPEN/CREATE reply with backing_id failed to create/open the
> fuse inode, RELEASE should still be called with the server provided nodeid,
> so server should be able to close the backing file after a failed open/create.
> 
> I am ready to post v14 patches, but since they depend on vfs and overlayfs
> changes queued for 6.7, and since fuse passthrough is not a likely candidate
> for 6.7, I thought I will wait with posting, because you must be busy preparing
> for 6.7(?).
> 
> The remaining question about lsof-visibility of the backing files got me
> thinking and I wanted to consult with io_uring developers regarding using
> io_uring fixed files table for FUSE backing files [*].
> 
> [*] The term "FUSE backing files" is now VERY confusing since we
>       have two types of "FUSE backing files", so we desperately need
>       better names to describe those two types:
> 1. struct file, which is referred via backing_id in per FUSE sb map
> 2. struct backing_file, which is referred via fuse file ->private
>      (just like overlayfs backing files)
> 
> The concern is about the lsof-visibility of the first type, which the server
> can open as many as it wants without having any connection to number
> of fuse inodes and file objects in the kernel and server can close those
> fds in its process file table, making those open files invisible to users.
> 
> This looks and sounds a lot like io_uring fixed files, especially considering
> that the server could even pick the backing_id itself. So why do we need
> to reinvent this wheel?
> 
> Does io_uring expose the fixed files table via lsof or otherwise?
> 
> Bernd,
> 
> IIUC, your work on FUSE io_uring queue is only for kernel -> user
> requests. Is that correct?
> Is there also a plan to have a user -> kernel uring as well?

Right now I don't have support with the ring for user -> kernel 
notifications yet. Though we will make heavily use of a distributed lock 
manager (DLM) - that will need fast notifications and so better over a 
ring. In principle I could also use the existing ring and take some 
credits for the notifications, but it would make it harder to understand 
what is going on when something goes wrong.

> 
> I wouldn't mind if FUSE passthrough depended on FUSE io_uring
> queue, because essentially, I think that both features address problems
> from the same domain of FUSE performance issues. Do you agree?


The goal is definitely the same, just two different so far orthogonal ways.

I need to hurry up with the patch updates, got distracted by 
atomic-open. direct-io and ddn internal work...


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2023-10-16 19:16 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19 12:56 [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
2023-05-19 12:56 ` [PATCH v13 01/10] fs: Generic function to convert iocb to rw flags Amir Goldstein
2023-05-19 12:56 ` [PATCH v13 02/10] fuse: Definitions and ioctl for passthrough Amir Goldstein
2023-05-19 15:12   ` Miklos Szeredi
2023-05-20 10:20     ` Amir Goldstein
2023-05-22 14:50       ` Miklos Szeredi
2023-05-24 10:00         ` Amir Goldstein
2023-05-19 12:56 ` [PATCH v13 03/10] fuse: Passthrough initialization and release Amir Goldstein
2023-05-19 12:56 ` [PATCH v13 04/10] fuse: Introduce synchronous read and write for passthrough Amir Goldstein
2023-05-19 12:57 ` [PATCH v13 05/10] fuse: Handle asynchronous read and write in passthrough Amir Goldstein
2023-05-22 15:20   ` Miklos Szeredi
2023-05-24 10:03     ` Amir Goldstein
2023-08-21 15:27       ` Amir Goldstein
2023-08-22 10:18         ` Amir Goldstein
2023-08-22 11:03           ` Miklos Szeredi
2023-08-22 13:22             ` Amir Goldstein
2023-08-22 14:06               ` Miklos Szeredi
2023-08-24 12:11             ` Amir Goldstein
2023-08-29 12:42               ` Miklos Szeredi
2023-05-19 12:57 ` [PATCH v13 06/10] fuse: Use daemon creds in passthrough mode Amir Goldstein
2023-05-19 12:57 ` [PATCH v13 07/10] fuse: Introduce passthrough for mmap Amir Goldstein
2023-05-19 12:57 ` [PATCH v13 08/10] fuse: update inode size/mtime after passthrough write Amir Goldstein
2023-09-25  6:41   ` [External] " Zhang Tianci
2023-09-25 10:43     ` Amir Goldstein
2023-09-26 15:31       ` Jens Axboe
2023-09-26 15:48         ` Amir Goldstein
2023-09-26 16:19           ` Jens Axboe
2023-09-26 16:56             ` Amir Goldstein
2023-05-19 12:57 ` [PATCH v13 09/10] fuse: invalidate atime after passthrough read/mmap Amir Goldstein
2023-05-19 12:57 ` [PATCH v13 10/10] fuse: setup a passthrough fd without a permanent backing id Amir Goldstein
2023-06-06 10:22   ` Fwd: " Miklos Szeredi
2023-06-06 11:00     ` [fuse-devel] " Amir Goldstein
2023-06-06 12:46       ` Miklos Szeredi
2023-05-20 10:28 ` [PATCH v13 00/10] fuse: Add support for passthrough read/write Amir Goldstein
2023-06-06  9:13 ` Amir Goldstein
2023-06-06  9:49   ` Miklos Szeredi
2023-06-06 11:19     ` Amir Goldstein
2023-06-06 13:06       ` Miklos Szeredi
2023-08-29 18:14         ` Amir Goldstein
2023-09-20 13:56           ` Amir Goldstein
2023-09-20 18:15             ` Miklos Szeredi
2023-09-21  7:33               ` Amir Goldstein
2023-09-21  8:21                 ` Miklos Szeredi
2023-09-21  9:17                   ` Amir Goldstein
2023-09-21  9:30                     ` Miklos Szeredi
2023-09-21 10:31                       ` Amir Goldstein
2023-09-21 11:50                         ` Miklos Szeredi
2023-09-22 12:45                           ` Amir Goldstein
2023-10-08 17:53                             ` Amir Goldstein
2023-10-10 14:31                               ` Miklos Szeredi
2023-10-10 15:14                                 ` Amir Goldstein
2023-10-14 16:01                                   ` Amir Goldstein
2023-10-16 10:30                                 ` Amir Goldstein
2023-10-16 13:21                                   ` Miklos Szeredi
2023-10-16 19:16                                   ` Bernd Schubert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).