linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write
@ 2021-01-18 19:27 Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 1/7] fs: Generic function to convert iocb to rw flags Alessio Balsini
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

This is the 11th version of the series, rebased on top of v5.11-rc4.
Please find the changelog at the bottom of this cover letter.

Add support for file system passthrough read/write of files when enabled
in userspace through the option FUSE_PASSTHROUGH.

There are file systems based on FUSE that are intended to enforce
special policies or trigger complicated decision makings at the file
operations level.
Android, for example, uses FUSE to enforce fine-grained access policies
that also depend on the file contents.
Sometimes it happens that at open or create time a file is identified as
not requiring additional checks for consequent reads/writes, thus FUSE
would simply act as a passive bridge between the process accessing the
FUSE file system and the lower file system. Splicing and caching help
reduce the FUSE overhead, but there are still read/write operations
forwarded to the userspace FUSE daemon that could be avoided.

This series has been inspired by the original patches from Nikhilesh
Reddy, the idea and code of which has been elaborated and improved
thanks to the community support.

When the FUSE_PASSTHROUGH capability is enabled, the FUSE daemon may
decide while handling the open/create operations, if the given file can
be accessed in passthrough mode. This means that all the further read
and write operations would be forwarded by the kernel directly to the
lower file system using the VFS layer rather than to the FUSE daemon.
All the requests other than reads or writes are still handled by the
userspace FUSE daemon.
This allows for improved performance on reads and writes, especially in
the case of reads at random offsets, for which no (readahead) caching
mechanism would help.
Benchmarks show improved performance that is close to native file system
access when doing massive manipulations on a single opened file,
especially in the case of random reads, random writes and sequential
writes. Detailed benchmarking results are presented below.

The creation of this direct connection (passthrough) between FUSE file
objects and file objects in the lower file system happens in a way that
reminds of passing file descriptors via sockets:
- a process requests the opening of a file handled by FUSE, so the
  kernel forwards the request to the FUSE daemon;
- the FUSE daemon opens the target file in the lower file system,
  getting its file descriptor;
- the FUSE daemon also decides according to its internal policies if
  passthrough can be enabled for that file, and, if so, can perform a
  FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() on /dev/fuse, passing the file
  descriptor obtained at the previous step and the fuse_req unique
  identifier;
- the kernel translates the file descriptor to the file pointer
  navigating through the opened files of the "current" process and
  temporarily stores it in the associated open/create fuse_req's
  passthrough_filp;
- when the FUSE daemon has done with the request and it's time for the
  kernel to close it, it checks if the passthrough_filp is available and
  in case updates the additional field in the fuse_file owned by the
  process accessing the FUSE file system.
From now on, all the read/write operations performed by that process
will be redirected to the corresponding lower file system file by
creating new VFS requests.
Since the read/write operation to the lower file system is executed with
the current process's credentials, it might happen that it does not have
enough privileges to succeed. For this reason, the process temporarily
receives the same credentials as the FUSE daemon, that are reverted as
soon as the read/write operation completes, emulating the behavior of
the request to be performed by the FUSE daemon itself. This solution has
been inspired by the way overlayfs handles read/write operations.
Asynchronous IO is supported as well, handled by creating separate AIO
requests for the lower file system that will be internally tracked by
FUSE, that intercepts and propagates their completion through an
internal ki_completed callback similar to the current implementation of
overlayfs.
The ioctl() has been designed taking as a reference and trying to
converge to the fuse2 implementation. For example, the
fuse_passthrough_out data structure has extra fields that will allow for
further extensions of the feature.


    Performance on RAM block device

What follows has been performed using a custom passthrough_hp FUSE
daemon that enables pass-through for each file that is opened during
both "open" and "create". Benchmarks were run on an Intel Xeon W-2135,
64 GiB of RAM workstation, with a RAM block device used as storage
target. More specifically, out of the system's 64 GiB of RAM, 40 GiB
were reserved for /dev/ram0, formatted as ext4. For the FUSE and FUSE
passthrough benchmarks, the FUSE file system was mounted on top of the
mounted /dev/ram0 device.
That file system has been completely filled and then cleaned up before
running the benchmarks: this to ensure that all the /dev/ram0 space was
reserved and not usable as page cache.

The rationale for using a RAM block device is that SSDs may experience
performance fluctuations, especially when dealing with accessing data
random offsets.
Getting rid of the discrete storage device also removes a huge component
of slowness, highlighting the performance difference of the software
parts (and probably the goodness of CPU caching and its coherence
mechanisms).

No special tuning has been performed, e.g., all the involved processes
are SCHED_OTHER, ondemand is the frequency governor with no frequency
restrictions, and turbo-boost, as well as p-state, are active. This is
because I noticed that, for such high-level benchmarks, results
consistency was minimally affected by these features.

The source code of the updated libfuse library and passthrough_hp is
shared at the following repository:

    https://github.com/balsini/libfuse/tree/fuse-passthrough-v11-v5.11-rc4

Two different kinds of benchmarks were done for this change, the first
set of tests evaluates the bandwidth improvements when manipulating huge
single files, the second set of tests verify that no performance
regressions were introduced when handling many small files.

All the caches were dropped before running every benchmark with

  echo 3 > /proc/sys/vm/drop_caches

All the benchmarks were run 10 times, with 1 minute cool down between
each run.

The first benchmarks were done by running FIO (fio-3.24) with:
- bs=4Ki;
- file size: 35Gi;
- ioengine: sync;
- fsync_on_close=1;
- randseed=0.
The target file has been chosen large enough to avoid it to be entirely
loaded into the page cache.

Results are presented in the following table:

+-----------+------------+-------------+-------------+
|   MiB/s   |    fuse    | passthrough |   native    |
+-----------+------------+-------------+-------------+
| read      | 471(±1.3%) | 1791(±1.0%) | 1839(±1.8%) |
| write     |  95(±.6%)  | 1068(±.9%)  | 1322(±.8%)  |
| randread  |  25(±1.7%) |  860(±.8%)  | 1135(±.5%)  |
| randwrite |  76(±3.0%) |  813(±1.0%) | 1005(±.7%)  |
+-----------+------------+-------------+-------------+

This table shows that FUSE, except for the sequential reads, is far
behind FUSE passthrough and native in terms of performance. The
extremely good FUSE performance for sequential reads is the result of a
great read-ahead mechanism.  I was able to verify that setting
read_ahead_kb to 0 causes a terrible performance drop.
All the results are stable, as shown by the standard deviations.
Moreover, these numbers show the reasonable gap between passthrough and
native, introduced by the extra traversal through the VFS layer.

As long as this patch has the primary objective of improving bandwidth,
another set of tests has been performed to see how this behaves on a
totally different scenario that involves accessing many small files. For
this purpose, measuring the build time of the Linux kernel has been
chosen as an appropriate, well-known, workload. The kernel has been
built with as many processes as the number of logical CPUs (-j
$(nproc)), that besides being a reasonable parallelization value, is
also enough to saturate the processor's utilization thanks to the
additional FUSE daemon's threads, making it even harder to get closer to
the native file system performance.
The following table shows the total build times in the different
configurations:

+------------------+--------------+-----------+
|                  | AVG duration |  Standard |
|                  |     (sec)    | deviation |
+------------------+--------------+-----------+
| FUSE             |      144.566 |     0.697 |
+------------------+--------------+-----------+
| FUSE passthrough |      133.820 |     0.341 |
+------------------+--------------+-----------+
| Native           |      109.423 |     0.724 |
+------------------+--------------+-----------+

Further testing and performance evaluations are welcome.


    Description of the series

Patch 1 generalizes the function which converts iocb flags to rw flags
  from overlayfs, so that can be used in this patch set.

Patch 2 enables the 32-bit compatibility for the /dev/fuse ioctl.

Patch 3 introduces the data structures, function signatures and ioctl()
  required both for the communication with userspace and for the
  internal kernel use.

Patch 4 introduces initialization and release functions for FUSE
  passthrough.

Patch 5 enables the synchronous read and write operations for those FUSE
  files for which the passthrough functionality is enabled.

Patch 6 extends the read and write operations to also support
  asynchronous IO.

Patch 7 allows FUSE passthrough to target files for which the requesting
  process would not have direct access to, by temporarily performing a
  credentials switch to the credentials of the FUSE daemon that issued
  the FUSE passthrough ioctl().


    Changelog

Changes in v11:
* Fix the FILESYSTEM_MAX_STACK_DEPTH check to allow other file systems
  to be stacked
* Moved file system stacking depth check at ioctl() time
* Update cover letter with correct libfuse repository to test the change
  [Requested by Peng Tao]
* Fix the file reference counter leak introduced in v10
  [Requested by yanwu]

Changes in v10:
* UAPI updated: ioctl() now returns an ID that will be used at
  open/create response time to reference the passthrough file
* Synchronous read/write_iter functions does not return silly errors
  (fixed in aio patch)
* FUSE daemon credentials updated at ioctl() time instead of mount time
* Updated benchmark results
  [Requested by Miklos Szeredi]

Changes in v9:
* Switched to using VFS instead of direct lower FS file ops
  [Attempt to address a request from Jens Axboe, Jann Horn, Amir
  Goldstein]
* Removal of useless included aio.h header
  [Proposed by Jens Axboe]

Changes in v8:
* aio requests now use kmalloc/kfree, instead of kmem_cache
* Switched to call_{read,write}_iter in AIO
* Revisited attributes copy
* Passthrough can only be enabled via ioctl(), fixing the security issue
  spotted by Jann
* Use an extensible fuse_passthrough_out data structure [Attempt to
  address a request from Nikolaus Rath, Amir Goldstein and Miklos Szeredi]

Changes in v7:
* Full handling of aio requests as done in overlayfs (update commit
  message).
* s/fget_raw/fget.
* Open fails in case of passthrough errors, emitting warning messages.
  [Proposed by Jann Horn]
* Create new local kiocb, getting rid of the previously proposed ki_filp
  swapping.
  [Proposed by Jann Horn and Jens Axboe]
* Code polishing.

Changes in v6:
* Port to kernel v5.8:
  * fuse_file_{read,write}_iter() changed since the v5 of this patch was
    proposed.
* Simplify fuse_simple_request().
* Merge fuse_passthrough.h into fuse_i.h
* Refactor of passthrough.c:
  * Remove BUG_ON()s.
  * Simplified error checking and request arguments indexing.
  * Use call_{read,write}_iter() utility functions.
  * Remove get_file() and fputs() during read/write: handle the extra
    FUSE references to the lower file object when the fuse_file is
    created/deleted.
  [Proposed by Jann Horn]

Changes in v5:
* Fix the check when setting the passthrough file.
  [Found when testing by Mike Shal]

Changes in v3 and v4:
* Use the fs_stack_depth to prevent further stacking and a minor fix.
  [Proposed by Jann Horn]

Changes in v2:
* Changed the feature name to passthrough from stacked_io.
  [Proposed by Linus Torvalds]


Alessio Balsini (7):
  fs: Generic function to convert iocb to rw flags
  fuse: 32-bit user space ioctl compat for fuse device
  fuse: Definitions and ioctl for passthrough
  fuse: Passthrough initialization and release
  fuse: Introduce synchronous read and write for passthrough
  fuse: Handle asynchronous read and write in passthrough
  fuse: Use daemon creds in passthrough mode

 fs/fuse/Makefile          |   1 +
 fs/fuse/dev.c             |  41 +++++--
 fs/fuse/dir.c             |   2 +
 fs/fuse/file.c            |  12 +-
 fs/fuse/fuse_i.h          |  32 +++++
 fs/fuse/inode.c           |  22 +++-
 fs/fuse/passthrough.c     | 239 ++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/file.c       |  23 +---
 include/linux/fs.h        |   5 +
 include/uapi/linux/fuse.h |  14 ++-
 10 files changed, 356 insertions(+), 35 deletions(-)
 create mode 100644 fs/fuse/passthrough.c

-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 1/7] fs: Generic function to convert iocb to rw flags
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 2/7] fuse: 32-bit user space ioctl compat for fuse device Alessio Balsini
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

OverlayFS implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.

Reduce the OverlayFS code by making the flag conversion function generic
and reusable.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/overlayfs/file.c | 23 +++++------------------
 include/linux/fs.h  |  5 +++++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index bd9dd38347ae..56be2ffc5a14 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -15,6 +15,8 @@
 #include <linux/fs.h>
 #include "overlayfs.h"
 
+#define OVL_IOCB_MASK (IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
 struct ovl_aio_req {
 	struct kiocb iocb;
 	struct kiocb *orig_iocb;
@@ -236,22 +238,6 @@ static void ovl_file_accessed(struct file *file)
 	touch_atime(&file->f_path);
 }
 
-static rwf_t ovl_iocb_to_rwf(int ifl)
-{
-	rwf_t flags = 0;
-
-	if (ifl & IOCB_NOWAIT)
-		flags |= RWF_NOWAIT;
-	if (ifl & IOCB_HIPRI)
-		flags |= RWF_HIPRI;
-	if (ifl & IOCB_DSYNC)
-		flags |= RWF_DSYNC;
-	if (ifl & IOCB_SYNC)
-		flags |= RWF_SYNC;
-
-	return flags;
-}
-
 static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
 {
 	struct kiocb *iocb = &aio_req->iocb;
@@ -299,7 +285,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	if (is_sync_kiocb(iocb)) {
 		ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
-				    ovl_iocb_to_rwf(iocb->ki_flags));
+				    iocb_to_rw_flags(iocb->ki_flags,
+						     OVL_IOCB_MASK));
 	} else {
 		struct ovl_aio_req *aio_req;
 
@@ -356,7 +343,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (is_sync_kiocb(iocb)) {
 		file_start_write(real.file);
 		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
-				     ovl_iocb_to_rwf(ifl));
+				     iocb_to_rw_flags(ifl, OVL_IOCB_MASK));
 		file_end_write(real.file);
 		/* Update size */
 		ovl_copyattr(ovl_inode_real(inode), inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c17..647c35423545 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3275,6 +3275,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 	return 0;
 }
 
+static inline rwf_t iocb_to_rw_flags(int ifl, int iocb_mask)
+{
+	return ifl & iocb_mask;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
 	ino_t res;
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 2/7] fuse: 32-bit user space ioctl compat for fuse device
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 1/7] fs: Generic function to convert iocb to rw flags Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough Alessio Balsini
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

With a 64-bit kernel build the FUSE device cannot handle ioctl requests
coming from 32-bit user space.
This is due to the ioctl command translation that generates different
command identifiers that thus cannot be used for comparisons previous
proper manipulation.

Explicitly extract type and number from the ioctl command to enable
32-bit user space compatibility on 64-bit kernel builds.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/dev.c             | 29 ++++++++++++++++++-----------
 include/uapi/linux/fuse.h |  3 ++-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 588f8d1240aa..ff9f3b83f879 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2233,37 +2233,44 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new)
 static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
-	int err = -ENOTTY;
+	int res;
+	int oldfd;
+	struct fuse_dev *fud = NULL;
 
-	if (cmd == FUSE_DEV_IOC_CLONE) {
-		int oldfd;
+	if (_IOC_TYPE(cmd) != FUSE_DEV_IOC_MAGIC)
+		return -EINVAL;
 
-		err = -EFAULT;
-		if (!get_user(oldfd, (__u32 __user *) arg)) {
+	switch (_IOC_NR(cmd)) {
+	case _IOC_NR(FUSE_DEV_IOC_CLONE):
+		res = -EFAULT;
+		if (!get_user(oldfd, (__u32 __user *)arg)) {
 			struct file *old = fget(oldfd);
 
-			err = -EINVAL;
+			res = -EINVAL;
 			if (old) {
-				struct fuse_dev *fud = NULL;
-
 				/*
 				 * Check against file->f_op because CUSE
 				 * uses the same ioctl handler.
 				 */
 				if (old->f_op == file->f_op &&
-				    old->f_cred->user_ns == file->f_cred->user_ns)
+				    old->f_cred->user_ns ==
+					    file->f_cred->user_ns)
 					fud = fuse_get_dev(old);
 
 				if (fud) {
 					mutex_lock(&fuse_mutex);
-					err = fuse_device_clone(fud->fc, file);
+					res = fuse_device_clone(fud->fc, file);
 					mutex_unlock(&fuse_mutex);
 				}
 				fput(old);
 			}
 		}
+		break;
+	default:
+		res = -ENOTTY;
+		break;
 	}
-	return err;
+	return res;
 }
 
 const struct file_operations fuse_dev_operations = {
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 98ca64d1beb6..54442612c48b 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -903,7 +903,8 @@ struct fuse_notify_retrieve_in {
 };
 
 /* Device ioctls: */
-#define FUSE_DEV_IOC_CLONE	_IOR(229, 0, uint32_t)
+#define FUSE_DEV_IOC_MAGIC		229
+#define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 1/7] fs: Generic function to convert iocb to rw flags Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 2/7] fuse: 32-bit user space ioctl compat for fuse device Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-19  6:33   ` Amir Goldstein
  2021-01-18 19:27 ` [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release Alessio Balsini
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

Expose the FUSE_PASSTHROUGH interface to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl(), which
allows the FUSE daemon to specify a direct connection between a FUSE
file and a lower file system file. Such ioctl() requires users pace to
pass the file descriptor of one of its opened files through the
fuse_passthrough_out data structure introduced in this patch. This
structure includes extra fields for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/Makefile          |  1 +
 fs/fuse/dev.c             | 12 ++++++++++++
 fs/fuse/dir.c             |  2 ++
 fs/fuse/file.c            |  4 +++-
 fs/fuse/fuse_i.h          | 27 +++++++++++++++++++++++++++
 fs/fuse/inode.c           | 17 ++++++++++++++++-
 fs/fuse/passthrough.c     | 21 +++++++++++++++++++++
 include/uapi/linux/fuse.h | 11 ++++++++++-
 8 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 fs/fuse/passthrough.c

diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 8c7021fb2cd4..20ed23aa16fa 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
 obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o
+fuse-y += passthrough.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ff9f3b83f879..5446f13db5a0 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2236,6 +2236,7 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 	int res;
 	int oldfd;
 	struct fuse_dev *fud = NULL;
+	struct fuse_passthrough_out pto;
 
 	if (_IOC_TYPE(cmd) != FUSE_DEV_IOC_MAGIC)
 		return -EINVAL;
@@ -2266,6 +2267,17 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			}
 		}
 		break;
+	case _IOC_NR(FUSE_DEV_IOC_PASSTHROUGH_OPEN):
+		res = -EFAULT;
+		if (!copy_from_user(&pto,
+				    (struct fuse_passthrough_out __user *)arg,
+				    sizeof(pto))) {
+			res = -EINVAL;
+			fud = fuse_get_dev(file);
+			if (fud)
+				res = fuse_passthrough_open(fud, &pto);
+		}
+		break;
 	default:
 		res = -ENOTTY;
 		break;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 78f9f209078c..c9a1b33c5481 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -513,6 +513,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 {
 	int err;
 	struct inode *inode;
+	struct fuse_conn *fc = get_fuse_conn(dir);
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 	struct fuse_forget_link *forget;
@@ -574,6 +575,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	ff->fh = outopen.fh;
 	ff->nodeid = outentry.nodeid;
 	ff->open_flags = outopen.open_flags;
+	fuse_passthrough_setup(fc, ff, &outopen);
 	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
 			  &outentry.attr, entry_attr_timeout(&outentry), 0);
 	if (!inode) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..953f3034c375 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -158,7 +158,7 @@ int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
 		if (!err) {
 			ff->fh = outarg.fh;
 			ff->open_flags = outarg.open_flags;
-
+			fuse_passthrough_setup(fc, ff, &outarg);
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
 			return err;
@@ -304,6 +304,8 @@ void fuse_release_common(struct file *file, bool isdir)
 	struct fuse_release_args *ra = ff->release_args;
 	int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;
 
+	fuse_passthrough_release(&ff->passthrough);
+
 	fuse_prepare_release(fi, ff, file->f_flags, opcode);
 
 	if (ff->flock) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7c4b8cb93f9f..8d39f5304a11 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -180,6 +180,14 @@ struct fuse_conn;
 struct fuse_mount;
 struct fuse_release_args;
 
+/**
+ * Reference to lower filesystem file for read/write operations handled in
+ * passthrough mode
+ */
+struct fuse_passthrough {
+	struct file *filp;
+};
+
 /** FUSE specific file data */
 struct fuse_file {
 	/** Fuse connection for this file */
@@ -225,6 +233,9 @@ struct fuse_file {
 
 	} readdir;
 
+	/** Container for data related to the passthrough functionality */
+	struct fuse_passthrough passthrough;
+
 	/** RB node to be linked on fuse_conn->polled_files */
 	struct rb_node polled_node;
 
@@ -755,6 +766,9 @@ struct fuse_conn {
 	/* Auto-mount submounts announced by the server */
 	unsigned int auto_submounts:1;
 
+	/** Passthrough mode for read/write IO */
+	unsigned int passthrough:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
@@ -798,6 +812,12 @@ struct fuse_conn {
 
 	/** List of filesystems using this connection */
 	struct list_head mounts;
+
+	/** IDR for passthrough requests */
+	struct idr passthrough_req;
+
+	/** Protects passthrough_req */
+	spinlock_t passthrough_req_lock;
 };
 
 /*
@@ -1213,4 +1233,11 @@ void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
+/* passthrough.c */
+int fuse_passthrough_open(struct fuse_dev *fud,
+			  struct fuse_passthrough_out *pto);
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg);
+void fuse_passthrough_release(struct fuse_passthrough *passthrough);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b0e18b470e91..d5c46eafb419 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -691,6 +691,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	memset(fc, 0, sizeof(*fc));
 	spin_lock_init(&fc->lock);
 	spin_lock_init(&fc->bg_lock);
+	spin_lock_init(&fc->passthrough_req_lock);
 	init_rwsem(&fc->killsb);
 	refcount_set(&fc->count, 1);
 	atomic_set(&fc->dev_count, 1);
@@ -699,6 +700,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	INIT_LIST_HEAD(&fc->bg_queue);
 	INIT_LIST_HEAD(&fc->entry);
 	INIT_LIST_HEAD(&fc->devices);
+	idr_init(&fc->passthrough_req);
 	atomic_set(&fc->num_waiting, 0);
 	fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
 	fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
@@ -1052,6 +1054,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->handle_killpriv_v2 = 1;
 				fm->sb->s_flags |= SB_NOSEC;
 			}
+			if (arg->flags & FUSE_PASSTHROUGH) {
+				fc->passthrough = 1;
+				/* Prevent further stacking */
+				fm->sb->s_stack_depth =
+					FILESYSTEM_MAX_STACK_DEPTH + 1;
+			}
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1095,7 +1103,7 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL |
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
-		FUSE_HANDLE_KILLPRIV_V2;
+		FUSE_HANDLE_KILLPRIV_V2 | FUSE_PASSTHROUGH;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
@@ -1123,9 +1131,16 @@ void fuse_send_init(struct fuse_mount *fm)
 }
 EXPORT_SYMBOL_GPL(fuse_send_init);
 
+static int free_fuse_passthrough(int id, void *p, void *data)
+{
+	return 0;
+}
+
 void fuse_free_conn(struct fuse_conn *fc)
 {
 	WARN_ON(!list_empty(&fc->devices));
+	idr_for_each(&fc->passthrough_req, free_fuse_passthrough, NULL);
+	idr_destroy(&fc->passthrough_req);
 	kfree_rcu(fc, rcu);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
new file mode 100644
index 000000000000..594060c654f8
--- /dev/null
+++ b/fs/fuse/passthrough.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "fuse_i.h"
+
+#include <linux/fuse.h>
+
+int fuse_passthrough_open(struct fuse_dev *fud,
+			  struct fuse_passthrough_out *pto)
+{
+	return -EINVAL;
+}
+
+int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
+			   struct fuse_open_out *openarg)
+{
+	return -EINVAL;
+}
+
+void fuse_passthrough_release(struct fuse_passthrough *passthrough)
+{
+}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 54442612c48b..9d7685ce0acd 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -360,6 +360,7 @@ struct fuse_file_lock {
 #define FUSE_MAP_ALIGNMENT	(1 << 26)
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
+#define FUSE_PASSTHROUGH	(1 << 29)
 
 /**
  * CUSE INIT request/reply flags
@@ -625,7 +626,7 @@ struct fuse_create_in {
 struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
-	uint32_t	padding;
+	uint32_t	passthrough_fh;
 };
 
 struct fuse_release_in {
@@ -828,6 +829,13 @@ struct fuse_in_header {
 	uint32_t	padding;
 };
 
+struct fuse_passthrough_out {
+	uint32_t	fd;
+	/* For future implementation */
+	uint32_t	len;
+	void		*vec;
+};
+
 struct fuse_out_header {
 	uint32_t	len;
 	int32_t		error;
@@ -905,6 +913,7 @@ struct fuse_notify_retrieve_in {
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
+#define FUSE_DEV_IOC_PASSTHROUGH_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, struct fuse_passthrough_out)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
                   ` (2 preceding siblings ...)
  2021-01-18 19:27 ` [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-19 12:06   ` Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 5/7] fuse: Introduce synchronous read and write for passthrough Alessio Balsini
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

Implement the FUSE passthrough ioctl() that associates the lower
(passthrough) file system file with the fuse_file.

The file descriptor passed to the ioctl() by the FUSE daemon is used to
access the relative file pointer, that will be copied to the fuse_file
data structure to consolidate the link between the FUSE and lower file
system.

To enable the passthrough mode, user space triggers the
FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() and, if the call succeeds,
receives back an identifier that will be used at open/create response
time in the fuse_open_out field to associate the FUSE file to the lower
file system file.
The value returned by the ioctl() to user space can be:
- > 0: success, the identifier can be used as part of an open/create
  reply.
- < 0: an error occurred.
The value 0 has been left unused for backward compatibility: the
fuse_open_out field that is used to pass the passthrough_fh back to the
kernel uses the same bits that were previously as struct padding,
zero-initialized in the common libfuse implementation. Removing the 0
value fixes the ambiguity between the case in which 0 corresponds to a
real passthrough_fh or a missing implementation, simplifying the user
space implementation.

For the passthrough mode to be successfully activated, the lower file
system file must implement both read_iter and write_iter file
operations. This extra check avoids special pseudo files to be targeted
for this feature.
Passthrough comes with another limitation: if a FUSE file systems
enables passthrough, this feature is no more available to other FUSE
file systems stacked on top of it. This check is only performed when
FUSE passthrough is requested for a specific file and would simply
prevent the use of FUSE passthrough for that file, not limiting other
file operations.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/inode.c       |  5 +++
 fs/fuse/passthrough.c | 87 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d5c46eafb419..bc327789f25d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1133,6 +1133,11 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
 
 static int free_fuse_passthrough(int id, void *p, void *data)
 {
+	struct fuse_passthrough *passthrough = (struct fuse_passthrough *)p;
+
+	fuse_passthrough_release(passthrough);
+	kfree(p);
+
 	return 0;
 }
 
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 594060c654f8..cf720ca14a45 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -3,19 +3,102 @@
 #include "fuse_i.h"
 
 #include <linux/fuse.h>
+#include <linux/idr.h>
 
 int fuse_passthrough_open(struct fuse_dev *fud,
 			  struct fuse_passthrough_out *pto)
 {
-	return -EINVAL;
+	int res;
+	struct file *passthrough_filp;
+	struct fuse_conn *fc = fud->fc;
+	struct inode *passthrough_inode;
+	struct super_block *passthrough_sb;
+	struct fuse_passthrough *passthrough;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	/* This field is reserved for future implementation */
+	if (pto->len != 0)
+		return -EINVAL;
+
+	passthrough_filp = fget(pto->fd);
+	if (!passthrough_filp) {
+		pr_err("FUSE: invalid file descriptor for passthrough.\n");
+		return -EBADF;
+	}
+
+	if (!passthrough_filp->f_op->read_iter ||
+	    !passthrough_filp->f_op->write_iter) {
+		pr_err("FUSE: passthrough file misses file operations.\n");
+		res = -EBADF;
+		goto err_free_file;
+	}
+
+	passthrough_inode = file_inode(passthrough_filp);
+	passthrough_sb = passthrough_inode->i_sb;
+	if (passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+		pr_err("FUSE: fs stacking depth exceeded for passthrough\n");
+		res = -EINVAL;
+		goto err_free_file;
+	}
+
+	passthrough = kmalloc(sizeof(struct fuse_passthrough), GFP_KERNEL);
+	if (!passthrough) {
+		res = -ENOMEM;
+		goto err_free_file;
+	}
+
+	passthrough->filp = passthrough_filp;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fc->passthrough_req_lock);
+	res = idr_alloc(&fc->passthrough_req, passthrough, 1, 0, GFP_ATOMIC);
+	spin_unlock(&fc->passthrough_req_lock);
+	idr_preload_end();
+
+	if (res > 0)
+		return res;
+
+	fuse_passthrough_release(passthrough);
+	kfree(passthrough);
+
+err_free_file:
+	fput(passthrough_filp);
+
+	return res;
 }
 
 int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
 			   struct fuse_open_out *openarg)
 {
-	return -EINVAL;
+	struct fuse_passthrough *passthrough;
+	int passthrough_fh = openarg->passthrough_fh;
+
+	if (!fc->passthrough)
+		return -EPERM;
+
+	/* Default case, passthrough is not requested */
+	if (passthrough_fh <= 0)
+		return -EINVAL;
+
+	spin_lock(&fc->passthrough_req_lock);
+	passthrough = idr_remove(&fc->passthrough_req, passthrough_fh);
+	spin_unlock(&fc->passthrough_req_lock);
+
+	if (!passthrough)
+		return -EINVAL;
+
+	ff->passthrough = *passthrough;
+	kfree(passthrough);
+
+	return 0;
 }
 
 void fuse_passthrough_release(struct fuse_passthrough *passthrough)
 {
+	if (passthrough->filp) {
+		fput(passthrough->filp);
+		passthrough->filp = NULL;
+	}
 }
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 5/7] fuse: Introduce synchronous read and write for passthrough
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
                   ` (3 preceding siblings ...)
  2021-01-18 19:27 ` [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 6/7] fuse: Handle asynchronous read and write in passthrough Alessio Balsini
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

All the read and write operations performed on fuse_files which have the
passthrough feature enabled are forwarded to the associated lower file
system file via VFS.

Sending the request directly to the lower file system avoids the userspace
round-trip that, because of possible context switches and additional
operations might reduce the overall performance, especially in those cases
where caching doesn't help, for example in reads at random offsets.

Verifying if a fuse_file has a lower file system file associated with can
be done by checking the validity of its passthrough_filp pointer. This
pointer is not NULL only if passthrough has been successfully enabled via
the appropriate ioctl().
When a read/write operation is requested for a FUSE file with passthrough
enabled, a new equivalent VFS request is generated, which instead targets
the lower file system file.
The VFS layer performs additional checks that allow for safer operations
but may cause the operation to fail if the process accessing the FUSE file
system does not have access to the lower file system.

This change only implements synchronous requests in passthrough, returning
an error in the case of asynchronous operations, yet covering the majority
of the use cases.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/file.c        |  8 ++++--
 fs/fuse/fuse_i.h      |  2 ++
 fs/fuse/passthrough.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 953f3034c375..cddada1e8bd9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1581,7 +1581,9 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_read_iter(iocb, to);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (ff->passthrough.filp)
+		return fuse_passthrough_read_iter(iocb, to);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_read_iter(iocb, to);
 	else
 		return fuse_direct_read_iter(iocb, to);
@@ -1599,7 +1601,9 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_write_iter(iocb, from);
 
-	if (!(ff->open_flags & FOPEN_DIRECT_IO))
+	if (ff->passthrough.filp)
+		return fuse_passthrough_write_iter(iocb, from);
+	else if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_write_iter(iocb, from);
 	else
 		return fuse_direct_write_iter(iocb, from);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 8d39f5304a11..c4730d893324 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1239,5 +1239,7 @@ int fuse_passthrough_open(struct fuse_dev *fud,
 int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
 			   struct fuse_open_out *openarg);
 void fuse_passthrough_release(struct fuse_passthrough *passthrough);
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb, struct iov_iter *to);
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb, struct iov_iter *from);
 
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index cf720ca14a45..8f6882a31a0b 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -4,6 +4,63 @@
 
 #include <linux/fuse.h>
 #include <linux/idr.h>
+#include <linux/uio.h>
+
+#define PASSTHROUGH_IOCB_MASK                                                  \
+	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
+static void fuse_copyattr(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst = file_inode(dst_file);
+	struct inode *src = file_inode(src_file);
+
+	i_size_write(dst, i_size_read(src));
+}
+
+ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
+				   struct iov_iter *iter)
+{
+	ssize_t ret;
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct file *passthrough_filp = ff->passthrough.filp;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
+			    iocb_to_rw_flags(iocb_fuse->ki_flags,
+					     PASSTHROUGH_IOCB_MASK));
+
+	return ret;
+}
+
+ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
+				    struct iov_iter *iter)
+{
+	ssize_t ret;
+	struct file *fuse_filp = iocb_fuse->ki_filp;
+	struct fuse_file *ff = fuse_filp->private_data;
+	struct inode *fuse_inode = file_inode(fuse_filp);
+	struct file *passthrough_filp = ff->passthrough.filp;
+
+	if (!iov_iter_count(iter))
+		return 0;
+
+	inode_lock(fuse_inode);
+
+	file_start_write(passthrough_filp);
+	ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
+			     iocb_to_rw_flags(iocb_fuse->ki_flags,
+					      PASSTHROUGH_IOCB_MASK));
+	file_end_write(passthrough_filp);
+	if (ret > 0)
+		fuse_copyattr(fuse_filp, passthrough_filp);
+
+	inode_unlock(fuse_inode);
+
+	return ret;
+}
 
 int fuse_passthrough_open(struct fuse_dev *fud,
 			  struct fuse_passthrough_out *pto)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 6/7] fuse: Handle asynchronous read and write in passthrough
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
                   ` (4 preceding siblings ...)
  2021-01-18 19:27 ` [PATCH RESEND V11 5/7] fuse: Introduce synchronous read and write for passthrough Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-18 19:27 ` [PATCH RESEND V11 7/7] fuse: Use daemon creds in passthrough mode Alessio Balsini
  2021-01-19 11:06 ` [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Rokudo Yan
  7 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

Extend the passthrough feature by handling asynchronous IO both for read
and write operations.

When an AIO request is received, if the request targets a FUSE file with
the passthrough functionality enabled, a new identical AIO request is
created. The new request targets the lower file system file and gets
assigned a special FUSE passthrough AIO completion callback.
When the lower file system AIO request is completed, the FUSE passthrough
AIO completion callback is executed and propagates the completion signal to
the FUSE AIO request by triggering its completion callback as well.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/passthrough.c | 89 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 11 deletions(-)

diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 8f6882a31a0b..da71a74014d7 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -9,6 +9,11 @@
 #define PASSTHROUGH_IOCB_MASK                                                  \
 	(IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
 
+struct fuse_aio_req {
+	struct kiocb iocb;
+	struct kiocb *iocb_fuse;
+};
+
 static void fuse_copyattr(struct file *dst_file, struct file *src_file)
 {
 	struct inode *dst = file_inode(dst_file);
@@ -17,6 +22,32 @@ static void fuse_copyattr(struct file *dst_file, struct file *src_file)
 	i_size_write(dst, i_size_read(src));
 }
 
+static void fuse_aio_cleanup_handler(struct fuse_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+		fuse_copyattr(iocb_fuse->ki_filp, iocb->ki_filp);
+	}
+
+	iocb_fuse->ki_pos = iocb->ki_pos;
+	kfree(aio_req);
+}
+
+static void fuse_aio_rw_complete(struct kiocb *iocb, long res, long res2)
+{
+	struct fuse_aio_req *aio_req =
+		container_of(iocb, struct fuse_aio_req, iocb);
+	struct kiocb *iocb_fuse = aio_req->iocb_fuse;
+
+	fuse_aio_cleanup_handler(aio_req);
+	iocb_fuse->ki_complete(iocb_fuse, res, res2);
+}
+
 ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 				   struct iov_iter *iter)
 {
@@ -28,9 +59,24 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 	if (!iov_iter_count(iter))
 		return 0;
 
-	ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
-			    iocb_to_rw_flags(iocb_fuse->ki_flags,
-					     PASSTHROUGH_IOCB_MASK));
+	if (is_sync_kiocb(iocb_fuse)) {
+		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				    iocb_to_rw_flags(iocb_fuse->ki_flags,
+						     PASSTHROUGH_IOCB_MASK));
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req)
+			return -ENOMEM;
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_read_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
 
 	return ret;
 }
@@ -43,20 +89,41 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct inode *fuse_inode = file_inode(fuse_filp);
 	struct file *passthrough_filp = ff->passthrough.filp;
+	struct inode *passthrough_inode = file_inode(passthrough_filp);
 
 	if (!iov_iter_count(iter))
 		return 0;
 
 	inode_lock(fuse_inode);
 
-	file_start_write(passthrough_filp);
-	ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
-			     iocb_to_rw_flags(iocb_fuse->ki_flags,
-					      PASSTHROUGH_IOCB_MASK));
-	file_end_write(passthrough_filp);
-	if (ret > 0)
-		fuse_copyattr(fuse_filp, passthrough_filp);
-
+	if (is_sync_kiocb(iocb_fuse)) {
+		file_start_write(passthrough_filp);
+		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
+				     iocb_to_rw_flags(iocb_fuse->ki_flags,
+						      PASSTHROUGH_IOCB_MASK));
+		file_end_write(passthrough_filp);
+		if (ret > 0)
+			fuse_copyattr(fuse_filp, passthrough_filp);
+	} else {
+		struct fuse_aio_req *aio_req;
+
+		aio_req = kmalloc(sizeof(struct fuse_aio_req), GFP_KERNEL);
+		if (!aio_req) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		file_start_write(passthrough_filp);
+		__sb_writers_release(passthrough_inode->i_sb, SB_FREEZE_WRITE);
+
+		aio_req->iocb_fuse = iocb_fuse;
+		kiocb_clone(&aio_req->iocb, iocb_fuse, passthrough_filp);
+		aio_req->iocb.ki_complete = fuse_aio_rw_complete;
+		ret = call_write_iter(passthrough_filp, &aio_req->iocb, iter);
+		if (ret != -EIOCBQUEUED)
+			fuse_aio_cleanup_handler(aio_req);
+	}
+out:
 	inode_unlock(fuse_inode);
 
 	return ret;
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RESEND V11 7/7] fuse: Use daemon creds in passthrough mode
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
                   ` (5 preceding siblings ...)
  2021-01-18 19:27 ` [PATCH RESEND V11 6/7] fuse: Handle asynchronous read and write in passthrough Alessio Balsini
@ 2021-01-18 19:27 ` Alessio Balsini
  2021-01-19 11:06 ` [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Rokudo Yan
  7 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-18 19:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

When using FUSE passthrough, read/write operations are directly forwarded
to the lower file system file through VFS, but there is no guarantee that
the process that is triggering the request has the right permissions to
access the lower file system. This would cause the read/write access to
fail.

In passthrough file systems, where the FUSE daemon is responsible for the
enforcement of the lower file system access policies, often happens that
the process dealing with the FUSE file system doesn't have access to the
lower file system.
Being the FUSE daemon in charge of implementing the FUSE file operations,
that in the case of read/write operations usually simply results in the
copy of memory buffers from/to the lower file system respectively, these
operations are executed with the FUSE daemon privileges.

This patch adds a reference to the FUSE daemon credentials, referenced at
FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() time so that they can be used to
temporarily raise the user credentials when accessing lower file system
files in passthrough.
The process accessing the FUSE file with passthrough enabled temporarily
receives the privileges of the FUSE daemon while performing read/write
operations. Similar behavior is implemented in overlayfs.
These privileges will be reverted as soon as the IO operation completes.
This feature does not provide any higher security privileges to those
processes accessing the FUSE file system with passthrough enabled. This is
because it is still the FUSE daemon responsible for enabling or not the
passthrough feature at file open time, and should enable the feature only
after appropriate access policy checks.

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/fuse/fuse_i.h      |  5 ++++-
 fs/fuse/passthrough.c | 11 +++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c4730d893324..815af1845b16 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -182,10 +182,13 @@ struct fuse_release_args;
 
 /**
  * Reference to lower filesystem file for read/write operations handled in
- * passthrough mode
+ * passthrough mode.
+ * This struct also tracks the credentials to be used for handling read/write
+ * operations.
  */
 struct fuse_passthrough {
 	struct file *filp;
+	struct cred *cred;
 };
 
 /** FUSE specific file data */
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index da71a74014d7..d81e3960b097 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -52,6 +52,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 				   struct iov_iter *iter)
 {
 	ssize_t ret;
+	const struct cred *old_cred;
 	struct file *fuse_filp = iocb_fuse->ki_filp;
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct file *passthrough_filp = ff->passthrough.filp;
@@ -59,6 +60,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 	if (!iov_iter_count(iter))
 		return 0;
 
+	old_cred = override_creds(ff->passthrough.cred);
 	if (is_sync_kiocb(iocb_fuse)) {
 		ret = vfs_iter_read(passthrough_filp, iter, &iocb_fuse->ki_pos,
 				    iocb_to_rw_flags(iocb_fuse->ki_flags,
@@ -77,6 +79,7 @@ ssize_t fuse_passthrough_read_iter(struct kiocb *iocb_fuse,
 		if (ret != -EIOCBQUEUED)
 			fuse_aio_cleanup_handler(aio_req);
 	}
+	revert_creds(old_cred);
 
 	return ret;
 }
@@ -85,6 +88,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 				    struct iov_iter *iter)
 {
 	ssize_t ret;
+	const struct cred *old_cred;
 	struct file *fuse_filp = iocb_fuse->ki_filp;
 	struct fuse_file *ff = fuse_filp->private_data;
 	struct inode *fuse_inode = file_inode(fuse_filp);
@@ -96,6 +100,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 
 	inode_lock(fuse_inode);
 
+	old_cred = override_creds(ff->passthrough.cred);
 	if (is_sync_kiocb(iocb_fuse)) {
 		file_start_write(passthrough_filp);
 		ret = vfs_iter_write(passthrough_filp, iter, &iocb_fuse->ki_pos,
@@ -124,6 +129,7 @@ ssize_t fuse_passthrough_write_iter(struct kiocb *iocb_fuse,
 			fuse_aio_cleanup_handler(aio_req);
 	}
 out:
+	revert_creds(old_cred);
 	inode_unlock(fuse_inode);
 
 	return ret;
@@ -174,6 +180,7 @@ int fuse_passthrough_open(struct fuse_dev *fud,
 	}
 
 	passthrough->filp = passthrough_filp;
+	passthrough->cred = prepare_creds();
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&fc->passthrough_req_lock);
@@ -225,4 +232,8 @@ void fuse_passthrough_release(struct fuse_passthrough *passthrough)
 		fput(passthrough->filp);
 		passthrough->filp = NULL;
 	}
+	if (passthrough->cred) {
+		put_cred(passthrough->cred);
+		passthrough->cred = NULL;
+	}
 }
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough
  2021-01-18 19:27 ` [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough Alessio Balsini
@ 2021-01-19  6:33   ` Amir Goldstein
  2021-01-19 11:51     ` Alessio Balsini
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2021-01-19  6:33 UTC (permalink / raw)
  To: Alessio Balsini
  Cc: Miklos Szeredi, Akilesh Kailash, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

On Mon, Jan 18, 2021 at 9:28 PM Alessio Balsini <balsini@android.com> wrote:
>
> Expose the FUSE_PASSTHROUGH interface to user space and declare all the
> basic data structures and functions as the skeleton on top of which the
> FUSE passthrough functionality will be built.
>
> As part of this, introduce the new FUSE passthrough ioctl(), which
> allows the FUSE daemon to specify a direct connection between a FUSE
> file and a lower file system file. Such ioctl() requires users pace to
> pass the file descriptor of one of its opened files through the
> fuse_passthrough_out data structure introduced in this patch. This
> structure includes extra fields for possible future extensions.
> Also, add the passthrough functions for the set-up and tear-down of the
> data structures and locks that will be used both when fuse_conns and
> fuse_files are created/deleted.
>
> Signed-off-by: Alessio Balsini <balsini@android.com>
> ---
[...]

> @@ -699,6 +700,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
>         INIT_LIST_HEAD(&fc->bg_queue);
>         INIT_LIST_HEAD(&fc->entry);
>         INIT_LIST_HEAD(&fc->devices);
> +       idr_init(&fc->passthrough_req);
>         atomic_set(&fc->num_waiting, 0);
>         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
>         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> @@ -1052,6 +1054,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>                                 fc->handle_killpriv_v2 = 1;
>                                 fm->sb->s_flags |= SB_NOSEC;
>                         }
> +                       if (arg->flags & FUSE_PASSTHROUGH) {
> +                               fc->passthrough = 1;
> +                               /* Prevent further stacking */
> +                               fm->sb->s_stack_depth =
> +                                       FILESYSTEM_MAX_STACK_DEPTH + 1;
> +                       }

Hi Allesio,

I'm sorry I missed the discussion on v10 patch, but this looks wrong.
First of all, assigning a value above a declared MAX_ is misleading
and setting a trap for someone else to trip in the future.

While this may be just a semantic mistake, the code that checks for
(passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
is just cheating.

fuse_file_{read,write}_iter are stacked operations, no different in any way
than overlayfs and ecryptfs stacked file operations.

Peng Tao mentioned a case of passthrough to overlayfs over ecryptfs [1].
If anyone really thinks this use case is interesting enough (I doubt it), then
they may propose to bump up FILESYSTEM_MAX_STACK_DEPTH to 3,
but not to cheat around the currently defined maximum.

So please set s_max_depth to FILESYSTEM_MAX_STACK_DEPTH and
restore your v10 check of
passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH

Your commit message sounds as if the only purpose of this check is to
prevent stacking of FUSE passthrough on top of each other, but that
is not enough.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/CA+a=Yy6S9spMLr9BqyO1qvU52iAAXU3i9eVtb81SnrzjkCwO5Q@mail.gmail.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write
  2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
                   ` (6 preceding siblings ...)
  2021-01-18 19:27 ` [PATCH RESEND V11 7/7] fuse: Use daemon creds in passthrough mode Alessio Balsini
@ 2021-01-19 11:06 ` Rokudo Yan
  2021-01-19 12:34   ` Alessio Balsini
  7 siblings, 1 reply; 15+ messages in thread
From: Rokudo Yan @ 2021-01-19 11:06 UTC (permalink / raw)
  To: balsini
  Cc: akailash, amir73il, axboe, bergwolf, duostefano93, dvander,
	fuse-devel, gscrivan, jannh, kernel-team, linux-fsdevel,
	linux-kernel, maco, miklos, palmer, paullawrence, trapexit,
	wu-yan, zezeozue

on Mon, Jan 18, 2021 at 5:27 PM Alessio Balsini <balsini@android.com> wrote:
>
> This is the 11th version of the series, rebased on top of v5.11-rc4.
> Please find the changelog at the bottom of this cover letter.
> 
> Add support for file system passthrough read/write of files when enabled
> in userspace through the option FUSE_PASSTHROUGH.
[...]


Hi Allesio,

Could you please add support for passthrough mmap too ?
If the fuse file opened with passthrough actived, and then map (shared) to (another) process
address space using mmap interface. As access the file with mmap will pass the vfs cache of fuse,
but access the file with read/write will bypass the vfs cache of fuse, this may cause inconsistency.
eg. the reader read the fuse file with mmap() and the writer modify the file with write(), the reader
may not see the modification immediately since the writer bypass the vfs cache of fuse.
Actually we have already meet an issue caused by the inconsistency after applying fuse passthrough
scheme to our product.

Thanks,
yanwu.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough
  2021-01-19  6:33   ` Amir Goldstein
@ 2021-01-19 11:51     ` Alessio Balsini
  0 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-19 11:51 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Alessio Balsini, Miklos Szeredi, Akilesh Kailash,
	Antonio SJ Musumeci, David Anderson, Giuseppe Scrivano,
	Jann Horn, Jens Axboe, Martijn Coenen, Palmer Dabbelt,
	Paul Lawrence, Peng Tao, Stefano Duo, Zimuzo Ezeozue, wuyan,
	fuse-devel, kernel-team, linux-fsdevel, linux-kernel

On Tue, Jan 19, 2021 at 08:33:16AM +0200, Amir Goldstein wrote:
> On Mon, Jan 18, 2021 at 9:28 PM Alessio Balsini <balsini@android.com> wrote:
> >
> > Expose the FUSE_PASSTHROUGH interface to user space and declare all the
> > basic data structures and functions as the skeleton on top of which the
> > FUSE passthrough functionality will be built.
> >
> > As part of this, introduce the new FUSE passthrough ioctl(), which
> > allows the FUSE daemon to specify a direct connection between a FUSE
> > file and a lower file system file. Such ioctl() requires users pace to
> > pass the file descriptor of one of its opened files through the
> > fuse_passthrough_out data structure introduced in this patch. This
> > structure includes extra fields for possible future extensions.
> > Also, add the passthrough functions for the set-up and tear-down of the
> > data structures and locks that will be used both when fuse_conns and
> > fuse_files are created/deleted.
> >
> > Signed-off-by: Alessio Balsini <balsini@android.com>
> > ---
> [...]
> 
> > @@ -699,6 +700,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> >         INIT_LIST_HEAD(&fc->bg_queue);
> >         INIT_LIST_HEAD(&fc->entry);
> >         INIT_LIST_HEAD(&fc->devices);
> > +       idr_init(&fc->passthrough_req);
> >         atomic_set(&fc->num_waiting, 0);
> >         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
> >         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> > @@ -1052,6 +1054,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> >                                 fc->handle_killpriv_v2 = 1;
> >                                 fm->sb->s_flags |= SB_NOSEC;
> >                         }
> > +                       if (arg->flags & FUSE_PASSTHROUGH) {
> > +                               fc->passthrough = 1;
> > +                               /* Prevent further stacking */
> > +                               fm->sb->s_stack_depth =
> > +                                       FILESYSTEM_MAX_STACK_DEPTH + 1;
> > +                       }
> 
> Hi Allesio,
> 
> I'm sorry I missed the discussion on v10 patch, but this looks wrong.
> First of all, assigning a value above a declared MAX_ is misleading
> and setting a trap for someone else to trip in the future.
> 
> While this may be just a semantic mistake, the code that checks for
> (passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH)
> is just cheating.
> 
> fuse_file_{read,write}_iter are stacked operations, no different in any way
> than overlayfs and ecryptfs stacked file operations.
> 
> Peng Tao mentioned a case of passthrough to overlayfs over ecryptfs [1].
> If anyone really thinks this use case is interesting enough (I doubt it), then
> they may propose to bump up FILESYSTEM_MAX_STACK_DEPTH to 3,
> but not to cheat around the currently defined maximum.
> 
> So please set s_max_depth to FILESYSTEM_MAX_STACK_DEPTH and
> restore your v10 check of
> passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH
> 
> Your commit message sounds as if the only purpose of this check is to
> prevent stacking of FUSE passthrough on top of each other, but that
> is not enough.
> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-fsdevel/CA+a=Yy6S9spMLr9BqyO1qvU52iAAXU3i9eVtb81SnrzjkCwO5Q@mail.gmail.com/


Hi Amir,

The stacking solution in V10 works for me and, as we agreed last time,
we would still be able to update the stacking policy with FUSE
passthrough as soon as some use cases find it beneficial.

Our use case in Android is somewhat simple and it's difficult for me to
think of how this stacking can be useful or limiting to the different
use cases. It's anyway worth highlighting that if FUSE passthrough is
disabled, as it is by default, the FUSE behavior remains exactly the
same as it was before this series.
For my limited use cases experience here, I have no personal preferences
on the stacking policy I'm just trying to do the right thing based on
the feedback from the community :)

I can change this policy back as this was in V10, but at the same time I
don't want to put extra effort/confusion in the mailing list and to
Miklos with the next patch version. So I'm going to post the diff to
bring back the stacking policy as it was in V10 in reply to "[PATCH
RESEND V11 4/7] fuse: Passthrough initialization and release" and wait
for the community consensus before sending out the V12.

Thanks again for helpful feedback!

Cheers,
Alessio

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release
  2021-01-18 19:27 ` [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release Alessio Balsini
@ 2021-01-19 12:06   ` Alessio Balsini
  0 siblings, 0 replies; 15+ messages in thread
From: Alessio Balsini @ 2021-01-19 12:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Akilesh Kailash, Amir Goldstein, Antonio SJ Musumeci,
	David Anderson, Giuseppe Scrivano, Jann Horn, Jens Axboe,
	Martijn Coenen, Palmer Dabbelt, Paul Lawrence, Peng Tao,
	Stefano Duo, Zimuzo Ezeozue, wuyan, fuse-devel, kernel-team,
	linux-fsdevel, linux-kernel

On Mon, Jan 18, 2021 at 07:27:45PM +0000, Alessio Balsini wrote:
> Implement the FUSE passthrough ioctl() that associates the lower
> (passthrough) file system file with the fuse_file.
> 
> The file descriptor passed to the ioctl() by the FUSE daemon is used to
> access the relative file pointer, that will be copied to the fuse_file
> data structure to consolidate the link between the FUSE and lower file
> system.
> 
> To enable the passthrough mode, user space triggers the
> FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl() and, if the call succeeds,
> receives back an identifier that will be used at open/create response
> time in the fuse_open_out field to associate the FUSE file to the lower
> file system file.
> The value returned by the ioctl() to user space can be:
> - > 0: success, the identifier can be used as part of an open/create
>   reply.
> - < 0: an error occurred.
> The value 0 has been left unused for backward compatibility: the
> fuse_open_out field that is used to pass the passthrough_fh back to the
> kernel uses the same bits that were previously as struct padding,
> zero-initialized in the common libfuse implementation. Removing the 0
> value fixes the ambiguity between the case in which 0 corresponds to a
> real passthrough_fh or a missing implementation, simplifying the user
> space implementation.
> 
> For the passthrough mode to be successfully activated, the lower file
> system file must implement both read_iter and write_iter file
> operations. This extra check avoids special pseudo files to be targeted
> for this feature.
> Passthrough comes with another limitation: if a FUSE file systems
> enables passthrough, this feature is no more available to other FUSE
> file systems stacked on top of it. This check is only performed when
> FUSE passthrough is requested for a specific file and would simply
> prevent the use of FUSE passthrough for that file, not limiting other
> file operations.
> 
> Signed-off-by: Alessio Balsini <balsini@android.com>
> ---
>  fs/fuse/inode.c       |  5 +++
>  fs/fuse/passthrough.c | 87 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index d5c46eafb419..bc327789f25d 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1133,6 +1133,11 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
>  
>  static int free_fuse_passthrough(int id, void *p, void *data)
>  {
> +	struct fuse_passthrough *passthrough = (struct fuse_passthrough *)p;
> +
> +	fuse_passthrough_release(passthrough);
> +	kfree(p);
> +
>  	return 0;
>  }
>  
> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> index 594060c654f8..cf720ca14a45 100644
> --- a/fs/fuse/passthrough.c
> +++ b/fs/fuse/passthrough.c
> @@ -3,19 +3,102 @@
>  #include "fuse_i.h"
>  
>  #include <linux/fuse.h>
> +#include <linux/idr.h>
>  
>  int fuse_passthrough_open(struct fuse_dev *fud,
>  			  struct fuse_passthrough_out *pto)
>  {
> -	return -EINVAL;
> +	int res;
> +	struct file *passthrough_filp;
> +	struct fuse_conn *fc = fud->fc;
> +	struct inode *passthrough_inode;
> +	struct super_block *passthrough_sb;
> +	struct fuse_passthrough *passthrough;
> +
> +	if (!fc->passthrough)
> +		return -EPERM;
> +
> +	/* This field is reserved for future implementation */
> +	if (pto->len != 0)
> +		return -EINVAL;
> +
> +	passthrough_filp = fget(pto->fd);
> +	if (!passthrough_filp) {
> +		pr_err("FUSE: invalid file descriptor for passthrough.\n");
> +		return -EBADF;
> +	}
> +
> +	if (!passthrough_filp->f_op->read_iter ||
> +	    !passthrough_filp->f_op->write_iter) {
> +		pr_err("FUSE: passthrough file misses file operations.\n");
> +		res = -EBADF;
> +		goto err_free_file;
> +	}
> +
> +	passthrough_inode = file_inode(passthrough_filp);
> +	passthrough_sb = passthrough_inode->i_sb;
> +	if (passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
> +		pr_err("FUSE: fs stacking depth exceeded for passthrough\n");
> +		res = -EINVAL;
> +		goto err_free_file;
> +	}
> +
> +	passthrough = kmalloc(sizeof(struct fuse_passthrough), GFP_KERNEL);
> +	if (!passthrough) {
> +		res = -ENOMEM;
> +		goto err_free_file;
> +	}
> +
> +	passthrough->filp = passthrough_filp;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&fc->passthrough_req_lock);
> +	res = idr_alloc(&fc->passthrough_req, passthrough, 1, 0, GFP_ATOMIC);
> +	spin_unlock(&fc->passthrough_req_lock);
> +	idr_preload_end();
> +
> +	if (res > 0)
> +		return res;
> +
> +	fuse_passthrough_release(passthrough);
> +	kfree(passthrough);
> +
> +err_free_file:
> +	fput(passthrough_filp);
> +
> +	return res;
>  }
>  
>  int fuse_passthrough_setup(struct fuse_conn *fc, struct fuse_file *ff,
>  			   struct fuse_open_out *openarg)
>  {
> -	return -EINVAL;
> +	struct fuse_passthrough *passthrough;
> +	int passthrough_fh = openarg->passthrough_fh;
> +
> +	if (!fc->passthrough)
> +		return -EPERM;
> +
> +	/* Default case, passthrough is not requested */
> +	if (passthrough_fh <= 0)
> +		return -EINVAL;
> +
> +	spin_lock(&fc->passthrough_req_lock);
> +	passthrough = idr_remove(&fc->passthrough_req, passthrough_fh);
> +	spin_unlock(&fc->passthrough_req_lock);
> +
> +	if (!passthrough)
> +		return -EINVAL;
> +
> +	ff->passthrough = *passthrough;
> +	kfree(passthrough);
> +
> +	return 0;
>  }
>  
>  void fuse_passthrough_release(struct fuse_passthrough *passthrough)
>  {
> +	if (passthrough->filp) {
> +		fput(passthrough->filp);
> +		passthrough->filp = NULL;
> +	}
>  }
> -- 
> 2.30.0.284.gd98b1dd5eaa7-goog
> 

Hi,

As Amir was noticing, the stacking policy proposed in this series (as
opposed to V10) is not enough to ensure that the same file undergoes
multiple FUSE passthrough paths, moreover, checking for:

  passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH

looks misleading and hacky.
The simplest solution at this point in time would be to just go back to
the policy introduced in V10 and, if for some use use cases FUSE
passthrough is desirable in systems where complex stackings are
involved, the stacking policy can be revisited.

Before sending out the V12 of this series, I would love to have the
consensus both from the community and from Miklos on what is the best
way to go.

Here follows a simple diff that restores the policy as in V10.

Thanks,
Alessio

---8<---

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bc327789f25d..7ebc398fbacb 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1058,7 +1058,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
                                fc->passthrough = 1;
                                /* Prevent further stacking */
                                fm->sb->s_stack_depth =
-                                       FILESYSTEM_MAX_STACK_DEPTH + 1;
+                                       FILESYSTEM_MAX_STACK_DEPTH;
                        }
                } else {
                        ra_pages = fc->max_read / PAGE_SIZE;
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index cf720ca14a45..cf993e83803e 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -37,7 +37,7 @@ int fuse_passthrough_open(struct fuse_dev *fud,

        passthrough_inode = file_inode(passthrough_filp);
        passthrough_sb = passthrough_inode->i_sb;
-       if (passthrough_sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+       if (passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH) {
                pr_err("FUSE: fs stacking depth exceeded for passthrough\n");
                res = -EINVAL;
                goto err_free_file;

--->8---

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write
  2021-01-19 11:06 ` [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Rokudo Yan
@ 2021-01-19 12:34   ` Alessio Balsini
  2021-01-22 11:06     ` Alessio Balsini
  0 siblings, 1 reply; 15+ messages in thread
From: Alessio Balsini @ 2021-01-19 12:34 UTC (permalink / raw)
  To: Rokudo Yan
  Cc: balsini, akailash, amir73il, axboe, bergwolf, duostefano93,
	dvander, fuse-devel, gscrivan, jannh, kernel-team, linux-fsdevel,
	linux-kernel, maco, miklos, palmer, paullawrence, trapexit,
	zezeozue

On Tue, Jan 19, 2021 at 07:06:54PM +0800, Rokudo Yan wrote:
> on Mon, Jan 18, 2021 at 5:27 PM Alessio Balsini <balsini@android.com> wrote:
> >
> > This is the 11th version of the series, rebased on top of v5.11-rc4.
> > Please find the changelog at the bottom of this cover letter.
> > 
> > Add support for file system passthrough read/write of files when enabled
> > in userspace through the option FUSE_PASSTHROUGH.
> [...]
> 
> 
> Hi Allesio,
> 
> Could you please add support for passthrough mmap too ?
> If the fuse file opened with passthrough actived, and then map (shared) to (another) process
> address space using mmap interface. As access the file with mmap will pass the vfs cache of fuse,
> but access the file with read/write will bypass the vfs cache of fuse, this may cause inconsistency.
> eg. the reader read the fuse file with mmap() and the writer modify the file with write(), the reader
> may not see the modification immediately since the writer bypass the vfs cache of fuse.
> Actually we have already meet an issue caused by the inconsistency after applying fuse passthrough
> scheme to our product.
> 
> Thanks,
> yanwu.

Hi yanwu,

Thank you for your interest in this change.

FUSE passthrough for mmap is an extension that is already in my TODO
list, together with passthrough for directories.
For now I would prefer to keep this series minimal to make the review
process leaner and simpler.
I will start working on extending this series with new features and
addressing more corner cases as soon as these changes get merged, what
do you think?

Thanks,
Alessio

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write
  2021-01-19 12:34   ` Alessio Balsini
@ 2021-01-22 11:06     ` Alessio Balsini
  2021-01-25  2:39       ` wu-yan
  0 siblings, 1 reply; 15+ messages in thread
From: Alessio Balsini @ 2021-01-22 11:06 UTC (permalink / raw)
  To: Rokudo Yan
  Cc: balsini, akailash, amir73il, axboe, bergwolf, duostefano93,
	dvander, fuse-devel, gscrivan, jannh, kernel-team, linux-fsdevel,
	linux-kernel, maco, miklos, palmer, paullawrence, trapexit,
	zezeozue

On Tue, Jan 19, 2021 at 12:34:23PM +0000, Alessio Balsini wrote:
> On Tue, Jan 19, 2021 at 07:06:54PM +0800, Rokudo Yan wrote:
> > on Mon, Jan 18, 2021 at 5:27 PM Alessio Balsini <balsini@android.com> wrote:
> > >
> > > This is the 11th version of the series, rebased on top of v5.11-rc4.
> > > Please find the changelog at the bottom of this cover letter.
> > > 
> > > Add support for file system passthrough read/write of files when enabled
> > > in userspace through the option FUSE_PASSTHROUGH.
> > [...]
> > 
> > 
> > Hi Allesio,
> > 
> > Could you please add support for passthrough mmap too ?
> > If the fuse file opened with passthrough actived, and then map (shared) to (another) process
> > address space using mmap interface. As access the file with mmap will pass the vfs cache of fuse,
> > but access the file with read/write will bypass the vfs cache of fuse, this may cause inconsistency.
> > eg. the reader read the fuse file with mmap() and the writer modify the file with write(), the reader
> > may not see the modification immediately since the writer bypass the vfs cache of fuse.
> > Actually we have already meet an issue caused by the inconsistency after applying fuse passthrough
> > scheme to our product.
> > 
> > Thanks,
> > yanwu.
> 
> Hi yanwu,
> 
> Thank you for your interest in this change.
> 
> FUSE passthrough for mmap is an extension that is already in my TODO
> list, together with passthrough for directories.
> For now I would prefer to keep this series minimal to make the review
> process leaner and simpler.
> I will start working on extending this series with new features and
> addressing more corner cases as soon as these changes get merged, what
> do you think?
> 
> Thanks,
> Alessio

Hi yanwu,

Sorry if I overlooked this issue. I added memory-mapping to my tests and
could reproduce/verify this wrong behavior you mentioned.

I created this WIP (history may change) branch that has the missing mmap
implementation:

  https://github.com/balsini/linux/commits/fuse-passthrough-v12-develop-v5.11-rc4

I did some mmap testing in the last days with this extra mmap
implementation and couldn't find any issues, everything seems to be
working as expected with the extra mmap patch. Can you please confirm
this is fixed on your end too?
I'm also going to revert in this branch the stacking policy changes to
how they were in V10 as suggested by Amir if there are no concerns with
that.
I'm waiting for some extra tests to complete and, if no issue is
detected, I'll post the V12 series super soon.

Thanks,
Alessio


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write
  2021-01-22 11:06     ` Alessio Balsini
@ 2021-01-25  2:39       ` wu-yan
  0 siblings, 0 replies; 15+ messages in thread
From: wu-yan @ 2021-01-25  2:39 UTC (permalink / raw)
  To: Alessio Balsini
  Cc: akailash, amir73il, axboe, bergwolf, duostefano93, dvander,
	fuse-devel, gscrivan, jannh, kernel-team, linux-fsdevel,
	linux-kernel, maco, miklos, palmer, paullawrence, trapexit,
	zezeozue

On 1/22/21 7:06 PM, Alessio Balsini wrote:
> On Tue, Jan 19, 2021 at 12:34:23PM +0000, Alessio Balsini wrote:
>> On Tue, Jan 19, 2021 at 07:06:54PM +0800, Rokudo Yan wrote:
>>> on Mon, Jan 18, 2021 at 5:27 PM Alessio Balsini <balsini@android.com> wrote:
>>>>
>>>> This is the 11th version of the series, rebased on top of v5.11-rc4.
>>>> Please find the changelog at the bottom of this cover letter.
>>>>
>>>> Add support for file system passthrough read/write of files when enabled
>>>> in userspace through the option FUSE_PASSTHROUGH.
>>> [...]
>>>
>>>
>>> Hi Allesio,
>>>
>>> Could you please add support for passthrough mmap too ?
>>> If the fuse file opened with passthrough actived, and then map (shared) to (another) process
>>> address space using mmap interface. As access the file with mmap will pass the vfs cache of fuse,
>>> but access the file with read/write will bypass the vfs cache of fuse, this may cause inconsistency.
>>> eg. the reader read the fuse file with mmap() and the writer modify the file with write(), the reader
>>> may not see the modification immediately since the writer bypass the vfs cache of fuse.
>>> Actually we have already meet an issue caused by the inconsistency after applying fuse passthrough
>>> scheme to our product.
>>>
>>> Thanks,
>>> yanwu.
>>
>> Hi yanwu,
>>
>> Thank you for your interest in this change.
>>
>> FUSE passthrough for mmap is an extension that is already in my TODO
>> list, together with passthrough for directories.
>> For now I would prefer to keep this series minimal to make the review
>> process leaner and simpler.
>> I will start working on extending this series with new features and
>> addressing more corner cases as soon as these changes get merged, what
>> do you think?
>>
>> Thanks,
>> Alessio
> 
> Hi yanwu,
> 
> Sorry if I overlooked this issue. I added memory-mapping to my tests and
> could reproduce/verify this wrong behavior you mentioned.
> 
> I created this WIP (history may change) branch that has the missing mmap
> implementation:
> 
>    https://github.com/balsini/linux/commits/fuse-passthrough-v12-develop-v5.11-rc4
> 
> I did some mmap testing in the last days with this extra mmap
> implementation and couldn't find any issues, everything seems to be
> working as expected with the extra mmap patch. Can you please confirm
> this is fixed on your end too?
> I'm also going to revert in this branch the stacking policy changes to
> how they were in V10 as suggested by Amir if there are no concerns with
> that.
> I'm waiting for some extra tests to complete and, if no issue is
> detected, I'll post the V12 series super soon.
> 
> Thanks,
> Alessio
> 

Hi, Alessio

Thank you for your reply. I have already added mmap for passthrough in 
our product, and the issue mentioned before is fixed. And I will update 
if any new issuses found.

Thanks
yanwu

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-01-25  2:41 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-18 19:27 [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 1/7] fs: Generic function to convert iocb to rw flags Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 2/7] fuse: 32-bit user space ioctl compat for fuse device Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 3/7] fuse: Definitions and ioctl for passthrough Alessio Balsini
2021-01-19  6:33   ` Amir Goldstein
2021-01-19 11:51     ` Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 4/7] fuse: Passthrough initialization and release Alessio Balsini
2021-01-19 12:06   ` Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 5/7] fuse: Introduce synchronous read and write for passthrough Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 6/7] fuse: Handle asynchronous read and write in passthrough Alessio Balsini
2021-01-18 19:27 ` [PATCH RESEND V11 7/7] fuse: Use daemon creds in passthrough mode Alessio Balsini
2021-01-19 11:06 ` [PATCH RESEND V11 0/7] fuse: Add support for passthrough read/write Rokudo Yan
2021-01-19 12:34   ` Alessio Balsini
2021-01-22 11:06     ` Alessio Balsini
2021-01-25  2:39       ` wu-yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).