qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs
@ 2021-09-16  8:40 Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open Hanna Reitz
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

Hi,

v1 cover letter for an overview:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

v2 cover letter:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00074.html

v3 cover letter:
https://listman.redhat.com/archives/virtio-fs/2021-July/msg00050.html


Here in v4, there are two main changes:

First, for turning a mount ID into a mount FD, we no longer use the
first inode that’s looked up on the respective mount, but instead look
into /proc/self/mountinfo to get the mount’s root directory and open
that.  Wel also refcount mount FDs now.

In theory, the advantage is that before the mount root directory can be
deleted, all files on the mount must be closed (and deleted), so the
refcounting will lead to the mount FD being closed.  Then, deleting the
mount root would actually result in the inode being deleted, and so with
inotify, the guest could receive a notification about it.

In practice, this is still a mount root (on the host), so it cannot be
deleted (rmdir returns EBUSY).  But that’s effectively just as well,
because this way we open a mount FD that cannot be deleted, and so, we
won’t have to worry about missing guest notifications.

Note that unmounting the virtiofs mount in the guest will lead to all
mount FDs correctly being closed, so the refcounting does work.


Second, I’ve renamed the TempFd objects to reflect what kind of FDs they
contain; i.e. they’re no longer called “inode_fd” or “dir_fd”, but
“path_fd”, “rw_fd”, or “dir_path_fd” instead.
p


Minor changes:
- -o inode_file_handles now auto-adds +dac_read_search to modcaps
- Some fixes


(Quick note: I’ll be on PTO from Sep 20 to Oct 3, so I’ll only be able
to respond to potential reviews after then.)


Exact changes per patch:

Patch 1:
- Added, we want to use the respective mount point as a mount FD, and we
  have to read /proc/self/mountinfo to translate mount IDs to mount
  points

Patch 2:
- Moved the FCHDIR_NOFAIL(lo->root.fd) (changing the CWD back to what it
  was) into the cleanup path, and added a bool `changed_cwd` to keep
  track of when we need to invoke it
- Dropped `open_inode` (`changed_cwd` kind of makes it obsolete)

Patch 3:
- Added temp_fd_copy() to (shallow-)copy a TempFd

Patch 4:
- Forgot an lo_inode_put() in lo_opendir()’s successful return path
- lo_setxattr(): Moved procname[] into the block where it’s used (now
  that it’s generated and used within a single `else` block)

Patch 5:
- Give TempFds a name that reflects what kind of FDs they are, e.g.
  `path_fd` instead of `inode_fd`
- lo_link(): Don’t overwrite errno for the error path, but store such
  values in saverr instead and add a new error path label below the
  existing `saverr = errno;`
- xattr functions: Moved TempFds into local blocks where they are
  actually used (to generate a filename in /proc/self/fd)
  (with the renaming of `inode_fd` to `path_fd`, we are going to need
  distinct variables for both conditional blocks here – one with an
  O_RDONLY FD (`rd_fd`), and one with an O_PATH FD (`path_fd`))

Patch 6:
- Like in patch 5, give TempFds a name to reflect what they are

Patch 7:
- Like in patches 5 and 6, give TempFds a name to reflect what they are
- `lo_setattr` thus has two TempFds now, which sometimes are the same FD
  (and we use `temp_fd_copy()` to copy one into the other)
- lo_opendir(): Close the FD if `fdopendir()` failed

Patch 8:
- Added: We want the mount FD collection to be in lo_data instead of in
  a static global variable, so we need to pass lo_data wherever we might
  need a mount FD

Patch 9:
- Put mount_fds and mount_fds_lock into lo_data
- Have mount_fds values be lo_mount_fd objects (FD + refcount) instead
  of just pure FDs
- Refcount mount FDs (so releasing a file handle now has to go through a
  dedicated function)
- Having a file handle and an O_PATH FD in an lo_inode object is no
  longer mutually exclusive

Patch 10:
- Destroying the inodes_by_handle hash map should depend on whether
  inodes_by_handle is non-NULL, not whether inodes_by_ids is non-NULL
  (i.e. fixed a code typo)

Patch 11:
- Auto-add CAP_DAC_READ_SEARCH with -o inode_file_handles instead of
  requiring the user to do so
- get_file_handle(): Use /proc/self/mountinfo to get a mount’s root
  point and open that as the mount FD instead of using the first inode
  that’s looked up on that mount
- Refcount mount FDs
- Have get_file_handle() return a file handle even if creating a
  corresponding mount FD failed, and so the file handle will not be
  open-able; we can still use this file handle for lookups

Patch 12:
- Rebase conflicts


git-backport-diff against v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/12:[down] 'virtiofsd: Keep /proc/self/mountinfo open'
002/12:[0012] [FC] 'virtiofsd: Limit setxattr()'s creds-dropped region'
003/12:[0014] [FC] 'virtiofsd: Add TempFd structure'
004/12:[0004] [FC] 'virtiofsd: Use lo_inode_open() instead of openat()'
005/12:[0109] [FC] 'virtiofsd: Add lo_inode_fd() helper'
006/12:[0030] [FC] 'virtiofsd: Let lo_fd() return a TempFd'
007/12:[0076] [FC] 'virtiofsd: Let lo_inode_open() return a TempFd'
008/12:[down] 'virtiofsd: Pass lo_data to lo_inode_{fd,open}()'
009/12:[0111] [FC] 'virtiofsd: Add lo_inode.fhandle'
010/12:[0002] [FC] 'virtiofsd: Add inodes_by_handle hash table'
011/12:[0235] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
012/12:[0006] [FC] 'virtiofsd: Add lazy lo_do_find()'


Hanna Reitz (12):
  virtiofsd: Keep /proc/self/mountinfo open
  virtiofsd: Limit setxattr()'s creds-dropped region
  virtiofsd: Add TempFd structure
  virtiofsd: Use lo_inode_open() instead of openat()
  virtiofsd: Add lo_inode_fd() helper
  virtiofsd: Let lo_fd() return a TempFd
  virtiofsd: Let lo_inode_open() return a TempFd
  virtiofsd: Pass lo_data to lo_inode_{fd,open}()
  virtiofsd: Add lo_inode.fhandle
  virtiofsd: Add inodes_by_handle hash table
  virtiofsd: Optionally fill lo_inode.fhandle
  virtiofsd: Add lazy lo_do_find()

 tools/virtiofsd/helper.c              |    3 +
 tools/virtiofsd/passthrough_ll.c      | 1102 +++++++++++++++++++++----
 tools/virtiofsd/passthrough_seccomp.c |    2 +
 3 files changed, 951 insertions(+), 156 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-18 17:07   ` Vivek Goyal
  2021-09-16  8:40 ` [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region Hanna Reitz
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

File handles are specific to mounts, and so name_to_handle_at() returns
the respective mount ID.  However, open_by_handle_at() is not content
with an ID, it wants a file descriptor for some inode on the mount,
which we have to open.

We want to use /proc/self/mountinfo to find the mounts' root directories
so we can open them and pass the respective FDs to open_by_handle_at().
(We need to use the root directory, because we want the inode belonging
to every mount FD be deletable.  Before the root directory can be
deleted, all entries within must have been closed, and so when it is
deleted, there should not be any file handles left that need its FD as
their mount FD.  Thus, we can then close that FD and the inode can be
deleted.[1])

That is why we need to open /proc/self/mountinfo so that we can use it
to translate mount IDs into root directory paths.  We have to open it
after setup_mounts() was called, because if we try to open it before, it
will appear as an empty file after setup_mounts().

[1] Note that in practice, you still cannot delete the mount root
directory.  It is a mount point on the host, after all, and mount points
cannot be deleted.  But by using the mount point as the mount FD, we
will at least not hog any actually deletable inodes.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 40 ++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 38b2af8599..6511a6acb4 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -172,6 +172,8 @@ struct lo_data {
 
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
+    /* A read-only FILE pointer for /proc/self/mountinfo */
+    FILE *mountinfo_fp;
     int user_killpriv_v2, killpriv_v2;
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
@@ -3718,6 +3720,19 @@ static void setup_chroot(struct lo_data *lo)
 static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
                           bool enable_syslog)
 {
+    int proc_self, mountinfo_fd;
+    int saverr;
+
+    /*
+     * Open /proc/self before we pivot to the new root so we can still
+     * open /proc/self/mountinfo afterwards
+     */
+    proc_self = open("/proc/self", O_PATH);
+    if (proc_self < 0) {
+        fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self: %m; "
+                 "will not be able to use file handles\n");
+    }
+
     if (lo->sandbox == SANDBOX_NAMESPACE) {
         setup_namespaces(lo, se);
         setup_mounts(lo->source);
@@ -3725,6 +3740,31 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
         setup_chroot(lo);
     }
 
+    /*
+     * Opening /proc/self/mountinfo before the umount2() call in
+     * setup_mounts() leads to the file appearing empty.  That is why
+     * we defer opening it until here.
+     */
+    lo->mountinfo_fp = NULL;
+    if (proc_self >= 0) {
+        mountinfo_fd = openat(proc_self, "mountinfo", O_RDONLY);
+        if (mountinfo_fd < 0) {
+            saverr = errno;
+        } else if (mountinfo_fd >= 0) {
+            lo->mountinfo_fp = fdopen(mountinfo_fd, "r");
+            if (!lo->mountinfo_fp) {
+                saverr = errno;
+                close(mountinfo_fd);
+            }
+        }
+        if (!lo->mountinfo_fp) {
+            fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self/mountinfo: "
+                     "%s; will not be able to use file handles\n",
+                     strerror(saverr));
+        }
+        close(proc_self);
+    }
+
     setup_seccomp(enable_syslog);
     setup_capabilities(g_strdup(lo->modcaps));
 }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-18 17:20   ` Vivek Goyal
  2021-09-16  8:40 ` [PATCH v4 03/12] virtiofsd: Add TempFd structure Hanna Reitz
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

We only need to drop/switch our credentials for the (f)setxattr() call
alone, not for the openat() or fchdir() around it.

(Right now, this may not be that big of a problem, but with inodes being
identified by file handles instead of an O_PATH fd, we will need
open_by_handle_at() calls here, which is really fickle when it comes to
credentials being dropped.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 6511a6acb4..b43afdfbd3 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3123,6 +3123,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     bool switched_creds = false;
     bool cap_fsetid_dropped = false;
     struct lo_cred old = {};
+    bool changed_cwd = false;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3158,6 +3159,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
     sprintf(procname, "%i", inode->fd);
+    /*
+     * We can only open regular files or directories.  If the inode is
+     * something else, we have to enter /proc/self/fd and use
+     * setxattr() on the link's filename there.
+     */
+    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
+        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (fd < 0) {
+            saverr = errno;
+            goto out;
+        }
+    } else {
+        /* fchdir should not fail here */
+        FCHDIR_NOFAIL(lo->proc_self_fd);
+        /* Set flag so the clean-up path will chdir back */
+        changed_cwd = true;
+    }
+
     /*
      * If we are setting posix access acl and if SGID needs to be
      * cleared, then switch to caller's gid and drop CAP_FSETID
@@ -3178,20 +3197,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         switched_creds = true;
     }
-    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
-        if (fd < 0) {
-            saverr = errno;
-            goto out;
-        }
+    if (fd >= 0) {
         ret = fsetxattr(fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
-        /* fchdir should not fail here */
-        FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
-        FCHDIR_NOFAIL(lo->root.fd);
     }
     if (switched_creds) {
         if (cap_fsetid_dropped)
@@ -3201,6 +3212,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     }
 
 out:
+    if (changed_cwd) {
+        /* Change CWD back, fchdir should not fail here */
+        FCHDIR_NOFAIL(lo->root.fd);
+    }
+
     if (fd >= 0) {
         close(fd);
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 03/12] virtiofsd: Add TempFd structure
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 04/12] virtiofsd: Use lo_inode_open() instead of openat() Hanna Reitz
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

We are planning to add file handles to lo_inode objects as an
alternative to lo_inode.fd.  That means that everywhere where we
currently reference lo_inode.fd, we will have to open a temporary file
descriptor that needs to be closed after use.

So instead of directly accessing lo_inode.fd, there will be a helper
function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
file descriptor with open_by_handle_at().  It encapsulates this result
in a TempFd structure to let the caller know whether the FD needs to be
closed after use (opened from the handle) or not (copied from
lo_inode.fd).

By using g_auto(TempFd) to store this result, callers will not even have
to care about closing a temporary FD after use.  It will be done
automatically once the object goes out of scope.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 63 ++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b43afdfbd3..88318a4ba9 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -180,6 +180,28 @@ struct lo_data {
     int user_posix_acl, posix_acl;
 };
 
+/**
+ * Represents a file descriptor that may either be owned by this
+ * TempFd, or only referenced (i.e. the ownership belongs to some
+ * other object, and the value has just been copied into this TempFd).
+ *
+ * The purpose of this encapsulation is to be used as g_auto(TempFd)
+ * to automatically clean up owned file descriptors when this object
+ * goes out of scope.
+ *
+ * Use temp_fd_steal() to get an owned file descriptor that will not
+ * be closed when the TempFd goes out of scope.
+ */
+typedef struct {
+    int fd;
+    bool owned; /* fd owned by this object? */
+} TempFd;
+
+#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
+
+static void temp_fd_clear(TempFd *temp_fd);
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
+
 static const struct fuse_opt lo_opts[] = {
     { "sandbox=namespace",
       offsetof(struct lo_data, sandbox),
@@ -257,6 +279,47 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/**
+ * Clean-up function for TempFds
+ */
+static void temp_fd_clear(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        close(temp_fd->fd);
+        *temp_fd = TEMP_FD_INIT;
+    }
+}
+
+/**
+ * Return an owned fd from *temp_fd that will not be closed when
+ * *temp_fd goes out of scope.
+ *
+ * (TODO: Remove __attribute__ once this is used.)
+ */
+static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        temp_fd->owned = false;
+        return temp_fd->fd;
+    } else {
+        return dup(temp_fd->fd);
+    }
+}
+
+/**
+ * Create a borrowing copy of an existing TempFd.  Note that *to is
+ * only valid as long as *from is valid.
+ *
+ * (TODO: Remove __attribute__ once this is used.)
+ */
+static __attribute__((unused)) void temp_fd_copy(const TempFd *from, TempFd *to)
+{
+    *to = (TempFd) {
+        .fd = from->fd,
+        .owned = false,
+    };
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 04/12] virtiofsd: Use lo_inode_open() instead of openat()
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (2 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 03/12] virtiofsd: Add TempFd structure Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 05/12] virtiofsd: Add lo_inode_fd() helper Hanna Reitz
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
with the flags they need through /proc/self/fd.

Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
/proc/self/fd trick, it just uses openat(fd, "."), because the FD is
guaranteed to be a directory, so this works.

All cases have one problem in common, though: In the future, when we may
have a file handle in the lo_inode instead of an FD, querying an
lo_inode FD may incur an open_by_handle_at() call.  It does not make
sense to then reopen that FD with custom flags, those should have been
passed to open_by_handle_at() instead.

Use lo_inode_open() instead of openat().  As part of the file handle
change, lo_inode_open() will be made to invoke openat() only if
lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
with the right flags from the start.

Consequently, after this patch, lo_inode_open() is the only place to
invoke openat() to reopen an existing FD with different flags.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 47 ++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 88318a4ba9..d76283d080 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1745,18 +1745,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 {
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_inode *inode;
+    struct lo_dirp *d = NULL;
     int fd;
     ssize_t fh;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        error = EBADF;
+        goto out_err;
+    }
+
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
         goto out_err;
     }
 
-    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-    if (fd == -1) {
-        goto out_errno;
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        error = -fd;
+        goto out_err;
     }
 
     d->dp = fdopendir(fd);
@@ -1780,11 +1788,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         fi->cache_readdir = 1;
     }
     fuse_reply_open(req, fi);
+    lo_inode_put(lo, &inode);
     return;
 
 out_errno:
     error = errno;
 out_err:
+    lo_inode_put(lo, &inode);
     if (d) {
         if (d->dp) {
             closedir(d->dp);
@@ -2989,7 +2999,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * It is not safe to open() non-regular/non-dir files in file server
      * unless O_PATH is used, so use that method for regular files/dir
@@ -2997,13 +3006,15 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = fgetxattr(fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -3070,15 +3081,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = flistxattr(fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3175,7 +3187,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags,
                         uint32_t extra_flags)
 {
-    char procname[64];
     const char *name;
     char *mapped_name;
     struct lo_data *lo = lo_data(req);
@@ -3221,16 +3232,15 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * We can only open regular files or directories.  If the inode is
      * something else, we have to enter /proc/self/fd and use
      * setxattr() on the link's filename there.
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
     } else {
@@ -3264,6 +3274,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         ret = fsetxattr(fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
+        char procname[64];
+
+        sprintf(procname, "%i", inode->fd);
         ret = setxattr(procname, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     }
@@ -3333,16 +3346,16 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
              name);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fremovexattr(fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 05/12] virtiofsd: Add lo_inode_fd() helper
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (3 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 04/12] virtiofsd: Use lo_inode_open() instead of openat() Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 06/12] virtiofsd: Let lo_fd() return a TempFd Hanna Reitz
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

Once we let lo_inode.fd be optional, we will need its users to open the
file handle stored in lo_inode instead.  This function will do that.

For now, it just returns lo_inode.fd, though.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 159 +++++++++++++++++++++++++------
 1 file changed, 132 insertions(+), 27 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index d76283d080..c5baa752e4 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -651,6 +651,16 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
+static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
+{
+    *tfd = (TempFd) {
+        .fd = inode->fd,
+        .owned = false,
+    };
+
+    return 0;
+}
+
 /*
  * TODO Remove this helper and force callers to hold an inode refcount until
  * they are done with the fd.  This will be done in a later patch to make
@@ -838,11 +848,11 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
     int saverr;
     char procname[64];
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
-    int ifd;
     int res;
     int fd = -1;
 
@@ -852,7 +862,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    ifd = inode->fd;
+    res = lo_inode_fd(inode, &path_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out_err;
+    }
 
     /* If fi->fh is invalid we'll report EBADF later */
     if (fi) {
@@ -863,7 +877,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = fchmod(fd, attr->st_mode);
         } else {
-            sprintf(procname, "%i", ifd);
+            sprintf(procname, "%i", path_fd.fd);
             res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
         }
         if (res == -1) {
@@ -875,12 +889,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         uid_t uid = (valid & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)-1;
         gid_t gid = (valid & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)-1;
 
-        saverr = drop_security_capability(lo, ifd);
+        saverr = drop_security_capability(lo, path_fd.fd);
         if (saverr) {
             goto out_err;
         }
 
-        res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        res = fchownat(path_fd.fd, "", uid, gid,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
         if (res == -1) {
             saverr = errno;
             goto out_err;
@@ -959,7 +974,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = futimens(fd, tv);
         } else {
-            sprintf(procname, "%i", inode->fd);
+            sprintf(procname, "%i", path_fd.fd);
             res = utimensat(lo->proc_self_fd, procname, tv, 0);
         }
         if (res == -1) {
@@ -1074,7 +1089,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e,
                         struct lo_inode **inodep)
 {
-    int newfd;
+    g_auto(TempFd) dir_path_fd = TEMP_FD_INIT;
+    int newfd = -1;
     int res;
     int saverr;
     uint64_t mnt_id;
@@ -1104,7 +1120,13 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
+    res = lo_inode_fd(dir, &dir_path_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
+    newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
     }
@@ -1171,6 +1193,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
 out_err:
     saverr = errno;
+out:
     if (newfd != -1) {
         close(newfd);
     }
@@ -1328,6 +1351,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
                              const char *name, mode_t mode, dev_t rdev,
                              const char *link)
 {
+    g_auto(TempFd) dir_path_fd = TEMP_FD_INIT;
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
@@ -1351,12 +1375,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         return;
     }
 
+    res = lo_inode_fd(dir, &dir_path_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
     saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
     if (saverr) {
         goto out;
     }
 
-    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+    res = mknod_wrapper(dir_path_fd.fd, name, link, mode, rdev);
 
     saverr = errno;
 
@@ -1404,6 +1434,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
                     const char *name)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
+    g_auto(TempFd) parent_path_fd = TEMP_FD_INIT;
     int res;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1425,22 +1457,35 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
     parent_inode = lo_inode(req, parent);
     inode = lo_inode(req, ino);
     if (!parent_inode || !inode) {
-        errno = EBADF;
-        goto out_err;
+        saverr = EBADF;
+        goto out;
+    }
+
+    res = lo_inode_fd(inode, &path_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
+    res = lo_inode_fd(parent_inode, &parent_path_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
     }
 
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    sprintf(procname, "%i", inode->fd);
-    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
+    sprintf(procname, "%i", path_fd.fd);
+    res = linkat(lo->proc_self_fd, procname, parent_path_fd.fd, name,
                  AT_SYMLINK_FOLLOW);
     if (res == -1) {
         goto out_err;
     }
 
-    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = fstatat(path_fd.fd, "", &e.attr,
+                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         goto out_err;
     }
@@ -1460,6 +1505,7 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
 
 out_err:
     saverr = errno;
+out:
     lo_inode_put(lo, &parent_inode);
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, saverr);
@@ -1469,23 +1515,34 @@ out_err:
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
+    g_auto(TempFd) dir_path_fd = TEMP_FD_INIT;
     int res;
     uint64_t mnt_id;
     struct stat attr;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_inode *inode = NULL;
 
     if (!dir) {
-        return NULL;
+        goto out;
     }
 
-    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
-    lo_inode_put(lo, &dir);
+    res = lo_inode_fd(dir, &dir_path_fd);
+    if (res < 0) {
+        goto out;
+    }
+
+    res = do_statx(lo, dir_path_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW,
+                   &mnt_id);
     if (res == -1) {
-        return NULL;
+        goto out;
     }
 
-    return lo_find(lo, &attr, mnt_id);
+    inode = lo_find(lo, &attr, mnt_id);
+
+out:
+    lo_inode_put(lo, &dir);
+    return inode;
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1521,6 +1578,8 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       fuse_ino_t newparent, const char *newname,
                       unsigned int flags)
 {
+    g_auto(TempFd) parent_path_fd = TEMP_FD_INIT;
+    g_auto(TempFd) newparent_path_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *parent_inode;
     struct lo_inode *newparent_inode;
@@ -1553,12 +1612,24 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
+    res = lo_inode_fd(parent_inode, &parent_path_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
+    res = lo_inode_fd(newparent_inode, &newparent_path_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
     if (flags) {
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
 #else
-        res = syscall(SYS_renameat2, parent_inode->fd, name,
-                        newparent_inode->fd, newname, flags);
+        res = syscall(SYS_renameat2, parent_path_fd.fd, name,
+                        newparent_path_fd.fd, newname, flags);
         if (res == -1 && errno == ENOSYS) {
             fuse_reply_err(req, EINVAL);
         } else {
@@ -1568,7 +1639,7 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = renameat(parent_inode->fd, name, newparent_inode->fd, newname);
+    res = renameat(parent_path_fd.fd, name, newparent_path_fd.fd, newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 out:
@@ -2054,6 +2125,7 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) parent_path_fd = TEMP_FD_INIT;
     int fd = -1;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -2076,6 +2148,12 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    err = lo_inode_fd(parent_inode, &parent_path_fd);
+    if (err < 0) {
+        err = -err;
+        goto out;
+    }
+
     err = lo_change_cred(req, &old, lo->change_umask);
     if (err) {
         goto out;
@@ -2084,7 +2162,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
     /* Try to create a new file but don't open existing files */
-    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+    fd = openat(parent_path_fd.fd, name, fi->flags | O_CREAT | O_EXCL, mode);
     err = fd == -1 ? errno : 0;
 
     lo_restore_cred(&old, lo->change_umask);
@@ -3014,7 +3092,14 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         ret = fgetxattr(fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_fd(inode, &path_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", path_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -3090,7 +3175,14 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         ret = flistxattr(fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_fd(inode, &path_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", path_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3187,6 +3279,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags,
                         uint32_t extra_flags)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
     const char *name;
     char *mapped_name;
     struct lo_data *lo = lo_data(req);
@@ -3244,6 +3337,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
             goto out;
         }
     } else {
+        ret = lo_inode_fd(inode, &path_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         /* Set flag so the clean-up path will chdir back */
@@ -3276,7 +3374,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     } else {
         char procname[64];
 
-        sprintf(procname, "%i", inode->fd);
+        sprintf(procname, "%i", path_fd.fd);
         ret = setxattr(procname, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     }
@@ -3355,7 +3453,14 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
         ret = fremovexattr(fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_fd(inode, &path_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", path_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 06/12] virtiofsd: Let lo_fd() return a TempFd
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (4 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 05/12] virtiofsd: Add lo_inode_fd() helper Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 07/12] virtiofsd: Let lo_inode_open() " Hanna Reitz
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

Accessing lo_inode.fd must generally happen through lo_inode_fd(), and
lo_fd() is no exception; and then it must pass on the TempFd it has
received from lo_inode_fd().

(Note that all lo_fd() calls now use proper error handling, where all of
them were in-line before; i.e. they were used in place of the fd
argument of some function call.  This only worked because the only error
that could occur was that lo_inode() failed to find the inode ID: Then
-1 would be passed as the fd, which would result in an EBADF error,
which is precisely what we would want to return to the guest for an
invalid inode ID.
Now, though, lo_inode_fd() might potentially invoke open_by_handle_at(),
which can return many different errors, and they should be properly
handled and returned to the guest.  So we can no longer allow lo_fd() to
be used in-line, and instead need to do proper error handling for it.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 55 +++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index c5baa752e4..3bf20b8659 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -666,18 +666,19 @@ static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
  * they are done with the fd.  This will be done in a later patch to make
  * review easier.
  */
-static int lo_fd(fuse_req_t req, fuse_ino_t ino)
+static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 {
     struct lo_inode *inode = lo_inode(req, ino);
-    int fd;
+    int res;
 
     if (!inode) {
-        return -1;
+        return -EBADF;
     }
 
-    fd = inode->fd;
+    res = lo_inode_fd(inode, tfd);
+
     lo_inode_put(lo_data(req), &inode);
-    return fd;
+    return res;
 }
 
 /*
@@ -814,14 +815,19 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
     int res;
     struct stat buf;
     struct lo_data *lo = lo_data(req);
 
     (void)fi;
 
-    res =
-        fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = lo_fd(req, ino, &path_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = fstatat(path_fd.fd, "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -1547,6 +1553,7 @@ out:
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_path_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1561,13 +1568,19 @@ static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_path_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+    res = unlinkat(parent_path_fd.fd, name, AT_REMOVEDIR);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1653,6 +1666,7 @@ out:
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_path_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1667,13 +1681,19 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_path_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, 0);
+    res = unlinkat(parent_path_fd.fd, name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1753,10 +1773,16 @@ static void lo_forget_multi(fuse_req_t req, size_t count,
 
 static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
     char buf[PATH_MAX + 1];
     int res;
 
-    res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+    res = lo_fd(req, ino, &path_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = readlinkat(path_fd.fd, "", buf, sizeof(buf));
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -2554,10 +2580,17 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) path_fd = TEMP_FD_INIT;
     int res;
     struct statvfs stbuf;
 
-    res = fstatvfs(lo_fd(req, ino), &stbuf);
+    res = lo_fd(req, ino, &path_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
+    res = fstatvfs(path_fd.fd, &stbuf);
     if (res == -1) {
         fuse_reply_err(req, errno);
     } else {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 07/12] virtiofsd: Let lo_inode_open() return a TempFd
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (5 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 06/12] virtiofsd: Let lo_fd() return a TempFd Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-18 19:18   ` Vivek Goyal
  2021-09-16  8:40 ` [PATCH v4 08/12] virtiofsd: Pass lo_data to lo_inode_{fd,open}() Hanna Reitz
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

Strictly speaking, this is not necessary, because lo_inode_open() will
always return a new FD owned by the caller, so TempFd.owned will always
be true.

The auto-cleanup is nice, though.  Also, we get a more unified interface
where you always get a TempFd when you need an FD for an lo_inode
(regardless of whether it is an O_PATH FD or a non-O_PATH FD).

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 156 +++++++++++++++----------------
 1 file changed, 75 insertions(+), 81 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3bf20b8659..d257eda129 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -293,10 +293,8 @@ static void temp_fd_clear(TempFd *temp_fd)
 /**
  * Return an owned fd from *temp_fd that will not be closed when
  * *temp_fd goes out of scope.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+static int temp_fd_steal(TempFd *temp_fd)
 {
     if (temp_fd->owned) {
         temp_fd->owned = false;
@@ -309,10 +307,8 @@ static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
 /**
  * Create a borrowing copy of an existing TempFd.  Note that *to is
  * only valid as long as *from is valid.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) void temp_fd_copy(const TempFd *from, TempFd *to)
+static void temp_fd_copy(const TempFd *from, TempFd *to)
 {
     *to = (TempFd) {
         .fd = from->fd,
@@ -689,9 +685,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
  * when a malicious client opens special files such as block device nodes.
  * Symlink inodes are also rejected since symlinks must already have been
  * traversed on the client side.
+ *
+ * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
+ * otherwise.
  */
 static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
-                         int open_flags)
+                         int open_flags, TempFd *tfd)
 {
     g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
     int fd;
@@ -710,7 +709,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
     if (fd < 0) {
         return -errno;
     }
-    return fd;
+
+    *tfd = (TempFd) {
+        .fd = fd,
+        .owned = true,
+    };
+
+    return 0;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -854,7 +859,8 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
-    g_auto(TempFd) path_fd = TEMP_FD_INIT;
+    g_auto(TempFd) path_fd = TEMP_FD_INIT; /* at least an O_PATH fd */
+    g_auto(TempFd) rw_fd = TEMP_FD_INIT; /* O_RDWR fd */
     int saverr;
     char procname[64];
     struct lo_data *lo = lo_data(req);
@@ -868,7 +874,15 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    res = lo_inode_fd(inode, &path_fd);
+    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
+        /* We need an O_RDWR FD for ftruncate() */
+        res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
+        if (res >= 0) {
+            temp_fd_copy(&rw_fd, &path_fd);
+        }
+    } else {
+        res = lo_inode_fd(inode, &path_fd);
+    }
     if (res < 0) {
         saverr = -res;
         goto out_err;
@@ -916,18 +930,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            truncfd = lo_inode_open(lo, inode, O_RDWR);
-            if (truncfd < 0) {
-                saverr = -truncfd;
-                goto out_err;
-            }
+            assert(rw_fd.fd >= 0);
+            truncfd = rw_fd.fd;
         }
 
         saverr = drop_security_capability(lo, truncfd);
         if (saverr) {
-            if (!fi) {
-                close(truncfd);
-            }
             goto out_err;
         }
 
@@ -935,9 +943,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
             if (res != 0) {
                 saverr = res;
-                if (!fi) {
-                    close(truncfd);
-                }
                 goto out_err;
             }
         }
@@ -950,9 +955,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (!fi) {
-            close(truncfd);
-        }
         if (res == -1) {
             goto out_err;
         }
@@ -1840,11 +1842,13 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) rd_fd = TEMP_FD_INIT;
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_dirp *d = NULL;
     int fd;
+    int res;
     ssize_t fh;
 
     inode = lo_inode(req, ino);
@@ -1858,14 +1862,16 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         goto out_err;
     }
 
-    fd = lo_inode_open(lo, inode, O_RDONLY);
-    if (fd < 0) {
-        error = -fd;
+    res = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
+    if (res < 0) {
+        error = -res;
         goto out_err;
     }
 
+    fd = temp_fd_steal(&rd_fd);
     d->dp = fdopendir(fd);
     if (d->dp == NULL) {
+        close(fd);
         goto out_errno;
     }
 
@@ -1895,8 +1901,6 @@ out_err:
     if (d) {
         if (d->dp) {
             closedir(d->dp);
-        } else if (fd != -1) {
-            close(fd);
         }
         free(d);
     }
@@ -2096,6 +2100,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
 static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
                       int existing_fd, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) opened_fd = TEMP_FD_INIT;
     ssize_t fh;
     int fd = existing_fd;
     int err;
@@ -2112,16 +2117,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
             }
         }
 
-        fd = lo_inode_open(lo, inode, fi->flags);
+        err = lo_inode_open(lo, inode, fi->flags, &opened_fd);
 
         if (cap_fsetid_dropped) {
             if (gain_effective_cap("FSETID")) {
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (fd < 0) {
-            return -fd;
+        if (err < 0) {
+            return -err;
         }
+        fd = temp_fd_steal(&opened_fd);
+
         if (fi->flags & (O_TRUNC)) {
             int err = drop_security_capability(lo, fd);
             if (err) {
@@ -2231,8 +2238,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       uint64_t lock_owner,
                                                       pid_t pid, int *err)
 {
+    g_auto(TempFd) rw_fd = TEMP_FD_INIT;
     struct lo_inode_plock *plock;
-    int fd;
+    int res;
 
     plock =
         g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
@@ -2249,15 +2257,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 
     /* Open another instance of file which can be used for ofd locks. */
     /* TODO: What if file is not writable? */
-    fd = lo_inode_open(lo, inode, O_RDWR);
-    if (fd < 0) {
-        *err = -fd;
+    res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
+    if (res < 0) {
+        *err = -res;
         free(plock);
         return NULL;
     }
 
     plock->lock_owner = lock_owner;
-    plock->fd = fd;
+    plock->fd = temp_fd_steal(&rw_fd);
     g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
                         plock);
     return plock;
@@ -2473,6 +2481,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    g_auto(TempFd) rw_fd = TEMP_FD_INIT;
     struct lo_inode *inode = lo_inode(req, ino);
     struct lo_data *lo = lo_data(req);
     int res;
@@ -2487,11 +2496,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     }
 
     if (!fi) {
-        fd = lo_inode_open(lo, inode, O_RDWR);
-        if (fd < 0) {
-            res = -fd;
+        res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
+        if (res < 0) {
+            res = -res;
             goto out;
         }
+        fd = rw_fd.fd;
     } else {
         fd = lo_fi_fd(req, fi);
     }
@@ -2501,9 +2511,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd) == -1 ? errno : 0;
     }
-    if (!fi) {
-        close(fd);
-    }
 out:
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, res);
@@ -3065,7 +3072,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3117,12 +3123,14 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fgetxattr(fd, name, value, size);
+        ret = fgetxattr(rd_fd.fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
@@ -3153,10 +3161,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3176,7 +3180,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -3200,12 +3203,14 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = flistxattr(fd, value, size);
+        ret = flistxattr(rd_fd.fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
@@ -3294,10 +3299,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3312,14 +3313,14 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags,
                         uint32_t extra_flags)
 {
-    g_auto(TempFd) path_fd = TEMP_FD_INIT;
+    g_auto(TempFd) path_fd = TEMP_FD_INIT; /* O_PATH fd */
+    g_auto(TempFd) rd_fd = TEMP_FD_INIT; /* O_RDONLY fd */
     const char *name;
     char *mapped_name;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
     bool switched_creds = false;
     bool cap_fsetid_dropped = false;
     struct lo_cred old = {};
@@ -3364,9 +3365,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * setxattr() on the link's filename there.
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
     } else {
@@ -3401,8 +3402,8 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         switched_creds = true;
     }
-    if (fd >= 0) {
-        ret = fsetxattr(fd, name, value, size, flags);
+    if (rd_fd.fd >= 0) {
+        ret = fsetxattr(rd_fd.fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
         char procname[64];
@@ -3424,10 +3425,6 @@ out:
         FCHDIR_NOFAIL(lo->root.fd);
     }
 
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
@@ -3442,7 +3439,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3478,12 +3474,14 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
              name);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
+
+        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fremovexattr(fd, name);
+        ret = fremovexattr(rd_fd.fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
@@ -3502,10 +3500,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     }
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 08/12] virtiofsd: Pass lo_data to lo_inode_{fd,open}()
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (6 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 07/12] virtiofsd: Let lo_inode_open() " Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 09/12] virtiofsd: Add lo_inode.fhandle Hanna Reitz
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

In order to be able to use file handles for identifying lo_inode
objects, we will add some global state to lo_data, which we will need in
a future function to be called from lo_inode_fd() and lo_inode_open().

To prepare for this, pass a (non-const) lo_data pointer to lo_inode_fd()
and lo_inode_open().

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++---------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index d257eda129..bc3b803d46 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -647,7 +647,8 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
-static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
+static int lo_inode_fd(struct lo_data *lo, const struct lo_inode *inode,
+                       TempFd *tfd)
 {
     *tfd = (TempFd) {
         .fd = inode->fd,
@@ -665,15 +666,16 @@ static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
 static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 {
     struct lo_inode *inode = lo_inode(req, ino);
+    struct lo_data *lo = lo_data(req);
     int res;
 
     if (!inode) {
         return -EBADF;
     }
 
-    res = lo_inode_fd(inode, tfd);
+    res = lo_inode_fd(lo, inode, tfd);
 
-    lo_inode_put(lo_data(req), &inode);
+    lo_inode_put(lo, &inode);
     return res;
 }
 
@@ -881,7 +883,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             temp_fd_copy(&rw_fd, &path_fd);
         }
     } else {
-        res = lo_inode_fd(inode, &path_fd);
+        res = lo_inode_fd(lo, inode, &path_fd);
     }
     if (res < 0) {
         saverr = -res;
@@ -1128,7 +1130,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    res = lo_inode_fd(dir, &dir_path_fd);
+    res = lo_inode_fd(lo, dir, &dir_path_fd);
     if (res < 0) {
         saverr = -res;
         goto out;
@@ -1383,7 +1385,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         return;
     }
 
-    res = lo_inode_fd(dir, &dir_path_fd);
+    res = lo_inode_fd(lo, dir, &dir_path_fd);
     if (res < 0) {
         saverr = -res;
         goto out;
@@ -1469,13 +1471,13 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
         goto out;
     }
 
-    res = lo_inode_fd(inode, &path_fd);
+    res = lo_inode_fd(lo, inode, &path_fd);
     if (res < 0) {
         saverr = -res;
         goto out;
     }
 
-    res = lo_inode_fd(parent_inode, &parent_path_fd);
+    res = lo_inode_fd(lo, parent_inode, &parent_path_fd);
     if (res < 0) {
         saverr = -res;
         goto out;
@@ -1535,7 +1537,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
-    res = lo_inode_fd(dir, &dir_path_fd);
+    res = lo_inode_fd(lo, dir, &dir_path_fd);
     if (res < 0) {
         goto out;
     }
@@ -1627,13 +1629,13 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = lo_inode_fd(parent_inode, &parent_path_fd);
+    res = lo_inode_fd(lo, parent_inode, &parent_path_fd);
     if (res < 0) {
         fuse_reply_err(req, -res);
         goto out;
     }
 
-    res = lo_inode_fd(newparent_inode, &newparent_path_fd);
+    res = lo_inode_fd(lo, newparent_inode, &newparent_path_fd);
     if (res < 0) {
         fuse_reply_err(req, -res);
         goto out;
@@ -2181,7 +2183,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
-    err = lo_inode_fd(parent_inode, &parent_path_fd);
+    err = lo_inode_fd(lo, parent_inode, &parent_path_fd);
     if (err < 0) {
         err = -err;
         goto out;
@@ -3135,7 +3137,7 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
 
-        ret = lo_inode_fd(inode, &path_fd);
+        ret = lo_inode_fd(lo, inode, &path_fd);
         if (ret < 0) {
             saverr = -ret;
             goto out;
@@ -3215,7 +3217,7 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
 
-        ret = lo_inode_fd(inode, &path_fd);
+        ret = lo_inode_fd(lo, inode, &path_fd);
         if (ret < 0) {
             saverr = -ret;
             goto out;
@@ -3371,7 +3373,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
             goto out;
         }
     } else {
-        ret = lo_inode_fd(inode, &path_fd);
+        ret = lo_inode_fd(lo, inode, &path_fd);
         if (ret < 0) {
             saverr = -ret;
             goto out;
@@ -3486,7 +3488,7 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     } else {
         g_auto(TempFd) path_fd = TEMP_FD_INIT;
 
-        ret = lo_inode_fd(inode, &path_fd);
+        ret = lo_inode_fd(lo, inode, &path_fd);
         if (ret < 0) {
             saverr = -ret;
             goto out;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 09/12] virtiofsd: Add lo_inode.fhandle
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (7 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 08/12] virtiofsd: Pass lo_data to lo_inode_{fd,open}() Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-09-16  8:40 ` [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table Hanna Reitz
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

This new field is an alternative to lo_inode.fd: Either of the two must
be set.  In case an O_PATH FD is needed for some lo_inode, it is either
taken from lo_inode.fd, if valid, or a temporary FD is opened with
open_by_handle_at().

Using a file handle instead of an FD has the advantage of keeping the
number of open file descriptors low.

Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
opened on the filesystem to which the file handle refers), but every
lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
keep a hash map of such FDs in mount_fds (mapping ID to FD).
get_file_handle(), which is added by a later patch, will ensure that
every mount ID for which we have generated a handle has a corresponding
entry in mount_fds.  Every handle holds a strong reference to its mount
FD (lo_mount_fd.refcount) so we can clean up mount FDs when they are no
longer needed.

release_file_handle()'s drop_mount_fd_ref parameter may look a bit
strange at this point, because we always pass true for it, but it will
make more sense when we start generating file handles: At that point, we
will also use this function to clean up lo_fhandle object that do not
yet have a strong reference to an lo_mount_fd object, and then we will
need to be able to pass false for drop_mount_fd_ref.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c      | 175 +++++++++++++++++++++++---
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 2 files changed, 161 insertions(+), 15 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index bc3b803d46..bd8fc922ea 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -88,8 +88,18 @@ struct lo_key {
     uint64_t mnt_id;
 };
 
+struct lo_fhandle {
+    union {
+        struct file_handle handle;
+        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
+    };
+    int mount_id;
+};
+
 struct lo_inode {
+    /* fd or fhandle must be set (i.e. >= 0 or non-NULL, respectively) */
     int fd;
+    struct lo_fhandle *fhandle;
 
     /*
      * Atomic reference count for this object.  The nlookup field holds a
@@ -142,6 +152,19 @@ typedef struct xattr_map_entry {
     unsigned int flags;
 } XattrMapEntry;
 
+/*
+ * An O_RDONLY FD representing the mount it is on.  We need this for
+ * open_by_handle_at().
+ *
+ * The refcount is increased every time we store a file handle for
+ * this mount, and it is decreased every time we release such a stored
+ * file handle.
+ */
+struct lo_mount_fd {
+    int fd;
+    gint refcount;
+};
+
 struct lo_data {
     pthread_mutex_t mutex;
     int sandbox;
@@ -178,6 +201,10 @@ struct lo_data {
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
     int user_posix_acl, posix_acl;
+
+    /* Maps (integer) mount IDs to lo_mount_fd objects */
+    GHashTable *mount_fds;
+    pthread_rwlock_t mount_fds_lock;
 };
 
 /**
@@ -316,6 +343,93 @@ static void temp_fd_copy(const TempFd *from, TempFd *to)
     };
 }
 
+static void free_lo_mount_fd(gpointer data)
+{
+    struct lo_mount_fd *mfd = data;
+
+    close(mfd->fd);
+    g_free(mfd);
+}
+
+/**
+ * Frees a file handle and optionally removes its reference to the
+ * associated mount FD.  (Passing NULL as @fh is OK.)
+ *
+ * Pass @drop_mount_fd_ref == true if and only if this handle has a
+ * strong reference to an lo_mount_fd object in the mount_fds hash
+ * table.  That is always the case for file handles stored in lo_inode
+ * objects.
+ */
+static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
+                                bool drop_mount_fd_ref)
+{
+    if (!fh) {
+        return;
+    }
+
+    if (drop_mount_fd_ref) {
+        struct lo_mount_fd *mfd;
+
+        if (pthread_rwlock_rdlock(&lo->mount_fds_lock)) {
+            fuse_log(FUSE_LOG_ERR, "%s(): Dropping mount FD reference failed "
+                     "(mount ID: %i)\n", __func__, fh->mount_id);
+        } else {
+            mfd = g_hash_table_lookup(lo->mount_fds,
+                                      GINT_TO_POINTER(fh->mount_id));
+            assert(mfd != NULL);
+
+            pthread_rwlock_unlock(&lo->mount_fds_lock);
+
+            if (g_atomic_int_dec_and_test(&mfd->refcount)) {
+                if (pthread_rwlock_wrlock(&lo->mount_fds_lock)) {
+                    fuse_log(FUSE_LOG_ERR, "%s(): Dropping mount FD reference "
+                             "failed (mount ID: %i)\n", __func__, fh->mount_id);
+                } else {
+                    /* Auto-closes the FD and frees mfd */
+                    g_hash_table_remove(lo->mount_fds,
+                                        GINT_TO_POINTER(fh->mount_id));
+                    pthread_rwlock_unlock(&lo->mount_fds_lock);
+                }
+            }
+        }
+    }
+
+    g_free(fh);
+}
+
+/**
+ * Open the given file handle with the given flags.
+ *
+ * The mount FD to pass to open_by_handle_at() is taken from the
+ * mount_fds hash map.
+ *
+ * On error, return -errno.
+ */
+static int open_file_handle(struct lo_data *lo, const struct lo_fhandle *fh,
+                            int flags)
+{
+    struct lo_mount_fd *mfd;
+    int ret;
+
+    ret = pthread_rwlock_rdlock(&lo->mount_fds_lock);
+    if (ret) {
+        return -ret;
+    }
+
+    mfd = g_hash_table_lookup(lo->mount_fds, GINT_TO_POINTER(fh->mount_id));
+    pthread_rwlock_unlock(&lo->mount_fds_lock);
+    if (!mfd) {
+        return -EINVAL;
+    }
+
+    ret = open_by_handle_at(mfd->fd, (struct file_handle *)&fh->handle, flags);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    return ret;
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
@@ -622,7 +736,10 @@ static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
     *inodep = NULL;
 
     if (g_atomic_int_dec_and_test(&inode->refcount)) {
-        close(inode->fd);
+        if (inode->fd >= 0) {
+            close(inode->fd);
+        }
+        release_file_handle(lo, inode->fhandle, true);
         free(inode);
     }
 }
@@ -650,10 +767,25 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 static int lo_inode_fd(struct lo_data *lo, const struct lo_inode *inode,
                        TempFd *tfd)
 {
-    *tfd = (TempFd) {
-        .fd = inode->fd,
-        .owned = false,
-    };
+    if (inode->fd >= 0) {
+        *tfd = (TempFd) {
+            .fd = inode->fd,
+            .owned = false,
+        };
+    } else {
+        int fd;
+
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(lo, inode->fhandle, O_PATH);
+        if (fd < 0) {
+            return -errno;
+        }
+
+        *tfd = (TempFd) {
+            .fd = fd,
+            .owned = true,
+        };
+    }
 
     return 0;
 }
@@ -694,22 +826,32 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
                          int open_flags, TempFd *tfd)
 {
-    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+    g_autofree char *fd_str = NULL;
     int fd;
 
     if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
         return -EBADF;
     }
 
-    /*
-     * The file is a symlink so O_NOFOLLOW must be ignored. We checked earlier
-     * that the inode is not a special file but if an external process races
-     * with us then symlinks are traversed here. It is not possible to escape
-     * the shared directory since it is mounted as "/" though.
-     */
-    fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
-    if (fd < 0) {
-        return -errno;
+    if (inode->fd >= 0) {
+        /*
+         * The file is a symlink so O_NOFOLLOW must be ignored. We checked
+         * earlier that the inode is not a special file but if an external
+         * process races with us then symlinks are traversed here. It is not
+         * possible to escape the shared directory since it is mounted as "/"
+         * though.
+         */
+        fd_str = g_strdup_printf("%d", inode->fd);
+        fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
+        if (fd < 0) {
+            return -errno;
+        }
+    } else {
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(lo, inode->fhandle, open_flags);
+        if (fd < 0) {
+            return fd;
+        }
     }
 
     *tfd = (TempFd) {
@@ -4187,6 +4329,9 @@ int main(int argc, char *argv[])
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
 
+    pthread_rwlock_init(&lo.mount_fds_lock, NULL);
+    lo.mount_fds = g_hash_table_new_full(NULL, NULL, NULL, free_lo_mount_fd);
+
     /*
      * Set up the ino map like this:
      * [0] Reserved (will not be used)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index f49ed94b5e..af04c638cb 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -77,6 +77,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(statx),
     SCMP_SYS(open),
     SCMP_SYS(openat),
+    SCMP_SYS(open_by_handle_at),
     SCMP_SYS(ppoll),
     SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
     SCMP_SYS(preadv),
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (8 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 09/12] virtiofsd: Add lo_inode.fhandle Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-19 20:02   ` Vivek Goyal
  2021-09-16  8:40 ` [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle Hanna Reitz
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
its inode ID will remain in use until we drop our lo_inode (and
lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
the inode ID as an lo_inode key, because any inode with an inode ID we
find in lo_data.inodes (on the same filesystem) must be the exact same
file.

This will change when we start setting lo_inode.fhandle so we do not
have to keep an O_PATH FD open.  Then, unlinking such an inode will
immediately remove it, so its ID can then be reused by newly created
files, even while the lo_inode object is still there[1].

So creating a new file can then reuse the old file's inode ID, and
looking up the new file would lead to us finding the old file's
lo_inode, which is not ideal.

Luckily, just as file handles cause this problem, they also solve it:  A
file handle contains a generation ID, which changes when an inode ID is
reused, so the new file can be distinguished from the old one.  So all
we need to do is to add a second map besides lo_data.inodes that maps
file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.

Unfortunately, we cannot rely on being able to generate file handles
every time.  Therefore, we still enter every lo_inode object into
inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
potential inodes_by_handle entry then has precedence, the inodes_by_ids
entry is just a fallback.

Note that we do not generate lo_fhandle objects yet, and so we also do
not enter anything into the inodes_by_handle map yet.  Also, all lookups
skip that map.  We might manually create file handles with some code
that is immediately removed by the next patch again, but that would
break the assumption in lo_find() that every lo_inode with a non-NULL
.fhandle must have an entry in inodes_by_handle and vice versa.  So we
leave actually using the inodes_by_handle map for the next patch.

[1] If some application in the guest still has the file open, there is
going to be a corresponding FD mapping in lo_data.fd_map.  In such a
case, the inode will only go away once every application in the guest
has closed it.  The problem described only applies to cases where the
guest does not have the file open, and it is just in the dentry cache,
basically.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index bd8fc922ea..b7d6aa7f9d 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -186,7 +186,8 @@ struct lo_data {
     int announce_submounts;
     bool use_statx;
     struct lo_inode root;
-    GHashTable *inodes; /* protected by lo->mutex */
+    GHashTable *inodes_by_ids; /* protected by lo->mutex */
+    GHashTable *inodes_by_handle; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -275,8 +276,9 @@ static struct {
 /* That we loaded cap-ng in the current thread from the saved */
 static __thread bool cap_loaded = 0;
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id);
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id);
 static int xattr_map_client(const struct lo_data *lo, const char *client_name,
                             char **out_name);
 
@@ -1143,18 +1145,40 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id)
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
 {
-    struct lo_inode *p;
-    struct lo_key key = {
+    struct lo_inode *p = NULL;
+    struct lo_key ids_key = {
         .ino = st->st_ino,
         .dev = st->st_dev,
         .mnt_id = mnt_id,
     };
 
     pthread_mutex_lock(&lo->mutex);
-    p = g_hash_table_lookup(lo->inodes, &key);
+    if (fhandle) {
+        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
+    }
+    if (!p) {
+        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
+        /*
+         * When we had to fall back to looking up an inode by its
+         * inode ID, ensure that we hit an entry that has a valid file
+         * descriptor.  Having an FD open means that the inode cannot
+         * really be deleted until the FD is closed, so that the inode
+         * ID remains valid until we evict our lo_inode.
+         * With no FD open (and just a file handle), the inode can be
+         * deleted while we still have our lo_inode, and so the inode
+         * ID may be reused by a completely different new inode.  We
+         * then must look up the lo_inode by file handle, because this
+         * handle contains a generation ID to differentiate between
+         * the old and the new inode.
+         */
+        if (p && p->fd == -1) {
+            p = NULL;
+        }
+    }
     if (p) {
         assert(p->nlookup > 0);
         p->nlookup++;
@@ -1294,7 +1318,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, &e->attr, mnt_id);
+    inode = lo_find(lo, NULL, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
     } else {
@@ -1324,7 +1348,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
-        g_hash_table_insert(lo->inodes, &inode->key, inode);
+        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1690,7 +1714,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
-    inode = lo_find(lo, &attr, mnt_id);
+    inode = lo_find(lo, NULL, &attr, mnt_id);
 
 out:
     lo_inode_put(lo, &dir);
@@ -1857,7 +1881,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     inode->nlookup -= n;
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
-        g_hash_table_remove(lo->inodes, &inode->key);
+        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -3700,7 +3724,7 @@ static void lo_destroy(void *userdata)
         GHashTableIter iter;
         gpointer key, value;
 
-        g_hash_table_iter_init(&iter, lo->inodes);
+        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
         if (!g_hash_table_iter_next(&iter, &key, &value)) {
             break;
         }
@@ -4264,10 +4288,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
     return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
 }
 
+static guint lo_fhandle_hash(gconstpointer key)
+{
+    const struct lo_fhandle *fh = key;
+    guint hash;
+    size_t i;
+
+    /* Basically g_str_hash() */
+    hash = 5381;
+    for (i = 0; i < sizeof(fh->padding); i++) {
+        hash += hash * 33 + (unsigned char)fh->padding[i];
+    }
+    hash += hash * 33 + fh->mount_id;
+
+    return hash;
+}
+
+static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
+{
+    return !memcmp(a, b, sizeof(struct lo_fhandle));
+}
+
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
-    if (lo->inodes) {
-        g_hash_table_destroy(lo->inodes);
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_ids);
+    }
+    if (lo->inodes_by_handle) {
+        g_hash_table_destroy(lo->inodes_by_handle);
     }
 
     if (lo->root.posix_locks) {
@@ -4324,7 +4372,8 @@ int main(int argc, char *argv[])
     qemu_init_exec_dir(argv[0]);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (9 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-19 18:57   ` Vivek Goyal
  2021-09-16  8:40 ` [PATCH v4 12/12] virtiofsd: Add lazy lo_do_find() Hanna Reitz
  2021-10-18 18:08 ` [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Vivek Goyal
  12 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

When the inode_file_handles option is set, try to generate a file handle
for new inodes instead of opening an O_PATH FD.

Being able to open these again will require CAP_DAC_READ_SEARCH, so
setting this option will result in us taking that capability.

Generating a file handle returns the mount ID it is valid for.  Opening
it will require an FD instead.  We have mount_fds to map an ID to an FD.
get_file_handle() scans /proc/self/mountinfo to map mount IDs to their
mount points, which we open to get the mount FD we need.  To verify that
the resulting FD indeed represents the handle's mount ID, we use
statx().  Therefore, using file handles requires statx() support.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 297 ++++++++++++++++++++++++--
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 3 files changed, 289 insertions(+), 12 deletions(-)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index a8295d975a..311f05c7ee 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
            "                               default: no_allow_direct_io\n"
            "    -o announce_submounts      Announce sub-mount points to the guest\n"
            "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
+           "    -o inode_file_handles      Use file handles to reference inodes\n"
+           "                               instead of O_PATH file descriptors\n"
+           "                               (adds +dac_read_search to modcaps)\n"
            );
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b7d6aa7f9d..e86fad8b2f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -206,6 +206,8 @@ struct lo_data {
     /* Maps (integer) mount IDs to lo_mount_fd objects */
     GHashTable *mount_fds;
     pthread_rwlock_t mount_fds_lock;
+
+    int inode_file_handles;
 };
 
 /**
@@ -262,6 +264,10 @@ static const struct fuse_opt lo_opts[] = {
     { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
     { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
     { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
+    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
+    { "no_inode_file_handles",
+      offsetof(struct lo_data, inode_file_handles),
+      0 },
     FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -359,8 +365,15 @@ static void free_lo_mount_fd(gpointer data)
  *
  * Pass @drop_mount_fd_ref == true if and only if this handle has a
  * strong reference to an lo_mount_fd object in the mount_fds hash
- * table.  That is always the case for file handles stored in lo_inode
- * objects.
+ * table, i.e. if this file handle has been returned by a
+ * get_file_handle() call where *can_open_handle was returned to be
+ * true.  (That is always the case for file handles stored in lo_inode
+ * objects, because those file handles must be open-able.)
+ *
+ * Conversely, pass @drop_mount_fd_ref == false if and only if this
+ * file handle has been returned by a get_file_handle() call where
+ * either NULL was passed for @can_open_handle, or where
+ * *can_open_handle was returned to be false.
  */
 static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
                                 bool drop_mount_fd_ref)
@@ -399,6 +412,196 @@ static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
     g_free(fh);
 }
 
+/**
+ * Generate a file handle for the given dirfd/name combination.
+ *
+ * If mount_fds does not yet contain an entry for the handle's mount
+ * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
+ * as the FD for that mount ID.  (That is the file that we have
+ * generated a handle for, so it should be representative for the
+ * mount ID.  However, to be sure (and to rule out races), we use
+ * statx() to verify that our assumption is correct.)
+ *
+ * Opening a mount FD can fail in various ways, and independently of
+ * whether generating a file handle was possible.  Many callers only
+ * care about getting a file handle for a lookup, though, and so do
+ * not necessarily need it to be usable.  (You need a valid mount FD
+ * for the handle to be usable.)
+ * *can_open_handle will be set to true if the file handle can be
+ * opened (i.e., we have a mount FD for it), and to false otherwise.
+ * By passing NULL for @can_open_handle, the caller indicates that
+ * they do not care about the handle being open-able, and so
+ * generating a mount FD will be skipped altogether.
+ *
+ * File handles must be freed with release_file_handle().
+ */
+static struct lo_fhandle *get_file_handle(struct lo_data *lo,
+                                          int dirfd, const char *name,
+                                          bool *can_open_handle)
+{
+    /* We need statx() to verify the mount ID */
+#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
+    int root_path_fd = -1;
+    int mount_fd = -1;
+    struct lo_fhandle *fh = NULL;
+    struct lo_mount_fd *mfd;
+    int ret;
+
+    if (!lo->use_statx || !lo->inode_file_handles || !lo->mountinfo_fp) {
+        goto fail_handle;
+    }
+
+    fh = g_new0(struct lo_fhandle, 1);
+
+    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
+    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
+                            AT_EMPTY_PATH);
+    if (ret < 0) {
+        goto fail_handle;
+    }
+
+    if (!can_open_handle) {
+        /* No need to generate a mount FD if the caller does not care */
+        return fh;
+    }
+
+    if (pthread_rwlock_rdlock(&lo->mount_fds_lock)) {
+        goto fail_mount_fd;
+    }
+
+    mfd = g_hash_table_lookup(lo->mount_fds, GINT_TO_POINTER(fh->mount_id));
+    if (mfd) {
+        g_atomic_int_inc(&mfd->refcount);
+    } else {
+        char *mi_line = NULL;
+        size_t mi_line_len = 0;
+        char *mount_root = NULL;
+        struct statx stx;
+        char procname[64];
+
+        pthread_rwlock_unlock(&lo->mount_fds_lock);
+
+        rewind(lo->mountinfo_fp);
+        while (!mount_root) {
+            ssize_t read_count;
+            int scan_count;
+            int mount_id;
+
+            read_count = getline(&mi_line, &mi_line_len, lo->mountinfo_fp);
+            if (read_count < 0) {
+                break;
+            }
+
+            scan_count = sscanf(mi_line, "%d %*d %*d:%*d %*s %ms",
+                                &mount_id, &mount_root);
+            if (scan_count != 2 || mount_id != fh->mount_id) {
+                free(mount_root);
+                mount_root = NULL;
+            }
+        }
+        free(mi_line);
+
+        if (!mount_root) {
+            goto fail_mount_fd;
+        }
+
+        root_path_fd = open(mount_root, O_PATH);
+        free(mount_root);
+        if (root_path_fd < 0) {
+            goto fail_mount_fd;
+        }
+
+        ret = statx(root_path_fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+                    STATX_TYPE | STATX_MNT_ID, &stx);
+        if (ret < 0) {
+            if (errno == ENOSYS) {
+                lo->use_statx = false;
+                fuse_log(FUSE_LOG_WARNING,
+                         "statx() does not work: Will not be able to use file "
+                         "handles for inodes\n");
+            }
+            goto fail_mount_fd;
+        }
+        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
+            /*
+             * Perhaps a TOCTTOU problem.  Just return a non-openable file
+             * handle this time and retry for the next handle that we want to
+             * generate for this mount.
+             */
+            goto fail_mount_fd;
+        }
+        if (!(stx.stx_mask & STATX_TYPE) ||
+            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
+        {
+            /*
+             * We must not open special files with anything but O_PATH, so we
+             * cannot use this file for mount_fds.  (Note that filesystems can
+             * have special files as their root node, so this case can happen.)
+             * Just return a failure in such a case and let the lo_inode have
+             * an O_PATH fd instead of a file handle.
+             */
+            goto fail_mount_fd;
+        }
+
+        /* Now that we know this fd is safe to open, do it */
+        snprintf(procname, sizeof(procname), "%i", root_path_fd);
+        mount_fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (mount_fd < 0) {
+            goto fail_mount_fd;
+        }
+
+        if (pthread_rwlock_wrlock(&lo->mount_fds_lock)) {
+            goto fail_mount_fd;
+        }
+
+        /* Check again, might have changed */
+        if (!g_hash_table_contains(lo->mount_fds,
+                                   GINT_TO_POINTER(fh->mount_id))) {
+            mfd = g_new(struct lo_mount_fd, 1);
+
+            *mfd = (struct lo_mount_fd) {
+                .fd = mount_fd,
+                .refcount = 1,
+            };
+            mount_fd = -1; /* reference moved to *mfd */
+
+            g_hash_table_insert(lo->mount_fds,
+                                GINT_TO_POINTER(fh->mount_id),
+                                mfd);
+        }
+    }
+    pthread_rwlock_unlock(&lo->mount_fds_lock);
+
+    assert(can_open_handle != NULL);
+    *can_open_handle = true;
+
+    goto out;
+
+fail_handle:
+    release_file_handle(lo, fh, false);
+    fh = NULL;
+
+fail_mount_fd:
+    if (can_open_handle) {
+        *can_open_handle = false;
+    }
+
+out:
+    if (root_path_fd >= 0) {
+        close(root_path_fd);
+    }
+    if (mount_fd >= 0) {
+        close(mount_fd);
+    }
+    return fh;
+#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
+    if (can_open_handle) {
+        *can_open_handle = false;
+    }
+    return NULL;
+#endif
+}
+
 /**
  * Open the given file handle with the given flags.
  *
@@ -1244,6 +1447,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
             return -1;
         }
         lo->use_statx = false;
+        if (lo->inode_file_handles) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "statx() does not work: Will not be able to use file "
+                     "handles for inodes\n");
+        }
         /* fallback */
     }
 #endif
@@ -1273,6 +1481,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_fhandle *fh = NULL;
+    bool can_open_handle = false;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1302,13 +1512,26 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
-    if (newfd == -1) {
-        goto out_err;
+    fh = get_file_handle(lo, dir_path_fd.fd, name, &can_open_handle);
+    if (!fh || !can_open_handle) {
+        /*
+         * If we will not be able to open the file handle again
+         * (can_open_handle is false), open an FD that we can put into
+         * lo_inode (in case we need to create a new lo_inode).
+         */
+        newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
+        if (newfd == -1) {
+            goto out_err;
+        }
     }
 
-    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
-                   &mnt_id);
+    if (newfd >= 0) {
+        res = do_statx(lo, newfd, "", &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    } else {
+        res = do_statx(lo, dir_path_fd.fd, name, &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    }
     if (res == -1) {
         goto out_err;
     }
@@ -1318,9 +1541,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, NULL, &e->attr, mnt_id);
+    /*
+     * Note that fh is always NULL if lo->inode_file_handles is false,
+     * and so we will never do a lookup by file handle here, and
+     * lo->inodes_by_handle will always remain empty.  We only need
+     * this map when we do not have an O_PATH fd open for every
+     * lo_inode, though, so if inode_file_handles is false, we do not
+     * need that map anyway.
+     */
+    inode = lo_find(lo, fh, &e->attr, mnt_id);
     if (inode) {
-        close(newfd);
+        if (newfd != -1) {
+            close(newfd);
+        }
     } else {
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
@@ -1338,6 +1571,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
         inode->nlookup = 1;
         inode->fd = newfd;
+        if (can_open_handle) {
+            inode->fhandle = fh;
+            fh = NULL; /* owned by the lo_inode now */
+        }
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
@@ -1349,6 +1586,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
         g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
+        if (inode->fhandle) {
+            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
+        }
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1362,6 +1602,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
     lo_inode_put(lo, &dir);
 
+    release_file_handle(lo, fh, can_open_handle);
+
     fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
              name, (unsigned long long)e->ino);
 
@@ -1373,6 +1615,7 @@ out:
     if (newfd != -1) {
         close(newfd);
     }
+    release_file_handle(lo, fh, can_open_handle);
     lo_inode_put(lo, &inode);
     lo_inode_put(lo, &dir);
     return saverr;
@@ -1695,6 +1938,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     int res;
     uint64_t mnt_id;
     struct stat attr;
+    struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
     struct lo_inode *inode = NULL;
@@ -1708,13 +1952,17 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
+    fh = get_file_handle(lo, dir_path_fd.fd, name, NULL);
+    /* Ignore errors, this is just an optional key for the lookup */
+
     res = do_statx(lo, dir_path_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW,
                    &mnt_id);
     if (res == -1) {
         goto out;
     }
 
-    inode = lo_find(lo, NULL, &attr, mnt_id);
+    inode = lo_find(lo, fh, &attr, mnt_id);
+    release_file_handle(lo, fh, false);
 
 out:
     lo_inode_put(lo, &dir);
@@ -1882,6 +2130,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes_by_ids, &inode->key);
+        if (inode->fhandle) {
+            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
+        }
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -3978,8 +4229,11 @@ static void setup_mounts(const char *source)
 /*
  * Only keep capabilities in allowlist that are needed for file system operation
  * The (possibly NULL) modcaps_in string passed in is free'd before exit.
+ *
+ * Passing true for cap_dac_read_search adds CAP_DAC_READ_SEARCH to the
+ * allowlist.
  */
-static void setup_capabilities(char *modcaps_in)
+static void setup_capabilities(char *modcaps_in, bool cap_dac_read_search)
 {
     char *modcaps = modcaps_in;
     pthread_mutex_lock(&cap.mutex);
@@ -4012,6 +4266,17 @@ static void setup_capabilities(char *modcaps_in)
         exit(1);
     }
 
+    /*
+     * If we need CAP_DAC_READ_SEARCH (for file handles), add that, too.
+     */
+    if (cap_dac_read_search &&
+        capng_update(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
+                     CAP_DAC_READ_SEARCH)) {
+        fuse_log(FUSE_LOG_ERR, "%s: capng_update failed for "
+                 "CAP_DAC_READ_SEARCH\n", __func__);
+        exit(1);
+    }
+
     /*
      * The modcaps option is a colon separated list of caps,
      * each preceded by either + or -.
@@ -4158,7 +4423,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
     }
 
     setup_seccomp(enable_syslog);
-    setup_capabilities(g_strdup(lo->modcaps));
+    setup_capabilities(g_strdup(lo->modcaps), lo->inode_file_handles);
 }
 
 /* Set the maximum number of open file descriptors */
@@ -4498,6 +4763,14 @@ int main(int argc, char *argv[])
 
     lo.use_statx = true;
 
+#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
+    if (lo.inode_file_handles) {
+        fuse_log(FUSE_LOG_WARNING,
+                 "No statx() or mount ID support: Will not be able to use file "
+                 "handles for inodes\n");
+    }
+#endif
+
     se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
     if (se == NULL) {
         goto err_out1;
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index af04c638cb..ab4dc07e3f 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(mprotect),
     SCMP_SYS(mremap),
     SCMP_SYS(munmap),
+    SCMP_SYS(name_to_handle_at),
     SCMP_SYS(newfstatat),
     SCMP_SYS(statx),
     SCMP_SYS(open),
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 12/12] virtiofsd: Add lazy lo_do_find()
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (10 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle Hanna Reitz
@ 2021-09-16  8:40 ` Hanna Reitz
  2021-10-18 18:08 ` [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Vivek Goyal
  12 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-09-16  8:40 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Hanna Reitz, Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal

lo_find() right now takes two lookup keys for two maps, namely the file
handle for inodes_by_handle and the statx information for inodes_by_ids.
However, we only need the statx information if looking up the inode by
the file handle failed.

There are two callers of lo_find(): The first one, lo_do_lookup(), has
both keys anyway, so passing them does not incur any additional cost.
The second one, lookup_name(), though, needs to explicitly invoke
name_to_handle_at() (through get_file_handle()) and statx() (through
do_statx()).  We need to try to get a file handle as the primary key, so
we cannot get rid of get_file_handle(), but we only need the statx
information if looking up an inode by handle failed; so we can defer
that until the lookup has indeed failed.

To this end, replace lo_find()'s st/mnt_id parameters by a get_ids()
closure that is invoked to fill the lo_key struct if necessary.

Also, lo_find() is renamed to lo_do_find(), so we can add a new
lo_find() wrapper whose closure just initializes the lo_key from the
st/mnt_id parameters, just like the old lo_find() did.

lookup_name() directly calls lo_do_find() now and passes its own
closure, which performs the do_statx() call.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 93 +++++++++++++++++++++++++-------
 1 file changed, 75 insertions(+), 18 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e86fad8b2f..368bad17c7 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1348,22 +1348,23 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo,
-                                const struct lo_fhandle *fhandle,
-                                struct stat *st, uint64_t mnt_id)
+/*
+ * get_ids() will be called to get the key for lo->inodes_by_ids if
+ * the lookup by file handle has failed.
+ */
+static struct lo_inode *lo_do_find(struct lo_data *lo,
+    const struct lo_fhandle *fhandle,
+    int (*get_ids)(struct lo_key *, const void *),
+    const void *get_ids_opaque)
 {
     struct lo_inode *p = NULL;
-    struct lo_key ids_key = {
-        .ino = st->st_ino,
-        .dev = st->st_dev,
-        .mnt_id = mnt_id,
-    };
+    struct lo_key ids_key;
 
     pthread_mutex_lock(&lo->mutex);
     if (fhandle) {
         p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
     }
-    if (!p) {
+    if (!p && get_ids(&ids_key, get_ids_opaque) == 0) {
         p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
         /*
          * When we had to fall back to looking up an inode by its
@@ -1392,6 +1393,36 @@ static struct lo_inode *lo_find(struct lo_data *lo,
     return p;
 }
 
+struct lo_find_get_ids_key_opaque {
+    const struct stat *st;
+    uint64_t mnt_id;
+};
+
+static int lo_find_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lo_find_get_ids_key_opaque *stat_info = opaque;
+
+    *ids_key = (struct lo_key){
+        .ino = stat_info->st->st_ino,
+        .dev = stat_info->st->st_dev,
+        .mnt_id = stat_info->mnt_id,
+    };
+
+    return 0;
+}
+
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
+{
+    const struct lo_find_get_ids_key_opaque stat_info = {
+        .st = st,
+        .mnt_id = mnt_id,
+    };
+
+    return lo_do_find(lo, fhandle, lo_find_get_ids_key, &stat_info);
+}
+
 /* value_destroy_func for posix_locks GHashTable */
 static void posix_locks_value_destroy(gpointer data)
 {
@@ -1930,14 +1961,41 @@ out:
     fuse_reply_err(req, saverr);
 }
 
+struct lookup_name_get_ids_key_opaque {
+    struct lo_data *lo;
+    int parent_fd;
+    const char *name;
+};
+
+static int lookup_name_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lookup_name_get_ids_key_opaque *stat_params = opaque;
+    uint64_t mnt_id;
+    struct stat attr;
+    int res;
+
+    res = do_statx(stat_params->lo, stat_params->parent_fd, stat_params->name,
+                   &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    if (res < 0) {
+        return -errno;
+    }
+
+    *ids_key = (struct lo_key){
+        .ino = attr.st_ino,
+        .dev = attr.st_dev,
+        .mnt_id = mnt_id,
+    };
+
+    return 0;
+}
+
 /* Increments nlookup and caller must release refcount using lo_inode_put() */
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
     g_auto(TempFd) dir_path_fd = TEMP_FD_INIT;
     int res;
-    uint64_t mnt_id;
-    struct stat attr;
+    struct lookup_name_get_ids_key_opaque stat_params;
     struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
@@ -1955,13 +2013,12 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     fh = get_file_handle(lo, dir_path_fd.fd, name, NULL);
     /* Ignore errors, this is just an optional key for the lookup */
 
-    res = do_statx(lo, dir_path_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW,
-                   &mnt_id);
-    if (res == -1) {
-        goto out;
-    }
-
-    inode = lo_find(lo, fh, &attr, mnt_id);
+    stat_params = (struct lookup_name_get_ids_key_opaque){
+        .lo = lo,
+        .parent_fd = dir_path_fd.fd,
+        .name = name,
+    };
+    inode = lo_do_find(lo, fh, lookup_name_get_ids_key, &stat_params);
     release_file_handle(lo, fh, false);
 
 out:
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open
  2021-09-16  8:40 ` [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open Hanna Reitz
@ 2021-10-18 17:07   ` Vivek Goyal
  2021-10-20  9:04     ` Hanna Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-18 17:07 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:34AM +0200, Hanna Reitz wrote:
> File handles are specific to mounts, and so name_to_handle_at() returns
> the respective mount ID.  However, open_by_handle_at() is not content
> with an ID, it wants a file descriptor for some inode on the mount,
> which we have to open.
> 
> We want to use /proc/self/mountinfo to find the mounts' root directories
> so we can open them and pass the respective FDs to open_by_handle_at().
> (We need to use the root directory, because we want the inode belonging
> to every mount FD be deletable.  Before the root directory can be
> deleted, all entries within must have been closed, and so when it is
> deleted, there should not be any file handles left that need its FD as
> their mount FD.  Thus, we can then close that FD and the inode can be
> deleted.[1])
> 
> That is why we need to open /proc/self/mountinfo so that we can use it
> to translate mount IDs into root directory paths.  We have to open it
> after setup_mounts() was called, because if we try to open it before, it
> will appear as an empty file after setup_mounts().
> 
> [1] Note that in practice, you still cannot delete the mount root
> directory.  It is a mount point on the host, after all, and mount points
> cannot be deleted.  But by using the mount point as the mount FD, we
> will at least not hog any actually deletable inodes.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 40 ++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 38b2af8599..6511a6acb4 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -172,6 +172,8 @@ struct lo_data {
>  
>      /* An O_PATH file descriptor to /proc/self/fd/ */
>      int proc_self_fd;
> +    /* A read-only FILE pointer for /proc/self/mountinfo */
> +    FILE *mountinfo_fp;
>      int user_killpriv_v2, killpriv_v2;
>      /* If set, virtiofsd is responsible for setting umask during creation */
>      bool change_umask;
> @@ -3718,6 +3720,19 @@ static void setup_chroot(struct lo_data *lo)
>  static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>                            bool enable_syslog)
>  {
> +    int proc_self, mountinfo_fd;
> +    int saverr;
> +
> +    /*
> +     * Open /proc/self before we pivot to the new root so we can still
> +     * open /proc/self/mountinfo afterwards
> +     */
> +    proc_self = open("/proc/self", O_PATH);
> +    if (proc_self < 0) {
> +        fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self: %m; "
> +                 "will not be able to use file handles\n");
> +    }
> +

Hi Hanna,

Should we open /proc/self and /proc/self/mountinfo only if user wants
to file handle. We have already parsed options by now so we know.

Also, if user asked for file handles, and we can't open /proc/self or
/proc/self/mountinfo successfully, I would think we should error out
and not continue (instead of just log it and continue).

That seems to be general theme. If user asked for a feature and if
we can't enable it, we error out and let user retry without that
particular feature.

>      if (lo->sandbox == SANDBOX_NAMESPACE) {
>          setup_namespaces(lo, se);
>          setup_mounts(lo->source);
> @@ -3725,6 +3740,31 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>          setup_chroot(lo);
>      }
>  
> +    /*
> +     * Opening /proc/self/mountinfo before the umount2() call in
> +     * setup_mounts() leads to the file appearing empty.  That is why
> +     * we defer opening it until here.
> +     */
> +    lo->mountinfo_fp = NULL;
> +    if (proc_self >= 0) {
> +        mountinfo_fd = openat(proc_self, "mountinfo", O_RDONLY);
> +        if (mountinfo_fd < 0) {
> +            saverr = errno;
> +        } else if (mountinfo_fd >= 0) {
> +            lo->mountinfo_fp = fdopen(mountinfo_fd, "r");
> +            if (!lo->mountinfo_fp) {
> +                saverr = errno;
> +                close(mountinfo_fd);
> +            }
> +        }
> +        if (!lo->mountinfo_fp) {
> +            fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self/mountinfo: "
> +                     "%s; will not be able to use file handles\n",
> +                     strerror(saverr));
> +        }
> +        close(proc_self);
> +    }
> +

Above code couple probably be moved in a helper function. Makes it
easier to read setup_sandbox(). Same here, open mountinfo only if
user wants file handle support and error out if file handle support
can't be enabled.

Thanks
Vivek
>      setup_seccomp(enable_syslog);
>      setup_capabilities(g_strdup(lo->modcaps));
>  }
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region
  2021-09-16  8:40 ` [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region Hanna Reitz
@ 2021-10-18 17:20   ` Vivek Goyal
  2021-10-20  9:11     ` Hanna Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-18 17:20 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:35AM +0200, Hanna Reitz wrote:
> We only need to drop/switch our credentials for the (f)setxattr() call
> alone, not for the openat() or fchdir() around it.
> 
> (Right now, this may not be that big of a problem, but with inodes being
> identified by file handles instead of an O_PATH fd, we will need
> open_by_handle_at() calls here, which is really fickle when it comes to
> credentials being dropped.)
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 6511a6acb4..b43afdfbd3 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -3123,6 +3123,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>      bool switched_creds = false;
>      bool cap_fsetid_dropped = false;
>      struct lo_cred old = {};
> +    bool changed_cwd = false;
>  
>      if (block_xattr(lo, in_name)) {
>          fuse_reply_err(req, EOPNOTSUPP);
> @@ -3158,6 +3159,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>               ", name=%s value=%s size=%zd)\n", ino, name, value, size);
>  
>      sprintf(procname, "%i", inode->fd);
> +    /*
> +     * We can only open regular files or directories.  If the inode is
> +     * something else, we have to enter /proc/self/fd and use
> +     * setxattr() on the link's filename there.
> +     */
> +    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        if (fd < 0) {
> +            saverr = errno;
> +            goto out;
> +        }
> +    } else {
> +        /* fchdir should not fail here */
> +        FCHDIR_NOFAIL(lo->proc_self_fd);
> +        /* Set flag so the clean-up path will chdir back */
> +        changed_cwd = true;

Is there a need to move FCHDIR_NOFAIL() call earlier too? I am assuming
this will not be impacted by file handle stuff. So we probably could
leave it in place. Easier to read.

Vivek

> +    }
> +
>      /*
>       * If we are setting posix access acl and if SGID needs to be
>       * cleared, then switch to caller's gid and drop CAP_FSETID
> @@ -3178,20 +3197,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>          }
>          switched_creds = true;
>      }
> -    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = errno;
> -            goto out;
> -        }
> +    if (fd >= 0) {
>          ret = fsetxattr(fd, name, value, size, flags);
>          saverr = ret == -1 ? errno : 0;
>      } else {
> -        /* fchdir should not fail here */
> -        FCHDIR_NOFAIL(lo->proc_self_fd);
>          ret = setxattr(procname, name, value, size, flags);
>          saverr = ret == -1 ? errno : 0;
> -        FCHDIR_NOFAIL(lo->root.fd);
>      }
>      if (switched_creds) {
>          if (cap_fsetid_dropped)
> @@ -3201,6 +3212,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>      }
>  
>  out:
> +    if (changed_cwd) {
> +        /* Change CWD back, fchdir should not fail here */
> +        FCHDIR_NOFAIL(lo->root.fd);
> +    }
> +
>      if (fd >= 0) {
>          close(fd);
>      }
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs
  2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
                   ` (11 preceding siblings ...)
  2021-09-16  8:40 ` [PATCH v4 12/12] virtiofsd: Add lazy lo_do_find() Hanna Reitz
@ 2021-10-18 18:08 ` Vivek Goyal
  12 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2021-10-18 18:08 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:33AM +0200, Hanna Reitz wrote:

[..]
> Second, I’ve renamed the TempFd objects to reflect what kind of FDs they
> contain; i.e. they’re no longer called “inode_fd” or “dir_fd”, but
> “path_fd”, “rw_fd”, or “dir_path_fd” instead.

This change is really helpful. Makes it easier to read code and figure
out which fd we are referring to.

Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/12] virtiofsd: Let lo_inode_open() return a TempFd
  2021-09-16  8:40 ` [PATCH v4 07/12] virtiofsd: Let lo_inode_open() " Hanna Reitz
@ 2021-10-18 19:18   ` Vivek Goyal
  2021-10-20  9:15     ` Hanna Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-18 19:18 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:40AM +0200, Hanna Reitz wrote:
> Strictly speaking, this is not necessary, because lo_inode_open() will
> always return a new FD owned by the caller, so TempFd.owned will always
> be true.
> 
> The auto-cleanup is nice, though.  Also, we get a more unified interface
> where you always get a TempFd when you need an FD for an lo_inode
> (regardless of whether it is an O_PATH FD or a non-O_PATH FD).
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 156 +++++++++++++++----------------
>  1 file changed, 75 insertions(+), 81 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 3bf20b8659..d257eda129 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -293,10 +293,8 @@ static void temp_fd_clear(TempFd *temp_fd)
>  /**
>   * Return an owned fd from *temp_fd that will not be closed when
>   * *temp_fd goes out of scope.
> - *
> - * (TODO: Remove __attribute__ once this is used.)
>   */
> -static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
> +static int temp_fd_steal(TempFd *temp_fd)
>  {
>      if (temp_fd->owned) {
>          temp_fd->owned = false;
> @@ -309,10 +307,8 @@ static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
>  /**
>   * Create a borrowing copy of an existing TempFd.  Note that *to is
>   * only valid as long as *from is valid.
> - *
> - * (TODO: Remove __attribute__ once this is used.)
>   */
> -static __attribute__((unused)) void temp_fd_copy(const TempFd *from, TempFd *to)
> +static void temp_fd_copy(const TempFd *from, TempFd *to)
>  {
>      *to = (TempFd) {
>          .fd = from->fd,
> @@ -689,9 +685,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
>   * when a malicious client opens special files such as block device nodes.
>   * Symlink inodes are also rejected since symlinks must already have been
>   * traversed on the client side.
> + *
> + * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
> + * otherwise.
>   */
>  static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> -                         int open_flags)
> +                         int open_flags, TempFd *tfd)
>  {
>      g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
>      int fd;
> @@ -710,7 +709,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>      if (fd < 0) {
>          return -errno;
>      }
> -    return fd;
> +
> +    *tfd = (TempFd) {
> +        .fd = fd,
> +        .owned = true,
> +    };
> +
> +    return 0;
>  }
>  
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> @@ -854,7 +859,8 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
>  static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>                         int valid, struct fuse_file_info *fi)
>  {
> -    g_auto(TempFd) path_fd = TEMP_FD_INIT;
> +    g_auto(TempFd) path_fd = TEMP_FD_INIT; /* at least an O_PATH fd */

What does atleast O_PATH fd mean?

> +    g_auto(TempFd) rw_fd = TEMP_FD_INIT; /* O_RDWR fd */
>      int saverr;
>      char procname[64];
>      struct lo_data *lo = lo_data(req);
> @@ -868,7 +874,15 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          return;
>      }
>  
> -    res = lo_inode_fd(inode, &path_fd);
> +    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
> +        /* We need an O_RDWR FD for ftruncate() */
> +        res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
> +        if (res >= 0) {
> +            temp_fd_copy(&rw_fd, &path_fd);

I am lost here. If lo_inode_open() failed, why are we calling this
temp_fd_copy()? path_fd is not even a valid fd yet.

Still beats me that why open rw_fd now instead of down in
FUSE_SET_ATTR_SIZE block. I think we had this discussion and you
had some reasons to move it up.

Vivek

> +        }
> +    } else {
> +        res = lo_inode_fd(inode, &path_fd);
> +    }
>      if (res < 0) {
>          saverr = -res;
>          goto out_err;
> @@ -916,18 +930,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              truncfd = fd;
>          } else {
> -            truncfd = lo_inode_open(lo, inode, O_RDWR);
> -            if (truncfd < 0) {
> -                saverr = -truncfd;
> -                goto out_err;
> -            }
> +            assert(rw_fd.fd >= 0);
> +            truncfd = rw_fd.fd;
>          }
>  
>          saverr = drop_security_capability(lo, truncfd);
>          if (saverr) {
> -            if (!fi) {
> -                close(truncfd);
> -            }
>              goto out_err;
>          }
>  
> @@ -935,9 +943,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>              res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
>              if (res != 0) {
>                  saverr = res;
> -                if (!fi) {
> -                    close(truncfd);
> -                }
>                  goto out_err;
>              }
>          }
> @@ -950,9 +955,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>                  fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
>              }
>          }
> -        if (!fi) {
> -            close(truncfd);
> -        }
>          if (res == -1) {
>              goto out_err;
>          }
> @@ -1840,11 +1842,13 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>  static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>                         struct fuse_file_info *fi)
>  {
> +    g_auto(TempFd) rd_fd = TEMP_FD_INIT;
>      int error = ENOMEM;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      struct lo_dirp *d = NULL;
>      int fd;
> +    int res;
>      ssize_t fh;
>  
>      inode = lo_inode(req, ino);
> @@ -1858,14 +1862,16 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>          goto out_err;
>      }
>  
> -    fd = lo_inode_open(lo, inode, O_RDONLY);
> -    if (fd < 0) {
> -        error = -fd;
> +    res = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
> +    if (res < 0) {
> +        error = -res;
>          goto out_err;
>      }
>  
> +    fd = temp_fd_steal(&rd_fd);
>      d->dp = fdopendir(fd);
>      if (d->dp == NULL) {
> +        close(fd);
>          goto out_errno;
>      }
>  
> @@ -1895,8 +1901,6 @@ out_err:
>      if (d) {
>          if (d->dp) {
>              closedir(d->dp);
> -        } else if (fd != -1) {
> -            close(fd);
>          }
>          free(d);
>      }
> @@ -2096,6 +2100,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
>  static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
>                        int existing_fd, struct fuse_file_info *fi)
>  {
> +    g_auto(TempFd) opened_fd = TEMP_FD_INIT;
>      ssize_t fh;
>      int fd = existing_fd;
>      int err;
> @@ -2112,16 +2117,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
>              }
>          }
>  
> -        fd = lo_inode_open(lo, inode, fi->flags);
> +        err = lo_inode_open(lo, inode, fi->flags, &opened_fd);
>  
>          if (cap_fsetid_dropped) {
>              if (gain_effective_cap("FSETID")) {
>                  fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
>              }
>          }
> -        if (fd < 0) {
> -            return -fd;
> +        if (err < 0) {
> +            return -err;
>          }
> +        fd = temp_fd_steal(&opened_fd);
> +
>          if (fi->flags & (O_TRUNC)) {
>              int err = drop_security_capability(lo, fd);
>              if (err) {
> @@ -2231,8 +2238,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>                                                        uint64_t lock_owner,
>                                                        pid_t pid, int *err)
>  {
> +    g_auto(TempFd) rw_fd = TEMP_FD_INIT;
>      struct lo_inode_plock *plock;
> -    int fd;
> +    int res;
>  
>      plock =
>          g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
> @@ -2249,15 +2257,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
>  
>      /* Open another instance of file which can be used for ofd locks. */
>      /* TODO: What if file is not writable? */
> -    fd = lo_inode_open(lo, inode, O_RDWR);
> -    if (fd < 0) {
> -        *err = -fd;
> +    res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
> +    if (res < 0) {
> +        *err = -res;
>          free(plock);
>          return NULL;
>      }
>  
>      plock->lock_owner = lock_owner;
> -    plock->fd = fd;
> +    plock->fd = temp_fd_steal(&rw_fd);
>      g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
>                          plock);
>      return plock;
> @@ -2473,6 +2481,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
>  static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>                       struct fuse_file_info *fi)
>  {
> +    g_auto(TempFd) rw_fd = TEMP_FD_INIT;
>      struct lo_inode *inode = lo_inode(req, ino);
>      struct lo_data *lo = lo_data(req);
>      int res;
> @@ -2487,11 +2496,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>      }
>  
>      if (!fi) {
> -        fd = lo_inode_open(lo, inode, O_RDWR);
> -        if (fd < 0) {
> -            res = -fd;
> +        res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
> +        if (res < 0) {
> +            res = -res;
>              goto out;
>          }
> +        fd = rw_fd.fd;
>      } else {
>          fd = lo_fi_fd(req, fi);
>      }
> @@ -2501,9 +2511,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
>      } else {
>          res = fsync(fd) == -1 ? errno : 0;
>      }
> -    if (!fi) {
> -        close(fd);
> -    }
>  out:
>      lo_inode_put(lo, &inode);
>      fuse_reply_err(req, res);
> @@ -3065,7 +3072,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>      struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
> -    int fd = -1;
>  
>      if (block_xattr(lo, in_name)) {
>          fuse_reply_err(req, EOPNOTSUPP);
> @@ -3117,12 +3123,14 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>       * Otherwise, call fchdir() to avoid open().
>       */
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = lo_inode_open(lo, inode, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = -fd;
> +        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
> +
> +        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
> +        if (ret < 0) {
> +            saverr = -ret;
>              goto out;
>          }
> -        ret = fgetxattr(fd, name, value, size);
> +        ret = fgetxattr(rd_fd.fd, name, value, size);
>          saverr = ret == -1 ? errno : 0;
>      } else {
>          g_auto(TempFd) path_fd = TEMP_FD_INIT;
> @@ -3153,10 +3161,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>          fuse_reply_xattr(req, ret);
>      }
>  out_free:
> -    if (fd >= 0) {
> -        close(fd);
> -    }
> -
>      lo_inode_put(lo, &inode);
>      return;
>  
> @@ -3176,7 +3180,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>      struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
> -    int fd = -1;
>  
>      inode = lo_inode(req, ino);
>      if (!inode) {
> @@ -3200,12 +3203,14 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>      }
>  
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = lo_inode_open(lo, inode, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = -fd;
> +        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
> +
> +        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
> +        if (ret < 0) {
> +            saverr = -ret;
>              goto out;
>          }
> -        ret = flistxattr(fd, value, size);
> +        ret = flistxattr(rd_fd.fd, value, size);
>          saverr = ret == -1 ? errno : 0;
>      } else {
>          g_auto(TempFd) path_fd = TEMP_FD_INIT;
> @@ -3294,10 +3299,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>          fuse_reply_xattr(req, ret);
>      }
>  out_free:
> -    if (fd >= 0) {
> -        close(fd);
> -    }
> -
>      lo_inode_put(lo, &inode);
>      return;
>  
> @@ -3312,14 +3313,14 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>                          const char *value, size_t size, int flags,
>                          uint32_t extra_flags)
>  {
> -    g_auto(TempFd) path_fd = TEMP_FD_INIT;
> +    g_auto(TempFd) path_fd = TEMP_FD_INIT; /* O_PATH fd */
> +    g_auto(TempFd) rd_fd = TEMP_FD_INIT; /* O_RDONLY fd */
>      const char *name;
>      char *mapped_name;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
> -    int fd = -1;
>      bool switched_creds = false;
>      bool cap_fsetid_dropped = false;
>      struct lo_cred old = {};
> @@ -3364,9 +3365,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>       * setxattr() on the link's filename there.
>       */
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = lo_inode_open(lo, inode, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = -fd;
> +        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
> +        if (ret < 0) {
> +            saverr = -ret;
>              goto out;
>          }
>      } else {
> @@ -3401,8 +3402,8 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>          }
>          switched_creds = true;
>      }
> -    if (fd >= 0) {
> -        ret = fsetxattr(fd, name, value, size, flags);
> +    if (rd_fd.fd >= 0) {
> +        ret = fsetxattr(rd_fd.fd, name, value, size, flags);
>          saverr = ret == -1 ? errno : 0;
>      } else {
>          char procname[64];
> @@ -3424,10 +3425,6 @@ out:
>          FCHDIR_NOFAIL(lo->root.fd);
>      }
>  
> -    if (fd >= 0) {
> -        close(fd);
> -    }
> -
>      lo_inode_put(lo, &inode);
>      g_free(mapped_name);
>      fuse_reply_err(req, saverr);
> @@ -3442,7 +3439,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
>      struct lo_inode *inode;
>      ssize_t ret;
>      int saverr;
> -    int fd = -1;
>  
>      if (block_xattr(lo, in_name)) {
>          fuse_reply_err(req, EOPNOTSUPP);
> @@ -3478,12 +3474,14 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
>               name);
>  
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = lo_inode_open(lo, inode, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = -fd;
> +        g_auto(TempFd) rd_fd = TEMP_FD_INIT;
> +
> +        ret = lo_inode_open(lo, inode, O_RDONLY, &rd_fd);
> +        if (ret < 0) {
> +            saverr = -ret;
>              goto out;
>          }
> -        ret = fremovexattr(fd, name);
> +        ret = fremovexattr(rd_fd.fd, name);
>          saverr = ret == -1 ? errno : 0;
>      } else {
>          g_auto(TempFd) path_fd = TEMP_FD_INIT;
> @@ -3502,10 +3500,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
>      }
>  
>  out:
> -    if (fd >= 0) {
> -        close(fd);
> -    }
> -
>      lo_inode_put(lo, &inode);
>      g_free(mapped_name);
>      fuse_reply_err(req, saverr);
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle
  2021-09-16  8:40 ` [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle Hanna Reitz
@ 2021-10-19 18:57   ` Vivek Goyal
  2021-10-20 10:00     ` Hanna Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-19 18:57 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:44AM +0200, Hanna Reitz wrote:
> When the inode_file_handles option is set, try to generate a file handle
> for new inodes instead of opening an O_PATH FD.
> 
> Being able to open these again will require CAP_DAC_READ_SEARCH, so
> setting this option will result in us taking that capability.
> 
> Generating a file handle returns the mount ID it is valid for.  Opening
> it will require an FD instead.  We have mount_fds to map an ID to an FD.
> get_file_handle() scans /proc/self/mountinfo to map mount IDs to their
> mount points, which we open to get the mount FD we need.  To verify that
> the resulting FD indeed represents the handle's mount ID, we use
> statx().  Therefore, using file handles requires statx() support.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  tools/virtiofsd/helper.c              |   3 +
>  tools/virtiofsd/passthrough_ll.c      | 297 ++++++++++++++++++++++++--
>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>  3 files changed, 289 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index a8295d975a..311f05c7ee 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
>             "                               default: no_allow_direct_io\n"
>             "    -o announce_submounts      Announce sub-mount points to the guest\n"
>             "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
> +           "    -o inode_file_handles      Use file handles to reference inodes\n"
> +           "                               instead of O_PATH file descriptors\n"
> +           "                               (adds +dac_read_search to modcaps)\n"
>             );
>  }
>  
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index b7d6aa7f9d..e86fad8b2f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -206,6 +206,8 @@ struct lo_data {
>      /* Maps (integer) mount IDs to lo_mount_fd objects */
>      GHashTable *mount_fds;
>      pthread_rwlock_t mount_fds_lock;
> +
> +    int inode_file_handles;
>  };
>  
>  /**
> @@ -262,6 +264,10 @@ static const struct fuse_opt lo_opts[] = {
>      { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
>      { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
>      { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
> +    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
> +    { "no_inode_file_handles",
> +      offsetof(struct lo_data, inode_file_handles),
> +      0 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -359,8 +365,15 @@ static void free_lo_mount_fd(gpointer data)
>   *
>   * Pass @drop_mount_fd_ref == true if and only if this handle has a
>   * strong reference to an lo_mount_fd object in the mount_fds hash
> - * table.  That is always the case for file handles stored in lo_inode
> - * objects.
> + * table, i.e. if this file handle has been returned by a
> + * get_file_handle() call where *can_open_handle was returned to be
> + * true.  (That is always the case for file handles stored in lo_inode
> + * objects, because those file handles must be open-able.)
> + *
> + * Conversely, pass @drop_mount_fd_ref == false if and only if this
> + * file handle has been returned by a get_file_handle() call where
> + * either NULL was passed for @can_open_handle, or where
> + * *can_open_handle was returned to be false.
>   */
>  static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
>                                  bool drop_mount_fd_ref)
> @@ -399,6 +412,196 @@ static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
>      g_free(fh);
>  }
>  
> +/**
> + * Generate a file handle for the given dirfd/name combination.
> + *
> + * If mount_fds does not yet contain an entry for the handle's mount
> + * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
> + * as the FD for that mount ID.  (That is the file that we have
> + * generated a handle for, so it should be representative for the
> + * mount ID.  However, to be sure (and to rule out races), we use
> + * statx() to verify that our assumption is correct.)
> + *
> + * Opening a mount FD can fail in various ways, and independently of
> + * whether generating a file handle was possible.  Many callers only
> + * care about getting a file handle for a lookup, though, and so do
> + * not necessarily need it to be usable.  (You need a valid mount FD
> + * for the handle to be usable.)
> + * *can_open_handle will be set to true if the file handle can be
> + * opened (i.e., we have a mount FD for it), and to false otherwise.
> + * By passing NULL for @can_open_handle, the caller indicates that
> + * they do not care about the handle being open-able, and so
> + * generating a mount FD will be skipped altogether.
> + *
> + * File handles must be freed with release_file_handle().
> + */
> +static struct lo_fhandle *get_file_handle(struct lo_data *lo,
> +                                          int dirfd, const char *name,
> +                                          bool *can_open_handle)
> +{
> +    /* We need statx() to verify the mount ID */
> +#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
> +    int root_path_fd = -1;
> +    int mount_fd = -1;
> +    struct lo_fhandle *fh = NULL;
> +    struct lo_mount_fd *mfd;
> +    int ret;
> +
> +    if (!lo->use_statx || !lo->inode_file_handles || !lo->mountinfo_fp) {
> +        goto fail_handle;
> +    }
> +
> +    fh = g_new0(struct lo_fhandle, 1);
> +
> +    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
> +    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
> +                            AT_EMPTY_PATH);
> +    if (ret < 0) {
> +        goto fail_handle;
> +    }
> +
> +    if (!can_open_handle) {
> +        /* No need to generate a mount FD if the caller does not care */
> +        return fh;
> +    }
> +
> +    if (pthread_rwlock_rdlock(&lo->mount_fds_lock)) {
> +        goto fail_mount_fd;
> +    }
> +
> +    mfd = g_hash_table_lookup(lo->mount_fds, GINT_TO_POINTER(fh->mount_id));
> +    if (mfd) {
> +        g_atomic_int_inc(&mfd->refcount);
> +    } else {
> +        char *mi_line = NULL;
> +        size_t mi_line_len = 0;
> +        char *mount_root = NULL;
> +        struct statx stx;
> +        char procname[64];
> +
> +        pthread_rwlock_unlock(&lo->mount_fds_lock);
> +
> +        rewind(lo->mountinfo_fp);
> +        while (!mount_root) {
> +            ssize_t read_count;
> +            int scan_count;
> +            int mount_id;
> +
> +            read_count = getline(&mi_line, &mi_line_len, lo->mountinfo_fp);
> +            if (read_count < 0) {
> +                break;
> +            }
> +
> +            scan_count = sscanf(mi_line, "%d %*d %*d:%*d %*s %ms",
> +                                &mount_id, &mount_root);
> +            if (scan_count != 2 || mount_id != fh->mount_id) {
> +                free(mount_root);
> +                mount_root = NULL;
> +            }
> +        }
> +        free(mi_line);
> +
> +        if (!mount_root) {
> +            goto fail_mount_fd;
> +        }
> +
> +        root_path_fd = open(mount_root, O_PATH);

So root_path_fd is basically O_PATH fd for mount point. And you want to
first open it O_PATH so that you can check that it is either regular
file or directory before opening it  O_RDONLY. Hmm.., I guess its little
more complicated but safe.

BTW, if O_RDONLY for mount point is called mount_fd, then calling O_PATH
fd mount_path_fd will probably be better. A minor nit. You can change it
if you end up respinning patches.

> +        free(mount_root);
> +        if (root_path_fd < 0) {
> +            goto fail_mount_fd;

We seem to have initialized fh already. In case of failure, fail_mount_fd,
should we free fh and set fh = NULL. Or is it intentional that you still
want to return fh even if mount_fd could not be opened. What's the use
case.

> +        }
> +
> +        ret = statx(root_path_fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> +                    STATX_TYPE | STATX_MNT_ID, &stx);
> +        if (ret < 0) {
> +            if (errno == ENOSYS) {
> +                lo->use_statx = false;
> +                fuse_log(FUSE_LOG_WARNING,
> +                         "statx() does not work: Will not be able to use file "
> +                         "handles for inodes\n");
> +            }
> +            goto fail_mount_fd;
> +        }
> +        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
> +            /*
> +             * Perhaps a TOCTTOU problem.  Just return a non-openable file
> +             * handle this time and retry for the next handle that we want to
> +             * generate for this mount.
> +             */
> +            goto fail_mount_fd;
> +        }
> +        if (!(stx.stx_mask & STATX_TYPE) ||
> +            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
> +        {
> +            /*
> +             * We must not open special files with anything but O_PATH, so we
> +             * cannot use this file for mount_fds.  (Note that filesystems can
> +             * have special files as their root node, so this case can happen.)
> +             * Just return a failure in such a case and let the lo_inode have
> +             * an O_PATH fd instead of a file handle.
> +             */
> +            goto fail_mount_fd;
> +        }
> +
> +        /* Now that we know this fd is safe to open, do it */
> +        snprintf(procname, sizeof(procname), "%i", root_path_fd);
> +        mount_fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        if (mount_fd < 0) {
> +            goto fail_mount_fd;
> +        }
> +
> +        if (pthread_rwlock_wrlock(&lo->mount_fds_lock)) {
> +            goto fail_mount_fd;
> +        }
> +
> +        /* Check again, might have changed */
> +        if (!g_hash_table_contains(lo->mount_fds,
> +                                   GINT_TO_POINTER(fh->mount_id))) {
> +            mfd = g_new(struct lo_mount_fd, 1);
> +
> +            *mfd = (struct lo_mount_fd) {
> +                .fd = mount_fd,
> +                .refcount = 1,
> +            };
> +            mount_fd = -1; /* reference moved to *mfd */
> +
> +            g_hash_table_insert(lo->mount_fds,
> +                                GINT_TO_POINTER(fh->mount_id),
> +                                mfd);
> +        }
> +    }
> +    pthread_rwlock_unlock(&lo->mount_fds_lock);
> +
> +    assert(can_open_handle != NULL);
> +    *can_open_handle = true;
> +
> +    goto out;
> +
> +fail_handle:
> +    release_file_handle(lo, fh, false);
> +    fh = NULL;
> +
> +fail_mount_fd:
> +    if (can_open_handle) {
> +        *can_open_handle = false;
> +    }
> +
> +out:
> +    if (root_path_fd >= 0) {
> +        close(root_path_fd);
> +    }
> +    if (mount_fd >= 0) {
> +        close(mount_fd);
> +    }
> +    return fh;
> +#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
> +    if (can_open_handle) {
> +        *can_open_handle = false;
> +    }
> +    return NULL;
> +#endif
> +}
> +
>  /**
>   * Open the given file handle with the given flags.
>   *
> @@ -1244,6 +1447,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>              return -1;
>          }
>          lo->use_statx = false;
> +        if (lo->inode_file_handles) {
> +            fuse_log(FUSE_LOG_WARNING,
> +                     "statx() does not work: Will not be able to use file "
> +                     "handles for inodes\n");
> +        }
>          /* fallback */
>      }
>  #endif
> @@ -1273,6 +1481,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    struct lo_fhandle *fh = NULL;
> +    bool can_open_handle = false;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1302,13 +1512,26 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
> -    if (newfd == -1) {
> -        goto out_err;
> +    fh = get_file_handle(lo, dir_path_fd.fd, name, &can_open_handle);
> +    if (!fh || !can_open_handle) {
> +        /*
> +         * If we will not be able to open the file handle again
> +         * (can_open_handle is false), open an FD that we can put into
> +         * lo_inode (in case we need to create a new lo_inode).
> +         */
> +        newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
> +        if (newfd == -1) {
> +            goto out_err;
> +        }
>      }
>  
> -    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> -                   &mnt_id);
> +    if (newfd >= 0) {
> +        res = do_statx(lo, newfd, "", &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    } else {
> +        res = do_statx(lo, dir_path_fd.fd, name, &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    }
>      if (res == -1) {
>          goto out_err;
>      }
> @@ -1318,9 +1541,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;

Can this FUSE_ATTR_SUBMOUNT check be racy w.r.t file handles. I mean
say we open the file handle, and before we call do_statx(), another
mount shows up on the directory in queustion. So stats now belong
to file in new mount and we will think it is a SUBMOUNT. So effectively
now we have fh belonging to old file but stats belonging to new file
in new mount?

>      }
>  
> -    inode = lo_find(lo, NULL, &e->attr, mnt_id);
> +    /*
> +     * Note that fh is always NULL if lo->inode_file_handles is false,
> +     * and so we will never do a lookup by file handle here, and
> +     * lo->inodes_by_handle will always remain empty.  We only need
> +     * this map when we do not have an O_PATH fd open for every
> +     * lo_inode, though, so if inode_file_handles is false, we do not
> +     * need that map anyway.
> +     */
> +    inode = lo_find(lo, fh, &e->attr, mnt_id);
>      if (inode) {
> -        close(newfd);
> +        if (newfd != -1) {
> +            close(newfd);
> +        }
>      } else {
>          inode = calloc(1, sizeof(struct lo_inode));
>          if (!inode) {
> @@ -1338,6 +1571,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        if (can_open_handle) {
> +            inode->fhandle = fh;
> +            fh = NULL; /* owned by the lo_inode now */
> +        }
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1349,6 +1586,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          pthread_mutex_lock(&lo->mutex);
>          inode->fuse_ino = lo_add_inode_mapping(req, inode);
>          g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
> +        if (inode->fhandle) {
> +            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
> +        }
>          pthread_mutex_unlock(&lo->mutex);
>      }
>      e->ino = inode->fuse_ino;
> @@ -1362,6 +1602,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>      lo_inode_put(lo, &dir);
>  
> +    release_file_handle(lo, fh, can_open_handle);

We transferred ownership of fh to inode (inode->fhandle). Should we be
releasing it now? What am I missing.

> +
>      fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
>               name, (unsigned long long)e->ino);
>  
> @@ -1373,6 +1615,7 @@ out:
>      if (newfd != -1) {
>          close(newfd);
>      }
> +    release_file_handle(lo, fh, can_open_handle);
>      lo_inode_put(lo, &inode);
>      lo_inode_put(lo, &dir);
>      return saverr;
> @@ -1695,6 +1938,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>      int res;
>      uint64_t mnt_id;
>      struct stat attr;
> +    struct lo_fhandle *fh;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *dir = lo_inode(req, parent);
>      struct lo_inode *inode = NULL;
> @@ -1708,13 +1952,17 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> +    fh = get_file_handle(lo, dir_path_fd.fd, name, NULL);
> +    /* Ignore errors, this is just an optional key for the lookup */
> +
>      res = do_statx(lo, dir_path_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW,
>                     &mnt_id);
>      if (res == -1) {
>          goto out;
>      }
>  
> -    inode = lo_find(lo, NULL, &attr, mnt_id);
> +    inode = lo_find(lo, fh, &attr, mnt_id);
> +    release_file_handle(lo, fh, false);
>  
>  out:
>      lo_inode_put(lo, &dir);
> @@ -1882,6 +2130,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>      if (!inode->nlookup) {
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes_by_ids, &inode->key);
> +        if (inode->fhandle) {
> +            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
> +        }
>          if (lo->posix_lock) {
>              if (g_hash_table_size(inode->posix_locks)) {
>                  fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> @@ -3978,8 +4229,11 @@ static void setup_mounts(const char *source)
>  /*
>   * Only keep capabilities in allowlist that are needed for file system operation
>   * The (possibly NULL) modcaps_in string passed in is free'd before exit.
> + *
> + * Passing true for cap_dac_read_search adds CAP_DAC_READ_SEARCH to the
> + * allowlist.
>   */
> -static void setup_capabilities(char *modcaps_in)
> +static void setup_capabilities(char *modcaps_in, bool cap_dac_read_search)
>  {
>      char *modcaps = modcaps_in;
>      pthread_mutex_lock(&cap.mutex);
> @@ -4012,6 +4266,17 @@ static void setup_capabilities(char *modcaps_in)
>          exit(1);
>      }
>  
> +    /*
> +     * If we need CAP_DAC_READ_SEARCH (for file handles), add that, too.
> +     */
> +    if (cap_dac_read_search &&
> +        capng_update(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
> +                     CAP_DAC_READ_SEARCH)) {
> +        fuse_log(FUSE_LOG_ERR, "%s: capng_update failed for "
> +                 "CAP_DAC_READ_SEARCH\n", __func__);
> +        exit(1);
> +    }
> +
>      /*
>       * The modcaps option is a colon separated list of caps,
>       * each preceded by either + or -.
> @@ -4158,7 +4423,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>      }
>  
>      setup_seccomp(enable_syslog);
> -    setup_capabilities(g_strdup(lo->modcaps));
> +    setup_capabilities(g_strdup(lo->modcaps), lo->inode_file_handles);
>  }
>  
>  /* Set the maximum number of open file descriptors */
> @@ -4498,6 +4763,14 @@ int main(int argc, char *argv[])
>  
>      lo.use_statx = true;
>  
> +#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
> +    if (lo.inode_file_handles) {
> +        fuse_log(FUSE_LOG_WARNING,
> +                 "No statx() or mount ID support: Will not be able to use file "
> +                 "handles for inodes\n");
> +    }

Again, I think we should error out if user asked for file handle support
explicitly and we can't enable it. But if we end up enabling by default,
it probably is fine to just log a message and not use it.

This begs the question what happens if filesystem does not support the
file handles. Ideally, I would think that we can error out. But for
submounts check will happen much later. For root mount atleast we
should be able to check at startup time and make sure file handles are
supported by filesystem.

Thanks
Vivek

> +#endif
> +
>      se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
>      if (se == NULL) {
>          goto err_out1;
> diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> index af04c638cb..ab4dc07e3f 100644
> --- a/tools/virtiofsd/passthrough_seccomp.c
> +++ b/tools/virtiofsd/passthrough_seccomp.c
> @@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
>      SCMP_SYS(mprotect),
>      SCMP_SYS(mremap),
>      SCMP_SYS(munmap),
> +    SCMP_SYS(name_to_handle_at),
>      SCMP_SYS(newfstatat),
>      SCMP_SYS(statx),
>      SCMP_SYS(open),
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-09-16  8:40 ` [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table Hanna Reitz
@ 2021-10-19 20:02   ` Vivek Goyal
  2021-10-20 10:02     ` Hanna Reitz
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-19 20:02 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
> 
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
> 
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
> 
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> 
> Unfortunately, we cannot rely on being able to generate file handles
> every time.  Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence, the inodes_by_ids
> entry is just a fallback.
> 
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
> 
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>  1 file changed, 65 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index bd8fc922ea..b7d6aa7f9d 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -186,7 +186,8 @@ struct lo_data {
>      int announce_submounts;
>      bool use_statx;
>      struct lo_inode root;
> -    GHashTable *inodes; /* protected by lo->mutex */
> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
>      struct lo_map dirp_map; /* protected by lo->mutex */
>      struct lo_map fd_map; /* protected by lo->mutex */
> @@ -275,8 +276,9 @@ static struct {
>  /* That we loaded cap-ng in the current thread from the saved */
>  static __thread bool cap_loaded = 0;
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id);
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id);
>  static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>                              char **out_name);
>  
> @@ -1143,18 +1145,40 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id)

> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id)
>  {
> -    struct lo_inode *p;
> -    struct lo_key key = {
> +    struct lo_inode *p = NULL;
> +    struct lo_key ids_key = {
>          .ino = st->st_ino,
>          .dev = st->st_dev,
>          .mnt_id = mnt_id,
>      };
>  
>      pthread_mutex_lock(&lo->mutex);
> -    p = g_hash_table_lookup(lo->inodes, &key);
> +    if (fhandle) {
> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> +    }
> +    if (!p) {
> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> +        /*
> +         * When we had to fall back to looking up an inode by its
> +         * inode ID, ensure that we hit an entry that has a valid file
> +         * descriptor.  Having an FD open means that the inode cannot
> +         * really be deleted until the FD is closed, so that the inode
> +         * ID remains valid until we evict our lo_inode.
> +         * With no FD open (and just a file handle), the inode can be
> +         * deleted while we still have our lo_inode, and so the inode
> +         * ID may be reused by a completely different new inode.  We
> +         * then must look up the lo_inode by file handle, because this
> +         * handle contains a generation ID to differentiate between
> +         * the old and the new inode.
> +         */
> +        if (p && p->fd == -1) {
> +            p = NULL;
> +        }

What happens in following scenario.

- Say I have a hard linked file foo.txt with link foo-link.txt.

- I lookup foo.txt. We generate file handle and add inode for foo.txt
  to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.

- Now later lookup for foo-link.txt happens. Say this time we can't
  generate file handle. When we try to lookup inode, lo_find() should
  return NULL. It will find inode by ids but not use it because inode
  was added using file handle and p->fd == -1. That means lookup
  for foo-link.txt will end up adding another inode, when it should
  not have?

Am I understanding it correctly?

Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open
  2021-10-18 17:07   ` Vivek Goyal
@ 2021-10-20  9:04     ` Hanna Reitz
  2021-10-20 18:25       ` Vivek Goyal
  0 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20  9:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 18.10.21 19:07, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 10:40:34AM +0200, Hanna Reitz wrote:
>> File handles are specific to mounts, and so name_to_handle_at() returns
>> the respective mount ID.  However, open_by_handle_at() is not content
>> with an ID, it wants a file descriptor for some inode on the mount,
>> which we have to open.
>>
>> We want to use /proc/self/mountinfo to find the mounts' root directories
>> so we can open them and pass the respective FDs to open_by_handle_at().
>> (We need to use the root directory, because we want the inode belonging
>> to every mount FD be deletable.  Before the root directory can be
>> deleted, all entries within must have been closed, and so when it is
>> deleted, there should not be any file handles left that need its FD as
>> their mount FD.  Thus, we can then close that FD and the inode can be
>> deleted.[1])
>>
>> That is why we need to open /proc/self/mountinfo so that we can use it
>> to translate mount IDs into root directory paths.  We have to open it
>> after setup_mounts() was called, because if we try to open it before, it
>> will appear as an empty file after setup_mounts().
>>
>> [1] Note that in practice, you still cannot delete the mount root
>> directory.  It is a mount point on the host, after all, and mount points
>> cannot be deleted.  But by using the mount point as the mount FD, we
>> will at least not hog any actually deletable inodes.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 40 ++++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 38b2af8599..6511a6acb4 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -172,6 +172,8 @@ struct lo_data {
>>   
>>       /* An O_PATH file descriptor to /proc/self/fd/ */
>>       int proc_self_fd;
>> +    /* A read-only FILE pointer for /proc/self/mountinfo */
>> +    FILE *mountinfo_fp;
>>       int user_killpriv_v2, killpriv_v2;
>>       /* If set, virtiofsd is responsible for setting umask during creation */
>>       bool change_umask;
>> @@ -3718,6 +3720,19 @@ static void setup_chroot(struct lo_data *lo)
>>   static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>>                             bool enable_syslog)
>>   {
>> +    int proc_self, mountinfo_fd;
>> +    int saverr;
>> +
>> +    /*
>> +     * Open /proc/self before we pivot to the new root so we can still
>> +     * open /proc/self/mountinfo afterwards
>> +     */
>> +    proc_self = open("/proc/self", O_PATH);
>> +    if (proc_self < 0) {
>> +        fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self: %m; "
>> +                 "will not be able to use file handles\n");
>> +    }
>> +
> Hi Hanna,
>
> Should we open /proc/self and /proc/self/mountinfo only if user wants
> to file handle. We have already parsed options by now so we know.

I didn’t think it would matter given that it wouldn’t have an adverse 
effect.  If we can’t open them (and I can’t imagine a case where we’d be 
unable to open them), the only result is a warning.

> Also, if user asked for file handles, and we can't open /proc/self or
> /proc/self/mountinfo successfully, I would think we should error out
> and not continue (instead of just log it and continue).

Well, that would break the assumption I had above.  Not that that’s 
really relevant, I just want to mention it.

File handles are a best effort in any case.  If they don’t work, we 
always fall back.  So I don’t know whether we must error out.

OTOH if we know they can never work, then perhaps it would be more 
sensible to error out.

FWIW I’ve ported the relevant v1..v4 changes to virtiofsd-rs, and there 
it errors out.  The error is unconditional, though, so even if you don’t 
request file handles, it’ll try to open mountinfo and exit on error.  I 
found that reasonable because I can’t imagine a case where opening 
/proc/self/fd would work, but /proc/self/mountinfo wouldn’t – and 
working around that would be a bit cumbersome (it would mean wrapping 
PassthroughFs.mount_fds in an Option<> and .unwrap()-ing it on every 
use, with a comment why that’s fine). Honestly, I’d prefer to wait until 
we get a bug report about a failure to open /proc/self/mountinfo.

> That seems to be general theme. If user asked for a feature and if
> we can't enable it, we error out and let user retry without that
> particular feature.
>
>>       if (lo->sandbox == SANDBOX_NAMESPACE) {
>>           setup_namespaces(lo, se);
>>           setup_mounts(lo->source);
>> @@ -3725,6 +3740,31 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>>           setup_chroot(lo);
>>       }
>>   
>> +    /*
>> +     * Opening /proc/self/mountinfo before the umount2() call in
>> +     * setup_mounts() leads to the file appearing empty.  That is why
>> +     * we defer opening it until here.
>> +     */
>> +    lo->mountinfo_fp = NULL;
>> +    if (proc_self >= 0) {
>> +        mountinfo_fd = openat(proc_self, "mountinfo", O_RDONLY);
>> +        if (mountinfo_fd < 0) {
>> +            saverr = errno;
>> +        } else if (mountinfo_fd >= 0) {
>> +            lo->mountinfo_fp = fdopen(mountinfo_fd, "r");
>> +            if (!lo->mountinfo_fp) {
>> +                saverr = errno;
>> +                close(mountinfo_fd);
>> +            }
>> +        }
>> +        if (!lo->mountinfo_fp) {
>> +            fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self/mountinfo: "
>> +                     "%s; will not be able to use file handles\n",
>> +                     strerror(saverr));
>> +        }
>> +        close(proc_self);
>> +    }
>> +
> Above code couple probably be moved in a helper function. Makes it
> easier to read setup_sandbox(). Same here, open mountinfo only if
> user wants file handle support and error out if file handle support
> can't be enabled.

Perhaps, but frankly I don’t see a need to keep setup_sandbox() 
readable.  AFAIU, we are planning to deprecate C virtiofsd anyway, so 
while it pains me to say something like this, we don’t need to keep it 
maintainable.

Now that I’ve opened an MR to bring the v1..v4 changes to virtiofsd-rs 
(https://gitlab.com/virtio-fs/virtiofsd-rs/-/merge_requests/41), I also 
don’t really see a justification for putting further development effort 
into bringing file handles to C virtiofsd.  Of course I’m still grateful 
for your review, and I’ll try to adapt it to virtiofsd-rs, but right now 
I don’t plan on sending a v5.

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region
  2021-10-18 17:20   ` Vivek Goyal
@ 2021-10-20  9:11     ` Hanna Reitz
  0 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20  9:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 18.10.21 19:20, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 10:40:35AM +0200, Hanna Reitz wrote:
>> We only need to drop/switch our credentials for the (f)setxattr() call
>> alone, not for the openat() or fchdir() around it.
>>
>> (Right now, this may not be that big of a problem, but with inodes being
>> identified by file handles instead of an O_PATH fd, we will need
>> open_by_handle_at() calls here, which is really fickle when it comes to
>> credentials being dropped.)
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
>>   1 file changed, 25 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 6511a6acb4..b43afdfbd3 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -3123,6 +3123,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>       bool switched_creds = false;
>>       bool cap_fsetid_dropped = false;
>>       struct lo_cred old = {};
>> +    bool changed_cwd = false;
>>   
>>       if (block_xattr(lo, in_name)) {
>>           fuse_reply_err(req, EOPNOTSUPP);
>> @@ -3158,6 +3159,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>                ", name=%s value=%s size=%zd)\n", ino, name, value, size);
>>   
>>       sprintf(procname, "%i", inode->fd);
>> +    /*
>> +     * We can only open regular files or directories.  If the inode is
>> +     * something else, we have to enter /proc/self/fd and use
>> +     * setxattr() on the link's filename there.
>> +     */
>> +    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
>> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
>> +        if (fd < 0) {
>> +            saverr = errno;
>> +            goto out;
>> +        }
>> +    } else {
>> +        /* fchdir should not fail here */
>> +        FCHDIR_NOFAIL(lo->proc_self_fd);
>> +        /* Set flag so the clean-up path will chdir back */
>> +        changed_cwd = true;
> Is there a need to move FCHDIR_NOFAIL() call earlier too? I am assuming
> this will not be impacted by file handle stuff. So we probably could
> leave it in place. Easier to read.

I wanted to limit the region where the creds are dropped to an absolute 
minimum, i.e. just around (f)setxattr().  I prefer this in general, not 
just because it breaks opening file handles, and so I wanted to pull out 
not just the openat(), but the fchdir() as well.

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 07/12] virtiofsd: Let lo_inode_open() return a TempFd
  2021-10-18 19:18   ` Vivek Goyal
@ 2021-10-20  9:15     ` Hanna Reitz
  0 siblings, 0 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20  9:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 18.10.21 21:18, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 10:40:40AM +0200, Hanna Reitz wrote:
>> Strictly speaking, this is not necessary, because lo_inode_open() will
>> always return a new FD owned by the caller, so TempFd.owned will always
>> be true.
>>
>> The auto-cleanup is nice, though.  Also, we get a more unified interface
>> where you always get a TempFd when you need an FD for an lo_inode
>> (regardless of whether it is an O_PATH FD or a non-O_PATH FD).
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 156 +++++++++++++++----------------
>>   1 file changed, 75 insertions(+), 81 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 3bf20b8659..d257eda129 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -293,10 +293,8 @@ static void temp_fd_clear(TempFd *temp_fd)
>>   /**
>>    * Return an owned fd from *temp_fd that will not be closed when
>>    * *temp_fd goes out of scope.
>> - *
>> - * (TODO: Remove __attribute__ once this is used.)
>>    */
>> -static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
>> +static int temp_fd_steal(TempFd *temp_fd)
>>   {
>>       if (temp_fd->owned) {
>>           temp_fd->owned = false;
>> @@ -309,10 +307,8 @@ static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
>>   /**
>>    * Create a borrowing copy of an existing TempFd.  Note that *to is
>>    * only valid as long as *from is valid.
>> - *
>> - * (TODO: Remove __attribute__ once this is used.)
>>    */
>> -static __attribute__((unused)) void temp_fd_copy(const TempFd *from, TempFd *to)
>> +static void temp_fd_copy(const TempFd *from, TempFd *to)
>>   {
>>       *to = (TempFd) {
>>           .fd = from->fd,
>> @@ -689,9 +685,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
>>    * when a malicious client opens special files such as block device nodes.
>>    * Symlink inodes are also rejected since symlinks must already have been
>>    * traversed on the client side.
>> + *
>> + * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
>> + * otherwise.
>>    */
>>   static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>> -                         int open_flags)
>> +                         int open_flags, TempFd *tfd)
>>   {
>>       g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
>>       int fd;
>> @@ -710,7 +709,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>>       if (fd < 0) {
>>           return -errno;
>>       }
>> -    return fd;
>> +
>> +    *tfd = (TempFd) {
>> +        .fd = fd,
>> +        .owned = true,
>> +    };
>> +
>> +    return 0;
>>   }
>>   
>>   static void lo_init(void *userdata, struct fuse_conn_info *conn)
>> @@ -854,7 +859,8 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
>>   static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>                          int valid, struct fuse_file_info *fi)
>>   {
>> -    g_auto(TempFd) path_fd = TEMP_FD_INIT;
>> +    g_auto(TempFd) path_fd = TEMP_FD_INIT; /* at least an O_PATH fd */
> What does atleast O_PATH fd mean?

It means it might as well be an O_RDWR FD.  When we open an O_RDWR FD, 
it’s pointless to open an O_PATH FD, too, because we can use the O_RDWR 
FD everywhere where we’d need an O_PATH FD.  Hence in this case, we open 
rw_fd and copy it to path_fd.

Users still use rw_fd and path_fd according to what they need.

>
>> +    g_auto(TempFd) rw_fd = TEMP_FD_INIT; /* O_RDWR fd */
>>       int saverr;
>>       char procname[64];
>>       struct lo_data *lo = lo_data(req);
>> @@ -868,7 +874,15 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>           return;
>>       }
>>   
>> -    res = lo_inode_fd(inode, &path_fd);
>> +    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
>> +        /* We need an O_RDWR FD for ftruncate() */
>> +        res = lo_inode_open(lo, inode, O_RDWR, &rw_fd);
>> +        if (res >= 0) {
>> +            temp_fd_copy(&rw_fd, &path_fd);
> I am lost here. If lo_inode_open() failed, why are we calling this
> temp_fd_copy()? path_fd is not even a valid fd yet.

We’re calling it on success.

> Still beats me that why open rw_fd now instead of down in
> FUSE_SET_ATTR_SIZE block. I think we had this discussion and you
> had some reasons to move it up.

Because it’s pointless overhead to open two FDs.

Before file handles, getting the O_PATH FD was a trivial operation. If 
we needed an O_RDWR FD later, we’d open it later.  No duplicate work 
involved.

With file handles, getting the O_PATH FD may mean opening a new FD. 
Opening another FD from the same file handle, just with different flags, 
later on would be strange, because we could have just opened the FD as 
O_RDWR right from the start.

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle
  2021-10-19 18:57   ` Vivek Goyal
@ 2021-10-20 10:00     ` Hanna Reitz
  2021-10-20 18:53       ` Vivek Goyal
  0 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20 10:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 19.10.21 20:57, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 10:40:44AM +0200, Hanna Reitz wrote:
>> When the inode_file_handles option is set, try to generate a file handle
>> for new inodes instead of opening an O_PATH FD.
>>
>> Being able to open these again will require CAP_DAC_READ_SEARCH, so
>> setting this option will result in us taking that capability.
>>
>> Generating a file handle returns the mount ID it is valid for.  Opening
>> it will require an FD instead.  We have mount_fds to map an ID to an FD.
>> get_file_handle() scans /proc/self/mountinfo to map mount IDs to their
>> mount points, which we open to get the mount FD we need.  To verify that
>> the resulting FD indeed represents the handle's mount ID, we use
>> statx().  Therefore, using file handles requires statx() support.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tools/virtiofsd/helper.c              |   3 +
>>   tools/virtiofsd/passthrough_ll.c      | 297 ++++++++++++++++++++++++--
>>   tools/virtiofsd/passthrough_seccomp.c |   1 +
>>   3 files changed, 289 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
>> index a8295d975a..311f05c7ee 100644
>> --- a/tools/virtiofsd/helper.c
>> +++ b/tools/virtiofsd/helper.c
>> @@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
>>              "                               default: no_allow_direct_io\n"
>>              "    -o announce_submounts      Announce sub-mount points to the guest\n"
>>              "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
>> +           "    -o inode_file_handles      Use file handles to reference inodes\n"
>> +           "                               instead of O_PATH file descriptors\n"
>> +           "                               (adds +dac_read_search to modcaps)\n"
>>              );
>>   }
>>   
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index b7d6aa7f9d..e86fad8b2f 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -206,6 +206,8 @@ struct lo_data {
>>       /* Maps (integer) mount IDs to lo_mount_fd objects */
>>       GHashTable *mount_fds;
>>       pthread_rwlock_t mount_fds_lock;
>> +
>> +    int inode_file_handles;
>>   };
>>   
>>   /**
>> @@ -262,6 +264,10 @@ static const struct fuse_opt lo_opts[] = {
>>       { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
>>       { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
>>       { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
>> +    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
>> +    { "no_inode_file_handles",
>> +      offsetof(struct lo_data, inode_file_handles),
>> +      0 },
>>       FUSE_OPT_END
>>   };
>>   static bool use_syslog = false;
>> @@ -359,8 +365,15 @@ static void free_lo_mount_fd(gpointer data)
>>    *
>>    * Pass @drop_mount_fd_ref == true if and only if this handle has a
>>    * strong reference to an lo_mount_fd object in the mount_fds hash
>> - * table.  That is always the case for file handles stored in lo_inode
>> - * objects.
>> + * table, i.e. if this file handle has been returned by a
>> + * get_file_handle() call where *can_open_handle was returned to be
>> + * true.  (That is always the case for file handles stored in lo_inode
>> + * objects, because those file handles must be open-able.)
>> + *
>> + * Conversely, pass @drop_mount_fd_ref == false if and only if this
>> + * file handle has been returned by a get_file_handle() call where
>> + * either NULL was passed for @can_open_handle, or where
>> + * *can_open_handle was returned to be false.
>>    */
>>   static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
>>                                   bool drop_mount_fd_ref)
>> @@ -399,6 +412,196 @@ static void release_file_handle(struct lo_data *lo, struct lo_fhandle *fh,
>>       g_free(fh);
>>   }
>>   
>> +/**
>> + * Generate a file handle for the given dirfd/name combination.
>> + *
>> + * If mount_fds does not yet contain an entry for the handle's mount
>> + * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
>> + * as the FD for that mount ID.  (That is the file that we have
>> + * generated a handle for, so it should be representative for the
>> + * mount ID.  However, to be sure (and to rule out races), we use
>> + * statx() to verify that our assumption is correct.)
>> + *
>> + * Opening a mount FD can fail in various ways, and independently of
>> + * whether generating a file handle was possible.  Many callers only
>> + * care about getting a file handle for a lookup, though, and so do
>> + * not necessarily need it to be usable.  (You need a valid mount FD
>> + * for the handle to be usable.)
>> + * *can_open_handle will be set to true if the file handle can be
>> + * opened (i.e., we have a mount FD for it), and to false otherwise.
>> + * By passing NULL for @can_open_handle, the caller indicates that
>> + * they do not care about the handle being open-able, and so
>> + * generating a mount FD will be skipped altogether.
>> + *
>> + * File handles must be freed with release_file_handle().
>> + */
>> +static struct lo_fhandle *get_file_handle(struct lo_data *lo,
>> +                                          int dirfd, const char *name,
>> +                                          bool *can_open_handle)
>> +{
>> +    /* We need statx() to verify the mount ID */
>> +#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
>> +    int root_path_fd = -1;
>> +    int mount_fd = -1;
>> +    struct lo_fhandle *fh = NULL;
>> +    struct lo_mount_fd *mfd;
>> +    int ret;
>> +
>> +    if (!lo->use_statx || !lo->inode_file_handles || !lo->mountinfo_fp) {
>> +        goto fail_handle;
>> +    }
>> +
>> +    fh = g_new0(struct lo_fhandle, 1);
>> +
>> +    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
>> +    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
>> +                            AT_EMPTY_PATH);
>> +    if (ret < 0) {
>> +        goto fail_handle;
>> +    }
>> +
>> +    if (!can_open_handle) {
>> +        /* No need to generate a mount FD if the caller does not care */
>> +        return fh;
>> +    }
>> +
>> +    if (pthread_rwlock_rdlock(&lo->mount_fds_lock)) {
>> +        goto fail_mount_fd;
>> +    }
>> +
>> +    mfd = g_hash_table_lookup(lo->mount_fds, GINT_TO_POINTER(fh->mount_id));
>> +    if (mfd) {
>> +        g_atomic_int_inc(&mfd->refcount);
>> +    } else {
>> +        char *mi_line = NULL;
>> +        size_t mi_line_len = 0;
>> +        char *mount_root = NULL;
>> +        struct statx stx;
>> +        char procname[64];
>> +
>> +        pthread_rwlock_unlock(&lo->mount_fds_lock);
>> +
>> +        rewind(lo->mountinfo_fp);
>> +        while (!mount_root) {
>> +            ssize_t read_count;
>> +            int scan_count;
>> +            int mount_id;
>> +
>> +            read_count = getline(&mi_line, &mi_line_len, lo->mountinfo_fp);
>> +            if (read_count < 0) {
>> +                break;
>> +            }
>> +
>> +            scan_count = sscanf(mi_line, "%d %*d %*d:%*d %*s %ms",
>> +                                &mount_id, &mount_root);
>> +            if (scan_count != 2 || mount_id != fh->mount_id) {
>> +                free(mount_root);
>> +                mount_root = NULL;
>> +            }
>> +        }
>> +        free(mi_line);
>> +
>> +        if (!mount_root) {
>> +            goto fail_mount_fd;
>> +        }
>> +
>> +        root_path_fd = open(mount_root, O_PATH);
> So root_path_fd is basically O_PATH fd for mount point. And you want to
> first open it O_PATH so that you can check that it is either regular
> file or directory before opening it  O_RDONLY. Hmm.., I guess its little
> more complicated but safe.
>
> BTW, if O_RDONLY for mount point is called mount_fd, then calling O_PATH
> fd mount_path_fd will probably be better. A minor nit. You can change it
> if you end up respinning patches.
>
>> +        free(mount_root);
>> +        if (root_path_fd < 0) {
>> +            goto fail_mount_fd;
> We seem to have initialized fh already. In case of failure, fail_mount_fd,
> should we free fh and set fh = NULL. Or is it intentional that you still
> want to return fh even if mount_fd could not be opened. What's the use
> case.

The use case is that in this version, get_file_handle() is allowed to 
return un-openable file handles.  If opening the mount FD fails, the 
file handle is still returned, but *can_open_handle is set to false.

This is useful for lookups, because this way we can always look up by 
file handle unless generating the file handle itself fails.

(In practice, this was probably never an issue, because when you do such 
a lookup and find an existing lo_inode with that file handle, there must 
be an accompanying mount FD already, so the whole mount FD generation 
code in get_file_handle() is skipped, and no errors can arise from it.  
However, this was difficult to understand, as shown by some discussions 
I believed we had.)

(By the way, in the Rust version, there’s a strict type difference 
between file handles and openable file handles.  Turning a file handle 
into an openable file handle requires an extra function call that 
contains the whole mount FD code.  I believe that might have also been a 
nicer model for C virtiofsd, but I just didn’t think of it until Rust 
forced me to...)

>> +        }
>> +
>> +        ret = statx(root_path_fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
>> +                    STATX_TYPE | STATX_MNT_ID, &stx);
>> +        if (ret < 0) {
>> +            if (errno == ENOSYS) {
>> +                lo->use_statx = false;
>> +                fuse_log(FUSE_LOG_WARNING,
>> +                         "statx() does not work: Will not be able to use file "
>> +                         "handles for inodes\n");
>> +            }
>> +            goto fail_mount_fd;
>> +        }
>> +        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
>> +            /*
>> +             * Perhaps a TOCTTOU problem.  Just return a non-openable file
>> +             * handle this time and retry for the next handle that we want to
>> +             * generate for this mount.
>> +             */
>> +            goto fail_mount_fd;
>> +        }
>> +        if (!(stx.stx_mask & STATX_TYPE) ||
>> +            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
>> +        {
>> +            /*
>> +             * We must not open special files with anything but O_PATH, so we
>> +             * cannot use this file for mount_fds.  (Note that filesystems can
>> +             * have special files as their root node, so this case can happen.)
>> +             * Just return a failure in such a case and let the lo_inode have
>> +             * an O_PATH fd instead of a file handle.
>> +             */
>> +            goto fail_mount_fd;
>> +        }
>> +
>> +        /* Now that we know this fd is safe to open, do it */
>> +        snprintf(procname, sizeof(procname), "%i", root_path_fd);
>> +        mount_fd = openat(lo->proc_self_fd, procname, O_RDONLY);
>> +        if (mount_fd < 0) {
>> +            goto fail_mount_fd;
>> +        }
>> +
>> +        if (pthread_rwlock_wrlock(&lo->mount_fds_lock)) {
>> +            goto fail_mount_fd;
>> +        }
>> +
>> +        /* Check again, might have changed */
>> +        if (!g_hash_table_contains(lo->mount_fds,
>> +                                   GINT_TO_POINTER(fh->mount_id))) {
>> +            mfd = g_new(struct lo_mount_fd, 1);
>> +
>> +            *mfd = (struct lo_mount_fd) {
>> +                .fd = mount_fd,
>> +                .refcount = 1,
>> +            };
>> +            mount_fd = -1; /* reference moved to *mfd */
>> +
>> +            g_hash_table_insert(lo->mount_fds,
>> +                                GINT_TO_POINTER(fh->mount_id),
>> +                                mfd);
>> +        }
>> +    }
>> +    pthread_rwlock_unlock(&lo->mount_fds_lock);
>> +
>> +    assert(can_open_handle != NULL);
>> +    *can_open_handle = true;
>> +
>> +    goto out;
>> +
>> +fail_handle:
>> +    release_file_handle(lo, fh, false);
>> +    fh = NULL;
>> +
>> +fail_mount_fd:
>> +    if (can_open_handle) {
>> +        *can_open_handle = false;
>> +    }
>> +
>> +out:
>> +    if (root_path_fd >= 0) {
>> +        close(root_path_fd);
>> +    }
>> +    if (mount_fd >= 0) {
>> +        close(mount_fd);
>> +    }
>> +    return fh;
>> +#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
>> +    if (can_open_handle) {
>> +        *can_open_handle = false;
>> +    }
>> +    return NULL;
>> +#endif
>> +}
>> +
>>   /**
>>    * Open the given file handle with the given flags.
>>    *
>> @@ -1244,6 +1447,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>>               return -1;
>>           }
>>           lo->use_statx = false;
>> +        if (lo->inode_file_handles) {
>> +            fuse_log(FUSE_LOG_WARNING,
>> +                     "statx() does not work: Will not be able to use file "
>> +                     "handles for inodes\n");
>> +        }
>>           /* fallback */
>>       }
>>   #endif
>> @@ -1273,6 +1481,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *inode = NULL;
>>       struct lo_inode *dir = lo_inode(req, parent);
>> +    struct lo_fhandle *fh = NULL;
>> +    bool can_open_handle = false;
>>   
>>       if (inodep) {
>>           *inodep = NULL; /* in case there is an error */
>> @@ -1302,13 +1512,26 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           goto out;
>>       }
>>   
>> -    newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
>> -    if (newfd == -1) {
>> -        goto out_err;
>> +    fh = get_file_handle(lo, dir_path_fd.fd, name, &can_open_handle);
>> +    if (!fh || !can_open_handle) {
>> +        /*
>> +         * If we will not be able to open the file handle again
>> +         * (can_open_handle is false), open an FD that we can put into
>> +         * lo_inode (in case we need to create a new lo_inode).
>> +         */
>> +        newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
>> +        if (newfd == -1) {
>> +            goto out_err;
>> +        }
>>       }
>>   
>> -    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
>> -                   &mnt_id);
>> +    if (newfd >= 0) {
>> +        res = do_statx(lo, newfd, "", &e->attr,
>> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
>> +    } else {
>> +        res = do_statx(lo, dir_path_fd.fd, name, &e->attr,
>> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
>> +    }
>>       if (res == -1) {
>>           goto out_err;
>>       }
>> @@ -1318,9 +1541,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> Can this FUSE_ATTR_SUBMOUNT check be racy w.r.t file handles. I mean
> say we open the file handle, and before we call do_statx(), another
> mount shows up on the directory in queustion. So stats now belong
> to file in new mount and we will think it is a SUBMOUNT. So effectively
> now we have fh belonging to old file but stats belonging to new file
> in new mount?

Yes.  Not just the submount, but the whole stat information, so also the 
file type that goes into the lo_inode.

I thought this wasn’t too bad, though now I don’t really know why. 
Perhaps it was just how I started the implementation and I never could 
get myself to care enough (not good, I know).  Thanks for making me care! :)

We could theoretically open an O_PATH FD from the file handle to get the 
stat information from it, but that wouldn’t work for un-openable file 
handles.

So I think the best is to open an O_PATH FD unconditionally first, and 
then generate the file handle from it.  Then we can stat the FD.  I’ll 
add this to the virtiofsd-rs MR.

>>       }
>>   
>> -    inode = lo_find(lo, NULL, &e->attr, mnt_id);
>> +    /*
>> +     * Note that fh is always NULL if lo->inode_file_handles is false,
>> +     * and so we will never do a lookup by file handle here, and
>> +     * lo->inodes_by_handle will always remain empty.  We only need
>> +     * this map when we do not have an O_PATH fd open for every
>> +     * lo_inode, though, so if inode_file_handles is false, we do not
>> +     * need that map anyway.
>> +     */
>> +    inode = lo_find(lo, fh, &e->attr, mnt_id);
>>       if (inode) {
>> -        close(newfd);
>> +        if (newfd != -1) {
>> +            close(newfd);
>> +        }
>>       } else {
>>           inode = calloc(1, sizeof(struct lo_inode));
>>           if (!inode) {
>> @@ -1338,6 +1571,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>   
>>           inode->nlookup = 1;
>>           inode->fd = newfd;
>> +        if (can_open_handle) {
>> +            inode->fhandle = fh;
>> +            fh = NULL; /* owned by the lo_inode now */
>> +        }
>>           inode->key.ino = e->attr.st_ino;
>>           inode->key.dev = e->attr.st_dev;
>>           inode->key.mnt_id = mnt_id;
>> @@ -1349,6 +1586,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           pthread_mutex_lock(&lo->mutex);
>>           inode->fuse_ino = lo_add_inode_mapping(req, inode);
>>           g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
>> +        if (inode->fhandle) {
>> +            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
>> +        }
>>           pthread_mutex_unlock(&lo->mutex);
>>       }
>>       e->ino = inode->fuse_ino;
>> @@ -1362,6 +1602,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>   
>>       lo_inode_put(lo, &dir);
>>   
>> +    release_file_handle(lo, fh, can_open_handle);
> We transferred ownership of fh to inode (inode->fhandle). Should we be
> releasing it now? What am I missing.

On transfer of ownership, `fh` is set to NULL, so this will be a no-op.  
It is not a no-op if `can_open_handle` is false or when lo_find() 
returned a result, i.e. when the ownership wasn’t transferred.

>> +
>>       fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
>>                name, (unsigned long long)e->ino);
>>   
>> @@ -1373,6 +1615,7 @@ out:
>>       if (newfd != -1) {
>>           close(newfd);
>>       }
>> +    release_file_handle(lo, fh, can_open_handle);
>>       lo_inode_put(lo, &inode);
>>       lo_inode_put(lo, &dir);
>>       return saverr;
>> @@ -1695,6 +1938,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>>       int res;
>>       uint64_t mnt_id;
>>       struct stat attr;
>> +    struct lo_fhandle *fh;
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *dir = lo_inode(req, parent);
>>       struct lo_inode *inode = NULL;
>> @@ -1708,13 +1952,17 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>>           goto out;
>>       }
>>   
>> +    fh = get_file_handle(lo, dir_path_fd.fd, name, NULL);
>> +    /* Ignore errors, this is just an optional key for the lookup */
>> +
>>       res = do_statx(lo, dir_path_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW,
>>                      &mnt_id);
>>       if (res == -1) {
>>           goto out;
>>       }
>>   
>> -    inode = lo_find(lo, NULL, &attr, mnt_id);
>> +    inode = lo_find(lo, fh, &attr, mnt_id);
>> +    release_file_handle(lo, fh, false);
>>   
>>   out:
>>       lo_inode_put(lo, &dir);
>> @@ -1882,6 +2130,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>>       if (!inode->nlookup) {
>>           lo_map_remove(&lo->ino_map, inode->fuse_ino);
>>           g_hash_table_remove(lo->inodes_by_ids, &inode->key);
>> +        if (inode->fhandle) {
>> +            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
>> +        }
>>           if (lo->posix_lock) {
>>               if (g_hash_table_size(inode->posix_locks)) {
>>                   fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
>> @@ -3978,8 +4229,11 @@ static void setup_mounts(const char *source)
>>   /*
>>    * Only keep capabilities in allowlist that are needed for file system operation
>>    * The (possibly NULL) modcaps_in string passed in is free'd before exit.
>> + *
>> + * Passing true for cap_dac_read_search adds CAP_DAC_READ_SEARCH to the
>> + * allowlist.
>>    */
>> -static void setup_capabilities(char *modcaps_in)
>> +static void setup_capabilities(char *modcaps_in, bool cap_dac_read_search)
>>   {
>>       char *modcaps = modcaps_in;
>>       pthread_mutex_lock(&cap.mutex);
>> @@ -4012,6 +4266,17 @@ static void setup_capabilities(char *modcaps_in)
>>           exit(1);
>>       }
>>   
>> +    /*
>> +     * If we need CAP_DAC_READ_SEARCH (for file handles), add that, too.
>> +     */
>> +    if (cap_dac_read_search &&
>> +        capng_update(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
>> +                     CAP_DAC_READ_SEARCH)) {
>> +        fuse_log(FUSE_LOG_ERR, "%s: capng_update failed for "
>> +                 "CAP_DAC_READ_SEARCH\n", __func__);
>> +        exit(1);
>> +    }
>> +
>>       /*
>>        * The modcaps option is a colon separated list of caps,
>>        * each preceded by either + or -.
>> @@ -4158,7 +4423,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
>>       }
>>   
>>       setup_seccomp(enable_syslog);
>> -    setup_capabilities(g_strdup(lo->modcaps));
>> +    setup_capabilities(g_strdup(lo->modcaps), lo->inode_file_handles);
>>   }
>>   
>>   /* Set the maximum number of open file descriptors */
>> @@ -4498,6 +4763,14 @@ int main(int argc, char *argv[])
>>   
>>       lo.use_statx = true;
>>   
>> +#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
>> +    if (lo.inode_file_handles) {
>> +        fuse_log(FUSE_LOG_WARNING,
>> +                 "No statx() or mount ID support: Will not be able to use file "
>> +                 "handles for inodes\n");
>> +    }
> Again, I think we should error out if user asked for file handle support
> explicitly and we can't enable it. But if we end up enabling by default,
> it probably is fine to just log a message and not use it.
>
> This begs the question what happens if filesystem does not support the
> file handles. Ideally, I would think that we can error out.But for
> submounts check will happen much later. For root mount atleast we
> should be able to check at startup time and make sure file handles are
> supported by filesystem.

I disagree, because we’ve decided that enabling file handles means 
best-effort.  As for CONFIG_STATX or STATX_MNT_ID, those are things that 
will matter less over time anyway (because they will always be present).

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-10-19 20:02   ` Vivek Goyal
@ 2021-10-20 10:02     ` Hanna Reitz
  2021-10-20 12:29       ` Vivek Goyal
  2021-10-20 12:53       ` Vivek Goyal
  0 siblings, 2 replies; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20 10:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 19.10.21 22:02, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
>> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>> its inode ID will remain in use until we drop our lo_inode (and
>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>> the inode ID as an lo_inode key, because any inode with an inode ID we
>> find in lo_data.inodes (on the same filesystem) must be the exact same
>> file.
>>
>> This will change when we start setting lo_inode.fhandle so we do not
>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>> immediately remove it, so its ID can then be reused by newly created
>> files, even while the lo_inode object is still there[1].
>>
>> So creating a new file can then reuse the old file's inode ID, and
>> looking up the new file would lead to us finding the old file's
>> lo_inode, which is not ideal.
>>
>> Luckily, just as file handles cause this problem, they also solve it:  A
>> file handle contains a generation ID, which changes when an inode ID is
>> reused, so the new file can be distinguished from the old one.  So all
>> we need to do is to add a second map besides lo_data.inodes that maps
>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>
>> Unfortunately, we cannot rely on being able to generate file handles
>> every time.  Therefore, we still enter every lo_inode object into
>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>> entry is just a fallback.
>>
>> Note that we do not generate lo_fhandle objects yet, and so we also do
>> not enter anything into the inodes_by_handle map yet.  Also, all lookups
>> skip that map.  We might manually create file handles with some code
>> that is immediately removed by the next patch again, but that would
>> break the assumption in lo_find() that every lo_inode with a non-NULL
>> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
>> leave actually using the inodes_by_handle map for the next patch.
>>
>> [1] If some application in the guest still has the file open, there is
>> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
>> case, the inode will only go away once every application in the guest
>> has closed it.  The problem described only applies to cases where the
>> guest does not have the file open, and it is just in the dentry cache,
>> basically.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>>   1 file changed, 65 insertions(+), 16 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index bd8fc922ea..b7d6aa7f9d 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -186,7 +186,8 @@ struct lo_data {
>>       int announce_submounts;
>>       bool use_statx;
>>       struct lo_inode root;
>> -    GHashTable *inodes; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>>       struct lo_map ino_map; /* protected by lo->mutex */
>>       struct lo_map dirp_map; /* protected by lo->mutex */
>>       struct lo_map fd_map; /* protected by lo->mutex */
>> @@ -275,8 +276,9 @@ static struct {
>>   /* That we loaded cap-ng in the current thread from the saved */
>>   static __thread bool cap_loaded = 0;
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id);
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id);
>>   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>>                               char **out_name);
>>   
>> @@ -1143,18 +1145,40 @@ out_err:
>>       fuse_reply_err(req, saverr);
>>   }
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id)
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id)
>>   {
>> -    struct lo_inode *p;
>> -    struct lo_key key = {
>> +    struct lo_inode *p = NULL;
>> +    struct lo_key ids_key = {
>>           .ino = st->st_ino,
>>           .dev = st->st_dev,
>>           .mnt_id = mnt_id,
>>       };
>>   
>>       pthread_mutex_lock(&lo->mutex);
>> -    p = g_hash_table_lookup(lo->inodes, &key);
>> +    if (fhandle) {
>> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>> +    }
>> +    if (!p) {
>> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
>> +        /*
>> +         * When we had to fall back to looking up an inode by its
>> +         * inode ID, ensure that we hit an entry that has a valid file
>> +         * descriptor.  Having an FD open means that the inode cannot
>> +         * really be deleted until the FD is closed, so that the inode
>> +         * ID remains valid until we evict our lo_inode.
>> +         * With no FD open (and just a file handle), the inode can be
>> +         * deleted while we still have our lo_inode, and so the inode
>> +         * ID may be reused by a completely different new inode.  We
>> +         * then must look up the lo_inode by file handle, because this
>> +         * handle contains a generation ID to differentiate between
>> +         * the old and the new inode.
>> +         */
>> +        if (p && p->fd == -1) {
>> +            p = NULL;
>> +        }
> What happens in following scenario.
>
> - Say I have a hard linked file foo.txt with link foo-link.txt.
>
> - I lookup foo.txt. We generate file handle and add inode for foo.txt
>    to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.
>
> - Now later lookup for foo-link.txt happens. Say this time we can't
>    generate file handle.

Which we’ve already decided is practically impossible.

> When we try to lookup inode, lo_find() should
>    return NULL. It will find inode by ids but not use it because inode
>    was added using file handle and p->fd == -1. That means lookup
>    for foo-link.txt will end up adding another inode, when it should
>    not have?

Yes, it would end up adding another inode, which doesn’t seem 
catastrophic to me.  But again, the whole case seems impossible to me.

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-10-20 10:02     ` Hanna Reitz
@ 2021-10-20 12:29       ` Vivek Goyal
  2021-10-20 14:10         ` Hanna Reitz
  2021-10-20 12:53       ` Vivek Goyal
  1 sibling, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2021-10-20 12:29 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Oct 20, 2021 at 12:02:32PM +0200, Hanna Reitz wrote:
> On 19.10.21 22:02, Vivek Goyal wrote:
> > On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
> > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > its inode ID will remain in use until we drop our lo_inode (and
> > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > file.
> > > 
> > > This will change when we start setting lo_inode.fhandle so we do not
> > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > immediately remove it, so its ID can then be reused by newly created
> > > files, even while the lo_inode object is still there[1].
> > > 
> > > So creating a new file can then reuse the old file's inode ID, and
> > > looking up the new file would lead to us finding the old file's
> > > lo_inode, which is not ideal.
> > > 
> > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > file handle contains a generation ID, which changes when an inode ID is
> > > reused, so the new file can be distinguished from the old one.  So all
> > > we need to do is to add a second map besides lo_data.inodes that maps
> > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > 
> > > Unfortunately, we cannot rely on being able to generate file handles
> > > every time.  Therefore, we still enter every lo_inode object into
> > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > entry is just a fallback.
> > > 
> > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > skip that map.  We might manually create file handles with some code
> > > that is immediately removed by the next patch again, but that would
> > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > leave actually using the inodes_by_handle map for the next patch.
> > > 
> > > [1] If some application in the guest still has the file open, there is
> > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > case, the inode will only go away once every application in the guest
> > > has closed it.  The problem described only applies to cases where the
> > > guest does not have the file open, and it is just in the dentry cache,
> > > basically.
> > > 
> > > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
> > >   1 file changed, 65 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index bd8fc922ea..b7d6aa7f9d 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -186,7 +186,8 @@ struct lo_data {
> > >       int announce_submounts;
> > >       bool use_statx;
> > >       struct lo_inode root;
> > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > >       struct lo_map ino_map; /* protected by lo->mutex */
> > >       struct lo_map dirp_map; /* protected by lo->mutex */
> > >       struct lo_map fd_map; /* protected by lo->mutex */
> > > @@ -275,8 +276,9 @@ static struct {
> > >   /* That we loaded cap-ng in the current thread from the saved */
> > >   static __thread bool cap_loaded = 0;
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id);
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id);
> > >   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > >                               char **out_name);
> > > @@ -1143,18 +1145,40 @@ out_err:
> > >       fuse_reply_err(req, saverr);
> > >   }
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id)
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id)
> > >   {
> > > -    struct lo_inode *p;
> > > -    struct lo_key key = {
> > > +    struct lo_inode *p = NULL;
> > > +    struct lo_key ids_key = {
> > >           .ino = st->st_ino,
> > >           .dev = st->st_dev,
> > >           .mnt_id = mnt_id,
> > >       };
> > >       pthread_mutex_lock(&lo->mutex);
> > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > +    if (fhandle) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > +    }
> > > +    if (!p) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > > +        /*
> > > +         * When we had to fall back to looking up an inode by its
> > > +         * inode ID, ensure that we hit an entry that has a valid file
> > > +         * descriptor.  Having an FD open means that the inode cannot
> > > +         * really be deleted until the FD is closed, so that the inode
> > > +         * ID remains valid until we evict our lo_inode.
> > > +         * With no FD open (and just a file handle), the inode can be
> > > +         * deleted while we still have our lo_inode, and so the inode
> > > +         * ID may be reused by a completely different new inode.  We
> > > +         * then must look up the lo_inode by file handle, because this
> > > +         * handle contains a generation ID to differentiate between
> > > +         * the old and the new inode.
> > > +         */
> > > +        if (p && p->fd == -1) {
> > > +            p = NULL;
> > > +        }
> > What happens in following scenario.
> > 
> > - Say I have a hard linked file foo.txt with link foo-link.txt.
> > 
> > - I lookup foo.txt. We generate file handle and add inode for foo.txt
> >    to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.
> > 
> > - Now later lookup for foo-link.txt happens. Say this time we can't
> >    generate file handle.
> 
> Which we’ve already decided is practically impossible.

Agreed that probably is very less but it can happen when sufficient
resources are not available, like -ENOMEM.

static long do_sys_name_to_handle(struct path *path,
                                  struct file_handle __user *ufh,
                                  int __user *mnt_id)
{
        handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
                         GFP_KERNEL);
        if (!handle)
                return -ENOMEM;
}

> 
> > When we try to lookup inode, lo_find() should
> >    return NULL. It will find inode by ids but not use it because inode
> >    was added using file handle and p->fd == -1. That means lookup
> >    for foo-link.txt will end up adding another inode, when it should
> >    not have?
> 
> Yes, it would end up adding another inode, which doesn’t seem catastrophic
> to me.

> But again, the whole case seems impossible to me.

Given we can get -ENOMEM error it is not impossible.

I thought all along you wanted to write code so that we could fallback
to ids in case of errors. Anyway, if you agree that except the case of
-EOPNOTSUPP, we don't have to worry about fallback, then let us just
reutrn error to caller if get_file_handle() fails (except the case
of -EOPNOTSUPP).

And then lo_find() logic could be simpler too. And there is no need
for checks like this.

    if (p && p->fd == -1) {
        p = NULL;
    }

if (fhandle) {
    lookup_using_handle
} else {
    lookup_using_ids
}

Thanks
Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-10-20 10:02     ` Hanna Reitz
  2021-10-20 12:29       ` Vivek Goyal
@ 2021-10-20 12:53       ` Vivek Goyal
  1 sibling, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2021-10-20 12:53 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Oct 20, 2021 at 12:02:32PM +0200, Hanna Reitz wrote:
> On 19.10.21 22:02, Vivek Goyal wrote:
> > On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
> > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > its inode ID will remain in use until we drop our lo_inode (and
> > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > file.
> > > 
> > > This will change when we start setting lo_inode.fhandle so we do not
> > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > immediately remove it, so its ID can then be reused by newly created
> > > files, even while the lo_inode object is still there[1].
> > > 
> > > So creating a new file can then reuse the old file's inode ID, and
> > > looking up the new file would lead to us finding the old file's
> > > lo_inode, which is not ideal.
> > > 
> > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > file handle contains a generation ID, which changes when an inode ID is
> > > reused, so the new file can be distinguished from the old one.  So all
> > > we need to do is to add a second map besides lo_data.inodes that maps
> > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > 
> > > Unfortunately, we cannot rely on being able to generate file handles
> > > every time.  Therefore, we still enter every lo_inode object into
> > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > entry is just a fallback.
> > > 
> > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > skip that map.  We might manually create file handles with some code
> > > that is immediately removed by the next patch again, but that would
> > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > leave actually using the inodes_by_handle map for the next patch.
> > > 
> > > [1] If some application in the guest still has the file open, there is
> > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > case, the inode will only go away once every application in the guest
> > > has closed it.  The problem described only applies to cases where the
> > > guest does not have the file open, and it is just in the dentry cache,
> > > basically.
> > > 
> > > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
> > >   1 file changed, 65 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index bd8fc922ea..b7d6aa7f9d 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -186,7 +186,8 @@ struct lo_data {
> > >       int announce_submounts;
> > >       bool use_statx;
> > >       struct lo_inode root;
> > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > >       struct lo_map ino_map; /* protected by lo->mutex */
> > >       struct lo_map dirp_map; /* protected by lo->mutex */
> > >       struct lo_map fd_map; /* protected by lo->mutex */
> > > @@ -275,8 +276,9 @@ static struct {
> > >   /* That we loaded cap-ng in the current thread from the saved */
> > >   static __thread bool cap_loaded = 0;
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id);
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id);
> > >   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > >                               char **out_name);
> > > @@ -1143,18 +1145,40 @@ out_err:
> > >       fuse_reply_err(req, saverr);
> > >   }
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id)
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id)
> > >   {
> > > -    struct lo_inode *p;
> > > -    struct lo_key key = {
> > > +    struct lo_inode *p = NULL;
> > > +    struct lo_key ids_key = {
> > >           .ino = st->st_ino,
> > >           .dev = st->st_dev,
> > >           .mnt_id = mnt_id,
> > >       };
> > >       pthread_mutex_lock(&lo->mutex);
> > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > +    if (fhandle) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > +    }
> > > +    if (!p) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > > +        /*
> > > +         * When we had to fall back to looking up an inode by its
> > > +         * inode ID, ensure that we hit an entry that has a valid file
> > > +         * descriptor.  Having an FD open means that the inode cannot
> > > +         * really be deleted until the FD is closed, so that the inode
> > > +         * ID remains valid until we evict our lo_inode.
> > > +         * With no FD open (and just a file handle), the inode can be
> > > +         * deleted while we still have our lo_inode, and so the inode
> > > +         * ID may be reused by a completely different new inode.  We
> > > +         * then must look up the lo_inode by file handle, because this
> > > +         * handle contains a generation ID to differentiate between
> > > +         * the old and the new inode.
> > > +         */
> > > +        if (p && p->fd == -1) {
> > > +            p = NULL;
> > > +        }
> > What happens in following scenario.
> > 
> > - Say I have a hard linked file foo.txt with link foo-link.txt.
> > 
> > - I lookup foo.txt. We generate file handle and add inode for foo.txt
> >    to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.
> > 
> > - Now later lookup for foo-link.txt happens. Say this time we can't
> >    generate file handle.
> 
> Which we’ve already decided is practically impossible.
> 
> > When we try to lookup inode, lo_find() should
> >    return NULL. It will find inode by ids but not use it because inode
> >    was added using file handle and p->fd == -1. That means lookup
> >    for foo-link.txt will end up adding another inode, when it should
> >    not have?
> 
> Yes, it would end up adding another inode, which doesn’t seem catastrophic
> to me.  But again, the whole case seems impossible to me.

Have been thinking what can break if we end up creating two inodes for
hard linked files. One thing jumps out is posix locks. We keep the
posix lock hash table in lo_inode.

So if I open foo.txt, take posix lock. And then open foo-link.txt and
close the fd for foo-link.txt, it will not drop the posix lock held
on this inode because inodes have diverged and break remote posix lock
functionality.

Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-10-20 12:29       ` Vivek Goyal
@ 2021-10-20 14:10         ` Hanna Reitz
  2021-10-20 18:06           ` Vivek Goyal
  0 siblings, 1 reply; 30+ messages in thread
From: Hanna Reitz @ 2021-10-20 14:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 20.10.21 14:29, Vivek Goyal wrote:
> On Wed, Oct 20, 2021 at 12:02:32PM +0200, Hanna Reitz wrote:
>> On 19.10.21 22:02, Vivek Goyal wrote:
>>> On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
>>>> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
>>>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>>>> its inode ID will remain in use until we drop our lo_inode (and
>>>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>>>> the inode ID as an lo_inode key, because any inode with an inode ID we
>>>> find in lo_data.inodes (on the same filesystem) must be the exact same
>>>> file.
>>>>
>>>> This will change when we start setting lo_inode.fhandle so we do not
>>>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>>>> immediately remove it, so its ID can then be reused by newly created
>>>> files, even while the lo_inode object is still there[1].
>>>>
>>>> So creating a new file can then reuse the old file's inode ID, and
>>>> looking up the new file would lead to us finding the old file's
>>>> lo_inode, which is not ideal.
>>>>
>>>> Luckily, just as file handles cause this problem, they also solve it:  A
>>>> file handle contains a generation ID, which changes when an inode ID is
>>>> reused, so the new file can be distinguished from the old one.  So all
>>>> we need to do is to add a second map besides lo_data.inodes that maps
>>>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>>>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>>>
>>>> Unfortunately, we cannot rely on being able to generate file handles
>>>> every time.  Therefore, we still enter every lo_inode object into
>>>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>>>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>>>> entry is just a fallback.
>>>>
>>>> Note that we do not generate lo_fhandle objects yet, and so we also do
>>>> not enter anything into the inodes_by_handle map yet.  Also, all lookups
>>>> skip that map.  We might manually create file handles with some code
>>>> that is immediately removed by the next patch again, but that would
>>>> break the assumption in lo_find() that every lo_inode with a non-NULL
>>>> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
>>>> leave actually using the inodes_by_handle map for the next patch.
>>>>
>>>> [1] If some application in the guest still has the file open, there is
>>>> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
>>>> case, the inode will only go away once every application in the guest
>>>> has closed it.  The problem described only applies to cases where the
>>>> guest does not have the file open, and it is just in the dentry cache,
>>>> basically.
>>>>
>>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>>> ---
>>>>    tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>>>>    1 file changed, 65 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>> index bd8fc922ea..b7d6aa7f9d 100644
>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>> @@ -186,7 +186,8 @@ struct lo_data {
>>>>        int announce_submounts;
>>>>        bool use_statx;
>>>>        struct lo_inode root;
>>>> -    GHashTable *inodes; /* protected by lo->mutex */
>>>> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
>>>> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>>>>        struct lo_map ino_map; /* protected by lo->mutex */
>>>>        struct lo_map dirp_map; /* protected by lo->mutex */
>>>>        struct lo_map fd_map; /* protected by lo->mutex */
>>>> @@ -275,8 +276,9 @@ static struct {
>>>>    /* That we loaded cap-ng in the current thread from the saved */
>>>>    static __thread bool cap_loaded = 0;
>>>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>>>> -                                uint64_t mnt_id);
>>>> +static struct lo_inode *lo_find(struct lo_data *lo,
>>>> +                                const struct lo_fhandle *fhandle,
>>>> +                                struct stat *st, uint64_t mnt_id);
>>>>    static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>>>>                                char **out_name);
>>>> @@ -1143,18 +1145,40 @@ out_err:
>>>>        fuse_reply_err(req, saverr);
>>>>    }
>>>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>>>> -                                uint64_t mnt_id)
>>>> +static struct lo_inode *lo_find(struct lo_data *lo,
>>>> +                                const struct lo_fhandle *fhandle,
>>>> +                                struct stat *st, uint64_t mnt_id)
>>>>    {
>>>> -    struct lo_inode *p;
>>>> -    struct lo_key key = {
>>>> +    struct lo_inode *p = NULL;
>>>> +    struct lo_key ids_key = {
>>>>            .ino = st->st_ino,
>>>>            .dev = st->st_dev,
>>>>            .mnt_id = mnt_id,
>>>>        };
>>>>        pthread_mutex_lock(&lo->mutex);
>>>> -    p = g_hash_table_lookup(lo->inodes, &key);
>>>> +    if (fhandle) {
>>>> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>>>> +    }
>>>> +    if (!p) {
>>>> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
>>>> +        /*
>>>> +         * When we had to fall back to looking up an inode by its
>>>> +         * inode ID, ensure that we hit an entry that has a valid file
>>>> +         * descriptor.  Having an FD open means that the inode cannot
>>>> +         * really be deleted until the FD is closed, so that the inode
>>>> +         * ID remains valid until we evict our lo_inode.
>>>> +         * With no FD open (and just a file handle), the inode can be
>>>> +         * deleted while we still have our lo_inode, and so the inode
>>>> +         * ID may be reused by a completely different new inode.  We
>>>> +         * then must look up the lo_inode by file handle, because this
>>>> +         * handle contains a generation ID to differentiate between
>>>> +         * the old and the new inode.
>>>> +         */
>>>> +        if (p && p->fd == -1) {
>>>> +            p = NULL;
>>>> +        }
>>> What happens in following scenario.
>>>
>>> - Say I have a hard linked file foo.txt with link foo-link.txt.
>>>
>>> - I lookup foo.txt. We generate file handle and add inode for foo.txt
>>>     to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.
>>>
>>> - Now later lookup for foo-link.txt happens. Say this time we can't
>>>     generate file handle.
>> Which we’ve already decided is practically impossible.
> Agreed that probably is very less but it can happen when sufficient
> resources are not available, like -ENOMEM.
>
> static long do_sys_name_to_handle(struct path *path,
>                                    struct file_handle __user *ufh,
>                                    int __user *mnt_id)
> {
>          handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
>                           GFP_KERNEL);
>          if (!handle)
>                  return -ENOMEM;
> }

OK, but do we really want to be prepared for an ENOMEM from the kernel?

>>> When we try to lookup inode, lo_find() should
>>>     return NULL. It will find inode by ids but not use it because inode
>>>     was added using file handle and p->fd == -1. That means lookup
>>>     for foo-link.txt will end up adding another inode, when it should
>>>     not have?
>> Yes, it would end up adding another inode, which doesn’t seem catastrophic
>> to me.
>> But again, the whole case seems impossible to me.
> Given we can get -ENOMEM error it is not impossible.
>
> I thought all along you wanted to write code so that we could fallback
> to ids in case of errors.

Well, yes, if possible, but unfortunately it isn’t possible here.

> Anyway, if you agree that except the case of
> -EOPNOTSUPP, we don't have to worry about fallback, then let us just
> reutrn error to caller if get_file_handle() fails (except the case
> of -EOPNOTSUPP).

We’ve had this discussion before, and so I tried that for v3 but 
abandoned it again because I found it unreasonably complicated. Always 
falling back is simpler than deciding when to return an error to the 
guest and when not to; and also, considering errors other than 
EOPNOTSUPP should be rare, the code that returns errors to the guest 
would effectively be dead code.

If we really must return an error in the case you describe, then I don’t 
think there’s a way around it, though...  In virtiofsd-rs, we could let 
FileHandle::from_name_at() return an io::Result<Option<Self>>, such that 
it returns Ok(None) in case of EOPNOTSUPP, or an Error in case there is 
an unexpected error like ENOMEM.  I think that shouldn’t be too bad, but 
my experience says the devil’s in the details.

It’s just that it seems to be a purely theoretical problem, because if 
the kernel gives us ENOMEM, we probably won’t live too long anyway.

Hanna



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table
  2021-10-20 14:10         ` Hanna Reitz
@ 2021-10-20 18:06           ` Vivek Goyal
  0 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2021-10-20 18:06 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Oct 20, 2021 at 04:10:51PM +0200, Hanna Reitz wrote:
> On 20.10.21 14:29, Vivek Goyal wrote:
> > On Wed, Oct 20, 2021 at 12:02:32PM +0200, Hanna Reitz wrote:
> > > On 19.10.21 22:02, Vivek Goyal wrote:
> > > > On Thu, Sep 16, 2021 at 10:40:43AM +0200, Hanna Reitz wrote:
> > > > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > > > its inode ID will remain in use until we drop our lo_inode (and
> > > > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > > > file.
> > > > > 
> > > > > This will change when we start setting lo_inode.fhandle so we do not
> > > > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > > > immediately remove it, so its ID can then be reused by newly created
> > > > > files, even while the lo_inode object is still there[1].
> > > > > 
> > > > > So creating a new file can then reuse the old file's inode ID, and
> > > > > looking up the new file would lead to us finding the old file's
> > > > > lo_inode, which is not ideal.
> > > > > 
> > > > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > > > file handle contains a generation ID, which changes when an inode ID is
> > > > > reused, so the new file can be distinguished from the old one.  So all
> > > > > we need to do is to add a second map besides lo_data.inodes that maps
> > > > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > > > 
> > > > > Unfortunately, we cannot rely on being able to generate file handles
> > > > > every time.  Therefore, we still enter every lo_inode object into
> > > > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > > > entry is just a fallback.
> > > > > 
> > > > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > > > skip that map.  We might manually create file handles with some code
> > > > > that is immediately removed by the next patch again, but that would
> > > > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > > > leave actually using the inodes_by_handle map for the next patch.
> > > > > 
> > > > > [1] If some application in the guest still has the file open, there is
> > > > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > > > case, the inode will only go away once every application in the guest
> > > > > has closed it.  The problem described only applies to cases where the
> > > > > guest does not have the file open, and it is just in the dentry cache,
> > > > > basically.
> > > > > 
> > > > > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > > > > ---
> > > > >    tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
> > > > >    1 file changed, 65 insertions(+), 16 deletions(-)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > index bd8fc922ea..b7d6aa7f9d 100644
> > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > > @@ -186,7 +186,8 @@ struct lo_data {
> > > > >        int announce_submounts;
> > > > >        bool use_statx;
> > > > >        struct lo_inode root;
> > > > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > > > >        struct lo_map ino_map; /* protected by lo->mutex */
> > > > >        struct lo_map dirp_map; /* protected by lo->mutex */
> > > > >        struct lo_map fd_map; /* protected by lo->mutex */
> > > > > @@ -275,8 +276,9 @@ static struct {
> > > > >    /* That we loaded cap-ng in the current thread from the saved */
> > > > >    static __thread bool cap_loaded = 0;
> > > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > > > -                                uint64_t mnt_id);
> > > > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > > > +                                const struct lo_fhandle *fhandle,
> > > > > +                                struct stat *st, uint64_t mnt_id);
> > > > >    static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > > > >                                char **out_name);
> > > > > @@ -1143,18 +1145,40 @@ out_err:
> > > > >        fuse_reply_err(req, saverr);
> > > > >    }
> > > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > > > -                                uint64_t mnt_id)
> > > > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > > > +                                const struct lo_fhandle *fhandle,
> > > > > +                                struct stat *st, uint64_t mnt_id)
> > > > >    {
> > > > > -    struct lo_inode *p;
> > > > > -    struct lo_key key = {
> > > > > +    struct lo_inode *p = NULL;
> > > > > +    struct lo_key ids_key = {
> > > > >            .ino = st->st_ino,
> > > > >            .dev = st->st_dev,
> > > > >            .mnt_id = mnt_id,
> > > > >        };
> > > > >        pthread_mutex_lock(&lo->mutex);
> > > > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > > > +    if (fhandle) {
> > > > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > > > +    }
> > > > > +    if (!p) {
> > > > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > > > > +        /*
> > > > > +         * When we had to fall back to looking up an inode by its
> > > > > +         * inode ID, ensure that we hit an entry that has a valid file
> > > > > +         * descriptor.  Having an FD open means that the inode cannot
> > > > > +         * really be deleted until the FD is closed, so that the inode
> > > > > +         * ID remains valid until we evict our lo_inode.
> > > > > +         * With no FD open (and just a file handle), the inode can be
> > > > > +         * deleted while we still have our lo_inode, and so the inode
> > > > > +         * ID may be reused by a completely different new inode.  We
> > > > > +         * then must look up the lo_inode by file handle, because this
> > > > > +         * handle contains a generation ID to differentiate between
> > > > > +         * the old and the new inode.
> > > > > +         */
> > > > > +        if (p && p->fd == -1) {
> > > > > +            p = NULL;
> > > > > +        }
> > > > What happens in following scenario.
> > > > 
> > > > - Say I have a hard linked file foo.txt with link foo-link.txt.
> > > > 
> > > > - I lookup foo.txt. We generate file handle and add inode for foo.txt
> > > >     to inode cache. lo_inode->fhandle will be valie but lo_inode->fd == -1.
> > > > 
> > > > - Now later lookup for foo-link.txt happens. Say this time we can't
> > > >     generate file handle.
> > > Which we’ve already decided is practically impossible.
> > Agreed that probably is very less but it can happen when sufficient
> > resources are not available, like -ENOMEM.
> > 
> > static long do_sys_name_to_handle(struct path *path,
> >                                    struct file_handle __user *ufh,
> >                                    int __user *mnt_id)
> > {
> >          handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
> >                           GFP_KERNEL);
> >          if (!handle)
> >                  return -ENOMEM;
> > }
> 
> OK, but do we really want to be prepared for an ENOMEM from the kernel?

Why not. Other filesystem methods are dealing with it and return -ENOMEM
to fuse client. So I don't see any reason why file handle stuff should
be special. In fact in an attempt to trying to make it special just
increases complexity and having to deal with fallback plans (except
-EOPNOTSUP).


> 
> > > > When we try to lookup inode, lo_find() should
> > > >     return NULL. It will find inode by ids but not use it because inode
> > > >     was added using file handle and p->fd == -1. That means lookup
> > > >     for foo-link.txt will end up adding another inode, when it should
> > > >     not have?
> > > Yes, it would end up adding another inode, which doesn’t seem catastrophic
> > > to me.
> > > But again, the whole case seems impossible to me.
> > Given we can get -ENOMEM error it is not impossible.
> > 
> > I thought all along you wanted to write code so that we could fallback
> > to ids in case of errors.
> 
> Well, yes, if possible, but unfortunately it isn’t possible here.
> 
> > Anyway, if you agree that except the case of
> > -EOPNOTSUPP, we don't have to worry about fallback, then let us just
> > reutrn error to caller if get_file_handle() fails (except the case
> > of -EOPNOTSUPP).
> 
> We’ve had this discussion before, and so I tried that for v3 but abandoned
> it again because I found it unreasonably complicated.

I guess you had some code previously (when you were not parsing mountinfo)
that made it harder to deal with -ENOMEM. You could not afford a failure.
With a move to mountinfo parsing, that restriction might have gone away
now.

> Always falling back is
> simpler than deciding when to return an error to the guest and when not to;
> and also, considering errors other than EOPNOTSUPP should be rare, the code
> that returns errors to the guest would effectively be dead code.

May be. I probably should take your patches and try to convert them to 
return all temporary errors to client (except -ENOTSUPP) and see what
are the issues I face.

> 
> If we really must return an error in the case you describe, then I don’t
> think there’s a way around it, though...  In virtiofsd-rs, we could let
> FileHandle::from_name_at() return an io::Result<Option<Self>>, such that it
> returns Ok(None) in case of EOPNOTSUPP, or an Error in case there is an
> unexpected error like ENOMEM.  I think that shouldn’t be too bad, but my
> experience says the devil’s in the details.
> 
> It’s just that it seems to be a purely theoretical problem, because if the
> kernel gives us ENOMEM, we probably won’t live too long anyway.

Or may be some other process will exit/die freeing memory and virtiofsd
will continue to live. So we should not conclude that after -ENOMEM
virtiofsd will not live too long anyway. Atleast code should not be
written with this assumption.

Thanks
Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open
  2021-10-20  9:04     ` Hanna Reitz
@ 2021-10-20 18:25       ` Vivek Goyal
  0 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2021-10-20 18:25 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Oct 20, 2021 at 11:04:31AM +0200, Hanna Reitz wrote:
> On 18.10.21 19:07, Vivek Goyal wrote:
> > On Thu, Sep 16, 2021 at 10:40:34AM +0200, Hanna Reitz wrote:
> > > File handles are specific to mounts, and so name_to_handle_at() returns
> > > the respective mount ID.  However, open_by_handle_at() is not content
> > > with an ID, it wants a file descriptor for some inode on the mount,
> > > which we have to open.
> > > 
> > > We want to use /proc/self/mountinfo to find the mounts' root directories
> > > so we can open them and pass the respective FDs to open_by_handle_at().
> > > (We need to use the root directory, because we want the inode belonging
> > > to every mount FD be deletable.  Before the root directory can be
> > > deleted, all entries within must have been closed, and so when it is
> > > deleted, there should not be any file handles left that need its FD as
> > > their mount FD.  Thus, we can then close that FD and the inode can be
> > > deleted.[1])
> > > 
> > > That is why we need to open /proc/self/mountinfo so that we can use it
> > > to translate mount IDs into root directory paths.  We have to open it
> > > after setup_mounts() was called, because if we try to open it before, it
> > > will appear as an empty file after setup_mounts().
> > > 
> > > [1] Note that in practice, you still cannot delete the mount root
> > > directory.  It is a mount point on the host, after all, and mount points
> > > cannot be deleted.  But by using the mount point as the mount FD, we
> > > will at least not hog any actually deletable inodes.
> > > 
> > > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 40 ++++++++++++++++++++++++++++++++
> > >   1 file changed, 40 insertions(+)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 38b2af8599..6511a6acb4 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -172,6 +172,8 @@ struct lo_data {
> > >       /* An O_PATH file descriptor to /proc/self/fd/ */
> > >       int proc_self_fd;
> > > +    /* A read-only FILE pointer for /proc/self/mountinfo */
> > > +    FILE *mountinfo_fp;
> > >       int user_killpriv_v2, killpriv_v2;
> > >       /* If set, virtiofsd is responsible for setting umask during creation */
> > >       bool change_umask;
> > > @@ -3718,6 +3720,19 @@ static void setup_chroot(struct lo_data *lo)
> > >   static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> > >                             bool enable_syslog)
> > >   {
> > > +    int proc_self, mountinfo_fd;
> > > +    int saverr;
> > > +
> > > +    /*
> > > +     * Open /proc/self before we pivot to the new root so we can still
> > > +     * open /proc/self/mountinfo afterwards
> > > +     */
> > > +    proc_self = open("/proc/self", O_PATH);
> > > +    if (proc_self < 0) {
> > > +        fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self: %m; "
> > > +                 "will not be able to use file handles\n");
> > > +    }
> > > +
> > Hi Hanna,
> > 
> > Should we open /proc/self and /proc/self/mountinfo only if user wants
> > to file handle. We have already parsed options by now so we know.
> 
> I didn’t think it would matter given that it wouldn’t have an adverse
> effect.  If we can’t open them (and I can’t imagine a case where we’d be
> unable to open them), the only result is a warning.
> 
> > Also, if user asked for file handles, and we can't open /proc/self or
> > /proc/self/mountinfo successfully, I would think we should error out
> > and not continue (instead of just log it and continue).
> 
> Well, that would break the assumption I had above.  Not that that’s really
> relevant, I just want to mention it.
> 
> File handles are a best effort in any case.  If they don’t work, we always
> fall back.  So I don’t know whether we must error out.
> 
> OTOH if we know they can never work, then perhaps it would be more sensible
> to error out.

Yes. If they can't work because filesystem does not have capability or
we don't have CAP_DAC_READ_SEARCH or any other necessary component is
not there then we should fail.
> 
> FWIW I’ve ported the relevant v1..v4 changes to virtiofsd-rs, and there it
> errors out.  The error is unconditional, though, so even if you don’t
> request file handles, it’ll try to open mountinfo and exit on error.  I
> found that reasonable because I can’t imagine a case where opening
> /proc/self/fd would work, but /proc/self/mountinfo wouldn’t – and working
> around that would be a bit cumbersome (it would mean wrapping
> PassthroughFs.mount_fds in an Option<> and .unwrap()-ing it on every use,
> with a comment why that’s fine). Honestly, I’d prefer to wait until we get a
> bug report about a failure to open /proc/self/mountinfo.
> 
> > That seems to be general theme. If user asked for a feature and if
> > we can't enable it, we error out and let user retry without that
> > particular feature.
> > 
> > >       if (lo->sandbox == SANDBOX_NAMESPACE) {
> > >           setup_namespaces(lo, se);
> > >           setup_mounts(lo->source);
> > > @@ -3725,6 +3740,31 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> > >           setup_chroot(lo);
> > >       }
> > > +    /*
> > > +     * Opening /proc/self/mountinfo before the umount2() call in
> > > +     * setup_mounts() leads to the file appearing empty.  That is why
> > > +     * we defer opening it until here.
> > > +     */
> > > +    lo->mountinfo_fp = NULL;
> > > +    if (proc_self >= 0) {
> > > +        mountinfo_fd = openat(proc_self, "mountinfo", O_RDONLY);
> > > +        if (mountinfo_fd < 0) {
> > > +            saverr = errno;
> > > +        } else if (mountinfo_fd >= 0) {
> > > +            lo->mountinfo_fp = fdopen(mountinfo_fd, "r");
> > > +            if (!lo->mountinfo_fp) {
> > > +                saverr = errno;
> > > +                close(mountinfo_fd);
> > > +            }
> > > +        }
> > > +        if (!lo->mountinfo_fp) {
> > > +            fuse_log(FUSE_LOG_WARNING, "Failed to open /proc/self/mountinfo: "
> > > +                     "%s; will not be able to use file handles\n",
> > > +                     strerror(saverr));
> > > +        }
> > > +        close(proc_self);
> > > +    }
> > > +
> > Above code couple probably be moved in a helper function. Makes it
> > easier to read setup_sandbox(). Same here, open mountinfo only if
> > user wants file handle support and error out if file handle support
> > can't be enabled.
> 
> Perhaps, but frankly I don’t see a need to keep setup_sandbox() readable. 
> AFAIU, we are planning to deprecate C virtiofsd anyway, so while it pains me
> to say something like this, we don’t need to keep it maintainable.
> 
> Now that I’ve opened an MR to bring the v1..v4 changes to virtiofsd-rs
> (https://gitlab.com/virtio-fs/virtiofsd-rs/-/merge_requests/41), I also
> don’t really see a justification for putting further development effort into
> bringing file handles to C virtiofsd.  Of course I’m still grateful for your
> review, and I’ll try to adapt it to virtiofsd-rs, but right now I don’t plan
> on sending a v5.

Fair enough. If file handle stuff is important for C version, I am hoping
somebody else can pick this work. Anyway I think its almost 90% ready, we
are just dealing with last 10% of issues.

Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle
  2021-10-20 10:00     ` Hanna Reitz
@ 2021-10-20 18:53       ` Vivek Goyal
  0 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2021-10-20 18:53 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Oct 20, 2021 at 12:00:07PM +0200, Hanna Reitz wrote:

[..]
> > > @@ -1302,13 +1512,26 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           goto out;
> > >       }
> > > -    newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
> > > -    if (newfd == -1) {
> > > -        goto out_err;
> > > +    fh = get_file_handle(lo, dir_path_fd.fd, name, &can_open_handle);
> > > +    if (!fh || !can_open_handle) {
> > > +        /*
> > > +         * If we will not be able to open the file handle again
> > > +         * (can_open_handle is false), open an FD that we can put into
> > > +         * lo_inode (in case we need to create a new lo_inode).
> > > +         */
> > > +        newfd = openat(dir_path_fd.fd, name, O_PATH | O_NOFOLLOW);
> > > +        if (newfd == -1) {
> > > +            goto out_err;
> > > +        }
> > >       }
> > > -    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> > > -                   &mnt_id);
> > > +    if (newfd >= 0) {
> > > +        res = do_statx(lo, newfd, "", &e->attr,
> > > +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> > > +    } else {
> > > +        res = do_statx(lo, dir_path_fd.fd, name, &e->attr,
> > > +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> > > +    }
> > >       if (res == -1) {
> > >           goto out_err;
> > >       }
> > > @@ -1318,9 +1541,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> > Can this FUSE_ATTR_SUBMOUNT check be racy w.r.t file handles. I mean
> > say we open the file handle, and before we call do_statx(), another
> > mount shows up on the directory in queustion. So stats now belong
> > to file in new mount and we will think it is a SUBMOUNT. So effectively
> > now we have fh belonging to old file but stats belonging to new file
> > in new mount?
> 
> Yes.  Not just the submount, but the whole stat information, so also the
> file type that goes into the lo_inode.
> 
> I thought this wasn’t too bad, though now I don’t really know why. Perhaps
> it was just how I started the implementation and I never could get myself to
> care enough (not good, I know).  Thanks for making me care! :)
> 
> We could theoretically open an O_PATH FD from the file handle to get the
> stat information from it, but that wouldn’t work for un-openable file
> handles.
> 
> So I think the best is to open an O_PATH FD unconditionally first, and then
> generate the file handle from it.  Then we can stat the FD.

Yes, this sounds like a more reasonable appraoch. This O_PATH fd will be
temporary in nature, so it should not be a problem.

[..]
> > > + *
> > > + * Passing true for cap_dac_read_search adds CAP_DAC_READ_SEARCH to the
> > > + * allowlist.
> > >    */
> > > -static void setup_capabilities(char *modcaps_in)
> > > +static void setup_capabilities(char *modcaps_in, bool cap_dac_read_search)
> > >   {
> > >       char *modcaps = modcaps_in;
> > >       pthread_mutex_lock(&cap.mutex);
> > > @@ -4012,6 +4266,17 @@ static void setup_capabilities(char *modcaps_in)
> > >           exit(1);
> > >       }
> > > +    /*
> > > +     * If we need CAP_DAC_READ_SEARCH (for file handles), add that, too.
> > > +     */
> > > +    if (cap_dac_read_search &&
> > > +        capng_update(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
> > > +                     CAP_DAC_READ_SEARCH)) {
> > > +        fuse_log(FUSE_LOG_ERR, "%s: capng_update failed for "
> > > +                 "CAP_DAC_READ_SEARCH\n", __func__);
> > > +        exit(1);
> > > +    }
> > > +
> > >       /*
> > >        * The modcaps option is a colon separated list of caps,
> > >        * each preceded by either + or -.
> > > @@ -4158,7 +4423,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> > >       }
> > >       setup_seccomp(enable_syslog);
> > > -    setup_capabilities(g_strdup(lo->modcaps));
> > > +    setup_capabilities(g_strdup(lo->modcaps), lo->inode_file_handles);
> > >   }
> > >   /* Set the maximum number of open file descriptors */
> > > @@ -4498,6 +4763,14 @@ int main(int argc, char *argv[])
> > >       lo.use_statx = true;
> > > +#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
> > > +    if (lo.inode_file_handles) {
> > > +        fuse_log(FUSE_LOG_WARNING,
> > > +                 "No statx() or mount ID support: Will not be able to use file "
> > > +                 "handles for inodes\n");
> > > +    }
> > Again, I think we should error out if user asked for file handle support
> > explicitly and we can't enable it. But if we end up enabling by default,
> > it probably is fine to just log a message and not use it.
> > 
> > This begs the question what happens if filesystem does not support the
> > file handles. Ideally, I would think that we can error out.But for
> > submounts check will happen much later. For root mount atleast we
> > should be able to check at startup time and make sure file handles are
> > supported by filesystem.
> 
> I disagree, because we’ve decided that enabling file handles means
> best-effort.

Only for the cases where failure is temporary in nature or submounts don't
support file handles, right?

I thought you agreed that we should fail if we can't use file handles at
all.

> As for CONFIG_STATX or STATX_MNT_ID, those are things that
> will matter less over time anyway (because they will always be present).

This failure seems permanent and one will not be able to use file handles
at all.

Anyway, it probably will be nice to have first patch in the series, which
is just documentation patch and explains this best-effort nature of
file handle option and when to expect failure and when it will silently
fallback to O_PATH fd.

Vivek



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-10-20 18:54 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16  8:40 [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 01/12] virtiofsd: Keep /proc/self/mountinfo open Hanna Reitz
2021-10-18 17:07   ` Vivek Goyal
2021-10-20  9:04     ` Hanna Reitz
2021-10-20 18:25       ` Vivek Goyal
2021-09-16  8:40 ` [PATCH v4 02/12] virtiofsd: Limit setxattr()'s creds-dropped region Hanna Reitz
2021-10-18 17:20   ` Vivek Goyal
2021-10-20  9:11     ` Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 03/12] virtiofsd: Add TempFd structure Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 04/12] virtiofsd: Use lo_inode_open() instead of openat() Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 05/12] virtiofsd: Add lo_inode_fd() helper Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 06/12] virtiofsd: Let lo_fd() return a TempFd Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 07/12] virtiofsd: Let lo_inode_open() " Hanna Reitz
2021-10-18 19:18   ` Vivek Goyal
2021-10-20  9:15     ` Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 08/12] virtiofsd: Pass lo_data to lo_inode_{fd,open}() Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 09/12] virtiofsd: Add lo_inode.fhandle Hanna Reitz
2021-09-16  8:40 ` [PATCH v4 10/12] virtiofsd: Add inodes_by_handle hash table Hanna Reitz
2021-10-19 20:02   ` Vivek Goyal
2021-10-20 10:02     ` Hanna Reitz
2021-10-20 12:29       ` Vivek Goyal
2021-10-20 14:10         ` Hanna Reitz
2021-10-20 18:06           ` Vivek Goyal
2021-10-20 12:53       ` Vivek Goyal
2021-09-16  8:40 ` [PATCH v4 11/12] virtiofsd: Optionally fill lo_inode.fhandle Hanna Reitz
2021-10-19 18:57   ` Vivek Goyal
2021-10-20 10:00     ` Hanna Reitz
2021-10-20 18:53       ` Vivek Goyal
2021-09-16  8:40 ` [PATCH v4 12/12] virtiofsd: Add lazy lo_do_find() Hanna Reitz
2021-10-18 18:08 ` [PATCH v4 00/12] virtiofsd: Allow using file handles instead of O_PATH FDs Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).