* [PATCH v3 00/10] virtiofsd: Allow using file handles instead of O_PATH FDs
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

Hi,

v1 cover letter for an overview:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

v2 cover letter:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00074.html

For v3, I first attempted to return errors from file handle generation
(name_to_handle_at()) to the guest, and to fall back to an O_PATH FD
only in cases where file handle generation is simply not supported, as
Vivek had suggested.

However, I found that to be rather complicated.  (Always falling back is
just simpler.)  Furthermore, because we believe that name_to_handle_at()
rarely fails with any error other than EOPNOTSUPP, there should be little
difference in practice.

Therefore, in v3, I kept the v2 model of always falling back to an
O_PATH FD when an error occurred during handle generation.
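The "always fall back" model can be sketched as follows.  This is a
hypothetical illustration, not virtiofsd code: `struct inode_key`,
`inode_key_get()` and `inode_key_release()` are invented names, and
exactly one of a file handle or an O_PATH FD ends up identifying the
inode.  Any name_to_handle_at() error, EOPNOTSUPP or otherwise, simply
leads to the O_PATH fallback.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the identifying part of lo_inode:
 * exactly one of `fh` or `fd` is set after a successful lookup. */
struct inode_key {
    struct file_handle *fh; /* NULL if no handle could be generated */
    int mount_id;           /* only meaningful when fh != NULL */
    int fd;                 /* O_PATH FD, or -1 if we have a handle */
};

/* Try name_to_handle_at() first; on any error, fall back to an O_PATH
 * FD.  Returns 0 on success, or -errno if the fallback fails too. */
static int inode_key_get(int dirfd, const char *name, struct inode_key *key)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    int fd;

    key->fh = NULL;
    key->mount_id = -1;
    key->fd = -1;

    if (fh) {
        fh->handle_bytes = MAX_HANDLE_SZ;
        /* flags == 0: do not follow a trailing symlink */
        if (name_to_handle_at(dirfd, name, fh, &key->mount_id, 0) == 0) {
            key->fh = fh;
            return 0;
        }
        free(fh); /* no error classification: always fall back */
    }

    fd = openat(dirfd, name, O_PATH | O_NOFOLLOW);
    if (fd < 0) {
        return -errno;
    }
    key->fd = fd;
    return 0;
}

static void inode_key_release(struct inode_key *key)
{
    free(key->fh);
    if (key->fd >= 0) {
        close(key->fd);
    }
    key->fh = NULL;
    key->fd = -1;
}
```

The error-classifying alternative would only differ in the `free(fh)`
branch, where it would inspect errno and return it to the guest unless it
indicates that handles are unsupported.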

What did change in v3 is the following:
- I added patch 1 because of commit f1aa1774dfb, which happened in the
  meantime; patch 1 is basically what we did for virtiofsd-rs in commit
  31e7ac63944 (a virtiofsd-rs commit hash).

- Patch 4: In lookup_name(), I noticed that not all error paths invoked
  lo_inode_put() to match the lo_inode() call at the beginning of the
  function.  Fixed by adding a common error path.

- Patch 6: Mostly contextual rebase conflicts (partly because of patch
  1), but also one functional change: I dropped the `assert(fd >= 0)`
  under `if (open_inode)` in lo_setxattr(), because `fd` is dropped by
  this patch (and because `inode_fd` is used regardless of the value of
  `open_inode`, we can’t assert anything similar on it).

- Patch 8:
  - Fixed the condition to reject results found by st_ino lookup.
    - st_ino on its own is only a valid identifier/key if we have an
      O_PATH fd for its respective lo_inode, because otherwise the inode
      may be unlinked and its st_ino might be reused by some new inode.
    - It does not matter whether lo_find()’s caller has supplied a file
      handle for a prior lookup by handle or not, so I dropped that part
      of the condition.
    - Semantically, it does not matter whether the lo_inode has a file
      handle or not – what matters is whether it has an O_PATH fd or
      not.  (The two are linked by a `handle <=> !fd` condition, so that
      part wasn’t technically wrong, just semantically.)
    - In accordance with the last point, I rewrote the comment
      explaining why we have to reject such results.
  - Rebase conflict in lookup_name() because of the fix in patch 4

- Patch 9:
  - Non-functional change in lo_do_lookup() to separate the
    get_file_handle()/openat() part from the do_statx() calls (and have
    the do_statx() calls be side by side) – as a side effect, this makes
    the diff to master slightly smaller.
  - Rebase conflict in lookup_name() because of the fix in patch 4

- Patch 10:
  - Rebase conflict in lookup_name() because of the fix in patch 4
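The corrected patch 8 condition can be illustrated with a small sketch.
The types and the linear table below are hypothetical simplifications
(the real code uses lo_inode and hash tables); the point is that an
st_ino match is accepted only when the found entry pins its inode with an
O_PATH fd, independent of whether the caller supplied a file handle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified lo_inode: handle-based inodes have fd == -1,
 * mirroring the `handle <=> !fd` invariant. */
struct inode_entry {
    dev_t dev;
    ino_t ino;
    int fd;          /* O_PATH fd, or -1 */
    bool has_handle; /* invariant: has_handle == (fd < 0) */
};

/* An st_ino match is only trustworthy if an O_PATH fd keeps the inode
 * alive; otherwise it may have been unlinked and its number reused. */
static bool ino_match_usable(const struct inode_entry *e)
{
    return e->fd >= 0;
}

/* Linear stand-in for the lookup by (st_dev, st_ino). */
static struct inode_entry *find_by_ino(struct inode_entry *tab, size_t n,
                                       dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].dev == dev && tab[i].ino == ino) {
            return ino_match_usable(&tab[i]) ? &tab[i] : NULL;
        }
    }
    return NULL;
}
```

Testing `e->fd >= 0` rather than `!e->has_handle` is equivalent under the
invariant, but it states the actual reason the match is safe.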


Max Reitz (10):
  virtiofsd: Limit setxattr()'s creds-dropped region
  virtiofsd: Add TempFd structure
  virtiofsd: Use lo_inode_open() instead of openat()
  virtiofsd: Add lo_inode_fd() helper
  virtiofsd: Let lo_fd() return a TempFd
  virtiofsd: Let lo_inode_open() return a TempFd
  virtiofsd: Add lo_inode.fhandle
  virtiofsd: Add inodes_by_handle hash table
  virtiofsd: Optionally fill lo_inode.fhandle
  virtiofsd: Add lazy lo_do_find()

 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 869 +++++++++++++++++++++-----
 tools/virtiofsd/passthrough_seccomp.c |   2 +
 3 files changed, 720 insertions(+), 154 deletions(-)

-- 
2.31.1



* [PATCH v3 01/10] virtiofsd: Limit setxattr()'s creds-dropped region
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

We only need to drop/switch our credentials for the (f)setxattr() call
alone, not for the openat() or fchdir() around it.

(Right now, this may not be that big of a problem, but once inodes can be
identified by file handles instead of an O_PATH fd, we will need
open_by_handle_at() calls here, which require CAP_DAC_READ_SEARCH and are
therefore really fickle when credentials have been dropped.)
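The narrowed region can be sketched as a helper that switches credentials
only around the single operation that needs them.  `struct cred_ops` and
`with_switched_creds()` are hypothetical names standing in for
lo_change_cred()/lo_restore_cred(); the caller does openat() or fchdir()
(and later open_by_handle_at()) with its own privileges before and after.

```c
#include <errno.h>

/* Hypothetical hooks standing in for lo_change_cred()/lo_restore_cred(). */
struct cred_ops {
    int (*switch_creds)(void *ctx);
    void (*restore_creds)(void *ctx);
    void *ctx;
};

/* Run only `op` with switched credentials.  The errno produced by `op`
 * is captured before restore_creds() can clobber it. */
static int with_switched_creds(const struct cred_ops *ops,
                               int (*op)(void *), void *arg, int *saverr)
{
    int ret;

    if (ops->switch_creds(ops->ctx) < 0) {
        *saverr = errno;
        return -1;
    }
    ret = op(arg);
    *saverr = ret == -1 ? errno : 0;
    ops->restore_creds(ops->ctx);
    return ret;
}
```

In lo_setxattr() terms, `op` would be the (f)setxattr() call, while the
openat() of the inode and the FCHDIR_NOFAIL() calls stay outside.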

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 38b2af8599..1f27eeabc5 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3121,6 +3121,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     bool switched_creds = false;
     bool cap_fsetid_dropped = false;
     struct lo_cred old = {};
+    bool open_inode;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3155,7 +3156,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
+    /*
+     * We can only open regular files or directories.  If the inode is
+     * something else, we have to enter /proc/self/fd and use
+     * setxattr() on the link's filename there.
+     */
+    open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
     sprintf(procname, "%i", inode->fd);
+    if (open_inode) {
+        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (fd < 0) {
+            saverr = errno;
+            goto out;
+        }
+    } else {
+        /* fchdir should not fail here */
+        FCHDIR_NOFAIL(lo->proc_self_fd);
+    }
+
     /*
      * If we are setting posix access acl and if SGID needs to be
      * cleared, then switch to caller's gid and drop CAP_FSETID
@@ -3176,20 +3194,13 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         switched_creds = true;
     }
-    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
-        if (fd < 0) {
-            saverr = errno;
-            goto out;
-        }
+    if (open_inode) {
+        assert(fd >= 0);
         ret = fsetxattr(fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
-        /* fchdir should not fail here */
-        FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
-        FCHDIR_NOFAIL(lo->root.fd);
     }
     if (switched_creds) {
         if (cap_fsetid_dropped)
@@ -3198,6 +3209,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
             lo_restore_cred(&old, false);
     }
 
+    if (!open_inode) {
+        /* Change CWD back, fchdir should not fail here */
+        FCHDIR_NOFAIL(lo->root.fd);
+    }
+
 out:
     if (fd >= 0) {
         close(fd);
-- 
2.31.1



* [PATCH v3 02/10] virtiofsd: Add TempFd structure
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

We are planning to add file handles to lo_inode objects as an
alternative to lo_inode.fd.  That means that wherever we currently
reference lo_inode.fd, we may instead have to open a temporary file
descriptor that needs to be closed after use.

So instead of directly accessing lo_inode.fd, there will be a helper
function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
file descriptor with open_by_handle_at().  It encapsulates this result
in a TempFd structure to let the caller know whether the FD needs to be
closed after use (opened from the handle) or not (copied from
lo_inode.fd).

By using g_auto(TempFd) to store this result, callers will not even have
to care about closing a temporary FD after use.  It will be done
automatically once the object goes out of scope.
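A self-contained sketch of the TempFd idea without GLib: g_auto() is
built on the GCC/Clang `cleanup` variable attribute, so the hypothetical
`AUTO_TEMP_FD` macro below uses that attribute directly.  The struct,
`temp_fd_clear()` and `temp_fd_steal()` mirror the definitions in this
patch; only the macro is invented for illustration.

```c
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    int fd;
    bool owned; /* fd owned by this object? */
} TempFd;

#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

/* Close the fd only if this TempFd owns it. */
static void temp_fd_clear(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        close(temp_fd->fd);
        *temp_fd = TEMP_FD_INIT;
    }
}

/* Hypothetical stand-in for g_auto(TempFd): run temp_fd_clear() when
 * the variable goes out of scope. */
#define AUTO_TEMP_FD __attribute__((cleanup(temp_fd_clear))) TempFd

/* Transfer ownership out of the TempFd; borrowed fds are dup()ed so
 * the caller always receives an fd it must close itself. */
static int temp_fd_steal(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        temp_fd->owned = false;
        return temp_fd->fd;
    }
    return dup(temp_fd->fd);
}
```

Usage: `AUTO_TEMP_FD t = { .fd = some_fd, .owned = true };` closes
`some_fd` at the end of the enclosing block unless temp_fd_steal() was
called on `t` first.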

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 49 ++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1f27eeabc5..fb5e073e6a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -178,6 +178,28 @@ struct lo_data {
     int user_posix_acl, posix_acl;
 };
 
+/**
+ * Represents a file descriptor that may either be owned by this
+ * TempFd, or only referenced (i.e. the ownership belongs to some
+ * other object, and the value has just been copied into this TempFd).
+ *
+ * The purpose of this encapsulation is to be used as g_auto(TempFd)
+ * to automatically clean up owned file descriptors when this object
+ * goes out of scope.
+ *
+ * Use temp_fd_steal() to get an owned file descriptor that will not
+ * be closed when the TempFd goes out of scope.
+ */
+typedef struct {
+    int fd;
+    bool owned; /* fd owned by this object? */
+} TempFd;
+
+#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
+
+static void temp_fd_clear(TempFd *temp_fd);
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
+
 static const struct fuse_opt lo_opts[] = {
     { "sandbox=namespace",
       offsetof(struct lo_data, sandbox),
@@ -255,6 +277,33 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/**
+ * Clean-up function for TempFds
+ */
+static void temp_fd_clear(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        close(temp_fd->fd);
+        *temp_fd = TEMP_FD_INIT;
+    }
+}
+
+/**
+ * Return an owned fd from *temp_fd that will not be closed when
+ * *temp_fd goes out of scope.
+ *
+ * (TODO: Remove __attribute__ once this is used.)
+ */
+static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        temp_fd->owned = false;
+        return temp_fd->fd;
+    } else {
+        return dup(temp_fd->fd);
+    }
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
-- 
2.31.1



* [PATCH v3 03/10] virtiofsd: Use lo_inode_open() instead of openat()
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
with the flags they need through /proc/self/fd.

Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
/proc/self/fd trick, it just uses openat(fd, "."), which works because
the FD is guaranteed to refer to a directory.
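The two existing reopen techniques can be sketched as follows.  The
helper names are hypothetical, and the proc variant uses an absolute
`/proc/self/fd/N` path for simplicity, whereas virtiofsd uses openat()
relative to its cached lo->proc_self_fd.  Both return a new FD or
-errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Reopen an O_PATH fd with real open flags through its /proc/self/fd
 * magic link (the trick the xattr functions use). */
static int reopen_via_proc(int opath_fd, int flags)
{
    char path[32];
    int fd;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", opath_fd);
    fd = open(path, flags);
    return fd < 0 ? -errno : fd;
}

/* Reopen a directory O_PATH fd by opening "." relative to it (what
 * lo_opendir() did).  Only valid when the fd refers to a directory. */
static int reopen_dir_via_dot(int opath_dirfd, int flags)
{
    int fd = openat(opath_dirfd, ".", flags);
    return fd < 0 ? -errno : fd;
}
```

Both depend on an existing O_PATH fd, which is exactly what a
handle-only lo_inode would lack.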

All cases have one problem in common, though: In the future, when we may
have a file handle in the lo_inode instead of an FD, querying an
lo_inode FD may incur an open_by_handle_at() call.  It does not make
sense to then reopen that FD with custom flags; those flags should have
been passed to open_by_handle_at() in the first place.

Use lo_inode_open() instead of openat().  As part of the file handle
change, lo_inode_open() will be made to invoke openat() only if
lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
with the right flags from the start.

Consequently, after this patch, lo_inode_open() is the only place that
invokes openat() to reopen an existing FD with different flags.
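The planned dispatch inside lo_inode_open() might look roughly like the
sketch below.  `struct inode_ref` and `inode_open_sketch()` are
hypothetical simplifications; the real function takes the lo_data and
lo_inode.  The return convention (new FD or -errno) matches how the
callers in this patch consume the result (`error = -fd`).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical, simplified inode: either an O_PATH fd or a handle. */
struct inode_ref {
    int fd;                 /* O_PATH fd, or -1 */
    struct file_handle *fh; /* used only when fd < 0 */
    int mount_fd;           /* any fd on the handle's filesystem */
};

/* Reopen via /proc/self/fd when an O_PATH fd exists; otherwise pass the
 * desired flags straight to open_by_handle_at() (which requires
 * CAP_DAC_READ_SEARCH).  Returns a new fd or -errno. */
static int inode_open_sketch(const struct inode_ref *inode, int flags)
{
    int fd;

    if (inode->fd >= 0) {
        char path[32];
        snprintf(path, sizeof(path), "/proc/self/fd/%d", inode->fd);
        fd = open(path, flags);
    } else if (inode->fh) {
        fd = open_by_handle_at(inode->mount_fd, inode->fh, flags);
    } else {
        return -EBADF;
    }
    return fd < 0 ? -errno : fd;
}
```

Because the flags reach open_by_handle_at() directly, no intermediate
O_PATH fd has to be opened and then reopened.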

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index fb5e073e6a..a444c3a7e2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1729,18 +1729,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 {
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_inode *inode;
+    struct lo_dirp *d = NULL;
     int fd;
     ssize_t fh;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        error = EBADF;
+        goto out_err;
+    }
+
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
         goto out_err;
     }
 
-    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-    if (fd == -1) {
-        goto out_errno;
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        error = -fd;
+        goto out_err;
     }
 
     d->dp = fdopendir(fd);
@@ -1769,6 +1777,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 out_errno:
     error = errno;
 out_err:
+    lo_inode_put(lo, &inode);
     if (d) {
         if (d->dp) {
             closedir(d->dp);
@@ -2973,7 +2982,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * It is not safe to open() non-regular/non-dir files in file server
      * unless O_PATH is used, so use that method for regular files/dir
@@ -2981,13 +2989,15 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = fgetxattr(fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -3054,15 +3064,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = flistxattr(fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3211,14 +3222,14 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * setxattr() on the link's filename there.
      */
     open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
-    sprintf(procname, "%i", inode->fd);
     if (open_inode) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
     }
@@ -3317,16 +3328,16 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
              name);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fremovexattr(fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [Virtio-fs] [PATCH v3 03/10] virtiofsd: Use lo_inode_open() instead of openat()
@ 2021-07-30 15:01   ` Max Reitz
  0 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Vivek Goyal, Max Reitz

The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
with the flags they need through /proc/self/fd.

Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
/proc/self/fd trick, it just uses openat(fd, "."), because the FD is
guaranteed to be a directory, so this works.

All cases have one problem in common, though: In the future, when we may
have a file handle in the lo_inode instead of an FD, querying an
lo_inode FD may incur an open_by_handle_at() call.  It does not make
sense to then reopen that FD with custom flags, those should have been
passed to open_by_handle_at() instead.

Use lo_inode_open() instead of openat().  As part of the file handle
change, lo_inode_open() will be made to invoke openat() only if
lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
with the right flags from the start.

Consequently, after this patch, lo_inode_open() is the only place to
invoke openat() to reopen an existing FD with different flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index fb5e073e6a..a444c3a7e2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1729,18 +1729,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 {
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_inode *inode;
+    struct lo_dirp *d = NULL;
     int fd;
     ssize_t fh;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        error = EBADF;
+        goto out_err;
+    }
+
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
         goto out_err;
     }
 
-    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-    if (fd == -1) {
-        goto out_errno;
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        error = -fd;
+        goto out_err;
     }
 
     d->dp = fdopendir(fd);
@@ -1769,6 +1777,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 out_errno:
     error = errno;
 out_err:
+    lo_inode_put(lo, &inode);
     if (d) {
         if (d->dp) {
             closedir(d->dp);
@@ -2973,7 +2982,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * It is not safe to open() non-regular/non-dir files in file server
      * unless O_PATH is used, so use that method for regular files/dir
@@ -2981,13 +2989,15 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = fgetxattr(fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -3054,15 +3064,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = flistxattr(fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3211,14 +3222,14 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * setxattr() on the link's filename there.
      */
     open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
-    sprintf(procname, "%i", inode->fd);
     if (open_inode) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
     }
@@ -3317,16 +3328,16 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
              name);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fremovexattr(fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1


* [PATCH v3 04/10] virtiofsd: Add lo_inode_fd() helper
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

Once lo_inode.fd becomes optional, its users will instead need to open the
file handle stored in lo_inode.  The new lo_inode_fd() helper will do that.

For now, it just returns lo_inode.fd (as a TempFd that does not own it).
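A minimal sketch of the ownership rule behind lo_inode_fd(), under stated assumptions.  The TempFd layout (an fd plus an `owned` flag) is inferred from the `.fd`/`.owned` initializers in this patch; the value of TEMP_FD_INIT, temp_fd_clear(), struct inode_stub and the dup()-based "owning" variant are hypothetical illustrations, not virtiofsd code.  In the diff, g_auto(TempFd) runs a registered cleanup function at scope exit; this sketch calls it explicitly instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Assumed TempFd shape, inferred from the .fd/.owned initializers. */
typedef struct {
    int fd;
    bool owned; /* true: clearing the TempFd must close fd */
} TempFd;

/* Assumed initial value: no fd, nothing owned. */
#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

/* Hypothetical stand-in for struct lo_inode: only the O_PATH fd matters. */
struct inode_stub {
    int fd;
};

/* Release a TempFd: close the fd only if this TempFd owns it. */
static void temp_fd_clear(TempFd *tfd)
{
    if (tfd->owned && tfd->fd >= 0) {
        close(tfd->fd);
    }
    *tfd = TEMP_FD_INIT;
}

/*
 * Borrowing variant, mirroring lo_inode_fd() in this patch: hand out the
 * inode's long-lived fd without transferring ownership.  For now every
 * inode is assumed to carry a valid fd.
 */
static int inode_fd_borrow(const struct inode_stub *inode, TempFd *tfd)
{
    assert(inode->fd >= 0);
    *tfd = (TempFd) { .fd = inode->fd, .owned = false };
    return 0;
}

/*
 * Owning variant, a hypothetical preview of the later file-handle case:
 * obtain a fresh fd (here simply via dup()) that the caller must release.
 * Errors are returned as negative errno values, like lo_inode_fd().
 */
static int inode_fd_open_new(const struct inode_stub *inode, TempFd *tfd)
{
    int fd = dup(inode->fd);

    if (fd < 0) {
        return -errno;
    }
    *tfd = (TempFd) { .fd = fd, .owned = true };
    return 0;
}

/* Returns true if fd currently refers to an open file description. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

Clearing a borrowed TempFd leaves the inode's fd open, while clearing an owned one closes the temporary fd; that is what lets callers treat both cases uniformly with one cleanup path.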

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 150 +++++++++++++++++++++++++------
 1 file changed, 125 insertions(+), 25 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index a444c3a7e2..86b901cf19 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -635,6 +635,16 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
+static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
+{
+    *tfd = (TempFd) {
+        .fd = inode->fd,
+        .owned = false,
+    };
+
+    return 0;
+}
+
 /*
  * TODO Remove this helper and force callers to hold an inode refcount until
  * they are done with the fd.  This will be done in a later patch to make
@@ -822,11 +832,11 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int saverr;
     char procname[64];
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
-    int ifd;
     int res;
     int fd = -1;
 
@@ -836,7 +846,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    ifd = inode->fd;
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out_err;
+    }
 
     /* If fi->fh is invalid we'll report EBADF later */
     if (fi) {
@@ -847,7 +861,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = fchmod(fd, attr->st_mode);
         } else {
-            sprintf(procname, "%i", ifd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
         }
         if (res == -1) {
@@ -859,12 +873,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         uid_t uid = (valid & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)-1;
         gid_t gid = (valid & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)-1;
 
-        saverr = drop_security_capability(lo, ifd);
+        saverr = drop_security_capability(lo, inode_fd.fd);
         if (saverr) {
             goto out_err;
         }
 
-        res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        res = fchownat(inode_fd.fd, "", uid, gid,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
         if (res == -1) {
             saverr = errno;
             goto out_err;
@@ -943,7 +958,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = futimens(fd, tv);
         } else {
-            sprintf(procname, "%i", inode->fd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = utimensat(lo->proc_self_fd, procname, tv, 0);
         }
         if (res == -1) {
@@ -1058,7 +1073,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e,
                         struct lo_inode **inodep)
 {
-    int newfd;
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
+    int newfd = -1;
     int res;
     int saverr;
     uint64_t mnt_id;
@@ -1088,7 +1104,13 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
+    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
     }
@@ -1155,6 +1177,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
 out_err:
     saverr = errno;
+out:
     if (newfd != -1) {
         close(newfd);
     }
@@ -1312,6 +1335,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
                              const char *name, mode_t mode, dev_t rdev,
                              const char *link)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
@@ -1335,12 +1359,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         return;
     }
 
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
     saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
     if (saverr) {
         goto out;
     }
 
-    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+    res = mknod_wrapper(dir_fd.fd, name, link, mode, rdev);
 
     saverr = errno;
 
@@ -1388,6 +1418,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
                     const char *name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1413,18 +1445,31 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
         goto out_err;
     }
 
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    sprintf(procname, "%i", inode->fd);
-    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
+    sprintf(procname, "%i", inode_fd.fd);
+    res = linkat(lo->proc_self_fd, procname, parent_fd.fd, name,
                  AT_SYMLINK_FOLLOW);
     if (res == -1) {
         goto out_err;
     }
 
-    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = fstatat(inode_fd.fd, "", &e.attr,
+                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         goto out_err;
     }
@@ -1453,23 +1498,33 @@ out_err:
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     uint64_t mnt_id;
     struct stat attr;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_inode *inode = NULL;
 
     if (!dir) {
-        return NULL;
+        goto out;
     }
 
-    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
-    lo_inode_put(lo, &dir);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        goto out;
+    }
+
+    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     if (res == -1) {
-        return NULL;
+        goto out;
     }
 
-    return lo_find(lo, &attr, mnt_id);
+    inode = lo_find(lo, &attr, mnt_id);
+
+out:
+    lo_inode_put(lo, &dir);
+    return inode;
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1505,6 +1560,8 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       fuse_ino_t newparent, const char *newname,
                       unsigned int flags)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
+    g_auto(TempFd) newparent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *parent_inode;
     struct lo_inode *newparent_inode;
@@ -1537,12 +1594,24 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
+    res = lo_inode_fd(newparent_inode, &newparent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
     if (flags) {
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
 #else
-        res = syscall(SYS_renameat2, parent_inode->fd, name,
-                        newparent_inode->fd, newname, flags);
+        res = syscall(SYS_renameat2, parent_fd.fd, name,
+                        newparent_fd.fd, newname, flags);
         if (res == -1 && errno == ENOSYS) {
             fuse_reply_err(req, EINVAL);
         } else {
@@ -1552,7 +1621,7 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = renameat(parent_inode->fd, name, newparent_inode->fd, newname);
+    res = renameat(parent_fd.fd, name, newparent_fd.fd, newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 out:
@@ -2037,6 +2106,7 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int fd = -1;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -2059,6 +2129,12 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    err = lo_inode_fd(parent_inode, &parent_fd);
+    if (err < 0) {
+        err = -err;
+        goto out;
+    }
+
     err = lo_change_cred(req, &old, lo->change_umask);
     if (err) {
         goto out;
@@ -2067,7 +2143,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
     /* Try to create a new file but don't open existing files */
-    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+    fd = openat(parent_fd.fd, name, fi->flags | O_CREAT | O_EXCL, mode);
     err = fd == -1 ? errno : 0;
 
     lo_restore_cred(&old, lo->change_umask);
@@ -2929,6 +3005,7 @@ static int remove_blocked_xattrs(struct lo_data *lo, char *xattr_list,
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -2997,7 +3074,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         ret = fgetxattr(fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -3035,6 +3117,7 @@ out:
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -3073,7 +3156,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         ret = flistxattr(fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3170,6 +3258,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags,
                         uint32_t extra_flags)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3229,7 +3318,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
             goto out;
         }
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
     }
@@ -3286,6 +3380,7 @@ out:
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3337,7 +3432,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
         ret = fremovexattr(fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



* [PATCH v3 05/10] virtiofsd: Let lo_fd() return a TempFd
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

Accessing lo_inode.fd must generally go through lo_inode_fd(), and lo_fd()
is no exception; it must then pass on the TempFd it received from
lo_inode_fd() to its caller.

(Note that all lo_fd() callers now do proper error handling, whereas
before, all of them used lo_fd() in-line, i.e. in place of the fd argument
of some function call.  This only worked because the only possible error
was lo_inode() failing to find the inode ID: -1 would then be passed as the
fd, resulting in an EBADF error, which is precisely what we want to return
to the guest for an invalid inode ID.

Now, though, lo_inode_fd() may end up invoking open_by_handle_at(), which
can fail with many different errors, and those should be handled properly
and returned to the guest.  So lo_fd() can no longer be used in-line; its
callers need to check its return value instead.)
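A small, self-contained illustration of why in-line use stopped being safe.  The inode table, the ENTRY_* markers, lookup_fd(), getattr_new() and getattr_old() are hypothetical stand-ins, not virtiofsd code; ESTALE merely represents one of the errors open_by_handle_at() can report.  The old-style caller turns every lookup failure into fd -1, so the guest always sees EBADF; the new-style caller propagates the real error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical markers for the inode table below (not virtiofsd code). */
#define ENTRY_UNKNOWN (-1) /* no inode with this ID */
#define ENTRY_STALE   (-2) /* simulates open_by_handle_at() failing */

struct inode_table {
    const int *entries; /* entries[ino] is an fd or one of the markers */
    size_t count;
};

/*
 * New style, modeled on lo_fd() after this patch: 0 on success, or a
 * negative errno that the caller must check and propagate.
 */
static int lookup_fd(const struct inode_table *t, size_t ino, int *fd_out)
{
    assert(fd_out != NULL);
    if (ino >= t->count || t->entries[ino] == ENTRY_UNKNOWN) {
        return -EBADF;
    }
    if (t->entries[ino] == ENTRY_STALE) {
        return -ESTALE;
    }
    *fd_out = t->entries[ino];
    return 0;
}

/* Caller with proper error handling: returns the positive errno that
 * would be sent to the guest, or 0 on success. */
static int getattr_new(const struct inode_table *t, size_t ino,
                       struct stat *st)
{
    int fd;
    int res = lookup_fd(t, ino, &fd);

    if (res < 0) {
        return -res;
    }
    return fstat(fd, st) == -1 ? errno : 0;
}

/* Old style: any lookup failure becomes fd -1 used in-line, so the
 * syscall reports EBADF no matter what actually went wrong. */
static int getattr_old(const struct inode_table *t, size_t ino,
                       struct stat *st)
{
    int fd = -1;

    if (ino < t->count && t->entries[ino] >= 0) {
        fd = t->entries[ino];
    }
    return fstat(fd, st) == -1 ? errno : 0;
}
```

For an unknown ID both styles yield EBADF, but for the simulated stale handle only the new style reports ESTALE; the old style silently masks it as EBADF.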

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 55 +++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 86b901cf19..9e1bc37af8 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -650,18 +650,19 @@ static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
  * they are done with the fd.  This will be done in a later patch to make
  * review easier.
  */
-static int lo_fd(fuse_req_t req, fuse_ino_t ino)
+static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 {
     struct lo_inode *inode = lo_inode(req, ino);
-    int fd;
+    int res;
 
     if (!inode) {
-        return -1;
+        return -EBADF;
     }
 
-    fd = inode->fd;
+    res = lo_inode_fd(inode, tfd);
+
     lo_inode_put(lo_data(req), &inode);
-    return fd;
+    return res;
 }
 
 /*
@@ -798,14 +799,19 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct stat buf;
     struct lo_data *lo = lo_data(req);
 
     (void)fi;
 
-    res =
-        fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = fstatat(ino_fd.fd, "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -1529,6 +1535,7 @@ out:
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1543,13 +1550,19 @@ static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+    res = unlinkat(parent_fd.fd, name, AT_REMOVEDIR);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1635,6 +1648,7 @@ out:
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1649,13 +1663,19 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, 0);
+    res = unlinkat(parent_fd.fd, name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1735,10 +1755,16 @@ static void lo_forget_multi(fuse_req_t req, size_t count,
 
 static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     char buf[PATH_MAX + 1];
     int res;
 
-    res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = readlinkat(ino_fd.fd, "", buf, sizeof(buf));
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -2535,10 +2561,17 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct statvfs stbuf;
 
-    res = fstatvfs(lo_fd(req, ino), &stbuf);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
+    res = fstatvfs(ino_fd.fd, &stbuf);
     if (res == -1) {
         fuse_reply_err(req, errno);
     } else {
-- 
2.31.1




* [PATCH v3 06/10] virtiofsd: Let lo_inode_open() return a TempFd
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
  0 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

Strictly speaking, this is not necessary, because lo_inode_open() will
always return a new FD owned by the caller, so TempFd.owned will always
be true.
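The sketch below shows the ownership model this paragraph relies on. It rests on some assumptions. DemoTempFd and the demo_* helpers are hypothetical simplifications of TempFd, temp_fd_clear(), temp_fd_steal() and lo_inode_open(), not their real definitions. In particular, this excerpt does not show what temp_fd_steal() does with a borrowed fd, so the sketch simply returns a dup() in that case. demo_opendir() mirrors how lo_opendir() hands the fd over to fdopendir().

```c
#define _GNU_SOURCE              /* fdopendir(), O_DIRECTORY */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for TempFd: an fd plus an ownership flag. */
typedef struct {
    int fd;
    bool owned;     /* true: clearing closes fd; false: fd is borrowed */
} DemoTempFd;

#define DEMO_TEMP_FD_INIT ((DemoTempFd) { .fd = -1, .owned = false })

static void demo_temp_fd_clear(DemoTempFd *t)
{
    if (t->owned) {
        close(t->fd);
        *t = DEMO_TEMP_FD_INIT;
    }
}

/* Take the fd out of *t so that clearing *t no longer closes it. */
static int demo_temp_fd_steal(DemoTempFd *t)
{
    assert(t->fd >= 0);
    if (t->owned) {
        t->owned = false;
        return t->fd;
    }
    return dup(t->fd);  /* assumption: borrowed fds are duplicated */
}

/* Like the new lo_inode_open(): 0 or -errno; the result is always owned. */
static int demo_inode_open(const char *path, int flags, DemoTempFd *t)
{
    int fd = open(path, flags);
    if (fd < 0) {
        return -errno;
    }
    *t = (DemoTempFd) { .fd = fd, .owned = true };
    return 0;
}

/* Like lo_opendir(): the DIR keeps the fd, so ownership is stolen. */
static DIR *demo_opendir(const char *path)
{
    DemoTempFd t = DEMO_TEMP_FD_INIT;
    if (demo_inode_open(path, O_RDONLY | O_DIRECTORY, &t) < 0) {
        return NULL;
    }
    int fd = demo_temp_fd_steal(&t);
    DIR *dp = fdopendir(fd);
    if (!dp) {
        close(fd);      /* fdopendir() does not consume the fd on failure */
    }
    demo_temp_fd_clear(&t);     /* no-op: t no longer owns the fd */
    return dp;
}
```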

However, auto-cleanup is convenient, and in some cases a TempFd from
lo_inode_open() pairs well with an lo_inode_fd() call in another
conditional branch (see lo_setattr()).
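Here is a minimal sketch of the conditional-branch case, modeled on the lo_setattr() hunk. The types and names are hypothetical. GLib's g_auto() is built on the GCC/Clang cleanup attribute, and this sketch uses that attribute directly instead of the real TempFd cleanup function. One variable either borrows a long-lived O_PATH fd or owns a freshly opened O_RDWR fd, and the same cleanup runs on every return path. Only the owned fd is closed.

```c
#define _GNU_SOURCE             /* O_PATH, ftruncate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for TempFd: an fd plus an ownership flag. */
typedef struct {
    int fd;
    bool owned;
} DemoTempFd;

#define DEMO_TEMP_FD_INIT ((DemoTempFd) { .fd = -1, .owned = false })

/* Close the fd only if this TempFd owns it; borrowed fds are left alone. */
static void demo_temp_fd_clear(DemoTempFd *t)
{
    if (t->owned) {
        close(t->fd);
        *t = DEMO_TEMP_FD_INIT;
    }
}

/* Roughly what g_auto(TempFd) gives: demo_temp_fd_clear() on scope exit. */
#define DEMO_AUTO_TEMP_FD __attribute__((cleanup(demo_temp_fd_clear))) DemoTempFd

/*
 * Truncating without a FUSE file handle needs a fresh O_RDWR fd (owned);
 * otherwise the O_PATH fd is borrowed.  Returns 0 or -errno; *used_fd
 * reports which fd was used so a caller can observe what cleanup did.
 */
static int demo_setattr(int path_fd, const char *path, bool truncate,
                        int *used_fd)
{
    DEMO_AUTO_TEMP_FD tfd = DEMO_TEMP_FD_INIT;

    assert(used_fd != NULL);
    if (truncate) {
        int fd = open(path, O_RDWR);
        if (fd < 0) {
            return -errno;
        }
        tfd = (DemoTempFd) { .fd = fd, .owned = true };
    } else {
        tfd = (DemoTempFd) { .fd = path_fd, .owned = false };
    }
    *used_fd = tfd.fd;

    if (truncate && ftruncate(tfd.fd, 0) == -1) {
        return -errno;          /* tfd is still cleared automatically */
    }
    return 0;                   /* no explicit close() on any path */
}
```

This removes the repeated `if (!fi) close(truncfd);` blocks on each error path that the lo_setattr() hunk below deletes.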

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 138 +++++++++++++------------------
 1 file changed, 59 insertions(+), 79 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 9e1bc37af8..292b7f7e27 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -291,10 +291,8 @@ static void temp_fd_clear(TempFd *temp_fd)
 /**
  * Return an owned fd from *temp_fd that will not be closed when
  * *temp_fd goes out of scope.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+static int temp_fd_steal(TempFd *temp_fd)
 {
     if (temp_fd->owned) {
         temp_fd->owned = false;
@@ -673,9 +671,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
  * when a malicious client opens special files such as block device nodes.
  * Symlink inodes are also rejected since symlinks must already have been
  * traversed on the client side.
+ *
+ * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
+ * otherwise.
  */
-static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
-                         int open_flags)
+static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
+                         int open_flags, TempFd *tfd)
 {
     g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
     int fd;
@@ -694,7 +695,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
     if (fd < 0) {
         return -errno;
     }
-    return fd;
+
+    *tfd = (TempFd) {
+        .fd = fd,
+        .owned = true,
+    };
+
+    return 0;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -852,7 +859,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    res = lo_inode_fd(inode, &inode_fd);
+    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
+        /* We need an O_RDWR FD for ftruncate() */
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    } else {
+        res = lo_inode_fd(inode, &inode_fd);
+    }
     if (res < 0) {
         saverr = -res;
         goto out_err;
@@ -900,18 +912,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            truncfd = lo_inode_open(lo, inode, O_RDWR);
-            if (truncfd < 0) {
-                saverr = -truncfd;
-                goto out_err;
-            }
+            truncfd = inode_fd.fd;
         }
 
         saverr = drop_security_capability(lo, truncfd);
         if (saverr) {
-            if (!fi) {
-                close(truncfd);
-            }
             goto out_err;
         }
 
@@ -919,9 +924,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
             if (res != 0) {
                 saverr = res;
-                if (!fi) {
-                    close(truncfd);
-                }
                 goto out_err;
             }
         }
@@ -934,9 +936,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (!fi) {
-            close(truncfd);
-        }
         if (res == -1) {
             goto out_err;
         }
@@ -1822,11 +1821,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_dirp *d = NULL;
-    int fd;
+    int res;
     ssize_t fh;
 
     inode = lo_inode(req, ino);
@@ -1840,13 +1840,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         goto out_err;
     }
 
-    fd = lo_inode_open(lo, inode, O_RDONLY);
-    if (fd < 0) {
-        error = -fd;
+    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+    if (res < 0) {
+        error = -res;
         goto out_err;
     }
 
-    d->dp = fdopendir(fd);
+    d->dp = fdopendir(temp_fd_steal(&inode_fd));
     if (d->dp == NULL) {
         goto out_errno;
     }
@@ -1876,8 +1876,6 @@ out_err:
     if (d) {
         if (d->dp) {
             closedir(d->dp);
-        } else if (fd != -1) {
-            close(fd);
         }
         free(d);
     }
@@ -2077,6 +2075,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
 static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
                       int existing_fd, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     ssize_t fh;
     int fd = existing_fd;
     int err;
@@ -2093,16 +2092,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
             }
         }
 
-        fd = lo_inode_open(lo, inode, fi->flags);
+        err = lo_inode_open(lo, inode, fi->flags, &inode_fd);
 
         if (cap_fsetid_dropped) {
             if (gain_effective_cap("FSETID")) {
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (fd < 0) {
-            return -fd;
+        if (err < 0) {
+            return -err;
         }
+        fd = temp_fd_steal(&inode_fd);
+
         if (fi->flags & (O_TRUNC)) {
             int err = drop_security_capability(lo, fd);
             if (err) {
@@ -2212,8 +2213,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       uint64_t lock_owner,
                                                       pid_t pid, int *err)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode_plock *plock;
-    int fd;
+    int res;
 
     plock =
         g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
@@ -2230,15 +2232,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 
     /* Open another instance of file which can be used for ofd locks. */
     /* TODO: What if file is not writable? */
-    fd = lo_inode_open(lo, inode, O_RDWR);
-    if (fd < 0) {
-        *err = -fd;
+    res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    if (res < 0) {
+        *err = -res;
         free(plock);
         return NULL;
     }
 
     plock->lock_owner = lock_owner;
-    plock->fd = fd;
+    plock->fd = temp_fd_steal(&inode_fd);
     g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
                         plock);
     return plock;
@@ -2454,6 +2456,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode *inode = lo_inode(req, ino);
     struct lo_data *lo = lo_data(req);
     int res;
@@ -2468,11 +2471,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     }
 
     if (!fi) {
-        fd = lo_inode_open(lo, inode, O_RDWR);
-        if (fd < 0) {
-            res = -fd;
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+        if (res < 0) {
+            res = -res;
             goto out;
         }
+        fd = inode_fd.fd;
     } else {
         fd = lo_fi_fd(req, fi);
     }
@@ -2482,9 +2486,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd) == -1 ? errno : 0;
     }
-    if (!fi) {
-        close(fd);
-    }
 out:
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, res);
@@ -3047,7 +3048,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3099,12 +3099,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fgetxattr(fd, name, value, size);
+        ret = fgetxattr(inode_fd.fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3133,10 +3133,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3157,7 +3153,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -3181,12 +3176,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = flistxattr(fd, value, size);
+        ret = flistxattr(inode_fd.fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3273,10 +3268,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3299,7 +3290,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
     bool switched_creds = false;
     bool cap_fsetid_dropped = false;
     struct lo_cred old = {};
@@ -3345,9 +3335,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      */
     open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
     if (open_inode) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
     } else {
@@ -3382,8 +3372,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         switched_creds = true;
     }
     if (open_inode) {
-        assert(fd >= 0);
-        ret = fsetxattr(fd, name, value, size, flags);
+        ret = fsetxattr(inode_fd.fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = setxattr(procname, name, value, size, flags);
@@ -3402,10 +3391,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     }
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
@@ -3421,7 +3406,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3457,12 +3441,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
              name);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fremovexattr(fd, name);
+        ret = fremovexattr(inode_fd.fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3479,10 +3463,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     }
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
-- 
2.31.1




* [Virtio-fs] [PATCH v3 06/10] virtiofsd: Let lo_inode_open() return a TempFd
@ 2021-07-30 15:01   ` Max Reitz
  0 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Vivek Goyal, Max Reitz

Strictly speaking, this is not necessary, because lo_inode_open() will
always return a new FD owned by the caller, so TempFd.owned will always
be true.

However, auto-cleanup is nice, and in some cases this plays nicely with
an lo_inode_fd() call in another conditional branch (see lo_setattr()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 138 +++++++++++++------------------
 1 file changed, 59 insertions(+), 79 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 9e1bc37af8..292b7f7e27 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -291,10 +291,8 @@ static void temp_fd_clear(TempFd *temp_fd)
 /**
  * Return an owned fd from *temp_fd that will not be closed when
  * *temp_fd goes out of scope.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+static int temp_fd_steal(TempFd *temp_fd)
 {
     if (temp_fd->owned) {
         temp_fd->owned = false;
@@ -673,9 +671,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
  * when a malicious client opens special files such as block device nodes.
  * Symlink inodes are also rejected since symlinks must already have been
  * traversed on the client side.
+ *
+ * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
+ * otherwise.
  */
-static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
-                         int open_flags)
+static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
+                         int open_flags, TempFd *tfd)
 {
     g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
     int fd;
@@ -694,7 +695,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
     if (fd < 0) {
         return -errno;
     }
-    return fd;
+
+    *tfd = (TempFd) {
+        .fd = fd,
+        .owned = true,
+    };
+
+    return 0;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -852,7 +859,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    res = lo_inode_fd(inode, &inode_fd);
+    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
+        /* We need an O_RDWR FD for ftruncate() */
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    } else {
+        res = lo_inode_fd(inode, &inode_fd);
+    }
     if (res < 0) {
         saverr = -res;
         goto out_err;
@@ -900,18 +912,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            truncfd = lo_inode_open(lo, inode, O_RDWR);
-            if (truncfd < 0) {
-                saverr = -truncfd;
-                goto out_err;
-            }
+            truncfd = inode_fd.fd;
         }
 
         saverr = drop_security_capability(lo, truncfd);
         if (saverr) {
-            if (!fi) {
-                close(truncfd);
-            }
             goto out_err;
         }
 
@@ -919,9 +924,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
             if (res != 0) {
                 saverr = res;
-                if (!fi) {
-                    close(truncfd);
-                }
                 goto out_err;
             }
         }
@@ -934,9 +936,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (!fi) {
-            close(truncfd);
-        }
         if (res == -1) {
             goto out_err;
         }
@@ -1822,11 +1821,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_dirp *d = NULL;
-    int fd;
+    int res;
     ssize_t fh;
 
     inode = lo_inode(req, ino);
@@ -1840,13 +1840,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         goto out_err;
     }
 
-    fd = lo_inode_open(lo, inode, O_RDONLY);
-    if (fd < 0) {
-        error = -fd;
+    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+    if (res < 0) {
+        error = -res;
         goto out_err;
     }
 
-    d->dp = fdopendir(fd);
+    d->dp = fdopendir(temp_fd_steal(&inode_fd));
     if (d->dp == NULL) {
         goto out_errno;
     }
@@ -1876,8 +1876,6 @@ out_err:
     if (d) {
         if (d->dp) {
             closedir(d->dp);
-        } else if (fd != -1) {
-            close(fd);
         }
         free(d);
     }
@@ -2077,6 +2075,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
 static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
                       int existing_fd, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     ssize_t fh;
     int fd = existing_fd;
     int err;
@@ -2093,16 +2092,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
             }
         }
 
-        fd = lo_inode_open(lo, inode, fi->flags);
+        err = lo_inode_open(lo, inode, fi->flags, &inode_fd);
 
         if (cap_fsetid_dropped) {
             if (gain_effective_cap("FSETID")) {
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (fd < 0) {
-            return -fd;
+        if (err < 0) {
+            return -err;
         }
+        fd = temp_fd_steal(&inode_fd);
+
         if (fi->flags & (O_TRUNC)) {
             int err = drop_security_capability(lo, fd);
             if (err) {
@@ -2212,8 +2213,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       uint64_t lock_owner,
                                                       pid_t pid, int *err)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode_plock *plock;
-    int fd;
+    int res;
 
     plock =
         g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
@@ -2230,15 +2232,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 
     /* Open another instance of file which can be used for ofd locks. */
     /* TODO: What if file is not writable? */
-    fd = lo_inode_open(lo, inode, O_RDWR);
-    if (fd < 0) {
-        *err = -fd;
+    res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    if (res < 0) {
+        *err = -res;
         free(plock);
         return NULL;
     }
 
     plock->lock_owner = lock_owner;
-    plock->fd = fd;
+    plock->fd = temp_fd_steal(&inode_fd);
     g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
                         plock);
     return plock;
@@ -2454,6 +2456,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode *inode = lo_inode(req, ino);
     struct lo_data *lo = lo_data(req);
     int res;
@@ -2468,11 +2471,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     }
 
     if (!fi) {
-        fd = lo_inode_open(lo, inode, O_RDWR);
-        if (fd < 0) {
-            res = -fd;
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+        if (res < 0) {
+            res = -res;
             goto out;
         }
+        fd = inode_fd.fd;
     } else {
         fd = lo_fi_fd(req, fi);
     }
@@ -2482,9 +2486,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd) == -1 ? errno : 0;
     }
-    if (!fi) {
-        close(fd);
-    }
 out:
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, res);
@@ -3047,7 +3048,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3099,12 +3099,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fgetxattr(fd, name, value, size);
+        ret = fgetxattr(inode_fd.fd, name, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3133,10 +3133,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3157,7 +3153,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -3181,12 +3176,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = flistxattr(fd, value, size);
+        ret = flistxattr(inode_fd.fd, value, size);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3273,10 +3268,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3299,7 +3290,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
     bool switched_creds = false;
     bool cap_fsetid_dropped = false;
     struct lo_cred old = {};
@@ -3345,9 +3335,9 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      */
     open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
     if (open_inode) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
     } else {
@@ -3382,8 +3372,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         switched_creds = true;
     }
     if (open_inode) {
-        assert(fd >= 0);
-        ret = fsetxattr(fd, name, value, size, flags);
+        ret = fsetxattr(inode_fd.fd, name, value, size, flags);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = setxattr(procname, name, value, size, flags);
@@ -3402,10 +3391,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     }
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
@@ -3421,7 +3406,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     if (block_xattr(lo, in_name)) {
         fuse_reply_err(req, EOPNOTSUPP);
@@ -3457,12 +3441,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
              name);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fremovexattr(fd, name);
+        ret = fremovexattr(inode_fd.fd, name);
         saverr = ret == -1 ? errno : 0;
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
@@ -3479,10 +3463,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     }
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
-- 
2.31.1



* [PATCH v3 07/10] virtiofsd: Add lo_inode.fhandle
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
  0 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

This new field is an alternative to lo_inode.fd: Either of the two must
be set.  In case an O_PATH FD is needed for some lo_inode, it is either
taken from lo_inode.fd, if valid, or a temporary FD is opened with
open_by_handle_at().
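Below is a minimal sketch of the borrowed-versus-temporary FD pattern described above. It assumes a simplified TempFd-like wrapper, and `temp_fd_t`, `temp_fd_borrow()`, `temp_fd_open()` and `temp_fd_release()` are hypothetical names, not virtiofsd code. Plain `open()` on a path stands in for `open_by_handle_at()`, which needs CAP_DAC_READ_SEARCH:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for virtiofsd's TempFd: either an FD borrowed
 * from a long-lived owner (owned == false), such as lo_inode.fd, or an
 * FD opened just for the current operation (owned == true).
 */
typedef struct {
    int fd;
    bool owned;
} temp_fd_t;

/* Borrow a long-lived FD; releasing the wrapper must not close it. */
static temp_fd_t temp_fd_borrow(int fd)
{
    assert(fd >= 0);
    return (temp_fd_t){ .fd = fd, .owned = false };
}

/*
 * Open a temporary FD that the wrapper owns.  Plain open() on a path
 * stands in for open_by_handle_at() here.  Returns 0 or -errno.
 */
static int temp_fd_open(const char *path, int flags, temp_fd_t *tfd)
{
    int fd = open(path, flags | O_CLOEXEC);

    if (fd < 0) {
        return -errno;
    }
    *tfd = (temp_fd_t){ .fd = fd, .owned = true };
    return 0;
}

/* Close the FD only if the wrapper owns it, then invalidate the wrapper. */
static void temp_fd_release(temp_fd_t *tfd)
{
    if (tfd->owned && tfd->fd >= 0) {
        close(tfd->fd);
    }
    tfd->fd = -1;
    tfd->owned = false;
}

/* True if fd currently refers to an open file description. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

The `owned` flag lets every caller release its wrapper unconditionally: a borrowed lo_inode.fd stays open, while a temporary FD is not leaked.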

Using a file handle instead of an FD has the advantage of keeping the
number of open file descriptors low.

Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
opened on the filesystem to which the file handle refers), but every
lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
keep a hash map of such FDs in mount_fds (mapping ID to FD).
get_file_handle(), which is added by a later patch, will ensure that
every mount ID for which we have generated a handle has a corresponding
entry in mount_fds.
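Here is a simplified sketch of such a mount-ID-to-FD map. It assumes a small fixed-size table in place of the GHashTable, and `mount_fds_tab`, `mount_fds_insert()` and `mount_fds_lookup()` are hypothetical. Because FD 0 is a legal mount FD, the lookup tracks "found" separately from the stored value, which is what `g_hash_table_lookup_extended()` provides in the patch:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size replacement for the mount_fds GHashTable. */
#define MOUNT_FDS_MAX 16

struct mount_fd_entry {
    int mount_id; /* as returned by name_to_handle_at() */
    int fd;       /* FD to pass to open_by_handle_at(); 0 is valid */
};

static struct mount_fd_entry mount_fds_tab[MOUNT_FDS_MAX];
static size_t mount_fds_count;
static pthread_rwlock_t mount_fds_tab_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Insert or update a mapping.  Returns 0, -ENOSPC, or -error from the lock. */
static int mount_fds_insert(int mount_id, int fd)
{
    int ret = pthread_rwlock_wrlock(&mount_fds_tab_lock);
    size_t i;

    if (ret) {
        return -ret;
    }
    for (i = 0; i < mount_fds_count; i++) {
        if (mount_fds_tab[i].mount_id == mount_id) {
            mount_fds_tab[i].fd = fd;
            goto out;
        }
    }
    if (mount_fds_count == MOUNT_FDS_MAX) {
        ret = -ENOSPC;
        goto out;
    }
    mount_fds_tab[mount_fds_count++] = (struct mount_fd_entry){ mount_id, fd };
out:
    pthread_rwlock_unlock(&mount_fds_tab_lock);
    return ret;
}

/*
 * Return the mount FD for mount_id (possibly 0), or -EINVAL if none is
 * known, mirroring the error that open_file_handle() returns.
 */
static int mount_fds_lookup(int mount_id)
{
    bool found = false;
    int fd = -1;
    size_t i;
    int ret = pthread_rwlock_rdlock(&mount_fds_tab_lock);

    if (ret) {
        return -ret;
    }
    for (i = 0; i < mount_fds_count; i++) {
        if (mount_fds_tab[i].mount_id == mount_id) {
            found = true;
            fd = mount_fds_tab[i].fd;
            break;
        }
    }
    pthread_rwlock_unlock(&mount_fds_tab_lock);

    if (!found) {
        return -EINVAL;
    }
    assert(fd >= 0);
    return fd;
}
```

With a plain `g_hash_table_lookup()`, a stored `GINT_TO_POINTER(0)` would come back as NULL and be indistinguishable from a missing entry.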

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c      | 116 ++++++++++++++++++++++----
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 2 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 292b7f7e27..487448d666 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -88,8 +88,25 @@ struct lo_key {
     uint64_t mnt_id;
 };
 
+struct lo_fhandle {
+    union {
+        struct file_handle handle;
+        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
+    };
+    int mount_id;
+};
+
+/* Maps mount IDs to an FD that we can pass to open_by_handle_at() */
+static GHashTable *mount_fds;
+pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
+
 struct lo_inode {
+    /*
+     * Either of fd or fhandle must be set (i.e. >= 0 or non-NULL,
+     * respectively).
+     */
     int fd;
+    struct lo_fhandle *fhandle;
 
     /*
      * Atomic reference count for this object.  The nlookup field holds a
@@ -302,6 +319,44 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Open the given file handle with the given flags.
+ *
+ * The mount FD to pass to open_by_handle_at() is taken from the
+ * mount_fds hash map.
+ *
+ * On error, return -errno.
+ */
+static int open_file_handle(const struct lo_fhandle *fh, int flags)
+{
+    gpointer mount_fd_ptr;
+    int mount_fd;
+    bool found;
+    int ret;
+
+    ret = pthread_rwlock_rdlock(&mount_fds_lock);
+    if (ret) {
+        return -ret;
+    }
+
+    /* mount_fd == 0 is valid, so we need lookup_extended */
+    found = g_hash_table_lookup_extended(mount_fds,
+                                         GINT_TO_POINTER(fh->mount_id),
+                                         NULL, &mount_fd_ptr);
+    pthread_rwlock_unlock(&mount_fds_lock);
+    if (!found) {
+        return -EINVAL;
+    }
+    mount_fd = GPOINTER_TO_INT(mount_fd_ptr);
+
+    ret = open_by_handle_at(mount_fd, (struct file_handle *)&fh->handle, flags);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    return ret;
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
@@ -608,7 +663,11 @@ static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
     *inodep = NULL;
 
     if (g_atomic_int_dec_and_test(&inode->refcount)) {
-        close(inode->fd);
+        if (inode->fd >= 0) {
+            close(inode->fd);
+        } else {
+            g_free(inode->fhandle);
+        }
         free(inode);
     }
 }
@@ -635,10 +694,25 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 
 static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
 {
-    *tfd = (TempFd) {
-        .fd = inode->fd,
-        .owned = false,
-    };
+    if (inode->fd >= 0) {
+        *tfd = (TempFd) {
+            .fd = inode->fd,
+            .owned = false,
+        };
+    } else {
+        int fd;
+
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, O_PATH);
+        if (fd < 0) {
+            return fd;
+        }
+
+        *tfd = (TempFd) {
+            .fd = fd,
+            .owned = true,
+        };
+    }
 
     return 0;
 }
@@ -678,22 +752,32 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
                          int open_flags, TempFd *tfd)
 {
-    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+    g_autofree char *fd_str = NULL;
     int fd;
 
     if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
         return -EBADF;
     }
 
-    /*
-     * The file is a symlink so O_NOFOLLOW must be ignored. We checked earlier
-     * that the inode is not a special file but if an external process races
-     * with us then symlinks are traversed here. It is not possible to escape
-     * the shared directory since it is mounted as "/" though.
-     */
-    fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
-    if (fd < 0) {
-        return -errno;
+    if (inode->fd >= 0) {
+        /*
+         * The file is a symlink so O_NOFOLLOW must be ignored. We checked
+         * earlier that the inode is not a special file but if an external
+         * process races with us then symlinks are traversed here. It is not
+         * possible to escape the shared directory since it is mounted as "/"
+         * though.
+         */
+        fd_str = g_strdup_printf("%d", inode->fd);
+        fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
+        if (fd < 0) {
+            return -errno;
+        }
+    } else {
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, open_flags);
+        if (fd < 0) {
+            return fd;
+        }
     }
 
     *tfd = (TempFd) {
@@ -4110,6 +4194,8 @@ int main(int argc, char *argv[])
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
 
+    mount_fds = g_hash_table_new(NULL, NULL);
+
     /*
      * Set up the ino map like this:
      * [0] Reserved (will not be used)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index f49ed94b5e..af04c638cb 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -77,6 +77,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(statx),
     SCMP_SYS(open),
     SCMP_SYS(openat),
+    SCMP_SYS(open_by_handle_at),
     SCMP_SYS(ppoll),
     SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
     SCMP_SYS(preadv),
-- 
2.31.1




* [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
  0 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

Currently, lo_inode.fhandle is always NULL and so we always keep an O_PATH
FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
its inode ID will remain in use until we drop our lo_inode (and
lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
the inode ID as an lo_inode key, because any inode with an inode ID we
find in lo_data.inodes (on the same filesystem) must be the exact same
file.
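The property relied on here (an open FD keeps an unlinked inode, and therefore its inode number, alive) can be observed with a small sketch. `unlinked_inode_stays_pinned()` is a hypothetical helper, and it uses an ordinary FD from `mkstemp()` instead of an O_PATH FD, assuming that any open FD pins the inode in the same way:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file in dir, keep an FD to it, unlink it, and report whether
 * fstat() through the FD still sees the same inode (same st_dev and
 * st_ino) with a link count of zero.  Returns 0 or -errno.
 */
static int unlinked_inode_stays_pinned(const char *dir, bool *pinned)
{
    char path[4096];
    struct stat before, after;
    int fd, ret = 0;

    if (snprintf(path, sizeof(path), "%s/pin-XXXXXX", dir) >=
        (int)sizeof(path)) {
        return -ENAMETOOLONG;
    }
    fd = mkstemp(path);
    if (fd < 0) {
        return -errno;
    }
    if (fstat(fd, &before) < 0 || unlink(path) < 0 ||
        fstat(fd, &after) < 0) {
        ret = -errno;
    } else {
        /* The name is gone, but the inode is still reachable via fd. */
        *pinned = after.st_ino == before.st_ino &&
                  after.st_dev == before.st_dev &&
                  after.st_nlink == 0;
    }
    close(fd);
    return ret;
}
```

Without such an FD, nothing prevents the filesystem from freeing the inode after unlink() and reusing its number for a new file.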

This will change when we start setting lo_inode.fhandle so we do not
have to keep an O_PATH FD open.  Then, unlinking such an inode will
immediately remove it, so its ID can then be reused by newly created
files, even while the lo_inode object is still there[1].

So creating a new file can then reuse the old file's inode ID, and
looking up the new file would lead to us finding the old file's
lo_inode, which is not ideal.

Luckily, just as file handles cause this problem, they also solve it:  A
file handle contains a generation ID, which changes when an inode ID is
reused, so the new file can be distinguished from the old one.  So all
we need to do is to add a second map besides lo_data.inodes that maps
file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.

Unfortunately, we cannot rely on being able to generate file handles
every time.  Therefore, we still enter every lo_inode object into
inodes_by_ids, but having an entry in inodes_by_handle is optional.  If
an inodes_by_handle entry exists, it takes precedence; the inodes_by_ids
entry is just a fallback.
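The resulting lookup order in lo_find() can be modelled as follows. This is a simplified sketch, not the patch's code: it assumes a file handle can be reduced to an inode number plus a generation counter (as with e.g. ext4's handles; real handles are opaque and compared with memcmp()), and `struct model_inode`, `struct model_handle` and `model_find()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an lo_inode. */
struct model_inode {
    uint64_t ino;        /* st_ino, the key for the by-IDs lookup */
    uint32_t generation; /* part of the (modelled) file handle */
    bool has_handle;     /* entered into the by-handle lookup? */
    int fd;              /* O_PATH FD, or -1 if only a handle is kept */
};

/* Modelled file handle: inode number plus generation. */
struct model_handle {
    uint64_t ino;
    uint32_t generation;
};

/*
 * Look up by handle first (if the caller has one), then fall back to the
 * inode number, accepting a fallback hit only if it still holds an FD:
 * only then is the inode pinned and st_ino a reliable identity.
 */
static struct model_inode *model_find(struct model_inode *tab, size_t n,
                                      const struct model_handle *fh,
                                      uint64_t ino)
{
    size_t i;

    assert(tab != NULL || n == 0);
    if (fh) {
        for (i = 0; i < n; i++) {
            if (tab[i].has_handle && tab[i].ino == fh->ino &&
                tab[i].generation == fh->generation) {
                return &tab[i];
            }
        }
    }
    for (i = 0; i < n; i++) {
        if (tab[i].ino == ino) {
            /* Like `p->fd == -1` in lo_find(): reject handle-only hits. */
            return tab[i].fd == -1 ? NULL : &tab[i];
        }
    }
    return NULL;
}
```

In this model, a handle-only entry whose inode number has been reused is never returned for the new file: the by-handle lookup misses because the generation differs, and the by-ID fallback rejects it because it holds no FD.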

Note that we do not generate lo_fhandle objects yet, and so we also do
not enter anything into the inodes_by_handle map yet.  Also, all lookups
skip that map.  We could generate file handles here with some temporary
code that the next patch would immediately remove again, but that would
break the assumption in lo_find() that every lo_inode with a non-NULL
.fhandle has an entry in inodes_by_handle and vice versa.  So we leave
actually using the inodes_by_handle map to the next patch.

[1] If some application in the guest still has the file open, there is
going to be a corresponding FD mapping in lo_data.fd_map.  In such a
case, the inode will only go away once every application in the guest
has closed it.  The problem described only applies to cases where the
guest does not have the file open, and it is just in the dentry cache,
basically.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 487448d666..f9d8b2f134 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -180,7 +180,8 @@ struct lo_data {
     int announce_submounts;
     bool use_statx;
     struct lo_inode root;
-    GHashTable *inodes; /* protected by lo->mutex */
+    GHashTable *inodes_by_ids; /* protected by lo->mutex */
+    GHashTable *inodes_by_handle; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -263,8 +264,9 @@ static struct {
 /* That we loaded cap-ng in the current thread from the saved */
 static __thread bool cap_loaded = 0;
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id);
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id);
 static int xattr_map_client(const struct lo_data *lo, const char *client_name,
                             char **out_name);
 
@@ -1064,18 +1066,40 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id)
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
 {
-    struct lo_inode *p;
-    struct lo_key key = {
+    struct lo_inode *p = NULL;
+    struct lo_key ids_key = {
         .ino = st->st_ino,
         .dev = st->st_dev,
         .mnt_id = mnt_id,
     };
 
     pthread_mutex_lock(&lo->mutex);
-    p = g_hash_table_lookup(lo->inodes, &key);
+    if (fhandle) {
+        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
+    }
+    if (!p) {
+        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
+        /*
+         * When we had to fall back to looking up an inode by its
+         * inode ID, ensure that we hit an entry that has a valid file
+         * descriptor.  Having an FD open means that the inode cannot
+         * really be deleted until the FD is closed, so that the inode
+         * ID remains valid until we evict our lo_inode.
+         * With no FD open (and just a file handle), the inode can be
+         * deleted while we still have our lo_inode, and so the inode
+         * ID may be reused by a completely different new inode.  We
+         * then must look up the lo_inode by file handle, because this
+         * handle contains a generation ID to differentiate between
+         * the old and the new inode.
+         */
+        if (p && p->fd == -1) {
+            p = NULL;
+        }
+    }
     if (p) {
         assert(p->nlookup > 0);
         p->nlookup++;
@@ -1215,7 +1239,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, &e->attr, mnt_id);
+    inode = lo_find(lo, NULL, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
     } else {
@@ -1245,7 +1269,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
-        g_hash_table_insert(lo->inodes, &inode->key, inode);
+        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1609,7 +1633,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
-    inode = lo_find(lo, &attr, mnt_id);
+    inode = lo_find(lo, NULL, &attr, mnt_id);
 
 out:
     lo_inode_put(lo, &dir);
@@ -1776,7 +1800,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     inode->nlookup -= n;
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
-        g_hash_table_remove(lo->inodes, &inode->key);
+        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -3603,7 +3627,7 @@ static void lo_destroy(void *userdata)
         GHashTableIter iter;
         gpointer key, value;
 
-        g_hash_table_iter_init(&iter, lo->inodes);
+        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
         if (!g_hash_table_iter_next(&iter, &key, &value)) {
             break;
         }
@@ -4129,10 +4153,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
     return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
 }
 
+static guint lo_fhandle_hash(gconstpointer key)
+{
+    const struct lo_fhandle *fh = key;
+    guint hash;
+    size_t i;
+
+    /* Basically g_str_hash() */
+    hash = 5381;
+    for (i = 0; i < sizeof(fh->padding); i++) {
+        hash = hash * 33 + (unsigned char)fh->padding[i];
+    }
+    hash = hash * 33 + fh->mount_id;
+
+    return hash;
+}
+
+static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
+{
+    return !memcmp(a, b, sizeof(struct lo_fhandle));
+}
+
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
-    if (lo->inodes) {
-        g_hash_table_destroy(lo->inodes);
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_ids);
+    }
+    if (lo->inodes_by_handle) {
+        g_hash_table_destroy(lo->inodes_by_handle);
     }
 
     if (lo->root.posix_locks) {
@@ -4189,7 +4237,8 @@ int main(int argc, char *argv[])
     qemu_init_exec_dir(argv[0]);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
-- 
2.31.1



+{
+    const struct lo_fhandle *fh = key;
+    guint hash;
+    size_t i;
+
+    /* Basically g_str_hash() */
+    hash = 5381;
+    for (i = 0; i < sizeof(fh->padding); i++) {
+        hash = hash * 33 + (unsigned char)fh->padding[i];
+    }
+    hash = hash * 33 + fh->mount_id;
+
+    return hash;
+}
+
+static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
+{
+    return !memcmp(a, b, sizeof(struct lo_fhandle));
+}
+
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
-    if (lo->inodes) {
-        g_hash_table_destroy(lo->inodes);
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_ids);
+    }
+    if (lo->inodes_by_handle) {
+        g_hash_table_destroy(lo->inodes_by_handle);
     }
 
     if (lo->root.posix_locks) {
@@ -4189,7 +4237,8 @@ int main(int argc, char *argv[])
     qemu_init_exec_dir(argv[0]);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
-- 
2.31.1
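lo_fhandle_hash() and lo_fhandle_equal() can hash and compare every byte of the fixed-size key only because the key is zero-initialized (g_new0()), so unused handle bytes never differ. Below is a minimal standalone sketch of the same idea. It uses the djb2 recurrence h = h * 33 + c, seeded with 5381, which is the recurrence GLib documents for g_str_hash(). struct fhandle_key and its 16-byte buffer are hypothetical simplifications of struct lo_fhandle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified key: fixed-size handle buffer plus mount ID */
struct fhandle_key {
    unsigned char bytes[16]; /* unused tail bytes must be zero */
    int mount_id;
};

/* djb2, as used by g_str_hash(): h = h * 33 + c, seeded with 5381 */
static unsigned int fhandle_hash(const struct fhandle_key *fh)
{
    unsigned int hash = 5381;
    size_t i;

    for (i = 0; i < sizeof(fh->bytes); i++) {
        hash = hash * 33 + fh->bytes[i];
    }
    return hash * 33 + (unsigned int)fh->mount_id;
}

/*
 * Byte-wise equality: sound because this struct has no internal padding
 * and every instance is fully zero-filled before use.
 */
static int fhandle_equal(const struct fhandle_key *a,
                         const struct fhandle_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Equal keys must hash equally for a GHashTable to work. Hashing the whole buffer (not just handle_bytes) is what makes the zero-fill a hard requirement.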



* [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

When the inode_file_handles option is set, try to generate a file handle
for new inodes instead of opening an O_PATH FD.

Being able to open these again will require CAP_DAC_READ_SEARCH, so the
description text tells the user they will also need to specify
-o modcaps=+dac_read_search.

Generating a file handle returns the mount ID it is valid for.  Opening
it, however, requires an FD on that mount rather than the ID, so we have
mount_fds to map an ID to an FD.  get_file_handle() fills this hash map
by opening the file we have generated a handle for.  To verify that the
resulting FD indeed represents the handle's mount ID, we use statx().
Therefore, using file handles requires statx() support.
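The mount_fds update in get_file_handle() is a double-checked insertion. The code checks under the read lock, drops the lock to do the slow open/statx() work, then re-checks under the write lock and closes its new FD if another thread got there first. Below is a sketch of that pattern. It assumes a small fixed-size array in place of the GHashTable, and open_mount_fd() is a hypothetical placeholder that simply opens "/".

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

#define MAX_MOUNTS 16 /* sketch-only limit */

static pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
static int mount_ids[MAX_MOUNTS];
static int mount_fds[MAX_MOUNTS];
static int n_mounts;
static int open_calls;

/* Hypothetical placeholder for the openat()/statx()/reopen steps */
static int open_mount_fd(int mount_id)
{
    (void)mount_id;
    open_calls++;
    return open("/", O_RDONLY);
}

/* Caller must hold mount_fds_lock (read or write) */
static bool mount_known(int mount_id)
{
    for (int i = 0; i < n_mounts; i++) {
        if (mount_ids[i] == mount_id) {
            return true;
        }
    }
    return false;
}

static int ensure_mount_fd(int mount_id)
{
    int fd, ret = 0;
    bool known;

    pthread_rwlock_rdlock(&mount_fds_lock);
    known = mount_known(mount_id);
    pthread_rwlock_unlock(&mount_fds_lock);
    if (known) {
        return 0;
    }

    /* Slow work runs without the lock, so other threads may race us */
    fd = open_mount_fd(mount_id);
    if (fd < 0) {
        return -1;
    }

    pthread_rwlock_wrlock(&mount_fds_lock);
    if (mount_known(mount_id)) {
        close(fd); /* another thread won the race: keep its FD */
    } else if (n_mounts == MAX_MOUNTS) {
        close(fd);
        ret = -1;
    } else {
        mount_ids[n_mounts] = mount_id;
        mount_fds[n_mounts] = fd;
        n_mounts++;
    }
    pthread_rwlock_unlock(&mount_fds_lock);
    return ret;
}
```

Doing the open outside the lock keeps lookups from serializing on slow syscalls. The cost is an occasional redundant open that is immediately closed.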

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 194 ++++++++++++++++++++++++--
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 3 files changed, 190 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index a8295d975a..aa63a21d43 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
            "                               default: no_allow_direct_io\n"
            "    -o announce_submounts      Announce sub-mount points to the guest\n"
            "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
+           "    -o inode_file_handles      Use file handles to reference inodes\n"
+           "                               instead of O_PATH file descriptors\n"
+           "                               (requires -o modcaps=+dac_read_search)\n"
            );
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index f9d8b2f134..ac95961d12 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -194,6 +194,7 @@ struct lo_data {
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
     int user_posix_acl, posix_acl;
+    int inode_file_handles;
 };
 
 /**
@@ -250,6 +251,10 @@ static const struct fuse_opt lo_opts[] = {
     { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
     { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
     { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
+    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
+    { "no_inode_file_handles",
+      offsetof(struct lo_data, inode_file_handles),
+      0 },
     FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -321,6 +326,135 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Generate a file handle for the given dirfd/name combination.
+ *
+ * If mount_fds does not yet contain an entry for the handle's mount
+ * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
+ * as the FD for that mount ID.  (That is the file that we have
+ * generated a handle for, so it should be representative for the
+ * mount ID.  However, to be sure (and to rule out races), we use
+ * statx() to verify that our assumption is correct.)
+ */
+static struct lo_fhandle *get_file_handle(struct lo_data *lo,
+                                          int dirfd, const char *name)
+{
+    /* We need statx() to verify the mount ID */
+#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
+    struct lo_fhandle *fh;
+    int ret;
+
+    if (!lo->use_statx || !lo->inode_file_handles) {
+        return NULL;
+    }
+
+    fh = g_new0(struct lo_fhandle, 1);
+
+    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
+    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
+                            AT_EMPTY_PATH);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if (pthread_rwlock_rdlock(&mount_fds_lock)) {
+        goto fail;
+    }
+    if (!g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+        struct statx stx;
+        char procname[64];
+        int fd;
+
+        pthread_rwlock_unlock(&mount_fds_lock);
+
+        /*
+         * Before opening an O_RDONLY fd, check whether dirfd/name is a regular
+         * file or directory, because anything else must only ever be opened
+         * with O_PATH.
+         * (We also use this occasion to verify that the file has the mount ID
+         * we need.)
+         */
+        if (name[0]) {
+            path_fd.fd = openat(dirfd, name, O_PATH);
+            if (path_fd.fd < 0) {
+                goto fail;
+            }
+            path_fd.owned = true;
+        } else {
+            path_fd.fd = dirfd;
+            path_fd.owned = false;
+        }
+
+        ret = statx(path_fd.fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+                    STATX_TYPE | STATX_MNT_ID, &stx);
+        if (ret < 0) {
+            if (errno == ENOSYS) {
+                lo->use_statx = false;
+                fuse_log(FUSE_LOG_WARNING,
+                         "statx() does not work: Will not be able to use file "
+                         "handles for inodes\n");
+            }
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
+            /*
+             * One reason for stx_mnt_id != mount_id could be that dirfd/name
+             * is a directory, and some other filesystem was mounted there
+             * between us generating the file handle and then opening the FD.
+             * (Other kinds of races might be possible, too.)
+             * Failing this function is not fatal, though, because our caller
+             * (lo_do_lookup()) will just fall back to opening an O_PATH FD to
+             * store in lo_inode.fd instead of storing a file handle in
+             * lo_inode.fhandle.  So we do not need to try too hard to obtain
+             * an FD for fh->mount_id just to make this function succeed.
+             */
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_TYPE) ||
+            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
+        {
+            /*
+             * We must not open special files with anything but O_PATH, so we
+             * cannot use this file for mount_fds.
+             * Just return a failure in such a case and let the lo_inode have
+             * an O_PATH fd instead of a file handle.
+             */
+            goto fail;
+        }
+
+        /* Now that we know this fd is safe to open, do it */
+        snprintf(procname, sizeof(procname), "%i", path_fd.fd);
+        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (fd < 0) {
+            goto fail;
+        }
+        if (pthread_rwlock_wrlock(&mount_fds_lock)) {
+            close(fd);
+            goto fail;
+        }
+
+        /* Check again, might have changed */
+        if (g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+            close(fd);
+        } else {
+            g_hash_table_insert(mount_fds,
+                                GINT_TO_POINTER(fh->mount_id),
+                                GINT_TO_POINTER(fd));
+        }
+    }
+    pthread_rwlock_unlock(&mount_fds_lock);
+
+    return fh;
+
+fail:
+    g_free(fh);
+    return NULL;
+#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
+    return NULL;
+#endif
+}
+
 /**
  * Open the given file handle with the given flags.
  *
@@ -1165,6 +1299,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
             return -1;
         }
         lo->use_statx = false;
+        if (lo->inode_file_handles) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "statx() does not work: Will not be able to use file "
+                     "handles for inodes\n");
+        }
         /* fallback */
     }
 #endif
@@ -1194,6 +1333,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_fhandle *fh;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1223,13 +1363,21 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
-    if (newfd == -1) {
-        goto out_err;
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    if (!fh) {
+        newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
+        if (newfd == -1) {
+            goto out_err;
+        }
     }
 
-    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
-                   &mnt_id);
+    if (newfd >= 0) {
+        res = do_statx(lo, newfd, "", &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    } else {
+        res = do_statx(lo, dir_fd.fd, name, &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    }
     if (res == -1) {
         goto out_err;
     }
@@ -1239,9 +1387,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, NULL, &e->attr, mnt_id);
+    /*
+     * Note that fh is always NULL if lo->inode_file_handles is false,
+     * and so we will never do a lookup by file handle here, and
+     * lo->inodes_by_handle will always remain empty.  We only need
+     * this map when we do not have an O_PATH fd open for every
+     * lo_inode, though, so if inode_file_handles is false, we do not
+     * need that map anyway.
+     */
+    inode = lo_find(lo, fh, &e->attr, mnt_id);
     if (inode) {
-        close(newfd);
+        if (newfd != -1) {
+            close(newfd);
+        }
     } else {
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
@@ -1259,6 +1417,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
         inode->nlookup = 1;
         inode->fd = newfd;
+        inode->fhandle = fh;
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
@@ -1270,6 +1429,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
         g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
+        if (inode->fhandle) {
+            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
+        }
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1615,6 +1777,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     int res;
     uint64_t mnt_id;
     struct stat attr;
+    struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
     struct lo_inode *inode = NULL;
@@ -1628,12 +1791,16 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    /* Ignore errors, this is just an optional key for the lookup */
+
     res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     if (res == -1) {
         goto out;
     }
 
-    inode = lo_find(lo, NULL, &attr, mnt_id);
+    inode = lo_find(lo, fh, &attr, mnt_id);
+    g_free(fh);
 
 out:
     lo_inode_put(lo, &dir);
@@ -1801,6 +1968,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes_by_ids, &inode->key);
+        if (inode->fhandle) {
+            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
+        }
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -4362,6 +4532,14 @@ int main(int argc, char *argv[])
 
     lo.use_statx = true;
 
+#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
+    if (lo.inode_file_handles) {
+        fuse_log(FUSE_LOG_WARNING,
+                 "No statx() or mount ID support: Will not be able to use file "
+                 "handles for inodes\n");
+    }
+#endif
+
     se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
     if (se == NULL) {
         goto err_out1;
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index af04c638cb..ab4dc07e3f 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(mprotect),
     SCMP_SYS(mremap),
     SCMP_SYS(munmap),
+    SCMP_SYS(name_to_handle_at),
     SCMP_SYS(newfstatat),
     SCMP_SYS(statx),
     SCMP_SYS(open),
-- 
2.31.1
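get_file_handle() upgrades the O_PATH FD to a readable one only after confirming that it refers to a regular file or directory. It does the upgrade by reopening the FD's /proc/self/fd entry. Below is a minimal sketch of that step, assuming Linux with /proc mounted. It uses fstat(), which accepts O_PATH FDs since Linux 3.6, instead of statx(), and reopen_rdonly() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Reopen an O_PATH FD as O_RDONLY via /proc/self/fd, but only if it refers
 * to a regular file or directory; opening FIFOs or device nodes for real
 * could block or have side effects, so those stay O_PATH-only.
 * Returns the new FD, or -1.
 */
static int reopen_rdonly(int proc_self_fd, int path_fd)
{
    struct stat st;
    char procname[32];

    /* fstat() accepts O_PATH FDs since Linux 3.6 */
    if (fstat(path_fd, &st) < 0) {
        return -1;
    }
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        return -1;
    }
    snprintf(procname, sizeof(procname), "%d", path_fd);
    return openat(proc_self_fd, procname, O_RDONLY);
}
```

Reopening through /proc/self/fd reuses the already-resolved file rather than walking dirfd/name again, so the type check and the open refer to the same inode.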




* [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find()
  2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
@ 2021-07-30 15:01   ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-07-30 15:01 UTC (permalink / raw)
  To: qemu-devel, virtio-fs
  Cc: Stefan Hajnoczi, Dr . David Alan Gilbert, Vivek Goyal, Max Reitz

lo_find() right now takes two lookup keys for two maps, namely the file
handle for inodes_by_handle and the statx information for inodes_by_ids.
However, we only need the statx information if looking up the inode by
the file handle failed.
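The two-map lookup described above can be sketched in isolation. The sketch includes the rule from lo_find()'s comment that an inode-ID match only counts if that entry still holds an FD. This is a minimal sketch, not virtiofsd code: the struct names, the 8-byte handle, and the linear-search "tables" are hypothetical stand-ins for lo_fhandle, lo_key, lo_inode, and the two GHashTables.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct lo_fhandle, lo_key and lo_inode */
struct fhandle {
    unsigned char bytes[8];
};

struct ids_key {
    unsigned long ino, dev, mnt_id;
};

struct inode {
    int has_fhandle;
    struct fhandle fh;
    struct ids_key key;
    int fd; /* -1 when only a file handle references the inode */
};

static int ids_equal(const struct ids_key *a, const struct ids_key *b)
{
    return a->ino == b->ino && a->dev == b->dev && a->mnt_id == b->mnt_id;
}

/*
 * Try the file handle first.  Fall back to (ino, dev, mnt_id) only for
 * entries that hold an open FD: without one, the inode may have been
 * deleted and its number reused by an unrelated file.
 */
static struct inode *find_inode(struct inode *tbl, size_t n,
                                const struct fhandle *fh,
                                const struct ids_key *key)
{
    size_t i;

    if (fh) {
        for (i = 0; i < n; i++) {
            if (tbl[i].has_fhandle &&
                memcmp(&tbl[i].fh, fh, sizeof(*fh)) == 0) {
                return &tbl[i];
            }
        }
    }
    for (i = 0; i < n; i++) {
        if (ids_equal(&tbl[i].key, key)) {
            return tbl[i].fd == -1 ? NULL : &tbl[i];
        }
    }
    return NULL;
}
```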

There are two callers of lo_find(): The first one, lo_do_lookup(), has
both keys anyway, so passing them does not incur any additional cost.
The second one, lookup_name(), though, needs to explicitly invoke
name_to_handle_at() (through get_file_handle()) and statx() (through
do_statx()).  We need to try to get a file handle as the primary key, so
we cannot get rid of get_file_handle(), but we only need the statx
information if looking up an inode by handle failed; so we can defer
that until the lookup has indeed failed.

To this end, replace lo_find()'s st/mnt_id parameters by a get_ids()
closure that is invoked to fill the lo_key struct if necessary.

Also, lo_find() is renamed to lo_do_find(), so we can add a new
lo_find() wrapper whose closure just initializes the lo_key from the
st/mnt_id parameters, just like the old lo_find() did.

lookup_name() directly calls lo_do_find() now and passes its own
closure, which performs the do_statx() call.
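Below is a minimal sketch of the lazy-key idea: the secondary key comes from a callback that runs only when the by-handle lookup misses. All names here are hypothetical and the lookup tables are plain arrays. The fd == -1 check from lo_do_find() is left out. The counter shows how often the callback runs; in lookup_name() that callback is the do_statx()-based closure.

```c
#include <stddef.h>

struct ids_key {
    unsigned long ino, dev, mnt_id;
};

/* Hypothetical table entry: an int stands in for the file handle */
struct entry {
    int handle;
    struct ids_key key;
};

/*
 * Look up by handle; only on a miss ask get_ids() for the secondary key.
 * A non-zero return from get_ids() means "no key available".
 */
static struct entry *do_find(struct entry *tbl, size_t n, const int *handle,
                             int (*get_ids)(struct ids_key *, const void *),
                             const void *opaque)
{
    struct ids_key key;
    size_t i;

    if (handle) {
        for (i = 0; i < n; i++) {
            if (tbl[i].handle == *handle) {
                return &tbl[i];
            }
        }
    }
    if (get_ids(&key, opaque) != 0) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        if (tbl[i].key.ino == key.ino && tbl[i].key.dev == key.dev &&
            tbl[i].key.mnt_id == key.mnt_id) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* Counting callback standing in for the do_statx()-based closure */
static int get_ids_calls;

static int copy_ids(struct ids_key *key, const void *opaque)
{
    get_ids_calls++;
    *key = *(const struct ids_key *)opaque;
    return 0;
}
```

A by-handle hit never invokes the callback, so lookup_name() skips the statx() round trip whenever the handle alone identifies a known inode.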

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 93 ++++++++++++++++++++++++++------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index ac95961d12..41e9f53878 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1200,22 +1200,23 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo,
-                                const struct lo_fhandle *fhandle,
-                                struct stat *st, uint64_t mnt_id)
+/*
+ * get_ids() will be called to get the key for lo->inodes_by_ids if
+ * the lookup by file handle has failed.
+ */
+static struct lo_inode *lo_do_find(struct lo_data *lo,
+    const struct lo_fhandle *fhandle,
+    int (*get_ids)(struct lo_key *, const void *),
+    const void *get_ids_opaque)
 {
     struct lo_inode *p = NULL;
-    struct lo_key ids_key = {
-        .ino = st->st_ino,
-        .dev = st->st_dev,
-        .mnt_id = mnt_id,
-    };
+    struct lo_key ids_key;
 
     pthread_mutex_lock(&lo->mutex);
     if (fhandle) {
         p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
     }
-    if (!p) {
+    if (!p && get_ids(&ids_key, get_ids_opaque) == 0) {
         p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
         /*
          * When we had to fall back to looking up an inode by its
@@ -1244,6 +1245,36 @@ static struct lo_inode *lo_find(struct lo_data *lo,
     return p;
 }
 
+struct lo_find_get_ids_key_opaque {
+    const struct stat *st;
+    uint64_t mnt_id;
+};
+
+static int lo_find_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lo_find_get_ids_key_opaque *stat_info = opaque;
+
+    *ids_key = (struct lo_key){
+        .ino = stat_info->st->st_ino,
+        .dev = stat_info->st->st_dev,
+        .mnt_id = stat_info->mnt_id,
+    };
+
+    return 0;
+}
+
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
+{
+    const struct lo_find_get_ids_key_opaque stat_info = {
+        .st = st,
+        .mnt_id = mnt_id,
+    };
+
+    return lo_do_find(lo, fhandle, lo_find_get_ids_key, &stat_info);
+}
+
 /* value_destroy_func for posix_locks GHashTable */
 static void posix_locks_value_destroy(gpointer data)
 {
@@ -1769,14 +1800,41 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
+struct lookup_name_get_ids_key_opaque {
+    struct lo_data *lo;
+    int parent_fd;
+    const char *name;
+};
+
+static int lookup_name_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lookup_name_get_ids_key_opaque *stat_params = opaque;
+    uint64_t mnt_id;
+    struct stat attr;
+    int res;
+
+    res = do_statx(stat_params->lo, stat_params->parent_fd, stat_params->name,
+                   &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    if (res < 0) {
+        return -errno;
+    }
+
+    *ids_key = (struct lo_key){
+        .ino = attr.st_ino,
+        .dev = attr.st_dev,
+        .mnt_id = mnt_id,
+    };
+
+    return 0;
+}
+
 /* Increments nlookup and caller must release refcount using lo_inode_put() */
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
     g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
-    uint64_t mnt_id;
-    struct stat attr;
+    struct lookup_name_get_ids_key_opaque stat_params;
     struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
@@ -1794,12 +1852,13 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     fh = get_file_handle(lo, dir_fd.fd, name);
     /* Ignore errors, this is just an optional key for the lookup */
 
-    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
-    if (res == -1) {
-        goto out;
-    }
-
-    inode = lo_find(lo, fh, &attr, mnt_id);
+    stat_params = (struct lookup_name_get_ids_key_opaque){
+        .lo = lo,
+        .parent_fd = dir_fd.fd,
+        .name = name,
+    };
+    inode = lo_do_find(lo, fh, lookup_name_get_ids_key, &stat_params);
+    lo_inode_put(lo, &dir);
     g_free(fh);
 
 out:
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 01/10] virtiofsd: Limit setxattr()'s creds-dropped region
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-06 14:16     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-06 14:16 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:25PM +0200, Max Reitz wrote:
> We only need to drop/switch our credentials for the (f)setxattr() call
> alone, not for the openat() or fchdir() around it.
> 
> (Right now, this may not be that big of a problem, but with inodes being
> identified by file handles instead of an O_PATH fd, we will need
> open_by_handle_at() calls here, which is really fickle when it comes to
> credentials being dropped.)
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 38b2af8599..1f27eeabc5 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -3121,6 +3121,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>      bool switched_creds = false;
>      bool cap_fsetid_dropped = false;
>      struct lo_cred old = {};
> +    bool open_inode;
>  
>      if (block_xattr(lo, in_name)) {
>          fuse_reply_err(req, EOPNOTSUPP);
> @@ -3155,7 +3156,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>      fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
>               ", name=%s value=%s size=%zd)\n", ino, name, value, size);
>  
> +    /*
> +     * We can only open regular files or directories.  If the inode is
> +     * something else, we have to enter /proc/self/fd and use
> +     * setxattr() on the link's filename there.
> +     */
> +    open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
>      sprintf(procname, "%i", inode->fd);
> +    if (open_inode) {
> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        if (fd < 0) {
> +            saverr = errno;
> +            goto out;
> +        }
> +    } else {
> +        /* fchdir should not fail here */
> +        FCHDIR_NOFAIL(lo->proc_self_fd);
> +    }
> +
>      /*
>       * If we are setting posix access acl and if SGID needs to be
>       * cleared, then switch to caller's gid and drop CAP_FSETID
> @@ -3176,20 +3194,13 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>          }
>          switched_creds = true;
>      }
> -    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> -        if (fd < 0) {
> -            saverr = errno;
> -            goto out;
> -        }
> +    if (open_inode) {
> +        assert(fd >= 0);
>          ret = fsetxattr(fd, name, value, size, flags);
>          saverr = ret == -1 ? errno : 0;
>      } else {
> -        /* fchdir should not fail here */
> -        FCHDIR_NOFAIL(lo->proc_self_fd);
>          ret = setxattr(procname, name, value, size, flags);
>          saverr = ret == -1 ? errno : 0;
> -        FCHDIR_NOFAIL(lo->root.fd);
>      }
>      if (switched_creds) {
>          if (cap_fsetid_dropped)
> @@ -3198,6 +3209,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>              lo_restore_cred(&old, false);
>      }
>  
> +    if (!open_inode) {
> +        /* Change CWD back, fchdir should not fail here */
> +        FCHDIR_NOFAIL(lo->root.fd);
> +    }
> +

This FCHDIR_NOFAIL() will also need to be called if lo_drop_cap_change_cred()
fails. 

        ret = lo_drop_cap_change_cred(req, &old, false, "FSETID",
                                      &cap_fsetid_dropped);
        if (ret) {
            saverr = ret;
            goto out;
        }

Vivek
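In the patch, the non-open_inode path moves the CWD to /proc/self/fd before the credential switch, and a failing lo_drop_cap_change_cred() jumps to "out", past the FCHDIR_NOFAIL(lo->root.fd) restore. A hypothetical standalone sketch of the fix pattern follows: once the CWD has changed, every exit path goes through the restore. change_cred_stub(), cwd_is() and op_in_dir() are made-up names, change_cred_stub() only stands in for lo_drop_cap_change_cred(), and aborting when the restore fails merely approximates the intent of FCHDIR_NOFAIL().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for lo_drop_cap_change_cred(): 0 or an errno. */
static int change_cred_stub(bool fail)
{
    return fail ? EPERM : 0;
}

/* Returns true if the current working directory is the directory dir_fd. */
static bool cwd_is(int dir_fd)
{
    struct stat cwd, dir;

    if (stat(".", &cwd) == -1 || fstat(dir_fd, &dir) == -1) {
        return false;
    }
    return cwd.st_dev == dir.st_dev && cwd.st_ino == dir.st_ino;
}

/*
 * Enter work_dir_fd, switch credentials, do the operation, and restore
 * the CWD to root_fd on every path.  Returns 0 or a positive errno.
 */
static int op_in_dir(int work_dir_fd, int root_fd, bool fail_creds)
{
    int saverr = 0;
    int ret;

    assert(work_dir_fd >= 0 && root_fd >= 0);

    if (fchdir(work_dir_fd) == -1) {
        return errno;  /* CWD unchanged, nothing to restore */
    }

    ret = change_cred_stub(fail_creds);
    if (ret) {
        saverr = ret;
        goto out_restore_cwd;  /* must not bypass the fchdir() below */
    }

    /* The real code would call setxattr() on a relative name here. */

out_restore_cwd:
    if (fchdir(root_fd) == -1) {
        abort();  /* the CWD must never be left in work_dir_fd */
    }
    return saverr;
}
```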

>  out:
>      if (fd >= 0) {
>          close(fd);
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 02/10] virtiofsd: Add TempFd structure
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-06 14:41     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-06 14:41 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:26PM +0200, Max Reitz wrote:
> We are planning to add file handles to lo_inode objects as an
> alternative to lo_inode.fd.  That means that everywhere where we
> currently reference lo_inode.fd, we will have to open a temporary file
> descriptor that needs to be closed after use.
> 
> So instead of directly accessing lo_inode.fd, there will be a helper
> function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
> file descriptor with open_by_handle_at().  It encapsulates this result
> in a TempFd structure to let the caller know whether the FD needs to be
> closed after use (opened from the handle) or not (copied from
> lo_inode.fd).

I am wondering why we need this notion of "owned". Why not require that
"fd" always be closed? If we copied it from lo_inode.fd, then we would
need to dup() it. Otherwise, we opened it from a file handle and would
need to close it anyway.

I guess you are trying to avoid having to call dup(), and that's why
there is this notion of an "owned" fd.

> 
> By using g_auto(TempFd) to store this result, callers will not even have
> to care about closing a temporary FD after use.  It will be done
> automatically once the object goes out of scope.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 49 ++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1f27eeabc5..fb5e073e6a 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -178,6 +178,28 @@ struct lo_data {
>      int user_posix_acl, posix_acl;
>  };
>  
> +/**
> + * Represents a file descriptor that may either be owned by this
> + * TempFd, or only referenced (i.e. the ownership belongs to some
> + * other object, and the value has just been copied into this TempFd).
> + *
> + * The purpose of this encapsulation is to be used as g_auto(TempFd)
> + * to automatically clean up owned file descriptors when this object
> + * goes out of scope.
> + *
> + * Use temp_fd_steal() to get an owned file descriptor that will not
> + * be closed when the TempFd goes out of scope.
> + */
> +typedef struct {
> +    int fd;
> +    bool owned; /* fd owned by this object? */
> +} TempFd;
> +
> +#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
> +
> +static void temp_fd_clear(TempFd *temp_fd);
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
> +
>  static const struct fuse_opt lo_opts[] = {
>      { "sandbox=namespace",
>        offsetof(struct lo_data, sandbox),
> @@ -255,6 +277,33 @@ static struct lo_data *lo_data(fuse_req_t req)
>      return (struct lo_data *)fuse_req_userdata(req);
>  }
>  
> +/**
> + * Clean-up function for TempFds
> + */
> +static void temp_fd_clear(TempFd *temp_fd)
> +{
> +    if (temp_fd->owned) {
> +        close(temp_fd->fd);
> +        *temp_fd = TEMP_FD_INIT;
> +    }
> +}
> +
> +/**
> + * Return an owned fd from *temp_fd that will not be closed when
> + * *temp_fd goes out of scope.
> + *
> + * (TODO: Remove __attribute__ once this is used.)
> + */
> +static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
> +{
> +    if (temp_fd->owned) {
> +        temp_fd->owned = false;
> +        return temp_fd->fd;
> +    } else {
> +        return dup(temp_fd->fd);
> +    }
> +}

This would also be simpler if we always called dup() and every caller
closed the fd.

I think the only downside is having to call dup()/close(). I am not sure
whether that is an expensive operation.

Vivek
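The trade-off discussed above can be made concrete with a hypothetical sketch that does not need GLib: temp_fd_clear() is called explicitly instead of relying on g_auto(TempFd), TempFd, temp_fd_clear() and temp_fd_steal() follow the patch, and temp_fd_borrow(), temp_fd_own(), temp_fd_always_owned() and fd_is_open() are made-up helpers. temp_fd_always_owned() models the "always dup()" alternative, which costs one extra dup()/close() pair per use but removes the need to track ownership.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Same shape as the patch's TempFd. */
typedef struct {
    int fd;
    bool owned; /* fd owned by this object? */
} TempFd;

#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

/* As in the patch: only an owned FD is closed (and then reset). */
static void temp_fd_clear(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        close(temp_fd->fd);
        *temp_fd = TEMP_FD_INIT;
    }
}

/* As in the patch: hand over an owned FD, dup()ing a borrowed one. */
static int temp_fd_steal(TempFd *temp_fd)
{
    assert(temp_fd->fd >= 0);
    if (temp_fd->owned) {
        temp_fd->owned = false;
        return temp_fd->fd;
    }
    return dup(temp_fd->fd);
}

/* Hypothetical: reference an FD owned elsewhere (like lo_inode.fd). */
static TempFd temp_fd_borrow(int fd)
{
    return (TempFd) { .fd = fd, .owned = false };
}

/* Hypothetical: take ownership (like an FD from open_by_handle_at()). */
static TempFd temp_fd_own(int fd)
{
    return (TempFd) { .fd = fd, .owned = true };
}

/*
 * Hypothetical "always dup()" alternative: every TempFd is owned, so
 * callers never need to check.  If dup() fails, fd is -1 and the
 * close() in temp_fd_clear() just fails with EBADF.
 */
static TempFd temp_fd_always_owned(int fd)
{
    return temp_fd_own(dup(fd));
}

/* Hypothetical helper: is fd currently an open descriptor? */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```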



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 03/10] virtiofsd: Use lo_inode_open() instead of openat()
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-06 15:42     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-06 15:42 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:27PM +0200, Max Reitz wrote:
> The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
> with the flags they need through /proc/self/fd.
> 
> Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
> /proc/self/fd trick, it just uses openat(fd, "."), because the FD is
> guaranteed to be a directory, so this works.

OK, given that lo_opendir() will now use lo_inode_open(), it will switch
to the /proc/self/fd trick for reopening the O_PATH fd. I guess that
should be fine.

Vivek
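The /proc/self/fd reopening mentioned above can be tried out in isolation. The sketch below is hypothetical and Linux-only: reopen_via_proc() and count_entries() are made-up names, and it assumes, as Vivek's comment suggests, that lo_inode_open() reopens the inode's O_PATH FD through lo->proc_self_fd, much like the xattr functions did with openat(lo->proc_self_fd, procname, O_RDONLY). An O_PATH FD cannot be read from directly, so it is reopened with O_RDONLY before fdopendir().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Reopen an O_PATH FD with new flags via /proc/self/fd.  Returns a new
 * FD or a negative errno, matching how the patch treats lo_inode_open()
 * results (saverr = -fd).
 */
static int reopen_via_proc(int proc_self_fd, int path_fd, int flags)
{
    char procname[32];
    int fd;

    assert(path_fd >= 0);
    snprintf(procname, sizeof(procname), "%i", path_fd);
    fd = openat(proc_self_fd, procname, flags);
    return fd < 0 ? -errno : fd;
}

/* Count the entries of a directory given only an O_PATH FD to it. */
static int count_entries(int proc_self_fd, int dir_path_fd)
{
    DIR *dp;
    int fd;
    int n = 0;

    fd = reopen_via_proc(proc_self_fd, dir_path_fd, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return fd;
    }
    dp = fdopendir(fd);  /* takes ownership of fd on success */
    if (!dp) {
        int err = errno;
        close(fd);
        return -err;
    }
    while (readdir(dp) != NULL) {
        n++;
    }
    closedir(dp);
    return n;
}
```

The pre-patch lo_opendir() instead used openat(fd, ".", O_RDONLY), which works only because that FD is guaranteed to be a directory.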

> 
> All cases have one problem in common, though: In the future, when we may
> have a file handle in the lo_inode instead of an FD, querying an
> lo_inode FD may incur an open_by_handle_at() call.  It does not make
> sense to then reopen that FD with custom flags; those should have been
> passed to open_by_handle_at() instead.
> 
> Use lo_inode_open() instead of openat().  As part of the file handle
> change, lo_inode_open() will be made to invoke openat() only if
> lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
> with the right flags from the start.
> 
> Consequently, after this patch, lo_inode_open() is the only place to
> invoke openat() to reopen an existing FD with different flags.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index fb5e073e6a..a444c3a7e2 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1729,18 +1729,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>  {
>      int error = ENOMEM;
>      struct lo_data *lo = lo_data(req);
> -    struct lo_dirp *d;
> +    struct lo_inode *inode;
> +    struct lo_dirp *d = NULL;
>      int fd;
>      ssize_t fh;
>  
> +    inode = lo_inode(req, ino);
> +    if (!inode) {
> +        error = EBADF;
> +        goto out_err;
> +    }
> +
>      d = calloc(1, sizeof(struct lo_dirp));
>      if (d == NULL) {
>          goto out_err;
>      }
>  
> -    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
> -    if (fd == -1) {
> -        goto out_errno;
> +    fd = lo_inode_open(lo, inode, O_RDONLY);
> +    if (fd < 0) {
> +        error = -fd;
> +        goto out_err;
>      }
>  
>      d->dp = fdopendir(fd);
> @@ -1769,6 +1777,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>  out_errno:
>      error = errno;
>  out_err:
> +    lo_inode_put(lo, &inode);
>      if (d) {
>          if (d->dp) {
>              closedir(d->dp);
> @@ -2973,7 +2982,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>          }
>      }
>  
> -    sprintf(procname, "%i", inode->fd);
>      /*
>       * It is not safe to open() non-regular/non-dir files in file server
>       * unless O_PATH is used, so use that method for regular files/dir
> @@ -2981,13 +2989,15 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>       * Otherwise, call fchdir() to avoid open().
>       */
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        fd = lo_inode_open(lo, inode, O_RDONLY);
>          if (fd < 0) {
> -            goto out_err;
> +            saverr = -fd;
> +            goto out;
>          }
>          ret = fgetxattr(fd, name, value, size);
>          saverr = ret == -1 ? errno : 0;
>      } else {
> +        sprintf(procname, "%i", inode->fd);
>          /* fchdir should not fail here */
>          FCHDIR_NOFAIL(lo->proc_self_fd);
>          ret = getxattr(procname, name, value, size);
> @@ -3054,15 +3064,16 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
>          }
>      }
>  
> -    sprintf(procname, "%i", inode->fd);
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        fd = lo_inode_open(lo, inode, O_RDONLY);
>          if (fd < 0) {
> -            goto out_err;
> +            saverr = -fd;
> +            goto out;
>          }
>          ret = flistxattr(fd, value, size);
>          saverr = ret == -1 ? errno : 0;
>      } else {
> +        sprintf(procname, "%i", inode->fd);
>          /* fchdir should not fail here */
>          FCHDIR_NOFAIL(lo->proc_self_fd);
>          ret = listxattr(procname, value, size);
> @@ -3211,14 +3222,14 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>       * setxattr() on the link's filename there.
>       */
>      open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
> -    sprintf(procname, "%i", inode->fd);
>      if (open_inode) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        fd = lo_inode_open(lo, inode, O_RDONLY);
>          if (fd < 0) {
> -            saverr = errno;
> +            saverr = -fd;
>              goto out;
>          }
>      } else {
> +        sprintf(procname, "%i", inode->fd);
>          /* fchdir should not fail here */
>          FCHDIR_NOFAIL(lo->proc_self_fd);
>      }
> @@ -3317,16 +3328,16 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
>      fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
>               name);
>  
> -    sprintf(procname, "%i", inode->fd);
>      if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        fd = lo_inode_open(lo, inode, O_RDONLY);
>          if (fd < 0) {
> -            saverr = errno;
> +            saverr = -fd;
>              goto out;
>          }
>          ret = fremovexattr(fd, name);
>          saverr = ret == -1 ? errno : 0;
>      } else {
> +        sprintf(procname, "%i", inode->fd);
>          /* fchdir should not fail here */
>          FCHDIR_NOFAIL(lo->proc_self_fd);
>          ret = removexattr(procname, name);
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 04/10] virtiofsd: Add lo_inode_fd() helper
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-06 18:25     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-06 18:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:28PM +0200, Max Reitz wrote:

[..]
> @@ -1335,12 +1359,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          return;
>      }
>  
> +    res = lo_inode_fd(dir, &dir_fd);
> +    if (res < 0) {
> +        saverr = -res;
> +        goto out;
> +    }
> +
>      saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
>      if (saverr) {
>          goto out;
>      }
>  
> -    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
> +    res = mknod_wrapper(dir_fd.fd, name, link, mode, rdev);
>  
>      saverr = errno;
>  
> @@ -1388,6 +1418,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
>  static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>                      const char *name)
>  {
> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
> +    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
>      int res;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *parent_inode;
> @@ -1413,18 +1445,31 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>          goto out_err;
>      }
>  
> +    res = lo_inode_fd(inode, &inode_fd);
> +    if (res < 0) {
> +        errno = -res;

In the previous function, we saved the error in "saverr" and jumped to
the "out" label instead of overwriting errno.

I think it would be good to use a single pattern: either save the
error in saverr or overwrite errno. I personally prefer saving the
error in "saverr".
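A minimal sketch of the single pattern suggested above, assuming a helper that returns 0 or -errno the way lo_inode_fd() does: the negated return value goes straight into saverr, errno is never written by hand, and every failure leaves through one label. fake_inode_fd() and link_like_op() are hypothetical names for illustration, not virtiofsd code.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for lo_inode_fd(): returns 0 on success or
 * -errno on failure.  "fail_with" lets the caller simulate an error.
 */
static int fake_inode_fd(int fail_with)
{
    return fail_with ? -fail_with : 0;
}

/*
 * Hypothetical lo_link()-like body using one error convention:
 * every failure stores a positive errno value in saverr and jumps
 * to "out"; errno itself is never overwritten.
 */
static int link_like_op(int inode_err, int parent_err)
{
    int saverr = 0;
    int res;

    res = fake_inode_fd(inode_err);
    if (res < 0) {
        saverr = -res;
        goto out;
    }

    res = fake_inode_fd(parent_err);
    if (res < 0) {
        saverr = -res;
        goto out;
    }

    /* linkat()/fstatat() would go here, setting saverr = errno on failure */

out:
    return saverr; /* would be handed to fuse_reply_err() */
}
```

The first failing step wins, so link_like_op(EIO, ENOENT) reports EIO.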

> +        goto out_err;
> +    }
> +
> +    res = lo_inode_fd(parent_inode, &parent_fd);
> +    if (res < 0) {
> +        errno = -res;
> +        goto out_err;
> +    }
> +
>      memset(&e, 0, sizeof(struct fuse_entry_param));
>      e.attr_timeout = lo->timeout;
>      e.entry_timeout = lo->timeout;
>  
> -    sprintf(procname, "%i", inode->fd);
> -    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
> +    sprintf(procname, "%i", inode_fd.fd);
> +    res = linkat(lo->proc_self_fd, procname, parent_fd.fd, name,
>                   AT_SYMLINK_FOLLOW);
>      if (res == -1) {
>          goto out_err;
>      }
>  
> -    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
> +    res = fstatat(inode_fd.fd, "", &e.attr,
> +                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
>      if (res == -1) {
>          goto out_err;
>      }
> @@ -1453,23 +1498,33 @@ out_err:
>  static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>                                      const char *name)
>  {
> +    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
>      int res;
>      uint64_t mnt_id;
>      struct stat attr;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *dir = lo_inode(req, parent);
> +    struct lo_inode *inode = NULL;
>  
>      if (!dir) {
> -        return NULL;
> +        goto out;

Should we continue to just "return NULL" here? dir is NULL, which means
lo_inode() failed and we never got a reference, so there is no
reference to put. If we do "goto out", it will call lo_inode_put(),
which is not needed.
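The suggested shape can be sketched with a hypothetical refcounted object standing in for struct lo_inode (struct obj, obj_get(), obj_put() and lookup_like() are illustrative names, not virtiofsd code): if acquiring the reference fails, return immediately; once a reference is held, every later exit goes through "out", which drops it exactly once. In this sketch obj_put() tolerates NULL, so the goto would be harmless, but the early return states the intent more directly.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct lo_inode. */
struct obj {
    int refcount;
};

/* Hypothetical lo_inode()-like getter: NULL means no reference was taken. */
static struct obj *obj_get(struct obj *o)
{
    if (o) {
        o->refcount++;
    }
    return o;
}

/* Hypothetical lo_inode_put()-like release; clears the caller's pointer. */
static void obj_put(struct obj **op)
{
    if (*op) {
        (*op)->refcount--;
        *op = NULL;
    }
}

/*
 * lookup_name()-style flow: bail out early when no reference was
 * acquired; after that, all exits funnel through "out".
 */
static int lookup_like(struct obj *parent, int step_fails)
{
    int found = 0;
    struct obj *dir = obj_get(parent);

    if (!dir) {
        return 0; /* nothing acquired, nothing to release */
    }

    if (step_fails) {
        goto out;
    }
    found = 1;

out:
    obj_put(&dir);
    return found;
}
```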

>      }
>  
> -    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
> -    lo_inode_put(lo, &dir);
> +    res = lo_inode_fd(dir, &dir_fd);
> +    if (res < 0) {
> +        goto out;
> +    }
> +
> +    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
>      if (res == -1) {
> -        return NULL;
> +        goto out;
>      }
>  
> -    return lo_find(lo, &attr, mnt_id);
> +    inode = lo_find(lo, &attr, mnt_id);
> +
> +out:
> +    lo_inode_put(lo, &dir);
> +    return inode;
>  }


Thanks
Vivek



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 06/10] virtiofsd: Let lo_inode_open() return a TempFd
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-06 19:55     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-06 19:55 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:30PM +0200, Max Reitz wrote:
> Strictly speaking, this is not necessary, because lo_inode_open() will
> always return a new FD owned by the caller, so TempFd.owned will always
> be true.
> 
> However, auto-cleanup is nice, and in some cases this plays nicely with
> an lo_inode_fd() call in another conditional branch (see lo_setattr()).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 138 +++++++++++++------------------
>  1 file changed, 59 insertions(+), 79 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 9e1bc37af8..292b7f7e27 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -291,10 +291,8 @@ static void temp_fd_clear(TempFd *temp_fd)
>  /**
>   * Return an owned fd from *temp_fd that will not be closed when
>   * *temp_fd goes out of scope.
> - *
> - * (TODO: Remove __attribute__ once this is used.)
>   */
> -static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
> +static int temp_fd_steal(TempFd *temp_fd)
>  {
>      if (temp_fd->owned) {
>          temp_fd->owned = false;
> @@ -673,9 +671,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
>   * when a malicious client opens special files such as block device nodes.
>   * Symlink inodes are also rejected since symlinks must already have been
>   * traversed on the client side.
> + *
> + * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
> + * otherwise.
>   */
> -static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
> -                         int open_flags)
> +static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
> +                         int open_flags, TempFd *tfd)
>  {
>      g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
>      int fd;
> @@ -694,7 +695,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>      if (fd < 0) {
>          return -errno;
>      }
> -    return fd;
> +
> +    *tfd = (TempFd) {
> +        .fd = fd,
> +        .owned = true,
> +    };
> +
> +    return 0;
>  }
>  
>  static void lo_init(void *userdata, struct fuse_conn_info *conn)
> @@ -852,7 +859,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          return;
>      }
>  
> -    res = lo_inode_fd(inode, &inode_fd);
> +    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
> +        /* We need an O_RDWR FD for ftruncate() */
> +        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
> +    } else {
> +        res = lo_inode_fd(inode, &inode_fd);
> +    }

A minor nit.

So inode_fd could hold either an O_PATH fd returned by lo_inode_fd()
or an O_RDWR fd returned by lo_inode_open().

The previous code held these fds in two different variables, inode_fd
and truncfd, respectively. I found that easier to read because, by
looking at the variable name, I knew whether I was dealing with an
O_PATH fd or an O_RDWR fd I had just opened.

We could continue to have two variables, say inode_fd and trunc_fd;
only the type of trunc_fd would now be TempFd.

I also found the previous style easier to read: it always got hold of
the O_PATH fd first and later opened an O_RDWR fd if the operation was
FUSE_SET_ATTR_SIZE, so the "valid & FUSE_SET_ATTR_SIZE" check was not
needed in two places.

Anyway, this is a minor nit. If you don't like the idea of using two
separate variables to hold the O_PATH fd and the O_RDWR fd, that's OK.
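The two-variable layout suggested above might look roughly like this. It is a sketch under assumptions: TempFd is re-created here with only the fd and owned fields visible in the patch, AUTO_TEMP_FD and setattr_like() are hypothetical, auto-cleanup uses the GCC/Clang cleanup attribute (the mechanism GLib's g_auto() builds on), and the O_RDWR fd is obtained by reopening /proc/self/fd/N, which is Linux-specific. path_fd is fetched unconditionally and only ever holds the borrowed O_PATH fd; trunc_fd is opened in the single size branch.

```c
#define _GNU_SOURCE /* for O_PATH in callers */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical minimal TempFd: "owned" says whether cleanup must close fd. */
typedef struct {
    int fd;
    bool owned;
} TempFd;

#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

static void temp_fd_clear(TempFd *t)
{
    if (t->owned) {
        close(t->fd);
    }
    *t = TEMP_FD_INIT;
}

/* Hypothetical g_auto(TempFd) equivalent via the cleanup attribute. */
#define AUTO_TEMP_FD __attribute__((cleanup(temp_fd_clear))) TempFd

/* Returns 0 or -errno, like lo_inode_open(). */
static int setattr_like(int inode_path_fd, bool set_size, off_t size)
{
    /* Borrowed O_PATH fd: owned == false, so cleanup leaves it open. */
    AUTO_TEMP_FD path_fd = { .fd = inode_path_fd, .owned = false };
    /* O_RDWR fd opened only for truncation; cleanup closes it. */
    AUTO_TEMP_FD trunc_fd = TEMP_FD_INIT;
    char procname[64];

    /* ... mode/owner/time changes would use path_fd.fd here ... */

    if (set_size) {
        snprintf(procname, sizeof(procname), "/proc/self/fd/%d", path_fd.fd);
        trunc_fd.fd = open(procname, O_RDWR);
        if (trunc_fd.fd < 0) {
            return -errno;
        }
        trunc_fd.owned = true;

        if (ftruncate(trunc_fd.fd, size) < 0) {
            return -errno; /* evaluated before cleanup runs */
        }
    }
    return 0;
}
```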


>      if (res < 0) {
>          saverr = -res;
>          goto out_err;
> @@ -900,18 +912,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>          if (fi) {
>              truncfd = fd;
>          } else {
> -            truncfd = lo_inode_open(lo, inode, O_RDWR);
> -            if (truncfd < 0) {
> -                saverr = -truncfd;
> -                goto out_err;
> -            }
> +            truncfd = inode_fd.fd;
>          }
>  
>          saverr = drop_security_capability(lo, truncfd);
>          if (saverr) {
> -            if (!fi) {
> -                close(truncfd);
> -            }
>              goto out_err;
>          }
>  
> @@ -919,9 +924,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>              res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
>              if (res != 0) {
>                  saverr = res;
> -                if (!fi) {
> -                    close(truncfd);
> -                }
>                  goto out_err;
>              }
>          }
> @@ -934,9 +936,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>                  fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
>              }
>          }
> -        if (!fi) {
> -            close(truncfd);
> -        }
>          if (res == -1) {
>              goto out_err;
>          }
> @@ -1822,11 +1821,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>  static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>                         struct fuse_file_info *fi)
>  {
> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
>      int error = ENOMEM;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode;
>      struct lo_dirp *d = NULL;
> -    int fd;
> +    int res;
>      ssize_t fh;
>  
>      inode = lo_inode(req, ino);
> @@ -1840,13 +1840,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>          goto out_err;
>      }
>  
> -    fd = lo_inode_open(lo, inode, O_RDONLY);
> -    if (fd < 0) {
> -        error = -fd;
> +    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
> +    if (res < 0) {
> +        error = -res;
>          goto out_err;
>      }
>  
> -    d->dp = fdopendir(fd);
> +    d->dp = fdopendir(temp_fd_steal(&inode_fd));

So we are using temp_fd_steal() because, if fdopendir() is successful,
we don't want to close the fd; instead, it will be closed by the
closedir() call. inode_fd will be closed once lo_opendir() returns, so
we take ownership of the fd and will need to close it explicitly when
appropriate.

Who closes the stolen fd returned by temp_fd_steal() if fdopendir() fails?

>      if (d->dp == NULL) {
>          goto out_errno;
>      }
> @@ -1876,8 +1876,6 @@ out_err:
>      if (d) {
>          if (d->dp) {
>              closedir(d->dp);
> -        } else if (fd != -1) {
> -            close(fd);
>          }
>          free(d);
>      }
> @@ -2077,6 +2075,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
>  static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
>                        int existing_fd, struct fuse_file_info *fi)
>  {
> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;

It bothers me that we are using the variable inode_fd to hold both
O_PATH fds and regular fds. It would be nice if, just by looking at
the variable name, I could figure out which type of fd it is.

Would it make sense to use path_fd, ipath_fd, or inode_path_fd
wherever we are using an O_PATH fd?


Thanks
Vivek



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 01/10] virtiofsd: Limit setxattr()'s creds-dropped region
  2021-08-06 14:16     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 10:30       ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-08-09 10:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 06.08.21 16:16, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:25PM +0200, Max Reitz wrote:
>> We only need to drop/switch our credentials for the (f)setxattr() call
>> alone, not for the openat() or fchdir() around it.
>>
>> (Right now, this may not be that big of a problem, but with inodes being
>> identified by file handles instead of an O_PATH fd, we will need
>> open_by_handle_at() calls here, which is really fickle when it comes to
>> credentials being dropped.)
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 34 +++++++++++++++++++++++---------
>>   1 file changed, 25 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 38b2af8599..1f27eeabc5 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -3121,6 +3121,7 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>       bool switched_creds = false;
>>       bool cap_fsetid_dropped = false;
>>       struct lo_cred old = {};
>> +    bool open_inode;
>>   
>>       if (block_xattr(lo, in_name)) {
>>           fuse_reply_err(req, EOPNOTSUPP);
>> @@ -3155,7 +3156,24 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>       fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
>>                ", name=%s value=%s size=%zd)\n", ino, name, value, size);
>>   
>> +    /*
>> +     * We can only open regular files or directories.  If the inode is
>> +     * something else, we have to enter /proc/self/fd and use
>> +     * setxattr() on the link's filename there.
>> +     */
>> +    open_inode = S_ISREG(inode->filetype) || S_ISDIR(inode->filetype);
>>       sprintf(procname, "%i", inode->fd);
>> +    if (open_inode) {
>> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
>> +        if (fd < 0) {
>> +            saverr = errno;
>> +            goto out;
>> +        }
>> +    } else {
>> +        /* fchdir should not fail here */
>> +        FCHDIR_NOFAIL(lo->proc_self_fd);
>> +    }
>> +
>>       /*
>>        * If we are setting posix access acl and if SGID needs to be
>>        * cleared, then switch to caller's gid and drop CAP_FSETID
>> @@ -3176,20 +3194,13 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>           }
>>           switched_creds = true;
>>       }
>> -    if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
>> -        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
>> -        if (fd < 0) {
>> -            saverr = errno;
>> -            goto out;
>> -        }
>> +    if (open_inode) {
>> +        assert(fd >= 0);
>>           ret = fsetxattr(fd, name, value, size, flags);
>>           saverr = ret == -1 ? errno : 0;
>>       } else {
>> -        /* fchdir should not fail here */
>> -        FCHDIR_NOFAIL(lo->proc_self_fd);
>>           ret = setxattr(procname, name, value, size, flags);
>>           saverr = ret == -1 ? errno : 0;
>> -        FCHDIR_NOFAIL(lo->root.fd);
>>       }
>>       if (switched_creds) {
>>           if (cap_fsetid_dropped)
>> @@ -3198,6 +3209,11 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
>>               lo_restore_cred(&old, false);
>>       }
>>   
>> +    if (!open_inode) {
>> +        /* Change CWD back, fchdir should not fail here */
>> +        FCHDIR_NOFAIL(lo->root.fd);
>> +    }
>> +
> This FCHDIR_NOFAIL() will also need to be called if lo_drop_cap_change_cred()
> fails.
>
>          ret = lo_drop_cap_change_cred(req, &old, false, "FSETID",
>                                        &cap_fsetid_dropped);
>          if (ret) {
>              saverr = ret;
>              goto out;
>          }

Oh, right, thanks!
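One way to cover that exit is to move the restore into the common `out:` path and key it on whether the CWD was actually changed, so that every `goto out` is handled. This is a minimal, self-contained sketch, not the follow-up patch. `fake_drop_creds()` and `do_in_dir()` are hypothetical stand-ins for lo_drop_cap_change_cred() and the `!open_inode` part of lo_setxattr(), and abort() plays the role of FCHDIR_NOFAIL().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for lo_drop_cap_change_cred(): fails on request. */
static int fake_drop_creds(bool should_fail)
{
    return should_fail ? EPERM : 0;
}

/*
 * Mirrors the !open_inode branch of lo_setxattr(): enter work_dir_fd,
 * switch credentials, and return to root_fd on every exit path.
 * Returns 0 or a positive errno value, like saverr.
 */
static int do_in_dir(int root_fd, int work_dir_fd, bool fail_creds)
{
    bool cwd_changed = false;
    int saverr;

    if (fchdir(work_dir_fd) < 0) {
        saverr = errno;
        goto out;
    }
    cwd_changed = true;

    saverr = fake_drop_creds(fail_creds);
    if (saverr) {
        goto out; /* this early exit must also restore the CWD */
    }

    /* setxattr(procname, ...) would run here, relative to the new CWD */

out:
    if (cwd_changed && fchdir(root_fd) < 0) {
        abort(); /* like FCHDIR_NOFAIL(): failing to go back is fatal */
    }
    return saverr;
}
```

Keying the restore on a flag rather than on `open_inode` also keeps it correct for any `goto out` taken before the first fchdir().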

Max



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 02/10] virtiofsd: Add TempFd structure
  2021-08-06 14:41     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 10:44       ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-08-09 10:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 06.08.21 16:41, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:26PM +0200, Max Reitz wrote:
>> We are planning to add file handles to lo_inode objects as an
>> alternative to lo_inode.fd.  That means that everywhere where we
>> currently reference lo_inode.fd, we will have to open a temporary file
>> descriptor that needs to be closed after use.
>>
>> So instead of directly accessing lo_inode.fd, there will be a helper
>> function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
>> file descriptor with open_by_handle_at().  It encapsulates this result
>> in a TempFd structure to let the caller know whether the FD needs to be
>> closed after use (opened from the handle) or not (copied from
>> lo_inode.fd).
> I am wondering why we need this notion of "owned". Why not always
> require closing "fd"? If we copied it from lo_inode.fd, then we will
> need to dup() it. Otherwise we opened it from a file handle and we will
> need to close it anyway.
>
> I guess you are trying to avoid having to call dup() and that's why
> this notion of "owned" fd.

Yes, I don’t want to dup() it.  One reason is that I’d rather just not.  
It’s something that we can avoid, and dup-ing every time wouldn’t make 
the code that much simpler (I think, without having tried).

Another is that it would change the current behavior (with O_PATH
FDs), which I don’t want to alter.

Well, and finally, as a pragmatic reason, virtiofsd-rs uses the same 
structure and I don’t really want C virtiofsd and virtiofsd-rs to differ 
too much.
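As background for the g_auto(TempFd) approach: GLib's g_auto() is built on the GCC/Clang `cleanup` variable attribute. The sketch below reproduces the TempFd semantics from this patch without GLib, to show that a borrowed fd survives scope exit, an owned fd is closed, and a stolen fd is handed over. `AUTO_TEMP_FD`, `temp_fd_borrow()` and `temp_fd_open()` are hypothetical helpers standing in for g_auto(TempFd), lo_inode_fd() and lo_inode_open().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    int fd;
    bool owned; /* fd owned by this object? */
} TempFd;

#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

static void temp_fd_clear(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        close(temp_fd->fd);
        *temp_fd = TEMP_FD_INIT;
    }
}

/* Hypothetical replacement for g_auto(TempFd) without GLib */
#define AUTO_TEMP_FD __attribute__((cleanup(temp_fd_clear))) TempFd

static int temp_fd_steal(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        temp_fd->owned = false;
        return temp_fd->fd;
    }
    return dup(temp_fd->fd); /* borrowed: hand out a private copy */
}

/* Borrow: what lo_inode_fd() does when the inode keeps an O_PATH fd */
static void temp_fd_borrow(int fd, TempFd *out)
{
    *out = (TempFd) { .fd = fd, .owned = false };
}

/* Own: what lo_inode_open() (or open_by_handle_at()) would produce */
static int temp_fd_open(const char *path, int flags, TempFd *out)
{
    int fd = open(path, flags);
    if (fd < 0) {
        return -errno;
    }
    *out = (TempFd) { .fd = fd, .owned = true };
    return 0;
}
```

Only the owned case ever calls close(); the borrowed case uses the stored fd directly, with no extra dup(), just as the current code uses lo_inode.fd.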

>> By using g_auto(TempFd) to store this result, callers will not even have
>> to care about closing a temporary FD after use.  It will be done
>> automatically once the object goes out of scope.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 49 ++++++++++++++++++++++++++++++++
>>   1 file changed, 49 insertions(+)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 1f27eeabc5..fb5e073e6a 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -178,6 +178,28 @@ struct lo_data {
>>       int user_posix_acl, posix_acl;
>>   };
>>   
>> +/**
>> + * Represents a file descriptor that may either be owned by this
>> + * TempFd, or only referenced (i.e. the ownership belongs to some
>> + * other object, and the value has just been copied into this TempFd).
>> + *
>> + * The purpose of this encapsulation is to be used as g_auto(TempFd)
>> + * to automatically clean up owned file descriptors when this object
>> + * goes out of scope.
>> + *
>> + * Use temp_fd_steal() to get an owned file descriptor that will not
>> + * be closed when the TempFd goes out of scope.
>> + */
>> +typedef struct {
>> +    int fd;
>> +    bool owned; /* fd owned by this object? */
>> +} TempFd;
>> +
>> +#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
>> +
>> +static void temp_fd_clear(TempFd *temp_fd);
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
>> +
>>   static const struct fuse_opt lo_opts[] = {
>>       { "sandbox=namespace",
>>         offsetof(struct lo_data, sandbox),
>> @@ -255,6 +277,33 @@ static struct lo_data *lo_data(fuse_req_t req)
>>       return (struct lo_data *)fuse_req_userdata(req);
>>   }
>>   
>> +/**
>> + * Clean-up function for TempFds
>> + */
>> +static void temp_fd_clear(TempFd *temp_fd)
>> +{
>> +    if (temp_fd->owned) {
>> +        close(temp_fd->fd);
>> +        *temp_fd = TEMP_FD_INIT;
>> +    }
>> +}
>> +
>> +/**
>> + * Return an owned fd from *temp_fd that will not be closed when
>> + * *temp_fd goes out of scope.
>> + *
>> + * (TODO: Remove __attribute__ once this is used.)
>> + */
>> +static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
>> +{
>> +    if (temp_fd->owned) {
>> +        temp_fd->owned = false;
>> +        return temp_fd->fd;
>> +    } else {
>> +        return dup(temp_fd->fd);
>> +    }
>> +}
> This would also be simpler if we always called dup() and every caller
> closed the fd.
>
> I think the only downside is having to call dup()/close(). Not sure if this
> is an expensive operation or not.
>
> Vivek
>
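For comparison, the always-dup() alternative suggested above might look roughly like this: a single helper that always hands back an fd the caller owns, at the cost of an extra dup()/close() pair whenever the inode already keeps an fd. `struct fake_inode` and `inode_fd_owned()` are hypothetical; this only sketches the suggestion and is not code from the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical inode: keeps an fd open, or only knows how to reopen itself */
struct fake_inode {
    int fd;           /* -1 if no fd is kept */
    const char *path; /* stand-in for a file handle */
};

/*
 * Returns a new fd that the caller always owns and must close(), or
 * -errno.  With a stored fd this costs one dup(); otherwise it opens.
 */
static int inode_fd_owned(const struct fake_inode *inode)
{
    int fd;

    if (inode->fd >= 0) {
        fd = dup(inode->fd);
    } else {
        fd = open(inode->path, O_RDONLY);
    }
    return fd < 0 ? -errno : fd;
}
```

The caller side becomes uniform (always close()), which is the simplification being suggested; the trade-off is the extra syscalls on the stored-fd path.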



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 04/10] virtiofsd: Add lo_inode_fd() helper
  2021-08-06 18:25     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 10:48       ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-08-09 10:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 06.08.21 20:25, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:28PM +0200, Max Reitz wrote:
>
> [..]
>> @@ -1335,12 +1359,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>>           return;
>>       }
>>   
>> +    res = lo_inode_fd(dir, &dir_fd);
>> +    if (res < 0) {
>> +        saverr = -res;
>> +        goto out;
>> +    }
>> +
>>       saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
>>       if (saverr) {
>>           goto out;
>>       }
>>   
>> -    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
>> +    res = mknod_wrapper(dir_fd.fd, name, link, mode, rdev);
>>   
>>       saverr = errno;
>>   
>> @@ -1388,6 +1418,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
>>   static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>>                       const char *name)
>>   {
>> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
>> +    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
>>       int res;
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *parent_inode;
>> @@ -1413,18 +1445,31 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
>>           goto out_err;
>>       }
>>   
>> +    res = lo_inode_fd(inode, &inode_fd);
>> +    if (res < 0) {
>> +        errno = -res;
> In the previous function, we saved the error to "saverr" and jumped to
> the "out" label, instead of overwriting errno.
>
> I would think that it will be good to use a single pattern. Either
> save error in saverr or overwrite errno. I personally prefer saving
> error into "saverr".

Absolutely, will do.
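A minimal sketch of that single convention, assuming helpers that follow lo_inode_fd()'s contract of returning 0 or -errno: store the negated value in saverr and jump to one exit label, leaving errno alone. `get_fd()` and `link_like()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper with lo_inode_fd()'s contract: 0 or -errno */
static int get_fd(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -errno;
    }
    *fd_out = fd;
    return 0;
}

/* Returns 0 or a positive errno value, as fuse_reply_err() expects */
static int link_like(const char *inode_path, const char *parent_path)
{
    int inode_fd = -1, parent_fd = -1;
    int saverr = 0;
    int res;

    res = get_fd(inode_path, &inode_fd);
    if (res < 0) {
        saverr = -res;
        goto out;
    }

    res = get_fd(parent_path, &parent_fd);
    if (res < 0) {
        saverr = -res;
        goto out;
    }

    /* linkat()/fstatat() would go here; on failure: saverr = errno; */

out:
    /* close() may clobber errno, which is why saverr holds the error */
    if (parent_fd >= 0) {
        close(parent_fd);
    }
    if (inode_fd >= 0) {
        close(inode_fd);
    }
    return saverr;
}
```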

>> +        goto out_err;
>> +    }
>> +
>> +    res = lo_inode_fd(parent_inode, &parent_fd);
>> +    if (res < 0) {
>> +        errno = -res;
>> +        goto out_err;
>> +    }
>> +
>>       memset(&e, 0, sizeof(struct fuse_entry_param));
>>       e.attr_timeout = lo->timeout;
>>       e.entry_timeout = lo->timeout;
>>   
>> -    sprintf(procname, "%i", inode->fd);
>> -    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
>> +    sprintf(procname, "%i", inode_fd.fd);
>> +    res = linkat(lo->proc_self_fd, procname, parent_fd.fd, name,
>>                    AT_SYMLINK_FOLLOW);
>>       if (res == -1) {
>>           goto out_err;
>>       }
>>   
>> -    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
>> +    res = fstatat(inode_fd.fd, "", &e.attr,
>> +                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
>>       if (res == -1) {
>>           goto out_err;
>>       }
>> @@ -1453,23 +1498,33 @@ out_err:
>>   static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>>                                       const char *name)
>>   {
>> +    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
>>       int res;
>>       uint64_t mnt_id;
>>       struct stat attr;
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *dir = lo_inode(req, parent);
>> +    struct lo_inode *inode = NULL;
>>   
>>       if (!dir) {
>> -        return NULL;
>> +        goto out;
> Should we continue to just "return NULL"? dir is NULL. That means
> lo_inode() failed. That means we never got the reference. So we don't
> have to put the reference. If we do "goto out", it will call
> lo_inode_put() which is not needed.

Yes, but lo_inode_put() handles this gracefully, so it isn’t wrong.
My personal preference is that if there is a clean-up path, it should
be used everywhere instead of having plain returns at the beginning of a
function (where not many resources have been initialized yet), so that
no clean-up will be forgotten.  For example, if we were to add some
resource acquisition in the declarations above (and clean-up code in the
clean-up path), we would need to change the return to a goto here.  Or
maybe we’d forget that, and then we’d leak something.

So I prefer making clean-up sections generic enough that they can be
used from anywhere within the function, and then actually using them
from everywhere in the function, even where they end up being no-ops.
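This style relies on the release function accepting NULL, which lo_inode_put() does. A minimal sketch with a hypothetical refcounted object (`struct obj`, `obj_get()`, `obj_put()` and `lookup()` are illustrative names, not virtiofsd code), where even the very first failure goes through `out`:

```c
#include <assert.h>
#include <stddef.h>

struct obj {
    int refcount;
};

/* Returns a new reference, or NULL (like lo_inode() on lookup failure) */
static struct obj *obj_get(struct obj *o)
{
    if (o) {
        o->refcount++;
    }
    return o;
}

/* Drops the reference in *op and clears it; a no-op if *op is NULL */
static void obj_put(struct obj **op)
{
    if (*op) {
        (*op)->refcount--;
        *op = NULL;
    }
}

/*
 * Every exit, including the very first one, goes through "out", so a
 * resource added later only needs clean-up code in one place.
 */
static int lookup(struct obj *parent, int fail_later)
{
    struct obj *dir = obj_get(parent);
    int result = -1;

    if (!dir) {
        goto out; /* obj_put() below is a harmless no-op here */
    }
    if (fail_later) {
        goto out;
    }
    result = 0;

out:
    obj_put(&dir);
    return result;
}
```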

>>       }
>>   
>> -    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
>> -    lo_inode_put(lo, &dir);
>> +    res = lo_inode_fd(dir, &dir_fd);
>> +    if (res < 0) {
>> +        goto out;
>> +    }
>> +
>> +    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
>>       if (res == -1) {
>> -        return NULL;
>> +        goto out;
>>       }
>>   
>> -    return lo_find(lo, &attr, mnt_id);
>> +    inode = lo_find(lo, &attr, mnt_id);
>> +
>> +out:
>> +    lo_inode_put(lo, &dir);
>> +    return inode;
>>   }
>
> Thanks
> Vivek
>



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 06/10] virtiofsd: Let lo_inode_open() return a TempFd
  2021-08-06 19:55     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 13:40       ` Max Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Max Reitz @ 2021-08-09 13:40 UTC (permalink / raw)
  To: Vivek Goyal, Max Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 06.08.21 21:55, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:30PM +0200, Max Reitz wrote:
>> Strictly speaking, this is not necessary, because lo_inode_open() will
>> always return a new FD owned by the caller, so TempFd.owned will always
>> be true.
>>
>> However, auto-cleanup is nice, and in some cases this plays nicely with
>> an lo_inode_fd() call in another conditional branch (see lo_setattr()).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 138 +++++++++++++------------------
>>   1 file changed, 59 insertions(+), 79 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 9e1bc37af8..292b7f7e27 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -291,10 +291,8 @@ static void temp_fd_clear(TempFd *temp_fd)
>>   /**
>>    * Return an owned fd from *temp_fd that will not be closed when
>>    * *temp_fd goes out of scope.
>> - *
>> - * (TODO: Remove __attribute__ once this is used.)
>>    */
>> -static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
>> +static int temp_fd_steal(TempFd *temp_fd)
>>   {
>>       if (temp_fd->owned) {
>>           temp_fd->owned = false;
>> @@ -673,9 +671,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
>>    * when a malicious client opens special files such as block device nodes.
>>    * Symlink inodes are also rejected since symlinks must already have been
>>    * traversed on the client side.
>> + *
>> + * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
>> + * otherwise.
>>    */
>> -static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>> -                         int open_flags)
>> +static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
>> +                         int open_flags, TempFd *tfd)
>>   {
>>       g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
>>       int fd;
>> @@ -694,7 +695,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
>>       if (fd < 0) {
>>           return -errno;
>>       }
>> -    return fd;
>> +
>> +    *tfd = (TempFd) {
>> +        .fd = fd,
>> +        .owned = true,
>> +    };
>> +
>> +    return 0;
>>   }
>>   
>>   static void lo_init(void *userdata, struct fuse_conn_info *conn)
>> @@ -852,7 +859,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>           return;
>>       }
>>   
>> -    res = lo_inode_fd(inode, &inode_fd);
>> +    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
>> +        /* We need an O_RDWR FD for ftruncate() */
>> +        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
>> +    } else {
>> +        res = lo_inode_fd(inode, &inode_fd);
>> +    }
> A minor nit.
>
> So inode_fd could hold either an O_PATH fd returned by lo_inode_fd()
> or an O_RDWR fd returned by lo_inode_open().
>
> Previous code held these fds in two different variables, inode_fd and
> truncfd respectively. I kind of found that easier to read because looking
> at variable name, I knew whether I am dealing with O_PATH fd or an
> O_RDWR fd I just opened.
>
> So a minor nit. We could continue to have two variables, say
> inode_fd and trunc_fd. Just that type of trunc_fd will now be TempFd.
>
> Also I found the previous style easier to read, where I always got hold
> of the O_PATH fd first, and later opened an O_RDWR fd if the operation
> is FUSE_SET_ATTR_SIZE. So the "valid & FUSE_SET_ATTR_SIZE" check was not
> in two places.

Oh, yes.  The problem with that approach is that we unconditionally need
to get an O_PATH fd, which is trivial when we already have one, but with
file handles this means an open_by_handle_at() operation, and then
another one to get the O_RDWR fd.  So there’s a superfluous
open_by_handle_at() operation there.

I understand this makes the code a bit more complicated, but I felt 
there was sufficient reason for it.

That also means that I don’t really want to split the fd into two
distinct variables.  Nothing in this function specifically needs an
O_PATH fd; it’s just the easier one to open, so those places can work
with any fd.

What we could do is have an rw_fd variable and a path_fd variable.  The
former would only be valid if the conditions are right (!fi && (valid &
FUSE_SET_ATTR_SIZE)); the latter would always be valid, and would be the
same fd as rw_fd whenever rw_fd is valid.

However, both need to be TempFds, because both lo_inode_open() and
lo_inode_fd() return TempFds.  So copying from rw_fd to path_fd would
require a new function, temp_fd_copy() or similar, and the code would
look like:

if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
     res = lo_inode_open(..., &rw_fd);
     if (res >= 0) {
         temp_fd_copy(&rw_fd, &path_fd);
     }
} else {
     res = lo_inode_fd(..., &path_fd);
}

with

static void temp_fd_copy(const TempFd *from, TempFd *to)
{
     /* Non-owning copy: only *from will ever close the fd */
     *to = (TempFd) {
         .fd = from->fd,
         .owned = false,
     };
}

And then we use path_fd wherever an O_PATH fd would suffice, and rw_fd
elsewhere (perhaps with a preceding assert(rw_fd.fd >= 0)).  Would that
be roughly what you had in mind?
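A compilable sketch of this two-variable proposal, using temp_fd_copy() with a non-const destination. `fake_inode_open()` and `get_setattr_fds()` are hypothetical: the former just calls open() and counts how often it runs, as a proxy for open_by_handle_at() calls. It shows that only one open happens when the O_RDWR fd is needed, and that path_fd then merely borrows rw_fd's descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    int fd;
    bool owned;
} TempFd;

#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })

/* Counts opens; stands in for open_by_handle_at() calls */
static int open_count;

static void temp_fd_clear(TempFd *temp_fd)
{
    if (temp_fd->owned) {
        close(temp_fd->fd);
        *temp_fd = TEMP_FD_INIT;
    }
}

/* Make *to a non-owning reference to the fd held by *from */
static void temp_fd_copy(const TempFd *from, TempFd *to)
{
    *to = (TempFd) { .fd = from->fd, .owned = false };
}

/* Hypothetical stand-in for lo_inode_open()/lo_inode_fd() with handles */
static int fake_inode_open(const char *path, int flags, TempFd *out)
{
    int fd = open(path, flags);
    if (fd < 0) {
        return -errno;
    }
    open_count++;
    *out = (TempFd) { .fd = fd, .owned = true };
    return 0;
}

/*
 * Hypothetical lo_setattr() prologue: exactly one open either way.
 * rw_fd is only valid when need_rw; path_fd is always valid on success.
 */
static int get_setattr_fds(const char *path, bool need_rw,
                           TempFd *rw_fd, TempFd *path_fd)
{
    int res;

    if (need_rw) {
        res = fake_inode_open(path, O_RDWR, rw_fd);
        if (res >= 0) {
            temp_fd_copy(rw_fd, path_fd);
        }
    } else {
        res = fake_inode_open(path, O_PATH, path_fd);
    }
    return res;
}
```

Because path_fd never owns the descriptor in the O_RDWR case, the order in which the two g_auto(TempFd) variables are cleared does not matter: only rw_fd closes it.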

> Anyway, this is a minor nit. If you don't like the idea of using
> two separate variables to hold O_PATH fd and O_RDWR fd, that's ok.
>
>
>>       if (res < 0) {
>>           saverr = -res;
>>           goto out_err;
>> @@ -900,18 +912,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>           if (fi) {
>>               truncfd = fd;
>>           } else {
>> -            truncfd = lo_inode_open(lo, inode, O_RDWR);
>> -            if (truncfd < 0) {
>> -                saverr = -truncfd;
>> -                goto out_err;
>> -            }
>> +            truncfd = inode_fd.fd;
>>           }
>>   
>>           saverr = drop_security_capability(lo, truncfd);
>>           if (saverr) {
>> -            if (!fi) {
>> -                close(truncfd);
>> -            }
>>               goto out_err;
>>           }
>>   
>> @@ -919,9 +924,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>               res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
>>               if (res != 0) {
>>                   saverr = res;
>> -                if (!fi) {
>> -                    close(truncfd);
>> -                }
>>                   goto out_err;
>>               }
>>           }
>> @@ -934,9 +936,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
>>                   fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
>>               }
>>           }
>> -        if (!fi) {
>> -            close(truncfd);
>> -        }
>>           if (res == -1) {
>>               goto out_err;
>>           }
>> @@ -1822,11 +1821,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
>>   static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>>                          struct fuse_file_info *fi)
>>   {
>> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
>>       int error = ENOMEM;
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *inode;
>>       struct lo_dirp *d = NULL;
>> -    int fd;
>> +    int res;
>>       ssize_t fh;
>>   
>>       inode = lo_inode(req, ino);
>> @@ -1840,13 +1840,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
>>           goto out_err;
>>       }
>>   
>> -    fd = lo_inode_open(lo, inode, O_RDONLY);
>> -    if (fd < 0) {
>> -        error = -fd;
>> +    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
>> +    if (res < 0) {
>> +        error = -res;
>>           goto out_err;
>>       }
>>   
>> -    d->dp = fdopendir(fd);
>> +    d->dp = fdopendir(temp_fd_steal(&inode_fd));
> So we are using temp_fd_steal() because, if fdopendir() is successful,
> we don't want to close the fd; instead, it will be closed by the
> closedir() call. inode_fd would be closed once lo_opendir() returns, so
> we take ownership of the fd, which then needs to be closed explicitly when appropriate.
>
> Who closes the stolen fd returned by temp_fd_steal() if fdopendir() fails?

Nobody; I forgot to handle it in the error path. O:)

Thanks for the catch.

>>       if (d->dp == NULL) {
>>           goto out_errno;
>>       }
>> @@ -1876,8 +1876,6 @@ out_err:
>>       if (d) {
>>           if (d->dp) {
>>               closedir(d->dp);
>> -        } else if (fd != -1) {
>> -            close(fd);
>>           }
>>           free(d);
>>       }
>> @@ -2077,6 +2075,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
>>   static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
>>                         int existing_fd, struct fuse_file_info *fi)
>>   {
>> +    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
> It bothers me that we are using the variable inode_fd to hold both an
> O_PATH fd and a regular fd. It would be nice if, just by looking at the
> variable name, I could figure out which type of fd it is.
>
> Would it make sense to use path_fd, ipath_fd, or inode_path_fd
> wherever we are using an O_PATH fd?

I suppose you mean in general and not specifically for lo_do_open()?  
Sure, I vote for path_fd.

I can imagine the diff stat may become rather large, though, so while I 
agree in principle, I’ll have to take a look first to know how invasive 
such a change would be (and then let you know).

Thanks for your feedback!

Max





* Re: [PATCH v3 07/10] virtiofsd: Add lo_inode.fhandle
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-09 15:21     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-09 15:21 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:31PM +0200, Max Reitz wrote:
> This new field is an alternative to lo_inode.fd: Either of the two must
> be set.  In case an O_PATH FD is needed for some lo_inode, it is either
> taken from lo_inode.fd, if valid, or a temporary FD is opened with
> open_by_handle_at().
> 
> Using a file handle instead of an FD has the advantage of keeping the
> number of open file descriptors low.
> 
> Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
> opened on the filesystem to which the file handle refers), but every
> lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
> keep a hash map of such FDs in mount_fds (mapping ID to FD).
> get_file_handle(), which is added by a later patch, will ensure that
> every mount ID for which we have generated a handle has a corresponding
> entry in mount_fds.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c      | 116 ++++++++++++++++++++++----
>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>  2 files changed, 102 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 292b7f7e27..487448d666 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -88,8 +88,25 @@ struct lo_key {
>      uint64_t mnt_id;
>  };
>  
> +struct lo_fhandle {
> +    union {
> +        struct file_handle handle;
> +        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
> +    };
> +    int mount_id;
> +};
> +
> +/* Maps mount IDs to an FD that we can pass to open_by_handle_at() */
> +static GHashTable *mount_fds;
> +pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
> +

How about if we move this hash table into "struct lo_data"? That seems to
be the one global data structure keeping all the info. It could then also
be cleaned up during lo_destroy().

Thanks
Vivek
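For reference, the handle and mount ID stored in lo_fhandle are what name_to_handle_at() fills in. A sketch assuming Linux with glibc or musl (_GNU_SOURCE exposes name_to_handle_at() and MAX_HANDLE_SZ); try_get_fhandle() is hypothetical, and returning false on any error mirrors the series' choice to always fall back to an O_PATH FD.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Same layout as the lo_fhandle introduced by the patch above */
struct lo_fhandle {
    union {
        struct file_handle handle;
        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
    };
    int mount_id;
};

/*
 * Hypothetical helper: generate a handle for name relative to dirfd.
 * On any failure, return false with errno set, so that the caller can
 * fall back to keeping an O_PATH FD instead.
 */
static bool try_get_fhandle(int dirfd, const char *name,
                            struct lo_fhandle *fh)
{
    memset(fh, 0, sizeof(*fh));
    /* Tell the kernel how much space follows struct file_handle */
    fh->handle.handle_bytes = MAX_HANDLE_SZ;
    return name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id, 0) == 0;
}
```

Turning such a handle back into an FD would take open_by_handle_at() on an FD from the matching mount, i.e. the mount_fds entry for fh.mount_id; note that open_by_handle_at() requires CAP_DAC_READ_SEARCH.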





* Re: [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-09 16:10     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-09 16:10 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:32PM +0200, Max Reitz wrote:
> Currently, lo_inode.fhandle is always NULL, and so we always keep an
> O_PATH FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
> 
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
> 
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
> 
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> 
> Unfortunately, we cannot rely on being able to generate file handles
> every time.  Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence, the inodes_by_ids
> entry is just a fallback.
> 
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
> 
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>  1 file changed, 65 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 487448d666..f9d8b2f134 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -180,7 +180,8 @@ struct lo_data {
>      int announce_submounts;
>      bool use_statx;
>      struct lo_inode root;
> -    GHashTable *inodes; /* protected by lo->mutex */
> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
>      struct lo_map dirp_map; /* protected by lo->mutex */
>      struct lo_map fd_map; /* protected by lo->mutex */
> @@ -263,8 +264,9 @@ static struct {
>  /* That we loaded cap-ng in the current thread from the saved */
>  static __thread bool cap_loaded = 0;
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id);
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id);
>  static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>                              char **out_name);
>  
> @@ -1064,18 +1066,40 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id)
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id)
>  {
> -    struct lo_inode *p;
> -    struct lo_key key = {
> +    struct lo_inode *p = NULL;
> +    struct lo_key ids_key = {
>          .ino = st->st_ino,
>          .dev = st->st_dev,
>          .mnt_id = mnt_id,
>      };
>  
>      pthread_mutex_lock(&lo->mutex);
> -    p = g_hash_table_lookup(lo->inodes, &key);
> +    if (fhandle) {
> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> +    }
> +    if (!p) {
> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);

So even if fhandle is not NULL, we will still look up the inode
object in lo->inodes_by_ids? I thought the fallback was only required
if we could not generate a file handle to begin with, and in that case
fhandle would be NULL?

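IOW, should this code instead look like this?

if (fhandle) {
    p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
} else {
    p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
    /* Only trust an inode ID match that holds an O_PATH FD */
    if (p && p->fd == -1) {
        p = NULL;
    }
}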
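For comparison, the lookup order of lo_find() as posted can be reduced to a pure decision over the two hash-table results. This is a hypothetical model: inode_sketch and choose_inode() are stand-ins, not virtiofsd code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct lo_inode: only the field the check uses */
struct inode_sketch {
    int fd; /* O_PATH FD, or -1 if the inode is only held by file handle */
};

/*
 * Lookup order of lo_find() as posted: a hit in inodes_by_handle wins;
 * otherwise an inodes_by_ids hit is only trusted if it holds an FD, since
 * only an open FD keeps the inode ID from being reused.  by_handle and
 * by_ids are the results of the two hash table lookups (NULL on a miss).
 */
static struct inode_sketch *choose_inode(bool have_fhandle,
                                         struct inode_sketch *by_handle,
                                         struct inode_sketch *by_ids)
{
    if (have_fhandle && by_handle) {
        return by_handle;
    }
    if (by_ids && by_ids->fd == -1) {
        return NULL;
    }
    return by_ids;
}
```

The interesting case is have_fhandle == true with no inodes_by_handle hit: as posted, an FD-pinned inodes_by_ids match is still returned, whereas the version suggested above would return NULL.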


> +        /*
> +         * When we had to fall back to looking up an inode by its
> +         * inode ID, ensure that we hit an entry that has a valid file
> +         * descriptor.  Having an FD open means that the inode cannot
> +         * really be deleted until the FD is closed, so that the inode
> +         * ID remains valid until we evict our lo_inode.
> +         * With no FD open (and just a file handle), the inode can be
> +         * deleted while we still have our lo_inode, and so the inode
> +         * ID may be reused by a completely different new inode.  We
> +         * then must look up the lo_inode by file handle, because this
> +         * handle contains a generation ID to differentiate between
> +         * the old and the new inode.
> +         */
> +        if (p && p->fd == -1) {
> +            p = NULL;
> +        }
> +    }
>      if (p) {
>          assert(p->nlookup > 0);
>          p->nlookup++;


[..]
>  static void fuse_lo_data_cleanup(struct lo_data *lo)
>  {
> -    if (lo->inodes) {
> -        g_hash_table_destroy(lo->inodes);
> +    if (lo->inodes_by_ids) {
> +        g_hash_table_destroy(lo->inodes_by_ids);
> +    }
> +    if (lo->inodes_by_ids) {
            ^^^^^
Should this be lo->inodes_by_handle instead?

> +        g_hash_table_destroy(lo->inodes_by_handle);

Thanks
Vivek
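inodes_by_handle needs hash and equality functions over lo_fhandle keys, which are not visible in this excerpt. A hypothetical sketch: two keys are equal only if the mount ID, handle type and opaque handle bytes all match, and per the commit message those bytes include a generation ID, so a reused inode ID yields a different key. The FNV-1a-style mix is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the lo_fhandle from patch 07 */
struct lo_fhandle {
    union {
        struct file_handle handle;
        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
    };
    int mount_id;
};

/* Equal only if mount, handle type and every opaque handle byte match */
static bool lo_fhandle_equal(const struct lo_fhandle *a,
                             const struct lo_fhandle *b)
{
    return a->mount_id == b->mount_id &&
           a->handle.handle_type == b->handle.handle_type &&
           a->handle.handle_bytes == b->handle.handle_bytes &&
           memcmp(a->handle.f_handle, b->handle.f_handle,
                  a->handle.handle_bytes) == 0;
}

/* Hash exactly the fields compared above, so equal keys hash equally */
static uint32_t lo_fhandle_hash(const struct lo_fhandle *fh)
{
    uint32_t h = 2166136261u;

    for (unsigned int i = 0; i < fh->handle.handle_bytes; i++) {
        h = (h ^ fh->handle.f_handle[i]) * 16777619u;
    }
    h = (h ^ (uint32_t)fh->handle.handle_type) * 16777619u;
    return (h ^ (uint32_t)fh->mount_id) * 16777619u;
}
```

For a GHashTable, these would be wrapped as GHashFunc/GEqualFunc callbacks taking gconstpointer keys.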





* Re: [PATCH v3 07/10] virtiofsd: Add lo_inode.fhandle
  2021-08-09 15:21     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 16:41       ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-09 16:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 09.08.21 17:21, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:31PM +0200, Max Reitz wrote:
>> This new field is an alternative to lo_inode.fd: Either of the two must
>> be set.  In case an O_PATH FD is needed for some lo_inode, it is either
>> taken from lo_inode.fd, if valid, or a temporary FD is opened with
>> open_by_handle_at().
>>
>> Using a file handle instead of an FD has the advantage of keeping the
>> number of open file descriptors low.
>>
>> Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
>> opened on the filesystem to which the file handle refers), but every
>> lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
>> keep a hash map of such FDs in mount_fds (mapping ID to FD).
>> get_file_handle(), which is added by a later patch, will ensure that
>> every mount ID for which we have generated a handle has a corresponding
>> entry in mount_fds.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c      | 116 ++++++++++++++++++++++----
>>   tools/virtiofsd/passthrough_seccomp.c |   1 +
>>   2 files changed, 102 insertions(+), 15 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 292b7f7e27..487448d666 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -88,8 +88,25 @@ struct lo_key {
>>       uint64_t mnt_id;
>>   };
>>   
>> +struct lo_fhandle {
>> +    union {
>> +        struct file_handle handle;
>> +        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
>> +    };
>> +    int mount_id;
>> +};
>> +
>> +/* Maps mount IDs to an FD that we can pass to open_by_handle_at() */
>> +static GHashTable *mount_fds;
>> +pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
>> +
> How about if we move this hash table inside "struct lo_data". That seems to
> be one global data structure keeping all the info. Also it can be
> cleaned up during lo_destroy().

Yes, sounds good and right, will do.

Hanna

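Vivek's suggestion can be sketched as follows: the mount-ID-to-FD map becomes part of the per-daemon state and is torn down together with it. This is a minimal sketch, not the patch's code: struct mount_fd_map, its fixed-size arrays (in place of a GHashTable) and the helper names are hypothetical, and the locking that the patch does with mount_fds_lock is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_MOUNT_FDS 16 /* arbitrary capacity chosen for this sketch */

/*
 * Hypothetical per-daemon map from mount ID to an FD usable with
 * open_by_handle_at().  It would be a member of the daemon's state
 * struct rather than a file-scope global.
 */
struct mount_fd_map {
    int n;
    int mount_id[MAX_MOUNT_FDS];
    int fd[MAX_MOUNT_FDS];
};

/* Return the FD stored for mount_id, or -1 if there is none. */
static int mount_fd_lookup(const struct mount_fd_map *m, int mount_id)
{
    for (int i = 0; i < m->n; i++) {
        if (m->mount_id[i] == mount_id) {
            return m->fd[i];
        }
    }
    return -1;
}

/*
 * Store fd for mount_id unless an entry already exists.  If one does,
 * the new fd is closed and the existing one is kept, which mirrors the
 * "check again" step in the patch.  Takes ownership of fd in all cases.
 * Returns the FD stored for mount_id afterwards, or -1 if the map is full.
 */
static int mount_fd_insert(struct mount_fd_map *m, int mount_id, int fd)
{
    int existing;

    assert(fd >= 0);
    existing = mount_fd_lookup(m, mount_id);
    if (existing >= 0) {
        close(fd);
        return existing;
    }
    if (m->n == MAX_MOUNT_FDS) {
        close(fd);
        return -1;
    }
    m->mount_id[m->n] = mount_id;
    m->fd[m->n] = fd;
    m->n++;
    return fd;
}

/* Teardown, e.g. from the daemon's destroy path: close every stored FD. */
static void mount_fd_map_destroy(struct mount_fd_map *m)
{
    for (int i = 0; i < m->n; i++) {
        close(m->fd[i]);
    }
    m->n = 0;
}
```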


* Re: [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-08-09 16:10     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-09 16:47       ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-09 16:47 UTC (permalink / raw)
  To: Vivek Goyal, Max Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 09.08.21 18:10, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:32PM +0200, Max Reitz wrote:
>> Currently, lo_inode.fhandle is always NULL, and so we always keep an O_PATH
>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>> its inode ID will remain in use until we drop our lo_inode (and
>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>> the inode ID as an lo_inode key, because any inode with an inode ID we
>> find in lo_data.inodes (on the same filesystem) must be the exact same
>> file.
>>
>> This will change when we start setting lo_inode.fhandle so we do not
>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>> immediately remove it, so its ID can then be reused by newly created
>> files, even while the lo_inode object is still there[1].
>>
>> So creating a new file can then reuse the old file's inode ID, and
>> looking up the new file would lead to us finding the old file's
>> lo_inode, which is not ideal.
>>
>> Luckily, just as file handles cause this problem, they also solve it:  A
>> file handle contains a generation ID, which changes when an inode ID is
>> reused, so the new file can be distinguished from the old one.  So all
>> we need to do is to add a second map besides lo_data.inodes that maps
>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>
>> Unfortunately, we cannot rely on being able to generate file handles
>> every time.  Therefore, we still enter every lo_inode object into
>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>> entry is just a fallback.
>>
>> Note that we do not generate lo_fhandle objects yet, and so we also do
>> not enter anything into the inodes_by_handle map yet.  Also, all lookups
>> skip that map.  We might manually create file handles with some code
>> that is immediately removed by the next patch again, but that would
>> break the assumption in lo_find() that every lo_inode with a non-NULL
>> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
>> leave actually using the inodes_by_handle map for the next patch.
>>
>> [1] If some application in the guest still has the file open, there is
>> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
>> case, the inode will only go away once every application in the guest
>> has closed it.  The problem described only applies to cases where the
>> guest does not have the file open, and it is just in the dentry cache,
>> basically.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>>   1 file changed, 65 insertions(+), 16 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 487448d666..f9d8b2f134 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -180,7 +180,8 @@ struct lo_data {
>>       int announce_submounts;
>>       bool use_statx;
>>       struct lo_inode root;
>> -    GHashTable *inodes; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>>       struct lo_map ino_map; /* protected by lo->mutex */
>>       struct lo_map dirp_map; /* protected by lo->mutex */
>>       struct lo_map fd_map; /* protected by lo->mutex */
>> @@ -263,8 +264,9 @@ static struct {
>>   /* That we loaded cap-ng in the current thread from the saved */
>>   static __thread bool cap_loaded = 0;
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id);
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id);
>>   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>>                               char **out_name);
>>   
>> @@ -1064,18 +1066,40 @@ out_err:
>>       fuse_reply_err(req, saverr);
>>   }
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id)
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id)
>>   {
>> -    struct lo_inode *p;
>> -    struct lo_key key = {
>> +    struct lo_inode *p = NULL;
>> +    struct lo_key ids_key = {
>>           .ino = st->st_ino,
>>           .dev = st->st_dev,
>>           .mnt_id = mnt_id,
>>       };
>>   
>>       pthread_mutex_lock(&lo->mutex);
>> -    p = g_hash_table_lookup(lo->inodes, &key);
>> +    if (fhandle) {
>> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>> +    }
>> +    if (!p) {
>> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> So even if fhandle is not NULL, we will still look up the inode
> object in lo->inodes_by_ids? I thought the fallback was only required
> if we could not generate a file handle to begin with, and in that case
> fhandle would be NULL?

Well.  I think it depends again on when file handle generation can fail 
and when it cannot.  If we assume it can randomly fail at any time, then 
it’s possible we create an lo_inode with an O_PATH fd, but later we are 
able to generate a file handle for it.  So we first try a lookup by file 
handle here, which would fail, but we’d still have to try a lookup by 
IDs, so we can find the O_PATH lo_inode.

An example case would be if at first we weren’t able to open a mount fd 
(because this file is a device node and the first lo_inode looked up on 
its filesystem), and so we couldn’t generate a file handle that we would 
be sure would work; but later for the lookup we can generate a file 
handle (because some other node on that filesystem has been opened by 
then, so we have a mount fd).

> IOW, should this code instead look like this:
>
> if (fhandle) {
>      lookup_in_lo_inodes_by_handle
> } else {
>      lookup_in_lo_inodes_by_ids;
>      if_found_verify_valid_o_path_fd;
> }
>
>
>> +        /*
>> +         * When we had to fall back to looking up an inode by its
>> +         * inode ID, ensure that we hit an entry that has a valid file
>> +         * descriptor.  Having an FD open means that the inode cannot
>> +         * really be deleted until the FD is closed, so that the inode
>> +         * ID remains valid until we evict our lo_inode.
>> +         * With no FD open (and just a file handle), the inode can be
>> +         * deleted while we still have our lo_inode, and so the inode
>> +         * ID may be reused by a completely different new inode.  We
>> +         * then must look up the lo_inode by file handle, because this
>> +         * handle contains a generation ID to differentiate between
>> +         * the old and the new inode.
>> +         */
>> +        if (p && p->fd == -1) {
>> +            p = NULL;
>> +        }
>> +    }
>>       if (p) {
>>           assert(p->nlookup > 0);
>>           p->nlookup++;
>
> [..]
>>   static void fuse_lo_data_cleanup(struct lo_data *lo)
>>   {
>> -    if (lo->inodes) {
>> -        g_hash_table_destroy(lo->inodes);
>> +    if (lo->inodes_by_ids) {
>> +        g_hash_table_destroy(lo->inodes_by_ids);
>> +    }
>> +    if (lo->inodes_by_ids) {
>              ^^^^^
> Should this be lo->inodes_by_handle instead?

Oh, crap, yes, absolutely.

Hanna

>> +        g_hash_table_destroy(lo->inodes_by_handle);
> Thanks
> Vivek
>

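Hanna's argument for keeping the by-IDs fallback even when a handle is available can be illustrated with a small sketch of the lookup order in lo_find(): try the handle map first, then the IDs map, and reject an IDs hit that holds no O_PATH FD. The types, the linear arrays standing in for the two hash tables, and the function names are hypothetical; SKETCH_HANDLE_SZ uses the same value (128) as Linux's MAX_HANDLE_SZ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define SKETCH_HANDLE_SZ 128 /* same value as Linux MAX_HANDLE_SZ */

/* Key for the by-IDs map: inode number, device and mount ID. */
struct ids_key {
    ino_t ino;
    dev_t dev;
    uint64_t mnt_id;
};

/* Key for the by-handle map: the handle bytes plus the mount ID. */
struct fh_key {
    int mount_id;
    int handle_type;
    unsigned int handle_bytes;
    unsigned char f_handle[SKETCH_HANDLE_SZ];
};

struct sketch_inode {
    struct ids_key ids;
    const struct fh_key *fh; /* NULL if no handle was generated */
    int fd;                  /* O_PATH FD, or -1 when only fh is held */
    unsigned long nlookup;
};

static bool fh_equal(const struct fh_key *a, const struct fh_key *b)
{
    return a->mount_id == b->mount_id &&
           a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           a->handle_bytes <= SKETCH_HANDLE_SZ &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}

static bool ids_equal(const struct ids_key *a, const struct ids_key *b)
{
    return a->ino == b->ino && a->dev == b->dev && a->mnt_id == b->mnt_id;
}

/*
 * Handle lookup first; then the IDs fallback, which is only trusted
 * if the entry still holds an FD (otherwise the inode number may
 * already belong to a different, newer file).
 */
static struct sketch_inode *find_inode(struct sketch_inode *tbl, size_t n,
                                       const struct fh_key *fh,
                                       const struct ids_key *ids)
{
    struct sketch_inode *p = NULL;

    for (size_t i = 0; fh && i < n && !p; i++) {
        if (tbl[i].fh && fh_equal(tbl[i].fh, fh)) {
            p = &tbl[i];
        }
    }
    if (!p) {
        for (size_t i = 0; i < n && !p; i++) {
            if (ids_equal(&tbl[i].ids, ids)) {
                p = &tbl[i];
            }
        }
        if (p && p->fd == -1) {
            p = NULL;
        }
    }
    if (p) {
        assert(p->nlookup > 0);
        p->nlookup++;
    }
    return p;
}
```

Comparing the handle type and the full handle bytes, rather than only the inode number, is what lets the handle map tell an old, deleted inode apart from a new one that reuses its number, assuming the filesystem encodes a generation number in its handles (as the commit message relies on).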


* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-09 18:41     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-09 18:41 UTC (permalink / raw)
  To: Max Reitz
  Cc: virtio-fs, Ioannis Angelakopoulos, qemu-devel, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
> When the inode_file_handles option is set, try to generate a file handle
> for new inodes instead of opening an O_PATH FD.
> 
> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
> description text tells the user they will also need to specify
> -o modcaps=+dac_read_search.
> 
> Generating a file handle returns the mount ID it is valid for.  Opening
> it will require an FD instead.  We have mount_fds to map an ID to an FD.
> get_file_handle() fills the hash map by opening the file we have
> generated a handle for.  To verify that the resulting FD indeed
> represents the handle's mount ID, we use statx().  Therefore, using file
> handles requires statx() support.

So opening the file and storing that fd in the mount_fds table might be
a problem for the inotify work Ioannis is doing.

Say a file foo.txt was opened O_RDONLY and the fd stored in mount_fds. Now
say the user unlinks foo.txt. If notifications are enabled, the final
notification will not be generated until this mount_fds fd is closed.

Now the question is: when will this fd be closed? If it is closed at some
later point and only then the notification is generated, that will break
notifications.

In fact, even an O_PATH fd delays notifications for the same reason.
But it's not too bad, as we close the O_PATH fd pretty quickly after
unlinking. And we were hoping that file handle support would get rid
of this problem, because we would not keep an O_PATH fd open.

But, IIUC, the mount_fds stuff will make it even worse. I did not see
the code that removes this fd from mount_fds, so I am not sure what
the lifetime of this fd is.

Thanks
Vivek

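The handle-or-FD decision that lo_do_lookup() makes can be sketched in isolation. This is a simplification under stated assumptions: struct inode_ref, ref_inode() and unref_inode() are hypothetical names, and the sketch leaves out the mount_fds bookkeeping and the statx() mount-ID check from get_file_handle(). Generating a handle with name_to_handle_at() needs no special capability; only opening it again with open_by_handle_at() requires CAP_DAC_READ_SEARCH, as the commit message notes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical inode reference: exactly one of fh / fd is in use. */
struct inode_ref {
    struct file_handle *fh; /* heap-allocated handle, or NULL */
    int mount_id;           /* meaningful only if fh != NULL */
    int fd;                 /* O_PATH FD, or -1 if fh is used */
};

/*
 * Try name_to_handle_at() first; on any failure (for example
 * EOPNOTSUPP on a filesystem without handle support) fall back to an
 * O_PATH FD, following the "always fall back" model from the cover
 * letter.  Returns 0 on success, or -errno if both methods failed.
 */
static int ref_inode(int dirfd, const char *name, struct inode_ref *ref)
{
    assert(ref != NULL);
    ref->fd = -1;
    ref->mount_id = -1;
    ref->fh = malloc(sizeof(struct file_handle) + MAX_HANDLE_SZ);
    if (ref->fh) {
        ref->fh->handle_bytes = MAX_HANDLE_SZ;
        if (name_to_handle_at(dirfd, name, ref->fh, &ref->mount_id, 0) == 0) {
            return 0;
        }
        free(ref->fh);
        ref->fh = NULL;
    }
    ref->fd = openat(dirfd, name, O_PATH | O_NOFOLLOW);
    return ref->fd >= 0 ? 0 : -errno;
}

/* Release whichever of the two references is held. */
static void unref_inode(struct inode_ref *ref)
{
    free(ref->fh);
    ref->fh = NULL;
    if (ref->fd >= 0) {
        close(ref->fd);
        ref->fd = -1;
    }
}
```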
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/helper.c              |   3 +
>  tools/virtiofsd/passthrough_ll.c      | 194 ++++++++++++++++++++++++--
>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>  3 files changed, 190 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index a8295d975a..aa63a21d43 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
>             "                               default: no_allow_direct_io\n"
>             "    -o announce_submounts      Announce sub-mount points to the guest\n"
>             "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
> +           "    -o inode_file_handles      Use file handles to reference inodes\n"
> +           "                               instead of O_PATH file descriptors\n"
> +           "                               (requires -o modcaps=+dac_read_search)\n"
>             );
>  }
>  
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index f9d8b2f134..ac95961d12 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -194,6 +194,7 @@ struct lo_data {
>      /* If set, virtiofsd is responsible for setting umask during creation */
>      bool change_umask;
>      int user_posix_acl, posix_acl;
> +    int inode_file_handles;
>  };
>  
>  /**
> @@ -250,6 +251,10 @@ static const struct fuse_opt lo_opts[] = {
>      { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
>      { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
>      { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
> +    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
> +    { "no_inode_file_handles",
> +      offsetof(struct lo_data, inode_file_handles),
> +      0 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -321,6 +326,135 @@ static int temp_fd_steal(TempFd *temp_fd)
>      }
>  }
>  
> +/**
> + * Generate a file handle for the given dirfd/name combination.
> + *
> + * If mount_fds does not yet contain an entry for the handle's mount
> + * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
> + * as the FD for that mount ID.  (That is the file that we have
> + * generated a handle for, so it should be representative for the
> + * mount ID.  However, to be sure (and to rule out races), we use
> + * statx() to verify that our assumption is correct.)
> + */
> +static struct lo_fhandle *get_file_handle(struct lo_data *lo,
> +                                          int dirfd, const char *name)
> +{
> +    /* We need statx() to verify the mount ID */
> +#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
> +    struct lo_fhandle *fh;
> +    int ret;
> +
> +    if (!lo->use_statx || !lo->inode_file_handles) {
> +        return NULL;
> +    }
> +
> +    fh = g_new0(struct lo_fhandle, 1);
> +
> +    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
> +    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
> +                            AT_EMPTY_PATH);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    if (pthread_rwlock_rdlock(&mount_fds_lock)) {
> +        goto fail;
> +    }
> +    if (!g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
> +        g_auto(TempFd) path_fd = TEMP_FD_INIT;
> +        struct statx stx;
> +        char procname[64];
> +        int fd;
> +
> +        pthread_rwlock_unlock(&mount_fds_lock);
> +
> +        /*
> +         * Before opening an O_RDONLY fd, check whether dirfd/name is a regular
> +         * file or directory, because we must not open anything else with
> +         * anything but O_PATH.
> +         * (And we use that occasion to verify that the file has the mount ID we
> +         * need.)
> +         */
> +        if (name[0]) {
> +            path_fd.fd = openat(dirfd, name, O_PATH);
> +            if (path_fd.fd < 0) {
> +                goto fail;
> +            }
> +            path_fd.owned = true;
> +        } else {
> +            path_fd.fd = dirfd;
> +            path_fd.owned = false;
> +        }
> +
> +        ret = statx(path_fd.fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> +                    STATX_TYPE | STATX_MNT_ID, &stx);
> +        if (ret < 0) {
> +            if (errno == ENOSYS) {
> +                lo->use_statx = false;
> +                fuse_log(FUSE_LOG_WARNING,
> +                         "statx() does not work: Will not be able to use file "
> +                         "handles for inodes\n");
> +            }
> +            goto fail;
> +        }
> +        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
> +            /*
> +             * One reason for stx_mnt_id != mount_id could be that dirfd/name
> +             * is a directory, and some other filesystem was mounted there
> +             * between us generating the file handle and then opening the FD.
> +             * (Other kinds of races might be possible, too.)
> +             * Failing this function is not fatal, though, because our caller
> +             * (lo_do_lookup()) will just fall back to opening an O_PATH FD to
> +             * store in lo_inode.fd instead of storing a file handle in
> +             * lo_inode.fhandle.  So we do not need to try too hard to get an
> +             * FD for fh->mount_id so this function could succeed.
> +             */
> +            goto fail;
> +        }
> +        if (!(stx.stx_mask & STATX_TYPE) ||
> +            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
> +        {
> +            /*
> +             * We must not open special files with anything but O_PATH, so we
> +             * cannot use this file for mount_fds.
> +             * Just return a failure in such a case and let the lo_inode have
> +             * an O_PATH fd instead of a file handle.
> +             */
> +            goto fail;
> +        }
> +
> +        /* Now that we know this fd is safe to open, do it */
> +        snprintf(procname, sizeof(procname), "%i", path_fd.fd);
> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        if (fd < 0) {
> +            goto fail;
> +        }
> +
> +        if (pthread_rwlock_wrlock(&mount_fds_lock)) {
> +            goto fail;
> +        }
> +
> +        /* Check again, might have changed */
> +        if (g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
> +            close(fd);
> +        } else {
> +            g_hash_table_insert(mount_fds,
> +                                GINT_TO_POINTER(fh->mount_id),
> +                                GINT_TO_POINTER(fd));
> +        }
> +    }
> +    pthread_rwlock_unlock(&mount_fds_lock);
> +
> +    return fh;
> +
> +fail:
> +    free(fh);
> +    return NULL;
> +#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
> +    return NULL;
> +#endif
> +}
> +
>  /**
>   * Open the given file handle with the given flags.
>   *
> @@ -1165,6 +1299,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>              return -1;
>          }
>          lo->use_statx = false;
> +        if (lo->inode_file_handles) {
> +            fuse_log(FUSE_LOG_WARNING,
> +                     "statx() does not work: Will not be able to use file "
> +                     "handles for inodes\n");
> +        }
>          /* fallback */
>      }
>  #endif
> @@ -1194,6 +1333,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    struct lo_fhandle *fh;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1223,13 +1363,21 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
> -    if (newfd == -1) {
> -        goto out_err;
> +    fh = get_file_handle(lo, dir_fd.fd, name);
> +    if (!fh) {
> +        newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
> +        if (newfd == -1) {
> +            goto out_err;
> +        }
>      }
>  
> -    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> -                   &mnt_id);
> +    if (newfd >= 0) {
> +        res = do_statx(lo, newfd, "", &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    } else {
> +        res = do_statx(lo, dir_fd.fd, name, &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    }
>      if (res == -1) {
>          goto out_err;
>      }
> @@ -1239,9 +1387,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>      }
>  
> -    inode = lo_find(lo, NULL, &e->attr, mnt_id);
> +    /*
> +     * Note that fh is always NULL if lo->inode_file_handles is false,
> +     * and so we will never do a lookup by file handle here, and
> +     * lo->inodes_by_handle will always remain empty.  We only need
> +     * this map when we do not have an O_PATH fd open for every
> +     * lo_inode, though, so if inode_file_handles is false, we do not
> +     * need that map anyway.
> +     */
> +    inode = lo_find(lo, fh, &e->attr, mnt_id);
>      if (inode) {
> -        close(newfd);
> +        if (newfd != -1) {
> +            close(newfd);
> +        }
>      } else {
>          inode = calloc(1, sizeof(struct lo_inode));
>          if (!inode) {
> @@ -1259,6 +1417,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        inode->fhandle = fh;
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1270,6 +1429,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          pthread_mutex_lock(&lo->mutex);
>          inode->fuse_ino = lo_add_inode_mapping(req, inode);
>          g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
> +        if (inode->fhandle) {
> +            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
> +        }
>          pthread_mutex_unlock(&lo->mutex);
>      }
>      e->ino = inode->fuse_ino;
> @@ -1615,6 +1777,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>      int res;
>      uint64_t mnt_id;
>      struct stat attr;
> +    struct lo_fhandle *fh;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *dir = lo_inode(req, parent);
>      struct lo_inode *inode = NULL;
> @@ -1628,12 +1791,16 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> +    fh = get_file_handle(lo, dir_fd.fd, name);
> +    /* Ignore errors, this is just an optional key for the lookup */
> +
>      res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
>      if (res == -1) {
>          goto out;
>      }
>  
> -    inode = lo_find(lo, NULL, &attr, mnt_id);
> +    inode = lo_find(lo, fh, &attr, mnt_id);
> +    g_free(fh);
>  
>  out:
>      lo_inode_put(lo, &dir);
> @@ -1801,6 +1968,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>      if (!inode->nlookup) {
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes_by_ids, &inode->key);
> +        if (inode->fhandle) {
> +            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
> +        }
>          if (lo->posix_lock) {
>              if (g_hash_table_size(inode->posix_locks)) {
>                  fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> @@ -4362,6 +4532,14 @@ int main(int argc, char *argv[])
>  
>      lo.use_statx = true;
>  
> +#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
> +    if (lo.inode_file_handles) {
> +        fuse_log(FUSE_LOG_WARNING,
> +                 "No statx() or mount ID support: Will not be able to use file "
> +                 "handles for inodes\n");
> +    }
> +#endif
> +
>      se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
>      if (se == NULL) {
>          goto err_out1;
> diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> index af04c638cb..ab4dc07e3f 100644
> --- a/tools/virtiofsd/passthrough_seccomp.c
> +++ b/tools/virtiofsd/passthrough_seccomp.c
> @@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
>      SCMP_SYS(mprotect),
>      SCMP_SYS(mremap),
>      SCMP_SYS(munmap),
> +    SCMP_SYS(name_to_handle_at),
>      SCMP_SYS(newfstatat),
>      SCMP_SYS(statx),
>      SCMP_SYS(open),
> -- 
> 2.31.1
> 
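For reference, the fallback in the lo_do_lookup() hunk above can be sketched standalone: try get_file_handle() first, and open an O_PATH FD only if that returns NULL. This is a minimal sketch, not virtiofsd code. struct inode_ref, get_inode_ref() and put_inode_ref() are hypothetical names, and the sketch skips the mount_fds bookkeeping and statx() checks that get_file_handle() performs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical per-inode reference: either a file handle (plus the mount
 * ID it belongs to) or, as a fallback, an O_PATH file descriptor.
 */
struct inode_ref {
    struct file_handle *fh; /* non-NULL if name_to_handle_at() worked */
    int mount_id;           /* only meaningful when fh != NULL */
    int fd;                 /* O_PATH fd, or -1 when fh != NULL */
};

/* Prefer a file handle; on any error, fall back to an O_PATH fd. */
static int get_inode_ref(int dirfd, const char *name, struct inode_ref *ref)
{
    ref->fd = -1;
    ref->mount_id = -1;
    ref->fh = malloc(sizeof(*ref->fh) + MAX_HANDLE_SZ);
    if (ref->fh) {
        ref->fh->handle_bytes = MAX_HANDLE_SZ;
        /* flags == 0: do not follow a trailing symlink, like O_NOFOLLOW */
        if (name_to_handle_at(dirfd, name, ref->fh, &ref->mount_id, 0) == 0) {
            return 0;
        }
        free(ref->fh);
        ref->fh = NULL;
    }
    ref->fd = openat(dirfd, name, O_PATH | O_NOFOLLOW);
    return ref->fd >= 0 ? 0 : -errno;
}

static void put_inode_ref(struct inode_ref *ref)
{
    free(ref->fh);
    ref->fh = NULL;
    if (ref->fd >= 0) {
        close(ref->fd);
        ref->fd = -1;
    }
}
```

Exactly one of fh and fd is set on success, which mirrors how lo_inode ends up with either fhandle or fd in the patch.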



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find()
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-09 19:08     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-09 19:08 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Fri, Jul 30, 2021 at 05:01:34PM +0200, Max Reitz wrote:
> lo_find() right now takes two lookup keys for two maps, namely the file
> handle for inodes_by_handle and the statx information for inodes_by_ids.
> However, we only need the statx information if looking up the inode by
> the file handle failed.
> 
> There are two callers of lo_find(): The first one, lo_do_lookup(), has
> both keys anyway, so passing them does not incur any additional cost.
> The second one, lookup_name(), though, needs to explicitly invoke
> name_to_handle_at() (through get_file_handle()) and statx() (through
> do_statx()).  We need to try to get a file handle as the primary key, so
> we cannot get rid of get_file_handle(), but we only need the statx
> information if looking up an inode by handle failed; so we can defer
> that until the lookup has indeed failed.
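The lazy-key idea described here can be illustrated outside virtiofsd with a toy lookup, where the callback that computes the (ino, dev, mnt_id) key only runs when the handle lookup misses. This is a sketch under simplifying assumptions: plain arrays stand in for the two GHashTables, and do_find(), struct ids_key, struct stat_ctx and counting_get_ids() are hypothetical names (the patch itself uses lo_do_find(), struct lo_key and a const void * opaque pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct lo_key and the two GHashTables. */
struct ids_key {
    unsigned long ino, dev, mnt_id;
};

struct entry {
    const char *handle;   /* NULL if the inode was added without a handle */
    struct ids_key ids;
    int inode;
};

/*
 * Look up by handle first; only if that misses, call get_ids() to compute
 * the (ino, dev, mnt_id) key and search by that.  Returns inode or -1.
 */
static int do_find(const struct entry *tab, size_t n, const char *handle,
                   int (*get_ids)(struct ids_key *, void *), void *opaque)
{
    struct ids_key k;

    for (size_t i = 0; handle && i < n; i++) {
        if (tab[i].handle && strcmp(tab[i].handle, handle) == 0) {
            return tab[i].inode;
        }
    }
    if (get_ids(&k, opaque) != 0) {
        return -1; /* e.g. the stat call failed */
    }
    for (size_t i = 0; i < n; i++) {
        if (tab[i].ids.ino == k.ino && tab[i].ids.dev == k.dev &&
            tab[i].ids.mnt_id == k.mnt_id) {
            return tab[i].inode;
        }
    }
    return -1;
}

/* Example closure: returns a fixed key and counts how often it was needed. */
struct stat_ctx {
    struct ids_key result;
    int calls;
};

static int counting_get_ids(struct ids_key *k, void *opaque)
{
    struct stat_ctx *ctx = opaque;

    ctx->calls++;
    *k = ctx->result;
    return 0;
}
```

A handle hit never invokes the callback; a miss invokes it once and still finds entries that were added without a handle.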

So IIUC, this patch is all about avoiding the do_statx()
call in lookup_name() if a file handle could be successfully
generated.

So can't we just modify lookup_name() to not call statx()
if a file handle could be generated, and also modify lo_find()
to use st/mnt_id only if fhandle == NULL?

That is probably a much simpler change than passing function
pointers around.
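A minimal sketch of this alternative, again with toy tables (arrays instead of GHashTables). find(), lookup() and fake_stat() are hypothetical names, with fake_stat() standing in for do_statx(). The caller only stats when no handle is available, and find() consults the ids only when handle == NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ids_key {
    unsigned long ino, dev, mnt_id;
};

struct entry {
    const char *handle;   /* NULL if the inode was added without a handle */
    struct ids_key ids;
    int inode;
};

/* Match by handle if one is given; use the ids only when handle == NULL. */
static int find(const struct entry *tab, size_t n, const char *handle,
                const struct ids_key *ids)
{
    for (size_t i = 0; i < n; i++) {
        int match = handle
            ? (tab[i].handle && strcmp(tab[i].handle, handle) == 0)
            : (tab[i].ids.ino == ids->ino && tab[i].ids.dev == ids->dev &&
               tab[i].ids.mnt_id == ids->mnt_id);
        if (match) {
            return tab[i].inode;
        }
    }
    return -1;
}

static int stat_calls; /* counts how often the fallback key was computed */

/* Hypothetical stand-in for do_statx(): always reports the same key. */
static void fake_stat(struct ids_key *out)
{
    stat_calls++;
    *out = (struct ids_key){ .ino = 20, .dev = 1, .mnt_id = 5 };
}

/* Caller side: only "stat" when no file handle could be generated. */
static int lookup(const struct entry *tab, size_t n, const char *handle)
{
    struct ids_key ids;

    if (handle) {
        return find(tab, n, handle, NULL);
    }
    fake_stat(&ids);
    return find(tab, n, NULL, &ids);
}
```

One behavioral difference visible in the sketch: a handle that misses the handle table is not retried by ids. In the quoted code, inodes_by_handle only contains inodes that got a handle when they were created, so an inode first added with an O_PATH fd could be missed this way; that may be worth checking against the fallback comment in lo_find().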

Vivek

> 
> To this end, replace lo_find()'s st/mnt_id parameters by a get_ids()
> closure that is invoked to fill the lo_key struct if necessary.
> 
> Also, lo_find() is renamed to lo_do_find(), so we can add a new
> lo_find() wrapper whose closure just initializes the lo_key from the
> st/mnt_id parameters, just like the old lo_find() did.
> 
> lookup_name() directly calls lo_do_find() now and passes its own
> closure, which performs the do_statx() call.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 93 ++++++++++++++++++++++++++------
>  1 file changed, 76 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index ac95961d12..41e9f53878 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -1200,22 +1200,23 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> -static struct lo_inode *lo_find(struct lo_data *lo,
> -                                const struct lo_fhandle *fhandle,
> -                                struct stat *st, uint64_t mnt_id)
> +/*
> + * get_ids() will be called to get the key for lo->inodes_by_ids if
> + * the lookup by file handle has failed.
> + */
> +static struct lo_inode *lo_do_find(struct lo_data *lo,
> +    const struct lo_fhandle *fhandle,
> +    int (*get_ids)(struct lo_key *, const void *),
> +    const void *get_ids_opaque)
>  {
>      struct lo_inode *p = NULL;
> -    struct lo_key ids_key = {
> -        .ino = st->st_ino,
> -        .dev = st->st_dev,
> -        .mnt_id = mnt_id,
> -    };
> +    struct lo_key ids_key;
>  
>      pthread_mutex_lock(&lo->mutex);
>      if (fhandle) {
>          p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>      }
> -    if (!p) {
> +    if (!p && get_ids(&ids_key, get_ids_opaque) == 0) {
>          p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
>          /*
>           * When we had to fall back to looking up an inode by its
> @@ -1244,6 +1245,36 @@ static struct lo_inode *lo_find(struct lo_data *lo,
>      return p;
>  }
>  
> +struct lo_find_get_ids_key_opaque {
> +    const struct stat *st;
> +    uint64_t mnt_id;
> +};
> +
> +static int lo_find_get_ids_key(struct lo_key *ids_key, const void *opaque)
> +{
> +    const struct lo_find_get_ids_key_opaque *stat_info = opaque;
> +
> +    *ids_key = (struct lo_key){
> +        .ino = stat_info->st->st_ino,
> +        .dev = stat_info->st->st_dev,
> +        .mnt_id = stat_info->mnt_id,
> +    };
> +
> +    return 0;
> +}
> +
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id)
> +{
> +    const struct lo_find_get_ids_key_opaque stat_info = {
> +        .st = st,
> +        .mnt_id = mnt_id,
> +    };
> +
> +    return lo_do_find(lo, fhandle, lo_find_get_ids_key, &stat_info);
> +}
> +
>  /* value_destroy_func for posix_locks GHashTable */
>  static void posix_locks_value_destroy(gpointer data)
>  {
> @@ -1769,14 +1800,41 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> +struct lookup_name_get_ids_key_opaque {
> +    struct lo_data *lo;
> +    int parent_fd;
> +    const char *name;
> +};
> +
> +static int lookup_name_get_ids_key(struct lo_key *ids_key, const void *opaque)
> +{
> +    const struct lookup_name_get_ids_key_opaque *stat_params = opaque;
> +    uint64_t mnt_id;
> +    struct stat attr;
> +    int res;
> +
> +    res = do_statx(stat_params->lo, stat_params->parent_fd, stat_params->name,
> +                   &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    if (res < 0) {
> +        return -errno;
> +    }
> +
> +    *ids_key = (struct lo_key){
> +        .ino = attr.st_ino,
> +        .dev = attr.st_dev,
> +        .mnt_id = mnt_id,
> +    };
> +
> +    return 0;
> +}
> +
>  /* Increments nlookup and caller must release refcount using lo_inode_put() */
>  static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>                                      const char *name)
>  {
>      g_auto(TempFd) dir_fd = TEMP_FD_INIT;
>      int res;
> -    uint64_t mnt_id;
> -    struct stat attr;
> +    struct lookup_name_get_ids_key_opaque stat_params;
>      struct lo_fhandle *fh;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *dir = lo_inode(req, parent);
> @@ -1794,12 +1852,13 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>      fh = get_file_handle(lo, dir_fd.fd, name);
>      /* Ignore errors, this is just an optional key for the lookup */
>  
> -    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
> -    if (res == -1) {
> -        goto out;
> -    }
> -
> -    inode = lo_find(lo, fh, &attr, mnt_id);
> +    stat_params = (struct lookup_name_get_ids_key_opaque){
> +        .lo = lo,
> +        .parent_fd = dir_fd.fd,
> +        .name = name,
> +    };
> +    inode = lo_do_find(lo, fh, lookup_name_get_ids_key, &stat_params);
> +    lo_inode_put(lo, &dir);
>      g_free(fh);
>  
>  out:
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-09 18:41     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-10  8:32       ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-10  8:32 UTC (permalink / raw)
  To: Vivek Goyal, Max Reitz
  Cc: virtio-fs, Ioannis Angelakopoulos, qemu-devel, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On 09.08.21 20:41, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
>> When the inode_file_handles option is set, try to generate a file handle
>> for new inodes instead of opening an O_PATH FD.
>>
>> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
>> description text tells the user they will also need to specify
>> -o modcaps=+dac_read_search.
>>
>> Generating a file handle returns the mount ID it is valid for.  Opening
>> it will require an FD instead.  We have mount_fds to map an ID to an FD.
>> get_file_handle() fills the hash map by opening the file we have
>> generated a handle for.  To verify that the resulting FD indeed
>> represents the handle's mount ID, we use statx().  Therefore, using file
>> handles requires statx() support.
> So opening the file and storing that fd in the mount_fds table might be
> a problem for the inotify work Ioannis is doing.
>
> Say a file foo.txt was opened O_RDONLY and its fd stored in mount_fds.
> Now say the user unlinks foo.txt. If notifications are enabled, the final
> notification will not be generated until this mount_fds fd is closed.
>
> Now the question is: when will this fd be closed? If it is closed at
> some later point and only then the notification is generated, that will
> break notifications.

Currently, it is never closed.

> In fact, even the O_PATH fd is delaying notifications for the same reason.
> But it's not too bad, as we close the O_PATH fd pretty quickly after
> unlinking. And we were hoping that file handle support would get rid
> of this problem because we would not keep an O_PATH fd open.
>
> But, IIUC, the mount_fds stuff will make it even worse. I did not see
> the code that removes this fd from mount_fds, so I am not sure what
> the lifetime of this fd is.

The lifetime is forever.  If we wanted to remove it at some point, we’d 
need to track how many file handles we have open for the given mount fd 
and then remove it from the table once the count reaches 0, so it would 
still be delayed.

I think in practice the first thing that is looked up on a given mount
will probably be its root directory, which cannot be deleted before
everything else on the mount is gone, so that would work.  If we track
how many handles there are and the whole mount is deleted, I hope all
lo_inodes are evicted, the count goes to 0, and we can drop the mount fd.
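The counting described above could look roughly like the following. This is a hypothetical single-threaded sketch, not virtiofsd's mount_fds code: it uses a fixed-size table, omits the lo->mutex locking, and invents its names for illustration. Every lo_inode that relies on a file handle takes a reference on its mount's fd, and the fd is closed once the last reference is dropped. As said above, this still delays the close until the last handle on that mount goes away.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical table entry: one open fd per mount ID plus a user count. */
struct mount_fd_entry {
    uint64_t mount_id;
    int fd;            /* -1 when the slot is free */
    unsigned refcount; /* number of file-handle lo_inodes on this mount */
};

#define MAX_MOUNTS 16
static struct mount_fd_entry mount_fd_table[MAX_MOUNTS];

static void mount_fd_table_init(void)
{
    for (int i = 0; i < MAX_MOUNTS; i++) {
        mount_fd_table[i] = (struct mount_fd_entry){ .fd = -1 };
    }
}

static struct mount_fd_entry *mount_fd_lookup(uint64_t mount_id)
{
    for (int i = 0; i < MAX_MOUNTS; i++) {
        if (mount_fd_table[i].fd >= 0 &&
            mount_fd_table[i].mount_id == mount_id) {
            return &mount_fd_table[i];
        }
    }
    return NULL;
}

/*
 * Take a reference on the mount fd for @mount_id.  @new_fd is an fd the
 * caller opened on some file of that mount: it is adopted if no entry
 * exists yet, and closed if an existing fd is reused.  Returns the fd to
 * pass to open_by_handle_at(), or -1 (caller keeps @new_fd) on failure.
 */
static int mount_fd_get(uint64_t mount_id, int new_fd)
{
    struct mount_fd_entry *e = mount_fd_lookup(mount_id);

    if (e) {
        e->refcount++;
        if (new_fd >= 0 && new_fd != e->fd) {
            close(new_fd);
        }
        return e->fd;
    }
    if (new_fd < 0) {
        return -1;
    }
    for (int i = 0; i < MAX_MOUNTS; i++) {
        if (mount_fd_table[i].fd < 0) {
            mount_fd_table[i] = (struct mount_fd_entry){
                .mount_id = mount_id, .fd = new_fd, .refcount = 1,
            };
            return new_fd;
        }
    }
    return -1;
}

/* Drop a reference; the fd is only closed once the last user is gone. */
static void mount_fd_put(uint64_t mount_id)
{
    struct mount_fd_entry *e = mount_fd_lookup(mount_id);

    assert(e && e->refcount > 0);
    if (!e) {
        return;
    }
    if (--e->refcount == 0) {
        close(e->fd);
        e->fd = -1;
    }
}

static unsigned mount_fd_refcount(uint64_t mount_id)
{
    struct mount_fd_entry *e = mount_fd_lookup(mount_id);

    return e ? e->refcount : 0;
}
```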

I think we can make sure that the mount fd is the root directory by,
well, looking into mountinfo...  That would result in us always opening
the root node of the filesystem, so the whole filesystem would first
need to disappear before it can be deleted (and our mount fd closed),
which should work, I guess?

It’s a bit tricky because our sandboxing prevents easy access to 
mountinfo, but if that’s the only way...
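Here is a rough illustration of what "looking into mountinfo" involves. According to proc(5), each line of /proc/[pid]/mountinfo begins with the mount ID, the parent ID, major:minor, the root of the mount within its filesystem, and the mount point. According to name_to_handle_at(2), the mount ID returned alongside a handle is the one that appears in /proc/self/mountinfo. The helpers below are hypothetical and are not virtiofsd code. They assume the mountinfo text has already been read somehow, which, as noted above, the sandbox may make awkward. In the simple case, opening the returned mount point would give an fd on the mount's root rather than on whichever file happened to be looked up first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_BUF 4096

/*
 * Hypothetical helper: parse one mountinfo line.  Fields (see proc(5)):
 * (1) mount ID, (2) parent ID, (3) major:minor, (4) root of the mount
 * within its filesystem, (5) mount point.  Octal escapes such as "\040"
 * are left undecoded.  @root and @mount_point must hold PATH_BUF bytes.
 */
static int parse_mountinfo_line(const char *line, int *mount_id,
                                char *root, char *mount_point)
{
    int parent_id;
    unsigned int dev_major, dev_minor;

    assert(line && mount_id && root && mount_point);
    /* %4095s matches PATH_BUF - 1 */
    if (sscanf(line, "%d %d %u:%u %4095s %4095s", mount_id, &parent_id,
               &dev_major, &dev_minor, root, mount_point) != 6) {
        return -1;
    }
    return 0;
}

/*
 * Look up @wanted_id in the full text of a mountinfo file and copy its
 * mount point into @mount_point (PATH_BUF bytes).  Returns 0 on success,
 * -1 if no line matches (then @mount_point holds unspecified contents).
 * Lines longer than the local buffer are skipped in this sketch.
 */
static int find_mount_point(const char *mountinfo, int wanted_id,
                            char *mount_point)
{
    char line[3 * PATH_BUF];
    char root[PATH_BUF];
    const char *p = mountinfo;

    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        int id;

        if (len < sizeof(line)) {
            memcpy(line, p, len);
            line[len] = '\0';
            if (parse_mountinfo_line(line, &id, root, mount_point) == 0 &&
                id == wanted_id) {
                return 0;
            }
        }
        if (!nl) {
            break;
        }
        p = nl + 1;
    }
    return -1;
}
```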

Hanna



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find()
  2021-08-09 19:08     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-10  8:38       ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-10  8:38 UTC (permalink / raw)
  To: Vivek Goyal, Max Reitz
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 09.08.21 21:08, Vivek Goyal wrote:
> On Fri, Jul 30, 2021 at 05:01:34PM +0200, Max Reitz wrote:
>> lo_find() right now takes two lookup keys for two maps, namely the file
>> handle for inodes_by_handle and the statx information for inodes_by_ids.
>> However, we only need the statx information if looking up the inode by
>> the file handle failed.
>>
>> There are two callers of lo_find(): The first one, lo_do_lookup(), has
>> both keys anyway, so passing them does not incur any additional cost.
>> The second one, lookup_name(), though, needs to explicitly invoke
>> name_to_handle_at() (through get_file_handle()) and statx() (through
>> do_statx()).  We need to try to get a file handle as the primary key, so
>> we cannot get rid of get_file_handle(), but we only need the statx
>> information if looking up an inode by handle failed; so we can defer
>> that until the lookup has indeed failed.
> So IIUC, this patch seems to be all about avoiding the do_statx()
> call in lookup_name() if a file handle could be successfully
> generated.
>
> So can't we just modify lookup_name() to not call statx()
> if a file handle could be generated, and also modify lo_find()
> to use st/mnt_id only if fhandle == NULL?
>
> That is probably a much simpler change than passing function
> pointers around.

Definitely, but I don’t know whether it’s correct.

Or, we can just drop this patch and say that we don’t need to 
over-optimize C virtiofsd.

Hanna



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-08-09 16:47       ` [Virtio-fs] " Hanna Reitz
@ 2021-08-10 14:07         ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 14:07 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Max Reitz

On Mon, Aug 09, 2021 at 06:47:18PM +0200, Hanna Reitz wrote:
> On 09.08.21 18:10, Vivek Goyal wrote:
> > On Fri, Jul 30, 2021 at 05:01:32PM +0200, Max Reitz wrote:
> > > Currently, lo_inode.fhandle is always NULL and so we always keep an O_PATH
> > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > its inode ID will remain in use until we drop our lo_inode (and
> > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > file.
> > > 
> > > This will change when we start setting lo_inode.fhandle so we do not
> > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > immediately remove it, so its ID can then be reused by newly created
> > > files, even while the lo_inode object is still there[1].
> > > 
> > > So creating a new file can then reuse the old file's inode ID, and
> > > looking up the new file would lead to us finding the old file's
> > > lo_inode, which is not ideal.
> > > 
> > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > file handle contains a generation ID, which changes when an inode ID is
> > > reused, so the new file can be distinguished from the old one.  So all
> > > we need to do is to add a second map besides lo_data.inodes that maps
> > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > 
> > > Unfortunately, we cannot rely on being able to generate file handles
> > > every time.  Therefore, we still enter every lo_inode object into
> > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > entry is just a fallback.
> > > 
> > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > skip that map.  We might manually create file handles with some code
> > > that is immediately removed by the next patch again, but that would
> > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > leave actually using the inodes_by_handle map for the next patch.
> > > 
> > > [1] If some application in the guest still has the file open, there is
> > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > case, the inode will only go away once every application in the guest
> > > has closed it.  The problem described only applies to cases where the
> > > guest does not have the file open, and it is just in the dentry cache,
> > > basically.
> > > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
> > >   1 file changed, 65 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 487448d666..f9d8b2f134 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -180,7 +180,8 @@ struct lo_data {
> > >       int announce_submounts;
> > >       bool use_statx;
> > >       struct lo_inode root;
> > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > >       struct lo_map ino_map; /* protected by lo->mutex */
> > >       struct lo_map dirp_map; /* protected by lo->mutex */
> > >       struct lo_map fd_map; /* protected by lo->mutex */
> > > @@ -263,8 +264,9 @@ static struct {
> > >   /* That we loaded cap-ng in the current thread from the saved */
> > >   static __thread bool cap_loaded = 0;
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id);
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id);
> > >   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > >                               char **out_name);
> > > @@ -1064,18 +1066,40 @@ out_err:
> > >       fuse_reply_err(req, saverr);
> > >   }
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id)
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id)
> > >   {
> > > -    struct lo_inode *p;
> > > -    struct lo_key key = {
> > > +    struct lo_inode *p = NULL;
> > > +    struct lo_key ids_key = {
> > >           .ino = st->st_ino,
> > >           .dev = st->st_dev,
> > >           .mnt_id = mnt_id,
> > >       };
> > >       pthread_mutex_lock(&lo->mutex);
> > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > +    if (fhandle) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > +    }
> > > +    if (!p) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > So even if fhandle is not NULL, we will still lookup the inode
> > object in lo->inodes_by_ids? I thought fallback was only required
> > if we could not generate file handle to begin with and in that case
> > fhandle will be NULL?
> 
> Well.  I think it depends again on when file handle generation can fail and
> when it cannot.  If we assume it can randomly fail at any time, then it’s
> possible we create an lo_inode with an O_PATH fd, but later we are able to
> generate a file handle for it.  So we first try a lookup by file handle
> here, which would fail, but we’d still have to try a lookup by IDs, so we
> can find the O_PATH lo_inode.
> 
> An example case would be if at first we weren’t able to open a mount fd
> (because this file is a device node and the first lo_inode looked up on its
> filesystem), and so we couldn’t generate a file handle that we would be sure
> would work; but later for the lookup we can generate a file handle (because
> some other node on that filesystem has been opened by then, so we have a
> mount fd).

Ok, got it. If we are assuming that file handle generation can fail
randomly, then what will happen in the following scenario?

- Lookup: a file handle is generated, and the inode is added to both hash
  tables.

- Another lookup: handle generation fails. We call lo_find(), which finds
  the inode in lo->inodes_by_ids but rejects it because p->fd == -1.

- Now lo_find() returns NULL, and the caller, lo_do_lookup(), assumes the
  inode could not be found (even though it is there) and tries to add a
  new inode to the hash tables. So we will have two inode instances in
  the hash table with the same st_dev, st_ino, and mnt_id. One will have
  a file handle, while the other will have an O_PATH fd.

So we have two inodes in the cache representing the same file, one using
a file handle and the other using an O_PATH fd.
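The scenario above can be reproduced with a toy model. This is a hypothetical simplification and not the patch's lo_find(). In the model, an inode either carries a (numeric) file handle with no fd, or an O_PATH fd. As in the second step, an inodes_by_ids hit is rejected when the cached inode has fd == -1. If the same file is looked up twice, first with a handle and then without one, the cache ends up with two entries sharing the same (st_ino, st_dev, mnt_id).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified lo_inode. */
struct toy_inode {
    uint64_t ino, dev, mnt_id; /* inodes_by_ids key */
    bool has_handle;           /* handle-based inode: fd is -1 */
    uint64_t handle;           /* toy file handle: just a number */
    int fd;                    /* O_PATH fd, or -1 */
};

#define MAX_INODES 8
static struct toy_inode cache[MAX_INODES];
static size_t n_cached;

/* @handle is NULL when file handle generation failed for this lookup. */
static struct toy_inode *toy_find(const uint64_t *handle, uint64_t ino,
                                  uint64_t dev, uint64_t mnt_id)
{
    for (size_t i = 0; handle && i < n_cached; i++) {
        if (cache[i].has_handle && cache[i].handle == *handle) {
            return &cache[i];
        }
    }
    for (size_t i = 0; i < n_cached; i++) {
        struct toy_inode *p = &cache[i];

        if (p->ino != ino || p->dev != dev || p->mnt_id != mnt_id) {
            continue;
        }
        /*
         * An ID match is only trusted if we hold an O_PATH fd that pins the
         * inode number; a handle-based inode (fd == -1) is rejected.
         */
        if (p->fd == -1) {
            continue;
        }
        return p;
    }
    return NULL;
}

/* Toy lo_do_lookup(): reuse a cached inode or create a new one. */
static struct toy_inode *toy_lookup(const uint64_t *handle, uint64_t ino,
                                    uint64_t dev, uint64_t mnt_id,
                                    int opath_fd)
{
    struct toy_inode *p = toy_find(handle, ino, dev, mnt_id);

    if (p) {
        return p;
    }
    assert(n_cached < MAX_INODES);
    p = &cache[n_cached++];
    *p = (struct toy_inode){
        .ino = ino, .dev = dev, .mnt_id = mnt_id,
        .has_handle = handle != NULL,
        .handle = handle ? *handle : 0,
        .fd = handle ? -1 : opath_fd,
    };
    return p;
}

/* Number of cached inodes for one (ino, dev, mnt_id) triple. */
static size_t count_for_ids(uint64_t ino, uint64_t dev, uint64_t mnt_id)
{
    size_t n = 0;

    for (size_t i = 0; i < n_cached; i++) {
        if (cache[i].ino == ino && cache[i].dev == dev &&
            cache[i].mnt_id == mnt_id) {
            n++;
        }
    }
    return n;
}
```

Later lookups return whichever of the two entries matches the key that happened to be available. The guest would observe this as the node ID changing for the same file.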

One side effect of this: say the guest has looked up a file (and got
node ID 1, an fhandle-based inode). Later, when the guest revalidates
that inode, it could get inode 2 (O_PATH fd) this time. The guest will
think the inode has changed, discard the previous inode, and trigger
another lookup. Normally this only happens if the file has gone away,
but now it will happen because we have two inodes in the cache
representing the same file.

There might be other cases where this is bad. I can't think of any
at this point in time.

If we could solve the mount_fd issue, then we would probably only have
to use the fallback path for the EOPNOTSUPP case. Then we could be sure
that the cache always has exactly one inode per file, either fhandle
based or O_PATH based (and not both).
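The fallback policy proposed here reduces to a small decision on the errno from name_to_handle_at(): fall back to an O_PATH fd only when the filesystem does not support file handles, and report every other failure to the guest. The sketch below is hypothetical and is not virtiofsd code. The enum and the function name are illustrative only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of trying to key a new inode by file handle. */
enum inode_key_kind {
    INODE_KEY_HANDLE, /* name_to_handle_at() succeeded */
    INODE_KEY_OPATH,  /* no handle support: keep an O_PATH fd instead */
    INODE_KEY_ERROR,  /* any other failure is returned to the guest */
};

/*
 * @handle_errno is 0 if name_to_handle_at() succeeded, otherwise the errno
 * it set.  EOVERFLOW (handle buffer too small) is assumed to have been
 * handled by the caller by retrying with a larger buffer.  On Linux,
 * ENOTSUP has the same value as EOPNOTSUPP.  *guest_err receives the
 * error to reply with, or 0.
 */
static enum inode_key_kind choose_inode_key(int handle_errno, int *guest_err)
{
    assert(guest_err);
    *guest_err = 0;
    if (handle_errno == 0) {
        return INODE_KEY_HANDLE;
    }
    if (handle_errno == EOPNOTSUPP) {
        return INODE_KEY_OPATH;
    }
    *guest_err = handle_errno;
    return INODE_KEY_ERROR;
}
```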

Vivek



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find()
  2021-08-10  8:38       ` [Virtio-fs] " Hanna Reitz
@ 2021-08-10 14:12         ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 14:12 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Max Reitz

On Tue, Aug 10, 2021 at 10:38:32AM +0200, Hanna Reitz wrote:
> On 09.08.21 21:08, Vivek Goyal wrote:
> > On Fri, Jul 30, 2021 at 05:01:34PM +0200, Max Reitz wrote:
> > > lo_find() right now takes two lookup keys for two maps, namely the file
> > > handle for inodes_by_handle and the statx information for inodes_by_ids.
> > > However, we only need the statx information if looking up the inode by
> > > the file handle failed.
> > > 
> > > There are two callers of lo_find(): The first one, lo_do_lookup(), has
> > > both keys anyway, so passing them does not incur any additional cost.
> > > The second one, lookup_name(), though, needs to explicitly invoke
> > > name_to_handle_at() (through get_file_handle()) and statx() (through
> > > do_statx()).  We need to try to get a file handle as the primary key, so
> > > we cannot get rid of get_file_handle(), but we only need the statx
> > > information if looking up an inode by handle failed; so we can defer
> > > that until the lookup has indeed failed.
> > So IIUC, this patch seems to be all about avoiding the do_statx()
> > call in lookup_name() if a file handle could be successfully
> > generated.
> >
> > So can't we just modify lookup_name() to not call statx()
> > if a file handle could be generated, and also modify lo_find()
> > to use st/mnt_id only if fhandle == NULL?
> >
> > That is probably a much simpler change than passing function
> > pointers around.
> 
> Definitely, but I don’t know whether it’s correct.

What problem do you see from a correctness point of view?
> 
> Or, we can just drop this patch and say that we don’t need to over-optimize
> C virtiofsd.

The Rust version is used by very few people, while the C version is in
production, so I would definitely optimize the C version. Once the Rust
version is widely available and shipped in products, we can start paying
less attention to the C version, IMHO.

Vivek



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-08-10 14:07         ` [Virtio-fs] " Vivek Goyal
@ 2021-08-10 14:13           ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-10 14:13 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Max Reitz

On 10.08.21 16:07, Vivek Goyal wrote:
> On Mon, Aug 09, 2021 at 06:47:18PM +0200, Hanna Reitz wrote:
>> On 09.08.21 18:10, Vivek Goyal wrote:
>>> On Fri, Jul 30, 2021 at 05:01:32PM +0200, Max Reitz wrote:
>>>> Currently, lo_inode.fhandle is always NULL and so we always keep an O_PATH
>>>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>>>> its inode ID will remain in use until we drop our lo_inode (and
>>>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>>>> the inode ID as an lo_inode key, because any inode with an inode ID we
>>>> find in lo_data.inodes (on the same filesystem) must be the exact same
>>>> file.
>>>>
>>>> This will change when we start setting lo_inode.fhandle so we do not
>>>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>>>> immediately remove it, so its ID can then be reused by newly created
>>>> files, even while the lo_inode object is still there[1].
>>>>
>>>> So creating a new file can then reuse the old file's inode ID, and
>>>> looking up the new file would lead to us finding the old file's
>>>> lo_inode, which is not ideal.
>>>>
>>>> Luckily, just as file handles cause this problem, they also solve it:  A
>>>> file handle contains a generation ID, which changes when an inode ID is
>>>> reused, so the new file can be distinguished from the old one.  So all
>>>> we need to do is to add a second map besides lo_data.inodes that maps
>>>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>>>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>>>
>>>> Unfortunately, we cannot rely on being able to generate file handles
>>>> every time.  Therefore, we still enter every lo_inode object into
>>>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>>>> potential inodes_by_handle entry then has precedence; the inodes_by_ids
>>>> entry is just a fallback.
>>>>
>>>> Note that we do not generate lo_fhandle objects yet, and so we also do
>>>> not enter anything into the inodes_by_handle map yet.  Also, all lookups
>>>> skip that map.  We might manually create file handles with some code
>>>> that is immediately removed by the next patch again, but that would
>>>> break the assumption in lo_find() that every lo_inode with a non-NULL
>>>> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
>>>> leave actually using the inodes_by_handle map for the next patch.
>>>>
>>>> [1] If some application in the guest still has the file open, there is
>>>> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
>>>> case, the inode will only go away once every application in the guest
>>>> has closed it.  The problem described only applies to cases where the
>>>> guest does not have the file open, and it is just in the dentry cache,
>>>> basically.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
>>>>    1 file changed, 65 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>> index 487448d666..f9d8b2f134 100644
>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>> @@ -180,7 +180,8 @@ struct lo_data {
>>>>        int announce_submounts;
>>>>        bool use_statx;
>>>>        struct lo_inode root;
>>>> -    GHashTable *inodes; /* protected by lo->mutex */
>>>> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
>>>> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>>>>        struct lo_map ino_map; /* protected by lo->mutex */
>>>>        struct lo_map dirp_map; /* protected by lo->mutex */
>>>>        struct lo_map fd_map; /* protected by lo->mutex */
>>>> @@ -263,8 +264,9 @@ static struct {
>>>>    /* That we loaded cap-ng in the current thread from the saved */
>>>>    static __thread bool cap_loaded = 0;
>>>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>>>> -                                uint64_t mnt_id);
>>>> +static struct lo_inode *lo_find(struct lo_data *lo,
>>>> +                                const struct lo_fhandle *fhandle,
>>>> +                                struct stat *st, uint64_t mnt_id);
>>>>    static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>>>>                                char **out_name);
>>>> @@ -1064,18 +1066,40 @@ out_err:
>>>>        fuse_reply_err(req, saverr);
>>>>    }
>>>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>>>> -                                uint64_t mnt_id)
>>>> +static struct lo_inode *lo_find(struct lo_data *lo,
>>>> +                                const struct lo_fhandle *fhandle,
>>>> +                                struct stat *st, uint64_t mnt_id)
>>>>    {
>>>> -    struct lo_inode *p;
>>>> -    struct lo_key key = {
>>>> +    struct lo_inode *p = NULL;
>>>> +    struct lo_key ids_key = {
>>>>            .ino = st->st_ino,
>>>>            .dev = st->st_dev,
>>>>            .mnt_id = mnt_id,
>>>>        };
>>>>        pthread_mutex_lock(&lo->mutex);
>>>> -    p = g_hash_table_lookup(lo->inodes, &key);
>>>> +    if (fhandle) {
>>>> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>>>> +    }
>>>> +    if (!p) {
>>>> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
>>> So even if fhandle is not NULL, we will still look up the inode
>>> object in lo->inodes_by_ids? I thought the fallback was only required
>>> if we could not generate a file handle to begin with, and in that case
>>> fhandle would be NULL?
>> Well.  I think it depends again on when file handle generation can fail and
>> when it cannot.  If we assume it can randomly fail at any time, then it’s
>> possible we create an lo_inode with an O_PATH fd, but later we are able to
>> generate a file handle for it.  So we first try a lookup by file handle
>> here, which would fail, but we’d still have to try a lookup by IDs, so we
>> can find the O_PATH lo_inode.
>>
>> An example case would be if at first we weren’t able to open a mount fd
>> (because this file is a device node and the first lo_inode looked up on its
>> filesystem), and so we couldn’t generate a file handle that we would be sure
>> would work; but later for the lookup we can generate a file handle (because
>> some other node on that filesystem has been opened by then, so we have a
>> mount fd).
> OK, got it. If we assume that file handle generation can fail
> randomly, then what will happen in the following scenario?
>
> - lookup, file handle generated, inode added to both hash tables.
>
> - another lookup, handle generation failed. We call lo_find(), it
>    finds inode in lo->inodes_by_ids but rejects it because p->fd == -1.
>
> - Now lo_find() will return NULL, and the caller will assume the inode could
>    not be found (despite the fact it is in there), so lo_do_lookup()
>    will try to add a new inode to the hash tables. So we will have two inode
>    instances in the hash tables with the same st_dev, st_ino, mnt_id. One will
>    have a file handle while the other will have an O_PATH fd.
>
> So we have two inodes in the cache representing the same file, one using a
> file handle and the other using an O_PATH fd.
>
> One side effect of this: say the guest has looked up a file (and got
> node ID 1, an fhandle-based inode). Later, when the guest revalidates
> that inode, it could get inode 2 (O_PATH fd) this time. The guest will
> think the inode has changed, discard the previous inode and trigger
> another lookup. This normally happens only if the file has gone away,
> but now it will happen because we have two inodes in the cache
> representing the same file.
>
> There might be other cases where this is bad. I can't think of any
> at this point in time.
>
> If we could solve the mount_fd issue, then we would probably have to use
> the fallback path only for the EOPNOTSUPP case. And then we can be sure
> that the cache will always have one inode, either fhandle-based or
> O_PATH-based (and not both).

OK, but can we truly solve the mount_fd issue?

What I think we could do is have two variants of the file handle 
generation function, one which is supposed to create a usable file 
handle (so this version will ensure mount_fds contains a valid fd for 
the mount ID), and one that just generates a file handle for lookup 
(i.e. it doesn’t look into mount_fds at all).  The latter version would 
practically only fail in the EOPNOTSUPP case.

Would that get around the issue?
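For illustration, a minimal sketch of the lookup-only variant described above. The names `get_lookup_handle()` and `struct lookup_fhandle` are hypothetical (not virtiofsd code); the sketch assumes glibc's `name_to_handle_at()` wrapper and only generates a handle plus its mount ID, without opening or caching anything in mount_fds, so its failure modes are essentially those of name_to_handle_at(2) itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup key: raw handle plus the mount ID it belongs to. */
struct lookup_fhandle {
    int mount_id;
    struct file_handle *handle; /* malloc'd with room for MAX_HANDLE_SZ */
};

/*
 * Hypothetical lookup-only variant: only calls name_to_handle_at(2).  It
 * neither opens nor caches a mount fd, so the result is usable as a hash
 * key but not (yet) for open_by_handle_at().
 * Returns 0 on success or a negative errno (e.g. -EOPNOTSUPP).
 */
static int get_lookup_handle(int dirfd, const char *name,
                             struct lookup_fhandle *out)
{
    struct file_handle *fh;

    assert(name != NULL && out != NULL);
    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh) {
        return -ENOMEM;
    }
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(dirfd, name, fh, &out->mount_id, 0) < 0) {
        int err = errno;
        free(fh);
        return -err;
    }
    out->handle = fh;
    return 0;
}

/* Two keys match if mount ID, handle type, and handle bytes are equal. */
static int lookup_fhandle_equal(const struct lookup_fhandle *a,
                                const struct lookup_fhandle *b)
{
    return a->mount_id == b->mount_id &&
           a->handle->handle_type == b->handle->handle_type &&
           a->handle->handle_bytes == b->handle->handle_bytes &&
           memcmp(a->handle->f_handle, b->handle->f_handle,
                  a->handle->handle_bytes) == 0;
}
```

Because nothing is opened, such a key could only be compared against inodes_by_handle entries; the variant that fills mount_fds would still be needed when a new lo_inode is created.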

Hanna



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find()
  2021-08-10 14:12         ` [Virtio-fs] " Vivek Goyal
@ 2021-08-10 14:17           ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-10 14:17 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Max Reitz

On 10.08.21 16:12, Vivek Goyal wrote:
> On Tue, Aug 10, 2021 at 10:38:32AM +0200, Hanna Reitz wrote:
>> On 09.08.21 21:08, Vivek Goyal wrote:
>>> On Fri, Jul 30, 2021 at 05:01:34PM +0200, Max Reitz wrote:
>>>> lo_find() right now takes two lookup keys for two maps, namely the file
>>>> handle for inodes_by_handle and the statx information for inodes_by_ids.
>>>> However, we only need the statx information if looking up the inode by
>>>> the file handle failed.
>>>>
>>>> There are two callers of lo_find(): The first one, lo_do_lookup(), has
>>>> both keys anyway, so passing them does not incur any additional cost.
>>>> The second one, lookup_name(), though, needs to explicitly invoke
>>>> name_to_handle_at() (through get_file_handle()) and statx() (through
>>>> do_statx()).  We need to try to get a file handle as the primary key, so
>>>> we cannot get rid of get_file_handle(), but we only need the statx
>>>> information if looking up an inode by handle failed; so we can defer
>>>> that until the lookup has indeed failed.
>>> So IIUC, this patch seems to be all about avoiding do_statx()
>>> call in lookup_name() if file handle could be successfully
>>> generated.
>>>
>>> So can't we just modify lookup_name() to not call statx()
>>> if a file handle could be generated, and also modify lo_find()
>>> to use st/mnt_id only if fhandle == NULL?
>>>
>>> That is probably a much simpler change than passing function
>>> pointers around.
>> Definitely, but I don’t know whether it’s correct.
> What problem do you see from a correctness point of view?

Again assuming that file handle generation can randomly fail (this time 
assuming it failed the first time, and later may succeed), it’s possible 
we have an lo_inode that we want to look up that does not have a file 
handle, but for the lookup we were able to generate a file handle for 
it.  In such a case, we need to call statx() to get st_ino/st_dev/mnt_id.
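To make the argument concrete, here is a simplified model (hypothetical names, a plain array instead of the two GHashTables) of a lookup that tries the file handle first and calls a statx()-like callback only when that fails. The ID-based match rejects handle-based entries, following the `p->fd == -1` rule discussed in this thread; the real lo_find() may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-in for lo_inode: has_handle plays
 * the role of fhandle != NULL (and thus fd == -1), "handle" stands in for
 * the handle bytes, and ino/dev for st_ino/st_dev/mnt_id.  One array models
 * both inodes_by_handle and inodes_by_ids.
 */
struct model_inode {
    int has_handle;
    uint64_t handle;
    uint64_t ino, dev;
};

/* Callback that performs the (comparatively expensive) statx() lazily. */
typedef int (*get_ids_fn)(void *opaque, uint64_t *ino, uint64_t *dev);

static struct model_inode *model_find(struct model_inode *tbl, size_t n,
                                      const uint64_t *handle,
                                      get_ids_fn get_ids, void *opaque)
{
    uint64_t ino, dev;

    assert(tbl != NULL || n == 0);
    /* 1. Primary key: the file handle, if one could be generated. */
    if (handle) {
        for (size_t i = 0; i < n; i++) {
            if (tbl[i].has_handle && tbl[i].handle == *handle) {
                return &tbl[i];
            }
        }
    }
    /*
     * 2. Fallback by IDs.  Still needed even with a handle: the inode may
     *    have been created when handle generation failed.  Only now do we
     *    pay for statx().
     */
    if (get_ids(opaque, &ino, &dev) < 0) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].ino == ino && tbl[i].dev == dev) {
            /* Reject handle-based entries (fd == -1): IDs may be stale. */
            return tbl[i].has_handle ? NULL : &tbl[i];
        }
    }
    return NULL;
}

/* Counting stub standing in for do_statx(), to show when it runs. */
struct fake_ids {
    uint64_t ino, dev;
    int calls;
};

static int fake_get_ids(void *opaque, uint64_t *ino, uint64_t *dev)
{
    struct fake_ids *f = opaque;

    f->calls++;
    *ino = f->ino;
    *dev = f->dev;
    return 0;
}
```

With this model, a hit in the handle table never triggers the callback; an inode created without a handle is only found after the callback runs; and if no handle can be generated, an ID match on a handle-based entry yields NULL, which is how the duplicate lo_inode in Vivek's scenario would arise.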

>> Or, we can just drop this patch and say that we don’t need to over-optimize
>> C virtiofsd.
> The Rust version is used by very few people, while the C version is in
> production. So I would definitely optimize the C version. Once the Rust
> version is widely available and shipped in products, we can start paying
> less attention to the C version, IMHO.

OK, it was just an offer.  I mean, I myself wrote this patch as an 
optimization after all. :)

Hanna



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-10  8:32       ` [Virtio-fs] " Hanna Reitz
@ 2021-08-10 15:23         ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 15:23 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
> On 09.08.21 20:41, Vivek Goyal wrote:
> > On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
> > > When the inode_file_handles option is set, try to generate a file handle
> > > for new inodes instead of opening an O_PATH FD.
> > > 
> > > Being able to open these again will require CAP_DAC_READ_SEARCH, so the
> > > description text tells the user they will also need to specify
> > > -o modcaps=+dac_read_search.
> > > 
> > > Generating a file handle returns the mount ID it is valid for.  Opening
> > > it will require an FD instead.  We have mount_fds to map an ID to an FD.
> > > get_file_handle() fills the hash map by opening the file we have
> > > generated a handle for.  To verify that the resulting FD indeed
> > > represents the handle's mount ID, we use statx().  Therefore, using file
> > > handles requires statx() support.
> > So opening the file and storing that fd in the mount_fds table might be
> > a problem for the inotify work Ioannis is doing.
> > 
> > So say a file foo.txt was opened O_RDONLY and the fd stored in mount_fds.
> > Now say the user unlinks foo.txt. If notifications are enabled, the final
> > notification will not be generated until this mount_fds fd is closed.
> > 
> > Now the question is: when will this fd be closed? If it is closed at some
> > later point and only then the notification is generated, that will break
> > notifications.
> 
> Currently, it is never closed.
> 
> > In fact, even the O_PATH fd delays notifications for the same reason.
> > But it's not too bad, as we close the O_PATH fd pretty quickly after
> > unlinking. And we were hoping that file handle support would get rid
> > of this problem because we would not keep an O_PATH fd open.
> > 
> > But, IIUC, the mount_fds stuff will make it even worse. I did not see
> > the code which removes this fd from mount_fds. So I am not sure what
> > the lifetime of this fd is.
> 
> The lifetime is forever.  If we wanted to remove it at some point, we’d need
> to track how many file handles we have open for the given mount fd and then
> remove it from the table once the count reaches 0, so it would still be
> delayed.
> 
> I think in practice the first thing that is looked up from some mount will
> probably be the root directory, which cannot be deleted before everything
> else on the mount is gone, so that would work.  If we track how many handles
> there are and the whole mount were to be deleted, I hope all lo_inodes are
> evicted, the count goes to 0, and we can drop the mount fd.

Keeping a reference count on the mount_fd object makes sense. So we would
probably maintain this hash table and look it up using mount_id (as you
are already doing). All subsequent inodes from the same filesystem will
use the same object. Once all inodes have been flushed out, the mount_fd
object should go away as well (allowing for unmount on the host).

> 
> I think we can make the assumption that the mount fd is the root directory
> certain by, well, looking into mountinfo...  That would result in us always
> opening the root node of the filesystem, so that first the whole filesystem
> needs to disappear before it can be deleted (and our mount fd closed) –
> which should work, I guess?

This seems more reasonable, and I think it is what the man page
suggests:

       The  mount_id  argument  returns an identifier for the filesystem mount
       that corresponds to pathname.  This corresponds to the first  field  in
       one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
       the fifth field of that record yields a file descriptor for  the  mount
       point;  that  file  descriptor  can  be  used  in  a subsequent call to
       open_by_handle_at().

The fifth field seems to be the mount point. man proc says:

              (5)  mount  point:  the  pathname of the mount point relative to
                   the process's root directory.
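Finding the mount point for a given mount ID then amounts to scanning mountinfo for a record whose first field matches and taking its fifth field. A sketch under that assumption (the function names are hypothetical):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one /proc/self/mountinfo record.  Per proc(5), field 1 is the
 * mount ID, field 2 the parent ID, field 3 major:minor, field 4 the root
 * of the mount, and field 5 the mount point.  Returns 0 on success.
 */
static int parse_mountinfo_line(const char *line, int *mount_id,
                                char *mnt_point, size_t len)
{
    char path[4096];

    assert(len > 0);
    if (sscanf(line, "%d %*d %*s %*s %4095s", mount_id, path) != 2) {
        return -1;
    }
    if (strlen(path) >= len) {
        return -1;
    }
    memcpy(mnt_point, path, strlen(path) + 1);
    return 0;
}

/* Scan an already-open mountinfo stream for mount_id, from the start. */
static int find_mount_point(FILE *f, int mount_id, char *out, size_t len)
{
    char *line = NULL;
    size_t cap = 0;
    int ret = -1;

    rewind(f);
    while (getline(&line, &cap, f) != -1) {
        int id;
        if (parse_mountinfo_line(line, &id, out, len) == 0 &&
            id == mount_id) {
            ret = 0;
            break;
        }
    }
    free(line);
    return ret;
}
```

Paths containing spaces or other special characters appear octal-escaped in mountinfo (a space becomes `\040`), so a real implementation would need to unescape field 5 before opening it; this sketch does not.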

So opening the mount point and saving it as the mount_fd (if it is not
already in the hash table), and then taking a per-inode reference count on
the mount_fd object, looks like it will solve the lifetime issue of
mount_fd as well as the issue of temporary failures that arise because we
can't open a device special file.

> 
> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> but if that’s the only way...

Yes. We already have lo->proc_self_fd. Maybe we need to keep
/proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
that any mount table changes will still be visible even though I keep
the fd open (and that I don't have to open a new fd to notice
mount/unmount changes).
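If it helps, a sketch of re-reading through a long-lived fd, assuming the fd is opened once before the sandbox is set up (and stored in something like the proposed lo->proc_self_mountinfo; `read_mountinfo()` is a hypothetical helper). Our understanding is that the kernel generates mountinfo at read time, so seeking back to offset 0 and reading again should reflect mounts and unmounts that happened in the meantime; this is worth verifying.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read the whole mountinfo text from an fd that stays open.  Returns a
 * malloc'd NUL-terminated buffer, or NULL on error.
 */
static char *read_mountinfo(int fd)
{
    size_t cap = 4096, len = 0;
    char *buf;

    assert(fd >= 0);
    buf = malloc(cap);
    /* Seek back to the start so the next read sees the current table. */
    if (!buf || lseek(fd, 0, SEEK_SET) < 0) {
        free(buf);
        return NULL;
    }
    for (;;) {
        ssize_t n;

        if (len + 1 >= cap) {
            char *nb = realloc(buf, cap * 2);
            if (!nb) {
                free(buf);
                return NULL;
            }
            buf = nb;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return NULL;
        }
        if (n == 0) {
            break;
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    return buf;
}
```

Usage would be along the lines of opening `/proc/self/mountinfo` with `O_RDONLY | O_CLOEXEC` at startup and calling `read_mountinfo()` whenever a mount ID has to be resolved.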

Vivek



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [Virtio-fs] [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
@ 2021-08-10 15:23         ` Vivek Goyal
  0 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 15:23 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, virtio-fs, Max Reitz

On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
> On 09.08.21 20:41, Vivek Goyal wrote:
> > On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
> > > When the inode_file_handles option is set, try to generate a file handle
> > > for new inodes instead of opening an O_PATH FD.
> > > 
> > > Being able to open these again will require CAP_DAC_READ_SEARCH, so the
> > > description text tells the user they will also need to specify
> > > -o modcaps=+dac_read_search.
> > > 
> > > Generating a file handle returns the mount ID it is valid for.  Opening
> > > it will require an FD instead.  We have mount_fds to map an ID to an FD.
> > > get_file_handle() fills the hash map by opening the file we have
> > > generated a handle for.  To verify that the resulting FD indeed
> > > represents the handle's mount ID, we use statx().  Therefore, using file
> > > handles requires statx() support.
> > So opening the file and storing that fd in mount_fds table might be
> > a potential problem with inotify work Ioannis is doing.
> > 
> > So say a file foo.txt was opened O_RDONLY and fd stored in mount_fs. Now
> > say user unlinks foo.txt. If notifications are enabled, final notification
> > will not be generated till this mount_fds fd is closed.
> > 
> > Now question is when will this fd be closed? If it closed at some
> > later point and then notification is generated, that will break
> > notificaitons.
> 
> Currently, it is never closed.
> 
> > In fact even O_PATH fd is delaying notifications due to same reason.
> > But its not too bad as we close O_PATH fd pretty quickly after
> > unlinking. And we were hoping that file handle support will get rid
> > of this problem because we will not keep O_PATH fd open.
> > 
> > But, IIUC, mount_fds stuff will make it even worse. I did not see
> > the code which removes this fd from mount_fds. So I am not sure what's
> > the life time of this fd.
> 
> The lifetime is forever.  If we wanted to remove it at some point, we’d need
> to track how many file handles we have open for the given mount fd and then
> remove it from the table once the count reaches 0, so it would still be
> delayed.
> 
> I think in practice the first thing that is looked up from some mount will
> probably be the root directory, which cannot be deleted before everything
> else on the mount is gone, so that would work.  We track how many handles
> are there, if the whole mount were to be deleted, I hope all lo_inodes are
> evicted, the count goes to 0, and we can drop the mount fd.

Keeping a reference count on mount_fd object make sense. So we probably
maintain this hash table and lookup using mount_id (as you are already
doing). All subsequent inodes from same filesystem will use same
object. Once all inodes have been flushed out, then mount_fd object
should go away as well (allowing for unmount on host).

> 
> I think we can make the assumption that the mount fd is the root directory
> certain by, well, looking into mountinfo...  That would result in us always
> opening the root node of the filesystem, so that first the whole filesystem
> needs to disappear before it can be deleted (and our mount fd closed) –
> which should work, I guess?

This seems more reasonable. And I think that's what man page seems to 
suggest.

       The  mount_id  argument  returns an identifier for the filesystem mount
       that corresponds to pathname.  This corresponds to the first  field  in
       one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
       the fifth field of that record yields a file descriptor for  the  mount
       point;  that  file  descriptor  can  be  used  in  a subsequent call to
       open_by_handle_at().

Fifth field seems to be the mount point. man proc says.

              (5)  mount  point:  the  pathname of the mount point relative to
                   the process's root directory.

So opening the mount point and saving it as mount_fd (if it is not
already in the hash table), and then taking a per-inode reference count
on the mount_fd object, looks like it will solve the lifetime issue of
mount_fd as well as the issue of temporary failures that arise because
we can't open a device special file.
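A sketch of the lookup the man pages describe, as a hypothetical helper rather than virtiofsd code: given the text of a mountinfo file, find the record whose first field is the mount ID and return its fifth field. mountinfo escapes spaces, tabs, newlines and backslashes in paths as octal (\040 and so on), so the helper decodes those. It does not address opening the path from inside the sandbox, nor the device-file mount point case discussed in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Decode the octal escapes (e.g. \040 for a space) used in mountinfo. */
static void unescape_octal(char *s)
{
    char *r = s, *w = s;

    while (*r) {
        if (r[0] == '\\' && r[1] >= '0' && r[1] <= '7' &&
            r[2] >= '0' && r[2] <= '7' && r[3] >= '0' && r[3] <= '7') {
            *w++ = (char)(((r[1] - '0') << 6) | ((r[2] - '0') << 3) |
                          (r[3] - '0'));
            r += 4;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}

/*
 * Find the record whose first field equals mount_id and copy its fifth
 * field (the mount point, relative to the process's root directory) into
 * out.  Returns 0 on success, -1 if nothing matches or out is too small.
 */
int mountinfo_mount_point(const char *mountinfo, int mount_id,
                          char *out, size_t out_len)
{
    const char *line = mountinfo;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[4096], path[4096];
        int id;

        if (len < sizeof(buf)) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            /* Fields: mount ID, parent ID, major:minor, root, mount point */
            if (sscanf(buf, "%d %*d %*s %*s %4095s", &id, path) == 2 &&
                id == mount_id) {
                unescape_octal(path);
                if (strlen(path) >= out_len) {
                    return -1;
                }
                strcpy(out, path);
                return 0;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```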

> 
> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> but if that’s the only way...

Yes. We already have lo->proc_self_fd. Maybe we need to keep
/proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
that any mount table changes will still be visible even though the fd
stays open (so that we don't have to open a new fd to notice new
mounts/unmounts).
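One way to keep a single fd open and still see the current mount table would be to always read it again from offset 0 with pread(), which leaves the fd's file position untouched so concurrent users do not interfere with each other. This is a sketch under the same assumption made above, namely that re-reading a procfs seq_file such as /proc/self/mountinfo from offset 0 regenerates its contents; read_fd_from_start() is a hypothetical helper. A mount change between two chunked reads could still yield an inconsistent snapshot.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read everything behind fd from offset 0 using pread(), so the fd's file
 * position is never changed.  Returns a NUL-terminated heap buffer (caller
 * frees) and its length via len_out, or NULL on error.
 */
char *read_fd_from_start(int fd, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);

    if (!buf) {
        return NULL;
    }
    for (;;) {
        ssize_t n;

        if (cap - len < 2) {
            /* Grow, always keeping one byte free for the final NUL. */
            char *nbuf = realloc(buf, cap * 2);

            if (!nbuf) {
                free(buf);
                return NULL;
            }
            buf = nbuf;
            cap *= 2;
        }
        n = pread(fd, buf + len, cap - len - 1, (off_t)len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return NULL;
        }
        if (n == 0) {
            break;  /* end of file */
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    if (len_out) {
        *len_out = len;
    }
    return buf;
}
```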

Vivek



* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-10 15:23         ` [Virtio-fs] " Vivek Goyal
@ 2021-08-10 15:26           ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-10 15:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On 10.08.21 17:23, Vivek Goyal wrote:
> On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
>> On 09.08.21 20:41, Vivek Goyal wrote:
>>> On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
>>>> When the inode_file_handles option is set, try to generate a file handle
>>>> for new inodes instead of opening an O_PATH FD.
>>>>
>>>> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
>>>> description text tells the user they will also need to specify
>>>> -o modcaps=+dac_read_search.
>>>>
>>>> Generating a file handle returns the mount ID it is valid for.  Opening
>>>> it will require an FD instead.  We have mount_fds to map an ID to an FD.
>>>> get_file_handle() fills the hash map by opening the file we have
>>>> generated a handle for.  To verify that the resulting FD indeed
>>>> represents the handle's mount ID, we use statx().  Therefore, using file
>>>> handles requires statx() support.
>>> So opening the file and storing that fd in mount_fds table might be
>>> a potential problem with inotify work Ioannis is doing.
>>>
>>> So say a file foo.txt was opened O_RDONLY and the fd stored in mount_fds. Now
>>> say user unlinks foo.txt. If notifications are enabled, final notification
>>> will not be generated till this mount_fds fd is closed.
>>>
>>> Now the question is when this fd will be closed. If it is closed at
>>> some later point and only then is the notification generated, that will
>>> break notifications.
>> Currently, it is never closed.
>>
>>> In fact, even an O_PATH fd delays notifications for the same reason.
>>> But it's not too bad, as we close the O_PATH fd pretty quickly after
>>> unlinking. And we were hoping that file handle support will get rid
>>> of this problem because we will not keep O_PATH fd open.
>>>
>>> But, IIUC, mount_fds stuff will make it even worse. I did not see
>>> the code which removes this fd from mount_fds. So I am not sure what's
>>> the life time of this fd.
>> The lifetime is forever.  If we wanted to remove it at some point, we’d need
>> to track how many file handles we have open for the given mount fd and then
>> remove it from the table once the count reaches 0, so it would still be
>> delayed.
>>
>> I think in practice the first thing that is looked up from some mount will
>> probably be the root directory, which cannot be deleted before everything
>> else on the mount is gone, so that would work.  We would track how many
>> handles there are; if the whole mount were to be deleted, I hope all
>> lo_inodes would be evicted, the count would drop to 0, and we could drop
>> the mount fd.
> Keeping a reference count on the mount_fd object makes sense. So we would
> probably maintain this hash table and look up entries by mount_id (as you
> are already doing). All subsequent inodes from the same filesystem will use
> the same object. Once all inodes have been flushed out, the mount_fd object
> should go away as well (allowing the filesystem to be unmounted on the host).
>
>> I think we can make the assumption that the mount fd is the root directory
>> certain by, well, looking into mountinfo...  That would result in us always
>> opening the root node of the filesystem, so that first the whole filesystem
>> needs to disappear before it can be deleted (and our mount fd closed) –
>> which should work, I guess?
> This seems more reasonable, and I think it is what the man page
> suggests:
>
>         The  mount_id  argument  returns an identifier for the filesystem mount
>         that corresponds to pathname.  This corresponds to the first  field  in
>         one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
>         the fifth field of that record yields a file descriptor for  the  mount
>         point;  that  file  descriptor  can  be  used  in  a subsequent call to
>         open_by_handle_at().
>
> The fifth field seems to be the mount point. man proc says:
>
>                (5)  mount  point:  the  pathname of the mount point relative to
>                     the process's root directory.
>
> So opening the mount point and saving it as mount_fd (if it is not
> already in the hash table), and then taking a per-inode reference count
> on the mount_fd object, looks like it will solve the lifetime issue of
> mount_fd as well as the issue of temporary failures that arise because
> we can't open a device special file.

Well, we’ve had this discussion before, and it’s possible that a 
filesystem has a device file as its mount point.

But given the inotify complications, there’s really a good reason we 
should use mountinfo.

>> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
>> but if that’s the only way...
> Yes. We already have lo->proc_self_fd. Maybe we need to keep
> /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> that any mount table changes will still be visible even though the fd
> stays open (so that we don't have to open a new fd to notice new
> mounts/unmounts).

Well, yes, that was my idea.  Unfortunately, I haven’t been successful
yet; when I tried keeping the fd open, reading from it would just return 
0 bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so 
that nothing else in /proc is visible. Perhaps we need to bind-mount 
/proc/self/mountinfo into /proc/self/fd before that...

I’ll just have to try.

Hanna




* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-10 15:26           ` [Virtio-fs] " Hanna Reitz
@ 2021-08-10 15:57             ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 15:57 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Tue, Aug 10, 2021 at 05:26:15PM +0200, Hanna Reitz wrote:
> On 10.08.21 17:23, Vivek Goyal wrote:
> > On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
> > > On 09.08.21 20:41, Vivek Goyal wrote:
> > > > On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
> > > > > When the inode_file_handles option is set, try to generate a file handle
> > > > > for new inodes instead of opening an O_PATH FD.
> > > > > 
> > > > > Being able to open these again will require CAP_DAC_READ_SEARCH, so the
> > > > > description text tells the user they will also need to specify
> > > > > -o modcaps=+dac_read_search.
> > > > > 
> > > > > Generating a file handle returns the mount ID it is valid for.  Opening
> > > > > it will require an FD instead.  We have mount_fds to map an ID to an FD.
> > > > > get_file_handle() fills the hash map by opening the file we have
> > > > > generated a handle for.  To verify that the resulting FD indeed
> > > > > represents the handle's mount ID, we use statx().  Therefore, using file
> > > > > handles requires statx() support.
> > > > So opening the file and storing that fd in mount_fds table might be
> > > > a potential problem with inotify work Ioannis is doing.
> > > > 
> > > > So say a file foo.txt was opened O_RDONLY and the fd stored in mount_fds. Now
> > > > say user unlinks foo.txt. If notifications are enabled, final notification
> > > > will not be generated till this mount_fds fd is closed.
> > > > 
> > > > Now the question is when this fd will be closed. If it is closed at
> > > > some later point and only then is the notification generated, that will
> > > > break notifications.
> > > Currently, it is never closed.
> > > 
> > > > In fact, even an O_PATH fd delays notifications for the same reason.
> > > > But it's not too bad, as we close the O_PATH fd pretty quickly after
> > > > unlinking. And we were hoping that file handle support will get rid
> > > > of this problem because we will not keep O_PATH fd open.
> > > > 
> > > > But, IIUC, mount_fds stuff will make it even worse. I did not see
> > > > the code which removes this fd from mount_fds. So I am not sure what's
> > > > the life time of this fd.
> > > The lifetime is forever.  If we wanted to remove it at some point, we’d need
> > > to track how many file handles we have open for the given mount fd and then
> > > remove it from the table once the count reaches 0, so it would still be
> > > delayed.
> > > 
> > > I think in practice the first thing that is looked up from some mount will
> > > probably be the root directory, which cannot be deleted before everything
> > > else on the mount is gone, so that would work.  We would track how many
> > > handles there are; if the whole mount were to be deleted, I hope all
> > > lo_inodes would be evicted, the count would drop to 0, and we could drop
> > > the mount fd.
> > Keeping a reference count on the mount_fd object makes sense. So we would
> > probably maintain this hash table and look up entries by mount_id (as you
> > are already doing). All subsequent inodes from the same filesystem will use
> > the same object. Once all inodes have been flushed out, the mount_fd object
> > should go away as well (allowing the filesystem to be unmounted on the host).
> > 
> > > I think we can make the assumption that the mount fd is the root directory
> > > certain by, well, looking into mountinfo...  That would result in us always
> > > opening the root node of the filesystem, so that first the whole filesystem
> > > needs to disappear before it can be deleted (and our mount fd closed) –
> > > which should work, I guess?
> > This seems more reasonable, and I think it is what the man page
> > suggests:
> > 
> >         The  mount_id  argument  returns an identifier for the filesystem mount
> >         that corresponds to pathname.  This corresponds to the first  field  in
> >         one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
> >         the fifth field of that record yields a file descriptor for  the  mount
> >         point;  that  file  descriptor  can  be  used  in  a subsequent call to
> >         open_by_handle_at().
> > 
> > The fifth field seems to be the mount point. man proc says:
> > 
> >                (5)  mount  point:  the  pathname of the mount point relative to
> >                     the process's root directory.
> > 
> > So opening the mount point and saving it as mount_fd (if it is not
> > already in the hash table), and then taking a per-inode reference count
> > on the mount_fd object, looks like it will solve the lifetime issue of
> > mount_fd as well as the issue of temporary failures that arise because
> > we can't open a device special file.
> 
> Well, we’ve had this discussion before, and it’s possible that a filesystem
> has a device file as its mount point.

Yes. I think you modified fuse to do some special trickery. I'm not sure
where that should be fixed.

If a filesystem is faking things, couldn't it then also fake a device node
as a regular file and fool us into opening it?

> 
> But given the inotify complications, there’s really a good reason we should
> use mountinfo.
> 
> > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > but if that’s the only way...
> > Yes. We already have lo->proc_self_fd. Maybe we need to keep
> > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > that any mount table changes will still be visible even though the fd
> > stays open (so that we don't have to open a new fd to notice new
> > mounts/unmounts).
> 
> Well, yes, that was my idea.  Unfortunately, I haven’t been successful yet;
> when I tried keeping the fd open, reading from it would just return 0
> bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> nothing else in /proc is visible. Perhaps we need to bind-mount
> /proc/self/mountinfo into /proc/self/fd before that...

Or perhaps open /proc/self/mountinfo and save the fd in lo->proc_mountinfo
before /proc/self/fd is bind-mounted on /proc?
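A sketch of that ordering, using a hypothetical struct with the proc_self_mountinfo field proposed earlier in this thread: the fd is opened with O_CLOEXEC while the path is still reachable, so it stays usable after the bind mount hides it. The proc_root directory fd parameter exists only so the sketch can be exercised without a real /proc. My understanding is that the kernel captures the mount namespace and root directory when mountinfo is opened, so the open should probably happen after the mount namespace has been unshared; whether this fixes the 0-byte reads is untested.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical subset of struct lo_data; the field name follows the thread. */
struct lo_data_sketch {
    int proc_self_mountinfo;
};

/*
 * Step 1 of a hypothetical sandbox setup: open mountinfo while it is still
 * reachable by path.  proc_root is a directory fd for /proc.
 */
int open_mountinfo_early(struct lo_data_sketch *lo, int proc_root)
{
    lo->proc_self_mountinfo = openat(proc_root, "self/mountinfo",
                                     O_RDONLY | O_CLOEXEC);
    return lo->proc_self_mountinfo < 0 ? -1 : 0;
}

/*
 * Step 2 (not shown): bind-mount /proc/self/fd over /proc.  Afterwards,
 * "self/mountinfo" is no longer reachable by path, but the fd from step 1
 * remains valid and can be re-read with pread(fd, buf, len, 0).
 */
```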

Vivek




* Re: [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table
  2021-08-10 14:13           ` [Virtio-fs] " Hanna Reitz
@ 2021-08-10 17:51             ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-10 17:51 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert,
	Max Reitz

On Tue, Aug 10, 2021 at 04:13:44PM +0200, Hanna Reitz wrote:
> On 10.08.21 16:07, Vivek Goyal wrote:
> > On Mon, Aug 09, 2021 at 06:47:18PM +0200, Hanna Reitz wrote:
> > > On 09.08.21 18:10, Vivek Goyal wrote:
> > > > On Fri, Jul 30, 2021 at 05:01:32PM +0200, Max Reitz wrote:
> > > > > Currently, lo_inode.fhandle is always NULL and so we always keep an O_PATH
> > > > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > > > its inode ID will remain in use until we drop our lo_inode (and
> > > > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > > > file.
> > > > > 
> > > > > This will change when we start setting lo_inode.fhandle so we do not
> > > > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > > > immediately remove it, so its ID can then be reused by newly created
> > > > > files, even while the lo_inode object is still there[1].
> > > > > 
> > > > > So creating a new file can then reuse the old file's inode ID, and
> > > > > looking up the new file would lead to us finding the old file's
> > > > > lo_inode, which is not ideal.
> > > > > 
> > > > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > > > file handle contains a generation ID, which changes when an inode ID is
> > > > > reused, so the new file can be distinguished from the old one.  So all
> > > > > we need to do is to add a second map besides lo_data.inodes that maps
> > > > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > > > 
> > > > > Unfortunately, we cannot rely on being able to generate file handles
> > > > > every time.  Therefore, we still enter every lo_inode object into
> > > > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > > > entry is just a fallback.
> > > > > 
> > > > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > > > skip that map.  We might manually create file handles with some code
> > > > > that is immediately removed by the next patch again, but that would
> > > > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > > > leave actually using the inodes_by_handle map for the next patch.
> > > > > 
> > > > > [1] If some application in the guest still has the file open, there is
> > > > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > > > case, the inode will only go away once every application in the guest
> > > > > has closed it.  The problem described only applies to cases where the
> > > > > guest does not have the file open, and it is just in the dentry cache,
> > > > > basically.
> > > > > 
> > > > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > > > ---
> > > > >    tools/virtiofsd/passthrough_ll.c | 81 +++++++++++++++++++++++++-------
> > > > >    1 file changed, 65 insertions(+), 16 deletions(-)
> > > > > 
> > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > > index 487448d666..f9d8b2f134 100644
> > > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > > @@ -180,7 +180,8 @@ struct lo_data {
> > > > >        int announce_submounts;
> > > > >        bool use_statx;
> > > > >        struct lo_inode root;
> > > > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > > > >        struct lo_map ino_map; /* protected by lo->mutex */
> > > > >        struct lo_map dirp_map; /* protected by lo->mutex */
> > > > >        struct lo_map fd_map; /* protected by lo->mutex */
> > > > > @@ -263,8 +264,9 @@ static struct {
> > > > >    /* That we loaded cap-ng in the current thread from the saved */
> > > > >    static __thread bool cap_loaded = 0;
> > > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > > > -                                uint64_t mnt_id);
> > > > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > > > +                                const struct lo_fhandle *fhandle,
> > > > > +                                struct stat *st, uint64_t mnt_id);
> > > > >    static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > > > >                                char **out_name);
> > > > > @@ -1064,18 +1066,40 @@ out_err:
> > > > >        fuse_reply_err(req, saverr);
> > > > >    }
> > > > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > > > -                                uint64_t mnt_id)
> > > > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > > > +                                const struct lo_fhandle *fhandle,
> > > > > +                                struct stat *st, uint64_t mnt_id)
> > > > >    {
> > > > > -    struct lo_inode *p;
> > > > > -    struct lo_key key = {
> > > > > +    struct lo_inode *p = NULL;
> > > > > +    struct lo_key ids_key = {
> > > > >            .ino = st->st_ino,
> > > > >            .dev = st->st_dev,
> > > > >            .mnt_id = mnt_id,
> > > > >        };
> > > > >        pthread_mutex_lock(&lo->mutex);
> > > > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > > > +    if (fhandle) {
> > > > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > > > +    }
> > > > > +    if (!p) {
> > > > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > > > So even if fhandle is not NULL, we will still look up the inode
> > > > object in lo->inodes_by_ids? I thought the fallback was only required
> > > > if we could not generate a file handle to begin with, and in that case
> > > > fhandle would be NULL?
> > > Well.  I think it depends again on when file handle generation can fail and
> > > when it cannot.  If we assume it can randomly fail at any time, then it’s
> > > possible we create an lo_inode with an O_PATH fd, but later we are able to
> > > generate a file handle for it.  So we first try a lookup by file handle
> > > here, which would fail, but we’d still have to try a lookup by IDs, so we
> > > can find the O_PATH lo_inode.
> > > 
> > > An example case would be if at first we weren’t able to open a mount fd
> > > (because this file is a device node and the first lo_inode looked up on its
> > > filesystem), and so we couldn’t generate a file handle that we would be sure
> > > would work; but later for the lookup we can generate a file handle (because
> > > some other node on that filesystem has been opened by then, so we have a
> > > mount fd).
> > Ok, got it. If we are assuming that file handle generation can fail
> > randomly, then what will happen in the following scenario?
> > 
> > - lookup, file handle generated, inode added to both hash tables.
> > 
> > - another lookup, handle generation failed. We call lo_find(), it
> >    finds inode in lo->inodes_by_ids but rejects it because p->fd == -1.
> > 
> > - Now lo_find() will return NULL and caller will assume inode could
> >    not be found (despite the fact it is in there) and caller lo_do_lookup()
> >    will try to add new inode to hash tables. So we will have two inode
> >    instances in hash table with same st_dev, st_ino, mnt_id. One will
> >    have file handle while other will have O_PATH fd.
> > 
> > So we have two inodes in cache representing same file. One using file
> > handle while other using O_PATH fd.
> > 
> > One side effect of this: say the guest has looked up a file (and got
> > node id 1, an fhandle-based inode). Later, when the guest revalidates
> > that inode, it could get inode 2 (O_PATH fd) this time. The guest will
> > think the inode has changed, discard the previous inode, and trigger
> > another lookup. Normally this happens only if the file has gone away,
> > but now it will happen because we have two inodes in the cache
> > representing the same file.
> > 
> > There might be other cases where this is bad. I can't think of any
> > at this point of time.
> > 
> > If we could solve the mount_fd issue, then we would probably need the
> > fallback path only for the EOPNOTSUPP case. And then we could be sure
> > that the cache always has exactly one inode per file, either fhandle-based
> > or O_PATH-based (and not both).
> 
> OK, but can we truly solve the mount_fd issue?
> 
> What I think we could do is have two variants of the file handle generation
> function, one which is supposed to create a usable file handle (so this
> version will ensure mount_fds contains a valid fd for the mount ID), and one
> that just generates a file handle for lookup (i.e. it doesn’t look into
> mount_fds at all).  The latter version would practically only fail in the
> EOPNOTSUPP case.
> 
> Would that get around the issue?

IIUC, the suggestion is that lo_do_lookup() will use the first variant
and lookup_rename() will use the second variant. If so, that does not
solve the issue of having two inodes representing the same file:
lo_do_lookup() might succeed the first time and add an inode with an fhandle,
then fail the next time and add a new inode with an O_PATH fd.
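For reference, the lookup-only variant proposed above could be little more than a bare name_to_handle_at() call. This is a minimal sketch and does not come from the patch series. The names (struct lookup_fhandle, lookup_fhandle_generate(), lookup_fhandle_equal()) are hypothetical, and the code relies on glibc's struct file_handle and MAX_HANDLE_SZ from <fcntl.h>. Because it never opens a mount fd, it cannot fail for mount_fd-related reasons. It can still fail on ordinary path errors such as ENOENT, and with EOPNOTSUPP on filesystems that lack file handle support. name_to_handle_at() itself needs no CAP_DAC_READ_SEARCH; only open_by_handle_at() does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup key: a file handle plus the mount ID it belongs to. */
struct lookup_fhandle {
    int mount_id;
    struct file_handle *handle; /* heap-allocated, room for MAX_HANDLE_SZ bytes */
};

/*
 * Lookup-only variant: calls name_to_handle_at() and nothing else.  It never
 * opens or caches a mount fd.  Returns 0 on success, or -1 with errno set
 * (e.g. EOPNOTSUPP if the filesystem does not support file handles).
 */
static int lookup_fhandle_generate(int dirfd, const char *name,
                                   struct lookup_fhandle *out)
{
    struct file_handle *fh;

    assert(out != NULL);
    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh) {
        return -1; /* malloc() sets errno to ENOMEM */
    }
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* flags = 0: a trailing symlink is not followed */
    if (name_to_handle_at(dirfd, name, fh, &out->mount_id, 0) < 0) {
        int saved_errno = errno;
        free(fh);
        errno = saved_errno;
        return -1;
    }
    out->handle = fh;
    return 0;
}

/* Two keys are equal if mount ID, handle type, and handle bytes all match. */
static bool lookup_fhandle_equal(const struct lookup_fhandle *a,
                                 const struct lookup_fhandle *b)
{
    return a->mount_id == b->mount_id &&
           a->handle->handle_type == b->handle->handle_type &&
           a->handle->handle_bytes == b->handle->handle_bytes &&
           memcmp(a->handle->f_handle, b->handle->f_handle,
                  a->handle->handle_bytes) == 0;
}

static void lookup_fhandle_free(struct lookup_fhandle *fh)
{
    free(fh->handle);
    fh->handle = NULL;
}
```

lookup_fhandle_equal() compares the key the way a handle-keyed table would need to: mount ID, handle type, and the opaque handle bytes.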

Maybe this will not happen easily, because the first operation will add
the mount_fd and the second operation will then find the existing mount_fd,
so it will not fail, at least not because of mount_fd. It might still fail
due to some other temporary resource failure, etc.
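The duplicate-inode scenario discussed in this thread can be reproduced in a small toy model. This is a simplification and not virtiofsd code. Plain arrays stand in for the inodes_by_handle and inodes_by_ids GHashTables, and integers stand in for lo_fhandle. The rejection rule (an entry found only by IDs is skipped when fd == -1) follows the description quoted above rather than the exact patch. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for lo_inode: either handle-based (fd == -1) or O_PATH-based. */
struct toy_inode {
    bool has_handle;
    uint64_t handle;           /* stands in for the lo_fhandle contents */
    uint64_t dev, ino, mnt_id; /* stands in for lo_key */
    int fd;                    /* fake O_PATH fd, or -1 */
};

#define TOY_MAX_INODES 16
static struct toy_inode toy_cache[TOY_MAX_INODES];
static size_t toy_count;

/* Mirrors the lookup order in lo_find(): file handle first, then IDs. */
static struct toy_inode *toy_find(const uint64_t *handle, uint64_t dev,
                                  uint64_t ino, uint64_t mnt_id)
{
    if (handle) {
        for (size_t i = 0; i < toy_count; i++) {
            if (toy_cache[i].has_handle && toy_cache[i].handle == *handle) {
                return &toy_cache[i];
            }
        }
    }
    for (size_t i = 0; i < toy_count; i++) {
        struct toy_inode *p = &toy_cache[i];
        /*
         * An entry found by IDs is only trusted if it pins the inode with an
         * O_PATH fd; a handle-based entry (fd == -1) may refer to a deleted
         * file whose inode number was reused, so it is rejected.
         */
        if (p->dev == dev && p->ino == ino && p->mnt_id == mnt_id && p->fd >= 0) {
            return p;
        }
    }
    return NULL;
}

/* Mirrors lo_do_lookup(): reuse a cached inode or create a new one. */
static struct toy_inode *toy_lookup(const uint64_t *handle, uint64_t dev,
                                    uint64_t ino, uint64_t mnt_id)
{
    struct toy_inode *p = toy_find(handle, dev, ino, mnt_id);
    if (p) {
        return p;
    }
    assert(toy_count < TOY_MAX_INODES);
    p = &toy_cache[toy_count++];
    *p = (struct toy_inode){
        .has_handle = handle != NULL,
        .handle = handle ? *handle : 0,
        .dev = dev, .ino = ino, .mnt_id = mnt_id,
        .fd = handle ? -1 : 100 + (int)toy_count, /* fake O_PATH fd */
    };
    return p;
}
```

First look up a (dev, ino, mnt_id) with a handle, then look it up again while handle generation fails, i.e. toy_lookup(&fh, 1, 42, 7) followed by toy_lookup(NULL, 1, 42, 7). The two calls return different entries and leave toy_count at 2: one entry is handle-based and the other is O_PATH-based. That is the double caching described above.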

Vivek




* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-10 15:57             ` [Virtio-fs] " Vivek Goyal
@ 2021-08-11  6:41               ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-11  6:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On 10.08.21 17:57, Vivek Goyal wrote:
> On Tue, Aug 10, 2021 at 05:26:15PM +0200, Hanna Reitz wrote:
>> On 10.08.21 17:23, Vivek Goyal wrote:
>>> On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
>>>> On 09.08.21 20:41, Vivek Goyal wrote:
>>>>> On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
>>>>>> When the inode_file_handles option is set, try to generate a file handle
>>>>>> for new inodes instead of opening an O_PATH FD.
>>>>>>
>>>>>> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
>>>>>> description text tells the user they will also need to specify
>>>>>> -o modcaps=+dac_read_search.
>>>>>>
>>>>>> Generating a file handle returns the mount ID it is valid for.  Opening
>>>>>> it will require an FD instead.  We have mount_fds to map an ID to an FD.
>>>>>> get_file_handle() fills the hash map by opening the file we have
>>>>>> generated a handle for.  To verify that the resulting FD indeed
>>>>>> represents the handle's mount ID, we use statx().  Therefore, using file
>>>>>> handles requires statx() support.
>>>>> So opening the file and storing that fd in the mount_fds table might be
>>>>> a problem for the inotify work Ioannis is doing.
>>>>>
>>>>> Say a file foo.txt was opened O_RDONLY and its fd stored in mount_fds. Now
>>>>> say the user unlinks foo.txt. If notifications are enabled, the final
>>>>> notification will not be generated until this mount_fds fd is closed.
>>>>>
>>>>> Now the question is when this fd will be closed. If it is closed only at
>>>>> some later point and the notification is generated then, that will break
>>>>> notifications.
>>>> Currently, it is never closed.
>>>>
>>>>> In fact, even the O_PATH fd delays notifications for the same reason.
>>>>> But it's not too bad, as we close the O_PATH fd pretty quickly after
>>>>> unlinking. And we were hoping that file handle support would get rid
>>>>> of this problem because we would not keep an O_PATH fd open.
>>>>>
>>>>> But, IIUC, the mount_fds stuff will make it even worse. I did not see
>>>>> the code which removes this fd from mount_fds, so I am not sure what
>>>>> the lifetime of this fd is.
>>>> The lifetime is forever.  If we wanted to remove it at some point, we’d need
>>>> to track how many file handles we have open for the given mount fd and then
>>>> remove it from the table once the count reaches 0, so it would still be
>>>> delayed.
>>>>
>>>> I think in practice the first thing that is looked up from some mount will
>>>> probably be the root directory, which cannot be deleted before everything
>>>> else on the mount is gone, so that would work.  If we track how many handles
>>>> there are, then if the whole mount were to be deleted, I hope all lo_inodes
>>>> are evicted, the count goes to 0, and we can drop the mount fd.
>>> Keeping a reference count on the mount_fd object makes sense. So we probably
>>> maintain this hash table and look up using mount_id (as you are already
>>> doing). All subsequent inodes from the same filesystem will use the same
>>> object. Once all inodes have been flushed out, the mount_fd object
>>> should go away as well (allowing for unmount on the host).
>>>
>>>> I think we can make certain that the mount fd is the root directory by,
>>>> well, looking into mountinfo...  That would result in us always opening
>>>> the root node of the filesystem, so that the whole filesystem first needs
>>>> to disappear before the root can be deleted (and our mount fd closed),
>>>> which should work, I guess?
>>> This seems more reasonable. And I think that's what the man page seems to
>>> suggest:
>>>
>>>          The  mount_id  argument  returns an identifier for the filesystem mount
>>>          that corresponds to pathname.  This corresponds to the first  field  in
>>>          one  of  the  records in /proc/self/mountinfo.  Opening the pathname in
>>>          the fifth field of that record yields a file descriptor for  the  mount
>>>          point;  that  file  descriptor  can  be  used  in  a subsequent call to
>>>          open_by_handle_at().
>>>
>>> The fifth field seems to be the mount point. man proc says:
>>>
>>>                 (5)  mount  point:  the  pathname of the mount point relative to
>>>                      the process's root directory.
>>>
>>> So opening the mount point and saving it as mount_fd (if it is not already
>>> in the hash table), and then taking a per-inode reference count on the
>>> mount_fd object, looks like it will solve the lifetime issue of mount_fd as
>>> well as the issue of temporary failures arising because we can't open a
>>> device special file.
>> Well, we’ve had this discussion before, and it’s possible that a filesystem
>> has a device file as its mount point.
> Yes. I think you modified fuse to do some special trickery. Not sure
> where that should be fixed.

I used fuse, but I’m sure a non-fuse filesystem can do the same.  (I 
mean, fuse effectively is a non-fuse filesystem, too.)

I don’t think it needs to be fixed; it just means we need to continue to
stat the mount point to verify it’s a regular file or directory.
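One way to stat the mount point without running a device node's open handler is to open it with O_PATH and check its type with fstat(). The helper below is a minimal sketch with a hypothetical name (open_mount_point_checked()), and EINVAL is an illustrative choice of error code. The result is an O_PATH fd. As far as we understand, open_by_handle_at() does not accept an O_PATH descriptor as mount_fd, so a caller would still have to reopen this fd for real after the check, for instance through /proc/self/fd (which lo->proc_self_fd points to). Reopening the already-checked fd refers to the same inode, so it does not reintroduce a race. That step is left out here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a candidate mount point with O_PATH (which does
 * not actually open the file, so no device-specific open handler runs) and
 * accept it only if it is a regular file or a directory.
 * Returns the O_PATH fd on success, or -1 with errno set.
 */
static int open_mount_point_checked(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) < 0) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    /* Reject device nodes, FIFOs, sockets, and symlinks. */
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        close(fd);
        errno = EINVAL; /* illustrative choice of error code */
        return -1;
    }
    return fd;
}
```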

> If the filesystem is faking, then it can fake a device node as a regular
> file and fool us into opening it as well?

Well, of course opening any file can have side effects, on any filesystem.

>> But given the inotify complications, there’s really a good reason we should
>> use mountinfo.
>>
>>>> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
>>>> but if that’s the only way...
>>> Yes. We already have lo->proc_self_fd. Maybe we need to keep
>>> /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
>>> that any mount table changes will still be visible even though
>>> I keep the fd open (and don't have to open a new fd to notice new
>>> mount/unmount changes).
>> Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
>> when I tried keeping the fd open, reading from it would just return 0
>> bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
>> nothing else in /proc is visible. Perhaps we need to bind-mount
>> /proc/self/mountinfo into /proc/self/fd before that...
> Or perhaps open /proc/self/mountinfo and save the fd in lo->proc_mountinfo
> before /proc/self/fd is bind-mounted on /proc?

Yes, I tried that, and then reading would just return 0 bytes.
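Once mountinfo is readable, mapping the mount ID returned by name_to_handle_at() to a mount point means matching field 1 and taking field 5, as in the name_to_handle_at(2) and proc(5) excerpts quoted above. The following is a minimal sketch. It assumes the mountinfo text has already been read into memory, and its function names are hypothetical. The kernel escapes spaces, tabs, newlines, and backslashes in path fields as octal sequences such as \040, so the sketch decodes those. Per proc(5), the resulting path is relative to the process's root directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode \ooo octal escapes (e.g. \040 for a space) used in mountinfo paths. */
static bool mountinfo_unescape(const char *src, char *dst, size_t dst_len)
{
    size_t j = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\\' &&
            src[i + 1] >= '0' && src[i + 1] <= '7' &&
            src[i + 2] >= '0' && src[i + 2] <= '7' &&
            src[i + 3] >= '0' && src[i + 3] <= '7') {
            c = (char)(((src[i + 1] - '0') << 6) | ((src[i + 2] - '0') << 3) |
                       (src[i + 3] - '0'));
            i += 3;
        }
        if (j + 1 >= dst_len) {
            return false; /* no room for this byte plus the terminating NUL */
        }
        dst[j++] = c;
    }
    if (dst_len == 0) {
        return false;
    }
    dst[j] = '\0';
    return true;
}

/*
 * Scan mountinfo text for the record whose mount ID (field 1) is mnt_id and
 * copy its decoded mount point (field 5) into out.  Returns false if absent.
 */
static bool mountinfo_mount_point(const char *mountinfo, int mnt_id,
                                  char *out, size_t out_len)
{
    const char *line = mountinfo;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char rec[4096];
        char mount_point[4096];
        int id;

        if (len < sizeof(rec)) {
            memcpy(rec, line, len);
            rec[len] = '\0';
            /* fields: ID, parent ID, major:minor, root, mount point, ... */
            if (sscanf(rec, "%d %*d %*s %*s %4095s", &id, mount_point) == 2 &&
                id == mnt_id) {
                return mountinfo_unescape(mount_point, out, out_len);
            }
        }
        line = nl ? nl + 1 : NULL;
    }
    return false;
}
```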

Hanna




* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-11  6:41               ` [Virtio-fs] " Hanna Reitz
@ 2021-08-16 19:44                 ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-16 19:44 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:

[..]
> > > But given the inotify complications, there’s really a good reason we should
> > > use mountinfo.
> > > 
> > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > but if that’s the only way...
> > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > that any mount table changes will still be visible despite the fact
> > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > changes).
> > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
> > > when I tried keeping the fd open, reading from it would just return 0
> > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > /proc/self/mountinfo into /proc/self/fd before that...
> > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > before /proc/self/fd is bind mounted on /proc?
> 
> Yes, I tried that, and then reading would just return 0 bytes.

Hi Hanna,

I tried this simple patch, and I can read /proc/self/mountinfo both
before and after bind-mounting /proc/self/fd. Am I missing something?

Vivek

---
 tools/virtiofsd/passthrough_ll.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:41:29.500032032 -0400
@@ -172,6 +172,7 @@ struct lo_data {
 
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
+    int proc_mountinfo;
     int user_killpriv_v2, killpriv_v2;
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
@@ -3409,6 +3410,9 @@ static void setup_wait_parent_capabiliti
 static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
     pid_t child;
+    int fd;
+    char buf[128];
+    ssize_t count;
 
     /*
      * Create a new pid namespace for *child* processes.  We'll have to
@@ -3472,6 +3476,24 @@ static void setup_namespaces(struct lo_d
         exit(1);
     }
 
+    fd = open("/proc/self/mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+
+    count = read(lo->proc_mountinfo, buf, 127);
+    if (count == -1) {
+        fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", count);
+    buf[count] = '\0';
+    fuse_log(FUSE_LOG_INFO, "%s\n", buf);
+
     /*
      * We only need /proc/self/fd. Prevent ".." from accessing parent
      * directories of /proc/self/fd by bind-mounting it over /proc. Since / was
@@ -3489,6 +3511,16 @@ static void setup_namespaces(struct lo_d
         fuse_log(FUSE_LOG_ERR, "open(/proc, O_PATH): %m\n");
         exit(1);
     }
+
+    count = read(lo->proc_mountinfo, buf, 127);
+    if (count == -1) {
+        fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", count);
+    buf[count] = '\0';
+    fuse_log(FUSE_LOG_INFO, "%s\n", buf);
 }
 
 /*

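In the test patch above, the second read() uses the same open file description as the first, so it continues where the first one stopped rather than starting over. If a descriptor such as lo->proc_mountinfo is kept open for later lookups, each lookup would need to rewind it and read to EOF. Note that lo->proc_mountinfo is the field name from the test patch, not an existing virtiofsd member. Reading to EOF matters because procfs files report st_size == 0. Below is a minimal sketch of such a helper with a hypothetical name (read_whole_fd()), assuming plain lseek()/read() on the kept fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read everything behind fd, starting from offset 0, into a NUL-terminated
 * heap buffer (the caller frees it).  Reads until EOF because procfs files
 * such as mountinfo report a size of 0.
 */
static char *read_whole_fd(int fd, size_t *len_out)
{
    size_t cap = 4096;
    size_t len = 0;
    char *buf;

    assert(fd >= 0);
    /* Rewind: a long-lived fd otherwise continues where the last read stopped. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        return NULL;
    }
    buf = malloc(cap);
    if (!buf) {
        return NULL;
    }
    for (;;) {
        ssize_t n;

        if (cap - len < 2) {
            /* Grow so there is room for at least one byte plus the NUL. */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            int saved_errno = errno;
            if (saved_errno == EINTR) {
                continue;
            }
            free(buf);
            errno = saved_errno;
            return NULL;
        }
        if (n == 0) {
            break; /* EOF */
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    if (len_out) {
        *len_out = len;
    }
    return buf;
}
```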



* Re: [Virtio-fs] [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
@ 2021-08-16 19:44                 ` Vivek Goyal
  0 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-16 19:44 UTC (permalink / raw)
  To: Hanna Reitz; +Cc: qemu-devel, virtio-fs, Max Reitz

On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:

[..]
> > > But given the inotify complications, there’s really a good reason we should
> > > use mountinfo.
> > > 
> > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > but if that’s the only way...
> > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > that any mount table changes will still be visible despite the fact
> > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > changes).
> > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
> > > when I tried keeping the fd open, reading from it would just return 0
> > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > /proc/self/mountinfo into /proc/self/fd before that...
> > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > before /proc/self/fd is bind mounted on /proc?
> 
> Yes, I tried that, and then reading would just return 0 bytes.

Hi Hanna,

I tried this simple patch and I can read /proc/self/mountinfo before
bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
I missing something.

Vivek

---
 tools/virtiofsd/passthrough_ll.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:41:29.500032032 -0400
@@ -172,6 +172,7 @@ struct lo_data {
 
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
+    int proc_mountinfo;
     int user_killpriv_v2, killpriv_v2;
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
@@ -3409,6 +3410,9 @@ static void setup_wait_parent_capabiliti
 static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
     pid_t child;
+    int fd;
+    char buf[128];
+    ssize_t count;
 
     /*
      * Create a new pid namespace for *child* processes.  We'll have to
@@ -3472,6 +3476,24 @@ static void setup_namespaces(struct lo_d
         exit(1);
     }
 
+    fd = open("/proc/self/mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+
+    count = read(lo->proc_mountinfo, buf, 127);
+    if (count == -1) {
+        fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", count);
+    buf[count] = '\0';
+    fuse_log(FUSE_LOG_INFO, "%s\n", buf);
+
     /*
      * We only need /proc/self/fd. Prevent ".." from accessing parent
      * directories of /proc/self/fd by bind-mounting it over /proc. Since / was
@@ -3489,6 +3511,16 @@ static void setup_namespaces(struct lo_d
         fuse_log(FUSE_LOG_ERR, "open(/proc, O_PATH): %m\n");
         exit(1);
     }
+
+    count = read(lo->proc_mountinfo, buf, 127);
+    if (count == -1) {
+        fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", count);
+    buf[count] = '\0';
+    fuse_log(FUSE_LOG_INFO, "%s\n", buf);
 }
 
 /*



* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-16 19:44                 ` [Virtio-fs] " Vivek Goyal
@ 2021-08-17  8:27                   ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-17  8:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On 16.08.21 21:44, Vivek Goyal wrote:
> On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
>
> [..]
>>>> But given the inotify complications, there’s really a good reason we should
>>>> use mountinfo.
>>>>
>>>>>> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
>>>>>> but if that’s the only way...
>>>>> yes. We already have lo->proc_self_fd. Maybe we need to keep
>>>>> /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
>>>>> that any mount table changes will still be visible despite the fact
>>>>> I have fd open (and don't have to open new fd to notice new mount/unmount
>>>>> changes).
>>>> Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
>>>> when I tried keeping the fd open, reading from it would just return 0
>>>> bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
>>>> nothing else in /proc is visible. Perhaps we need to bind-mount
>>>> /proc/self/mountinfo into /proc/self/fd before that...
>>> Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
>>> before /proc/self/fd is bind mounted on /proc?
>> Yes, I tried that, and then reading would just return 0 bytes.
> Hi Hanna,
>
> I tried this simple patch and I can read /proc/self/mountinfo before
> bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> I missing something?

Yes, but I tried reading it in the main loop (where we’d actually need 
it).  It looks like the umount2(".", MNT_DETACH) in setup_mounts() 
breaks it.
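
A main-loop reader along these lines might look like the minimal sketch below, assuming the fd was opened on /proc/self/mountinfo; read_all_from_start() and wait_for_mount_change() are hypothetical helpers, not virtiofsd code. It rewinds and reads to EOF on every refresh, and uses poll() to learn when the mount table changed (proc(5) documents that mounts and unmounts are reported as POLLPRI on /proc/[pid]/mounts; the assumption here is that mountinfo behaves the same). It does not get around the umount2() problem: on an fd opened before the old root was detached, the read would still return 0 bytes. One plausible cause, an assumption not verified in this thread, is that the kernel records the reader's root directory at open() time and only lists mounts reachable from it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: rewind a long-lived fd and read it to EOF into a
 * malloc()ed, NUL-terminated buffer. Returns NULL on error.
 */
static char *read_all_from_start(int fd, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf, *tmp;
    ssize_t n;

    assert(len_out != NULL);
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return NULL;
    }
    buf = malloc(cap);
    if (!buf) {
        return NULL;
    }
    for (;;) {
        if (cap - len < 2) {            /* keep room for data plus '\0' */
            tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return NULL;
        }
        if (n == 0) {
            break;                      /* EOF */
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/*
 * Hypothetical helper: wait up to timeout_ms for a mount table change,
 * which the kernel flags on a mountinfo fd as POLLPRI | POLLERR.
 * Returns 1 on change, 0 on timeout, -1 on error.
 */
static int wait_for_mount_change(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);
    if (ret < 0 || (pfd.revents & POLLNVAL)) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}
```

In a loop, one would call wait_for_mount_change() and, whenever it returns 1, call read_all_from_start() again; the lseek() plus read() approach is the same one the debug patch in this thread uses in read_mountinfo().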

Hanna



* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-17  8:27                   ` [Virtio-fs] " Hanna Reitz
@ 2021-08-17 19:45                     ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-17 19:45 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> On 16.08.21 21:44, Vivek Goyal wrote:
> > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> > 
> > [..]
> > > > > But given the inotify complications, there’s really a good reason we should
> > > > > use mountinfo.
> > > > > 
> > > > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > > > but if that’s the only way...
> > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > > > that any mount table changes will still be visible despite the fact
> > > > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > > > changes).
> > > > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
> > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > > > before /proc/self/fd is bind mounted on /proc?
> > > Yes, I tried that, and then reading would just return 0 bytes.
> > Hi Hanna,
> > 
> > I tried this simple patch and I can read /proc/self/mountinfo before
> > bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> > I missing something?
> 
> Yes, but I tried reading it in the main loop (where we’d actually need it). 
> It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.

Good point. I modified my code and I notice too that after umount2() it
always reads 0 bytes. I can understand that all the other mount points
could go away, but the new rootfs mount point of virtiofsd should still
be visible, IIUC. I don't understand why it isn't.

Anyway, I tried re-opening the /proc/self/mountinfo file after
umount2(".", MNT_DETACH), and that seems to work: it shows the root
mount point. I also created a bind mount, and it shows up too.
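
To see which mount points a freshly re-opened mountinfo fd actually lists (such as the root mount or the bind mount mentioned above), each line has to be split into its fields. A minimal sketch, assuming the line format documented in proc(5), where field 1 is the mount ID and field 5 the mount point; parse_mountinfo_line() is a hypothetical helper and not part of virtiofsd:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the mount ID (field 1) and the mount point
 * (field 5) from one line of /proc/self/mountinfo. The kernel octal-escapes
 * spaces, tabs, newlines and backslashes in paths (e.g. "\040"), so the
 * mount point never contains raw whitespace; this sketch does not decode
 * those escapes. Returns 0 on success, -1 if the line is malformed.
 */
static int parse_mountinfo_line(const char *line, int *mnt_id,
                                char *mnt_point, size_t mnt_point_size)
{
    char fmt[64];
    int parent_id;

    assert(line != NULL && mnt_id != NULL && mnt_point != NULL);
    if (mnt_point_size < 2) {
        return -1;
    }
    /* Builds e.g. "%d %d %*s %*s %63s" so sscanf cannot overflow mnt_point */
    snprintf(fmt, sizeof(fmt), "%%d %%d %%*s %%*s %%%zus",
             mnt_point_size - 1);
    /* Three assignments expected: mount ID, parent ID, mount point */
    if (sscanf(line, fmt, mnt_id, &parent_id, mnt_point) != 3) {
        return -1;
    }
    return 0;
}
```

Applied to the example line from proc(5), "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue", it yields mount ID 36 and mount point /mnt2. A mount point longer than the buffer is silently truncated, which is acceptable for a debug dump but not for matching paths.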

So it looks like a quick fix could be to re-open /proc/self/mountinfo.
But that means we can't bind-mount /proc/self/fd on /proc (mountinfo
would then no longer be reachable by path). We could bind-mount
/proc/self on /proc instead, but I'm not sure that is safe enough.

Here is the debug patch I tried.


---
 tools/virtiofsd/passthrough_ll.c |  101 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c	2021-08-17 15:40:20.456811218 -0400
@@ -172,6 +172,8 @@ struct lo_data {
 
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
+    int proc_mountinfo;
+    int proc_self;
     int user_killpriv_v2, killpriv_v2;
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
@@ -3403,12 +3405,56 @@ static void setup_wait_parent_capabiliti
     capng_apply(CAPNG_SELECT_BOTH);
 }
 
+static void read_mountinfo(struct lo_data *lo)
+{
+    char buf[4096];
+    ssize_t count, total_read = 0;
+    int ret;
+
+    ret = lseek(lo->proc_mountinfo, 0, SEEK_SET);
+    if (ret == -1) {
+        fuse_log(FUSE_LOG_ERR, "lseek(): %m\n");
+        exit(1);
+    }
+
+    do {
+        count = read(lo->proc_mountinfo, buf, 4095);
+        if (count == -1) {
+            fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+            exit(1);
+        }
+
+        //fuse_log(FUSE_LOG_INFO, "read(%d) bytes\n", count);
+        buf[count] = '\0';
+        fuse_log(FUSE_LOG_INFO, "%s", buf);
+        total_read += count;
+    } while (count);
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", total_read);
+}
+
+static void reopen_mountinfo(struct lo_data *lo)
+{
+    int fd;
+
+    close(lo->proc_mountinfo);
+
+    fd = openat(lo->proc_self, "mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+}
+
 /*
  * Move to a new mount, net, and pid namespaces to isolate this process.
  */
 static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
     pid_t child;
+    int fd;
 
     /*
      * Create a new pid namespace for *child* processes.  We'll have to
@@ -3472,21 +3518,35 @@ static void setup_namespaces(struct lo_d
         exit(1);
     }
 
+    fd = open("/proc/self/mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+
     /*
      * We only need /proc/self/fd. Prevent ".." from accessing parent
      * directories of /proc/self/fd by bind-mounting it over /proc. Since / was
      * previously remounted with MS_REC | MS_SLAVE this mount change only
      * affects our process.
      */
-    if (mount("/proc/self/fd", "/proc", NULL, MS_BIND, NULL) < 0) {
+    if (mount("/proc/self/", "/proc", NULL, MS_BIND, NULL) < 0) {
         fuse_log(FUSE_LOG_ERR, "mount(/proc/self/fd, MS_BIND): %m\n");
         exit(1);
     }
 
     /* Get the /proc (actually /proc/self/fd, see above) file descriptor */
-    lo->proc_self_fd = open("/proc", O_PATH);
+    lo->proc_self_fd = open("/proc/fd", O_PATH);
     if (lo->proc_self_fd == -1) {
-        fuse_log(FUSE_LOG_ERR, "open(/proc, O_PATH): %m\n");
+        fuse_log(FUSE_LOG_ERR, "open(/proc/fd, O_PATH): %m\n");
+        exit(1);
+    }
+
+    lo->proc_self = open("/proc/", O_PATH);
+    if (lo->proc_self == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self, O_PATH): %m\n");
         exit(1);
     }
 }
@@ -3524,7 +3584,7 @@ static void cleanup_capng(void)
  * Make the source directory our root so symlinks cannot escape and no other
  * files are accessible.  Assumes unshare(CLONE_NEWNS) was already called.
  */
-static void setup_mounts(const char *source)
+static void setup_mounts(const char *source, struct lo_data *lo)
 {
     int oldroot;
     int newroot;
@@ -3552,26 +3612,43 @@ static void setup_mounts(const char *sou
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo before pivot_root()\n");
+    read_mountinfo(lo);
+
     if (syscall(__NR_pivot_root, ".", ".") < 0) {
         fuse_log(FUSE_LOG_ERR, "pivot_root(., .): %m\n");
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo after pivot_root()\n");
+    read_mountinfo(lo);
+
     if (fchdir(oldroot) < 0) {
         fuse_log(FUSE_LOG_ERR, "fchdir(oldroot): %m\n");
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo after fchdir()\n");
+    read_mountinfo(lo);
+
     if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
         fuse_log(FUSE_LOG_ERR, "mount(., MS_SLAVE | MS_REC): %m\n");
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo before umount2(., MNT_DETACH)\n");
+    reopen_mountinfo(lo);
+    read_mountinfo(lo);
+
     if (umount2(".", MNT_DETACH) < 0) {
         fuse_log(FUSE_LOG_ERR, "umount2(., MNT_DETACH): %m\n");
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo after umount2(., MNT_DETACH)\n");
+    reopen_mountinfo(lo);
+    read_mountinfo(lo);
+
     if (fchdir(newroot) < 0) {
         fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
         exit(1);
@@ -3711,6 +3788,19 @@ static void setup_chroot(struct lo_data
     }
 }
 
+static void create_mount(struct lo_data *lo)
+{
+    const char *source = "foo", *dest = "bar";
+
+    if (mount(source, dest, NULL, MS_BIND | MS_REC, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, dest);
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "mountinfo after mounting foo\n");
+    read_mountinfo(lo);
+}
+
 /*
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
@@ -3720,7 +3810,8 @@ static void setup_sandbox(struct lo_data
 {
     if (lo->sandbox == SANDBOX_NAMESPACE) {
         setup_namespaces(lo, se);
-        setup_mounts(lo->source);
+        setup_mounts(lo->source, lo);
+        create_mount(lo);
     } else {
         setup_chroot(lo);
     }



* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-17 19:45                     ` [Virtio-fs] " Vivek Goyal
@ 2021-08-18  0:14                       ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-18  0:14 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Tue, Aug 17, 2021 at 03:45:19PM -0400, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> > On 16.08.21 21:44, Vivek Goyal wrote:
> > > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> > > 
> > > [..]
> > > > > > But given the inotify complications, there’s really a good reason we should
> > > > > > use mountinfo.
> > > > > > 
> > > > > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > > > > but if that’s the only way...
> > > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > > > > that any mount table changes will still be visible despite the fact
> > > > > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > > > > changes).
> > > > > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
> > > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > > > > before /proc/self/fd is bind mounted on /proc?
> > > > Yes, I tried that, and then reading would just return 0 bytes.
> > > Hi Hanna,
> > > 
> > > I tried this simple patch and I can read /proc/self/mountinfo before
> > > bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> > > I missing something?
> > 
> > Yes, but I tried reading it in the main loop (where we’d actually need it). 
> > It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.
> 
> Good point. I modified my code and I notice too that after umount2() it
> always reads 0 bytes. I can understand that all the other mount points
> could go away, but the new rootfs mount point of virtiofsd should still
> be visible, IIUC. I don't understand why it isn't.
> 
> Anyway, I tried re-opening the /proc/self/mountinfo file after
> umount2(".", MNT_DETACH), and that seems to work: it shows the root
> mount point. I also created a bind mount, and it shows up too.
> 
> So it looks like a quick fix could be to re-open /proc/self/mountinfo.
> But that means we can't bind-mount /proc/self/fd on /proc (mountinfo
> would then no longer be reachable by path). We could bind-mount
> /proc/self on /proc instead, but I'm not sure that is safe enough.

Or maybe I can do this:

- Open an O_PATH fd for /proc/self:
  proc_self = open("/proc/self", O_PATH);
- Bind-mount /proc/self/fd on /proc
- Do the pivot_root() and umount2() steps
- openat(proc_self, "mountinfo", O_RDONLY)
- close(proc_self)

If this works, then we don't have the security issue, and we still
manage to open mountinfo after pivot_root() and umount2(). I will give
it a try tomorrow and see if it works.
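
The proposed sequence, written out as a minimal C sketch. Assumptions: grab_proc_self() and open_mountinfo_late() are hypothetical names, the privileged middle steps (bind mount, pivot_root(), umount2()) are only indicated by a comment because they need CAP_SYS_ADMIN in a private mount namespace, and whether openat() through a handle taken before pivot_root() yields a populated mountinfo afterwards is exactly what still has to be tried.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Step 1, while /proc is still the real procfs: keep an O_PATH handle on
 * our own /proc/<pid> directory ("/proc/self" resolves to it). Nothing can
 * be read through an O_PATH fd, but it can serve as a dirfd for openat().
 */
static int grab_proc_self(void)
{
    return open("/proc/self", O_PATH | O_DIRECTORY | O_CLOEXEC);
}

/*
 * Steps 2 and 3 (bind-mounting /proc/self/fd on /proc, then pivot_root()
 * and umount2()) are omitted here.
 *
 * Steps 4 and 5: open mountinfo relative to the saved handle, then close
 * the handle so the sandbox keeps no reference to the rest of /proc/<pid>.
 * Returns the mountinfo fd, or -1 with errno from openat() preserved.
 */
static int open_mountinfo_late(int *proc_self_fd)
{
    int fd, saved_errno;

    assert(proc_self_fd != NULL && *proc_self_fd >= 0);
    fd = openat(*proc_self_fd, "mountinfo", O_RDONLY | O_CLOEXEC);
    saved_errno = errno;
    close(*proc_self_fd);
    *proc_self_fd = -1;
    errno = saved_errno;
    return fd;
}
```

Clearing the caller's copy of the handle after close() is a choice of this sketch, so that a stale descriptor number cannot be reused by accident later.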

Vivek



* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-18  0:14                       ` [Virtio-fs] " Vivek Goyal
@ 2021-08-18 13:32                         ` Vivek Goyal
  -1 siblings, 0 replies; 88+ messages in thread
From: Vivek Goyal @ 2021-08-18 13:32 UTC (permalink / raw)
  To: Hanna Reitz
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On Tue, Aug 17, 2021 at 08:14:46PM -0400, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 03:45:19PM -0400, Vivek Goyal wrote:
> > On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> > > On 16.08.21 21:44, Vivek Goyal wrote:
> > > > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> > > > 
> > > > [..]
> > > > > > > But given the inotify complications, there’s really a good reason we should
> > > > > > > use mountinfo.
> > > > > > > 
> > > > > > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > > > > > but if that’s the only way...
> > > > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > > > > > that any mount table changes will still be visible despite the fact
> > > > > > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > > > > > changes).
> > > > > > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
> > > > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > > > Or perhaps open /proc/self/mountinfo and save the fd in lo->proc_mountinfo
> > > > > > before /proc/self/fd is bind-mounted on /proc?
> > > > > Yes, I tried that, and then reading would just return 0 bytes.
> > > > Hi Hanna,
> > > > 
> > > > I tried this simple patch and I can read /proc/self/mountinfo before
> > > > bind-mounting /proc/self/fd and after bind-mounting /proc/self/fd. Am
> > > > I missing something?
> > > 
> > > Yes, but I tried reading it in the main loop (where we’d actually need it). 
> > > It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.
> > 
> > Good point. I modified my code and noticed too that after umount2() it
> > always reads 0 bytes. I can understand that all the other mount points
> > could go away, but the new rootfs mount point of virtiofsd should still be
> > visible, IIUC. I don't understand why.
> > 
> > Anyway, I tried re-opening the /proc/self/mountinfo file after umount2(".",
> > MNT_DETACH), and that seems to work: it shows the root mount point. I
> > created a bind mount and it shows that too.
> > 
> > So it looks like a quick fix could be to re-open /proc/self/mountinfo. But
> > that means we can't bind-mount /proc/self/fd on /proc/. We could bind-mount
> > /proc/self on /proc. Not sure if that is safe enough.
> 
> Or maybe I can do this:
> 
> - Open an O_PATH fd for /proc/self:
>   proc_self = open("/proc/self", O_PATH);
> - Bind-mount /proc/self/fd on /proc
> - pivot_root() and umount() stuff
> - openat(proc_self, "mountinfo")
> - close(proc_self)
> 
> If this works, then we avoid the security issue and still manage
> to open mountinfo after pivot_root() and umount(). I will give it a
> try tomorrow and see if it works.

Hi Hanna,

This seems to work for me. I think the key is to open mountinfo after
pivot_root(); then it works. If it is opened before pivot_root(),
it does not work. Not sure why.
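For reference, a sketch of how such a long-lived mountinfo fd might be re-read later (for example from the main loop), assuming it was opened as described above: rewind it with lseek(), read the whole table again, and look for a given mount ID, which is the first field of each mountinfo line. Looking up mount IDs is only an assumed use case for illustration, and both helper names are hypothetical; the patch below only logs the contents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: rewind a long-lived mountinfo fd and read the current
 * mount table into a NUL-terminated heap buffer (caller frees).
 * Returns NULL on error.
 */
static char *reread_mountinfo(int fd)
{
    size_t cap = 4096, len = 0;
    char *buf;

    /* After rewinding, the next read() starts over at the first record. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return NULL;
    }
    buf = malloc(cap);
    if (!buf) {
        return NULL;
    }
    for (;;) {
        ssize_t n;

        if (cap - len < 2) {            /* grow, keeping room for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0) {
            break;                      /* EOF */
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: does any mountinfo record start with mnt_id? */
static bool mountinfo_has_mount_id(const char *text, int mnt_id)
{
    assert(text != NULL);
    for (const char *line = text; *line; ) {
        const char *nl;
        int id;

        if (sscanf(line, "%d", &id) == 1 && id == mnt_id) {
            return true;
        }
        nl = strchr(line, '\n');
        if (!nl) {
            break;
        }
        line = nl + 1;
    }
    return false;
}
```

Reading everything into one buffer avoids logging or parsing partial records split across read() chunks.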

Thanks
Vivek


---
 tools/virtiofsd/passthrough_ll.c |   61 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c	2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c	2021-08-18 09:29:34.653891067 -0400
@@ -172,6 +172,8 @@ struct lo_data {
 
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
+    int proc_mountinfo;
+    int proc_self;
     int user_killpriv_v2, killpriv_v2;
     /* If set, virtiofsd is responsible for setting umask during creation */
     bool change_umask;
@@ -3403,6 +3405,47 @@ static void setup_wait_parent_capabiliti
     capng_apply(CAPNG_SELECT_BOTH);
 }
 
+static void read_mountinfo(struct lo_data *lo)
+{
+    char buf[4096];
+    ssize_t count, total_read = 0;
+    off_t ret;
+
+    ret = lseek(lo->proc_mountinfo, 0, SEEK_SET);
+    if (ret == -1) {
+        fuse_log(FUSE_LOG_ERR, "lseek(): %m\n");
+        exit(1);
+    }
+
+    do {
+        count = read(lo->proc_mountinfo, buf, sizeof(buf) - 1);
+        if (count == -1) {
+            fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+            exit(1);
+        }
+
+        /* NUL-terminate the chunk so it can be logged as a string */
+        buf[count] = '\0';
+        fuse_log(FUSE_LOG_INFO, "%s", buf);
+        total_read += count;
+    } while (count);
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", total_read);
+}
+
+static void open_mountinfo(struct lo_data *lo)
+{
+    int fd;
+
+    fd = openat(lo->proc_self, "mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+}
+
 /*
  * Move to a new mount, net, and pid namespaces to isolate this process.
  */
@@ -3472,6 +3515,12 @@ static void setup_namespaces(struct lo_d
         exit(1);
     }
 
+    lo->proc_self = open("/proc/self", O_PATH);
+    if (lo->proc_self == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self, O_PATH): %m\n");
+        exit(1);
+    }
+
     /*
      * We only need /proc/self/fd. Prevent ".." from accessing parent
      * directories of /proc/self/fd by bind-mounting it over /proc. Since / was
@@ -3524,7 +3573,7 @@ static void cleanup_capng(void)
  * Make the source directory our root so symlinks cannot escape and no other
  * files are accessible.  Assumes unshare(CLONE_NEWNS) was already called.
  */
-static void setup_mounts(const char *source)
+static void setup_mounts(const char *source, struct lo_data *lo)
 {
     int oldroot;
     int newroot;
@@ -3557,6 +3606,8 @@ static void setup_mounts(const char *sou
         exit(1);
     }
 
+    open_mountinfo(lo);
+
     if (fchdir(oldroot) < 0) {
         fuse_log(FUSE_LOG_ERR, "fchdir(oldroot): %m\n");
         exit(1);
@@ -3567,11 +3618,17 @@ static void setup_mounts(const char *sou
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo before umount2(., MNT_DETACH)\n");
+    read_mountinfo(lo);
+
     if (umount2(".", MNT_DETACH) < 0) {
         fuse_log(FUSE_LOG_ERR, "umount2(., MNT_DETACH): %m\n");
         exit(1);
     }
 
+    fuse_log(FUSE_LOG_INFO, "mountinfo after umount2(., MNT_DETACH):\n");
+    read_mountinfo(lo);
+
     if (fchdir(newroot) < 0) {
         fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
         exit(1);
@@ -3720,7 +3777,7 @@ static void setup_sandbox(struct lo_data
 {
     if (lo->sandbox == SANDBOX_NAMESPACE) {
         setup_namespaces(lo, se);
-        setup_mounts(lo->source);
+        setup_mounts(lo->source, lo);
     } else {
         setup_chroot(lo);
     }




* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-08-18 13:32                         ` [Virtio-fs] " Vivek Goyal
@ 2021-08-18 13:48                           ` Hanna Reitz
  -1 siblings, 0 replies; 88+ messages in thread
From: Hanna Reitz @ 2021-08-18 13:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Hajnoczi, qemu-devel, Dr . David Alan Gilbert, virtio-fs,
	Ioannis Angelakopoulos, Max Reitz

On 18.08.21 15:32, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 08:14:46PM -0400, Vivek Goyal wrote:
>> On Tue, Aug 17, 2021 at 03:45:19PM -0400, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
>>>> On 16.08.21 21:44, Vivek Goyal wrote:
>>>>> On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
>>>>>
>>>>> [..]
>>>>>>>> But given the inotify complications, there’s really a good reason we should
>>>>>>>> use mountinfo.
>>>>>>>>
>>>>>>>>>> It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
>>>>>>>>>> but if that’s the only way...
>>>>>>>>> Yes. We already have lo->proc_self_fd. Maybe we need to keep
>>>>>>>>> /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
>>>>>>>>> that any mount table changes will still be visible even though
>>>>>>>>> I keep the fd open (and don't have to open a new fd to notice new
>>>>>>>>> mount/unmount changes).
>>>>>>>> Well, yes, that was my idea.  Unfortunately, I wasn’t quite successful yet;
>>>>>>>> when I tried keeping the fd open, reading from it would just return 0
>>>>>>>> bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
>>>>>>>> nothing else in /proc is visible. Perhaps we need to bind-mount
>>>>>>>> /proc/self/mountinfo into /proc/self/fd before that...
>>>>>>> Or perhaps open /proc/self/mountinfo and save the fd in lo->proc_mountinfo
>>>>>>> before /proc/self/fd is bind-mounted on /proc?
>>>>>> Yes, I tried that, and then reading would just return 0 bytes.
>>>>> Hi Hanna,
>>>>>
>>>>> I tried this simple patch and I can read /proc/self/mountinfo before
>>>>> bind-mounting /proc/self/fd and after bind-mounting /proc/self/fd. Am
>>>>> I missing something?
>>>> Yes, but I tried reading it in the main loop (where we’d actually need it).
>>>> It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.
>>> Good point. I modified my code and noticed too that after umount2() it
>>> always reads 0 bytes. I can understand that all the other mount points
>>> could go away, but the new rootfs mount point of virtiofsd should still be
>>> visible, IIUC. I don't understand why.
>>>
>>> Anyway, I tried re-opening the /proc/self/mountinfo file after umount2(".",
>>> MNT_DETACH), and that seems to work: it shows the root mount point. I
>>> created a bind mount and it shows that too.
>>>
>>> So it looks like a quick fix could be to re-open /proc/self/mountinfo. But
>>> that means we can't bind-mount /proc/self/fd on /proc/. We could bind-mount
>>> /proc/self on /proc. Not sure if that is safe enough.
>> Or maybe I can do this:
>>
>> - Open an O_PATH fd for /proc/self:
>>    proc_self = open("/proc/self", O_PATH);
>> - Bind-mount /proc/self/fd on /proc
>> - pivot_root() and umount() stuff
>> - openat(proc_self, "mountinfo")
>> - close(proc_self)
>>
>> If this works, then we avoid the security issue and still manage
>> to open mountinfo after pivot_root() and umount(). I will give it a
>> try tomorrow and see if it works.
> Hi Hanna,
>
> This seems to work for me. I think the key is to open mountinfo after
> pivot_root(); then it works. If it is opened before pivot_root(),
> it does not work. Not sure why.

Great, your code looks good to me.  I was afraid this was going to be 
really complicated, but that doesn’t look too bad.

Thanks!

Hanna




* Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
  2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
@ 2021-08-19 16:38     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 88+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19 16:38 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, Stefan Hajnoczi, qemu-devel, Vivek Goyal

* Max Reitz (mreitz@redhat.com) wrote:
> When the inode_file_handles option is set, try to generate a file handle
> for new inodes instead of opening an O_PATH FD.
> 
> Being able to open these again will require CAP_DAC_READ_SEARCH, so the
> description text tells the user they will also need to specify
> -o modcaps=+dac_read_search.
> 
> Generating a file handle returns the mount ID it is valid for.  Opening
> it, however, requires an FD on that mount, not just the ID.  We have
> mount_fds to map an ID to an FD.
> get_file_handle() fills the hash map by opening the file we have
> generated a handle for.  To verify that the resulting FD indeed
> represents the handle's mount ID, we use statx().  Therefore, using file
> handles requires statx() support.
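
A rough sketch of the handle-plus-mount-ID check described above, assuming glibc 2.28+ and kernel headers that define STATX_MNT_ID (Linux 5.8+). Unlike get_file_handle() in this patch, it re-stats by name instead of through an O_PATH fd, so it does not close the race the patch comments on, and it does not populate any mount_fds map. The function name and the ESTALE errno choice are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: generate a file handle for dirfd/name and check with
 * statx() that the handle's mount ID matches the mount the file is on.
 * On success returns 0 and hands *out (malloc()ed) and *mount_id to the
 * caller; on failure returns -1 with errno set (a caller would then fall
 * back to an O_PATH fd).
 */
static int handle_with_checked_mount(int dirfd, const char *name,
                                     struct file_handle **out, int *mount_id)
{
    struct file_handle *fh;
    int saved_errno;
    int flags;

    assert(name != NULL);
    flags = name[0] ? 0 : AT_EMPTY_PATH;
    *out = NULL;

    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh) {
        return -1;
    }
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* Without AT_SYMLINK_FOLLOW, a trailing symlink is not followed. */
    if (name_to_handle_at(dirfd, name, fh, mount_id, flags) < 0) {
        goto fail;
    }

#ifdef STATX_MNT_ID
    {
        struct statx stx;

        if (statx(dirfd, name, flags | AT_SYMLINK_NOFOLLOW, STATX_MNT_ID,
                  &stx) < 0) {
            goto fail;
        }
        if (!(stx.stx_mask & STATX_MNT_ID) ||
            stx.stx_mnt_id != (uint64_t)*mount_id) {
            errno = ESTALE;   /* arbitrary choice for "mount ID mismatch" */
            goto fail;
        }
    }
#else
    errno = ENOSYS;           /* headers cannot ask statx() for the mount ID */
    goto fail;
#endif

    *out = fh;
    return 0;

fail:
    saved_errno = errno;
    free(fh);
    errno = saved_errno;
    return -1;
}
```

Many filesystems (procfs, some overlayfs setups) reject name_to_handle_at() with EOPNOTSUPP, which is exactly the case the fallback to O_PATH FDs is meant to cover.
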
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/helper.c              |   3 +
>  tools/virtiofsd/passthrough_ll.c      | 194 ++++++++++++++++++++++++--
>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>  3 files changed, 190 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index a8295d975a..aa63a21d43 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -187,6 +187,9 @@ void fuse_cmdline_help(void)
>             "                               default: no_allow_direct_io\n"
>             "    -o announce_submounts      Announce sub-mount points to the guest\n"
>             "    -o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: disabled)\n"
> +           "    -o inode_file_handles      Use file handles to reference inodes\n"
> +           "                               instead of O_PATH file descriptors\n"
> +           "                               (requires -o modcaps=+dac_read_search)\n"

I think you should probably add that automatically for the user; we do
something similar for seccomp/syslog (see syscall_allowlist_syslog). Just
do it before the "while (modcaps) {" line, so that whatever the user
specifies still takes precedence.
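
A sketch of the ordering idea, assuming the modcaps string is a ':'-separated list of +cap/-cap entries applied left to right (as in "-o modcaps=+sys_admin:-chown"). Rather than calling into libcap-ng before the loop as suggested, it simply prepends "+dac_read_search", so that a user-supplied "-dac_read_search" comes later and wins. Both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: when inode_file_handles is on, prepend
 * "+dac_read_search" to the user's modcaps list so that anything the user
 * wrote comes later and therefore wins.  Returns a malloc()ed string, or
 * NULL if there is nothing to apply (or on allocation failure).
 */
static char *with_default_modcaps(const char *user, bool inode_file_handles)
{
    const char *dflt = "+dac_read_search";
    char *out;
    size_t len;

    if (!inode_file_handles) {
        return user ? strdup(user) : NULL;
    }
    if (!user || !*user) {
        return strdup(dflt);
    }
    len = strlen(dflt) + 1 + strlen(user) + 1;   /* dflt ':' user NUL */
    out = malloc(len);
    if (out) {
        snprintf(out, len, "%s:%s", dflt, user);
    }
    return out;
}

/* Hypothetical helper: walk the list left to right, last entry wins. */
static bool dac_read_search_wanted(const char *modcaps)
{
    const char *p = modcaps;
    bool wanted = false;

    while (p && *p) {
        const char *end = strchr(p, ':');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        if (n == strlen("+dac_read_search") &&
            !strncmp(p, "+dac_read_search", n)) {
            wanted = true;
        } else if (n == strlen("-dac_read_search") &&
                   !strncmp(p, "-dac_read_search", n)) {
            wanted = false;
        }
        p = end ? end + 1 : NULL;
    }
    return wanted;
}
```

Doing the automatic addition first is what lets an explicit user choice override it.
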

Dave

>             );
>  }
>  
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index f9d8b2f134..ac95961d12 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -194,6 +194,7 @@ struct lo_data {
>      /* If set, virtiofsd is responsible for setting umask during creation */
>      bool change_umask;
>      int user_posix_acl, posix_acl;
> +    int inode_file_handles;
>  };
>  
>  /**
> @@ -250,6 +251,10 @@ static const struct fuse_opt lo_opts[] = {
>      { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
>      { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
>      { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
> +    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
> +    { "no_inode_file_handles",
> +      offsetof(struct lo_data, inode_file_handles),
> +      0 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -321,6 +326,135 @@ static int temp_fd_steal(TempFd *temp_fd)
>      }
>  }
>  
> +/**
> + * Generate a file handle for the given dirfd/name combination.
> + *
> + * If mount_fds does not yet contain an entry for the handle's mount
> + * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
> + * as the FD for that mount ID.  (That is the file that we have
> + * generated a handle for, so it should be representative for the
> + * mount ID.  However, to be sure (and to rule out races), we use
> + * statx() to verify that our assumption is correct.)
> + */
> +static struct lo_fhandle *get_file_handle(struct lo_data *lo,
> +                                          int dirfd, const char *name)
> +{
> +    /* We need statx() to verify the mount ID */
> +#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
> +    struct lo_fhandle *fh;
> +    int ret;
> +
> +    if (!lo->use_statx || !lo->inode_file_handles) {
> +        return NULL;
> +    }
> +
> +    fh = g_new0(struct lo_fhandle, 1);
> +
> +    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
> +    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
> +                            AT_EMPTY_PATH);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    if (pthread_rwlock_rdlock(&mount_fds_lock)) {
> +        goto fail;
> +    }
> +    if (!g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
> +        g_auto(TempFd) path_fd = TEMP_FD_INIT;
> +        struct statx stx;
> +        char procname[64];
> +        int fd;
> +
> +        pthread_rwlock_unlock(&mount_fds_lock);
> +
> +        /*
> +         * Before opening an O_RDONLY fd, check whether dirfd/name is a regular
> +         * file or directory, because we must not open anything else with
> +         * anything but O_PATH.
> +         * (And we use that occasion to verify that the file has the mount ID we
> +         * need.)
> +         */
> +        if (name[0]) {
> +            path_fd.fd = openat(dirfd, name, O_PATH);
> +            if (path_fd.fd < 0) {
> +                goto fail;
> +            }
> +            path_fd.owned = true;
> +        } else {
> +            path_fd.fd = dirfd;
> +            path_fd.owned = false;
> +        }
> +
> +        ret = statx(path_fd.fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> +                    STATX_TYPE | STATX_MNT_ID, &stx);
> +        if (ret < 0) {
> +            if (errno == ENOSYS) {
> +                lo->use_statx = false;
> +                fuse_log(FUSE_LOG_WARNING,
> +                         "statx() does not work: Will not be able to use file "
> +                         "handles for inodes\n");
> +            }
> +            goto fail;
> +        }
> +        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
> +            /*
> +             * One reason for stx_mnt_id != mount_id could be that dirfd/name
> +             * is a directory, and some other filesystem was mounted there
> +             * between us generating the file handle and then opening the FD.
> +             * (Other kinds of races might be possible, too.)
> +             * Failing this function is not fatal, though, because our caller
> +             * (lo_do_lookup()) will just fall back to opening an O_PATH FD to
> +             * store in lo_inode.fd instead of storing a file handle in
> +             * lo_inode.fhandle.  So we do not need to try too hard to get an
> +             * FD for fh->mount_id so this function could succeed.
> +             */
> +            goto fail;
> +        }
> +        if (!(stx.stx_mask & STATX_TYPE) ||
> +            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
> +        {
> +            /*
> +             * We must not open special files with anything but O_PATH, so we
> +             * cannot use this file for mount_fds.
> +             * Just return a failure in such a case and let the lo_inode have
> +             * an O_PATH fd instead of a file handle.
> +             */
> +            goto fail;
> +        }
> +
> +        /* Now that we know this fd is safe to open, do it */
> +        snprintf(procname, sizeof(procname), "%i", path_fd.fd);
> +        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
> +        if (fd < 0) {
> +            goto fail;
> +        }
> +
> +        if (pthread_rwlock_wrlock(&mount_fds_lock)) {
> +            goto fail;
> +        }
> +
> +        /* Check again, might have changed */
> +        if (g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
> +            close(fd);
> +        } else {
> +            g_hash_table_insert(mount_fds,
> +                                GINT_TO_POINTER(fh->mount_id),
> +                                GINT_TO_POINTER(fd));
> +        }
> +    }
> +    pthread_rwlock_unlock(&mount_fds_lock);
> +
> +    return fh;
> +
> +fail:
> +    g_free(fh);
> +    return NULL;
> +#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
> +    return NULL;
> +#endif
> +}
> +
>  /**
>   * Open the given file handle with the given flags.
>   *
> @@ -1165,6 +1299,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>              return -1;
>          }
>          lo->use_statx = false;
> +        if (lo->inode_file_handles) {
> +            fuse_log(FUSE_LOG_WARNING,
> +                     "statx() does not work: Will not be able to use file "
> +                     "handles for inodes\n");
> +        }
>          /* fallback */
>      }
>  #endif
> @@ -1194,6 +1333,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    struct lo_fhandle *fh;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1223,13 +1363,21 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
> -    if (newfd == -1) {
> -        goto out_err;
> +    fh = get_file_handle(lo, dir_fd.fd, name);
> +    if (!fh) {
> +        newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
> +        if (newfd == -1) {
> +            goto out_err;
> +        }
>      }
>  
> -    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
> -                   &mnt_id);
> +    if (newfd >= 0) {
> +        res = do_statx(lo, newfd, "", &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    } else {
> +        res = do_statx(lo, dir_fd.fd, name, &e->attr,
> +                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
> +    }
>      if (res == -1) {
>          goto out_err;
>      }
> @@ -1239,9 +1387,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>      }
>  
> -    inode = lo_find(lo, NULL, &e->attr, mnt_id);
> +    /*
> +     * Note that fh is always NULL if lo->inode_file_handles is false,
> +     * and so we will never do a lookup by file handle here, and
> +     * lo->inodes_by_handle will always remain empty.  We only need
> +     * this map when we do not have an O_PATH fd open for every
> +     * lo_inode, though, so if inode_file_handles is false, we do not
> +     * need that map anyway.
> +     */
> +    inode = lo_find(lo, fh, &e->attr, mnt_id);
>      if (inode) {
> -        close(newfd);
> +        if (newfd != -1) {
> +            close(newfd);
> +        }
>      } else {
>          inode = calloc(1, sizeof(struct lo_inode));
>          if (!inode) {
> @@ -1259,6 +1417,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        inode->fhandle = fh;
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1270,6 +1429,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          pthread_mutex_lock(&lo->mutex);
>          inode->fuse_ino = lo_add_inode_mapping(req, inode);
>          g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
> +        if (inode->fhandle) {
> +            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
> +        }
>          pthread_mutex_unlock(&lo->mutex);
>      }
>      e->ino = inode->fuse_ino;
> @@ -1615,6 +1777,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>      int res;
>      uint64_t mnt_id;
>      struct stat attr;
> +    struct lo_fhandle *fh;
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *dir = lo_inode(req, parent);
>      struct lo_inode *inode = NULL;
> @@ -1628,12 +1791,16 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> +    fh = get_file_handle(lo, dir_fd.fd, name);
> +    /* Ignore errors, this is just an optional key for the lookup */
> +
>      res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
>      if (res == -1) {
>          goto out;
>      }
>  
> -    inode = lo_find(lo, NULL, &attr, mnt_id);
> +    inode = lo_find(lo, fh, &attr, mnt_id);
> +    g_free(fh);
>  
>  out:
>      lo_inode_put(lo, &dir);
> @@ -1801,6 +1968,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>      if (!inode->nlookup) {
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
>          g_hash_table_remove(lo->inodes_by_ids, &inode->key);
> +        if (inode->fhandle) {
> +            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
> +        }
>          if (lo->posix_lock) {
>              if (g_hash_table_size(inode->posix_locks)) {
>                  fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> @@ -4362,6 +4532,14 @@ int main(int argc, char *argv[])
>  
>      lo.use_statx = true;
>  
> +#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
> +    if (lo.inode_file_handles) {
> +        fuse_log(FUSE_LOG_WARNING,
> +                 "No statx() or mount ID support: Will not be able to use file "
> +                 "handles for inodes\n");
> +    }
> +#endif
> +
>      se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
>      if (se == NULL) {
>          goto err_out1;
> diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> index af04c638cb..ab4dc07e3f 100644
> --- a/tools/virtiofsd/passthrough_seccomp.c
> +++ b/tools/virtiofsd/passthrough_seccomp.c
> @@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
>      SCMP_SYS(mprotect),
>      SCMP_SYS(mremap),
>      SCMP_SYS(munmap),
> +    SCMP_SYS(name_to_handle_at),
>      SCMP_SYS(newfstatat),
>      SCMP_SYS(statx),
>      SCMP_SYS(open),
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
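
In the quoted lo_do_lookup() hunk, the daemon first asks get_file_handle() for a file handle and opens an O_PATH FD only when that fails. A new lo_inode therefore holds either inode->fhandle or inode->fd. The standalone sketch below reproduces only that fallback, using the raw name_to_handle_at() and openat() syscalls. It is an illustration under stated assumptions, not virtiofsd code. struct lookup_ref, try_handle(), lookup_ref_get() and lookup_ref_release() are hypothetical names. The sketch also leaves out the patch's mount_fds bookkeeping, the statx() mount-ID check, and the inode_file_handles option.

```c
#define _GNU_SOURCE /* name_to_handle_at(), MAX_HANDLE_SZ, O_PATH */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical type: on success exactly one of fh / fd is valid. */
struct lookup_ref {
    struct file_handle *fh; /* heap-allocated handle, or NULL */
    int mount_id;           /* meaningful only when fh != NULL */
    int fd;                 /* O_PATH fd, or -1 */
};

/* Hypothetical helper: try to encode dirfd/name as a file handle. */
static struct file_handle *try_handle(int dirfd, const char *name,
                                      int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (!fh) {
        return NULL;
    }
    fh->handle_bytes = MAX_HANDLE_SZ;
    /* flags == 0: a trailing symlink is not followed, like O_NOFOLLOW */
    if (name_to_handle_at(dirfd, name, fh, mount_id, 0) < 0) {
        free(fh); /* e.g. EOPNOTSUPP, EOVERFLOW, ENOENT: caller falls back */
        return NULL;
    }
    return fh;
}

/* Hypothetical: returns 0 on success, -1 (errno from openat()) on failure. */
static int lookup_ref_get(int dirfd, const char *name, struct lookup_ref *ref)
{
    ref->fd = -1;
    ref->mount_id = -1;
    ref->fh = try_handle(dirfd, name, &ref->mount_id);
    if (ref->fh) {
        return 0;
    }
    /* Fallback: keep an O_PATH fd, as the quoted hunk does for newfd */
    ref->fd = openat(dirfd, name, O_PATH | O_NOFOLLOW);
    return ref->fd < 0 ? -1 : 0;
}

/* Release whichever reference lookup_ref_get() produced. */
static void lookup_ref_release(struct lookup_ref *ref)
{
    free(ref->fh);
    ref->fh = NULL;
    if (ref->fd >= 0) {
        close(ref->fd);
        ref->fd = -1;
    }
}
```

On success, exactly one of ref.fh and ref.fd is set. A caller can therefore branch on `ref.fd >= 0`, the same way the hunk chooses between do_statx(lo, newfd, "", ...) and do_statx(lo, dir_fd.fd, name, ...).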


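The lo_do_lookup() and unref_inode() hunks insert into and remove from lo->inodes_by_handle, using inode->fhandle as the key. The hash and equality callbacks for that table are not part of this excerpt. The sketch below shows one plausible shape for them. It assumes that two keys match only when mount ID, handle type, and handle bytes all agree, because a file handle is meaningful only relative to the filesystem it was generated on. struct fh_key, fnv1a(), fh_key_hash() and fh_key_equal() are hypothetical names. FNV-1a is an arbitrary byte hash chosen for illustration.

```c
#define _GNU_SOURCE /* struct file_handle */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: a handle plus the mount ID it was generated for. */
struct fh_key {
    int mount_id;
    struct file_handle *handle;
};

/* 32-bit FNV-1a over a byte range, continuing from hash state h */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u; /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Hash the fields fh_key_equal() compares, so equal keys hash alike */
static uint32_t fh_key_hash(const struct fh_key *k)
{
    uint32_t h = 2166136261u; /* FNV-1a 32-bit offset basis */

    h = fnv1a(h, &k->mount_id, sizeof(k->mount_id));
    h = fnv1a(h, &k->handle->handle_type, sizeof(k->handle->handle_type));
    return fnv1a(h, k->handle->f_handle, k->handle->handle_bytes);
}

/* Keys are equal only if mount ID, type, length and bytes all match */
static bool fh_key_equal(const struct fh_key *a, const struct fh_key *b)
{
    return a->mount_id == b->mount_id &&
           a->handle->handle_type == b->handle->handle_type &&
           a->handle->handle_bytes == b->handle->handle_bytes &&
           memcmp(a->handle->f_handle, b->handle->f_handle,
                  a->handle->handle_bytes) == 0;
}
```

With GLib, functions like these would need thin wrappers that match the GHashFunc and GEqualFunc signatures (both take gconstpointer arguments) before they could be passed to g_hash_table_new().
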
end of thread, other threads:[~2021-08-19 16:39 UTC | newest]

Thread overview: 88+ messages
2021-07-30 15:01 [PATCH v3 00/10] virtiofsd: Allow using file handles instead of O_PATH FDs Max Reitz
2021-07-30 15:01 ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 01/10] virtiofsd: Limit setxattr()'s creds-dropped region Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-06 14:16   ` Vivek Goyal
2021-08-06 14:16     ` [Virtio-fs] " Vivek Goyal
2021-08-09 10:30     ` Max Reitz
2021-08-09 10:30       ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 02/10] virtiofsd: Add TempFd structure Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-06 14:41   ` Vivek Goyal
2021-08-06 14:41     ` [Virtio-fs] " Vivek Goyal
2021-08-09 10:44     ` Max Reitz
2021-08-09 10:44       ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 03/10] virtiofsd: Use lo_inode_open() instead of openat() Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-06 15:42   ` Vivek Goyal
2021-08-06 15:42     ` [Virtio-fs] " Vivek Goyal
2021-07-30 15:01 ` [PATCH v3 04/10] virtiofsd: Add lo_inode_fd() helper Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-06 18:25   ` Vivek Goyal
2021-08-06 18:25     ` [Virtio-fs] " Vivek Goyal
2021-08-09 10:48     ` Max Reitz
2021-08-09 10:48       ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 05/10] virtiofsd: Let lo_fd() return a TempFd Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 06/10] virtiofsd: Let lo_inode_open() " Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-06 19:55   ` Vivek Goyal
2021-08-06 19:55     ` [Virtio-fs] " Vivek Goyal
2021-08-09 13:40     ` Max Reitz
2021-08-09 13:40       ` [Virtio-fs] " Max Reitz
2021-07-30 15:01 ` [PATCH v3 07/10] virtiofsd: Add lo_inode.fhandle Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-09 15:21   ` Vivek Goyal
2021-08-09 15:21     ` [Virtio-fs] " Vivek Goyal
2021-08-09 16:41     ` Hanna Reitz
2021-08-09 16:41       ` [Virtio-fs] " Hanna Reitz
2021-07-30 15:01 ` [PATCH v3 08/10] virtiofsd: Add inodes_by_handle hash table Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-09 16:10   ` Vivek Goyal
2021-08-09 16:10     ` [Virtio-fs] " Vivek Goyal
2021-08-09 16:47     ` Hanna Reitz
2021-08-09 16:47       ` [Virtio-fs] " Hanna Reitz
2021-08-10 14:07       ` Vivek Goyal
2021-08-10 14:07         ` [Virtio-fs] " Vivek Goyal
2021-08-10 14:13         ` Hanna Reitz
2021-08-10 14:13           ` [Virtio-fs] " Hanna Reitz
2021-08-10 17:51           ` Vivek Goyal
2021-08-10 17:51             ` [Virtio-fs] " Vivek Goyal
2021-07-30 15:01 ` [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-09 18:41   ` Vivek Goyal
2021-08-09 18:41     ` [Virtio-fs] " Vivek Goyal
2021-08-10  8:32     ` Hanna Reitz
2021-08-10  8:32       ` [Virtio-fs] " Hanna Reitz
2021-08-10 15:23       ` Vivek Goyal
2021-08-10 15:23         ` [Virtio-fs] " Vivek Goyal
2021-08-10 15:26         ` Hanna Reitz
2021-08-10 15:26           ` [Virtio-fs] " Hanna Reitz
2021-08-10 15:57           ` Vivek Goyal
2021-08-10 15:57             ` [Virtio-fs] " Vivek Goyal
2021-08-11  6:41             ` Hanna Reitz
2021-08-11  6:41               ` [Virtio-fs] " Hanna Reitz
2021-08-16 19:44               ` Vivek Goyal
2021-08-16 19:44                 ` [Virtio-fs] " Vivek Goyal
2021-08-17  8:27                 ` Hanna Reitz
2021-08-17  8:27                   ` [Virtio-fs] " Hanna Reitz
2021-08-17 19:45                   ` Vivek Goyal
2021-08-17 19:45                     ` [Virtio-fs] " Vivek Goyal
2021-08-18  0:14                     ` Vivek Goyal
2021-08-18  0:14                       ` [Virtio-fs] " Vivek Goyal
2021-08-18 13:32                       ` Vivek Goyal
2021-08-18 13:32                         ` [Virtio-fs] " Vivek Goyal
2021-08-18 13:48                         ` Hanna Reitz
2021-08-18 13:48                           ` [Virtio-fs] " Hanna Reitz
2021-08-19 16:38   ` Dr. David Alan Gilbert
2021-08-19 16:38     ` [Virtio-fs] " Dr. David Alan Gilbert
2021-07-30 15:01 ` [PATCH v3 10/10] virtiofsd: Add lazy lo_do_find() Max Reitz
2021-07-30 15:01   ` [Virtio-fs] " Max Reitz
2021-08-09 19:08   ` Vivek Goyal
2021-08-09 19:08     ` [Virtio-fs] " Vivek Goyal
2021-08-10  8:38     ` Hanna Reitz
2021-08-10  8:38       ` [Virtio-fs] " Hanna Reitz
2021-08-10 14:12       ` Vivek Goyal
2021-08-10 14:12         ` [Virtio-fs] " Vivek Goyal
2021-08-10 14:17         ` Hanna Reitz
2021-08-10 14:17           ` [Virtio-fs] " Hanna Reitz
