All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
@ 2021-06-09 15:55 ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Hi,

v1 cover letter for an overview:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

In v2, I (tried to) fix the bug Dave found, which is that
get_file_handle() indiscriminately opened the given dirfd/name
combination to get an O_RDONLY fd without checking whether we’re
actually allowed to open dirfd/name; namely, we don’t allow ourselves to
open files that aren’t regular files or directories.

So that openat(.., O_RDONLY) is changed to an openat(..., O_PATH), and
then check the file type with the statx() we’re doing anyway.  If the
file is OK to open, we reopen it O_RDONLY with the help of
/proc/self/fd, like we always do.

(This only affects patch 8.)


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/9:[----] [--] 'virtiofsd: Add TempFd structure'
002/9:[----] [--] 'virtiofsd: Use lo_inode_open() instead of openat()'
003/9:[----] [--] 'virtiofsd: Add lo_inode_fd() helper'
004/9:[----] [--] 'virtiofsd: Let lo_fd() return a TempFd'
005/9:[----] [--] 'virtiofsd: Let lo_inode_open() return a TempFd'
006/9:[----] [--] 'virtiofsd: Add lo_inode.fhandle'
007/9:[----] [--] 'virtiofsd: Add inodes_by_handle hash table'
008/9:[0045] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
009/9:[----] [--] 'virtiofsd: Add lazy lo_do_find()'


Max Reitz (9):
  virtiofsd: Add TempFd structure
  virtiofsd: Use lo_inode_open() instead of openat()
  virtiofsd: Add lo_inode_fd() helper
  virtiofsd: Let lo_fd() return a TempFd
  virtiofsd: Let lo_inode_open() return a TempFd
  virtiofsd: Add lo_inode.fhandle
  virtiofsd: Add inodes_by_handle hash table
  virtiofsd: Optionally fill lo_inode.fhandle
  virtiofsd: Add lazy lo_do_find()

 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 836 +++++++++++++++++++++-----
 tools/virtiofsd/passthrough_seccomp.c |   2 +
 3 files changed, 694 insertions(+), 147 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
@ 2021-06-09 15:55 ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

Hi,

v1 cover letter for an overview:
https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

In v2, I (tried to) fix the bug Dave found, which is that
get_file_handle() indiscriminately opened the given dirfd/name
combination to get an O_RDONLY fd without checking whether we’re
actually allowed to open dirfd/name; namely, we don’t allow ourselves to
open files that aren’t regular files or directories.

So that openat(.., O_RDONLY) is changed to an openat(..., O_PATH), and
then check the file type with the statx() we’re doing anyway.  If the
file is OK to open, we reopen it O_RDONLY with the help of
/proc/self/fd, like we always do.

(This only affects patch 8.)


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/9:[----] [--] 'virtiofsd: Add TempFd structure'
002/9:[----] [--] 'virtiofsd: Use lo_inode_open() instead of openat()'
003/9:[----] [--] 'virtiofsd: Add lo_inode_fd() helper'
004/9:[----] [--] 'virtiofsd: Let lo_fd() return a TempFd'
005/9:[----] [--] 'virtiofsd: Let lo_inode_open() return a TempFd'
006/9:[----] [--] 'virtiofsd: Add lo_inode.fhandle'
007/9:[----] [--] 'virtiofsd: Add inodes_by_handle hash table'
008/9:[0045] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
009/9:[----] [--] 'virtiofsd: Add lazy lo_do_find()'


Max Reitz (9):
  virtiofsd: Add TempFd structure
  virtiofsd: Use lo_inode_open() instead of openat()
  virtiofsd: Add lo_inode_fd() helper
  virtiofsd: Let lo_fd() return a TempFd
  virtiofsd: Let lo_inode_open() return a TempFd
  virtiofsd: Add lo_inode.fhandle
  virtiofsd: Add inodes_by_handle hash table
  virtiofsd: Optionally fill lo_inode.fhandle
  virtiofsd: Add lazy lo_do_find()

 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 836 +++++++++++++++++++++-----
 tools/virtiofsd/passthrough_seccomp.c |   2 +
 3 files changed, 694 insertions(+), 147 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 1/9] virtiofsd: Add TempFd structure
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

We are planning to add file handles to lo_inode objects as an
alternative to lo_inode.fd.  That means that everywhere where we
currently reference lo_inode.fd, we will have to open a temporary file
descriptor that needs to be closed after use.

So instead of directly accessing lo_inode.fd, there will be a helper
function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
file descriptor with open_by_handle_at().  It encapsulates this result
in a TempFd structure to let the caller know whether the FD needs to be
closed after use (opened from the handle) or not (copied from
lo_inode.fd).

By using g_auto(TempFd) to store this result, callers will not even have
to care about closing a temporary FD after use.  It will be done
automatically once the object goes out of scope.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 49 ++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 49c21fd855..a4674aba80 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -174,6 +174,28 @@ struct lo_data {
     int user_killpriv_v2, killpriv_v2;
 };
 
+/**
+ * Represents a file descriptor that may either be owned by this
+ * TempFd, or only referenced (i.e. the ownership belongs to some
+ * other object, and the value has just been copied into this TempFd).
+ *
+ * The purpose of this encapsulation is to be used as g_auto(TempFd)
+ * to automatically clean up owned file descriptors when this object
+ * goes out of scope.
+ *
+ * Use temp_fd_steal() to get an owned file descriptor that will not
+ * be closed when the TempFd goes out of scope.
+ */
+typedef struct {
+    int fd;
+    bool owned; /* fd owned by this object? */
+} TempFd;
+
+#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
+
+static void temp_fd_clear(TempFd *temp_fd);
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
+
 static const struct fuse_opt lo_opts[] = {
     { "sandbox=namespace",
       offsetof(struct lo_data, sandbox),
@@ -249,6 +271,33 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/**
+ * Clean-up function for TempFds
+ */
+static void temp_fd_clear(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        close(temp_fd->fd);
+        *temp_fd = TEMP_FD_INIT;
+    }
+}
+
+/**
+ * Return an owned fd from *temp_fd that will not be closed when
+ * *temp_fd goes out of scope.
+ *
+ * (TODO: Remove __attribute__ once this is used.)
+ */
+static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        temp_fd->owned = false;
+        return temp_fd->fd;
+    } else {
+        return dup(temp_fd->fd);
+    }
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 1/9] virtiofsd: Add TempFd structure
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

We are planning to add file handles to lo_inode objects as an
alternative to lo_inode.fd.  That means that everywhere where we
currently reference lo_inode.fd, we will have to open a temporary file
descriptor that needs to be closed after use.

So instead of directly accessing lo_inode.fd, there will be a helper
function (lo_inode_fd()) that either returns lo_inode.fd, or opens a new
file descriptor with open_by_handle_at().  It encapsulates this result
in a TempFd structure to let the caller know whether the FD needs to be
closed after use (opened from the handle) or not (copied from
lo_inode.fd).

By using g_auto(TempFd) to store this result, callers will not even have
to care about closing a temporary FD after use.  It will be done
automatically once the object goes out of scope.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 49 ++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 49c21fd855..a4674aba80 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -174,6 +174,28 @@ struct lo_data {
     int user_killpriv_v2, killpriv_v2;
 };
 
+/**
+ * Represents a file descriptor that may either be owned by this
+ * TempFd, or only referenced (i.e. the ownership belongs to some
+ * other object, and the value has just been copied into this TempFd).
+ *
+ * The purpose of this encapsulation is to be used as g_auto(TempFd)
+ * to automatically clean up owned file descriptors when this object
+ * goes out of scope.
+ *
+ * Use temp_fd_steal() to get an owned file descriptor that will not
+ * be closed when the TempFd goes out of scope.
+ */
+typedef struct {
+    int fd;
+    bool owned; /* fd owned by this object? */
+} TempFd;
+
+#define TEMP_FD_INIT ((TempFd) { .fd = -1, .owned = false })
+
+static void temp_fd_clear(TempFd *temp_fd);
+G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(TempFd, temp_fd_clear);
+
 static const struct fuse_opt lo_opts[] = {
     { "sandbox=namespace",
       offsetof(struct lo_data, sandbox),
@@ -249,6 +271,33 @@ static struct lo_data *lo_data(fuse_req_t req)
     return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/**
+ * Clean-up function for TempFds
+ */
+static void temp_fd_clear(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        close(temp_fd->fd);
+        *temp_fd = TEMP_FD_INIT;
+    }
+}
+
+/**
+ * Return an owned fd from *temp_fd that will not be closed when
+ * *temp_fd goes out of scope.
+ *
+ * (TODO: Remove __attribute__ once this is used.)
+ */
+static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+{
+    if (temp_fd->owned) {
+        temp_fd->owned = false;
+        return temp_fd->fd;
+    } else {
+        return dup(temp_fd->fd);
+    }
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 2/9] virtiofsd: Use lo_inode_open() instead of openat()
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
with the flags they need through /proc/self/fd.

Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
/proc/self/fd trick, it just uses openat(fd, "."), because the FD is
guaranteed to be a directory, so this works.

All cases have one problem in common, though: In the future, when we may
have a file handle in the lo_inode instead of an FD, querying an
lo_inode FD may incur an open_by_handle_at() call.  It does not make
sense to then reopen that FD with custom flags, those should have been
passed to open_by_handle_at() instead.

Use lo_inode_open() instead of openat().  As part of the file handle
change, lo_inode_open() will be made to invoke openat() only if
lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
with the right flags from the start.

Consequently, after this patch, lo_inode_open() is the only place to
invoke openat() to reopen an existing FD with different flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index a4674aba80..436f771d2a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1645,18 +1645,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 {
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_inode *inode;
+    struct lo_dirp *d = NULL;
     int fd;
     ssize_t fh;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        error = EBADF;
+        goto out_err;
+    }
+
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
         goto out_err;
     }
 
-    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-    if (fd == -1) {
-        goto out_errno;
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        error = -fd;
+        goto out_err;
     }
 
     d->dp = fdopendir(fd);
@@ -1685,6 +1693,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 out_errno:
     error = errno;
 out_err:
+    lo_inode_put(lo, &inode);
     if (d) {
         if (d->dp) {
             closedir(d->dp);
@@ -2827,7 +2836,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * It is not safe to open() non-regular/non-dir files in file server
      * unless O_PATH is used, so use that method for regular files/dir
@@ -2835,12 +2843,14 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = fgetxattr(fd, name, value, size);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -2906,14 +2916,15 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = flistxattr(fd, value, size);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3039,15 +3050,15 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fsetxattr(fd, name, value, size, flags);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
@@ -3105,15 +3116,15 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
              name);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fremovexattr(fd, name);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 2/9] virtiofsd: Use lo_inode_open() instead of openat()
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

The xattr functions want a non-O_PATH FD, so they reopen the lo_inode.fd
with the flags they need through /proc/self/fd.

Similarly, lo_opendir() needs an O_RDONLY FD.  Instead of the
/proc/self/fd trick, it just uses openat(fd, "."), because the FD is
guaranteed to be a directory, so this works.

All cases have one problem in common, though: In the future, when we may
have a file handle in the lo_inode instead of an FD, querying an
lo_inode FD may incur an open_by_handle_at() call.  It does not make
sense to then reopen that FD with custom flags, those should have been
passed to open_by_handle_at() instead.

Use lo_inode_open() instead of openat().  As part of the file handle
change, lo_inode_open() will be made to invoke openat() only if
lo_inode.fd is valid.  Otherwise, it will invoke open_by_handle_at()
with the right flags from the start.

Consequently, after this patch, lo_inode_open() is the only place to
invoke openat() to reopen an existing FD with different flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index a4674aba80..436f771d2a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1645,18 +1645,26 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 {
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
-    struct lo_dirp *d;
+    struct lo_inode *inode;
+    struct lo_dirp *d = NULL;
     int fd;
     ssize_t fh;
 
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        error = EBADF;
+        goto out_err;
+    }
+
     d = calloc(1, sizeof(struct lo_dirp));
     if (d == NULL) {
         goto out_err;
     }
 
-    fd = openat(lo_fd(req, ino), ".", O_RDONLY);
-    if (fd == -1) {
-        goto out_errno;
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        error = -fd;
+        goto out_err;
     }
 
     d->dp = fdopendir(fd);
@@ -1685,6 +1693,7 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
 out_errno:
     error = errno;
 out_err:
+    lo_inode_put(lo, &inode);
     if (d) {
         if (d->dp) {
             closedir(d->dp);
@@ -2827,7 +2836,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     /*
      * It is not safe to open() non-regular/non-dir files in file server
      * unless O_PATH is used, so use that method for regular files/dir
@@ -2835,12 +2843,14 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = fgetxattr(fd, name, value, size);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -2906,14 +2916,15 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
     }
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            goto out_err;
+            saverr = -fd;
+            goto out;
         }
         ret = flistxattr(fd, value, size);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3039,15 +3050,15 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     fuse_log(FUSE_LOG_DEBUG, "lo_setxattr(ino=%" PRIu64
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fsetxattr(fd, name, value, size, flags);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
@@ -3105,15 +3116,15 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     fuse_log(FUSE_LOG_DEBUG, "lo_removexattr(ino=%" PRIu64 ", name=%s)\n", ino,
              name);
 
-    sprintf(procname, "%i", inode->fd);
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        fd = lo_inode_open(lo, inode, O_RDONLY);
         if (fd < 0) {
-            saverr = errno;
+            saverr = -fd;
             goto out;
         }
         ret = fremovexattr(fd, name);
     } else {
+        sprintf(procname, "%i", inode->fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 3/9] virtiofsd: Add lo_inode_fd() helper
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Once we let lo_inode.fd be optional, we will need its users to open the
file handle stored in lo_inode instead.  This function will do that.

For now, it just returns lo_inode.fd, though.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 138 ++++++++++++++++++++++++++-----
 1 file changed, 117 insertions(+), 21 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 436f771d2a..46c9dfe200 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -629,6 +629,16 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
+static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
+{
+    *tfd = (TempFd) {
+        .fd = inode->fd,
+        .owned = false,
+    };
+
+    return 0;
+}
+
 /*
  * TODO Remove this helper and force callers to hold an inode refcount until
  * they are done with the fd.  This will be done in a later patch to make
@@ -790,11 +800,11 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int saverr;
     char procname[64];
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
-    int ifd;
     int res;
     int fd = -1;
 
@@ -804,7 +814,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    ifd = inode->fd;
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out_err;
+    }
 
     /* If fi->fh is invalid we'll report EBADF later */
     if (fi) {
@@ -815,7 +829,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = fchmod(fd, attr->st_mode);
         } else {
-            sprintf(procname, "%i", ifd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
         }
         if (res == -1) {
@@ -827,12 +841,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         uid_t uid = (valid & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)-1;
         gid_t gid = (valid & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)-1;
 
-        saverr = drop_security_capability(lo, ifd);
+        saverr = drop_security_capability(lo, inode_fd.fd);
         if (saverr) {
             goto out_err;
         }
 
-        res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        res = fchownat(inode_fd.fd, "", uid, gid,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
         if (res == -1) {
             saverr = errno;
             goto out_err;
@@ -911,7 +926,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = futimens(fd, tv);
         } else {
-            sprintf(procname, "%i", inode->fd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = utimensat(lo->proc_self_fd, procname, tv, 0);
         }
         if (res == -1) {
@@ -1026,7 +1041,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e,
                         struct lo_inode **inodep)
 {
-    int newfd;
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
+    int newfd = -1;
     int res;
     int saverr;
     uint64_t mnt_id;
@@ -1056,7 +1072,13 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
+    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
     }
@@ -1123,6 +1145,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
 out_err:
     saverr = errno;
+out:
     if (newfd != -1) {
         close(newfd);
     }
@@ -1228,6 +1251,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
                              const char *name, mode_t mode, dev_t rdev,
                              const char *link)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
@@ -1251,12 +1275,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         return;
     }
 
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
     saverr = lo_change_cred(req, &old);
     if (saverr) {
         goto out;
     }
 
-    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+    res = mknod_wrapper(dir_fd.fd, name, link, mode, rdev);
 
     saverr = errno;
 
@@ -1304,6 +1334,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
                     const char *name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1329,18 +1361,31 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
         goto out_err;
     }
 
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    sprintf(procname, "%i", inode->fd);
-    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
+    sprintf(procname, "%i", inode_fd.fd);
+    res = linkat(lo->proc_self_fd, procname, parent_fd.fd, name,
                  AT_SYMLINK_FOLLOW);
     if (res == -1) {
         goto out_err;
     }
 
-    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = fstatat(inode_fd.fd, "", &e.attr,
+                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         goto out_err;
     }
@@ -1369,6 +1414,7 @@ out_err:
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     uint64_t mnt_id;
     struct stat attr;
@@ -1379,7 +1425,12 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
-    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        return NULL;
+    }
+
+    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     lo_inode_put(lo, &dir);
     if (res == -1) {
         return NULL;
@@ -1421,6 +1472,8 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       fuse_ino_t newparent, const char *newname,
                       unsigned int flags)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
+    g_auto(TempFd) newparent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *parent_inode;
     struct lo_inode *newparent_inode;
@@ -1453,12 +1506,24 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
+    res = lo_inode_fd(newparent_inode, &newparent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
     if (flags) {
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
 #else
-        res = syscall(SYS_renameat2, parent_inode->fd, name,
-                        newparent_inode->fd, newname, flags);
+        res = syscall(SYS_renameat2, parent_fd.fd, name,
+                        newparent_fd.fd, newname, flags);
         if (res == -1 && errno == ENOSYS) {
             fuse_reply_err(req, EINVAL);
         } else {
@@ -1468,7 +1533,7 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = renameat(parent_inode->fd, name, newparent_inode->fd, newname);
+    res = renameat(parent_fd.fd, name, newparent_fd.fd, newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 out:
@@ -1953,6 +2018,7 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int fd = -1;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1975,6 +2041,12 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    err = lo_inode_fd(parent_inode, &parent_fd);
+    if (err < 0) {
+        err = -err;
+        goto out;
+    }
+
     err = lo_change_cred(req, &old);
     if (err) {
         goto out;
@@ -1983,7 +2055,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
     /* Try to create a new file but don't open existing files */
-    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+    fd = openat(parent_fd.fd, name, fi->flags | O_CREAT | O_EXCL, mode);
     err = fd == -1 ? errno : 0;
 
     lo_restore_cred(&old);
@@ -2788,6 +2860,7 @@ static int xattr_map_server(const struct lo_data *lo, const char *server_name,
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -2850,7 +2923,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         ret = fgetxattr(fd, name, value, size);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -2887,6 +2965,7 @@ out:
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -2924,7 +3003,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
         ret = flistxattr(fd, value, size);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3013,6 +3097,7 @@ out:
 static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3058,7 +3143,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         ret = fsetxattr(fd, name, value, size, flags);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
@@ -3079,6 +3169,7 @@ out:
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3124,7 +3215,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
         }
         ret = fremovexattr(fd, name);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 3/9] virtiofsd: Add lo_inode_fd() helper
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

Once we let lo_inode.fd be optional, we will need its users to open the
file handle stored in lo_inode instead.  This function will do that.

For now, it just returns lo_inode.fd, though.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 138 ++++++++++++++++++++++++++-----
 1 file changed, 117 insertions(+), 21 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 436f771d2a..46c9dfe200 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -629,6 +629,16 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
     return elem->inode;
 }
 
+static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
+{
+    *tfd = (TempFd) {
+        .fd = inode->fd,
+        .owned = false,
+    };
+
+    return 0;
+}
+
 /*
  * TODO Remove this helper and force callers to hold an inode refcount until
  * they are done with the fd.  This will be done in a later patch to make
@@ -790,11 +800,11 @@ static int lo_fi_fd(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                        int valid, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int saverr;
     char procname[64];
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
-    int ifd;
     int res;
     int fd = -1;
 
@@ -804,7 +814,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    ifd = inode->fd;
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out_err;
+    }
 
     /* If fi->fh is invalid we'll report EBADF later */
     if (fi) {
@@ -815,7 +829,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = fchmod(fd, attr->st_mode);
         } else {
-            sprintf(procname, "%i", ifd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
         }
         if (res == -1) {
@@ -827,12 +841,13 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         uid_t uid = (valid & FUSE_SET_ATTR_UID) ? attr->st_uid : (uid_t)-1;
         gid_t gid = (valid & FUSE_SET_ATTR_GID) ? attr->st_gid : (gid_t)-1;
 
-        saverr = drop_security_capability(lo, ifd);
+        saverr = drop_security_capability(lo, inode_fd.fd);
         if (saverr) {
             goto out_err;
         }
 
-        res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+        res = fchownat(inode_fd.fd, "", uid, gid,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
         if (res == -1) {
             saverr = errno;
             goto out_err;
@@ -911,7 +926,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             res = futimens(fd, tv);
         } else {
-            sprintf(procname, "%i", inode->fd);
+            sprintf(procname, "%i", inode_fd.fd);
             res = utimensat(lo->proc_self_fd, procname, tv, 0);
         }
         if (res == -1) {
@@ -1026,7 +1041,8 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e,
                         struct lo_inode **inodep)
 {
-    int newfd;
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
+    int newfd = -1;
     int res;
     int saverr;
     uint64_t mnt_id;
@@ -1056,7 +1072,13 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         name = ".";
     }
 
-    newfd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
+    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
     if (newfd == -1) {
         goto out_err;
     }
@@ -1123,6 +1145,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
 out_err:
     saverr = errno;
+out:
     if (newfd != -1) {
         close(newfd);
     }
@@ -1228,6 +1251,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
                              const char *name, mode_t mode, dev_t rdev,
                              const char *link)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     int saverr;
     struct lo_data *lo = lo_data(req);
@@ -1251,12 +1275,18 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         return;
     }
 
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        saverr = -res;
+        goto out;
+    }
+
     saverr = lo_change_cred(req, &old);
     if (saverr) {
         goto out;
     }
 
-    res = mknod_wrapper(dir->fd, name, link, mode, rdev);
+    res = mknod_wrapper(dir_fd.fd, name, link, mode, rdev);
 
     saverr = errno;
 
@@ -1304,6 +1334,8 @@ static void lo_symlink(fuse_req_t req, const char *link, fuse_ino_t parent,
 static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
                     const char *name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1329,18 +1361,31 @@ static void lo_link(fuse_req_t req, fuse_ino_t ino, fuse_ino_t parent,
         goto out_err;
     }
 
+    res = lo_inode_fd(inode, &inode_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        errno = -res;
+        goto out_err;
+    }
+
     memset(&e, 0, sizeof(struct fuse_entry_param));
     e.attr_timeout = lo->timeout;
     e.entry_timeout = lo->timeout;
 
-    sprintf(procname, "%i", inode->fd);
-    res = linkat(lo->proc_self_fd, procname, parent_inode->fd, name,
+    sprintf(procname, "%i", inode_fd.fd);
+    res = linkat(lo->proc_self_fd, procname, parent_fd.fd, name,
                  AT_SYMLINK_FOLLOW);
     if (res == -1) {
         goto out_err;
     }
 
-    res = fstatat(inode->fd, "", &e.attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = fstatat(inode_fd.fd, "", &e.attr,
+                  AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         goto out_err;
     }
@@ -1369,6 +1414,7 @@ out_err:
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
+    g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
     uint64_t mnt_id;
     struct stat attr;
@@ -1379,7 +1425,12 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
-    res = do_statx(lo, dir->fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    res = lo_inode_fd(dir, &dir_fd);
+    if (res < 0) {
+        return NULL;
+    }
+
+    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     lo_inode_put(lo, &dir);
     if (res == -1) {
         return NULL;
@@ -1421,6 +1472,8 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                       fuse_ino_t newparent, const char *newname,
                       unsigned int flags)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
+    g_auto(TempFd) newparent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *parent_inode;
     struct lo_inode *newparent_inode;
@@ -1453,12 +1506,24 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
+    res = lo_inode_fd(parent_inode, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
+    res = lo_inode_fd(newparent_inode, &newparent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        goto out;
+    }
+
     if (flags) {
 #ifndef SYS_renameat2
         fuse_reply_err(req, EINVAL);
 #else
-        res = syscall(SYS_renameat2, parent_inode->fd, name,
-                        newparent_inode->fd, newname, flags);
+        res = syscall(SYS_renameat2, parent_fd.fd, name,
+                        newparent_fd.fd, newname, flags);
         if (res == -1 && errno == ENOSYS) {
             fuse_reply_err(req, EINVAL);
         } else {
@@ -1468,7 +1533,7 @@ static void lo_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    res = renameat(parent_inode->fd, name, newparent_inode->fd, newname);
+    res = renameat(parent_fd.fd, name, newparent_fd.fd, newname);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
 out:
@@ -1953,6 +2018,7 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int fd = -1;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *parent_inode;
@@ -1975,6 +2041,12 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
+    err = lo_inode_fd(parent_inode, &parent_fd);
+    if (err < 0) {
+        err = -err;
+        goto out;
+    }
+
     err = lo_change_cred(req, &old);
     if (err) {
         goto out;
@@ -1983,7 +2055,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
     /* Try to create a new file but don't open existing files */
-    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+    fd = openat(parent_fd.fd, name, fi->flags | O_CREAT | O_EXCL, mode);
     err = fd == -1 ? errno : 0;
 
     lo_restore_cred(&old);
@@ -2788,6 +2860,7 @@ static int xattr_map_server(const struct lo_data *lo, const char *server_name,
 static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -2850,7 +2923,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         ret = fgetxattr(fd, name, value, size);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = getxattr(procname, name, value, size);
@@ -2887,6 +2965,7 @@ out:
 
 static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_data *lo = lo_data(req);
     g_autofree char *value = NULL;
     char procname[64];
@@ -2924,7 +3003,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         }
         ret = flistxattr(fd, value, size);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = listxattr(procname, value, size);
@@ -3013,6 +3097,7 @@ out:
 static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
                         const char *value, size_t size, int flags)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3058,7 +3143,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         }
         ret = fsetxattr(fd, name, value, size, flags);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = setxattr(procname, name, value, size, flags);
@@ -3079,6 +3169,7 @@ out:
 
 static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     char procname[64];
     const char *name;
     char *mapped_name;
@@ -3124,7 +3215,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
         }
         ret = fremovexattr(fd, name);
     } else {
-        sprintf(procname, "%i", inode->fd);
+        ret = lo_inode_fd(inode, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
+            goto out;
+        }
+        sprintf(procname, "%i", inode_fd.fd);
         /* fchdir should not fail here */
         FCHDIR_NOFAIL(lo->proc_self_fd);
         ret = removexattr(procname, name);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 4/9] virtiofsd: Let lo_fd() return a TempFd
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Accessing lo_inode.fd must generally happen through lo_inode_fd(), and
lo_fd() is no exception; and then it must pass on the TempFd it has
received from lo_inode_fd().

(Note that all lo_fd() calls now use proper error handling, where all of
them were in-line before; i.e. they were used in place of the fd
argument of some function call.  This only worked because the only error
that could occur was that lo_inode() failed to find the inode ID: Then
-1 would be passed as the fd, which would result in an EBADF error,
which is precisely what we would want to return to the guest for an
invalid inode ID.
Now, though, lo_inode_fd() might potentially invoke open_by_handle_at(),
which can return many different errors, and they should be properly
handled and returned to the guest.  So we can no longer allow lo_fd() to
be used in-line, and instead need to do proper error handling for it.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 55 +++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 46c9dfe200..8f64bcd6c5 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -644,18 +644,19 @@ static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
  * they are done with the fd.  This will be done in a later patch to make
  * review easier.
  */
-static int lo_fd(fuse_req_t req, fuse_ino_t ino)
+static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 {
     struct lo_inode *inode = lo_inode(req, ino);
-    int fd;
+    int res;
 
     if (!inode) {
-        return -1;
+        return -EBADF;
     }
 
-    fd = inode->fd;
+    res = lo_inode_fd(inode, tfd);
+
     lo_inode_put(lo_data(req), &inode);
-    return fd;
+    return res;
 }
 
 /*
@@ -766,14 +767,19 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct stat buf;
     struct lo_data *lo = lo_data(req);
 
     (void)fi;
 
-    res =
-        fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = fstatat(ino_fd.fd, "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -1441,6 +1447,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1455,13 +1462,19 @@ static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+    res = unlinkat(parent_fd.fd, name, AT_REMOVEDIR);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1547,6 +1560,7 @@ out:
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1561,13 +1575,19 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, 0);
+    res = unlinkat(parent_fd.fd, name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1647,10 +1667,16 @@ static void lo_forget_multi(fuse_req_t req, size_t count,
 
 static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     char buf[PATH_MAX + 1];
     int res;
 
-    res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = readlinkat(ino_fd.fd, "", buf, sizeof(buf));
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -2447,10 +2473,17 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct statvfs stbuf;
 
-    res = fstatvfs(lo_fd(req, ino), &stbuf);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
+    res = fstatvfs(ino_fd.fd, &stbuf);
     if (res == -1) {
         fuse_reply_err(req, errno);
     } else {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 4/9] virtiofsd: Let lo_fd() return a TempFd
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

Accessing lo_inode.fd must generally happen through lo_inode_fd(), and
lo_fd() is no exception; and then it must pass on the TempFd it has
received from lo_inode_fd().

(Note that all lo_fd() calls now use proper error handling, where all of
them were in-line before; i.e. they were used in place of the fd
argument of some function call.  This only worked because the only error
that could occur was that lo_inode() failed to find the inode ID: Then
-1 would be passed as the fd, which would result in an EBADF error,
which is precisely what we would want to return to the guest for an
invalid inode ID.
Now, though, lo_inode_fd() might potentially invoke open_by_handle_at(),
which can return many different errors, and they should be properly
handled and returned to the guest.  So we can no longer allow lo_fd() to
be used in-line, and instead need to do proper error handling for it.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 55 +++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 46c9dfe200..8f64bcd6c5 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -644,18 +644,19 @@ static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
  * they are done with the fd.  This will be done in a later patch to make
  * review easier.
  */
-static int lo_fd(fuse_req_t req, fuse_ino_t ino)
+static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 {
     struct lo_inode *inode = lo_inode(req, ino);
-    int fd;
+    int res;
 
     if (!inode) {
-        return -1;
+        return -EBADF;
     }
 
-    fd = inode->fd;
+    res = lo_inode_fd(inode, tfd);
+
     lo_inode_put(lo_data(req), &inode);
-    return fd;
+    return res;
 }
 
 /*
@@ -766,14 +767,19 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct stat buf;
     struct lo_data *lo = lo_data(req);
 
     (void)fi;
 
-    res =
-        fstatat(lo_fd(req, ino), "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = fstatat(ino_fd.fd, "", &buf, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -1441,6 +1447,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1455,13 +1462,19 @@ static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, AT_REMOVEDIR);
+    res = unlinkat(parent_fd.fd, name, AT_REMOVEDIR);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1547,6 +1560,7 @@ out:
 
 static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
 {
+    g_auto(TempFd) parent_fd = TEMP_FD_INIT;
     int res;
     struct lo_inode *inode;
     struct lo_data *lo = lo_data(req);
@@ -1561,13 +1575,19 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
+    res = lo_fd(req, parent, &parent_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
     inode = lookup_name(req, parent, name);
     if (!inode) {
         fuse_reply_err(req, EIO);
         return;
     }
 
-    res = unlinkat(lo_fd(req, parent), name, 0);
+    res = unlinkat(parent_fd.fd, name, 0);
 
     fuse_reply_err(req, res == -1 ? errno : 0);
     unref_inode_lolocked(lo, inode, 1);
@@ -1647,10 +1667,16 @@ static void lo_forget_multi(fuse_req_t req, size_t count,
 
 static void lo_readlink(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     char buf[PATH_MAX + 1];
     int res;
 
-    res = readlinkat(lo_fd(req, ino), "", buf, sizeof(buf));
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        return (void)fuse_reply_err(req, -res);
+    }
+
+    res = readlinkat(ino_fd.fd, "", buf, sizeof(buf));
     if (res == -1) {
         return (void)fuse_reply_err(req, errno);
     }
@@ -2447,10 +2473,17 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t ino,
 
 static void lo_statfs(fuse_req_t req, fuse_ino_t ino)
 {
+    g_auto(TempFd) ino_fd = TEMP_FD_INIT;
     int res;
     struct statvfs stbuf;
 
-    res = fstatvfs(lo_fd(req, ino), &stbuf);
+    res = lo_fd(req, ino, &ino_fd);
+    if (res < 0) {
+        fuse_reply_err(req, -res);
+        return;
+    }
+
+    res = fstatvfs(ino_fd.fd, &stbuf);
     if (res == -1) {
         fuse_reply_err(req, errno);
     } else {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 5/9] virtiofsd: Let lo_inode_open() return a TempFd
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Strictly speaking, this is not necessary, because lo_inode_open() will
always return a new FD owned by the caller, so TempFd.owned will always
be true.

However, auto-cleanup is nice, and in some cases this plays nicely with
an lo_inode_fd() call in another conditional branch (see lo_setattr()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 137 +++++++++++++------------------
 1 file changed, 59 insertions(+), 78 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 8f64bcd6c5..3014e8baf8 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -285,10 +285,8 @@ static void temp_fd_clear(TempFd *temp_fd)
 /**
  * Return an owned fd from *temp_fd that will not be closed when
  * *temp_fd goes out of scope.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+static int temp_fd_steal(TempFd *temp_fd)
 {
     if (temp_fd->owned) {
         temp_fd->owned = false;
@@ -667,9 +665,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
  * when a malicious client opens special files such as block device nodes.
  * Symlink inodes are also rejected since symlinks must already have been
  * traversed on the client side.
+ *
+ * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
+ * otherwise.
  */
-static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
-                         int open_flags)
+static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
+                         int open_flags, TempFd *tfd)
 {
     g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
     int fd;
@@ -688,7 +689,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
     if (fd < 0) {
         return -errno;
     }
-    return fd;
+
+    *tfd = (TempFd) {
+        .fd = fd,
+        .owned = true,
+    };
+
+    return 0;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -820,7 +827,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    res = lo_inode_fd(inode, &inode_fd);
+    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
+        /* We need an O_RDWR FD for ftruncate() */
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    } else {
+        res = lo_inode_fd(inode, &inode_fd);
+    }
     if (res < 0) {
         saverr = -res;
         goto out_err;
@@ -868,18 +880,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            truncfd = lo_inode_open(lo, inode, O_RDWR);
-            if (truncfd < 0) {
-                saverr = -truncfd;
-                goto out_err;
-            }
+            truncfd = inode_fd.fd;
         }
 
         saverr = drop_security_capability(lo, truncfd);
         if (saverr) {
-            if (!fi) {
-                close(truncfd);
-            }
             goto out_err;
         }
 
@@ -887,9 +892,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
             if (res != 0) {
                 saverr = res;
-                if (!fi) {
-                    close(truncfd);
-                }
                 goto out_err;
             }
         }
@@ -902,9 +904,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (!fi) {
-            close(truncfd);
-        }
         if (res == -1) {
             goto out_err;
         }
@@ -1734,11 +1733,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_dirp *d = NULL;
-    int fd;
+    int res;
     ssize_t fh;
 
     inode = lo_inode(req, ino);
@@ -1752,13 +1752,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         goto out_err;
     }
 
-    fd = lo_inode_open(lo, inode, O_RDONLY);
-    if (fd < 0) {
-        error = -fd;
+    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+    if (res < 0) {
+        error = -res;
         goto out_err;
     }
 
-    d->dp = fdopendir(fd);
+    d->dp = fdopendir(temp_fd_steal(&inode_fd));
     if (d->dp == NULL) {
         goto out_errno;
     }
@@ -1788,8 +1788,6 @@ out_err:
     if (d) {
         if (d->dp) {
             closedir(d->dp);
-        } else if (fd != -1) {
-            close(fd);
         }
         free(d);
     }
@@ -1989,6 +1987,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
 static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
                       int existing_fd, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     ssize_t fh;
     int fd = existing_fd;
     int err;
@@ -2005,16 +2004,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
             }
         }
 
-        fd = lo_inode_open(lo, inode, fi->flags);
+        err = lo_inode_open(lo, inode, fi->flags, &inode_fd);
 
         if (cap_fsetid_dropped) {
             if (gain_effective_cap("FSETID")) {
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (fd < 0) {
-            return -fd;
+        if (err < 0) {
+            return -err;
         }
+        fd = temp_fd_steal(&inode_fd);
+
         if (fi->flags & (O_TRUNC)) {
             int err = drop_security_capability(lo, fd);
             if (err) {
@@ -2124,8 +2125,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       uint64_t lock_owner,
                                                       pid_t pid, int *err)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode_plock *plock;
-    int fd;
+    int res;
 
     plock =
         g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
@@ -2142,15 +2144,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 
     /* Open another instance of file which can be used for ofd locks. */
     /* TODO: What if file is not writable? */
-    fd = lo_inode_open(lo, inode, O_RDWR);
-    if (fd < 0) {
-        *err = -fd;
+    res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    if (res < 0) {
+        *err = -res;
         free(plock);
         return NULL;
     }
 
     plock->lock_owner = lock_owner;
-    plock->fd = fd;
+    plock->fd = temp_fd_steal(&inode_fd);
     g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
                         plock);
     return plock;
@@ -2366,6 +2368,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode *inode = lo_inode(req, ino);
     struct lo_data *lo = lo_data(req);
     int res;
@@ -2380,11 +2383,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     }
 
     if (!fi) {
-        fd = lo_inode_open(lo, inode, O_RDWR);
-        if (fd < 0) {
-            res = -fd;
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+        if (res < 0) {
+            res = -res;
             goto out;
         }
+        fd = inode_fd.fd;
     } else {
         fd = lo_fi_fd(req, fi);
     }
@@ -2394,9 +2398,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd) == -1 ? errno : 0;
     }
-    if (!fi) {
-        close(fd);
-    }
 out:
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, res);
@@ -2902,7 +2903,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -2949,12 +2949,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fgetxattr(fd, name, value, size);
+        ret = fgetxattr(inode_fd.fd, name, value, size);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -2981,10 +2981,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3005,7 +3001,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -3029,12 +3024,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = flistxattr(fd, value, size);
+        ret = flistxattr(inode_fd.fd, value, size);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3113,10 +3108,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3138,7 +3129,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -3169,12 +3159,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fsetxattr(fd, name, value, size, flags);
+        ret = fsetxattr(inode_fd.fd, name, value, size, flags);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3191,10 +3181,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     saverr = ret == -1 ? errno : 0;
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
@@ -3210,7 +3196,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -3241,12 +3226,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
              name);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fremovexattr(fd, name);
+        ret = fremovexattr(inode_fd.fd, name);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3263,10 +3248,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     saverr = ret == -1 ? errno : 0;
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 5/9] virtiofsd: Let lo_inode_open() return a TempFd
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

Strictly speaking, this is not necessary, because lo_inode_open() will
always return a new FD owned by the caller, so TempFd.owned will always
be true.

However, auto-cleanup is nice, and in some cases this plays nicely with
an lo_inode_fd() call in another conditional branch (see lo_setattr()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 137 +++++++++++++------------------
 1 file changed, 59 insertions(+), 78 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 8f64bcd6c5..3014e8baf8 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -285,10 +285,8 @@ static void temp_fd_clear(TempFd *temp_fd)
 /**
  * Return an owned fd from *temp_fd that will not be closed when
  * *temp_fd goes out of scope.
- *
- * (TODO: Remove __attribute__ once this is used.)
  */
-static __attribute__((unused)) int temp_fd_steal(TempFd *temp_fd)
+static int temp_fd_steal(TempFd *temp_fd)
 {
     if (temp_fd->owned) {
         temp_fd->owned = false;
@@ -667,9 +665,12 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
  * when a malicious client opens special files such as block device nodes.
  * Symlink inodes are also rejected since symlinks must already have been
  * traversed on the client side.
+ *
+ * The fd is returned in tfd->fd.  The return value is 0 on success and -errno
+ * otherwise.
  */
-static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
-                         int open_flags)
+static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
+                         int open_flags, TempFd *tfd)
 {
     g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
     int fd;
@@ -688,7 +689,13 @@ static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
     if (fd < 0) {
         return -errno;
     }
-    return fd;
+
+    *tfd = (TempFd) {
+        .fd = fd,
+        .owned = true,
+    };
+
+    return 0;
 }
 
 static void lo_init(void *userdata, struct fuse_conn_info *conn)
@@ -820,7 +827,12 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         return;
     }
 
-    res = lo_inode_fd(inode, &inode_fd);
+    if (!fi && (valid & FUSE_SET_ATTR_SIZE)) {
+        /* We need an O_RDWR FD for ftruncate() */
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    } else {
+        res = lo_inode_fd(inode, &inode_fd);
+    }
     if (res < 0) {
         saverr = -res;
         goto out_err;
@@ -868,18 +880,11 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
         if (fi) {
             truncfd = fd;
         } else {
-            truncfd = lo_inode_open(lo, inode, O_RDWR);
-            if (truncfd < 0) {
-                saverr = -truncfd;
-                goto out_err;
-            }
+            truncfd = inode_fd.fd;
         }
 
         saverr = drop_security_capability(lo, truncfd);
         if (saverr) {
-            if (!fi) {
-                close(truncfd);
-            }
             goto out_err;
         }
 
@@ -887,9 +892,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
             res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
             if (res != 0) {
                 saverr = res;
-                if (!fi) {
-                    close(truncfd);
-                }
                 goto out_err;
             }
         }
@@ -902,9 +904,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (!fi) {
-            close(truncfd);
-        }
         if (res == -1) {
             goto out_err;
         }
@@ -1734,11 +1733,12 @@ static struct lo_dirp *lo_dirp(fuse_req_t req, struct fuse_file_info *fi)
 static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     int error = ENOMEM;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode;
     struct lo_dirp *d = NULL;
-    int fd;
+    int res;
     ssize_t fh;
 
     inode = lo_inode(req, ino);
@@ -1752,13 +1752,13 @@ static void lo_opendir(fuse_req_t req, fuse_ino_t ino,
         goto out_err;
     }
 
-    fd = lo_inode_open(lo, inode, O_RDONLY);
-    if (fd < 0) {
-        error = -fd;
+    res = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+    if (res < 0) {
+        error = -res;
         goto out_err;
     }
 
-    d->dp = fdopendir(fd);
+    d->dp = fdopendir(temp_fd_steal(&inode_fd));
     if (d->dp == NULL) {
         goto out_errno;
     }
@@ -1788,8 +1788,6 @@ out_err:
     if (d) {
         if (d->dp) {
             closedir(d->dp);
-        } else if (fd != -1) {
-            close(fd);
         }
         free(d);
     }
@@ -1989,6 +1987,7 @@ static void update_open_flags(int writeback, int allow_direct_io,
 static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
                       int existing_fd, struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     ssize_t fh;
     int fd = existing_fd;
     int err;
@@ -2005,16 +2004,18 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
             }
         }
 
-        fd = lo_inode_open(lo, inode, fi->flags);
+        err = lo_inode_open(lo, inode, fi->flags, &inode_fd);
 
         if (cap_fsetid_dropped) {
             if (gain_effective_cap("FSETID")) {
                 fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
             }
         }
-        if (fd < 0) {
-            return -fd;
+        if (err < 0) {
+            return -err;
         }
+        fd = temp_fd_steal(&inode_fd);
+
         if (fi->flags & (O_TRUNC)) {
             int err = drop_security_capability(lo, fd);
             if (err) {
@@ -2124,8 +2125,9 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
                                                       uint64_t lock_owner,
                                                       pid_t pid, int *err)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode_plock *plock;
-    int fd;
+    int res;
 
     plock =
         g_hash_table_lookup(inode->posix_locks, GUINT_TO_POINTER(lock_owner));
@@ -2142,15 +2144,15 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
 
     /* Open another instance of file which can be used for ofd locks. */
     /* TODO: What if file is not writable? */
-    fd = lo_inode_open(lo, inode, O_RDWR);
-    if (fd < 0) {
-        *err = -fd;
+    res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+    if (res < 0) {
+        *err = -res;
         free(plock);
         return NULL;
     }
 
     plock->lock_owner = lock_owner;
-    plock->fd = fd;
+    plock->fd = temp_fd_steal(&inode_fd);
     g_hash_table_insert(inode->posix_locks, GUINT_TO_POINTER(plock->lock_owner),
                         plock);
     return plock;
@@ -2366,6 +2368,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
 static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
                      struct fuse_file_info *fi)
 {
+    g_auto(TempFd) inode_fd = TEMP_FD_INIT;
     struct lo_inode *inode = lo_inode(req, ino);
     struct lo_data *lo = lo_data(req);
     int res;
@@ -2380,11 +2383,12 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     }
 
     if (!fi) {
-        fd = lo_inode_open(lo, inode, O_RDWR);
-        if (fd < 0) {
-            res = -fd;
+        res = lo_inode_open(lo, inode, O_RDWR, &inode_fd);
+        if (res < 0) {
+            res = -res;
             goto out;
         }
+        fd = inode_fd.fd;
     } else {
         fd = lo_fi_fd(req, fi);
     }
@@ -2394,9 +2398,6 @@ static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
     } else {
         res = fsync(fd) == -1 ? errno : 0;
     }
-    if (!fi) {
-        close(fd);
-    }
 out:
     lo_inode_put(lo, &inode);
     fuse_reply_err(req, res);
@@ -2902,7 +2903,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -2949,12 +2949,12 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
      * Otherwise, call fchdir() to avoid open().
      */
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fgetxattr(fd, name, value, size);
+        ret = fgetxattr(inode_fd.fd, name, value, size);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -2981,10 +2981,6 @@ static void lo_getxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3005,7 +3001,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     inode = lo_inode(req, ino);
     if (!inode) {
@@ -3029,12 +3024,12 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
     }
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = flistxattr(fd, value, size);
+        ret = flistxattr(inode_fd.fd, value, size);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3113,10 +3108,6 @@ static void lo_listxattr(fuse_req_t req, fuse_ino_t ino, size_t size)
         fuse_reply_xattr(req, ret);
     }
 out_free:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     return;
 
@@ -3138,7 +3129,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -3169,12 +3159,12 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
              ", name=%s value=%s size=%zd)\n", ino, name, value, size);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fsetxattr(fd, name, value, size, flags);
+        ret = fsetxattr(inode_fd.fd, name, value, size, flags);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3191,10 +3181,6 @@ static void lo_setxattr(fuse_req_t req, fuse_ino_t ino, const char *in_name,
     saverr = ret == -1 ? errno : 0;
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
@@ -3210,7 +3196,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     struct lo_inode *inode;
     ssize_t ret;
     int saverr;
-    int fd = -1;
 
     mapped_name = NULL;
     name = in_name;
@@ -3241,12 +3226,12 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
              name);
 
     if (S_ISREG(inode->filetype) || S_ISDIR(inode->filetype)) {
-        fd = lo_inode_open(lo, inode, O_RDONLY);
-        if (fd < 0) {
-            saverr = -fd;
+        ret = lo_inode_open(lo, inode, O_RDONLY, &inode_fd);
+        if (ret < 0) {
+            saverr = -ret;
             goto out;
         }
-        ret = fremovexattr(fd, name);
+        ret = fremovexattr(inode_fd.fd, name);
     } else {
         ret = lo_inode_fd(inode, &inode_fd);
         if (ret < 0) {
@@ -3263,10 +3248,6 @@ static void lo_removexattr(fuse_req_t req, fuse_ino_t ino, const char *in_name)
     saverr = ret == -1 ? errno : 0;
 
 out:
-    if (fd >= 0) {
-        close(fd);
-    }
-
     lo_inode_put(lo, &inode);
     g_free(mapped_name);
     fuse_reply_err(req, saverr);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 6/9] virtiofsd: Add lo_inode.fhandle
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

This new field is an alternative to lo_inode.fd: Either of the two must
be set.  In case an O_PATH FD is needed for some lo_inode, it is either
taken from lo_inode.fd, if valid, or a temporary FD is opened with
open_by_handle_at().

Using a file handle instead of an FD has the advantage of keeping the
number of open file descriptors low.

Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
opened on the filesystem to which the file handle refers), but every
lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
keep a hash map of such FDs in mount_fds (mapping ID to FD).
get_file_handle(), which is added by a later patch, will ensure that
every mount ID for which we have generated a handle has a corresponding
entry in mount_fds.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c      | 116 ++++++++++++++++++++++----
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 2 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3014e8baf8..e665575401 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -88,8 +88,25 @@ struct lo_key {
     uint64_t mnt_id;
 };
 
+struct lo_fhandle {
+    union {
+        struct file_handle handle;
+        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
+    };
+    int mount_id;
+};
+
+/* Maps mount IDs to an FD that we can pass to open_by_handle_at() */
+static GHashTable *mount_fds;
+pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
+
 struct lo_inode {
+    /*
+     * Either of fd or fhandle must be set (i.e. >= 0 or non-NULL,
+     * respectively).
+     */
     int fd;
+    struct lo_fhandle *fhandle;
 
     /*
      * Atomic reference count for this object.  The nlookup field holds a
@@ -296,6 +313,44 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Open the given file handle with the given flags.
+ *
+ * The mount FD to pass to open_by_handle_at() is taken from the
+ * mount_fds hash map.
+ *
+ * On error, return -errno.
+ */
+static int open_file_handle(const struct lo_fhandle *fh, int flags)
+{
+    gpointer mount_fd_ptr;
+    int mount_fd;
+    bool found;
+    int ret;
+
+    ret = pthread_rwlock_rdlock(&mount_fds_lock);
+    if (ret) {
+        return -ret;
+    }
+
+    /* mount_fd == 0 is valid, so we need lookup_extended */
+    found = g_hash_table_lookup_extended(mount_fds,
+                                         GINT_TO_POINTER(fh->mount_id),
+                                         NULL, &mount_fd_ptr);
+    pthread_rwlock_unlock(&mount_fds_lock);
+    if (!found) {
+        return -EINVAL;
+    }
+    mount_fd = GPOINTER_TO_INT(mount_fd_ptr);
+
+    ret = open_by_handle_at(mount_fd, (struct file_handle *)&fh->handle, flags);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    return ret;
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
@@ -602,7 +657,11 @@ static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
     *inodep = NULL;
 
     if (g_atomic_int_dec_and_test(&inode->refcount)) {
-        close(inode->fd);
+        if (inode->fd >= 0) {
+            close(inode->fd);
+        } else {
+            g_free(inode->fhandle);
+        }
         free(inode);
     }
 }
@@ -629,10 +688,25 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 
 static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
 {
-    *tfd = (TempFd) {
-        .fd = inode->fd,
-        .owned = false,
-    };
+    if (inode->fd >= 0) {
+        *tfd = (TempFd) {
+            .fd = inode->fd,
+            .owned = false,
+        };
+    } else {
+        int fd;
+
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, O_PATH);
+        if (fd < 0) {
+            return -errno;
+        }
+
+        *tfd = (TempFd) {
+            .fd = fd,
+            .owned = true,
+        };
+    }
 
     return 0;
 }
@@ -672,22 +746,32 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
                          int open_flags, TempFd *tfd)
 {
-    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+    g_autofree char *fd_str = NULL;
     int fd;
 
     if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
         return -EBADF;
     }
 
-    /*
-     * The file is a symlink so O_NOFOLLOW must be ignored. We checked earlier
-     * that the inode is not a special file but if an external process races
-     * with us then symlinks are traversed here. It is not possible to escape
-     * the shared directory since it is mounted as "/" though.
-     */
-    fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
-    if (fd < 0) {
-        return -errno;
+    if (inode->fd >= 0) {
+        /*
+         * The file is a symlink so O_NOFOLLOW must be ignored. We checked
+         * earlier that the inode is not a special file but if an external
+         * process races with us then symlinks are traversed here. It is not
+         * possible to escape the shared directory since it is mounted as "/"
+         * though.
+         */
+        fd_str = g_strdup_printf("%d", inode->fd);
+        fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
+        if (fd < 0) {
+            return -errno;
+        }
+    } else {
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, open_flags);
+        if (fd < 0) {
+            return fd;
+        }
     }
 
     *tfd = (TempFd) {
@@ -3911,6 +3995,8 @@ int main(int argc, char *argv[])
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
 
+    mount_fds = g_hash_table_new(NULL, NULL);
+
     /*
      * Set up the ino map like this:
      * [0] Reserved (will not be used)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 62441cfcdb..e948f25ac1 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -77,6 +77,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(statx),
     SCMP_SYS(open),
     SCMP_SYS(openat),
+    SCMP_SYS(open_by_handle_at),
     SCMP_SYS(ppoll),
     SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
     SCMP_SYS(preadv),
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 6/9] virtiofsd: Add lo_inode.fhandle
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

This new field is an alternative to lo_inode.fd: Either of the two must
be set.  In case an O_PATH FD is needed for some lo_inode, it is either
taken from lo_inode.fd, if valid, or a temporary FD is opened with
open_by_handle_at().

Using a file handle instead of an FD has the advantage of keeping the
number of open file descriptors low.

Because open_by_handle_at() requires a mount FD (i.e. a non-O_PATH FD
opened on the filesystem to which the file handle refers), but every
lo_fhandle only has a mount ID (as returned by name_to_handle_at()), we
keep a hash map of such FDs in mount_fds (mapping ID to FD).
get_file_handle(), which is added by a later patch, will ensure that
every mount ID for which we have generated a handle has a corresponding
entry in mount_fds.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c      | 116 ++++++++++++++++++++++----
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 2 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3014e8baf8..e665575401 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -88,8 +88,25 @@ struct lo_key {
     uint64_t mnt_id;
 };
 
+struct lo_fhandle {
+    union {
+        struct file_handle handle;
+        char padding[sizeof(struct file_handle) + MAX_HANDLE_SZ];
+    };
+    int mount_id;
+};
+
+/* Maps mount IDs to an FD that we can pass to open_by_handle_at() */
+static GHashTable *mount_fds;
+pthread_rwlock_t mount_fds_lock = PTHREAD_RWLOCK_INITIALIZER;
+
 struct lo_inode {
+    /*
+     * Either of fd or fhandle must be set (i.e. >= 0 or non-NULL,
+     * respectively).
+     */
     int fd;
+    struct lo_fhandle *fhandle;
 
     /*
      * Atomic reference count for this object.  The nlookup field holds a
@@ -296,6 +313,44 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Open the given file handle with the given flags.
+ *
+ * The mount FD to pass to open_by_handle_at() is taken from the
+ * mount_fds hash map.
+ *
+ * On error, return -errno.
+ */
+static int open_file_handle(const struct lo_fhandle *fh, int flags)
+{
+    gpointer mount_fd_ptr;
+    int mount_fd;
+    bool found;
+    int ret;
+
+    ret = pthread_rwlock_rdlock(&mount_fds_lock);
+    if (ret) {
+        return -ret;
+    }
+
+    /* mount_fd == 0 is valid, so we need lookup_extended */
+    found = g_hash_table_lookup_extended(mount_fds,
+                                         GINT_TO_POINTER(fh->mount_id),
+                                         NULL, &mount_fd_ptr);
+    pthread_rwlock_unlock(&mount_fds_lock);
+    if (!found) {
+        return -EINVAL;
+    }
+    mount_fd = GPOINTER_TO_INT(mount_fd_ptr);
+
+    ret = open_by_handle_at(mount_fd, (struct file_handle *)&fh->handle, flags);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    return ret;
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
@@ -602,7 +657,11 @@ static void lo_inode_put(struct lo_data *lo, struct lo_inode **inodep)
     *inodep = NULL;
 
     if (g_atomic_int_dec_and_test(&inode->refcount)) {
-        close(inode->fd);
+        if (inode->fd >= 0) {
+            close(inode->fd);
+        } else {
+            g_free(inode->fhandle);
+        }
         free(inode);
     }
 }
@@ -629,10 +688,25 @@ static struct lo_inode *lo_inode(fuse_req_t req, fuse_ino_t ino)
 
 static int lo_inode_fd(const struct lo_inode *inode, TempFd *tfd)
 {
-    *tfd = (TempFd) {
-        .fd = inode->fd,
-        .owned = false,
-    };
+    if (inode->fd >= 0) {
+        *tfd = (TempFd) {
+            .fd = inode->fd,
+            .owned = false,
+        };
+    } else {
+        int fd;
+
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, O_PATH);
+        if (fd < 0) {
+            return -errno;
+        }
+
+        *tfd = (TempFd) {
+            .fd = fd,
+            .owned = true,
+        };
+    }
 
     return 0;
 }
@@ -672,22 +746,32 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino, TempFd *tfd)
 static int lo_inode_open(const struct lo_data *lo, const struct lo_inode *inode,
                          int open_flags, TempFd *tfd)
 {
-    g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+    g_autofree char *fd_str = NULL;
     int fd;
 
     if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
         return -EBADF;
     }
 
-    /*
-     * The file is a symlink so O_NOFOLLOW must be ignored. We checked earlier
-     * that the inode is not a special file but if an external process races
-     * with us then symlinks are traversed here. It is not possible to escape
-     * the shared directory since it is mounted as "/" though.
-     */
-    fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
-    if (fd < 0) {
-        return -errno;
+    if (inode->fd >= 0) {
+        /*
+         * The file is a symlink so O_NOFOLLOW must be ignored. We checked
+         * earlier that the inode is not a special file but if an external
+         * process races with us then symlinks are traversed here. It is not
+         * possible to escape the shared directory since it is mounted as "/"
+         * though.
+         */
+        fd_str = g_strdup_printf("%d", inode->fd);
+        fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
+        if (fd < 0) {
+            return -errno;
+        }
+    } else {
+        assert(inode->fhandle != NULL);
+        fd = open_file_handle(inode->fhandle, open_flags);
+        if (fd < 0) {
+            return fd;
+        }
     }
 
     *tfd = (TempFd) {
@@ -3911,6 +3995,8 @@ int main(int argc, char *argv[])
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
 
+    mount_fds = g_hash_table_new(NULL, NULL);
+
     /*
      * Set up the ino map like this:
      * [0] Reserved (will not be used)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 62441cfcdb..e948f25ac1 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -77,6 +77,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(statx),
     SCMP_SYS(open),
     SCMP_SYS(openat),
+    SCMP_SYS(open_by_handle_at),
     SCMP_SYS(ppoll),
     SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
     SCMP_SYS(preadv),
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
its inode ID will remain in use until we drop our lo_inode (and
lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
the inode ID as an lo_inode key, because any inode with an inode ID we
find in lo_data.inodes (on the same filesystem) must be the exact same
file.

This will change when we start setting lo_inode.fhandle so we do not
have to keep an O_PATH FD open.  Then, unlinking such an inode will
immediately remove it, so its ID can then be reused by newly created
files, even while the lo_inode object is still there[1].

So creating a new file can then reuse the old file's inode ID, and
looking up the new file would lead to us finding the old file's
lo_inode, which is not ideal.

Luckily, just as file handles cause this problem, they also solve it:  A
file handle contains a generation ID, which changes when an inode ID is
reused, so the new file can be distinguished from the old one.  So all
we need to do is to add a second map besides lo_data.inodes that maps
file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.

Unfortunately, we cannot rely on being able to generate file handles
every time.  Therefore, we still enter every lo_inode object into
inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
potential inodes_by_handle entry then has precedence, the inodes_by_ids
entry is just a fallback.

Note that we do not generate lo_fhandle objects yet, and so we also do
not enter anything into the inodes_by_handle map yet.  Also, all lookups
skip that map.  We might manually create file handles with some code
that is immediately removed by the next patch again, but that would
break the assumption in lo_find() that every lo_inode with a non-NULL
.fhandle must have an entry in inodes_by_handle and vice versa.  So we
leave actually using the inodes_by_handle map for the next patch.

[1] If some application in the guest still has the file open, there is
going to be a corresponding FD mapping in lo_data.fd_map.  In such a
case, the inode will only go away once every application in the guest
has closed it.  The problem described only applies to cases where the
guest does not have the file open, and it is just in the dentry cache,
basically.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e665575401..793d2c333e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -179,7 +179,8 @@ struct lo_data {
     int announce_submounts;
     bool use_statx;
     struct lo_inode root;
-    GHashTable *inodes; /* protected by lo->mutex */
+    GHashTable *inodes_by_ids; /* protected by lo->mutex */
+    GHashTable *inodes_by_handle; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -257,8 +258,9 @@ static struct {
 /* That we loaded cap-ng in the current thread from the saved */
 static __thread bool cap_loaded = 0;
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id);
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id);
 static int xattr_map_client(const struct lo_data *lo, const char *client_name,
                             char **out_name);
 
@@ -1032,18 +1034,39 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id)
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
 {
-    struct lo_inode *p;
-    struct lo_key key = {
+    struct lo_inode *p = NULL;
+    struct lo_key ids_key = {
         .ino = st->st_ino,
         .dev = st->st_dev,
         .mnt_id = mnt_id,
     };
 
     pthread_mutex_lock(&lo->mutex);
-    p = g_hash_table_lookup(lo->inodes, &key);
+    if (fhandle) {
+        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
+    }
+    if (!p) {
+        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
+        /*
+         * When we had to fall back to looking up an inode by its IDs,
+         * ensure that we hit an entry that does not have a file
+         * handle.  Entries with file handles must also have a handle
+         * alt key, so if we have not found it by that handle alt key,
+         * we must have found an entry with a mismatching handle; i.e.
+         * an entry for a different file, even though it has the same
+         * inode ID.
+         * (This can happen when we look up a new file that has reused
+         * the inode ID of some previously unlinked inode for which we
+         * still have an lo_inode object.)
+         */
+        if (p && fhandle != NULL && p->fhandle != NULL) {
+            p = NULL;
+        }
+    }
     if (p) {
         assert(p->nlookup > 0);
         p->nlookup++;
@@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, &e->attr, mnt_id);
+    inode = lo_find(lo, NULL, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
     } else {
@@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
-        g_hash_table_insert(lo->inodes, &inode->key, inode);
+        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
-    return lo_find(lo, &attr, mnt_id);
+    return lo_find(lo, NULL, &attr, mnt_id);
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     inode->nlookup -= n;
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
-        g_hash_table_remove(lo->inodes, &inode->key);
+        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata)
         GHashTableIter iter;
         gpointer key, value;
 
-        g_hash_table_iter_init(&iter, lo->inodes);
+        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
         if (!g_hash_table_iter_next(&iter, &key, &value)) {
             break;
         }
@@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
     return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
 }
 
+static guint lo_fhandle_hash(gconstpointer key)
+{
+    const struct lo_fhandle *fh = key;
+    guint hash;
+    size_t i;
+
+    /* Basically g_str_hash() */
+    hash = 5381;
+    for (i = 0; i < sizeof(fh->padding); i++) {
+        hash += hash * 33 + (unsigned char)fh->padding[i];
+    }
+    hash += hash * 33 + fh->mount_id;
+
+    return hash;
+}
+
+static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
+{
+    return !memcmp(a, b, sizeof(struct lo_fhandle));
+}
+
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
-    if (lo->inodes) {
-        g_hash_table_destroy(lo->inodes);
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_ids);
+    }
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_handle);
     }
 
     if (lo->root.posix_locks) {
@@ -3990,7 +4037,8 @@ int main(int argc, char *argv[])
     qemu_init_exec_dir(argv[0]);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
its inode ID will remain in use until we drop our lo_inode (and
lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
the inode ID as an lo_inode key, because any inode with an inode ID we
find in lo_data.inodes (on the same filesystem) must be the exact same
file.

This will change when we start setting lo_inode.fhandle so we do not
have to keep an O_PATH FD open.  Then, unlinking such an inode will
immediately remove it, so its ID can then be reused by newly created
files, even while the lo_inode object is still there[1].

So creating a new file can then reuse the old file's inode ID, and
looking up the new file would lead to us finding the old file's
lo_inode, which is not ideal.

Luckily, just as file handles cause this problem, they also solve it:  A
file handle contains a generation ID, which changes when an inode ID is
reused, so the new file can be distinguished from the old one.  So all
we need to do is to add a second map besides lo_data.inodes that maps
file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.

Unfortunately, we cannot rely on being able to generate file handles
every time.  Therefore, we still enter every lo_inode object into
inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
potential inodes_by_handle entry then has precedence, the inodes_by_ids
entry is just a fallback.

Note that we do not generate lo_fhandle objects yet, and so we also do
not enter anything into the inodes_by_handle map yet.  Also, all lookups
skip that map.  We might manually create file handles with some code
that is immediately removed by the next patch again, but that would
break the assumption in lo_find() that every lo_inode with a non-NULL
.fhandle must have an entry in inodes_by_handle and vice versa.  So we
leave actually using the inodes_by_handle map for the next patch.

[1] If some application in the guest still has the file open, there is
going to be a corresponding FD mapping in lo_data.fd_map.  In such a
case, the inode will only go away once every application in the guest
has closed it.  The problem described only applies to cases where the
guest does not have the file open, and it is just in the dentry cache,
basically.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e665575401..793d2c333e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -179,7 +179,8 @@ struct lo_data {
     int announce_submounts;
     bool use_statx;
     struct lo_inode root;
-    GHashTable *inodes; /* protected by lo->mutex */
+    GHashTable *inodes_by_ids; /* protected by lo->mutex */
+    GHashTable *inodes_by_handle; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -257,8 +258,9 @@ static struct {
 /* That we loaded cap-ng in the current thread from the saved */
 static __thread bool cap_loaded = 0;
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id);
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id);
 static int xattr_map_client(const struct lo_data *lo, const char *client_name,
                             char **out_name);
 
@@ -1032,18 +1034,39 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
-                                uint64_t mnt_id)
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
 {
-    struct lo_inode *p;
-    struct lo_key key = {
+    struct lo_inode *p = NULL;
+    struct lo_key ids_key = {
         .ino = st->st_ino,
         .dev = st->st_dev,
         .mnt_id = mnt_id,
     };
 
     pthread_mutex_lock(&lo->mutex);
-    p = g_hash_table_lookup(lo->inodes, &key);
+    if (fhandle) {
+        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
+    }
+    if (!p) {
+        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
+        /*
+         * When we had to fall back to looking up an inode by its IDs,
+         * ensure that we hit an entry that does not have a file
+         * handle.  Entries with file handles must also have a handle
+         * alt key, so if we have not found it by that handle alt key,
+         * we must have found an entry with a mismatching handle; i.e.
+         * an entry for a different file, even though it has the same
+         * inode ID.
+         * (This can happen when we look up a new file that has reused
+         * the inode ID of some previously unlinked inode for which we
+         * still have an lo_inode object.)
+         */
+        if (p && fhandle != NULL && p->fhandle != NULL) {
+            p = NULL;
+        }
+    }
     if (p) {
         assert(p->nlookup > 0);
         p->nlookup++;
@@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, &e->attr, mnt_id);
+    inode = lo_find(lo, NULL, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
     } else {
@@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         }
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
-        g_hash_table_insert(lo->inodes, &inode->key, inode);
+        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
-    return lo_find(lo, &attr, mnt_id);
+    return lo_find(lo, NULL, &attr, mnt_id);
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     inode->nlookup -= n;
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
-        g_hash_table_remove(lo->inodes, &inode->key);
+        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata)
         GHashTableIter iter;
         gpointer key, value;
 
-        g_hash_table_iter_init(&iter, lo->inodes);
+        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
         if (!g_hash_table_iter_next(&iter, &key, &value)) {
             break;
         }
@@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
     return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
 }
 
+static guint lo_fhandle_hash(gconstpointer key)
+{
+    const struct lo_fhandle *fh = key;
+    guint hash;
+    size_t i;
+
+    /* Basically g_str_hash() */
+    hash = 5381;
+    for (i = 0; i < sizeof(fh->padding); i++) {
+        hash += hash * 33 + (unsigned char)fh->padding[i];
+    }
+    hash += hash * 33 + fh->mount_id;
+
+    return hash;
+}
+
+static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
+{
+    return !memcmp(a, b, sizeof(struct lo_fhandle));
+}
+
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
-    if (lo->inodes) {
-        g_hash_table_destroy(lo->inodes);
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_ids);
+    }
+    if (lo->inodes_by_ids) {
+        g_hash_table_destroy(lo->inodes_by_handle);
     }
 
     if (lo->root.posix_locks) {
@@ -3990,7 +4037,8 @@ int main(int argc, char *argv[])
     qemu_init_exec_dir(argv[0]);
 
     pthread_mutex_init(&lo.mutex, NULL);
-    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
+    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 8/9] virtiofsd: Optionally fill lo_inode.fhandle
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

When the inode_file_handles option is set, try to generate a file handle
for new inodes instead of opening an O_PATH FD.

Being able to open these again will require CAP_DAC_READ_SEARCH, so the
description text tells the user they will also need to specify
-o modcaps=+dac_read_search.

Generating a file handle returns the mount ID it is valid for.  Opening
it will require an FD instead.  We have mount_fds to map an ID to an FD.
get_file_handle() fills the hash map by opening the file we have
generated a handle for.  To verify that the resulting FD indeed
represents the handle's mount ID, we use statx().  Therefore, using file
handles requires statx() support.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 197 ++++++++++++++++++++++++--
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 3 files changed, 192 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 5e98ed702b..954f8639e6 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -186,6 +186,9 @@ void fuse_cmdline_help(void)
            "                               to virtiofsd from guest applications.\n"
            "                               default: no_allow_direct_io\n"
            "    -o announce_submounts      Announce sub-mount points to the guest\n"
+           "    -o inode_file_handles      Use file handles to reference inodes\n"
+           "                               instead of O_PATH file descriptors\n"
+           "                               (requires -o modcaps=+dac_read_search)\n"
            );
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 793d2c333e..2e56c40b2f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -190,6 +190,7 @@ struct lo_data {
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
     int user_killpriv_v2, killpriv_v2;
+    int inode_file_handles;
 };
 
 /**
@@ -244,6 +245,10 @@ static const struct fuse_opt lo_opts[] = {
     { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1 },
     { "killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 1 },
     { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
+    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
+    { "no_inode_file_handles",
+      offsetof(struct lo_data, inode_file_handles),
+      0 },
     FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -315,6 +320,135 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Generate a file handle for the given dirfd/name combination.
+ *
+ * If mount_fds does not yet contain an entry for the handle's mount
+ * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
+ * as the FD for that mount ID.  (That is the file that we have
+ * generated a handle for, so it should be representative for the
+ * mount ID.  However, to be sure (and to rule out races), we use
+ * statx() to verify that our assumption is correct.)
+ */
+static struct lo_fhandle *get_file_handle(struct lo_data *lo,
+                                          int dirfd, const char *name)
+{
+    /* We need statx() to verify the mount ID */
+#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
+    struct lo_fhandle *fh;
+    int ret;
+
+    if (!lo->use_statx || !lo->inode_file_handles) {
+        return NULL;
+    }
+
+    fh = g_new0(struct lo_fhandle, 1);
+
+    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
+    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
+                            AT_EMPTY_PATH);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if (pthread_rwlock_rdlock(&mount_fds_lock)) {
+        goto fail;
+    }
+    if (!g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+        struct statx stx;
+        char procname[64];
+        int fd;
+
+        pthread_rwlock_unlock(&mount_fds_lock);
+
+        /*
+         * Before opening an O_RDONLY fd, check whether dirfd/name is a regular
+         * file or directory, because we must not open anything else with
+         * anything but O_PATH.
+         * (And we use that occasion to verify that the file has the mount ID we
+         * need.)
+         */
+        if (name[0]) {
+            path_fd.fd = openat(dirfd, name, O_PATH);
+            if (path_fd.fd < 0) {
+                goto fail;
+            }
+            path_fd.owned = true;
+        } else {
+            path_fd.fd = dirfd;
+            path_fd.owned = false;
+        }
+
+        ret = statx(path_fd.fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+                    STATX_TYPE | STATX_MNT_ID, &stx);
+        if (ret < 0) {
+            if (errno == ENOSYS) {
+                lo->use_statx = false;
+                fuse_log(FUSE_LOG_WARNING,
+                         "statx() does not work: Will not be able to use file "
+                         "handles for inodes\n");
+            }
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
+            /*
+             * One reason for stx_mnt_id != mount_id could be that dirfd/name
+             * is a directory, and some other filesystem was mounted there
+             * between us generating the file handle and then opening the FD.
+             * (Other kinds of races might be possible, too.)
+             * Failing this function is not fatal, though, because our caller
+             * (lo_do_lookup()) will just fall back to opening an O_PATH FD to
+             * store in lo_inode.fd instead of storing a file handle in
+             * lo_inode.fhandle.  So we do not need to try too hard to get an
+             * FD for fh->mount_id so this function could succeed.
+             */
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_TYPE) ||
+            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
+        {
+            /*
+             * We must not open special files with anything but O_PATH, so we
+             * cannot use this file for mount_fds.
+             * Just return a failure in such a case and let the lo_inode have
+             * an O_PATH fd instead of a file handle.
+             */
+            goto fail;
+        }
+
+        /* Now that we know this fd is safe to open, do it */
+        snprintf(procname, sizeof(procname), "%i", path_fd.fd);
+        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (fd < 0) {
+            goto fail;
+        }
+
+        if (pthread_rwlock_wrlock(&mount_fds_lock)) {
+            goto fail;
+        }
+
+        /* Check again, might have changed */
+        if (g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+            close(fd);
+        } else {
+            g_hash_table_insert(mount_fds,
+                                GINT_TO_POINTER(fh->mount_id),
+                                GINT_TO_POINTER(fd));
+        }
+    }
+    pthread_rwlock_unlock(&mount_fds_lock);
+
+    return fh;
+
+fail:
+    free(fh);
+    return NULL;
+#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
+    return NULL;
+#endif
+}
+
 /**
  * Open the given file handle with the given flags.
  *
@@ -1132,6 +1266,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
             return -1;
         }
         lo->use_statx = false;
+        if (lo->inode_file_handles) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "statx() does not work: Will not be able to use file "
+                     "handles for inodes\n");
+        }
         /* fallback */
     }
 #endif
@@ -1161,6 +1300,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_fhandle *fh;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1190,13 +1330,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
-    if (newfd == -1) {
-        goto out_err;
-    }
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    if (fh) {
+        res = do_statx(lo, dir_fd.fd, name, &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    } else {
+        newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
+        if (newfd == -1) {
+            goto out_err;
+        }
 
-    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
-                   &mnt_id);
+        res = do_statx(lo, newfd, "", &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    }
     if (res == -1) {
         goto out_err;
     }
@@ -1206,9 +1352,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, NULL, &e->attr, mnt_id);
+    /*
+     * Note that fh is always NULL if lo->inode_file_handles is false,
+     * and so we will never do a lookup by file handle here, and
+     * lo->inodes_by_handle will always remain empty.  We only need
+     * this map when we do not have an O_PATH fd open for every
+     * lo_inode, though, so if inode_file_handles is false, we do not
+     * need that map anyway.
+     */
+    inode = lo_find(lo, fh, &e->attr, mnt_id);
     if (inode) {
-        close(newfd);
+        if (newfd != -1) {
+            close(newfd);
+        }
     } else {
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
@@ -1226,6 +1382,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
         inode->nlookup = 1;
         inode->fd = newfd;
+        inode->fhandle = fh;
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
@@ -1237,6 +1394,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
         g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
+        if (inode->fhandle) {
+            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
+        }
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1530,8 +1690,10 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     int res;
     uint64_t mnt_id;
     struct stat attr;
+    struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_inode *inode;
 
     if (!dir) {
         return NULL;
@@ -1542,13 +1704,19 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    /* Ignore errors, this is just an optional key for the lookup */
+
     res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     lo_inode_put(lo, &dir);
     if (res == -1) {
         return NULL;
     }
 
-    return lo_find(lo, NULL, &attr, mnt_id);
+    inode = lo_find(lo, fh, &attr, mnt_id);
+    g_free(fh);
+
+    return inode;
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1712,6 +1880,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes_by_ids, &inode->key);
+        if (inode->fhandle) {
+            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
+        }
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -4156,6 +4327,14 @@ int main(int argc, char *argv[])
 
     lo.use_statx = true;
 
+#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
+    if (lo.inode_file_handles) {
+        fuse_log(FUSE_LOG_WARNING,
+                 "No statx() or mount ID support: Will not be able to use file "
+                 "handles for inodes\n");
+    }
+#endif
+
     se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
     if (se == NULL) {
         goto err_out1;
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index e948f25ac1..ed23e67ba8 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(mprotect),
     SCMP_SYS(mremap),
     SCMP_SYS(munmap),
+    SCMP_SYS(name_to_handle_at),
     SCMP_SYS(newfstatat),
     SCMP_SYS(statx),
     SCMP_SYS(open),
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 8/9] virtiofsd: Optionally fill lo_inode.fhandle
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

When the inode_file_handles option is set, try to generate a file handle
for new inodes instead of opening an O_PATH FD.

Being able to open these again will require CAP_DAC_READ_SEARCH, so the
description text tells the user they will also need to specify
-o modcaps=+dac_read_search.

Generating a file handle returns the mount ID it is valid for.  Opening
it will require an FD instead.  We have mount_fds to map an ID to an FD.
get_file_handle() fills the hash map by opening the file we have
generated a handle for.  To verify that the resulting FD indeed
represents the handle's mount ID, we use statx().  Therefore, using file
handles requires statx() support.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/helper.c              |   3 +
 tools/virtiofsd/passthrough_ll.c      | 197 ++++++++++++++++++++++++--
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 3 files changed, 192 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 5e98ed702b..954f8639e6 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -186,6 +186,9 @@ void fuse_cmdline_help(void)
            "                               to virtiofsd from guest applications.\n"
            "                               default: no_allow_direct_io\n"
            "    -o announce_submounts      Announce sub-mount points to the guest\n"
+           "    -o inode_file_handles      Use file handles to reference inodes\n"
+           "                               instead of O_PATH file descriptors\n"
+           "                               (requires -o modcaps=+dac_read_search)\n"
            );
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 793d2c333e..2e56c40b2f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -190,6 +190,7 @@ struct lo_data {
     /* An O_PATH file descriptor to /proc/self/fd/ */
     int proc_self_fd;
     int user_killpriv_v2, killpriv_v2;
+    int inode_file_handles;
 };
 
 /**
@@ -244,6 +245,10 @@ static const struct fuse_opt lo_opts[] = {
     { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1 },
     { "killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 1 },
     { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
+    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
+    { "no_inode_file_handles",
+      offsetof(struct lo_data, inode_file_handles),
+      0 },
     FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -315,6 +320,135 @@ static int temp_fd_steal(TempFd *temp_fd)
     }
 }
 
+/**
+ * Generate a file handle for the given dirfd/name combination.
+ *
+ * If mount_fds does not yet contain an entry for the handle's mount
+ * ID, (re)open dirfd/name in O_RDONLY mode and add it to mount_fds
+ * as the FD for that mount ID.  (That is the file that we have
+ * generated a handle for, so it should be representative for the
+ * mount ID.  However, to be sure (and to rule out races), we use
+ * statx() to verify that our assumption is correct.)
+ */
+static struct lo_fhandle *get_file_handle(struct lo_data *lo,
+                                          int dirfd, const char *name)
+{
+    /* We need statx() to verify the mount ID */
+#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
+    struct lo_fhandle *fh;
+    int ret;
+
+    if (!lo->use_statx || !lo->inode_file_handles) {
+        return NULL;
+    }
+
+    fh = g_new0(struct lo_fhandle, 1);
+
+    fh->handle.handle_bytes = sizeof(fh->padding) - sizeof(fh->handle);
+    ret = name_to_handle_at(dirfd, name, &fh->handle, &fh->mount_id,
+                            AT_EMPTY_PATH);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if (pthread_rwlock_rdlock(&mount_fds_lock)) {
+        goto fail;
+    }
+    if (!g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+        g_auto(TempFd) path_fd = TEMP_FD_INIT;
+        struct statx stx;
+        char procname[64];
+        int fd;
+
+        pthread_rwlock_unlock(&mount_fds_lock);
+
+        /*
+         * Before opening an O_RDONLY fd, check whether dirfd/name is a regular
+         * file or directory, because we must not open anything else with
+         * anything but O_PATH.
+         * (And we use that occasion to verify that the file has the mount ID we
+         * need.)
+         */
+        if (name[0]) {
+            path_fd.fd = openat(dirfd, name, O_PATH);
+            if (path_fd.fd < 0) {
+                goto fail;
+            }
+            path_fd.owned = true;
+        } else {
+            path_fd.fd = dirfd;
+            path_fd.owned = false;
+        }
+
+        ret = statx(path_fd.fd, "", AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
+                    STATX_TYPE | STATX_MNT_ID, &stx);
+        if (ret < 0) {
+            if (errno == ENOSYS) {
+                lo->use_statx = false;
+                fuse_log(FUSE_LOG_WARNING,
+                         "statx() does not work: Will not be able to use file "
+                         "handles for inodes\n");
+            }
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_MNT_ID) || stx.stx_mnt_id != fh->mount_id) {
+            /*
+             * One reason for stx_mnt_id != mount_id could be that dirfd/name
+             * is a directory, and some other filesystem was mounted there
+             * between us generating the file handle and then opening the FD.
+             * (Other kinds of races might be possible, too.)
+             * Failing this function is not fatal, though, because our caller
+             * (lo_do_lookup()) will just fall back to opening an O_PATH FD to
+             * store in lo_inode.fd instead of storing a file handle in
+             * lo_inode.fhandle.  So we do not need to try too hard to get an
+             * FD for fh->mount_id so this function could succeed.
+             */
+            goto fail;
+        }
+        if (!(stx.stx_mask & STATX_TYPE) ||
+            !(S_ISREG(stx.stx_mode) || S_ISDIR(stx.stx_mode)))
+        {
+            /*
+             * We must not open special files with anything but O_PATH, so we
+             * cannot use this file for mount_fds.
+             * Just return a failure in such a case and let the lo_inode have
+             * an O_PATH fd instead of a file handle.
+             */
+            goto fail;
+        }
+
+        /* Now that we know this fd is safe to open, do it */
+        snprintf(procname, sizeof(procname), "%i", path_fd.fd);
+        fd = openat(lo->proc_self_fd, procname, O_RDONLY);
+        if (fd < 0) {
+            goto fail;
+        }
+
+        if (pthread_rwlock_wrlock(&mount_fds_lock)) {
+            goto fail;
+        }
+
+        /* Check again, might have changed */
+        if (g_hash_table_contains(mount_fds, GINT_TO_POINTER(fh->mount_id))) {
+            close(fd);
+        } else {
+            g_hash_table_insert(mount_fds,
+                                GINT_TO_POINTER(fh->mount_id),
+                                GINT_TO_POINTER(fd));
+        }
+    }
+    pthread_rwlock_unlock(&mount_fds_lock);
+
+    return fh;
+
+fail:
+    free(fh);
+    return NULL;
+#else /* defined(CONFIG_STATX) && defined(STATX_MNT_ID) */
+    return NULL;
+#endif
+}
+
 /**
  * Open the given file handle with the given flags.
  *
@@ -1132,6 +1266,11 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
             return -1;
         }
         lo->use_statx = false;
+        if (lo->inode_file_handles) {
+            fuse_log(FUSE_LOG_WARNING,
+                     "statx() does not work: Will not be able to use file "
+                     "handles for inodes\n");
+        }
         /* fallback */
     }
 #endif
@@ -1161,6 +1300,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_fhandle *fh;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1190,13 +1330,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
-    if (newfd == -1) {
-        goto out_err;
-    }
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    if (fh) {
+        res = do_statx(lo, dir_fd.fd, name, &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    } else {
+        newfd = openat(dir_fd.fd, name, O_PATH | O_NOFOLLOW);
+        if (newfd == -1) {
+            goto out_err;
+        }
 
-    res = do_statx(lo, newfd, "", &e->attr, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW,
-                   &mnt_id);
+        res = do_statx(lo, newfd, "", &e->attr,
+                       AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, &mnt_id);
+    }
     if (res == -1) {
         goto out_err;
     }
@@ -1206,9 +1352,19 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
-    inode = lo_find(lo, NULL, &e->attr, mnt_id);
+    /*
+     * Note that fh is always NULL if lo->inode_file_handles is false,
+     * and so we will never do a lookup by file handle here, and
+     * lo->inodes_by_handle will always remain empty.  We only need
+     * this map when we do not have an O_PATH fd open for every
+     * lo_inode, though, so if inode_file_handles is false, we do not
+     * need that map anyway.
+     */
+    inode = lo_find(lo, fh, &e->attr, mnt_id);
     if (inode) {
-        close(newfd);
+        if (newfd != -1) {
+            close(newfd);
+        }
     } else {
         inode = calloc(1, sizeof(struct lo_inode));
         if (!inode) {
@@ -1226,6 +1382,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
         inode->nlookup = 1;
         inode->fd = newfd;
+        inode->fhandle = fh;
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
@@ -1237,6 +1394,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
         g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
+        if (inode->fhandle) {
+            g_hash_table_insert(lo->inodes_by_handle, inode->fhandle, inode);
+        }
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
@@ -1530,8 +1690,10 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     int res;
     uint64_t mnt_id;
     struct stat attr;
+    struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
+    struct lo_inode *inode;
 
     if (!dir) {
         return NULL;
@@ -1542,13 +1704,19 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
         return NULL;
     }
 
+    fh = get_file_handle(lo, dir_fd.fd, name);
+    /* Ignore errors, this is just an optional key for the lookup */
+
     res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
     lo_inode_put(lo, &dir);
     if (res == -1) {
         return NULL;
     }
 
-    return lo_find(lo, NULL, &attr, mnt_id);
+    inode = lo_find(lo, fh, &attr, mnt_id);
+    g_free(fh);
+
+    return inode;
 }
 
 static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
@@ -1712,6 +1880,9 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
     if (!inode->nlookup) {
         lo_map_remove(&lo->ino_map, inode->fuse_ino);
         g_hash_table_remove(lo->inodes_by_ids, &inode->key);
+        if (inode->fhandle) {
+            g_hash_table_remove(lo->inodes_by_handle, inode->fhandle);
+        }
         if (lo->posix_lock) {
             if (g_hash_table_size(inode->posix_locks)) {
                 fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
@@ -4156,6 +4327,14 @@ int main(int argc, char *argv[])
 
     lo.use_statx = true;
 
+#if !defined(CONFIG_STATX) || !defined(STATX_MNT_ID)
+    if (lo.inode_file_handles) {
+        fuse_log(FUSE_LOG_WARNING,
+                 "No statx() or mount ID support: Will not be able to use file "
+                 "handles for inodes\n");
+    }
+#endif
+
     se = fuse_session_new(&args, &lo_oper, sizeof(lo_oper), &lo);
     if (se == NULL) {
         goto err_out1;
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index e948f25ac1..ed23e67ba8 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -73,6 +73,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(mprotect),
     SCMP_SYS(mremap),
     SCMP_SYS(munmap),
+    SCMP_SYS(name_to_handle_at),
     SCMP_SYS(newfstatat),
     SCMP_SYS(statx),
     SCMP_SYS(open),
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 9/9] virtiofsd: Add lazy lo_do_find()
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-09 15:55   ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

lo_find() right now takes two lookup keys for two maps, namely the file
handle for inodes_by_handle and the statx information for inodes_by_ids.
However, we only need the statx information if looking up the inode by
the file handle failed.

There are two callers of lo_find(): The first one, lo_do_lookup(), has
both keys anyway, so passing them does not incur any additional cost.
The second one, lookup_name(), though, needs to explicitly invoke
name_to_handle_at() (through get_file_handle()) and statx() (through
do_statx()).  We need to try to get a file handle as the primary key, so
we cannot get rid of get_file_handle(), but we only need the statx
information if looking up an inode by handle failed; so we can defer
that until the lookup has indeed failed.

To this end, replace lo_find()'s st/mnt_id parameters by a get_ids()
closure that is invoked to fill the lo_key struct if necessary.

Also, lo_find() is renamed to lo_do_find(), so we can add a new
lo_find() wrapper whose closure just initializes the lo_key from the
st/mnt_id parameters, just like the old lo_find() did.

lookup_name() directly calls lo_do_find() now and passes its own
closure, which performs the do_statx() call.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 93 ++++++++++++++++++++++++++------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 2e56c40b2f..8990fd5bd2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1168,22 +1168,23 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo,
-                                const struct lo_fhandle *fhandle,
-                                struct stat *st, uint64_t mnt_id)
+/*
+ * get_ids() will be called to get the key for lo->inodes_by_ids if
+ * the lookup by file handle has failed.
+ */
+static struct lo_inode *lo_do_find(struct lo_data *lo,
+    const struct lo_fhandle *fhandle,
+    int (*get_ids)(struct lo_key *, const void *),
+    const void *get_ids_opaque)
 {
     struct lo_inode *p = NULL;
-    struct lo_key ids_key = {
-        .ino = st->st_ino,
-        .dev = st->st_dev,
-        .mnt_id = mnt_id,
-    };
+    struct lo_key ids_key;
 
     pthread_mutex_lock(&lo->mutex);
     if (fhandle) {
         p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
     }
-    if (!p) {
+    if (!p && get_ids(&ids_key, get_ids_opaque) == 0) {
         p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
         /*
          * When we had to fall back to looking up an inode by its IDs,
@@ -1211,6 +1212,36 @@ static struct lo_inode *lo_find(struct lo_data *lo,
     return p;
 }
 
+struct lo_find_get_ids_key_opaque {
+    const struct stat *st;
+    uint64_t mnt_id;
+};
+
+static int lo_find_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lo_find_get_ids_key_opaque *stat_info = opaque;
+
+    *ids_key = (struct lo_key){
+        .ino = stat_info->st->st_ino,
+        .dev = stat_info->st->st_dev,
+        .mnt_id = stat_info->mnt_id,
+    };
+
+    return 0;
+}
+
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
+{
+    const struct lo_find_get_ids_key_opaque stat_info = {
+        .st = st,
+        .mnt_id = mnt_id,
+    };
+
+    return lo_do_find(lo, fhandle, lo_find_get_ids_key, &stat_info);
+}
+
 /* value_destroy_func for posix_locks GHashTable */
 static void posix_locks_value_destroy(gpointer data)
 {
@@ -1682,14 +1713,41 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
+struct lookup_name_get_ids_key_opaque {
+    struct lo_data *lo;
+    int parent_fd;
+    const char *name;
+};
+
+static int lookup_name_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lookup_name_get_ids_key_opaque *stat_params = opaque;
+    uint64_t mnt_id;
+    struct stat attr;
+    int res;
+
+    res = do_statx(stat_params->lo, stat_params->parent_fd, stat_params->name,
+                   &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    if (res < 0) {
+        return -errno;
+    }
+
+    *ids_key = (struct lo_key){
+        .ino = attr.st_ino,
+        .dev = attr.st_dev,
+        .mnt_id = mnt_id,
+    };
+
+    return 0;
+}
+
 /* Increments nlookup and caller must release refcount using lo_inode_put() */
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
     g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
-    uint64_t mnt_id;
-    struct stat attr;
+    struct lookup_name_get_ids_key_opaque stat_params;
     struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
@@ -1707,13 +1765,14 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     fh = get_file_handle(lo, dir_fd.fd, name);
     /* Ignore errors, this is just an optional key for the lookup */
 
-    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
-    lo_inode_put(lo, &dir);
-    if (res == -1) {
-        return NULL;
-    }
+    stat_params = (struct lookup_name_get_ids_key_opaque){
+        .lo = lo,
+        .parent_fd = dir_fd.fd,
+        .name = name,
+    };
 
-    inode = lo_find(lo, fh, &attr, mnt_id);
+    inode = lo_do_find(lo, fh, lookup_name_get_ids_key, &stat_params);
+    lo_inode_put(lo, &dir);
     g_free(fh);
 
     return inode;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Virtio-fs] [PATCH v2 9/9] virtiofsd: Add lazy lo_do_find()
@ 2021-06-09 15:55   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-09 15:55 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Max Reitz

lo_find() right now takes two lookup keys for two maps, namely the file
handle for inodes_by_handle and the statx information for inodes_by_ids.
However, we only need the statx information if looking up the inode by
the file handle failed.

There are two callers of lo_find(): The first one, lo_do_lookup(), has
both keys anyway, so passing them does not incur any additional cost.
The second one, lookup_name(), though, needs to explicitly invoke
name_to_handle_at() (through get_file_handle()) and statx() (through
do_statx()).  We need to try to get a file handle as the primary key, so
we cannot get rid of get_file_handle(), but we only need the statx
information if looking up an inode by handle failed; so we can defer
that until the lookup has indeed failed.

To this end, replace lo_find()'s st/mnt_id parameters by a get_ids()
closure that is invoked to fill the lo_key struct if necessary.

Also, lo_find() is renamed to lo_do_find(), so we can add a new
lo_find() wrapper whose closure just initializes the lo_key from the
st/mnt_id parameters, just like the old lo_find() did.

lookup_name() directly calls lo_do_find() now and passes its own
closure, which performs the do_statx() call.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 93 ++++++++++++++++++++++++++------
 1 file changed, 76 insertions(+), 17 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 2e56c40b2f..8990fd5bd2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1168,22 +1168,23 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
-static struct lo_inode *lo_find(struct lo_data *lo,
-                                const struct lo_fhandle *fhandle,
-                                struct stat *st, uint64_t mnt_id)
+/*
+ * get_ids() will be called to get the key for lo->inodes_by_ids if
+ * the lookup by file handle has failed.
+ */
+static struct lo_inode *lo_do_find(struct lo_data *lo,
+    const struct lo_fhandle *fhandle,
+    int (*get_ids)(struct lo_key *, const void *),
+    const void *get_ids_opaque)
 {
     struct lo_inode *p = NULL;
-    struct lo_key ids_key = {
-        .ino = st->st_ino,
-        .dev = st->st_dev,
-        .mnt_id = mnt_id,
-    };
+    struct lo_key ids_key;
 
     pthread_mutex_lock(&lo->mutex);
     if (fhandle) {
         p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
     }
-    if (!p) {
+    if (!p && get_ids(&ids_key, get_ids_opaque) == 0) {
         p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
         /*
          * When we had to fall back to looking up an inode by its IDs,
@@ -1211,6 +1212,36 @@ static struct lo_inode *lo_find(struct lo_data *lo,
     return p;
 }
 
+struct lo_find_get_ids_key_opaque {
+    const struct stat *st;
+    uint64_t mnt_id;
+};
+
+static int lo_find_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lo_find_get_ids_key_opaque *stat_info = opaque;
+
+    *ids_key = (struct lo_key){
+        .ino = stat_info->st->st_ino,
+        .dev = stat_info->st->st_dev,
+        .mnt_id = stat_info->mnt_id,
+    };
+
+    return 0;
+}
+
+static struct lo_inode *lo_find(struct lo_data *lo,
+                                const struct lo_fhandle *fhandle,
+                                struct stat *st, uint64_t mnt_id)
+{
+    const struct lo_find_get_ids_key_opaque stat_info = {
+        .st = st,
+        .mnt_id = mnt_id,
+    };
+
+    return lo_do_find(lo, fhandle, lo_find_get_ids_key, &stat_info);
+}
+
 /* value_destroy_func for posix_locks GHashTable */
 static void posix_locks_value_destroy(gpointer data)
 {
@@ -1682,14 +1713,41 @@ out_err:
     fuse_reply_err(req, saverr);
 }
 
+struct lookup_name_get_ids_key_opaque {
+    struct lo_data *lo;
+    int parent_fd;
+    const char *name;
+};
+
+static int lookup_name_get_ids_key(struct lo_key *ids_key, const void *opaque)
+{
+    const struct lookup_name_get_ids_key_opaque *stat_params = opaque;
+    uint64_t mnt_id;
+    struct stat attr;
+    int res;
+
+    res = do_statx(stat_params->lo, stat_params->parent_fd, stat_params->name,
+                   &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
+    if (res < 0) {
+        return -errno;
+    }
+
+    *ids_key = (struct lo_key){
+        .ino = attr.st_ino,
+        .dev = attr.st_dev,
+        .mnt_id = mnt_id,
+    };
+
+    return 0;
+}
+
 /* Increments nlookup and caller must release refcount using lo_inode_put() */
 static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
                                     const char *name)
 {
     g_auto(TempFd) dir_fd = TEMP_FD_INIT;
     int res;
-    uint64_t mnt_id;
-    struct stat attr;
+    struct lookup_name_get_ids_key_opaque stat_params;
     struct lo_fhandle *fh;
     struct lo_data *lo = lo_data(req);
     struct lo_inode *dir = lo_inode(req, parent);
@@ -1707,13 +1765,14 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
     fh = get_file_handle(lo, dir_fd.fd, name);
     /* Ignore errors, this is just an optional key for the lookup */
 
-    res = do_statx(lo, dir_fd.fd, name, &attr, AT_SYMLINK_NOFOLLOW, &mnt_id);
-    lo_inode_put(lo, &dir);
-    if (res == -1) {
-        return NULL;
-    }
+    stat_params = (struct lookup_name_get_ids_key_opaque){
+        .lo = lo,
+        .parent_fd = dir_fd.fd,
+        .name = name,
+    };
 
-    inode = lo_find(lo, fh, &attr, mnt_id);
+    inode = lo_do_find(lo, fh, lookup_name_get_ids_key, &stat_params);
+    lo_inode_put(lo, &dir);
     g_free(fh);
 
     return inode;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
  2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
@ 2021-06-11 19:19   ` Vivek Goyal
  -1 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2021-06-11 19:19 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, Jun 09, 2021 at 05:55:42PM +0200, Max Reitz wrote:
> Hi,
> 
> v1 cover letter for an overview:
> https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

Hi Max,

What's the impact of these patches on performance? Just trying to 
get some idea what to expect. Performance remains more or less
same or we expect a hit.

Thanks
Vivek

> 
> In v2, I (tried to) fix the bug Dave found, which is that
> get_file_handle() indiscriminately opened the given dirfd/name
> combination to get an O_RDONLY fd without checking whether we’re
> actually allowed to open dirfd/name; namely, we don’t allow ourselves to
> open files that aren’t regular files or directories.
> 
> So that openat(.., O_RDONLY) is changed to an openat(..., O_PATH), and
> then check the file type with the statx() we’re doing anyway.  If the
> file is OK to open, we reopen it O_RDONLY with the help of
> /proc/self/fd, like we always do.
> 
> (This only affects patch 8.)
> 
> 
> git-backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/9:[----] [--] 'virtiofsd: Add TempFd structure'
> 002/9:[----] [--] 'virtiofsd: Use lo_inode_open() instead of openat()'
> 003/9:[----] [--] 'virtiofsd: Add lo_inode_fd() helper'
> 004/9:[----] [--] 'virtiofsd: Let lo_fd() return a TempFd'
> 005/9:[----] [--] 'virtiofsd: Let lo_inode_open() return a TempFd'
> 006/9:[----] [--] 'virtiofsd: Add lo_inode.fhandle'
> 007/9:[----] [--] 'virtiofsd: Add inodes_by_handle hash table'
> 008/9:[0045] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
> 009/9:[----] [--] 'virtiofsd: Add lazy lo_do_find()'
> 
> 
> Max Reitz (9):
>   virtiofsd: Add TempFd structure
>   virtiofsd: Use lo_inode_open() instead of openat()
>   virtiofsd: Add lo_inode_fd() helper
>   virtiofsd: Let lo_fd() return a TempFd
>   virtiofsd: Let lo_inode_open() return a TempFd
>   virtiofsd: Add lo_inode.fhandle
>   virtiofsd: Add inodes_by_handle hash table
>   virtiofsd: Optionally fill lo_inode.fhandle
>   virtiofsd: Add lazy lo_do_find()
> 
>  tools/virtiofsd/helper.c              |   3 +
>  tools/virtiofsd/passthrough_ll.c      | 836 +++++++++++++++++++++-----
>  tools/virtiofsd/passthrough_seccomp.c |   2 +
>  3 files changed, 694 insertions(+), 147 deletions(-)
> 
> -- 
> 2.31.1
> 
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
@ 2021-06-11 19:19   ` Vivek Goyal
  0 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2021-06-11 19:19 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Wed, Jun 09, 2021 at 05:55:42PM +0200, Max Reitz wrote:
> Hi,
> 
> v1 cover letter for an overview:
> https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

Hi Max,

What's the impact of these patches on performance? Just trying to 
get some idea what to expect. Performance remains more or less
same or we expect a hit.

Thanks
Vivek

> 
> In v2, I (tried to) fix the bug Dave found, which is that
> get_file_handle() indiscriminately opened the given dirfd/name
> combination to get an O_RDONLY fd without checking whether we’re
> actually allowed to open dirfd/name; namely, we don’t allow ourselves to
> open files that aren’t regular files or directories.
> 
> So that openat(.., O_RDONLY) is changed to an openat(..., O_PATH), and
> then check the file type with the statx() we’re doing anyway.  If the
> file is OK to open, we reopen it O_RDONLY with the help of
> /proc/self/fd, like we always do.
> 
> (This only affects patch 8.)
> 
> 
> git-backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/9:[----] [--] 'virtiofsd: Add TempFd structure'
> 002/9:[----] [--] 'virtiofsd: Use lo_inode_open() instead of openat()'
> 003/9:[----] [--] 'virtiofsd: Add lo_inode_fd() helper'
> 004/9:[----] [--] 'virtiofsd: Let lo_fd() return a TempFd'
> 005/9:[----] [--] 'virtiofsd: Let lo_inode_open() return a TempFd'
> 006/9:[----] [--] 'virtiofsd: Add lo_inode.fhandle'
> 007/9:[----] [--] 'virtiofsd: Add inodes_by_handle hash table'
> 008/9:[0045] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
> 009/9:[----] [--] 'virtiofsd: Add lazy lo_do_find()'
> 
> 
> Max Reitz (9):
>   virtiofsd: Add TempFd structure
>   virtiofsd: Use lo_inode_open() instead of openat()
>   virtiofsd: Add lo_inode_fd() helper
>   virtiofsd: Let lo_fd() return a TempFd
>   virtiofsd: Let lo_inode_open() return a TempFd
>   virtiofsd: Add lo_inode.fhandle
>   virtiofsd: Add inodes_by_handle hash table
>   virtiofsd: Optionally fill lo_inode.fhandle
>   virtiofsd: Add lazy lo_do_find()
> 
>  tools/virtiofsd/helper.c              |   3 +
>  tools/virtiofsd/passthrough_ll.c      | 836 +++++++++++++++++++++-----
>  tools/virtiofsd/passthrough_seccomp.c |   2 +
>  3 files changed, 694 insertions(+), 147 deletions(-)
> 
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
  (?)
@ 2021-06-11 20:04   ` Vivek Goyal
  2021-06-16 13:38     ` Max Reitz
  -1 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2021-06-11 20:04 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
> 
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
> 
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
> 
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> 
> Unfortunately, we cannot rely on being able to generate file handles
> every time.

Hi Max, 

What are the cases where we can not rely being able to generate file
handles?

> Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence, the inodes_by_ids
> entry is just a fallback.

If we have to keep inodes_by_ids around, then can we just add fhandle
to the lo_key. That way we can manage with single hash table and still
be able to detect if inode ID has been reused.

Thanks
Vivek

> 
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
> 
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
>  1 file changed, 64 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index e665575401..793d2c333e 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -179,7 +179,8 @@ struct lo_data {
>      int announce_submounts;
>      bool use_statx;
>      struct lo_inode root;
> -    GHashTable *inodes; /* protected by lo->mutex */
> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>      struct lo_map ino_map; /* protected by lo->mutex */
>      struct lo_map dirp_map; /* protected by lo->mutex */
>      struct lo_map fd_map; /* protected by lo->mutex */
> @@ -257,8 +258,9 @@ static struct {
>  /* That we loaded cap-ng in the current thread from the saved */
>  static __thread bool cap_loaded = 0;
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id);
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id);
>  static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>                              char **out_name);
>  
> @@ -1032,18 +1034,39 @@ out_err:
>      fuse_reply_err(req, saverr);
>  }
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -                                uint64_t mnt_id)
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +                                const struct lo_fhandle *fhandle,
> +                                struct stat *st, uint64_t mnt_id)
>  {
> -    struct lo_inode *p;
> -    struct lo_key key = {
> +    struct lo_inode *p = NULL;
> +    struct lo_key ids_key = {
>          .ino = st->st_ino,
>          .dev = st->st_dev,
>          .mnt_id = mnt_id,
>      };
>  
>      pthread_mutex_lock(&lo->mutex);
> -    p = g_hash_table_lookup(lo->inodes, &key);
> +    if (fhandle) {
> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> +    }
> +    if (!p) {
> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> +        /*
> +         * When we had to fall back to looking up an inode by its IDs,
> +         * ensure that we hit an entry that does not have a file
> +         * handle.  Entries with file handles must also have a handle
> +         * alt key, so if we have not found it by that handle alt key,
> +         * we must have found an entry with a mismatching handle; i.e.
> +         * an entry for a different file, even though it has the same
> +         * inode ID.
> +         * (This can happen when we look up a new file that has reused
> +         * the inode ID of some previously unlinked inode for which we
> +         * still have an lo_inode object.)
> +         */
> +        if (p && fhandle != NULL && p->fhandle != NULL) {
> +            p = NULL;
> +        }
> +    }
>      if (p) {
>          assert(p->nlookup > 0);
>          p->nlookup++;
> @@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>      }
>  
> -    inode = lo_find(lo, &e->attr, mnt_id);
> +    inode = lo_find(lo, NULL, &e->attr, mnt_id);
>      if (inode) {
>          close(newfd);
>      } else {
> @@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          }
>          pthread_mutex_lock(&lo->mutex);
>          inode->fuse_ino = lo_add_inode_mapping(req, inode);
> -        g_hash_table_insert(lo->inodes, &inode->key, inode);
> +        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
>          pthread_mutex_unlock(&lo->mutex);
>      }
>      e->ino = inode->fuse_ino;
> @@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>          return NULL;
>      }
>  
> -    return lo_find(lo, &attr, mnt_id);
> +    return lo_find(lo, NULL, &attr, mnt_id);
>  }
>  
>  static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
> @@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>      inode->nlookup -= n;
>      if (!inode->nlookup) {
>          lo_map_remove(&lo->ino_map, inode->fuse_ino);
> -        g_hash_table_remove(lo->inodes, &inode->key);
> +        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
>          if (lo->posix_lock) {
>              if (g_hash_table_size(inode->posix_locks)) {
>                  fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> @@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata)
>          GHashTableIter iter;
>          gpointer key, value;
>  
> -        g_hash_table_iter_init(&iter, lo->inodes);
> +        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
>          if (!g_hash_table_iter_next(&iter, &key, &value)) {
>              break;
>          }
> @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
>      return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
>  }
>  
> +static guint lo_fhandle_hash(gconstpointer key)
> +{
> +    const struct lo_fhandle *fh = key;
> +    guint hash;
> +    size_t i;
> +
> +    /* Basically g_str_hash() */
> +    hash = 5381;
> +    for (i = 0; i < sizeof(fh->padding); i++) {
> +        hash += hash * 33 + (unsigned char)fh->padding[i];
> +    }
> +    hash += hash * 33 + fh->mount_id;
> +
> +    return hash;
> +}
> +
> +static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
> +{
> +    return !memcmp(a, b, sizeof(struct lo_fhandle));
> +}
> +
>  static void fuse_lo_data_cleanup(struct lo_data *lo)
>  {
> -    if (lo->inodes) {
> -        g_hash_table_destroy(lo->inodes);
> +    if (lo->inodes_by_ids) {
> +        g_hash_table_destroy(lo->inodes_by_ids);
> +    }
> +    if (lo->inodes_by_ids) {
> +        g_hash_table_destroy(lo->inodes_by_handle);
>      }
>  
>      if (lo->root.posix_locks) {
> @@ -3990,7 +4037,8 @@ int main(int argc, char *argv[])
>      qemu_init_exec_dir(argv[0]);
>  
>      pthread_mutex_init(&lo.mutex, NULL);
> -    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
> +    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
> +    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
>      lo.root.fd = -1;
>      lo.root.fuse_ino = FUSE_ROOT_ID;
>      lo.cache = CACHE_AUTO;
> -- 
> 2.31.1
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-11 20:04   ` Vivek Goyal
@ 2021-06-16 13:38     ` Max Reitz
  2021-06-17 21:21       ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2021-06-16 13:38 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 11.06.21 22:04, Vivek Goyal wrote:
> On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
>> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>> its inode ID will remain in use until we drop our lo_inode (and
>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>> the inode ID as an lo_inode key, because any inode with an inode ID we
>> find in lo_data.inodes (on the same filesystem) must be the exact same
>> file.
>>
>> This will change when we start setting lo_inode.fhandle so we do not
>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>> immediately remove it, so its ID can then be reused by newly created
>> files, even while the lo_inode object is still there[1].
>>
>> So creating a new file can then reuse the old file's inode ID, and
>> looking up the new file would lead to us finding the old file's
>> lo_inode, which is not ideal.
>>
>> Luckily, just as file handles cause this problem, they also solve it:  A
>> file handle contains a generation ID, which changes when an inode ID is
>> reused, so the new file can be distinguished from the old one.  So all
>> we need to do is to add a second map besides lo_data.inodes that maps
>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>
>> Unfortunately, we cannot rely on being able to generate file handles
>> every time.
> Hi Max,
>
> What are the cases where we can not rely being able to generate file
> handles?

I have no idea, but it’s much easier to claim we can’t than to prove 
that we can. I’d rather be resilient.

>> Therefore, we still enter every lo_inode object into
>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>> entry is just a fallback.
> If we have to keep inodes_by_ids around, then can we just add fhandle
> to the lo_key. That way we can manage with single hash table and still
> be able to detect if inode ID has been reused.

We cannot, because I assume we cannot rely on name_to_handle_at() 
working every time. Therefore, maybe at one point we can generate a file 
handle, and at another, we cannot – we should still be able to look up 
the inode regardless.

If the file handle were part of inodes_by_ids, then we can look up 
inodes only if we can generate a file handle either every time (for a 
given inode) or never. Or, well, I suppose we could always create two 
entries, one with the file handles zeroed out, and one with the file 
handle specified, but I wouldn’t find that very beautiful.

Max

>> Note that we do not generate lo_fhandle objects yet, and so we also do
>> not enter anything into the inodes_by_handle map yet.  Also, all lookups
>> skip that map.  We might manually create file handles with some code
>> that is immediately removed by the next patch again, but that would
>> break the assumption in lo_find() that every lo_inode with a non-NULL
>> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
>> leave actually using the inodes_by_handle map for the next patch.
>>
>> [1] If some application in the guest still has the file open, there is
>> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
>> case, the inode will only go away once every application in the guest
>> has closed it.  The problem described only applies to cases where the
>> guest does not have the file open, and it is just in the dentry cache,
>> basically.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
>>   1 file changed, 64 insertions(+), 16 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index e665575401..793d2c333e 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -179,7 +179,8 @@ struct lo_data {
>>       int announce_submounts;
>>       bool use_statx;
>>       struct lo_inode root;
>> -    GHashTable *inodes; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
>> +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
>>       struct lo_map ino_map; /* protected by lo->mutex */
>>       struct lo_map dirp_map; /* protected by lo->mutex */
>>       struct lo_map fd_map; /* protected by lo->mutex */
>> @@ -257,8 +258,9 @@ static struct {
>>   /* That we loaded cap-ng in the current thread from the saved */
>>   static __thread bool cap_loaded = 0;
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id);
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id);
>>   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>>                               char **out_name);
>>   
>> @@ -1032,18 +1034,39 @@ out_err:
>>       fuse_reply_err(req, saverr);
>>   }
>>   
>> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
>> -                                uint64_t mnt_id)
>> +static struct lo_inode *lo_find(struct lo_data *lo,
>> +                                const struct lo_fhandle *fhandle,
>> +                                struct stat *st, uint64_t mnt_id)
>>   {
>> -    struct lo_inode *p;
>> -    struct lo_key key = {
>> +    struct lo_inode *p = NULL;
>> +    struct lo_key ids_key = {
>>           .ino = st->st_ino,
>>           .dev = st->st_dev,
>>           .mnt_id = mnt_id,
>>       };
>>   
>>       pthread_mutex_lock(&lo->mutex);
>> -    p = g_hash_table_lookup(lo->inodes, &key);
>> +    if (fhandle) {
>> +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
>> +    }
>> +    if (!p) {
>> +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
>> +        /*
>> +         * When we had to fall back to looking up an inode by its IDs,
>> +         * ensure that we hit an entry that does not have a file
>> +         * handle.  Entries with file handles must also have a handle
>> +         * alt key, so if we have not found it by that handle alt key,
>> +         * we must have found an entry with a mismatching handle; i.e.
>> +         * an entry for a different file, even though it has the same
>> +         * inode ID.
>> +         * (This can happen when we look up a new file that has reused
>> +         * the inode ID of some previously unlinked inode for which we
>> +         * still have an lo_inode object.)
>> +         */
>> +        if (p && fhandle != NULL && p->fhandle != NULL) {
>> +            p = NULL;
>> +        }
>> +    }
>>       if (p) {
>>           assert(p->nlookup > 0);
>>           p->nlookup++;
>> @@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>>       }
>>   
>> -    inode = lo_find(lo, &e->attr, mnt_id);
>> +    inode = lo_find(lo, NULL, &e->attr, mnt_id);
>>       if (inode) {
>>           close(newfd);
>>       } else {
>> @@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           }
>>           pthread_mutex_lock(&lo->mutex);
>>           inode->fuse_ino = lo_add_inode_mapping(req, inode);
>> -        g_hash_table_insert(lo->inodes, &inode->key, inode);
>> +        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
>>           pthread_mutex_unlock(&lo->mutex);
>>       }
>>       e->ino = inode->fuse_ino;
>> @@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
>>           return NULL;
>>       }
>>   
>> -    return lo_find(lo, &attr, mnt_id);
>> +    return lo_find(lo, NULL, &attr, mnt_id);
>>   }
>>   
>>   static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
>> @@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
>>       inode->nlookup -= n;
>>       if (!inode->nlookup) {
>>           lo_map_remove(&lo->ino_map, inode->fuse_ino);
>> -        g_hash_table_remove(lo->inodes, &inode->key);
>> +        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
>>           if (lo->posix_lock) {
>>               if (g_hash_table_size(inode->posix_locks)) {
>>                   fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
>> @@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata)
>>           GHashTableIter iter;
>>           gpointer key, value;
>>   
>> -        g_hash_table_iter_init(&iter, lo->inodes);
>> +        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
>>           if (!g_hash_table_iter_next(&iter, &key, &value)) {
>>               break;
>>           }
>> @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
>>       return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
>>   }
>>   
>> +static guint lo_fhandle_hash(gconstpointer key)
>> +{
>> +    const struct lo_fhandle *fh = key;
>> +    guint hash;
>> +    size_t i;
>> +
>> +    /* Basically g_str_hash() */
>> +    hash = 5381;
>> +    for (i = 0; i < sizeof(fh->padding); i++) {
>> +        hash += hash * 33 + (unsigned char)fh->padding[i];
>> +    }
>> +    hash += hash * 33 + fh->mount_id;
>> +
>> +    return hash;
>> +}
>> +
>> +static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
>> +{
>> +    return !memcmp(a, b, sizeof(struct lo_fhandle));
>> +}
>> +
>>   static void fuse_lo_data_cleanup(struct lo_data *lo)
>>   {
>> -    if (lo->inodes) {
>> -        g_hash_table_destroy(lo->inodes);
>> +    if (lo->inodes_by_ids) {
>> +        g_hash_table_destroy(lo->inodes_by_ids);
>> +    }
>> +    if (lo->inodes_by_ids) {
>> +        g_hash_table_destroy(lo->inodes_by_handle);
>>       }
>>   
>>       if (lo->root.posix_locks) {
>> @@ -3990,7 +4037,8 @@ int main(int argc, char *argv[])
>>       qemu_init_exec_dir(argv[0]);
>>   
>>       pthread_mutex_init(&lo.mutex, NULL);
>> -    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
>> +    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
>> +    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
>>       lo.root.fd = -1;
>>       lo.root.fuse_ino = FUSE_ROOT_ID;
>>       lo.cache = CACHE_AUTO;
>> -- 
>> 2.31.1
>>
>> _______________________________________________
>> Virtio-fs mailing list
>> Virtio-fs@redhat.com
>> https://listman.redhat.com/mailman/listinfo/virtio-fs
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
  2021-06-11 19:19   ` [Virtio-fs] " Vivek Goyal
@ 2021-06-16 13:41     ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-16 13:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 11.06.21 21:19, Vivek Goyal wrote:
> On Wed, Jun 09, 2021 at 05:55:42PM +0200, Max Reitz wrote:
>> Hi,
>>
>> v1 cover letter for an overview:
>> https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html
> Hi Max,
>
> What's the impact of these patches on performance? Just trying to
> get some idea what to expect. Performance remains more or less
> same or we expect a hit.
I definitely expect a hit if you just benchmark the right way. Having to 
open FDs all the time (for metadata operations) will have an impact 
(when you do lookups all the time, or open files and close them 
immediately). How much of an impact, that’s completely up to the 
filesystem’s open_by_handle_at() implementation, I presume.

I don’t expect it to be significant for real-world use cases, though. 
When really doing I/O, there should be absolutely no difference, because 
then we’re operating with these FUSE handles that have actual (non-path) 
FDs attached to them.

Max



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs
@ 2021-06-16 13:41     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-16 13:41 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 11.06.21 21:19, Vivek Goyal wrote:
> On Wed, Jun 09, 2021 at 05:55:42PM +0200, Max Reitz wrote:
>> Hi,
>>
>> v1 cover letter for an overview:
>> https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html
> Hi Max,
>
> What's the impact of these patches on performance? Just trying to
> get some idea what to expect. Performance remains more or less
> same or we expect a hit.
I definitely expect a hit if you just benchmark the right way. Having to 
open FDs all the time (for metadata operations) will have an impact 
(when you do lookups all the time, or open files and close them 
immediately). How much of an impact, that’s completely up to the 
filesystem’s open_by_handle_at() implementation, I presume.

I don’t expect it to be significant for real-world use cases, though. 
When really doing I/O, there should be absolutely no difference, because 
then we’re operating with these FUSE handles that have actual (non-path) 
FDs attached to them.

Max


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-16 13:38     ` Max Reitz
@ 2021-06-17 21:21       ` Vivek Goyal
  2021-06-18  8:28         ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2021-06-17 21:21 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote:
> On 11.06.21 22:04, Vivek Goyal wrote:
> > On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
> > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > its inode ID will remain in use until we drop our lo_inode (and
> > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > file.
> > > 
> > > This will change when we start setting lo_inode.fhandle so we do not
> > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > immediately remove it, so its ID can then be reused by newly created
> > > files, even while the lo_inode object is still there[1].
> > > 
> > > So creating a new file can then reuse the old file's inode ID, and
> > > looking up the new file would lead to us finding the old file's
> > > lo_inode, which is not ideal.
> > > 
> > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > file handle contains a generation ID, which changes when an inode ID is
> > > reused, so the new file can be distinguished from the old one.  So all
> > > we need to do is to add a second map besides lo_data.inodes that maps
> > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > 
> > > Unfortunately, we cannot rely on being able to generate file handles
> > > every time.
> > Hi Max,
> > 
> > What are the cases where we can not rely being able to generate file
> > handles?
> 
> I have no idea, but it’s much easier to claim we can’t than to prove that we
> can. I’d rather be resilient.

Assuming that we can not genererate file handles all the time and hence
mainitaing another inode cache seems little problematic to me.

I would rather start with that we can generate file handles and have
a single inode cache.

> 
> > > Therefore, we still enter every lo_inode object into
> > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > entry is just a fallback.
> > If we have to keep inodes_by_ids around, then can we just add fhandle
> > to the lo_key. That way we can manage with single hash table and still
> > be able to detect if inode ID has been reused.
> 
> We cannot, because I assume we cannot rely on name_to_handle_at() working
> every time.

I guess either we need concrete information that we can't generate
file handle every time or we should assume we can until we are proven
wrong. And then fix it accordingly, IMHO.

> Therefore, maybe at one point we can generate a file handle, and
> at another, we cannot – we should still be able to look up the inode
> regardless.
> 
> If the file handle were part of inodes_by_ids, then we can look up inodes
> only if we can generate a file handle either every time (for a given inode)
> or never.

Right. And is there a reason to belive that for the same file we can
sometimes generate file handles and other times not. 

Thanks
Vivek

> Or, well, I suppose we could always create two entries, one with
> the file handles zeroed out, and one with the file handle specified, but I
> wouldn’t find that very beautiful.
> 
> Max
> 
> > > Note that we do not generate lo_fhandle objects yet, and so we also do
> > > not enter anything into the inodes_by_handle map yet.  Also, all lookups
> > > skip that map.  We might manually create file handles with some code
> > > that is immediately removed by the next patch again, but that would
> > > break the assumption in lo_find() that every lo_inode with a non-NULL
> > > .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> > > leave actually using the inodes_by_handle map for the next patch.
> > > 
> > > [1] If some application in the guest still has the file open, there is
> > > going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> > > case, the inode will only go away once every application in the guest
> > > has closed it.  The problem described only applies to cases where the
> > > guest does not have the file open, and it is just in the dentry cache,
> > > basically.
> > > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
> > >   1 file changed, 64 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index e665575401..793d2c333e 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -179,7 +179,8 @@ struct lo_data {
> > >       int announce_submounts;
> > >       bool use_statx;
> > >       struct lo_inode root;
> > > -    GHashTable *inodes; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_ids; /* protected by lo->mutex */
> > > +    GHashTable *inodes_by_handle; /* protected by lo->mutex */
> > >       struct lo_map ino_map; /* protected by lo->mutex */
> > >       struct lo_map dirp_map; /* protected by lo->mutex */
> > >       struct lo_map fd_map; /* protected by lo->mutex */
> > > @@ -257,8 +258,9 @@ static struct {
> > >   /* That we loaded cap-ng in the current thread from the saved */
> > >   static __thread bool cap_loaded = 0;
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id);
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id);
> > >   static int xattr_map_client(const struct lo_data *lo, const char *client_name,
> > >                               char **out_name);
> > > @@ -1032,18 +1034,39 @@ out_err:
> > >       fuse_reply_err(req, saverr);
> > >   }
> > > -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> > > -                                uint64_t mnt_id)
> > > +static struct lo_inode *lo_find(struct lo_data *lo,
> > > +                                const struct lo_fhandle *fhandle,
> > > +                                struct stat *st, uint64_t mnt_id)
> > >   {
> > > -    struct lo_inode *p;
> > > -    struct lo_key key = {
> > > +    struct lo_inode *p = NULL;
> > > +    struct lo_key ids_key = {
> > >           .ino = st->st_ino,
> > >           .dev = st->st_dev,
> > >           .mnt_id = mnt_id,
> > >       };
> > >       pthread_mutex_lock(&lo->mutex);
> > > -    p = g_hash_table_lookup(lo->inodes, &key);
> > > +    if (fhandle) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_handle, fhandle);
> > > +    }
> > > +    if (!p) {
> > > +        p = g_hash_table_lookup(lo->inodes_by_ids, &ids_key);
> > > +        /*
> > > +         * When we had to fall back to looking up an inode by its IDs,
> > > +         * ensure that we hit an entry that does not have a file
> > > +         * handle.  Entries with file handles must also have a handle
> > > +         * alt key, so if we have not found it by that handle alt key,
> > > +         * we must have found an entry with a mismatching handle; i.e.
> > > +         * an entry for a different file, even though it has the same
> > > +         * inode ID.
> > > +         * (This can happen when we look up a new file that has reused
> > > +         * the inode ID of some previously unlinked inode for which we
> > > +         * still have an lo_inode object.)
> > > +         */
> > > +        if (p && fhandle != NULL && p->fhandle != NULL) {
> > > +            p = NULL;
> > > +        }
> > > +    }
> > >       if (p) {
> > >           assert(p->nlookup > 0);
> > >           p->nlookup++;
> > > @@ -1183,7 +1206,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> > >       }
> > > -    inode = lo_find(lo, &e->attr, mnt_id);
> > > +    inode = lo_find(lo, NULL, &e->attr, mnt_id);
> > >       if (inode) {
> > >           close(newfd);
> > >       } else {
> > > @@ -1213,7 +1236,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           }
> > >           pthread_mutex_lock(&lo->mutex);
> > >           inode->fuse_ino = lo_add_inode_mapping(req, inode);
> > > -        g_hash_table_insert(lo->inodes, &inode->key, inode);
> > > +        g_hash_table_insert(lo->inodes_by_ids, &inode->key, inode);
> > >           pthread_mutex_unlock(&lo->mutex);
> > >       }
> > >       e->ino = inode->fuse_ino;
> > > @@ -1525,7 +1548,7 @@ static struct lo_inode *lookup_name(fuse_req_t req, fuse_ino_t parent,
> > >           return NULL;
> > >       }
> > > -    return lo_find(lo, &attr, mnt_id);
> > > +    return lo_find(lo, NULL, &attr, mnt_id);
> > >   }
> > >   static void lo_rmdir(fuse_req_t req, fuse_ino_t parent, const char *name)
> > > @@ -1688,7 +1711,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
> > >       inode->nlookup -= n;
> > >       if (!inode->nlookup) {
> > >           lo_map_remove(&lo->ino_map, inode->fuse_ino);
> > > -        g_hash_table_remove(lo->inodes, &inode->key);
> > > +        g_hash_table_remove(lo->inodes_by_ids, &inode->key);
> > >           if (lo->posix_lock) {
> > >               if (g_hash_table_size(inode->posix_locks)) {
> > >                   fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
> > > @@ -3388,7 +3411,7 @@ static void lo_destroy(void *userdata)
> > >           GHashTableIter iter;
> > >           gpointer key, value;
> > > -        g_hash_table_iter_init(&iter, lo->inodes);
> > > +        g_hash_table_iter_init(&iter, lo->inodes_by_ids);
> > >           if (!g_hash_table_iter_next(&iter, &key, &value)) {
> > >               break;
> > >           }
> > > @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
> > >       return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
> > >   }
> > > +static guint lo_fhandle_hash(gconstpointer key)
> > > +{
> > > +    const struct lo_fhandle *fh = key;
> > > +    guint hash;
> > > +    size_t i;
> > > +
> > > +    /* Basically g_str_hash() */
> > > +    hash = 5381;
> > > +    for (i = 0; i < sizeof(fh->padding); i++) {
> > > +        hash += hash * 33 + (unsigned char)fh->padding[i];
> > > +    }
> > > +    hash += hash * 33 + fh->mount_id;
> > > +
> > > +    return hash;
> > > +}
> > > +
> > > +static gboolean lo_fhandle_equal(gconstpointer a, gconstpointer b)
> > > +{
> > > +    return !memcmp(a, b, sizeof(struct lo_fhandle));
> > > +}
> > > +
> > >   static void fuse_lo_data_cleanup(struct lo_data *lo)
> > >   {
> > > -    if (lo->inodes) {
> > > -        g_hash_table_destroy(lo->inodes);
> > > +    if (lo->inodes_by_ids) {
> > > +        g_hash_table_destroy(lo->inodes_by_ids);
> > > +    }
> > > +    if (lo->inodes_by_ids) {
> > > +        g_hash_table_destroy(lo->inodes_by_handle);
> > >       }
> > >       if (lo->root.posix_locks) {
> > > @@ -3990,7 +4037,8 @@ int main(int argc, char *argv[])
> > >       qemu_init_exec_dir(argv[0]);
> > >       pthread_mutex_init(&lo.mutex, NULL);
> > > -    lo.inodes = g_hash_table_new(lo_key_hash, lo_key_equal);
> > > +    lo.inodes_by_ids = g_hash_table_new(lo_key_hash, lo_key_equal);
> > > +    lo.inodes_by_handle = g_hash_table_new(lo_fhandle_hash, lo_fhandle_equal);
> > >       lo.root.fd = -1;
> > >       lo.root.fuse_ino = FUSE_ROOT_ID;
> > >       lo.cache = CACHE_AUTO;
> > > -- 
> > > 2.31.1
> > > 
> > > _______________________________________________
> > > Virtio-fs mailing list
> > > Virtio-fs@redhat.com
> > > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > > 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-17 21:21       ` Vivek Goyal
@ 2021-06-18  8:28         ` Max Reitz
  2021-06-18 18:29           ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2021-06-18  8:28 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 17.06.21 23:21, Vivek Goyal wrote:
> On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote:
>> On 11.06.21 22:04, Vivek Goyal wrote:
>>> On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
>>>> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
>>>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>>>> its inode ID will remain in use until we drop our lo_inode (and
>>>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>>>> the inode ID as an lo_inode key, because any inode with an inode ID we
>>>> find in lo_data.inodes (on the same filesystem) must be the exact same
>>>> file.
>>>>
>>>> This will change when we start setting lo_inode.fhandle so we do not
>>>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>>>> immediately remove it, so its ID can then be reused by newly created
>>>> files, even while the lo_inode object is still there[1].
>>>>
>>>> So creating a new file can then reuse the old file's inode ID, and
>>>> looking up the new file would lead to us finding the old file's
>>>> lo_inode, which is not ideal.
>>>>
>>>> Luckily, just as file handles cause this problem, they also solve it:  A
>>>> file handle contains a generation ID, which changes when an inode ID is
>>>> reused, so the new file can be distinguished from the old one.  So all
>>>> we need to do is to add a second map besides lo_data.inodes that maps
>>>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>>>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>>>
>>>> Unfortunately, we cannot rely on being able to generate file handles
>>>> every time.
>>> Hi Max,
>>>
>>> What are the cases where we can not rely being able to generate file
>>> handles?
>> I have no idea, but it’s much easier to claim we can’t than to prove that we
>> can. I’d rather be resilient.
> Assuming that we can not genererate file handles all the time and hence
> mainitaing another inode cache seems little problematic to me.

How so?

> I would rather start with that we can generate file handles and have
> a single inode cache.

The assumption that we can generate file handles all the time is a much 
stronger one (and one which needs to be proven) than assuming that 
failure is possible.

Also, it is still a single inode cache. I'm just adding a third key. 
(The two existing keys are lo_key (through lo->inodes) and fuse_ino_t 
(through lo->ino_map).)

>>>> Therefore, we still enter every lo_inode object into
>>>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>>>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>>>> entry is just a fallback.
>>> If we have to keep inodes_by_ids around, then can we just add fhandle
>>> to the lo_key. That way we can manage with single hash table and still
>>> be able to detect if inode ID has been reused.
>> We cannot, because I assume we cannot rely on name_to_handle_at() working
>> every time.
> I guess either we need concrete information that we can't generate
> file handle every time or we should assume we can until we are proven
> wrong. And then fix it accordingly, IMHO.

I don’t know why we need this other than because it would simplify the code.

Under the assumption that for a specific file we can either generate 
file handles all the time or never, the code as it is will behave 
correct. It’s just a bit more complicated than it would need to be, but 
I don’t find the diffstat of +64/-16 to be indicative of something 
that’s really bad.

>> Therefore, maybe at one point we can generate a file handle, and
>> at another, we cannot – we should still be able to look up the inode
>> regardless.
>>
>> If the file handle were part of inodes_by_ids, then we can look up inodes
>> only if we can generate a file handle either every time (for a given inode)
>> or never.
> Right. And is there a reason to belive that for the same file we can
> sometimes generate file handles and other times not.

Looking into name_to_handle_at()’s man page, there is no error listed 
that I could imagine happening only sometimes. But it doesn’t give an 
explicit guarantee that it will either always succeed or fail for a 
given inode.

Looking into the kernel, I can see that most filesystems only fail 
.encode_fh() if the buffer is too small. Overlayfs can also fail with 
ENOMEM (which will be translated to EOVERFLOW along the way, so so much 
for name_to_handle_at()’s list of error conditions), and EIO on 
conditions I don’t understand well enough (again, will become EOVERFLOW 
for the user).

You’re probably right that at least in practice we don’t need to 
accommodate for name_to_handle_at() sometimes working for some inode and 
sometimes not.

But I feel quite uneasy relying on this being the case, and being the 
case forever, when I find it quite simple to just have some added 
complexity to deal with it. It’s just a third key for our inode cache.

If you want to, I can write a 10/9 patch that simplifies the code under 
the assumption that name_to_handle_at() will either fail or not, but 
frankly I wouldn’t want to have my name under it. (Which is why it would 
be a 10/9 so I can have some explicit note that my S-o-b would be there 
only for legal reasons, not because this is really my patch.)

(And now I tentatively wrote such a patch (which requires patch 9 to be 
reverted, of course), and that gives me a diffstat of +37/-66. 
Basically, the difference is just having two comments removed.)

Max


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
@ 2021-06-18  8:30     ` Max Reitz
  -1 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-18  8:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi

On 09.06.21 17:55, Max Reitz wrote:
> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
>
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
>
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
>
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>
> Unfortunately, we cannot rely on being able to generate file handles
> every time.  Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence, the inodes_by_ids
> entry is just a fallback.
>
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
>
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
>   1 file changed, 64 insertions(+), 16 deletions(-)

[...]

> @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
>       return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
>   }
>   
> +static guint lo_fhandle_hash(gconstpointer key)
> +{
> +    const struct lo_fhandle *fh = key;
> +    guint hash;
> +    size_t i;
> +
> +    /* Basically g_str_hash() */
> +    hash = 5381;
> +    for (i = 0; i < sizeof(fh->padding); i++) {
> +        hash += hash * 33 + (unsigned char)fh->padding[i];
> +    }
> +    hash += hash * 33 + fh->mount_id;

Just spotted: These `+=` should be `=`.

Max

> +
> +    return hash;
> +}



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
@ 2021-06-18  8:30     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-18  8:30 UTC (permalink / raw)
  To: qemu-devel, virtio-fs

On 09.06.21 17:55, Max Reitz wrote:
> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
>
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
>
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
>
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>
> Unfortunately, we cannot rely on being able to generate file handles
> every time.  Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence, the inodes_by_ids
> entry is just a fallback.
>
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
>
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> ---
>   tools/virtiofsd/passthrough_ll.c | 80 +++++++++++++++++++++++++-------
>   1 file changed, 64 insertions(+), 16 deletions(-)

[...]

> @@ -3931,10 +3954,34 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
>       return la->ino == lb->ino && la->dev == lb->dev && la->mnt_id == lb->mnt_id;
>   }
>   
> +static guint lo_fhandle_hash(gconstpointer key)
> +{
> +    const struct lo_fhandle *fh = key;
> +    guint hash;
> +    size_t i;
> +
> +    /* Basically g_str_hash() */
> +    hash = 5381;
> +    for (i = 0; i < sizeof(fh->padding); i++) {
> +        hash += hash * 33 + (unsigned char)fh->padding[i];
> +    }
> +    hash += hash * 33 + fh->mount_id;

Just spotted: These `+=` should be `=`.

Max

> +
> +    return hash;
> +}


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-18  8:28         ` Max Reitz
@ 2021-06-18 18:29           ` Vivek Goyal
  2021-06-21  9:02             ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2021-06-18 18:29 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Fri, Jun 18, 2021 at 10:28:38AM +0200, Max Reitz wrote:
> On 17.06.21 23:21, Vivek Goyal wrote:
> > On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote:
> > > On 11.06.21 22:04, Vivek Goyal wrote:
> > > > On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
> > > > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > > > its inode ID will remain in use until we drop our lo_inode (and
> > > > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > > > file.
> > > > > 
> > > > > This will change when we start setting lo_inode.fhandle so we do not
> > > > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > > > immediately remove it, so its ID can then be reused by newly created
> > > > > files, even while the lo_inode object is still there[1].
> > > > > 
> > > > > So creating a new file can then reuse the old file's inode ID, and
> > > > > looking up the new file would lead to us finding the old file's
> > > > > lo_inode, which is not ideal.
> > > > > 
> > > > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > > > file handle contains a generation ID, which changes when an inode ID is
> > > > > reused, so the new file can be distinguished from the old one.  So all
> > > > > we need to do is to add a second map besides lo_data.inodes that maps
> > > > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > > > 
> > > > > Unfortunately, we cannot rely on being able to generate file handles
> > > > > every time.
> > > > Hi Max,
> > > > 
> > > > What are the cases where we can not rely being able to generate file
> > > > handles?
> > > I have no idea, but it’s much easier to claim we can’t than to prove that we
> > > can. I’d rather be resilient.
> > Assuming that we can not genererate file handles all the time and hence
> > mainitaing another inode cache seems little problematic to me.
> 
> How so?

It is extra complexity. Need to worry about one more hash table. Sure,
I give it to you that we are not creating two copies of inodes. Same
inode object is being added to two different hash tables and is being
looked up using two different keys.

> 
> > I would rather start with that we can generate file handles and have
> > a single inode cache.
> 
> The assumption that we can generate file handles all the time is a much
> stronger one (and one which needs to be proven) than assuming that failure
> is possible.

So if temporary failures can happen (like -ENOMEM, as you mentioned),
these can happen with openat() too. And in that case we return error
to caller. So why to try to hide temporary failures from
name_to_handle_at().

I am still reading your code and trying to understand it. But one
question came to mind. What happens if we can generate file handle
during lookup. But can't generate when same file is looked up again.

- A file foo.txt is looked. We can create file handle and we add it
  to lo->inodes_by_handle as well as lo->inodes_by_ds.

- Say somebody deleted file and created again and inode number got
  reused.

- Now during ->revalidation path, lookup happens again. This time say
  we can't generate file handle. If am reading lo_do_find() code
  correctly, it will find the old inode using ids and return same
  inode as result of lookup. And we did not recognize that inode
  number has been reused.

And issues might arise if we could not generate file handle in first
lookup but could generate in second.

- A file foo.txt is lookedup. We can not create file handle and we add it
  to lo->inodes_by_ids.

- Say somebody deleted file and created again and inode number got
  reused.

- Now during ->revalidation path, lookup happens again. This time say
  we can generate file handle. If am reading lo_do_find() code
  correctly, it will find the old inode using ids and return same
  inode as result of lookup. And we did not recognize that inode
  number has been reused.

IOW, because we could not generate the file handle, we lost the
ability to recognize that inode number has been reused. That means
either we don't switch to using file handles or if we do switch,
it is important that we can generate file handle to differentiate
whether inode number has been used or not. If not, return error to
caller. Caller probably will mark inode bad and let and do a lookup
again.

> 
> Also, it is still a single inode cache. I'm just adding a third key. (The
> two existing keys are lo_key (through lo->inodes) and fuse_ino_t (through
> lo->ino_map).)
> 
> > > > > Therefore, we still enter every lo_inode object into
> > > > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > > > entry is just a fallback.
> > > > If we have to keep inodes_by_ids around, then can we just add fhandle
> > > > to the lo_key. That way we can manage with single hash table and still
> > > > be able to detect if inode ID has been reused.
> > > We cannot, because I assume we cannot rely on name_to_handle_at() working
> > > every time.
> > I guess either we need concrete information that we can't generate
> > file handle every time or we should assume we can until we are proven
> > wrong. And then fix it accordingly, IMHO.
> 
> I don’t know why we need this other than because it would simplify the code.
> 
> Under the assumption that for a specific file we can either generate file
> handles all the time or never, the code as it is will behave correct. It’s
> just a bit more complicated than it would need to be, but I don’t find the
> diffstat of +64/-16 to be indicative of something that’s really bad.
> 
> > > Therefore, maybe at one point we can generate a file handle, and
> > > at another, we cannot – we should still be able to look up the inode
> > > regardless.
> > > 
> > > If the file handle were part of inodes_by_ids, then we can look up inodes
> > > only if we can generate a file handle either every time (for a given inode)
> > > or never.
> > Right. And is there a reason to belive that for the same file we can
> > sometimes generate file handles and other times not.
> 
> Looking into name_to_handle_at()’s man page, there is no error listed that I
> could imagine happening only sometimes. But it doesn’t give an explicit
> guarantee that it will either always succeed or fail for a given inode.
> 
> Looking into the kernel, I can see that most filesystems only fail
> .encode_fh() if the buffer is too small. Overlayfs can also fail with ENOMEM
> (which will be translated to EOVERFLOW along the way, so so much for
> name_to_handle_at()’s list of error conditions), and EIO on conditions I
> don’t understand well enough (again, will become EOVERFLOW for the user).
> 
> You’re probably right that at least in practice we don’t need to accommodate
> for name_to_handle_at() sometimes working for some inode and sometimes not.

What am I not able to understand is that why we can't return error if
we run into a temporary issue and can't generate file handle. What's
that requirement that we need to hide the error and try to cover it
up by some other means.

Thanks
Vivek

> 
> But I feel quite uneasy relying on this being the case, and being the case
> forever, when I find it quite simple to just have some added complexity to
> deal with it. It’s just a third key for our inode cache.
> 
> If you want to, I can write a 10/9 patch that simplifies the code under the
> assumption that name_to_handle_at() will either fail or not, but frankly I
> wouldn’t want to have my name under it. (Which is why it would be a 10/9 so
> I can have some explicit note that my S-o-b would be there only for legal
> reasons, not because this is really my patch.)
> 
> (And now I tentatively wrote such a patch (which requires patch 9 to be
> reverted, of course), and that gives me a diffstat of +37/-66. Basically,
> the difference is just having two comments removed.)
> 
> Max
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-18 18:29           ` Vivek Goyal
@ 2021-06-21  9:02             ` Max Reitz
  2021-06-21 15:51               ` Vivek Goyal
  2021-07-13 15:07               ` Max Reitz
  0 siblings, 2 replies; 38+ messages in thread
From: Max Reitz @ 2021-06-21  9:02 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 18.06.21 20:29, Vivek Goyal wrote:
> On Fri, Jun 18, 2021 at 10:28:38AM +0200, Max Reitz wrote:
>> On 17.06.21 23:21, Vivek Goyal wrote:
>>> On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote:
>>>> On 11.06.21 22:04, Vivek Goyal wrote:
>>>>> On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
>>>>>> Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
>>>>>> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
>>>>>> its inode ID will remain in use until we drop our lo_inode (and
>>>>>> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
>>>>>> the inode ID as an lo_inode key, because any inode with an inode ID we
>>>>>> find in lo_data.inodes (on the same filesystem) must be the exact same
>>>>>> file.
>>>>>>
>>>>>> This will change when we start setting lo_inode.fhandle so we do not
>>>>>> have to keep an O_PATH FD open.  Then, unlinking such an inode will
>>>>>> immediately remove it, so its ID can then be reused by newly created
>>>>>> files, even while the lo_inode object is still there[1].
>>>>>>
>>>>>> So creating a new file can then reuse the old file's inode ID, and
>>>>>> looking up the new file would lead to us finding the old file's
>>>>>> lo_inode, which is not ideal.
>>>>>>
>>>>>> Luckily, just as file handles cause this problem, they also solve it:  A
>>>>>> file handle contains a generation ID, which changes when an inode ID is
>>>>>> reused, so the new file can be distinguished from the old one.  So all
>>>>>> we need to do is to add a second map besides lo_data.inodes that maps
>>>>>> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
>>>>>> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
>>>>>>
>>>>>> Unfortunately, we cannot rely on being able to generate file handles
>>>>>> every time.
>>>>> Hi Max,
>>>>>
>>>>> What are the cases where we can not rely being able to generate file
>>>>> handles?
>>>> I have no idea, but it’s much easier to claim we can’t than to prove that we
>>>> can. I’d rather be resilient.
>>> Assuming that we can not genererate file handles all the time and hence
>>> mainitaing another inode cache seems little problematic to me.
>> How so?
> It is extra complexity. Need to worry about one more hash table. Sure,
> I give it to you that we are not creating two copies of inodes. Same
> inode object is being added to two different hash tables and is being
> looked up using two different keys.
>
>>> I would rather start with that we can generate file handles and have
>>> a single inode cache.
>> The assumption that we can generate file handles all the time is a much
>> stronger one (and one which needs to be proven) than assuming that failure
>> is possible.
> So if temporary failures can happen (like -ENOMEM, as you mentioned),
> these can happen with openat() too. And in that case we return error
> to caller. So why to try to hide temporary failures from
> name_to_handle_at().

Well, for openat() we don’t have a choice, if that fails, there is no 
fallback, so we must return an error.  For name_to_handle_at(), there is 
a fallback.

> I am still reading your code and trying to understand it. But one
> question came to mind. What happens if we can generate file handle
> during lookup. But can't generate when same file is looked up again.
>
> - A file foo.txt is looked. We can create file handle and we add it
>    to lo->inodes_by_handle as well as lo->inodes_by_ds.
>
> - Say somebody deleted file and created again and inode number got
>    reused.
>
> - Now during ->revalidation path, lookup happens again. This time say
>    we can't generate file handle. If am reading lo_do_find() code
>    correctly, it will find the old inode using ids and return same
>    inode as result of lookup. And we did not recognize that inode
>    number has been reused.

Oh, that’s a good point.  If an lo_inode has no O_PATH fd but is only 
addressed by handle, we must always look it up by handle.

> And issues might arise if we could not generate file handle in first
> lookup but could generate in second.
>
> - A file foo.txt is lookedup. We can not create file handle and we add it
>    to lo->inodes_by_ids.
>
> - Say somebody deleted file and created again and inode number got
>    reused.

This is not possible, because if we could not generate a file handle on 
the first lookup, the lo_inode will have an O_PATH fd attached to it, so 
the inode number cannot be reused while the lo_inode still exists.

> - Now during ->revalidation path, lookup happens again. This time say
>    we can generate file handle. If am reading lo_do_find() code
>    correctly, it will find the old inode using ids and return same
>    inode as result of lookup. And we did not recognize that inode
>    number has been reused.
>
> IOW, because we could not generate the file handle, we lost the
> ability to recognize that inode number has been reused. That means
> either we don't switch to using file handles or if we do switch,
> it is important that we can generate file handle to differentiate
> whether inode number has been used or not. If not, return error to
> caller. Caller probably will mark inode bad and let and do a lookup
> again.
>
>> Also, it is still a single inode cache. I'm just adding a third key. (The
>> two existing keys are lo_key (through lo->inodes) and fuse_ino_t (through
>> lo->ino_map).)
>>
>>>>>> Therefore, we still enter every lo_inode object into
>>>>>> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
>>>>>> potential inodes_by_handle entry then has precedence, the inodes_by_ids
>>>>>> entry is just a fallback.
>>>>> If we have to keep inodes_by_ids around, then can we just add fhandle
>>>>> to the lo_key. That way we can manage with single hash table and still
>>>>> be able to detect if inode ID has been reused.
>>>> We cannot, because I assume we cannot rely on name_to_handle_at() working
>>>> every time.
>>> I guess either we need concrete information that we can't generate
>>> file handle every time or we should assume we can until we are proven
>>> wrong. And then fix it accordingly, IMHO.
>> I don’t know why we need this other than because it would simplify the code.
>>
>> Under the assumption that for a specific file we can either generate file
>> handles all the time or never, the code as it is will behave correct. It’s
>> just a bit more complicated than it would need to be, but I don’t find the
>> diffstat of +64/-16 to be indicative of something that’s really bad.
>>
>>>> Therefore, maybe at one point we can generate a file handle, and
>>>> at another, we cannot – we should still be able to look up the inode
>>>> regardless.
>>>>
>>>> If the file handle were part of inodes_by_ids, then we can look up inodes
>>>> only if we can generate a file handle either every time (for a given inode)
>>>> or never.
>>> Right. And is there a reason to belive that for the same file we can
>>> sometimes generate file handles and other times not.
>> Looking into name_to_handle_at()’s man page, there is no error listed that I
>> could imagine happening only sometimes. But it doesn’t give an explicit
>> guarantee that it will either always succeed or fail for a given inode.
>>
>> Looking into the kernel, I can see that most filesystems only fail
>> .encode_fh() if the buffer is too small. Overlayfs can also fail with ENOMEM
>> (which will be translated to EOVERFLOW along the way, so so much for
>> name_to_handle_at()’s list of error conditions), and EIO on conditions I
>> don’t understand well enough (again, will become EOVERFLOW for the user).
>>
>> You’re probably right that at least in practice we don’t need to accommodate
>> for name_to_handle_at() sometimes working for some inode and sometimes not.
> What am I not able to understand is that why we can't return error if
> we run into a temporary issue and can't generate file handle. What's
> that requirement that we need to hide the error and try to cover it
> up by some other means.

There is no requirement, it’s just that we need to hide ENOTSUPP errors 
anyway (because e.g. some submounted filesystem may not support file 
handles), so I considered hiding temporary errors not to really add 
complexity.  (Which is true, as can be seen from the diff stat I posted 
below: Dropping the second hash table and making the handle part of 
lo_key, so that temporary errors are now fatal, generates a diff of 
+37/-66, where -20 are just two comments (which realistically I’d need 
to replace by different comments), so in terms of code, it’s +37/-46.)

I’d just rather handle errors gracefully when I find it doesn’t really 
cost complexity.


However, you made a good point in that we must require 
name_to_handle_at() to work if it worked before for some inode, not 
because it would be simpler, but because it would be wrong otherwise.

As for the other way around...  Well, now I don’t have a strong opinion 
on it.  Handling temporary name_to_handle_at() failure after it worked 
the first time should not add extra complexity, but it wouldn’t be 
symmetric.  Like, allowing temporary failure sometimes but not at other 
times.

The next question is, how do we detect temporary failure, because if we 
look up some new inode, name_to_handle_at() fails, we ignore it, and 
then it starts to work and we fail all further lookups, that’s not 
good.  We should have the first lookup fail.  I suppose ENOTSUPP means 
“OK to ignore”, and for everything else we should let lookup fail?  (And 
that pretty much answers my "what if name_to_handle_at() works the first 
time, but then fails" question.  If we let anything but ENOTSUPP let the 
lookup fail, then we should do so every time.)

Max

>> But I feel quite uneasy relying on this being the case, and being the case
>> forever, when I find it quite simple to just have some added complexity to
>> deal with it. It’s just a third key for our inode cache.
>>
>> If you want to, I can write a 10/9 patch that simplifies the code under the
>> assumption that name_to_handle_at() will either fail or not, but frankly I
>> wouldn’t want to have my name under it. (Which is why it would be a 10/9 so
>> I can have some explicit note that my S-o-b would be there only for legal
>> reasons, not because this is really my patch.)
>>
>> (And now I tentatively wrote such a patch (which requires patch 9 to be
>> reverted, of course), and that gives me a diffstat of +37/-66. Basically,
>> the difference is just having two comments removed.)
>>
>> Max
>>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-21  9:02             ` Max Reitz
@ 2021-06-21 15:51               ` Vivek Goyal
  2021-06-21 17:07                 ` Max Reitz
  2021-07-13 15:07               ` Max Reitz
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2021-06-21 15:51 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Mon, Jun 21, 2021 at 11:02:16AM +0200, Max Reitz wrote:
> On 18.06.21 20:29, Vivek Goyal wrote:
> > On Fri, Jun 18, 2021 at 10:28:38AM +0200, Max Reitz wrote:
> > > On 17.06.21 23:21, Vivek Goyal wrote:
> > > > On Wed, Jun 16, 2021 at 03:38:13PM +0200, Max Reitz wrote:
> > > > > On 11.06.21 22:04, Vivek Goyal wrote:
> > > > > > On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
> > > > > > > Currently, lo_inode.fhandle is always NULL and so always keep an O_PATH
> > > > > > > FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> > > > > > > its inode ID will remain in use until we drop our lo_inode (and
> > > > > > > lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> > > > > > > the inode ID as an lo_inode key, because any inode with an inode ID we
> > > > > > > find in lo_data.inodes (on the same filesystem) must be the exact same
> > > > > > > file.
> > > > > > > 
> > > > > > > This will change when we start setting lo_inode.fhandle so we do not
> > > > > > > have to keep an O_PATH FD open.  Then, unlinking such an inode will
> > > > > > > immediately remove it, so its ID can then be reused by newly created
> > > > > > > files, even while the lo_inode object is still there[1].
> > > > > > > 
> > > > > > > So creating a new file can then reuse the old file's inode ID, and
> > > > > > > looking up the new file would lead to us finding the old file's
> > > > > > > lo_inode, which is not ideal.
> > > > > > > 
> > > > > > > Luckily, just as file handles cause this problem, they also solve it:  A
> > > > > > > file handle contains a generation ID, which changes when an inode ID is
> > > > > > > reused, so the new file can be distinguished from the old one.  So all
> > > > > > > we need to do is to add a second map besides lo_data.inodes that maps
> > > > > > > file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> > > > > > > clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> > > > > > > 
> > > > > > > Unfortunately, we cannot rely on being able to generate file handles
> > > > > > > every time.
> > > > > > Hi Max,
> > > > > > 
> > > > > > What are the cases where we can not rely being able to generate file
> > > > > > handles?
> > > > > I have no idea, but it’s much easier to claim we can’t than to prove that we
> > > > > can. I’d rather be resilient.
> > > > Assuming that we can not genererate file handles all the time and hence
> > > > mainitaing another inode cache seems little problematic to me.
> > > How so?
> > It is extra complexity. Need to worry about one more hash table. Sure,
> > I give it to you that we are not creating two copies of inodes. Same
> > inode object is being added to two different hash tables and is being
> > looked up using two different keys.
> > 
> > > > I would rather start with that we can generate file handles and have
> > > > a single inode cache.
> > > The assumption that we can generate file handles all the time is a much
> > > stronger one (and one which needs to be proven) than assuming that failure
> > > is possible.
> > So if temporary failures can happen (like -ENOMEM, as you mentioned),
> > these can happen with openat() too. And in that case we return error
> > to caller. So why to try to hide temporary failures from
> > name_to_handle_at().
> 
> Well, for openat() we don’t have a choice, if that fails, there is no
> fallback, so we must return an error.  For name_to_handle_at(), there is a
> fallback.
> 
> > I am still reading your code and trying to understand it. But one
> > question came to mind. What happens if we can generate file handle
> > during lookup. But can't generate when same file is looked up again.
> > 
> > - A file foo.txt is looked. We can create file handle and we add it
> >    to lo->inodes_by_handle as well as lo->inodes_by_ds.
> > 
> > - Say somebody deleted file and created again and inode number got
> >    reused.
> > 
> > - Now during ->revalidation path, lookup happens again. This time say
> >    we can't generate file handle. If am reading lo_do_find() code
> >    correctly, it will find the old inode using ids and return same
> >    inode as result of lookup. And we did not recognize that inode
> >    number has been reused.
> 
> Oh, that’s a good point.  If an lo_inode has no O_PATH fd but is only
> addressed by handle, we must always look it up by handle.
> 
> > And issues might arise if we could not generate file handle in first
> > lookup but could generate in second.
> > 
> > - A file foo.txt is lookedup. We can not create file handle and we add it
> >    to lo->inodes_by_ids.
> > 
> > - Say somebody deleted file and created again and inode number got
> >    reused.
> 
> This is not possible, because if we could not generate a file handle on the
> first lookup, the lo_inode will have an O_PATH fd attached to it, so the
> inode number cannot be reused while the lo_inode still exists.
> 
> > - Now during ->revalidation path, lookup happens again. This time say
> >    we can generate file handle. If am reading lo_do_find() code
> >    correctly, it will find the old inode using ids and return same
> >    inode as result of lookup. And we did not recognize that inode
> >    number has been reused.
> > 
> > IOW, because we could not generate the file handle, we lost the
> > ability to recognize that inode number has been reused. That means
> > either we don't switch to using file handles or if we do switch,
> > it is important that we can generate file handle to differentiate
> > whether inode number has been used or not. If not, return error to
> > caller. Caller probably will mark inode bad and let and do a lookup
> > again.
> > 
> > > Also, it is still a single inode cache. I'm just adding a third key. (The
> > > two existing keys are lo_key (through lo->inodes) and fuse_ino_t (through
> > > lo->ino_map).)
> > > 
> > > > > > > Therefore, we still enter every lo_inode object into
> > > > > > > inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> > > > > > > potential inodes_by_handle entry then has precedence, the inodes_by_ids
> > > > > > > entry is just a fallback.
> > > > > > If we have to keep inodes_by_ids around, then can we just add fhandle
> > > > > > to the lo_key. That way we can manage with single hash table and still
> > > > > > be able to detect if inode ID has been reused.
> > > > > We cannot, because I assume we cannot rely on name_to_handle_at() working
> > > > > every time.
> > > > I guess either we need concrete information that we can't generate
> > > > file handle every time or we should assume we can until we are proven
> > > > wrong. And then fix it accordingly, IMHO.
> > > I don’t know why we need this other than because it would simplify the code.
> > > 
> > > Under the assumption that for a specific file we can either generate file
> > > handles all the time or never, the code as it is will behave correct. It’s
> > > just a bit more complicated than it would need to be, but I don’t find the
> > > diffstat of +64/-16 to be indicative of something that’s really bad.
> > > 
> > > > > Therefore, maybe at one point we can generate a file handle, and
> > > > > at another, we cannot – we should still be able to look up the inode
> > > > > regardless.
> > > > > 
> > > > > If the file handle were part of inodes_by_ids, then we can look up inodes
> > > > > only if we can generate a file handle either every time (for a given inode)
> > > > > or never.
> > > > Right. And is there a reason to belive that for the same file we can
> > > > sometimes generate file handles and other times not.
> > > Looking into name_to_handle_at()’s man page, there is no error listed that I
> > > could imagine happening only sometimes. But it doesn’t give an explicit
> > > guarantee that it will either always succeed or fail for a given inode.
> > > 
> > > Looking into the kernel, I can see that most filesystems only fail
> > > .encode_fh() if the buffer is too small. Overlayfs can also fail with ENOMEM
> > > (which will be translated to EOVERFLOW along the way, so so much for
> > > name_to_handle_at()’s list of error conditions), and EIO on conditions I
> > > don’t understand well enough (again, will become EOVERFLOW for the user).
> > > 
> > > You’re probably right that at least in practice we don’t need to accommodate
> > > for name_to_handle_at() sometimes working for some inode and sometimes not.
> > What am I not able to understand is that why we can't return error if
> > we run into a temporary issue and can't generate file handle. What's
> > that requirement that we need to hide the error and try to cover it
> > up by some other means.
> 
> There is no requirement, it’s just that we need to hide ENOTSUPP errors
> anyway (because e.g. some submounted filesystem may not support file
> handles), so I considered hiding temporary errors

ENOTSUPP is not a temporary error I guess. But this is a good point that
if host filesystem does not support file handles, should we return
error so that user is forced to remove "-o inode_file_handles" option.

I can see multiple modes and they all seem to be useful in different
scenarios.

A. Do not use file handles at all.

B. Use file handles if filesystem supports it. Also this could do some
   kind of mix and matching so that some inodes use file handles while
   others use fd. We could think of implementing some threshold and
   if open fds go above this threshold, switch to file handle stuff.

C. Strictly use file handles otherwise return error to caller.

One possibility is that we implement two options inode_file_handles
and no_inode_file_handles.

- User specified -o no_inode_file_handles, implement A.
- User specified -o inode_file_handles, implement C.
- User did not specify anything, then use file handles opportunistically
  as seen fit by daemon. This provides us the maximum flexibility and
  this practically implements option B.

IOW, if user did not specify anything, then we can think of implementing
file handles by default and fallback to using O_PATH fds if filesystem
does not support file handles (ENOTSUPP). But if user did specify
"-o no_inode_file_handles" or "-o inode_file_handles", this kind
of points to strictly implementing respective policy, IMHO. That's how
I have implemented some other options.

Alternatively, we could think of adding one more option say
"inode_file_handles_only.

- "-o no_inode_files_handles" implements A.
- "-o inode_files_handles" implements B.
- "-o inode_files_handles_only" implements C.
- If user did not specify anything on command line, then its up to the
  daemon whatever default policy it wants and default can change
  over time.

> not to really add
> complexity.  (Which is true, as can be seen from the diff stat I posted
> below: Dropping the second hash table and making the handle part of lo_key,
> so that temporary errors are now fatal, generates a diff of +37/-66, where
> -20 are just two comments (which realistically I’d need to replace by
> different comments), so in terms of code, it’s +37/-46.)
> 
> I’d just rather handle errors gracefully when I find it doesn’t really cost
> complexity.
> 
> 
> However, you made a good point in that we must require name_to_handle_at()
> to work if it worked before for some inode, not because it would be simpler,
> but because it would be wrong otherwise.
> 
> As for the other way around...  Well, now I don’t have a strong opinion on
> it.  Handling temporary name_to_handle_at() failure after it worked the
> first time should not add extra complexity, but it wouldn’t be symmetric. 
> Like, allowing temporary failure sometimes but not at other times.

Right. If we decided that we need to generate file handle for an inode
and underlying filesystem supports file handles, then handling temporary
failures only sometimes will make it assymetric. At this point of time
I am more inclined to return error to caller on temporary failures. But
if this does not work well in practice, I am open to change the policy.

> 
> The next question is, how do we detect temporary failure, because if we look
> up some new inode, name_to_handle_at() fails, we ignore it, and then it
> starts to work and we fail all further lookups, that’s not good.  We should
> have the first lookup fail.  I suppose ENOTSUPP means “OK to ignore”, and
> for everything else we should let lookup fail?

ENOTSUPP will be a permanent failure right? And not a temporary one.

It seems reasonable that we gracefully fall back to O_PATH fd in case
of ENOTSUPP (Assuming -o inode_file_handles means try to use file
handles and fallback to O_PATH fds if host filesystem does not 
support it).

But for temporary failures we probably will have to return errors to callers.
Do you have a specific temporary failure in mind which you are concerned
about.

> (And that pretty much
> answers my "what if name_to_handle_at() works the first time, but then
> fails" question.  If we let anything but ENOTSUPP let the lookup fail, then
> we should do so every time.)

Agreed. ENOTSUPP seems to be the only error when falling back to O_PATH
fd makes most sense to me. Rest of them probably should be propagated
to caller.

And given ENOTSUPP is a permanent failure, you probably will be able
to add fhandle to lo_key and not require a separate mapping which
looks up inode by fhandle.

Thanks
Vivek

> 
> Max
> 
> > > But I feel quite uneasy relying on this being the case, and being the case
> > > forever, when I find it quite simple to just have some added complexity to
> > > deal with it. It’s just a third key for our inode cache.
> > > 
> > > If you want to, I can write a 10/9 patch that simplifies the code under the
> > > assumption that name_to_handle_at() will either fail or not, but frankly I
> > > wouldn’t want to have my name under it. (Which is why it would be a 10/9 so
> > > I can have some explicit note that my S-o-b would be there only for legal
> > > reasons, not because this is really my patch.)
> > > 
> > > (And now I tentatively wrote such a patch (which requires patch 9 to be
> > > reverted, of course), and that gives me a diffstat of +37/-66. Basically,
> > > the difference is just having two comments removed.)
> > > 
> > > Max
> > > 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-21 15:51               ` Vivek Goyal
@ 2021-06-21 17:07                 ` Max Reitz
  2021-06-21 21:27                   ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2021-06-21 17:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 21.06.21 17:51, Vivek Goyal wrote:
> On Mon, Jun 21, 2021 at 11:02:16AM +0200, Max Reitz wrote:
>> On 18.06.21 20:29, Vivek Goyal wrote:

[snip]

>>> What am I not able to understand is that why we can't return error if
>>> we run into a temporary issue and can't generate file handle. What's
>>> that requirement that we need to hide the error and try to cover it
>>> up by some other means.
>> There is no requirement, it’s just that we need to hide ENOTSUPP errors
>> anyway (because e.g. some submounted filesystem may not support file
>> handles), so I considered hiding temporary errors
> ENOTSUPP is not a temporary error I guess. But this is a good point that
> if host filesystem does not support file handles, should we return
> error so that user is forced to remove "-o inode_file_handles" option.
>
> I can see multiple modes and they all seem to be useful in different
> scenarios.
>
> A. Do not use file handles at all.
>
> B. Use file handles if filesystem supports it. Also this could do some
>     kind of mix and matching so that some inodes use file handles while
>     others use fd. We could think of implementing some threshold and
>     if open fds go above this threshold, switch to file handle stuff.
>
> C. Strictly use file handles otherwise return error to caller.
>
> One possibility is that we implement two options inode_file_handles
> and no_inode_file_handles.
>
> - User specified -o no_inode_file_handles, implement A.
> - User specified -o inode_file_handles, implement C.
> - User did not specify anything, then use file handles opportunistically
>    as seen fit by daemon. This provides us the maximum flexibility and
>    this practically implements option B.
>
> IOW, if user did not specify anything, then we can think of implementing
> file handles by default and fallback to using O_PATH fds if filesystem
> does not support file handles (ENOTSUPP). But if user did specify
> "-o no_inode_file_handles" or "-o inode_file_handles", this kind
> of points to strictly implementing respective policy, IMHO. That's how
> I have implemented some other options.
>
> Alternatively, we could think of adding one more option say
> "inode_file_handles_only.
>
> - "-o no_inode_files_handles" implements A.
> - "-o inode_files_handles" implements B.
> - "-o inode_files_handles_only" implements C.
> - If user did not specify anything on command line, then its up to the
>    daemon whatever default policy it wants and default can change
>    over time.

I think it makes sense not to punish the user for wanting to use file 
handles as much as possible and still gracefully handling submounts that 
don’t support them.  So I’d want to implement B first, and have that be 
-o inode_files_handles.  I think A as -o no_inode_file_handles is also 
there, automatically...?

I don’t think there’s much of a problem with implementing C, except we 
probably want to log ENOTSUPP errors then, so the user can figure out 
what’s going on.  I feel like we can still wait with such an option 
until there’s a demand for it.

(Except that perhaps the demand would come in the form of “please help I 
use -o inode_file_handles but virtiofsd’s FD count is still too high I 
don’t know what to do”, and that probably wouldn’t be so great.  But 
then again, inodes_files_handles_only wouldn’t help a user in that case 
either, because it changes “works badly” to “doesn’t work at all”.  Now 
I’m wondering what the actual use case of inodes_files_handles_only 
would be.)

>> not to really add
>> complexity.  (Which is true, as can be seen from the diff stat I posted
>> below: Dropping the second hash table and making the handle part of lo_key,
>> so that temporary errors are now fatal, generates a diff of +37/-66, where
>> -20 are just two comments (which realistically I’d need to replace by
>> different comments), so in terms of code, it’s +37/-46.)
>>
>> I’d just rather handle errors gracefully when I find it doesn’t really cost
>> complexity.
>>
>>
>> However, you made a good point in that we must require name_to_handle_at()
>> to work if it worked before for some inode, not because it would be simpler,
>> but because it would be wrong otherwise.
>>
>> As for the other way around...  Well, now I don’t have a strong opinion on
>> it.  Handling temporary name_to_handle_at() failure after it worked the
>> first time should not add extra complexity, but it wouldn’t be symmetric.
>> Like, allowing temporary failure sometimes but not at other times.
> Right. If we decided that we need to generate file handle for an inode
> and underlying filesystem supports file handles, then handling temporary
> failures only sometimes will make it assymetric. At this point of time
> I am more inclined to return error to caller on temporary failures. But
> if this does not work well in practice, I am open to change the policy.
>
>> The next question is, how do we detect temporary failure, because if we look
>> up some new inode, name_to_handle_at() fails, we ignore it, and then it
>> starts to work and we fail all further lookups, that’s not good.  We should
>> have the first lookup fail.  I suppose ENOTSUPP means “OK to ignore”, and
>> for everything else we should let lookup fail?
> ENOTSUPP will be a permanent failure right? And not a temporary one.

I sure hope so.  The man page says “The filesystem does not support 
decoding of a pathname to a file handle.”, which sounds pretty 
permanent, unless perhaps the filesystem driver is updated in-place.

> It seems reasonable that we gracefully fall back to O_PATH fd in case
> of ENOTSUPP (Assuming -o inode_file_handles means try to use file
> handles and fallback to O_PATH fds if host filesystem does not
> support it).
>
> But for temporary failures we probably will have to return errors to callers.
> Do you have a specific temporary failure in mind which you are concerned
> about.

Oh, no.  It’s just, there can be errors other than EOPNOTSUPP (now that 
I looked into the man page I know that’s what it actually is), and we 
have to decide what to do then.  Nothing more.

>> (And that pretty much
>> answers my "what if name_to_handle_at() works the first time, but then
>> fails" question.  If we let anything but ENOTSUPP let the lookup fail, then
>> we should do so every time.)
> Agreed. ENOTSUPP seems to be the only error when falling back to O_PATH
> fd makes most sense to me. Rest of them probably should be propagated
> to caller.

Perhaps also EOVERFLOW, which indicates that our file handle storage is 
too small.  This shouldn’t happen, given that we use MAX_HANDLE_SZ to 
size it, but I imagine it’s possible that MAX_HANDLE_SZ is increased in 
the future, and when you then compile virtiofsd on one system and run it 
on another, it’s possible that some filesystem driver wants to store 
larger handles.  Given that we expect a file handle to always be the 
same for a given inode, such an EOVERFLOW error should be permanent, too 
(for a given inode).

> And given ENOTSUPP is a permanent failure, you probably will be able
> to add fhandle to lo_key and not require a separate mapping which
> looks up inode by fhandle.

Sure, but again, this doesn’t make anything simpler.

What I particularly don’t like about this is that for doing so, we have 
a choice between (A) adding fhandle to lo_key inline (i.e. of type 
lo_fhandle), or (B) adding it as a pointer (i.e. of type lo_fhandle *).

(A) seems the more obvious solution, but then lo_inode.fhandle would 
either not exist (one would have to go through lo_inode.key.fhandle, 
which I don’t really like), or be a pointer to &lo_inode.key.fhandle, 
which is also kind of weird.  Also, if there is no file handle, it would 
need to be explicitly 0, which wastes memory when not using -o 
inode_file_handles (and cycles, because hashing and comparing will take 
longer).

So (B) would do it the other way around, lo_inode.key.fhandle would be a 
copy of lo_inode.fhandle.  But having a pointer be part of a key is, 
while perfectly feasible, again strange.

I find two hash maps to just be cleaner.

Max


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-21 17:07                 ` Max Reitz
@ 2021-06-21 21:27                   ` Vivek Goyal
  0 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2021-06-21 21:27 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Mon, Jun 21, 2021 at 07:07:19PM +0200, Max Reitz wrote:
> On 21.06.21 17:51, Vivek Goyal wrote:
> > On Mon, Jun 21, 2021 at 11:02:16AM +0200, Max Reitz wrote:
> > > On 18.06.21 20:29, Vivek Goyal wrote:
> 
> [snip]
> 
> > > > What am I not able to understand is that why we can't return error if
> > > > we run into a temporary issue and can't generate file handle. What's
> > > > that requirement that we need to hide the error and try to cover it
> > > > up by some other means.
> > > There is no requirement, it’s just that we need to hide ENOTSUPP errors
> > > anyway (because e.g. some submounted filesystem may not support file
> > > handles), so I considered hiding temporary errors
> > ENOTSUPP is not a temporary error I guess. But this is a good point that
> > if host filesystem does not support file handles, should we return
> > error so that user is forced to remove "-o inode_file_handles" option.
> > 
> > I can see multiple modes and they all seem to be useful in different
> > scenarios.
> > 
> > A. Do not use file handles at all.
> > 
> > B. Use file handles if filesystem supports it. Also this could do some
> >     kind of mix and matching so that some inodes use file handles while
> >     others use fd. We could think of implementing some threshold and
> >     if open fds go above this threshold, switch to file handle stuff.
> > 
> > C. Strictly use file handles otherwise return error to caller.
> > 
> > One possibility is that we implement two options inode_file_handles
> > and no_inode_file_handles.
> > 
> > - User specified -o no_inode_file_handles, implement A.
> > - User specified -o inode_file_handles, implement C.
> > - User did not specify anything, then use file handles opportunistically
> >    as seen fit by daemon. This provides us the maximum flexibility and
> >    this practically implements option B.
> > 
> > IOW, if user did not specify anything, then we can think of implementing
> > file handles by default and fallback to using O_PATH fds if filesystem
> > does not support file handles (ENOTSUPP). But if user did specify
> > "-o no_inode_file_handles" or "-o inode_file_handles", this kind
> > of points to strictly implementing respective policy, IMHO. That's how
> > I have implemented some other options.
> > 
> > Alternatively, we could think of adding one more option say
> > "inode_file_handles_only.
> > 
> > - "-o no_inode_files_handles" implements A.
> > - "-o inode_files_handles" implements B.
> > - "-o inode_files_handles_only" implements C.
> > - If user did not specify anything on command line, then its up to the
> >    daemon whatever default policy it wants and default can change
> >    over time.
> 
> I think it makes sense not to punish the user for wanting to use file
> handles as much as possible and still gracefully handling submounts that
> don’t support them.  So I’d want to implement B first, and have that be -o
> inode_files_handles.

Agreed. B probably will be most common.

> I think A as -o no_inode_file_handles is also there,
> automatically...?

I do see you have added it.

    { "inode_file_handles", offsetof(struct lo_data, inode_file_handles), 1 },
    { "no_inode_file_handles",
      offsetof(struct lo_data, inode_file_handles),
      0 },

> 
> I don’t think there’s much of a problem with implementing C, except we
> probably want to log ENOTSUPP errors then, so the user can figure out what’s
> going on.  I feel like we can still wait with such an option until there’s a
> demand for it.

Agreed.

> 
> (Except that perhaps the demand would come in the form of “please help I use
> -o inode_file_handles but virtiofsd’s FD count is still too high I don’t
> know what to do”, and that probably wouldn’t be so great.  But then again,
> inodes_files_handles_only wouldn’t help a user in that case either, because
> it changes “works badly” to “doesn’t work at all”.

May be we should log it. Will be good if some of the Info logs go into
syslogs and users can look at it.

> Now I’m wondering what
> the actual use case of inodes_files_handles_only would be.)

I was thinking more of debugging use cases. (Assuming, "-o
inode_file_handles" can take liberties and use O_PATH fd for some inodes
and file handle for other inodes). For example, determining what's the
overhead of using file handles. In that case I will like to make sure
O_PATH fds are not being used.

But we don't need it now. We can add it when we need it later.

> 
> > > not to really add
> > > complexity.  (Which is true, as can be seen from the diff stat I posted
> > > below: Dropping the second hash table and making the handle part of lo_key,
> > > so that temporary errors are now fatal, generates a diff of +37/-66, where
> > > -20 are just two comments (which realistically I’d need to replace by
> > > different comments), so in terms of code, it’s +37/-46.)
> > > 
> > > I’d just rather handle errors gracefully when I find it doesn’t really cost
> > > complexity.
> > > 
> > > 
> > > However, you made a good point in that we must require name_to_handle_at()
> > > to work if it worked before for some inode, not because it would be simpler,
> > > but because it would be wrong otherwise.
> > > 
> > > As for the other way around...  Well, now I don’t have a strong opinion on
> > > it.  Handling temporary name_to_handle_at() failure after it worked the
> > > first time should not add extra complexity, but it wouldn’t be symmetric.
> > > Like, allowing temporary failure sometimes but not at other times.
> > Right. If we decided that we need to generate file handle for an inode
> > and underlying filesystem supports file handles, then handling temporary
> > failures only sometimes will make it assymetric. At this point of time
> > I am more inclined to return error to caller on temporary failures. But
> > if this does not work well in practice, I am open to change the policy.
> > 
> > > The next question is, how do we detect temporary failure, because if we look
> > > up some new inode, name_to_handle_at() fails, we ignore it, and then it
> > > starts to work and we fail all further lookups, that’s not good.  We should
> > > have the first lookup fail.  I suppose ENOTSUPP means “OK to ignore”, and
> > > for everything else we should let lookup fail?
> > ENOTSUPP will be a permanent failure right? And not a temporary one.
> 
> I sure hope so.  The man page says “The filesystem does not support decoding
> of a pathname to a file handle.”, which sounds pretty permanent, unless
> perhaps the filesystem driver is updated in-place.
> 
> > It seems reasonable that we gracefully fall back to O_PATH fd in case
> > of ENOTSUPP (Assuming -o inode_file_handles means try to use file
> > handles and fallback to O_PATH fds if host filesystem does not
> > support it).
> > 
> > But for temporary failures we probably will have to return errors to callers.
> > Do you have a specific temporary failure in mind which you are concerned
> > about.
> 
> Oh, no.  It’s just, there can be errors other than EOPNOTSUPP (now that I
> looked into the man page I know that’s what it actually is), and we have to
> decide what to do then.  Nothing more.
> 
> > > (And that pretty much
> > > answers my "what if name_to_handle_at() works the first time, but then
> > > fails" question.  If we let anything but ENOTSUPP let the lookup fail, then
> > > we should do so every time.)
> > Agreed. ENOTSUPP seems to be the only error when falling back to O_PATH
> > fd makes most sense to me. Rest of them probably should be propagated
> > to caller.
> 
> Perhaps also EOVERFLOW, which indicates that our file handle storage is too
> small.  This shouldn’t happen, given that we use MAX_HANDLE_SZ to size it,
> but I imagine it’s possible that MAX_HANDLE_SZ is increased in the future,
> and when you then compile virtiofsd on one system and run it on another,
> it’s possible that some filesystem driver wants to store larger handles. 
> Given that we expect a file handle to always be the same for a given inode,
> such an EOVERFLOW error should be permanent, too (for a given inode).

man page says that EOVERFLOW can also indicate that no file handle is
available.

*******
Some care  is needed here as EOVERFLOW can also indicate that no file
handle is available for this particular name in a filesystem which
does  normally  support  file-handle lookup.  This case can be detected
when the EOVERFLOW error is returned without handle_bytes being increased.
*****

> 
> > And given ENOTSUPP is a permanent failure, you probably will be able
> > to add fhandle to lo_key and not require a separate mapping which
> > looks up inode by fhandle.
> 
> Sure, but again, this doesn’t make anything simpler.
> 
> What I particularly don’t like about this is that for doing so, we have a
> choice between (A) adding fhandle to lo_key inline (i.e. of type
> lo_fhandle), or (B) adding it as a pointer (i.e. of type lo_fhandle *).
> 
> (A) seems the more obvious solution, but then lo_inode.fhandle would either
> not exist (one would have to go through lo_inode.key.fhandle, which I don’t
> really like), or be a pointer to &lo_inode.key.fhandle, which is also kind
> of weird.  Also, if there is no file handle, it would need to be explicitly
> 0, which wastes memory when not using -o inode_file_handles (and cycles,
> because hashing and comparing will take longer).
> 
> So (B) would do it the other way around, lo_inode.key.fhandle would be a
> copy of lo_inode.fhandle.  But having a pointer be part of a key is, while
> perfectly feasible, again strange.

Thinking more about it. So an inode is uniquely identified either by
lo_key1 (dev, ino, mnt_id) or lo_key2 (fhandle, mnt_id). 

Ideally we will add inodes in hash table using lo_key2 if file handles
are enabled and using lo_key1 if file handles are not enabled.

Given we support submounts and its possible that we have mix if inodes
where for some inodes we use lo_key1 while for other we use lo_key2.

But it will never happen that for same inode we keep flipping the
mode between lo_key1 and lo_key2. (We will return temporary errors
from file system).

> 
> I find two hash maps to just be cleaner.

Given an inode is supposed to be looked up either by lo_key1 or 
lo_key2 (depending on whether file system support file handles)
, I guess it makes less sense to merge two keys and come
up with single lo_key which contains everything. So I am fine
with adding another key type for fhandle and lookup in separate
hash map.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-06-21  9:02             ` Max Reitz
  2021-06-21 15:51               ` Vivek Goyal
@ 2021-07-13 15:07               ` Max Reitz
  2021-07-20 14:50                 ` Vivek Goyal
  1 sibling, 1 reply; 38+ messages in thread
From: Max Reitz @ 2021-07-13 15:07 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

So I’m coming back to this after three weeks (well, PTO), and this again 
turns into a bit of a pain, actually.

I don’t think it’s anything serious, but I had thought we had found 
something that would make us both happy because it wouldn’t be too ugly, 
and now it’s turning ugly again...  So I’m sending this mail as a heads 
up before I send v3 in the next days, to explain my thought process.

On 21.06.21 11:02, Max Reitz wrote:
> On 18.06.21 20:29, Vivek Goyal wrote:
>

[...]

>> I am still reading your code and trying to understand it. But one
>> question came to mind. What happens if we can generate file handle
>> during lookup. But can't generate when same file is looked up again.
>>
>> - A file foo.txt is looked. We can create file handle and we add it
>>    to lo->inodes_by_handle as well as lo->inodes_by_ds.
>>
>> - Say somebody deleted file and created again and inode number got
>>    reused.
>>
>> - Now during ->revalidation path, lookup happens again. This time say
>>    we can't generate file handle. If am reading lo_do_find() code
>>    correctly, it will find the old inode using ids and return same
>>    inode as result of lookup. And we did not recognize that inode
>>    number has been reused.
>
> Oh, that’s a good point.  If an lo_inode has no O_PATH fd but is only 
> addressed by handle, we must always look it up by handle.

Also, just wanted to throw in this remark:

Now that I read the code again, lo_do_find() already has a condition to 
prevent this.  It’s this:

if (p && fhandle != NULL && p->fhandle != NULL) {
     p = NULL;
}

There’s just one thing wrong with it, and that is the `fhandle != NULL` 
part.  It has no place there.  But this piece of code does exactly what 
we’d need it do if it were just:

if (p && p->fhandle != NULL) {
     p = NULL;
}

[...]

> However, you made a good point in that we must require 
> name_to_handle_at() to work if it worked before for some inode, not 
> because it would be simpler, but because it would be wrong otherwise.
>
> As for the other way around...  Well, now I don’t have a strong 
> opinion on it.  Handling temporary name_to_handle_at() failure after 
> it worked the first time should not add extra complexity, but it 
> wouldn’t be symmetric.  Like, allowing temporary failure sometimes but 
> not at other times.

(I think I mistyped here, it should be “Handling name_to_handle_at() 
randomly working after it failed the first time”.)

> The next question is, how do we detect temporary failure, because if 
> we look up some new inode, name_to_handle_at() fails, we ignore it, 
> and then it starts to work and we fail all further lookups, that’s not 
> good.  We should have the first lookup fail.  I suppose ENOTSUPP means 
> “OK to ignore”, and for everything else we should let lookup fail?  
> (And that pretty much answers my "what if name_to_handle_at() works 
> the first time, but then fails" question.  If we let anything but 
> ENOTSUPP let the lookup fail, then we should do so every time.)

I don’t think this will work as cleanly as I’d hoped.

The problem I’m facing is that get_file_handle() doesn’t only call 
name_to_handle_at(), but also contains a lot of code managing 
mount_fds.  There are a lot of places that can fail, too, and I think we 
should have them fall back to using an O_PATH FD:

Say mount_fds doesn’t contain an FD for the new handle’s mount ID yet, 
so we want to add one.  However, it turns out that the file is not a 
regular file or directory, so we cannot open it as a regular FD and add 
it to mount_fds; or that it is a regular file, but without permission to 
open it O_RDONLY.  So we cannot return a file handle, because it will 
not be usable until a mount FD is added.

I think in such a case we should fall back to an O_PATH FD, because this 
is not some unexpected error, but just an unfortunate (but reproducible 
and valid) circumstance where using `-o inode_file_handles` fails to do 
something that works without it.

Now, however, this means that the next time we try to generate a handle 
for this file (to look it up), it will absolutely work if some other FD 
was added to mount_fds for this mount ID in the meantime.


We could get around this by not trying to open the file for which we are 
to generate a handle to add its FD to mount_fds, but instead doing what 
the open_by_handle_at() man page suggests:

> The mount_id argument returns an identifier for the filesystem mount 
> that corresponds to pathname. This corresponds to the first field in 
> one of the records in /proc/self/mountinfo. Opening the pathname in 
> the fifth field of that record yields a file descriptor for the mount 
> point; that file descriptor can be used in a subsequent call to 
> open_by_handle_at().

However, I’d rather avoid parsing mountinfo.  And as far as I 
understand, the only problem here is that we’ll have to cope with the 
fact that sometimes on lookups, we can generate a file handle, but the 
lo_inode we want to find has no file handle attached to it (because 
get_file_handle() failed the first time), and so we won’t find it by 
that handle but have to look it up by its inode ID. (Which is safe, 
because that lo_inode must have an O_PATH FD attached to it, so the 
inode ID cannot be reused.)  And that’s something that this series 
already does, so I tend to favor that over parsing mountinfo.

Max


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-07-13 15:07               ` Max Reitz
@ 2021-07-20 14:50                 ` Vivek Goyal
  2021-07-21  8:29                   ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2021-07-20 14:50 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs, qemu-devel

On Tue, Jul 13, 2021 at 05:07:31PM +0200, Max Reitz wrote:

[..]
> > The next question is, how do we detect temporary failure, because if we
> > look up some new inode, name_to_handle_at() fails, we ignore it, and
> > then it starts to work and we fail all further lookups, that’s not
> > good.  We should have the first lookup fail.  I suppose ENOTSUPP means
> > “OK to ignore”, and for everything else we should let lookup fail?  (And
> > that pretty much answers my "what if name_to_handle_at() works the first
> > time, but then fails" question.  If we let anything but ENOTSUPP let the
> > lookup fail, then we should do so every time.)
> 
> I don’t think this will work as cleanly as I’d hoped.
> 
> The problem I’m facing is that get_file_handle() doesn’t only call
> name_to_handle_at(), but also contains a lot of code managing mount_fds. 
> There are a lot of places that can fail, too, and I think we should have
> them fall back to using an O_PATH FD:
> 
> Say mount_fds doesn’t contain an FD for the new handle’s mount ID yet, so we
> want to add one.  However, it turns out that the file is not a regular file
> or directory, so we cannot open it as a regular FD and add it to mount_fds;

Hi Max,

So an fd opened using O_PATH can't be used as "mount_fd" in
open_by_handle_at()? (I see that you are already first opening O_PATH
fd and then verifying if this is a regular file/dir or not).

> or that it is a regular file, but without permission to open it O_RDONLY. 
> So we cannot return a file handle, because it will not be usable until a
> mount FD is added.
> 
> I think in such a case we should fall back to an O_PATH FD, because this is
> not some unexpected error, but just an unfortunate (but reproducible and
> valid) circumstance where using `-o inode_file_handles` fails to do
> something that works without it.
> 
> Now, however, this means that the next time we try to generate a handle for
> this file (to look it up), it will absolutely work if some other FD was
> added to mount_fds for this mount ID in the meantime.
> 
> 
> We could get around this by not trying to open the file for which we are to
> generate a handle to add its FD to mount_fds, but instead doing what the
> open_by_handle_at() man page suggests:
> 
> > The mount_id argument returns an identifier for the filesystem mount
> > that corresponds to pathname. This corresponds to the first field in one
> > of the records in /proc/self/mountinfo. Opening the pathname in the
> > fifth field of that record yields a file descriptor for the mount point;
> > that file descriptor can be used in a subsequent call to
> > open_by_handle_at().
> 
> However, I’d rather avoid parsing mountinfo.

Hmm.., not sure what's wrong with parsing mountinfo. Example code does
not look too bad. Also it mentions that libmount provides helpers
(if we don't want to write our own function to parse mountinfo).

I would think parsing mountinfo is a good idea because it solves
your problem of not wanting to open device nodes for mount_fds. And
in turn not relying on a fallback to O_PATH fds.

Few thoughts overall.

- We are primarily disagreeing on whether we should fallback to O_PATH
  fds or not if something goes wrong w.r.t handle generation.

  My preference is that atleast in the initial patches we should not try
  to fall back. EOPNOTSUPP is the only case we need to take care of.
  Otherwise if there is any temporary error (EMOMEM, running out of
  fds or something else), we return it to the caller. That's what
  rest of the code is doing. If some operation fails due to some
  temporary error (say ENOMEM), we return error to the caller.

- If above apporach creates problem, we can always add the fallback
  path later.

- If you have strong preference for fallback path, can we have it
  as last patch of the series and not bake it in from the beginning
  of the patch series.

- Even if we add fallback path, can we not make that assumption in
  other areas of the code. For example, can we not avoid parsing
  mountinfo to generate mount_fd, because we have a fallback path
  and we can afford to not generate handle. I mean if we were to
  remove fallback logic at some point of time, it will be much
  easier to do if dependency on this assumption did not spread
  too much.

Thanks
Vivek


> And as far as I understand,
> the only problem here is that we’ll have to cope with the fact that
> sometimes on lookups, we can generate a file handle, but the lo_inode we
> want to find has no file handle attached to it (because get_file_handle()
> failed the first time), and so we won’t find it by that handle but have to
> look it up by its inode ID. (Which is safe, because that lo_inode must have
> an O_PATH FD attached to it, so the inode ID cannot be reused.)  And that’s
> something that this series already does, so I tend to favor that over
> parsing mountinfo.

> 
> Max
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table
  2021-07-20 14:50                 ` Vivek Goyal
@ 2021-07-21  8:29                   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2021-07-21  8:29 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel

On 20.07.21 16:50, Vivek Goyal wrote:
> On Tue, Jul 13, 2021 at 05:07:31PM +0200, Max Reitz wrote:
>
> [..]
>>> The next question is, how do we detect temporary failure, because if we
>>> look up some new inode, name_to_handle_at() fails, we ignore it, and
>>> then it starts to work and we fail all further lookups, that’s not
>>> good.  We should have the first lookup fail.  I suppose ENOTSUPP means
>>> “OK to ignore”, and for everything else we should let lookup fail?  (And
>>> that pretty much answers my "what if name_to_handle_at() works the first
>>> time, but then fails" question.  If we let anything but ENOTSUPP let the
>>> lookup fail, then we should do so every time.)
>> I don’t think this will work as cleanly as I’d hoped.
>>
>> The problem I’m facing is that get_file_handle() doesn’t only call
>> name_to_handle_at(), but also contains a lot of code managing mount_fds.
>> There are a lot of places that can fail, too, and I think we should have
>> them fall back to using an O_PATH FD:
>>
>> Say mount_fds doesn’t contain an FD for the new handle’s mount ID yet, so we
>> want to add one.  However, it turns out that the file is not a regular file
>> or directory, so we cannot open it as a regular FD and add it to mount_fds;
> Hi Max,
>
> So an fd opened using O_PATH can't be used as "mount_fd" in
> open_by_handle_at()? (I see that you are already first opening O_PATH
> fd and then verifying if this is a regular file/dir or not).

Yep, unfortunately we need a non-O_PATH fd.

>> or that it is a regular file, but without permission to open it O_RDONLY.
>> So we cannot return a file handle, because it will not be usable until a
>> mount FD is added.
>>
>> I think in such a case we should fall back to an O_PATH FD, because this is
>> not some unexpected error, but just an unfortunate (but reproducible and
>> valid) circumstance where using `-o inode_file_handles` fails to do
>> something that works without it.
>>
>> Now, however, this means that the next time we try to generate a handle for
>> this file (to look it up), it will absolutely work if some other FD was
>> added to mount_fds for this mount ID in the meantime.
>>
>>
>> We could get around this by not trying to open the file for which we are to
>> generate a handle to add its FD to mount_fds, but instead doing what the
>> open_by_handle_at() man page suggests:
>>
>>> The mount_id argument returns an identifier for the filesystem mount
>>> that corresponds to pathname. This corresponds to the first field in one
>>> of the records in /proc/self/mountinfo. Opening the pathname in the
>>> fifth field of that record yields a file descriptor for the mount point;
>>> that file descriptor can be used in a subsequent call to
>>> open_by_handle_at().
>> However, I’d rather avoid parsing mountinfo.
> Hmm.., not sure what's wrong with parsing mountinfo.

Well, it’s just that it’s some additional complexity that I didn’t 
consider necessary.

(Because I was content with falling back in the rare case that the 
looked up file is not a regular file or directory.)

> Example code does
> not look too bad. Also it mentions that libmount provides helpers
> (if we don't want to write our own function to parse mountinfo).
>
> I would think parsing mountinfo is a good idea because it solves
> your problem of not wanting to open device nodes for mount_fds. And
> in turn not relying on a fallback to O_PATH fds.

Well.  Strictly speaking, it isn’t really my problem, because I didn’t 
really consider a rare fallback to be a problem.

Furthermore, I don’t even know whether it really solves the problem.  
Just as a mount point need not be a directory, it need not even be a 
regular file.  You absolutely can mount a filesystem on a device file, 
and have the root node be a device file, too:

(I did this by modifying the qemu FUSE block export code (to pass “dev” 
as a mount option, to drop the check whether the mount point is a 
regular file, and to report a device file instead of a regular file).  
It doesn’t work perfectly well because FUSE drops the rdev attribute, 
and so you can only create 0:0 device files, but, well...)

$ cat /proc/self/mountinfo
436 40 0:45 / /tmp/blub rw,nosuid,relatime shared:238 - fuse /dev/fuse 
rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=67108864
$ stat /tmp/blub
File: /tmp/blub
Size: 1073741824 Blocks: 2097152 IO Block: 1 character special file
Device: 2dh/45d Inode: 1 Links: 1 Device type: 0,0
[...]

I know this is something that nobody will normally ever do, but I still 
don’t think we can absolutely safely assume a mount point to always be a 
regular file or directory.

> Few thoughts overall.
>
> - We are primarily disagreeing on whether we should fallback to O_PATH
>    fds or not if something goes wrong w.r.t handle generation.
>
>    My preference is that atleast in the initial patches we should not try
>    to fall back. EOPNOTSUPP is the only case we need to take care of.
>    Otherwise if there is any temporary error (EMOMEM, running out of
>    fds or something else), we return it to the caller. That's what
>    rest of the code is doing. If some operation fails due to some
>    temporary error (say ENOMEM), we return error to the caller.
>
> - If above apporach creates problem, we can always add the fallback
>    path later.
>
> - If you have strong preference for fallback path, can we have it
>    as last patch of the series and not bake it in from the beginning
>    of the patch series.
>
> - Even if we add fallback path, can we not make that assumption in
>    other areas of the code. For example, can we not avoid parsing
>    mountinfo to generate mount_fd, because we have a fallback path
>    and we can afford to not generate handle. I mean if we were to
>    remove fallback logic at some point of time, it will be much
>    easier to do if dependency on this assumption did not spread
>    too much.

The problem I see is that we always need a fallback path (for 
EOPNOTSUPP), and it’s very simple to have.  I see no reason to ever 
remove it, because removing it won’t really simplify anything.

As for when to report errors to the guest and when not, the simplest 
case is to have all errors fall back to O_PATH fds.  It’s naturally more 
difficult having to distinguish between errors that should be passed to 
the guest, and errors that should result in fallbacks.

In fact, frankly, I still don’t understand why it’s a problem to always 
fall back.  I thought I understood one problem in our discussion on v2, 
but then it turned out that it was just a single condition somewhere 
else that was wrong (i.e. the fact that we must never look up lo_inodes 
that have no O_PATH fd by a device ID without a generation ID).

I understand that intuitively erroring out is easier.  But technically, 
it isn’t, and I don’t see any technical advantage in passing 
file-handle-related errors to the guest immediately rather than trying 
to fall back first.  (Even if the fallback then fails, and one could 
argue that we could have seen it coming based on the file handle error, 
taking a bit more time for a case that will fail anyway shouldn’t be a 
real problem.  (And the “more time” is just a single openat(O_PATH).))

Max


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2021-07-21  8:29 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 15:55 [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs Max Reitz
2021-06-09 15:55 ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 1/9] virtiofsd: Add TempFd structure Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 2/9] virtiofsd: Use lo_inode_open() instead of openat() Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 3/9] virtiofsd: Add lo_inode_fd() helper Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 4/9] virtiofsd: Let lo_fd() return a TempFd Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 5/9] virtiofsd: Let lo_inode_open() " Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 6/9] virtiofsd: Add lo_inode.fhandle Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-11 20:04   ` Vivek Goyal
2021-06-16 13:38     ` Max Reitz
2021-06-17 21:21       ` Vivek Goyal
2021-06-18  8:28         ` Max Reitz
2021-06-18 18:29           ` Vivek Goyal
2021-06-21  9:02             ` Max Reitz
2021-06-21 15:51               ` Vivek Goyal
2021-06-21 17:07                 ` Max Reitz
2021-06-21 21:27                   ` Vivek Goyal
2021-07-13 15:07               ` Max Reitz
2021-07-20 14:50                 ` Vivek Goyal
2021-07-21  8:29                   ` Max Reitz
2021-06-18  8:30   ` Max Reitz
2021-06-18  8:30     ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 8/9] virtiofsd: Optionally fill lo_inode.fhandle Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-09 15:55 ` [PATCH v2 9/9] virtiofsd: Add lazy lo_do_find() Max Reitz
2021-06-09 15:55   ` [Virtio-fs] " Max Reitz
2021-06-11 19:19 ` [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs Vivek Goyal
2021-06-11 19:19   ` [Virtio-fs] " Vivek Goyal
2021-06-16 13:41   ` Max Reitz
2021-06-16 13:41     ` [Virtio-fs] " Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.