All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] virtiofsd: Find original inode ID of mount points
@ 2021-05-12 12:55 Max Reitz
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-12 12:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Hi,

Not announcing submounts to the guest may lead to duplicated
st_dev/st_ino combinations, because inodes on different filesystems
(different st_dev) may share the same inode ID (st_ino).  If the guest
only sees a single st_dev, those inodes will appear to have the same
st_dev/st_ino combination.

Announcing submounts is supposed to solve this problem by making the
guest report different st_dev IDs for different filesystems.

However, there is one loop hole: Submounts are implemented as
auto-mounts, meaning when the virtio-fs filesystem is mounted, the
submounts will not be mounted immediately.  In the guest, it is possible
to stat() these mount points with AT_NO_AUTOMOUNT to receive information
without mounting them, so they will then show whatever st_ino virtiofsd
passes, and the st_dev of the parent filesystem.

Unfortunately, as far as we understood, the only st_ino that virtiofsd
could inquire was the one on the submounted filesystem, i.e. the st_ino
of that FS’s root node.

Thus, we again got a collision: In said case, the guest would see st_dev
of the parent FS, but st_ino of the submounted FS.  This is very likely
to be the same st_dev/st_ino combination as the root node of the parent
FS.

For nested mount structures, this can be reproduced with `find`:
Mounting several filesystems under each other, passing everything into a
guest, mounting the root virtio-fs FS there, and then invoking `find`
would likely result in it complaining about filesystem loops and
refusing to visit the submount points.  (These loops are reported
because it takes notes of st_dev/st_ino combinations, and then reports a
loop when some combination reappears.)  This can only be fixed by
forcing all submounts to be (auto-)mounted.

What we’d need to do is report the inode ID of the mount point directory
that it has on the parent filesystem to the guest, until the submount is
auto-mounted there.  It’s just that we thought there was no way to
inquire this parent FS st_ino.

It turns out that our understand was wrong, though: There is a way,
namely readdir().  The dirent.d_ino field represents (in practice) the
st_ino of some directory entry on the directory’s filesystem, and so is
exactly the value we want to give to the guest.

Let’s do that.


Max Reitz (3):
  virtiofsd: Find original inode ID of mount points
  virtiofs_submounts.py: Do not generate ssh key
  virtiofs_submounts.py: Check `find`

 tools/virtiofsd/passthrough_ll.c              | 104 +++++++++++++++++-
 tests/acceptance/virtiofs_submounts.py        |  10 +-
 .../virtiofs_submounts.py.data/guest.sh       |  56 ++++------
 3 files changed, 122 insertions(+), 48 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
@ 2021-05-12 12:55 ` Max Reitz
  2021-05-12 15:59   ` Connor Kuehl
                     ` (3 more replies)
  2021-05-12 12:55 ` [PATCH 2/3] virtiofs_submounts.py: Do not generate ssh key Max Reitz
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-12 12:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Mount point directories represent two inodes: On one hand, they are a
normal directory on their parent filesystem.  On the other, they are the
root node of the filesystem mounted there.  Thus, they have two inode
IDs.

Right now, we only report the latter inode ID (i.e. the inode ID of the
mounted filesystem's root node).  This is fine once the guest has
auto-mounted a submount there (so this inode ID goes with a device ID
that is distinct from the parent filesystem), but before the auto-mount,
they have the device ID of the parent and the inode ID for the submount.
This is problematic because this is likely exactly the same
st_dev/st_ino combination as the parent filesystem's root node.  This
leads to problems for example with `find`, which will thus complain
about a filesystem loop if it has visited the parent filesystem's root
node before, and then refuse to descend into the submount.

There is a way to find the mount directory's original inode ID, and that
is to readdir(3) the parent directory, look for the mount directory, and
read the dirent.d_ino field.  Using this, we can let lookup and
readdirplus return that original inode ID, which the guest will thus
show until the submount is auto-mounted.  (Then, it will invoke getattr
and that stat(2) call will return the inode ID for the submount.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
 1 file changed, 99 insertions(+), 5 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 1553d2ef45..110b6e7e5b 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
     return 0;
 }
 
+/*
+ * Use readdir() to find mp_name's inode ID on the parent's filesystem.
+ * (For mount points, stat() will only return the inode ID on the
+ * filesystem mounted there, i.e. the root directory's inode ID.  The
+ * mount point originally was a directory on the parent filesystem,
+ * though, and so has a different inode ID there.  When passing
+ * submount information to the guest, we need to pass this other ID,
+ * so the guest can use it as the inode ID until the submount is
+ * auto-mounted.  (At which point the guest will invoke getattr and
+ * find the inode ID on the submount.))
+ *
+ * Return 0 on success, and -errno otherwise.  *pino is set only in
+ * case of success.
+ */
+static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
+                                ino_t *pino)
+{
+    int dirfd = -1;
+    int ret;
+    DIR *dp = NULL;
+
+    dirfd = openat(dir->fd, ".", O_RDONLY);
+    if (dirfd < 0) {
+        ret = -errno;
+        goto out;
+    }
+
+    dp = fdopendir(dirfd);
+    if (!dp) {
+        ret = -errno;
+        goto out;
+    }
+    /* Owned by dp now */
+    dirfd = -1;
+
+    while (true) {
+        struct dirent *de;
+
+        errno = 0;
+        de = readdir(dp);
+        if (!de) {
+            ret = errno ? -errno : -ENOENT;
+            goto out;
+        }
+
+        if (!strcmp(de->d_name, mp_name)) {
+            *pino = de->d_ino;
+            ret = 0;
+            goto out;
+        }
+    }
+
+out:
+    if (dp) {
+        closedir(dp);
+    }
+    if (dirfd >= 0) {
+        close(dirfd);
+    }
+    return ret;
+}
+
 /*
  * Increments nlookup on the inode on success. unref_inode_lolocked() must be
  * called eventually to decrement nlookup again. If inodep is non-NULL, the
  * inode pointer is stored and the caller must call lo_inode_put().
+ *
+ * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
+ * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
+ * parent filesystem instead of its inode ID on the filesystem mounted on it.
+ * (For mount points, the entry encompasses two inodes: One on the parent FS,
+ * and one on the mounted FS (where it is the root node), so it has two inode
+ * IDs.  When looking up entries, we should show the guest the parent FS's inode
+ * ID, because as long as the guest has not auto-mounted the submount, it should
+ * see that original ID.  Once it does perform the auto-mount, it will invoke
+ * getattr and see the root node's inode ID.)
  */
 static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
                         struct fuse_entry_param *e,
-                        struct lo_inode **inodep)
+                        struct lo_inode **inodep,
+                        bool parent_fs_st_ino)
 {
     int newfd;
     int res;
@@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_data *lo = lo_data(req);
     struct lo_inode *inode = NULL;
     struct lo_inode *dir = lo_inode(req, parent);
+    ino_t ino_id_for_guest;
 
     if (inodep) {
         *inodep = NULL; /* in case there is an error */
@@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out_err;
     }
 
+    ino_id_for_guest = e->attr.st_ino;
+
     if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
         (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
+
+        if (parent_fs_st_ino) {
+            /*
+             * Best effort, so ignore errors.
+             * Also note that using readdir() means there may be races:
+             * The directory entry we find (if any) may be different
+             * from newfd.  Again, this is a best effort.  Reporting
+             * the wrong inode ID to the guest is not catastrophic.
+             */
+            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
+        }
     }
 
     inode = lo_find(lo, &e->attr, mnt_id);
@@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
         inode->nlookup = 1;
         inode->fd = newfd;
+        /*
+         * For the inode key, use the dev/ino/mnt ID as reported by stat()
+         * (i.e. not ino_id_for_guest)
+         */
         inode->key.ino = e->attr.st_ino;
         inode->key.dev = e->attr.st_dev;
         inode->key.mnt_id = mnt_id;
@@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
     }
     e->ino = inode->fuse_ino;
 
+    /* Report ino_id_for_guest to the guest */
+    e->attr.st_ino = ino_id_for_guest;
+
     /* Transfer ownership of inode pointer to caller or drop it */
     if (inodep) {
         *inodep = inode;
@@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
         return;
     }
 
-    err = lo_do_lookup(req, parent, name, &e, NULL);
+    err = lo_do_lookup(req, parent, name, &e, NULL, true);
     if (err) {
         fuse_reply_err(req, err);
     } else {
@@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
         goto out;
     }
 
-    saverr = lo_do_lookup(req, parent, name, &e, NULL);
+    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
     if (saverr) {
         goto out;
     }
@@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
 
         if (plus) {
             if (!is_dot_or_dotdot(name)) {
-                err = lo_do_lookup(req, ino, name, &e, NULL);
+                err = lo_do_lookup(req, ino, name, &e, NULL, true);
                 if (err) {
                     goto error;
                 }
@@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         goto out;
     }
 
-    err = lo_do_lookup(req, parent, name, &e, &inode);
+    err = lo_do_lookup(req, parent, name, &e, &inode, false);
     if (err) {
         goto out;
     }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] virtiofs_submounts.py: Do not generate ssh key
  2021-05-12 12:55 [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
@ 2021-05-12 12:55 ` Max Reitz
  2021-05-12 12:55 ` [PATCH 3/3] virtiofs_submounts.py: Check `find` Max Reitz
  2021-05-12 14:37 ` [Virtio-fs] Fwd: [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
  3 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-12 12:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

Since c0c5a7f18e623b8f6eb, avocado_qemu provides an ssh key by default.
There is no need to generate one in the virtiofs_submounts test.
(In fact, it currently is harmful, because e8197c6e0c56aff83 made
LinuxTest expect ssh_pubkey to be a file path and not the file content,
but virtiofs_submounts provides the latter.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/acceptance/virtiofs_submounts.py | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tests/acceptance/virtiofs_submounts.py b/tests/acceptance/virtiofs_submounts.py
index 46fa65392a..22fe14b661 100644
--- a/tests/acceptance/virtiofs_submounts.py
+++ b/tests/acceptance/virtiofs_submounts.py
@@ -220,7 +220,7 @@ def live_cleanup(self):
         self.run(('bash', self.get_data('cleanup.sh'), scratch_dir))
 
     @skipUnless(*has_cmds(('sudo -n', ('sudo', '-n', 'true')),
-                          'ssh-keygen', 'bash', 'losetup', 'mkfs.xfs', 'mount'))
+                          'bash', 'losetup', 'mkfs.xfs', 'mount'))
     def setUp(self):
         vmlinuz = self.params.get('vmlinuz')
         if vmlinuz is None:
@@ -242,13 +242,7 @@ def setUp(self):
 
         self.seed = self.params.get('seed')
 
-        self.ssh_key = os.path.join(self.workdir, 'id_ed25519')
-
-        self.run(('ssh-keygen', '-N', '', '-t', 'ed25519', '-f', self.ssh_key))
-
-        pubkey = open(self.ssh_key + '.pub').read()
-
-        super(VirtiofsSubmountsTest, self).setUp(pubkey)
+        super(VirtiofsSubmountsTest, self).setUp()
 
         if len(vmlinuz) > 0:
             self.vm.add_args('-kernel', vmlinuz,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] virtiofs_submounts.py: Check `find`
  2021-05-12 12:55 [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
  2021-05-12 12:55 ` [PATCH 2/3] virtiofs_submounts.py: Do not generate ssh key Max Reitz
@ 2021-05-12 12:55 ` Max Reitz
  2021-05-12 14:37 ` [Virtio-fs] Fwd: [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
  3 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-12 12:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi, Max Reitz

The guest test script contained a lengthy section on why we cannot just
run `find` to auto-mount all submounts.  With HEAD^^, that should work
now, so replace it and the recursively_visit() function by a lengthy
section explaining the history, and a `find`.

(Also really check that `find` will not complain about anything.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 .../virtiofs_submounts.py.data/guest.sh       | 56 +++++++------------
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/tests/acceptance/virtiofs_submounts.py.data/guest.sh b/tests/acceptance/virtiofs_submounts.py.data/guest.sh
index 59ba40fde1..a3e9dc02f2 100644
--- a/tests/acceptance/virtiofs_submounts.py.data/guest.sh
+++ b/tests/acceptance/virtiofs_submounts.py.data/guest.sh
@@ -19,41 +19,27 @@ fi
 
 cd "$shared_dir"
 
-# FIXME: This should not be necessary, but it is.  In order for all
-# submounts to be proper mount points, we need to visit them.
-# (Before we visit them, they will not be auto-mounted, and so just
-# appear as normal directories, with the catch that their st_ino will
-# be the st_ino of the filesystem they host, while the st_dev will
-# still be the st_dev of the parent.)
-# `find` does not work, because it will refuse to touch the mount
-# points as long as they are not mounted; their st_dev being shared
-# with the parent and st_ino just being the root node's inode ID
-# will practically ensure that this node exists elsewhere on the
-# filesystem, and `find` is required to recognize loops and not to
-# follow them.
-# Thus, we have to manually visit all nodes first.
-
-mnt_i=0
-
-function recursively_visit()
-{
-    pushd "$1" >/dev/null
-    for entry in *; do
-        if [[ "$entry" == mnt* ]]; then
-            mnt_i=$((mnt_i + 1))
-            printf "Triggering auto-mount $mnt_i...\r"
-        fi
-
-        if [ -d "$entry" ]; then
-            recursively_visit "$entry"
-        fi
-    done
-    popd >/dev/null
-}
-
-recursively_visit .
-echo
-
+# See whether `find` complains about anything, like file system loops,
+# by looking for a file that does not exist (so the output should be
+# empty).
+# (Historically, for mount points, virtiofsd reported only the inode ID
+# in submount, i.e. the submount root's inode ID.  However, while the
+# submount is not yet auto-mounted in the guest, it would have the
+# parent's device ID, and so would have the same st_dev/st_ino
+# combination as the parent filesystem's root.  This would lead to
+# `find` reporting file system loops.
+# This has been fixed so that virtiofsd reports the mount point node's
+# inode ID in the parent filesystem, and when the guest auto-mounts the
+# submount, it will only then see the inode ID in that FS.)
+#
+# As a side-effect, this `find` auto-mounts all submounts by visiting
+# the whole tree.
+find_output=$(find -name there-is-no-such-file 2>&1)
+if [ -n "$find_output" ]; then
+    echo "Error: find has reported errors or warnings:" >&2
+    echo "$find_output" >&2
+    exit 1
+fi
 
 if [ -n "$(find -name not-mounted)" ]; then
     echo "Error: not-mounted files visible on mount points:" >&2
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Virtio-fs] Fwd: [PATCH 0/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
                   ` (2 preceding siblings ...)
  2021-05-12 12:55 ` [PATCH 3/3] virtiofs_submounts.py: Check `find` Max Reitz
@ 2021-05-12 14:37 ` Max Reitz
  3 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-12 14:37 UTC (permalink / raw)
  To: virtio-fs-list

Sorry, forgot to CC the virtio-fs list.  I hope forwarding the cover 
letter is enough...

Here’s the link to the qemu-devel archive:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg03366.html


-------- Forwarded Message --------
Subject: [PATCH 0/3] virtiofsd: Find original inode ID of mount points
Date: Wed, 12 May 2021 14:55:41 +0200
From: Max Reitz <mreitz@redhat.com>
To: qemu-devel@nongnu.org
Copy (CC): Max Reitz <mreitz@redhat.com>, Dr . David Alan Gilbert 
<dgilbert@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>

Hi,

Not announcing submounts to the guest may lead to duplicated
st_dev/st_ino combinations, because inodes on different filesystems
(different st_dev) may share the same inode ID (st_ino).  If the guest
only sees a single st_dev, those inodes will appear to have the same
st_dev/st_ino combination.

Announcing submounts is supposed to solve this problem by making the
guest report different st_dev IDs for different filesystems.

However, there is one loop hole: Submounts are implemented as
auto-mounts, meaning when the virtio-fs filesystem is mounted, the
submounts will not be mounted immediately.  In the guest, it is possible
to stat() these mount points with AT_NO_AUTOMOUNT to receive information
without mounting them, so they will then show whatever st_ino virtiofsd
passes, and the st_dev of the parent filesystem.

Unfortunately, as far as we understood, the only st_ino that virtiofsd
could inquire was the one on the submounted filesystem, i.e. the st_ino
of that FS’s root node.

Thus, we again got a collision: In said case, the guest would see st_dev
of the parent FS, but st_ino of the submounted FS.  This is very likely
to be the same st_dev/st_ino combination as the root node of the parent
FS.

For nested mount structures, this can be reproduced with `find`:
Mounting several filesystems under each other, passing everything into a
guest, mounting the root virtio-fs FS there, and then invoking `find`
would likely result in it complaining about filesystem loops and
refusing to visit the submount points.  (These loops are reported
because it takes notes of st_dev/st_ino combinations, and then reports a
loop when some combination reappears.)  This can only be fixed by
forcing all submounts to be (auto-)mounted.

What we’d need to do is report the inode ID of the mount point directory
that it has on the parent filesystem to the guest, until the submount is
auto-mounted there.  It’s just that we thought there was no way to
inquire this parent FS st_ino.

It turns out that our understand was wrong, though: There is a way,
namely readdir().  The dirent.d_ino field represents (in practice) the
st_ino of some directory entry on the directory’s filesystem, and so is
exactly the value we want to give to the guest.

Let’s do that.


Max Reitz (3):
   virtiofsd: Find original inode ID of mount points
   virtiofs_submounts.py: Do not generate ssh key
   virtiofs_submounts.py: Check `find`

  tools/virtiofsd/passthrough_ll.c              | 104 +++++++++++++++++-
  tests/acceptance/virtiofs_submounts.py        |  10 +-
  .../virtiofs_submounts.py.data/guest.sh       |  56 ++++------
  3 files changed, 122 insertions(+), 48 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
@ 2021-05-12 15:59   ` Connor Kuehl
  2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Connor Kuehl @ 2021-05-12 15:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Dr . David Alan Gilbert, Stefan Hajnoczi

On 5/12/21 7:55 AM, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are
> the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of
> the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the
> auto-mount,
> they have the device ID of the parent and the inode ID for the
> submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and
> that
> is to readdir(3) the parent directory, look for the mount directory,
> and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.  (Then, it will invoke
> getattr
> and that stat(2) call will return the inode ID for the submount.)
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

This is a clever way of uncovering the inode ID.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
@ 2021-05-17 14:57     ` Vivek Goyal
  2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-05-17 14:57 UTC (permalink / raw)
  To: Max Reitz
  Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.  (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);

Hi Max,

[CC virtio-fs list ]

In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
is retruning error. It might be better to capture error and print a
message and continue.

I have couple of general questions about submounts.

- What happens in case of single file mounted on top of another file.

  mount --bind foo.txt bar.txt

Do submounts work when mount point is not a directory.

- Say a directory is not a mount point yet and lookup instantiates an
  inode. Later user mounts something on that directory. When does
  client/server notice this change. I am assuming this is probably
  part of revalidation path.

Thanks
Vivek

> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-05-17 14:57     ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-05-17 14:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, qemu-devel

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.  (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);

Hi Max,

[CC virtio-fs list ]

In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
is retruning error. It might be better to capture error and print a
message and continue.

I have couple of general questions about submounts.

- What happens in case of single file mounted on top of another file.

  mount --bind foo.txt bar.txt

Do submounts work when mount point is not a directory.

- Say a directory is not a mount point yet and lookup instantiates an
  inode. Later user mounts something on that directory. When does
  client/server notice this change. I am assuming this is probably
  part of revalidation path.

Thanks
Vivek

> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
@ 2021-05-17 17:26       ` Max Reitz
  -1 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-17 17:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 17.05.21 16:57, Vivek Goyal wrote:
> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
>> Mount point directories represent two inodes: On one hand, they are a
>> normal directory on their parent filesystem.  On the other, they are the
>> root node of the filesystem mounted there.  Thus, they have two inode
>> IDs.
>>
>> Right now, we only report the latter inode ID (i.e. the inode ID of the
>> mounted filesystem's root node).  This is fine once the guest has
>> auto-mounted a submount there (so this inode ID goes with a device ID
>> that is distinct from the parent filesystem), but before the auto-mount,
>> they have the device ID of the parent and the inode ID for the submount.
>> This is problematic because this is likely exactly the same
>> st_dev/st_ino combination as the parent filesystem's root node.  This
>> leads to problems for example with `find`, which will thus complain
>> about a filesystem loop if it has visited the parent filesystem's root
>> node before, and then refuse to descend into the submount.
>>
>> There is a way to find the mount directory's original inode ID, and that
>> is to readdir(3) the parent directory, look for the mount directory, and
>> read the dirent.d_ino field.  Using this, we can let lookup and
>> readdirplus return that original inode ID, which the guest will thus
>> show until the submount is auto-mounted.  (Then, it will invoke getattr
>> and that stat(2) call will return the inode ID for the submount.)
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>>   1 file changed, 99 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 1553d2ef45..110b6e7e5b 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>>       return 0;
>>   }
>>   
>> +/*
>> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
>> + * (For mount points, stat() will only return the inode ID on the
>> + * filesystem mounted there, i.e. the root directory's inode ID.  The
>> + * mount point originally was a directory on the parent filesystem,
>> + * though, and so has a different inode ID there.  When passing
>> + * submount information to the guest, we need to pass this other ID,
>> + * so the guest can use it as the inode ID until the submount is
>> + * auto-mounted.  (At which point the guest will invoke getattr and
>> + * find the inode ID on the submount.))
>> + *
>> + * Return 0 on success, and -errno otherwise.  *pino is set only in
>> + * case of success.
>> + */
>> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
>> +                                ino_t *pino)
>> +{
>> +    int dirfd = -1;
>> +    int ret;
>> +    DIR *dp = NULL;
>> +
>> +    dirfd = openat(dir->fd, ".", O_RDONLY);
>> +    if (dirfd < 0) {
>> +        ret = -errno;
>> +        goto out;
>> +    }
>> +
>> +    dp = fdopendir(dirfd);
>> +    if (!dp) {
>> +        ret = -errno;
>> +        goto out;
>> +    }
>> +    /* Owned by dp now */
>> +    dirfd = -1;
>> +
>> +    while (true) {
>> +        struct dirent *de;
>> +
>> +        errno = 0;
>> +        de = readdir(dp);
>> +        if (!de) {
>> +            ret = errno ? -errno : -ENOENT;
>> +            goto out;
>> +        }
>> +
>> +        if (!strcmp(de->d_name, mp_name)) {
>> +            *pino = de->d_ino;
>> +            ret = 0;
>> +            goto out;
>> +        }
>> +    }
>> +
>> +out:
>> +    if (dp) {
>> +        closedir(dp);
>> +    }
>> +    if (dirfd >= 0) {
>> +        close(dirfd);
>> +    }
>> +    return ret;
>> +}
>> +
>>   /*
>>    * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>>    * called eventually to decrement nlookup again. If inodep is non-NULL, the
>>    * inode pointer is stored and the caller must call lo_inode_put().
>> + *
>> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
>> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
>> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
>> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
>> + * and one on the mounted FS (where it is the root node), so it has two inode
>> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
>> + * ID, because as long as the guest has not auto-mounted the submount, it should
>> + * see that original ID.  Once it does perform the auto-mount, it will invoke
>> + * getattr and see the root node's inode ID.)
>>    */
>>   static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>                           struct fuse_entry_param *e,
>> -                        struct lo_inode **inodep)
>> +                        struct lo_inode **inodep,
>> +                        bool parent_fs_st_ino)
>>   {
>>       int newfd;
>>       int res;
>> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *inode = NULL;
>>       struct lo_inode *dir = lo_inode(req, parent);
>> +    ino_t ino_id_for_guest;
>>   
>>       if (inodep) {
>>           *inodep = NULL; /* in case there is an error */
>> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           goto out_err;
>>       }
>>   
>> +    ino_id_for_guest = e->attr.st_ino;
>> +
>>       if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>>           (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>>           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>> +
>> +        if (parent_fs_st_ino) {
>> +            /*
>> +             * Best effort, so ignore errors.
>> +             * Also note that using readdir() means there may be races:
>> +             * The directory entry we find (if any) may be different
>> +             * from newfd.  Again, this is a best effort.  Reporting
>> +             * the wrong inode ID to the guest is not catastrophic.
>> +             */
>> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> 
> Hi Max,
> 
> [CC virtio-fs list ]
> 
> In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
> is retruning error. It might be better to capture error and print a
> message and continue.

Sure, why not.

> I have couple of general questions about submounts.
> 
> - What happens in case of single file mounted on top of another file.
> 
>    mount --bind foo.txt bar.txt
> 
> Do submounts work when mount point is not a directory.

No, as you can see in the condition quoted above, we only set the 
FUSE_ATTR_SUBMOUNT flag for directories.  That seemed the most common 
case for me, and I didn’t want to have to worry about weirdness that 
might ensue for file mounts.

> - Say a directory is not a mount point yet and lookup instantiates an
>    inode. Later user mounts something on that directory. When does
>    client/server notice this change. I am assuming this is probably
>    part of revalidation path.

I guess at least before this patch this is no different from any other 
filesystem change.  Because st_dev+st_ino changed, it should basically 
look like the old directory was removed and a different one was put in 
its place.

Now, with this patch, we will return the old st_ino to the guest, but 
internally virtiofsd will still use the submount’s st_dev/st_ino, so a 
new lo_inode should be created, and so fuse_dentry_revalidate()’s lookup 
should return a different node ID, resulting it to consider the entry 
expired.

Besides, fuse_dentry_revalidate() has a condition on IS_AUTOMOUNT(inode) 
!= flags & FUSE_ATTR_SUBMOUNT.  Considering the previous paragraph, this 
doesn’t seem necessary to me, but it can’t hurt.

Max



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-05-17 17:26       ` Max Reitz
  0 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-17 17:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel

On 17.05.21 16:57, Vivek Goyal wrote:
> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
>> Mount point directories represent two inodes: On one hand, they are a
>> normal directory on their parent filesystem.  On the other, they are the
>> root node of the filesystem mounted there.  Thus, they have two inode
>> IDs.
>>
>> Right now, we only report the latter inode ID (i.e. the inode ID of the
>> mounted filesystem's root node).  This is fine once the guest has
>> auto-mounted a submount there (so this inode ID goes with a device ID
>> that is distinct from the parent filesystem), but before the auto-mount,
>> they have the device ID of the parent and the inode ID for the submount.
>> This is problematic because this is likely exactly the same
>> st_dev/st_ino combination as the parent filesystem's root node.  This
>> leads to problems for example with `find`, which will thus complain
>> about a filesystem loop if it has visited the parent filesystem's root
>> node before, and then refuse to descend into the submount.
>>
>> There is a way to find the mount directory's original inode ID, and that
>> is to readdir(3) the parent directory, look for the mount directory, and
>> read the dirent.d_ino field.  Using this, we can let lookup and
>> readdirplus return that original inode ID, which the guest will thus
>> show until the submount is auto-mounted.  (Then, it will invoke getattr
>> and that stat(2) call will return the inode ID for the submount.)
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>>   1 file changed, 99 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 1553d2ef45..110b6e7e5b 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>>       return 0;
>>   }
>>   
>> +/*
>> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
>> + * (For mount points, stat() will only return the inode ID on the
>> + * filesystem mounted there, i.e. the root directory's inode ID.  The
>> + * mount point originally was a directory on the parent filesystem,
>> + * though, and so has a different inode ID there.  When passing
>> + * submount information to the guest, we need to pass this other ID,
>> + * so the guest can use it as the inode ID until the submount is
>> + * auto-mounted.  (At which point the guest will invoke getattr and
>> + * find the inode ID on the submount.))
>> + *
>> + * Return 0 on success, and -errno otherwise.  *pino is set only in
>> + * case of success.
>> + */
>> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
>> +                                ino_t *pino)
>> +{
>> +    int dirfd = -1;
>> +    int ret;
>> +    DIR *dp = NULL;
>> +
>> +    dirfd = openat(dir->fd, ".", O_RDONLY);
>> +    if (dirfd < 0) {
>> +        ret = -errno;
>> +        goto out;
>> +    }
>> +
>> +    dp = fdopendir(dirfd);
>> +    if (!dp) {
>> +        ret = -errno;
>> +        goto out;
>> +    }
>> +    /* Owned by dp now */
>> +    dirfd = -1;
>> +
>> +    while (true) {
>> +        struct dirent *de;
>> +
>> +        errno = 0;
>> +        de = readdir(dp);
>> +        if (!de) {
>> +            ret = errno ? -errno : -ENOENT;
>> +            goto out;
>> +        }
>> +
>> +        if (!strcmp(de->d_name, mp_name)) {
>> +            *pino = de->d_ino;
>> +            ret = 0;
>> +            goto out;
>> +        }
>> +    }
>> +
>> +out:
>> +    if (dp) {
>> +        closedir(dp);
>> +    }
>> +    if (dirfd >= 0) {
>> +        close(dirfd);
>> +    }
>> +    return ret;
>> +}
>> +
>>   /*
>>    * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>>    * called eventually to decrement nlookup again. If inodep is non-NULL, the
>>    * inode pointer is stored and the caller must call lo_inode_put().
>> + *
>> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
>> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
>> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
>> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
>> + * and one on the mounted FS (where it is the root node), so it has two inode
>> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
>> + * ID, because as long as the guest has not auto-mounted the submount, it should
>> + * see that original ID.  Once it does perform the auto-mount, it will invoke
>> + * getattr and see the root node's inode ID.)
>>    */
>>   static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>                           struct fuse_entry_param *e,
>> -                        struct lo_inode **inodep)
>> +                        struct lo_inode **inodep,
>> +                        bool parent_fs_st_ino)
>>   {
>>       int newfd;
>>       int res;
>> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>       struct lo_data *lo = lo_data(req);
>>       struct lo_inode *inode = NULL;
>>       struct lo_inode *dir = lo_inode(req, parent);
>> +    ino_t ino_id_for_guest;
>>   
>>       if (inodep) {
>>           *inodep = NULL; /* in case there is an error */
>> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>>           goto out_err;
>>       }
>>   
>> +    ino_id_for_guest = e->attr.st_ino;
>> +
>>       if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>>           (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>>           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>> +
>> +        if (parent_fs_st_ino) {
>> +            /*
>> +             * Best effort, so ignore errors.
>> +             * Also note that using readdir() means there may be races:
>> +             * The directory entry we find (if any) may be different
>> +             * from newfd.  Again, this is a best effort.  Reporting
>> +             * the wrong inode ID to the guest is not catastrophic.
>> +             */
>> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> 
> Hi Max,
> 
> [CC virtio-fs list ]
> 
> In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
> is retruning error. It might be better to capture error and print a
> message and continue.

Sure, why not.

> I have couple of general questions about submounts.
> 
> - What happens in case of single file mounted on top of another file.
> 
>    mount --bind foo.txt bar.txt
> 
> Do submounts work when mount point is not a directory.

No, as you can see in the condition quoted above, we only set the 
FUSE_ATTR_SUBMOUNT flag for directories.  That seemed the most common 
case for me, and I didn’t want to have to worry about weirdness that 
might ensue for file mounts.

> - Say a directory is not a mount point yet and lookup instantiates an
>    inode. Later user mounts something on that directory. When does
>    client/server notice this change. I am assuming this is probably
>    part of revalidation path.

I guess at least before this patch this is no different from any other 
filesystem change.  Because st_dev+st_ino changed, it should basically 
look like the old directory was removed and a different one was put in 
its place.

Now, with this patch, we will return the old st_ino to the guest, but 
internally virtiofsd will still use the submount’s st_dev/st_ino, so a 
new lo_inode should be created, and so fuse_dentry_revalidate()’s lookup 
should return a different node ID, resulting it to consider the entry 
expired.

Besides, fuse_dentry_revalidate() has a condition on IS_AUTOMOUNT(inode) 
!= flags & FUSE_ATTR_SUBMOUNT.  Considering the previous paragraph, this 
doesn’t seem necessary to me, but it can’t hurt.

Max


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-17 17:26       ` [Virtio-fs] " Max Reitz
@ 2021-05-20 11:28         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-05-20 11:28 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, Stefan Hajnoczi, qemu-devel, Vivek Goyal

* Max Reitz (mreitz@redhat.com) wrote:
> On 17.05.21 16:57, Vivek Goyal wrote:
> > On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> > > Mount point directories represent two inodes: On one hand, they are a
> > > normal directory on their parent filesystem.  On the other, they are the
> > > root node of the filesystem mounted there.  Thus, they have two inode
> > > IDs.
> > > 
> > > Right now, we only report the latter inode ID (i.e. the inode ID of the
> > > mounted filesystem's root node).  This is fine once the guest has
> > > auto-mounted a submount there (so this inode ID goes with a device ID
> > > that is distinct from the parent filesystem), but before the auto-mount,
> > > they have the device ID of the parent and the inode ID for the submount.
> > > This is problematic because this is likely exactly the same
> > > st_dev/st_ino combination as the parent filesystem's root node.  This
> > > leads to problems for example with `find`, which will thus complain
> > > about a filesystem loop if it has visited the parent filesystem's root
> > > node before, and then refuse to descend into the submount.
> > > 
> > > There is a way to find the mount directory's original inode ID, and that
> > > is to readdir(3) the parent directory, look for the mount directory, and
> > > read the dirent.d_ino field.  Using this, we can let lookup and
> > > readdirplus return that original inode ID, which the guest will thus
> > > show until the submount is auto-mounted.  (Then, it will invoke getattr
> > > and that stat(2) call will return the inode ID for the submount.)
> > > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
> > >   1 file changed, 99 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 1553d2ef45..110b6e7e5b 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> > >       return 0;
> > >   }
> > > +/*
> > > + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> > > + * (For mount points, stat() will only return the inode ID on the
> > > + * filesystem mounted there, i.e. the root directory's inode ID.  The
> > > + * mount point originally was a directory on the parent filesystem,
> > > + * though, and so has a different inode ID there.  When passing
> > > + * submount information to the guest, we need to pass this other ID,
> > > + * so the guest can use it as the inode ID until the submount is
> > > + * auto-mounted.  (At which point the guest will invoke getattr and
> > > + * find the inode ID on the submount.))
> > > + *
> > > + * Return 0 on success, and -errno otherwise.  *pino is set only in
> > > + * case of success.
> > > + */
> > > +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> > > +                                ino_t *pino)
> > > +{
> > > +    int dirfd = -1;
> > > +    int ret;
> > > +    DIR *dp = NULL;
> > > +
> > > +    dirfd = openat(dir->fd, ".", O_RDONLY);
> > > +    if (dirfd < 0) {
> > > +        ret = -errno;
> > > +        goto out;
> > > +    }
> > > +
> > > +    dp = fdopendir(dirfd);
> > > +    if (!dp) {
> > > +        ret = -errno;
> > > +        goto out;
> > > +    }
> > > +    /* Owned by dp now */
> > > +    dirfd = -1;
> > > +
> > > +    while (true) {
> > > +        struct dirent *de;
> > > +
> > > +        errno = 0;
> > > +        de = readdir(dp);
> > > +        if (!de) {
> > > +            ret = errno ? -errno : -ENOENT;
> > > +            goto out;
> > > +        }
> > > +
> > > +        if (!strcmp(de->d_name, mp_name)) {
> > > +            *pino = de->d_ino;
> > > +            ret = 0;
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +out:
> > > +    if (dp) {
> > > +        closedir(dp);
> > > +    }
> > > +    if (dirfd >= 0) {
> > > +        close(dirfd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >   /*
> > >    * Increments nlookup on the inode on success. unref_inode_lolocked() must be
> > >    * called eventually to decrement nlookup again. If inodep is non-NULL, the
> > >    * inode pointer is stored and the caller must call lo_inode_put().
> > > + *
> > > + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> > > + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> > > + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> > > + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> > > + * and one on the mounted FS (where it is the root node), so it has two inode
> > > + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> > > + * ID, because as long as the guest has not auto-mounted the submount, it should
> > > + * see that original ID.  Once it does perform the auto-mount, it will invoke
> > > + * getattr and see the root node's inode ID.)
> > >    */
> > >   static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >                           struct fuse_entry_param *e,
> > > -                        struct lo_inode **inodep)
> > > +                        struct lo_inode **inodep,
> > > +                        bool parent_fs_st_ino)
> > >   {
> > >       int newfd;
> > >       int res;
> > > @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >       struct lo_data *lo = lo_data(req);
> > >       struct lo_inode *inode = NULL;
> > >       struct lo_inode *dir = lo_inode(req, parent);
> > > +    ino_t ino_id_for_guest;
> > >       if (inodep) {
> > >           *inodep = NULL; /* in case there is an error */
> > > @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           goto out_err;
> > >       }
> > > +    ino_id_for_guest = e->attr.st_ino;
> > > +
> > >       if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
> > >           (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
> > >           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> > > +
> > > +        if (parent_fs_st_ino) {
> > > +            /*
> > > +             * Best effort, so ignore errors.
> > > +             * Also note that using readdir() means there may be races:
> > > +             * The directory entry we find (if any) may be different
> > > +             * from newfd.  Again, this is a best effort.  Reporting
> > > +             * the wrong inode ID to the guest is not catastrophic.
> > > +             */
> > > +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> > 
> > Hi Max,
> > 
> > [CC virtio-fs list ]
> > 
> > In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
> > is retruning error. It might be better to capture error and print a
> > message and continue.
> 
> Sure, why not.
> 
> > I have couple of general questions about submounts.
> > 
> > - What happens in case of single file mounted on top of another file.
> > 
> >    mount --bind foo.txt bar.txt
> > 
> > Do submounts work when mount point is not a directory.
> 
> No, as you can see in the condition quoted above, we only set the
> FUSE_ATTR_SUBMOUNT flag for directories.  That seemed the most common case
> for me, and I didn’t want to have to worry about weirdness that might ensue
> for file mounts.

It might be worth checking file mounts don't break too badly; I'm pretty
sure I've seen the container systems use them a lot in /etc

Dave

> > - Say a directory is not a mount point yet and lookup instantiates an
> >    inode. Later user mounts something on that directory. When does
> >    client/server notice this change. I am assuming this is probably
> >    part of revalidation path.
> 
> I guess at least before this patch this is no different from any other
> filesystem change.  Because st_dev+st_ino changed, it should basically look
> like the old directory was removed and a different one was put in its place.
> 
> Now, with this patch, we will return the old st_ino to the guest, but
> internally virtiofsd will still use the submount’s st_dev/st_ino, so a new
> lo_inode should be created, and so fuse_dentry_revalidate()’s lookup should
> return a different node ID, resulting it to consider the entry expired.
> 
> Besides, fuse_dentry_revalidate() has a condition on IS_AUTOMOUNT(inode) !=
> flags & FUSE_ATTR_SUBMOUNT.  Considering the previous paragraph, this
> doesn’t seem necessary to me, but it can’t hurt.
> 
> Max
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-05-20 11:28         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2021-05-20 11:28 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, qemu-devel, Vivek Goyal

* Max Reitz (mreitz@redhat.com) wrote:
> On 17.05.21 16:57, Vivek Goyal wrote:
> > On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> > > Mount point directories represent two inodes: On one hand, they are a
> > > normal directory on their parent filesystem.  On the other, they are the
> > > root node of the filesystem mounted there.  Thus, they have two inode
> > > IDs.
> > > 
> > > Right now, we only report the latter inode ID (i.e. the inode ID of the
> > > mounted filesystem's root node).  This is fine once the guest has
> > > auto-mounted a submount there (so this inode ID goes with a device ID
> > > that is distinct from the parent filesystem), but before the auto-mount,
> > > they have the device ID of the parent and the inode ID for the submount.
> > > This is problematic because this is likely exactly the same
> > > st_dev/st_ino combination as the parent filesystem's root node.  This
> > > leads to problems for example with `find`, which will thus complain
> > > about a filesystem loop if it has visited the parent filesystem's root
> > > node before, and then refuse to descend into the submount.
> > > 
> > > There is a way to find the mount directory's original inode ID, and that
> > > is to readdir(3) the parent directory, look for the mount directory, and
> > > read the dirent.d_ino field.  Using this, we can let lookup and
> > > readdirplus return that original inode ID, which the guest will thus
> > > show until the submount is auto-mounted.  (Then, it will invoke getattr
> > > and that stat(2) call will return the inode ID for the submount.)
> > > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
> > >   1 file changed, 99 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 1553d2ef45..110b6e7e5b 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> > >       return 0;
> > >   }
> > > +/*
> > > + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> > > + * (For mount points, stat() will only return the inode ID on the
> > > + * filesystem mounted there, i.e. the root directory's inode ID.  The
> > > + * mount point originally was a directory on the parent filesystem,
> > > + * though, and so has a different inode ID there.  When passing
> > > + * submount information to the guest, we need to pass this other ID,
> > > + * so the guest can use it as the inode ID until the submount is
> > > + * auto-mounted.  (At which point the guest will invoke getattr and
> > > + * find the inode ID on the submount.))
> > > + *
> > > + * Return 0 on success, and -errno otherwise.  *pino is set only in
> > > + * case of success.
> > > + */
> > > +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> > > +                                ino_t *pino)
> > > +{
> > > +    int dirfd = -1;
> > > +    int ret;
> > > +    DIR *dp = NULL;
> > > +
> > > +    dirfd = openat(dir->fd, ".", O_RDONLY);
> > > +    if (dirfd < 0) {
> > > +        ret = -errno;
> > > +        goto out;
> > > +    }
> > > +
> > > +    dp = fdopendir(dirfd);
> > > +    if (!dp) {
> > > +        ret = -errno;
> > > +        goto out;
> > > +    }
> > > +    /* Owned by dp now */
> > > +    dirfd = -1;
> > > +
> > > +    while (true) {
> > > +        struct dirent *de;
> > > +
> > > +        errno = 0;
> > > +        de = readdir(dp);
> > > +        if (!de) {
> > > +            ret = errno ? -errno : -ENOENT;
> > > +            goto out;
> > > +        }
> > > +
> > > +        if (!strcmp(de->d_name, mp_name)) {
> > > +            *pino = de->d_ino;
> > > +            ret = 0;
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +out:
> > > +    if (dp) {
> > > +        closedir(dp);
> > > +    }
> > > +    if (dirfd >= 0) {
> > > +        close(dirfd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >   /*
> > >    * Increments nlookup on the inode on success. unref_inode_lolocked() must be
> > >    * called eventually to decrement nlookup again. If inodep is non-NULL, the
> > >    * inode pointer is stored and the caller must call lo_inode_put().
> > > + *
> > > + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> > > + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> > > + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> > > + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> > > + * and one on the mounted FS (where it is the root node), so it has two inode
> > > + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> > > + * ID, because as long as the guest has not auto-mounted the submount, it should
> > > + * see that original ID.  Once it does perform the auto-mount, it will invoke
> > > + * getattr and see the root node's inode ID.)
> > >    */
> > >   static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >                           struct fuse_entry_param *e,
> > > -                        struct lo_inode **inodep)
> > > +                        struct lo_inode **inodep,
> > > +                        bool parent_fs_st_ino)
> > >   {
> > >       int newfd;
> > >       int res;
> > > @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >       struct lo_data *lo = lo_data(req);
> > >       struct lo_inode *inode = NULL;
> > >       struct lo_inode *dir = lo_inode(req, parent);
> > > +    ino_t ino_id_for_guest;
> > >       if (inodep) {
> > >           *inodep = NULL; /* in case there is an error */
> > > @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > >           goto out_err;
> > >       }
> > > +    ino_id_for_guest = e->attr.st_ino;
> > > +
> > >       if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
> > >           (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
> > >           e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> > > +
> > > +        if (parent_fs_st_ino) {
> > > +            /*
> > > +             * Best effort, so ignore errors.
> > > +             * Also note that using readdir() means there may be races:
> > > +             * The directory entry we find (if any) may be different
> > > +             * from newfd.  Again, this is a best effort.  Reporting
> > > +             * the wrong inode ID to the guest is not catastrophic.
> > > +             */
> > > +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> > 
> > Hi Max,
> > 
> > [CC virtio-fs list ]
> > 
> > In general patch looks good to me. A minor nit. get_mp_ino_on_parent()
> > is retruning error. It might be better to capture error and print a
> > message and continue.
> 
> Sure, why not.
> 
> > I have couple of general questions about submounts.
> > 
> > - What happens in case of single file mounted on top of another file.
> > 
> >    mount --bind foo.txt bar.txt
> > 
> > Do submounts work when mount point is not a directory.
> 
> No, as you can see in the condition quoted above, we only set the
> FUSE_ATTR_SUBMOUNT flag for directories.  That seemed the most common case
> for me, and I didn’t want to have to worry about weirdness that might ensue
> for file mounts.

It might be worth checking file mounts don't break too badly; I'm pretty
sure I've seen the container systems use them a lot in /etc

Dave

> > - Say a directory is not a mount point yet and lookup instantiates an
> >    inode. Later user mounts something on that directory. When does
> >    client/server notice this change. I am assuming this is probably
> >    part of revalidation path.
> 
> I guess at least before this patch this is no different from any other
> filesystem change.  Because st_dev+st_ino changed, it should basically look
> like the old directory was removed and a different one was put in its place.
> 
> Now, with this patch, we will return the old st_ino to the guest, but
> internally virtiofsd will still use the submount’s st_dev/st_ino, so a new
> lo_inode should be created, and so fuse_dentry_revalidate()’s lookup should
> return a different node ID, resulting it to consider the entry expired.
> 
> Besides, fuse_dentry_revalidate() has a condition on IS_AUTOMOUNT(inode) !=
> flags & FUSE_ATTR_SUBMOUNT.  Considering the previous paragraph, this
> doesn’t seem necessary to me, but it can’t hurt.
> 
> Max
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
@ 2021-05-26 18:13     ` Vivek Goyal
  2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-05-26 18:13 UTC (permalink / raw)
  To: Max Reitz
  Cc: virtio-fs-list, Miklos Szeredi, qemu-devel, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.

> (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)

Hi Max,

How are we sure that GETATTR() will always be called and that will
allow us to return inode number in mounted filesystem (instead of
parent filesystem). I thought GETATTR will be called only if cached
attrs have expired. (1 second default for cache=auto). Otherwise
stat() will fill inode->i_no from cache and return. And I am afraid
that in that case we will return inode number from parent fs,
instead of mounted fs.

Say following sequence of events happens pretty fast one after the
other. Say /mnt/virtiofs/foo is a mount point in server but client
has not created submount yet.

A. stat(/mnt/virtiofs/foo, AT_NO_AUTOMOUNT)
   -> This should get inode number in parent filesystem on host and 
      store in guest inode->i_no and return to user space. Say inode
      in guest is called a_ino.
B. stat(/mnt/virtiofs/foo)
   -> This should create submount and create new inode (say b_ino), using
      properties from a_ino. IOW, this should copy a_ino->i_no to
      b_ino->b_ino given current code, IIUC.

   -> Assume timeout has not happened and cached attrs have not expired.

   -> And now b_ino->i_no (or ->orig_ino) will be returned to user space.

Am I missing something. Do we need to always expire inode attrs when
we create submount so that client is forced to issue GETATTR.

Thanks
Vivek

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-05-26 18:13     ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-05-26 18:13 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, qemu-devel

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.

> (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)

Hi Max,

How are we sure that GETATTR() will always be called and that will
allow us to return inode number in mounted filesystem (instead of
parent filesystem). I thought GETATTR will be called only if cached
attrs have expired. (1 second default for cache=auto). Otherwise
stat() will fill inode->i_no from cache and return. And I am afraid
that in that case we will return inode number from parent fs,
instead of mounted fs.

Say following sequence of events happens pretty fast one after the
other. Say /mnt/virtiofs/foo is a mount point in server but client
has not created submount yet.

A. stat(/mnt/virtiofs/foo, AT_NO_AUTOMOUNT)
   -> This should get inode number in parent filesystem on host and 
      store in guest inode->i_no and return to user space. Say inode
      in guest is called a_ino.
B. stat(/mnt/virtiofs/foo)
   -> This should create submount and create new inode (say b_ino), using
      properties from a_ino. IOW, this should copy a_ino->i_no to
      b_ino->b_ino given current code, IIUC.

   -> Assume timeout has not happened and cached attrs have not expired.

   -> And now b_ino->i_no (or ->orig_ino) will be returned to user space.

Am I missing something. Do we need to always expire inode attrs when
we create submount so that client is forced to issue GETATTR.

Thanks
Vivek

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-26 18:13     ` [Virtio-fs] " Vivek Goyal
  (?)
@ 2021-05-26 18:50     ` Vivek Goyal
  2021-05-27 15:00       ` Max Reitz
  -1 siblings, 1 reply; 22+ messages in thread
From: Vivek Goyal @ 2021-05-26 18:50 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, qemu-devel

On Wed, May 26, 2021 at 02:13:24PM -0400, Vivek Goyal wrote:
> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> > Mount point directories represent two inodes: On one hand, they are a
> > normal directory on their parent filesystem.  On the other, they are the
> > root node of the filesystem mounted there.  Thus, they have two inode
> > IDs.
> > 
> > Right now, we only report the latter inode ID (i.e. the inode ID of the
> > mounted filesystem's root node).  This is fine once the guest has
> > auto-mounted a submount there (so this inode ID goes with a device ID
> > that is distinct from the parent filesystem), but before the auto-mount,
> > they have the device ID of the parent and the inode ID for the submount.
> > This is problematic because this is likely exactly the same
> > st_dev/st_ino combination as the parent filesystem's root node.  This
> > leads to problems for example with `find`, which will thus complain
> > about a filesystem loop if it has visited the parent filesystem's root
> > node before, and then refuse to descend into the submount.
> > 
> > There is a way to find the mount directory's original inode ID, and that
> > is to readdir(3) the parent directory, look for the mount directory, and
> > read the dirent.d_ino field.  Using this, we can let lookup and
> > readdirplus return that original inode ID, which the guest will thus
> > show until the submount is auto-mounted.
> 
> > (Then, it will invoke getattr
> > and that stat(2) call will return the inode ID for the submount.)
> 
> Hi Max,
> 
> How are we sure that GETATTR() will always be called and that will
> allow us to return inode number in mounted filesystem (instead of
> parent filesystem). I thought GETATTR will be called only if cached
> attrs have expired. (1 second default for cache=auto). Otherwise
> stat() will fill inode->i_no from cache and return. And I am afraid
> that in that case we will return inode number from parent fs,
> instead of mounted fs.
> 
> Say following sequence of events happens pretty fast one after the
> other. Say /mnt/virtiofs/foo is a mount point in server but client
> has not created submount yet.
> 
> A. stat(/mnt/virtiofs/foo, AT_NO_AUTOMOUNT)
>    -> This should get inode number in parent filesystem on host and 
>       store in guest inode->i_no and return to user space. Say inode
>       in guest is called a_ino.
> B. stat(/mnt/virtiofs/foo)
>    -> This should create submount and create new inode (say b_ino), using
>       properties from a_ino. IOW, this should copy a_ino->i_no to
>       b_ino->b_ino given current code, IIUC.
> 
>    -> Assume timeout has not happened and cached attrs have not expired.
> 
>    -> And now b_ino->i_no (or ->orig_ino) will be returned to user space.
> 
> Am I missing something. Do we need to always expire inode attrs when
> we create submount so that client is forced to issue GETATTR.

Looks like while initialzing b_ino, we are setting attr_valid=0, which
should set fi->i_time=0 and force issuing GETATTR later.

fuse_fill_super_submount()
  root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
                                                         ^
    fuse_iget(attr_valid=0)
       fuse_change_attributes(attr_valid=0)
	  fuse_change_attributes_common(attr_valid=0)
            fi->i_time = attr_valid;

So may be this will force GETATTR and fetch new inode id when second
stat() is called.

Thanks
Vivek

> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
> >  1 file changed, 99 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 1553d2ef45..110b6e7e5b 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> >      return 0;
> >  }
> >  
> > +/*
> > + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> > + * (For mount points, stat() will only return the inode ID on the
> > + * filesystem mounted there, i.e. the root directory's inode ID.  The
> > + * mount point originally was a directory on the parent filesystem,
> > + * though, and so has a different inode ID there.  When passing
> > + * submount information to the guest, we need to pass this other ID,
> > + * so the guest can use it as the inode ID until the submount is
> > + * auto-mounted.  (At which point the guest will invoke getattr and
> > + * find the inode ID on the submount.))
> > + *
> > + * Return 0 on success, and -errno otherwise.  *pino is set only in
> > + * case of success.
> > + */
> > +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> > +                                ino_t *pino)
> > +{
> > +    int dirfd = -1;
> > +    int ret;
> > +    DIR *dp = NULL;
> > +
> > +    dirfd = openat(dir->fd, ".", O_RDONLY);
> > +    if (dirfd < 0) {
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +
> > +    dp = fdopendir(dirfd);
> > +    if (!dp) {
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +    /* Owned by dp now */
> > +    dirfd = -1;
> > +
> > +    while (true) {
> > +        struct dirent *de;
> > +
> > +        errno = 0;
> > +        de = readdir(dp);
> > +        if (!de) {
> > +            ret = errno ? -errno : -ENOENT;
> > +            goto out;
> > +        }
> > +
> > +        if (!strcmp(de->d_name, mp_name)) {
> > +            *pino = de->d_ino;
> > +            ret = 0;
> > +            goto out;
> > +        }
> > +    }
> > +
> > +out:
> > +    if (dp) {
> > +        closedir(dp);
> > +    }
> > +    if (dirfd >= 0) {
> > +        close(dirfd);
> > +    }
> > +    return ret;
> > +}
> > +
> >  /*
> >   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
> >   * called eventually to decrement nlookup again. If inodep is non-NULL, the
> >   * inode pointer is stored and the caller must call lo_inode_put().
> > + *
> > + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> > + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> > + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> > + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> > + * and one on the mounted FS (where it is the root node), so it has two inode
> > + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> > + * ID, because as long as the guest has not auto-mounted the submount, it should
> > + * see that original ID.  Once it does perform the auto-mount, it will invoke
> > + * getattr and see the root node's inode ID.)
> >   */
> >  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> >                          struct fuse_entry_param *e,
> > -                        struct lo_inode **inodep)
> > +                        struct lo_inode **inodep,
> > +                        bool parent_fs_st_ino)
> >  {
> >      int newfd;
> >      int res;
> > @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> >      struct lo_data *lo = lo_data(req);
> >      struct lo_inode *inode = NULL;
> >      struct lo_inode *dir = lo_inode(req, parent);
> > +    ino_t ino_id_for_guest;
> >  
> >      if (inodep) {
> >          *inodep = NULL; /* in case there is an error */
> > @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> >          goto out_err;
> >      }
> >  
> > +    ino_id_for_guest = e->attr.st_ino;
> > +
> >      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
> >          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
> >          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> > +
> > +        if (parent_fs_st_ino) {
> > +            /*
> > +             * Best effort, so ignore errors.
> > +             * Also note that using readdir() means there may be races:
> > +             * The directory entry we find (if any) may be different
> > +             * from newfd.  Again, this is a best effort.  Reporting
> > +             * the wrong inode ID to the guest is not catastrophic.
> > +             */
> > +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> > +        }
> >      }
> >  
> >      inode = lo_find(lo, &e->attr, mnt_id);
> > @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> >  
> >          inode->nlookup = 1;
> >          inode->fd = newfd;
> > +        /*
> > +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> > +         * (i.e. not ino_id_for_guest)
> > +         */
> >          inode->key.ino = e->attr.st_ino;
> >          inode->key.dev = e->attr.st_dev;
> >          inode->key.mnt_id = mnt_id;
> > @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> >      }
> >      e->ino = inode->fuse_ino;
> >  
> > +    /* Report ino_id_for_guest to the guest */
> > +    e->attr.st_ino = ino_id_for_guest;
> > +
> >      /* Transfer ownership of inode pointer to caller or drop it */
> >      if (inodep) {
> >          *inodep = inode;
> > @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
> >          return;
> >      }
> >  
> > -    err = lo_do_lookup(req, parent, name, &e, NULL);
> > +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
> >      if (err) {
> >          fuse_reply_err(req, err);
> >      } else {
> > @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
> >          goto out;
> >      }
> >  
> > -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> > +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
> >      if (saverr) {
> >          goto out;
> >      }
> > @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
> >  
> >          if (plus) {
> >              if (!is_dot_or_dotdot(name)) {
> > -                err = lo_do_lookup(req, ino, name, &e, NULL);
> > +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
> >                  if (err) {
> >                      goto error;
> >                  }
> > @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
> >          goto out;
> >      }
> >  
> > -    err = lo_do_lookup(req, parent, name, &e, &inode);
> > +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
> >      if (err) {
> >          goto out;
> >      }
> > -- 
> > 2.31.1
> > 
> > 
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-26 18:50     ` Vivek Goyal
@ 2021-05-27 15:00       ` Max Reitz
  0 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-05-27 15:00 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel

On 26.05.21 20:50, Vivek Goyal wrote:
> On Wed, May 26, 2021 at 02:13:24PM -0400, Vivek Goyal wrote:
>> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
>>> Mount point directories represent two inodes: On one hand, they are a
>>> normal directory on their parent filesystem.  On the other, they are the
>>> root node of the filesystem mounted there.  Thus, they have two inode
>>> IDs.
>>>
>>> Right now, we only report the latter inode ID (i.e. the inode ID of the
>>> mounted filesystem's root node).  This is fine once the guest has
>>> auto-mounted a submount there (so this inode ID goes with a device ID
>>> that is distinct from the parent filesystem), but before the auto-mount,
>>> they have the device ID of the parent and the inode ID for the submount.
>>> This is problematic because this is likely exactly the same
>>> st_dev/st_ino combination as the parent filesystem's root node.  This
>>> leads to problems for example with `find`, which will thus complain
>>> about a filesystem loop if it has visited the parent filesystem's root
>>> node before, and then refuse to descend into the submount.
>>>
>>> There is a way to find the mount directory's original inode ID, and that
>>> is to readdir(3) the parent directory, look for the mount directory, and
>>> read the dirent.d_ino field.  Using this, we can let lookup and
>>> readdirplus return that original inode ID, which the guest will thus
>>> show until the submount is auto-mounted.
>>
>>> (Then, it will invoke getattr
>>> and that stat(2) call will return the inode ID for the submount.)
>>
>> Hi Max,
>>
>> How are we sure that GETATTR() will always be called and that will
>> allow us to return inode number in mounted filesystem (instead of
>> parent filesystem). I thought GETATTR will be called only if cached
>> attrs have expired. (1 second default for cache=auto). Otherwise
>> stat() will fill inode->i_no from cache and return. And I am afraid
>> that in that case we will return inode number from parent fs,
>> instead of mounted fs.
>>
>> Say following sequence of events happens pretty fast one after the
>> other. Say /mnt/virtiofs/foo is a mount point in server but client
>> has not created submount yet.
>>
>> A. stat(/mnt/virtiofs/foo, AT_NO_AUTOMOUNT)
>>     -> This should get inode number in parent filesystem on host and
>>        store in guest inode->i_no and return to user space. Say inode
>>        in guest is called a_ino.
>> B. stat(/mnt/virtiofs/foo)
>>     -> This should create submount and create new inode (say b_ino), using
>>        properties from a_ino. IOW, this should copy a_ino->i_no to
>>        b_ino->b_ino given current code, IIUC.
>>
>>     -> Assume timeout has not happened and cached attrs have not expired.
>>
>>     -> And now b_ino->i_no (or ->orig_ino) will be returned to user space.

Well, I mean, this sounds easy enough to test.  For example, this passes 
for me:

count=1000
root_st_ino=128
tag=host
mountpoint=/mnt
submount_path=submount

for i in $(seq $count)
do
     mount -t virtiofs $tag $mountpoint || break
     if [ $(stat -c '%i' $mountpoint/$submount_path) -eq $root_st_ino ]
     then
         echo 'fail 0'
         break
     fi
     ls $mountpoint/$submount_path > /dev/null
     if [ $(stat -c '%i' $mountpoint/$submount_path) -ne $root_st_ino ]
     then
         echo 'fail 1'
         break
     fi
     umount $mountpoint || break
done
if [ $i -ne $count ]
then
     echo 'Something failed'
else
     echo 'OK'
fi

>> Am I missing something. Do we need to always expire inode attrs when
>> we create submount so that client is forced to issue GETATTR.
> 
> Looks like while initialzing b_ino, we are setting attr_valid=0, which
> should set fi->i_time=0 and force issuing GETATTR later.
> 
> fuse_fill_super_submount()
>    root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
>                                                           ^
>      fuse_iget(attr_valid=0)
>         fuse_change_attributes(attr_valid=0)
> 	  fuse_change_attributes_common(attr_valid=0)
>              fi->i_time = attr_valid;
> 
> So may be this will force GETATTR and fetch new inode id when second
> stat() is called.

i_time is at least what fuse_update_get_attr() uses to decide whether to 
invoke GETATTR or not – although AT_STATX_DONT_SYNC can override that, 
but I don’t think that’s a problem.  If i_time is 0, that function will 
always invoke GETATTR (unless AT_STATX_DONT_SYNC).

So I think it works in practice.  When the inode ID is looked up through 
some stat-y function, this should go through fuse_update_get_attr(), 
which will fetch the st_ino on the submount.  There is still i_ino, 
though...

To be absolutely certain, we could invoke fuse_update_attributes() in 
fuse_fill_super_submount(), but then again, fuse_fill_super_common() 
doesn’t do that either for its root node.  It just initializes i_ino to 
FUSE_ROOT_ID, i.e. 1.

Max


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
@ 2021-06-02 18:19     ` Vivek Goyal
  2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-06-02 18:19 UTC (permalink / raw)
  To: Max Reitz
  Cc: virtio-fs-list, Miklos Szeredi, qemu-devel, Stefan Hajnoczi,
	Dr . David Alan Gilbert

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.  (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)

[ CC miklos ]

Hi Max,

Though we discussed this in chat room, I am still responding to this
email with the concern I have, so that there is a record of it.

So with this patch for FUSE_LOOKUP  we always return submount's parentinode
id and with GETATTR request we return actual inode id of submount. That
kind of bothers me a bit as we are assuming that there is always going
to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
might not be called at all because FUSE_LOOKUP itself got the latest
updated attrs with certain timeout.

For example, if I call stat on a normal file (not submount), I see that
after FUSE_LOOKUP, no FUSE_GETATTR request is going and
fuse_update_get_attr() is using attrs from locally cached inode attrs.

But same thing does not seem to be happening in case of submount. Once
submount is created in guest, I see that we never seem to be calling
->revalidate() on newly created dentry of submount root. I am not sure
why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
always gets called.

I am not sure if this is a bug or expected behavior. I am not confident
that we can rely on this behavior of submount dentries. 

Ccing Miklos. He might have thoughts on this.

As we discussed, may be it is a better idea to create two inodes in
virtiofsd (like we do in guest). One for submount parent and one for
actual submount.

Thanks
Vivek


> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-06-02 18:19     ` Vivek Goyal
  0 siblings, 0 replies; 22+ messages in thread
From: Vivek Goyal @ 2021-06-02 18:19 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list, Miklos Szeredi, qemu-devel

On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> Mount point directories represent two inodes: On one hand, they are a
> normal directory on their parent filesystem.  On the other, they are the
> root node of the filesystem mounted there.  Thus, they have two inode
> IDs.
> 
> Right now, we only report the latter inode ID (i.e. the inode ID of the
> mounted filesystem's root node).  This is fine once the guest has
> auto-mounted a submount there (so this inode ID goes with a device ID
> that is distinct from the parent filesystem), but before the auto-mount,
> they have the device ID of the parent and the inode ID for the submount.
> This is problematic because this is likely exactly the same
> st_dev/st_ino combination as the parent filesystem's root node.  This
> leads to problems for example with `find`, which will thus complain
> about a filesystem loop if it has visited the parent filesystem's root
> node before, and then refuse to descend into the submount.
> 
> There is a way to find the mount directory's original inode ID, and that
> is to readdir(3) the parent directory, look for the mount directory, and
> read the dirent.d_ino field.  Using this, we can let lookup and
> readdirplus return that original inode ID, which the guest will thus
> show until the submount is auto-mounted.  (Then, it will invoke getattr
> and that stat(2) call will return the inode ID for the submount.)

[ CC miklos ]

Hi Max,

Though we discussed this in chat room, I am still responding to this
email with the concern I have, so that there is a record of it.

So with this patch for FUSE_LOOKUP  we always return submount's parentinode
id and with GETATTR request we return actual inode id of submount. That
kind of bothers me a bit as we are assuming that there is always going
to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
might not be called at all because FUSE_LOOKUP itself got the latest
updated attrs with certain timeout.

For example, if I call stat on a normal file (not submount), I see that
after FUSE_LOOKUP, no FUSE_GETATTR request is going and
fuse_update_get_attr() is using attrs from locally cached inode attrs.

But same thing does not seem to be happening in case of submount. Once
submount is created in guest, I see that we never seem to be calling
->revalidate() on newly created dentry of submount root. I am not sure
why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
always gets called.

I am not sure if this is a bug or expected behavior. I am not confident
that we can rely on this behavior of submount dentries. 

Ccing Miklos. He might have thoughts on this.

As we discussed, may be it is a better idea to create two inodes in
virtiofsd (like we do in guest). One for submount parent and one for
actual submount.

Thanks
Vivek


> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 104 +++++++++++++++++++++++++++++--
>  1 file changed, 99 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 1553d2ef45..110b6e7e5b 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -968,14 +968,87 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * Use readdir() to find mp_name's inode ID on the parent's filesystem.
> + * (For mount points, stat() will only return the inode ID on the
> + * filesystem mounted there, i.e. the root directory's inode ID.  The
> + * mount point originally was a directory on the parent filesystem,
> + * though, and so has a different inode ID there.  When passing
> + * submount information to the guest, we need to pass this other ID,
> + * so the guest can use it as the inode ID until the submount is
> + * auto-mounted.  (At which point the guest will invoke getattr and
> + * find the inode ID on the submount.))
> + *
> + * Return 0 on success, and -errno otherwise.  *pino is set only in
> + * case of success.
> + */
> +static int get_mp_ino_on_parent(const struct lo_inode *dir, const char *mp_name,
> +                                ino_t *pino)
> +{
> +    int dirfd = -1;
> +    int ret;
> +    DIR *dp = NULL;
> +
> +    dirfd = openat(dir->fd, ".", O_RDONLY);
> +    if (dirfd < 0) {
> +        ret = -errno;
> +        goto out;
> +    }
> +
> +    dp = fdopendir(dirfd);
> +    if (!dp) {
> +        ret = -errno;
> +        goto out;
> +    }
> +    /* Owned by dp now */
> +    dirfd = -1;
> +
> +    while (true) {
> +        struct dirent *de;
> +
> +        errno = 0;
> +        de = readdir(dp);
> +        if (!de) {
> +            ret = errno ? -errno : -ENOENT;
> +            goto out;
> +        }
> +
> +        if (!strcmp(de->d_name, mp_name)) {
> +            *pino = de->d_ino;
> +            ret = 0;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    if (dp) {
> +        closedir(dp);
> +    }
> +    if (dirfd >= 0) {
> +        close(dirfd);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
>   * inode pointer is stored and the caller must call lo_inode_put().
> + *
> + * If parent_fs_st_ino is true, the entry is a mount point, and submounts are
> + * announced to the guest, set e->attr.st_ino to the entry's inode ID on its
> + * parent filesystem instead of its inode ID on the filesystem mounted on it.
> + * (For mount points, the entry encompasses two inodes: One on the parent FS,
> + * and one on the mounted FS (where it is the root node), so it has two inode
> + * IDs.  When looking up entries, we should show the guest the parent FS's inode
> + * ID, because as long as the guest has not auto-mounted the submount, it should
> + * see that original ID.  Once it does perform the auto-mount, it will invoke
> + * getattr and see the root node's inode ID.)
>   */
>  static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>                          struct fuse_entry_param *e,
> -                        struct lo_inode **inodep)
> +                        struct lo_inode **inodep,
> +                        bool parent_fs_st_ino)
>  {
>      int newfd;
>      int res;
> @@ -984,6 +1057,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      struct lo_data *lo = lo_data(req);
>      struct lo_inode *inode = NULL;
>      struct lo_inode *dir = lo_inode(req, parent);
> +    ino_t ino_id_for_guest;
>  
>      if (inodep) {
>          *inodep = NULL; /* in case there is an error */
> @@ -1018,9 +1092,22 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out_err;
>      }
>  
> +    ino_id_for_guest = e->attr.st_ino;
> +
>      if (S_ISDIR(e->attr.st_mode) && lo->announce_submounts &&
>          (e->attr.st_dev != dir->key.dev || mnt_id != dir->key.mnt_id)) {
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
> +
> +        if (parent_fs_st_ino) {
> +            /*
> +             * Best effort, so ignore errors.
> +             * Also note that using readdir() means there may be races:
> +             * The directory entry we find (if any) may be different
> +             * from newfd.  Again, this is a best effort.  Reporting
> +             * the wrong inode ID to the guest is not catastrophic.
> +             */
> +            get_mp_ino_on_parent(dir, name, &ino_id_for_guest);
> +        }
>      }
>  
>      inode = lo_find(lo, &e->attr, mnt_id);
> @@ -1043,6 +1130,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>  
>          inode->nlookup = 1;
>          inode->fd = newfd;
> +        /*
> +         * For the inode key, use the dev/ino/mnt ID as reported by stat()
> +         * (i.e. not ino_id_for_guest)
> +         */
>          inode->key.ino = e->attr.st_ino;
>          inode->key.dev = e->attr.st_dev;
>          inode->key.mnt_id = mnt_id;
> @@ -1058,6 +1149,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>      }
>      e->ino = inode->fuse_ino;
>  
> +    /* Report ino_id_for_guest to the guest */
> +    e->attr.st_ino = ino_id_for_guest;
> +
>      /* Transfer ownership of inode pointer to caller or drop it */
>      if (inodep) {
>          *inodep = inode;
> @@ -1104,7 +1198,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
>          return;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, NULL);
> +    err = lo_do_lookup(req, parent, name, &e, NULL, true);
>      if (err) {
>          fuse_reply_err(req, err);
>      } else {
> @@ -1217,7 +1311,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
>          goto out;
>      }
>  
> -    saverr = lo_do_lookup(req, parent, name, &e, NULL);
> +    saverr = lo_do_lookup(req, parent, name, &e, NULL, false);
>      if (saverr) {
>          goto out;
>      }
> @@ -1714,7 +1808,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
>  
>          if (plus) {
>              if (!is_dot_or_dotdot(name)) {
> -                err = lo_do_lookup(req, ino, name, &e, NULL);
> +                err = lo_do_lookup(req, ino, name, &e, NULL, true);
>                  if (err) {
>                      goto error;
>                  }
> @@ -1936,7 +2030,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
>          goto out;
>      }
>  
> -    err = lo_do_lookup(req, parent, name, &e, &inode);
> +    err = lo_do_lookup(req, parent, name, &e, &inode, false);
>      if (err) {
>          goto out;
>      }
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-06-02 18:19     ` [Virtio-fs] " Vivek Goyal
@ 2021-06-02 18:59       ` Miklos Szeredi
  -1 siblings, 0 replies; 22+ messages in thread
From: Miklos Szeredi @ 2021-06-02 18:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs-list, Stefan Hajnoczi, qemu-devel,
	Dr . David Alan Gilbert, Max Reitz

On Wed, 2 Jun 2021 at 20:20, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> > Mount point directories represent two inodes: On one hand, they are a
> > normal directory on their parent filesystem.  On the other, they are the
> > root node of the filesystem mounted there.  Thus, they have two inode
> > IDs.
> >
> > Right now, we only report the latter inode ID (i.e. the inode ID of the
> > mounted filesystem's root node).  This is fine once the guest has
> > auto-mounted a submount there (so this inode ID goes with a device ID
> > that is distinct from the parent filesystem), but before the auto-mount,
> > they have the device ID of the parent and the inode ID for the submount.
> > This is problematic because this is likely exactly the same
> > st_dev/st_ino combination as the parent filesystem's root node.  This
> > leads to problems for example with `find`, which will thus complain
> > about a filesystem loop if it has visited the parent filesystem's root
> > node before, and then refuse to descend into the submount.
> >
> > There is a way to find the mount directory's original inode ID, and that
> > is to readdir(3) the parent directory, look for the mount directory, and
> > read the dirent.d_ino field.  Using this, we can let lookup and
> > readdirplus return that original inode ID, which the guest will thus
> > show until the submount is auto-mounted.  (Then, it will invoke getattr
> > and that stat(2) call will return the inode ID for the submount.)
>
> [ CC miklos ]
>
> Hi Max,
>
> Though we discussed this in chat room, I am still responding to this
> email with the concern I have, so that there is a record of it.
>
> So with this patch for FUSE_LOOKUP  we always return submount's parentinode
> id and with GETATTR request we return actual inode id of submount. That
> kind of bothers me a bit as we are assuming that there is always going
> to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
> returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
> might not be called at all because FUSE_LOOKUP itself got the latest
> updated attrs with certain timeout.
>
> For example, if I call stat on a normal file (not submount), I see that
> after FUSE_LOOKUP, no FUSE_GETATTR request is going and
> fuse_update_get_attr() is using attrs from locally cached inode attrs.
>
> But same thing does not seem to be happening in case of submount. Once
> submount is created in guest, I see that we never seem to be calling
> ->revalidate() on newly created dentry of submount root. I am not sure
> why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
> always gets called.

Yes, this sounds normal.

The lookup sequence for "/mnt/virtiofs/dir1/submount/file" will be:

LOOKUP(1, "dir")
    create inode I1 in sb1
    create dentry "dir" with inode I1 in sb1
LOOKUP(I1, "submount")
    create inode I2 in sb1, set S_AUTOMOUNT
    create dentry "submount" with inode I2 in sb1
    automount triggered:
    create sb2
    create inode I2 in sb2
    create dentry "/" with inode I2 in sb2
GETATTR(I2)
     fill inode I2
LOOKUP(I2, "file")
     create inode I3
     create dentry "file" with inode I3 in sb2

Notice how there's two inodes numbered I2, but in separate sb's.
However the server has only the notion of a single I2 inode which is
the root of the submount, since the mountpoint is not visible (except
for d_ino in readdir()).

Now AFAICS what this patch does is set the inode number in the
attributes returned by LOOKUP(I1, "submount") to the covered inode, so
that AT_NO_AUTOMOUNT stat will return the correct value.   While this
seems to work, it's not a fundamental fix to the problem, since the
attributes on sb1/I2 will time out and the next stat(...,
AT_NO_AUTOMOUNT) will trigger a GETATTR request, which will return the
inode number of the submount root.

Solving this properly would mean that the server would have to
generate separate nodeid's for the mountpoint and the submount root
and the protocol would have to be extended so that this information
could be transferred to the client.

Not sure whether this is worth it or not.

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-06-02 18:59       ` Miklos Szeredi
  0 siblings, 0 replies; 22+ messages in thread
From: Miklos Szeredi @ 2021-06-02 18:59 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Max Reitz

On Wed, 2 Jun 2021 at 20:20, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
> > Mount point directories represent two inodes: On one hand, they are a
> > normal directory on their parent filesystem.  On the other, they are the
> > root node of the filesystem mounted there.  Thus, they have two inode
> > IDs.
> >
> > Right now, we only report the latter inode ID (i.e. the inode ID of the
> > mounted filesystem's root node).  This is fine once the guest has
> > auto-mounted a submount there (so this inode ID goes with a device ID
> > that is distinct from the parent filesystem), but before the auto-mount,
> > they have the device ID of the parent and the inode ID for the submount.
> > This is problematic because this is likely exactly the same
> > st_dev/st_ino combination as the parent filesystem's root node.  This
> > leads to problems for example with `find`, which will thus complain
> > about a filesystem loop if it has visited the parent filesystem's root
> > node before, and then refuse to descend into the submount.
> >
> > There is a way to find the mount directory's original inode ID, and that
> > is to readdir(3) the parent directory, look for the mount directory, and
> > read the dirent.d_ino field.  Using this, we can let lookup and
> > readdirplus return that original inode ID, which the guest will thus
> > show until the submount is auto-mounted.  (Then, it will invoke getattr
> > and that stat(2) call will return the inode ID for the submount.)
>
> [ CC miklos ]
>
> Hi Max,
>
> Though we discussed this in chat room, I am still responding to this
> email with the concern I have, so that there is a record of it.
>
> So with this patch for FUSE_LOOKUP  we always return submount's parentinode
> id and with GETATTR request we return actual inode id of submount. That
> kind of bothers me a bit as we are assuming that there is always going
> to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
> returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
> might not be called at all because FUSE_LOOKUP itself got the latest
> updated attrs with certain timeout.
>
> For example, if I call stat on a normal file (not submount), I see that
> after FUSE_LOOKUP, no FUSE_GETATTR request is going and
> fuse_update_get_attr() is using attrs from locally cached inode attrs.
>
> But same thing does not seem to be happening in case of submount. Once
> submount is created in guest, I see that we never seem to be calling
> ->revalidate() on newly created dentry of submount root. I am not sure
> why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
> always gets called.

Yes, this sounds normal.

The lookup sequence for "/mnt/virtiofs/dir1/submount/file" will be:

LOOKUP(1, "dir")
    create inode I1 in sb1
    create dentry "dir" with inode I1 in sb1
LOOKUP(I1, "submount")
    create inode I2 in sb1, set S_AUTOMOUNT
    create dentry "submount" with inode I2 in sb1
    automount triggered:
    create sb2
    create inode I2 in sb2
    create dentry "/" with inode I2 in sb2
GETATTR(I2)
     fill inode I2
LOOKUP(I2, "file")
     create inode I3
     create dentry "file" with inode I3 in sb2

Notice how there's two inodes numbered I2, but in separate sb's.
However the server has only the notion of a single I2 inode which is
the root of the submount, since the mountpoint is not visible (except
for d_ino in readdir()).

Now AFAICS what this patch does is set the inode number in the
attributes returned by LOOKUP(I1, "submount") to the covered inode, so
that AT_NO_AUTOMOUNT stat will return the correct value.   While this
seems to work, it's not a fundamental fix to the problem, since the
attributes on sb1/I2 will time out and the next stat(...,
AT_NO_AUTOMOUNT) will trigger a GETATTR request, which will return the
inode number of the submount root.

Solving this properly would mean that the server would have to
generate separate nodeid's for the mountpoint and the submount root
and the protocol would have to be extended so that this information
could be transferred to the client.

Not sure whether this is worth it or not.

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] virtiofsd: Find original inode ID of mount points
  2021-06-02 18:59       ` [Virtio-fs] " Miklos Szeredi
@ 2021-06-04 16:22         ` Max Reitz
  -1 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-06-04 16:22 UTC (permalink / raw)
  To: Miklos Szeredi, Vivek Goyal
  Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Dr . David Alan Gilbert

On 02.06.21 20:59, Miklos Szeredi wrote:
> On Wed, 2 Jun 2021 at 20:20, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
>>> Mount point directories represent two inodes: On one hand, they are a
>>> normal directory on their parent filesystem.  On the other, they are the
>>> root node of the filesystem mounted there.  Thus, they have two inode
>>> IDs.
>>>
>>> Right now, we only report the latter inode ID (i.e. the inode ID of the
>>> mounted filesystem's root node).  This is fine once the guest has
>>> auto-mounted a submount there (so this inode ID goes with a device ID
>>> that is distinct from the parent filesystem), but before the auto-mount,
>>> they have the device ID of the parent and the inode ID for the submount.
>>> This is problematic because this is likely exactly the same
>>> st_dev/st_ino combination as the parent filesystem's root node.  This
>>> leads to problems for example with `find`, which will thus complain
>>> about a filesystem loop if it has visited the parent filesystem's root
>>> node before, and then refuse to descend into the submount.
>>>
>>> There is a way to find the mount directory's original inode ID, and that
>>> is to readdir(3) the parent directory, look for the mount directory, and
>>> read the dirent.d_ino field.  Using this, we can let lookup and
>>> readdirplus return that original inode ID, which the guest will thus
>>> show until the submount is auto-mounted.  (Then, it will invoke getattr
>>> and that stat(2) call will return the inode ID for the submount.)
>> [ CC miklos ]
>>
>> Hi Max,
>>
>> Though we discussed this in chat room, I am still responding to this
>> email with the concern I have, so that there is a record of it.

That sounds scary :)

>> So with this patch for FUSE_LOOKUP  we always return submount's parentinode
>> id and with GETATTR request we return actual inode id of submount. That
>> kind of bothers me a bit as we are assuming that there is always going
>> to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
>> returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
>> might not be called at all because FUSE_LOOKUP itself got the latest
>> updated attrs with certain timeout.
>>
>> For example, if I call stat on a normal file (not submount), I see that
>> after FUSE_LOOKUP, no FUSE_GETATTR request is going and
>> fuse_update_get_attr() is using attrs from locally cached inode attrs.
>>
>> But same thing does not seem to be happening in case of submount. Once
>> submount is created in guest, I see that we never seem to be calling
>> ->revalidate() on newly created dentry of submount root. I am not sure
>> why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
>> always gets called.

Even if it worked reliably, I have to admit it’s kind of relying on at 
most implementation-defined behavior.  Having two inodes would 
definitely be the right solution.

> Yes, this sounds normal.
>
> The lookup sequence for "/mnt/virtiofs/dir1/submount/file" will be:
>
> LOOKUP(1, "dir")
>      create inode I1 in sb1
>      create dentry "dir" with inode I1 in sb1
> LOOKUP(I1, "submount")
>      create inode I2 in sb1, set S_AUTOMOUNT
>      create dentry "submount" with inode I2 in sb1
>      automount triggered:
>      create sb2
>      create inode I2 in sb2
>      create dentry "/" with inode I2 in sb2
> GETATTR(I2)
>       fill inode I2
> LOOKUP(I2, "file")
>       create inode I3
>       create dentry "file" with inode I3 in sb2
>
> Notice how there's two inodes numbered I2, but in separate sb's.
> However the server has only the notion of a single I2 inode which is
> the root of the submount, since the mountpoint is not visible (except
> for d_ino in readdir()).
>
> Now AFAICS what this patch does is set the inode number in the
> attributes returned by LOOKUP(I1, "submount") to the covered inode, so
> that AT_NO_AUTOMOUNT stat will return the correct value.   While this
> seems to work, it's not a fundamental fix to the problem, since the
> attributes on sb1/I2 will time out and the next stat(...,
> AT_NO_AUTOMOUNT) will trigger a GETATTR request, which will return the
> inode number of the submount root.

Ah, yeah.

> Solving this properly would mean that the server would have to
> generate separate nodeid's for the mountpoint and the submount root
> and the protocol would have to be extended so that this information
> could be transferred to the client.

Yes, we’d somehow need to do a separate lookup for the root inode of the 
submount...

> Not sure whether this is worth it or not.

I’m wondering the same.  This series was mostly the result of an 
incidental finding and seeing that getting it to work and do what I want 
seemed to be pretty easy.  Turns out doing something right can never 
really be easy...

But we have already decided that we don’t deem the inode ID problem too 
important (not least considering NFS has the same issue), so fixing it 
is not really top priority.  Maybe I’ll get back to it, but I think for 
now I consider it on the backlog.

Max



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Virtio-fs] [PATCH 1/3] virtiofsd: Find original inode ID of mount points
@ 2021-06-04 16:22         ` Max Reitz
  0 siblings, 0 replies; 22+ messages in thread
From: Max Reitz @ 2021-06-04 16:22 UTC (permalink / raw)
  To: Miklos Szeredi, Vivek Goyal; +Cc: virtio-fs-list, qemu-devel

On 02.06.21 20:59, Miklos Szeredi wrote:
> On Wed, 2 Jun 2021 at 20:20, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Wed, May 12, 2021 at 02:55:42PM +0200, Max Reitz wrote:
>>> Mount point directories represent two inodes: On one hand, they are a
>>> normal directory on their parent filesystem.  On the other, they are the
>>> root node of the filesystem mounted there.  Thus, they have two inode
>>> IDs.
>>>
>>> Right now, we only report the latter inode ID (i.e. the inode ID of the
>>> mounted filesystem's root node).  This is fine once the guest has
>>> auto-mounted a submount there (so this inode ID goes with a device ID
>>> that is distinct from the parent filesystem), but before the auto-mount,
>>> they have the device ID of the parent and the inode ID for the submount.
>>> This is problematic because this is likely exactly the same
>>> st_dev/st_ino combination as the parent filesystem's root node.  This
>>> leads to problems for example with `find`, which will thus complain
>>> about a filesystem loop if it has visited the parent filesystem's root
>>> node before, and then refuse to descend into the submount.
>>>
>>> There is a way to find the mount directory's original inode ID, and that
>>> is to readdir(3) the parent directory, look for the mount directory, and
>>> read the dirent.d_ino field.  Using this, we can let lookup and
>>> readdirplus return that original inode ID, which the guest will thus
>>> show until the submount is auto-mounted.  (Then, it will invoke getattr
>>> and that stat(2) call will return the inode ID for the submount.)
>> [ CC miklos ]
>>
>> Hi Max,
>>
>> Though we discussed this in chat room, I am still responding to this
>> email with the concern I have, so that there is a record of it.

That sounds scary :)

>> So with this patch for FUSE_LOOKUP  we always return submount's parentinode
>> id and with GETATTR request we return actual inode id of submount. That
>> kind of bothers me a bit as we are assuming that there is always going
>> to be a GETATTR request after FUSE_LOOKUP. FUSE_LOOKUP itself has attrs
>> returned with it and it might happen that after FUSE_LOOKUP, FUSE_GETATTR
>> might not be called at all because FUSE_LOOKUP itself got the latest
>> updated attrs with certain timeout.
>>
>> For example, if I call stat on a normal file (not submount), I see that
>> after FUSE_LOOKUP, no FUSE_GETATTR request is going and
>> fuse_update_get_attr() is using attrs from locally cached inode attrs.
>>
>> But same thing does not seem to be happening in case of submount. Once
>> submount is created in guest, I see that we never seem to be calling
>> ->revalidate() on newly created dentry of submount root. I am not sure
>> why. And that means we don't do FUSE_LOOKUP and that means FUSE_GETATTR
>> always gets called.

Even if it worked reliably, I have to admit it’s kind of relying on at 
most implementation-defined behavior.  Having two inodes would 
definitely be the right solution.

> Yes, this sounds normal.
>
> The lookup sequence for "/mnt/virtiofs/dir1/submount/file" will be:
>
> LOOKUP(1, "dir")
>      create inode I1 in sb1
>      create dentry "dir" with inode I1 in sb1
> LOOKUP(I1, "submount")
>      create inode I2 in sb1, set S_AUTOMOUNT
>      create dentry "submount" with inode I2 in sb1
>      automount triggered:
>      create sb2
>      create inode I2 in sb2
>      create dentry "/" with inode I2 in sb2
> GETATTR(I2)
>       fill inode I2
> LOOKUP(I2, "file")
>       create inode I3
>       create dentry "file" with inode I3 in sb2
>
> Notice how there's two inodes numbered I2, but in separate sb's.
> However the server has only the notion of a single I2 inode which is
> the root of the submount, since the mountpoint is not visible (except
> for d_ino in readdir()).
>
> Now AFAICS what this patch does is set the inode number in the
> attributes returned by LOOKUP(I1, "submount") to the covered inode, so
> that AT_NO_AUTOMOUNT stat will return the correct value.   While this
> seems to work, it's not a fundamental fix to the problem, since the
> attributes on sb1/I2 will time out and the next stat(...,
> AT_NO_AUTOMOUNT) will trigger a GETATTR request, which will return the
> inode number of the submount root.

Ah, yeah.

> Solving this properly would mean that the server would have to
> generate separate nodeid's for the mountpoint and the submount root
> and the protocol would have to be extended so that this information
> could be transferred to the client.

Yes, we’d somehow need to do a separate lookup for the root inode of the 
submount...

> Not sure whether this is worth it or not.

I’m wondering the same.  This series was mostly the result of an 
incidental finding and seeing that getting it to work and do what I want 
seemed to be pretty easy.  Turns out doing something right can never 
really be easy...

But we have already decided that we don’t deem the inode ID problem too 
important (not least considering NFS has the same issue), so fixing it 
is not really top priority.  Maybe I’ll get back to it, but I think for 
now I consider it on the backlog.

Max


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-06-04 17:03 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-12 12:55 [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz
2021-05-12 12:55 ` [PATCH 1/3] " Max Reitz
2021-05-12 15:59   ` Connor Kuehl
2021-05-17 14:57   ` Vivek Goyal
2021-05-17 14:57     ` [Virtio-fs] " Vivek Goyal
2021-05-17 17:26     ` Max Reitz
2021-05-17 17:26       ` [Virtio-fs] " Max Reitz
2021-05-20 11:28       ` Dr. David Alan Gilbert
2021-05-20 11:28         ` [Virtio-fs] " Dr. David Alan Gilbert
2021-05-26 18:13   ` Vivek Goyal
2021-05-26 18:13     ` [Virtio-fs] " Vivek Goyal
2021-05-26 18:50     ` Vivek Goyal
2021-05-27 15:00       ` Max Reitz
2021-06-02 18:19   ` Vivek Goyal
2021-06-02 18:19     ` [Virtio-fs] " Vivek Goyal
2021-06-02 18:59     ` Miklos Szeredi
2021-06-02 18:59       ` [Virtio-fs] " Miklos Szeredi
2021-06-04 16:22       ` Max Reitz
2021-06-04 16:22         ` [Virtio-fs] " Max Reitz
2021-05-12 12:55 ` [PATCH 2/3] virtiofs_submounts.py: Do not generate ssh key Max Reitz
2021-05-12 12:55 ` [PATCH 3/3] virtiofs_submounts.py: Check `find` Max Reitz
2021-05-12 14:37 ` [Virtio-fs] Fwd: [PATCH 0/3] virtiofsd: Find original inode ID of mount points Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.