* [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
@ 2022-09-26 12:42 Linus Heckemann
  2022-09-26 16:02 ` Christian Schoenebeck
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Heckemann @ 2022-09-26 12:42 UTC (permalink / raw)
  To: qemu-devel, Christian Schoenebeck; +Cc: Qemu-block, Greg Kurz, Linus Heckemann

Previously, the yielding in v9fs_reopen_fid and put_fid could result
in other parts of the code modifying the fid table. This would
invalidate the hash table iterator, causing misbehaviour.

Now we ensure that we complete the iteration before yielding, so that
the iterator remains valid throughout the loop, and only reopen the
fids afterwards.
---

@Christian: I still haven't been able to reproduce the issue that this
commit is supposed to fix (I tried building KDE too, no problems), so
it's a bit of a shot in the dark. It certainly still runs and I think it
should fix the issue, but it would be great if you could test it.



 hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index f4c1e37202..825c39e122 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -522,33 +522,47 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
     V9fsFidState *fidp;
     gpointer fid;
     GHashTableIter iter;
+    /*
+     * The most common case is probably that we have exactly one
+     * fid for the given path, so preallocate exactly one.
+     */
+    GArray *to_reopen = g_array_sized_new(FALSE, FALSE, sizeof(V9fsFidState*), 1);
+    gint i;
 
     g_hash_table_iter_init(&iter, s->fids);
 
+    /*
+     * We iterate over the fid table looking for the entries we need
+     * to reopen, and store them in to_reopen. This is because
+     * reopening them happens asynchronously, allowing the fid table
+     * to be modified in the meantime, invalidating our iterator.
+     */
     while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
-        /*
-         * Ensure the fid survives a potential clunk request during
-         * v9fs_reopen_fid.
-         */
-        fidp->ref++;
-
         if (fidp->path.size == path->size &&
             !memcmp(fidp->path.data, path->data, path->size)) {
-            /* Mark the fid non reclaimable. */
-            fidp->flags |= FID_NON_RECLAIMABLE;
-
-            /* reopen the file/dir if already closed */
-            err = v9fs_reopen_fid(pdu, fidp);
-            if (err < 0) {
-                put_fid(pdu, fidp);
-                return err;
-            }
+            /*
+             * Ensure the fid survives a potential clunk request during
+             * v9fs_reopen_fid or put_fid.
+             */
+            fidp->ref++;
+            g_array_append_val(to_reopen, fidp);
         }
+    }
 
-        /* We're done with this fid */
+    for (i=0; i < to_reopen->len; i++) {
+        fidp = g_array_index(to_reopen, V9fsFidState*, i);
+        /* reopen the file/dir if already closed */
+        err = v9fs_reopen_fid(pdu, fidp);
+        if (err < 0) {
+            put_fid(pdu, fidp);
+            g_array_free(to_reopen, TRUE);
+            return err;
+        }
         put_fid(pdu, fidp);
     }
 
+    g_array_free(to_reopen, TRUE);
+
     return 0;
 }
 
-- 
2.36.2




* Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
  2022-09-26 12:42 [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim Linus Heckemann
@ 2022-09-26 16:02 ` Christian Schoenebeck
  2022-09-27 13:05   ` Linus Heckemann
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Schoenebeck @ 2022-09-26 16:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Qemu-block, Greg Kurz, Linus Heckemann

On Monday, 26 September 2022 14:42:06 CEST Linus Heckemann wrote:
> Previously, the yielding in v9fs_reopen_fid and put_fid could result
> in other parts of the code modifying the fid table. This would
> invalidate the hash table iterator, causing misbehaviour.
> 
> Now we ensure that we complete the iteration before yielding, so that
> the iterator remains valid throughout the loop, and only reopen the
> fids afterwards.
> ---

Ah, you sent this fix as a separate patch on top. I actually just meant that 
you would take my already queued patch as the latest version (just because I 
had made some minor changes on my end) and adjust that patch further as v4.

Anyway, there are still some things to do here, so maybe you can send your 
patch squashed in the next round ...

> @Christian: I still haven't been able to reproduce the issue that this
> commit is supposed to fix (I tried building KDE too, no problems), so
> it's a bit of a shot in the dark. It certainly still runs and I think it
> should fix the issue, but it would be great if you could test it.

No worries about reproduction, I will definitely test this thoroughly. ;-)

>  hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index f4c1e37202..825c39e122 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -522,33 +522,47 @@ static int coroutine_fn
> v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path) V9fsFidState *fidp;
>      gpointer fid;
>      GHashTableIter iter;
> +    /*
> +     * The most common case is probably that we have exactly one
> +     * fid for the given path, so preallocate exactly one.
> +     */
> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
> sizeof(V9fsFidState*), 1); +    gint i;

Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need for 
explicit calls to g_array_free() below.

> 
>      g_hash_table_iter_init(&iter, s->fids);
> 
> +    /*
> +     * We iterate over the fid table looking for the entries we need
> +     * to reopen, and store them in to_reopen. This is because
> +     * reopening them happens asynchronously, allowing the fid table
> +     * to be modified in the meantime, invalidating our iterator.
> +     */
>      while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
> -        /*
> -         * Ensure the fid survives a potential clunk request during
> -         * v9fs_reopen_fid.
> -         */
> -        fidp->ref++;
> -
>          if (fidp->path.size == path->size &&
>              !memcmp(fidp->path.data, path->data, path->size)) {
> -            /* Mark the fid non reclaimable. */
> -            fidp->flags |= FID_NON_RECLAIMABLE;

Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE, no?

> -
> -            /* reopen the file/dir if already closed */
> -            err = v9fs_reopen_fid(pdu, fidp);
> -            if (err < 0) {
> -                put_fid(pdu, fidp);
> -                return err;
> -            }
> +            /*
> +             * Ensure the fid survives a potential clunk request during
> +             * v9fs_reopen_fid or put_fid.
> +             */
> +            fidp->ref++;

Hmm, bumping the refcount here makes sense, as the 2nd loop may be interrupted 
and the fid otherwise disappear in between, but ...

> +            g_array_append_val(to_reopen, fidp);
>          }
> +    }
> 
> -        /* We're done with this fid */
> +    for (i=0; i < to_reopen->len; i++) {
> +        fidp = g_array_index(to_reopen, V9fsFidState*, i);
> +        /* reopen the file/dir if already closed */
> +        err = v9fs_reopen_fid(pdu, fidp);
> +        if (err < 0) {
> +            put_fid(pdu, fidp);
> +            g_array_free(to_reopen, TRUE);
> +            return err;

... this return would then leak all remainder fids that you have bumped the 
refcount for above already.

> +        }
>          put_fid(pdu, fidp);
>      }
> 
> +    g_array_free(to_reopen, TRUE);
> +

With `g_autoptr(GArray)` you can drop both g_array_free() calls.

>      return 0;
>  }

Also: I noticed that your changes in virtfs_reset() would need the same 2-loop 
hack to avoid hash iterator invalidation, as it would also call put_fid() 
inside the loop and be prone to hash iterator invalidation otherwise.

Best regards,
Christian Schoenebeck





* Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
  2022-09-26 16:02 ` Christian Schoenebeck
@ 2022-09-27 13:05   ` Linus Heckemann
  2022-09-27 17:14     ` Christian Schoenebeck
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Heckemann @ 2022-09-27 13:05 UTC (permalink / raw)
  To: Christian Schoenebeck, qemu-devel; +Cc: Qemu-block, Greg Kurz

Christian Schoenebeck <qemu_oss@crudebyte.com> writes:

> Ah, you sent this fix as a separate patch on top. I actually just meant that 
> you would take my already queued patch as the latest version (just because I 
> had made some minor changes on my end) and adjust that patch further as v4.
>
> Anyway, there are still some things to do here, so maybe you can send your 
> patch squashed in the next round ...

I see, will do!

>> @Christian: I still haven't been able to reproduce the issue that this
>> commit is supposed to fix (I tried building KDE too, no problems), so
>> it's a bit of a shot in the dark. It certainly still runs and I think it
>> should fix the issue, but it would be great if you could test it.
>
> No worries about reproduction, I will definitely test this thoroughly. ;-)
>
>>  hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
>>  1 file changed, 30 insertions(+), 16 deletions(-)
>> 
>> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
>> index f4c1e37202..825c39e122 100644
>> --- a/hw/9pfs/9p.c
>> +++ b/hw/9pfs/9p.c
>> @@ -522,33 +522,47 @@ static int coroutine_fn
>> v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path) V9fsFidState *fidp;
>>      gpointer fid;
>>      GHashTableIter iter;
>> +    /*
>> +     * The most common case is probably that we have exactly one
>> +     * fid for the given path, so preallocate exactly one.
>> +     */
>> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
>> sizeof(V9fsFidState*), 1); +    gint i;
>
> Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need for 
> explicit calls to g_array_free() below.

Good call. I'm not familiar with glib, so I didn't know about this :)

>> -            fidp->flags |= FID_NON_RECLAIMABLE;
>
> Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE, no?

Indeed, that was an accident. 

>> +            /*
>> +             * Ensure the fid survives a potential clunk request during
>> +             * v9fs_reopen_fid or put_fid.
>> +             */
>> +            fidp->ref++;
>
> Hmm, bumping the refcount here makes sense, as the 2nd loop may be interrupted 
> and the fid otherwise disappear in between, but ...
>
>> +            g_array_append_val(to_reopen, fidp);
>>          }
>> +    }
>> 
>> -        /* We're done with this fid */
>> +    for (i=0; i < to_reopen->len; i++) {
>> +        fidp = g_array_index(to_reopen, V9fsFidState*, i);
>> +        /* reopen the file/dir if already closed */
>> +        err = v9fs_reopen_fid(pdu, fidp);
>> +        if (err < 0) {
>> +            put_fid(pdu, fidp);
>> +            g_array_free(to_reopen, TRUE);
>> +            return err;
>
> ... this return would then leak all remainder fids that you have bumped the 
> refcount for above already.

You're right. I think the best way around it, though it feels ugly, is
to add a third loop in an "out:".

> Also: I noticed that your changes in virtfs_reset() would need the same 2-loop 
> hack to avoid hash iterator invalidation, as it would also call put_fid() 
> inside the loop and be prone for hash iterator invalidation otherwise.

Good point. Will do.


One more thing has occurred to me. I think the reclaiming/reopening
logic will misbehave in the following sequence of events:

1. QEMU reclaims an open fid, losing the file handle
2. The file referred to by the fid is replaced with a different file
   (e.g. via rename or symlink) outside QEMU
3. The file is accessed again by the guest, causing QEMU to reopen a
   _different file_ from before without the guest having performed any
   operations that should cause this to happen.

This is neither introduced nor resolved by my changes. Am I overlooking
something that avoids this (be it documentation that directories exposed
via 9p should not be touched by the host), or is this a real issue? I'm
thinking one could at least detect it by saving inode numbers in
V9fsFidState and comparing them when reopening, but recovering from such
a situation seems difficult.

Linus



* Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
  2022-09-27 13:05   ` Linus Heckemann
@ 2022-09-27 17:14     ` Christian Schoenebeck
  2022-09-27 19:47       ` Greg Kurz
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Schoenebeck @ 2022-09-27 17:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Qemu-block, Greg Kurz, Linus Heckemann

On Tuesday, 27 September 2022 15:05:13 CEST Linus Heckemann wrote:
> Christian Schoenebeck <qemu_oss@crudebyte.com> writes:
> > Ah, you sent this fix as a separate patch on top. I actually just meant
> > that you would take my already queued patch as the latest version (just
> > because I had made some minor changes on my end) and adjust that patch
> > further as v4.
> > 
> > Anyway, there are still some things to do here, so maybe you can send your
> > patch squashed in the next round ...
> 
> I see, will do!
> 
> >> @Christian: I still haven't been able to reproduce the issue that this
> >> commit is supposed to fix (I tried building KDE too, no problems), so
> >> it's a bit of a shot in the dark. It certainly still runs and I think it
> >> should fix the issue, but it would be great if you could test it.
> > 
> > No worries about reproduction, I will definitely test this thoroughly. ;-)
> > 
> >>  hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 30 insertions(+), 16 deletions(-)
> >> 
> >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> >> index f4c1e37202..825c39e122 100644
> >> --- a/hw/9pfs/9p.c
> >> +++ b/hw/9pfs/9p.c
> >> @@ -522,33 +522,47 @@ static int coroutine_fn
> >> v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path) V9fsFidState
> >> *fidp;
> >> 
> >>      gpointer fid;
> >>      GHashTableIter iter;
> >> 
> >> +    /*
> >> +     * The most common case is probably that we have exactly one
> >> +     * fid for the given path, so preallocate exactly one.
> >> +     */
> >> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
> >> sizeof(V9fsFidState*), 1); +    gint i;
> > 
> > Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need
> > for explicit calls to g_array_free() below.
> 
> Good call. I'm not familiar with glib, so I didn't know about this :)
> 
> >> -            fidp->flags |= FID_NON_RECLAIMABLE;
> > 
> > Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE,
> > no?
> Indeed, that was an accident.
> 
> >> +            /*
> >> +             * Ensure the fid survives a potential clunk request during
> >> +             * v9fs_reopen_fid or put_fid.
> >> +             */
> >> +            fidp->ref++;
> > 
> > Hmm, bumping the refcount here makes sense, as the 2nd loop may be
> > interrupted and the fid otherwise disappear in between, but ...
> > 
> >> +            g_array_append_val(to_reopen, fidp);
> >> 
> >>          }
> >> 
> >> +    }
> >> 
> >> -        /* We're done with this fid */
> >> +    for (i=0; i < to_reopen->len; i++) {
> >> +        fidp = g_array_index(to_reopen, V9fsFidState*, i);
> >> +        /* reopen the file/dir if already closed */
> >> +        err = v9fs_reopen_fid(pdu, fidp);
> >> +        if (err < 0) {
> >> +            put_fid(pdu, fidp);
> >> +            g_array_free(to_reopen, TRUE);
> >> +            return err;
> > 
> > ... this return would then leak all remainder fids that you have bumped
> > the
> > refcount for above already.
> 
> You're right. I think the best way around it, though it feels ugly, is
> to add a third loop in an "out:".

Either that, or continuing the loop to the end. Not that this would become 
much prettier. I must admit I also don't really have a good idea for a clean 
solution in this case.

> > Also: I noticed that your changes in virtfs_reset() would need the same
> > 2-loop hack to avoid hash iterator invalidation, as it would also call
> > put_fid() inside the loop and be prone to hash iterator invalidation
> > otherwise.
> Good point. Will do.
> 
> One more thing has occurred to me. I think the reclaiming/reopening
> logic will misbehave in the following sequence of events:
> 
> 1. QEMU reclaims an open fid, losing the file handle
> 2. The file referred to by the fid is replaced with a different file
>    (e.g. via rename or symlink) outside QEMU
> 3. The file is accessed again by the guest, causing QEMU to reopen a
>    _different file_ from before without the guest having performed any
>    operations that should cause this to happen.
> 
> This is neither introduced nor resolved by my changes. Am I overlooking
> something that avoids this (be it documentation that directories exposed
> via 9p should not be touched by the host), or is this a real issue? I'm
> thinking one could at least detect it by saving inode numbers in
> V9fsFidState and comparing them when reopening, but recovering from such
> a situation seems difficult.

Well, in that specific scenario when rename/move happens outside of QEMU then 
yes, this might happen unfortunately. The point of this "reclaim fid" stuff is 
to deal with the fact that there is an upper limit on systems for the max. 
number of open file descriptors a process might hold at a time. And on some 
systems like macOS I think that limit is quite low by default (like 100?).

There is also another issue pending that affects pure inner-guest behaviour; 
the infamous use-after-unlink() use pattern:
https://wiki.qemu.org/Documentation/9p#Implementation_Plans
https://gitlab.com/qemu-project/qemu/-/issues/103

It would make sense to look at how other file servers deal with the max. open 
file descriptor limit before starting to just fight the symptoms. This 
whole reclaim fid stuff in general is a PITA.

Best regards,
Christian Schoenebeck





* Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
  2022-09-27 17:14     ` Christian Schoenebeck
@ 2022-09-27 19:47       ` Greg Kurz
  2022-09-28 17:24         ` Christian Schoenebeck
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Kurz @ 2022-09-27 19:47 UTC (permalink / raw)
  To: Christian Schoenebeck; +Cc: qemu-devel, Qemu-block, Linus Heckemann

On Tue, 27 Sep 2022 19:14:33 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Tuesday, 27 September 2022 15:05:13 CEST Linus Heckemann wrote:
> > Christian Schoenebeck <qemu_oss@crudebyte.com> writes:
> > > Ah, you sent this fix as a separate patch on top. I actually just meant
> > > that you would take my already queued patch as the latest version (just
> > > because I had made some minor changes on my end) and adjust that patch
> > > further as v4.
> > > 
> > > Anyway, there are still some things to do here, so maybe you can send your
> > > patch squashed in the next round ...
> > 
> > I see, will do!
> > 
> > >> @Christian: I still haven't been able to reproduce the issue that this
> > >> commit is supposed to fix (I tried building KDE too, no problems), so
> > >> it's a bit of a shot in the dark. It certainly still runs and I think it
> > >> should fix the issue, but it would be great if you could test it.
> > > 
> > > No worries about reproduction, I will definitely test this thoroughly. ;-)
> > > 
> > >>  hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
> > >>  1 file changed, 30 insertions(+), 16 deletions(-)
> > >> 
> > >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > >> index f4c1e37202..825c39e122 100644
> > >> --- a/hw/9pfs/9p.c
> > >> +++ b/hw/9pfs/9p.c
> > >> @@ -522,33 +522,47 @@ static int coroutine_fn
> > >> v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path) V9fsFidState
> > >> *fidp;
> > >> 
> > >>      gpointer fid;
> > >>      GHashTableIter iter;
> > >> 
> > >> +    /*
> > >> +     * The most common case is probably that we have exactly one
> > >> +     * fid for the given path, so preallocate exactly one.
> > >> +     */
> > >> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
> > >> sizeof(V9fsFidState*), 1); +    gint i;
> > > 
> > > Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need
> > > for explicit calls to g_array_free() below.
> > 
> > Good call. I'm not familiar with glib, so I didn't know about this :)
> > 
> > >> -            fidp->flags |= FID_NON_RECLAIMABLE;
> > > 
> > > Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE,
> > > no?
> > Indeed, that was an accident.
> > 
> > >> +            /*
> > >> +             * Ensure the fid survives a potential clunk request during
> > >> +             * v9fs_reopen_fid or put_fid.
> > >> +             */
> > >> +            fidp->ref++;
> > > 
> > > Hmm, bumping the refcount here makes sense, as the 2nd loop may be
> > > interrupted and the fid otherwise disappear in between, but ...
> > > 
> > >> +            g_array_append_val(to_reopen, fidp);
> > >> 
> > >>          }
> > >> 
> > >> +    }
> > >> 
> > >> -        /* We're done with this fid */
> > >> +    for (i=0; i < to_reopen->len; i++) {
> > >> +        fidp = g_array_index(to_reopen, V9fsFidState*, i);
> > >> +        /* reopen the file/dir if already closed */
> > >> +        err = v9fs_reopen_fid(pdu, fidp);
> > >> +        if (err < 0) {
> > >> +            put_fid(pdu, fidp);
> > >> +            g_array_free(to_reopen, TRUE);
> > >> +            return err;
> > > 
> > > ... this return would then leak all remainder fids that you have bumped
> > > the
> > > refcount for above already.
> > 
> > You're right. I think the best way around it, though it feels ugly, is
> > to add a third loop in an "out:".
> 
> Either that, or continuing the loop to the end. Not that this would become 
> much prettier. I must admit I also don't really have a good idea for a clean 
> solution in this case.
> 
> > > Also: I noticed that your changes in virtfs_reset() would need the same
> > > 2-loop hack to avoid hash iterator invalidation, as it would also call
> > > put_fid() inside the loop and be prone to hash iterator invalidation
> > > otherwise.
> > Good point. Will do.
> > 
> > One more thing has occurred to me. I think the reclaiming/reopening
> > logic will misbehave in the following sequence of events:
> > 
> > 1. QEMU reclaims an open fid, losing the file handle
> > 2. The file referred to by the fid is replaced with a different file
> >    (e.g. via rename or symlink) outside QEMU
> > 3. The file is accessed again by the guest, causing QEMU to reopen a
> >    _different file_ from before without the guest having performed any
> >    operations that should cause this to happen.
> > 
> > This is neither introduced nor resolved by my changes. Am I overlooking
> > something that avoids this (be it documentation that directories exposed
> > via 9p should not be touched by the host), or is this a real issue? I'm
> > thinking one could at least detect it by saving inode numbers in
> > V9fsFidState and comparing them when reopening, but recovering from such
> > a situation seems difficult.
> 
> Well, in that specific scenario when rename/move happens outside of QEMU then 
> yes, this might happen unfortunately. The point of this "reclaim fid" stuff is 
> to deal with the fact that there is an upper limit on systems for the max. 
> number of open file descriptors a process might hold at a time. And on some 
> systems like macOS I think that limit is quite low by default (like 100?).
> 
> There is also another issue pending that affects pure inner-guest behaviour; 
> the infamous use-after-unlink() use pattern:
> https://wiki.qemu.org/Documentation/9p#Implementation_Plans
> https://gitlab.com/qemu-project/qemu/-/issues/103
> 
> It would make sense to look at how other file servers deal with the max. open 
> file descriptor limit before starting to just fight the symptoms. This 
> whole reclaim fid stuff in general is a PITA.
> 

Yes, this reclaim code is just a best-effort attempt to not
starve file descriptors. But since its implementation is path
based, it has the by-design limitation that nothing should
modify the backing fs outside of the current 9p session.
Note: just like the use-after-unlink() infamous pattern (I love
the wording), you can get this with a "pure inner-guest behaviour"
using two devices with overlapping backends (shoot in the foot
setup) :-)

Recovering from lost state is impossible but the server should
at least try to detect that and return EIO to the client, pretty
much like any storage device is expected to do if possible.

Cheers,

--
Greg

> Best regards,
> Christian Schoenebeck
> 
> 




* Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
  2022-09-27 19:47       ` Greg Kurz
@ 2022-09-28 17:24         ` Christian Schoenebeck
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Schoenebeck @ 2022-09-28 17:24 UTC (permalink / raw)
  To: qemu-devel, Qemu-block; +Cc: Linus Heckemann, Greg Kurz

On Tuesday, 27 September 2022 21:47:02 CEST Greg Kurz wrote:
> On Tue, 27 Sep 2022 19:14:33 +0200
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Tuesday, 27 September 2022 15:05:13 CEST Linus Heckemann wrote:
> > > One more thing has occurred to me. I think the reclaiming/reopening
> > > logic will misbehave in the following sequence of events:
> > > 
> > > 1. QEMU reclaims an open fid, losing the file handle
> > > 2. The file referred to by the fid is replaced with a different file
> > > 
> > >    (e.g. via rename or symlink) outside QEMU
> > > 
> > > 3. The file is accessed again by the guest, causing QEMU to reopen a
> > > 
> > >    _different file_ from before without the guest having performed any
> > >    operations that should cause this to happen.
> > > 
> > > This is neither introduced nor resolved by my changes. Am I overlooking
> > > something that avoids this (be it documentation that directories exposed
> > > via 9p should not be touched by the host), or is this a real issue? I'm
> > > thinking one could at least detect it by saving inode numbers in
> > > V9fsFidState and comparing them when reopening, but recovering from such
> > > a situation seems difficult.
> > 
> > Well, in that specific scenario when rename/move happens outside of QEMU
> > then yes, this might happen unfortunately. The point of this "reclaim
> > fid" stuff is to deal with the fact that there is an upper limit on
> > systems for the max. number of open file descriptors a process might hold
> > at a time. And on some systems like macOS I think that limit is quite low
> > by default (like 100?).
> > 
> > There is also another issue pending that affects pure inner-guest
> > behaviour; the infamous use-after-unlink() use pattern:
> > https://wiki.qemu.org/Documentation/9p#Implementation_Plans
> > https://gitlab.com/qemu-project/qemu/-/issues/103
> > 
> > It would make sense to look at how other file servers deal with the max.
> > open file descriptor limit before starting to just fight the
> > symptoms. This whole reclaim fid stuff in general is a PITA.
> 
> Yes, this reclaim code is just a best-effort attempt to not
> starve file descriptors. But since its implementation is path
> based, it has the by-design limitation that nothing should
> modify the backing fs outside of the current 9p session.

Sure.

> Note: just like the use-after-unlink() infamous pattern (I love
> the wording), you can get this with a "pure inner-guest behaviour"
> using two devices with overlapping backends (shoot in the foot
> setup) :-)

True.

> Recovering from lost state is impossible but the server should
> at least try to detect that and return EIO to the client, pretty
> much like any storage device is expected to do if possible.

Yeah, I agree.

Nevertheless, I just had a glance at how this is handled in Samba, and one 
important thing they do is try to increase the (hard & soft) limits:

https://github.com/samba-team/samba/blob/master/source3/lib/util.c#L1320

Which makes sense, and now I remember commonly doing that on macOS as well due 
to Apple's very low default limit there.

Samba's anticipated default limit is a max. of 10k open files, BTW, which is 
quite good ground for not getting into these waters in the first place. 
Again, not that I would ignore that problem space.

Best regards,
Christian Schoenebeck




