* [PATCH v3 0/2] nbd/server: Quiesce coroutines on context switch
@ 2021-01-21 17:06 Sergio Lopez
  2021-01-21 17:06 ` [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore() Sergio Lopez
  2021-01-21 17:07 ` [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup() Sergio Lopez
  0 siblings, 2 replies; 9+ messages in thread
From: Sergio Lopez @ 2021-01-21 17:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Sergio Lopez, qemu-block, Max Reitz

This series allows the NBD server to properly switch between AIO contexts,
having quiesced recv_coroutine and send_coroutine before doing the transition.

We need this because we send back devices running in IO Thread owned contexts
to the main context when stopping the data plane, something that can happen
multiple times during the lifetime of a VM (usually during the boot sequence or
on a reboot), and we drag the NBD server of the corresponding export with it.

While there, also fix a problem caused by a cross-dependency between
closing the export's client connections and draining the block
layer. The visible effect of this problem was QEMU hanging when
the guest requests a power off while there's an active NBD client.

v3:
 - Drop already merged "block: Honor blk_set_aio_context() context
 requirements" and "nbd/server: Quiesce coroutines on context switch"
 - Change the strategy for avoiding processing BDS twice to adding
 every child and parent to the ignore list in advance before
 processing them. (Kevin Wolf)
 - Replace "nbd/server: Quiesce coroutines on context switch" with
 "block: move blk_exp_close_all() to qemu_cleanup()"

v2:
 - Replace "virtio-blk: Acquire context while switching them on
 dataplane start" with "block: Honor blk_set_aio_context() context
 requirements" (Kevin Wolf)
 - Add "block: Avoid processing BDS twice in
 bdrv_set_aio_context_ignore()"
 - Add "block: Close block exports in two steps"
 - Rename nbd_read_eof() to nbd_server_read_eof() (Eric Blake)
 - Fix double space and typo in comment. (Eric Blake)

Sergio Lopez (2):
  block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  block: move blk_exp_close_all() to qemu_cleanup()

 block.c            | 35 +++++++++++++++++++++++++++--------
 softmmu/runstate.c |  9 +++++++++
 2 files changed, 36 insertions(+), 8 deletions(-)

-- 
2.26.2





* [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  2021-01-21 17:06 [PATCH v3 0/2] nbd/server: Quiesce coroutines on context switch Sergio Lopez
@ 2021-01-21 17:06 ` Sergio Lopez
  2021-01-21 17:31   ` Eric Blake
  2021-02-01 12:06   ` Kevin Wolf
  2021-01-21 17:07 ` [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup() Sergio Lopez
  1 sibling, 2 replies; 9+ messages in thread
From: Sergio Lopez @ 2021-01-21 17:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Sergio Lopez, qemu-block, Max Reitz

Some graphs may contain an indirect reference to the first BDS in the
chain that can be reached while walking it bottom->up from one of its
children.

Double-processing of a BDS is especially problematic for the
aio_notifiers, as they might attempt to work on both the old and the
new AIO contexts.

To avoid this problem, add every child and parent to the ignore list
before actually processing them.
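
As a rough illustration of why the ordering matters, here is a toy model
that is not QEMU code. It assumes a graph where A has children B and C,
and C is also a parent of B. Every name in it (Edge, Walk, walk_eager,
walk_mark_first, visits_after) is made up for this sketch, and it also
assumes that a node which has already switched is skipped on later visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { A, B, C, NUM_NODES };

/* Toy edge: stands in for a BdrvChild linking a parent to a child. */
typedef struct { int parent; int child; } Edge;

static const Edge edges[] = { {A, B}, {A, C}, {C, B} };
enum { NUM_EDGES = sizeof(edges) / sizeof(edges[0]) };

typedef struct {
    bool ignore[NUM_EDGES];    /* toy "ignore list", one flag per edge */
    bool switched[NUM_NODES];  /* node already moved to the new context */
    int visits[NUM_NODES];     /* times the switch step ran for a node */
} Walk;

/* Old order: mark an edge and immediately recurse through it. */
static void walk_eager(Walk *w, int node)
{
    if (w->switched[node]) {
        return;
    }
    for (int i = 0; i < NUM_EDGES; i++) {
        if (edges[i].parent == node && !w->ignore[i]) {
            w->ignore[i] = true;
            walk_eager(w, edges[i].child);
        }
    }
    for (int i = 0; i < NUM_EDGES; i++) {
        if (edges[i].child == node && !w->ignore[i]) {
            w->ignore[i] = true;
            walk_eager(w, edges[i].parent);
        }
    }
    w->switched[node] = true;
    w->visits[node]++;
}

/* New order: mark every edge of this node first, then recurse. */
static void walk_mark_first(Walk *w, int node)
{
    bool todo[NUM_EDGES] = { false };

    if (w->switched[node]) {
        return;
    }
    for (int i = 0; i < NUM_EDGES; i++) {
        bool touches = edges[i].parent == node || edges[i].child == node;
        if (touches && !w->ignore[i]) {
            w->ignore[i] = true;
            todo[i] = true;
        }
    }
    for (int i = 0; i < NUM_EDGES; i++) {
        if (todo[i]) {
            int other = edges[i].parent == node ? edges[i].child
                                                : edges[i].parent;
            walk_mark_first(w, other);
        }
    }
    w->switched[node] = true;
    w->visits[node]++;
}

/* Run one walk from 'start' and report how often 'node' was switched. */
static int visits_after(void (*walk)(Walk *, int), int start, int node)
{
    Walk w;

    memset(&w, 0, sizeof(w));
    walk(&w, start);
    return w.visits[node];
}
```

In this model, walk_eager started at A runs the switch step for A twice,
because the walk re-enters A through C before the outer call has finished.
walk_mark_first runs it once per node, since the A->C edge is already on
the ignore list by the time C is reached.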

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 block.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 8b9d457546..3da99312db 100644
--- a/block.c
+++ b/block.c
@@ -6414,7 +6414,10 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
                                  AioContext *new_context, GSList **ignore)
 {
     AioContext *old_context = bdrv_get_aio_context(bs);
-    BdrvChild *child;
+    GSList *children_to_process = NULL;
+    GSList *parents_to_process = NULL;
+    GSList *entry;
+    BdrvChild *child, *parent;
 
     g_assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
@@ -6429,16 +6432,33 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
             continue;
         }
         *ignore = g_slist_prepend(*ignore, child);
-        bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
+        children_to_process = g_slist_prepend(children_to_process, child);
     }
-    QLIST_FOREACH(child, &bs->parents, next_parent) {
-        if (g_slist_find(*ignore, child)) {
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+        if (g_slist_find(*ignore, parent)) {
             continue;
         }
-        assert(child->klass->set_aio_ctx);
-        *ignore = g_slist_prepend(*ignore, child);
-        child->klass->set_aio_ctx(child, new_context, ignore);
+        *ignore = g_slist_prepend(*ignore, parent);
+        parents_to_process = g_slist_prepend(parents_to_process, parent);
+    }
+
+    for (entry = children_to_process;
+         entry != NULL;
+         entry = g_slist_next(entry)) {
+        child = entry->data;
+        bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
+    }
+    g_slist_free(children_to_process);
+
+    for (entry = parents_to_process;
+         entry != NULL;
+         entry = g_slist_next(entry)) {
+        parent = entry->data;
+        assert(parent->klass->set_aio_ctx);
+        parent->klass->set_aio_ctx(parent, new_context, ignore);
     }
+    g_slist_free(parents_to_process);
 
     bdrv_detach_aio_context(bs);
 
-- 
2.26.2




* [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()
  2021-01-21 17:06 [PATCH v3 0/2] nbd/server: Quiesce coroutines on context switch Sergio Lopez
  2021-01-21 17:06 ` [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore() Sergio Lopez
@ 2021-01-21 17:07 ` Sergio Lopez
  2021-01-21 18:02   ` Eric Blake
  2021-02-01 12:20   ` Kevin Wolf
  1 sibling, 2 replies; 9+ messages in thread
From: Sergio Lopez @ 2021-01-21 17:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Sergio Lopez, qemu-block, Max Reitz

Move blk_exp_close_all() from bdrv_close_all() to qemu_cleanup(), before
bdrv_drain_all_begin().

Export drivers may have coroutines yielding at some point in the block
layer, so we need to shut them down before draining the block layer,
as otherwise they may get stuck in blk_wait_while_drained().
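
The ordering problem can be sketched with a small toy model. This is not
QEMU code: ToyBackend and the toy_* and shutdown_* functions are
hypothetical names, and the model simply assumes that a request issued
while drained is parked and never completes during shutdown.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a block backend used by an export; not QEMU's type. */
typedef struct {
    bool drained;   /* true once the toy drained section has begun */
    int pending;    /* requests the export still has to issue */
    int parked;     /* requests parked because they arrived while drained */
} ToyBackend;

/* Begin a drained section: from now on new requests are parked. */
static void toy_drain_begin(ToyBackend *b)
{
    b->drained = true;
}

/*
 * Close the export: let it issue its remaining requests, then report
 * whether the close can complete. A parked request never finishes in
 * this model because nothing ends the drain during shutdown.
 */
static bool toy_export_close(ToyBackend *b)
{
    while (b->pending > 0) {
        b->pending--;
        if (b->drained) {
            b->parked++;    /* models waiting in blk_wait_while_drained() */
        }
    }
    return b->parked == 0;
}

/* Shutdown in the old order: drain first, close exports afterwards. */
static bool shutdown_drain_then_close(int pending)
{
    ToyBackend b = { .drained = false, .pending = pending, .parked = 0 };

    toy_drain_begin(&b);
    return toy_export_close(&b);
}

/* Shutdown in the proposed order: close exports, then drain. */
static bool shutdown_close_then_drain(int pending)
{
    ToyBackend b = { .drained = false, .pending = pending, .parked = 0 };
    bool closed = toy_export_close(&b);

    toy_drain_begin(&b);
    return closed;
}
```

With one pending request, shutdown_drain_then_close() reports that the
close cannot complete, which corresponds to the hang, while
shutdown_close_then_drain() completes.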

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 block.c            | 1 -
 softmmu/runstate.c | 9 +++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 3da99312db..9682c82fa8 100644
--- a/block.c
+++ b/block.c
@@ -4435,7 +4435,6 @@ static void bdrv_close(BlockDriverState *bs)
 void bdrv_close_all(void)
 {
     assert(job_next(NULL) == NULL);
-    blk_exp_close_all();
 
     /* Drop references from requests still in flight, such as canceled block
      * jobs whose AIO context has not been polled yet */
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 6177693a30..ac4b2e2540 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "audio/audio.h"
 #include "block/block.h"
+#include "block/export.h"
 #include "chardev/char.h"
 #include "crypto/cipher.h"
 #include "crypto/init.h"
@@ -783,6 +784,14 @@ void qemu_cleanup(void)
      */
     migration_shutdown();
 
+    /*
+     * Close the exports before draining the block layer. The export
+     * drivers may have coroutines yielding on it, so we need to clean
+     * them up before the drain, as otherwise they may get stuck in
+     * blk_wait_while_drained().
+     */
+    blk_exp_close_all();
+
     /*
      * We must cancel all block jobs while the block layer is drained,
      * or cancelling will be affected by throttling and thus may block
-- 
2.26.2




* Re: [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  2021-01-21 17:06 ` [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore() Sergio Lopez
@ 2021-01-21 17:31   ` Eric Blake
  2021-02-01 12:06   ` Kevin Wolf
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Blake @ 2021-01-21 17:31 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On 1/21/21 11:06 AM, Sergio Lopez wrote:
> Some graphs may contain an indirect reference to the first BDS in the
> chain that can be reached while walking it bottom->up from one its

one of its

> children.
> 
> Doubling-processing of a BDS is especially problematic for the

Double-processing

> aio_notifiers, as they might attempt to work on both the old and the
> new AIO contexts.
> 
> To avoid this problem, add every child and parent to the ignore list
> before actually processing them.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  block.c | 34 +++++++++++++++++++++++++++-------
>  1 file changed, 27 insertions(+), 7 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()
  2021-01-21 17:07 ` [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup() Sergio Lopez
@ 2021-01-21 18:02   ` Eric Blake
  2021-02-01 12:20   ` Kevin Wolf
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Blake @ 2021-01-21 18:02 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

On 1/21/21 11:07 AM, Sergio Lopez wrote:
> Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
> bdrv_drain_all_begin().
> 
> Export drivers may have coroutines yielding at some point in the block
> layer, so we need to shut them down before draining the block layer,
> as otherwise they may get stuck blk_wait_while_drained().

stuck in

> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  block.c            | 1 -
>  softmmu/runstate.c | 9 +++++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 

> @@ -783,6 +784,14 @@ void qemu_cleanup(void)
>       */
>      migration_shutdown();
>  
> +    /*
> +     * Close the exports before draining the block layer. The export
> +     * drivers may have coroutines yielding on it, so we need to clean
> +     * them up before the drain, as otherwise they may be get stuck in

s/be //

> +     * blk_wait_while_drained().
> +     */
> +    blk_exp_close_all();
> +
>      /*
>       * We must cancel all block jobs while the block layer is drained,
>       * or cancelling will be affected by throttling and thus may block
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  2021-01-21 17:06 ` [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore() Sergio Lopez
  2021-01-21 17:31   ` Eric Blake
@ 2021-02-01 12:06   ` Kevin Wolf
  2021-02-01 12:34     ` Sergio Lopez
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2021-02-01 12:06 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, qemu-block, Max Reitz

On 21.01.2021 at 18:06, Sergio Lopez wrote:
> Some graphs may contain an indirect reference to the first BDS in the
> chain that can be reached while walking it bottom->up from one its
> children.
> 
> Doubling-processing of a BDS is especially problematic for the
> aio_notifiers, as they might attempt to work on both the old and the
> new AIO contexts.
> 
> To avoid this problem, add every child and parent to the ignore list
> before actually processing them.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  block.c | 34 +++++++++++++++++++++++++++-------
>  1 file changed, 27 insertions(+), 7 deletions(-)

The patch looks correct to me, I'm just wondering about one thing:

> diff --git a/block.c b/block.c
> index 8b9d457546..3da99312db 100644
> --- a/block.c
> +++ b/block.c
> @@ -6414,7 +6414,10 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
>                                   AioContext *new_context, GSList **ignore)
>  {
>      AioContext *old_context = bdrv_get_aio_context(bs);
> -    BdrvChild *child;
> +    GSList *children_to_process = NULL;
> +    GSList *parents_to_process = NULL;

Why do we need these separate lists? Can't we just iterate over
bs->parents/children a second time? I don't think the graph can change
between the first and the second loop (and if it could, the result would
be broken anyway).

Kevin




* Re: [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()
  2021-01-21 17:07 ` [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup() Sergio Lopez
  2021-01-21 18:02   ` Eric Blake
@ 2021-02-01 12:20   ` Kevin Wolf
  2021-02-01 12:35     ` Sergio Lopez
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2021-02-01 12:20 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, qemu-block, Max Reitz

On 21.01.2021 at 18:07, Sergio Lopez wrote:
> Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
> bdrv_drain_all_begin().
> 
> Export drivers may have coroutines yielding at some point in the block
> layer, so we need to shut them down before draining the block layer,
> as otherwise they may get stuck blk_wait_while_drained().
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
> Signed-off-by: Sergio Lopez <slp@redhat.com>

This patch loses the call in qemu-nbd and qemu-storage-daemon.

Kevin




* Re: [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  2021-02-01 12:06   ` Kevin Wolf
@ 2021-02-01 12:34     ` Sergio Lopez
  0 siblings, 0 replies; 9+ messages in thread
From: Sergio Lopez @ 2021-02-01 12:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz

On Mon, Feb 01, 2021 at 01:06:31PM +0100, Kevin Wolf wrote:
> On 21.01.2021 at 18:06, Sergio Lopez wrote:
> > Some graphs may contain an indirect reference to the first BDS in the
> > chain that can be reached while walking it bottom->up from one its
> > children.
> > 
> > Doubling-processing of a BDS is especially problematic for the
> > aio_notifiers, as they might attempt to work on both the old and the
> > new AIO contexts.
> > 
> > To avoid this problem, add every child and parent to the ignore list
> > before actually processing them.
> > 
> > Suggested-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > ---
> >  block.c | 34 +++++++++++++++++++++++++++-------
> >  1 file changed, 27 insertions(+), 7 deletions(-)
> 
> The patch looks correct to me, I'm just wondering about one thing:
> 
> > diff --git a/block.c b/block.c
> > index 8b9d457546..3da99312db 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -6414,7 +6414,10 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
> >                                   AioContext *new_context, GSList **ignore)
> >  {
> >      AioContext *old_context = bdrv_get_aio_context(bs);
> > -    BdrvChild *child;
> > +    GSList *children_to_process = NULL;
> > +    GSList *parents_to_process = NULL;
> 
> Why do we need these separate lists? Can't we just iterate over
> bs->parents/children a second time? I don't think the graph can change
> between the first and the second loop (and if it could, the result would
> be broken anyway).

It's not strictly needed, but this makes the code more readable by
making our intentions clearer. To my eyes, at least.

Sergio.


* Re: [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()
  2021-02-01 12:20   ` Kevin Wolf
@ 2021-02-01 12:35     ` Sergio Lopez
  0 siblings, 0 replies; 9+ messages in thread
From: Sergio Lopez @ 2021-02-01 12:35 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz

On Mon, Feb 01, 2021 at 01:20:30PM +0100, Kevin Wolf wrote:
> On 21.01.2021 at 18:07, Sergio Lopez wrote:
> > Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
> > bdrv_drain_all_begin().
> > 
> > Export drivers may have coroutines yielding at some point in the block
> > layer, so we need to shut them down before draining the block layer,
> > as otherwise they may get stuck blk_wait_while_drained().
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> 
> This patch loses the call in qemu-nbd and qemu-storage-daemon.

You're right, I'll prepare a v4 right away.

Thanks,
Sergio.
