* [PATCH v2 0/2] nbd/server: Quiesce server on drained section
@ 2021-06-02  6:05 Sergio Lopez
  2021-06-02  6:05 ` [PATCH v2 1/2] block-backend: add drained_poll Sergio Lopez
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sergio Lopez @ 2021-06-02  6:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
	qemu-block, Max Reitz, Nir Soffer

Before switching between AioContexts we need to make sure that we're
fully quiesced ("nb_requests == 0" for every client) when entering the
drained section. Otherwise, coroutines may be run in the wrong context
after the switch, leading to a number of critical issues.

To accomplish this, we add ".drained_poll" to BlockDevOps and use it
in the NBD server, along with ".drained_begin" and ".drained_end", to
coordinate the quiescing of the server while entering a drained
section.
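
As an illustration of the intended handshake, here is a minimal, self-contained toy model (plain C, compilable on its own; the ToyServer type and helper names are made up for this sketch and are not QEMU code): .drained_begin stops new work, .drained_poll reports whether requests are still in flight so the caller can keep polling, and .drained_end resumes request processing.

/* toy_drain.c - hypothetical model of the .drained_begin / .drained_poll /
 * .drained_end handshake this series adds; not QEMU code. */
#include <stdbool.h>
#include <stdio.h>

typedef struct ToyServer {
    int  nb_requests;   /* in-flight requests, like NBDClient->nb_requests */
    bool quiescing;     /* true while a drained section is active */
} ToyServer;

static void drained_begin(ToyServer *s)
{
    s->quiescing = true;          /* stop spawning new request coroutines */
}

static bool drained_poll(ToyServer *s)
{
    return s->nb_requests > 0;    /* true means "still busy, keep waiting" */
}

static void drained_end(ToyServer *s)
{
    s->quiescing = false;         /* resume accepting requests */
}

int main(void)
{
    ToyServer s = { .nb_requests = 2, .quiescing = false };

    drained_begin(&s);
    while (drained_poll(&s)) {    /* stands in for the block layer's
                                   * AIO_WAIT_WHILE() polling loop */
        s.nb_requests--;          /* pending requests finish one by one
                                   * (nbd_request_put() in the real code) */
    }
    /* Safe point: nb_requests == 0, so an AioContext switch could happen. */
    drained_end(&s);

    printf("quiesced and resumed, nb_requests=%d\n", s.nb_requests);
    return 0;
}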

v2:
 - Use a bool for the value returned by .drained_poll [Kevin]
 - Change .drained_poll comment to reflect that the returned boolean
   value will be true if the device is still busy, or false otherwise
 - Drop yield_co_list and use recv_coroutine and read_yielding [Kevin]
 - Return "true" or "false" in nbd_drained_poll [Kevin]
 - Fix grammar in the commit message of patch 2 [Eric]

Sergio Lopez (2):
  block-backend: add drained_poll
  nbd/server: Use drained block ops to quiesce the server

 block/block-backend.c          |  7 ++-
 include/sysemu/block-backend.h |  4 ++
 nbd/server.c                   | 82 +++++++++++++++++++++++++---------
 3 files changed, 71 insertions(+), 22 deletions(-)

-- 
2.26.2





* [PATCH v2 1/2] block-backend: add drained_poll
  2021-06-02  6:05 [PATCH v2 0/2] nbd/server: Quiesce server on drained section Sergio Lopez
@ 2021-06-02  6:05 ` Sergio Lopez
  2021-06-02  8:34   ` Vladimir Sementsov-Ogievskiy
  2021-06-02  6:05 ` [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
  2021-06-02 11:57 ` [PATCH v2 0/2] nbd/server: Quiesce server on drained section Kevin Wolf
  2 siblings, 1 reply; 8+ messages in thread
From: Sergio Lopez @ 2021-06-02  6:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
	qemu-block, Max Reitz, Nir Soffer

Allow block backends to poll their devices/users to check if they have
been quiesced when entering a drained section.

This will be used in the next patch to wait for the NBD server to be
completely quiesced.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 block/block-backend.c          | 7 ++++++-
 include/sysemu/block-backend.h | 4 ++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index de5496af66..8fcc2b4b3d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2393,8 +2393,13 @@ static void blk_root_drained_begin(BdrvChild *child)
 static bool blk_root_drained_poll(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
+    bool busy = false;
     assert(blk->quiesce_counter);
-    return !!blk->in_flight;
+
+    if (blk->dev_ops && blk->dev_ops->drained_poll) {
+        busy = blk->dev_ops->drained_poll(blk->dev_opaque);
+    }
+    return busy || !!blk->in_flight;
 }
 
 static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 880e903293..5423e3d9c6 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -66,6 +66,10 @@ typedef struct BlockDevOps {
      * Runs when the backend's last drain request ends.
      */
     void (*drained_end)(void *opaque);
+    /*
+     * Is the device still busy?
+     */
+    bool (*drained_poll)(void *opaque);
 } BlockDevOps;
 
 /* This struct is embedded in (the private) BlockBackend struct and contains
-- 
2.26.2




* [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server
  2021-06-02  6:05 [PATCH v2 0/2] nbd/server: Quiesce server on drained section Sergio Lopez
  2021-06-02  6:05 ` [PATCH v2 1/2] block-backend: add drained_poll Sergio Lopez
@ 2021-06-02  6:05 ` Sergio Lopez
  2021-06-02 12:06   ` Vladimir Sementsov-Ogievskiy
  2021-06-02 11:57 ` [PATCH v2 0/2] nbd/server: Quiesce server on drained section Kevin Wolf
  2 siblings, 1 reply; 8+ messages in thread
From: Sergio Lopez @ 2021-06-02  6:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
	qemu-block, Max Reitz, Nir Soffer

Before switching between AioContexts we need to make sure that we're
fully quiesced ("nb_requests == 0" for every client) when entering the
drained section.

To do this, we set "quiescing = true" for every client on
".drained_begin" to prevent new coroutines from being created, and
check if "nb_requests == 0" on ".drained_poll". Finally, once we're
exiting the drained section, on ".drained_end" we set "quiescing =
false" and call "nbd_client_receive_next_request()" to resume the
processing of new requests.

With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
reverted to be as simple as they were before f148ae7d36.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 nbd/server.c | 82 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 61 insertions(+), 21 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 86a44a9b41..b60ebc3ab6 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1513,6 +1513,11 @@ static void nbd_request_put(NBDRequestData *req)
     g_free(req);
 
     client->nb_requests--;
+
+    if (client->quiescing && client->nb_requests == 0) {
+        aio_wait_kick();
+    }
+
     nbd_client_receive_next_request(client);
 
     nbd_client_put(client);
@@ -1530,49 +1535,68 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
     QTAILQ_FOREACH(client, &exp->clients, next) {
         qio_channel_attach_aio_context(client->ioc, ctx);
 
+        assert(client->nb_requests == 0);
         assert(client->recv_coroutine == NULL);
         assert(client->send_coroutine == NULL);
-
-        if (client->quiescing) {
-            client->quiescing = false;
-            nbd_client_receive_next_request(client);
-        }
     }
 }
 
-static void nbd_aio_detach_bh(void *opaque)
+static void blk_aio_detach(void *opaque)
 {
     NBDExport *exp = opaque;
     NBDClient *client;
 
+    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
+
     QTAILQ_FOREACH(client, &exp->clients, next) {
         qio_channel_detach_aio_context(client->ioc);
+    }
+
+    exp->common.ctx = NULL;
+}
+
+static void nbd_drained_begin(void *opaque)
+{
+    NBDExport *exp = opaque;
+    NBDClient *client;
+
+    QTAILQ_FOREACH(client, &exp->clients, next) {
         client->quiescing = true;
+    }
+}
 
-        if (client->recv_coroutine) {
-            if (client->read_yielding) {
-                qemu_aio_coroutine_enter(exp->common.ctx,
-                                         client->recv_coroutine);
-            } else {
-                AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
-            }
-        }
+static void nbd_drained_end(void *opaque)
+{
+    NBDExport *exp = opaque;
+    NBDClient *client;
 
-        if (client->send_coroutine) {
-            AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
-        }
+    QTAILQ_FOREACH(client, &exp->clients, next) {
+        client->quiescing = false;
+        nbd_client_receive_next_request(client);
     }
 }
 
-static void blk_aio_detach(void *opaque)
+static bool nbd_drained_poll(void *opaque)
 {
     NBDExport *exp = opaque;
+    NBDClient *client;
 
-    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
+    QTAILQ_FOREACH(client, &exp->clients, next) {
+        if (client->nb_requests != 0) {
+            /*
+             * If there's a coroutine waiting for a request on nbd_read_eof()
+             * enter it here so we don't depend on the client to wake it up.
+             */
+            if (client->recv_coroutine != NULL && client->read_yielding) {
+                qemu_aio_coroutine_enter(exp->common.ctx,
+                                         client->recv_coroutine);
+            }
 
-    aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
+            return true;
+        }
+    }
 
-    exp->common.ctx = NULL;
+    return false;
 }
 
 static void nbd_eject_notifier(Notifier *n, void *data)
@@ -1594,6 +1618,12 @@ void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
     blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
 }
 
+static const BlockDevOps nbd_block_ops = {
+    .drained_begin = nbd_drained_begin,
+    .drained_end = nbd_drained_end,
+    .drained_poll = nbd_drained_poll,
+};
+
 static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
                              Error **errp)
 {
@@ -1715,8 +1745,17 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
 
     exp->allocation_depth = arg->allocation_depth;
 
+    /*
+     * We need to inhibit request queuing in the block layer to ensure we can
+     * be properly quiesced when entering a drained section, as our coroutines
+     * servicing pending requests might enter blk_pread().
+     */
+    blk_set_disable_request_queuing(blk, true);
+
     blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
 
+    blk_set_dev_ops(blk, &nbd_block_ops, exp);
+
     QTAILQ_INSERT_TAIL(&exports, exp, next);
 
     return 0;
@@ -1788,6 +1827,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
         }
         blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
                                         blk_aio_detach, exp);
+        blk_set_disable_request_queuing(exp->common.blk, false);
     }
 
     for (i = 0; i < exp->nr_export_bitmaps; i++) {
-- 
2.26.2




* Re: [PATCH v2 1/2] block-backend: add drained_poll
  2021-06-02  6:05 ` [PATCH v2 1/2] block-backend: add drained_poll Sergio Lopez
@ 2021-06-02  8:34   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-02  8:34 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: Max Reitz, qemu-block, Nir Soffer, Kevin Wolf, Eric Blake

02.06.2021 09:05, Sergio Lopez wrote:
> Allow block backends to poll their devices/users to check if they have
> been quiesced when entering a drained section.
> 
> This will be used in the next patch to wait for the NBD server to be
> completely quiesced.
> 
> Suggested-by: Kevin Wolf<kwolf@redhat.com>
> Reviewed-by: Kevin Wolf<kwolf@redhat.com>
> Reviewed-by: Eric Blake<eblake@redhat.com>
> Signed-off-by: Sergio Lopez<slp@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v2 0/2] nbd/server: Quiesce server on drained section
  2021-06-02  6:05 [PATCH v2 0/2] nbd/server: Quiesce server on drained section Sergio Lopez
  2021-06-02  6:05 ` [PATCH v2 1/2] block-backend: add drained_poll Sergio Lopez
  2021-06-02  6:05 ` [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
@ 2021-06-02 11:57 ` Kevin Wolf
  2 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2021-06-02 11:57 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
	Nir Soffer

On 02.06.2021 at 08:05, Sergio Lopez wrote:
> Before switching between AioContexts we need to make sure that we're
> fully quiesced ("nb_requests == 0" for every client) when entering the
> drained section. Otherwise, coroutines may be run in the wrong context
> after the switch, leading to a number of critical issues.
> 
> To accomplish this, we add ".drained_poll" to BlockDevOps and use it
> in the NBD server, along with ".drained_begin" and ".drained_end", to
> coordinate the quiescing of the server while entering a drained
> section.
> 
> v2:
>  - Use a bool for the value returned by .drained_poll [Kevin]
>  - Change .drained_poll comment to reflect that the returned boolean
>    value will be true if the device is still busy, or false otherwise
>  - Drop yield_co_list and use recv_coroutine and read_yielding [Kevin]
>  - Return "true" or "false" in nbd_drained_poll [Kevin]
>  - Fix grammar in the commit message of patch 2 [Eric]

Thanks, applied to the block branch.

Kevin




* Re: [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server
  2021-06-02  6:05 ` [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
@ 2021-06-02 12:06   ` Vladimir Sementsov-Ogievskiy
  2021-06-02 12:44     ` Sergio Lopez
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-02 12:06 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: Max Reitz, qemu-block, Nir Soffer, Kevin Wolf, Eric Blake

02.06.2021 09:05, Sergio Lopez wrote:
> Before switching between AioContexts we need to make sure that we're
> fully quiesced ("nb_requests == 0" for every client) when entering the
> drained section.
> 
> To do this, we set "quiescing = true" for every client on
> ".drained_begin" to prevent new coroutines from being created, and
> check if "nb_requests == 0" on ".drained_poll". Finally, once we're
> exiting the drained section, on ".drained_end" we set "quiescing =
> false" and call "nbd_client_receive_next_request()" to resume the
> processing of new requests.
> 
> With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> reverted to be as simple as they were before f148ae7d36.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>   nbd/server.c | 82 ++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 61 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 86a44a9b41..b60ebc3ab6 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1513,6 +1513,11 @@ static void nbd_request_put(NBDRequestData *req)
>       g_free(req);
>   
>       client->nb_requests--;
> +
> +    if (client->quiescing && client->nb_requests == 0) {
> +        aio_wait_kick();
> +    }
> +
>       nbd_client_receive_next_request(client);
>   
>       nbd_client_put(client);
> @@ -1530,49 +1535,68 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
>       QTAILQ_FOREACH(client, &exp->clients, next) {
>           qio_channel_attach_aio_context(client->ioc, ctx);
>   
> +        assert(client->nb_requests == 0);
>           assert(client->recv_coroutine == NULL);
>           assert(client->send_coroutine == NULL);
> -
> -        if (client->quiescing) {
> -            client->quiescing = false;
> -            nbd_client_receive_next_request(client);
> -        }
>       }
>   }
>   
> -static void nbd_aio_detach_bh(void *opaque)
> +static void blk_aio_detach(void *opaque)
>   {
>       NBDExport *exp = opaque;
>       NBDClient *client;
>   
> +    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> +
>       QTAILQ_FOREACH(client, &exp->clients, next) {
>           qio_channel_detach_aio_context(client->ioc);
> +    }
> +
> +    exp->common.ctx = NULL;
> +}
> +
> +static void nbd_drained_begin(void *opaque)
> +{
> +    NBDExport *exp = opaque;
> +    NBDClient *client;
> +
> +    QTAILQ_FOREACH(client, &exp->clients, next) {
>           client->quiescing = true;
> +    }
> +}
>   
> -        if (client->recv_coroutine) {
> -            if (client->read_yielding) {
> -                qemu_aio_coroutine_enter(exp->common.ctx,
> -                                         client->recv_coroutine);
> -            } else {
> -                AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
> -            }
> -        }
> +static void nbd_drained_end(void *opaque)
> +{
> +    NBDExport *exp = opaque;
> +    NBDClient *client;
>   
> -        if (client->send_coroutine) {
> -            AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
> -        }
> +    QTAILQ_FOREACH(client, &exp->clients, next) {
> +        client->quiescing = false;
> +        nbd_client_receive_next_request(client);
>       }
>   }
>   
> -static void blk_aio_detach(void *opaque)
> +static bool nbd_drained_poll(void *opaque)
>   {
>       NBDExport *exp = opaque;
> +    NBDClient *client;
>   
> -    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> +    QTAILQ_FOREACH(client, &exp->clients, next) {
> +        if (client->nb_requests != 0) {
> +            /*
> +             * If there's a coroutine waiting for a request on nbd_read_eof()
> +             * enter it here so we don't depend on the client to wake it up.
> +             */
> +            if (client->recv_coroutine != NULL && client->read_yielding) {
> +                qemu_aio_coroutine_enter(exp->common.ctx,
> +                                         client->recv_coroutine);
> +            }
>   
> -    aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
> +            return true;
> +        }
> +    }
>   
> -    exp->common.ctx = NULL;
> +    return false;
>   }
>   
>   static void nbd_eject_notifier(Notifier *n, void *data)
> @@ -1594,6 +1618,12 @@ void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
>       blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
>   }
>   
> +static const BlockDevOps nbd_block_ops = {
> +    .drained_begin = nbd_drained_begin,
> +    .drained_end = nbd_drained_end,
> +    .drained_poll = nbd_drained_poll,
> +};
> +
>   static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
>                                Error **errp)
>   {
> @@ -1715,8 +1745,17 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
>   
>       exp->allocation_depth = arg->allocation_depth;
>   
> +    /*
> +     * We need to inhibit request queuing in the block layer to ensure we can
> +     * be properly quiesced when entering a drained section, as our coroutines
> +     * servicing pending requests might enter blk_pread().
> +     */

This isn't very clear to me :(. What's the harm in queuing requests at the blk layer during a drained section?

> +    blk_set_disable_request_queuing(blk, true);
> +
>       blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
>   
> +    blk_set_dev_ops(blk, &nbd_block_ops, exp);
> +
>       QTAILQ_INSERT_TAIL(&exports, exp, next);
>   
>       return 0;
> @@ -1788,6 +1827,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
>           }
>           blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
>                                           blk_aio_detach, exp);
> +        blk_set_disable_request_queuing(exp->common.blk, false);
>       }
>   
>       for (i = 0; i < exp->nr_export_bitmaps; i++) {
> 

I don't follow the whole logic of client quiescing, but overall it looks good to me. As I understand it, prior to this patch we had a kind of .drained_begin implementation in the blk_aio_detach handler, and after the patch we instead use the generic drained-section code with the help of the appropriate devops handlers.

weak:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server
  2021-06-02 12:06   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-02 12:44     ` Sergio Lopez
  2021-06-02 12:48       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 8+ messages in thread
From: Sergio Lopez @ 2021-06-02 12:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, Nir Soffer


On Wed, Jun 02, 2021 at 03:06:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 02.06.2021 09:05, Sergio Lopez wrote:
> > Before switching between AioContexts we need to make sure that we're
> > fully quiesced ("nb_requests == 0" for every client) when entering the
> > drained section.
> > 
> > To do this, we set "quiescing = true" for every client on
> > ".drained_begin" to prevent new coroutines from being created, and
> > check if "nb_requests == 0" on ".drained_poll". Finally, once we're
> > exiting the drained section, on ".drained_end" we set "quiescing =
> > false" and call "nbd_client_receive_next_request()" to resume the
> > processing of new requests.
> > 
> > With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> > reverted to be as simple as they were before f148ae7d36.
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
> > Suggested-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > ---
> >   nbd/server.c | 82 ++++++++++++++++++++++++++++++++++++++--------------
> >   1 file changed, 61 insertions(+), 21 deletions(-)
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 86a44a9b41..b60ebc3ab6 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -1513,6 +1513,11 @@ static void nbd_request_put(NBDRequestData *req)
> >       g_free(req);
> >       client->nb_requests--;
> > +
> > +    if (client->quiescing && client->nb_requests == 0) {
> > +        aio_wait_kick();
> > +    }
> > +
> >       nbd_client_receive_next_request(client);
> >       nbd_client_put(client);
> > @@ -1530,49 +1535,68 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
> >       QTAILQ_FOREACH(client, &exp->clients, next) {
> >           qio_channel_attach_aio_context(client->ioc, ctx);
> > +        assert(client->nb_requests == 0);
> >           assert(client->recv_coroutine == NULL);
> >           assert(client->send_coroutine == NULL);
> > -
> > -        if (client->quiescing) {
> > -            client->quiescing = false;
> > -            nbd_client_receive_next_request(client);
> > -        }
> >       }
> >   }
> > -static void nbd_aio_detach_bh(void *opaque)
> > +static void blk_aio_detach(void *opaque)
> >   {
> >       NBDExport *exp = opaque;
> >       NBDClient *client;
> > +    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> > +
> >       QTAILQ_FOREACH(client, &exp->clients, next) {
> >           qio_channel_detach_aio_context(client->ioc);
> > +    }
> > +
> > +    exp->common.ctx = NULL;
> > +}
> > +
> > +static void nbd_drained_begin(void *opaque)
> > +{
> > +    NBDExport *exp = opaque;
> > +    NBDClient *client;
> > +
> > +    QTAILQ_FOREACH(client, &exp->clients, next) {
> >           client->quiescing = true;
> > +    }
> > +}
> > -        if (client->recv_coroutine) {
> > -            if (client->read_yielding) {
> > -                qemu_aio_coroutine_enter(exp->common.ctx,
> > -                                         client->recv_coroutine);
> > -            } else {
> > -                AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
> > -            }
> > -        }
> > +static void nbd_drained_end(void *opaque)
> > +{
> > +    NBDExport *exp = opaque;
> > +    NBDClient *client;
> > -        if (client->send_coroutine) {
> > -            AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
> > -        }
> > +    QTAILQ_FOREACH(client, &exp->clients, next) {
> > +        client->quiescing = false;
> > +        nbd_client_receive_next_request(client);
> >       }
> >   }
> > -static void blk_aio_detach(void *opaque)
> > +static bool nbd_drained_poll(void *opaque)
> >   {
> >       NBDExport *exp = opaque;
> > +    NBDClient *client;
> > -    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> > +    QTAILQ_FOREACH(client, &exp->clients, next) {
> > +        if (client->nb_requests != 0) {
> > +            /*
> > +             * If there's a coroutine waiting for a request on nbd_read_eof()
> > +             * enter it here so we don't depend on the client to wake it up.
> > +             */
> > +            if (client->recv_coroutine != NULL && client->read_yielding) {
> > +                qemu_aio_coroutine_enter(exp->common.ctx,
> > +                                         client->recv_coroutine);
> > +            }
> > -    aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
> > +            return true;
> > +        }
> > +    }
> > -    exp->common.ctx = NULL;
> > +    return false;
> >   }
> >   static void nbd_eject_notifier(Notifier *n, void *data)
> > @@ -1594,6 +1618,12 @@ void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
> >       blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
> >   }
> > +static const BlockDevOps nbd_block_ops = {
> > +    .drained_begin = nbd_drained_begin,
> > +    .drained_end = nbd_drained_end,
> > +    .drained_poll = nbd_drained_poll,
> > +};
> > +
> >   static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
> >                                Error **errp)
> >   {
> > @@ -1715,8 +1745,17 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
> >       exp->allocation_depth = arg->allocation_depth;
> > +    /*
> > +     * We need to inhibit request queuing in the block layer to ensure we can
> > +     * be properly quiesced when entering a drained section, as our coroutines
> > +     * servicing pending requests might enter blk_pread().
> > +     */
> 
> This isn't very clear to me :(. What's the harm in queuing requests at the blk layer during a drained section?

We need to make sure that all coroutines in the NBD server have
finished (client->nb_requests == 0) before detaching from the
AioContext. If we don't inhibit request queuing, some coroutines may
get stuck in blk_pread()->...->blk_wait_while_drained(), causing
nbd_drained_poll() to always return that we're busy.
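
Roughly, as a self-contained illustration (toy C with made-up names, not the real call chain):

/* toy_queuing.c - hypothetical sketch (not QEMU code) of why request
 * queuing must be disabled for the export's BlockBackend. */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    int  nb_requests = 1;         /* one NBD request still being served */
    bool queuing_disabled = true; /* models blk_set_disable_request_queuing() */
    int  polls = 0;

    /*
     * While drained, the coroutine serving the request calls blk_pread().
     * With queuing enabled it would park in blk_wait_while_drained()
     * instead of completing, so nb_requests would never reach 0 and the
     * drained_poll() loop below would spin forever.
     */
    while (nb_requests > 0 && polls < 10) {
        if (queuing_disabled) {
            nb_requests--;        /* the request completes, drain can finish */
        }
        polls++;
    }

    if (nb_requests == 0) {
        printf("quiesced after %d poll(s)\n", polls);
    } else {
        printf("stuck: drained_poll() keeps reporting busy\n");
    }
    return 0;
}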

> > +    blk_set_disable_request_queuing(blk, true);
> > +
> >       blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
> > +    blk_set_dev_ops(blk, &nbd_block_ops, exp);
> > +
> >       QTAILQ_INSERT_TAIL(&exports, exp, next);
> >       return 0;
> > @@ -1788,6 +1827,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
> >           }
> >           blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
> >                                           blk_aio_detach, exp);
> > +        blk_set_disable_request_queuing(exp->common.blk, false);
> >       }
> >       for (i = 0; i < exp->nr_export_bitmaps; i++) {
> > 
> 
> I don't follow the whole logic of client quiescing, but overall it looks good to me. As I understand it, prior to this patch we had a kind of .drained_begin implementation in the blk_aio_detach handler, and after the patch we instead use the generic drained-section code with the help of the appropriate devops handlers.

Doing that in blk_aio_detach() was a bit too late since we were
already in the drained section, and as stated above, we need it alive
to ensure that all of the NBD server coroutines finish properly.

Also, .drained_poll allows us to align quiescing the NBD server with
the block layer.

Thanks,
Sergio.

> weak:
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> 
> -- 
> Best regards,
> Vladimir
> 



* Re: [PATCH v2 2/2] nbd/server: Use drained block ops to quiesce the server
  2021-06-02 12:44     ` Sergio Lopez
@ 2021-06-02 12:48       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-02 12:48 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, Max Reitz, qemu-block, Nir Soffer, Kevin Wolf, Eric Blake

02.06.2021 15:44, Sergio Lopez wrote:
> On Wed, Jun 02, 2021 at 03:06:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> 02.06.2021 09:05, Sergio Lopez wrote:
>>> Before switching between AioContexts we need to make sure that we're
>>> fully quiesced ("nb_requests == 0" for every client) when entering the
>>> drained section.
>>>
>>> To do this, we set "quiescing = true" for every client on
>>> ".drained_begin" to prevent new coroutines from being created, and
>>> check if "nb_requests == 0" on ".drained_poll". Finally, once we're
>>> exiting the drained section, on ".drained_end" we set "quiescing =
>>> false" and call "nbd_client_receive_next_request()" to resume the
>>> processing of new requests.
>>>
>>> With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
>>> reverted to be as simple as they were before f148ae7d36.
>>>
>>> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
>>> Suggested-by: Kevin Wolf <kwolf@redhat.com>
>>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>>> ---
>>>    nbd/server.c | 82 ++++++++++++++++++++++++++++++++++++++--------------
>>>    1 file changed, 61 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/nbd/server.c b/nbd/server.c
>>> index 86a44a9b41..b60ebc3ab6 100644
>>> --- a/nbd/server.c
>>> +++ b/nbd/server.c
>>> @@ -1513,6 +1513,11 @@ static void nbd_request_put(NBDRequestData *req)
>>>        g_free(req);
>>>        client->nb_requests--;
>>> +
>>> +    if (client->quiescing && client->nb_requests == 0) {
>>> +        aio_wait_kick();
>>> +    }
>>> +
>>>        nbd_client_receive_next_request(client);
>>>        nbd_client_put(client);
>>> @@ -1530,49 +1535,68 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
>>>        QTAILQ_FOREACH(client, &exp->clients, next) {
>>>            qio_channel_attach_aio_context(client->ioc, ctx);
>>> +        assert(client->nb_requests == 0);
>>>            assert(client->recv_coroutine == NULL);
>>>            assert(client->send_coroutine == NULL);
>>> -
>>> -        if (client->quiescing) {
>>> -            client->quiescing = false;
>>> -            nbd_client_receive_next_request(client);
>>> -        }
>>>        }
>>>    }
>>> -static void nbd_aio_detach_bh(void *opaque)
>>> +static void blk_aio_detach(void *opaque)
>>>    {
>>>        NBDExport *exp = opaque;
>>>        NBDClient *client;
>>> +    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
>>> +
>>>        QTAILQ_FOREACH(client, &exp->clients, next) {
>>>            qio_channel_detach_aio_context(client->ioc);
>>> +    }
>>> +
>>> +    exp->common.ctx = NULL;
>>> +}
>>> +
>>> +static void nbd_drained_begin(void *opaque)
>>> +{
>>> +    NBDExport *exp = opaque;
>>> +    NBDClient *client;
>>> +
>>> +    QTAILQ_FOREACH(client, &exp->clients, next) {
>>>            client->quiescing = true;
>>> +    }
>>> +}
>>> -        if (client->recv_coroutine) {
>>> -            if (client->read_yielding) {
>>> -                qemu_aio_coroutine_enter(exp->common.ctx,
>>> -                                         client->recv_coroutine);
>>> -            } else {
>>> -                AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
>>> -            }
>>> -        }
>>> +static void nbd_drained_end(void *opaque)
>>> +{
>>> +    NBDExport *exp = opaque;
>>> +    NBDClient *client;
>>> -        if (client->send_coroutine) {
>>> -            AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
>>> -        }
>>> +    QTAILQ_FOREACH(client, &exp->clients, next) {
>>> +        client->quiescing = false;
>>> +        nbd_client_receive_next_request(client);
>>>        }
>>>    }
>>> -static void blk_aio_detach(void *opaque)
>>> +static bool nbd_drained_poll(void *opaque)
>>>    {
>>>        NBDExport *exp = opaque;
>>> +    NBDClient *client;
>>> -    trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
>>> +    QTAILQ_FOREACH(client, &exp->clients, next) {
>>> +        if (client->nb_requests != 0) {
>>> +            /*
>>> +             * If there's a coroutine waiting for a request on nbd_read_eof()
>>> +             * enter it here so we don't depend on the client to wake it up.
>>> +             */
>>> +            if (client->recv_coroutine != NULL && client->read_yielding) {
>>> +                qemu_aio_coroutine_enter(exp->common.ctx,
>>> +                                         client->recv_coroutine);
>>> +            }
>>> -    aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
>>> +            return true;
>>> +        }
>>> +    }
>>> -    exp->common.ctx = NULL;
>>> +    return false;
>>>    }
>>>    static void nbd_eject_notifier(Notifier *n, void *data)
>>> @@ -1594,6 +1618,12 @@ void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
>>>        blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
>>>    }
>>> +static const BlockDevOps nbd_block_ops = {
>>> +    .drained_begin = nbd_drained_begin,
>>> +    .drained_end = nbd_drained_end,
>>> +    .drained_poll = nbd_drained_poll,
>>> +};
>>> +
>>>    static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
>>>                                 Error **errp)
>>>    {
>>> @@ -1715,8 +1745,17 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
>>>        exp->allocation_depth = arg->allocation_depth;
>>> +    /*
>>> +     * We need to inhibit request queuing in the block layer to ensure we can
>>> +     * be properly quiesced when entering a drained section, as our coroutines
>>> +     * servicing pending requests might enter blk_pread().
>>> +     */
>>
>> This isn't very clear to me :(. What's the harm in queuing requests at the blk layer during a drained section?
> 
> We need to make sure that all coroutines in the NBD server have
> finished (client->nb_requests == 0) before detaching from the
> AioContext. If we don't inhibit request queuing, some coroutines may
> get stuck in blk_pread()->...->blk_wait_while_drained(), causing
> nbd_drained_poll() to always return that we're busy.

Ah, OK. Thanks for the explanation.

> 
>>> +    blk_set_disable_request_queuing(blk, true);
>>> +
>>>        blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
>>> +    blk_set_dev_ops(blk, &nbd_block_ops, exp);
>>> +
>>>        QTAILQ_INSERT_TAIL(&exports, exp, next);
>>>        return 0;
>>> @@ -1788,6 +1827,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
>>>            }
>>>            blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
>>>                                            blk_aio_detach, exp);
>>> +        blk_set_disable_request_queuing(exp->common.blk, false);
>>>        }
>>>        for (i = 0; i < exp->nr_export_bitmaps; i++) {
>>>
>>
>> I don't follow the whole logic of client quiescing, but overall it looks good to me. As I understand it, prior to this patch we had a kind of .drained_begin implementation in the blk_aio_detach handler, and after the patch we instead use the generic drained-section code with the help of the appropriate devops handlers.
> 
> Doing that in blk_aio_detach() was a bit too late since we were
> already in the drained section, and as stated above, we need it alive
> to ensure that all of the NBD server coroutines finish properly.
> 
> Also, .drained_poll allows us to align quiescing the NBD server with
> the block layer.
> 
> Thanks,
> Sergio.
> 
>> weak:
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
>>
>> -- 
>> Best regards,
>> Vladimir
>>


-- 
Best regards,
Vladimir


