* [PATCH 0/2] nbd/server: Quiesce server on drained section
@ 2021-06-01 5:57 Sergio Lopez
2021-06-01 5:57 ` [PATCH 1/2] block-backend: add drained_poll Sergio Lopez
2021-06-01 5:57 ` [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
0 siblings, 2 replies; 11+ messages in thread
From: Sergio Lopez @ 2021-06-01 5:57 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
qemu-block, Max Reitz, Nir Soffer
Before switching between AioContexts we need to make sure that we're
fully quiesced ("nb_requests == 0" for every client) when entering the
drained section. Otherwise, coroutines may be run in the wrong context
after the switch, leading to a number of critical issues.
To accomplish this, we add ".drained_poll" to BlockDevOps and use it
in the NBD server, along with ".drained_begin" and ".drained_end", to
coordinate the quiescing of the server while entering a drained
section.
Sergio Lopez (2):
block-backend: add drained_poll
nbd/server: Use drained block ops to quiesce the server
block/block-backend.c | 7 ++-
include/sysemu/block-backend.h | 4 ++
nbd/server.c | 99 +++++++++++++++++++++++++---------
3 files changed, 85 insertions(+), 25 deletions(-)
--
2.26.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] block-backend: add drained_poll
2021-06-01 5:57 [PATCH 0/2] nbd/server: Quiesce server on drained section Sergio Lopez
@ 2021-06-01 5:57 ` Sergio Lopez
2021-06-01 15:59 ` Kevin Wolf
2021-06-01 5:57 ` [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
1 sibling, 1 reply; 11+ messages in thread
From: Sergio Lopez @ 2021-06-01 5:57 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
qemu-block, Max Reitz, Nir Soffer
Allow block backends to poll their devices/users to check if they have
been quiesced when entering a drained section.
This will be used in the next patch to wait for the NBD server to be
completely quiesced.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
block/block-backend.c | 7 ++++++-
include/sysemu/block-backend.h | 4 ++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index de5496af66..163ca05b97 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2393,8 +2393,13 @@ static void blk_root_drained_begin(BdrvChild *child)
static bool blk_root_drained_poll(BdrvChild *child)
{
BlockBackend *blk = child->opaque;
+ int ret = 0;
assert(blk->quiesce_counter);
- return !!blk->in_flight;
+
+ if (blk->dev_ops && blk->dev_ops->drained_poll) {
+ ret = blk->dev_ops->drained_poll(blk->dev_opaque);
+ }
+ return ret || !!blk->in_flight;
}
static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 880e903293..9992072e18 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -66,6 +66,10 @@ typedef struct BlockDevOps {
* Runs when the backend's last drain request ends.
*/
void (*drained_end)(void *opaque);
+ /*
+ * Is the device still busy, i.e. not yet fully quiesced?
+ */
+ bool (*drained_poll)(void *opaque);
} BlockDevOps;
/* This struct is embedded in (the private) BlockBackend struct and contains
--
2.26.2
* [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 5:57 [PATCH 0/2] nbd/server: Quiesce server on drained section Sergio Lopez
2021-06-01 5:57 ` [PATCH 1/2] block-backend: add drained_poll Sergio Lopez
@ 2021-06-01 5:57 ` Sergio Lopez
2021-06-01 16:08 ` Kevin Wolf
2021-06-01 21:29 ` Eric Blake
1 sibling, 2 replies; 11+ messages in thread
From: Sergio Lopez @ 2021-06-01 5:57 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Sergio Lopez,
qemu-block, Max Reitz, Nir Soffer
Before switching between AioContexts we need to make sure that we're
fully quiesced ("nb_requests == 0" for every client) when entering the
drained section.
To do this, we set "quiescing = true" for every client on
".drained_begin" to prevent new coroutines to be created, and check if
"nb_requests == 0" on ".drained_poll". Finally, once we're exiting the
drained section, on ".drained_end" we set "quiescing = false" and
call "nbd_client_receive_next_request()" to resume the processing of
new requests.
With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
reverted to be as simple as they were before f148ae7d36.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
nbd/server.c | 99 +++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 75 insertions(+), 24 deletions(-)
diff --git a/nbd/server.c b/nbd/server.c
index 86a44a9b41..33e55479d7 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -132,7 +132,7 @@ struct NBDClient {
CoMutex send_lock;
Coroutine *send_coroutine;
- bool read_yielding;
+ GSList *yield_co_list; /* List of coroutines yielding on nbd_read_eof */
bool quiescing;
QTAILQ_ENTRY(NBDClient) next;
@@ -1367,6 +1367,7 @@ static inline int coroutine_fn
nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
{
bool partial = false;
+ Coroutine *co;
assert(size);
while (size > 0) {
@@ -1375,9 +1376,12 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
len = qio_channel_readv(client->ioc, &iov, 1, errp);
if (len == QIO_CHANNEL_ERR_BLOCK) {
- client->read_yielding = true;
+ co = qemu_coroutine_self();
+
+ client->yield_co_list = g_slist_prepend(client->yield_co_list, co);
qio_channel_yield(client->ioc, G_IO_IN);
- client->read_yielding = false;
+ client->yield_co_list = g_slist_remove(client->yield_co_list, co);
+
if (client->quiescing) {
return -EAGAIN;
}
@@ -1513,6 +1517,11 @@ static void nbd_request_put(NBDRequestData *req)
g_free(req);
client->nb_requests--;
+
+ if (client->quiescing && client->nb_requests == 0) {
+ aio_wait_kick();
+ }
+
nbd_client_receive_next_request(client);
nbd_client_put(client);
@@ -1530,49 +1539,75 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
QTAILQ_FOREACH(client, &exp->clients, next) {
qio_channel_attach_aio_context(client->ioc, ctx);
+ assert(client->nb_requests == 0);
assert(client->recv_coroutine == NULL);
assert(client->send_coroutine == NULL);
-
- if (client->quiescing) {
- client->quiescing = false;
- nbd_client_receive_next_request(client);
- }
}
}
-static void nbd_aio_detach_bh(void *opaque)
+static void blk_aio_detach(void *opaque)
{
NBDExport *exp = opaque;
NBDClient *client;
+ trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
+
QTAILQ_FOREACH(client, &exp->clients, next) {
qio_channel_detach_aio_context(client->ioc);
+ }
+
+ exp->common.ctx = NULL;
+}
+
+static void nbd_drained_begin(void *opaque)
+{
+ NBDExport *exp = opaque;
+ NBDClient *client;
+
+ QTAILQ_FOREACH(client, &exp->clients, next) {
client->quiescing = true;
+ }
+}
- if (client->recv_coroutine) {
- if (client->read_yielding) {
- qemu_aio_coroutine_enter(exp->common.ctx,
- client->recv_coroutine);
- } else {
- AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
- }
- }
+static void nbd_drained_end(void *opaque)
+{
+ NBDExport *exp = opaque;
+ NBDClient *client;
- if (client->send_coroutine) {
- AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
- }
+ QTAILQ_FOREACH(client, &exp->clients, next) {
+ client->quiescing = false;
+ nbd_client_receive_next_request(client);
}
}
-static void blk_aio_detach(void *opaque)
+static bool nbd_drained_poll(void *opaque)
{
NBDExport *exp = opaque;
+ NBDClient *client;
+ Coroutine *co;
+ GSList *entry;
+ GSList *coroutine_list;
- trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
+ QTAILQ_FOREACH(client, &exp->clients, next) {
+ if (client->nb_requests != 0) {
+ /*
+ * Enter coroutines waiting for new requests on nbd_read_eof(), so
+ * we don't depend on the client to wake us up.
+ */
+ coroutine_list = g_slist_copy(client->yield_co_list);
+ for (entry = coroutine_list;
+ entry != NULL;
+ entry = g_slist_next(entry)) {
+ co = entry->data;
+ qemu_aio_coroutine_enter(exp->common.ctx, co);
+ }
+ g_slist_free(coroutine_list);
- aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
+ return 1;
+ }
+ }
- exp->common.ctx = NULL;
+ return 0;
}
static void nbd_eject_notifier(Notifier *n, void *data)
@@ -1594,6 +1629,12 @@ void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk)
blk_add_remove_bs_notifier(blk, &nbd_exp->eject_notifier);
}
+static const BlockDevOps nbd_block_ops = {
+ .drained_begin = nbd_drained_begin,
+ .drained_end = nbd_drained_end,
+ .drained_poll = nbd_drained_poll,
+};
+
static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
Error **errp)
{
@@ -1715,8 +1756,17 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
exp->allocation_depth = arg->allocation_depth;
+ /*
+ * We need to inhibit request queuing in the block layer to ensure we can
+ * be properly quiesced when entering a drained section, as our coroutines
+ * servicing pending requests might enter blk_pread().
+ */
+ blk_set_disable_request_queuing(blk, true);
+
blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
+ blk_set_dev_ops(blk, &nbd_block_ops, exp);
+
QTAILQ_INSERT_TAIL(&exports, exp, next);
return 0;
@@ -1788,6 +1838,7 @@ static void nbd_export_delete(BlockExport *blk_exp)
}
blk_remove_aio_context_notifier(exp->common.blk, blk_aio_attached,
blk_aio_detach, exp);
+ blk_set_disable_request_queuing(exp->common.blk, false);
}
for (i = 0; i < exp->nr_export_bitmaps; i++) {
--
2.26.2
* Re: [PATCH 1/2] block-backend: add drained_poll
2021-06-01 5:57 ` [PATCH 1/2] block-backend: add drained_poll Sergio Lopez
@ 2021-06-01 15:59 ` Kevin Wolf
2021-06-01 16:32 ` Sergio Lopez
2021-06-01 21:24 ` Eric Blake
0 siblings, 2 replies; 11+ messages in thread
From: Kevin Wolf @ 2021-06-01 15:59 UTC (permalink / raw)
To: Sergio Lopez
Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
Nir Soffer
Am 01.06.2021 um 07:57 hat Sergio Lopez geschrieben:
> Allow block backends to poll their devices/users to check if they have
> been quiesced when entering a drained section.
>
> This will be used in the next patch to wait for the NBD server to be
> completely quiesced.
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
> block/block-backend.c | 7 ++++++-
> include/sysemu/block-backend.h | 4 ++++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index de5496af66..163ca05b97 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -2393,8 +2393,13 @@ static void blk_root_drained_begin(BdrvChild *child)
> static bool blk_root_drained_poll(BdrvChild *child)
> {
> BlockBackend *blk = child->opaque;
> + int ret = 0;
It's really a bool.
> assert(blk->quiesce_counter);
> - return !!blk->in_flight;
> +
> + if (blk->dev_ops && blk->dev_ops->drained_poll) {
> + ret = blk->dev_ops->drained_poll(blk->dev_opaque);
> + }
> + return ret || !!blk->in_flight;
> }
Doesn't make a difference for correctness, of course, so whether you
change it or not:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
* Re: [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 5:57 ` [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
@ 2021-06-01 16:08 ` Kevin Wolf
2021-06-01 16:31 ` Sergio Lopez
2021-06-01 21:29 ` Eric Blake
1 sibling, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2021-06-01 16:08 UTC (permalink / raw)
To: Sergio Lopez
Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
Nir Soffer
Am 01.06.2021 um 07:57 hat Sergio Lopez geschrieben:
> Before switching between AioContexts we need to make sure that we're
> fully quiesced ("nb_requests == 0" for every client) when entering the
> drained section.
>
> To do this, we set "quiescing = true" for every client on
> ".drained_begin" to prevent new coroutines to be created, and check if
> "nb_requests == 0" on ".drained_poll". Finally, once we're exiting the
> drained section, on ".drained_end" we set "quiescing = false" and
> call "nbd_client_receive_next_request()" to resume the processing of
> new requests.
>
> With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> reverted to be as simple as they were before f148ae7d36.
>
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
> nbd/server.c | 99 +++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 75 insertions(+), 24 deletions(-)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index 86a44a9b41..33e55479d7 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -132,7 +132,7 @@ struct NBDClient {
> CoMutex send_lock;
> Coroutine *send_coroutine;
>
> - bool read_yielding;
> + GSList *yield_co_list; /* List of coroutines yielding on nbd_read_eof */
> bool quiescing;
Hm, how do you get more than one coroutine per client yielding in
nbd_read_eof() at the same time? I thought the model is that you always
have one coroutine reading the next request (which is
client->recv_coroutine) and all the others are just processing the
request they had read earlier. Multiple coroutines reading from the
same socket would sound like a bad idea.
> QTAILQ_ENTRY(NBDClient) next;
> @@ -1367,6 +1367,7 @@ static inline int coroutine_fn
> nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> {
> bool partial = false;
> + Coroutine *co;
>
> assert(size);
> while (size > 0) {
> @@ -1375,9 +1376,12 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
>
> len = qio_channel_readv(client->ioc, &iov, 1, errp);
> if (len == QIO_CHANNEL_ERR_BLOCK) {
> - client->read_yielding = true;
> + co = qemu_coroutine_self();
> +
> + client->yield_co_list = g_slist_prepend(client->yield_co_list, co);
> qio_channel_yield(client->ioc, G_IO_IN);
> - client->read_yielding = false;
> + client->yield_co_list = g_slist_remove(client->yield_co_list, co);
> +
> if (client->quiescing) {
> return -EAGAIN;
> }
> @@ -1513,6 +1517,11 @@ static void nbd_request_put(NBDRequestData *req)
> g_free(req);
>
> client->nb_requests--;
> +
> + if (client->quiescing && client->nb_requests == 0) {
> + aio_wait_kick();
> + }
> +
> nbd_client_receive_next_request(client);
>
> nbd_client_put(client);
> @@ -1530,49 +1539,75 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
> QTAILQ_FOREACH(client, &exp->clients, next) {
> qio_channel_attach_aio_context(client->ioc, ctx);
>
> + assert(client->nb_requests == 0);
> assert(client->recv_coroutine == NULL);
> assert(client->send_coroutine == NULL);
> -
> - if (client->quiescing) {
> - client->quiescing = false;
> - nbd_client_receive_next_request(client);
> - }
> }
> }
>
> -static void nbd_aio_detach_bh(void *opaque)
> +static void blk_aio_detach(void *opaque)
> {
> NBDExport *exp = opaque;
> NBDClient *client;
>
> + trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> +
> QTAILQ_FOREACH(client, &exp->clients, next) {
> qio_channel_detach_aio_context(client->ioc);
> + }
> +
> + exp->common.ctx = NULL;
> +}
> +
> +static void nbd_drained_begin(void *opaque)
> +{
> + NBDExport *exp = opaque;
> + NBDClient *client;
> +
> + QTAILQ_FOREACH(client, &exp->clients, next) {
> client->quiescing = true;
> + }
> +}
>
> - if (client->recv_coroutine) {
> - if (client->read_yielding) {
> - qemu_aio_coroutine_enter(exp->common.ctx,
> - client->recv_coroutine);
> - } else {
> - AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
> - }
> - }
> +static void nbd_drained_end(void *opaque)
> +{
> + NBDExport *exp = opaque;
> + NBDClient *client;
>
> - if (client->send_coroutine) {
> - AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
> - }
> + QTAILQ_FOREACH(client, &exp->clients, next) {
> + client->quiescing = false;
> + nbd_client_receive_next_request(client);
> }
> }
>
> -static void blk_aio_detach(void *opaque)
> +static bool nbd_drained_poll(void *opaque)
> {
> NBDExport *exp = opaque;
> + NBDClient *client;
> + Coroutine *co;
> + GSList *entry;
> + GSList *coroutine_list;
>
> - trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> + QTAILQ_FOREACH(client, &exp->clients, next) {
> + if (client->nb_requests != 0) {
> + /*
> + * Enter coroutines waiting for new requests on nbd_read_eof(), so
> + * we don't depend on the client to wake us up.
> + */
> + coroutine_list = g_slist_copy(client->yield_co_list);
> + for (entry = coroutine_list;
> + entry != NULL;
> + entry = g_slist_next(entry)) {
> + co = entry->data;
> + qemu_aio_coroutine_enter(exp->common.ctx, co);
> + }
> + g_slist_free(coroutine_list);
>
> - aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
> + return 1;
This would be more accurately spelt true...
> + }
> + }
>
> - exp->common.ctx = NULL;
> + return 0;
...and this false.
> }
>
> static void nbd_eject_notifier(Notifier *n, void *data)
The patch looks correct to me, though I'm not sure if yield_co_list is
an unnecessary complication (and if it isn't, whether that's safe).
I would be happy enough to apply it anyway if you can explain the
yield_co_list thing, but I'll give Eric some time to have a look, too.
Kevin
* Re: [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 16:08 ` Kevin Wolf
@ 2021-06-01 16:31 ` Sergio Lopez
2021-06-01 21:31 ` Eric Blake
0 siblings, 1 reply; 11+ messages in thread
From: Sergio Lopez @ 2021-06-01 16:31 UTC (permalink / raw)
To: Kevin Wolf
Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
Nir Soffer
On Tue, Jun 01, 2021 at 06:08:41PM +0200, Kevin Wolf wrote:
> Am 01.06.2021 um 07:57 hat Sergio Lopez geschrieben:
> > Before switching between AioContexts we need to make sure that we're
> > fully quiesced ("nb_requests == 0" for every client) when entering the
> > drained section.
> >
> > To do this, we set "quiescing = true" for every client on
> > ".drained_begin" to prevent new coroutines to be created, and check if
> > "nb_requests == 0" on ".drained_poll". Finally, once we're exiting the
> > drained section, on ".drained_end" we set "quiescing = false" and
> > call "nbd_client_receive_next_request()" to resume the processing of
> > new requests.
> >
> > With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> > reverted to be as simple as they were before f148ae7d36.
> >
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
> > Suggested-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > ---
> > nbd/server.c | 99 +++++++++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 75 insertions(+), 24 deletions(-)
> >
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 86a44a9b41..33e55479d7 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -132,7 +132,7 @@ struct NBDClient {
> > CoMutex send_lock;
> > Coroutine *send_coroutine;
> >
> > - bool read_yielding;
> > + GSList *yield_co_list; /* List of coroutines yielding on nbd_read_eof */
> > bool quiescing;
>
> Hm, how do you get more than one coroutine per client yielding in
> nbd_read_eof() at the same time? I thought the model is that you always
> have one coroutine reading the next request (which is
> client->recv_coroutine) and all the others are just processing the
> request they had read earlier. Multiple coroutines reading from the
> same socket would sound like a bad idea.
You're right, there's only a single coroutine yielding on
nbd_read_eof(). I added the list at a point when I was trying to keep
track of every coroutine, and I kept it without checking whether it
was really needed.
I'll drop it, entering just client->recv_coroutine if it isn't NULL.
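Something along these lines, perhaps (an untested sketch of a possible v2; a guard such as the original read_yielding flag may still be needed to ensure recv_coroutine is actually parked in nbd_read_eof()):

```c
static bool nbd_drained_poll(void *opaque)
{
    NBDExport *exp = opaque;
    NBDClient *client;

    QTAILQ_FOREACH(client, &exp->clients, next) {
        if (client->nb_requests != 0) {
            /*
             * Enter the one coroutine that may be waiting for a new
             * request in nbd_read_eof(), so we don't depend on the
             * client to wake us up.
             */
            if (client->recv_coroutine) {
                qemu_aio_coroutine_enter(exp->common.ctx,
                                         client->recv_coroutine);
            }
            return true;
        }
    }

    return false;
}
```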
> > QTAILQ_ENTRY(NBDClient) next;
> > @@ -1367,6 +1367,7 @@ static inline int coroutine_fn
> > nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> > {
> > bool partial = false;
> > + Coroutine *co;
> >
> > assert(size);
> > while (size > 0) {
> > @@ -1375,9 +1376,12 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> >
> > len = qio_channel_readv(client->ioc, &iov, 1, errp);
> > if (len == QIO_CHANNEL_ERR_BLOCK) {
> > - client->read_yielding = true;
> > + co = qemu_coroutine_self();
> > +
> > + client->yield_co_list = g_slist_prepend(client->yield_co_list, co);
> > qio_channel_yield(client->ioc, G_IO_IN);
> > - client->read_yielding = false;
> > + client->yield_co_list = g_slist_remove(client->yield_co_list, co);
> > +
> > if (client->quiescing) {
> > return -EAGAIN;
> > }
> > @@ -1513,6 +1517,11 @@ static void nbd_request_put(NBDRequestData *req)
> > g_free(req);
> >
> > client->nb_requests--;
> > +
> > + if (client->quiescing && client->nb_requests == 0) {
> > + aio_wait_kick();
> > + }
> > +
> > nbd_client_receive_next_request(client);
> >
> > nbd_client_put(client);
> > @@ -1530,49 +1539,75 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
> > QTAILQ_FOREACH(client, &exp->clients, next) {
> > qio_channel_attach_aio_context(client->ioc, ctx);
> >
> > + assert(client->nb_requests == 0);
> > assert(client->recv_coroutine == NULL);
> > assert(client->send_coroutine == NULL);
> > -
> > - if (client->quiescing) {
> > - client->quiescing = false;
> > - nbd_client_receive_next_request(client);
> > - }
> > }
> > }
> >
> > -static void nbd_aio_detach_bh(void *opaque)
> > +static void blk_aio_detach(void *opaque)
> > {
> > NBDExport *exp = opaque;
> > NBDClient *client;
> >
> > + trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> > +
> > QTAILQ_FOREACH(client, &exp->clients, next) {
> > qio_channel_detach_aio_context(client->ioc);
> > + }
> > +
> > + exp->common.ctx = NULL;
> > +}
> > +
> > +static void nbd_drained_begin(void *opaque)
> > +{
> > + NBDExport *exp = opaque;
> > + NBDClient *client;
> > +
> > + QTAILQ_FOREACH(client, &exp->clients, next) {
> > client->quiescing = true;
> > + }
> > +}
> >
> > - if (client->recv_coroutine) {
> > - if (client->read_yielding) {
> > - qemu_aio_coroutine_enter(exp->common.ctx,
> > - client->recv_coroutine);
> > - } else {
> > - AIO_WAIT_WHILE(exp->common.ctx, client->recv_coroutine != NULL);
> > - }
> > - }
> > +static void nbd_drained_end(void *opaque)
> > +{
> > + NBDExport *exp = opaque;
> > + NBDClient *client;
> >
> > - if (client->send_coroutine) {
> > - AIO_WAIT_WHILE(exp->common.ctx, client->send_coroutine != NULL);
> > - }
> > + QTAILQ_FOREACH(client, &exp->clients, next) {
> > + client->quiescing = false;
> > + nbd_client_receive_next_request(client);
> > }
> > }
> >
> > -static void blk_aio_detach(void *opaque)
> > +static bool nbd_drained_poll(void *opaque)
> > {
> > NBDExport *exp = opaque;
> > + NBDClient *client;
> > + Coroutine *co;
> > + GSList *entry;
> > + GSList *coroutine_list;
> >
> > - trace_nbd_blk_aio_detach(exp->name, exp->common.ctx);
> > + QTAILQ_FOREACH(client, &exp->clients, next) {
> > + if (client->nb_requests != 0) {
> > + /*
> > + * Enter coroutines waiting for new requests on nbd_read_eof(), so
> > + * we don't depend on the client to wake us up.
> > + */
> > + coroutine_list = g_slist_copy(client->yield_co_list);
> > + for (entry = coroutine_list;
> > + entry != NULL;
> > + entry = g_slist_next(entry)) {
> > + co = entry->data;
> > + qemu_aio_coroutine_enter(exp->common.ctx, co);
> > + }
> > + g_slist_free(coroutine_list);
> >
> > - aio_wait_bh_oneshot(exp->common.ctx, nbd_aio_detach_bh, exp);
> > + return 1;
>
> This would be more accurately spelt true...
>
> > + }
> > + }
> >
> > - exp->common.ctx = NULL;
> > + return 0;
>
> ...and this false.
I'll change this in v2.
Thanks,
Sergio.
> > }
> >
> > static void nbd_eject_notifier(Notifier *n, void *data)
>
> The patch looks correct to me, though I'm not sure if yield_co_list is
> an unnecessary complication (and if it isn't, whether that's safe).
>
> I would be happy enough to apply it anyway if you can explain the
> yield_co_list thing, but I'll give Eric some time to have a look, too.
>
> Kevin
>
* Re: [PATCH 1/2] block-backend: add drained_poll
2021-06-01 15:59 ` Kevin Wolf
@ 2021-06-01 16:32 ` Sergio Lopez
2021-06-01 21:24 ` Eric Blake
1 sibling, 0 replies; 11+ messages in thread
From: Sergio Lopez @ 2021-06-01 16:32 UTC (permalink / raw)
To: Kevin Wolf
Cc: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel, Max Reitz,
Nir Soffer
On Tue, Jun 01, 2021 at 05:59:10PM +0200, Kevin Wolf wrote:
> Am 01.06.2021 um 07:57 hat Sergio Lopez geschrieben:
> > Allow block backends to poll their devices/users to check if they have
> > been quiesced when entering a drained section.
> >
> > This will be used in the next patch to wait for the NBD server to be
> > completely quiesced.
> >
> > Suggested-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > ---
> > block/block-backend.c | 7 ++++++-
> > include/sysemu/block-backend.h | 4 ++++
> > 2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index de5496af66..163ca05b97 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -2393,8 +2393,13 @@ static void blk_root_drained_begin(BdrvChild *child)
> > static bool blk_root_drained_poll(BdrvChild *child)
> > {
> > BlockBackend *blk = child->opaque;
> > + int ret = 0;
>
> It's really a bool.
I'll fix this in v2.
Thanks,
Sergio.
> > assert(blk->quiesce_counter);
> > - return !!blk->in_flight;
> > +
> > + if (blk->dev_ops && blk->dev_ops->drained_poll) {
> > + ret = blk->dev_ops->drained_poll(blk->dev_opaque);
> > + }
> > + return ret || !!blk->in_flight;
> > }
>
> Doesn't make a difference for correctness, of course, so whether you
> change it or not:
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>
* Re: [PATCH 1/2] block-backend: add drained_poll
2021-06-01 15:59 ` Kevin Wolf
2021-06-01 16:32 ` Sergio Lopez
@ 2021-06-01 21:24 ` Eric Blake
1 sibling, 0 replies; 11+ messages in thread
From: Eric Blake @ 2021-06-01 21:24 UTC (permalink / raw)
To: Kevin Wolf
Cc: Vladimir Sementsov-Ogievskiy, Sergio Lopez, qemu-block,
qemu-devel, Max Reitz, Nir Soffer
On Tue, Jun 01, 2021 at 05:59:10PM +0200, Kevin Wolf wrote:
> > +++ b/block/block-backend.c
> > @@ -2393,8 +2393,13 @@ static void blk_root_drained_begin(BdrvChild *child)
> > static bool blk_root_drained_poll(BdrvChild *child)
> > {
> > BlockBackend *blk = child->opaque;
> > + int ret = 0;
>
> It's really a bool.
>
> > assert(blk->quiesce_counter);
> > - return !!blk->in_flight;
> > +
> > + if (blk->dev_ops && blk->dev_ops->drained_poll) {
> > + ret = blk->dev_ops->drained_poll(blk->dev_opaque);
> > + }
> > + return ret || !!blk->in_flight;
> > }
>
> Doesn't make a difference for correctness, of course, so whether you
> change it or not:
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Likewise, with that cosmetic change,
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 5:57 ` [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server Sergio Lopez
2021-06-01 16:08 ` Kevin Wolf
@ 2021-06-01 21:29 ` Eric Blake
2021-06-02 5:52 ` Sergio Lopez
1 sibling, 1 reply; 11+ messages in thread
From: Eric Blake @ 2021-06-01 21:29 UTC (permalink / raw)
To: Sergio Lopez
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
Max Reitz, Nir Soffer
On Tue, Jun 01, 2021 at 07:57:28AM +0200, Sergio Lopez wrote:
> Before switching between AioContexts we need to make sure that we're
> fully quiesced ("nb_requests == 0" for every client) when entering the
> drained section.
>
> To do this, we set "quiescing = true" for every client on
> ".drained_begin" to prevent new coroutines to be created, and check if
s/to be created/from being created/
> "nb_requests == 0" on ".drained_poll". Finally, once we're exiting the
> drained section, on ".drained_end" we set "quiescing = false" and
> call "nbd_client_receive_next_request()" to resume the processing of
> new requests.
>
> With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> reverted to be as simple as they were before f148ae7d36.
Is that reversion planned to be patch 3 of your series in v2?
>
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1960137
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
> nbd/server.c | 99 +++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 75 insertions(+), 24 deletions(-)
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 16:31 ` Sergio Lopez
@ 2021-06-01 21:31 ` Eric Blake
0 siblings, 0 replies; 11+ messages in thread
From: Eric Blake @ 2021-06-01 21:31 UTC (permalink / raw)
To: Sergio Lopez
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
Max Reitz, Nir Soffer
On Tue, Jun 01, 2021 at 06:31:29PM +0200, Sergio Lopez wrote:
> > Hm, how do you get more than one coroutine per client yielding in
> > nbd_read_eof() at the same time? I thought the model is that you always
> > have one coroutine reading the next request (which is
> > client->recv_coroutine) and all the others are just processing the
> > request they had read earlier. Multiple coroutines reading from the
> > same socket would sound like a bad idea.
>
> You're right, there's only a single coroutine yielding on
> nbd_read_eof(). I've added the list while at a moment I was trying to
> keep track of every coroutine, and I kept it without thinking if it
> was really needed.
>
> I'll drop it, entering just client->recv_coroutine if it isn't NULL.
Sounds like I'll wait for the v2 before applying. But the overall
logic changes made sense to me.
> > The patch looks correct to me, though I'm not sure if yield_co_list is
> > an unnecessary complication (and if it isn't, whether that's safe).
> >
> > I would be happy enough to apply it anyway if you can explain the
> > yield_co_list thing, but I'll give Eric some time to have a look, too.
Thanks for catching my attention on this!
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [PATCH 2/2] nbd/server: Use drained block ops to quiesce the server
2021-06-01 21:29 ` Eric Blake
@ 2021-06-02 5:52 ` Sergio Lopez
0 siblings, 0 replies; 11+ messages in thread
From: Sergio Lopez @ 2021-06-02 5:52 UTC (permalink / raw)
To: Eric Blake
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel,
Max Reitz, Nir Soffer
On Tue, Jun 01, 2021 at 04:29:07PM -0500, Eric Blake wrote:
> On Tue, Jun 01, 2021 at 07:57:28AM +0200, Sergio Lopez wrote:
> > Before switching between AioContexts we need to make sure that we're
> > fully quiesced ("nb_requests == 0" for every client) when entering the
> > drained section.
> >
> > To do this, we set "quiescing = true" for every client on
> > ".drained_begin" to prevent new coroutines to be created, and check if
>
> s/to be created/from being created/
>
> > "nb_requests == 0" on ".drained_poll". Finally, once we're exiting the
> > drained section, on ".drained_end" we set "quiescing = false" and
> > call "nbd_client_receive_next_request()" to resume the processing of
> > new requests.
> >
> > With these changes, "blk_aio_attach()" and "blk_aio_detach()" can be
> > reverted to be as simple as they were before f148ae7d36.
>
> Is that reversion planned to be patch 3 of your series in v2?
Actually, we need part of the changes introduced in f148ae7d36, so
it's probably simpler to manually revert "blk_aio_attach()" and
"blk_aio_detach()" here than to do an actual reversion and then
reintroduce the changes.
Thanks,
Sergio.