* [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes
@ 2019-02-18 16:18 Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 01/12] block-backend: Make blk_inc/dec_in_flight public Kevin Wolf
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

The background for this series is the following bug report about a crash
with virtio-blk + iothread and request resubmission for werror/rerror:

https://bugzilla.redhat.com/show_bug.cgi?id=1671173

The reason is that bdrv_set_aio_context() didn't correctly quiesce
everything. Instead, it had a local hack that called aio_poll() on the
source AioContext, which covered some, but not all, cases, and which is
wrong anyway because aio_poll() may only be called from the AioContext's
home thread.

So this series tries to make bdrv_drain() actually drain the known
cases (fixing virtio-blk and the NBD client) and to use the regular
drain functions in bdrv_set_aio_context() instead of open-coding
something similar.
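
For orientation, the end result for bdrv_set_aio_context() after patch 11
is roughly the following (simplified sketch of the final state of the
function; see that patch for the full context):

    void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
    {
        if (bdrv_get_aio_context(bs) == new_context) {
            return;
        }

        /* Quiesce the node with the regular drain machinery... */
        bdrv_drained_begin(bs);
        bdrv_detach_aio_context(bs);

        /* ...and resume it in the new AioContext. */
        aio_context_acquire(new_context);
        bdrv_attach_aio_context(bs, new_context);
        bdrv_drained_end(bs);
        aio_context_release(new_context);
    }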

Kevin Wolf (12):
  block-backend: Make blk_inc/dec_in_flight public
  virtio-blk: Increase in_flight for request restart BH
  nbd: Restrict connection_co reentrance
  io: Make qio_channel_yield() interruptible
  nbd: Move nbd_read_eof() to nbd/client.c
  nbd: Use low-level QIOChannel API in nbd_read_eof()
  nbd: Increase bs->in_flight during AioContext switch
  block: Don't poll in bdrv_set_aio_context()
  block: Fix AioContext switch for drained node
  test-bdrv-drain: AioContext switch in drained section
  block: Use normal drain for bdrv_set_aio_context()
  aio-posix: Assert that aio_poll() is always called in home thread

 block/nbd-client.h             |  2 ++
 include/block/nbd.h            |  4 ++-
 include/io/channel.h           |  9 +++--
 include/sysemu/block-backend.h |  2 ++
 nbd/nbd-internal.h             | 19 ----------
 block.c                        | 25 +++++++-------
 block/block-backend.c          |  4 +--
 block/nbd-client.c             | 27 ++++++++++++++-
 hw/block/virtio-blk.c          |  4 +++
 io/channel.c                   | 10 ++++++
 nbd/client.c                   | 63 ++++++++++++++++++++++++++++++++--
 tests/test-bdrv-drain.c        | 32 +++++++++++++++++
 util/aio-posix.c               |  3 +-
 13 files changed, 162 insertions(+), 42 deletions(-)

-- 
2.20.1

* [Qemu-devel] [PATCH 01/12] block-backend: Make blk_inc/dec_in_flight public
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 02/12] virtio-blk: Increase in_flight for request restart BH Kevin Wolf
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

For some users of BlockBackends, just increasing the in_flight counter
is easier than implementing separate handlers in BlockDevOps. Make the
helper functions for this public.
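
As an illustration of the intended usage (the function names here are
made up; patch 2 applies the same pattern to virtio-blk's request
restart BH):

    /* Keep blk busy until the scheduled BH has run, so that blk_drain()
     * and bdrv_drain() wait for it. */
    static void my_restart_bh(void *opaque)
    {
        BlockBackend *blk = opaque;

        /* ... resubmit the deferred requests here ... */

        /* Drop the reference taken before scheduling the BH; a pending
         * drain can complete now. */
        blk_dec_in_flight(blk);
    }

    static void my_schedule_restart(BlockBackend *blk)
    {
        /* Increase in_flight before qemu_bh_schedule() so there is no
         * window in which the BH is pending but blk looks idle. */
        blk_inc_in_flight(blk);
        qemu_bh_schedule(aio_bh_new(blk_get_aio_context(blk),
                                    my_restart_bh, blk));
    }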

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/sysemu/block-backend.h | 2 ++
 block/block-backend.c          | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 832a4bf168..e2066eb06b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -156,6 +156,8 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes);
 int blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_commit_all(void);
+void blk_inc_in_flight(BlockBackend *blk);
+void blk_dec_in_flight(BlockBackend *blk);
 void blk_drain(BlockBackend *blk);
 void blk_drain_all(void);
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
diff --git a/block/block-backend.c b/block/block-backend.c
index f6ea824308..0219555f89 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1262,12 +1262,12 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
-static void blk_inc_in_flight(BlockBackend *blk)
+void blk_inc_in_flight(BlockBackend *blk)
 {
     atomic_inc(&blk->in_flight);
 }
 
-static void blk_dec_in_flight(BlockBackend *blk)
+void blk_dec_in_flight(BlockBackend *blk)
 {
     atomic_dec(&blk->in_flight);
     aio_wait_kick();
-- 
2.20.1

* [Qemu-devel] [PATCH 02/12] virtio-blk: Increase in_flight for request restart BH
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 01/12] block-backend: Make blk_inc/dec_in_flight public Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance Kevin Wolf
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

virtio_blk_dma_restart_bh() submits new requests. To make sure that
these requests are not started inside a drained section of the attached
BlockBackend, draining the BlockBackend must wait for the BH to be
executed.

This BH is still questionable because it is scheduled in the main thread
instead of the configured iothread. Leave a FIXME comment for this.

But with this fix, enabling the data plane at least waits for these
requests (in bdrv_set_aio_context()) instead of changing the AioContext
under their feet and making them run in the wrong thread, causing
crashes and failures (e.g. due to missing locking).
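
The reason holding the counter is enough: draining the BlockBackend
boils down to polling until blk->in_flight reaches zero. A simplified
sketch of blk_drain() (not the literal code, NULL checks omitted; the
path through bdrv_set_aio_context() goes through the parent drain
callbacks instead, but the condition is the same):

    void blk_drain(BlockBackend *blk)
    {
        bdrv_drained_begin(blk_bs(blk));

        /* The restart BH holds an in_flight reference, so this does not
         * finish before the BH has run and dropped it again. */
        AIO_WAIT_WHILE(blk_get_aio_context(blk),
                       atomic_mb_read(&blk->in_flight) > 0);

        bdrv_drained_end(blk_bs(blk));
    }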

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/virtio-blk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cf7f47eaba..e11e6e45d0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -675,6 +675,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
     if (mrb.num_reqs) {
         virtio_blk_submit_multireq(s->blk, &mrb);
     }
+    blk_dec_in_flight(s->conf.conf.blk);
     aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
@@ -688,8 +689,11 @@ static void virtio_blk_dma_restart_cb(void *opaque, int running,
     }
 
     if (!s->bh) {
+        /* FIXME The data plane is not started yet, so these requests are
+         * processed in the main thread. */
         s->bh = aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
                            virtio_blk_dma_restart_bh, s);
+        blk_inc_in_flight(s->conf.conf.blk);
         qemu_bh_schedule(s->bh);
     }
 }
-- 
2.20.1

* [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 01/12] block-backend: Make blk_inc/dec_in_flight public Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 02/12] virtio-blk: Increase in_flight for request restart BH Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:30   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible Kevin Wolf
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

nbd_client_attach_aio_context() schedules connection_co in the new
AioContext and this way reenters it at any arbitrary point where it has
yielded. We can restrict this a bit to the function call where the
coroutine actually sits waiting when it's idle.

This doesn't solve any bug yet, but it shows where in the code we need
to support this random reentrance and where we don't have to care.

Add FIXME comments for the existing bugs that the rest of this series
will fix.
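
In compressed form (declarations and error handling omitted; see the
diff below for the real thing), the loop in nbd_connection_entry() now
follows this pattern:

    while (!s->quit) {
        /* Idle only while waiting for the next reply: the extra
         * in_flight reference is dropped around this call and nowhere
         * else. */
        bdrv_dec_in_flight(s->bs);
        ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
        bdrv_inc_in_flight(s->bs);

        /* ... dispatch the reply to the waiting request coroutine ... */
    }

    /* Terminating: wake waiters and drop the base in_flight reference. */
    nbd_recv_coroutines_wake_all(s);
    bdrv_dec_in_flight(s->bs);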

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/nbd-client.h |  1 +
 block/nbd-client.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index d990207a5c..09e03013d2 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -35,6 +35,7 @@ typedef struct NBDClientSession {
 
     NBDClientRequest requests[MAX_NBD_REQUESTS];
     NBDReply reply;
+    BlockDriverState *bs;
     bool quit;
 } NBDClientSession;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index f0ad54ce21..e776785325 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -76,8 +76,24 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     Error *local_err = NULL;
 
     while (!s->quit) {
+        /*
+         * The NBD client can only really be considered idle when it has
+         * yielded from qio_channel_readv_all_eof(), waiting for data. This is
+         * the point where the additional scheduled coroutine entry happens
+         * after nbd_client_attach_aio_context().
+         *
+         * Therefore we keep an additional in_flight reference all the time and
+         * only drop it temporarily here.
+         *
+         * FIXME This is not safe because the QIOChannel could wake up the
+         * coroutine for a second time; it is not prepared for coroutine
+         * resumption from external code.
+         */
+        bdrv_dec_in_flight(s->bs);
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
+        bdrv_inc_in_flight(s->bs);
+
         if (local_err) {
             trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
             error_free(local_err);
@@ -116,6 +132,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 
     s->quit = true;
     nbd_recv_coroutines_wake_all(s);
+    bdrv_dec_in_flight(s->bs);
+
     s->connection_co = NULL;
     aio_wait_kick();
 }
@@ -970,6 +988,9 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
+
+    /* FIXME Really need a bdrv_inc_in_flight() here, but the corresponding
+     * bdrv_dec_in_flight() would have to be in QIOChannel code :-/ */
     aio_co_schedule(new_context, client->connection_co);
 }
 
@@ -1076,6 +1097,7 @@ static int nbd_client_connect(BlockDriverState *bs,
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
     client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
+    bdrv_inc_in_flight(bs);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
@@ -1108,6 +1130,7 @@ int nbd_client_init(BlockDriverState *bs,
 {
     NBDClientSession *client = nbd_get_client_session(bs);
 
+    client->bs = bs;
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
 
-- 
2.20.1

* [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (2 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 17:11   ` Paolo Bonzini
  2019-02-18 20:33   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c Kevin Wolf
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

Similar to how qemu_co_sleep_ns() allows to be preempted by an external
coroutine entry, allow reentering qio_channel_yield() early.
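
From the caller's perspective the new contract looks like this
(illustrative fragments; ioc, co and new_context are placeholders):

    /* In the coroutine: wait until the channel becomes readable, or
     * until somebody re-enters us explicitly. */
    qio_channel_yield(ioc, G_IO_IN);
    /* When we get here, data may or may not be available; the caller
     * has to cope with a spurious wakeup. */

    /* From outside the coroutine, e.g. during an AioContext switch,
     * this wakes the yield early: */
    aio_co_schedule(new_context, co);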

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/io/channel.h |  9 ++++++---
 io/channel.c         | 10 ++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index da2f138200..59460cb1ec 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -739,10 +739,13 @@ void qio_channel_detach_aio_context(QIOChannel *ioc);
  * addition, no two coroutine can be waiting on the same condition
  * and channel at the same time.
  *
- * This must only be called from coroutine context
+ * This must only be called from coroutine context. It is safe to
+ * reenter the coroutine externally while it is waiting; in this
+ * case the function will return even if @condition is not yet
+ * available.
  */
-void qio_channel_yield(QIOChannel *ioc,
-                       GIOCondition condition);
+void coroutine_fn qio_channel_yield(QIOChannel *ioc,
+                                    GIOCondition condition);
 
 /**
  * qio_channel_wait:
diff --git a/io/channel.c b/io/channel.c
index 8dd0684f5d..303376e08d 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -469,6 +469,16 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
     }
     qio_channel_set_aio_fd_handlers(ioc);
     qemu_coroutine_yield();
+
+    /* Allow interrupting the operation by reentering the coroutine other than
+     * through the aio_fd_handlers. */
+    if (condition == G_IO_IN && ioc->read_coroutine) {
+        ioc->read_coroutine = NULL;
+        qio_channel_set_aio_fd_handlers(ioc);
+    } else if (condition == G_IO_OUT && ioc->write_coroutine) {
+        ioc->write_coroutine = NULL;
+        qio_channel_set_aio_fd_handlers(ioc);
+    }
 }
 
 
-- 
2.20.1

* [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (3 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:42   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof() Kevin Wolf
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't
have to live in the header file, but can move next to its caller.

Also add the missing coroutine_fn to the function and its caller.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/nbd.h |  3 ++-
 nbd/nbd-internal.h  | 19 -------------------
 nbd/client.c        | 22 +++++++++++++++++++++-
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 96cfb1d7d5..cad975e00c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,7 +300,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
+int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 82aa221227..049f83df77 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -64,25 +64,6 @@
 #define NBD_SET_TIMEOUT             _IO(0xab, 9)
 #define NBD_SET_FLAGS               _IO(0xab, 10)
 
-/* nbd_read_eof
- * Tries to read @size bytes from @ioc.
- * Returns 1 on success
- *         0 on eof, when no data was read (errp is not set)
- *         negative errno on failure (errp is set)
- */
-static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
-                               Error **errp)
-{
-    int ret;
-
-    assert(size);
-    ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
-    if (ret < 0) {
-        ret = -EIO;
-    }
-    return ret;
-}
-
 /* nbd_write
  * Writes @size bytes to @ioc. Returns 0 on success.
  */
diff --git a/nbd/client.c b/nbd/client.c
index 10a52ad7d0..28d174c0f3 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1387,12 +1387,32 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
     return 0;
 }
 
+/* nbd_read_eof
+ * Tries to read @size bytes from @ioc.
+ * Returns 1 on success
+ *         0 on eof, when no data was read (errp is not set)
+ *         negative errno on failure (errp is set)
+ */
+static inline int coroutine_fn
+nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size, Error **errp)
+{
+    int ret;
+
+    assert(size);
+    ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
+    if (ret < 0) {
+        ret = -EIO;
+    }
+    return ret;
+}
+
 /* nbd_receive_reply
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
  *         negative errno on failure (errp is set)
  */
-int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
+int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp)
 {
     int ret;
     const char *type;
-- 
2.20.1

* [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof()
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (4 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:48   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch Kevin Wolf
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

Instead of using the convenience wrapper qio_channel_read_all_eof(), use
the lower level QIOChannel API. This means duplicating some code, but
we'll need this because this coroutine yield is special: We want it to
be interruptible so that nbd_client_attach_aio_context() can correctly
reenter the coroutine.

This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
that connection_co will always sit in this exact qio_channel_yield()
call when bdrv_drain() returns.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/nbd.h |  4 ++--
 block/nbd-client.c  |  8 +-------
 nbd/client.c        | 46 ++++++++++++++++++++++++++++++++++++---------
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index cad975e00c..c6ef1ef42e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,8 +300,8 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
-                                   Error **errp);
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
+                                   NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index e776785325..688993652d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -84,15 +84,9 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
          *
          * Therefore we keep an additional in_flight reference all the time and
          * only drop it temporarily here.
-         *
-         * FIXME This is not safe because the QIOChannel could wake up the
-         * coroutine for a second time; it is not prepared for coroutine
-         * resumption from external code.
          */
-        bdrv_dec_in_flight(s->bs);
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
-        bdrv_inc_in_flight(s->bs);
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
 
         if (local_err) {
             trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
diff --git a/nbd/client.c b/nbd/client.c
index 28d174c0f3..de7da48246 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1394,30 +1394,58 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
  *         negative errno on failure (errp is set)
  */
 static inline int coroutine_fn
-nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size, Error **errp)
+nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
+             Error **errp)
 {
-    int ret;
+    bool partial = false;
 
     assert(size);
-    ret = qio_channel_read_all_eof(ioc, buffer, size, errp);
-    if (ret < 0) {
-        ret = -EIO;
+    while (size > 0) {
+        struct iovec iov = { .iov_base = buffer, .iov_len = size };
+        ssize_t len;
+
+        len = qio_channel_readv(ioc, &iov, 1, errp);
+        if (len == QIO_CHANNEL_ERR_BLOCK) {
+            bdrv_dec_in_flight(bs);
+            qio_channel_yield(ioc, G_IO_IN);
+            bdrv_inc_in_flight(bs);
+            continue;
+        } else if (len < 0) {
+            return -EIO;
+        } else if (len == 0) {
+            if (partial) {
+                error_setg(errp,
+                           "Unexpected end-of-file before all bytes were read");
+                return -EIO;
+            } else {
+                return 0;
+            }
+        }
+
+        partial = true;
+        size -= len;
+        buffer = (uint8_t*) buffer + len;
     }
-    return ret;
+    return 1;
 }
 
 /* nbd_receive_reply
+ *
+ * Decreases bs->in_flight while waiting for a new reply. This yield is where
+ * we wait indefinitely and the coroutine must be able to be safely reentered
+ * for nbd_client_attach_aio_context().
+ *
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
  *         negative errno on failure (errp is set)
  */
-int coroutine_fn nbd_receive_reply(QIOChannel *ioc, NBDReply *reply,
-                                   Error **errp)
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
+                                   NBDReply *reply, Error **errp)
 {
     int ret;
     const char *type;
 
-    ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
     if (ret <= 0) {
         return ret;
     }
-- 
2.20.1

* [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (5 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof() Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 17:22   ` Paolo Bonzini
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context() Kevin Wolf
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

bdrv_drain() must not leave connection_co scheduled, so bs->in_flight
needs to be increased while the coroutine is waiting to be scheduled
in the new AioContext after nbd_client_attach_aio_context().
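
Pieced together from the hunks below, the handshake between the attach
side and the waiting side looks like this at a glance:

    /* nbd_client_attach_aio_context(): take the in_flight reference on
     * behalf of the coroutine that is about to be rescheduled. */
    assert(!client->aio_ctx_switch);
    client->aio_ctx_switch = true;
    bdrv_inc_in_flight(bs);
    aio_co_schedule(new_context, client->connection_co);

    /* nbd_read_eof(), after waking up from qio_channel_yield(): */
    if (*aio_ctx_switch) {
        /* the attach side already took the reference for us */
        *aio_ctx_switch = false;
    } else {
        bdrv_inc_in_flight(bs);
    }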

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/nbd-client.h  |  1 +
 include/block/nbd.h |  5 +++--
 block/nbd-client.c  | 16 ++++++++++++----
 nbd/client.c        | 23 +++++++++++++++++------
 4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 09e03013d2..ee4e8544fe 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -37,6 +37,7 @@ typedef struct NBDClientSession {
     NBDReply reply;
     BlockDriverState *bs;
     bool quit;
+    bool aio_ctx_switch;
 } NBDClientSession;
 
 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/include/block/nbd.h b/include/block/nbd.h
index c6ef1ef42e..73e6e95095 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -300,8 +300,9 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, bool *aio_ctx_switch,
+                                   QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 688993652d..433de9e275 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -75,6 +75,11 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
     int ret = 0;
     Error *local_err = NULL;
 
+    /* Keep the initial bs->in_flight increase; decrease only temporarily in
+     * nbd_read_eof() and at the end of this function. */
+    assert(s->aio_ctx_switch);
+    s->aio_ctx_switch = false;
+
     while (!s->quit) {
         /*
          * The NBD client can only really be considered idle when it has
@@ -86,7 +91,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
          * only drop it temporarily here.
          */
         assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
+        ret = nbd_receive_reply(s->bs, &s->aio_ctx_switch, s->ioc, &s->reply,
+                                &local_err);
 
         if (local_err) {
             trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
@@ -983,8 +989,11 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
 
-    /* FIXME Really need a bdrv_inc_in_flight() here, but the corresponding
-     * bdrv_dec_in_flight() would have to be in QIOChannel code :-/ */
+    /* The corresponding bdrv_dec_in_flight() is in nbd_read_eof() */
+    assert(!client->aio_ctx_switch);
+    client->aio_ctx_switch = true;
+    bdrv_inc_in_flight(bs);
+
     aio_co_schedule(new_context, client->connection_co);
 }
 
@@ -1091,7 +1100,6 @@ static int nbd_client_connect(BlockDriverState *bs,
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
     client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
-    bdrv_inc_in_flight(bs);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");
diff --git a/nbd/client.c b/nbd/client.c
index de7da48246..12bd24d8fe 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1394,8 +1394,8 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
  *         negative errno on failure (errp is set)
  */
 static inline int coroutine_fn
-nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
-             Error **errp)
+nbd_read_eof(BlockDriverState *bs, bool *aio_ctx_switch, QIOChannel *ioc,
+             void *buffer, size_t size, Error **errp)
 {
     bool partial = false;
 
@@ -1406,9 +1406,18 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
 
         len = qio_channel_readv(ioc, &iov, 1, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
+            /* aio_ctx_switch is only supposed to be set if we're sitting in
+             * the qio_channel_yield() below. */
+            assert(!*aio_ctx_switch);
             bdrv_dec_in_flight(bs);
             qio_channel_yield(ioc, G_IO_IN);
-            bdrv_inc_in_flight(bs);
+            if (*aio_ctx_switch) {
+                /* nbd_client_attach_aio_context() already increased in_flight
+                 * when scheduling this coroutine for reentry */
+                *aio_ctx_switch = false;
+            } else {
+                bdrv_inc_in_flight(bs);
+            }
             continue;
         } else if (len < 0) {
             return -EIO;
@@ -1439,13 +1448,15 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,
  *         0 on eof, when no data was read (errp is not set)
  *         negative errno on failure (errp is set)
  */
-int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+int coroutine_fn nbd_receive_reply(BlockDriverState *bs, bool *aio_ctx_switch,
+                                   QIOChannel *ioc, NBDReply *reply,
+                                   Error **errp)
 {
     int ret;
     const char *type;
 
-    ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
+    ret = nbd_read_eof(bs, aio_ctx_switch, ioc, &reply->magic,
+                       sizeof(reply->magic), errp);
     if (ret <= 0) {
         return ret;
     }
-- 
2.20.1

* [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context()
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (6 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:52   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node Kevin Wolf
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

The explicit aio_poll() call in bdrv_set_aio_context() was added in
commit c2b6428d388 as a workaround for bdrv_drain() failing to actually
quiesce everything (specifically, the NBD client code when switching its
AioContext).

Now that the NBD client has been fixed to complete this operation during
bdrv_drain(), we don't need the workaround any more.

It was wrong anyway: aio_poll() must always be run in the home thread of
the AioContext.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block.c b/block.c
index 0c12632661..17bc1d3dca 100644
--- a/block.c
+++ b/block.c
@@ -5273,10 +5273,6 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
     bdrv_parent_drained_begin(bs, NULL, false);
     bdrv_drain(bs); /* ensure there are no in-flight requests */
 
-    while (aio_poll(ctx, false)) {
-        /* wait for all bottom halves to execute */
-    }
-
     bdrv_detach_aio_context(bs);
 
     /* This function executes in the old AioContext so acquire the new one in
-- 
2.20.1

* [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (7 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context() Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:53   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section Kevin Wolf
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

When a drained node changes its AioContext, we need to move its
aio_disable_external() to the new context, too.

Without this fix, drain_end will try to reenable the new context, which
has never been disabled, so an assertion failure is triggered.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block.c b/block.c
index 17bc1d3dca..aefb5701f5 100644
--- a/block.c
+++ b/block.c
@@ -5227,6 +5227,9 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
         bdrv_detach_aio_context(child->bs);
     }
 
+    if (bs->quiesce_counter) {
+        aio_enable_external(bs->aio_context);
+    }
     bs->aio_context = NULL;
 }
 
@@ -5240,6 +5243,10 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
         return;
     }
 
+    if (bs->quiesce_counter) {
+        aio_disable_external(new_context);
+    }
+
     bs->aio_context = new_context;
 
     QLIST_FOREACH(child, &bs->children, next) {
-- 
2.20.1

* [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (8 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:55   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context() Kevin Wolf
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread Kevin Wolf
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/test-bdrv-drain.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index ee1740ff06..1b1f6c17a5 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1522,6 +1522,36 @@ static void test_append_to_drained(void)
     blk_unref(blk);
 }
 
+static void test_set_aio_context(void)
+{
+    BlockDriverState *bs;
+    IOThread *a = iothread_new();
+    IOThread *b = iothread_new();
+    AioContext *ctx_a = iothread_get_aio_context(a);
+    AioContext *ctx_b = iothread_get_aio_context(b);
+
+    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+
+    bdrv_drained_begin(bs);
+    bdrv_set_aio_context(bs, ctx_a);
+
+    aio_context_acquire(ctx_a);
+    bdrv_drained_end(bs);
+
+    bdrv_drained_begin(bs);
+    bdrv_set_aio_context(bs, ctx_b);
+    aio_context_release(ctx_a);
+    aio_context_acquire(ctx_b);
+    bdrv_set_aio_context(bs, qemu_get_aio_context());
+    aio_context_release(ctx_b);
+    bdrv_drained_end(bs);
+
+    bdrv_unref(bs);
+    iothread_join(a);
+    iothread_join(b);
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -1603,6 +1633,8 @@ int main(int argc, char **argv)
 
     g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained);
 
+    g_test_add_func("/bdrv-drain/set_aio_context", test_set_aio_context);
+
     ret = g_test_run();
     qemu_event_destroy(&done_event);
     return ret;
-- 
2.20.1

* [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context()
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (9 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:57   ` Eric Blake
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread Kevin Wolf
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

Now that bdrv_set_aio_context() works inside drained sections, it can
also use the real drain function instead of open coding something
similar.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index aefb5701f5..7024172db8 100644
--- a/block.c
+++ b/block.c
@@ -5268,18 +5268,15 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
     bs->walking_aio_notifiers = false;
 }
 
+/* The caller must own the AioContext lock for the old AioContext of bs, but it
+ * must not own the AioContext lock for new_context. */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
 {
-    AioContext *ctx = bdrv_get_aio_context(bs);
-
-    if (ctx == new_context) {
+    if (bdrv_get_aio_context(bs) == new_context) {
         return;
     }
 
-    aio_disable_external(ctx);
-    bdrv_parent_drained_begin(bs, NULL, false);
-    bdrv_drain(bs); /* ensure there are no in-flight requests */
-
+    bdrv_drained_begin(bs);
     bdrv_detach_aio_context(bs);
 
     /* This function executes in the old AioContext so acquire the new one in
@@ -5287,8 +5284,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
      */
     aio_context_acquire(new_context);
     bdrv_attach_aio_context(bs, new_context);
-    bdrv_parent_drained_end(bs, NULL, false);
-    aio_enable_external(ctx);
+    bdrv_drained_end(bs);
     aio_context_release(new_context);
 }
 
-- 
2.20.1

* [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread
  2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
                   ` (10 preceding siblings ...)
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context() Kevin Wolf
@ 2019-02-18 16:18 ` Kevin Wolf
  2019-02-18 20:58   ` Eric Blake
  11 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-18 16:18 UTC (permalink / raw)
  To: qemu-block
  Cc: kwolf, mreitz, eblake, stefanha, berrange, pbonzini, qemu-devel

aio_poll() has an existing assertion that the function is only called
from the AioContext's home thread if blocking is allowed.

This is not enough: some handlers make assumptions about the thread they
run in. Extend the assertion to non-blocking calls, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 util/aio-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 8640dfde9f..6fbfa7924f 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -613,6 +613,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
     int64_t timeout;
     int64_t start = 0;
 
+    assert(in_aio_context_home_thread(ctx));
+
     /* aio_notify can avoid the expensive event_notifier_set if
      * everything (file descriptors, bottom halves, timers) will
      * be re-evaluated before the next blocking poll().  This is
@@ -621,7 +623,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
-        assert(in_aio_context_home_thread(ctx));
         atomic_add(&ctx->notify_me, 2);
     }
 
-- 
2.20.1

* Re: [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible Kevin Wolf
@ 2019-02-18 17:11   ` Paolo Bonzini
  2019-02-19 10:24     ` Kevin Wolf
  2019-02-18 20:33   ` Eric Blake
  1 sibling, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2019-02-18 17:11 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, eblake, stefanha, berrange, qemu-devel

On 18/02/19 17:18, Kevin Wolf wrote:
> Similar to how qemu_co_sleep_ns() allows to be preempted by an external
> coroutine entry, allow reentering qio_channel_yield() early.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/io/channel.h |  9 ++++++---
>  io/channel.c         | 10 ++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index da2f138200..59460cb1ec 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -739,10 +739,13 @@ void qio_channel_detach_aio_context(QIOChannel *ioc);
>   * addition, no two coroutine can be waiting on the same condition
>   * and channel at the same time.
>   *
> - * This must only be called from coroutine context
> + * This must only be called from coroutine context. It is safe to
> + * reenter the coroutine externally while it is waiting; in this
> + * case the function will return even if @condition is not yet
> + * available.
>   */
> -void qio_channel_yield(QIOChannel *ioc,
> -                       GIOCondition condition);
> +void coroutine_fn qio_channel_yield(QIOChannel *ioc,
> +                                    GIOCondition condition);
>  
>  /**
>   * qio_channel_wait:
> diff --git a/io/channel.c b/io/channel.c
> index 8dd0684f5d..303376e08d 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -469,6 +469,16 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>      qio_channel_set_aio_fd_handlers(ioc);
>      qemu_coroutine_yield();
> +
> +    /* Allow interrupting the operation by reentering the coroutine other than
> +     * through the aio_fd_handlers. */
> +    if (condition == G_IO_IN && ioc->read_coroutine) {
> +        ioc->read_coroutine = NULL;
> +        qio_channel_set_aio_fd_handlers(ioc);
> +    } else if (condition == G_IO_OUT && ioc->write_coroutine) {
> +        ioc->write_coroutine = NULL;
> +        qio_channel_set_aio_fd_handlers(ioc);
> +    }
>  }
>  
>  
> 

Perhaps it's a bit cleaner to remove the ioc->..._coroutine assignments,
and the call to qio_channel_set_aio_fd_handlers, from
qio_channel_restart_read and qio_channel_restart_write.
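
I.e. qio_channel_restart_read() could shrink to something like this
(untested sketch):

    static void qio_channel_restart_read(void *opaque)
    {
        QIOChannel *ioc = opaque;

        /* Clearing read_coroutine and resetting the fd handlers now
         * happens in qio_channel_yield() after re-entry, so the handler
         * only needs to wake the coroutine. */
        aio_co_wake(ioc->read_coroutine);
    }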

Paolo

* Re: [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch Kevin Wolf
@ 2019-02-18 17:22   ` Paolo Bonzini
  2019-02-19 11:11     ` Kevin Wolf
  2019-02-20 16:33     ` Kevin Wolf
  0 siblings, 2 replies; 30+ messages in thread
From: Paolo Bonzini @ 2019-02-18 17:22 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, eblake, stefanha, berrange, qemu-devel

On 18/02/19 17:18, Kevin Wolf wrote:
> +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> +             * the qio_channel_yield() below. */
> +            assert(!*aio_ctx_switch);
>              bdrv_dec_in_flight(bs);
>              qio_channel_yield(ioc, G_IO_IN);
> -            bdrv_inc_in_flight(bs);
> +            if (*aio_ctx_switch) {
> +                /* nbd_client_attach_aio_context() already increased in_flight
> +                 * when scheduling this coroutine for reentry */
> +                *aio_ctx_switch = false;
> +            } else {
> +                bdrv_inc_in_flight(bs);
> +            }

Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
unconditionally here, and in nbd_connection_entry do the opposite, like

	if (s->aio_ctx_switch) {
	    s->aio_ctx_switch = false;
	    bdrv_dec_in_flight(bs);
	}

but I guess the problem is that then bdrv_drain could hang.

So my question is:

1) is there a testcase that shows the problem with this "obvious"
refactoring;

2) maybe instead of aio_co_schedul-ing client->connection_co and having
the s->aio_ctx_switch flag, you could go through a bottom half that does
the bdrv_inc_in_flight and then enters client->connection_co?
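
Roughly like this, I mean (untested, just to make the idea concrete; the
BH name is made up):

    static void nbd_attach_bh(void *opaque)
    {
        NBDClientSession *client = opaque;

        /* Runs in the new AioContext: account for connection_co before
         * entering it, so a concurrent drain sees it as in flight. */
        bdrv_inc_in_flight(client->bs);
        aio_co_wake(client->connection_co);
    }

    /* ...and nbd_client_attach_aio_context() would just do: */
    aio_bh_schedule_oneshot(new_context, nbd_attach_bh, client);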

Paolo

* Re: [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance Kevin Wolf
@ 2019-02-18 20:30   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:30 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> nbd_client_attach_aio_context() schedules connection_co in the new
> AioContext and this way reenters it at any arbitrary point where it has
> yielded. We can restrict this a bit to the function call where the
> coroutine actually sits waiting when it's idle.
> 
> This doesn't solve any bug yet, but it shows where in the code we need
> to support this random reentrance and where we don't have to care.
> 
> Add FIXME comments for the existing bugs that the rest of this series
> will fix.

Wow, that's a lot of comments. Thanks for working on this.

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 23 +++++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/block/nbd-client.c
> @@ -76,8 +76,24 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>      Error *local_err = NULL;
>  
>      while (!s->quit) {
> +        /*
> +         * The NBD client can only really be considered idle when it has
> +         * yielded from qio_channel_readv_all_eof(), waiting for data. This is
> +         * the point where the additional scheduled coroutine entry happens
> +         * after nbd_client_attach_aio_context().
> +         *
> +         * Therefore we keep an additional in_flight reference all the time and
> +         * only drop it temporarily here.
> +         *
> +         * FIXME This is not safe because the QIOChannel could wake up the
> +         * coroutine for a second time; it is not prepared for coroutine
> +         * resumption from external code.
> +         */
> +        bdrv_dec_in_flight(s->bs);
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
> +        bdrv_inc_in_flight(s->bs);
> +
>          if (local_err) {
>              trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>              error_free(local_err);
> @@ -116,6 +132,8 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>  
>      s->quit = true;
>      nbd_recv_coroutines_wake_all(s);
> +    bdrv_dec_in_flight(s->bs);
> +
>      s->connection_co = NULL;
>      aio_wait_kick();
>  }
> @@ -970,6 +988,9 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>      qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
> +
> +    /* FIXME Really need a bdrv_inc_in_flight() here, but the corresponding
> +     * bdrv_dec_in_flight() would have to be in QIOChannel code :-/ */
>      aio_co_schedule(new_context, client->connection_co);
>  }
>  
> @@ -1076,6 +1097,7 @@ static int nbd_client_connect(BlockDriverState *bs,
>       * kick the reply mechanism.  */
>      qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
>      client->connection_co = qemu_coroutine_create(nbd_connection_entry, client);
> +    bdrv_inc_in_flight(bs);
>      nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
>  
>      logout("Established connection with NBD server\n");
> @@ -1108,6 +1130,7 @@ int nbd_client_init(BlockDriverState *bs,
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>  
> +    client->bs = bs;
>      qemu_co_mutex_init(&client->send_mutex);
>      qemu_co_queue_init(&client->free_sema);
>  
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible Kevin Wolf
  2019-02-18 17:11   ` Paolo Bonzini
@ 2019-02-18 20:33   ` Eric Blake
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:33 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> Similar to how qemu_co_sleep_ns() allows to be preempted by an external

allows preemption from an external

> coroutine entry, allow reentering qio_channel_yield() early.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/io/channel.h |  9 ++++++---
>  io/channel.c         | 10 ++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c Kevin Wolf
@ 2019-02-18 20:42   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:42 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't
> have to live in the header file, but can move next to its caller.
> 
> Also add the missing coroutine_fn to the function and its caller.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/nbd.h |  3 ++-
>  nbd/nbd-internal.h  | 19 -------------------
>  nbd/client.c        | 22 +++++++++++++++++++++-
>  3 files changed, 23 insertions(+), 21 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof()
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof() Kevin Wolf
@ 2019-02-18 20:48   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:48 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> Instead of using the convenience wrapper qio_channel_read_all_eof(), use
> the lower level QIOChannel API. This means duplicating some code, but
> we'll need this because this coroutine yield is special: We want it to
> be interruptible so that nbd_client_attach_aio_context() can correctly
> reenter the coroutine.
> 
> This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
> that connection_co will always sit in this exact qio_channel_yield()
> call when bdrv_drain() returns.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/nbd.h |  4 ++--
>  block/nbd-client.c  |  8 +-------
>  nbd/client.c        | 46 ++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 40 insertions(+), 18 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context()
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context() Kevin Wolf
@ 2019-02-18 20:52   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:52 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> The explicit aio_poll() call in bdrv_set_aio_context() was added in
> commit c2b6428d388 as a workaround for bdrv_drain() failing to actually
> quiesce everything (specifically, the NBD client code when switching its
> AioContext).
> 
> Now that the NBD client has been fixed to complete this operation during
> bdrv_drain(), we don't need the workaround any more.
> 
> It was wrong anyway: aio_poll() must always be run in the home thread of
> the AioContext.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 4 ----
>  1 file changed, 4 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node Kevin Wolf
@ 2019-02-18 20:53   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:53 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> When a drained node changes its AioContext, we need to move its
> aio_disable_external() to the new context, too.
> 
> Without this fix, drain_end will try to reenable the new context, which
> has never been disabled, so an assertion failure is triggered.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section Kevin Wolf
@ 2019-02-18 20:55   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:55 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/test-bdrv-drain.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context()
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context() Kevin Wolf
@ 2019-02-18 20:57   ` Eric Blake
  2019-02-19 11:23     ` Kevin Wolf
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:57 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> Now that bdrv_set_aio_context() works inside drained sections, it can
> also use the real drain function instead of open coding something
> similar.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/block.c b/block.c
> index aefb5701f5..7024172db8 100644
> --- a/block.c
> +++ b/block.c
> @@ -5268,18 +5268,15 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
>      bs->walking_aio_notifiers = false;
>  }
>  
> +/* The caller must own the AioContext lock for the old AioContext of bs, but it
> + * must not own the AioContext lock for new_context. */

Is this comment still accurate, given

>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>  {
> -    AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -    if (ctx == new_context) {
> +    if (bdrv_get_aio_context(bs) == new_context) {
>          return;
>      }

the short-circuiting when the old context is the new context?

>  
> -    aio_disable_external(ctx);
> -    bdrv_parent_drained_begin(bs, NULL, false);
> -    bdrv_drain(bs); /* ensure there are no in-flight requests */
> -
> +    bdrv_drained_begin(bs);
>      bdrv_detach_aio_context(bs);
>  
>      /* This function executes in the old AioContext so acquire the new one in
> @@ -5287,8 +5284,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>       */
>      aio_context_acquire(new_context);
>      bdrv_attach_aio_context(bs, new_context);
> -    bdrv_parent_drained_end(bs, NULL, false);
> -    aio_enable_external(ctx);
> +    bdrv_drained_end(bs);
>      aio_context_release(new_context);
>  }
>  
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread
  2019-02-18 16:18 ` [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread Kevin Wolf
@ 2019-02-18 20:58   ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-18 20:58 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/18/19 10:18 AM, Kevin Wolf wrote:
> aio_poll() has an existing assertion that the function is only called
> from the AioContext's home thread if blocking is allowed.
> 
> This is not enough; some handlers make assumptions about the thread they
> run in. Extend the assertion to non-blocking calls, too.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  util/aio-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 8640dfde9f..6fbfa7924f 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -613,6 +613,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      int64_t timeout;
>      int64_t start = 0;
>  
> +    assert(in_aio_context_home_thread(ctx));
> +
>      /* aio_notify can avoid the expensive event_notifier_set if
>       * everything (file descriptors, bottom halves, timers) will
>       * be re-evaluated before the next blocking poll().  This is
> @@ -621,7 +623,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
>       * so disable the optimization now.
>       */
>      if (blocking) {
> -        assert(in_aio_context_home_thread(ctx));
>          atomic_add(&ctx->notify_me, 2);
>      }
>  
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible
  2019-02-18 17:11   ` Paolo Bonzini
@ 2019-02-19 10:24     ` Kevin Wolf
  0 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-19 10:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, eblake, stefanha, berrange, qemu-devel

Am 18.02.2019 um 18:11 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > Similar to how qemu_co_sleep_ns() can be preempted by an external
> > coroutine entry, allow reentering qio_channel_yield() early.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  include/io/channel.h |  9 ++++++---
> >  io/channel.c         | 10 ++++++++++
> >  2 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index da2f138200..59460cb1ec 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -739,10 +739,13 @@ void qio_channel_detach_aio_context(QIOChannel *ioc);
> >   * addition, no two coroutine can be waiting on the same condition
> >   * and channel at the same time.
> >   *
> > - * This must only be called from coroutine context
> > + * This must only be called from coroutine context. It is safe to
> > + * reenter the coroutine externally while it is waiting; in this
> > + * case the function will return even if @condition is not yet
> > + * available.
> >   */
> > -void qio_channel_yield(QIOChannel *ioc,
> > -                       GIOCondition condition);
> > +void coroutine_fn qio_channel_yield(QIOChannel *ioc,
> > +                                    GIOCondition condition);
> >  
> >  /**
> >   * qio_channel_wait:
> > diff --git a/io/channel.c b/io/channel.c
> > index 8dd0684f5d..303376e08d 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -469,6 +469,16 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
> >      }
> >      qio_channel_set_aio_fd_handlers(ioc);
> >      qemu_coroutine_yield();
> > +
> > +    /* Allow interrupting the operation by reentering the coroutine other than
> > +     * through the aio_fd_handlers. */
> > +    if (condition == G_IO_IN && ioc->read_coroutine) {
> > +        ioc->read_coroutine = NULL;
> > +        qio_channel_set_aio_fd_handlers(ioc);
> > +    } else if (condition == G_IO_OUT && ioc->write_coroutine) {
> > +        ioc->write_coroutine = NULL;
> > +        qio_channel_set_aio_fd_handlers(ioc);
> > +    }
> >  }
> 
> Perhaps it's a bit cleaner to remove the ioc->..._coroutine assignments,
> and the call to qio_channel_set_aio_fd_handlers, from
> qio_channel_restart_read and qio_channel_restart_write.

I wasn't sure if this was correct because aio_co_wake() doesn't
necessarily enter the coroutine immediately, so between the wakeup and
actually entering the coroutine, the fields would hold different values
than they do with the old code.

On closer look, these handlers are never in coroutine context, and
qio_channel_attach_aio_context() asserts that the handlers are inactive,
so I guess we could replace the aio_co_wake() with a direct
qemu_coroutine_enter() and possibly an assertion that we're already in
the right AioContext.

That would make it obvious that removing the assignment and call to
qio_channel_set_aio_fd_handlers() there is safe.
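
To make that concrete, a minimal sketch of what such a simplified handler
could look like, assuming the io/channel.c context (the QIOChannel
read_coroutine field) and the qemu_coroutine_get_aio_context() helper for
the AioContext check; this only illustrates the idea, it is not the patch
itself:

    static void qio_channel_restart_read(void *opaque)
    {
        QIOChannel *ioc = opaque;
        Coroutine *co = ioc->read_coroutine;

        /* fd handlers never run in coroutine context, and the yielding
         * coroutine now clears read_coroutine and the fd handlers itself,
         * so entering it directly in its home AioContext is enough. */
        assert(qemu_get_current_aio_context() ==
               qemu_coroutine_get_aio_context(co));
        qemu_coroutine_enter(co);
    }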

In any case, I think this needs to be a separate patch where the commit
message can explain the above.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch
  2019-02-18 17:22   ` Paolo Bonzini
@ 2019-02-19 11:11     ` Kevin Wolf
  2019-02-19 14:12       ` Paolo Bonzini
  2019-02-20 16:33     ` Kevin Wolf
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-19 11:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, eblake, stefanha, berrange, qemu-devel

Am 18.02.2019 um 18:22 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> > +             * the qio_channel_yield() below. */
> > +            assert(!*aio_ctx_switch);
> >              bdrv_dec_in_flight(bs);
> >              qio_channel_yield(ioc, G_IO_IN);
> > -            bdrv_inc_in_flight(bs);
> > +            if (*aio_ctx_switch) {
> > +                /* nbd_client_attach_aio_context() already increased in_flight
> > +                 * when scheduling this coroutine for reentry */
> > +                *aio_ctx_switch = false;
> > +            } else {
> > +                bdrv_inc_in_flight(bs);
> > +            }
> 
> Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
> unconditionally here, and in nbd_connection_entry do the opposite, like
> 
> 	if (s->aio_ctx_switch) {
> 	    s->aio_ctx_switch = false;
> 	    bdrv_dec_in_flight(bs);
> 	}
> 
> but I guess the problem is that then bdrv_drain could hang.

Yes, the important part is that in_flight can drop to 0 while we're in
qio_channel_yield().

> So my question is:
> 
> 1) is there a testcase that shows the problem with this "obvious"
> refactoring;

I haven't actually tried it out because it's "obviously" wrong, but in
any test case where no requests are running, you'd never leave this
loop, so it should trivially trigger the problem.

In other words, I think qemu-iotests 094 covers this.

> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
> the s->aio_ctx_switch flag, you could go through a bottom half that does
> the bdrv_inc_in_flight and then enters client->connection_co?

That would be too easy. :-)

But I agree, that might indeed be the better solution.

I think I'd keep patch 6 anyway so that we know the exact yield that
we'll interrupt, even if it's not strictly necessary as long as we know
that nbd_receive_reply() can only yield in places that are safe to be
interrupted. While intuitively I think it's true, I don't feel like
actually auditing the code, and at some point we'd probably fail to
check that new code won't violate this invariant.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context()
  2019-02-18 20:57   ` Eric Blake
@ 2019-02-19 11:23     ` Kevin Wolf
  2019-02-19 16:04       ` Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2019-02-19 11:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-block, mreitz, stefanha, berrange, pbonzini, qemu-devel

Am 18.02.2019 um 21:57 hat Eric Blake geschrieben:
> On 2/18/19 10:18 AM, Kevin Wolf wrote:
> > Now that bdrv_set_aio_context() works inside drained sections, it can
> > also use the real drain function instead of open coding something
> > similar.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block.c | 14 +++++---------
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index aefb5701f5..7024172db8 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5268,18 +5268,15 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
> >      bs->walking_aio_notifiers = false;
> >  }
> >  
> > +/* The caller must own the AioContext lock for the old AioContext of bs, but it
> > + * must not own the AioContext lock for new_context. */
> 
> Is this comment still accurate, given
> 
> >  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
> >  {
> > -    AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -    if (ctx == new_context) {
> > +    if (bdrv_get_aio_context(bs) == new_context) {
> >          return;
> >      }
> 
> the short-circuiting when the old context is the new context?

Hm, yes, old == new is an exception where you quite obviously can't have
old locked and new unlocked at the same time.

So is adding this enough?

    (unless new_context is the same as the current context of bs)

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch
  2019-02-19 11:11     ` Kevin Wolf
@ 2019-02-19 14:12       ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2019-02-19 14:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, eblake, stefanha, berrange, qemu-devel

On 19/02/19 12:11, Kevin Wolf wrote:
>> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
>> the s->aio_ctx_switch flag, you could go through a bottom half that does
>> the bdrv_inc_in_flight and then enters client->connection_co?
> That would be too easy. :-)
> 
> But I agree, that might indeed be the better solution.
> 
> I think I'd keep patch 6 anyway so that we know the exact yield that
> we'll interrupt, even if it's not strictly necessary as long as we know
> that nbd_receive_reply() can only yield in places that are safe to be
> interrupted. While intuitively I think it's true, I don't feel like
> actually auditing the code, and at some point we'd probably fail to
> check that new code won't violate this invariant.

Yes, I agree with keeping patch 6.

Paolo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context()
  2019-02-19 11:23     ` Kevin Wolf
@ 2019-02-19 16:04       ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2019-02-19 16:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, mreitz, stefanha, berrange, pbonzini, qemu-devel

On 2/19/19 5:23 AM, Kevin Wolf wrote:
> Am 18.02.2019 um 21:57 hat Eric Blake geschrieben:
>> On 2/18/19 10:18 AM, Kevin Wolf wrote:
>>> Now that bdrv_set_aio_context() works inside drained sections, it can
>>> also use the real drain function instead of open coding something
>>> similar.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---

>>> +/* The caller must own the AioContext lock for the old AioContext of bs, but it
>>> + * must not own the AioContext lock for new_context. */
>>
>> Is this comment still accurate, given
>>
>>>  void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
>>>  {
>>> -    AioContext *ctx = bdrv_get_aio_context(bs);
>>> -
>>> -    if (ctx == new_context) {
>>> +    if (bdrv_get_aio_context(bs) == new_context) {
>>>          return;
>>>      }
>>
>> the short-circuiting when the old context is the new context?
> 
> Hm, yes, old == new is an exception where you quite obviously can't have
> old locked and new unlocked at the same time.
> 
> So is adding this enough?
> 
>     (unless new_context is the same as the current context of bs)

Works for me.
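
For reference, the amended header comment would then read roughly:

    /* The caller must own the AioContext lock for the old AioContext of bs, but
     * it must not own the AioContext lock for new_context (unless new_context
     * is the same as the current context of bs). */
    void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);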

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch
  2019-02-18 17:22   ` Paolo Bonzini
  2019-02-19 11:11     ` Kevin Wolf
@ 2019-02-20 16:33     ` Kevin Wolf
  1 sibling, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2019-02-20 16:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-block, mreitz, eblake, stefanha, berrange, qemu-devel

Am 18.02.2019 um 18:22 hat Paolo Bonzini geschrieben:
> On 18/02/19 17:18, Kevin Wolf wrote:
> > +            /* aio_ctx_switch is only supposed to be set if we're sitting in
> > +             * the qio_channel_yield() below. */
> > +            assert(!*aio_ctx_switch);
> >              bdrv_dec_in_flight(bs);
> >              qio_channel_yield(ioc, G_IO_IN);
> > -            bdrv_inc_in_flight(bs);
> > +            if (*aio_ctx_switch) {
> > +                /* nbd_client_attach_aio_context() already increased in_flight
> > +                 * when scheduling this coroutine for reentry */
> > +                *aio_ctx_switch = false;
> > +            } else {
> > +                bdrv_inc_in_flight(bs);
> > +            }
> 
> Hmm, my first thought would have been to do the bdrv_inc_in_flight(bs);
> unconditionally here, and in nbd_connection_entry do the opposite, like
> 
> 	if (s->aio_ctx_switch) {
> 	    s->aio_ctx_switch = false;
> 	    bdrv_dec_in_flight(bs);
> 	}
> 
> but I guess the problem is that then bdrv_drain could hang.
> 
> So my question is:
> 
> 1) is there a testcase that shows the problem with this "obvious"
> refactoring;
> 
> 2) maybe instead of aio_co_schedul-ing client->connection_co and having
> the s->aio_ctx_switch flag, you could go through a bottom half that does
> the bdrv_inc_in_flight and then enters client->connection_co?

Actually, this is going to become a bit ugly, too. I can't just schedule
the BH and return because then the node isn't drained any more when the
BH actually runs - and when it's not drained, we don't know where the
coroutine is, so we can't reenter it.

With an AIO_WAIT_WHILE() in the old thread, it should work, though...
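
One way that direction could look, sketched here for illustration only (the
helper name nbd_client_attach_aio_context_bh and the exact comments are
assumptions, not necessarily what was eventually merged):

    static void nbd_client_attach_aio_context_bh(void *opaque)
    {
        BlockDriverState *bs = opaque;
        NBDClientSession *client = nbd_get_client_session(bs);

        /* The node is still drained here, so connection_co is known to sit
         * in the qio_channel_yield() in nbd_read_eof() (or has terminated)
         * and can be reentered safely. */
        qemu_aio_coroutine_enter(bdrv_get_aio_context(bs),
                                 client->connection_co);
        bdrv_dec_in_flight(bs);
    }

    void nbd_client_attach_aio_context(BlockDriverState *bs,
                                       AioContext *new_context)
    {
        NBDClientSession *client = nbd_get_client_session(bs);

        qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);

        /* Keep bs busy until the BH has run; aio_wait_bh_oneshot() wraps
         * AIO_WAIT_WHILE(), so the old thread waits while the node is still
         * drained. */
        bdrv_inc_in_flight(bs);
        aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
    }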

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-02-20 16:33 UTC | newest]

Thread overview: 30+ messages
2019-02-18 16:18 [Qemu-devel] [PATCH 00/12] block: bdrv_set_aio_context() related fixes Kevin Wolf
2019-02-18 16:18 ` [Qemu-devel] [PATCH 01/12] block-backend: Make blk_inc/dec_in_flight public Kevin Wolf
2019-02-18 16:18 ` [Qemu-devel] [PATCH 02/12] virtio-blk: Increase in_flight for request restart BH Kevin Wolf
2019-02-18 16:18 ` [Qemu-devel] [PATCH 03/12] nbd: Restrict connection_co reentrance Kevin Wolf
2019-02-18 20:30   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 04/12] io: Make qio_channel_yield() interruptible Kevin Wolf
2019-02-18 17:11   ` Paolo Bonzini
2019-02-19 10:24     ` Kevin Wolf
2019-02-18 20:33   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 05/12] nbd: Move nbd_read_eof() to nbd/client.c Kevin Wolf
2019-02-18 20:42   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 06/12] nbd: Use low-level QIOChannel API in nbd_read_eof() Kevin Wolf
2019-02-18 20:48   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 07/12] nbd: Increase bs->in_flight during AioContext switch Kevin Wolf
2019-02-18 17:22   ` Paolo Bonzini
2019-02-19 11:11     ` Kevin Wolf
2019-02-19 14:12       ` Paolo Bonzini
2019-02-20 16:33     ` Kevin Wolf
2019-02-18 16:18 ` [Qemu-devel] [PATCH 08/12] block: Don't poll in bdrv_set_aio_context() Kevin Wolf
2019-02-18 20:52   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 09/12] block: Fix AioContext switch for drained node Kevin Wolf
2019-02-18 20:53   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 10/12] test-bdrv-drain: AioContext switch in drained section Kevin Wolf
2019-02-18 20:55   ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 11/12] block: Use normal drain for bdrv_set_aio_context() Kevin Wolf
2019-02-18 20:57   ` Eric Blake
2019-02-19 11:23     ` Kevin Wolf
2019-02-19 16:04       ` Eric Blake
2019-02-18 16:18 ` [Qemu-devel] [PATCH 12/12] aio-posix: Assert that aio_poll() is always called in home thread Kevin Wolf
2019-02-18 20:58   ` Eric Blake
