All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/21] block: remove aio_disable_external() API
@ 2023-05-04 19:53 Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 01/21] block: Fix use after free in blockdev_mark_auto_del() Stefan Hajnoczi
                   ` (22 more replies)
  0 siblings, 23 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

v5:
- Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
- Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
  before unrealizing the SCSIDevice [Kevin]
- Keep vhost-user-blk export .detach() callback so ctx is set to NULL [Kevin]
- Narrow BdrvChildClass and BlockDriver drained_{begin/end/poll} callbacks from
  IO_OR_GS_CODE() to GLOBAL_STATE_CODE() [Kevin]
- Include Kevin's "block: Fix use after free in blockdev_mark_auto_del()" to
  fix a latent bug that was exposed by this series

v4:
- Remove external_disable_cnt variable [Philippe]
- Add Patch 1 to fix assertion failure in .drained_end() -> blk_get_aio_context()
v3:
- Resend full patch series. v2 was sent in the middle of a git rebase and was
  missing patches. [Eric]
- Apply Reviewed-by tags.
v2:
- Do not rely on BlockBackend request queuing, implement .drained_begin/end()
  instead in xen-block, virtio-blk, and virtio-scsi [Paolo]
- Add qdev_is_realized() API [Philippe]
- Add patch to avoid AioContext lock around blk_exp_ref/unref() [Paolo]
- Add patch to call .drained_begin/end() from main loop thread to simplify
  callback implementations

The aio_disable_external() API temporarily suspends file descriptor monitoring
in the event loop. The block layer uses this to prevent new I/O requests being
submitted from the guest and elsewhere between bdrv_drained_begin() and
bdrv_drained_end().

While the block layer still needs to prevent new I/O requests in drained
sections, the aio_disable_external() API can be replaced with
.drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
BlockDevOps.

This newer .bdrained_begin/end/poll() approach is attractive because it works
without specifying a specific AioContext. The block layer is moving towards
multi-queue and that means multiple AioContexts may be processing I/O
simultaneously.

The aio_disable_external() was always somewhat hacky. It suspends all file
descriptors that were registered with is_external=true, even if they have
nothing to do with the BlockDriverState graph nodes that are being drained.
It's better to solve a block layer problem in the block layer than to have an
odd event loop API solution.

The approach in this patch series is to implement BlockDevOps
.drained_begin/end() callbacks that temporarily stop file descriptor handlers.
This ensures that new I/O requests are not submitted in drained sections.

Kevin Wolf (1):
  block: Fix use after free in blockdev_mark_auto_del()

Stefan Hajnoczi (20):
  block-backend: split blk_do_set_aio_context()
  hw/qdev: introduce qdev_is_realized() helper
  virtio-scsi: avoid race between unplug and transport event
  virtio-scsi: stop using aio_disable_external() during unplug
  util/vhost-user-server: rename refcount to in_flight counter
  block/export: wait for vhost-user-blk requests when draining
  block/export: stop using is_external in vhost-user-blk server
  hw/xen: do not use aio_set_fd_handler(is_external=true) in
    xen_xenstore
  block: add blk_in_drain() API
  block: drain from main loop thread in bdrv_co_yield_to_drain()
  xen-block: implement BlockDevOps->drained_begin()
  hw/xen: do not set is_external=true on evtchn fds
  block/export: rewrite vduse-blk drain code
  block/export: don't require AioContext lock around blk_exp_ref/unref()
  block/fuse: do not set is_external=true on FUSE fd
  virtio: make it possible to detach host notifier from any thread
  virtio-blk: implement BlockDevOps->drained_begin()
  virtio-scsi: implement BlockDevOps->drained_begin()
  virtio: do not set is_external=true on host notifiers
  aio: remove aio_disable_external() API

 hw/block/dataplane/xen-block.h              |   2 +
 include/block/aio.h                         |  57 ---------
 include/block/block_int-common.h            |  90 +++++++-------
 include/block/export.h                      |   2 +
 include/hw/qdev-core.h                      |  17 ++-
 include/hw/scsi/scsi.h                      |  14 +++
 include/qemu/vhost-user-server.h            |   8 +-
 include/sysemu/block-backend-common.h       |  25 ++--
 include/sysemu/block-backend-global-state.h |   1 +
 util/aio-posix.h                            |   1 -
 block.c                                     |   7 --
 block/blkio.c                               |  15 +--
 block/block-backend.c                       |  78 ++++++------
 block/curl.c                                |  10 +-
 block/export/export.c                       |  13 +-
 block/export/fuse.c                         |  56 ++++++++-
 block/export/vduse-blk.c                    | 128 ++++++++++++++------
 block/export/vhost-user-blk-server.c        |  52 +++++++-
 block/io.c                                  |  16 ++-
 block/io_uring.c                            |   4 +-
 block/iscsi.c                               |   3 +-
 block/linux-aio.c                           |   4 +-
 block/nfs.c                                 |   5 +-
 block/nvme.c                                |   8 +-
 block/ssh.c                                 |   4 +-
 block/win32-aio.c                           |   6 +-
 blockdev.c                                  |  18 ++-
 hw/block/dataplane/virtio-blk.c             |  19 ++-
 hw/block/dataplane/xen-block.c              |  42 +++++--
 hw/block/virtio-blk.c                       |  38 +++++-
 hw/block/xen-block.c                        |  24 +++-
 hw/i386/kvm/xen_xenstore.c                  |   2 +-
 hw/scsi/scsi-bus.c                          |  46 ++++++-
 hw/scsi/scsi-disk.c                         |  27 ++++-
 hw/scsi/virtio-scsi-dataplane.c             |  31 +++--
 hw/scsi/virtio-scsi.c                       | 127 ++++++++++++++-----
 hw/virtio/virtio.c                          |   6 +-
 hw/xen/xen-bus.c                            |  11 +-
 io/channel-command.c                        |   6 +-
 io/channel-file.c                           |   3 +-
 io/channel-socket.c                         |   3 +-
 migration/rdma.c                            |  16 +--
 tests/unit/test-aio.c                       |  27 +----
 tests/unit/test-bdrv-drain.c                |  15 +--
 tests/unit/test-fdmon-epoll.c               |  73 -----------
 util/aio-posix.c                            |  20 +--
 util/aio-win32.c                            |   8 +-
 util/async.c                                |   3 +-
 util/fdmon-epoll.c                          |  18 +--
 util/fdmon-io_uring.c                       |   8 +-
 util/fdmon-poll.c                           |   3 +-
 util/main-loop.c                            |   7 +-
 util/qemu-coroutine-io.c                    |   7 +-
 util/vhost-user-server.c                    |  33 ++---
 hw/scsi/trace-events                        |   2 +
 tests/unit/meson.build                      |   3 -
 56 files changed, 738 insertions(+), 534 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

-- 
2.40.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v5 01/21] block: Fix use after free in blockdev_mark_auto_del()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 02/21] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

From: Kevin Wolf <kwolf@redhat.com>

job_cancel_locked() drops the job list lock temporarily and it may call
aio_poll(). We must assume that the list has changed after this call.
Also, with unlucky timing, it can end up freeing the job during
job_completed_txn_abort_locked(), making the job pointer invalid, too.

For both reasons, we can't just continue at block_job_next_locked(job).
Instead, start at the head of the list again after job_cancel_locked()
and skip those jobs that we already cancelled (or that are completing
anyway).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230503140142.474404-1-kwolf@redhat.com>
---
 blockdev.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d7b5c18f0a..2c1752a403 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -153,12 +153,22 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 
     JOB_LOCK_GUARD();
 
-    for (job = block_job_next_locked(NULL); job;
-         job = block_job_next_locked(job)) {
-        if (block_job_has_bdrv(job, blk_bs(blk))) {
+    do {
+        job = block_job_next_locked(NULL);
+        while (job && (job->job.cancelled ||
+                       job->job.deferred_to_main_loop ||
+                       !block_job_has_bdrv(job, blk_bs(blk))))
+        {
+            job = block_job_next_locked(job);
+        }
+        if (job) {
+            /*
+             * This drops the job lock temporarily and polls, so we need to
+             * restart processing the list from the start after this.
+             */
             job_cancel_locked(&job->job, false);
         }
-    }
+    } while (job);
 
     dinfo->auto_del = 1;
 }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 02/21] block-backend: split blk_do_set_aio_context()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 01/21] block: Fix use after free in blockdev_mark_auto_del() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 03/21] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

blk_set_aio_context() is not fully transactional because
blk_do_set_aio_context() updates blk->ctx outside the transaction. Most
of the time this goes unnoticed but a BlockDevOps.drained_end() callback
that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This
happens because blk->ctx is only assigned after
BlockDevOps.drained_end() is called and we're in an intermediate state
where BlockDrvierState nodes already have the new context and the
BlockBackend still has the old context.

Making blk_set_aio_context() fully transactional solves this assertion
failure because the BlockBackend's context is updated as part of the
transaction (before BlockDevOps.drained_end() is called).

Split blk_do_set_aio_context() in order to solve this assertion failure.
This helper function actually serves two different purposes:
1. It drives blk_set_aio_context().
2. It responds to BdrvChildClass->change_aio_ctx().

Get rid of the helper function. Do #1 inside blk_set_aio_context() and
do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code.

The only drawback of the fully transactional approach is that
blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit()
being invoked as part of the AioContext change propagation. This can be
solved by temporarily setting blk->allow_aio_context_change to true.

Future patches call blk_get_aio_context() from
BlockDevOps->drained_end(), so this patch will become necessary.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/block-backend.c | 71 +++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 43 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fc530ded6a..68d38635bc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2205,52 +2205,31 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
     return blk_get_aio_context(blk_acb->blk);
 }
 
-static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
-                                  bool update_root_node, Error **errp)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
-    int ret;
-
-    if (bs) {
-        bdrv_ref(bs);
-
-        if (update_root_node) {
-            /*
-             * update_root_node MUST be false for blk_root_set_aio_ctx_commit(),
-             * as we are already in the commit function of a transaction.
-             */
-            ret = bdrv_try_change_aio_context(bs, new_context, blk->root, errp);
-            if (ret < 0) {
-                bdrv_unref(bs);
-                return ret;
-            }
-        }
-        /*
-         * Make blk->ctx consistent with the root node before we invoke any
-         * other operations like drain that might inquire blk->ctx
-         */
-        blk->ctx = new_context;
-        if (tgm->throttle_state) {
-            bdrv_drained_begin(bs);
-            throttle_group_detach_aio_context(tgm);
-            throttle_group_attach_aio_context(tgm, new_context);
-            bdrv_drained_end(bs);
-        }
-
-        bdrv_unref(bs);
-    } else {
-        blk->ctx = new_context;
-    }
-
-    return 0;
-}
-
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
                         Error **errp)
 {
+    bool old_allow_change;
+    BlockDriverState *bs = blk_bs(blk);
+    int ret;
+
     GLOBAL_STATE_CODE();
-    return blk_do_set_aio_context(blk, new_context, true, errp);
+
+    if (!bs) {
+        blk->ctx = new_context;
+        return 0;
+    }
+
+    bdrv_ref(bs);
+
+    old_allow_change = blk->allow_aio_context_change;
+    blk->allow_aio_context_change = true;
+
+    ret = bdrv_try_change_aio_context(bs, new_context, NULL, errp);
+
+    blk->allow_aio_context_change = old_allow_change;
+
+    bdrv_unref(bs);
+    return ret;
 }
 
 typedef struct BdrvStateBlkRootContext {
@@ -2262,8 +2241,14 @@ static void blk_root_set_aio_ctx_commit(void *opaque)
 {
     BdrvStateBlkRootContext *s = opaque;
     BlockBackend *blk = s->blk;
+    AioContext *new_context = s->new_ctx;
+    ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 
-    blk_do_set_aio_context(blk, s->new_ctx, false, &error_abort);
+    blk->ctx = new_context;
+    if (tgm->throttle_state) {
+        throttle_group_detach_aio_context(tgm);
+        throttle_group_attach_aio_context(tgm, new_context);
+    }
 }
 
 static TransactionActionDrv set_blk_root_context = {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 03/21] hw/qdev: introduce qdev_is_realized() helper
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 01/21] block: Fix use after free in blockdev_mark_auto_del() Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 02/21] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 04/21] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Add a helper function to check whether the device is realized without
requiring the Big QEMU Lock. The next patch adds a second caller. The
goal is to avoid spreading DeviceState field accesses throughout the
code.

Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/qdev-core.h | 17 ++++++++++++++---
 hw/scsi/scsi-bus.c     |  3 +--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 7623703943..f1070d6dc7 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1,6 +1,7 @@
 #ifndef QDEV_CORE_H
 #define QDEV_CORE_H
 
+#include "qemu/atomic.h"
 #include "qemu/queue.h"
 #include "qemu/bitmap.h"
 #include "qemu/rcu.h"
@@ -168,9 +169,6 @@ typedef struct {
 
 /**
  * DeviceState:
- * @realized: Indicates whether the device has been fully constructed.
- *            When accessed outside big qemu lock, must be accessed with
- *            qatomic_load_acquire()
  * @reset: ResettableState for the device; handled by Resettable interface.
  *
  * This structure should not be accessed directly.  We declare it here
@@ -339,6 +337,19 @@ DeviceState *qdev_new(const char *name);
  */
 DeviceState *qdev_try_new(const char *name);
 
+/**
+ * qdev_is_realized:
+ * @dev: The device to check.
+ *
+ * May be called outside big qemu lock.
+ *
+ * Returns: %true% if the device has been fully constructed, %false% otherwise.
+ */
+static inline bool qdev_is_realized(DeviceState *dev)
+{
+    return qatomic_load_acquire(&dev->realized);
+}
+
 /**
  * qdev_realize: Realize @dev.
  * @dev: device to realize
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 3c20b47ad0..8857ff41f6 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -60,8 +60,7 @@ static SCSIDevice *do_scsi_device_find(SCSIBus *bus,
      * the user access the device.
      */
 
-    if (retval && !include_unrealized &&
-        !qatomic_load_acquire(&retval->qdev.realized)) {
+    if (retval && !include_unrealized && !qdev_is_realized(&retval->qdev)) {
         retval = NULL;
     }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 04/21] virtio-scsi: avoid race between unplug and transport event
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 03/21] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum,
	Daniil Tatianin

Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().

qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.

scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.

Change virtio_scsi_push_event() to take event information as an argument
instead of the SCSIDevice. This allows virtio_scsi_hotunplug() to emit a
VIRTIO_SCSI_T_TRANSPORT_RESET event after the SCSIDevice has already
been unrealized.

These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v5:
- Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
  before unrealizing the SCSIDevice [Kevin]
---
 hw/scsi/scsi-bus.c    |  3 +-
 hw/scsi/virtio-scsi.c | 86 ++++++++++++++++++++++++++++++-------------
 2 files changed, 63 insertions(+), 26 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 8857ff41f6..64013c8a24 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
             DeviceState *qdev = kid->child;
             SCSIDevice *dev = SCSI_DEVICE(qdev);
 
-            if (dev->channel == channel && dev->id == id && dev->lun != 0) {
+            if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
+                qdev_is_realized(&dev->qdev)) {
                 store_lun(tmp, dev->lun);
                 g_byte_array_append(buf, tmp, 8);
                 len += 8;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 612c525d9d..ae314af3de 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -933,13 +933,27 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     s->events_dropped = false;
 }
 
-static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
-                                   uint32_t event, uint32_t reason)
+typedef struct {
+    uint32_t event;
+    uint32_t reason;
+    union {
+        /* Used by messages specific to a device */
+        struct {
+            uint32_t id;
+            uint32_t lun;
+        } address;
+    };
+} VirtIOSCSIEventInfo;
+
+static void virtio_scsi_push_event(VirtIOSCSI *s,
+                                   const VirtIOSCSIEventInfo *info)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
     VirtIOSCSIReq *req;
     VirtIOSCSIEvent *evt;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    uint32_t event = info->event;
+    uint32_t reason = info->reason;
 
     if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return;
@@ -965,27 +979,28 @@ static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
     memset(evt, 0, sizeof(VirtIOSCSIEvent));
     evt->event = virtio_tswap32(vdev, event);
     evt->reason = virtio_tswap32(vdev, reason);
-    if (!dev) {
-        assert(event == VIRTIO_SCSI_T_EVENTS_MISSED);
-    } else {
+    if (event != VIRTIO_SCSI_T_EVENTS_MISSED) {
         evt->lun[0] = 1;
-        evt->lun[1] = dev->id;
+        evt->lun[1] = info->address.id;
 
         /* Linux wants us to keep the same encoding we use for REPORT LUNS.  */
-        if (dev->lun >= 256) {
-            evt->lun[2] = (dev->lun >> 8) | 0x40;
+        if (info->address.lun >= 256) {
+            evt->lun[2] = (info->address.lun >> 8) | 0x40;
         }
-        evt->lun[3] = dev->lun & 0xFF;
+        evt->lun[3] = info->address.lun & 0xFF;
     }
     trace_virtio_scsi_event(virtio_scsi_get_lun(evt->lun), event, reason);
-     
+
     virtio_scsi_complete_req(req);
 }
 
 static void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
     if (s->events_dropped) {
-        virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
+        VirtIOSCSIEventInfo info = {
+            .event = VIRTIO_SCSI_T_NO_EVENT,
+        };
+        virtio_scsi_push_event(s, &info);
     }
 }
 
@@ -1009,9 +1024,17 @@ static void virtio_scsi_change(SCSIBus *bus, SCSIDevice *dev, SCSISense sense)
 
     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_CHANGE) &&
         dev->type != TYPE_ROM) {
+        VirtIOSCSIEventInfo info = {
+            .event   = VIRTIO_SCSI_T_PARAM_CHANGE,
+            .reason  = sense.asc | (sense.ascq << 8),
+            .address = {
+                .id  = dev->id,
+                .lun = dev->lun,
+            },
+        };
+
         virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, dev, VIRTIO_SCSI_T_PARAM_CHANGE,
-                               sense.asc | (sense.ascq << 8));
+        virtio_scsi_push_event(s, &info);
         virtio_scsi_release(s);
     }
 }
@@ -1046,10 +1069,17 @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 
     if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+        VirtIOSCSIEventInfo info = {
+            .event   = VIRTIO_SCSI_T_TRANSPORT_RESET,
+            .reason  = VIRTIO_SCSI_EVT_RESET_RESCAN,
+            .address = {
+                .id  = sd->id,
+                .lun = sd->lun,
+            },
+        };
+
         virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, sd,
-                               VIRTIO_SCSI_T_TRANSPORT_RESET,
-                               VIRTIO_SCSI_EVT_RESET_RESCAN);
+        virtio_scsi_push_event(s, &info);
         scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
         virtio_scsi_release(s);
     }
@@ -1062,15 +1092,14 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
     SCSIDevice *sd = SCSI_DEVICE(dev);
     AioContext *ctx = s->ctx ?: qemu_get_aio_context();
-
-    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
-        virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, sd,
-                               VIRTIO_SCSI_T_TRANSPORT_RESET,
-                               VIRTIO_SCSI_EVT_RESET_REMOVED);
-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
-        virtio_scsi_release(s);
-    }
+    VirtIOSCSIEventInfo info = {
+        .event   = VIRTIO_SCSI_T_TRANSPORT_RESET,
+        .reason  = VIRTIO_SCSI_EVT_RESET_REMOVED,
+        .address = {
+            .id  = sd->id,
+            .lun = sd->lun,
+        },
+    };
 
     aio_disable_external(ctx);
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
@@ -1082,6 +1111,13 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
         virtio_scsi_release(s);
     }
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+        virtio_scsi_acquire(s);
+        virtio_scsi_push_event(s, &info);
+        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+        virtio_scsi_release(s);
+    }
 }
 
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 04/21] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-09 18:55   ` Kevin Wolf
  2023-05-04 19:53 ` [PATCH v5 06/21] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum,
	Zhengui Li, Daniil Tatianin

This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.

The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da1c
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.

Ensure virtio_scsi_hotunplug() is safe as follows:

1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
   device_set_realized() calls qatomic_set(&dev->realized, false) so
   that future scsi_device_get() calls return NULL because they exclude
   SCSIDevices with realized=false.

   That means virtio-scsi will reject new I/O requests to this
   SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
   virtio_scsi_hotunplug() is still executing. We are protected against
   new requests!

2. scsi_device_unrealize() already contains a call to
   scsi_device_purge_requests() so that in-flight requests are cancelled
   synchronously. This ensures that no in-flight requests remain once
   qdev_simple_device_unplug_cb() returns.

Thanks to these two conditions we don't need aio_disable_external()
anymore.

Cc: Zhengui Li <lizhengui@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/virtio-scsi.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ae314af3de..c1a7ea9ae2 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1091,7 +1091,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
     SCSIDevice *sd = SCSI_DEVICE(dev);
-    AioContext *ctx = s->ctx ?: qemu_get_aio_context();
     VirtIOSCSIEventInfo info = {
         .event   = VIRTIO_SCSI_T_TRANSPORT_RESET,
         .reason  = VIRTIO_SCSI_EVT_RESET_REMOVED,
@@ -1101,9 +1100,7 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         },
     };
 
-    aio_disable_external(ctx);
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
-    aio_enable_external(ctx);
 
     if (s->ctx) {
         virtio_scsi_acquire(s);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 06/21] util/vhost-user-server: rename refcount to in_flight counter
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 07/21] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.

Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.

Avoid confusing by renaming refcount and ref/unref to in_flight and
inc/dec.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/vhost-user-server.h     |  6 +++---
 block/export/vhost-user-blk-server.c | 11 +++++++----
 util/vhost-user-server.c             | 14 +++++++-------
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 25c72433ca..bc0ac9ddb6 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -41,7 +41,7 @@ typedef struct {
     const VuDevIface *vu_iface;
 
     /* Protected by ctx lock */
-    unsigned int refcount;
+    unsigned int in_flight;
     bool wait_idle;
     VuDev vu_dev;
     QIOChannel *ioc; /* The I/O channel with the client */
@@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
 
 void vhost_user_server_stop(VuServer *server);
 
-void vhost_user_server_ref(VuServer *server);
-void vhost_user_server_unref(VuServer *server);
+void vhost_user_server_inc_in_flight(VuServer *server);
+void vhost_user_server_dec_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index e56b92f2e2..841acb36e3 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -50,7 +50,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
     free(req);
 }
 
-/* Called with server refcount increased, must decrease before returning */
+/*
+ * Called with server in_flight counter increased, must decrease before
+ * returning.
+ */
 static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
     VuBlkReq *req = opaque;
@@ -68,12 +71,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
                                     in_num, out_num);
     if (in_len < 0) {
         free(req);
-        vhost_user_server_unref(server);
+        vhost_user_server_dec_in_flight(server);
         return;
     }
 
     vu_blk_req_complete(req, in_len);
-    vhost_user_server_unref(server);
+    vhost_user_server_dec_in_flight(server);
 }
 
 static void vu_blk_process_vq(VuDev *vu_dev, int idx)
@@ -95,7 +98,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
         Coroutine *co =
             qemu_coroutine_create(vu_blk_virtio_process_req, req);
 
-        vhost_user_server_ref(server);
+        vhost_user_server_inc_in_flight(server);
         qemu_coroutine_enter(co);
     }
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5b6216069c..1622f8cfb3 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
     error_report("vu_panic: %s", buf);
 }
 
-void vhost_user_server_ref(VuServer *server)
+void vhost_user_server_inc_in_flight(VuServer *server)
 {
     assert(!server->wait_idle);
-    server->refcount++;
+    server->in_flight++;
 }
 
-void vhost_user_server_unref(VuServer *server)
+void vhost_user_server_dec_in_flight(VuServer *server)
 {
-    server->refcount--;
-    if (server->wait_idle && !server->refcount) {
+    server->in_flight--;
+    if (server->wait_idle && !server->in_flight) {
         aio_co_wake(server->co_trip);
     }
 }
@@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
         /* Keep running */
     }
 
-    if (server->refcount) {
+    if (server->in_flight) {
         /* Wait for requests to complete before we can unmap the memory */
         server->wait_idle = true;
         qemu_coroutine_yield();
         server->wait_idle = false;
     }
-    assert(server->refcount == 0);
+    assert(server->in_flight == 0);
 
     vu_deinit(vu_dev);
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 07/21] block/export: wait for vhost-user-blk requests when draining
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 06/21] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 08/21] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.

When blk_co_preadv/pwritev()/etc returns it wakes the
bdrv_drained_begin() thread but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.

One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.

(This example is theoretical, I came across this while reading the
code and have not tried to reproduce it.)

It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v5:
- Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
---
 include/qemu/vhost-user-server.h     |  4 +++-
 block/export/vhost-user-blk-server.c | 13 +++++++++++++
 util/vhost-user-server.c             | 18 ++++++++++++------
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index bc0ac9ddb6..b1c1cda886 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -40,8 +40,9 @@ typedef struct {
     int max_queues;
     const VuDevIface *vu_iface;
 
+    unsigned int in_flight; /* atomic */
+
     /* Protected by ctx lock */
-    unsigned int in_flight;
     bool wait_idle;
     VuDev vu_dev;
     QIOChannel *ioc; /* The I/O channel with the client */
@@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
 
 void vhost_user_server_inc_in_flight(VuServer *server);
 void vhost_user_server_dec_in_flight(VuServer *server);
+bool vhost_user_server_has_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 841acb36e3..f51a36a14f 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
     vu_config_change_msg(&vexp->vu_server.vu_dev);
 }
 
+/*
+ * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
+ *
+ * Called with vexp->export.ctx acquired.
+ */
+static bool vu_blk_drained_poll(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    return vhost_user_server_has_in_flight(&vexp->vu_server);
+}
+
 static const BlockDevOps vu_blk_dev_ops = {
+    .drained_poll  = vu_blk_drained_poll,
     .resize_cb = vu_blk_exp_resize,
 };
 
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 1622f8cfb3..68c3bf162f 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 void vhost_user_server_inc_in_flight(VuServer *server)
 {
     assert(!server->wait_idle);
-    server->in_flight++;
+    qatomic_inc(&server->in_flight);
 }
 
 void vhost_user_server_dec_in_flight(VuServer *server)
 {
-    server->in_flight--;
-    if (server->wait_idle && !server->in_flight) {
-        aio_co_wake(server->co_trip);
+    if (qatomic_fetch_dec(&server->in_flight) == 1) {
+        if (server->wait_idle) {
+            aio_co_wake(server->co_trip);
+        }
     }
 }
 
+bool vhost_user_server_has_in_flight(VuServer *server)
+{
+    return qatomic_load_acquire(&server->in_flight) > 0;
+}
+
 static bool coroutine_fn
 vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
 {
@@ -192,13 +198,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
         /* Keep running */
     }
 
-    if (server->in_flight) {
+    if (vhost_user_server_has_in_flight(server)) {
         /* Wait for requests to complete before we can unmap the memory */
         server->wait_idle = true;
         qemu_coroutine_yield();
         server->wait_idle = false;
     }
-    assert(server->in_flight == 0);
+    assert(!vhost_user_server_has_in_flight(server));
 
     vu_deinit(vu_dev);
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 08/21] block/export: stop using is_external in vhost-user-blk server
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 07/21] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 09/21] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.

Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContext may be processing I/O, not
just one.

Switch to BlockDevOps->drained_begin/end() callbacks.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.c | 28 ++++++++++++++++++++++++++--
 util/vhost-user-server.c             | 10 +++++-----
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index f51a36a14f..81b59761e3 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -212,15 +212,21 @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
     VuBlkExport *vexp = opaque;
 
+    /*
+     * The actual attach will happen in vu_blk_drained_end() and we just
+     * restore ctx here.
+     */
     vexp->export.ctx = ctx;
-    vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
 }
 
 static void blk_aio_detach(void *opaque)
 {
     VuBlkExport *vexp = opaque;
 
-    vhost_user_server_detach_aio_context(&vexp->vu_server);
+    /*
+     * The actual detach already happened in vu_blk_drained_begin() but from
+     * this point on we must not access ctx anymore.
+     */
     vexp->export.ctx = NULL;
 }
 
@@ -272,6 +278,22 @@ static void vu_blk_exp_resize(void *opaque)
     vu_config_change_msg(&vexp->vu_server.vu_dev);
 }
 
+/* Called with vexp->export.ctx acquired */
+static void vu_blk_drained_begin(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    vhost_user_server_detach_aio_context(&vexp->vu_server);
+}
+
+/* Called with vexp->export.blk AioContext acquired */
+static void vu_blk_drained_end(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    vhost_user_server_attach_aio_context(&vexp->vu_server, vexp->export.ctx);
+}
+
 /*
  * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
  *
@@ -285,6 +307,8 @@ static bool vu_blk_drained_poll(void *opaque)
 }
 
 static const BlockDevOps vu_blk_dev_ops = {
+    .drained_begin = vu_blk_drained_begin,
+    .drained_end   = vu_blk_drained_end,
     .drained_poll  = vu_blk_drained_poll,
     .resize_cb = vu_blk_exp_resize,
 };
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 68c3bf162f..a12b2d1bba 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
         vu_fd_watch->fd = fd;
         vu_fd_watch->cb = cb;
         qemu_socket_set_nonblock(fd);
-        aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
+        aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
                            NULL, NULL, NULL, vu_fd_watch);
         vu_fd_watch->vu_dev = vu_dev;
         vu_fd_watch->pvt = pvt;
@@ -299,7 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
     if (!vu_fd_watch) {
         return;
     }
-    aio_set_fd_handler(server->ioc->ctx, fd, true,
+    aio_set_fd_handler(server->ioc->ctx, fd, false,
                        NULL, NULL, NULL, NULL, NULL);
 
     QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
@@ -362,7 +362,7 @@ void vhost_user_server_stop(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
@@ -403,7 +403,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
     qio_channel_attach_aio_context(server->ioc, ctx);
 
     QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-        aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
+        aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
                            NULL, NULL, vu_fd_watch);
     }
 
@@ -417,7 +417,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 09/21] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 08/21] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 10/21] block: add blk_in_drain() API Stefan Hajnoczi
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum,
	David Woodhouse

There is no need to suspend activity between aio_disable_external() and
aio_enable_external(), which is mainly used for the block layer's drain
operation.

This is part of ongoing work to remove the aio_disable_external() API.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/i386/kvm/xen_xenstore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 900679af8a..6e81bc8791 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Xenstore evtchn port init failed");
         return;
     }
-    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
                        xen_xenstore_event, NULL, NULL, NULL, s);
 
     s->impl = xs_impl_create(xen_domid);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 10/21] block: add blk_in_drain() API
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 09/21] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 11/21] block: drain from main loop thread in bdrv_co_yield_to_drain() Stefan Hajnoczi
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

The BlockBackend quiesce_counter is greater than zero during drained
sections. Add an API to check whether the BlockBackend is in a drained
section.

The next patch will use this API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/sysemu/block-backend-global-state.h | 1 +
 block/block-backend.c                       | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 2b6d27db7c..ac7cbd6b5e 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -78,6 +78,7 @@ void blk_activate(BlockBackend *blk, Error **errp);
 int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags);
 void blk_aio_cancel(BlockAIOCB *acb);
 int blk_commit_all(void);
+bool blk_in_drain(BlockBackend *blk);
 void blk_drain(BlockBackend *blk);
 void blk_drain_all(void);
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
diff --git a/block/block-backend.c b/block/block-backend.c
index 68d38635bc..96f03cae95 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1270,6 +1270,13 @@ blk_check_byte_request(BlockBackend *blk, int64_t offset, int64_t bytes)
     return 0;
 }
 
+/* Are we currently in a drained section? */
+bool blk_in_drain(BlockBackend *blk)
+{
+    GLOBAL_STATE_CODE(); /* change to IO_OR_GS_CODE(), if necessary */
+    return qatomic_read(&blk->quiesce_counter);
+}
+
 /* To be called between exactly one pair of blk_inc/dec_in_flight() */
 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
 {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 11/21] block: drain from main loop thread in bdrv_co_yield_to_drain()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 10/21] block: add blk_in_drain() API Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

For simplicity, always run BlockDevOps .drained_begin/end/poll()
callbacks in the main loop thread. This makes it easier to implement the
callbacks and avoids extra locks.

Move the function pointer declarations from the I/O Code section to the
Global State section for BlockDevOps, BdrvChildClass, and BlockDriver.

Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate.

The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This
is now only allowed from coroutine context, so update the test case to
run in a coroutine.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block_int-common.h      | 90 +++++++++++++--------------
 include/sysemu/block-backend-common.h | 25 ++++----
 block/io.c                            | 14 +++--
 tests/unit/test-bdrv-drain.c          | 14 +++--
 4 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 013d419444..f462a8be55 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -356,6 +356,21 @@ struct BlockDriver {
     void (*bdrv_attach_aio_context)(BlockDriverState *bs,
                                     AioContext *new_context);
 
+    /**
+     * bdrv_drain_begin is called if implemented in the beginning of a
+     * drain operation to drain and stop any internal sources of requests in
+     * the driver.
+     * bdrv_drain_end is called if implemented at the end of the drain.
+     *
+     * They should be used by the driver to e.g. manage scheduled I/O
+     * requests, or toggle an internal state. After the end of the drain new
+     * requests will continue normally.
+     *
+     * Implementations of both functions must not call aio_poll().
+     */
+    void (*bdrv_drain_begin)(BlockDriverState *bs);
+    void (*bdrv_drain_end)(BlockDriverState *bs);
+
     /**
      * Try to get @bs's logical and physical block size.
      * On success, store them in @bsz and return zero.
@@ -743,21 +758,6 @@ struct BlockDriver {
     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
         BlockDriverState *bs);
 
-    /**
-     * bdrv_drain_begin is called if implemented in the beginning of a
-     * drain operation to drain and stop any internal sources of requests in
-     * the driver.
-     * bdrv_drain_end is called if implemented at the end of the drain.
-     *
-     * They should be used by the driver to e.g. manage scheduled I/O
-     * requests, or toggle an internal state. After the end of the drain new
-     * requests will continue normally.
-     *
-     * Implementations of both functions must not call aio_poll().
-     */
-    void (*bdrv_drain_begin)(BlockDriverState *bs);
-    void (*bdrv_drain_end)(BlockDriverState *bs);
-
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -920,36 +920,6 @@ struct BdrvChildClass {
     void GRAPH_WRLOCK_PTR (*attach)(BdrvChild *child);
     void GRAPH_WRLOCK_PTR (*detach)(BdrvChild *child);
 
-    /*
-     * Notifies the parent that the filename of its child has changed (e.g.
-     * because the direct child was removed from the backing chain), so that it
-     * can update its reference.
-     */
-    int (*update_filename)(BdrvChild *child, BlockDriverState *new_base,
-                           const char *filename, Error **errp);
-
-    bool (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
-                           GHashTable *visited, Transaction *tran,
-                           Error **errp);
-
-    /*
-     * I/O API functions. These functions are thread-safe.
-     *
-     * See include/block/block-io.h for more information about
-     * the I/O API.
-     */
-
-    void (*resize)(BdrvChild *child);
-
-    /*
-     * Returns a name that is supposedly more useful for human users than the
-     * node name for identifying the node in question (in particular, a BB
-     * name), or NULL if the parent can't provide a better name.
-     */
-    const char *(*get_name)(BdrvChild *child);
-
-    AioContext *(*get_parent_aio_context)(BdrvChild *child);
-
     /*
      * If this pair of functions is implemented, the parent doesn't issue new
      * requests after returning from .drained_begin() until .drained_end() is
@@ -970,6 +940,36 @@ struct BdrvChildClass {
      * activity on the child has stopped.
      */
     bool (*drained_poll)(BdrvChild *child);
+
+    /*
+     * Notifies the parent that the filename of its child has changed (e.g.
+     * because the direct child was removed from the backing chain), so that it
+     * can update its reference.
+     */
+    int (*update_filename)(BdrvChild *child, BlockDriverState *new_base,
+                           const char *filename, Error **errp);
+
+    bool (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
+                           GHashTable *visited, Transaction *tran,
+                           Error **errp);
+
+    /*
+     * I/O API functions. These functions are thread-safe.
+     *
+     * See include/block/block-io.h for more information about
+     * the I/O API.
+     */
+
+    void (*resize)(BdrvChild *child);
+
+    /*
+     * Returns a name that is supposedly more useful for human users than the
+     * node name for identifying the node in question (in particular, a BB
+     * name), or NULL if the parent can't provide a better name.
+     */
+    const char *(*get_name)(BdrvChild *child);
+
+    AioContext *(*get_parent_aio_context)(BdrvChild *child);
 };
 
 extern const BdrvChildClass child_of_bds;
diff --git a/include/sysemu/block-backend-common.h b/include/sysemu/block-backend-common.h
index 2391679c56..780cea7305 100644
--- a/include/sysemu/block-backend-common.h
+++ b/include/sysemu/block-backend-common.h
@@ -59,6 +59,19 @@ typedef struct BlockDevOps {
      */
     bool (*is_medium_locked)(void *opaque);
 
+    /*
+     * Runs when the backend receives a drain request.
+     */
+    void (*drained_begin)(void *opaque);
+    /*
+     * Runs when the backend's last drain request ends.
+     */
+    void (*drained_end)(void *opaque);
+    /*
+     * Is the device still busy?
+     */
+    bool (*drained_poll)(void *opaque);
+
     /*
      * I/O API functions. These functions are thread-safe.
      *
@@ -76,18 +89,6 @@ typedef struct BlockDevOps {
      * Runs when the size changed (e.g. monitor command block_resize)
      */
     void (*resize_cb)(void *opaque);
-    /*
-     * Runs when the backend receives a drain request.
-     */
-    void (*drained_begin)(void *opaque);
-    /*
-     * Runs when the backend's last drain request ends.
-     */
-    void (*drained_end)(void *opaque);
-    /*
-     * Is the device still busy?
-     */
-    bool (*drained_poll)(void *opaque);
 } BlockDevOps;
 
 /*
diff --git a/block/io.c b/block/io.c
index 6fa1993374..532c8c90c9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -60,7 +60,7 @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore)
 
 void bdrv_parent_drained_end_single(BdrvChild *c)
 {
-    IO_OR_GS_CODE();
+    GLOBAL_STATE_CODE();
 
     assert(c->quiesced_parent);
     c->quiesced_parent = false;
@@ -108,7 +108,7 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
 
 void bdrv_parent_drained_begin_single(BdrvChild *c)
 {
-    IO_OR_GS_CODE();
+    GLOBAL_STATE_CODE();
 
     assert(!c->quiesced_parent);
     c->quiesced_parent = true;
@@ -248,7 +248,7 @@ typedef struct {
 bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent,
                      bool ignore_bds_parents)
 {
-    IO_OR_GS_CODE();
+    GLOBAL_STATE_CODE();
 
     if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) {
         return true;
@@ -335,7 +335,8 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     if (ctx != co_ctx) {
         aio_context_release(ctx);
     }
-    replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
+    replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
+                                     bdrv_co_drain_bh_cb, &data);
 
     qemu_coroutine_yield();
     /* If we are resumed from some other event (such as an aio completion or a
@@ -358,6 +359,8 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
         return;
     }
 
+    GLOBAL_STATE_CODE();
+
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
@@ -400,11 +403,14 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
 {
     int old_quiesce_counter;
 
+    IO_OR_GS_CODE();
+
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, false, parent, false);
         return;
     }
     assert(bs->quiesce_counter > 0);
+    GLOBAL_STATE_CODE();
 
     /* Re-enable things in child-to-parent order */
     old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index d9d3807062..dc3cb9e0e3 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -445,19 +445,19 @@ struct test_iothread_data {
     BlockDriverState *bs;
     enum drain_type drain_type;
     int *aio_ret;
+    bool co_done;
 };
 
-static void test_iothread_drain_entry(void *opaque)
+static void coroutine_fn test_iothread_drain_co_entry(void *opaque)
 {
     struct test_iothread_data *data = opaque;
 
-    aio_context_acquire(bdrv_get_aio_context(data->bs));
     do_drain_begin(data->drain_type, data->bs);
     g_assert_cmpint(*data->aio_ret, ==, 0);
     do_drain_end(data->drain_type, data->bs);
-    aio_context_release(bdrv_get_aio_context(data->bs));
 
-    qemu_event_set(&done_event);
+    data->co_done = true;
+    aio_wait_kick();
 }
 
 static void test_iothread_aio_cb(void *opaque, int ret)
@@ -493,6 +493,7 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
     BlockDriverState *bs;
     BDRVTestState *s;
     BlockAIOCB *acb;
+    Coroutine *co;
     int aio_ret;
     struct test_iothread_data data;
 
@@ -571,8 +572,9 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
         }
         break;
     case 1:
-        aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data);
-        qemu_event_wait(&done_event);
+        co = qemu_coroutine_create(test_iothread_drain_co_entry, &data);
+        aio_co_enter(ctx_a, co);
+        AIO_WAIT_WHILE_UNLOCKED(NULL, !data.co_done);
         break;
     default:
         g_assert_not_reached();
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 11/21] block: drain from main loop thread in bdrv_co_yield_to_drain() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-16 14:24     ` Anthony PERARD
  2023-05-16 14:45     ` Anthony PERARD
  2023-05-04 19:53 ` [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
                   ` (10 subsequent siblings)
  22 siblings, 2 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Detach event channels during drained sections to stop I/O submission
from the ring. xen-block is no longer reliant on aio_disable_external()
after this patch. This will allow us to remove the
aio_disable_external() API once all other code that relies on it is
converted.

Extend xen_device_set_event_channel_context() to allow ctx=NULL. The
event channel still exists but the event loop does not monitor the file
descriptor. Event channel processing can resume by calling
xen_device_set_event_channel_context() with a non-NULL ctx.

Factor out xen_device_set_event_channel_context() calls in
hw/block/dataplane/xen-block.c into attach/detach helper functions.
Incidentally, these don't require the AioContext lock because
aio_set_fd_handler() is thread-safe.

It's safer to register BlockDevOps after the dataplane instance has been
created. The BlockDevOps .drained_begin/end() callbacks depend on the
dataplane instance, so move the blk_set_dev_ops() call after
xen_block_dataplane_create().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/xen-block.h |  2 ++
 hw/block/dataplane/xen-block.c | 42 +++++++++++++++++++++++++---------
 hw/block/xen-block.c           | 24 ++++++++++++++++---
 hw/xen/xen-bus.c               |  7 ++++--
 4 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/hw/block/dataplane/xen-block.h b/hw/block/dataplane/xen-block.h
index 76dcd51c3d..7b8e9df09f 100644
--- a/hw/block/dataplane/xen-block.h
+++ b/hw/block/dataplane/xen-block.h
@@ -26,5 +26,7 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
                                unsigned int protocol,
                                Error **errp);
 void xen_block_dataplane_stop(XenBlockDataPlane *dataplane);
+void xen_block_dataplane_attach(XenBlockDataPlane *dataplane);
+void xen_block_dataplane_detach(XenBlockDataPlane *dataplane);
 
 #endif /* HW_BLOCK_DATAPLANE_XEN_BLOCK_H */
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index d8bc39d359..2597f38805 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -664,6 +664,30 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane)
     g_free(dataplane);
 }
 
+void xen_block_dataplane_detach(XenBlockDataPlane *dataplane)
+{
+    if (!dataplane || !dataplane->event_channel) {
+        return;
+    }
+
+    /* Only reason for failure is a NULL channel */
+    xen_device_set_event_channel_context(dataplane->xendev,
+                                         dataplane->event_channel,
+                                         NULL, &error_abort);
+}
+
+void xen_block_dataplane_attach(XenBlockDataPlane *dataplane)
+{
+    if (!dataplane || !dataplane->event_channel) {
+        return;
+    }
+
+    /* Only reason for failure is a NULL channel */
+    xen_device_set_event_channel_context(dataplane->xendev,
+                                         dataplane->event_channel,
+                                         dataplane->ctx, &error_abort);
+}
+
 void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
 {
     XenDevice *xendev;
@@ -674,13 +698,11 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane)
 
     xendev = dataplane->xendev;
 
-    aio_context_acquire(dataplane->ctx);
-    if (dataplane->event_channel) {
-        /* Only reason for failure is a NULL channel */
-        xen_device_set_event_channel_context(xendev, dataplane->event_channel,
-                                             qemu_get_aio_context(),
-                                             &error_abort);
+    if (!blk_in_drain(dataplane->blk)) {
+        xen_block_dataplane_detach(dataplane);
     }
+
+    aio_context_acquire(dataplane->ctx);
     /* Xen doesn't have multiple users for nodes, so this can't fail */
     blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort);
     aio_context_release(dataplane->ctx);
@@ -819,11 +841,9 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
     blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
     aio_context_release(old_context);
 
-    /* Only reason for failure is a NULL channel */
-    aio_context_acquire(dataplane->ctx);
-    xen_device_set_event_channel_context(xendev, dataplane->event_channel,
-                                         dataplane->ctx, &error_abort);
-    aio_context_release(dataplane->ctx);
+    if (!blk_in_drain(dataplane->blk)) {
+        xen_block_dataplane_attach(dataplane);
+    }
 
     return;
 
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index f5a744589d..f099914831 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -189,8 +189,26 @@ static void xen_block_resize_cb(void *opaque)
     xen_device_backend_printf(xendev, "state", "%u", state);
 }
 
+/* Suspend request handling */
+static void xen_block_drained_begin(void *opaque)
+{
+    XenBlockDevice *blockdev = opaque;
+
+    xen_block_dataplane_detach(blockdev->dataplane);
+}
+
+/* Resume request handling */
+static void xen_block_drained_end(void *opaque)
+{
+    XenBlockDevice *blockdev = opaque;
+
+    xen_block_dataplane_attach(blockdev->dataplane);
+}
+
 static const BlockDevOps xen_block_dev_ops = {
-    .resize_cb = xen_block_resize_cb,
+    .resize_cb     = xen_block_resize_cb,
+    .drained_begin = xen_block_drained_begin,
+    .drained_end   = xen_block_drained_end,
 };
 
 static void xen_block_realize(XenDevice *xendev, Error **errp)
@@ -242,8 +260,6 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
         return;
     }
 
-    blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev);
-
     if (conf->discard_granularity == -1) {
         conf->discard_granularity = conf->physical_block_size;
     }
@@ -277,6 +293,8 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
     blockdev->dataplane =
         xen_block_dataplane_create(xendev, blk, conf->logical_block_size,
                                    blockdev->props.iothread);
+
+    blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev);
 }
 
 static void xen_block_frontend_changed(XenDevice *xendev,
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index c59850b1de..b8f408c9ed 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -846,8 +846,11 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
                            NULL, NULL, NULL, NULL, NULL);
 
     channel->ctx = ctx;
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
-                       xen_device_event, NULL, xen_device_poll, NULL, channel);
+    if (ctx) {
+        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
+                           true, xen_device_event, NULL, xen_device_poll, NULL,
+                           channel);
+    }
 }
 
 XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-16 14:25     ` Anthony PERARD
  2023-05-04 19:53 ` [PATCH v5 14/21] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

is_external=true suspends fd handlers between aio_disable_external() and
aio_enable_external(). The block layer's drain operation uses this
mechanism to prevent new I/O from sneaking in between
bdrv_drained_begin() and bdrv_drained_end().

The previous commit converted the xen-block device to use BlockDevOps
.drained_begin/end() callbacks. It no longer relies on is_external=true
so it is safe to pass is_external=false.

This is part of ongoing work to remove the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/xen/xen-bus.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index b8f408c9ed..bf256d4da2 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,14 +842,14 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
     }
 
     if (channel->ctx)
-        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
                            NULL, NULL, NULL, NULL, NULL);
 
     channel->ctx = ctx;
     if (ctx) {
         aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
-                           true, xen_device_event, NULL, xen_device_poll, NULL,
-                           channel);
+                           false, xen_device_event, NULL, xen_device_poll,
+                           NULL, channel);
     }
 }
 
@@ -923,7 +923,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
 
     QLIST_REMOVE(channel, list);
 
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
                        NULL, NULL, NULL, NULL, NULL);
 
     if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 14/21] block/export: rewrite vduse-blk drain code
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 15/21] block/export: don't require AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():

  /*
   * Take the old AioContex when detaching it from bs.
   * At this point, new_context lock is already acquired, and we are now
   * also taking old_context. This is safe as long as bdrv_detach_aio_context
   * does not call AIO_POLL_WHILE().
   */

Use this opportunity to rewrite the drain code in vduse-blk:

- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
  called when there are no more requests in flight.

- Implement .drained_poll() so in-flight request coroutines are stopped
  by the time .bdrv_detach_aio_context() is called.

- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
  .bdrv_detach_aio_context() constraint violation. It's no longer
  needed due to the previous changes.

- Always handle the VDUSE file descriptor, even in drained sections. The
  VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
  drained sections. This ensures that the VDUSE kernel code gets a fast
  response.

- Suspend virtqueue fd handlers in .drained_begin() and resume them in
  .drained_end(). This eliminates the need for the
  aio_set_fd_handler(is_external=true) flag, which is being removed from
  QEMU.

This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vduse-blk.c | 132 +++++++++++++++++++++++++++------------
 1 file changed, 93 insertions(+), 39 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index b53ef39da0..a25556fe04 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,7 +31,8 @@ typedef struct VduseBlkExport {
     VduseDev *dev;
     uint16_t num_queues;
     char *recon_file;
-    unsigned int inflight;
+    unsigned int inflight; /* atomic */
+    bool vqs_started;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -41,13 +42,24 @@ typedef struct VduseBlkReq {
 
 static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
 {
-    vblk_exp->inflight++;
+    if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
+        /* Prevent export from being deleted */
+        aio_context_acquire(vblk_exp->export.ctx);
+        blk_exp_ref(&vblk_exp->export);
+        aio_context_release(vblk_exp->export.ctx);
+    }
 }
 
 static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
 {
-    if (--vblk_exp->inflight == 0) {
+    if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) {
+        /* Wake AIO_WAIT_WHILE() */
         aio_wait_kick();
+
+        /* Now the export can be deleted */
+        aio_context_acquire(vblk_exp->export.ctx);
+        blk_exp_unref(&vblk_exp->export);
+        aio_context_release(vblk_exp->export.ctx);
     }
 }
 
@@ -124,8 +136,12 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 {
     VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
 
+    if (!vblk_exp->vqs_started) {
+        return; /* vduse_blk_drained_end() will start vqs later */
+    }
+
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+                       false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
     /* Make sure we don't miss any kick afer reconnecting */
     eventfd_write(vduse_queue_get_fd(vq), 1);
 }
@@ -133,9 +149,14 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
 {
     VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+    int fd = vduse_queue_get_fd(vq);
 
-    aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       true, NULL, NULL, NULL, NULL, NULL);
+    if (fd < 0) {
+        return;
+    }
+
+    aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+                       NULL, NULL, NULL, NULL, NULL);
 }
 
 static const VduseOps vduse_blk_ops = {
@@ -152,42 +173,19 @@ static void on_vduse_dev_kick(void *opaque)
 
 static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 {
-    int i;
-
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       true, on_vduse_dev_kick, NULL, NULL, NULL,
+                       false, on_vduse_dev_kick, NULL, NULL, NULL,
                        vblk_exp->dev);
 
-    for (i = 0; i < vblk_exp->num_queues; i++) {
-        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
-        int fd = vduse_queue_get_fd(vq);
-
-        if (fd < 0) {
-            continue;
-        }
-        aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
-                           on_vduse_vq_kick, NULL, NULL, NULL, vq);
-    }
+    /* Virtqueues are handled by vduse_blk_drained_end() */
 }
 
 static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 {
-    int i;
-
-    for (i = 0; i < vblk_exp->num_queues; i++) {
-        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
-        int fd = vduse_queue_get_fd(vq);
-
-        if (fd < 0) {
-            continue;
-        }
-        aio_set_fd_handler(vblk_exp->export.ctx, fd,
-                           true, NULL, NULL, NULL, NULL, NULL);
-    }
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       true, NULL, NULL, NULL, NULL, NULL);
+                       false, NULL, NULL, NULL, NULL, NULL);
 
-    AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
+    /* Virtqueues are handled by vduse_blk_drained_begin() */
 }
 
 
@@ -220,8 +218,55 @@ static void vduse_blk_resize(void *opaque)
                             (char *)&config.capacity);
 }
 
+static void vduse_blk_stop_virtqueues(VduseBlkExport *vblk_exp)
+{
+    for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+        vduse_blk_disable_queue(vblk_exp->dev, vq);
+    }
+
+    vblk_exp->vqs_started = false;
+}
+
+static void vduse_blk_start_virtqueues(VduseBlkExport *vblk_exp)
+{
+    vblk_exp->vqs_started = true;
+
+    for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+        vduse_blk_enable_queue(vblk_exp->dev, vq);
+    }
+}
+
+static void vduse_blk_drained_begin(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    vduse_blk_stop_virtqueues(vblk_exp);
+}
+
+static void vduse_blk_drained_end(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    vduse_blk_start_virtqueues(vblk_exp);
+}
+
+static bool vduse_blk_drained_poll(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    return qatomic_read(&vblk_exp->inflight) > 0;
+}
+
 static const BlockDevOps vduse_block_ops = {
-    .resize_cb = vduse_blk_resize,
+    .resize_cb     = vduse_blk_resize,
+    .drained_begin = vduse_blk_drained_begin,
+    .drained_end   = vduse_blk_drained_end,
+    .drained_poll  = vduse_blk_drained_poll,
 };
 
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
@@ -268,6 +313,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     vblk_exp->handler.serial = g_strdup(vblk_opts->serial ?: "");
     vblk_exp->handler.logical_block_size = logical_block_size;
     vblk_exp->handler.writable = opts->writable;
+    vblk_exp->vqs_started = true;
 
     config.capacity =
             cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
@@ -322,14 +368,20 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
     }
 
-    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true,
+    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
                        on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
 
     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                  vblk_exp);
-
     blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
 
+    /*
+     * We handle draining ourselves using an in-flight counter and by disabling
+     * virtqueue fd handlers. Do not queue BlockBackend requests, they need to
+     * complete so the in-flight counter reaches zero.
+     */
+    blk_set_disable_request_queuing(exp->blk, true);
+
     return 0;
 err:
     vduse_dev_destroy(vblk_exp->dev);
@@ -344,6 +396,9 @@ static void vduse_blk_exp_delete(BlockExport *exp)
     VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
     int ret;
 
+    assert(qatomic_read(&vblk_exp->inflight) == 0);
+
+    vduse_blk_detach_ctx(vblk_exp);
     blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                     vblk_exp);
     ret = vduse_dev_destroy(vblk_exp->dev);
@@ -354,13 +409,12 @@ static void vduse_blk_exp_delete(BlockExport *exp)
     g_free(vblk_exp->handler.serial);
 }
 
+/* Called with exp->ctx acquired */
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
 {
     VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
 
-    aio_context_acquire(vblk_exp->export.ctx);
-    vduse_blk_detach_ctx(vblk_exp);
-    aio_context_acquire(vblk_exp->export.ctx);
+    vduse_blk_stop_virtqueues(vblk_exp);
 }
 
 const BlockExportDriver blk_exp_vduse_blk = {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 15/21] block/export: don't require AioContext lock around blk_exp_ref/unref()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (13 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 14/21] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 16/21] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

The FUSE export calls blk_exp_ref/unref() without the AioContext lock.
Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they
work without the AioContext lock. This way it's less error-prone.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/export.h   |  2 ++
 block/export/export.c    | 13 ++++++-------
 block/export/vduse-blk.c |  4 ----
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/block/export.h b/include/block/export.h
index 7feb02e10d..f2fe0f8078 100644
--- a/include/block/export.h
+++ b/include/block/export.h
@@ -57,6 +57,8 @@ struct BlockExport {
      * Reference count for this block export. This includes strong references
      * both from the owner (qemu-nbd or the monitor) and clients connected to
      * the export.
+     *
+     * Use atomics to access this field.
      */
     int refcount;
 
diff --git a/block/export/export.c b/block/export/export.c
index 62c7c22d45..ab007e9d31 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -202,11 +202,10 @@ fail:
     return NULL;
 }
 
-/* Callers must hold exp->ctx lock */
 void blk_exp_ref(BlockExport *exp)
 {
-    assert(exp->refcount > 0);
-    exp->refcount++;
+    assert(qatomic_read(&exp->refcount) > 0);
+    qatomic_inc(&exp->refcount);
 }
 
 /* Runs in the main thread */
@@ -229,11 +228,10 @@ static void blk_exp_delete_bh(void *opaque)
     aio_context_release(aio_context);
 }
 
-/* Callers must hold exp->ctx lock */
 void blk_exp_unref(BlockExport *exp)
 {
-    assert(exp->refcount > 0);
-    if (--exp->refcount == 0) {
+    assert(qatomic_read(&exp->refcount) > 0);
+    if (qatomic_fetch_dec(&exp->refcount) == 1) {
         /* Touch the block_exports list only in the main thread */
         aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh,
                                 exp);
@@ -341,7 +339,8 @@ void qmp_block_export_del(const char *id,
     if (!has_mode) {
         mode = BLOCK_EXPORT_REMOVE_MODE_SAFE;
     }
-    if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE && exp->refcount > 1) {
+    if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE &&
+        qatomic_read(&exp->refcount) > 1) {
         error_setg(errp, "export '%s' still in use", exp->id);
         error_append_hint(errp, "Use mode='hard' to force client "
                           "disconnect\n");
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index a25556fe04..e0455551f9 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -44,9 +44,7 @@ static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
 {
     if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
         /* Prevent export from being deleted */
-        aio_context_acquire(vblk_exp->export.ctx);
         blk_exp_ref(&vblk_exp->export);
-        aio_context_release(vblk_exp->export.ctx);
     }
 }
 
@@ -57,9 +55,7 @@ static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
         aio_wait_kick();
 
         /* Now the export can be deleted */
-        aio_context_acquire(vblk_exp->export.ctx);
         blk_exp_unref(&vblk_exp->export);
-        aio_context_release(vblk_exp->export.ctx);
     }
 }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 16/21] block/fuse: do not set is_external=true on FUSE fd
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (14 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 15/21] block/export: don't require AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 17/21] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

This is part of ongoing work to remove the aio_disable_external() API.

Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).

As a side-effect the FUSE export now follows AioContext changes like the
other export types.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/fuse.c | 56 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 06fa41079e..adf3236b5a 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -50,6 +50,7 @@ typedef struct FuseExport {
 
     struct fuse_session *fuse_session;
     struct fuse_buf fuse_buf;
+    unsigned int in_flight; /* atomic */
     bool mounted, fd_handler_set_up;
 
     char *mountpoint;
@@ -78,6 +79,42 @@ static void read_from_fuse_export(void *opaque);
 static bool is_regular_file(const char *path, Error **errp);
 
 
+static void fuse_export_drained_begin(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    aio_set_fd_handler(exp->common.ctx,
+                       fuse_session_fd(exp->fuse_session), false,
+                       NULL, NULL, NULL, NULL, NULL);
+    exp->fd_handler_set_up = false;
+}
+
+static void fuse_export_drained_end(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    /* Refresh AioContext in case it changed */
+    exp->common.ctx = blk_get_aio_context(exp->common.blk);
+
+    aio_set_fd_handler(exp->common.ctx,
+                       fuse_session_fd(exp->fuse_session), false,
+                       read_from_fuse_export, NULL, NULL, NULL, exp);
+    exp->fd_handler_set_up = true;
+}
+
+static bool fuse_export_drained_poll(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    return qatomic_read(&exp->in_flight) > 0;
+}
+
+static const BlockDevOps fuse_export_blk_dev_ops = {
+    .drained_begin = fuse_export_drained_begin,
+    .drained_end   = fuse_export_drained_end,
+    .drained_poll  = fuse_export_drained_poll,
+};
+
 static int fuse_export_create(BlockExport *blk_exp,
                               BlockExportOptions *blk_exp_args,
                               Error **errp)
@@ -101,6 +138,15 @@ static int fuse_export_create(BlockExport *blk_exp,
         }
     }
 
+    blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp);
+
+    /*
+     * We handle draining ourselves using an in-flight counter and by disabling
+     * the FUSE fd handler. Do not queue BlockBackend requests, they need to
+     * complete so the in-flight counter reaches zero.
+     */
+    blk_set_disable_request_queuing(exp->common.blk, true);
+
     init_exports_table();
 
     /*
@@ -224,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
     g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), true,
+                       fuse_session_fd(exp->fuse_session), false,
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 
@@ -246,6 +292,8 @@ static void read_from_fuse_export(void *opaque)
 
     blk_exp_ref(&exp->common);
 
+    qatomic_inc(&exp->in_flight);
+
     do {
         ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
     } while (ret == -EINTR);
@@ -256,6 +304,10 @@ static void read_from_fuse_export(void *opaque)
     fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
+    if (qatomic_fetch_dec(&exp->in_flight) == 1) {
+        aio_wait_kick(); /* wake AIO_WAIT_WHILE() */
+    }
+
     blk_exp_unref(&exp->common);
 }
 
@@ -268,7 +320,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
 
         if (exp->fd_handler_set_up) {
             aio_set_fd_handler(exp->common.ctx,
-                               fuse_session_fd(exp->fuse_session), true,
+                               fuse_session_fd(exp->fuse_session), false,
                                NULL, NULL, NULL, NULL, NULL);
             exp->fd_handler_set_up = false;
         }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 17/21] virtio: make it possible to detach host notifier from any thread
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (15 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 16/21] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 18/21] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

virtio_queue_aio_detach_host_notifier() does two things:
1. It removes the fd handler from the event loop.
2. It processes the virtqueue one last time.

The first step can be peformed by any thread and without taking the
AioContext lock.

The second step may need the AioContext lock (depending on the device
implementation) and runs in the thread where request processing takes
place. virtio-blk and virtio-scsi therefore call
virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
AioContext

Scheduling a BH is undesirable for .drained_begin() functions. The next
patch will introduce a .drained_begin() function that needs to call
virtio_queue_aio_detach_host_notifier().

Move the virtqueue processing out to the callers of
virtio_queue_aio_detach_host_notifier() so that the function can be
called from any thread. This is in preparation for the next patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 2 ++
 hw/scsi/virtio-scsi-dataplane.c | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index a6202997ee..27eafa6c92 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -287,8 +287,10 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
 
     for (i = 0; i < s->conf->num_queues; i++) {
         VirtQueue *vq = virtio_get_queue(s->vdev, i);
+        EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq);
 
         virtio_queue_aio_detach_host_notifier(vq, s->ctx);
+        virtio_queue_host_notifier_read(host_notifier);
     }
 }
 
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 20bb91766e..81643445ed 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -71,12 +71,21 @@ static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
     VirtIOSCSI *s = opaque;
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    EventNotifier *host_notifier;
     int i;
 
     virtio_queue_aio_detach_host_notifier(vs->ctrl_vq, s->ctx);
+    host_notifier = virtio_queue_get_host_notifier(vs->ctrl_vq);
+    virtio_queue_host_notifier_read(host_notifier);
+
     virtio_queue_aio_detach_host_notifier(vs->event_vq, s->ctx);
+    host_notifier = virtio_queue_get_host_notifier(vs->event_vq);
+    virtio_queue_host_notifier_read(host_notifier);
+
     for (i = 0; i < vs->conf.num_queues; i++) {
         virtio_queue_aio_detach_host_notifier(vs->cmd_vqs[i], s->ctx);
+        host_notifier = virtio_queue_get_host_notifier(vs->cmd_vqs[i]);
+        virtio_queue_host_notifier_read(host_notifier);
     }
 }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 18/21] virtio-blk: implement BlockDevOps->drained_begin()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (16 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 17/21] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 19/21] virtio-scsi: " Stefan Hajnoczi
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Detach ioeventfds during drained sections to stop I/O submission from
the guest. virtio-blk is no longer reliant on aio_disable_external()
after this patch. This will allow us to remove the
aio_disable_external() API once all other code that relies on it is
converted.

Take extra care to avoid attaching/detaching ioeventfds if the data
plane is started/stopped during a drained section. This should be rare,
but maybe the mirror block job can trigger it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 17 +++++++++------
 hw/block/virtio-blk.c           | 38 ++++++++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 27eafa6c92..c0d2663abc 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -246,13 +246,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     }
 
     /* Get this show started by hooking up our callbacks */
-    aio_context_acquire(s->ctx);
-    for (i = 0; i < nvqs; i++) {
-        VirtQueue *vq = virtio_get_queue(s->vdev, i);
+    if (!blk_in_drain(s->conf->conf.blk)) {
+        aio_context_acquire(s->ctx);
+        for (i = 0; i < nvqs; i++) {
+            VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
-        virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+            virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+        }
+        aio_context_release(s->ctx);
     }
-    aio_context_release(s->ctx);
     return 0;
 
   fail_aio_context:
@@ -318,7 +320,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
-    aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
+
+    if (!blk_in_drain(s->conf->conf.blk)) {
+        aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
+    }
 
     /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
     blk_drain(s->conf->conf.blk);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cefca93b31..d8dedc575c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1109,8 +1109,44 @@ static void virtio_blk_resize(void *opaque)
     aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev);
 }
 
+/* Suspend virtqueue ioeventfd processing during drain */
+static void virtio_blk_drained_begin(void *opaque)
+{
+    VirtIOBlock *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+    AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
+
+    if (!s->dataplane || !s->dataplane_started) {
+        return;
+    }
+
+    for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+        VirtQueue *vq = virtio_get_queue(vdev, i);
+        virtio_queue_aio_detach_host_notifier(vq, ctx);
+    }
+}
+
+/* Resume virtqueue ioeventfd processing after drain */
+static void virtio_blk_drained_end(void *opaque)
+{
+    VirtIOBlock *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+    AioContext *ctx = blk_get_aio_context(s->conf.conf.blk);
+
+    if (!s->dataplane || !s->dataplane_started) {
+        return;
+    }
+
+    for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+        VirtQueue *vq = virtio_get_queue(vdev, i);
+        virtio_queue_aio_attach_host_notifier(vq, ctx);
+    }
+}
+
 static const BlockDevOps virtio_block_ops = {
-    .resize_cb = virtio_blk_resize,
+    .resize_cb     = virtio_blk_resize,
+    .drained_begin = virtio_blk_drained_begin,
+    .drained_end   = virtio_blk_drained_end,
 };
 
 static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 19/21] virtio-scsi: implement BlockDevOps->drained_begin()
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (17 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 18/21] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 20/21] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

The virtio-scsi Host Bus Adapter provides access to devices on a SCSI
bus. Those SCSI devices typically have a BlockBackend. When the
BlockBackend enters a drained section, the SCSI device must temporarily
stop submitting new I/O requests.

Implement this behavior by temporarily stopping virtio-scsi virtqueue
processing when one of the SCSI devices enters a drained section. The
new scsi_device_drained_begin() API allows scsi-disk to message the
virtio-scsi HBA.

scsi_device_drained_begin() uses a drain counter so that multiple SCSI
devices can have overlapping drained sections. The HBA only sees one
pair of .drained_begin/end() calls.

After this commit, virtio-scsi no longer depends on hw/virtio's
ioeventfd aio_set_event_notifier(is_external=true). This commit is a
step towards removing the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/scsi/scsi.h          | 14 ++++++++++++
 hw/scsi/scsi-bus.c              | 40 +++++++++++++++++++++++++++++++++
 hw/scsi/scsi-disk.c             | 27 +++++++++++++++++-----
 hw/scsi/virtio-scsi-dataplane.c | 22 ++++++++++--------
 hw/scsi/virtio-scsi.c           | 38 +++++++++++++++++++++++++++++++
 hw/scsi/trace-events            |  2 ++
 6 files changed, 129 insertions(+), 14 deletions(-)

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 6f23a7a73e..e2bb1a2fbf 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -133,6 +133,16 @@ struct SCSIBusInfo {
     void (*save_request)(QEMUFile *f, SCSIRequest *req);
     void *(*load_request)(QEMUFile *f, SCSIRequest *req);
     void (*free_request)(SCSIBus *bus, void *priv);
+
+    /*
+     * Temporarily stop submitting new requests between drained_begin() and
+     * drained_end(). Called from the main loop thread with the BQL held.
+     *
+     * Implement these callbacks if request processing is triggered by a file
+     * descriptor like an EventNotifier. Otherwise set them to NULL.
+     */
+    void (*drained_begin)(SCSIBus *bus);
+    void (*drained_end)(SCSIBus *bus);
 };
 
 #define TYPE_SCSI_BUS "SCSI"
@@ -144,6 +154,8 @@ struct SCSIBus {
 
     SCSISense unit_attention;
     const SCSIBusInfo *info;
+
+    int drain_count; /* protected by BQL */
 };
 
 /**
@@ -213,6 +225,8 @@ void scsi_req_cancel_complete(SCSIRequest *req);
 void scsi_req_cancel(SCSIRequest *req);
 void scsi_req_cancel_async(SCSIRequest *req, Notifier *notifier);
 void scsi_req_retry(SCSIRequest *req);
+void scsi_device_drained_begin(SCSIDevice *sdev);
+void scsi_device_drained_end(SCSIDevice *sdev);
 void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense);
 void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense);
 void scsi_device_report_change(SCSIDevice *dev, SCSISense sense);
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 64013c8a24..f80f4cb4fc 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1669,6 +1669,46 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
     scsi_device_set_ua(sdev, sense);
 }
 
+void scsi_device_drained_begin(SCSIDevice *sdev)
+{
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+    if (!bus) {
+        return;
+    }
+
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+    assert(bus->drain_count < INT_MAX);
+
+    /*
+     * Multiple BlockBackends can be on a SCSIBus and each may begin/end
+     * draining at any time. Keep a counter so HBAs only see begin/end once.
+     */
+    if (bus->drain_count++ == 0) {
+        trace_scsi_bus_drained_begin(bus, sdev);
+        if (bus->info->drained_begin) {
+            bus->info->drained_begin(bus);
+        }
+    }
+}
+
+void scsi_device_drained_end(SCSIDevice *sdev)
+{
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus);
+    if (!bus) {
+        return;
+    }
+
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+    assert(bus->drain_count > 0);
+
+    if (bus->drain_count-- == 1) {
+        trace_scsi_bus_drained_end(bus, sdev);
+        if (bus->info->drained_end) {
+            bus->info->drained_end(bus);
+        }
+    }
+}
+
 static char *scsibus_get_dev_path(DeviceState *dev)
 {
     SCSIDevice *d = SCSI_DEVICE(dev);
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 97c9b1c8cd..e0d79c7966 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2360,6 +2360,20 @@ static void scsi_disk_reset(DeviceState *dev)
     s->qdev.scsi_version = s->qdev.default_scsi_version;
 }
 
+static void scsi_disk_drained_begin(void *opaque)
+{
+    SCSIDiskState *s = opaque;
+
+    scsi_device_drained_begin(&s->qdev);
+}
+
+static void scsi_disk_drained_end(void *opaque)
+{
+    SCSIDiskState *s = opaque;
+
+    scsi_device_drained_end(&s->qdev);
+}
+
 static void scsi_disk_resize_cb(void *opaque)
 {
     SCSIDiskState *s = opaque;
@@ -2414,16 +2428,19 @@ static bool scsi_cd_is_medium_locked(void *opaque)
 }
 
 static const BlockDevOps scsi_disk_removable_block_ops = {
-    .change_media_cb = scsi_cd_change_media_cb,
+    .change_media_cb  = scsi_cd_change_media_cb,
+    .drained_begin    = scsi_disk_drained_begin,
+    .drained_end      = scsi_disk_drained_end,
     .eject_request_cb = scsi_cd_eject_request_cb,
-    .is_tray_open = scsi_cd_is_tray_open,
     .is_medium_locked = scsi_cd_is_medium_locked,
-
-    .resize_cb = scsi_disk_resize_cb,
+    .is_tray_open     = scsi_cd_is_tray_open,
+    .resize_cb        = scsi_disk_resize_cb,
 };
 
 static const BlockDevOps scsi_disk_block_ops = {
-    .resize_cb = scsi_disk_resize_cb,
+    .drained_begin = scsi_disk_drained_begin,
+    .drained_end   = scsi_disk_drained_end,
+    .resize_cb     = scsi_disk_resize_cb,
 };
 
 static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 81643445ed..1060038e13 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -153,14 +153,16 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     s->dataplane_starting = false;
     s->dataplane_started = true;
 
-    aio_context_acquire(s->ctx);
-    virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
-    virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
+    if (s->bus.drain_count == 0) {
+        aio_context_acquire(s->ctx);
+        virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
+        virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
 
-    for (i = 0; i < vs->conf.num_queues; i++) {
-        virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
+        for (i = 0; i < vs->conf.num_queues; i++) {
+            virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx);
+        }
+        aio_context_release(s->ctx);
     }
-    aio_context_release(s->ctx);
     return 0;
 
 fail_host_notifiers:
@@ -206,9 +208,11 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
     }
     s->dataplane_stopping = true;
 
-    aio_context_acquire(s->ctx);
-    aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-    aio_context_release(s->ctx);
+    if (s->bus.drain_count == 0) {
+        aio_context_acquire(s->ctx);
+        aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
+        aio_context_release(s->ctx);
+    }
 
     blk_drain_all(); /* ensure there are no in-flight requests */
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index c1a7ea9ae2..4a8849cc7e 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1117,6 +1117,42 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 }
 
+/* Suspend virtqueue ioeventfd processing during drain */
+static void virtio_scsi_drained_begin(SCSIBus *bus)
+{
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
+                            s->parent_obj.conf.num_queues;
+
+    if (!s->dataplane_started) {
+        return;
+    }
+
+    for (uint32_t i = 0; i < total_queues; i++) {
+        VirtQueue *vq = virtio_get_queue(vdev, i);
+        virtio_queue_aio_detach_host_notifier(vq, s->ctx);
+    }
+}
+
+/* Resume virtqueue ioeventfd processing after drain */
+static void virtio_scsi_drained_end(SCSIBus *bus)
+{
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
+                            s->parent_obj.conf.num_queues;
+
+    if (!s->dataplane_started) {
+        return;
+    }
+
+    for (uint32_t i = 0; i < total_queues; i++) {
+        VirtQueue *vq = virtio_get_queue(vdev, i);
+        virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+    }
+}
+
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .tcq = true,
     .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -1131,6 +1167,8 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .get_sg_list = virtio_scsi_get_sg_list,
     .save_request = virtio_scsi_save_request,
     .load_request = virtio_scsi_load_request,
+    .drained_begin = virtio_scsi_drained_begin,
+    .drained_end = virtio_scsi_drained_end,
 };
 
 void virtio_scsi_common_realize(DeviceState *dev,
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index ab238293f0..bdd4e2c7c7 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -6,6 +6,8 @@ scsi_req_cancel(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_data(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d"
 scsi_req_data_canceled(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d"
 scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d"
+scsi_bus_drained_begin(void *bus, void *sdev) "bus %p sdev %p"
+scsi_bus_drained_end(void *bus, void *sdev) "bus %p sdev %p"
 scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_continue_canceled(int target, int lun, int tag) "target %d lun %d tag %d"
 scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "target %d lun %d tag %d command %d dir %d length %d"
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 20/21] virtio: do not set is_external=true on host notifiers
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (18 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 19/21] virtio-scsi: " Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 19:53 ` [PATCH v5 21/21] aio: remove aio_disable_external() API Stefan Hajnoczi
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Host notifiers can now use is_external=false since virtio-blk and
virtio-scsi no longer rely on is_external=true for drained sections.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/virtio/virtio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 272d930721..9cdad7e550 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true,
+    aio_set_event_notifier(ctx, &vq->host_notifier, false,
                            virtio_queue_host_notifier_read,
                            virtio_queue_host_notifier_aio_poll,
                            virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
 void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true,
+    aio_set_event_notifier(ctx, &vq->host_notifier, false,
                            virtio_queue_host_notifier_read,
                            NULL, NULL);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
     /* Test and clear notifier before after disabling event,
      * in case poll callback didn't have time to run. */
     virtio_queue_host_notifier_read(&vq->host_notifier);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 21/21] aio: remove aio_disable_external() API
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (19 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 20/21] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
@ 2023-05-04 19:53 ` Stefan Hajnoczi
  2023-05-04 21:44 ` [PATCH v5 00/21] block: " Kevin Wolf
  2023-05-09 19:07 ` Kevin Wolf
  22 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-04 19:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ronnie Sahlberg, Aarushi Mehta, qemu-block, Paul Durrant,
	Anthony Perard, Peter Lieven, Stefan Weil, Xie Yongji,
	Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Stefan Hajnoczi, Julia Suvorova, xen-devel, eesposit,
	Juan Quintela, Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h           | 57 ---------------------------
 util/aio-posix.h              |  1 -
 block.c                       |  7 ----
 block/blkio.c                 | 15 +++----
 block/curl.c                  | 10 ++---
 block/export/fuse.c           |  8 ++--
 block/export/vduse-blk.c      | 10 ++---
 block/io.c                    |  2 -
 block/io_uring.c              |  4 +-
 block/iscsi.c                 |  3 +-
 block/linux-aio.c             |  4 +-
 block/nfs.c                   |  5 +--
 block/nvme.c                  |  8 ++--
 block/ssh.c                   |  4 +-
 block/win32-aio.c             |  6 +--
 hw/i386/kvm/xen_xenstore.c    |  2 +-
 hw/virtio/virtio.c            |  6 +--
 hw/xen/xen-bus.c              |  8 ++--
 io/channel-command.c          |  6 +--
 io/channel-file.c             |  3 +-
 io/channel-socket.c           |  3 +-
 migration/rdma.c              | 16 ++++----
 tests/unit/test-aio.c         | 27 +------------
 tests/unit/test-bdrv-drain.c  |  1 -
 tests/unit/test-fdmon-epoll.c | 73 -----------------------------------
 util/aio-posix.c              | 20 +++-------
 util/aio-win32.c              |  8 +---
 util/async.c                  |  3 +-
 util/fdmon-epoll.c            | 18 +++------
 util/fdmon-io_uring.c         |  8 +---
 util/fdmon-poll.c             |  3 +-
 util/main-loop.c              |  7 ++--
 util/qemu-coroutine-io.c      |  7 ++--
 util/vhost-user-server.c      | 11 +++---
 tests/unit/meson.build        |  3 --
 35 files changed, 82 insertions(+), 295 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

diff --git a/include/block/aio.h b/include/block/aio.h
index 89bbc536f9..32042e8905 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -225,8 +225,6 @@ struct AioContext {
      */
     QEMUTimerListGroup tlg;
 
-    int external_disable_cnt;
-
     /* Number of AioHandlers without .io_poll() */
     int poll_disable_cnt;
 
@@ -481,7 +479,6 @@ bool aio_poll(AioContext *ctx, bool blocking);
  */
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -497,7 +494,6 @@ void aio_set_fd_handler(AioContext *ctx,
  */
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *notifier,
-                            bool is_external,
                             EventNotifierHandler *io_read,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready);
@@ -626,59 +622,6 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
-/**
- * aio_disable_external:
- * @ctx: the aio context
- *
- * Disable the further processing of external clients.
- */
-static inline void aio_disable_external(AioContext *ctx)
-{
-    qatomic_inc(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_enable_external:
- * @ctx: the aio context
- *
- * Enable the processing of external clients.
- */
-static inline void aio_enable_external(AioContext *ctx)
-{
-    int old;
-
-    old = qatomic_fetch_dec(&ctx->external_disable_cnt);
-    assert(old > 0);
-    if (old == 1) {
-        /* Kick event loop so it re-arms file descriptors */
-        aio_notify(ctx);
-    }
-}
-
-/**
- * aio_external_disabled:
- * @ctx: the aio context
- *
- * Return true if the external clients are disabled.
- */
-static inline bool aio_external_disabled(AioContext *ctx)
-{
-    return qatomic_read(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_node_check:
- * @ctx: the aio context
- * @is_external: Whether or not the checked node is an external event source.
- *
- * Check if the node's is_external flag is okay to be polled by the ctx at this
- * moment. True means green light.
- */
-static inline bool aio_node_check(AioContext *ctx, bool is_external)
-{
-    return !is_external || !qatomic_read(&ctx->external_disable_cnt);
-}
-
 /**
  * aio_co_schedule:
  * @ctx: the aio context
diff --git a/util/aio-posix.h b/util/aio-posix.h
index 80b927c7f4..4264c518be 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -38,7 +38,6 @@ struct AioHandler {
 #endif
     int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool poll_ready; /* has polling detected an event? */
-    bool is_external;
 };
 
 /* Add a handler to a ready list */
diff --git a/block.c b/block.c
index 5ec1a3897e..6239c8bc07 100644
--- a/block.c
+++ b/block.c
@@ -7268,9 +7268,6 @@ static void bdrv_detach_aio_context(BlockDriverState *bs)
         bs->drv->bdrv_detach_aio_context(bs);
     }
 
-    if (bs->quiesce_counter) {
-        aio_enable_external(bs->aio_context);
-    }
     bs->aio_context = NULL;
 }
 
@@ -7280,10 +7277,6 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
     BdrvAioNotifier *ban, *ban_tmp;
     GLOBAL_STATE_CODE();
 
-    if (bs->quiesce_counter) {
-        aio_disable_external(new_context);
-    }
-
     bs->aio_context = new_context;
 
     if (bs->drv && bs->drv->bdrv_attach_aio_context) {
diff --git a/block/blkio.c b/block/blkio.c
index 0cdc99a729..72117fa005 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -306,23 +306,18 @@ static void blkio_attach_aio_context(BlockDriverState *bs,
 {
     BDRVBlkioState *s = bs->opaque;
 
-    aio_set_fd_handler(new_context,
-                       s->completion_fd,
-                       false,
-                       blkio_completion_fd_read,
-                       NULL,
+    aio_set_fd_handler(new_context, s->completion_fd,
+                       blkio_completion_fd_read, NULL,
                        blkio_completion_fd_poll,
-                       blkio_completion_fd_poll_ready,
-                       bs);
+                       blkio_completion_fd_poll_ready, bs);
 }
 
 static void blkio_detach_aio_context(BlockDriverState *bs)
 {
     BDRVBlkioState *s = bs->opaque;
 
-    aio_set_fd_handler(bdrv_get_aio_context(bs),
-                       s->completion_fd,
-                       false, NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->completion_fd, NULL, NULL,
+                       NULL, NULL, NULL);
 }
 
 /* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
diff --git a/block/curl.c b/block/curl.c
index 8bb39a134e..0fc42d03d7 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -132,7 +132,7 @@ static gboolean curl_drop_socket(void *key, void *value, void *opaque)
     CURLSocket *socket = value;
     BDRVCURLState *s = socket->s;
 
-    aio_set_fd_handler(s->aio_context, socket->fd, false,
+    aio_set_fd_handler(s->aio_context, socket->fd,
                        NULL, NULL, NULL, NULL, NULL);
     return true;
 }
@@ -180,20 +180,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
     trace_curl_sock_cb(action, (int)fd);
     switch (action) {
         case CURL_POLL_IN:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                curl_multi_do, NULL, NULL, NULL, socket);
             break;
         case CURL_POLL_OUT:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                NULL, curl_multi_do, NULL, NULL, socket);
             break;
         case CURL_POLL_INOUT:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                curl_multi_do, curl_multi_do,
                                NULL, NULL, socket);
             break;
         case CURL_POLL_REMOVE:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                NULL, NULL, NULL, NULL, NULL);
             break;
     }
diff --git a/block/export/fuse.c b/block/export/fuse.c
index adf3236b5a..3307b64089 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -84,7 +84,7 @@ static void fuse_export_drained_begin(void *opaque)
     FuseExport *exp = opaque;
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        NULL, NULL, NULL, NULL, NULL);
     exp->fd_handler_set_up = false;
 }
@@ -97,7 +97,7 @@ static void fuse_export_drained_end(void *opaque)
     exp->common.ctx = blk_get_aio_context(exp->common.blk);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 }
@@ -270,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
     g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 
@@ -320,7 +320,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
 
         if (exp->fd_handler_set_up) {
             aio_set_fd_handler(exp->common.ctx,
-                               fuse_session_fd(exp->fuse_session), false,
+                               fuse_session_fd(exp->fuse_session),
                                NULL, NULL, NULL, NULL, NULL);
             exp->fd_handler_set_up = false;
         }
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e0455551f9..83b05548e7 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -137,7 +137,7 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
     }
 
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+                       on_vduse_vq_kick, NULL, NULL, NULL, vq);
     /* Make sure we don't miss any kick afer reconnecting */
     eventfd_write(vduse_queue_get_fd(vq), 1);
 }
@@ -151,7 +151,7 @@ static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
         return;
     }
 
-    aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+    aio_set_fd_handler(vblk_exp->export.ctx, fd,
                        NULL, NULL, NULL, NULL, NULL);
 }
 
@@ -170,7 +170,7 @@ static void on_vduse_dev_kick(void *opaque)
 static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 {
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       false, on_vduse_dev_kick, NULL, NULL, NULL,
+                       on_vduse_dev_kick, NULL, NULL, NULL,
                        vblk_exp->dev);
 
     /* Virtqueues are handled by vduse_blk_drained_end() */
@@ -179,7 +179,7 @@ static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 {
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
 
     /* Virtqueues are handled by vduse_blk_drained_begin() */
 }
@@ -364,7 +364,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
     }
 
-    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
+    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev),
                        on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
 
     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
diff --git a/block/io.c b/block/io.c
index 532c8c90c9..9f82d1009e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -363,7 +363,6 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
 
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
-        aio_disable_external(bdrv_get_aio_context(bs));
         bdrv_parent_drained_begin(bs, parent);
         if (bs->drv && bs->drv->bdrv_drain_begin) {
             bs->drv->bdrv_drain_begin(bs);
@@ -419,7 +418,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
             bs->drv->bdrv_drain_end(bs);
         }
         bdrv_parent_drained_end(bs, parent);
-        aio_enable_external(bdrv_get_aio_context(bs));
     }
 }
 
diff --git a/block/io_uring.c b/block/io_uring.c
index 989f9a99ed..b64a3e6285 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -406,7 +406,7 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
 
 void luring_detach_aio_context(LuringState *s, AioContext *old_context)
 {
-    aio_set_fd_handler(old_context, s->ring.ring_fd, false,
+    aio_set_fd_handler(old_context, s->ring.ring_fd,
                        NULL, NULL, NULL, NULL, s);
     qemu_bh_delete(s->completion_bh);
     s->aio_context = NULL;
@@ -416,7 +416,7 @@ void luring_attach_aio_context(LuringState *s, AioContext *new_context)
 {
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
-    aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
+    aio_set_fd_handler(s->aio_context, s->ring.ring_fd,
                        qemu_luring_completion_cb, NULL,
                        qemu_luring_poll_cb, qemu_luring_poll_ready, s);
 }
diff --git a/block/iscsi.c b/block/iscsi.c
index 9fc0bed90b..34f97ab646 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -363,7 +363,6 @@ iscsi_set_events(IscsiLun *iscsilun)
 
     if (ev != iscsilun->events) {
         aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsi),
-                           false,
                            (ev & POLLIN) ? iscsi_process_read : NULL,
                            (ev & POLLOUT) ? iscsi_process_write : NULL,
                            NULL, NULL,
@@ -1540,7 +1539,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
     IscsiLun *iscsilun = bs->opaque;
 
     aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
     iscsilun->events = 0;
 
     if (iscsilun->nop_timer) {
diff --git a/block/linux-aio.c b/block/linux-aio.c
index fc50cdd1bf..129908531a 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -443,7 +443,7 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
-    aio_set_event_notifier(old_context, &s->e, false, NULL, NULL, NULL);
+    aio_set_event_notifier(old_context, &s->e, NULL, NULL, NULL);
     qemu_bh_delete(s->completion_bh);
     s->aio_context = NULL;
 }
@@ -452,7 +452,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
 {
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
-    aio_set_event_notifier(new_context, &s->e, false,
+    aio_set_event_notifier(new_context, &s->e,
                            qemu_laio_completion_cb,
                            qemu_laio_poll_cb,
                            qemu_laio_poll_ready);
diff --git a/block/nfs.c b/block/nfs.c
index 006045d71a..8f89ece69f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -195,7 +195,6 @@ static void nfs_set_events(NFSClient *client)
     int ev = nfs_which_events(client->context);
     if (ev != client->events) {
         aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                           false,
                            (ev & POLLIN) ? nfs_process_read : NULL,
                            (ev & POLLOUT) ? nfs_process_write : NULL,
                            NULL, NULL, client);
@@ -373,7 +372,7 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
 
     aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
     client->events = 0;
 }
 
@@ -391,7 +390,7 @@ static void nfs_client_close(NFSClient *client)
     if (client->context) {
         qemu_mutex_lock(&client->mutex);
         aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                           false, NULL, NULL, NULL, NULL, NULL);
+                           NULL, NULL, NULL, NULL, NULL);
         qemu_mutex_unlock(&client->mutex);
         if (client->fh) {
             nfs_close(client->context, client->fh);
diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..17937d398d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -862,7 +862,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     }
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, nvme_handle_event, nvme_poll_cb,
+                           nvme_handle_event, nvme_poll_cb,
                            nvme_poll_ready);
 
     if (!nvme_identify(bs, namespace, errp)) {
@@ -948,7 +948,7 @@ static void nvme_close(BlockDriverState *bs)
     g_free(s->queues);
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, NULL, NULL, NULL);
+                           NULL, NULL, NULL);
     event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
     qemu_vfio_pci_unmap_bar(s->vfio, 0, s->bar0_wo_map,
                             0, sizeof(NvmeBar) + NVME_DOORBELL_SIZE);
@@ -1546,7 +1546,7 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
 
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, NULL, NULL, NULL);
+                           NULL, NULL, NULL);
 }
 
 static void nvme_attach_aio_context(BlockDriverState *bs,
@@ -1556,7 +1556,7 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 
     s->aio_context = new_context;
     aio_set_event_notifier(new_context, &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, nvme_handle_event, nvme_poll_cb,
+                           nvme_handle_event, nvme_poll_cb,
                            nvme_poll_ready);
 
     for (unsigned i = 0; i < s->queue_count; i++) {
diff --git a/block/ssh.c b/block/ssh.c
index b3b3352075..2748253d4a 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1019,7 +1019,7 @@ static void restart_coroutine(void *opaque)
     AioContext *ctx = bdrv_get_aio_context(bs);
 
     trace_ssh_restart_coroutine(restart->co);
-    aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(ctx, s->sock, NULL, NULL, NULL, NULL, NULL);
 
     aio_co_wake(restart->co);
 }
@@ -1049,7 +1049,7 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
     trace_ssh_co_yield(s->sock, rd_handler, wr_handler);
 
     aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, rd_handler, wr_handler, NULL, NULL, &restart);
+                       rd_handler, wr_handler, NULL, NULL, &restart);
     qemu_coroutine_yield();
     trace_ssh_co_yield_back(s->sock);
 }
diff --git a/block/win32-aio.c b/block/win32-aio.c
index ee87d6048f..6327861e1d 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -174,7 +174,7 @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile)
 void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *old_context)
 {
-    aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL, NULL);
+    aio_set_event_notifier(old_context, &aio->e, NULL, NULL, NULL);
     aio->aio_ctx = NULL;
 }
 
@@ -182,8 +182,8 @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *new_context)
 {
     aio->aio_ctx = new_context;
-    aio_set_event_notifier(new_context, &aio->e, false,
-                           win32_aio_completion_cb, NULL, NULL);
+    aio_set_event_notifier(new_context, &aio->e, win32_aio_completion_cb,
+                           NULL, NULL);
 }
 
 QEMUWin32AIOState *win32_aio_init(void)
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 6e81bc8791..0b189c6ab8 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Xenstore evtchn port init failed");
         return;
     }
-    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
+    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh),
                        xen_xenstore_event, NULL, NULL, NULL, s);
 
     s->impl = xs_impl_create(xen_domid);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9cdad7e550..d48e240c37 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false,
+    aio_set_event_notifier(ctx, &vq->host_notifier,
                            virtio_queue_host_notifier_read,
                            virtio_queue_host_notifier_aio_poll,
                            virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
 void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false,
+    aio_set_event_notifier(ctx, &vq->host_notifier,
                            virtio_queue_host_notifier_read,
                            NULL, NULL);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL);
     /* Test and clear notifier before after disabling event,
      * in case poll callback didn't have time to run. */
     virtio_queue_host_notifier_read(&vq->host_notifier);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index bf256d4da2..1e08cf027a 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,14 +842,14 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
     }
 
     if (channel->ctx)
-        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
                            NULL, NULL, NULL, NULL, NULL);
 
     channel->ctx = ctx;
     if (ctx) {
         aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
-                           false, xen_device_event, NULL, xen_device_poll,
-                           NULL, channel);
+                           xen_device_event, NULL, xen_device_poll, NULL,
+                           channel);
     }
 }
 
@@ -923,7 +923,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
 
     QLIST_REMOVE(channel, list);
 
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
                        NULL, NULL, NULL, NULL, NULL);
 
     if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
diff --git a/io/channel-command.c b/io/channel-command.c
index e7edd091af..7ed726c802 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -337,10 +337,8 @@ static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
                                                    void *opaque)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
-    aio_set_fd_handler(ctx, cioc->readfd, false,
-                       io_read, NULL, NULL, NULL, opaque);
-    aio_set_fd_handler(ctx, cioc->writefd, false,
-                       NULL, io_write, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, cioc->readfd, io_read, NULL, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, cioc->writefd, NULL, io_write, NULL, NULL, opaque);
 }
 
 
diff --git a/io/channel-file.c b/io/channel-file.c
index d76663e6ae..8b5821f452 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -198,8 +198,7 @@ static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc,
                                                 void *opaque)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
-    aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write,
-                       NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, fioc->fd, io_read, io_write, NULL, NULL, opaque);
 }
 
 static GSource *qio_channel_file_create_watch(QIOChannel *ioc,
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b0ea7d48b3..d99945ebec 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -899,8 +899,7 @@ static void qio_channel_socket_set_aio_fd_handler(QIOChannel *ioc,
                                                   void *opaque)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
-    aio_set_fd_handler(ctx, sioc->fd, false,
-                       io_read, io_write, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, sioc->fd, io_read, io_write, NULL, NULL, opaque);
 }
 
 static GSource *qio_channel_socket_create_watch(QIOChannel *ioc,
diff --git a/migration/rdma.c b/migration/rdma.c
index 7e747b2595..afb8761a06 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3110,15 +3110,15 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     if (io_read) {
-        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
-        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
     } else {
-        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
-        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
     }
 }
 
diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 321d7ab01a..519440eed3 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -130,7 +130,7 @@ static void *test_acquire_thread(void *opaque)
 static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
                                EventNotifierHandler *handler)
 {
-    aio_set_event_notifier(ctx, notifier, false, handler, NULL, NULL);
+    aio_set_event_notifier(ctx, notifier, handler, NULL, NULL);
 }
 
 static void dummy_notifier_read(EventNotifier *n)
@@ -383,30 +383,6 @@ static void test_flush_event_notifier(void)
     event_notifier_cleanup(&data.e);
 }
 
-static void test_aio_external_client(void)
-{
-    int i, j;
-
-    for (i = 1; i < 3; i++) {
-        EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
-        event_notifier_init(&data.e, false);
-        aio_set_event_notifier(ctx, &data.e, true, event_ready_cb, NULL, NULL);
-        event_notifier_set(&data.e);
-        for (j = 0; j < i; j++) {
-            aio_disable_external(ctx);
-        }
-        for (j = 0; j < i; j++) {
-            assert(!aio_poll(ctx, false));
-            assert(event_notifier_test_and_clear(&data.e));
-            event_notifier_set(&data.e);
-            aio_enable_external(ctx);
-        }
-        assert(aio_poll(ctx, false));
-        set_event_notifier(ctx, &data.e, NULL);
-        event_notifier_cleanup(&data.e);
-    }
-}
-
 static void test_wait_event_notifier_noflush(void)
 {
     EventNotifierTestData data = { .n = 0 };
@@ -935,7 +911,6 @@ int main(int argc, char **argv)
     g_test_add_func("/aio/event/wait",              test_wait_event_notifier);
     g_test_add_func("/aio/event/wait/no-flush-cb",  test_wait_event_notifier_noflush);
     g_test_add_func("/aio/event/flush",             test_flush_event_notifier);
-    g_test_add_func("/aio/external-client",         test_aio_external_client);
     g_test_add_func("/aio/timer/schedule",          test_timer_schedule);
 
     g_test_add_func("/aio/coroutine/queue-chaining", test_queue_chaining);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index dc3cb9e0e3..9cac61c2bf 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -435,7 +435,6 @@ static void test_graph_change_drain_all(void)
 
     g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
     g_assert_cmpint(b_s->drain_count, ==, 0);
-    g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
 
     bdrv_unref(bs_b);
     blk_unref(blk_b);
diff --git a/tests/unit/test-fdmon-epoll.c b/tests/unit/test-fdmon-epoll.c
deleted file mode 100644
index ef5a856d09..0000000000
--- a/tests/unit/test-fdmon-epoll.c
+++ /dev/null
@@ -1,73 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * fdmon-epoll tests
- *
- * Copyright (c) 2020 Red Hat, Inc.
- */
-
-#include "qemu/osdep.h"
-#include "block/aio.h"
-#include "qapi/error.h"
-#include "qemu/main-loop.h"
-
-static AioContext *ctx;
-
-static void dummy_fd_handler(EventNotifier *notifier)
-{
-    event_notifier_test_and_clear(notifier);
-}
-
-static void add_event_notifiers(EventNotifier *notifiers, size_t n)
-{
-    for (size_t i = 0; i < n; i++) {
-        event_notifier_init(&notifiers[i], false);
-        aio_set_event_notifier(ctx, &notifiers[i], false,
-                               dummy_fd_handler, NULL, NULL);
-    }
-}
-
-static void remove_event_notifiers(EventNotifier *notifiers, size_t n)
-{
-    for (size_t i = 0; i < n; i++) {
-        aio_set_event_notifier(ctx, &notifiers[i], false, NULL, NULL, NULL);
-        event_notifier_cleanup(&notifiers[i]);
-    }
-}
-
-/* Check that fd handlers work when external clients are disabled */
-static void test_external_disabled(void)
-{
-    EventNotifier notifiers[100];
-
-    /* fdmon-epoll is only enabled when many fd handlers are registered */
-    add_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-
-    event_notifier_set(&notifiers[0]);
-    assert(aio_poll(ctx, true));
-
-    aio_disable_external(ctx);
-    event_notifier_set(&notifiers[0]);
-    assert(aio_poll(ctx, true));
-    aio_enable_external(ctx);
-
-    remove_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-}
-
-int main(int argc, char **argv)
-{
-    /*
-     * This code relies on the fact that fdmon-io_uring disables itself when
-     * the glib main loop is in use. The main loop uses fdmon-poll and upgrades
-     * to fdmon-epoll when the number of fds exceeds a threshold.
-     */
-    qemu_init_main_loop(&error_fatal);
-    ctx = qemu_get_aio_context();
-
-    while (g_main_context_iteration(NULL, false)) {
-        /* Do nothing */
-    }
-
-    g_test_init(&argc, &argv, NULL);
-    g_test_add_func("/fdmon-epoll/external-disabled", test_external_disabled);
-    return g_test_run();
-}
diff --git a/util/aio-posix.c b/util/aio-posix.c
index a8be940f76..934b1bbb85 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -99,7 +99,6 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -144,7 +143,6 @@ void aio_set_fd_handler(AioContext *ctx,
         new_node->io_poll = io_poll;
         new_node->io_poll_ready = io_poll_ready;
         new_node->opaque = opaque;
-        new_node->is_external = is_external;
 
         if (is_new) {
             new_node->pfd.fd = fd;
@@ -196,12 +194,11 @@ static void aio_set_fd_poll(AioContext *ctx, int fd,
 
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *notifier,
-                            bool is_external,
                             EventNotifierHandler *io_read,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready)
 {
-    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
+    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
                        (IOHandler *)io_read, NULL, io_poll,
                        (IOHandler *)io_poll_ready, notifier);
 }
@@ -285,13 +282,11 @@ bool aio_pending(AioContext *ctx)
 
         /* TODO should this check poll ready? */
         revents = node->pfd.revents & node->pfd.events;
-        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
-            aio_node_check(ctx, node->is_external)) {
+        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
             result = true;
             break;
         }
-        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
-            aio_node_check(ctx, node->is_external)) {
+        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
             result = true;
             break;
         }
@@ -350,9 +345,7 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
         QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
     }
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
-        poll_ready && revents == 0 &&
-        aio_node_check(ctx, node->is_external) &&
-        node->io_poll_ready) {
+        poll_ready && revents == 0 && node->io_poll_ready) {
         node->io_poll_ready(node->opaque);
 
         /*
@@ -364,7 +357,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
 
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
         (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
-        aio_node_check(ctx, node->is_external) &&
         node->io_read) {
         node->io_read(node->opaque);
 
@@ -375,7 +367,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
     }
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
         (revents & (G_IO_OUT | G_IO_ERR)) &&
-        aio_node_check(ctx, node->is_external) &&
         node->io_write) {
         node->io_write(node->opaque);
         progress = true;
@@ -436,8 +427,7 @@ static bool run_poll_handlers_once(AioContext *ctx,
     AioHandler *tmp;
 
     QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) {
-        if (aio_node_check(ctx, node->is_external) &&
-            node->io_poll(node->opaque)) {
+        if (node->io_poll(node->opaque)) {
             aio_add_poll_ready_handler(ready_list, node);
 
             node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS;
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 6bded009a4..948ef47a4d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -32,7 +32,6 @@ struct AioHandler {
     GPollFD pfd;
     int deleted;
     void *opaque;
-    bool is_external;
     QLIST_ENTRY(AioHandler) node;
 };
 
@@ -64,7 +63,6 @@ static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -111,7 +109,6 @@ void aio_set_fd_handler(AioContext *ctx,
         node->opaque = opaque;
         node->io_read = io_read;
         node->io_write = io_write;
-        node->is_external = is_external;
 
         if (io_read) {
             bitmask |= FD_READ | FD_ACCEPT | FD_CLOSE;
@@ -135,7 +132,6 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *e,
-                            bool is_external,
                             EventNotifierHandler *io_notify,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready)
@@ -161,7 +157,6 @@ void aio_set_event_notifier(AioContext *ctx,
             node->e = e;
             node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
             node->pfd.events = G_IO_IN;
-            node->is_external = is_external;
             QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
 
             g_source_add_poll(&ctx->source, &node->pfd);
@@ -368,8 +363,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* fill fd sets */
     count = 0;
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!node->deleted && node->io_notify
-            && aio_node_check(ctx, node->is_external)) {
+        if (!node->deleted && node->io_notify) {
             assert(count < MAXIMUM_WAIT_OBJECTS);
             events[count++] = event_notifier_get_handle(node->e);
         }
diff --git a/util/async.c b/util/async.c
index 055070ffbd..8f90ddc304 100644
--- a/util/async.c
+++ b/util/async.c
@@ -409,7 +409,7 @@ aio_ctx_finalize(GSource     *source)
         g_free(bh);
     }
 
-    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL, NULL);
     event_notifier_cleanup(&ctx->notifier);
     qemu_rec_mutex_destroy(&ctx->lock);
     qemu_lockcnt_destroy(&ctx->list_lock);
@@ -593,7 +593,6 @@ AioContext *aio_context_new(Error **errp)
     QSLIST_INIT(&ctx->scheduled_coroutines);
 
     aio_set_event_notifier(ctx, &ctx->notifier,
-                           false,
                            aio_context_notifier_cb,
                            aio_context_notifier_poll,
                            aio_context_notifier_poll_ready);
diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index 1683aa1105..6b6a1a91f8 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -64,11 +64,6 @@ static int fdmon_epoll_wait(AioContext *ctx, AioHandlerList *ready_list,
     int i, ret = 0;
     struct epoll_event events[128];
 
-    /* Fall back while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
-    }
-
     if (timeout > 0) {
         ret = qemu_poll_ns(&pfd, 1, timeout);
         if (ret > 0) {
@@ -133,13 +128,12 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
         return false;
     }
 
-    /* Do not upgrade while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return false;
-    }
-
-    if (npfd < EPOLL_ENABLE_THRESHOLD) {
-        return false;
+    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+        if (fdmon_epoll_try_enable(ctx)) {
+            return true;
+        } else {
+            fdmon_epoll_disable(ctx);
+        }
     }
 
     /* The list must not change while we add fds to epoll */
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index ab43052dd7..17ec18b7bd 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -276,11 +276,6 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
     unsigned wait_nr = 1; /* block until at least one cqe is ready */
     int ret;
 
-    /* Fall back while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
-    }
-
     if (timeout == 0) {
         wait_nr = 0; /* non-blocking */
     } else if (timeout > 0) {
@@ -315,8 +310,7 @@ static bool fdmon_io_uring_need_wait(AioContext *ctx)
         return true;
     }
 
-    /* Are we falling back to fdmon-poll? */
-    return qatomic_read(&ctx->external_disable_cnt);
+    return false;
 }
 
 static const FDMonOps fdmon_io_uring_ops = {
diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
index 5fe3b47865..17df917cf9 100644
--- a/util/fdmon-poll.c
+++ b/util/fdmon-poll.c
@@ -65,8 +65,7 @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list,
     assert(npfd == 0);
 
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
-                && aio_node_check(ctx, node->is_external)) {
+        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) {
             add_pollfd(node);
         }
     }
diff --git a/util/main-loop.c b/util/main-loop.c
index 7022f02ef8..014c795916 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -644,14 +644,13 @@ void qemu_set_fd_handler(int fd,
                          void *opaque)
 {
     iohandler_init();
-    aio_set_fd_handler(iohandler_ctx, fd, false,
-                       fd_read, fd_write, NULL, NULL, opaque);
+    aio_set_fd_handler(iohandler_ctx, fd, fd_read, fd_write, NULL, NULL,
+                       opaque);
 }
 
 void event_notifier_set_handler(EventNotifier *e,
                                 EventNotifierHandler *handler)
 {
     iohandler_init();
-    aio_set_event_notifier(iohandler_ctx, e, false,
-                           handler, NULL, NULL);
+    aio_set_event_notifier(iohandler_ctx, e, handler, NULL, NULL);
 }
diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c
index d791932d63..364f4d5abf 100644
--- a/util/qemu-coroutine-io.c
+++ b/util/qemu-coroutine-io.c
@@ -74,8 +74,7 @@ typedef struct {
 static void fd_coroutine_enter(void *opaque)
 {
     FDYieldUntilData *data = opaque;
-    aio_set_fd_handler(data->ctx, data->fd, false,
-                       NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(data->ctx, data->fd, NULL, NULL, NULL, NULL, NULL);
     qemu_coroutine_enter(data->co);
 }
 
@@ -87,7 +86,7 @@ void coroutine_fn yield_until_fd_readable(int fd)
     data.ctx = qemu_get_current_aio_context();
     data.co = qemu_coroutine_self();
     data.fd = fd;
-    aio_set_fd_handler(
-        data.ctx, fd, false, fd_coroutine_enter, NULL, NULL, NULL, &data);
+    aio_set_fd_handler(data.ctx, fd, fd_coroutine_enter, NULL, NULL, NULL,
+                       &data);
     qemu_coroutine_yield();
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index a12b2d1bba..cd17fb5326 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
         vu_fd_watch->fd = fd;
         vu_fd_watch->cb = cb;
         qemu_socket_set_nonblock(fd);
-        aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
+        aio_set_fd_handler(server->ioc->ctx, fd, kick_handler,
                            NULL, NULL, NULL, vu_fd_watch);
         vu_fd_watch->vu_dev = vu_dev;
         vu_fd_watch->pvt = pvt;
@@ -299,8 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
     if (!vu_fd_watch) {
         return;
     }
-    aio_set_fd_handler(server->ioc->ctx, fd, false,
-                       NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(server->ioc->ctx, fd, NULL, NULL, NULL, NULL, NULL);
 
     QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
     g_free(vu_fd_watch);
@@ -362,7 +361,7 @@ void vhost_user_server_stop(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
@@ -403,7 +402,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
     qio_channel_attach_aio_context(server->ioc, ctx);
 
     QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-        aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
+        aio_set_fd_handler(ctx, vu_fd_watch->fd, kick_handler, NULL,
                            NULL, NULL, vu_fd_watch);
     }
 
@@ -417,7 +416,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 3bc78d8660..b33298a444 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -122,9 +122,6 @@ if have_block
   if nettle.found() or gcrypt.found()
     tests += {'test-crypto-pbkdf': [io]}
   endif
-  if config_host_data.get('CONFIG_EPOLL_CREATE1')
-    tests += {'test-fdmon-epoll': [testblock]}
-  endif
 endif
 
 if have_system
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 00/21] block: remove aio_disable_external() API
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (20 preceding siblings ...)
  2023-05-04 19:53 ` [PATCH v5 21/21] aio: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-05-04 21:44 ` Kevin Wolf
  2023-05-09 17:51   ` Stefan Hajnoczi
  2023-05-09 19:07 ` Kevin Wolf
  22 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2023-05-04 21:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> v5:
> - Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
> - Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
>   before unrealizing the SCSIDevice [Kevin]
> - Keep vhost-user-blk export .detach() callback so ctx is set to NULL [Kevin]
> - Narrow BdrvChildClass and BlockDriver drained_{begin/end/poll} callbacks from
>   IO_OR_GS_CODE() to GLOBAL_STATE_CODE() [Kevin]
> - Include Kevin's "block: Fix use after free in blockdev_mark_auto_del()" to
>   fix a latent bug that was exposed by this series

I only just finished reviewing v4 when you had already sent v5, but it
hadn't arrived yet. I had a few more comments on what are now patches
17, 18, 19 and 21 in v5. I think they all still apply.

Kevin



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 00/21] block: remove aio_disable_external() API
  2023-05-04 21:44 ` [PATCH v5 00/21] block: " Kevin Wolf
@ 2023-05-09 17:51   ` Stefan Hajnoczi
  2023-05-09 18:35     ` Kevin Wolf
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-09 17:51 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

On Thu, May 04, 2023 at 11:44:42PM +0200, Kevin Wolf wrote:
> Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> > v5:
> > - Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
> > - Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
> >   before unrealizing the SCSIDevice [Kevin]
> > - Keep vhost-user-blk export .detach() callback so ctx is set to NULL [Kevin]
> > - Narrow BdrvChildClass and BlockDriver drained_{begin/end/poll} callbacks from
> >   IO_OR_GS_CODE() to GLOBAL_STATE_CODE() [Kevin]
> > - Include Kevin's "block: Fix use after free in blockdev_mark_auto_del()" to
> >   fix a latent bug that was exposed by this series
> 
> I only just finished reviewing v4 when you had already sent v5, but it
> hadn't arrived yet. I had a few more comments on what are now patches
> 17, 18, 19 and 21 in v5. I think they all still apply.

I'm not sure which comments from v4 still apply. In my email client all
your replies were already read when I sent v5.

Maybe you can share the Message-Id of something I still need to address?

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 00/21] block: remove aio_disable_external() API
  2023-05-09 17:51   ` Stefan Hajnoczi
@ 2023-05-09 18:35     ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2023-05-09 18:35 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

Am 09.05.2023 um 19:51 hat Stefan Hajnoczi geschrieben:
> On Thu, May 04, 2023 at 11:44:42PM +0200, Kevin Wolf wrote:
> > Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> > > v5:
> > > - Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
> > > - Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
> > >   before unrealizing the SCSIDevice [Kevin]
> > > - Keep vhost-user-blk export .detach() callback so ctx is set to NULL [Kevin]
> > > - Narrow BdrvChildClass and BlockDriver drained_{begin/end/poll} callbacks from
> > >   IO_OR_GS_CODE() to GLOBAL_STATE_CODE() [Kevin]
> > > - Include Kevin's "block: Fix use after free in blockdev_mark_auto_del()" to
> > >   fix a latent bug that was exposed by this series
> > 
> > I only just finished reviewing v4 when you had already sent v5, but it
> > hadn't arrived yet. I had a few more comments on what are now patches
> > 17, 18, 19 and 21 in v5. I think they all still apply.
> 
> I'm not sure which comments from v4 still apply. In my email client all
> your replies were already read when I sent v5.

Yes, but I added some more replies after you had sent v5 (and before I
fetched mail again to actually see v5).

> Maybe you can share the Message-Id of something I still need to address?

I thought the patch numbers identified them and were easier, but sure:

Message-ID: <ZFQc89cFJuoGF+qI@redhat.com>
Message-ID: <ZFQgBvWShB4NCymj@redhat.com>
Message-ID: <ZFQivbkVPcX3nECA@redhat.com>
Message-ID: <ZFQk2TdhZ6DiwM4t@redhat.com>

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug
  2023-05-04 19:53 ` [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-05-09 18:55   ` Kevin Wolf
  2023-05-09 20:43     ` Stefan Hajnoczi
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2023-05-09 18:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum, Zhengui Li,
	Daniil Tatianin

Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> This patch is part of an effort to remove the aio_disable_external()
> API because it does not fit in a multi-queue block layer world where
> many AioContexts may be submitting requests to the same disk.
> 
> The SCSI emulation code is already in good shape to stop using
> aio_disable_external(). It was only used by commit 9c5aad84da1c
> ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> disk") to ensure that virtio_scsi_hotunplug() works while the guest
> driver is submitting I/O.
> 
> Ensure virtio_scsi_hotunplug() is safe as follows:
> 
> 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
>    device_set_realized() calls qatomic_set(&dev->realized, false) so
>    that future scsi_device_get() calls return NULL because they exclude
>    SCSIDevices with realized=false.
> 
>    That means virtio-scsi will reject new I/O requests to this
>    SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
>    virtio_scsi_hotunplug() is still executing. We are protected against
>    new requests!
> 
> 2. scsi_device_unrealize() already contains a call to

I think you mean scsi_qdev_unrealize(). Can be fixed while applying.

>    scsi_device_purge_requests() so that in-flight requests are cancelled
>    synchronously. This ensures that no in-flight requests remain once
>    qdev_simple_device_unplug_cb() returns.
> 
> Thanks to these two conditions we don't need aio_disable_external()
> anymore.
> 
> Cc: Zhengui Li <lizhengui@huawei.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Kevin



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 00/21] block: remove aio_disable_external() API
  2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (21 preceding siblings ...)
  2023-05-04 21:44 ` [PATCH v5 00/21] block: " Kevin Wolf
@ 2023-05-09 19:07 ` Kevin Wolf
  22 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2023-05-09 19:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> v5:
> - Use atomic accesses for in_flight counter in vhost-user-server.c [Kevin]
> - Stash SCSIDevice id/lun values for VIRTIO_SCSI_T_TRANSPORT_RESET event
>   before unrealizing the SCSIDevice [Kevin]
> - Keep vhost-user-blk export .detach() callback so ctx is set to NULL [Kevin]
> - Narrow BdrvChildClass and BlockDriver drained_{begin/end/poll} callbacks from
>   IO_OR_GS_CODE() to GLOBAL_STATE_CODE() [Kevin]
> - Include Kevin's "block: Fix use after free in blockdev_mark_auto_del()" to
>   fix a latent bug that was exposed by this series
> 
> v4:
> - Remove external_disable_cnt variable [Philippe]
> - Add Patch 1 to fix assertion failure in .drained_end() -> blk_get_aio_context()
> v3:
> - Resend full patch series. v2 was sent in the middle of a git rebase and was
>   missing patches. [Eric]
> - Apply Reviewed-by tags.
> v2:
> - Do not rely on BlockBackend request queuing, implement .drained_begin/end()
>   instead in xen-block, virtio-blk, and virtio-scsi [Paolo]
> - Add qdev_is_realized() API [Philippe]
> - Add patch to avoid AioContext lock around blk_exp_ref/unref() [Paolo]
> - Add patch to call .drained_begin/end() from main loop thread to simplify
>   callback implementations
> 
> The aio_disable_external() API temporarily suspends file descriptor monitoring
> in the event loop. The block layer uses this to prevent new I/O requests being
> submitted from the guest and elsewhere between bdrv_drained_begin() and
> bdrv_drained_end().
> 
> While the block layer still needs to prevent new I/O requests in drained
> sections, the aio_disable_external() API can be replaced with
> .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
> BlockDevOps.
> 
> This newer .bdrained_begin/end/poll() approach is attractive because it works
> without specifying a specific AioContext. The block layer is moving towards
> multi-queue and that means multiple AioContexts may be processing I/O
> simultaneously.
> 
> The aio_disable_external() was always somewhat hacky. It suspends all file
> descriptors that were registered with is_external=true, even if they have
> nothing to do with the BlockDriverState graph nodes that are being drained.
> It's better to solve a block layer problem in the block layer than to have an
> odd event loop API solution.
> 
> The approach in this patch series is to implement BlockDevOps
> .drained_begin/end() callbacks that temporarily stop file descriptor handlers.
> This ensures that new I/O requests are not submitted in drained sections.

Patches 2-16: Reviewed-by: Kevin Wolf <kwolf@redhat.com>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug
  2023-05-09 18:55   ` Kevin Wolf
@ 2023-05-09 20:43     ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2023-05-09 20:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Anthony Perard, Peter Lieven, Stefan Weil,
	Xie Yongji, Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras,
	Peter Xu, Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum, Zhengui Li,
	Daniil Tatianin

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On Tue, May 09, 2023 at 08:55:14PM +0200, Kevin Wolf wrote:
> Am 04.05.2023 um 21:53 hat Stefan Hajnoczi geschrieben:
> > This patch is part of an effort to remove the aio_disable_external()
> > API because it does not fit in a multi-queue block layer world where
> > many AioContexts may be submitting requests to the same disk.
> > 
> > The SCSI emulation code is already in good shape to stop using
> > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > driver is submitting I/O.
> > 
> > Ensure virtio_scsi_hotunplug() is safe as follows:
> > 
> > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> >    device_set_realized() calls qatomic_set(&dev->realized, false) so
> >    that future scsi_device_get() calls return NULL because they exclude
> >    SCSIDevices with realized=false.
> > 
> >    That means virtio-scsi will reject new I/O requests to this
> >    SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> >    virtio_scsi_hotunplug() is still executing. We are protected against
> >    new requests!
> > 
> > 2. scsi_device_unrealize() already contains a call to
> 
> I think you mean scsi_qdev_unrealize(). Can be fixed while applying.

Yes, it should be scsi_qdev_unrealize(). I'll review your other comments
and fix this if I need to respin.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin()
  2023-05-04 19:53 ` [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-05-16 14:24     ` Anthony PERARD
  2023-05-16 14:45     ` Anthony PERARD
  1 sibling, 0 replies; 34+ messages in thread
From: Anthony PERARD via @ 2023-05-16 14:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:18PM -0400, Stefan Hajnoczi wrote:
> Detach event channels during drained sections to stop I/O submission
> from the ring. xen-block is no longer reliant on aio_disable_external()
> after this patch. This will allow us to remove the
> aio_disable_external() API once all other code that relies on it is
> converted.
> 
> Extend xen_device_set_event_channel_context() to allow ctx=NULL. The
> event channel still exists but the event loop does not monitor the file
> descriptor. Event channel processing can resume by calling
> xen_device_set_event_channel_context() with a non-NULL ctx.
> 
> Factor out xen_device_set_event_channel_context() calls in
> hw/block/dataplane/xen-block.c into attach/detach helper functions.
> Incidentally, these don't require the AioContext lock because
> aio_set_fd_handler() is thread-safe.
> 
> It's safer to register BlockDevOps after the dataplane instance has been
> created. The BlockDevOps .drained_begin/end() callbacks depend on the
> dataplane instance, so move the blk_set_dev_ops() call after
> xen_block_dataplane_create().
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin()
@ 2023-05-16 14:24     ` Anthony PERARD
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony PERARD @ 2023-05-16 14:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:18PM -0400, Stefan Hajnoczi wrote:
> Detach event channels during drained sections to stop I/O submission
> from the ring. xen-block is no longer reliant on aio_disable_external()
> after this patch. This will allow us to remove the
> aio_disable_external() API once all other code that relies on it is
> converted.
> 
> Extend xen_device_set_event_channel_context() to allow ctx=NULL. The
> event channel still exists but the event loop does not monitor the file
> descriptor. Event channel processing can resume by calling
> xen_device_set_event_channel_context() with a non-NULL ctx.
> 
> Factor out xen_device_set_event_channel_context() calls in
> hw/block/dataplane/xen-block.c into attach/detach helper functions.
> Incidentally, these don't require the AioContext lock because
> aio_set_fd_handler() is thread-safe.
> 
> It's safer to register BlockDevOps after the dataplane instance has been
> created. The BlockDevOps .drained_begin/end() callbacks depend on the
> dataplane instance, so move the blk_set_dev_ops() call after
> xen_block_dataplane_create().
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds
  2023-05-04 19:53 ` [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
@ 2023-05-16 14:25     ` Anthony PERARD
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony PERARD via @ 2023-05-16 14:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:19PM -0400, Stefan Hajnoczi wrote:
> is_external=true suspends fd handlers between aio_disable_external() and
> aio_enable_external(). The block layer's drain operation uses this
> mechanism to prevent new I/O from sneaking in between
> bdrv_drained_begin() and bdrv_drained_end().
> 
> The previous commit converted the xen-block device to use BlockDevOps
> .drained_begin/end() callbacks. It no longer relies on is_external=true
> so it is safe to pass is_external=false.
> 
> This is part of ongoing work to remove the aio_disable_external() API.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Acked-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds
@ 2023-05-16 14:25     ` Anthony PERARD
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony PERARD @ 2023-05-16 14:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:19PM -0400, Stefan Hajnoczi wrote:
> is_external=true suspends fd handlers between aio_disable_external() and
> aio_enable_external(). The block layer's drain operation uses this
> mechanism to prevent new I/O from sneaking in between
> bdrv_drained_begin() and bdrv_drained_end().
> 
> The previous commit converted the xen-block device to use BlockDevOps
> .drained_begin/end() callbacks. It no longer relies on is_external=true
> so it is safe to pass is_external=false.
> 
> This is part of ongoing work to remove the aio_disable_external() API.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Acked-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin()
  2023-05-04 19:53 ` [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
@ 2023-05-16 14:45     ` Anthony PERARD
  2023-05-16 14:45     ` Anthony PERARD
  1 sibling, 0 replies; 34+ messages in thread
From: Anthony PERARD via @ 2023-05-16 14:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:18PM -0400, Stefan Hajnoczi wrote:
> @@ -819,11 +841,9 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
>      blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
>      aio_context_release(old_context);
>  
> -    /* Only reason for failure is a NULL channel */
> -    aio_context_acquire(dataplane->ctx);
> -    xen_device_set_event_channel_context(xendev, dataplane->event_channel,
> -                                         dataplane->ctx, &error_abort);
> -    aio_context_release(dataplane->ctx);
> +    if (!blk_in_drain(dataplane->blk)) {

There's maybe something missing in the patch.
xen_block_dataplane_start() calls xen_device_bind_event_channel() just
before xen_block_dataplane_attach().

And xen_device_bind_event_channel() sets the event context to
qemu_get_aio_context() instead of NULL.

So, even if we don't call xen_block_dataplane_attach() while in drain,
there's already a fd_handler attach to the fd. So should
xen_device_bind_event_channel() be changed as well? Or maybe a call to
xen_block_dataplane_detach() would be enough.

(There's only one user of xen_device_bind_event_channel() at the moment
so I don't know if other implementation making use of this API will want
to call set_event_channel_context or not.)

> +        xen_block_dataplane_attach(dataplane);
> +    }
>  
>      return;
>  

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin()
@ 2023-05-16 14:45     ` Anthony PERARD
  0 siblings, 0 replies; 34+ messages in thread
From: Anthony PERARD @ 2023-05-16 14:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Ronnie Sahlberg, Aarushi Mehta, qemu-block,
	Paul Durrant, Peter Lieven, Stefan Weil, Xie Yongji, Kevin Wolf,
	Paolo Bonzini, Michael S. Tsirkin, Leonardo Bras, Peter Xu,
	Hanna Reitz, Stefano Stabellini, Richard Henderson,
	David Woodhouse, Coiby Xu, Eduardo Habkost, Stefano Garzarella,
	Philippe Mathieu-Daudé, Daniel P. Berrangé,
	Julia Suvorova, xen-devel, eesposit, Juan Quintela,
	Richard W.M. Jones, Fam Zheng, Marcel Apfelbaum

On Thu, May 04, 2023 at 03:53:18PM -0400, Stefan Hajnoczi wrote:
> @@ -819,11 +841,9 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
>      blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
>      aio_context_release(old_context);
>  
> -    /* Only reason for failure is a NULL channel */
> -    aio_context_acquire(dataplane->ctx);
> -    xen_device_set_event_channel_context(xendev, dataplane->event_channel,
> -                                         dataplane->ctx, &error_abort);
> -    aio_context_release(dataplane->ctx);
> +    if (!blk_in_drain(dataplane->blk)) {

There's maybe something missing in the patch.
xen_block_dataplane_start() calls xen_device_bind_event_channel() just
before xen_block_dataplane_attach().

And xen_device_bind_event_channel() sets the event context to
qemu_get_aio_context() instead of NULL.

So, even if we don't call xen_block_dataplane_attach() while in drain,
there's already a fd_handler attach to the fd. So should
xen_device_bind_event_channel() be changed as well? Or maybe a call to
xen_block_dataplane_detach() would be enough.

(There's only one user of xen_device_bind_event_channel() at the moment
so I don't know if other implementation making use of this API will want
to call set_event_channel_context or not.)

> +        xen_block_dataplane_attach(dataplane);
> +    }
>  
>      return;
>  

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-05-16 14:46 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-04 19:53 [PATCH v5 00/21] block: remove aio_disable_external() API Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 01/21] block: Fix use after free in blockdev_mark_auto_del() Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 02/21] block-backend: split blk_do_set_aio_context() Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 03/21] hw/qdev: introduce qdev_is_realized() helper Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 04/21] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 05/21] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
2023-05-09 18:55   ` Kevin Wolf
2023-05-09 20:43     ` Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 06/21] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 07/21] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 08/21] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 09/21] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 10/21] block: add blk_in_drain() API Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 11/21] block: drain from main loop thread in bdrv_co_yield_to_drain() Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 12/21] xen-block: implement BlockDevOps->drained_begin() Stefan Hajnoczi
2023-05-16 14:24   ` Anthony PERARD via
2023-05-16 14:24     ` Anthony PERARD
2023-05-16 14:45   ` Anthony PERARD via
2023-05-16 14:45     ` Anthony PERARD
2023-05-04 19:53 ` [PATCH v5 13/21] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
2023-05-16 14:25   ` Anthony PERARD via
2023-05-16 14:25     ` Anthony PERARD
2023-05-04 19:53 ` [PATCH v5 14/21] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 15/21] block/export: don't require AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 16/21] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 17/21] virtio: make it possible to detach host notifier from any thread Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 18/21] virtio-blk: implement BlockDevOps->drained_begin() Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 19/21] virtio-scsi: " Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 20/21] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
2023-05-04 19:53 ` [PATCH v5 21/21] aio: remove aio_disable_external() API Stefan Hajnoczi
2023-05-04 21:44 ` [PATCH v5 00/21] block: " Kevin Wolf
2023-05-09 17:51   ` Stefan Hajnoczi
2023-05-09 18:35     ` Kevin Wolf
2023-05-09 19:07 ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.