* [PATCH 00/13] block: remove aio_disable_external() API
@ 2023-04-03 18:29 Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

The aio_disable_external() API temporarily suspends file descriptor monitoring
in the event loop. The block layer uses this to prevent new I/O requests being
submitted from the guest and elsewhere between bdrv_drained_begin() and
bdrv_drained_end().

While the block layer still needs to prevent new I/O requests in drained
sections, the aio_disable_external() API can be replaced with
.drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
BlockDevOps.

This newer .drained_begin/end/poll() approach is attractive because it does
not require a specific AioContext. The block layer is moving towards
multi-queue and that means multiple AioContexts may be processing I/O
simultaneously.

The aio_disable_external() API was always somewhat hacky. It suspends all file
descriptors that were registered with is_external=true, even if they have
nothing to do with the BlockDriverState graph nodes that are being drained.
It's better to solve a block layer problem in the block layer than to have an
odd event loop API solution.

That covers the motivation for this change; now on to the specifics of this
series:

While it would be nice if a single conceptual approach could be applied to all
is_external=true file descriptors, I ended up looking at callers on a
case-by-case basis. There are two general ways I migrated code away from
is_external=true:

1. Block exports are typically best off unregistering fds in .drained_begin()
   and registering them again in .drained_end(). A .drained_poll() callback
   reports whether requests are still in flight (tracked with an in-flight
   counter) so that drain waits for them to finish. See the sketch after this
   list.

2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
   simpler. They can rely on the BlockBackend's request queuing during drain:
   guest I/O request coroutines are suspended in a drained section and resume
   when the drained section ends.
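
To make approach 1 concrete, here is a minimal sketch of the pattern.
MyExport and the my_export_*() names are made up for illustration;
BlockDevOps, blk_set_dev_ops(), aio_set_fd_handler() and the qatomic_*()
helpers are the real APIs used by the patches in this series:

  /*
   * Illustration only: a hypothetical export whose fd handler is removed
   * while drained and whose .drained_poll() reports in-flight requests.
   */
  typedef struct MyExport {
      BlockExport common;
      int fd;                  /* fd that triggers request processing */
      unsigned int in_flight;  /* accessed with qatomic_*() */
  } MyExport;

  static void my_export_read(void *opaque)
  {
      /* Read one request from the fd, qatomic_inc() the in_flight counter,
       * spawn a coroutine to do the I/O, qatomic_dec() when it completes. */
  }

  /* Called with exp->common.ctx acquired */
  static void my_export_drained_begin(void *opaque)
  {
      MyExport *exp = opaque;

      /* Stop fd monitoring so no new requests are submitted while drained */
      aio_set_fd_handler(exp->common.ctx, exp->fd, false,
                         NULL, NULL, NULL, NULL, NULL);
  }

  static void my_export_drained_end(void *opaque)
  {
      MyExport *exp = opaque;

      /* Refresh the AioContext in case it changed during the drain */
      exp->common.ctx = blk_get_aio_context(exp->common.blk);
      aio_set_fd_handler(exp->common.ctx, exp->fd, false,
                         my_export_read, NULL, NULL, NULL, exp);
  }

  static bool my_export_drained_poll(void *opaque)
  {
      MyExport *exp = opaque;

      /* bdrv_drained_begin() keeps waiting while this returns true */
      return qatomic_read(&exp->in_flight) > 0;
  }

  static const BlockDevOps my_export_dev_ops = {
      .drained_begin = my_export_drained_begin,
      .drained_end   = my_export_drained_end,
      .drained_poll  = my_export_drained_poll,
  };

  /* At export creation time:
   *   blk_set_dev_ops(exp->common.blk, &my_export_dev_ops, exp);
   */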

The first two virtio-scsi patches were already sent as a separate series. I
included them because they are necessary in order to fully remove
aio_disable_external().

Based-on: 087bc644b7634436ca9d52fe58ba9234e2bef026 (kevin/block-next)

Stefan Hajnoczi (13):
  virtio-scsi: avoid race between unplug and transport event
  virtio-scsi: stop using aio_disable_external() during unplug
  block/export: only acquire AioContext once for
    vhost_user_server_stop()
  util/vhost-user-server: rename refcount to in_flight counter
  block/export: wait for vhost-user-blk requests when draining
  block/export: stop using is_external in vhost-user-blk server
  virtio: do not set is_external=true on host notifiers
  hw/xen: do not use aio_set_fd_handler(is_external=true) in
    xen_xenstore
  hw/xen: do not set is_external=true on evtchn fds
  block/export: rewrite vduse-blk drain code
  block/fuse: take AioContext lock around blk_exp_ref/unref()
  block/fuse: do not set is_external=true on FUSE fd
  aio: remove aio_disable_external() API

 include/block/aio.h                  |  55 -----------
 include/qemu/vhost-user-server.h     |   8 +-
 util/aio-posix.h                     |   1 -
 block.c                              |   7 --
 block/blkio.c                        |  15 +--
 block/curl.c                         |  10 +-
 block/export/fuse.c                  |  62 ++++++++++++-
 block/export/vduse-blk.c             | 132 +++++++++++++++++++--------
 block/export/vhost-user-blk-server.c |  73 +++++++++------
 block/io.c                           |   2 -
 block/io_uring.c                     |   4 +-
 block/iscsi.c                        |   3 +-
 block/linux-aio.c                    |   4 +-
 block/nfs.c                          |   5 +-
 block/nvme.c                         |   8 +-
 block/ssh.c                          |   4 +-
 block/win32-aio.c                    |   6 +-
 hw/i386/kvm/xen_xenstore.c           |   2 +-
 hw/scsi/scsi-bus.c                   |   3 +-
 hw/scsi/scsi-disk.c                  |   1 +
 hw/scsi/virtio-scsi.c                |  21 ++---
 hw/virtio/virtio.c                   |   6 +-
 hw/xen/xen-bus.c                     |   6 +-
 io/channel-command.c                 |   6 +-
 io/channel-file.c                    |   3 +-
 io/channel-socket.c                  |   3 +-
 migration/rdma.c                     |  16 ++--
 tests/unit/test-aio.c                |  27 +-----
 tests/unit/test-fdmon-epoll.c        |  73 ---------------
 util/aio-posix.c                     |  20 +---
 util/aio-win32.c                     |   8 +-
 util/async.c                         |   3 +-
 util/fdmon-epoll.c                   |  10 --
 util/fdmon-io_uring.c                |   8 +-
 util/fdmon-poll.c                    |   3 +-
 util/main-loop.c                     |   7 +-
 util/qemu-coroutine-io.c             |   7 +-
 util/vhost-user-server.c             |  38 ++++----
 tests/unit/meson.build               |   3 -
 39 files changed, 298 insertions(+), 375 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

-- 
2.39.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-03 20:47   ` Philippe Mathieu-Daudé
  2023-04-04 14:38   ` Michael S. Tsirkin
  2023-04-03 18:29 ` [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().

qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.

scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.

These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-bus.c    |  3 ++-
 hw/scsi/virtio-scsi.c | 18 +++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c97176110c..f9bd064833 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
             DeviceState *qdev = kid->child;
             SCSIDevice *dev = SCSI_DEVICE(qdev);
 
-            if (dev->channel == channel && dev->id == id && dev->lun != 0) {
+            if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
+                qatomic_load_acquire(&dev->qdev.realized)) {
                 store_lun(tmp, dev->lun);
                 g_byte_array_append(buf, tmp, 8);
                 len += 8;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 612c525d9d..000961446c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1063,15 +1063,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     SCSIDevice *sd = SCSI_DEVICE(dev);
     AioContext *ctx = s->ctx ?: qemu_get_aio_context();
 
-    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
-        virtio_scsi_acquire(s);
-        virtio_scsi_push_event(s, sd,
-                               VIRTIO_SCSI_T_TRANSPORT_RESET,
-                               VIRTIO_SCSI_EVT_RESET_REMOVED);
-        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
-        virtio_scsi_release(s);
-    }
-
     aio_disable_external(ctx);
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
     aio_enable_external(ctx);
@@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
         virtio_scsi_release(s);
     }
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+        virtio_scsi_acquire(s);
+        virtio_scsi_push_event(s, sd,
+                               VIRTIO_SCSI_T_TRANSPORT_RESET,
+                               VIRTIO_SCSI_EVT_RESET_REMOVED);
+        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+        virtio_scsi_release(s);
+    }
 }
 
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-04 13:38   ` Paolo Bonzini
  2023-04-03 18:29 ` [PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop() Stefan Hajnoczi
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard, Zhengui Li

This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.

The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da1c
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.

Ensure virtio_scsi_hotunplug() is safe as follows:

1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
   device_set_realized() calls qatomic_set(&dev->realized, false) so
   that future scsi_device_get() calls return NULL because they exclude
   SCSIDevices with realized=false.

   That means virtio-scsi will reject new I/O requests to this
   SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
   virtio_scsi_hotunplug() is still executing. We are protected against
   new requests!

2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
   that in-flight requests are cancelled synchronously. This ensures
   that no in-flight requests remain once qdev_simple_device_unplug_cb()
   returns.

Thanks to these two conditions we don't need aio_disable_external()
anymore.
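
For reference, condition 1 boils down to the kind of check added in the
previous patch. This is a simplified, illustrative sketch; lookup_realized()
is a made-up helper, while the qatomic_load_acquire() test mirrors the
scsi-bus.c change:

  static SCSIDevice *lookup_realized(SCSIDevice *dev)
  {
      /* A device in the middle of unplug has qdev.realized == false and is
       * therefore invisible to new requests, which virtio-scsi then rejects
       * with VIRTIO_SCSI_S_BAD_TARGET. */
      if (dev && !qatomic_load_acquire(&dev->qdev.realized)) {
          return NULL;
      }
      return dev;
  }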

Cc: Zhengui Li <lizhengui@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-disk.c   | 1 +
 hw/scsi/virtio-scsi.c | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 97c9b1c8cd..e01bd84541 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2522,6 +2522,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
 
 static void scsi_unrealize(SCSIDevice *dev)
 {
+    scsi_device_purge_requests(dev, SENSE_CODE(RESET));
     del_boot_device_lchs(&dev->qdev, NULL);
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 000961446c..a02f9233ec 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1061,11 +1061,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
     VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
     SCSIDevice *sd = SCSI_DEVICE(dev);
-    AioContext *ctx = s->ctx ?: qemu_get_aio_context();
 
-    aio_disable_external(ctx);
     qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
-    aio_enable_external(ctx);
 
     if (s->ctx) {
         virtio_scsi_acquire(s);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop()
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

vhost_user_server_stop() uses AIO_WAIT_WHILE(). AIO_WAIT_WHILE()
requires that the AioContext lock is acquired exactly once.

Since blk_exp_request_shutdown() already acquires the AioContext, it
must not be acquired again in vhost_user_server_stop().
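
A minimal usage sketch of the contract (not from this patch; in_flight
stands in for whatever condition is being waited on):

  aio_context_acquire(ctx);            /* exactly one level of nesting */
  AIO_WAIT_WHILE(ctx, in_flight > 0);  /* drops and re-takes ctx while polling */
  aio_context_release(ctx);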

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 40f36ea214..5b6216069c 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -346,10 +346,9 @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
     aio_context_release(server->ctx);
 }
 
+/* server->ctx acquired by caller */
 void vhost_user_server_stop(VuServer *server)
 {
-    aio_context_acquire(server->ctx);
-
     qemu_bh_delete(server->restart_listener_bh);
     server->restart_listener_bh = NULL;
 
@@ -366,8 +365,6 @@ void vhost_user_server_stop(VuServer *server)
         AIO_WAIT_WHILE(server->ctx, server->co_trip);
     }
 
-    aio_context_release(server->ctx);
-
     if (server->listener) {
         qio_net_listener_disconnect(server->listener);
         object_unref(OBJECT(server->listener));
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop() Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-04 13:39   ` Paolo Bonzini
  2023-04-03 18:29 ` [PATCH 05/13] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.

Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.

Avoid confusion by renaming refcount and ref/unref to in_flight and
inc/dec.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/vhost-user-server.h     |  6 +++---
 block/export/vhost-user-blk-server.c | 11 +++++++----
 util/vhost-user-server.c             | 14 +++++++-------
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 25c72433ca..bc0ac9ddb6 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -41,7 +41,7 @@ typedef struct {
     const VuDevIface *vu_iface;
 
     /* Protected by ctx lock */
-    unsigned int refcount;
+    unsigned int in_flight;
     bool wait_idle;
     VuDev vu_dev;
     QIOChannel *ioc; /* The I/O channel with the client */
@@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
 
 void vhost_user_server_stop(VuServer *server);
 
-void vhost_user_server_ref(VuServer *server);
-void vhost_user_server_unref(VuServer *server);
+void vhost_user_server_inc_in_flight(VuServer *server);
+void vhost_user_server_dec_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 3409d9e02e..e93f2ed6b4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -49,7 +49,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
     free(req);
 }
 
-/* Called with server refcount increased, must decrease before returning */
+/*
+ * Called with server in_flight counter increased, must decrease before
+ * returning.
+ */
 static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
     VuBlkReq *req = opaque;
@@ -67,12 +70,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
                                     in_num, out_num);
     if (in_len < 0) {
         free(req);
-        vhost_user_server_unref(server);
+        vhost_user_server_dec_in_flight(server);
         return;
     }
 
     vu_blk_req_complete(req, in_len);
-    vhost_user_server_unref(server);
+    vhost_user_server_dec_in_flight(server);
 }
 
 static void vu_blk_process_vq(VuDev *vu_dev, int idx)
@@ -94,7 +97,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
         Coroutine *co =
             qemu_coroutine_create(vu_blk_virtio_process_req, req);
 
-        vhost_user_server_ref(server);
+        vhost_user_server_inc_in_flight(server);
         qemu_coroutine_enter(co);
     }
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5b6216069c..1622f8cfb3 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
     error_report("vu_panic: %s", buf);
 }
 
-void vhost_user_server_ref(VuServer *server)
+void vhost_user_server_inc_in_flight(VuServer *server)
 {
     assert(!server->wait_idle);
-    server->refcount++;
+    server->in_flight++;
 }
 
-void vhost_user_server_unref(VuServer *server)
+void vhost_user_server_dec_in_flight(VuServer *server)
 {
-    server->refcount--;
-    if (server->wait_idle && !server->refcount) {
+    server->in_flight--;
+    if (server->wait_idle && !server->in_flight) {
         aio_co_wake(server->co_trip);
     }
 }
@@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
         /* Keep running */
     }
 
-    if (server->refcount) {
+    if (server->in_flight) {
         /* Wait for requests to complete before we can unmap the memory */
         server->wait_idle = true;
         qemu_coroutine_yield();
         server->wait_idle = false;
     }
-    assert(server->refcount == 0);
+    assert(server->in_flight == 0);
 
     vu_deinit(vu_dev);
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/13] block/export: wait for vhost-user-blk requests when draining
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 06/13] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.

When blk_co_preadv/pwritev()/etc return, they wake the
bdrv_drained_begin() thread, but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.

One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.

(This example is theoretical, I came across this while reading the
code and have not tried to reproduce it.)

It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/vhost-user-server.h     |  4 +++-
 block/export/vhost-user-blk-server.c | 19 +++++++++++++++++++
 util/vhost-user-server.c             | 14 ++++++++++----
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index bc0ac9ddb6..b1c1cda886 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -40,8 +40,9 @@ typedef struct {
     int max_queues;
     const VuDevIface *vu_iface;
 
+    unsigned int in_flight; /* atomic */
+
     /* Protected by ctx lock */
-    unsigned int in_flight;
     bool wait_idle;
     VuDev vu_dev;
     QIOChannel *ioc; /* The I/O channel with the client */
@@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
 
 void vhost_user_server_inc_in_flight(VuServer *server);
 void vhost_user_server_dec_in_flight(VuServer *server);
+bool vhost_user_server_has_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index e93f2ed6b4..dbf5207162 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -254,6 +254,22 @@ static void vu_blk_exp_request_shutdown(BlockExport *exp)
     vhost_user_server_stop(&vexp->vu_server);
 }
 
+/*
+ * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
+ *
+ * Called with vexp->export.ctx acquired.
+ */
+static bool vu_blk_drained_poll(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    return vhost_user_server_has_in_flight(&vexp->vu_server);
+}
+
+static const BlockDevOps vu_blk_dev_ops = {
+    .drained_poll  = vu_blk_drained_poll,
+};
+
 static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
                              Error **errp)
 {
@@ -292,6 +308,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
                              logical_block_size, num_queues);
 
+    blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                  vexp);
 
@@ -299,6 +316,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
                                  num_queues, &vu_blk_iface, errp)) {
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, vexp);
+        blk_set_dev_ops(exp->blk, NULL, NULL);
         g_free(vexp->handler.serial);
         return -EADDRNOTAVAIL;
     }
@@ -312,6 +330,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
 
     blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                     vexp);
+    blk_set_dev_ops(exp->blk, NULL, NULL);
     g_free(vexp->handler.serial);
 }
 
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 1622f8cfb3..2e6b640050 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 void vhost_user_server_inc_in_flight(VuServer *server)
 {
     assert(!server->wait_idle);
-    server->in_flight++;
+    qatomic_inc(&server->in_flight);
 }
 
 void vhost_user_server_dec_in_flight(VuServer *server)
 {
-    server->in_flight--;
-    if (server->wait_idle && !server->in_flight) {
-        aio_co_wake(server->co_trip);
+    if (qatomic_fetch_dec(&server->in_flight) == 1) {
+        if (server->wait_idle) {
+            aio_co_wake(server->co_trip);
+        }
     }
 }
 
+bool vhost_user_server_has_in_flight(VuServer *server)
+{
+    return qatomic_load_acquire(&server->in_flight) > 0;
+}
+
 static bool coroutine_fn
 vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
 {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/13] block/export: stop using is_external in vhost-user-blk server
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 05/13] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 07/13] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.

Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContexts may be processing I/O, not
just one.

Switch to BlockDevOps->drained_begin/end() callbacks.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.c | 43 ++++++++++++++--------------
 util/vhost-user-server.c             | 10 +++----
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index dbf5207162..6e1bc196fb 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -207,22 +207,6 @@ static const VuDevIface vu_blk_iface = {
     .process_msg           = vu_blk_process_msg,
 };
 
-static void blk_aio_attached(AioContext *ctx, void *opaque)
-{
-    VuBlkExport *vexp = opaque;
-
-    vexp->export.ctx = ctx;
-    vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
-}
-
-static void blk_aio_detach(void *opaque)
-{
-    VuBlkExport *vexp = opaque;
-
-    vhost_user_server_detach_aio_context(&vexp->vu_server);
-    vexp->export.ctx = NULL;
-}
-
 static void
 vu_blk_initialize_config(BlockDriverState *bs,
                          struct virtio_blk_config *config,
@@ -254,6 +238,25 @@ static void vu_blk_exp_request_shutdown(BlockExport *exp)
     vhost_user_server_stop(&vexp->vu_server);
 }
 
+/* Called with vexp->export.ctx acquired */
+static void vu_blk_drained_begin(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    vhost_user_server_detach_aio_context(&vexp->vu_server);
+}
+
+/* Called with vexp->export.blk AioContext acquired */
+static void vu_blk_drained_end(void *opaque)
+{
+    VuBlkExport *vexp = opaque;
+
+    /* Refresh AioContext in case it changed */
+    vexp->export.ctx = blk_get_aio_context(vexp->export.blk);
+
+    vhost_user_server_attach_aio_context(&vexp->vu_server, vexp->export.ctx);
+}
+
 /*
  * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
  *
@@ -267,6 +270,8 @@ static bool vu_blk_drained_poll(void *opaque)
 }
 
 static const BlockDevOps vu_blk_dev_ops = {
+    .drained_begin = vu_blk_drained_begin,
+    .drained_end   = vu_blk_drained_end,
     .drained_poll  = vu_blk_drained_poll,
 };
 
@@ -309,13 +314,9 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
                              logical_block_size, num_queues);
 
     blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
-    blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
-                                 vexp);
 
     if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
                                  num_queues, &vu_blk_iface, errp)) {
-        blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
-                                        blk_aio_detach, vexp);
         blk_set_dev_ops(exp->blk, NULL, NULL);
         g_free(vexp->handler.serial);
         return -EADDRNOTAVAIL;
@@ -328,8 +329,6 @@ static void vu_blk_exp_delete(BlockExport *exp)
 {
     VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
 
-    blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
-                                    vexp);
     blk_set_dev_ops(exp->blk, NULL, NULL);
     g_free(vexp->handler.serial);
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 2e6b640050..332aea9306 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
         vu_fd_watch->fd = fd;
         vu_fd_watch->cb = cb;
         qemu_socket_set_nonblock(fd);
-        aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
+        aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
                            NULL, NULL, NULL, vu_fd_watch);
         vu_fd_watch->vu_dev = vu_dev;
         vu_fd_watch->pvt = pvt;
@@ -299,7 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
     if (!vu_fd_watch) {
         return;
     }
-    aio_set_fd_handler(server->ioc->ctx, fd, true,
+    aio_set_fd_handler(server->ioc->ctx, fd, false,
                        NULL, NULL, NULL, NULL, NULL);
 
     QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
@@ -362,7 +362,7 @@ void vhost_user_server_stop(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
@@ -403,7 +403,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
     qio_channel_attach_aio_context(server->ioc, ctx);
 
     QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-        aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
+        aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
                            NULL, NULL, vu_fd_watch);
     }
 
@@ -417,7 +417,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/13] virtio: do not set is_external=true on host notifiers
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 06/13] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-03 18:29 ` [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

Host notifiers trigger virtqueue processing. There are critical sections
when new I/O requests must not be submitted because they would cause
interference.

In the past this was solved using aio_set_event_notifier()'s
is_external=true argument, which disables fd monitoring between
aio_disable/enable_external() calls. This API is not multi-queue block
layer friendly because it requires knowledge of the specific AioContext.
In a multi-queue block layer world any thread can submit I/O and we
don't know which AioContexts are currently involved.

virtio-blk and virtio-scsi are the only users that depend on
is_external=true. Both rely on the block layer, where we can take
advantage of the existing request queuing behavior that happens during
drained sections. The block layer's drained sections are the only user
of aio_disable_external().

After this patch the virtqueues will be processed during drained
sections, but submitted I/O requests will be queued in the BlockBackend.
Queued requests are resumed when the drained section ends. Therefore,
the BlockBackend is still quiesced during drained sections but we no
longer rely on is_external=true to achieve this.

Note that virtqueues have a finite size, so queuing requests does not
lead to unbounded memory usage.
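
To illustrate the behavior, here is a rough sketch, not code from this patch:
submit_guest_request() and its parameters are stand-ins for the device's real
state, while blk_aio_preadv(), bdrv_drained_begin() and bdrv_drained_end() are
the existing APIs involved:

  /* Called from virtqueue processing when the guest submits a request */
  static void submit_guest_request(BlockBackend *blk, int64_t offset,
                                   QEMUIOVector *qiov,
                                   BlockCompletionFunc *cb, void *opaque)
  {
      /*
       * If the main loop is between bdrv_drained_begin() and
       * bdrv_drained_end(), this request does not reach the disk: the
       * BlockBackend queues it and resumes it when the drained section
       * ends. The virtqueue's finite size bounds how much can be queued.
       */
      blk_aio_preadv(blk, offset, qiov, 0, cb, opaque);
  }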

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/virtio/virtio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 98c4819fcc..dcd7aabb4e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true,
+    aio_set_event_notifier(ctx, &vq->host_notifier, false,
                            virtio_queue_host_notifier_read,
                            virtio_queue_host_notifier_aio_poll,
                            virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
 void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true,
+    aio_set_event_notifier(ctx, &vq->host_notifier, false,
                            virtio_queue_host_notifier_read,
                            NULL, NULL);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
     /* Test and clear notifier before after disabling event,
      * in case poll callback didn't have time to run. */
     virtio_queue_host_notifier_read(&vq->host_notifier);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 07/13] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
@ 2023-04-03 18:29 ` Stefan Hajnoczi
  2023-04-04  9:52   ` David Woodhouse
  2023-04-03 18:30 ` [PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

There is no need to suspend activity between aio_disable_external() and
aio_enable_external(); that mechanism is mainly used for the block layer's
drain operation.

This is part of ongoing work to remove the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/i386/kvm/xen_xenstore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 900679af8a..6e81bc8791 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Xenstore evtchn port init failed");
         return;
     }
-    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
                        xen_xenstore_event, NULL, NULL, NULL, s);
 
     s->impl = xs_impl_create(xen_domid);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2023-04-03 18:29 ` [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
@ 2023-04-03 18:30 ` Stefan Hajnoczi
  2023-04-03 18:30 ` [PATCH 10/13] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

is_external=true suspends fd handlers between aio_disable_external() and
aio_enable_external(). The block layer's drain operation uses this
mechanism to prevent new I/O from sneaking in between
bdrv_drained_begin() and bdrv_drained_end().

The xen-block device actually works fine with is_external=false because
BlockBackend requests are already queued between bdrv_drained_begin()
and bdrv_drained_end(). Since the Xen ring size is finite, request
queuing will stop once the ring is full and memory usage is bounded.
After bdrv_drained_end() the BlockBackend requests will resume and
xen-block's processing will continue.

This is part of ongoing work to remove the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/xen/xen-bus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index c59850b1de..c4fd26abe1 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,11 +842,11 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
     }
 
     if (channel->ctx)
-        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
                            NULL, NULL, NULL, NULL, NULL);
 
     channel->ctx = ctx;
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
                        xen_device_event, NULL, xen_device_poll, NULL, channel);
 }
 
@@ -920,7 +920,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
 
     QLIST_REMOVE(channel, list);
 
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
                        NULL, NULL, NULL, NULL, NULL);
 
     if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/13] block/export: rewrite vduse-blk drain code
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2023-04-03 18:30 ` [PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
@ 2023-04-03 18:30 ` Stefan Hajnoczi
  2023-04-03 18:30 ` [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():

  /*
   * Take the old AioContex when detaching it from bs.
   * At this point, new_context lock is already acquired, and we are now
   * also taking old_context. This is safe as long as bdrv_detach_aio_context
   * does not call AIO_POLL_WHILE().
   */

Use this opportunity to rewrite the drain code in vduse-blk:

- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
  called when there are no more requests in flight.

- Implement .drained_poll() so in-flight request coroutines are stopped
  by the time .bdrv_detach_aio_context() is called.

- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
  .bdrv_detach_aio_context() constraint violation. It's no longer
  needed due to the previous changes.

- Always handle the VDUSE file descriptor, even in drained sections. The
  VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
  drained sections. This ensures that the VDUSE kernel code gets a fast
  response.

- Suspend virtqueue fd handlers in .drained_begin() and resume them in
  .drained_end(). This eliminates the need for the
  aio_set_fd_handler(is_external=true) flag, which is being removed from
  QEMU.

This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.
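
For readability, the export lifetime scheme implemented by the hunks below
boils down to the following (simplified sketch; the real code also takes the
AioContext lock around blk_exp_ref/unref()):

  static void inflight_inc(VduseBlkExport *vblk_exp)
  {
      if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
          blk_exp_ref(&vblk_exp->export);    /* first request pins the export */
      }
  }

  static void inflight_dec(VduseBlkExport *vblk_exp)
  {
      if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) {
          aio_wait_kick();                   /* wake .drained_poll() waiters */
          blk_exp_unref(&vblk_exp->export);  /* last request allows deletion */
      }
  }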

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vduse-blk.c | 132 +++++++++++++++++++++++++++------------
 1 file changed, 93 insertions(+), 39 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index f7ae44e3ce..35dc8fcf45 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,7 +31,8 @@ typedef struct VduseBlkExport {
     VduseDev *dev;
     uint16_t num_queues;
     char *recon_file;
-    unsigned int inflight;
+    unsigned int inflight; /* atomic */
+    bool vqs_started;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -41,13 +42,24 @@ typedef struct VduseBlkReq {
 
 static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
 {
-    vblk_exp->inflight++;
+    if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
+        /* Prevent export from being deleted */
+        aio_context_acquire(vblk_exp->export.ctx);
+        blk_exp_ref(&vblk_exp->export);
+        aio_context_release(vblk_exp->export.ctx);
+    }
 }
 
 static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
 {
-    if (--vblk_exp->inflight == 0) {
+    if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) {
+        /* Wake AIO_WAIT_WHILE() */
         aio_wait_kick();
+
+        /* Now the export can be deleted */
+        aio_context_acquire(vblk_exp->export.ctx);
+        blk_exp_unref(&vblk_exp->export);
+        aio_context_release(vblk_exp->export.ctx);
     }
 }
 
@@ -124,8 +136,12 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 {
     VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
 
+    if (!vblk_exp->vqs_started) {
+        return; /* vduse_blk_drained_end() will start vqs later */
+    }
+
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+                       false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
     /* Make sure we don't miss any kick afer reconnecting */
     eventfd_write(vduse_queue_get_fd(vq), 1);
 }
@@ -133,9 +149,14 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
 {
     VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+    int fd = vduse_queue_get_fd(vq);
 
-    aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       true, NULL, NULL, NULL, NULL, NULL);
+    if (fd < 0) {
+        return;
+    }
+
+    aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+                       NULL, NULL, NULL, NULL, NULL);
 }
 
 static const VduseOps vduse_blk_ops = {
@@ -152,42 +173,19 @@ static void on_vduse_dev_kick(void *opaque)
 
 static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 {
-    int i;
-
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       true, on_vduse_dev_kick, NULL, NULL, NULL,
+                       false, on_vduse_dev_kick, NULL, NULL, NULL,
                        vblk_exp->dev);
 
-    for (i = 0; i < vblk_exp->num_queues; i++) {
-        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
-        int fd = vduse_queue_get_fd(vq);
-
-        if (fd < 0) {
-            continue;
-        }
-        aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
-                           on_vduse_vq_kick, NULL, NULL, NULL, vq);
-    }
+    /* Virtqueues are handled by vduse_blk_drained_end() */
 }
 
 static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 {
-    int i;
-
-    for (i = 0; i < vblk_exp->num_queues; i++) {
-        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
-        int fd = vduse_queue_get_fd(vq);
-
-        if (fd < 0) {
-            continue;
-        }
-        aio_set_fd_handler(vblk_exp->export.ctx, fd,
-                           true, NULL, NULL, NULL, NULL, NULL);
-    }
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       true, NULL, NULL, NULL, NULL, NULL);
+                       false, NULL, NULL, NULL, NULL, NULL);
 
-    AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
+    /* Virtqueues are handled by vduse_blk_drained_begin() */
 }
 
 
@@ -220,8 +218,55 @@ static void vduse_blk_resize(void *opaque)
                             (char *)&config.capacity);
 }
 
+static void vduse_blk_stop_virtqueues(VduseBlkExport *vblk_exp)
+{
+    for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+        vduse_blk_disable_queue(vblk_exp->dev, vq);
+    }
+
+    vblk_exp->vqs_started = false;
+}
+
+static void vduse_blk_start_virtqueues(VduseBlkExport *vblk_exp)
+{
+    vblk_exp->vqs_started = true;
+
+    for (uint16_t i = 0; i < vblk_exp->num_queues; i++) {
+        VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
+        vduse_blk_enable_queue(vblk_exp->dev, vq);
+    }
+}
+
+static void vduse_blk_drained_begin(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    vduse_blk_stop_virtqueues(vblk_exp);
+}
+
+static void vduse_blk_drained_end(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    vduse_blk_start_virtqueues(vblk_exp);
+}
+
+static bool vduse_blk_drained_poll(void *opaque)
+{
+    BlockExport *exp = opaque;
+    VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+
+    return qatomic_read(&vblk_exp->inflight) > 0;
+}
+
 static const BlockDevOps vduse_block_ops = {
-    .resize_cb = vduse_blk_resize,
+    .resize_cb     = vduse_blk_resize,
+    .drained_begin = vduse_blk_drained_begin,
+    .drained_end   = vduse_blk_drained_end,
+    .drained_poll  = vduse_blk_drained_poll,
 };
 
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
@@ -268,6 +313,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     vblk_exp->handler.serial = g_strdup(vblk_opts->serial ?: "");
     vblk_exp->handler.logical_block_size = logical_block_size;
     vblk_exp->handler.writable = opts->writable;
+    vblk_exp->vqs_started = true;
 
     config.capacity =
             cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
@@ -322,14 +368,20 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
     }
 
-    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true,
+    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
                        on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
 
     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                  vblk_exp);
-
     blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
 
+    /*
+     * We handle draining ourselves using an in-flight counter and by disabling
+     * virtqueue fd handlers. Do not queue BlockBackend requests, they need to
+     * complete so the in-flight counter reaches zero.
+     */
+    blk_set_disable_request_queuing(exp->blk, true);
+
     return 0;
 err:
     vduse_dev_destroy(vblk_exp->dev);
@@ -344,6 +396,9 @@ static void vduse_blk_exp_delete(BlockExport *exp)
     VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
     int ret;
 
+    assert(qatomic_read(&vblk_exp->inflight) == 0);
+
+    vduse_blk_detach_ctx(vblk_exp);
     blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                     vblk_exp);
     blk_set_dev_ops(exp->blk, NULL, NULL);
@@ -355,13 +410,12 @@ static void vduse_blk_exp_delete(BlockExport *exp)
     g_free(vblk_exp->handler.serial);
 }
 
+/* Called with exp->ctx acquired */
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
 {
     VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
 
-    aio_context_acquire(vblk_exp->export.ctx);
-    vduse_blk_detach_ctx(vblk_exp);
-    aio_context_acquire(vblk_exp->export.ctx);
+    vduse_blk_stop_virtqueues(vblk_exp);
 }
 
 const BlockExportDriver blk_exp_vduse_blk = {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2023-04-03 18:30 ` [PATCH 10/13] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
@ 2023-04-03 18:30 ` Stefan Hajnoczi
  2023-04-04 13:46   ` Paolo Bonzini
  2023-04-03 18:30 ` [PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

These functions must be called with the AioContext acquired:

  /* Callers must hold exp->ctx lock */
  void blk_exp_ref(BlockExport *exp)
  ...
  /* Callers must hold exp->ctx lock */
  void blk_exp_unref(BlockExport *exp)

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/fuse.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 06fa41079e..18394f9e07 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
     FuseExport *exp = opaque;
     int ret;
 
+    aio_context_acquire(exp->common.ctx);
     blk_exp_ref(&exp->common);
+    aio_context_release(exp->common.ctx);
 
     do {
         ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
@@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
     fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
+    aio_context_acquire(exp->common.ctx);
     blk_exp_unref(&exp->common);
+    aio_context_release(exp->common.ctx);
 }
 
 static void fuse_export_shutdown(BlockExport *blk_exp)
-- 
2.39.2




* [PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2023-04-03 18:30 ` [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
@ 2023-04-03 18:30 ` Stefan Hajnoczi
  2023-04-03 18:30 ` [PATCH 13/13] aio: remove aio_disable_external() API Stefan Hajnoczi
  2023-04-04 13:43 ` [PATCH 00/13] block: " Paolo Bonzini
  13 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

This is part of ongoing work to remove the aio_disable_external() API.

Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).

As a side effect, the FUSE export now follows AioContext changes like the
other export types.
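
The contract behind these callbacks: .drained_begin() stops the source of
new requests, .drained_poll() returns true while requests are still in
flight, and .drained_end() resumes processing. A minimal sketch of that
shape is below; MyExport and the my_export_* names are placeholders, the
real FUSE callbacks are in the diff that follows:

  typedef struct MyExport {
      BlockExport common;
      unsigned int in_flight; /* updated with qatomic_*() */
  } MyExport;

  static void my_export_drained_begin(void *opaque)
  {
      /* stop accepting new work, e.g. unregister the fd handler */
  }

  static bool my_export_drained_poll(void *opaque)
  {
      MyExport *exp = opaque;

      /* keep draining until every in-flight request has completed */
      return qatomic_read(&exp->in_flight) > 0;
  }

  static void my_export_drained_end(void *opaque)
  {
      /* resume, e.g. re-register the fd handler in the current AioContext */
  }

  static const BlockDevOps my_export_blk_dev_ops = {
      .drained_begin = my_export_drained_begin,
      .drained_end   = my_export_drained_end,
      .drained_poll  = my_export_drained_poll,
  };

  /* installed with blk_set_dev_ops(blk, &my_export_blk_dev_ops, exp) */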

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/fuse.c | 58 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 18394f9e07..83bccf046b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -50,6 +50,7 @@ typedef struct FuseExport {
 
     struct fuse_session *fuse_session;
     struct fuse_buf fuse_buf;
+    unsigned int in_flight; /* atomic */
     bool mounted, fd_handler_set_up;
 
     char *mountpoint;
@@ -78,6 +79,42 @@ static void read_from_fuse_export(void *opaque);
 static bool is_regular_file(const char *path, Error **errp);
 
 
+static void fuse_export_drained_begin(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    aio_set_fd_handler(exp->common.ctx,
+                       fuse_session_fd(exp->fuse_session), false,
+                       NULL, NULL, NULL, NULL, NULL);
+    exp->fd_handler_set_up = false;
+}
+
+static void fuse_export_drained_end(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    /* Refresh AioContext in case it changed */
+    exp->common.ctx = blk_get_aio_context(exp->common.blk);
+
+    aio_set_fd_handler(exp->common.ctx,
+                       fuse_session_fd(exp->fuse_session), false,
+                       read_from_fuse_export, NULL, NULL, NULL, exp);
+    exp->fd_handler_set_up = true;
+}
+
+static bool fuse_export_drained_poll(void *opaque)
+{
+    FuseExport *exp = opaque;
+
+    return qatomic_read(&exp->in_flight) > 0;
+}
+
+static const BlockDevOps fuse_export_blk_dev_ops = {
+    .drained_begin = fuse_export_drained_begin,
+    .drained_end   = fuse_export_drained_end,
+    .drained_poll  = fuse_export_drained_poll,
+};
+
 static int fuse_export_create(BlockExport *blk_exp,
                               BlockExportOptions *blk_exp_args,
                               Error **errp)
@@ -101,6 +138,15 @@ static int fuse_export_create(BlockExport *blk_exp,
         }
     }
 
+    blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp);
+
+    /*
+     * We handle draining ourselves using an in-flight counter and by disabling
+     * the FUSE fd handler. Do not queue BlockBackend requests, they need to
+     * complete so the in-flight counter reaches zero.
+     */
+    blk_set_disable_request_queuing(exp->common.blk, true);
+
     init_exports_table();
 
     /*
@@ -224,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
     g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), true,
+                       fuse_session_fd(exp->fuse_session), false,
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 
@@ -248,6 +294,8 @@ static void read_from_fuse_export(void *opaque)
     blk_exp_ref(&exp->common);
     aio_context_release(exp->common.ctx);
 
+    qatomic_inc(&exp->in_flight);
+
     do {
         ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
     } while (ret == -EINTR);
@@ -258,6 +306,10 @@ static void read_from_fuse_export(void *opaque)
     fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
+    if (qatomic_fetch_dec(&exp->in_flight) == 1) {
+        aio_wait_kick(); /* wake AIO_WAIT_WHILE() */
+    }
+
     aio_context_acquire(exp->common.ctx);
     blk_exp_unref(&exp->common);
     aio_context_release(exp->common.ctx);
@@ -272,7 +324,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
 
         if (exp->fd_handler_set_up) {
             aio_set_fd_handler(exp->common.ctx,
-                               fuse_session_fd(exp->fuse_session), true,
+                               fuse_session_fd(exp->fuse_session), false,
                                NULL, NULL, NULL, NULL, NULL);
             exp->fd_handler_set_up = false;
         }
@@ -291,6 +343,8 @@ static void fuse_export_delete(BlockExport *blk_exp)
 {
     FuseExport *exp = container_of(blk_exp, FuseExport, common);
 
+    blk_set_dev_ops(exp->common.blk, NULL, NULL);
+
     if (exp->fuse_session) {
         if (exp->mounted) {
             fuse_session_unmount(exp->fuse_session);
-- 
2.39.2




* [PATCH 13/13] aio: remove aio_disable_external() API
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2023-04-03 18:30 ` [PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
@ 2023-04-03 18:30 ` Stefan Hajnoczi
  2023-04-04  9:16   ` Juan Quintela
  2023-04-04 13:43 ` [PATCH 00/13] block: " Paolo Bonzini
  13 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-03 18:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API, which
temporarily disables fd handlers that were registered with
is_external=true, is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
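
(For anyone reproducing the conversion: a semantic patch like the above is
typically applied with Coccinelle's spatch tool, along the lines of
"spatch --sp-file remove-is-external.cocci --in-place --dir ."; the .cocci
file name and the exact flags are illustrative, not taken from this
series.)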

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h           | 55 --------------------------
 util/aio-posix.h              |  1 -
 block.c                       |  7 ----
 block/blkio.c                 | 15 +++----
 block/curl.c                  | 10 ++---
 block/export/fuse.c           |  8 ++--
 block/export/vduse-blk.c      | 10 ++---
 block/io.c                    |  2 -
 block/io_uring.c              |  4 +-
 block/iscsi.c                 |  3 +-
 block/linux-aio.c             |  4 +-
 block/nfs.c                   |  5 +--
 block/nvme.c                  |  8 ++--
 block/ssh.c                   |  4 +-
 block/win32-aio.c             |  6 +--
 hw/i386/kvm/xen_xenstore.c    |  2 +-
 hw/virtio/virtio.c            |  6 +--
 hw/xen/xen-bus.c              |  6 +--
 io/channel-command.c          |  6 +--
 io/channel-file.c             |  3 +-
 io/channel-socket.c           |  3 +-
 migration/rdma.c              | 16 ++++----
 tests/unit/test-aio.c         | 27 +------------
 tests/unit/test-fdmon-epoll.c | 73 -----------------------------------
 util/aio-posix.c              | 20 +++-------
 util/aio-win32.c              |  8 +---
 util/async.c                  |  3 +-
 util/fdmon-epoll.c            | 10 -----
 util/fdmon-io_uring.c         |  8 +---
 util/fdmon-poll.c             |  3 +-
 util/main-loop.c              |  7 ++--
 util/qemu-coroutine-io.c      |  7 ++--
 util/vhost-user-server.c      | 11 +++---
 tests/unit/meson.build        |  3 --
 34 files changed, 75 insertions(+), 289 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

diff --git a/include/block/aio.h b/include/block/aio.h
index e267d918fd..d4ce01ea08 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -467,7 +467,6 @@ bool aio_poll(AioContext *ctx, bool blocking);
  */
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -483,7 +482,6 @@ void aio_set_fd_handler(AioContext *ctx,
  */
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *notifier,
-                            bool is_external,
                             EventNotifierHandler *io_read,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready);
@@ -612,59 +610,6 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
-/**
- * aio_disable_external:
- * @ctx: the aio context
- *
- * Disable the further processing of external clients.
- */
-static inline void aio_disable_external(AioContext *ctx)
-{
-    qatomic_inc(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_enable_external:
- * @ctx: the aio context
- *
- * Enable the processing of external clients.
- */
-static inline void aio_enable_external(AioContext *ctx)
-{
-    int old;
-
-    old = qatomic_fetch_dec(&ctx->external_disable_cnt);
-    assert(old > 0);
-    if (old == 1) {
-        /* Kick event loop so it re-arms file descriptors */
-        aio_notify(ctx);
-    }
-}
-
-/**
- * aio_external_disabled:
- * @ctx: the aio context
- *
- * Return true if the external clients are disabled.
- */
-static inline bool aio_external_disabled(AioContext *ctx)
-{
-    return qatomic_read(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_node_check:
- * @ctx: the aio context
- * @is_external: Whether or not the checked node is an external event source.
- *
- * Check if the node's is_external flag is okay to be polled by the ctx at this
- * moment. True means green light.
- */
-static inline bool aio_node_check(AioContext *ctx, bool is_external)
-{
-    return !is_external || !qatomic_read(&ctx->external_disable_cnt);
-}
-
 /**
  * aio_co_schedule:
  * @ctx: the aio context
diff --git a/util/aio-posix.h b/util/aio-posix.h
index 80b927c7f4..4264c518be 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -38,7 +38,6 @@ struct AioHandler {
 #endif
     int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool poll_ready; /* has polling detected an event? */
-    bool is_external;
 };
 
 /* Add a handler to a ready list */
diff --git a/block.c b/block.c
index a79297f99b..e9625ffeee 100644
--- a/block.c
+++ b/block.c
@@ -7254,9 +7254,6 @@ static void bdrv_detach_aio_context(BlockDriverState *bs)
         bs->drv->bdrv_detach_aio_context(bs);
     }
 
-    if (bs->quiesce_counter) {
-        aio_enable_external(bs->aio_context);
-    }
     bs->aio_context = NULL;
 }
 
@@ -7266,10 +7263,6 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
     BdrvAioNotifier *ban, *ban_tmp;
     GLOBAL_STATE_CODE();
 
-    if (bs->quiesce_counter) {
-        aio_disable_external(new_context);
-    }
-
     bs->aio_context = new_context;
 
     if (bs->drv && bs->drv->bdrv_attach_aio_context) {
diff --git a/block/blkio.c b/block/blkio.c
index 0cdc99a729..72117fa005 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -306,23 +306,18 @@ static void blkio_attach_aio_context(BlockDriverState *bs,
 {
     BDRVBlkioState *s = bs->opaque;
 
-    aio_set_fd_handler(new_context,
-                       s->completion_fd,
-                       false,
-                       blkio_completion_fd_read,
-                       NULL,
+    aio_set_fd_handler(new_context, s->completion_fd,
+                       blkio_completion_fd_read, NULL,
                        blkio_completion_fd_poll,
-                       blkio_completion_fd_poll_ready,
-                       bs);
+                       blkio_completion_fd_poll_ready, bs);
 }
 
 static void blkio_detach_aio_context(BlockDriverState *bs)
 {
     BDRVBlkioState *s = bs->opaque;
 
-    aio_set_fd_handler(bdrv_get_aio_context(bs),
-                       s->completion_fd,
-                       false, NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->completion_fd, NULL, NULL,
+                       NULL, NULL, NULL);
 }
 
 /* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
diff --git a/block/curl.c b/block/curl.c
index 8bb39a134e..0fc42d03d7 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -132,7 +132,7 @@ static gboolean curl_drop_socket(void *key, void *value, void *opaque)
     CURLSocket *socket = value;
     BDRVCURLState *s = socket->s;
 
-    aio_set_fd_handler(s->aio_context, socket->fd, false,
+    aio_set_fd_handler(s->aio_context, socket->fd,
                        NULL, NULL, NULL, NULL, NULL);
     return true;
 }
@@ -180,20 +180,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
     trace_curl_sock_cb(action, (int)fd);
     switch (action) {
         case CURL_POLL_IN:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                curl_multi_do, NULL, NULL, NULL, socket);
             break;
         case CURL_POLL_OUT:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                NULL, curl_multi_do, NULL, NULL, socket);
             break;
         case CURL_POLL_INOUT:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                curl_multi_do, curl_multi_do,
                                NULL, NULL, socket);
             break;
         case CURL_POLL_REMOVE:
-            aio_set_fd_handler(s->aio_context, fd, false,
+            aio_set_fd_handler(s->aio_context, fd,
                                NULL, NULL, NULL, NULL, NULL);
             break;
     }
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 83bccf046b..9d4c6c4dee 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -84,7 +84,7 @@ static void fuse_export_drained_begin(void *opaque)
     FuseExport *exp = opaque;
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        NULL, NULL, NULL, NULL, NULL);
     exp->fd_handler_set_up = false;
 }
@@ -97,7 +97,7 @@ static void fuse_export_drained_end(void *opaque)
     exp->common.ctx = blk_get_aio_context(exp->common.blk);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 }
@@ -270,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
     g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
 
     aio_set_fd_handler(exp->common.ctx,
-                       fuse_session_fd(exp->fuse_session), false,
+                       fuse_session_fd(exp->fuse_session),
                        read_from_fuse_export, NULL, NULL, NULL, exp);
     exp->fd_handler_set_up = true;
 
@@ -324,7 +324,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
 
         if (exp->fd_handler_set_up) {
             aio_set_fd_handler(exp->common.ctx,
-                               fuse_session_fd(exp->fuse_session), false,
+                               fuse_session_fd(exp->fuse_session),
                                NULL, NULL, NULL, NULL, NULL);
             exp->fd_handler_set_up = false;
         }
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 35dc8fcf45..94e2491e4a 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -141,7 +141,7 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
     }
 
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-                       false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+                       on_vduse_vq_kick, NULL, NULL, NULL, vq);
     /* Make sure we don't miss any kick afer reconnecting */
     eventfd_write(vduse_queue_get_fd(vq), 1);
 }
@@ -155,7 +155,7 @@ static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
         return;
     }
 
-    aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+    aio_set_fd_handler(vblk_exp->export.ctx, fd,
                        NULL, NULL, NULL, NULL, NULL);
 }
 
@@ -174,7 +174,7 @@ static void on_vduse_dev_kick(void *opaque)
 static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 {
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       false, on_vduse_dev_kick, NULL, NULL, NULL,
+                       on_vduse_dev_kick, NULL, NULL, NULL,
                        vblk_exp->dev);
 
     /* Virtqueues are handled by vduse_blk_drained_end() */
@@ -183,7 +183,7 @@ static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 {
     aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
 
     /* Virtqueues are handled by vduse_blk_drained_begin() */
 }
@@ -368,7 +368,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
         vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
     }
 
-    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false,
+    aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev),
                        on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev);
 
     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
diff --git a/block/io.c b/block/io.c
index db438c7657..5b0b126422 100644
--- a/block/io.c
+++ b/block/io.c
@@ -356,7 +356,6 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
 
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
-        aio_disable_external(bdrv_get_aio_context(bs));
         bdrv_parent_drained_begin(bs, parent);
         if (bs->drv && bs->drv->bdrv_drain_begin) {
             bs->drv->bdrv_drain_begin(bs);
@@ -409,7 +408,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent)
             bs->drv->bdrv_drain_end(bs);
         }
         bdrv_parent_drained_end(bs, parent);
-        aio_enable_external(bdrv_get_aio_context(bs));
     }
 }
 
diff --git a/block/io_uring.c b/block/io_uring.c
index 989f9a99ed..b64a3e6285 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -406,7 +406,7 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
 
 void luring_detach_aio_context(LuringState *s, AioContext *old_context)
 {
-    aio_set_fd_handler(old_context, s->ring.ring_fd, false,
+    aio_set_fd_handler(old_context, s->ring.ring_fd,
                        NULL, NULL, NULL, NULL, s);
     qemu_bh_delete(s->completion_bh);
     s->aio_context = NULL;
@@ -416,7 +416,7 @@ void luring_attach_aio_context(LuringState *s, AioContext *new_context)
 {
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
-    aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
+    aio_set_fd_handler(s->aio_context, s->ring.ring_fd,
                        qemu_luring_completion_cb, NULL,
                        qemu_luring_poll_cb, qemu_luring_poll_ready, s);
 }
diff --git a/block/iscsi.c b/block/iscsi.c
index 9fc0bed90b..34f97ab646 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -363,7 +363,6 @@ iscsi_set_events(IscsiLun *iscsilun)
 
     if (ev != iscsilun->events) {
         aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsi),
-                           false,
                            (ev & POLLIN) ? iscsi_process_read : NULL,
                            (ev & POLLOUT) ? iscsi_process_write : NULL,
                            NULL, NULL,
@@ -1540,7 +1539,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
     IscsiLun *iscsilun = bs->opaque;
 
     aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
     iscsilun->events = 0;
 
     if (iscsilun->nop_timer) {
diff --git a/block/linux-aio.c b/block/linux-aio.c
index fc50cdd1bf..129908531a 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -443,7 +443,7 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
-    aio_set_event_notifier(old_context, &s->e, false, NULL, NULL, NULL);
+    aio_set_event_notifier(old_context, &s->e, NULL, NULL, NULL);
     qemu_bh_delete(s->completion_bh);
     s->aio_context = NULL;
 }
@@ -452,7 +452,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
 {
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
-    aio_set_event_notifier(new_context, &s->e, false,
+    aio_set_event_notifier(new_context, &s->e,
                            qemu_laio_completion_cb,
                            qemu_laio_poll_cb,
                            qemu_laio_poll_ready);
diff --git a/block/nfs.c b/block/nfs.c
index 351dc6ec8d..2f603bba42 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -195,7 +195,6 @@ static void nfs_set_events(NFSClient *client)
     int ev = nfs_which_events(client->context);
     if (ev != client->events) {
         aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                           false,
                            (ev & POLLIN) ? nfs_process_read : NULL,
                            (ev & POLLOUT) ? nfs_process_write : NULL,
                            NULL, NULL, client);
@@ -373,7 +372,7 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
 
     aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                       false, NULL, NULL, NULL, NULL, NULL);
+                       NULL, NULL, NULL, NULL, NULL);
     client->events = 0;
 }
 
@@ -391,7 +390,7 @@ static void nfs_client_close(NFSClient *client)
     if (client->context) {
         qemu_mutex_lock(&client->mutex);
         aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                           false, NULL, NULL, NULL, NULL, NULL);
+                           NULL, NULL, NULL, NULL, NULL);
         qemu_mutex_unlock(&client->mutex);
         if (client->fh) {
             nfs_close(client->context, client->fh);
diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..17937d398d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -862,7 +862,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
     }
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, nvme_handle_event, nvme_poll_cb,
+                           nvme_handle_event, nvme_poll_cb,
                            nvme_poll_ready);
 
     if (!nvme_identify(bs, namespace, errp)) {
@@ -948,7 +948,7 @@ static void nvme_close(BlockDriverState *bs)
     g_free(s->queues);
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, NULL, NULL, NULL);
+                           NULL, NULL, NULL);
     event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
     qemu_vfio_pci_unmap_bar(s->vfio, 0, s->bar0_wo_map,
                             0, sizeof(NvmeBar) + NVME_DOORBELL_SIZE);
@@ -1546,7 +1546,7 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
 
     aio_set_event_notifier(bdrv_get_aio_context(bs),
                            &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, NULL, NULL, NULL);
+                           NULL, NULL, NULL);
 }
 
 static void nvme_attach_aio_context(BlockDriverState *bs,
@@ -1556,7 +1556,7 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 
     s->aio_context = new_context;
     aio_set_event_notifier(new_context, &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
-                           false, nvme_handle_event, nvme_poll_cb,
+                           nvme_handle_event, nvme_poll_cb,
                            nvme_poll_ready);
 
     for (unsigned i = 0; i < s->queue_count; i++) {
diff --git a/block/ssh.c b/block/ssh.c
index b3b3352075..2748253d4a 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1019,7 +1019,7 @@ static void restart_coroutine(void *opaque)
     AioContext *ctx = bdrv_get_aio_context(bs);
 
     trace_ssh_restart_coroutine(restart->co);
-    aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(ctx, s->sock, NULL, NULL, NULL, NULL, NULL);
 
     aio_co_wake(restart->co);
 }
@@ -1049,7 +1049,7 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
     trace_ssh_co_yield(s->sock, rd_handler, wr_handler);
 
     aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, rd_handler, wr_handler, NULL, NULL, &restart);
+                       rd_handler, wr_handler, NULL, NULL, &restart);
     qemu_coroutine_yield();
     trace_ssh_co_yield_back(s->sock);
 }
diff --git a/block/win32-aio.c b/block/win32-aio.c
index ee87d6048f..6327861e1d 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -174,7 +174,7 @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile)
 void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *old_context)
 {
-    aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL, NULL);
+    aio_set_event_notifier(old_context, &aio->e, NULL, NULL, NULL);
     aio->aio_ctx = NULL;
 }
 
@@ -182,8 +182,8 @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *new_context)
 {
     aio->aio_ctx = new_context;
-    aio_set_event_notifier(new_context, &aio->e, false,
-                           win32_aio_completion_cb, NULL, NULL);
+    aio_set_event_notifier(new_context, &aio->e, win32_aio_completion_cb,
+                           NULL, NULL);
 }
 
 QEMUWin32AIOState *win32_aio_init(void)
diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 6e81bc8791..0b189c6ab8 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Xenstore evtchn port init failed");
         return;
     }
-    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
+    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh),
                        xen_xenstore_event, NULL, NULL, NULL, s);
 
     s->impl = xs_impl_create(xen_domid);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index dcd7aabb4e..6125e4d556 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false,
+    aio_set_event_notifier(ctx, &vq->host_notifier,
                            virtio_queue_host_notifier_read,
                            virtio_queue_host_notifier_aio_poll,
                            virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
 void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false,
+    aio_set_event_notifier(ctx, &vq->host_notifier,
                            virtio_queue_host_notifier_read,
                            NULL, NULL);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-    aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL);
     /* Test and clear notifier before after disabling event,
      * in case poll callback didn't have time to run. */
     virtio_queue_host_notifier_read(&vq->host_notifier);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index c4fd26abe1..7b5c338ea1 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,11 +842,11 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
     }
 
     if (channel->ctx)
-        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+        aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
                            NULL, NULL, NULL, NULL, NULL);
 
     channel->ctx = ctx;
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
                        xen_device_event, NULL, xen_device_poll, NULL, channel);
 }
 
@@ -920,7 +920,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
 
     QLIST_REMOVE(channel, list);
 
-    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
+    aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh),
                        NULL, NULL, NULL, NULL, NULL);
 
     if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
diff --git a/io/channel-command.c b/io/channel-command.c
index e7edd091af..7ed726c802 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -337,10 +337,8 @@ static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
                                                    void *opaque)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
-    aio_set_fd_handler(ctx, cioc->readfd, false,
-                       io_read, NULL, NULL, NULL, opaque);
-    aio_set_fd_handler(ctx, cioc->writefd, false,
-                       NULL, io_write, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, cioc->readfd, io_read, NULL, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, cioc->writefd, NULL, io_write, NULL, NULL, opaque);
 }
 
 
diff --git a/io/channel-file.c b/io/channel-file.c
index d76663e6ae..8b5821f452 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -198,8 +198,7 @@ static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc,
                                                 void *opaque)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
-    aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write,
-                       NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, fioc->fd, io_read, io_write, NULL, NULL, opaque);
 }
 
 static GSource *qio_channel_file_create_watch(QIOChannel *ioc,
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b0ea7d48b3..d99945ebec 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -899,8 +899,7 @@ static void qio_channel_socket_set_aio_fd_handler(QIOChannel *ioc,
                                                   void *opaque)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
-    aio_set_fd_handler(ctx, sioc->fd, false,
-                       io_read, io_write, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, sioc->fd, io_read, io_write, NULL, NULL, opaque);
 }
 
 static GSource *qio_channel_socket_create_watch(QIOChannel *ioc,
diff --git a/migration/rdma.c b/migration/rdma.c
index df646be35e..aee41ca43e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3104,15 +3104,15 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     if (io_read) {
-        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
-        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
     } else {
-        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
-        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
-                           false, io_read, io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
+        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read,
+                           io_write, NULL, NULL, opaque);
     }
 }
 
diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 321d7ab01a..519440eed3 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -130,7 +130,7 @@ static void *test_acquire_thread(void *opaque)
 static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
                                EventNotifierHandler *handler)
 {
-    aio_set_event_notifier(ctx, notifier, false, handler, NULL, NULL);
+    aio_set_event_notifier(ctx, notifier, handler, NULL, NULL);
 }
 
 static void dummy_notifier_read(EventNotifier *n)
@@ -383,30 +383,6 @@ static void test_flush_event_notifier(void)
     event_notifier_cleanup(&data.e);
 }
 
-static void test_aio_external_client(void)
-{
-    int i, j;
-
-    for (i = 1; i < 3; i++) {
-        EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
-        event_notifier_init(&data.e, false);
-        aio_set_event_notifier(ctx, &data.e, true, event_ready_cb, NULL, NULL);
-        event_notifier_set(&data.e);
-        for (j = 0; j < i; j++) {
-            aio_disable_external(ctx);
-        }
-        for (j = 0; j < i; j++) {
-            assert(!aio_poll(ctx, false));
-            assert(event_notifier_test_and_clear(&data.e));
-            event_notifier_set(&data.e);
-            aio_enable_external(ctx);
-        }
-        assert(aio_poll(ctx, false));
-        set_event_notifier(ctx, &data.e, NULL);
-        event_notifier_cleanup(&data.e);
-    }
-}
-
 static void test_wait_event_notifier_noflush(void)
 {
     EventNotifierTestData data = { .n = 0 };
@@ -935,7 +911,6 @@ int main(int argc, char **argv)
     g_test_add_func("/aio/event/wait",              test_wait_event_notifier);
     g_test_add_func("/aio/event/wait/no-flush-cb",  test_wait_event_notifier_noflush);
     g_test_add_func("/aio/event/flush",             test_flush_event_notifier);
-    g_test_add_func("/aio/external-client",         test_aio_external_client);
     g_test_add_func("/aio/timer/schedule",          test_timer_schedule);
 
     g_test_add_func("/aio/coroutine/queue-chaining", test_queue_chaining);
diff --git a/tests/unit/test-fdmon-epoll.c b/tests/unit/test-fdmon-epoll.c
deleted file mode 100644
index ef5a856d09..0000000000
--- a/tests/unit/test-fdmon-epoll.c
+++ /dev/null
@@ -1,73 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * fdmon-epoll tests
- *
- * Copyright (c) 2020 Red Hat, Inc.
- */
-
-#include "qemu/osdep.h"
-#include "block/aio.h"
-#include "qapi/error.h"
-#include "qemu/main-loop.h"
-
-static AioContext *ctx;
-
-static void dummy_fd_handler(EventNotifier *notifier)
-{
-    event_notifier_test_and_clear(notifier);
-}
-
-static void add_event_notifiers(EventNotifier *notifiers, size_t n)
-{
-    for (size_t i = 0; i < n; i++) {
-        event_notifier_init(&notifiers[i], false);
-        aio_set_event_notifier(ctx, &notifiers[i], false,
-                               dummy_fd_handler, NULL, NULL);
-    }
-}
-
-static void remove_event_notifiers(EventNotifier *notifiers, size_t n)
-{
-    for (size_t i = 0; i < n; i++) {
-        aio_set_event_notifier(ctx, &notifiers[i], false, NULL, NULL, NULL);
-        event_notifier_cleanup(&notifiers[i]);
-    }
-}
-
-/* Check that fd handlers work when external clients are disabled */
-static void test_external_disabled(void)
-{
-    EventNotifier notifiers[100];
-
-    /* fdmon-epoll is only enabled when many fd handlers are registered */
-    add_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-
-    event_notifier_set(&notifiers[0]);
-    assert(aio_poll(ctx, true));
-
-    aio_disable_external(ctx);
-    event_notifier_set(&notifiers[0]);
-    assert(aio_poll(ctx, true));
-    aio_enable_external(ctx);
-
-    remove_event_notifiers(notifiers, G_N_ELEMENTS(notifiers));
-}
-
-int main(int argc, char **argv)
-{
-    /*
-     * This code relies on the fact that fdmon-io_uring disables itself when
-     * the glib main loop is in use. The main loop uses fdmon-poll and upgrades
-     * to fdmon-epoll when the number of fds exceeds a threshold.
-     */
-    qemu_init_main_loop(&error_fatal);
-    ctx = qemu_get_aio_context();
-
-    while (g_main_context_iteration(NULL, false)) {
-        /* Do nothing */
-    }
-
-    g_test_init(&argc, &argv, NULL);
-    g_test_add_func("/fdmon-epoll/external-disabled", test_external_disabled);
-    return g_test_run();
-}
diff --git a/util/aio-posix.c b/util/aio-posix.c
index a8be940f76..934b1bbb85 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -99,7 +99,6 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -144,7 +143,6 @@ void aio_set_fd_handler(AioContext *ctx,
         new_node->io_poll = io_poll;
         new_node->io_poll_ready = io_poll_ready;
         new_node->opaque = opaque;
-        new_node->is_external = is_external;
 
         if (is_new) {
             new_node->pfd.fd = fd;
@@ -196,12 +194,11 @@ static void aio_set_fd_poll(AioContext *ctx, int fd,
 
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *notifier,
-                            bool is_external,
                             EventNotifierHandler *io_read,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready)
 {
-    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
+    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
                        (IOHandler *)io_read, NULL, io_poll,
                        (IOHandler *)io_poll_ready, notifier);
 }
@@ -285,13 +282,11 @@ bool aio_pending(AioContext *ctx)
 
         /* TODO should this check poll ready? */
         revents = node->pfd.revents & node->pfd.events;
-        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
-            aio_node_check(ctx, node->is_external)) {
+        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
             result = true;
             break;
         }
-        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
-            aio_node_check(ctx, node->is_external)) {
+        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
             result = true;
             break;
         }
@@ -350,9 +345,7 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
         QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
     }
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
-        poll_ready && revents == 0 &&
-        aio_node_check(ctx, node->is_external) &&
-        node->io_poll_ready) {
+        poll_ready && revents == 0 && node->io_poll_ready) {
         node->io_poll_ready(node->opaque);
 
         /*
@@ -364,7 +357,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
 
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
         (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
-        aio_node_check(ctx, node->is_external) &&
         node->io_read) {
         node->io_read(node->opaque);
 
@@ -375,7 +367,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
     }
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
         (revents & (G_IO_OUT | G_IO_ERR)) &&
-        aio_node_check(ctx, node->is_external) &&
         node->io_write) {
         node->io_write(node->opaque);
         progress = true;
@@ -436,8 +427,7 @@ static bool run_poll_handlers_once(AioContext *ctx,
     AioHandler *tmp;
 
     QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) {
-        if (aio_node_check(ctx, node->is_external) &&
-            node->io_poll(node->opaque)) {
+        if (node->io_poll(node->opaque)) {
             aio_add_poll_ready_handler(ready_list, node);
 
             node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS;
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 6bded009a4..948ef47a4d 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -32,7 +32,6 @@ struct AioHandler {
     GPollFD pfd;
     int deleted;
     void *opaque;
-    bool is_external;
     QLIST_ENTRY(AioHandler) node;
 };
 
@@ -64,7 +63,6 @@ static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 
 void aio_set_fd_handler(AioContext *ctx,
                         int fd,
-                        bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
                         AioPollFn *io_poll,
@@ -111,7 +109,6 @@ void aio_set_fd_handler(AioContext *ctx,
         node->opaque = opaque;
         node->io_read = io_read;
         node->io_write = io_write;
-        node->is_external = is_external;
 
         if (io_read) {
             bitmask |= FD_READ | FD_ACCEPT | FD_CLOSE;
@@ -135,7 +132,6 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *e,
-                            bool is_external,
                             EventNotifierHandler *io_notify,
                             AioPollFn *io_poll,
                             EventNotifierHandler *io_poll_ready)
@@ -161,7 +157,6 @@ void aio_set_event_notifier(AioContext *ctx,
             node->e = e;
             node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
             node->pfd.events = G_IO_IN;
-            node->is_external = is_external;
             QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
 
             g_source_add_poll(&ctx->source, &node->pfd);
@@ -368,8 +363,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* fill fd sets */
     count = 0;
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!node->deleted && node->io_notify
-            && aio_node_check(ctx, node->is_external)) {
+        if (!node->deleted && node->io_notify) {
             assert(count < MAXIMUM_WAIT_OBJECTS);
             events[count++] = event_notifier_get_handle(node->e);
         }
diff --git a/util/async.c b/util/async.c
index 21016a1ac7..be0726038e 100644
--- a/util/async.c
+++ b/util/async.c
@@ -377,7 +377,7 @@ aio_ctx_finalize(GSource     *source)
         g_free(bh);
     }
 
-    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL, NULL);
+    aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL, NULL);
     event_notifier_cleanup(&ctx->notifier);
     qemu_rec_mutex_destroy(&ctx->lock);
     qemu_lockcnt_destroy(&ctx->list_lock);
@@ -561,7 +561,6 @@ AioContext *aio_context_new(Error **errp)
     QSLIST_INIT(&ctx->scheduled_coroutines);
 
     aio_set_event_notifier(ctx, &ctx->notifier,
-                           false,
                            aio_context_notifier_cb,
                            aio_context_notifier_poll,
                            aio_context_notifier_poll_ready);
diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index e11a8a022e..ef3eacacd2 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -64,11 +64,6 @@ static int fdmon_epoll_wait(AioContext *ctx, AioHandlerList *ready_list,
     int i, ret = 0;
     struct epoll_event events[128];
 
-    /* Fall back while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
-    }
-
     if (timeout > 0) {
         ret = qemu_poll_ns(&pfd, 1, timeout);
         if (ret > 0) {
@@ -131,11 +126,6 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
         return false;
     }
 
-    /* Do not upgrade while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return false;
-    }
-
     if (npfd >= EPOLL_ENABLE_THRESHOLD) {
         if (fdmon_epoll_try_enable(ctx)) {
             return true;
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index ab43052dd7..17ec18b7bd 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -276,11 +276,6 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
     unsigned wait_nr = 1; /* block until at least one cqe is ready */
     int ret;
 
-    /* Fall back while external clients are disabled */
-    if (qatomic_read(&ctx->external_disable_cnt)) {
-        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
-    }
-
     if (timeout == 0) {
         wait_nr = 0; /* non-blocking */
     } else if (timeout > 0) {
@@ -315,8 +310,7 @@ static bool fdmon_io_uring_need_wait(AioContext *ctx)
         return true;
     }
 
-    /* Are we falling back to fdmon-poll? */
-    return qatomic_read(&ctx->external_disable_cnt);
+    return false;
 }
 
 static const FDMonOps fdmon_io_uring_ops = {
diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
index 5fe3b47865..17df917cf9 100644
--- a/util/fdmon-poll.c
+++ b/util/fdmon-poll.c
@@ -65,8 +65,7 @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list,
     assert(npfd == 0);
 
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
-                && aio_node_check(ctx, node->is_external)) {
+        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) {
             add_pollfd(node);
         }
     }
diff --git a/util/main-loop.c b/util/main-loop.c
index e180c85145..3e43a9cd38 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -642,14 +642,13 @@ void qemu_set_fd_handler(int fd,
                          void *opaque)
 {
     iohandler_init();
-    aio_set_fd_handler(iohandler_ctx, fd, false,
-                       fd_read, fd_write, NULL, NULL, opaque);
+    aio_set_fd_handler(iohandler_ctx, fd, fd_read, fd_write, NULL, NULL,
+                       opaque);
 }
 
 void event_notifier_set_handler(EventNotifier *e,
                                 EventNotifierHandler *handler)
 {
     iohandler_init();
-    aio_set_event_notifier(iohandler_ctx, e, false,
-                           handler, NULL, NULL);
+    aio_set_event_notifier(iohandler_ctx, e, handler, NULL, NULL);
 }
diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c
index d791932d63..364f4d5abf 100644
--- a/util/qemu-coroutine-io.c
+++ b/util/qemu-coroutine-io.c
@@ -74,8 +74,7 @@ typedef struct {
 static void fd_coroutine_enter(void *opaque)
 {
     FDYieldUntilData *data = opaque;
-    aio_set_fd_handler(data->ctx, data->fd, false,
-                       NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(data->ctx, data->fd, NULL, NULL, NULL, NULL, NULL);
     qemu_coroutine_enter(data->co);
 }
 
@@ -87,7 +86,7 @@ void coroutine_fn yield_until_fd_readable(int fd)
     data.ctx = qemu_get_current_aio_context();
     data.co = qemu_coroutine_self();
     data.fd = fd;
-    aio_set_fd_handler(
-        data.ctx, fd, false, fd_coroutine_enter, NULL, NULL, NULL, &data);
+    aio_set_fd_handler(data.ctx, fd, fd_coroutine_enter, NULL, NULL, NULL,
+                       &data);
     qemu_coroutine_yield();
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 332aea9306..9ba19121a2 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
         vu_fd_watch->fd = fd;
         vu_fd_watch->cb = cb;
         qemu_socket_set_nonblock(fd);
-        aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
+        aio_set_fd_handler(server->ioc->ctx, fd, kick_handler,
                            NULL, NULL, NULL, vu_fd_watch);
         vu_fd_watch->vu_dev = vu_dev;
         vu_fd_watch->pvt = pvt;
@@ -299,8 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
     if (!vu_fd_watch) {
         return;
     }
-    aio_set_fd_handler(server->ioc->ctx, fd, false,
-                       NULL, NULL, NULL, NULL, NULL);
+    aio_set_fd_handler(server->ioc->ctx, fd, NULL, NULL, NULL, NULL, NULL);
 
     QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
     g_free(vu_fd_watch);
@@ -362,7 +361,7 @@ void vhost_user_server_stop(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
@@ -403,7 +402,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
     qio_channel_attach_aio_context(server->ioc, ctx);
 
     QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-        aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL,
+        aio_set_fd_handler(ctx, vu_fd_watch->fd, kick_handler, NULL,
                            NULL, NULL, vu_fd_watch);
     }
 
@@ -417,7 +416,7 @@ void vhost_user_server_detach_aio_context(VuServer *server)
         VuFdWatch *vu_fd_watch;
 
         QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd,
                                NULL, NULL, NULL, NULL, vu_fd_watch);
         }
 
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index fa63cfe6ff..2980170397 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -121,9 +121,6 @@ if have_block
   if nettle.found() or gcrypt.found()
     tests += {'test-crypto-pbkdf': [io]}
   endif
-  if config_host_data.get('CONFIG_EPOLL_CREATE1')
-    tests += {'test-fdmon-epoll': [testblock]}
-  endif
 endif
 
 if have_system
-- 
2.39.2




* Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event
  2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
@ 2023-04-03 20:47   ` Philippe Mathieu-Daudé
  2023-04-04 13:06     ` Stefan Hajnoczi
  2023-04-04 14:38   ` Michael S. Tsirkin
  1 sibling, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-04-03 20:47 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Kevin Wolf, Peter Lieven,
	Coiby Xu, xen-devel, Richard Henderson, Stefano Garzarella,
	qemu-block, Eduardo Habkost, Paul Durrant, Richard W.M. Jones,
	Dr. David Alan Gilbert, Marcel Apfelbaum, Aarushi Mehta,
	Stefano Stabellini, Fam Zheng, David Woodhouse, Stefan Weil,
	Juan Quintela, Xie Yongji, Hanna Reitz, Ronnie Sahlberg,
	eesposit, Michael S. Tsirkin, Daniel P. Berrangé,
	Anthony Perard

On 3/4/23 20:29, Stefan Hajnoczi wrote:
> Only report a transport reset event to the guest after the SCSIDevice
> has been unrealized by qdev_simple_device_unplug_cb().
> 
> qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> to false so that scsi_device_find/get() no longer see it.
> 
> scsi_target_emulate_report_luns() also needs to be updated to filter out
> SCSIDevices that are unrealized.
> 
> These changes ensure that the guest driver does not see the SCSIDevice
> that's being unplugged if it responds very quickly to the transport
> reset event.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   hw/scsi/scsi-bus.c    |  3 ++-
>   hw/scsi/virtio-scsi.c | 18 +++++++++---------
>   2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index c97176110c..f9bd064833 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
>               DeviceState *qdev = kid->child;
>               SCSIDevice *dev = SCSI_DEVICE(qdev);
>   
> -            if (dev->channel == channel && dev->id == id && dev->lun != 0) {
> +            if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
> +                qatomic_load_acquire(&dev->qdev.realized)) {

Would this be more useful as a qdev_is_realized() helper?
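
Something like this, presumably (just a sketch; the helper name and where
it lives, e.g. include/hw/qdev-core.h, would be up to the qdev
maintainers):

   static inline bool qdev_is_realized(DeviceState *dev)
   {
       return qatomic_load_acquire(&dev->realized);
   }

Then the open-coded qatomic_load_acquire(&dev->qdev.realized) above would
simply become qdev_is_realized(&dev->qdev).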



* Re: [PATCH 13/13] aio: remove aio_disable_external() API
  2023-04-03 18:30 ` [PATCH 13/13] aio: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-04-04  9:16   ` Juan Quintela
  2023-04-04  9:38     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 27+ messages in thread
From: Juan Quintela @ 2023-04-04  9:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Paolo Bonzini, Julia Suvorova, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Xie Yongji, Hanna Reitz,
	Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

Stefan Hajnoczi <stefanha@redhat.com> wrote:
> All callers now pass is_external=false to aio_set_fd_handler() and
> aio_set_event_notifier(). The aio_disable_external() API, which
> temporarily disables fd handlers that were registered with
> is_external=true, is therefore dead code.
>
> Remove aio_disable_external(), aio_enable_external(), and the
> is_external arguments to aio_set_fd_handler() and
> aio_set_event_notifier().
>
> The entire test-fdmon-epoll test is removed because its sole purpose was
> testing aio_disable_external().
>
> Parts of this patch were generated using the following coccinelle
> (https://coccinelle.lip6.fr/) semantic patch:
>
>   @@
>   expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
>   @@
>   - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
>   + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
>
>   @@
>   expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
>   @@
>   - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
>   + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

[....]

> diff --git a/migration/rdma.c b/migration/rdma.c
> index df646be35e..aee41ca43e 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3104,15 +3104,15 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      if (io_read) {
> -        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> -                           false, io_read, io_write, NULL, NULL, opaque);
> -        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
> -                           false, io_read, io_write, NULL, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
> +                           io_write, NULL, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
> +                           io_write, NULL, NULL, opaque);
>      } else {
> -        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> -                           false, io_read, io_write, NULL, NULL, opaque);
> -        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
> -                           false, io_read, io_write, NULL, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read,
> +                           io_write, NULL, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read,
> +                           io_write, NULL, NULL, opaque);
>      }
>  }

Reviewed-by: Juan Quintela <quintela@redhat.com>

For the migration bits.
I don't even want to know why the RDMA code uses a low level block layer API.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 13/13] aio: remove aio_disable_external() API
  2023-04-04  9:16   ` Juan Quintela
@ 2023-04-04  9:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2023-04-04  9:38 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Stefan Hajnoczi, qemu-devel, Paolo Bonzini, Julia Suvorova,
	Kevin Wolf, Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Marcel Apfelbaum,
	Aarushi Mehta, Stefano Stabellini, Fam Zheng, David Woodhouse,
	Stefan Weil, Xie Yongji, Hanna Reitz, Ronnie Sahlberg, eesposit,
	Michael S. Tsirkin, Daniel P. Berrangé,
	Anthony Perard

* Juan Quintela (quintela@redhat.com) wrote:
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > All callers now pass is_external=false to aio_set_fd_handler() and
> > aio_set_event_notifier(). The aio_disable_external() API that
> > temporarily disables fd handlers that were registered is_external=true
> > is therefore dead code.
> >
> > Remove aio_disable_external(), aio_enable_external(), and the
> > is_external arguments to aio_set_fd_handler() and
> > aio_set_event_notifier().
> >
> > The entire test-fdmon-epoll test is removed because its sole purpose was
> > testing aio_disable_external().
> >
> > Parts of this patch were generated using the following coccinelle
> > (https://coccinelle.lip6.fr/) semantic patch:
> >
> >   @@
> >   expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
> >   @@
> >   - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
> >   + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
> >
> >   @@
> >   expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
> >   @@
> >   - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
> >   + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> [....]
> 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index df646be35e..aee41ca43e 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3104,15 +3104,15 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
> >  {
> >      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> >      if (io_read) {
> > -        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> > -                           false, io_read, io_write, NULL, NULL, opaque);
> > -        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
> > -                           false, io_read, io_write, NULL, NULL, opaque);
> > +        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
> > +                           io_write, NULL, NULL, opaque);
> > +        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
> > +                           io_write, NULL, NULL, opaque);
> >      } else {
> > -        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> > -                           false, io_read, io_write, NULL, NULL, opaque);
> > -        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
> > -                           false, io_read, io_write, NULL, NULL, opaque);
> > +        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read,
> > +                           io_write, NULL, NULL, opaque);
> > +        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read,
> > +                           io_write, NULL, NULL, opaque);
> >      }
> >  }
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> For the migration bits.
> I don't even want to know why the RDMA code uses a low level block layer API.

I don't think it's block-specific.
It looks like it's because qio_channel uses aio in the case where
something returns QIO_CHANNEL_ERR_BLOCK and then waits for recovery; see
4d9f675, which added it.

Dave
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
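
As background, the qio_channel pattern being referred to is roughly the
following (a simplified sketch, not the actual migration/rdma.c code; the
function name is made up for illustration):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "io/channel.h"

    /* Sketch: a coroutine-context read that yields when the channel would
     * block. qio_channel_yield() registers an fd handler through the
     * AioContext and resumes the coroutine once the fd is readable, which
     * is why channel implementations such as RDMA provide a
     * set_aio_fd_handler callback. */
    static ssize_t coroutine_fn sketch_channel_read(QIOChannel *ioc,
                                                    const struct iovec *iov,
                                                    size_t niov, Error **errp)
    {
        for (;;) {
            ssize_t ret = qio_channel_readv(ioc, iov, niov, errp);
            if (ret != QIO_CHANNEL_ERR_BLOCK) {
                return ret;
            }
            qio_channel_yield(ioc, G_IO_IN);
        }
    }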



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
  2023-04-03 18:29 ` [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
@ 2023-04-04  9:52   ` David Woodhouse
  0 siblings, 0 replies; 27+ messages in thread
From: David Woodhouse @ 2023-04-04  9:52 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Paolo Bonzini, Julia Suvorova, Kevin Wolf, Peter Lieven,
	Coiby Xu, xen-devel, Richard Henderson, Stefano Garzarella,
	qemu-block, Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	Stefan Weil, Juan Quintela, Xie Yongji, Hanna Reitz,
	Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

On Mon, 2023-04-03 at 14:29 -0400, Stefan Hajnoczi wrote:
> There is no need to suspend activity between aio_disable_external() and
> aio_enable_external(), which is mainly used for the block layer's drain
> operation.
> 
> This is part of ongoing work to remove the aio_disable_external() API.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

Thanks.

> ---
>  hw/i386/kvm/xen_xenstore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
> index 900679af8a..6e81bc8791 100644
> --- a/hw/i386/kvm/xen_xenstore.c
> +++ b/hw/i386/kvm/xen_xenstore.c
> @@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
>          error_setg(errp, "Xenstore evtchn port init failed");
>          return;
>      }
> -    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
> +    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
>                         xen_xenstore_event, NULL, NULL, NULL, s);
>  
>      s->impl = xs_impl_create(xen_domid);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event
  2023-04-03 20:47   ` Philippe Mathieu-Daudé
@ 2023-04-04 13:06     ` Stefan Hajnoczi
  2023-04-04 13:47       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-04 13:06 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Paolo Bonzini, Julia Suvorova, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost, Paul Durrant,
	Richard W.M. Jones, Dr. David Alan Gilbert, Marcel Apfelbaum,
	Aarushi Mehta, Stefano Stabellini, Fam Zheng, David Woodhouse,
	Stefan Weil, Juan Quintela, Xie Yongji, Hanna Reitz,
	Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

[-- Attachment #1: Type: text/plain, Size: 1650 bytes --]

On Mon, Apr 03, 2023 at 10:47:11PM +0200, Philippe Mathieu-Daudé wrote:
> On 3/4/23 20:29, Stefan Hajnoczi wrote:
> > Only report a transport reset event to the guest after the SCSIDevice
> > has been unrealized by qdev_simple_device_unplug_cb().
> > 
> > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > to false so that scsi_device_find/get() no longer see it.
> > 
> > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > SCSIDevices that are unrealized.
> > 
> > These changes ensure that the guest driver does not see the SCSIDevice
> > that's being unplugged if it responds very quickly to the transport
> > reset event.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   hw/scsi/scsi-bus.c    |  3 ++-
> >   hw/scsi/virtio-scsi.c | 18 +++++++++---------
> >   2 files changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > index c97176110c..f9bd064833 100644
> > --- a/hw/scsi/scsi-bus.c
> > +++ b/hw/scsi/scsi-bus.c
> > @@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
> >               DeviceState *qdev = kid->child;
> >               SCSIDevice *dev = SCSI_DEVICE(qdev);
> > -            if (dev->channel == channel && dev->id == id && dev->lun != 0) {
> > +            if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
> > +                qatomic_load_acquire(&dev->qdev.realized)) {
> 
> Would this be more useful as a qdev_is_realized() helper?

Yes. There are no other users, but I think a helper makes sense.

Stefan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug
  2023-04-03 18:29 ` [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
@ 2023-04-04 13:38   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2023-04-04 13:38 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu, xen-devel,
	Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard, Zhengui Li

On 4/3/23 20:29, Stefan Hajnoczi wrote:
> This patch is part of an effort to remove the aio_disable_external()
> API because it does not fit in a multi-queue block layer world where
> many AioContexts may be submitting requests to the same disk.
> 
> The SCSI emulation code is already in good shape to stop using
> aio_disable_external(). It was only used by commit 9c5aad84da1c
> ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> disk") to ensure that virtio_scsi_hotunplug() works while the guest
> driver is submitting I/O.
> 
> Ensure virtio_scsi_hotunplug() is safe as follows:
> 
> 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
>     device_set_realized() calls qatomic_set(&dev->realized, false) so
>     that future scsi_device_get() calls return NULL because they exclude
>     SCSIDevices with realized=false.
> 
>     That means virtio-scsi will reject new I/O requests to this
>     SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
>     virtio_scsi_hotunplug() is still executing. We are protected against
>     new requests!
> 
> 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
>     that in-flight requests are cancelled synchronously. This ensures
>     that no in-flight requests remain once qdev_simple_device_unplug_cb()
>     returns.
> 
> Thanks to these two conditions we don't need aio_disable_external()
> anymore.
> 
> Cc: Zhengui Li <lizhengui@huawei.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   hw/scsi/scsi-disk.c   | 1 +
>   hw/scsi/virtio-scsi.c | 3 ---
>   2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 97c9b1c8cd..e01bd84541 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -2522,6 +2522,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
>   
>   static void scsi_unrealize(SCSIDevice *dev)
>   {
> +    scsi_device_purge_requests(dev, SENSE_CODE(RESET));
>       del_boot_device_lchs(&dev->qdev, NULL);
>   }
>   
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 000961446c..a02f9233ec 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1061,11 +1061,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>       VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
>       VirtIOSCSI *s = VIRTIO_SCSI(vdev);
>       SCSIDevice *sd = SCSI_DEVICE(dev);
> -    AioContext *ctx = s->ctx ?: qemu_get_aio_context();
>   
> -    aio_disable_external(ctx);
>       qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
> -    aio_enable_external(ctx);
>   
>       if (s->ctx) {
>           virtio_scsi_acquire(s);

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter
  2023-04-03 18:29 ` [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
@ 2023-04-04 13:39   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2023-04-04 13:39 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu, xen-devel,
	Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

On 4/3/23 20:29, Stefan Hajnoczi wrote:
> The VuServer object has a refcount field and ref/unref APIs. The name is
> confusing because it's actually an in-flight request counter instead of
> a refcount.
> 
> Normally a refcount destroys the object upon reaching zero. The VuServer
> counter is used to wake up the vhost-user coroutine when there are no
> more requests.
> 
> Avoid confusion by renaming refcount and ref/unref to in_flight and
> inc/dec.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   include/qemu/vhost-user-server.h     |  6 +++---
>   block/export/vhost-user-blk-server.c | 11 +++++++----
>   util/vhost-user-server.c             | 14 +++++++-------
>   3 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
> index 25c72433ca..bc0ac9ddb6 100644
> --- a/include/qemu/vhost-user-server.h
> +++ b/include/qemu/vhost-user-server.h
> @@ -41,7 +41,7 @@ typedef struct {
>       const VuDevIface *vu_iface;
>   
>       /* Protected by ctx lock */
> -    unsigned int refcount;
> +    unsigned int in_flight;
>       bool wait_idle;
>       VuDev vu_dev;
>       QIOChannel *ioc; /* The I/O channel with the client */
> @@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
>   
>   void vhost_user_server_stop(VuServer *server);
>   
> -void vhost_user_server_ref(VuServer *server);
> -void vhost_user_server_unref(VuServer *server);
> +void vhost_user_server_inc_in_flight(VuServer *server);
> +void vhost_user_server_dec_in_flight(VuServer *server);
>   
>   void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
>   void vhost_user_server_detach_aio_context(VuServer *server);
> diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> index 3409d9e02e..e93f2ed6b4 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -49,7 +49,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
>       free(req);
>   }
>   
> -/* Called with server refcount increased, must decrease before returning */
> +/*
> + * Called with server in_flight counter increased, must decrease before
> + * returning.
> + */
>   static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
>   {
>       VuBlkReq *req = opaque;
> @@ -67,12 +70,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
>                                       in_num, out_num);
>       if (in_len < 0) {
>           free(req);
> -        vhost_user_server_unref(server);
> +        vhost_user_server_dec_in_flight(server);
>           return;
>       }
>   
>       vu_blk_req_complete(req, in_len);
> -    vhost_user_server_unref(server);
> +    vhost_user_server_dec_in_flight(server);
>   }
>   
>   static void vu_blk_process_vq(VuDev *vu_dev, int idx)
> @@ -94,7 +97,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
>           Coroutine *co =
>               qemu_coroutine_create(vu_blk_virtio_process_req, req);
>   
> -        vhost_user_server_ref(server);
> +        vhost_user_server_inc_in_flight(server);
>           qemu_coroutine_enter(co);
>       }
>   }
> diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
> index 5b6216069c..1622f8cfb3 100644
> --- a/util/vhost-user-server.c
> +++ b/util/vhost-user-server.c
> @@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
>       error_report("vu_panic: %s", buf);
>   }
>   
> -void vhost_user_server_ref(VuServer *server)
> +void vhost_user_server_inc_in_flight(VuServer *server)
>   {
>       assert(!server->wait_idle);
> -    server->refcount++;
> +    server->in_flight++;
>   }
>   
> -void vhost_user_server_unref(VuServer *server)
> +void vhost_user_server_dec_in_flight(VuServer *server)
>   {
> -    server->refcount--;
> -    if (server->wait_idle && !server->refcount) {
> +    server->in_flight--;
> +    if (server->wait_idle && !server->in_flight) {
>           aio_co_wake(server->co_trip);
>       }
>   }
> @@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
>           /* Keep running */
>       }
>   
> -    if (server->refcount) {
> +    if (server->in_flight) {
>           /* Wait for requests to complete before we can unmap the memory */
>           server->wait_idle = true;
>           qemu_coroutine_yield();
>           server->wait_idle = false;
>       }
> -    assert(server->refcount == 0);
> +    assert(server->in_flight == 0);
>   
>       vu_deinit(vu_dev);
>   

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/13] block: remove aio_disable_external() API
  2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2023-04-03 18:30 ` [PATCH 13/13] aio: remove aio_disable_external() API Stefan Hajnoczi
@ 2023-04-04 13:43 ` Paolo Bonzini
  2023-04-04 21:04   ` Stefan Hajnoczi
  13 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2023-04-04 13:43 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu, xen-devel,
	Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

On 4/3/23 20:29, Stefan Hajnoczi wrote:
> The aio_disable_external() API temporarily suspends file descriptor monitoring
> in the event loop. The block layer uses this to prevent new I/O requests being
> submitted from the guest and elsewhere between bdrv_drained_begin() and
> bdrv_drained_end().
> 
> While the block layer still needs to prevent new I/O requests in drained
> sections, the aio_disable_external() API can be replaced with
> .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
> BlockDevOps.
> 
> This newer .bdrained_begin/end/poll() approach is attractive because it works
> without specifying a specific AioContext. The block layer is moving towards
> multi-queue and that means multiple AioContexts may be processing I/O
> simultaneously.
> 
> The aio_disable_external() was always somewhat hacky. It suspends all file
> descriptors that were registered with is_external=true, even if they have
> nothing to do with the BlockDriverState graph nodes that are being drained.
> It's better to solve a block layer problem in the block layer than to have an
> odd event loop API solution.
> 
> That covers the motivation for this change, now on to the specifics of this
> series:
> 
> While it would be nice if a single conceptual approach could be applied to all
> is_external=true file descriptors, I ended up looking at callers on a
> case-by-case basis. There are two general ways I migrated code away from
> is_external=true:
> 
> 1. Block exports are typically best off unregistering fds in .drained_begin()
>     and registering them again in .drained_end(). The .drained_poll() function
>     waits for in-flight requests to finish using a reference counter.
> 
> 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
>     simpler. They can rely on BlockBackend's request queuing during drain
>     feature. Guest I/O request coroutines are suspended in a drained section and
>     resume upon the end of the drained section.

Sorry, I disagree with this.

Request queuing was shown to cause deadlocks; Hanna's latest patch is
piling another hack upon it. Instead, in my opinion, we should go in the
direction of relying _less_ (or not at all) on request queuing.

I am strongly convinced that request queuing must apply only after
bdrv_drained_begin() has returned, which would also fix the IDE TRIM bug
reported by Fiona Ebner.  The possible livelock scenario is generally
not a problem because 1) outside an iothread you have the BQL anyway,
which prevents a vCPU from issuing more I/O operations during
bdrv_drained_begin(), and 2) in iothreads you have aio_disable_external()
instead of .drained_begin().

It is also less tidy to start a request during the drained_begin phase, 
because a request that has been submitted has to be completed (cancel 
doesn't really work).

So in an ideal world, request queuing would not only apply only after
bdrv_drained_begin() has returned, it would also log a warning, and
.drained_begin() should set things up so that there are no such warnings.

Thanks,

Paolo
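
For readers following the thread, the BlockDevOps drain callbacks that the
cover letter and this reply refer to follow roughly this shape (a sketch
under assumed names; "MyDev" and its fields are placeholders, only the
BlockDevOps callback names come from the real API):

    #include "qemu/osdep.h"
    #include "sysemu/block-backend.h"

    typedef struct MyDev {
        unsigned int in_flight;   /* in-flight request counter */
        bool stop_new_requests;   /* set while a drained section is active */
    } MyDev;

    static void my_dev_drained_begin(void *opaque)
    {
        MyDev *s = opaque;
        s->stop_new_requests = true;   /* e.g. unregister fd handlers here */
    }

    /* The drain machinery polls this until it returns false. */
    static bool my_dev_drained_poll(void *opaque)
    {
        MyDev *s = opaque;
        return s->in_flight > 0;
    }

    static void my_dev_drained_end(void *opaque)
    {
        MyDev *s = opaque;
        s->stop_new_requests = false;  /* re-register fd handlers here */
    }

    static const BlockDevOps my_dev_block_ops = {
        .drained_begin = my_dev_drained_begin,
        .drained_poll  = my_dev_drained_poll,
        .drained_end   = my_dev_drained_end,
    };

Such an ops table would typically be registered with
blk_set_dev_ops(blk, &my_dev_block_ops, s).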



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()
  2023-04-03 18:30 ` [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
@ 2023-04-04 13:46   ` Paolo Bonzini
  2023-04-04 21:01     ` Stefan Hajnoczi
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2023-04-04 13:46 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu, xen-devel,
	Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

On 4/3/23 20:30, Stefan Hajnoczi wrote:
> These functions must be called with the AioContext acquired:
> 
>    /* Callers must hold exp->ctx lock */
>    void blk_exp_ref(BlockExport *exp)
>    ...
>    /* Callers must hold exp->ctx lock */
>    void blk_exp_unref(BlockExport *exp)
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/export/fuse.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index 06fa41079e..18394f9e07 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
>       FuseExport *exp = opaque;
>       int ret;
>   
> +    aio_context_acquire(exp->common.ctx);
>       blk_exp_ref(&exp->common);
> +    aio_context_release(exp->common.ctx);
>   
>       do {
>           ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
> @@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
>       fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
>   
>   out:
> +    aio_context_acquire(exp->common.ctx);
>       blk_exp_unref(&exp->common);
> +    aio_context_release(exp->common.ctx);
>   }

Since the actual thread-unsafe work is done in a bottom half, perhaps 
instead you can use qatomic_inc and qatomic_fetch_dec in 
blk_exp_{ref,unref}?

Paolo
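
Concretely, the suggestion above would amount to something like the
following sketch (illustrative only; blk_exp_do_delete() is a stand-in for
the teardown that the existing code performs once the count drops to
zero):

    #include "qemu/osdep.h"
    #include "block/export.h"

    /* Stand-in for the real last-reference teardown (in practice deferred
     * to a bottom half in the appropriate context). */
    static void blk_exp_do_delete(BlockExport *exp);

    void blk_exp_ref(BlockExport *exp)
    {
        qatomic_inc(&exp->refcount);
    }

    void blk_exp_unref(BlockExport *exp)
    {
        if (qatomic_fetch_dec(&exp->refcount) == 1) {
            blk_exp_do_delete(exp);
        }
    }

With atomic reference counting, read_from_fuse_export() would no longer
need to take the AioContext lock around blk_exp_ref/unref().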



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event
  2023-04-04 13:06     ` Stefan Hajnoczi
@ 2023-04-04 13:47       ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2023-04-04 13:47 UTC (permalink / raw)
  To: Stefan Hajnoczi, Philippe Mathieu-Daudé
  Cc: qemu-devel, Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu,
	xen-devel, Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Paul Durrant, Richard W.M. Jones,
	Dr. David Alan Gilbert, Marcel Apfelbaum, Aarushi Mehta,
	Stefano Stabellini, Fam Zheng, David Woodhouse, Stefan Weil,
	Juan Quintela, Xie Yongji, Hanna Reitz, Ronnie Sahlberg,
	eesposit, Michael S. Tsirkin, Daniel P. Berrangé,
	Anthony Perard

On 4/4/23 15:06, Stefan Hajnoczi wrote:
>> Would this be more useful as a qdev_is_realized() helper?
> Yes. There are no other users, but I think a helper makes sense.

Agreed; anyway,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event
  2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
  2023-04-03 20:47   ` Philippe Mathieu-Daudé
@ 2023-04-04 14:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2023-04-04 14:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Paolo Bonzini, Julia Suvorova, Kevin Wolf,
	Peter Lieven, Coiby Xu, xen-devel, Richard Henderson,
	Stefano Garzarella, qemu-block, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Daniel P. Berrangé,
	Anthony Perard

On Mon, Apr 03, 2023 at 02:29:52PM -0400, Stefan Hajnoczi wrote:
> Only report a transport reset event to the guest after the SCSIDevice
> has been unrealized by qdev_simple_device_unplug_cb().
> 
> qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> to false so that scsi_device_find/get() no longer see it.
> 
> scsi_target_emulate_report_luns() also needs to be updated to filter out
> SCSIDevices that are unrealized.
> 
> These changes ensure that the guest driver does not see the SCSIDevice
> that's being unplugged if it responds very quickly to the transport
> reset event.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>


Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Feel free to merge.

> ---
>  hw/scsi/scsi-bus.c    |  3 ++-
>  hw/scsi/virtio-scsi.c | 18 +++++++++---------
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index c97176110c..f9bd064833 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
>              DeviceState *qdev = kid->child;
>              SCSIDevice *dev = SCSI_DEVICE(qdev);
>  
> -            if (dev->channel == channel && dev->id == id && dev->lun != 0) {
> +            if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
> +                qatomic_load_acquire(&dev->qdev.realized)) {
>                  store_lun(tmp, dev->lun);
>                  g_byte_array_append(buf, tmp, 8);
>                  len += 8;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 612c525d9d..000961446c 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1063,15 +1063,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      SCSIDevice *sd = SCSI_DEVICE(dev);
>      AioContext *ctx = s->ctx ?: qemu_get_aio_context();
>  
> -    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> -        virtio_scsi_acquire(s);
> -        virtio_scsi_push_event(s, sd,
> -                               VIRTIO_SCSI_T_TRANSPORT_RESET,
> -                               VIRTIO_SCSI_EVT_RESET_REMOVED);
> -        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> -        virtio_scsi_release(s);
> -    }
> -
>      aio_disable_external(ctx);
>      qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
>      aio_enable_external(ctx);
> @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
>          virtio_scsi_release(s);
>      }
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> +        virtio_scsi_acquire(s);
> +        virtio_scsi_push_event(s, sd,
> +                               VIRTIO_SCSI_T_TRANSPORT_RESET,
> +                               VIRTIO_SCSI_EVT_RESET_REMOVED);
> +        scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +        virtio_scsi_release(s);
> +    }
>  }
>  
>  static struct SCSIBusInfo virtio_scsi_scsi_info = {
> -- 
> 2.39.2



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()
  2023-04-04 13:46   ` Paolo Bonzini
@ 2023-04-04 21:01     ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-04 21:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu,
	xen-devel, Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On Tue, Apr 04, 2023 at 03:46:34PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:30, Stefan Hajnoczi wrote:
> > These functions must be called with the AioContext acquired:
> > 
> >    /* Callers must hold exp->ctx lock */
> >    void blk_exp_ref(BlockExport *exp)
> >    ...
> >    /* Callers must hold exp->ctx lock */
> >    void blk_exp_unref(BlockExport *exp)
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   block/export/fuse.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/block/export/fuse.c b/block/export/fuse.c
> > index 06fa41079e..18394f9e07 100644
> > --- a/block/export/fuse.c
> > +++ b/block/export/fuse.c
> > @@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
> >       FuseExport *exp = opaque;
> >       int ret;
> > +    aio_context_acquire(exp->common.ctx);
> >       blk_exp_ref(&exp->common);
> > +    aio_context_release(exp->common.ctx);
> >       do {
> >           ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
> > @@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
> >       fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
> >   out:
> > +    aio_context_acquire(exp->common.ctx);
> >       blk_exp_unref(&exp->common);
> > +    aio_context_release(exp->common.ctx);
> >   }
> 
> Since the actual thread-unsafe work is done in a bottom half, perhaps
> instead you can use qatomic_inc and qatomic_fetch_dec in
> blk_exp_{ref,unref}?

Sure, I'll give that a try in the next revision.

Stefan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/13] block: remove aio_disable_external() API
  2023-04-04 13:43 ` [PATCH 00/13] block: " Paolo Bonzini
@ 2023-04-04 21:04   ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2023-04-04 21:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Julia Suvorova, Kevin Wolf, Peter Lieven, Coiby Xu,
	xen-devel, Richard Henderson, Stefano Garzarella, qemu-block,
	Eduardo Habkost, Philippe Mathieu-Daudé,
	Paul Durrant, Richard W.M. Jones, Dr. David Alan Gilbert,
	Marcel Apfelbaum, Aarushi Mehta, Stefano Stabellini, Fam Zheng,
	David Woodhouse, Stefan Weil, Juan Quintela, Xie Yongji,
	Hanna Reitz, Ronnie Sahlberg, eesposit, Michael S. Tsirkin,
	Daniel P. Berrangé,
	Anthony Perard

[-- Attachment #1: Type: text/plain, Size: 3408 bytes --]

On Tue, Apr 04, 2023 at 03:43:20PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:29, Stefan Hajnoczi wrote:
> > The aio_disable_external() API temporarily suspends file descriptor monitoring
> > in the event loop. The block layer uses this to prevent new I/O requests being
> > submitted from the guest and elsewhere between bdrv_drained_begin() and
> > bdrv_drained_end().
> > 
> > While the block layer still needs to prevent new I/O requests in drained
> > sections, the aio_disable_external() API can be replaced with
> > .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
> > BlockDevOps.
> > 
> > This newer .bdrained_begin/end/poll() approach is attractive because it works
> > without specifying a specific AioContext. The block layer is moving towards
> > multi-queue and that means multiple AioContexts may be processing I/O
> > simultaneously.
> > 
> > The aio_disable_external() was always somewhat hacky. It suspends all file
> > descriptors that were registered with is_external=true, even if they have
> > nothing to do with the BlockDriverState graph nodes that are being drained.
> > It's better to solve a block layer problem in the block layer than to have an
> > odd event loop API solution.
> > 
> > That covers the motivation for this change, now on to the specifics of this
> > series:
> > 
> > While it would be nice if a single conceptual approach could be applied to all
> > is_external=true file descriptors, I ended up looking at callers on a
> > case-by-case basis. There are two general ways I migrated code away from
> > is_external=true:
> > 
> > 1. Block exports are typically best off unregistering fds in .drained_begin()
> >     and registering them again in .drained_end(). The .drained_poll() function
> >     waits for in-flight requests to finish using a reference counter.
> > 
> > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
> >     simpler. They can rely on BlockBackend's request queuing during drain
> >     feature. Guest I/O request coroutines are suspended in a drained section and
> >     resume upon the end of the drained section.
> 
> Sorry, I disagree with this.
> 
> Request queuing was shown to cause deadlocks; Hanna's latest patch is piling
> another hack upon it. Instead, in my opinion, we should go in the direction
> of relying _less_ (or not at all) on request queuing.
> 
> I am strongly convinced that request queuing must apply only after
> bdrv_drained_begin() has returned, which would also fix the IDE TRIM bug
> reported by Fiona Ebner.  The possible livelock scenario is generally not a
> problem because 1) outside an iothread you have the BQL anyway, which
> prevents a vCPU from issuing more I/O operations during
> bdrv_drained_begin(), and 2) in iothreads you have aio_disable_external()
> instead of .drained_begin().
> 
> It is also less tidy to start a request during the drained_begin phase,
> because a request that has been submitted has to be completed (cancel
> doesn't really work).
> 
> So in an ideal world, request queuing would not only apply only after
> bdrv_drained_begin() has returned, it would also log a warning, and
> .drained_begin() should set things up so that there are no such warnings.

That's fine, I will give .drained_begin/end/poll() a try with virtio-blk
and virtio-scsi in the next revision.

Stefan


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-04-04 21:05 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-03 18:29 [PATCH 00/13] block: remove aio_disable_external() API Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event Stefan Hajnoczi
2023-04-03 20:47   ` Philippe Mathieu-Daudé
2023-04-04 13:06     ` Stefan Hajnoczi
2023-04-04 13:47       ` Paolo Bonzini
2023-04-04 14:38   ` Michael S. Tsirkin
2023-04-03 18:29 ` [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug Stefan Hajnoczi
2023-04-04 13:38   ` Paolo Bonzini
2023-04-03 18:29 ` [PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop() Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter Stefan Hajnoczi
2023-04-04 13:39   ` Paolo Bonzini
2023-04-03 18:29 ` [PATCH 05/13] block/export: wait for vhost-user-blk requests when draining Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 06/13] block/export: stop using is_external in vhost-user-blk server Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 07/13] virtio: do not set is_external=true on host notifiers Stefan Hajnoczi
2023-04-03 18:29 ` [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore Stefan Hajnoczi
2023-04-04  9:52   ` David Woodhouse
2023-04-03 18:30 ` [PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 10/13] block/export: rewrite vduse-blk drain code Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref() Stefan Hajnoczi
2023-04-04 13:46   ` Paolo Bonzini
2023-04-04 21:01     ` Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd Stefan Hajnoczi
2023-04-03 18:30 ` [PATCH 13/13] aio: remove aio_disable_external() API Stefan Hajnoczi
2023-04-04  9:16   ` Juan Quintela
2023-04-04  9:38     ` Dr. David Alan Gilbert
2023-04-04 13:43 ` [PATCH 00/13] block: " Paolo Bonzini
2023-04-04 21:04   ` Stefan Hajnoczi
