* [PATCH v2 0/6] block: add blk_io_plug_call() API
@ 2023-05-23 17:12 Stefan Hajnoczi
  2023-05-23 17:12 ` [PATCH v2 1/6] " Stefan Hajnoczi
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

v2
- Patch 1: "is not be freed" -> "is not freed" [Eric]
- Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
  [Stefano]
- Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
- Patch 4: Removed whitespace hunk [Eric]

The existing blk_io_plug() API is not block layer multi-queue friendly because
the plug state is per-BlockDriverState.

Change blk_io_plug() so that its plug state is thread-local. This is done by
introducing the blk_io_plug_call() function that block drivers use to batch
calls while plugged. It is relatively easy to convert block drivers from
.bdrv_co_io_plug() to blk_io_plug_call().
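
Here is a minimal sketch of what such a conversion looks like (MyQueue and
the my_*() helpers are illustrative placeholders, not code from any driver
in this series): instead of checking a per-BDS plugged flag, the driver
enqueues the request and defers a single kick with blk_io_plug_call():

    #include "sysemu/block-backend.h"   /* for blk_io_plug_call() */

    typedef struct MyQueue MyQueue;     /* hypothetical driver queue type */

    /* hypothetical helpers provided by the driver */
    void my_queue_enqueue(MyQueue *q, void *req);
    void my_queue_kick(MyQueue *q);

    /* Runs at most once per plugged section, at the outermost unplug */
    static void my_unplug_fn(void *opaque)
    {
        MyQueue *q = opaque;

        my_queue_kick(q);   /* submit everything queued so far in one go */
    }

    static void my_submit(MyQueue *q, void *req)
    {
        my_queue_enqueue(q, req);

        /*
         * Runs my_unplug_fn(q) immediately if this thread is not plugged;
         * otherwise defers it (deduplicated by fn/opaque pair) until
         * blk_io_unplug() is reached.
         */
        blk_io_plug_call(my_unplug_fn, q);
    }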

Random read 4KB performance with virtio-blk on a host NVMe block device:

iodepth   iops   change vs today
1        45612   -4%
2        87967   +2%
4       129872   +0%
8       171096   -3%
16      194508   -4%
32      208947   -1%
64      217647   +0%
128     229629   +0%

The results are within the noise for these benchmarks. This is expected
because this patch series does not change the plugging behavior for a single
thread; it only makes the plug state thread-local.

The following graph compares several approaches:
https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
- v7.2.0: before most of the multi-queue block layer changes landed.
- with-blk_io_plug: today's post-8.0.0 QEMU.
- blk_io_plug-thread-local: this patch series.
- no-blk_io_plug: what happens when we simply remove plugging?
- call-after-dispatch: what if we integrate plugging into the event loop? I
  decided against this approach in the end because deferring I/O submission
  until the end of the event loop iteration is more likely to introduce
  performance regressions.

Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
others, we see that all plugging approaches are more or less equivalent in this
benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.

The Ansible playbook, fio results, and a Jupyter notebook are available here:
https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug

Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

 MAINTAINERS                       |   1 +
 include/block/block-io.h          |   3 -
 include/block/block_int-common.h  |  11 ---
 include/block/raw-aio.h           |  14 ---
 include/sysemu/block-backend-io.h |  13 +--
 block/blkio.c                     |  43 ++++----
 block/block-backend.c             |  22 -----
 block/file-posix.c                |  38 -------
 block/io.c                        |  37 -------
 block/io_uring.c                  |  44 ++++-----
 block/linux-aio.c                 |  41 +++-----
 block/nvme.c                      |  44 +++------
 block/plug.c                      | 159 ++++++++++++++++++++++++++++++
 hw/block/dataplane/xen-block.c    |   8 +-
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 block/meson.build                 |   1 +
 block/trace-events                |   6 +-
 18 files changed, 239 insertions(+), 256 deletions(-)
 create mode 100644 block/plug.c

-- 
2.40.1




* [PATCH v2 1/6] block: add blk_io_plug_call() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
@ 2023-05-23 17:12 ` Stefan Hajnoczi
  2023-05-24  8:08   ` Stefano Garzarella
  2023-05-23 17:12 ` [PATCH v2 2/6] block/nvme: convert to " Stefan Hajnoczi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the outermost plugged section - resulting in
batching.
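
For illustration (q and submit_batch() are placeholders, not identifiers
from this patch), the calling pattern and its nesting behavior look like
this:

    blk_io_plug();                       /* outermost plug for this thread */

    blk_io_plug_call(submit_batch, q);   /* deferred until unplug */

    blk_io_plug();                       /* nesting is supported */
    blk_io_plug_call(submit_batch, q);   /* same fn/opaque pair: deduplicated */
    blk_io_unplug();                     /* inner unplug: nothing runs yet */

    blk_io_unplug();                     /* outermost unplug: submit_batch(q)
                                            runs exactly once here */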

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v2
- "is not be freed" -> "is not freed" [Eric]
---
 MAINTAINERS                       |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c             |  22 -----
 block/plug.c                      | 159 ++++++++++++++++++++++++++++++
 hw/block/dataplane/xen-block.c    |   8 +-
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 block/meson.build                 |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1b6466496d..2be6f0c26b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2646,6 +2646,7 @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index d62a7ee773..be4dcef59d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,16 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index ca537cd0ad..1f1d226ba6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2568,28 +2568,6 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
     notifier_list_add(&blk->insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_plug(bs);
-    }
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_unplug(bs);
-    }
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
     IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index 0000000000..98a155d2f4
--- /dev/null
+++ b/block/plug.c
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+    void (*fn)(void *);
+    void *opaque;
+} UnplugFn;
+
+/* Per-thread state */
+typedef struct {
+    unsigned count;       /* how many times has plug() been called? */
+    GArray *unplug_fns;   /* functions to call at unplug time */
+} Plug;
+
+/* Use get_ptr_plug() to fetch this thread-local value */
+QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
+
+/* Called at thread cleanup time */
+static void blk_io_plug_atexit(Notifier *n, void *value)
+{
+    Plug *plug = get_ptr_plug();
+    g_array_free(plug->unplug_fns, TRUE);
+}
+
+/* This won't involve coroutines, so use __thread */
+static __thread Notifier blk_io_plug_atexit_notifier;
+
+/**
+ * blk_io_plug_call:
+ * @fn: a function pointer to be invoked
+ * @opaque: a user-defined argument to @fn()
+ *
+ * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
+ * section.
+ *
+ * Otherwise defer the call until the end of the outermost
+ * blk_io_plug()/blk_io_unplug() section in this thread. If the same
+ * @fn/@opaque pair has already been deferred, it will only be called once upon
+ * blk_io_unplug() so that accumulated calls are batched into a single call.
+ *
+ * The caller must ensure that @opaque is not freed before @fn() is invoked.
+ */
+void blk_io_plug_call(void (*fn)(void *), void *opaque)
+{
+    Plug *plug = get_ptr_plug();
+
+    /* Call immediately if we're not plugged */
+    if (plug->count == 0) {
+        fn(opaque);
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
+        plug->unplug_fns = array;
+        blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
+        qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+    UnplugFn new_fn = {
+        .fn = fn,
+        .opaque = opaque,
+    };
+
+    /*
+     * There won't be many, so do a linear search. If this becomes a bottleneck
+     * then a binary search (glib 2.62+) or different data structure could be
+     * used.
+     */
+    for (guint i = 0; i < array->len; i++) {
+        if (memcmp(&fns[i], &new_fn, sizeof(new_fn)) == 0) {
+            return; /* already exists */
+        }
+    }
+
+    g_array_append_val(array, new_fn);
+}
+
+/**
+ * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
+ *
+ * blk_io_plug/unplug are thread-local operations. This means that multiple
+ * threads can simultaneously call plug/unplug, but the caller must ensure that
+ * each unplug() is called in the same thread of the matching plug().
+ *
+ * Nesting is supported. blk_io_plug_call() functions are only called at the
+ * outermost blk_io_unplug().
+ */
+void blk_io_plug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count < UINT32_MAX);
+
+    plug->count++;
+}
+
+/**
+ * blk_io_unplug: Run any pending blk_io_plug_call() functions
+ *
+ * There must have been a matching blk_io_plug() call in the same thread prior
+ * to this blk_io_unplug() call.
+ */
+void blk_io_unplug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count > 0);
+
+    if (--plug->count > 0) {
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        return;
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+
+    for (guint i = 0; i < array->len; i++) {
+        fns[i].fn(fns[i].opaque);
+    }
+
+    /*
+     * This resets the array without freeing memory so that appending is cheap
+     * in the future.
+     */
+    g_array_set_size(array, 0);
+}
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index d8bc39d359..e49c24f63d 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -537,7 +537,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
      * is below us.
      */
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_plug(dataplane->blk);
+        blk_io_plug();
     }
     while (rc != rp) {
         /* pull request from ring */
@@ -577,12 +577,12 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
 
         if (inflight_atstart > IO_PLUG_THRESHOLD &&
             batched >= inflight_atstart) {
-            blk_io_unplug(dataplane->blk);
+            blk_io_unplug();
         }
         xen_block_do_aio(request);
         if (inflight_atstart > IO_PLUG_THRESHOLD) {
             if (batched >= inflight_atstart) {
-                blk_io_plug(dataplane->blk);
+                blk_io_plug();
                 batched = 0;
             } else {
                 batched++;
@@ -590,7 +590,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
         }
     }
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_unplug(dataplane->blk);
+        blk_io_unplug();
     }
 
     return done_something;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8f65ea4659..b4286424c1 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1134,7 +1134,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     bool suppress_notifications = virtio_queue_get_notification(vq);
 
     aio_context_acquire(blk_get_aio_context(s->blk));
-    blk_io_plug(s->blk);
+    blk_io_plug();
 
     do {
         if (suppress_notifications) {
@@ -1158,7 +1158,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_blk_submit_multireq(s, &mrb);
     }
 
-    blk_io_unplug(s->blk);
+    blk_io_unplug();
     aio_context_release(blk_get_aio_context(s->blk));
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 612c525d9d..534a44ee07 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -799,7 +799,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
-    blk_io_plug(d->conf.blk);
+    blk_io_plug();
     object_unref(OBJECT(d));
     return 0;
 }
@@ -810,7 +810,7 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     if (scsi_req_enqueue(sreq)) {
         scsi_req_continue(sreq);
     }
-    blk_io_unplug(sreq->dev->conf.blk);
+    blk_io_unplug();
     scsi_req_unref(sreq);
 }
 
@@ -836,7 +836,7 @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
                 while (!QTAILQ_EMPTY(&reqs)) {
                     req = QTAILQ_FIRST(&reqs);
                     QTAILQ_REMOVE(&reqs, req, next);
-                    blk_io_unplug(req->sreq->dev->conf.blk);
+                    blk_io_unplug();
                     scsi_req_unref(req->sreq);
                     virtqueue_detach_element(req->vq, &req->elem, 0);
                     virtio_scsi_free_req(req);
diff --git a/block/meson.build b/block/meson.build
index 486dda8b85..fb4332bd66 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -23,6 +23,7 @@ block_ss.add(files(
   'mirror.c',
   'nbd.c',
   'null.c',
+  'plug.c',
   'qapi.c',
   'qcow2-bitmap.c',
   'qcow2-cache.c',
-- 
2.40.1




* [PATCH v2 2/6] block/nvme: convert to blk_io_plug_call() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
  2023-05-23 17:12 ` [PATCH v2 1/6] " Stefan Hajnoczi
@ 2023-05-23 17:12 ` Stefan Hajnoczi
  2023-05-24  8:11   ` Stefano Garzarella
  2023-05-23 17:12 ` [PATCH v2 3/6] block/blkio: " Stefan Hajnoczi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v2
- Remove unused nvme_process_completion_queue_plugged trace event
  [Stefano]
---
 block/nvme.c       | 44 ++++++++++++--------------------------------
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..100b38b592 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -25,6 +25,7 @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -119,7 +120,6 @@ struct BDRVNVMeState {
     int blkshift;
 
     uint64_t max_transfer;
-    bool plugged;
 
     bool supports_write_zeroes;
     bool supports_discard;
@@ -282,7 +282,7 @@ static void nvme_kick(NVMeQueuePair *q)
 {
     BDRVNVMeState *s = q->s;
 
-    if (s->plugged || !q->need_kick) {
+    if (!q->need_kick) {
         return;
     }
     trace_nvme_kick(s, q->index);
@@ -387,10 +387,6 @@ static bool nvme_process_completion(NVMeQueuePair *q)
     NvmeCqe *c;
 
     trace_nvme_process_completion(s, q->index, q->inflight);
-    if (s->plugged) {
-        trace_nvme_process_completion_queue_plugged(s, q->index);
-        return false;
-    }
 
     /*
      * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -480,6 +476,15 @@ static void nvme_trace_command(const NvmeCmd *cmd)
     }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+    NVMeQueuePair *q = opaque;
+
+    QEMU_LOCK_GUARD(&q->lock);
+    nvme_kick(q);
+    nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
                                 NvmeCmd *cmd, BlockCompletionFunc cb,
                                 void *opaque)
@@ -496,8 +501,7 @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
            q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
     q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
     q->need_kick++;
-    nvme_kick(q);
-    nvme_process_completion(q);
+    blk_io_plug_call(nvme_unplug_fn, q);
     qemu_mutex_unlock(&q->lock);
 }
 
@@ -1567,27 +1571,6 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
     }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(!s->plugged);
-    s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(s->plugged);
-    s->plugged = false;
-    for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-        NVMeQueuePair *q = s->queues[i];
-        qemu_mutex_lock(&q->lock);
-        nvme_kick(q);
-        nvme_process_completion(q);
-        qemu_mutex_unlock(&q->lock);
-    }
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
                               Error **errp)
 {
@@ -1664,9 +1647,6 @@ static BlockDriver bdrv_nvme = {
     .bdrv_detach_aio_context  = nvme_detach_aio_context,
     .bdrv_attach_aio_context  = nvme_attach_aio_context,
 
-    .bdrv_co_io_plug          = nvme_co_io_plug,
-    .bdrv_co_io_unplug        = nvme_co_io_unplug,
-
     .bdrv_register_buf        = nvme_register_buf,
     .bdrv_unregister_buf      = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index 32665158d6..048ad27519 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -141,7 +141,6 @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
-- 
2.40.1




* [PATCH v2 3/6] block/blkio: convert to blk_io_plug_call() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
  2023-05-23 17:12 ` [PATCH v2 1/6] " Stefan Hajnoczi
  2023-05-23 17:12 ` [PATCH v2 2/6] block/nvme: convert to " Stefan Hajnoczi
@ 2023-05-23 17:12 ` Stefan Hajnoczi
  2023-05-24  8:13   ` Stefano Garzarella
  2023-05-23 17:12 ` [PATCH v2 4/6] block/io_uring: " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v2
- Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
---
 block/blkio.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 0cdc99a729..93c6d20d39 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 
 #include "block/block-io.h"
@@ -325,16 +326,30 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
                        false, NULL, NULL, NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-    if (qatomic_read(&bs->io_plugged) == 0) {
-        BDRVBlkioState *s = bs->opaque;
+    BDRVBlkioState *s = opaque;
 
+    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
     }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -345,9 +360,9 @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -378,9 +393,9 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -423,9 +438,9 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -444,9 +459,9 @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_flush(s->blkioq, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -472,22 +487,13 @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVBlkioState *s = bs->opaque;
-
-    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-        blkio_submit_io(bs);
-    }
-}
-
 typedef enum {
     BMRR_OK,
     BMRR_SKIP,
@@ -1009,7 +1015,6 @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
         .bdrv_co_pwritev         = blkio_co_pwritev, \
         .bdrv_co_flush_to_disk   = blkio_co_flush, \
         .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
-        .bdrv_co_io_unplug       = blkio_co_io_unplug, \
         .bdrv_refresh_limits     = blkio_refresh_limits, \
         .bdrv_register_buf       = blkio_register_buf, \
         .bdrv_unregister_buf     = blkio_unregister_buf, \
-- 
2.40.1




* [PATCH v2 4/6] block/io_uring: convert to blk_io_plug_call() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2023-05-23 17:12 ` [PATCH v2 3/6] block/blkio: " Stefan Hajnoczi
@ 2023-05-23 17:12 ` Stefan Hajnoczi
  2023-05-24  8:47   ` Stefano Garzarella
  2023-05-23 17:12 ` [PATCH v2 5/6] block/linux-aio: " Stefan Hajnoczi
  2023-05-23 17:13 ` [PATCH v2 6/6] block: remove bdrv_co_io_plug() API Stefan Hajnoczi
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
v2
- Removed whitespace hunk [Eric]
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 10 ----------
 block/io_uring.c        | 44 ++++++++++++++++-------------------------
 block/trace-events      |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0fe85ade77..da60ca13ef 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -81,13 +81,6 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
                                   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index 0ab158efba..7baa8491dd 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2558,11 +2558,6 @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
         laio_io_plug();
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_plug();
-    }
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -2573,11 +2568,6 @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
         laio_io_unplug(s->aio_max_batch);
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_unplug();
-    }
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index 82cab6a5bd..4e25b56c9c 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -16,6 +16,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -41,7 +42,6 @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -267,7 +267,7 @@ static void luring_process_completions_and_submit(LuringState *s)
 {
     luring_process_completions(s);
 
-    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+    if (s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -301,29 +301,17 @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->submit_queue);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    trace_luring_io_plug(s);
-    s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    assert(s->io_q.plugged);
-    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (--s->io_q.plugged == 0 &&
-        !s->io_q.blocked && s->io_q.in_queue > 0) {
+    LuringState *s = opaque;
+    trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked && s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -370,14 +358,16 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
     s->io_q.in_queue++;
-    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-        ret = ioq_submit(s);
-        trace_luring_do_submit_done(s, ret);
-        return ret;
+    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+            ret = ioq_submit(s);
+            trace_luring_do_submit_done(s, ret);
+            return ret;
+        }
+
+        blk_io_plug_call(luring_unplug_fn, s);
     }
     return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index 048ad27519..6f121b7636 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -64,9 +64,8 @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 # io_uring.c
 luring_init_state(void *s, size_t size) "s %p size %zu"
 luring_cleanup_state(void *s) "%p freed"
-luring_io_plug(void *s) "LuringState %p plug"
-luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
-luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
 luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
 luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
 luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"
-- 
2.40.1




* [PATCH v2 5/6] block/linux-aio: convert to blk_io_plug_call() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2023-05-23 17:12 ` [PATCH v2 4/6] block/io_uring: " Stefan Hajnoczi
@ 2023-05-23 17:12 ` Stefan Hajnoczi
  2023-05-24  8:52   ` Stefano Garzarella
  2023-05-23 17:13 ` [PATCH v2 6/6] block: remove bdrv_co_io_plug() API Stefan Hajnoczi
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 28 ----------------------------
 block/linux-aio.c       | 41 +++++++++++------------------------------
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index da60ca13ef..0f63c2800c 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index 7baa8491dd..ac1ed54811 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_plug();
-    }
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_unplug(s->aio_max_batch);
-    }
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 442c86209b..5021aed68f 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -46,7 +47,6 @@ struct qemu_laiocb {
 };
 
 typedef struct {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -236,7 +236,7 @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
     qemu_laio_process_completions(s);
 
-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -277,7 +277,6 @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
 static void ioq_init(LaioQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->pending);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
@@ -354,31 +353,11 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
     return max_batch;
 }
 
-void laio_io_plug(void)
+static void laio_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
+    LinuxAioState *s = opaque;
 
-    s->io_q.plugged++;
-}
-
-void laio_io_unplug(uint64_t dev_max_batch)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
-
-    assert(s->io_q.plugged);
-    s->io_q.plugged--;
-
-    /*
-     * Why max batch checking is performed here:
-     * Another BDS may have queued requests with a higher dev_max_batch and
-     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
-     * batch so we can honor our device's dev_max_batch.
-     */
-    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-        (!s->io_q.plugged &&
-         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
+    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -410,10 +389,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
     s->io_q.in_queue++;
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
-        ioq_submit(s);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
+            ioq_submit(s);
+        } else {
+            blk_io_plug_call(laio_unplug_fn, s);
+        }
     }
 
     return 0;
-- 
2.40.1




* [PATCH v2 6/6] block: remove bdrv_co_io_plug() API
  2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2023-05-23 17:12 ` [PATCH v2 5/6] block/linux-aio: " Stefan Hajnoczi
@ 2023-05-23 17:13 ` Stefan Hajnoczi
  2023-05-24  8:52   ` Stefano Garzarella
  5 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-23 17:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: Aarushi Mehta, Michael S. Tsirkin, Stefano Garzarella,
	Julia Suvorova, Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, Stefan Hajnoczi, xen-devel, eblake, Anthony Perard,
	qemu-block

No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/block/block-io.h         |  3 ---
 include/block/block_int-common.h | 11 ----------
 block/io.c                       | 37 --------------------------------
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87..43af816d75 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -259,9 +259,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                    uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 6492a1e538..958962aa3a 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -753,11 +753,6 @@ struct BlockDriver {
     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
         BlockDriverState *bs, BlkdebugEvent event);
 
-    /* io queue for linux-aio */
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-        BlockDriverState *bs);
-
     /**
      * bdrv_drain_begin is called if implemented in the beginning of a
      * drain operation to drain and stop any internal sources of requests in
@@ -1227,12 +1222,6 @@ struct BlockDriverState {
     unsigned int in_flight;
     unsigned int serialising_in_flight;
 
-    /*
-     * counter for nested bdrv_io_plug.
-     * Accessed with atomic ops.
-     */
-    unsigned io_plugged;
-
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index 4d54fda593..56b0c1ce6c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3219,43 +3219,6 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
     return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_plug(child->bs);
-    }
-
-    if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_plug) {
-            drv->bdrv_co_io_plug(bs);
-        }
-    }
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    assert(bs->io_plugged);
-    if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_unplug) {
-            drv->bdrv_co_io_unplug(bs);
-        }
-    }
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_unplug(child->bs);
-    }
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
-- 
2.40.1




* Re: [PATCH v2 1/6] block: add blk_io_plug_call() API
  2023-05-23 17:12 ` [PATCH v2 1/6] " Stefan Hajnoczi
@ 2023-05-24  8:08   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:12:55PM -0400, Stefan Hajnoczi wrote:
>Introduce a new API for thread-local blk_io_plug() that does not
>traverse the block graph. The goal is to make blk_io_plug() multi-queue
>friendly.
>
>Instead of having block drivers track whether or not we're in a plugged
>section, provide an API that allows them to defer a function call until
>we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
>called multiple times with the same fn/opaque pair, then fn() is only
>called once at the end of the function - resulting in batching.
>
>This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
>blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
>because the plug state is now thread-local.
>
>Later patches convert block drivers to blk_io_plug_call() and then we
>can finally remove .bdrv_co_io_plug() once all block drivers have been
>converted.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
>v2
>- "is not be freed" -> "is not freed" [Eric]
>---
> MAINTAINERS                       |   1 +
> include/sysemu/block-backend-io.h |  13 +--
> block/block-backend.c             |  22 -----
> block/plug.c                      | 159 ++++++++++++++++++++++++++++++
> hw/block/dataplane/xen-block.c    |   8 +-
> hw/block/virtio-blk.c             |   4 +-
> hw/scsi/virtio-scsi.c             |   6 +-
> block/meson.build                 |   1 +
> 8 files changed, 173 insertions(+), 41 deletions(-)
> create mode 100644 block/plug.c

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>




* Re: [PATCH v2 2/6] block/nvme: convert to blk_io_plug_call() API
  2023-05-23 17:12 ` [PATCH v2 2/6] block/nvme: convert to " Stefan Hajnoczi
@ 2023-05-24  8:11   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:12:56PM -0400, Stefan Hajnoczi wrote:
>Stop using the .bdrv_co_io_plug() API because it is not multi-queue
>block layer friendly. Use the new blk_io_plug_call() API to batch I/O
>submission instead.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
>v2
>- Remove unused nvme_process_completion_queue_plugged trace event
>  [Stefano]
>---
> block/nvme.c       | 44 ++++++++++++--------------------------------
> block/trace-events |  1 -
> 2 files changed, 12 insertions(+), 33 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>




* Re: [PATCH v2 3/6] block/blkio: convert to blk_io_plug_call() API
  2023-05-23 17:12 ` [PATCH v2 3/6] block/blkio: " Stefan Hajnoczi
@ 2023-05-24  8:13   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:12:57PM -0400, Stefan Hajnoczi wrote:
>Stop using the .bdrv_co_io_plug() API because it is not multi-queue
>block layer friendly. Use the new blk_io_plug_call() API to batch I/O
>submission instead.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
>v2
>- Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
>---
> block/blkio.c | 43 ++++++++++++++++++++++++-------------------
> 1 file changed, 24 insertions(+), 19 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>




* Re: [PATCH v2 4/6] block/io_uring: convert to blk_io_plug_call() API
  2023-05-23 17:12 ` [PATCH v2 4/6] block/io_uring: " Stefan Hajnoczi
@ 2023-05-24  8:47   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:12:58PM -0400, Stefan Hajnoczi wrote:
>Stop using the .bdrv_co_io_plug() API because it is not multi-queue
>block layer friendly. Use the new blk_io_plug_call() API to batch I/O
>submission instead.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
>v2
>- Removed whitespace hunk [Eric]
>---
> include/block/raw-aio.h |  7 -------
> block/file-posix.c      | 10 ----------
> block/io_uring.c        | 44 ++++++++++++++++-------------------------
> block/trace-events      |  5 ++---
> 4 files changed, 19 insertions(+), 47 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>




* Re: [PATCH v2 5/6] block/linux-aio: convert to blk_io_plug_call() API
  2023-05-23 17:12 ` [PATCH v2 5/6] block/linux-aio: " Stefan Hajnoczi
@ 2023-05-24  8:52   ` Stefano Garzarella
  2023-05-24 19:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:12:59PM -0400, Stefan Hajnoczi wrote:
>Stop using the .bdrv_co_io_plug() API because it is not multi-queue
>block layer friendly. Use the new blk_io_plug_call() API to batch I/O
>submission instead.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
> include/block/raw-aio.h |  7 -------
> block/file-posix.c      | 28 ----------------------------
> block/linux-aio.c       | 41 +++++++++++------------------------------
> 3 files changed, 11 insertions(+), 65 deletions(-)
>
>diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
>index da60ca13ef..0f63c2800c 100644
>--- a/include/block/raw-aio.h
>+++ b/include/block/raw-aio.h
>@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
>
> void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
> void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
>-
>-/*
>- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
>- * caller must ensure that they are paired in the same IOThread.
>- */
>-void laio_io_plug(void);
>-void laio_io_unplug(uint64_t dev_max_batch);
> #endif
> /* io_uring.c - Linux io_uring implementation */
> #ifdef CONFIG_LINUX_IO_URING
>diff --git a/block/file-posix.c b/block/file-posix.c
>index 7baa8491dd..ac1ed54811 100644
>--- a/block/file-posix.c
>+++ b/block/file-posix.c
>@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
>     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
> }
>
>-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
>-{
>-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
>-#ifdef CONFIG_LINUX_AIO
>-    if (s->use_linux_aio) {
>-        laio_io_plug();
>-    }
>-#endif
>-}
>-
>-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
>-{
>-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
>-#ifdef CONFIG_LINUX_AIO
>-    if (s->use_linux_aio) {
>-        laio_io_unplug(s->aio_max_batch);
>-    }
>-#endif
>-}
>-
> static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
> {
>     BDRVRawState *s = bs->opaque;
>@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
>     .bdrv_co_copy_range_from = raw_co_copy_range_from,
>     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>     .bdrv_refresh_limits = raw_refresh_limits,
>-    .bdrv_co_io_plug        = raw_co_io_plug,
>-    .bdrv_co_io_unplug      = raw_co_io_unplug,
>     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>
>     .bdrv_co_truncate                   = raw_co_truncate,
>@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
>     .bdrv_co_copy_range_from = raw_co_copy_range_from,
>     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>     .bdrv_refresh_limits = raw_refresh_limits,
>-    .bdrv_co_io_plug        = raw_co_io_plug,
>-    .bdrv_co_io_unplug      = raw_co_io_unplug,
>     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>
>     .bdrv_co_truncate                   = raw_co_truncate,
>@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
>     .bdrv_co_pwritev        = raw_co_pwritev,
>     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>     .bdrv_refresh_limits    = cdrom_refresh_limits,
>-    .bdrv_co_io_plug        = raw_co_io_plug,
>-    .bdrv_co_io_unplug      = raw_co_io_unplug,
>     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>
>     .bdrv_co_truncate                   = raw_co_truncate,
>@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
>     .bdrv_co_pwritev        = raw_co_pwritev,
>     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>     .bdrv_refresh_limits    = cdrom_refresh_limits,
>-    .bdrv_co_io_plug        = raw_co_io_plug,
>-    .bdrv_co_io_unplug      = raw_co_io_unplug,
>     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>
>     .bdrv_co_truncate                   = raw_co_truncate,
>diff --git a/block/linux-aio.c b/block/linux-aio.c
>index 442c86209b..5021aed68f 100644
>--- a/block/linux-aio.c
>+++ b/block/linux-aio.c
>@@ -15,6 +15,7 @@
> #include "qemu/event_notifier.h"
> #include "qemu/coroutine.h"
> #include "qapi/error.h"
>+#include "sysemu/block-backend.h"
>
> /* Only used for assertions.  */
> #include "qemu/coroutine_int.h"
>@@ -46,7 +47,6 @@ struct qemu_laiocb {
> };
>
> typedef struct {
>-    int plugged;
>     unsigned int in_queue;
>     unsigned int in_flight;
>     bool blocked;
>@@ -236,7 +236,7 @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
> {
>     qemu_laio_process_completions(s);
>
>-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>+    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>         ioq_submit(s);
>     }
> }
>@@ -277,7 +277,6 @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
> static void ioq_init(LaioQueue *io_q)
> {
>     QSIMPLEQ_INIT(&io_q->pending);
>-    io_q->plugged = 0;
>     io_q->in_queue = 0;
>     io_q->in_flight = 0;
>     io_q->blocked = false;
>@@ -354,31 +353,11 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
>     return max_batch;
> }
>
>-void laio_io_plug(void)
>+static void laio_unplug_fn(void *opaque)
> {
>-    AioContext *ctx = qemu_get_current_aio_context();
>-    LinuxAioState *s = aio_get_linux_aio(ctx);
>+    LinuxAioState *s = opaque;
>
>-    s->io_q.plugged++;
>-}
>-
>-void laio_io_unplug(uint64_t dev_max_batch)
>-{
>-    AioContext *ctx = qemu_get_current_aio_context();
>-    LinuxAioState *s = aio_get_linux_aio(ctx);
>-
>-    assert(s->io_q.plugged);
>-    s->io_q.plugged--;
>-
>-    /*
>-     * Why max batch checking is performed here:
>-     * Another BDS may have queued requests with a higher dev_max_batch and
>-     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
>-     * batch so we can honor our device's dev_max_batch.
>-     */
>-    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||

Why are we removing this condition?
Could the same situation occur with the new API?

Thanks,
Stefano

>-        (!s->io_q.plugged &&
>-         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
>+    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>         ioq_submit(s);
>     }
> }
>@@ -410,10 +389,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
>
>     QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
>     s->io_q.in_queue++;
>-    if (!s->io_q.blocked &&
>-        (!s->io_q.plugged ||
>-         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
>-        ioq_submit(s);
>+    if (!s->io_q.blocked) {
>+        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
>+            ioq_submit(s);
>+        } else {
>+            blk_io_plug_call(laio_unplug_fn, s);
>+        }
>     }
>
>     return 0;
>-- 
>2.40.1
>




* Re: [PATCH v2 6/6] block: remove bdrv_co_io_plug() API
  2023-05-23 17:13 ` [PATCH v2 6/6] block: remove bdrv_co_io_plug() API Stefan Hajnoczi
@ 2023-05-24  8:52   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-24  8:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Tue, May 23, 2023 at 01:13:00PM -0400, Stefan Hajnoczi wrote:
>No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
>function pointers.
>
>Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>Reviewed-by: Eric Blake <eblake@redhat.com>
>---
> include/block/block-io.h         |  3 ---
> include/block/block_int-common.h | 11 ----------
> block/io.c                       | 37 --------------------------------
> 3 files changed, 51 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>




* Re: [PATCH v2 5/6] block/linux-aio: convert to blk_io_plug_call() API
  2023-05-24  8:52   ` Stefano Garzarella
@ 2023-05-24 19:36     ` Stefan Hajnoczi
  2023-05-29  8:50       ` Stefano Garzarella
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-24 19:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block


On Wed, May 24, 2023 at 10:52:03AM +0200, Stefano Garzarella wrote:
> On Tue, May 23, 2023 at 01:12:59PM -0400, Stefan Hajnoczi wrote:
> > Stop using the .bdrv_co_io_plug() API because it is not multi-queue
> > block layer friendly. Use the new blk_io_plug_call() API to batch I/O
> > submission instead.
> > [...]
> > -    /*
> > -     * Why max batch checking is performed here:
> > -     * Another BDS may have queued requests with a higher dev_max_batch and
> > -     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
> > -     * batch so we can honor our device's dev_max_batch.
> > -     */
> > -    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
> 
> Why are we removing this condition?
> Could the same situation occur with the new API?

The semantics of unplug_fn() are different from .bdrv_co_io_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn().
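
To illustrate what I mean, here is a minimal thread-local sketch of the
blk_io_plug_call() semantics. Only the blk_io_plug_call(fn, opaque) call
used in the diff is taken from the patch; the blk_io_plug()/blk_io_unplug()
prototypes, the array bound, and the dedup check are simplifications of how
I think about it, not the code from patch 1:

    #include <assert.h>

    #define MAX_CALLS 64                        /* arbitrary bound for the sketch */

    typedef struct {
        void (*fn)(void *);                     /* e.g. laio_unplug_fn */
        void *opaque;
    } UnplugCall;

    static __thread unsigned plug_count;        /* nesting depth in this thread */
    static __thread UnplugCall calls[MAX_CALLS];
    static __thread unsigned ncalls;

    void blk_io_plug(void)
    {
        plug_count++;
    }

    void blk_io_plug_call(void (*fn)(void *), void *opaque)
    {
        if (plug_count == 0) {
            fn(opaque);                         /* not plugged: call immediately */
            return;
        }
        for (unsigned i = 0; i < ncalls; i++) {
            if (calls[i].fn == fn && calls[i].opaque == opaque) {
                return;                         /* already queued for this plug */
            }
        }
        assert(ncalls < MAX_CALLS);
        calls[ncalls++] = (UnplugCall){ fn, opaque };
    }

    void blk_io_unplug(void)
    {
        assert(plug_count > 0);
        if (--plug_count > 0) {
            return;                             /* (1) only the last unplug flushes */
        }
        for (unsigned i = 0; i < ncalls; i++) {
            calls[i].fn(calls[i].opaque);       /* (2) per-thread state, no BDS here */
        }
        ncalls = 0;
    }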

How important is this condition? I believe that dropping it does not
have much of an effect but maybe I missed something.

Also, does it make sense to define per-BlockDriverState batching limits
when the AIO engine (Linux AIO or io_uring) is thread-local and shared
by all BlockDriverStates? I believe the fundamental reason (which we
discovered later) why dev_max_batch is effective is that the Linux
kernel processes 32 I/O request submissions at a time. Anything above 32
adds latency without a batching benefit.
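
To make the interaction concrete, this is how I think of the clamping that
laio_max_batch() performs. Its laio_max_batch(s, dev_max_batch) prototype
appears in the hunk header above; the standalone helper below restates the
idea on plain integers, and the helper name, the default of 32, and the
event-ring term are my assumptions, not code quoted from block/linux-aio.c:

    #include <stdint.h>

    /* aio_max_batch: IOThread-wide limit; dev_max_batch: per-device limit;
     * 0 means "no limit" for either. */
    static uint64_t effective_max_batch(uint64_t aio_max_batch,
                                        uint64_t dev_max_batch,
                                        unsigned in_flight,
                                        unsigned max_events)
    {
        uint64_t max_batch = aio_max_batch ? aio_max_batch : 32; /* assumed default */

        if (dev_max_batch && dev_max_batch < max_batch) {
            max_batch = dev_max_batch;            /* a device can only lower the limit */
        }
        if (max_events - in_flight < max_batch) {
            max_batch = max_events - in_flight;   /* don't overrun the completion ring */
        }
        return max_batch;
    }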

Stefan



* Re: [PATCH v2 5/6] block/linux-aio: convert to blk_io_plug_call() API
  2023-05-24 19:36     ` Stefan Hajnoczi
@ 2023-05-29  8:50       ` Stefano Garzarella
  2023-05-30 17:11         ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Garzarella @ 2023-05-29  8:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block

On Wed, May 24, 2023 at 03:36:34PM -0400, Stefan Hajnoczi wrote:
>On Wed, May 24, 2023 at 10:52:03AM +0200, Stefano Garzarella wrote:
>> On Tue, May 23, 2023 at 01:12:59PM -0400, Stefan Hajnoczi wrote:
>> > [...]
>> > -    /*
>> > -     * Why max batch checking is performed here:
>> > -     * Another BDS may have queued requests with a higher dev_max_batch and
>> > -     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
>> > -     * batch so we can honor our device's dev_max_batch.
>> > -     */
>> > -    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
>>
>> Why are we removing this condition?
>> Could the same situation occur with the new API?
>
>The semantics of unplug_fn() are different from .bdrv_co_io_unplug():
>1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
>   not every time blk_io_unplug() is called.
>2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
>   way to get per-BlockDriverState fields like dev_max_batch.
>
>Therefore this condition cannot be moved to laio_unplug_fn().

I see now.

>
>How important is this condition? I believe that dropping it does not
>have much of an effect but maybe I missed something.

Kevin and I agreed to add it to avoid extra latency on some devices,
but we didn't do much testing on this.

IIRC what solved the performance degradation was the check in
laio_do_submit() that we still have after this change.

So it may not have much effect, but maybe it's worth mentioning in
the commit description.
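
For reference, this is the check that stays in laio_do_submit() after this
series (copied from the diff earlier in the thread):

    QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
    s->io_q.in_queue++;
    if (!s->io_q.blocked) {
        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
            ioq_submit(s);                        /* batch limit reached: submit now */
        } else {
            blk_io_plug_call(laio_unplug_fn, s);  /* otherwise defer until unplug */
        }
    }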

>
>Also, does it make sense to define per-BlockDriverState batching limits
>when the AIO engine (Linux AIO or io_uring) is thread-local and shared
>by all BlockDriverStates? I believe the fundamental reason (which we
>discovered later) why dev_max_batch is effective is that the Linux
>kernel processes 32 I/O request submissions at a time. Anything above 32
>adds latency without a batching benefit.

This is a good point, maybe we should confirm it with some tests though.

Thanks,
Stefano




* Re: [PATCH v2 5/6] block/linux-aio: convert to blk_io_plug_call() API
  2023-05-29  8:50       ` Stefano Garzarella
@ 2023-05-30 17:11         ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2023-05-30 17:11 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: qemu-devel, Aarushi Mehta, Michael S. Tsirkin, Julia Suvorova,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	Stefano Stabellini, Paul Durrant, Hanna Reitz, Kevin Wolf,
	Fam Zheng, xen-devel, eblake, Anthony Perard, qemu-block


On Mon, May 29, 2023 at 10:50:34AM +0200, Stefano Garzarella wrote:
> On Wed, May 24, 2023 at 03:36:34PM -0400, Stefan Hajnoczi wrote:
> > On Wed, May 24, 2023 at 10:52:03AM +0200, Stefano Garzarella wrote:
> > > On Tue, May 23, 2023 at 01:12:59PM -0400, Stefan Hajnoczi wrote:
> > > > [...]
> > > > -     * Why max batch checking is performed here:
> > > > -     * Another BDS may have queued requests with a higher dev_max_batch and
> > > > -     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
> > > > -     * batch so we can honor our device's dev_max_batch.
> > > > -     */
> > > > -    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
> > > 
> > > Why are we removing this condition?
> > > Could the same situation occur with the new API?
> > 
> > The semantics of unplug_fn() are different from .bdrv_co_io_unplug():
> > 1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
> >   not every time blk_io_unplug() is called.
> > 2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
> >   way to get per-BlockDriverState fields like dev_max_batch.
> > 
> > Therefore this condition cannot be moved to laio_unplug_fn().
> 
> I see now.
> 
> > 
> > How important is this condition? I believe that dropping it does not
> > have much of an effect but maybe I missed something.
> 
> Kevin and I agreed to add it to avoid extra latency on some devices,
> but we didn't do much testing on this.
> 
> IIRC what solved the performance degradation was the check in
> laio_do_submit() that we still have after this change.
> 
> So it may not have much effect, but maybe it's worth mentioning in
> the commit description.

I'll update the commit description.

> > 
> > Also, does it make sense to define per-BlockDriverState batching limits
> > when the AIO engine (Linux AIO or io_uring) is thread-local and shared
> > by all BlockDriverStates? I believe the fundamental reason (which we
> > discovered later) why dev_max_batch is effective is that the Linux
> > kernel processes 32 I/O request submissions at a time. Anything above 32
> > adds latency without a batching benefit.
> 
> This is a good point, maybe we should confirm it with some tests though.

Yes, I would benchmark it. Also, switching to a per-thread max_batch
involves a command-line interface change, and we would still need to keep
aio-max-batch for compatibility for some time, so it's not urgent.

Stefan



end of thread, other threads:[~2023-05-30 17:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2023-05-23 17:12 [PATCH v2 0/6] block: add blk_io_plug_call() API Stefan Hajnoczi
2023-05-23 17:12 ` [PATCH v2 1/6] " Stefan Hajnoczi
2023-05-24  8:08   ` Stefano Garzarella
2023-05-23 17:12 ` [PATCH v2 2/6] block/nvme: convert to " Stefan Hajnoczi
2023-05-24  8:11   ` Stefano Garzarella
2023-05-23 17:12 ` [PATCH v2 3/6] block/blkio: " Stefan Hajnoczi
2023-05-24  8:13   ` Stefano Garzarella
2023-05-23 17:12 ` [PATCH v2 4/6] block/io_uring: " Stefan Hajnoczi
2023-05-24  8:47   ` Stefano Garzarella
2023-05-23 17:12 ` [PATCH v2 5/6] block/linux-aio: " Stefan Hajnoczi
2023-05-24  8:52   ` Stefano Garzarella
2023-05-24 19:36     ` Stefan Hajnoczi
2023-05-29  8:50       ` Stefano Garzarella
2023-05-30 17:11         ` Stefan Hajnoczi
2023-05-23 17:13 ` [PATCH v2 6/6] block: remove bdrv_co_io_plug() API Stefan Hajnoczi
2023-05-24  8:52   ` Stefano Garzarella
