QEMU-Devel Archive on lore.kernel.org
* [PATCH 0/5] aio-posix: towards an O(1) event loop
@ 2020-02-14 17:17 Stefan Hajnoczi
  2020-02-14 17:17 ` [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll() Stefan Hajnoczi
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

This patch series makes AioHandler deletion and dispatch O(1) with respect to
the total number of registered handlers.  The event loop currently has
scalability problems when many AioHandlers are registered because several of
its code paths are O(n).  Linux epoll(7) already avoids scanning over all
pollfds, but parts of the code still scan all AioHandlers.

This series reduces QEMU CPU utilization and therefore increases IOPS,
especially for guests that have many devices.  It was tested with 32 vCPUs, 2
virtio-blk,num-queues=1,iothread=iothread1 devices, and 99
virtio-blk,num-queues=32,iothread=iothread1 devices.  An IOThread is
necessary because this series does not improve the glib main loop; that is a
non-goal since the glib API is inherently O(n).

AioContext polling remains O(n) and will be addressed in a separate patch
series.  This patch series increases IOPS from 260k to 300k when AioContext
polling is commented out
(rw=randread,bs=4k,iodepth=32,ioengine=libaio,direct=1).
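
[Editor's note: for readers unfamiliar with why epoll(7) helps here, the key
property is that epoll_wait() returns only the file descriptors that are
ready, so the kernel-side wait does not grow with the number of registered
fds.  The following is a minimal standalone sketch (Linux-only, not QEMU
code; the function name is invented for illustration):]

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Minimal illustration of epoll's O(ready) behaviour: no matter how many
 * fds are registered on the epoll instance, epoll_wait() hands back only
 * the ready ones.  Standalone sketch, not QEMU code.
 */
int epoll_ready_count(void)
{
    int epfd = epoll_create1(0);
    int fds[2];
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out[8];
    int n;

    assert(epfd >= 0);
    assert(pipe(fds) == 0);
    ev.data.fd = fds[0];
    assert(epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0);

    assert(write(fds[1], "x", 1) == 1);  /* make the read end ready */
    n = epoll_wait(epfd, out, 8, 0);     /* 0 ms timeout: no blocking */

    close(fds[0]);
    close(fds[1]);
    close(epfd);
    return n;                            /* number of ready fds */
}
```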

Stefan Hajnoczi (5):
  aio-posix: fix use after leaving scope in aio_poll()
  aio-posix: don't pass ns timeout to epoll_wait()
  qemu/queue.h: add QLIST_SAFE_REMOVE()
  aio-posix: make AioHandler deletion O(1)
  aio-posix: make AioHandler dispatch O(1) with epoll

 block.c              |   5 +-
 chardev/spice.c      |   4 +-
 include/block/aio.h  |   6 +-
 include/qemu/queue.h |  17 +++++
 util/aio-posix.c     | 172 +++++++++++++++++++++++++++++--------------
 5 files changed, 141 insertions(+), 63 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll()
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
@ 2020-02-14 17:17 ` Stefan Hajnoczi
  2020-02-19  7:02   ` Sergio Lopez
  2020-02-14 17:17 ` [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait() Stefan Hajnoczi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

epoll_handler is a stack variable and must not be accessed after it goes
out of scope:

      if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
          AioHandler epoll_handler;
          ...
          add_pollfd(&epoll_handler);
          ret = aio_epoll(ctx, pollfds, npfd, timeout);
      } ...

  ...

  /* if we have any readable fds, dispatch event */
  if (ret > 0) {
      for (i = 0; i < npfd; i++) {
          nodes[i]->pfd.revents = pollfds[i].revents;
      }
  }

nodes[0] is &epoll_handler, which has already gone out of scope.

There is no need to use pollfds[] for epoll.  We don't need an
AioHandler for the epoll fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index a4977f538e..31a8e03ca7 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -104,17 +104,18 @@ static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
     }
 }
 
-static int aio_epoll(AioContext *ctx, GPollFD *pfds,
-                     unsigned npfd, int64_t timeout)
+static int aio_epoll(AioContext *ctx, int64_t timeout)
 {
+    GPollFD pfd = {
+        .fd = ctx->epollfd,
+        .events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR,
+    };
     AioHandler *node;
     int i, ret = 0;
     struct epoll_event events[128];
 
-    assert(npfd == 1);
-    assert(pfds[0].fd == ctx->epollfd);
     if (timeout > 0) {
-        ret = qemu_poll_ns(pfds, npfd, timeout);
+        ret = qemu_poll_ns(&pfd, 1, timeout);
     }
     if (timeout <= 0 || ret > 0) {
         ret = epoll_wait(ctx->epollfd, events,
@@ -658,13 +659,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
         /* wait until next event */
         if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
-            AioHandler epoll_handler;
-
-            epoll_handler.pfd.fd = ctx->epollfd;
-            epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
-            npfd = 0;
-            add_pollfd(&epoll_handler);
-            ret = aio_epoll(ctx, pollfds, npfd, timeout);
+            npfd = 0; /* pollfds[] is not being used */
+            ret = aio_epoll(ctx, timeout);
         } else  {
             ret = qemu_poll_ns(pollfds, npfd, timeout);
         }
-- 
2.24.1


* [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait()
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
  2020-02-14 17:17 ` [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll() Stefan Hajnoczi
@ 2020-02-14 17:17 ` Stefan Hajnoczi
  2020-02-19 10:12   ` Sergio Lopez
  2020-02-14 17:17 ` [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE() Stefan Hajnoczi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

Don't pass the nanosecond timeout into epoll_wait(), which expects
milliseconds.

The epoll_wait() timeout value does not matter if qemu_poll_ns()
determined that the poll fd is ready, but passing a value in the wrong
units is still ugly.  Pass a 0 timeout to epoll_wait() instead.
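
[Editor's note: a nanosecond value passed where milliseconds are expected
makes the timeout roughly a million times too long.  If a conversion were
ever needed instead of passing 0, it would have to round up so the wait never
ends early; a hedged sketch follows (the helper name is invented for
illustration -- the patch itself simply passes 0):]

```c
#include <stdint.h>

/*
 * Hypothetical helper converting a nanosecond timeout to the millisecond
 * granularity that epoll_wait() expects.  Round up so a sub-millisecond
 * timeout does not get truncated to 0 and return early.
 * (Sketch only; the patch sidesteps conversion entirely.)
 */
static int timeout_ns_to_ms(int64_t ns)
{
    if (ns < 0) {
        return -1;                          /* negative: block forever */
    }
    return (int)((ns + 999999) / 1000000);  /* round up to whole ms */
}
```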

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 31a8e03ca7..b21bcd8e97 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -116,6 +116,9 @@ static int aio_epoll(AioContext *ctx, int64_t timeout)
 
     if (timeout > 0) {
         ret = qemu_poll_ns(&pfd, 1, timeout);
+        if (ret > 0) {
+            timeout = 0;
+        }
     }
     if (timeout <= 0 || ret > 0) {
         ret = epoll_wait(ctx->epollfd, events,
-- 
2.24.1


* [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE()
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
  2020-02-14 17:17 ` [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll() Stefan Hajnoczi
  2020-02-14 17:17 ` [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait() Stefan Hajnoczi
@ 2020-02-14 17:17 ` Stefan Hajnoczi
  2020-02-19 10:30   ` Sergio Lopez
  2020-02-14 17:17 ` [PATCH 4/5] aio-posix: make AioHandler deletion O(1) Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

QLIST_REMOVE() assumes the element is in a list.  It also leaves the
element's linked list pointers dangling.

Introduce a safe version of QLIST_REMOVE() and convert open-coded
instances of this pattern.
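
[Editor's note: to see what the new macro buys over QLIST_REMOVE(), here is a
standalone sketch that re-creates the relevant qemu/queue.h macros
(simplified, not the actual header) and exercises the two properties above:
removal is a no-op when the element is not in a list, and the link pointers
are nulled rather than left dangling:]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified re-creation of the BSD-style intrusive list from
 * qemu/queue.h -- just enough to demonstrate QLIST_SAFE_REMOVE(). */
#define QLIST_HEAD(name, type) struct name { struct type *lh_first; }
#define QLIST_ENTRY(type) \
        struct { struct type *le_next; struct type **le_prev; }
#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
        if (((elm)->field.le_next = (head)->lh_first) != NULL)          \
                (head)->lh_first->field.le_prev = &(elm)->field.le_next;\
        (head)->lh_first = (elm);                                       \
        (elm)->field.le_prev = &(head)->lh_first;                       \
} while (0)

/* The macro added by this patch. */
#define QLIST_SAFE_REMOVE(elm, field) do {                              \
        if ((elm)->field.le_prev != NULL) {                             \
                if ((elm)->field.le_next != NULL)                       \
                        (elm)->field.le_next->field.le_prev =           \
                            (elm)->field.le_prev;                       \
                *(elm)->field.le_prev = (elm)->field.le_next;           \
                (elm)->field.le_next = NULL;                            \
                (elm)->field.le_prev = NULL;                            \
        }                                                               \
} while (0)

struct item {
    int value;
    QLIST_ENTRY(item) link;
};

QLIST_HEAD(item_list, item);

int safe_remove_demo(void)
{
    struct item_list head = { NULL };
    struct item a = { 1, { NULL, NULL } };
    struct item b = { 2, { NULL, NULL } };

    QLIST_INSERT_HEAD(&head, &a, link);
    QLIST_INSERT_HEAD(&head, &b, link);

    QLIST_SAFE_REMOVE(&a, link);
    assert(head.lh_first == &b);
    assert(a.link.le_prev == NULL);  /* pointers nulled, not dangling */

    QLIST_SAFE_REMOVE(&a, link);     /* second removal: harmless no-op */
    assert(head.lh_first == &b);
    return 0;
}
```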

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c              |  5 +----
 chardev/spice.c      |  4 +---
 include/qemu/queue.h | 14 ++++++++++++++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 9c810534d6..484e01d042 100644
--- a/block.c
+++ b/block.c
@@ -2499,10 +2499,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 static void bdrv_detach_child(BdrvChild *child)
 {
-    if (child->next.le_prev) {
-        QLIST_REMOVE(child, next);
-        child->next.le_prev = NULL;
-    }
+    QLIST_SAFE_REMOVE(child, next);
 
     bdrv_replace_child(child, NULL);
 
diff --git a/chardev/spice.c b/chardev/spice.c
index 241e2b7770..bf7ea1e294 100644
--- a/chardev/spice.c
+++ b/chardev/spice.c
@@ -216,9 +216,7 @@ static void char_spice_finalize(Object *obj)
 
     vmc_unregister_interface(s);
 
-    if (s->next.le_prev) {
-        QLIST_REMOVE(s, next);
-    }
+    QLIST_SAFE_REMOVE(s, next);
 
     g_free((char *)s->sin.subtype);
     g_free((char *)s->sin.portname);
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 19425f973f..a276363372 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -144,6 +144,20 @@ struct {                                                                \
         *(elm)->field.le_prev = (elm)->field.le_next;                   \
 } while (/*CONSTCOND*/0)
 
+/*
+ * Like QLIST_REMOVE() but safe to call when elm is not in a list
+ */
+#define QLIST_SAFE_REMOVE(elm, field) do {                              \
+        if ((elm)->field.le_prev != NULL) {                             \
+                if ((elm)->field.le_next != NULL)                       \
+                        (elm)->field.le_next->field.le_prev =           \
+                            (elm)->field.le_prev;                       \
+                *(elm)->field.le_prev = (elm)->field.le_next;           \
+                (elm)->field.le_next = NULL;                            \
+                (elm)->field.le_prev = NULL;                            \
+        }                                                               \
+} while (/*CONSTCOND*/0)
+
 #define QLIST_FOREACH(var, head, field)                                 \
         for ((var) = ((head)->lh_first);                                \
                 (var);                                                  \
-- 
2.24.1


* [PATCH 4/5] aio-posix: make AioHandler deletion O(1)
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2020-02-14 17:17 ` [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE() Stefan Hajnoczi
@ 2020-02-14 17:17 ` Stefan Hajnoczi
  2020-02-19 10:41   ` Sergio Lopez
  2020-02-14 17:17 ` [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll Stefan Hajnoczi
  2020-02-21 15:29 ` [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
  5 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

It is not necessary to scan all AioHandlers for deletion.  Keep a list
of deleted handlers instead of scanning the full list of all handlers.

The AioHandler->deleted field can be dropped.  Let's check if the
handler has been inserted into the deleted list instead.  Add a new
QLIST_IS_INSERTED() API for this check.
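
[Editor's note: a standalone sketch of the idea -- membership in a second
intrusive list replaces the int `deleted` flag, and QLIST_IS_INSERTED()
tests membership in O(1) via the element's back-pointer.  Simplified macros,
not the real qemu/queue.h:]

```c
#include <assert.h>
#include <stddef.h>

#define QLIST_HEAD(name, type) struct name { struct type *lh_first; }
#define QLIST_ENTRY(type) \
        struct { struct type *le_next; struct type **le_prev; }
#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
        if (((elm)->field.le_next = (head)->lh_first) != NULL)          \
                (head)->lh_first->field.le_prev = &(elm)->field.le_next;\
        (head)->lh_first = (elm);                                       \
        (elm)->field.le_prev = &(head)->lh_first;                       \
} while (0)

/* The check added by this patch: an element is in a list iff its
 * back-pointer is non-NULL (true even for the first element, whose
 * le_prev points at the list head's lh_first). */
#define QLIST_IS_INSERTED(elm, field) ((elm)->field.le_prev != NULL)

struct handler {
    QLIST_ENTRY(handler) node;          /* all registered handlers */
    QLIST_ENTRY(handler) node_deleted;  /* only handlers pending free */
};

int deleted_flag_demo(void)
{
    QLIST_HEAD(, handler) deleted = { NULL };
    struct handler h = { { NULL, NULL }, { NULL, NULL } };

    /* Equivalent of `node->deleted == 0`: */
    assert(!QLIST_IS_INSERTED(&h, node_deleted));

    /* Equivalent of `node->deleted = 1`, except the deleted handlers
     * can now be found without scanning the full handler list: */
    QLIST_INSERT_HEAD(&deleted, &h, node_deleted);
    assert(QLIST_IS_INSERTED(&h, node_deleted));
    assert(deleted.lh_first == &h);
    return 0;
}
```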

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h  |  6 ++++-
 include/qemu/queue.h |  3 +++
 util/aio-posix.c     | 53 +++++++++++++++++++++++++++++---------------
 3 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 7ba9bd7874..1a0de1508c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -42,6 +42,7 @@ void qemu_aio_unref(void *p);
 void qemu_aio_ref(void *p);
 
 typedef struct AioHandler AioHandler;
+typedef QLIST_HEAD(, AioHandler) AioHandlerList;
 typedef void QEMUBHFunc(void *opaque);
 typedef bool AioPollFn(void *opaque);
 typedef void IOHandler(void *opaque);
@@ -58,7 +59,10 @@ struct AioContext {
     QemuRecMutex lock;
 
     /* The list of registered AIO handlers.  Protected by ctx->list_lock. */
-    QLIST_HEAD(, AioHandler) aio_handlers;
+    AioHandlerList aio_handlers;
+
+    /* The list of AIO handlers to be deleted.  Protected by ctx->list_lock. */
+    AioHandlerList deleted_aio_handlers;
 
     /* Used to avoid unnecessary event_notifier_set calls in aio_notify;
      * accessed with atomic primitives.  If this field is 0, everything
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index a276363372..699a8a0568 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -158,6 +158,9 @@ struct {                                                                \
         }                                                               \
 } while (/*CONSTCOND*/0)
 
+/* Is elm in a list? */
+#define QLIST_IS_INSERTED(elm, field) ((elm)->field.le_prev != NULL)
+
 #define QLIST_FOREACH(var, head, field)                                 \
         for ((var) = ((head)->lh_first);                                \
                 (var);                                                  \
diff --git a/util/aio-posix.c b/util/aio-posix.c
index b21bcd8e97..3a98a2acb9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -31,10 +31,10 @@ struct AioHandler
     AioPollFn *io_poll;
     IOHandler *io_poll_begin;
     IOHandler *io_poll_end;
-    int deleted;
     void *opaque;
     bool is_external;
     QLIST_ENTRY(AioHandler) node;
+    QLIST_ENTRY(AioHandler) node_deleted;
 };
 
 #ifdef CONFIG_EPOLL_CREATE1
@@ -67,7 +67,7 @@ static bool aio_epoll_try_enable(AioContext *ctx)
 
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         int r;
-        if (node->deleted || !node->pfd.events) {
+        if (QLIST_IS_INSERTED(node, node_deleted) || !node->pfd.events) {
             continue;
         }
         event.events = epoll_events_from_pfd(node->pfd.events);
@@ -195,9 +195,11 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
     AioHandler *node;
 
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        if (node->pfd.fd == fd)
-            if (!node->deleted)
+        if (node->pfd.fd == fd) {
+            if (!QLIST_IS_INSERTED(node, node_deleted)) {
                 return node;
+            }
+        }
     }
 
     return NULL;
@@ -216,7 +218,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 
     /* If a read is in progress, just mark the node as deleted */
     if (qemu_lockcnt_count(&ctx->list_lock)) {
-        node->deleted = 1;
+        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
         node->pfd.revents = 0;
         return false;
     }
@@ -358,7 +360,7 @@ static void poll_set_started(AioContext *ctx, bool started)
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         IOHandler *fn;
 
-        if (node->deleted) {
+        if (QLIST_IS_INSERTED(node, node_deleted)) {
             continue;
         }
 
@@ -415,6 +417,26 @@ bool aio_pending(AioContext *ctx)
     return result;
 }
 
+static void aio_free_deleted_handlers(AioContext *ctx)
+{
+    AioHandler *node;
+
+    if (QLIST_EMPTY_RCU(&ctx->deleted_aio_handlers)) {
+        return;
+    }
+    if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
+        return; /* we are nested, let the parent do the freeing */
+    }
+
+    while ((node = QLIST_FIRST_RCU(&ctx->deleted_aio_handlers))) {
+        QLIST_REMOVE(node, node);
+        QLIST_REMOVE(node, node_deleted);
+        g_free(node);
+    }
+
+    qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
+}
+
 static bool aio_dispatch_handlers(AioContext *ctx)
 {
     AioHandler *node, *tmp;
@@ -426,7 +448,7 @@ static bool aio_dispatch_handlers(AioContext *ctx)
         revents = node->pfd.revents & node->pfd.events;
         node->pfd.revents = 0;
 
-        if (!node->deleted &&
+        if (!QLIST_IS_INSERTED(node, node_deleted) &&
             (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
             aio_node_check(ctx, node->is_external) &&
             node->io_read) {
@@ -437,21 +459,13 @@ static bool aio_dispatch_handlers(AioContext *ctx)
                 progress = true;
             }
         }
-        if (!node->deleted &&
+        if (!QLIST_IS_INSERTED(node, node_deleted) &&
             (revents & (G_IO_OUT | G_IO_ERR)) &&
             aio_node_check(ctx, node->is_external) &&
             node->io_write) {
             node->io_write(node->opaque);
             progress = true;
         }
-
-        if (node->deleted) {
-            if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
-                QLIST_REMOVE(node, node);
-                g_free(node);
-                qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
-            }
-        }
     }
 
     return progress;
@@ -462,6 +476,7 @@ void aio_dispatch(AioContext *ctx)
     qemu_lockcnt_inc(&ctx->list_lock);
     aio_bh_poll(ctx);
     aio_dispatch_handlers(ctx);
+    aio_free_deleted_handlers(ctx);
     qemu_lockcnt_dec(&ctx->list_lock);
 
     timerlistgroup_run_timers(&ctx->tlg);
@@ -519,7 +534,7 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
     AioHandler *node;
 
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!node->deleted && node->io_poll &&
+        if (!QLIST_IS_INSERTED(node, node_deleted) && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
             node->io_poll(node->opaque)) {
             /*
@@ -653,7 +668,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
         if (!aio_epoll_enabled(ctx)) {
             QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-                if (!node->deleted && node->pfd.events
+                if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
                     && aio_node_check(ctx, node->is_external)) {
                     add_pollfd(node);
                 }
@@ -730,6 +745,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
         progress |= aio_dispatch_handlers(ctx);
     }
 
+    aio_free_deleted_handlers(ctx);
+
     qemu_lockcnt_dec(&ctx->list_lock);
 
     progress |= timerlistgroup_run_timers(&ctx->tlg);
-- 
2.24.1


* [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2020-02-14 17:17 ` [PATCH 4/5] aio-posix: make AioHandler deletion O(1) Stefan Hajnoczi
@ 2020-02-14 17:17 ` Stefan Hajnoczi
  2020-02-19 11:00   ` Sergio Lopez
  2020-02-19 11:13   ` Paolo Bonzini
  2020-02-21 15:29 ` [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
  5 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-14 17:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz, Stefan Hajnoczi,
	Marc-André Lureau, Paolo Bonzini

File descriptor monitoring is O(1) with epoll(7), but
aio_dispatch_handlers() still scans all AioHandlers instead of
dispatching just those that are ready.  This makes aio_poll() O(n) with
respect to the total number of registered handlers.

Add a local ready_list to aio_poll() so that each nested aio_poll()
builds a list of handlers ready to be dispatched.  Since file descriptor
polling is level-triggered, nested aio_poll() calls also see fds that
were ready in the parent but not yet dispatched.  This guarantees that
nested aio_poll() invocations will dispatch all fds, even those that
became ready before the nested invocation.

Since only handlers ready to be dispatched are placed onto the
ready_list, the new aio_dispatch_ready_handlers() function provides O(1)
dispatch.

Note that AioContext polling is still O(n) and currently cannot be fully
disabled.  This still needs to be fixed before aio_poll() is fully O(1).
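
[Editor's note: the ready_list mechanics described above can be sketched
standalone.  Simplified macros and handler struct, not the real
qemu/queue.h or util/aio-posix.c; function names mirror the patch but are
re-creations:]

```c
#include <assert.h>
#include <stddef.h>

#define QLIST_HEAD(name, type) struct name { struct type *lh_first; }
#define QLIST_ENTRY(type) \
        struct { struct type *le_next; struct type **le_prev; }
#define QLIST_FIRST(head) ((head)->lh_first)
#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
        if (((elm)->field.le_next = (head)->lh_first) != NULL)          \
                (head)->lh_first->field.le_prev = &(elm)->field.le_next;\
        (head)->lh_first = (elm);                                       \
        (elm)->field.le_prev = &(head)->lh_first;                       \
} while (0)
#define QLIST_SAFE_REMOVE(elm, field) do {                              \
        if ((elm)->field.le_prev != NULL) {                             \
                if ((elm)->field.le_next != NULL)                       \
                        (elm)->field.le_next->field.le_prev =           \
                            (elm)->field.le_prev;                       \
                *(elm)->field.le_prev = (elm)->field.le_next;           \
                (elm)->field.le_next = NULL;                            \
                (elm)->field.le_prev = NULL;                            \
        }                                                               \
} while (0)

QLIST_HEAD(ready_list_head, fd_handler);

struct fd_handler {
    int revents;
    int dispatched;
    QLIST_ENTRY(fd_handler) node_ready;
};

/* Mirrors add_ready_handler(): removing first makes re-adding safe even
 * when a nested poll already queued the node on a parent's ready list. */
static void add_ready(struct ready_list_head *ready,
                      struct fd_handler *node, int revents)
{
    QLIST_SAFE_REMOVE(node, node_ready);
    node->revents = revents;
    QLIST_INSERT_HEAD(ready, node, node_ready);
}

/* O(ready) dispatch: pop handlers off the list instead of scanning all
 * registered handlers. */
static int dispatch_ready(struct ready_list_head *ready)
{
    struct fd_handler *node;
    int n = 0;

    while ((node = QLIST_FIRST(ready)) != NULL) {
        QLIST_SAFE_REMOVE(node, node_ready);
        node->dispatched++;
        n++;
    }
    return n;
}

int ready_list_demo(void)
{
    struct ready_list_head ready = { NULL };
    struct fd_handler a = { 0, 0, { NULL, NULL } };
    struct fd_handler b = { 0, 0, { NULL, NULL } };

    add_ready(&ready, &a, 1);
    add_ready(&ready, &b, 1);
    add_ready(&ready, &a, 1);   /* re-add: a just moves to the head */

    return dispatch_ready(&ready);  /* each handler dispatched once */
}
```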

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c | 106 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 30 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 3a98a2acb9..dc33ca08a6 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -34,6 +34,7 @@ struct AioHandler
     void *opaque;
     bool is_external;
     QLIST_ENTRY(AioHandler) node;
+    QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
     QLIST_ENTRY(AioHandler) node_deleted;
 };
 
@@ -104,7 +105,18 @@ static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
     }
 }
 
-static int aio_epoll(AioContext *ctx, int64_t timeout)
+/* Add a handler to a ready list */
+static void add_ready_handler(AioHandlerList *ready_list,
+                              AioHandler *node,
+                              int revents)
+{
+    QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
+    node->pfd.revents = revents;
+    QLIST_INSERT_HEAD(ready_list, node, node_ready);
+}
+
+static int aio_epoll(AioContext *ctx, AioHandlerList *ready_list,
+                     int64_t timeout)
 {
     GPollFD pfd = {
         .fd = ctx->epollfd,
@@ -129,11 +141,13 @@ static int aio_epoll(AioContext *ctx, int64_t timeout)
         }
         for (i = 0; i < ret; i++) {
             int ev = events[i].events;
+            int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+                          (ev & EPOLLOUT ? G_IO_OUT : 0) |
+                          (ev & EPOLLHUP ? G_IO_HUP : 0) |
+                          (ev & EPOLLERR ? G_IO_ERR : 0);
+
             node = events[i].data.ptr;
-            node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
-                (ev & EPOLLOUT ? G_IO_OUT : 0) |
-                (ev & EPOLLHUP ? G_IO_HUP : 0) |
-                (ev & EPOLLERR ? G_IO_ERR : 0);
+            add_ready_handler(ready_list, node, revents);
         }
     }
 out:
@@ -437,36 +451,63 @@ static void aio_free_deleted_handlers(AioContext *ctx)
     qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
 }
 
-static bool aio_dispatch_handlers(AioContext *ctx)
+static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
 {
-    AioHandler *node, *tmp;
     bool progress = false;
+    int revents;
 
-    QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
-        int revents;
+    revents = node->pfd.revents & node->pfd.events;
+    node->pfd.revents = 0;
 
-        revents = node->pfd.revents & node->pfd.events;
-        node->pfd.revents = 0;
+    if (!QLIST_IS_INSERTED(node, node_deleted) &&
+        (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
+        aio_node_check(ctx, node->is_external) &&
+        node->io_read) {
+        node->io_read(node->opaque);
 
-        if (!QLIST_IS_INSERTED(node, node_deleted) &&
-            (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
-            aio_node_check(ctx, node->is_external) &&
-            node->io_read) {
-            node->io_read(node->opaque);
-
-            /* aio_notify() does not count as progress */
-            if (node->opaque != &ctx->notifier) {
-                progress = true;
-            }
-        }
-        if (!QLIST_IS_INSERTED(node, node_deleted) &&
-            (revents & (G_IO_OUT | G_IO_ERR)) &&
-            aio_node_check(ctx, node->is_external) &&
-            node->io_write) {
-            node->io_write(node->opaque);
+        /* aio_notify() does not count as progress */
+        if (node->opaque != &ctx->notifier) {
             progress = true;
         }
     }
+    if (!QLIST_IS_INSERTED(node, node_deleted) &&
+        (revents & (G_IO_OUT | G_IO_ERR)) &&
+        aio_node_check(ctx, node->is_external) &&
+        node->io_write) {
+        node->io_write(node->opaque);
+        progress = true;
+    }
+
+    return progress;
+}
+
+/*
+ * If we have a list of ready handlers then this is more efficient than
+ * scanning all handlers with aio_dispatch_handlers().
+ */
+static bool aio_dispatch_ready_handlers(AioContext *ctx,
+                                        AioHandlerList *ready_list)
+{
+    bool progress = false;
+    AioHandler *node;
+
+    while ((node = QLIST_FIRST(ready_list))) {
+        QLIST_SAFE_REMOVE(node, node_ready);
+        progress = aio_dispatch_handler(ctx, node) || progress;
+    }
+
+    return progress;
+}
+
+/* Slower than aio_dispatch_ready_handlers() but only used via glib */
+static bool aio_dispatch_handlers(AioContext *ctx)
+{
+    AioHandler *node, *tmp;
+    bool progress = false;
+
+    QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
+        progress = aio_dispatch_handler(ctx, node) || progress;
+    }
 
     return progress;
 }
@@ -628,6 +669,7 @@ static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
 
 bool aio_poll(AioContext *ctx, bool blocking)
 {
+    AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
     AioHandler *node;
     int i;
     int ret = 0;
@@ -678,7 +720,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
         /* wait until next event */
         if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
             npfd = 0; /* pollfds[] is not being used */
-            ret = aio_epoll(ctx, timeout);
+            ret = aio_epoll(ctx, &ready_list, timeout);
         } else  {
             ret = qemu_poll_ns(pollfds, npfd, timeout);
         }
@@ -733,7 +775,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
-            nodes[i]->pfd.revents = pollfds[i].revents;
+            int revents = pollfds[i].revents;
+
+            if (revents) {
+                add_ready_handler(&ready_list, nodes[i], revents);
+            }
         }
     }
 
@@ -742,7 +788,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     progress |= aio_bh_poll(ctx);
 
     if (ret > 0) {
-        progress |= aio_dispatch_handlers(ctx);
+        progress |= aio_dispatch_ready_handlers(ctx, &ready_list);
     }
 
     aio_free_deleted_handlers(ctx);
-- 
2.24.1


* Re: [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll()
  2020-02-14 17:17 ` [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll() Stefan Hajnoczi
@ 2020-02-19  7:02   ` Sergio Lopez
  0 siblings, 0 replies; 20+ messages in thread
From: Sergio Lopez @ 2020-02-19  7:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

On Fri, Feb 14, 2020 at 05:17:08PM +0000, Stefan Hajnoczi wrote:
> epoll_handler is a stack variable and must not be accessed after it goes
> out of scope:
> 
>       if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
>           AioHandler epoll_handler;
>           ...
>           add_pollfd(&epoll_handler);
>           ret = aio_epoll(ctx, pollfds, npfd, timeout);
>       } ...
> 
>   ...
> 
>   /* if we have any readable fds, dispatch event */
>   if (ret > 0) {
>       for (i = 0; i < npfd; i++) {
>           nodes[i]->pfd.revents = pollfds[i].revents;
>       }
>   }
> 
> nodes[0] is &epoll_handler, which has already gone out of scope.
> 
> There is no need to use pollfds[] for epoll.  We don't need an
> AioHandler for the epoll fd.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/aio-posix.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

* Re: [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait()
  2020-02-14 17:17 ` [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait() Stefan Hajnoczi
@ 2020-02-19 10:12   ` Sergio Lopez
  0 siblings, 0 replies; 20+ messages in thread
From: Sergio Lopez @ 2020-02-19 10:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

On Fri, Feb 14, 2020 at 05:17:09PM +0000, Stefan Hajnoczi wrote:
> Don't pass the nanosecond timeout into epoll_wait(), which expects
> milliseconds.
> 
> The epoll_wait() timeout value does not matter if qemu_poll_ns()
> determined that the poll fd is ready, but passing a value in the wrong
> units is still ugly.  Pass a 0 timeout to epoll_wait() instead.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/aio-posix.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Sergio Lopez <slp@redhat.com>

* Re: [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE()
  2020-02-14 17:17 ` [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE() Stefan Hajnoczi
@ 2020-02-19 10:30   ` Sergio Lopez
  0 siblings, 0 replies; 20+ messages in thread
From: Sergio Lopez @ 2020-02-19 10:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

On Fri, Feb 14, 2020 at 05:17:10PM +0000, Stefan Hajnoczi wrote:
> QLIST_REMOVE() assumes the element is in a list.  It also leaves the
> element's linked list pointers dangling.
> 
> Introduce a safe version of QLIST_REMOVE() and convert open-coded
> instances of this pattern.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c              |  5 +----
>  chardev/spice.c      |  4 +---
>  include/qemu/queue.h | 14 ++++++++++++++
>  3 files changed, 16 insertions(+), 7 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

* Re: [PATCH 4/5] aio-posix: make AioHandler deletion O(1)
  2020-02-14 17:17 ` [PATCH 4/5] aio-posix: make AioHandler deletion O(1) Stefan Hajnoczi
@ 2020-02-19 10:41   ` Sergio Lopez
  0 siblings, 0 replies; 20+ messages in thread
From: Sergio Lopez @ 2020-02-19 10:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

On Fri, Feb 14, 2020 at 05:17:11PM +0000, Stefan Hajnoczi wrote:
> It is not necessary to scan all AioHandlers for deletion.  Keep a list
> of deleted handlers instead of scanning the full list of all handlers.
> 
> The AioHandler->deleted field can be dropped.  Let's check if the
> handler has been inserted into the deleted list instead.  Add a new
> QLIST_IS_INSERTED() API for this check.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/block/aio.h  |  6 ++++-
>  include/qemu/queue.h |  3 +++
>  util/aio-posix.c     | 53 +++++++++++++++++++++++++++++---------------
>  3 files changed, 43 insertions(+), 19 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>

* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-14 17:17 ` [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll Stefan Hajnoczi
@ 2020-02-19 11:00   ` Sergio Lopez
  2020-02-19 11:13   ` Paolo Bonzini
  1 sibling, 0 replies; 20+ messages in thread
From: Sergio Lopez @ 2020-02-19 11:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Paolo Bonzini, Marc-André Lureau

On Fri, Feb 14, 2020 at 05:17:12PM +0000, Stefan Hajnoczi wrote:
> File descriptor monitoring is O(1) with epoll(7), but
> aio_dispatch_handlers() still scans all AioHandlers instead of
> dispatching just those that are ready.  This makes aio_poll() O(n) with
> respect to the total number of registered handlers.
> 
> Add a local ready_list to aio_poll() so that each nested aio_poll()
> builds a list of handlers ready to be dispatched.  Since file descriptor
> polling is level-triggered, nested aio_poll() calls also see fds that
> were ready in the parent but not yet dispatched.  This guarantees that
> nested aio_poll() invocations will dispatch all fds, even those that
> became ready before the nested invocation.
> 
> Since only handlers ready to be dispatched are placed onto the
> ready_list, the new aio_dispatch_ready_handlers() function provides O(1)
> dispatch.
> 
> Note that AioContext polling is still O(n) and currently cannot be fully
> disabled.  This still needs to be fixed before aio_poll() is fully O(1).
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/aio-posix.c | 106 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 76 insertions(+), 30 deletions(-)

Reviewed-by: Sergio Lopez <slp@redhat.com>


* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-14 17:17 ` [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll Stefan Hajnoczi
  2020-02-19 11:00   ` Sergio Lopez
@ 2020-02-19 11:13   ` Paolo Bonzini
  2020-02-21 12:59     ` Stefan Hajnoczi
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2020-02-19 11:13 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Max Reitz, qemu-block, Marc-André Lureau

On 14/02/20 18:17, Stefan Hajnoczi wrote:
> +    while ((node = QLIST_FIRST(ready_list))) {
> +        QLIST_SAFE_REMOVE(node, node_ready);

Why does this need safe remove?

Paolo

> +        progress = aio_dispatch_handler(ctx, node) || progress;
> +    }




* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-19 11:13   ` Paolo Bonzini
@ 2020-02-21 12:59     ` Stefan Hajnoczi
  2020-02-21 13:06       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 12:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Marc-André Lureau

On Wed, Feb 19, 2020 at 12:13:40PM +0100, Paolo Bonzini wrote:
> On 14/02/20 18:17, Stefan Hajnoczi wrote:
> > +    while ((node = QLIST_FIRST(ready_list))) {
> > +        QLIST_SAFE_REMOVE(node, node_ready);
> 
> Why does this need safe remove?

Yes, it's necessary.  QLIST_SAFE_REMOVE() has two properties that make
it "safe":
1. It doesn't crash if the node is currently not on a list.
2. It clears the node's linked list pointers so that future linked
   list operations (like QLIST_SAFE_REMOVE()) aren't accidentally
   performed on stale pointers.

The node has a long lifespan and will be inserted into ready_lists
multiple times.  We need to safely remove it from ready_list to protect
against a corruption the next time the node is inserted into a
ready_list again:

  /* Add a handler to a ready list */
  static void add_ready_handler(AioHandlerList *ready_list,
                                AioHandler *node,
                                int revents)
  {
      QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
      ^---- would cause corruption if node->node_ready was stale!

Would you like me to add a comment?

Stefan


* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 12:59     ` Stefan Hajnoczi
@ 2020-02-21 13:06       ` Paolo Bonzini
  2020-02-21 14:44         ` Stefan Hajnoczi
  2020-02-21 14:47         ` Stefan Hajnoczi
  0 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2020-02-21 13:06 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Marc-André Lureau

On 21/02/20 13:59, Stefan Hajnoczi wrote:
> 1. It doesn't crash if the node is currently not on a list.
> 2. It clears the node's linked list pointers so that future linked
>    list operations (like QLIST_SAFE_REMOVE()) aren't accidentally
>    performed on stale pointers.
>
> The node has a long lifespan and will be inserted into ready_lists
> multiple times.  We need to safely remove it from ready_list to protect
> against a corruption the next time the node is inserted into a
> ready_list again:

Ah, so the one I singled out is for (2) (we know the node is currently
on a list), while the one below is for (1).  Would it make sense to move
(2) to Q*_REMOVE_*?  We can do it separately after this pull request.

>   /* Add a handler to a ready list */
>   static void add_ready_handler(AioHandlerList *ready_list,
>                                 AioHandler *node,
>                                 int revents)
>   {
>       QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
>       ^---- would cause corruption if node->node_ready was stale!
> 
> Would you like me to add a comment?

No, it's okay.

Paolo




* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 13:06       ` Paolo Bonzini
@ 2020-02-21 14:44         ` Stefan Hajnoczi
  2020-02-21 14:47         ` Stefan Hajnoczi
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 14:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Stefan Hajnoczi, qemu-devel,
	Max Reitz, Marc-André Lureau

On Fri, Feb 21, 2020 at 02:06:26PM +0100, Paolo Bonzini wrote:
> On 21/02/20 13:59, Stefan Hajnoczi wrote:
> > 1. It doesn't crash if the node is currently not on a list.
> > 2. It clears the node's linked list pointers so that future linked
> >    list operations (like QLIST_SAFE_REMOVE()) aren't accidentally
> >    performed on stale pointers.
> >
> > The node has a long lifespan and will be inserted into ready_lists
> > multiple times.  We need to safely remove it from ready_list to protect
> > against a corruption the next time the node is inserted into a
> > ready_list again:
> 
> Ah, so the one I singled out is for (2) (we know the node is currently
> on a list), while the one below is for (1).  Would it make sense to move
> (2) to Q*_REMOVE_*?  We can do it separately after this pull request.

Extending all Q*_REMOVE*() macros to clear the linked list pointers is
nice.  I'll send a follow-up patch.

Stefan


* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 13:06       ` Paolo Bonzini
  2020-02-21 14:44         ` Stefan Hajnoczi
@ 2020-02-21 14:47         ` Stefan Hajnoczi
  2020-02-21 15:04           ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 14:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Stefan Hajnoczi, qemu-devel,
	Max Reitz, Marc-André Lureau

On Fri, Feb 21, 2020 at 02:06:26PM +0100, Paolo Bonzini wrote:
> On 21/02/20 13:59, Stefan Hajnoczi wrote:
> >   /* Add a handler to a ready list */
> >   static void add_ready_handler(AioHandlerList *ready_list,
> >                                 AioHandler *node,
> >                                 int revents)
> >   {
> >       QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
> >       ^---- would cause corruption if node->node_ready was stale!
> > 
> > Would you like me to add a comment?
> No, it's okay.

Are you happy with this series?

Shall I include it in my next pull request or do you want to merge it?

Thanks,
Stefan


* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 14:47         ` Stefan Hajnoczi
@ 2020-02-21 15:04           ` Paolo Bonzini
  2020-02-21 15:29             ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2020-02-21 15:04 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Stefan Hajnoczi, qemu-devel,
	Max Reitz, Marc-André Lureau

On 21/02/20 15:47, Stefan Hajnoczi wrote:
>>>       QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
>>>       ^---- would cause corruption if node->node_ready was stale!
>>>
>>> Would you like me to add a comment?
>> No, it's okay.
> Are you happy with this series?

Yes.  Let's keep the Q*_REMOVE cleanup on the todo list.  I'd keep
Q*_SAFE_REMOVE, but clear the pointer unconditionally in Q*_REMOVE so
that we can have something like Q*_IN_LIST too.

> Shall I include it in my next pull request or do you want to merge it?

No, it's yours.

Paolo




* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 15:04           ` Paolo Bonzini
@ 2020-02-21 15:29             ` Stefan Hajnoczi
  2020-02-21 15:37               ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 15:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Stefan Hajnoczi, qemu-devel,
	Max Reitz, Marc-André Lureau

On Fri, Feb 21, 2020 at 04:04:10PM +0100, Paolo Bonzini wrote:
> On 21/02/20 15:47, Stefan Hajnoczi wrote:
> >>>       QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
> >>>       ^---- would cause corruption if node->node_ready was stale!
> >>>
> >>> Would you like me to add a comment?
> >> No, it's okay.
> > Are you happy with this series?
> 
> Yes.  Let's keep the Q*_REMOVE cleanup on the todo list.  I'd keep
> Q*_SAFE_REMOVE, but clear the pointer unconditionally in Q*_REMOVE so
> that we can have something like Q*_IN_LIST too.

QLIST_IS_INSERTED() is part of this patch series, although I can rename
it to Q*_IN_LIST() and cover all linked list variants. :)

> > Shall I include it in my next pull request or do you want to merge it?
> 
> No, it's yours.

Thanks!

Stefan


* Re: [PATCH 0/5] aio-posix: towards an O(1) event loop
  2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2020-02-14 17:17 ` [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll Stefan Hajnoczi
@ 2020-02-21 15:29 ` Stefan Hajnoczi
  5 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 15:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Max Reitz,
	Marc-André Lureau, Paolo Bonzini

On Fri, Feb 14, 2020 at 05:17:07PM +0000, Stefan Hajnoczi wrote:
> This patch series makes AioHandler deletion and dispatch O(1) with respect to
> the total number of registered handlers.  The event loop has scalability
> problems when many AioHandlers are registered because it is O(n).  Linux
> epoll(7) is used to avoid scanning over all pollfds but parts of the code still
> scan all AioHandlers.
> 
> This series reduces QEMU CPU utilization and therefore increases IOPS,
> especially for guests that have many devices.  It was tested with 32 vCPUs, 2
> virtio-blk,num-queues=1,iothread=iothread1, and 99
> virtio-blk,num-queues=32,iothread=iothread1 devices.  Using an IOThread is
> necessary because this series does not improve the glib main loop, a non-goal
> since the glib API is inherently O(n).
> 
> AioContext polling remains O(n) and will be addressed in a separate patch
> series.  This patch series increases IOPS from 260k to 300k when AioContext
> polling is commented out
> (rw=randread,bs=4k,iodepth=32,ioengine=libaio,direct=1).
> 
> Stefan Hajnoczi (5):
>   aio-posix: fix use after leaving scope in aio_poll()
>   aio-posix: don't pass ns timeout to epoll_wait()
>   qemu/queue.h: add QLIST_SAFE_REMOVE()
>   aio-posix: make AioHandler deletion O(1)
>   aio-posix: make AioHandler dispatch O(1) with epoll
> 
>  block.c              |   5 +-
>  chardev/spice.c      |   4 +-
>  include/block/aio.h  |   6 +-
>  include/qemu/queue.h |  17 +++++
>  util/aio-posix.c     | 172 +++++++++++++++++++++++++++++--------------
>  5 files changed, 141 insertions(+), 63 deletions(-)
> 
> -- 
> 2.24.1
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


* Re: [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll
  2020-02-21 15:29             ` Stefan Hajnoczi
@ 2020-02-21 15:37               ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2020-02-21 15:37 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Stefan Hajnoczi, qemu-devel,
	Max Reitz, Marc-André Lureau

On 21/02/20 16:29, Stefan Hajnoczi wrote:
>> Yes.  Let's keep the Q*_REMOVE cleanup on the todo list.  I'd keep
>> Q*_SAFE_REMOVE, but clear the pointer unconditionally in Q*_REMOVE so
>> that we can have something like Q*_IN_LIST too.
> QLIST_IS_INSERTED() is part of this patch series, although I can rename
> it to Q*_IN_LIST() and cover all linked list variants. :)

Yes, I meant having it for all variants.  I remembered you adding it but
not the exact spelling!

Paolo




Thread overview: 20+ messages
2020-02-14 17:17 [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
2020-02-14 17:17 ` [PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll() Stefan Hajnoczi
2020-02-19  7:02   ` Sergio Lopez
2020-02-14 17:17 ` [PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait() Stefan Hajnoczi
2020-02-19 10:12   ` Sergio Lopez
2020-02-14 17:17 ` [PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE() Stefan Hajnoczi
2020-02-19 10:30   ` Sergio Lopez
2020-02-14 17:17 ` [PATCH 4/5] aio-posix: make AioHandler deletion O(1) Stefan Hajnoczi
2020-02-19 10:41   ` Sergio Lopez
2020-02-14 17:17 ` [PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll Stefan Hajnoczi
2020-02-19 11:00   ` Sergio Lopez
2020-02-19 11:13   ` Paolo Bonzini
2020-02-21 12:59     ` Stefan Hajnoczi
2020-02-21 13:06       ` Paolo Bonzini
2020-02-21 14:44         ` Stefan Hajnoczi
2020-02-21 14:47         ` Stefan Hajnoczi
2020-02-21 15:04           ` Paolo Bonzini
2020-02-21 15:29             ` Stefan Hajnoczi
2020-02-21 15:37               ` Paolo Bonzini
2020-02-21 15:29 ` [PATCH 0/5] aio-posix: towards an O(1) event loop Stefan Hajnoczi
