* [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll
@ 2015-06-30 13:19 Fam Zheng
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri Fam Zheng
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Fam Zheng @ 2015-06-30 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block, Stefan Hajnoczi

epoll is more scalable than ppoll: it performs faster when the number of polled
fds is high.
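
As a rough illustration (not code from this series), the difference is that
ppoll() is handed the whole fd array on every call, while epoll registers each
fd with the kernel once and epoll_wait() returns only the ready events:

    #include <sys/epoll.h>

    /* Sketch of the epoll pattern that patch 4 adopts (illustrative only). */
    static int watch_and_wait(int fd, void *handler, int timeout_ms)
    {
        struct epoll_event ev, ready[128];
        int epfd = epoll_create1(EPOLL_CLOEXEC);    /* created once per context */

        ev.events = EPOLLIN;
        ev.data.ptr = handler;                      /* returned back with the event */
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);    /* registered once per fd */

        /* the wait cost scales with the number of ready fds returned, whereas
         * ppoll() copies and scans the whole fd set on every call */
        return epoll_wait(epfd, ready, 128, timeout_ms);
    }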

See patch 4 for an example of the scenario and some benchmark data.

Note: this is only effective for iothreads (dataplane); the main loop cannot
benefit from it yet, because the iohandler and chardev GSources don't easily fit
into this epoll interface style (which is why the main loop uses qemu_poll_ns
directly instead of aio_poll()).

There is hardly any timer activity in iothreads for now, so the timeout is
always 0 or -1. Therefore timerfd, or the proposed nanosecond epoll_pwait1
interface, which would fix the timeout granularity deficiency, is not
immediately necessary at this point, but it will be simple to add later.

Please review!

Fam

Fam Zheng (4):
  aio: Introduce aio_set_fd_handler_pri
  aio: Move aio_set_fd_handler to async.c
  aio: Introduce aio_context_setup
  aio-posix: Use epoll in aio_poll

 aio-posix.c         | 127 +++++++++++++++++++++++++++++++++++++++++++++++-----
 aio-win32.c         |   7 +--
 async.c             |  24 ++++++++--
 include/block/aio.h |   6 +++
 4 files changed, 144 insertions(+), 20 deletions(-)

-- 
2.4.3


* [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri
  2015-06-30 13:19 [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll Fam Zheng
@ 2015-06-30 13:19 ` Fam Zheng
  2015-07-07 14:29   ` Stefan Hajnoczi
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 2/4] aio: Move aio_set_fd_handler to async.c Fam Zheng
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-06-30 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block, Stefan Hajnoczi

The G_IO_PRI event is watched by slirp; add support for it to make future
refactoring possible.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/aio-posix.c b/aio-posix.c
index a633c6e..f22657e 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -63,7 +63,7 @@ void aio_set_fd_handler_pri(AioContext *ctx,
     node = find_aio_handler(ctx, fd);
 
     /* Are we deleting the fd handler? */
-    if (!io_read && !io_write) {
+    if (!io_read && !io_write && !io_read_pri) {
         if (node) {
             g_source_remove_poll(&ctx->source, &node->pfd);
 
-- 
2.4.3
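
For context, a hedged usage sketch of the new entry point (argument order taken
from the aio_set_fd_handler_pri() calls in patch 2; the callback and helper
names here are hypothetical):

    /* Hypothetical slirp-style caller that also wants urgent (G_IO_PRI) data. */
    static void my_read_cb(void *opaque) { /* consume normal input */ }
    static void my_pri_cb(void *opaque)  { /* consume urgent/OOB input */ }

    static void watch_fd(AioContext *ctx, int fd, void *opaque)
    {
        /* io_write is NULL: not interested in writability */
        aio_set_fd_handler_pri(ctx, fd, my_read_cb, NULL, my_pri_cb, opaque);
    }

    static void unwatch_fd(AioContext *ctx, int fd)
    {
        /* passing NULL for all three callbacks deletes the handler, hence the
         * "!io_read && !io_write && !io_read_pri" check in the hunk above */
        aio_set_fd_handler_pri(ctx, fd, NULL, NULL, NULL, NULL);
    }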


* [Qemu-devel] [PATCH RFC 2/4] aio: Move aio_set_fd_handler to async.c
  2015-06-30 13:19 [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll Fam Zheng
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri Fam Zheng
@ 2015-06-30 13:19 ` Fam Zheng
  2015-07-07 14:30   ` Stefan Hajnoczi
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup Fam Zheng
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-06-30 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block, Stefan Hajnoczi

This function is now simply a wrapper for aio_set_fd_handler_pri, so move it
to the host-independent file.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c | 9 ---------
 aio-win32.c | 9 ---------
 async.c     | 9 +++++++++
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index f22657e..f516de1 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -42,15 +42,6 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
     return NULL;
 }
 
-void aio_set_fd_handler(AioContext *ctx,
-                        int fd,
-                        IOHandler *io_read,
-                        IOHandler *io_write,
-                        void *opaque)
-{
-    return aio_set_fd_handler_pri(ctx, fd, io_read, io_write, NULL, opaque);
-}
-
 void aio_set_fd_handler_pri(AioContext *ctx,
                             int fd,
                             IOHandler *io_read,
diff --git a/aio-win32.c b/aio-win32.c
index 016cc19..3c75896 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -31,15 +31,6 @@ struct AioHandler {
     QLIST_ENTRY(AioHandler) node;
 };
 
-void aio_set_fd_handler(AioContext *ctx,
-                        int fd,
-                        IOHandler *io_read,
-                        IOHandler *io_write,
-                        void *opaque)
-{
-    return aio_set_fd_handler_pri(ctx, fd, io_read, io_write, NULL, opaque);
-}
-
 void aio_set_fd_handler_pri(AioContext *ctx,
                             int fd,
                             IOHandler *io_read,
diff --git a/async.c b/async.c
index 77d080d..06971f4 100644
--- a/async.c
+++ b/async.c
@@ -328,3 +328,12 @@ void aio_context_release(AioContext *ctx)
 {
     rfifolock_unlock(&ctx->lock);
 }
+
+void aio_set_fd_handler(AioContext *ctx,
+                        int fd,
+                        IOHandler *io_read,
+                        IOHandler *io_write,
+                        void *opaque)
+{
+    return aio_set_fd_handler_pri(ctx, fd, io_read, io_write, NULL, opaque);
+}
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup
  2015-06-30 13:19 [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll Fam Zheng
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri Fam Zheng
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 2/4] aio: Move aio_set_fd_handler to async.c Fam Zheng
@ 2015-06-30 13:19 ` Fam Zheng
  2015-07-07 14:35   ` Stefan Hajnoczi
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll Fam Zheng
  2015-07-07 14:54 ` [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait " Christian Borntraeger
  4 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-06-30 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block, Stefan Hajnoczi

This is the place to initialize the OS-specific bits of AioContext.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         |  4 ++++
 aio-win32.c         |  4 ++++
 async.c             | 15 ++++++++++++---
 include/block/aio.h |  3 +++
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index f516de1..22406ce 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -42,6 +42,10 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
     return NULL;
 }
 
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
+
 void aio_set_fd_handler_pri(AioContext *ctx,
                             int fd,
                             IOHandler *io_read,
diff --git a/aio-win32.c b/aio-win32.c
index 3c75896..852aa97 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -31,6 +31,10 @@ struct AioHandler {
     QLIST_ENTRY(AioHandler) node;
 };
 
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
+
 void aio_set_fd_handler_pri(AioContext *ctx,
                             int fd,
                             IOHandler *io_read,
diff --git a/async.c b/async.c
index 06971f4..1d70cfd 100644
--- a/async.c
+++ b/async.c
@@ -290,12 +290,17 @@ AioContext *aio_context_new(Error **errp)
 {
     int ret;
     AioContext *ctx;
+    Error *local_err = NULL;
+
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+    aio_context_setup(ctx, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto fail;
+    }
     ret = event_notifier_init(&ctx->notifier, false);
     if (ret < 0) {
-        g_source_destroy(&ctx->source);
-        error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-        return NULL;
+        goto fail;
     }
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
@@ -307,6 +312,10 @@ AioContext *aio_context_new(Error **errp)
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
     return ctx;
+fail:
+    g_source_destroy(&ctx->source);
+    error_setg_errno(errp, -ret, "Failed to initialize event notifier");
+    return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio.h b/include/block/aio.h
index 82502e1..5120583 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -261,6 +261,9 @@ void aio_set_fd_handler_pri(AioContext *ctx,
                             IOHandler *io_read_pri,
                             void *opaque);
 
+/* Initialize OS specific bits in AioContext */
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 /* Register an event notifier and associated callbacks.  Behaves very similarly
  * to event_notifier_set_handler.  Unlike event_notifier_set_handler, these callbacks
  * will be invoked when using aio_poll().
-- 
2.4.3


* [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-06-30 13:19 [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll Fam Zheng
                   ` (2 preceding siblings ...)
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup Fam Zheng
@ 2015-06-30 13:19 ` Fam Zheng
  2015-07-07 15:08   ` Stefan Hajnoczi
  2015-07-07 14:54 ` [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait " Christian Borntraeger
  4 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-06-30 13:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, pbonzini, qemu-block, Stefan Hajnoczi

This patch lets aio_poll use the epoll_wait(2) syscall instead of
qemu_poll_ns where possible. It improves the scalability of iothreads
(for example, virtio-scsi-dataplane).

The epollfd is managed together with the GSource and ctx->aio_handlers, by
creating an epoll_event instance for each watched aio fd and adding it to the
epollfd with epoll_ctl.

The following table is a fio benchmark comparison on a single guest block
device, with different numbers of disks attached to the same SCSI bus
(in MB/s):

=====================================================================
  # of scsi-disks  |        master           |       epoll
                   |   rd     wr    randrw   |   rd    wr    randrw
---------------------------------------------------------------------
        1          |   103    96     49      |   105   99     49
        4          |   92     96     48      |   103   98     49
        8          |   96     94     46      |   101   97     50
        16         |   91     91     45      |   101   95     48
        32         |   84     83     40      |   95    95     48
        64         |   75     73     35      |   91    90     44
        128        |   54     53     26      |   79    80     39
        256        |   41     39     19      |   63    62     30
=====================================================================

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/block/aio.h |   3 ++
 2 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 22406ce..111d7fb 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,9 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#ifdef CONFIG_EPOLL
+#include <sys/epoll.h>
+#endif
 
 struct AioHandler
 {
@@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 
 void aio_context_setup(AioContext *ctx, Error **errp)
 {
+#ifdef CONFIG_EPOLL
+    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (ctx->epollfd < 0) {
+        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));
+    }
+#endif
 }
 
 void aio_set_fd_handler_pri(AioContext *ctx,
@@ -54,6 +63,11 @@ void aio_set_fd_handler_pri(AioContext *ctx,
                             void *opaque)
 {
     AioHandler *node;
+#ifdef CONFIG_EPOLL
+    struct epoll_event event;
+    int r;
+    bool add = false;
+#endif
 
     node = find_aio_handler(ctx, fd);
 
@@ -61,6 +75,10 @@ void aio_set_fd_handler_pri(AioContext *ctx,
     if (!io_read && !io_write && !io_read_pri) {
         if (node) {
             g_source_remove_poll(&ctx->source, &node->pfd);
+#ifdef CONFIG_EPOLL
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, fd, &event);
+            assert(!r);
+#endif
 
             /* If the lock is held, just mark the node as deleted */
             if (ctx->walking_handlers) {
@@ -83,6 +101,9 @@ void aio_set_fd_handler_pri(AioContext *ctx,
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
 
             g_source_add_poll(&ctx->source, &node->pfd);
+#ifdef CONFIG_EPOLL
+            add = true;
+#endif
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -93,6 +114,13 @@ void aio_set_fd_handler_pri(AioContext *ctx,
         node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
         node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
         node->pfd.events |= (io_read_pri ? G_IO_PRI | G_IO_HUP | G_IO_ERR : 0);
+#ifdef CONFIG_EPOLL
+        event.data.ptr = node;
+        event.events = node->pfd.events;
+        r = epoll_ctl(ctx->epollfd, add ? EPOLL_CTL_ADD : EPOLL_CTL_MOD,
+                      fd, &event);
+        assert(!r);
+#endif
     }
 
     aio_notify(ctx);
@@ -198,7 +226,80 @@ bool aio_dispatch(AioContext *ctx)
     return progress;
 }
 
-/* These thread-local variables are used only in a small part of aio_poll
+#ifdef CONFIG_EPOLL
+QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
+QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
+QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
+QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
+QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);
+
+#define EPOLL_BATCH 128
+static bool aio_poll_epoll(AioContext *ctx, bool blocking)
+{
+    AioHandler *node;
+    bool was_dispatching;
+    int i, ret;
+    bool progress;
+    int64_t timeout;
+    struct epoll_event events[EPOLL_BATCH];
+
+    aio_context_acquire(ctx);
+    was_dispatching = ctx->dispatching;
+    progress = false;
+
+    /* aio_notify can avoid the expensive event_notifier_set if
+     * everything (file descriptors, bottom halves, timers) will
+     * be re-evaluated before the next blocking poll().  This is
+     * already true when aio_poll is called with blocking == false;
+     * if blocking == true, it is only true after poll() returns.
+     *
+     * If we're in a nested event loop, ctx->dispatching might be true.
+     * In that case we can restore it just before returning, but we
+     * have to clear it now.
+     */
+    aio_set_dispatching(ctx, !blocking);
+
+    ctx->walking_handlers++;
+
+    timeout = blocking ? aio_compute_timeout(ctx) : 0;
+
+    if (timeout > 0) {
+        timeout = DIV_ROUND_UP(timeout, 1000000);
+    }
+
+    /* wait until next event */
+    if (timeout) {
+        aio_context_release(ctx);
+    }
+    ret = epoll_wait(ctx->epollfd, events, EPOLL_BATCH, timeout);
+    if (timeout) {
+        aio_context_acquire(ctx);
+    }
+
+    /* if we have any readable fds, dispatch event */
+    if (ret > 0) {
+        for (i = 0; i < ret; i++) {
+            node = events[i].data.ptr;
+            node->pfd.revents = events[i].events;
+        }
+    }
+
+    ctx->walking_handlers--;
+
+    /* Run dispatch even if there were no readable fds to run timers */
+    aio_set_dispatching(ctx, true);
+    if (aio_dispatch(ctx)) {
+        progress = true;
+    }
+
+    aio_set_dispatching(ctx, was_dispatching);
+    aio_context_release(ctx);
+
+    return progress;
+}
+#else
+
+/* These thread-local variables are used only in a small part of aio_poll_posix
  * around the call to the poll() system call.  In particular they are not
  * used while aio_poll is performing callbacks, which makes it much easier
  * to think about reentrancy!
@@ -212,7 +313,6 @@ bool aio_dispatch(AioContext *ctx)
 static __thread GPollFD *pollfds;
 static __thread AioHandler **nodes;
 static __thread unsigned npfd, nalloc;
-static __thread Notifier pollfds_cleanup_notifier;
 
 static void pollfds_cleanup(Notifier *n, void *unused)
 {
@@ -221,7 +321,7 @@ static void pollfds_cleanup(Notifier *n, void *unused)
     g_free(nodes);
     nalloc = 0;
 }
-
+static __thread Notifier pollfds_cleanup_notifier;
 static void add_pollfd(AioHandler *node)
 {
     if (npfd == nalloc) {
@@ -244,7 +344,7 @@ static void add_pollfd(AioHandler *node)
     npfd++;
 }
 
-bool aio_poll(AioContext *ctx, bool blocking)
+bool aio_poll_posix(AioContext *ctx, bool blocking)
 {
     AioHandler *node;
     bool was_dispatching;
@@ -311,3 +411,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     return progress;
 }
+#endif
+
+bool aio_poll(AioContext *ctx, bool blocking)
+{
+#ifdef CONFIG_EPOLL
+    return aio_poll_epoll(ctx, blocking);
+#else
+    return aio_poll_posix(ctx, blocking);
+#endif
+}
diff --git a/include/block/aio.h b/include/block/aio.h
index 5120583..9178ff2 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -87,6 +87,9 @@ struct AioContext {
 
     /* TimerLists for calling timers - one per clock type */
     QEMUTimerListGroup tlg;
+
+    /* epoll fd */
+    int epollfd;
 };
 
 /* Used internally to synchronize aio_poll against qemu_bh_schedule.  */
-- 
2.4.3


* Re: [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri Fam Zheng
@ 2015-07-07 14:29   ` Stefan Hajnoczi
  2015-07-08  1:07     ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-07 14:29 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Tue, Jun 30, 2015 at 09:19:42PM +0800, Fam Zheng wrote:
> G_IO_PRI event is watched by slirp, add support of that to make future
> refactoring possible.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  aio-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index a633c6e..f22657e 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -63,7 +63,7 @@ void aio_set_fd_handler_pri(AioContext *ctx,

Which branch or patch series is this based on?  qemu.git/master doesn't
have aio_set_fd_handler_pri().



* Re: [Qemu-devel] [PATCH RFC 2/4] aio: Move aio_set_fd_handler to async.c
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 2/4] aio: Move aio_set_fd_handler to async.c Fam Zheng
@ 2015-07-07 14:30   ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-07 14:30 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Tue, Jun 30, 2015 at 09:19:43PM +0800, Fam Zheng wrote:
> This function is now simply a wrapper for aio_set_fd_handler_pri, so
> move it to host independent file.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  aio-posix.c | 9 ---------
>  aio-win32.c | 9 ---------
>  async.c     | 9 +++++++++
>  3 files changed, 9 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



* Re: [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup Fam Zheng
@ 2015-07-07 14:35   ` Stefan Hajnoczi
  2015-07-08  1:15     ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-07 14:35 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Tue, Jun 30, 2015 at 09:19:44PM +0800, Fam Zheng wrote:
> diff --git a/async.c b/async.c
> index 06971f4..1d70cfd 100644
> --- a/async.c
> +++ b/async.c
> @@ -290,12 +290,17 @@ AioContext *aio_context_new(Error **errp)
>  {
>      int ret;
>      AioContext *ctx;
> +    Error *local_err = NULL;
> +
>      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> +    aio_context_setup(ctx, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);

Is there any reason to introduce local_err?  errp can be passed directly
into aio_context_setup().

> +        goto fail;
> +    }
>      ret = event_notifier_init(&ctx->notifier, false);
>      if (ret < 0) {
> -        g_source_destroy(&ctx->source);
> -        error_setg_errno(errp, -ret, "Failed to initialize event notifier");
> -        return NULL;
> +        goto fail;
>      }
>      g_source_set_can_recurse(&ctx->source, true);
>      aio_set_event_notifier(ctx, &ctx->notifier,
> @@ -307,6 +312,10 @@ AioContext *aio_context_new(Error **errp)
>      timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
>  
>      return ctx;
> +fail:
> +    g_source_destroy(&ctx->source);
> +    error_setg_errno(errp, -ret, "Failed to initialize event notifier");

If aio_context_setup() failed then this hits the *errp == NULL assertion
failure in error_setg_errno().  This call shouldn't be moved away from
the event_notifier_init() call.



* Re: [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll
  2015-06-30 13:19 [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll Fam Zheng
                   ` (3 preceding siblings ...)
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll Fam Zheng
@ 2015-07-07 14:54 ` Christian Borntraeger
  2015-07-08  1:02   ` Fam Zheng
  4 siblings, 1 reply; 20+ messages in thread
From: Christian Borntraeger @ 2015-07-07 14:54 UTC (permalink / raw)
  To: Fam Zheng, qemu-devel; +Cc: Kevin Wolf, pbonzini, Stefan Hajnoczi, qemu-block

On 30.06.2015 at 15:19, Fam Zheng wrote:
> epoll is more scalable than ppoll. It performs faster than ppoll when the
> number of polled fds is high.
> 
> See patch 4 for an example of the senario and some benchmark data.
> 
> Note: it is only effective on iothread (dataplane), while the main loop cannot
> benefit from this yet, because the iohandler and chardev GSource's don't easily
> fit into this epoll interface style (that's why main loop uses qemu_poll_ns
> directly instead of aio_poll()).
> 
> There is hardly any timer activity in iothreads for now, as a result the
> timeout is always 0 or -1. Therefore, timerfd, or the said nanosecond
> epoll_pwait1 interface, which fixes the timeout granularity deficiency is not
> immediately necessary at this point, but still that will be simple to add.
> 
> Please review!

Is there a branch somewhere, so that I could give it a spin?


* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-06-30 13:19 ` [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll Fam Zheng
@ 2015-07-07 15:08   ` Stefan Hajnoczi
  2015-07-07 15:27     ` Paolo Bonzini
  2015-07-08  1:01     ` Fam Zheng
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-07 15:08 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Tue, Jun 30, 2015 at 09:19:45PM +0800, Fam Zheng wrote:
> =====================================================================
>   # of scsi-disks  |        master           |       epoll
>                    |   rd     wr    randrw   |   rd    wr    randrw
> ---------------------------------------------------------------------
>         1          |   103    96     49      |   105   99     49
>         4          |   92     96     48      |   103   98     49
>         8          |   96     94     46      |   101   97     50
>         16         |   91     91     45      |   101   95     48
>         32         |   84     83     40      |   95    95     48
>         64         |   75     73     35      |   91    90     44
>         128        |   54     53     26      |   79    80     39
>         256        |   41     39     19      |   63    62     30
> =====================================================================

Nice results!

> @@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  
>  void aio_context_setup(AioContext *ctx, Error **errp)
>  {
> +#ifdef CONFIG_EPOLL
> +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> +    if (ctx->epollfd < 0) {
> +        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));

Slightly more concise:
error_setg_errno(errp, errno, "Failed to create epoll fd")

> -/* These thread-local variables are used only in a small part of aio_poll
> +#ifdef CONFIG_EPOLL
> +QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
> +QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
> +QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
> +QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
> +QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);

I guess this assumption is okay but maybe the compiler optimizes:

  event.events = (node->pfd.events & G_IO_IN ? EPOLLIN : 0) |
                 (node->pfd.events & G_IO_OUT ? EPOLLOUT : 0) |
                 (node->pfd.events & G_IO_PRI ? EPOLLPRI : 0) |
                 (node->pfd.events & G_IO_ERR ? EPOLLERR : 0) |
                 (node->pfd.events & G_IO_HUP ? EPOLLHUP : 0);

into:

  events.events = node->pfd.events & (EPOLLIN | EPOLLOUT | EPOLLPRI |
                                      EPOLLERR | EPOLLHUP);

which is just an AND instruction so it's effectively free and doesn't
assume that these constants have the same values.

> +
> +#define EPOLL_BATCH 128
> +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> +{
> +    AioHandler *node;
> +    bool was_dispatching;
> +    int i, ret;
> +    bool progress;
> +    int64_t timeout;
> +    struct epoll_event events[EPOLL_BATCH];
> +
> +    aio_context_acquire(ctx);
> +    was_dispatching = ctx->dispatching;
> +    progress = false;
> +
> +    /* aio_notify can avoid the expensive event_notifier_set if
> +     * everything (file descriptors, bottom halves, timers) will
> +     * be re-evaluated before the next blocking poll().  This is
> +     * already true when aio_poll is called with blocking == false;
> +     * if blocking == true, it is only true after poll() returns.
> +     *
> +     * If we're in a nested event loop, ctx->dispatching might be true.
> +     * In that case we can restore it just before returning, but we
> +     * have to clear it now.
> +     */
> +    aio_set_dispatching(ctx, !blocking);
> +
> +    ctx->walking_handlers++;
> +
> +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> +
> +    if (timeout > 0) {
> +        timeout = DIV_ROUND_UP(timeout, 1000000);
> +    }

I think you already posted the timerfd code in an earlier series.  Why
degrade to millisecond precision?  It needs to be fixed up anyway if the
main loop uses aio_poll() in the future.



* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-07-07 15:08   ` Stefan Hajnoczi
@ 2015-07-07 15:27     ` Paolo Bonzini
  2015-07-08  1:01     ` Fam Zheng
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2015-07-07 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fam Zheng; +Cc: Kevin Wolf, qemu-devel, qemu-block



On 07/07/2015 17:08, Stefan Hajnoczi wrote:
>> > +
>> > +#define EPOLL_BATCH 128
>> > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
>> > +{
>> > +    AioHandler *node;
>> > +    bool was_dispatching;
>> > +    int i, ret;
>> > +    bool progress;
>> > +    int64_t timeout;
>> > +    struct epoll_event events[EPOLL_BATCH];
>> > +
>> > +    aio_context_acquire(ctx);
>> > +    was_dispatching = ctx->dispatching;
>> > +    progress = false;
>> > +
>> > +    /* aio_notify can avoid the expensive event_notifier_set if
>> > +     * everything (file descriptors, bottom halves, timers) will
>> > +     * be re-evaluated before the next blocking poll().  This is
>> > +     * already true when aio_poll is called with blocking == false;
>> > +     * if blocking == true, it is only true after poll() returns.
>> > +     *
>> > +     * If we're in a nested event loop, ctx->dispatching might be true.
>> > +     * In that case we can restore it just before returning, but we
>> > +     * have to clear it now.
>> > +     */
>> > +    aio_set_dispatching(ctx, !blocking);
>> > +
>> > +    ctx->walking_handlers++;
>> > +
>> > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
>> > +
>> > +    if (timeout > 0) {
>> > +        timeout = DIV_ROUND_UP(timeout, 1000000);
>> > +    }
> I think you already posted the timerfd code in an earlier series.  Why
> degrade to millisecond precision?  It needs to be fixed up anyway if the
> main loop uses aio_poll() in the future.

BTW, what about putting the code in a separate aio-epoll.c file?

Paolo


* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-07-07 15:08   ` Stefan Hajnoczi
  2015-07-07 15:27     ` Paolo Bonzini
@ 2015-07-08  1:01     ` Fam Zheng
  2015-07-08 10:58       ` Stefan Hajnoczi
  1 sibling, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-07-08  1:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block

On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > @@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
> >  
> >  void aio_context_setup(AioContext *ctx, Error **errp)
> >  {
> > +#ifdef CONFIG_EPOLL
> > +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> > +    if (ctx->epollfd < 0) {
> > +        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));
> 
> Slightly more concise:
> error_setg_errno(errp, errno, "Failed to create epoll fd")

Okay.

> 
> > -/* These thread-local variables are used only in a small part of aio_poll
> > +#ifdef CONFIG_EPOLL
> > +QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
> > +QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
> > +QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
> > +QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
> > +QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);
> 
> I guess this assumption is okay but maybe the compiler optimizes:
> 
>   event.events = (node->pfd.events & G_IO_IN ? EPOLLIN : 0) |
>                  (node->pfd.events & G_IO_OUT ? EPOLLOUT : 0) |
> 		 (node->pfd.events & G_IO_PRI ? EPOLLPRI : 0) |
> 		 (node->pfd.events & G_IO_ERR ? EPOLLERR : 0) |
> 		 (node->pfd.events & G_IO_HUP ? EPOLLHUP : 0);
> 
> into:
> 
>   events.events = node->pfd.events & (EPOLLIN | EPOLLOUT | EPOLLPRI |
>                                       EPOLLERR | EPOLLHUP);
> 
> which is just an AND instruction so it's effectively free and doesn't
> assume that these constants have the same values.

Okay, it'll be a bit more typing (converting to and back) but more
straightforward and self-documenting.
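
(Illustratively, the "back" direction would be the mirror image when filling
node->pfd.revents from the events epoll_wait() returns; a sketch, not posted
code:)

    /* Convert epoll_wait() results back to GIOCondition flags. */
    static int epoll_events_to_gio(uint32_t events)
    {
        return (events & EPOLLIN  ? G_IO_IN  : 0) |
               (events & EPOLLOUT ? G_IO_OUT : 0) |
               (events & EPOLLPRI ? G_IO_PRI : 0) |
               (events & EPOLLERR ? G_IO_ERR : 0) |
               (events & EPOLLHUP ? G_IO_HUP : 0);
    }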

> 
> > +
> > +#define EPOLL_BATCH 128
> > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > +{
> > +    AioHandler *node;
> > +    bool was_dispatching;
> > +    int i, ret;
> > +    bool progress;
> > +    int64_t timeout;
> > +    struct epoll_event events[EPOLL_BATCH];
> > +
> > +    aio_context_acquire(ctx);
> > +    was_dispatching = ctx->dispatching;
> > +    progress = false;
> > +
> > +    /* aio_notify can avoid the expensive event_notifier_set if
> > +     * everything (file descriptors, bottom halves, timers) will
> > +     * be re-evaluated before the next blocking poll().  This is
> > +     * already true when aio_poll is called with blocking == false;
> > +     * if blocking == true, it is only true after poll() returns.
> > +     *
> > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > +     * In that case we can restore it just before returning, but we
> > +     * have to clear it now.
> > +     */
> > +    aio_set_dispatching(ctx, !blocking);
> > +
> > +    ctx->walking_handlers++;
> > +
> > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > +
> > +    if (timeout > 0) {
> > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > +    }
> 
> I think you already posted the timerfd code in an earlier series.  Why
> degrade to millisecond precision?  It needs to be fixed up anyway if the
> main loop uses aio_poll() in the future.

Because of a little complication: timeout here is always -1 for iothread, and
what is interesting is that -1 actually requires an explicit

    timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)

to disable timerfd for this aio_poll(), which costs something. Passing -1 to
epoll_wait() without this doesn't work because the timerfd is already added to
the epollfd and may have an unexpected timeout set before.

Of course we can cache the state and optimize, but I've not reasoned about what
happens if another thread calls aio_poll() while we're in epoll_wait(), for
example when the first aio_poll() has a positive timeout but the second one has
-1.

Fam
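
As an aside, a minimal sketch of the timerfd arming/disarming being discussed
above (editorial illustration only; it assumes a timerfd created with
timerfd_create() and already registered on the epollfd at setup time):

    #include <stdint.h>
    #include <sys/timerfd.h>

    /* Arm the timerfd for timeout_ns nanoseconds, or disarm it when the
     * computed timeout is 0 or -1, before calling epoll_wait(..., -1). */
    static void epoll_update_timeout(int timerfd, int64_t timeout_ns)
    {
        struct itimerspec its = { 0 };   /* one-shot, disarmed by default */

        if (timeout_ns > 0) {
            its.it_value.tv_sec  = timeout_ns / 1000000000LL;
            its.it_value.tv_nsec = timeout_ns % 1000000000LL;
        }
        /* a zero it_value is the explicit disarm mentioned above; it is needed
         * because the timerfd stays registered on the epollfd between calls */
        timerfd_settime(timerfd, 0, &its, NULL);
    }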


* Re: [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll
  2015-07-07 14:54 ` [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait " Christian Borntraeger
@ 2015-07-08  1:02   ` Fam Zheng
  2015-07-08  7:59     ` Christian Borntraeger
  0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-07-08  1:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, pbonzini, Stefan Hajnoczi, qemu-devel, qemu-block

On Tue, 07/07 16:54, Christian Borntraeger wrote:
> On 30.06.2015 at 15:19, Fam Zheng wrote:
> > epoll is more scalable than ppoll. It performs faster than ppoll when the
> > number of polled fds is high.
> > 
> > See patch 4 for an example of the senario and some benchmark data.
> > 
> > Note: it is only effective on iothread (dataplane), while the main loop cannot
> > benefit from this yet, because the iohandler and chardev GSource's don't easily
> > fit into this epoll interface style (that's why main loop uses qemu_poll_ns
> > directly instead of aio_poll()).
> > 
> > There is hardly any timer activity in iothreads for now, as a result the
> > timeout is always 0 or -1. Therefore, timerfd, or the said nanosecond
> > epoll_pwait1 interface, which fixes the timeout granularity deficiency is not
> > immediately necessary at this point, but still that will be simple to add.
> > 
> > Please review!
> 
> Is there a branch somewhere, so that I could give it a spin?
> 

Here:

https://github.com/famz/qemu/tree/aio-posix-epoll


* Re: [Qemu-devel] [PATCH RFC 1/4] aio: Introduce aio_set_fd_handler_pri
  2015-07-07 14:29   ` Stefan Hajnoczi
@ 2015-07-08  1:07     ` Fam Zheng
  0 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2015-07-08  1:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block

On Tue, 07/07 15:29, Stefan Hajnoczi wrote:
> On Tue, Jun 30, 2015 at 09:19:42PM +0800, Fam Zheng wrote:
> > G_IO_PRI event is watched by slirp, add support of that to make future
> > refactoring possible.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  aio-posix.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/aio-posix.c b/aio-posix.c
> > index a633c6e..f22657e 100644
> > --- a/aio-posix.c
> > +++ b/aio-posix.c
> > @@ -63,7 +63,7 @@ void aio_set_fd_handler_pri(AioContext *ctx,
> 
> Which branch or patch series is this based on?  qemu.git/master doesn't
> have aio_set_fd_handler_pri().

This is based on "[Qemu-devel] [PATCH 0/9] slirp: iohandler: Rebase onto aio".

http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg02741.html

The dependency is soft and I can rebase onto qemu.git/master if you want.

Fam


* Re: [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup
  2015-07-07 14:35   ` Stefan Hajnoczi
@ 2015-07-08  1:15     ` Fam Zheng
  2015-07-08 10:51       ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-07-08  1:15 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block

On Tue, 07/07 15:35, Stefan Hajnoczi wrote:
> On Tue, Jun 30, 2015 at 09:19:44PM +0800, Fam Zheng wrote:
> > diff --git a/async.c b/async.c
> > index 06971f4..1d70cfd 100644
> > --- a/async.c
> > +++ b/async.c
> > @@ -290,12 +290,17 @@ AioContext *aio_context_new(Error **errp)
> >  {
> >      int ret;
> >      AioContext *ctx;
> > +    Error *local_err = NULL;
> > +
> >      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> > +    aio_context_setup(ctx, &local_err);
> > +    if (local_err) {
> > +        error_propagate(errp, local_err);
> 
> Is there any reason to introduce local_err?  errp can be passed directly
> into aio_context_setup().

It's used for catching failures of aio_context_setup, because the convention is
that errp can be NULL.

> 
> > +        goto fail;
> > +    }
> >      ret = event_notifier_init(&ctx->notifier, false);
> >      if (ret < 0) {
> > -        g_source_destroy(&ctx->source);
> > -        error_setg_errno(errp, -ret, "Failed to initialize event notifier");
> > -        return NULL;
> > +        goto fail;
> >      }
> >      g_source_set_can_recurse(&ctx->source, true);
> >      aio_set_event_notifier(ctx, &ctx->notifier,
> > @@ -307,6 +312,10 @@ AioContext *aio_context_new(Error **errp)
> >      timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
> >  
> >      return ctx;
> > +fail:
> > +    g_source_destroy(&ctx->source);
> > +    error_setg_errno(errp, -ret, "Failed to initialize event notifier");
> 
> If aio_context_setup() failed then this hits the *errp == NULL assertion
> failure in error_setg_errno().  This call shouldn't be moved away from
> the event_notifier_init() call.

Yes, will fix.

Fam


* Re: [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll
  2015-07-08  1:02   ` Fam Zheng
@ 2015-07-08  7:59     ` Christian Borntraeger
  0 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2015-07-08  7:59 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, Stefan Hajnoczi, qemu-devel, qemu-block

On 08.07.2015 at 03:02, Fam Zheng wrote:
> On Tue, 07/07 16:54, Christian Borntraeger wrote:
>> On 30.06.2015 at 15:19, Fam Zheng wrote:
>>> epoll is more scalable than ppoll. It performs faster than ppoll when the
>>> number of polled fds is high.
>>>
>>> See patch 4 for an example of the senario and some benchmark data.
>>>
>>> Note: it is only effective on iothread (dataplane), while the main loop cannot
>>> benefit from this yet, because the iohandler and chardev GSource's don't easily
>>> fit into this epoll interface style (that's why main loop uses qemu_poll_ns
>>> directly instead of aio_poll()).
>>>
>>> There is hardly any timer activity in iothreads for now, as a result the
>>> timeout is always 0 or -1. Therefore, timerfd, or the said nanosecond
>>> epoll_pwait1 interface, which fixes the timeout granularity deficiency is not
>>> immediately necessary at this point, but still that will be simple to add.
>>>
>>> Please review!
>>
>> Is there a branch somewhere, so that I could give it a spin?
>>
> 
> Here:
> 
> https://github.com/famz/qemu/tree/aio-posix-epoll
> 
In file included from /home/cborntra/REPOS/qemu/include/qemu/option.h:31:0,
                 from /home/cborntra/REPOS/qemu/include/qemu-common.h:44,
                 from /home/cborntra/REPOS/qemu/async.c:25:
/home/cborntra/REPOS/qemu/async.c: In function 'aio_context_new':
/home/cborntra/REPOS/qemu/include/qapi/error.h:57:20: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     error_set_errno(errp, os_error, ERROR_CLASS_GENERIC_ERROR, \
                    ^
/home/cborntra/REPOS/qemu/async.c:291:9: note: 'ret' was declared here
     int ret;
         ^
cc1: all warnings being treated as errors

With that fixed, it seems to work. Still looking at the performance.
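
For reference, one way to restructure the error path so that 'ret' is only read
on the path that sets it, while also keeping the errno-based message next to
event_notifier_init() as requested in the patch 3 review (a sketch, not a
posted fix):

    AioContext *aio_context_new(Error **errp)
    {
        int ret;
        AioContext *ctx;
        Error *local_err = NULL;

        ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
        aio_context_setup(ctx, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            goto fail;
        }
        ret = event_notifier_init(&ctx->notifier, false);
        if (ret < 0) {
            error_setg_errno(errp, -ret, "Failed to initialize event notifier");
            goto fail;
        }
        /* ... remaining initialization as in the patch ... */
        return ctx;
    fail:
        g_source_destroy(&ctx->source);
        return NULL;
    }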


* Re: [Qemu-devel] [PATCH RFC 3/4] aio: Introduce aio_context_setup
  2015-07-08  1:15     ` Fam Zheng
@ 2015-07-08 10:51       ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-08 10:51 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Wed, Jul 08, 2015 at 09:15:57AM +0800, Fam Zheng wrote:
> On Tue, 07/07 15:35, Stefan Hajnoczi wrote:
> > On Tue, Jun 30, 2015 at 09:19:44PM +0800, Fam Zheng wrote:
> > > diff --git a/async.c b/async.c
> > > index 06971f4..1d70cfd 100644
> > > --- a/async.c
> > > +++ b/async.c
> > > @@ -290,12 +290,17 @@ AioContext *aio_context_new(Error **errp)
> > >  {
> > >      int ret;
> > >      AioContext *ctx;
> > > +    Error *local_err = NULL;
> > > +
> > >      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> > > +    aio_context_setup(ctx, &local_err);
> > > +    if (local_err) {
> > > +        error_propagate(errp, local_err);
> > 
> > Is there any reason to introduce local_err?  errp can be passed directly
> > into aio_context_setup().
> 
> It's used for catching failure of aio_context_setup, because the convention is
> errp can be NULL.

You are right, I missed that aio_context_setup() has a void return type.



* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-07-08  1:01     ` Fam Zheng
@ 2015-07-08 10:58       ` Stefan Hajnoczi
  2015-07-10  0:46         ` Fam Zheng
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-08 10:58 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Wed, Jul 08, 2015 at 09:01:27AM +0800, Fam Zheng wrote:
> On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > > +#define EPOLL_BATCH 128
> > > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > > +{
> > > +    AioHandler *node;
> > > +    bool was_dispatching;
> > > +    int i, ret;
> > > +    bool progress;
> > > +    int64_t timeout;
> > > +    struct epoll_event events[EPOLL_BATCH];
> > > +
> > > +    aio_context_acquire(ctx);
> > > +    was_dispatching = ctx->dispatching;
> > > +    progress = false;
> > > +
> > > +    /* aio_notify can avoid the expensive event_notifier_set if
> > > +     * everything (file descriptors, bottom halves, timers) will
> > > +     * be re-evaluated before the next blocking poll().  This is
> > > +     * already true when aio_poll is called with blocking == false;
> > > +     * if blocking == true, it is only true after poll() returns.
> > > +     *
> > > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > > +     * In that case we can restore it just before returning, but we
> > > +     * have to clear it now.
> > > +     */
> > > +    aio_set_dispatching(ctx, !blocking);
> > > +
> > > +    ctx->walking_handlers++;
> > > +
> > > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > > +
> > > +    if (timeout > 0) {
> > > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > > +    }
> > 
> > I think you already posted the timerfd code in an earlier series.  Why
> > degrade to millisecond precision?  It needs to be fixed up anyway if the
> > main loop uses aio_poll() in the future.
> 
> Because of a little complication: timeout here is always -1 for iothread, and
> what is interesting is that -1 actually requires an explicit
> 
>     timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)
> 
> to disable timerfd for this aio_poll(), which costs somethings. Passing -1 to
> epoll_wait() without this doesn't work because the timerfd is already added to
> the epollfd and may have an unexpected timeout set before.
> 
> Of course we can cache the state and optimize, but I've not reasoned about what
> if another thread happens to call aio_poll() when we're in epoll_wait(), for
> example when the first aio_poll() has a positive timeout but the second one has
> -1.

I'm not sure I understand the threads scenario since aio_poll_epoll()
has a big aio_context_acquire()/release() region that protects it, but I
guess the nested aio_poll() case is similar.  Care needs to be taken so
the extra timerfd state is consistent.

The optimization can be added later unless the timerfd_settime() syscall
is so expensive that it defeats the advantage of epoll().



* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-07-08 10:58       ` Stefan Hajnoczi
@ 2015-07-10  0:46         ` Fam Zheng
  2015-07-13 10:02           ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2015-07-10  0:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block

On Wed, 07/08 11:58, Stefan Hajnoczi wrote:
> On Wed, Jul 08, 2015 at 09:01:27AM +0800, Fam Zheng wrote:
> > On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > > > +#define EPOLL_BATCH 128
> > > > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > > > +{
> > > > +    AioHandler *node;
> > > > +    bool was_dispatching;
> > > > +    int i, ret;
> > > > +    bool progress;
> > > > +    int64_t timeout;
> > > > +    struct epoll_event events[EPOLL_BATCH];
> > > > +
> > > > +    aio_context_acquire(ctx);
> > > > +    was_dispatching = ctx->dispatching;
> > > > +    progress = false;
> > > > +
> > > > +    /* aio_notify can avoid the expensive event_notifier_set if
> > > > +     * everything (file descriptors, bottom halves, timers) will
> > > > +     * be re-evaluated before the next blocking poll().  This is
> > > > +     * already true when aio_poll is called with blocking == false;
> > > > +     * if blocking == true, it is only true after poll() returns.
> > > > +     *
> > > > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > > > +     * In that case we can restore it just before returning, but we
> > > > +     * have to clear it now.
> > > > +     */
> > > > +    aio_set_dispatching(ctx, !blocking);
> > > > +
> > > > +    ctx->walking_handlers++;
> > > > +
> > > > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > > > +
> > > > +    if (timeout > 0) {
> > > > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > > > +    }
> > > 
> > > I think you already posted the timerfd code in an earlier series.  Why
> > > degrade to millisecond precision?  It needs to be fixed up anyway if the
> > > main loop uses aio_poll() in the future.
> > 
> > Because of a little complication: timeout here is always -1 for iothread, and
> > what is interesting is that -1 actually requires an explicit
> > 
> >     timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)
> > 
> > to disable timerfd for this aio_poll(), which costs somethings. Passing -1 to
> > epoll_wait() without this doesn't work because the timerfd is already added to
> > the epollfd and may have an unexpected timeout set before.
> > 
> > Of course we can cache the state and optimize, but I've not reasoned about what
> > if another thread happens to call aio_poll() when we're in epoll_wait(), for
> > example when the first aio_poll() has a positive timeout but the second one has
> > -1.
> 
> I'm not sure I understand the threads scenario since aio_poll_epoll()
> has a big aio_context_acquire()/release() region that protects it, but I
> guess the nested aio_poll() case is similar.  Care needs to be taken so
> the extra timerfd state is consistent.

Nested aio_poll() has no race on the timerfd because the outer aio_poll()'s
epoll_wait() has already returned by the time the inner aio_poll() runs.

Threads are different with Paolo's "release AioContext around blocking
aio_poll()".

> 
> The optimization can be added later unless the timerfd_settime() syscall
> is so expensive that it defeats the advantage of epoll().

That's the plan, and it must be done before this gets used by the main loop.

Fam


* Re: [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
  2015-07-10  0:46         ` Fam Zheng
@ 2015-07-13 10:02           ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2015-07-13 10:02 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, pbonzini, qemu-devel, qemu-block


On Fri, Jul 10, 2015 at 08:46:44AM +0800, Fam Zheng wrote:
> On Wed, 07/08 11:58, Stefan Hajnoczi wrote:
> > On Wed, Jul 08, 2015 at 09:01:27AM +0800, Fam Zheng wrote:
> > > On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > > > > +#define EPOLL_BATCH 128
> > > > > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > > > > +{
> > > > > +    AioHandler *node;
> > > > > +    bool was_dispatching;
> > > > > +    int i, ret;
> > > > > +    bool progress;
> > > > > +    int64_t timeout;
> > > > > +    struct epoll_event events[EPOLL_BATCH];
> > > > > +
> > > > > +    aio_context_acquire(ctx);
> > > > > +    was_dispatching = ctx->dispatching;
> > > > > +    progress = false;
> > > > > +
> > > > > +    /* aio_notify can avoid the expensive event_notifier_set if
> > > > > +     * everything (file descriptors, bottom halves, timers) will
> > > > > +     * be re-evaluated before the next blocking poll().  This is
> > > > > +     * already true when aio_poll is called with blocking == false;
> > > > > +     * if blocking == true, it is only true after poll() returns.
> > > > > +     *
> > > > > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > > > > +     * In that case we can restore it just before returning, but we
> > > > > +     * have to clear it now.
> > > > > +     */
> > > > > +    aio_set_dispatching(ctx, !blocking);
> > > > > +
> > > > > +    ctx->walking_handlers++;
> > > > > +
> > > > > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > > > > +
> > > > > +    if (timeout > 0) {
> > > > > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > > > > +    }
> > > > 
> > > > I think you already posted the timerfd code in an earlier series.  Why
> > > > degrade to millisecond precision?  It needs to be fixed up anyway if the
> > > > main loop uses aio_poll() in the future.
> > > 
> > > Because of a little complication: timeout here is always -1 for iothread, and
> > > what is interesting is that -1 actually requires an explicit
> > > 
> > >     timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)
> > > 
> > > to disable timerfd for this aio_poll(), which costs somethings. Passing -1 to
> > > epoll_wait() without this doesn't work because the timerfd is already added to
> > > the epollfd and may have an unexpected timeout set before.
> > > 
> > > Of course we can cache the state and optimize, but I've not reasoned about what
> > > if another thread happens to call aio_poll() when we're in epoll_wait(), for
> > > example when the first aio_poll() has a positive timeout but the second one has
> > > -1.
> > 
> > I'm not sure I understand the threads scenario since aio_poll_epoll()
> > has a big aio_context_acquire()/release() region that protects it, but I
> > guess the nested aio_poll() case is similar.  Care needs to be taken so
> > the extra timerfd state is consistent.
> 
> Nested aio_poll() has no racing on timerfd because the outer aio_poll()'s
> epoll_wait() would have already returned at the point of the inner aio_poll().
> 
> Threads are different with Paolo's "release AioContext around blocking
> aio_poll()".

Ah, I see!

> > 
> > The optimization can be added later unless the timerfd_settime() syscall
> > is so expensive that it defeats the advantage of epoll().
> 
> That's the plan, and must be done before it get used by main loop.

I'd rather we merge correct code than fast code which violates the API.

Let's do nanosecond precision now, as advertised by the function names,
and optimize timerfd later.


