* [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
@ 2014-12-04  3:43 Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

v2: Emulate nanosecond precision of timeouts with ppoll and timerfd.
    Their performance is on par with each other, and both are much better
    than qemu.git:

    syscall         high # of fd      low # of fd
    -------------------------------------------------
    qemu.git(ppoll) 44                96
    ppoll+epoll     85                101
    timerfd+epoll   87                109

(In the high-#-of-fd case, 3 activated but idle virtio-console devices are
attached, which adds hundreds of fds to poll.)

v1 cover letter
---------------

ppoll(2) doesn't scale as well as epoll: the elapsed time of the syscall is
linear in the number of fds we poll, which hurts performance when the number
of devices is large, or when a virtio device registers many virtqueues
(virtio-serial, for instance).

This series introduces qemu_poll, which is implemented with either g_poll or
epoll, selected at configure time via CONFIG_EPOLL.

Fam


Fam Zheng (6):
  poll: Introduce QEMU Poll API
  posix-aio: Use QEMU poll interface
  poll: Add epoll implementation for qemu_poll
  main-loop: Replace qemu_poll_ns with qemu_poll
  tests: Add test case for qemu_poll
  poll-linux: Add timerfd support

 Makefile.objs           |   2 +
 aio-posix.c             |  52 ++++----
 async.c                 |   5 +-
 include/block/aio.h     |   7 +-
 include/qemu/poll.h     |  40 ++++++
 include/qemu/timer.h    |  13 --
 include/qemu/typedefs.h |   2 +
 main-loop.c             |  35 +++++-
 poll-glib.c             | 130 ++++++++++++++++++++
 poll-linux.c            | 314 ++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-timer.c            |  28 -----
 tests/Makefile          |   2 +
 tests/test-poll.c       | 272 +++++++++++++++++++++++++++++++++++++++++
 13 files changed, 821 insertions(+), 81 deletions(-)
 create mode 100644 include/qemu/poll.h
 create mode 100644 poll-glib.c
 create mode 100644 poll-linux.c
 create mode 100644 tests/test-poll.c

-- 
1.9.3


* [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

This is an abstraction of the underlying poll implementation. A glib
implementation is included.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs           |   2 +-
 include/qemu/poll.h     |  40 +++++++++++++++
 include/qemu/typedefs.h |   2 +
 poll-glib.c             | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/poll.h
 create mode 100644 poll-glib.c

diff --git a/Makefile.objs b/Makefile.objs
index 18fd35c..57184eb 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
diff --git a/include/qemu/poll.h b/include/qemu/poll.h
new file mode 100644
index 0000000..597975f
--- /dev/null
+++ b/include/qemu/poll.h
@@ -0,0 +1,40 @@
+#ifndef QEMU_POLL_H
+#define QEMU_POLL_H
+
+#include "qemu/typedefs.h"
+#include "qemu-common.h"
+
+struct QEMUPollEvent {
+    int fd;
+    int events;
+    int revents;
+    void *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void);
+void qemu_poll_free(QEMUPoll *qpoll);
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns);
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque);
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd);
+
+/**
+ * A shortcut to:
+ * 1) remove all the existing fds;
+ * 2) add all in @fds, while setting each fd's opaque pointer to the pollfd
+ * struct itself.
+ */
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds);
+
+/**
+ * Query the events from the last qemu_poll. ONLY revents and opaque are
+ * valid in @events after return.
+ */
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events);
+
+#endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3475177..f9a5e60 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -7,6 +7,8 @@ typedef struct QEMUTimer QEMUTimer;
 typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUBH QEMUBH;
+typedef struct QEMUPoll QEMUPoll;
+typedef struct QEMUPollEvent QEMUPollEvent;
 
 typedef struct AioContext AioContext;
 
diff --git a/poll-glib.c b/poll-glib.c
new file mode 100644
index 0000000..64fde69
--- /dev/null
+++ b/poll-glib.c
@@ -0,0 +1,130 @@
+/*
+ * g_poll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *   Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+    /* Array of GPollFD for g_poll() */
+    GArray *gpollfds;
+
+    /* Array of opaque pointers, should be in sync with gpollfds */
+    GArray *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+    QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+    qpoll->gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+    qpoll->opaque = g_array_new(FALSE, FALSE, sizeof(void *));
+    return qpoll;
+}
+
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+    g_array_unref(qpoll->gpollfds);
+    g_array_unref(qpoll->opaque);
+    g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int i;
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        p->revents = 0;
+    }
+    return g_poll((GPollFD *)qpoll->gpollfds->data,
+                  qpoll->gpollfds->len,
+                  qemu_timeout_ns_to_ms(timeout_ns));
+}
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+    int i;
+    GPollFD pfd = (GPollFD) {
+        .fd = fd,
+        .revents = 0,
+        .events = gio_events,
+    };
+
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (p->fd == fd) {
+            return -EEXIST;
+        }
+    }
+    g_array_append_val(qpoll->gpollfds, pfd);
+    g_array_append_val(qpoll->opaque, opaque);
+    assert(qpoll->gpollfds->len == qpoll->opaque->len);
+    return 0;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+    int i;
+    int found = -1;
+
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (p->fd == fd) {
+            found = i;
+            break;
+        }
+    }
+    if (found >= 0) {
+        g_array_remove_index(qpoll->gpollfds, found);
+        g_array_remove_index(qpoll->opaque, found);
+        assert(qpoll->gpollfds->len == qpoll->opaque->len);
+        return 0;
+    } else {
+        return -ENOENT;
+    }
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+    int i;
+    g_array_set_size(qpoll->gpollfds, 0);
+    g_array_set_size(qpoll->opaque, 0);
+    for (i = 0; i < nfds; i++) {
+        void *opaque = &fds[i];
+        g_array_append_val(qpoll->gpollfds, fds[i]);
+        g_array_append_val(qpoll->opaque, opaque);
+        assert(qpoll->gpollfds->len == qpoll->opaque->len);
+    }
+    return nfds;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events)
+{
+    int i;
+    int r = 0;
+    for (i = 0; i < qpoll->gpollfds->len && r < max_events; i++) {
+        GPollFD *fd = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (fd->revents & fd->events) {
+            events[r].fd = fd->fd;
+            events[r].revents = fd->revents;
+            events[r].events = fd->events;
+            events[r].opaque = g_array_index(qpoll->opaque, void *, i);
+            r++;
+        }
+    }
+    return r;
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

The AIO handler list is only modified by aio_set_fd_handler, so we can
easily add/delete the poll fd there. Initialize a QEMUPoll and keep track of
all the fds, so we don't need to rebuild a GPollFD array for g_poll in
aio_poll.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         | 52 +++++++++++++++++++++-------------------------------
 async.c             |  5 +++--
 include/block/aio.h |  7 +++++--
 3 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index d3ac06e..32323b3 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,7 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#include "qemu/poll.h"
 
 struct AioHandler
 {
@@ -55,6 +56,7 @@ void aio_set_fd_handler(AioContext *ctx,
     /* Are we deleting the fd handler? */
     if (!io_read && !io_write) {
         if (node) {
+            qemu_poll_del(ctx->qpoll, fd);
             g_source_remove_poll(&ctx->source, &node->pfd);
 
             /* If the lock is held, just mark the node as deleted */
@@ -71,13 +73,15 @@ void aio_set_fd_handler(AioContext *ctx,
             }
         }
     } else {
-        if (node == NULL) {
+        if (node) {
+            /* Remove the old */
+            qemu_poll_del(ctx->qpoll, fd);
+            g_source_remove_poll(&ctx->source, &node->pfd);
+        } else {
             /* Alloc and insert if it's not already there */
             node = g_malloc0(sizeof(AioHandler));
             node->pfd.fd = fd;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
-
-            g_source_add_poll(&ctx->source, &node->pfd);
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -87,6 +91,8 @@ void aio_set_fd_handler(AioContext *ctx,
 
         node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
         node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
+        qemu_poll_add(ctx->qpoll, fd, node->pfd.events, node);
+        g_source_add_poll(&ctx->source, &node->pfd);
     }
 
     aio_notify(ctx);
@@ -191,6 +197,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     AioHandler *node;
     bool was_dispatching;
     int ret;
+    int i;
     bool progress;
 
     was_dispatching = ctx->dispatching;
@@ -208,38 +215,21 @@ bool aio_poll(AioContext *ctx, bool blocking)
      */
     aio_set_dispatching(ctx, !blocking);
 
-    ctx->walking_handlers++;
-
-    g_array_set_size(ctx->pollfds, 0);
-
-    /* fill pollfds */
-    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        node->pollfds_idx = -1;
-        if (!node->deleted && node->pfd.events) {
-            GPollFD pfd = {
-                .fd = node->pfd.fd,
-                .events = node->pfd.events,
-            };
-            node->pollfds_idx = ctx->pollfds->len;
-            g_array_append_val(ctx->pollfds, pfd);
-        }
-    }
-
-    ctx->walking_handlers--;
-
     /* wait until next event */
-    ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
-                         ctx->pollfds->len,
-                         blocking ? aio_compute_timeout(ctx) : 0);
+    ret = qemu_poll(ctx->qpoll, blocking ? aio_compute_timeout(ctx) : 0);
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
-        QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-            if (node->pollfds_idx != -1) {
-                GPollFD *pfd = &g_array_index(ctx->pollfds, GPollFD,
-                                              node->pollfds_idx);
-                node->pfd.revents = pfd->revents;
-            }
+        int r;
+        g_array_set_size(ctx->events, ret);
+        r = qemu_poll_get_events(ctx->qpoll,
+                                (QEMUPollEvent *)ctx->events->data,
+                                ret);
+        assert(r == ret);
+        for (i = 0; i < r; i++) {
+            QEMUPollEvent *e = &g_array_index(ctx->events, QEMUPollEvent, i);
+            node = e->opaque;
+            node->pfd.revents = e->revents;
         }
     }
 
diff --git a/async.c b/async.c
index 6e1b282..443a674 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,7 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
+#include "qemu/poll.h"
 
 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -232,7 +233,6 @@ aio_ctx_finalize(GSource     *source)
     event_notifier_cleanup(&ctx->notifier);
     rfifolock_destroy(&ctx->lock);
     qemu_mutex_destroy(&ctx->bh_lock);
-    g_array_free(ctx->pollfds, TRUE);
     timerlistgroup_deinit(&ctx->tlg);
 }
 
@@ -300,10 +300,11 @@ AioContext *aio_context_new(Error **errp)
         error_setg_errno(errp, -ret, "Failed to initialize event notifier");
         return NULL;
     }
+    ctx->qpoll = qemu_poll_new();
+    ctx->events = g_array_new(FALSE, FALSE, sizeof(QEMUPollEvent));
     aio_set_event_notifier(ctx, &ctx->notifier,
                            (EventNotifierHandler *)
                            event_notifier_test_and_clear);
-    ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/include/block/aio.h b/include/block/aio.h
index 6bf0e04..c36b8c1 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,8 +82,11 @@ struct AioContext {
     /* Used for aio_notify.  */
     EventNotifier notifier;
 
-    /* GPollFDs for aio_poll() */
-    GArray *pollfds;
+    /* qemu_poll context */
+    QEMUPoll *qpoll;
+
+    /* QEMUPollEvents for qemu_poll_get_events() */
+    GArray *events;
 
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
-- 
1.9.3


* [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

This implements qemu_poll with ppoll + epoll. The only complex part is
qemu_poll_set_fds, which syncs up the epollfd with epoll_ctl by computing
the symmetric difference of the previous and new fds.

ppoll is used to retain nanosecond precision of the timeout.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs |   4 +-
 poll-linux.c  | 232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 poll-linux.c

diff --git a/Makefile.objs b/Makefile.objs
index 57184eb..ea86cb8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,9 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_EPOLL) += poll-linux.o
+block-obj-$(if $(CONFIG_EPOLL),n,$(CONFIG_POSIX)) += poll-glib.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
diff --git a/poll-linux.c b/poll-linux.c
new file mode 100644
index 0000000..7605004
--- /dev/null
+++ b/poll-linux.c
@@ -0,0 +1,232 @@
+/*
+ * epoll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *   Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <sys/epoll.h>
+#include <glib.h>
+#include <poll.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+    int epollfd;
+    struct epoll_event *events;
+    int max_events;
+    int nevents;
+    GHashTable *fds;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+    int epollfd;
+    QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+    epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (epollfd < 0) {
+        perror("epoll_create1:");
+        abort();
+    }
+    qpoll->epollfd = epollfd;
+    qpoll->max_events = 1;
+    qpoll->events = g_new(struct epoll_event, 1);
+    qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
+                                       NULL, g_free);
+    return qpoll;
+}
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+    g_free(qpoll->events);
+    g_hash_table_destroy(qpoll->fds);
+    close(qpoll->epollfd);
+    g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int r;
+    struct pollfd fd = {
+        .fd = qpoll->epollfd,
+        .events = POLLIN | POLLOUT | POLLERR | POLLHUP,
+    };
+
+    if (timeout_ns >= 0) {
+        struct timespec ts;
+        ts.tv_sec = timeout_ns / 1000000000LL;
+        ts.tv_nsec = timeout_ns % 1000000000LL;
+        r = ppoll(&fd, 1, &ts, NULL);
+    } else {
+        r = ppoll(&fd, 1, NULL, NULL);
+    }
+
+    if (r > 0) {
+        r = epoll_wait(qpoll->epollfd,
+                       qpoll->events,
+                       qpoll->max_events,
+                       0);
+        qpoll->nevents = r;
+    }
+    return r;
+}
+
+static inline uint32_t epoll_event_from_gio_events(int gio_events)
+{
+
+    return (gio_events & G_IO_IN  ? EPOLLIN : 0) |
+           (gio_events & G_IO_OUT ? EPOLLOUT : 0) |
+           (gio_events & G_IO_ERR ? EPOLLERR : 0) |
+           (gio_events & G_IO_HUP ? EPOLLHUP : 0);
+}
+
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+    int ret;
+    struct epoll_event ev;
+    QEMUPollEvent *e;
+
+    ev.events = epoll_event_from_gio_events(gio_events);
+    ev.data.fd = fd;
+    ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, fd, &ev);
+    if (ret) {
+        ret = -errno;
+        return ret;
+    }
+    qpoll->max_events++;
+    qpoll->events = g_renew(struct epoll_event,
+                            qpoll->events,
+                            qpoll->max_events);
+    /* The fd shouldn't already be in the hash table: if it were, the
+     * epoll_ctl above would have failed with EEXIST. Assert to be sure. */
+    assert(g_hash_table_lookup(qpoll->fds, &fd) == NULL);
+    e = g_new0(QEMUPollEvent, 1);
+    e->fd = fd;
+    e->events = gio_events;
+    e->opaque = opaque;
+    g_hash_table_insert(qpoll->fds, &e->fd, e);
+    return ret;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+    int ret;
+
+    if (!g_hash_table_lookup(qpoll->fds, &fd)) {
+        ret = -ENOENT;
+        goto out;
+    }
+    ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_DEL, fd, NULL);
+    if (ret) {
+        ret = -errno;
+        goto out;
+    }
+    qpoll->max_events--;
+    qpoll->events = g_renew(struct epoll_event,
+                            qpoll->events,
+                            qpoll->max_events);
+out:
+    g_hash_table_remove(qpoll->fds, &fd);
+    return ret;
+}
+
+static void qemu_poll_copy_fd(gpointer key, gpointer value, gpointer user_data)
+{
+    GHashTable *dst = user_data;
+    QEMUPollEvent *event, *copy;
+
+    event = value;
+    copy = g_new(QEMUPollEvent, 1);
+    *copy = *event;
+    g_hash_table_insert(dst, &copy->fd, copy);
+}
+
+static void qemu_poll_del_fd(gpointer key, gpointer value, gpointer user_data)
+{
+    QEMUPoll *qpoll = user_data;
+    QEMUPollEvent *event = value;
+
+    qemu_poll_del(qpoll, event->fd);
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+    int i;
+    int updated = 0;
+    int ret = nfds;
+    int old_size = g_hash_table_size(qpoll->fds);
+
+    for (i = 0; i < nfds; i++) {
+        int r;
+        GPollFD *fd = &fds[i];
+        QEMUPollEvent *e = g_hash_table_lookup(qpoll->fds, &fd->fd);
+        if (e) {
+            updated++;
+            assert(e->fd == fd->fd);
+            if (e->events == fd->events) {
+                e->opaque = fd;
+                continue;
+            }
+            r = qemu_poll_del(qpoll, fd->fd);
+            if (r < 0) {
+                ret = r;
+                break;
+            }
+        }
+        r = qemu_poll_add(qpoll, fd->fd, fd->events, fd);
+        if (r < 0) {
+            ret = r;
+            break;
+        }
+    }
+
+    if (updated < old_size) {
+        GHashTable *fds_copy;
+
+        fds_copy = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, g_free);
+        g_hash_table_foreach(qpoll->fds, qemu_poll_copy_fd, fds_copy);
+
+        /* Some fds need to be removed, as they are not seen in new fds */
+        for (i = 0; i < nfds; i++) {
+            GPollFD *fd = &fds[i];
+            g_hash_table_remove(fds_copy, &fd->fd);
+        }
+
+        g_hash_table_foreach(fds_copy, qemu_poll_del_fd, qpoll);
+        g_hash_table_destroy(fds_copy);
+    }
+    return ret;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events)
+{
+    int i;
+    QEMUPollEvent *p;
+    struct epoll_event *ev;
+    int fd;
+
+    for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
+        ev = &qpoll->events[i];
+        fd = ev->data.fd;
+        p = g_hash_table_lookup(qpoll->fds, &fd);
+        assert(p);
+
+        events[i].revents = ev->events;
+        events[i].opaque = p->opaque;
+        events[i].fd = fd;
+        events[i].events = p->events;
+    }
+    return i;
+}
-- 
1.9.3


* [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (2 preceding siblings ...)
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

qemu_poll_set_fds + qemu_poll do the same thing, and when epoll is
available, they are faster.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 include/qemu/timer.h | 13 -------------
 main-loop.c          | 35 ++++++++++++++++++++++++++++++-----
 qemu-timer.c         | 28 ----------------------------
 3 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5f5210d..cd3371d 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -646,19 +646,6 @@ void timer_put(QEMUFile *f, QEMUTimer *ts);
 int qemu_timeout_ns_to_ms(int64_t ns);
 
 /**
- * qemu_poll_ns:
- * @fds: Array of file descriptors
- * @nfds: number of file descriptors
- * @timeout: timeout in nanoseconds
- *
- * Perform a poll like g_poll but with a timeout in nanoseconds.
- * See g_poll documentation for further details.
- *
- * Returns: number of fds ready
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout);
-
-/**
  * qemu_soonest_timeout:
  * @timeout1: first timeout in nanoseconds (or -1 for infinite)
  * @timeout2: second timeout in nanoseconds (or -1 for infinite)
diff --git a/main-loop.c b/main-loop.c
index 981bcb5..b70f929 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -29,6 +29,7 @@
 #include "slirp/libslirp.h"
 #include "qemu/main-loop.h"
 #include "block/aio.h"
+#include "qemu/poll.h"
 
 #ifndef _WIN32
 
@@ -130,6 +131,7 @@ void qemu_notify_event(void)
 }
 
 static GArray *gpollfds;
+static QEMUPoll *qpoll;
 
 int qemu_init_main_loop(Error **errp)
 {
@@ -150,6 +152,7 @@ int qemu_init_main_loop(Error **errp)
         return -EMFILE;
     }
     gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+    qpoll = qemu_poll_new();
     src = aio_get_g_source(qemu_aio_context);
     g_source_attach(src, NULL);
     g_source_unref(src);
@@ -207,6 +210,8 @@ static int os_host_main_loop_wait(int64_t timeout)
 {
     int ret;
     static int spin_counter;
+    QEMUPollEvent events[1024];
+    int r, i;
 
     glib_pollfds_fill(&timeout);
 
@@ -236,7 +241,17 @@ static int os_host_main_loop_wait(int64_t timeout)
         spin_counter++;
     }
 
-    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
+    qemu_poll_set_fds(qpoll, (GPollFD *)gpollfds->data, gpollfds->len);
+    ret = qemu_poll(qpoll, timeout);
+    if (ret > 0) {
+        ret = MIN(ret, sizeof(events) / sizeof(QEMUPollEvent));
+        r = qemu_poll_get_events(qpoll, events, ret);
+        for (i = 0; i < r; i++) {
+            GPollFD *fd = events[i].opaque;
+            fd->revents = events[i].revents;
+        }
+        ret = r;
+    }
 
     if (timeout) {
         qemu_mutex_lock_iothread();
@@ -389,9 +404,11 @@ static int os_host_main_loop_wait(int64_t timeout)
 {
     GMainContext *context = g_main_context_default();
     GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
+    QEMUPollEvent poll_events[1024 * 2];
     int select_ret = 0;
-    int g_poll_ret, ret, i, n_poll_fds;
+    int nevents, r, ret, i, n_poll_fds;
     PollingEntry *pe;
+    QEMUPollEvent *events;
     WaitObjects *w = &wait_objects;
     gint poll_timeout;
     int64_t poll_timeout_ns;
@@ -441,10 +458,18 @@ static int os_host_main_loop_wait(int64_t timeout)
     poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
 
     qemu_mutex_unlock_iothread();
-    g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns);
+
+    qemu_poll_set_fds(qpoll, (GPollFD *)poll_fds, n_poll_fds + w->num);
+    nevents = qemu_poll(qpoll, poll_timeout_ns);
 
     qemu_mutex_lock_iothread();
-    if (g_poll_ret > 0) {
+    if (nevents > 0) {
+        r = qemu_poll_get_events(qpoll, poll_events, nevents);
+        for (i = 0; i < r; i++) {
+            GPollFD *fd = poll_events[i].opaque;
+            fd->revents = poll_events[i].revents;
+        }
+
         for (i = 0; i < w->num; i++) {
             w->revents[i] = poll_fds[n_poll_fds + i].revents;
         }
@@ -459,7 +484,7 @@ static int os_host_main_loop_wait(int64_t timeout)
         g_main_context_dispatch(context);
     }
 
-    return select_ret || g_poll_ret;
+    return select_ret || nevents;
 }
 #endif
 
diff --git a/qemu-timer.c b/qemu-timer.c
index c77de64..4500235 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -303,34 +303,6 @@ int qemu_timeout_ns_to_ms(int64_t ns)
     return (int) ms;
 }
 
-
-/* qemu implementation of g_poll which uses a nanosecond timeout but is
- * otherwise identical to g_poll
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout)
-{
-#ifdef CONFIG_PPOLL
-    if (timeout < 0) {
-        return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
-    } else {
-        struct timespec ts;
-        int64_t tvsec = timeout / 1000000000LL;
-        /* Avoid possibly overflowing and specifying a negative number of
-         * seconds, which would turn a very long timeout into a busy-wait.
-         */
-        if (tvsec > (int64_t)INT32_MAX) {
-            tvsec = INT32_MAX;
-        }
-        ts.tv_sec = tvsec;
-        ts.tv_nsec = timeout % 1000000000LL;
-        return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
-    }
-#else
-    return g_poll(fds, nfds, qemu_timeout_ns_to_ms(timeout));
-#endif
-}
-
-
 void timer_init(QEMUTimer *ts,
                 QEMUTimerList *timer_list, int scale,
                 QEMUTimerCB *cb, void *opaque)
-- 
1.9.3


* [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (3 preceding siblings ...)
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
  2014-12-16  2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/Makefile    |   2 +
 tests/test-poll.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 tests/test-poll.c

diff --git a/tests/Makefile b/tests/Makefile
index 16f0e4c..79f399d 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -41,6 +41,7 @@ check-unit-$(CONFIG_POSIX) += tests/test-rfifolock$(EXESUF)
 check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
+check-unit-y += tests/test-poll$(EXESUF)
 check-unit-y += tests/test-thread-pool$(EXESUF)
 gcov-files-test-thread-pool-y = thread-pool.c
 gcov-files-test-hbitmap-y = util/hbitmap.c
@@ -241,6 +242,7 @@ tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
 tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(qom-core-obj) libqemuutil.a libqemustub.a
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-poll$(EXESUF): tests/test-poll.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-rfifolock$(EXESUF): tests/test-rfifolock.o libqemuutil.a libqemustub.a
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
diff --git a/tests/test-poll.c b/tests/test-poll.c
new file mode 100644
index 0000000..75074d8
--- /dev/null
+++ b/tests/test-poll.c
@@ -0,0 +1,272 @@
+/*
+ * Test cases for the QEMU poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *  Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include "qemu/poll.h"
+#include "qemu/event_notifier.h"
+
+static EventNotifier *poll_add_one(QEMUPoll *qpoll)
+{
+    EventNotifier *e = g_new(EventNotifier, 1);
+
+    event_notifier_init(e, false);
+    qemu_poll_add(qpoll, event_notifier_get_fd(e),
+                  G_IO_IN,
+                  NULL);
+    return e;
+}
+
+static int poll_del_one(QEMUPoll *qpoll, EventNotifier *e)
+{
+    int r = qemu_poll_del(qpoll, event_notifier_get_fd(e));
+    event_notifier_cleanup(e);
+    g_free(e);
+    return r;
+}
+
+static void test_poll_single(void)
+{
+    QEMUPoll *qpoll;
+    EventNotifier *e;
+    int r, i;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    e = poll_add_one(qpoll);
+
+    /* Write some data and poll */
+    event_notifier_set(e);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+
+    /* Clear data and poll */
+    r = event_notifier_test_and_clear(e);
+    g_assert(r);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Write a lot of data and poll */
+    for (i = 0; i < 10000; i++) {
+        event_notifier_set(e);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+
+    /* Clear data and poll again */
+    r = event_notifier_test_and_clear(e);
+    g_assert(r);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Clean up */
+    poll_del_one(qpoll, e);
+    qemu_poll_free(qpoll);
+}
+
+static void test_poll_multiple(void)
+{
+    QEMUPoll *qpoll;
+    const int N = 32;
+    EventNotifier *e[N];
+    QEMUPollEvent events[N];
+    int r, s, i;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    for (i = 0; i < N; i++) {
+        e[i] = poll_add_one(qpoll);
+    }
+
+    /* Write some data for each and poll */
+    for (i = 0; i < N; i++) {
+        event_notifier_set(e[i]);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Clear data and poll */
+    for (i = 0; i < N; i++) {
+        r = event_notifier_test_and_clear(e[i]);
+        g_assert(r);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Write some data for first half and poll */
+    for (i = 0; i < N / 2; i++) {
+        event_notifier_set(e[i]);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N / 2);
+    s = qemu_poll_get_events(qpoll, events, N);
+    g_assert_cmpint(s, ==, r);
+    for (i = 0; i < s; i++) {
+        g_assert(events[i].revents & G_IO_IN);
+    }
+
+    /* Clean up */
+    for (i = 0; i < N; i++) {
+        poll_del_one(qpoll, e[i]);
+    }
+    qemu_poll_free(qpoll);
+}
+
+static void test_poll_add_del(void)
+{
+    QEMUPoll *qpoll;
+    EventNotifier *e1, *e2;
+    QEMUPollEvent events[2];
+    int r;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    e1 = poll_add_one(qpoll);
+    e2 = poll_add_one(qpoll);
+
+    /* Write some data for each and poll */
+    event_notifier_set(e1);
+    event_notifier_set(e2);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 2);
+
+    /* Clear e1 and poll */
+    r = event_notifier_test_and_clear(e1);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+    r = qemu_poll_get_events(qpoll, events, 2);
+    g_assert_cmpint(r, ==, 1);
+    g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e2));
+
+    /* Write to both but remove one and poll the other */
+    event_notifier_set(e1);
+    event_notifier_set(e2);
+    qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+    r = qemu_poll_get_events(qpoll, events, 2);
+    g_assert_cmpint(r, ==, 1);
+    g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e1));
+
+    r = qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+    g_assert_cmpint(r, ==, -ENOENT);
+
+    /* Add it back and poll both */
+    qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+                  G_IO_IN, NULL);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 2);
+
+    event_notifier_test_and_clear(e1);
+    event_notifier_test_and_clear(e2);
+
+    r = qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+                      G_IO_IN, NULL);
+    g_assert_cmpint(r, ==, -EEXIST);
+
+    /* Clean up */
+    poll_del_one(qpoll, e1);
+    poll_del_one(qpoll, e2);
+    qemu_poll_free(qpoll);
+}
+
+static void gpollfd_fill(GPollFD *fds, int start, EventNotifier **e, int n)
+{
+    int i;
+    for (i = 0; i < n; i++) {
+        fds[i + start] = (GPollFD) {
+            .fd = event_notifier_get_fd(e[i]),
+            .events = G_IO_IN,
+        };
+    }
+}
+
+static void test_poll_set_fds(void)
+{
+    QEMUPoll *qpoll;
+    const int M = 20;
+    const int N = 10;
+    EventNotifier *em[M], *en[N];
+    GPollFD fds[M + N];
+    int i, r;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    for (i = 0; i < N; i++) {
+        en[i] = poll_add_one(qpoll);
+    }
+
+    for (i = 0; i < M; i++) {
+        em[i] = poll_add_one(qpoll);
+    }
+
+    gpollfd_fill(fds, 0, em, M);
+    gpollfd_fill(fds, M, en, N);
+
+    /* Set N */
+    for (i = 0; i < N; i++) {
+        event_notifier_set(en[i]);
+    }
+
+    /* Poll M + N should get N */
+    r = qemu_poll_set_fds(qpoll, fds, M + N);
+    g_assert_cmpint(r, ==, M + N);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Poll M should get 0 */
+    r = qemu_poll_set_fds(qpoll, fds, M);
+    g_assert_cmpint(r, ==, M);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Poll N should get N */
+    r = qemu_poll_set_fds(qpoll, fds + M, N);
+    g_assert_cmpint(r, ==, N);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Poll M + N / 2 should get N / 2 */
+    r = qemu_poll_set_fds(qpoll, fds, M + N / 2);
+    g_assert_cmpint(r, ==, M + N / 2);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N / 2);
+
+    /* Clean up */
+    for (i = 0; i < M; i++) {
+        poll_del_one(qpoll, em[i]);
+    }
+
+    for (i = 0; i < N; i++) {
+        poll_del_one(qpoll, en[i]);
+    }
+    qemu_poll_free(qpoll);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/poll/single", test_poll_single);
+    g_test_add_func("/poll/multiple", test_poll_multiple);
+    g_test_add_func("/poll/add-del", test_poll_add_del);
+    g_test_add_func("/poll/set-fds", test_poll_set_fds);
+
+    ret = g_test_run();
+
+    return ret;
+}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (4 preceding siblings ...)
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
@ 2014-12-04  3:43 ` Fam Zheng
  2014-12-16  2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04  3:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

In qemu_poll_timerfd, we arm the timerfd with timeout_ns. The timerfd is
also watched by the epoll fd, so that when there are no other events,
epoll_wait still returns on time, even though we pass it -1 (wait
indefinitely).

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 poll-linux.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/poll-linux.c b/poll-linux.c
index 7605004..154da8b 100644
--- a/poll-linux.c
+++ b/poll-linux.c
@@ -12,20 +12,52 @@
  */
 
 #include <sys/epoll.h>
+
+#define USE_TIMERFD CONFIG_TIMERFD
+
+#if USE_TIMERFD
+#include <sys/timerfd.h>
+#endif
+
 #include <glib.h>
 #include <poll.h>
 #include "qemu-common.h"
 #include "qemu/timer.h"
 #include "qemu/poll.h"
 
+
 struct QEMUPoll {
     int epollfd;
+#if USE_TIMERFD
+    int timerfd;
+#endif
     struct epoll_event *events;
     int max_events;
     int nevents;
     GHashTable *fds;
 };
 
+static void qemu_poll_init_timerfd(QEMUPoll *qpoll)
+{
+#if USE_TIMERFD
+    int r;
+    struct epoll_event ev;
+    qpoll->timerfd = timerfd_create(CLOCK_MONOTONIC,
+                                    TFD_NONBLOCK | TFD_CLOEXEC);
+    if (qpoll->timerfd < 0) {
+        perror("timerfd_create");
+        abort();
+    }
+    ev.events = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP;
+    ev.data.fd = qpoll->timerfd;
+    r = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, qpoll->timerfd, &ev);
+    if (r) {
+        perror("epoll_ctl add timerfd");
+        abort();
+    }
+#endif
+}
+
 QEMUPoll *qemu_poll_new(void)
 {
     int epollfd;
@@ -40,6 +72,7 @@ QEMUPoll *qemu_poll_new(void)
     qpoll->events = g_new(struct epoll_event, 1);
     qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
                                        NULL, g_free);
+    qemu_poll_init_timerfd(qpoll);
     return qpoll;
 }
 
@@ -48,10 +81,13 @@ void qemu_poll_free(QEMUPoll *qpoll)
     g_free(qpoll->events);
     g_hash_table_destroy(qpoll->fds);
     close(qpoll->epollfd);
+#if USE_TIMERFD
+    close(qpoll->timerfd);
+#endif
     g_free(qpoll);
 }
 
-int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+static int qemu_poll_ppoll(QEMUPoll *qpoll, int64_t timeout_ns)
 {
     int r;
     struct pollfd fd = {
@@ -78,6 +114,47 @@ int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
     return r;
 }
 
+#if USE_TIMERFD
+static int qemu_poll_timerfd(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int r;
+    struct itimerspec its = { { 0 } };
+
+    if (timeout_ns > 0) {
+        its.it_value.tv_sec = timeout_ns / 1000000000LL;
+        its.it_value.tv_nsec = timeout_ns % 1000000000LL;
+    }
+
+    r = timerfd_settime(qpoll->timerfd, 0, &its, NULL);
+    if (r) {
+        perror("timerfd_settime");
+        abort();
+    }
+    if (r < 0) {
+        return r;
+    }
+    r = epoll_wait(qpoll->epollfd,
+                   qpoll->events,
+                   qpoll->max_events,
+                   timeout_ns > 0 ? -1 : timeout_ns);
+    qpoll->nevents = r;
+    return r;
+}
+#endif
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+#if USE_TIMERFD
+    return qemu_poll_timerfd(qpoll, timeout_ns);
+#endif
+    return qemu_poll_ppoll(qpoll, timeout_ns);
+}
+
 static inline uint32_t epoll_event_from_gio_events(int gio_events)
 {
 
@@ -220,6 +297,11 @@ int qemu_poll_get_events(QEMUPoll *qpoll,
     for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
         ev = &qpoll->events[i];
         fd = ev->data.fd;
+#if USE_TIMERFD
+        if (fd == qpoll->timerfd) {
+            continue;
+        }
+#endif
         p = g_hash_table_lookup(qpoll->fds, &fd);
         assert(p);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
  2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (5 preceding siblings ...)
  2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
@ 2014-12-16  2:04 ` Fam Zheng
  2015-01-07 15:08   ` Stefan Hajnoczi
  6 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2014-12-16  2:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, xigao

On Thu, 12/04 11:43, Fam Zheng wrote:
> v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
>     Their performance is on par with each other, but both much better than
>     qemu.git:
> 
>     syscall         high # of fd      low # of fd
>     -------------------------------------------------
>     qemu.git(ppoll) 44                96
>     ppoll+epoll     85                101
>     timerfd+epoll   87                109

More data points.

Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
that:

0) when # of fds is high, epoll solutions are much better (+30%).

1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
minimal.

2) original code is 2%~5% faster than the new implementations when # of fds is
low.

This leads to the conclusion that we'll see a small performance degradation
if we merge this series. I'm thinking about possible optimizations.
Options in my mind are:

1) Remove the 1 ns PR_SET_TIMERSLACK in timerfd+epoll. This doesn't make
qemu_poll faster than the old qemu_poll_ns, but it may have other positive
effects that compensate for the cost.

2) Switch dynamically between ppoll and timerfd+epoll. In poll-linux.c, we
start with pure ppoll while keeping track of the time spent in each ppoll
call. Periodically, we try "timerfd+epoll" for a few iterations, so that we
can compare whether it is faster than pure ppoll. If it is, we swap them:
use timerfd+epoll, and periodically retry "ppoll".

That said, I'll also look at the kernel side. Maybe optimizing ppoll, or
adding an EPOLL_NANOSECOND_TIMEOUT flag to epoll_create1, is the better
place to invest the engineering effort.

Fam



* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
  2014-12-16  2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
@ 2015-01-07 15:08   ` Stefan Hajnoczi
  2015-01-08  2:53     ` Fam Zheng
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2015-01-07 15:08 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, xigao


On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> On Thu, 12/04 11:43, Fam Zheng wrote:
> > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> >     Their performance is on par with each other, but both much better than
> >     qemu.git:
> > 
> >     syscall         high # of fd      low # of fd
> >     -------------------------------------------------
> >     qemu.git(ppoll) 44                96
> >     ppoll+epoll     85                101
> >     timerfd+epoll   87                109
> 
> More data points.
> 
> Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
> that:
> 
> 0) when # of fds is high, epoll solutions are much better (+30%).
> 
> 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
> minimal.
> 
> 2) original code is 2%~5% faster than the new implementations when # of fds is
> low.

What is "high" and "low"?

I'd like to understand whether they are extremes that almost no users
will encounter or whether they are plausible in the real world.

> This leads to the conclusion that that we'll have a small performance
> degradation if merge this series. I'm thinking about possible optimizations.
> Options in my mind are:
> 
> 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
> faster than the old qemu_poll_ns, but may have other positive effects that
> compensate the cost.

Sounds like a random hack.  What is the reasoning for messing with timer
slack?

Perhaps it is worth investigating timer slack as an independent issue
though.

> 2) Use dynamic switch between ppoll and timerfd+epoll. In poll-linux.c, We
> start with pure ppoll, while keeping track of elapsed time in ppoll. And
> periodically, we try "timerfd+epoll" for a few iterations, so that we can
> compare if it is faster than pure ppoll. If it is, swap them, use timerfd+epoll
> and and periodically try "ppoll".
> 
> That said, I'll also look at the kernel side. Maybe optimizing ppoll or just
> add EPOLL_NANOSECOND_TIMEOUT to epoll_create1 is a better place for
> engineering.

I agree that a kernel fix would be good.  Even if the patch is rejected,
we might get good ideas on how applications can optimize.

Stefan



* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
  2015-01-07 15:08   ` Stefan Hajnoczi
@ 2015-01-08  2:53     ` Fam Zheng
  2015-01-08 13:00       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2015-01-08  2:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, xigao

On Wed, 01/07 15:08, Stefan Hajnoczi wrote:
> On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> > On Thu, 12/04 11:43, Fam Zheng wrote:
> > > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> > >     Their performance is on par with each other, but both much better than
> > >     qemu.git:
> > > 
> > >     syscall         high # of fd      low # of fd
> > >     -------------------------------------------------
> > >     qemu.git(ppoll) 44                96
> > >     ppoll+epoll     85                101
> > >     timerfd+epoll   87                109
> > 
> > More data points.
> > 
> > Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
> > that:
> > 
> > 0) when # of fds is high, epoll solutions are much better (+30%).
> > 
> > 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
> > minimal.
> > 
> > 2) original code is 2%~5% faster than the new implementations when # of fds is
> > low.
> 
> What is "high" and "low"?
> 
> I'd like to understand whether they are extremes that almost no users
> will encounter or whether they are plausible in the real world.

In the original report, "low" means barely more than a handful of fds, say
15; and "high" means what we get after plugging in one virtio-serial device,
say 70. I wouldn't consider that an extreme case, because we assign one
ioeventfd to each vq, and the number of vqs can be a multiple of the host
CPU core count. On a relatively big system it can easily reach a few hundred.

> 
> > This leads to the conclusion that that we'll have a small performance
> > degradation if merge this series. I'm thinking about possible optimizations.
> > Options in my mind are:
> > 
> > 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
> > faster than the old qemu_poll_ns, but may have other positive effects that
> > compensate the cost.
> 
> Sounds like a random hack.  What is the reasoning for messing with timer
> slack?

I tested this, and it doesn't work. The reason is that timer slack also
affects the timeout of poll system calls, so the two are correlated. Anyway,
I've dropped this idea.

Fam

> 
> Perhaps it is worth investigating timer slack as an independent issue
> though.
> 
> > 2) Use dynamic switch between ppoll and timerfd+epoll. In poll-linux.c, We
> > start with pure ppoll, while keeping track of elapsed time in ppoll. And
> > periodically, we try "timerfd+epoll" for a few iterations, so that we can
> > compare if it is faster than pure ppoll. If it is, swap them, use timerfd+epoll
> > and and periodically try "ppoll".
> > 
> > That said, I'll also look at the kernel side. Maybe optimizing ppoll or just
> > add EPOLL_NANOSECOND_TIMEOUT to epoll_create1 is a better place for
> > engineering.
> 
> I agree that a kernel fix would be good.  Even if the patch is rejected,
> we might get good ideas on how applications can optimize.
> 
> Stefan


* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
  2015-01-08  2:53     ` Fam Zheng
@ 2015-01-08 13:00       ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-01-08 13:00 UTC (permalink / raw)
  To: Fam Zheng, Stefan Hajnoczi; +Cc: Kevin Wolf, xigao, qemu-devel, Stefan Hajnoczi



On 08/01/2015 03:53, Fam Zheng wrote:
>>> > > 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
>>> > > faster than the old qemu_poll_ns, but may have other positive effects that
>>> > > compensate the cost.
>> > 
>> > Sounds like a random hack.  What is the reasoning for messing with timer
>> > slack?
> In a test this doesn't work.  The reason is that timer slack affects poll
> sys calls' timeout, therefore they are correlated. Anyway, I've left this.

1 ns is definitely too much; it's just accuracy theater (and I am
responsible for it).  We don't care about 3 clock cycles when a context
switch costs at least 500.

We probably could change it to 100 ns (0.1 us) with no practical
effects; the default is 50 us.

Paolo


end of thread, other threads:[~2015-01-08 13:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-04  3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
2014-12-04  3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
2014-12-16  2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2015-01-07 15:08   ` Stefan Hajnoczi
2015-01-08  2:53     ` Fam Zheng
2015-01-08 13:00       ` Paolo Bonzini
