All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction
@ 2015-04-16  4:57 Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 1/7] poll: Introduce QEMU Poll API Fam Zheng
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

v3: Rebase to master for 2.4.
    Although epoll_pwait1 syscall is still being worked on [1], the QEMU part
    (if any) will base on this, so let's merge it first.
    
    That part is not included in this version because I'm still evaluating by
    comparing epoll_pwait1 with epoll+timerfd as with current master they seem
    to be really close.

    [1]: http://www.spinics.net/lists/linux-api/msg08216.html

v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
    Their performance is on par with each other, but both much better than
    qemu.git:

    syscall         high # of fd      low # of fd
    -------------------------------------------------
    qemu.git(ppoll) 44                96
    ppoll+epoll     85                101
    timerfd+epoll   87                109

(In high # of fd case, 3 activated but idle virtio-console devices are
attached, which will add us hundereds of fds to poll)

v1 cover letter
---------------

ppoll(2) doesn't scale as well as epoll: The elapsed time of the syscall is
linear to the number of fd's we poll, which hurts performance a bit when the
number of devices are many, or when a virtio device registers many virtqueues
(virtio-serial, for instance).

This series introduces qemu_poll, which is implemented  with g_poll and epoll,
decided at configure time with CONFIG_EPOLL.

Fam


Fam Zheng (7):
  poll: Introduce QEMU Poll API
  posix-aio: Use QEMU poll interface
  poll: Add epoll implementation for qemu_poll
  main-loop: Replace qemu_poll_ns with qemu_poll
  tests: Add test case for qemu_poll
  poll-glib: Support ppoll
  poll-linux: Add timerfd support

 Makefile.objs           |   2 +
 aio-posix.c             |  52 ++++----
 async.c                 |   5 +-
 include/block/aio.h     |   7 +-
 include/qemu/poll.h     |  40 ++++++
 include/qemu/timer.h    |  13 --
 include/qemu/typedefs.h |   4 +-
 main-loop.c             |  35 +++++-
 poll-glib.c             | 153 +++++++++++++++++++++++
 poll-linux.c            | 314 ++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-timer.c            |  28 -----
 tests/Makefile          |   2 +
 tests/test-poll.c       | 272 +++++++++++++++++++++++++++++++++++++++++
 13 files changed, 845 insertions(+), 82 deletions(-)
 create mode 100644 include/qemu/poll.h
 create mode 100644 poll-glib.c
 create mode 100644 poll-linux.c
 create mode 100644 tests/test-poll.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 1/7] poll: Introduce QEMU Poll API
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 2/7] posix-aio: Use QEMU poll interface Fam Zheng
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

This is abstract of underlying poll implementation. A glib
implementation is included.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs           |   2 +-
 include/qemu/poll.h     |  40 +++++++++++++++
 include/qemu/typedefs.h |   4 +-
 poll-glib.c             | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 174 insertions(+), 2 deletions(-)
 create mode 100644 include/qemu/poll.h
 create mode 100644 poll-glib.c

diff --git a/Makefile.objs b/Makefile.objs
index 28999d3..77a56d0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
diff --git a/include/qemu/poll.h b/include/qemu/poll.h
new file mode 100644
index 0000000..597975f
--- /dev/null
+++ b/include/qemu/poll.h
@@ -0,0 +1,40 @@
+#ifndef QEMU_POLL_H
+#define QEMU_POLL_H
+
+#include "qemu/typedefs.h"
+#include "qemu-common.h"
+
+struct QEMUPollEvent {
+    int fd;
+    int events;
+    int revents;
+    void *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void);
+void qemu_poll_free(QEMUPoll *qpoll);
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns);
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque);
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd);
+
+/**
+ * A shortcut to:
+ * 1) remove all the existing fds;
+ * 2) add all in @fds, while setting each fd's opaque pointer to the pollfd
+ * struct itself.
+ */
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds);
+
+/**
+ * Query the events from last qemu_poll. ONLY revent and opaque are valid in
+ * @events after return.
+ */
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events);
+
+#endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index cde3314..8e638cb 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -64,10 +64,12 @@ typedef struct QEMUBH QEMUBH;
 typedef struct QemuConsole QemuConsole;
 typedef struct QEMUFile QEMUFile;
 typedef struct QEMUMachine QEMUMachine;
+typedef struct QEMUPoll QEMUPoll;
+typedef struct QEMUPollEvent QEMUPollEvent;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct QEMUSizedBuffer QEMUSizedBuffer;
-typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUTimer QEMUTimer;
+typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct Range Range;
 typedef struct SerialState SerialState;
 typedef struct SHPCDevice SHPCDevice;
diff --git a/poll-glib.c b/poll-glib.c
new file mode 100644
index 0000000..64fde69
--- /dev/null
+++ b/poll-glib.c
@@ -0,0 +1,130 @@
+/*
+ * g_poll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *   Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+    /* Array of GPollFD for g_poll() */
+    GArray *gpollfds;
+
+    /* Array of opaque pointers, should be in sync with gpollfds */
+    GArray *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+    QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+    qpoll->gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+    qpoll->opaque = g_array_new(FALSE, FALSE, sizeof(void *));
+    return qpoll;
+}
+
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+    g_array_unref(qpoll->gpollfds);
+    g_array_unref(qpoll->opaque);
+    g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int i;
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        p->revents = 0;
+    }
+    return g_poll((GPollFD *)qpoll->gpollfds->data,
+                  qpoll->gpollfds->len,
+                  qemu_timeout_ns_to_ms(timeout_ns));
+}
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+    int i;
+    GPollFD pfd = (GPollFD) {
+        .fd = fd,
+        .revents = 0,
+        .events = gio_events,
+    };
+
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (p->fd == fd) {
+            return -EEXIST;
+        }
+    }
+    g_array_append_val(qpoll->gpollfds, pfd);
+    g_array_append_val(qpoll->opaque, opaque);
+    assert(qpoll->gpollfds->len == qpoll->opaque->len);
+    return 0;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+    int i;
+    int found = -1;
+
+    for (i = 0; i < qpoll->gpollfds->len; i++) {
+        GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (p->fd == fd) {
+            found = i;
+            break;
+        }
+    }
+    if (found >= 0) {
+        g_array_remove_index(qpoll->gpollfds, found);
+        g_array_remove_index(qpoll->opaque, found);
+        assert(qpoll->gpollfds->len == qpoll->opaque->len);
+        return 0;
+    } else {
+        return -ENOENT;
+    }
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+    int i;
+    g_array_set_size(qpoll->gpollfds, 0);
+    g_array_set_size(qpoll->opaque, 0);
+    for (i = 0; i < nfds; i++) {
+        void *opaque = &fds[i];
+        g_array_append_val(qpoll->gpollfds, fds[i]);
+        g_array_append_val(qpoll->opaque, opaque);
+        assert(qpoll->gpollfds->len == qpoll->opaque->len);
+    }
+    return nfds;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events)
+{
+    int i;
+    int r = 0;
+    for (i = 0; i < qpoll->gpollfds->len && r < max_events; i++) {
+        GPollFD *fd = &g_array_index(qpoll->gpollfds, GPollFD, i);
+        if (fd->revents & fd->events) {
+            events[r].fd = fd->fd;
+            events[r].revents = fd->revents;
+            events[r].events = fd->events;
+            events[r].opaque = g_array_index(qpoll->opaque, void *, i);
+            r++;
+        }
+    }
+    return r;
+}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 2/7] posix-aio: Use QEMU poll interface
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 1/7] poll: Introduce QEMU Poll API Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 3/7] poll: Add epoll implementation for qemu_poll Fam Zheng
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

The AIO handler list is only modified by aio_set_fd_handler, so we can
easily add del poll fd there. Initialize a QEMUPoll and keep track of
all the fds, so we don't need to rebuild a GPollFD array for g_poll in
aio_poll.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 aio-posix.c         | 52 +++++++++++++++++++++-------------------------------
 async.c             |  5 +++--
 include/block/aio.h |  7 +++++--
 3 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index cbd4c34..5c81fe6 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,7 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#include "qemu/poll.h"
 
 struct AioHandler
 {
@@ -55,6 +56,7 @@ void aio_set_fd_handler(AioContext *ctx,
     /* Are we deleting the fd handler? */
     if (!io_read && !io_write) {
         if (node) {
+            qemu_poll_del(ctx->qpoll, fd);
             g_source_remove_poll(&ctx->source, &node->pfd);
 
             /* If the lock is held, just mark the node as deleted */
@@ -71,13 +73,15 @@ void aio_set_fd_handler(AioContext *ctx,
             }
         }
     } else {
-        if (node == NULL) {
+        if (node) {
+            /* Remove the old */
+            qemu_poll_del(ctx->qpoll, fd);
+            g_source_remove_poll(&ctx->source, &node->pfd);
+        } else {
             /* Alloc and insert if it's not already there */
             node = g_new0(AioHandler, 1);
             node->pfd.fd = fd;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
-
-            g_source_add_poll(&ctx->source, &node->pfd);
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -87,6 +91,8 @@ void aio_set_fd_handler(AioContext *ctx,
 
         node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
         node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
+        qemu_poll_add(ctx->qpoll, fd, node->pfd.events, node);
+        g_source_add_poll(&ctx->source, &node->pfd);
     }
 
     aio_notify(ctx);
@@ -191,6 +197,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     AioHandler *node;
     bool was_dispatching;
     int ret;
+    int i;
     bool progress;
 
     was_dispatching = ctx->dispatching;
@@ -208,38 +215,21 @@ bool aio_poll(AioContext *ctx, bool blocking)
      */
     aio_set_dispatching(ctx, !blocking);
 
-    ctx->walking_handlers++;
-
-    g_array_set_size(ctx->pollfds, 0);
-
-    /* fill pollfds */
-    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        node->pollfds_idx = -1;
-        if (!node->deleted && node->pfd.events) {
-            GPollFD pfd = {
-                .fd = node->pfd.fd,
-                .events = node->pfd.events,
-            };
-            node->pollfds_idx = ctx->pollfds->len;
-            g_array_append_val(ctx->pollfds, pfd);
-        }
-    }
-
-    ctx->walking_handlers--;
-
     /* wait until next event */
-    ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
-                         ctx->pollfds->len,
-                         blocking ? aio_compute_timeout(ctx) : 0);
+    ret = qemu_poll(ctx->qpoll, blocking ? aio_compute_timeout(ctx) : 0);
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
-        QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-            if (node->pollfds_idx != -1) {
-                GPollFD *pfd = &g_array_index(ctx->pollfds, GPollFD,
-                                              node->pollfds_idx);
-                node->pfd.revents = pfd->revents;
-            }
+        int r;
+        g_array_set_size(ctx->events, ret);
+        r = qemu_poll_get_events(ctx->qpoll,
+                                (QEMUPollEvent *)ctx->events->data,
+                                ret);
+        assert(r == ret);
+        for (i = 0; i < r; i++) {
+            QEMUPollEvent *e = &g_array_index(ctx->events, QEMUPollEvent, i);
+            node = e->opaque;
+            node->pfd.revents = e->revents;
         }
     }
 
diff --git a/async.c b/async.c
index 2b51e87..dd5d62f 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,7 @@
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
+#include "qemu/poll.h"
 
 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -230,7 +231,6 @@ aio_ctx_finalize(GSource     *source)
     event_notifier_cleanup(&ctx->notifier);
     rfifolock_destroy(&ctx->lock);
     qemu_mutex_destroy(&ctx->bh_lock);
-    g_array_free(ctx->pollfds, TRUE);
     timerlistgroup_deinit(&ctx->tlg);
 }
 
@@ -299,10 +299,11 @@ AioContext *aio_context_new(Error **errp)
         return NULL;
     }
     g_source_set_can_recurse(&ctx->source, true);
+    ctx->qpoll = qemu_poll_new();
+    ctx->events = g_array_new(FALSE, FALSE, sizeof(QEMUPollEvent));
     aio_set_event_notifier(ctx, &ctx->notifier,
                            (EventNotifierHandler *)
                            event_notifier_test_and_clear);
-    ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/include/block/aio.h b/include/block/aio.h
index 7d1e26b..af7edba 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,8 +82,11 @@ struct AioContext {
     /* Used for aio_notify.  */
     EventNotifier notifier;
 
-    /* GPollFDs for aio_poll() */
-    GArray *pollfds;
+    /* qemu_poll context */
+    QEMUPoll *qpoll;
+
+    /* QEMUPollEvents for qemu_poll_get_events() */
+    GArray *events;
 
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 3/7] poll: Add epoll implementation for qemu_poll
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 1/7] poll: Introduce QEMU Poll API Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 2/7] posix-aio: Use QEMU poll interface Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 4/7] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

This implements qemu_poll with ppoll + epoll. The only complex part is
qemu_poll_set_fds, which will sync up epollfd with epoll_ctl by
computing the symmetric difference of previous and new fds.

The ppoll is used to retain ns precision of timeout.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 Makefile.objs |   4 +-
 poll-linux.c  | 232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 poll-linux.c

diff --git a/Makefile.objs b/Makefile.objs
index 77a56d0..d3f6f51 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,9 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
 block-obj-y = async.o thread-pool.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_EPOLL) += poll-linux.o
+block-obj-$(if $(CONFIG_EPOLL),n,$(CONFIG_POSIX)) += poll-glib.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
diff --git a/poll-linux.c b/poll-linux.c
new file mode 100644
index 0000000..7605004
--- /dev/null
+++ b/poll-linux.c
@@ -0,0 +1,232 @@
+/*
+ * epoll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *   Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <sys/epoll.h>
+#include <glib.h>
+#include <poll.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+    int epollfd;
+    struct epoll_event *events;
+    int max_events;
+    int nevents;
+    GHashTable *fds;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+    int epollfd;
+    QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+    epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (epollfd < 0) {
+        perror("epoll_create1:");
+        abort();
+    }
+    qpoll->epollfd = epollfd;
+    qpoll->max_events = 1;
+    qpoll->events = g_new(struct epoll_event, 1);
+    qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
+                                       NULL, g_free);
+    return qpoll;
+}
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+    g_free(qpoll->events);
+    g_hash_table_destroy(qpoll->fds);
+    close(qpoll->epollfd);
+    g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int r;
+    struct pollfd fd = {
+        .fd = qpoll->epollfd,
+        .events = POLLIN | POLLOUT | POLLERR | POLLHUP,
+    };
+
+    if (timeout_ns >= 0) {
+        struct timespec ts;
+        ts.tv_sec = timeout_ns / 1000000000LL;
+        ts.tv_nsec = timeout_ns % 1000000000LL;
+        r = ppoll(&fd, 1, &ts, NULL);
+    } else {
+        r = ppoll(&fd, 1, NULL, NULL);
+    }
+
+    if (r > 0) {
+        r = epoll_wait(qpoll->epollfd,
+                       qpoll->events,
+                       qpoll->max_events,
+                       0);
+        qpoll->nevents = r;
+    }
+    return r;
+}
+
+static inline uint32_t epoll_event_from_gio_events(int gio_events)
+{
+
+    return (gio_events & G_IO_IN  ? EPOLLIN : 0) |
+           (gio_events & G_IO_OUT ? EPOLLOUT : 0) |
+           (gio_events & G_IO_ERR ? EPOLLERR : 0) |
+           (gio_events & G_IO_HUP ? EPOLLHUP : 0);
+}
+
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+    int ret;
+    struct epoll_event ev;
+    QEMUPollEvent *e;
+
+    ev.events = epoll_event_from_gio_events(gio_events);
+    ev.data.fd = fd;
+    ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, fd, &ev);
+    if (ret) {
+        ret = -errno;
+        return ret;
+    }
+    qpoll->max_events++;
+    qpoll->events = g_renew(struct epoll_event,
+                            qpoll->events,
+                            qpoll->max_events);
+    /* Shouldn't get here if the fd is already added since epoll_ctl would
+     * return -EEXIST, assert to be sure */
+    assert(g_hash_table_lookup(qpoll->fds, &fd) == NULL);
+    e = g_new0(QEMUPollEvent, 1);
+    e->fd = fd;
+    e->events = gio_events;
+    e->opaque = opaque;
+    g_hash_table_insert(qpoll->fds, &e->fd, e);
+    return ret;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+    int ret;
+
+    if (!g_hash_table_lookup(qpoll->fds, &fd)) {
+        ret = -ENOENT;
+        goto out;
+    }
+    ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_DEL, fd, NULL);
+    if (ret) {
+        ret = -errno;
+        goto out;
+    }
+    qpoll->max_events--;
+    qpoll->events = g_renew(struct epoll_event,
+                            qpoll->events,
+                            qpoll->max_events);
+out:
+    g_hash_table_remove(qpoll->fds, &fd);
+    return ret;
+}
+
+static void qemu_poll_copy_fd(gpointer key, gpointer value, gpointer user_data)
+{
+    GHashTable *dst = user_data;
+    QEMUPollEvent *event, *copy;
+
+    event = value;
+    copy = g_new(QEMUPollEvent, 1);
+    *copy = *event;
+    g_hash_table_insert(dst, &copy->fd, copy);
+}
+
+static void qemu_poll_del_fd(gpointer key, gpointer value, gpointer user_data)
+{
+    QEMUPoll *qpoll = user_data;
+    QEMUPollEvent *event = value;
+
+    qemu_poll_del(qpoll, event->fd);
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+    int i;
+    int updated = 0;
+    int ret = nfds;
+    int old_size = g_hash_table_size(qpoll->fds);
+
+    for (i = 0; i < nfds; i++) {
+        int r;
+        GPollFD *fd = &fds[i];
+        QEMUPollEvent *e = g_hash_table_lookup(qpoll->fds, &fd->fd);
+        if (e) {
+            updated++;
+            assert(e->fd == fd->fd);
+            if (e->events == fd->events) {
+                e->opaque = fd;
+                continue;
+            }
+            r = qemu_poll_del(qpoll, fd->fd);
+            if (r < 0) {
+                ret = r;
+                break;
+            }
+        }
+        r = qemu_poll_add(qpoll, fd->fd, fd->events, fd);
+        if (r < 0) {
+            ret = r;
+            break;
+        }
+    }
+
+    if (updated < old_size) {
+        GHashTable *fds_copy;
+
+        fds_copy = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, g_free);
+        g_hash_table_foreach(qpoll->fds, qemu_poll_copy_fd, fds_copy);
+
+        /* Some fds need to be removed, as they are not seen in new fds */
+        for (i = 0; i < nfds; i++) {
+            GPollFD *fd = &fds[i];
+            g_hash_table_remove(fds_copy, &fd->fd);
+        }
+
+        g_hash_table_foreach(fds_copy, qemu_poll_del_fd, qpoll);
+        g_hash_table_destroy(fds_copy);
+    }
+    return ret;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+                         QEMUPollEvent *events,
+                         int max_events)
+{
+    int i;
+    QEMUPollEvent *p;
+    struct epoll_event *ev;
+    int fd;
+
+    for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
+        ev = &qpoll->events[i];
+        fd = ev->data.fd;
+        p = g_hash_table_lookup(qpoll->fds, &fd);
+        assert(p);
+
+        events[i].revents = ev->events;
+        events[i].opaque = p->opaque;
+        events[i].fd = fd;
+        events[i].events = p->events;
+    }
+    return i;
+}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 4/7] main-loop: Replace qemu_poll_ns with qemu_poll
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (2 preceding siblings ...)
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 3/7] poll: Add epoll implementation for qemu_poll Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 5/7] tests: Add test case for qemu_poll Fam Zheng
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

qemu_poll_set_fds + qemu_poll does the same, and when epoll is
available, it is faster.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 include/qemu/timer.h | 13 -------------
 main-loop.c          | 35 ++++++++++++++++++++++++++++++-----
 qemu-timer.c         | 28 ----------------------------
 3 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index e5bd494..ae99deb 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -738,19 +738,6 @@ void timer_put(QEMUFile *f, QEMUTimer *ts);
 int qemu_timeout_ns_to_ms(int64_t ns);
 
 /**
- * qemu_poll_ns:
- * @fds: Array of file descriptors
- * @nfds: number of file descriptors
- * @timeout: timeout in nanoseconds
- *
- * Perform a poll like g_poll but with a timeout in nanoseconds.
- * See g_poll documentation for further details.
- *
- * Returns: number of fds ready
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout);
-
-/**
  * qemu_soonest_timeout:
  * @timeout1: first timeout in nanoseconds (or -1 for infinite)
  * @timeout2: second timeout in nanoseconds (or -1 for infinite)
diff --git a/main-loop.c b/main-loop.c
index 981bcb5..b70f929 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -29,6 +29,7 @@
 #include "slirp/libslirp.h"
 #include "qemu/main-loop.h"
 #include "block/aio.h"
+#include "qemu/poll.h"
 
 #ifndef _WIN32
 
@@ -130,6 +131,7 @@ void qemu_notify_event(void)
 }
 
 static GArray *gpollfds;
+static QEMUPoll *qpoll;
 
 int qemu_init_main_loop(Error **errp)
 {
@@ -150,6 +152,7 @@ int qemu_init_main_loop(Error **errp)
         return -EMFILE;
     }
     gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+    qpoll = qemu_poll_new();
     src = aio_get_g_source(qemu_aio_context);
     g_source_attach(src, NULL);
     g_source_unref(src);
@@ -207,6 +210,8 @@ static int os_host_main_loop_wait(int64_t timeout)
 {
     int ret;
     static int spin_counter;
+    QEMUPollEvent events[1024];
+    int r, i;
 
     glib_pollfds_fill(&timeout);
 
@@ -236,7 +241,17 @@ static int os_host_main_loop_wait(int64_t timeout)
         spin_counter++;
     }
 
-    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
+    qemu_poll_set_fds(qpoll, (GPollFD *)gpollfds->data, gpollfds->len);
+    ret = qemu_poll(qpoll, timeout);
+    if (ret > 0) {
+        ret = MIN(ret, sizeof(events) / sizeof(QEMUPollEvent));
+        r = qemu_poll_get_events(qpoll, events, ret);
+        for (i = 0; i < r; i++) {
+            GPollFD *fd = events[i].opaque;
+            fd->revents = events[i].revents;
+        }
+        ret = r;
+    }
 
     if (timeout) {
         qemu_mutex_lock_iothread();
@@ -389,9 +404,11 @@ static int os_host_main_loop_wait(int64_t timeout)
 {
     GMainContext *context = g_main_context_default();
     GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
+    QEMUPollEvent poll_events[1024 * 2];
     int select_ret = 0;
-    int g_poll_ret, ret, i, n_poll_fds;
+    int nevents, ret, i, n_poll_fds;
     PollingEntry *pe;
+    QEMUPollEvent *events;
     WaitObjects *w = &wait_objects;
     gint poll_timeout;
     int64_t poll_timeout_ns;
@@ -441,10 +458,18 @@ static int os_host_main_loop_wait(int64_t timeout)
     poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
 
     qemu_mutex_unlock_iothread();
-    g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns);
+
+    qemu_poll_set_fds(qpoll, (GPollFD *)poll_fds, n_poll_fds + w->num)
+    nevents = qemu_poll(qpoll, poll_timeout_ns);
 
     qemu_mutex_lock_iothread();
-    if (g_poll_ret > 0) {
+    if (nevents > 0) {
+        r = qemu_poll_get_events(qpoll, poll_events, nevents);
+        for (i = 0; i < r; i++) {
+            GPollFD *fd = poll_events[i].opaque;
+            fd->revents = poll_events[i].revents;
+        }
+
         for (i = 0; i < w->num; i++) {
             w->revents[i] = poll_fds[n_poll_fds + i].revents;
         }
@@ -459,7 +484,7 @@ static int os_host_main_loop_wait(int64_t timeout)
         g_main_context_dispatch(context);
     }
 
-    return select_ret || g_poll_ret;
+    return select_ret || nevents;
 }
 #endif
 
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..41440e2 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -299,34 +299,6 @@ int qemu_timeout_ns_to_ms(int64_t ns)
     return (int) ms;
 }
 
-
-/* qemu implementation of g_poll which uses a nanosecond timeout but is
- * otherwise identical to g_poll
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout)
-{
-#ifdef CONFIG_PPOLL
-    if (timeout < 0) {
-        return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
-    } else {
-        struct timespec ts;
-        int64_t tvsec = timeout / 1000000000LL;
-        /* Avoid possibly overflowing and specifying a negative number of
-         * seconds, which would turn a very long timeout into a busy-wait.
-         */
-        if (tvsec > (int64_t)INT32_MAX) {
-            tvsec = INT32_MAX;
-        }
-        ts.tv_sec = tvsec;
-        ts.tv_nsec = timeout % 1000000000LL;
-        return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
-    }
-#else
-    return g_poll(fds, nfds, qemu_timeout_ns_to_ms(timeout));
-#endif
-}
-
-
 void timer_init_tl(QEMUTimer *ts,
                    QEMUTimerList *timer_list, int scale,
                    QEMUTimerCB *cb, void *opaque)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 5/7] tests: Add test case for qemu_poll
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (3 preceding siblings ...)
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 4/7] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 6/7] poll-glib: Support ppoll Fam Zheng
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/Makefile    |   2 +
 tests/test-poll.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+)
 create mode 100644 tests/test-poll.c

diff --git a/tests/Makefile b/tests/Makefile
index 55aa745..d152bf5 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -41,6 +41,7 @@ check-unit-$(CONFIG_POSIX) += tests/test-rfifolock$(EXESUF)
 check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
+check-unit-y += tests/test-poll$(EXESUF)
 check-unit-y += tests/test-thread-pool$(EXESUF)
 gcov-files-test-thread-pool-y = thread-pool.c
 gcov-files-test-hbitmap-y = util/hbitmap.c
@@ -251,6 +252,7 @@ tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
 tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(qom-core-obj) libqemuutil.a libqemustub.a
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-poll$(EXESUF): tests/test-poll.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-rfifolock$(EXESUF): tests/test-rfifolock.o libqemuutil.a libqemustub.a
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
diff --git a/tests/test-poll.c b/tests/test-poll.c
new file mode 100644
index 0000000..75074d8
--- /dev/null
+++ b/tests/test-poll.c
@@ -0,0 +1,272 @@
+/*
+ * QTest testcase for QEMU poll
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *  Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include "qemu/poll.h"
+#include "qemu/event_notifier.h"
+
+static EventNotifier *poll_add_one(QEMUPoll *qpoll)
+{
+    EventNotifier *e = g_new(EventNotifier, 1);
+
+    event_notifier_init(e, false);
+    qemu_poll_add(qpoll, event_notifier_get_fd(e),
+                  G_IO_IN,
+                  NULL);
+    return e;
+}
+
+static int poll_del_one(QEMUPoll *qpoll, EventNotifier *e)
+{
+    int r = qemu_poll_del(qpoll, event_notifier_get_fd(e));
+    event_notifier_cleanup(e);
+    g_free(e);
+    return r;
+}
+
+static void test_poll_single(void)
+{
+    QEMUPoll *qpoll;
+    EventNotifier *e;
+    int r, i;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    e = poll_add_one(qpoll);
+
+    /* Write some data and poll */
+    event_notifier_set(e);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+
+    /* Clear data and poll */
+    r = event_notifier_test_and_clear(e);
+    g_assert(r);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Write a lot of data and poll */
+    for (i = 0; i < 10000; i++) {
+        event_notifier_set(e);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+
+    /* Clear data and poll again */
+    r = event_notifier_test_and_clear(e);
+    g_assert(r);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Clean up */
+    poll_del_one(qpoll, e);
+    qemu_poll_free(qpoll);
+}
+
+static void test_poll_multiple(void)
+{
+    QEMUPoll *qpoll;
+    const int N = 32;
+    EventNotifier *e[N];
+    QEMUPollEvent events[N];
+    int r, s, i;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    for (i = 0; i < N; i++) {
+        e[i] = poll_add_one(qpoll);
+    }
+
+    /* Write some data for each and poll */
+    for (i = 0; i < N; i++) {
+        event_notifier_set(e[i]);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Clear data and poll */
+    for (i = 0; i < N; i++) {
+        r = event_notifier_test_and_clear(e[i]);
+        g_assert(r);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Write some data for first half and poll */
+    for (i = 0; i < N / 2; i++) {
+        event_notifier_set(e[i]);
+    }
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N / 2);
+    s = qemu_poll_get_events(qpoll, events, N);
+    g_assert_cmpint(s, ==, r);
+    for (i = 0; i < s; i++) {
+        g_assert(events[i].revents & G_IO_IN);
+    }
+
+    /* Clean up */
+    for (i = 0; i < N; i++) {
+        poll_del_one(qpoll, e[i]);
+    }
+    qemu_poll_free(qpoll);
+}
+
+static void test_poll_add_del(void)
+{
+    QEMUPoll *qpoll;
+    EventNotifier *e1, *e2;
+    QEMUPollEvent events[2];
+    int r;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    e1 = poll_add_one(qpoll);
+    e2 = poll_add_one(qpoll);
+
+    /* Write some data for each and poll */
+    event_notifier_set(e1);
+    event_notifier_set(e2);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 2);
+
+    /* Clear e1 and poll */
+    r = event_notifier_test_and_clear(e1);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+    r = qemu_poll_get_events(qpoll, events, 2);
+    g_assert_cmpint(r, ==, 1);
+    g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e2));
+
+    /* Write to both but remove one and poll the other */
+    event_notifier_set(e1);
+    event_notifier_set(e2);
+    qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 1);
+    r = qemu_poll_get_events(qpoll, events, 2);
+    g_assert_cmpint(r, ==, 1);
+    g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e1));
+
+    r = qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+    g_assert_cmpint(r, ==, -ENOENT);
+
+    /* Add it back and poll both */
+    qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+                  G_IO_IN, NULL);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 2);
+
+    event_notifier_test_and_clear(e1);
+    event_notifier_test_and_clear(e2);
+
+    r = qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+                      G_IO_IN, NULL);
+    g_assert_cmpint(r, ==, -EEXIST);
+
+    /* Clean up */
+    poll_del_one(qpoll, e1);
+    poll_del_one(qpoll, e2);
+    qemu_poll_free(qpoll);
+}
+
+static void gpollfd_fill(GPollFD *fds, int start, EventNotifier **e, int n)
+{
+    int i;
+    for (i = 0; i < n; i++) {
+        fds[i + start] = (GPollFD) {
+            .fd = event_notifier_get_fd(e[i]),
+            .events = G_IO_IN,
+        };
+    }
+}
+
+static void test_poll_set_fds(void)
+{
+    QEMUPoll *qpoll;
+    const int M = 20;
+    const int N = 10;
+    EventNotifier *em[M], *en[N];
+    GPollFD fds[M + N];
+    int i, r;
+
+    qpoll = qemu_poll_new();
+    g_assert(qpoll);
+
+    for (i = 0; i < M; i++) {
+        en[i] = poll_add_one(qpoll);
+    }
+
+    for (i = 0; i < M; i++) {
+        em[i] = poll_add_one(qpoll);
+    }
+
+    gpollfd_fill(fds, 0, em, M);
+    gpollfd_fill(fds, M, en, N);
+
+    /* Set N */
+    for (i = 0; i < N; i++) {
+        event_notifier_set(en[i]);
+    }
+
+    /* Poll M + N should get N */
+    r = qemu_poll_set_fds(qpoll, fds, M + N);
+    g_assert_cmpint(r, ==, M + N);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Poll M should get 0 */
+    r = qemu_poll_set_fds(qpoll, fds, M);
+    g_assert_cmpint(r, ==, M);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, 0);
+
+    /* Poll N should get N */
+    r = qemu_poll_set_fds(qpoll, fds + M, N);
+    g_assert_cmpint(r, ==, N);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N);
+
+    /* Poll M + N / 2 should get N / 2 */
+    r = qemu_poll_set_fds(qpoll, fds, M + N / 2);
+    g_assert_cmpint(r, ==, M + N / 2);
+    r = qemu_poll(qpoll, 0);
+    g_assert_cmpint(r, ==, N / 2);
+
+    /* Clean up */
+    for (i = 0; i < M; i++) {
+        poll_del_one(qpoll, em[i]);
+    }
+
+    for (i = 0; i < N; i++) {
+        poll_del_one(qpoll, en[i]);
+    }
+    qemu_poll_free(qpoll);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/poll/single", test_poll_single);
+    g_test_add_func("/poll/multiple", test_poll_multiple);
+    g_test_add_func("/poll/add-del", test_poll_add_del);
+    g_test_add_func("/poll/set-fds", test_poll_set_fds);
+
+    ret = g_test_run();
+
+    return ret;
+}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 6/7] poll-glib: Support ppoll
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (4 preceding siblings ...)
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 5/7] tests: Add test case for qemu_poll Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support Fam Zheng
  2015-04-16 13:03 ` [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Stefan Hajnoczi
  7 siblings, 0 replies; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 poll-glib.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/poll-glib.c b/poll-glib.c
index 64fde69..23d528d 100644
--- a/poll-glib.c
+++ b/poll-glib.c
@@ -14,6 +14,9 @@
 #include "qemu-common.h"
 #include "qemu/timer.h"
 #include "qemu/poll.h"
+#if CONFIG_PPOLL
+#include <poll.h>
+#endif
 
 struct QEMUPoll {
     /* Array of GPollFD for g_poll() */
@@ -42,13 +45,33 @@ void qemu_poll_free(QEMUPoll *qpoll)
 int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
 {
     int i;
+    void *fds = qpoll->gpollfds->data;
+    int nfds = qpoll->gpollfds->len;
+
     for (i = 0; i < qpoll->gpollfds->len; i++) {
         GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
         p->revents = 0;
     }
-    return g_poll((GPollFD *)qpoll->gpollfds->data,
-                  qpoll->gpollfds->len,
-                  qemu_timeout_ns_to_ms(timeout_ns));
+#if CONFIG_PPOLL
+    if (timeout_ns < 0) {
+        return ppoll(fds, nfds, NULL, NULL);
+    } else {
+        struct timespec ts;
+        int64_t tvsec = timeout_ns / 1000000000LL;
+        /* Avoid possibly overflowing and specifying a negative number of
+         * seconds, which would turn a very long timeout into a busy-wait.
+         */
+        if (tvsec > (int64_t)INT32_MAX) {
+            tvsec = INT32_MAX;
+        }
+        ts.tv_sec = tvsec;
+        ts.tv_nsec = timeout_ns % 1000000000LL;
+        return ppoll(fds, nfds, &ts, NULL);
+    }
+
+#else
+    return g_poll(data, nfds, qemu_timeout_ns_to_ms(timeout_ns));
+#endif
 }
 
 /* Add an fd to poll. Return -EEXIST if fd already registered. */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (5 preceding siblings ...)
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 6/7] poll-glib: Support ppoll Fam Zheng
@ 2015-04-16  4:57 ` Fam Zheng
  2015-04-16 13:00   ` Stefan Hajnoczi
  2015-04-16 13:03 ` [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Stefan Hajnoczi
  7 siblings, 1 reply; 12+ messages in thread
From: Fam Zheng @ 2015-04-16  4:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi

In qemu_poll_timerfd, we arm the timerfd with timeout_ns. The timerfd is
also watched by epollfd, so that when there is no other events,
epoll_wait will still return on time, even though we pass -1 (wait
infinitely).

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 poll-linux.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/poll-linux.c b/poll-linux.c
index 7605004..154da8b 100644
--- a/poll-linux.c
+++ b/poll-linux.c
@@ -12,20 +12,52 @@
  */
 
 #include <sys/epoll.h>
+
+#define USE_TIMERFD CONFIG_TIMERFD
+
+#ifdef USE_TIMERFD
+#include <sys/timerfd.h>
+#endif
+
 #include <glib.h>
 #include <poll.h>
 #include "qemu-common.h"
 #include "qemu/timer.h"
 #include "qemu/poll.h"
 
+
 struct QEMUPoll {
     int epollfd;
+#if USE_TIMERFD
+    int timerfd;
+#endif
     struct epoll_event *events;
     int max_events;
     int nevents;
     GHashTable *fds;
 };
 
+static void qemu_poll_init_timerfd(QEMUPoll *qpoll)
+{
+#if USE_TIMERFD
+    int r;
+    struct epoll_event ev;
+    qpoll->timerfd = timerfd_create(CLOCK_MONOTONIC,
+                                    TFD_NONBLOCK | TFD_CLOEXEC);
+    if (qpoll->timerfd < 0) {
+        perror("timerfd_create");
+        abort();
+    }
+    ev.events = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP;
+    ev.data.fd = qpoll->timerfd;
+    r = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, qpoll->timerfd, &ev);
+    if (r) {
+        perror("epoll_ctl add timerfd");
+        abort();
+    }
+#endif
+}
+
 QEMUPoll *qemu_poll_new(void)
 {
     int epollfd;
@@ -40,6 +72,7 @@ QEMUPoll *qemu_poll_new(void)
     qpoll->events = g_new(struct epoll_event, 1);
     qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
                                        NULL, g_free);
+    qemu_poll_init_timerfd(qpoll);
     return qpoll;
 }
 
@@ -48,10 +81,13 @@ void qemu_poll_free(QEMUPoll *qpoll)
     g_free(qpoll->events);
     g_hash_table_destroy(qpoll->fds);
     close(qpoll->epollfd);
+#if USE_TIMERFD
+    close(qpoll->timerfd);
+#endif
     g_free(qpoll);
 }
 
-int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+static int qemu_poll_ppoll(QEMUPoll *qpoll, int64_t timeout_ns)
 {
     int r;
     struct pollfd fd = {
@@ -78,6 +114,47 @@ int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
     return r;
 }
 
+#if USE_TIMERFD
+static int qemu_poll_timerfd(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+    int r;
+    struct itimerspec its = { { 0 } };
+
+    if (timeout_ns > 0) {
+        its.it_value.tv_sec = timeout_ns / 1000000000LL;
+        its.it_value.tv_nsec = timeout_ns % 1000000000LL;
+    }
+
+    r = timerfd_settime(qpoll->timerfd, 0, &its, NULL);
+    if (r) {
+        struct pollfd fd = {
+            .fd = qpoll->epollfd,
+            .events = POLLIN | POLLOUT | POLLERR | POLLHUP,
+        };
+        perror("timerfd_settime");
+        abort();
+        r = ppoll(&fd, 1, &its.it_value, NULL);
+    }
+    if (r < 0) {
+        return r;
+    }
+    r = epoll_wait(qpoll->epollfd,
+                   qpoll->events,
+                   qpoll->max_events,
+                   timeout_ns > 0 ? -1 : timeout_ns);
+    qpoll->nevents = r;
+    return r;
+}
+#endif
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+#if USE_TIMERFD
+    return qemu_poll_timerfd(qpoll, timeout_ns);
+#endif
+    return qemu_poll_ppoll(qpoll, timeout_ns);
+}
+
 static inline uint32_t epoll_event_from_gio_events(int gio_events)
 {
 
@@ -220,6 +297,11 @@ int qemu_poll_get_events(QEMUPoll *qpoll,
     for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
         ev = &qpoll->events[i];
         fd = ev->data.fd;
+#if USE_TIMERFD
+        if (fd == qpoll->timerfd) {
+            continue;
+        }
+#endif
         p = g_hash_table_lookup(qpoll->fds, &fd);
         assert(p);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support Fam Zheng
@ 2015-04-16 13:00   ` Stefan Hajnoczi
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Hajnoczi @ 2015-04-16 13:00 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

On Thu, Apr 16, 2015 at 12:57:36PM +0800, Fam Zheng wrote:
> +static int qemu_poll_timerfd(QEMUPoll *qpoll, int64_t timeout_ns)
> +{
> +    int r;
> +    struct itimerspec its = { { 0 } };
> +
> +    if (timeout_ns > 0) {
> +        its.it_value.tv_sec = timeout_ns / 1000000000LL;
> +        its.it_value.tv_nsec = timeout_ns % 1000000000LL;
> +    }
> +
> +    r = timerfd_settime(qpoll->timerfd, 0, &its, NULL);

Can't hurt to put a comment here:

/* The timer must be set even when there is no timeout so the readable
 * timerfd is cleared (we never call read(2) on it).
 */

> +    if (r) {
> +        struct pollfd fd = {
> +            .fd = qpoll->epollfd,
> +            .events = POLLIN | POLLOUT | POLLERR | POLLHUP,
> +        };
> +        perror("timerfd_settime");
> +        abort();
> +        r = ppoll(&fd, 1, &its.it_value, NULL);

Please remove the dead code.

In fact, there is no reasonable error for timerfd_settime().  It should
never fail, I'd be happy with just assert(r == 0).

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction
  2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
                   ` (6 preceding siblings ...)
  2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support Fam Zheng
@ 2015-04-16 13:03 ` Stefan Hajnoczi
  2015-04-17  2:02   ` Fam Zheng
  7 siblings, 1 reply; 12+ messages in thread
From: Stefan Hajnoczi @ 2015-04-16 13:03 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

On Thu, Apr 16, 2015 at 12:57:29PM +0800, Fam Zheng wrote:
> v3: Rebase to master for 2.4.
>     Although epoll_pwait1 syscall is still being worked on [1], the QEMU part
>     (if any) will base on this, so let's merge it first.
>     
>     That part is not included in this version because I'm still evaluating by
>     comparing epoll_pwait1 with epoll+timerfd as with current master they seem
>     to be really close.
> 
>     [1]: http://www.spinics.net/lists/linux-api/msg08216.html
> 
> v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
>     Their performance is on par with each other, but both much better than
>     qemu.git:
> 
>     syscall         high # of fd      low # of fd
>     -------------------------------------------------
>     qemu.git(ppoll) 44                96
>     ppoll+epoll     85                101
>     timerfd+epoll   87                109
> 
> (In high # of fd case, 3 activated but idle virtio-console devices are
> attached, which will add us hundereds of fds to poll)

Have you rerun benchmarks with this patch series?

I wonder how the ppoll-only performance changes.  It seems like there
are now additional copies of <fd, events, revents> information and
corresponding malloc/realloc/frees.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction
  2015-04-16 13:03 ` [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Stefan Hajnoczi
@ 2015-04-17  2:02   ` Fam Zheng
  2015-04-20 10:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 12+ messages in thread
From: Fam Zheng @ 2015-04-17  2:02 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Thu, 04/16 14:03, Stefan Hajnoczi wrote:
> On Thu, Apr 16, 2015 at 12:57:29PM +0800, Fam Zheng wrote:
> > v3: Rebase to master for 2.4.
> >     Although epoll_pwait1 syscall is still being worked on [1], the QEMU part
> >     (if any) will base on this, so let's merge it first.
> >     
> >     That part is not included in this version because I'm still evaluating by
> >     comparing epoll_pwait1 with epoll+timerfd as with current master they seem
> >     to be really close.
> > 
> >     [1]: http://www.spinics.net/lists/linux-api/msg08216.html
> > 
> > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> >     Their performance is on par with each other, but both much better than
> >     qemu.git:
> > 
> >     syscall         high # of fd      low # of fd
> >     -------------------------------------------------
> >     qemu.git(ppoll) 44                96
> >     ppoll+epoll     85                101
> >     timerfd+epoll   87                109
> > 
> > (In high # of fd case, 3 activated but idle virtio-console devices are
> > attached, which will add us hundereds of fds to poll)
> 
> Have you rerun benchmarks with this patch series?

Yes, here:

    syscall         high # of fd      low # of fd
                    (Unit MB/s)       (Unit MB/s)
    -------------------------------------------------
    qemu.git(ppoll) 24                73
    ppoll+epoll     49                77
    timerfd+epoll   49                82

> 
> I wonder how the ppoll-only performance changes.  It seems like there
> are now additional copies of <fd, events, revents> information and
> corresponding malloc/realloc/frees.
> 
> Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction
  2015-04-17  2:02   ` Fam Zheng
@ 2015-04-20 10:36     ` Stefan Hajnoczi
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Hajnoczi @ 2015-04-20 10:36 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2105 bytes --]

On Fri, Apr 17, 2015 at 10:02:30AM +0800, Fam Zheng wrote:
> On Thu, 04/16 14:03, Stefan Hajnoczi wrote:
> > On Thu, Apr 16, 2015 at 12:57:29PM +0800, Fam Zheng wrote:
> > > v3: Rebase to master for 2.4.
> > >     Although epoll_pwait1 syscall is still being worked on [1], the QEMU part
> > >     (if any) will base on this, so let's merge it first.
> > >     
> > >     That part is not included in this version because I'm still evaluating by
> > >     comparing epoll_pwait1 with epoll+timerfd as with current master they seem
> > >     to be really close.
> > > 
> > >     [1]: http://www.spinics.net/lists/linux-api/msg08216.html
> > > 
> > > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> > >     Their performance is on par with each other, but both much better than
> > >     qemu.git:
> > > 
> > >     syscall         high # of fd      low # of fd
> > >     -------------------------------------------------
> > >     qemu.git(ppoll) 44                96
> > >     ppoll+epoll     85                101
> > >     timerfd+epoll   87                109
> > > 
> > > (In high # of fd case, 3 activated but idle virtio-console devices are
> > > attached, which will add us hundereds of fds to poll)
> > 
> > Have you rerun benchmarks with this patch series?
> 
> Yes, here:
> 
>     syscall         high # of fd      low # of fd
>                     (Unit MB/s)       (Unit MB/s)
>     -------------------------------------------------
>     qemu.git(ppoll) 24                73
>     ppoll+epoll     49                77
>     timerfd+epoll   49                82

Nice.  Weird that the results are much lower than the v2 results, but
maybe the host changed?

> > 
> > I wonder how the ppoll-only performance changes.  It seems like there
> > are now additional copies of <fd, events, revents> information and
> > corresponding malloc/realloc/frees.

Please add a ppoll-only column hosts so we can be confident that the
non-epoll code works.  It will also show that non-Linux POSIX hosts
haven't regressed.

Thanks,
Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-04-20 10:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-16  4:57 [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 1/7] poll: Introduce QEMU Poll API Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 2/7] posix-aio: Use QEMU poll interface Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 3/7] poll: Add epoll implementation for qemu_poll Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 4/7] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 5/7] tests: Add test case for qemu_poll Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 6/7] poll-glib: Support ppoll Fam Zheng
2015-04-16  4:57 ` [Qemu-devel] [PATCH v3 7/7] poll-linux: Add timerfd support Fam Zheng
2015-04-16 13:00   ` Stefan Hajnoczi
2015-04-16 13:03 ` [Qemu-devel] [PATCH v3 0/7] aio: Support epoll by introducing qemu_poll abstraction Stefan Hajnoczi
2015-04-17  2:02   ` Fam Zheng
2015-04-20 10:36     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.