* [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
@ 2014-12-04 3:43 Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
v2: Emulate nanosecond timeout precision with ppoll and timerfd.
Their performance is on par with each other, and both are much better than
qemu.git:
    syscall            high # of fd    low # of fd
    ----------------------------------------------
    qemu.git (ppoll)         44             96
    ppoll+epoll              85            101
    timerfd+epoll            87            109
(In the high-fd case, 3 activated but idle virtio-console devices are
attached, which adds hundreds of fds to poll.)
v1 cover letter
---------------
ppoll(2) doesn't scale as well as epoll: the elapsed time of the syscall
grows linearly with the number of fds we poll, which hurts performance a bit
when there are many devices, or when a virtio device registers many virtqueues
(virtio-serial, for instance).
This series introduces qemu_poll, which is implemented with g_poll and epoll,
decided at configure time with CONFIG_EPOLL.
Fam
Fam Zheng (6):
poll: Introduce QEMU Poll API
posix-aio: Use QEMU poll interface
poll: Add epoll implementation for qemu_poll
main-loop: Replace qemu_poll_ns with qemu_poll
tests: Add test case for qemu_poll
poll-linux: Add timerfd support
Makefile.objs | 2 +
aio-posix.c | 52 ++++----
async.c | 5 +-
include/block/aio.h | 7 +-
include/qemu/poll.h | 40 ++++++
include/qemu/timer.h | 13 --
include/qemu/typedefs.h | 2 +
main-loop.c | 35 +++++-
poll-glib.c | 130 ++++++++++++++++++++
poll-linux.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++
qemu-timer.c | 28 -----
tests/Makefile | 2 +
tests/test-poll.c | 272 +++++++++++++++++++++++++++++++++++++++++
13 files changed, 821 insertions(+), 81 deletions(-)
create mode 100644 include/qemu/poll.h
create mode 100644 poll-glib.c
create mode 100644 poll-linux.c
create mode 100644 tests/test-poll.c
--
1.9.3
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
This is an abstraction of the underlying poll implementation. A glib
implementation is included.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
Makefile.objs | 2 +-
include/qemu/poll.h | 40 +++++++++++++++
include/qemu/typedefs.h | 2 +
poll-glib.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 173 insertions(+), 1 deletion(-)
create mode 100644 include/qemu/poll.h
create mode 100644 poll-glib.c
diff --git a/Makefile.objs b/Makefile.objs
index 18fd35c..57184eb 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
block-obj-y = async.o thread-pool.o
block-obj-y += nbd.o block.o blockjob.o
block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
block-obj-$(CONFIG_WIN32) += aio-win32.o
block-obj-y += block/
block-obj-y += qemu-io-cmds.o
diff --git a/include/qemu/poll.h b/include/qemu/poll.h
new file mode 100644
index 0000000..597975f
--- /dev/null
+++ b/include/qemu/poll.h
@@ -0,0 +1,40 @@
+#ifndef QEMU_POLL_H
+#define QEMU_POLL_H
+
+#include "qemu/typedefs.h"
+#include "qemu-common.h"
+
+struct QEMUPollEvent {
+ int fd;
+ int events;
+ int revents;
+ void *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void);
+void qemu_poll_free(QEMUPoll *qpoll);
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns);
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque);
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd);
+
+/**
+ * A shortcut to:
+ * 1) remove all the existing fds;
+ * 2) add all in @fds, while setting each fd's opaque pointer to the pollfd
+ * struct itself.
+ */
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds);
+
+/**
+ * Query the events from the last qemu_poll. ONLY revents and opaque are valid in
+ * @events after return.
+ */
+int qemu_poll_get_events(QEMUPoll *qpoll,
+ QEMUPollEvent *events,
+ int max_events);
+
+#endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 3475177..f9a5e60 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -7,6 +7,8 @@ typedef struct QEMUTimer QEMUTimer;
typedef struct QEMUTimerListGroup QEMUTimerListGroup;
typedef struct QEMUFile QEMUFile;
typedef struct QEMUBH QEMUBH;
+typedef struct QEMUPoll QEMUPoll;
+typedef struct QEMUPollEvent QEMUPollEvent;
typedef struct AioContext AioContext;
diff --git a/poll-glib.c b/poll-glib.c
new file mode 100644
index 0000000..64fde69
--- /dev/null
+++ b/poll-glib.c
@@ -0,0 +1,130 @@
+/*
+ * g_poll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ * Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+ /* Array of GPollFD for g_poll() */
+ GArray *gpollfds;
+
+ /* Array of opaque pointers, should be in sync with gpollfds */
+ GArray *opaque;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+ QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+ qpoll->gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+ qpoll->opaque = g_array_new(FALSE, FALSE, sizeof(void *));
+ return qpoll;
+}
+
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+ g_array_unref(qpoll->gpollfds);
+ g_array_unref(qpoll->opaque);
+ g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+ int i;
+ for (i = 0; i < qpoll->gpollfds->len; i++) {
+ GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+ p->revents = 0;
+ }
+ return g_poll((GPollFD *)qpoll->gpollfds->data,
+ qpoll->gpollfds->len,
+ qemu_timeout_ns_to_ms(timeout_ns));
+}
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+ int i;
+ GPollFD pfd = (GPollFD) {
+ .fd = fd,
+ .revents = 0,
+ .events = gio_events,
+ };
+
+ for (i = 0; i < qpoll->gpollfds->len; i++) {
+ GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+ if (p->fd == fd) {
+ return -EEXIST;
+ }
+ }
+ g_array_append_val(qpoll->gpollfds, pfd);
+ g_array_append_val(qpoll->opaque, opaque);
+ assert(qpoll->gpollfds->len == qpoll->opaque->len);
+ return 0;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+ int i;
+ int found = -1;
+
+ for (i = 0; i < qpoll->gpollfds->len; i++) {
+ GPollFD *p = &g_array_index(qpoll->gpollfds, GPollFD, i);
+ if (p->fd == fd) {
+ found = i;
+ break;
+ }
+ }
+ if (found >= 0) {
+ g_array_remove_index(qpoll->gpollfds, found);
+ g_array_remove_index(qpoll->opaque, found);
+ assert(qpoll->gpollfds->len == qpoll->opaque->len);
+ return 0;
+ } else {
+ return -ENOENT;
+ }
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+ int i;
+ g_array_set_size(qpoll->gpollfds, 0);
+ g_array_set_size(qpoll->opaque, 0);
+ for (i = 0; i < nfds; i++) {
+ void *opaque = &fds[i];
+ g_array_append_val(qpoll->gpollfds, fds[i]);
+ g_array_append_val(qpoll->opaque, opaque);
+ assert(qpoll->gpollfds->len == qpoll->opaque->len);
+ }
+ return nfds;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+ QEMUPollEvent *events,
+ int max_events)
+{
+ int i;
+ int r = 0;
+ for (i = 0; i < qpoll->gpollfds->len && r < max_events; i++) {
+ GPollFD *fd = &g_array_index(qpoll->gpollfds, GPollFD, i);
+ if (fd->revents & fd->events) {
+ events[r].fd = fd->fd;
+ events[r].revents = fd->revents;
+ events[r].events = fd->events;
+ events[r].opaque = g_array_index(qpoll->opaque, void *, i);
+ r++;
+ }
+ }
+ return r;
+}
--
1.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
The AIO handler list is only modified by aio_set_fd_handler, so we can
easily add/delete poll fds there. Initialize a QEMUPoll and keep track of
all the fds, so we don't need to rebuild a GPollFD array for g_poll in
aio_poll.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
aio-posix.c | 52 +++++++++++++++++++++-------------------------------
async.c | 5 +++--
include/block/aio.h | 7 +++++--
3 files changed, 29 insertions(+), 35 deletions(-)
diff --git a/aio-posix.c b/aio-posix.c
index d3ac06e..32323b3 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,7 @@
#include "block/block.h"
#include "qemu/queue.h"
#include "qemu/sockets.h"
+#include "qemu/poll.h"
struct AioHandler
{
@@ -55,6 +56,7 @@ void aio_set_fd_handler(AioContext *ctx,
/* Are we deleting the fd handler? */
if (!io_read && !io_write) {
if (node) {
+ qemu_poll_del(ctx->qpoll, fd);
g_source_remove_poll(&ctx->source, &node->pfd);
/* If the lock is held, just mark the node as deleted */
@@ -71,13 +73,15 @@ void aio_set_fd_handler(AioContext *ctx,
}
}
} else {
- if (node == NULL) {
+ if (node) {
+ /* Remove the old */
+ qemu_poll_del(ctx->qpoll, fd);
+ g_source_remove_poll(&ctx->source, &node->pfd);
+ } else {
/* Alloc and insert if it's not already there */
node = g_malloc0(sizeof(AioHandler));
node->pfd.fd = fd;
QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
-
- g_source_add_poll(&ctx->source, &node->pfd);
}
/* Update handler with latest information */
node->io_read = io_read;
@@ -87,6 +91,8 @@ void aio_set_fd_handler(AioContext *ctx,
node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
+ qemu_poll_add(ctx->qpoll, fd, node->pfd.events, node);
+ g_source_add_poll(&ctx->source, &node->pfd);
}
aio_notify(ctx);
@@ -191,6 +197,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
AioHandler *node;
bool was_dispatching;
int ret;
+ int i;
bool progress;
was_dispatching = ctx->dispatching;
@@ -208,38 +215,21 @@ bool aio_poll(AioContext *ctx, bool blocking)
*/
aio_set_dispatching(ctx, !blocking);
- ctx->walking_handlers++;
-
- g_array_set_size(ctx->pollfds, 0);
-
- /* fill pollfds */
- QLIST_FOREACH(node, &ctx->aio_handlers, node) {
- node->pollfds_idx = -1;
- if (!node->deleted && node->pfd.events) {
- GPollFD pfd = {
- .fd = node->pfd.fd,
- .events = node->pfd.events,
- };
- node->pollfds_idx = ctx->pollfds->len;
- g_array_append_val(ctx->pollfds, pfd);
- }
- }
-
- ctx->walking_handlers--;
-
/* wait until next event */
- ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
- ctx->pollfds->len,
- blocking ? aio_compute_timeout(ctx) : 0);
+ ret = qemu_poll(ctx->qpoll, blocking ? aio_compute_timeout(ctx) : 0);
/* if we have any readable fds, dispatch event */
if (ret > 0) {
- QLIST_FOREACH(node, &ctx->aio_handlers, node) {
- if (node->pollfds_idx != -1) {
- GPollFD *pfd = &g_array_index(ctx->pollfds, GPollFD,
- node->pollfds_idx);
- node->pfd.revents = pfd->revents;
- }
+ int r;
+ g_array_set_size(ctx->events, ret);
+ r = qemu_poll_get_events(ctx->qpoll,
+ (QEMUPollEvent *)ctx->events->data,
+ ret);
+ assert(r == ret);
+ for (i = 0; i < r; i++) {
+ QEMUPollEvent *e = &g_array_index(ctx->events, QEMUPollEvent, i);
+ node = e->opaque;
+ node->pfd.revents = e->revents;
}
}
diff --git a/async.c b/async.c
index 6e1b282..443a674 100644
--- a/async.c
+++ b/async.c
@@ -27,6 +27,7 @@
#include "block/thread-pool.h"
#include "qemu/main-loop.h"
#include "qemu/atomic.h"
+#include "qemu/poll.h"
/***********************************************************/
/* bottom halves (can be seen as timers which expire ASAP) */
@@ -232,7 +233,6 @@ aio_ctx_finalize(GSource *source)
event_notifier_cleanup(&ctx->notifier);
rfifolock_destroy(&ctx->lock);
qemu_mutex_destroy(&ctx->bh_lock);
- g_array_free(ctx->pollfds, TRUE);
timerlistgroup_deinit(&ctx->tlg);
}
@@ -300,10 +300,11 @@ AioContext *aio_context_new(Error **errp)
error_setg_errno(errp, -ret, "Failed to initialize event notifier");
return NULL;
}
+ ctx->qpoll = qemu_poll_new();
+ ctx->events = g_array_new(FALSE, FALSE, sizeof(QEMUPollEvent));
aio_set_event_notifier(ctx, &ctx->notifier,
(EventNotifierHandler *)
event_notifier_test_and_clear);
- ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
ctx->thread_pool = NULL;
qemu_mutex_init(&ctx->bh_lock);
rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/include/block/aio.h b/include/block/aio.h
index 6bf0e04..c36b8c1 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,8 +82,11 @@ struct AioContext {
/* Used for aio_notify. */
EventNotifier notifier;
- /* GPollFDs for aio_poll() */
- GArray *pollfds;
+ /* qemu_poll context */
+ QEMUPoll *qpoll;
+
+ /* QEMUPollEvents for qemu_poll_get_events() */
+ GArray *events;
/* Thread pool for performing work and receiving completion callbacks */
struct ThreadPool *thread_pool;
--
1.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
This implements qemu_poll with ppoll + epoll. The only complex part is
qemu_poll_set_fds, which syncs up the epollfd with epoll_ctl by computing
the symmetric difference of the previous and new fd sets.
ppoll is used to retain nanosecond precision of the timeout.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
Makefile.objs | 4 +-
poll-linux.c | 232 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 235 insertions(+), 1 deletion(-)
create mode 100644 poll-linux.c
diff --git a/Makefile.objs b/Makefile.objs
index 57184eb..ea86cb8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -9,7 +9,9 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o
block-obj-y = async.o thread-pool.o
block-obj-y += nbd.o block.o blockjob.o
block-obj-y += main-loop.o iohandler.o qemu-timer.o
-block-obj-$(CONFIG_POSIX) += aio-posix.o poll-glib.o
+block-obj-$(CONFIG_POSIX) += aio-posix.o
+block-obj-$(CONFIG_EPOLL) += poll-linux.o
+block-obj-$(if $(CONFIG_EPOLL),n,$(CONFIG_POSIX)) += poll-glib.o
block-obj-$(CONFIG_WIN32) += aio-win32.o
block-obj-y += block/
block-obj-y += qemu-io-cmds.o
diff --git a/poll-linux.c b/poll-linux.c
new file mode 100644
index 0000000..7605004
--- /dev/null
+++ b/poll-linux.c
@@ -0,0 +1,232 @@
+/*
+ * epoll implementation for QEMU Poll API
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ * Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <sys/epoll.h>
+#include <glib.h>
+#include <poll.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/poll.h"
+
+struct QEMUPoll {
+ int epollfd;
+ struct epoll_event *events;
+ int max_events;
+ int nevents;
+ GHashTable *fds;
+};
+
+QEMUPoll *qemu_poll_new(void)
+{
+ int epollfd;
+ QEMUPoll *qpoll = g_new0(QEMUPoll, 1);
+ epollfd = epoll_create1(EPOLL_CLOEXEC);
+ if (epollfd < 0) {
+ perror("epoll_create1");
+ abort();
+ }
+ qpoll->epollfd = epollfd;
+ qpoll->max_events = 1;
+ qpoll->events = g_new(struct epoll_event, 1);
+ qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
+ NULL, g_free);
+ return qpoll;
+}
+
+void qemu_poll_free(QEMUPoll *qpoll)
+{
+ g_free(qpoll->events);
+ g_hash_table_destroy(qpoll->fds);
+ close(qpoll->epollfd);
+ g_free(qpoll);
+}
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+ int r;
+ struct pollfd fd = {
+ .fd = qpoll->epollfd,
+ .events = POLLIN | POLLOUT | POLLERR | POLLHUP,
+ };
+
+ if (timeout_ns >= 0) {
+ struct timespec ts;
+ ts.tv_sec = timeout_ns / 1000000000LL;
+ ts.tv_nsec = timeout_ns % 1000000000LL;
+ r = ppoll(&fd, 1, &ts, NULL);
+ } else {
+ r = ppoll(&fd, 1, NULL, NULL);
+ }
+
+ if (r > 0) {
+ r = epoll_wait(qpoll->epollfd,
+ qpoll->events,
+ qpoll->max_events,
+ 0);
+ qpoll->nevents = r;
+ }
+ return r;
+}
+
+static inline uint32_t epoll_event_from_gio_events(int gio_events)
+{
+
+ return (gio_events & G_IO_IN ? EPOLLIN : 0) |
+ (gio_events & G_IO_OUT ? EPOLLOUT : 0) |
+ (gio_events & G_IO_ERR ? EPOLLERR : 0) |
+ (gio_events & G_IO_HUP ? EPOLLHUP : 0);
+}
+
+
+/* Add an fd to poll. Return -EEXIST if fd already registered. */
+int qemu_poll_add(QEMUPoll *qpoll, int fd, int gio_events, void *opaque)
+{
+ int ret;
+ struct epoll_event ev;
+ QEMUPollEvent *e;
+
+ ev.events = epoll_event_from_gio_events(gio_events);
+ ev.data.fd = fd;
+ ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, fd, &ev);
+ if (ret) {
+ ret = -errno;
+ return ret;
+ }
+ qpoll->max_events++;
+ qpoll->events = g_renew(struct epoll_event,
+ qpoll->events,
+ qpoll->max_events);
+ /* Shouldn't get here if the fd is already added since epoll_ctl would
+ * return -EEXIST, assert to be sure */
+ assert(g_hash_table_lookup(qpoll->fds, &fd) == NULL);
+ e = g_new0(QEMUPollEvent, 1);
+ e->fd = fd;
+ e->events = gio_events;
+ e->opaque = opaque;
+ g_hash_table_insert(qpoll->fds, &e->fd, e);
+ return ret;
+}
+
+/* Delete a previously added fd. Return -ENOENT if fd not registered. */
+int qemu_poll_del(QEMUPoll *qpoll, int fd)
+{
+ int ret;
+
+ if (!g_hash_table_lookup(qpoll->fds, &fd)) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret = epoll_ctl(qpoll->epollfd, EPOLL_CTL_DEL, fd, NULL);
+ if (ret) {
+ ret = -errno;
+ goto out;
+ }
+ qpoll->max_events--;
+ qpoll->events = g_renew(struct epoll_event,
+ qpoll->events,
+ qpoll->max_events);
+out:
+ g_hash_table_remove(qpoll->fds, &fd);
+ return ret;
+}
+
+static void qemu_poll_copy_fd(gpointer key, gpointer value, gpointer user_data)
+{
+ GHashTable *dst = user_data;
+ QEMUPollEvent *event, *copy;
+
+ event = value;
+ copy = g_new(QEMUPollEvent, 1);
+ *copy = *event;
+ g_hash_table_insert(dst, &copy->fd, copy);
+}
+
+static void qemu_poll_del_fd(gpointer key, gpointer value, gpointer user_data)
+{
+ QEMUPoll *qpoll = user_data;
+ QEMUPollEvent *event = value;
+
+ qemu_poll_del(qpoll, event->fd);
+}
+
+int qemu_poll_set_fds(QEMUPoll *qpoll, GPollFD *fds, int nfds)
+{
+ int i;
+ int updated = 0;
+ int ret = nfds;
+ int old_size = g_hash_table_size(qpoll->fds);
+
+ for (i = 0; i < nfds; i++) {
+ int r;
+ GPollFD *fd = &fds[i];
+ QEMUPollEvent *e = g_hash_table_lookup(qpoll->fds, &fd->fd);
+ if (e) {
+ updated++;
+ assert(e->fd == fd->fd);
+ if (e->events == fd->events) {
+ e->opaque = fd;
+ continue;
+ }
+ r = qemu_poll_del(qpoll, fd->fd);
+ if (r < 0) {
+ ret = r;
+ break;
+ }
+ }
+ r = qemu_poll_add(qpoll, fd->fd, fd->events, fd);
+ if (r < 0) {
+ ret = r;
+ break;
+ }
+ }
+
+ if (updated < old_size) {
+ GHashTable *fds_copy;
+
+ fds_copy = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, g_free);
+ g_hash_table_foreach(qpoll->fds, qemu_poll_copy_fd, fds_copy);
+
+ /* Some fds need to be removed, as they are not seen in new fds */
+ for (i = 0; i < nfds; i++) {
+ GPollFD *fd = &fds[i];
+ g_hash_table_remove(fds_copy, &fd->fd);
+ }
+
+ g_hash_table_foreach(fds_copy, qemu_poll_del_fd, qpoll);
+ g_hash_table_destroy(fds_copy);
+ }
+ return ret;
+}
+
+int qemu_poll_get_events(QEMUPoll *qpoll,
+ QEMUPollEvent *events,
+ int max_events)
+{
+ int i;
+ QEMUPollEvent *p;
+ struct epoll_event *ev;
+ int fd;
+
+ for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
+ ev = &qpoll->events[i];
+ fd = ev->data.fd;
+ p = g_hash_table_lookup(qpoll->fds, &fd);
+ assert(p);
+
+ events[i].revents = ev->events;
+ events[i].opaque = p->opaque;
+ events[i].fd = fd;
+ events[i].events = p->events;
+ }
+ return i;
+}
--
1.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
` (2 preceding siblings ...)
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
qemu_poll_set_fds + qemu_poll do the same thing, and when epoll is
available, they are faster.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
include/qemu/timer.h | 13 -------------
main-loop.c | 35 ++++++++++++++++++++++++++++++-----
qemu-timer.c | 28 ----------------------------
3 files changed, 30 insertions(+), 46 deletions(-)
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5f5210d..cd3371d 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -646,19 +646,6 @@ void timer_put(QEMUFile *f, QEMUTimer *ts);
int qemu_timeout_ns_to_ms(int64_t ns);
/**
- * qemu_poll_ns:
- * @fds: Array of file descriptors
- * @nfds: number of file descriptors
- * @timeout: timeout in nanoseconds
- *
- * Perform a poll like g_poll but with a timeout in nanoseconds.
- * See g_poll documentation for further details.
- *
- * Returns: number of fds ready
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout);
-
-/**
* qemu_soonest_timeout:
* @timeout1: first timeout in nanoseconds (or -1 for infinite)
* @timeout2: second timeout in nanoseconds (or -1 for infinite)
diff --git a/main-loop.c b/main-loop.c
index 981bcb5..b70f929 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -29,6 +29,7 @@
#include "slirp/libslirp.h"
#include "qemu/main-loop.h"
#include "block/aio.h"
+#include "qemu/poll.h"
#ifndef _WIN32
@@ -130,6 +131,7 @@ void qemu_notify_event(void)
}
static GArray *gpollfds;
+static QEMUPoll *qpoll;
int qemu_init_main_loop(Error **errp)
{
@@ -150,6 +152,7 @@ int qemu_init_main_loop(Error **errp)
return -EMFILE;
}
gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+ qpoll = qemu_poll_new();
src = aio_get_g_source(qemu_aio_context);
g_source_attach(src, NULL);
g_source_unref(src);
@@ -207,6 +210,8 @@ static int os_host_main_loop_wait(int64_t timeout)
{
int ret;
static int spin_counter;
+ QEMUPollEvent events[1024];
+ int r, i;
glib_pollfds_fill(&timeout);
@@ -236,7 +241,17 @@ static int os_host_main_loop_wait(int64_t timeout)
spin_counter++;
}
- ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
+ qemu_poll_set_fds(qpoll, (GPollFD *)gpollfds->data, gpollfds->len);
+ ret = qemu_poll(qpoll, timeout);
+ if (ret > 0) {
+ ret = MIN(ret, sizeof(events) / sizeof(QEMUPollEvent));
+ r = qemu_poll_get_events(qpoll, events, ret);
+ for (i = 0; i < r; i++) {
+ GPollFD *fd = events[i].opaque;
+ fd->revents = events[i].revents;
+ }
+ ret = r;
+ }
if (timeout) {
qemu_mutex_lock_iothread();
@@ -389,9 +404,11 @@ static int os_host_main_loop_wait(int64_t timeout)
{
GMainContext *context = g_main_context_default();
GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
+ QEMUPollEvent poll_events[1024 * 2];
int select_ret = 0;
- int g_poll_ret, ret, i, n_poll_fds;
+ int nevents, ret, i, n_poll_fds;
PollingEntry *pe;
+ QEMUPollEvent *events;
WaitObjects *w = &wait_objects;
gint poll_timeout;
int64_t poll_timeout_ns;
@@ -441,10 +458,18 @@ static int os_host_main_loop_wait(int64_t timeout)
poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
qemu_mutex_unlock_iothread();
- g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns);
+
+ qemu_poll_set_fds(qpoll, (GPollFD *)poll_fds, n_poll_fds + w->num);
+ nevents = qemu_poll(qpoll, poll_timeout_ns);
qemu_mutex_lock_iothread();
- if (g_poll_ret > 0) {
+ if (nevents > 0) {
+ r = qemu_poll_get_events(qpoll, poll_events, nevents);
+ for (i = 0; i < r; i++) {
+ GPollFD *fd = poll_events[i].opaque;
+ fd->revents = poll_events[i].revents;
+ }
+
for (i = 0; i < w->num; i++) {
w->revents[i] = poll_fds[n_poll_fds + i].revents;
}
@@ -459,7 +484,7 @@ static int os_host_main_loop_wait(int64_t timeout)
g_main_context_dispatch(context);
}
- return select_ret || g_poll_ret;
+ return select_ret || nevents;
}
#endif
diff --git a/qemu-timer.c b/qemu-timer.c
index c77de64..4500235 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -303,34 +303,6 @@ int qemu_timeout_ns_to_ms(int64_t ns)
return (int) ms;
}
-
-/* qemu implementation of g_poll which uses a nanosecond timeout but is
- * otherwise identical to g_poll
- */
-int qemu_poll_ns(GPollFD *fds, guint nfds, int64_t timeout)
-{
-#ifdef CONFIG_PPOLL
- if (timeout < 0) {
- return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
- } else {
- struct timespec ts;
- int64_t tvsec = timeout / 1000000000LL;
- /* Avoid possibly overflowing and specifying a negative number of
- * seconds, which would turn a very long timeout into a busy-wait.
- */
- if (tvsec > (int64_t)INT32_MAX) {
- tvsec = INT32_MAX;
- }
- ts.tv_sec = tvsec;
- ts.tv_nsec = timeout % 1000000000LL;
- return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
- }
-#else
- return g_poll(fds, nfds, qemu_timeout_ns_to_ms(timeout));
-#endif
-}
-
-
void timer_init(QEMUTimer *ts,
QEMUTimerList *timer_list, int scale,
QEMUTimerCB *cb, void *opaque)
--
1.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
` (3 preceding siblings ...)
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
2014-12-16 2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
Signed-off-by: Fam Zheng <famz@redhat.com>
---
tests/Makefile | 2 +
tests/test-poll.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 274 insertions(+)
create mode 100644 tests/test-poll.c
diff --git a/tests/Makefile b/tests/Makefile
index 16f0e4c..79f399d 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -41,6 +41,7 @@ check-unit-$(CONFIG_POSIX) += tests/test-rfifolock$(EXESUF)
check-unit-y += tests/test-throttle$(EXESUF)
gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
+check-unit-y += tests/test-poll$(EXESUF)
check-unit-y += tests/test-thread-pool$(EXESUF)
gcov-files-test-thread-pool-y = thread-pool.c
gcov-files-test-hbitmap-y = util/hbitmap.c
@@ -241,6 +242,7 @@ tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(qom-core-obj) libqemuutil.a libqemustub.a
tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-poll$(EXESUF): tests/test-poll.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-rfifolock$(EXESUF): tests/test-rfifolock.o libqemuutil.a libqemustub.a
tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
diff --git a/tests/test-poll.c b/tests/test-poll.c
new file mode 100644
index 0000000..75074d8
--- /dev/null
+++ b/tests/test-poll.c
@@ -0,0 +1,272 @@
+/*
+ * QTest testcase for QEMU poll
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ * Fam Zheng <famz@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include "qemu/poll.h"
+#include "qemu/event_notifier.h"
+
+static EventNotifier *poll_add_one(QEMUPoll *qpoll)
+{
+ EventNotifier *e = g_new(EventNotifier, 1);
+
+ event_notifier_init(e, false);
+ qemu_poll_add(qpoll, event_notifier_get_fd(e),
+ G_IO_IN,
+ NULL);
+ return e;
+}
+
+static int poll_del_one(QEMUPoll *qpoll, EventNotifier *e)
+{
+ int r = qemu_poll_del(qpoll, event_notifier_get_fd(e));
+ event_notifier_cleanup(e);
+ g_free(e);
+ return r;
+}
+
+static void test_poll_single(void)
+{
+ QEMUPoll *qpoll;
+ EventNotifier *e;
+ int r, i;
+
+ qpoll = qemu_poll_new();
+ g_assert(qpoll);
+
+ e = poll_add_one(qpoll);
+
+ /* Write some data and poll */
+ event_notifier_set(e);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 1);
+
+ /* Clear data and poll */
+ r = event_notifier_test_and_clear(e);
+ g_assert(r);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 0);
+
+ /* Write a lot of data and poll */
+ for (i = 0; i < 10000; i++) {
+ event_notifier_set(e);
+ }
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 1);
+
+ /* Clear data and poll again */
+ r = event_notifier_test_and_clear(e);
+ g_assert(r);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 0);
+
+ /* Clean up */
+ poll_del_one(qpoll, e);
+ qemu_poll_free(qpoll);
+}
+
+static void test_poll_multiple(void)
+{
+ QEMUPoll *qpoll;
+ const int N = 32;
+ EventNotifier *e[N];
+ QEMUPollEvent events[N];
+ int r, s, i;
+
+ qpoll = qemu_poll_new();
+ g_assert(qpoll);
+
+ for (i = 0; i < N; i++) {
+ e[i] = poll_add_one(qpoll);
+ }
+
+ /* Write some data for each and poll */
+ for (i = 0; i < N; i++) {
+ event_notifier_set(e[i]);
+ }
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, N);
+
+ /* Clear data and poll */
+ for (i = 0; i < N; i++) {
+ r = event_notifier_test_and_clear(e[i]);
+ g_assert(r);
+ }
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 0);
+
+ /* Write some data for first half and poll */
+ for (i = 0; i < N / 2; i++) {
+ event_notifier_set(e[i]);
+ }
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, N / 2);
+ s = qemu_poll_get_events(qpoll, events, N);
+ g_assert_cmpint(s, ==, r);
+ for (i = 0; i < s; i++) {
+ g_assert(events[i].revents & G_IO_IN);
+ }
+
+ /* Clean up */
+ for (i = 0; i < N; i++) {
+ poll_del_one(qpoll, e[i]);
+ }
+ qemu_poll_free(qpoll);
+}
+
+static void test_poll_add_del(void)
+{
+ QEMUPoll *qpoll;
+ EventNotifier *e1, *e2;
+ QEMUPollEvent events[2];
+ int r;
+
+ qpoll = qemu_poll_new();
+ g_assert(qpoll);
+
+ e1 = poll_add_one(qpoll);
+ e2 = poll_add_one(qpoll);
+
+ /* Write some data for each and poll */
+ event_notifier_set(e1);
+ event_notifier_set(e2);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 2);
+
+ /* Clear e1 and poll */
+ r = event_notifier_test_and_clear(e1);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 1);
+ r = qemu_poll_get_events(qpoll, events, 2);
+ g_assert_cmpint(r, ==, 1);
+ g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e2));
+
+ /* Write to both but remove one and poll the other */
+ event_notifier_set(e1);
+ event_notifier_set(e2);
+ qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 1);
+ r = qemu_poll_get_events(qpoll, events, 2);
+ g_assert_cmpint(r, ==, 1);
+ g_assert_cmpint(events[0].fd, ==, event_notifier_get_fd(e1));
+
+ r = qemu_poll_del(qpoll, event_notifier_get_fd(e2));
+ g_assert_cmpint(r, ==, -ENOENT);
+
+ /* Add it back and poll both */
+ qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+ G_IO_IN, NULL);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 2);
+
+ event_notifier_test_and_clear(e1);
+ event_notifier_test_and_clear(e2);
+
+ r = qemu_poll_add(qpoll, event_notifier_get_fd(e2),
+ G_IO_IN, NULL);
+ g_assert_cmpint(r, ==, -EEXIST);
+
+ /* Clean up */
+ poll_del_one(qpoll, e1);
+ poll_del_one(qpoll, e2);
+ qemu_poll_free(qpoll);
+}
+
+static void gpollfd_fill(GPollFD *fds, int start, EventNotifier **e, int n)
+{
+ int i;
+ for (i = 0; i < n; i++) {
+ fds[i + start] = (GPollFD) {
+ .fd = event_notifier_get_fd(e[i]),
+ .events = G_IO_IN,
+ };
+ }
+}
+
+static void test_poll_set_fds(void)
+{
+ QEMUPoll *qpoll;
+ const int M = 20;
+ const int N = 10;
+ EventNotifier *em[M], *en[N];
+ GPollFD fds[M + N];
+ int i, r;
+
+ qpoll = qemu_poll_new();
+ g_assert(qpoll);
+
+ for (i = 0; i < N; i++) {
+ en[i] = poll_add_one(qpoll);
+ }
+
+ for (i = 0; i < M; i++) {
+ em[i] = poll_add_one(qpoll);
+ }
+
+ gpollfd_fill(fds, 0, em, M);
+ gpollfd_fill(fds, M, en, N);
+
+ /* Set N */
+ for (i = 0; i < N; i++) {
+ event_notifier_set(en[i]);
+ }
+
+ /* Poll M + N should get N */
+ r = qemu_poll_set_fds(qpoll, fds, M + N);
+ g_assert_cmpint(r, ==, M + N);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, N);
+
+ /* Poll M should get 0 */
+ r = qemu_poll_set_fds(qpoll, fds, M);
+ g_assert_cmpint(r, ==, M);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, 0);
+
+ /* Poll N should get N */
+ r = qemu_poll_set_fds(qpoll, fds + M, N);
+ g_assert_cmpint(r, ==, N);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, N);
+
+ /* Poll M + N / 2 should get N / 2 */
+ r = qemu_poll_set_fds(qpoll, fds, M + N / 2);
+ g_assert_cmpint(r, ==, M + N / 2);
+ r = qemu_poll(qpoll, 0);
+ g_assert_cmpint(r, ==, N / 2);
+
+ /* Clean up */
+ for (i = 0; i < M; i++) {
+ poll_del_one(qpoll, em[i]);
+ }
+
+ for (i = 0; i < N; i++) {
+ poll_del_one(qpoll, en[i]);
+ }
+ qemu_poll_free(qpoll);
+}
+
+int main(int argc, char **argv)
+{
+ int ret;
+
+ g_test_init(&argc, &argv, NULL);
+
+ g_test_add_func("/poll/single", test_poll_single);
+ g_test_add_func("/poll/multiple", test_poll_multiple);
+ g_test_add_func("/poll/add-del", test_poll_add_del);
+ g_test_add_func("/poll/set-fds", test_poll_set_fds);
+
+ ret = g_test_run();
+
+ return ret;
+}
--
1.9.3
* [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
` (4 preceding siblings ...)
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
@ 2014-12-04 3:43 ` Fam Zheng
2014-12-16 2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
6 siblings, 0 replies; 11+ messages in thread
From: Fam Zheng @ 2014-12-04 3:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi
In qemu_poll_timerfd, we arm the timerfd with timeout_ns. The timerfd is
also watched by the epollfd, so that even when there are no other events,
epoll_wait will still return on time, even though we pass -1 (wait
indefinitely) as its timeout.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
poll-linux.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 83 insertions(+), 1 deletion(-)
diff --git a/poll-linux.c b/poll-linux.c
index 7605004..154da8b 100644
--- a/poll-linux.c
+++ b/poll-linux.c
@@ -12,20 +12,52 @@
*/
#include <sys/epoll.h>
+
+#define USE_TIMERFD CONFIG_TIMERFD
+
+#if USE_TIMERFD
+#include <sys/timerfd.h>
+#endif
+
#include <glib.h>
#include <poll.h>
#include "qemu-common.h"
#include "qemu/timer.h"
#include "qemu/poll.h"
+
struct QEMUPoll {
int epollfd;
+#if USE_TIMERFD
+ int timerfd;
+#endif
struct epoll_event *events;
int max_events;
int nevents;
GHashTable *fds;
};
+static void qemu_poll_init_timerfd(QEMUPoll *qpoll)
+{
+#if USE_TIMERFD
+ int r;
+ struct epoll_event ev;
+ qpoll->timerfd = timerfd_create(CLOCK_MONOTONIC,
+ TFD_NONBLOCK | TFD_CLOEXEC);
+ if (qpoll->timerfd < 0) {
+ perror("timerfd_create");
+ abort();
+ }
+ ev.events = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP;
+ ev.data.fd = qpoll->timerfd;
+ r = epoll_ctl(qpoll->epollfd, EPOLL_CTL_ADD, qpoll->timerfd, &ev);
+ if (r) {
+ perror("epoll_ctl add timerfd");
+ abort();
+ }
+#endif
+}
+
QEMUPoll *qemu_poll_new(void)
{
int epollfd;
@@ -40,6 +72,7 @@ QEMUPoll *qemu_poll_new(void)
qpoll->events = g_new(struct epoll_event, 1);
qpoll->fds = g_hash_table_new_full(g_int_hash, g_int_equal,
NULL, g_free);
+ qemu_poll_init_timerfd(qpoll);
return qpoll;
}
@@ -48,10 +81,13 @@ void qemu_poll_free(QEMUPoll *qpoll)
g_free(qpoll->events);
g_hash_table_destroy(qpoll->fds);
close(qpoll->epollfd);
+#if USE_TIMERFD
+ close(qpoll->timerfd);
+#endif
g_free(qpoll);
}
-int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+static int qemu_poll_ppoll(QEMUPoll *qpoll, int64_t timeout_ns)
{
int r;
struct pollfd fd = {
@@ -78,6 +114,47 @@ int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
return r;
}
+#if USE_TIMERFD
+static int qemu_poll_timerfd(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+ int r;
+ struct itimerspec its = { { 0 } };
+
+ if (timeout_ns > 0) {
+ its.it_value.tv_sec = timeout_ns / 1000000000LL;
+ its.it_value.tv_nsec = timeout_ns % 1000000000LL;
+ }
+
+ r = timerfd_settime(qpoll->timerfd, 0, &its, NULL);
+ if (r < 0) {
+ perror("timerfd_settime");
+ abort();
+ }
+ r = epoll_wait(qpoll->epollfd,
+ qpoll->events,
+ qpoll->max_events,
+ timeout_ns > 0 ? -1 : timeout_ns);
+ qpoll->nevents = r;
+ return r;
+}
+#endif
+
+int qemu_poll(QEMUPoll *qpoll, int64_t timeout_ns)
+{
+#if USE_TIMERFD
+ return qemu_poll_timerfd(qpoll, timeout_ns);
+#else
+ return qemu_poll_ppoll(qpoll, timeout_ns);
+#endif
+}
+
+
static inline uint32_t epoll_event_from_gio_events(int gio_events)
{
@@ -220,6 +297,11 @@ int qemu_poll_get_events(QEMUPoll *qpoll,
for (i = 0; i < MIN(qpoll->nevents, max_events); i++) {
ev = &qpoll->events[i];
fd = ev->data.fd;
+#if USE_TIMERFD
+ if (fd == qpoll->timerfd) {
+ continue;
+ }
+#endif
p = g_hash_table_lookup(qpoll->fds, &fd);
assert(p);
--
1.9.3
* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
` (5 preceding siblings ...)
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
@ 2014-12-16 2:04 ` Fam Zheng
2015-01-07 15:08 ` Stefan Hajnoczi
6 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2014-12-16 2:04 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, xigao
On Thu, 12/04 11:43, Fam Zheng wrote:
> v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> Their performance is on par with each other, but both much better than
> qemu.git:
>
> syscall high # of fd low # of fd
> -------------------------------------------------
> qemu.git(ppoll) 44 96
> ppoll+epoll 85 101
> timerfd+epoll 87 109
More data points.
Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
that:
0) when # of fds is high, epoll solutions are much better (+30%).
1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
minimal.
2) original code is 2%~5% faster than the new implementations when # of fds is
low.
This leads to the conclusion that we'll have a small performance
degradation if we merge this series. I'm thinking about possible optimizations.
Options in my mind are:
1) Remove the 1ns PR_SET_TIMERSLACK in timerfd+epoll. This doesn't make qemu_poll
faster than the old qemu_poll_ns, but it may have other positive effects that
compensate for the cost.
2) Switch dynamically between ppoll and timerfd+epoll. In poll-linux.c, we
start with pure ppoll while keeping track of the time spent in ppoll.
Periodically, we try "timerfd+epoll" for a few iterations, so that we can
check whether it is faster than pure ppoll. If it is, we swap them: use
timerfd+epoll and periodically try "ppoll".
That said, I'll also look at the kernel side. Maybe optimizing ppoll, or
adding an EPOLL_NANOSECOND_TIMEOUT flag to epoll_create1, is a better place
for the engineering effort.
Fam
>
> (In high # of fd case, 3 activated but idle virtio-console devices are
> attached, which will add us hundereds of fds to poll)
>
> v1 cover letter
> ---------------
>
> ppoll(2) doesn't scale as well as epoll: The elapsed time of the syscall is
> linear to the number of fd's we poll, which hurts performance a bit when the
> number of devices are many, or when a virtio device registers many virtqueues
> (virtio-serial, for instance).
>
> This series introduces qemu_poll, which is implemented with g_poll and epoll,
> decided at configure time with CONFIG_EPOLL.
>
> Fam
>
>
> Fam Zheng (6):
> poll: Introduce QEMU Poll API
> posix-aio: Use QEMU poll interface
> poll: Add epoll implementation for qemu_poll
> main-loop: Replace qemu_poll_ns with qemu_poll
> tests: Add test case for qemu_poll
> poll-linux: Add timerfd support
>
> Makefile.objs | 2 +
> aio-posix.c | 52 ++++----
> async.c | 5 +-
> include/block/aio.h | 7 +-
> include/qemu/poll.h | 40 ++++++
> include/qemu/timer.h | 13 --
> include/qemu/typedefs.h | 2 +
> main-loop.c | 35 +++++-
> poll-glib.c | 130 ++++++++++++++++++++
> poll-linux.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++
> qemu-timer.c | 28 -----
> tests/Makefile | 2 +
> tests/test-poll.c | 272 +++++++++++++++++++++++++++++++++++++++++
> 13 files changed, 821 insertions(+), 81 deletions(-)
> create mode 100644 include/qemu/poll.h
> create mode 100644 poll-glib.c
> create mode 100644 poll-linux.c
> create mode 100644 tests/test-poll.c
>
> --
> 1.9.3
>
>
* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
2014-12-16 2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
@ 2015-01-07 15:08 ` Stefan Hajnoczi
2015-01-08 2:53 ` Fam Zheng
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2015-01-07 15:08 UTC (permalink / raw)
To: Fam Zheng; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, xigao
On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> On Thu, 12/04 11:43, Fam Zheng wrote:
> > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> > Their performance is on par with each other, but both much better than
> > qemu.git:
> >
> > syscall high # of fd low # of fd
> > -------------------------------------------------
> > qemu.git(ppoll) 44 96
> > ppoll+epoll 85 101
> > timerfd+epoll 87 109
>
> More data points.
>
> Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
> that:
>
> 0) when # of fds is high, epoll solutions are much better (+30%).
>
> 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
> minimal.
>
> 2) original code is 2%~5% faster than the new implementations when # of fds is
> low.
What is "high" and "low"?
I'd like to understand whether they are extremes that almost no users
will encounter or whether they are plausible in the real world.
> This leads to the conclusion that that we'll have a small performance
> degradation if merge this series. I'm thinking about possible optimizations.
> Options in my mind are:
>
> 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
> faster than the old qemu_poll_ns, but may have other positive effects that
> compensate the cost.
Sounds like a random hack. What is the reasoning for messing with timer
slack?
Perhaps it is worth investigating timer slack as an independent issue
though.
> 2) Use dynamic switch between ppoll and timerfd+epoll. In poll-linux.c, We
> start with pure ppoll, while keeping track of elapsed time in ppoll. And
> periodically, we try "timerfd+epoll" for a few iterations, so that we can
> compare if it is faster than pure ppoll. If it is, swap them, use timerfd+epoll
> and and periodically try "ppoll".
>
> That said, I'll also look at the kernel side. Maybe optimizing ppoll or just
> add EPOLL_NANOSECOND_TIMEOUT to epoll_create1 is a better place for
> engineering.
I agree that a kernel fix would be good. Even if the patch is rejected,
we might get good ideas on how applications can optimize.
Stefan
* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
2015-01-07 15:08 ` Stefan Hajnoczi
@ 2015-01-08 2:53 ` Fam Zheng
2015-01-08 13:00 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2015-01-08 2:53 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, xigao
On Wed, 01/07 15:08, Stefan Hajnoczi wrote:
> On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> > On Thu, 12/04 11:43, Fam Zheng wrote:
> > > v2: Emulate nanoseconds precison of timeout with ppoll and timerfd.
> > > Their performance is on par with each other, but both much better than
> > > qemu.git:
> > >
> > > syscall high # of fd low # of fd
> > > -------------------------------------------------
> > > qemu.git(ppoll) 44 96
> > > ppoll+epoll 85 101
> > > timerfd+epoll 87 109
> >
> > More data points.
> >
> > Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and found
> > that:
> >
> > 0) when # of fds is high, epoll solutions are much better (+30%).
> >
> > 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
> > minimal.
> >
> > 2) original code is 2%~5% faster than the new implementations when # of fds is
> > low.
>
> What is "high" and "low"?
>
> I'd like to understand whether they are extremes that almost no users
> will encounter or whether they are plausible in the real world.
In the original test, "low" means barely a few fds, say 15; and "high" means what
we get after plugging one virtio-serial device, say 70. I wouldn't consider it
an extreme case, because we assign one ioeventfd to each vq, and the # of vqs can
be a multiple of the host CPU core count. On a relatively big system it can easily
go to a few hundred.
>
> > This leads to the conclusion that that we'll have a small performance
> > degradation if merge this series. I'm thinking about possible optimizations.
> > Options in my mind are:
> >
> > 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
> > faster than the old qemu_poll_ns, but may have other positive effects that
> > compensate the cost.
>
> Sounds like a random hack. What is the reasoning for messing with timer
> slack?
In a test this doesn't work. The reason is that timer slack affects the poll
syscalls' timeouts, so the two are correlated. Anyway, I've set this idea aside.
Fam
>
> Perhaps it is worth investigating timer slack as an independent issue
> though.
>
> > 2) Use dynamic switch between ppoll and timerfd+epoll. In poll-linux.c, We
> > start with pure ppoll, while keeping track of elapsed time in ppoll. And
> > periodically, we try "timerfd+epoll" for a few iterations, so that we can
> > compare if it is faster than pure ppoll. If it is, swap them, use timerfd+epoll
> > and and periodically try "ppoll".
> >
> > That said, I'll also look at the kernel side. Maybe optimizing ppoll or just
> > add EPOLL_NANOSECOND_TIMEOUT to epoll_create1 is a better place for
> > engineering.
>
> I agree that a kernel fix would be good. Even if the patch is rejected,
> we might get good ideas on how applications can optimize.
>
> Stefan
* Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction
2015-01-08 2:53 ` Fam Zheng
@ 2015-01-08 13:00 ` Paolo Bonzini
0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-01-08 13:00 UTC (permalink / raw)
To: Fam Zheng, Stefan Hajnoczi; +Cc: Kevin Wolf, xigao, qemu-devel, Stefan Hajnoczi
On 08/01/2015 03:53, Fam Zheng wrote:
>>> > > 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll, this doesn't make qemu_poll
>>> > > faster than the old qemu_poll_ns, but may have other positive effects that
>>> > > compensate the cost.
>> >
>> > Sounds like a random hack. What is the reasoning for messing with timer
>> > slack?
> In a test this doesn't work. The reason is that timer slack affects poll
> sys calls' timeout, therefore they are correlated. Anyway, I've left this.
1ns is definitely too much; it's just accuracy theater (and I am
responsible for it). We don't care about 3 clock cycles when a context
switch costs at least 500.
We probably could change it to 100 ns (0.1 us) with no practical
effects; the default is 50 us.
Paolo
Thread overview: 11+ messages
2014-12-04 3:43 [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 1/6] poll: Introduce QEMU Poll API Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 2/6] posix-aio: Use QEMU poll interface Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 3/6] poll: Add epoll implementation for qemu_poll Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 4/6] main-loop: Replace qemu_poll_ns with qemu_poll Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 5/6] tests: Add test case for qemu_poll Fam Zheng
2014-12-04 3:43 ` [Qemu-devel] [PATCH v2 6/6] poll-linux: Add timerfd support Fam Zheng
2014-12-16 2:04 ` [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction Fam Zheng
2015-01-07 15:08 ` Stefan Hajnoczi
2015-01-08 2:53 ` Fam Zheng
2015-01-08 13:00 ` Paolo Bonzini