* [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
@ 2012-09-25 12:55 Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
                   ` (18 more replies)
  0 siblings, 19 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This series removes the globals from async.c/aio-posix.c so that
multiple AIO contexts (mini event loops) can be added.  Right now,
all block devices still use qemu_bh_new, but switching them to
aio_bh_new would let you associate different files with different
AIO contexts.

As an added bonus, integration with the glib main loop now happens
via GSource.  Each AIO context is a GSource, which means you can
choose either to run it in its own thread (this of course needs
proper locking, which is not yet in place) or to attach it to the
main thread.

In its current state this is a bit of an academic exercise (though it
works and may even make sense for 1.3), but I think it's an example of
the tiny steps that can lead us towards an upstreamable version of the
data-plane code.

Paolo

Paolo Bonzini (17):
  build: do not rely on indirect inclusion of qemu-config.h
  event_notifier: enable it to use pipes
  event_notifier: add Win32 implementation
  aio: change qemu_aio_set_fd_handler to return void
  aio: provide platform-independent API
  aio: introduce AioContext, move bottom halves there
  aio: add I/O handlers to the AioContext interface
  aio: add non-blocking variant of aio_wait
  aio: prepare for introducing GSource-based dispatch
  aio: add Win32 implementation
  aio: make AioContexts GSources
  aio: add aio_notify
  aio: call aio_notify after setting I/O handlers
  main-loop: use GSource to poll AIO file descriptors
  main-loop: use aio_notify for qemu_notify_event
  aio: clean up now-unused functions
  linux-aio: use event notifiers

 Makefile.objs                              |   6 +-
 aio.c => aio-posix.c                       | 159 +++++++++++++++-------
 aio.c => aio-win32.c                       | 190 ++++++++++++++------------
 async.c                                    | 118 ++++++++++++++---
 block/Makefile.objs                        |   6 +-
 block/blkdebug.c                           |   1 +
 block/iscsi.c                              |   1 +
 event_notifier.c => event_notifier-posix.c |  83 +++++++++---
 event_notifier.c => event_notifier-win32.c |  48 +++----
 event_notifier.h                           |  20 ++-
 hw/hw.h                                    |   1 +
 iohandler.c                                |   1 +
 linux-aio.c                                |  49 +++----
 main-loop.c                                | 152 +++++++--------------
 main-loop.h                                |  56 +-------
 oslib-posix.c                              |  31 -----
 qemu-aio.h                                 | 206 +++++++++++++++++++++++++++--
 qemu-char.h                                |   1 +
 qemu-common.h                              |   2 +-
 qemu-config.h                              |   1 +
 qemu-coroutine-lock.c                      |   2 +-
 qemu-os-win32.h                            |   1 -
 22 files changed, 702 insertions(+), 433 deletions(-)
 copy aio.c => aio-posix.c (48%)
 rename aio.c => aio-win32.c (44%)
 copy event_notifier.c => event_notifier-posix.c (36%)
 rename event_notifier.c => event_notifier-win32.c (49%)

-- 
1.7.12

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

Some files in the block layer rely on qemu-char.h including qemu-config.h,
but the block layer does not need qemu-char.h at all.  Clean this up.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/blkdebug.c | 1 +
 block/iscsi.c    | 1 +
 qemu-config.h    | 1 +
 3 files changed, 3 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 59dcea0..213789b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu-common.h"
+#include "qemu-config.h"
 #include "block_int.h"
 #include "module.h"
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 0b96165..cf133d5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -27,6 +27,7 @@
 #include <poll.h>
 #include <arpa/inet.h>
 #include "qemu-common.h"
+#include "qemu-config.h"
 #include "qemu-error.h"
 #include "block_int.h"
 #include "trace.h"
diff --git a/qemu-config.h b/qemu-config.h
index 5557562..daf1539 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -2,6 +2,7 @@
 #define QEMU_CONFIG_H
 
 #include "error.h"
+#include "qemu-option.h"
 
 extern QemuOptsList qemu_fsdev_opts;
 extern QemuOptsList qemu_virtfs_opts;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-10-08  7:03   ` Stefan Hajnoczi
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation Paolo Bonzini
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This brings the eventfd emulation code over from the main loop.  When
the EventNotifier is used for the main loop too, this compatibility
code will be needed.

Without CONFIG_EVENTFD, event_notifier_get_fd is only usable for the
"read" side of the notifier, for example to set a select() handler.

Reviewed-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 event_notifier.c | 83 +++++++++++++++++++++++++++++++++++++++++++++-----------
 event_notifier.h |  3 +-
 2 files changed, 69 insertions(+), 17 deletions(-)

diff --git a/event_notifier.c b/event_notifier.c
index 2c207e1..88c114b 100644
--- a/event_notifier.c
+++ b/event_notifier.c
@@ -20,48 +20,99 @@
 
 void event_notifier_init_fd(EventNotifier *e, int fd)
 {
-    e->fd = fd;
+    e->rfd = fd;
+    e->wfd = fd;
 }
 
 int event_notifier_init(EventNotifier *e, int active)
 {
+    int fds[2];
+    int ret;
+
 #ifdef CONFIG_EVENTFD
-    int fd = eventfd(!!active, EFD_NONBLOCK | EFD_CLOEXEC);
-    if (fd < 0)
-        return -errno;
-    e->fd = fd;
-    return 0;
+    ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
 #else
-    return -ENOSYS;
+    ret = -1;
+    errno = ENOSYS;
 #endif
+    if (ret >= 0) {
+        e->rfd = e->wfd = ret;
+    } else {
+        if (errno != ENOSYS) {
+            return -errno;
+        }
+        if (qemu_pipe(fds) < 0) {
+            return -errno;
+        }
+        ret = fcntl_setfl(fds[0], O_NONBLOCK);
+        if (ret < 0) {
+            goto fail;
+        }
+        ret = fcntl_setfl(fds[1], O_NONBLOCK);
+        if (ret < 0) {
+            goto fail;
+        }
+        e->rfd = fds[0];
+        e->wfd = fds[1];
+    }
+    if (active) {
+        event_notifier_set(e);
+    }
+    return 0;
+
+fail:
+    close(fds[0]);
+    close(fds[1]);
+    return ret;
 }
 
 void event_notifier_cleanup(EventNotifier *e)
 {
-    close(e->fd);
+    if (e->rfd != e->wfd) {
+        close(e->rfd);
+    }
+    close(e->wfd);
 }
 
 int event_notifier_get_fd(EventNotifier *e)
 {
-    return e->fd;
+    return e->rfd;
 }
 
 int event_notifier_set_handler(EventNotifier *e,
                                EventNotifierHandler *handler)
 {
-    return qemu_set_fd_handler(e->fd, (IOHandler *)handler, NULL, e);
+    return qemu_set_fd_handler(e->rfd, (IOHandler *)handler, NULL, e);
 }
 
 int event_notifier_set(EventNotifier *e)
 {
-    uint64_t value = 1;
-    int r = write(e->fd, &value, sizeof(value));
-    return r == sizeof(value);
+    static const uint64_t value = 1;
+    ssize_t ret;
+
+    do {
+        ret = write(e->wfd, &value, sizeof(value));
+    } while (ret < 0 && errno == EINTR);
+
+    /* EAGAIN is fine, a read must be pending.  */
+    if (ret < 0 && errno != EAGAIN) {
+        return -1;
+    }
+    return 0;
 }
 
 int event_notifier_test_and_clear(EventNotifier *e)
 {
-    uint64_t value;
-    int r = read(e->fd, &value, sizeof(value));
-    return r == sizeof(value);
+    int value;
+    ssize_t len;
+    char buffer[512];
+
+    /* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
+    value = 0;
+    do {
+        len = read(e->rfd, buffer, sizeof(buffer));
+        value |= (len > 0);
+    } while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
+
+    return value;
 }
diff --git a/event_notifier.h b/event_notifier.h
index f0ec2f2..f04d12d 100644
--- a/event_notifier.h
+++ b/event_notifier.h
@@ -16,7 +16,8 @@
 #include "qemu-common.h"
 
 struct EventNotifier {
-    int fd;
+    int rfd;
+    int wfd;
 };
 
 typedef void EventNotifierHandler(EventNotifier *);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs                              |  4 +-
 event_notifier.c => event_notifier-posix.c |  0
 event_notifier.c => event_notifier-win32.c | 97 ++++++------------------------
 event_notifier.h                           | 17 +++++-
 qemu-os-win32.h                            |  1 -
 5 files changed, 37 insertions(+), 82 deletions(-)
 copy event_notifier.c => event_notifier-posix.c (100%)
 rename event_notifier.c => event_notifier-win32.c (29%)

diff --git a/Makefile.objs b/Makefile.objs
index cf00fd5..a99378c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -91,7 +91,9 @@ common-obj-y += bt-host.o bt-vhci.o
 
 common-obj-y += acl.o
 common-obj-$(CONFIG_POSIX) += compatfd.o
-common-obj-y += notify.o event_notifier.o
+common-obj-y += notify.o
+common-obj-$(CONFIG_POSIX) += event_notifier-posix.o
+common-obj-$(CONFIG_WIN32) += event_notifier-win32.o
 common-obj-y += qemu-timer.o qemu-timer-common.o
 
 common-obj-$(CONFIG_SLIRP) += slirp/
diff --git a/event_notifier.c b/event_notifier-posix.c
similarity index 100%
copy from event_notifier.c
copy to event_notifier-posix.c
diff --git a/event_notifier.c b/event_notifier-win32.c
similarity index 29%
rename from event_notifier.c
rename to event_notifier-win32.c
index 88c114b..c723dad 100644
--- a/event_notifier.c
+++ b/event_notifier-win32.c
@@ -12,107 +12,48 @@
 
 #include "qemu-common.h"
 #include "event_notifier.h"
-#include "qemu-char.h"
-
-#ifdef CONFIG_EVENTFD
-#include <sys/eventfd.h>
-#endif
-
-void event_notifier_init_fd(EventNotifier *e, int fd)
-{
-    e->rfd = fd;
-    e->wfd = fd;
-}
+#include "main-loop.h"
 
 int event_notifier_init(EventNotifier *e, int active)
 {
-    int fds[2];
-    int ret;
-
-#ifdef CONFIG_EVENTFD
-    ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-#else
-    ret = -1;
-    errno = ENOSYS;
-#endif
-    if (ret >= 0) {
-        e->rfd = e->wfd = ret;
-    } else {
-        if (errno != ENOSYS) {
-            return -errno;
-        }
-        if (qemu_pipe(fds) < 0) {
-            return -errno;
-        }
-        ret = fcntl_setfl(fds[0], O_NONBLOCK);
-        if (ret < 0) {
-            goto fail;
-        }
-        ret = fcntl_setfl(fds[1], O_NONBLOCK);
-        if (ret < 0) {
-            goto fail;
-        }
-        e->rfd = fds[0];
-        e->wfd = fds[1];
-    }
-    if (active) {
-        event_notifier_set(e);
-    }
+    e->event = CreateEvent(NULL, FALSE, FALSE, NULL);
+    assert(e->event);
     return 0;
-
-fail:
-    close(fds[0]);
-    close(fds[1]);
-    return ret;
 }
 
 void event_notifier_cleanup(EventNotifier *e)
 {
-    if (e->rfd != e->wfd) {
-        close(e->rfd);
-    }
-    close(e->wfd);
+    CloseHandle(e->event);
 }
 
-int event_notifier_get_fd(EventNotifier *e)
+HANDLE event_notifier_get_handle(EventNotifier *e)
 {
-    return e->rfd;
+    return e->event;
 }
 
 int event_notifier_set_handler(EventNotifier *e,
                                EventNotifierHandler *handler)
 {
-    return qemu_set_fd_handler(e->rfd, (IOHandler *)handler, NULL, e);
+    if (handler) {
+        return qemu_add_wait_object(e->event, (IOHandler *)handler, e);
+    } else {
+        qemu_del_wait_object(e->event, (IOHandler *)handler, e);
+        return 0;
+    }
 }
 
 int event_notifier_set(EventNotifier *e)
 {
-    static const uint64_t value = 1;
-    ssize_t ret;
-
-    do {
-        ret = write(e->wfd, &value, sizeof(value));
-    } while (ret < 0 && errno == EINTR);
-
-    /* EAGAIN is fine, a read must be pending.  */
-    if (ret < 0 && errno != EAGAIN) {
-        return -1;
-    }
+    SetEvent(e->event);
     return 0;
 }
 
 int event_notifier_test_and_clear(EventNotifier *e)
 {
-    int value;
-    ssize_t len;
-    char buffer[512];
-
-    /* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
-    value = 0;
-    do {
-        len = read(e->rfd, buffer, sizeof(buffer));
-        value |= (len > 0);
-    } while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
-
-    return value;
+    int ret = WaitForSingleObject(e->event, 0);
+    if (ret == WAIT_OBJECT_0) {
+        ResetEvent(e->event);
+        return true;
+    }
+    return false;
 }
diff --git a/event_notifier.h b/event_notifier.h
index f04d12d..88b57af 100644
--- a/event_notifier.h
+++ b/event_notifier.h
@@ -15,19 +15,32 @@
 
 #include "qemu-common.h"
 
+#ifdef _WIN32
+#include <windows.h>
+#endif
+
 struct EventNotifier {
+#ifdef _WIN32
+    HANDLE event;
+#else
     int rfd;
     int wfd;
+#endif
 };
 
 typedef void EventNotifierHandler(EventNotifier *);
 
-void event_notifier_init_fd(EventNotifier *, int fd);
 int event_notifier_init(EventNotifier *, int active);
 void event_notifier_cleanup(EventNotifier *);
-int event_notifier_get_fd(EventNotifier *);
 int event_notifier_set(EventNotifier *);
 int event_notifier_test_and_clear(EventNotifier *);
 int event_notifier_set_handler(EventNotifier *, EventNotifierHandler *);
 
+#ifdef CONFIG_POSIX
+void event_notifier_init_fd(EventNotifier *, int fd);
+int event_notifier_get_fd(EventNotifier *);
+#else
+HANDLE event_notifier_get_handle(EventNotifier *);
+#endif
+
 #endif
diff --git a/qemu-os-win32.h b/qemu-os-win32.h
index 3b5a35b..400264e 100644
--- a/qemu-os-win32.h
+++ b/qemu-os-win32.h
@@ -28,7 +28,6 @@
 
 #include <windows.h>
 #include <winsock2.h>
-#include "main-loop.h"
 
 /* Workaround for older versions of MinGW. */
 #ifndef ECONNREFUSED
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 21:47   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio.c      | 12 +++++-------
 qemu-aio.h | 10 +++++-----
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/aio.c b/aio.c
index c738a4e..e062aab 100644
--- a/aio.c
+++ b/aio.c
@@ -53,11 +53,11 @@ static AioHandler *find_aio_handler(int fd)
     return NULL;
 }
 
-int qemu_aio_set_fd_handler(int fd,
-                            IOHandler *io_read,
-                            IOHandler *io_write,
-                            AioFlushHandler *io_flush,
-                            void *opaque)
+void qemu_aio_set_fd_handler(int fd,
+                             IOHandler *io_read,
+                             IOHandler *io_write,
+                             AioFlushHandler *io_flush,
+                             void *opaque)
 {
     AioHandler *node;
 
@@ -93,8 +93,6 @@ int qemu_aio_set_fd_handler(int fd,
     }
 
     qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
-
-    return 0;
 }
 
 void qemu_aio_flush(void)
diff --git a/qemu-aio.h b/qemu-aio.h
index bfdd35f..27a7e21 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -60,10 +60,10 @@ bool qemu_aio_wait(void);
  * Code that invokes AIO completion functions should rely on this function
  * instead of qemu_set_fd_handler[2].
  */
-int qemu_aio_set_fd_handler(int fd,
-                            IOHandler *io_read,
-                            IOHandler *io_write,
-                            AioFlushHandler *io_flush,
-                            void *opaque);
+void qemu_aio_set_fd_handler(int fd,
+                             IOHandler *io_read,
+                             IOHandler *io_write,
+                             AioFlushHandler *io_flush,
+                             void *opaque);
 
 #endif
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (3 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 21:48   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This adds a platform-independent API based on EventNotifiers to aio.c,
which can be used on both POSIX and Win32.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs |  4 ++--
 aio.c         |  9 +++++++++
 qemu-aio.h    | 19 ++++++++++++++++++-
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index a99378c..713dd87 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -45,6 +45,8 @@ block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
 block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
+block-obj-$(CONFIG_POSIX) += event_notifier-posix.o
+block-obj-$(CONFIG_WIN32) += event_notifier-win32.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += block/
 
@@ -92,8 +94,6 @@ common-obj-y += bt-host.o bt-vhci.o
 common-obj-y += acl.o
 common-obj-$(CONFIG_POSIX) += compatfd.o
 common-obj-y += notify.o
-common-obj-$(CONFIG_POSIX) += event_notifier-posix.o
-common-obj-$(CONFIG_WIN32) += event_notifier-win32.o
 common-obj-y += qemu-timer.o qemu-timer-common.o
 
 common-obj-$(CONFIG_SLIRP) += slirp/
diff --git a/aio.c b/aio.c
index e062aab..44214e1 100644
--- a/aio.c
+++ b/aio.c
@@ -95,6 +95,15 @@ void qemu_aio_set_fd_handler(int fd,
     qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
 }
 
+void qemu_aio_set_event_notifier(EventNotifier *notifier,
+                                 EventNotifierHandler *io_read,
+                                 AioFlushEventNotifierHandler *io_flush)
+{
+    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
+                            (IOHandler *)io_read, NULL,
+                            (AioFlushHandler *)io_flush, notifier);
+}
+
 void qemu_aio_flush(void)
 {
     while (qemu_aio_wait());
diff --git a/qemu-aio.h b/qemu-aio.h
index 27a7e21..dc416a5 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -16,6 +16,7 @@
 
 #include "qemu-common.h"
 #include "qemu-char.h"
+#include "event_notifier.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
@@ -39,7 +40,7 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
 void qemu_aio_release(void *p);
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
-typedef int (AioFlushHandler)(void *opaque);
+typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
 
 /* Flush any pending AIO operation. This function will block until all
  * outstanding AIO operations have been completed or cancelled. */
@@ -53,6 +54,10 @@ void qemu_aio_flush(void);
  * Return whether there is still any pending AIO operation.  */
 bool qemu_aio_wait(void);
 
+#ifdef CONFIG_POSIX
+/* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
+typedef int (AioFlushHandler)(void *opaque);
+
 /* Register a file descriptor and associated callbacks.  Behaves very similarly
  * to qemu_set_fd_handler2.  Unlike qemu_set_fd_handler2, these callbacks will
  * be invoked when using either qemu_aio_wait() or qemu_aio_flush().
@@ -65,5 +70,17 @@ void qemu_aio_set_fd_handler(int fd,
                              IOHandler *io_write,
                              AioFlushHandler *io_flush,
                              void *opaque);
+#endif
+
+/* Register an event notifier and associated callbacks.  Behaves very similarly
+ * to event_notifier_set_handler.  Unlike event_notifier_set_handler, these callbacks
+ * will be invoked when using either qemu_aio_wait() or qemu_aio_flush().
+ *
+ * Code that invokes AIO completion functions should rely on this function
+ * instead of event_notifier_set_handler.
+ */
+void qemu_aio_set_event_notifier(EventNotifier *notifier,
+                                 EventNotifierHandler *io_read,
+                                 AioFlushEventNotifierHandler *io_flush);
 
 #endif
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (4 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 21:51   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface Paolo Bonzini
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

Start introducing AioContext, which will let us remove globals from
aio.c/async.c, and introduce multiple I/O threads.

The bottom half functions now take an additional AioContext argument.
A bottom half is created with a specific AioContext that remains the
same throughout its lifetime.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c               | 30 +++++++++----------
 hw/hw.h               |  1 +
 iohandler.c           |  1 +
 main-loop.c           | 18 +++++++++++-
 main-loop.h           | 54 ++---------------------------------
 qemu-aio.h            | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 qemu-char.h           |  1 +
 qemu-common.h         |  1 +
 qemu-coroutine-lock.c |  2 +-
 9 files changed, 118 insertions(+), 69 deletions(-)

diff --git a/async.c b/async.c
index 85cc641..189ee1b 100644
--- a/async.c
+++ b/async.c
@@ -26,9 +26,6 @@
 #include "qemu-aio.h"
 #include "main-loop.h"
 
-/* Anchor of the list of Bottom Halves belonging to the context */
-static struct QEMUBH *first_bh;
-
 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
 
@@ -41,27 +38,26 @@ struct QEMUBH {
     bool deleted;
 };
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
+QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
     QEMUBH *bh;
     bh = g_malloc0(sizeof(QEMUBH));
     bh->cb = cb;
     bh->opaque = opaque;
-    bh->next = first_bh;
-    first_bh = bh;
+    bh->next = ctx->first_bh;
+    ctx->first_bh = bh;
     return bh;
 }
 
-int qemu_bh_poll(void)
+int aio_bh_poll(AioContext *ctx)
 {
     QEMUBH *bh, **bhp, *next;
     int ret;
-    static int nesting = 0;
 
-    nesting++;
+    ctx->walking_bh++;
 
     ret = 0;
-    for (bh = first_bh; bh; bh = next) {
+    for (bh = ctx->first_bh; bh; bh = next) {
         next = bh->next;
         if (!bh->deleted && bh->scheduled) {
             bh->scheduled = 0;
@@ -72,11 +68,11 @@ int qemu_bh_poll(void)
         }
     }
 
-    nesting--;
+    ctx->walking_bh--;
 
     /* remove deleted bhs */
-    if (!nesting) {
-        bhp = &first_bh;
+    if (!ctx->walking_bh) {
+        bhp = &ctx->first_bh;
         while (*bhp) {
             bh = *bhp;
             if (bh->deleted) {
@@ -120,11 +116,11 @@ void qemu_bh_delete(QEMUBH *bh)
     bh->deleted = 1;
 }
 
-void qemu_bh_update_timeout(uint32_t *timeout)
+void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
 {
     QEMUBH *bh;
 
-    for (bh = first_bh; bh; bh = bh->next) {
+    for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
             if (bh->idle) {
                 /* idle bottom halves will be polled at least
@@ -140,3 +136,7 @@ void qemu_bh_update_timeout(uint32_t *timeout)
     }
 }
 
+AioContext *aio_context_new(void)
+{
+    return g_new0(AioContext, 1);
+}
diff --git a/hw/hw.h b/hw/hw.h
index e5cb9bf..acb718d 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -10,6 +10,7 @@
 
 #include "ioport.h"
 #include "irq.h"
+#include "qemu-aio.h"
 #include "qemu-file.h"
 #include "vmstate.h"
 
diff --git a/iohandler.c b/iohandler.c
index a2d871b..60460a6 100644
--- a/iohandler.c
+++ b/iohandler.c
@@ -26,6 +26,7 @@
 #include "qemu-common.h"
 #include "qemu-char.h"
 #include "qemu-queue.h"
+#include "qemu-aio.h"
 #include "main-loop.h"
 
 #ifndef _WIN32
diff --git a/main-loop.c b/main-loop.c
index eb3b6e6..f0bc515 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -26,6 +26,7 @@
 #include "qemu-timer.h"
 #include "slirp/slirp.h"
 #include "main-loop.h"
+#include "qemu-aio.h"
 
 #ifndef _WIN32
 
@@ -199,6 +200,8 @@ static int qemu_signal_init(void)
 }
 #endif
 
+static AioContext *qemu_aio_context;
+
 int main_loop_init(void)
 {
     int ret;
@@ -215,6 +218,7 @@ int main_loop_init(void)
         return ret;
     }
 
+    qemu_aio_context = aio_context_new();
     return 0;
 }
 
@@ -478,7 +482,7 @@ int main_loop_wait(int nonblocking)
     if (nonblocking) {
         timeout = 0;
     } else {
-        qemu_bh_update_timeout(&timeout);
+        aio_bh_update_timeout(qemu_aio_context, &timeout);
     }
 
     /* poll any events */
@@ -507,3 +511,15 @@ int main_loop_wait(int nonblocking)
 
     return ret;
 }
+
+/* Functions to operate on the main QEMU AioContext.  */
+
+QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
+{
+    return aio_bh_new(qemu_aio_context, cb, opaque);
+}
+
+int qemu_bh_poll(void)
+{
+    return aio_bh_poll(qemu_aio_context);
+}
diff --git a/main-loop.h b/main-loop.h
index dce1cd9..47644ce 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -25,6 +25,8 @@
 #ifndef QEMU_MAIN_LOOP_H
 #define QEMU_MAIN_LOOP_H 1
 
+#include "qemu-aio.h"
+
 #define SIG_IPI SIGUSR1
 
 /**
@@ -173,7 +175,6 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque);
 
 typedef void IOReadHandler(void *opaque, const uint8_t *buf, int size);
 typedef int IOCanReadHandler(void *opaque);
-typedef void IOHandler(void *opaque);
 
 /**
  * qemu_set_fd_handler2: Register a file descriptor with the main loop
@@ -254,56 +255,6 @@ int qemu_set_fd_handler(int fd,
                         IOHandler *fd_write,
                         void *opaque);
 
-typedef struct QEMUBH QEMUBH;
-typedef void QEMUBHFunc(void *opaque);
-
-/**
- * qemu_bh_new: Allocate a new bottom half structure.
- *
- * Bottom halves are lightweight callbacks whose invocation is guaranteed
- * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
- * is opaque and must be allocated prior to its use.
- */
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
-
-/**
- * qemu_bh_schedule: Schedule a bottom half.
- *
- * Scheduling a bottom half interrupts the main loop and causes the
- * execution of the callback that was passed to qemu_bh_new.
- *
- * Bottom halves that are scheduled from a bottom half handler are instantly
- * invoked.  This can create an infinite loop if a bottom half handler
- * schedules itself.
- *
- * @bh: The bottom half to be scheduled.
- */
-void qemu_bh_schedule(QEMUBH *bh);
-
-/**
- * qemu_bh_cancel: Cancel execution of a bottom half.
- *
- * Canceling execution of a bottom half undoes the effect of calls to
- * qemu_bh_schedule without freeing its resources yet.  While cancellation
- * itself is also wait-free and thread-safe, it can of course race with the
- * loop that executes bottom halves unless you are holding the iothread
- * mutex.  This makes it mostly useless if you are not holding the mutex.
- *
- * @bh: The bottom half to be canceled.
- */
-void qemu_bh_cancel(QEMUBH *bh);
-
-/**
- *qemu_bh_delete: Cancel execution of a bottom half and free its resources.
- *
- * Deleting a bottom half frees the memory that was allocated for it by
- * qemu_bh_new.  It also implies canceling the bottom half if it was
- * scheduled.
- *
- * @bh: The bottom half to be deleted.
- */
-void qemu_bh_delete(QEMUBH *bh);
-
 #ifdef CONFIG_POSIX
 /**
  * qemu_add_child_watch: Register a child process for reaping.
@@ -359,6 +310,7 @@ void qemu_fd_register(int fd);
 void qemu_iohandler_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set *xfds);
 void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc);
 
+QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 int qemu_bh_poll(void);
 void qemu_bh_update_timeout(uint32_t *timeout);
diff --git a/qemu-aio.h b/qemu-aio.h
index dc416a5..2ed6ad3 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -15,7 +15,6 @@
 #define QEMU_AIO_H
 
 #include "qemu-common.h"
-#include "qemu-char.h"
 #include "event_notifier.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
@@ -39,9 +38,87 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
                    BlockDriverCompletionFunc *cb, void *opaque);
 void qemu_aio_release(void *p);
 
+typedef struct AioHandler AioHandler;
+typedef void QEMUBHFunc(void *opaque);
+typedef void IOHandler(void *opaque);
+
+typedef struct AioContext {
+    /* Anchor of the list of Bottom Halves belonging to the context */
+    struct QEMUBH *first_bh;
+
+    /* A simple lock used to protect the first_bh list, and ensure that
+     * no callbacks are removed while we're walking and dispatching callbacks.
+     */
+    int walking_bh;
+} AioContext;
+
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
 typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
 
+/**
+ * aio_context_new: Allocate a new AioContext.
+ *
+ * An AioContext provides a mini event loop that can be waited on
+ * synchronously.  It also provides bottom halves, a service to execute
+ * a piece of code as soon as possible.
+ */
+AioContext *aio_context_new(void);
+
+/**
+ * aio_bh_new: Allocate a new bottom half structure.
+ *
+ * Bottom halves are lightweight callbacks whose invocation is guaranteed
+ * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
+ * is opaque and must be allocated prior to its use.
+ */
+QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+
+/**
+ * aio_bh_poll: Poll bottom halves for an AioContext.
+ *
+ * These are internal functions used by the QEMU main loop.
+ */
+int aio_bh_poll(AioContext *ctx);
+void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout);
+
+/**
+ * qemu_bh_schedule: Schedule a bottom half.
+ *
+ * Scheduling a bottom half interrupts the main loop and causes the
+ * execution of the callback that was passed to qemu_bh_new.
+ *
+ * Bottom halves that are scheduled from a bottom half handler are instantly
+ * invoked.  This can create an infinite loop if a bottom half handler
+ * schedules itself.
+ *
+ * @bh: The bottom half to be scheduled.
+ */
+void qemu_bh_schedule(QEMUBH *bh);
+
+/**
+ * qemu_bh_cancel: Cancel execution of a bottom half.
+ *
+ * Canceling execution of a bottom half undoes the effect of calls to
+ * qemu_bh_schedule without freeing its resources yet.  While cancellation
+ * itself is also wait-free and thread-safe, it can of course race with the
+ * loop that executes bottom halves unless you are holding the iothread
+ * mutex.  This makes it mostly useless if you are not holding the mutex.
+ *
+ * @bh: The bottom half to be canceled.
+ */
+void qemu_bh_cancel(QEMUBH *bh);
+
+/**
+ * qemu_bh_delete: Cancel execution of a bottom half and free its resources.
+ *
+ * Deleting a bottom half frees the memory that was allocated for it by
+ * qemu_bh_new.  It also implies canceling the bottom half if it was
+ * scheduled.
+ *
+ * @bh: The bottom half to be deleted.
+ */
+void qemu_bh_delete(QEMUBH *bh);
+
 /* Flush any pending AIO operation. This function will block until all
  * outstanding AIO operations have been completed or cancelled. */
 void qemu_aio_flush(void);
diff --git a/qemu-char.h b/qemu-char.h
index 486644b..5087168 100644
--- a/qemu-char.h
+++ b/qemu-char.h
@@ -5,6 +5,7 @@
 #include "qemu-queue.h"
 #include "qemu-option.h"
 #include "qemu-config.h"
+#include "qemu-aio.h"
 #include "qobject.h"
 #include "qstring.h"
 #include "main-loop.h"
diff --git a/qemu-common.h b/qemu-common.h
index e5c2bcd..ac44657 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -14,6 +14,7 @@
 
 typedef struct QEMUTimer QEMUTimer;
 typedef struct QEMUFile QEMUFile;
+typedef struct QEMUBH QEMUBH;
 typedef struct DeviceState DeviceState;
 
 struct Monitor;
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index 26ad76b..9dda3f8 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -26,7 +26,7 @@
 #include "qemu-coroutine.h"
 #include "qemu-coroutine-int.h"
 #include "qemu-queue.h"
-#include "main-loop.h"
+#include "qemu-aio.h"
 #include "trace.h"
 
 static QTAILQ_HEAD(, Coroutine) unlock_bh_queue =
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (5 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

With this patch, I/O handlers (including event notifier handlers) can be
attached to a single AioContext.
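
A central subtlety in this patch is the walking_handlers counter: while the
handler list is being dispatched, aio_set_fd_handler may only mark a node as
deleted, and the walker frees marked nodes once it is done.  The following
stand-alone sketch (illustrative names, not QEMU code) shows that
deferred-deletion pattern; the "callback" here simply unregisters its own fd:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Handler {
    int fd;
    int deleted;
    struct Handler *next;
} Handler;

typedef struct {
    Handler *head;
    int walking_handlers;   /* "lock": nonzero while dispatching */
} Ctx;

static void ctx_add(Ctx *ctx, int fd)
{
    Handler *h = calloc(1, sizeof(*h));
    h->fd = fd;
    h->next = ctx->head;
    ctx->head = h;
}

static void ctx_remove(Ctx *ctx, int fd)
{
    Handler **p;
    for (p = &ctx->head; *p; p = &(*p)->next) {
        if ((*p)->fd == fd && !(*p)->deleted) {
            if (ctx->walking_handlers) {
                (*p)->deleted = 1;     /* defer: a walker still holds a pointer */
            } else {
                Handler *dead = *p;
                *p = dead->next;       /* safe: nobody is walking the list */
                free(dead);
            }
            return;
        }
    }
}

static int ctx_walk(Ctx *ctx)
{
    int dispatched = 0;
    ctx->walking_handlers++;
    for (Handler *h = ctx->head; h; h = h->next) {
        if (!h->deleted) {
            ctx_remove(ctx, h->fd);    /* removal during the walk is safe */
            dispatched++;
        }
    }
    ctx->walking_handlers--;
    /* sweep nodes that were only marked while we were walking */
    if (!ctx->walking_handlers) {
        Handler **p = &ctx->head;
        while (*p) {
            if ((*p)->deleted) {
                Handler *dead = *p;
                *p = dead->next;
                free(dead);
            } else {
                p = &(*p)->next;
            }
        }
    }
    return dispatched;
}
```

The node is never freed while walking_handlers is nonzero, so the walker's
h->next pointer stays valid even when a callback removes its own handler.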

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio.c       | 70 ++++++++++++++++++++++++-------------------------------------
 async.c     |  6 ++++++
 main-loop.c | 33 +++++++++++++++++++++++++++++
 qemu-aio.h  | 42 ++++++++++++++++++++++++++++++-------
 4 files changed, 101 insertions(+), 50 deletions(-)

diff --git a/aio.c b/aio.c
index 44214e1..c89f1e9 100644
--- a/aio.c
+++ b/aio.c
@@ -18,17 +18,6 @@
 #include "qemu-queue.h"
 #include "qemu_socket.h"
 
-typedef struct AioHandler AioHandler;
-
-/* The list of registered AIO handlers */
-static QLIST_HEAD(, AioHandler) aio_handlers;
-
-/* This is a simple lock used to protect the aio_handlers list.  Specifically,
- * it's used to ensure that no callbacks are removed while we're walking and
- * dispatching callbacks.
- */
-static int walking_handlers;
-
 struct AioHandler
 {
     int fd;
@@ -40,11 +29,11 @@ struct AioHandler
     QLIST_ENTRY(AioHandler) node;
 };
 
-static AioHandler *find_aio_handler(int fd)
+static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
     AioHandler *node;
 
-    QLIST_FOREACH(node, &aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (node->fd == fd)
             if (!node->deleted)
                 return node;
@@ -53,21 +42,22 @@ static AioHandler *find_aio_handler(int fd)
     return NULL;
 }
 
-void qemu_aio_set_fd_handler(int fd,
-                             IOHandler *io_read,
-                             IOHandler *io_write,
-                             AioFlushHandler *io_flush,
-                             void *opaque)
+void aio_set_fd_handler(AioContext *ctx,
+                        int fd,
+                        IOHandler *io_read,
+                        IOHandler *io_write,
+                        AioFlushHandler *io_flush,
+                        void *opaque)
 {
     AioHandler *node;
 
-    node = find_aio_handler(fd);
+    node = find_aio_handler(ctx, fd);
 
     /* Are we deleting the fd handler? */
     if (!io_read && !io_write) {
         if (node) {
             /* If the lock is held, just mark the node as deleted */
-            if (walking_handlers)
+            if (ctx->walking_handlers)
                 node->deleted = 1;
             else {
                 /* Otherwise, delete it for real.  We can't just mark it as
@@ -83,7 +73,7 @@ void qemu_aio_set_fd_handler(int fd,
             /* Alloc and insert if it's not already there */
             node = g_malloc0(sizeof(AioHandler));
             node->fd = fd;
-            QLIST_INSERT_HEAD(&aio_handlers, node, node);
+            QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -91,25 +81,19 @@ void qemu_aio_set_fd_handler(int fd,
         node->io_flush = io_flush;
         node->opaque = opaque;
     }
-
-    qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
-}
-
-void qemu_aio_set_event_notifier(EventNotifier *notifier,
-                                 EventNotifierHandler *io_read,
-                                 AioFlushEventNotifierHandler *io_flush)
-{
-    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
-                            (IOHandler *)io_read, NULL,
-                            (AioFlushHandler *)io_flush, notifier);
 }
 
-void qemu_aio_flush(void)
+void aio_set_event_notifier(AioContext *ctx,
+                            EventNotifier *notifier,
+                            EventNotifierHandler *io_read,
+                            AioFlushEventNotifierHandler *io_flush)
 {
-    while (qemu_aio_wait());
+    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
+                       (IOHandler *)io_read, NULL,
+                       (AioFlushHandler *)io_flush, notifier);
 }
 
-bool qemu_aio_wait(void)
+bool aio_wait(AioContext *ctx)
 {
     AioHandler *node;
     fd_set rdfds, wrfds;
@@ -122,18 +106,18 @@ bool qemu_aio_wait(void)
      * Do not call select in this case, because it is possible that the caller
      * does not need a complete flush (as is the case for qemu_aio_wait loops).
      */
-    if (qemu_bh_poll()) {
+    if (aio_bh_poll(ctx)) {
         return true;
     }
 
-    walking_handlers++;
+    ctx->walking_handlers++;
 
     FD_ZERO(&rdfds);
     FD_ZERO(&wrfds);
 
     /* fill fd sets */
     busy = false;
-    QLIST_FOREACH(node, &aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         /* If there aren't pending AIO operations, don't invoke callbacks.
          * Otherwise, if there are no AIO requests, qemu_aio_wait() would
          * wait indefinitely.
@@ -154,7 +138,7 @@ bool qemu_aio_wait(void)
         }
     }
 
-    walking_handlers--;
+    ctx->walking_handlers--;
 
     /* No AIO operations?  Get us out of here */
     if (!busy) {
@@ -168,11 +152,11 @@ bool qemu_aio_wait(void)
     if (ret > 0) {
         /* we have to walk very carefully in case
          * qemu_aio_set_fd_handler is called while we're walking */
-        node = QLIST_FIRST(&aio_handlers);
+        node = QLIST_FIRST(&ctx->aio_handlers);
         while (node) {
             AioHandler *tmp;
 
-            walking_handlers++;
+            ctx->walking_handlers++;
 
             if (!node->deleted &&
                 FD_ISSET(node->fd, &rdfds) &&
@@ -188,9 +172,9 @@ bool qemu_aio_wait(void)
             tmp = node;
             node = QLIST_NEXT(node, node);
 
-            walking_handlers--;
+            ctx->walking_handlers--;
 
-            if (!walking_handlers && tmp->deleted) {
+            if (!ctx->walking_handlers && tmp->deleted) {
                 QLIST_REMOVE(tmp, node);
                 g_free(tmp);
             }
diff --git a/async.c b/async.c
index 189ee1b..c99db79 100644
--- a/async.c
+++ b/async.c
@@ -136,7 +136,13 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
     }
 }
 
+
 AioContext *aio_context_new(void)
 {
     return g_new0(AioContext, 1);
 }
+
+void aio_flush(AioContext *ctx)
+{
+    while (aio_wait(ctx));
+}
diff --git a/main-loop.c b/main-loop.c
index f0bc515..8301fe9 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -523,3 +523,36 @@ int qemu_bh_poll(void)
 {
     return aio_bh_poll(qemu_aio_context);
 }
+
+void qemu_aio_flush(void)
+{
+    aio_flush(qemu_aio_context);
+}
+
+bool qemu_aio_wait(void)
+{
+    return aio_wait(qemu_aio_context);
+}
+
+void qemu_aio_set_fd_handler(int fd,
+                             IOHandler *io_read,
+                             IOHandler *io_write,
+                             AioFlushHandler *io_flush,
+                             void *opaque)
+{
+    aio_set_fd_handler(qemu_aio_context, fd, io_read, io_write, io_flush,
+                       opaque);
+
+    qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
+}
+
+#ifdef CONFIG_POSIX
+void qemu_aio_set_event_notifier(EventNotifier *notifier,
+                                 EventNotifierHandler *io_read,
+                                 AioFlushEventNotifierHandler *io_flush)
+{
+    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
+                            (IOHandler *)io_read, NULL,
+                            (AioFlushHandler *)io_flush, notifier);
+}
+#endif
diff --git a/qemu-aio.h b/qemu-aio.h
index 2ed6ad3..f8a93d8 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -15,6 +15,7 @@
 #define QEMU_AIO_H
 
 #include "qemu-common.h"
+#include "qemu-queue.h"
 #include "event_notifier.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
@@ -43,6 +44,15 @@ typedef void QEMUBHFunc(void *opaque);
 typedef void IOHandler(void *opaque);
 
 typedef struct AioContext {
+    /* The list of registered AIO handlers */
+    QLIST_HEAD(, AioHandler) aio_handlers;
+
+    /* This is a simple lock used to protect the aio_handlers list.
+     * Specifically, it's used to ensure that no callbacks are removed while
+     * we're walking and dispatching callbacks.
+     */
+    int walking_handlers;
+
     /* Anchor of the list of Bottom Halves belonging to the context */
     struct QEMUBH *first_bh;
 
@@ -121,7 +131,7 @@ void qemu_bh_delete(QEMUBH *bh);
 
 /* Flush any pending AIO operation. This function will block until all
  * outstanding AIO operations have been completed or cancelled. */
-void qemu_aio_flush(void);
+void aio_flush(AioContext *ctx);
 
 /* Wait for a single AIO completion to occur.  This function will wait
  * until a single AIO event has completed and it will ensure something
@@ -129,7 +139,7 @@ void qemu_aio_flush(void);
  * result of executing I/O completion or bh callbacks.
  *
  * Return whether there is still any pending AIO operation.  */
-bool qemu_aio_wait(void);
+bool aio_wait(AioContext *ctx);
 
 #ifdef CONFIG_POSIX
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
@@ -142,11 +152,12 @@ typedef int (AioFlushHandler)(void *opaque);
  * Code that invokes AIO completion functions should rely on this function
  * instead of qemu_set_fd_handler[2].
  */
-void qemu_aio_set_fd_handler(int fd,
-                             IOHandler *io_read,
-                             IOHandler *io_write,
-                             AioFlushHandler *io_flush,
-                             void *opaque);
+void aio_set_fd_handler(AioContext *ctx,
+                        int fd,
+                        IOHandler *io_read,
+                        IOHandler *io_write,
+                        AioFlushHandler *io_flush,
+                        void *opaque);
 #endif
 
 /* Register an event notifier and associated callbacks.  Behaves very similarly
@@ -156,8 +167,25 @@ void qemu_aio_set_fd_handler(int fd,
  * Code that invokes AIO completion functions should rely on this function
  * instead of event_notifier_set_handler.
  */
+void aio_set_event_notifier(AioContext *ctx,
+                            EventNotifier *notifier,
+                            EventNotifierHandler *io_read,
+                            AioFlushEventNotifierHandler *io_flush);
+
+/* Functions to operate on the main QEMU AioContext.  */
+
+void qemu_aio_flush(void);
+bool qemu_aio_wait(void);
 void qemu_aio_set_event_notifier(EventNotifier *notifier,
                                  EventNotifierHandler *io_read,
                                  AioFlushEventNotifierHandler *io_flush);
 
+#ifdef CONFIG_POSIX
+void qemu_aio_set_fd_handler(int fd,
+                             IOHandler *io_read,
+                             IOHandler *io_write,
+                             AioFlushHandler *io_flush,
+                             void *opaque);
+#endif
+
 #endif
-- 
1.7.12


* [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (6 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 21:56   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This will be used when polling the GSource attached to an AioContext.
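
The non-blocking case boils down to calling select() with a zeroed timeout
(the static tv0 in the patch) so that only already-pending events are
reported.  A minimal stand-alone sketch of that distinction, with
illustrative names (not the QEMU implementation):

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns nonzero if fd is readable; never blocks when blocking == 0. */
static int poll_fd(int fd, int blocking)
{
    static struct timeval tv0;   /* zeroed: "return immediately" */
    fd_set rdfds;

    FD_ZERO(&rdfds);
    FD_SET(fd, &rdfds);
    return select(fd + 1, &rdfds, NULL, NULL, blocking ? NULL : &tv0) > 0;
}
```

tv0 can stay static because a zero timeout is never decremented to anything
other than zero, even on systems where select() modifies its timeout argument.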

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio.c       | 16 ++++++++++++----
 async.c     |  2 +-
 main-loop.c |  2 +-
 qemu-aio.h  | 21 +++++++++++++++------
 4 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/aio.c b/aio.c
index c89f1e9..95ad467 100644
--- a/aio.c
+++ b/aio.c
@@ -93,13 +93,16 @@ void aio_set_event_notifier(AioContext *ctx,
                        (AioFlushHandler *)io_flush, notifier);
 }
 
-bool aio_wait(AioContext *ctx)
+bool aio_poll(AioContext *ctx, bool blocking)
 {
+    static struct timeval tv0;
     AioHandler *node;
     fd_set rdfds, wrfds;
     int max_fd = -1;
     int ret;
-    bool busy;
+    bool busy, progress;
+
+    progress = false;
 
     /*
     * If there are callbacks left that have been queued, we need to call them.
@@ -107,6 +110,11 @@ bool aio_wait(AioContext *ctx)
      * does not need a complete flush (as is the case for qemu_aio_wait loops).
      */
     if (aio_bh_poll(ctx)) {
+        blocking = false;
+        progress = true;
+    }
+
+    if (progress && !blocking) {
         return true;
     }
 
@@ -142,11 +150,11 @@ bool aio_wait(AioContext *ctx)
 
     /* No AIO operations?  Get us out of here */
     if (!busy) {
-        return false;
+        return progress;
     }
 
     /* wait until next event */
-    ret = select(max_fd, &rdfds, &wrfds, NULL, NULL);
+    ret = select(max_fd, &rdfds, &wrfds, NULL, blocking ? NULL : &tv0);
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
diff --git a/async.c b/async.c
index c99db79..513bdd7 100644
--- a/async.c
+++ b/async.c
@@ -144,5 +144,5 @@ AioContext *aio_context_new(void)
 
 void aio_flush(AioContext *ctx)
 {
-    while (aio_wait(ctx));
+    while (aio_poll(ctx, true));
 }
diff --git a/main-loop.c b/main-loop.c
index 8301fe9..67800fe 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -531,7 +531,7 @@ void qemu_aio_flush(void)
 
 bool qemu_aio_wait(void)
 {
-    return aio_wait(qemu_aio_context);
+    return aio_poll(qemu_aio_context, true);
 }
 
 void qemu_aio_set_fd_handler(int fd,
diff --git a/qemu-aio.h b/qemu-aio.h
index f8a93d8..f19201e 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -133,13 +133,22 @@ void qemu_bh_delete(QEMUBH *bh);
  * outstanding AIO operations have been completed or cancelled. */
 void aio_flush(AioContext *ctx);
 
-/* Wait for a single AIO completion to occur.  This function will wait
- * until a single AIO event has completed and it will ensure something
- * has moved before returning. This can issue new pending aio as
- * result of executing I/O completion or bh callbacks.
+/* Make progress in completing AIO work.  This can issue new pending
+ * aio as a result of executing I/O completion or bh callbacks.
  *
- * Return whether there is still any pending AIO operation.  */
-bool aio_wait(AioContext *ctx);
+ * If there is no pending AIO operation or completion (bottom half),
+ * return false.  If there are pending bottom halves, return true.
+ *
+ * If there are no pending bottom halves, but there are pending AIO
+ * operations, it may not be possible to make any progress without
+ * blocking.  If @blocking is true, this function will wait until one
+ * or more AIO events have completed, to ensure something has moved
+ * before returning.
+ *
+ * If @blocking is false, this function will also return false if the
+ * function cannot make any progress without blocking.
+ */
+bool aio_poll(AioContext *ctx, bool blocking);
 
 #ifdef CONFIG_POSIX
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
-- 
1.7.12


* [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (7 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 22:01   ` Anthony Liguori
  2012-09-29 11:28   ` Blue Swirl
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation Paolo Bonzini
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This adds a GPollFD to each AioHandler.  It will then be possible to
attach these GPollFDs to a GSource, and from there to the main loop.
aio_poll examines the GPollFDs and avoids calling select() if any
is set (similar to what it does if bottom halves are available).
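
The dispatch loop added here follows the usual GPollFD protocol: the poller
fills pfd.revents, and the dispatcher masks it against the events the handler
registered for, clears what it consumed, and invokes the callbacks.  A
stand-alone sketch of that protocol (the constants and struct mirror glib's
GPollFD but are defined locally so the sketch compiles on its own):

```c
#include <assert.h>

enum { IO_IN = 1, IO_OUT = 4, IO_ERR = 8, IO_HUP = 16 };  /* like G_IO_* */

typedef struct {
    int fd;
    unsigned short events;    /* what we asked to be polled for */
    unsigned short revents;   /* what the poller reported */
} PollFD;

static int reads, writes;     /* stand-ins for io_read/io_write callbacks */
static void on_read(void)  { reads++; }
static void on_write(void) { writes++; }

static int dispatch(PollFD *pfd)
{
    int progress = 0;
    unsigned short revents = pfd->revents & pfd->events;

    pfd->revents &= ~revents;            /* consume only what we handle */
    if (revents & (IO_IN | IO_HUP | IO_ERR)) {
        on_read();
        progress = 1;
    }
    if (revents & (IO_OUT | IO_ERR)) {
        on_write();
        progress = 1;
    }
    return progress;
}
```

Note that events the handler did not register for are left in revents
untouched, exactly as in the patch's `node->pfd.revents &= ~revents` line.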

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio.c      | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 qemu-aio.h |  7 ++++++
 2 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/aio.c b/aio.c
index 95ad467..c848a9f 100644
--- a/aio.c
+++ b/aio.c
@@ -20,7 +20,7 @@
 
 struct AioHandler
 {
-    int fd;
+    GPollFD pfd;
     IOHandler *io_read;
     IOHandler *io_write;
     AioFlushHandler *io_flush;
@@ -34,7 +34,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
     AioHandler *node;
 
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        if (node->fd == fd)
+        if (node->pfd.fd == fd)
             if (!node->deleted)
                 return node;
     }
@@ -57,9 +57,10 @@ void aio_set_fd_handler(AioContext *ctx,
     if (!io_read && !io_write) {
         if (node) {
             /* If the lock is held, just mark the node as deleted */
-            if (ctx->walking_handlers)
+            if (ctx->walking_handlers) {
                 node->deleted = 1;
-            else {
+                node->pfd.revents = 0;
+            } else {
                 /* Otherwise, delete it for real.  We can't just mark it as
                  * deleted because deleted nodes are only cleaned up after
                  * releasing the walking_handlers lock.
@@ -72,7 +73,7 @@ void aio_set_fd_handler(AioContext *ctx,
         if (node == NULL) {
             /* Alloc and insert if it's not already there */
             node = g_malloc0(sizeof(AioHandler));
-            node->fd = fd;
+            node->pfd.fd = fd;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
         }
         /* Update handler with latest information */
@@ -80,6 +81,10 @@ void aio_set_fd_handler(AioContext *ctx,
         node->io_write = io_write;
         node->io_flush = io_flush;
         node->opaque = opaque;
+
+        node->pfd.events = G_IO_ERR;
+        node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
+        node->pfd.events |= (io_write ? G_IO_OUT : 0);
     }
 }
 
@@ -93,6 +98,25 @@ void aio_set_event_notifier(AioContext *ctx,
                        (AioFlushHandler *)io_flush, notifier);
 }
 
+bool aio_pending(AioContext *ctx)
+{
+    AioHandler *node;
+
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+        int revents;
+
+        revents = node->pfd.revents & node->pfd.events;
+        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
+            return true;
+        }
+        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     static struct timeval tv0;
@@ -114,6 +138,42 @@ bool aio_poll(AioContext *ctx, bool blocking)
         progress = true;
     }
 
+    /*
+     * Then dispatch any pending callbacks from the GSource.
+     *
+     * We have to walk very carefully in case qemu_aio_set_fd_handler is
+     * called while we're walking.
+     */
+    node = QLIST_FIRST(&ctx->aio_handlers);
+    while (node) {
+        AioHandler *tmp;
+        int revents;
+
+        ctx->walking_handlers++;
+
+        revents = node->pfd.revents & node->pfd.events;
+        node->pfd.revents &= ~revents;
+
+        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
+            node->io_read(node->opaque);
+            progress = true;
+        }
+        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
+            node->io_write(node->opaque);
+            progress = true;
+        }
+
+        tmp = node;
+        node = QLIST_NEXT(node, node);
+
+        ctx->walking_handlers--;
+
+        if (!ctx->walking_handlers && tmp->deleted) {
+            QLIST_REMOVE(tmp, node);
+            g_free(tmp);
+        }
+    }
+
     if (progress && !blocking) {
         return true;
     }
@@ -137,12 +197,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
             busy = true;
         }
         if (!node->deleted && node->io_read) {
-            FD_SET(node->fd, &rdfds);
-            max_fd = MAX(max_fd, node->fd + 1);
+            FD_SET(node->pfd.fd, &rdfds);
+            max_fd = MAX(max_fd, node->pfd.fd + 1);
         }
         if (!node->deleted && node->io_write) {
-            FD_SET(node->fd, &wrfds);
-            max_fd = MAX(max_fd, node->fd + 1);
+            FD_SET(node->pfd.fd, &wrfds);
+            max_fd = MAX(max_fd, node->pfd.fd + 1);
         }
     }
 
@@ -167,12 +227,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
             ctx->walking_handlers++;
 
             if (!node->deleted &&
-                FD_ISSET(node->fd, &rdfds) &&
+                FD_ISSET(node->pfd.fd, &rdfds) &&
                 node->io_read) {
                 node->io_read(node->opaque);
             }
             if (!node->deleted &&
-                FD_ISSET(node->fd, &wrfds) &&
+                FD_ISSET(node->pfd.fd, &wrfds) &&
                 node->io_write) {
                 node->io_write(node->opaque);
             }
diff --git a/qemu-aio.h b/qemu-aio.h
index f19201e..ac24896 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -133,6 +133,13 @@ void qemu_bh_delete(QEMUBH *bh);
  * outstanding AIO operations have been completed or cancelled. */
 void aio_flush(AioContext *ctx);
 
+/* Return whether there are any pending callbacks from the GSource
+ * attached to the AioContext.
+ *
+ * This is used internally in the implementation of the GSource.
+ */
+bool aio_pending(AioContext *ctx);
+
 /* Make progress in completing AIO work.  This can issue new pending
  * aio as a result of executing I/O completion or bh callbacks.
  *
-- 
1.7.12


* [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (8 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

The Win32 implementation will only accept EventNotifiers, thus a few
drivers are disabled under Windows.  EventNotifiers are a good match
for the GSource implementation, too, because the Win32 port of glib
allows placing their HANDLEs in a GPollFD.  This is important so
that AioContexts can get an equivalent of qemu_notify_event.
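
Since the Win32 aio code keys everything on EventNotifiers, it may help to
recall what that primitive does.  Here is a hypothetical pipe-backed notifier
in the spirit of the series' "event_notifier: enable it to use pipes" patch
(stand-alone illustrative code, not QEMU's): set() makes the read end
readable, and test_and_clear() drains it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef struct { int rfd, wfd; } Notifier;

static int notifier_init(Notifier *n)
{
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);  /* so draining never blocks */
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

static void notifier_set(Notifier *n)
{
    ssize_t ret;
    do {
        ret = write(n->wfd, "", 1);      /* wake up anyone polling rfd */
    } while (ret < 0 && errno == EINTR);
}

/* Returns 1 if the notifier had been set since the last clear. */
static int notifier_test_and_clear(Notifier *n)
{
    char buf[64];
    int set = 0;
    while (read(n->rfd, buf, sizeof(buf)) > 0) {
        set = 1;                         /* drain all pending wakeups */
    }
    return set;
}
```

The read end is what gets handed to the event loop (a GPollFD on POSIX; on
Win32 the real implementation uses a HANDLE instead of a pipe).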

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs        |   6 +--
 aio.c => aio-posix.c |   0
 aio.c => aio-win32.c | 131 +++++++++++++++++----------------------------------
 block/Makefile.objs  |   6 ++-
 main-loop.c          |   2 +-
 5 files changed, 52 insertions(+), 93 deletions(-)
 copy aio.c => aio-posix.c (100%)
 rename aio.c => aio-win32.c (59%)

diff --git a/Makefile.objs b/Makefile.objs
index 713dd87..8d2cbb9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -42,11 +42,11 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
-block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
+block-obj-y += nbd.o block.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
-block-obj-$(CONFIG_POSIX) += event_notifier-posix.o
-block-obj-$(CONFIG_WIN32) += event_notifier-win32.o
+block-obj-$(CONFIG_POSIX) += event_notifier-posix.o aio-posix.o
+block-obj-$(CONFIG_WIN32) += event_notifier-win32.o aio-win32.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += block/
 
diff --git a/aio.c b/aio-posix.c
similarity index 100%
copy from aio.c
copy to aio-posix.c
diff --git a/aio.c b/aio-win32.c
similarity index 59%
rename from aio.c
rename to aio-win32.c
index c848a9f..c46dfb2 100644
--- a/aio.c
+++ b/aio-win32.c
@@ -1,10 +1,12 @@
 /*
  * QEMU aio implementation
  *
- * Copyright IBM, Corp. 2008
+ * Copyright IBM Corp., 2008
+ * Copyright Red Hat Inc., 2012
  *
  * Authors:
  *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Paolo Bonzini     <pbonzini@redhat.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -18,43 +20,30 @@
 #include "qemu-queue.h"
 #include "qemu_socket.h"
 
-struct AioHandler
-{
+struct AioHandler {
+    EventNotifier *e;
+    EventNotifierHandler *io_notify;
+    AioFlushEventNotifierHandler *io_flush;
     GPollFD pfd;
-    IOHandler *io_read;
-    IOHandler *io_write;
-    AioFlushHandler *io_flush;
     int deleted;
-    void *opaque;
     QLIST_ENTRY(AioHandler) node;
 };
 
-static AioHandler *find_aio_handler(AioContext *ctx, int fd)
+void aio_set_event_notifier(AioContext *ctx,
+                            EventNotifier *e,
+                            EventNotifierHandler *io_notify,
+                            AioFlushEventNotifierHandler *io_flush)
 {
     AioHandler *node;
 
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        if (node->pfd.fd == fd)
-            if (!node->deleted)
-                return node;
+        if (node->e == e && !node->deleted) {
+            break;
+        }
     }
 
-    return NULL;
-}
-
-void aio_set_fd_handler(AioContext *ctx,
-                        int fd,
-                        IOHandler *io_read,
-                        IOHandler *io_write,
-                        AioFlushHandler *io_flush,
-                        void *opaque)
-{
-    AioHandler *node;
-
-    node = find_aio_handler(ctx, fd);
-
     /* Are we deleting the fd handler? */
-    if (!io_read && !io_write) {
+    if (!io_notify) {
         if (node) {
             /* If the lock is held, just mark the node as deleted */
             if (ctx->walking_handlers) {
@@ -73,43 +62,23 @@ void aio_set_fd_handler(AioContext *ctx,
         if (node == NULL) {
             /* Alloc and insert if it's not already there */
             node = g_malloc0(sizeof(AioHandler));
-            node->pfd.fd = fd;
+            node->e = e;
+            node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
+            node->pfd.events = G_IO_IN;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
         }
         /* Update handler with latest information */
-        node->io_read = io_read;
-        node->io_write = io_write;
+        node->io_notify = io_notify;
         node->io_flush = io_flush;
-        node->opaque = opaque;
-
-        node->pfd.events = G_IO_ERR;
-        node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
-        node->pfd.events |= (io_write ? G_IO_OUT : 0);
     }
 }
 
-void aio_set_event_notifier(AioContext *ctx,
-                            EventNotifier *notifier,
-                            EventNotifierHandler *io_read,
-                            AioFlushEventNotifierHandler *io_flush)
-{
-    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
-                       (IOHandler *)io_read, NULL,
-                       (AioFlushHandler *)io_flush, notifier);
-}
-
 bool aio_pending(AioContext *ctx)
 {
     AioHandler *node;
 
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-        int revents;
-
-        revents = node->pfd.revents & node->pfd.events;
-        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
-            return true;
-        }
-        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
+        if (node->pfd.revents && node->io_notify) {
             return true;
         }
     }
@@ -119,12 +88,10 @@ bool aio_pending(AioContext *ctx)
 
 bool aio_poll(AioContext *ctx, bool blocking)
 {
-    static struct timeval tv0;
     AioHandler *node;
-    fd_set rdfds, wrfds;
-    int max_fd = -1;
-    int ret;
+    HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
     bool busy, progress;
+    int count;
 
     progress = false;
 
@@ -147,19 +114,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
     node = QLIST_FIRST(&ctx->aio_handlers);
     while (node) {
         AioHandler *tmp;
-        int revents;
 
         ctx->walking_handlers++;
 
-        revents = node->pfd.revents & node->pfd.events;
-        node->pfd.revents &= ~revents;
-
-        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
-            node->io_read(node->opaque);
-            progress = true;
-        }
-        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
-            node->io_write(node->opaque);
+        if (node->pfd.revents && node->io_notify) {
+            node->pfd.revents = 0;
+            node->io_notify(node->e);
             progress = true;
         }
 
@@ -180,29 +140,22 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     ctx->walking_handlers++;
 
-    FD_ZERO(&rdfds);
-    FD_ZERO(&wrfds);
-
     /* fill fd sets */
     busy = false;
+    count = 0;
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         /* If there aren't pending AIO operations, don't invoke callbacks.
          * Otherwise, if there are no AIO requests, qemu_aio_wait() would
          * wait indefinitely.
          */
         if (node->io_flush) {
-            if (node->io_flush(node->opaque) == 0) {
+            if (node->io_flush(node->e) == 0) {
                 continue;
             }
             busy = true;
         }
-        if (!node->deleted && node->io_read) {
-            FD_SET(node->pfd.fd, &rdfds);
-            max_fd = MAX(max_fd, node->pfd.fd + 1);
-        }
-        if (!node->deleted && node->io_write) {
-            FD_SET(node->pfd.fd, &wrfds);
-            max_fd = MAX(max_fd, node->pfd.fd + 1);
+        if (!node->deleted && node->io_notify) {
+            events[count++] = event_notifier_get_handle(node->e);
         }
     }
 
@@ -210,14 +163,21 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     /* No AIO operations?  Get us out of here */
     if (!busy) {
-        return progress;
+        return false;
     }
 
     /* wait until next event */
-    ret = select(max_fd, &rdfds, &wrfds, NULL, blocking ? NULL : &tv0);
+    for (;;) {
+        int timeout = blocking ? INFINITE : 0;
+        int ret = WaitForMultipleObjects(count, events, FALSE, timeout);
+
+        /* if we have any signaled events, dispatch event */
+        if ((DWORD) (ret - WAIT_OBJECT_0) >= count) {
+            break;
+        }
+
+        blocking = false;
 
-    /* if we have any readable fds, dispatch event */
-    if (ret > 0) {
         /* we have to walk very carefully in case
          * qemu_aio_set_fd_handler is called while we're walking */
         node = QLIST_FIRST(&ctx->aio_handlers);
@@ -227,14 +187,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
             ctx->walking_handlers++;
 
             if (!node->deleted &&
-                FD_ISSET(node->pfd.fd, &rdfds) &&
-                node->io_read) {
-                node->io_read(node->opaque);
-            }
-            if (!node->deleted &&
-                FD_ISSET(node->pfd.fd, &wrfds) &&
-                node->io_write) {
-                node->io_write(node->opaque);
+                event_notifier_get_handle(node->e) == events[ret - WAIT_OBJECT_0] &&
+                node->io_notify) {
+                node->io_notify(node->e);
             }
 
             tmp = node;
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..65d4dc6 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,10 +2,14 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += stream.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
+
+ifeq ($(CONFIG_POSIX),y)
+block-obj-y += nbd.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
+endif
diff --git a/main-loop.c b/main-loop.c
index 67800fe..b290c79 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -534,6 +534,7 @@ bool qemu_aio_wait(void)
     return aio_poll(qemu_aio_context, true);
 }
 
+#ifdef CONFIG_POSIX
 void qemu_aio_set_fd_handler(int fd,
                              IOHandler *io_read,
                              IOHandler *io_write,
@@ -546,7 +547,6 @@ void qemu_aio_set_fd_handler(int fd,
     qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
 }
 
-#ifdef CONFIG_POSIX
 void qemu_aio_set_event_notifier(EventNotifier *notifier,
                                  EventNotifierHandler *io_read,
                                  AioFlushEventNotifierHandler *io_flush)
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (9 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 22:06   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

This lets AioContexts be used (optionally) with a glib main loop.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio-posix.c |  4 ++++
 aio-win32.c |  4 ++++
 async.c     | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qemu-aio.h  | 23 ++++++++++++++++++++++
 4 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/aio-posix.c b/aio-posix.c
index c848a9f..e29ece9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -56,6 +56,8 @@ void aio_set_fd_handler(AioContext *ctx,
     /* Are we deleting the fd handler? */
     if (!io_read && !io_write) {
         if (node) {
+            g_source_remove_poll(&ctx->source, &node->pfd);
+
             /* If the lock is held, just mark the node as deleted */
             if (ctx->walking_handlers) {
                 node->deleted = 1;
@@ -75,6 +77,8 @@ void aio_set_fd_handler(AioContext *ctx,
             node = g_malloc0(sizeof(AioHandler));
             node->pfd.fd = fd;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
+
+            g_source_add_poll(&ctx->source, &node->pfd);
         }
         /* Update handler with latest information */
         node->io_read = io_read;
diff --git a/aio-win32.c b/aio-win32.c
index c46dfb2..5057371 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -45,6 +45,8 @@ void aio_set_event_notifier(AioContext *ctx,
     /* Are we deleting the fd handler? */
     if (!io_notify) {
         if (node) {
+            g_source_remove_poll(&ctx->source, &node->pfd);
+
             /* If the lock is held, just mark the node as deleted */
             if (ctx->walking_handlers) {
                 node->deleted = 1;
@@ -66,6 +68,8 @@ void aio_set_event_notifier(AioContext *ctx,
             node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
             node->pfd.events = G_IO_IN;
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
+
+            g_source_add_poll(&ctx->source, &node->pfd);
         }
         /* Update handler with latest information */
         node->io_notify = io_notify;
diff --git a/async.c b/async.c
index 513bdd7..ed2bd3f 100644
--- a/async.c
+++ b/async.c
@@ -136,10 +136,73 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
     }
 }
 
+static gboolean
+aio_ctx_prepare(GSource *source, gint    *timeout)
+{
+    AioContext *ctx = (AioContext *) source;
+    uint32_t wait = -1;
+    aio_bh_update_timeout(ctx, &wait);
+
+    if (wait != -1) {
+        *timeout = MIN(*timeout, wait);
+        return wait == 0;
+    }
+
+    return FALSE;
+}
+
+static gboolean
+aio_ctx_check(GSource *source)
+{
+    AioContext *ctx = (AioContext *) source;
+    QEMUBH *bh;
+
+    for (bh = ctx->first_bh; bh; bh = bh->next) {
+        if (!bh->deleted && bh->scheduled) {
+            return true;
+	}
+    }
+    return aio_pending(ctx);
+}
+
+static gboolean
+aio_ctx_dispatch(GSource     *source,
+                 GSourceFunc  callback,
+                 gpointer     user_data)
+{
+    AioContext *ctx = (AioContext *) source;
+
+    assert(callback == NULL);
+    aio_poll(ctx, false);
+    return TRUE;
+}
+
+static GSourceFuncs aio_source_funcs = {
+    aio_ctx_prepare,
+    aio_ctx_check,
+    aio_ctx_dispatch,
+    NULL
+};
+
+GSource *aio_get_g_source(AioContext *ctx)
+{
+    g_source_ref(&ctx->source);
+    return &ctx->source;
+}
 
 AioContext *aio_context_new(void)
 {
-    return g_new0(AioContext, 1);
+    return (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+}
+
+void aio_context_ref(AioContext *ctx)
+{
+    g_source_ref(&ctx->source);
+}
+
+void aio_context_unref(AioContext *ctx)
+{
+    g_source_unref(&ctx->source);
 }
 
 void aio_flush(AioContext *ctx)
diff --git a/qemu-aio.h b/qemu-aio.h
index ac24896..aedf66c 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -44,6 +44,8 @@ typedef void QEMUBHFunc(void *opaque);
 typedef void IOHandler(void *opaque);
 
 typedef struct AioContext {
+    GSource source;
+
     /* The list of registered AIO handlers */
     QLIST_HEAD(, AioHandler) aio_handlers;
 
@@ -75,6 +77,22 @@ typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
 AioContext *aio_context_new(void);
 
 /**
+ * aio_context_ref:
+ * @ctx: The AioContext to operate on.
+ *
+ * Add a reference to an AioContext.
+ */
+void aio_context_ref(AioContext *ctx);
+
+/**
+ * aio_context_unref:
+ * @ctx: The AioContext to operate on.
+ *
+ * Drop a reference to an AioContext.
+ */
+void aio_context_unref(AioContext *ctx);
+
+/**
  * aio_bh_new: Allocate a new bottom half structure.
  *
  * Bottom halves are lightweight callbacks whose invocation is guaranteed
@@ -188,6 +206,11 @@ void aio_set_event_notifier(AioContext *ctx,
                             EventNotifierHandler *io_read,
                             AioFlushEventNotifierHandler *io_flush);
 
+/* Return a GSource that lets the main loop poll the file descriptors attached
+ * to this AioContext.
+ */
+GSource *aio_get_g_source(AioContext *ctx);
+
 /* Functions to operate on the main QEMU AioContext.  */
 
 void qemu_aio_flush(void);
-- 
1.7.12


* [Qemu-devel] [PATCH 12/17] aio: add aio_notify
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (10 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 22:07   ` Anthony Liguori
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

With this change, async.c no longer relies on any service from
main-loop.c; it is completely self-contained.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c    | 30 ++++++++++++++++++++++++++----
 qemu-aio.h | 18 ++++++++++++++++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/async.c b/async.c
index ed2bd3f..31c6c76 100644
--- a/async.c
+++ b/async.c
@@ -30,6 +30,7 @@
 /* bottom halves (can be seen as timers which expire ASAP) */
 
 struct QEMUBH {
+    AioContext *ctx;
     QEMUBHFunc *cb;
     void *opaque;
     QEMUBH *next;
@@ -42,6 +43,7 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
     QEMUBH *bh;
     bh = g_malloc0(sizeof(QEMUBH));
+    bh->ctx = ctx;
     bh->cb = cb;
     bh->opaque = opaque;
     bh->next = ctx->first_bh;
@@ -101,8 +103,7 @@ void qemu_bh_schedule(QEMUBH *bh)
         return;
     bh->scheduled = 1;
     bh->idle = 0;
-    /* stop the currently executing CPU to execute the BH ASAP */
-    qemu_notify_event();
+    aio_notify(bh->ctx);
 }
 
 void qemu_bh_cancel(QEMUBH *bh)
@@ -177,11 +178,20 @@ aio_ctx_dispatch(GSource     *source,
     return TRUE;
 }
 
+static void
+aio_ctx_finalize(GSource     *source)
+{
+    AioContext *ctx = (AioContext *) source;
+
+    aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
+    event_notifier_cleanup(&ctx->notifier);
+}
+
 static GSourceFuncs aio_source_funcs = {
     aio_ctx_prepare,
     aio_ctx_check,
     aio_ctx_dispatch,
-    NULL
+    aio_ctx_finalize
 };
 
 GSource *aio_get_g_source(AioContext *ctx)
@@ -190,9 +200,21 @@ GSource *aio_get_g_source(AioContext *ctx)
     return &ctx->source;
 }
 
+void aio_notify(AioContext *ctx)
+{
+    event_notifier_set(&ctx->notifier);
+}
+
 AioContext *aio_context_new(void)
 {
-    return (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+    AioContext *ctx;
+    ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+    event_notifier_init(&ctx->notifier, false);
+    aio_set_event_notifier(ctx, &ctx->notifier, 
+                           (EventNotifierHandler *)
+                           event_notifier_test_and_clear, NULL);
+
+    return ctx;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/qemu-aio.h b/qemu-aio.h
index aedf66c..2354617 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -62,6 +62,9 @@ typedef struct AioContext {
      * no callbacks are removed while we're walking and dispatching callbacks.
      */
     int walking_bh;
+
+    /* Used for aio_notify.  */
+    EventNotifier notifier;
 } AioContext;
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
@@ -102,6 +105,21 @@ void aio_context_unref(AioContext *ctx);
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
 
 /**
+ * aio_notify: Force processing of pending events.
+ *
+ * Similar to signaling a condition variable, aio_notify forces
+ * aio_wait to exit, so that the next call will re-examine pending events.
+ * The caller of aio_notify will usually call aio_wait again very soon,
+ * or go through another iteration of the GLib main loop.  Hence, aio_notify
+ * also has the side effect of recalculating the sets of file descriptors
+ * that the main loop waits for.
+ *
+ * Calling aio_notify is rarely necessary, because for example scheduling
+ * a bottom half calls it already.
+ */
+void aio_notify(AioContext *ctx);
+
+/**
  * aio_bh_poll: Poll bottom halves for an AioContext.
  *
  * These are internal functions used by the QEMU main loop.
-- 
1.7.12


* [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (11 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
@ 2012-09-25 12:55 ` Paolo Bonzini
  2012-09-25 22:07   ` Anthony Liguori
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:55 UTC (permalink / raw)
  To: qemu-devel

In the current code, this is done by qemu_set_fd_handler2, which is
called by qemu_aio_set_fd_handler.  We need to keep the same behavior
even after removing the call to qemu_set_fd_handler2.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 aio-posix.c | 2 ++
 aio-win32.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/aio-posix.c b/aio-posix.c
index e29ece9..41f638f 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -90,6 +90,8 @@ void aio_set_fd_handler(AioContext *ctx,
         node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
         node->pfd.events |= (io_write ? G_IO_OUT : 0);
     }
+
+    aio_notify(ctx);
 }
 
 void aio_set_event_notifier(AioContext *ctx,
diff --git a/aio-win32.c b/aio-win32.c
index 5057371..78faf69 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -75,6 +75,8 @@ void aio_set_event_notifier(AioContext *ctx,
         node->io_notify = io_notify;
         node->io_flush = io_flush;
     }
+
+    aio_notify(ctx);
 }
 
 bool aio_pending(AioContext *ctx)
-- 
1.7.12


* [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (12 preceding siblings ...)
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
@ 2012-09-25 12:56 ` Paolo Bonzini
  2012-09-25 22:09   ` Anthony Liguori
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:56 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 main-loop.c | 23 ++++++-----------------
 main-loop.h |  2 --
 2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index b290c79..209f699 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -205,6 +205,7 @@ static AioContext *qemu_aio_context;
 int main_loop_init(void)
 {
     int ret;
+    GSource *src;
 
     qemu_mutex_lock_iothread();
     ret = qemu_signal_init();
@@ -219,6 +220,9 @@ int main_loop_init(void)
     }
 
     qemu_aio_context = aio_context_new();
+    src = aio_get_g_source(qemu_aio_context);
+    g_source_attach(src, NULL);
+    g_source_unref(src);
     return 0;
 }
 
@@ -481,8 +485,6 @@ int main_loop_wait(int nonblocking)
 
     if (nonblocking) {
         timeout = 0;
-    } else {
-        aio_bh_update_timeout(qemu_aio_context, &timeout);
     }
 
     /* poll any events */
@@ -505,10 +507,6 @@ int main_loop_wait(int nonblocking)
 
     qemu_run_all_timers();
 
-    /* Check bottom-halves last in case any of the earlier events triggered
-       them.  */
-    qemu_bh_poll();
-
     return ret;
 }
 
@@ -519,11 +517,6 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
     return aio_bh_new(qemu_aio_context, cb, opaque);
 }
 
-int qemu_bh_poll(void)
-{
-    return aio_bh_poll(qemu_aio_context);
-}
-
 void qemu_aio_flush(void)
 {
     aio_flush(qemu_aio_context);
@@ -543,16 +536,12 @@ void qemu_aio_set_fd_handler(int fd,
 {
     aio_set_fd_handler(qemu_aio_context, fd, io_read, io_write, io_flush,
                        opaque);
-
-    qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
 }
+#endif
 
 void qemu_aio_set_event_notifier(EventNotifier *notifier,
                                  EventNotifierHandler *io_read,
                                  AioFlushEventNotifierHandler *io_flush)
 {
-    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
-                            (IOHandler *)io_read, NULL,
-                            (AioFlushHandler *)io_flush, notifier);
+    aio_set_event_notifier(qemu_aio_context, notifier, io_read, io_flush);
 }
-#endif
diff --git a/main-loop.h b/main-loop.h
index 47644ce..c58f38b 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -312,7 +312,5 @@ void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc
 
 QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
 void qemu_bh_schedule_idle(QEMUBH *bh);
-int qemu_bh_poll(void);
-void qemu_bh_update_timeout(uint32_t *timeout);
 
 #endif
-- 
1.7.12


* [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (13 preceding siblings ...)
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
@ 2012-09-25 12:56 ` Paolo Bonzini
  2012-09-25 22:10   ` Anthony Liguori
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:56 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 main-loop.c | 106 +++++-------------------------------------------------------
 1 file changed, 8 insertions(+), 98 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index 209f699..978050a 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -32,70 +32,6 @@
 
 #include "compatfd.h"
 
-static int io_thread_fd = -1;
-
-void qemu_notify_event(void)
-{
-    /* Write 8 bytes to be compatible with eventfd.  */
-    static const uint64_t val = 1;
-    ssize_t ret;
-
-    if (io_thread_fd == -1) {
-        return;
-    }
-    do {
-        ret = write(io_thread_fd, &val, sizeof(val));
-    } while (ret < 0 && errno == EINTR);
-
-    /* EAGAIN is fine, a read must be pending.  */
-    if (ret < 0 && errno != EAGAIN) {
-        fprintf(stderr, "qemu_notify_event: write() failed: %s\n",
-                strerror(errno));
-        exit(1);
-    }
-}
-
-static void qemu_event_read(void *opaque)
-{
-    int fd = (intptr_t)opaque;
-    ssize_t len;
-    char buffer[512];
-
-    /* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
-    do {
-        len = read(fd, buffer, sizeof(buffer));
-    } while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
-}
-
-static int qemu_event_init(void)
-{
-    int err;
-    int fds[2];
-
-    err = qemu_eventfd(fds);
-    if (err == -1) {
-        return -errno;
-    }
-    err = fcntl_setfl(fds[0], O_NONBLOCK);
-    if (err < 0) {
-        goto fail;
-    }
-    err = fcntl_setfl(fds[1], O_NONBLOCK);
-    if (err < 0) {
-        goto fail;
-    }
-    qemu_set_fd_handler2(fds[0], NULL, qemu_event_read, NULL,
-                         (void *)(intptr_t)fds[0]);
-
-    io_thread_fd = fds[1];
-    return 0;
-
-fail:
-    close(fds[0]);
-    close(fds[1]);
-    return err;
-}
-
 /* If we have signalfd, we mask out the signals we want to handle and then
  * use signalfd to listen for them.  We rely on whatever the current signal
  * handler is to dispatch the signals when we receive them.
@@ -165,43 +101,22 @@ static int qemu_signal_init(void)
 
 #else /* _WIN32 */
 
-static HANDLE qemu_event_handle = NULL;
-
-static void dummy_event_handler(void *opaque)
-{
-}
-
-static int qemu_event_init(void)
+static int qemu_signal_init(void)
 {
-    qemu_event_handle = CreateEvent(NULL, FALSE, FALSE, NULL);
-    if (!qemu_event_handle) {
-        fprintf(stderr, "Failed CreateEvent: %ld\n", GetLastError());
-        return -1;
-    }
-    qemu_add_wait_object(qemu_event_handle, dummy_event_handler, NULL);
     return 0;
 }
+#endif
+
+static AioContext *qemu_aio_context;
 
 void qemu_notify_event(void)
 {
-    if (!qemu_event_handle) {
+    if (!qemu_aio_context) {
         return;
     }
-    if (!SetEvent(qemu_event_handle)) {
-        fprintf(stderr, "qemu_notify_event: SetEvent failed: %ld\n",
-                GetLastError());
-        exit(1);
-    }
+    aio_notify(qemu_aio_context);
 }
 
-static int qemu_signal_init(void)
-{
-    return 0;
-}
-#endif
-
-static AioContext *qemu_aio_context;
-
 int main_loop_init(void)
 {
     int ret;
@@ -213,12 +128,6 @@ int main_loop_init(void)
         return ret;
     }
 
-    /* Note eventfd must be drained before signalfd handlers run */
-    ret = qemu_event_init();
-    if (ret) {
-        return ret;
-    }
-
     qemu_aio_context = aio_context_new();
     src = aio_get_g_source(qemu_aio_context);
     g_source_attach(src, NULL);
@@ -408,7 +317,8 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque)
 
 void qemu_fd_register(int fd)
 {
-    WSAEventSelect(fd, qemu_event_handle, FD_READ | FD_ACCEPT | FD_CLOSE |
+    WSAEventSelect(fd, event_notifier_get_handle(&qemu_aio_context->notifier),
+                   FD_READ | FD_ACCEPT | FD_CLOSE |
                    FD_CONNECT | FD_WRITE | FD_OOB);
 }
 
-- 
1.7.12


* [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (14 preceding siblings ...)
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
@ 2012-09-25 12:56 ` Paolo Bonzini
  2012-09-25 22:11   ` Anthony Liguori
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers Paolo Bonzini
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:56 UTC (permalink / raw)
  To: qemu-devel

Now that the main loop no longer needs hooks into the bottom half code,
some cleanups can be made.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c       | 23 +++++++----------------
 oslib-posix.c | 31 -------------------------------
 qemu-aio.h    |  1 -
 qemu-common.h |  1 -
 4 files changed, 7 insertions(+), 49 deletions(-)

diff --git a/async.c b/async.c
index 31c6c76..5a96d11 100644
--- a/async.c
+++ b/async.c
@@ -117,16 +117,20 @@ void qemu_bh_delete(QEMUBH *bh)
     bh->deleted = 1;
 }
 
-void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
+static gboolean
+aio_ctx_prepare(GSource *source, gint    *timeout)
 {
+    AioContext *ctx = (AioContext *) source;
     QEMUBH *bh;
+    bool scheduled = false;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
+            scheduled = true;
             if (bh->idle) {
                 /* idle bottom halves will be polled at least
                  * every 10ms */
-                *timeout = MIN(10, *timeout);
+                *timeout = 10;
             } else {
                 /* non-idle bottom halves will be executed
                  * immediately */
@@ -135,21 +139,8 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
             }
         }
     }
-}
-
-static gboolean
-aio_ctx_prepare(GSource *source, gint    *timeout)
-{
-    AioContext *ctx = (AioContext *) source;
-    uint32_t wait = -1;
-    aio_bh_update_timeout(ctx, &wait);
-
-    if (wait != -1) {
-        *timeout = MIN(*timeout, wait);
-        return wait == 0;
-    }
 
-    return FALSE;
+    return scheduled;
 }
 
 static gboolean
diff --git a/oslib-posix.c b/oslib-posix.c
index dbeb627..9db9c3d 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -61,9 +61,6 @@ static int running_on_valgrind = -1;
 #ifdef CONFIG_LINUX
 #include <sys/syscall.h>
 #endif
-#ifdef CONFIG_EVENTFD
-#include <sys/eventfd.h>
-#endif
 
 int qemu_get_thread_id(void)
 {
@@ -183,34 +180,6 @@ int qemu_pipe(int pipefd[2])
     return ret;
 }
 
-/*
- * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
- */
-int qemu_eventfd(int fds[2])
-{
-#ifdef CONFIG_EVENTFD
-    int ret;
-
-    ret = eventfd(0, 0);
-    if (ret >= 0) {
-        fds[0] = ret;
-        fds[1] = dup(ret);
-        if (fds[1] == -1) {
-            close(ret);
-            return -1;
-        }
-        qemu_set_cloexec(ret);
-        qemu_set_cloexec(fds[1]);
-        return 0;
-    }
-    if (errno != ENOSYS) {
-        return -1;
-    }
-#endif
-
-    return qemu_pipe(fds);
-}
-
 int qemu_utimens(const char *path, const struct timespec *times)
 {
     struct timeval tv[2], tv_now;
diff --git a/qemu-aio.h b/qemu-aio.h
index 2354617..1b7eb6e 100644
--- a/qemu-aio.h
+++ b/qemu-aio.h
@@ -125,7 +125,6 @@ void aio_notify(AioContext *ctx);
  * These are internal functions used by the QEMU main loop.
  */
 int aio_bh_poll(AioContext *ctx);
-void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout);
 
 /**
  * qemu_bh_schedule: Schedule a bottom half.
diff --git a/qemu-common.h b/qemu-common.h
index ac44657..1ea6ea3 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -219,7 +219,6 @@ ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
     QEMU_WARN_UNUSED_RESULT;
 
 #ifndef _WIN32
-int qemu_eventfd(int pipefd[2]);
 int qemu_pipe(int pipefd[2]);
 #endif
 
-- 
1.7.12


* [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (15 preceding siblings ...)
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
@ 2012-09-25 12:56 ` Paolo Bonzini
  2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
  2012-10-08 11:39 ` Stefan Hajnoczi
  18 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-25 12:56 UTC (permalink / raw)
  To: qemu-devel

Since linux-aio already uses an eventfd, converting it to the
EventNotifier-based API simplifies the code, even though linux-aio
itself is not meant to be portable.

Reviewed-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-aio.c | 49 +++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/linux-aio.c b/linux-aio.c
index ce9b5d4..769b558 100644
--- a/linux-aio.c
+++ b/linux-aio.c
@@ -10,8 +10,8 @@
 #include "qemu-common.h"
 #include "qemu-aio.h"
 #include "block/raw-posix-aio.h"
+#include "event_notifier.h"
 
-#include <sys/eventfd.h>
 #include <libaio.h>
 
 /*
@@ -37,7 +37,7 @@ struct qemu_laiocb {
 
 struct qemu_laio_state {
     io_context_t ctx;
-    int efd;
+    EventNotifier e;
     int count;
 };
 
@@ -76,29 +76,17 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
     qemu_aio_release(laiocb);
 }
 
-static void qemu_laio_completion_cb(void *opaque)
+static void qemu_laio_completion_cb(EventNotifier *e)
 {
-    struct qemu_laio_state *s = opaque;
+    struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
 
-    while (1) {
+    while (event_notifier_test_and_clear(&s->e)) {
         struct io_event events[MAX_EVENTS];
-        uint64_t val;
-        ssize_t ret;
         struct timespec ts = { 0 };
         int nevents, i;
 
         do {
-            ret = read(s->efd, &val, sizeof(val));
-        } while (ret == -1 && errno == EINTR);
-
-        if (ret == -1 && errno == EAGAIN)
-            break;
-
-        if (ret != 8)
-            break;
-
-        do {
-            nevents = io_getevents(s->ctx, val, MAX_EVENTS, events, &ts);
+            nevents = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS, events, &ts);
         } while (nevents == -EINTR);
 
         for (i = 0; i < nevents; i++) {
@@ -112,9 +100,9 @@ static void qemu_laio_completion_cb(void *opaque)
     }
 }
 
-static int qemu_laio_flush_cb(void *opaque)
+static int qemu_laio_flush_cb(EventNotifier *e)
 {
-    struct qemu_laio_state *s = opaque;
+    struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
 
     return (s->count > 0) ? 1 : 0;
 }
@@ -146,8 +134,9 @@ static void laio_cancel(BlockDriverAIOCB *blockacb)
      * We might be able to do this slightly more optimal by removing the
      * O_NONBLOCK flag.
      */
-    while (laiocb->ret == -EINPROGRESS)
-        qemu_laio_completion_cb(laiocb->ctx);
+    while (laiocb->ret == -EINPROGRESS) {
+        qemu_laio_completion_cb(&laiocb->ctx->e);
+    }
 }
 
 static AIOPool laio_pool = {
@@ -186,7 +175,7 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
                         __func__, type);
         goto out_free_aiocb;
     }
-    io_set_eventfd(&laiocb->iocb, s->efd);
+    io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
     s->count++;
 
     if (io_submit(s->ctx, 1, &iocbs) < 0)
@@ -205,21 +194,21 @@ void *laio_init(void)
     struct qemu_laio_state *s;
 
     s = g_malloc0(sizeof(*s));
-    s->efd = eventfd(0, 0);
-    if (s->efd == -1)
+    if (event_notifier_init(&s->e, false) < 0) {
         goto out_free_state;
-    fcntl(s->efd, F_SETFL, O_NONBLOCK);
+    }
 
-    if (io_setup(MAX_EVENTS, &s->ctx) != 0)
+    if (io_setup(MAX_EVENTS, &s->ctx) != 0) {
         goto out_close_efd;
+    }
 
-    qemu_aio_set_fd_handler(s->efd, qemu_laio_completion_cb, NULL,
-        qemu_laio_flush_cb, s);
+    qemu_aio_set_event_notifier(&s->e, qemu_laio_completion_cb,
+                                qemu_laio_flush_cb);
 
     return s;
 
 out_close_efd:
-    close(s->efd);
+    event_notifier_cleanup(&s->e);
 out_free_state:
     g_free(s);
     return NULL;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
@ 2012-09-25 21:47   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 21:47 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  aio.c      | 12 +++++-------
>  qemu-aio.h | 10 +++++-----
>  2 files changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/aio.c b/aio.c
> index c738a4e..e062aab 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -53,11 +53,11 @@ static AioHandler *find_aio_handler(int fd)
>      return NULL;
>  }
>  
> -int qemu_aio_set_fd_handler(int fd,
> -                            IOHandler *io_read,
> -                            IOHandler *io_write,
> -                            AioFlushHandler *io_flush,
> -                            void *opaque)
> +void qemu_aio_set_fd_handler(int fd,
> +                             IOHandler *io_read,
> +                             IOHandler *io_write,
> +                             AioFlushHandler *io_flush,
> +                             void *opaque)
>  {
>      AioHandler *node;
>  
> @@ -93,8 +93,6 @@ int qemu_aio_set_fd_handler(int fd,
>      }
>  
>      qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
> -
> -    return 0;
>  }
>  
>  void qemu_aio_flush(void)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index bfdd35f..27a7e21 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -60,10 +60,10 @@ bool qemu_aio_wait(void);
>   * Code that invokes AIO completion functions should rely on this function
>   * instead of qemu_set_fd_handler[2].
>   */
> -int qemu_aio_set_fd_handler(int fd,
> -                            IOHandler *io_read,
> -                            IOHandler *io_write,
> -                            AioFlushHandler *io_flush,
> -                            void *opaque);
> +void qemu_aio_set_fd_handler(int fd,
> +                             IOHandler *io_read,
> +                             IOHandler *io_write,
> +                             AioFlushHandler *io_flush,
> +                             void *opaque);
>  
>  #endif
> -- 
> 1.7.12


* Re: [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
@ 2012-09-25 21:48   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 21:48 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> This adds to aio.c a platform-independent API based on EventNotifiers, that
> can be used by both POSIX and Win32.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  Makefile.objs |  4 ++--
>  aio.c         |  9 +++++++++
>  qemu-aio.h    | 19 ++++++++++++++++++-
>  3 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile.objs b/Makefile.objs
> index a99378c..713dd87 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -45,6 +45,8 @@ block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
>  block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
>  block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
>  block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
> +block-obj-$(CONFIG_POSIX) += event_notifier-posix.o
> +block-obj-$(CONFIG_WIN32) += event_notifier-win32.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
>  block-obj-y += block/
>  
> @@ -92,8 +94,6 @@ common-obj-y += bt-host.o bt-vhci.o
>  common-obj-y += acl.o
>  common-obj-$(CONFIG_POSIX) += compatfd.o
>  common-obj-y += notify.o
> -common-obj-$(CONFIG_POSIX) += event_notifier-posix.o
> -common-obj-$(CONFIG_WIN32) += event_notifier-win32.o
>  common-obj-y += qemu-timer.o qemu-timer-common.o
>  
>  common-obj-$(CONFIG_SLIRP) += slirp/
> diff --git a/aio.c b/aio.c
> index e062aab..44214e1 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -95,6 +95,15 @@ void qemu_aio_set_fd_handler(int fd,
>      qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
>  }
>  
> +void qemu_aio_set_event_notifier(EventNotifier *notifier,
> +                                 EventNotifierHandler *io_read,
> +                                 AioFlushEventNotifierHandler *io_flush)
> +{
> +    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
> +                            (IOHandler *)io_read, NULL,
> +                            (AioFlushHandler *)io_flush, notifier);
> +}
> +
>  void qemu_aio_flush(void)
>  {
>      while (qemu_aio_wait());
> diff --git a/qemu-aio.h b/qemu-aio.h
> index 27a7e21..dc416a5 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -16,6 +16,7 @@
>  
>  #include "qemu-common.h"
>  #include "qemu-char.h"
> +#include "event_notifier.h"
>  
>  typedef struct BlockDriverAIOCB BlockDriverAIOCB;
>  typedef void BlockDriverCompletionFunc(void *opaque, int ret);
> @@ -39,7 +40,7 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
>  void qemu_aio_release(void *p);
>  
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> -typedef int (AioFlushHandler)(void *opaque);
> +typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
>  
>  /* Flush any pending AIO operation. This function will block until all
>   * outstanding AIO operations have been completed or cancelled. */
> @@ -53,6 +54,10 @@ void qemu_aio_flush(void);
>   * Return whether there is still any pending AIO operation.  */
>  bool qemu_aio_wait(void);
>  
> +#ifdef CONFIG_POSIX
> +/* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> +typedef int (AioFlushHandler)(void *opaque);
> +
>  /* Register a file descriptor and associated callbacks.  Behaves very similarly
>   * to qemu_set_fd_handler2.  Unlike qemu_set_fd_handler2, these callbacks will
>   * be invoked when using either qemu_aio_wait() or qemu_aio_flush().
> @@ -65,5 +70,17 @@ void qemu_aio_set_fd_handler(int fd,
>                               IOHandler *io_write,
>                               AioFlushHandler *io_flush,
>                               void *opaque);
> +#endif
> +
> +/* Register an event notifier and associated callbacks.  Behaves very similarly
> + * to event_notifier_set_handler.  Unlike event_notifier_set_handler, these callbacks
> + * will be invoked when using either qemu_aio_wait() or qemu_aio_flush().
> + *
> + * Code that invokes AIO completion functions should rely on this function
> + * instead of event_notifier_set_handler.
> + */
> +void qemu_aio_set_event_notifier(EventNotifier *notifier,
> +                                 EventNotifierHandler *io_read,
> +                                 AioFlushEventNotifierHandler *io_flush);
>  
>  #endif
> -- 
> 1.7.12


* Re: [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
@ 2012-09-25 21:51   ` Anthony Liguori
  2012-09-26  6:30     ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 21:51 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> Start introducing AioContext, which will let us remove globals from
> aio.c/async.c, and introduce multiple I/O threads.
>
> The bottom half functions now take an additional AioContext argument.
> A bottom half is created with a specific AioContext that remains the
> same throughout the lifetime.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  async.c               | 30 +++++++++----------
>  hw/hw.h               |  1 +
>  iohandler.c           |  1 +
>  main-loop.c           | 18 +++++++++++-
>  main-loop.h           | 54 ++---------------------------------
>  qemu-aio.h            | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qemu-char.h           |  1 +
>  qemu-common.h         |  1 +
>  qemu-coroutine-lock.c |  2 +-
>  9 files changed, 118 insertions(+), 69 deletions(-)
>
> diff --git a/async.c b/async.c
> index 85cc641..189ee1b 100644
> --- a/async.c
> +++ b/async.c
> @@ -26,9 +26,6 @@
>  #include "qemu-aio.h"
>  #include "main-loop.h"
>  
> -/* Anchor of the list of Bottom Halves belonging to the context */
> -static struct QEMUBH *first_bh;
> -
>  /***********************************************************/
>  /* bottom halves (can be seen as timers which expire ASAP) */
>  
> @@ -41,27 +38,26 @@ struct QEMUBH {
>      bool deleted;
>  };
>  
> -QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
> +QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>  {
>      QEMUBH *bh;
>      bh = g_malloc0(sizeof(QEMUBH));
>      bh->cb = cb;
>      bh->opaque = opaque;
> -    bh->next = first_bh;
> -    first_bh = bh;
> +    bh->next = ctx->first_bh;
> +    ctx->first_bh = bh;
>      return bh;
>  }
>  
> -int qemu_bh_poll(void)
> +int aio_bh_poll(AioContext *ctx)
>  {
>      QEMUBH *bh, **bhp, *next;
>      int ret;
> -    static int nesting = 0;
>  
> -    nesting++;
> +    ctx->walking_bh++;
>  
>      ret = 0;
> -    for (bh = first_bh; bh; bh = next) {
> +    for (bh = ctx->first_bh; bh; bh = next) {
>          next = bh->next;
>          if (!bh->deleted && bh->scheduled) {
>              bh->scheduled = 0;
> @@ -72,11 +68,11 @@ int qemu_bh_poll(void)
>          }
>      }
>  
> -    nesting--;
> +    ctx->walking_bh--;
>  
>      /* remove deleted bhs */
> -    if (!nesting) {
> -        bhp = &first_bh;
> +    if (!ctx->walking_bh) {
> +        bhp = &ctx->first_bh;
>          while (*bhp) {
>              bh = *bhp;
>              if (bh->deleted) {
> @@ -120,11 +116,11 @@ void qemu_bh_delete(QEMUBH *bh)
>      bh->deleted = 1;
>  }
>  
> -void qemu_bh_update_timeout(uint32_t *timeout)
> +void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
>  {
>      QEMUBH *bh;
>  
> -    for (bh = first_bh; bh; bh = bh->next) {
> +    for (bh = ctx->first_bh; bh; bh = bh->next) {
>          if (!bh->deleted && bh->scheduled) {
>              if (bh->idle) {
>                  /* idle bottom halves will be polled at least
> @@ -140,3 +136,7 @@ void qemu_bh_update_timeout(uint32_t *timeout)
>      }
>  }
>  
> +AioContext *aio_context_new(void)
> +{
> +    return g_new0(AioContext, 1);
> +}
> diff --git a/hw/hw.h b/hw/hw.h
> index e5cb9bf..acb718d 100644
> --- a/hw/hw.h
> +++ b/hw/hw.h
> @@ -10,6 +10,7 @@
>  
>  #include "ioport.h"
>  #include "irq.h"
> +#include "qemu-aio.h"
>  #include "qemu-file.h"
>  #include "vmstate.h"
>  
> diff --git a/iohandler.c b/iohandler.c
> index a2d871b..60460a6 100644
> --- a/iohandler.c
> +++ b/iohandler.c
> @@ -26,6 +26,7 @@
>  #include "qemu-common.h"
>  #include "qemu-char.h"
>  #include "qemu-queue.h"
> +#include "qemu-aio.h"
>  #include "main-loop.h"
>  
>  #ifndef _WIN32
> diff --git a/main-loop.c b/main-loop.c
> index eb3b6e6..f0bc515 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -26,6 +26,7 @@
>  #include "qemu-timer.h"
>  #include "slirp/slirp.h"
>  #include "main-loop.h"
> +#include "qemu-aio.h"
>  
>  #ifndef _WIN32
>  
> @@ -199,6 +200,8 @@ static int qemu_signal_init(void)
>  }
>  #endif
>  
> +static AioContext *qemu_aio_context;
> +
>  int main_loop_init(void)
>  {
>      int ret;
> @@ -215,6 +218,7 @@ int main_loop_init(void)
>          return ret;
>      }
>  
> +    qemu_aio_context = aio_context_new();
>      return 0;
>  }
>  
> @@ -478,7 +482,7 @@ int main_loop_wait(int nonblocking)
>      if (nonblocking) {
>          timeout = 0;
>      } else {
> -        qemu_bh_update_timeout(&timeout);
> +        aio_bh_update_timeout(qemu_aio_context, &timeout);
>      }
>  
>      /* poll any events */
> @@ -507,3 +511,15 @@ int main_loop_wait(int nonblocking)
>  
>      return ret;
>  }
> +
> +/* Functions to operate on the main QEMU AioContext.  */
> +
> +QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
> +{
> +    return aio_bh_new(qemu_aio_context, cb, opaque);
> +}
> +
> +int qemu_bh_poll(void)
> +{
> +    return aio_bh_poll(qemu_aio_context);
> +}
> diff --git a/main-loop.h b/main-loop.h
> index dce1cd9..47644ce 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -25,6 +25,8 @@
>  #ifndef QEMU_MAIN_LOOP_H
>  #define QEMU_MAIN_LOOP_H 1
>  
> +#include "qemu-aio.h"
> +
>  #define SIG_IPI SIGUSR1
>  
>  /**
> @@ -173,7 +175,6 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque);
>  
>  typedef void IOReadHandler(void *opaque, const uint8_t *buf, int size);
>  typedef int IOCanReadHandler(void *opaque);
> -typedef void IOHandler(void *opaque);
>  
>  /**
>   * qemu_set_fd_handler2: Register a file descriptor with the main loop
> @@ -254,56 +255,6 @@ int qemu_set_fd_handler(int fd,
>                          IOHandler *fd_write,
>                          void *opaque);
>  
> -typedef struct QEMUBH QEMUBH;
> -typedef void QEMUBHFunc(void *opaque);
> -
> -/**
> - * qemu_bh_new: Allocate a new bottom half structure.
> - *
> - * Bottom halves are lightweight callbacks whose invocation is guaranteed
> - * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
> - * is opaque and must be allocated prior to its use.
> - */
> -QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
> -
> -/**
> - * qemu_bh_schedule: Schedule a bottom half.
> - *
> - * Scheduling a bottom half interrupts the main loop and causes the
> - * execution of the callback that was passed to qemu_bh_new.
> - *
> - * Bottom halves that are scheduled from a bottom half handler are instantly
> - * invoked.  This can create an infinite loop if a bottom half handler
> - * schedules itself.
> - *
> - * @bh: The bottom half to be scheduled.
> - */
> -void qemu_bh_schedule(QEMUBH *bh);
> -
> -/**
> - * qemu_bh_cancel: Cancel execution of a bottom half.
> - *
> - * Canceling execution of a bottom half undoes the effect of calls to
> - * qemu_bh_schedule without freeing its resources yet.  While cancellation
> - * itself is also wait-free and thread-safe, it can of course race with the
> - * loop that executes bottom halves unless you are holding the iothread
> - * mutex.  This makes it mostly useless if you are not holding the mutex.
> - *
> - * @bh: The bottom half to be canceled.
> - */
> -void qemu_bh_cancel(QEMUBH *bh);
> -
> -/**
> - *qemu_bh_delete: Cancel execution of a bottom half and free its resources.
> - *
> - * Deleting a bottom half frees the memory that was allocated for it by
> - * qemu_bh_new.  It also implies canceling the bottom half if it was
> - * scheduled.
> - *
> - * @bh: The bottom half to be deleted.
> - */
> -void qemu_bh_delete(QEMUBH *bh);
> -
>  #ifdef CONFIG_POSIX
>  /**
>   * qemu_add_child_watch: Register a child process for reaping.
> @@ -359,6 +310,7 @@ void qemu_fd_register(int fd);
>  void qemu_iohandler_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set *xfds);
>  void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc);
>  
> +QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
>  void qemu_bh_schedule_idle(QEMUBH *bh);
>  int qemu_bh_poll(void);
>  void qemu_bh_update_timeout(uint32_t *timeout);
> diff --git a/qemu-aio.h b/qemu-aio.h
> index dc416a5..2ed6ad3 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -15,7 +15,6 @@
>  #define QEMU_AIO_H
>  
>  #include "qemu-common.h"
> -#include "qemu-char.h"
>  #include "event_notifier.h"
>  
>  typedef struct BlockDriverAIOCB BlockDriverAIOCB;
> @@ -39,9 +38,87 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
>                     BlockDriverCompletionFunc *cb, void *opaque);
>  void qemu_aio_release(void *p);
>  
> +typedef struct AioHandler AioHandler;
> +typedef void QEMUBHFunc(void *opaque);
> +typedef void IOHandler(void *opaque);
> +
> +typedef struct AioContext {
> +    /* Anchor of the list of Bottom Halves belonging to the context */
> +    struct QEMUBH *first_bh;
> +
> +    /* A simple lock used to protect the first_bh list, and ensure that
> +     * no callbacks are removed while we're walking and dispatching callbacks.
> +     */
> +    int walking_bh;
> +} AioContext;
> +
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
>  typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
>  
> +/**
> + * aio_context_new: Allocate a new AioContext.
> + *
> + * AioContext provide a mini event-loop that can be waited on synchronously.
> + * They also provide bottom halves, a service to execute a piece of code
> + * as soon as possible.
> + */
> +AioContext *aio_context_new(void);
> +
> +/**
> + * aio_bh_new: Allocate a new bottom half structure.
> + *
> + * Bottom halves are lightweight callbacks whose invocation is guaranteed
> + * to be wait-free, thread-safe and signal-safe.  The #QEMUBH structure
> + * is opaque and must be allocated prior to its use.
> + */
> +QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
> +
> +/**
> + * aio_bh_poll: Poll bottom halves for an AioContext.
> + *
> + * These are internal functions used by the QEMU main loop.
> + */
> +int aio_bh_poll(AioContext *ctx);
> +void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout);
> +
> +/**
> + * qemu_bh_schedule: Schedule a bottom half.
> + *
> + * Scheduling a bottom half interrupts the main loop and causes the
> + * execution of the callback that was passed to qemu_bh_new.
> + *
> + * Bottom halves that are scheduled from a bottom half handler are instantly
> + * invoked.  This can create an infinite loop if a bottom half handler
> + * schedules itself.
> + *
> + * @bh: The bottom half to be scheduled.
> + */
> +void qemu_bh_schedule(QEMUBH *bh);
> +
> +/**
> + * qemu_bh_cancel: Cancel execution of a bottom half.
> + *
> + * Canceling execution of a bottom half undoes the effect of calls to
> + * qemu_bh_schedule without freeing its resources yet.  While cancellation
> + * itself is also wait-free and thread-safe, it can of course race with the
> + * loop that executes bottom halves unless you are holding the iothread
> + * mutex.  This makes it mostly useless if you are not holding the mutex.
> + *
> + * @bh: The bottom half to be canceled.
> + */
> +void qemu_bh_cancel(QEMUBH *bh);
> +
> +/**
> + *qemu_bh_delete: Cancel execution of a bottom half and free its resources.
> + *
> + * Deleting a bottom half frees the memory that was allocated for it by
> + * qemu_bh_new.  It also implies canceling the bottom half if it was
> + * scheduled.
> + *
> + * @bh: The bottom half to be deleted.
> + */
> +void qemu_bh_delete(QEMUBH *bh);
> +
>  /* Flush any pending AIO operation. This function will block until all
>   * outstanding AIO operations have been completed or cancelled. */
>  void qemu_aio_flush(void);
> diff --git a/qemu-char.h b/qemu-char.h
> index 486644b..5087168 100644
> --- a/qemu-char.h
> +++ b/qemu-char.h
> @@ -5,6 +5,7 @@
>  #include "qemu-queue.h"
>  #include "qemu-option.h"
>  #include "qemu-config.h"
> +#include "qemu-aio.h"
>  #include "qobject.h"
>  #include "qstring.h"
>  #include "main-loop.h"
> diff --git a/qemu-common.h b/qemu-common.h
> index e5c2bcd..ac44657 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -14,6 +14,7 @@
>  
>  typedef struct QEMUTimer QEMUTimer;
>  typedef struct QEMUFile QEMUFile;
> +typedef struct QEMUBH QEMUBH;
>  typedef struct DeviceState DeviceState;
>  
>  struct Monitor;

Any reason to do this here vs. just #include "qemu-aio.h" in
qemu-common.h?

I don't see an obvious dependency on qemu-common.h in qemu-aio.h other
than this typedef.

Regards,

Anthony Liguori

> diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
> index 26ad76b..9dda3f8 100644
> --- a/qemu-coroutine-lock.c
> +++ b/qemu-coroutine-lock.c
> @@ -26,7 +26,7 @@
>  #include "qemu-coroutine.h"
>  #include "qemu-coroutine-int.h"
>  #include "qemu-queue.h"
> -#include "main-loop.h"
> +#include "qemu-aio.h"
>  #include "trace.h"
>  
>  static QTAILQ_HEAD(, Coroutine) unlock_bh_queue =
> -- 
> 1.7.12

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
@ 2012-09-25 21:56   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 21:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> This will be used when polling the GSource attached to an AioContext.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  aio.c       | 16 ++++++++++++----
>  async.c     |  2 +-
>  main-loop.c |  2 +-
>  qemu-aio.h  | 21 +++++++++++++++------
>  4 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/aio.c b/aio.c
> index c89f1e9..95ad467 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -93,13 +93,16 @@ void aio_set_event_notifier(AioContext *ctx,
>                         (AioFlushHandler *)io_flush, notifier);
>  }
>  
> -bool aio_wait(AioContext *ctx)
> +bool aio_poll(AioContext *ctx, bool blocking)
>  {
> +    static struct timeval tv0;
>      AioHandler *node;
>      fd_set rdfds, wrfds;
>      int max_fd = -1;
>      int ret;
> -    bool busy;
> +    bool busy, progress;
> +
> +    progress = false;
>  
>      /*
>       * If there are callbacks left that have been queued, we need to call then.
> @@ -107,6 +110,11 @@ bool aio_wait(AioContext *ctx)
>       * does not need a complete flush (as is the case for qemu_aio_wait loops).
>       */
>      if (aio_bh_poll(ctx)) {
> +        blocking = false;
> +        progress = true;
> +    }
> +
> +    if (progress && !blocking) {
>          return true;
>      }
>  
> @@ -142,11 +150,11 @@ bool aio_wait(AioContext *ctx)
>  
>      /* No AIO operations?  Get us out of here */
>      if (!busy) {
> -        return false;
> +        return progress;
>      }
>  
>      /* wait until next event */
> -    ret = select(max_fd, &rdfds, &wrfds, NULL, NULL);
> +    ret = select(max_fd, &rdfds, &wrfds, NULL, blocking ? NULL : &tv0);
>  
>      /* if we have any readable fds, dispatch event */
>      if (ret > 0) {
> diff --git a/async.c b/async.c
> index c99db79..513bdd7 100644
> --- a/async.c
> +++ b/async.c
> @@ -144,5 +144,5 @@ AioContext *aio_context_new(void)
>  
>  void aio_flush(AioContext *ctx)
>  {
> -    while (aio_wait(ctx));
> +    while (aio_poll(ctx, true));
>  }
> diff --git a/main-loop.c b/main-loop.c
> index 8301fe9..67800fe 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -531,7 +531,7 @@ void qemu_aio_flush(void)
>  
>  bool qemu_aio_wait(void)
>  {
> -    return aio_wait(qemu_aio_context);
> +    return aio_poll(qemu_aio_context, true);
>  }
>  
>  void qemu_aio_set_fd_handler(int fd,
> diff --git a/qemu-aio.h b/qemu-aio.h
> index f8a93d8..f19201e 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -133,13 +133,22 @@ void qemu_bh_delete(QEMUBH *bh);
>   * outstanding AIO operations have been completed or cancelled. */
>  void aio_flush(AioContext *ctx);
>  
> -/* Wait for a single AIO completion to occur.  This function will wait
> - * until a single AIO event has completed and it will ensure something
> - * has moved before returning. This can issue new pending aio as
> - * result of executing I/O completion or bh callbacks.
> +/* Progress in completing AIO work to occur.  This can issue new pending
> + * aio as a result of executing I/O completion or bh callbacks.
>   *
> - * Return whether there is still any pending AIO operation.  */
> -bool aio_wait(AioContext *ctx);
> + * If there is no pending AIO operation or completion (bottom half),
> + * return false.  If there are pending bottom halves, return true.
> + *
> + * If there are no pending bottom halves, but there are pending AIO
> + * operations, it may not be possible to make any progress without
> + * blocking.  If @blocking is true, this function will wait until one
> + * or more AIO events have completed, to ensure something has moved
> + * before returning.
> + *
> + * If @blocking is false, this function will also return false if the
> + * function cannot make any progress without blocking.
> + */
> +bool aio_poll(AioContext *ctx, bool blocking);
>  
>  #ifdef CONFIG_POSIX
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> -- 
> 1.7.12


* Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
@ 2012-09-25 22:01   ` Anthony Liguori
  2012-09-26  6:36     ` Paolo Bonzini
  2012-09-26  6:48     ` Paolo Bonzini
  2012-09-29 11:28   ` Blue Swirl
  1 sibling, 2 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:01 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> This adds a GPollFD to each AioHandler.  It will then be possible to
> attach these GPollFDs to a GSource, and from there to the main loop.
> aio_wait examines the GPollFDs and avoids calling select() if any
> is set (similar to what it does if bottom halves are available).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  aio.c      | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  qemu-aio.h |  7 ++++++
>  2 files changed, 78 insertions(+), 11 deletions(-)
>
> diff --git a/aio.c b/aio.c
> index 95ad467..c848a9f 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -20,7 +20,7 @@
>  
>  struct AioHandler
>  {
> -    int fd;
> +    GPollFD pfd;
>      IOHandler *io_read;
>      IOHandler *io_write;
>      AioFlushHandler *io_flush;
> @@ -34,7 +34,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>      AioHandler *node;
>  
>      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> -        if (node->fd == fd)
> +        if (node->pfd.fd == fd)
>              if (!node->deleted)
>                  return node;
>      }
> @@ -57,9 +57,10 @@ void aio_set_fd_handler(AioContext *ctx,
>      if (!io_read && !io_write) {
>          if (node) {
>              /* If the lock is held, just mark the node as deleted */
> -            if (ctx->walking_handlers)
> +            if (ctx->walking_handlers) {
>                  node->deleted = 1;
> -            else {
> +                node->pfd.revents = 0;
> +            } else {
>                  /* Otherwise, delete it for real.  We can't just mark it as
>                   * deleted because deleted nodes are only cleaned up after
>                   * releasing the walking_handlers lock.
> @@ -72,7 +73,7 @@ void aio_set_fd_handler(AioContext *ctx,
>          if (node == NULL) {
>              /* Alloc and insert if it's not already there */
>              node = g_malloc0(sizeof(AioHandler));
> -            node->fd = fd;
> +            node->pfd.fd = fd;
>              QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
>          }
>          /* Update handler with latest information */
> @@ -80,6 +81,10 @@ void aio_set_fd_handler(AioContext *ctx,
>          node->io_write = io_write;
>          node->io_flush = io_flush;
>          node->opaque = opaque;
> +
> +        node->pfd.events = G_IO_ERR;
> +        node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
> +        node->pfd.events |= (io_write ? G_IO_OUT : 0);
>      }

Should we even set G_IO_ERR?  I think that corresponds to exceptfd in
select() but we've never set that historically.  I know glib recommends
it but I don't think it's applicable to how we use it.

Moreover, the way you do dispatch, if G_IO_ERR did occur, we'd dispatch
both the read and write handlers which definitely isn't right.

I think it's easiest just to drop it.

>  }
>  
> @@ -93,6 +98,25 @@ void aio_set_event_notifier(AioContext *ctx,
>                         (AioFlushHandler *)io_flush, notifier);
>  }
>  
> +bool aio_pending(AioContext *ctx)
> +{
> +    AioHandler *node;
> +
> +    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> +        int revents;
> +
> +        revents = node->pfd.revents & node->pfd.events;
> +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +            return true;
> +        }
> +        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  bool aio_poll(AioContext *ctx, bool blocking)
>  {
>      static struct timeval tv0;
> @@ -114,6 +138,42 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          progress = true;
>      }
>  
> +    /*
> +     * Then dispatch any pending callbacks from the GSource.
> +     *
> +     * We have to walk very carefully in case qemu_aio_set_fd_handler is
> +     * called while we're walking.
> +     */
> +    node = QLIST_FIRST(&ctx->aio_handlers);
> +    while (node) {
> +        AioHandler *tmp;
> +        int revents;
> +
> +        ctx->walking_handlers++;
> +
> +        revents = node->pfd.revents & node->pfd.events;
> +        node->pfd.revents &= ~revents;

This is interesting and I must admit I don't understand why it's
necessary.  What case are you trying to handle?

> +
> +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +            node->io_read(node->opaque);
> +            progress = true;
> +        }
> +        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +            node->io_write(node->opaque);
> +            progress = true;
> +        }
> +
> +        tmp = node;
> +        node = QLIST_NEXT(node, node);
> +
> +        ctx->walking_handlers--;
> +
> +        if (!ctx->walking_handlers && tmp->deleted) {
> +            QLIST_REMOVE(tmp, node);
> +            g_free(tmp);
> +        }
> +    }
> +
>      if (progress && !blocking) {
>          return true;
>      }
> @@ -137,12 +197,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              busy = true;
>          }
>          if (!node->deleted && node->io_read) {
> -            FD_SET(node->fd, &rdfds);
> -            max_fd = MAX(max_fd, node->fd + 1);
> +            FD_SET(node->pfd.fd, &rdfds);
> +            max_fd = MAX(max_fd, node->pfd.fd + 1);
>          }
>          if (!node->deleted && node->io_write) {
> -            FD_SET(node->fd, &wrfds);
> -            max_fd = MAX(max_fd, node->fd + 1);
> +            FD_SET(node->pfd.fd, &wrfds);
> +            max_fd = MAX(max_fd, node->pfd.fd + 1);
>          }
>      }
>  
> @@ -167,12 +227,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              ctx->walking_handlers++;
>  
>              if (!node->deleted &&
> -                FD_ISSET(node->fd, &rdfds) &&
> +                FD_ISSET(node->pfd.fd, &rdfds) &&
>                  node->io_read) {
>                  node->io_read(node->opaque);
>              }
>              if (!node->deleted &&
> -                FD_ISSET(node->fd, &wrfds) &&
> +                FD_ISSET(node->pfd.fd, &wrfds) &&
>                  node->io_write) {
>                  node->io_write(node->opaque);
>              }
> diff --git a/qemu-aio.h b/qemu-aio.h
> index f19201e..ac24896 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -133,6 +133,13 @@ void qemu_bh_delete(QEMUBH *bh);
>   * outstanding AIO operations have been completed or cancelled. */
>  void aio_flush(AioContext *ctx);
>  
> +/* Return whether there are any pending callbacks from the GSource
> + * attached to the AioContext.
> + *
> + * This is used internally in the implementation of the GSource.
> + */
> +bool aio_pending(AioContext *ctx);
> +
>  /* Progress in completing AIO work to occur.  This can issue new pending
>   * aio as a result of executing I/O completion or bh callbacks.
>   *
> -- 
> 1.7.12

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
@ 2012-09-25 22:06   ` Anthony Liguori
  2012-09-26  6:40     ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> This lets AioContexts be used (optionally) with a glib main loop.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  aio-posix.c |  4 ++++
>  aio-win32.c |  4 ++++
>  async.c     | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qemu-aio.h  | 23 ++++++++++++++++++++++
>  4 files changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index c848a9f..e29ece9 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -56,6 +56,8 @@ void aio_set_fd_handler(AioContext *ctx,
>      /* Are we deleting the fd handler? */
>      if (!io_read && !io_write) {
>          if (node) {
> +            g_source_remove_poll(&ctx->source, &node->pfd);
> +
>              /* If the lock is held, just mark the node as deleted */
>              if (ctx->walking_handlers) {
>                  node->deleted = 1;
> @@ -75,6 +77,8 @@ void aio_set_fd_handler(AioContext *ctx,
>              node = g_malloc0(sizeof(AioHandler));
>              node->pfd.fd = fd;
>              QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
> +
> +            g_source_add_poll(&ctx->source, &node->pfd);
>          }
>          /* Update handler with latest information */
>          node->io_read = io_read;
> diff --git a/aio-win32.c b/aio-win32.c
> index c46dfb2..5057371 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -45,6 +45,8 @@ void aio_set_event_notifier(AioContext *ctx,
>      /* Are we deleting the fd handler? */
>      if (!io_notify) {
>          if (node) {
> +            g_source_remove_poll(&ctx->source, &node->pfd);
> +

Why remove vs. setting events = 0?

add_poll/remove_poll also come with an event loop notify, which I don't
think is strictly necessary here.

>              /* If the lock is held, just mark the node as deleted */
>              if (ctx->walking_handlers) {
>                  node->deleted = 1;
> @@ -66,6 +68,8 @@ void aio_set_event_notifier(AioContext *ctx,
>              node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
>              node->pfd.events = G_IO_IN;
>              QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
> +
> +            g_source_add_poll(&ctx->source, &node->pfd);
>          }
>          /* Update handler with latest information */
>          node->io_notify = io_notify;
> diff --git a/async.c b/async.c
> index 513bdd7..ed2bd3f 100644
> --- a/async.c
> +++ b/async.c
> @@ -136,10 +136,73 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
>      }
>  }
>  
> +static gboolean
> +aio_ctx_prepare(GSource *source, gint    *timeout)
> +{
> +    AioContext *ctx = (AioContext *) source;
> +    uint32_t wait = -1;
> +    aio_bh_update_timeout(ctx, &wait);
> +
> +    if (wait != -1) {
> +        *timeout = MIN(*timeout, wait);
> +        return wait == 0;
> +    }
> +
> +    return FALSE;
> +}
> +
> +static gboolean
> +aio_ctx_check(GSource *source)
> +{
> +    AioContext *ctx = (AioContext *) source;
> +    QEMUBH *bh;
> +
> +    for (bh = ctx->first_bh; bh; bh = bh->next) {
> +        if (!bh->deleted && bh->scheduled) {
> +            return true;
> +	}
> +    }
> +    return aio_pending(ctx);
> +}

Think you've got some copy/paste leftover glib coding style.  Probably
should use TRUE/true consistently too.  I think using TRUE/FALSE for
gboolean and true/false for bool is reasonable.

> +
> +static gboolean
> +aio_ctx_dispatch(GSource     *source,
> +                 GSourceFunc  callback,
> +                 gpointer     user_data)
> +{
> +    AioContext *ctx = (AioContext *) source;
> +
> +    assert(callback == NULL);
> +    aio_poll(ctx, false);
> +    return TRUE;
> +}
> +
> +static GSourceFuncs aio_source_funcs = {
> +    aio_ctx_prepare,
> +    aio_ctx_check,
> +    aio_ctx_dispatch,
> +    NULL
> +};
> +
> +GSource *aio_get_g_source(AioContext *ctx)
> +{
> +    g_source_ref(&ctx->source);
> +    return &ctx->source;
> +}
>  
>  AioContext *aio_context_new(void)
>  {
> -    return g_new0(AioContext, 1);
> +    return (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> +}
> +
> +void aio_context_ref(AioContext *ctx)
> +{
> +    g_source_ref(&ctx->source);
> +}
> +
> +void aio_context_unref(AioContext *ctx)
> +{
> +    g_source_unref(&ctx->source);
>  }
>  
>  void aio_flush(AioContext *ctx)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index ac24896..aedf66c 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -44,6 +44,8 @@ typedef void QEMUBHFunc(void *opaque);
>  typedef void IOHandler(void *opaque);
>  
>  typedef struct AioContext {
> +    GSource source;
> +
>      /* The list of registered AIO handlers */
>      QLIST_HEAD(, AioHandler) aio_handlers;
>  
> @@ -75,6 +77,22 @@ typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
>  AioContext *aio_context_new(void);
>  
>  /**
> + * aio_context_ref:
> + * @ctx: The AioContext to operate on.
> + *
> + * Add a reference to an AioContext.
> + */
> +void aio_context_ref(AioContext *ctx);
> +
> +/**
> + * aio_context_unref:
> + * @ctx: The AioContext to operate on.
> + *
> + * Drop a reference to an AioContext.
> + */
> +void aio_context_unref(AioContext *ctx);
> +
> +/**
>   * aio_bh_new: Allocate a new bottom half structure.
>   *
>   * Bottom halves are lightweight callbacks whose invocation is guaranteed
> @@ -188,6 +206,11 @@ void aio_set_event_notifier(AioContext *ctx,
>                              EventNotifierHandler *io_read,
>                              AioFlushEventNotifierHandler *io_flush);
>  
> +/* Return a GSource that lets the main loop poll the file descriptors attached
> + * to this AioContext.
> + */
> +GSource *aio_get_g_source(AioContext *ctx);
> +
>  /* Functions to operate on the main QEMU AioContext.  */
>  
>  void qemu_aio_flush(void);
> -- 
> 1.7.12

I kind of dislike the fact that we've got a single source for all bottom
halves but this is definitely a good starting point.

The GSource implementation looks right to me.

Regards,

Anthony Liguori

* Re: [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
@ 2012-09-25 22:07   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> In the current code, this is done by qemu_set_fd_handler2, which is
> called by qemu_aio_set_fd_handler.  We need to keep the same behavior
> even after removing the call to qemu_set_fd_handler2.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori


> ---
>  aio-posix.c | 2 ++
>  aio-win32.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index e29ece9..41f638f 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -90,6 +90,8 @@ void aio_set_fd_handler(AioContext *ctx,
>          node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
>          node->pfd.events |= (io_write ? G_IO_OUT : 0);
>      }
> +
> +    aio_notify(ctx);
>  }
>  
>  void aio_set_event_notifier(AioContext *ctx,
> diff --git a/aio-win32.c b/aio-win32.c
> index 5057371..78faf69 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -75,6 +75,8 @@ void aio_set_event_notifier(AioContext *ctx,
>          node->io_notify = io_notify;
>          node->io_flush = io_flush;
>      }
> +
> +    aio_notify(ctx);
>  }
>  
>  bool aio_pending(AioContext *ctx)
> -- 
> 1.7.12

* Re: [Qemu-devel] [PATCH 12/17] aio: add aio_notify
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
@ 2012-09-25 22:07   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> With this change async.c does not rely anymore on any service from
> main-loop.c, i.e. it is completely self-contained.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Other than the coding style bits that need to be fixed first in the
previous patch:

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  async.c    | 30 ++++++++++++++++++++++++++----
>  qemu-aio.h | 18 ++++++++++++++++++
>  2 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/async.c b/async.c
> index ed2bd3f..31c6c76 100644
> --- a/async.c
> +++ b/async.c
> @@ -30,6 +30,7 @@
>  /* bottom halves (can be seen as timers which expire ASAP) */
>  
>  struct QEMUBH {
> +    AioContext *ctx;
>      QEMUBHFunc *cb;
>      void *opaque;
>      QEMUBH *next;
> @@ -42,6 +43,7 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>  {
>      QEMUBH *bh;
>      bh = g_malloc0(sizeof(QEMUBH));
> +    bh->ctx = ctx;
>      bh->cb = cb;
>      bh->opaque = opaque;
>      bh->next = ctx->first_bh;
> @@ -101,8 +103,7 @@ void qemu_bh_schedule(QEMUBH *bh)
>          return;
>      bh->scheduled = 1;
>      bh->idle = 0;
> -    /* stop the currently executing CPU to execute the BH ASAP */
> -    qemu_notify_event();
> +    aio_notify(bh->ctx);
>  }
>  
>  void qemu_bh_cancel(QEMUBH *bh)
> @@ -177,11 +178,20 @@ aio_ctx_dispatch(GSource     *source,
>      return TRUE;
>  }
>  
> +static void
> +aio_ctx_finalize(GSource     *source)
> +{
> +    AioContext *ctx = (AioContext *) source;
> +
> +    aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
> +    event_notifier_cleanup(&ctx->notifier);
> +}
> +
>  static GSourceFuncs aio_source_funcs = {
>      aio_ctx_prepare,
>      aio_ctx_check,
>      aio_ctx_dispatch,
> -    NULL
> +    aio_ctx_finalize
>  };
>  
>  GSource *aio_get_g_source(AioContext *ctx)
> @@ -190,9 +200,21 @@ GSource *aio_get_g_source(AioContext *ctx)
>      return &ctx->source;
>  }
>  
> +void aio_notify(AioContext *ctx)
> +{
> +    event_notifier_set(&ctx->notifier);
> +}
> +
>  AioContext *aio_context_new(void)
>  {
> -    return (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> +    AioContext *ctx;
> +    ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
> +    event_notifier_init(&ctx->notifier, false);
> +    aio_set_event_notifier(ctx, &ctx->notifier, 
> +                           (EventNotifierHandler *)
> +                           event_notifier_test_and_clear, NULL);
> +
> +    return ctx;
>  }
>  
>  void aio_context_ref(AioContext *ctx)
> diff --git a/qemu-aio.h b/qemu-aio.h
> index aedf66c..2354617 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -62,6 +62,9 @@ typedef struct AioContext {
>       * no callbacks are removed while we're walking and dispatching callbacks.
>       */
>      int walking_bh;
> +
> +    /* Used for aio_notify.  */
> +    EventNotifier notifier;
>  } AioContext;
>  
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> @@ -102,6 +105,21 @@ void aio_context_unref(AioContext *ctx);
>  QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
>  
>  /**
> + * aio_notify: Force processing of pending events.
> + *
> + * Similar to signaling a condition variable, aio_notify forces
> + * aio_wait to exit, so that the next call will re-examine pending events.
> + * The caller of aio_notify will usually call aio_wait again very soon,
> + * or go through another iteration of the GLib main loop.  Hence, aio_notify
> + * also has the side effect of recalculating the sets of file descriptors
> + * that the main loop waits for.
> + *
> + * Calling aio_notify is rarely necessary, because for example scheduling
> + * a bottom half calls it already.
> + */
> +void aio_notify(AioContext *ctx);
> +
> +/**
>   * aio_bh_poll: Poll bottom halves for an AioContext.
>   *
>   * These are internal functions used by the QEMU main loop.
> -- 
> 1.7.12

* Re: [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
@ 2012-09-25 22:09   ` Anthony Liguori
  2012-09-26  6:38     ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  main-loop.c | 23 ++++++-----------------
>  main-loop.h |  2 --
>  2 files changed, 6 insertions(+), 19 deletions(-)
>
> diff --git a/main-loop.c b/main-loop.c
> index b290c79..209f699 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -205,6 +205,7 @@ static AioContext *qemu_aio_context;
>  int main_loop_init(void)
>  {
>      int ret;
> +    GSource *src;
>  
>      qemu_mutex_lock_iothread();
>      ret = qemu_signal_init();
> @@ -219,6 +220,9 @@ int main_loop_init(void)
>      }
>  
>      qemu_aio_context = aio_context_new();
> +    src = aio_get_g_source(qemu_aio_context);
> +    g_source_attach(src, NULL);
> +    g_source_unref(src);
>      return 0;
>  }
>  
> @@ -481,8 +485,6 @@ int main_loop_wait(int nonblocking)
>  
>      if (nonblocking) {
>          timeout = 0;
> -    } else {
> -        aio_bh_update_timeout(qemu_aio_context, &timeout);
>      }
>  
>      /* poll any events */
> @@ -505,10 +507,6 @@ int main_loop_wait(int nonblocking)
>  
>      qemu_run_all_timers();
>  
> -    /* Check bottom-halves last in case any of the earlier events triggered
> -       them.  */
> -    qemu_bh_poll();
> -

This is an awesome cleanup!

What do you think about deprecating bottom halves in the !block code in
favor of idle functions?  I don't see any reason to keep using bottom
halves...

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

>      return ret;
>  }
>  
> @@ -519,11 +517,6 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
>      return aio_bh_new(qemu_aio_context, cb, opaque);
>  }
>  
> -int qemu_bh_poll(void)
> -{
> -    return aio_bh_poll(qemu_aio_context);
> -}
> -
>  void qemu_aio_flush(void)
>  {
>      aio_flush(qemu_aio_context);
> @@ -543,16 +536,12 @@ void qemu_aio_set_fd_handler(int fd,
>  {
>      aio_set_fd_handler(qemu_aio_context, fd, io_read, io_write, io_flush,
>                         opaque);
> -
> -    qemu_set_fd_handler2(fd, NULL, io_read, io_write, opaque);
>  }
> +#endif
>  
>  void qemu_aio_set_event_notifier(EventNotifier *notifier,
>                                   EventNotifierHandler *io_read,
>                                   AioFlushEventNotifierHandler *io_flush)
>  {
> -    qemu_aio_set_fd_handler(event_notifier_get_fd(notifier),
> -                            (IOHandler *)io_read, NULL,
> -                            (AioFlushHandler *)io_flush, notifier);
> +    aio_set_event_notifier(qemu_aio_context, notifier, io_read, io_flush);
>  }
> -#endif
> diff --git a/main-loop.h b/main-loop.h
> index 47644ce..c58f38b 100644
> --- a/main-loop.h
> +++ b/main-loop.h
> @@ -312,7 +312,5 @@ void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int rc
>  
>  QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
>  void qemu_bh_schedule_idle(QEMUBH *bh);
> -int qemu_bh_poll(void);
> -void qemu_bh_update_timeout(uint32_t *timeout);
>  
>  #endif
> -- 
> 1.7.12

* Re: [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
@ 2012-09-25 22:10   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:10 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  main-loop.c | 106 +++++-------------------------------------------------------
>  1 file changed, 8 insertions(+), 98 deletions(-)

diffstats like this are always a good sign :-)

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

>
> diff --git a/main-loop.c b/main-loop.c
> index 209f699..978050a 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -32,70 +32,6 @@
>  
>  #include "compatfd.h"
>  
> -static int io_thread_fd = -1;
> -
> -void qemu_notify_event(void)
> -{
> -    /* Write 8 bytes to be compatible with eventfd.  */
> -    static const uint64_t val = 1;
> -    ssize_t ret;
> -
> -    if (io_thread_fd == -1) {
> -        return;
> -    }
> -    do {
> -        ret = write(io_thread_fd, &val, sizeof(val));
> -    } while (ret < 0 && errno == EINTR);
> -
> -    /* EAGAIN is fine, a read must be pending.  */
> -    if (ret < 0 && errno != EAGAIN) {
> -        fprintf(stderr, "qemu_notify_event: write() failed: %s\n",
> -                strerror(errno));
> -        exit(1);
> -    }
> -}
> -
> -static void qemu_event_read(void *opaque)
> -{
> -    int fd = (intptr_t)opaque;
> -    ssize_t len;
> -    char buffer[512];
> -
> -    /* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
> -    do {
> -        len = read(fd, buffer, sizeof(buffer));
> -    } while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
> -}
> -
> -static int qemu_event_init(void)
> -{
> -    int err;
> -    int fds[2];
> -
> -    err = qemu_eventfd(fds);
> -    if (err == -1) {
> -        return -errno;
> -    }
> -    err = fcntl_setfl(fds[0], O_NONBLOCK);
> -    if (err < 0) {
> -        goto fail;
> -    }
> -    err = fcntl_setfl(fds[1], O_NONBLOCK);
> -    if (err < 0) {
> -        goto fail;
> -    }
> -    qemu_set_fd_handler2(fds[0], NULL, qemu_event_read, NULL,
> -                         (void *)(intptr_t)fds[0]);
> -
> -    io_thread_fd = fds[1];
> -    return 0;
> -
> -fail:
> -    close(fds[0]);
> -    close(fds[1]);
> -    return err;
> -}
> -
>  /* If we have signalfd, we mask out the signals we want to handle and then
>   * use signalfd to listen for them.  We rely on whatever the current signal
>   * handler is to dispatch the signals when we receive them.
> @@ -165,43 +101,22 @@ static int qemu_signal_init(void)
>  
>  #else /* _WIN32 */
>  
> -static HANDLE qemu_event_handle = NULL;
> -
> -static void dummy_event_handler(void *opaque)
> -{
> -}
> -
> -static int qemu_event_init(void)
> +static int qemu_signal_init(void)
>  {
> -    qemu_event_handle = CreateEvent(NULL, FALSE, FALSE, NULL);
> -    if (!qemu_event_handle) {
> -        fprintf(stderr, "Failed CreateEvent: %ld\n", GetLastError());
> -        return -1;
> -    }
> -    qemu_add_wait_object(qemu_event_handle, dummy_event_handler, NULL);
>      return 0;
>  }
> +#endif
> +
> +static AioContext *qemu_aio_context;
>  
>  void qemu_notify_event(void)
>  {
> -    if (!qemu_event_handle) {
> +    if (!qemu_aio_context) {
>          return;
>      }
> -    if (!SetEvent(qemu_event_handle)) {
> -        fprintf(stderr, "qemu_notify_event: SetEvent failed: %ld\n",
> -                GetLastError());
> -        exit(1);
> -    }
> +    aio_notify(qemu_aio_context);
>  }
>  
> -static int qemu_signal_init(void)
> -{
> -    return 0;
> -}
> -#endif
> -
> -static AioContext *qemu_aio_context;
> -
>  int main_loop_init(void)
>  {
>      int ret;
> @@ -213,12 +128,6 @@ int main_loop_init(void)
>          return ret;
>      }
>  
> -    /* Note eventfd must be drained before signalfd handlers run */
> -    ret = qemu_event_init();
> -    if (ret) {
> -        return ret;
> -    }
> -
>      qemu_aio_context = aio_context_new();
>      src = aio_get_g_source(qemu_aio_context);
>      g_source_attach(src, NULL);
> @@ -408,7 +317,8 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque)
>  
>  void qemu_fd_register(int fd)
>  {
> -    WSAEventSelect(fd, qemu_event_handle, FD_READ | FD_ACCEPT | FD_CLOSE |
> +    WSAEventSelect(fd, event_notifier_get_handle(&qemu_aio_context->notifier),
> +                   FD_READ | FD_ACCEPT | FD_CLOSE |
>                     FD_CONNECT | FD_WRITE | FD_OOB);
>  }
>  
> -- 
> 1.7.12

* Re: [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
@ 2012-09-25 22:11   ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-09-25 22:11 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Paolo Bonzini <pbonzini@redhat.com> writes:

> Some cleanups can now be made, now that the main loop does not anymore need
> hooks into the bottom half code.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  async.c       | 23 +++++++----------------
>  oslib-posix.c | 31 -------------------------------
>  qemu-aio.h    |  1 -
>  qemu-common.h |  1 -
>  4 files changed, 7 insertions(+), 49 deletions(-)
>
> diff --git a/async.c b/async.c
> index 31c6c76..5a96d11 100644
> --- a/async.c
> +++ b/async.c
> @@ -117,16 +117,20 @@ void qemu_bh_delete(QEMUBH *bh)
>      bh->deleted = 1;
>  }
>  
> -void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
> +static gboolean
> +aio_ctx_prepare(GSource *source, gint    *timeout)
>  {
> +    AioContext *ctx = (AioContext *) source;
>      QEMUBH *bh;
> +    bool scheduled = false;
>  
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
>          if (!bh->deleted && bh->scheduled) {
> +            scheduled = true;
>              if (bh->idle) {
>                  /* idle bottom halves will be polled at least
>                   * every 10ms */
> -                *timeout = MIN(10, *timeout);
> +                *timeout = 10;
>              } else {
>                  /* non-idle bottom halves will be executed
>                   * immediately */
> @@ -135,21 +139,8 @@ void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout)
>              }
>          }
>      }
> -}
> -
> -static gboolean
> -aio_ctx_prepare(GSource *source, gint    *timeout)
> -{
> -    AioContext *ctx = (AioContext *) source;
> -    uint32_t wait = -1;
> -    aio_bh_update_timeout(ctx, &wait);
> -
> -    if (wait != -1) {
> -        *timeout = MIN(*timeout, wait);
> -        return wait == 0;
> -    }
>  
> -    return FALSE;
> +    return scheduled;
>  }
>  
>  static gboolean
> diff --git a/oslib-posix.c b/oslib-posix.c
> index dbeb627..9db9c3d 100644
> --- a/oslib-posix.c
> +++ b/oslib-posix.c
> @@ -61,9 +61,6 @@ static int running_on_valgrind = -1;
>  #ifdef CONFIG_LINUX
>  #include <sys/syscall.h>
>  #endif
> -#ifdef CONFIG_EVENTFD
> -#include <sys/eventfd.h>
> -#endif
>  
>  int qemu_get_thread_id(void)
>  {
> @@ -183,34 +180,6 @@ int qemu_pipe(int pipefd[2])
>      return ret;
>  }
>  
> -/*
> - * Creates an eventfd that looks like a pipe and has EFD_CLOEXEC set.
> - */
> -int qemu_eventfd(int fds[2])
> -{
> -#ifdef CONFIG_EVENTFD
> -    int ret;
> -
> -    ret = eventfd(0, 0);
> -    if (ret >= 0) {
> -        fds[0] = ret;
> -        fds[1] = dup(ret);
> -        if (fds[1] == -1) {
> -            close(ret);
> -            return -1;
> -        }
> -        qemu_set_cloexec(ret);
> -        qemu_set_cloexec(fds[1]);
> -        return 0;
> -    }
> -    if (errno != ENOSYS) {
> -        return -1;
> -    }
> -#endif
> -
> -    return qemu_pipe(fds);
> -}
> -
>  int qemu_utimens(const char *path, const struct timespec *times)
>  {
>      struct timeval tv[2], tv_now;
> diff --git a/qemu-aio.h b/qemu-aio.h
> index 2354617..1b7eb6e 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -125,7 +125,6 @@ void aio_notify(AioContext *ctx);
>   * These are internal functions used by the QEMU main loop.
>   */
>  int aio_bh_poll(AioContext *ctx);
> -void aio_bh_update_timeout(AioContext *ctx, uint32_t *timeout);
>  
>  /**
>   * qemu_bh_schedule: Schedule a bottom half.
> diff --git a/qemu-common.h b/qemu-common.h
> index ac44657..1ea6ea3 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -219,7 +219,6 @@ ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
>      QEMU_WARN_UNUSED_RESULT;
>  
>  #ifndef _WIN32
> -int qemu_eventfd(int pipefd[2]);
>  int qemu_pipe(int pipefd[2]);
>  #endif
>  
> -- 
> 1.7.12

* Re: [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there
  2012-09-25 21:51   ` Anthony Liguori
@ 2012-09-26  6:30     ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26  6:30 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 25/09/2012 23:51, Anthony Liguori wrote:
>> >  typedef struct QEMUTimer QEMUTimer;
>> >  typedef struct QEMUFile QEMUFile;
>> > +typedef struct QEMUBH QEMUBH;
>> >  typedef struct DeviceState DeviceState;
>> >  
>> >  struct Monitor;
> Any reason to do this here vs. just #include "qemu-aio.h" in
> qemu-common.h?
> 
> I don't see an obvious dependency on qemu-common.h in qemu-aio.h other
> than this typedef.

I thought we were moving away from including everything in
qemu-common.h.  In fact, the only includes from QEMU in qemu-common.h are:

#ifdef _WIN32
#include "qemu-os-win32.h"
#endif

#ifdef CONFIG_POSIX
#include "qemu-os-posix.h"
#endif

#include "osdep.h"
#include "bswap.h"

#include "cpu.h"

where cpu.h could probably be removed---perhaps should.

Paolo

* Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-25 22:01   ` Anthony Liguori
@ 2012-09-26  6:36     ` Paolo Bonzini
  2012-09-26  6:48     ` Paolo Bonzini
  1 sibling, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26  6:36 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 26/09/2012 00:01, Anthony Liguori wrote:
>> > +        node->pfd.events = G_IO_ERR;
>> > +        node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
>> > +        node->pfd.events |= (io_write ? G_IO_OUT : 0);
>> >      }
> Should we even set G_IO_ERR?  I think that corresponds to exceptfd

No, that would be G_IO_PRI.

> in select() but we've never set that historically.  I know glib recommends
> it but I don't think it's applicable to how we use it.
> 
> Moreover, the way you do dispatch, if G_IO_ERR did occur, we'd dispatch
> both the read and write handlers which definitely isn't right.

I'm not sure what gives POLLERR.  Probably a connect() that fails, and
in that case dispatching on the write handler would be okay.  But I was
not sure, and calling both is safe: handlers have to be ready for
spurious wakeups anyway; this happens whenever qemu_aio_wait dispatches
from a VCPU thread before the main loop gets hold of the big lock.

> I think it's easiest just to drop it.

That's indeed the case, since the current code never sets either
G_IO_HUP or G_IO_ERR.

Paolo

* Re: [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors
  2012-09-25 22:09   ` Anthony Liguori
@ 2012-09-26  6:38     ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26  6:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 26/09/2012 00:09, Anthony Liguori wrote:
> What do you think about deprecating bottom halves in the !block code in
> favor of idle functions?  I don't see any reason to keep using bottom
> halves...

The ptimer.c code uses bottom halves internally.  Otherwise I'd agree.

Paolo

> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

* Re: [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources
  2012-09-25 22:06   ` Anthony Liguori
@ 2012-09-26  6:40     ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26  6:40 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 26/09/2012 00:06, Anthony Liguori wrote:
>> >          if (node) {
>> > +            g_source_remove_poll(&ctx->source, &node->pfd);
>> > +
> Why remove vs. setting events = 0?

Because otherwise you'd get a dangling pointer to node->pfd. :)

Paolo

> add_poll/remove_poll also comes with an event loop notify which I don't
> think is strictly necessary here.
> 

* Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-25 22:01   ` Anthony Liguori
  2012-09-26  6:36     ` Paolo Bonzini
@ 2012-09-26  6:48     ` Paolo Bonzini
  1 sibling, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26  6:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 26/09/2012 00:01, Anthony Liguori wrote:
> > +        revents = node->pfd.revents & node->pfd.events;
> > +        node->pfd.revents &= ~revents;
> 
> This is interesting and I must admit I don't understand why it's
> necessary.  What case are you trying to handle?

That's for the case where you got a write event for fd Y, but disabled
the write handler from the handler of fd X (called before fd Y).  But
what the current code does is just eat the event, so I can do the same
and set node->pfd.revents to 0.

Paolo

* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (16 preceding siblings ...)
  2012-09-25 12:56 ` [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers Paolo Bonzini
@ 2012-09-26 12:28 ` Kevin Wolf
  2012-09-26 13:32   ` Paolo Bonzini
  2012-10-08 11:39 ` Stefan Hajnoczi
  18 siblings, 1 reply; 72+ messages in thread
From: Kevin Wolf @ 2012-09-26 12:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 25.09.2012 14:55, Paolo Bonzini wrote:
> This series removes the globals from async.c/aio-posix.c so that
> multiple AIO contexts (mini event loops) can be added.  Right now,
> all block devices still use qemu_bh_new, but switching them to
> aio_bh_new would let you associate different files with different
> AIO contexts.
> 
> As an added bonus, integration with the glib main loop now happens
> via GSource.  Each AIO context is a GSource, which means you can
> choose either to run it in its own thread (this of course needs
> proper locking which is not yet here), or to attach it to the main
> thread.
> 
> In this state this is a bit of an academic exercise (though it works
> and may even make sense for 1.3), but I think it's an example of the
> tiny steps that can lead us towards an upstreamable version of the
> data-plane code.

Do you have a git tree where I could see what things would look like in
the end?

I wonder how this relates to my plans of getting rid of qemu_aio_flush()
and friends in favour of BlockDriver.bdrv_drain(). In fact, after
removing io_flush, I don't really see what makes AIO fd handlers special
any more.

qemu_aio_wait() only calls these handlers, but would it do any harm if
we called all fd handlers? And other than that it's just a small main
loop, so I guess it could share code with the real main loop.

So, considering this, any reason to let aio.c survive at all?

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
@ 2012-09-26 13:32   ` Paolo Bonzini
  2012-09-26 14:31     ` Kevin Wolf
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26 13:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Anthony Liguori

On 26/09/2012 14:28, Kevin Wolf wrote:
> Do you have a git tree where I could see what things would look like in
> the end?

I will push it to aio-context on git://github.com/bonzini/qemu.git as
soon as github comes back.

> I wonder how this relates to my plans of getting rid of qemu_aio_flush()
> and friends in favour of BlockDriver.bdrv_drain().

Mostly unrelated, I think.  The introduction of the non-blocking
aio_poll in this series might help implementing bdrv_drain, like this:

    blocking = false;
    while(bs has requests) {
        progress = aio_poll(aio context of bs, blocking);
        if (progress) {
            blocking = false;
            continue;
        }
        if (bs has throttled requests) {
            restart throttled requests
            blocking = false;
            continue;
        }

        /* No progress, must have been non-blocking.  We must wait.  */
        assert(!blocking);
        blocking = true;
    }

BTW, is it true that "bs->file has requests || bs->backing_hd has
requests" (or any other underlying file, like vmdk extents) implies "bs
has requests"?

> In fact, after removing io_flush, I don't really see what makes AIO
> fd handlers special any more.

Note that while the handlers aren't that special indeed, there is still
some magic because qemu_aio_wait() polls bottom halves.

> qemu_aio_wait() only calls these handlers, but would it do any harm if
> we called all fd handlers?

Unfortunately yes.  You could get re-entrant calls from the monitor
while a monitor command drains the AIO queue for example.

> And other than that it's just a small main
> loop, so I guess it could share code with the real main loop.

Yes, the nested (and blocking) event loops are ugly.  On the other hand,
it is even uglier to have hooks to call the main loop from aio.c (for
handlers) and vice versa (for bottom halves).  One of the points of the
series is to make AIO just another GSource, with the bottom half magic
and fdhandler hooks handled in a single place (async.c).

Moving towards separate-thread event processing for one or more
BlockDriverStates can be done both with GMainLoop + GSource, or with
aio_poll.  I haven't put much thought in this, but a thread doing "while
(aio_poll(ctx, false));" would look very much like Stefan's data-plane code.

Paolo


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-26 13:32   ` Paolo Bonzini
@ 2012-09-26 14:31     ` Kevin Wolf
  2012-09-26 15:48       ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Kevin Wolf @ 2012-09-26 14:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Anthony Liguori

On 26.09.2012 15:32, Paolo Bonzini wrote:
> On 26/09/2012 14:28, Kevin Wolf wrote:
>> Do you have a git tree where I could see what things would look like in
>> the end?
> 
> I will push it to aio-context on git://github.com/bonzini/qemu.git as
> soon as github comes back.
> 
>> I wonder how this relates to my plans of getting rid of qemu_aio_flush()
>> and friends in favour of BlockDriver.bdrv_drain().
> 
> Mostly unrelated, I think.  The introduction of the non-blocking
> aio_poll in this series might help implementing bdrv_drain, like this:
> 
>     blocking = false;
>     while(bs has requests) {
>         progress = aio_poll(aio context of bs, blocking);
>         if (progress) {
>             blocking = false;
>             continue;
>         }
>         if (bs has throttled requests) {
>             restart throttled requests
>             blocking = false;
>             continue;
>         }
> 
>         /* No progress, must have been non-blocking.  We must wait.  */
>         assert(!blocking);
>         blocking = true;
>     }

Yes, possibly.

> BTW, is it true that "bs->file has requests || bs->backing_hd has
> requests" (or any other underlying file, like vmdk extents) implies "bs
> has requests"?

I think each block driver is responsible for draining the requests that
it sent. This means that it will drain bs->file (because no one else
should directly go there) and in most cases also bs->backing_hd, but if
for example live commit has a request in flight that directly accesses
the backing file, I wouldn't expect that a block driver is required to
wait for the completion of this request.

>> In fact, after removing io_flush, I don't really see what makes AIO
>> fd handlers special any more.
> 
> Note that while the handlers aren't that special indeed, there is still
> some magic because qemu_aio_wait() polls bottom halves.

Do you mean the qemu_bh_poll() call? But the normal main loop does the
same, so I don't see what would be special about it.

>> qemu_aio_wait() only calls these handlers, but would it do any harm if
>> we called all fd handlers?
> 
> Unfortunately yes.  You could get re-entrant calls from the monitor
> while a monitor command drains the AIO queue for example.

Hm, that's true... Who's special here - is it the block layer or the
monitor? I'm not quite sure. If it's the monitor, maybe we should plan
to change that sometime when we have some spare time... ;-)

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-26 14:31     ` Kevin Wolf
@ 2012-09-26 15:48       ` Paolo Bonzini
  2012-09-27  7:11         ` Kevin Wolf
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-26 15:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Anthony Liguori

On 26/09/2012 16:31, Kevin Wolf wrote:

>>> In fact, after removing io_flush, I don't really see what makes AIO
>>> fd handlers special any more.
>>
>> Note that while the handlers aren't that special indeed, there is still
>> some magic because qemu_aio_wait() polls bottom halves.
> 
> Do you mean the qemu_bh_poll() call? But the normal main loop does the
> same, so I don't see what would be special about it.

That's an abstraction leakage, IMHO.  After this series the normal main
loop no longer needs to call bottom halves.

(Most usage of bottom halves in hw/* is pointless and also falls under
the category of leaked abstractions.  The other uses could also in
principle be called at the wrong time inside monitor commands.  Many
would be served better by a thread pool if it wasn't for our beloved big
lock).

>>> qemu_aio_wait() only calls these handlers, but would it do any harm if
>>> we called all fd handlers?
>>
>> Unfortunately yes.  You could get re-entrant calls from the monitor
>> while a monitor command drains the AIO queue for example.
> 
> Hm, that's true... Who's special here - is it the block layer or the
> monitor? I'm not quite sure. If it's the monitor, maybe we should plan
> to change that sometime when we have some spare time... ;-)

It feels like it's the monitor.  But I think in general it is better if
as little QEMU infrastructure as possible is used by the block layer,
because you end up with impossibly-knotted dependencies.  Using things
such as GSource to mediate between the block layer and everything else
is also better with an eye to libqblock.

Also, consider that under Windows there's a big difference: after this
series, qemu_aio_wait() only works with EventNotifiers, while
qemu_set_fd_handler2 only works with sockets.  Networked block drivers
are disabled for Windows by these patches, there's really no way to move
forward without sacrificing them.

Paolo


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-26 15:48       ` Paolo Bonzini
@ 2012-09-27  7:11         ` Kevin Wolf
  2012-09-27  7:43           ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Kevin Wolf @ 2012-09-27  7:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Anthony Liguori

On 26.09.2012 17:48, Paolo Bonzini wrote:
> On 26/09/2012 16:31, Kevin Wolf wrote:
> 
>>>> In fact, after removing io_flush, I don't really see what makes AIO
>>>> fd handlers special any more.
>>>
>>> Note that while the handlers aren't that special indeed, there is still
>>> some magic because qemu_aio_wait() polls bottom halves.
>>
>> Do you mean the qemu_bh_poll() call? But the normal main loop does the
>> same, so I don't see what would be special about it.
> 
> That's an abstraction leakage, IMHO.  After this series the normal main
> loop no longer needs to call bottom halves.

This is something that I find hard to believe. Bottom halves aren't an
invention of the block layer, but used throughout qemu.

> (Most usage of bottom halves in hw/* is pointless and also falls under
> the category of leaked abstractions.  The other uses could also in
> principle be called at the wrong time inside monitor commands.  Many
> would be served better by a thread pool if it wasn't for our beloved big
> lock).

Possibly, but with the current infrastructure, I'm almost sure that most
of them are needed and you can't directly call them. Nobody uses BHs
just for fun.

>>>> qemu_aio_wait() only calls these handlers, but would it do any harm if
>>>> we called all fd handlers?
>>>
>>> Unfortunately yes.  You could get re-entrant calls from the monitor
>>> while a monitor command drains the AIO queue for example.
>>
>> Hm, that's true... Who's special here - is it the block layer or the
>> monitor? I'm not quite sure. If it's the monitor, maybe we should plan
>> to change that sometime when we have some spare time... ;-)
> 
> It feels like it's the monitor.  But I think in general it is better if
> as little QEMU infrastructure as possible is used by the block layer,
> because you end up with impossibly-knotted dependencies.  Using things
> such as GSource to mediate between the block layer and everything else
> is also better with an eye to libqblock.

I guess my expectation was that if GSource is an improvement for AIO fd
handlers, it would also be an improvement for the rest of fd handlers.

It's well known that qemu as a whole suffers from the NIH syndrome, but
should we really start introducing another NIH wall between the block
layer and the rest of qemu?

> Also, consider that under Windows there's a big difference: after this
> series, qemu_aio_wait() only works with EventNotifiers, while
> qemu_set_fd_handler2 only works with sockets.  Networked block drivers
> are disabled for Windows by these patches, there's really no way to move
> forward without sacrificing them.

Is it really only networked block drivers that you lose this way?

Kevin


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-27  7:11         ` Kevin Wolf
@ 2012-09-27  7:43           ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-09-27  7:43 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Anthony Liguori

On 27/09/2012 09:11, Kevin Wolf wrote:
> On 26.09.2012 17:48, Paolo Bonzini wrote:
>> On 26/09/2012 16:31, Kevin Wolf wrote:
>>
>>>>> In fact, after removing io_flush, I don't really see what makes AIO
>>>>> fd handlers special any more.
>>>>
>>>> Note that while the handlers aren't that special indeed, there is still
>>>> some magic because qemu_aio_wait() polls bottom halves.
>>>
>>> Do you mean the qemu_bh_poll() call? But the normal main loop does the
>>> same, so I don't see what would be special about it.
>>
>> That's an abstraction leakage, IMHO.  After this series the normal main
>> loop no longer needs to call bottom halves.
> 
> This is something that I find hard to believe. Bottom halves aren't an
> invention of the block layer

Actually they are, they were introduced by commit 83f6409 (async file
I/O API, 2006-08-01).

> but used throughout qemu.

>> (Most usage of bottom halves in hw/* is pointless and also falls under
>> the category of leaked abstractions.  The other uses could also in
>> principle be called at the wrong time inside monitor commands.  Many
>> would be served better by a thread pool if it wasn't for our beloved big
>> lock).
> 
> Possibly, but with the current infrastructure, I'm almost sure that most
> of them are needed and you can't directly call them. Nobody uses BHs
> just for fun.

Most of them are for hw/ptimer.c and are useless wrappers for a
(callback, opaque) pair.  The others are useful, and not used for fun
indeed.

But here is how they typically behave: the VCPU triggers a bottom half,
which wakes up the iothread, which waits until the VCPU frees the global
mutex.  So they are really a shortcut for "do this as soon as we are
done with this subsystem".  If locking were more fine-grained, you might
as well wrap the bottom half handler with a lock/unlock pair, and move
the bottom halves to a thread pool.  It would allow multiple subsystems
to process their bottom halves in parallel, for example.

Bottom halves are more fundamental for AIO, see for example how they
extend the lifetime of AIOCBs.

>> It feels like it's the monitor.  But I think in general it is better if
>> as little QEMU infrastructure as possible is used by the block layer,
>> because you end up with impossibly-knotted dependencies.  Using things
>> such as GSource to mediate between the block layer and everything else
>> is also better with an eye to libqblock.
> 
> I guess my expectation was that if GSource is an improvement for AIO fd
> handlers, it would also be an improvement for the rest of fd handlers.

It would, but you would need a separate GSource, because of the
different Windows implementations for the two.

> It's well known that qemu as a whole suffers from the NIH syndrome, but
> should we really start introducing another NIH wall between the block
> layer and the rest of qemu?

I don't see it as a NIH wall.  By replacing
qemu_bh_update_timeout()+qemu_bh_poll() with a GSource, you use glib for
interoperability instead of ad hoc code.  Basing libqblock AIO support
on GSources would be quite the opposite of NIH, indeed.

>> Also, consider that under Windows there's a big difference: after this
>> series, qemu_aio_wait() only works with EventNotifiers, while
>> qemu_set_fd_handler2 only works with sockets.  Networked block drivers
>> are disabled for Windows by these patches, there's really no way to move
>> forward without sacrificing them.
> 
> Is it really only networked block drivers that you lose this way?

Yes, nothing else calls qemu_aio_set_fd_handler on sockets.  qemu-nbd
uses qemu_set_fd_handler2 so it should work.

Paolo


* Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
  2012-09-25 22:01   ` Anthony Liguori
@ 2012-09-29 11:28   ` Blue Swirl
  2012-10-01  6:40     ` Paolo Bonzini
  1 sibling, 1 reply; 72+ messages in thread
From: Blue Swirl @ 2012-09-29 11:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 25, 2012 at 12:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> This adds a GPollFD to each AioHandler.  It will then be possible to
> attach these GPollFDs to a GSource, and from there to the main loop.
> aio_wait examines the GPollFDs and avoids calling select() if any
> is set (similar to what it does if bottom halves are available).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  aio.c      | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  qemu-aio.h |  7 ++++++
>  2 files changed, 78 insertions(+), 11 deletions(-)
>
> diff --git a/aio.c b/aio.c
> index 95ad467..c848a9f 100644
> --- a/aio.c
> +++ b/aio.c
> @@ -20,7 +20,7 @@
>
>  struct AioHandler
>  {
> -    int fd;
> +    GPollFD pfd;
>      IOHandler *io_read;
>      IOHandler *io_write;
>      AioFlushHandler *io_flush;
> @@ -34,7 +34,7 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>      AioHandler *node;
>
>      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> -        if (node->fd == fd)
> +        if (node->pfd.fd == fd)

Forgot to run checkpatch.pl?

>              if (!node->deleted)
>                  return node;
>      }
> @@ -57,9 +57,10 @@ void aio_set_fd_handler(AioContext *ctx,
>      if (!io_read && !io_write) {
>          if (node) {
>              /* If the lock is held, just mark the node as deleted */
> -            if (ctx->walking_handlers)
> +            if (ctx->walking_handlers) {
>                  node->deleted = 1;
> -            else {
> +                node->pfd.revents = 0;
> +            } else {
>                  /* Otherwise, delete it for real.  We can't just mark it as
>                   * deleted because deleted nodes are only cleaned up after
>                   * releasing the walking_handlers lock.
> @@ -72,7 +73,7 @@ void aio_set_fd_handler(AioContext *ctx,
>          if (node == NULL) {
>              /* Alloc and insert if it's not already there */
>              node = g_malloc0(sizeof(AioHandler));
> -            node->fd = fd;
> +            node->pfd.fd = fd;
>              QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
>          }
>          /* Update handler with latest information */
> @@ -80,6 +81,10 @@ void aio_set_fd_handler(AioContext *ctx,
>          node->io_write = io_write;
>          node->io_flush = io_flush;
>          node->opaque = opaque;
> +
> +        node->pfd.events = G_IO_ERR;
> +        node->pfd.events |= (io_read ? G_IO_IN | G_IO_HUP : 0);
> +        node->pfd.events |= (io_write ? G_IO_OUT : 0);
>      }
>  }
>
> @@ -93,6 +98,25 @@ void aio_set_event_notifier(AioContext *ctx,
>                         (AioFlushHandler *)io_flush, notifier);
>  }
>
> +bool aio_pending(AioContext *ctx)
> +{
> +    AioHandler *node;
> +
> +    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> +        int revents;
> +
> +        revents = node->pfd.revents & node->pfd.events;
> +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +            return true;
> +        }
> +        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  bool aio_poll(AioContext *ctx, bool blocking)
>  {
>      static struct timeval tv0;
> @@ -114,6 +138,42 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          progress = true;
>      }
>
> +    /*
> +     * Then dispatch any pending callbacks from the GSource.
> +     *
> +     * We have to walk very carefully in case qemu_aio_set_fd_handler is
> +     * called while we're walking.
> +     */
> +    node = QLIST_FIRST(&ctx->aio_handlers);
> +    while (node) {
> +        AioHandler *tmp;
> +        int revents;
> +
> +        ctx->walking_handlers++;
> +
> +        revents = node->pfd.revents & node->pfd.events;
> +        node->pfd.revents &= ~revents;
> +
> +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> +            node->io_read(node->opaque);
> +            progress = true;
> +        }
> +        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> +            node->io_write(node->opaque);
> +            progress = true;
> +        }
> +
> +        tmp = node;
> +        node = QLIST_NEXT(node, node);
> +
> +        ctx->walking_handlers--;
> +
> +        if (!ctx->walking_handlers && tmp->deleted) {
> +            QLIST_REMOVE(tmp, node);
> +            g_free(tmp);
> +        }
> +    }
> +
>      if (progress && !blocking) {
>          return true;
>      }
> @@ -137,12 +197,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              busy = true;
>          }
>          if (!node->deleted && node->io_read) {
> -            FD_SET(node->fd, &rdfds);
> -            max_fd = MAX(max_fd, node->fd + 1);
> +            FD_SET(node->pfd.fd, &rdfds);
> +            max_fd = MAX(max_fd, node->pfd.fd + 1);
>          }
>          if (!node->deleted && node->io_write) {
> -            FD_SET(node->fd, &wrfds);
> -            max_fd = MAX(max_fd, node->fd + 1);
> +            FD_SET(node->pfd.fd, &wrfds);
> +            max_fd = MAX(max_fd, node->pfd.fd + 1);
>          }
>      }
>
> @@ -167,12 +227,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              ctx->walking_handlers++;
>
>              if (!node->deleted &&
> -                FD_ISSET(node->fd, &rdfds) &&
> +                FD_ISSET(node->pfd.fd, &rdfds) &&
>                  node->io_read) {
>                  node->io_read(node->opaque);
>              }
>              if (!node->deleted &&
> -                FD_ISSET(node->fd, &wrfds) &&
> +                FD_ISSET(node->pfd.fd, &wrfds) &&
>                  node->io_write) {
>                  node->io_write(node->opaque);
>              }
> diff --git a/qemu-aio.h b/qemu-aio.h
> index f19201e..ac24896 100644
> --- a/qemu-aio.h
> +++ b/qemu-aio.h
> @@ -133,6 +133,13 @@ void qemu_bh_delete(QEMUBH *bh);
>   * outstanding AIO operations have been completed or cancelled. */
>  void aio_flush(AioContext *ctx);
>
> +/* Return whether there are any pending callbacks from the GSource
> + * attached to the AioContext.
> + *
> + * This is used internally in the implementation of the GSource.
> + */
> +bool aio_pending(AioContext *ctx);
> +
>  /* Progress in completing AIO work to occur.  This can issue new pending
>   * aio as a result of executing I/O completion or bh callbacks.
>   *
> --
> 1.7.12
>
>
>


* Re: [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch
  2012-09-29 11:28   ` Blue Swirl
@ 2012-10-01  6:40     ` Paolo Bonzini
  0 siblings, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-01  6:40 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 29/09/2012 13:28, Blue Swirl wrote:
>> > +        if (node->pfd.fd == fd)
> Forgot to run checkpatch.pl?
> 

No, just ignored its result for an RFC.

Paolo


* Re: [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes
  2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
@ 2012-10-08  7:03   ` Stefan Hajnoczi
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Hajnoczi @ 2012-10-08  7:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 25, 2012 at 02:55:48PM +0200, Paolo Bonzini wrote:
>  int event_notifier_init(EventNotifier *e, int active)
>  {
> +    int fds[2];
> +    int ret;
> +
>  #ifdef CONFIG_EVENTFD
> -    int fd = eventfd(!!active, EFD_NONBLOCK | EFD_CLOEXEC);
> -    if (fd < 0)
> -        return -errno;
> -    e->fd = fd;
> -    return 0;
> +    ret = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
>  #else
> -    return -ENOSYS;
> +    ret = -1;
> +    errno = ENOSYS;
>  #endif
> +    if (ret >= 0) {
> +        e->rfd = e->wfd = ret;
> +    } else {
> +        if (errno != ENOSYS) {
> +            return -errno;
> +        }
> +        if (qemu_pipe(fds) < 0) {
> +            return -errno;
> +        }
> +        ret = fcntl_setfl(fds[0], O_NONBLOCK);
> +        if (ret < 0) {

ret = -errno;

> +            goto fail;
> +        }
> +        ret = fcntl_setfl(fds[1], O_NONBLOCK);
> +        if (ret < 0) {

ret = -errno;

> +            goto fail;
> +        }
> +        e->rfd = fds[0];
> +        e->wfd = fds[1];
> +    }
> +    if (active) {
> +        event_notifier_set(e);
> +    }
> +    return 0;
> +
> +fail:
> +    close(fds[0]);
> +    close(fds[1]);
> +    return ret;
>  }
>  
>  void event_notifier_cleanup(EventNotifier *e)
>  {
> -    close(e->fd);
> +    if (e->rfd != e->wfd) {
> +        close(e->rfd);
> +    }
> +    close(e->wfd);
>  }
>  
>  int event_notifier_get_fd(EventNotifier *e)
>  {
> -    return e->fd;
> +    return e->rfd;
>  }
>  
>  int event_notifier_set_handler(EventNotifier *e,
>                                 EventNotifierHandler *handler)
>  {
> -    return qemu_set_fd_handler(e->fd, (IOHandler *)handler, NULL, e);
> +    return qemu_set_fd_handler(e->rfd, (IOHandler *)handler, NULL, e);
>  }
>  
>  int event_notifier_set(EventNotifier *e)
>  {
> -    uint64_t value = 1;
> -    int r = write(e->fd, &value, sizeof(value));
> -    return r == sizeof(value);
> +    static const uint64_t value = 1;
> +    ssize_t ret;
> +
> +    do {
> +        ret = write(e->wfd, &value, sizeof(value));
> +    } while (ret < 0 && errno == EINTR);
> +
> +    /* EAGAIN is fine, a read must be pending.  */
> +    if (ret < 0 && errno != EAGAIN) {
> +        return -1;

If we're changing the return value representation, then we might as well:

return -errno;

But no caller actually checks the return value.


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
                   ` (17 preceding siblings ...)
  2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
@ 2012-10-08 11:39 ` Stefan Hajnoczi
  2012-10-08 13:00   ` Paolo Bonzini
  18 siblings, 1 reply; 72+ messages in thread
From: Stefan Hajnoczi @ 2012-10-08 11:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 25, 2012 at 02:55:46PM +0200, Paolo Bonzini wrote:
> This series removes the globals from async.c/aio-posix.c so that
> multiple AIO contexts (mini event loops) can be added.  Right now,
> all block devices still use qemu_bh_new, but switching them to
> aio_bh_new would let you associate different files with different
> AIO contexts.
> 
> As an added bonus, integration with the glib main loop now happens
> via GSource.  Each AIO context is a GSource, which means you can
> choose either to run it in its own thread (this of course needs
> proper locking which is not yet here), or to attach it to the main
> thread.
> 
> In this state this is a bit of an academic exercise (though it works
> and may even make sense for 1.3), but I think it's an example of the
> tiny steps that can lead us towards an upstreamable version of the
> data-plane code.
> 
> Paolo
> 
> Paolo Bonzini (17):
>   build: do not rely on indirect inclusion of qemu-config.h
>   event_notifier: enable it to use pipes
>   event_notifier: add Win32 implementation
>   aio: change qemu_aio_set_fd_handler to return void
>   aio: provide platform-independent API
>   aio: introduce AioContext, move bottom halves there
>   aio: add I/O handlers to the AioContext interface
>   aio: add non-blocking variant of aio_wait
>   aio: prepare for introducing GSource-based dispatch
>   aio: add Win32 implementation
>   aio: make AioContexts GSources
>   aio: add aio_notify
>   aio: call aio_notify after setting I/O handlers
>   main-loop: use GSource to poll AIO file descriptors
>   main-loop: use aio_notify for qemu_notify_event
>   aio: clean up now-unused functions
>   linux-aio: use event notifiers
> 
>  Makefile.objs                              |   6 +-
>  aio.c => aio-posix.c                       | 159 +++++++++++++++-------
>  aio.c => aio-win32.c                       | 190 ++++++++++++++------------
>  async.c                                    | 118 ++++++++++++++---
>  block/Makefile.objs                        |   6 +-
>  block/blkdebug.c                           |   1 +
>  block/iscsi.c                              |   1 +
>  event_notifier.c => event_notifier-posix.c |  83 +++++++++---
>  event_notifier.c => event_notifier-win32.c |  48 +++----
>  event_notifier.h                           |  20 ++-
>  hw/hw.h                                    |   1 +
>  iohandler.c                                |   1 +
>  linux-aio.c                                |  49 +++----
>  main-loop.c                                | 152 +++++++--------------
>  main-loop.h                                |  56 +-------
>  oslib-posix.c                              |  31 -----
>  qemu-aio.h                                 | 206 +++++++++++++++++++++++++++--
>  qemu-char.h                                |   1 +
>  qemu-common.h                              |   2 +-
>  qemu-config.h                              |   1 +
>  qemu-coroutine-lock.c                      |   2 +-
>  qemu-os-win32.h                            |   1 -
>  22 files changed, 702 insertions(+), 433 deletions(-)
>  copy aio.c => aio-posix.c (48%)
>  rename aio.c => aio-win32.c (44%)
>  copy event_notifier.c => event_notifier-posix.c (36%)
>  rename event_notifier.c => event_notifier-win32.c (49%)

This series looks useful - it compartmentalizes aio.c so there can be multiple
event loops.  In order to get a performance benefit (hooking up virtio-blk
ioeventfd to a non-QEMU mutex thread) we need two more things:

1. Block layer association with an AioContext (perhaps BlockDriverState <->
   AioContext attaching).

2. Thread pool for dispatching I/O requests outside the QEMU global mutex.

I'm starting to work on these steps and will send RFCs.  This series looks good
to me.

Stefan


* Re: [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts"
  2012-10-08 11:39 ` Stefan Hajnoczi
@ 2012-10-08 13:00   ` Paolo Bonzini
  2012-10-09  9:08     ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-08 13:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On 08/10/2012 13:39, Stefan Hajnoczi wrote:
> This series looks useful - it compartmentalizes aio.c so there can be multiple
> event loops.  In order to get a performance benefit (hooking up virtio-blk
> ioeventfd to a non-QEMU mutex thread) we need two more things:
> 
> 1. Block layer association with an AioContext (perhaps BlockDriverState <->
>    AioContext attaching).

Right.  This governs which AioContext the bottom halves are created on,
mostly.  Plus, all block devices along the ->file and ->backing_hd
chains need to belong to the same AioContext.

> 2. Thread pool for dispatching I/O requests outside the QEMU global mutex.

I looked at this in the past and it feels like a dead end to me.  I had
a lot of special code in the thread-pool to mimic yield/enter of
threadpool work-items.  It was needed mostly for I/O throttling, but
also because it feels unsafe to swap a CoMutex with a Mutex---the
waiting I/O operations can starve the threadpool.

I now think it is simpler to keep cooperative coroutine-based
multitasking in the general case.  At the same time you can ensure that
AIO formats (both Linux and posix-aio-compat) get a suitable
no-coroutine fast path in the common case of no copy-on-read, no
throttling, etc. -- which can be done in the current code too.

Another important step would be to add bdrv_drain.  Kevin pointed out to
me that only ->file and ->backing_hd need to be drained.  Well, there
may be other BlockDriverStates for vmdk extents or similar cases
(Benoit's quorum device for example)... these need to be handled the
same way for bdrv_flush, bdrv_reopen, bdrv_drain so perhaps it is useful
to add a common way to get them.

And you need a lock on the AioContext, too.  Then the block device can
use the AioContext lock in order to synchronize multiple threads working
on the block device.  The lock will effectively block the ioeventfd
thread, so that bdrv_lock+bdrv_drain+...+bdrv_unlock is a replacement
for the current usage of bdrv_drain_all within the QEMU lock.

> I'm starting to work on these steps and will send RFCs. This series
> looks good to me.

Thanks!  A lot of the next steps can be done in parallel and more
importantly none of them blocks the others (roughly)... so I'm eager to
look at your stuff! :)

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-08 13:00   ` Paolo Bonzini
@ 2012-10-09  9:08     ` Stefan Hajnoczi
  2012-10-09  9:26       ` Avi Kivity
  2012-10-09 15:02       ` Anthony Liguori
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Hajnoczi @ 2012-10-09  9:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, qemu-devel, Avi Kivity

On Mon, Oct 08, 2012 at 03:00:04PM +0200, Paolo Bonzini wrote:
> On 08/10/2012 13:39, Stefan Hajnoczi wrote:
> > 2. Thread pool for dispatching I/O requests outside the QEMU global mutex.
> 
> I looked at this in the past and it feels like a dead end to me.  I had
> a lot of special code in the thread-pool to mimic yield/enter of
> threadpool work-items.  It was needed mostly for I/O throttling, but
> also because it feels unsafe to swap a CoMutex with a Mutex---the
> waiting I/O operations can starve the threadpool.
> 
> I now think it is simpler to keep cooperative coroutine-based
> multitasking in the general case.  At the same time you can ensure that
> AIO formats (both Linux and posix-aio-compat) get a suitable
> no-coroutine fast path in the common case of no copy-on-read, no
> throttling, etc. -- which can be done in the current code too.

You're right.  Initially we can keep coroutines and add aio fastpaths.  There's
no need to make invasive block layer changes to coroutines -> threads yet.

> Another important step would be to add bdrv_drain.  Kevin pointed out to
> me that only ->file and ->backing_hd need to be drained.  Well, there
> may be other BlockDriverStates for vmdk extents or similar cases
> (Benoit's quorum device for example)... these need to be handled the
> same way for bdrv_flush, bdrv_reopen, bdrv_drain so perhaps it is useful
> to add a common way to get them.
> 
> And you need a lock on the AioContext, too.  Then the block device can
> use the AioContext lock in order to synchronize multiple threads working
> on the block device.  The lock will effectively block the ioeventfd
> thread, so that bdrv_lock+bdrv_drain+...+bdrv_unlock is a replacement
> for the current usage of bdrv_drain_all within the QEMU lock.
> 
> > I'm starting to work on these steps and will send RFCs. This series
> > looks good to me.
> 
> Thanks!  A lot of the next steps can be done in parallel and more
> importantly none of them blocks each other (roughly)... so I'm eager to
> look at your stuff! :)

Some notes on moving virtio-blk processing out of the QEMU global mutex:

1. Dedicated thread for non-QEMU mutex virtio ioeventfd processing.
   The point of this thread is to process without the QEMU global mutex, using
   only fine-grained locks.  (In the future this thread can be integrated back
   into the QEMU iothread when the global mutex has been eliminated.)

   Dedicated thread must hold reference to virtio-blk device so it will
   not be destroyed.  Hot unplug requires asking ioeventfd processing
   threads to release reference.

2. Versions of virtqueue_pop() and virtqueue_push() that execute outside
   global QEMU mutex.  Look at memory API and threaded device dispatch.

   The virtio device itself must have a lock so its vring-related state
   can be modified safely.

Here are the steps that have been mentioned:

1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
   request latency by skipping block layer coroutines?  This can be
   prototyped (hacked) easily to scope out how much benefit we get.  It's
   completely independent from the global mutex related work.

2. BlockDriverState <-> AioContext attach.  Allows I/O requests to be processed
   in event loops other than the QEMU iothread.

3. bdrv_drain() and BlockDriverState synchronization.  Make it safe to use
   BlockDriverState outside the QEMU mutex and ensure that bdrv_drain() works.

4. Unlocked event loop thread.  This is similar to QEMU's iothread except it
   doesn't take the big lock.  In theory we could have several of these threads
   processing at the same time.  virtio-blk ioeventfd processing will be done
   in this thread.

5. virtqueue_pop()/virtqueue_push() without QEMU global mutex.  Before this is
   implemented we could temporarily acquire/release the QEMU global mutex.

Let me run benchmarks on the aio fastpath.  I'm curious how much difference it
makes.

I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
although that might be blocked by the current work around MMIO/PIO dispatch
outside the global mutex.

Stefan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09  9:08     ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
@ 2012-10-09  9:26       ` Avi Kivity
  2012-10-09 10:36         ` Paolo Bonzini
  2012-10-09 15:02       ` Anthony Liguori
  1 sibling, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09  9:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Anthony Liguori, Ping Fan Liu, qemu-devel

On 10/09/2012 11:08 AM, Stefan Hajnoczi wrote:
> Here are the steps that have been mentioned:
> 
> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>    request latency by skipping block layer coroutines?  

Is coroutine overhead noticable?

> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
> although that might be blocked by the current work around MMIO/PIO dispatch
> outside the global mutex.

It is, yes.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09  9:26       ` Avi Kivity
@ 2012-10-09 10:36         ` Paolo Bonzini
  2012-10-09 10:52           ` Avi Kivity
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 10:36 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel

On 09/10/2012 11:26, Avi Kivity wrote:
> On 10/09/2012 11:08 AM, Stefan Hajnoczi wrote:
>> Here are the steps that have been mentioned:
>>
>> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>>    request latency by skipping block layer coroutines?  
> 
> Is coroutine overhead noticable?

I'm thinking more about throughput than latency.  If the iothread
becomes CPU-bound, then everything is noticeable.

>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
>> although that might be blocked by the current work around MMIO/PIO dispatch
>> outside the global mutex.
> 
> It is, yes.

It should only require unlocked memory map/unmap, not MMIO dispatch.
The MMIO/PIO bits are taken care of by ioeventfd.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 10:36         ` Paolo Bonzini
@ 2012-10-09 10:52           ` Avi Kivity
  2012-10-09 11:08             ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 10:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel

On 10/09/2012 12:36 PM, Paolo Bonzini wrote:
> On 09/10/2012 11:26, Avi Kivity wrote:
>> On 10/09/2012 11:08 AM, Stefan Hajnoczi wrote:
>>> Here are the steps that have been mentioned:
>>>
>>> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>>>    request latency by skipping block layer coroutines?  
>> 
>> Is coroutine overhead noticable?
> 
> I'm thinking more about throughput than latency.  If the iothread
> becomes CPU-bound, then everything is noticeable.

That's not strictly a coroutine issue.  Switching to ordinary threads
may make the problem worse, since there will clearly be contention.

What is the I/O processing time we have?  If it's say 10 microseconds,
then we'll have 100,000 context switches per second assuming a device
lock and a saturated iothread (split into multiple threads).

The coroutine work may have laid the groundwork for fine-grained
locking.  I'm doubtful we should use qcow when we want >100K IOPS though.

> 
>>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
>>> although that might be blocked by the current work around MMIO/PIO dispatch
>>> outside the global mutex.
>> 
>> It is, yes.
> 
> It should only require unlocked memory map/unmap, not MMIO dispatch.
> The MMIO/PIO bits are taken care of by ioeventfd.

The ring, or indirect descriptors, or the data, can all be on mmio.
IIRC the virtio spec forbids that, but the APIs have to be general.  We
don't have cpu_physical_memory_map_nommio() (or
address_space_map_nommio(), as soon as the coding style committee
ratifies struct literals).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 10:52           ` Avi Kivity
@ 2012-10-09 11:08             ` Paolo Bonzini
  2012-10-09 11:55               ` Avi Kivity
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 11:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel

On 09/10/2012 12:52, Avi Kivity wrote:
> On 10/09/2012 12:36 PM, Paolo Bonzini wrote:
>> On 09/10/2012 11:26, Avi Kivity wrote:
>>> On 10/09/2012 11:08 AM, Stefan Hajnoczi wrote:
>>>> Here are the steps that have been mentioned:
>>>>
>>>> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>>>>    request latency by skipping block layer coroutines?  
>>>
>>> Is coroutine overhead noticable?
>>
>> I'm thinking more about throughput than latency.  If the iothread
>> becomes CPU-bound, then everything is noticeable.
> 
> That's not strictly a coroutine issue.  Switching to ordinary threads
> may make the problem worse, since there will clearly be contention.

The point is you don't need either coroutines or userspace threads if
you use native AIO.  longjmp/setjmp is probably a smaller overhead
compared to the many syscalls involved in poll+eventfd
reads+io_submit+io_getevents, but it's also not cheap.  Also, if you
process AIO in batches you risk overflowing the pool of free coroutines,
which gets expensive real fast (allocate/free the stack, do the
expensive getcontext/swapcontext instead of the cheaper longjmp/setjmp,
etc.).

It seems better to sidestep the issue completely, it's a small amount of
work.

> What is the I/O processing time we have?  If it's say 10 microseconds,
> then we'll have 100,000 context switches per second assuming a device
> lock and a saturated iothread (split into multiple threads).

Hopefully with a saturated dedicated iothread you would not have any
context switches and a single CPU will be just dedicated to virtio
processing.

> The coroutine work may have laid the groundwork for fine-grained
> locking.  I'm doubtful we should use qcow when we want >100K IOPS though.

Yep.  Going away from coroutines is a solution in search of a problem,
it will introduce several new variables (kernel scheduling, more
expensive lock contention, starving the thread pool with locked threads,
...), all for a case where performance hardly matters.

>>>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
>>>> although that might be blocked by the current work around MMIO/PIO dispatch
>>>> outside the global mutex.
>>>
>>> It is, yes.
>>
>> It should only require unlocked memory map/unmap, not MMIO dispatch.
>> The MMIO/PIO bits are taken care of by ioeventfd.
> 
> The ring, or indirect descriptors, or the data, can all be on mmio.
> IIRC the virtio spec forbids that, but the APIs have to be general.  We
> don't have cpu_physical_memory_map_nommio() (or
> address_space_map_nommio(), as soon as the coding style committee
> ratifies struct literals).

cpu_physical_memory_map could still take the QEMU lock in the slow
bounce-buffer case.  BTW the block layer has been using struct literals
for a long time and we're just as happy as you are about them. :)

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 11:08             ` Paolo Bonzini
@ 2012-10-09 11:55               ` Avi Kivity
  2012-10-09 12:01                 ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 11:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, Ping Fan Liu, qemu-devel

On 10/09/2012 01:08 PM, Paolo Bonzini wrote:
>> 
>> That's not strictly a coroutine issue.  Switching to ordinary threads
>> may make the problem worse, since there will clearly be contention.
> 
> The point is you don't need either coroutines or userspace threads if
> you use native AIO.  longjmp/setjmp is probably a smaller overhead
> compared to the many syscalls involved in poll+eventfd
> reads+io_submit+io_getevents, but it's also not cheap.  Also, if you
> process AIO in batches you risk overflowing the pool of free coroutines,
> which gets expensive real fast (allocate/free the stack, do the
> expensive getcontext/swapcontext instead of the cheaper longjmp/setjmp,
> etc.).
> 
> It seems better to sidestep the issue completely, it's a small amount of
> work.

Oh, agreed 100%: raw + native aio wants to bypass coroutines/threads
completely.

>> What is the I/O processing time we have?  If it's say 10 microseconds,
>> then we'll have 100,000 context switches per second assuming a device
>> lock and a saturated iothread (split into multiple threads).
> 
> Hopefully with a saturated dedicated iothread you would not have any
> context switches and a single CPU will be just dedicated to virtio
> processing.

I meant, if you break that saturated thread into multiple threads (in
order to break the 1 core limit), then you start to context switch badly.

> 
>> The coroutine work may have laid the groundwork for fine-grained
>> locking.  I'm doubtful we should use qcow when we want >100K IOPS though.
> 
> Yep.  Going away from coroutines is a solution in search of a problem,
> it will introduce several new variables (kernel scheduling, more
> expensive lock contention, starving the thread pool with locked threads,
> ...), all for a case where performance hardly matters.
> 
>>>>> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
>>>>> although that might be blocked by the current work around MMIO/PIO dispatch
>>>>> outside the global mutex.
>>>>
>>>> It is, yes.
>>>
>>> It should only require unlocked memory map/unmap, not MMIO dispatch.
>>> The MMIO/PIO bits are taken care of by ioeventfd.
>> 
>> The ring, or indirect descriptors, or the data, can all be on mmio.
>> IIRC the virtio spec forbids that, but the APIs have to be general.  We
>> don't have cpu_physical_memory_map_nommio() (or
>> address_space_map_nommio(), as soon as the coding style committee
>> ratifies struct literals).
> 
> cpu_physical_memory_map could still take the QEMU lock in the slow
> bounce-buffer case.  

You're right.  In fact this is a good opportunity to introduce lockless
lookups where the only optimized path is RAM -- ioeventfd provides a
lockless lookup of its own.

We could perhaps even avoid refcounting, by shutting down the device
thread as part of hotunplug.

[could we also avoid refcounting by doing the equivalent of
stop_machine() during hotunplug?]

> BTW the block layer has been using struct literals
> for a long time and we're just as happy as you are about them. :)

So do upstream memory.c and the json tests.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 11:55               ` Avi Kivity
@ 2012-10-09 12:01                 ` Paolo Bonzini
  2012-10-09 12:18                   ` Jan Kiszka
                                     ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 12:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 09/10/2012 13:55, Avi Kivity wrote:
> Oh, agree 100% raw + native aio wants to bypass coroutines/threads
> completely.

Even posix-aio-compat can bypass coroutines.

> We could perhaps even avoid refcounting, by shutting down the device
> thread as part of hotunplug.

Yes, you "just" ask the thread to exit, join it, and don't hot-unplug
until it's done.

> [could we also avoid refcounting by doing the equivalent of
> stop_machine() during hotunplug?]

That's quite an interesting alternative.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 12:01                 ` Paolo Bonzini
@ 2012-10-09 12:18                   ` Jan Kiszka
  2012-10-09 12:28                     ` Avi Kivity
  2012-10-09 12:22                   ` Avi Kivity
  2012-10-09 14:05                   ` Stefan Hajnoczi
  2 siblings, 1 reply; 72+ messages in thread
From: Jan Kiszka @ 2012-10-09 12:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Avi Kivity

On 2012-10-09 14:01, Paolo Bonzini wrote:
> On 09/10/2012 13:55, Avi Kivity wrote:
>> Oh, agree 100% raw + native aio wants to bypass coroutines/threads
>> completely.
> 
> Even posix-aio-compat can bypass coroutines.
> 
>> We could perhaps even avoid refcounting, by shutting down the device
>> thread as part of hotunplug.
> 
> Yes, you "just" join the thread, ask it to exit, and not hot-unplug
> until it's done.
> 
>> [could we also avoid refcounting by doing the equivalent of
>> stop_machine() during hotunplug?]
> 
> That's quite an interesting alternative.

Not sure about the full context of this discussion, but I played with
"stop-machine" (pause_all_vcpus) recently. The problem is, at least ATM,
that it drops the BQL to wait for those threads, and that creates an
unexpected rescheduling point over which a lot of code stumbles. But if
this case here is about new, accordingly written code that is also not
called from unprepared corners, it may work.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 12:01                 ` Paolo Bonzini
  2012-10-09 12:18                   ` Jan Kiszka
@ 2012-10-09 12:22                   ` Avi Kivity
  2012-10-09 13:11                     ` Paolo Bonzini
  2012-10-09 14:05                   ` Stefan Hajnoczi
  2 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 12:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 10/09/2012 02:01 PM, Paolo Bonzini wrote:
> 
>> [could we also avoid refcounting by doing the equivalent of
>> stop_machine() during hotunplug?]
> 
> That's quite an interesting alternative.

It's somewhat unattractive in that we know how much stop_machine is
hated in Linux.  But maybe it makes sense as a transitional path.

Note it's not sufficient to stop vcpu threads, we also have to stop
non-vcpu threads that may be issuing address_space_rw() or family.

But no, it's actually impossible.  Hotplug may be triggered from a vcpu
thread, which clearly can't be stopped.  The only two solutions are
Ping's garbage collector thread or refcounting.

The original deadlock was:

 read_lock_rcu() / mmap_lock()
 lookup device
 dispatch
   device mmio handler
     memory_region_del_subregion()
       synchronize_rcu() / mmap_lock()
 rcu_read_unlock() / mmap_unlock

stop_machine() is just another name for synchronize_rcu() wrt locking.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 12:18                   ` Jan Kiszka
@ 2012-10-09 12:28                     ` Avi Kivity
  0 siblings, 0 replies; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 12:28 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Paolo Bonzini

On 10/09/2012 02:18 PM, Jan Kiszka wrote:
> Not sure about the full context of this discussion, but I played with
> "stop-machine" (pause_all_vcpus) recently. The problem is, at least ATM,
> that it drops the BQL to wait for those threads, and that creates an
> unexpected rescheduling point over which a lot of code stumbles. But if
> this case here is about new, accordingly written code that is also not
> called from unprepared corners, it may work.

In theory we could make qemu_mutex_lock() detect that it's being
stop_machine()d and loop there - stop_machine() would just keep the lock
held.  But that's not doable easily with pthreads (there is no
mutex_lock_interruptible() equivalent).

In any case I've cooled down on stop_machine(), see sibling reply.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 12:22                   ` Avi Kivity
@ 2012-10-09 13:11                     ` Paolo Bonzini
  2012-10-09 13:21                       ` Avi Kivity
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 13:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 09/10/2012 14:22, Avi Kivity wrote:
> On 10/09/2012 02:01 PM, Paolo Bonzini wrote:
>>
>>> [could we also avoid refcounting by doing the equivalent of
>>> stop_machine() during hotunplug?]
>>
>> That's quite an interesting alternative.
> 
> It's somewhat unattractive in that we know how much stop_machine is
> hated in Linux.  But maybe it makes sense as a transitional path.
> 
> Note it's not sufficient to stop vcpu threads, we also have to stop
> non-vcpu threads that may be issuing address_space_rw() or family.

Yes, we need some list of "guest workers", of which VCPUs are just one
example.

> But no, it's actually impossible.  Hotplug may be triggered from a vcpu
> thread, which clearly can't be stopped.

Hotplug should always be asynchronous (because that's how hardware
works), so it should always be possible to delegate the actual work to a
non-VCPU thread.  Or not?

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 13:11                     ` Paolo Bonzini
@ 2012-10-09 13:21                       ` Avi Kivity
  2012-10-09 13:50                         ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 13:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 10/09/2012 03:11 PM, Paolo Bonzini wrote:
>> But no, it's actually impossible.  Hotplug may be triggered from a vcpu
>> thread, which clearly can't be stopped.
> 
> Hotplug should always be asynchronous (because that's how hardware
> works), so it should always be possible to delegate the actual work to a
> non-VCPU thread.  Or not?

The actual device deletion can happen from a different thread, as long
as you isolate the device before.  That's part of the garbage collector
idea.

vcpu thread:
  rcu_read_lock
  lookup
  dispatch
    mmio handler
      isolate
      queue(delete_work)
  rcu_read_unlock

worker thread:
  process queue
    delete_work
      synchronize_rcu() / stop_machine()
      acquire qemu lock
      delete object
      drop qemu lock

Compared to the garbage collector idea, this drops fine-grained locking
for the qdev tree, a significant advantage.  But it still suffers from
dispatching inside the rcu critical section, which is something we want
to avoid.

I think refcounting is still the best direction, but maybe we can think
of a new idea here.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 13:21                       ` Avi Kivity
@ 2012-10-09 13:50                         ` Paolo Bonzini
  2012-10-09 14:24                           ` Avi Kivity
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 13:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 09/10/2012 15:21, Avi Kivity wrote:
> On 10/09/2012 03:11 PM, Paolo Bonzini wrote:
>>> But no, it's actually impossible.  Hotplug may be triggered from a vcpu
>>> thread, which clearly can't be stopped.
>>
>> Hotplug should always be asynchronous (because that's how hardware
>> works), so it should always be possible to delegate the actual work to a
>> non-VCPU thread.  Or not?
> 
> The actual device deletion can happen from a different thread, as long
> as you isolate the device before.  That's part of the garbage collector
> idea.
> 
> vcpu thread:
>   rcu_read_lock
>   lookup
>   dispatch
>     mmio handler
>       isolate
>       queue(delete_work)
>   rcu_read_unlock
> 
> worker thread:
>   process queue
>     delete_work
>       synchronize_rcu() / stop_machine()
>       acquire qemu lock
>       delete object
>       drop qemu lock
> 
> Compared to the garbage collector idea, this drops fine-grained locking
> for the qdev tree, a significant advantage.  But it still suffers from
> dispatching inside the rcu critical section, which is something we want
> to avoid.

But we are not Linux, and I think the tradeoffs are different for RCU in
Linux vs. QEMU.

For CPUs in the kernel, running user code is just one way to get things
done; QEMU threads are much more event driven, and their whole purpose
is to either run the guest or sleep, until "something happens" (VCPU
exit or readable fd).  In other words, QEMU threads should be able to
stay most of the time in KVM_RUN or select() for any workload (to some
approximation).

Not just that: we do not need to minimize RCU critical sections, because
anyway we want to minimize the time spent in QEMU, period.

So I believe that to some approximation, in QEMU we can completely
ignore everything else, and behave as if threads were always under
rcu_read_lock(), except if in KVM_RUN/select.  KVM_RUN and select are
what Paul McKenney calls extended quiescent states, and in fact the
following mapping works:

    rcu_extended_quiesce_start()     -> rcu_read_unlock();
    rcu_extended_quiesce_end()       -> rcu_read_lock();
    rcu_read_lock/unlock()           -> nop

This in turn means that dispatching inside the RCU critical section is
not really bad.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 12:01                 ` Paolo Bonzini
  2012-10-09 12:18                   ` Jan Kiszka
  2012-10-09 12:22                   ` Avi Kivity
@ 2012-10-09 14:05                   ` Stefan Hajnoczi
  2 siblings, 0 replies; 72+ messages in thread
From: Stefan Hajnoczi @ 2012-10-09 14:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Jan Kiszka,
	qemu-devel, Avi Kivity, Asias He

On Tue, Oct 9, 2012 at 2:01 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 09/10/2012 13:55, Avi Kivity wrote:
>> Oh, agree 100% raw + native aio wants to bypass coroutines/threads
>> completely.
>
> Even posix-aio-compat can bypass coroutines.

Yes.  I'll send benchmarks to show how much overhead there is from the
block layer coroutines.

The reason I'm interested in the aio fastpath is because Asias
benchmarked lkvm vs qemu-kvm and found that QEMU's average latency and
variance are much higher.

Stefan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 13:50                         ` Paolo Bonzini
@ 2012-10-09 14:24                           ` Avi Kivity
  2012-10-09 14:35                             ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 14:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 10/09/2012 03:50 PM, Paolo Bonzini wrote:
> Il 09/10/2012 15:21, Avi Kivity ha scritto:
>> On 10/09/2012 03:11 PM, Paolo Bonzini wrote:
>>>> But no, it's actually impossible.  Hotplug may be triggered from a vcpu
>>>> thread, which clearly can't be stopped.

>>>
>>> Hotplug should always be asynchronous (because that's how hardware
>>> works), so it should always be possible to delegate the actual work to a
>>> non-VCPU thread.  Or not?
>> 
>> The actual device deletion can happen from a different thread, as long
>> as you isolate the device before.  That's part of the garbage collector
>> idea.
>> 
>> vcpu thread:
>>   rcu_read_lock
>>   lookup
>>   dispatch
>>     mmio handler
>>       isolate
>>       queue(delete_work)
>>   rcu_read_unlock
>> 
>> worker thread:
>>   process queue
>>     delete_work
>>       synchronize_rcu() / stop_machine()
>>       acquire qemu lock
>>       delete object
>>       drop qemu lock
>> 
>> Compared to the garbage collector idea, this drops fine-grained locking
>> for the qdev tree, a significant advantage.  But it still suffers from
>> dispatching inside the rcu critical section, which is something we want
>> to avoid.
> 
> But we are not Linux, and I think the tradeoffs are different for RCU in
> Linux vs. QEMU.
> 
> For CPUs in the kernel, running user code is just one way to get things
> done; QEMU threads are much more event driven, and their whole purpose
> is to either run the guest or sleep, until "something happens" (VCPU
> exit or readable fd).  In other words, QEMU threads should be able to
> stay most of the time in KVM_RUN or select() for any workload (to some
> approximation).

If you're streaming data (the saturated iothread from that other thread)
or live migrating or have a block job with fast storage, this isn't
necessarily true.  You could make sure each thread polls the rcu state
periodically though.

> Not just that: we do not need to minimize RCU critical sections, because
> anyway we want to minimize the time spent in QEMU, period.
> 
> So I believe that to some approximation, in QEMU we can completely
> ignore everything else, and behave as if threads were always under
> rcu_read_lock(), except if in KVM_RUN/select.  KVM_RUN and select are
> what Paul McKenney calls extended quiescent states, and in fact the
> following mapping works:
> 
>     rcu_extended_quiesce_start()     -> rcu_read_unlock();
>     rcu_extended_quiesce_end()       -> rcu_read_lock();
>     rcu_read_lock/unlock()           -> nop
> 
> This in turn means that dispatching inside the RCU critical section is
> not really bad.

I believe you still cannot synchronize_rcu() while in an rcu critical
section per the rcu documentation, even when lock/unlock map to nops.
Of course we can violate that and it wouldn't know a thing, but I prefer
to stick to the established pattern.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 14:24                           ` Avi Kivity
@ 2012-10-09 14:35                             ` Paolo Bonzini
  2012-10-09 14:41                               ` Avi Kivity
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 14:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

Il 09/10/2012 16:24, Avi Kivity ha scritto:
> > But we are not Linux, and I think the tradeoffs are different for RCU in
> > Linux vs. QEMU.
> > 
> > For CPUs in the kernel, running user code is just one way to get things
> > done; QEMU threads are much more event driven, and their whole purpose
> > is to either run the guest or sleep, until "something happens" (VCPU
> > exit or readable fd).  In other words, QEMU threads should be able to
> > stay most of the time in KVM_RUN or select() for any workload (to some
> > approximation).
> 
> If you're streaming data (the saturated iothread from that other thread)
> or live migrating or have a block job with fast storage, this isn't
> necessarily true.  You could make sure each thread polls the rcu state
> periodically though.

Yep, that was the approximation part.

>> > Not just that: we do not need to minimize RCU critical sections, because
>> > anyway we want to minimize the time spent in QEMU, period.
>> > 
>> > So I believe that to some approximation, in QEMU we can completely
>> > ignore everything else, and behave as if threads were always under
>> > rcu_read_lock(), except if in KVM_RUN/select.  KVM_RUN and select are
>> > what Paul McKenney calls extended quiescent states, and in fact the
>> > following mapping works:
>> > 
>> >     rcu_extended_quiesce_start()     -> rcu_read_unlock();
>> >     rcu_extended_quiesce_end()       -> rcu_read_lock();
>> >     rcu_read_lock/unlock()           -> nop
>> > 
>> > This in turn means that dispatching inside the RCU critical section is
>> > not really bad.
> I believe you still cannot synchronize_rcu() while in an rcu critical
> section per the rcu documentation, even when lock/unlock map to nops.

Right, what the userspace RCU library does is that synchronize_rcu()
also calls rcu_extended_quiesce_start/end() around the actual
synchronization, so that synchronize_rcu() does not wait for its own
grace period.

Instead of a complete nop, rcu_read_lock/unlock() can just write to a
thread-local variable if you want to assert that synchronize_rcu() is
not called within a critical section.  Probably a good idea.

> Of course we can violate that and it wouldn't know a thing, but I prefer
> to stick to the established pattern.

I wasn't suggesting that, just evaluating the different tradeoffs QEMU
could make.  Reference counting is complicated because it has to apply
to all objects used as opaques, and we're using things other than the
DeviceState as opaques in many cases.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 14:35                             ` Paolo Bonzini
@ 2012-10-09 14:41                               ` Avi Kivity
  0 siblings, 0 replies; 72+ messages in thread
From: Avi Kivity @ 2012-10-09 14:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel, Jan Kiszka

On 10/09/2012 04:35 PM, Paolo Bonzini wrote:

>> Of course we can violate that and it wouldn't know a thing, but I prefer
>> to stick to the established pattern.
> 
> I wasn't suggesting that, just evaluating the different tradeoffs QEMU
> could make.  Reference counting is complicated because it has to apply
> to all objects used as opaques, and we're using things other than the
> DeviceState as opaques in many cases.

In the last episode we had MemoryRegionOps::ref/unref() to work around that.

But yes, it's complicated, and we haven't started to deal with cycles yet.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09  9:08     ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
  2012-10-09  9:26       ` Avi Kivity
@ 2012-10-09 15:02       ` Anthony Liguori
  2012-10-09 15:06         ` Paolo Bonzini
  2012-10-11 12:28         ` Kevin Wolf
  1 sibling, 2 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-10-09 15:02 UTC (permalink / raw)
  To: Stefan Hajnoczi, Paolo Bonzini
  Cc: Kevin Wolf, Ping Fan Liu, qemu-devel, Avi Kivity

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, Oct 08, 2012 at 03:00:04PM +0200, Paolo Bonzini wrote:
>> Another important step would be to add bdrv_drain.  Kevin pointed out to
>> me that only ->file and ->backing_hd need to be drained.  Well, there
>> may be other BlockDriverStates for vmdk extents or similar cases
>> (Benoit's quorum device for example)... these need to be handled the
>> same way for bdrv_flush, bdrv_reopen, bdrv_drain so perhaps it is useful
>> to add a common way to get them.
>> 
>> And you need a lock to the AioContext, too.  Then the block device can
>> use the AioContext lock in order to synchronize multiple threads working
>> on the block device.  The lock will effectively block the ioeventfd
>> thread, so that bdrv_lock+bdrv_drain+...+bdrv_unlock is a replacement
>> for the current usage of bdrv_drain_all within the QEMU lock.
>> 
>> > I'm starting to work on these steps and will send RFCs. This series
>> > looks good to me.
>> 
>> Thanks!  A lot of the next steps can be done in parallel and more
>> importantly none of them blocks each other (roughly)... so I'm eager to
>> look at your stuff! :)
>
> Some notes on moving virtio-blk processing out of the QEMU global mutex:
>
> 1. Dedicated thread for non-QEMU mutex virtio ioeventfd processing.
>    The point of this thread is to process without the QEMU global mutex, using
>    only fine-grained locks.  (In the future this thread can be integrated back
>    into the QEMU iothread when the global mutex has been eliminated.)
>
>    Dedicated thread must hold reference to virtio-blk device so it will
>    not be destroyed.  Hot unplug requires asking ioeventfd processing
>    threads to release reference.
>
> 2. Versions of virtqueue_pop() and virtqueue_push() that execute outside
>    global QEMU mutex.  Look at memory API and threaded device dispatch.
>
>    The virtio device itself must have a lock so its vring-related state
>    can be modified safely.
>
> Here are the steps that have been mentioned:
>
> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>    request latency by skipping block layer coroutines?  This can be
>    prototyped (hacked) easily to scope out how much benefit we get.  It's
>    completely independent from the global mutex related work.

We've discussed previously about having an additional layer on top of
the block API.

One problem with the block API today is that it doesn't distinguish
between device access and internal access.  I think this is an
opportunity to introduce a device-only API.

In the very short term, I can imagine an aio fastpath that was only
implemented in terms of the device API.  We could have a slow path that
acquired the BQL.

Regards,

Anthony Liguori

>
> 2. BlockDriverState <-> AioContext attach.  Allows I/O requests to be processed
>    by in event loops other than the QEMU iothread.
>
> 3. bdrv_drain() and BlockDriverState synchronization.  Make it safe to use
>    BlockDriverState outside the QEMU mutex and ensure that bdrv_drain() works.
>
> 4. Unlocked event loop thread.  This is similar to QEMU's iothread except it
>    doesn't take the big lock.  In theory we could have several of these threads
>    processing at the same time.  virtio-blk ioeventfd processing will be done
>    in this thread.

I think we're reasonably close to being able to do this FWIW.

>
> 5. virtqueue_pop()/virtqueue_push() without QEMU global mutex.  Before this is
>    implemented we could temporarily acquire/release the QEMU global mutex.
>
> Let me run benchmarks on the aio fastpath.  I'm curious how much difference it
> makes.
>
> I'm also curious about virtqueue_pop()/virtqueue_push() outside the QEMU mutex
> although that might be blocked by the current work around MMIO/PIO dispatch
> outside the global mutex.
>
> Stefan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 15:02       ` Anthony Liguori
@ 2012-10-09 15:06         ` Paolo Bonzini
  2012-10-09 15:37           ` Anthony Liguori
  2012-10-11 12:28         ` Kevin Wolf
  1 sibling, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 15:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Il 09/10/2012 17:02, Anthony Liguori ha scritto:
> We've discussed previously about having an additional layer on top of
> the block API.
> 
> One problem with the block API today is that it doesn't distinguish
> between device access and internal access.  I think this is an
> opportunity to introduce a device-only API.

And what API would libqblock use?  I don't think this is a good idea
unless we can prove performance problems.

> In the very short term, I can imagine an aio fastpath that was only
> implemented in terms of the device API.  We could have a slow path that
> acquired the BQL.

Not sure I follow.

> > 4. Unlocked event loop thread.  This is similar to QEMU's iothread except it
> >    doesn't take the big lock.  In theory we could have several of these threads
> >    processing at the same time.  virtio-blk ioeventfd processing will be done
> >    in this thread.
> 
> I think we're reasonably close to being able to do this FWIW.

Yes, I'm resending this series with your comments addressed.  It strays
a bit between your territory (main-loop) and Kevin's, I'll let you two
sort it out.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 15:06         ` Paolo Bonzini
@ 2012-10-09 15:37           ` Anthony Liguori
  2012-10-09 16:26             ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-10-09 15:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 09/10/2012 17:02, Anthony Liguori ha scritto:
>> We've discussed previously about having an additional layer on top of
>> the block API.
>> 
>> One problem with the block API today is that it doesn't distinguish
>> between device access and internal access.  I think this is an
>> opportunity to introduce a device-only API.
>
> And what API would libqblock use?

!device-only API

> I don't think this is a good idea unless we can prove performance problems.

Let's separate out the two things.

A device specific API has some advantages.  I'm pretty sure it's a hard
requirement for Kemari (they need to use device I/O as quiescent
points).  It makes testing easier too (simpler interface).

But it also lets us do a short-term hack.  That's all I'm suggesting
here.  A short term fast path.

>> In the very short term, I can imagine an aio fastpath that was only
>> implemented in terms of the device API.  We could have a slow path that
>> acquired the BQL.
>
> Not sure I follow.

As long as the ioeventfd thread can acquire qemu_mutex in order to call
bdrv_* functions.  The new device-only API could do this under the
covers for everything but the linux-aio fast path initially.

That means that we can convert block devices to use the device-only API
across the board (provided we make BQL recursive).

It also means we get at least some of the benefits of data-plane in the
short term.

>> > 4. Unlocked event loop thread.  This is similar to QEMU's iothread except it
>> >    doesn't take the big lock.  In theory we could have several of these threads
>> >    processing at the same time.  virtio-blk ioeventfd processing will be done
>> >    in this thread.
>> 
>> I think we're reasonably close to being able to do this FWIW.
>
> Yes, I'm resending this series with your comments addressed.  It strays
> a bit between your territory (main-loop) and Kevin's, I'll let you two
> sort it out.

I was very happy with it, so as long as Kevin is willing to Ack it, I'm
happy to apply it.

Regards,

Anthony Liguori

>
> Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 15:37           ` Anthony Liguori
@ 2012-10-09 16:26             ` Paolo Bonzini
  2012-10-09 18:26               ` Anthony Liguori
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-09 16:26 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>> >> In the very short term, I can imagine an aio fastpath that was only
>>> >> implemented in terms of the device API.  We could have a slow path that
>>> >> acquired the BQL.
>> >
>> > Not sure I follow.
> 
> As long as the ioeventfd thread can acquire qemu_mutex in order to call
> bdrv_* functions.  The new device-only API could do this under the
> covers for everything but the linux-aio fast path initially.

Ok, so it's about the locking.  I'm not even sure we need locking if we
have cooperative multitasking.  For example if bdrv_aio_readv/writev
is called from a VCPU thread, it can just schedule a bottom half for
itself in the appropriate AioContext.  Similarly for block jobs.

The only part where I'm not sure how it would work is bdrv_read/write,
because of the strange "qemu_aio_wait() calls select with a lock taken".
Maybe we can just forbid synchronous I/O if you set a non-default
AioContext.

This would be entirely hidden in the block layer.  For example the
following does it for bdrv_aio_readv/writev:

diff --git a/block.c b/block.c
index e95f613..7165e82 100644
--- a/block.c
+++ b/block.c
@@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
     .cancel             = bdrv_aio_co_cancel_em,
 };
 
-static void bdrv_co_em_bh(void *opaque)
-{
-    BlockDriverAIOCBCoroutine *acb = opaque;
-
-    acb->common.cb(acb->common.opaque, acb->req.error);
-    qemu_bh_delete(acb->bh);
-    qemu_aio_release(acb);
-}
-
 /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
 static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
@@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov, 0);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    acb->common.cb(acb->common.opaque, acb->req.error);
+    qemu_aio_release(acb);
+}
+
+static void bdrv_co_em_bh(void *opaque)
+{
+    BlockDriverAIOCBCoroutine *acb = opaque;
+
+    qemu_bh_delete(acb->bh);
+    co = qemu_coroutine_create(bdrv_co_do_rw);
+    qemu_coroutine_enter(co, acb);
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
@@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     acb->req.qiov = qiov;
     acb->is_write = is_write;
 
-    co = qemu_coroutine_create(bdrv_co_do_rw);
-    qemu_coroutine_enter(co, acb);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    qemu_bh_schedule(acb->bh);
 
     return &acb->common;
 }


Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
would run outside the bottom half and provide the desired fast path.

Paolo

> That means that we can convert block devices to use the device-only API
> across the board (provided we make BQL recursive).
> 
> It also means we get at least some of the benefits of data-plane in the
> short term.

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 16:26             ` Paolo Bonzini
@ 2012-10-09 18:26               ` Anthony Liguori
  2012-10-10  7:11                 ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-10-09 18:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>> >> In the very short term, I can imagine an aio fastpath that was only
>>>> >> implemented in terms of the device API.  We could have a slow path that
>>>> >> acquired the BQL.
>>> >
>>> > Not sure I follow.
>> 
>> As long as the ioeventfd thread can acquire qemu_mutex in order to call
>> bdrv_* functions.  The new device-only API could do this under the
>> covers for everything but the linux-aio fast path initially.
>
> Ok, so it's about the locking.  I'm not even sure we need locking if we
> have cooperative multitasking.  For example if bdrv_aio_readv/writev
> is called from a VCPU thread, it can just schedule a bottom half for
> itself in the appropriate AioContext.  Similarly for block jobs.


Okay, let's separate out the two issues here though.  One is whether we
need a device specific block API.  The second is whether we should short
cut to a fast path in the short term and go after a fully unlocked bdrv_
layer in the long(shortish?) term.

So let's talk about your proposal...

> The only part where I'm not sure how it would work is bdrv_read/write,
> because of the strange "qemu_aio_wait() calls select with a lock taken".
> Maybe we can just forbid synchronous I/O if you set a non-default
> AioContext.

Not sure how practical that is.  There is an awful lot of sync I/O still left.

> This would be entirely hidden in the block layer.  For example the
> following does it for bdrv_aio_readv/writev:
>
> diff --git a/block.c b/block.c
> index e95f613..7165e82 100644
> --- a/block.c
> +++ b/block.c
> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>      .cancel             = bdrv_aio_co_cancel_em,
>  };
>  
> -static void bdrv_co_em_bh(void *opaque)
> -{
> -    BlockDriverAIOCBCoroutine *acb = opaque;
> -
> -    acb->common.cb(acb->common.opaque, acb->req.error);
> -    qemu_bh_delete(acb->bh);
> -    qemu_aio_release(acb);
> -}
> -
>  /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>  static void coroutine_fn bdrv_co_do_rw(void *opaque)
>  {
> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>              acb->req.nb_sectors, acb->req.qiov, 0);
>      }
>  
> -    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
> -    qemu_bh_schedule(acb->bh);
> +    acb->common.cb(acb->common.opaque, acb->req.error);
> +    qemu_aio_release(acb);
> +}
> +
> +static void bdrv_co_em_bh(void *opaque)
> +{
> +    BlockDriverAIOCBCoroutine *acb = opaque;
> +
> +    qemu_bh_delete(acb->bh);
> +    co = qemu_coroutine_create(bdrv_co_do_rw);
> +    qemu_coroutine_enter(co, acb);
>  }
>  
>  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>      acb->req.qiov = qiov;
>      acb->is_write = is_write;
>  
> -    co = qemu_coroutine_create(bdrv_co_do_rw);
> -    qemu_coroutine_enter(co, acb);
> +    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
> +    qemu_bh_schedule(acb->bh);
>  
>      return &acb->common;
>  }
>
>
> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
> would run outside the bottom half and provide the desired fast path.

This works for some of the block layer I think.  How does this interact
with thread pools for AIO?

But this wouldn't work well with things like NBD or curl, right?  What's
the plan there?

Regards,

Anthony Liguori

>
> Paolo
>
>> That means that we can convert block devices to use the device-only API
>> across the board (provided we make BQL recursive).
>> 
>> It also means we get at least some of the benefits of data-plane in the
>> short term.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 18:26               ` Anthony Liguori
@ 2012-10-10  7:11                 ` Paolo Bonzini
  2012-10-10 12:25                   ` Anthony Liguori
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-10  7:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Il 09/10/2012 20:26, Anthony Liguori ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>>>>> In the very short term, I can imagine an aio fastpath that was only
>>>>>>> implemented in terms of the device API.  We could have a slow path that
>>>>>>> acquired the BQL.
>>>>>
>>>>> Not sure I follow.
>>>
>>> As long as the ioeventfd thread can acquire qemu_mutex in order to call
>>> bdrv_* functions.  The new device-only API could do this under the
>>> covers for everything but the linux-aio fast path initially.
>>
>> Ok, so it's about the locking.  I'm not even sure we need locking if we
>> have cooperative multitasking.  For example if bdrv_aio_readv/writev
>> is called from a VCPU thread, it can just schedule a bottom half for
>> itself in the appropriate AioContext.  Similarly for block jobs.
> 
> Okay, let's separate out the two issues here though.  One is whether we
> need a device specific block API.  The second is whether we should short
> cut to a fast path in the short term and go after a fully unlocked bdrv_
> layer in the long(shortish?) term.
> 
> So let's talk about your proposal...
> 
>> The only part where I'm not sure how it would work is bdrv_read/write,
>> because of the strange "qemu_aio_wait() calls select with a lock taken".
>> Maybe we can just forbid synchronous I/O if you set a non-default
>> AioContext.
> 
> Not sure how practical that is.  There is an awful lot of sync I/O still left.

Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock.
 The lock would be taken by:

- sync I/O ops

- monitor commands that currently call bdrv_drain_all

- aio_poll when calling bottom halves or handlers

The rest of the proposal however would stand (especially with reference
to block jobs).

I think we can proceed incrementally.  The first obvious step is to
s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the
CoQueue stuff), which would also help fix the qemu-char bug that Jan
reported.

>> This would be entirely hidden in the block layer.  For example the
>> following does it for bdrv_aio_readv/writev:
>>
>> diff --git a/block.c b/block.c
>> index e95f613..7165e82 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>>      .cancel             = bdrv_aio_co_cancel_em,
>>  };
>>  
>> -static void bdrv_co_em_bh(void *opaque)
>> -{
>> -    BlockDriverAIOCBCoroutine *acb = opaque;
>> -
>> -    acb->common.cb(acb->common.opaque, acb->req.error);
>> -    qemu_bh_delete(acb->bh);
>> -    qemu_aio_release(acb);
>> -}
>> -
>>  /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>>  static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>  {
>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>              acb->req.nb_sectors, acb->req.qiov, 0);
>>      }
>>  
>> -    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> -    qemu_bh_schedule(acb->bh);
>> +    acb->common.cb(acb->common.opaque, acb->req.error);
>> +    qemu_aio_release(acb);
>> +}
>> +
>> +static void bdrv_co_em_bh(void *opaque)
>> +{
>> +    BlockDriverAIOCBCoroutine *acb = opaque;
>> +
>> +    qemu_bh_delete(acb->bh);
>> +    co = qemu_coroutine_create(bdrv_co_do_rw);
>> +    qemu_coroutine_enter(co, acb);
>>  }
>>  
>>  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>      acb->req.qiov = qiov;
>>      acb->is_write = is_write;
>>  
>> -    co = qemu_coroutine_create(bdrv_co_do_rw);
>> -    qemu_coroutine_enter(co, acb);
>> +    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>> +    qemu_bh_schedule(acb->bh);
>>  
>>      return &acb->common;
>>  }
>>
>>
>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
>> would run outside the bottom half and provide the desired fast path.
> 
> This works for some of the block layer I think.  How does this interact
> with thread pools for AIO?
> 
> But this wouldn't work well with things like NBD or curl, right?  What's
> the plan there?

NBD uses coroutines; curl can use the non-unlocked
bdrv_aio_readv/writev.  In both cases they would execute in the
dataplane thread.  qcow2-over-raw would also execute its read/write code
entirely from the dataplane thread, for example.

Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-10  7:11                 ` Paolo Bonzini
@ 2012-10-10 12:25                   ` Anthony Liguori
  2012-10-10 13:31                     ` Paolo Bonzini
  0 siblings, 1 reply; 72+ messages in thread
From: Anthony Liguori @ 2012-10-10 12:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 09/10/2012 20:26, Anthony Liguori ha scritto:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>> Il 09/10/2012 17:37, Anthony Liguori ha scritto:
>>>>>>>> In the very short term, I can imagine an aio fastpath that was only
>>>>>>>> implemented in terms of the device API.  We could have a slow path that
>>>>>>>> acquired the BQL.
>>>>>>
>>>>>> Not sure I follow.
>>>>
>>>> As long as the ioeventfd thread can acquire qemu_mutex in order to call
>>>> bdrv_* functions.  The new device-only API could do this under the
>>>> covers for everything but the linux-aio fast path initially.
>>>
>>> Ok, so it's about the locking.  I'm not even sure we need locking if we
>>> have cooperative multitasking.  For example if bdrv_aio_readv/writev
>>> is called from a VCPU thread, it can just schedule a bottom half for
>>> itself in the appropriate AioContext.  Similarly for block jobs.
>> 
>> Okay, let's separate out the two issues here though.  One is whether we
>> need a device specific block API.  The second is whether we should short
>> cut to a fast path in the short term and go after a fully unlocked bdrv_
>> layer in the long(shortish?) term.
>> 
>> So let's talk about your proposal...
>> 
>>> The only part where I'm not sure how it would work is bdrv_read/write,
>>> because of the strange "qemu_aio_wait() calls select with a lock taken".
>>> Maybe we can just forbid synchronous I/O if you set a non-default
>>> AioContext.
>> 
>> Not sure how practical that is.  There is an awful lot of sync I/O still left.
>
> Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock.
>  The lock would be taken by:
>
> - sync I/O ops
>
> - monitor commands that currently call bdrv_drain_all
>
> - aio_poll when calling bottom halves or handlers
>
> The rest of the proposal however would stand (especially with reference
> to block jobs).
>
> I think we can proceed incrementally.  The first obvious step is to
> s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the
> CoQueue stuff), which would also help fixing the qemu-char bug that Jan
> reported.
>
>>> This would be entirely hidden in the block layer.  For example the
>>> following does it for bdrv_aio_readv/writev:
>>>
>>> diff --git a/block.c b/block.c
>>> index e95f613..7165e82 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
>>>      .cancel             = bdrv_aio_co_cancel_em,
>>>  };
>>>  
>>> -static void bdrv_co_em_bh(void *opaque)
>>> -{
>>> -    BlockDriverAIOCBCoroutine *acb = opaque;
>>> -
>>> -    acb->common.cb(acb->common.opaque, acb->req.error);
>>> -    qemu_bh_delete(acb->bh);
>>> -    qemu_aio_release(acb);
>>> -}
>>> -
>>>  /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
>>>  static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>>  {
>>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
>>>              acb->req.nb_sectors, acb->req.qiov, 0);
>>>      }
>>>  
>>> -    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>>> -    qemu_bh_schedule(acb->bh);
>>> +    acb->common.cb(acb->common.opaque, acb->req.error);
>>> +    qemu_aio_release(acb);
>>> +}
>>> +
>>> +static void bdrv_co_em_bh(void *opaque)
>>> +{
>>> +    BlockDriverAIOCBCoroutine *acb = opaque;
>>> +
>>> +    qemu_bh_delete(acb->bh);
>>> +    Coroutine *co = qemu_coroutine_create(bdrv_co_do_rw);
>>> +    qemu_coroutine_enter(co, acb);
>>>  }
>>>  
>>>  static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
>>>      acb->req.qiov = qiov;
>>>      acb->is_write = is_write;
>>>  
>>> -    co = qemu_coroutine_create(bdrv_co_do_rw);
>>> -    qemu_coroutine_enter(co, acb);
>>> +    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
>>> +    qemu_bh_schedule(acb->bh);
>>>  
>>>      return &acb->common;
>>>  }
>>>
>>>
>>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
>>> would run outside the bottom half and provide the desired fast path.
>> 
>> This works for some of the block layer I think.  How does this interact
>> with thread pools for AIO?
>> 
>> But this wouldn't work well with things like NBD or curl, right?  What's
>> the plan there?
>
> NBD uses coroutines; curl can use the non-unlocked
> bdrv_aio_readv/writev.  In both cases they would execute in the
> dataplane thread.  qcow2-over-raw would also execute its read/write code
> entirely from the dataplane thread, for example.

Does that mean that we'd stop processing the queue if we're waiting for
an I/O completion to handle meta data operations?

If that's the case, that probably will hurt performance overall.

Regards,

Anthony Liguori

>
> Paolo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-10 12:25                   ` Anthony Liguori
@ 2012-10-10 13:31                     ` Paolo Bonzini
  2012-10-10 14:44                       ` Anthony Liguori
  0 siblings, 1 reply; 72+ messages in thread
From: Paolo Bonzini @ 2012-10-10 13:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Il 10/10/2012 14:25, Anthony Liguori ha scritto:
>> > NBD uses coroutines; curl can use the non-unlocked
>> > bdrv_aio_readv/writev.  In both cases they would execute in the
>> > dataplane thread.  qcow2-over-raw would also execute its read/write code
>> > entirely from the dataplane thread, for example.
> Does that mean that we'd stop processing the queue if we're waiting for
> an I/O completion to handle meta data operations?
> 
> If that's the case, that probably will hurt performance overall.

From discussion on IRC it looked like this was ENOCAFFEINE. :)

Paolo


* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-10 13:31                     ` Paolo Bonzini
@ 2012-10-10 14:44                       ` Anthony Liguori
  0 siblings, 0 replies; 72+ messages in thread
From: Anthony Liguori @ 2012-10-10 14:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel, Avi Kivity

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 10/10/2012 14:25, Anthony Liguori ha scritto:
>>> > NBD uses coroutines; curl can use the non-unlocked
>>> > bdrv_aio_readv/writev.  In both cases they would execute in the
>>> > dataplane thread.  qcow2-over-raw would also execute its read/write code
>>> > entirely from the dataplane thread, for example.
>> Does that mean that we'd stop processing the queue if we're waiting for
>> an I/O completion to handle meta data operations?
>> 
>> If that's the case, that probably will hurt performance overall.
>
> From discussion on IRC it looked like this was ENOCAFFEINE. :)
>
> Paolo

Correct :-)

Regards,

Anthony Liguori


* Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts""
  2012-10-09 15:02       ` Anthony Liguori
  2012-10-09 15:06         ` Paolo Bonzini
@ 2012-10-11 12:28         ` Kevin Wolf
  1 sibling, 0 replies; 72+ messages in thread
From: Kevin Wolf @ 2012-10-11 12:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Avi Kivity, Ping Fan Liu, qemu-devel, Paolo Bonzini

Am 09.10.2012 17:02, schrieb Anthony Liguori:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> 
>> On Mon, Oct 08, 2012 at 03:00:04PM +0200, Paolo Bonzini wrote:
>>> Another important step would be to add bdrv_drain.  Kevin pointed out to
>>> me that only ->file and ->backing_hd need to be drained.  Well, there
>>> may be other BlockDriverStates for vmdk extents or similar cases
>>> (Benoit's quorum device for example)... these need to be handled the
>>> same way for bdrv_flush, bdrv_reopen, bdrv_drain so perhaps it is useful
>>> to add a common way to get them.
>>>
>>> And you need a lock to the AioContext, too.  Then the block device can
>>> use the AioContext lock in order to synchronize multiple threads working
>>> on the block device.  The lock will effectively block the ioeventfd
>>> thread, so that bdrv_lock+bdrv_drain+...+bdrv_unlock is a replacement
>>> for the current usage of bdrv_drain_all within the QEMU lock.
>>>
>>>> I'm starting to work on these steps and will send RFCs. This series
>>>> looks good to me.
>>>
>>> Thanks!  A lot of the next steps can be done in parallel and more
>>> importantly none of them blocks each other (roughly)... so I'm eager to
>>> look at your stuff! :)
>>
>> Some notes on moving virtio-blk processing out of the QEMU global mutex:
>>
>> 1. Dedicated thread for non-QEMU mutex virtio ioeventfd processing.
>>    The point of this thread is to process without the QEMU global mutex, using
>>    only fine-grained locks.  (In the future this thread can be integrated back
>>    into the QEMU iothread when the global mutex has been eliminated.)
>>
>>    Dedicated thread must hold reference to virtio-blk device so it will
>>    not be destroyed.  Hot unplug requires asking ioeventfd processing
>>    threads to release reference.
>>
>> 2. Versions of virtqueue_pop() and virtqueue_push() that execute outside
>>    global QEMU mutex.  Look at memory API and threaded device dispatch.
>>
>>    The virtio device itself must have a lock so its vring-related state
>>    can be modified safely.
>>
>> Here are the steps that have been mentioned:
>>
>> 1. aio fastpath - for raw-posix and other aio block drivers, can we reduce I/O
>>    request latency by skipping block layer coroutines?  This can be
>>    prototyped (hacked) easily to scope out how much benefit we get.  It's
>>    completely independent from the global mutex related work.
> 
> We've discussed previously about having an additional layer on top of
> the block API.
> 
> One problem with the block API today is that it doesn't distinguish
> between device access and internal access.  I think this is an
> opportunity to introduce a device-only API.
> 
> In the very short term, I can imagine an aio fastpath that was only
> implemented in terms of the device API.  We could have a slow path that
> acquired the BQL.

FWIW, I think we'll automatically get two APIs with the
BlockDriverState/BlockBackend separation. However, I'm not entirely sure
if it's exactly the thing you're imagining, because BlockBackend (the
"device API") wouldn't only be used by devices, but also by qemu-img/io,
libqblock and probably block jobs, too.

Kevin


end of thread, other threads:[~2012-10-11 12:29 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
2012-09-25 12:55 [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 01/17] build: do not rely on indirect inclusion of qemu-config.h Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 02/17] event_notifier: enable it to use pipes Paolo Bonzini
2012-10-08  7:03   ` Stefan Hajnoczi
2012-09-25 12:55 ` [Qemu-devel] [PATCH 03/17] event_notifier: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 04/17] aio: change qemu_aio_set_fd_handler to return void Paolo Bonzini
2012-09-25 21:47   ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 05/17] aio: provide platform-independent API Paolo Bonzini
2012-09-25 21:48   ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 06/17] aio: introduce AioContext, move bottom halves there Paolo Bonzini
2012-09-25 21:51   ` Anthony Liguori
2012-09-26  6:30     ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 07/17] aio: add I/O handlers to the AioContext interface Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 08/17] aio: add non-blocking variant of aio_wait Paolo Bonzini
2012-09-25 21:56   ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 09/17] aio: prepare for introducing GSource-based dispatch Paolo Bonzini
2012-09-25 22:01   ` Anthony Liguori
2012-09-26  6:36     ` Paolo Bonzini
2012-09-26  6:48     ` Paolo Bonzini
2012-09-29 11:28   ` Blue Swirl
2012-10-01  6:40     ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 10/17] aio: add Win32 implementation Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 11/17] aio: make AioContexts GSources Paolo Bonzini
2012-09-25 22:06   ` Anthony Liguori
2012-09-26  6:40     ` Paolo Bonzini
2012-09-25 12:55 ` [Qemu-devel] [PATCH 12/17] aio: add aio_notify Paolo Bonzini
2012-09-25 22:07   ` Anthony Liguori
2012-09-25 12:55 ` [Qemu-devel] [PATCH 13/17] aio: call aio_notify after setting I/O handlers Paolo Bonzini
2012-09-25 22:07   ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 14/17] main-loop: use GSource to poll AIO file descriptors Paolo Bonzini
2012-09-25 22:09   ` Anthony Liguori
2012-09-26  6:38     ` Paolo Bonzini
2012-09-25 12:56 ` [Qemu-devel] [PATCH 15/17] main-loop: use aio_notify for qemu_notify_event Paolo Bonzini
2012-09-25 22:10   ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 16/17] aio: clean up now-unused functions Paolo Bonzini
2012-09-25 22:11   ` Anthony Liguori
2012-09-25 12:56 ` [Qemu-devel] [PATCH 17/17] linux-aio: use event notifiers Paolo Bonzini
2012-09-26 12:28 ` [Qemu-devel] [RFC PATCH 00/17] Support for multiple "AIO contexts" Kevin Wolf
2012-09-26 13:32   ` Paolo Bonzini
2012-09-26 14:31     ` Kevin Wolf
2012-09-26 15:48       ` Paolo Bonzini
2012-09-27  7:11         ` Kevin Wolf
2012-09-27  7:43           ` Paolo Bonzini
2012-10-08 11:39 ` Stefan Hajnoczi
2012-10-08 13:00   ` Paolo Bonzini
2012-10-09  9:08     ` [Qemu-devel] Block I/O outside the QEMU global mutex was "Re: [RFC PATCH 00/17] Support for multiple "AIO contexts"" Stefan Hajnoczi
2012-10-09  9:26       ` Avi Kivity
2012-10-09 10:36         ` Paolo Bonzini
2012-10-09 10:52           ` Avi Kivity
2012-10-09 11:08             ` Paolo Bonzini
2012-10-09 11:55               ` Avi Kivity
2012-10-09 12:01                 ` Paolo Bonzini
2012-10-09 12:18                   ` Jan Kiszka
2012-10-09 12:28                     ` Avi Kivity
2012-10-09 12:22                   ` Avi Kivity
2012-10-09 13:11                     ` Paolo Bonzini
2012-10-09 13:21                       ` Avi Kivity
2012-10-09 13:50                         ` Paolo Bonzini
2012-10-09 14:24                           ` Avi Kivity
2012-10-09 14:35                             ` Paolo Bonzini
2012-10-09 14:41                               ` Avi Kivity
2012-10-09 14:05                   ` Stefan Hajnoczi
2012-10-09 15:02       ` Anthony Liguori
2012-10-09 15:06         ` Paolo Bonzini
2012-10-09 15:37           ` Anthony Liguori
2012-10-09 16:26             ` Paolo Bonzini
2012-10-09 18:26               ` Anthony Liguori
2012-10-10  7:11                 ` Paolo Bonzini
2012-10-10 12:25                   ` Anthony Liguori
2012-10-10 13:31                     ` Paolo Bonzini
2012-10-10 14:44                       ` Anthony Liguori
2012-10-11 12:28         ` Kevin Wolf
