* [Qemu-devel] [PULL v2 0/7] Block patches
@ 2015-11-09 10:08 Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read Stefan Hajnoczi
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Stefan Hajnoczi

The following changes since commit c4a7bf54e588ff2de9fba0898b686156251b2063:

  Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging (2015-11-07 19:55:15 +0000)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 84aa0140dd4f04f5d993f0db14c40da8d3de2706:

  blockdev: acquire AioContext in hmp_commit() (2015-11-09 10:07:10 +0000)

----------------------------------------------------------------

----------------------------------------------------------------

Denis V. Lunev (1):
  monitor: add missed aio_context_acquire into vm_completion call

Fam Zheng (3):
  aio: Introduce aio_external_disabled
  aio: Introduce aio_context_setup
  aio: Introduce aio-epoll.c

Michael S. Tsirkin (2):
  dataplane: simplify indirect descriptor read
  dataplane: support non-contigious s/g

Stefan Hajnoczi (1):
  blockdev: acquire AioContext in hmp_commit()

 aio-posix.c                 | 188 +++++++++++++++++++++++++++++++++++++++++++-
 aio-win32.c                 |   4 +
 async.c                     |  13 ++-
 blockdev.c                  |  12 ++-
 hw/virtio/dataplane/vring.c |  96 ++++++++++++++--------
 include/block/aio.h         |  24 ++++++
 monitor.c                   |  11 ++-
 7 files changed, 309 insertions(+), 39 deletions(-)

-- 
2.5.0


* [Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 2/7] dataplane: support non-contigious s/g Stefan Hajnoczi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Stefan Hajnoczi, Michael S. Tsirkin

From: "Michael S. Tsirkin" <mst@redhat.com>

Use address_space_read to make sure we handle the case of an indirect
descriptor crossing a DIMM boundary correctly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1446047243-3221-1-git-send-email-mst@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/virtio/dataplane/vring.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 68f1994..0b92fcf 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -257,6 +257,21 @@ static void copy_in_vring_desc(VirtIODevice *vdev,
     host->next = virtio_lduw_p(vdev, &guest->next);
 }
 
+static bool read_vring_desc(VirtIODevice *vdev,
+                            hwaddr guest,
+                            struct vring_desc *host)
+{
+    if (address_space_read(&address_space_memory, guest, MEMTXATTRS_UNSPECIFIED,
+                           (uint8_t *)host, sizeof *host)) {
+        return false;
+    }
+    host->addr = virtio_tswap64(vdev, host->addr);
+    host->len = virtio_tswap32(vdev, host->len);
+    host->flags = virtio_tswap16(vdev, host->flags);
+    host->next = virtio_tswap16(vdev, host->next);
+    return true;
+}
+
 /* This is stolen from linux/drivers/vhost/vhost.c. */
 static int get_indirect(VirtIODevice *vdev, Vring *vring,
                         VirtQueueElement *elem, struct vring_desc *indirect)
@@ -284,23 +299,16 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
     }
 
     do {
-        struct vring_desc *desc_ptr;
-        MemoryRegion *mr;
-
         /* Translate indirect descriptor */
-        desc_ptr = vring_map(&mr,
-                             indirect->addr + found * sizeof(desc),
-                             sizeof(desc), false);
-        if (!desc_ptr) {
-            error_report("Failed to map indirect descriptor "
+        if (!read_vring_desc(vdev, indirect->addr + found * sizeof(desc),
+                             &desc)) {
+            error_report("Failed to read indirect descriptor "
                          "addr %#" PRIx64 " len %zu",
                          (uint64_t)indirect->addr + found * sizeof(desc),
                          sizeof(desc));
             vring->broken = true;
             return -EFAULT;
         }
-        copy_in_vring_desc(vdev, desc_ptr, &desc);
-        memory_region_unref(mr);
 
         /* Ensure descriptor has been loaded before accessing fields */
         barrier(); /* read_barrier_depends(); */
-- 
2.5.0


* [Qemu-devel] [PULL v2 2/7] dataplane: support non-contigious s/g
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 3/7] aio: Introduce aio_external_disabled Stefan Hajnoczi
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Stefan Hajnoczi, Michael S. Tsirkin

From: "Michael S. Tsirkin" <mst@redhat.com>

vring_map currently fails if one of the entries it's mapping is
contiguous in GPA but not in HVA address space.  Introduce a mapped_len
parameter so it can handle this, returning the actual mapped length.

This will still fail if there's no space left in the sg, but luckily the
max queue size in use is currently 256, while the max sg size is 1024, so
we should be OK even if all entries happen to cross a single DIMM boundary.

This won't work well with very small DIMM sizes, unfortunately: e.g. it
will fail with 4K DIMMs, where a single request might span a large number
of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1446047243-3221-2-git-send-email-mst@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/virtio/dataplane/vring.c | 68 ++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 22 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 0b92fcf..23f667e 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -25,15 +25,30 @@
 
 /* vring_map can be coupled with vring_unmap or (if you still have the
  * value returned in *mr) memory_region_unref.
+ * Returns NULL on failure.
+ * Callers that can handle a partial mapping must supply mapped_len pointer to
+ * get the actual length mapped.
+ * Passing mapped_len == NULL requires either a full mapping or a failure.
  */
-static void *vring_map(MemoryRegion **mr, hwaddr phys, hwaddr len,
+static void *vring_map(MemoryRegion **mr, hwaddr phys,
+                       hwaddr len, hwaddr *mapped_len,
                        bool is_write)
 {
     MemoryRegionSection section = memory_region_find(get_system_memory(), phys, len);
+    uint64_t size;
 
-    if (!section.mr || int128_get64(section.size) < len) {
+    if (!section.mr) {
         goto out;
     }
+
+    size = int128_get64(section.size);
+    assert(size);
+
+    /* Passing mapped_len == NULL requires either a full mapping or a failure. */
+    if (!mapped_len && size < len) {
+        goto out;
+    }
+
     if (is_write && section.readonly) {
         goto out;
     }
@@ -46,6 +61,10 @@ static void *vring_map(MemoryRegion **mr, hwaddr phys, hwaddr len,
         goto out;
     }
 
+    if (mapped_len) {
+        *mapped_len = MIN(size, len);
+    }
+
     *mr = section.mr;
     return memory_region_get_ram_ptr(section.mr) + section.offset_within_region;
 
@@ -78,7 +97,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
     addr = virtio_queue_get_desc_addr(vdev, n);
     size = virtio_queue_get_desc_size(vdev, n);
     /* Map the descriptor area as read only */
-    ptr = vring_map(&vring->mr_desc, addr, size, false);
+    ptr = vring_map(&vring->mr_desc, addr, size, NULL, false);
     if (!ptr) {
         error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring desc "
                      "at 0x%" HWADDR_PRIx,
@@ -92,7 +111,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
     /* Add the size of the used_event_idx */
     size += sizeof(uint16_t);
     /* Map the driver area as read only */
-    ptr = vring_map(&vring->mr_avail, addr, size, false);
+    ptr = vring_map(&vring->mr_avail, addr, size, NULL, false);
     if (!ptr) {
         error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring avail "
                      "at 0x%" HWADDR_PRIx,
@@ -106,7 +125,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
     /* Add the size of the avail_event_idx */
     size += sizeof(uint16_t);
     /* Map the device area as read-write */
-    ptr = vring_map(&vring->mr_used, addr, size, true);
+    ptr = vring_map(&vring->mr_used, addr, size, NULL, true);
     if (!ptr) {
         error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring used "
                      "at 0x%" HWADDR_PRIx,
@@ -206,6 +225,7 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
     struct iovec *iov;
     hwaddr *addr;
     MemoryRegion *mr;
+    hwaddr len;
 
     if (desc->flags & VRING_DESC_F_WRITE) {
         num = &elem->in_num;
@@ -224,26 +244,30 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
         }
     }
 
-    /* Stop for now if there are not enough iovecs available. */
-    if (*num >= VIRTQUEUE_MAX_SIZE) {
-        error_report("Invalid SG num: %u", *num);
-        return -EFAULT;
-    }
+    while (desc->len) {
+        /* Stop for now if there are not enough iovecs available. */
+        if (*num >= VIRTQUEUE_MAX_SIZE) {
+            error_report("Invalid SG num: %u", *num);
+            return -EFAULT;
+        }
+
+        iov->iov_base = vring_map(&mr, desc->addr, desc->len, &len,
+                                  desc->flags & VRING_DESC_F_WRITE);
+        if (!iov->iov_base) {
+            error_report("Failed to map descriptor addr %#" PRIx64 " len %u",
+                         (uint64_t)desc->addr, desc->len);
+            return -EFAULT;
+        }
 
-    /* TODO handle non-contiguous memory across region boundaries */
-    iov->iov_base = vring_map(&mr, desc->addr, desc->len,
-                              desc->flags & VRING_DESC_F_WRITE);
-    if (!iov->iov_base) {
-        error_report("Failed to map descriptor addr %#" PRIx64 " len %u",
-                     (uint64_t)desc->addr, desc->len);
-        return -EFAULT;
+        /* The MemoryRegion is looked up again and unref'ed later, leave the
+         * ref in place.  */
+        (iov++)->iov_len = len;
+        *addr++ = desc->addr;
+        desc->len -= len;
+        desc->addr += len;
+        *num += 1;
     }
 
-    /* The MemoryRegion is looked up again and unref'ed later, leave the
-     * ref in place.  */
-    iov->iov_len = desc->len;
-    *addr = desc->addr;
-    *num += 1;
     return 0;
 }
 
-- 
2.5.0


* [Qemu-devel] [PULL v2 3/7] aio: Introduce aio_external_disabled
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 2/7] dataplane: support non-contigious s/g Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 4/7] aio: Introduce aio_context_setup Stefan Hajnoczi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fam Zheng, Stefan Hajnoczi

From: Fam Zheng <famz@redhat.com>

This allows AioContext users to check the enable/disable state of
external clients.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 92efc5e..cab7c76 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -406,6 +406,17 @@ static inline void aio_enable_external(AioContext *ctx)
 }
 
 /**
+ * aio_external_disabled:
+ * @ctx: the aio context
+ *
+ * Return true if the external clients are disabled.
+ */
+static inline bool aio_external_disabled(AioContext *ctx)
+{
+    return atomic_read(&ctx->external_disable_cnt);
+}
+
+/**
  * aio_node_check:
  * @ctx: the aio context
  * @is_external: Whether or not the checked node is an external event source.
-- 
2.5.0


* [Qemu-devel] [PULL v2 4/7] aio: Introduce aio_context_setup
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 3/7] aio: Introduce aio_external_disabled Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c Stefan Hajnoczi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fam Zheng, Stefan Hajnoczi

From: Fam Zheng <famz@redhat.com>

This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 aio-posix.c         |  4 ++++
 aio-win32.c         |  4 ++++
 async.c             | 13 +++++++++++--
 include/block/aio.h |  8 ++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 0467f23..5bff3cd 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -302,3 +302,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/aio-win32.c b/aio-win32.c
index 43c4c79..cdc4456 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -369,3 +369,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     aio_context_release(ctx);
     return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/async.c b/async.c
index 9589e4b..e106072 100644
--- a/async.c
+++ b/async.c
@@ -325,12 +325,18 @@ AioContext *aio_context_new(Error **errp)
 {
     int ret;
     AioContext *ctx;
+    Error *local_err = NULL;
+
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+    aio_context_setup(ctx, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto fail;
+    }
     ret = event_notifier_init(&ctx->notifier, false);
     if (ret < 0) {
-        g_source_destroy(&ctx->source);
         error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-        return NULL;
+        goto fail;
     }
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
@@ -345,6 +351,9 @@ AioContext *aio_context_new(Error **errp)
     ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
 
     return ctx;
+fail:
+    g_source_destroy(&ctx->source);
+    return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio.h b/include/block/aio.h
index cab7c76..9ffecf7 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -429,4 +429,12 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
     return !is_external || !atomic_read(&ctx->external_disable_cnt);
 }
 
+/**
+ * aio_context_setup:
+ * @ctx: the aio context
+ *
+ * Initialize the aio context.
+ */
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 #endif
-- 
2.5.0


* [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 4/7] aio: Introduce aio_context_setup Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-13 17:09   ` Paolo Bonzini
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 6/7] monitor: add missed aio_context_acquire into vm_completion call Stefan Hajnoczi
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fam Zheng, Stefan Hajnoczi

From: Fam Zheng <famz@redhat.com>

To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has both
compile-time and run-time switchability.

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a big number of fds, or when more devices are
hot plugged, epoll kicks in when the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty based stdio. In this
case, it falls back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
enabled from 64 onward). Numbers are in MB/s.

===============================================
             |     master     |     epoll
             |                |
scsi disks # | read    randrw | read    randrw
-------------|----------------|----------------
1            | 86      36     | 92      45
8            | 87      43     | 86      41
64           | 71      32     | 70      38
128          | 48      24     | 58      31
256          | 37      19     | 57      28
===============================================

To comply with aio_{disable,enable}_external, we always use ppoll when
aio_external_disabled() is true.

[Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration
since the field is also referenced outside CONFIG_EPOLL code.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 aio-posix.c         | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/block/aio.h |   5 ++
 2 files changed, 188 insertions(+), 1 deletion(-)

diff --git a/aio-posix.c b/aio-posix.c
index 5bff3cd..06148a9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,9 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#ifdef CONFIG_EPOLL
+#include <sys/epoll.h>
+#endif
 
 struct AioHandler
 {
@@ -29,6 +32,162 @@ struct AioHandler
     QLIST_ENTRY(AioHandler) node;
 };
 
+#ifdef CONFIG_EPOLL
+
+/* The fd number threshold to switch to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+static void aio_epoll_disable(AioContext *ctx)
+{
+    ctx->epoll_available = false;
+    if (!ctx->epoll_enabled) {
+        return;
+    }
+    ctx->epoll_enabled = false;
+    close(ctx->epollfd);
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+    return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+           (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+           (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+           (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static bool aio_epoll_try_enable(AioContext *ctx)
+{
+    AioHandler *node;
+    struct epoll_event event;
+
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+        int r;
+        if (node->deleted || !node->pfd.events) {
+            continue;
+        }
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        event.data.ptr = node;
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+        if (r) {
+            return false;
+        }
+    }
+    ctx->epoll_enabled = true;
+    return true;
+}
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+    struct epoll_event event;
+    int r;
+
+    if (!ctx->epoll_enabled) {
+        return;
+    }
+    if (!node->pfd.events) {
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
+        if (r) {
+            aio_epoll_disable(ctx);
+        }
+    } else {
+        event.data.ptr = node;
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        if (is_new) {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+            if (r) {
+                aio_epoll_disable(ctx);
+            }
+        } else {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
+            if (r) {
+                aio_epoll_disable(ctx);
+            }
+        }
+    }
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+                     unsigned npfd, int64_t timeout)
+{
+    AioHandler *node;
+    int i, ret = 0;
+    struct epoll_event events[128];
+
+    assert(npfd == 1);
+    assert(pfds[0].fd == ctx->epollfd);
+    if (timeout > 0) {
+        ret = qemu_poll_ns(pfds, npfd, timeout);
+    }
+    if (timeout <= 0 || ret > 0) {
+        ret = epoll_wait(ctx->epollfd, events,
+                         sizeof(events) / sizeof(events[0]),
+                         timeout);
+        if (ret <= 0) {
+            goto out;
+        }
+        for (i = 0; i < ret; i++) {
+            int ev = events[i].events;
+            node = events[i].data.ptr;
+            node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+                (ev & EPOLLOUT ? G_IO_OUT : 0) |
+                (ev & EPOLLHUP ? G_IO_HUP : 0) |
+                (ev & EPOLLERR ? G_IO_ERR : 0);
+        }
+    }
+out:
+    return ret;
+}
+
+static bool aio_epoll_enabled(AioContext *ctx)
+{
+    /* Fall back to ppoll when external clients are disabled. */
+    return !aio_external_disabled(ctx) && ctx->epoll_enabled;
+}
+
+static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
+                                 unsigned npfd, int64_t timeout)
+{
+    if (!ctx->epoll_available) {
+        return false;
+    }
+    if (aio_epoll_enabled(ctx)) {
+        return true;
+    }
+    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+        if (aio_epoll_try_enable(ctx)) {
+            return true;
+        } else {
+            aio_epoll_disable(ctx);
+        }
+    }
+    return false;
+}
+
+#else
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+                     unsigned npfd, int64_t timeout)
+{
+    assert(false);
+}
+
+static bool aio_epoll_enabled(AioContext *ctx)
+{
+    return false;
+}
+
+static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
+                          unsigned npfd, int64_t timeout)
+{
+    return false;
+}
+
+#endif
+
 static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
     AioHandler *node;
@@ -50,6 +209,7 @@ void aio_set_fd_handler(AioContext *ctx,
                         void *opaque)
 {
     AioHandler *node;
+    bool is_new = false;
 
     node = find_aio_handler(ctx, fd);
 
@@ -79,6 +239,7 @@ void aio_set_fd_handler(AioContext *ctx,
             QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
 
             g_source_add_poll(&ctx->source, &node->pfd);
+            is_new = true;
         }
         /* Update handler with latest information */
         node->io_read = io_read;
@@ -90,6 +251,7 @@ void aio_set_fd_handler(AioContext *ctx,
         node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
     }
 
+    aio_epoll_update(ctx, node, is_new);
     aio_notify(ctx);
 }
 
@@ -262,6 +424,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* fill pollfds */
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->pfd.events
+            && !aio_epoll_enabled(ctx)
             && aio_node_check(ctx, node->is_external)) {
             add_pollfd(node);
         }
@@ -273,7 +436,17 @@ bool aio_poll(AioContext *ctx, bool blocking)
     if (timeout) {
         aio_context_release(ctx);
     }
-    ret = qemu_poll_ns((GPollFD *)pollfds, npfd, timeout);
+    if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
+        AioHandler epoll_handler;
+
+        epoll_handler.pfd.fd = ctx->epollfd;
+        epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
+        npfd = 0;
+        add_pollfd(&epoll_handler);
+        ret = aio_epoll(ctx, pollfds, npfd, timeout);
+    } else  {
+        ret = qemu_poll_ns(pollfds, npfd, timeout);
+    }
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
     }
@@ -305,4 +478,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 void aio_context_setup(AioContext *ctx, Error **errp)
 {
+#ifdef CONFIG_EPOLL
+    assert(!ctx->epollfd);
+    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (ctx->epollfd == -1) {
+        ctx->epoll_available = false;
+    } else {
+        ctx->epoll_available = true;
+    }
+#endif
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 9ffecf7..e086e3b 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -124,6 +124,11 @@ struct AioContext {
     QEMUTimerListGroup tlg;
 
     int external_disable_cnt;
+
+    /* epoll(7) state used when built with CONFIG_EPOLL */
+    int epollfd;
+    bool epoll_enabled;
+    bool epoll_available;
 };
 
 /**
-- 
2.5.0


* [Qemu-devel] [PULL v2 6/7] monitor: add missed aio_context_acquire into vm_completion call
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 7/7] blockdev: acquire AioContext in hmp_commit() Stefan Hajnoczi
  2015-11-09 12:51 ` [Qemu-devel] [PULL v2 0/7] Block patches Peter Maydell
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Peter Maydell, Markus Armbruster, Luiz Capitulino,
	Stefan Hajnoczi, Denis V. Lunev

From: "Denis V. Lunev" <den@openvz.org>

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Luiz Capitulino <lcapitulino@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 monitor.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index 6cd747f..3295840 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3408,13 +3408,18 @@ static void vm_completion(ReadLineState *rs, const char *str)
     readline_set_completion_index(rs, len);
     while ((bs = bdrv_next(bs))) {
         SnapshotInfoList *snapshots, *snapshot;
+        AioContext *ctx = bdrv_get_aio_context(bs);
+        bool ok = false;
 
-        if (!bdrv_can_snapshot(bs)) {
-            continue;
+        aio_context_acquire(ctx);
+        if (bdrv_can_snapshot(bs)) {
+            ok = bdrv_query_snapshot_info_list(bs, &snapshots, NULL) == 0;
         }
-        if (bdrv_query_snapshot_info_list(bs, &snapshots, NULL)) {
+        aio_context_release(ctx);
+        if (!ok) {
             continue;
         }
+
         snapshot = snapshots;
         while (snapshot) {
             char *completion = snapshot->value->name;
-- 
2.5.0


* [Qemu-devel] [PULL v2 7/7] blockdev: acquire AioContext in hmp_commit()
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 6/7] monitor: add missed aio_context_acquire into vm_completion call Stefan Hajnoczi
@ 2015-11-09 10:08 ` Stefan Hajnoczi
  2015-11-09 12:51 ` [Qemu-devel] [PULL v2 0/7] Block patches Peter Maydell
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2015-11-09 10:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Stefan Hajnoczi, Denis V. Lunev

This one slipped through.  Although we acquire the AioContext when
committing all devices, we don't when committing just a single device.

AioContext must be acquired before calling bdrv_*() functions to
synchronize access with other threads that may be using the AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 8b8bfa9..97be42f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1120,6 +1120,9 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
     if (!strcmp(device, "all")) {
         ret = bdrv_commit_all();
     } else {
+        BlockDriverState *bs;
+        AioContext *aio_context;
+
         blk = blk_by_name(device);
         if (!blk) {
             monitor_printf(mon, "Device '%s' not found\n", device);
@@ -1129,7 +1132,14 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "Device '%s' has no medium\n", device);
             return;
         }
-        ret = bdrv_commit(blk_bs(blk));
+
+        bs = blk_bs(blk);
+        aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+
+        ret = bdrv_commit(bs);
+
+        aio_context_release(aio_context);
     }
     if (ret < 0) {
         monitor_printf(mon, "'commit' error for '%s': %s\n", device,
-- 
2.5.0


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 7/7] blockdev: acquire AioContext in hmp_commit() Stefan Hajnoczi
@ 2015-11-09 12:51 ` Peter Maydell
  2015-11-09 14:30   ` Peter Maydell
  7 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2015-11-09 12:51 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, QEMU Developers

On 9 November 2015 at 10:08, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> The following changes since commit c4a7bf54e588ff2de9fba0898b686156251b2063:
>
>   Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging (2015-11-07 19:55:15 +0000)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 84aa0140dd4f04f5d993f0db14c40da8d3de2706:
>
>   blockdev: acquire AioContext in hmp_commit() (2015-11-09 10:07:10 +0000)
>
> ----------------------------------------------------------------
>
> ----------------------------------------------------------------

I got that intermittent ivshmem test failure again :-(

ERROR:/home/petmay01/qemu/tests/ivshmem-test.c:345:test_ivshmem_server:
assertion failed (ret != 0): (0 != 0)

I'll retry the tests...

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 12:51 ` [Qemu-devel] [PULL v2 0/7] Block patches Peter Maydell
@ 2015-11-09 14:30   ` Peter Maydell
  2015-11-09 15:09     ` Marc-André Lureau
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2015-11-09 14:30 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marc-André Lureau, QEMU Developers

On 9 November 2015 at 12:51, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 November 2015 at 10:08, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> The following changes since commit c4a7bf54e588ff2de9fba0898b686156251b2063:
>>
>>   Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging (2015-11-07 19:55:15 +0000)
>>
>> are available in the git repository at:
>>
>>   git://github.com/stefanha/qemu.git tags/block-pull-request
>>
>> for you to fetch changes up to 84aa0140dd4f04f5d993f0db14c40da8d3de2706:
>>
>>   blockdev: acquire AioContext in hmp_commit() (2015-11-09 10:07:10 +0000)
>>
>> ----------------------------------------------------------------
>>
>> ----------------------------------------------------------------
>
> I got that intermittent ivshmem test failure again :-(
>
> ERROR:/home/petmay01/qemu/tests/ivshmem-test.c:345:test_ivshmem_server:
> assertion failed (ret != 0): (0 != 0)
>
> I'll retry the tests...

Yep, worked fine second time, so applied.

Marc-André, can you look into why the ivshmem tests might be intermittently
failing like this, please?

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 14:30   ` Peter Maydell
@ 2015-11-09 15:09     ` Marc-André Lureau
  2015-11-09 15:16       ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: Marc-André Lureau @ 2015-11-09 15:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Marc-André Lureau, QEMU Developers, Stefan Hajnoczi

Hi

----- Original Message -----
> On 9 November 2015 at 12:51, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On 9 November 2015 at 10:08, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >> The following changes since commit
> >> c4a7bf54e588ff2de9fba0898b686156251b2063:
> >>
> >>   Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into
> >>   staging (2015-11-07 19:55:15 +0000)
> >>
> >> are available in the git repository at:
> >>
> >>   git://github.com/stefanha/qemu.git tags/block-pull-request
> >>
> >> for you to fetch changes up to 84aa0140dd4f04f5d993f0db14c40da8d3de2706:
> >>
> >>   blockdev: acquire AioContext in hmp_commit() (2015-11-09 10:07:10 +0000)
> >>
> >> ----------------------------------------------------------------
> >>
> >> ----------------------------------------------------------------
> >
> > I got that intermittent ivshmem test failure again :-(
> >
> > ERROR:/home/petmay01/qemu/tests/ivshmem-test.c:345:test_ivshmem_server:
> > assertion failed (ret != 0): (0 != 0)
> >
> > I'll retry the tests...
> 
> Yep, worked fine second time, so applied.
> 
> Marc-André, can you look into why the ivshmem tests might be intermittently
> failing like this, please?

Is this with a slow or emulated host? It could be that the 5s timeout is not enough?

Alternatively, I think I could come up with a framework to put some kind of breakpoints in qemu, and wait until they happen. This would get rid of the timer.


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 15:09     ` Marc-André Lureau
@ 2015-11-09 15:16       ` Peter Maydell
  2015-11-09 17:50         ` Marc-André Lureau
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2015-11-09 15:16 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Marc-André Lureau, QEMU Developers, Stefan Hajnoczi

On 9 November 2015 at 15:09, Marc-André Lureau <mlureau@redhat.com> wrote:
>> On 9 November 2015 at 12:51, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Marc-André, can you look into why the ivshmem tests might be intermittently
>> failing like this, please?
>
> Is this with a slow or emulated host? It could be that the 5s timeout
> is not enough?

This is with the 32-bit build on a 64-bit ARM server box. So it's
not the fastest machine in the world, but it's not bad either.
It will be using TCG, obviously.

A test which takes 5 seconds to run isn't ideal from a "keep
the make-check time down" perspective either.

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 15:16       ` Peter Maydell
@ 2015-11-09 17:50         ` Marc-André Lureau
  2015-11-10 18:41           ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: Marc-André Lureau @ 2015-11-09 17:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Marc-André Lureau, QEMU Developers, Stefan Hajnoczi

Hi

----- Original Message -----
> On 9 November 2015 at 15:09, Marc-André Lureau <mlureau@redhat.com> wrote:
> >> On 9 November 2015 at 12:51, Peter Maydell <peter.maydell@linaro.org>
> >> wrote:
> >> Marc-André, can you look into why the ivshmem tests might be
> >> intermittently
> >> failing like this, please?
> >
> > Is this with a slow or emulated host? It could be that the 5s timeout
> > is not enough?
> 
> This is with the 32-bit build on a 64-bit ARM server box. So it's
> not the fastest machine in the world, but it's not bad either.
> It will be using TCG, obviously.
> 
> A test which takes 5 seconds to run isn't ideal from a "keep
> the make-check time down" perspective either.

I can imagine a test starting a server thread and 2 qemu instances would take more than 5s on such configuration then.

Could you try timing the test a few times to confirm this?

If it's too long, I suggest we move it to g_test_slow(), and perhaps get rid of the timer. 

thanks


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-09 17:50         ` Marc-André Lureau
@ 2015-11-10 18:41           ` Peter Maydell
  2015-11-11 20:33             ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2015-11-10 18:41 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Marc-André Lureau, QEMU Developers, Stefan Hajnoczi

On 9 November 2015 at 17:50, Marc-André Lureau <mlureau@redhat.com> wrote:
> Hi
>
> ----- Original Message -----
>> On 9 November 2015 at 15:09, Marc-André Lureau <mlureau@redhat.com> wrote:
>> >> On 9 November 2015 at 12:51, Peter Maydell <peter.maydell@linaro.org>
>> >> wrote:
>> >> Marc-André, can you look into why the ivshmem tests might be
>> >> intermittently
>> >> failing like this, please?
>> >
>> > Is this with a slow or emulated host? It could be that the 5s timeout
>> > is not enough?
>>
>> This is with the 32-bit build on a 64-bit ARM server box. So it's
>> not the fastest machine in the world, but it's not bad either.
>> It will be using TCG, obviously.
>>
>> A test which takes 5 seconds to run isn't ideal from a "keep
>> the make-check time down" perspective either.
>
> I can imagine a test starting a server thread and 2 qemu instances would take more than 5s on such configuration then.
>
> Could you try timing the test a few times to confirm this?

petmay01@moonshot-dsg-11:~/qemu/build/all-a64$ time
QTEST_QEMU_BINARY=i386-softmmu/qemu-system-i386
QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
255 + 1))} gtester -k --verbose -m=quick tests/ivshmem-test
TEST: tests/ivshmem-test... (pid=10893)
  /i386/ivshmem/single:                                                OK
  /i386/ivshmem/pair:                                                  OK
  /i386/ivshmem/server:                                                OK
  /i386/ivshmem/hotplug:                                               OK
  /i386/ivshmem/memdev:                                                OK
PASS: tests/ivshmem-test

real    0m11.945s
user    0m11.020s
sys     0m0.310s

(almost all of the runtime seems to be in the "pair" subtest).

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-10 18:41           ` Peter Maydell
@ 2015-11-11 20:33             ` Peter Maydell
  2015-11-11 20:59               ` Marc-André Lureau
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2015-11-11 20:33 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Marc-André Lureau, QEMU Developers, Stefan Hajnoczi

On 10 November 2015 at 18:41, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 November 2015 at 17:50, Marc-André Lureau <mlureau@redhat.com> wrote:
>> I can imagine a test starting a server thread and 2 qemu instances
>> would take more than 5s on such configuration then.
>>
>> Could you try timing the test a few times to confirm this?
>
> petmay01@moonshot-dsg-11:~/qemu/build/all-a64$ time
> QTEST_QEMU_BINARY=i386-softmmu/qemu-system-i386
> QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
> 255 + 1))} gtester -k --verbose -m=quick tests/ivshmem-test
> TEST: tests/ivshmem-test... (pid=10893)
>   /i386/ivshmem/single:                                                OK
>   /i386/ivshmem/pair:                                                  OK
>   /i386/ivshmem/server:                                                OK
>   /i386/ivshmem/hotplug:                                               OK
>   /i386/ivshmem/memdev:                                                OK
> PASS: tests/ivshmem-test
>
> real    0m11.945s
> user    0m11.020s
> sys     0m0.310s
>
> (almost all of the runtime seems to be in the "pair" subtest).

This is now failing on practically every pull request I test.
Please post a patch to fix this test or disable it...

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-11 20:33             ` Peter Maydell
@ 2015-11-11 20:59               ` Marc-André Lureau
  2015-11-12 10:12                 ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: Marc-André Lureau @ 2015-11-11 20:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

Hi Peter

On Wed, Nov 11, 2015 at 9:33 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 10 November 2015 at 18:41, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 9 November 2015 at 17:50, Marc-André Lureau <mlureau@redhat.com> wrote:
>>> I can imagine a test starting a server thread and 2 qemu instances
>>> would take more than 5s on such configuration then.
>>>
>>> Could you try timing the test a few times to confirm this?
>>
>> petmay01@moonshot-dsg-11:~/qemu/build/all-a64$ time
>> QTEST_QEMU_BINARY=i386-softmmu/qemu-system-i386
>> QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
>> 255 + 1))} gtester -k --verbose -m=quick tests/ivshmem-test
>> TEST: tests/ivshmem-test... (pid=10893)
>>   /i386/ivshmem/single:                                                OK
>>   /i386/ivshmem/pair:                                                  OK
>>   /i386/ivshmem/server:                                                OK
>>   /i386/ivshmem/hotplug:                                               OK
>>   /i386/ivshmem/memdev:                                                OK
>> PASS: tests/ivshmem-test
>>
>> real    0m11.945s
>> user    0m11.020s
>> sys     0m0.310s
>>
>> (almost all of the runtime seems to be in the "pair" subtest).
>
> This is now failing on practically every pull request I test.
> Please post a patch to fix this test or disable it...

This is the simplest patch I suggest for now.

-- 
Marc-André Lureau

[-- Attachment #2: 0001-tests-classify-some-ivshmem-tests-as-slow.patch --]
[-- Type: text/x-patch, Size: 1313 bytes --]

From 44f229675d4c9ea896fa83e8cbe5add4c5846541 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Wed, 11 Nov 2015 21:47:05 +0100
Subject: [PATCH] tests: classify some ivshmem tests as slow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Some tests may take a long time to run; move them under the
g_test_slow() condition.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/ivshmem-test.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index c8f0cf0..f1793ba 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -478,10 +478,12 @@ int main(int argc, char **argv)
     tmpserver = g_strconcat(tmpdir, "/server", NULL);
 
     qtest_add_func("/ivshmem/single", test_ivshmem_single);
-    qtest_add_func("/ivshmem/pair", test_ivshmem_pair);
-    qtest_add_func("/ivshmem/server", test_ivshmem_server);
     qtest_add_func("/ivshmem/hotplug", test_ivshmem_hotplug);
     qtest_add_func("/ivshmem/memdev", test_ivshmem_memdev);
+    if (g_test_slow()) {
+        qtest_add_func("/ivshmem/pair", test_ivshmem_pair);
+        qtest_add_func("/ivshmem/server", test_ivshmem_server);
+    }
 
     ret = g_test_run();
 
-- 
2.5.0



* Re: [Qemu-devel] [PULL v2 0/7] Block patches
  2015-11-11 20:59               ` Marc-André Lureau
@ 2015-11-12 10:12                 ` Peter Maydell
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2015-11-12 10:12 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: QEMU Developers, Stefan Hajnoczi

On 11 November 2015 at 20:59, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
> Hi Peter
>
> On Wed, Nov 11, 2015 at 9:33 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 10 November 2015 at 18:41, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 9 November 2015 at 17:50, Marc-André Lureau <mlureau@redhat.com> wrote:
>>>> I can imagine a test starting a server thread and 2 qemu instances
>>>> would take more than 5s on such configuration then.
>>>>
>>>> Could you try timing the test a few times to confirm this?
>>>
>>> petmay01@moonshot-dsg-11:~/qemu/build/all-a64$ time
>>> QTEST_QEMU_BINARY=i386-softmmu/qemu-system-i386
>>> QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM %
>>> 255 + 1))} gtester -k --verbose -m=quick tests/ivshmem-test
>>> TEST: tests/ivshmem-test... (pid=10893)
>>>   /i386/ivshmem/single:                                                OK
>>>   /i386/ivshmem/pair:                                                  OK
>>>   /i386/ivshmem/server:                                                OK
>>>   /i386/ivshmem/hotplug:                                               OK
>>>   /i386/ivshmem/memdev:                                                OK
>>> PASS: tests/ivshmem-test
>>>
>>> real    0m11.945s
>>> user    0m11.020s
>>> sys     0m0.310s
>>>
>>> (almost all of the runtime seems to be in the "pair" subtest).
>>
>> This is now failing on practically every pull request I test.
>> Please post a patch to fix this test or disable it...
>
> This is the simplest patch I suggest for now.

That will still mean that trying the slow tests gives random
failures, so this still needs attention (ie raising the timeouts
to something that won't actually be hit), but I guess it will
solve my immediate problem. Can you send it to the list as
a proper patch, please?
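
A minimal sketch of the timeout-raising idea, assuming GLib's
g_test_slow(); the helper name and the 60 s value are illustrative and
not taken from the actual test:

#include <glib.h>

/* Illustrative only: scale the wait deadline so slow or TCG hosts get
 * more headroom instead of tripping a fixed 5 s limit. */
static gint64 wait_deadline(void)
{
    gint64 span = g_test_slow() ? 60 * G_TIME_SPAN_SECOND
                                : 5 * G_TIME_SPAN_SECOND;
    return g_get_monotonic_time() + span;
}

The result could then feed g_cond_wait_until() or whatever wait the test
actually uses.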

thanks
-- PMM


* Re: [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c
  2015-11-09 10:08 ` [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c Stefan Hajnoczi
@ 2015-11-13 17:09   ` Paolo Bonzini
  2015-11-16  6:25     ` Fam Zheng
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2015-11-13 17:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Peter Maydell, Fam Zheng



On 09/11/2015 11:08, Stefan Hajnoczi wrote:
> From: Fam Zheng <famz@redhat.com>
> 
> To minimize code duplication, epoll is hooked into aio-posix's
> aio_poll() instead of rolling its own. This approach also has both
> compile-time and run-time switchability.
> 
> 1) When QEMU starts with a small number of fds in the event loop, ppoll
> is used.
> 
> 2) When QEMU starts with a big number of fds, or when more devices are
> hot plugged, epoll kicks in when the number of fds hits the threshold.
> 
> 3) Some fds may not support epoll, such as tty based stdio. In this
> case, it falls back to ppoll.
> 
> A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
> enabled from 64 onward). Numbers are in MB/s.
> 
> ===============================================
>              |     master     |     epoll
>              |                |
> scsi disks # | read    randrw | read    randrw
> -------------|----------------|----------------
> 1            | 86      36     | 92      45
> 8            | 87      43     | 86      41
> 64           | 71      32     | 70      38
> 128          | 48      24     | 58      31
> 256          | 37      19     | 57      28
> ===============================================
> 
> To comply with aio_{disable,enable}_external, we always use ppoll when
> aio_external_disabled() is true.
> 
> [Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration
> since the field is also referenced outside CONFIG_EPOLL code.
> --Stefan]
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> Message-id: 1446177989-6702-4-git-send-email-famz@redhat.com
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  aio-posix.c         | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/block/aio.h |   5 ++
>  2 files changed, 188 insertions(+), 1 deletion(-)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index 5bff3cd..06148a9 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -17,6 +17,9 @@
>  #include "block/block.h"
>  #include "qemu/queue.h"
>  #include "qemu/sockets.h"
> +#ifdef CONFIG_EPOLL
> +#include <sys/epoll.h>
> +#endif
>  
>  struct AioHandler
>  {
> @@ -29,6 +32,162 @@ struct AioHandler
>      QLIST_ENTRY(AioHandler) node;
>  };
>  
> +#ifdef CONFIG_EPOLL
> +
> +/* The fd number threashold to switch to epoll */
> +#define EPOLL_ENABLE_THRESHOLD 64
> +
> +static void aio_epoll_disable(AioContext *ctx)
> +{
> +    ctx->epoll_available = false;
> +    if (!ctx->epoll_enabled) {
> +        return;
> +    }
> +    ctx->epoll_enabled = false;
> +    close(ctx->epollfd);
> +}
> +
> +static inline int epoll_events_from_pfd(int pfd_events)
> +{
> +    return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
> +           (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
> +           (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
> +           (pfd_events & G_IO_ERR ? EPOLLERR : 0);
> +}
> +
> +static bool aio_epoll_try_enable(AioContext *ctx)
> +{
> +    AioHandler *node;
> +    struct epoll_event event;
> +
> +    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> +        int r;
> +        if (node->deleted || !node->pfd.events) {
> +            continue;
> +        }
> +        event.events = epoll_events_from_pfd(node->pfd.events);
> +        event.data.ptr = node;
> +        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
> +        if (r) {
> +            return false;
> +        }
> +    }
> +    ctx->epoll_enabled = true;
> +    return true;
> +}
> +
> +static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
> +{
> +    struct epoll_event event;
> +    int r;
> +
> +    if (!ctx->epoll_enabled) {
> +        return;
> +    }
> +    if (!node->pfd.events) {

Coverity says that node might have been freed by the time you call
aio_epoll_update.  You need to pass node->pfd.fd and node->pfd.events by
value instead, I think, or move the call earlier in aio_set_fd_handler.

Paolo
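
A minimal sketch of the by-value idea for the removal path, to make the
suggestion concrete (the helper name aio_epoll_remove_fd is illustrative
and not from any actual follow-up patch; the ADD/MOD paths still need the
node pointer for event.data.ptr, which is why moving the call earlier in
aio_set_fd_handler is the alternative):

/* Illustrative only: the fd is passed by value, so nothing here touches
 * the AioHandler after the caller may have freed it. */
static void aio_epoll_remove_fd(AioContext *ctx, int fd)
{
    struct epoll_event event;   /* ignored for DEL, but must be non-NULL
                                 * on kernels before 2.6.9 */

    if (!ctx->epoll_enabled) {
        return;
    }
    if (epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, fd, &event)) {
        aio_epoll_disable(ctx);
    }
}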

> +        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
> +        if (r) {
> +            aio_epoll_disable(ctx);
> +        }
> +    } else {
> +        event.data.ptr = node;
> +        event.events = epoll_events_from_pfd(node->pfd.events);
> +        if (is_new) {
> +            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
> +            if (r) {
> +                aio_epoll_disable(ctx);
> +            }
> +        } else {
> +            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
> +            if (r) {
> +                aio_epoll_disable(ctx);
> +            }
> +        }
> +    }
> +}
> +
> +static int aio_epoll(AioContext *ctx, GPollFD *pfds,
> +                     unsigned npfd, int64_t timeout)
> +{
> +    AioHandler *node;
> +    int i, ret = 0;
> +    struct epoll_event events[128];
> +
> +    assert(npfd == 1);
> +    assert(pfds[0].fd == ctx->epollfd);
> +    if (timeout > 0) {
> +        ret = qemu_poll_ns(pfds, npfd, timeout);
> +    }
> +    if (timeout <= 0 || ret > 0) {
> +        ret = epoll_wait(ctx->epollfd, events,
> +                         sizeof(events) / sizeof(events[0]),
> +                         timeout);
> +        if (ret <= 0) {
> +            goto out;
> +        }
> +        for (i = 0; i < ret; i++) {
> +            int ev = events[i].events;
> +            node = events[i].data.ptr;
> +            node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
> +                (ev & EPOLLOUT ? G_IO_OUT : 0) |
> +                (ev & EPOLLHUP ? G_IO_HUP : 0) |
> +                (ev & EPOLLERR ? G_IO_ERR : 0);
> +        }
> +    }
> +out:
> +    return ret;
> +}
> +
> +static bool aio_epoll_enabled(AioContext *ctx)
> +{
> +    /* Fall back to ppoll when external clients are disabled. */
> +    return !aio_external_disabled(ctx) && ctx->epoll_enabled;
> +}
> +
> +static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
> +                                 unsigned npfd, int64_t timeout)
> +{
> +    if (!ctx->epoll_available) {
> +        return false;
> +    }
> +    if (aio_epoll_enabled(ctx)) {
> +        return true;
> +    }
> +    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
> +        if (aio_epoll_try_enable(ctx)) {
> +            return true;
> +        } else {
> +            aio_epoll_disable(ctx);
> +        }
> +    }
> +    return false;
> +}
> +
> +#else
> +
> +static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
> +{
> +}
> +
> +static int aio_epoll(AioContext *ctx, GPollFD *pfds,
> +                     unsigned npfd, int64_t timeout)
> +{
> +    assert(false);
> +}
> +
> +static bool aio_epoll_enabled(AioContext *ctx)
> +{
> +    return false;
> +}
> +
> +static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
> +                          unsigned npfd, int64_t timeout)
> +{
> +    return false;
> +}
> +
> +#endif
> +
>  static AioHandler *find_aio_handler(AioContext *ctx, int fd)
>  {
>      AioHandler *node;
> @@ -50,6 +209,7 @@ void aio_set_fd_handler(AioContext *ctx,
>                          void *opaque)
>  {
>      AioHandler *node;
> +    bool is_new = false;
>  
>      node = find_aio_handler(ctx, fd);
>  
> @@ -79,6 +239,7 @@ void aio_set_fd_handler(AioContext *ctx,
>              QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
>  
>              g_source_add_poll(&ctx->source, &node->pfd);
> +            is_new = true;
>          }
>          /* Update handler with latest information */
>          node->io_read = io_read;
> @@ -90,6 +251,7 @@ void aio_set_fd_handler(AioContext *ctx,
>          node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
>      }
>  
> +    aio_epoll_update(ctx, node, is_new);
>      aio_notify(ctx);
>  }
>  
> @@ -262,6 +424,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      /* fill pollfds */
>      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->pfd.events
> +            && !aio_epoll_enabled(ctx)
>              && aio_node_check(ctx, node->is_external)) {
>              add_pollfd(node);
>          }
> @@ -273,7 +436,17 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      if (timeout) {
>          aio_context_release(ctx);
>      }
> -    ret = qemu_poll_ns((GPollFD *)pollfds, npfd, timeout);
> +    if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
> +        AioHandler epoll_handler;
> +
> +        epoll_handler.pfd.fd = ctx->epollfd;
> +        epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
> +        npfd = 0;
> +        add_pollfd(&epoll_handler);
> +        ret = aio_epoll(ctx, pollfds, npfd, timeout);
> +    } else  {
> +        ret = qemu_poll_ns(pollfds, npfd, timeout);
> +    }
>      if (blocking) {
>          atomic_sub(&ctx->notify_me, 2);
>      }
> @@ -305,4 +478,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
>  
>  void aio_context_setup(AioContext *ctx, Error **errp)
>  {
> +#ifdef CONFIG_EPOLL
> +    assert(!ctx->epollfd);
> +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> +    if (ctx->epollfd == -1) {
> +        ctx->epoll_available = false;
> +    } else {
> +        ctx->epoll_available = true;
> +    }
> +#endif
>  }
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 9ffecf7..e086e3b 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -124,6 +124,11 @@ struct AioContext {
>      QEMUTimerListGroup tlg;
>  
>      int external_disable_cnt;
> +
> +    /* epoll(7) state used when built with CONFIG_EPOLL */
> +    int epollfd;
> +    bool epoll_enabled;
> +    bool epoll_available;
>  };
>  
>  /**
> 


* Re: [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c
  2015-11-13 17:09   ` Paolo Bonzini
@ 2015-11-16  6:25     ` Fam Zheng
  0 siblings, 0 replies; 19+ messages in thread
From: Fam Zheng @ 2015-11-16  6:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Peter Maydell, qemu-devel, Stefan Hajnoczi

On Fri, 11/13 18:09, Paolo Bonzini wrote:
> > +static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
> > +{
> > +    struct epoll_event event;
> > +    int r;
> > +
> > +    if (!ctx->epoll_enabled) {
> > +        return;
> > +    }
> > +    if (!node->pfd.events) {
> 
> Coverity says that node might have been freed by the time you call
> aio_epoll_update.  You need to pass node->pfd.fd and node->pfd.events by
> value instead, I think, or move the call earlier in aio_set_fd_handler.
> 

Yes, I'll send a patch.

Fam


end of thread, other threads:[~2015-11-16  6:26 UTC | newest]

Thread overview: 19+ messages
2015-11-09 10:08 [Qemu-devel] [PULL v2 0/7] Block patches Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 2/7] dataplane: support non-contigious s/g Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 3/7] aio: Introduce aio_external_disabled Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 4/7] aio: Introduce aio_context_setup Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c Stefan Hajnoczi
2015-11-13 17:09   ` Paolo Bonzini
2015-11-16  6:25     ` Fam Zheng
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 6/7] monitor: add missed aio_context_acquire into vm_completion call Stefan Hajnoczi
2015-11-09 10:08 ` [Qemu-devel] [PULL v2 7/7] blockdev: acquire AioContext in hmp_commit() Stefan Hajnoczi
2015-11-09 12:51 ` [Qemu-devel] [PULL v2 0/7] Block patches Peter Maydell
2015-11-09 14:30   ` Peter Maydell
2015-11-09 15:09     ` Marc-André Lureau
2015-11-09 15:16       ` Peter Maydell
2015-11-09 17:50         ` Marc-André Lureau
2015-11-10 18:41           ` Peter Maydell
2015-11-11 20:33             ` Peter Maydell
2015-11-11 20:59               ` Marc-André Lureau
2015-11-12 10:12                 ` Peter Maydell
