* [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
@ 2010-12-12 15:02 Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 1/4] virtio-pci: Rename bugs field to flags Stefan Hajnoczi
                   ` (5 more replies)
  0 siblings, 6 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael S. Tsirkin

See below for the v5 changelog.

Due to lack of connectivity I am sending from GMail.  Git should retain my
stefanha@linux.vnet.ibm.com From address.

Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.

Now that this code is in virtio-pci.c, it is possible to explicitly enable
devices for which virtio-ioeventfd should be used.  Only virtio-blk and
virtio-net are enabled at this time.
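
For reference, the kernel facility this series builds on is the KVM_IOEVENTFD
ioctl: QEMU registers an eventfd together with the guest PIO address and the
16-bit queue index to match, and the guest's write to VIRTIO_PCI_QUEUE_NOTIFY
is then completed in the kernel by signalling that eventfd rather than by a
heavyweight exit into the vcpu thread.  A rough sketch of the registration
(the kvm_set_ioeventfd_pio_word() call used by these patches is essentially a
wrapper around this; vm_fd, addr and queue_idx are illustrative names, not
code from the series):

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/eventfd.h>
    #include <linux/kvm.h>

    /* Bind an eventfd to a 2-byte PIO write of queue_idx at addr. */
    static int assign_notify_ioeventfd(int vm_fd, uint16_t addr, uint16_t queue_idx)
    {
        int efd = eventfd(0, EFD_CLOEXEC);
        struct kvm_ioeventfd iofd = {
            .datamatch = queue_idx,  /* value the guest writes to the notify register */
            .addr      = addr,       /* VIRTIO_PCI_QUEUE_NOTIFY offset in the PIO BAR */
            .len       = 2,          /* virtio-pci queue notify is a 16-bit write */
            .fd        = efd,
            .flags     = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO,
        };

        if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &iofd) < 0) {
            if (efd >= 0) {
                close(efd);
            }
            return -1;  /* caller falls back to the synchronous notify path */
        }
        return efd;     /* the iothread reads this fd and runs the virtqueue handler */
    }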

v5:
 * Fix spurious whitespace change in documentation
 * Test and clear event notifier when deassigning to catch race condition

v4:
 * Simpler start/stop ioeventfd mechanism using bool ioeventfd_started state
 * Support for migration
 * Handle deassign race condition to avoid dropping a virtqueue kick
 * Add missing kvm_enabled() check to kvm_has_many_ioeventfds()
 * Documentation updates for qdev -device with ioeventfd=on|off


* [Qemu-devel] [PATCH v5 1/4] virtio-pci: Rename bugs field to flags
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
@ 2010-12-12 15:02 ` Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify Stefan Hajnoczi
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Michael S. Tsirkin

The VirtIOPCIProxy bugs field is currently used to enable workarounds
for older guests.  Rename it to flags so that other per-device behavior
can be tracked.

A later patch uses the flags field to remember whether ioeventfd should
be used for virtqueue host notification.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 hw/virtio-pci.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 6186142..13dd391 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -80,9 +80,8 @@
  * 12 is historical, and due to x86 page size. */
 #define VIRTIO_PCI_QUEUE_ADDR_SHIFT    12
 
-/* We can catch some guest bugs inside here so we continue supporting older
-   guests. */
-#define VIRTIO_PCI_BUG_BUS_MASTER	(1 << 0)
+/* Flags track per-device state like workarounds for quirks in older guests. */
+#define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
@@ -95,7 +94,7 @@
 typedef struct {
     PCIDevice pci_dev;
     VirtIODevice *vdev;
-    uint32_t bugs;
+    uint32_t flags;
     uint32_t addr;
     uint32_t class_code;
     uint32_t nvectors;
@@ -159,7 +158,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f)
        in ready state. Then we have a buggy guest OS. */
     if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
         !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
-        proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+        proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
     }
     return 0;
 }
@@ -185,7 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
     VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
     virtio_reset(proxy->vdev);
     msix_reset(&proxy->pci_dev);
-    proxy->bugs = 0;
+    proxy->flags = 0;
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
@@ -235,7 +234,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
            some safety checks. */
         if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
             !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
-            proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+            proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
         }
         break;
     case VIRTIO_MSI_CONFIG_VECTOR:
@@ -403,7 +402,7 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 
     if (PCI_COMMAND == address) {
         if (!(val & PCI_COMMAND_MASTER)) {
-            if (!(proxy->bugs & VIRTIO_PCI_BUG_BUS_MASTER)) {
+            if (!(proxy->flags & VIRTIO_PCI_FLAG_BUS_MASTER_BUG)) {
                 virtio_set_status(proxy->vdev,
                                   proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK);
             }
-- 
1.7.2.3


* [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 1/4] virtio-pci: Rename bugs field to flags Stefan Hajnoczi
@ 2010-12-12 15:02 ` Stefan Hajnoczi
  2011-01-24 18:54   ` Kevin Wolf
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 3/4] virtio-pci: Don't use ioeventfd on old kernels Stefan Hajnoczi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Michael S. Tsirkin

Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.

Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.  Only enable ioeventfd
for virtio-blk and virtio-net for now.

Care must be taken not to interfere with vhost-net, which uses host
notifiers.  If the set_host_notifier() API is used by a device,
virtio-pci will disable virtio-ioeventfd and let the device deal with
host notifiers as it wishes.

After migration and on VM change state (running/paused), virtio-ioeventfd
will enable/disable itself:

 * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
 * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
 * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
 * vm_change_state(running=0) -> disable virtio-ioeventfd
 * vm_change_state(running=1) -> enable virtio-ioeventfd
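
Put differently, ioeventfd ends up bound to a virtqueue exactly while all of
the following hold (a sketch of the resulting invariant, not code from the
patch; the helper name is made up):

    static bool ioeventfd_should_be_started(const VirtIOPCIProxy *proxy,
                                            bool vm_running)
    {
        /* The ioeventfd= property is enabled, the guest driver is ready,
         * and the VM is running; otherwise notifies take the synchronous
         * path or the device's own host notifiers (e.g. vhost). */
        return vm_running &&
               (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD) &&
               (proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
    }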

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 hw/virtio-pci.c |  190 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 hw/virtio.c     |   14 +++-
 hw/virtio.h     |    1 +
 3 files changed, 179 insertions(+), 26 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 13dd391..f57c45a 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -83,6 +83,11 @@
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
+/* Performance improves when virtqueue kick processing is decoupled from the
+ * vcpu thread using ioeventfd for some devices. */
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
+
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
@@ -107,6 +112,8 @@ typedef struct {
     /* Max. number of ports we can have for a the virtio-serial device */
     uint32_t max_virtserial_ports;
     virtio_net_conf net;
+    bool ioeventfd_started;
+    VMChangeStateEntry *vm_change_state_entry;
 } VirtIOPCIProxy;
 
 /* virtio device */
@@ -179,12 +186,131 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
     return 0;
 }
 
+static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
+                                                 int n, bool assign)
+{
+    VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
+    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+    int r;
+    if (assign) {
+        r = event_notifier_init(notifier, 1);
+        if (r < 0) {
+            return r;
+        }
+        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+                                       n, assign);
+        if (r < 0) {
+            event_notifier_cleanup(notifier);
+        }
+    } else {
+        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+                                       n, assign);
+        if (r < 0) {
+            return r;
+        }
+
+        /* Handle the race condition where the guest kicked and we deassigned
+         * before we got around to handling the kick.
+         */
+        if (event_notifier_test_and_clear(notifier)) {
+            virtio_queue_notify_vq(vq);
+        }
+
+        event_notifier_cleanup(notifier);
+    }
+    return r;
+}
+
+static void virtio_pci_host_notifier_read(void *opaque)
+{
+    VirtQueue *vq = opaque;
+    EventNotifier *n = virtio_queue_get_host_notifier(vq);
+    if (event_notifier_test_and_clear(n)) {
+        virtio_queue_notify_vq(vq);
+    }
+}
+
+static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy,
+                                                    int n, bool assign)
+{
+    VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
+    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+    if (assign) {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            virtio_pci_host_notifier_read, NULL, vq);
+    } else {
+        qemu_set_fd_handler(event_notifier_get_fd(notifier),
+                            NULL, NULL, NULL);
+    }
+}
+
+static int virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy)
+{
+    int n, r;
+
+    if (!(proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD) ||
+        proxy->ioeventfd_started) {
+        return 0;
+    }
+
+    for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
+        if (!virtio_queue_get_num(proxy->vdev, n)) {
+            continue;
+        }
+
+        r = virtio_pci_set_host_notifier_internal(proxy, n, true);
+        if (r < 0) {
+            goto assign_error;
+        }
+
+        virtio_pci_set_host_notifier_fd_handler(proxy, n, true);
+    }
+    proxy->ioeventfd_started = true;
+    return 0;
+
+assign_error:
+    while (--n >= 0) {
+        if (!virtio_queue_get_num(proxy->vdev, n)) {
+            continue;
+        }
+
+        virtio_pci_set_host_notifier_fd_handler(proxy, n, false);
+        virtio_pci_set_host_notifier_internal(proxy, n, false);
+    }
+    proxy->ioeventfd_started = false;
+    proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+    return r;
+}
+
+static int virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy)
+{
+    int n;
+
+    if (!proxy->ioeventfd_started) {
+        return 0;
+    }
+
+    for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
+        if (!virtio_queue_get_num(proxy->vdev, n)) {
+            continue;
+        }
+
+        virtio_pci_set_host_notifier_fd_handler(proxy, n, false);
+        virtio_pci_set_host_notifier_internal(proxy, n, false);
+    }
+    proxy->ioeventfd_started = false;
+    return 0;
+}
+
 static void virtio_pci_reset(DeviceState *d)
 {
     VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
+    virtio_pci_stop_ioeventfd(proxy);
     virtio_reset(proxy->vdev);
     msix_reset(&proxy->pci_dev);
-    proxy->flags = 0;
+    proxy->flags &= ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
@@ -209,6 +335,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     case VIRTIO_PCI_QUEUE_PFN:
         pa = (target_phys_addr_t)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
         if (pa == 0) {
+            virtio_pci_stop_ioeventfd(proxy);
             virtio_reset(proxy->vdev);
             msix_unuse_all_vectors(&proxy->pci_dev);
         }
@@ -223,6 +350,12 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         virtio_queue_notify(vdev, val);
         break;
     case VIRTIO_PCI_STATUS:
+        if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+            virtio_pci_start_ioeventfd(proxy);
+        } else {
+            virtio_pci_stop_ioeventfd(proxy);
+        }
+
         virtio_set_status(vdev, val & 0xFF);
         if (vdev->status == 0) {
             virtio_reset(proxy->vdev);
@@ -403,6 +536,7 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
     if (PCI_COMMAND == address) {
         if (!(val & PCI_COMMAND_MASTER)) {
             if (!(proxy->flags & VIRTIO_PCI_FLAG_BUS_MASTER_BUG)) {
+                virtio_pci_stop_ioeventfd(proxy);
                 virtio_set_status(proxy->vdev,
                                   proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK);
             }
@@ -480,30 +614,27 @@ assign_error:
 static int virtio_pci_set_host_notifier(void *opaque, int n, bool assign)
 {
     VirtIOPCIProxy *proxy = opaque;
-    VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
-    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
-    int r;
-    if (assign) {
-        r = event_notifier_init(notifier, 1);
-        if (r < 0) {
-            return r;
-        }
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            event_notifier_cleanup(notifier);
-        }
+
+    /* Stop using ioeventfd for virtqueue kick if the device starts using host
+     * notifiers.  This makes it easy to avoid stepping on each others' toes.
+     */
+    if (proxy->ioeventfd_started) {
+        virtio_pci_stop_ioeventfd(proxy);
+        proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+    }
+
+    return virtio_pci_set_host_notifier_internal(proxy, n, assign);
+}
+
+static void virtio_pci_vm_change_state_handler(void *opaque, int running, int reason)
+{
+    VirtIOPCIProxy *proxy = opaque;
+
+    if (running && (proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        virtio_pci_start_ioeventfd(proxy);
     } else {
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            return r;
-        }
-        event_notifier_cleanup(notifier);
+        virtio_pci_stop_ioeventfd(proxy);
     }
-    return r;
 }
 
 static const VirtIOBindings virtio_pci_bindings = {
@@ -563,6 +694,10 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
     proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
     proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
     proxy->host_features = vdev->get_features(vdev, proxy->host_features);
+
+    proxy->vm_change_state_entry = qemu_add_vm_change_state_handler(
+                                        virtio_pci_vm_change_state_handler,
+                                        proxy);
 }
 
 static int virtio_blk_init_pci(PCIDevice *pci_dev)
@@ -590,6 +725,9 @@ static int virtio_blk_init_pci(PCIDevice *pci_dev)
 
 static int virtio_exit_pci(PCIDevice *pci_dev)
 {
+    VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+
+    qemu_del_vm_change_state_handler(proxy->vm_change_state_entry);
     return msix_uninit(pci_dev);
 }
 
@@ -597,6 +735,7 @@ static int virtio_blk_exit_pci(PCIDevice *pci_dev)
 {
     VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 
+    virtio_pci_stop_ioeventfd(proxy);
     virtio_blk_exit(proxy->vdev);
     blockdev_mark_auto_del(proxy->block.bs);
     return virtio_exit_pci(pci_dev);
@@ -658,6 +797,7 @@ static int virtio_net_exit_pci(PCIDevice *pci_dev)
 {
     VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 
+    virtio_pci_stop_ioeventfd(proxy);
     virtio_net_exit(proxy->vdev);
     return virtio_exit_pci(pci_dev);
 }
@@ -705,6 +845,8 @@ static PCIDeviceInfo virtio_info[] = {
         .qdev.props = (Property[]) {
             DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
             DEFINE_BLOCK_PROPERTIES(VirtIOPCIProxy, block),
+            DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
+                            VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
             DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
             DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
             DEFINE_PROP_END_OF_LIST(),
@@ -717,6 +859,8 @@ static PCIDeviceInfo virtio_info[] = {
         .exit       = virtio_net_exit_pci,
         .romfile    = "pxe-virtio.bin",
         .qdev.props = (Property[]) {
+            DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
+                            VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
             DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
             DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features),
             DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic),
diff --git a/hw/virtio.c b/hw/virtio.c
index 07dbf86..e40296a 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -575,11 +575,19 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n)
     return vdev->vq[n].vring.num;
 }
 
+void virtio_queue_notify_vq(VirtQueue *vq)
+{
+    if (vq->vring.desc) {
+        VirtIODevice *vdev = vq->vdev;
+        trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
+        vq->handle_output(vdev, vq);
+    }
+}
+
 void virtio_queue_notify(VirtIODevice *vdev, int n)
 {
-    if (n < VIRTIO_PCI_QUEUE_MAX && vdev->vq[n].vring.desc) {
-        trace_virtio_queue_notify(vdev, n, &vdev->vq[n]);
-        vdev->vq[n].handle_output(vdev, &vdev->vq[n]);
+    if (n < VIRTIO_PCI_QUEUE_MAX) {
+        virtio_queue_notify_vq(&vdev->vq[n]);
     }
 }
 
diff --git a/hw/virtio.h b/hw/virtio.h
index 02fa312..5ae521c 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -219,5 +219,6 @@ void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
 VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
 EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
+void virtio_queue_notify_vq(VirtQueue *vq);
 void virtio_irq(VirtQueue *vq);
 #endif
-- 
1.7.2.3


* [Qemu-devel] [PATCH v5 3/4] virtio-pci: Don't use ioeventfd on old kernels
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 1/4] virtio-pci: Rename bugs field to flags Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify Stefan Hajnoczi
@ 2010-12-12 15:02 ` Stefan Hajnoczi
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 4/4] docs: Document virtio PCI -device ioeventfd=on|off Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Michael S. Tsirkin

There used to be a limit of 6 KVM io bus devices inside the kernel.  On
such a kernel, don't use ioeventfd for virtqueue host notification since
the limit is reached too easily.  This ensures that existing vhost-net
setups (which always use ioeventfd) have ioeventfds available so they
can continue to work.
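
The limit in question comes from the fixed size of the in-kernel io bus on
those kernels, roughly (in include/linux/kvm_host.h; newer kernels raise it):

    #define NR_IOBUS_DEVS 6

Each ioeventfd registered with KVM occupies one of these slots, which is why
the probe below tries to bind seven PIO ioeventfds and only reports "many
ioeventfds" when all of them succeed.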

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 hw/virtio-pci.c |    4 ++++
 kvm-all.c       |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 kvm-stub.c      |    5 +++++
 kvm.h           |    1 +
 4 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f57c45a..db0df67 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -690,6 +690,10 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
     pci_register_bar(&proxy->pci_dev, 0, size, PCI_BASE_ADDRESS_SPACE_IO,
                            virtio_map);
 
+    if (!kvm_has_many_ioeventfds()) {
+        proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+    }
+
     virtio_bind_device(vdev, &virtio_pci_bindings, proxy);
     proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
     proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
diff --git a/kvm-all.c b/kvm-all.c
index cae24bb..255b6fa 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -28,6 +28,11 @@
 #include "kvm.h"
 #include "bswap.h"
 
+/* This check must be after config-host.h is included */
+#ifdef CONFIG_EVENTFD
+#include <sys/eventfd.h>
+#endif
+
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE
 
@@ -72,6 +77,7 @@ struct KVMState
     int irqchip_in_kernel;
     int pit_in_kernel;
     int xsave, xcrs;
+    int many_ioeventfds;
 };
 
 static KVMState *kvm_state;
@@ -441,6 +447,39 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
     return ret;
 }
 
+static int kvm_check_many_ioeventfds(void)
+{
+    /* Older kernels have a 6 device limit on the KVM io bus.  Find out so we
+     * can avoid creating too many ioeventfds.
+     */
+#ifdef CONFIG_EVENTFD
+    int ioeventfds[7];
+    int i, ret = 0;
+    for (i = 0; i < ARRAY_SIZE(ioeventfds); i++) {
+        ioeventfds[i] = eventfd(0, EFD_CLOEXEC);
+        if (ioeventfds[i] < 0) {
+            break;
+        }
+        ret = kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true);
+        if (ret < 0) {
+            close(ioeventfds[i]);
+            break;
+        }
+    }
+
+    /* Decide whether many devices are supported or not */
+    ret = i == ARRAY_SIZE(ioeventfds);
+
+    while (i-- > 0) {
+        kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false);
+        close(ioeventfds[i]);
+    }
+    return ret;
+#else
+    return 0;
+#endif
+}
+
 static void kvm_set_phys_mem(target_phys_addr_t start_addr,
 			     ram_addr_t size,
 			     ram_addr_t phys_offset)
@@ -717,6 +756,8 @@ int kvm_init(int smp_cpus)
     kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
+    s->many_ioeventfds = kvm_check_many_ioeventfds();
+
     return 0;
 
 err:
@@ -1046,6 +1087,14 @@ int kvm_has_xcrs(void)
     return kvm_state->xcrs;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+    if (!kvm_enabled()) {
+        return 0;
+    }
+    return kvm_state->many_ioeventfds;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
     if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 5384a4b..33d4476 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void)
     return 0;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+    return 0;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 60a9b42..ce08d42 100644
--- a/kvm.h
+++ b/kvm.h
@@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
 int kvm_has_xsave(void);
 int kvm_has_xcrs(void);
+int kvm_has_many_ioeventfds(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
-- 
1.7.2.3


* [Qemu-devel] [PATCH v5 4/4] docs: Document virtio PCI -device ioeventfd=on|off
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 3/4] virtio-pci: Don't use ioeventfd on old kernels Stefan Hajnoczi
@ 2010-12-12 15:02 ` Stefan Hajnoczi
  2010-12-12 15:14 ` [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
  2010-12-12 20:41 ` Michael S. Tsirkin
  5 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Michael S. Tsirkin

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 docs/qdev-device-use.txt |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/docs/qdev-device-use.txt b/docs/qdev-device-use.txt
index f252c8e..84d0c82 100644
--- a/docs/qdev-device-use.txt
+++ b/docs/qdev-device-use.txt
@@ -97,10 +97,13 @@ The -device argument differs in detail for each kind of drive:
 
 * if=virtio
 
-  -device virtio-blk-pci,drive=DRIVE-ID,class=C,vectors=V
+  -device virtio-blk-pci,drive=DRIVE-ID,class=C,vectors=V,ioeventfd=IOEVENTFD
 
   This lets you control PCI device class and MSI-X vectors.
 
+  IOEVENTFD controls whether or not ioeventfd is used for virtqueue notify.  It
+  can be set to on (default) or off.
+
   As for all PCI devices, you can add bus=PCI-BUS,addr=DEVFN to
   control the PCI device address.
 
@@ -240,6 +243,9 @@ For PCI devices, you can add bus=PCI-BUS,addr=DEVFN to control the PCI
 device address, as usual.  The old -net nic provides parameter addr
 for that, it is silently ignored when the NIC is not a PCI device.
 
+For virtio-net-pci, you can control whether or not ioeventfd is used for
+virtqueue notify by setting ioeventfd= to on (default) or off.
+
 -net nic accepts vectors=V for all models, but it's silently ignored
 except for virtio-net-pci (model=virtio).  With -device, only devices
 that support it accept it.
-- 
1.7.2.3


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 4/4] docs: Document virtio PCI -device ioeventfd=on|off Stefan Hajnoczi
@ 2010-12-12 15:14 ` Stefan Hajnoczi
  2010-12-12 20:41 ` Michael S. Tsirkin
  5 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael S. Tsirkin

On Sun, Dec 12, 2010 at 3:02 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Due to lack of connectivity I am sending from GMail.  Git should retain my
> stefanha@linux.vnet.ibm.com From address.

The From address didn't come through correctly so I've pushed the commits here:

git pull git://repo.or.cz/qemu/stefanha.git virtio-ioeventfd-2

Stefan


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2010-12-12 15:14 ` [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
@ 2010-12-12 20:41 ` Michael S. Tsirkin
  2010-12-12 20:42   ` Michael S. Tsirkin
  5 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-12 20:41 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
> See below for the v5 changelog.
> 
> Due to lack of connectivity I am sending from GMail.  Git should retain my
> stefanha@linux.vnet.ibm.com From address.
> 
> Virtqueue notify is currently handled synchronously in userspace virtio.  This
> prevents the vcpu from executing guest code while hardware emulation code
> handles the notify.
> 
> On systems that support KVM, the ioeventfd mechanism can be used to make
> virtqueue notify a lightweight exit by deferring hardware emulation to the
> iothread and allowing the VM to continue execution.  This model is similar to
> how vhost receives virtqueue notifies.
> 
> The result of this change is improved performance for userspace virtio devices.
> Virtio-blk throughput increases especially for multithreaded scenarios and
> virtio-net transmit throughput increases substantially.

Interestingly, I see decreased throughput for small message
host to guest netperf runs.

The command that I used was:
netperf -H $vguest -- -m 200

And the results are:
- with ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680  

- with ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389  


Do you see this behaviour too?

-- 
MST


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 20:41 ` Michael S. Tsirkin
@ 2010-12-12 20:42   ` Michael S. Tsirkin
  2010-12-12 20:56     ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-12 20:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
> On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
> > See below for the v5 changelog.
> > 
> > Due to lack of connectivity I am sending from GMail.  Git should retain my
> > stefanha@linux.vnet.ibm.com From address.
> > 
> > Virtqueue notify is currently handled synchronously in userspace virtio.  This
> > prevents the vcpu from executing guest code while hardware emulation code
> > handles the notify.
> > 
> > On systems that support KVM, the ioeventfd mechanism can be used to make
> > virtqueue notify a lightweight exit by deferring hardware emulation to the
> > iothread and allowing the VM to continue execution.  This model is similar to
> > how vhost receives virtqueue notifies.
> > 
> > The result of this change is improved performance for userspace virtio devices.
> > Virtio-blk throughput increases especially for multithreaded scenarios and
> > virtio-net transmit throughput increases substantially.
> 
> Interestingly, I see decreased throughput for small message
> host to guest netperf runs.
> 
> The command that I used was:
> netperf -H $vguest -- -m 200
> 
> And the results are:
> - with ioeventfd=off
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680  
> 
> - with ioeventfd=on
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389  
> 
> 
> Do you see this behaviour too?

Just a note: this is with the patchset ported to qemu-kvm.

> -- 
> MST


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 20:42   ` Michael S. Tsirkin
@ 2010-12-12 20:56     ` Michael S. Tsirkin
  2010-12-12 21:09       ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-12 20:56 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
> On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
> > > See below for the v5 changelog.
> > > 
> > > Due to lack of connectivity I am sending from GMail.  Git should retain my
> > > stefanha@linux.vnet.ibm.com From address.
> > > 
> > > Virtqueue notify is currently handled synchronously in userspace virtio.  This
> > > prevents the vcpu from executing guest code while hardware emulation code
> > > handles the notify.
> > > 
> > > On systems that support KVM, the ioeventfd mechanism can be used to make
> > > virtqueue notify a lightweight exit by deferring hardware emulation to the
> > > iothread and allowing the VM to continue execution.  This model is similar to
> > > how vhost receives virtqueue notifies.
> > > 
> > > The result of this change is improved performance for userspace virtio devices.
> > > Virtio-blk throughput increases especially for multithreaded scenarios and
> > > virtio-net transmit throughput increases substantially.
> > 
> > Interestingly, I see decreased throughput for small message
> > host to guest netperf runs.
> > 
> > The command that I used was:
> > netperf -H $vguest -- -m 200
> > 
> > And the results are:
> > - with ioeventfd=off
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> > 
> >  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680  
> > 
> > - with ioeventfd=on
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> > 
> >  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389  
> > 
> > 
> > Do you see this behaviour too?
> 
> Just a note: this is with the patchset ported to qemu-kvm.

And just another note: the trend is reversed for larger messages,
e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.

> > -- 
> > MST


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 20:56     ` Michael S. Tsirkin
@ 2010-12-12 21:09       ` Michael S. Tsirkin
  2010-12-13 10:24         ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-12 21:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
> On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
> > > On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
> > > > See below for the v5 changelog.
> > > > 
> > > > Due to lack of connectivity I am sending from GMail.  Git should retain my
> > > > stefanha@linux.vnet.ibm.com From address.
> > > > 
> > > > Virtqueue notify is currently handled synchronously in userspace virtio.  This
> > > > prevents the vcpu from executing guest code while hardware emulation code
> > > > handles the notify.
> > > > 
> > > > On systems that support KVM, the ioeventfd mechanism can be used to make
> > > > virtqueue notify a lightweight exit by deferring hardware emulation to the
> > > > iothread and allowing the VM to continue execution.  This model is similar to
> > > > how vhost receives virtqueue notifies.
> > > > 
> > > > The result of this change is improved performance for userspace virtio devices.
> > > > Virtio-blk throughput increases especially for multithreaded scenarios and
> > > > virtio-net transmit throughput increases substantially.
> > > 
> > > Interestingly, I see decreased throughput for small message
> > > host to guest netperf runs.
> > > 
> > > The command that I used was:
> > > netperf -H $vguest -- -m 200
> > > 
> > > And the results are:
> > > - with ioeventfd=off
> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> > > Recv   Send    Send                          Utilization       Service Demand
> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> > > 
> > >  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680  
> > > 
> > > - with ioeventfd=on
> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> > > Recv   Send    Send                          Utilization       Service Demand
> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> > > 
> > >  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389  
> > > 
> > > 
> > > Do you see this behaviour too?
> > 
> > Just a note: this is with the patchset ported to qemu-kvm.
> 
> And just another note: the trend is reversed for larger messages,
> e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.

Another datapoint where I see a regression is with 4000 byte messages
for guest to host traffic.

ioeventfd=off
set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566  

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055  

-- 
MST


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-12 21:09       ` Michael S. Tsirkin
@ 2010-12-13 10:24         ` Stefan Hajnoczi
  2010-12-13 10:38           ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 10:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
>> On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
>> > On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
>> > > On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
>> > > > See below for the v5 changelog.
>> > > >
>> > > > Due to lack of connectivity I am sending from GMail.  Git should retain my
>> > > > stefanha@linux.vnet.ibm.com From address.
>> > > >
>> > > > Virtqueue notify is currently handled synchronously in userspace virtio.  This
>> > > > prevents the vcpu from executing guest code while hardware emulation code
>> > > > handles the notify.
>> > > >
>> > > > On systems that support KVM, the ioeventfd mechanism can be used to make
>> > > > virtqueue notify a lightweight exit by deferring hardware emulation to the
>> > > > iothread and allowing the VM to continue execution.  This model is similar to
>> > > > how vhost receives virtqueue notifies.
>> > > >
>> > > > The result of this change is improved performance for userspace virtio devices.
>> > > > Virtio-blk throughput increases especially for multithreaded scenarios and
>> > > > virtio-net transmit throughput increases substantially.
>> > >
>> > > Interestingly, I see decreased throughput for small message
>> > > host to guest netperf runs.
>> > >
>> > > The command that I used was:
>> > > netperf -H $vguest -- -m 200
>> > >
>> > > And the results are:
>> > > - with ioeventfd=off
>> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
>> > > Recv   Send    Send                          Utilization       Service Demand
>> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
>> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>> > >
>> > >  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680
>> > >
>> > > - with ioeventfd=on
>> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
>> > > Recv   Send    Send                          Utilization       Service Demand
>> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
>> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>> > >
>> > >  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389
>> > >
>> > >
>> > > Do you see this behaviour too?
>> >
>> > Just a note: this is with the patchset ported to qemu-kvm.
>>
>> And just another note: the trend is reversed for larger messages,
>> e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.
>
> Another datapoint where I see a regression is with 4000 byte messages
> for guest to host traffic.
>
> ioeventfd=off
> set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566
>
> ioeventfd=on
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055

Interesting.  I posted the following results in an earlier version of
this patch:

"Sridhar Samudrala <sri@us.ibm.com> collected the following data for
virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.

Guest to Host TCP_STREAM throughput(Mb/sec)
-------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
65536         12755        6430                  7590
16384          8499        3084                  5764
 4096          4723        1578                  3659"

Here we got a throughput improvement where you got a regression.  Your
virtio-net ioeventfd=off throughput is much higher than what we got
(different hardware and configuration, but still I didn't know that
virtio-net reaches 7 Gbit/s!).

I have focussed on the block side of things.  Any thoughts about the
virtio-net performance we're seeing?

" 1024          1827         981                  2060

Host to Guest TCP_STREAM throughput(Mb/sec)
-------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
65536         11156        5790                  5853
16384         10787        5575                  5691
 4096         10452        5556                  4277
 1024          4437        3671                  5277

Guest to Host TCP_RR latency(transactions/sec)
----------------------------------------------

Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
   1          9903        3459                  3425
 4096          7185        1931                  1899
16384          6108        2102                  1923
65536          3161        1610                  1744"

I'll also run the netperf tests you posted to check what I get.

Stefan


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 10:24         ` Stefan Hajnoczi
@ 2010-12-13 10:38           ` Michael S. Tsirkin
  2010-12-13 13:11             ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 10:38 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 10:24:51AM +0000, Stefan Hajnoczi wrote:
> On Sun, Dec 12, 2010 at 9:09 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Sun, Dec 12, 2010 at 10:56:34PM +0200, Michael S. Tsirkin wrote:
> >> On Sun, Dec 12, 2010 at 10:42:28PM +0200, Michael S. Tsirkin wrote:
> >> > On Sun, Dec 12, 2010 at 10:41:28PM +0200, Michael S. Tsirkin wrote:
> >> > > On Sun, Dec 12, 2010 at 03:02:04PM +0000, Stefan Hajnoczi wrote:
> >> > > > See below for the v5 changelog.
> >> > > >
> >> > > > Due to lack of connectivity I am sending from GMail.  Git should retain my
> >> > > > stefanha@linux.vnet.ibm.com From address.
> >> > > >
> >> > > > Virtqueue notify is currently handled synchronously in userspace virtio.  This
> >> > > > prevents the vcpu from executing guest code while hardware emulation code
> >> > > > handles the notify.
> >> > > >
> >> > > > On systems that support KVM, the ioeventfd mechanism can be used to make
> >> > > > virtqueue notify a lightweight exit by deferring hardware emulation to the
> >> > > > iothread and allowing the VM to continue execution.  This model is similar to
> >> > > > how vhost receives virtqueue notifies.
> >> > > >
> >> > > > The result of this change is improved performance for userspace virtio devices.
> >> > > > Virtio-blk throughput increases especially for multithreaded scenarios and
> >> > > > virtio-net transmit throughput increases substantially.
> >> > >
> >> > > Interestingly, I see decreased throughput for small message
> >> > > host to guest netperf runs.
> >> > >
> >> > > The command that I used was:
> >> > > netperf -H $vguest -- -m 200
> >> > >
> >> > > And the results are:
> >> > > - with ioeventfd=off
> >> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> >> > > Recv   Send    Send                          Utilization       Service Demand
> >> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> >> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
> >> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> >> > >
> >> > >  87380  16384    200    10.00      3035.48   15.50    99.30    6.695   2.680
> >> > >
> >> > > - with ioeventfd=on
> >> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.104 (11.0.0.104) port 0 AF_INET : demo
> >> > > Recv   Send    Send                          Utilization       Service Demand
> >> > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> >> > > Size   Size    Size     Time     Throughput  local    remote   local   remote
> >> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> >> > >
> >> > >  87380  16384    200    10.00      1770.95   18.16    51.65    13.442  2.389
> >> > >
> >> > >
> >> > > Do you see this behaviour too?
> >> >
> >> > Just a note: this is with the patchset ported to qemu-kvm.
> >>
> >> And just another note: the trend is reversed for larger messages,
> >> e.g. with 1.5k messages ioeventfd=on outperforms ioeventfd=off.
> >
> > Another datapoint where I see a regression is with 4000 byte messages
> > for guest to host traffic.
> >
> > ioeventfd=off
> > set_up_server could not establish a listen endpoint for  port 12865 with family AF_UNSPEC
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> >
> >  87380  16384   4000    10.00      7717.56   98.80    15.11    1.049   2.566
> >
> > ioeventfd=on
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> > Size   Size    Size     Time     Throughput  local    remote   local   remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> >
> >  87380  16384   4000    10.00      3965.86   87.69    15.29    1.811   5.055
> 
> Interesting.  I posted the following results in an earlier version of
> this patch:
> 
> "Sridhar Samudrala <sri@us.ibm.com> collected the following data for
> virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.
> 
> Guest to Host TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
> 65536         12755        6430                  7590
> 16384          8499        3084                  5764
>  4096          4723        1578                  3659"
> 
> Here we got a throughput improvement where you got a regression.  Your
> virtio-net ioeventfd=off throughput is much higher than what we got
> (different hardware and configuration, but still I didn't know that
> virtio-net reaches 7 Gbit/s!).

Which qemu are you running? Mine is upstream qemu-kvm + your patches v4
+ my patch to port to qemu-kvm.  Are you testing qemu.git?

My CPU is an Intel(R) Xeon(R) X5560 @ 2.80GHz.  I am running
without any special flags, so IIRC the kvm64 CPU type is emulated.  I should
really try +x2apic.

> I have focussed on the block side of things.  Any thoughts about the
> virtio-net performance we're seeing?
> 
> " 1024          1827         981                  2060

I tried 1.5k messages and I am getting about 3000 Mbit/s guest to host,
but in my testing I get about 2000 without ioeventfd as well.

> Host to Guest TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
> 65536         11156        5790                  5853
> 16384         10787        5575                  5691
>  4096         10452        5556                  4277
>  1024          4437        3671                  5277
> 
> Guest to Host TCP_RR latency(transactions/sec)
> ----------------------------------------------
> 
> Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
>    1          9903        3459                  3425
>  4096          7185        1931                  1899
> 16384          6108        2102                  1923
> 65536          3161        1610                  1744"
> 
> I'll also run the netperf tests you posted to check what I get.
> 
> Stefan


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 10:38           ` Michael S. Tsirkin
@ 2010-12-13 13:11             ` Stefan Hajnoczi
  2010-12-13 13:35               ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 13:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Fresh results:

192.168.0.1 - host (runs netperf)
192.168.0.2 - guest (runs netserver)

host$ src/netperf -H 192.168.0.2 -- -m 200

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384    200    10.00    1759.25

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1757.15

The results vary approx +/- 3% between runs.

Invocation:
$ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img

I am running qemu.git with v5 patches, based off
36888c6335422f07bbc50bf3443a39f24b90c7c6.

Host:
1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
8 GB RAM
RHEL 6 host

Next I will try the patches on latest qemu-kvm.git

Stefan


* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 13:11             ` Stefan Hajnoczi
@ 2010-12-13 13:35               ` Michael S. Tsirkin
  2010-12-13 13:36                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 13:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> Fresh results:
> 
> 192.168.0.1 - host (runs netperf)
> 192.168.0.2 - guest (runs netserver)
> 
> host$ src/netperf -H 192.168.0.2 -- -m 200
> 
> ioeventfd=on
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> (192.168.0.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>  87380  16384    200    10.00    1759.25
> 
> ioeventfd=off
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> (192.168.0.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
>  87380  16384    200    10.00    1757.15
> 
> The results vary approx +/- 3% between runs.
> 
> Invocation:
> $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> 
> I am running qemu.git with v5 patches, based off
> 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> 
> Host:
> 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> 8 GB RAM
> RHEL 6 host
> 
> Next I will try the patches on latest qemu-kvm.git
> 
> Stefan

One interesting thing is that I put virtio-net earlier on
command line. Since iobus scan is linear for now, I wonder if this might
possibly matter.

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 13:35               ` Michael S. Tsirkin
@ 2010-12-13 13:36                 ` Michael S. Tsirkin
  2010-12-13 14:06                   ` Stefan Hajnoczi
  2010-12-13 15:27                   ` Stefan Hajnoczi
  0 siblings, 2 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 13:36 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> > Fresh results:
> > 
> > 192.168.0.1 - host (runs netperf)
> > 192.168.0.2 - guest (runs netserver)
> > 
> > host$ src/netperf -H 192.168.0.2 -- -m 200
> > 
> > ioeventfd=on
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> > (192.168.0.2) port 0 AF_INET
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> >  87380  16384    200    10.00    1759.25
> > 
> > ioeventfd=off
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> > (192.168.0.2) port 0 AF_INET
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 
> >  87380  16384    200    10.00    1757.15
> > 
> > The results vary approx +/- 3% between runs.
> > 
> > Invocation:
> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> > 
> > I am running qemu.git with v5 patches, based off
> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> > 
> > Host:
> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> > 8 GB RAM
> > RHEL 6 host
> > 
> > Next I will try the patches on latest qemu-kvm.git
> > 
> > Stefan
> 
> One interesting thing is that I put virtio-net earlier on
> command line.

Sorry I mean I put it after disk, you put it before.

> Since iobus scan is linear for now, I wonder if this might
> possibly matter.
> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 13:36                 ` Michael S. Tsirkin
@ 2010-12-13 14:06                   ` Stefan Hajnoczi
  2010-12-13 15:27                   ` Stefan Hajnoczi
  1 sibling, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 14:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Here are my results on qemu-kvm.git:

ioeventfd=on
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1203.44

ioeventfd=off
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
(192.168.0.2) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384    200    10.00    1677.96

This is a 30% degradation that wasn't visible on qemu.git.

Same host.  qemu-kvm.git with v5 patches based on
cb1983b8809d0e06a97384a40bad1194a32fc814.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 13:36                 ` Michael S. Tsirkin
  2010-12-13 14:06                   ` Stefan Hajnoczi
@ 2010-12-13 15:27                   ` Stefan Hajnoczi
  2010-12-13 16:00                     ` Michael S. Tsirkin
  2010-12-13 16:12                     ` Michael S. Tsirkin
  1 sibling, 2 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 15:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>> > Fresh results:
>> >
>> > 192.168.0.1 - host (runs netperf)
>> > 192.168.0.2 - guest (runs netserver)
>> >
>> > host$ src/netperf -H 192.168.0.2 -- -m 200
>> >
>> > ioeventfd=on
>> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> > (192.168.0.2) port 0 AF_INET
>> > Recv   Send    Send
>> > Socket Socket  Message  Elapsed
>> > Size   Size    Size     Time     Throughput
>> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >  87380  16384    200    10.00    1759.25
>> >
>> > ioeventfd=off
>> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> > (192.168.0.2) port 0 AF_INET
>> > Recv   Send    Send
>> > Socket Socket  Message  Elapsed
>> > Size   Size    Size     Time     Throughput
>> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >
>> >  87380  16384    200    10.00    1757.15
>> >
>> > The results vary approx +/- 3% between runs.
>> >
>> > Invocation:
>> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>> >
>> > I am running qemu.git with v5 patches, based off
>> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>> >
>> > Host:
>> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>> > 8 GB RAM
>> > RHEL 6 host
>> >
>> > Next I will try the patches on latest qemu-kvm.git
>> >
>> > Stefan
>>
>> One interesting thing is that I put virtio-net earlier on
>> command line.
>
> Sorry I mean I put it after disk, you put it before.

I can't find a measurable difference when swapping -drive and -netdev.

Can you run the same test with vhost?  I assume it still outperforms
userspace virtio for small message sizes?  I'm interested because that
also uses ioeventfd.

I am wondering if the iothread differences between qemu.git and
qemu-kvm.git can explain the performance results we see.  In
particular, qemu.git produces the same (high) throughput whether
ioeventfd is on or off.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 15:27                   ` Stefan Hajnoczi
@ 2010-12-13 16:00                     ` Michael S. Tsirkin
  2010-12-13 16:29                       ` Stefan Hajnoczi
  2010-12-13 16:12                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 16:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> >> > Fresh results:
> >> >
> >> > 192.168.0.1 - host (runs netperf)
> >> > 192.168.0.2 - guest (runs netserver)
> >> >
> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
> >> >
> >> > ioeventfd=on
> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> > (192.168.0.2) port 0 AF_INET
> >> > Recv   Send    Send
> >> > Socket Socket  Message  Elapsed
> >> > Size   Size    Size     Time     Throughput
> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >  87380  16384    200    10.00    1759.25
> >> >
> >> > ioeventfd=off
> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> > (192.168.0.2) port 0 AF_INET
> >> > Recv   Send    Send
> >> > Socket Socket  Message  Elapsed
> >> > Size   Size    Size     Time     Throughput
> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >
> >> >  87380  16384    200    10.00    1757.15
> >> >
> >> > The results vary approx +/- 3% between runs.
> >> >
> >> > Invocation:
> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> >> >
> >> > I am running qemu.git with v5 patches, based off
> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> >> >
> >> > Host:
> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> >> > 8 GB RAM
> >> > RHEL 6 host
> >> >
> >> > Next I will try the patches on latest qemu-kvm.git
> >> >
> >> > Stefan
> >>
> >> One interesting thing is that I put virtio-net earlier on
> >> command line.
> >
> > Sorry I mean I put it after disk, you put it before.
> 
> I can't find a measurable difference when swapping -drive and -netdev.
> 
> Can you run the same test with vhost?  I assume it still outperforms
> userspace virtio for small message sizes?  I'm interested because that
> also uses ioeventfd.

Seems to work same as ioeventfd.

> I am wondering if the iothread differences between qemu.git and
> qemu-kvm.git can explain the performance results we see.  In
> particular, qemu.git produces the same (high) throughput whether
> ioeventfd is on or off.
> 
> Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 15:27                   ` Stefan Hajnoczi
  2010-12-13 16:00                     ` Michael S. Tsirkin
@ 2010-12-13 16:12                     ` Michael S. Tsirkin
  2010-12-13 16:28                       ` Stefan Hajnoczi
  1 sibling, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 16:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> >> > Fresh results:
> >> >
> >> > 192.168.0.1 - host (runs netperf)
> >> > 192.168.0.2 - guest (runs netserver)
> >> >
> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
> >> >
> >> > ioeventfd=on
> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> > (192.168.0.2) port 0 AF_INET
> >> > Recv   Send    Send
> >> > Socket Socket  Message  Elapsed
> >> > Size   Size    Size     Time     Throughput
> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >  87380  16384    200    10.00    1759.25
> >> >
> >> > ioeventfd=off
> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> > (192.168.0.2) port 0 AF_INET
> >> > Recv   Send    Send
> >> > Socket Socket  Message  Elapsed
> >> > Size   Size    Size     Time     Throughput
> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >
> >> >  87380  16384    200    10.00    1757.15
> >> >
> >> > The results vary approx +/- 3% between runs.
> >> >
> >> > Invocation:
> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> >> >
> >> > I am running qemu.git with v5 patches, based off
> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> >> >
> >> > Host:
> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> >> > 8 GB RAM
> >> > RHEL 6 host
> >> >
> >> > Next I will try the patches on latest qemu-kvm.git
> >> >
> >> > Stefan
> >>
> >> One interesting thing is that I put virtio-net earlier on
> >> command line.
> >
> > Sorry I mean I put it after disk, you put it before.
> 
> I can't find a measurable difference when swapping -drive and -netdev.

One other concern I have is that we are apparently using
ioeventfd for all VQs. E.g. for virtio-net we probably should not
use it for the control VQ - it's a waste of resources.

> Can you run the same test with vhost?  I assume it still outperforms
> userspace virtio for small message sizes?  I'm interested because that
> also uses ioeventfd.
> 
> I am wondering if the iothread differences between qemu.git and
> qemu-kvm.git can explain the performance results we see.  In
> particular, qemu.git produces the same (high) throughput whether
> ioeventfd is on or off.
> 
> Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 16:12                     ` Michael S. Tsirkin
@ 2010-12-13 16:28                       ` Stefan Hajnoczi
  2010-12-13 17:57                         ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 16:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>> >> > Fresh results:
>> >> >
>> >> > 192.168.0.1 - host (runs netperf)
>> >> > 192.168.0.2 - guest (runs netserver)
>> >> >
>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
>> >> >
>> >> > ioeventfd=on
>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> > (192.168.0.2) port 0 AF_INET
>> >> > Recv   Send    Send
>> >> > Socket Socket  Message  Elapsed
>> >> > Size   Size    Size     Time     Throughput
>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >  87380  16384    200    10.00    1759.25
>> >> >
>> >> > ioeventfd=off
>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> > (192.168.0.2) port 0 AF_INET
>> >> > Recv   Send    Send
>> >> > Socket Socket  Message  Elapsed
>> >> > Size   Size    Size     Time     Throughput
>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >
>> >> >  87380  16384    200    10.00    1757.15
>> >> >
>> >> > The results vary approx +/- 3% between runs.
>> >> >
>> >> > Invocation:
>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>> >> >
>> >> > I am running qemu.git with v5 patches, based off
>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>> >> >
>> >> > Host:
>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>> >> > 8 GB RAM
>> >> > RHEL 6 host
>> >> >
>> >> > Next I will try the patches on latest qemu-kvm.git
>> >> >
>> >> > Stefan
>> >>
>> >> One interesting thing is that I put virtio-net earlier on
>> >> command line.
>> >
>> > Sorry I mean I put it after disk, you put it before.
>>
>> I can't find a measurable difference when swapping -drive and -netdev.
>
> One other concern I have is that we are apparently using
> ioeventfd for all VQs. E.g. for virtio-net we probably should not
> use it for the control VQ - it's a waste of resources.

One option is a per-device (block, net, etc) bitmap that masks out
virtqueues.  Is that something you'd like to see?

I'm tempted to mask out the RX vq too and see how that affects the
qemu-kvm.git specific issue.
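
Conceptually the mask could be as simple as a per-device bitmap consulted
when host notifiers are assigned.  A standalone sketch with invented names
(not code from this series):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical virtqueue indices for virtio-net. */
enum { VQ_RX = 0, VQ_TX = 1, VQ_CTRL = 2 };

/* Per-device mask: only the marked virtqueues get an ioeventfd-backed
 * host notifier; the rest keep the synchronous userspace notify path. */
static const uint32_t virtio_net_ioeventfd_mask = 1u << VQ_TX;

static bool vq_uses_ioeventfd(uint32_t mask, unsigned vq)
{
    return (mask >> vq) & 1u;
}

int main(void)
{
    const char *name[] = { "rx", "tx", "ctrl" };

    for (unsigned vq = 0; vq < 3; vq++) {
        printf("vq %u (%s): %s\n", vq, name[vq],
               vq_uses_ioeventfd(virtio_net_ioeventfd_mask, vq) ?
               "ioeventfd" : "userspace notify");
    }
    return 0;
}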

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 16:00                     ` Michael S. Tsirkin
@ 2010-12-13 16:29                       ` Stefan Hajnoczi
  2010-12-13 16:30                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 16:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>> >> > Fresh results:
>> >> >
>> >> > 192.168.0.1 - host (runs netperf)
>> >> > 192.168.0.2 - guest (runs netserver)
>> >> >
>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
>> >> >
>> >> > ioeventfd=on
>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> > (192.168.0.2) port 0 AF_INET
>> >> > Recv   Send    Send
>> >> > Socket Socket  Message  Elapsed
>> >> > Size   Size    Size     Time     Throughput
>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >  87380  16384    200    10.00    1759.25
>> >> >
>> >> > ioeventfd=off
>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> > (192.168.0.2) port 0 AF_INET
>> >> > Recv   Send    Send
>> >> > Socket Socket  Message  Elapsed
>> >> > Size   Size    Size     Time     Throughput
>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >
>> >> >  87380  16384    200    10.00    1757.15
>> >> >
>> >> > The results vary approx +/- 3% between runs.
>> >> >
>> >> > Invocation:
>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>> >> >
>> >> > I am running qemu.git with v5 patches, based off
>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>> >> >
>> >> > Host:
>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>> >> > 8 GB RAM
>> >> > RHEL 6 host
>> >> >
>> >> > Next I will try the patches on latest qemu-kvm.git
>> >> >
>> >> > Stefan
>> >>
>> >> One interesting thing is that I put virtio-net earlier on
>> >> command line.
>> >
>> > Sorry I mean I put it after disk, you put it before.
>>
>> I can't find a measurable difference when swapping -drive and -netdev.
>>
>> Can you run the same test with vhost?  I assume it still outperforms
>> userspace virtio for small message sizes?  I'm interested because that
>> also uses ioeventfd.
>
> Seems to work same as ioeventfd.

vhost performs the same as ioeventfd=on?  And that means slower than
ioeventfd=off?

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 16:29                       ` Stefan Hajnoczi
@ 2010-12-13 16:30                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 16:30 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 04:29:58PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 13, 2010 at 4:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
> >> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> >> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> >> >> > Fresh results:
> >> >> >
> >> >> > 192.168.0.1 - host (runs netperf)
> >> >> > 192.168.0.2 - guest (runs netserver)
> >> >> >
> >> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
> >> >> >
> >> >> > ioeventfd=on
> >> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> >> > (192.168.0.2) port 0 AF_INET
> >> >> > Recv   Send    Send
> >> >> > Socket Socket  Message  Elapsed
> >> >> > Size   Size    Size     Time     Throughput
> >> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >> >  87380  16384    200    10.00    1759.25
> >> >> >
> >> >> > ioeventfd=off
> >> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> >> > (192.168.0.2) port 0 AF_INET
> >> >> > Recv   Send    Send
> >> >> > Socket Socket  Message  Elapsed
> >> >> > Size   Size    Size     Time     Throughput
> >> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >> >
> >> >> >  87380  16384    200    10.00    1757.15
> >> >> >
> >> >> > The results vary approx +/- 3% between runs.
> >> >> >
> >> >> > Invocation:
> >> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> >> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> >> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> >> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> >> >> >
> >> >> > I am running qemu.git with v5 patches, based off
> >> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> >> >> >
> >> >> > Host:
> >> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> >> >> > 8 GB RAM
> >> >> > RHEL 6 host
> >> >> >
> >> >> > Next I will try the patches on latest qemu-kvm.git
> >> >> >
> >> >> > Stefan
> >> >>
> >> >> One interesting thing is that I put virtio-net earlier on
> >> >> command line.
> >> >
> >> > Sorry I mean I put it after disk, you put it before.
> >>
> >> I can't find a measurable difference when swapping -drive and -netdev.
> >>
> >> Can you run the same test with vhost?  I assume it still outperforms
> >> userspace virtio for small message sizes?  I'm interested because that
> >> also uses ioeventfd.
> >
> > Seems to work same as ioeventfd.
> 
> vhost performs the same as ioeventfd=on?  And that means slower than
> ioeventfd=off?
> 
> Stefan

Yes.

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 16:28                       ` Stefan Hajnoczi
@ 2010-12-13 17:57                         ` Stefan Hajnoczi
  2010-12-13 18:52                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-13 17:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
>>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>>> >> > Fresh results:
>>> >> >
>>> >> > 192.168.0.1 - host (runs netperf)
>>> >> > 192.168.0.2 - guest (runs netserver)
>>> >> >
>>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
>>> >> >
>>> >> > ioeventfd=on
>>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>>> >> > (192.168.0.2) port 0 AF_INET
>>> >> > Recv   Send    Send
>>> >> > Socket Socket  Message  Elapsed
>>> >> > Size   Size    Size     Time     Throughput
>>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>>> >> >  87380  16384    200    10.00    1759.25
>>> >> >
>>> >> > ioeventfd=off
>>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>>> >> > (192.168.0.2) port 0 AF_INET
>>> >> > Recv   Send    Send
>>> >> > Socket Socket  Message  Elapsed
>>> >> > Size   Size    Size     Time     Throughput
>>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>>> >> >
>>> >> >  87380  16384    200    10.00    1757.15
>>> >> >
>>> >> > The results vary approx +/- 3% between runs.
>>> >> >
>>> >> > Invocation:
>>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>>> >> >
>>> >> > I am running qemu.git with v5 patches, based off
>>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>>> >> >
>>> >> > Host:
>>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>>> >> > 8 GB RAM
>>> >> > RHEL 6 host
>>> >> >
>>> >> > Next I will try the patches on latest qemu-kvm.git
>>> >> >
>>> >> > Stefan
>>> >>
>>> >> One interesting thing is that I put virtio-net earlier on
>>> >> command line.
>>> >
>>> > Sorry I mean I put it after disk, you put it before.
>>>
>>> I can't find a measurable difference when swapping -drive and -netdev.
>>
>> One other concern I have is that we are apparently using
>> ioeventfd for all VQs. E.g. for virtio-net we probably should not
>> use it for the control VQ - it's a waste of resources.
>
> One option is a per-device (block, net, etc) bitmap that masks out
> virtqueues.  Is that something you'd like to see?
>
> I'm tempted to mask out the RX vq too and see how that affects the
> qemu-kvm.git specific issue.

As expected, the rx virtqueue is involved in the degradation.  I
enabled ioeventfd only for the TX virtqueue and got the same good
results as userspace virtio-net.

When I enable only the rx virtqueue, performance decreases as we've seen above.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 17:57                         ` Stefan Hajnoczi
@ 2010-12-13 18:52                           ` Michael S. Tsirkin
  2010-12-15 11:42                             ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-13 18:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 05:57:28PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
> >>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> >>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> >>> >> > Fresh results:
> >>> >> >
> >>> >> > 192.168.0.1 - host (runs netperf)
> >>> >> > 192.168.0.2 - guest (runs netserver)
> >>> >> >
> >>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
> >>> >> >
> >>> >> > ioeventfd=on
> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >>> >> > (192.168.0.2) port 0 AF_INET
> >>> >> > Recv   Send    Send
> >>> >> > Socket Socket  Message  Elapsed
> >>> >> > Size   Size    Size     Time     Throughput
> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >>> >> >  87380  16384    200    10.00    1759.25
> >>> >> >
> >>> >> > ioeventfd=off
> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >>> >> > (192.168.0.2) port 0 AF_INET
> >>> >> > Recv   Send    Send
> >>> >> > Socket Socket  Message  Elapsed
> >>> >> > Size   Size    Size     Time     Throughput
> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >>> >> >
> >>> >> >  87380  16384    200    10.00    1757.15
> >>> >> >
> >>> >> > The results vary approx +/- 3% between runs.
> >>> >> >
> >>> >> > Invocation:
> >>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> >>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> >>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> >>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> >>> >> >
> >>> >> > I am running qemu.git with v5 patches, based off
> >>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> >>> >> >
> >>> >> > Host:
> >>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> >>> >> > 8 GB RAM
> >>> >> > RHEL 6 host
> >>> >> >
> >>> >> > Next I will try the patches on latest qemu-kvm.git
> >>> >> >
> >>> >> > Stefan
> >>> >>
> >>> >> One interesting thing is that I put virtio-net earlier on
> >>> >> command line.
> >>> >
> >>> > Sorry I mean I put it after disk, you put it before.
> >>>
> >>> I can't find a measurable difference when swapping -drive and -netdev.
> >>
> >> One other concern I have is that we are apparently using
> >> ioeventfd for all VQs. E.g. for virtio-net we probably should not
> >> use it for the control VQ - it's a waste of resources.
> >
> > One option is a per-device (block, net, etc) bitmap that masks out
> > virtqueues.  Is that something you'd like to see?
> >
> > I'm tempted to mask out the RX vq too and see how that affects the
> > qemu-kvm.git specific issue.
> 
> As expected, the rx virtqueue is involved in the degradation.  I
> enabled ioeventfd only for the TX virtqueue and got the same good
> results as userspace virtio-net.
> 
> When I enable only the rx virtqueue, performance decreases as we've seen above.
> 
> Stefan

Interesting. In particular this implies something's wrong with the
queue: we should not normally be getting notifications from rx queue
at all. Is it running low on buffers? Does it help to increase the vq
size?  Any other explanation?

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-13 18:52                           ` Michael S. Tsirkin
@ 2010-12-15 11:42                             ` Stefan Hajnoczi
  2010-12-15 11:48                               ` Stefan Hajnoczi
  2010-12-15 12:14                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-15 11:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 13, 2010 at 05:57:28PM +0000, Stefan Hajnoczi wrote:
>> On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
>> >>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>> >>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>> >>> >> > Fresh results:
>> >>> >> >
>> >>> >> > 192.168.0.1 - host (runs netperf)
>> >>> >> > 192.168.0.2 - guest (runs netserver)
>> >>> >> >
>> >>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
>> >>> >> >
>> >>> >> > ioeventfd=on
>> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >>> >> > (192.168.0.2) port 0 AF_INET
>> >>> >> > Recv   Send    Send
>> >>> >> > Socket Socket  Message  Elapsed
>> >>> >> > Size   Size    Size     Time     Throughput
>> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >>> >> >  87380  16384    200    10.00    1759.25
>> >>> >> >
>> >>> >> > ioeventfd=off
>> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >>> >> > (192.168.0.2) port 0 AF_INET
>> >>> >> > Recv   Send    Send
>> >>> >> > Socket Socket  Message  Elapsed
>> >>> >> > Size   Size    Size     Time     Throughput
>> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >>> >> >
>> >>> >> >  87380  16384    200    10.00    1757.15
>> >>> >> >
>> >>> >> > The results vary approx +/- 3% between runs.
>> >>> >> >
>> >>> >> > Invocation:
>> >>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>> >>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>> >>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>> >>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>> >>> >> >
>> >>> >> > I am running qemu.git with v5 patches, based off
>> >>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>> >>> >> >
>> >>> >> > Host:
>> >>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>> >>> >> > 8 GB RAM
>> >>> >> > RHEL 6 host
>> >>> >> >
>> >>> >> > Next I will try the patches on latest qemu-kvm.git
>> >>> >> >
>> >>> >> > Stefan
>> >>> >>
>> >>> >> One interesting thing is that I put virtio-net earlier on
>> >>> >> command line.
>> >>> >
>> >>> > Sorry I mean I put it after disk, you put it before.
>> >>>
>> >>> I can't find a measurable difference when swapping -drive and -netdev.
>> >>
>> >> One other concern I have is that we are apparently using
>> >> ioeventfd for all VQs. E.g. for virtio-net we probably should not
>> >> use it for the control VQ - it's a waste of resources.
>> >
>> > One option is a per-device (block, net, etc) bitmap that masks out
>> > virtqueues.  Is that something you'd like to see?
>> >
>> > I'm tempted to mask out the RX vq too and see how that affects the
>> > qemu-kvm.git specific issue.
>>
>> As expected, the rx virtqueue is involved in the degradation.  I
>> enabled ioeventfd only for the TX virtqueue and got the same good
>> results as userspace virtio-net.
>>
>> When I enable only the rx virtqueue, performance decreases as we've seen above.
>>
>> Stefan
>
> Interesting. In particular this implies something's wrong with the
> queue: we should not normally be getting notifications from rx queue
> at all. Is it running low on buffers? Does it help to increase the vq
> size?  Any other explanation?

I made a mistake, it is the *tx* vq that causes reduced performance on
short packets with ioeventfd.  I double-checked the results and the rx
vq doesn't affect performance.

Initially I thought the fix would be to adjust the tx mitigation
mechanism since ioeventfd does its own mitigation of sorts.  Multiple
eventfd signals will be coalesced into one qemu-kvm event handler call
if qemu-kvm didn't have a chance to handle the first event before the
eventfd was signalled again.
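
The coalescing falls out of eventfd's counter semantics.  A minimal
standalone demo of just that behaviour on a Linux host (not QEMU code):

#include <sys/eventfd.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int efd = eventfd(0, 0);        /* counter starts at 0 */
    uint64_t one = 1, n;

    /* Three "virtqueue kicks" arrive before the handler runs... */
    write(efd, &one, sizeof(one));
    write(efd, &one, sizeof(one));
    write(efd, &one, sizeof(one));

    /* ...but a single read() consumes them all and resets the counter,
     * so the event loop invokes the handler only once. */
    read(efd, &n, sizeof(n));
    printf("handler ran once for %llu kicks\n", (unsigned long long)n);

    close(efd);
    return 0;
}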

I added -device virtio-net-pci,tx=immediate to flush the TX queue
immediately instead of scheduling a BH or timer.  Unfortunately this
had little measurable effect and performance stayed the same.  This
suggests most of the latency is between the guest's pio write and
qemu-kvm getting around to handling the event.
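
The shape of that experiment, flushing in the notifier handler instead of
deferring, sketched with made-up names rather than the actual virtio-net
code:

#include <stdbool.h>
#include <stdio.h>

struct tx_queue {
    int pending;        /* packets the guest has queued */
    bool tx_immediate;  /* the experimental tx=immediate knob */
    bool bh_scheduled;  /* stand-in for scheduling the QEMU BH/timer */
};

static void flush_tx(struct tx_queue *q)
{
    printf("flushed %d packets\n", q->pending);
    q->pending = 0;
}

static void tx_notify(struct tx_queue *q)
{
    if (q->tx_immediate) {
        flush_tx(q);            /* handle the kick right away */
    } else {
        q->bh_scheduled = true; /* default: batch further kicks via BH/timer */
    }
}

int main(void)
{
    struct tx_queue q = { .pending = 3, .tx_immediate = true };
    tx_notify(&q);
    return 0;
}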

You mentioned that vhost-net has the same performance issue on this
benchmark.  I guess a solution for vhost-net may help virtio-ioeventfd
and vice versa.

Are you happy with this patchset if I remove virtio-net-pci
ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with
default on)?  For block we've found it to be a win and the initial
results looked good for net too.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 11:42                             ` Stefan Hajnoczi
@ 2010-12-15 11:48                               ` Stefan Hajnoczi
  2010-12-15 12:00                                 ` Michael S. Tsirkin
  2010-12-15 12:14                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-15 11:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

For the record, here are the commits to selectively mask virtqueues
for ioeventfd and to add -device virtio-net-pci,tx=immediate:
http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-ioeventfd-2

I'm posting this in case you want to try it out too.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 11:48                               ` Stefan Hajnoczi
@ 2010-12-15 12:00                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-15 12:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Wed, Dec 15, 2010 at 11:48:50AM +0000, Stefan Hajnoczi wrote:
> For the record, here are the commits to selectively mask virtqueues
> for ioeventfd and to add -device virtio-net-pci,tx=immediate:
> http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-ioeventfd-2
> 
> I'm posting this in case you want to try it out too.
> 
> Stefan

Thanks!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 11:42                             ` Stefan Hajnoczi
  2010-12-15 11:48                               ` Stefan Hajnoczi
@ 2010-12-15 12:14                               ` Michael S. Tsirkin
  2010-12-15 12:59                                 ` Stefan Hajnoczi
  1 sibling, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-15 12:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Wed, Dec 15, 2010 at 11:42:12AM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Dec 13, 2010 at 05:57:28PM +0000, Stefan Hajnoczi wrote:
> >> On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> > On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
> >> >>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
> >> >>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
> >> >>> >> > Fresh results:
> >> >>> >> >
> >> >>> >> > 192.168.0.1 - host (runs netperf)
> >> >>> >> > 192.168.0.2 - guest (runs netserver)
> >> >>> >> >
> >> >>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
> >> >>> >> >
> >> >>> >> > ioeventfd=on
> >> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> >>> >> > (192.168.0.2) port 0 AF_INET
> >> >>> >> > Recv   Send    Send
> >> >>> >> > Socket Socket  Message  Elapsed
> >> >>> >> > Size   Size    Size     Time     Throughput
> >> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >>> >> >  87380  16384    200    10.00    1759.25
> >> >>> >> >
> >> >>> >> > ioeventfd=off
> >> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
> >> >>> >> > (192.168.0.2) port 0 AF_INET
> >> >>> >> > Recv   Send    Send
> >> >>> >> > Socket Socket  Message  Elapsed
> >> >>> >> > Size   Size    Size     Time     Throughput
> >> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
> >> >>> >> >
> >> >>> >> >  87380  16384    200    10.00    1757.15
> >> >>> >> >
> >> >>> >> > The results vary approx +/- 3% between runs.
> >> >>> >> >
> >> >>> >> > Invocation:
> >> >>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
> >> >>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
> >> >>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
> >> >>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
> >> >>> >> >
> >> >>> >> > I am running qemu.git with v5 patches, based off
> >> >>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
> >> >>> >> >
> >> >>> >> > Host:
> >> >>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
> >> >>> >> > 8 GB RAM
> >> >>> >> > RHEL 6 host
> >> >>> >> >
> >> >>> >> > Next I will try the patches on latest qemu-kvm.git
> >> >>> >> >
> >> >>> >> > Stefan
> >> >>> >>
> >> >>> >> One interesting thing is that I put virtio-net earlier on
> >> >>> >> command line.
> >> >>> >
> >> >>> > Sorry I mean I put it after disk, you put it before.
> >> >>>
> >> >>> I can't find a measurable difference when swapping -drive and -netdev.
> >> >>
> >> >> One other concern I have is that we are apparently using
> >> >> ioeventfd for all VQs. E.g. for virtio-net we probably should not
> >> >> use it for the control VQ - it's a waste of resources.
> >> >
> >> > One option is a per-device (block, net, etc) bitmap that masks out
> >> > virtqueues.  Is that something you'd like to see?
> >> >
> >> > I'm tempted to mask out the RX vq too and see how that affects the
> >> > qemu-kvm.git specific issue.
> >>
> >> As expected, the rx virtqueue is involved in the degradation.  I
> >> enabled ioeventfd only for the TX virtqueue and got the same good
> >> results as userspace virtio-net.
> >>
> >> When I enable only the rx virtqueue, performance decreases as we've seen above.
> >>
> >> Stefan
> >
> > Interesting. In particular this implies something's wrong with the
> > queue: we should not normally be getting notifications from rx queue
> > at all. Is it running low on buffers? Does it help to increase the vq
> > size?  Any other explanation?
> 
> I made a mistake, it is the *tx* vq that causes reduced performance on
> short packets with ioeventfd.  I double-checked the results and the rx
> vq doesn't affect performance.
> 
> Initially I thought the fix would be to adjust the tx mitigation
> mechanism since ioeventfd does its own mitigation of sorts.  Multiple
> eventfd signals will be coalesced into one qemu-kvm event handler call
> if qemu-kvm didn't have a chance to handle the first event before the
> eventfd was signalled again.
> 
> I added -device virtio-net-pci,tx=immediate to flush the TX queue
> immediately instead of scheduling a BH or timer.  Unfortunately this
> had little measurable effect and performance stayed the same.  This
> suggests most of the latency is between the guest's pio write and
> qemu-kvm getting around to handling the event.
> 
> You mentioned that vhost-net has the same performance issue on this
> benchmark.  I guess a solution for vhost-net may help virtio-ioeventfd
> and vice versa.
> 
> Are you happy with this patchset if I remove virtio-net-pci
> ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with
> default on)?  For block we've found it to be a win and the initial
> results looked good for net too.
> 
> Stefan

I'm concerned that the tests were done on qemu.git.
Could you check block with qemu-kvm too please?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 12:14                               ` Michael S. Tsirkin
@ 2010-12-15 12:59                                 ` Stefan Hajnoczi
  2010-12-16 16:40                                   ` Stefan Hajnoczi
  2010-12-19 14:49                                   ` Michael S. Tsirkin
  0 siblings, 2 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-15 12:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Wed, Dec 15, 2010 at 12:14 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Dec 15, 2010 at 11:42:12AM +0000, Stefan Hajnoczi wrote:
>> On Mon, Dec 13, 2010 at 6:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Dec 13, 2010 at 05:57:28PM +0000, Stefan Hajnoczi wrote:
>> >> On Mon, Dec 13, 2010 at 4:28 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> >> > On Mon, Dec 13, 2010 at 4:12 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> >> On Mon, Dec 13, 2010 at 03:27:06PM +0000, Stefan Hajnoczi wrote:
>> >> >>> On Mon, Dec 13, 2010 at 1:36 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> >>> > On Mon, Dec 13, 2010 at 03:35:38PM +0200, Michael S. Tsirkin wrote:
>> >> >>> >> On Mon, Dec 13, 2010 at 01:11:27PM +0000, Stefan Hajnoczi wrote:
>> >> >>> >> > Fresh results:
>> >> >>> >> >
>> >> >>> >> > 192.168.0.1 - host (runs netperf)
>> >> >>> >> > 192.168.0.2 - guest (runs netserver)
>> >> >>> >> >
>> >> >>> >> > host$ src/netperf -H 192.168.0.2 -- -m 200
>> >> >>> >> >
>> >> >>> >> > ioeventfd=on
>> >> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> >>> >> > (192.168.0.2) port 0 AF_INET
>> >> >>> >> > Recv   Send    Send
>> >> >>> >> > Socket Socket  Message  Elapsed
>> >> >>> >> > Size   Size    Size     Time     Throughput
>> >> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >>> >> >  87380  16384    200    10.00    1759.25
>> >> >>> >> >
>> >> >>> >> > ioeventfd=off
>> >> >>> >> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.2
>> >> >>> >> > (192.168.0.2) port 0 AF_INET
>> >> >>> >> > Recv   Send    Send
>> >> >>> >> > Socket Socket  Message  Elapsed
>> >> >>> >> > Size   Size    Size     Time     Throughput
>> >> >>> >> > bytes  bytes   bytes    secs.    10^6bits/sec
>> >> >>> >> >
>> >> >>> >> >  87380  16384    200    10.00    1757.15
>> >> >>> >> >
>> >> >>> >> > The results vary approx +/- 3% between runs.
>> >> >>> >> >
>> >> >>> >> > Invocation:
>> >> >>> >> > $ x86_64-softmmu/qemu-system-x86_64 -m 4096 -enable-kvm -netdev
>> >> >>> >> > type=tap,id=net0,ifname=tap0,script=no,downscript=no -device
>> >> >>> >> > virtio-net-pci,netdev=net0,ioeventfd=on|off -vnc :0 -drive
>> >> >>> >> > if=virtio,cache=none,file=$HOME/rhel6-autobench-raw.img
>> >> >>> >> >
>> >> >>> >> > I am running qemu.git with v5 patches, based off
>> >> >>> >> > 36888c6335422f07bbc50bf3443a39f24b90c7c6.
>> >> >>> >> >
>> >> >>> >> > Host:
>> >> >>> >> > 1 Quad-Core AMD Opteron(tm) Processor 2350 @ 2 GHz
>> >> >>> >> > 8 GB RAM
>> >> >>> >> > RHEL 6 host
>> >> >>> >> >
>> >> >>> >> > Next I will try the patches on latest qemu-kvm.git
>> >> >>> >> >
>> >> >>> >> > Stefan
>> >> >>> >>
>> >> >>> >> One interesting thing is that I put virtio-net earlier on
>> >> >>> >> command line.
>> >> >>> >
>> >> >>> > Sorry I mean I put it after disk, you put it before.
>> >> >>>
>> >> >>> I can't find a measurable difference when swapping -drive and -netdev.
>> >> >>
>> >> >> One other concern I have is that we are apparently using
>> >> >> ioeventfd for all VQs. E.g. for virtio-net we probably should not
>> >> >> use it for the control VQ - it's a waste of resources.
>> >> >
>> >> > One option is a per-device (block, net, etc) bitmap that masks out
>> >> > virtqueues.  Is that something you'd like to see?
>> >> >
>> >> > I'm tempted to mask out the RX vq too and see how that affects the
>> >> > qemu-kvm.git specific issue.
>> >>
>> >> As expected, the rx virtqueue is involved in the degradation.  I
>> >> enabled ioeventfd only for the TX virtqueue and got the same good
>> >> results as userspace virtio-net.
>> >>
>> >> When I enable only the rx virtqueue, performance decreases as we've seen above.
>> >>
>> >> Stefan
>> >
>> > Interesting. In particular this implies something's wrong with the
>> > queue: we should not normally be getting notifications from rx queue
>> > at all. Is it running low on buffers? Does it help to increase the vq
>> > size?  Any other explanation?
>>
>> I made a mistake, it is the *tx* vq that causes reduced performance on
>> short packets with ioeventfd.  I double-checked the results and the rx
>> vq doesn't affect performance.
>>
>> Initially I thought the fix would be to adjust the tx mitigation
>> mechanism since ioeventfd does its own mitigation of sorts.  Multiple
>> eventfd signals will be coalesced into one qemu-kvm event handler call
>> if qemu-kvm didn't have a chance to handle the first event before the
>> eventfd was signalled again.
>>
>> I added -device virtio-net-pci,tx=immediate to flush the TX queue
>> immediately instead of scheduling a BH or timer.  Unfortunately this
>> had little measurable effect and performance stayed the same.  This
>> suggests most of the latency is between the guest's pio write and
>> qemu-kvm getting around to handling the event.
>>
>> You mentioned that vhost-net has the same performance issue on this
>> benchmark.  I guess a solution for vhost-net may help virtio-ioeventfd
>> and vice versa.
>>
>> Are you happy with this patchset if I remove virtio-net-pci
>> ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with
>> default on)?  For block we've found it to be a win and the initial
>> results looked good for net too.
>>
>> Stefan
>
> I'm concerned that the tests were done on qemu.git.
> Could you check block with qemu-kvm too please?

The following results show qemu-kvm with virtio-ioeventfd v3 for both
aio=native and aio=threads:

http://wiki.qemu.org/Features/VirtioIoeventfd

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 12:59                                 ` Stefan Hajnoczi
@ 2010-12-16 16:40                                   ` Stefan Hajnoczi
  2010-12-16 23:39                                     ` Michael S. Tsirkin
  2010-12-19 14:49                                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2010-12-16 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Wed, Dec 15, 2010 at 12:59 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Dec 15, 2010 at 12:14 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, Dec 15, 2010 at 11:42:12AM +0000, Stefan Hajnoczi wrote:
>>> Are you happy with this patchset if I remove virtio-net-pci
>>> ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with
>>> default on)?  For block we've found it to be a win and the initial
>>> results looked good for net too.

Please let me know if I should disable ioeventfd for virtio-net.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-16 16:40                                   ` Stefan Hajnoczi
@ 2010-12-16 23:39                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-16 23:39 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Thu, Dec 16, 2010 at 04:40:32PM +0000, Stefan Hajnoczi wrote:
> On Wed, Dec 15, 2010 at 12:59 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Wed, Dec 15, 2010 at 12:14 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Wed, Dec 15, 2010 at 11:42:12AM +0000, Stefan Hajnoczi wrote:
> >>> Are you happy with this patchset if I remove virtio-net-pci
> >>> ioeventfd=on|off so only virtio-blk-pci has ioeventfd=on|off (with
> >>> default on)?  For block we've found it to be a win and the initial
> >>> results looked good for net too.
> 
> Please let me know if I should disable ioeventfd for virtio-net.
> 
> Stefan

Sure, if it slows us down, we should disable it.
What bothers me is the API issue that makes ioeventfd an all-or-nothing
thing, so it gets enabled even for the control vq; that needs to be
resolved anyway.

Still it does not affect block, so maybe we can merge as is
and fix later ... I will try to think it over on the weekend.

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-15 12:59                                 ` Stefan Hajnoczi
  2010-12-16 16:40                                   ` Stefan Hajnoczi
@ 2010-12-19 14:49                                   ` Michael S. Tsirkin
  2011-01-06 16:41                                     ` Stefan Hajnoczi
  1 sibling, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2010-12-19 14:49 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Wed, Dec 15, 2010 at 12:59:45PM +0000, Stefan Hajnoczi wrote:
> > I'm concerned that the tests were done on qemu.git.
> > Could you check block with qemu-kvm too please?
> 
> The following results show qemu-kvm with virtio-ioeventfd v3 for both
> aio=native and aio=threads:
> 
> http://wiki.qemu.org/Features/VirtioIoeventfd
> 
> Stefan

What were the flags used to run qemu here? One option that's known to
affect speed significantly is x2apic. Did you try it?

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2010-12-19 14:49                                   ` Michael S. Tsirkin
@ 2011-01-06 16:41                                     ` Stefan Hajnoczi
  2011-01-06 17:04                                       ` Michael S. Tsirkin
  2011-01-06 18:00                                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-06 16:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Khoa Huynh, qemu-devel

Here are 4k sequential read results (cache=none) to check whether we
see an ioeventfd performance regression with virtio-blk.

The idea is to use a small blocksize with an I/O pattern (sequential
reads) that is cheap and executes quickly.  Therefore we're doing many
iops and the cost of virtqueue kick/notify is especially important.
We're not trying to stress the disk, we're trying to make the
difference in ioeventfd=on/off apparent.

I did 2 runs for both ioeventfd=off and ioeventfd=on.  The results are
similar: 1% and 2% degradation in MB/s or iops.  We'd have to do more
runs to see if the degradation is statistically significant, but the
percentage value is so low that I'm satisfied.
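
As a rough sanity check against the full results below: 5,019 iops
(ioeventfd=off) versus 4,904 iops (ioeventfd=on) works out to about
199 us versus 204 us per request, i.e. roughly 5 us of extra per-I/O
cost and a ~2% throughput delta for this particular run.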

Are you happy to merge virtio-ioeventfd v6 + your fixups?

Full results below:

x86_64-softmmu/qemu-system-x86_64 -m 1024 -drive
if=none,file=rhel6.img,cache=none,id=system -device
virtio-blk-pci,drive=system -drive
if=none,file=/dev/volumes/storage,cache=none,id=storage -device
virtio-blk-pci,drive=storage -cpu kvm64,+x2apic -vnc :0

fio jobfile:
[global]
ioengine=libaio
buffered=0
rw=read
bs=4k
iodepth=1
runtime=2m

[job1]
filename=/dev/vdb

ioeventfd=off:
job1: (groupid=0, jobs=1): err= 0: pid=2692
  read : io=2,353MB, bw=20,080KB/s, iops=5,019, runt=120001msec
    slat (usec): min=20, max=1,424, avg=34.86, stdev= 7.62
    clat (usec): min=1, max=11,547, avg=162.02, stdev=42.95
    bw (KB/s) : min=16600, max=20328, per=100.03%, avg=20084.25, stdev=241.88
  cpu          : usr=1.14%, sys=13.40%, ctx=604918, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=602391/0, short=0/0
     lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (usec): 100=0.01%, 250=99.89%, 500=0.07%, 750=0.01%, 1000=0.02%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%

Run status group 0 (all jobs):
   READ: io=2,353MB, aggrb=20,079KB/s, minb=20,561KB/s,
maxb=20,561KB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
  vdb: ios=601339/0, merge=0/0, ticks=112092/0, in_queue=111815, util=93.38%

ioeventfd=on:
job1: (groupid=0, jobs=1): err= 0: pid=2692
  read : io=2,299MB, bw=19,619KB/s, iops=4,904, runt=120001msec
    slat (usec): min=9, max=2,257, avg=40.43, stdev=11.65
    clat (usec): min=1, max=28,000, avg=161.12, stdev=61.46
    bw (KB/s) : min=15720, max=19984, per=100.02%, avg=19623.26, stdev=290.76
  cpu          : usr=1.49%, sys=19.34%, ctx=591398, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=588578/0, short=0/0
     lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (usec): 100=0.01%, 250=99.86%, 500=0.09%, 750=0.01%, 1000=0.02%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=2,299MB, aggrb=19,619KB/s, minb=20,089KB/s,
maxb=20,089KB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
  vdb: ios=587592/0, merge=0/0, ticks=110373/0, in_queue=110125, util=91.97%

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2011-01-06 16:41                                     ` Stefan Hajnoczi
@ 2011-01-06 17:04                                       ` Michael S. Tsirkin
  2011-01-06 18:00                                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 52+ messages in thread
From: Michael S. Tsirkin @ 2011-01-06 17:04 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, qemu-devel

On Thu, Jan 06, 2011 at 04:41:50PM +0000, Stefan Hajnoczi wrote:
> Here are 4k sequential read results (cache=none) to check whether we
> see an ioeventfd performance regression with virtio-blk.
> 
> The idea is to use a small blocksize with an I/O pattern (sequential
> reads) that is cheap and executes quickly.  Therefore we're doing many
> iops and the cost virtqueue kick/notify is especially important.
> We're not trying to stress the disk, we're trying to make the
> difference in ioeventfd=on/off apparent.
> 
> I did 2 runs for both ioeventfd=off and ioeventfd=on.  The results are
> similar: 1% and 2% degradation in MB/s or iops.  We'd have to do more
> runs to see if the degradation is statistically significant, but the
> percentage value is so low that I'm satisfied.
> 
> Are you happy to merge virtio-ioeventfd v6 + your fixups?

Think so. I would like to do a bit of testing of the whole thing
with migration (ideally with virtio net
and vhost too, even though we don't yet enable them).

Hope to put it on my tree by next week.

> Full results below:
> 
> x86_64-softmmu/qemu-system-x86_64 -m 1024 -drive
> if=none,file=rhel6.img,cache=none,id=system -device
> virtio-blk-pci,drive=system -drive
> if=none,file=/dev/volumes/storage,cache=none,id=storage -device
> virtio-blk-pci,drive=storage -cpu kvm64,+x2apic -vnc :0
> 
> fio jobfile:
> [global]
> ioengine=libaio
> buffered=0
> rw=read
> bs=4k
> iodepth=1
> runtime=2m
> 
> [job1]
> filename=/dev/vdb
> 
> ioeventfd=off:
> job1: (groupid=0, jobs=1): err= 0: pid=2692
>   read : io=2,353MB, bw=20,080KB/s, iops=5,019, runt=120001msec
>     slat (usec): min=20, max=1,424, avg=34.86, stdev= 7.62
>     clat (usec): min=1, max=11,547, avg=162.02, stdev=42.95
>     bw (KB/s) : min=16600, max=20328, per=100.03%, avg=20084.25, stdev=241.88
>   cpu          : usr=1.14%, sys=13.40%, ctx=604918, majf=0, minf=29
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued r/w: total=602391/0, short=0/0
>      lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
>      lat (usec): 100=0.01%, 250=99.89%, 500=0.07%, 750=0.01%, 1000=0.02%
>      lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
> 
> Run status group 0 (all jobs):
>    READ: io=2,353MB, aggrb=20,079KB/s, minb=20,561KB/s,
> maxb=20,561KB/s, mint=120001msec, maxt=120001msec
> 
> Disk stats (read/write):
>   vdb: ios=601339/0, merge=0/0, ticks=112092/0, in_queue=111815, util=93.38%
> 
> ioeventfd=on:
> job1: (groupid=0, jobs=1): err= 0: pid=2692
>   read : io=2,299MB, bw=19,619KB/s, iops=4,904, runt=120001msec
>     slat (usec): min=9, max=2,257, avg=40.43, stdev=11.65
>     clat (usec): min=1, max=28,000, avg=161.12, stdev=61.46
>     bw (KB/s) : min=15720, max=19984, per=100.02%, avg=19623.26, stdev=290.76
>   cpu          : usr=1.49%, sys=19.34%, ctx=591398, majf=0, minf=29
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued r/w: total=588578/0, short=0/0
>      lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
>      lat (usec): 100=0.01%, 250=99.86%, 500=0.09%, 750=0.01%, 1000=0.02%
>      lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%
> 
> Run status group 0 (all jobs):
>    READ: io=2,299MB, aggrb=19,619KB/s, minb=20,089KB/s,
> maxb=20,089KB/s, mint=120001msec, maxt=120001msec
> 
> Disk stats (read/write):
>   vdb: ios=587592/0, merge=0/0, ticks=110373/0, in_queue=110125, util=91.97%
> 
> Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2011-01-06 16:41                                     ` Stefan Hajnoczi
  2011-01-06 17:04                                       ` Michael S. Tsirkin
@ 2011-01-06 18:00                                       ` Michael S. Tsirkin
  2011-01-07  8:56                                         ` Stefan Hajnoczi
  1 sibling, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2011-01-06 18:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Khoa Huynh, qemu-devel

On Thu, Jan 06, 2011 at 04:41:50PM +0000, Stefan Hajnoczi wrote:
> Here are 4k sequential read results (cache=none) to check whether we
> see an ioeventfd performance regression with virtio-blk.
> 
> The idea is to use a small blocksize with an I/O pattern (sequential
> reads) that is cheap and executes quickly.  Therefore we're doing many
> iops and the cost virtqueue kick/notify is especially important.
> We're not trying to stress the disk, we're trying to make the
> difference in ioeventfd=on/off apparent.
> 
> I did 2 runs for both ioeventfd=off and ioeventfd=on.  The results are
> similar: 1% and 2% degradation in MB/s or iops.  We'd have to do more
> runs to see if the degradation is statistically significant, but the
> percentage value is so low that I'm satisfied.
> 
> Are you happy to merge virtio-ioeventfd v6 + your fixups?

BTW if you could do some migration stress-testing too,
would be nice. autotest has support for it now.

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify
  2011-01-06 18:00                                       ` Michael S. Tsirkin
@ 2011-01-07  8:56                                         ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-07  8:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Khoa Huynh, qemu-devel

On Thu, Jan 6, 2011 at 6:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jan 06, 2011 at 04:41:50PM +0000, Stefan Hajnoczi wrote:
>> Here are 4k sequential read results (cache=none) to check whether we
>> see an ioeventfd performance regression with virtio-blk.
>>
>> The idea is to use a small blocksize with an I/O pattern (sequential
>> reads) that is cheap and executes quickly.  Therefore we're doing many
>> iops and the cost virtqueue kick/notify is especially important.
>> We're not trying to stress the disk, we're trying to make the
>> difference in ioeventfd=on/off apparent.
>>
>> I did 2 runs for both ioeventfd=off and ioeventfd=on.  The results are
>> similar: 1% and 2% degradation in MB/s or iops.  We'd have to do more
>> runs to see if the degradation is statistically significant, but the
>> percentage value is so low that I'm satisfied.
>>
>> Are you happy to merge virtio-ioeventfd v6 + your fixups?
>
> BTW if you could do some migration stress-testing too,
> would be nice. autotest has support for it now.

Okay, I'll let you know the results.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify Stefan Hajnoczi
@ 2011-01-24 18:54   ` Kevin Wolf
  2011-01-24 19:36     ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2011-01-24 18:54 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Michael S. Tsirkin, qemu-devel, Stefan Hajnoczi

Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> Virtqueue notify is currently handled synchronously in userspace virtio.  This
> prevents the vcpu from executing guest code while hardware emulation code
> handles the notify.
> 
> On systems that support KVM, the ioeventfd mechanism can be used to make
> virtqueue notify a lightweight exit by deferring hardware emulation to the
> iothread and allowing the VM to continue execution.  This model is similar to
> how vhost receives virtqueue notifies.
> 
> The result of this change is improved performance for userspace virtio devices.
> Virtio-blk throughput increases especially for multithreaded scenarios and
> virtio-net transmit throughput increases substantially.
> 
> Some virtio devices are known to have guest drivers which expect a notify to be
> processed synchronously and spin waiting for completion.  Only enable ioeventfd
> for virtio-blk and virtio-net for now.
> 
> Care must be taken not to interfere with vhost-net, which uses host
> notifiers.  If the set_host_notifier() API is used by a device
> virtio-pci will disable virtio-ioeventfd and let the device deal with
> host notifiers as it wishes.
> 
> After migration and on VM change state (running/paused) virtio-ioeventfd
> will enable/disable itself.
> 
>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>  * vm_change_state(running=1) -> enable virtio-ioeventfd
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

On current git master I'm getting hangs when running iozone on a
virtio-blk disk. "Hang" means that it's not responsive any more and has
100% CPU consumption.

I bisected the problem to this patch. Any ideas?

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-24 18:54   ` Kevin Wolf
@ 2011-01-24 19:36     ` Michael S. Tsirkin
  2011-01-24 19:48       ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2011-01-24 19:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> > Virtqueue notify is currently handled synchronously in userspace virtio.  This
> > prevents the vcpu from executing guest code while hardware emulation code
> > handles the notify.
> > 
> > On systems that support KVM, the ioeventfd mechanism can be used to make
> > virtqueue notify a lightweight exit by deferring hardware emulation to the
> > iothread and allowing the VM to continue execution.  This model is similar to
> > how vhost receives virtqueue notifies.
> > 
> > The result of this change is improved performance for userspace virtio devices.
> > Virtio-blk throughput increases especially for multithreaded scenarios and
> > virtio-net transmit throughput increases substantially.
> > 
> > Some virtio devices are known to have guest drivers which expect a notify to be
> > processed synchronously and spin waiting for completion.  Only enable ioeventfd
> > for virtio-blk and virtio-net for now.
> > 
> > Care must be taken not to interfere with vhost-net, which uses host
> > notifiers.  If the set_host_notifier() API is used by a device
> > virtio-pci will disable virtio-ioeventfd and let the device deal with
> > host notifiers as it wishes.
> > 
> > After migration and on VM change state (running/paused) virtio-ioeventfd
> > will enable/disable itself.
> > 
> >  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >  * vm_change_state(running=0) -> disable virtio-ioeventfd
> >  * vm_change_state(running=1) -> enable virtio-ioeventfd
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> 
> On current git master I'm getting hangs when running iozone on a
> virtio-blk disk. "Hang" means that it's not responsive any more and has
> 100% CPU consumption.
> 
> I bisected the problem to this patch. Any ideas?
> 
> Kevin

Does it help if you set ioeventfd=off on command line?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-24 19:48       ` Kevin Wolf
@ 2011-01-24 19:47         ` Michael S. Tsirkin
  2011-01-24 20:05           ` Kevin Wolf
  0 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2011-01-24 19:47 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
> > On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> >>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
> >>> prevents the vcpu from executing guest code while hardware emulation code
> >>> handles the notify.
> >>>
> >>> On systems that support KVM, the ioeventfd mechanism can be used to make
> >>> virtqueue notify a lightweight exit by deferring hardware emulation to the
> >>> iothread and allowing the VM to continue execution.  This model is similar to
> >>> how vhost receives virtqueue notifies.
> >>>
> >>> The result of this change is improved performance for userspace virtio devices.
> >>> Virtio-blk throughput increases especially for multithreaded scenarios and
> >>> virtio-net transmit throughput increases substantially.
> >>>
> >>> Some virtio devices are known to have guest drivers which expect a notify to be
> >>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
> >>> for virtio-blk and virtio-net for now.
> >>>
> >>> Care must be taken not to interfere with vhost-net, which uses host
> >>> notifiers.  If the set_host_notifier() API is used by a device
> >>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>> host notifiers as it wishes.
> >>>
> >>> After migration and on VM change state (running/paused) virtio-ioeventfd
> >>> will enable/disable itself.
> >>>
> >>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>
> >> On current git master I'm getting hangs when running iozone on a
> >> virtio-blk disk. "Hang" means that it's not responsive any more and has
> >> 100% CPU consumption.
> >>
> >> I bisected the problem to this patch. Any ideas?
> >>
> >> Kevin
> > 
> > Does it help if you set ioeventfd=off on command line?
> 
> Yes, with ioeventfd=off it seems to work fine.
> 
> Kevin

Then it's the ioeventfd that is to blame.
Is it the io thread that consumes 100% CPU?
Or the vcpu thread?

-- 
MST

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-24 19:36     ` Michael S. Tsirkin
@ 2011-01-24 19:48       ` Kevin Wolf
  2011-01-24 19:47         ` Michael S. Tsirkin
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2011-01-24 19:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>>> prevents the vcpu from executing guest code while hardware emulation code
>>> handles the notify.
>>>
>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>>> iothread and allowing the VM to continue execution.  This model is similar to
>>> how vhost receives virtqueue notifies.
>>>
>>> The result of this change is improved performance for userspace virtio devices.
>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>>> virtio-net transmit throughput increases substantially.
>>>
>>> Some virtio devices are known to have guest drivers which expect a notify to be
>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>>> for virtio-blk and virtio-net for now.
>>>
>>> Care must be taken not to interfere with vhost-net, which uses host
>>> notifiers.  If the set_host_notifier() API is used by a device
>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>> host notifiers as it wishes.
>>>
>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>> will enable/disable itself.
>>>
>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>
>> On current git master I'm getting hangs when running iozone on a
>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>> 100% CPU consumption.
>>
>> I bisected the problem to this patch. Any ideas?
>>
>> Kevin
> 
> Does it help if you set ioeventfd=off on command line?

Yes, with ioeventfd=off it seems to work fine.

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-24 19:47         ` Michael S. Tsirkin
@ 2011-01-24 20:05           ` Kevin Wolf
  2011-01-25  7:12             ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Kevin Wolf @ 2011-01-24 20:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>>>>> prevents the vcpu from executing guest code while hardware emulation code
>>>>> handles the notify.
>>>>>
>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>>>>> iothread and allowing the VM to continue execution.  This model is similar to
>>>>> how vhost receives virtqueue notifies.
>>>>>
>>>>> The result of this change is improved performance for userspace virtio devices.
>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>>>>> virtio-net transmit throughput increases substantially.
>>>>>
>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>>>>> for virtio-blk and virtio-net for now.
>>>>>
>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>>>> host notifiers as it wishes.
>>>>>
>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>>>> will enable/disable itself.
>>>>>
>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>>>
>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>
>>>> On current git master I'm getting hangs when running iozone on a
>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>>>> 100% CPU consumption.
>>>>
>>>> I bisected the problem to this patch. Any ideas?
>>>>
>>>> Kevin
>>>
>>> Does it help if you set ioeventfd=off on command line?
>>
>> Yes, with ioeventfd=off it seems to work fine.
>>
>> Kevin
> 
> Then it's the ioeventfd that is to blame.
> Is it the io thread that consumes 100% CPU?
> Or the vcpu thread?

I was building with the default options, i.e. there is no IO thread.

Now I'm just running the test with IO threads enabled, and so far
everything looks good. So I can only reproduce the problem with IO
threads disabled.

Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-24 20:05           ` Kevin Wolf
@ 2011-01-25  7:12             ` Stefan Hajnoczi
  2011-01-25  9:49               ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25  7:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin

On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>>>>>> prevents the vcpu from executing guest code while hardware emulation code
>>>>>> handles the notify.
>>>>>>
>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>>>>>> iothread and allowing the VM to continue execution.  This model is similar to
>>>>>> how vhost receives virtqueue notifies.
>>>>>>
>>>>>> The result of this change is improved performance for userspace virtio devices.
>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>
>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
>>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>>>>>> for virtio-blk and virtio-net for now.
>>>>>>
>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>>>>> host notifiers as it wishes.
>>>>>>
>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>>>>> will enable/disable itself.
>>>>>>
>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>>>>
>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>
>>>>> On current git master I'm getting hangs when running iozone on a
>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>>>>> 100% CPU consumption.
>>>>>
>>>>> I bisected the problem to this patch. Any ideas?
>>>>>
>>>>> Kevin
>>>>
>>>> Does it help if you set ioeventfd=off on command line?
>>>
>>> Yes, with ioeventfd=off it seems to work fine.
>>>
>>> Kevin
>>
>> Then it's the ioeventfd that is to blame.
>> Is it the io thread that consumes 100% CPU?
>> Or the vcpu thread?
>
> I was building with the default options, i.e. there is no IO thread.
>
> Now I'm just running the test with IO threads enabled, and so far
> everything looks good. So I can only reproduce the problem with IO
> threads disabled.

Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
(relevant when --enable-io-thread is not used).  I will take a look at
that again and see why we're spinning without checking for ioeventfd
completion.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25  7:12             ` Stefan Hajnoczi
@ 2011-01-25  9:49               ` Stefan Hajnoczi
  2011-01-25  9:54                 ` Stefan Hajnoczi
                                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25  9:49 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Anthony Liguori, Avi Kivity, qemu-devel, Stefan Hajnoczi,
	Michael S. Tsirkin

On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>>>>>>> prevents the vcpu from executing guest code while hardware emulation code
>>>>>>> handles the notify.
>>>>>>>
>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>>>>>>> iothread and allowing the VM to continue execution.  This model is similar to
>>>>>>> how vhost receives virtqueue notifies.
>>>>>>>
>>>>>>> The result of this change is improved performance for userspace virtio devices.
>>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>>
>>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
>>>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>>>>>>> for virtio-blk and virtio-net for now.
>>>>>>>
>>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>>>>>> host notifiers as it wishes.
>>>>>>>
>>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>>>>>> will enable/disable itself.
>>>>>>>
>>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>>>>>
>>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>>
>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>>>>>> 100% CPU consumption.
>>>>>>
>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>
>>>>>> Kevin
>>>>>
>>>>> Does it help if you set ioeventfd=off on command line?
>>>>
>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>
>>>> Kevin
>>>
>>> Then it's the ioeventfd that is to blame.
>>> Is it the io thread that consumes 100% CPU?
>>> Or the vcpu thread?
>>
>> I was building with the default options, i.e. there is no IO thread.
>>
>> Now I'm just running the test with IO threads enabled, and so far
>> everything looks good. So I can only reproduce the problem with IO
>> threads disabled.
>
> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
> (relevant when --enable-io-thread is not used).  I will take a look at
> that again and see why we're spinning without checking for ioeventfd
> completion.

Here's my understanding of --disable-io-thread.  Added Anthony on CC,
please correct me.

When I/O thread is disabled our only thread runs guest code until an
exit request is made.  There are synchronous exit cases like a halt
instruction or single step.  There are also asynchronous exit cases
when signal handlers use qemu_notify_event(), which does cpu_exit(),
to set env->exit_request = 1 and unlink the current tb.

With this structure in mind, anything which needs to interrupt the
vcpu in order to process events must use signals and
qemu_notify_event().  Otherwise that event source may be starved and
never processed.

virtio-ioeventfd currently does not use signals and will therefore
never interrupt the vcpu.
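
To make the starvation concrete, here is a toy program (plain C, not
QEMU code) that mimics the situation: an eventfd "kick" is already
pending, but the CPU-bound loop standing in for guest code only notices
it because a SIGALRM handler sets an exit flag.  Remove the alarm() call
and the kick is never processed:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

static volatile sig_atomic_t exit_request;

static void alarm_handler(int signum)
{
        /* Plays the role of qemu_notify_event()/cpu_exit(): the only way
         * to make the loop below stop and look at its file descriptors. */
        (void)signum;
        exit_request = 1;
}

int main(void)
{
        eventfd_t value;
        int fd = eventfd(0, EFD_NONBLOCK);

        if (fd < 0) {
                perror("eventfd");
                exit(1);
        }

        signal(SIGALRM, alarm_handler);
        eventfd_write(fd, 1);  /* the "virtqueue kick" is already pending */
        alarm(1);              /* without this the loop spins forever */

        while (!exit_request) {
                /* "guest code": CPU bound, no syscalls, no fd polling */
        }

        eventfd_read(fd, &value);
        printf("kick finally processed, value=%llu\n",
               (unsigned long long)value);
        close(fd);
        return 0;
}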

However, you normally don't notice the missing signal handler because
some other event interrupts the vcpu and we enter select(2) to process
all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
top of timer events.  This is suboptimal because it adds latency to
virtqueue kick - we're waiting for another event to interrupt the vcpu
before we can process virtqueue-kick.

If any other vcpu interruption makes virtio-ioeventfd chug along then
why are you seeing 100% CPU livelock?  My theory is that dynticks has
a race condition which causes timers to stop working in QEMU.  Here is
an strace of QEMU --disable-io-thread entering live lock.  I can
trigger this by starting a VM and running "while true; do true; done"
at the shell.  Then strace the QEMU process:

08:04:34.985177 ioctl(11, KVM_RUN, 0)   = 0
08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
273000}}, NULL) = 0
08:04:34.985646 ioctl(11, KVM_RUN, 0)   = -1 EINTR (Interrupted system call)
08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
250000}}, NULL) = 0
08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
08:04:34.986406 ioctl(11, KVM_RUN, 0)   = 0
08:04:34.986465 ioctl(11, KVM_RUN, 0)   = 0              <--- guest
finishes execution

                v--- dynticks_rearm_timer() returns early because
timer is already scheduled
08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---    <--- timer expires
08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
08:04:34.986710 rt_sigreturn(0x2758ad0) = 0

                v--- we re-enter the guest without rearming the timer!
08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
[QEMU hang, 100% CPU]

So dynticks fails to rearm the timer before we enter the guest.  This
is a race condition: we check that there is already a timer scheduled
and head on towards re-entering the guest, the timer expires before we
enter the guest, we re-enter the guest without realizing the timer has
expired.  Now we're inside the guest without the hope of a timer
expiring - and the guest is running a CPU-bound workload that doesn't
need to perform I/O.

The result is a hung QEMU (screen does not update) and a softlockup
inside the guest once we do kick it to life again (by detaching
strace).

I think the only way to avoid this race condition in dynticks is to
mask SIGALRM, then check if the timer expired, and then ioctl(KVM_RUN)
with atomic signal mask change back to SIGALRM enabled.  Thoughts?
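
Roughly what I have in mind (a rough sketch only, not a patch; the
rearm_timer_if_expired() helper is a made-up stand-in for the dynticks
rearm logic, and I'm assuming KVM_SET_SIGNAL_MASK is the right way to
get the atomic mask switch during KVM_RUN):

#include <linux/kvm.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

extern void rearm_timer_if_expired(void);  /* hypothetical dynticks helper */

static void run_vcpu_once(int vcpu_fd)
{
        sigset_t block_alarm, run_mask;
        struct kvm_signal_mask *kmask;

        /* Block SIGALRM so it cannot fire between the timer check and the
         * entry into KVM_RUN. */
        sigemptyset(&block_alarm);
        sigaddset(&block_alarm, SIGALRM);
        sigprocmask(SIG_BLOCK, &block_alarm, &run_mask);

        /* With the signal blocked it is safe to check whether the timer
         * already expired and rearm it if necessary. */
        rearm_timer_if_expired();

        /* Have KVM apply the original mask (SIGALRM unblocked) atomically
         * for the duration of KVM_RUN.  A pending SIGALRM then makes
         * KVM_RUN return with EINTR instead of entering the guest. */
        kmask = malloc(sizeof(*kmask) + sizeof(run_mask));
        kmask->len = 8;  /* size of the kernel's sigset_t on x86-64 */
        memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
        ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);
        free(kmask);

        ioctl(vcpu_fd, KVM_RUN, 0);

        /* Back in userspace: restore the normal signal mask. */
        sigprocmask(SIG_SETMASK, &run_mask, NULL);
}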

Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
when there is no I/O thread.  It doesn't make sense because there's no
opportunity to process the virtqueue while the guest code is executing
in parallel like there is with I/O thread.  It will just degrade
performance when QEMU only has one thread.  I'll send a patch to
disable it when we build without I/O thread.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25  9:49               ` Stefan Hajnoczi
@ 2011-01-25  9:54                 ` Stefan Hajnoczi
  2011-01-25 11:27                 ` Michael S. Tsirkin
  2011-01-25 19:18                 ` Anthony Liguori
  2 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25  9:54 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Anthony Liguori, Avi Kivity, qemu-devel, Stefan Hajnoczi,
	Michael S. Tsirkin

On Tue, Jan 25, 2011 at 9:49 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> If any other vcpu interruption makes virtio-ioeventfd chug along then
> why are you seeing 100% CPU livelock?  My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU.

I forgot to mention that you can test this theory by building without
I/O thread and running with -clock hpet.

If the guest no longer hangs, then this suggests you're seeing the
dynticks race condition.
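
For example (the image file and drive id below are just placeholders):

x86_64-softmmu/qemu-system-x86_64 -m 1024 \
    -drive if=none,file=test.img,cache=none,id=storage \
    -device virtio-blk-pci,drive=storage \
    -clock hpet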

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25  9:49               ` Stefan Hajnoczi
  2011-01-25  9:54                 ` Stefan Hajnoczi
@ 2011-01-25 11:27                 ` Michael S. Tsirkin
  2011-01-25 13:20                   ` Stefan Hajnoczi
  2011-01-25 19:18                 ` Anthony Liguori
  2 siblings, 1 reply; 52+ messages in thread
From: Michael S. Tsirkin @ 2011-01-25 11:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anthony Liguori, Avi Kivity, qemu-devel, Stefan Hajnoczi

On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> >>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
> >>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
> >>>>>>> prevents the vcpu from executing guest code while hardware emulation code
> >>>>>>> handles the notify.
> >>>>>>>
> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
> >>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
> >>>>>>> iothread and allowing the VM to continue execution.  This model is similar to
> >>>>>>> how vhost receives virtqueue notifies.
> >>>>>>>
> >>>>>>> The result of this change is improved performance for userspace virtio devices.
> >>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
> >>>>>>> virtio-net transmit throughput increases substantially.
> >>>>>>>
> >>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
> >>>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
> >>>>>>> for virtio-blk and virtio-net for now.
> >>>>>>>
> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
> >>>>>>> notifiers.  If the set_host_notifier() API is used by a device
> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>>>>>> host notifiers as it wishes.
> >>>>>>>
> >>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
> >>>>>>> will enable/disable itself.
> >>>>>>>
> >>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
> >>>>>>>
> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>>>>>
> >>>>>> On current git master I'm getting hangs when running iozone on a
> >>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
> >>>>>> 100% CPU consumption.
> >>>>>>
> >>>>>> I bisected the problem to this patch. Any ideas?
> >>>>>>
> >>>>>> Kevin
> >>>>>
> >>>>> Does it help if you set ioeventfd=off on command line?
> >>>>
> >>>> Yes, with ioeventfd=off it seems to work fine.
> >>>>
> >>>> Kevin
> >>>
> >>> Then it's the ioeventfd that is to blame.
> >>> Is it the io thread that consumes 100% CPU?
> >>> Or the vcpu thread?
> >>
> >> I was building with the default options, i.e. there is no IO thread.
> >>
> >> Now I'm just running the test with IO threads enabled, and so far
> >> everything looks good. So I can only reproduce the problem with IO
> >> threads disabled.
> >
> > Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
> > (relevant when --enable-io-thread is not used).  I will take a look at
> > that again and see why we're spinning without checking for ioeventfd
> > completion.
> 
> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
> please correct me.
> 
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made.  There are synchronous exit cases like a halt
> instruction or single step.  There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
> 
> With this structure in mind, anything which needs to interrupt the
> vcpu in order to process events must use signals and
> qemu_notify_event().  Otherwise that event source may be starved and
> never processed.
> 
> virtio-ioeventfd currently does not use signals and will therefore
> never interrupt the vcpu.
> 
> However, you normally don't notice the missing signal handler because
> some other event interrupts the vcpu and we enter select(2) to process
> all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
> top of timer events.  This is suboptimal because it adds latency to
> virtqueue kick - we're waiting for another event to interrupt the vcpu
> before we can process virtqueue-kick.
> 
> If any other vcpu interruption makes virtio-ioeventfd chug along then
> why are you seeing 100% CPU livelock?  My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU.  Here is
> an strace of QEMU --disable-io-thread entering live lock.  I can
> trigger this by starting a VM and running "while true; do true; done"
> at the shell.  Then strace the QEMU process:
> 
> 08:04:34.985177 ioctl(11, KVM_RUN, 0)   = 0
> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
> 273000}}, NULL) = 0
> 08:04:34.985646 ioctl(11, KVM_RUN, 0)   = -1 EINTR (Interrupted system call)
> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
> 250000}}, NULL) = 0
> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
> 08:04:34.986406 ioctl(11, KVM_RUN, 0)   = 0
> 08:04:34.986465 ioctl(11, KVM_RUN, 0)   = 0              <--- guest
> finishes execution
> 
>                 v--- dynticks_rearm_timer() returns early because
> timer is already scheduled
> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---    <--- timer expires
> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
> 
>                 v--- we re-enter the guest without rearming the timer!
> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
> [QEMU hang, 100% CPU]
> 
> So dynticks fails to rearm the timer before we enter the guest.  This
> is a race condition: we check that there is already a timer scheduled
> and head on towards re-entering the guest, the timer expires before we
> enter the guest, we re-enter the guest without realizing the timer has
> expired.  Now we're inside the guest without the hope of a timer
> expiring - and the guest is running a CPU-bound workload that doesn't
> need to perform I/O.
> 
> The result is a hung QEMU (screen does not update) and a softlockup
> inside the guest once we do kick it to life again (by detaching
> strace).
> 
> I think the only way to avoid this race condition in dynticks is to
> mask SIGALRM, then check if the timer expired, and then ioctl(KVM_RUN)
> with atomic signal mask change back to SIGALRM enabled.  Thoughts?
> 
> Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
> when there is no I/O thread.

Can we make it work with SIGIO?

>  It doesn't make sense because there's no
> opportunity to process the virtqueue while the guest code is executing
> in parallel like there is with I/O thread.  It will just degrade
> performance when QEMU only has one thread.

Probably. But it's really better to check this than theorize about
it.

>  I'll send a patch to
> disable it when we build without I/O thread.
> 
> Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 11:27                 ` Michael S. Tsirkin
@ 2011-01-25 13:20                   ` Stefan Hajnoczi
  2011-01-25 14:07                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25 13:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Anthony Liguori, Avi Kivity, qemu-devel, Stefan Hajnoczi

On Tue, Jan 25, 2011 at 11:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>> >>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>> >>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>> >>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>> >>>>>>> prevents the vcpu from executing guest code while hardware emulation code
>> >>>>>>> handles the notify.
>> >>>>>>>
>> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>> >>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>> >>>>>>> iothread and allowing the VM to continue execution.  This model is similar to
>> >>>>>>> how vhost receives virtqueue notifies.
>> >>>>>>>
>> >>>>>>> The result of this change is improved performance for userspace virtio devices.
>> >>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>> >>>>>>> virtio-net transmit throughput increases substantially.
>> >>>>>>>
>> >>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
>> >>>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>> >>>>>>> for virtio-blk and virtio-net for now.
>> >>>>>>>
>> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>> >>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>> >>>>>>> host notifiers as it wishes.
>> >>>>>>>
>> >>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>> >>>>>>> will enable/disable itself.
>> >>>>>>>
>> >>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>> >>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>> >>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>> >>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>> >>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>> >>>>>>>
>> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> >>>>>>
>> >>>>>> On current git master I'm getting hangs when running iozone on a
>> >>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>> >>>>>> 100% CPU consumption.
>> >>>>>>
>> >>>>>> I bisected the problem to this patch. Any ideas?
>> >>>>>>
>> >>>>>> Kevin
>> >>>>>
>> >>>>> Does it help if you set ioeventfd=off on command line?
>> >>>>
>> >>>> Yes, with ioeventfd=off it seems to work fine.
>> >>>>
>> >>>> Kevin
>> >>>
>> >>> Then it's the ioeventfd that is to blame.
>> >>> Is it the io thread that consumes 100% CPU?
>> >>> Or the vcpu thread?
>> >>
>> >> I was building with the default options, i.e. there is no IO thread.
>> >>
>> >> Now I'm just running the test with IO threads enabled, and so far
>> >> everything looks good. So I can only reproduce the problem with IO
>> >> threads disabled.
>> >
>> > Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>> > (relevant when --enable-io-thread is not used).  I will take a look at
>> > that again and see why we're spinning without checking for ioeventfd
>> > completion.
>>
>> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
>> please correct me.
>>
>> When I/O thread is disabled our only thread runs guest code until an
>> exit request is made.  There are synchronous exit cases like a halt
>> instruction or single step.  There are also asynchronous exit cases
>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>> to set env->exit_request = 1 and unlink the current tb.
>>
>> With this structure in mind, anything which needs to interrupt the
>> vcpu in order to process events must use signals and
>> qemu_notify_event().  Otherwise that event source may be starved and
>> never processed.
>>
>> virtio-ioeventfd currently does not use signals and will therefore
>> never interrupt the vcpu.
>>
>> However, you normally don't notice the missing signal handler because
>> some other event interrupts the vcpu and we enter select(2) to process
>> all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
>> top of timer events.  This is suboptimal because it adds latency to
>> virtqueue kick - we're waiting for another event to interrupt the vcpu
>> before we can process virtqueue-kick.
>>
>> If any other vcpu interruption makes virtio-ioeventfd chug along then
>> why are you seeing 100% CPU livelock?  My theory is that dynticks has
>> a race condition which causes timers to stop working in QEMU.  Here is
>> an strace of QEMU --disable-io-thread entering live lock.  I can
>> trigger this by starting a VM and running "while true; do true; done"
>> at the shell.  Then strace the QEMU process:
>>
>> 08:04:34.985177 ioctl(11, KVM_RUN, 0)   = 0
>> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
>> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
>> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
>> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
>> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
>> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
>> 273000}}, NULL) = 0
>> 08:04:34.985646 ioctl(11, KVM_RUN, 0)   = -1 EINTR (Interrupted system call)
>> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
>> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
>> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
>> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
>> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
>> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0,
>> 250000}}, NULL) = 0
>> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
>> 08:04:34.986406 ioctl(11, KVM_RUN, 0)   = 0
>> 08:04:34.986465 ioctl(11, KVM_RUN, 0)   = 0              <--- guest
>> finishes execution
>>
>>                 v--- dynticks_rearm_timer() returns early because
>> timer is already scheduled
>> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
>> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---    <--- timer expires
>> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>>
>>                 v--- we re-enter the guest without rearming the timer!
>> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
>> [QEMU hang, 100% CPU]
>>
>> So dynticks fails to rearm the timer before we enter the guest.  This
>> is a race condition: we check that there is already a timer scheduled
>> and head on towards re-entering the guest, the timer expires before we
>> enter the guest, we re-enter the guest without realizing the timer has
>> expired.  Now we're inside the guest without the hope of a timer
>> expiring - and the guest is running a CPU-bound workload that doesn't
>> need to perform I/O.
>>
>> The result is a hung QEMU (screen does not update) and a softlockup
>> inside the guest once we do kick it to life again (by detaching
>> strace).
>>
>> I think the only way to avoid this race condition in dynticks is to
>> mask SIGALRM, then check if the timer expired, and then ioctl(KVM_RUN)
>> with atomic signal mask change back to SIGALRM enabled.  Thoughts?
>>
>> Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
>> when there is no I/O thread.
>
> Can we make it work with SIGIO?
>
>>  It doesn't make sense because there's no
>> opportunity to process the virtqueue while the guest code is executing
>> in parallel like there is with I/O thread.  It will just degrade
>> performance when QEMU only has one thread.
>
> Probably. But it's really better to check this than theorethise about
> it.

eventfd does not seem to support O_ASYNC.  After adding the necessary
code into QEMU, no signals were firing, so I wrote a test:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>     /* fork(), getpid(), sleep(), close() */
#include <sys/eventfd.h>
#include <sys/wait.h>   /* wait() */

int main(int argc, char **argv)
{
        int fd = eventfd(0, 0);
        if (fd < 0) {
                perror("eventfd");
                exit(1);
        }

        /* Deliver SIGTERM instead of the default SIGIO so it is obvious
         * when signal-driven I/O actually fires. */
        if (fcntl(fd, F_SETSIG, SIGTERM) < 0) {
                perror("fcntl(F_SETSIG)");
                exit(1);
        }

        /* Route the signal to this process. */
        if (fcntl(fd, F_SETOWN, getpid()) < 0) {
                perror("fcntl(F_SETOWN)");
                exit(1);
        }

        /* Request signal-driven I/O on the eventfd. */
        if (fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC) < 0) {
                perror("fcntl(F_SETFL)");
                exit(1);
        }

        switch (fork()) {
        case -1:
                perror("fork");
                exit(1);

        case 0:         /* child */
                eventfd_write(fd, 1);
                exit(0);

        default:        /* parent */
                break;
        }

        sleep(5);
        wait(NULL);
        close(fd);
        return 0;
}

I'd expect the parent to get a SIGTERM, but the process just sleeps and
then exits.  When replacing the eventfd with a pipe in this program,
the parent does receive a SIGTERM.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 13:20                   ` Stefan Hajnoczi
@ 2011-01-25 14:07                     ` Stefan Hajnoczi
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25 14:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Anthony Liguori, Avi Kivity, qemu-devel, Stefan Hajnoczi

On Tue, Jan 25, 2011 at 1:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> eventfd does not seem to support O_ASYNC.

linux-2.6/fs/eventfd.c does not implement file_operations::fasync() so
I'm convinced SIGIO is not possible here.

I have sent a patch to disable virtio-ioeventfd when !CONFIG_IOTHREAD.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25  9:49               ` Stefan Hajnoczi
  2011-01-25  9:54                 ` Stefan Hajnoczi
  2011-01-25 11:27                 ` Michael S. Tsirkin
@ 2011-01-25 19:18                 ` Anthony Liguori
  2011-01-25 19:45                   ` Stefan Hajnoczi
  2 siblings, 1 reply; 52+ messages in thread
From: Anthony Liguori @ 2011-01-25 19:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Michael S. Tsirkin,
	qemu-devel, Avi Kivity

On 01/25/2011 03:49 AM, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi<stefanha@gmail.com>  wrote:
>    
>> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>>      
>>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>>        
>>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>>          
>>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>>            
>>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>>              
>>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>>                
>>>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.  This
>>>>>>>> prevents the vcpu from executing guest code while hardware emulation code
>>>>>>>> handles the notify.
>>>>>>>>
>>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to make
>>>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation to the
>>>>>>>> iothread and allowing the VM to continue execution.  This model is similar to
>>>>>>>> how vhost receives virtqueue notifies.
>>>>>>>>
>>>>>>>> The result of this change is improved performance for userspace virtio devices.
>>>>>>>> Virtio-blk throughput increases especially for multithreaded scenarios and
>>>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>>>
>>>>>>>> Some virtio devices are known to have guest drivers which expect a notify to be
>>>>>>>> processed synchronously and spin waiting for completion.  Only enable ioeventfd
>>>>>>>> for virtio-blk and virtio-net for now.
>>>>>>>>
>>>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>>>>>>>> host notifiers as it wishes.
>>>>>>>>
>>>>>>>> After migration and on VM change state (running/paused) virtio-ioeventfd
>>>>>>>> will enable/disable itself.
>>>>>>>>
>>>>>>>>   * VIRTIO_CONFIG_S_DRIVER_OK ->  enable virtio-ioeventfd
>>>>>>>>   * !VIRTIO_CONFIG_S_DRIVER_OK ->  disable virtio-ioeventfd
>>>>>>>>   * virtio_pci_set_host_notifier() ->  disable virtio-ioeventfd
>>>>>>>>   * vm_change_state(running=0) ->  disable virtio-ioeventfd
>>>>>>>>   * vm_change_state(running=1) ->  enable virtio-ioeventfd
>>>>>>>>
>>>>>>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>>>>>>>>                  
>>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and has
>>>>>>> 100% CPU consumption.
>>>>>>>
>>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>>
>>>>>>> Kevin
>>>>>>>                
>>>>>> Does it help if you set ioeventfd=off on command line?
>>>>>>              
>>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>>
>>>>> Kevin
>>>>>            
>>>> Then it's the ioeventfd that is to blame.
>>>> Is it the io thread that consumes 100% CPU?
>>>> Or the vcpu thread?
>>>>          
>>> I was building with the default options, i.e. there is no IO thread.
>>>
>>> Now I'm just running the test with IO threads enabled, and so far
>>> everything looks good. So I can only reproduce the problem with IO
>>> threads disabled.
>>>        
>> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>> (relevant when --enable-io-thread is not used).  I will take a look at
>> that again and see why we're spinning without checking for ioeventfd
>> completion.
>>      
> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
> please correct me.
>
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made.  There are synchronous exit cases like a halt
> instruction or single step.  There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
>    

Correct.
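
In other words, reduced to a self-contained toy (not the actual QEMU code): a
signal handler sets an exit flag, and the execution loop notices it at its next
check, the way TCG notices env->exit_request at a translation-block boundary.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t exit_request;   /* stands in for env->exit_request */

static void notify_handler(int sig)
{
    (void)sig;
    exit_request = 1;                        /* what cpu_exit() amounts to here */
}

int main(void)
{
    signal(SIGALRM, notify_handler);
    alarm(1);                                /* pretend a timer interrupt arrives */

    while (!exit_request) {                  /* stands in for the TB-boundary check */
        /* "guest code" runs here */
    }
    printf("left the execution loop after the signal\n");
    return 0;
}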

Note that this is a problem today.  If you have a tight loop in TCG and 
you have nothing that would generate a signal (no pending disk I/O and 
no periodic timer), then the main loop is starved.

This is a fundamental flaw in TCG.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 19:18                 ` Anthony Liguori
@ 2011-01-25 19:45                   ` Stefan Hajnoczi
  2011-01-25 19:51                     ` Anthony Liguori
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25 19:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Michael S. Tsirkin,
	qemu-devel, Avi Kivity

On Tue, Jan 25, 2011 at 7:18 PM, Anthony Liguori
<aliguori@linux.vnet.ibm.com> wrote:
> On 01/25/2011 03:49 AM, Stefan Hajnoczi wrote:
>>
>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi<stefanha@gmail.com>
>>  wrote:
>>
>>>
>>> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf<kwolf@redhat.com>  wrote:
>>>
>>>>
>>>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>>>
>>>>>
>>>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>>>
>>>>>>
>>>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Virtqueue notify is currently handled synchronously in userspace
>>>>>>>>> virtio.  This
>>>>>>>>> prevents the vcpu from executing guest code while hardware
>>>>>>>>> emulation code
>>>>>>>>> handles the notify.
>>>>>>>>>
>>>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to
>>>>>>>>> make
>>>>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation
>>>>>>>>> to the
>>>>>>>>> iothread and allowing the VM to continue execution.  This model is
>>>>>>>>> similar to
>>>>>>>>> how vhost receives virtqueue notifies.
>>>>>>>>>
>>>>>>>>> The result of this change is improved performance for userspace
>>>>>>>>> virtio devices.
>>>>>>>>> Virtio-blk throughput increases especially for multithreaded
>>>>>>>>> scenarios and
>>>>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>>>>
>>>>>>>>> Some virtio devices are known to have guest drivers which expect a
>>>>>>>>> notify to be
>>>>>>>>> processed synchronously and spin waiting for completion.  Only
>>>>>>>>> enable ioeventfd
>>>>>>>>> for virtio-blk and virtio-net for now.
>>>>>>>>>
>>>>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal
>>>>>>>>> with
>>>>>>>>> host notifiers as it wishes.
>>>>>>>>>
>>>>>>>>> After migration and on VM change state (running/paused)
>>>>>>>>> virtio-ioeventfd
>>>>>>>>> will enable/disable itself.
>>>>>>>>>
>>>>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK ->  enable virtio-ioeventfd
>>>>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK ->  disable virtio-ioeventfd
>>>>>>>>>  * virtio_pci_set_host_notifier() ->  disable virtio-ioeventfd
>>>>>>>>>  * vm_change_state(running=0) ->  disable virtio-ioeventfd
>>>>>>>>>  * vm_change_state(running=1) ->  enable virtio-ioeventfd
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and
>>>>>>>> has
>>>>>>>> 100% CPU consumption.
>>>>>>>>
>>>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>
>>>>>>> Does it help if you set ioeventfd=off on command line?
>>>>>>>
>>>>>>
>>>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>
>>>>> Then it's the ioeventfd that is to blame.
>>>>> Is it the io thread that consumes 100% CPU?
>>>>> Or the vcpu thread?
>>>>>
>>>>
>>>> I was building with the default options, i.e. there is no IO thread.
>>>>
>>>> Now I'm just running the test with IO threads enabled, and so far
>>>> everything looks good. So I can only reproduce the problem with IO
>>>> threads disabled.
>>>>
>>>
>>> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>>> (relevant when --enable-io-thread is not used).  I will take a look at
>>> that again and see why we're spinning without checking for ioeventfd
>>> completion.
>>>
>>
>> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
>> please correct me.
>>
>> When I/O thread is disabled our only thread runs guest code until an
>> exit request is made.  There are synchronous exit cases like a halt
>> instruction or single step.  There are also asynchronous exit cases
>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>> to set env->exit_request = 1 and unlink the current tb.
>>
>
> Correct.
>
> Note that this is a problem today.  If you have a tight loop in TCG and you
> have nothing that would generate a signal (no pending disk I/O and no
> periodic timer) then the main loop is starved.

Even with KVM we can spin inside the guest and get a softlockup due to
the dynticks race condition shown above.  In a CPU-bound guest that's
doing no I/O, it's possible to go AWOL for extended periods of time.

I can think of two solutions:
1. Block SIGALRM during critical regions; not sure if the necessary
atomic signal mask capabilities are there in KVM.  Haven't looked at
TCG yet either.  (A rough sketch of what I mean is below.)
2. Make a portion of the timer code signal-safe and rearm the timer
from within the SIGALRM handler.
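
Roughly what I mean by (1), as a sketch only -- the helper names are made up,
and the open question above still applies: KVM_RUN or the TCG loop would have
to unblock the signal atomically, pselect-style:

#include <signal.h>

/* Keep SIGALRM blocked across the exit_request check so the handler cannot
 * fire between the check and reentering the guest. */
static void critical_enter(sigset_t *saved)
{
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &block, saved);   /* SIGALRM now held pending */
}

static void critical_leave(const sigset_t *saved)
{
    pthread_sigmask(SIG_SETMASK, saved, NULL);   /* pending SIGALRM delivered here */
}

/* usage: critical_enter(&saved); test exit_request; enter guest; critical_leave(&saved); */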

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 19:45                   ` Stefan Hajnoczi
@ 2011-01-25 19:51                     ` Anthony Liguori
  2011-01-25 19:59                       ` Stefan Hajnoczi
  0 siblings, 1 reply; 52+ messages in thread
From: Anthony Liguori @ 2011-01-25 19:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Michael S. Tsirkin,
	qemu-devel, Avi Kivity

On 01/25/2011 01:45 PM, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:18 PM, Anthony Liguori
> <aliguori@linux.vnet.ibm.com>  wrote:
>    
>> On 01/25/2011 03:49 AM, Stefan Hajnoczi wrote:
>>      
>>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi<stefanha@gmail.com>
>>>   wrote:
>>>
>>>        
>>>> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf<kwolf@redhat.com>    wrote:
>>>>
>>>>          
>>>>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>>>>
>>>>>            
>>>>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>>>>
>>>>>>              
>>>>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>>>>
>>>>>>>                
>>>>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>>>>
>>>>>>>>                  
>>>>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>>> Virtqueue notify is currently handled synchronously in userspace
>>>>>>>>>> virtio.  This
>>>>>>>>>> prevents the vcpu from executing guest code while hardware
>>>>>>>>>> emulation code
>>>>>>>>>> handles the notify.
>>>>>>>>>>
>>>>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to
>>>>>>>>>> make
>>>>>>>>>> virtqueue notify a lightweight exit by deferring hardware emulation
>>>>>>>>>> to the
>>>>>>>>>> iothread and allowing the VM to continue execution.  This model is
>>>>>>>>>> similar to
>>>>>>>>>> how vhost receives virtqueue notifies.
>>>>>>>>>>
>>>>>>>>>> The result of this change is improved performance for userspace
>>>>>>>>>> virtio devices.
>>>>>>>>>> Virtio-blk throughput increases especially for multithreaded
>>>>>>>>>> scenarios and
>>>>>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>>>>>
>>>>>>>>>> Some virtio devices are known to have guest drivers which expect a
>>>>>>>>>> notify to be
>>>>>>>>>> processed synchronously and spin waiting for completion.  Only
>>>>>>>>>> enable ioeventfd
>>>>>>>>>> for virtio-blk and virtio-net for now.
>>>>>>>>>>
>>>>>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>>>>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal
>>>>>>>>>> with
>>>>>>>>>> host notifiers as it wishes.
>>>>>>>>>>
>>>>>>>>>> After migration and on VM change state (running/paused)
>>>>>>>>>> virtio-ioeventfd
>>>>>>>>>> will enable/disable itself.
>>>>>>>>>>
>>>>>>>>>>   * VIRTIO_CONFIG_S_DRIVER_OK ->    enable virtio-ioeventfd
>>>>>>>>>>   * !VIRTIO_CONFIG_S_DRIVER_OK ->    disable virtio-ioeventfd
>>>>>>>>>>   * virtio_pci_set_host_notifier() ->    disable virtio-ioeventfd
>>>>>>>>>>   * vm_change_state(running=0) ->    disable virtio-ioeventfd
>>>>>>>>>>   * vm_change_state(running=1) ->    enable virtio-ioeventfd
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and
>>>>>>>>> has
>>>>>>>>> 100% CPU consumption.
>>>>>>>>>
>>>>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>> Does it help if you set ioeventfd=off on command line?
>>>>>>>>
>>>>>>>>                  
>>>>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>>                
>>>>>> Then it's the ioeventfd that is to blame.
>>>>>> Is it the io thread that consumes 100% CPU?
>>>>>> Or the vcpu thread?
>>>>>>
>>>>>>              
>>>>> I was building with the default options, i.e. there is no IO thread.
>>>>>
>>>>> Now I'm just running the test with IO threads enabled, and so far
>>>>> everything looks good. So I can only reproduce the problem with IO
>>>>> threads disabled.
>>>>>
>>>>>            
>>>> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>>>> (relevant when --enable-io-thread is not used).  I will take a look at
>>>> that again and see why we're spinning without checking for ioeventfd
>>>> completion.
>>>>
>>>>          
>>> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
>>> please correct me.
>>>
>>> When I/O thread is disabled our only thread runs guest code until an
>>> exit request is made.  There are synchronous exit cases like a halt
>>> instruction or single step.  There are also asynchronous exit cases
>>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>>> to set env->exit_request = 1 and unlink the current tb.
>>>
>>>        
>> Correct.
>>
>> Note that this is a problem today.  If you have a tight loop in TCG and you
>> have nothing that would generate a signal (no pending disk I/O and no
>> periodic timer) then the main loop is starved.
>>      
> Even with KVM we can spin inside the guest and get a softlockup due to
> the dynticks race condition shown above.  In a CPU bound guest that's
> doing no I/O it's possible to go AWOL for extended periods of time.
>    

This is a different race.  I need to look more deeply into the code.

> I can think of two solutions:
> 1. Block SIGALRM during critical regions, not sure if the necessary
> atomic signal mask capabilities are there in KVM.  Haven't looked at
> TCG yet either.
> 2. Make a portion of the timer code signal-safe and rearm the timer
> from within the SIGALRM handler.
>    

Or, switch to timerfd and stop using a signal-based alarm timer (see the 
sketch below).
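
Something like this, for reference (Linux-specific sketch; the interval is
arbitrary and error handling is omitted):

#include <stdint.h>
#include <stdio.h>
#include <sys/timerfd.h>
#include <unistd.h>

int main(void)
{
    /* A periodic timer that is consumed with read() from an event loop
     * instead of being delivered as SIGALRM. */
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    struct itimerspec its = {
        .it_value    = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },  /* first tick: 10ms */
        .it_interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 },  /* then every 10ms */
    };
    uint64_t expirations;

    timerfd_settime(tfd, 0, &its, NULL);

    /* In QEMU this fd would go into the main loop's select() set;
     * here we simply block on it. */
    while (read(tfd, &expirations, sizeof(expirations)) == sizeof(expirations)) {
        printf("timer fired %llu time(s)\n", (unsigned long long)expirations);
    }
    close(tfd);
    return 0;
}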

Regards,

Anthony Liguori



> Stefan
>    

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 19:51                     ` Anthony Liguori
@ 2011-01-25 19:59                       ` Stefan Hajnoczi
  2011-01-26  0:18                         ` Anthony Liguori
  0 siblings, 1 reply; 52+ messages in thread
From: Stefan Hajnoczi @ 2011-01-25 19:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Michael S. Tsirkin,
	qemu-devel, Avi Kivity

On Tue, Jan 25, 2011 at 7:51 PM, Anthony Liguori
<aliguori@linux.vnet.ibm.com> wrote:
> On 01/25/2011 01:45 PM, Stefan Hajnoczi wrote:
>>
>> On Tue, Jan 25, 2011 at 7:18 PM, Anthony Liguori
>> <aliguori@linux.vnet.ibm.com>  wrote:
>>
>>>
>>> On 01/25/2011 03:49 AM, Stefan Hajnoczi wrote:
>>>
>>>>
>>>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi<stefanha@gmail.com>
>>>>  wrote:
>>>>
>>>>
>>>>>
>>>>> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf<kwolf@redhat.com>    wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Virtqueue notify is currently handled synchronously in userspace
>>>>>>>>>>> virtio.  This
>>>>>>>>>>> prevents the vcpu from executing guest code while hardware
>>>>>>>>>>> emulation code
>>>>>>>>>>> handles the notify.
>>>>>>>>>>>
>>>>>>>>>>> On systems that support KVM, the ioeventfd mechanism can be used
>>>>>>>>>>> to
>>>>>>>>>>> make
>>>>>>>>>>> virtqueue notify a lightweight exit by deferring hardware
>>>>>>>>>>> emulation
>>>>>>>>>>> to the
>>>>>>>>>>> iothread and allowing the VM to continue execution.  This model
>>>>>>>>>>> is
>>>>>>>>>>> similar to
>>>>>>>>>>> how vhost receives virtqueue notifies.
>>>>>>>>>>>
>>>>>>>>>>> The result of this change is improved performance for userspace
>>>>>>>>>>> virtio devices.
>>>>>>>>>>> Virtio-blk throughput increases especially for multithreaded
>>>>>>>>>>> scenarios and
>>>>>>>>>>> virtio-net transmit throughput increases substantially.
>>>>>>>>>>>
>>>>>>>>>>> Some virtio devices are known to have guest drivers which expect
>>>>>>>>>>> a
>>>>>>>>>>> notify to be
>>>>>>>>>>> processed synchronously and spin waiting for completion.  Only
>>>>>>>>>>> enable ioeventfd
>>>>>>>>>>> for virtio-blk and virtio-net for now.
>>>>>>>>>>>
>>>>>>>>>>> Care must be taken not to interfere with vhost-net, which uses
>>>>>>>>>>> host
>>>>>>>>>>> notifiers.  If the set_host_notifier() API is used by a device
>>>>>>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal
>>>>>>>>>>> with
>>>>>>>>>>> host notifiers as it wishes.
>>>>>>>>>>>
>>>>>>>>>>> After migration and on VM change state (running/paused)
>>>>>>>>>>> virtio-ioeventfd
>>>>>>>>>>> will enable/disable itself.
>>>>>>>>>>>
>>>>>>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK ->    enable virtio-ioeventfd
>>>>>>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK ->    disable virtio-ioeventfd
>>>>>>>>>>>  * virtio_pci_set_host_notifier() ->    disable virtio-ioeventfd
>>>>>>>>>>>  * vm_change_state(running=0) ->    disable virtio-ioeventfd
>>>>>>>>>>>  * vm_change_state(running=1) ->    enable virtio-ioeventfd
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On current git master I'm getting hangs when running iozone on a
>>>>>>>>>> virtio-blk disk. "Hang" means that it's not responsive any more
>>>>>>>>>> and
>>>>>>>>>> has
>>>>>>>>>> 100% CPU consumption.
>>>>>>>>>>
>>>>>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does it help if you set ioeventfd=off on command line?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Then it's the ioeventfd that is to blame.
>>>>>>> Is it the io thread that consumes 100% CPU?
>>>>>>> Or the vcpu thread?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I was building with the default options, i.e. there is no IO thread.
>>>>>>
>>>>>> Now I'm just running the test with IO threads enabled, and so far
>>>>>> everything looks good. So I can only reproduce the problem with IO
>>>>>> threads disabled.
>>>>>>
>>>>>>
>>>>>
>>>>> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>>>>> (relevant when --enable-io-thread is not used).  I will take a look at
>>>>> that again and see why we're spinning without checking for ioeventfd
>>>>> completion.
>>>>>
>>>>>
>>>>
>>>> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
>>>> please correct me.
>>>>
>>>> When I/O thread is disabled our only thread runs guest code until an
>>>> exit request is made.  There are synchronous exit cases like a halt
>>>> instruction or single step.  There are also asynchronous exit cases
>>>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>>>> to set env->exit_request = 1 and unlink the current tb.
>>>>
>>>>
>>>
>>> Correct.
>>>
>>> Note that this is a problem today.  If you have a tight loop in TCG and
>>> you
>>> have nothing that would generate a signal (no pending disk I/O and no
>>> periodic timer) then the main loop is starved.
>>>
>>
>> Even with KVM we can spin inside the guest and get a softlockup due to
>> the dynticks race condition shown above.  In a CPU bound guest that's
>> doing no I/O it's possible to go AWOL for extended periods of time.
>>
>
> This is a different race.  I need to look more deeply into the code.

int kvm_cpu_exec(CPUState *env)
{
    struct kvm_run *run = env->kvm_run;
    int ret;

    DPRINTF("kvm_cpu_exec()\n");

    do {

This is broken because a signal handler could change env->exit_request
after this check:

#ifndef CONFIG_IOTHREAD
        if (env->exit_request) {
            DPRINTF("interrupt exit requested\n");
            ret = 0;
            break;
        }
#endif

        if (kvm_arch_process_irqchip_events(env)) {
            ret = 0;
            break;
        }

        if (env->kvm_vcpu_dirty) {
            kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
            env->kvm_vcpu_dirty = 0;
        }

        kvm_arch_pre_run(env, run);
        cpu_single_env = NULL;
        qemu_mutex_unlock_iothread();

env->exit_request might be set but we still reenter, possibly without
rearming the timer:
        ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);

>> I can think of two solutions:
>> 1. Block SIGALRM during critical regions, not sure if the necessary
>> atomic signal mask capabilities are there in KVM.  Haven't looked at
>> TCG yet either.
>> 2. Make a portion of the timer code signal-safe and rearm the timer
>> from within the SIGALRM handler.
>>
>
> Or, switch to timerfd and stop using a signal based alarm timer.

Doesn't work for !CONFIG_IOTHREAD.

Stefan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify
  2011-01-25 19:59                       ` Stefan Hajnoczi
@ 2011-01-26  0:18                         ` Anthony Liguori
  0 siblings, 0 replies; 52+ messages in thread
From: Anthony Liguori @ 2011-01-26  0:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Avi Kivity, Stefan Hajnoczi, Michael S. Tsirkin

On 01/25/2011 01:59 PM, Stefan Hajnoczi wrote:
> int kvm_cpu_exec(CPUState *env)
> {
>      struct kvm_run *run = env->kvm_run;
>      int ret;
>
>      DPRINTF("kvm_cpu_exec()\n");
>
>      do {
>
> This is broken because a signal handler could change env->exit_request
> after this check:
>
> #ifndef CONFIG_IOTHREAD
>          if (env->exit_request) {
>              DPRINTF("interrupt exit requested\n");
>              ret = 0;
>              break;
>          }
> #endif
>    

Yeah, this is the classic signal/select race, with ioctl(KVM_RUN) subbing 
in for select().  But this is supposed to be mitigated by the fact that we 
block SIG_IPI except for when we execute KVM_RUN, which means that we can 
reliably send SIG_IPI (a rough sketch of that pattern is below).

Of course, that doesn't help for SIGALRM unless we send a SIG_IPI from 
the SIGALRM handler, which we do with the I/O thread but not without it.  
At any rate, post stable-0.14 I want to enable the I/O thread by default, 
so I don't know that we really need to fix this...
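
The pattern, as I understand it, looks roughly like this (sketch only: error
handling is omitted, SIG_IPI and the vcpu fd are assumptions, and the len
field is the kernel's sigset size, 8 bytes on x86-64):

#include <linux/kvm.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

#define SIG_IPI SIGUSR1   /* assumption: whichever signal is used to kick vcpus */

static void setup_vcpu_signal_mask(int vcpu_fd)
{
    sigset_t block, run_mask;
    struct kvm_signal_mask *kmask;

    /* Keep SIG_IPI blocked while we run userspace code... */
    sigemptyset(&block);
    sigaddset(&block, SIG_IPI);
    pthread_sigmask(SIG_BLOCK, &block, NULL);

    /* ...but let it through while the kernel is executing KVM_RUN, so a
     * pending SIG_IPI makes ioctl(KVM_RUN) return -EINTR instead of being
     * lost in the check-then-run window. */
    pthread_sigmask(SIG_BLOCK, NULL, &run_mask);   /* fetch current mask */
    sigdelset(&run_mask, SIG_IPI);

    kmask = malloc(sizeof(*kmask) + sizeof(run_mask));
    kmask->len = 8;                                /* kernel sigset_t size (x86-64) */
    memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
    ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);
    free(kmask);
}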

>          if (kvm_arch_process_irqchip_events(env)) {
>              ret = 0;
>              break;
>          }
>
>          if (env->kvm_vcpu_dirty) {
>              kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
>              env->kvm_vcpu_dirty = 0;
>          }
>
>          kvm_arch_pre_run(env, run);
>          cpu_single_env = NULL;
>          qemu_mutex_unlock_iothread();
>
> env->exit_request might be set but we still reenter, possibly without
> rearming the timer:
>          ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
>
>    
>>> I can think of two solutions:
>>> 1. Block SIGALRM during critical regions, not sure if the necessary
>>> atomic signal mask capabilities are there in KVM.  Haven't looked at
>>> TCG yet either.
>>> 2. Make a portion of the timer code signal-safe and rearm the timer
>>> from within the SIGALRM handler.
>>>
>>>        
>> Or, switch to timerfd and stop using a signal based alarm timer.
>>      
> Doesn't work for !CONFIG_IOTHREAD.
>    

Yeah, we need to get rid of !CONFIG_IOTHREAD.  We need to run select() 
in parallel with TCG/KVM and interrupt the VCPUs appropriately when 
select() returns (roughly the shape sketched below).
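
i.e. roughly this shape, illustrative only: vcpu_thread, SIG_IPI, and
main_loop_fd are stand-ins, and the actual dispatch is omitted.

#include <pthread.h>
#include <signal.h>
#include <sys/select.h>

#define SIG_IPI SIGUSR1          /* assumption: the signal used to kick vcpus */

extern pthread_t vcpu_thread;    /* assumption: the thread sitting in KVM_RUN / TCG */
extern int main_loop_fd;         /* assumption: e.g. an ioeventfd, timerfd, or pipe */

static void *io_thread_fn(void *opaque)
{
    (void)opaque;
    for (;;) {
        fd_set rd;

        FD_ZERO(&rd);
        FD_SET(main_loop_fd, &rd);

        if (select(main_loop_fd + 1, &rd, NULL, NULL, NULL) > 0) {
            /* Kick the vcpu out of guest code so main-loop work can run... */
            pthread_kill(vcpu_thread, SIG_IPI);
            /* ...then handle the ready descriptor (dispatch omitted). */
        }
    }
    return NULL;
}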

Regards,

Anthony Liguori

> Stefan
>
>    

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2011-01-26  0:18 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-12 15:02 [Qemu-devel] [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 1/4] virtio-pci: Rename bugs field to flags Stefan Hajnoczi
2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 2/4] virtio-pci: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2011-01-24 18:54   ` Kevin Wolf
2011-01-24 19:36     ` Michael S. Tsirkin
2011-01-24 19:48       ` Kevin Wolf
2011-01-24 19:47         ` Michael S. Tsirkin
2011-01-24 20:05           ` Kevin Wolf
2011-01-25  7:12             ` Stefan Hajnoczi
2011-01-25  9:49               ` Stefan Hajnoczi
2011-01-25  9:54                 ` Stefan Hajnoczi
2011-01-25 11:27                 ` Michael S. Tsirkin
2011-01-25 13:20                   ` Stefan Hajnoczi
2011-01-25 14:07                     ` Stefan Hajnoczi
2011-01-25 19:18                 ` Anthony Liguori
2011-01-25 19:45                   ` Stefan Hajnoczi
2011-01-25 19:51                     ` Anthony Liguori
2011-01-25 19:59                       ` Stefan Hajnoczi
2011-01-26  0:18                         ` Anthony Liguori
2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 3/4] virtio-pci: Don't use ioeventfd on old kernels Stefan Hajnoczi
2010-12-12 15:02 ` [Qemu-devel] [PATCH v5 4/4] docs: Document virtio PCI -device ioeventfd=on|off Stefan Hajnoczi
2010-12-12 15:14 ` [Qemu-devel] Re: [PATCH v5 0/4] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2010-12-12 20:41 ` Michael S. Tsirkin
2010-12-12 20:42   ` Michael S. Tsirkin
2010-12-12 20:56     ` Michael S. Tsirkin
2010-12-12 21:09       ` Michael S. Tsirkin
2010-12-13 10:24         ` Stefan Hajnoczi
2010-12-13 10:38           ` Michael S. Tsirkin
2010-12-13 13:11             ` Stefan Hajnoczi
2010-12-13 13:35               ` Michael S. Tsirkin
2010-12-13 13:36                 ` Michael S. Tsirkin
2010-12-13 14:06                   ` Stefan Hajnoczi
2010-12-13 15:27                   ` Stefan Hajnoczi
2010-12-13 16:00                     ` Michael S. Tsirkin
2010-12-13 16:29                       ` Stefan Hajnoczi
2010-12-13 16:30                         ` Michael S. Tsirkin
2010-12-13 16:12                     ` Michael S. Tsirkin
2010-12-13 16:28                       ` Stefan Hajnoczi
2010-12-13 17:57                         ` Stefan Hajnoczi
2010-12-13 18:52                           ` Michael S. Tsirkin
2010-12-15 11:42                             ` Stefan Hajnoczi
2010-12-15 11:48                               ` Stefan Hajnoczi
2010-12-15 12:00                                 ` Michael S. Tsirkin
2010-12-15 12:14                               ` Michael S. Tsirkin
2010-12-15 12:59                                 ` Stefan Hajnoczi
2010-12-16 16:40                                   ` Stefan Hajnoczi
2010-12-16 23:39                                     ` Michael S. Tsirkin
2010-12-19 14:49                                   ` Michael S. Tsirkin
2011-01-06 16:41                                     ` Stefan Hajnoczi
2011-01-06 17:04                                       ` Michael S. Tsirkin
2011-01-06 18:00                                       ` Michael S. Tsirkin
2011-01-07  8:56                                         ` Stefan Hajnoczi
