All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default
@ 2020-03-20 10:30 Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2020-03-20 10:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	cohuck, Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

v3:
 * Add new performance results that demonstrate the scalability
 * Mention that this is PCI-specific [Cornelia]
v2:
 * Let the virtio-DEVICE-pci device select num-queues because the optimal
   multi-queue configuration may differ between virtio-pci, virtio-mmio, and
   virtio-ccw [Cornelia]

Enabling multi-queue on virtio-pci storage devices improves performance on SMP
guests because the completion interrupt is handled on the vCPU that submitted
the I/O request.  This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests.  They don't have IPIs.
2. Application threads might be scheduled on the sole vCPU that handles
   completion interrupts purely by chance.  (This is one reason why benchmark
   results can vary noticably between runs.)
3. Users may bind the application to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default on virtio-blk and
virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
for live migration compatibility.

Random read performance:
      IOPS
q=1    78k
q=32  104k  +33%

Boot time:
      Duration
q=1        51s
q=32     1m41s  +98%

Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks

Previously measured results on a 4 vCPU guest were also positive but showed a
smaller 1-4% performance improvement.  They are no longer valid because
significant event loop optimizations have been merged.

Stefan Hajnoczi (4):
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi: default num_queues to -smp N
  virtio-blk: default num_queues to -smp N
  vhost-user-blk: default num_queues to -smp N

 hw/block/vhost-user-blk.c          |  6 +++++-
 hw/block/virtio-blk.c              |  6 +++++-
 hw/core/machine.c                  |  5 +++++
 hw/scsi/vhost-scsi.c               |  3 ++-
 hw/scsi/vhost-user-scsi.c          |  5 +++--
 hw/scsi/virtio-scsi.c              | 13 +++++++++----
 hw/virtio/vhost-scsi-pci.c         | 10 ++++++++--
 hw/virtio/vhost-user-blk-pci.c     |  6 ++++++
 hw/virtio/vhost-user-scsi-pci.c    | 10 ++++++++--
 hw/virtio/virtio-blk-pci.c         |  9 ++++++++-
 hw/virtio/virtio-scsi-pci.c        | 10 ++++++++--
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h     |  2 ++
 include/hw/virtio/virtio-scsi.h    |  5 +++++
 14 files changed, 76 insertions(+), 16 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
@ 2020-03-20 10:30 ` Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2020-03-20 10:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	cohuck, Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

The event and control virtqueues are always present, regardless of the
multi-queue configuration.  Define a constant so that virtqueue number
calculations are easier to read.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 hw/scsi/vhost-user-scsi.c       | 2 +-
 hw/scsi/virtio-scsi.c           | 7 ++++---
 include/hw/virtio/virtio-scsi.h | 3 +++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a01bf63a08..e9752baa89 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -115,7 +115,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
         goto free_virtio;
     }
 
-    vsc->dev.nvqs = 2 + vs->conf.num_queues;
+    vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
     vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 472bbd233b..427ad83c50 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -191,7 +191,7 @@ static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
     VirtIOSCSIReq *req = sreq->hba_private;
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
     VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
-    uint32_t n = virtio_get_queue_index(req->vq) - 2;
+    uint32_t n = virtio_get_queue_index(req->vq) - VIRTIO_SCSI_VQ_NUM_FIXED;
 
     assert(n < vs->conf.num_queues);
     qemu_put_be32s(f, &n);
@@ -892,10 +892,11 @@ void virtio_scsi_common_realize(DeviceState *dev,
                 sizeof(VirtIOSCSIConfig));
 
     if (s->conf.num_queues == 0 ||
-            s->conf.num_queues > VIRTIO_QUEUE_MAX - 2) {
+            s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
         error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
                          "must be a positive integer less than %d.",
-                   s->conf.num_queues, VIRTIO_QUEUE_MAX - 2);
+                   s->conf.num_queues,
+                   VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
         virtio_cleanup(vdev);
         return;
     }
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 24e768909d..9f293bcb80 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -36,6 +36,9 @@
 #define VIRTIO_SCSI_MAX_TARGET  255
 #define VIRTIO_SCSI_MAX_LUN     16383
 
+/* Number of virtqueues that are always present */
+#define VIRTIO_SCSI_VQ_NUM_FIXED    2
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
@ 2020-03-20 10:30 ` Stefan Hajnoczi
  2020-03-20 10:58   ` Cornelia Huck
  2020-03-20 10:30 ` [PATCH RESEND v3 3/4] virtio-blk: " Stefan Hajnoczi
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2020-03-20 10:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	cohuck, Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
Other transports continue to default to 1 request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/core/machine.c               |  3 +++
 hw/scsi/vhost-scsi.c            |  3 ++-
 hw/scsi/vhost-user-scsi.c       |  3 ++-
 hw/scsi/virtio-scsi.c           |  6 +++++-
 hw/virtio/vhost-scsi-pci.c      | 10 ++++++++--
 hw/virtio/vhost-user-scsi-pci.c | 10 ++++++++--
 hw/virtio/virtio-scsi-pci.c     | 10 ++++++++--
 include/hw/virtio/virtio-scsi.h |  2 ++
 8 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9e8c06036f..4bbcec8fbd 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -33,8 +33,11 @@ GlobalProperty hw_compat_4_2[] = {
     { "virtio-scsi-device", "virtqueue_size", "128"},
     { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
     { "virtio-blk-device", "seg-max-adjust", "off"},
+    { "virtio-scsi-device", "num_queues", "1"},
     { "virtio-scsi-device", "seg_max_adjust", "off"},
     { "vhost-blk-device", "seg_max_adjust", "off"},
+    { "vhost-scsi", "num_queues", "1"},
+    { "vhost-user-scsi", "num_queues", "1"},
     { "usb-host", "suppress-remote-wake", "off" },
     { "usb-redir", "suppress-remote-wake", "off" },
     { "qxl", "revision", "4" },
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index f052377b7e..8cb7e3825f 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -272,7 +272,8 @@ static Property vhost_scsi_properties[] = {
     DEFINE_PROP_STRING("vhostfd", VirtIOSCSICommon, conf.vhostfd),
     DEFINE_PROP_STRING("wwpn", VirtIOSCSICommon, conf.wwpn),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
                        128),
     DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSICommon, conf.seg_max_adjust,
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index e9752baa89..f0a7e76280 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -163,7 +163,8 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
 static Property vhost_user_scsi_properties[] = {
     DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
                        128),
     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 427ad83c50..3bf97836d9 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -891,6 +891,9 @@ void virtio_scsi_common_realize(DeviceState *dev,
     virtio_init(vdev, "virtio-scsi", VIRTIO_ID_SCSI,
                 sizeof(VirtIOSCSIConfig));
 
+    if (s->conf.num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        s->conf.num_queues = 1;
+    }
     if (s->conf.num_queues == 0 ||
             s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
         error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
@@ -964,7 +967,8 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 }
 
 static Property virtio_scsi_properties[] = {
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI,
                                          parent_obj.conf.virtqueue_size, 256),
     DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSI,
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index 5dce640eaf..a0b7cdc1ac 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -17,6 +17,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/vhost-scsi.h"
 #include "qapi/error.h"
@@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/vhost-user-scsi-pci.c b/hw/virtio/vhost-user-scsi-pci.c
index 32febb2daa..2787123c8d 100644
--- a/hw/virtio/vhost-user-scsi-pci.c
+++ b/hw/virtio/vhost-user-scsi-pci.c
@@ -18,6 +18,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/vhost-user-scsi.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-scsi.h"
@@ -53,10 +54,15 @@ static void vhost_user_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostUserSCSIPCI *dev = VHOST_USER_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/virtio-scsi-pci.c b/hw/virtio/virtio-scsi-pci.c
index e82e7e5680..45618ed2d1 100644
--- a/hw/virtio/virtio-scsi-pci.c
+++ b/hw/virtio/virtio-scsi-pci.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "qemu/module.h"
@@ -46,12 +47,17 @@ static void virtio_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VirtIOSCSIPCI *dev = VIRTIO_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     DeviceState *proxy = DEVICE(vpci_dev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.conf;
     char *bus_name;
 
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
+
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     /*
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 9f293bcb80..c0b8e4dd7e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -39,6 +39,8 @@
 /* Number of virtqueues that are always present */
 #define VIRTIO_SCSI_VQ_NUM_FIXED    2
 
+#define VIRTIO_SCSI_AUTO_NUM_QUEUES UINT32_MAX
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RESEND v3 3/4] virtio-blk: default num_queues to -smp N
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
@ 2020-03-20 10:30 ` Stefan Hajnoczi
  2020-03-20 10:30 ` [PATCH RESEND v3 4/4] vhost-user-blk: " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2020-03-20 10:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	cohuck, Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

Automatically size the number of virtio-blk-pci request virtqueues to
match the number of vCPUs.  Other transports continue to default to 1
request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.

Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
with NVMe storage).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 hw/block/virtio-blk.c          | 6 +++++-
 hw/core/machine.c              | 1 +
 hw/virtio/virtio-blk-pci.c     | 9 ++++++++-
 include/hw/virtio/virtio-blk.h | 2 ++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 142863a3b2..ed5df71779 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1135,6 +1135,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Device needs media, but drive is empty");
         return;
     }
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = 1;
+    }
     if (!conf->num_queues) {
         error_setg(errp, "num-queues property must be larger than 0");
         return;
@@ -1271,7 +1274,8 @@ static Property virtio_blk_properties[] = {
 #endif
     DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                     true),
-    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+                       VIRTIO_BLK_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
     DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
     DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4bbcec8fbd..c993b8f489 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 GlobalProperty hw_compat_4_2[] = {
     { "virtio-blk-device", "queue-size", "128"},
     { "virtio-scsi-device", "virtqueue_size", "128"},
+    { "virtio-blk-device", "num-queues", "1"},
     { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
     { "virtio-blk-device", "seg-max-adjust", "off"},
     { "virtio-scsi-device", "num_queues", "1"},
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index efb2c22a1d..bf628de675 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -17,6 +17,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-pci.h"
@@ -50,9 +51,15 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIOBlkConf *conf = &dev->vdev.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
+        vpci_dev->nvectors = conf->num_queues + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 1e62f869b2..4e5e903f4a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -30,6 +30,8 @@ struct virtio_blk_inhdr
     unsigned char status;
 };
 
+#define VIRTIO_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 struct VirtIOBlkConf
 {
     BlockConf conf;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RESEND v3 4/4] vhost-user-blk: default num_queues to -smp N
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2020-03-20 10:30 ` [PATCH RESEND v3 3/4] virtio-blk: " Stefan Hajnoczi
@ 2020-03-20 10:30 ` Stefan Hajnoczi
  2020-03-29 13:49 ` [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Michael S. Tsirkin
  2020-05-04 14:11 ` Michael S. Tsirkin
  5 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2020-03-20 10:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	cohuck, Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 hw/block/vhost-user-blk.c          | 6 +++++-
 hw/core/machine.c                  | 1 +
 hw/virtio/vhost-user-blk-pci.c     | 6 ++++++
 include/hw/virtio/vhost-user-blk.h | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 12925a47ec..5c275af935 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -403,6 +403,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+        s->num_queues = 1;
+    }
     if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
         error_setg(errp, "vhost-user-blk: invalid number of IO queues");
         return;
@@ -511,7 +514,8 @@ static const VMStateDescription vmstate_vhost_user_blk = {
 
 static Property vhost_user_blk_properties[] = {
     DEFINE_PROP_CHR("chardev", VHostUserBlk, chardev),
-    DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, 1),
+    DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
+                       VHOST_USER_BLK_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
     DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c993b8f489..13d00abdc4 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -38,6 +38,7 @@ GlobalProperty hw_compat_4_2[] = {
     { "virtio-scsi-device", "seg_max_adjust", "off"},
     { "vhost-blk-device", "seg_max_adjust", "off"},
     { "vhost-scsi", "num_queues", "1"},
+    { "vhost-user-blk", "num-queues", "1"},
     { "vhost-user-scsi", "num_queues", "1"},
     { "usb-host", "suppress-remote-wake", "off" },
     { "usb-redir", "suppress-remote-wake", "off" },
diff --git a/hw/virtio/vhost-user-blk-pci.c b/hw/virtio/vhost-user-blk-pci.c
index 8d3d766427..846fec83ac 100644
--- a/hw/virtio/vhost-user-blk-pci.c
+++ b/hw/virtio/vhost-user-blk-pci.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/pci/pci.h"
@@ -54,6 +55,11 @@ static void vhost_user_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
 
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (dev->vdev.num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+        dev->vdev.num_queues = current_machine->smp.cpus;
+    }
+
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
         vpci_dev->nvectors = dev->vdev.num_queues + 1;
     }
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 05ea0ad183..c28027c7c8 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -25,6 +25,8 @@
 #define VHOST_USER_BLK(obj) \
         OBJECT_CHECK(VHostUserBlk, (obj), TYPE_VHOST_USER_BLK)
 
+#define VHOST_USER_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 typedef struct VHostUserBlk {
     VirtIODevice parent_obj;
     CharBackend chardev;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N
  2020-03-20 10:30 ` [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
@ 2020-03-20 10:58   ` Cornelia Huck
  0 siblings, 0 replies; 9+ messages in thread
From: Cornelia Huck @ 2020-03-20 10:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, qemu-block

On Fri, 20 Mar 2020 10:30:39 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
> vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
> Other transports continue to default to 1 request virtqueue.
> 
> A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
> handled on the same vCPU that submitted the request.  No IPI is
> necessary to complete an I/O request and performance is improved.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/core/machine.c               |  3 +++
>  hw/scsi/vhost-scsi.c            |  3 ++-
>  hw/scsi/vhost-user-scsi.c       |  3 ++-
>  hw/scsi/virtio-scsi.c           |  6 +++++-
>  hw/virtio/vhost-scsi-pci.c      | 10 ++++++++--
>  hw/virtio/vhost-user-scsi-pci.c | 10 ++++++++--
>  hw/virtio/virtio-scsi-pci.c     | 10 ++++++++--
>  include/hw/virtio/virtio-scsi.h |  2 ++
>  8 files changed, 38 insertions(+), 9 deletions(-)

(...)

> diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
> index 5dce640eaf..a0b7cdc1ac 100644
> --- a/hw/virtio/vhost-scsi-pci.c
> +++ b/hw/virtio/vhost-scsi-pci.c
> @@ -17,6 +17,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "standard-headers/linux/virtio_pci.h"
> +#include "hw/boards.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/vhost-scsi.h"
>  #include "qapi/error.h"
> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  {
>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
>      DeviceState *vdev = DEVICE(&dev->vdev);
> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> +
> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> +        conf->num_queues = current_machine->smp.cpus;

I don't recall the discussion from previous versions of this patch
set... do we need to bound this by the maximum number of virtqueues? It
seems unlikely that something will break from my reading of the code,
but it might still be nicer.

> +    }
>  
>      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -        vpci_dev->nvectors = vs->conf.num_queues + 3;
> +        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;

You might already do the resolving of 3 into NUM_FIXED + 1 in the
previous patch; but as you touch this line anyway, I'd just keep this
if you don't need to respin.

>      }
>  
>      qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));

(Same comments apply to the two other cases below.)



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2020-03-20 10:30 ` [PATCH RESEND v3 4/4] vhost-user-blk: " Stefan Hajnoczi
@ 2020-03-29 13:49 ` Michael S. Tsirkin
  2020-03-30 14:35   ` Pankaj Gupta
  2020-05-04 14:11 ` Michael S. Tsirkin
  5 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2020-03-29 13:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, cohuck, qemu-devel,
	Max Reitz, Paolo Bonzini, qemu-block

On Fri, Mar 20, 2020 at 10:30:37AM +0000, Stefan Hajnoczi wrote:
> v3:
>  * Add new performance results that demonstrate the scalability
>  * Mention that this is PCI-specific [Cornelia]
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio, and
>    virtio-ccw [Cornelia]


I'd like to queue it for merge after the release. If possible
please ping me after the release to help make sure it didn't get
dropped.

Thanks!


> Enabling multi-queue on virtio-pci storage devices improves performance on SMP
> guests because the completion interrupt is handled on the vCPU that submitted
> the I/O request.  This avoids IPIs inside the guest.
> 
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests.  They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance.  (This is one reason why benchmark
>    results can vary noticably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
> 
> Set the number of queues to the number of vCPUs by default on virtio-blk and
> virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
> for live migration compatibility.
> 
> Random read performance:
>       IOPS
> q=1    78k
> q=32  104k  +33%
> 
> Boot time:
>       Duration
> q=1        51s
> q=32     1m41s  +98%
> 
> Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> 
> Previously measured results on a 4 vCPU guest were also positive but showed a
> smaller 1-4% performance improvement.  They are no longer valid because
> significant event loop optimizations have been merged.
> 
> Stefan Hajnoczi (4):
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N
> 
>  hw/block/vhost-user-blk.c          |  6 +++++-
>  hw/block/virtio-blk.c              |  6 +++++-
>  hw/core/machine.c                  |  5 +++++
>  hw/scsi/vhost-scsi.c               |  3 ++-
>  hw/scsi/vhost-user-scsi.c          |  5 +++--
>  hw/scsi/virtio-scsi.c              | 13 +++++++++----
>  hw/virtio/vhost-scsi-pci.c         | 10 ++++++++--
>  hw/virtio/vhost-user-blk-pci.c     |  6 ++++++
>  hw/virtio/vhost-user-scsi-pci.c    | 10 ++++++++--
>  hw/virtio/virtio-blk-pci.c         |  9 ++++++++-
>  hw/virtio/virtio-scsi-pci.c        | 10 ++++++++--
>  include/hw/virtio/vhost-user-blk.h |  2 ++
>  include/hw/virtio/virtio-blk.h     |  2 ++
>  include/hw/virtio/virtio-scsi.h    |  5 +++++
>  14 files changed, 76 insertions(+), 16 deletions(-)
> 
> -- 
> 2.24.1
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default
  2020-03-29 13:49 ` [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Michael S. Tsirkin
@ 2020-03-30 14:35   ` Pankaj Gupta
  0 siblings, 0 replies; 9+ messages in thread
From: Pankaj Gupta @ 2020-03-30 14:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, cohuck, qemu-devel,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, qemu-block

For best case its really a good idea to configure default number of
queues to the number of CPU's.

For the series:
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default
  2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2020-03-29 13:49 ` [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Michael S. Tsirkin
@ 2020-05-04 14:11 ` Michael S. Tsirkin
  5 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2020-05-04 14:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, Eduardo Habkost, slp, cohuck, qemu-devel,
	Max Reitz, Paolo Bonzini, qemu-block

On Fri, Mar 20, 2020 at 10:30:37AM +0000, Stefan Hajnoczi wrote:
> v3:
>  * Add new performance results that demonstrate the scalability
>  * Mention that this is PCI-specific [Cornelia]
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio, and
>    virtio-ccw [Cornelia]


So this needs to be rebased wrt compat properties. I also see some
comments from Cornelia, worth addressing.

> Enabling multi-queue on virtio-pci storage devices improves performance on SMP
> guests because the completion interrupt is handled on the vCPU that submitted
> the I/O request.  This avoids IPIs inside the guest.
> 
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests.  They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance.  (This is one reason why benchmark
>    results can vary noticably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
> 
> Set the number of queues to the number of vCPUs by default on virtio-blk and
> virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
> for live migration compatibility.
> 
> Random read performance:
>       IOPS
> q=1    78k
> q=32  104k  +33%
> 
> Boot time:
>       Duration
> q=1        51s
> q=32     1m41s  +98%
> 
> Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> 
> Previously measured results on a 4 vCPU guest were also positive but showed a
> smaller 1-4% performance improvement.  They are no longer valid because
> significant event loop optimizations have been merged.
> 
> Stefan Hajnoczi (4):
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N
> 
>  hw/block/vhost-user-blk.c          |  6 +++++-
>  hw/block/virtio-blk.c              |  6 +++++-
>  hw/core/machine.c                  |  5 +++++
>  hw/scsi/vhost-scsi.c               |  3 ++-
>  hw/scsi/vhost-user-scsi.c          |  5 +++--
>  hw/scsi/virtio-scsi.c              | 13 +++++++++----
>  hw/virtio/vhost-scsi-pci.c         | 10 ++++++++--
>  hw/virtio/vhost-user-blk-pci.c     |  6 ++++++
>  hw/virtio/vhost-user-scsi-pci.c    | 10 ++++++++--
>  hw/virtio/virtio-blk-pci.c         |  9 ++++++++-
>  hw/virtio/virtio-scsi-pci.c        | 10 ++++++++--
>  include/hw/virtio/vhost-user-blk.h |  2 ++
>  include/hw/virtio/virtio-blk.h     |  2 ++
>  include/hw/virtio/virtio-scsi.h    |  5 +++++
>  14 files changed, 76 insertions(+), 16 deletions(-)
> 
> -- 
> 2.24.1
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-05-04 14:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-20 10:30 [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
2020-03-20 10:30 ` [PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
2020-03-20 10:30 ` [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
2020-03-20 10:58   ` Cornelia Huck
2020-03-20 10:30 ` [PATCH RESEND v3 3/4] virtio-blk: " Stefan Hajnoczi
2020-03-20 10:30 ` [PATCH RESEND v3 4/4] vhost-user-blk: " Stefan Hajnoczi
2020-03-29 13:49 ` [PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default Michael S. Tsirkin
2020-03-30 14:35   ` Pankaj Gupta
2020-05-04 14:11 ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.