* [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default
@ 2020-01-24 10:01 Stefan Hajnoczi
  2020-01-24 10:01 ` [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-24 10:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz

v2:
 * Let the virtio-DEVICE-pci device select num-queues because the optimal
   multi-queue configuration may differ between virtio-pci, virtio-mmio, and
   virtio-ccw [Cornelia]

Enabling multi-queue on virtio-pci storage devices improves performance on SMP
guests because the completion interrupt is handled on the vCPU that submitted
the I/O request.  This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests.  They don't have IPIs.
2. Application threads might be scheduled on the sole vCPU that handles
   completion interrupts purely by chance.  (This is one reason why benchmark
   results can vary noticeably between runs.)
3. Users may bind the application to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default.  Older machine
types continue to default to 1 queue for live migration compatibility.

This patch series improves IOPS by 1-4% on an Intel Optane SSD with 4 vCPUs,
-drive aio=native, and fio bs=4k direct=1 rw=randread.
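
In outline (condensed from the patches below, using virtio-blk as the
example; not a literal hunk), the num-queues property defaults to a
sentinel value, the PCI proxy resolves it to the vCPU count at realize
time, and any transport that does not resolve it falls back to the old
default of 1:

    /* property default is a sentinel instead of 1 */
    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
                       VIRTIO_BLK_AUTO_NUM_QUEUES),

    /* virtio-blk-pci realize: 1:1 vq to vcpu mapping avoids IPIs */
    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
        conf->num_queues = current_machine->smp.cpus;
    }

    /* virtio-blk-device realize: other transports keep 1 queue */
    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
        conf->num_queues = 1;
    }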

Stefan Hajnoczi (4):
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi: default num_queues to -smp N
  virtio-blk: default num_queues to -smp N
  vhost-user-blk: default num_queues to -smp N

 hw/block/vhost-user-blk.c          |  6 +++++-
 hw/block/virtio-blk.c              |  6 +++++-
 hw/core/machine.c                  |  5 +++++
 hw/scsi/vhost-scsi.c               |  3 ++-
 hw/scsi/vhost-user-scsi.c          |  5 +++--
 hw/scsi/virtio-scsi.c              | 13 +++++++++----
 hw/virtio/vhost-scsi-pci.c         | 10 ++++++++--
 hw/virtio/vhost-user-blk-pci.c     |  6 ++++++
 hw/virtio/vhost-user-scsi-pci.c    | 10 ++++++++--
 hw/virtio/virtio-blk-pci.c         |  9 ++++++++-
 hw/virtio/virtio-scsi-pci.c        | 10 ++++++++--
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h     |  2 ++
 include/hw/virtio/virtio-scsi.h    |  5 +++++
 14 files changed, 76 insertions(+), 16 deletions(-)

-- 
2.24.1




* [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues
  2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
@ 2020-01-24 10:01 ` Stefan Hajnoczi
  2020-01-27 12:59   ` Cornelia Huck
  2020-01-24 10:01 ` [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-24 10:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz

The event and control virtqueues are always present, regardless of the
multi-queue configuration.  Define a constant so that virtqueue number
calculations are easier to read.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/vhost-user-scsi.c       | 2 +-
 hw/scsi/virtio-scsi.c           | 7 ++++---
 include/hw/virtio/virtio-scsi.h | 3 +++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 23f972df59..eb37733bd0 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -115,7 +115,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
         goto free_virtio;
     }
 
-    vsc->dev.nvqs = 2 + vs->conf.num_queues;
+    vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
     vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index d3af42ef92..224a290498 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -191,7 +191,7 @@ static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
     VirtIOSCSIReq *req = sreq->hba_private;
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
     VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
-    uint32_t n = virtio_get_queue_index(req->vq) - 2;
+    uint32_t n = virtio_get_queue_index(req->vq) - VIRTIO_SCSI_VQ_NUM_FIXED;
 
     assert(n < vs->conf.num_queues);
     qemu_put_be32s(f, &n);
@@ -892,10 +892,11 @@ void virtio_scsi_common_realize(DeviceState *dev,
                 sizeof(VirtIOSCSIConfig));
 
     if (s->conf.num_queues == 0 ||
-            s->conf.num_queues > VIRTIO_QUEUE_MAX - 2) {
+            s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
         error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
                          "must be a positive integer less than %d.",
-                   s->conf.num_queues, VIRTIO_QUEUE_MAX - 2);
+                   s->conf.num_queues,
+                   VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
         virtio_cleanup(vdev);
         return;
     }
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 24e768909d..9f293bcb80 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -36,6 +36,9 @@
 #define VIRTIO_SCSI_MAX_TARGET  255
 #define VIRTIO_SCSI_MAX_LUN     16383
 
+/* Number of virtqueues that are always present */
+#define VIRTIO_SCSI_VQ_NUM_FIXED    2
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
-- 
2.24.1




* [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
  2020-01-24 10:01 ` [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
@ 2020-01-24 10:01 ` Stefan Hajnoczi
  2020-01-27 13:10   ` Cornelia Huck
  2020-01-24 10:01 ` [PATCH v2 3/4] virtio-blk: " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-24 10:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz

Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/core/machine.c               |  3 +++
 hw/scsi/vhost-scsi.c            |  3 ++-
 hw/scsi/vhost-user-scsi.c       |  3 ++-
 hw/scsi/virtio-scsi.c           |  6 +++++-
 hw/virtio/vhost-scsi-pci.c      | 10 ++++++++--
 hw/virtio/vhost-user-scsi-pci.c | 10 ++++++++--
 hw/virtio/virtio-scsi-pci.c     | 10 ++++++++--
 include/hw/virtio/virtio-scsi.h |  2 ++
 8 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 3e288bfceb..d6e2370c77 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -30,8 +30,11 @@
 GlobalProperty hw_compat_4_2[] = {
     { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
     { "virtio-blk-device", "seg-max-adjust", "off"},
+    { "virtio-scsi-device", "num_queues", "1"},
     { "virtio-scsi-device", "seg_max_adjust", "off"},
     { "vhost-blk-device", "seg_max_adjust", "off"},
+    { "vhost-scsi", "num_queues", "1"},
+    { "vhost-user-scsi", "num_queues", "1"},
     { "usb-host", "suppress-remote-wake", "off" },
     { "usb-redir", "suppress-remote-wake", "off" },
 };
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 26f710d3ec..80fe5d999a 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -272,7 +272,8 @@ static Property vhost_scsi_properties[] = {
     DEFINE_PROP_STRING("vhostfd", VirtIOSCSICommon, conf.vhostfd),
     DEFINE_PROP_STRING("wwpn", VirtIOSCSICommon, conf.wwpn),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
                        128),
     DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSICommon, conf.seg_max_adjust,
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index eb37733bd0..655d300875 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -163,7 +163,8 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
 static Property vhost_user_scsi_properties[] = {
     DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
     DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
                        128),
     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 224a290498..c9342004ef 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -891,6 +891,9 @@ void virtio_scsi_common_realize(DeviceState *dev,
     virtio_init(vdev, "virtio-scsi", VIRTIO_ID_SCSI,
                 sizeof(VirtIOSCSIConfig));
 
+    if (s->conf.num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        s->conf.num_queues = 1;
+    }
     if (s->conf.num_queues == 0 ||
             s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
         error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
@@ -964,7 +967,8 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 }
 
 static Property virtio_scsi_properties[] = {
-    DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1),
+    DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues,
+                       VIRTIO_SCSI_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI,
                                          parent_obj.conf.virtqueue_size, 128),
     DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSI,
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index e8dfbfc60f..38a8f0c3ef 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -17,6 +17,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/vhost-scsi.h"
 #include "qapi/error.h"
@@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/vhost-user-scsi-pci.c b/hw/virtio/vhost-user-scsi-pci.c
index ff13af7030..0cad29eb67 100644
--- a/hw/virtio/vhost-user-scsi-pci.c
+++ b/hw/virtio/vhost-user-scsi-pci.c
@@ -18,6 +18,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/vhost-user-scsi.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-scsi.h"
@@ -53,10 +54,15 @@ static void vhost_user_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VHostUserSCSIPCI *dev = VHOST_USER_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/hw/virtio/virtio-scsi-pci.c b/hw/virtio/virtio-scsi-pci.c
index 3c55dc19a1..b22c8b79e2 100644
--- a/hw/virtio/virtio-scsi-pci.c
+++ b/hw/virtio/virtio-scsi-pci.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "qemu/module.h"
@@ -46,12 +47,17 @@ static void virtio_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VirtIOSCSIPCI *dev = VIRTIO_SCSI_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
-    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     DeviceState *proxy = DEVICE(vpci_dev);
+    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.conf;
     char *bus_name;
 
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
+
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = vs->conf.num_queues + 3;
+        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
     }
 
     /*
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 9f293bcb80..c0b8e4dd7e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -39,6 +39,8 @@
 /* Number of virtqueues that are always present */
 #define VIRTIO_SCSI_VQ_NUM_FIXED    2
 
+#define VIRTIO_SCSI_AUTO_NUM_QUEUES UINT32_MAX
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
-- 
2.24.1




* [PATCH v2 3/4] virtio-blk: default num_queues to -smp N
  2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
  2020-01-24 10:01 ` [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
  2020-01-24 10:01 ` [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
@ 2020-01-24 10:01 ` Stefan Hajnoczi
  2020-01-27 13:14   ` Cornelia Huck
  2020-01-24 10:01 ` [PATCH v2 4/4] vhost-user-blk: " Stefan Hajnoczi
  2020-01-27  9:59 ` [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefano Garzarella
  4 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-24 10:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz

Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/virtio-blk.c          | 6 +++++-
 hw/core/machine.c              | 1 +
 hw/virtio/virtio-blk-pci.c     | 9 ++++++++-
 include/hw/virtio/virtio-blk.h | 2 ++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9bee514c4e..d3ffaffc93 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1135,6 +1135,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "Device needs media, but drive is empty");
         return;
     }
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = 1;
+    }
     if (!conf->num_queues) {
         error_setg(errp, "num-queues property must be larger than 0");
         return;
@@ -1271,7 +1274,8 @@ static Property virtio_blk_properties[] = {
 #endif
     DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
                     true),
-    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+                       VIRTIO_BLK_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128),
     DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
     DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d6e2370c77..de6ceaa97f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,7 @@
 #include "hw/mem/nvdimm.h"
 
 GlobalProperty hw_compat_4_2[] = {
+    { "virtio-blk-device", "num-queues", "1"},
     { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
     { "virtio-blk-device", "seg-max-adjust", "off"},
     { "virtio-scsi-device", "num_queues", "1"},
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index d9b69a5af3..7e6d863963 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -17,6 +17,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-pci.h"
@@ -50,9 +51,15 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
     VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIOBlkConf *conf = &dev->vdev.conf;
+
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+        conf->num_queues = current_machine->smp.cpus;
+    }
 
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-        vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
+        vpci_dev->nvectors = conf->num_queues + 1;
     }
 
     qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 1e62f869b2..4e5e903f4a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -30,6 +30,8 @@ struct virtio_blk_inhdr
     unsigned char status;
 };
 
+#define VIRTIO_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 struct VirtIOBlkConf
 {
     BlockConf conf;
-- 
2.24.1




* [PATCH v2 4/4] vhost-user-blk: default num_queues to -smp N
  2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2020-01-24 10:01 ` [PATCH v2 3/4] virtio-blk: " Stefan Hajnoczi
@ 2020-01-24 10:01 ` Stefan Hajnoczi
  2020-01-27 13:17   ` Cornelia Huck
  2020-01-27  9:59 ` [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefano Garzarella
  4 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-24 10:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, Stefan Hajnoczi, Paolo Bonzini,
	Max Reitz

Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/vhost-user-blk.c          | 6 +++++-
 hw/core/machine.c                  | 1 +
 hw/virtio/vhost-user-blk-pci.c     | 6 ++++++
 include/hw/virtio/vhost-user-blk.h | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 98b383f90e..2ee26a434c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -403,6 +403,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+        s->num_queues = 1;
+    }
     if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
         error_setg(errp, "vhost-user-blk: invalid number of IO queues");
         return;
@@ -500,7 +503,8 @@ static const VMStateDescription vmstate_vhost_user_blk = {
 
 static Property vhost_user_blk_properties[] = {
     DEFINE_PROP_CHR("chardev", VHostUserBlk, chardev),
-    DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, 1),
+    DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
+                       VHOST_USER_BLK_AUTO_NUM_QUEUES),
     DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
     DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/core/machine.c b/hw/core/machine.c
index de6ceaa97f..d4c67f4d6e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -35,6 +35,7 @@ GlobalProperty hw_compat_4_2[] = {
     { "virtio-scsi-device", "seg_max_adjust", "off"},
     { "vhost-blk-device", "seg_max_adjust", "off"},
     { "vhost-scsi", "num_queues", "1"},
+    { "vhost-user-blk", "num-queues", "1"},
     { "vhost-user-scsi", "num_queues", "1"},
     { "usb-host", "suppress-remote-wake", "off" },
     { "usb-redir", "suppress-remote-wake", "off" },
diff --git a/hw/virtio/vhost-user-blk-pci.c b/hw/virtio/vhost-user-blk-pci.c
index 1dc834a3ff..cf72b21c16 100644
--- a/hw/virtio/vhost-user-blk-pci.c
+++ b/hw/virtio/vhost-user-blk-pci.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/pci/pci.h"
@@ -54,6 +55,11 @@ static void vhost_user_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(vpci_dev);
     DeviceState *vdev = DEVICE(&dev->vdev);
 
+    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+    if (dev->vdev.num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+        dev->vdev.num_queues = current_machine->smp.cpus;
+    }
+
     if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
         vpci_dev->nvectors = dev->vdev.num_queues + 1;
     }
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 108bfadeeb..5a353dc1c6 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -25,6 +25,8 @@
 #define VHOST_USER_BLK(obj) \
         OBJECT_CHECK(VHostUserBlk, (obj), TYPE_VHOST_USER_BLK)
 
+#define VHOST_USER_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 typedef struct VHostUserBlk {
     VirtIODevice parent_obj;
     CharBackend chardev;
-- 
2.24.1




* Re: [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default
  2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2020-01-24 10:01 ` [PATCH v2 4/4] vhost-user-blk: " Stefan Hajnoczi
@ 2020-01-27  9:59 ` Stefano Garzarella
  4 siblings, 0 replies; 24+ messages in thread
From: Stefano Garzarella @ 2020-01-27  9:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, cohuck, qemu-devel, Max Reitz, Paolo Bonzini

On Fri, Jan 24, 2020 at 10:01:55AM +0000, Stefan Hajnoczi wrote:
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio, and
>    virtio-ccw [Cornelia]
> 
> Enabling multi-queue on virtio-pci storage devices improves performance on SMP
> guests because the completion interrupt is handled on the vCPU that submitted
> the I/O request.  This avoids IPIs inside the guest.
> 
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests.  They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance.  (This is one reason why benchmark
>    results can vary noticeably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
> 
> Set the number of queues to the number of vCPUs by default.  Older machine
> types continue to default to 1 queue for live migration compatibility.
> 
> This patch series improves IOPS by 1-4% on an Intel Optane SSD with 4 vCPUs,
> -drive aio=native, and fio bs=4k direct=1 rw=randread.
> 
> Stefan Hajnoczi (4):
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N

The series looks good to me:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

Thanks,
Stefano




* Re: [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues
  2020-01-24 10:01 ` [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
@ 2020-01-27 12:59   ` Cornelia Huck
  0 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-01-27 12:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Paolo Bonzini

On Fri, 24 Jan 2020 10:01:56 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> The event and control virtqueues are always present, regardless of the
> multi-queue configuration.  Define a constant so that virtqueue number
> calculations are easier to read.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/scsi/vhost-user-scsi.c       | 2 +-
>  hw/scsi/virtio-scsi.c           | 7 ++++---
>  include/hw/virtio/virtio-scsi.h | 3 +++
>  3 files changed, 8 insertions(+), 4 deletions(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-24 10:01 ` [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
@ 2020-01-27 13:10   ` Cornelia Huck
  2020-01-29 15:44     ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2020-01-27 13:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Paolo Bonzini

On Fri, 24 Jan 2020 10:01:57 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Automatically size the number of request virtqueues to match the number

"If the pci transport is used, ..." ?

> of vCPUs.  This ensures that completion interrupts are handled on the
> same vCPU that submitted the request.  No IPI is necessary to complete
> an I/O request and performance is improved.

"For other transports, the number of request queues continues to
default to 1." ?

> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/core/machine.c               |  3 +++
>  hw/scsi/vhost-scsi.c            |  3 ++-
>  hw/scsi/vhost-user-scsi.c       |  3 ++-
>  hw/scsi/virtio-scsi.c           |  6 +++++-
>  hw/virtio/vhost-scsi-pci.c      | 10 ++++++++--
>  hw/virtio/vhost-user-scsi-pci.c | 10 ++++++++--
>  hw/virtio/virtio-scsi-pci.c     | 10 ++++++++--
>  include/hw/virtio/virtio-scsi.h |  2 ++
>  8 files changed, 38 insertions(+), 9 deletions(-)
> 
(...)
> diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
> index e8dfbfc60f..38a8f0c3ef 100644
> --- a/hw/virtio/vhost-scsi-pci.c
> +++ b/hw/virtio/vhost-scsi-pci.c
> @@ -17,6 +17,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "standard-headers/linux/virtio_pci.h"
> +#include "hw/boards.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/vhost-scsi.h"
>  #include "qapi/error.h"
> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>  {
>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
>      DeviceState *vdev = DEVICE(&dev->vdev);
> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> +
> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> +        conf->num_queues = current_machine->smp.cpus;

This now maps the request vqs 1:1 to the vcpus. What about the fixed
vqs? If they don't really matter, amend the comment to explain that?

> +    }
>  
>      if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -        vpci_dev->nvectors = vs->conf.num_queues + 3;
> +        vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;
>      }
>  
>      qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));




* Re: [PATCH v2 3/4] virtio-blk: default num_queues to -smp N
  2020-01-24 10:01 ` [PATCH v2 3/4] virtio-blk: " Stefan Hajnoczi
@ 2020-01-27 13:14   ` Cornelia Huck
  0 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-01-27 13:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Paolo Bonzini

On Fri, 24 Jan 2020 10:01:58 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Automatically size the number of request virtqueues to match the number
> of vCPUs.  This ensures that completion interrupts are handled on the
> same vCPU that submitted the request.  No IPI is necessary to complete
> an I/O request and performance is improved.

Same comment regarding other transports.

> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/block/virtio-blk.c          | 6 +++++-
>  hw/core/machine.c              | 1 +
>  hw/virtio/virtio-blk-pci.c     | 9 ++++++++-
>  include/hw/virtio/virtio-blk.h | 2 ++
>  4 files changed, 16 insertions(+), 2 deletions(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>




* Re: [PATCH v2 4/4] vhost-user-blk: default num_queues to -smp N
  2020-01-24 10:01 ` [PATCH v2 4/4] vhost-user-blk: " Stefan Hajnoczi
@ 2020-01-27 13:17   ` Cornelia Huck
  0 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-01-27 13:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Paolo Bonzini

On Fri, 24 Jan 2020 10:01:59 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Automatically size the number of request virtqueues to match the number
> of vCPUs.  This ensures that completion interrupts are handled on the
> same vCPU that submitted the request.  No IPI is necessary to complete
> an I/O request and performance is improved.

Ok, that one is pci-only anyway.

> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/block/vhost-user-blk.c          | 6 +++++-
>  hw/core/machine.c                  | 1 +
>  hw/virtio/vhost-user-blk-pci.c     | 6 ++++++
>  include/hw/virtio/vhost-user-blk.h | 2 ++
>  4 files changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-27 13:10   ` Cornelia Huck
@ 2020-01-29 15:44     ` Stefan Hajnoczi
  2020-01-30  0:29       ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-29 15:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini


On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> On Fri, 24 Jan 2020 10:01:57 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >  {
> >      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> >      DeviceState *vdev = DEVICE(&dev->vdev);
> > -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > +
> > +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > +        conf->num_queues = current_machine->smp.cpus;
> 
> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> vqs? If they don't really matter, amend the comment to explain that?

The fixed vqs don't matter.  They are typically not involved in the data
path, only the control path where performance doesn't matter.
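
For concreteness, with the constants from patch 1 the vector calculation
in the hunk quoted above works out like this for the 4-vCPU example from
the cover letter (the extra vector covering the config change interrupt):

    /* 4 request vqs + 2 fixed vqs (event, control) + 1 config vector = 7 */
    vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;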

Stefan



* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-29 15:44     ` Stefan Hajnoczi
@ 2020-01-30  0:29       ` Paolo Bonzini
  2020-01-30 10:52         ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2020-01-30  0:29 UTC (permalink / raw)
  To: Stefan Hajnoczi, Cornelia Huck
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Stefan Hajnoczi



On 29/01/20 16:44, Stefan Hajnoczi wrote:
> On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
>> On Fri, 24 Jan 2020 10:01:57 +0000
>> Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
>>>  {
>>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
>>>      DeviceState *vdev = DEVICE(&dev->vdev);
>>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
>>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
>>> +
>>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
>>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
>>> +        conf->num_queues = current_machine->smp.cpus;
>> This now maps the request vqs 1:1 to the vcpus. What about the fixed
>> vqs? If they don't really matter, amend the comment to explain that?
> The fixed vqs don't matter.  They are typically not involved in the data
> path, only the control path where performance doesn't matter.

Should we put a limit on the number of vCPUs?  For anything above ~128
the guest is probably not going to be disk or network bound.

Paolo




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-30  0:29       ` Paolo Bonzini
@ 2020-01-30 10:52         ` Stefan Hajnoczi
  2020-01-30 11:03           ` Cornelia Huck
  2020-02-03 10:25           ` Sergio Lopez
  0 siblings, 2 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-01-30 10:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, Stefan Hajnoczi, Cornelia Huck, qemu-devel,
	Max Reitz


On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> >> On Fri, 24 Jan 2020 10:01:57 +0000
> >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> >>>  {
> >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> >>> +
> >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> >>> +        conf->num_queues = current_machine->smp.cpus;
> >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> >> vqs? If they don't really matter, amend the comment to explain that?
> > The fixed vqs don't matter.  They are typically not involved in the data
> > path, only the control path where performance doesn't matter.
> 
> Should we put a limit on the number of vCPUs?  For anything above ~128
> the guest is probably not going to be disk or network bound.

Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
(1024).  We need to at least stay under that limit.

Should the guest have >128 virtqueues?  Each virtqueue requires guest
RAM and 2 host eventfds.  Eventually these resource requirements will
become a scalability problem, but how do we choose a hard limit and what
happens to guest performance above that limit?
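
To make the scaling concrete, an illustrative calculation (not measured
data; the device count is made up for the example, and the per-vq figures
are the ones given above plus the nvectors formula from the patches):

    /*
     * 64 vCPUs x 8 virtio-blk-pci devices:
     *   request vqs    = 64 * 8       = 512
     *   host eventfds  = 512 vqs * 2  = 1024
     *   MSI-X vectors  = 8 * (64 + 1) = 520
     */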

Stefan



* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-30 10:52         ` Stefan Hajnoczi
@ 2020-01-30 11:03           ` Cornelia Huck
  2020-02-03 10:25           ` Sergio Lopez
  1 sibling, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-01-30 11:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, Stefan Hajnoczi, qemu-devel, Max Reitz,
	Paolo Bonzini


On Thu, 30 Jan 2020 10:52:35 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > On 29/01/20 16:44, Stefan Hajnoczi wrote:  
> > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:  
> > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:  
> > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > >>>  {
> > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > >>> +
> > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > >>> +        conf->num_queues = current_machine->smp.cpus;  
> > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > >> vqs? If they don't really matter, amend the comment to explain that?  
> > > The fixed vqs don't matter.  They are typically not involved in the data
> > > path, only the control path where performance doesn't matter.  
> > 
> > Should we put a limit on the number of vCPUs?  For anything above ~128
> > the guest is probably not going to be disk or network bound.  
> 
> Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> (1024).  We need to at least stay under that limit.
> 
> Should the guest have >128 virtqueues?  Each virtqueue requires guest
> RAM and 2 host eventfds.  Eventually these resource requirements will
> become a scalability problem, but how do we choose a hard limit and what
> happens to guest performance above that limit?

There are probably two kinds of limits involved here:

- a hard limit (we cannot do more), which should be checked even for
  user-specified values, and
- a soft limit (it does not make sense to go beyond this for the
  default case), which can be overridden if explicitly specified.

VIRTIO_QUEUE_MAX (and two fewer for virtio-scsi) sounds like a hard
limit; maybe 128 is a reasonable candidate for a soft limit.

(I would expect systems that give 128 vcpus to the guest to also be
generously sized in other respects.)
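
A minimal sketch of how such a clamp could look (hypothetical, not part
of this series, and the soft limit value is only illustrative), applied
where the PCI realize functions currently resolve the sentinel:

    /* hypothetical soft cap on the automatic default only; explicit
     * user-specified num_queues values would not be clamped */
    #define VIRTIO_SCSI_SOFT_MAX_AUTO_QUEUES 128
    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
        conf->num_queues = MIN(VIRTIO_SCSI_SOFT_MAX_AUTO_QUEUES,
                               current_machine->smp.cpus);
    }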



* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-01-30 10:52         ` Stefan Hajnoczi
  2020-01-30 11:03           ` Cornelia Huck
@ 2020-02-03 10:25           ` Sergio Lopez
  2020-02-03 10:35             ` Michael S. Tsirkin
                               ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Sergio Lopez @ 2020-02-03 10:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, Cornelia Huck, qemu-devel, Max Reitz,
	Paolo Bonzini


On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > >>>  {
> > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > >>> +
> > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > >>> +        conf->num_queues = current_machine->smp.cpus;
> > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > >> vqs? If they don't really matter, amend the comment to explain that?
> > > The fixed vqs don't matter.  They are typically not involved in the data
> > > path, only the control path where performance doesn't matter.
> > 
> > Should we put a limit on the number of vCPUs?  For anything above ~128
> > the guest is probably not going to be disk or network bound.
> 
> Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> (1024).  We need to at least stay under that limit.
> 
> Should the guest have >128 virtqueues?  Each virtqueue requires guest
> RAM and 2 host eventfds.  Eventually these resource requirements will
> become a scalability problem, but how do we choose a hard limit and what
> happens to guest performance above that limit?

From the UX perspective, I think it's safer to use a rather low upper
limit for the automatic configuration.

Users of large VMs (>=32 vCPUs) aiming for the optimal performance are
already facing the need of manually tuning (or relying on a software
to do that for them) other aspects of it, like vNUMA, IOThreads and
CPU pinning, so I don't think we should focus on this group.

On the other hand, the increase in host resource requirements may have
unforeseen consequences in some environments, especially for virtio-blk
users with multiple disks.

All in all, I don't have data that would justify setting the limit to
one value or the other. The only argument I can put on the table is
that, so far, we only had one VQ per device, so perhaps a conservative
value (4? 8?) would make sense from a safety and compatibility point
of view.

Thanks,
Sergio.




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 10:25           ` Sergio Lopez
@ 2020-02-03 10:35             ` Michael S. Tsirkin
  2020-02-03 10:51             ` Cornelia Huck
  2020-02-03 10:57             ` Daniel P. Berrangé
  2 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-02-03 10:35 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Cornelia Huck, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > >>>  {
> > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > >>> +
> > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > >>> +        conf->num_queues = current_machine->smp.cpus;
> > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > >> vqs? If they don't really matter, amend the comment to explain that?
> > > > The fixed vqs don't matter.  They are typically not involved in the data
> > > > path, only the control path where performance doesn't matter.
> > > 
> > > Should we put a limit on the number of vCPUs?  For anything above ~128
> > > the guest is probably not going to be disk or network bound.
> > 
> > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > (1024).  We need to at least stay under that limit.
> > 
> > Should the guest have >128 virtqueues?  Each virtqueue requires guest
> > RAM and 2 host eventfds.  Eventually these resource requirements will
> > become a scalability problem, but how do we choose a hard limit and what
> > happens to guest performance above that limit?
> 
> From the UX perspective, I think it's safer to use a rather low upper
> limit for the automatic configuration.
> 
> Users of large VMs (>=32 vCPUs) aiming for the optimal performance are
> already facing the need of manually tuning (or relying on a software
> to do that for them) other aspects of it, like vNUMA, IOThreads and
> CPU pinning, so I don't think we should focus on this group.
> 
> On the other hand, the increase in host resource requirements may have
> unforeseen consequences in some environments, especially for virtio-blk
> users with multiple disks.
> 
> All in all, I don't have data that would justify setting the limit to
> one value or the other. The only argument I can put on the table is
> that, so far, we only had one VQ per device, so perhaps a conservative
> value (4? 8?) would make sense from a safety and compatibility point
> of view.
> 
> Thanks,
> Sergio.
> 

A bit more testing with different vcpu values can't hurt here ...
Stefan?

-- 
MST




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 10:25           ` Sergio Lopez
  2020-02-03 10:35             ` Michael S. Tsirkin
@ 2020-02-03 10:51             ` Cornelia Huck
  2020-02-03 10:57             ` Daniel P. Berrangé
  2 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2020-02-03 10:51 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini


On Mon, 3 Feb 2020 11:25:29 +0100
Sergio Lopez <slp@redhat.com> wrote:

> On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:  
> > > On 29/01/20 16:44, Stefan Hajnoczi wrote:  
> > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:  
> > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:  
> > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > >>>  {
> > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > >>> +
> > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > >>> +        conf->num_queues = current_machine->smp.cpus;  
> > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > >> vqs? If they don't really matter, amend the comment to explain that?  
> > > > The fixed vqs don't matter.  They are typically not involved in the data
> > > > path, only the control path where performance doesn't matter.  
> > > 
> > > Should we put a limit on the number of vCPUs?  For anything above ~128
> > > the guest is probably not going to be disk or network bound.  
> > 
> > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > (1024).  We need to at least stay under that limit.
> > 
> > Should the guest have >128 virtqueues?  Each virtqueue requires guest
> > RAM and 2 host eventfds.  Eventually these resource requirements will
> > become a scalability problem, but how do we choose a hard limit and what
> > happens to guest performance above that limit?  
> 
> From the UX perspective, I think it's safer to use a rather low upper
> limit for the automatic configuration.
> 
> Users of large VMs (>=32 vCPUs) aiming for the optimal performance are
> already facing the need of manually tuning (or relying on a software
> to do that for them) other aspects of it, like vNUMA, IOThreads and
> CPU pinning, so I don't think we should focus on this group.
> 
> On the other hand, the increase in host resource requirements may have
> unforeseen consequences in some environments, especially for virtio-blk
> users with multiple disks.

Yes... what happens on systems that have both a lot of vcpus and a lot
of disks? We don't know how many other disks there are in the
configuration, and they might be hotplugged later, anyway.

> 
> All in all, I don't have data that would justify setting the limit to
> one value or the other. The only argument I can put on the table is
> that, so far, we only had one VQ per device, so perhaps a conservative
> value (4? 8?) would make sense from a safety and compatibility point
> of view.

The more I think about it, the more I agree. Aiming a bit lower will
hopefully give more performance with less opportunity for unforeseen
breakage due to resource exhaustion.



* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 10:25           ` Sergio Lopez
  2020-02-03 10:35             ` Michael S. Tsirkin
  2020-02-03 10:51             ` Cornelia Huck
@ 2020-02-03 10:57             ` Daniel P. Berrangé
  2020-02-03 11:39               ` Sergio Lopez
  2 siblings, 1 reply; 24+ messages in thread
From: Daniel P. Berrangé @ 2020-02-03 10:57 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, Cornelia Huck, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini

On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > >>>  {
> > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > >>> +
> > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > >>> +        conf->num_queues = current_machine->smp.cpus;
> > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > >> vqs? If they don't really matter, amend the comment to explain that?
> > > > The fixed vqs don't matter.  They are typically not involved in the data
> > > > path, only the control path where performance doesn't matter.
> > > 
> > > Should we put a limit on the number of vCPUs?  For anything above ~128
> > > the guest is probably not going to be disk or network bound.
> > 
> > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > (1024).  We need to at least stay under that limit.
> > 
> > Should the guest have >128 virtqueues?  Each virtqueue requires guest
> > RAM and 2 host eventfds.  Eventually these resource requirements will
> > become a scalability problem, but how do we choose a hard limit and what
> > happens to guest performance above that limit?
> 
> From the UX perspective, I think it's safer to use a rather low upper
> limit for the automatic configuration.
> 
> Users of large VMs (>=32 vCPUs) aiming for the optimal performance are
> already facing the need of manually tuning (or relying on a software
> to do that for them) other aspects of it, like vNUMA, IOThreads and
> CPU pinning, so I don't think we should focus on this group.

Whether they're tuning manually, or relying on software to tune for
them, we (QEMU maintainers) still need to provide credible guidance
on what to do about tuning for large CPU counts. Without clear info
from QEMU, it just descends into hearsay and guesswork, both of which
leave QEMU looking bad.

So I think we need to, at the very least, make a clear statement here
about what tuning approach should be applied when the vCPU count gets
high, and probably even apply that as a default out-of-the-box approach.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 10:57             ` Daniel P. Berrangé
@ 2020-02-03 11:39               ` Sergio Lopez
  2020-02-03 12:53                 ` Michael S. Tsirkin
  2020-02-11 16:20                 ` Stefan Hajnoczi
  0 siblings, 2 replies; 24+ messages in thread
From: Sergio Lopez @ 2020-02-03 11:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Kevin Wolf, Fam Zheng, Eduardo Habkost, qemu-block,
	Michael S. Tsirkin, Cornelia Huck, qemu-devel, Max Reitz,
	Stefan Hajnoczi, Paolo Bonzini


On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > > >>>  {
> > > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > > >>> +
> > > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > > >>> +        conf->num_queues = current_machine->smp.cpus;
> > > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > > >> vqs? If they don't really matter, amend the comment to explain that?
> > > > > The fixed vqs don't matter.  They are typically not involved in the data
> > > > > path, only the control path where performance doesn't matter.
> > > > 
> > > > Should we put a limit on the number of vCPUs?  For anything above ~128
> > > > the guest is probably not going to be disk or network bound.
> > > 
> > > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > > (1024).  We need to at least stay under that limit.
> > > 
> > > Should the guest have >128 virtqueues?  Each virtqueue requires guest
> > > RAM and 2 host eventfds.  Eventually these resource requirements will
> > > become a scalability problem, but how do we choose a hard limit and what
> > > happens to guest performance above that limit?
> > 
> > From the UX perspective, I think it's safer to use a rather low upper
> > limit for the automatic configuration.
> > 
> > > Users of large VMs (>=32 vCPUs) aiming for optimal performance are
> > > already facing the need to manually tune (or rely on software to do
> > > that for them) other aspects of it, like vNUMA, IOThreads and CPU
> > > pinning, so I don't think we should focus on this group.
> 
> Whether they're tuning manually, or relying on software to tune for
> them, we (QEMU maintainers) still need to provide credible guidance
> on what to do about tuning for large CPU counts. Without clear info
> from QEMU, it just descends into hearsay and guesswork, both of which
> leave QEMU looking bad.

I agree. Good documentation, ideally with some benchmarks, and safe
defaults sound like a good approach to me.

> So I think we need to, at the very least, make a clear statement here
> about what tuning approach should be applied when the vCPU count gets
> high, and probably even apply that as the default out-of-the-box
> approach.

In general, I would agree, but in this particular case the
optimization has an impact on something outside QEMU's control (the
host's resources), so we lack the information needed to make a proper
guess.

My main concern here is users upgrading QEMU and hitting some kind of
crash or performance issue, without having touched their VM config.
And let's not forget that Stefan said in the cover letter that this
amounts to a 1-4% improvement on 4k operations on an SSD, and I guess
that's with iodepth=1. I suspect with a larger block size and/or
higher iodepth the improvement will be barely noticeable, which means
it'll only have a positive impact on users running DB/OLTP or similar
workloads on dedicated, directly attached, low-latency storage.

But don't get me wrong, this is a *good* optimization. It's just that
I think we should play it safe here.

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 11:39               ` Sergio Lopez
@ 2020-02-03 12:53                 ` Michael S. Tsirkin
  2020-02-11 16:20                 ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-02-03 12:53 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Kevin Wolf, Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Cornelia Huck, qemu-devel,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini

On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > > > >>>  {
> > > > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > > > >>> +
> > > > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > > > >>> +        conf->num_queues = current_machine->smp.cpus;
> > > > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > > > >> vqs? If they don't really matter, amend the comment to explain that?
> > > > > > The fixed vqs don't matter.  They are typically not involved in the data
> > > > > > path, only the control path where performance doesn't matter.
> > > > > 
> > > > > Should we put a limit on the number of vCPUs?  For anything above ~128
> > > > > the guest is probably not going to be disk or network bound.
> > > > 
> > > > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > > > (1024).  We need to at least stay under that limit.
> > > > 
> > > > Should the guest have >128 virtqueues?  Each virtqueue requires guest
> > > > RAM and 2 host eventfds.  Eventually these resource requirements will
> > > > become a scalability problem, but how do we choose a hard limit and what
> > > > happens to guest performance above that limit?
> > > 
> > > From the UX perspective, I think it's safer to use a rather low upper
> > > limit for the automatic configuration.
> > > 
> > > Users of large VMs (>=32 vCPUs) aiming for optimal performance are
> > > already facing the need to manually tune (or rely on software to do
> > > that for them) other aspects of it, like vNUMA, IOThreads and CPU
> > > pinning, so I don't think we should focus on this group.
> > 
> > Whether they're tuning manually, or relying on software to tune for
> > them, we (QEMU maintainers) still need to provide credible guidance
> > on what to do about tuning for large CPU counts. Without clear info
> > from QEMU, it just descends into hearsay and guesswork, both of which
> > leave QEMU looking bad.
> 
> I agree. Good documentation, ideally with some benchmarks, and safe
> defaults sound like a good approach to me.
> 
> > So I think we need to, at the very least, make a clear statement here
> > about what tuning approach should be applied when the vCPU count gets
> > high, and probably even apply that as the default out-of-the-box
> > approach.
> 
> In general, I would agree, but in this particular case the
> optimization has an impact on something outside QEMU's control (the
> host's resources), so we lack the information needed to make a proper
> guess.
> 
> My main concern here is users upgrading QEMU and hitting some kind of
> crash or performance issue, without having touched their VM config.
> And let's not forget that Stefan said in the cover letter that this
> amounts to a 1-4% improvement on 4k operations on an SSD, and I guess
> that's with iodepth=1. I suspect with a larger block size and/or
> higher iodepth the improvement will be barely noticeable, which means
> it'll only have a positive impact on users running DB/OLTP or similar
> workloads on dedicated, directly attached, low-latency storage.
> 
> But don't get me wrong, this is a *good* optimization. It's just that
> I think we should play it safe here.
> 
> Sergio.

Yeah, I think a bit more benchmarking than with just 4 vCPUs, so that
we can at least see the trend, can't hurt.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-03 11:39               ` Sergio Lopez
  2020-02-03 12:53                 ` Michael S. Tsirkin
@ 2020-02-11 16:20                 ` Stefan Hajnoczi
  2020-02-11 16:31                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-02-11 16:20 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Kevin Wolf, Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Michael S. Tsirkin, Cornelia Huck,
	qemu-devel, Max Reitz, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > So I think we need to, at the very least, make a clear statement here
> > about what tuning approach should be applied when the vCPU count gets
> > high, and probably even apply that as the default out-of-the-box
> > approach.
> 
> In general, I would agree, but in this particular case the
> optimization has an impact on something outside QEMU's control (the
> host's resources), so we lack the information needed to make a proper
> guess.
> 
> My main concern here is users upgrading QEMU and hitting some kind of
> crash or performance issue, without having touched their VM config. And

I don't think this is an issue since only newly created guests are
affected.  Existing machine types are unchanged.

> let's not forget that Stefan said in the cover letter that this
> amounts to a 1-4% improvement on 4k operations on an SSD, and I guess
> that's with iodepth=1. I suspect with a larger block size and/or
> higher iodepth the improvement will be barely noticeable, which means
> it'll only have a positive impact on users running DB/OLTP or similar
> workloads on dedicated, directly attached, low-latency storage.
> 
> But don't get me wrong, this is a *good* optimization. It's just that
> I think we should play it safe here.

The NVMe card I've been testing has 64 queues.  Let's keep the virtio
limit roughly the same as real hardware.  That way, multi-queue block
layer support in QEMU will be able to fully exploit the hardware
(similar to how we size request queues to be larger than the common 64
/sys/block/FOO/queue/nr_requests).

The point of this change is to improve performance on SMP guests.
Setting the limit to 4-8 is too low, since it leaves guests that most
need this optimization with a sub-optimal configuration.
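
Just to make the shape of that concrete, here is a rough sketch
(illustration only; the 64 cap is an assumption drawn from the NVMe
comparison above, and num_fixed_vqs is a hypothetical stand-in for the
device's non-request virtqueues):

    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
        /* one request vq per vCPU, capped at a hardware-like count */
        conf->num_queues = MIN(current_machine->smp.cpus, 64);
        /* and never exceed the transport's hard limit */
        conf->num_queues = MIN(conf->num_queues,
                               VIRTIO_QUEUE_MAX - num_fixed_vqs);
    }

Small SMP guests would still get one queue per vCPU, while very large
guests would be bounded by the cap and by VIRTIO_QUEUE_MAX.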

I will create a 32 vCPU guest with 100 virtio-blk devices and verify
that enabling multi-queue is successful.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-11 16:20                 ` Stefan Hajnoczi
@ 2020-02-11 16:31                   ` Michael S. Tsirkin
  2020-02-12 11:18                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2020-02-11 16:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Sergio Lopez, Cornelia Huck,
	qemu-devel, Max Reitz, Paolo Bonzini

On Tue, Feb 11, 2020 at 04:20:41PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> > On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > So I think we need to, at the very least, make a clear statement here
> > > about what tuning approach should be applied when the vCPU count gets
> > > high, and probably even apply that as the default out-of-the-box
> > > approach.
> > 
> > In general, I would agree, but in this particular case the
> > optimization has an impact on something outside QEMU's control (the
> > host's resources), so we lack the information needed to make a proper
> > guess.
> > 
> > My main concern here is users upgrading QEMU and hitting some kind of
> > crash or performance issue, without having touched their VM config. And
> 
> I don't think this is an issue since only newly created guests are
> affected.  Existing machine types are unchanged.
> 
> > let's not forget that Stefan said in the cover letter that this
> > amounts to a 1-4% improvement on 4k operations on an SSD, and I guess
> > that's with iodepth=1. I suspect with a larger block size and/or
> > higher iodepth the improvement will be barely noticeable, which means
> > it'll only have a positive impact on users running DB/OLTP or similar
> > workloads on dedicated, directly attached, low-latency storage.
> > 
> > But don't get me wrong, this is a *good* optimization. It's just that
> > I think we should play it safe here.
> 
> The NVMe card I've been testing has 64 queues.  Let's keep the virtio
> limit roughly the same as real hardware.  That way, multi-queue block
> layer support in QEMU will be able to fully exploit the hardware
> (similar to how we size request queues to be larger than the common 64
> /sys/block/FOO/queue/nr_requests).
> 
> The point of this change is to improve performance on SMP guests.
> Setting the limit to 4-8 is too low, since it leaves guests that most
> need this optimization with a sub-optimal configuration.
> 
> I will create a 32 vCPU guest with 100 virtio-blk devices and verify
> that enabling multi-queue is successful.
> 
> Stefan


and that it's helpful for performance?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-11 16:31                   ` Michael S. Tsirkin
@ 2020-02-12 11:18                     ` Stefan Hajnoczi
  2020-02-21 10:55                       ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-02-12 11:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Sergio Lopez, Cornelia Huck,
	qemu-devel, Max Reitz, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

On Tue, Feb 11, 2020 at 11:31:17AM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 11, 2020 at 04:20:41PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> > > On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > I will create a 32 vCPU guest with 100 virtio-blk devices and verify
> > that enabling multi-queue is successful.
> 
> and that it's helpful for performance?

It may be a little while before the next revision of this patch series.
Testing reveals scalability problems when creating so many virtqueues
:).

I've measured boot time, memory consumption, and random read IOPS.  They
are all significantly worse (32 vCPUs, 24 GB RAM, 101 virtio-blk
devices, 32 queues/device).
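
As a back-of-the-envelope estimate (not a measurement): 101 devices *
32 queues/device is 3,232 virtqueues, and at 2 host eventfds per
virtqueue that is roughly 6,500 eventfds, plus guest RAM for every
vring, so regressions at this scale are not too surprising.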

Time to see what's going on and whether some general scalability
improvements are possible here before we enable multi-queue by default.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N
  2020-02-12 11:18                     ` Stefan Hajnoczi
@ 2020-02-21 10:55                       ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2020-02-21 10:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Michael S. Tsirkin, Cornelia Huck,
	qemu-devel, Max Reitz, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2014 bytes --]

On Wed, Feb 12, 2020 at 11:18:32AM +0000, Stefan Hajnoczi wrote:
> On Tue, Feb 11, 2020 at 11:31:17AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Feb 11, 2020 at 04:20:41PM +0000, Stefan Hajnoczi wrote:
> > > On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> > > > On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > > > > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > > > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > > > > >> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > I will create a 32 vCPU guest with 100 virtio-blk devices and verify
> > > that enabling multi-queue is successful.
> > 
> > and that it's helpful for performance?
> 
> It may be a little while before the next revision of this patch series.
> Testing reveals scalability problems when creating so many virtqueues
> :).
> 
> I've measured boot time, memory consumption, and random read IOPS.  They
> are all significantly worse (32 vCPUs, 24 GB RAM, 101 virtio-blk
> devices, 32 queues/device).
> 
> Time to see what's going on and whether some general scalability
> improvements are possible here before we enable multi-queue by default.

Update:

Boot time has improved with "[PATCH] memory: batch allocate ioeventfds[]
in address_space_update_ioeventfds()".

IOPS looks a lot better with the O(1) QEMU event loop patches that I've
posted.  This work is not complete yet; I still need to make AioContext
polling O(1) too (it consumes too much CPU with many idle devices).

After this work is complete I'll measure boot time, memory consumption,
and IOPS again.  Then we can decide whether multiqueue by default is a
good idea.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-02-21 10:55 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-24 10:01 [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefan Hajnoczi
2020-01-24 10:01 ` [PATCH v2 1/4] virtio-scsi: introduce a constant for fixed virtqueues Stefan Hajnoczi
2020-01-27 12:59   ` Cornelia Huck
2020-01-24 10:01 ` [PATCH v2 2/4] virtio-scsi: default num_queues to -smp N Stefan Hajnoczi
2020-01-27 13:10   ` Cornelia Huck
2020-01-29 15:44     ` Stefan Hajnoczi
2020-01-30  0:29       ` Paolo Bonzini
2020-01-30 10:52         ` Stefan Hajnoczi
2020-01-30 11:03           ` Cornelia Huck
2020-02-03 10:25           ` Sergio Lopez
2020-02-03 10:35             ` Michael S. Tsirkin
2020-02-03 10:51             ` Cornelia Huck
2020-02-03 10:57             ` Daniel P. Berrangé
2020-02-03 11:39               ` Sergio Lopez
2020-02-03 12:53                 ` Michael S. Tsirkin
2020-02-11 16:20                 ` Stefan Hajnoczi
2020-02-11 16:31                   ` Michael S. Tsirkin
2020-02-12 11:18                     ` Stefan Hajnoczi
2020-02-21 10:55                       ` Stefan Hajnoczi
2020-01-24 10:01 ` [PATCH v2 3/4] virtio-blk: " Stefan Hajnoczi
2020-01-27 13:14   ` Cornelia Huck
2020-01-24 10:01 ` [PATCH v2 4/4] vhost-user-blk: " Stefan Hajnoczi
2020-01-27 13:17   ` Cornelia Huck
2020-01-27  9:59 ` [PATCH v2 0/4] virtio-pci: enable blk and scsi multi-queue by default Stefano Garzarella
