* [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
@ 2021-03-25 15:07 Greg Kurz
  2021-03-25 15:07 ` [RFC 1/8] memory: Allow eventfd add/del without starting a transaction Greg Kurz
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slowdown may be observed on setups with a big enough number
of vCPUs.

Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):

vCPUs           virtio-scsi     virtio-blk
1		0m20.922s	0m21.346s
2		0m21.230s	0m20.350s
4		0m21.761s	0m20.997s
8		0m22.770s	0m20.051s
16		0m22.038s	0m19.994s
32		0m22.928s	0m20.803s
64		0m26.583s	0m22.953s
128		0m41.273s	0m32.333s
256		2m4.727s 	1m16.924s
384		6m5.563s 	3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops over all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
    FOR_EACH_FLAT_RANGE(fr, view) {
        for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.
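
Batching fixes this: a single MR transaction can commit all the
additions at once, which is what this series implements. A minimal
sketch using the memory_region_add_eventfd_full() helper introduced
in patch 1, where nvqs, notify_addr() and notifiers[] are hypothetical
placeholders:

    /* Current code: one MR transaction, i.e. one full scan of all
     * ioeventfds, per queue. */
    for (i = 0; i < nvqs; i++) {
        memory_region_add_eventfd(mr, notify_addr(i), 2, false, i,
                                  &notifiers[i]);
    }

    /* Batched: a single scan when the enclosing transaction commits. */
    memory_region_transaction_begin();
    for (i = 0; i < nvqs; i++) {
        memory_region_add_eventfd_full(mr, notify_addr(i), 2, false, i,
                                       &notifiers[i], false);
    }
    memory_region_transaction_commit();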

This series introduces generic APIs to allow batch creation and deletion
of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
greatly improves the numbers:

vCPUs           virtio-scsi     virtio-blk
1		0m21.271s	0m22.076s
2		0m20.912s	0m19.716s
4		0m20.508s	0m19.310s
8		0m21.374s	0m20.273s
16		0m21.559s	0m21.374s
32		0m22.532s	0m21.271s
64		0m26.550s	0m22.007s
128		0m29.115s	0m27.446s
256		0m44.752s	0m41.004s
384		1m2.884s	0m58.023s

The series deliberately spans multiple subsystems for easier
review and experimenting. It also makes some preliminary fixes
along the way. It is thus posted as an RFC for now, but if the
general idea is acceptable, I guess a non-RFC could be posted,
and the feature maybe extended to some other devices that might
suffer from similar scaling issues, e.g. vhost-scsi-pci,
vhost-user-scsi-pci and vhost-user-blk-pci, though I haven't checked.

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Greg Kurz (8):
  memory: Allow eventfd add/del without starting a transaction
  virtio: Introduce virtio_bus_set_host_notifiers()
  virtio: Add API to batch set host notifiers
  virtio-pci: Batch add/del ioeventfds in a single MR transaction
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Use virtio_bus_set_host_notifiers()
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Use virtio_bus_set_host_notifiers()

 hw/virtio/virtio-pci.h          |  1 +
 include/exec/memory.h           | 48 ++++++++++++++++------
 include/hw/virtio/virtio-bus.h  |  7 ++++
 hw/block/dataplane/virtio-blk.c | 26 +++++-------
 hw/scsi/virtio-scsi-dataplane.c | 68 ++++++++++++++++++--------------
 hw/virtio/virtio-bus.c          | 70 +++++++++++++++++++++++++++++++++
 hw/virtio/virtio-pci.c          | 53 +++++++++++++++++--------
 softmmu/memory.c                | 42 ++++++++++++--------
 8 files changed, 225 insertions(+), 90 deletions(-)

-- 
2.26.3





* [RFC 1/8] memory: Allow eventfd add/del without starting a transaction
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:03   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers() Greg Kurz
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Each addition or deletion of an eventfd happens in its own MR
transaction. This doesn't scale well with multiqueue devices
that do 1:1 queue:vCPU mapping (e.g. virtio-scsi-pci or
virtio-blk-pci): these devices typically create at least one
eventfd per queue, and memory_region_transaction_commit()
itself loops over all eventfds, resulting in quadratic time
complexity. This calls for batching: a device should be able
to add or delete its eventfds in a single transaction.

Prepare the ground for this by introducing extended versions of
memory_region_add_eventfd() and memory_region_del_eventfd()
that take an extra bool argument to control whether a
transaction should be started.

No behavior change at this point.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/exec/memory.h | 48 ++++++++++++++++++++++++++++++++-----------
 softmmu/memory.c      | 42 ++++++++++++++++++++++---------------
 2 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5728a681b27d..98ed552e001c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1848,13 +1848,25 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr);
  * @match_data: whether to match against @data, instead of just @addr
  * @data: the data to match against the guest write
  * @e: event notifier to be triggered when @addr, @size, and @data all match.
+ * @transaction: whether to start a transaction for the change
  **/
-void memory_region_add_eventfd(MemoryRegion *mr,
-                               hwaddr addr,
-                               unsigned size,
-                               bool match_data,
-                               uint64_t data,
-                               EventNotifier *e);
+void memory_region_add_eventfd_full(MemoryRegion *mr,
+                                    hwaddr addr,
+                                    unsigned size,
+                                    bool match_data,
+                                    uint64_t data,
+                                    EventNotifier *e,
+                                    bool transaction);
+
+static inline void memory_region_add_eventfd(MemoryRegion *mr,
+                                             hwaddr addr,
+                                             unsigned size,
+                                             bool match_data,
+                                             uint64_t data,
+                                             EventNotifier *e)
+{
+    memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, true);
+}
 
 /**
  * memory_region_del_eventfd: Cancel an eventfd.
@@ -1868,13 +1880,25 @@ void memory_region_add_eventfd(MemoryRegion *mr,
  * @match_data: whether to match against @data, instead of just @addr
  * @data: the data to match against the guest write
  * @e: event notifier to be triggered when @addr, @size, and @data all match.
+ * @transaction: whether to start a transaction for the change
  */
-void memory_region_del_eventfd(MemoryRegion *mr,
-                               hwaddr addr,
-                               unsigned size,
-                               bool match_data,
-                               uint64_t data,
-                               EventNotifier *e);
+void memory_region_del_eventfd_full(MemoryRegion *mr,
+                                    hwaddr addr,
+                                    unsigned size,
+                                    bool match_data,
+                                    uint64_t data,
+                                    EventNotifier *e,
+                                    bool transaction);
+
+static inline void memory_region_del_eventfd(MemoryRegion *mr,
+                                             hwaddr addr,
+                                             unsigned size,
+                                             bool match_data,
+                                             uint64_t data,
+                                             EventNotifier *e)
+{
+    memory_region_del_eventfd_full(mr, addr, size, match_data, data, e, true);
+}
 
 /**
  * memory_region_add_subregion: Add a subregion to a container.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index d4493ef9e430..1b1942d521cc 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2341,12 +2341,13 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
 
 static bool userspace_eventfd_warning;
 
-void memory_region_add_eventfd(MemoryRegion *mr,
-                               hwaddr addr,
-                               unsigned size,
-                               bool match_data,
-                               uint64_t data,
-                               EventNotifier *e)
+void memory_region_add_eventfd_full(MemoryRegion *mr,
+                                    hwaddr addr,
+                                    unsigned size,
+                                    bool match_data,
+                                    uint64_t data,
+                                    EventNotifier *e,
+                                    bool transaction)
 {
     MemoryRegionIoeventfd mrfd = {
         .addr.start = int128_make64(addr),
@@ -2367,7 +2368,9 @@ void memory_region_add_eventfd(MemoryRegion *mr,
     if (size) {
         adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
     }
-    memory_region_transaction_begin();
+    if (transaction) {
+        memory_region_transaction_begin();
+    }
     for (i = 0; i < mr->ioeventfd_nb; ++i) {
         if (memory_region_ioeventfd_before(&mrfd, &mr->ioeventfds[i])) {
             break;
@@ -2380,15 +2383,18 @@ void memory_region_add_eventfd(MemoryRegion *mr,
             sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
     mr->ioeventfds[i] = mrfd;
     ioeventfd_update_pending |= mr->enabled;
-    memory_region_transaction_commit();
+    if (transaction) {
+        memory_region_transaction_commit();
+    }
 }
 
-void memory_region_del_eventfd(MemoryRegion *mr,
-                               hwaddr addr,
-                               unsigned size,
-                               bool match_data,
-                               uint64_t data,
-                               EventNotifier *e)
+void memory_region_del_eventfd_full(MemoryRegion *mr,
+                                    hwaddr addr,
+                                    unsigned size,
+                                    bool match_data,
+                                    uint64_t data,
+                                    EventNotifier *e,
+                                    bool transaction)
 {
     MemoryRegionIoeventfd mrfd = {
         .addr.start = int128_make64(addr),
@@ -2402,7 +2408,9 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     if (size) {
         adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
     }
-    memory_region_transaction_begin();
+    if (transaction) {
+        memory_region_transaction_begin();
+    }
     for (i = 0; i < mr->ioeventfd_nb; ++i) {
         if (memory_region_ioeventfd_equal(&mrfd, &mr->ioeventfds[i])) {
             break;
@@ -2415,7 +2423,9 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     mr->ioeventfds = g_realloc(mr->ioeventfds,
                                   sizeof(*mr->ioeventfds)*mr->ioeventfd_nb + 1);
     ioeventfd_update_pending |= mr->enabled;
-    memory_region_transaction_commit();
+    if (transaction) {
+        memory_region_transaction_commit();
+    }
 }
 
 static void memory_region_update_container_subregions(MemoryRegion *subregion)
-- 
2.26.3




* [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers()
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
  2021-03-25 15:07 ` [RFC 1/8] memory: Allow eventfd add/del without starting a transaction Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:06   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 3/8] virtio: Add API to batch set host notifiers Greg Kurz
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Multiqueue devices such as virtio-scsi or virtio-blk all open-code the
same pattern to set up and tear down the host notifiers of their request
virtqueues. Consolidate the pattern in a new
virtio_bus_set_host_notifiers() API. Since virtio-scsi and virtio-blk
both fall back to userspace if host notifiers can't be set, e.g. on
file descriptor exhaustion, go for a warning rather than an error. Also
make the warning one-shot to avoid flooding the logs, since the message
would very likely be the same for all virtqueues.

Devices will be converted to use virtio_bus_set_host_notifiers() in
separate patches.
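
A sketch of the intended conversion in a device's start path (nvqs,
bus and the error label are placeholders modeled on the existing call
sites):

    /* Before: per-queue loop with open-coded rollback. */
    for (i = 0; i < nvqs; i++) {
        r = virtio_bus_set_host_notifier(bus, i, true);
        if (r != 0) {
            while (i--) {
                virtio_bus_set_host_notifier(bus, i, false);
                virtio_bus_cleanup_host_notifier(bus, i);
            }
            goto fail_host_notifiers;
        }
    }

    /* After: a single call; the helper itself rolls back the
     * already-assigned notifiers on failure. */
    r = virtio_bus_set_host_notifiers(bus, nvqs, 0, true);
    if (r != 0) {
        goto fail_host_notifiers;
    }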

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/virtio/virtio-bus.h |  3 +++
 hw/virtio/virtio-bus.c         | 36 ++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index ef8abe49c5a1..6d1e4ee3e886 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -154,5 +154,8 @@ void virtio_bus_release_ioeventfd(VirtioBusState *bus);
 int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign);
 /* Tell the bus that the ioeventfd handler is no longer required. */
 void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n);
+/* Call virtio_bus_set_host_notifier() for several consecutive vqs */
+int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
+                                  bool assign);
 
 #endif /* VIRTIO_BUS_H */
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d6332d45c3b2..c9e7cdb5c161 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -308,6 +308,42 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
     event_notifier_cleanup(notifier);
 }
 
+static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
+                                                        int nvqs, int n_offset)
+{
+    int i;
+
+    for (i = 0; i < nvqs; i++) {
+        virtio_bus_set_host_notifier(bus, i + n_offset, false);
+        virtio_bus_cleanup_host_notifier(bus, i + n_offset);
+    }
+}
+
+int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
+                                  bool assign)
+{
+    VirtIODevice *vdev = virtio_bus_get_device(bus);
+    int i;
+    int rc;
+
+    if (assign) {
+        for (i = 0; i < nvqs; i++) {
+            rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
+            if (rc != 0) {
+                warn_report_once("%s: Failed to set host notifier (%s)",
+                                 vdev->name, strerror(-rc));
+
+                virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
+                return rc;
+            }
+        }
+    } else {
+        virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
+    }
+
+    return 0;
+}
+
 static char *virtio_bus_get_dev_path(DeviceState *dev)
 {
     BusState *bus = qdev_get_parent_bus(dev);
-- 
2.26.3




* [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
  2021-03-25 15:07 ` [RFC 1/8] memory: Allow eventfd add/del without starting a transaction Greg Kurz
  2021-03-25 15:07 ` [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:10   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction Greg Kurz
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Introduce VirtioBusClass methods to begin and commit a transaction
of setting/unsetting host notifiers. These handlers will be implemented
by virtio-pci to batch addition and deletion of ioeventfds for multiqueue
devices like virtio-scsi-pci or virtio-blk-pci.

Convert virtio_bus_set_host_notifiers() to use these handlers. Note
that virtio_bus_cleanup_host_notifier() closes the eventfds, which
could otherwise still be passed to the KVM_IOEVENTFD ioctl() when the
transaction ends, and fail with EBADF. The cleanup of the host
notifiers is thus pushed to a separate loop in
virtio_bus_unset_and_cleanup_host_notifiers(), after the transaction
is committed.
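
The resulting unassign ordering can be sketched as follows, mirroring
what virtio_bus_unset_and_cleanup_host_notifiers() does below:

    virtio_bus_set_host_notifier_begin(bus);      /* open MR transaction */
    for (i = 0; i < nvqs; i++) {
        virtio_bus_set_host_notifier(bus, i + n_offset, false);
    }
    /* commit: the KVM_IOEVENTFD deassign ioctls run while the eventfds
     * are still open */
    virtio_bus_set_host_notifier_commit(bus);
    for (i = 0; i < nvqs; i++) {
        /* only now is it safe to close the eventfds */
        virtio_bus_cleanup_host_notifier(bus, i + n_offset);
    }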

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/virtio/virtio-bus.h |  4 ++++
 hw/virtio/virtio-bus.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 6d1e4ee3e886..99704b2c090a 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -82,6 +82,10 @@ struct VirtioBusClass {
      */
     int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
                             int n, bool assign);
+
+    void (*ioeventfd_assign_begin)(DeviceState *d);
+    void (*ioeventfd_assign_commit)(DeviceState *d);
+
     /*
      * Whether queue number n is enabled.
      */
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index c9e7cdb5c161..156484c4ca14 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -295,6 +295,28 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign)
     return r;
 }
 
+static void virtio_bus_set_host_notifier_begin(VirtioBusState *bus)
+{
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+    DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+    if (k->ioeventfd_assign_begin) {
+        assert(k->ioeventfd_assign_commit);
+        k->ioeventfd_assign_begin(proxy);
+    }
+}
+
+static void virtio_bus_set_host_notifier_commit(VirtioBusState *bus)
+{
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+    DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+    if (k->ioeventfd_assign_commit) {
+        assert(k->ioeventfd_assign_begin);
+        k->ioeventfd_assign_commit(proxy);
+    }
+}
+
 void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
 {
     VirtIODevice *vdev = virtio_bus_get_device(bus);
@@ -308,6 +330,7 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
     event_notifier_cleanup(notifier);
 }
 
+/* virtio_bus_set_host_notifier_begin() must have been called */
 static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
                                                         int nvqs, int n_offset)
 {
@@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
 
     for (i = 0; i < nvqs; i++) {
         virtio_bus_set_host_notifier(bus, i + n_offset, false);
+    }
+    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
+    virtio_bus_set_host_notifier_commit(bus);
+    for (i = 0; i < nvqs; i++) {
         virtio_bus_cleanup_host_notifier(bus, i + n_offset);
     }
 }
@@ -327,17 +354,24 @@ int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
     int rc;
 
     if (assign) {
+        virtio_bus_set_host_notifier_begin(bus);
+
         for (i = 0; i < nvqs; i++) {
             rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
             if (rc != 0) {
                 warn_report_once("%s: Failed to set host notifier (%s)",
                                  vdev->name, strerror(-rc));
 
+                /* This also calls virtio_bus_set_host_notifier_commit() */
                 virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
                 return rc;
             }
         }
+
+        virtio_bus_set_host_notifier_commit(bus);
     } else {
+        virtio_bus_set_host_notifier_begin(bus);
+        /* This also calls virtio_bus_set_host_notifier_commit() */
         virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
     }
 
-- 
2.26.3




* [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (2 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 3/8] virtio: Add API to batch set host notifiers Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:24   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start() Greg Kurz
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Implement the ioeventfd_assign_begin() and ioeventfd_assign_commit()
handlers of VirtioBusClass. Basically, track that a transaction was
already requested by the device and use this information to prevent
the memory code from generating a transaction for each individual
eventfd.

Devices that want to benefit from this batching feature must be
converted to use the virtio_bus_set_host_notifiers() API.
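
A sketch of the resulting control flow (in practice these handlers are
invoked through the VirtioBusClass hooks rather than directly, and
notifier_of(i) is a hypothetical placeholder):

    virtio_pci_ioeventfd_assign_begin(d);   /* set the flag, open one
                                             * MR transaction */
    for (i = 0; i < nvqs; i++) {
        /* with ioeventfd_assign_started set, ioeventfd_assign() passes
         * transaction == false down to the memory API, so no
         * per-eventfd begin/commit happens here */
        virtio_pci_ioeventfd_assign(d, notifier_of(i), i, true);
    }
    virtio_pci_ioeventfd_assign_commit(d);  /* single commit for all */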

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/virtio/virtio-pci.h |  1 +
 hw/virtio/virtio-pci.c | 53 +++++++++++++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index d7d5d403a948..a1b3f1bc45c9 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -141,6 +141,7 @@ struct VirtIOPCIProxy {
     bool disable_modern;
     bool ignore_backend_features;
     OnOffAuto disable_legacy;
+    bool ioeventfd_assign_started;
     uint32_t class_code;
     uint32_t nvectors;
     uint32_t dfselect;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 883045a22354..0a8738c69541 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -243,47 +243,66 @@ static int virtio_pci_ioeventfd_assign(DeviceState *d, EventNotifier *notifier,
     hwaddr modern_addr = virtio_pci_queue_mem_mult(proxy) *
                          virtio_get_queue_index(vq);
     hwaddr legacy_addr = VIRTIO_PCI_QUEUE_NOTIFY;
+    bool transaction = !proxy->ioeventfd_assign_started;
 
     if (assign) {
         if (modern) {
             if (fast_mmio) {
-                memory_region_add_eventfd(modern_mr, modern_addr, 0,
-                                          false, n, notifier);
+                memory_region_add_eventfd_full(modern_mr, modern_addr, 0,
+                                               false, n, notifier, transaction);
             } else {
-                memory_region_add_eventfd(modern_mr, modern_addr, 2,
-                                          false, n, notifier);
+                memory_region_add_eventfd_full(modern_mr, modern_addr, 2,
+                                               false, n, notifier, transaction);
             }
             if (modern_pio) {
-                memory_region_add_eventfd(modern_notify_mr, 0, 2,
-                                              true, n, notifier);
+                memory_region_add_eventfd_full(modern_notify_mr, 0, 2,
+                                               true, n, notifier, transaction);
             }
         }
         if (legacy) {
-            memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
-                                      true, n, notifier);
+            memory_region_add_eventfd_full(legacy_mr, legacy_addr, 2,
+                                           true, n, notifier, transaction);
         }
     } else {
         if (modern) {
             if (fast_mmio) {
-                memory_region_del_eventfd(modern_mr, modern_addr, 0,
-                                          false, n, notifier);
+                memory_region_del_eventfd_full(modern_mr, modern_addr, 0,
+                                               false, n, notifier, transaction);
             } else {
-                memory_region_del_eventfd(modern_mr, modern_addr, 2,
-                                          false, n, notifier);
+                memory_region_del_eventfd_full(modern_mr, modern_addr, 2,
+                                               false, n, notifier, transaction);
             }
             if (modern_pio) {
-                memory_region_del_eventfd(modern_notify_mr, 0, 2,
-                                          true, n, notifier);
+                memory_region_del_eventfd_full(modern_notify_mr, 0, 2,
+                                               true, n, notifier, transaction);
             }
         }
         if (legacy) {
-            memory_region_del_eventfd(legacy_mr, legacy_addr, 2,
-                                      true, n, notifier);
+            memory_region_del_eventfd_full(legacy_mr, legacy_addr, 2,
+                                           true, n, notifier, transaction);
         }
     }
     return 0;
 }
 
+static void virtio_pci_ioeventfd_assign_begin(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+
+    assert(!proxy->ioeventfd_assign_started);
+    proxy->ioeventfd_assign_started = true;
+    memory_region_transaction_begin();
+}
+
+static void virtio_pci_ioeventfd_assign_commit(DeviceState *d)
+{
+    VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+
+    assert(proxy->ioeventfd_assign_started);
+    memory_region_transaction_commit();
+    proxy->ioeventfd_assign_started = false;
+}
+
 static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy)
 {
     virtio_bus_start_ioeventfd(&proxy->bus);
@@ -2161,6 +2180,8 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->query_nvectors = virtio_pci_query_nvectors;
     k->ioeventfd_enabled = virtio_pci_ioeventfd_enabled;
     k->ioeventfd_assign = virtio_pci_ioeventfd_assign;
+    k->ioeventfd_assign_begin = virtio_pci_ioeventfd_assign_begin;
+    k->ioeventfd_assign_commit = virtio_pci_ioeventfd_assign_commit;
     k->get_dma_as = virtio_pci_get_dma_as;
     k->queue_enabled = virtio_pci_queue_enabled;
 }
-- 
2.26.3




* [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (3 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:25   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers() Greg Kurz
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

When dataplane multiqueue support was added in QEMU 2.7, the path
that would roll back the guest notifier assignment in case of error
simply got dropped.

Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it omits to roll back both
host and guest notifiers.

It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.
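
The resulting unwind ladder in virtio_blk_data_plane_start(), sketched:

  fail_aio_context:
    /* roll back the host notifier assignment done above */
    for (i = 0; i < nvqs; i++) {
        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
    }
  fail_host_notifiers:
    /* roll back the guest notifier assignment */
    k->set_guest_notifiers(qbus->parent, nvqs, false);
  fail_guest_notifiers:
    /* pre-existing path: disable dataplane and fall back to userspace */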

Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
Cc: stefanha@redhat.com
Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
Cc: kwolf@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/block/dataplane/virtio-blk.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e7..d7b5c95d26d9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -207,7 +207,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
                 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
                 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
             }
-            goto fail_guest_notifiers;
+            goto fail_host_notifiers;
         }
     }
 
@@ -221,7 +221,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     aio_context_release(old_context);
     if (r < 0) {
         error_report_err(local_err);
-        goto fail_guest_notifiers;
+        goto fail_aio_context;
     }
 
     /* Process queued requests before the ones in vring */
@@ -245,6 +245,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     aio_context_release(s->ctx);
     return 0;
 
+  fail_aio_context:
+    for (i = 0; i < nvqs; i++) {
+        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+    }
+  fail_host_notifiers:
+    k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
     /*
      * If we failed to set up the guest notifiers queued requests will be
-- 
2.26.3




* [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers()
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (4 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start() Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:26   ` Stefan Hajnoczi
  2021-03-25 15:07 ` [RFC 7/8] virtio-scsi: Set host notifiers and callbacks separately Greg Kurz
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

This allows the virtio-blk-pci device to batch additions and deletions
of host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.408s down to 0m59.923s for a pseries
machine with 384 vCPUs.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/block/dataplane/virtio-blk.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d7b5c95d26d9..fd2a60010285 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -172,6 +172,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     VirtIOBlockDataPlane *s = vblk->dataplane;
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vblk)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    VirtioBusState *bus = VIRTIO_BUS(qbus);
     AioContext *old_context;
     unsigned i;
     unsigned nvqs = s->conf->num_queues;
@@ -199,16 +200,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     }
 
     /* Set up virtqueue notify */
-    for (i = 0; i < nvqs; i++) {
-        r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
-        if (r != 0) {
-            fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", r);
-            while (i--) {
-                virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-                virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-            }
-            goto fail_host_notifiers;
-        }
+    r = virtio_bus_set_host_notifiers(bus, nvqs, 0, true);
+    if (r != 0) {
+        goto fail_host_notifiers;
     }
 
     s->starting = false;
@@ -246,10 +240,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     return 0;
 
   fail_aio_context:
-    for (i = 0; i < nvqs; i++) {
-        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-    }
+    virtio_bus_set_host_notifiers(bus, nvqs, 0, false);
   fail_host_notifiers:
     k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
@@ -287,7 +278,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     VirtIOBlockDataPlane *s = vblk->dataplane;
     BusState *qbus = qdev_get_parent_bus(DEVICE(vblk));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-    unsigned i;
     unsigned nvqs = s->conf->num_queues;
 
     if (!vblk->dataplane_started || s->stopping) {
@@ -312,10 +302,7 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
     aio_context_release(s->ctx);
 
-    for (i = 0; i < nvqs; i++) {
-        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-    }
+    virtio_bus_set_host_notifiers(VIRTIO_BUS(qbus), nvqs, 0, false);
 
     qemu_bh_cancel(s->bh);
     notify_guest_bh(s); /* final chance to notify guest */
-- 
2.26.3




* [RFC 7/8] virtio-scsi: Set host notifiers and callbacks separately
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (5 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-25 15:07 ` [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers() Greg Kurz
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

Host notifiers are guaranteed to be idle until the callbacks are
hooked up with virtio_queue_aio_set_host_notifier_handler(). They
thus don't need to be set or unset with the AioContext lock held.

Do this outside the critical section, like virtio-blk already
does: basically, split virtio_scsi_vring_init() into two
functions, one to set/unset the host notifier and one to hook
up the AIO handler.

A further improvement is to convert virtio-scsi-pci to use the
virtio_bus_set_host_notifiers() API in order to batch the setup
and teardown of ioeventfds. This is expected to significantly
reduce the boot time of VMs with a high number of vCPUs.
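
With the split, the start path can assign all host notifiers without
holding the AioContext lock and only take it to hook up the handlers.
Sketch for a single queue, where handler stands for the queue's
dataplane callback:

    rc = virtio_scsi_set_host_notifier(s, vq, n);  /* no AioContext lock */
    if (rc != 0) {
        goto fail_host_notifiers;
    }
    [...]
    aio_context_acquire(s->ctx);
    virtio_scsi_set_host_notifier_handler(s, vq, handler);
    aio_context_release(s->ctx);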

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/scsi/virtio-scsi-dataplane.c | 46 ++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 4ad879340645..11b53ab767be 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,8 +94,7 @@ static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
     return progress;
 }
 
-static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-                                  VirtIOHandleAIOOutput fn)
+static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
     int rc;
@@ -109,10 +108,15 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
         return rc;
     }
 
-    virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
     return 0;
 }
 
+static void virtio_scsi_set_host_notifier_handler(VirtIOSCSI *s, VirtQueue *vq,
+                                                  VirtIOHandleAIOOutput fn)
+{
+    virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
+}
+
 /* Context: BH in IOThread */
 static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
@@ -154,38 +158,44 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
         goto fail_guest_notifiers;
     }
 
-    aio_context_acquire(s->ctx);
-    rc = virtio_scsi_vring_init(s, vs->ctrl_vq, 0,
-                                virtio_scsi_data_plane_handle_ctrl);
-    if (rc) {
-        goto fail_vrings;
+    rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
+    if (rc != 0) {
+        goto fail_host_notifiers;
     }
 
     vq_init_count++;
-    rc = virtio_scsi_vring_init(s, vs->event_vq, 1,
-                                virtio_scsi_data_plane_handle_event);
-    if (rc) {
-        goto fail_vrings;
+    rc = virtio_scsi_set_host_notifier(s, vs->event_vq, 1);
+    if (rc != 0) {
+        goto fail_host_notifiers;
     }
 
     vq_init_count++;
+
     for (i = 0; i < vs->conf.num_queues; i++) {
-        rc = virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2,
-                                    virtio_scsi_data_plane_handle_cmd);
+        rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
         if (rc) {
-            goto fail_vrings;
+            goto fail_host_notifiers;
         }
         vq_init_count++;
     }
 
+    aio_context_acquire(s->ctx);
+    virtio_scsi_set_host_notifier_handler(s, vs->ctrl_vq,
+                                          virtio_scsi_data_plane_handle_ctrl);
+    virtio_scsi_set_host_notifier_handler(s, vs->event_vq,
+                                          virtio_scsi_data_plane_handle_event);
+
+    for (i = 0; i < vs->conf.num_queues; i++) {
+        virtio_scsi_set_host_notifier_handler(s, vs->cmd_vqs[i],
+                                             virtio_scsi_data_plane_handle_cmd);
+    }
+
     s->dataplane_starting = false;
     s->dataplane_started = true;
     aio_context_release(s->ctx);
     return 0;
 
-fail_vrings:
-    aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-    aio_context_release(s->ctx);
+fail_host_notifiers:
     for (i = 0; i < vq_init_count; i++) {
         virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
         virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-- 
2.26.3




* [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers()
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (6 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 7/8] virtio-scsi: Set host notifiers and callbacks separately Greg Kurz
@ 2021-03-25 15:07 ` Greg Kurz
  2021-03-29 17:28   ` Stefan Hajnoczi
  2021-03-25 17:05 ` [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Michael S. Tsirkin
  2021-03-29 17:35 ` Stefan Hajnoczi
  9 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin, Greg Kurz,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

This allows the virtio-scsi-pci device to batch additions and deletions
of host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 6m13.969s down to 1m4.268s for a pseries
machine with 384 vCPUs.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/scsi/virtio-scsi-dataplane.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 11b53ab767be..eec2b6e19a5b 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -141,6 +141,7 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
+    VirtioBusState *bus = VIRTIO_BUS(qbus);
 
     if (s->dataplane_started ||
         s->dataplane_starting ||
@@ -171,12 +172,9 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
     vq_init_count++;
 
-    for (i = 0; i < vs->conf.num_queues; i++) {
-        rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
-        if (rc) {
-            goto fail_host_notifiers;
-        }
-        vq_init_count++;
+    rc = virtio_bus_set_host_notifiers(bus, vs->conf.num_queues, 2, true);
+    if (rc) {
+        goto fail_host_notifiers;
     }
 
     aio_context_acquire(s->ctx);
@@ -196,10 +194,13 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     return 0;
 
 fail_host_notifiers:
-    for (i = 0; i < vq_init_count; i++) {
-        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-    }
+    /*
+     * Only host notifiers for ctrl_vq and event_vq can be set at
+     * this point. Notifiers for cmd_vqs[] have been reverted by
+     * virtio_bus_set_host_notifiers() already.
+     */
+    assert(vq_init_count <= 2);
+    virtio_bus_set_host_notifiers(bus, vq_init_count, 0, false);
     k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
 fail_guest_notifiers:
     s->dataplane_fenced = true;
@@ -215,7 +216,6 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-    int i;
 
     if (!s->dataplane_started || s->dataplane_stopping) {
         return;
@@ -235,10 +235,8 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 
     blk_drain_all(); /* ensure there are no in-flight requests */
 
-    for (i = 0; i < vs->conf.num_queues + 2; i++) {
-        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-    }
+    virtio_bus_set_host_notifiers(VIRTIO_BUS(qbus), vs->conf.num_queues + 2, 0,
+                                  false);
 
     /* Clean up guest notifier (irq) */
     k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
-- 
2.26.3




* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (7 preceding siblings ...)
  2021-03-25 15:07 ` [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-25 17:05 ` Michael S. Tsirkin
  2021-03-25 17:43   ` Stefan Hajnoczi
  2021-03-25 18:05   ` Greg Kurz
  2021-03-29 17:35 ` Stefan Hajnoczi
  9 siblings, 2 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2021-03-25 17:05 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Jason Wang, qemu-devel,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> a serious slowdown may be observed on setups with a big enough number
> of vCPUs.
> 
> Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> 
> 1		0m20.922s	0m21.346s
> 2		0m21.230s	0m20.350s
> 4		0m21.761s	0m20.997s
> 8		0m22.770s	0m20.051s
> 16		0m22.038s	0m19.994s
> 32		0m22.928s	0m20.803s
> 64		0m26.583s	0m22.953s
> 128		0m41.273s	0m32.333s
> 256		2m4.727s 	1m16.924s
> 384		6m5.563s 	3m26.186s
> 
> Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> the ioeventfds:
> 
>  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
>   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
>   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
>   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> 
> address_space_update_ioeventfds() is called when committing an MR
> transaction, i.e. for each ioeventfd with the current code base,
> and it internally loops over all ioeventfds:
> 
> static void address_space_update_ioeventfds(AddressSpace *as)
> {
> [...]
>     FOR_EACH_FLAT_RANGE(fr, view) {
>         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> 
> This means that the setup of ioeventfds for these devices has
> quadratic time complexity.
> 
> This series introduces generic APIs to allow batch creation and deletion
> of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> greatly improves the numbers:
> 
> 1		0m21.271s	0m22.076s
> 2		0m20.912s	0m19.716s
> 4		0m20.508s	0m19.310s
> 8		0m21.374s	0m20.273s
> 16		0m21.559s	0m21.374s
> 32		0m22.532s	0m21.271s
> 64		0m26.550s	0m22.007s
> 128		0m29.115s	0m27.446s
> 256		0m44.752s	0m41.004s
> 384		1m2.884s	0m58.023s
> 
> The series deliberately spans multiple subsystems for easier
> review and experimenting. It also makes some preliminary fixes
> along the way. It is thus posted as an RFC for now, but if the
> general idea is acceptable, I guess a non-RFC could be posted,
> and the feature maybe extended to some other devices that might
> suffer from similar scaling issues, e.g. vhost-scsi-pci,
> vhost-user-scsi-pci and vhost-user-blk-pci, though I haven't checked.
> 
> This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> which reported the issue for virtio-scsi-pci.


Series looks ok from a quick look ...

this is a regression isn't it?
So I guess we'll need that in 6.0 or revert the # of vqs
change for now ...

> Greg Kurz (8):
>   memory: Allow eventfd add/del without starting a transaction
>   virtio: Introduce virtio_bus_set_host_notifiers()
>   virtio: Add API to batch set host notifiers
>   virtio-pci: Batch add/del ioeventfds in a single MR transaction
>   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
>   virtio-blk: Use virtio_bus_set_host_notifiers()
>   virtio-scsi: Set host notifiers and callbacks separately
>   virtio-scsi: Use virtio_bus_set_host_notifiers()
> 
>  hw/virtio/virtio-pci.h          |  1 +
>  include/exec/memory.h           | 48 ++++++++++++++++------
>  include/hw/virtio/virtio-bus.h  |  7 ++++
>  hw/block/dataplane/virtio-blk.c | 26 +++++-------
>  hw/scsi/virtio-scsi-dataplane.c | 68 ++++++++++++++++++--------------
>  hw/virtio/virtio-bus.c          | 70 +++++++++++++++++++++++++++++++++
>  hw/virtio/virtio-pci.c          | 53 +++++++++++++++++--------
>  softmmu/memory.c                | 42 ++++++++++++--------
>  8 files changed, 225 insertions(+), 90 deletions(-)
> 
> -- 
> 2.26.3
> 




* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-25 17:05 ` [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Michael S. Tsirkin
@ 2021-03-25 17:43   ` Stefan Hajnoczi
  2021-03-25 20:51     ` Greg Kurz
  2021-03-25 18:05   ` Greg Kurz
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-25 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Fam Zheng, Kevin Wolf, qemu-block, qemu-devel, Jason Wang,
	Greg Kurz, Max Reitz, Paolo Bonzini, David Gibson

On Thu, Mar 25, 2021 at 01:05:16PM -0400, Michael S. Tsirkin wrote:
> On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slowdown may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> > 1		0m20.922s	0m21.346s
> > 2		0m21.230s	0m20.350s
> > 4		0m21.761s	0m20.997s
> > 8		0m22.770s	0m20.051s
> > 16		0m22.038s	0m19.994s
> > 32		0m22.928s	0m20.803s
> > 64		0m26.583s	0m22.953s
> > 128		0m41.273s	0m32.333s
> > 256		2m4.727s 	1m16.924s
> > 384		6m5.563s 	3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops over all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> >     FOR_EACH_FLAT_RANGE(fr, view) {
> >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series introduces generic APIs to allow batch creation and deletion
> > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > greatly improves the numbers:
> > 
> > 1		0m21.271s	0m22.076s
> > 2		0m20.912s	0m19.716s
> > 4		0m20.508s	0m19.310s
> > 8		0m21.374s	0m20.273s
> > 16		0m21.559s	0m21.374s
> > 32		0m22.532s	0m21.271s
> > 64		0m26.550s	0m22.007s
> > 128		0m29.115s	0m27.446s
> > 256		0m44.752s	0m41.004s
> > 384		1m2.884s	0m58.023s
> > 
> > The series deliberately spans multiple subsystems for easier
> > review and experimenting. It also makes some preliminary fixes
> > along the way. It is thus posted as an RFC for now, but if the
> > general idea is acceptable, I guess a non-RFC could be posted,
> > and the feature maybe extended to some other devices that might
> > suffer from similar scaling issues, e.g. vhost-scsi-pci,
> > vhost-user-scsi-pci and vhost-user-blk-pci, though I haven't checked.
> > 
> > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > which reported the issue for virtio-scsi-pci.
> 
> 
> Series looks ok from a quick look ...
> 
> this is a regression isn't it?
> So I guess we'll need that in 6.0 or revert the # of vqs
> change for now ...

Commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
default num_queues to -smp N") was already released in QEMU 5.2.0. It is
not a QEMU 6.0 regression.

I'll review this series on Monday.

Stefan


* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-25 17:05 ` [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Michael S. Tsirkin
  2021-03-25 17:43   ` Stefan Hajnoczi
@ 2021-03-25 18:05   ` Greg Kurz
  1 sibling, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 18:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Jason Wang, qemu-devel,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini, David Gibson

On Thu, 25 Mar 2021 13:05:16 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slowdown may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> > 1		0m20.922s	0m21.346s
> > 2		0m21.230s	0m20.350s
> > 4		0m21.761s	0m20.997s
> > 8		0m22.770s	0m20.051s
> > 16		0m22.038s	0m19.994s
> > 32		0m22.928s	0m20.803s
> > 64		0m26.583s	0m22.953s
> > 128		0m41.273s	0m32.333s
> > 256		2m4.727s 	1m16.924s
> > 384		6m5.563s 	3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops over all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> >     FOR_EACH_FLAT_RANGE(fr, view) {
> >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series introduces generic APIs to allow batch creation and deletion
> > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > greatly improves the numbers:
> > 
> > 1		0m21.271s	0m22.076s
> > 2		0m20.912s	0m19.716s
> > 4		0m20.508s	0m19.310s
> > 8		0m21.374s	0m20.273s
> > 16		0m21.559s	0m21.374s
> > 32		0m22.532s	0m21.271s
> > 64		0m26.550s	0m22.007s
> > 128		0m29.115s	0m27.446s
> > 256		0m44.752s	0m41.004s
> > 384		1m2.884s	0m58.023s
> > 
> > The series deliberately spans multiple subsystems for easier
> > review and experimenting. It also makes some preliminary fixes
> > along the way. It is thus posted as an RFC for now, but if the
> > general idea is acceptable, I guess a non-RFC could be posted,
> > and the feature maybe extended to some other devices that might
> > suffer from similar scaling issues, e.g. vhost-scsi-pci,
> > vhost-user-scsi-pci and vhost-user-blk-pci, though I haven't checked.
> > 
> > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > which reported the issue for virtio-scsi-pci.
> 
> 
> Series looks ok from a quick look ...
> 
> this is a regression isn't it?

This is a regression only if we consider the defaults. Manually setting
num_queues to a high value already affects existing devices.

> So I guess we'll need that in 6.0 or revert the # of vqs
> change for now ...
> 

Not sure it is safe to merge these fixes this late... also,
as said above, I've only tested virtio-scsi and virtio-blk
but I believe the vhost-user-* variants might be affected too.

Reverting the # of vqs would really be a pity IMHO. What
about mitigating the effects? E.g. enforce the previous
behavior only if # vcpus > 64?
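
Something like this hedged sketch, assuming the VIRTIO_BLK_AUTO_NUM_QUEUES
default and the virtio_pci_optimal_num_queues() helper introduced by
commit 9445e1e15, with an arbitrary cap of 64 (not part of this series):

    /* Cap the automatic default only; an explicit num_queues=N from
     * the user is left untouched. */
    if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
        conf->num_queues = MIN(virtio_pci_optimal_num_queues(0), 64);
    }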

> > Greg Kurz (8):
> >   memory: Allow eventfd add/del without starting a transaction
> >   virtio: Introduce virtio_bus_set_host_notifiers()
> >   virtio: Add API to batch set host notifiers
> >   virtio-pci: Batch add/del ioeventfds in a single MR transaction
> >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> >   virtio-blk: Use virtio_bus_set_host_notifiers()
> >   virtio-scsi: Set host notifiers and callbacks separately
> >   virtio-scsi: Use virtio_bus_set_host_notifiers()
> > 
> >  hw/virtio/virtio-pci.h          |  1 +
> >  include/exec/memory.h           | 48 ++++++++++++++++------
> >  include/hw/virtio/virtio-bus.h  |  7 ++++
> >  hw/block/dataplane/virtio-blk.c | 26 +++++-------
> >  hw/scsi/virtio-scsi-dataplane.c | 68 ++++++++++++++++++--------------
> >  hw/virtio/virtio-bus.c          | 70 +++++++++++++++++++++++++++++++++
> >  hw/virtio/virtio-pci.c          | 53 +++++++++++++++++--------
> >  softmmu/memory.c                | 42 ++++++++++++--------
> >  8 files changed, 225 insertions(+), 90 deletions(-)
> > 
> > -- 
> > 2.26.3
> > 
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-25 17:43   ` Stefan Hajnoczi
@ 2021-03-25 20:51     ` Greg Kurz
  0 siblings, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-25 20:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	Jason Wang, qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

On Thu, 25 Mar 2021 17:43:10 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 01:05:16PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > > a serious slow down may be observed on setups with a big enough number
> > > of vCPUs.
> > > 
> > > Example with a pseries guest on a two-socket POWER9 system (128 HW threads):
> > > 
> > > 1		0m20.922s	0m21.346s
> > > 2		0m21.230s	0m20.350s
> > > 4		0m21.761s	0m20.997s
> > > 8		0m22.770s	0m20.051s
> > > 16		0m22.038s	0m19.994s
> > > 32		0m22.928s	0m20.803s
> > > 64		0m26.583s	0m22.953s
> > > 128		0m41.273s	0m32.333s
> > > 256		2m4.727s 	1m16.924s
> > > 384		6m5.563s 	3m26.186s
> > > 
> > > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > > the ioeventfds:
> > > 
> > >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> > >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> > >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> > >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > > 
> > > address_space_update_ioeventfds() is called when committing an MR
> > > transaction, i.e. for each ioeventfd with the current code base,
> > > and it internally loops on all ioeventfds:
> > > 
> > > static void address_space_update_ioeventfds(AddressSpace *as)
> > > {
> > > [...]
> > >     FOR_EACH_FLAT_RANGE(fr, view) {
> > >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > > 
> > > This means that the setup of ioeventfds for these devices has
> > > quadratic time complexity.
> > > 
> > > This series introduces generic APIs to allow batch creation and deletion
> > > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > > greatly improves the numbers:
> > > 
> > > 1		0m21.271s	0m22.076s
> > > 2		0m20.912s	0m19.716s
> > > 4		0m20.508s	0m19.310s
> > > 8		0m21.374s	0m20.273s
> > > 16		0m21.559s	0m21.374s
> > > 32		0m22.532s	0m21.271s
> > > 64		0m26.550s	0m22.007s
> > > 128		0m29.115s	0m27.446s
> > > 256		0m44.752s	0m41.004s
> > > 384		1m2.884s	0m58.023s
> > > 
> > > The series deliberately spans multiple subsystems for easier
> > > review and experimenting. It also does some preliminary fixes on
> > > the way. It is thus posted as an RFC for now, but if the general
> > > idea is acceptable, I guess a non-RFC could be posted and maybe
> > > extend the feature to some other devices that might suffer from
> > > similar scaling issues, e.g. vhost-scsi-pci, vhost-user-scsi-pci
> > > and vhost-user-blk-pci, even though I haven't checked.
> > > 
> > > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > > which reported the issue for virtio-scsi-pci.
> > 
> > 
> > Series looks ok from a quick look ...
> > 
> > this is a regression, isn't it?
> > So I guess we'll need that in 6.0 or revert the # of vqs
> > change for now ...
> 
> Commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
> default num_queues to -smp N") was already released in QEMU 5.2.0. It is
> not a QEMU 6.0 regression.
> 

Oh, you're right. I've just checked and QEMU 5.2.0 has the same problem.

> I'll review this series on Monday.
> 

Thanks!

> Stefan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/8] memory: Allow eventfd add/del without starting a transaction
  2021-03-25 15:07 ` [RFC 1/8] memory: Allow eventfd add/del without starting a transaction Greg Kurz
@ 2021-03-29 17:03   ` Stefan Hajnoczi
  2021-03-30  7:47     ` Greg Kurz
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:03 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 2443 bytes --]

On Thu, Mar 25, 2021 at 04:07:28PM +0100, Greg Kurz wrote:
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5728a681b27d..98ed552e001c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1848,13 +1848,25 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>   * @match_data: whether to match against @data, instead of just @addr
>   * @data: the data to match against the guest write
>   * @e: event notifier to be triggered when @addr, @size, and @data all match.
> + * @transaction: whether to start a transaction for the change

"start" is unclear. Does it begin a transaction and return with the
transaction unfinished? I think instead the function performs the
eventfd addition within a transaction. It would be nice to clarify this.

>   **/
> -void memory_region_add_eventfd(MemoryRegion *mr,
> -                               hwaddr addr,
> -                               unsigned size,
> -                               bool match_data,
> -                               uint64_t data,
> -                               EventNotifier *e);
> +void memory_region_add_eventfd_full(MemoryRegion *mr,
> +                                    hwaddr addr,
> +                                    unsigned size,
> +                                    bool match_data,
> +                                    uint64_t data,
> +                                    EventNotifier *e,
> +                                    bool transaction);
> +
> +static inline void memory_region_add_eventfd(MemoryRegion *mr,
> +                                             hwaddr addr,
> +                                             unsigned size,
> +                                             bool match_data,
> +                                             uint64_t data,
> +                                             EventNotifier *e)
> +{
> +    memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, true);
> +}
>  
>  /**
>   * memory_region_del_eventfd: Cancel an eventfd.
> @@ -1868,13 +1880,25 @@ void memory_region_add_eventfd(MemoryRegion *mr,
>   * @match_data: whether to match against @data, instead of just @addr
>   * @data: the data to match against the guest write
>   * @e: event notifier to be triggered when @addr, @size, and @data all match.
> + * @transaction: whether to start a transaction for the change

Same here.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers()
  2021-03-25 15:07 ` [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-29 17:06   ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:06 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

On Thu, Mar 25, 2021 at 04:07:29PM +0100, Greg Kurz wrote:
> Multiqueue devices such as virtio-scsi or virtio-blk all open-code the
> same pattern to set up / tear down host notifiers of the request virtqueues.
> Consolidate the pattern in a new virtio_bus_set_host_notifiers() API.
> Since virtio-scsi and virtio-blk both fall back to userspace if host
> notifiers can't be set, e.g. on file descriptor exhaustion, go for a
> warning rather than an error. Also make the warning one-shot to avoid
> flooding the logs, since the message would very likely be the same for
> all virtqueues.
> 
> Devices will be converted to use virtio_bus_set_host_notifiers() in
> separate patches.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  include/hw/virtio/virtio-bus.h |  3 +++
>  hw/virtio/virtio-bus.c         | 36 ++++++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-25 15:07 ` [RFC 3/8] virtio: Add API to batch set host notifiers Greg Kurz
@ 2021-03-29 17:10   ` Stefan Hajnoczi
  2021-03-30 10:17     ` Greg Kurz
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:10 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 4658 bytes --]

On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> Introduce VirtioBusClass methods to begin and commit a transaction
> of setting/unsetting host notifiers. These handlers will be implemented
> by virtio-pci to batch addition and deletion of ioeventfds for multiqueue
> devices like virtio-scsi-pci or virtio-blk-pci.
> 
> Convert virtio_bus_set_host_notifiers() to use these handlers. Note that
> virtio_bus_cleanup_host_notifier() closes eventfds that could still be
> passed to the KVM_IOEVENTFD ioctl() when the transaction ends, making the
> ioctl fail with EBADF. The cleanup of the host notifiers is thus pushed to a
> separate loop in virtio_bus_unset_and_cleanup_host_notifiers(), after
> transaction commit.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  include/hw/virtio/virtio-bus.h |  4 ++++
>  hw/virtio/virtio-bus.c         | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> index 6d1e4ee3e886..99704b2c090a 100644
> --- a/include/hw/virtio/virtio-bus.h
> +++ b/include/hw/virtio/virtio-bus.h
> @@ -82,6 +82,10 @@ struct VirtioBusClass {
>       */
>      int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
>                              int n, bool assign);
> +
> +    void (*ioeventfd_assign_begin)(DeviceState *d);
> +    void (*ioeventfd_assign_commit)(DeviceState *d);

Please add doc comments for these new functions.

> +
>      /*
>       * Whether queue number n is enabled.
>       */
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index c9e7cdb5c161..156484c4ca14 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -295,6 +295,28 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign)
>      return r;
>  }
>  
> +static void virtio_bus_set_host_notifier_begin(VirtioBusState *bus)
> +{
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> +    DeviceState *proxy = DEVICE(BUS(bus)->parent);
> +
> +    if (k->ioeventfd_assign_begin) {
> +        assert(k->ioeventfd_assign_commit);
> +        k->ioeventfd_assign_begin(proxy);
> +    }
> +}
> +
> +static void virtio_bus_set_host_notifier_commit(VirtioBusState *bus)
> +{
> +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> +    DeviceState *proxy = DEVICE(BUS(bus)->parent);
> +
> +    if (k->ioeventfd_assign_commit) {
> +        assert(k->ioeventfd_assign_begin);
> +        k->ioeventfd_assign_commit(proxy);
> +    }
> +}
> +
>  void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>  {
>      VirtIODevice *vdev = virtio_bus_get_device(bus);
> @@ -308,6 +330,7 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
>      event_notifier_cleanup(notifier);
>  }
>  
> +/* virtio_bus_set_host_notifier_begin() must have been called */
>  static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
>                                                          int nvqs, int n_offset)
>  {
> @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
>  
>      for (i = 0; i < nvqs; i++) {
>          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> +    }
> +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */

assert(memory_region_transaction_depth == 0)?

> +    virtio_bus_set_host_notifier_commit(bus);
> +    for (i = 0; i < nvqs; i++) {
>          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
>      }
>  }
> @@ -327,17 +354,24 @@ int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
>      int rc;
>  
>      if (assign) {
> +        virtio_bus_set_host_notifier_begin(bus);
> +
>          for (i = 0; i < nvqs; i++) {
>              rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
>              if (rc != 0) {
>                  warn_report_once("%s: Failed to set host notifier (%s).\n",
>                                   vdev->name, strerror(-rc));
>  
> +                /* This also calls virtio_bus_set_host_notifier_commit() */
>                  virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
>                  return rc;
>              }
>          }
> +
> +        virtio_bus_set_host_notifier_commit(bus);
>      } else {
> +        virtio_bus_set_host_notifier_begin(bus);
> +        /* This also calls virtio_bus_set_host_notifier_commit() */
>          virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
>      }
>  
> -- 
> 2.26.3
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction
  2021-03-25 15:07 ` [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction Greg Kurz
@ 2021-03-29 17:24   ` Stefan Hajnoczi
  2021-03-30 10:29     ` Greg Kurz
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:24 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]

On Thu, Mar 25, 2021 at 04:07:31PM +0100, Greg Kurz wrote:
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 1b1942d521cc..0279e5671bcb 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2368,7 +2368,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
>      if (size) {
>          adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
>      }
> -    if (transaction) {
> +    if (!transaction) {
>          memory_region_transaction_begin();
>      }
>      for (i = 0; i < mr->ioeventfd_nb; ++i) {
> @@ -2383,7 +2383,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
>              sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
>      mr->ioeventfds[i] = mrfd;
>      ioeventfd_update_pending |= mr->enabled;
> -    if (transaction) {
> +    if (!transaction) {
>          memory_region_transaction_commit();
>      }

Looks like these two hunks belong in a previous patch.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  2021-03-25 15:07 ` [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start() Greg Kurz
@ 2021-03-29 17:25   ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:25 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

On Thu, Mar 25, 2021 at 04:07:32PM +0100, Greg Kurz wrote:
> When dataplane multiqueue support was added in QEMU 2.7, the path
> that would roll back guest notifier assignment in case of error
> simply got dropped.
> 
> Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
> another error path was introduced, but it omits to roll back both
> host and guest notifiers.
> 
> It seems cleaner to fix the rollback path in one go. The patch is
> simple enough that it can be adjusted if backported to a pre-4.1
> QEMU.
> 
> Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
> Cc: stefanha@redhat.com
> Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
> Cc: kwolf@redhat.com
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/block/dataplane/virtio-blk.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers()
  2021-03-25 15:07 ` [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-29 17:26   ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:26 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On Thu, Mar 25, 2021 at 04:07:33PM +0100, Greg Kurz wrote:
> This allows the virtio-blk-pci device to batch additions and deletions
> of host notifiers. This significantly improves boot time of VMs with a
> high number of vCPUs, e.g. from 3m26.408s down to 0m59.923s for a pseries
> machine with 384 vCPUs.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/block/dataplane/virtio-blk.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers()
  2021-03-25 15:07 ` [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers() Greg Kurz
@ 2021-03-29 17:28   ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:28 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

On Thu, Mar 25, 2021 at 04:07:35PM +0100, Greg Kurz wrote:
> This allows the virtio-scsi-pci device to batch additions and deletions
> of host notifiers. This significantly improves boot time of VMs with a
> high number of vCPUs, e.g. from 6m13.969s down to 1m4.268s for a pseries
> machine with 384 vCPUs.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/scsi/virtio-scsi-dataplane.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
                   ` (8 preceding siblings ...)
  2021-03-25 17:05 ` [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Michael S. Tsirkin
@ 2021-03-29 17:35 ` Stefan Hajnoczi
  2021-03-30 13:15   ` Greg Kurz
  9 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-29 17:35 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 2801 bytes --]

On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> a serious slow down may be observed on setups with a big enough number
> of vCPUs.
> 
> Example with a pseries guest on a two-socket POWER9 system (128 HW threads):
> 
> 1		0m20.922s	0m21.346s
> 2		0m21.230s	0m20.350s
> 4		0m21.761s	0m20.997s
> 8		0m22.770s	0m20.051s
> 16		0m22.038s	0m19.994s
> 32		0m22.928s	0m20.803s
> 64		0m26.583s	0m22.953s
> 128		0m41.273s	0m32.333s
> 256		2m4.727s 	1m16.924s
> 384		6m5.563s 	3m26.186s
> 
> Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> the ioeventfds:
> 
>  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
>   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
>   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
>   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> 
> address_space_update_ioeventfds() is called when committing an MR
> transaction, i.e. for each ioeventfd with the current code base,
> and it internally loops on all ioeventfds:
> 
> static void address_space_update_ioeventfds(AddressSpace *as)
> {
> [...]
>     FOR_EACH_FLAT_RANGE(fr, view) {
>         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> 
> This means that the setup of ioeventfds for these devices has
> quadratic time complexity.
> 
> This series introduces generic APIs to allow batch creation and deletion
> of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> greatly improves the numbers:
> 
> 1		0m21.271s	0m22.076s
> 2		0m20.912s	0m19.716s
> 4		0m20.508s	0m19.310s
> 8		0m21.374s	0m20.273s
> 16		0m21.559s	0m21.374s
> 32		0m22.532s	0m21.271s
> 64		0m26.550s	0m22.007s
> 128		0m29.115s	0m27.446s
> 256		0m44.752s	0m41.004s
> 384		1m2.884s	0m58.023s

Excellent numbers!

I wonder if the code can be simplified since
memory_region_transaction_begin/end() supports nesting. Why not call
them directly from the device model instead of introducing callbacks in
core virtio and virtio-pci code?
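
For illustration, something like this in the device model (an
untested sketch, error handling omitted; i, nvqs and qbus stand
for the usual names in the dataplane start functions):

    memory_region_transaction_begin();
    for (i = 0; i < nvqs; i++) {
        /* begin/end nesting folds each per-queue ioeventfd update
         * into the single enclosing transaction */
        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
    }
    memory_region_transaction_commit();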

Also, do you think there are other opportunities to have a long
transaction to batch up machine init, device hotplug, etc? It's not
clear to me when transactions must be ended. Clearly it's necessary to
end the transaction if we need to do something that depends on the
MemoryRegion, eventfd, etc being updated. But most of the time there is
no immediate need to end the transaction and more code could share the
same transaction before we go back to the event loop or vcpu thread.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/8] memory: Allow eventfd add/del without starting a transaction
  2021-03-29 17:03   ` Stefan Hajnoczi
@ 2021-03-30  7:47     ` Greg Kurz
  2021-03-30 10:16       ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-30  7:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 3050 bytes --]

On Mon, 29 Mar 2021 18:03:49 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 04:07:28PM +0100, Greg Kurz wrote:
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5728a681b27d..98ed552e001c 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -1848,13 +1848,25 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >   * @match_data: whether to match against @data, instead of just @addr
> >   * @data: the data to match against the guest write
> >   * @e: event notifier to be triggered when @addr, @size, and @data all match.
> > + * @transaction: whether to start a transaction for the change
> 
> "start" is unclear. Does it begin a transaction and return with the
> transaction unfinished? I think instead the function performs the
> eventfd addition within a transaction. It would be nice to clarify this.
> 

What about: 

 * @transaction: if true, the eventfd is added within a nested transaction,
 *               if false, it is up to the caller to ensure this is called
 *               within a transaction.

> >   **/
> > -void memory_region_add_eventfd(MemoryRegion *mr,
> > -                               hwaddr addr,
> > -                               unsigned size,
> > -                               bool match_data,
> > -                               uint64_t data,
> > -                               EventNotifier *e);
> > +void memory_region_add_eventfd_full(MemoryRegion *mr,
> > +                                    hwaddr addr,
> > +                                    unsigned size,
> > +                                    bool match_data,
> > +                                    uint64_t data,
> > +                                    EventNotifier *e,
> > +                                    bool transaction);
> > +
> > +static inline void memory_region_add_eventfd(MemoryRegion *mr,
> > +                                             hwaddr addr,
> > +                                             unsigned size,
> > +                                             bool match_data,
> > +                                             uint64_t data,
> > +                                             EventNotifier *e)
> > +{
> > +    memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, true);
> > +}
> >  
> >  /**
> >   * memory_region_del_eventfd: Cancel an eventfd.
> > @@ -1868,13 +1880,25 @@ void memory_region_add_eventfd(MemoryRegion *mr,
> >   * @match_data: whether to match against @data, instead of just @addr
> >   * @data: the data to match against the guest write
> >   * @e: event notifier to be triggered when @addr, @size, and @data all match.
> > + * @transaction: whether to start a transaction for the change
> 
> Same here.

and:

 * @transaction: if true, the eventfd is cancelled within a nested transaction,
 *               if false, it is up to the caller to ensure this is called
 *               within a transaction.

?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/8] memory: Allow eventfd add/del without starting a transaction
  2021-03-30  7:47     ` Greg Kurz
@ 2021-03-30 10:16       ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-30 10:16 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 1274 bytes --]

On Tue, Mar 30, 2021 at 09:47:49AM +0200, Greg Kurz wrote:
> On Mon, 29 Mar 2021 18:03:49 +0100
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Thu, Mar 25, 2021 at 04:07:28PM +0100, Greg Kurz wrote:
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 5728a681b27d..98ed552e001c 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -1848,13 +1848,25 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> > >   * @match_data: whether to match against @data, instead of just @addr
> > >   * @data: the data to match against the guest write
> > >   * @e: event notifier to be triggered when @addr, @size, and @data all match.
> > > + * @transaction: whether to start a transaction for the change
> > 
> > "start" is unclear. Does it begin a transaction and return with the
> > transaction unfinished? I think instead the function performs the
> > eventfd addition within a transaction. It would be nice to clarify this.
> > 
> 
> What about: 
> 
>  * @transaction: if true, the eventfd is added within a nested transaction,
>  *               if false, it is up to the caller to ensure this is called
>  *               within a transaction.

Sounds good, thanks!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-29 17:10   ` Stefan Hajnoczi
@ 2021-03-30 10:17     ` Greg Kurz
  2021-03-30 13:55       ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-30 10:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 5517 bytes --]

On Mon, 29 Mar 2021 18:10:57 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > Introduce VirtioBusClass methods to begin and commit a transaction
> > of setting/unsetting host notifiers. These handlers will be implemented
> > by virtio-pci to batch addition and deletion of ioeventfds for multiqueue
> > devices like virtio-scsi-pci or virtio-blk-pci.
> > 
> > Convert virtio_bus_set_host_notifiers() to use these handlers. Note that
> > virtio_bus_cleanup_host_notifier() closes eventfds that could still be
> > passed to the KVM_IOEVENTFD ioctl() when the transaction ends, making the
> > ioctl fail with EBADF. The cleanup of the host notifiers is thus pushed to a
> > separate loop in virtio_bus_unset_and_cleanup_host_notifiers(), after
> > transaction commit.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  include/hw/virtio/virtio-bus.h |  4 ++++
> >  hw/virtio/virtio-bus.c         | 34 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 38 insertions(+)
> > 
> > diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> > index 6d1e4ee3e886..99704b2c090a 100644
> > --- a/include/hw/virtio/virtio-bus.h
> > +++ b/include/hw/virtio/virtio-bus.h
> > @@ -82,6 +82,10 @@ struct VirtioBusClass {
> >       */
> >      int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
> >                              int n, bool assign);
> > +
> > +    void (*ioeventfd_assign_begin)(DeviceState *d);
> > +    void (*ioeventfd_assign_commit)(DeviceState *d);
> 
> Please add doc comments for these new functions.
> 

Will do.

> > +
> >      /*
> >       * Whether queue number n is enabled.
> >       */
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index c9e7cdb5c161..156484c4ca14 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -295,6 +295,28 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign)
> >      return r;
> >  }
> >  
> > +static void virtio_bus_set_host_notifier_begin(VirtioBusState *bus)
> > +{
> > +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> > +    DeviceState *proxy = DEVICE(BUS(bus)->parent);
> > +
> > +    if (k->ioeventfd_assign_begin) {
> > +        assert(k->ioeventfd_assign_commit);
> > +        k->ioeventfd_assign_begin(proxy);
> > +    }
> > +}
> > +
> > +static void virtio_bus_set_host_notifier_commit(VirtioBusState *bus)
> > +{
> > +    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> > +    DeviceState *proxy = DEVICE(BUS(bus)->parent);
> > +
> > +    if (k->ioeventfd_assign_commit) {
> > +        assert(k->ioeventfd_assign_begin);
> > +        k->ioeventfd_assign_commit(proxy);
> > +    }
> > +}
> > +
> >  void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >  {
> >      VirtIODevice *vdev = virtio_bus_get_device(bus);
> > @@ -308,6 +330,7 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >      event_notifier_cleanup(notifier);
> >  }
> >  
> > +/* virtio_bus_set_host_notifier_begin() must have been called */
> >  static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> >                                                          int nvqs, int n_offset)
> >  {
> > @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> >  
> >      for (i = 0; i < nvqs; i++) {
> >          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > +    }
> > +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
> 
> assert(memory_region_transaction_depth == 0)?
> 

Hmm... apart from the fact that memory_region_transaction_depth is
a memory-internal thing that shouldn't be exposed here, it seems to
me that memory_region_transaction_depth can be != 0, e.g. when
batching is used... or am I missing something?

I was actually thinking of adding some asserts for that in the
memory_region_*_eventfd_full() functions introduced by patch 1.

    if (!transaction) {
        memory_region_transaction_begin();
    }
    assert(memory_region_transaction_depth != 0);

> > +    virtio_bus_set_host_notifier_commit(bus);
> > +    for (i = 0; i < nvqs; i++) {
> >          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> >      }
> >  }
> > @@ -327,17 +354,24 @@ int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
> >      int rc;
> >  
> >      if (assign) {
> > +        virtio_bus_set_host_notifier_begin(bus);
> > +
> >          for (i = 0; i < nvqs; i++) {
> >              rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
> >              if (rc != 0) {
> >                  warn_report_once("%s: Failed to set host notifier (%s).\n",
> >                                   vdev->name, strerror(-rc));
> >  
> > +                /* This also calls virtio_bus_set_host_notifier_commit() */
> >                  virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
> >                  return rc;
> >              }
> >          }
> > +
> > +        virtio_bus_set_host_notifier_commit(bus);
> >      } else {
> > +        virtio_bus_set_host_notifier_begin(bus);
> > +        /* This also calls virtio_bus_set_host_notifier_commit() */
> >          virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
> >      }
> >  
> > -- 
> > 2.26.3
> > 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction
  2021-03-29 17:24   ` Stefan Hajnoczi
@ 2021-03-30 10:29     ` Greg Kurz
  0 siblings, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-30 10:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

On Mon, 29 Mar 2021 18:24:40 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 04:07:31PM +0100, Greg Kurz wrote:
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 1b1942d521cc..0279e5671bcb 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -2368,7 +2368,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
> >      if (size) {
> >          adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
> >      }
> > -    if (transaction) {
> > +    if (!transaction) {
> >          memory_region_transaction_begin();
> >      }
> >      for (i = 0; i < mr->ioeventfd_nb; ++i) {
> > @@ -2383,7 +2383,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
> >              sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
> >      mr->ioeventfds[i] = mrfd;
> >      ioeventfd_update_pending |= mr->enabled;
> > -    if (transaction) {
> > +    if (!transaction) {
> >          memory_region_transaction_commit();
> >      }
> 
> Looks like these two hunks belong in a previous patch.

And they are actually wrong... we *do* want a nested
transaction if 'transaction' is true :) This is a
leftover I thought I had removed but obviously not...

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci
  2021-03-29 17:35 ` Stefan Hajnoczi
@ 2021-03-30 13:15   ` Greg Kurz
  0 siblings, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-30 13:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 4296 bytes --]

On Mon, 29 Mar 2021 18:35:16 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slow down may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a two-socket POWER9 system (128 HW threads):
> > 
> > 1		0m20.922s	0m21.346s
> > 2		0m21.230s	0m20.350s
> > 4		0m21.761s	0m20.997s
> > 8		0m22.770s	0m20.051s
> > 16		0m22.038s	0m19.994s
> > 32		0m22.928s	0m20.803s
> > 64		0m26.583s	0m22.953s
> > 128		0m41.273s	0m32.333s
> > 256		2m4.727s 	1m16.924s
> > 384		6m5.563s 	3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops on all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> >     FOR_EACH_FLAT_RANGE(fr, view) {
> >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series introduces generic APIs to allow batch creation and deletion
> > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > greatly improves the numbers:
> > 
> > 1		0m21.271s	0m22.076s
> > 2		0m20.912s	0m19.716s
> > 4		0m20.508s	0m19.310s
> > 8		0m21.374s	0m20.273s
> > 16		0m21.559s	0m21.374s
> > 32		0m22.532s	0m21.271s
> > 64		0m26.550s	0m22.007s
> > 128		0m29.115s	0m27.446s
> > 256		0m44.752s	0m41.004s
> > 384		1m2.884s	0m58.023s
> 
> Excellent numbers!
> 
> I wonder if the code can be simplified since
> memory_region_transaction_begin/end() supports nesting. Why not call
> them directly from the device model instead of introducing callbacks in
> core virtio and virtio-pci code?
> 

It seems a bit awkward that the device model should assume a memory
transaction is needed to set up host notifiers: they are ioeventfds
under the hood, but the device doesn't know that.

> Also, do you think there are other opportunities to have a long
> transaction to batch up machine init, device hotplug, etc? It's not
> clear to me when transactions must be ended. Clearly it's necessary to

The transaction *must* be ended before calling
virtio_bus_cleanup_host_notifier() because
address_space_add_del_ioeventfds(), called when
finishing the transaction, needs the "to-be-closed"
eventfds to still be open; otherwise the KVM_IOEVENTFD
ioctl() might fail with EBADF.

See this change in patch 3:

@@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
 
     for (i = 0; i < nvqs; i++) {
         virtio_bus_set_host_notifier(bus, i + n_offset, false);
+    }
+    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
+    virtio_bus_set_host_notifier_commit(bus);
+    for (i = 0; i < nvqs; i++) {
         virtio_bus_cleanup_host_notifier(bus, i + n_offset);
     }
 }
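
For reference, the call that trips on a closed fd is the KVM deassign
that only runs at commit time. A simplified sketch (field names as in
<linux/kvm.h>; addr, size and notifier stand for the per-queue values):

    struct kvm_ioeventfd iofd = {
        .addr  = addr,
        .len   = size,
        .fd    = event_notifier_get_fd(notifier), /* must still be open */
        .flags = KVM_IOEVENTFD_FLAG_DEASSIGN,
    };
    /* fails with EBADF if virtio_bus_cleanup_host_notifier() already
     * closed the eventfd before the transaction was committed */
    kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);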

Maybe I should provide more details on why we're doing that?

> end the transaction if we need to do something that depends on the
> MemoryRegion, eventfd, etc being updated. But most of the time there is
> no immediate need to end the transaction and more code could share the
> same transaction before we go back to the event loop or vcpu thread.
> 

I can't tell for all scenarios that involve memory transactions,
but it seems this is definitely not the case for ioeventfds: the
rest of the code expects the transaction to be complete.

> Stefan

Thanks for the review!

Cheers,

--
Greg

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-30 10:17     ` Greg Kurz
@ 2021-03-30 13:55       ` Stefan Hajnoczi
  2021-03-30 14:17         ` Greg Kurz
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-30 13:55 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> On Mon, 29 Mar 2021 18:10:57 +0100
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > >  
> > >      for (i = 0; i < nvqs; i++) {
> > >          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > +    }
> > > +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
> > 
> > assert(memory_region_transaction_depth == 0)?
> > 
> 
> Hmm... apart from the fact that memory_region_transaction_depth is
> a memory-internal thing that shouldn't be exposed here, it seems to
> me that memory_region_transaction_depth can be != 0, e.g. when
> batching is used... or am I missing something?
> 
> I was actually thinking of adding some asserts for that in the
> memory_region_*_eventfd_full() functions introduced by patch 1.
> 
>     if (!transaction) {
>         memory_region_transaction_begin();
>     }
>     assert(memory_region_transaction_depth != 0);

In that case is it safe to call virtio_bus_cleanup_host_notifier()
below? I thought it depends on the transaction committing first.

> 
> > > +    virtio_bus_set_host_notifier_commit(bus);
> > > +    for (i = 0; i < nvqs; i++) {
> > >          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > >      }
> > >  }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-30 13:55       ` Stefan Hajnoczi
@ 2021-03-30 14:17         ` Greg Kurz
  2021-03-31 14:47           ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Greg Kurz @ 2021-03-30 14:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 1702 bytes --]

On Tue, 30 Mar 2021 14:55:42 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> > On Mon, 29 Mar 2021 18:10:57 +0100
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > > @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > > >  
> > > >      for (i = 0; i < nvqs; i++) {
> > > >          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > > +    }
> > > > +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
> > > 
> > > assert(memory_region_transaction_depth == 0)?
> > > 
> > 
> > Hmm... apart from the fact that memory_region_transaction_depth is
> > a memory-internal thing that shouldn't be exposed here, it seems to
> > me that memory_region_transaction_depth can be != 0, e.g. when
> > batching is used... or am I missing something?
> > 
> > I was actually thinking of adding some asserts for that in the
> > memory_region_*_eventfd_full() functions introduced by patch 1.
> > 
> >     if (!transaction) {
> >         memory_region_transaction_begin();
> >     }
> >     assert(memory_region_transaction_depth != 0);
> 
> In that case is it safe to call virtio_bus_cleanup_host_notifier()
> below? I thought it depends on the transaction committing first.
> 

Yes, because the transaction ends...

> > 
> > > > +    virtio_bus_set_host_notifier_commit(bus);
...                here ^^

> > > > +    for (i = 0; i < nvqs; i++) {
> > > >          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > > >      }
> > > >  }


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-30 14:17         ` Greg Kurz
@ 2021-03-31 14:47           ` Stefan Hajnoczi
  2021-03-31 16:21             ` Greg Kurz
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2021-03-31 14:47 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 2758 bytes --]

On Tue, Mar 30, 2021 at 04:17:32PM +0200, Greg Kurz wrote:
> On Tue, 30 Mar 2021 14:55:42 +0100
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> > > On Mon, 29 Mar 2021 18:10:57 +0100
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > > > @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > > > >  
> > > > >      for (i = 0; i < nvqs; i++) {
> > > > >          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > > > +    }
> > > > > +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
> > > > 
> > > > assert(memory_region_transaction_depth == 0)?
> > > > 
> > > 
> > > Hmm... apart from the fact that memory_region_transaction_depth is
> > > a memory-internal thing that shouldn't be exposed here, it seems to
> > > me that memory_region_transaction_depth can be != 0, e.g. when
> > > batching is used... or am I missing something?
> > > 
> > > I was actually thinking of adding some asserts for that in the
> > > memory_region_*_eventfd_full() functions introduced by patch 1.
> > > 
> > >     if (!transaction) {
> > >         memory_region_transaction_begin();
> > >     }
> > >     assert(memory_region_transaction_depth != 0);
> > 
> > In that case is it safe to call virtio_bus_cleanup_host_notifier()
> > below? I thought it depends on the transaction committing first.
> > 
> 
> Yes, because the transaction ends...
> 
> > > 
> > > > > +    virtio_bus_set_host_notifier_commit(bus);
> ...                here ^^
> 
> > > > > +    for (i = 0; i < nvqs; i++) {
> > > > >          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > > > >      }
> > > > >  }

That contradicts what you said above: "it seems to me that
memory_region_transaction_depth can be != 0, e.g. when batching
is used".

If memory_region_transaction_depth can be != 0 when this function is
entered then memory_region_transaction_commit() will have no effect:

  void memory_region_transaction_commit(void)
  {
      AddressSpace *as;

      assert(memory_region_transaction_depth);
      assert(qemu_mutex_iothread_locked());

      --memory_region_transaction_depth;
      if (!memory_region_transaction_depth) {
          ^--- we won't take this branch!

So the code after memory_region_transaction_commit() cannot assume that
anything was actually committed.

That's why I asked about adding assert(memory_region_transaction_depth
== 0) to guarantee that our commit takes effect immediately so that it's
safe to call virtio_bus_cleanup_host_notifier().

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/8] virtio: Add API to batch set host notifiers
  2021-03-31 14:47           ` Stefan Hajnoczi
@ 2021-03-31 16:21             ` Greg Kurz
  0 siblings, 0 replies; 30+ messages in thread
From: Greg Kurz @ 2021-03-31 16:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, qemu-block, Michael S. Tsirkin,
	qemu-devel, Max Reitz, Paolo Bonzini, David Gibson

[-- Attachment #1: Type: text/plain, Size: 3384 bytes --]

On Wed, 31 Mar 2021 15:47:45 +0100
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Mar 30, 2021 at 04:17:32PM +0200, Greg Kurz wrote:
> > On Tue, 30 Mar 2021 14:55:42 +0100
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > > On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> > > > On Mon, 29 Mar 2021 18:10:57 +0100
> > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > > > > @@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > > > > >  
> > > > > >      for (i = 0; i < nvqs; i++) {
> > > > > >          virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > > > > +    }
> > > > > > +    /* Let address_space_update_ioeventfds() run before closing ioeventfds */
> > > > > 
> > > > > assert(memory_region_transaction_depth == 0)?
> > > > > 
> > > > 
> > > > Hmm... apart from the fact that memory_region_transaction_depth is
> > > > a memory-internal thing that shouldn't be exposed here, it seems to
> > > > me that memory_region_transaction_depth can be != 0, e.g. when
> > > > batching is used... or am I missing something?
> > > > 
> > > > I was actually thinking of adding some asserts for that in the
> > > > memory_region_*_eventfd_full() functions introduced by patch 1.
> > > > 
> > > >     if (!transaction) {
> > > >         memory_region_transaction_begin();
> > > >     }
> > > >     assert(memory_region_transaction_depth != 0);
> > > 
> > > In that case is it safe to call virtio_bus_cleanup_host_notifier()
> > > below? I thought it depends on the transaction committing first.
> > > 
> > 
> > Yes, because the transaction ends...
> > 
> > > > 
> > > > > > +    virtio_bus_set_host_notifier_commit(bus);
> > ...                here ^^
> > 
> > > > > > +    for (i = 0; i < nvqs; i++) {
> > > > > >          virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > > > > >      }
> > > > > >  }
> 
> That contradicts what you said above: "it seems to me that
> memory_region_transaction_depth can be != 0, e.g. when batching
> is used".
> 
> If memory_region_transaction_depth can be != 0 when this function is
> entered then memory_region_transaction_commit() will have no effect:
> 
>   void memory_region_transaction_commit(void)
>   {
>       AddressSpace *as;
> 
>       assert(memory_region_transaction_depth);
>       assert(qemu_mutex_iothread_locked());
> 
>       --memory_region_transaction_depth;
>       if (!memory_region_transaction_depth) {

memory_region_transaction_depth should be equal to 1 when
entering the function, not 0... which is the case when
batching is used.

>           ^--- we won't take this branch!
> 
> So the code after memory_region_transaction_commit() cannot assume that
> anything was actually committed.
> 

Right, and nothing in the current code base actually seems to
prevent memory_region_*_eventfd() from being called within an
ongoing transaction. It looks like we might want to fix that
first.
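
E.g. something like this, assuming the non-batched wrapper moved
into softmmu/memory.c where the depth counter is visible (a sketch,
not a patch):

void memory_region_add_eventfd(MemoryRegion *mr, hwaddr addr, unsigned size,
                               bool match_data, uint64_t data, EventNotifier *e)
{
    /* a non-batched add owns the whole transaction, so no caller
     * transaction may already be open at this point */
    assert(memory_region_transaction_depth == 0);
    memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, true);
}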

> That's why I asked about adding assert(memory_region_transaction_depth
> == 0) to guarantee that our commit takes effect immediately so that it's
> safe to call virtio_bus_cleanup_host_notifier().
> 

Yes, it was just misplaced and I didn't get the intent at first :)

> Stefan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-03-31 16:23 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25 15:07 [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Greg Kurz
2021-03-25 15:07 ` [RFC 1/8] memory: Allow eventfd add/del without starting a transaction Greg Kurz
2021-03-29 17:03   ` Stefan Hajnoczi
2021-03-30  7:47     ` Greg Kurz
2021-03-30 10:16       ` Stefan Hajnoczi
2021-03-25 15:07 ` [RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers() Greg Kurz
2021-03-29 17:06   ` Stefan Hajnoczi
2021-03-25 15:07 ` [RFC 3/8] virtio: Add API to batch set host notifiers Greg Kurz
2021-03-29 17:10   ` Stefan Hajnoczi
2021-03-30 10:17     ` Greg Kurz
2021-03-30 13:55       ` Stefan Hajnoczi
2021-03-30 14:17         ` Greg Kurz
2021-03-31 14:47           ` Stefan Hajnoczi
2021-03-31 16:21             ` Greg Kurz
2021-03-25 15:07 ` [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction Greg Kurz
2021-03-29 17:24   ` Stefan Hajnoczi
2021-03-30 10:29     ` Greg Kurz
2021-03-25 15:07 ` [RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start() Greg Kurz
2021-03-29 17:25   ` Stefan Hajnoczi
2021-03-25 15:07 ` [RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers() Greg Kurz
2021-03-29 17:26   ` Stefan Hajnoczi
2021-03-25 15:07 ` [RFC 7/8] virtio-scsi: Set host notifiers and callbacks separately Greg Kurz
2021-03-25 15:07 ` [RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers() Greg Kurz
2021-03-29 17:28   ` Stefan Hajnoczi
2021-03-25 17:05 ` [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci Michael S. Tsirkin
2021-03-25 17:43   ` Stefan Hajnoczi
2021-03-25 20:51     ` Greg Kurz
2021-03-25 18:05   ` Greg Kurz
2021-03-29 17:35 ` Stefan Hajnoczi
2021-03-30 13:15   ` Greg Kurz
